Advances in Diabetes & Endocrinology
Research Article
Validated Models Using EHRs or Claims Data to Distinguish Diabetes Type among Adults
Campione JR1*, Nooney JG2, Kirkman MS2, Pfaff E3, Mardon R1, Benoit SR4, McKeever-Bullard K4, Yang DH1, Rivero G1, Rolka D4 and Saydah S4
1Westat, Rockville, MD, USA
2Division of Endocrinology and Metabolism, Department of
Medicine, University of North Carolina, Chapel Hill, NC, USA
3NC TraCS Institute, University of North Carolina, Chapel Hill, NC,
USA
4Division of Diabetes Translation, Centers for Disease Control and
Prevention, Atlanta, GA, USA
*Address for Correspondence:
Campione JR, Westat, Rockville, MD, USA; Tel: 919-768-7325; E-mail:
joannecampione@westat.com
Submission: 21 December, 2022
Accepted: 23 January, 2023
Published: 26 January, 2023
Copyright: © 2023 Campione JR, et al. This is an open access article
distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium,
provided the original work is properly cited.
Abstract
Purpose: Clinical data provide an opportunity for efficient and
timely disease surveillance. We developed and validated advanced
phenotyping models to classify adult patients with diabetes as type 1,
type 2, or other/indeterminate using structured fields from electronic
health record (EHR) data.
To simulate the use of claims data supplemented with medication
information, we compared model performance before and after the
removal of body mass index (BMI) and laboratory results.
Methods: We used 3 years of EHR data from a sample of 2,465
adult patients with diabetes from a health care system’s clinical data
warehouse. A weighted ratio of type 1 diabetes codes to all diabetes
codes was created by down-weighting codes from care settings that
do not treat diabetes. We developed two multinomial regression
models and a machine learning conditional inference tree to classify
patients as type 1, type 2, or other/indeterminate. The models were
validated by calculating sensitivity, specificity, positive predictive
value (PPV), and negative predictive value (NPV) relative to a gold
standard.
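The weighted ratio described in the Methods can be illustrated with a minimal sketch. It assumes ICD-10-CM diagnosis codes (E10 for type 1; E08, E09, E11, and E13 for other diabetes categories) and uses hypothetical setting names and weights, because the abstract does not report the actual weights or care-setting categories. The names `Encounter`, `SETTING_WEIGHTS`, `DEFAULT_WEIGHT`, and `weighted_type1_ratio` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical weights: the abstract does not state the values actually used.
# Settings that routinely treat diabetes keep full weight.
SETTING_WEIGHTS = {"endocrinology": 1.0, "primary_care": 1.0}
# Assumed down-weight for care settings that do not treat diabetes.
DEFAULT_WEIGHT = 0.5

# ICD-10-CM category prefixes for diabetes mellitus (assumed coding system).
DIABETES_PREFIXES = ("E08", "E09", "E10", "E11", "E13")
TYPE1_PREFIX = "E10"


@dataclass
class Encounter:
    icd10: str    # diagnosis code recorded at the encounter
    setting: str  # care setting where the code was recorded


def weighted_type1_ratio(encounters: Iterable[Encounter]) -> Optional[float]:
    """Weighted share of type 1 codes among all diabetes codes.

    Returns None when the patient has no diabetes codes.
    """
    type1_weight = 0.0
    total_weight = 0.0
    for enc in encounters:
        if not enc.icd10.startswith(DIABETES_PREFIXES):
            continue  # ignore non-diabetes diagnoses
        weight = SETTING_WEIGHTS.get(enc.setting, DEFAULT_WEIGHT)
        total_weight += weight
        if enc.icd10.startswith(TYPE1_PREFIX):
            type1_weight += weight
    return type1_weight / total_weight if total_weight else None


# One type 1 code from endocrinology and two type 2 codes from a setting
# that does not treat diabetes: the unweighted ratio would be 1/3.
history = [
    Encounter("E10.9", "endocrinology"),
    Encounter("E11.9", "ophthalmology"),
    Encounter("E11.9", "ophthalmology"),
]
ratio = weighted_type1_ratio(history)
```

Under these assumed weights, the codes from the setting that does not treat diabetes count for half as much, so the type 1 code recorded in endocrinology carries more influence than it would in an unweighted ratio.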
Results: For all models, the weighted ratio of type 1 diabetes was
the strongest predictive factor. The models had validation statistics of
≥ 93% for sensitivity, ≥ 87% for specificity, ≥ 88% for PPV, and ≥ 93%
for NPV. After removal of BMI and laboratory data from the regression
model, the largest decline in performance from the full model was in
type 2 diabetes specificity (90.8% to 89.2%).
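For readers less familiar with the four validation statistics, the sketch below shows how they could be computed for one diabetes type against a gold standard. It assumes a one-vs-rest comparison for each class, which the abstract implies by reporting type 2 specificity but does not spell out. The function name `one_vs_rest_metrics` and the toy labels are illustrative and are not study data.

```python
from typing import Dict, Optional, Sequence


def one_vs_rest_metrics(
    gold: Sequence[str], predicted: Sequence[str], positive: str
) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV, and NPV for one class versus all others."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            tp += 1  # correctly assigned to the class
        elif p == positive:
            fp += 1  # assigned to the class, gold standard disagrees
        elif g == positive:
            fn += 1  # missed a true member of the class
        else:
            tn += 1  # correctly kept out of the class

    def ratio(num: int, den: int) -> Optional[float]:
        return num / den if den else None  # undefined when denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy labels for illustration only.
gold = ["type1", "type1", "type2", "type2", "other"]
pred = ["type1", "type2", "type2", "type2", "type2"]
type2_stats = one_vs_rest_metrics(gold, pred, positive="type2")
```

In this toy example every true type 2 patient is found, but two patients of other types are also labeled type 2, so specificity and PPV for type 2 fall while sensitivity and NPV stay at 1.0.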
Conclusion: Prediction models and machine learning conditional
inference trees using either structured fields from EHR data or claims
data supplemented with medication data can be used to accurately
distinguish diabetes type among adults. The inclusion of BMI and
laboratory results improves model specificity for type 2 diabetes.