Advances in Diabetes & Endocrinology

Research Article

Validated Models Using EHRs or Claims Data to Distinguish Diabetes Type among Adults

Campione JR1*, Nooney JG2, Kirkman MS2, Pfaff E3, Mardon R1, Benoit SR4, McKeever-Bullard K4, Yang DH1, Rivero G1, Rolka D4 and Saydah S4

1Westat, Rockville, MD, USA
2Division of Endocrinology and Metabolism, Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
3NC TraCS Institute, University of North Carolina, Chapel Hill, NC, USA
4Division of Diabetes Translation, Centers for Disease Control and Prevention, Atlanta, GA, USA
*Address for Correspondence: Campione JR, Westat, Rockville, MD, USA; Tel: 919-768-7325; E-mail: joannecampione@westat.com
Submission: 21 December, 2022
Accepted: 23 January, 2023
Published: 26 January, 2023
Copyright: © 2023 Campione JR, et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Purpose: Clinical data provide an opportunity for efficient and timely disease surveillance. We developed and validated advanced phenotyping models that classify adult patients with diabetes as having type 1, type 2, or other/indeterminate diabetes using structured fields from electronic health record (EHR) data. To simulate the use of claims data supplemented with medication information, we compared model performance before and after the removal of body mass index (BMI) and laboratory results.
Methods: We used 3 years of EHR data from a sample of 2,465 adult patients with diabetes drawn from a health care system’s clinical data warehouse. We created a weighted ratio of type 1 diabetes codes to all diabetes codes by down-weighting codes from care settings that do not treat diabetes. We developed two multinomial regression models and a machine learning conditional inference tree to classify patients as type 1, type 2, or other/indeterminate. The models were validated by calculating sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) relative to a gold standard.
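As an illustration of the weighted ratio described in the Methods, the following is a minimal sketch rather than the authors' implementation. The care-setting names, the weight values, and the `DiabetesCode` record are hypothetical; the abstract states only that codes from settings that do not treat diabetes were down-weighted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical weights: the abstract does not report the actual values.
# Settings that treat diabetes count fully; all others are down-weighted.
SETTING_WEIGHTS = {"endocrinology": 1.0, "primary_care": 1.0}
DEFAULT_WEIGHT = 0.5  # assumed weight for settings that do not treat diabetes


@dataclass(frozen=True)
class DiabetesCode:
    """One diabetes diagnosis code occurrence (illustrative structure)."""
    diabetes_type: str  # "type1", "type2", or "other"
    care_setting: str   # setting in which the code was recorded


def weighted_type1_ratio(codes: Iterable[DiabetesCode]) -> Optional[float]:
    """Weighted share of type 1 codes among all diabetes codes.

    Returns None when the patient has no diabetes codes at all.
    """
    type1_weight = 0.0
    total_weight = 0.0
    for code in codes:
        weight = SETTING_WEIGHTS.get(code.care_setting, DEFAULT_WEIGHT)
        total_weight += weight
        if code.diabetes_type == "type1":
            type1_weight += weight
    if total_weight == 0.0:
        return None
    return type1_weight / total_weight


example_codes = [
    DiabetesCode("type1", "endocrinology"),
    DiabetesCode("type1", "primary_care"),
    DiabetesCode("type2", "ophthalmology"),  # down-weighted setting
]
example_ratio = weighted_type1_ratio(example_codes)
```

With these assumed weights, a single type 2 code recorded in a down-weighted setting pulls the ratio down less than it would in an unweighted count.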
Results: For all models, the weighted ratio of type 1 diabetes codes was the strongest predictive factor. The models had validation statistics ≥ 93% for sensitivity; ≥ 87% for specificity; ≥ 88% for PPV; and ≥ 93% for NPV. After removal of BMI and laboratory data from the regression model, the largest decline in performance relative to the full model was in type 2 diabetes specificity (90.8% to 89.2%).
Conclusion: Prediction models and machine learning conditional inference trees using either structured fields from EHR data or claims data supplemented with medication data can be used to accurately distinguish diabetes type among adults. The inclusion of BMI and laboratory results improves model specificity for type 2 diabetes.
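The four validation statistics named in the abstract can be computed for each diabetes type by treating that type as the positive class and the remaining types as negative. The sketch below assumes this one-vs-rest convention, which the abstract does not spell out; the function name and the toy labels are illustrative and are not study data.

```python
from typing import Dict, Optional, Sequence


def one_vs_rest_metrics(
    gold: Sequence[str], predicted: Sequence[str], target: str
) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV, and NPV for one class versus the rest.

    `gold` holds the gold-standard labels and `predicted` the model labels.
    A statistic is None when its denominator is zero.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == target and g == target:
            tp += 1  # true positive
        elif p == target:
            fp += 1  # predicted target, gold standard disagrees
        elif g == target:
            fn += 1  # missed a gold-standard target case
        else:
            tn += 1  # both agree the case is not the target type

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy labels for illustration only.
gold_labels = ["type1", "type1", "type2", "type2", "other"]
model_labels = ["type1", "type2", "type2", "type2", "type2"]
type2_metrics = one_vs_rest_metrics(gold_labels, model_labels, "type2")
```

In this toy example, two non-type-2 patients are labeled type 2, which lowers type 2 specificity and PPV while leaving sensitivity unaffected.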