Tables
- Table 1:
Demographic and relevant clinical features comparing patients with type 1 and type 2 diabetes
Group; % of patients (95% CI)*†

| Characteristic | Type 2 diabetes, n = 1199 | Type 1 diabetes, n = 110 | Total, n = 1309 |
| --- | --- | --- | --- |
| Sex, male | 53.5 (50.6–56.3) | 47.3 (37.7–57.0) | 52.9 (50.2–55.7) |
| Age, yr, mean (95% CI) | 64.6 (63.9–65.3) | 46.0 (42.8–49.2) | 63.0 (62.3–63.8) |
| No. of encounters in past year, mean (95% CI) | 5.1 (4.8–5.3) | 4.0 (3.2–4.8) | 5.0 (4.8–5.2) |
| Prescription for insulin (A10AB - -)‡ | | | |
| In past year | 6.6 (5.3–8.2) | 30.0 (21.8–39.6) | 8.6 (7.1–10.2) |
| In past 2 years | 8.8 (7.2–10.5) | 47.3 (37.8–57.0) | 12.0 (10.3–13.9) |
| At any time | 13.0 (11.2–15.1) | 76.4 (67.1–83.7) | 18.3 (16.3–20.6) |
| Prescription for blood glucose–lowering drugs excluding insulin (A10B - - -)‡ | | | |
| In past year | 45.5 (42.6–48.3) | 12.7 (7.4–20.8) | 42.7 (40.0–45.4) |
| In past 2 years | 54.6 (51.8–57.5) | 20.9 (14.0–29.9) | 51.8 (49.0–54.5) |
| At any time | 71.9 (69.2–74.4) | 26.4 (18.6–35.8) | 68.1 (65.5–70.6) |
| Occurrence of “type 1” in any text field | 0.7 (0.3–1.4) | 40.0 (30.9–49.8) | 4.0 (3.0–5.2) |
| Billing code 250.01 in past year | 0 | 10.0 (5.3–17.6) | 0.8 (0.4–1.5) |
| Occurrence of “type 2” in any text field | 26.3 (23.8–28.9) | 7.3 (3.4–14.3) | 24.7 (22.4–27.1) |
| Occurrence of “diabetes” in any text field | 95.3 (93.9–96.4) | 99.1 (94.3–100) | 95.6 (94.4–96.7) |

Note: CI = confidence interval.
* Except where indicated otherwise.
† CIs for proportions are exact.
‡ The parenthetical notation represents relevant codes in the Anatomical Therapeutic Chemical Classification system, where each code is 7 characters long and dashes represent “wild card” characters. Specifically, insulin is represented by various codes in which the first 5 characters are A10AB, and blood glucose–lowering drugs other than insulin are represented by various codes in which the first 4 characters are A10B.
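The wild card notation in the footnote above can be illustrated with a short sketch. It assumes medication records carry plain 7-character ATC codes as strings; the helper names `atc_pattern_to_regex` and `matches` are hypothetical and are not taken from the study.

```python
import re


def atc_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a 7-character ATC pattern in which '-' matches any one character."""
    compact = pattern.replace(" ", "")  # the table prints the dashes with spaces
    if len(compact) != 7:
        raise ValueError(f"expected 7 characters, got {len(compact)}: {pattern!r}")
    body = "".join("." if ch == "-" else re.escape(ch) for ch in compact)
    return re.compile(body)


def matches(regex: "re.Pattern[str]", code: str) -> bool:
    """True if the whole code (trimmed, upper-cased) fits the pattern."""
    return regex.fullmatch(code.strip().upper()) is not None


# Patterns as written in Table 1.
INSULIN = atc_pattern_to_regex("A10AB - -")      # first 5 characters fixed
NON_INSULIN = atc_pattern_to_regex("A10B - - -")  # first 4 characters fixed

for code in ["A10AB01", "A10BA02", "A10AE04"]:
    print(code, matches(INSULIN, code), matches(NON_INSULIN, code))
```

Because every code is 7 characters long, matching the pattern is equivalent to a prefix test such as `code.startswith("A10AB")`; the regular expression only makes the fixed length explicit.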
- Table 2:
Ten-fold cross-validation results for each of 4 machine learning algorithms, minimizing or maximizing various metrics*
| Metric and algorithm | Sensitivity, % | Specificity, % | PPV, % | NPV, % | Accuracy, %† |
| --- | --- | --- | --- | --- | --- |
| Misclassification rate | | | | | |
| C5.0 | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| CaRT | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| CHAID | 40.0 (30.9–49.8) | 99.3 (98.6–99.7) | 84.6 (71.4–92.7) | 94.7 (93.3–95.9) | 94.3 (92.9–95.5) |
| LASSO | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| F1 score | | | | | |
| C5.0 | 61.8 (52.0–70.8) | 96.5 (95.2–97.4) | 61.8 (52.0–70.8) | 96.5 (95.2–97.4) | 93.5 (92.0–94.8) |
| CaRT | 60.9 (51.1–69.9) | 96.3 (95.0–97.3) | 60.4 (50.6–69.4) | 96.4 (95.1–97.3) | 93.3 (91.8–94.6) |
| CHAID | 51.8 (42.1–61.4) | 98.6 (97.7–99.1) | 77.0 (65.5–85.7) | 95.7 (94.3–96.7) | 94.6 (93.2–95.8) |
| LASSO | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| PPV | | | | | |
| C5.0 | 43.6 (34.3–53.4) | 99.1 (98.3–99.5) | 81.4 (68.7–89.9) | 95.0 (93.6–96.1) | 94.4 (93.0–95.5) |
| CaRT | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| CHAID‡ | 42.7 (33.5–52.5) | 99.3 (98.6–99.7) | 85.5 (72.8–93.1) | 94.9 (93.5–96.1) | 94.5 (93.1–95.7) |
| LASSO | 40.9 (31.8–50.7) | 99.3 (98.6–99.7) | 84.9 (71.9–92.8) | 94.8 (93.4–95.9) | 94.4 (93.0–95.5) |
| Youden J statistic | | | | | |
| C5.0 | 85.5 (77.2–91.2) | 85.5 (83.4–87.5) | 35.3 (29.7–41.5) | 98.5 (97.4–99.1) | 85.5 (83.5–87.4) |
| CaRT | 80.9 (72.1–87.5) | 89.2 (87.2–90.8) | 40.8 (34.3–47.7) | 98.1 (97.0–98.8) | 88.5 (86.6–90.1) |
| CHAID | 52.7 (43.0–62.2) | 97.9 (96.9–98.6) | 69.9 (58.7–79.2) | 95.7 (94.4–96.8) | 94.1 (92.6–95.3) |
| LASSO‡ | 87.3 (79.2–92.6) | 85.4 (83.2–87.3) | 35.6 (29.9–41.6) | 98.6 (97.7–99.2) | 85.5 (83.5–87.4) |

Note: CaRT = classification and regression tree, CHAID = chi-square automated interaction detection, LASSO = least absolute shrinkage and selection operator, NPV = negative predictive value, PPV = positive predictive value.
* The misclassification rate metric was minimized, whereas the F1 score, PPV and Youden J statistic metrics were maximized.
† A dummy classifier that assumes all cases were type 2 diabetes would achieve an accuracy of 91.6%.
‡ Instances reported as final case definitions.
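For readers less familiar with the metrics in Table 2, the sketch below computes them from a 2 × 2 confusion matrix using their standard definitions, with type 1 diabetes as the positive class. The group sizes (110 and 1199) come from Table 1; the split into true and false positives in `example` is hypothetical and is not the study's cross-validated result. The names `Confusion` and `metrics` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # type 1, classified as type 1
    fn: int  # type 1, classified as type 2
    fp: int  # type 2, classified as type 1
    tn: int  # type 2, classified as type 2


def metrics(c: Confusion) -> dict:
    """Standard definitions; assumes no denominator is zero."""
    sens = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    acc = (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "accuracy": acc,
        "misclassification_rate": 1 - acc,
        "f1": 2 * ppv * sens / (ppv + sens),
        "youden_j": sens + spec - 1,
    }


N_TYPE1, N_TYPE2 = 110, 1199  # group sizes from Table 1

# Hypothetical counts for illustration only (not taken from the study).
example = Confusion(tp=96, fn=14, fp=175, tn=1024)
for name, value in metrics(example).items():
    print(f"{name}: {value:.3f}")

# Dummy classifier from the footnote: label every patient as type 2.
baseline_accuracy = N_TYPE2 / (N_TYPE1 + N_TYPE2)
print(f"baseline accuracy: {100 * baseline_accuracy:.1f}%")  # 91.6%
```

The baseline shows why accuracy alone is a weak criterion here: with about 8% of patients in the type 1 group, a rule that never predicts type 1 diabetes already scores 91.6%.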
- Table 3:

| Type of analysis | Case definition |
| --- | --- |
| CHAID with maximized PPV | Any of the following 2 criteria: |
| | Anywhere text “type 1” |
| | Age < 22 yr at time of original diabetes diagnosis |
| LASSO with maximized Youden J statistic | Any of the following criteria: |
| | Anywhere text “type 1” |
| | Any occurrence of A10AB - - in the medication table (insulin and analogues for injection, fast acting)† |
| | Age < 30 yr at time of original diabetes diagnosis |

Note: CHAID = chi-square automated interaction detection, LASSO = least absolute shrinkage and selection operator, PPV = positive predictive value.
* Disease status assumed to be type 2 diabetes or a diabetes subtype, unless the patient meets criteria for type 1 diabetes.
† The specified notation represents relevant codes in the Anatomical Therapeutic Chemical Classification system, where each code is 7 characters long and dashes represent “wild card” characters. Specifically, insulin is represented by various codes in which the first 5 characters are A10AB.
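The two case definitions above are simple "any of" rules, so they can be sketched as boolean functions. This is a minimal illustration under stated assumptions, not the study's implementation: the `Patient` record layout, the case-insensitive substring match for “type 1”, and the treatment of a missing age as not meeting the age criterion are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    text_fields: List[str]             # free-text entries from the record
    atc_codes: List[str]               # 7-character ATC codes ever prescribed
    age_at_diagnosis: Optional[float]  # years; None if unknown


def mentions_type1(p: Patient) -> bool:
    # Assumption: case-insensitive substring match on every text field.
    return any("type 1" in text.lower() for text in p.text_fields)


def diagnosed_before(p: Patient, age_limit: float) -> bool:
    # Assumption: an unknown age does not satisfy the criterion.
    return p.age_at_diagnosis is not None and p.age_at_diagnosis < age_limit


def chaid_max_ppv(p: Patient) -> bool:
    """Type 1 if "type 1" appears anywhere OR diagnosis before age 22."""
    return mentions_type1(p) or diagnosed_before(p, 22)


def lasso_max_youden(p: Patient) -> bool:
    """Type 1 if "type 1" appears anywhere OR any A10AB code OR diagnosis before age 30."""
    has_fast_insulin = any(code.upper().startswith("A10AB") for code in p.atc_codes)
    return mentions_type1(p) or has_fast_insulin or diagnosed_before(p, 30)


patient = Patient(text_fields=["DM follow-up"], atc_codes=["A10AB05"], age_at_diagnosis=60)
print(chaid_max_ppv(patient), lasso_max_youden(patient))  # False True
```

Patients for whom a function returns `False` fall into the default group described in the first footnote (type 2 diabetes or another diabetes subtype). The LASSO definition is the broader of the two, consistent with its higher sensitivity and lower PPV in Table 2.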