Rutgers Robert Wood Johnson Medical School New Brunswick, New Jersey, United States
Disclosure information not submitted.
Background/Purpose: Administrative claims databases enable research in large populations with JIA. We previously showed that machine learning (ML)-based algorithms accurately identify new JIA diagnoses within US commercial insurance claims data. We externally validated these algorithms within US public insurance claims data.
Methods: We performed a cross-sectional validation study using US commercial health plan data (2013-2020) and national US Medicaid data (2013-2018). We identified children diagnosed with JIA (ICD-9-CM: 696.0, 714, 720; ICD-10-CM: L40.5, M05, M06, M08, M45) before age 18 after ≥12 months’ continuous enrollment without JIA diagnosis or immunosuppression. JIA diagnoses were based on 3 previously validated definitions: 1) rheumatologist’s diagnosis plus ≥2 specific lab test orders; 2) ≥2 outpatient diagnoses 8-52 weeks apart; or 3) 1 inpatient diagnosis. A random set of available qualifying charts was abstracted and independently adjudicated as definite, probable, possible, or unlikely JIA by clinical experts; discrepancies were resolved by consensus. Incident JIA was defined as definite or probable JIA diagnosed ≤4 months before first JIA claim. ML-based algorithms used simulation-based balancing and regularized logistic regression, with regularization hyperparameters selected by 10-fold cross-validation. We used optimal predictive model variables to assess sensitivity (Se), specificity (Sp), and positive predictive value (PPV), each with a 95% confidence interval (CI). We also tested rule-based algorithms refined by provider type, JIA diagnosis counts, laboratory test counts, and documented JIA treatment. We compared results across databases and ICD types.
Results: Of 298 eligible charts reviewed (182 commercial, 116 public), 151 had incident JIA (ICD-9 commercial 58%, public 53%; ICD-10 commercial 41%, public 52%).
Optimal ML-based algorithms derived within commercial claims data enabled excellent discrimination between incident JIA and unlikely JIA (ICD-9: Se 100%, Sp 96%, PPV 97%; ICD-10: Se 100%, Sp 97%, PPV 97%) (Table 1). However, the same algorithms were not accurate within the Medicaid sample (ICD-9: Se 97%, Sp 19%, PPV 67%; ICD-10: Se 100%, Sp 29%, PPV 70%), and more accurate algorithms derived within Medicaid data used distinct sets of predictive variables (Table). Moreover, optimal ML-based algorithms differed in number and types of predictors across ICD-9 and ICD-10 data. Unrefined rule-based algorithms had lower specificity and/or sensitivity, but refined rule-based algorithms were more accurate and consistent across databases and ICD types (Tables 2-3). Preferred rule-based algorithms required either: 1) rheumatologist’s outpatient diagnosis plus ≥4-5 specific lab orders, or 2) ≥5 outpatient JIA visits (first diagnosis not for eye care) plus any JIA treatment.
Conclusion: Although ML-based diagnostic algorithms for incident JIA performed well within the database and ICD type in which they were derived, their accuracy and predictors differed across databases and ICD types. In contrast, refined rule-based algorithms had better external validity, with similarly high PPVs across databases and ICD code types. These preferred rule-based algorithms will improve the quality of future claims-based research on the diagnosis, management, and outcomes of newly diagnosed JIA.
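The two preferred rule-based definitions and the accuracy measures reported above can be illustrated with a minimal sketch. It is not the study's code: the `ClaimsSummary` record, its field names, and the helper functions are hypothetical. The abstract does not state how the 95% CIs were computed, so the Wilson score interval is used here as an assumption, and because the lab-order threshold is given as "≥4-5", it is exposed as a parameter that defaults to 4.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ClaimsSummary:
    """Hypothetical per-patient claims summary; all field names are illustrative."""
    rheum_outpatient_dx: bool    # outpatient JIA diagnosis made by a rheumatologist
    specific_lab_orders: int     # number of JIA-specific lab test orders
    outpatient_jia_visits: int   # number of outpatient visits carrying a JIA code
    first_dx_eye_care: bool      # first JIA diagnosis came from an eye-care visit
    any_jia_treatment: bool      # any documented JIA treatment
    incident_jia: bool           # reference standard from chart adjudication


def rule_positive(p, min_labs=4, min_visits=5):
    """Return True if either preferred rule flags the patient as incident JIA."""
    # Rule 1: rheumatologist's outpatient diagnosis plus enough specific lab orders.
    rule1 = p.rheum_outpatient_dx and p.specific_lab_orders >= min_labs
    # Rule 2: enough outpatient JIA visits, first diagnosis not from eye care,
    # plus any JIA treatment.
    rule2 = (p.outpatient_jia_visits >= min_visits
             and not p.first_dx_eye_care
             and p.any_jia_treatment)
    return rule1 or rule2


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z=1.96 for 95%)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def proportion(successes, n):
    """Point estimate and CI; the estimate is NaN when the denominator is zero."""
    estimate = successes / n if n else float("nan")
    return estimate, wilson_ci(successes, n)


def evaluate(patients, min_labs=4, min_visits=5):
    """Compare rule output with the chart-review reference standard."""
    tp = fp = tn = fn = 0
    for p in patients:
        predicted = rule_positive(p, min_labs, min_visits)
        if predicted and p.incident_jia:
            tp += 1
        elif predicted:
            fp += 1
        elif p.incident_jia:
            fn += 1
        else:
            tn += 1
    return {
        "Se": proportion(tp, tp + fn),    # sensitivity = TP / (TP + FN)
        "Sp": proportion(tn, tn + fp),    # specificity = TN / (TN + FP)
        "PPV": proportion(tp, tp + fp),   # positive predictive value = TP / (TP + FP)
    }
```

Calling `evaluate(patients)` on a list of `ClaimsSummary` records returns each measure as a `(estimate, (lower, upper))` pair. Passing `min_labs=5` shows how sensitive the results are to the lab-order threshold.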