Authors: Brian Ayers MD MBA, Ray Funahashi MD, Veysel Kocaman PhD, Enes Hosgor PhD
Can an automated approach help identify clinically significant biases within machine learning prognostic models?
Of the 1,343 patients included in the study, 179 (13%) died. Overall model accuracy on the validation cohort was 80% (minority class F1 score = 0.39, AUC = 0.663). However, our automated tool identified numerous clinically significant differences in model accuracy across patient subsets. For example, the model was more accurate for patients requiring the intensive care unit (86% accuracy) but less accurate for other subcohorts such as current smokers (60% accuracy) and male patients (78% accuracy). Moreover, many subcohorts had too few patients to support a reliable analysis.
These data demonstrate a high risk of model performance discrepancies across subsets of patients with different characteristics. A standardized, automated approach to systematic model validation is instrumental in minimizing model biases before a machine learning model is implemented in a clinical setting.
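The kind of subgroup comparison reported above can be illustrated with a minimal sketch. This is not the authors' tool: the function name `subgroup_accuracy`, the input layout (one dict of characteristics per patient), and the `min_n` threshold for flagging subgroups with insufficient data are all assumptions made for illustration, and the data are toy values rather than study data.

```python
from collections import defaultdict


def subgroup_accuracy(features, y_true, y_pred, min_n=30):
    """Return overall accuracy and a per-subgroup accuracy report.

    features: list of dicts, one per patient, mapping a characteristic
              name to its value (hypothetical layout).
    min_n:    illustrative cutoff below which a subgroup is reported as
              having insufficient data; not a value taken from the study.
    """
    if not (len(features) == len(y_true) == len(y_pred)):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    # (characteristic, value) -> [number correct, number of patients]
    counts = defaultdict(lambda: [0, 0])
    for row, truth, pred in zip(features, y_true, y_pred):
        for name, value in row.items():
            tally = counts[(name, value)]
            tally[0] += int(truth == pred)
            tally[1] += 1

    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    report = {}
    for key, (correct, total) in counts.items():
        if total < min_n:
            # Too few patients to estimate accuracy for this subgroup.
            report[key] = {"n": total, "accuracy": None, "delta": None}
        else:
            acc = correct / total
            report[key] = {"n": total, "accuracy": acc, "delta": acc - overall}
    return overall, report


# Toy data for illustration only.
features = [
    {"icu": True, "smoker": False},
    {"icu": True, "smoker": False},
    {"icu": False, "smoker": True},
    {"icu": False, "smoker": False},
]
y_true = [1, 0, 0, 1]
y_pred = [1, 0, 1, 1]

overall, report = subgroup_accuracy(features, y_true, y_pred, min_n=2)
for (name, value), stats in sorted(report.items()):
    print(name, value, stats)
```

Reporting the difference from overall accuracy alongside the subgroup size keeps small subgroups from being over-interpreted; in practice a confidence interval per subgroup would likely be more informative than a fixed cutoff.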