An automated approach for machine learning model performance analysis on subcohorts to assess for biases

Authors: Brian Ayers MD MBA, Ray Funahashi MD, Veysel Kocaman PhD, Enes Hosgor PhD

1. Research Questions

Can an automated approach help identify clinically significant biases within machine learning prognostic models?

2. Findings

Of the 1,343 patients included in the study, 179 (13%) died. Overall, the final model achieved 80% accuracy on the validation cohort (minority class F1 score = 0.39, AUC = 0.663). However, our automated tool identified numerous clinically significant differences in model accuracy across patient subsets. For example, the model was more accurate for patients requiring intensive care unit (ICU) admission (86% accuracy) but less accurate for other subcohorts, such as current smokers (60% accuracy) and male patients (78% accuracy). Moreover, many subcohorts contained too few patients for a reliable analysis.
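The subcohort analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tool: the function names, the record layout (dicts with `y_true` and `y_pred` keys), and the minimum subcohort size of 30 are all hypothetical, and the records are toy data rather than study data.

```python
from collections import defaultdict

# Hypothetical threshold: subcohorts smaller than this are reported as
# having insufficient data instead of receiving an accuracy estimate.
MIN_SUBCOHORT_SIZE = 30


def overall_accuracy(records):
    """Fraction of records whose prediction matches the true label."""
    return sum(r["y_true"] == r["y_pred"] for r in records) / len(records)


def subcohort_accuracy(records, attribute, min_size=MIN_SUBCOHORT_SIZE):
    """Return {value: (n, accuracy or None)} for one patient attribute.

    Each record is assumed to be a dict holding the attribute plus
    'y_true' and 'y_pred'. Accuracy is None when n < min_size.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [n, n_correct]
    for r in records:
        c = counts[r[attribute]]
        c[0] += 1
        c[1] += int(r["y_true"] == r["y_pred"])
    return {
        value: (n, correct / n if n >= min_size else None)
        for value, (n, correct) in counts.items()
    }


# Toy data: 5 current smokers (too few to analyze) and 40 never-smokers.
records = (
    [{"smoker": "current", "y_true": 1, "y_pred": 1}] * 3
    + [{"smoker": "current", "y_true": 1, "y_pred": 0}] * 2
    + [{"smoker": "never", "y_true": 0, "y_pred": 0}] * 40
)
print(subcohort_accuracy(records, "smoker"))
# {'current': (5, None), 'never': (40, 1.0)}
```

Running the same function over every available attribute (for example sex, smoking status, or ICU admission) yields the kind of per-subcohort table, including the insufficient-data cases, that the Findings describe.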

3. Conclusions

These data demonstrate a high risk of model performance discrepancies across subsets of patients with different characteristics. A standardized, automated approach to systematic model validation is instrumental in minimizing model bias before a machine learning model is implemented in a clinical setting.
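One way such systematic validation could act as a gate before deployment is to compare each subcohort against the overall validation accuracy and block deployment while any subcohort is flagged or lacks data. The sketch below is hypothetical: the function name, the 0.01 tolerance, and the placeholder "rare subgroup" entry are illustrative assumptions, while the 0.80, 0.86, 0.60, and 0.78 accuracies are those reported in the Findings. In practice the tolerance would be a clinical judgment, not a fixed constant.

```python
def flag_discrepancies(overall, subcohorts, tolerance):
    """Split subcohorts into flagged, acceptable, and insufficient-data groups.

    overall:    overall validation accuracy.
    subcohorts: {name: accuracy or None}; None marks insufficient data.
    tolerance:  hypothetical maximum allowed absolute gap from overall.
    Returns (flagged {name: signed gap}, ok [names], insufficient [names]).
    """
    flagged, ok, insufficient = {}, [], []
    for name, acc in subcohorts.items():
        if acc is None:
            insufficient.append(name)
        elif abs(acc - overall) > tolerance:
            flagged[name] = round(acc - overall, 4)
        else:
            ok.append(name)
    return flagged, ok, insufficient


# Accuracies from the Findings; "rare subgroup" is a hypothetical placeholder
# for a subcohort with too few patients to analyze.
reported = {"ICU": 0.86, "current smokers": 0.60, "male": 0.78, "rare subgroup": None}

flagged, ok, insufficient = flag_discrepancies(0.80, reported, tolerance=0.01)
ready_for_deployment = not flagged and not insufficient
print(flagged)
print(insufficient, ready_for_deployment)
```

Treating insufficient data as a blocker, rather than silently passing it, reflects the point that an unanalyzable subcohort is itself a validation gap.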