Abstract

Background Since the beginning of the COVID-19 crisis, healthcare policies have sought to relieve the overload on healthcare systems by targeting patients at higher risk of developing a severe form of the disease.

Aim To design and compare machine learning models that predict poor outcomes in patients hospitalized for COVID-19.

Methods We analyzed data from 1954 patients admitted to the COVID-19 unit in Sfax, Tunisia, between November 2020 and May 2022. Clinical, biological, and radiological findings were integrated into three machine-learning (ML) models: elastic net regularized logistic regression (ElasticNet), random forest (RF), and k-nearest neighbors (KNN). Prediction performance was compared across the three ML models after training and repeated 10-fold cross-validation. The outcome was a composite endpoint consisting of mortality and transfer to the intensive care unit (ICU).
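
The benchmarking design described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: it uses scikit-learn, a synthetic dataset in place of the patient cohort, and arbitrary hyperparameters and repeat count (none of which are given in the abstract).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): y = 1 plays the role of the
# composite endpoint (death or ICU transfer).
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           weights=[0.7, 0.3], random_state=0)

# Hyperparameters below are illustrative assumptions, not the study's settings.
models = {
    "ElasticNet": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000),
    ),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

# Repeated 10-fold cross-validation; the number of repeats (3) is an assumption.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
mcc_scorer = make_scorer(matthews_corrcoef)

# One MCC value per fold and repeat, for each model.
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring=mcc_scorer)
    for name, model in models.items()
}

for name, scores in results.items():
    print(f"{name}: mean MCC = {scores.mean():.2f} (sd {scores.std():.2f})")
```

Scaling is placed inside the pipelines so that it is refit on each training fold, which avoids leaking information from the held-out fold into the ElasticNet and KNN models.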

Results Prediction performance for the endpoint was similar across the three models (Matthews correlation coefficient (MCC) = 0.64, 0.64, and 0.59 for ElasticNet, RF, and KNN, respectively). KNN was more likely to reflect the heterogeneity of the population (AUC-ROC = 0.86 for a cut-off = 0.76). Length of hospital stay, percutaneous oxygen saturation (SpO2), heart rate, troponin level, lymphocyte count, and prothrombin ratio were six of the ten most important variables selected by the three models to predict poor outcomes. Radiologic severity was not an important predictor in any of the three models.
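
For readers unfamiliar with the metric reported above, the MCC is computed from the four cells of a binary confusion matrix and ranges from -1 to 1, with 0 corresponding to chance-level prediction. The sketch below shows the standard formula; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
example = matthews_cc(tp=40, tn=50, fp=5, fn=5)
print(round(example, 3))
```

Because all four cells enter the formula, the MCC stays informative when the poor-outcome class is much smaller than the other, a situation in which accuracy alone can be misleading.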

Conclusion Benchmarking of ML models can help derive insights from the large amount of data collected in times of crisis.