Pulmonary embolism (PE) is a life-threatening event. Many clinical scores exist for risk stratification of PE, but their predictive performance is limited. We aimed to develop a machine learning (ML) model to predict acute PE in hospitalized patients.
This retrospective derivation-validation study was conducted in two large academic medical centers. The derivation cohort consisted of inpatients from January 2013 to December 2017, and the validation cohort was established from January to December 2017. Patient data were extracted from electronic medical records (EMR). To develop a prediction model, we trained two decision tree-based ML algorithms: extreme gradient boosting (XGBoost) and random forest (RF). Their predictive performance was compared against the following clinical scores: Wells' criterion, simplified Wells' criterion, modified Geneva score, simplified Geneva score, YEARS algorithm, and Padua score.
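The model comparison described above can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic (generated with a positive rate near the 3.4% reported for the derivation cohort), scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for XGBoost, and the feature counts, split, and hyperparameters are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for EMR-derived features (assumption, not study data).
# weights=[0.966] makes the positive class rare, roughly 3.4% of samples.
X, y = make_classification(
    n_samples=4000,
    n_features=20,
    n_informative=8,
    weights=[0.966],
    random_state=0,
)

# Stratified split keeps the positive rate similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two tree-based ensembles; gradient boosting here is scikit-learn's
# implementation, used as a stand-in for XGBoost.
models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

aurocs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class is the ranking score for AUROC.
    scores = model.predict_proba(X_test)[:, 1]
    aurocs[name] = roc_auc_score(y_test, scores)

for name, value in aurocs.items():
    print(f"{name}: AUROC = {value:.3f}")
```

Because PE is rare among inpatients, a ranking metric such as AUROC is more informative here than raw accuracy, which a model could maximize by predicting "no PE" for everyone.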
Two ML models were developed from 18,764 inpatients, of whom 638 (3.4%) were diagnosed with PE. The XGBoost model obtained an area under the receiver operating characteristic curve (AUROC) of 0.814, and the RF model obtained an AUROC of 0.692. The XGBoost model showed the best performance for predicting PE among all predictive models, including the pre-existing clinical scores. In the validation cohort, 43 patients (2.0%) were diagnosed with PE, and the XGBoost model obtained an AUROC of 0.914 for predicting PE, supporting the reliability of the predictive model.
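For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen patient with PE receives a higher predicted score than a randomly chosen patient without PE. The sketch below computes it directly from that definition; the function name `auroc` and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: 2 positives, 3 negatives, so 6 pairs in total.
toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.90]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"AUROC = {toy_auroc:.3f}")
```

In the toy example, four of the six positive-negative pairs are ranked correctly, so the value is 4/6. The pairwise loop is quadratic and meant only to make the definition concrete; library implementations use a sort-based method.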
Our XGBoost-based predictive model for acute PE could be a promising tool for improving patient outcomes by identifying PE occurrence and supporting appropriate intervention.