Methods
We evaluated a deep learning (DL) algorithm, SOFIA, which classifies HRCT according to ATS/ERS/JRS/ALAT IPF guideline criteria, among an international group of radiologists and pulmonologists. Participants evaluated HRCTs from 203 patients with suspected IPF, assigning a likelihood score to each of the guideline-based HRCT categories (0-100% each, summing to 100%). SOFIA scores were then provided, and participants were given the opportunity to revise their scores. Agreement (weighted kappa) and prognostic accuracy (Cox regression and C-index) were evaluated for 1) UIP scores, 2) guideline-based diagnosis and 3) INBUILD categorisation (UIP/probable UIP vs indeterminate/alternative diagnosis, i.e., trial screening mode).
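As an illustration of the scoring and agreement steps described above, the following is a minimal Python sketch, not the study's actual analysis code. It assumes that the guideline-based diagnosis is taken as the highest-scoring category, that the categories are ordered from UIP to alternative diagnosis, and that the weighted kappa uses linear weights between two raters; none of these details is stated in the abstract. The function names and the "UIP-like"/"other" labels are hypothetical.

```python
from collections import Counter

# Guideline-based HRCT categories; the ordinal ordering is an assumption.
CATEGORIES = ["UIP", "probable UIP", "indeterminate", "alternative diagnosis"]


def guideline_diagnosis(scores):
    """Return the category with the highest likelihood score (assumed rule).

    Ties resolve to the category listed first in CATEGORIES.
    """
    if abs(sum(scores.values()) - 100) > 1e-9:
        raise ValueError("category scores must sum to 100%")
    return max(CATEGORIES, key=lambda c: scores[c])


def inbuild_category(diagnosis):
    """Collapse the four categories into the binary INBUILD grouping."""
    return "UIP-like" if diagnosis in ("UIP", "probable UIP") else "other"


def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights for two raters."""
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(rater_a)
    observed = Counter(zip(rater_a, rater_b))
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)
    num = den = 0.0
    for a in categories:
        for b in categories:
            w = abs(index[a] - index[b]) / (k - 1)  # linear disagreement weight
            num += w * observed[(a, b)] / n  # observed weighted disagreement
            den += w * (marg_a[a] / n) * (marg_b[b] / n)  # expected by chance
    if den == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - num / den
```

Passing a two-element category list such as `["UIP-like", "other"]` reduces the same function to unweighted Cohen's kappa, which suits the binary INBUILD grouping.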
Results
A total of 116 participants completed the study, including 20 ILD-trained radiologists. The majority opinion of ILD radiologists on each HRCT was used as the diagnostic reference standard. Among all participants excluding the ILD radiologists, SOFIA improved agreement for UIP probability scores (0.67 [IQR 0.57-0.73] vs 0.71 [IQR 0.65-0.76], p=2.1×10⁻⁵), guideline-based diagnoses (0.50 [IQR 0.43-0.54] vs 0.61 [IQR 0.56-0.66], p=2.8×10⁻¹⁶) and INBUILD categorisation (0.42 [IQR 0.35-0.47] vs 0.56 [IQR 0.49-0.62], p=7.1×10⁻¹⁹). Prognostic accuracy of UIP probability scores for mortality was good for radiologist scoring (n=116, C-index=0.60 [IQR 0.58-0.62]) and improved with the addition of SOFIA (C-index=0.63 [IQR 0.61-0.65], p=3.6×10⁻¹²).
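For readers unfamiliar with the C-index reported above, the sketch below shows one common definition, Harrell's concordance index, in plain Python. It is illustrative only: the abstract does not state which estimator or software was used, the function name is hypothetical, and pairs with tied follow-up times are simply skipped here.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the subject who died
    earlier also had the higher risk score (e.g. a UIP probability score).

    times       -- follow-up time per patient
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: patient i died before patient j's last follow-up.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of 0.60 and 0.63.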
Conclusion
In pulmonary fibrosis, DL support may improve the accuracy of HRCT diagnoses, provide prognostic information and facilitate screening in clinical trials.