TY - JOUR
T1 - Risk of mortality and cardiopulmonary arrest in critical patients presenting to the emergency department using machine learning and natural language processing
AU - Fernandes, Marta
AU - Mendes, Rúben
AU - Vieira, Susana M.
AU - Leite, Francisca
AU - Palos, Carlos
AU - Johnson, Alistair
AU - Finkelstein, Stan
AU - Horng, Steven
AU - Celi, Leo Anthony
N1 - Funding Information:
This work was supported by the Portuguese Foundation for Science & Technology (FCT) (URL 1), through IDMEC, under LAETA, project UIDB/50022/2020 and LISBOA-01-0145-FEDER-031474 supported by Programa Operacional Regional de Lisboa by FEDER (URL 2) and FCT. The work of Marta Fernandes was supported by the PhD Scholarship PD/BD/114150/2016 from FCT. URL 1: https://www.fct.pt/ URL 2: https://www.europarl.europa.eu/factsheets/pt/sheet/95/el-fondo-europeo-de-desarrollo-regionalfeder- The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors would like to acknowledge Hospital Beatriz Ângelo for having provided access to their databases for this study. There are no conflicts of interest.
Publisher Copyright:
© 2020 Fernandes et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2020/4/2
Y1 - 2020/4/2
N2 - Emergency department triage is the first point in time when a patient’s acuity level is determined. The time available to assign a priority at triage is short, and it is vital to stratify patients accurately at this stage, since under-triage can lead to increased morbidity, mortality and costs. Our aim was to present a model that can assist healthcare professionals in triage decision making, namely in stratifying patients by predicting the risk of a composite critical outcome: mortality and cardiopulmonary arrest. Our study cohort consisted of 235,826 adult patients triaged at a Portuguese emergency department from 2012 to 2016. Patients were assigned to the emergent, very urgent or urgent priorities of the Manchester Triage System (MTS). Demographics, clinical variables routinely collected at triage and the patients’ chief complaint were used as predictors. Logistic regression, random forest and extreme gradient boosting models were developed using all available variables. Term frequency–inverse document frequency (TF-IDF) weighting, a natural language processing technique, was applied to vectorize the chief complaint. Stratified random sampling was used to split the data into training (70%) and test (30%) sets. Ten-fold cross-validation was performed on the training set to optimize model hyper-parameters. The performance of the best model was compared against a reference model, a regularized logistic regression trained using only triage priorities. Extreme gradient boosting exhibited good calibration properties and yielded areas under the receiver operating characteristic and precision-recall curves of 0.96 (95% CI 0.95-0.97) and 0.31 (95% CI 0.26-0.36), respectively. The predictors ranked highest in importance by this model were the Glasgow coma score, the patients’ age, pulse oximetry and arrival mode. Compared to the reference, the extreme gradient boosting model using clinical variables and the chief complaint presented higher recall for patients assigned MTS-3 and can identify those who are at risk of the composite outcome.
UR - http://www.scopus.com/inward/record.url?scp=85082829842&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0230876
DO - 10.1371/journal.pone.0230876
M3 - Article
C2 - 32240233
AN - SCOPUS:85082829842
SN - 1932-6203
VL - 15
SP - 1
EP - 20
JO - PLoS ONE
JF - PLoS ONE
IS - 4
M1 - e0230876
ER -