TY - GEN
T1 - Multistage modeling for the classification of numerical and categorical datasets
AU - Salgado, Cátia M.
AU - Fernandes, Marta P.
AU - Horta, Alexandra
AU - Xavier, Miguel
AU - Sousa, João M.C.
AU - Vieira, Susana M.
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/8/23
Y1 - 2017/8/23
N2 - Logistic regression and Takagi-Sugeno fuzzy models are trained sequentially on categorical and numerical data in an ensemble-based multistage scheme. In the first stage, a logistic regression model transforms the binary feature space into a numerical feature, which is then used to train the second stage, an ensemble of two Takagi-Sugeno fuzzy models. One ensemble model is trained in the space of the numerical features and the first-stage prediction values. The other is trained in the space of the numerical variables, using only the samples that the first-stage model classified with a low degree of confidence. The final output is the average of the ensemble predictions at the second stage. This scheme was devised under the hypothesis that separating binary from numerical features in the modeling process would improve performance relative to a single model that uses both types of features together. The proposed multistage approach is applied to a clinical classification problem in a Portuguese hospital: predicting comanagement signalling from patient clinical data collected before surgery, including diagnosis, procedures, comorbidities and numerical scores. The multistage approach performed better on the comanagement dataset and on 2 of the 5 benchmark datasets.
UR - https://www.scopus.com/pages/publications/85030164707
U2 - 10.1109/FUZZ-IEEE.2017.8015665
DO - 10.1109/FUZZ-IEEE.2017.8015665
M3 - Conference contribution
AN - SCOPUS:85030164707
T3 - IEEE International Conference on Fuzzy Systems
BT - 2017 IEEE International Conference on Fuzzy Systems, FUZZ 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Fuzzy Systems, FUZZ 2017
Y2 - 9 July 2017 through 12 July 2017
ER -