TY - JOUR
T1 - Optimization approaches to supervised classification
AU - Silva, A. Pedro Duarte
N1 - Publisher Copyright:
© 2017 Elsevier B.V.
PY - 2017/9/1
Y1 - 2017/9/1
N2 - The Supervised Classification problem, one of the oldest and most recurrent problems in applied data analysis, has always been analyzed from many different perspectives. When the emphasis is placed on its overall goal of developing classification rules with minimal classification cost, Supervised Classification can be understood as an optimization problem. On the other hand, when the focus is on modeling the uncertainty involved in the classification of future unknown entities, it can be formulated as a statistical problem. Other perspectives that pay particular attention to the pattern recognition and machine learning aspects of Supervised Classification also have a long history that has led to influential insights and different methodologies. In this review, two approaches to Supervised Classification strongly related to optimization theory will be discussed and compared. In particular, we will review methodologies based on Mathematical Programming models that optimize observable criteria linked to the true objective of misclassification error (or cost) minimization, and approaches derived from the minimization of known bounds on the true misclassification error. The former approach is known as the Mathematical Programming approach to Supervised Classification, while the latter is at the origin of the well-known Classification Support Vector Machines. Throughout the review, two-group as well as general multi-group problems will be considered, and the review will conclude with a discussion of the most promising research directions in this area.
KW - Discriminant analysis
KW - Mathematical programming
KW - Multivariate statistics
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=85016750152&partnerID=8YFLogxK
U2 - 10.1016/j.ejor.2017.02.020
DO - 10.1016/j.ejor.2017.02.020
M3 - Article
AN - SCOPUS:85016750152
SN - 0377-2217
VL - 261
SP - 772
EP - 788
JO - European Journal of Operational Research
JF - European Journal of Operational Research
IS - 2
ER -