TY - JOUR
T1 - Discriminant analysis of interval data
T2 - an assessment of parametric and distance-based approaches
AU - Silva, A. Pedro Duarte
AU - Brito, Paula
N1 - Publisher Copyright: © 2015, Classification Society of North America. Copyright 2015 Elsevier B.V., All rights reserved.
PY - 2015/10/1
Y1 - 2015/10/1
N2 - Building on probabilistic models for interval-valued variables, parametric classification rules, based on Normal or Skew-Normal distributions, are derived for interval data. The performance of such rules is then compared with distance-based methods previously investigated. The results show that Gaussian parametric approaches outperform Skew-Normal parametric and distance-based ones in most conditions analyzed. In particular, with heteroscedastic data a quadratic Gaussian rule always performs best. Moreover, restricted cases of the variance-covariance matrix lead to parsimonious rules which for small training samples in heteroscedastic problems can outperform unrestricted quadratic rules, even in some cases where the model assumed by these rules is not true. These restrictions take into account the particular nature of interval data, where observations are defined by both MidPoints and Ranges, which may or may not be correlated. Under homoscedastic conditions linear Gaussian rules are often the best rules, but distance-based methods may perform better in very specific conditions.
AB - Building on probabilistic models for interval-valued variables, parametric classification rules, based on Normal or Skew-Normal distributions, are derived for interval data. The performance of such rules is then compared with distance-based methods previously investigated. The results show that Gaussian parametric approaches outperform Skew-Normal parametric and distance-based ones in most conditions analyzed. In particular, with heteroscedastic data a quadratic Gaussian rule always performs best. Moreover, restricted cases of the variance-covariance matrix lead to parsimonious rules which for small training samples in heteroscedastic problems can outperform unrestricted quadratic rules, even in some cases where the model assumed by these rules is not true. These restrictions take into account the particular nature of interval data, where observations are defined by both MidPoints and Ranges, which may or may not be correlated. Under homoscedastic conditions linear Gaussian rules are often the best rules, but distance-based methods may perform better in very specific conditions.
KW - Discriminant analysis
KW - Interval data
KW - Parametric modelling of interval data
KW - Symbolic data analysis
UR - http://www.scopus.com/inward/record.url?scp=84947129703&partnerID=8YFLogxK
U2 - 10.1007/s00357-015-9189-8
DO - 10.1007/s00357-015-9189-8
M3 - Article
AN - SCOPUS:84947129703
SN - 0176-4268
VL - 32
SP - 516
EP - 541
JO - Journal of Classification
JF - Journal of Classification
IS - 3
ER -