Measuring similarity of complex and heterogeneous data in clustering of large data sets

Helena Bacelar-Nicolau*, Fernando Nicolau, Áurea Sousa, Leonor Bacelar-Nicolau

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

10 Citations (Scopus)

Abstract

Cluster analysis, or classification, usually refers to a set of exploratory multivariate data analysis methods and techniques for finding a clustering structure in a dataset. That structure may consist either of groups of statistical data units or of groups of variables. In this work we deal with a generalization of this paradigm to the clustering of complex data described by three different types of variables, frequently present in a three-way context. We obtain compatible versions of the same affinity coefficient for measuring similarity between statistical data units described by those three types of variables. A global generalized similarity coefficient is analyzed for this kind of mixed data, which often arises in data mining or knowledge mining.

Original language: English
Pages (from-to): 9-18
Number of pages: 10
Journal: Biocybernetics and Biomedical Engineering
Volume: 29
Issue number: 2
Publication status: Published - 2009
Externally published: Yes

Keywords

  • Cluster analysis
  • Different type variables
  • Similarity coefficient
  • Three-way data
