Automatic speaker segmentation using multiple features and distance measures: a comparison of three approaches

Margarita Kotti*, Luís Gustavo P.M. Martins, Emmanouil Benetos, Jaime S. Cardoso, Constantine Kotropoulos

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Citations (Scopus)

Abstract

This paper addresses the problem of unsupervised speaker change detection. Three systems based on the Bayesian Information Criterion (BIC) are tested. The first system investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, applies dynamic thresholding followed by a fusion scheme, and finally applies the BIC. The second system operates in real time: it uses a metric-based approach employing line spectral pairs, with the BIC validating each potential speaker change point. The third system consists of three modules. In the first module, a measure based on second-order statistics is used; in the second module, the Euclidean distance and Hotelling's T² statistic are applied; and in the third module, the BIC is utilized. The experiments are carried out on a dataset created by concatenating speakers from the TIMIT database, which is referred to as the TIMIT data set. The performance of the three systems is compared using t-statistics.
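
The abstract does not state the exact BIC formulation used by the three systems. As a minimal sketch, assuming the common formulation in which each segment of feature frames is modelled by a single full-covariance Gaussian, the BIC test for a candidate speaker change point can be written as below. The function name `delta_bic`, the penalty weight `lam`, and the use of NumPy are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def delta_bic(x, split, lam=1.0):
    """Delta-BIC for splitting feature frames x (shape: frames x dims) at index `split`.

    Assumption: one full-covariance Gaussian per segment. A positive value
    favours two separate models, i.e. a speaker change at `split`.
    """
    n, d = x.shape
    left, right = x[:split], x[split:]

    def logdet_cov(z):
        # Maximum-likelihood covariance (bias=True); slogdet avoids overflow.
        cov = np.atleast_2d(np.cov(z, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Penalty for the extra parameters of the two-model hypothesis:
    # d mean values plus d(d+1)/2 covariance entries.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)

    return (0.5 * n * logdet_cov(x)
            - 0.5 * len(left) * logdet_cov(left)
            - 0.5 * len(right) * logdet_cov(right)
            - penalty)
```

In a typical pipeline of this kind, a cheaper distance measure first proposes candidate change points, and a test such as `delta_bic(frames, candidate) > 0` is then used to accept or reject each candidate.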
Original language: English
Title of host publication: 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
Pages: 1101-1104
Number of pages: 4
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Toronto, ON, Canada
Duration: 9 Jul 2006 → 12 Jul 2006

Publication series

Name: 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
Volume: 2006

Conference

Conference: 2006 IEEE International Conference on Multimedia and Expo, ICME 2006
Country/Territory: Canada
City: Toronto, ON
Period: 9/07/06 → 12/07/06
