TY - GEN
T1 - Automatic speaker segmentation using multiple features and distance measures
T2 - 2006 IEEE International Conference on Multimedia and Expo, ICME 2006
AU - Kotti, Margarita
AU - Martins, Luís Gustavo P.M.
AU - Benetos, Emmanouil
AU - Cardoso, Jaime S.
AU - Kotropoulos, Constantine
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2006
Y1 - 2006
AB - This paper addresses the problem of unsupervised speaker change detection. Three systems based on the Bayesian Information Criterion (BIC) are tested. The first system investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, implements dynamic thresholding followed by a fusion scheme, and finally applies the BIC. The second is a real-time method that follows a metric-based approach employing line spectral pairs, with the BIC used to validate each potential speaker change point. The third method consists of three modules: the first uses a measure based on second-order statistics, the second applies the Euclidean distance and the Hotelling T² statistic, and the third utilizes the BIC. The experiments are carried out on a dataset created by concatenating speakers from the TIMIT database, which is referred to as the TIMIT dataset. The performance of the three systems is compared using t-statistics.
UR - http://www.scopus.com/inward/record.url?scp=34247559206&partnerID=8YFLogxK
U2 - 10.1109/ICME.2006.262727
DO - 10.1109/ICME.2006.262727
M3 - Conference contribution
AN - SCOPUS:34247559206
SN - 1424403677
SN - 9781424403677
T3 - 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
SP - 1101
EP - 1104
BT - 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
Y2 - 9 July 2006 through 12 July 2006
ER -