TY - CONF
T1 - The C-ORAL-BRASIL I
T2 - 8th International Conference on Language Resources and Evaluation, LREC 2012
AU - Raso, Tommaso
AU - Mello, Heliana
AU - Mittmann, Maryualê M.
N1 - Funding Information:
The C-ORAL-BRASIL project was funded by the National Council for Scientific and Technological Development (CNPq), the Foundation for Research Support of Minas Gerais (Fapemig), the Faculty of Letters of the Federal University of Minas Gerais (Fale/UFMG) and by Santander Bank. We thank the coordinators of the C-ORAL-ROM project Massimo Moneglia and Emanuela Cresti for the constant help and support, as well as the members of the Linguistic Laboratory of the Italianistic Department of the University of Florence (LABLITA): Alessandro Panunzi, Ida Tucci, Lorenzo Gregori and Gloria Gagliardi.
PY - 2012
Y1 - 2012
N2 - C-ORAL-BRASIL I is a Brazilian Portuguese spontaneous speech corpus compiled following the same architecture adopted by the C-ORAL-ROM resource. The main goal is the documentation of the diaphasic and diastratic variations in Brazilian Portuguese. The diatopic variety represented is that of the metropolitan area of Belo Horizonte, capital city of Minas Gerais. Even though it was not a primary goal, a nice balance was achieved in terms of speakers' diastratic features (sex, age and school level). The corpus is entirely dedicated to informal spontaneous speech and comprises 139 informal speech texts, 208,130 words and 21:08:52 hours of recording, distributed into family/private (80%) and public (20%) contexts. The LR includes audio files, transcripts in text format and text-to-speech alignment (accessible with WinPitch Pro software). C-ORAL-BRASIL I also provides transcripts with Part-of-Speech annotation implemented through the parser system Palavras. Transcripts were validated regarding the proper application of transcription criteria and also for the annotation of prosodic boundaries. Some quantitative features of C-ORAL-BRASIL I in comparison with the informal C-ORAL-ROM are reported.
KW - Brazilian Portuguese
KW - C-ORAL-BRASIL
KW - Spontaneous speech
UR - http://www.scopus.com/inward/record.url?scp=85037378677&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85037378677
T3 - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
SP - 106
EP - 113
BT - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
A2 - Dogan, Mehmet Ugur
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Goggi, Sara
A2 - Choukri, Khalid
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Declerck, Thierry
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Mazo, Helene
A2 - Hamon, Olivier
PB - European Language Resources Association (ELRA)
Y2 - 21 May 2012 through 27 May 2012
ER -