Multilingual financial narrative processing: analyzing annual reports in English, Spanish, and Portuguese

Mahmoud El-Haj, Paul Rayson, Paulo Alves, Carlos Herrero-Zorita, Steven Young

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

14 Citations (Scopus)

Abstract

This chapter describes and evaluates the use of information extraction (IE) and natural language processing (NLP) methods for the extraction and analysis of financial annual reports in three languages: English, Spanish, and Portuguese. The work retains information on document structure, which is needed to distinguish clearly between the narrative and financial statement components of annual reports and between individual sections within the narrative component. Extraction accuracy varies between languages, with English exceeding 95%. We apply the extraction methods to a comprehensive sample of annual reports published by UK, Spanish, and Portuguese non-financial firms between 2003 and 2014.

Original language: English
Title of host publication: Multilingual text analysis
Subtitle of host publication: challenges, models, and approaches
Editors: Marina Litvak, Natalia Vanetik
Publisher: World Scientific Publishing Co.
Pages: 441-463
Number of pages: 23
ISBN (Electronic): 9789813274884
ISBN (Print): 9789813274877
Publication status: Published - 1 Jan 2019
