This chapter describes and evaluates the use of information extraction (IE) and natural language processing (NLP) methods for extracting and analysing financial annual reports in three languages: English, Spanish, and Portuguese. The methods described retain information on document structure, which is needed to distinguish clearly between the narrative and financial statement components of annual reports, and between individual sections within the narrative component. Extraction accuracy varies by language, with accuracy for English exceeding 95%. We apply the extraction methods to a comprehensive sample of annual reports published by UK, Spanish, and Portuguese non-financial firms between 2003 and 2014.
| Field | Value |
|---|---|
| Title of host publication | Multilingual text analysis |
| Subtitle of host publication | challenges, models, and approaches |
| Editors | Marina Litvak, Natalia Vanetik |
| Publisher | World Scientific Publishing Co. |
| Number of pages | 23 |
| Publication status | Published - 1 Jan 2019 |