Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata

Authors: Pawel Chrzaszcz
Publication date: 2016
DOI: 10.18653/v1/W16-1815

Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata is a scientific work related to Wikipedia quality, published in 2016 and written by Pawel Chrzaszcz.

Overview

Linguistic resources for Polish often lack multiword expressions (MWEs): idioms, compound nouns and other expressions that have a distinct meaning as a whole. This paper describes an effort to extract and recognize nominal MWEs in Polish text using Wikipedia, inflection dictionaries and finite-state automata. Wikipedia serves both as a lexicon of MWEs and as a corpus annotated with links to articles. The incoming links of each article are used to determine the inflection pattern of its headword, which helps eliminate invalid inflected forms. The goal is to recognize known MWEs and to find further expressions that share a similar grammatical structure and occur in similar contexts.
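
The general idea can be illustrated with a minimal sketch. It is not the author's implementation: it assumes that incoming links are available as (anchor text, target article) pairs, it treats each anchor text as an attested inflected form of the headword, and it uses a token-level trie as a simple stand-in for the finite-state automata mentioned above. The function names (collect_forms, build_trie, recognize) and the sample data are hypothetical, and the inflection dictionaries used in the paper are left out.

```python
from collections import defaultdict


def collect_forms(links):
    """Group attested anchor texts (as lowercase token tuples) by target article."""
    forms = defaultdict(set)
    for anchor, target in links:
        forms[target].add(tuple(anchor.lower().split()))
    return forms


def build_trie(forms):
    """Build a token-level trie, i.e. a small deterministic finite-state automaton.

    The key None marks an accepting state and stores the headword.
    """
    root = {}
    for headword, variants in forms.items():
        for tokens in variants:
            node = root
            for token in tokens:
                node = node.setdefault(token, {})
            node[None] = headword
    return root


def recognize(tokens, trie):
    """Scan left to right and return (start, end, headword) for the longest matches."""
    tokens = [t.lower() for t in tokens]
    matches = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None
        j = i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if None in node:
                best = (i, j, node[None])  # remember the longest match so far
        if best:
            matches.append(best)
            i = best[1]  # continue after the matched expression
        else:
            i += 1
    return matches


# Hypothetical sample data: anchor texts linking to the article "Czarna dziura".
sample_links = [
    ("czarna dziura", "Czarna dziura"),
    ("czarnej dziury", "Czarna dziura"),
    ("czarną dziurę", "Czarna dziura"),
]
sample_trie = build_trie(collect_forms(sample_links))
sentence = "Astronomowie zaobserwowali czarną dziurę w centrum galaktyki".split()
print(recognize(sentence, sample_trie))  # [(2, 4, 'Czarna dziura')]
```

Because only forms that actually occur as link anchors enter the automaton, inflected forms that nobody uses are never generated, which is in the spirit of using incoming links to rule out invalid forms. Finding previously unseen expressions with a similar structure would need additional machinery that this sketch does not attempt.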