Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata

From Wikipedia Quality

Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata is a scientific work related to Wikipedia quality, published in 2016 and written by Pawel Chrzaszcz.

Overview

Linguistic resources for Polish often lack multiword expressions (MWEs): idioms, compound nouns and other expressions whose meaning as a whole is distinct from that of their parts. This paper describes an effort to extract and recognize nominal MWEs in Polish text using Wikipedia, inflection dictionaries and finite-state automata. Wikipedia serves both as a lexicon of MWEs and as a corpus annotated with links to articles. The incoming links to each article are used to determine the inflection pattern of its headword, which helps eliminate invalid inflected forms. The goal is to recognize known MWEs and also to find new expressions that share a similar grammatical structure and occur in similar contexts.
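The pattern-selection and recognition steps described above can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes a hypothetical toy inflection table (two words, six singular cases) in place of a real Polish inflection dictionary, two hypothetical candidate patterns (`all_words`, `first_word_only`), and that the anchor texts of incoming links are what reveal a headword's inflected forms. A plain token-level trie stands in for the finite-state automaton. The pattern whose generated forms cover the most anchors is kept, so invalid forms such as "szkoły podstawowa" (produced only by the rejected pattern) never enter the automaton.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy inflection table standing in for a real Polish inflection
# dictionary: word -> {case: form}. Only six singular cases are modelled.
CASES = ("nom", "gen", "dat", "acc", "ins", "loc")
WORD_FORMS = {
    "szkoła": dict(zip(CASES, ["szkoła", "szkoły", "szkole", "szkołę", "szkołą", "szkole"])),
    "podstawowa": dict(zip(CASES, ["podstawowa", "podstawowej", "podstawowej",
                                   "podstawową", "podstawową", "podstawowej"])),
}

# Hypothetical candidate inflection patterns: which word positions change with case.
PATTERNS = {"all_words": {0, 1}, "first_word_only": {0}}


def generate_forms(headword: str, inflected: Set[int]) -> Set[str]:
    """Generate the inflected forms of a multiword headword under one pattern."""
    words = headword.lower().split()
    return {
        " ".join(WORD_FORMS[w][case] if i in inflected and w in WORD_FORMS else w
                 for i, w in enumerate(words))
        for case in CASES
    }


def choose_pattern(headword: str, anchors: List[str]) -> Tuple[str, Set[str]]:
    """Keep the pattern whose generated forms cover the most link anchor texts."""
    counts = Counter(a.lower() for a in anchors)

    def hits(name: str) -> int:
        forms = generate_forms(headword, PATTERNS[name])
        return sum(n for text, n in counts.items() if text in forms)

    best = max(PATTERNS, key=hits)  # on a tie, the first listed pattern wins
    return best, generate_forms(headword, PATTERNS[best])


@dataclass
class State:
    """One state of a token-level trie (an acyclic deterministic automaton)."""
    next: Dict[str, "State"] = field(default_factory=dict)
    label: Optional[str] = None  # set on accepting states: the MWE's base form


def build_automaton(lexicon: Dict[str, Set[str]]) -> State:
    """Insert every accepted inflected form, labelled with its base form."""
    root = State()
    for base, forms in lexicon.items():
        for form in forms:
            state = root
            for tok in form.split():
                state = state.next.setdefault(tok, State())
            state.label = base
    return root


def recognize(text: str, root: State) -> List[Tuple[int, int, str]]:
    """Left-to-right longest-match scan; returns (start, end, base form) token spans."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so ł, ą, ę match
    matches, i = [], 0
    while i < len(tokens):
        state, j, last = root, i, None
        while j < len(tokens) and tokens[j] in state.next:
            state = state.next[tokens[j]]
            j += 1
            if state.label is not None:
                last = (i, j, state.label)
        if last:
            matches.append(last)
            i = last[1]
        else:
            i += 1
    return matches


anchors = ["szkoła podstawowa", "Szkoła podstawowa", "szkoły podstawowej", "szkole podstawowej"]
pattern, forms = choose_pattern("szkoła podstawowa", anchors)
print(pattern)  # all_words
automaton = build_automaton({"szkoła podstawowa": forms})
print(recognize("Uczył się w szkole podstawowej w Krakowie.", automaton))
# [(3, 5, 'szkoła podstawowa')]
```

Because every accepting state carries the base form, an inflected occurrence in running text maps straight back to its lexicon entry. This sketch only recognizes known expressions; finding new MWEs with a similar grammatical structure and context, as the paper also aims to do, would need additional machinery not shown here.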