Difference between revisions of "Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata"
(+ links) |
(+ Infobox work) |
||
Line 1: | Line 1: | ||
+ | {{Infobox work | ||
+ | | title = Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata | ||
+ | | date = 2016 | ||
+ | | authors = [[Pawel Chrzaszcz]] | ||
+ | | doi = 10.18653/v1/W16-1815 | ||
+ | | link = https://aaltodoc.aalto.fi:443/handle/123456789/15381 | ||
+ | | plink = https://www.semanticscholar.org/paper/Extraction-and-Recognition-of-Polish-Multiword-and-Chrzaszcz/9f6f532cec52138f5606aaf971895716bf74a084 | ||
+ | }} | ||
'''Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata''' is a scientific work related to [[Wikipedia quality]], published in 2016 and written by [[Pawel Chrzaszcz]].
== Overview ==
Linguistic resources for Polish are often missing multiword expressions (MWEs) – idioms, compound nouns and other expressions which have their own distinct meaning as a whole. This paper describes an effort to extract and recognize nominal MWEs in Polish text using [[Wikipedia]], inflection dictionaries and finite-state automata. Wikipedia is used as a lexicon of MWEs and as a corpus annotated with links to articles. Incoming links for each article are used to determine the inflection pattern of the headword – this approach helps eliminate invalid inflected forms. The goal is to recognize known MWEs as well as to find more expressions sharing similar grammatical structure and occurring in similar contexts.
Revision as of 22:11, 30 June 2019
Authors | Pawel Chrzaszcz |
---|---|
Publication date | 2016 |
DOI | 10.18653/v1/W16-1815 |
Links | Original Preprint |
Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata is a scientific work related to Wikipedia quality, published in 2016 and written by Pawel Chrzaszcz.
Overview
Linguistic resources for Polish are often missing multiword expressions (MWEs) – idioms, compound nouns and other expressions which have their own distinct meaning as a whole. This paper describes an effort to extract and recognize nominal MWEs in Polish text using Wikipedia, inflection dictionaries and finite-state automata. Wikipedia is used as a lexicon of MWEs and as a corpus annotated with links to articles. Incoming links for each article are used to determine the inflection pattern of the headword – this approach helps eliminate invalid inflected forms. The goal is to recognize known MWEs as well as to find more expressions sharing similar grammatical structure and occurring in similar contexts.
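The idea of using incoming links to filter inflected forms can be illustrated with a small sketch. This is not the paper's implementation: the toy inflection table, the function names `candidate_forms` and `attested_forms`, and the sample link texts are all assumptions made for illustration. The sketch generates every word-by-word combination of inflected forms for a headword and keeps only those that actually occur as the text of an incoming link.

```python
from itertools import product

# Toy inflection dictionary (hypothetical data, not a real resource):
# base form -> a few of its inflected forms.
TOY_INFLECTIONS = {
    "czarna": {"czarna", "czarnej", "czarną"},
    "dziura": {"dziura", "dziury", "dziurę"},
}


def candidate_forms(headword):
    """Return all word-by-word combinations of inflected forms, valid or not."""
    per_word = [sorted(TOY_INFLECTIONS.get(word, {word})) for word in headword.split()]
    return {" ".join(combo) for combo in product(*per_word)}


def attested_forms(headword, link_anchors):
    """Keep only the candidates that occur as the text of an incoming link."""
    anchors = {anchor.lower() for anchor in link_anchors}
    return candidate_forms(headword) & anchors


# Hypothetical link texts pointing at the article "czarna dziura" (black hole).
anchors = [
    "czarna dziura",
    "czarnej dziury",
    "Czarną dziurę",
    "supermasywna czarna dziura",
]
forms = attested_forms("czarna dziura", anchors)
print(sorted(forms))
```

In this toy setup, ungrammatical combinations such as "czarna dziury" are generated as candidates but discarded because no link uses them. A real system would need a full inflection dictionary and would have to handle link texts that extend or abbreviate the headword.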