# Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata

| Authors | Pawel Chrzaszcz |
|---|---|
| Publication date | 2016 |
| DOI | 10.18653/v1/W16-1815 |


**Extraction and Recognition of Polish Multiword Expressions Using Wikipedia and Finite-State Automata** is a scientific work related to Wikipedia quality, published in 2016 and written by Pawel Chrzaszcz.

## Overview

Linguistic resources for Polish often lack multiword expressions (MWEs): idioms, compound nouns and other expressions that carry a distinct meaning as a whole. This paper describes an effort to extract and recognize nominal MWEs in Polish text using Wikipedia, inflection dictionaries and finite-state automata. Wikipedia serves both as a lexicon of MWEs and as a corpus annotated with links to articles. The incoming links for each article are used to determine the inflection pattern of its headword, an approach that helps eliminate invalid inflected forms. The goal is to recognize known MWEs and to find further expressions that share a similar grammatical structure and occur in similar contexts.
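The idea of learning inflection from incoming links and then matching text with an automaton can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that link anchor texts are already available as plain strings, it uses no inflection dictionary, and it stands in a simple token trie for the finite-state automaton. The function names `inflection_pattern`, `build_automaton` and `recognize`, as well as the sample anchors, are hypothetical.

```python
def inflection_pattern(headword, anchors):
    """Mark each headword token as inflected ('*') or fixed ('-').

    Only anchors with the same number of tokens as the headword are
    treated as inflected variants (a simplifying assumption).
    """
    head = tuple(headword.lower().split())
    variants = {head}
    for anchor in anchors:
        tokens = tuple(anchor.lower().split())
        if len(tokens) == len(head):
            variants.add(tokens)
    pattern = "".join(
        "*" if len({v[i] for v in variants}) > 1 else "-"
        for i in range(len(head))
    )
    return pattern, variants


def build_automaton(lexicon):
    """Build a token trie, i.e. a deterministic automaton over tokens.

    The key None marks an accepting state and stores the headword.
    """
    root = {}
    for headword, variants in lexicon.items():
        for variant in variants:
            state = root
            for token in variant:
                state = state.setdefault(token, {})
            state[None] = headword
    return root


def recognize(text, automaton):
    """Return (start, end, headword) for the longest match at each position."""
    tokens = text.lower().split()
    matches = []
    i = 0
    while i < len(tokens):
        state, best = automaton, None
        for j in range(i, len(tokens)):
            state = state.get(tokens[j])
            if state is None:
                break
            if None in state:
                best = (i, j + 1, state[None])
        if best:
            matches.append(best)
            i = best[1]
        else:
            i += 1
    return matches


# Hypothetical anchor texts of links pointing at two articles.
incoming_links = {
    "Ministerstwo Finansów": ["Ministerstwa Finansów", "Ministerstwie Finansów"],
    "czarna dziura": ["czarnej dziury", "czarną dziurę", "czarne dziury"],
}

lexicon = {}
for headword, anchors in incoming_links.items():
    pattern, variants = inflection_pattern(headword, anchors)
    lexicon[headword] = variants
    print(headword, pattern)

automaton = build_automaton(lexicon)
print(recognize("Budżet ministerstwa finansów opisuje czarną dziurę", automaton))
```

Because the trie stores only token sequences that were actually observed as anchors, a mixed form such as "czarną dziury" is not accepted. This loosely mirrors the role that incoming links play in filtering out invalid inflected forms, although a real system would generalize to unseen forms through an inflection dictionary rather than rely on observed anchors alone.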

