Extending a Multilingual Lexical Resource by Bootstrapping Named Entity Classification Using Wikipedia's Category System

"Extending a Multilingual Lexical Resource by Bootstrapping Named Entity Classification Using Wikipedia's Category System" is a scientific work related to Wikipedia quality, published in 2011.

Overview

Named Entity Recognition and Classification (NERC) is a well-studied NLP task. It is typically approached with machine learning algorithms that rely on training data, and creating that data is usually expensive. As a result of these high costs, NERC training data is lacking for many languages. Wentland et al. (2008) presented an approach to creating a multilingual named entity (NE) corpus. The resulting resource, called HeiNER, describes a considerable number of NEs but does not include their types. The authors present a bootstrapping approach based on Wikipedia's category system to classify the NEs contained in HeiNER. The approach is able to classify more than two million named entities, which improves the quality of the resource.
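
The overview does not spell out the algorithm, so the following is only a minimal sketch of what category-based bootstrapping can look like in general, not the authors' actual method. It assumes a small hand-labelled seed set, a mapping from each entity to its Wikipedia categories, and a simple majority-vote rule. The function name `bootstrap_classify`, the thresholds `min_votes` and `min_share`, the type labels `PER` and `LOC`, and the toy data are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def bootstrap_classify(entity_categories, seeds, min_votes=2, min_share=0.8,
                       max_rounds=10):
    """Propagate NE types from seed entities to others via shared categories.

    entity_categories: dict mapping entity name -> list of category names
    seeds: dict mapping entity name -> NE type (hand-labelled, assumed correct)
    """
    labels = dict(seeds)
    for _ in range(max_rounds):
        # Count which types the already-labelled members of each category have.
        cat_votes = defaultdict(Counter)
        for entity, ne_type in labels.items():
            for cat in entity_categories.get(entity, ()):
                cat_votes[cat][ne_type] += 1

        # Trust a category only if enough labelled members agree on one type.
        cat_type = {}
        for cat, votes in cat_votes.items():
            ne_type, count = votes.most_common(1)[0]
            total = sum(votes.values())
            if total >= min_votes and count / total >= min_share:
                cat_type[cat] = ne_type

        # Label unlabelled entities whose trusted categories have a clear winner.
        new_labels = {}
        for entity, cats in entity_categories.items():
            if entity in labels:
                continue
            votes = Counter(cat_type[c] for c in cats if c in cat_type)
            ranked = votes.most_common(2)
            if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
                new_labels[entity] = ranked[0][0]

        if not new_labels:
            break  # nothing new was learned, so further rounds cannot help
        labels.update(new_labels)
    return labels


# Toy data (hypothetical): entity -> Wikipedia categories.
entity_categories = {
    "Angela Merkel": ["German politicians", "1954 births"],
    "Helmut Kohl": ["German politicians", "1930 births"],
    "Willy Brandt": ["German politicians", "1913 births"],
    "Heidelberg": ["Cities in Baden-Württemberg", "University towns in Germany"],
    "Mannheim": ["Cities in Baden-Württemberg"],
    "Karlsruhe": ["Cities in Baden-Württemberg", "University towns in Germany"],
    "Tübingen": ["University towns in Germany"],
}
seeds = {
    "Angela Merkel": "PER",
    "Helmut Kohl": "PER",
    "Heidelberg": "LOC",
    "Mannheim": "LOC",
}

result = bootstrap_classify(entity_categories, seeds)
for entity in sorted(result):
    print(entity, result[entity])
```

In this toy run, "Tübingen" can only be labelled in the second round: its single category becomes trustworthy once "Karlsruhe" has been labelled in the first round. That dependence on earlier rounds is what makes the procedure a bootstrap rather than a single lookup.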
