Named Entity Corpus Construction Using Wikipedia and Dbpedia Ontology

From Wikipedia Quality
Revision as of 09:48, 30 August 2020 by Autumn (talk | contribs) (+ wikilinks)

Named Entity Corpus Construction Using Wikipedia and Dbpedia Ontology is a scientific work related to Wikipedia quality, published in 2014 and written by YoungGyun Hahm, Jungyeul Park, Kyungtae Lim, Youngsik Kim, Dosam Hwang and Key-Sun Choi.

Overview

In this paper, the authors propose a novel method to automatically build a named entity (NE) corpus based on the DBpedia ontology. Most named entity recognition (NER) systems require training data produced by time-consuming and labor-intensive annotation, so work on NER has thus far been limited to certain languages, such as English, that are resource-abundant in general. As an alternative, the authors suggest that the NE corpus generated by the proposed method can be used as training data. Their approach uses Wikipedia as raw text and the DBpedia data set for named entity disambiguation. The method is language-independent and can easily be applied to the many languages for which Wikipedia and DBpedia are available. Throughout the paper, the authors demonstrate that the generated NE corpus is of comparable quality even to a manually annotated NE corpus.
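
The overview does not spell out the pipeline, so the following is only a minimal sketch of the general idea, assuming that wikilinks in article text serve as entity mentions and that each link target can be looked up in a table of DBpedia ontology classes. The `DBPEDIA_TYPES` table, the `annotate` function, and the BIO output format are hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical sample of a lookup from Wikipedia article title to a
# DBpedia ontology class. A real table would come from DBpedia data;
# these three entries are illustrative only.
DBPEDIA_TYPES = {
    "Barack Obama": "Person",
    "Hawaii": "Place",
    "Harvard University": "Organisation",
}

# Matches [[Target]] and [[Target|surface text]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def annotate(wikitext):
    """Return (token, BIO tag) pairs, treating wikilinks as entity mentions."""
    tagged = []
    pos = 0
    for match in WIKILINK.finditer(wikitext):
        # Text between links lies outside any entity.
        tagged += [(tok, "O") for tok in wikitext[pos:match.start()].split()]
        target = match.group(1).strip()
        surface = (match.group(2) or target).strip()
        # The link target, not the surface form, identifies the entity.
        ne_type = DBPEDIA_TYPES.get(target)
        for i, tok in enumerate(surface.split()):
            if ne_type is None:
                tagged.append((tok, "O"))  # link target has no known type
            else:
                prefix = "B-" if i == 0 else "I-"
                tagged.append((tok, prefix + ne_type))
        pos = match.end()
    # Trailing text after the last link.
    tagged += [(tok, "O") for tok in wikitext[pos:].split()]
    return tagged


if __name__ == "__main__":
    for token, tag in annotate("[[Barack Obama|Obama]] was born in [[Hawaii]] ."):
        print(token, tag)
```

Because the lookup is keyed on the link target rather than the surface string, an ambiguous mention such as "Obama" is resolved by the page it links to, which is one way to read the disambiguation role the overview assigns to DBpedia. Nothing in the sketch depends on English, so the same code would apply to any language edition given a matching type table.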