Creating an Extended Named Entity Dictionary from Wikipedia



Authors: Ryuichiro Higashinaka, Kugatsu Sadamitsu, Kuniko Saito, Toshiro Makino, Yoshihiro Matsuo
Publication date: 2012

Creating an Extended Named Entity Dictionary from Wikipedia is a scientific work related to Wikipedia quality, published in 2012 and written by Ryuichiro Higashinaka, Kugatsu Sadamitsu, Kuniko Saito, Toshiro Makino and Yoshihiro Matsuo.

Overview

Automatic methods for creating entity dictionaries, or gazetteers, have used only a small number of entity types (18 at most), which limits their use for fine-grained information extraction. This paper aims to create a dictionary covering 200 extended named entity (ENE) types. Using Wikipedia as the basic resource, the authors classify Wikipedia titles into ENE types to build the dictionary. In their method, they derive a large number of features for each Wikipedia title and train a multiclass classifier by supervised learning. The feature list is extensive and designed for accurate classification into ENE types: it draws on the surface string of a title, the content of the article, and the metadata provided with Wikipedia, such as categories. In experiments, the authors show that Wikipedia titles can be classified into ENE types with 79.63% accuracy. They then applied the classifier to all Wikipedia titles and, by discarding low-confidence classification results, created an ENE dictionary of over one million entities covering 182 ENE types, with an estimated accuracy of 89.48%. The authors describe this as the first large-scale ENE dictionary.

The paper also has a Japanese title, Wikipediaを用いた拡張固有表現辞書の構築 ("Construction of an Extended Named Entity Dictionary Using Wikipedia"), and a Japanese abstract that states the same content as the English one.
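The pipeline in the overview (features from the title string, the article text and the metadata, then a supervised multiclass classifier, then a confidence cut-off) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the classifier, so a small naive Bayes model stands in for it, and the feature names, the labels "City" and "Person", the toy pages and the 0.9 threshold are all hypothetical rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict

def extract_features(title, first_sentence, categories):
    """Toy features from the three sources named in the overview (hypothetical names)."""
    feats = ["title_last_word=" + title.lower().split()[-1]]              # surface string
    feats += ["body_word=" + w for w in first_sentence.lower().split()]   # article content
    feats += ["category=" + c.lower() for c in categories]                # metadata
    return feats

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, examples):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in examples:
            self.label_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((self.feat_counts[label][f] + 1) / denom)
            log_scores[label] = score
        # Normalise the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

def build_dictionary(model, pages, threshold):
    """Keep a title only if the best label's probability reaches the threshold."""
    dictionary = {}
    for title, sentence, categories in pages:
        probs = model.predict_proba(extract_features(title, sentence, categories))
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            dictionary[title] = label
    return dictionary

# Hypothetical labelled pages: (title, first sentence, categories), label.
training = [
    (("Kyoto", "Kyoto is a city in Japan", ["Cities in Japan"]), "City"),
    (("Osaka", "Osaka is a city in Japan", ["Cities in Japan"]), "City"),
    (("Natsume Soseki", "Natsume Soseki was a Japanese novelist", ["Japanese novelists"]), "Person"),
    (("Yukio Mishima", "Yukio Mishima was a Japanese novelist", ["Japanese novelists"]), "Person"),
]
model = NaiveBayes().fit([(extract_features(*page), label) for page, label in training])

unlabelled = [
    ("Nagoya", "Nagoya is a city in Japan", ["Cities in Japan"]),
    ("Blue", "Blue is a colour", []),
]
ene_dictionary = build_dictionary(model, unlabelled, threshold=0.9)
print(ene_dictionary)  # {'Nagoya': 'City'}
```

In this toy run "Blue" shares only function words with the training pages, so its best label scores 0.75 and the entry is dropped. This mirrors the trade-off the overview reports, where discarding low-confidence results gives up coverage in exchange for higher estimated accuracy.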
