Difference between revisions of "Chinese Named Entity Recognition and Disambiguation based on Wikipedia"

From Wikipedia Quality
(Wikilinks)
(Adding infobox)
Line 1: Line 1:
+ {{Infobox work
+ | title = Chinese Named Entity Recognition and Disambiguation based on Wikipedia
+ | date = 2012
+ | authors = [[Yu Miao]]<br />[[Lv Yajuan]]<br />[[Liu Qun]]<br />[[Su Jinsong]]<br />[[Xiong Hao]]
+ | doi = 10.1007/978-3-642-34456-5_25
+ | link = https://link.springer.com/chapter/10.1007%2F978-3-642-34456-5_25
+ }}
 
'''Chinese Named Entity Recognition and Disambiguation based on Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2012 and written by [[Yu Miao]], [[Lv Yajuan]], [[Liu Qun]], [[Su Jinsong]] and [[Xiong Hao]].
 
  
 
== Overview ==
 
 
This paper presents a method for [[named entity recognition]] and disambiguation based on [[Wikipedia]]. First, the authors build a Wikipedia database using JWPL, an [[open source]] toolkit. Second, they extract the definition term from the first sentence of a Wikipedia page and use it as external knowledge in [[named entity]] recognition. Finally, they perform named entity disambiguation using Wikipedia disambiguation pages and contextual information. The experiments show that Wikipedia [[features]] can improve the accuracy of named [[entity recognition]].

With the rapid development of information technology and the Internet, a large amount of new information has emerged, producing an information explosion. Against this background, many information processing technologies have appeared, such as [[information retrieval]], [[information extraction]], data mining and [[machine translation]]. Named entities are the main carriers of information and express the main content of a text, so they are a very important part of this research. Research on named entity recognition is therefore of strategic significance to language understanding and information processing.

Existing methods for named entity recognition fall into three types. The first is the rule-based method: it performs well, but writing rules is time-consuming and labor-intensive, and the method adapts poorly to new domains. The second is the statistics-based method: it learns models without human intervention, but it is constrained by the limited scale of the training corpus. The third combines the rule-based and statistics-based methods, aiming to reduce the complexity and blindness of the rule-based method.

In recent years, a large number of new words have been emerging, most of them named entities such as person names, location names and organization names. Because these entities are updated ever faster and their scale keeps expanding, traditional rule-based or statistics-based methods cannot meet the needs of named entity recognition and translation tasks. In this work, the authors study named entity recognition based on network resources in order to improve performance on these tasks.
 

Revision as of 10:22, 14 July 2019


Chinese Named Entity Recognition and Disambiguation based on Wikipedia
Authors
Yu Miao
Lv Yajuan
Liu Qun
Su Jinsong
Xiong Hao
Publication date
2012
DOI
10.1007/978-3-642-34456-5_25
Links
Original

Chinese Named Entity Recognition and Disambiguation based on Wikipedia is a scientific work related to Wikipedia quality, published in 2012 and written by Yu Miao, Lv Yajuan, Liu Qun, Su Jinsong and Xiong Hao.

Overview

This paper presents a method for named entity recognition and disambiguation based on Wikipedia. First, the authors build a Wikipedia database using JWPL, an open source toolkit. Second, they extract the definition term from the first sentence of a Wikipedia page and use it as external knowledge in named entity recognition. Finally, they perform named entity disambiguation using Wikipedia disambiguation pages and contextual information. The experiments show that Wikipedia features can improve the accuracy of named entity recognition.

With the rapid development of information technology and the Internet, a large amount of new information has emerged, producing an information explosion. Against this background, many information processing technologies have appeared, such as information retrieval, information extraction, data mining and machine translation. Named entities are the main carriers of information and express the main content of a text, so they are a very important part of this research. Research on named entity recognition is therefore of strategic significance to language understanding and information processing.

Existing methods for named entity recognition fall into three types. The first is the rule-based method: it performs well, but writing rules is time-consuming and labor-intensive, and the method adapts poorly to new domains. The second is the statistics-based method: it learns models without human intervention, but it is constrained by the limited scale of the training corpus. The third combines the rule-based and statistics-based methods, aiming to reduce the complexity and blindness of the rule-based method.

In recent years, a large number of new words have been emerging, most of them named entities such as person names, location names and organization names. Because these entities are updated ever faster and their scale keeps expanding, traditional rule-based or statistics-based methods cannot meet the needs of named entity recognition and translation tasks. In this work, the authors study named entity recognition based on network resources in order to improve performance on these tasks.
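
The second and third steps of the method (taking a definition term from a page's first sentence, then choosing among the senses listed on a disambiguation page by context) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `definition_term` and `disambiguate` are hypothetical, the "是 ... 的 ..." pattern is a toy heuristic, the bag-of-words overlap score is a simple stand-in rather than the scoring the paper uses, and the input is assumed to be already word-segmented. It does not call JWPL, which is a Java library.

```python
import re
from typing import Dict, List, Optional

# Hypothetical helpers for illustration only; not the paper's implementation.


def definition_term(first_sentence: str) -> Optional[str]:
    """Toy heuristic for a sentence shaped like 'X是...的Y。': return Y."""
    match = re.search(r"是(.+?)(?:[，。；,.;]|$)", first_sentence)
    if match is None:
        return None
    predicate = match.group(1)
    # Keep what follows the last 的 as a rough head noun.
    return predicate.rsplit("的", 1)[-1] or None


def disambiguate(context_tokens: List[str],
                 candidates: Dict[str, List[str]]) -> Optional[str]:
    """Pick the candidate sense whose description shares the most tokens
    with the context of the mention (ties go to the first candidate)."""
    context = set(context_tokens)
    best_title, best_score = None, -1
    for title, sense_tokens in candidates.items():
        score = len(context & set(sense_tokens))
        if score > best_score:
            best_title, best_score = title, score
    return best_title


# Made-up example data: two senses of the mention 苹果.
senses = {
    "苹果 (水果)": ["水果", "果树", "食用"],
    "苹果公司": ["公司", "手机", "电脑", "科技"],
}
context = ["苹果", "发布", "新", "手机"]

print(definition_term("北京是中华人民共和国的首都。"))  # 首都
print(disambiguate(context, senses))  # 苹果公司
```

In this sketch the definition term would act as an extra feature for the recognizer (for example, signalling that a page describes a place), while the overlap score only ranks senses that a disambiguation page already lists; a real system would need proper Chinese word segmentation and a more robust similarity measure.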