Difference between revisions of "Annotating Wikipedia Articles with Semantic Tags for Structured Retrieval"

From Wikipedia Quality
(Links)
(+ infobox)
Line 1: Line 1:
+ {{Infobox work
+ | title = Annotating Wikipedia Articles with Semantic Tags for Structured Retrieval
+ | date = 2009
+ | authors = [[Saravadee Sae Tan]]<br />[[Tang Enya Kong]]<br />[[Gian Chand Sodhy]]
+ | doi = 10.1145/1651437.1651441
+ | link = http://dl.acm.org/citation.cfm?id=1651437.1651441
+ }}
 
'''Annotating Wikipedia Articles with Semantic Tags for Structured Retrieval''' is a scientific work related to [[Wikipedia quality]], published in 2009 and written by [[Saravadee Sae Tan]], [[Tang Enya Kong]] and [[Gian Chand Sodhy]].
 
  
 
== Overview ==
 
 
Structured retrieval exploits the structural information of documents during search, using both their content and their structure to improve [[information retrieval]]. The availability of semantic structure in documents is therefore an important factor in its success. However, the majority of documents on the Web still lack semantically rich structure. This motivates the authors to automatically identify the [[semantic information]] in web documents and explicitly annotate it with semantic tags. Based on the well-known [[Wikipedia]] corpus, the paper describes an unsupervised learning approach to identify the conceptual and descriptive information of an entity described in a Wikipedia article. The authors' approach uses Wikipedia link structure and Infobox information to learn the semantic structure of Wikipedia articles. The authors also describe a lazy approach used in the learning process: by utilizing the [[Wikipedia categories]] provided by contributors, only a subset of the entities in a Wikipedia category is used as training data, and the results can be applied to the rest of the entities in that category.
 

Revision as of 09:52, 19 January 2021


Annotating Wikipedia Articles with Semantic Tags for Structured Retrieval
Authors: Saravadee Sae Tan, Tang Enya Kong, Gian Chand Sodhy
Publication date: 2009
DOI: 10.1145/1651437.1651441
Links: Original

Annotating Wikipedia Articles with Semantic Tags for Structured Retrieval is a scientific work related to Wikipedia quality, published in 2009 and written by Saravadee Sae Tan, Tang Enya Kong and Gian Chand Sodhy.

Overview

Structured retrieval exploits the structural information of documents during search, using both their content and their structure to improve information retrieval. The availability of semantic structure in documents is therefore an important factor in its success. However, the majority of documents on the Web still lack semantically rich structure. This motivates the authors to automatically identify the semantic information in web documents and explicitly annotate it with semantic tags. Based on the well-known Wikipedia corpus, the paper describes an unsupervised learning approach to identify the conceptual and descriptive information of an entity described in a Wikipedia article. The authors' approach uses Wikipedia link structure and Infobox information to learn the semantic structure of Wikipedia articles. The authors also describe a lazy approach used in the learning process: by utilizing the Wikipedia categories provided by contributors, only a subset of the entities in a Wikipedia category is used as training data, and the results can be applied to the rest of the entities in that category.
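
The idea of learning from a subset of a category and applying the result to the remaining articles can be illustrated with a toy sketch. This is not the authors' algorithm, whose details are not given here. It assumes a deliberately simple stand-in: linked values in the infoboxes of a few training articles vote for the attribute they fill, and links in other articles of the same category are then wrapped in a tag named after that attribute. The function names `learn_value_tags` and `annotate`, the article dictionary layout, and the sample data are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches [[Target]] and [[Target|label]]; group 1 is the link target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def learn_value_tags(training_articles):
    """Map each linked infobox value to the attribute it most often fills.

    Assumption: each article is a dict with an "infobox" dict whose values
    are wikitext strings. Only the training subset of a category is passed in.
    """
    votes = defaultdict(Counter)
    for article in training_articles:
        for attribute, value in article["infobox"].items():
            for target in LINK.findall(value):
                votes[target.strip()][attribute] += 1
    # Keep the most frequent attribute for each link target.
    return {target: counter.most_common(1)[0][0] for target, counter in votes.items()}


def annotate(wikitext, value_tags):
    """Wrap links with a learned attribute in a tag named after that attribute."""
    def wrap(match):
        tag = value_tags.get(match.group(1).strip())
        if tag is None:
            return match.group(0)  # unknown link target: leave unchanged
        return f"<{tag}>{match.group(0)}</{tag}>"

    return LINK.sub(wrap, wikitext)


# Hypothetical training subset drawn from one category.
training = [
    {"infobox": {"country": "[[Malaysia]]", "type": "[[Public university]]"}},
    {"infobox": {"country": "[[Malaysia]]"}},
]
tags = learn_value_tags(training)

# Apply the learned mapping to another article of the same category.
print(annotate("A college in [[Malaysia]] near [[Penang]].", tags))
# prints: A college in <country>[[Malaysia]]</country> near [[Penang]].
```

Only the training articles need an infobox in this sketch; the article being annotated needs only its link markup, which loosely mirrors the point that results learned from part of a category can be reused for the rest.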