Improved Text Annotation with Wikipedia Entities

From Wikipedia Quality
Revision as of 20:15, 25 May 2019 by Olivia (talk | contribs) (Creating a new page - Improved Text Annotation with Wikipedia Entities)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Improved Text Annotation with Wikipedia Entities - scientific work related to Wikipedia quality published in 2013, written by Christos Makris, Yannis Plegas and Evangelos Theodoridis.

Overview

Text annotation is the procedure of initially identifying, in a segment of text, a set of dominant in meaning words and later on attaching to them extra information (usually drawn from a concept ontology, implemented as a catalog) that expresses their conceptual content in the current context. Attaching additional semantic information and structure helps to represent, in a machine interpretable way, the topic of the text and is a fundamental preprocessing step to many Information Retrieval tasks like indexing, clustering, classification, text summarization and cross-referencing content on web pages, posts, tweets etc. In this paper, authors deal with automatic annotation of text documents with entities of Wikipedia, the largest online knowledge base; a process that is commonly known as Wikification . Moving similarly to previous approaches the cross-reference of words in the text to Wikipedia articles is based on local compatibility between the text around the term and textual information embedded in the article. The main contribution of this paper is a set of disambiguation techniques that enhance previously published approaches by employing both the WordNet lexical database and the Wikipedia article's PageRank scores in the disambiguation process. The experimental evaluation performed depicts that the exploitation of these additional semantic information sources leads to more accurate Text Annotation.