Extracting World and Linguistic Knowledge from Wikipedia

Extracting World and Linguistic Knowledge from Wikipedia is a scientific work related to Wikipedia quality, published in 2009 and written by Simone Paolo Ponzetto and Michael Strube.


Many research efforts have been devoted to developing robust statistical modeling techniques for many NLP tasks. The field is now moving towards more complex tasks (e.g. recognizing textual entailment (RTE) and question answering (QA)), which require complementing these methods with a semantically rich representation based on world and linguistic knowledge (i.e. annotated linguistic data). In this tutorial the authors show several approaches to extracting this knowledge from Wikipedia. Wikipedia has attracted much attention in the AI community, mainly because it provides semi-structured information and a large amount of manual annotation. The purpose of the tutorial is to introduce Wikipedia as a resource to the NLP community and to give NLP researchers an introduction from both a scientific and a practical (i.e. data acquisition and processing) perspective.