DBpedia and the Live Extraction of Structured Data from Wikipedia

DBpedia and the Live Extraction of Structured Data from Wikipedia - scientific work related to Wikipedia quality published in 2012, written by Mohamed Morsey, Jens Lehmann, Sören Auer, Claus Stadler and Sebastian Hellmann.

Overview

Purpose – DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.

Design/methodology/approach – Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files, in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings – During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently-upd...
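
To illustrate the synchronization mechanism described above - publishing newly added and deleted triples in files so that mirrors can stay in sync with the main DBpedia endpoint - here is a minimal sketch of how a mirror might apply one such pair of changeset files via the SPARQL 1.1 Update protocol. The file names and the local endpoint URL are assumptions made for the example, not the official DBpedia-Live layout or tooling.

```python
# Illustrative sketch only: apply a pair of DBpedia-Live-style changeset
# files (added triples and removed triples, in N-Triples form) to a local
# mirror through a SPARQL 1.1 Update endpoint. The endpoint URL and file
# names are assumed for the example.
import requests

UPDATE_ENDPOINT = "http://localhost:8890/sparql"  # assumed local mirror endpoint


def load_ntriples(path):
    """Read an N-Triples file and return its non-empty, non-comment lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("#")]


def apply_changeset(added_path, removed_path):
    """Delete the removed triples first, then insert the added ones."""
    removed = load_ntriples(removed_path)
    added = load_ntriples(added_path)

    if removed:
        # Each N-Triples line already ends with " .", so the lines can be
        # embedded directly in a DELETE DATA block.
        delete_query = "DELETE DATA { %s }" % "\n".join(removed)
        requests.post(UPDATE_ENDPOINT,
                      data={"update": delete_query}).raise_for_status()
    if added:
        insert_query = "INSERT DATA { %s }" % "\n".join(added)
        requests.post(UPDATE_ENDPOINT,
                      data={"update": insert_query}).raise_for_status()


if __name__ == "__main__":
    # Hypothetical file names: one file with triples added by the live
    # extraction, one with triples deleted from the endpoint.
    apply_changeset("000001.added.nt", "000001.removed.nt")
```

Applying deletions before insertions mirrors the intent of the published changesets: a mirror that replays them in order ends up with the same triples as the live endpoint, without re-importing a full dump.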