Modelling Provenance of Dbpedia Resources Using Wikipedia Contributions

From Wikipedia Quality


Authors: Fabrizio Orlandi, Alexandre Passant
Publication date: 2011
DOI: 10.1016/j.websem.2011.03.002
Links: Original

Modelling Provenance of Dbpedia Resources Using Wikipedia Contributions is a scientific work related to Wikipedia quality, published in 2011 and written by Fabrizio Orlandi and Alexandre Passant.

Overview

Abstract: DBpedia is one of the largest datasets in the Linked Open Data cloud. Its centrality and cross-domain nature make it one of the most important and most referenced knowledge bases on the Web of Data, and it is generally used as a reference for data interlinking. Yet, despite its authoritative status, no prior work has tackled the provenance of DBpedia statements. Because DBpedia is extracted from Wikipedia, an open and collaborative encyclopedia, delivering provenance information about its statements would help ensure the trustworthiness of its data, a major need for people building applications on DBpedia data. To address this problem, the authors propose an approach for modelling and managing provenance on DBpedia using Wikipedia edits, and for making this information available on the Web of Data. The paper describes the framework the authors implemented to do so, consisting of (1) a lightweight modelling solution to semantically represent the provenance of both DBpedia resources and Wikipedia content, along with mappings to popular ontologies such as the W7 (what, when, where, how, who, which, and why) and OPM (Open Provenance Model) models; (2) an information extraction process and a provenance-computation system combining the history of Wikipedia articles with DBpedia information; and (3) a set of scripts that make provenance information about DBpedia statements directly available when browsing this source, and that expose it publicly in RDF so that software agents can consume it.