Extracting PROV Provenance Traces from Wikipedia History Pages


Extracting PROV Provenance Traces from Wikipedia History Pages is a scientific work related to Wikipedia quality, published in 2013 and written by Paolo Missier and Ziyu Chen.

Overview

Wikipedia history pages contain provenance metadata that describes the revision history of each Wikipedia article. The authors developed a simple extractor that, starting from a user-specified article page, crawls the graph of its associated history pages and encodes the essential elements of those pages according to the PROV data model. The crawl runs against the live pages through the Wikipedia REST interface. The resulting PROV provenance graphs are stored in a graph database (Neo4J), where they can be queried with the Cypher graph query language (proprietary to Neo4J) or traversed programmatically with the Neo4J Java Traversal API.
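
The encoding step can be illustrated with a minimal Python sketch. It is not the authors' extractor: the `Revision` record, the `wiki:` prefix, and the choice of PROV relations are assumptions made for illustration, and the field names (`revid`, `parentid`, `user`, `timestamp`) simply mirror those commonly returned for revisions by the MediaWiki API. The sketch works on revision records already in memory and emits PROV-N-style statements, so crawling and storage in Neo4J are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Revision:
    """One row of an article's history (hypothetical record type)."""
    revid: int
    parentid: int   # 0 means "no parent revision"
    user: str
    timestamp: str  # ISO 8601, e.g. "2013-01-05T10:00:00Z"


def revisions_to_provn(revisions: Iterable[Revision]) -> List[str]:
    """Encode revisions as PROV-N-style statements (illustrative mapping).

    Assumed mapping: each revision is an entity, each edit an activity,
    each editor an agent. User names are only lightly sanitised here.
    """
    statements: List[str] = []
    seen_agents = set()
    for rev in revisions:
        entity = f"wiki:rev{rev.revid}"
        activity = f"wiki:edit{rev.revid}"
        agent = f"wiki:user_{rev.user.replace(' ', '_')}"

        statements.append(f"entity({entity})")
        statements.append(f"activity({activity})")
        if agent not in seen_agents:
            # Declare each editor once, even if they made several edits.
            seen_agents.add(agent)
            statements.append(f"agent({agent})")
        statements.append(
            f"wasGeneratedBy({entity}, {activity}, {rev.timestamp})"
        )
        statements.append(f"wasAssociatedWith({activity}, {agent}, -)")
        if rev.parentid:
            # Link the new revision to the one it was edited from.
            parent = f"wiki:rev{rev.parentid}"
            statements.append(f"used({activity}, {parent}, -)")
            statements.append(f"wasDerivedFrom({entity}, {parent})")
    return statements


if __name__ == "__main__":
    history = [
        Revision(1, 0, "Alice", "2013-01-05T10:00:00Z"),
        Revision(2, 1, "Bob Smith", "2013-01-06T11:30:00Z"),
    ]
    for line in revisions_to_provn(history):
        print(line)
```

Keeping the encoder as a pure function over revision records separates it from the crawl, so the same mapping could be fed from live API responses or from stored test data.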