From French Wikipedia to Erudit: a Test Case for Cross-Domain Open Information Extraction

Revision as of 09:51, 5 December 2019 by Emma (talk | contribs) (Basic information on From French Wikipedia to Erudit: a Test Case for Cross-Domain Open Information Extraction)

From French Wikipedia to Erudit: a Test Case for Cross-Domain Open Information Extraction is a scientific work related to Wikipedia quality, published in 2018 and written by Fabrizio Gotti and Philippe Langlais.

Overview

In this paper, the authors describe an open information extraction pipeline based on ReVerb for extracting knowledge from French text. They put it to the test by using the extracted information triples to build an entity classifier, i.e., a system able to label a given instance with its type (for instance, Michel Foucault is a philosopher). The classifier requires little supervision. One novel aspect of this study is that the authors show how general-domain information triples (extracted from French Wikipedia) can be used to derive new knowledge from domain-specific documents unrelated to Wikipedia, in this case scholarly articles focusing on the humanities. The authors believe that the present study is the first to focus on such a cross-domain, recall-oriented approach in open information extraction. While the system's performance shows room for improvement, manual assessments show that the task is quite hard, even for a human, in part because of the cross-domain aspect of the problem the authors tackle.
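
To make the idea of a triple-based entity classifier with little supervision more concrete, here is a minimal sketch in Python. It is not the authors' method: the triples, the seed labels, and the helper names `relation_profiles` and `classify` are all invented for illustration, and the scoring rule (overlap between an entity's relations and the relations of a few labeled seed entities) is only one simple way such a classifier could be approximated.

```python
from collections import Counter, defaultdict

# Toy (subject, relation, object) triples, invented for illustration only.
# A real pipeline would extract these from text with an open IE system.
TRIPLES = [
    ("Michel Foucault", "a écrit", "Surveiller et punir"),
    ("Michel Foucault", "a enseigné à", "Collège de France"),
    ("Jean-Paul Sartre", "a écrit", "L'Être et le Néant"),
    ("Jean-Paul Sartre", "a enseigné à", "Le Havre"),
    ("Gilles Deleuze", "a écrit", "Différence et répétition"),
    ("Claude Monet", "a peint", "Impression, soleil levant"),
    ("Paul Cézanne", "a peint", "Les Joueurs de cartes"),
    ("Paul Cézanne", "a exposé à", "Paris"),
    ("Berthe Morisot", "a peint", "Le Berceau"),
]

# A handful of labeled seed entities: the only supervision in this sketch.
SEEDS = {"Michel Foucault": "philosopher", "Claude Monet": "painter"}


def relation_profiles(triples):
    """Count, for each subject entity, how often it appears with each relation."""
    profiles = defaultdict(Counter)
    for subj, rel, _obj in triples:
        profiles[subj][rel] += 1
    return profiles


def classify(entity, profiles, seeds):
    """Return the seed type whose relations overlap most with the entity's,
    or None when no relation is shared with any seed."""
    # Aggregate the relation counts of all seeds sharing a type.
    type_profiles = defaultdict(Counter)
    for seed, label in seeds.items():
        type_profiles[label].update(profiles[seed])

    # Score each type by the size of the multiset intersection.
    entity_profile = profiles[entity]
    scores = {
        label: sum(min(count, tp[rel]) for rel, count in entity_profile.items())
        for label, tp in type_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


profiles = relation_profiles(TRIPLES)
for name in ("Gilles Deleuze", "Berthe Morisot", "Jean-Paul Sartre"):
    print(name, "->", classify(name, profiles, SEEDS))
```

In a cross-domain setting like the one described above, the triples used to build the seed profiles would come from one corpus (for example Wikipedia) while the entities to classify would come from another, which is one plausible reason an entity may share few or no relations with the seeds and be left unlabeled.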