Llamafur: Learning Latent Category Matrix to Find Unexpected Relations in Wikipedia (Long Version)

From Wikipedia Quality
Revision as of 10:41, 6 November 2019 by Mila (talk | contribs) (Adding wikilinks)

Llamafur: Learning Latent Category Matrix to Find Unexpected Relations in Wikipedia (Long Version) is a scientific work related to Wikipedia quality, published in 2016 and written by Paolo Boldi and Corrado Monti.

Overview

Besides finding trends and unveiling typical patterns, modern information retrieval is increasingly interested in discovering surprising information in textual datasets. In this work the authors focus on finding "unexpected links" in hyperlinked document corpora in which documents are assigned to categories. To achieve this goal, the authors model the hyperlink graph through node categories: the presence of an arc is fostered or discouraged by the categories of the head and the tail of the arc. Specifically, the authors determine a latent category matrix that explains common links. The matrix is built using a margin-based online learning algorithm (Passive-Aggressive), which makes it possible to process graphs with $10^{8}$ links in less than $10$ minutes. The authors show that their method provides better accuracy than most existing text-based techniques, with higher efficiency and while relying on a much smaller amount of information. It also provides higher precision than standard link prediction, especially at low recall levels; the two methods are in fact shown to be orthogonal to each other and can therefore be fruitfully combined.
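
The idea of a latent category matrix trained with a Passive-Aggressive update can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a link is scored by summing the matrix entries over every pair (source category, target category), that labels are +1 for an existing link and -1 for a sampled non-link, and that a hinge loss with margin 1 is used. The names `score`, `pa_update` and `rank_unexpected` are hypothetical, and categories are assumed to be given as lists of distinct integer indices.

```python
import numpy as np


def score(W, cats_src, cats_dst):
    """Sum of W[c, d] over all pairs (c in source cats, d in target cats)."""
    return W[np.ix_(cats_src, cats_dst)].sum()


def pa_update(W, cats_src, cats_dst, label):
    """One Passive-Aggressive step on W (in place); returns the hinge loss.

    label is +1 for an observed link and -1 for a non-link (assumption).
    """
    loss = max(0.0, 1.0 - label * score(W, cats_src, cats_dst))
    if loss > 0.0:
        # The feature vector is the outer product of two 0/1 indicator
        # vectors, so its squared norm is |cats_src| * |cats_dst|.
        tau = loss / (len(cats_src) * len(cats_dst))
        W[np.ix_(cats_src, cats_dst)] += tau * label
    return loss


def rank_unexpected(W, links, cats):
    """Order existing links from least to most explained by W."""
    return sorted(links, key=lambda e: score(W, cats[e[0]], cats[e[1]]))
```

Under these assumptions, an existing link whose category pairs receive a low total score is one that the matrix does not explain, which is how a link would be flagged as "unexpected" in this sketch. Each update touches only the entries indexed by the two category lists, which is consistent with the efficiency claim in the overview, although the actual running time depends on details not given here.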