Completing Wikipedia's Hyperlink Structure Through Dimensionality Reduction

From Wikipedia Quality
Revision as of 08:52, 20 October 2019 by Sylwia (talk | contribs) (wikilinks)

Completing Wikipedia's Hyperlink Structure Through Dimensionality Reduction is a scientific work related to Wikipedia quality, published in 2009 and written by Robert West, Doina Precup and Joelle Pineau.

Overview

Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyperlinks. However, since these links are created by human authors, links one would expect to see are often missing. The goal of this work is to detect such gaps automatically. In this paper, the authors propose a novel method for augmenting the structure of hyperlinked document collections such as Wikipedia. It does not require the extraction of any manually defined features from the article to be augmented. Instead, it is based on principal component analysis, a well-founded mathematical generalization technique, and predicts new links purely from the statistical structure of the graph formed by the existing links. The method does not rely on the textual content of articles; it exploits only hyperlinks. A user evaluation of the technique shows that it improves the quality of the top link suggestions over the state of the art and that the best predicted links are significantly more valuable than the 'average' link already present in Wikipedia. Beyond link prediction, the algorithm can potentially be used to point out topics an article fails to cover and to cluster articles semantically.
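The overview does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each article is represented by its row of outgoing links in the adjacency matrix, that this matrix is projected onto a few principal components and reconstructed, and that a missing link is suggested wherever the reconstruction assigns a high value to a position that currently holds no link. The names `suggest_links`, `toy_adjacency`, `n_components` and `top_k` are hypothetical and chosen for illustration.

```python
import numpy as np


def suggest_links(adjacency, n_components, top_k=3):
    """Rank candidate missing links by a low-rank PCA reconstruction.

    adjacency[i, j] == 1 means article i links to article j.
    Returns {article: [(target, score), ...]} with the best candidates first.
    """
    A = np.asarray(adjacency, dtype=float)

    # PCA: center the columns, then take the top right singular vectors.
    mean = A.mean(axis=0)
    centered = A - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]

    # Project onto the principal components and map back to link space.
    reconstructed = centered @ components.T @ components + mean

    # Only positions without an existing link are candidates; skip self-links.
    scores = np.where(A > 0, -np.inf, reconstructed)
    np.fill_diagonal(scores, -np.inf)

    suggestions = {}
    for i in range(A.shape[0]):
        order = np.argsort(-scores[i])[:top_k]
        suggestions[i] = [
            (int(j), float(scores[i, j]))
            for j in order
            if np.isfinite(scores[i, j])
        ]
    return suggestions


def toy_adjacency():
    """Two fully linked groups of four articles, with link 0 -> 3 removed."""
    A = np.zeros((8, 8))
    A[:4, :4] = 1
    A[4:, 4:] = 1
    np.fill_diagonal(A, 0)  # no self-links
    A[0, 3] = 0             # the gap the method should rediscover
    return A


if __name__ == "__main__":
    result = suggest_links(toy_adjacency(), n_components=1, top_k=1)
    print(result[0])  # best candidate link for article 0
```

In this toy graph the single principal component separates the two groups, so the removed link from article 0 to article 3 receives a much higher reconstructed value than any link into the other group and comes out as the top suggestion for article 0. On a real link graph the matrix would be large and sparse, and the number of components would have to be tuned; neither aspect is covered by this sketch.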