Link Analysis of Wikipedia Documents Using MapReduce

From Wikipedia Quality
Revision as of 08:29, 15 June 2019 by Piper (Adding wikilinks)

Link Analysis of Wikipedia Documents Using MapReduce is a scientific work related to Wikipedia quality, published in 2015 and written by Vasa Hardik, Vasudevan Anirudh and Palanisamy Balaji.

Overview

Wikipedia, a collaborative and user-driven encyclopedia, is considered to be the largest content thesaurus on the web, and it has expanded into a massive database housing a huge amount of information. In this paper, the authors present the design and implementation of a MapReduce-based Wikipedia link analysis system that provides a hierarchical examination of document connectivity in Wikipedia and captures the semantic relationships between articles. The authors' system consists of a Wikipedia crawler, a MapReduce-based distributed parser and a set of link analysis techniques. The results produced by the study are then mapped to web Key Performance Indicators (KPIs) for link-structure interpretation. The authors find that Wikipedia has a remarkable capability as a corpus for content correlation with respect to connectivity among articles. Link analysis and semantic structuration of Wikipedia not only provide an ergonomic report of the tier-based link hierarchy of Wikipedia articles but also reflect the general cognition of the semantic relationships between them. The results of the analysis are aimed at providing valuable insights for evaluating the accuracy and the content scalability of Wikipedia through its link schematics.
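The parser-plus-link-analysis pipeline described above can be illustrated with a small in-process sketch. This is not the authors' implementation: the function names (map_links, shuffle, reduce_inlinks, inlink_counts), the simplified wikilink pattern and the sample articles are all assumptions made for illustration, and a real deployment would run the map and reduce steps on a distributed framework rather than in a single Python process.

```python
import re
from collections import defaultdict

# Simplified wikilink pattern (an assumption, not a full wikitext parser):
# matches [[Target]], [[Target|label]] and [[Target#Section]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def map_links(title, wikitext):
    """Map step: emit one (target, source) pair per outgoing link."""
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target:
            yield target, title


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_inlinks(target, sources):
    """Reduce step: count the distinct articles linking to the target."""
    return target, len(set(sources))


def inlink_counts(articles):
    """Run map, shuffle and reduce over a {title: wikitext} dictionary."""
    pairs = (
        pair
        for title, text in articles.items()
        for pair in map_links(title, text)
    )
    return dict(reduce_inlinks(t, s) for t, s in shuffle(pairs).items())


# Hypothetical sample data, for illustration only.
articles = {
    "PageRank": "Uses [[Graph theory]] and [[MapReduce]].",
    "Hadoop": "Implements [[MapReduce|the MapReduce model]].",
    "MapReduce": "See [[Graph theory#History]].",
}
counts = inlink_counts(articles)
```

Keying the intermediate pairs by link target means each reducer sees every source that points at one article, so inbound-link counts fall out of a single pass. Keying by source instead would yield outbound-link lists from the same map output.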