Wiicluster: a Platform for Wikipedia Infobox Generation

From Wikipedia Quality
Revision as of 23:23, 26 September 2019 by Ariel (talk | contribs) (infobox)


Authors: Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang, Wei Wang
Publication date: 2014
DOI: 10.1145/2661829.2661840
Links: Original

Wiicluster: a Platform for Wikipedia Infobox Generation is a scientific work related to Wikipedia quality, published in 2014 and written by Kezun Zhang, Yanghua Xiao, Hanghang Tong, Haixun Wang and Wei Wang.

Overview

Wikipedia has become one of the best sources for creating and sharing a massive volume of human knowledge. Much effort has been devoted to generating and enriching its structured data through automatic information extraction from the unstructured text of Wikipedia articles. Most, if not all, existing work shares the same paradigm: information extraction over unstructured text, followed by supervised machine learning. Although remarkable progress has been made, this paradigm is limited in effectiveness and scalability and carries a high labeling cost. The authors present WiiCluster, a scalable platform for automatically generating infoboxes for Wikipedia articles. The heart of the system is an effective cluster-then-label algorithm over a rich set of semi-structured data in Wikipedia articles: linked entities. It is totally unsupervised and thus does not require any human labels. It is effective in generating semantically meaningful summaries of Wikipedia articles. The authors further propose a cluster-reuse algorithm to scale up the system. Overall, WiiCluster is able to generate nearly 10 million new facts. The authors also develop a web-based platform to demonstrate WiiCluster, which enables users to access and browse the generated knowledge.
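
The overview names a cluster-then-label approach over linked entities but gives no algorithmic detail, so the following is only a toy sketch of the general idea and not the authors' method. It assumes each linked entity comes with a small set of descriptive features (for example, category names), groups entities greedily by Jaccard similarity, and labels each group with its most frequent feature. The function names, the sample data, and the `threshold` parameter are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_entities(entities, threshold=0.3):
    """Greedy single-pass clustering (an assumed stand-in technique).

    An entity joins the first cluster whose pooled feature set is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"members": [...], "features": set(...)}
    for name, feats in entities.items():
        for cluster in clusters:
            if jaccard(feats, cluster["features"]) >= threshold:
                cluster["members"].append(name)
                cluster["features"] |= feats
                break
        else:
            clusters.append({"members": [name], "features": set(feats)})
    return clusters


def label_cluster(cluster, entities):
    """Label a cluster with its most frequent feature (ties: alphabetical)."""
    counts = Counter(f for m in cluster["members"] for f in entities[m])
    if not counts:
        return None
    return min(counts, key=lambda f: (-counts[f], f))


def generate_infobox(entities, threshold=0.3):
    """Cluster first, then label: returns {label: [entity names]}."""
    infobox = {}
    for cluster in cluster_entities(entities, threshold):
        label = label_cluster(cluster, entities)
        infobox.setdefault(label, []).extend(cluster["members"])
    return infobox


# Hypothetical linked entities of one article, with made-up features.
linked = {
    "Cupertino, California": {"city", "place"},
    "California": {"state", "place"},
    "Steve Jobs": {"person", "founder"},
    "Steve Wozniak": {"person", "founder"},
    "iPhone": {"product"},
}

for label, members in generate_infobox(linked).items():
    print(f"{label}: {', '.join(members)}")
```

No labeled training data is involved in this sketch, which is the property the overview emphasizes. A real system would need a far richer similarity measure and labeling step than the ones assumed here.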