Methods for Exploring and Mining Tables on Wikipedia is a scientific work related to Wikipedia quality, published in 2013 and written by Chandra Bhagavatula, Thanapon Noraset and Doug Downey.


Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. The authors present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, the authors show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover "interesting" relationships between table columns. They find that a "Semantic Relatedness" measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, the authors show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. The authors' work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.
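
To give a rough sense of how a link-based relatedness signal between table columns might be computed, the following minimal sketch scores two columns by the overlap of the pages that link to their entities. It is an illustration only: the abstract does not specify the formula used by WikiTables, so the Jaccard overlap here is a stand-in, and the function names (link_relatedness, column_relatedness) and the toy in-link data are hypothetical.

```python
from itertools import product


def link_relatedness(inlinks_a, inlinks_b):
    """Jaccard overlap of two sets of in-linking page titles.

    A simple stand-in for a link-based relatedness measure; not
    necessarily the measure used in the paper.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def column_relatedness(col_a, col_b, inlinks):
    """Average pairwise relatedness between the entities of two columns.

    `inlinks` maps an entity title to the set of pages linking to it.
    Entities missing from the map are treated as having no in-links.
    """
    pairs = list(product(col_a, col_b))
    if not pairs:
        return 0.0
    total = sum(
        link_relatedness(inlinks.get(x, set()), inlinks.get(y, set()))
        for x, y in pairs
    )
    return total / len(pairs)


# Toy, made-up in-link sets (not real Wikipedia data).
TOY_INLINKS = {
    "Chicago": {"Illinois", "Lake Michigan", "Chicago Bulls"},
    "Chicago Bulls": {"Chicago", "NBA", "Illinois"},
    "Banana": {"Fruit", "Musa"},
}

if __name__ == "__main__":
    print(column_relatedness(["Chicago"], ["Chicago Bulls"], TOY_INLINKS))
    print(column_relatedness(["Chicago"], ["Banana"], TOY_INLINKS))
```

Under this sketch, candidate columns to join onto a table could be ranked by such a score, with higher values suggesting a more closely related column; a real system would combine it with other features.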