Clustering Wikipedia Infoboxes to Discover Their Types

Authors: Thanh Hoang Nguyen, Hoa Dieu Nguyen, Viviane Pereira Moreira, Juliana Freire
Publication date: 2012
ISBN: 978-145031156-4
DOI: 10.1145/2396761.2398588

Clustering Wikipedia Infoboxes to Discover Their Types is a scientific work about Wikipedia quality, published in 2012 and written by Thanh Hoang Nguyen, Hoa Dieu Nguyen, Viviane Pereira Moreira and Juliana Freire.

Overview

Wikipedia has emerged as an important source of structured information on the Web. While its success can be attributed in part to the simplicity of adding and modifying content, that same simplicity creates challenges for using, querying, and integrating the information. Even though authors are encouraged to select appropriate categories and provide infoboxes that follow pre-defined templates, many do not follow the guidelines or follow them only loosely. This leads to undesirable effects such as template duplication, heterogeneity, and schema drift. As a step towards addressing this problem, the authors propose a new unsupervised approach for clustering Wikipedia infoboxes. Instead of relying on manually assigned categories and template labels, they use the structured information available in infoboxes to group them and infer their entity types. Experiments using over 48,000 infoboxes indicate that the clustering approach is effective and produces high-quality clusters.
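To make the idea of grouping infoboxes by their structure more concrete, here is a minimal Python sketch. It is not the authors' algorithm, which this summary does not describe. It assumes each infobox is reduced to its set of attribute names, uses Jaccard similarity between those sets, and applies a simple greedy one-pass clustering. The function names, the sample attribute names, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two attribute-name sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_infoboxes(
    infoboxes: Dict[str, Set[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Greedy one-pass clustering (illustrative assumption, not the paper's method).

    Each infobox joins the first cluster whose accumulated attribute set is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[dict] = []
    for title, attrs in infoboxes.items():
        for cluster in clusters:
            if jaccard(attrs, cluster["attrs"]) >= threshold:
                cluster["members"].append(title)
                # Grow the cluster's schema with the newly seen attributes.
                cluster["attrs"] |= attrs
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append({"attrs": set(attrs), "members": [title]})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    # Hypothetical sample data: article title -> infobox attribute names.
    sample = {
        "Film A": {"director", "starring", "runtime", "language"},
        "Film B": {"director", "starring", "runtime", "budget"},
        "City A": {"population", "area", "country", "mayor"},
        "City B": {"population", "area", "country"},
    }
    for members in cluster_infoboxes(sample):
        print(members)
```

In this sketch no category or template label is consulted. Only the attribute names drive the grouping, which mirrors the motivation stated above, though a real system would likely need a more robust similarity measure and clustering procedure.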

Embed

Wikipedia Quality

Nguyen, Thanh Hoang; Nguyen, Hoa Dieu; Moreira, Viviane Pereira; Freire, Juliana. (2012). "[[Clustering Wikipedia Infoboxes to Discover Their Types]]". ACM International Conference Proceeding Series 2012, pp. 2134-2138. ISBN: 978-145031156-4. DOI: 10.1145/2396761.2398588.

English Wikipedia

{{cite journal |last1=Nguyen |first1=Thanh Hoang |last2=Nguyen |first2=Hoa Dieu |last3=Moreira |first3=Viviane Pereira |last4=Freire |first4=Juliana |title=Clustering Wikipedia Infoboxes to Discover Their Types |date=2012 |isbn=978-145031156-4 |doi=10.1145/2396761.2398588 |url=https://wikipediaquality.com/wiki/Clustering_Wikipedia_Infoboxes_to_Discover_Their_Types |journal=ACM International Conference Proceeding Series 2012, pp. 2134-2138}}

HTML

Nguyen, Thanh Hoang; Nguyen, Hoa Dieu; Moreira, Viviane Pereira; Freire, Juliana. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Clustering_Wikipedia_Infoboxes_to_Discover_Their_Types">Clustering Wikipedia Infoboxes to Discover Their Types</a>&quot;. ACM International Conference Proceeding Series 2012, pp. 2134-2138. ISBN: 978-145031156-4. DOI: 10.1145/2396761.2398588.