YAWN: A Semantically Annotated Wikipedia XML Corpus

Authors
Ralf Schenkel
Fabian M. Suchanek
Gjergji Kasneci
Publication date
2007
ISBN
978-388579197-3

YAWN: A Semantically Annotated Wikipedia XML Corpus is a scientific work about Wikipedia quality, published in 2007 and written by Ralf Schenkel, Fabian M. Suchanek and Gjergji Kasneci.

Overview

The paper presents YAWN, a system that converts the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. The authors introduce algorithms that annotate pages and links with concepts from the WordNet thesaurus. The annotation process exploits Wikipedia's category information, a high-quality source that is assigned manually. It also extracts additional information from lists and uses template invocations with named parameters. The authors give examples of how such annotations can be exploited for high-precision queries.
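To make the idea concrete, the following is a toy sketch of this kind of annotation. It is not the authors' algorithm. It makes several assumptions: a category's concept is taken to be its singularized head noun (the last word before a phrase such as "in" or "of"). That head noun is looked up in a tiny hypothetical dictionary, `TOY_WORDNET`, which stands in for the real WordNet. Named template parameters become child elements. All names here (`TOY_WORDNET`, `singularize`, `category_to_concept`, `parse_template`, `annotate_page`) are hypothetical, and the template parser ignores nested templates and unnamed parameters.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for WordNet: singular noun -> concept name.
TOY_WORDNET = {"physicist": "physicist", "person": "person", "city": "city", "country": "country"}


def singularize(word):
    """Very rough plural-to-singular heuristic (assumption; real systems use a lemmatizer)."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def category_to_concept(category):
    """Map a category such as 'German physicists' to a concept via its head noun (assumed heuristic)."""
    # Keep only the part before a post-modifier such as 'in Germany' or 'of France'.
    pre = re.split(r"\s+(?:of|in|from|by)\s+", category.lower())[0]
    words = pre.split()
    if not words:
        return None
    return TOY_WORDNET.get(singularize(words[-1]))


def parse_template(invocation):
    """Split '{{Name | key = value | ...}}' into (name, {key: value}); no nesting support."""
    body = invocation.strip()[2:-2]  # drop the surrounding {{ and }}
    name, *parts = [p.strip() for p in body.split("|")]
    params = {}
    for part in parts:
        if "=" in part:  # only named parameters are kept
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return name, params


def annotate_page(title, categories, template):
    """Build an XML element whose tag is the first concept found among the page's categories."""
    concepts = [c for c in (category_to_concept(cat) for cat in categories) if c]
    root = ET.Element(concepts[0] if concepts else "article", {"title": title})
    for concept in concepts[1:]:  # any further concepts become <concept> children
        ET.SubElement(root, "concept").text = concept
    _, params = parse_template(template)
    for key, value in params.items():  # named template parameters become child elements
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


xml = annotate_page(
    "Albert Einstein",
    ["German physicists", "1879 births"],
    "{{Infobox scientist | name = Albert Einstein | birth_place = Ulm }}",
)
print(xml)
# → <physicist title="Albert Einstein"><name>Albert Einstein</name><birth_place>Ulm</birth_place></physicist>

# A "high-precision" query over the semantic tags: physicists born in Ulm.
root = ET.fromstring(xml)
if root.tag == "physicist" and root.findtext("birth_place") == "Ulm":
    print(root.get("title"))  # → Albert Einstein
```

Once the tags carry meaning, a query can target `physicist` elements with a given `birth_place` directly. Without them, it would have to match keywords anywhere in the article text. This is the kind of precision gain the paper argues for.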

Embed

Wikipedia Quality

Schenkel, Ralf; Suchanek, Fabian M.; Kasneci, Gjergji. (2007). "[[YAWN: A Semantically Annotated Wikipedia XML Corpus]]". CEUR Workshop Proceedings Volume 295, 2007, 8p. ISBN: 978-388579197-3.

English Wikipedia

{{cite journal |last1=Schenkel |first1=Ralf |last2=Suchanek |first2=Fabian M. |last3=Kasneci |first3=Gjergji |title=YAWN: A Semantically Annotated Wikipedia XML Corpus |date=2007 |isbn=978-388579197-3 |url=https://wikipediaquality.com/wiki/YAWN:_A_Semantically_Annotated_Wikipedia_XML_Corpus |journal=CEUR Workshop Proceedings Volume 295, 2007, 8p}}

HTML

Schenkel, Ralf; Suchanek, Fabian M.; Kasneci, Gjergji. (2007). &quot;<a href="https://wikipediaquality.com/wiki/YAWN:_A_Semantically_Annotated_Wikipedia_XML_Corpus">YAWN: A Semantically Annotated Wikipedia XML Corpus</a>&quot;. CEUR Workshop Proceedings Volume 295, 2007, 8p. ISBN: 978-388579197-3.