YAWN: A Semantically Annotated Wikipedia XML Corpus
Authors | Ralf Schenkel, Fabian M. Suchanek, Gjergji Kasneci |
---|---|
Publication date | 2007 |
ISBN | 978-388579197-3 |
YAWN: A Semantically Annotated Wikipedia XML Corpus is a scientific work about Wikipedia quality, published in 2007 and written by Ralf Schenkel, Fabian M. Suchanek and Gjergji Kasneci.
Overview
The paper presents YAWN, a system that converts the widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. The authors introduce algorithms that annotate pages and links with concepts from the WordNet thesaurus. The annotation process draws on three sources: the category assignments in Wikipedia, which are a high-quality, manually curated source of information; additional information extracted from lists; and the invocations of templates with named parameters. The authors give examples of how such annotations can be exploited for high-precision queries.
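To make the idea concrete, the following is a minimal Python sketch of category-based and template-based annotation. It is an illustration, not the authors' implementation: the `CATEGORY_TO_CONCEPT` dictionary is a hypothetical stand-in for a real WordNet lookup, the function names `to_tag` and `annotate` are invented here, the regular expressions only handle flat, non-nested templates, and the XML nesting (concept tag wrapping template tags) is an assumption rather than the layout YAWN actually emits.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a WordNet lookup: category name -> concept tag.
CATEGORY_TO_CONCEPT = {
    "German physicists": "physicist",
    "Capitals in Europe": "capital",
}

# Matches flat template invocations such as {{Name|key=value|...}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")
# Matches category links such as [[Category:German physicists]].
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def to_tag(name):
    """Turn a free-text name into a usable XML element name."""
    tag = re.sub(r"\W+", "_", name.strip().lower()).strip("_")
    return tag if tag and not tag[0].isdigit() else "x_" + tag


def annotate(title, wikitext):
    """Build an XML tree with concept tags and template-parameter tags."""
    article = ET.Element("article", {"title": title})
    parent = article
    # Each category that maps to a known concept adds one wrapping element.
    for category in CATEGORY_RE.findall(wikitext):
        concept = CATEGORY_TO_CONCEPT.get(category.strip())
        if concept:
            parent = ET.SubElement(parent, to_tag(concept))
    # Named template parameters become child elements of the template tag.
    for name, params in TEMPLATE_RE.findall(wikitext):
        template = ET.SubElement(parent, to_tag(name))
        for param in params.split("|"):
            if "=" in param:
                key, value = param.split("=", 1)
                ET.SubElement(template, to_tag(key)).text = value.strip()
    return article


if __name__ == "__main__":
    page = (
        "{{Infobox Scientist|name=Max Planck|birth_place=Kiel}}\n"
        "[[Category:German physicists]]"
    )
    tree = annotate("Max Planck", page)
    print(ET.tostring(tree, encoding="unicode"))
    # A tag-aware query: birth places of pages annotated as physicists.
    print([e.text for p in tree.iter("physicist") for e in p.iter("birth_place")])
```

With tags like these, a query can ask for a `birth_place` element inside a `physicist` element instead of matching keywords in free text, which is the kind of high-precision query the overview refers to.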
Embed
Wikipedia Quality
Schenkel, Ralf; Suchanek, Fabian M.; Kasneci, Gjergji. (2007). "[[YAWN: A Semantically Annotated Wikipedia XML Corpus]]". CEUR Workshop Proceedings Volume 295, 2007, 8p. ISBN: 978-388579197-3.
English Wikipedia
{{cite journal |last1=Schenkel |first1=Ralf |last2=Suchanek |first2=Fabian M. |last3=Kasneci |first3=Gjergji |title=YAWN: A Semantically Annotated Wikipedia XML Corpus |date=2007 |isbn=978-388579197-3 |url=https://wikipediaquality.com/wiki/YAWN:_A_Semantically_Annotated_Wikipedia_XML_Corpus |journal=CEUR Workshop Proceedings Volume 295, 2007, 8p}}
HTML
Schenkel, Ralf; Suchanek, Fabian M.; Kasneci, Gjergji. (2007). "<a href="https://wikipediaquality.com/wiki/YAWN:_A_Semantically_Annotated_Wikipedia_XML_Corpus">YAWN: A Semantically Annotated Wikipedia XML Corpus</a>". CEUR Workshop Proceedings Volume 295, 2007, 8p. ISBN: 978-388579197-3.