Difference between revisions of "A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml"

From Wikipedia Quality
(Adding wikilinks)
(Adding infobox)
Line 1: Line 1:
 +
{{Infobox work
 +
| title = A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml
 +
| date = 2011
 +
| authors = [[Noah Bubenhofer]]<br />[[Stefanie Haupt]]<br />[[Horst Schwinn]]
 +
| link = https://ids-pub.bsz-bw.de/files/5189/Bubenhofer_Schwinn_Haupt-A_comparable_corpus-2011.pdf
 +
}}
 
'''A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml''' is a scientific work related to [[Wikipedia quality]], published in 2011 and written by [[Noah Bubenhofer]], [[Stefanie Haupt]] and [[Horst Schwinn]].
 
  
 
== Overview ==
 
 
To build a comparable [[Wikipedia]] corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, the authors used a set of XSLT stylesheets to transform the MediaWiki annotations into XML. The data was then annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for [[multilingual]] research on various linguistic topics.
 

Revision as of 10:39, 27 October 2019


A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml
Authors
Noah Bubenhofer
Stefanie Haupt
Horst Schwinn
Publication date
2011
Links
Original

A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml is a scientific work related to Wikipedia quality, published in 2011 and written by Noah Bubenhofer, Stefanie Haupt and Horst Schwinn.

Overview

To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, the authors used a set of XSLT stylesheets to transform the MediaWiki annotations into XML. The data was then annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research on various linguistic topics.
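
The two steps described in the overview (converting wiki markup to XML, then attaching word class information to each token) can be illustrated with a small sketch. This is not the authors' pipeline: it uses Python regular expressions in place of XSLT stylesheets and a toy lookup table in place of a real tagger. The function names (`wiki_to_xml`, `strip_inline_markup`), the `TOY_LEXICON` table, and the XML element and attribute names (`article`, `heading`, `p`, `w`, `pos`) are all hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON = {"the": "DET", "corpus": "NOUN", "is": "VERB", "comparable": "ADJ"}

LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")  # [[target]] or [[target|label]]
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1$")             # == Heading ==
TOKEN = re.compile(r"\w+|[^\w\s]")                       # words and single punctuation marks


def strip_inline_markup(text):
    """Replace wiki links by their visible label and drop bold/italic quote runs."""
    text = LINK.sub(lambda m: m.group(2) or m.group(1), text)
    return re.sub(r"'{2,}", "", text)


def wiki_to_xml(wikitext, title):
    """Build an XML tree with one <heading> or <p> per line and one <w pos="..."> per token."""
    article = ET.Element("article", {"title": title})
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            continue
        heading = HEADING.match(line)
        if heading:
            node = ET.SubElement(article, "heading", {"level": str(len(heading.group(1)))})
            content = heading.group(2)
        else:
            node = ET.SubElement(article, "p")
            content = line
        for token in TOKEN.findall(strip_inline_markup(content)):
            # Unknown words get the placeholder tag "UNK" in this sketch.
            w = ET.SubElement(node, "w", {"pos": TOY_LEXICON.get(token.lower(), "UNK")})
            w.text = token
    return article


if __name__ == "__main__":
    sample = "== Overview ==\nThe [[Wikipedia|corpus]] is '''comparable'''."
    print(ET.tostring(wiki_to_xml(sample, "Demo"), encoding="unicode"))
```

A real pipeline would need to handle templates, tables, references and nested markup, and would call a language-specific tagger for each of the six languages; the sketch only shows the overall shape of the output, with structural metadata on the container elements and a word class attribute on each token.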