Difference between revisions of "A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml"

Revision as of 10:56, 8 September 2019

A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml - scientific work related to Wikipedia quality published in 2011, written by Noah Bubenhofer, Stefanie Haupt and Horst Schwinn.

Overview

To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, the authors used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data was annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research in various linguistic topics.
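
The two steps described above (wiki markup to XML, then word class annotation) can be illustrated with a minimal sketch. It is not the authors' pipeline: they used XSLT stylesheets and existing taggers, whereas this sketch swaps in Python regular expressions and a toy lookup table. The names `TOY_LEXICON`, `strip_markup`, `to_tagged_xml` and the `<s>`/`<w pos="...">` element layout are assumptions made purely for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {"the": "DET", "corpus": "NOUN", "is": "VERB", "comparable": "ADJ"}

# Matches [[target]] and [[target|label]] wiki links.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def strip_markup(wikitext):
    """Replace wiki links by their visible label and drop bold/italic quote runs."""
    text = LINK.sub(lambda m: m.group(2) or m.group(1), wikitext)
    return re.sub(r"'{2,}", "", text)


def to_tagged_xml(wikitext):
    """Build an <s> element with one <w pos="..."> child per token."""
    sentence = ET.Element("s")
    for token in re.findall(r"\w+|[^\w\s]", strip_markup(wikitext)):
        # Tokens missing from the toy lexicon (including punctuation) get "UNK".
        word = ET.SubElement(sentence, "w", pos=TOY_LEXICON.get(token.lower(), "UNK"))
        word.text = token
    return sentence


if __name__ == "__main__":
    root = to_tagged_xml("The '''corpus''' is [[Comparability|comparable]].")
    print(ET.tostring(root, encoding="unicode"))
```

A real corpus build would replace the lexicon lookup with a language-specific tagger and would keep article metadata alongside the tagged text, as the paper's summary indicates.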

@@ Line 1: / Line 1: @@
-'''A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml''' - scientific work related to Wikipedia quality published in 2011, written by Noah Bubenhofer, Stefanie Haupt and Horst Schwinn.
+'''A Comparable Wikipedia Corpus: from Wiki Syntax to Pos Tagged Xml''' - scientific work related to [[Wikipedia quality]] published in 2011, written by [[Noah Bubenhofer]], [[Stefanie Haupt]] and [[Horst Schwinn]].
 == Overview ==
-To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, the authors used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data was annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research in various linguistic topics.
+To build a comparable [[Wikipedia]] corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, the authors used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data was annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for [[multilingual]] research in various linguistic topics.
