Difference between revisions of "Mining Corpora of Computer-Mediated Communication: Analysis of Linguistic Features in Wikipedia Talk Pages Using Machine Learning Methods"

From Wikipedia Quality
(Wikilinks)
(Adding infobox)
Line 1:
+ {{Infobox work
+ | title = Mining Corpora of Computer-Mediated Communication: Analysis of Linguistic Features in Wikipedia Talk Pages Using Machine Learning Methods
+ | date = 2014
+ | authors = [[Michael Beißwenger]]<br />[[Harald Lüngen]]<br />[[Eliza Margaretha]]<br />[[Christian Pölitz]]
+ | link = http://ids-pub.bsz-bw.de/documents/3187/Beisswenger_Luengen_Margaretha_Poelitz_Mining corpora_2014.pdf
+ }}
 
'''Mining Corpora of Computer-Mediated Communication: Analysis of Linguistic Features in Wikipedia Talk Pages Using Machine Learning Methods''' is a scientific work related to [[Wikipedia quality]], published in 2014 and written by [[Michael Beißwenger]], [[Harald Lüngen]], [[Eliza Margaretha]] and [[Christian Pölitz]].
 
 
== Overview ==
 
Machine learning methods offer great potential for automatically investigating large amounts of data in the humanities. The authors' contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), where the authors apply machine learning methods to the analysis of large corpora in language-focused research on computer-mediated communication (CMC). At the workshop, the authors will discuss first results from training a Support Vector Machine (SVM) to classify selected linguistic [[features]] in [[talk pages]] of the German [[Wikipedia]] corpus in DeReKo, provided by the IDS Mannheim. The authors will investigate different representations of the data to integrate complex syntactic and [[semantic information]] for the SVM. The results are intended to foster both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.
 

Revision as of 17:47, 19 December 2020


Mining Corpora of Computer-Mediated Communication: Analysis of Linguistic Features in Wikipedia Talk Pages Using Machine Learning Methods
Authors: Michael Beißwenger, Harald Lüngen, Eliza Margaretha, Christian Pölitz
Publication date: 2014
Links: corpora_2014.pdf Original

Mining Corpora of Computer-Mediated Communication: Analysis of Linguistic Features in Wikipedia Talk Pages Using Machine Learning Methods is a scientific work related to Wikipedia quality, published in 2014 and written by Michael Beißwenger, Harald Lüngen, Eliza Margaretha and Christian Pölitz.

Overview

Machine learning methods offer great potential for automatically investigating large amounts of data in the humanities. The authors' contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), where the authors apply machine learning methods to the analysis of large corpora in language-focused research on computer-mediated communication (CMC). At the workshop, the authors will discuss first results from training a Support Vector Machine (SVM) to classify selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo, provided by the IDS Mannheim. The authors will investigate different representations of the data to integrate complex syntactic and semantic information for the SVM. The results are intended to foster both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.
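
To make the idea of classifying talk-page postings with an SVM more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain bag-of-words representation, a linear SVM trained by subgradient descent on the regularised hinge loss, and a hypothetical target feature (whether a posting contains an emoticon). The toy English postings, the function names and the hyperparameters are all invented for illustration; the paper works on German Wikipedia talk pages and investigates richer syntactic and semantic representations.

```python
from collections import Counter

# Hypothetical toy data: English stand-ins for talk-page postings.
# Label +1 = posting contains an emoticon, -1 = it does not (an assumed target).
POSTINGS = [
    ("thanks :-)", 1),
    ("sources are missing", -1),
    ("good point ;-)", 1),
    ("the date is wrong", -1),
    ("fixed it :-)", 1),
    ("the sources are old", -1),
]


def featurize(text):
    """Bag-of-words representation: lowercased whitespace tokens with counts."""
    return Counter(text.lower().split())


def score(weights, bias, feats):
    """Linear decision value w . x + b for a sparse feature vector."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items()) + bias


def train_linear_svm(data, epochs=20, lam=0.001, lr=0.1):
    """Fit a linear SVM by subgradient descent on the L2-regularised hinge loss."""
    weights, bias = {}, 0.0
    samples = [(featurize(text), label) for text, label in data]
    for _ in range(epochs):
        for feats, label in samples:
            margin = label * score(weights, bias, feats)
            # Regularisation step: shrink every weight slightly (bias is not shrunk).
            for tok in weights:
                weights[tok] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for tok, cnt in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * cnt
                bias += lr * label
    return weights, bias


def predict(weights, bias, text):
    """Return +1 or -1 depending on the sign of the decision value."""
    return 1 if score(weights, bias, featurize(text)) >= 0 else -1


weights, bias = train_linear_svm(POSTINGS)
for text in ["nice work :-)", "the sources are wrong"]:
    print(text, "->", predict(weights, bias, text))
```

Swapping `featurize` for a function that emits other features (for example part-of-speech tags or dependency relations) is one simple way to compare different data representations with the same classifier, which is the kind of comparison the overview describes.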