DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus

Authors
Martin Brümmer
Milan Dojchinovski
Sebastian Hellmann
Publication date
2016
ISBN
978-295174089-1

DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus is a scientific work about Wikipedia quality, published in 2016 and written by Martin Brümmer, Milan Dojchinovski and Sebastian Hellmann.

Overview

The ever-increasing importance of machine learning in Natural Language Processing is accompanied by an equally increasing need for large-scale training and evaluation corpora. Due to its size, openness and relative quality, Wikipedia has already been a source of such data, but on a limited scale. This paper introduces the DBpedia Abstract Corpus, a large-scale, open corpus of annotated Wikipedia texts in six languages, featuring over 11 million texts and over 97 million entity links. It describes the properties of the Wikipedia texts, the corpus creation process, the corpus format, and use cases such as Named Entity Linking training and evaluation.

Embed

Wikipedia Quality

Brümmer, Martin; Dojchinovski, Milan; Hellmann, Sebastian. (2016). "[[DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus]]". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9790, 2016, pp. 513-529. ISBN: 978-295174089-1.

English Wikipedia

{{cite journal |last1=Brümmer |first1=Martin |last2=Dojchinovski |first2=Milan |last3=Hellmann |first3=Sebastian |title=DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus |date=2016 |isbn=978-295174089-1 |url=https://wikipediaquality.com/wiki/DBpedia_Abstracts:_A_Large-Scale,_Open,_Multilingual_NLP_Training_Corpus |journal=Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9790, 2016, pp. 513-529}}

HTML

Brümmer, Martin; Dojchinovski, Milan; Hellmann, Sebastian. (2016). &quot;<a href="https://wikipediaquality.com/wiki/DBpedia_Abstracts:_A_Large-Scale,_Open,_Multilingual_NLP_Training_Corpus">DBpedia Abstracts: A Large-Scale, Open, Multilingual NLP Training Corpus</a>&quot;. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9790, 2016, pp. 513-529. ISBN: 978-295174089-1.