Towards Robust Tags for Scientific Publications from Natural Language Processing Tools and Wikipedia


Authors: Michał Łopuszyński, Łukasz Bolikowski
Publication date: 2015
DOI: 10.1007/s00799-014-0132-0

Towards Robust Tags for Scientific Publications from Natural Language Processing Tools and Wikipedia is a scientific work related to Wikipedia quality, published in 2015 and written by Michał Łopuszyński and Łukasz Bolikowski.

Overview

This work presents and compares two simple methods of tagging scientific publications with labels that reflect their content. The first label set is drawn from Wikipedia. The second is constructed from the noun phrases occurring in the analyzed corpus. The corpus itself consists of abstracts from 0.7 million scientific documents deposited in the arXiv preprint collection. The authors' comparison shows that the two methods are to a large extent complementary. Moreover, the results give insight into the completeness of Wikipedia's knowledge in various scientific domains. As a next step, the authors examine the statistical properties of the obtained tags. Both methods show a qualitatively similar rank-frequency dependence, which is best approximated by a stretched exponential curve. The number of distinct tags per document also follows the same distribution for both methods and is well described by the negative binomial distribution. The developed tags are meant for use as features in various text mining tasks. Therefore, as a final step, the authors show preliminary results of applying them to topic modeling.
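
The overview does not describe the authors' implementation, so the following is only a minimal sketch of the general idea: tag each abstract with every entry of a label dictionary that occurs in it, then derive the rank-frequency list and the number of distinct tags per document. The exact-match n-gram lookup, the function names, the label set, and the sample abstracts are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_document(text, labels, max_len=3):
    """Return the set of labels occurring in the text as exact word n-grams.

    Assumption: labels are lowercase, space-separated phrases of at most
    max_len words. Real label sets (Wikipedia titles, noun phrases) would
    need normalization that is not shown here.
    """
    tokens = tokenize(text)
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in labels:
                found.add(candidate)
    return found


def rank_frequency(tagged_docs):
    """Rank tags by the number of documents they appear in (ties broken by name)."""
    counts = Counter(tag for tags in tagged_docs for tag in tags)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tag, freq) for rank, (tag, freq) in enumerate(ordered, start=1)]


# Hypothetical label dictionary and abstracts, for illustration only.
LABELS = {"topic modeling", "neural network", "black hole", "entropy"}
ABSTRACTS = [
    "We study the entropy of a black hole.",
    "Topic modeling with a neural network.",
    "Black hole entropy revisited.",
]

tagged = [tag_document(a, LABELS) for a in ABSTRACTS]
ranking = rank_frequency(tagged)
tags_per_doc = [len(t) for t in tagged]

for rank, tag, freq in ranking:
    print(rank, tag, freq)
print("distinct tags per document:", tags_per_doc)
```

On a real corpus, the `ranking` list is the kind of data to which a rank-frequency curve would be fitted, and `tags_per_doc` is the sample whose distribution would be compared against a negative binomial. Both fits are outside the scope of this sketch.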

Embed

Wikipedia Quality

Łopuszyński, Michał; Bolikowski, Łukasz. (2015). "[[Towards Robust Tags for Scientific Publications from Natural Language Processing Tools and Wikipedia]]". Springer Berlin Heidelberg. DOI: 10.1007/s00799-014-0132-0.

English Wikipedia

{{cite journal |last1=Łopuszyński |first1=Michał |last2=Bolikowski |first2=Łukasz |title=Towards Robust Tags for Scientific Publications from Natural Language Processing Tools and Wikipedia |date=2015 |doi=10.1007/s00799-014-0132-0 |url=https://wikipediaquality.com/wiki/Towards_Robust_Tags_for_Scientific_Publications_from_Natural_Language_Processing_Tools_and_Wikipedia |journal=Springer Berlin Heidelberg}}

HTML

Łopuszyński, Michał; Bolikowski, Łukasz. (2015). &quot;<a href="https://wikipediaquality.com/wiki/Towards_Robust_Tags_for_Scientific_Publications_from_Natural_Language_Processing_Tools_and_Wikipedia">Towards Robust Tags for Scientific Publications from Natural Language Processing Tools and Wikipedia</a>&quot;. Springer Berlin Heidelberg. DOI: 10.1007/s00799-014-0132-0.