Difference between revisions of "Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia"

== Overview ==
 
This paper investigates how to perform [[cross lingual|cross-lingual]] text classification effectively by leveraging a large-scale, [[multilingual]] knowledge base, [[Wikipedia]]. Based on the observation that each Wikipedia concept is described by documents in [[different language]]s, the authors adapt existing topic modeling algorithms to mine multilingual topics from this knowledge base. Each extracted topic has multiple representations, one per language. The authors regard the topics extracted from Wikipedia documents as universal-topics, since each topic carries the same [[semantic information]] across languages. New documents in different languages can therefore be represented in a common space spanned by a set of universal-topics, which the authors use for cross-lingual text classification. Given training data labeled in one language, a text classifier can be trained to classify documents in another language by mapping all documents of both languages into the universal-topic space. This approach does not require additional linguistic resources such as bilingual dictionaries, [[machine translation]] tools, or labeled data for the target language. The evaluation results indicate that the topic modeling approach is effective for building a cross-lingual text classifier.

== Embed ==
=== Wikipedia Quality ===
<code>
<nowiki>
Ni, Xiaochuan; Sun, Jian-Tao; Hu, Jian; Chen, Zheng. (2011). "[[Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia]]". DOI: 10.1145/1935826.1935887.
</nowiki>
</code>

=== English Wikipedia ===
<code>
<nowiki>
{{cite journal |last1=Ni |first1=Xiaochuan |last2=Sun |first2=Jian-Tao |last3=Hu |first3=Jian |last4=Chen |first4=Zheng |title=Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia |date=2011 |doi=10.1145/1935826.1935887 |url=https://wikipediaquality.com/wiki/Cross_Lingual_Text_Classification_by_Mining_Multilingual_Topics_from_Wikipedia}}
</nowiki>
</code>

=== HTML ===
<code>
<nowiki>
Ni, Xiaochuan; Sun, Jian-Tao; Hu, Jian; Chen, Zheng. (2011). &amp;quot;<a href="https://wikipediaquality.com/wiki/Cross_Lingual_Text_Classification_by_Mining_Multilingual_Topics_from_Wikipedia">Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia</a>&amp;quot;. DOI: 10.1145/1935826.1935887.
</nowiki>
</code>

[[Category:Scientific works]]

Latest revision as of 07:29, 24 March 2021


Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia
Authors: Xiaochuan Ni, Jian-Tao Sun, Jian Hu, Zheng Chen
Publication date: 2011
DOI: 10.1145/1935826.1935887
Links: Original

Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia is a scientific work related to Wikipedia quality, published in 2011 and written by Xiaochuan Ni, Jian-Tao Sun, Jian Hu and Zheng Chen.

Overview

This paper investigates how to perform cross-lingual text classification effectively by leveraging a large-scale, multilingual knowledge base, Wikipedia. Based on the observation that each Wikipedia concept is described by documents in different languages, the authors adapt existing topic modeling algorithms to mine multilingual topics from this knowledge base. Each extracted topic has multiple representations, one per language. The authors regard the topics extracted from Wikipedia documents as universal-topics, since each topic carries the same semantic information across languages. New documents in different languages can therefore be represented in a common space spanned by a set of universal-topics, which the authors use for cross-lingual text classification. Given training data labeled in one language, a text classifier can be trained to classify documents in another language by mapping all documents of both languages into the universal-topic space. This approach does not require additional linguistic resources such as bilingual dictionaries, machine translation tools, or labeled data for the target language. The evaluation results indicate that the topic modeling approach is effective for building a cross-lingual text classifier.
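
The idea of mapping documents of both languages into one universal-topic space can be illustrated with a small Python sketch. It is not the authors' implementation. The topic-word probabilities, the example sentences, the softmax-style scoring in `to_topic_space`, and the nearest-centroid classifier are all assumptions made for illustration; in the paper the topics are mined from Wikipedia with adapted topic modeling algorithms, and the classifier used there is not specified in this overview.

```python
import math
from collections import Counter

# Hypothetical toy "universal topics": each topic has one word distribution
# per language. These words and probabilities are invented for illustration;
# they are not mined from Wikipedia.
UNIVERSAL_TOPICS = [
    {  # topic 0: sports
        "en": {"team": 0.4, "match": 0.3, "goal": 0.3},
        "es": {"equipo": 0.4, "partido": 0.3, "gol": 0.3},
    },
    {  # topic 1: finance
        "en": {"bank": 0.4, "market": 0.3, "money": 0.3},
        "es": {"banco": 0.4, "mercado": 0.3, "dinero": 0.3},
    },
]


def to_topic_space(text, lang, smoothing=1e-3):
    """Map a document to a normalized vector over the universal topics.

    Simplification (an assumption, not real topic-model inference): each
    topic is scored by the smoothed log-likelihood of the document's words
    under that topic's distribution for `lang`, followed by a softmax.
    """
    counts = Counter(text.lower().split())
    scores = []
    for topic in UNIVERSAL_TOPICS:
        dist = topic[lang]
        scores.append(
            sum(n * math.log(dist.get(w, 0.0) + smoothing) for w, n in counts.items())
        )
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def train_centroids(labeled_docs, lang):
    """Average the topic vectors of the labeled documents per class."""
    sums, counts = {}, Counter()
    for text, label in labeled_docs:
        vec = to_topic_space(text, lang)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(text, lang, centroids):
    """Assign the label whose centroid is closest in topic space."""
    vec = to_topic_space(text, lang)

    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))

    return min(centroids, key=sq_dist)


# Train on labeled English documents only.
english_train = [
    ("the team won the match with a late goal", "sports"),
    ("the bank lost money on the market", "finance"),
]
centroids = train_centroids(english_train, "en")

# Classify Spanish documents without any Spanish labels.
print(classify("el equipo gana el partido", "es", centroids))
print(classify("el banco y el mercado", "es", centroids))
```

Because both the English training documents and the Spanish test documents are reduced to vectors over the same topics, the classifier never compares words across languages directly, which is why no dictionary or translation step appears in the sketch.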

Embed

Wikipedia Quality

Ni, Xiaochuan; Sun, Jian-Tao; Hu, Jian; Chen, Zheng. (2011). "[[Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia]]". DOI: 10.1145/1935826.1935887.

English Wikipedia

{{cite journal |last1=Ni |first1=Xiaochuan |last2=Sun |first2=Jian-Tao |last3=Hu |first3=Jian |last4=Chen |first4=Zheng |title=Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia |date=2011 |doi=10.1145/1935826.1935887 |url=https://wikipediaquality.com/wiki/Cross_Lingual_Text_Classification_by_Mining_Multilingual_Topics_from_Wikipedia}}

HTML

Ni, Xiaochuan; Sun, Jian-Tao; Hu, Jian; Chen, Zheng. (2011). &quot;<a href="https://wikipediaquality.com/wiki/Cross_Lingual_Text_Classification_by_Mining_Multilingual_Topics_from_Wikipedia">Cross Lingual Text Classification by Mining Multilingual Topics from Wikipedia</a>&quot;. DOI: 10.1145/1935826.1935887.