Difference between revisions of "Cross-Media Topic Mining on Wikipedia"

From Wikipedia Quality
(Wikilinks)
(+ infobox)
Line 1: Line 1:
+ {{Infobox work
+ | title = Cross-Media Topic Mining on Wikipedia
+ | date = 2013
+ | authors = [[Xikui Wang]]<br />[[Yang Liu]]<br />[[Donghui Wang]]<br />[[Fei Wu]]
+ | doi = 10.1145/2502081.2502180
+ | link = http://dl.acm.org/ft_gateway.cfm?id=2502180&amp;type=pdf
+ }}
 
'''Cross-Media Topic Mining on Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2013 and written by [[Xikui Wang]], [[Yang Liu]], [[Donghui Wang]] and [[Fei Wu]].
 
  
 
== Overview ==
 
 
As a collaborative wiki-based encyclopedia, [[Wikipedia]] provides a huge number of articles in various [[categories]]. In addition to its text corpus, Wikipedia also contains plenty of images, which make the articles more intuitive for readers to understand. To better organize these visual and textual data, one promising area of research is to jointly model the topics embedded in multi-modal (i.e., cross-media) data from Wikipedia. In this work, the authors propose to learn projection matrices that map data from heterogeneous feature spaces into a unified latent topic space. Unlike previous approaches, the method imposes ℓ<sub>1</sub> regularizers on the projection matrices, so that only a small number of relevant visual/textual words are associated with each topic, which makes the model more interpretable and robust. Furthermore, the correlations between Wikipedia data in different modalities are explicitly considered in the model. The effectiveness of the proposed topic extraction algorithm is verified by several experiments conducted on real Wikipedia datasets.
 

Revision as of 07:46, 30 April 2020


Cross-Media Topic Mining on Wikipedia
Authors: Xikui Wang, Yang Liu, Donghui Wang, Fei Wu
Publication date: 2013
DOI: 10.1145/2502081.2502180
Links: Original

Cross-Media Topic Mining on Wikipedia is a scientific work related to Wikipedia quality, published in 2013 and written by Xikui Wang, Yang Liu, Donghui Wang and Fei Wu.

Overview

As a collaborative wiki-based encyclopedia, Wikipedia provides a huge number of articles in various categories. In addition to its text corpus, Wikipedia also contains plenty of images, which make the articles more intuitive for readers to understand. To better organize these visual and textual data, one promising area of research is to jointly model the topics embedded in multi-modal (i.e., cross-media) data from Wikipedia. In this work, the authors propose to learn projection matrices that map data from heterogeneous feature spaces into a unified latent topic space. Unlike previous approaches, the method imposes ℓ1 regularizers on the projection matrices, so that only a small number of relevant visual/textual words are associated with each topic, which makes the model more interpretable and robust. Furthermore, the correlations between Wikipedia data in different modalities are explicitly considered in the model. The effectiveness of the proposed topic extraction algorithm is verified by several experiments conducted on real Wikipedia datasets.
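
The effect of ℓ1 regularization on projection matrices can be illustrated with a minimal sketch. This is not the authors' algorithm, whose objective and optimization procedure are not described on this page. It only applies soft-thresholding, the standard proximal operator of an ℓ1 penalty, to two small hand-written projection matrices. The vocabularies, weights, threshold `lam`, and all function names are hypothetical and chosen purely for illustration.

```python
from typing import List

Matrix = List[List[float]]


def soft_threshold(value: float, lam: float) -> float:
    """Proximal operator of lam * |value|: shrink toward zero, clip small values to zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparsify(projection: Matrix, lam: float) -> Matrix:
    """Apply soft-thresholding to every entry of a (topics x words) projection matrix."""
    return [[soft_threshold(w, lam) for w in row] for row in projection]


def project(projection: Matrix, features: List[float]) -> List[float]:
    """Map a feature vector from its own feature space into the shared topic space."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]


def active_words(projection: Matrix, vocabulary: List[str]) -> List[List[str]]:
    """For each topic (row), list the words whose weight is still nonzero."""
    return [[word for w, word in zip(row, vocabulary) if w != 0.0]
            for row in projection]


# Hypothetical toy vocabularies: textual words and quantized visual words.
text_vocab = ["goal", "league", "guitar", "album"]
visual_vocab = ["grass_patch", "ball_patch", "stage_patch"]

# Hypothetical dense projection matrices with 2 topics (rows) each.
text_projection = [[0.9, 0.7, 0.05, -0.02],
                   [0.03, -0.04, 0.8, 0.6]]
visual_projection = [[0.6, 0.8, 0.1],
                     [-0.05, 0.02, 0.9]]

lam = 0.1  # assumed regularization strength
sparse_text = sparsify(text_projection, lam)
sparse_visual = sparsify(visual_projection, lam)

# Only a few words remain associated with each topic.
text_topics = active_words(sparse_text, text_vocab)
visual_topics = active_words(sparse_visual, visual_vocab)

# Text and image features of one hypothetical article land in the same topic space.
article_text = [2.0, 1.0, 0.0, 0.0]
article_image = [1.0, 1.0, 0.0]
text_in_topic_space = project(sparse_text, article_text)
image_in_topic_space = project(sparse_visual, article_image)

print(text_topics)
print(visual_topics)
print(text_in_topic_space, image_in_topic_space)
```

In this toy setting the small weights are clipped to exactly zero, so each topic keeps only a handful of textual and visual words, and both modalities of the example article are expressed in the same two-dimensional topic space. A real system would learn the matrices from data instead of writing them by hand.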