Latest revision as of 07:17, 22 June 2020
Authors: Zsolt Minier, Zalán Bodó, Lehel Csató
Publication date: 2007
DOI: 10.1109/SYNASC.2007.8
Links: Original
Wikipedia-Based Kernels for Text Categorization - a scientific work related to Wikipedia quality, published in 2007 and written by Zsolt Minier, Zalán Bodó and Lehel Csató.
Overview
In recent years several models have been proposed for text categorization. Among these, one of the most widely applied is the vector space model (VSM), in which independence between indexing terms, usually words, is assumed. Since training corpora are relatively small compared to what would be required for a realistic number of words, the generalization power of the learning algorithms is low. The assumption is that a larger text corpus can improve the representation and hence the learning process. Building on the work of Gabrilovich and Markovitch [6], the authors incorporate Wikipedia articles into the system to provide a distributional word representation for documents. Extending the corpus in this way increases dimensionality, so clustering of features is needed. The authors use latent semantic analysis (LSA), kernel principal component analysis (KPCA) and kernel canonical correlation analysis (KCCA), and present results for these experiments on the Reuters corpus.
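The dimensionality-reduction step described above can be illustrated with a minimal LSA sketch: build a term-document matrix, take a truncated SVD, and compute a linear kernel between documents in the latent space. This is a toy illustration, not the paper's implementation; the documents, vocabulary handling, and rank k below are all invented for the example (in the paper's setting, Wikipedia articles would be added as extra columns to enrich the term representations).

```python
import numpy as np

# Toy document collection (illustrative only).
docs = [
    "kernel methods for text categorization",
    "wikipedia articles enrich text representation",
    "latent semantic analysis reduces dimensionality",
    "kernel principal component analysis on text",
]

# Term-document count matrix: rows = terms, columns = documents.
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# LSA: rank-k truncated SVD, X ~ U_k S_k V_k^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_lsa = (np.diag(s[:k]) @ Vt[:k]).T  # each row: a document in the k-dim latent space

# Linear kernel between documents in the reduced space,
# usable by any kernel-based classifier (e.g. an SVM).
K = doc_lsa @ doc_lsa.T
print(K.shape)  # (4, 4)
```

A kernel matrix like `K` is what a kernel-based learner consumes; KPCA and KCCA, also used in the paper, produce analogous reduced representations by working on kernels rather than on the raw matrix.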
Embed
Wikipedia Quality
Minier, Zsolt; Bodó, Zalán; Csató, Lehel. (2007). "[[Wikipedia-Based Kernels for Text Categorization]]". DOI: 10.1109/SYNASC.2007.8.
English Wikipedia
{{cite journal |last1=Minier |first1=Zsolt |last2=Bodó |first2=Zalán |last3=Csató |first3=Lehel |title=Wikipedia-Based Kernels for Text Categorization |date=2007 |doi=10.1109/SYNASC.2007.8 |url=https://wikipediaquality.com/wiki/Wikipedia-Based_Kernels_for_Text_Categorization}}
HTML
Minier, Zsolt; Bodó, Zalán; Csató, Lehel. (2007). "<a href="https://wikipediaquality.com/wiki/Wikipedia-Based_Kernels_for_Text_Categorization">Wikipedia-Based Kernels for Text Categorization</a>". DOI: 10.1109/SYNASC.2007.8.