Difference between revisions of "Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification"

From Wikipedia Quality
(+ infobox)
(Embed)
Line 10:
 
== Overview ==
 
The majority of existing text classification algorithms are based on the “bag of words” (BOW) approach, in which documents are represented as weighted occurrence frequencies of individual terms. However, this representation ignores semantic relations between terms. Several studies address this problem by integrating background knowledge such as [[WordNet]], ODP or [[Wikipedia]] as a semantic source. However, the vast majority of these studies are applied to English texts, and to date there are no similar studies on the classification of Turkish documents. The authors empirically analyze the effect of using Turkish Wikipedia (Vikipedi) as a semantic resource in the classification of Turkish documents. Their results demonstrate that the performance of classification algorithms can be improved by exploiting Vikipedi concepts. Additionally, the authors show that Vikipedi concepts have surprisingly large coverage in datasets which mostly consist of Turkish newspaper articles.
+
+ == Embed ==
+ === Wikipedia Quality ===
+ <code>
+ <nowiki>
+ Poyraz, Mitat; Ganiz, Murat Can; Akyokuş, Selim; Görener, Burak; Kilimci, Zeynep Hilal. (2012). "[[Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification]]".DOI: 10.1109/INISTA.2012.6246996.
+ </nowiki>
+ </code>
+
+ === English Wikipedia ===
+ <code>
+ <nowiki>
+ {{cite journal |last1=Poyraz |first1=Mitat |last2=Ganiz |first2=Murat Can |last3=Akyokuş |first3=Selim |last4=Görener |first4=Burak |last5=Kilimci |first5=Zeynep Hilal |title=Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification |date=2012 |doi=10.1109/INISTA.2012.6246996 |url=https://wikipediaquality.com/wiki/Exploiting_Turkish_Wikipedia_as_a_Semantic_Resource_for_Text_Classification}}
+ </nowiki>
+ </code>
+
+ === HTML ===
+ <code>
+ <nowiki>
+ Poyraz, Mitat; Ganiz, Murat Can; Akyokuş, Selim; Görener, Burak; Kilimci, Zeynep Hilal. (2012). &amp;quot;<a href="https://wikipediaquality.com/wiki/Exploiting_Turkish_Wikipedia_as_a_Semantic_Resource_for_Text_Classification">Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification</a>&amp;quot;.DOI: 10.1109/INISTA.2012.6246996.
+ </nowiki>
+ </code>

Revision as of 16:51, 9 May 2020


Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification
Authors: Mitat Poyraz, Murat Can Ganiz, Selim Akyokuş, Burak Görener, Zeynep Hilal Kilimci
Publication date: 2012
DOI: 10.1109/INISTA.2012.6246996
Links: Original

Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification is a scientific work related to Wikipedia quality, published in 2012 and written by Mitat Poyraz, Murat Can Ganiz, Selim Akyokuş, Burak Görener and Zeynep Hilal Kilimci.

Overview

The majority of existing text classification algorithms are based on the “bag of words” (BOW) approach, in which documents are represented as weighted occurrence frequencies of individual terms. However, this representation ignores semantic relations between terms. Several studies address this problem by integrating background knowledge such as WordNet, ODP or Wikipedia as a semantic source. However, the vast majority of these studies are applied to English texts, and to date there are no similar studies on the classification of Turkish documents. The authors empirically analyze the effect of using Turkish Wikipedia (Vikipedi) as a semantic resource in the classification of Turkish documents. Their results demonstrate that the performance of classification algorithms can be improved by exploiting Vikipedi concepts. Additionally, the authors show that Vikipedi concepts have surprisingly large coverage in datasets which mostly consist of Turkish newspaper articles.
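
The contrast described in the overview can be illustrated with a small sketch. It is not the authors' implementation, which this page does not describe. It only shows, under stated assumptions, how a weighted bag-of-words vector treats two documents with no shared terms as unrelated, and how appending concept features can expose a link between them. The `CONCEPT_MAP` lookup, the `CONCEPT:` prefix, the sample sentences and the smoothed tf-idf weighting are all hypothetical choices made for illustration; the concept labels are not taken from real Vikipedi data.

```python
import math
from collections import Counter

# Hypothetical term-to-concept lookup; NOT real Vikipedi data.
CONCEPT_MAP = {
    "galatasaray": "Futbol",
    "fenerbahçe": "Futbol",
    "borsa": "Ekonomi",
    "faiz": "Ekonomi",
}


def tokenize(text):
    # Whitespace split only; inputs are assumed to be lowercase already.
    return text.split()


def enrich(tokens, concept_map):
    # Append one pseudo-term per matched concept, keeping the original terms.
    return tokens + ["CONCEPT:" + concept_map[t] for t in tokens if t in concept_map]


def tfidf(token_lists):
    # Weighted occurrence frequencies: term count times a smoothed idf factor.
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse {term: weight} vectors.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "galatasaray maçı kazandı",
    "fenerbahçe berabere kaldı",
    "borsa güne yükselişle başladı",
]
plain_tokens = [tokenize(d) for d in docs]
enriched_tokens = [enrich(t, CONCEPT_MAP) for t in plain_tokens]

plain = tfidf(plain_tokens)
enriched = tfidf(enriched_tokens)

# Similarity between the first two documents, before and after enrichment.
plain_sim = cosine(plain[0], plain[1])
enriched_sim = cosine(enriched[0], enriched[1])
print(plain_sim, enriched_sim)
```

In this toy setup the first two documents share no surface terms, so their plain BOW similarity is zero, while the shared hypothetical concept feature makes the enriched similarity positive. Appending concepts rather than replacing terms is one possible design choice; the source does not state which strategy the authors used.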

Embed

Wikipedia Quality

Poyraz, Mitat; Ganiz, Murat Can; Akyokuş, Selim; Görener, Burak; Kilimci, Zeynep Hilal. (2012). "[[Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification]]".DOI: 10.1109/INISTA.2012.6246996.

English Wikipedia

{{cite journal |last1=Poyraz |first1=Mitat |last2=Ganiz |first2=Murat Can |last3=Akyokuş |first3=Selim |last4=Görener |first4=Burak |last5=Kilimci |first5=Zeynep Hilal |title=Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification |date=2012 |doi=10.1109/INISTA.2012.6246996 |url=https://wikipediaquality.com/wiki/Exploiting_Turkish_Wikipedia_as_a_Semantic_Resource_for_Text_Classification}}

HTML

Poyraz, Mitat; Ganiz, Murat Can; Akyokuş, Selim; Görener, Burak; Kilimci, Zeynep Hilal. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Exploiting_Turkish_Wikipedia_as_a_Semantic_Resource_for_Text_Classification">Exploiting Turkish Wikipedia as a Semantic Resource for Text Classification</a>&quot;.DOI: 10.1109/INISTA.2012.6246996.