Difference between revisions of "Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree"

From Wikipedia Quality
(New scientific work)
 
Line 10: Line 10:
  
 
== Overview ==
 
== Overview ==
The number of articles in [[Wikipedia]] is growing rapidly, so it is important for Wikipedia to provide users with high-quality, reliable articles. However, the quality assessment metrics provided by Wikipedia are inefficient, and other mainstream quality detection methods focus only on English Wikipedia articles and usually analyze the text content of articles, which is a time-consuming process. In this paper, the authors propose a method for detecting article quality in the Chinese Wikipedia based on the C4.5 decision tree. The problem of quality detection is recast as a binary classification problem: distinguishing high-quality articles from low-quality ones. Using fields from the tables of the Chinese Wikipedia database, the authors build decision trees to separate the two classes.
+
The number of articles in [[Wikipedia]] is growing rapidly, so it is important for Wikipedia to provide users with high-quality, reliable articles. However, the quality assessment metrics provided by Wikipedia are inefficient, and other mainstream quality detection methods focus only on English Wikipedia articles and usually analyze the text content of articles, which is a time-consuming process. In this paper, the authors propose a method for detecting article quality in the Chinese Wikipedia based on the C4.5 decision tree. The problem of quality detection is recast as a binary classification problem: distinguishing high-quality articles from low-quality ones. Using fields from the tables of the Chinese Wikipedia database, the authors build decision trees to separate the two classes. The work was supported by the National Natural Science Foundation of [[China]].
 +
 
 +
== Measures ==
 +
* [[Page length]] (from [[page.sql.gz]])
 +
* Number of [[external links]] (from [[externallinks.sql.gz]])
 +
* Number of [[images]] (from [[imagelinks.sql.gz]])
 +
* Number of [[in-links]] and [[out-links]] (from [[pagelinks.sql.gz]])
 +
* Number of [[edits]] and number of [[editors]] (from [[pages-meta-history]])
 +
 
 +
== Algorithms ==
 +
Quality assessment was framed as a binary classification problem separating high-quality from low-quality articles. Article quality in the Chinese Wikipedia was detected with the [[C4.5]] algorithm, using the [[Weka]] software.
  
 
== Embed ==
 
== Embed ==
Line 37: Line 47:
  
 
[[Category:Scientific works]]
 
[[Category:Scientific works]]
 +
[[Category:C4.5]]
 +
[[Category:Binary classification]]
 +
[[Category:Classification]]
 +
[[Category:Weka]]
 +
[[Category:Page length]]
 +
[[Category:Number of images]]
 +
[[Category:Number of external links]]
 +
[[Category:Number of edits]]
 +
[[Category:Number of authors]]
 +
[[Category:Number of in-links]]
 +
[[Category:Number of out-links]]

Revision as of 16:10, 16 May 2019

Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree
Authors
Kui Xiao
Bing Li
Peng He
Xihui Yang
Publication date
2013
ISSN
03029743
ISBN
978-364239786-8
DOI
10.1007/978-3-642-39787-5_36
Links

Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree is a scientific work about Wikipedia quality, published in 2013 and written by Kui Xiao, Bing Li, Peng He and Xihui Yang.

Overview

The number of articles in Wikipedia is growing rapidly, so it is important for Wikipedia to provide users with high-quality, reliable articles. However, the quality assessment metrics provided by Wikipedia are inefficient, and other mainstream quality detection methods focus only on English Wikipedia articles and usually analyze the text content of articles, which is a time-consuming process. In this paper, the authors propose a method for detecting article quality in the Chinese Wikipedia based on the C4.5 decision tree. The problem of quality detection is recast as a binary classification problem: distinguishing high-quality articles from low-quality ones. Using fields from the tables of the Chinese Wikipedia database, the authors build decision trees to separate the two classes. The work was supported by the National Natural Science Foundation of China.

Measures

* Page length (from page.sql.gz)
* Number of external links (from externallinks.sql.gz)
* Number of images (from imagelinks.sql.gz)
* Number of in-links and out-links (from pagelinks.sql.gz)
* Number of edits and number of editors (from pages-meta-history)

Algorithms

Quality assessment was framed as a binary classification problem separating high-quality from low-quality articles. Article quality in the Chinese Wikipedia was detected with the C4.5 algorithm, using the Weka software.
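
As a rough illustration of the split criterion that C4.5 is known for, the sketch below computes the gain ratio of a binary split on a single numeric measure and picks the best threshold. It is not the authors' setup: the paper's experiments were run in Weka, and the function names, the page-length values and the labels here are hypothetical, chosen only to make the example runnable. A full C4.5 implementation also handles pruning, missing values and multiple attributes, all of which are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    w_left, w_right = len(left) / total, len(right) / total
    # Information gain: entropy before the split minus weighted entropy after it.
    gain = entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))
    # Split information penalizes very unbalanced partitions.
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


def best_threshold(values, labels):
    """Return (gain_ratio, threshold) for the best split on one numeric attribute."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split anything off
    return max(((gain_ratio(values, labels, t), t) for t in candidates),
               default=(0.0, None))


# Hypothetical toy data: page length in bytes and a quality label per article.
page_lengths = [1200, 800, 15000, 22000, 950, 18000]
labels = ["low", "low", "high", "high", "low", "high"]

ratio, threshold = best_threshold(page_lengths, labels)
print(f"best split: page_length <= {threshold} (gain ratio {ratio:.3f})")
```

In a decision tree learner, the same computation would be repeated for every measure (external links, images, in-links, and so on), and the attribute with the highest gain ratio would become the next node of the tree.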

Embed

Wikipedia Quality

Xiao, Kui; Li, Bing; He, Peng; Yang, Xihui. (2013). "[[Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree]]". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 8041 LNAI, 2013, pp. 444-452. ISBN: 978-364239786-8. ISSN: 03029743. DOI: 10.1007/978-3-642-39787-5_36.

English Wikipedia

{{cite journal |last1=Xiao |first1=Kui |last2=Li |first2=Bing |last3=He |first3=Peng |last4=Yang |first4=Xihui |title=Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree |date=2013 |isbn=978-364239786-8 |issn=03029743 |doi=10.1007/978-3-642-39787-5_36 |url=https://wikipediaquality.com/wiki/Detection_of_Article_Qualities_in_the_Chinese_Wikipedia_Based_on_C4.5_Decision_Tree |journal=Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 8041 LNAI, 2013, pp. 444-452}}

HTML

Xiao, Kui; Li, Bing; He, Peng; Yang, Xihui. (2013). &quot;<a href="https://wikipediaquality.com/wiki/Detection_of_Article_Qualities_in_the_Chinese_Wikipedia_Based_on_C4.5_Decision_Tree">Detection of Article Qualities in the Chinese Wikipedia Based on C4.5 Decision Tree</a>&quot;. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 8041 LNAI, 2013, pp. 444-452. ISBN: 978-364239786-8. ISSN: 03029743. DOI: 10.1007/978-3-642-39787-5_36.