UMass at TREC 2010 Web Track: Term Dependence, Spam Filtering and Quality Bias



Authors
Michael Bendersky
David Fisher
W. Bruce Croft
Publication date
2010
ISSN
1048776X

UMass at TREC 2010 Web Track: Term Dependence, Spam Filtering and Quality Bias is a scientific work about Wikipedia quality, published in 2010 and written by Michael Bendersky, David Fisher and W. Bruce Croft.

Overview

Many existing retrieval approaches treat all documents in the collection equally and do not take into account the content quality of the retrieved documents. In their submissions to the TREC 2010 Web Track, the authors use quality-biased ranking methods that aim to promote documents likely to contain high-quality content and to penalize spam and low-quality documents. Their experiments with the ad hoc web topics from TREC 2010 show that features such as the spamminess of a document (as computed by the Waterloo team [6]) and its readability (modeled by the fraction of stopwords in the document) are very important for improving precision at the top ranks. Promoting high-quality Wikipedia pages leads to further improvements in retrieval performance. In addition, the authors found that using Wikipedia as a high-quality document collection for query expansion can ameliorate some of the negative effects of performing pseudo-relevance feedback on a noisy web collection such as ClueWeb09.
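
The quality features named above can be illustrated with a small sketch. This is not the authors' implementation: the stopword list, the linear combination, the weights, and the names `Doc`, `stopword_fraction`, `quality_biased_score` and `rerank` are all hypothetical, and the spam score is assumed to be a percentile in [0, 100) where a higher value means less spammy.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is", "for", "on"})


@dataclass
class Doc:
    doc_id: str
    text: str
    relevance: float          # score from some base retrieval model (assumed given)
    spam_percentile: float    # assumed: 0..100, higher means less spammy
    is_wikipedia: bool = False


def stopword_fraction(text: str) -> float:
    """Fraction of whitespace-separated tokens that are stopwords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in STOPWORDS for token in tokens) / len(tokens)


def quality_biased_score(doc: Doc, w_stop: float = 1.0,
                         w_spam: float = 1.0, w_wiki: float = 0.5) -> float:
    """Add weighted quality features to the base relevance score.

    The linear form and the default weights are assumptions, not values
    reported in the paper.
    """
    return (doc.relevance
            + w_stop * stopword_fraction(doc.text)
            + w_spam * (doc.spam_percentile / 100.0)
            + w_wiki * (1.0 if doc.is_wikipedia else 0.0))


def rerank(docs: list[Doc]) -> list[Doc]:
    """Sort documents by quality-biased score, best first."""
    return sorted(docs, key=quality_biased_score, reverse=True)
```

In this sketch, two documents with the same base relevance are separated by their quality features, so a page with natural-looking text and a high spam percentile moves ahead of a keyword-stuffed page. In practice the weights would have to be tuned on held-out relevance judgments.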

Embed

Wikipedia Quality

Bendersky, Michael; Fisher, David; Croft, W. Bruce. (2010). "[[UMass at TREC 2010 Web Track: Term Dependence, Spam Filtering and Quality Bias]]". NIST Special Publication 2010, 7p. ISSN: 1048776X.

English Wikipedia

{{cite journal |last1=Bendersky |first1=Michael |last2=Fisher |first2=David |last3=Croft |first3=W. Bruce |title=UMass at TREC 2010 Web Track: Term Dependence, Spam Filtering and Quality Bias |date=2010 |issn=1048776X |url=https://wikipediaquality.com/wiki/UMass_at_TREC_2010_Web_Track:_Term_Dependence,_Spam_Filtering_and_Quality_Bias |journal=NIST Special Publication 2010, 7p}}

HTML

Bendersky, Michael; Fisher, David; Croft, W. Bruce. (2010). &quot;<a href="https://wikipediaquality.com/wiki/UMass_at_TREC_2010_Web_Track:_Term_Dependence,_Spam_Filtering_and_Quality_Bias">UMass at TREC 2010 Web Track: Term Dependence, Spam Filtering and Quality Bias</a>&quot;. NIST Special Publication 2010, 7p. ISSN: 1048776X.