Using Anchor Text, Spam Filtering and Wikipedia for Web Search and Entity Ranking

From Wikipedia Quality


Authors
Jaap Kamps
Rianne Kaptein
Marijn Koolen
Publication date
2010
ISSN
1048776X

Using Anchor Text, Spam Filtering and Wikipedia for Web Search and Entity Ranking - scientific work about Wikipedia quality published in 2010, written by Jaap Kamps, Rianne Kaptein and Marijn Koolen.

Overview

In this paper, the authors document their participation in the TREC 2010 Entity Ranking and Web Tracks. For the Web Track, they compare the effectiveness of anchor text in the ClueWeb09 category A and category B collections, and measure the impact of global document quality measures such as PageRank and spam scores. They find that documents in ClueWeb09 category B have a higher probability of being retrieved than the remaining documents in category A. In category B, spam is mainly an issue for full-text retrieval, while anchor text suffers little from spam. Spam scores can be used not only to filter spam but also to find key resources: the documents that are least likely to be spam tend to be high-quality results.

For the Entity Ranking Track, the authors use Wikipedia as a pivot to find relevant entities on the Web. Using category information to retrieve entities within Wikipedia leads to large improvements over their baseline run, which does not use category information, although their best scores are still weak. To find the homepages of the entities in the ClueWeb collection, following the external links on the Wikipedia pages works better than either searching an anchor text index or combining the external links with an anchor text index search.
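The dual use of spam scores described in the overview can be illustrated with a minimal Python sketch. It assumes percentile-style spam scores (0 to 99, lower meaning more likely spam), such as the Waterloo spam rankings distributed for ClueWeb09. The function name `filter_by_spam`, the thresholds 70 and 90, and the toy document ids are hypothetical and are not taken from the paper. A single cut-off function covers both uses: a moderate threshold removes likely spam from a run, while a high threshold keeps only the documents least likely to be spam, as candidate key resources.

```python
from typing import Dict, List, Tuple

# A ranked run: (document id, retrieval score), best first.
Run = List[Tuple[str, float]]


def filter_by_spam(run: Run, spam_percentile: Dict[str, int],
                   min_percentile: int) -> Run:
    """Keep documents whose spam percentile is at least `min_percentile`.

    Assumption: percentiles range from 0 to 99, and a lower value means
    the document is more likely to be spam. Documents with no spam score
    are dropped (a hypothetical policy chosen for this sketch).
    The original ranking order is preserved.
    """
    return [
        (doc, score)
        for doc, score in run
        if doc in spam_percentile and spam_percentile[doc] >= min_percentile
    ]


if __name__ == "__main__":
    # Toy data (hypothetical ids and scores).
    run = [("d1", 12.3), ("d2", 11.0), ("d3", 9.5)]
    spam = {"d1": 95, "d2": 12, "d3": 75}

    # Moderate threshold: spam filtering.
    print(filter_by_spam(run, spam, 70))  # [('d1', 12.3), ('d3', 9.5)]
    # High threshold: least-spammy documents as key-resource candidates.
    print(filter_by_spam(run, spam, 90))  # [('d1', 12.3)]
```

Because filtering keeps the original order, the same run can be cut at different thresholds without re-scoring. A re-ranking variant that mixes the spam percentile into the retrieval score would be an alternative design, but that is not shown here.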

Embed

Wikipedia Quality

Kamps, Jaap; Kaptein, Rianne; Koolen, Marijn. (2010). "[[Using Anchor Text, Spam Filtering and Wikipedia for Web Search and Entity Ranking]]". NIST Special Publication 2010, 7p. ISSN: 1048776X.

English Wikipedia

{{cite journal |last1=Kamps |first1=Jaap |last2=Kaptein |first2=Rianne |last3=Koolen |first3=Marijn |title=Using Anchor Text, Spam Filtering and Wikipedia for Web Search and Entity Ranking |date=2010 |issn=1048776X |url=https://wikipediaquality.com/wiki/Using_Anchor_Text,_Spam_Filtering_and_Wikipedia_for_Web_Search_and_Entity_Ranking |journal=NIST Special Publication 2010, 7p}}

HTML

Kamps, Jaap; Kaptein, Rianne; Koolen, Marijn. (2010). &quot;<a href="https://wikipediaquality.com/wiki/Using_Anchor_Text,_Spam_Filtering_and_Wikipedia_for_Web_Search_and_Entity_Ranking">Using Anchor Text, Spam Filtering and Wikipedia for Web Search and Entity Ranking</a>&quot;. NIST Special Publication 2010, 7p. ISSN: 1048776X.