Difference between revisions of "Unsupervised Query Segmentation Using Generative Language Models and Wikipedia"

Revision as of 20:41, 6 June 2019

Unsupervised Query Segmentation Using Generative Language Models and Wikipedia
Authors	Bin Tan, Fuchun Peng
Publication date	2008
DOI	10.1145/1367497.1367545
Links	Original Preprint

Unsupervised Query Segmentation Using Generative Language Models and Wikipedia is a scientific work related to Wikipedia quality, published in 2008 and written by Bin Tan and Fuchun Peng.

Overview

In this paper, the authors propose a novel unsupervised approach to query segmentation, an important task in Web search. They use a generative query model to recover the underlying concepts that compose a query's original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm that optimizes a minimum description length objective function on a partial corpus specific to the query. To augment this unsupervised learning, the authors incorporate evidence from Wikipedia. Experiments show that the approach dramatically improves performance over the traditional approach based on mutual information and produces results comparable to those of a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual-information-based method (measured by segment F1 on the Intersection test set). EM optimization improves performance by a further 14.3%, and additional knowledge from Wikipedia provides another 24.3%, adding up to a total improvement of 46% (from 0.530 to 0.774).
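
To make the idea of recovering a query's underlying concepts more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a unigram model over concepts, in which a segmentation's probability is the product of its concepts' probabilities, and it finds the most probable segmentation by dynamic programming. The probability table `CONCEPT_PROB`, the function name `segment`, and the `max_len` parameter are hypothetical and chosen for illustration. In the paper the parameters are estimated with EM and refined with Wikipedia evidence, whereas here they are set by hand.

```python
import math

# Hypothetical concept probabilities, set by hand for illustration only.
# They are NOT taken from the paper, which estimates parameters with EM.
CONCEPT_PROB = {
    "new": 0.02,
    "york": 0.005,
    "times": 0.01,
    "subscription": 0.004,
    "new york": 0.008,
    "york times": 0.0001,
    "new york times": 0.006,
}


def segment(words, concept_prob, max_len=3):
    """Return the most probable segmentation of `words`.

    Assumes a unigram concept model: the probability of a segmentation
    is the product of the probabilities of its concepts. Returns an
    empty list if no segmentation is possible with the known concepts.
    """
    n = len(words)
    # best[i] holds (log-probability, segments) for the first i words.
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            concept = " ".join(words[start:end])
            prob = concept_prob.get(concept)
            if prob is None:
                continue  # unknown concept: cannot end a segment here
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [concept])
    return best[n][1]


if __name__ == "__main__":
    print(segment("new york times subscription".split(), CONCEPT_PROB))
```

With these hand-set values, the multiword concept "new york times" is preferred because its probability exceeds the product of the probabilities of its parts. A mutual-information baseline would instead decide each boundary from the association between adjacent words only.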

@@ Line 1: / Line 1: @@
+{{Infobox work
+| title = Unsupervised Query Segmentation Using Generative Language Models and Wikipedia
+| date = 2008
+| authors = [[Bin Tan]]<br />[[Fuchun Peng]]
+| doi = 10.1145/1367497.1367545
+| link = http://dl.acm.org/citation.cfm?id=1367545
+| plink = https://www.semanticscholar.org/paper/Unsupervised-query-segmentation-using-generative-Tan-Peng/ea775c61144a28136239f1edfa09b6fedd571db0/figure/3
+}}
 '''Unsupervised Query Segmentation Using Generative Language Models and Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2008 and written by [[Bin Tan]] and [[Fuchun Peng]].
 == Overview ==
 In this paper, the authors propose a novel unsupervised approach to query segmentation, an important task in Web search. They use a generative query model to recover the underlying concepts that compose a query's original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm that optimizes a minimum description length objective function on a partial corpus specific to the query. To augment this unsupervised learning, the authors incorporate evidence from [[Wikipedia]]. Experiments show that the approach dramatically improves performance over the traditional approach based on mutual information and produces results comparable to those of a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual-information-based method (measured by segment F1 on the Intersection test set). EM optimization improves performance by a further 14.3%, and additional knowledge from Wikipedia provides another 24.3%, adding up to a total improvement of 46% (from 0.530 to 0.774).
