Unsupervised Query Segmentation Using Generative Language Models and Wikipedia
Revision as of 20:41, 6 June 2019
Authors | Bin Tan, Fuchun Peng |
---|---|
Publication date | 2008 |
DOI | 10.1145/1367497.1367545 |
Links | Original: http://dl.acm.org/citation.cfm?id=1367545 ; Preprint: https://www.semanticscholar.org/paper/Unsupervised-query-segmentation-using-generative-Tan-Peng/ea775c61144a28136239f1edfa09b6fedd571db0/figure/3 |
Unsupervised Query Segmentation Using Generative Language Models and Wikipedia is a scientific work related to Wikipedia quality, published in 2008 and written by Bin Tan and Fuchun Peng.
Overview
In this paper, the authors propose a novel unsupervised approach to query segmentation, an important task in Web search. They use a generative query model to recover the underlying concepts that compose a query's original segmented form. The model's parameters are estimated using an expectation-maximization (EM) algorithm that optimizes a minimum description length objective function on a partial corpus specific to the query. To augment this unsupervised learning, the authors incorporate evidence from Wikipedia. Experiments show that the approach dramatically improves performance over the traditional approach based on mutual information, and produces results comparable to those of a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual-information-based method (measured by segment F1 on the Intersection test set). EM optimization improves performance by a further 14.3%, and additional knowledge from Wikipedia provides another 24.3%, adding up to a total improvement of 46% (from 0.530 to 0.774).
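To illustrate the general idea of recovering a query's concepts with a generative model, the following minimal Python sketch picks the segmentation whose segments have the highest joint probability, treating each segment as an independently generated concept. It is not the authors' implementation: the EM estimation and the Wikipedia evidence are omitted, and the probability table `CONCEPT_PROB`, the floor `UNSEEN_LOG_PROB`, the function name `segment`, and the `max_len` limit are all hypothetical values and names invented for this example.

```python
import math

# Hypothetical toy concept probabilities, invented for illustration only.
# In the paper these parameters are estimated with EM; here they are hard-coded.
CONCEPT_PROB = {
    "new": 0.02,
    "york": 0.005,
    "times": 0.01,
    "subscription": 0.004,
    "new york": 0.008,
    "new york times": 0.006,
    "york times": 0.0001,
}

# Crude log-probability floor for segments missing from the table (assumption).
UNSEEN_LOG_PROB = math.log(1e-9)


def segment(query, max_len=3):
    """Return (segments, log_prob) for the most probable segmentation.

    Each segment is scored independently, so the score of a segmentation
    is the sum of the log-probabilities of its segments.
    """
    words = query.split()
    n = len(words)
    # best[i] holds (log_prob, segments) for the best segmentation of words[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        # Try every segment of at most max_len words that ends at position `end`.
        for start in range(max(0, end - max_len), end):
            seg = " ".join(words[start:end])
            if seg in CONCEPT_PROB:
                log_p = math.log(CONCEPT_PROB[seg])
            else:
                log_p = UNSEEN_LOG_PROB
            candidate = best[start][0] + log_p
            if candidate > best[end][0]:
                best[end] = (candidate, best[start][1] + [seg])
    return best[n][1], best[n][0]


if __name__ == "__main__":
    segments, log_prob = segment("new york times subscription")
    print(segments, log_prob)
```

With the toy table above, `segment("new york times subscription")` selects the segments "new york times" and "subscription", because that split has a higher joint probability than any split into shorter pieces. Dynamic programming is used here so that all segmentations are considered without enumerating them one by one.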