Unsupervised Query Segmentation Using Generative Language Models and Wikipedia

Authors
Bin Tan
Fuchun Peng
Publication date
2008
DOI
10.1145/1367497.1367545
Links
Original Preprint

Unsupervised Query Segmentation Using Generative Language Models and Wikipedia is a scientific work related to Wikipedia quality, published in 2008 and written by Bin Tan and Fuchun Peng.

Overview

In this paper, the authors propose a novel unsupervised approach to query segmentation, an important task in Web search. They use a generative query model to recover the underlying concepts that compose a query in its original segmented form. The model's parameters are estimated with an expectation-maximization (EM) algorithm that optimizes a minimum description length (MDL) objective function on a partial corpus specific to the query. To augment this unsupervised learning, the authors incorporate evidence from Wikipedia. Experiments show that the approach dramatically improves performance over the traditional approach based on mutual information and produces results comparable to those of a supervised method. In particular, the basic generative language model contributes a 7.4% improvement over the mutual-information-based method (measured by segment F1 on the Intersection test set). EM optimization improves performance by a further 14.3%, and additional knowledge from Wikipedia provides another 24.3%, adding up to a total improvement of 46% (from 0.530 to 0.774).
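To illustrate what "recovering a query's underlying concepts" with a generative model can look like, here is a minimal sketch of the decoding step only. It assumes, for illustration, a unigram model in which each segment (concept) is generated independently, so the best segmentation maximizes the sum of segment log-probabilities and can be found by dynamic programming. The function name `segment_query`, the `concept_prob` table, the `max_len` limit, and all probability values are hypothetical toy choices; the paper's EM estimation, MDL objective, query-specific corpus, and Wikipedia evidence are not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def segment_query(words: List[str], concept_prob: Dict[str, float],
                  max_len: int = 4) -> Tuple[List[str], float]:
    """Return the highest-scoring segmentation of `words` and its log-probability,
    assuming segments are drawn independently from `concept_prob` (hypothetical)."""
    n = len(words)
    # best[i] = (best log-prob of words[:i], start index of its last segment)
    best: List[Tuple[float, int]] = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            seg = " ".join(words[start:end])
            p = concept_prob.get(seg)
            # Skip segments the toy model does not know, or unreachable prefixes.
            if p is None or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("no segmentation covers the query")
    # Walk the back-pointers from the end of the query to recover the segments.
    segments: List[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        segments.append(" ".join(words[start:end]))
        end = start
    return segments[::-1], best[n][0]


# Invented toy probabilities, for illustration only.
TOY_PROBS = {
    "new": 0.02, "york": 0.005, "new york": 0.01,
    "times": 0.01, "new york times": 0.004,
    "square": 0.008, "times square": 0.006,
}

segs, logp = segment_query("new york times square".split(), TOY_PROBS)
print(segs)  # ['new york', 'times square']
```

With these toy values, "new york" + "times square" (0.01 × 0.006) narrowly beats "new york times" + "square" (0.004 × 0.008); the paper's contribution lies largely in how such segment probabilities are estimated, which this sketch takes as given.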

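The reported gains sum exactly to the total (7.4% + 14.3% + 24.3% = 46%), which is consistent with each gain being measured relative to the 0.530 mutual-information baseline rather than compounded stage by stage. The short check below is an illustrative reading of the overview's numbers, not the paper's own calculation: the intermediate F1 values it prints (0.569 and 0.645) are derived arithmetic, not figures reported in the overview, and the helper names are hypothetical.

```python
BASELINE_F1 = 0.530  # mutual-information baseline, segment F1 (Intersection test set)

# Gains from the overview, in the order they are added.
GAINS = [("generative language model", 0.074),
         ("EM optimization", 0.143),
         ("Wikipedia knowledge", 0.243)]


def cumulative_f1(baseline, gains):
    """Additive reading: each gain is a fraction of the baseline F1."""
    f1 = baseline
    steps = []
    for name, gain in gains:
        f1 += baseline * gain
        steps.append((name, round(f1, 3)))
    return steps


def compounded_f1(baseline, gains):
    """Alternative reading: each gain is relative to the previous stage."""
    f1 = baseline
    for _, gain in gains:
        f1 *= 1 + gain
    return round(f1, 3)


for name, f1 in cumulative_f1(BASELINE_F1, GAINS):
    print(f"{name}: {f1}")
print("compounded:", compounded_f1(BASELINE_F1, GAINS))
```

The additive reading ends at 0.774, matching the reported final F1, whereas compounding the three gains would give about 0.809.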
Embed

Wikipedia Quality

Tan, Bin; Peng, Fuchun. (2008). "[[Unsupervised Query Segmentation Using Generative Language Models and Wikipedia]]". DOI: 10.1145/1367497.1367545.

English Wikipedia

{{cite journal |last1=Tan |first1=Bin |last2=Peng |first2=Fuchun |title=Unsupervised Query Segmentation Using Generative Language Models and Wikipedia |date=2008 |doi=10.1145/1367497.1367545 |url=https://wikipediaquality.com/wiki/Unsupervised_Query_Segmentation_Using_Generative_Language_Models_and_Wikipedia}}

HTML

Tan, Bin; Peng, Fuchun. (2008). &quot;<a href="https://wikipediaquality.com/wiki/Unsupervised_Query_Segmentation_Using_Generative_Language_Models_and_Wikipedia">Unsupervised Query Segmentation Using Generative Language Models and Wikipedia</a>&quot;. DOI: 10.1145/1367497.1367545.