Difference between revisions of "Understanding User's Query Intent with Wikipedia"

(Adding new article - Understanding User's Query Intent with Wikipedia)
 
(+ links)
Line 1: Line 1:
'''Understanding User's Query Intent with Wikipedia''' - scientific work related to Wikipedia quality published in 2009, written by Jian Hu, Gang Wang, Frederick H. Lochovsky, Jian-Tao Sun and Zheng Chen.
+
'''Understanding User's Query Intent with Wikipedia''' - scientific work related to [[Wikipedia quality]] published in 2009, written by [[Jian Hu]], [[Gang Wang]], [[Frederick H. Lochovsky]], [[Jian-Tao Sun]] and [[Zheng Chen]].
  
 
== Overview ==
 
== Overview ==
Understanding the intent behind a user's query can help search engine to automatically route the query to some corresponding vertical search engines to obtain particularly relevant contents, thus, greatly improving user satisfaction. There are three major challenges to the query intent classification problem: (1) Intent representation; (2) Domain coverage and (3) Semantic interpretation. Current approaches to predict the user's intent mainly utilize machine learning techniques. However, it is difficult and often requires many human efforts to meet all these challenges by the statistical machine learning approaches. In this paper, authors propose a general methodology to the problem of query intent classification. With very little human effort, method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge base. The Wikipedia concepts are used as the intent representation space, thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified through mapping the query into the Wikipedia representation space. Compared with previous approaches, proposed method can achieve much better coverage to classify queries in an intent domain even through the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. Authors demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. Authors perform the quantitative evaluations in comparison with two baseline methods, and the experimental results shows that method significantly outperforms other methods in each intent domain.
+
Understanding the intent behind a user's query can help search engine to automatically route the query to some corresponding vertical search engines to obtain particularly relevant contents, thus, greatly improving user satisfaction. There are three major challenges to the query intent classification problem: (1) Intent representation; (2) Domain coverage and (3) Semantic interpretation. Current approaches to predict the user's intent mainly utilize machine learning techniques. However, it is difficult and often requires many human efforts to meet all these challenges by the statistical machine learning approaches. In this paper, authors propose a general methodology to the problem of query intent classification. With very little human effort, method can discover large quantities of intent concepts by leveraging [[Wikipedia]], one of the best human knowledge base. The Wikipedia concepts are used as the intent representation space, thus, each intent domain is represented as a set of Wikipedia articles and [[categories]]. The intent of any input query is identified through mapping the query into the Wikipedia representation space. Compared with previous approaches, proposed method can achieve much better coverage to classify queries in an intent domain even through the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. Authors demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. Authors perform the quantitative evaluations in comparison with two baseline methods, and the experimental results shows that method significantly outperforms other methods in each intent domain.

Revision as of 11:47, 18 October 2020

Understanding User's Query Intent with Wikipedia is a scientific work related to Wikipedia quality, published in 2009 and written by Jian Hu, Gang Wang, Frederick H. Lochovsky, Jian-Tao Sun and Zheng Chen.

Overview

Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines and obtain particularly relevant content, thus greatly improving user satisfaction. The query intent classification problem poses three major challenges: (1) intent representation, (2) domain coverage, and (3) semantic interpretation. Current approaches to predicting a user's intent mainly rely on machine learning techniques. However, meeting all of these challenges with statistical machine learning approaches is difficult and often requires considerable human effort. In this paper, the authors propose a general methodology for query intent classification. With very little human effort, the method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. Wikipedia concepts are used as the intent representation space, so each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, the proposed method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. The authors demonstrate its effectiveness in three different applications: travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. The authors perform quantitative evaluations against two baseline methods, and the experimental results show that the method significantly outperforms the other methods in each intent domain.
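The idea of representing each intent domain as a set of Wikipedia concepts and classifying a query by mapping it into that space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the phrase index, the link graph, the one-hop expansion from seed concepts, and the function names `build_domain`, `map_query` and `classify` are hypothetical and are not taken from the paper, whose actual procedure is not described in this overview.

```python
# Toy sketch only: the data and function names are hypothetical illustrations,
# not the paper's algorithm and not real Wikipedia data.

# Assumed lookup from query tokens to Wikipedia-style concept titles.
PHRASE_TO_CONCEPT = {
    "hotel": "Hotel",
    "flight": "Flight",
    "resume": "Resume",
    "salary": "Salary",
}

# Assumed link graph: each seed concept points to related concepts.
CONCEPT_LINKS = {
    "Travel": {"Hotel", "Flight"},
    "Employment": {"Resume", "Salary"},
}


def build_domain(seed_concepts, links=CONCEPT_LINKS):
    """Expand a few seed concepts one hop along the toy link graph."""
    domain = set(seed_concepts)
    for seed in seed_concepts:
        domain |= links.get(seed, set())
    return domain


def map_query(query, index=PHRASE_TO_CONCEPT):
    """Map a query to the concepts whose phrase appears as a token."""
    tokens = query.lower().split()
    return {index[t] for t in tokens if t in index}


def classify(query, domains):
    """Return the sorted names of domains that share a concept with the query."""
    concepts = map_query(query)
    return sorted(name for name, members in domains.items() if concepts & members)


# Each intent domain is a set of concepts grown from a single seed.
DOMAINS = {
    "travel": build_domain({"Travel"}),
    "job": build_domain({"Employment"}),
}

print(classify("cheap hotel in paris", DOMAINS))   # ['travel']
print(classify("average nurse salary", DOMAINS))   # ['job']
print(classify("weather tomorrow", DOMAINS))       # []
```

In this sketch a query belongs to a domain as soon as one of its mapped concepts falls inside the domain's concept set. A realistic system would need a far larger concept index and some form of scoring rather than simple set overlap.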