Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines

Revision as of 09:18, 2 May 2020

Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines - a scientific work related to Wikipedia quality, published in 2014 and written by Falah Hassan and Ali Al-Akashi.

Overview

The Web comprises a vast quantity of text. Modern search engines struggle to index it independently of the structure of queries and the type of Web data, and commonly use indexing based on the Web's graph structure to identify high-quality relevant pages. However, despite the apparent widespread use of these algorithms, Web indexing based on human feedback and document content is controversial. Many fundamental questions need to be addressed, including: How many types of domains/websites are there on the Web? What type of data is in each type of domain? For each type, which segments/HTML fields in the documents are most useful? What are the relationships between the segments? How can web content be indexed efficiently in all forms of document configurations? The authors' investigation of these questions led to a novel way of using Wikipedia to find the relationships between query structures and document configurations throughout the document indexing process, and to use those relationships to build an efficient index that allows fast indexing and searching and optimizes the retrieval of highly relevant results. The authors consider the top page on the ranked list to be highly important in determining the type of query. Their aim is to design a powerful search engine with a strong focus on making the first page highly relevant to the user, and on retrieving other pages based on that first page. By processing the user query against the Wikipedia index and determining the query's type, the approach can trace the path of a query in the index and retrieve specific results for each type. The authors use two kinds of data to increase the relevancy and efficiency of the ranked results: offline and real-time.
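The query-classification step described above can be illustrated with a minimal sketch: a query is matched against a small Wikipedia-derived title index to assign it a coarse type. The title entries, category labels, and matching rule here are purely illustrative assumptions, not the authors' actual index or taxonomy.

```python
# Hypothetical Wikipedia-derived title index mapping article titles to
# coarse query types. Real systems would build this from a full dump.
WIKI_TITLE_INDEX = {
    "barack obama": "person",
    "new york city": "location",
    "python (programming language)": "topic",
}

def classify_query(query: str) -> str:
    """Return a coarse query type based on the longest matching title."""
    q = query.lower()
    best, best_len = "unknown", 0
    for title, qtype in WIKI_TITLE_INDEX.items():
        key = title.split(" (")[0]  # drop any disambiguation suffix
        if key in q and len(key) > best_len:
            best, best_len = qtype, len(key)
    return best

print(classify_query("mayor of new york city"))  # → location
```

Once the type is known, the engine can follow a type-specific path through the index rather than scanning all postings uniformly.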
Traditional search engines find it difficult to use these two kinds of data together, because building a real-time index from social data and integrating it with the index for the offline data is difficult in a traditional distributed index. As a source of offline data, the authors use data from the Text Retrieval Conference (TREC) evaluation campaign. The web track at TREC offers researchers a chance to investigate different retrieval approaches to web indexing and searching. The crawled offline dataset makes it possible to design powerful search engines that extend current methods, and to evaluate and compare them. The authors propose a new indexing method based on the structures of queries and the content of documents. Their search engine uses a core index for offline data and a hash index for real-time data.
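The dual-index design sketched above can be made concrete with a toy example: a static inverted "core" index built once from offline documents, plus a hash-based index that absorbs real-time documents without rebuilding the core, with results merged at query time. The class and method names are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

class DualIndex:
    """Toy sketch: offline core index plus hash index for real-time data."""

    def __init__(self, offline_docs):
        # Core inverted index: term -> set of doc ids, built once offline.
        self.core = defaultdict(set)
        for doc_id, text in offline_docs.items():
            for term in text.lower().split():
                self.core[term].add(doc_id)
        # Real-time hash index: cheap per-document insertion, no rebuild.
        self.realtime = defaultdict(set)

    def add_realtime(self, doc_id, text):
        for term in text.lower().split():
            self.realtime[term].add(doc_id)

    def search(self, term):
        # Merge postings from both indexes at query time.
        term = term.lower()
        return self.core[term] | self.realtime[term]

idx = DualIndex({"d1": "wikipedia quality research", "d2": "web search engines"})
idx.add_realtime("t1", "breaking news about web search")
print(sorted(idx.search("search")))  # → ['d2', 't1']
```

Keeping the real-time side in a separate hash structure avoids the costly rebuilds that make mixing fresh social data into a traditional distributed offline index difficult.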