Difference between revisions of "Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines"


Revision as of 08:52, 12 February 2021


Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines
Authors: Falah Hassan, Ali Al-Akashi
Publication date: 2014
DOI: 10.20381/ruor-6304

Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines is a scientific work related to Wikipedia quality, published in 2014 and written by Falah Hassan and Ali Al-Akashi.

Overview

The Web comprises a vast quantity of text. Modern search engines struggle to index it independently of the structure of queries and the type of Web data, and commonly use indexing based on the Web's graph structure to identify high-quality relevant pages. However, despite the apparent widespread use of these algorithms, Web indexing based on human feedback and document content is controversial. There are many fundamental questions that need to be addressed, including: How many types of domains/websites are there on the Web? What type of data is in each type of domain? For each type, which segments/HTML fields in the documents are most useful? What are the relationships between the segments? How can web content be indexed efficiently in all forms of document configurations? The authors' investigation of these questions has led to a novel way to use Wikipedia to find the relationships between query structures and document configurations throughout the document indexing process, and to use them to build an efficient index that allows fast indexing and searching and optimizes the retrieval of highly relevant results. The authors consider the top page on the ranked list to be highly important in determining the types of queries. Their aim is to design a powerful search engine with a strong focus on how to make the first page highly relevant to the user, and on how to retrieve other pages based on that first page. By processing the user query with the Wikipedia index and determining the type of the query, the approach can trace the path of a query in the index and retrieve specific results for each type. The authors use two kinds of data to increase the relevancy and efficiency of the ranked results: offline and real-time.
Traditional search engines find it difficult to use these two kinds of data together, because building a real-time index from social data and integrating it with the index for the offline data is difficult in a traditional distributed index. As a source of offline data, the authors use data from the Text Retrieval Conference (TREC) evaluation campaign. The Web track at TREC offers researchers a chance to investigate different retrieval approaches for web indexing and searching. The crawled offline dataset makes it possible to design powerful search engines that extend current methods and to evaluate and compare them. The authors propose a new indexing method based on the structures of the queries and the content of documents. The authors' search engine uses a core index for offline data and a hash index for real-time
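
The overview mentions a core index for offline data alongside a hash index for real-time data, but it does not say how either index is structured or how their results are combined. The following is only a minimal sketch of one possible arrangement, not the authors' implementation: the class name `TwoTierIndex`, its methods, the whitespace tokenizer, and the matching rules (all terms for offline documents, any term for real-time items) are all assumptions made for illustration. Query-type detection with a Wikipedia index is left out entirely.

```python
from collections import defaultdict


class TwoTierIndex:
    """Toy two-tier index. Every name and rule here is hypothetical."""

    def __init__(self):
        # Core index (assumed to be an inverted index): term -> offline doc ids.
        self.core = defaultdict(set)
        # Real-time index (assumed to be a plain hash map): term -> item ids.
        self.realtime = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Naive lowercase whitespace tokenizer, used only for illustration.
        return text.lower().split()

    def add_offline(self, doc_id, text):
        for term in self.tokenize(text):
            self.core[term].add(doc_id)

    def add_realtime(self, item_id, text):
        for term in self.tokenize(text):
            self.realtime[term].add(item_id)

    def search(self, query):
        terms = self.tokenize(query)
        if not terms:
            return {"offline": [], "realtime": []}
        # Offline: documents that contain every query term.
        offline = set.intersection(*(self.core.get(t, set()) for t in terms))
        # Real-time: items that contain at least one query term.
        recent = set().union(*(self.realtime.get(t, set()) for t in terms))
        return {"offline": sorted(offline), "realtime": sorted(recent)}


index = TwoTierIndex()
index.add_offline("d1", "Ottawa university thesis")
index.add_offline("d2", "Ottawa weather")
index.add_realtime("t1", "Ottawa weather alert")
print(index.search("ottawa weather"))
```

Keeping the two structures separate means real-time items can be added without touching the offline index, which is the integration difficulty the overview attributes to traditional distributed indexes. How the real system ranks and merges the two result lists is not stated in the source.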

Embed

Wikipedia Quality

Hassan, Falah; Al-Akashi, Ali. (2014). "[[Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines]]". Université d'Ottawa / University of Ottawa. DOI: 10.20381/ruor-6304.

English Wikipedia

{{cite journal |last1=Hassan |first1=Falah |last2=Al-Akashi |first2=Ali |title=Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines |date=2014 |doi=10.20381/ruor-6304 |url=https://wikipediaquality.com/wiki/Using_Wikipedia_Knowledge_and_Query_Types_in_a_New_Indexing_Approach_for_Web_Search_Engines |journal=Université d'Ottawa / University of Ottawa}}

HTML

Hassan, Falah; Al-Akashi, Ali. (2014). &quot;<a href="https://wikipediaquality.com/wiki/Using_Wikipedia_Knowledge_and_Query_Types_in_a_New_Indexing_Approach_for_Web_Search_Engines">Using Wikipedia Knowledge and Query Types in a New Indexing Approach for Web Search Engines</a>&quot;. Université d'Ottawa / University of Ottawa. DOI: 10.20381/ruor-6304.