Topic

Search engine

About: Search engine is a research topic. Over its lifetime, 9,124 publications have been published within this topic, receiving 214,686 citations.


Papers
Journal ArticleDOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

14,696 citations

Journal Article
TL;DR: Google is a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext; it is designed to crawl and index the Web efficiently and to produce much more satisfying search results than existing systems.

13,327 citations

Proceedings ArticleDOI
23 Jul 2002
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.

4,453 citations

Journal ArticleDOI
Abstract: The effects of prior knowledge about a product class on various characteristics of pre-purchase information search within that product class are examined. A new search task methodology is used that imposes only a limited amount of structure on the search task: subjects are not cued with a list of attributes, and the problem is not structured in a brand-by-attribute matrix. The results indicate that prior knowledge facilitates the acquisition of new information and increases search efficiency. The results also support the conceptual distinction between objective and subjective knowledge.

1,935 citations

Journal Article
TL;DR: A new conceptual paradigm for performing search in context is presented, that largely automates the search process, providing even non-professional users with highly relevant results.
Abstract: Keyword-based search engines are in widespread use today as a popular means for Web-based information retrieval. Although such systems seem deceptively simple, a considerable amount of skill is required in order to satisfy non-trivial information needs. This paper presents a new conceptual paradigm for performing search in context, that largely automates the search process, providing even non-professional users with highly relevant results. This paradigm is implemented in practice in the IntelliZap system, where search is initiated from a text query marked by the user in a document she views, and is guided by the text surrounding the marked query in that document (“the context”). The context-driven information retrieval process involves semantic keyword extraction and clustering to automatically generate new, augmented queries. The latter are submitted to a host of general and domain-specific search engines. Search results are then semantically reranked, using context. Experimental results testify that using context to guide search effectively offers even inexperienced users an advanced search tool on the Web.

1,615 citations


Network Information
Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 82% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Software: 130.5K papers, 2M citations, 81% related
User interface: 85.4K papers, 1.7M citations, 80% related
Fuzzy logic: 151.2K papers, 2.3M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    164
2022    260
2021    129
2020    196
2019    257
2018    321