Topic

Search engine indexing

About: Search engine indexing is a research topic. Over the lifetime, 20909 publications have been published within this topic receiving 516954 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The anatomy of a large-scale hypertextual Web search engine

[...]

Sergey Brin¹, Lawrence Page¹•Institutions (1)

Stanford University¹

01 Apr 1998

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

...read moreread less

Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

...read moreread less

14,696 citations

Book•

Introduction to Information Retrieval

[...]

Christopher D. Manning¹, Prabhakar Raghavan², Hinrich Schütze³•Institutions (3)

Stanford University¹, Google², University of Stuttgart³

01 Jan 2008

TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.

...read moreread less

Abstract: Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also create a buzz for researchers and professionals alike.

...read moreread less

11,804 citations

Proceedings Article•DOI•

R-trees: a dynamic index structure for spatial searching

[...]

Antonin Guttman¹•Institutions (1)

University of California, Berkeley¹

01 Jun 1984

TL;DR: A dynamic index structure called an R-tree is described which meets this need, and algorithms for searching and updating it are given and it is concluded that it is useful for current database systems in spatial applications.

...read moreread less

Abstract: In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations However, traditional indexing methods are not well suited to data objects of non-zero size located m multi-dimensional spaces In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well, and conclude that it is useful for current database systems in spatial applications

...read moreread less

7,336 citations

Journal Article•DOI•

A vector space model for automatic indexing

[...]

Gerard Salton¹, A. Wong¹, C. S. Yang¹•Institutions (1)

Cornell University¹

01 Nov 1975-Communications of The ACM

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstating the usefulness of the model.

...read moreread less

Abstract: In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstating the usefulness of the model.

...read moreread less

6,619 citations

Journal Article•DOI•

Content-based image retrieval at the end of the early years

[...]

Arnold W. M. Smeulders¹, Marcel Worring¹, Simone Santini², Amarnath Gupta², Ramesh Jain - Show less +1 more•Institutions (2)

University of Amsterdam¹, University of California, San Diego²

01 Dec 2000-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.

...read moreread less

Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

...read moreread less

6,447 citations

Collapse

Network Information

Performance

Metrics

22,174

Papers

542,367

Citations

No. of papers in the topic in previous years
Year	Papers
2023	371
2022	889
2021	382
2020	509
2019	631
2018	648

Search engine indexing

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics