Showing papers by "Google" published in 2005


Patent
05 Feb 2005
TL;DR: In this paper, various methods, systems, and apparatus for implementing aspects of a digital mapping system are disclosed, such as sending a location request from a client-side computing device to a map tile server, receiving a set of map tiles in response to the location request, assembling said received map tiles into a tile grid, aligning the tile grid relative to a clipping shape, and displaying the result as a map image.
Abstract: Various methods, systems, and apparatus for implementing aspects of a digital mapping system are disclosed. One such method includes sending a location request from a client-side computing device to a map tile server, receiving a set of map tiles in response to the location request, assembling said received map tiles into a tile grid, aligning the tile grid relative to a clipping shape, and displaying the result as a map image. One apparatus according to aspects of the present invention includes means for sending a location request from a client-side computing device to a map tile server, means for receiving a set of map tiles in response to the location request, means for assembling said received map tiles into a tile grid, means for aligning the tile grid relative to a clipping shape, and means for displaying the result as a map image. Such an apparatus may further include direction control or zoom control objects as interactive overlays on the displayed map image, and may also include route or location overlays on the map image.

876 citations
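
The tile-grid assembly and alignment steps in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the 256-pixel tile size, the rectangular clipping shape, the world-pixel coordinate scheme, and the function name `tiles_for_viewport` are all assumptions.

```python
import math

TILE_SIZE = 256  # pixels per tile edge; an assumed value


def tiles_for_viewport(center_x, center_y, width, height, tile_size=TILE_SIZE):
    """Return the tile grid covering a viewport and the grid's pixel offset.

    center_x, center_y: viewport center in (hypothetical) world pixel coordinates.
    width, height: size of the clipping shape, assumed here to be a rectangle.
    """
    left = center_x - width / 2
    top = center_y - height / 2
    first_col = math.floor(left / tile_size)
    first_row = math.floor(top / tile_size)
    last_col = math.floor((left + width - 1) / tile_size)
    last_row = math.floor((top + height - 1) / tile_size)
    # Rows of (column, row) tile indices that a client would request.
    grid = [[(col, row) for col in range(first_col, last_col + 1)]
            for row in range(first_row, last_row + 1)]
    # Where the grid's top-left corner sits relative to the clip rectangle.
    offset_x = first_col * tile_size - left
    offset_y = first_row * tile_size - top
    return grid, (offset_x, offset_y)
```

A client could position the assembled grid at the returned offset inside the clipping rectangle, so that only the requested region shows.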


Proceedings ArticleDOI
05 Jan 2005
TL;DR: The key contributions of this empirical study are to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric that is defined independently of the detector greatly outperforms a selection metric based on the detection confidence generated by the detector.
Abstract: The construction of appearance-based object detection systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Semi-supervised training is a means for reducing the effort needed to prepare the training set by training the model with a small number of fully labeled examples and an additional set of unlabeled or weakly labeled examples. In this work we present a semi-supervised approach to training object detection systems based on self-training. We implement our approach as a wrapper around the training process of an existing object detector and present empirical results. The key contributions of this empirical study are to demonstrate that a model trained in this manner can achieve results comparable to a model trained in the traditional manner using a much larger set of fully labeled data, and that a training data selection metric that is defined independently of the detector greatly outperforms a selection metric based on the detection confidence generated by the detector.

767 citations
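
The self-training wrapper described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-dimensional nearest-centroid "detector", the nearest-labeled-neighbor selection metric, and the names `train_centroids`, `predict`, and `self_train` do not come from the paper.

```python
def train_centroids(examples):
    """Train a toy 1-D 'detector': one centroid per class label."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def self_train(labeled, unlabeled, rounds=3, per_round=2):
    """Wrap training: repeatedly label some unlabeled points and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train_centroids(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Selection metric defined independently of the detector (assumed):
        # distance to the nearest already-labeled example.
        pool.sort(key=lambda x: min(abs(x - lx) for lx, _ in labeled))
        chosen, pool = pool[:per_round], pool[per_round:]
        # The current model supplies labels for the chosen examples.
        labeled += [(x, predict(model, x)) for x in chosen]
        model = train_centroids(labeled)
    return model
```

The point of the sketch is the structure: the selection step never consults the detector's own confidence, which is the distinction the abstract highlights.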


Journal ArticleDOI
TL;DR: The design -- including the separation into two phases, the form of the programming language, and the properties of the aggregators -- exploits the parallelism inherent in having data and computation distributed across many machines.
Abstract: Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design -- including the separation into two phases, the form of the programming language, and the properties of the aggregators -- exploits the parallelism inherent in having data and computation distributed across many machines.

718 citations
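
The two-phase design described above (a per-record filter that emits values to aggregators) can be sketched as below. This is plain Python rather than the procedural language the abstract refers to, and the record layout, the aggregator names `calls` and `seconds`, and the function names are hypothetical.

```python
from collections import Counter


def filter_phase(record):
    """Per-record 'query': emit (aggregator, key, value) tuples."""
    caller, seconds = record
    if seconds > 0:  # drop calls that never connected
        yield ("calls", caller, 1)
        yield ("seconds", caller, seconds)


def aggregate(emitted):
    """Aggregation phase: sum the emitted values per aggregator and key."""
    tables = {}
    for name, key, value in emitted:
        tables.setdefault(name, Counter())[key] += value
    return tables


def run(shards):
    """Apply the filter to every record of every shard, then aggregate."""
    emitted = (item
               for shard in shards
               for record in shard
               for item in filter_phase(record))
    return aggregate(emitted)
```

Because the filter looks at one record at a time and the sum aggregator is commutative and associative, shards could be processed on different machines in any order and merged afterwards, which is the kind of parallelism the abstract points to.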


Journal Article
TL;DR: This contribution presents an online SVM algorithm based on the premise that active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

700 citations


Patent
12 Jul 2005
TL;DR: In this paper, a system and method for using a user profile to order placed content in search results returned by a search engine is presented; the user profile is based on search queries submitted by a user, the user's specific interaction with the documents identified by the search engine, and personal information provided by the user.
Abstract: A system and method for using a user profile to order placed content in search results returned by a search engine. The user profile is based on search queries submitted by a user, the user's specific interaction with the documents identified by the search engine and personal information provided by the user. Placed content is ranked by a score based at least in part on a similarity of a particular placed content to the user's profile. User profiles can be created and/or stored on the client side or server side of a client-server network environment.

558 citations


Patent
12 May 2005
TL;DR: In this article, a method of establishing connection between users of mobile devices includes receiving at a computer a location of a first user from a first mobile device, receiving from a second mobile device a location of a second user having an acquaintance relationship to the first user, and sending a message to the first mobile device based on the proximity of the first user to the second user.
Abstract: A method of establishing connection between users of mobile devices includes receiving at a computer a location of a first user from a first mobile device, receiving from a second mobile device a location of a second user having an acquaintance relationship to the first user, and sending a message to the first mobile device based on the proximity of the first user to the second user.

504 citations
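
A rough sketch of the proximity check described above follows. The great-circle (haversine) distance, the 1 km threshold, the acquaintance table, and the function names are assumptions chosen for illustration; the abstract does not specify how proximity is computed.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def proximity_message(first_user, first_loc, second_user, second_loc,
                      acquaintances, threshold_km=1.0):
    """Return a message for the first user's device, or None."""
    # Only users with an acquaintance relationship are considered.
    if second_user not in acquaintances.get(first_user, set()):
        return None
    distance = haversine_km(first_loc, second_loc)
    if distance > threshold_km:
        return None
    return f"{second_user} is about {distance:.1f} km away"
```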


Proceedings ArticleDOI
Daniel Gruhl, Ramanathan V. Guha, Ravi Kumar, Jasmine Novak, Andrew Tomkins
21 Aug 2005
TL;DR: First, carefully hand-crafted queries produce matching postings whose volume predicts sales ranks, and even though sales rank motion might be difficult to predict in general, algorithmic predictors can use online postings to successfully predict spikes in sales rank.
Abstract: An increasing fraction of the global discourse is migrating online in the form of blogs, bulletin boards, web pages, wikis, editorials, and a dizzying array of new collaborative technologies. The migration has now proceeded to the point that topics reflecting certain individual products are sufficiently popular to allow targeted online tracking of the ebb and flow of chatter around these topics. Based on an analysis of around half a million sales rank values for 2,340 books over a period of four months, and correlating postings in blogs, media, and web pages, we are able to draw several interesting conclusions. First, carefully hand-crafted queries produce matching postings whose volume predicts sales ranks. Second, these queries can be automatically generated in many cases. And third, even though sales rank motion might be difficult to predict in general, algorithmic predictors can use online postings to successfully predict spikes in sales rank.

432 citations


Patent
21 Jun 2005
TL;DR: In this article, personalized advertisements are provided to a user using a search engine to obtain documents relevant to a search query, where advertisements are personalized in response to a search profile that is derived from personalized search results.
Abstract: Personalized advertisements are provided to a user using a search engine to obtain documents relevant to a search query. The advertisements are personalized in response to a search profile that is derived from personalized search results. The search results are personalized based on a user profile of the user providing the query. The user profile describes interests of the user, and can be derived from a variety of sources, including prior search queries, prior search results, expressed interests, demographic, geographic, psychographic, and activity information.

387 citations


Patent
20 Jan 2005
TL;DR: In this paper, a method comprising identifying a first profile in a social network, identifying associated profiles associated with the first profile, ranking the associated profiles, and outputting associated profiles based at least in part on the ranking is presented.
Abstract: Systems and methods for the display and navigation of a social network are set forth. According to one embodiment a method comprising identifying a first profile in a social network, identifying associated profiles associated with the first profile, ranking the associated profiles, wherein ranking is not based exclusively on a degree of separation, and outputting the associated profiles based at least in part on the ranking is set forth. According to another embodiment a method comprising identifying a user profile, identifying a member profile, determining an association path for the user profile and the member profile, and outputting the association path is set forth.

383 citations


Patent
01 Apr 2005
TL;DR: A portable device having scanning, imaging or other data-capture capability is described in this paper, where the portable device can indicate to the user when enough information has been captured to uniquely identify a source document.
Abstract: A portable device having scanning, imaging or other data-capture capability is described. In some cases, the portable device can indicate to the user when enough information has been captured to uniquely identify a source document. In some cases, the portable device calculates timestamps and location-stamps indicating when and where a data capture occurred. In some cases, the portable device is controlled by gestures. In some cases, the portable scanning device has associated billing and content/service subscription information.

381 citations


Patent
12 Apr 2005
TL;DR: In this paper, a system for processing data captured from rendered documents is described, which lets authors and publishers add value to printed documents through associated supplemental material and uses text scanned from a document, together with context, to identify the corresponding electronic document.
Abstract: A system for processing data captured from rendered documents is described. The system provides a way for authors and publishers to add value to printed documents using associated supplemental material. The system can use text scanned from a document and context to identify an electronic document that corresponds to the scanned document. A user can then access supplemental material associated with the digital document.

Proceedings ArticleDOI
21 Aug 2005
TL;DR: An extensive empirical comparison of six distinct measures of similarity for recommending online communities to members of the Orkut social network is presented, determining the usefulness of the different recommendations by actually measuring users' propensity to visit and join recommended communities.
Abstract: Online information services have grown too large for users to navigate without the help of automated tools such as collaborative filtering, which makes recommendations to users based on their collective past behavior. While many similarity measures have been proposed and individually evaluated, they have not been evaluated relative to each other in a large real-world environment. We present an extensive empirical comparison of six distinct measures of similarity for recommending online communities to members of the Orkut social network. We determine the usefulness of the different recommendations by actually measuring users' propensity to visit and join recommended communities. We also examine how the ordering of recommendations influenced user selection, as well as interesting social issues that arise in recommending communities within a real social network.
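
The abstract above does not name the six similarity measures. As an illustrative assumption only, the sketch below ranks communities by one common set-overlap measure (overlap normalized by the geometric mean of the community sizes); the function names and the membership data layout are hypothetical.

```python
import math


def l2_similarity(members_a, members_b):
    """Overlap of two member sets, normalized by the geometric mean of sizes."""
    if not members_a or not members_b:
        return 0.0
    overlap = len(members_a & members_b)
    return overlap / math.sqrt(len(members_a) * len(members_b))


def recommend(base, communities, top_n=3):
    """Rank other communities by similarity to `base`, best first."""
    scores = [(l2_similarity(communities[base], members), name)
              for name, members in communities.items() if name != base]
    # Highest score first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scores[:top_n] if score > 0]
```

Swapping in a different scoring function while keeping `recommend` fixed is one simple way to compare measures against each other on the same data.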

Patent
07 Oct 2005
TL;DR: In this article, a collection of captured images that form at least a portion of a library of images is used to enable retrieval of the captured images, and an index is generated where the index data is based on recognized information.
Abstract: An embodiment provides for enabling retrieval of a collection of captured images that form at least a portion of a library of images. For each image in the collection, a captured image may be analyzed to recognize information from image data contained in the captured image, and an index may be generated, where the index data is based on the recognized information. Using the index, functionality such as search and retrieval is enabled. Various recognition techniques, including those that use the face, clothing, apparel, and combinations of characteristics may be utilized. Recognition may be performed on, among other things, persons and text carried on objects.

Journal ArticleDOI
09 Jul 2005
TL;DR: The AdaBoost based classifiers presented here achieve over 93% accuracy; these match or surpass the accuracies of the SVM-based classifiers, and yield performance that is 50 times faster.
Abstract: This paper presents a method based on AdaBoost to identify the sex of a person from a low resolution grayscale picture of their face. The method described here is implemented in a system that will process well over 10^9 images. The goal of this work is to create an efficient system that is both simple to implement and maintain; the methods described here are extremely fast and have straightforward implementations. We achieve 80% accuracy in sex identification with less than 10 pixel comparisons and 90% accuracy with less than 50 pixel comparisons. The best classifiers published to date use Support Vector Machines; we match their accuracies with as few as 500 comparison operations on a 20×20 pixel image. The AdaBoost based classifiers presented here achieve over 93% accuracy; these match or surpass the accuracies of the SVM-based classifiers, and yield performance that is 50 times faster.

Patent
17 May 2005
TL;DR: In this paper, a facility for initiating a purchase is described that receives a text sequence captured by a user from a rendered document using a handheld text capture device and identifies in the received text sequence a reference to a distinguished product.
Abstract: A facility for initiating a purchase is described. The facility receives a text sequence captured by a user from a rendered document using a handheld text capture device. The facility identifies in the received text sequence a reference to a distinguished product. In response to identifying the reference, the facility presents to the user an opportunity to place an order for the distinguished product. If the user accepts the presented opportunity to order the distinguished product, the facility orders the distinguished product on behalf of the user.

Patent
Hartmut Neven
13 May 2005
TL;DR: In this paper, an image-based information retrieval system is presented that includes a mobile telephone and a remote server, where the server combines confidence values from optical character recognition, object recognition, and face recognition engines into a recognition output.
Abstract: An image-based information retrieval system is disclosed that includes a mobile telephone and a remote server. The mobile telephone has a built-in camera and a communication link for transmitting an image from the built-in camera to the remote server. The remote server has an optical character recognition engine for generating a first confidence value based on an image from the mobile telephone, an object recognition engine for generating a second confidence value based on an image from the mobile telephone, a face recognition engine for generating a third confidence value based on an image from the mobile telephone, and an integrator module for receiving the first, second, and third confidence values and generating a recognition output.
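
The abstract says the integrator module receives three confidence values and generates a recognition output, but not how it combines them. The sketch below assumes the simplest possible rule (take the most confident engine, subject to a minimum threshold); the rule, the 0.5 threshold, and the name `integrate` are hypothetical.

```python
def integrate(ocr, obj, face, min_confidence=0.5):
    """Pick a recognition output from three (confidence, result) pairs.

    ocr, obj, face: outputs of the character, object, and face recognition
    engines, each as a (confidence, result) tuple.
    """
    candidates = {"ocr": ocr, "object": obj, "face": face}
    # Assumed rule: the engine with the highest confidence wins.
    engine, (confidence, result) = max(candidates.items(),
                                       key=lambda item: item[1][0])
    if confidence < min_confidence:
        return None  # no engine is confident enough to report a result
    return {"engine": engine, "confidence": confidence, "result": result}
```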

Patent
23 Mar 2005
TL;DR: In this paper, an image tile-based digital mapping system is configured for generating map tiles during an offline session, and serving selected sets of those tiles to a client when requested.
Abstract: Digital tile-based mapping techniques are disclosed that enable efficient online serving of aesthetically pleasing maps. In one particular embodiment, an image tile-based digital mapping system is configured for generating map tiles during an offline session, and serving selected sets of those tiles to a client when requested. Also provided are solutions for handling map labels and other such features in a tile-based mapping system, such as when a map label crosses map tile boundaries. Various processing environments (e.g., servers or other computing devices) can be employed in the system.

Patent
09 Dec 2005
TL;DR: In this article, a system and method for using location information associated with wireless devices is described, where a wireless device, a location system, and a feature server are used to determine whether to execute an action in accordance with subscriber rules.
Abstract: A system and method is provided for using location information associated with wireless devices. The system includes a wireless device, a location system, and a feature server. The wireless device includes any wireless apparatus having wireless communications capabilities. The location system can generate location information pinpointing the location of the wireless device. The feature server can use the location information to determine whether to execute an action in accordance with subscriber rules. A large number of applications may be implemented to execute the action via a number of communication channels, including without limitation, a wireless communications network, a computer network, and a public switched telephone system, for example.

Patent
01 Apr 2005
TL;DR: In this paper, an action plan data structure for one or more rendered documents is described, which contains information specifying an action to perform automatically in response to a text capture from any of the selected rendered documents.
Abstract: An action plan data structure for one or more selected rendered documents is described. The data structure contains information specifying an action to perform automatically in response to a text capture from any of the selected rendered documents.

Journal ArticleDOI
TL;DR: In this article, the authors describe and analyze a family of moderately hard, memory-bound functions that most recent systems evaluate at about the same speed, and explain how to use them for protecting against abuses such as e-mail abuse.
Abstract: A resource may be abused if its users incur little or no cost. For example, e-mail abuse is rampant because sending an e-mail has negligible cost for the sender. It has been suggested that such abuse may be discouraged by introducing an artificial cost in the form of a moderately expensive computation. Thus, the sender of an e-mail might be required to pay by computing for a few seconds before the e-mail is accepted. Unfortunately, because of sharp disparities across computer systems, this approach may be ineffective against malicious users with high-end systems, prohibitively slow for legitimate users with low-end systems, or both. Starting from this observation, we research moderately hard functions that most recent systems will evaluate at about the same speed. For this purpose, we rely on memory-bound computations. We describe and analyze a family of moderately hard, memory-bound functions, and we explain how to use them for protecting against abuses.

Patent
19 Oct 2005
TL;DR: In this article, a hierarchical content-specific node structure and pricing for advertising delivery over each node independently is proposed, which allows an advertiser to pay more for advertisements delivered to a narrowly targeted audience and to pay less for advertisements directed at a more general audience who may or may not have an interest in the goods or services offered.
Abstract: Methods and systems for providing advertising content over the Internet through a hierarchical content-specific node structure and pricing advertising delivery over each node independently. Independent delivery and pricing allows an advertiser to pay more for advertisements delivered to a narrowly targeted audience likely to be interested in the goods or services offered by the advertising entity and to pay less for advertisements directed at a more general audience who may or may not have an interest in the goods or services offered. The less content specific the node is, the less targeted the advertisement will be, and therefore, the less valuable the advertisement will be to the advertising entity. Targeted advertising to multiple levels of content specific nodes is enabled.

Patent
Shumeet Baluja
29 Jun 2005
TL;DR: In this paper, the authors propose a call-on-select functionality, which allows a user device to automatically dial a telephone number associated with the ad by an advertiser, instead of loading a document (e.g., Web page) for rendering.
Abstract: The serving of one or more ads to a user device considers determined characteristics of a user device, such as whether or not the user device supports telephone calls. At least some ads may include call-on-select functionality. When such an ad is selected (e.g., via a button click), instead of loading a document (e.g., Web page) for rendering, a telephone number associated with the ad by an advertiser can be automatically dialed.

Proceedings ArticleDOI
12 Oct 2005
TL;DR: Three programming language abstractions are identified for the construction of reusable components: abstract type members, explicit selftypes, and modular mixin composition, which enable an arbitrary assembly of static program parts with hard references between them to be transformed into a system of reusable components.
Abstract: We identify three programming language abstractions for the construction of reusable components: abstract type members, explicit selftypes, and modular mixin composition. Together, these abstractions enable us to transform an arbitrary assembly of static program parts with hard references between them into a system of reusable components. The transformation maintains the structure of the original system. We demonstrate this approach in two case studies, a subject/observer framework and a compiler front-end.

Journal ArticleDOI
Luiz Andre Barroso
TL;DR: The high computational demands that are inherent in most of Google’s services have led the research group to develop a deep understanding of the overall cost of computing, and continually to look for hardware/software designs that optimize performance per unit of cost.
Abstract: In the late 1990s, our research group at DEC was one of a growing number of teams advocating the CMP (chip multiprocessor) as an alternative to highly complex single-threaded CPUs. We were designing the Piranha system, which was a radical point in the CMP design space in that we used very simple cores (similar to the early RISC designs of the late ’80s) to provide a higher level of thread-level parallelism. Our main goal was to achieve the best commercial workload performance for a given silicon budget. Today, in developing Google’s computing infrastructure, our focus is broader than performance alone. The merits of a particular architecture are measured by answering the following question: Are you able to afford the computational capacity you need? The high computational demands that are inherent in most of Google’s services have led us to develop a deep understanding of the overall cost of computing, and continually to look for hardware/software designs that optimize performance per unit of cost.

Proceedings ArticleDOI
10 May 2005
TL;DR: Thresher is described, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web, and which enables a rich semantic interaction with existing web pages, "unwrapping" semantic data buried in the pages' HTML.
Abstract: We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify examples of semantic content by highlighting them in a web browser and describing their meaning. We then use the tree edit distance between the DOM subtrees of these examples to create a general pattern, or wrapper, for the content, and allow the user to bind RDF classes and predicates to the nodes of these wrappers. By overlaying matches to these patterns on standard documents inside the Haystack semantic web browser, we enable a rich semantic interaction with existing web pages, "unwrapping" semantic data buried in the pages' HTML. By allowing end-users to create, modify, and utilize their own patterns, we hope to speed adoption and use of the Semantic Web and its applications.

Patent
Paul T. Buchheit, Bay-Wei W. Chang, Jing Yee Lim, Brian D. Rakowski, Sanjeev Singh
25 Mar 2005
TL;DR: In this paper, a method and system for processing messages is disclosed that includes receiving a plurality of messages directed to a user, where each message has a unique message identifier and each message may be associated with a respective conversation, where the contents of a conversation are displayed when the user selects a conversation from the displayed list of conversations.
Abstract: A method and system for processing messages is disclosed that includes receiving a plurality of messages directed to a user, where each message has a unique message identifier. Each of the plurality of messages may be associated with a respective conversation, where each conversation has a respective conversation identifier. Also, each conversation includes a set of one or more messages sharing a common set of characteristics that meet a first predefined criteria. A list of conversations is displayed as a set of rows in an order determined by a second predefined criteria, where each row corresponds to one of the listed conversations and includes at least a sender list, a conversation topic and a date/time value. The contents of a conversation are displayed when the user selects a conversation from the displayed list of conversations. Messages can be displayed in one of three modes: expanded, compacted and hidden.
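
The conversation list described above (one row per conversation, each with a sender list, a topic, and a date/time value) can be sketched as follows. The message field names, the use of a pre-assigned conversation identifier as the grouping criterion, and the most-recent-first ordering are assumptions; the abstract leaves both predefined criteria unspecified.

```python
from collections import defaultdict


def conversation_rows(messages):
    """Group messages into conversations and build one display row for each.

    messages: dicts with hypothetical keys "id", "conversation_id",
    "sender", "subject", and "sent" (any sortable timestamp).
    """
    threads = defaultdict(list)
    for message in messages:
        threads[message["conversation_id"]].append(message)

    rows = []
    for conversation_id, thread in threads.items():
        thread.sort(key=lambda m: m["sent"])
        # Distinct senders in order of first appearance.
        senders = list(dict.fromkeys(m["sender"] for m in thread))
        rows.append({
            "conversation_id": conversation_id,
            "senders": senders,
            "topic": thread[0]["subject"],
            "date": thread[-1]["sent"],
        })
    # Assumed ordering criterion: most recently active conversation first.
    rows.sort(key=lambda row: row["date"], reverse=True)
    return rows
```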


Journal ArticleDOI
TL;DR: Two types of periodicities are defined, and a scalable, computationally efficient algorithm is proposed for each type, and the algorithms are extended in order to discover the periodic patterns of unknown periods at the same time without affecting the time complexity.
Abstract: Periodicity mining is used for predicting trends in time series data. Discovering the rate at which the time series is periodic has always been an obstacle for fully automated periodicity mining. Existing periodicity mining algorithms assume that the periodicity rate (or simply the period) is user-specified. This assumption is a considerable limitation, especially in time series data where the period is not known a priori. In this paper, we address the problem of detecting the periodicity rate of a time series database. Two types of periodicities are defined, and a scalable, computationally efficient algorithm is proposed for each type. The algorithms perform in O(n log n) time for a time series of length n. Moreover, the proposed algorithms are extended in order to discover the periodic patterns of unknown periods at the same time without affecting the time complexity. Experimental results show that the proposed algorithms are highly accurate with respect to the discovered periodicity rates and periodic patterns. Real-data experiments demonstrate the practicality of the discovered periodic patterns.
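
To make the problem statement concrete, here is a naive baseline for detecting candidate periods in a symbol sequence. It is explicitly not the O(n log n) algorithm the abstract refers to: it checks every candidate period directly in O(n^2) time, and the confidence definition, the 0.9 threshold, and the function names are assumptions.

```python
def period_confidence(series, period):
    """Fraction of positions i where series[i] equals series[i + period]."""
    pairs = len(series) - period
    if period <= 0 or pairs <= 0:
        return 0.0
    matches = sum(1 for i in range(pairs) if series[i] == series[i + period])
    return matches / pairs


def detect_periods(series, threshold=0.9):
    """Return every period up to half the series length that meets the threshold."""
    return [period
            for period in range(1, len(series) // 2 + 1)
            if period_confidence(series, period) >= threshold]
```

Lowering the threshold lets the check tolerate some noise in the series, at the cost of admitting weaker candidate periods; multiples of a true period are reported as well.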