
Showing papers by "Google" published in 2003


Journal Article
19 Oct 2003
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

5,429 citations


Journal Article
TL;DR: Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software that achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.
Abstract: Amenable to extensive parallelization, Google's web search application lets different queries run on different processors and, by partitioning the overall index, also lets a single query use multiple processors. To handle this workload, Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software. This architecture achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.

1,129 citations
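The entry above says that partitioning the overall index lets a single query use multiple processors. A minimal scatter/gather sketch of that idea, assuming a toy in-memory inverted index split into shards by document ID; `build_shards`, `search_shard`, `search`, and the "count matched terms" scoring are hypothetical illustrations, not Google's implementation:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_shards(docs, num_shards):
    """Partition documents by integer ID into shards, each with its own inverted index."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        index = shards[doc_id % num_shards]
        for term in text.lower().split():
            index[term].add(doc_id)
    return shards


def search_shard(index, terms):
    """Score one shard: {doc_id: number of query terms the document contains}."""
    hits = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return hits


def search(shards, query, top_k=3):
    """Scatter the query to every shard in parallel, then gather and rank the results."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda idx: search_shard(idx, terms), shards))
    merged = {}
    for part in partials:
        merged.update(part)  # shards hold disjoint doc IDs, so nothing collides
    # Highest score first; ties broken by doc ID so the output is deterministic.
    return sorted(merged, key=lambda d: (-merged[d], d))[:top_k]
```

Because each shard holds a disjoint slice of the documents, every worker scans only its own index, and the merge step is a cheap union plus sort.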


Patent
24 Sep 2003
TL;DR: In this article, the authors present a method for placing targeted ads on pages on the web (or some other document of any media type) by obtaining content that includes available spots for ads, determining ads relevant to the content, and/or combining the content with ads determined to be relevant to it.
Abstract: Advertisers are permitted to put targeted ads on pages on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content.

809 citations


Patent
24 Sep 2003
TL;DR: In this article, the relevance of advertisements to a user's interests is improved by analyzing the content of a web page to determine a list of one or more topics associated with that web page.
Abstract: The relevance of advertisements to a user's interests is improved. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages.

746 citations
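The relevance rule in the entry above (page content → list of topics → ads whose keywords belong to those topics) can be sketched as follows. The sketch assumes a hypothetical hand-built `topic_keywords` map and a simple "at least one keyword hit" topic test; the patent does not say how topics are actually derived:

```python
def page_topics(page_text, topic_keywords, min_hits=1):
    """Return topics whose keywords occur at least `min_hits` times in the page."""
    words = page_text.lower().split()
    topics = []
    for topic, keywords in topic_keywords.items():
        hits = sum(1 for w in words if w in keywords)
        if hits >= min_hits:
            topics.append(topic)
    return topics


def relevant_ads(page_text, topic_keywords, ads):
    """An ad is relevant if any of its keywords belongs to one of the page's topics."""
    topics = page_topics(page_text, topic_keywords)
    topic_words = set().union(*(topic_keywords[t] for t in topics))
    return [ad for ad, kws in ads.items() if topic_words & set(kws)]
```

Note that an ad can match even when its keyword never appears on the page, as long as the keyword belongs to a topic the page was assigned; that indirection is what distinguishes topic-based matching from plain keyword matching.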


Journal Article
TL;DR: Using the rich profile data provided by the users, the analysis of Club Nexus was able to deduce the attributes contributing to the formation of friendships, and to determine how the similarity of users decays as the distance between them in the network increases.
Abstract: We present an analysis of Club Nexus, an online community at Stanford University. Through the Nexus site we were able to study a reflection of the real world community structure within the student body. We observed and measured social network phenomena such as the small world effect, clustering, and the strength of weak ties. Using the rich profile data provided by the users we were able to deduce the attributes contributing to the formation of friendships, and to determine how the similarity of users decays as the distance between them in the network increases. In addition, we found correlations between users' personalities and their other attributes, as well as interesting correspondences between how users perceive themselves and how they are perceived by others.

397 citations


12 Feb 2003
TL;DR: This work presents an approach to clustering based on the observation that "it is easier to criticize than to construct" and demonstrates semi-supervised clustering with a system that learns to cluster news stories from a Reuters data set.
Abstract: We present an approach to clustering based on the observation that "it is easier to criticize than to construct." Our approach of semi-supervised clustering allows a user to iteratively provide feedback to a clustering algorithm. The feedback is incorporated in the form of constraints, which the clustering algorithm attempts to satisfy on future iterations. These constraints allow the user to guide the clusterer toward clusterings of the data that the user finds more useful. We demonstrate semi-supervised clustering with a system that learns to cluster news stories from a Reuters data set.

293 citations
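The abstract above does not specify how user feedback becomes constraints. As a stand-in, this sketch uses a COP-KMeans-style scheme: feedback is encoded as hypothetical must-link and cannot-link pairs of point indices, and each point goes to the nearest center that violates no constraint. It illustrates constraint-guided clustering in general, not the paper's specific algorithm:

```python
import math


def _partner(pair, point):
    """Return the other member of `pair` if `point` is in it, else None."""
    a, b = pair
    return b if a == point else a if b == point else None


def violates(point, cluster, assignment, must_link, cannot_link):
    """True if placing `point` in `cluster` breaks a constraint with an already-assigned point."""
    for pair in must_link:
        other = _partner(pair, point)
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = _partner(pair, point)
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def constrained_assign(points, centers, must_link, cannot_link):
    """Assign each point to the nearest center that satisfies every constraint."""
    assignment = {}
    for i, p in enumerate(points):
        order = sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for c in order:
            if not violates(i, c, assignment, must_link, cannot_link):
                assignment[i] = c
                break
        else:
            raise ValueError(f"no feasible cluster for point {i}")
    return [assignment[i] for i in range(len(points))]


def constrained_kmeans(points, centers, must_link=(), cannot_link=(), iters=10):
    """Alternate constrained assignment and mean updates for a fixed number of iterations."""
    labels = []
    for _ in range(iters):
        labels = constrained_assign(points, centers, must_link, cannot_link)
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old)
        centers = new_centers
    return labels
```

Each round of user criticism would append pairs to `must_link` or `cannot_link` and rerun the clusterer, which matches the iterative loop the abstract describes.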


Patent
10 Apr 2003
TL;DR: An apparatus, system and method for providing multiple logical partitions in a system area network are provided, allowing multiple operating systems to share the resources of a single physical host channel adapter (HCA).
Abstract: An apparatus, system and method for providing multiple logical partitions in a system area network are provided. Logical partitioning support is provided for host channel adapters, which allows multiple operating systems to share the resources of a single physical host channel adapter (HCA). The apparatus, system and method ensure that each operating system is unaware that the HCA hardware resources are being shared with other operating systems, and further guarantee that the individual operating systems are prevented from accessing HCA hardware resources associated with other operating systems.

276 citations


Patent
Narayanan Shivakumar
24 Sep 2003
TL;DR: A client-side application (such as a browser, a browser plug-in, or a browser toolbar) is used to support the serving of content-relevant ads to the client device as mentioned in this paper.
Abstract: A client-side application (such as a browser, a browser plug-in, a browser toolbar plug-in, etc. on an end user's computer) is used to support the serving of content-relevant ads to the client device. The client-side application may provide such support by sending document information (such as a document identifier, document content, content relevance information, etc.) to a content ad server. The client-side application may also be used to combine content of the document and the content-relevant ads. For example, the client-side application may combine content of the document and the ads in a window (e.g., in a browser window), may provide the ads in a window above, below, adjacent to a document window, may provide the ads in “chrome” of the browser, etc.

252 citations


Patent
30 Dec 2003
TL;DR: Modular data centers with modular components suitable for use with rack or shelf mount computing systems are disclosed. The modular center includes a modular computing module comprising an intermodal shipping container and computing systems mounted within the container.
Abstract: Modular data centers with modular components suitable for use with rack or shelf mount computing systems, for example, are disclosed. The modular center generally includes a modular computing module, including an intermodal shipping container and computing systems mounted within the container and configured to be shipped and operated within the container, and a temperature control system for maintaining the air temperature surrounding the computing systems. The intermodal shipping container may be configured in accordance with International Organization for Standardization (ISO) container manufacturing standards or otherwise configured with respect to height, length, width, weight, and/or lifting points of the container for transport via an intermodal transport infrastructure. The modular design enables the modules to be cost-effectively built at a factory and easily transported to and deployed at a data center site.

246 citations


Journal Article
TL;DR: This work proposes a high-level battery model that allows a designer to analytically predict the battery time-to-failure for a given load and allows for a tradeoff between the accuracy and the amount of computation performed.
Abstract: A battery-powered portable electronic system shuts down once the battery is discharged; therefore, it is important to take the battery behavior into account. A system designer needs an adequate high-level battery model to make battery-aware decisions targeting the maximization of the system's online lifetime. We propose such a model that allows a designer to analytically predict the battery time-to-failure for a given load. Our model also allows for a tradeoff between the accuracy and the amount of computation performed. The quality of the proposed model is evaluated using typical pocket computer applications and a detailed low-level simulation of a lithium-ion electrochemical cell. In addition, we verify the proposed model against actual measurements taken on a real lithium-ion battery.

218 citations
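The abstract above does not give the model's equations. One well-known analytical battery model with exactly this accuracy-versus-computation knob is a diffusion-based model. Under a constant load I, the apparent charge lost by time t is sigma(t) = I * [t + 2 * sum_{m=1..M} (1 - exp(-beta^2 m^2 t)) / (beta^2 m^2)], and the battery fails when sigma reaches its capacity alpha. Truncating the series at M terms trades accuracy for computation. Treat this as an assumption about the paper's model. The function names and the bisection solver are illustrative, and the units only need to be consistent (e.g. mA, minutes, mA·min):

```python
import math


def charge_lost(t, current, beta, terms):
    """Apparent charge consumed by time t under a constant load, truncated to `terms` series terms."""
    series = sum(
        (1 - math.exp(-((beta * m) ** 2) * t)) / (beta * m) ** 2
        for m in range(1, terms + 1)
    )
    return current * (t + 2 * series)


def lifetime(alpha, current, beta, terms=10, tol=1e-9):
    """Time-to-failure L where charge_lost(L) reaches capacity alpha, found by bisection."""
    # charge_lost(t) >= current * t, so alpha / current is a valid upper bound on L.
    lo, hi = 0.0, alpha / current
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if charge_lost(mid, current, beta, terms) >= alpha:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

Raising `terms` makes each evaluation of `charge_lost` more expensive but closer to the full series, so the predicted lifetime tightens (it can only decrease as terms are added, since every term is non-negative).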


Patent
03 Nov 2003
TL;DR: In this paper, a system and method are presented for providing dynamic pay-for-placement advertisements via graphics-enabled email. The display of advertisements (24) is generated when the email newsletter is opened, so the advertisements displayed are based on rankings at the time the email is opened rather than when the email was generated and transmitted.
Abstract: A system and method for providing dynamic pay-for-placement advertisements (24) via graphics-enabled email that generates a display of advertisements (24) when the email newsletter is opened, so the advertisements (24) displayed are based on rankings at the time the email is opened instead of when the email was generated and transmitted. In one embodiment, a graphical-content email having one or more embedded advertisement (24) image references is provided to one or more email recipients (28c, 28d). The advertisement (24) image reference, in one embodiment, may include query string parameters indicating the context of the image reference and/or portion of the image reference (i.e., identifying the image reference as being part of a particular newsletter email), a position of the image reference in the email display, and the like. A URL reference also may be included with each advertisement (24) image reference (e.g., one URL for each advertisement (24) portion of the image to be retrieved by the advertising image reference).
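The key idea above is that the email carries only image references with query-string context (newsletter, position), and the ad is chosen when the image is fetched. A minimal sketch, assuming a hypothetical endpoint `https://ads.example.com/img`, hypothetical parameter names `nl` and `pos`, and "highest current bid wins the top slot" as the ranking rule:

```python
from urllib.parse import parse_qs, urlencode, urlparse

AD_SERVER = "https://ads.example.com/img"  # hypothetical ad-image endpoint


def ad_image_refs(newsletter_id, slots):
    """Build one image URL per ad slot; no ad is chosen when the email is sent."""
    return [
        f"{AD_SERVER}?{urlencode({'nl': newsletter_id, 'pos': pos})}"
        for pos in range(1, slots + 1)
    ]


def resolve_on_open(image_url, current_bids):
    """When the image is fetched (email opened), rank ads by *current* bids and fill the slot."""
    params = parse_qs(urlparse(image_url).query)
    pos = int(params["pos"][0])
    ranked = sorted(current_bids, key=lambda ad: -current_bids[ad])
    return ranked[pos - 1] if pos <= len(ranked) else None
```

If bids change between sending and opening, the same URL resolves to a different ad, which is the "rankings at the time the email is opened" behavior the abstract describes.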

Patent
Lawrence E. Page
30 Sep 2003
TL;DR: In this paper, the authors present a computer-implemented method and apparatus for searching in response to Internet-based search queries using a search engine and an electronic database, where data sets representing published items are input, for example, scanned-in or sent electronically and stored in a searchable database.
Abstract: The present invention is directed to a computer-implemented method and apparatus for searching in response to Internet-based search queries using a search engine and an electronic database. According to one example embodiment of the present invention, data sets representing published items are input, for example, scanned-in or sent electronically, and stored in a searchable database. Each data set includes text from at least one published item. Responsive to the search query, a search engine searches for and identifies relevant web pages and data sets representing published items and, in a more specific embodiment, ranked characterizations are returned for the relevant web pages and published items. An electronic path can be provided with the published item for accessing further information about the published item. In one embodiment, the electronic path is a hyperlink from a characterization of a relevant published item to a more complete electronic representation of the relevant published item. Publishers provide authorization to display copyrighted materials through a permission protocol.

Patent
15 Dec 2003
TL;DR: In this paper, a determination is made as to whether a state associated with a record includes at least one hold state and whether that state includes at least one retention period that has not expired.
Abstract: Provided are a method, system, and program for receiving a request to remove a record. A determination is made as to whether a state associated with the record includes at least one hold state and whether the state associated with the record includes at least one retention period that has not expired. The request to remove the record is denied in response to determining that the state associated with the record includes at least one hold state or at least one retention period that has not expired.
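The removal rule above reduces to a small predicate: deny if any hold is active or if the retention period has not yet expired. A sketch assuming a hypothetical `RecordState` with a set of active hold identifiers and an optional retention end time:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class RecordState:
    holds: set = field(default_factory=set)      # active hold identifiers (hypothetical)
    retention_until: Optional[datetime] = None   # end of the retention period, if any


def may_remove(state: RecordState, now: datetime) -> bool:
    """Allow removal only if no hold is active and any retention period has expired."""
    if state.holds:
        return False
    if state.retention_until is not None and now < state.retention_until:
        return False
    return True
```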

Proceedings Article
20 May 2003
TL;DR: A variety of algorithms were evaluated for finding news articles on the web that are relevant to news currently being broadcast, looking at the impact of inverse document frequency, stemming, compounds, history, and query length on the relevance and coverage of news articles returned in real time during a broadcast.
Abstract: Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can be treated as one such stream of text; in this paper we discuss finding news articles on the web that are relevant to news currently being broadcast. We evaluated a variety of algorithms for this problem, looking at the impact of inverse document frequency, stemming, compounds, history, and query length on the relevance and coverage of news articles returned in real time during a broadcast. We also evaluated several postprocessing techniques for improving the precision, including reranking using additional terms, reranking by document similarity, and filtering on document similarity. For the best algorithm, 84%-91% of the articles found were relevant, with at least 64% of the articles being on the exact topic of the broadcast. In addition, a relevant article was found for at least 70% of the topics.
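Two of the knobs evaluated above, inverse document frequency and query length, can be illustrated by building a search query from a window of recent broadcast text: weight each word by term frequency times IDF from a background corpus, then keep the top terms. This is a hedged sketch; stemming, compounds, and reranking are omitted, and treating unseen words as maximally rare is an assumption:

```python
import math
from collections import Counter


def idf_table(corpus):
    """Inverse document frequency of each term over a background corpus of documents."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc.lower().split()))
    return {t: math.log(n / c) for t, c in df.items()}


def build_query(transcript_window, idf, length=3, default_idf=None):
    """Pick the `length` highest tf-idf terms from the recent broadcast text (the "history")."""
    if default_idf is None:
        default_idf = max(idf.values(), default=0.0)  # unseen words treated as rare
    tf = Counter(transcript_window.lower().split())
    scored = {t: count * idf.get(t, default_idf) for t, count in tf.items()}
    # Highest score first; ties broken alphabetically for a stable query.
    return sorted(scored, key=lambda t: (-scored[t], t))[:length]
```

Common words such as "the" get an IDF of zero and never reach the query, which is the effect IDF weighting is meant to have.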

Patent
30 Oct 2003
TL;DR: In this article, a mobile station selects a preferred cell site for transmitting a frame of data to be sent subsequently to the mobile station, and a base station of that cell site schedules a transmission of the frame, with transmission parameters determined by the base station from recently measured channel and interference information.
Abstract: A mobile station (402) selects (1802) a preferred cell site for transmitting a frame of data to be sent subsequently to the mobile station. A base station (602) of the preferred cell site schedules (1804) a transmission of the frame of data, wherein parameters for the transmission are determined by the base station from recently-measured channel and interference information. Thereafter, the base station sends (1806) the frame of data from the preferred cell site; and an active set of base stations associated with the mobile station at ones of a plurality of cell sites synchronize (1808) their data queues to reflect the transmission of the frame of data.

Patent
12 Nov 2003
TL;DR: In this paper, the authors propose a location blocking service for use in a wireless network that tracks the location and identity of network users (such as networks complying with enhanced 911 standards). The service provides a network user with the ability to prevent the location of her wireless handheld device from being disclosed to parties other than the wireless network provider and PSAPs.
Abstract: The invention disclosed is a location blocking service for use in a wireless network that tracks the location and identity of network users, such as networks complying with enhanced 911 standards. The service provides a network user with the ability to prevent the location of her wireless handheld device from being disclosed to parties other than the wireless network provider and PSAPs (Public Safety Answering Points). The network user blocks the forwarding of location information by signaling to the wireless handheld device when the location information originates from the wireless handheld device, or by signaling to the network when the location information originates from the wireless handheld device or the network. Primary components of the present invention include at least one user interface and at least one location block processor provisioned in the wireless handheld device and/or the wireless network. The user interface prompts the user of the handheld device to enter the commands that send the signals to the device or network.

Patent
Jeremy Bem
30 Sep 2003
TL;DR: The number of ads potentially relevant to search query information may be increased by relaxing the notion of search query keyword matching as discussed by the authors, which may be done by expanding a set of ad request keywords to include both query keywords and related keywords.
Abstract: The number of ads potentially relevant to search query information may be increased by relaxing the notion of search query keyword matching. This may be done, for example, by expanding a set of ad request keywords to include both query keywords and related keywords. The scores of ads served pursuant to a relaxed notion of matching (those with keyword targeting criteria that matched words related to words in the search query, but not the words from the search query) may be discounted relative to the scores of ads served pursuant to a stricter notion of matching. This may be done by using a score modification parameter, such as an ad performance multiplier (when an ad score is a function of ad performance information). The score modification parameter may be updated to reflect observed performance data, such as that associated with {word-to-related word} mappings.
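The relaxed-matching scheme above can be sketched as: expand the query keywords with related words, give strict matches their full score, and discount related-word matches by a multiplier. The `related` map, the fixed `relaxed_multiplier=0.5`, and the `(keywords, base_score)` ad format are hypothetical; the abstract says the multiplier may instead be learned from observed performance, which this sketch does not do:

```python
def candidate_ads(query, ads, related, relaxed_multiplier=0.5):
    """Score ads against query keywords plus related keywords, discounting relaxed matches."""
    query_words = set(query.lower().split())
    expanded = set(query_words)
    for w in query_words:
        expanded |= related.get(w, set())
    scored = {}
    for ad, (keywords, base_score) in ads.items():
        if keywords & query_words:
            scored[ad] = base_score                        # strict match: full score
        elif keywords & expanded:
            scored[ad] = base_score * relaxed_multiplier   # related-word match: discounted
    return sorted(scored.items(), key=lambda kv: -kv[1])
```

In this setup an ad with a higher base score can still rank below a strictly matched ad once the discount is applied.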

Patent
Alexander Franz, Brian Milch, Eric Jackson, Jenny Zhou, Benjamin Jay Diament
04 Aug 2003
TL;DR: In this paper, a system and method for identifying language attributes through probabilistic analysis is described: a set of language classes and a plurality of training documents are defined, and each language class identifies a language and a character set encoding.
Abstract: A system and method for identifying language attributes through probabilistic analysis is described. A set of language classes and a plurality of training documents are defined. Each language class identifies a language and a character set encoding. Occurrences of one or more document properties within each training document are evaluated. For each language class, a probability for the document properties set conditioned on the occurrence of the language class is calculated. Byte occurrences within each training document are evaluated. For each language class, a probability for the byte occurrences conditioned on the occurrence of the language class is calculated.
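The byte-occurrence half of the method above resembles a naive Bayes classifier over bytes, with one class per (language, encoding) pair. A minimal sketch assuming unigram byte probabilities, add-one smoothing over all 256 byte values, and equal class priors; the document-property probabilities the abstract also mentions are omitted:

```python
import math
from collections import Counter


def train(training_docs):
    """training_docs: {language_class: [bytes, ...]} -> per-class byte log-probabilities."""
    model = {}
    for cls, docs in training_docs.items():
        counts = Counter(b for doc in docs for b in doc)  # iterating bytes yields ints 0..255
        total = sum(counts.values())
        # Add-one smoothing so bytes unseen in training keep a nonzero probability.
        model[cls] = [math.log((counts[b] + 1) / (total + 256)) for b in range(256)]
    return model


def classify(model, data):
    """Return the language class that maximizes the sum of byte log-probabilities."""
    return max(model, key=lambda cls: sum(model[cls][b] for b in data))
```

Because each class is a (language, encoding) pair, the same language in two encodings is learned as two distinct byte distributions, which is what lets the classifier report the character set as well as the language.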

Patent
24 Dec 2003
TL;DR: In this paper, the authors describe methods for viewing and responding to electronic messages, including a method of viewing a first electronic message that comprises: identifying an extraneous portion within a second electronic message (420, 430 and 440); eliding the extraneous portion within the second electronic message (450); and generating the first electronic message, wherein the first electronic message includes the second electronic message with its extraneous portion suppressed (460).
Abstract: Methods and apparatus are described for viewing and responding to electronic messages. In one embodiment, when an electronic message is displayed, a portion of the electronic message is elided to aid in the viewing experience (450). In one embodiment, a method of viewing a first electronic message, comprises: identifying an extraneous portion within a second electronic message (420, 430 and 440); eliding the extraneous portion within the second electronic message (450); and generating the first electronic message wherein the first electronic message includes the second electronic message with the extraneous portion of the second electronic message suppressed (460).
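The identify-then-elide steps above can be illustrated with one common heuristic: treat runs of lines beginning with `>` (quoted earlier messages) as the extraneous portion and collapse each run into a placeholder. The `>` rule and the placeholder text are assumptions for illustration; the patent may identify extraneous portions differently:

```python
def elide_quoted(message, marker="[... quoted text hidden ...]"):
    """Collapse each run of '>'-quoted lines into a single placeholder line."""
    out, in_quote = [], False
    for line in message.splitlines():
        if line.lstrip().startswith(">"):
            if not in_quote:
                out.append(marker)  # first line of a quoted run: emit one placeholder
                in_quote = True
        else:
            out.append(line)
            in_quote = False
    return "\n".join(out)
```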

Patent
29 Jan 2003
TL;DR: In this paper, a system and method for providing end-to-end security of content over a heterogeneous distribution chain is provided, where a content owner provides content to an aggregator that receives the content and processes the content.
Abstract: A system and method for providing end-to-end security of content over a heterogeneous distribution chain is provided. A content owner provides content to an aggregator that receives and processes the content. The processing may involve decrypting the content and associating at least one of a unique fingerprint and a watermark with the decrypted content. The unique fingerprint and watermark provide identifying characteristics to the content. Additional content-based fingerprints may be used to monitor the quality of the consumer experience for video and audio. The content may be sent to a client in a decrypted state or in an encrypted state. When the content is encrypted, the aggregator wraps and encrypts the content with a signature such that an end-to-end flow of the content may be determined. Application-level encryption is used to provide network/distribution medium transparency as well as persistent encryption. When the content is transmitted from one consumer to another, the transmitting consumer loses rights to the content.

Patent
Georges R. Harik, Noam Shazeer
03 Oct 2003
TL;DR: In this article, the authors present a system that characterizes a document with respect to clusters of conceptually related words, where candidate clusters are selected using a model that explains how sets of words are generated from clusters of related words.
Abstract: One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects 'candidate clusters' of conceptually related words that are related to the set of words (Figure 22, 2202). These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words (Figure 22, 2204). Next, the system constructs a set of components to characterize the document, wherein the set of components includes components for candidate clusters (Figure 22, 2206). Each component in the set of components indicates a degree to which a corresponding candidate cluster is related to the set of words (Figure 22, 2208).

Patent
Andrew M. H. Beattie
11 Apr 2003
TL;DR: In this paper, a method for disaster recovery includes copying at least a portion of information from a first database to a backup system as backup information and storing an incremental change in a second database.
Abstract: A method for disaster recovery includes copying at least a portion of information from a first database to a backup system as backup information. The method also includes storing an incremental change in a second database. The incremental change represents a change to at least a portion of the information in the first database. The method further includes restoring the first database using at least one of the backup information and the incremental change.
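The recovery scheme above (full backup plus separately stored incremental changes, combined at restore time) can be sketched with a dictionary standing in for the first database and an ordered list standing in for the second database of changes. The function names and the "`None` means delete" convention are hypothetical:

```python
import copy


def take_backup(db):
    """Copy the primary database to the backup system."""
    return copy.deepcopy(db)


def record_change(change_log, key, value):
    """Store an incremental change in the second database (value None means delete)."""
    change_log.append((key, value))


def restore(backup, change_log):
    """Rebuild the primary from the backup, then replay incremental changes in order."""
    db = copy.deepcopy(backup)
    for key, value in change_log:
        if value is None:
            db.pop(key, None)
        else:
            db[key] = value
    return db
```

Replaying changes in the order they were recorded matters: a later write to the same key must win, which is why the log is a list rather than a set.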

Patent
Úlfar Erlingsson
06 Jun 2003
TL;DR: In this paper, a secure application execution environment using derived user accounts (SAE DUA) for Internet content is described, in which a determination is made whether received content is trusted or untrusted.
Abstract: Methods and systems are disclosed for implementing a secure application execution environment using Derived User Accounts (SAE DUA) for Internet content. Content is received and a determination is made whether the received content is trusted or untrusted. If the content is untrusted, it is accessed in a protected derived user account (DUA), such as an SAE DUA; if the content is trusted, it is accessed in a regular DUA.

Patent
03 Jul 2003
TL;DR: In this article, a data storage device with a head, a ramp for unloading the head, and a load beam having a head support portion for supporting the head and a tab adapted to slide on a sliding surface of the ramp on a front end side with respect to the head support is presented.
Abstract: Embodiments in accordance with the present invention relate to suppressing deterioration in rigidity and dynamic characteristics of a tab even when an initial point of contact between a tab and a ramp during unloading is set closer to the outer periphery of a disk. A data storage device in one embodiment of the present invention includes a head, a ramp for unloading the head, a load beam having a head support portion for supporting the head and a tab adapted to slide on a sliding surface of the ramp on a front end side with respect to the head support portion, and an actuator having the load beam and adapted to actuate the head for loading and unloading. During unloading, a portion of the tab offset from the center in the transverse direction of the tab first comes into contact with the sliding surface of the ramp. The load beam has a flange that is bent and extends continuously from the front end portion of the tab up to both ends of the head support portion.

Patent
30 Sep 2003
TL;DR: In this paper, a frequency management scheme for a hybrid cellular/GPS or other device generates a local clock signal for the communications portion of the device, using a crystal oscillator or other part.
Abstract: A frequency management scheme for a hybrid cellular/GPS or other device generates a local clock signal for the communications portion of the device, using a crystal oscillator or other part. The oscillator output may be corrected by way of an automatic frequency control (AFC) circuit or software, to drive the frequency of that clock signal to a higher accuracy. Besides being delivered to the cellular or other communications portion of the hybrid device, the compensated clock signal may also be delivered to a comparator to measure the offset between the cellular oscillator and the GPS oscillator. The error in the cellular oscillator may be measured from the AFC operation in the cellular portion of the device. An undershoot or overshoot in the delta between the two oscillators may thus be deduced to be due to bias in the GPS oscillator, whose value may then be determined. That value may then be used to adjust Doppler search, bandwidth or other GPS receiver characteristics to achieve better time to first fix or other performance characteristics.
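The deduction step above is simple arithmetic once a sign convention is fixed. This sketch assumes the comparator reports delta = (GPS oscillator error) - (cellular clock error), both in parts per million relative to nominal, so the GPS bias is delta plus the cellular error known from AFC. It then converts the bias into a frequency offset at the GPS L1 carrier (1575.42 MHz) that could re-center the Doppler search. The sign convention and function names are assumptions; the abstract does not fix them:

```python
GPS_L1_HZ = 1575.42e6  # GPS L1 carrier frequency in Hz


def gps_oscillator_bias_ppm(measured_delta_ppm, cellular_error_ppm):
    """GPS oscillator error, given delta = gps_error - cellular_error (all in ppm)."""
    return measured_delta_ppm + cellular_error_ppm


def doppler_offset_hz(bias_ppm, carrier_hz=GPS_L1_HZ):
    """Apparent carrier-frequency offset caused by the oscillator bias."""
    return bias_ppm * 1e-6 * carrier_hz
```

For example, a measured delta of 1.5 ppm with a cellular error of -0.5 ppm gives a 1.0 ppm GPS bias, which at L1 shifts the search window by about 1575.42 Hz.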

Patent
05 May 2003
TL;DR: In this article, a communication system (100) provides in-band speaker arbitration in a multi-participant (111-114) communication session by use of RTP floor control messages (300) that include a speaker arbitration command embedded in a data packet header extension.
Abstract: A communication system (100) provides in-band speaker arbitration in a multi-participant (111-114) communication session by use of RTP floor control messages (300) that include a speaker arbitration command embedded in a data packet header extension.
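The abstract above does not give the message layout. This sketch carries a floor-control command in the generic RTP header extension of RFC 3550: the X bit is set in the fixed 12-byte header, followed by a 16-bit profile-defined identifier, a 16-bit length counted in 32-bit words, and the extension data. `FLOOR_PROFILE_ID`, the command codes, and the "command byte + speaker SSRC" payload are hypothetical:

```python
import struct

FLOOR_PROFILE_ID = 0xF100                          # hypothetical profile-defined identifier
FLOOR_REQUEST, FLOOR_GRANT, FLOOR_RELEASE = 1, 2, 3  # hypothetical command codes


def rtp_packet_with_floor_cmd(seq, timestamp, ssrc, payload_type, command, payload=b""):
    """Build an RTP packet whose header extension carries a floor-control command."""
    first_byte = (2 << 6) | (1 << 4)  # version 2, no padding, X (extension) bit set, CSRC count 0
    header = struct.pack("!BBHII", first_byte, payload_type & 0x7F,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    ext_data = struct.pack("!BxxxI", command, ssrc)  # 8 bytes = 2 words: command + speaker SSRC
    ext_header = struct.pack("!HH", FLOOR_PROFILE_ID, len(ext_data) // 4)
    return header + ext_header + ext_data + payload


def parse_floor_cmd(packet):
    """Return (command, speaker_ssrc) from the header extension, or None if absent."""
    first_byte = packet[0]
    if not first_byte & 0x10:
        return None  # no header extension present
    offset = 12 + 4 * (first_byte & 0x0F)  # skip fixed header and any CSRC entries
    profile_id, _length_words = struct.unpack_from("!HH", packet, offset)
    if profile_id != FLOOR_PROFILE_ID:
        return None
    command, speaker = struct.unpack_from("!BxxxI", packet, offset + 4)
    return command, speaker
```

Keeping the command in the header extension means floor-control signaling travels in-band with the media packets, and receivers that do not understand the extension can skip it using the length field.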

Patent
24 Jul 2003
TL;DR: In this article, an Internet distribution partner defines filters to be applied to ranked advertising listings provided by an advertisement listings provider; the provider applies the filters to the listings in its database and identifies and/or excludes matches depending on the characteristic specified.
Abstract: Methods and systems that allow an Internet distribution partner of an advertisement listings provider to receive filtered and masked listings for display on the website of the Internet distribution partner. The Internet distribution partner defines filters to be applied to ranked advertising listings provided by an advertising listing provider. The advertisement listings provider system applies the filter to the listings in its database and identifies matches and/or excludes matches depending on the characteristic specified. The advertisement listings provider may then send the Internet distribution partner advertisement listings based on the application of one or more filters selected by the distribution partner. Thus, the advertisement listings provider and the Internet distribution partner are able to generate additional revenue without risking the Internet distribution partner's valuable relationships with its exclusive advertisers and without jeopardizing the Internet advertising distribution partner's relationships with its end users.

Patent
Jeremy Bem
14 Nov 2003
TL;DR: In this article, the authors propose using a confidence factor of the ad targeting to adjust a performance threshold, such as click-through rate.
Abstract: If some aspect of serving or scoring an ad is subject to a performance (e.g., click-through rate, etc.) threshold, such a threshold may be adjusted using a confidence factor of the ad targeting used. For example, ads served pursuant to a more relaxed notion of match might have to meet a higher performance threshold (e.g., than the threshold applied to ads served pursuant to a stricter notion of match). Alternatively, or in addition, ads served pursuant to a stricter notion of match might be subject to a lower performance threshold (e.g., than the threshold applied to ads served pursuant to a more relaxed notion of match). Thus, in general, a performance threshold could increase as match confidence decreases, and/or a performance threshold could decrease as match confidence increases.
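The abstract above only requires the threshold to rise as match confidence falls. One simple way to get that monotonic behavior is to divide a base click-through-rate threshold by a confidence value in (0, 1]. The inverse scaling and the function names are assumptions for illustration:

```python
def adjusted_threshold(base_ctr_threshold, match_confidence):
    """Raise the CTR threshold as match confidence falls (confidence must be in (0, 1])."""
    if not 0 < match_confidence <= 1:
        raise ValueError("match_confidence must be in (0, 1]")
    return base_ctr_threshold / match_confidence


def passes(ad_ctr, base_ctr_threshold, match_confidence):
    """True if the ad's observed CTR meets the confidence-adjusted threshold."""
    return ad_ctr >= adjusted_threshold(base_ctr_threshold, match_confidence)
```

With a base threshold of 1% and confidence 0.5, a relaxed-match ad must reach 2% CTR, while a strict match (confidence 1.0) only needs 1%.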

Journal Article
TL;DR: In a comprehensive study using the Itsy pocket computer, the authors measure both total system power and power dissipated by individual subcircuits for representative workloads and suggest possible low-power design optimizations and power management strategies.
Abstract: In a comprehensive study using the Itsy pocket computer, the authors measure both total system power and power dissipated by individual subcircuits for representative workloads. The results suggest possible low-power design optimizations and power management strategies.

Patent
10 Feb 2003
TL;DR: In this paper, programmatically generated byte code insertion is used to perform run-time tracing of code that may encounter a wait during execution; the insertion is performed at load time and places byte codes before and after each located (potential) wait point.
Abstract: Methods, systems, computer program products, and methods of doing business whereby programmatically-generated byte code insertion is used to perform run-time tracing of code that potentially encounters a wait during execution. The byte code insertion is performed at load time, and inserts byte codes before and after a located (potential) wait point. The inserted byte code functions to gather execution statistics, such as a time stamp before invoking a located wait point and a time stamp after invoking the located wait point. Preferred embodiments allow this tracing to be selectively activated/deactivated.
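The patent above inserts byte codes at class-load time, typically in a JVM setting. Python has no direct equivalent, so this sketch swaps in a decorator as an analog: it wraps a potentially blocking call, records time stamps before and after it, and can be switched on or off at run time. `trace_wait_point`, `stats`, and `enabled` are hypothetical names, and this is not byte code instrumentation:

```python
import functools
import time


def trace_wait_point(stats, enabled=lambda: True):
    """Wrap a potentially blocking call and record how long each invocation waited."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not enabled():
                return fn(*args, **kwargs)   # tracing deactivated: call straight through
            start = time.perf_counter()      # time stamp before the wait point
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start  # time stamp after the wait point
                stats.setdefault(fn.__name__, []).append(elapsed)
        return wrapper
    return decorate
```

The `enabled` callable is checked on every call, so tracing can be toggled without re-wrapping the function, loosely mirroring the selective activation the abstract mentions.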