The Google file system

doi:10.1145/1165389.945450

Journal Article

Sanjay Ghemawat et al.

Vol. 37, Iss. 5, pp. 29-43

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real-world use.

Abstract:

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

Citations

Dissertation

Security features using a distributed file system

Rui Miguel Coelho Martins

TL;DR: Master's thesis in Information Security (Segurança Informática), presented to the Universidade de Lisboa through the Faculdade de Ciências, 2011.

Journal Article

Performance Evaluation of a Distributed File System with Locality-Aware Metadata Lookups

Nan Dun et al.

05 Oct 2011

IPSJ Online Transactions

TL;DR: Experimental results demonstrate that GMount achieves highly scalable metadata and I/O performance when data access locality is common, and that its performance is practical for routine data-intensive computing.

Dissertation

Étude des problèmes d'ordonnancement sur des plates-formes hétérogènes en modèle multi-port [Study of scheduling problems on heterogeneous platforms under the multi-port model]

Hejer Rejeb

TL;DR: The work in this thesis addresses scheduling problems on dynamic and heterogeneous computing platforms and relies on the "multi-port" communication model.

Patent

Modifying data resources within party-partitioned storage areas

Macksood Azmil et al.

TL;DR: A server system is presented that comprises a physically separate storage area for each of a plurality of parties, including a first and a second party, and a manager function for managing the storage.

Hadoop for Roboticists

Olivier Deiss et al.

TL;DR: This tutorial gives an overview of Hadoop, its capabilities, and its use in robotics, and explains setup, the design of MapReduce jobs, and troubleshooting.
