Journal Article

The Google file system

Sanjay Ghemawat, +2 more
Vol. 37, Iss. 5, pp. 29-43
TLDR
This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Abstract
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.


Citations
Dissertation

Security features using a distributed file system

TL;DR: Master's thesis in Information Security (Segurança Informática), presented to the Universidade de Lisboa through the Faculdade de Ciências, 2011.
Journal Article

Performance Evaluation of a Distributed File System with Locality-Aware Metadata Lookups

TL;DR: Experimental results demonstrate that GMount's metadata and I/O operations scale well when data access locality is common, and that its performance is adequate for routine data-intensive computing.
Dissertation

Étude des problèmes d'ordonnancement sur des plates-formes hétérogènes en modèle multi-port

Hejer Rejeb
TL;DR: The work in this thesis addresses scheduling problems on dynamic, heterogeneous computing platforms and relies on the "multi-port" communication model.
Patent

Modifying data resources within party-partitioned storage areas

TL;DR: In this article, a server system comprising a physically separate storage area for each of a plurality of respective parties including a first and second party, and a manager function for managing the storage is presented.

Hadoop for Roboticists

TL;DR: This tutorial gives an overview of Hadoop, its possibilities, and its use in robotics, and explains the setup, the design of MapReduce jobs, and troubleshooting.
References
Journal Article

A case for redundant arrays of inexpensive disks (RAID)

TL;DR: Five levels of RAID are introduced, with their relative cost/performance, and compared with an IBM 3380 and a Fujitsu Super Eagle.
Journal Article

Scale and performance in a distributed file system

TL;DR: Observations of a prototype implementation are presented; changes in the areas of cache validation, server process structure, name translation, and low-level storage representation are motivated; and Andrew's ability to scale gracefully is quantitatively demonstrated.
Proceedings Article

GPFS: A Shared-Disk File System for Large Computing Clusters

TL;DR: GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. The paper discusses how distributed locking and recovery techniques were extended to scale to large clusters.
Journal Article

Searching the Web: the public and their queries

TL;DR: It is found that most people use few search terms, modify few queries, view few Web pages, and rarely use advanced search features, and that the language of Web queries is distinctive.
Journal Article

Web search for a planet: The Google cluster architecture

TL;DR: Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software that achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.