Proceedings ArticleDOI

Towards global storage management and data placement

20 May 2001, pp. 184
TL;DR: This work has shown that it is possible to automatically design and configure a storage system of one or more disk arrays to meet a set of application requirements and to dynamically reconfigure as needs change, all without human intervention.
Abstract: As users and companies increasingly depend on shared, networked information services, we continue to see growth in data centers and service providers. This happens as services and servers are consolidated (for ease of management and reduced duplication), while also being distributed (for fault tolerance and to accommodate the global reach of customers). Since access to data is the lifeblood of any organization, a global storage system is a core element in such an infrastructure. Based on success in automatically managing local storage, we believe that the key attribute of such a system is the ability to flexibly adapt to a variety of application semantics and requirements as they arise and as they change over time. Our work has shown that it is possible to automatically design and configure a storage system of one or more disk arrays to meet a set of application requirements and to dynamically reconfigure as needs change, all without human intervention. Work on global data placement expands the scope of this system to a world of distributed data centers.
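The abstract's claim that a storage system can be designed and configured automatically from application requirements can be pictured as a constrained assignment problem. The sketch below is a minimal, hypothetical illustration, not the authors' algorithm: application "stores" with capacity and bandwidth requirements are greedily packed onto disk arrays, and a new array is added whenever no existing one can satisfy a store.

```python
# Minimal sketch (not the paper's actual solver): greedily assign application
# "stores" -- each with a capacity (GB) and bandwidth (MB/s) requirement --
# to disk arrays with fixed capacity and bandwidth budgets. Class and field
# names are illustrative; it assumes no single store exceeds one array's budget.
from dataclasses import dataclass, field

@dataclass
class Array:
    name: str
    capacity_gb: float
    bandwidth_mbs: float
    stores: list = field(default_factory=list)

    def fits(self, cap, bw):
        return self.capacity_gb >= cap and self.bandwidth_mbs >= bw

    def place(self, store, cap, bw):
        self.capacity_gb -= cap
        self.bandwidth_mbs -= bw
        self.stores.append(store)

def design(requirements, array_capacity=1000.0, array_bandwidth=200.0):
    """Return a list of arrays hosting every store; buy a new array when
    no existing one can satisfy a store's requirements."""
    arrays = []
    # Place the most demanding stores first so they are not stranded later.
    for store, cap, bw in sorted(requirements, key=lambda r: -(r[1] + r[2])):
        target = next((a for a in arrays if a.fits(cap, bw)), None)
        if target is None:
            target = Array(f"array{len(arrays)}", array_capacity, array_bandwidth)
            arrays.append(target)
        target.place(store, cap, bw)
    return arrays

if __name__ == "__main__":
    reqs = [("oltp-db", 400, 120), ("mail", 250, 40), ("web-logs", 600, 30)]
    for a in design(reqs):
        print(a.name, a.stores)
```

Reconfiguration as needs change would amount to re-running such a solver against the new requirements and migrating stores whose placement changed.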
Citations
Patent
27 Jun 2002
TL;DR: In this patent, a DataPath Engine coupled to a SAN provides automated storage provisioning between an application on a SAN-attached server and a data volume on a SAN-attached storage subsystem.
Abstract: A DataPath Engine coupled to a SAN provides automated storage provisioning between an application on a Storage Area Network (SAN) attached server and a data volume on a SAN attached storage subsystem. The apparatus provides a simple user interface that allows operators to use pre-created policies as criteria for selecting data paths that meet an organization's uptime and performance requirements. The apparatus uses pathing methodologies to select the optimal data path from the candidates by rating SAN state, uptime, performance, and other key factors. This apparatus allows an enterprise to more efficiently and effectively manage and monitor large, complex, distributed SANs.
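The patent's path selection, rating candidates by SAN state, uptime, performance, and other factors against pre-created policies, amounts to scoring candidate paths and picking the best. The sketch below is purely illustrative; the field names, weights, and scoring formula are assumptions, not taken from the patent.

```python
# Illustrative sketch only (the patent does not publish its algorithm):
# score candidate data paths by weighted criteria drawn from a policy and
# pick the highest-scoring one. All names and weights are assumptions.
def select_path(candidates, policy):
    """candidates: list of dicts with 'uptime', 'latency_ms', 'utilization'
    (0..1). policy: dict of weights. Returns the best candidate."""
    def score(path):
        return (policy["uptime"] * path["uptime"]
                - policy["latency"] * path["latency_ms"] / 100.0
                - policy["utilization"] * path["utilization"])
    return max(candidates, key=score)

paths = [
    {"name": "fabricA", "uptime": 0.999, "latency_ms": 2.0, "utilization": 0.7},
    {"name": "fabricB", "uptime": 0.995, "latency_ms": 1.2, "utilization": 0.3},
]
gold_policy = {"uptime": 10.0, "latency": 1.0, "utilization": 2.0}
print(select_path(paths, gold_policy)["name"])
```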

186 citations

01 Jan 2004
TL;DR: Clotho is presented, a storage block abstraction layer that allows transparent and automatic data versioning at the block level and provides mechanisms that can be used to build flexible higher-level version-management policies, ranging from keeping all data modifications to version capture triggered by timers or other system events.
Abstract: Recently, storage management has emerged as one of the main problems in building cost-effective storage infrastructures. One of the issues that contribute to the management complexity of storage systems is maintaining previous versions of data. Until now, such functionality has been implemented by high-level applications or at the filesystem level. However, many modern systems aim at higher scalability and do not employ such management entities as filesystems. In this paper we propose pushing the versioning functionality closer to the disk by taking advantage of modern, block-level storage devices. We present Clotho, a storage block abstraction layer that allows transparent and automatic data versioning at the block level. Clotho provides a set of mechanisms that can be used to build flexible higher-level version management policies that range from keeping all data modifications to version capturing triggered by timers or other system events. Overall, we find that our approach is promising in offloading significant management overhead and complexity from higher system layers to the disk itself and is a concrete step towards building self-managed storage devices. Our specific contributions are: (i) We implement Clotho as a new layer in the block I/O hierarchy in Linux and demonstrate that versioning can be performed at the block level in a transparent manner. (ii) We investigate the impact on I/O path performance overhead using both microbenchmarks as well as SPEC SFS V3.0 over two real filesystems, Ext2FS and ReiserFS. (iii) We examine techniques that reduce the memory and disk space required for metadata information.
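The core mechanism Clotho describes, transparent versioning beneath the filesystem at the block level, can be illustrated with a toy copy-on-write block device: writes never overwrite data in place, and capturing a version freezes the current logical-to-physical map. This sketch is not Clotho's implementation; all names are illustrative, and a policy layer could call capture_version from a timer or other event, as the abstract suggests.

```python
# Toy sketch of block-level copy-on-write versioning (not Clotho's code).
class VersionedBlockDevice:
    def __init__(self):
        self.store = {}        # physical block id -> data
        self.current = {}      # logical block -> physical block id
        self.versions = []     # frozen logical->physical maps, one per version
        self.next_phys = 0

    def write(self, logical, data):
        # Never overwrite in place: allocate a fresh physical block so that
        # captured versions keep pointing at the old data.
        phys = self.next_phys
        self.next_phys += 1
        self.store[phys] = data
        self.current[logical] = phys

    def read(self, logical, version=None):
        mapping = self.current if version is None else self.versions[version]
        return self.store.get(mapping.get(logical))

    def capture_version(self):
        # Could be triggered by a timer or another system event, as the
        # paper's policy layer suggests.
        self.versions.append(dict(self.current))
        return len(self.versions) - 1

dev = VersionedBlockDevice()
dev.write(0, b"v1")
snap = dev.capture_version()
dev.write(0, b"v2")
print(dev.read(0), dev.read(0, version=snap))   # b'v2' b'v1'
```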

69 citations


Cites background from "Towards global storage management and data placement"

  • ...Furthermore, the cost of storage administration is estimated at several times the purchase price of the storage hardware [2, 5, 7, 12, 33, 34, 36]....


Journal ArticleDOI
TL;DR: STICS is the first-ever "cache storage" for bridging the gap between SCSI and IP, making it possible to build an efficient SAN over IP, and it can significantly improve performance, reliability, and scalability over current iSCSI systems.

23 citations


Cites background from "Towards global storage management and data placement"

  • ...Online data storage doubles every 9 months [7] due to an ever-growing demand for networked information services [8, 25, 51]....


Proceedings ArticleDOI
18 Aug 2002
TL;DR: STICS (SCSI-To-IP cache storage), a novel storage architecture that couples reliable, high-speed data caching with low-overhead conversion between the SCSI and IP protocols, significantly improves performance over the current iSCSI system.
Abstract: Data storage plays an essential role in today's fast-growing data-intensive network services. iSCSI is one of the most recent standards that allow SCSI protocols to be carried over IP networks. However, the disparities between SCSI and IP prevent fast and efficient deployment of SANs (storage area networks) over IP. This paper introduces STICS (SCSI-To-IP cache storage), a novel storage architecture that couples reliable and high-speed data caching with low-overhead conversion between the SCSI and IP protocols. Through an efficient caching algorithm and localization of certain unnecessary protocol overheads, STICS significantly improves performance over the current iSCSI system. Furthermore, STICS can be used as a basic plug-and-play building block for data storage over IP. We have implemented a software STICS prototype on the Linux operating system. Numerical results using the popular PostMark benchmark program and an EMC trace have shown a dramatic performance gain over the current iSCSI implementation.
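The caching idea at the heart of STICS, absorbing SCSI requests in a local cache and converting them to IP traffic with low overhead, can be sketched as a write-back cache in front of a remote iSCSI target. The class below is a hedged illustration under assumed interfaces (remote_read/remote_write callables standing in for the IP transport), not the authors' code.

```python
# Hedged sketch of the caching idea behind STICS (not the authors' code):
# a SCSI-facing cache absorbs writes locally and destages them to an IP
# (iSCSI) target in the background, while reads hit the cache first.
from collections import OrderedDict

class SticsLikeCache:
    def __init__(self, remote_write, remote_read, capacity=1024):
        self.cache = OrderedDict()         # block -> data, in LRU order
        self.dirty = set()
        self.capacity = capacity
        self.remote_write = remote_write   # callable: ship a block over IP
        self.remote_read = remote_read     # callable: fetch a block over IP

    def write(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        self.dirty.add(block)
        self._evict_if_needed()

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.remote_read(block)
        self.cache[block] = data
        self._evict_if_needed()
        return data

    def destage(self):
        # Flush dirty blocks to the iSCSI target in one batch, hiding
        # per-request protocol overhead (the effect STICS aims for).
        for block in sorted(self.dirty):
            self.remote_write(block, self.cache[block])
        self.dirty.clear()

    def _evict_if_needed(self):
        while len(self.cache) > self.capacity:
            block, data = self.cache.popitem(last=False)
            if block in self.dirty:        # write back before dropping
                self.remote_write(block, data)
                self.dirty.discard(block)
```

A background timer or thread would call destage periodically; here that is left to the caller.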

15 citations


Additional excerpts

  • ...growing demand for networked information services [20, 41]....


Journal ArticleDOI
TL;DR: A checkpointed channel (CC) abstraction is proposed as an alternative model for storing and accessing content; the paper discusses the advantages of the new abstraction, the challenges it poses, and the way it fits within existing models for RIA development.
Abstract: Today's Rich Internet Application (RIA) technologies such as Ajax, Flex, or Silverlight are designed around the client-server paradigm and cannot easily take advantage of replication, publish-subscribe, or peer-to-peer mechanisms for better scalability or responsiveness. This is particularly true of storage: content is typically persisted in data centers and consumed via web services. We propose a checkpointed channel (CC) abstraction as an alternative model for storing and accessing content. CCs are architecture-agnostic: they could be implemented as web services, but also as replicated state machines running over peer-to-peer multicast protocols. They can seamlessly span across the data center boundaries, or live at the edge. They are a more natural way of consuming streaming content. CCs can store hierarchical documents with hyperlinks to other CCs, thus forming a web of interconnected CCs: a live, scalable information space. We discuss the advantages of the new abstraction, the challenges it poses, and the way it fits within existing models for RIA development.
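One way to read the CC abstraction is as an append-only update stream with periodic checkpoints, so a newly joining reader fetches the latest snapshot and replays only the updates that follow it. The interface below is an assumption made for illustration, not the paper's API.

```python
# Minimal sketch of a checkpointed channel as an append-only update stream
# with periodic state snapshots. The interface is assumed, not the paper's.
class CheckpointedChannel:
    def __init__(self):
        self.updates = []            # full update history
        self.checkpoints = []        # (update_index, state_snapshot)

    def append(self, update):
        self.updates.append(update)

    def checkpoint(self, reduce_fn, initial):
        # Fold every update so far into a snapshot of the channel state.
        state = initial
        for u in self.updates:
            state = reduce_fn(state, u)
        self.checkpoints.append((len(self.updates), state))

    def catch_up(self):
        """Return (snapshot, pending updates) for a newly joining reader."""
        if not self.checkpoints:
            return None, list(self.updates)
        idx, state = self.checkpoints[-1]
        return state, self.updates[idx:]

cc = CheckpointedChannel()
cc.append({"set": ("title", "draft")})
cc.checkpoint(lambda s, u: {**s, u["set"][0]: u["set"][1]}, {})
cc.append({"set": ("title", "final")})
print(cc.catch_up())   # ({'title': 'draft'}, [{'set': ('title', 'final')}])
```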

8 citations

References
Journal ArticleDOI
12 Nov 2000
TL;DR: In OceanStore, monitoring of usage patterns allows adaptation to regional outages and denial-of-service attacks; monitoring also enhances performance through pro-active movement of data.
Abstract: OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data. A prototype implementation is currently under development.
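The redundancy-plus-verification idea in the abstract (data replicated across untrusted servers and protected cryptographically) can be illustrated with a toy store that names objects by their content hash and accepts any replica that verifies against it. Real OceanStore uses erasure coding, encryption, and wide-area routing far beyond this sketch; every name here is illustrative.

```python
# Purely illustrative sketch of replication across untrusted servers with
# content-hash verification on read. Not OceanStore's design or code.
import hashlib, random

class ReplicatedStore:
    def __init__(self, servers, replication=3):
        self.servers = servers                 # list of dict "servers"
        self.replication = replication

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        for server in random.sample(self.servers, self.replication):
            server[digest] = data              # cache/replicate anywhere
        return digest                          # self-verifying object id

    def get(self, digest):
        for server in self.servers:            # any reachable replica will do
            data = server.get(digest)
            if data is not None and hashlib.sha256(data).hexdigest() == digest:
                return data
        raise KeyError("no valid replica found")

servers = [dict() for _ in range(5)]
store = ReplicatedStore(servers)
oid = store.put(b"persistent document")
print(store.get(oid))
```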

3,376 citations


"Towards global storage management a..." refers background in this paper

  • ...OceanStore [Kubiatowicz00] proposes an architecture for creating a persistent global store that relies on large numbers of encrypted replicas, distributed around the world, to provide security and availability....


Proceedings ArticleDOI
04 May 1997
TL;DR: A family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network, based on a special kind of hashing that is called consistent hashing.
Abstract: We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems.
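A minimal consistent-hash ring with virtual nodes illustrates the property the abstract describes: when a node is added or removed, only a small fraction of keys change owner. The hash function and virtual-node count below are choices of this sketch, not the paper's.

```python
# Classic consistent-hashing sketch: a hash ring with virtual nodes.
import bisect, hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self.ring = []               # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def lookup(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cacheA", "cacheB", "cacheC"])
before = ring.lookup("object-42")
ring.add("cacheD")                    # only a fraction of keys move
print(before, ring.lookup("object-42"))
```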

2,179 citations

Journal ArticleDOI
TL;DR: This paper shows that disconnected operation is feasible, efficient, and usable by describing its design and implementation in the Coda File System; the central idea is that caching of data, now widely used for performance, can also be exploited to improve availability.
Abstract: Disconnected operation is a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository. An important, though not exclusive, application of disconnected operation is in supporting portable computers. In this paper, we show that disconnected operation is feasible, efficient and usable by describing its design and implementation in the Coda File System. The central idea behind our work is that caching of data, now widely used for performance, can also be exploited to improve availability.
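Disconnected operation as described above can be sketched as a client cache that keeps serving reads when the server is unreachable, logs updates locally, and replays the log on reconnection. Conflict detection and hoarding policy, which Coda handles carefully, are omitted; the names are illustrative.

```python
# Sketch of disconnected operation: cached copies satisfy reads offline,
# updates are logged and replayed on reconnection. Not Coda's implementation.
class DisconnectedClient:
    def __init__(self, server):
        self.server = server          # dict standing in for the file server
        self.cache = {}
        self.replay_log = []
        self.connected = True

    def read(self, path):
        if self.connected:
            self.cache[path] = self.server[path]    # refresh the cached copy
        # A miss while disconnected would fail, which is why Coda hoards.
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        if self.connected:
            self.server[path] = data
        else:
            self.replay_log.append((path, data))    # defer until reconnection

    def reconnect(self):
        self.connected = True
        for path, data in self.replay_log:          # reintegration
            self.server[path] = data
        self.replay_log.clear()

server = {"/doc.txt": b"v1"}
client = DisconnectedClient(server)
client.read("/doc.txt")
client.connected = False
client.write("/doc.txt", b"v2")          # queued while disconnected
client.reconnect()
print(server["/doc.txt"])                # b'v2'
```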

1,214 citations

Proceedings ArticleDOI
01 Oct 1997
TL;DR: The paper describes the design of Odyssey, a prototype implementing application-aware adaptation, shows how it supports concurrent execution of diverse mobile applications, and identifies agility as a key attribute of adaptive systems.
Abstract: In this paper we show that application-aware adaptation, a collaborative partnership between the operating system and applications, offers the most general and effective approach to mobile information access. We describe the design of Odyssey, a prototype implementing this approach, and show how it supports concurrent execution of diverse mobile applications. We identify agility as a key attribute of adaptive systems, and describe how to quantify and measure it. We present the results of our evaluation of Odyssey, indicating performance improvements up to a factor of 5 on a benchmark of three applications concurrently using remote services over a network with highly variable bandwidth.
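Application-aware adaptation as described above splits responsibility: the system monitors resources and issues upcalls, while each application decides how to change its own fidelity. The tiny sketch below assumes a bandwidth monitor and registration API invented for illustration, not Odyssey's actual interface.

```python
# Illustrative sketch of application-aware adaptation: the system notifies
# an application when a resource leaves its requested window, and the
# application picks its own fidelity. The API is an assumption.
class AdaptationManager:
    def __init__(self):
        self.registrations = []   # (low, high, callback)

    def register(self, low_kbps, high_kbps, callback):
        self.registrations.append((low_kbps, high_kbps, callback))

    def resource_changed(self, bandwidth_kbps):
        for low, high, callback in self.registrations:
            if not (low <= bandwidth_kbps <= high):
                callback(bandwidth_kbps)   # upcall: application must adapt

def video_player_adapt(bandwidth_kbps):
    # The application, not the OS, decides what fidelity means for it.
    quality = "high" if bandwidth_kbps > 500 else "low"
    print(f"video player switching to {quality}-quality stream")

mgr = AdaptationManager()
mgr.register(500, 10_000, video_player_adapt)
mgr.resource_changed(120)      # bandwidth dropped below the requested window
```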

827 citations

Proceedings ArticleDOI
03 Dec 1995
TL;DR: This paper shows how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism and to dynamically allocate file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses.
Abstract: The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-intensive applications. In particular, we show how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism and to allocate dynamically file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses. Our approach estimates the impact of alternative buffer allocations on application execution time and applies a cost-benefit analysis to allocate buffers where they will have the greatest impact. We implemented informed prefetching and caching in DEC's OSF/1 operating system and measured its performance on a 150 MHz Alpha equipped with 15 disks running a range of applications including text search, 3D scientific visualization, relational database queries, speech recognition, and computational chemistry. Informed prefetching reduces the execution time of the first four of these applications by 20% to 87%. Informed caching reduces the execution time of the fifth application by up to 30%.
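The cost-benefit allocation described above can be caricatured as ranking competing uses of each buffer by their estimated reduction in stall time and granting buffers to the highest-value uses. The numbers and names below are made up for illustration; the paper derives its estimates from a detailed system model.

```python
# Toy cost-benefit sketch in the spirit of informed prefetching/caching:
# each candidate use of a buffer gets an estimated stall-time benefit, and
# buffers go to the highest-value uses first. Value estimates are invented.
def allocate_buffers(candidates, nbuffers):
    """candidates: list of (name, estimated_benefit_ms_per_buffer).
    Returns the subset of uses that receive a buffer."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [name for name, _ in ranked[:nbuffers]]

candidates = [
    ("prefetch hinted block 17", 4.0),   # avoids a predicted stall
    ("cache hinted block 3 for reuse", 2.5),
    ("keep LRU block 99", 0.8),          # low expected reuse
]
print(allocate_buffers(candidates, nbuffers=2))
```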

770 citations