Proceedings Article

A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services

TL;DR
Experimental results show that the novel cache system, built on top of the Hadoop Distributed File System, can store files with a wide range of sizes and achieves millisecond-level access performance in highly concurrent environments.
Abstract
Improving file access performance is a major challenge for real-time cloud services. In this paper, we analyze the preconditions for addressing this problem in terms of requirements, hardware, software, and the network environment in the cloud. We then describe the design and implementation of a novel distributed layered cache system built on top of the Hadoop Distributed File System (HDFS), named the HDFS-based Distributed Cache System (HDCache). The cache system consists of a client library and multiple cache services. The cache services are designed with three access layers: an in-memory cache, a snapshot of the local disk, and the actual disk view provided by HDFS. Files loaded from HDFS are cached in shared memory, which the client library can access directly. Multiple applications, each integrating the client library, can access a cache service simultaneously. Cache services are organized in a peer-to-peer (P2P) style using a distributed hash table (DHT). Every cached file has three replicas on different cache service nodes to improve robustness and alleviate the workload. Experimental results show that the novel cache system can store files with a wide range of sizes and achieves millisecond-level access performance in highly concurrent environments.
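The three access layers described in the abstract imply a read path that falls through from memory, to the local disk snapshot, to HDFS. The following is a minimal sketch of that fall-through under stated assumptions, not HDCache's implementation: the class `LayeredCache`, the LRU eviction policy, the snapshot file naming, and the `fetch_from_hdfs` callable (a stand-in for a real HDFS client) are all hypothetical, and a plain in-process dictionary replaces the shared memory that HDCache uses.

```python
from collections import OrderedDict
from pathlib import Path
import tempfile
from typing import Callable, Dict


class LayeredCache:
    """Hypothetical read path: memory -> local disk snapshot -> HDFS (not HDCache's code)."""

    def __init__(self, snapshot_dir: Path, fetch_from_hdfs: Callable[[str], bytes],
                 memory_capacity: int = 128) -> None:
        self.memory: "OrderedDict[str, bytes]" = OrderedDict()  # kept in LRU order
        self.memory_capacity = memory_capacity  # counted in files, not bytes (assumption)
        self.snapshot_dir = snapshot_dir
        self.snapshot_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_from_hdfs = fetch_from_hdfs  # hypothetical stand-in for an HDFS client
        self.hits: Dict[str, int] = {"memory": 0, "disk": 0, "hdfs": 0}

    def _snapshot_path(self, path: str) -> Path:
        # Flatten the HDFS path into one file name inside the snapshot directory.
        return self.snapshot_dir / path.strip("/").replace("/", "__")

    def _remember(self, path: str, data: bytes) -> None:
        self.memory[path] = data
        self.memory.move_to_end(path)
        if len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)  # evict the least recently used file

    def read(self, path: str) -> bytes:
        # Layer 1: in-memory cache.
        if path in self.memory:
            self.memory.move_to_end(path)
            self.hits["memory"] += 1
            return self.memory[path]
        snapshot = self._snapshot_path(path)
        if snapshot.exists():
            # Layer 2: snapshot on the local disk.
            data = snapshot.read_bytes()
            self.hits["disk"] += 1
        else:
            # Layer 3: authoritative copy in HDFS; keep a disk snapshot for next time.
            data = self.fetch_from_hdfs(path)
            snapshot.write_bytes(data)
            self.hits["hdfs"] += 1
        self._remember(path, data)
        return data


def demo() -> Dict[str, int]:
    """Read two files through a cache that holds only one file in memory."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = LayeredCache(Path(tmp), lambda p: ("contents of " + p).encode(),
                             memory_capacity=1)
        cache.read("/logs/a.txt")  # miss in memory and on disk -> HDFS
        cache.read("/logs/a.txt")  # memory hit
        cache.read("/logs/b.txt")  # HDFS; evicts a.txt from memory
        cache.read("/logs/a.txt")  # a.txt is still in the disk snapshot
        return dict(cache.hits)
```

With a memory capacity of one file, `demo()` exercises every layer: a file evicted from memory is served from its disk snapshot rather than being fetched from HDFS again.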

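The abstract states that files loaded from HDFS are placed in shared memory that the client library reads directly. The paper does not say which shared-memory API HDCache uses, so the sketch below substitutes Python's `multiprocessing.shared_memory` module (Python 3.8+) purely as an illustration; the functions `publish` and `read_shared`, the segment naming scheme, and the 8-byte length header are hypothetical. The length is stored explicitly because the Python documentation notes that some platforms may allocate a segment larger than the size requested.

```python
import uuid
from multiprocessing import shared_memory

HEADER = 8  # bytes reserved for the payload length (assumed layout)


def publish(name: str, data: bytes) -> shared_memory.SharedMemory:
    """Cache-service side: copy a file's bytes into a new named segment."""
    segment = shared_memory.SharedMemory(name=name, create=True, size=HEADER + len(data))
    segment.buf[:HEADER] = len(data).to_bytes(HEADER, "little")
    segment.buf[HEADER:HEADER + len(data)] = data
    return segment  # the service keeps this handle and later calls close()/unlink()


def read_shared(name: str) -> bytes:
    """Client-library side: attach to the segment by name and copy the file out."""
    segment = shared_memory.SharedMemory(name=name)
    try:
        length = int.from_bytes(segment.buf[:HEADER], "little")
        return bytes(segment.buf[HEADER:HEADER + length])
    finally:
        segment.close()  # detach only; the owning service decides when to unlink


def demo(payload: bytes = b"cached file bytes") -> bytes:
    """Run both sides in one process for illustration; normally they are separate processes."""
    name = "hdc_" + uuid.uuid4().hex[:12]  # hypothetical naming scheme
    segment = publish(name, payload)
    try:
        return read_shared(name)
    finally:
        segment.close()
        segment.unlink()
```

Because the client attaches by name, several application processes can read the same cached file without the service sending the bytes over a socket.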
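The abstract says cache services are organized peer to peer with a distributed hash table and that every file is kept on three different nodes, but it does not specify the DHT algorithm. The sketch below assumes consistent hashing with virtual nodes, a common DHT placement scheme that is not necessarily HDCache's: a file path is hashed onto a ring, and the first three distinct nodes found clockwise hold its replicas. `HashRing`, the MD5-based ring position, and the 64 virtual nodes per server are illustrative choices.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def ring_position(key: str) -> int:
    """Map a key to a 64-bit position on the ring (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring placing each file on `replicas` distinct nodes."""

    def __init__(self, nodes: Iterable[str], replicas: int = 3, vnodes: int = 64) -> None:
        self.replicas = replicas
        # Each physical node appears `vnodes` times to even out the key distribution.
        self._ring: List[Tuple[int, str]] = sorted(
            (ring_position(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._node_count = len({node for _, node in self._ring})

    def nodes_for(self, path: str) -> List[str]:
        """Walk clockwise from the path's position and collect distinct nodes."""
        wanted = min(self.replicas, self._node_count)
        chosen: List[str] = []
        if wanted == 0:
            return chosen
        start = bisect.bisect_right(self._positions, ring_position(path))
        for step in range(len(self._ring)):
            _, node = self._ring[(start + step) % len(self._ring)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == wanted:
                    break
        return chosen
```

With consistent hashing, adding or removing a cache node moves only the files whose ring arcs change, and a client can spread reads for one file across its three replica holders.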

Citations
Journal Article

A Survey on Network Methodologies for Real-Time Analytics of Massive IoT Data and Open Research Issues

TL;DR: The state of the art in analytics network methodologies suitable for real-time IoT analytics is reviewed, and a number of prospective research problems and future research directions are presented, focusing on the network methodologies for real-time IoT analytics.
Journal Article

Large-scale data mining using genetics-based machine learning

TL;DR: Different classes of methods that, alone or (in many cases) combined, accelerate genetics-based machine learning methods are reviewed.
Patent

Prioritizing data requests based on quality of service

TL;DR: A method of prioritizing data requests in a computing system based on quality of service includes identifying a plurality of data requests, prioritizing them, and assigning cache memory to each request based on the prioritization.
Proceedings Article

AutoReplica: Automatic data replica manager in distributed caching and data processing systems

TL;DR: This paper proposes AutoReplica, a complete replica-management solution for distributed caching and data processing systems with SSD-HDD tiered storage, and a migrate-on-write technique called “fusion cache” that seamlessly migrates and prefetches data among local and remote replicas without pausing the subsystem.
Proceedings Article

Analytical review on Hadoop Distributed file system

TL;DR: A step-by-step introduction to data management using a file system and using an RDBMS, followed by the need for the Hadoop Distributed File System and how it works.