Showing papers by "Cloudera" published in 2012


Patent
03 Aug 2012
TL;DR: This patent describes centralized configuration and monitoring of a distributed computing cluster: a cluster-wide, real-time view of the running services and the status of the host machines, a single, central place to enact configuration changes across the cluster, and reporting and diagnostic tools to optimize cluster performance and utilization.
Abstract: Systems and methods for centralized configuration and monitoring of a distributed computing cluster are disclosed. One embodiment of the disclosed technology enables deployment and central operation of a complete Hadoop stack. The application automates the installation process and reduces deployment time from weeks to minutes. One embodiment further provides a cluster-wide, real-time view of the services running and the status of the host machines in a cluster, along with a single, central place to enact configuration changes across the computing cluster, and further incorporates reporting and diagnostic tools to optimize cluster performance and utilization.

108 citations


Proceedings ArticleDOI
20 May 2012
TL;DR: This paper finds that conventional workflow management tools lack at least one of four required qualities (Scalability, Security, Multi-tenancy, and Operability) and therefore presents Apache Oozie, a workflow management system specialized for Hadoop.
Abstract: Hadoop is a massively scalable parallel computation platform capable of running hundreds of jobs concurrently, and many thousands of jobs per day. Managing all these computations demands a workflow and scheduling system. In this paper, we identify four indispensable qualities that a Hadoop workflow management system must fulfill, namely Scalability, Security, Multi-tenancy, and Operability. We find that conventional workflow management tools lack at least one of these qualities, and therefore present Apache Oozie, a workflow management system specialized for Hadoop. We discuss the architecture of Oozie, share our production experience over the last few years at Yahoo, and evaluate Oozie's scalability and performance.

101 citations


Patent
Todd Lipcon
21 Mar 2012
TL;DR: This patent describes methods of data processing performance enhancement in which an I/O component invokes operating system calls to optimize cache management: proactively triggering readaheads for sequential read requests of a disk, purging data out of the buffer cache after writing to the disk or performing sequential reads from the disk, and eliminating the delay between a write and the flush of its data from the buffer cache to the disk.
Abstract: Systems and methods of data processing performance enhancement are disclosed. One embodiment includes invoking operating system calls to optimize cache management by an I/O component, wherein the operating system calls are invoked to perform one or more of: proactive triggering of readaheads for sequential read requests of a disk; purging data out of the buffer cache after writing to the disk or performing sequential reads from the disk; and/or eliminating a delay between when a write is performed and when written data from the write is flushed to the disk from the buffer cache.

10 citations


Proceedings ArticleDOI
Matthew Jacobs
21 Sep 2012
TL;DR: This talk is about building monitoring and diagnostics tools for Hadoop and some of the interesting challenges and lessons learned in the process.
Abstract: This talk is about building monitoring and diagnostics tools for Hadoop and some of the interesting challenges and lessons learned in the process.

1 citation