Author

Hairong Kuang

Bio: Hairong Kuang is an academic researcher from Yahoo!. The author has contributed to research on enterprise data management and distributed file systems. The author has an h-index of 1 and has co-authored 1 publication receiving 4,572 citations.

Papers

Proceedings Article•DOI•

The Hadoop Distributed File System

Konstantin Shvachko¹, Hairong Kuang¹, Sanjay Radia¹, Robert J. Chansler¹•Institutions (1)

Yahoo!¹

03 May 2010

TL;DR: The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.

Abstract: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.

5,005 citations

Cited by

Journal Article•DOI•

Edge Computing: Vision and Challenges

Weisong Shi¹, Jie Cao¹, Quan Zhang¹, Youhuizi Li¹, Lanyu Xu¹•Institutions (1)

Wayne State University¹

09 Jun 2016-IEEE Internet of Things Journal

TL;DR: The definition of edge computing is introduced, followed by several case studies, ranging from cloud offloading to smart home and city, as well as collaborative edge, to materialize the concept of edge computing.

Abstract: The proliferation of Internet of Things (IoT) and the success of rich cloud services have pushed the horizon of a new computing paradigm, edge computing, which calls for processing the data at the edge of the network. Edge computing has the potential to address the concerns of response time requirement, battery life constraint, bandwidth cost saving, as well as data safety and privacy. In this paper, we introduce the definition of edge computing, followed by several case studies, ranging from cloud offloading to smart home and city, as well as collaborative edge to materialize the concept of edge computing. Finally, we present several challenges and opportunities in the field of edge computing, and hope this paper will gain attention from the community and inspire more research in this direction.

5,198 citations

Journal Article•DOI•

The rise of big data on cloud computing

Ibrahim Abaker Targio Hashem¹, Ibrar Yaqoob¹, Nor Badrul Anuar¹, Salimah Binti Mokhtar¹, Abdullah Gani¹, Samee U. Khan²•Institutions (2)

Information Technology University¹, North Dakota State University²

01 Jan 2015-Information Systems

TL;DR: The definition, characteristics, and classification of big data along with some discussions on cloud computing are introduced, and research challenges are investigated, with focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.

2,141 citations

Proceedings Article•DOI•

Apache Hadoop YARN: yet another resource negotiator

Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas¹, Sharad Agarwal, Mahadev Konar, Robert Evans², Thomas Graves², Jason Lowe², Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino¹, Owen O'Malley, Sanjay Radia, Benjamin Reed³, Eric Baldeschwieler•Institutions (3)

Microsoft¹, Yahoo!², Facebook³

01 Oct 2013

TL;DR: The design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, are summarized; YARN decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.

Abstract: The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agora---the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.

2,006 citations

Proceedings Article•

In search of an understandable consensus algorithm

Diego Ongaro¹, John Ousterhout¹•Institutions (1)

Stanford University¹

19 Jun 2014

TL;DR: Raft is a consensus algorithm for managing a replicated log that separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered.

Abstract: Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Results from a user study demonstrate that Raft is easier for students to learn than Paxos. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety.

1,811 citations

Journal Article•DOI•

A review of clustering techniques and developments

Amit Saxena¹, Mukesh Prasad², Akshansh Gupta³, Neha Bharill⁴, Om Prakash Patel⁴, Aruna Tiwari⁴, Meng Joo Er⁵, Weiping Ding⁶, Chin-Teng Lin²•Institutions (6)

Guru Ghasidas University¹, University of Technology, Sydney², Jawaharlal Nehru University³, Indian Institute of Technology Indore⁴, Nanyang Technological University⁵, Nantong University⁶

06 Dec 2017-Neurocomputing

TL;DR: The applications of clustering in fields such as image segmentation, object and character recognition, and data mining are highlighted, and the approaches used in these methods are discussed along with their respective state of the art and applicability.

745 citations
