Journal Article

Hive: a warehousing solution over a map-reduce framework

Abstract
The size of data sets being collected and analyzed in industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation that is being used as an alternative to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs that are hard to maintain and reuse.


Citations
Proceedings Article

The Hadoop Distributed File System

TL;DR: The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.
Journal Article

Experimental evidence of massive-scale emotional contagion through social networks

TL;DR: The results indicate that emotions expressed by others on Facebook influence users' own emotions, constituting experimental evidence for massive-scale contagion via social networks, and suggest that the observation of others' positive experiences constitutes a positive experience for people.
Journal Article

Big Data: A Survey

TL;DR: The background and state-of-the-art of big data are reviewed, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid, as well as related technologies.
Journal Article

The rise of big data on cloud computing

TL;DR: The definition, characteristics, and classification of big data along with some discussions on cloud computing are introduced, and research challenges are investigated, with focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.
Journal Article

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

TL;DR: This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.
References
Proceedings Article

A comparison of approaches to large-scale data analysis

TL;DR: A benchmark consisting of a collection of tasks that are run on an open source version of MR as well as on two parallel DBMSs shows a dramatic performance difference between the two paradigms.
Journal Article

SCOPE: easy and efficient parallel processing of massive data sets

TL;DR: Presents SCOPE (Structured Computations Optimized for Parallel Execution), a new declarative and extensible scripting language targeted at massive data analysis and designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters.