Hive: a warehousing solution over a map-reduce framework

doi:10.14778/1687553.1687609

Journal Article

Ashish Thusoo and 8 co-authors

Vol. 2, Issue 2, pp. 1626-1629



Abstract:

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse.
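To make the abstract's point about the low-level programming model concrete, here is a minimal, hypothetical sketch of the map, shuffle, and reduce steps a developer would hand-code just to count records per key. A declarative query language expresses the same computation as a single `GROUP BY`. The sketch simulates the phases in one Python process. It is not Hadoop's API (Hadoop jobs are typically written against its Java MapReduce classes), and the record layout and function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input: (user_id, page_url) click records.
Record = tuple[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (key, 1) for every record, keyed by user_id."""
    for user_id, _page_url in records:
        yield user_id, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (the framework does this in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_per_user(records: Iterable[Record]) -> dict[str, int]:
    """Wire the three phases together, as a hand-written job would."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    clicks = [("alice", "/home"), ("bob", "/search"), ("alice", "/cart")]
    print(count_per_user(clicks))  # {'alice': 2, 'bob': 1}
```

Even this toy job needs three separately written functions plus the glue between them. A real cluster job adds serialization, job configuration, and deployment, which illustrates why such custom programs are hard to maintain and reuse.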

Citations


Proceedings Article

The Hadoop Distributed File System

Konstantin Shvachko and 3 co-authors

TL;DR: The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.


Journal Article

Experimental evidence of massive-scale emotional contagion through social networks

Adam D. I. Kramer and 2 co-authors

17 Jun 2014

Proceedings of the National Academy of S...

TL;DR: The results indicate that emotions expressed by others on Facebook influence people's own emotions, constituting experimental evidence for massive-scale contagion via social networks, and suggest that observing others' positive experiences constitutes a positive experience for people.


Journal Article

Big Data: A Survey

Min Chen and 2 co-authors

01 Apr 2014

Mobile Networks and Applications

TL;DR: The background and state of the art of big data are reviewed, along with related technologies and application areas such as enterprise management, the Internet of Things, online social networks, medical applications, collective intelligence, and the smart grid.


Journal Article

The rise of big data on cloud computing

Ibrahim Abaker Targio Hashem and 5 co-authors

01 Jan 2015

Information Systems

TL;DR: The definition, characteristics, and classification of big data are introduced, along with a discussion of cloud computing, and research challenges are investigated with a focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.


Journal Article

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

Han Hu and 3 co-authors

24 Jun 2014

IEEE Access

TL;DR: This paper presents a systematic framework that decomposes big data systems into four sequential modules (data generation, data acquisition, data storage, and data analytics) and introduces the prevalent Hadoop framework for addressing big data challenges.


References


Proceedings Article

A comparison of approaches to large-scale data analysis

Andrew Pavlo and 6 co-authors

TL;DR: A benchmark consisting of a collection of tasks, run on an open-source version of MapReduce (MR) as well as on two parallel DBMSs, shows a dramatic performance difference between the two paradigms.


Journal Article

SCOPE: easy and efficient parallel processing of massive data sets

Ronnie Chaiken and 6 co-authors

TL;DR: A new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), is presented for massive data analysis; it is designed for ease of use with no explicit parallelism while remaining amenable to efficient parallel execution on large clusters.


Related Papers (5)

MapReduce: simplified data processing on large clusters

Pig latin: a not-so-foreign language for data processing

Dryad: distributed data-parallel programs from sequential building blocks

Spark: cluster computing with working sets

The Hadoop Distributed File System