Journal Article
Hive: a warehousing solution over a map-reduce framework
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy
Vol. 2, Iss. 2, pp. 1626-1629
TL;DR: Hadoop is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware.
Abstract:
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse.
Citations
Proceedings Article
The Hadoop Distributed File System
TL;DR: The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.
Journal Article
Experimental evidence of massive-scale emotional contagion through social networks
TL;DR: The results indicate that emotions expressed by others on Facebook influence users' own emotions, constituting experimental evidence for massive-scale contagion via social networks, and suggest that the observation of others' positive experiences constitutes a positive experience for people.
Journal Article
Big Data: A Survey
Min Chen, Shiwen Mao, Yunhao Liu
TL;DR: The background and state-of-the-art of big data are reviewed, including applications in enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid, as well as related technologies.
Journal Article
The rise of big data on cloud computing
Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Binti Mokhtar, Abdullah Gani, Samee U. Khan
TL;DR: The definition, characteristics, and classification of big data along with some discussions on cloud computing are introduced, and research challenges are investigated, with focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.
Journal Article
Toward Scalable Systems for Big Data Analytics: A Technology Tutorial
TL;DR: This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.
References
Proceedings Article
A comparison of approaches to large-scale data analysis
Andrew Pavlo, Erik S. Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker
TL;DR: A benchmark consisting of a collection of tasks that are run on an open source version of MR as well as on two parallel DBMSs shows a dramatic performance difference between the two paradigms.
Journal Article
SCOPE: easy and efficient parallel processing of massive data sets
Ronnie Chaiken, Bob Jenkins, Per-Ake Larson, Bill Ramsey, Darren A. Shakib, Simon Weaver, Jingren Zhou
TL;DR: A new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), targeted for this type of massive data analysis, designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters.