scispace - formally typeset
Topic

Programming with Big Data in R

About: Programming with Big Data in R is a research topic. Over its lifetime, 115 publications have been published within this topic, receiving 38,880 citations. The topic is also known as: pbdR.


Papers
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets; the implementation runs on large clusters of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.

20,309 citations
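The map/reduce model in the abstract above can be sketched in miniature. This is a single-process simulation for illustration only, not the paper's distributed runtime; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical:

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: emit (word, 1) for each word in a document (the value)."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce: merge all intermediate values for one key."""
    return key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle phase: group intermediate values by intermediate key,
    # standing in for the partitioning/scheduling the real runtime does.
    groups = defaultdict(list)
    for k, v in inputs:
        for ik, iv in map_fn(k, v):
            groups[ik].append(iv)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = [("d1", "big data in R"), ("d2", "big data big compute")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts["big"])  # 3
```

The user supplies only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is what the paper's runtime system handles transparently (partitioning, scheduling, fault tolerance, communication).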

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Book
14 Jun 2008
TL;DR: A book on programming with R, covering the basics, methods and generic functions, and interfaces to C and Fortran.
Abstract: Introduction: Principles and Concepts; Using R; Programming with R: The Basics; R Packages; Objects; Basic Data and Computations; Data Visualization and Graphics; Computing with Text; New Classes; Methods and Generic Functions; Interfaces I: Using C and Fortran; Interfaces II: Between R and Other Systems; How R Works; Errata and Notes for "Software for Data Analysis: Programming with R".

307 citations

Journal ArticleDOI
TL;DR: This work presents a framework for the R statistical computing language that provides a simple yet powerful programming interface to a computational cluster of CPUs that allows the rapid development of R functions that distribute independent computations across the nodes of the computational cluster.
Abstract: Theoretically, many modern statistical procedures are trivial to parallelize. However, practical deployment of a parallelized implementation which is robust and reliably runs on different computational cluster configurations and environments is far from trivial. We present a framework for the R statistical computing language that provides a simple yet powerful programming interface to a computational cluster of CPUs. This interface allows the rapid development of R functions that distribute independent computations across the nodes of the computational cluster. The approach can be extended to finer grain parallelization if needed. The resulting framework allows statisticians to obtain significant speed-ups for some computations at little additional development cost. The particular implementation can be deployed in ad-hoc heterogeneous computing environments.

111 citations
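The interface style the abstract describes (distribute independent computations across nodes, gather the results) can be sketched with a local worker pool. This is an illustration only: the real framework targets R on heterogeneous CPU clusters, whereas this uses Python's standard-library thread pool, and the `simulate` function and task list are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    """One independent unit of work: a tiny deterministic 'statistic'."""
    return (seed * seed) % 97

tasks = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Scatter independent tasks to workers, gather results in order.
    results = list(pool.map(simulate, tasks))
print(results)
```

Because each task is independent, the same `pool.map` call is trivially correct regardless of how many workers exist or how tasks are assigned, which is the property the framework exploits.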

01 Jan 2008
TL;DR: This talk introduces R, a language and environment for statistical computing and graphics that provides a wide variety of statistical and graphical techniques and is highly extensible.
Abstract: The talk will introduce R, a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and non-linear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. R provides an Open Source route for research in statistical methodology. (C) R Foundation, from http://www.r-project.org.

108 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
78% related
Cloud computing
156.4K papers, 1.9M citations
72% related
Network packet
159.7K papers, 2.2M citations
72% related
Object (computer science)
106K papers, 1.3M citations
72% related
Scheduling (computing)
78.6K papers, 1.3M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year	Papers
2018	1
2017	10
2016	18
2015	24
2014	28
2013	11