TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. Tensor-Flow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

/pdf/tensorflow-a-system-for-large-scale-machine-learning-10wgdjhl6k.pdf

TensorFlow: a system for large-scale machine learning

TensorFlow is an interface for expressing machine learning algorithms, and an
implementation for executing such algorithms. A computation expressed using
TensorFlow can be executed with little or no change on a wide variety of
heterogeneous systems, ranging from mobile devices such as phones and tablets
up to large-scale distributed systems of hundreds of machines and thousands of
computational devices such as GPU cards. The system is flexible and can be used
to express a wide variety of algorithms, including training and inference
algorithms for deep neural network models, and it has been used for conducting
research and for deploying machine learning systems into production across more
than a dozen areas of computer science and other fields, including speech
recognition, computer vision, robotics, information retrieval, natural language
processing, geographic information extraction, and computational drug
discovery. This paper describes the TensorFlow interface and an implementation
of that interface that we have built at Google. The TensorFlow API and a
reference implementation were released as an open-source package under the
Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

/pdf/tensorflow-large-scale-machine-learning-on-heterogeneous-2yk4fgchw6.pdf

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

TensorFlow: A system for large-scale machine learning

Fabric is a modular and extensible open-source system for deploying and operating permissioned blockchains and one of the Hyperledger projects hosted by the Linux Foundation (www.hyperledger.org). Fabric is the first truly extensible blockchain system for running distributed applications. It supports modular consensus protocols, which allows the system to be tailored to particular use cases and trust models. Fabric is also the first blockchain system that runs distributed applications written in standard, general-purpose programming languages, without systemic dependency on a native cryptocurrency. This stands in sharp contrast to existing block-chain platforms that require "smart-contracts" to be written in domain-specific languages or rely on a cryptocurrency. Fabric realizes the permissioned model using a portable notion of membership, which may be integrated with industry-standard identity management. To support such flexibility, Fabric introduces an entirely novel blockchain design and revamps the way blockchains cope with non-determinism, resource exhaustion, and performance attacks. This paper describes Fabric, its architecture, the rationale behind various design decisions, its most prominent implementation aspects, as well as its distributed application programming model. We further evaluate Fabric by implementing and benchmarking a Bitcoin-inspired digital currency. We show that Fabric achieves end-to-end throughput of more than 3500 transactions per second in certain popular deployment configurations, with sub-second latency, scaling well to over 100 peers.

https://dl.acm.org/doi/pdf/10.1145/3190508.3190538

Hyperledger fabric: a distributed operating system for permissioned blockchains

Mobile phones or smartphones are rapidly becoming the central computer and communication device in people's lives. Application delivery channels such as the Apple AppStore are transforming mobile phones into App Phones, capable of downloading a myriad of applications in an instant. Importantly, today's smartphones are programmable and come with a growing set of cheap powerful embedded sensors, such as an accelerometer, digital compass, gyroscope, GPS, microphone, and camera, which are enabling the emergence of personal, group, and communityscale sensing applications. We believe that sensor-equipped mobile phones will revolutionize many sectors of our economy, including business, healthcare, social networks, environmental monitoring, and transportation. In this article we survey existing mobile phone sensing algorithms, applications, and systems. We discuss the emerging sensing paradigms, and formulate an architectural framework for discussing a number of the open issues and challenges emerging in the new area of mobile phone sensing research.

/pdf/a-survey-of-mobile-phone-sensing-4to5zwn4tr.pdf

A survey of mobile phone sensing

We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.

/pdf/resilient-distributed-datasets-a-fault-tolerant-abstraction-1jmy0nuhco.pdf

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications

https://dl.acm.org/doi/pdf/10.1145/2934664

Apache Spark: a unified engine for big data processing

Many "big data" applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not handle stragglers. We propose a new processing model, discretized streams (D-Streams), that overcomes these challenges. D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, and tolerates stragglers. We show that they support a rich set of operators while attaining high per-node throughput similar to single-node systems, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery. Finally, D-Streams can easily be composed with batch and interactive query models like MapReduce, enabling rich applications that combine these modes. We implement D-Streams in a system called Spark Streaming.

/pdf/discretized-streams-fault-tolerant-streaming-computation-at-59e0rdpnuz.pdf

Discretized streams: fault-tolerant streaming computation at scale

Many important "big data" applications need to process data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional programming API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup solutions in streaming databases: parallel recovery of lost state across the cluster. We have prototyped D-Streams in an extension to the Spark cluster computing framework called Spark Streaming, which lets users seamlessly intermix streaming, batch and interactive queries.

/pdf/discretized-streams-an-efficient-and-fault-tolerant-model-2h9jaxbqpx.pdf

Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters

Web applications have now become so sophisticated that rendering a typical page may require hundreds of intra-datacenter flows. At the same time, web sites must meet strict page creation deadlines of 200-300ms to satisfy user demands for interactivity. Long-tailed flow completion times make it challenging for web sites to meet these constraints. They are forced to choose between rendering a subset of the complex page, or delay its rendering, thus missing deadlines and sacrificing either quality or responsiveness. Either option leads to potential financial loss.In this paper, we present a new cross-layer network stack aimed at reducing the long tail of flow completion times. The approach exploits cross-layer information to reduce packet drops, prioritize latency-sensitive flows, and evenly distribute network load, effectively reducing the long tail of flow completion times. We evaluate our approach through NS-3 based simulation and Click-based implementation demonstrating our ability to consistently reduce the tail across a wide range of workloads. We often achieve reductions of over 50% in 99.9th percentile flow completion times.

/pdf/detail-reducing-the-flow-completion-time-tail-in-datacenter-1zbmc6308a.pdf

Tathagata Das

Papers

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

Apache Spark: a unified engine for big data processing

Discretized streams: fault-tolerant streaming computation at scale

Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters

DeTail: reducing the flow completion time tail in datacenter networks