
Showing papers in "Communications of The ACM in 2016"


Journal ArticleDOI
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Abstract: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications

1,776 citations


Journal ArticleDOI
TL;DR: In this article, the authors discuss the threat posed by today's social bots and how their presence can endanger online ecosystems as well as our society, and how to deal with them.
Abstract: Today's social bots are sophisticated and sometimes menacing. Indeed, their presence can endanger online ecosystems as well as our society.

1,259 citations


Journal ArticleDOI
TL;DR: This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.
Abstract: This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.

1,157 citations


Journal ArticleDOI
TL;DR: Blockchain technology has the potential to revolutionize applications and redefine the digital economy.
Abstract: Blockchain technology has the potential to revolutionize applications and redefine the digital economy

903 citations


Journal ArticleDOI
TL;DR: The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.
Abstract: Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate arrays (FPGA). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.
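The closing claim is simple arithmetic; a quick sketch (with an invented aggregate load) shows how a 95% per-server throughput gain translates into roughly half the servers at a fixed total throughput:

```python
# Back-of-the-envelope check of the claim that a 95% per-server throughput
# gain lets the same total load be served by about half as many machines.
# Only the 95% figure comes from the abstract; the load is illustrative.

def servers_needed(total_load: float, per_server_throughput: float) -> float:
    """Servers required to sustain `total_load` requests/sec."""
    return total_load / per_server_throughput

baseline_throughput = 1.0          # normalized requests/sec per software-only server
accelerated_throughput = 1.95      # +95% with the FPGA fabric
total_load = 1000.0                # hypothetical aggregate load, requests/sec

print(servers_needed(total_load, baseline_throughput))     # 1000.0 servers
print(servers_needed(total_load, accelerated_throughput))  # ~512.8 servers, i.e. roughly half
```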

835 citations


Journal ArticleDOI
TL;DR: The conjecture that most software is also natural - in the sense that it is created by humans at work, with all the attendant constraints and limitations - and thus, like natural language, it is also likely to be repetitive and predictable is investigated.
Abstract: Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations---and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether (a) code can be usefully modeled by statistical language models and (b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very regular, and, in fact, even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse's completion capability. We conclude the paper by laying out a vision for future research in this area.
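The code-completion idea lends itself to a tiny illustration. The sketch below trains a toy trigram model over code tokens and suggests likely next tokens; the tokenizer and two-snippet "corpus" are stand-ins, not the paper's actual setup:

```python
# A toy trigram model over code tokens, sketching the statistical
# completion idea; the tokenizer and tiny corpus are illustrative.
import re
from collections import Counter, defaultdict

def tokenize(code: str) -> list[str]:
    # Crude tokenizer: identifiers or single non-space characters.
    return re.findall(r"[A-Za-z_]\w*|\S", code)

class TrigramModel:
    def __init__(self):
        self.counts = defaultdict(Counter)   # (tok_{i-2}, tok_{i-1}) -> Counter of tok_i

    def train(self, corpus: list[str]) -> None:
        for code in corpus:
            toks = ["<s>", "<s>"] + tokenize(code)
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.counts[(a, b)][c] += 1

    def suggest(self, context: str, k: int = 3) -> list[str]:
        toks = ["<s>", "<s>"] + tokenize(context)
        return [tok for tok, _ in self.counts[tuple(toks[-2:])].most_common(k)]

model = TrigramModel()
model.train(["for (int i = 0; i < n; i++) { sum += a[i]; }",
             "for (int j = 0; j < n; j++) { sum += b[j]; }"])
print(model.suggest("for (int i = 0; i <"))   # ['n'] -- code is highly regular
```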

572 citations


Journal ArticleDOI
Brendan Burns1, Brian Grant1, David Oppenheimer1, Eric Brewer1, John Wilkes1 
TL;DR: The lessons from developing and operating three different container-management systems at Google for more than ten years are described.
Abstract: Though widespread interest in software containers is a relatively recent phenomenon, at Google we have been managing Linux containers at scale for more than ten years and built three different container-management systems in that time. Each system was heavily influenced by its predecessors, even though they were developed for different reasons. This article describes the lessons we’ve learned from developing and operating them.

466 citations


Journal ArticleDOI
TL;DR: Historically, even though most Web sites were driven off structured databases, they published their content purely in HTML, and applications requiring access to the structured data underlying these Web pages had to build custom extractors to convert plain HTML into structured data.
Abstract: Separation between content and presentation has always been one of the important design aspects of the Web. Historically, however, even though most Web sites were driven off structured databases, they published their content purely in HTML. Services such as Web search, price comparison, reservation engines, etc. that operated on this content had access only to HTML. Applications requiring access to the structured data underlying these Web pages had to build custom extractors to convert plain HTML into structured data. These efforts were often laborious and the scrapers were fragile and error-prone, breaking every time a site changed its layout.
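As a rough illustration of the custom-extractor approach described above (and its fragility), here is a minimal scraper that pulls a price out of HTML by relying on a hypothetical class="price" attribute; renaming that class breaks it silently:

```python
# A minimal custom extractor of the kind the abstract describes: it depends
# on the site's current markup and breaks whenever the layout changes.
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

html = '<div><h1>Widget</h1><span class="price">$19.99</span></div>'
extractor = PriceExtractor()
extractor.feed(html)
print(extractor.prices)  # ['$19.99'] -- until the site renames the class
```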

343 citations


Journal ArticleDOI
TL;DR: Companies such as Automated Insights, which produces the articles for AP, and Narrative Science can now write straight news articles in almost any domain that has clean and well-structured data: finance, sure, but also sports, weather, and education, among others.
Abstract: Every fiscal quarter automated writing algorithms churn out thousands of corporate earnings articles for the AP (Associated Press) based on little more than structured data. Companies such as Automated Insights, which produces the articles for AP, and Narrative Science can now write straight news articles in almost any domain that has clean and well-structured data: finance, sure, but also sports, weather, and education, among others. The articles aren’t cardboard either; they have variability, tone, and style, and in some cases readers even have difficulty distinguishing the machine-produced articles from human-written ones.
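A toy template over a made-up earnings record gives the flavor of generating prose from structured data; the real systems named above are far more elaborate, with learned variation in tone and style:

```python
# Template-driven story generation from structured data, in the spirit of
# automated earnings articles. The field names and record are invented.

def earnings_story(d: dict) -> str:
    direction = "rose" if d["eps"] > d["eps_prior"] else "fell"
    beat = "beating" if d["eps"] > d["eps_forecast"] else "missing"
    return (
        f'{d["company"]} on {d["date"]} reported earnings of '
        f'${d["eps"]:.2f} per share, {beat} analyst forecasts of '
        f'${d["eps_forecast"]:.2f}. Earnings {direction} from '
        f'${d["eps_prior"]:.2f} a year earlier on revenue of '
        f'${d["revenue_m"]:,} million.'
    )

record = {"company": "Acme Corp", "date": "Tuesday", "eps": 1.42,
          "eps_prior": 1.10, "eps_forecast": 1.35, "revenue_m": 812}
print(earnings_story(record))
```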

300 citations


Journal ArticleDOI
TL;DR: RandNLA is an interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems and promises a sound algorithmic and statistical foundation for modern large-scale data analysis.
Abstract: Matrices are ubiquitous in computer science, statistics, and applied mathematics. An m × n matrix can encode information about m objects (each described by n features), or the behavior of a discretized differential operator on a finite element mesh; an n × n positive-definite matrix can encode the correlations between all pairs of n objects, or the edge-connectivity between all pairs of nodes in a social network; and so on. Motivated largely by technological developments that generate extremely large scientific and Internet datasets, recent years have witnessed exciting developments in the theory and practice of matrix algorithms. Particularly remarkable is the use of randomization—typically assumed to be a property of the input data due to, for example, noise in the data generation mechanisms—as an algorithmic or computational resource for the development of improved algorithms for fundamental matrix problems such as matrix multiplication, least-squares (LS) approximation, low-rank matrix approximation, and Laplacian-based linear equation solvers. Randomized Numerical Linear Algebra (RandNLA) is an interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems. From a foundational perspective, RandNLA has its roots in theoretical computer science (TCS), with deep connections to mathematics (convex analysis, probability theory, metric embedding theory) and applied mathematics (scientific computing, signal processing, numerical linear algebra). From an applied perspective, RandNLA is a vital new tool for machine learning, statistics, and data analysis. Well-engineered implementations have already outperformed highly optimized software libraries for ubiquitous problems such as least-squares, with good scalability in parallel and distributed environments. Moreover, RandNLA promises a sound algorithmic and statistical foundation for modern large-scale data analysis.
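One representative RandNLA technique mentioned above, randomized least-squares, can be sketched in a few lines of numpy: compress a tall problem with a random projection, then solve the much smaller sketched problem. The sizes and the dense Gaussian sketch below are purely illustrative (practical methods use structured sketches for speed):

```python
# Sketch-and-solve least squares: compress A and b with a random projection,
# then solve the small problem. Sizes and the Gaussian sketch are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 10_000, 20, 500            # tall m x n problem, sketch size s with n << s << m

A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)           # reference solution

S = rng.standard_normal((s, m)) / np.sqrt(s)               # random sketching matrix
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # solve the s x n problem instead

print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))  # small relative error
```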

245 citations


Journal ArticleDOI
TL;DR: The aim is to improve cities' management of natural and municipal resources and in turn the quality of life of their citizens.
Abstract: The aim is to improve cities' management of natural and municipal resources and in turn the quality of life of their citizens.

Journal ArticleDOI
TL;DR: Car automation promises to free our hands from the steering wheel but might demand more from our minds.
Abstract: Car automation promises to free our hands from the steering wheel but might demand more from our minds.

Journal ArticleDOI
TL;DR: A series of hardware accelerators designed for ML (especially neural networks), with a special emphasis on the impact of memory on accelerator design, performance, and energy are introduced.
Abstract: Machine Learning (ML) tasks are becoming pervasive in a broad range of applications, and in a broad range of systems (from embedded systems to data centers). As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously achieve high efficiency and broad application scope. While efficient computational primitives are important for a hardware accelerator, inefficient memory transfers can potentially void the throughput, energy, or cost advantages of accelerators, that is, an Amdahl's law effect, and thus, they should become a first-order concern, just like in processors, rather than an element factored into accelerator design as a second step. In this article, we introduce a series of hardware accelerators (i.e., the DianNao family) designed for ML (especially neural networks), with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that, on a number of representative neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip DaDianNao system (a member of the DianNao family).
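The Amdahl's-law point about memory transfers can be made with a back-of-the-envelope model; all numbers below are invented for illustration, not DianNao measurements:

```python
# Accelerating only the arithmetic of a layer leaves off-chip memory
# transfers as the limiting term; the times below are hypothetical.

def effective_speedup(compute_s: float, memory_s: float, accel: float) -> float:
    """Overall speedup when compute is accelerated by `accel` but memory traffic is not."""
    baseline = compute_s + memory_s
    accelerated = compute_s / accel + memory_s
    return baseline / accelerated

compute_s, memory_s = 8.0, 2.0   # hypothetical per-layer compute vs. memory time (seconds)
for accel in (10, 100, 1000):
    print(accel, round(effective_speedup(compute_s, memory_s, accel), 1))
# 10x compute acceleration -> 3.6x overall; even 1000x -> only ~5x, capped by memory
```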

Journal ArticleDOI
TL;DR: A static quantitative reliability analysis is presented that verifies quantitative requirements on the reliability of an application, enabling a developer to perform sound and verified reliability engineering.
Abstract: Emerging high-performance architectures are anticipated to contain unreliable components that may exhibit soft errors, which silently corrupt the results of computations. Full detection and masking of soft errors is challenging, expensive, and, for some applications, unnecessary. For example, approximate computing applications (such as multimedia processing, machine learning, and big data analytics) can often naturally tolerate soft errors. We present Rely, a programming language that enables developers to reason about the quantitative reliability of an application -- namely, the probability that it produces the correct result when executed on unreliable hardware. Rely allows developers to specify the reliability requirements for each value that a function produces. We present a static quantitative reliability analysis that verifies quantitative requirements on the reliability of an application, enabling a developer to perform sound and verified reliability engineering. The analysis takes a Rely program with a reliability specification and a hardware specification that characterizes the reliability of the underlying hardware components and verifies that the program satisfies its reliability specification when executed on the underlying unreliable hardware platform. We demonstrate the application of quantitative reliability analysis on six computations implemented in Rely.
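Loosely in the spirit of that analysis, a toy calculation treats each unreliable operation as independently correct with some probability, multiplies along a dependence chain, and compares the result to a specification; the per-operation numbers below are invented:

```python
# Toy quantitative-reliability check: a chain of unreliable operations is
# correct with (at most) the product of the per-operation reliabilities.
# This only mirrors the flavor of Rely's analysis; numbers are invented.
import math

def chain_reliability(op_reliabilities: list[float]) -> float:
    """Probability an entire dependence chain of operations is correct."""
    return math.prod(op_reliabilities)

ops = [0.999999] * 5000          # 5,000 unreliable adds/multiplies on approximate hardware
r = chain_reliability(ops)
spec = 0.99                      # developer-specified reliability requirement
print(f"reliability = {r:.4f}, satisfies spec 0.99: {r >= spec}")
```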

Journal ArticleDOI
TL;DR: This article seeks to reframe computational thinking as computational participation.
Abstract: Seeking to reframe computational thinking as computational participation.

Journal ArticleDOI
TL;DR: Human-centered design can make application programming interfaces easier for developers to use.
Abstract: Human-centered design can make application programming interfaces easier for developers to use.

Journal ArticleDOI
TL;DR: To encourage repeatable research, fund repeatability engineering and reward commitments to sharing research artifacts.
Abstract: To encourage repeatable research, fund repeatability engineering and reward commitments to sharing research artifacts.

Journal ArticleDOI
TL;DR: The future success of these systems depends on more than a Netflix challenge; they need to be designed to work together.
Abstract: The future success of these systems depends on more than a Netflix challenge.

Journal ArticleDOI
Rachel Potvin1, Josh Levenberg1
TL;DR: Google's monolithic repository provides a common source of truth for tens of thousands of developers around the world.
Abstract: Google's monolithic repository provides a common source of truth for tens of thousands of developers around the world.

Journal ArticleDOI
Percy Liang1
TL;DR: Semantic parsing is a rich fusion of the logical and the statistical worlds, and this fusion will play an integral role in the future of natural language understanding systems.
Abstract: For building question answering systems and natural language interfaces, semantic parsing has emerged as an important and powerful paradigm. Semantic parsers map natural language into logical forms, the classic representation for many important linguistic phenomena. The modern twist is that we are interested in learning semantic parsers from data, which introduces a new layer of statistical and computational issues. This article lays out the components of a statistical semantic parser, highlighting the key challenges. We will see that semantic parsing is a rich fusion of the logical and the statistical world, and that this fusion will play an integral role in the future of natural language understanding systems.
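A heavily simplified, hand-written sketch shows the input/output contract of a semantic parser: question in, logical form out, then execution against a tiny knowledge base. Real parsers learn this mapping from data; the patterns and facts below are invented:

```python
# A rule-based stand-in for a learned semantic parser: map a question to a
# logical form in a toy query language, then execute it against a toy KB.
import re

def parse(utterance: str) -> str | None:
    """Map a question to a logical form (a string in a toy query language)."""
    m = re.match(r"what is the capital of (\w+)\??", utterance.lower())
    if m:
        return f"capital_of({m.group(1)})"
    m = re.match(r"how many people live in (\w+)\??", utterance.lower())
    if m:
        return f"population({m.group(1)})"
    return None

def execute(logical_form: str, kb: dict) -> object:
    head, arg = re.match(r"(\w+)\((\w+)\)", logical_form).groups()
    return kb[head].get(arg)

kb = {"capital_of": {"france": "Paris"}, "population": {"france": 67_000_000}}
lf = parse("What is the capital of France?")
print(lf, "=>", execute(lf, kb))   # capital_of(france) => Paris
```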

Journal ArticleDOI
TL;DR: In this paper, the authors use heuristic clustering to group Bitcoin wallets based on evidence of shared authority, and then use re-identification attacks (i.e., empirical purchasing of goods and services) to classify the operators of those clusters.
Abstract: Bitcoin is a purely online virtual currency, unbacked by either physical commodities or sovereign obligation; instead, it relies on a combination of cryptographic protection and a peer-to-peer protocol for witnessing settlements. Consequently, Bitcoin has the unintuitive property that while the ownership of money is implicitly anonymous, its flow is globally visible. In this paper we explore this unique characteristic further, using heuristic clustering to group Bitcoin wallets based on evidence of shared authority, and then using re-identification attacks (i.e., empirical purchasing of goods and services) to classify the operators of those clusters. From this analysis, we consider the challenges for those seeking to use Bitcoin for criminal or fraudulent purposes at scale.
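The shared-authority heuristic can be sketched with union-find: addresses that are co-spent as inputs to the same transaction are merged into one cluster. The transactions below are made up for illustration:

```python
# Clustering Bitcoin addresses by shared authority: inputs spent together in
# one transaction are assumed to share an owner and are merged (union-find).

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Each transaction is the list of input addresses it spends from (made up).
transactions = [["addr1", "addr2"], ["addr2", "addr3"], ["addr4"]]

uf = UnionFind()
for inputs in transactions:
    for addr in inputs[1:]:
        uf.union(inputs[0], addr)

clusters = {}
for addr in {a for tx in transactions for a in tx}:
    clusters.setdefault(uf.find(addr), set()).add(addr)
print(list(clusters.values()))   # [{'addr1', 'addr2', 'addr3'}, {'addr4'}] in some order
```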

Journal ArticleDOI
TL;DR: Moore's Law is one small component in an exponentially growing planetary computing ecosystem.
Abstract: Moore's Law is one small component in an exponentially growing planetary computing ecosystem.

Journal ArticleDOI
TL;DR: UPON Lite focuses on users, typically domain experts without ontology expertise, minimizing the role of ontology engineers.
Abstract: UPON Lite focuses on users, typically domain experts without ontology expertise, minimizing the role of ontology engineers.

Journal ArticleDOI
TL;DR: Algorithmic assessments have the potential to adversely impact protected groups, such as specific ethnic groups, religious minorities, and others that might be subject to inadvertent or deliberate discrimination.
Abstract: ...system that is using such algorithmic assessments. Algorithms also are used to serve up job listings or credit offers that can be viewed as inadvertently biased, as they sometimes utilize end-user characteristics like household income and postal codes that can be proxies for race, given the correlation between ethnicity, household income, and geographic settling patterns. The New York Times in July 2015 highlighted several instances of algorithmic unfairness, or outright discrimination. It cited research conducted by Carnegie Mellon University in 2015 that found Google's ad-serving system showed an ad for high-paying jobs to men much more often than it did for women. Similarly, a study conducted at the University of Washington in 2015 found that despite women holding 27% of CEO posts in the U.S., a search for "CEO" using Google's Image Search tool returned results of which just 11% depicted women. A 2012 Harvard University study published in the Journal of Social Issues indicated advertisements for services that allow searching for people's arrest records were more likely to come up when searches were conducted on traditionally African-American names. For their part, programmers seem to recognize the need to address these issues of unfairness, particularly with respect to algorithms that have the potential to adversely impact protected groups, such as those in specific ethnic groups, religious minorities, and others that might be subject to inadvertent or deliberate discrimination. "Machine learning engineers care deeply about measuring accuracy of their models," explains Moritz Hardt, a senior research scientist at Google. "What they additionally need to do is to measure accuracy within different subgroups. Wildly differing performance across different groups of the population can indicate a problem. In the context of fairness, it can actually help to make models more ..." Algorithms have become an integral part of everyday life. Algorithms are able to process a far greater range of inputs and variables to make decisions, and can do so with speed and reliability that far exceed human capabilities. From the ads we are served, to the products we are offered, and to the results we are presented with after searching online, algorithms, rather than humans sitting behind the scenes, are making these decisions. However, because algorithms simply present the results of calculations defined by humans using data that may be provided by humans, machines, or a combination of the two (at some point during the process), they ...
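Hardt's suggestion of measuring accuracy within subgroups is easy to sketch; the labels, predictions, and group names below are synthetic placeholders:

```python
# Compute model accuracy within each subgroup, not just overall, and report
# the gap between the best- and worst-served groups. Data here is synthetic.
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    totals, correct = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / totals[g] for g in totals}

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group)                                    # {'a': 0.75, 'b': 0.5}
print("max gap:", max(per_group.values()) - min(per_group.values()))
```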

Journal ArticleDOI
TL;DR: Eulerian Video Magnification is a computational technique for visualizing subtle color and motion variations in ordinary videos by making the variations larger, a microscope for small changes that are hard or impossible for us to see by ourselves.
Abstract: The world is filled with important, but visually subtle signals. A person's pulse, the breathing of an infant, the sag and sway of a bridge---these all create visual patterns, which are too difficult to see with the naked eye. We present Eulerian Video Magnification, a computational technique for visualizing subtle color and motion variations in ordinary videos by making the variations larger. It is a microscope for small changes that are hard or impossible for us to see by ourselves. In addition, these small changes can be quantitatively analyzed and used to recover sounds from vibrations in distant objects, characterize material properties, and remotely measure a person's pulse.
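A one-pixel caricature of the method: band-pass the intensity over time around a hypothesized pulse band, amplify that band, and add it back. Real Eulerian Video Magnification does this per pixel over a spatial pyramid; the signal and crude FFT filter below are simplified stand-ins:

```python
# One pixel's intensity over time: isolate a temporal band, amplify it, add it back.
import numpy as np

fs = 30.0                                              # frames per second
t = np.arange(0, 10, 1 / fs)
baseline = 0.5 + 0.05 * np.sin(2 * np.pi * 0.1 * t)    # slow lighting drift
pulse = 0.002 * np.sin(2 * np.pi * 1.2 * t)            # subtle ~72 bpm color variation
signal = baseline + pulse

# Crude temporal band-pass via FFT: keep only 0.8-2.0 Hz, a hypothesized pulse band.
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
spectrum = np.fft.rfft(signal)
band = (freqs >= 0.8) & (freqs <= 2.0)
filtered = np.fft.irfft(np.where(band, spectrum, 0), n=len(signal))

alpha = 50.0                                           # amplification factor
magnified = signal + alpha * filtered                  # the tiny variation becomes visible

print(f"pulse amplitude: original ~{np.ptp(pulse)/2:.4f}, "
      f"magnified ~{np.ptp(alpha * filtered)/2:.3f}")
```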

Journal ArticleDOI
TL;DR: The most important consideration is how the collection of measurements may affect a person's well-being.
Abstract: The most important consideration is how the collection of measurements may affect a person's well-being.

Journal ArticleDOI
TL;DR: This work introduces AutoMan, the first fully automatic crowdprogramming system that integrates human-based computations into a standard programming language as ordinary function calls that can be intermixed freely with traditional functions.
Abstract: Humans can perform many tasks with ease that remain difficult or impossible for computers. Crowdsourcing platforms like Amazon's Mechanical Turk make it possible to harness human-based computational power at an unprecedented scale. However, their utility as a general-purpose computational platform remains limited. The lack of complete automation makes it difficult to orchestrate complex or interrelated tasks. Scheduling more human workers to reduce latency costs real money, and jobs must be monitored and rescheduled when workers fail to complete their tasks. Furthermore, it is often difficult to predict the length of time and payment that should be budgeted for a given task. Finally, the results of human-based computations are not necessarily reliable, both because human skills and accuracy vary widely, and because workers have a financial incentive to minimize their effort. This paper introduces AutoMan, the first fully automatic crowdprogramming system. AutoMan integrates human-based computations into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. This abstraction lets AutoMan programmers focus on their programming logic. An AutoMan program specifies a confidence level for the overall computation and a budget. The AutoMan runtime system then transparently manages all details necessary for scheduling, pricing, and quality control. AutoMan automatically schedules human tasks for each computation until it achieves the desired confidence level; monitors, reprices, and restarts human tasks as necessary; and maximizes parallelism across human workers while staying under budget.
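A toy version of that scheduling loop asks simulated workers a yes/no question until the majority answer is statistically unlikely to be chance; AutoMan's real quality control, pricing, and Mechanical Turk integration are not modeled here:

```python
# Keep asking (simulated) workers until the majority is unlikely to be due
# to random guessing, then return it. This is a simplified stand-in for
# AutoMan's quality-control loop; the worker model is invented.
import random
from math import comb

def majority_error_prob(votes_a: int, votes_b: int) -> float:
    """Probability of a split at least this lopsided if workers guessed randomly."""
    n, k = votes_a + votes_b, max(votes_a, votes_b)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def ask_until_confident(ask_worker, confidence: float = 0.95, max_workers: int = 50):
    votes = {"yes": 0, "no": 0}
    for _ in range(max_workers):
        votes[ask_worker()] += 1
        if majority_error_prob(votes["yes"], votes["no"]) <= 1 - confidence:
            return max(votes, key=votes.get), votes
    return None, votes          # worker budget exhausted without consensus

# Simulated worker pool: each worker gives the correct answer "yes" with probability 0.9.
def simulated_worker():
    return "yes" if random.random() < 0.9 else "no"

random.seed(1)
print(ask_until_confident(simulated_worker))   # e.g. ('yes', {'yes': 5, 'no': 0})
```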

Journal ArticleDOI
TL;DR: To make the most of the enormous opportunities at hand will require focusing on five research areas, according to database researchers, who paint big data as a defining challenge.
Abstract: Database researchers paint big data as a defining challenge. To make the most of the enormous opportunities at hand will require focusing on five research areas.

Journal ArticleDOI
TL;DR: Seeking better understanding of digital transformation is a priority for the next generation of policymakers and decision-makers in the developing world.
Abstract: Seeking better understanding of digital transformation.

Journal ArticleDOI
TL;DR: This article presents several key features and debugging challenges that differentiate distributed systems from other kinds of software.
Abstract: Distributed systems pose unique challenges for software developers. Reasoning about concurrent activities of system nodes and even understanding the system’s communication topology can be difficult...