
Showing papers in "Communications of the ACM" in 2012


Journal ArticleDOI
TL;DR: A survey of probabilistic topic modeling algorithms that uncover the hidden thematic structure of large document archives and support corpus exploration, document search, and prediction.
Abstract: Probabilistic topic modeling provides a suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can be used for corpus exploration, document search, and a variety of prediction problems. In this tutorial, I will review the state of the art in probabilistic topic models. I will describe the three components of topic modeling: (1) topic modeling assumptions, (2) algorithms for computing with topic models, and (3) applications of topic models. In (1), I will describe latent Dirichlet allocation (LDA), which is one of the simplest topic models, and then describe a variety of ways that we can build on it. These include dynamic topic models, correlated topic models, supervised topic models, author-topic models, bursty topic models, Bayesian nonparametric topic models, and others. I will also discuss some of the fundamental statistical ideas that are used in building topic models, such as distributions on the simplex, hierarchical Bayesian modeling, and models of mixed membership. In (2), I will review how we compute with topic models. I will describe approximate posterior inference for directed graphical models using both sampling and variational inference, and I will discuss the practical issues and pitfalls in developing these algorithms for topic models. Finally, I will describe some of our most recent work on building algorithms that can scale to millions of documents and documents arriving in a stream. In (3), I will discuss applications of topic models. These include applications to images, music, social networks, and other data in which we hope to uncover hidden patterns. I will describe some of our recent work on adapting topic modeling algorithms to collaborative filtering, legislative modeling, and bibliometrics without citations. Finally, I will discuss some future directions and open research problems in topic models.

4,529 citations
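As an illustration only (not from the paper), here is a minimal sketch of fitting LDA to a toy corpus, assuming the scikit-learn library is available; the corpus, topic count, and variable names are invented for the example.

```python
# Hypothetical sketch: fit LDA to a tiny corpus and print the top words per
# topic. scikit-learn is an assumed dependency, not something the paper uses.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the court ruled on the new tax law",
    "the senate passed the tax bill into law",
    "the team won the championship game last night",
    "the striker scored twice in the final game",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)            # bag-of-words counts

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [vocab[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")                # the uncovered "themes"
```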


Journal ArticleDOI
TL;DR: Content-Centric Networking (CCN) is presented, which uses named content chunks as its primitive, decoupling location from identity, security, and access, and retrieving chunks of content by name; it simultaneously achieves scalability, security, and performance.
Abstract: Current network use is dominated by content distribution and retrieval yet current networking protocols are designed for conversations between hosts. Accessing content and services requires mapping from the what that users care about to the network's where. We present Content-Centric Networking (CCN) which uses content chunks as a primitive---decoupling location from identity, security and access, and retrieving chunks of content by name. Using new approaches to routing named content, derived from IP, CCN simultaneously achieves scalability, security, and performance. We describe our implementation of the architecture's basic features and demonstrate its performance and resilience with secure file downloads and VoIP calls.

3,122 citations


Journal ArticleDOI
TL;DR: Tapping into the "folk knowledge" needed to build successful machine learning applications: twelve key lessons covering pitfalls to avoid, issues to focus on, and answers to common questions.
Abstract: Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of “black art” that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.

2,482 citations


Journal ArticleDOI
TL;DR: In very general settings, solving a convex program that finds the minimum-nuclear-norm matrix consistent with the observed entries perfectly recovers all of the missing entries of a low-rank matrix from most sufficiently large subsets of entries.
Abstract: Suppose that one observes an incomplete subset of entries selected from a low-rank matrix. When is it possible to complete the matrix and recover the entries that have not been seen? We demonstrate that in very general settings, one can perfectly recover all of the missing entries from most sufficiently large subsets by solving a convex programming problem that finds the matrix with the minimum nuclear norm agreeing with the observed entries. The techniques used in this analysis draw upon parallels in the field of compressed sensing, demonstrating that objects other than signals and images can be perfectly reconstructed from very limited information.

2,327 citations
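Purely as an illustration of the convex program described above (not the authors' code), the following sketch completes a small synthetic low-rank matrix by nuclear-norm minimization; the cvxpy modeling library, the matrix size, and the sampling rate are all assumptions made for the example.

```python
# Hypothetical sketch: nuclear-norm matrix completion on synthetic data.
import numpy as np
import cvxpy as cp  # assumed dependency for expressing the convex program

rng = np.random.default_rng(0)
n, r = 20, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 target
mask = (rng.random((n, n)) < 0.6).astype(float)                # ~60% of entries observed

# Find the matrix with minimum nuclear norm that agrees with the observed entries.
X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == cp.multiply(mask, M)])
problem.solve()

err = np.linalg.norm(X.value - M) / np.linalg.norm(M)
print(f"relative recovery error: {err:.4f}")   # near zero when recovery succeeds
```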


Journal ArticleDOI
TL;DR: If you are reading these lines on a PC running some form of Windows, then you have been affected by this line of work--without knowing it, which is precisely the way the authors want it to be.
Abstract: Most ACM Queue readers might think of "program verification research" as mostly theoretical with little impact on the world at large. Think again. If you are reading these lines on a PC running som...

624 citations


Journal ArticleDOI
TL;DR: The "persistently full buffer problem" is still with us and made increasingly critical by two trends, cheap memory and a "more is better" mentality have led to the inflation and proliferation of buffers.
Abstract: Nearly three decades after it was first diagnosed, the "persistently full buffer problem" recently exposed as part of "bufferbloat", is still with us and made increasingly critical by two trends. F...

599 citations


Journal ArticleDOI
TL;DR: A research agenda for making the smart grid a reality is presented, motivated by peak oil, energy security, and the climate impact of fossil fuels.
Abstract: The phenomenal growth in material wealth experienced in developed countries throughout the twentieth century has largely been driven by the availability of cheap energy derived from fossil fuels (originally coal, then oil, and most recently natural gas). However, the continued availability of this cheap energy cannot be taken for granted given the growing concern that increasing demand for these fuels (and particularly, demand for oil) will outstrip our ability to produce them (so-called 'peak oil'). Many mature oil and gas fields around the world have already peaked and their annual production is now steadily declining. Predictions of when world oil production will peak vary from 0 to 20 years into the future, but even the most conservative estimates provide little scope for complacency given the significant price increases that peak oil is likely to precipitate. Furthermore, many of the oil and gas reserves that do remain are in environmentally or politically sensitive regions of the world where threats to supply create increased price volatility (as evidenced by the 2010 Deepwater Horizon disaster and 2011 civil unrest in the Middle East). Finally, the growing consensus on the long-term impact of carbon emissions from burning fossil fuels suggests that even if peak oil is avoided, and energy security assured, a future based on fossil fuel use will expose regions of the world to damaging climate change that will make the lives of many of the world's poorest people even harder.

513 citations


Journal ArticleDOI
TL;DR: Looking past the systems people use, they target the people using the systems.
Abstract: Looking past the systems people use, they target the people using the systems.

457 citations


Journal ArticleDOI
TL;DR: Using real event data to X-ray business processes helps ensure conformance between design and reality.
Abstract: Using real event data to X-ray business processes helps ensure conformance between design and reality.

408 citations


Journal ArticleDOI
TL;DR: Analysis requires contextualized human judgments regarding the domain-specific significance of the clusters, trends, and outliers discovered in data.
Abstract: The increasing scale and availability of digital data provides an extraordinary resource for informing public policy, scientific discovery, business strategy, and even our personal lives. To get the most out of such data, however, users must be able to make sense of it: to pursue questions, uncover patterns of interest, and identify (and potentially correct) errors. In concert with data-management systems and statistical algorithms, analysis requires contextualized human judgments regarding the domain-specific significance of the clusters, trends, and outliers discovered in data.

404 citations


Journal ArticleDOI
TL;DR: Application areas for computer-vision technology include video surveillance, biometrics, automotive, photography, movie production, Web search, medicine, augmented reality gaming, new user interfaces, and many more.
Abstract: Computer vision is a rapidly growing field devoted to analyzing, modifying, and high-level understanding of images. Its objective is to determine what is happening in front of a camera and use that understanding to control a computer or robotic system, or to provide people with new images that are more informative or aesthetically pleasing than the original camera images. Application areas for computer-vision technology include video surveillance, biometrics, automotive, photography, movie production, Web search, medicine, augmented reality gaming, new user interfaces, and many more.

Journal ArticleDOI
TL;DR: Rumors spread in sublogarithmic time in preferential attachment social networks; surprisingly, nodes with few neighbors, which act as short-cuts between hubs, are crucial for this fast dissemination.
Abstract: Understanding structural and algorithmic properties of complex networks is an important task, not least because of the huge impact of the internet. Our focus is to analyze how news spreads in social networks. We simulate a simple information spreading process in different network topologies and demonstrate that news spreads much faster in existing social network topologies. We support this finding by analyzing information spreading in the mathematically defined preferential attachment network topology, which is a common model for real-world networks. We prove that here sublogarithmic time suffices to spread news to all nodes of the network; all previously studied network topologies need at least logarithmic time. Surprisingly, we observe that nodes with few neighbors are crucial for the fast dissemination.

Social networks like Facebook and Twitter are reshaping the way people take collective action. They have played a crucial role in the recent uprisings of the 'Arab Spring' and the 'London riots'. It has been argued that the 'instantaneous nature' of these networks influenced the speed at which the events were unfolding [4]. It is quite remarkable that social networks spread news so fast. Neither the structure of social networks nor the process that distributes the news was designed with this purpose in mind. On the contrary, they were not designed at all, but have evolved in a random and decentralized manner. So is our view correct that social networks ease the spread of information ("rumors"), and if so, what particular properties of social networks are the reason for this? To answer these questions, we simulate a simple rumor spreading process on several graphs having the structure of existing large social networks. We see, for example, that a rumor started at a random node of the Twitter network on average reaches 45.6 million of the total of 51.2 million members within only eight rounds of communication. We also analyze this process on an abstract model of social networks, the so-called preferential attachment graphs introduced by Barabási and Albert [3]. In [17], we obtain a mathematical proof that rumors in such networks spread much faster than in many other network topologies, even faster than in networks having a communication link between any two nodes (complete graphs). As an explanation, we observe that nodes of small degree build a short-cut between those having large degree (hubs), which, due to their large number of possible communication partners, less often talk to each other directly.
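To make the simulated process concrete, here is a toy sketch (not the authors' code) of synchronous push-pull rumor spreading on a preferential attachment graph; the networkx library, the graph size, and the protocol details are assumptions made for the illustration.

```python
# Hypothetical sketch: push-pull rumor spreading on a Barabasi-Albert graph.
import random
import networkx as nx  # assumed dependency

random.seed(1)
G = nx.barabasi_albert_graph(n=10_000, m=3, seed=1)   # preferential attachment

informed = {random.randrange(G.number_of_nodes())}    # rumor starts at one node
rounds = 0
while len(informed) < G.number_of_nodes():
    rounds += 1
    newly = set()
    for v in G.nodes():
        u = random.choice(list(G.neighbors(v)))        # contact a random neighbor
        if v in informed or u in informed:              # push or pull the rumor
            newly.update((u, v))
    informed |= newly
    print(f"round {rounds}: {len(informed)} nodes informed")
```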

Journal ArticleDOI
TL;DR: Soft materials may enable the automation of tasks beyond the capacities of current robotic technology.
Abstract: Soft materials may enable the automation of tasks beyond the capacities of current robotic technology.

Journal ArticleDOI
TL;DR: Sharing recent experiences with teaching an online course.
Abstract: Sharing recent experiences with an online course.

Journal ArticleDOI
TL;DR: A programming-by-example methodology that allows end users to automate repetitive tasks over large spreadsheet data, built around a domain-specific language and a synthesis algorithm that learns programs in that language from user-provided examples.
Abstract: Millions of computer end users need to perform tasks over large spreadsheet data, yet lack the programming knowledge to do such tasks automatically. We present a programming by example methodology that allows end users to automate such repetitive tasks. Our methodology involves designing a domain-specific language and developing a synthesis algorithm that can learn programs in that language from user-provided examples. We present instantiations of this methodology for particular domains of tasks: (a) syntactic transformations of strings using restricted forms of regular expressions, conditionals, and loops, (b) semantic transformations of strings involving lookup in relational tables, and (c) layout transformations on spreadsheet tables. We have implemented this technology as an add-in for the Microsoft Excel Spreadsheet system and have evaluated it successfully over several benchmarks picked from various Excel help forums.
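As a toy illustration of the programming-by-example idea (not the system described in the paper, which handles a far richer language), the sketch below enumerates programs in a miniature DSL of constant strings and whitespace-separated input fields and returns the first one consistent with every user-provided example; the DSL, constants, and function names are invented for the example.

```python
# Hypothetical programming-by-example sketch: enumerate tiny string programs
# and keep one that matches all input/output examples.
from itertools import product

CONSTS = [", ", " ", ".", ""]
FIELDS = range(3)
ATOMS = [("const", c) for c in CONSTS] + [("field", i) for i in FIELDS]

def run(program, text):
    """Execute a program: concatenate constants and input fields."""
    fields = text.split()
    out = []
    for kind, arg in program:
        if kind == "const":
            out.append(arg)
        else:
            if arg >= len(fields):
                return None            # field index out of range for this input
            out.append(fields[arg])
    return "".join(out)

def synthesize(examples, max_parts=3):
    """Return the shortest program consistent with every example."""
    for n in range(1, max_parts + 1):
        for program in product(ATOMS, repeat=n):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

examples = [("Jane Doe", "Doe, Jane"), ("Alan Turing", "Turing, Alan")]
prog = synthesize(examples)
print(prog)                            # (('field', 1), ('const', ', '), ('field', 0))
print(run(prog, "Grace Hopper"))       # "Hopper, Grace"
```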

Journal ArticleDOI
TL;DR: Computer-system logs provide a glimpse into the states of a running system and contain a wealth of information to help manage it.
Abstract: Computer-system logs provide a glimpse into the states of a running system. Instrumentation occasionally generates short messages that are collected in a system-specific log. The content and format...

Journal ArticleDOI
TL;DR: On-chip hardware coherence can scale gracefully as the number of cores increases, with bounded, modest costs, by combining known techniques such as shared caches that track cached copies, explicit eviction notifications, and hierarchical design.
Abstract: Today's multicore chips commonly implement shared memory with cache coherence as low-level support for operating systems and application software. Technology trends continue to enable the scaling of the number of (processor) cores per chip. Because conventional wisdom says that coherence does not scale well to many cores, some prognosticators predict the end of coherence. This paper seeks to refute this conventional wisdom by showing one way to scale on-chip cache coherence with bounded, modest costs by combining known techniques such as shared caches augmented to track cached copies, explicit cache eviction notifications, and hierarchical design. Based on this scalable proof-of-concept design, we predict that on-chip coherence and the programming convenience and compatibility it provides are here to stay.

Journal ArticleDOI
TL;DR: It is shown that trust between client organization and cloud provider is a strong predictor of successful cloud deployment.
Abstract: Trust between client organization and cloud provider is a strong predictor of successful cloud deployment.

Journal ArticleDOI
TL;DR: Continually verifying the self-adaptation decisions taken by critical software in response to changes in its operating environment.
Abstract: Continually verify self-adaptation decisions taken by critical software in response to changes in the operating environment.

Journal ArticleDOI
TL;DR: Globus Online manages fire-and-forget file transfers for big-data, high-performance scientific collaborations.
Abstract: Globus Online manages fire-and-forget file transfers for big-data, high-performance scientific collaborations.

Journal ArticleDOI
TL;DR: This paper describes the leading algorithms for Monte-Carlo tree search and explains how they have advanced the state of the art in computer Go.
Abstract: The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper, we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.
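As a purely illustrative sketch of the core UCT loop behind Monte-Carlo tree search (not the Go programs surveyed in the paper), the code below plays a toy take-away game in which players alternately remove 1-3 stones and whoever takes the last stone wins; the game, constants, and class names are assumptions made for the example.

```python
# Hypothetical minimal UCT (Monte-Carlo tree search) sketch on a toy game.
import math
import random

def moves(pile):
    """Legal moves: remove 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.untried = [], moves(pile)
        self.visits, self.wins = 0, 0.0     # wins for the player who just moved

    def ucb_child(self, c=1.4):
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(pile):
    """Random playout; returns 1 if the player to move at `pile` wins."""
    turn = 0
    while pile > 0:
        pile -= random.choice(moves(pile))
        turn ^= 1
    return 1 if turn == 1 else 0

def mcts(pile, iters=5000):
    root = Node(pile)
    for _ in range(iters):
        node = root
        while not node.untried and node.children:       # 1. selection
            node = node.ucb_child()
        if node.untried:                                 # 2. expansion
            m = node.untried.pop()
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        reward = 1 - rollout(node.pile)                  # 3. simulation
        while node is not None:                          # 4. backpropagation
            node.visits += 1
            node.wins += reward
            reward = 1 - reward                          # flip perspective each ply
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(10))   # optimal play leaves a multiple of 4, so expect 2
```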

Journal ArticleDOI
TL;DR: Avoid premature commitment, seek design alternatives, and automatically generate performance-optimized software.
Abstract: Avoid premature commitment, seek design alternatives, and automatically generate performance-optimized software.

Journal ArticleDOI
TL;DR: Vehicle area networks form the backbone of future intelligent transportation systems.
Abstract: Vehicle area networks form the backbone of future intelligent transportation systems.

Journal ArticleDOI
TL;DR: "Thy destroyers and they that made thee waste shall go forth of thee," wrote the prophet Isaiah, which has been popping into the mind as I have been following the book of Isaiah.
Abstract: "Thy destroyers and they that made thee waste shall go forth of thee," wrote the prophet Isaiah. This phrase has been popping into my mind as I have been following.

Journal ArticleDOI
TL;DR: The reductionism behind today's software-engineering methods breaks down in the face of systems complexity.
Abstract: The reductionism behind today's software-engineering methods breaks down in the face of systems complexity.

Journal ArticleDOI
TL;DR: Theft of private information is a significant problem for online applications, with millions of people's medical records stolen in data breaches between 2009 and 2011.
Abstract: Theft of private information is a significant problem for online applications. For example, a recent investigation found that at least eight million people's medical records were stolen as a result of data breaches between 2009 and 2011,

Journal ArticleDOI
TL;DR: Algorithmic self-assembly studies the challenge of programming molecules to assemble themselves into complex structures, with the object being grown simultaneously acting as the machine that controls its own growth.
Abstract: Self-assembly is the process by which small components automatically assemble themselves into large, complex structures. Examples in nature abound: lipids self-assemble a cell's membrane, and bacteriophage virus proteins self-assemble a capsid that allows the virus to invade other bacteria. Even a phenomenon as simple as crystal formation is a process of self-assembly. How could such a process be described as "algorithmic?" The key word in the first sentence is automatically. Algorithms automate a series of simple computational tasks. Algorithmic self-assembly systems automate a series of simple growth tasks, in which the object being grown is simultaneously the machine controlling its own growth.

Journal ArticleDOI
TL;DR: Even after almost a dozen years, these books still deliver solid guidance for software development teams and their projects.
Abstract: Even after almost a dozen years, they still deliver solid guidance for software development teams and their projects.

Journal ArticleDOI
TL;DR: The basic concepts and initial prototype of a word-gesture keyboard are discussed; exact or statistical modeling of the gesture keyboard's speed-accuracy trade-off, incorporating human control behavior, remains under research.
Abstract: The basic concepts and initial prototype of a word-gesture keyboard are discussed. In the early 1980s, Montgomery conceived the idea of using sliding gestures on a touch keyboard to enter characters. He designed a wipe-activated keyboard with a flat touch-sensitive surface. The positions of the letter keys were carefully arranged so that letters commonly appearing consecutively in words were connected on the keyboard. Since a gesture keyboard enhances, rather than replaces, a conventional touchscreen keyboard, out-of-vocabulary (OOV) letter sequences can always be entered by typing the individual letter keys. In using a word-gesture keyboard, the production of movements increasingly changes from focusing on individual letters to connecting multiple letters into a word gesture. Conceptually, gesture recognition is done by identifying the word that has the highest probability given the user's gesture. Exact or statistical modeling of the gesture keyboard's speed-accuracy trade-off, incorporating human control behavior, is still under research.
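To make the recognition idea concrete, here is a toy sketch (not the paper's recognizer): it scores candidate words by how close the user's gesture path lies to the ideal path through each word's key centers, a crude geometric stand-in for choosing the word with the highest probability given the gesture; the layout coordinates, vocabulary, and resampling scheme are all assumptions for the illustration.

```python
# Hypothetical word-gesture recognition sketch: nearest ideal key-center path.
import numpy as np

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY = {ch: (x + 0.5 * y, -y) for y, row in enumerate(ROWS) for x, ch in enumerate(row)}

def resample(points, n=32):
    """Resample a polyline to n evenly spaced points along its length."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    if t[-1] == 0:
        return np.repeat(pts[:1], n, axis=0)
    s = np.linspace(0, t[-1], n)
    return np.column_stack([np.interp(s, t, pts[:, 0]), np.interp(s, t, pts[:, 1])])

def ideal_path(word, n=32):
    return resample([KEY[c] for c in word], n)

def recognize(gesture, vocabulary):
    g = resample(gesture)
    scores = {w: np.mean(np.linalg.norm(g - ideal_path(w), axis=1)) for w in vocabulary}
    return min(scores, key=scores.get)

# A slightly imprecise gesture tracing q-u-i-c-k still maps to "quick".
gesture = [(x + 0.2, y - 0.1) for x, y in (KEY[c] for c in "quick")]
print(recognize(gesture, ["quick", "quack", "which", "thick"]))
```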

Journal ArticleDOI
TL;DR: 3D UIs are uniquely able to achieve superior interaction fidelity, and this naturalism can be a huge advantage.
Abstract: 3D UIs are uniquely able to achieve superior interaction fidelity, and this naturalism can be a huge advantage.