
Showing papers in "Communications of the ACM" in 2012


Journal ArticleDOI
TL;DR: A survey of probabilistic topic modeling algorithms that uncover the hidden thematic structure of large document archives and support corpus exploration, document search, and prediction.
Abstract: Probabilistic topic modeling provides a suite of tools for the unsupervised analysis of large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. This analysis can be used for corpus exploration, document search, and a variety of prediction problems. In this tutorial, I will review the state of the art in probabilistic topic models. I will describe the three components of topic modeling: (1) topic modeling assumptions, (2) algorithms for computing with topic models, and (3) applications of topic models. In (1), I will describe latent Dirichlet allocation (LDA), which is one of the simplest topic models, and then describe a variety of ways that we can build on it. These include dynamic topic models, correlated topic models, supervised topic models, author-topic models, bursty topic models, Bayesian nonparametric topic models, and others. I will also discuss some of the fundamental statistical ideas that are used in building topic models, such as distributions on the simplex, hierarchical Bayesian modeling, and models of mixed membership. In (2), I will review how we compute with topic models. I will describe approximate posterior inference for directed graphical models using both sampling and variational inference, and I will discuss the practical issues and pitfalls in developing these algorithms for topic models. Finally, I will describe some of our most recent work on building algorithms that can scale to millions of documents and documents arriving in a stream. In (3), I will discuss applications of topic models. These include applications to images, music, social networks, and other data in which we hope to uncover hidden patterns. I will describe some of our recent work on adapting topic modeling algorithms to collaborative filtering, legislative modeling, and bibliometrics without citations. Finally, I will discuss some future directions and open research problems in topic models.

4,529 citations
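As an illustration only (not from the paper), here is a minimal sketch of fitting LDA to a toy corpus, assuming the scikit-learn library is available; the corpus, topic count, and variable names are invented for the example.

```python
# Hypothetical sketch: fit LDA to a tiny corpus and print the top words per
# topic. scikit-learn is an assumed dependency, not something the paper uses.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the court ruled on the new tax law",
    "the senate passed the tax bill into law",
    "the team won the championship game last night",
    "the striker scored twice in the final game",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)            # bag-of-words counts

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [vocab[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")                # the uncovered "themes"
```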


Journal ArticleDOI
TL;DR: Content-Centric Networking (CCN) is presented, which uses named content chunks as its primitive, decoupling location from identity, security, and access, and retrieving chunks of content by name; it simultaneously achieves scalability, security, and performance.
Abstract: Current network use is dominated by content distribution and retrieval yet current networking protocols are designed for conversations between hosts. Accessing content and services requires mapping from the what that users care about to the network's where. We present Content-Centric Networking (CCN) which uses content chunks as a primitive---decoupling location from identity, security and access, and retrieving chunks of content by name. Using new approaches to routing named content, derived from IP, CCN simultaneously achieves scalability, security, and performance. We describe our implementation of the architecture's basic features and demonstrate its performance and resilience with secure file downloads and VoIP calls.

3,122 citations


Journal ArticleDOI
TL;DR: Tapping into the "folk knowledge" needed to build successful machine learning applications: twelve key lessons covering pitfalls to avoid, issues to focus on, and answers to common questions.
Abstract: Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of “black art” that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.

2,482 citations


Journal ArticleDOI
TL;DR: In very general settings, solving a convex program that finds the minimum-nuclear-norm matrix consistent with the observed entries perfectly recovers all of the missing entries of a low-rank matrix from most sufficiently large subsets of entries.
Abstract: Suppose that one observes an incomplete subset of entries selected from a low-rank matrix. When is it possible to complete the matrix and recover the entries that have not been seen? We demonstrate that in very general settings, one can perfectly recover all of the missing entries from most sufficiently large subsets by solving a convex programming problem that finds the matrix with the minimum nuclear norm agreeing with the observed entries. The techniques used in this analysis draw upon parallels in the field of compressed sensing, demonstrating that objects other than signals and images can be perfectly reconstructed from very limited information.

2,327 citations
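Purely as an illustration of the convex program described above (not the authors' code), the following sketch completes a small synthetic low-rank matrix by nuclear-norm minimization; the cvxpy modeling library, the matrix size, and the sampling rate are all assumptions made for the example.

```python
# Hypothetical sketch: nuclear-norm matrix completion on synthetic data.
import numpy as np
import cvxpy as cp  # assumed dependency for expressing the convex program

rng = np.random.default_rng(0)
n, r = 20, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 target
mask = (rng.random((n, n)) < 0.6).astype(float)                # ~60% of entries observed

# Find the matrix with minimum nuclear norm that agrees with the observed entries.
X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == cp.multiply(mask, M)])
problem.solve()

err = np.linalg.norm(X.value - M) / np.linalg.norm(M)
print(f"relative recovery error: {err:.4f}")   # near zero when recovery succeeds
```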


Journal ArticleDOI
TL;DR: If you are reading these lines on a PC running some form of Windows, then you have been affected by this line of work--without knowing it, which is precisely the way the authors want it to be.
Abstract: Most ACM Queue readers might think of "program verification research" as mostly theoretical with little impact on the world at large. Think again. If you are reading these lines on a PC running som...

624 citations


Journal ArticleDOI
TL;DR: The "persistently full buffer problem" is still with us and made increasingly critical by two trends, cheap memory and a "more is better" mentality have led to the inflation and proliferation of buffers.
Abstract: Nearly three decades after it was first diagnosed, the "persistently full buffer problem" recently exposed as part of "bufferbloat", is still with us and made increasingly critical by two trends. F...

599 citations


Journal ArticleDOI
TL;DR: A research agenda for making the smart grid a reality is presented, motivated by peak oil, energy security, and the climate impact of fossil fuels.
Abstract: The phenomenal growth in material wealth experienced in developed countries throughout the twentieth century has largely been driven by the availability of cheap energy derived from fossil fuels (originally coal, then oil, and most recently natural gas). However, the continued availability of this cheap energy cannot be taken for granted given the growing concern that increasing demand for these fuels (and particularly, demand for oil) will outstrip our ability to produce them (so-called 'peak oil'). Many mature oil and gas fields around the world have already peaked and their annual production is now steadily declining. Predictions of when world oil production will peak vary from 0 to 20 years into the future, but even the most conservative estimates provide little scope for complacency given the significant price increases that peak oil is likely to precipitate. Furthermore, many of the oil and gas reserves that do remain are in environmentally or politically sensitive regions of the world where threats to supply create increased price volatility (as evidenced by the 2010 Deepwater Horizon disaster and 2011 civil unrest in the Middle East). Finally, the growing consensus on the long-term impact of carbon emissions from burning fossil fuels suggests that even if peak oil is avoided, and energy security assured, a future based on fossil fuel use will expose regions of the world to damaging climate change that will make the lives of many of the world's poorest people even harder.

513 citations


Journal ArticleDOI
TL;DR: Looking past the systems people use, they target the people using the systems.
Abstract: Looking past the systems people use, they target the people using the systems.

457 citations


Journal ArticleDOI
TL;DR: Using real event data to X-ray business processes helps ensure conformance between design and reality.
Abstract: Using real event data to X-ray business processes helps ensure conformance between design and reality.

408 citations


Journal ArticleDOI
TL;DR: Analysis requires contextualized human judgments regarding the domain-specific significance of the clusters, trends, and outliers discovered in data.
Abstract: The increasing scale and availability of digital data provides an extraordinary resource for informing public policy, scientific discovery, business strategy, and even our personal lives. To get the most out of such data, however, users must be able to make sense of it: to pursue questions, uncover patterns of interest, and identify (and potentially correct) errors. In concert with data-management systems and statistical algorithms, analysis requires contextualized human judgments regarding the domain-specific significance of the clusters, trends, and outliers discovered in data.

404 citations


Journal ArticleDOI
TL;DR: Application areas for computer-vision technology include video surveillance, biometrics, automotive, photography, movie production, Web search, medicine, augmented reality gaming, new user interfaces, and many more.
Abstract: Computer vision is a rapidly growing field devoted to analyzing, modifying, and high-level understanding of images. Its objective is to determine what is happening in front of a camera and use that understanding to control a computer or robotic system, or to provide people with new images that are more informative or aesthetically pleasing than the original camera images. Application areas for computer-vision technology include video surveillance, biometrics, automotive, photography, movie production, Web search, medicine, augmented reality gaming, new user interfaces, and many more.

Journal ArticleDOI
TL;DR: Rumors spread in sublogarithmic time in preferential attachment social networks; surprisingly, nodes with few neighbors, which act as short-cuts between hubs, are crucial for this fast dissemination.
Abstract: Understanding structural and algorithmic properties of complex networks is an important task, not least because of the huge impact of the internet. Our focus is to analyze how news spreads in social networks. We simulate a simple information spreading process in different network topologies and demonstrate that news spreads much faster in existing social network topologies. We support this finding by analyzing information spreading in the mathematically defined preferential attachment network topology, which is a common model for real-world networks. We prove that here sublogarithmic time suffices to spread news to all nodes of the network; all previously studied network topologies need at least logarithmic time. Surprisingly, we observe that nodes with few neighbors are crucial for the fast dissemination.

Social networks like Facebook and Twitter are reshaping the way people take collective action. They have played a crucial role in the recent uprisings of the 'Arab Spring' and the 'London riots'. It has been argued that the 'instantaneous nature' of these networks influenced the speed at which the events were unfolding [4]. It is quite remarkable that social networks spread news so fast. Neither the structure of social networks nor the process that distributes the news was designed with this purpose in mind. On the contrary, they were not designed at all, but have evolved in a random and decentralized manner. So is our view correct that social networks ease the spread of information ("rumors"), and if so, what particular properties of social networks are the reason for this? To answer these questions, we simulate a simple rumor spreading process on several graphs having the structure of existing large social networks. We see, for example, that a rumor started at a random node of the Twitter network on average reaches 45.6 million of the total of 51.2 million members within only eight rounds of communication. We also analyze this process on an abstract model of social networks, the so-called preferential attachment graphs introduced by Barabási and Albert [3]. In [17], we obtain a mathematical proof that rumors in such networks spread much faster than in many other network topologies, even faster than in networks having a communication link between any two nodes (complete graphs). As an explanation, we observe that nodes of small degree build a short-cut between those having large degree (hubs), which, due to their large number of possible communication partners, less often talk to each other directly.
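To make the simulated process concrete, here is a toy sketch (not the authors' code) of synchronous push-pull rumor spreading on a preferential attachment graph; the networkx library, the graph size, and the protocol details are assumptions made for the illustration.

```python
# Hypothetical sketch: push-pull rumor spreading on a Barabasi-Albert graph.
import random
import networkx as nx  # assumed dependency

random.seed(1)
G = nx.barabasi_albert_graph(n=10_000, m=3, seed=1)   # preferential attachment

informed = {random.randrange(G.number_of_nodes())}    # rumor starts at one node
rounds = 0
while len(informed) < G.number_of_nodes():
    rounds += 1
    newly = set()
    for v in G.nodes():
        u = random.choice(list(G.neighbors(v)))        # contact a random neighbor
        if v in informed or u in informed:              # push or pull the rumor
            newly.update((u, v))
    informed |= newly
    print(f"round {rounds}: {len(informed)} nodes informed")
```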

Journal ArticleDOI
TL;DR: Soft materials may enable the automation of tasks beyond the capacities of current robotic technology.
Abstract: Soft materials may enable the automation of tasks beyond the capacities of current robotic technology.

Journal ArticleDOI
TL;DR: Sharing recent experiences with teaching an online course.
Abstract: Sharing recent experiences with an online course.

Journal ArticleDOI
TL;DR: A programming-by-example methodology that allows end users to automate repetitive tasks over large spreadsheet data, built around a domain-specific language and a synthesis algorithm that learns programs in that language from user-provided examples.
Abstract: Millions of computer end users need to perform tasks over large spreadsheet data, yet lack the programming knowledge to do such tasks automatically. We present a programming by example methodology that allows end users to automate such repetitive tasks. Our methodology involves designing a domain-specific language and developing a synthesis algorithm that can learn programs in that language from user-provided examples. We present instantiations of this methodology for particular domains of tasks: (a) syntactic transformations of strings using restricted forms of regular expressions, conditionals, and loops, (b) semantic transformations of strings involving lookup in relational tables, and (c) layout transformations on spreadsheet tables. We have implemented this technology as an add-in for the Microsoft Excel Spreadsheet system and have evaluated it successfully over several benchmarks picked from various Excel help forums.
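As a toy illustration of the programming-by-example idea (not the system described in the paper, which handles a far richer language), the sketch below enumerates programs in a miniature DSL of constant strings and whitespace-separated input fields and returns the first one consistent with every user-provided example; the DSL, constants, and function names are invented for the example.

```python
# Hypothetical programming-by-example sketch: enumerate tiny string programs
# and keep one that matches all input/output examples.
from itertools import product

CONSTS = [", ", " ", ".", ""]
FIELDS = range(3)
ATOMS = [("const", c) for c in CONSTS] + [("field", i) for i in FIELDS]

def run(program, text):
    """Execute a program: concatenate constants and input fields."""
    fields = text.split()
    out = []
    for kind, arg in program:
        if kind == "const":
            out.append(arg)
        else:
            if arg >= len(fields):
                return None            # field index out of range for this input
            out.append(fields[arg])
    return "".join(out)

def synthesize(examples, max_parts=3):
    """Return the shortest program consistent with every example."""
    for n in range(1, max_parts + 1):
        for program in product(ATOMS, repeat=n):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

examples = [("Jane Doe", "Doe, Jane"), ("Alan Turing", "Turing, Alan")]
prog = synthesize(examples)
print(prog)                            # (('field', 1), ('const', ', '), ('field', 0))
print(run(prog, "Grace Hopper"))       # "Hopper, Grace"
```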

Journal ArticleDOI
TL;DR: Computer-system logs provide a glimpse into the states of a running system and contain a wealth of information to help manage it.
Abstract: Computer-system logs provide a glimpse into the states of a running system. Instrumentation occasionally generates short messages that are collected in a system-specific log. The content and format...

Journal ArticleDOI
TL;DR: On-chip hardware coherence can scale gracefully as the number of cores increases, with bounded, modest costs, by combining known techniques such as shared caches that track cached copies, explicit eviction notifications, and hierarchical design.
Abstract: Today's multicore chips commonly implement shared memory with cache coherence as low-level support for operating systems and application software. Technology trends continue to enable the scaling of the number of (processor) cores per chip. Because conventional wisdom says that coherence does not scale well to many cores, some prognosticators predict the end of coherence. This paper seeks to refute this conventional wisdom by showing one way to scale on-chip cache coherence with bounded, modest costs by combining known techniques such as shared caches augmented to track cached copies, explicit cache eviction notifications, and hierarchical design. Based on this scalable proof-of-concept design, we predict that on-chip coherence and the programming convenience and compatibility it provides are here to stay.

Journal ArticleDOI
TL;DR: It is shown that trust between client organization and cloud provider is a strong predictor of successful cloud deployment.
Abstract: Trust between client organization and cloud provider is a strong predictor of successful cloud deployment.

Journal ArticleDOI
TL;DR: Continually verifying the self-adaptation decisions taken by critical software in response to changes in its operating environment.
Abstract: Continually verify self-adaptation decisions taken by critical software in response to changes in the operating environment.

Journal ArticleDOI
TL;DR: Globus Online manages fire-and-forget file transfers for big-data, high-performance scientific collaborations.
Abstract: Globus Online manages fire-and-forget file transfers for big-data, high-performance scientific collaborations.

Journal ArticleDOI
TL;DR: This paper describes the leading algorithms for Monte-Carlo tree search and explains how they have advanced the state of the art in computer Go.
Abstract: The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper, we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.
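As a purely illustrative sketch of the core UCT loop behind Monte-Carlo tree search (not the Go programs surveyed in the paper), the code below plays a toy take-away game in which players alternately remove 1-3 stones and whoever takes the last stone wins; the game, constants, and class names are assumptions made for the example.

```python
# Hypothetical minimal UCT (Monte-Carlo tree search) sketch on a toy game.
import math
import random

def moves(pile):
    """Legal moves: remove 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.untried = [], moves(pile)
        self.visits, self.wins = 0, 0.0     # wins for the player who just moved

    def ucb_child(self, c=1.4):
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(pile):
    """Random playout; returns 1 if the player to move at `pile` wins."""
    turn = 0
    while pile > 0:
        pile -= random.choice(moves(pile))
        turn ^= 1
    return 1 if turn == 1 else 0

def mcts(pile, iters=5000):
    root = Node(pile)
    for _ in range(iters):
        node = root
        while not node.untried and node.children:       # 1. selection
            node = node.ucb_child()
        if node.untried:                                 # 2. expansion
            m = node.untried.pop()
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        reward = 1 - rollout(node.pile)                  # 3. simulation
        while node is not None:                          # 4. backpropagation
            node.visits += 1
            node.wins += reward
            reward = 1 - reward                          # flip perspective each ply
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(10))   # optimal play leaves a multiple of 4, so expect 2
```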

Journal ArticleDOI
TL;DR: Avoid premature commitment, seek design alternatives, and automatically generate performance-optimized software.
Abstract: Avoid premature commitment, seek design alternatives, and automatically generate performance-optimized software.

Journal ArticleDOI
TL;DR: Vehicle area networks form the backbone of future intelligent transportation systems.
Abstract: Vehicle area networks form the backbone of future intelligent transportation systems.

Journal ArticleDOI
TL;DR: "Thy destroyers and they that made thee waste shall go forth of thee," wrote the prophet Isaiah, which has been popping into the mind as I have been following the book of Isaiah.
Abstract: "Thy destroyers and they that made thee waste shall go forth of thee," wrote the prophet Isaiah. This phrase has been popping into my mind as I have been following.

Journal ArticleDOI
TL;DR: The reductionism behind today's software-engineering methods breaks down in the face of systems complexity.
Abstract: The reductionism behind today's software-engineering methods breaks down in the face of systems complexity.

Journal ArticleDOI
TL;DR: Theft of private information is a significant problem for online applications, with millions of people's medical records stolen in data breaches between 2009 and 2011.
Abstract: Theft of private information is a significant problem for online applications. For example, a recent investigation found that at least eight million people's medical records were stolen as a result of data breaches between 2009 and 2011,

Journal ArticleDOI
TL;DR: Algorithmic self-assembly studies the challenge of programming molecules to assemble themselves into complex structures, with the object being grown simultaneously acting as the machine that controls its own growth.
Abstract: Self-assembly is the process by which small components automatically assemble themselves into large, complex structures. Examples in nature abound: lipids self-assemble a cell's membrane, and bacteriophage virus proteins self-assemble a capsid that allows the virus to invade other bacteria. Even a phenomenon as simple as crystal formation is a process of self-assembly. How could such a process be described as "algorithmic?" The key word in the first sentence is automatically. Algorithms automate a series of simple computational tasks. Algorithmic self-assembly systems automate a series of simple growth tasks, in which the object being grown is simultaneously the machine controlling its own growth.

Journal ArticleDOI
TL;DR: Even after almost a dozen years, these books still deliver solid guidance for software development teams and their projects.
Abstract: Even after almost a dozen years, they still deliver solid guidance for software development teams and their projects.

Journal ArticleDOI
TL;DR: The basic concepts and initial prototype of a word-gesture keyboard are discussed; exact or statistical modeling of the gesture keyboard's speed-accuracy trade-off, incorporating human control behavior, remains under research.
Abstract: The basic concepts and initial prototype of a word-gesture keyboard are discussed. In the early 1980s, Montgomery conceived the idea of using sliding gestures on a touch keyboard to enter characters. He designed a wipe-activated keyboard with a flat touch-sensitive surface. The positions of the letter keys were carefully arranged so that letters commonly appearing consecutively in words were connected on the keyboard. Since a gesture keyboard enhances, rather than replaces, a conventional touchscreen keyboard, out-of-vocabulary (OOV) letter sequences can always be entered by typing the individual letter keys. In using a word-gesture keyboard, the production of movements increasingly changes from focusing on individual letters to connecting multiple letters into a word gesture. Conceptually, gesture recognition is done by identifying the word that has the highest probability given the user's gesture. Exact or statistical modeling of the gesture keyboard's speed-accuracy trade-off, incorporating human control behavior, is still under research.
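To make the recognition idea concrete, here is a toy sketch (not the paper's recognizer): it scores candidate words by how close the user's gesture path lies to the ideal path through each word's key centers, a crude geometric stand-in for choosing the word with the highest probability given the gesture; the layout coordinates, vocabulary, and resampling scheme are all assumptions for the illustration.

```python
# Hypothetical word-gesture recognition sketch: nearest ideal key-center path.
import numpy as np

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY = {ch: (x + 0.5 * y, -y) for y, row in enumerate(ROWS) for x, ch in enumerate(row)}

def resample(points, n=32):
    """Resample a polyline to n evenly spaced points along its length."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    if t[-1] == 0:
        return np.repeat(pts[:1], n, axis=0)
    s = np.linspace(0, t[-1], n)
    return np.column_stack([np.interp(s, t, pts[:, 0]), np.interp(s, t, pts[:, 1])])

def ideal_path(word, n=32):
    return resample([KEY[c] for c in word], n)

def recognize(gesture, vocabulary):
    g = resample(gesture)
    scores = {w: np.mean(np.linalg.norm(g - ideal_path(w), axis=1)) for w in vocabulary}
    return min(scores, key=scores.get)

# A slightly imprecise gesture tracing q-u-i-c-k still maps to "quick".
gesture = [(x + 0.2, y - 0.1) for x, y in (KEY[c] for c in "quick")]
print(recognize(gesture, ["quick", "quack", "which", "thick"]))
```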

Journal ArticleDOI
TL;DR: 3D UIs are uniquely able to achieve superior interaction fidelity, and this naturalism can be a huge advantage.
Abstract: 3D UIs are uniquely able to achieve superior interaction fidelity, and this naturalism can be a huge advantage.