
Showing papers in "Communications of The ACM in 2008"


Journal ArticleDOI
Jeffrey Dean1, Sanjay Ghemawat1
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
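The map/reduce model described above can be sketched in a few lines. This is a hypothetical single-process illustration of the programming model only, not Google's implementation, which shards the input, runs map and reduce tasks in parallel across clusters, and handles machine failures:

```python
from collections import defaultdict

def map_fn(doc):
    # User-supplied map: emit (word, 1) for every word in the document.
    for word in doc.split():
        yield word, 1

def reduce_fn(word, counts):
    # User-supplied reduce: sum the partial counts for one word.
    return word, sum(counts)

def map_reduce(docs):
    # A toy stand-in for the runtime: group map output by key, then reduce.
    groups = defaultdict(list)
    for doc in docs:                                   # map phase
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())  # reduce phase

print(map_reduce(["the cat", "the dog"]))  # {'the': 2, 'cat': 1, 'dog': 1}
```

The user writes only `map_fn` and `reduce_fn`; everything inside `map_reduce` is what the real runtime parallelizes.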

17,663 citations


Journal ArticleDOI
TL;DR: An algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time O(dn^{1/c² + o(1)}) and space O(dn + n^{1 + 1/c² + o(1)}), which almost matches the recently obtained lower bound for hashing-based algorithms.
Abstract: In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
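As a point of reference for the bounds above, a minimal brute-force exact search looks like this; it costs O(n·d) per query, which is exactly what the article's hashing-based algorithms are designed to beat. The point set and query are invented for illustration:

```python
def nearest(dataset, query):
    # Brute-force exact nearest neighbor under Euclidean distance.
    def dist2(p, q):
        # Squared distance; the minimizer is the same as for the true distance.
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(dataset, key=lambda p: dist2(p, query))

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 2.0)]
print(nearest(points, (0.9, 1.2)))  # (1.0, 1.0)
```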

1,759 citations


Journal ArticleDOI
TL;DR: Data generated as a side effect of game play also solves computational problems and trains AI algorithms.
Abstract: Data generated as a side effect of game play also solves computational problems and trains AI algorithms.

1,154 citations


Journal ArticleDOI
TL;DR: How ontologies provide the semantics, as explained here with the help of Harry Potter and his owl Hedwig.
Abstract: How ontologies provide the semantics, as explained here with the help of Harry Potter and his owl Hedwig.

629 citations


Journal ArticleDOI
TL;DR: In this paper, a self-supervised learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text.
Abstract: To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that an extraction classifier (or other ranking model) uses to evaluate the trustworthiness of candidate tuples extracted from the corpus. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor then assigns each retained tuple a probability that it is an actual instance of a relationship among the objects that make up the tuple. The retained tuples form an extraction graph that can be queried for information.
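The extraction step can be caricatured in a few lines. This toy substitutes a fixed regular-expression pattern for the paper's parser, heuristics, classifier, and redundancy-based assessor, and the relation phrases and sentences are invented for illustration:

```python
import re

# Hypothetical relation phrases standing in for learned extraction criteria.
PATTERN = re.compile(r"(\w+) (is a|works for|located in) (\w+)")

def extract_tuples(corpus):
    # Single pass over the corpus, emitting (arg1, relation, arg2) tuples
    # with no human-labeled input.
    tuples = []
    for sentence in corpus:
        for m in PATTERN.finditer(sentence):
            tuples.append((m.group(1), m.group(2), m.group(3)))
    return tuples

corpus = ["Paris located in France", "Alice works for Acme"]
print(extract_tuples(corpus))
# [('Paris', 'located in', 'France'), ('Alice', 'works for', 'Acme')]
```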

545 citations


Journal ArticleDOI
TL;DR: With access control and encryption no longer capable of protecting privacy, laws and systems are needed that hold people accountable for the misuse of personal information, whether public or secret.
Abstract: With access control and encryption no longer capable of protecting privacy, laws and systems are needed that hold people accountable for the misuse of personal information, whether public or secret.

421 citations


Journal ArticleDOI
TL;DR: In Zyzzyva, replicas reply to a client's request without first running an expensive three-phase commit protocol to agree on the order to process requests, making BFT replication practical for a broad range of demanding services.
Abstract: A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine fault-tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary ("Byzantine") ways. Despite this promise, practitioners hesitate to deploy BFT systems at least partly because of the perception that BFT must impose high overheads. In this article, we present Zyzzyva, a protocol that uses speculation to reduce the cost of BFT replication. In Zyzzyva, replicas reply to a client's request without first running an expensive three-phase commit protocol to agree on the order to process requests. Instead, they optimistically adopt the order proposed by a primary server, process the request, and reply immediately to the client. If the primary is faulty, replicas can become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima and to achieve throughputs of tens of thousands of requests per second, making BFT replication practical for a broad range of demanding services.
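The client-side decision described above can be sketched as follows. This is a hypothetical simplification (message authentication, view changes, and commit-certificate contents are all omitted), and `client_decide` is an invented helper name; with n = 3f + 1 replicas, the client acts on a speculative response only when enough replies match:

```python
def client_decide(replies, f):
    # replies: list of (hashable) replica responses to one request.
    n = 3 * f + 1
    counts = {}
    for r in replies:
        counts[r] = counts.get(r, 0) + 1
    best, votes = max(counts.items(), key=lambda kv: kv[1])
    if votes == n:
        # Fast path: all 3f+1 speculative replies agree; commit immediately.
        return "fast-commit", best
    if votes >= 2 * f + 1:
        # Enough agreement to commit after an extra round (certificate omitted).
        return "commit-with-certificate", best
    # Too much divergence: the client helps replicas converge and retries.
    return "retry", None

print(client_decide(["ok"] * 4, f=1))  # ('fast-commit', 'ok')
```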

390 citations


Journal ArticleDOI
TL;DR: A new image completion algorithm powered by a huge database of photographs gathered from the Web that can generate a diverse set of image completions and allow users to select among them.
Abstract: What can you do with a million images? In this paper, we present a new image completion algorithm powered by a huge database of photographs gathered from the Web. The algorithm patches up holes in images by finding similar image regions in the database that are not only seamless, but also semantically valid. Our chief insight is that while the space of images is effectively infinite, the space of semantically differentiable scenes is actually not that large. For many image completion tasks, we are able to find similar scenes which contain image fragments that will convincingly complete the image. Our algorithm is entirely data driven, requiring no annotations or labeling by the user. Unlike existing image completion methods, our algorithm can generate a diverse set of image completions and we allow users to select among them. We demonstrate the superiority of our algorithm over existing image completion approaches.

354 citations


Journal ArticleDOI
Jon Kleinberg1
TL;DR: Internet-based data on human interaction connects computing and the social sciences like never before and helps scientists understand the world around us more fully.
Abstract: The growth of social media and on-line social networks has opened up a set of fascinating new challenges and directions for the field of computing. Some of the basic issues around these developments are the design of information systems in the presence of complex social feedback effects, and the emergence of a growing research interface between computing and the social sciences.

323 citations


Journal ArticleDOI
TL;DR: Developing a framework to analyze coordination patterns occurring in the emergency response life cycle, and identifying trends that the next generation of emergency response plans should address.
Abstract: Developing a framework to analyze coordination patterns occurring in the emergency response life cycle.

316 citations


Journal ArticleDOI
TL;DR: Drawing on methods from diverse disciplines---including computer science, education, sociology, and psychology---to improve computing education.
Abstract: Drawing on methods from diverse disciplines---including computer science, education, sociology, and psychology---to improve computing education.

Journal ArticleDOI
TL;DR: Embedded networked sensing, having successfully shifted from the lab to the environment, is primed for a more contentious move to the city, where citizens will likely be the target of data collection.
Abstract: Embedded networked sensing, having successfully shifted from the lab to the environment, is primed for a more contentious move to the city, where citizens will likely be the target of data collection. This transition will warrant careful study and touch on issues that go far beyond the scientific realm.

Journal ArticleDOI
TL;DR: Are you ready for a personal energy meter?
Abstract: Are you ready for a personal energy meter?

Journal ArticleDOI
TL;DR: Users sculpt and manipulate digital information through such tangible media as clay, sand, and building models, coupled with underlying computation for design and analysis.
Abstract: Users sculpt and manipulate digital information through such tangible media as clay, sand, and building models, coupled with underlying computation for design and analysis.

Journal ArticleDOI
Ping Zhang1
TL;DR: Despite heavy investment and keen interest in ICT adoption, our understanding of what contributes to ICT acceptance and use is still limited, largely owing to the theoretical perspectives researchers have chosen to study the phenomenon.
Abstract: Organizations hoping to improve employee productivity, increase strategic advantages, and gain or hold the competitive edge have invested heavily in information and communication technology (ICT). Similarly, ICT development firms and other stakeholders have struggled to attract potential consumers, increase consumer loyalty, and stimulate continued ICT use. Yet despite such heavy investment and keen interest, our understanding of what contributes to ICT acceptance and use is still limited. The limits are largely owing to the theoretical perspectives researchers have chosen to study the phenomenon.

Journal ArticleDOI
TL;DR: People tend to believe they are less vulnerable to risks, and less likely to be harmed by consumer products, than others; it stands to reason that computer users hold the same belief about their own risk of computer vulnerabilities.
Abstract: People tend to believe they are less vulnerable to risks than others. People also believe they are less likely to be harmed by consumer products compared to others. It stands to reason that any computer user has the preset belief that they are at less risk of a computer vulnerability than others.

Journal ArticleDOI
TL;DR: A guide to the tools and core technologies for merging information from disparate sources and how to integrate them into a single system.
Abstract: A guide to the tools and core technologies for merging information from disparate sources.

Journal ArticleDOI
TL;DR: This paper reports how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall.
Abstract: In the past decades, advances in speed of commodity CPUs have far outpaced advances in RAM latency. Main-memory access has therefore become a performance bottleneck for many computer applications; a phenomenon that is widely known as the "memory wall." In this paper, we report how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall. This encompasses (i) a redesign of the query execution model to better exploit pipelined CPU architectures and CPU instruction caches; (ii) the use of columnar rather than row-wise data storage to better exploit CPU data caches; (iii) the design of new cache-conscious query processing algorithms; and (iv) the design and automatic calibration of memory cost models to choose and tune these cache-conscious algorithms in the query optimizer.
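The row-wise versus column-wise distinction in point (ii) can be made concrete with a toy sketch (plain Python, not MonetDB; the table and attribute names are made up). A scan over one attribute touches a single contiguous array in the columnar layout, which is what lets CPU data caches help:

```python
# The same three-row table in both layouts.
rows = [(1, "alice", 30), (2, "bob", 25), (3, "carol", 41)]   # row store

columns = {                                                    # column store
    "id":   [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "age":  [30, 25, 41],
}

# Summing one attribute: the row store walks every full row, while the
# column store scans one contiguous array of just the needed values.
total_row = sum(r[2] for r in rows)
total_col = sum(columns["age"])
print(total_row, total_col)  # 96 96
```

In a real column store this locality difference, combined with cache-conscious operators, is what avoids the memory wall the abstract describes.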

Journal ArticleDOI
TL;DR: Evaluating user perceptions of location-tracking and location-awareness services finds that perceptions of these services vary greatly depending on the type of service and where it is offered.
Abstract: Evaluating user perceptions of location-tracking and location-awareness services.

Journal ArticleDOI
TL;DR: The promise of STM may likely be undermined by its overheads and workload applicabilities.
Abstract: TM (transactional memory) is a concurrency control paradigm that provides atomic and isolated execution for regions of code. TM is considered by many researchers to be one of the most promising sol...

Journal ArticleDOI
TL;DR: Displays on real-world objects allow more realistic user interfaces and may improve the quality of interaction.
Abstract: Over the past few years, there has been a quiet revolution in display manufacturing technology. One that is only comparable in scope to that of the invention of the first LCD, which led to DynaBook and the modern laptop. E-ink electrophoretic pixel technology, combined with advances in organic thin-film circuit substrates, have led to displays that are so thin and flexible they are beginning to resemble paper. Soon displays will completely mimic the high contrast, low power consumption and flexibility of printed media. As with the invention of the first LCD, this means we are on the brink of a new paradigm in computer user interface design: one in which computers can have any organic form or shape. One where any object, no matter how complex, dynamic or flexible its structure, may display information. One where the deformation of shape is a main source of input.

Journal ArticleDOI
TL;DR: It would include details of the processes that produced electronic data as far back as the beginning of time or at least the epoch of provenance awareness.
Abstract: It would include details of the processes that produced electronic data as far back as the beginning of time or at least the epoch of provenance awareness.

Journal ArticleDOI
TL;DR: A survey of computing paradigms abstracted from natural phenomena and, dually, of natural processes that can be viewed as information processing.
Abstract: Natural computing is the field of research that investigates models and computational techniques inspired by nature and, dually, attempts to understand the world around us in terms of information processing. It is a highly interdisciplinary field that connects the natural sciences with computing science, both at the level of information technology and at the level of fundamental research [98]. As a matter of fact, natural computing areas and topics come in many flavours, including pure theoretical research, algorithms and software applications, as well as biology, chemistry and physics experimental laboratory research. In this review we describe computing paradigms abstracted from natural phenomena as diverse as self-reproduction, the functioning of the brain, Darwinian evolution, group behaviour, the immune system, the characteristics of life, cell membranes, and morphogenesis. These paradigms can be implemented either on traditional electronic hardware or on alternative physical media such as biomolecular (DNA, RNA) computing, or trapped-ion quantum computing devices. Dually, we describe several natural processes that can be viewed as information processing, such as gene regulatory networks, protein-protein interaction networks, biological transport networks, and gene assembly in unicellular organisms. In the same vein, we list efforts to understand biological systems by engineering semi-synthetic organisms, and to understand the universe from the point of view of information processing. This review was written with the expectation that the reader is a computer scientist with limited knowledge of natural sciences, and it avoids dwelling on the minute details of
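One of the paradigms surveyed, Darwinian evolution, can be illustrated with a minimal evolutionary algorithm. This sketch, with invented parameters, evolves bit strings toward an all-ones target, where fitness is simply the number of ones:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def evolve(length=10, pop_size=20, generations=60):
    # Random initial population of bit strings.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)          # selection: rank by fitness
        parents = pop[: pop_size // 2]           # keep the fittest half
        children = []
        for p in parents:
            # Mutation: flip each bit with 5% probability.
            children.append([bit ^ (random.random() < 0.05) for bit in p])
        pop = parents + children                 # elitism: parents survive
    return max(pop, key=sum)

best = evolve()
print(sum(best))  # fitness of the best bit string found
```

Because the parents survive unchanged each generation, the best fitness never decreases; mutation supplies the variation that selection then amplifies.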

Journal ArticleDOI
TL;DR: This paper presents a concurrency model, based on transactional memory, that offers far richer composition, and describes modular forms of blocking and choice that were inaccessible in earlier work.
Abstract: Writing concurrent programs is notoriously difficult and is of increasing practical importance. A particular source of concern is that even correctly implemented concurrency abstractions cannot be composed together to form larger abstractions. In this paper we present a concurrency model, based on transactional memory, that offers far richer composition. All the usual benefits of transactional memory are present (e.g., freedom from low-level deadlock), but in addition we describe modular forms of blocking and choice that were inaccessible in earlier work.

Journal ArticleDOI
TL;DR: Knowing the structure of criminal and terrorist networks could provide the technical insight needed to disrupt their activities.
Abstract: Knowing the structure of criminal and terrorist networks could provide the technical insight needed to disrupt their activities.

Journal ArticleDOI
TL;DR: Considering the advantages and implications of increased usage of wireless connectivity for governmental information and services.
Abstract: Considering the advantages and implications of increased usage of wireless connectivity for governmental information and services.

Journal ArticleDOI
TL;DR: As e-government efforts mature, the exploitation of ICT is being extended to the realm of democracy, such as enhancing citizen participation in policy-making.
Abstract: Governments around the world are tapping the potential of information and communication technologies (ICT) to transform the public sector, a phenomenon broadly known as e-government. Deployment of ICT in government is expected to improve internal efficiency and provide citizens with better information and services. The increasing interest in e-government is evident in the rising public expenditure on ICT. As an indicator, IDC estimates that e-government spending in the Asia-Pacific region will exceed U.S. $31 billion by the end of 2010. As e-government efforts mature, the exploitation of ICT is being extended to the realm of democracy, such as enhancing citizen participation in policy-making.

Journal ArticleDOI
TL;DR: Financial reporting via XBRL is a low-cost method for increasing transparency and compliance while potentially decreasing a firm's cost of capital.
Abstract: Financial reporting via XBRL is a low-cost method for increasing transparency and compliance while potentially decreasing a firm's cost of capital.

Journal ArticleDOI
TL;DR: Even the best project management skills will not guarantee success in the complex world of offshore outsourcing.
Abstract: Even the best project management skills will not guarantee success in the complex world of offshore outsourcing.

Journal ArticleDOI
TL;DR: How managerial prompting, group identification, and social value orientation affect knowledge-sharing behavior.
Abstract: How managerial prompting, group identification, and social value orientation affect knowledge-sharing behavior.