scispace - formally typeset
Search or ask a question

Showing papers in "ACM Computing Surveys in 2001"


Journal ArticleDOI
TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.
Abstract: We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.

2,723 citations


Journal ArticleDOI
TL;DR: A unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework, and presents a quantitative definition of the elusive concept of "intrinsic dimensionality".
Abstract: The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of "intrinsic dimensionality." We also present a unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework. Most approaches turn out to be variations on a few different concepts. We organize those works in a taxonomy that allows us to devise new algorithms from combinations of concepts not noticed before because of the lack of communication between different communities. We present experiments validating our results and comparing the existing approaches. We finish with recommendations for practitioners and open questions for future development.

1,337 citations


Journal ArticleDOI
TL;DR: An overview of the current state of the art in querying multimedia databases is provided, describing the index structures and algorithms for an efficient query processing in high-dimensional spaces.
Abstract: During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, and molecular biology. An important research issue in the field of multimedia databases is the content-based retrieval of similar multimedia objects such as images, text, and videos. However, in contrast to searching data in a relational database, a content-based retrieval requires the search of similar objects as a basic functionality of the database system. Most of the approaches addressing similarity search use a so-called feature transformation that transforms important properties of the multimedia objects into high-dimensional points (feature vectors). Thus, the similarity search is transformed into a search of points in the feature space that are close to a given query point in the high-dimensional feature space. Query processing in high-dimensional spaces has therefore been a very active research area over the last few years. A number of new index structures and algorithms have been proposed. It has been shown that the new index structures considerably improve the performance in querying large multimedia databases. Based on recent tutorials [Berchtold and Keim 1998], in this survey we provide an overview of the current state of the art in querying multimedia databases, describing the index structures and algorithms for an efficient query processing in high-dimensional spaces. We identify the problems of processing queries in high-dimensional space, and we provide an overview of the proposed approaches to overcome these problems.

922 citations


Journal ArticleDOI
TL;DR: The survey analyzes existing techniques, identifies which aspects of usability evaluation automation are likely to be of use in future research, and suggests new ways to expand existing approaches to better support usability evaluation.
Abstract: Usability evaluation is an increasingly important part of the user interface design process. However, usability evaluation can be expensive in terms of time and human resources, and automation is therefore a promising way to augment existing approaches. This article presents an extensive survey of usability evaluation methods, organized according to a new taxonomy that emphasizes the role of automation. The survey analyzes existing techniques, identifies which aspects of usability evaluation automation are likely to be of use in future research, and suggests new ways to expand existing approaches to better support usability evaluation.

906 citations


Journal ArticleDOI
TL;DR: This article surveys various complexity and expressiveness results on different forms of logic programming, in particular, propositional logic programming and datalog, but it also mentions general logic programming with function symbols.
Abstract: This article surveys various complexity and expressiveness results on different forms of logic programming. The main focus is on decidable forms of logic programming, in particular, propositional logic programming and datalog, but we also mention general logic programming with function symbols. Next to classical results on plain logic programming (pure Horn clause programs), more recent results on various important extensions of logic programming are surveyed. These include logic programming with different forms of negation, disjunctive logic programming, logic programming with equality, and constraint logic programming.

801 citations


Journal ArticleDOI
TL;DR: The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
Abstract: Data sets in large applications are often too massive to fit completely inside the computers internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically. Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.

751 citations


Journal ArticleDOI
TL;DR: The specification framework presented in this article will help builders of group communication systems understand andspecify their service semantics; the extensive survey will allow them to compare their service to others, and serve as a unified framework for the classification, analysis, and comparison of group Communication systems.
Abstract: View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented group communication systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This survey provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over 30 published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis, and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the presented specifications. This survey is aimed at both system builders and theoretical researchers. The specification framework presented in this article will help builders of group communication systems understand and specify their service semantics; the extensive survey will allow them to compare their service to others. Application builders will find a guide here to the services provided by a large variety of GCSs, which could help them choose the GCS appropriate for their needs. The formal framework may provide a basis for interesting theoretical work, for example, analyzing relative strengths of different properties and the costs of implementing them.

734 citations


Journal ArticleDOI
TL;DR: This tutorial surveys the modeling issues here, with an emphasis upon the impact these have upon the problem of assessing the reliability of fault-tolerant systems.
Abstract: Design diversity has been used for many years now as a means of achieving a degree of fault tolerance in software-based systems. While there is clear evidence that the approach can be expected to deliver some increase in reliability compared to a single version, there is no agreement about the extent of this. More importantly, it remains difficult to evaluate exactly how reliable a particular diverse fault-tolerant system is. This difficulty arises because assumptions of independence of failures between different versions have been shown to be untenable: assessment of the actual level of dependence present is therefore needed, and this is difficult. In this tutorial, we survey the modeling issues here, with an emphasis upon the impact these have upon the problem of assessing the reliability of fault-tolerant systems. The intended audience is one of designers, assessors, and project managers with only a basic knowledge of probabilities, as well as reliability experts without detailed knowledge of software, who seek an introduction to the probabilistic issues in decisions about design diversity.

155 citations


Journal ArticleDOI
TL;DR: It is essential to build trust in Java safety, and thence to achieve ITSEC level 6 or Common Criteria level 7 certification for Java programs, to build a coherent set of machine-readable formal models of the whole of Java and its implementation.
Abstract: We review the existing literature on Java safety, emphasizing formal approaches, and the impact of Java safety on small footprint devices such as smartcards. The conclusion is that although a lot of good work has been done, a more concerted effort is needed to build a coherent set of machine-readable formal models of the whole of Java and its implementation. This is a formidable task but we believe it is essential to build trust in Java safety, and thence to achieve ITSEC level 6 or Common Criteria level 7 certification for Java programs.

99 citations


Journal ArticleDOI
TL;DR: This survey formalizes these hardness results, in the context of both logic- and probability-based reasoning, then overviews the techniques now used to address, or at least side-step, this dilemma.
Abstract: Many tasks require “reasoning”—i.e., deriving conclusions from a corpus of explicitly stored information—to solve their range of problems. An ideal reasoning system would produce all-and-only the correct answers to every possible query, produce answers that are as specific as possible, be expressive enough to permit any possible fact to be stored and any possible query to be asked, and be (time) efficient. Unfortunately, this is provably impossible: as correct and precise systems become more expressive, they can become increasingly inefficient, or even undecidable. This survey first formalizes these hardness results, in the context of both logic- and probability-based reasoning, then overviews the techniques now used to address, or at least side-step, this dilemma.

89 citations


Journal ArticleDOI
TL;DR: The focus is on how enriching the labels of transitions with encodings of their deduction trees is sufficient to derive qualitative and quantitative information on the systems in hand simply by relabeling the transitions of a unique concrete model.
Abstract: This article surveys the definition and application of an enhancement of structural operational semantics in the field of concurrent systems, and also addresses issues of distribution and mobility of code. The focus is on how enriching the labels of transitions with encodings of their deduction trees is sufficient to derive qualitative and quantitative information on the systems in hand simply by relabeling the transitions of a unique concrete model.

Journal ArticleDOI
TL;DR: A set of concrete goals, which are general and problem-independent and might appear ubiquitously in the algorithmic realm are presented, illustrated, motivated, and justified by an extensive requirements analysis for a particular algorithm from a particular algorithmic domain: Dijkstra's algorithm for shortest paths in networks.
Abstract: An algorithm component is an implementation of an algorithm which is not intended to be a stand-alone module, but to perform a specific task within a large software package or even within several distinct software packages. Therefore, the design of algorithm components must also incorporate software-engineering aspects. A key design goal is adaptability. This goal is important for maintenance throughout a project, prototypical development, and reuse in new, unforseen contexts. From a theoretical viewpoint most algorithms apply to a range of possible use scenarios. Ideally, each algorithm is implemented by one algorithm component, which is easily, safely, and efficiently adaptable to all of these contexts.Various techniques have been developed for the design and implementation of algorithm components. However, a common basis for systematic, detailed evaluations and comparisons in view of the real practical needs is still missing. Basically, this means a set of concrete criteria, which specify what sort of adaptability is really required in practice, and which are well-justified by convincing, representative use scenarios.This paper is intended to be a first “milestone” on the way towards such a system of criteria. We will present a set of concrete goals, which are general and problem-independent and might appear ubiquitously in the algorithmic realm. These goals are illustrated, motivated, and justified by an extensive requirements analysis for a particular algorithm from a particular algorithmic domain: Dijkstra's algorithm for shortest paths in networks.Clearly, the field of algorithmics might be too versatile to allow a comprehensive, yet concise set of precise, justified criteria. Even a domain as restricted as graph and network algorithms includes aspects that are not fully understood. The analysis will include a discussion of the limits of the case study and the scope of the goals. The case study was chosen because it seems to be close to the “boderline” between the aspects that are well understood and the aspects that are not. Hence, this example may well serve as an“acid test” for programming techniques in view of the state of the art.