
Showing papers in "ACM Computing Surveys in 2002"


Journal ArticleDOI
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and examines in detail three problems: document representation, classifier construction, and classifier evaluation.
Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

7,539 citations
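The pipeline the abstract outlines (represent documents, induce a classifier from preclassified examples, evaluate it) can be sketched with a multinomial Naive Bayes learner over bag-of-words features. This is one inductive method among the many such surveys compare; the toy corpus and labels below are purely illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Learn class priors and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Pick the label maximizing log prior plus smoothed log likelihood."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy preclassified corpus (illustrative only)
docs = [("cheap pills buy now", "spam"),
        ("meeting agenda attached", "ham"),
        ("buy cheap watches", "spam"),
        ("project meeting tomorrow", "ham")]
model = train_nb(docs)
print(classify(model, "buy cheap pills"))  # spam
```

Evaluation (the survey's third problem) would then compare predicted against held-out true labels, e.g. with precision and recall per category.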


Journal ArticleDOI
TL;DR: This survey covers rollback-recovery techniques that do not require special language constructs, distinguishing between checkpoint-based protocols, which rely solely on checkpointing for system state restoration, and log-based protocols, which combine checkpointing with logging of nondeterministic events.
Abstract: This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based. Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determinants are logged, log-based protocols can be pessimistic, optimistic, or causal. Throughout the survey, we highlight the research issues that are at the core of rollback-recovery and present the solutions that currently address them. We also compare the performance of different rollback-recovery protocols with respect to a series of desirable properties and discuss the issues that arise in the practical implementations of these protocols.

1,772 citations
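The pessimistic log-based category the abstract defines can be illustrated in a few lines: every nondeterministic event is logged as a determinant before it is applied, so recovery replays the log on top of the last checkpoint. The single-process model and integer event encoding below are simplifying assumptions for the sketch, not the survey's own formalism.

```python
import copy

class Process:
    """Toy single-process model of pessimistic log-based recovery.

    Each (hypothetical) nondeterministic event is logged as a determinant
    before it is applied, so replaying the log from the last checkpoint
    restores the exact pre-crash state.
    """
    def __init__(self):
        self.state = {"x": 0}
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []                     # determinants since last checkpoint

    def apply_event(self, delta):
        self.log.append(delta)            # log first (pessimistic logging)
        self.state["x"] += delta          # then apply to volatile state

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()                  # earlier events are covered by it

    def recover(self):
        self.state = copy.deepcopy(self.checkpoint)
        for delta in self.log:            # replay determinants in order
            self.state["x"] += delta

p = Process()
p.apply_event(5)
p.take_checkpoint()
p.apply_event(3)
p.apply_event(2)
p.state = {"x": -999}                     # simulate a crash losing volatile state
p.recover()
print(p.state["x"])                       # 10: checkpoint (5) plus replayed events
```

Optimistic and causal variants relax the "log before apply" rule to cut logging overhead, at the cost of a more involved recovery procedure.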


Journal ArticleDOI
TL;DR: This survey explores the hardware aspects of reconfigurable computing machines, from single-chip architectures to multi-chip systems, including internal structures and external coupling, and also examines the software that targets these machines.
Abstract: Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey, we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which reuse the configurable hardware during program execution.

1,666 citations


Journal ArticleDOI
TL;DR: A complete view of the current state of the art with respect to layout problems from an algorithmic point of view is presented.
Abstract: Graph layout problems are a particular class of combinatorial optimization problems whose goal is to find a linear layout of an input graph in such a way that a certain objective cost is optimized. This survey considers their motivation, complexity, approximation properties, upper and lower bounds, heuristics, and probabilistic analysis on random graphs. The result is a complete view of the current state of the art with respect to layout problems from an algorithmic point of view.

665 citations
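The objective the abstract describes can be made concrete with the Minimum Linear Arrangement cost, one classic layout problem: the sum over edges of the distance between the endpoints' positions in the linear order. The brute-force solver below is a sketch viable only for toy graphs, since the general problem is NP-hard.

```python
from itertools import permutations

def arrangement_cost(layout, edges):
    """Minimum Linear Arrangement objective: sum over edges of the
    distance between the endpoints' positions in the linear order."""
    pos = {v: i for i, v in enumerate(layout)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def min_linear_arrangement(vertices, edges):
    # Brute force over all orders: toy graphs only; the problem is NP-hard,
    # which is why the survey covers heuristics and approximation instead.
    return min(permutations(vertices), key=lambda p: arrangement_cost(p, edges))

# Path graph a-b-c-d: the natural left-to-right order is optimal, with cost 3.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
best = min_linear_arrangement("abcd", edges)
print(best, arrangement_cost(best, edges))
```

Other layout problems surveyed (bandwidth, cutwidth, and so on) swap in a different objective over the same notion of a linear layout.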


Journal ArticleDOI
TL;DR: This article classifies and describes the main mechanisms to split the traffic load among the server nodes, discussing both the alternative architectures and the load-sharing policies.
Abstract: The overall increase in traffic on the World Wide Web is augmenting user-perceived response times from popular Web sites, especially in conjunction with special events. System platforms that do not replicate information content cannot provide the needed scalability to handle large traffic volumes and to match rapid and dramatic changes in the number of clients. The need to improve the performance of Web-based services has produced a variety of novel content delivery architectures. This article focuses on Web system architectures that consist of multiple server nodes distributed on a local area, with one or more mechanisms to spread client requests among the nodes. After years of continual proposals of new system solutions, routing mechanisms, and policies (the first dating back to 1994, when the NCSA Web site had to face its first million requests per day), many problems concerning multiple-server architectures for Web sites have been solved. Other issues remain to be addressed, especially at the network application layer, but the main techniques and methodologies for building scalable Web content delivery architectures placed in a single location are now settled. This article classifies and describes the main mechanisms to split the traffic load among the server nodes, discussing both the alternative architectures and the load-sharing policies. To this end, it focuses on architectures, internal routing mechanisms, and request dispatching algorithms for designing and implementing scalable Web-server systems under the control of one content provider. It also identifies some of the open research issues associated with the use of distributed systems for highly accessed Web sites.

525 citations
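One of the simplest load-sharing policies in the design space the article classifies is round-robin dispatching among the server nodes. The sketch below is a minimal, content-blind illustration (server names are hypothetical), not any specific architecture from the article.

```python
import itertools

class RoundRobinDispatcher:
    """Cycle incoming client requests over a pool of server nodes."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def dispatch(self, request):
        return next(self._cycle)   # content-blind: ignores the request itself

d = RoundRobinDispatcher(["web1", "web2", "web3"])
assignments = [d.dispatch(f"req{i}") for i in range(6)]
print(assignments)  # ['web1', 'web2', 'web3', 'web1', 'web2', 'web3']
```

Content-aware (layer-7) dispatchers instead inspect the request before choosing a node, trading dispatching overhead for better cache locality and load estimates.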


Journal ArticleDOI
TL;DR: In this article, techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine are surveyed.
Abstract: Frequently a user's information needs are stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool such as increasing the search coverage of the Web and improving the scalability of the search. In this article, we survey techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents to a given query. The document selection problem is to determine what documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We will also point out some problems that need to be further researched.

443 citations
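The result merging problem the abstract names can be illustrated with reciprocal rank fusion, a well-known merging heuristic chosen here for brevity (not necessarily one the article surveys): each document's score is the sum of 1/(k + rank) over the engines that returned it.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked result lists from multiple search engines by scoring
    each document as the sum of 1/(k + rank) over the engines returning it."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two hypothetical engines' ranked result lists for the same query
engine_a = ["d1", "d2", "d3"]
engine_b = ["d2", "d4", "d1"]
merged = reciprocal_rank_fusion([engine_a, engine_b])
print(merged)  # ['d2', 'd1', 'd4', 'd3']
```

Database selection and document selection happen before this step: they decide which engines to query and how many results to pull from each, so that merging operates on useful inputs.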


Journal ArticleDOI
TL;DR: This work surveys methods for extracting quadric surfaces, that is, algebraic surfaces of degree 2, instances of which are the sphere, the cylinder, and the cone.
Abstract: In a variety of practical situations such as reverse engineering of boundary representation from depth maps of scanned objects, range data analysis, model-based recognition and algebraic surface design, there is a need to recover the shape of visible surfaces of a dense 3D point set. In particular, it is desirable to identify and fit simple surfaces of known type wherever these are in reasonable agreement with the data. We are interested in the class of quadric surfaces, that is, algebraic surfaces of degree 2, instances of which are the sphere, the cylinder and the cone. A comprehensive survey of the recent work in each subtask pertaining to the extraction of quadric surfaces from triangulations is presented.

271 citations
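Fitting one simple quadric, the sphere, reduces to linear least squares because the algebraic form x^2 + y^2 + z^2 + a*x + b*y + c*z + d = 0 is linear in (a, b, c, d). The sketch below solves the 4x4 normal equations directly; it is a generic illustration of this reduction, not a method taken from the survey.

```python
def solve(M, v):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(M)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def fit_sphere(points):
    """Least-squares sphere fit via the linear algebraic form,
    solving the normal equations (R^T R) s = R^T rhs."""
    rows = [[x, y, z, 1.0] for x, y, z in points]
    rhs = [-(x * x + y * y + z * z) for x, y, z in points]
    RtR = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Rtb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(4)]
    a, b, c, d = solve(RtR, Rtb)
    center = (-a / 2, -b / 2, -c / 2)
    radius = (center[0] ** 2 + center[1] ** 2 + center[2] ** 2 - d) ** 0.5
    return center, radius

# Points sampled exactly on a sphere of radius 2 centred at (1, 0, 0)
pts = [(3, 0, 0), (-1, 0, 0), (1, 2, 0), (1, -2, 0), (1, 0, 2), (1, 0, -2)]
center, radius = fit_sphere(pts)
print(center, radius)
```

Cylinders and cones add nonlinear constraints on the quadric coefficients, which is part of what makes the general extraction problem the survey covers harder than this special case.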


Journal ArticleDOI
TL;DR: This article presents the origins, measurement functions, formulations and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization and information theoretic properties, and discusses how these metrics can be applied for improving Web information access and use.
Abstract: The unabated growth and increasing significance of the World Wide Web has resulted in a flurry of research activity to improve its capacity for serving information more effectively. But at the heart of these efforts lie implicit assumptions about "quality" and "usefulness" of Web resources and services. This observation points towards measurements and models that quantify various attributes of Web sites. The science of measuring all aspects of information, especially its storage and retrieval, or informetrics, has interested information scientists for decades before the existence of the Web. Is Web informetrics any different, or is it just an application of classical informetrics to a new medium? In this article, we examine this issue by classifying and discussing a wide-ranging set of Web metrics. We present the origins, measurement functions, formulations and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization and information theoretic properties. We also discuss how these metrics can be applied for improving Web information access and use.

254 citations
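Among the metrics of Web page significance, PageRank is the classic example: a page is significant if significant pages link to it. A minimal power-iteration sketch (the three-page graph is illustrative only):

```python
def pagerank(links, d=0.85, iters=100):
    """Power iteration for PageRank over a dict {page: [outlinks]}."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:                       # dangling page: spread rank uniformly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
r = pagerank(links)
print(max(r, key=r.get))  # c: it is linked from both a and b
```

Metrics for Web graph properties, similarity, and usage characterization follow the same pattern of reducing an intuitive notion of "quality" to a computable function over the graph or the logs.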


Journal ArticleDOI
Hala ElAarag
TL;DR: This paper presents how regular TCP is well tuned to react to packet loss in wired networks, and discusses and illustrates the problems caused by the mobility of hosts using a graph tracing packets between fixed and mobile hosts.
Abstract: Transmission Control Protocol (TCP) is the most commonly used transport protocol on the Internet. All indications suggest that mobile computers and their wireless communication links will be an integral part of the future internetworks. In this paper, we present how regular TCP is well tuned to react to packet loss in wired networks. We then define mobility and the problems associated with it. We discuss why regular TCP is not suitable for mobile hosts and their wireless links by providing simulation results that demonstrate the effect of the high bit error rates of the wireless link on TCP performance. We discuss and illustrate the problems caused by the mobility of hosts using a graph tracing packets between fixed and mobile hosts. We then present a survey of the research done to improve the performance of TCP over mobile wireless networks. We classify the proposed solutions into three categories: link layer, end-to-end and split. We discuss the intuition behind each solution and present example protocols of each category. We discuss the protocols' functionality and their strengths and weaknesses. We also provide a comparison of the different approaches in the same category and on the category level. We conclude this survey with a recommendation of the features that need to be satisfied in a standard mobile TCP protocol.

199 citations
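The core problem the paper simulates can be illustrated with a toy AIMD window model: TCP interprets every loss as congestion and halves its window, so random wireless bit errors throttle throughput even when no congestion exists. This sketch is a simplification for illustration, not the paper's simulation setup, and the loss rates are invented.

```python
import random

def aimd_goodput(loss_rate, rounds=2000, seed=1):
    """Average congestion window of a toy AIMD sender: the window grows
    by one segment per round and halves whenever a loss occurs, whether
    the loss came from congestion or from wireless bit errors."""
    random.seed(seed)
    cwnd, sent = 1.0, 0.0
    for _ in range(rounds):
        sent += cwnd
        if random.random() < loss_rate:
            cwnd = max(1.0, cwnd / 2)   # multiplicative decrease on loss
        else:
            cwnd += 1.0                 # additive increase
    return sent / rounds

low_loss = aimd_goodput(0.001)   # wired-like loss rate
high_loss = aimd_goodput(0.05)   # wireless-like bit-error loss rate
print(low_loss > high_loss)      # random losses alone throttle the sender
```

The link-layer, end-to-end, and split solutions the survey classifies all aim, in different ways, to hide or distinguish these non-congestion losses from TCP's congestion response.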


Journal ArticleDOI
TL;DR: The purpose of this article is to answer two questions: "What are the requirements that a modern type system for an object-oriented database programming language should satisfy?" and "Are there any type systems developed to date that satisfy these requirements?".
Abstract: The concept of an object-oriented database programming language (OODBPL) is appealing because it has the potential of combining the advantages of object orientation and database programming to yield a powerful and universal programming language design. A uniform and consistent combination of object orientation and database programming, however, is not straightforward. Since one of the main components of an object-oriented programming language is its type system, one of the first problems that arises during an OODBPL design is related to the development of a uniform, consistent, and theoretically sound type system that is sufficiently expressive to satisfy the combined needs of object orientation and database programming. The purpose of this article is to answer two questions: "What are the requirements that a modern type system for an object-oriented database programming language should satisfy?" and "Are there any type systems developed to date that satisfy these requirements?". In order to answer the first question, we compile the set of requirements that an OODBPL type system should satisfy. We then use this set of requirements to evaluate more than 30 existing type systems. The result of this extensive analysis shows that while each of the requirements is satisfied by at least one type system, no type system satisfies all of them. It also enables identification of the mechanisms that lie behind the strengths and weaknesses of the current type systems.

69 citations


Journal ArticleDOI
TL;DR: The aim of the article is to review current approaches to modeling motion together with related data structures and algorithms, and to summarize the challenges that lie ahead in producing a more unified theory of motion representation that would be useful across several disciplines.
Abstract: This article is a survey of research areas in which motion plays a pivotal role. The aim of the article is to review current approaches to modeling motion together with related data structures and algorithms, and to summarize the challenges that lie ahead in producing a more unified theory of motion representation that would be useful across several disciplines.

Journal ArticleDOI
TL;DR: An overview of a wide range of approaches for achieving customizability in the operating systems research community, structured around a taxonomy, is presented.
Abstract: An important goal of an operating system is to make computing and communication resources available in a fair and efficient way to the applications that will run on top of it. To achieve this result, the operating system implements a number of policies for allocating resources to, and sharing resources among applications, and it implements safety mechanisms to guard against misbehaving applications. However, for most of these allocation and sharing tasks, no single optimal policy exists. Different applications may prefer different operating system policies to achieve their goals in the best possible way. A customizable or adaptable operating system is an operating system that allows for flexible modification of important system policies. Over the past decade, a wide range of approaches for achieving customizability has been explored in the operating systems research community. In this survey, an overview of these approaches, structured around a taxonomy, is presented.

Journal ArticleDOI
TL;DR: This tutorial discusses the notion of one-way functions both in a cryptographic and in a complexity-theoretic setting, and considers interactive proof systems and some interesting zero-knowledge protocols.
Abstract: In this tutorial, selected topics of cryptology and of computational complexity theory are presented. We give a brief overview of the history and the foundations of classical cryptography, and then move on to modern public-key cryptography. Particular attention is paid to cryptographic protocols and the problem of constructing key components of protocols such as one-way functions. A function is one-way if it is easy to compute, but hard to invert. We discuss the notion of one-way functions both in a cryptographic and in a complexity-theoretic setting. We also consider interactive proof systems and present some interesting zero-knowledge protocols. In a zero-knowledge protocol, one party can convince the other party of knowing some secret information without disclosing any bit of this information. Motivated by these protocols, we survey some complexity-theoretic results on interactive proof systems and related complexity classes.
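Modular exponentiation illustrates the one-way notion the tutorial defines: computing y = g^x mod p is easy, while recovering x (the discrete logarithm) is believed hard for cryptographic-size p. The brute-force inverter below is a sketch on a toy prime, not a cryptographic construction.

```python
def invert_bruteforce(y, g=2, p=101):
    """Recover x with g**x % p == y by exhaustive search; feasible
    only because the toy prime p is tiny."""
    for x in range(p - 1):
        if pow(g, x, p) == y:
            return x
    return None

# Easy direction: three-argument pow() is fast even for huge exponents.
y = pow(2, 42, 101)
print(invert_bruteforce(y))  # 42, recoverable only because p = 101 is tiny
```

For a real modulus of hundreds of digits, the forward direction stays fast while the search space for inversion becomes astronomically large, which is exactly the asymmetry that one-way-function-based protocols exploit.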

Journal Article
TL;DR: A new index-based hybrid checkpointing scheme with separate recovery approaches for hard and soft failures is proposed; the presented theoretical results show that for soft failures the recovery approach offers maximum roll-forward.
Abstract: In this paper, a new index-based hybrid checkpointing scheme has been proposed to tackle the problem arising due to "coasting forward". This scheme uses both incremental and communication-induced checkpoints. Failures have been classified as hard and soft so that the recovery approaches designed for these two types of failures can exploit the advantage (i.e. a possible reduction of rollback) that incremental checkpoints offer. The presented theoretical results show that for soft failures, the recovery approach offers maximum roll-forward.
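The incremental half of the hybrid scheme can be sketched abstractly: record only what changed since the previous snapshot, then recover by applying the increments, in order, on top of the last full checkpoint (the roll-forward). The dictionary-based state model below is an illustrative assumption, not the paper's index-based design.

```python
def incremental_checkpoint(prev_state, cur_state):
    """Record only the entries changed since the previous snapshot
    (deleted keys are ignored in this simplified sketch)."""
    return {k: v for k, v in cur_state.items() if prev_state.get(k) != v}

def roll_forward(base, increments):
    """Rebuild state by applying incremental checkpoints, in order,
    on top of the last full checkpoint."""
    state = dict(base)
    for inc in increments:
        state.update(inc)
    return state

full = {"a": 1, "b": 2, "c": 3}              # last full checkpoint
s1 = {"a": 1, "b": 5, "c": 3}                # only b changed
s2 = {"a": 7, "b": 5, "c": 3}                # then a changed
incs = [incremental_checkpoint(full, s1),
        incremental_checkpoint(s1, s2)]
print(roll_forward(full, incs) == s2)        # recovered state matches s2
```

A soft failure that leaves the increments intact can thus roll forward to the latest recorded state, whereas a hard failure that loses them must fall back to the full checkpoint.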