
Showing papers in "The Computer Journal in 1998"


Journal ArticleDOI
TL;DR: The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model, and the EM result provides a measure of uncertainty about the associated classification of each data point.
Abstract: We consider the problem of determining the structure of clustered data, without prior knowledge of the number of clusters or any other information about their composition. Data are represented by a mixture model in which each component corresponds to a different cluster. Models with varying geometric properties are obtained through Gaussian components with different parametrizations and cross-cluster constraints. Noise and outliers can be modelled by adding a Poisson process component. Partitions are determined by the expectation-maximization (EM) algorithm for maximum likelihood, with initial values from agglomerative hierarchical clustering. Models are compared using an approximation to the Bayes factor based on the Bayesian information criterion (BIC); unlike significance tests, this allows comparison of more than two models at the same time, and removes the restriction that the models compared be nested. The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model. Moreover, the EM result provides a measure of uncertainty about the associated classification of each data point. Examples are given, showing that this approach can give performance that is much better than standard procedures, which often fail to identify groups that are either overlapping or of varying sizes and shapes.
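
As a rough illustration of the model-based approach described above, the following minimal sketch uses scikit-learn's GaussianMixture rather than the authors' own software; the synthetic data, the candidate cluster counts and the covariance families tried are invented for the example, and scikit-learn's bic() is defined so that lower values indicate a better model.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic Gaussian clusters plus a few uniform "noise" points.
X = np.vstack([
    rng.normal([0, 0], 0.5, size=(100, 2)),
    rng.normal([4, 4], 1.0, size=(100, 2)),
    rng.uniform(-5, 9, size=(10, 2)),
])

best = None
for k in range(1, 7):                              # candidate numbers of clusters
    for cov in ("spherical", "diag", "full"):      # different geometric models
        gm = GaussianMixture(n_components=k, covariance_type=cov,
                             random_state=0).fit(X)
        bic = gm.bic(X)                            # lower BIC is better in scikit-learn
        if best is None or bic < best[0]:
            best = (bic, k, cov, gm)

bic, k, cov, gm = best
print(f"chosen model: {k} clusters, '{cov}' covariances, BIC = {bic:.1f}")
post = gm.predict_proba(X)                         # per-point classification uncertainty
print("least certain point index:", int(post.max(axis=1).argmin()))

The model comparison here plays the role of the Bayes factor approximation in the paper: the number of clusters and the cluster geometry are selected together by the criterion rather than fixed in advance.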

2,576 citations


Journal ArticleDOI
TL;DR: The adaptive classifier combination method introduced here performed the best on this seven-class Yahoo news groups problem, achieving approximately 83% accuracy, which is comparable to the performance of other similar studies.
Abstract: The exponential growth of the internet has led to a great deal of interest in developing useful and efficient tools and software to assist users in searching the Web. Document retrieval, categorization, routing and filtering can all be formulated as classification problems. However, the complexity of natural languages and the extremely high dimensionality of the feature space of documents have made this classification problem very difficult. We investigate four different methods for document classification: the naive Bayes classifier, the nearest neighbour classifier, decision trees and a subspace method. These were applied to seven-class Yahoo news groups (business, entertainment, health, international, politics, sports and technology) individually and in combination. We studied three classifier combination approaches: simple voting, dynamic classifier selection and adaptive classifier combination. Our experimental results indicate that the naive Bayes classifier and the subspace method outperform the other two classifiers on our data sets. Combinations of multiple classifiers did not always improve the classification accuracy compared to the best individual classifier. Among the three different combination approaches, our adaptive classifier combination method introduced here performed the best. The best classification accuracy that we are able to achieve on this seven-class problem is approximately 83%, which is comparable to the performance of other similar studies. However, the classification problem considered here is more difficult because the pattern classes used in our experiments have a large overlap of words in their corresponding documents.
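
The combination idea can be sketched with scikit-learn on a tiny made-up corpus (the Yahoo data, the adaptive combination method and the reported 83% figure are not reproduced here); the example only shows naive Bayes, nearest neighbour and a decision tree combined by simple majority voting.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

# Tiny invented corpus standing in for the Yahoo news-group documents.
docs = ["stocks fall as markets react", "team wins the championship game",
        "new vaccine trial shows promise", "parliament passes budget bill",
        "striker scores twice in final", "central bank raises interest rates"]
labels = ["business", "sports", "health", "politics", "sports", "business"]

clf = make_pipeline(
    CountVectorizer(),
    VotingClassifier([                      # simple (majority) voting combination
        ("nb", MultinomialNB()),
        ("knn", KNeighborsClassifier(n_neighbors=1)),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ], voting="hard"),
)
clf.fit(docs, labels)
print(clf.predict(["the striker wins the game"]))   # likely ['sports']

Dynamic classifier selection and adaptive combination replace the fixed vote with a per-document choice of classifier, which is where the paper's improvements come from.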

367 citations


Journal ArticleDOI
TL;DR: A variant of the quadtree structure is adapted to solve the problem of indexing dynamic attributes based on the key idea of using a linear function of time for each dynamic attribute that allows us to predict its value in the future.
Abstract: Dynamic attributes are attributes that change continuously over time making it impractical to issue explicit updates for every change. In this paper, we adapt a variant of the quadtree structure to solve the problem of indexing dynamic attributes. The approach is based on the key idea of using a linear function of time for each dynamic attribute that allows us to predict its value in the future. We contribute an algorithm for regenerating the quadtree-based index periodically that minimizes CPU and disk access cost. We also provide an experimental study of performance focusing on query processing and index update overheads.
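
A minimal sketch of the underlying idea, with invented class and parameter names: each dynamic attribute is stored as a linear function of time, the index is built over the positions predicted for some instant, and the whole structure is simply regenerated at each rebuild period (the paper's quadtree variant and its cost model are not reproduced).

from dataclasses import dataclass

@dataclass
class MovingPoint:
    # A dynamic attribute modelled as a linear function of time: x(t) = x0 + vx*t.
    x0: float
    y0: float
    vx: float
    vy: float
    def at(self, t):
        return (self.x0 + self.vx * t, self.y0 + self.vy * t)

class QuadNode:
    # Minimal point quadtree over [0, 1)^2 holding at most CAP points per leaf.
    CAP = 4
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.points, self.children = [], None
    def insert(self, p, pos):
        if self.children is None:
            self.points.append((p, pos))
            if len(self.points) > self.CAP and self.size > 1e-3:
                self._split()
            return
        self._child(pos).insert(p, pos)
    def _split(self):
        h = self.size / 2
        self.children = [QuadNode(self.x + dx * h, self.y + dy * h, h)
                         for dy in (0, 1) for dx in (0, 1)]
        pts, self.points = self.points, []
        for p, pos in pts:
            self._child(pos).insert(p, pos)
    def _child(self, pos):
        h = self.size / 2
        return self.children[(pos[0] >= self.x + h) + 2 * (pos[1] >= self.y + h)]

def rebuild(points, t):
    # Periodic regeneration: index every object at its predicted position at time t.
    root = QuadNode(0.0, 0.0, 1.0)
    for p in points:
        root.insert(p, p.at(t))
    return root

pts = [MovingPoint(0.1, 0.2, 0.01, 0.0), MovingPoint(0.8, 0.7, -0.02, 0.01)]
index = rebuild(pts, t=5.0)   # rebuilt again at the next regeneration instant
print("predicted positions at t = 5:", [p.at(5.0) for p in pts])

Between rebuilds, queries evaluate the stored linear functions, so no per-update disk writes are needed.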

218 citations


Journal ArticleDOI
TL;DR: Genetic algorithms have been used successfully to generate software test data automatically to give 100% branch coverage in up to two orders of magnitude fewer tests than random testing.
Abstract: Genetic algorithms have been used successfully to generate software test data automatically; all branches were covered with substantially fewer generated tests than simple random testing. We generated test sets which executed all branches in a variety of programs including a quadratic equation solver, remainder, linear and binary search procedures, and a triangle classifier comprising a system of five procedures. We regard the generation of test sets as a search through the input domain for appropriate inputs. The genetic algorithms generated test data to give 100% branch coverage in up to two orders of magnitude fewer tests than random testing. Whilst some of this benefit is offset by increased computation effort, the adequacy of the test data is improved by the genetic algorithm's ability to generate test sets which are at or close to the input subdomain boundaries. Genetic algorithms may be used for fault-based testing where faults associated with mistakes in branch predicates are revealed. The software has been deliberately seeded with faults in the branch predicates (i.e. mutation testing), and our system successfully killed 97% of the mutants.
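
A minimal sketch of the search idea (not the authors' tool): a genetic algorithm evolves integer inputs for a toy triangle classifier, with fitness given by a simple branch-distance measure for the equilateral branch; the population size, mutation rate and input ranges are invented.

import random
random.seed(1)

def triangle_type(a, b, c):
    if a + b <= c or b + c <= a or a + c <= b:
        return "not a triangle"
    if a == b == c:
        return "equilateral"            # the branch the search tries to cover
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

def branch_distance(ind):
    # Distance to satisfying the predicate a == b == c (0 means branch taken).
    a, b, c = ind
    return abs(a - b) + abs(b - c)

def evolve(pop_size=50, gens=200, lo=1, hi=100):
    pop = [[random.randint(lo, hi) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=branch_distance)
        if branch_distance(pop[0]) == 0:
            return pop[0]                            # branch covered
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            cut = random.randint(1, 2)
            child = p1[:cut] + p2[cut:]              # one-point crossover
            if random.random() < 0.3:                # mutation
                child[random.randrange(3)] = random.randint(lo, hi)
            children.append(child)
        pop = parents + children
    return min(pop, key=branch_distance)

best = evolve()
print(best, "->", triangle_type(*best))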

116 citations


Journal ArticleDOI
TL;DR: This paper makes the case for a spotting computation scheme which gives rise to a new classification methodology for processing real world data by surveying algorithms developed under the Real World Computing program and related work in Japan.
Abstract: This paper makes the case for a spotting computation scheme, which gives rise to a new classification methodology for processing real world data, by surveying algorithms developed under the Real World Computing (RWC) program and related work in Japan. A spotting function has a segmentation-free characteristic: it gracefully ignores most real world input data which do not belong to the task domain. Some members of the family of spotting methods have been developed under the RWC program. This paper shows how these spotting methods rise to the challenge of the case made for them. The common computational structure amongst spotting methods suggests an architecture for spotting computation.

96 citations


Journal ArticleDOI
TL;DR: The actions of Trojan horses and viruses in real computer systems are considered and a minimal framework for an adequate formal understanding of the phenomena is suggested.
Abstract: It is not possible to view a computer operating in the real world, including the possibility of Trojan horse programs and computer viruses, as simply a finite realisation of a Turing machine. We consider the actions of Trojan horses and viruses in real computer systems and suggest a minimal framework for an adequate formal understanding of the phenomena. Some conventional approaches, including biological metaphors, are shown to be inadequate; some suggestions are made towards constructing virally-resistant systems.

87 citations


Journal ArticleDOI
TL;DR: This work extends MML classification to domains where the ‘things’ have a known spatial arrangement and it may be expected that the classes of neighbouring things are correlated, and combines the Snob algorithm with a simple dynamic programming algorithm.
Abstract: Intrinsic classification, or unsupervised learning of a classification, was the earliest application of what is now termed minimum message length (MML) or minimum description length (MDL) inference. The MML algorithm ‘Snob’ and its relatives have been used successfully in many domains. These algorithms treat the ‘things’ to be classified as independent random selections from an unknown population whose class structure, if any, is to be estimated. This work extends MML classification to domains where the ‘things’ have a known spatial arrangement and it may be expected that the classes of neighbouring things are correlated. Two cases are considered. In the first, the things are arranged in a sequence and the correlation between the classes of successive things is modelled by a first-order Markov process. An algorithm for this case is constructed by combining the Snob algorithm with a simple dynamic programming algorithm. The method has been applied to the classification of protein secondary structure. In the second case, the things are arranged on a two-dimensional (2D) square grid, like the pixels of an image. Correlation is modelled by a prior over patterns of class assignments whose log probability depends on the number of adjacent mismatched pixel pairs. The algorithm uses Gibbs sampling from the pattern posterior and a thermodynamic relation to calculate message length.
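
For the sequential case, the combination of per-thing class likelihoods with a first-order Markov model over neighbouring classes can be illustrated by a small Viterbi-style dynamic program. This is only in the spirit of the paper's algorithm: it is not Snob, and no message lengths are computed; the toy probabilities are invented.

import numpy as np

def map_classes(log_like, log_trans, log_init):
    """Most probable class sequence (Viterbi) given per-item class
    log-likelihoods and a first-order Markov model over classes."""
    n, k = log_like.shape
    score = log_init + log_like[0]
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans        # cand[i, j]: previous class i -> class j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_like[t]
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 2 classes, 6 items; classes of neighbouring items tend to agree.
log_like = np.log(np.array([[.9, .1], [.8, .2], [.4, .6],
                            [.2, .8], [.3, .7], [.1, .9]]))
log_trans = np.log(np.array([[.8, .2], [.2, .8]]))
log_init = np.log(np.array([.5, .5]))
print(map_classes(log_like, log_trans, log_init))   # [0, 0, 1, 1, 1, 1]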

58 citations


Journal ArticleDOI
TL;DR: A new approach to database systems architecture is intended to take advantage of solid-state memory in combination with data compression to provide substantial performance improvements and is capable of greater cost/effectiveness than conventional approaches.
Abstract: Future database applications will require significant improvements in performance beyond the capabilities of conventional disk based systems. This paper describes a new approach to database systems architecture, which is intended to take advantage of solid-state memory in combination with data compression to provide substantial performance improvements. The compressed data representation is tailored to the data manipulation operations requirements. The architecture has been implemented and measurements of performance are compared to those obtained using other high-performance database systems. The results indicate from one to five orders of magnitude speed-up in retrieval, equivalent or slightly faster performance during insertion (and compression) of data, while achieving approximately one order of magnitude compression in data volume. The resultant architecture is thus capable of greater cost/effectiveness than conventional approaches.

46 citations


Journal ArticleDOI
TL;DR: This paper uses a Markov chain to describe the behavior of the mobile user and analyzes the best time when forwarding and resetting should be performed in order to optimize the service rate of the PCS network.
Abstract: This paper presents a methodology for evaluating the performance of forwarding strategies for location management in a personal communication services (PCS) mobile network. A forwarding strategy in the PCS network can be implemented by two mechanisms: a forwarding operation which follows a chain of databases to locate a mobile user and a resetting operation which updates the databases in the chain so that the current location of a mobile user can be known directly without having to follow a chain of databases. In this paper, we consider the PCS network as a server whose function is to provide services to the mobile user for ‘updating the location of the user as the user moves across a database boundary’ and ‘locating the mobile user’. We use a Markov chain to describe the behavior of the mobile user and analyze the best time when forwarding and resetting should be performed in order to optimize the service rate of the PCS network. We demonstrate the applicability of our approach with hexagonal and mesh coverage models for the PCS network and provide a physical interpretation of the result.
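
The forwarding/resetting trade-off can be illustrated by a small simulation with an invented threshold policy (the paper instead analyses the optimal forwarding and resetting times with a Markov model of user movement): longer forwarding chains make call delivery slower, while resetting more often costs extra database updates.

import random
random.seed(0)

def simulate(moves=10000, calls_per_move=0.5, reset_after=4):
    """Count database hops for locating a user under a forwarding strategy.
    After `reset_after` forwarding pointers, a resetting operation collapses
    the chain back to length 0 (an invented threshold policy for illustration)."""
    chain, lookup_hops, resets = 0, 0, 0
    for _ in range(moves):
        chain += 1                        # user crosses a database boundary: add a pointer
        if chain > reset_after:
            chain, resets = 0, resets + 1 # resetting: databases in the chain are updated
        if random.random() < calls_per_move:
            lookup_hops += chain          # a call must follow the whole forwarding chain
    return lookup_hops, resets

for threshold in (1, 2, 4, 8):
    hops, resets = simulate(reset_after=threshold)
    print(f"reset after {threshold} pointers: {hops} lookup hops, {resets} resets")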

45 citations


Journal ArticleDOI
TL;DR: The ‘software crisis’ is discussed as a social and cultural phenomenon, arguing that it can be viewed as (one more) manifestation of postmodernism.
Abstract: We discuss the ‘software crisis’ as a social and cultural phenomenon, arguing that it can be viewed as (one more) manifestation of postmodernism. We illustrate our argument with a range of examples taken from software engineering, demonstrating software engineering’s roots in (and

44 citations


Journal ArticleDOI
TL;DR: The algorithmic design of a worldwide location service for distributed objects is described, based on a worldwide distributed search tree in which addresses are stored at different levels, depending on the migration pattern of the object.
Abstract: We describe the algorithmic design of a worldwide location service for distributed objects. A distributed object can reside at multiple locations at the same time, and offers a set of addresses to allow client processes to contact it. Objects may be highly mobile like, for example, software agents or Web applets. The proposed location service supports regular updates of an object's set of contact addresses, as well as efficient look-up operations. Our design is based on a worldwide distributed search tree in which addresses are stored at different levels, depending on the migration pattern of the object. By exploiting an object's relative stability with respect to a region, combined with the use of pointer caches, look-up operations can be made highly efficient.

Journal ArticleDOI
TL;DR: Peter Wegner’s definition of computability differs markedly from the classical term as established by Church, Kleene, Markov, Post, Turing et al., and it is shown that Church's thesis still holds.
Abstract: Peter Wegner’s definition of computability differs markedly from the classical term as established by Church, Kleene, Markov, Post, Turing et al. Wegner identifies interaction as the main feature of today’s systems which is lacking in the classical treatment of computability. We compare the different approaches and argue whether or not Wegner’s criticism is appropriate. Taking into account the major arguments from the literature, we show that Church’s thesis still holds.

Journal ArticleDOI
TL;DR: This work shows that methods used to derive a checking experiment from a nondeterministic finite state machine can be extended if it is known that the implementation is equivalent to some (unknown) deterministic finite state machine.
Abstract: A number of authors have looked at the problem of deriving a checking experiment from a nondeterministic finite state machine that models the required behaviour of a system. We show that these methods can be extended if it is known that the implementation is equivalent to some (unknown) deterministic finite state machine. When testing a deterministic implementation, the test output provides information about the implementation under test and can thus guide future testing. The use of an adaptive test process is thus proposed.

Journal ArticleDOI
TL;DR: This paper reviews measures of similarity and dissimilarity between pairs of chemical molecules and the use of such measures for processing chemical databases, focusing upon measures that are based on fragment bit-string occurrence data.
Abstract: This paper reviews measures of similarity and dissimilarity between pairs of chemical molecules and the use of such measures for processing chemical databases. The applications discussed include similarity searching, database clustering and diversity analysis, focusing upon measures that are based on fragment bit-string occurrence data. The paper then discusses recent work on the calculation of similarity by aligning molecular fields and on the selection of structurally diverse subsets of chemical databases.
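
One of the fragment bit-string measures covered by such reviews is the Tanimoto (Jaccard) coefficient; a minimal sketch with toy 8-bit fingerprints (real fragment screens are far longer) is:

def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto (Jaccard) coefficient of two fragment bit-strings, held here as ints."""
    both = bin(fp_a & fp_b).count("1")     # fragments present in both molecules
    either = bin(fp_a | fp_b).count("1")   # fragments present in at least one molecule
    return both / either if either else 1.0

mol_a = 0b10110100                         # toy fingerprints standing in for
mol_b = 0b10100110                         # real fragment occurrence bit-strings
print(f"Tanimoto similarity: {tanimoto(mol_a, mol_b):.2f}")   # 3 common / 5 set -> 0.60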

Journal ArticleDOI
TL;DR: It is shown that for suitably large n and suitable values of p, a randomly chosen graph G ∈ G_{n,p} that admits an optimal k-IRS has, with high probability, k = Ω(n^(1 - 6/ln(np) - ln(np)/ln n)).
Abstract: Several methods exist for routing messages in a network without using complete routing tables (compact routing). In k-interval routing schemes (k-IRS), links carry up to k intervals each. A message is routed over a certain link if its destination belongs to one of the intervals of the link. We present some results for the necessary value of k in order to achieve shortest-path routing. Even though low values of k suffice for very structured networks, we show that for 'general graphs' interval routing cannot significantly reduce the space requirements for shortest-path routing. In particular we show that for suitably large n, there are suitable values of p such that for randomly chosen graphs G ∈ G_{n,p} the following holds, with high probability: if G admits an optimal k-IRS, then k = Ω(n^(1 - 6/ln(np) - ln(np)/ln n)). The result is obtained by means of a novel matrix representation for the shortest paths in a network.
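
To make the notion of a k-IRS concrete, the following sketch (invented toy graph and labelling, not the paper's construction) computes, for one fixed labelling 0..n-1, how many cyclic label intervals each link needs when every destination is assigned to one shortest-path link.

from collections import deque

def bfs_dist(adj, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def intervals_needed(adj, n):
    """Assign each destination at each node to one shortest-path link, then count
    the cyclic label intervals (over labels 0..n-1) each link has to carry."""
    worst = 0
    for u in adj:
        du = bfs_dist(adj, u)
        dist_nb = {v: bfs_dist(adj, v) for v in adj[u]}
        per_link = {v: [] for v in adj[u]}
        for w in adj:
            if w == u:
                continue
            v = min(x for x in adj[u] if dist_nb[x][w] + 1 == du[w])
            per_link[v].append(w)
        for dests in per_link.values():
            dests.sort()
            runs = sum(1 for i in range(len(dests))
                       if (dests[i] - dests[i - 1]) % n != 1)
            worst = max(worst, runs)
    return worst

# A 6-cycle with a chord; with this labelling only a few intervals per link are needed.
adj = {0: [1, 3, 5], 1: [0, 2], 2: [1, 3], 3: [0, 2, 4], 4: [3, 5], 5: [0, 4]}
print("intervals per link needed for shortest-path routing:", intervals_needed(adj, 6))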

Journal ArticleDOI
TL;DR: This paper constructs a weakest precondition semantics from a relational semantics proposed by the Z standards panel and additionally establishes an isomorphism between weakest preconditions and relations.
Abstract: The lack of a method for developing programs from Z specifications is a widely recognized difficulty. In response to this problem, different approaches to the integration of Z with a refinement calculus have been proposed. These programming techniques are promising, but as far as we know, have not been formalized. Since they are based on refinement calculi formalized in terms of weakest preconditions, the definition of a weakest precondition semantics for Z is a significant contribution to the solution of this problem. In this paper, we actually construct a weakest precondition semantics from a relational semantics proposed by the Z standards panel. The construction provides reassurance as to the adequacy of the resulting semantics definition and additionally establishes an isomorphism between weakest preconditions and relations. Compositional formulations for the weakest precondition of some schema calculus expressions are provided.
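
The flavour of a weakest precondition semantics can be shown with a toy predicate-transformer sketch in which predicates are Python functions over a state dictionary; it illustrates wp for assignment, sequencing and conditionals only, and is unrelated to the Z schema calculus constructions of the paper.

def wp_assign(var, expr):
    # wp(var := expr, Q) = Q with var replaced by the value of expr
    return lambda Q: (lambda s: Q({**s, var: expr(s)}))

def wp_seq(wp1, wp2):
    # wp(S1; S2, Q) = wp(S1, wp(S2, Q))
    return lambda Q: wp1(wp2(Q))

def wp_if(cond, wp_then, wp_else):
    # wp(if b then S1 else S2, Q) holds in s iff the taken branch establishes Q
    return lambda Q: (lambda s: wp_then(Q)(s) if cond(s) else wp_else(Q)(s))

# Program: y := x; if y < 0 then y := -y else y := y   (i.e. y := |x|)
prog = wp_seq(wp_assign("y", lambda s: s["x"]),
              wp_if(lambda s: s["y"] < 0,
                    wp_assign("y", lambda s: -s["y"]),
                    wp_assign("y", lambda s: s["y"])))

def post(s):                      # postcondition Q: y >= 0
    return s["y"] >= 0

pre = prog(post)                  # weakest precondition of the program w.r.t. Q
print(pre({"x": -7}), pre({"x": 3}))   # True True: Q is established from any initial x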

Journal ArticleDOI
TL;DR: Implementing the DRM method within LSI++ not only provides downdating functionality, but is less time consuming than recomputing the SVD when removing a term, document or both.
Abstract: Due to the growth of large data collections, information retrieval or database searching is of vital importance. Lexical matching techniques may retrieve irrelevant or inaccurate results because of synonyms and polysemous words, so effective concept-based techniques are needed. One such technique is latent semantic indexing (LSI) which uses a vector-space approach by identifying documents whose content is related to the user's query in order of similarity. LSI uses the singular value decomposition (SVD) of term-by-document matrix to encode the terms and documents in a vector-space model. Existing methods for removing terms or documents from the term-document space are either time consuming or do not sufficiently change the term-document relationships. This paper presents a new method for downdating, downdating the reduced model (or DRM) method, and discusses its implementation into the LSI++ software environment. The DRM method can be used to assess the effect that a term or document has on the clustering of relevant information in a collection and for the incorporation of user feedback in the existing LSI model. Implementing the DRM method within LSI++ not only provides downdating functionality, but is less time consuming than recomputing the SVD when removing a term, document or both. The DRM method is a viable algorithm for dynamic information modeling and retrieval.
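
For readers unfamiliar with LSI itself, here is a small numpy sketch of the reduced model that the DRM method maintains (an invented toy term-by-document matrix, a rank-2 SVD and a cosine-ranked query); the downdating algorithm itself is not reproduced.

import numpy as np

terms = ["car", "engine", "road", "stock", "market", "price"]
docs = ["d1", "d2", "d3", "d4"]
# Term-by-document matrix (rows = terms, columns = documents).
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 2, 1],
              [0, 0, 1, 2],
              [0, 1, 1, 2]], dtype=float)

k = 2                                      # rank of the reduced LSI model
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T     # A ~ Uk @ diag(sk) @ Vk.T
doc_vecs = Vk * sk                         # documents in the reduced space

def query(words):
    q = np.array([1.0 if t in words else 0.0 for t in terms])
    qk = q @ Uk                            # project the query into the LSI space
    sims = doc_vecs @ qk / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(qk))
    return sorted(zip(docs, sims), key=lambda p: -p[1])

print(query({"car", "road"}))              # documents ranked by conceptual similarity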

Journal ArticleDOI
TL;DR: The conclusion is that simple links, whether embedded or separate, generic links, and some adaptive links all give hypertext systems the power of finite state automata.
Abstract: In this paper, we study how linking mechanisms contribute to the expressiveness of hypertext systems. For this purpose, we formalize hypertext systems as abstract machines. As the primary benefit of hypertext systems is to be able to read documents non-linearly, their expressiveness is defined in terms of the ability to follow links. Then, we classify hypertext systems according to the power of the underlying automaton. The model allows us to compare embedded versus separate links and simple versus generic links. Then, we investigate history mechanisms, adaptive hypertexts and functional links. Our conclusion is that simple links, whether embedded or separate, generic links and some adaptive links all give hypertext systems the power of finite state automata. The history mechanism confers to them the power of pushdown automata, whereas the general functional links give them Turing completeness.

Journal ArticleDOI
TL;DR: A definition of sequence similarity based on the shape of sequences is introduced to handle sequence matching with linear scaling in both amplitude and time dimensions and a fast sequence searching algorithm based on extendable hashing is proposed.
Abstract: In real life, data collected day by day often appear in sequences and this type of data is called sequence data. The technique of searching for similar patterns among sequence data is very important in many applications. We first point out that there are some deficiencies in the existing definitions of sequence similarity. We then introduce a definition of sequence similarity based on the shape of sequences. The definition is also extended to handle sequence matching with linear scaling in both amplitude and time dimensions. A fast sequence searching algorithm based on extendable hashing is also proposed. The algorithm can match linearly scaled sequences and guarantee that no qualified data subsequence is falsely rejected. Several experiments are performed on real data (stock price movement) and synthetic data to measure the performance of the algorithm in different aspects.
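
A naive sketch of shape-based matching under amplitude scaling only (the paper's time-dimension scaling and extendable-hashing index are not reproduced): each window is normalized to zero mean and unit norm so that only its shape matters, and the closest window to the query pattern is found by a linear scan; the data below are invented.

import numpy as np

def normalize(w):
    # Remove offset and amplitude so only the shape of the subsequence remains.
    w = np.asarray(w, dtype=float)
    w = w - w.mean()
    norm = np.linalg.norm(w)
    return w / norm if norm else w

def best_match(series, pattern):
    """Naive scan: return the start index of the window whose shape is closest
    to the query pattern (linear amplitude scaling handled by normalization)."""
    m = len(pattern)
    p = normalize(pattern)
    dists = [np.linalg.norm(normalize(series[i:i + m]) - p)
             for i in range(len(series) - m + 1)]
    return int(np.argmin(dists)), float(min(dists))

prices = [10, 11, 13, 12, 11, 20, 26, 32, 29, 27, 15, 14]
pattern = [1, 4, 7, 5.5, 4.5]              # "rise then partial fall", any amplitude
print(best_match(prices, pattern))          # matches the scaled bump starting at index 5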

Journal ArticleDOI
TL;DR: The proximity relations inherent in triangulations of geometric data can be exploited in the implementation of nearest-neighbour search procedures, relevant to applications such as terrain analysis, cartography and robotics, in which triangulation may be used to model the spatial data.
Abstract: The proximity relations inherent in triangulations of geometric data can be exploited in the implementation of nearest-neighbour search procedures. This is relevant to applications such as terrain analysis, cartography and robotics, in which triangulations may be used to model the spatial data. Here we describe neighbourhood search procedures within constrained Delaunay triangulations of the vertices of linear objects, for the queries of nearest object to an object and the nearest object to an arbitrary point. The procedures search locally from object edges, or from a query point, to build triangulated regions that extend from the source edge or point by a distance at least equal to that to its nearest neighbouring feature. Several geographical datasets have been used to evaluate the procedures experimentally. Average numbers of edge‐edge distance calculations to find the nearest line feature edge disjoint to another line feature edge ranged between 15 and 39 for the different datasets examined, while the average numbers of point‐edge distance calculations to determine the nearest edge to an arbitrary point ranged between 7 and 35.
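
A simplified, vertex-based analogue of such a neighbourhood search can be sketched with scipy's Delaunay triangulation: starting from an arbitrary vertex, repeatedly step to a triangulation neighbour that is closer to the query point; in a Delaunay triangulation a vertex with no closer neighbour is the nearest vertex overall. The constrained triangulations, linear objects and edge-to-edge distances of the paper are not handled here.

import numpy as np
from scipy.spatial import Delaunay

def nearest_vertex(tri, pts, q, start=0):
    """Walk the Delaunay triangulation from `start`, always moving to a
    neighbouring vertex closer to q, until no neighbour is closer."""
    indptr, indices = tri.vertex_neighbor_vertices
    cur = start
    while True:
        nbrs = indices[indptr[cur]:indptr[cur + 1]]
        best = min(nbrs, key=lambda v: np.linalg.norm(pts[v] - q))
        if np.linalg.norm(pts[best] - q) >= np.linalg.norm(pts[cur] - q):
            return cur
        cur = best

rng = np.random.default_rng(0)
pts = rng.random((200, 2))
tri = Delaunay(pts)
q = np.array([0.5, 0.5])
v = nearest_vertex(tri, pts, q)
brute = int(np.argmin(np.linalg.norm(pts - q, axis=1)))
print(v, brute, v == brute)   # the local walk agrees with brute-force search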

Journal ArticleDOI
TL;DR: A generalization of the test sequencing problem, originally defined for symmetrical tests, to cover asymmetrical tests is presented; the same heuristic employed in the traditional solution of the problem can also be employed for the generalized case.
Abstract: In this paper we present the generalization of the test sequencing problem, originally defined for symmetrical tests, that also covers asymmetrical tests. We prove that the same heuristic that has been employed in the traditional solution of the problem (e.g. the AO* algorithm with a heuristic based on Huffman coding) can also be employed for the generalized case. Examples are given to illustrate the approach.

Journal ArticleDOI
TL;DR: In this article, the authors examine the nature and significance of various potential attacks, and survey the defence options available, concluding that IT owners need to think of the threat in more global terms, and to give a new focus and priority to their defence.
Abstract: Large-scale commercial, industrial and financial operations are becoming ever more interdependent, and ever more dependent on IT. At the same time, the rapidly growing interconnectivity of IT systems, and the convergence of their technology towards industry-standard hardware and software components and sub-systems, renders these IT systems increasingly vulnerable to malicious attack. This paper is aimed particularly at readers concerned with major systems employed in medium to large commercial or industrial enterprises. It examines the nature and significance of the various potential attacks, and surveys the defence options available. It concludes that IT owners need to think of the threat in more global terms, and to give a new focus and priority to their defence. Prompt action can ensure a major improvement in IT resilience at a modest marginal cost, both in terms of finance and in terms of normal IT operation.

Journal ArticleDOI
TL;DR: In this paper an algorithmic transformation from a trace-based specification of a concurrent system to a Petri Net model is described, and causal dependencies between behaviours of the system components are introduced in the net model through the definition of external assumptions.
Abstract: CSP and Petri Nets are powerful formalisms for the specification and the analysis of concurrent systems. We present an approach to their integration to take advantage of both formalisms. In particular the GSPN class is used to address dependability and real-time aspects. In this paper an algorithmic transformation from a trace-based specification of a concurrent system to a Petri Net model is described. Causal dependencies between behaviours of the system components are introduced in the net model through the definition of external assumptions. The steps of the integration are illustrated by applying them to an unmanned transportation problem.

Journal ArticleDOI
Boris Mirkin
TL;DR: Approximation structuring clustering appears to be not only a mathematical device to support, specify and extend many clustering techniques, but also a framework for mathematical analysis of interrelations among the techniques and their relations to other concepts and problems in data analysis, statistics, machine learning, data compression and decompression and the design and use of multiresolution hierarchies.
Abstract: Approximation structuring clustering is an extension of what is usually called 'square-error clustering' onto various cluster structures and data formats. It appears to be not only a mathematical device to support, specify and extend many clustering techniques, but also a framework for mathematical analysis of interrelations among the techniques and their relations to other concepts and problems in data analysis, statistics, machine learning, data compression and decompression and the design and use of multiresolution hierarchies. Based on the results found, a number of methods for solving data processing problems are described.

Journal ArticleDOI
TL;DR: In this article, the authors study decentralized probabilistic job dispatching and load balancing strategies which optimize the performance of heterogeneous M/G/1 computer systems, and derive closed form solutions for optimal dispatching probabilities which minimize the average job response time when all nodes have an identical coefficient of variation for job execution times.
Abstract: In this paper, we study decentralized probabilistic job dispatching and load balancing strategies which optimize the performance of heterogeneous multiple computer systems. We present a model to study a heterogeneous multiple computer system with a decentralized stochastic job dispatching mechanism, where nodes are treated as M/G/1 servers. We discuss a way to implement a virtual centralized job dispatcher using a distributed control mechanism. We derive closed form solutions for optimal job dispatching probabilities which minimize the average job response time, when all nodes have an identical coefficient of variation for job execution times. We also generalize the results to the case where nodes have different coefficients of variation for job execution times.
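
The optimization can be illustrated numerically (rather than via the paper's closed form) using the Pollaczek-Khinchine formula for the M/G/1 mean waiting time and a constrained optimizer; the arrival rate, service times and coefficient of variation below are invented.

import numpy as np
from scipy.optimize import minimize

lam = 3.0                       # total job arrival rate
ES = np.array([0.2, 0.4, 0.5])  # mean service times of the three heterogeneous nodes
C2 = 1.0                        # squared coefficient of variation (same for all nodes)

def mean_response(p):
    lam_i = lam * p
    rho = lam_i * ES
    if np.any(rho >= 1):                     # unstable assignment: penalize heavily
        return 1e9
    wait = lam_i * (1 + C2) * ES**2 / (2 * (1 - rho))   # Pollaczek-Khinchine waiting time
    return float(np.sum(p * (ES + wait)))    # average response time over all jobs

p0 = np.ones(3) / 3
res = minimize(mean_response, p0,
               bounds=[(0, 1)] * 3,
               constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1}])
print("dispatching probabilities:", np.round(res.x, 3))
print("mean response time:", round(mean_response(res.x), 3))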

Journal ArticleDOI
TL;DR: The statement that the interval routing algorithm cannot be optimal in networks with arbitrary topology is correct, but the lower bound on the longest routing path that was derived is not; this paper gives the counterproof and the corrected bound.
Abstract: Interval routing is a space-efficient routing method for computer networks. The method is said to be optimal if it can generate optimal routing paths for any source-destination node pair. A path is optimal if it is a shortest path between the two nodes involved. A seminal result in the area, however, has pointed out that 'the interval routing algorithm cannot be optimal in networks with arbitrary topology'. The statement is correct but the lower bound on the longest routing path that was derived is not. We give the counterproof in this paper and the corrected bound.

Journal ArticleDOI
TL;DR: A review is given for the data analysis task of representing a symmetric proximity matrix by a sum of matrices each having the restrictive anti-Robinson (AR) form, with an emphasis on the inclusion of an optimal monotonic transformation of the given proximity matrix.
Abstract: A review is given for the data analysis task of representing a symmetric proximity matrix, defined for some object set, by a sum of matrices each having the restrictive anti-Robinson (AR) form. An emphasis is placed on the inclusion of an optimal monotonic transformation of the given proximity matrix and what each AR component of an additive decomposition might be depicting by imposing further restrictions to obtain approximating matrices that are strongly AR, or that provide unidimensional scales or ultrametrics. Three published data sets are used to illustrate the process of constructing the initial decomposition and then giving a substantive interpretation subsequently for each of the terms in the fitted sum. An extension to circular anti-Robinson (CAR) matrices is also discussed briefly and illustrated, along with further restrictions to circular unidimensional scales and circular strongly AR forms.

Journal ArticleDOI
TL;DR: The distributed asynchronous atomic action scheme for Ada 95 developed here makes use of many unique Ada 95 features including protected objects, asynchronous transfer of control and the distributed systems annex.
Abstract: This paper discusses the development of a distributed asynchronous atomic action scheme for Ada 95. The scheme makes use of many unique Ada 95 features including protected objects, asynchronous transfer of control and the distributed systems annex. We present the packages which implement the local and global action support and illustrate their use in a (partial) implementation of the FZI production cell problem. We also discuss a number of variations of the model and how these might be included. Finally, we discuss how the distribution model used in Ada 95 has influenced our design.

Journal ArticleDOI
TL;DR: In this paper, codes that detect a single substitution error or a single transposition error are studied and it is shown that such codes of length n over an alphabet of q characters have at most q^(n-1) codewords if q > 3 and at most [2^n/3] codewords if q = 2.
Abstract: Substitution errors, where individual characters are altered, and transposition errors, where two consecutive characters are interchanged, are commonly caused by human operators. In this paper, codes that detect a single substitution error or a single transposition error are studied. In particular, it is shown that such codes of length n over an alphabet of q characters have at most q^(n-1) codewords if q > 3 and at most [2^n/3] codewords if q = 2. Codes which have that many codewords are called optimal codes. We present optimal codes for all values of n and q. Simple encoding techniques for these codes are also described.
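
The detection property is easy to verify by brute force for small parameters. The sketch below checks a standard weighted-checksum code over q = 5, n = 3 (codewords whose checksum a + 2b + 3c is 0 mod 5), which attains q^(n-1) = 25 codewords; this illustrates the property being studied, not the paper's constructions for general n and q.

from itertools import product

def corrupted(word, q):
    """All words obtained from `word` by one substitution or one adjacent transposition."""
    w = list(word)
    for i in range(len(w)):                       # single substitution errors
        for c in range(q):
            if c != w[i]:
                yield tuple(w[:i] + [c] + w[i+1:])
    for i in range(len(w) - 1):                   # single transposition errors
        if w[i] != w[i + 1]:
            yield tuple(w[:i] + [w[i+1], w[i]] + w[i+2:])

def detects_all(code, q):
    """True if no single substitution or transposition turns one codeword into another."""
    codeset = set(code)
    return all(c not in codeset for word in code for c in corrupted(word, q))

q, n = 5, 3
code = [w for w in product(range(q), repeat=n)
        if sum((i + 1) * x for i, x in enumerate(w)) % q == 0]
print(len(code), "codewords; detects all single errors:", detects_all(code, q))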

Journal ArticleDOI
TL;DR: Combining the fault-tolerant procedure and the optimal broadcasting algorithm, fault-tolerant broadcasting is achieved on the arrangement graph.
Abstract: This paper proposes a distributed fault-tolerant algorithm for one-to-all broadcasting in the one-port communication model on the arrangement graph. Exploiting the hierarchical properties of the arrangement graph to constitute different-sized broadcasting trees for different-sized subgraphs, we propose a distributed algorithm with optimal time complexity and without message redundancy for one-to-all broadcasting in the one-port communication model for the fault-free arrangement graph. According to the property that there is a family of k(n - k) node-disjoint paths between any two nodes, we develop a fast fault-tolerant procedure capable of sending a message from a node to its adjacent nodes on the (n, k)-arrangement graph with less than k(n - k) faulty edges. Combining the fault-tolerant procedure and the optimal broadcasting algorithm, fault-tolerant broadcasting is achieved on the arrangement graph. It is shown that a message can be broadcast to all the other (n!/(n - k)!) - 1 processors in O(k lg n) steps if no faults exist on the (n, k)-arrangement graph, and in O(k^2 lg n + k lg^2 n) steps if the number of faulty edges is less than k(n - k).
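
To make the structure concrete, the following sketch builds a small (n, k)-arrangement graph with itertools and measures how many rounds a fault-free broadcast needs when every informed node informs all of its neighbours in one round. This is an all-port simplification: the paper's one-port model and its fault-tolerant procedure are more restrictive and are not reproduced here.

from itertools import permutations
from collections import deque

def arrangement_graph(n, k):
    """(n, k)-arrangement graph: vertices are arrangements of k symbols out of n;
    two vertices are adjacent iff they differ in exactly one position."""
    verts = list(permutations(range(n), k))
    adj = {v: [] for v in verts}
    for v in verts:
        for i in range(k):
            for s in range(n):
                if s not in v:
                    adj[v].append(v[:i] + (s,) + v[i+1:])
    return adj

def broadcast_levels(adj, src):
    """BFS levels: rounds needed if every informed node informs all neighbours per round."""
    level = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in level:
                level[w] = level[u] + 1
                q.append(w)
    return max(level.values())

n, k = 5, 2
adj = arrangement_graph(n, k)
src = next(iter(adj))
print(len(adj), "vertices; all-port broadcast reaches every node in",
      broadcast_levels(adj, src), "rounds")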