Showing papers by "Jeffrey Scott Vitter published in 1992"


Proceedings ArticleDOI
01 Jul 1992
TL;DR: Efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem are presented.
Abstract: We present efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem. Without any constraint violation, the ε-approximation problem for many problems of this type is itself NP-hard. Our methods provide polynomial-time ε-approximations while attempting to minimize the packing constraint violation. Our methods lead to the first known approximation algorithms with provable performance guarantees for the s-median problem, the tree pruning problem, and the generalized assignment problem. These important problems have numerous applications to data compression, vector quantization, memory-based learning, computer graphics, image processing, clustering, regression, network location, scheduling, and communication. We provide evidence via reductions that our approximation algorithms are nearly optimal in terms of the packing constraint violation. We also discuss some recent applications of our techniques to scheduling problems.
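
To make the rounding idea concrete, here is a minimal, hypothetical sketch of randomized rounding for a generalized-assignment-style LP relaxation (a generic illustration, not the paper's actual algorithm; the function name `round_assignment` and the parameter layout are assumptions): each job's fractional assignment over machines is read as a probability distribution, and the job is placed on one machine drawn from it.

```python
import random

def round_assignment(frac, capacities, sizes):
    """Randomized rounding sketch for a generalized-assignment-style LP relaxation.

    frac[i][j]    -- fractional amount of job i assigned to machine j (rows sum to 1)
    capacities[j] -- packing constraint for machine j (may be violated after rounding)
    sizes[i][j]   -- size of job i if placed on machine j
    Returns the integral assignment, per-machine loads, and the worst constraint violation.
    """
    assignment = []
    load = [0.0] * len(capacities)
    for i, row in enumerate(frac):
        # Draw one machine for job i according to its fractional solution.
        j = random.choices(range(len(row)), weights=row, k=1)[0]
        assignment.append(j)
        load[j] += sizes[i][j]
    violation = max(load[j] - capacities[j] for j in range(len(capacities)))
    return assignment, load, max(violation, 0.0)

# Tiny example: 3 jobs, 2 machines, a fractional LP solution.
frac = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
sizes = [[2, 3], [4, 2], [3, 1]]
capacities = [5, 5]
print(round_assignment(frac, capacities, sizes))
```

With independent rounding of this kind, each machine's expected load equals its fractional load; bounding (and, via derandomization, guaranteeing) how far the rounded loads can exceed the packing constraints is the part the paper's analysis supplies.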

227 citations


01 Jun 1992
TL;DR: Efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem are presented.
Abstract: We present efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem. Without any constraint violation, the epsilon-approximation problem for many problems of this type is itself NP-hard. Our methods provide polynomial-time epsilon-approximations while attempting to minimize the packing constraint violation. Our methods lead to the first known approximation algorithms with provable performance guarantees for the s-median problem, the tree pruning problem, and the generalized assignment problem. These important problems have numerous applications to data compression, vector quantization, memory-based learning, computer graphics, image processing, clustering, regression, network location, scheduling, protocol testing, and communication. We provide evidence via reductions that our approximation algorithms are nearly optimal in terms of the packing constraint violation. We also discuss some recent applications of our techniques to scheduling problems.

212 citations


Journal ArticleDOI
TL;DR: This paper presents approximation algorithms for median problems in metric spaces and fixed-dimensional Euclidean space that use a new method for transforming an optimal solution of the linear program relaxation of the s-median problem into a provably good integral solution.

191 citations


Journal ArticleDOI
TL;DR: The optimal sorting algorithm is randomized and is based upon the probabilistic partitioning technique developed in the companion paper for optimal disk sorting in a two-level memory with parallel block transfer.
Abstract: In this paper we introduce parallel versions of two hierarchical memory models and give optimal algorithms in these models for sorting, FFT, and matrix multiplication. In our parallel models, there are $P$ memory hierarchies operating simultaneously; communication among the hierarchies takes place at a base memory level. Our optimal sorting algorithm is randomized and is based upon the probabilistic partitioning technique developed in the companion paper for optimal disk sorting in a two-level memory with parallel block transfer. The probability of using $\ell$ times the optimal running time is exponentially small in $\ell (\log \ell) \log P$.
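
As a rough illustration of the probabilistic partitioning idea (a generic sample-based splitter selection, not the paper's exact procedure; the function names and the oversampling factor are assumptions), one can pick splitters from a random sample so that, with high probability, the resulting buckets are nearly balanced:

```python
import bisect
import random

def sample_splitters(keys, num_buckets, oversample=8):
    """Pick num_buckets-1 splitters from a random sample so that the buckets
    induced on `keys` are nearly balanced with high probability."""
    sample = random.sample(keys, min(len(keys), oversample * num_buckets))
    sample.sort()
    step = len(sample) // num_buckets
    return [sample[i * step] for i in range(1, num_buckets)]

def partition(keys, splitters):
    """Distribute keys into buckets delimited by the splitters."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for k in keys:
        buckets[bisect.bisect_right(splitters, k)].append(k)
    return buckets

keys = [random.randrange(10**6) for _ in range(10**4)]
buckets = partition(keys, sample_splitters(keys, 16))
print([len(b) for b in buckets])   # bucket sizes should be close to 10**4 / 16
```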

119 citations


Journal ArticleDOI
TL;DR: This article explains why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and characterizes the reduction in code length due to scaling for files exhibiting locality of reference.
Abstract: Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of weighted entropy and using it to characterize in an elegant way the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibiting locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.
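
Periodic scaling, as commonly implemented (a generic sketch, not the specific implementation analyzed in the article; the class name and threshold are assumptions), halves all symbol counts whenever their total reaches a threshold, which bounds the integer range the coder needs and weights recent symbols more heavily:

```python
import math

class AdaptiveModel:
    """Adaptive frequency model with periodic scaling (generic sketch).

    Counts are halved (never below 1) whenever the total exceeds max_total,
    which keeps coder arithmetic small and introduces a recency effect:
    recent symbols dominate the scaled counts.
    """
    def __init__(self, alphabet, max_total=2**14):
        self.counts = {s: 1 for s in alphabet}
        self.max_total = max_total

    def probability(self, symbol):
        return self.counts[symbol] / sum(self.counts.values())

    def update(self, symbol):
        self.counts[symbol] += 1
        if sum(self.counts.values()) > self.max_total:
            for s in self.counts:
                self.counts[s] = max(1, self.counts[s] // 2)

model = AdaptiveModel("ab", max_total=16)
bits = 0.0
for c in "aaaaabbbbbbbbbb":                         # distribution shifts from 'a' to 'b'
    bits += -math.log2(model.probability(c))        # ideal code length under the model
    model.update(c)
print(round(bits, 2))
```

The ideal code length accumulated above is what arithmetic coding approaches in practice; the article's weighted-entropy analysis quantifies how the halving step changes it for homogeneous versus locality-exhibiting files.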

106 citations


Journal ArticleDOI
TL;DR: This work presents two new methods (called MLP and PPPM) for lossless compression, both involving linear prediction, modeling prediction errors by estimating the variance of a Laplace distribution, and coding using arithmetic coding applied to precomputed distributions.
Abstract: We give a new paradigm for lossless image compression, with four modular components: pixel sequence, prediction, error modeling and coding. We present two new methods (called MLP and PPPM) for lossless compression, both involving linear prediction, modeling prediction errors by estimating the variance of a Laplace distribution, and coding using arithmetic coding applied to precomputed distributions. The MLP method is both progressive and parallelizable. We give results showing that our methods perform significantly better than other currently used methods for lossless compression of high resolution images, including the proposed JPEG standard. We express our results both in terms of the compression ratio and in terms of a useful new measure of compression efficiency, which we call compression gain.
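
The following is a much-simplified sketch of the prediction and error-modeling components described above (the actual predictors, contexts, and coding details of MLP and PPPM are more involved; the helper names `predict` and `laplace_scale` are hypothetical): predict each pixel from already-known neighbors, then fit a Laplace distribution to the prediction errors by estimating its scale from the mean absolute error.

```python
import math

def predict(img, r, c):
    """Toy linear predictor: average of the pixels above and to the left (0 off the border)."""
    above = img[r - 1][c] if r > 0 else 0
    left = img[r][c - 1] if c > 0 else 0
    return (above + left) / 2.0

def laplace_scale(errors):
    """Maximum-likelihood scale b of a zero-mean Laplace density p(e) = exp(-|e|/b)/(2b)."""
    return max(sum(abs(e) for e in errors) / len(errors), 1e-9)

img = [[10, 12, 13],
       [11, 14, 40],
       [12, 15, 42]]
errors = [img[r][c] - predict(img, r, c)
          for r in range(len(img)) for c in range(len(img[0]))]
b = laplace_scale(errors)

# Ideal code length (bits) of an error under the fitted density, ignoring the
# discretization that a real arithmetic coder must perform.
code_len = lambda e: -math.log2(math.exp(-abs(e) / b) / (2 * b))
print(round(b, 2), round(sum(code_len(e) for e in errors), 2))
```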

63 citations


Proceedings ArticleDOI
24 Mar 1992
TL;DR: An algorithm based on the hierarchical multi-level progressive (MLP) method is presented, used either with Huffman coding or with a new variant of arithmetic coding called quasi-arithmetic coding.
Abstract: The authors show that high-resolution images can be encoded and decoded efficiently in parallel. They present an algorithm based on the hierarchical multi-level progressive (MLP) method, used either with Huffman coding or with a new variant of arithmetic coding called quasi-arithmetic coding. The coding step can be parallelized, even though the codes for different pixels are of different lengths; parallelization of the prediction and error modeling components is straightforward.
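
One way to see why variable-length codes can still be emitted in parallel (a generic illustration of the idea, not the authors' quasi-arithmetic coder; the code strings below are made up) is that an exclusive prefix sum over the per-pixel code lengths tells every processor the bit offset at which to write its code:

```python
from itertools import accumulate

def exclusive_prefix_sum(lengths):
    """Bit offset of each code = sum of the lengths of all earlier codes."""
    return [0] + list(accumulate(lengths))[:-1]

# Hypothetical per-pixel codes produced independently (e.g. one per processor).
codes = ["10", "1110", "0", "110", "01"]
offsets = exclusive_prefix_sum([len(c) for c in codes])

# Each code could now be written concurrently; here we emulate the writes serially.
total_bits = offsets[-1] + len(codes[-1])
out = ["?"] * total_bits
for code, off in zip(codes, offsets):
    out[off:off + len(code)] = list(code)
print("".join(out), offsets)
```

A prefix sum can be computed in logarithmic parallel time with linear total work, which is why the unequal code lengths do not serialize the coding step.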

56 citations


Proceedings ArticleDOI
24 Mar 1992
TL;DR: A new method for error modeling applicable to the multi-level progressive (MLP) algorithm for hierarchical lossless image compression is presented, based on a concept called the variability index, which provides accurate models for pixel prediction errors without requiring explicit transmission of the models.
Abstract: The authors present a new method for error modeling applicable to the multi-level progressive (MLP) algorithm for hierarchical lossless image compression. This method, based on a concept called the variability index, provides accurate models for pixel prediction errors without requiring explicit transmission of the models. They also use the variability index to show that prediction errors do not always follow the Laplace distribution, as is commonly assumed; replacing the Laplace distribution with a more general distribution further improves compression. They describe a new compression measurement called compression gain, and give experimental results showing that using the variability index gives significantly better compression than other methods in the literature.
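
The details of the variability index are specific to the paper; the hypothetical sketch below only illustrates the general idea of context-based model selection that the abstract describes: measure how "active" the already-known neighborhood of a pixel is and use that measurement, which the decoder can reproduce on its own, to pick one of several precomputed error distributions, so no model parameters need to be transmitted. The functions, neighborhood, and thresholds are all assumptions.

```python
def local_activity(decoded, r, c):
    """Hypothetical activity measure: spread of the already-known neighboring pixels."""
    neighbors = [decoded[rr][cc]
                 for rr, cc in [(r - 1, c - 1), (r - 1, c), (r - 1, c + 1), (r, c - 1)]
                 if 0 <= rr < len(decoded) and 0 <= cc < len(decoded[0])]
    return (max(neighbors) - min(neighbors)) if neighbors else 0

def select_model(activity, thresholds=(2, 8, 32)):
    """Map the activity measure to one of a few precomputed error distributions.
    Encoder and decoder derive the same index, so no model is transmitted."""
    for i, t in enumerate(thresholds):
        if activity < t:
            return i
    return len(thresholds)

decoded = [[10, 11, 12, 13, 60],
           [10, 12, 13, 14, 70]]
print(select_model(local_activity(decoded, 1, 2)))   # smooth neighborhood -> index 1
print(select_model(local_activity(decoded, 1, 4)))   # near an edge        -> index 3
```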

31 citations


Journal ArticleDOI
TL;DR: This paper extends Valiant's sequential model of concept learning from examples and introduces models for the efficient learning of concept classes from examples in parallel, showing that several concept classes which are polynomial-time learnable are NC-learnable in constant time.
Abstract: In this paper, we extend Valiant's (Comm. ACM 27 (1984), 1134–1142) sequential model of concept learning from examples and introduce models for the efficient learning of concept classes from examples in parallel. We say that a concept class is NC-learnable if it can be learned in polylog time with a polynomial number of processors. We show that several concept classes which are polynomial-time learnable are NC-learnable in constant time. Some other classes can be shown to be NC-learnable in logarithmic time, but not in constant time. Our main result shows that other classes, such as s-fold unions of geometrical objects in Euclidean space, which are polynomial-time learnable by a greedy set cover technique, are NC-learnable using a nongreedy technique. We also show that (unless P ⊆ RNC) several polynomial-time learnable concept classes related to linear programming are not NC-learnable. Equivalence of various parallel learning models and issues of fault-tolerance are also discussed.

29 citations


Proceedings ArticleDOI
01 Jul 1992
TL;DR: The model is built upon the generalized PAC learning model of Haussler and is closely related to the method of vector quantization in data compression; using new clustering algorithms, memory-based learning systems can be built that PAC-learn in polynomial time using only polynomial storage in typical situations.
Abstract: A memory-based learning system is an extended memory management system that decomposes the input space either statically or dynamically into subregions for the purpose of storing and retrieving functional information. The main generalization techniques employed by memory-based learning systems are the nearest-neighbor search, space decomposition techniques, and clustering. Research on memory-based learning is still in its early stage. In particular, there are very few rigorous theoretical results regarding memory requirement, sample size, expected performance, and computational complexity. In this paper, we propose a model for memory-based learning and use it to analyze several methods (ε-covering, hashing, clustering, tree-structured clustering, and receptive fields) for learning smooth functions. The sample size and system complexity are derived for each method. Our model is built upon the generalized PAC learning model of Haussler and is closely related to the method of vector quantization in data compression. Our main result is that we can build memory-based learning systems using new clustering algorithms [LiVb] to PAC-learn in polynomial time using only polynomial storage in typical situations.
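
As a toy illustration of one of the listed methods (a hypothetical ε-covering by hashing inputs into grid cells, not the new clustering algorithms of the paper; the class and parameter names are assumptions), a memory-based learner can store one averaged output per occupied cell and answer queries from that table:

```python
from collections import defaultdict

class GridMemoryLearner:
    """Memory-based learner via a hypothetical epsilon-grid covering of the input space:
    each training input is hashed to the grid cell containing it, and the value stored
    for a cell is the running average of the outputs seen there."""
    def __init__(self, eps):
        self.eps = eps
        self.cells = defaultdict(lambda: (0.0, 0))   # cell -> (sum of y, count)

    def _cell(self, x):
        return tuple(int(xi // self.eps) for xi in x)

    def fit(self, xs, ys):
        for x, y in zip(xs, ys):
            s, n = self.cells[self._cell(x)]
            self.cells[self._cell(x)] = (s + y, n + 1)

    def predict(self, x):
        s, n = self.cells.get(self._cell(x), (0.0, 0))
        return s / n if n else None   # None: query fell in a cell with no stored data

learner = GridMemoryLearner(eps=0.25)
learner.fit([(0.1, 0.1), (0.15, 0.12), (0.8, 0.9)], [1.0, 1.2, 5.0])
print(learner.predict((0.12, 0.14)), learner.predict((0.81, 0.88)))
```

The storage here is one entry per occupied cell; the paper's model is what makes precise when schemes of this flavor need only polynomial storage and sample size.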

20 citations


Book ChapterDOI
13 Feb 1992
TL;DR: In this paper, a simple and efficient output-sensitive algorithm for constructing the display of a polyhedral terrain is presented, which runs in O((d + n) log² n) time, where d is the size of the final display.
Abstract: In this paper we give a simple and efficient output-sensitive algorithm for constructing the display of a polyhedral terrain. It runs in $O((d + n)\log^2 n)$ time, where $d$ is the size of the final display. The main data structure maintains an implicit representation of the convex hull of a set of points that can be dynamically updated in $O(\log^2 n)$ time. It is especially simple and fast in our application since there are no rebalancing operations required in the tree.

01 Aug 1992
TL;DR: An optimal deterministic algorithm for external sorting on multiple disks, called Balance Sort, is presented; it improves upon the randomized optimal algorithm of Vitter and Shriver as well as the (non-optimal) commonly used technique of disk striping.
Abstract: We present an optimal deterministic algorithm called Balance Sort for external sorting on multiple disks. Our measure of performance is the number of input/output (I/O) operations. In each I/O, each disk can simultaneously transfer a block of data. Our algorithm improves upon the randomized optimal algorithm of Vitter and Shriver as well as the (non-optimal) commonly used technique of disk striping. It also improves upon our earlier merge-based sorting algorithm in that it has smaller constants hidden in the big-oh notation, it can be implemented using only striped writes (but independent reads), and it has application to parallel memory hierarchies.
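
For intuition about why disk striping is suboptimal, here is a small worked comparison under the standard parallel-disk I/O model (a textbook-style calculation, not taken from this report; the parameter values are illustrative): the optimal sorting bound is Θ((N/(DB)) · log_{M/B}(N/B)) I/Os, while striping effectively coarsens the block size to DB, replacing the logarithm base M/B by the smaller base M/(DB).

```python
import math

def optimal_ios(N, M, B, D):
    """Optimal parallel-disk sorting bound: (N/(D*B)) * log_{M/B}(N/B) I/Os (up to rounding)."""
    return (N / (D * B)) * math.log(N / B, M / B)

def striping_ios(N, M, B, D):
    """Disk striping treats the D disks as one disk with block size D*B, so the
    number of passes is governed by a logarithm with the smaller base M/(D*B)."""
    return (N / (D * B)) * math.log(N / (D * B), M / (D * B))

N, M, B, D = 2**34, 2**20, 2**10, 2**9    # illustrative parameters with D*B <= M
print(round(optimal_ios(N, M, B, D)), round(striping_ios(N, M, B, D)))
```

With these values the striped approach performs several times more I/Os, roughly the factor log(M/B)/log(M/(DB)) by which striping is known to be nonoptimal.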

Proceedings ArticleDOI
24 Mar 1992
TL;DR: The authors present the first known polynomial-time full-search vector quantization codebook design algorithm and tree pruning algorithm with provable worst-case performance guarantees, and introduce the notion of pseudorandom pruned tree-structured vector quantizers.
Abstract: The authors present new vector quantization algorithms. The new approach is to formulate a vector quantization problem as a 0-1 integer linear program. They first solve its relaxed linear program by linear programming techniques. Then they transform the linear program solution into a provably good solution for the vector quantization problem. These methods lead to the first known polynomial-time full-search vector quantization codebook design algorithm and tree pruning algorithm with provable worst-case performance guarantees. They also introduce the notion of pseudorandom pruned tree-structured vector quantizers. Initial experimental results on image compression are very encouraging.
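
To fix terminology (a generic sketch, not the authors' LP-based design procedure; the data and function name are made up): given a codebook, "full-search" vector quantization encodes each input vector by exhaustively finding its nearest codeword, and codebook design asks which codewords to keep, which is what the 0-1 integer program models.

```python
def full_search_encode(vectors, codebook):
    """Full-search VQ: map each vector to the index of its nearest codeword."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda j: dist2(v, codebook[j]))
            for v in vectors]

# In the 0-1 program, x[i][j] = 1 iff training vector i is assigned to codeword j,
# each i is assigned exactly once, and at most s codewords may be kept; the LP
# relaxation allows fractional x[i][j], which is then rounded to an integral design.
vectors = [(0.0, 0.1), (0.9, 1.0), (0.2, 0.0), (1.1, 0.8)]
codebook = [(0.1, 0.05), (1.0, 0.9)]
print(full_search_encode(vectors, codebook))   # -> [0, 1, 0, 1]
```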

01 Aug 1992
TL;DR: This technical report shows how to adapt Balance Sort to sort deterministically in parallel memory hierarchies; the algorithms so derived are optimal for all parallel memory hierarchies for which an optimal algorithm is known for a single hierarchy.
Abstract: We present a general deterministic sorting strategy that is applicable to a wide variety of parallel memory hierarchies with parallel processors. The simplest incarnation of the strategy is an optimal deterministic algorithm called Balance Sort for external sorting on multiple disks with a single CPU. Balance Sort was the topic of a previous technical report. This technical report shows how to adapt Balance Sort to sort deterministically in parallel memory hierarchies. The algorithms so derived will be optimal for all parallel memory hierarchies for which an optimal algorithm is known for a single hierarchy. In the case of $D$ disks, $P$ processors, block size $B$, and internal memory size $M$, they are optimal in terms of I/Os for any $P \leq M$ and $DB \leq M \log \min\{M/B, \log M\}/\log M$ and $\log M/B = o(\log M)$.

Journal ArticleDOI
TL;DR: A new hidden-line elimination technique for displaying the perspective view of a scene of three-dimensional isothetic parallelepipeds (3D-rectangles), together with an efficient alternative to dynamic fractional cascading for use with augmented segment and range trees when the universe is fixed beforehand.
Abstract: We present a new hidden-line elimination technique for displaying the perspective view of a scene of three-dimensional isothetic parallelepipeds (3D-rectangles). We assume that the 3D-rectangles are totally ordered based upon the dominance relation of occlusion. The perspective view is generated incrementally, starting with the closest 3D-rectangle and proceeding away from the view point. Our algorithm is scene-sensitive and uses O((n + d) log n log log n) time, where n is the number of 3D-rectangles and d is the number of edges of the display. This improves over the heretofore best known technique. The primary data structure is an efficient alternative to dynamic fractional cascading for use with augmented segment and range trees when the universe is fixed beforehand. It supports queries in O((log n + k) log log n) time, where k is the size of the response, and insertions and deletions in O(log n log log n) time, all in the worst case.

01 Sep 1992
TL;DR: This paper considers the problem of using disk blocks efficiently in searching graphs that are too large to fit in internal memory, in a model that allows a vertex to be represented any number of times on the disk in order to take advantage of redundancy.
Abstract: In this paper we consider the problem of using disk blocks efficiently in searching graphs that are too large to fit in internal memory. Our model allows a vertex to be represented any number of times on the disk in order to take advantage of redundancy. We give matching upper and lower bounds for complete d-ary trees and d-dimensional grid graphs, as well as for classes of general graphs that, intuitively speaking, have a close-to-uniform number of neighbors around each vertex.
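
For a feel of the kind of blocking at issue, here is a simple non-redundant scheme for intuition (the paper's bounds also exploit storing vertices redundantly; the function name and example parameters are assumptions): if each disk block holds a complete d-ary subtree of as many levels h as fit in B vertices, then a root-to-leaf search of L levels touches about ⌈L/h⌉ blocks.

```python
import math

def blocks_touched_on_path(depth, d, B):
    """Blocks read by a root-to-leaf search of `depth` levels when each disk block
    stores a complete d-ary subtree of the largest number of levels h that fits in
    a block of B vertices (a subtree of h levels has (d**h - 1)//(d - 1) vertices)."""
    h = 1
    while (d ** (h + 1) - 1) // (d - 1) <= B:
        h += 1
    return math.ceil(depth / h), h

# Example: binary tree searched 30 levels deep, 1023 vertices per block
# -> each block holds a 10-level subtree, so about 3 blocks per search.
print(blocks_touched_on_path(30, 2, 1023))   # (3, 10)
```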

Proceedings ArticleDOI
01 Dec 1992
TL;DR: The authors give efficient parallel algorithms to compute shortest paths in planar layered digraphs, using one-way separators to obtain divide-and-conquer solutions to the shortest-path problem.
Abstract: The authors give efficient parallel algorithms to compute shortest paths in planar layered digraphs. They show that these digraphs admit special kinds of separators, called one-way separators, which allow paths in the graph to cross them only once. They use these separators to give divide-and-conquer solutions to the problem of finding the shortest paths. They first give a simple algorithm that works on the CREW (concurrent-read exclusive-write) PRAM (parallel random-access machine) model and computes the shortest path between any two vertices of an n-node planar layered digraph in time O(log³ n) using n/log n processors. A CRCW (concurrent-read concurrent-write) version of this algorithm runs in O(log² n log log n) time and uses O(n/log log n) processors. The authors then improve the time bound to O(log² n) on the CREW model and O(log n log log n) on the CRCW model. The processor bounds still remain n/log n for the CREW model and n/log log n for the CRCW model.
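
For context on the problem setting only: a planar layered digraph has its vertices arranged in layers with edges between consecutive layers, so sequentially a shortest path is a simple layer-by-layer relaxation, as in the baseline sketch below (an illustration of the input structure, not the paper's parallel separator-based algorithm; the function and data are made up).

```python
import math

def layered_shortest_path(layers, edges, source, target):
    """Layer-by-layer relaxation in a layered DAG.

    layers -- list of lists of vertex ids, layer 0 first
    edges  -- dict mapping (u, v) to edge weight, with u in layer i and v in layer i+1
    """
    dist = {v: math.inf for layer in layers for v in layer}
    dist[source] = 0.0
    for i in range(len(layers) - 1):
        for u in layers[i]:
            for v in layers[i + 1]:
                w = edges.get((u, v))
                if w is not None and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
    return dist[target]

layers = [["s"], ["a", "b"], ["t"]]
edges = {("s", "a"): 1.0, ("s", "b"): 4.0, ("a", "t"): 2.0, ("b", "t"): 1.0}
print(layered_shortest_path(layers, edges, "s", "t"))   # 3.0
```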

01 Jan 1992
TL;DR: The goal of the research under the Multiparadigm Design Environments project was to develop prototype environments to support the design of complex software and VLSI systems; the work also examined the expressive power and optimization of database/programming languages and the expression and exploitation of parallelism.
Abstract: The goal of the research under the Multiparadigm Design Environments project was to develop prototype environments to support the design of complex software and VLSI systems. Research on this project has produced the following results: (1) New methods for programming in terms of conceptual models; (2) Design of object-oriented languages; (3) Compiler optimization and analysis techniques for high-level languages, including object-oriented languages; (4) Design of an object-oriented database, including development of query languages and optimization methods; (5) Development of operating system support for parallel programming; (6) Algorithm development for I/O efficiency and incremental computation; (7) Determining the computational complexity of ML type inference; (8) A new architecture for programmable systolic arrays; and (9) New parallel algorithms for the graph partitioning problem and proof that key heuristics for it, including simulated annealing, are P-complete. We experimented with object-based methods for programming directly in terms of conceptual models, object-oriented language design, computer program optimization, and object-oriented database construction. We also examined the expressive power and optimization of database/programming languages, and the expression and exploitation of parallelism. The theoretical and experimental fruits of this research are being widely distributed and used, and we expect these results to have a strong influence on future design environments.