
Showing papers by "Andrei Z. Broder" published in 1996


Patent
18 Jun 1996
TL;DR: This patent proposes a method for facilitating the comparison of two computerized documents. Both documents are loaded into random access memory (RAM), each document is reduced to a sequence of tokens, and each token sequence is converted to a (multi)set of shingles, from which fixed-size sketches are computed and compared.
Abstract: A method for facilitating the comparison of two computerized documents. The method includes loading a first document into a random access memory (RAM), loading a second document into the RAM, reducing the first document into a first sequence of tokens, reducing the second document into a second sequence of tokens, converting the first set of tokens to a first (multi)set of shingles, converting the second set of tokens to a second (multi)set of shingles, determining a first sketch of the first (multi)set of shingles, determining a second sketch of the second (multi)set of shingles, and comparing the first sketch and the second sketch. The sketches have a fixed size, independent of the size of the documents. The resemblance of two documents is provided using a sketch of each document. The sketches may be computed fairly fast and given two sketches the resemblance of the corresponding documents can be computed in linear time in the size of the sketches.

96 citations
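The pipeline in this abstract (tokens, then shingles, then a fixed-size sketch, then a linear-time comparison) can be illustrated with a small sketch. The abstract does not say how the sketch is built, so this example assumes min-hashing: for each of `k` seeded hash functions, keep the minimum hash over all shingles, and estimate resemblance as the fraction of positions where two sketches agree. The tokenizer, the shingle width `w=4`, `k=64`, and all function names are illustrative assumptions, not the patented method.

```python
import hashlib
import re

def tokens(text):
    # Assumed tokenizer: lowercase runs of letters/digits; the abstract does not specify one.
    return re.findall(r"[a-z0-9]+", text.lower())

def shingles(toks, w=4):
    # All contiguous w-token windows, kept as a set (the patent also allows multisets).
    return {tuple(toks[i:i + w]) for i in range(len(toks) - w + 1)}

def _hash(seed, shingle):
    # Seeded 64-bit hash standing in for a family of independent hash functions.
    data = f"{seed}|{' '.join(shingle)}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def sketch(shingle_set, k=64):
    # Fixed-size sketch: the minimum hash under each of k seeds, independent of document length.
    if not shingle_set:
        raise ValueError("document is shorter than one shingle")
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(k)]

def estimated_resemblance(s1, s2):
    # Fraction of agreeing minima; runs in time linear in the sketch size.
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

def exact_resemblance(a, b):
    # |A ∩ B| / |A ∪ B|, the quantity the sketch comparison estimates.
    return len(a & b) / len(a | b)
```

Under these assumptions each sketch position matches with probability equal to the exact resemblance, so the estimate's error shrinks roughly like 1/sqrt(k) as the sketch grows.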


Proceedings Article
28 Jan 1996
TL;DR: A randomized polynomial time algorithm that works for almost all graphs; more precisely, in the G_{n,m} or G_{n,p} models, the algorithm succeeds with high probability for all edge densities above the connectivity threshold.
Abstract: Given a graph G = (V, E) and a set of pairs of vertices in V, we are interested in finding for each pair (a_i, b_i) a path connecting a_i to b_i, such that the set of paths so found is vertex-disjoint. (The problem is NP-complete for general graphs as well as for planar graphs. It is in P if the number of pairs is fixed.) Our model is that the graph is chosen first, then an adversary chooses the pairs of endpoints, subject only to obvious feasibility constraints, namely, all pairs must be disjoint, no more than a constant fraction of the vertices could be required for the paths, and not "too many" neighbors of a vertex can be endpoints. We present a randomized polynomial time algorithm that works for almost all graphs; more precisely, in the G_{n,m} or G_{n,p} models, the algorithm succeeds with high probability for all edge densities above the connectivity threshold. The set of pairs that can be accommodated is optimal up to constant factors. Although the analysis is intricate, the algorithm itself is quite simple and suggests a practical heuristic. We include two applications of the main result, one in the context of circuit switching communication, the other in the context of topological embeddings of graphs.

30 citations
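To make the problem statement concrete, the following sketch implements a naive greedy baseline: route each pair (a_i, b_i) by breadth-first search while avoiding vertices already used by earlier paths and the endpoints of other pairs, then check vertex-disjointness. This is an assumed illustration of the problem only; it is not the randomized algorithm from the paper, and when it fails that does not mean no solution exists. The helper names (`grid`, `bfs_path`, `greedy_disjoint_paths`) are hypothetical.

```python
from collections import deque

def grid(n):
    # Hypothetical test graph: an n x n grid as an adjacency dict.
    adj = {(r, c): [] for r in range(n) for c in range(n)}
    for r, c in adj:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) in adj:
                adj[(r, c)].append((r + dr, c + dc))
    return adj

def bfs_path(adj, src, dst, blocked):
    # Shortest src-dst path that avoids every vertex in `blocked`, or None.
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and v not in blocked:
                prev[v] = u
                queue.append(v)
    return None

def greedy_disjoint_paths(adj, pairs):
    endpoints = {v for pair in pairs for v in pair}
    used, paths = set(), []
    for a, b in pairs:
        # Avoid earlier paths and the endpoints reserved for other pairs.
        path = bfs_path(adj, a, b, used | (endpoints - {a, b}))
        if path is None:
            return None  # the greedy baseline gave up
        used.update(path)
        paths.append(path)
    return paths

def is_vertex_disjoint(paths):
    seen = set()
    for p in paths:
        if seen & set(p):
            return False
        seen.update(p)
    return True
```

The same functions could be run on sampled G_{n,p} graphs, but the order in which pairs are routed can make a greedy method fail where a smarter algorithm succeeds.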


Proceedings Article
14 Oct 1996
TL;DR: This work proves a sufficient condition for the stability of dynamic packet routing algorithms and gives the first dynamic algorithm for routing on a butterfly with bounded buffers; the approach also applies to the recently introduced adversarial input model.
Abstract: We prove a sufficient condition for the stability of dynamic packet routing algorithms. Our approach reduces the problem of steady state analysis to the easier and better understood question of static routing. We show that certain high probability and worst case bounds on the quasistatic (finite past) performance of a routing algorithm imply bounds on the performance of the dynamic version of that algorithm. Our technique is particularly useful in analyzing routing on networks with bounded buffers where complicated dependencies make standard queuing techniques inapplicable. We present several applications of our approach. In all cases we start from a known static algorithm, and modify it to fit our framework. In particular we give the first dynamic algorithm for routing on a butterfly with bounded buffers. Both the injection rate for which the algorithm is stable, and the expected time a packet spends in the system are optimal up to constant factors. Our approach is also applicable to the recently introduced adversarial input model.

28 citations
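For readers unfamiliar with the butterfly network, the sketch below computes the unique path a packet takes from input row `src_row` to output row `dst_row` in a d-dimensional butterfly with 2^d rows, and counts how many paths of a permutation share each node. The bit-fixing order (bit i is fixed on the edge from level i to level i+1) is an assumed convention, and the sketch models only routes, not the bounded buffers, dynamic injection, or stability analysis that the paper is about.

```python
from collections import Counter

def butterfly_path(src_row, dst_row, d):
    # Unique path from node (0, src_row) to node (d, dst_row).
    # Assumed convention: the edge from level i to level i+1 may change bit i of the row.
    row = src_row
    path = [(0, row)]
    for i in range(d):
        bit = 1 << i
        row = (row & ~bit) | (dst_row & bit)  # copy bit i of the destination
        path.append((i + 1, row))
    return path

def max_node_load(perm, d):
    # Largest number of paths through any single node when input s sends to perm[s].
    load = Counter()
    for src, dst in enumerate(perm):
        load.update(butterfly_path(src, dst, d))
    return max(load.values())
```

Because every route is fixed, contention depends entirely on the traffic pattern, which is why buffer sizes and injection rates matter in the dynamic setting.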


Proceedings Article
01 Jul 1996
TL;DR: It is proved that on the two-dimensional torus network the one-bend packet routing algorithm is stable for an arrival rate that is within a constant factor of the hardware bandwidth.
Abstract: We study the performance of a simple one-bend packet routing algorithm on arrays with no buffering in the routing switches, under a stochastic model in which new packets are continuously generated at each node at random times and with random destinations. We prove that on the two-dimensional torus network our algorithm is stable for an arrival rate that is within a constant factor of the hardware bandwidth. Furthermore, we show that in the steady state the expected time a packet spends in the system is optimal (up to a constant factor). Sharper results (in terms of the constants) are obtained for the ring (the one-dimensional torus).

28 citations
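A one-bend route on an n x n torus travels along one dimension to the correct coordinate and then turns exactly once to finish along the other. The sketch below computes such a route, assuming row-first travel and the shorter direction around each ring; the abstract does not state which dimension comes first or how collisions are resolved without buffers, so those parts are not modeled.

```python
def ring_step(a, b, n):
    # +1 or -1: the shorter direction from a to b around a ring of size n (ties go +1).
    forward = (b - a) % n
    return 1 if forward <= n - forward else -1

def one_bend_route(src, dst, n):
    # Assumed order: first move within the row to the target column, then turn once
    # and move within that column to the target row.
    (r, c), (tr, tc) = src, dst
    path = [(r, c)]
    step = ring_step(c, tc, n)
    while c != tc:
        c = (c + step) % n
        path.append((r, c))
    step = ring_step(r, tr, n)
    while r != tr:
        r = (r + step) % n
        path.append((r, c))
    return path
```

For example, on a 5 x 5 torus the route from (0, 0) to (2, 4) wraps around to column 4 in one hop and then descends two rows, so its length is the sum of the two ring distances.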


Journal Article
TL;DR: This work examines several simple questions about how an imperfect source of randomness affects the long-term behavior of a random walk on a finite graph, derives tight bounds on the maximum long-term expected benefit over all controller strategies, and presents polynomial time algorithms for computing the optimal controller strategy.
Abstract: How much can an imperfect source of randomness affect an algorithm? We examine several simple questions of this type concerning the long-term behavior of a random walk on a finite graph. In our setup, at each step of the random walk a “controller” can, with a certain small probability, fix the next step, thus introducing a bias. We analyze the extent to which the bias can affect the limit behavior of the walk. The controller is assumed to associate a real, nonnegative, “benefit” with each state, and to strive to maximize the long-term expected benefit. We derive tight bounds on the maximum of this objective function over all controller's strategies, and present polynomial time algorithms for computing the optimal controller strategy.

23 citations
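The model in this abstract can be made concrete by computing the long-run expected benefit for one fixed controller strategy. The sketch assumes that at each step, with probability `eps`, the walk moves to the neighbor chosen by a deterministic `controller` map, and otherwise moves to a uniformly random neighbor; it finds the stationary distribution by power iteration, which assumes a connected, non-bipartite graph so that the iteration converges. It evaluates a given strategy only and does not search for the optimal one.

```python
def biased_walk_benefit(adj, benefit, controller, eps, iters=1000):
    # adj: vertex -> list of neighbours; benefit: vertex -> nonnegative real;
    # controller: vertex -> the neighbour the controller picks when it gets to act.
    vertices = list(adj)
    pi = {v: 1 / len(vertices) for v in vertices}
    for _ in range(iters):
        nxt = {v: 0.0 for v in vertices}
        for u, p in pi.items():
            share = (1 - eps) * p / len(adj[u])  # unbiased uniform step
            for v in adj[u]:
                nxt[v] += share
            nxt[controller[u]] += eps * p  # controller fixes the next step
        pi = nxt
    # Long-run expected benefit under the stationary distribution.
    return sum(pi[v] * benefit[v] for v in vertices)
```

On a triangle where only one vertex has benefit 1, the unbiased walk earns 1/3 in the long run, while a controller that always steers toward that vertex with eps = 0.5 raises it to 3/7.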


Proceedings Article
01 Jan 1996
TL;DR: The approach reduces the problem of steady state analysis to the easier and better understood question of static routing, and gives the first dynamic algorithms for routing on a butterfly or two-dimensional mesh with bounded buffers.
Abstract: We prove a sufficient condition for the stability of dynamic packet routing algorithms. Our approach reduces the problem of steady state analysis to the easier and better understood question of static routing. We show that certain high probability and worst case bounds on the quasistatic (finite past) performance of a routing algorithm imply bounds on the performance of the dynamic version of that algorithm. Our technique is particularly useful in analyzing routing on networks with bounded buffers where complicated dependencies make standard queuing techniques inapplicable. We present several applications of our approach. In all cases we start from a known static algorithm, and modify it to fit our framework. In particular we give the first dynamic algorithm for routing on a butterfly with bounded buffers. Both the injection rate for which the algorithm is stable, and the expected time a packet spends in the system are optimal up to constant factors. Our approach is also applicable to the recently introduced adversarial input model.

14 citations


Proceedings Article
31 Mar 1996
TL;DR: This work proposes a novel approach for compressing images of text documents by building up a simple derived font from patterns in the image, and presents results from a prototype implementation of this approach.
Abstract: We suggest a novel approach for compressing images of text documents based on building up a simple derived font from patterns in the image, and present the results of a prototype implementation based on our approach. Our prototype achieves better compression than most alternative systems, and the decompression time appears substantially shorter than other methods with the same compression rate. The method has other advantages, such as a straightforward extension to a lossy scheme that allows one to control the lossiness introduced in a well-defined manner. We believe our approach will be applicable in other domains as well.

13 citations
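One way to picture a "derived font" is to treat each connected group of black pixels as a glyph, store every distinct glyph bitmap once, and encode the page as a list of placements. The sketch below does exactly that for a binary image with 8-connectivity and exact matching; the component definition, the exact-match rule, and the function names are assumptions for illustration, and the paper's actual pattern matching and lossy extension are not reproduced.

```python
def components(img):
    # img: list of rows of 0/1. Returns one pixel list per 8-connected black component.
    h, w = len(img), len(img[0])
    seen, comps = set(), []
    for r in range(h):
        for c in range(w):
            if img[r][c] and (r, c) not in seen:
                seen.add((r, c))
                stack, comp = [(r, c)], []
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and (ny, nx) not in seen:
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                comps.append(comp)
    return comps

def encode_font(img):
    # font: glyph bitmap -> id; placements: (top, left, glyph id) per component.
    font, placements = {}, []
    for comp in components(img):
        top = min(y for y, _ in comp)
        left = min(x for _, x in comp)
        glyph = frozenset((y - top, x - left) for y, x in comp)
        gid = font.setdefault(glyph, len(font))  # reuse the id of an identical glyph
        placements.append((top, left, gid))
    return font, placements

def decode_font(font, placements, h, w):
    glyphs = {gid: g for g, gid in font.items()}
    img = [[0] * w for _ in range(h)]
    for top, left, gid in placements:
        for y, x in glyphs[gid]:
            img[top + y][left + x] = 1
    return img
```

Repeated characters in a text image map to the same glyph id, so the page is stored as a small font plus a short placement list, and decoding is a simple copy of bitmaps.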


Patent
04 Jan 1996
TL;DR: The image is compressed by grouping its pixels into a plurality of regularized groups of pixels, which are then encoded according to the frequency of groups having identical bit patterns.
Abstract: An image is stored in a memory of a computer as pixels, each pixel including a bit pattern that indicates a grey-scale level. The image is compressed by grouping its pixels into a plurality of regularized groups of pixels; for example, the image can be partitioned into groups of four-by-four adjacent pixels. Groups of pixels having identical bit patterns are identified, and the groups are encoded according to the frequency with which identical bit patterns occur.

7 citations
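The grouping-and-frequency idea in this abstract can be sketched as follows: split a grey-scale image into 4 x 4 tiles, count identical tiles, and replace each tile by its rank in a frequency-sorted table, so the most common pattern gets code 0. Ranking by frequency is an assumed interpretation of "encoded according to a frequency"; a practical coder would likely follow this with variable-length codes, which the abstract does not specify. Image dimensions are assumed to be multiples of the tile size.

```python
from collections import Counter

def tiles(img, b=4):
    # Split the image into b x b tiles in row-major order; each tile is a hashable tuple.
    h, w = len(img), len(img[0])
    if h % b or w % b:
        raise ValueError("image dimensions must be multiples of the tile size")
    return [tuple(tuple(img[r + i][c:c + b]) for i in range(b))
            for r in range(0, h, b) for c in range(0, w, b)]

def encode_tiles(img, b=4):
    ts = tiles(img, b)
    # Identical tiles share one table entry; more frequent patterns get smaller codes.
    table = [t for t, _ in Counter(ts).most_common()]
    index = {t: i for i, t in enumerate(table)}
    return table, [index[t] for t in ts]

def decode_tiles(table, codes, h, w, b=4):
    img = [[0] * w for _ in range(h)]
    per_row = w // b
    for k, code in enumerate(codes):
        r, c = divmod(k, per_row)
        for i, row in enumerate(table[code]):
            img[r * b + i][c * b:c * b + b] = list(row)
    return img
```

In document images with large uniform areas, a few tile patterns dominate, so most codes are small indices into a short table.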