Journal ArticleDOI

Randomized Distributed Edge Coloring via an Extension of the Chernoff--Hoeffding Bounds

01 Apr 1997-SIAM Journal on Computing (Society for Industrial and Applied Mathematics)-Vol. 26, Iss: 2, pp 350-368
TL;DR: Fast and simple randomized algorithms are presented for edge coloring a graph in the synchronous distributed point-to-point model of computation, together with new techniques for proving upper bounds on the tail probabilities of certain random variables that are not stochastically independent.
Abstract: Certain types of routing, scheduling, and resource-allocation problems in a distributed setting can be modeled as edge-coloring problems. We present fast and simple randomized algorithms for edge coloring a graph in the synchronous distributed point-to-point model of computation. Our algorithms compute an edge coloring of a graph $G$ with $n$ nodes and maximum degree $\Delta$ with at most $16\Delta + O(\log^{1+\delta} n)$ colors with high probability (arbitrarily close to 1) for any fixed $\delta > 0$; they run in polylogarithmic time. The upper bound on the number of colors improves upon the $(2\Delta - 1)$-coloring achievable by a simple reduction to vertex coloring. To analyze the performance of our algorithms, we introduce new techniques for proving upper bounds on the tail probabilities of certain random variables. The Chernoff--Hoeffding bounds are fundamental tools that are used very frequently in estimating tail probabilities. However, they assume stochastic independence among certain random variables, which may not always hold. Our results extend the Chernoff--Hoeffding bounds to certain types of random variables which are not stochastically independent. We believe that these results are of independent interest and merit further study.
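
The paper's algorithms are more elaborate, but the basic synchronous round structure can be illustrated with a minimal sketch. The following Python simulation (a hypothetical illustration, not the authors' procedure) implements the simple propose-and-resolve scheme behind the $(2\Delta - 1)$-coloring baseline mentioned above: in each round, every uncolored edge proposes a color not used by any colored neighbor, and keeps it only if no adjacent edge proposed the same color.

```python
import random
from collections import defaultdict

def randomized_edge_coloring(edges, palette_size):
    """Simulate a synchronous propose-and-resolve edge coloring.

    Hypothetical sketch: with palette_size >= 2*Delta - 1, every edge
    always has an available color (it is adjacent to at most 2*Delta - 2
    others), so the process terminates with probability 1.
    """
    incident = defaultdict(set)          # vertex -> edges touching it
    for e in edges:
        for v in e:
            incident[v].add(e)

    color = {}                           # edge -> final color
    uncolored = set(edges)
    while uncolored:
        # Propose: each uncolored edge picks a color avoiding colored neighbors.
        proposal = {}
        for e in uncolored:
            used = {color[f] for v in e for f in incident[v] if f in color}
            avail = [c for c in range(palette_size) if c not in used]
            if avail:                    # guaranteed when palette >= 2*Delta - 1
                proposal[e] = random.choice(avail)
        # Resolve: keep a proposal only if no adjacent edge made the same one.
        for e, c in proposal.items():
            neighbors = {f for v in e for f in incident[v]} - {e}
            if all(proposal.get(f) != c for f in neighbors):
                color[e] = c
                uncolored.discard(e)
    return color

# Example: a 4-cycle with one chord (Delta = 3, so 5 colors always suffice).
print(randomized_edge_coloring([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], 5))
```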

Summary (2 min read)

Introduction

  • Gene shortening in small genomes has been observed in a number of obligate intracellular symbionts (parasitic and mutualistic) including diverse groups such as symbiotic bacteria in aphids (Charles et al. 1999), a microsporidium infecting humans (Vivares et al. 2002), and the vestigial nuclear genomes of the nucleomorphs of cryptomonads (Cavalier-Smith 2002).
  • It has been suggested that the reductions of gene lengths and genome size which are observed in all these organisms are a consequence of their intimate symbiotic life style (Wernegreen 2002).
  • If reductions in gene length and genome size are general features of obligate intracellular symbionts, the authors expect that mitochondria should have shorter genes than their putative ancestors and, further, that this reduction in gene length covaries with the reduction in genome size.
  • Second, most of the mitochondrial genes which are essential for organellar function were transferred to the nuclear genome.

Materials and Methods

  • Data for α-proteobacteria were collected from http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/eub_g.html.
  • For species whose mitochondrial genome has only partially been analyzed, annotated sequence information was obtained from the http://oberon.rug.ac.be:8080/rRNA/ Web site.
  • For the statistical analysis the authors used Spearman rank correlations, as this test makes no assumptions about the distribution of the data (a minimal sketch follows this list).
  • The method is based on the use of monophyletic groups (based on information from http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/) as independent events in the evolution of the traits analyzed.
  • For the analysis of all genes combined, the authors weighted all three gene groups (ssu/lsu rRNAs, coxI–III, and cytb) equally.
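
As a concrete illustration of the test named above, here is a minimal sketch using scipy.stats.spearmanr; the numbers are invented stand-ins, not the study's data.

```python
from scipy.stats import spearmanr

# Hypothetical values: mitochondrial genome sizes (kb) and combined
# gene lengths (bp) for five species; the real data come from the
# NCBI and rRNA databases cited above.
genome_size_kb = [6.0, 16.6, 19.5, 36.3, 69.0]
gene_length_bp = [4200, 5100, 5300, 5600, 5900]

rho, p = spearmanr(genome_size_kb, gene_length_bp)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```

Because Spearman's test operates on ranks rather than raw values, it requires no assumption about the shape of either variable's distribution, which is the distribution-free property referred to above.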

Covariation of Gene Length with Genome Size

  • The shortest and longest combined lengths of coxI–III, of cytb, and of the combined ssu/lsu rRNAs across species differ by factors of 1.59, 1.4, and 4.03, respectively.
  • Figures 2 and 3 reveal that in smaller genomes (6–20 kb) the slope of the gene length/genome size relationship is much steeper than in larger genomes (>20 kb).
  • The length of the combined rRNAs of mitochondria is often longer than that of the corresponding gene products in a-proteobacteria.
  • To test if the observed gene length/genome size relationship is only found across the entire data set, or whether it indicates coevolution within different taxa, the authors tested for correlations between the combined gene length within a set of monophyletic taxa, using the comparative method for correlated traits of Burt (1989).
  • The two taxa with negative covariances were the Stramenopiles and the Nematoda.

AT Content Does Not Explain the Gene Length/Genome Size Covariance

  • A potential explanation for the observed shrinkage of coxI–III and cytb could be related to the fact that the ATG start codon and all three translational stop codons (TAA/TAG/TGA) are rich in AT.
  • Organellar genomes, possibly due to the nature of the DNA damage to which they are exposed, have a tendency to become AT rich (Howe et al. 2000).
  • Should the observed shortening of coxI–III and cytb be caused mainly by a high AT content, the authors would predict the AT content of mitochondrial genomes to be negatively correlated with gene lengths (a minimal sketch of the AT-content computation follows this list).
  • As rRNAs are not translated, a higher density of start and stop codons cannot explain the observed gene length/genome size relationship.
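
The quantity being correlated is straightforward to compute; a minimal sketch, where the sequence shown is made up for illustration:

```python
def at_content(seq: str) -> float:
    """Fraction of A and T nucleotides in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

# Illustrative fragment only; real input would be a full mitochondrial genome.
print(at_content("ATGAAATTTGCCTAA"))  # -> 0.733...
```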

Discussion

  • While it is clear that reduction of mitochondrial coding capacity during evolution is mainly caused by loss and/or transfer of genes to the nucleus, their results show that shortening of the remaining mitochondrial genes contributes to the process.
  • Abbreviations (table legend): coxI–III, combined length of coxI–III; ssu/lsu rRNAs, combined length of ssu/lsu rRNAs; all genes, length of all genes combined after standardizing their means and variances; ns, nonsignificant.
  • As a consequence, the covariance between genome size and gene length is expected to be strongest in the smallest genomes, which is in agreement with the data (Table 1, Fig. 1).
  • Mitochondrial genes are on average shorter than those of their bacterial homologs (Fig. 1).
  • The authors' data seem to indicate that the length of the protein-coding regions (coxI–III and cytb) did not expand, while the rRNA genes expanded and became even longer than their bacterial counterparts.


Citations
Book
19 Oct 2009
TL;DR: In this book, the authors present a coherent and unified treatment of probabilistic techniques for obtaining high-probability estimates on the performance of randomized algorithms, ranging from the basic toolkit of the Chernoff-Hoeffding (CH) bounds to more sophisticated techniques like martingales and isoperimetric inequalities, as well as recent developments like Talagrand's inequality, transportation cost inequalities, and log-Sobolev inequalities.
Abstract: Randomized algorithms have become a central part of the algorithms curriculum based on their increasingly widespread use in modern applications. This book presents a coherent and unified treatment of probabilistic techniques for obtaining high-probability estimates on the performance of randomized algorithms. It covers the basic tool kit from the Chernoff-Hoeffding (CH) bounds to more sophisticated techniques like martingales and isoperimetric inequalities, as well as some recent developments like Talagrand's inequality, transportation cost inequalities, and log-Sobolev inequalities. Along the way, variations on the basic theme are examined, such as CH bounds in dependent settings. The authors emphasize comparative study of the different methods, highlighting respective strengths and weaknesses in concrete example applications. The exposition is tailored to discrete settings sufficient for the analysis of algorithms, avoiding unnecessary measure-theoretic details, thus making the book accessible to computer scientists as well as probabilists and discrete mathematicians.

1,028 citations

Journal ArticleDOI
TL;DR: An $O(N^{k/(k+1)})$ query quantum algorithm is given for the generalization of element distinctness in which the authors have to find $k$ equal items among $N$ items.
Abstract: We use quantum walks to construct a new quantum algorithm for element distinctness and its generalization. For element distinctness (the problem of finding two equal items among $N$ given items), we get an $O(N^{2/3})$ query quantum algorithm. This improves the previous $O(N^{3/4})$ quantum algorithm of Buhrman et al. [SIAM J. Comput., 34 (2005), pp. 1324-1330] and matches the lower bound of Aaronson and Shi [J. ACM, 51 (2004), pp. 595-605]. We also give an $O(N^{k/(k+1)})$ query quantum algorithm for the generalization of element distinctness in which we have to find $k$ equal items among $N$ items.
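
The exponent $2/3$ can be read off from the standard cost balance for a quantum walk on the Johnson graph of $r$-subsets (a back-of-the-envelope sketch of the usual accounting, not taken from the paper): setup costs about $r$ queries, the fraction of marked vertices is $\varepsilon \approx (r/N)^2$, and the spectral gap is $\delta \approx 1/r$, giving

```latex
T(r) \;\approx\; \underbrace{r}_{\text{setup}}
  \;+\; \underbrace{\tfrac{N}{r}}_{1/\sqrt{\varepsilon}}
  \cdot \underbrace{\sqrt{r}}_{1/\sqrt{\delta}}
  \;=\; r + \frac{N}{\sqrt{r}},
\qquad
\text{minimized at } r = \Theta\bigl(N^{2/3}\bigr),
\quad T = O\bigl(N^{2/3}\bigr).
```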

593 citations

Posted Content
TL;DR: In this article, the authors used quantum walks to construct a new $O(N^{2/3})$ query quantum algorithm for element distinctness; for the generalization in which $k$ equal items must be found among $N$ items, they give an $O(N^{k/(k+1)})$ query algorithm.
Abstract: We use quantum walks to construct a new quantum algorithm for element distinctness and its generalization. For element distinctness (the problem of finding two equal items among N given items), we get an O(N^{2/3}) query quantum algorithm. This improves the previous O(N^{3/4}) query quantum algorithm of Buhrman et al. (quant-ph/0007016) and matches the lower bound by Shi (quant-ph/0112086). The algorithm also solves the generalization of element distinctness in which we have to find k equal items among N items. For this problem, we get an O(N^{k/(k+1)}) query quantum algorithm.

524 citations

Journal ArticleDOI
TL;DR: In this article, the choice of Gaussian-process prior determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS); when the prior is fixed, expected improvement is known to converge on the minimum of any function in that RKHS.
Abstract: In the efficient global optimization problem, we minimize an unknown function f, using as few observations f(x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.
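
For reference, under a Gaussian-process posterior with mean $\mu_n(x)$ and standard deviation $s_n(x)$, and with $f_n^*$ the best value observed so far, the expected-improvement acquisition has the standard closed form (textbook material in this literature, not specific to this paper):

```latex
\mathrm{EI}_n(x)
  \;=\; \mathbb{E}\bigl[\max\{f_n^* - f(x),\, 0\}\bigr]
  \;=\; \bigl(f_n^* - \mu_n(x)\bigr)\,\Phi(z) \;+\; s_n(x)\,\phi(z),
\qquad
z \;=\; \frac{f_n^* - \mu_n(x)}{s_n(x)},
```

where $\Phi$ and $\phi$ are the standard normal distribution and density functions; the next observation is taken where $\mathrm{EI}_n$ is largest.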

413 citations

Posted Content
TL;DR: In this paper, the authors consider the problem of minimizing an unknown function f using as few evaluations f(x) as possible, with a Gaussian-process prior that determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS).
Abstract: Efficient global optimization is the problem of minimizing an unknown function f, using as few evaluations f(x) as possible. It can be considered as a continuum-armed bandit problem, with noiseless data and simple regret. Expected improvement is perhaps the most popular method for solving this problem; the algorithm performs well in experiments, but little is known about its theoretical properties. Implementing expected improvement requires a choice of Gaussian process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in the RKHS. We begin by providing convergence rates for this procedure. The rates are optimal for functions of low smoothness, and we modify the algorithm to attain optimal rates for smoother functions. For practitioners, however, these results are somewhat misleading. Priors are typically not held fixed, but depend on parameters estimated from the data. For standard estimators, we show this procedure may never discover the minimum of f. We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior.

314 citations

References
Book
01 Jan 1969

16,023 citations

Book ChapterDOI
TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; analogous inequalities are then obtained for certain sums of dependent random variables such as U statistics.
Abstract: Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr {S – ES ≥ nt} depend only on the endpoints of the ranges of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U statistics and the sum of a random sample without replacement from a finite population.
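
In its most frequently quoted form (a standard statement consistent with the abstract above): if $X_1, \dots, X_n$ are independent with $a_i \le X_i \le b_i$ and $S = \sum_i X_i$, then for $t > 0$

```latex
\Pr\{S - \mathbb{E}S \ge nt\}
  \;\le\; \exp\!\left( - \frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right).
```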

8,655 citations

Book
01 Jan 1991
TL;DR: A particular set of problems - all dealing with “good” colorings of an underlying set of points relative to a given family of sets - is explored.
Abstract: The use of randomness is now an accepted tool in Theoretical Computer Science but not everyone is aware of the underpinnings of this methodology in Combinatorics - particularly, in what is now called the Probabilistic Method as developed primarily by Paul Erdős over the past half century. Here I will explore a particular set of problems - all dealing with “good” colorings of an underlying set of points relative to a given family of sets. A central point will be the evolution of these problems from the purely existential proofs of Erdős to the algorithmic aspects of much interest to this audience.

6,594 citations

Journal ArticleDOI
TL;DR: In this paper, it is shown that the likelihood ratio test for fixed sample size can be reduced to a test based on a sum of independent observations, and that for large samples a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $en$ with the second test.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
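
In modern notation, the index described above is the optimized moment-generating-function bound (a standard restatement): for $S_n = \sum_{j=1}^n X_j$ with the $X_j$ independent copies of a chance variable $X$ and $a < \mathbb{E}X$,

```latex
\Pr\{S_n \le na\} \;\le\; m(a)^n,
\qquad
m(a) \;=\; \inf_{t \le 0}\, \mathbb{E}\bigl[e^{t(X - a)}\bigr].
```

For $a < \mathbb{E}X$ this infimum coincides with the global minimum of the moment generating function of $X - a$ mentioned in the abstract, and $e = \log \rho_1 / \log \rho_2$ compares the resulting indices of two tests.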

3,760 citations

Journal ArticleDOI
TL;DR: This model focuses on the issue of locality in distributed processing, namely, to what extent a global solution to a computational problem can be obtained from locally available data.
Abstract: This paper concerns a number of algorithmic problems on graphs and how they may be solved in a distributed fashion. The computational model is such that each node of the graph is occupied by a processor which has its own ID. Processors are restricted to collecting data from others which are at a distance at most t away from them in t time units, but are otherwise computationally unbounded. This model focuses on the issue of locality in distributed processing, namely, to what extent a global solution to a computational problem can be obtained from locally available data. Three results are proved within this model:

  • A 3-coloring of an n-cycle requires time $\Omega(\log^* n)$. This bound is tight, by previous work of Cole and Vishkin.
  • Any algorithm for coloring the d-regular tree of radius r which runs for time at most $2r/3$ requires at least $\Omega(\sqrt{d})$ colors.
  • In an n-vertex graph of largest degree $\Delta$, an $O(\Delta^2)$-coloring may be found in time $O(\log^* n)$.

1,020 citations