
Showing papers on "Greedy algorithm published in 2008"


Journal ArticleDOI
TL;DR: The conjectured hardness of maximizing modularity both in the general case and with the restriction to cuts is proved and an Integer Linear Programming formulation is given.
Abstract: Modularity is a recently introduced quality measure for graph clusterings. It has immediately received considerable attention in several disciplines, particularly in the complex systems literature, although its properties are not well understood. We study the problem of finding clusterings with maximum modularity, thus providing theoretical foundations for past and present work based on this measure. More precisely, we prove the conjectured hardness of maximizing modularity both in the general case and with the restriction to cuts and give an Integer Linear Programming formulation. This is complemented by first insights into the behavior and performance of the commonly applied greedy agglomerative approach.

1,201 citations
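
For intuition, a minimal Python sketch of a CNM-style greedy agglomerative heuristic of the kind analyzed above is given below: start with every vertex in its own cluster and repeatedly merge the pair of clusters with the largest modularity gain, stopping when no merge increases modularity. The dense edge-fraction matrix, function name and stopping rule are illustrative assumptions, not the authors' implementation.

```python
import itertools

def greedy_modularity(adj):
    """Greedy agglomerative modularity clustering (illustrative sketch).

    adj: symmetric 0/1 adjacency matrix given as a list of lists.
    """
    n = len(adj)
    m2 = float(sum(sum(row) for row in adj))        # 2m: every edge counted twice
    e = [[adj[i][j] / m2 for j in range(n)] for i in range(n)]  # edge fractions
    a = [sum(row) for row in e]                     # fraction of edge ends per cluster
    clusters = {i: {i} for i in range(n)}
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for i, j in itertools.combinations(sorted(clusters), 2):
            gain = 2.0 * (e[i][j] - a[i] * a[j])    # modularity gain of merging i and j
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:                       # no merge improves modularity
            break
        i, j = best_pair
        for k in clusters:                          # fold cluster j's rows/columns into i
            if k not in (i, j):
                e[i][k] += e[j][k]
                e[k][i] += e[k][j]
        e[i][i] += e[j][j] + e[i][j] + e[j][i]
        a[i] += a[j]
        clusters[i] |= clusters.pop(j)
    return list(clusters.values())
```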


Journal ArticleDOI
TL;DR: In this article, the authors test two exceptionally fast algorithms for estimating regression coefficients with a lasso penalty: a previously known l2 algorithm based on cyclic coordinate descent, and a new l1 algorithm based on greedy coordinate descent and Edgeworth's algorithm for ordinary l1 regression.
Abstract: Imposition of a lasso penalty shrinks parameter estimates toward zero and performs continuous model selection. Lasso penalized regression is capable of handling linear regression problems where the number of predictors far exceeds the number of cases. This paper tests two exceptionally fast algorithms for estimating regression coefficients with a lasso penalty. The previously known l2 algorithm is based on cyclic coordinate descent. Our new l1 algorithm is based on greedy coordinate descent and Edgeworth’s algorithm for ordinary l1 regression. Each algorithm relies on a tuning constant that can be chosen by cross-validation. In some regression problems it is natural to group parameters and penalize parameters group by group rather than separately. If the group penalty is proportional to the Euclidean norm of the parameters of the group, then it is possible to majorize the norm and reduce parameter estimation to l2 regression with a lasso penalty. Thus, the existing algorithm can be extended to novel settings. Each of the algorithms discussed is tested via either simulated or real data or both. The Appendix proves that a greedy form of the l2 algorithm converges to the minimum value of the objective function.

821 citations
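
As a concrete illustration of greedy coordinate descent for lasso-penalized least squares (0.5*||y - Xb||^2 + lam*||b||_1), the sketch below picks, at each step, the coordinate whose exact one-dimensional update most reduces the objective. This is a generic sketch under the assumption of nonzero columns in X; it is not the paper's l1/Edgeworth algorithm, and all names and the fixed iteration budget are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def greedy_cd_lasso(X, y, lam, n_iter=200):
    """Greedy coordinate descent for 0.5*||y - Xb||^2 + lam*||b||_1 (sketch)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                          # current residual
    col_sq = (X ** 2).sum(axis=0)          # ||x_j||^2, assumed nonzero for every column
    for _ in range(n_iter):
        best_j, best_drop, best_bj = None, 0.0, 0.0
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]        # partial-residual correlation
            bj_new = soft_threshold(rho, lam) / col_sq[j]
            # exact decrease of the one-dimensional objective if b[j] moves to bj_new
            drop = (0.5 * col_sq[j] * (b[j] ** 2 - bj_new ** 2)
                    + rho * (bj_new - b[j])
                    + lam * (abs(b[j]) - abs(bj_new)))
            if drop > best_drop + 1e-12:
                best_j, best_drop, best_bj = j, drop, bj_new
        if best_j is None:                 # no coordinate improves the objective
            break
        r += X[:, best_j] * (b[best_j] - best_bj)       # keep residual consistent
        b[best_j] = best_bj
    return b
```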


Journal ArticleDOI
TL;DR: This work proves that adding hidden units yields strictly improved modeling power and that RBMs are universal approximators of discrete distributions; a new and less greedy criterion for training RBMs within DBNs is also suggested.
Abstract: Deep belief networks (DBN) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton, Osindero, and Teh (2006) along with a greedy layer-wise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a restricted Boltzmann machine (RBM), used to represent one layer of the model. Restricted Boltzmann machines are interesting because inference is easy in them and because they have been successfully used as building blocks for training deeper models. We first prove that adding hidden units yields strictly improved modeling power, while a second theorem shows that RBMs are universal approximators of discrete distributions. We then study the question of whether DBNs with more layers are strictly more powerful in terms of representational power. This suggests a new and less greedy criterion for training RBMs within DBNs.

800 citations


Proceedings ArticleDOI
17 May 2008
TL;DR: A randomized continuous greedy algorithm is developed which achieves a (1-1/e)-approximation for the Submodular Welfare Problem in the value oracle model; its potential for wider applicability is demonstrated on the Generalized Assignment Problem and the AdWords Assignment Problem.
Abstract: In the Submodular Welfare Problem, m items are to be distributed among n players with utility functions w_i: 2^[m] → R_+. The utility functions are assumed to be monotone and submodular. Assuming that player i receives a set of items S_i, we wish to maximize the total utility ∑_{i=1}^n w_i(S_i). In this paper, we work in the value oracle model where the only access to the utility functions is through a black box returning w_i(S) for a given set S. Submodular Welfare is in fact a special case of the more general problem of submodular maximization subject to a matroid constraint: max{f(S): S ∈ I}, where f is monotone submodular and I is the collection of independent sets in some matroid. For both problems, a greedy algorithm is known to yield a 1/2-approximation [21, 16]. In special cases where the matroid is uniform (I = {S: |S| ≤ k}) [20] or the submodular function is of a special type [4, 2], a (1-1/e)-approximation has been achieved and this is optimal for these problems in the value oracle model [22, 6, 15]. A (1-1/e)-approximation for the general Submodular Welfare Problem has been known only in a stronger demand oracle model [4], where in fact 1-1/e can be improved [9]. In this paper, we develop a randomized continuous greedy algorithm which achieves a (1-1/e)-approximation for the Submodular Welfare Problem in the value oracle model. We also show that the special case of n equal players is approximation resistant, in the sense that the optimal (1-1/e)-approximation is achieved by a uniformly random solution. Using the pipage rounding technique [1, 2], we obtain a (1-1/e)-approximation for submodular maximization subject to any matroid constraint. The continuous greedy algorithm has a potential of wider applicability, which we demonstrate on the examples of the Generalized Assignment Problem and the AdWords Assignment Problem.

637 citations
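
For context, the classical 1/2-approximation greedy mentioned above (not the paper's continuous greedy) is easy to state: process the items one at a time and give each item to the player whose utility increases the most, using only value-oracle calls. The sketch and its coverage-utility example are purely illustrative.

```python
def greedy_welfare(n_items, utilities):
    """1/2-approximation greedy for Submodular Welfare (illustrative sketch).

    utilities: list of value oracles w_i(S) accepting any iterable of item ids.
    """
    bundles = [set() for _ in utilities]
    for item in range(n_items):
        # marginal gain of giving this item to each player
        gains = [w(S | {item}) - w(S) for S, w in zip(bundles, utilities)]
        winner = max(range(len(utilities)), key=lambda i: gains[i])
        bundles[winner].add(item)
    return bundles

# Tiny illustrative example with coverage utilities (each player values the union
# of the element sets attached to the items it receives).
covers = [{0: {"a", "b"}, 1: {"b"}, 2: {"c"}},
          {0: {"a"}, 1: {"b", "c"}, 2: {"c", "d"}}]
utilities = [lambda S, c=c: len(set().union(*(c[i] for i in S))) for c in covers]
print(greedy_welfare(3, utilities))        # e.g. [{0, 2}, {1}]
```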


Proceedings ArticleDOI
01 Oct 2008
TL;DR: This paper presents a novel iterative greedy reconstruction algorithm for practical compressed sensing, called the sparsity adaptive matching pursuit, which provides a generalized greedy reconstruction framework in which the orthogonal matching pursuit and the subspace pursuit can be viewed as its special cases.
Abstract: This paper presents a novel iterative greedy reconstruction algorithm for practical compressed sensing (CS), called the sparsity adaptive matching pursuit (SAMP). Compared with other state-of-the-art greedy algorithms, the most innovative feature of the SAMP is its capability of signal reconstruction without prior information about the sparsity. This makes it a promising candidate for many practical applications when the number of non-zero (significant) coefficients of a signal is not available. The proposed algorithm adopts a flavor similar to the EM algorithm, alternately estimating the sparsity and the true support set of the target signals. In fact, SAMP provides a generalized greedy reconstruction framework in which the orthogonal matching pursuit and the subspace pursuit can be viewed as special cases. Such a connection also gives us an intuitive justification of trade-offs between computational complexity and reconstruction performance. While SAMP offers theoretical guarantees comparable to the best optimization-based approach, simulation results show that it outperforms many existing iterative algorithms, especially for compressible signals.

517 citations
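
For readers new to this family of methods, the sketch below shows plain orthogonal matching pursuit (OMP), the simplest of the greedy pursuits that SAMP generalizes; SAMP additionally adapts the sparsity level stage by stage, which is omitted here. Function and parameter names are illustrative.

```python
import numpy as np

def omp(A, y, k, tol=1e-9):
    """Recover an (at most) k-sparse x with y ≈ A @ x by greedy support selection."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit all selected coefficients by least squares (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```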


Proceedings ArticleDOI
07 Apr 2008
TL;DR: It is proved that the problem of achieving (k,delta)-anonymity by space translation with minimum distortion is NP-hard, and a greedy algorithm based on clustering and enhanced with ad hoc pre-processing and outlier removal techniques is proposed.
Abstract: Preserving individual privacy when publishing data is a problem that is receiving increasing attention. According to the k-anonymity principle, each release of data must be such that each individual is indistinguishable from at least k - 1 other individuals. In this paper we study the problem of anonymity preserving data publishing in moving objects databases. We propose a novel concept of k-anonymity based on co-localization that exploits the inherent uncertainty of the moving object's whereabouts. Due to the imprecision of sampling and positioning systems (e.g., GPS), the trajectory of a moving object is no longer a polyline in a three-dimensional space; instead, it is a cylindrical volume, where its radius delta represents the possible location imprecision: we know that the trajectory of the moving object is within this cylinder, but we do not know exactly where. If another object moves within the same cylinder they are indistinguishable from each other. This leads to the definition of (k,delta)-anonymity for moving objects databases. We first characterize the (k,delta)-anonymity problem and discuss techniques to solve it. Then we focus on the most promising technique from the point of view of information preservation, namely space translation. We develop a suitable measure of the information distortion introduced by space translation, and we prove that the problem of achieving (k,delta)-anonymity by space translation with minimum distortion is NP-hard. Faced with the hardness of our problem we propose a greedy algorithm based on clustering and enhanced with ad hoc pre-processing and outlier removal techniques. The resulting method, named NWA (Never Walk Alone), is empirically evaluated in terms of data quality and efficiency. Data quality is assessed both by means of objective measures of information distortion, and by comparing the results of the same spatio-temporal range queries executed on the original database and on the (k,delta)-anonymized one. Experimental results show that for a wide range of values of delta and k, the relative error introduced is kept low, confirming that NWA produces high quality (k,delta)-anonymized data.

507 citations


Posted Content
TL;DR: This work performs approximate Bayesian inference using belief propagation (BP) decoding, which represents the CS encoding matrix as a graphical model, and focuses on a two-state mixture Gaussian signal model that is easily adapted to other signal models.
Abstract: Compressive sensing (CS) is an emerging field based on the revelation that a small collection of linear projections of a sparse signal contains enough information for stable, sub-Nyquist signal acquisition. When a statistical characterization of the signal is available, Bayesian inference can complement conventional CS methods based on linear programming or greedy algorithms. We perform approximate Bayesian inference using belief propagation (BP) decoding, which represents the CS encoding matrix as a graphical model. Fast computation is obtained by reducing the size of the graphical model with sparse encoding matrices. To decode a length-N signal containing K large coefficients, our CS-BP decoding algorithm uses O(Klog(N)) measurements and O(Nlog^2(N)) computation. Finally, although we focus on a two-state mixture Gaussian model, CS-BP is easily adapted to other signal models.

469 citations


Journal ArticleDOI
TL;DR: A locally competitive algorithm (LCA) is described that solves a collection of sparse coding principles minimizing a weighted combination of mean-squared error and a coefficient cost function to produce coefficients with sparsity levels comparable to the most popular centralized sparse coding algorithms while being readily suited for neural implementation.
Abstract: While evidence indicates that neural systems may be employing sparse approximations to represent sensed stimuli, the mechanisms underlying this ability are not understood. We describe a locally competitive algorithm (LCA) that solves a collection of sparse coding principles minimizing a weighted combination of mean-squared error and a coefficient cost function. LCAs are designed to be implemented in a dynamical system composed of many neuron-like elements operating in parallel. These algorithms use thresholding functions to induce local (usually one-way) inhibitory competitions between nodes to produce sparse representations. LCAs produce coefficients with sparsity levels comparable to the most popular centralized sparse coding algorithms while being readily suited for neural implementation. Additionally, LCA coefficients for video sequences demonstrate inertial properties that are both qualitatively and quantitatively more regular (i.e., smoother and more predictable) than the coefficients produced by greedy algorithms.

453 citations
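
A rough discrete-time sketch of LCA-style dynamics for an l1-type cost is shown below, assuming a soft-threshold activation (one common choice of thresholding function). The step size, threshold and iteration count are illustrative, and this Euler-style loop is only a stand-in for the continuous-time dynamical system described in the paper.

```python
import numpy as np

def lca(Phi, y, lam=0.1, dt=0.05, n_steps=500):
    """Locally-competitive-style sparse coding of y over dictionary Phi (sketch)."""
    n_atoms = Phi.shape[1]
    u = np.zeros(n_atoms)                        # internal (membrane-like) states
    G = Phi.T @ Phi - np.eye(n_atoms)            # lateral inhibition weights
    b = Phi.T @ y                                # feedforward drive
    threshold = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    for _ in range(n_steps):
        a = threshold(u)                         # thresholded outputs (active nodes)
        du = b - u - G @ a                       # leaky integration plus inhibition
        u = u + dt * du
    return threshold(u)
```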


Journal ArticleDOI
01 Aug 2008
TL;DR: This paper aims to solve the infinite-time optimal tracking control problem for a class of discrete-time nonlinear systems using the greedy heuristic dynamic programming (HDP) iteration algorithm, and defines a new type of performance index.
Abstract: In this paper, we aim to solve the infinite-time optimal tracking control problem for a class of discrete-time nonlinear systems using the greedy heuristic dynamic programming (HDP) iteration algorithm. A new type of performance index is defined because the existing performance indexes are very difficult, if not impossible, to apply to this kind of tracking problem. Via system transformation, the optimal tracking problem is transformed into an optimal regulation problem, and then, the greedy HDP iteration algorithm is introduced to deal with the regulation problem with rigorous convergence analysis. Three neural networks are used to approximate the performance index, compute the optimal control policy, and model the nonlinear system for facilitating the implementation of the greedy HDP iteration algorithm. An example is given to demonstrate the validity of the proposed optimal tracking control scheme.

447 citations


Proceedings ArticleDOI
19 May 2008
TL;DR: A novel dynamic greedy algorithm for the formation of the clusters of cooperating BSs is presented and it is shown that a dynamic clustering approach with a cluster consisting of 2 cells outperforms static coordination schemes with much larger cluster sizes.
Abstract: Multi-cell cooperative processing (MCP) has recently attracted a lot of attention because of its potential for co-channel interference (CCI) mitigation and spectral efficiency increase. MCP inevitably requires increased signaling overhead and inter-base communication. Therefore in practice, only a limited number of base stations (BSs) can cooperate in order for the overhead to be affordable. The intrinsic problem of which BSs shall cooperate in a realistic scenario has been only partially investigated. In this contribution linear beamforming has been considered for the sum-rate maximisation of the uplink. A novel dynamic greedy algorithm for the formation of the clusters of cooperating BSs is presented for a cellular network incorporating MCP. This approach is chosen to be evaluated under a fair MS scheduling scenario (round-robin). The objective of the clustering algorithm is sum-rate maximisation of the already selected MSs. The proposed cooperation scheme is compared with some fixed cooperation clustering schemes. It is shown that a dynamic clustering approach with a cluster consisting of 2 cells outperforms static coordination schemes with much larger cluster sizes.

434 citations


Proceedings ArticleDOI
13 Apr 2008
TL;DR: A novel tree-based multichannel scheme for data collection applications is proposed, which allocates channels to disjoint trees, exploits parallel transmissions among trees, and outperforms other schemes in dense networks with a small number of channels.
Abstract: This paper demonstrates how to use multiple channels to improve communication performance in Wireless Sensor Networks (WSNs). We first investigate multi-channel realities in WSNs through intensive empirical experiments with Micaz motes. Our study shows that current multi-channel protocols are not suitable for WSNs, because of the small number of available channels and unavoidable time errors found in real networks. With these observations, we propose a novel tree-based multichannel scheme for data collection applications, which allocates channels to disjoint trees and exploits parallel transmissions among trees. In order to minimize interference within trees, we define a new channel assignment problem which is proven NP-complete. Then we propose a greedy channel allocation algorithm which outperforms other schemes in dense networks with a small number of channels. We implement our protocol, called TMCP, in a real testbed. Through both simulation and real experiments, we show that TMCP can significantly improve network throughput and reduce packet losses. More importantly, evaluation results show that TMCP better accommodates multi-channel realities found in WSNs than other multi-channel protocols.

Journal ArticleDOI
TL;DR: An integrated model, a mixed-integer linear program, is presented, to solve the ship-scheduling and the cargo-routing problems, simultaneously, and an efficient iterative search algorithm is proposed to generate schedules for ships.
Abstract: A common problem faced by carriers in liner shipping is the design of their service network. Given a set of demands to be transported and a set of ports, a carrier wants to design service routes for its ships as efficiently as possible, using the underlying facilities. Furthermore, the profitability of the service routes designed depends on the paths chosen to ship the cargo. We present an integrated model, a mixed-integer linear program, to solve the ship-scheduling and the cargo-routing problems simultaneously. The proposed model incorporates relevant constraints, such as the weekly frequency constraint on the operated routes, and emerging trends, such as the transshipment of cargo between two or more service routes. To solve the mixed-integer program, we propose algorithms that exploit the separability of the problem. More specifically, a greedy heuristic, a column generation-based algorithm, and a two-phase Benders decomposition-based algorithm are developed, and their computational efficiency in terms of the solution quality and the computational time taken is discussed. An efficient iterative search algorithm is proposed to generate schedules for ships. Computational experiments are performed on randomly generated instances simulating real life with up to 20 ports and 100 ships. Our results indicate high percentage utilization of ships' capacities and a significant number of transshipments in the final solution.

Journal ArticleDOI
TL;DR: This paper designs centralized and distributed algorithms for the problem of assigning channels to communication links in the network with the objective of minimizing the overall network interference, and develops a semidefinite program and a linear program formulation of the optimization problem to obtain lower bounds on overall network interference.
Abstract: In this paper, we consider multihop wireless mesh networks, where each router node is equipped with multiple radio interfaces, and multiple channels are available for communication. We address the problem of assigning channels to communication links in the network with the objective of minimizing the overall network interference. Since the number of radios on any node can be less than the number of available channels, the channel assignment must obey the constraint that the number of different channels assigned to the links incident on any node is at most the number of radio interfaces on that node. The above optimization problem is known to be NP-hard. We design centralized and distributed algorithms for the above channel assignment problem. To evaluate the quality of the solutions obtained by our algorithms, we develop a semidefinite program and a linear program formulation of our optimization problem to obtain lower bounds on overall network interference. Empirical evaluations on randomly generated network graphs show that our algorithms perform close to the above established lower bounds, with the difference diminishing rapidly with increase in number of radios. Also, ns-2 simulations, as well as experimental studies on testbed, demonstrate the performance potential of our channel assignment algorithms in 802.11-based multiradio mesh networks.

Journal ArticleDOI
TL;DR: Two new iterated greedy (IG) algorithms are proposed for a complex flowshop problem that results from the consideration of sequence dependent setup times on machines, a characteristic that is often found in industrial settings.

Journal ArticleDOI
TL;DR: In this paper, a new semidefinite relaxation is proposed to solve the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination.
Abstract: Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of nonzero coefficients, with total complexity O(n^3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n^3) per pattern. We discuss applications in subset selection and sparse recovery and show on artificial examples and biological data that our algorithm does provide globally optimal solutions in many cases.
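
The greedy forward pass referred to above can be sketched as follows: grow the support one variable at a time, always adding the index that maximizes the leading eigenvalue of the corresponding covariance submatrix. The paper couples such a pass with a semidefinite relaxation, incremental eigenvalue updates and optimality certificates, all of which are omitted here, so this naive version recomputes eigenvalues and is slower than the O(n^3) total cost quoted above; names are illustrative.

```python
import numpy as np

def greedy_sparse_pca_path(Sigma, k_max):
    """Supports of size 1..k_max chosen greedily by explained variance (sketch).

    Sigma: sample covariance matrix (n x n numpy array).
    """
    n = Sigma.shape[0]
    support, path = [], []
    for _ in range(k_max):
        candidates = [j for j in range(n) if j not in support]

        def top_eig(j):
            idx = support + [j]
            # leading eigenvalue of the covariance restricted to the candidate support
            return np.linalg.eigvalsh(Sigma[np.ix_(idx, idx)])[-1]

        best = max(candidates, key=top_eig)
        support.append(best)
        path.append(list(support))
    return path
```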

Proceedings Article
20 Jan 2008
TL;DR: An online assignment problem, motivated by Adwords Allocation, in which queries are to be assigned to bidders with budget constraints is studied, with a tight analysis of Greedy in this model showing that it has a competitive ratio of 1 - 1/e for maximizing the value of the assignment.
Abstract: We study an online assignment problem, motivated by Adwords Allocation, in which queries are to be assigned to bidders with budget constraints. We analyze the performance of the Greedy algorithm (which assigns each query to the highest bidder) in a randomized input model with queries arriving in a random permutation. Our main result is a tight analysis of Greedy in this model showing that it has a competitive ratio of 1 - 1/e for maximizing the value of the assignment. We also consider the more standard i.i.d. model of input, and show that our analysis holds there as well. This is to be contrasted with the worst case analysis of [MSVV05] which shows that Greedy has a ratio of 1/2, and that the optimal algorithm presented there has a ratio of 1 - 1/e. The analysis of Greedy is important in the Adwords setting because it is the natural allocation algorithm for an auction-style process. From a theoretical perspective, our result simplifies and generalizes the classic algorithm of Karp, Vazirani and Vazirani for online bipartite matching. Our results include a new proof to show that the Ranking algorithm of [KVV90] has a ratio of 1 - 1/e in the worst case. It has been recently discovered [KV07] (independent of our results) that one of the crucial lemmas in [KVV90], related to a certain reduction, is incorrect. Our proof is direct, in that it does not go via such a reduction, which also enables us to generalize the analysis to our online assignment problem.
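
A minimal sketch of the Greedy allocation analyzed above is given below: each arriving query goes to the highest bidder among those that can still pay. The dictionary-based bid and budget structures, and the rule of skipping bidders whose remaining budget falls below their bid, are simplifying assumptions rather than the paper's exact model.

```python
def greedy_adwords(queries, bids, budgets):
    """bids[q][i]: bid of bidder i on query q; budgets: dict of budget per bidder."""
    remaining = dict(budgets)
    revenue, assignment = 0.0, {}
    for q in queries:
        # bidders that can still pay their full bid (a simplifying assumption)
        feasible = [i for i, bid in bids[q].items() if 0 < bid <= remaining[i]]
        if not feasible:
            continue
        winner = max(feasible, key=lambda i: bids[q][i])  # assign to highest bidder
        assignment[q] = winner
        remaining[winner] -= bids[q][winner]
        revenue += bids[q][winner]
    return assignment, revenue
```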

Journal ArticleDOI
TL;DR: A family of greedy CIT sample generation algorithms that exploit calculations made by modern Boolean satisfiability (SAT) solvers to prune the search space of the CIT problem are developed.
Abstract: Researchers have explored the application of combinatorial interaction testing (CIT) methods to construct samples to drive systematic testing of software system configurations. Applying CIT to highly-configurable software systems is complicated by the fact that, in many such systems, there are constraints between specific configuration parameters that render certain combinations invalid. Many CIT algorithms lack a mechanism to avoid these. In recent work, automated constraint solving methods have been combined with search-based CIT construction methods to address the constraint problem with promising results. However, these techniques can incur a non-trivial overhead. In this paper, we build upon our previous work to develop a family of greedy CIT sample generation algorithms that exploit calculations made by modern Boolean satisfiability (SAT) solvers to prune the search space of the CIT problem. We perform a comparative evaluation of the cost-effectiveness of these algorithms on four real-world highly-configurable software systems and on a population of synthetic examples that share the characteristics of those systems. In combination our techniques reduce the cost of CIT in the presence of constraints to 30 percent of the cost of widely-used unconstrained CIT methods without sacrificing the quality of the solutions.

Journal ArticleDOI
15 Aug 2008
TL;DR: A novel combinatorial approach based on computing max-cuts in certain graphs derived from the sequenced fragments of a human individual to infer haplotypes and demonstrates that the haplotypes inferred using HapCUT are significantly more accurate than the greedy heuristic and a previously published method, Fast Hare.
Abstract: Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps. Results: We describe a novel combinatorial approach for the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method HapCUT to infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (20–25% lower maximum error correction scores for all chromosomes) than the greedy heuristic and a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project. Availability: A program implementing HapCUT is available on

Journal ArticleDOI
TL;DR: A novel minutiae-based fingerprint matching algorithm that ranks 1st on DB3, the most difficult database in FVC2002, and on the average ranks 2nd on all 4 databases.

Journal ArticleDOI
TL;DR: This paper describes a matrix decomposition formulation for Boolean data, the Discrete Basis Problem, gives a simple greedy algorithm for solving it, and shows how a partitioning variant of the problem can be solved using existing methods.
Abstract: Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the observed data can be expressed as combinations of the basis vectors. Decomposition methods have been studied extensively, but many methods return real-valued matrices. Interpreting real-valued factor matrices is hard if the original data is Boolean. In this paper, we describe a matrix decomposition formulation for Boolean data, the Discrete Basis Problem. The problem seeks a Boolean decomposition of a binary matrix, thus allowing the user to easily interpret the basis vectors. We also describe a variation of the problem, the Discrete Basis Partitioning Problem. We show that both problems are NP-hard. For the Discrete Basis Problem, we give a simple greedy algorithm for solving it; for the Discrete Basis Partitioning Problem we show how it can be solved using existing methods. We present experimental results for the greedy algorithm and compare it against other, well known methods. Our algorithm gives intuitive basis vectors, but its reconstruction error is usually larger than with the real-valued methods. We discuss the reasons for this behavior.

Proceedings ArticleDOI
13 Apr 2008
TL;DR: A number of new analytic results characterizing the performance limits of greedy maximal scheduling are provided, including an equivalent characterization of the efficiency ratio of GMS through a topological property called the local-pooling factor of the network graph.
Abstract: In this paper, we characterize the performance of an important class of scheduling schemes, called greedy maximal scheduling (GMS), for multi-hop wireless networks. While a lower bound on the throughput performance of GMS is relatively well-known in the simple node-exclusive interference model, it has not been thoroughly explored in the more general K-hop interference model. Moreover, empirical observations suggest that the known bounds are quite loose, and that the performance of GMS is often close to optimal. In this paper, we provide a number of new analytic results characterizing the performance limits of GMS. We first provide an equivalent characterization of the efficiency ratio of GMS through a topological property called the local-pooling factor of the network graph. We then develop an iterative procedure to estimate the local-pooling factor under a large class of network topologies and interference models. We use these results to study the worst-case efficiency ratio of GMS on two classes of network topologies. First, we show how these results can be applied to tree networks to prove that GMS achieves the full capacity region in tree networks under the K-hop interference model. Second, we show that the worst-case efficiency ratio of GMS in geometric network graphs is between 1/6 and 1/3.
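
For reference, one time slot of greedy maximal scheduling can be sketched as follows: examine links in decreasing order of backlog and activate a link whenever it does not conflict with the links already picked. The interference test is abstracted into a caller-supplied predicate (for instance, a K-hop rule); all names are illustrative.

```python
def greedy_maximal_schedule(links, queue_len, interferes):
    """One-slot GMS sketch.

    links: iterable of link ids; queue_len: dict link -> backlog;
    interferes(a, b): True if links a and b cannot be activated together.
    """
    schedule = []
    for link in sorted(links, key=lambda l: queue_len[l], reverse=True):
        if queue_len[link] > 0 and all(not interferes(link, s) for s in schedule):
            schedule.append(link)
    return schedule
```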

Journal ArticleDOI
TL;DR: DIALIGN-TX is presented, a substantial improvement of DIALIGN-T that combines the previous greedy algorithm with a progressive alignment approach and produces significantly better alignments, especially on globally related sequences, without increasing the CPU time and memory consumption exceedingly.
Abstract: DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach. Our new heuristic produces significantly better alignments, especially on globally related sequences, without increasing the CPU time and memory consumption exceedingly. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 for assessing the quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences. On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment like MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs while MAFFT E-INSi is the only method that comes close to the performance of DIALIGN-TX.

Journal ArticleDOI
TL;DR: This work improves on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm, and proves convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary.
Abstract: We consider the problem of approximating a given element $f$ from a Hilbert space $\mathcal{H}$ by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. For all these algorithms, we prove convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary. We then show how these bounds for convergence rates lead to a new theory for the performance of greedy algorithms in learning. In particular, we build upon the results in [IEEE Trans. Inform. Theory 42 (1996) 2118--2132] to construct learning algorithms based on greedy approximations which are universally consistent and provide provable convergence rates for large classes of functions. The use of greedy algorithms in the context of learning is very appealing since it greatly reduces the computational burden when compared with standard model selection using general dictionaries.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: It is found that a greedy algorithm that recursively splits the set of categories into the two minimally confused subsets achieves 5-20 fold speedups at a small cost in classification performance.
Abstract: The computational complexity of current visual categorization algorithms scales linearly at best with the number of categories. The goal of classifying simultaneously Ncat = 10^4 - 10^5 visual categories requires sub-linear classification costs. We explore algorithms for automatically building classification trees which have, in principle, log Ncat complexity. We find that a greedy algorithm that recursively splits the set of categories into the two minimally confused subsets achieves 5-20 fold speedups at a small cost in classification performance. Our approach is independent of the specific classification algorithm used. A welcome by-product of our algorithm is a very reasonable taxonomy of the Caltech-256 dataset.
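
One simple greedy way to realize the recursive splitting described above is sketched below: seed two groups with the least mutually confused pair of categories, assign each remaining category to the side it is more confused with (so little confusion crosses the split), and recurse. The seeding rule, assignment rule and leaf-size threshold are illustrative assumptions, not the authors' exact procedure.

```python
def split_categories(cats, C):
    """Split categories into two groups with little cross-confusion (sketch).

    C[i][j]: (assumed symmetric) confusion between categories i and j.
    """
    # seed the two groups with the least mutually confused pair
    a, b = min(((i, j) for i in cats for j in cats if i < j),
               key=lambda p: C[p[0]][p[1]])
    left, right = {a}, {b}
    for c in sorted(set(cats) - {a, b}, key=lambda c: -sum(C[c][d] for d in cats)):
        conf_left = sum(C[c][d] for d in left)
        conf_right = sum(C[c][d] for d in right)
        # keep confusion inside a group: join the side c is more confused with
        (left if conf_left >= conf_right else right).add(c)
    return left, right

def build_tree(cats, C, min_size=2):
    """Recursively build a category tree by greedy splitting (sketch)."""
    if len(cats) < 2 * min_size:
        return sorted(cats)                 # leaf node
    left, right = split_categories(cats, C)
    return [build_tree(left, C, min_size), build_tree(right, C, min_size)]
```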

Journal ArticleDOI
TL;DR: This paper provides new results on computing simultaneous sparse approximations of multichannel signals over redundant dictionaries using two greedy algorithms, p-thresholding and p-SOMP, and shows that, if the dictionary satisfies a uniform uncertainty principle, the probability that simultaneous OMP fails to recover any sufficiently sparse set of atoms gets increasingly smaller as the number of channels increases.
Abstract: This paper provides new results on computing simultaneous sparse approximations of multichannel signals over redundant dictionaries using two greedy algorithms. The first one, p-thresholding, selects the S atoms that have the largest p-correlation while the second one, p-simultaneous matching pursuit (p-SOMP), is a generalisation of an algorithm studied by Tropp in (Signal Process. 86:572–588, 2006). We first provide exact recovery conditions as well as worst case analyses of all algorithms. The results, expressed using the standard cumulative coherence, are very reminiscent of the single channel case and, in particular, impose stringent restrictions on the dictionary. We unlock the situation by performing an average case analysis of both algorithms. First, we set up a general probabilistic signal model in which the coefficients of the atoms are drawn at random from the standard Gaussian distribution. Second, we show that under this model, and with mild conditions on the coherence, the probability that p-thresholding and p-SOMP fail to recover the correct components is overwhelmingly small and gets smaller as the number of channels increases. Furthermore, we analyse the influence of selecting the set of correct atoms at random. We show that, if the dictionary satisfies a uniform uncertainty principle (Candes and Tao, IEEE Trans. Inf. Theory, 52(12):5406–5425, 2006), the probability that simultaneous OMP fails to recover any sufficiently sparse set of atoms gets increasingly smaller as the number of channels increases.

Journal ArticleDOI
Sangho Kim1, Shashi Shekhar, Manki Min
TL;DR: This paper presents the first macroscopic approaches for the solution of contraflow network reconfiguration incorporating road capacity constraints, multiple sources, congestion factor, and scalability and shows that these approaches can reduce evacuation time by 40% or more.
Abstract: Given a transportation network having source nodes with evacuees and destination nodes, we want to find a contraflow network configuration, i.e., ideal direction for each edge, to minimize evacuation time. Contraflow is considered a potential remedy to reduce congestion during evacuations in the context of homeland security and natural disasters (e.g., hurricanes). This problem is computationally challenging because of the very large search space and the expensive calculation of evacuation time on a given network. To our knowledge, this paper presents the first macroscopic approaches for the solution of contraflow network reconfiguration incorporating road capacity constraints, multiple sources, congestion factor, and scalability. We formally define the contraflow problem based on graph theory and provide a framework of computational workload to classify our approaches. A greedy heuristic is designed to produce high quality solutions with significant performance. A bottleneck relief heuristic is developed to deal with large numbers of evacuees. We evaluate the proposed approaches both analytically and experimentally using real world datasets. Experimental results show that our contraflow approaches can reduce evacuation time by 40% or more.

Journal ArticleDOI
TL;DR: This paper extends the concept of degree from single vertex to sub-graph, and presents a formal definition of module/community in a network based on this extension, and implements a JAVA tool, MoNet, for exploring local community structures in large networks.
Abstract: In this paper, three new algorithms, a greedy algorithm, a KL-like algorithm, and an add-all algorithm, are proposed to find local optimal community structures in large networks starting from a given source vertex. The time complexity for finding a local community of all these algorithms is O(K^2 d), where K is the number of vertices to be explored in the sub-graph and d is the average degree of the vertices in the sub-graph. A JAVA tool is developed based on these algorithms to identify local community structures in large networks. The results of using this tool to analyze a co-purchase network from Amazon.com show that local community structures exist in this large-scale co-purchase network. Further analyses of the identified local communities show that purchases of media items form more compact local communities than purchases of book items do, indicating that recommending digital media items to customers based on co-purchasing information in the online store is more efficient than recommending books.
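
A hedged sketch of greedy local community discovery from a source vertex is shown below: repeatedly add the neighbouring vertex that most improves a simple local quality measure (internal edges over all edges touching the community), stopping when no addition helps. The quality measure and stopping rule are illustrative and are not necessarily those used by the three algorithms in the paper.

```python
def local_community(graph, source):
    """graph: dict vertex -> set of neighbours; returns a local community of source."""
    community = {source}

    def quality(comm):
        internal = sum(1 for v in comm for u in graph[v] if u in comm) / 2
        boundary = sum(1 for v in comm for u in graph[v] if u not in comm)
        total = internal + boundary
        return internal / total if total else 0.0

    while True:
        frontier = {u for v in community for u in graph[v]} - community
        if not frontier:
            break
        best = max(frontier, key=lambda u: quality(community | {u}))
        if quality(community | {best}) <= quality(community):
            break                            # no neighbour improves the measure
        community.add(best)
    return community
```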

Journal ArticleDOI
TL;DR: This paper models the CTC problem as a maximum cover tree (MCT) problem, determines an upper bound on the network lifetime for the MCT problem, and develops a (1+w)H(M°)-approximation algorithm to solve it; the lifetime obtained is shown to be close to the upper bound.
Abstract: In this paper, we consider the connected target coverage (CTC) problem with the objective of maximizing the network lifetime by scheduling sensors into multiple sets, each of which can maintain both target coverage and connectivity among all the active sensors and the sink. We model the CTC problem as a maximum cover tree (MCT) problem and prove that the MCT problem is NP-Complete. We determine an upper bound on the network lifetime for the MCT problem and then develop a (1+w)H(M°)-approximation algorithm to solve it, where w is an arbitrarily small number, H(M°) = Σ_{1≤i≤M°} (1/i), and M° is the maximum number of targets in the sensing area of any sensor. As the protocol cost of the approximation algorithm may be high in practice, we develop a faster heuristic algorithm based on the approximation algorithm, called the Communication Weighted Greedy Cover (CWGC) algorithm, and present a distributed implementation of the heuristic algorithm. We study the performance of the approximation algorithm and CWGC algorithm by comparing them with the lifetime upper bound and other basic algorithms that consider the coverage and connectivity problems independently. Simulation results show that the approximation algorithm and CWGC algorithm perform much better than others in terms of the network lifetime and the performance improvement can be up to 45% over the best-known basic algorithm. The lifetime obtained by our algorithms is close to the upper bound. Compared with the approximation algorithm, the CWGC algorithm can achieve a similar performance in terms of the network lifetime with a lower protocol cost.

Journal ArticleDOI
01 Aug 2008
TL;DR: This paper is the first to formally characterize a "good" pattern tableau, based on naturally desirable properties of support, confidence and parsimony, and shows that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm.
Abstract: Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs.This paper is the first to formally characterize a "good" pattern tableau, based on naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an "on-demand" algorithm providing the same approximation bound, that outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of a pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.