
Showing papers on "Greedy algorithm published in 2005"


Proceedings ArticleDOI
13 Mar 2005
TL;DR: An efficient method is proposed to extend sensor network lifetime by organizing the sensors into a maximal number of set covers that are activated successively, together with two heuristics that efficiently compute the set covers using linear programming and a greedy approach.
Abstract: A critical aspect of applications with wireless sensor networks is network lifetime. Power-constrained wireless sensor networks are usable as long as they can communicate sensed data to a processing node. Sensing and communications consume energy, therefore judicious power management and sensor scheduling can effectively extend network lifetime. To cover a set of targets with known locations when ground access in the remote area is prohibited, one solution is to deploy the sensors remotely, from an aircraft. The lack of precise sensor placement is compensated by a large sensor population deployed in the drop zone, which improves the probability of target coverage. The data collected from the sensors is sent to a central node (e.g. cluster head) for processing. In this paper we propose an efficient method to extend the sensor network lifetime by organizing the sensors into a maximal number of set covers that are activated successively. Only the sensors from the current active set are responsible for monitoring all targets and for transmitting the collected data, while all other nodes are in a low-energy sleep mode. By allowing sensors to participate in multiple sets, our problem formulation increases the network lifetime compared with related work [M. Cardei et al], which has the additional requirements that sensor sets be disjoint and operate for equal time intervals. In this paper we model the solution as the maximum set covers problem and design two heuristics that efficiently compute the sets, using linear programming and a greedy approach. Simulation results are presented to verify our approaches.
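As a rough illustration of the greedy side of this approach, the sketch below builds set covers round by round (a minimal sketch, not the authors' algorithm: the integer per-sensor battery, the unit-length rounds, and the data structures are simplifying assumptions):

```python
# Hedged sketch of greedy set-cover scheduling for sensor lifetime.
# Assumptions (not from the paper): each sensor has an integer battery
# counting how many rounds it can serve, and each round activates one
# full cover of all targets.

def greedy_set_covers(covers, battery, targets):
    """covers[s]: targets sensor s can watch; battery[s]: rounds s can serve.
    Returns one set cover per achievable round."""
    schedule = []
    while True:
        uncovered, active = set(targets), []
        while uncovered:
            # greedy step: live sensor covering the most uncovered targets
            best = max(
                (s for s in covers if battery[s] > 0 and s not in active),
                key=lambda s: len(covers[s] & uncovered),
                default=None,
            )
            if best is None or not covers[best] & uncovered:
                return schedule          # coverage can no longer be completed
            active.append(best)
            uncovered -= covers[best]
        for s in active:
            battery[s] -= 1              # each activated sensor spends a round
        schedule.append(active)

# toy run: 3 targets, 4 sensors, each battery lasting 2 rounds
covers = {0: {"t1", "t2"}, 1: {"t2", "t3"}, 2: {"t1", "t3"}, 3: {"t3"}}
print(greedy_set_covers(covers, {s: 2 for s in covers}, {"t1", "t2", "t3"}))
```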

1,046 citations


Journal ArticleDOI
TL;DR: The results indicate that the proposed method attains a significant fraction of sum capacity and throughput of Tu and Blum's scheme and, thus, offers an attractive alternative to DP-based schemes.
Abstract: This paper considers the problem of simultaneous multiuser downlink beamforming. The idea is to employ a transmit antenna array to create multiple "beams" directed toward the individual users, and the aim is to increase throughput, measured by sum capacity. In particular, we are interested in the practically important case of more users than transmit antennas, which requires user selection. Optimal solutions to this problem can be prohibitively complex for online implementation at the base station and entail so-called Dirty Paper (DP) precoding for known interference. Suboptimal solutions capitalize on multiuser (selection) diversity to achieve a significant fraction of sum capacity at lower complexity cost. We analyze the throughput performance in Rayleigh fading of a suboptimal greedy DP-based scheme proposed by Tu and Blum. We also propose another user-selection method of the same computational complexity based on simple zero-forcing beamforming. Our results indicate that the proposed method attains a significant fraction of sum capacity and throughput of Tu and Blum's scheme and, thus, offers an attractive alternative to DP-based schemes.
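For flavor, here is a hedged sketch of greedy user selection under zero-forcing beamforming (illustrative only; the Rayleigh channel model, naive equal power split, and the stopping rule are assumptions rather than the authors' exact scheme):

```python
# Hedged sketch: greedily admit users while the zero-forcing sum rate improves.
# Assumptions: flat Rayleigh channels, equal power split, unit noise power.
import numpy as np

def zf_sum_rate(H, power=10.0):
    """Sum rate of zero-forcing beamforming; H holds one row per user."""
    G = np.linalg.pinv(H)                         # ZF precoder columns
    gains = 1.0 / np.sum(np.abs(G) ** 2, axis=0)  # effective channel gains
    return float(np.sum(np.log2(1.0 + (power / H.shape[0]) * gains)))

def greedy_user_selection(H_all, n_tx):
    selected, best_rate = [], 0.0
    while len(selected) < n_tx:
        cand = max((u for u in range(H_all.shape[0]) if u not in selected),
                   key=lambda u: zf_sum_rate(H_all[selected + [u], :]))
        rate = zf_sum_rate(H_all[selected + [cand], :])
        if rate <= best_rate:
            break                                  # another user no longer helps
        selected.append(cand)
        best_rate = rate
    return selected, best_rate

rng = np.random.default_rng(0)   # 8 candidate users, 4 transmit antennas
H_all = (rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4))) / np.sqrt(2)
print(greedy_user_selection(H_all, n_tx=4))
```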

654 citations


Journal ArticleDOI
TL;DR: This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions, and leads to a rigorous proof that for a linearly separable problem, AdaBoost becomes an L1-margin maximizer when left to run to convergence.
Abstract: Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on CV or a test set. This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with ε → 0 step-size becomes an L1-margin maximizer when left to run to convergence.
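The greedy, small-step flavor of boosting with early stopping can be sketched as follows (a minimal illustration assuming componentwise least-squares base learners and a validation-set stopping rule; not the paper's exact setup):

```python
# Hedged sketch: componentwise L2-boosting with a restricted step-size and
# early stopping chosen on a validation set (assumptions, not the paper's
# exact procedure).
import numpy as np

def l2_boost(X, y, Xv, yv, step=0.1, max_iters=500, patience=20):
    coef = np.zeros(X.shape[1])
    best_coef, best_val, stall = coef.copy(), np.inf, 0
    for _ in range(max_iters):
        r = y - X @ coef                         # current residual
        # greedy step: the coordinate whose least-squares fit helps most
        betas = X.T @ r / np.sum(X ** 2, axis=0)
        losses = [np.sum((r - b * X[:, j]) ** 2) for j, b in enumerate(betas)]
        j = int(np.argmin(losses))
        coef[j] += step * betas[j]               # small (restricted) step-size
        val = float(np.mean((yv - Xv @ coef) ** 2))
        if val < best_val:
            best_val, best_coef, stall = val, coef.copy(), 0
        elif (stall := stall + 1) >= patience:   # early stopping
            break
    return best_coef, best_val

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, 0] * 2 - X[:, 1] + 0.5 * rng.normal(size=200)
print(l2_boost(X[:150], y[:150], X[150:], y[150:]))
```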

451 citations


Proceedings ArticleDOI
14 Jun 2005
TL;DR: It is proved that finding minimal-cost repairs in this model is NP-complete in the size of the database, and an approach to heuristic repair-construction based on equivalence classes of attribute values is introduced.
Abstract: Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find "low cost" changes that, when applied, will cause the constraints to be satisfied. While in most previous work repair cost is stated in terms of tuple insertions and deletions, we follow recent work to define a database repair as a set of value modifications. In this context, we introduce a novel cost framework that allows for the application of techniques from record-linkage to the search for good repairs. We prove that finding minimal-cost repairs in this model is NP-complete in the size of the database, and introduce an approach to heuristic repair-construction based on equivalence classes of attribute values. Following this approach, we define two greedy algorithms. While these simple algorithms take time cubic in the size of the database, we develop optimizations inspired by algorithms for duplicate-record detection that greatly improve scalability. We evaluate our framework and algorithms on synthetic and real data, and show that our proposed optimizations greatly improve performance at little or no cost in repair quality.

436 citations


Journal ArticleDOI
03 Oct 2005
TL;DR: This work proposes the notion of a traffic-independent base channel assignment to ease coordination and enable dynamic, efficient and flexible channel assignment, and develops a new greedy heuristic channel assignment algorithm (termed CLICA) for finding connected, low interference topologies by utilizing multiple channels.
Abstract: We consider the channel assignment problem in a multi-radio wireless mesh network that involves assigning channels to radio interfaces for achieving efficient channel utilization. We propose the notion of a traffic-independent base channel assignment to ease coordination and enable dynamic, efficient and flexible channel assignment. We present a novel formulation of the base channel assignment as a topology control problem, and show that the resulting optimization problem is NP-complete. We then develop a new greedy heuristic channel assignment algorithm (termed CLICA) for finding connected, low interference topologies by utilizing multiple channels. Our extensive simulation studies show that the proposed CLICA algorithm can provide large reduction in interference (even with a small number of radios per node), which in turn leads to significant gains in both link layer and multihop performance in 802.11-based multi-radio mesh networks.
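A toy sketch of the greedy idea (not the actual CLICA algorithm, which assigns channels to radio interfaces while preserving connectivity): visit links in some order and give each one the channel that currently conflicts least with its neighbors:

```python
# Hedged sketch of greedy channel assignment (not the actual CLICA algorithm,
# and ignoring the per-node radio budget): each link takes the channel that
# currently conflicts with the fewest already-assigned interfering links.
from collections import defaultdict

def greedy_assign(links, interferes, n_channels):
    channel = {}
    for l in links:
        conflicts = defaultdict(int)
        for other in interferes[l]:
            if other in channel:
                conflicts[channel[other]] += 1
        # least-conflicting channel, ties broken by lowest channel index
        channel[l] = min(range(n_channels), key=lambda c: (conflicts[c], c))
    return channel

# toy chain of 4 links where adjacent links interfere
interferes = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(greedy_assign([0, 1, 2, 3], interferes, n_channels=2))  # alternates 0/1
```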

421 citations


Proceedings ArticleDOI
28 Aug 2005
TL;DR: Overall the results demonstrate that PeopleNet, with its bazaar concept and peer-to-peer query propagation, can provide a simple and efficient mechanism for seeking information.
Abstract: People often seek information by asking other people even when they have access to vast reservoirs of information such as the Internet and libraries. This is because people are great sources of unique information, especially that which is location-specific, community-specific and time-specific. Social networking is effective because this type of information is often not easily available anywhere else. In this paper, we conceive a wireless virtual social network which mimics the way people seek information via social networking. PeopleNet is a simple, scalable and low-cost architecture for efficient information search in a distributed manner. It uses the infrastructure to propagate queries of a given type to users in specific geographic locations, called bazaars. Within each bazaar, the query is further propagated between neighboring nodes via peer-to-peer connectivity until it finds a matching query. The PeopleNet architecture can overlay easily on existing cellular infrastructure and entails minimal software installation. We identify three metrics for system performance: (i) probability of a match, (ii) time to find a match and (iii) number of matches found by a query. We describe two simple models, called the swap and spread models, for query propagation within a bazaar. We qualitatively argue that the swap model is better with respect to the performance metrics identified and demonstrate this via simulations. Next, we compute analytically the probability of match for the swap model. We show that the probability of match can be significantly improved if, prior to swapping queries, the nodes exchange some limited information about their buffer contents. We propose a simple greedy algorithm which uses this limited information to decide which queries to swap. We show via simulation that this algorithm achieves significantly better performance. Overall our results demonstrate that PeopleNet, with its bazaar concept and peer-to-peer query propagation, can provide a simple and efficient mechanism for seeking information.

337 citations


Journal ArticleDOI
TL;DR: The network reconfiguration problems of one three-feeder distribution system from the literature and one practical distribution network of the Taiwan Power Company (TPC) are solved using the proposed ACSA method, the genetic algorithm (GA), and simulated annealing (SA).

304 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: A greedy pursuit algorithm called simultaneous orthogonal matching pursuit is presented, and it is proved that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error.
Abstract: A simple sparse approximation problem requests an approximation of a given input signal as a linear combination of T elementary signals drawn from a large, linearly dependent collection. An important generalization is simultaneous sparse approximation. Now one must approximate several input signals at once using different linear combinations of the same T elementary signals. This formulation appears, for example, when analyzing multiple observations of a sparse signal that have been contaminated with noise. A new approach to this problem is presented here: a greedy pursuit algorithm called simultaneous orthogonal matching pursuit. The paper proves that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error. This result requires that the collection of elementary signals be weakly correlated, a property that is also known as incoherence. Numerical experiments demonstrate that the algorithm often succeeds, even when the inputs do not meet the hypotheses of the proof.
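A compact sketch of the S-OMP iteration described here (illustrative; the unit-norm dictionary and the fixed number of iterations are assumptions):

```python
# Hedged sketch of simultaneous orthogonal matching pursuit (S-OMP).
# D: dictionary with unit-norm columns; Y: one column per input signal.
import numpy as np

def somp(D, Y, n_atoms):
    residual, support = Y.copy(), []
    for _ in range(n_atoms):
        # greedy step: atom with the largest total correlation across signals
        scores = np.sum(np.abs(D.T @ residual), axis=1)
        scores[support] = -np.inf                  # never pick an atom twice
        support.append(int(np.argmax(scores)))
        # orthogonal step: refit all signals on the current support
        coeffs, *_ = np.linalg.lstsq(D[:, support], Y, rcond=None)
        residual = Y - D[:, support] @ coeffs
    return support, coeffs

rng = np.random.default_rng(2)
D = rng.normal(size=(64, 128))
D /= np.linalg.norm(D, axis=0)
true_support = [5, 40, 99]                          # shared by all signals
Y = D[:, true_support] @ rng.normal(size=(3, 4))    # 4 signals, same 3 atoms
print(sorted(somp(D, Y, 3)[0]))                     # recovers [5, 40, 99]
```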

301 citations


Proceedings ArticleDOI
13 Mar 2005
TL;DR: The virtual coordinate assignment protocol (VCap) is introduced, which defines a virtual coordinate system based on hop distances; it is simple and has very low communication and memory overhead.
Abstract: In this paper we consider the problem of constructing a coordinate system in a sensor network where location information is not available. To this purpose we introduce the virtual coordinate assignment protocol (VCap), which defines a virtual coordinate system based on hop distances. As compared to other approaches, VCap is simple and has very low communication and memory overhead. We compare, by simulation, the performance of greedy routing using our virtual coordinate system with that using the physical coordinates. Results show that the virtual coordinate system can be used to efficiently support geographic routing.
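A toy sketch of the underlying idea (not the VCap protocol itself, which elects its anchors in a distributed way): hop counts to a few assumed anchors serve as virtual coordinates, and greedy forwarding minimizes distance in that space:

```python
# Hedged toy sketch of hop-count virtual coordinates (not the VCap protocol
# itself): coordinates are hop distances to a few anchors, and routing
# greedily reduces the virtual distance to the destination.
from collections import deque

def hop_counts(adj, anchor):
    dist, q = {anchor: 0}, deque([anchor])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def greedy_route(adj, coords, src, dst):
    def d(u):  # squared distance in the virtual coordinate space
        return sum((a - b) ** 2 for a, b in zip(coords[u], coords[dst]))
    path = [src]
    while path[-1] != dst:
        nxt = min(adj[path[-1]], key=d)
        if d(nxt) >= d(path[-1]):
            return None   # stuck in a local minimum, a known greedy failure mode
        path.append(nxt)
    return path

adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
anchor_dists = [hop_counts(adj, a) for a in (0, 4, 2)]   # three anchors
coords = {u: tuple(dist[u] for dist in anchor_dists) for u in adj}
print(greedy_route(adj, coords, src=0, dst=5))           # e.g. [0, 1, 2, 5]
```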

290 citations


Proceedings Article
05 Dec 2005
TL;DR: This work considers an alternative discrete spectral formulation based on variational eigenvalue bounds and provides an effective greedy strategy as well as provably optimal solutions using branch-and-bound search, and it reveals a simple renormalization step that improves approximate solutions obtained by any continuous method.
Abstract: Sparse PCA seeks approximate sparse "eigenvectors" whose projections capture the maximal variance of data. As a cardinality-constrained and non-convex optimization problem, it is NP-hard and is encountered in a wide range of applied fields, from bio-informatics to finance. Recent progress has focused mainly on continuous approximation and convex relaxation of the hard cardinality constraint. In contrast, we consider an alternative discrete spectral formulation based on variational eigenvalue bounds and provide an effective greedy strategy as well as provably optimal solutions using branch-and-bound search. Moreover, the exact methodology used reveals a simple renormalization step that improves approximate solutions obtained by any continuous method. The resulting performance gain of discrete algorithms is demonstrated on real-world benchmark data and in extensive Monte Carlo evaluation trials.
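The greedy strategy for this discrete formulation can be sketched as forward selection on the covariance matrix (a minimal sketch under the standard variational characterization; not the paper's full method, which also includes branch-and-bound and renormalization):

```python
# Hedged sketch of greedy forward selection for sparse PCA: grow the support
# one variable at a time, keeping the index that maximizes the top eigenvalue
# of the corresponding principal submatrix of the covariance.
import numpy as np

def greedy_sparse_pca(cov, k):
    support, best_val = [], 0.0
    for _ in range(k):
        def top_eig(j):
            idx = support + [j]
            return np.linalg.eigvalsh(cov[np.ix_(idx, idx)])[-1]
        j = max((j for j in range(cov.shape[0]) if j not in support), key=top_eig)
        best_val = top_eig(j)
        support.append(j)
    return sorted(support), float(best_val)   # support and variance captured

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))
X[:, 1] += 2 * X[:, 6]            # plant a high-variance, correlated pair
print(greedy_sparse_pca(np.cov(X, rowvar=False), k=2))   # picks up (1, 6)
```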

278 citations


Proceedings ArticleDOI
23 Oct 2005
TL;DR: An O(log OPT) approximation is obtained for a generalization of the orienteering problem in which the profit for visiting each node may vary arbitrarily with time, and the implications for the approximability of several basic optimization problems are interesting.
Abstract: Given an arc-weighted directed graph G = (V, A, ℓ) and a pair of nodes s, t, we seek to find an s-t walk of length at most B that maximizes some given function f of the set of nodes visited by the walk. The simplest case is when we seek to maximize the number of nodes visited: this is called the orienteering problem. Our main result is a quasi-polynomial time algorithm that yields an O(log OPT) approximation for this problem when f is a given submodular set function. We then extend it to the case when a node v is counted as visited only if the walk reaches v in its time window [R(v), D(v)]. We apply the algorithm to obtain several new results. First, we obtain an O(log OPT) approximation for a generalization of the orienteering problem in which the profit for visiting each node may vary arbitrarily with time. This captures the time window problem considered earlier for which, even in undirected graphs, the best approximation ratio known [Bansal, N et al. (2004)] is O(log^2 OPT). The second application is an O(log^2 k) approximation for the k-TSP problem in directed graphs (satisfying the asymmetric triangle inequality). This is the first non-trivial approximation algorithm for this problem. The third application is an O(log^2 k) approximation (in quasi-polynomial time) for the group Steiner problem in undirected graphs, where k is the number of groups. This improves earlier ratios (Garg, N et al.) by a logarithmic factor and almost matches the inapproximability threshold on trees (Halperin and Krauthgamer, 2003). This connection to group Steiner trees also enables us to prove that the problem we consider is hard to approximate to a ratio better than Ω(log^(1-ε) OPT), even in undirected graphs. Even though our algorithm runs in quasi-polynomial time, we believe that the implications for the approximability of several basic optimization problems are interesting.

Journal ArticleDOI
TL;DR: This work presents a framework for finding point correspondences in monocular image sequences over multiple frames by using a polynomial time algorithm for a restriction of the general problem of multiframe point correspondence, which is NP-hard for three or more frames.
Abstract: This work presents a framework for finding point correspondences in monocular image sequences over multiple frames. The general problem of multiframe point correspondence is NP-hard for three or more frames. A polynomial time algorithm for a restriction of this problem is presented and is used as the basis of the proposed greedy algorithm for the general problem. The greedy nature of the proposed algorithm allows it to be used in real-time systems for tracking and surveillance, etc. In addition, the proposed algorithm deals with the problems of occlusion, missed detections, and false positives by using a single noniterative greedy optimization scheme and, hence, reduces the complexity of the overall algorithm as compared to most existing approaches where multiple heuristics are used for the same purpose. While most greedy algorithms for point tracking do not allow the entry and exit of the points from the scene, this is not a limitation for the proposed algorithm. Experiments with real and synthetic data over a wide range of scenarios and system parameters are presented to validate the claims about the performance of the proposed algorithm.
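As a much-simplified illustration of the greedy flavor (the actual algorithm reasons over multiple frames; this sketch matches only two consecutive frames and uses an assumed gating distance):

```python
# Hedged sketch of one greedy correspondence step between two frames (the
# actual algorithm reasons over many frames; the gating distance is assumed).
import math

def greedy_match(pts_a, pts_b, gate=2.0):
    pairs = sorted((math.dist(p, q), i, j)
                   for i, p in enumerate(pts_a)
                   for j, q in enumerate(pts_b)
                   if math.dist(p, q) <= gate)
    used_a, used_b, matches = set(), set(), []
    for _, i, j in pairs:                        # cheapest available pair first
        if i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    exits = [i for i in range(len(pts_a)) if i not in used_a]     # occluded/left
    entries = [j for j in range(len(pts_b)) if j not in used_b]   # new points
    return matches, exits, entries

frame1 = [(0, 0), (5, 5), (9, 1)]
frame2 = [(0.4, 0.1), (5.2, 4.8)]                # the third point disappeared
print(greedy_match(frame1, frame2))              # ([(0, 0), (1, 1)], [2], [])
```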

Proceedings ArticleDOI
05 Apr 2005
TL;DR: This paper presents a correlation based load distribution algorithm that aims at avoiding overload and minimizing end-to-end latency by minimizing load variance and maximizing load correlation.
Abstract: Distributed and parallel computing environments are becoming cheap and commonplace. The availability of large numbers of CPUs makes it possible to process more data at higher speeds. Stream-processing systems are also becoming more important, as broad classes of applications require results in real-time. Since load can vary in unpredictable ways, exploiting the abundant processor cycles requires effective dynamic load distribution techniques. Although load distribution has been extensively studied for traditional pull-based systems, it has not yet been fully studied in the context of push-based continuous query processing. In this paper, we present a correlation-based load distribution algorithm that aims at avoiding overload and minimizing end-to-end latency by minimizing load variance and maximizing load correlation. While finding the optimal solution for such a problem is NP-hard, our greedy algorithm can find reasonable solutions in polynomial time. We present both a global algorithm for initial load distribution and a pair-wise algorithm for dynamic load migration.
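A hedged sketch of the correlation-aware greedy placement idea (not the paper's algorithm; the variance-based objective and the largest-first ordering are assumptions):

```python
# Hedged sketch of correlation-aware greedy operator placement (not the
# paper's algorithm): each operator carries a load time series, and it goes
# to the node where the combined series has the lowest variance, which
# implicitly prefers hosts whose load is least correlated with the operator's.
import numpy as np

def place_operators(op_loads, n_nodes):
    node_load = [np.zeros(op_loads.shape[1]) for _ in range(n_nodes)]
    placement = []
    # place the most variable operators first: they constrain the solution most
    for op in np.argsort(-op_loads.var(axis=1)):
        node = min(range(n_nodes),
                   key=lambda n: float(np.var(node_load[n] + op_loads[op])))
        node_load[node] += op_loads[op]
        placement.append((int(op), node))
    return placement, [float(np.var(l)) for l in node_load]

rng = np.random.default_rng(4)
t = np.linspace(0, 4 * np.pi, 100)
# two anti-correlated bursty operators plus two small noisy ones
op_loads = np.stack([np.sin(t) + 1, 1 - np.sin(t), rng.random(100), rng.random(100)])
print(place_operators(op_loads, n_nodes=2))   # the bursty pair ends up together
```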

Proceedings ArticleDOI
23 Jan 2005
TL;DR: It is shown that the shortest set of loops that generate the fundamental group of any oriented combinatorial 2-manifold, with any given basepoint, can be constructed in O(n log n) time using a straightforward application of Dijkstra's shortest path algorithm.
Abstract: We describe simple greedy algorithms to construct the shortest set of loops that generates either the fundamental group (with a given basepoint) or the first homology group (over any fixed coefficient field) of any oriented 2-manifold. In particular, we show that the shortest set of loops that generate the fundamental group of any oriented combinatorial 2-manifold, with any given basepoint, can be constructed in O(n log n) time using a straightforward application of Dijkstra's shortest path algorithm. This solves an open problem of Colin de Verdiere and Lazarus.

Journal ArticleDOI
TL;DR: It is shown by simulation that the RDG outperforms previously proposed routing graphs in the context of the Greedy perimeter stateless routing (GPSR) protocol, and theoretical bounds on the quality of paths discovered using GPSR are investigated.
Abstract: We propose a new routing graph, the restricted Delaunay graph (RDG), for mobile ad hoc networks. Combined with a node clustering algorithm, the RDG can be used as an underlying graph for geographic routing protocols. This graph has the following attractive properties: 1) it is planar; 2) between any two graph nodes there exists a path whose length, whether measured in terms of topological or Euclidean distance, is only a constant times the minimum length possible; and 3) the graph can be maintained efficiently in a distributed manner when the nodes move around. Furthermore, each node only needs constant time to make routing decisions. We show by simulation that the RDG outperforms previously proposed routing graphs in the context of the Greedy perimeter stateless routing (GPSR) protocol. Finally, we investigate theoretical bounds on the quality of paths discovered using GPSR.

Proceedings ArticleDOI
22 May 2005
TL;DR: This paper presents a monotone PTAS for the generalized assignment problem with any bounded number of parameters per agent, and shows that primal-dual greedy algorithms achieve almost the same approximation ratios for PIPs as randomized rounding.
Abstract: This paper deals with the design of efficiently computable incentive compatible, or truthful, mechanisms for combinatorial optimization problems with multi-parameter agents. We focus on approximation algorithms for NP-hard mechanism design problems. These algorithms need to satisfy certain monotonicity properties to ensure truthfulness. Since most of the known approximation techniques do not fulfill these properties, we study alternative techniques. Our first contribution is a quite general method to transform a pseudopolynomial algorithm into a monotone FPTAS. This can be applied to various problems like, e.g., knapsack, constrained shortest path, or job scheduling with deadlines. For example, the monotone FPTAS for the knapsack problem gives a very efficient, truthful mechanism for single-minded multi-unit auctions. The best previous result for such auctions was a 2-approximation. In addition, we present a monotone PTAS for the generalized assignment problem with any bounded number of parameters per agent. The most efficient way to solve packing integer programs (PIPs) is LP-based randomized rounding, which also is in general not monotone. We show that primal-dual greedy algorithms achieve almost the same approximation ratios for PIPs as randomized rounding. The advantage is that these algorithms are inherently monotone. This way, we can significantly improve the approximation ratios of truthful mechanisms for various fundamental mechanism design problems like single-minded combinatorial auctions (CAs), unsplittable flow routing and multicast routing. Our approximation algorithms can also be used for the winner determination in CAs with general bidders specifying their bids through an oracle.

Journal ArticleDOI
TL;DR: In this paper, a new algorithm for the container loading problem based on the greedy randomized adaptive search procedure (GRASP) paradigm is presented. However, this algorithm is not suitable for large containers.
Abstract: The container-loading problem aims to determine the arrangement of items in a container. We present GRMODGRASP, a new algorithm for the container-loading problem based on the GRASP (greedy randomized adaptive search procedure) paradigm. We evaluate GRMODGRASP's performance in terms of volume use and load stability and by comparing it with nine well-known algorithms. Our approach produces solutions that surpass other approaches' solutions in terms of volume use and cargo stability.
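The GRASP paradigm itself is easy to sketch generically (this is the textbook template with placeholder hooks, shown on a toy knapsack rather than container loading; it is not GRMODGRASP's construction):

```python
# Hedged sketch of the generic GRASP template (randomized greedy construction
# plus local search, keep the best): the hooks are placeholders, demonstrated
# on a toy knapsack rather than on container loading.
import random

def grasp(candidates, value, feasible, improve, iters=50, rcl_size=3, seed=0):
    rng, best, best_val = random.Random(seed), None, float("-inf")
    for _ in range(iters):
        sol, remaining = [], sorted(candidates, key=value, reverse=True)
        while remaining:
            rcl = remaining[:rcl_size]        # restricted candidate list
            pick = rng.choice(rcl)            # randomized greedy choice
            remaining.remove(pick)
            if feasible(sol + [pick]):
                sol.append(pick)
        sol = improve(sol)                    # local search phase
        if sum(map(value, sol)) > best_val:
            best, best_val = sol, sum(map(value, sol))
    return best, best_val

# toy knapsack: items are (value, weight) pairs, capacity 10
items = [(6, 5), (5, 4), (4, 3), (3, 3), (2, 1)]
print(grasp(items,
            value=lambda it: it[0],
            feasible=lambda sol: sum(w for _, w in sol) <= 10,
            improve=lambda sol: sol))         # identity local search
```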

Journal ArticleDOI
05 Sep 2005
TL;DR: This paper presents a new greedy heuristic algorithm for selecting a minimal subset of a test suite T that covers all the requirements covered by T and shows how the algorithm was inspired by the concept analysis framework.
Abstract: Software testing and retesting occurs continuously during the software development lifecycle to detect errors as early as possible and to ensure that changes to existing software do not break the software. Test suites once developed are reused and updated frequently as the software evolves. As a result, some test cases in the test suite may become redundant as the software is modified over time, since the requirements covered by them are also covered by other test cases. Due to the resource and time constraints for re-executing large test suites, it is important to develop techniques to minimize available test suites by removing redundant test cases. In general, the test suite minimization problem is NP-complete. In this paper, we present a new greedy heuristic algorithm for selecting a minimal subset of a test suite T that covers all the requirements covered by T. We show how our algorithm was inspired by the concept analysis framework. We conducted experiments to measure the extent of test suite reduction obtained by our algorithm and prior heuristics for test suite minimization. In our experiments, our algorithm always selected a test suite of the same or smaller size than that selected by prior heuristics, and had comparable time performance.
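For reference, the classical greedy baseline that such heuristics refine looks like this (a minimal sketch; the paper's concept-analysis-based algorithm differs):

```python
# Hedged sketch of the classical greedy baseline for test-suite minimization
# (the paper's concept-analysis-inspired algorithm refines this idea):
# repeatedly keep the test that covers the most still-uncovered requirements.

def minimize_suite(coverage):
    """coverage: {test_name: set of requirements}. Returns a subset of tests
    covering every requirement that the full suite covers."""
    uncovered = set().union(*coverage.values())
    kept = []
    while uncovered:
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        kept.append(best)
        uncovered -= coverage[best]
    return kept

suite = {
    "t1": {"r1", "r2", "r3"},
    "t2": {"r2", "r4"},
    "t3": {"r4"},            # redundant: r4 is already covered by t2
    "t4": {"r5"},
}
print(minimize_suite(suite))  # ['t1', 't2', 't4'] -- t3 is dropped
```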

Journal ArticleDOI
TL;DR: This paper proposes a new greedy algorithm, called S-MIS, which uses a Steiner tree to construct a CDS within a factor of 4.8 + ln 5 of the optimal solution; a distributed version of the algorithm is also introduced.
Abstract: Since no fixed infrastructure and no centralized management are present in wireless networks, a connected dominating set (CDS) of the graph representing the network is widely used as a virtual backbone. Constructing a minimum CDS is NP-hard. In this paper, we propose a new greedy algorithm, called S-MIS, which with the help of a Steiner tree can construct a CDS within a factor of 4.8 + ln 5 of the optimal solution. We also introduce the distributed version of this algorithm. We prove that the proposed algorithm is better than the current best performance ratio, which is 6.8. A simulation is conducted to compare S-MIS with its variation, rS-MIS. The simulation shows that the sizes of the CDSs generated by S-MIS and rS-MIS are almost the same.
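A rough sketch of the two-phase MIS-plus-connectors pattern that algorithms in this family follow (illustrative; this is not the exact S-MIS procedure, and it assumes a connected network graph):

```python
# Hedged sketch of a two-phase connected-dominating-set construction in the
# spirit of MIS-based algorithms (not the exact S-MIS procedure): build a
# maximal independent set of dominators, then greedily add connectors until
# the set induces a connected subgraph. Assumes the network graph is connected.

def greedy_cds(adj):
    # phase 1: greedy maximal independent set (high-degree nodes first);
    # a maximal independent set is automatically a dominating set
    cds, covered = set(), set()
    for u in sorted(adj, key=lambda u: -len(adj[u])):
        if u not in covered:
            cds.add(u)
            covered |= adj[u] | {u}
    # phase 2: add the non-CDS node touching the most CDS nodes until the
    # CDS is connected (a crude stand-in for the Steiner-tree step)
    while not _connected(adj, cds):
        u = max((v for v in adj if v not in cds), key=lambda v: len(adj[v] & cds))
        cds.add(u)
    return cds

def _connected(adj, nodes):
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(v for v in adj[u] if v in nodes)
    return seen == nodes

adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}
print(sorted(greedy_cds(adj)))   # a small connected dominating set, e.g. [1, 2, 4]
```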

Journal ArticleDOI
TL;DR: This paper uses a greedy algorithm to minimize the number of ungated flights and the total walking distance or connection time, together with a hybrid of simulated annealing and tabu search.

Proceedings ArticleDOI
23 Jan 2005
TL;DR: The adaptivity gap is investigated for these problems: the maximum ratio between the expected values achieved by optimal adaptive and non-adaptive policies, and the hardness results for deterministic PIP are improved.
Abstract: We study stochastic variants of Packing Integer Programs (PIP) --- the problems of finding a maximum-value 0/1 vector x satisfying Ax ≤ b, with A and b nonnegative. Many combinatorial problems belong to this broad class, including the knapsack problem, maximum clique, stable set, matching, hypergraph matching (aka set packing), b-matching, and others. PIP can also be seen as a "multidimensional" knapsack problem where we wish to pack a maximum-value collection of items with vector-valued sizes. In our stochastic setting, the vector-valued size of each item is known to us a priori only as a probability distribution, and the size of an item is instantiated once we commit to including the item in our solution. Following the framework of [3], we consider both adaptive and non-adaptive policies for solving such problems, adaptive policies having the flexibility of being able to make decisions based on the instantiated sizes of items already included in the solution. We investigate the adaptivity gap for these problems: the maximum ratio between the expected values achieved by optimal adaptive and non-adaptive policies. We show tight bounds on the adaptivity gap for set packing and b-matching, and we also show how to find efficiently non-adaptive policies approximating the adaptive optimum. For instance, we can approximate the adaptive optimum for stochastic set packing to within O(d^(1/2)), which is not only optimal with respect to the adaptivity gap, but is also the best known approximation factor in the deterministic case. It is known that there is no polynomial-time d^(1/2-ε) approximation for set packing, unless NP = ZPP. Similarly, for b-matching, we obtain algorithmically a tight bound on the adaptivity gap of O(λ) where λ satisfies Σ_j 1/λ^(b_j+1) = 1. For general Stochastic Packing, we prove that a simple greedy algorithm provides an O(d)-approximation to the adaptive optimum. For A ∈ [0,1]^(d×n), we provide an O(λ) approximation where Σ_j 1/λ^(b_j) = 1 (for b = (B, B, ..., B), we get λ = d^(1/B)). We also improve the hardness results for deterministic PIP: in the general case, we prove that a polynomial-time d^(1-ε)-approximation algorithm would imply NP = ZPP. In the special case when A ∈ [0,1]^(d×n) and b = (B, B, ..., B), we show that a d^(1/B-ε)-approximation would imply NP = ZPP. Finally, we prove that it is PSPACE-hard to find the optimal adaptive policy for Stochastic Packing in any fixed dimension d ≥ 2.

Book ChapterDOI
08 Sep 2005
TL;DR: Results confirm the utility of feature selection for clustering and the theoretical superiority of wrapper methods, while also suggesting that filters are a reasonable alternative with limited computational cost.
Abstract: Feature selection for clustering is a problem rarely addressed in the literature. Although recently there has been some work in the area, there is a lack of extensive empirical evaluation to assess the potential of each method. In this paper, we propose a new implementation of a wrapper and adapt an existing filter method to perform experiments over several data sets and compare both approaches. Results confirm the utility of feature selection for clustering and the theoretical superiority of wrapper methods. However, they also expose problems that arise from using greedy search procedures, and suggest that filters are a reasonable alternative with limited computational cost.
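A minimal sketch of a greedy forward wrapper for clustering (illustrative; the use of k-means with a silhouette index and the stall-based stopping rule are assumptions rather than the paper's implementation; scikit-learn is assumed available):

```python
# Hedged sketch of a greedy forward wrapper for clustering-oriented feature
# selection (assumptions: k-means as the clusterer, silhouette as the quality
# index, stop when the score stalls; scikit-learn is assumed available).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def forward_select(X, k, max_features):
    selected, best_overall = [], -1.0
    while len(selected) < max_features:
        scores = {}
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = X[:, selected + [j]]
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(cols)
            scores[j] = silhouette_score(cols, labels)
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_overall:
            break                                   # greedy search stalls
        selected.append(j_best)
        best_overall = scores[j_best]
    return selected, best_overall

rng = np.random.default_rng(5)
clusters = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
X = np.hstack([clusters, rng.normal(0, 1, (100, 3))])  # 3 irrelevant features
print(forward_select(X, k=2, max_features=3))          # favors columns 0 and 1
```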

Proceedings ArticleDOI
13 Mar 2005
TL;DR: This paper proposes an optimal solution that finds a target-watching schedule achieving the maximal lifetime of the surveillance system, computing the maximal lifetime and a workload matrix using linear programming techniques.
Abstract: This paper addresses the maximal lifetime scheduling problem in sensor surveillance networks. Given a set of sensors and targets in a Euclidean plane, where a sensor can watch only one target at a time, our task is to schedule sensors to watch targets, such that the lifetime of the surveillance system is maximized, where the lifetime is the duration over which all targets are watched. We propose an optimal solution to find the target watching schedule for sensors that achieves the maximal lifetime. Our solution consists of three steps: 1) computing the maximal lifetime of the surveillance system and a workload matrix by using linear programming techniques; 2) decomposing the workload matrix into a sequence of schedule matrices that can achieve the maximal lifetime; 3) obtaining a target watching timetable for each sensor based on the schedule matrices. Simulations have been conducted to study the complexity of our proposed method and to compare it with the performance of a greedy method.

Journal ArticleDOI
TL;DR: A general boosting technique for Textual Data Compression that can turn any memoryless compressor into a compression algorithm that uses the “best possible” contexts, and is very simple and optimal in terms of time.
Abstract: We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. It displays the following remarkable properties: (a) it can turn any memoryless compressor into a compression algorithm that uses the "best possible" contexts; (b) it is very simple and optimal in terms of time; and (c) it admits a decompression algorithm again optimal in time. To the best of our knowledge, this is the first boosting technique displaying these properties. Technically, our boosting technique builds upon three main ingredients: the Burrows--Wheeler Transform, the Suffix Tree data structure, and a greedy algorithm to process them. Specifically, we show that there exists a proper partition of the Burrows--Wheeler Transform of a string s that shows a deep combinatorial relation with the kth order entropy of s. That partition can be identified via a greedy processing of the suffix tree of s with the aim of minimizing a proper objective function over its nodes. The final compressed string is then obtained by compressing individually each substring of the partition by means of the base compressor we wish to boost. Our boosting technique is inherently combinatorial because it does not need to assume any prior probabilistic model about the source emitting s, and it does not deploy any training, parameter estimation, or learning. Various corollaries are derived from this main achievement. Among others, we show analytically that using our booster, we get better compression algorithms than some of the best existing ones, that is, LZ77, LZ78, PPMC and the ones derived from the Burrows--Wheeler Transform. Further, we settle analytically some long-standing open problems about the algorithmic structure and the performance of BWT-based compressors. Namely, we provide the first family of BWT algorithms that do not use Move-To-Front or Symbol Ranking as a part of the compression process.

Proceedings ArticleDOI
15 May 2005
TL;DR: A framework is developed to evaluate a large class of greedy methods that build suites one test at a time, providing a platform for optimizing the accuracy and speed of "one-test-at-a-time" greedy methods.
Abstract: Greedy algorithms for the construction of software interaction test suites are studied. A framework is developed to evaluate a large class of greedy methods that build suites one test at a time. Within this framework are many instantiations of greedy methods generalizing those in the literature. Greedy algorithms are popular when the time for test suite construction is of paramount concern. We focus on the size of the test suite produced by each instantiation. Experiments are analyzed using statistical techniques to determine the importance of the implementation decisions within the framework. This framework provides a platform for optimizing the accuracy and speed of "one-test-at-a-time" greedy methods.
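One simple instantiation of the "one-test-at-a-time" framework is sketched below (illustrative; the random candidate pool and the pairwise coverage goal are assumptions, not a specific algorithm from the paper):

```python
# Hedged sketch of a "one-test-at-a-time" greedy construction for pairwise
# interaction test suites (one generic instantiation of the framework, not a
# specific algorithm from the paper): each new test is picked from a pool of
# random candidates to cover as many still-uncovered value pairs as possible.
import itertools
import random

def pairs_covered(test):
    return {((i, test[i]), (j, test[j]))
            for i, j in itertools.combinations(range(len(test)), 2)}

def greedy_suite(domains, pool=50, seed=0):
    rng = random.Random(seed)
    uncovered = set()
    for i, j in itertools.combinations(range(len(domains)), 2):
        uncovered |= {((i, a), (j, b)) for a in domains[i] for b in domains[j]}
    suite = []
    while uncovered:
        # sample candidate tests; keep the one covering most uncovered pairs
        best = max((tuple(rng.choice(d) for d in domains) for _ in range(pool)),
                   key=lambda t: len(pairs_covered(t) & uncovered))
        suite.append(best)
        uncovered -= pairs_covered(best)
    return suite

domains = [[0, 1], [0, 1], [0, 1, 2]]   # three parameters with mixed levels
suite = greedy_suite(domains)
print(len(suite), suite)                # a small pairwise-covering suite
```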

Proceedings ArticleDOI
23 Jun 2005
TL;DR: It is shown that a simulated annealing approach is well suited to finding significant biclusters in gene expression data, a problem whose difficulty grows exponentially with the size of the dataset and for which heuristic approaches such as Cheng and Church's greedy node deletion algorithm are required.
Abstract: In a gene expression data matrix, a bicluster is a grouping of a subset of genes and a subset of conditions which show correlated levels of expression activity. The difficulty of finding significant biclusters in gene expression data grows exponentially with the size of the dataset, and heuristic approaches such as Cheng and Church's greedy node deletion algorithm are required. It is to be expected that stochastic search techniques such as genetic algorithms or simulated annealing might produce better solutions than greedy search. In this paper we show that a simulated annealing approach is well suited to this problem, and we present a comparative evaluation of simulated annealing and node deletion on a variety of datasets. We show that simulated annealing discovers more significant biclusters in many cases.
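A skeleton of the simulated-annealing search for a single bicluster (illustrative; the move set, the geometric cooling schedule, and the Cheng-Church-style mean-squared-residue score are simplifying assumptions, not the paper's exact configuration):

```python
# Hedged sketch of simulated annealing for a single bicluster (the move set,
# geometric cooling, and Cheng-Church-style mean squared residue are
# simplifying assumptions).
import math
import random
import numpy as np

def msr(data, rows, cols):
    """Cheng-Church mean squared residue of a submatrix (lower is better)."""
    sub = data[np.ix_(sorted(rows), sorted(cols))]
    res = sub - sub.mean(1, keepdims=True) - sub.mean(0, keepdims=True) + sub.mean()
    return float((res ** 2).mean())

def anneal_bicluster(data, iters=20000, temp=1.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    n, m = data.shape
    rows, cols = set(range(n)), set(range(m))     # start from the full matrix
    score = msr(data, rows, cols)
    for _ in range(iters):
        axis, i = (rows, rng.randrange(n)) if rng.random() < 0.5 else (cols, rng.randrange(m))
        axis ^= {i}                               # flip one row/column in or out
        if len(rows) < 2 or len(cols) < 2:
            axis ^= {i}                           # keep the bicluster non-trivial
            continue
        new = msr(data, rows, cols)
        # Metropolis rule: accept improvements, sometimes accept worse moves
        if new < score or rng.random() < math.exp((score - new) / temp):
            score = new
        else:
            axis ^= {i}                           # undo the rejected move
        temp *= cooling
    return sorted(rows), sorted(cols), score

rng = np.random.default_rng(6)
data = rng.normal(size=(30, 20))
data[np.ix_(range(10), range(8))] += np.add.outer(np.arange(10.0), np.arange(8.0))
rows, cols, residue = anneal_bicluster(data)
print(len(rows), len(cols), round(residue, 4))
```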

Journal ArticleDOI
TL;DR: This paper proposes two greedy algorithms for packing unequal circles into a two-dimensional rectangular container; the first selects the next circle to place according to the maximum-hole degree rule, which is inspired by how people pack in practice.
Abstract: In this paper, we study the problem of packing unequal circles into a two-dimensional rectangular container. We solve this problem by proposing two greedy algorithms. The first algorithm, denoted by B1.0, selects the next circle to place according to the maximum-hole degree rule, which is inspired by how people pack in practice. The second algorithm, denoted by B1.5, improves B1.0 with a self-look-ahead search strategy. The comparisons with the published methods on several instances taken from the literature show the good performance of our approach.

Journal ArticleDOI
TL;DR: An innovative aspect of the presented approach is that it combines all relevant subproblems, concerning node locations, node sizes, and object placement, and solves them jointly in a single optimization step.

Journal ArticleDOI
TL;DR: This paper introduces the novel idea of information-directed routing, in which routing is formulated as a joint optimization of data transport and information aggregation, and derives information constraints from realistic signal models, and presents several routing algorithms that find near-optimal solutions for the joint optimization problem.
Abstract: In a sensor network, data routing is tightly coupled to the needs of a sensing task, and hence the application semantics. This paper introduces the novel idea of information-directed routing, in which routing is formulated as a joint optimization of data transport and information aggregation. The routing objective is to minimize communication cost, while maximizing information gain, differing from routing considerations for more general ad hoc networks. The paper uses the concrete problem of locating and tracking possibly moving signal sources as an example of information generation process, and considers two common information extraction patterns in a sensor network: routing a user query from an arbitrary entry node to the vicinity of signal sources and back, or to a prespecified exit node, maximizing information accumulated along the path. We derive information constraints from realistic signal models, and present several routing algorithms that find near-optimal solutions for the joint optimization problem. Simulation results have demonstrated that information-directed routing is a significant improvement over a previously reported greedy algorithm, as measured by sensing quality such as localization and tracking accuracy and communication quality such as success rate in routing around sensor holes.

Journal ArticleDOI
TL;DR: It is shown that phylogenetic diversity has an attractive mathematical property that ensures the following problem can be solved easily by the greedy algorithm: find a subset of the species of any given size k of maximal phylogenetic diversity.
Abstract: Given a phylogenetic tree with leaves labeled by a collection of species, and with weighted edges, the "phylogenetic diversity" of any subset of the species is the sum of the edge weights of the minimal subtree connecting the species. This measure is relevant in biodiversity conservation where one may wish to compare different subsets of species according to how much evolutionary variation they encompass. In this note we show that phylogenetic diversity has an attractive mathematical property that ensures that we can solve the following problem easily by the greedy algorithm: find a subset of the species of any given size k of maximal phylogenetic diversity. We also describe an extension of this result that also allows weights to be assigned to species.
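The greedy algorithm this property licenses is short (a minimal sketch; the child-to-(parent, weight) tree encoding is an assumption of this illustration, and the paper's result is what guarantees the greedy choices are optimal for this objective):

```python
# Hedged sketch of the greedy algorithm for maximal phylogenetic diversity.
# Tree encoding (an assumption of this illustration): child -> (parent, edge
# weight); the chosen subtree is tracked by the set of child endpoints whose
# parent edges are already counted.

def pd_gain(parents, counted, leaf):
    """Weight (and edges) newly added if `leaf` joins the selected subset."""
    gain, new, node = 0.0, [], leaf
    while node in parents and node not in counted:
        parent, w = parents[node]
        gain += w
        new.append(node)
        node = parent
    return gain, new

def greedy_pd(parents, leaves, k):
    counted, chosen, total = set(), [], 0.0
    for _ in range(k):
        best = max(leaves - set(chosen),
                   key=lambda l: pd_gain(parents, counted, l)[0])
        gain, new = pd_gain(parents, counted, best)
        chosen.append(best)
        counted.update(new)
        total += gain
    return chosen, total

# toy tree: root R, clade u = {A, B} on a long stem, clade v = {C, D}
parents = {"A": ("u", 1.0), "B": ("u", 1.0), "u": ("R", 3.0),
           "C": ("v", 2.0), "D": ("v", 2.0), "v": ("R", 1.0)}
print(greedy_pd(parents, {"A", "B", "C", "D"}, k=2))   # picks one leaf per clade
```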