
Showing papers on "Greedy algorithm published in 2015"


Journal ArticleDOI
TL;DR: In this article, the phase retrieval problem is cast as a nonconvex quadratic program over a complex phase vector and formulated a tractable relaxation (called PhaseCut) similar to the classical MaxCut semidefinite program.
Abstract: Phase retrieval seeks to recover a signal $x \in \mathbb{C}^p$ from the amplitude $|Ax|$ of linear measurements $Ax \in \mathbb{C}^n$. We cast the phase retrieval problem as a non-convex quadratic program over a complex phase vector and formulate a tractable relaxation (called PhaseCut) similar to the classical MaxCut semidefinite program. We solve this problem using a provably convergent block coordinate descent algorithm whose structure is similar to that of the original greedy algorithm in Gerchberg and Saxton (Optik 35:237-246, 1972), where each iteration is a matrix vector product. Numerical results show the performance of this approach over three different phase retrieval problems, in comparison with greedy phase retrieval algorithms and matrix completion formulations.

502 citations
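The Gerchberg-Saxton scheme the abstract refers to alternates between fitting a signal to the current phase estimate and re-imposing the measured amplitudes. Below is a minimal Python sketch of that alternating-projection idea (not the PhaseCut relaxation itself); the measurement matrix A, amplitude vector b, and iteration count are placeholder inputs for illustration.

```python
import numpy as np

def gerchberg_saxton(A, b, iters=200, seed=0):
    """Sketch of Gerchberg-Saxton-style alternating projections for
    recovering x (up to a global phase) from b = |A x|.

    Once the pseudo-inverse is precomputed, each iteration is dominated
    by matrix-vector products, as the abstract notes.
    """
    rng = np.random.default_rng(seed)
    A_pinv = np.linalg.pinv(A)                        # least-squares solver for A x = y
    u = np.exp(2j * np.pi * rng.random(b.shape[0]))   # random initial phases
    for _ in range(iters):
        x = A_pinv @ (b * u)                          # signal consistent with current phases
        Ax = A @ x
        u = Ax / np.maximum(np.abs(Ax), 1e-12)        # keep phases, re-impose amplitudes b
    return x
```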


Proceedings Article
25 Jan 2015
TL;DR: In this article, a linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint was proposed, which can achieve a (1 − 1/e − ε) approximation guarantee to the optimum solution in time linear in the size of the data and independent of the cardinality constraint.
Abstract: Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can achieve a (1 − 1/e − ε) approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTIC-GREEDY does not evaluate the whole set of data points even once and still achieves indistinguishable results compared to lazy greedy.

320 citations
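As described in the abstract above, STOCHASTIC-GREEDY replaces the full scan over all remaining elements in each greedy round with a scan over a small random sample. A minimal sketch under assumed inputs (a list of items and a set function f supplied by the caller); the sample size of roughly (n/k)·log(1/ε) follows the stated guarantee:

```python
import math
import random

def stochastic_greedy(ground_set, f, k, eps=0.1, seed=0):
    """Sketch of stochastic greedy for monotone submodular maximization.

    f(S) evaluates the objective on a list S (e.g. coverage); each of the
    k rounds samples about (n/k) * log(1/eps) candidates and adds the one
    with the largest marginal gain, giving the (1 - 1/e - eps) guarantee
    in expectation stated in the abstract.
    """
    rng = random.Random(seed)
    n = len(ground_set)
    sample_size = max(1, min(n, math.ceil((n / k) * math.log(1 / eps))))
    selected, remaining = [], list(ground_set)
    current = f(selected)
    for _ in range(k):
        if not remaining:
            break
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        best = max(candidates, key=lambda e: f(selected + [e]) - current)
        selected.append(best)
        remaining.remove(best)
        current = f(selected)
    return selected
```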


Proceedings ArticleDOI
13 Apr 2015
TL;DR: This paper formulates the online virtual function mapping and scheduling problem and proposes a set of algorithms for solving it: three greedy algorithms and a tabu search-based heuristic.
Abstract: Network function virtualization has received attention from both academia and industry as an important shift in the deployment of telecommunication networks and services. It is being proposed as a path towards cost efficiency, reduced time-to-markets, and enhanced innovativeness in telecommunication service provisioning. However, efficiently running virtualized services is not trivial as, among other initialization steps, it requires first mapping virtual networks onto physical networks, and thereafter mapping and scheduling virtual functions onto the virtual networks. This paper formulates the online virtual function mapping and scheduling problem and proposes a set of algorithms for solving it. Our main objective is to propose simple algorithms that may be used as a basis for future work in this area. To this end, we propose three greedy algorithms and a tabu search-based heuristic. We carry out evaluations of these algorithms considering parameters such as successful service mappings, total service processing times, revenue, cost etc, under varying network conditions. Simulations show that the tabu search-based algorithm performs only slightly better than the best greedy algorithm.

287 citations


Journal ArticleDOI
08 Sep 2015
TL;DR: A powerful sampling technique that aids in parallelization of sequential algorithms and yields efficient algorithms that run in a logarithmic number of rounds while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm.
Abstract: Greedy algorithms are practitioners’ best friends—they are intuitive, are simple to implement, and often lead to very good solutions. However, implementing greedy algorithms in a distributed setting is challenging since the greedy choice is inherently sequential, and it is not clear how to take advantage of the extra processing power. Our main result is a powerful sampling technique that aids in parallelization of sequential algorithms. Armed with this primitive, we then adapt a broad class of greedy algorithms to the MapReduce paradigm; this class includes maximum cover and submodular maximization subject to p-system constraint problems. Our method yields efficient algorithms that run in a logarithmic number of rounds while obtaining solutions that are arbitrarily close to those produced by the standard sequential greedy algorithm. We begin with algorithms for modular maximization subject to a matroid constraint and then extend this approach to obtain approximation algorithms for submodular maximization subject to knapsack or p-system constraints.

206 citations


Journal ArticleDOI
TL;DR: It is found that random undetectable attacks can be accomplished by modifying only a much smaller number of measurements than this value, and this greedy algorithm has almost the same performance as the brute-force method, but without the combinatorial complexity.
Abstract: This paper discusses malicious false data injection attacks on the wide area measurement and monitoring system in smart grids. First, methods of constructing sparse stealth attacks are developed for two typical scenarios: 1) random attacks in which arbitrary measurements can be compromised; and 2) targeted attacks in which specified state variables are modified. It is already demonstrated that stealth attacks can always exist if the number of compromised measurements exceeds a certain value. In this paper, it is found that random undetectable attacks can be accomplished by modifying only a much smaller number of measurements than this value. It is well known that protecting the system from malicious attacks can be achieved by making a certain subset of measurements immune to attacks. An efficient greedy search algorithm is then proposed to quickly find this subset of measurements to be protected to defend against stealth attacks. It is shown that this greedy algorithm has almost the same performance as the brute-force method, but without the combinatorial complexity. Third, a robust attack detection method is discussed. The detection method is designed based on the robust principal component analysis problem by introducing element-wise constraints. This method is shown to be able to identify the real measurements, as well as attacks even when only partial observations are collected. The simulations are conducted based on IEEE test systems.

197 citations


Journal ArticleDOI
TL;DR: An algorithm is presented to resolve the unbalanced energy consumption caused by the long-distance data transmission of some nodes in the chain formed by the greedy algorithm.
Abstract: Sensor networks consisting of nodes with limited battery power and wireless communication capabilities are deployed to collect useful information from the field. The main idea in PEGASIS is for each node to receive from and transmit to close neighbors and take turns being the leader for transmission to the BS. This approach distributes the energy load evenly among the sensor nodes in the network. Sensor nodes are randomly deployed in the sensor field, and therefore, the i-th node is at a random location. The nodes are organized to form a chain, which can be accomplished by the sensor nodes themselves using a greedy algorithm. An algorithm is presented to resolve the unbalanced energy consumption caused by the long-distance data transmission of some nodes in the chain formed by the greedy algorithm.

188 citations
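The greedy chain construction mentioned in the abstract simply grows the chain one hop at a time toward the closest unchained node. A small sketch, assuming a list of node identifiers and a caller-supplied dist(a, b) function (PEGASIS itself starts the chain at the node farthest from the base station):

```python
def greedy_chain(nodes, dist):
    """Greedy chain construction in the spirit of PEGASIS.

    Starting from the first node in the list, repeatedly append the
    closest node that is not yet part of the chain.
    """
    chain = [nodes[0]]
    remaining = set(nodes[1:])
    while remaining:
        nxt = min(remaining, key=lambda v: dist(chain[-1], v))
        chain.append(nxt)
        remaining.remove(nxt)
    return chain
```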


Journal ArticleDOI
TL;DR: This paper discusses the definition of modularity (Q) used as a metric for community quality and then it reviews the modularity maximization approaches which were used for community detection and introduces two novel fine-tuned community detection algorithms that iteratively attempt to improve the community quality measurements by splitting and merging the given network community structure.
Abstract: In this paper, we first discuss the definition of modularity (Q) used as a metric for community quality and then we review the modularity maximization approaches which were used for community detection in the last decade. Then, we discuss two opposite yet coexisting problems of modularity optimization: in some cases, it tends to favor small communities over large ones while in others, large communities over small ones (so called the resolution limit problem). Next, we overview several community quality metrics proposed to solve the resolution limit problem and discuss Modularity Density (Qds) which simultaneously avoids the two problems of modularity. Finally, we introduce two novel fine-tuned community detection algorithms that iteratively attempt to improve the community quality measurements by splitting and merging the given network community structure. The first of them, referred to as Fine-tuned Q, is based on modularity (Q) while the second one is based on Modularity Density (Qds) and denoted as Fine-tuned Qds. Then, we compare the greedy algorithm of modularity maximization (denoted as Greedy Q), Fine-tuned Q, and Fine-tuned Qds on four real networks, and also on the classical clique network and the LFR benchmark networks, each of which is instantiated by a wide range of parameters. The results indicate that Fine-tuned Qds is the most effective among the three algorithms discussed. Moreover, we show that Fine-tuned Qds can be applied to the communities detected by other algorithms to significantly improve their results.

175 citations
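For reference, the modularity Q discussed above is usually taken to be the standard Newman-Girvan quantity; the abstract does not restate the formula, so this definition is assumed:

```latex
% Modularity of a partition {c_i} of a graph with adjacency matrix A,
% node degrees k_i, and m edges; delta is 1 when i and j share a community.
Q = \frac{1}{2m}\sum_{i,j}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)
```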


Journal ArticleDOI
TL;DR: This paper focuses on sensor scheduling for state estimation in a setting that consists of a network of noisy sensors and a discrete-time linear system with process noise, and shows that most commonly used estimation error metrics are not, in general, submodular functions.

158 citations


Proceedings Article
04 May 2015
TL;DR: This paper studies the interplay between Integer Linear Programming (ILP) and greedy algorithms to generate solutions optimized for latency, pipeline occupancy, or power consumption, and suggests the best greedy approach.
Abstract: Programmable switching chips are becoming more commonplace, along with new packet processing languages to configure the forwarding behavior. Our paper explores the design of a compiler for such switching chips, in particular how to map logical lookup tables to physical tables, while meeting data and control dependencies in the program. We study the interplay between Integer Linear Programming (ILP) and greedy algorithms to generate solutions optimized for latency, pipeline occupancy, or power consumption. ILP is slower but more likely to fit hard cases; further, ILP can be used to suggest the best greedy approach. We compile benchmarks from real production networks to two different programmable switch architectures: RMT and Intel's FlexPipe. Greedy solutions can fail to fit and can require up to 38% more stages, 42% more cycles, or 45% more power for some benchmarks. Our analysis also identifies critical resources in chips. For a complicated use case, doubling the TCAM per stage reduces the minimum number of stages needed by 12.5%.

155 citations


Proceedings ArticleDOI
24 Aug 2015
TL;DR: In this paper, the authors investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching with the goal of minimizing average content access delay.
Abstract: We investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching with the goal of minimizing average content access delay. Here, content can either be accessed directly from a back-end server (where content resides permanently) or be obtained from one of multiple in-network caches. To access a piece of content, a user must decide whether to route its request to a cache or to the back-end server. Additionally, caches must decide which content to cache. We investigate the problem complexity of two problem formulations, where the direct path to the back-end server is modeled as i) a congestion-sensitive or ii) a congestion-insensitive path, reflecting whether or not the delay of the uncached path to the back-end server depends on the user request load, respectively. We show that the problem is NP-complete in both cases. We prove that under the congestion-insensitive model the problem can be solved optimally in polynomial time if each piece of content is requested by only one user, or when there are at most two caches in the network. We also identify a structural property of the user-cache graph that potentially makes the problem NP-complete. For the congestion-sensitive model, we prove that the problem remains NP-complete even if there is only one cache in the network and each content is requested by only one user. We show that approximate solutions can be found for both models within a (1 − 1/e) factor of the optimal solution, and demonstrate a greedy algorithm that is found to be within 1% of optimal for small problem sizes. Through trace-driven simulations we evaluate the performance of our greedy algorithms, which show up to a 50% reduction in average delay over solutions based on LRU content caching.

147 citations


Proceedings ArticleDOI
14 Jun 2015
TL;DR: This paper shows that a simple greedy algorithm results in a 1/3-approximate randomized composable core-set for submodular maximization under a cardinality constraint, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with O(n) total communication complexity for either monotone or non-monotone functions.
Abstract: An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has been recently applied to solve diversity maximization problems as well as several clustering problems [7,15,8]. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique [15]. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a random clustering of the data. We employ this technique for the coverage, monotone and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in distributed and streaming settings. The effectiveness of this technique has been confirmed empirically for several machine learning applications [22], and our proof provides a theoretical foundation to this idea. In summary, we show that a simple greedy algorithm results in a 1/3-approximate randomized composable core-set for submodular maximization under a cardinality constraint. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with O(n) total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm PseudoGreedy, we present an improved 0.545-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor 1/2 in a constant number of rounds.

Journal ArticleDOI
01 May 2015
TL;DR: Results show that the proposed TMIIG algorithm is relatively more effective in minimizing the makespan than other existing well-performing heuristic algorithms.
Abstract: Highlights: We propose an improved IG algorithm for the no-wait flowshop scheduling problem. The proposed algorithm incorporates a Tabu-based reconstruction strategy. Simulation results confirm the advantages of utilizing the new reconstruction scheme. Our algorithm is more effective than other competitive algorithms in the literature. 43 new upper bound solutions for the problem have been made available. This paper proposes a Tabu-mechanism improved iterated greedy (TMIIG) algorithm to solve the no-wait flowshop scheduling problem with a makespan criterion. The idea of seeking further improvement in the iterated greedy (IG) algorithm framework is based on the observation that the construction phase of the original IG algorithm may not achieve good performance in escaping from local minima when incorporating the insertion neighborhood search. To overcome this limitation, we have modified the IG algorithm by utilizing a Tabu-based reconstruction strategy to enhance its exploration ability. A powerful neighborhood search method that involves insert, swap, and double-insert moves is then applied to obtain better solutions from the reconstructed solution in the previous step. Empirical results on several benchmark problem instances and those generated randomly confirm the advantages of utilizing the new reconstruction scheme. In addition, our results also show that the proposed TMIIG algorithm is relatively more effective in minimizing the makespan than other existing well-performing heuristic algorithms.
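For readers unfamiliar with the iterated greedy (IG) framework that TMIIG builds on, the sketch below shows the generic destruction/construction loop: remove a few jobs at random, then greedily reinsert each at its best position. It deliberately omits the paper's Tabu-based reconstruction and neighborhood search; makespan, the job list, and the parameter values are assumptions for illustration.

```python
import random

def iterated_greedy(jobs, makespan, d=4, iters=200, seed=0):
    """Generic iterated-greedy sketch for permutation scheduling.

    makespan(seq) evaluates a permutation of jobs; d jobs are removed at
    random (destruction) and greedily reinserted at their best positions
    (construction); better-or-equal permutations are accepted.
    """
    rng = random.Random(seed)
    cur = list(jobs)
    best = list(cur)
    for _ in range(iters):
        partial = list(cur)
        removed = [partial.pop(rng.randrange(len(partial))) for _ in range(d)]
        for job in removed:
            positions = range(len(partial) + 1)
            p = min(positions, key=lambda i: makespan(partial[:i] + [job] + partial[i:]))
            partial.insert(p, job)
        if makespan(partial) <= makespan(cur):
            cur = partial
            if makespan(cur) < makespan(best):
                best = list(cur)
    return best
```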

Proceedings ArticleDOI
18 May 2015
TL;DR: This work proposes a greedy algorithm to select the most relevant meta-paths, presents a data structure to enable efficient execution of this algorithm, and incorporates hierarchical relationships among node classes in the solutions.
Abstract: The Heterogeneous Information Network (HIN) is a graph data model in which nodes and edges are annotated with class and relationship labels. Large and complex datasets, such as Yago or DBLP, can be modeled as HINs. Recent work has studied how to make use of these rich information sources. In particular, meta-paths, which represent sequences of node classes and edge types between two nodes in a HIN, have been proposed for such tasks as information retrieval, decision making, and product recommendation. Current methods assume meta-paths are found by domain experts. However, in a large and complex HIN, retrieving meta-paths manually can be tedious and difficult. We thus study how to discover meta-paths automatically. Specifically, users are asked to provide example pairs of nodes that exhibit high proximity. We then investigate how to generate meta-paths that can best explain the relationship between these node pairs. Since this problem is computationally intractable, we propose a greedy algorithm to select the most relevant meta-paths. We also present a data structure to enable efficient execution of this algorithm. We further incorporate hierarchical relationships among node classes in our solutions. Extensive experiments on real-world HIN show that our approach captures important meta-paths in an efficient and scalable manner.

Proceedings ArticleDOI
13 Jun 2015
TL;DR: The main result is the first polynomial-time deterministic approximation algorithm for this problem, with an approximation ratio of 67/3, and a randomized version of the algorithm, with a ratio of (9+16√2)/3.
Abstract: Communications in datacenter jobs (such as the shuffle operations in MapReduce applications) often involve many parallel flows, which may be processed simultaneously. This highly parallel structure presents new scheduling challenges in optimizing job-level performance objectives in data centers. Chowdhury and Stoica introduced the coflow abstraction to capture these communication patterns, and recently Chowdhury et al. developed effective heuristics to schedule coflows. In this paper, we consider the problem of efficiently scheduling coflows with release dates so as to minimize the total weighted completion time, which has been shown to be strongly NP-hard. Our main result is the first polynomial-time deterministic approximation algorithm for this problem, with an approximation ratio of 67/3, and a randomized version of the algorithm, with a ratio of (9+16√2)/3. Our results use techniques from both combinatorial scheduling and matching theory, and rely on a clever grouping of coflows. We also run experiments on a Facebook trace to test the practical performance of several algorithms, including our deterministic algorithm. Our experiments suggest that simple algorithms provide effective approximations of the optimal, and that our deterministic algorithm has near-optimal performance.

Journal ArticleDOI
TL;DR: A divide-and-conquer strategy with a parallel computing mechanism is adopted, an algorithm called Community-based Greedy algorithm is proposed for mining top-K influential nodes, and a precision analysis is given to show the approximation guarantees of the models.
Abstract: With the proliferation of mobile devices and wireless technologies, mobile social network systems are increasingly available. A mobile social network plays an essential role as the spread of information and influence in the form of “word-of-mouth”. It is a fundamental issue to find a subset of influential individuals in a mobile social network such that targeting them initially (e.g., to adopt a new product) will maximize the spread of the influence (further adoptions of the new product). The problem of finding the most influential nodes is unfortunately NP-hard. It has been shown that a Greedy algorithm with provable approximation guarantees can give good approximation; However, it is computationally expensive, if not prohibitive, to run the greedy algorithm on a large mobile social network. In this paper, a divide-and-conquer strategy with parallel computing mechanism has been adopted. We first propose an algorithm called Community-based Greedy algorithm for mining top-K influential nodes. It encompasses two components: dividing the large-scale mobile social network into several communities by taking into account information diffusion and selecting communities to find influential nodes by a dynamic programming. Then, to further improve the performance, we parallelize the influence propagation based on communities and consider the influence propagation crossing communities. Also, we give precision analysis to show approximation guarantees of our models. Experiments on real large-scale mobile social networks show that the proposed methods are much faster than previous algorithms, meanwhile, with high accuracy.

Journal ArticleDOI
TL;DR: Numerical results show that the submodular saturation algorithm outperforms the greedy algorithm and the genetic algorithm in most cases, and provides competitive results in other cases.
Abstract: The active nature of next generation distribution systems requires a highly accurate real-time monitoring for state estimation and situational awareness. This paper considers the problem of robust placement of a limited number of voltage magnitude meters and phasor measurement units for state estimation in an active distribution system comprising topological reconfigurations. The trace of the inverse of the Fisher information matrix is chosen as criterion for the estimation accuracy. For meter placement in a single configuration distribution system, the greedy approach provides a near-optimal solution as well as a theoretical approximation guarantee. However, for meter placement in an active distribution system, a robust approach is required to optimize the worst case estimation accuracy among all possible configurations of the system. In this paper, a simple robust algorithm, called submodular saturation algorithm, for meter placement in active distribution systems is proposed. Numerical results on three different active distribution systems show that the submodular saturation algorithm outperforms the greedy algorithm and the genetic algorithm in most cases, and provides competitive results in other cases.
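The greedy baseline referred to above adds, at each step, the candidate meter that most improves the chosen accuracy criterion, the trace of the inverse Fisher information matrix. The sketch below assumes a linearized measurement model in which each candidate meter i contributes a row h_i with noise variance sigma2; these names and the small regularizing prior are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def greedy_meter_placement(H_rows, budget, sigma2=1.0, prior=1e-6):
    """Greedy meter placement minimizing trace(F^{-1}).

    H_rows[i] is the measurement row of candidate meter i; at every step
    the candidate whose addition yields the smallest trace of the inverse
    Fisher information matrix is selected.
    """
    n_states = H_rows[0].shape[0]
    F = prior * np.eye(n_states)          # small prior keeps F invertible
    chosen = []
    remaining = list(range(len(H_rows)))
    for _ in range(budget):
        def score(i):
            Fi = F + np.outer(H_rows[i], H_rows[i]) / sigma2
            return np.trace(np.linalg.inv(Fi))
        best = min(remaining, key=score)
        F = F + np.outer(H_rows[best], H_rows[best]) / sigma2
        chosen.append(best)
        remaining.remove(best)
    return chosen
```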

Journal ArticleDOI
TL;DR: The presented approach exceeds the mileage savings achieved by a greedy heuristic by 6% while requiring 30% lower computational efforts, and is applied to investigate the potential for taxi sharing in Singapore.
Abstract: Ridesharing offers the opportunity to make more efficient use of vehicles while preserving the benefits of individual mobility. Presenting ridesharing as a viable option for commuters, however, requires minimizing certain inconvenience factors. One of these factors includes detours which result from picking up and dropping off additional passengers. This paper proposes a method which aims to best utilize ridesharing potential while keeping detours below a specific limit. The method specifically targets ridesharing systems on a very large scale and with a high degree of dynamics which are difficult to address using classical approaches known from operations research. For this purpose, the road network is divided into distinct partitions which define the search space for ride matches. The size and shape of the partitions depend on the topology of the road network as well as on two free parameters. This allows optimizing the partitioning with regard to sharing potential utilization and inconvenience minimization. Match making is ultimately performed using an agent-based approach. As a case study, the algorithm is applied to investigate the potential for taxi sharing in Singapore. This is done by considering about 110 000 daily trips and allowing up to two sharing partners. The outcome shows that the number of trips could be reduced by 42% resulting in a daily mileage savings of 230 000 km. It is further shown that the presented approach exceeds the mileage savings achieved by a greedy heuristic by 6% while requiring 30% lower computational efforts.

Journal ArticleDOI
TL;DR: It is shown that the optimality of MOGAs can be significantly improved by diversifying the solutions (sub-sets of the test suite) generated during the search process; to this end, a new MOGA, coined DIversity based Genetic Algorithm (DIV-GA), is introduced, based on the mechanisms of orthogonal design and orthogonal evolution.
Abstract: A way to reduce the cost of regression testing consists of selecting or prioritizing subsets of test cases from a test suite according to some criteria. Besides greedy algorithms, cost cognizant additional greedy algorithms, multi-objective optimization algorithms, and multi-objective genetic algorithms (MOGAs), have also been proposed to tackle this problem. However, previous studies have shown that there is no clear winner between greedy and MOGAs, and that their combination does not necessarily produce better results. In this paper we show that the optimality of MOGAs can be significantly improved by diversifying the solutions (sub-sets of the test suite) generated during the search process. Specifically, we introduce a new MOGA, coined as DIversity based Genetic Algorithm (DIV-GA), based on the mechanisms of orthogonal design and orthogonal evolution that increase diversity by injecting new orthogonal individuals during the search process. Results of an empirical study conducted on eleven programs show that DIV-GA outperforms both greedy algorithms and the traditional MOGAs from the optimality point of view. Moreover, the solutions (sub-sets of the test suite) provided by DIV-GA are able to detect more faults than the other algorithms, while keeping the same test execution cost.

Journal ArticleDOI
TL;DR: A genetic algorithm (GA) with dual-chromosome coding for CTSP is presented, and the results suggest that SAGA achieves the best solution quality while HCGA offers a good tradeoff between solution quality and computing time.
Abstract: The multiple traveling salesman problem (MTSP) is an important combinatorial optimization problem. It has been widely and successfully applied to the practical cases in which multiple traveling individuals (salesmen) share the common workspace (city set). However, it cannot represent some application problems where multiple traveling individuals not only have their own exclusive tasks but also share a group of tasks with each other. This work proposes a new MTSP called colored traveling salesman problem (CTSP) for handling such cases. Two types of city groups are defined, i.e., each group of exclusive cities of a single color for a salesman to visit and a group of shared cities of multiple colors allowing all salesmen to visit. Evidence shows that CTSP is NP-hard, and a multidepot MTSP and multiple single traveling salesman problems are its special cases. We present a genetic algorithm (GA) with dual-chromosome coding for CTSP and analyze the corresponding solution space. Then, GA is improved by incorporating greedy, hill-climbing (HC), and simulated annealing (SA) operations to achieve better performance. Experiments reveal the limitation of the exact solution method and compare the performance of the presented GAs. The results suggest that SAGA achieves the best solution quality, while HCGA offers a good tradeoff between solution quality and computing time.

Journal ArticleDOI
TL;DR: A low rank approximation method based on discrete least-squares for the approximation of a multivariate function from random, noise-free observations is proposed, proving the interest of the proposed algorithm for the propagation of uncertainties through complex computational models.
Abstract: In this paper, we propose a low rank approximation method based on discrete least-squares for the approximation of a multivariate function from random, noise-free observations. Sparsity inducing regularization techniques are used within classical algorithms for low rank approximation in order to exploit the possible sparsity of low rank approximations. Sparse low rank approximations are constructed with a robust updated greedy algorithm, which includes an optimal selection of regularization parameters and approximation ranks using cross validation techniques. Numerical examples demonstrate the capability of approximating functions of many variables even when very few function evaluations are available, thus proving the interest of the proposed algorithm for the propagation of uncertainties through complex computational models.

Proceedings Article
25 Jan 2015
TL;DR: An auction algorithm is proposed to allocate tasks with temporal constraints to cooperative robots; it is computationally frugal and consistently allocates more tasks than the competing algorithms.
Abstract: We propose an auction algorithm to allocate tasks that have temporal constraints to cooperative robots. Temporal constraints are expressed as time windows, within which a task must be executed. There are no restrictions on the time windows, which are allowed to overlap. Robots model their temporal constraints using a simple temporal network, enabling them to maintain consistent schedules. When bidding on a task, a robot takes into account its own current commitments and an optimization objective, which is to minimize the time of completion of the last task alone or in combination with minimizing the distance traveled. The algorithm works both when all the tasks are known upfront and when tasks arrive dynamically. We show the performance of the algorithm in simulation with different numbers of tasks and robots, and compare it with a baseline greedy algorithm and a state-of-the-art auction algorithm. Our algorithm is computationally frugal and consistently allocates more tasks than the competing algorithms.

Journal ArticleDOI
TL;DR: This paper proposes a subset selection algorithm for supervised classification which aims to find features that can best predict class labels and considers both the selected and remaining features' relevances with the label.
Abstract: Feature selection tries to find a subset of features from a larger feature pool such that the selected subset can provide the same or even better performance compared with using the whole set. Feature selection is usually a critical preprocessing step for many machine-learning applications such as clustering and classification. In this paper, we focus on feature selection for supervised classification, which aims to find features that can best predict class labels. Traditional greedy search algorithms incrementally find features based on the relevance of candidate features and the class label. However, this may lead to suboptimal results when there are redundant features that may interfere with the selection. To solve this problem, we propose a subset selection algorithm that considers both the selected and remaining features' relevances with the label. The intuition is that features, which do not have better alternatives from the feature set, should be selected first. We formulate the selection problem as maximizing the dependency margin, which is measured by the difference between the selected feature set performance and the remaining feature set performance. Extensive experiments on various data sets show the superiority of the proposed approach against traditional algorithms.

Journal ArticleDOI
TL;DR: This paper describes an optimization algorithm called CoGEnT that produces solutions with succinct atomic representations for reconstruction problems, generally formulated with atomic-norm constraints, and introduces several novel applications that are enabled by the atomic-norm framework.
Abstract: In many signal processing applications, the aim is to reconstruct a signal that has a simple representation with respect to a certain basis or frame. Fundamental elements of the basis known as “atoms” allow us to define “atomic norms” that can be used to formulate convex regularizations for the reconstruction problem. Efficient algorithms are available to solve these formulations in certain special cases, but an approach that works well for general atomic norms, both in terms of speed and reconstruction accuracy, remains to be found. This paper describes an optimization algorithm called CoGEnT that produces solutions with succinct atomic representations for reconstruction problems, generally formulated with atomic-norm constraints. CoGEnT combines a greedy selection scheme based on the conditional gradient approach with a backward (or “truncation”) step that exploits the quadratic nature of the objective to reduce the basis size. We establish convergence properties and validate the algorithm via extensive numerical experiments on a suite of signal processing applications. Our algorithm and analysis also allow for inexact forward steps and for occasional enhancements of the current representation to be performed. CoGEnT can outperform the basic conditional gradient method, and indeed many methods that are tailored to specific applications, when the enhancement and truncation steps are defined appropriately. We also introduce several novel applications that are enabled by the atomic-norm framework, including tensor completion, moment problems in signal processing, and graph deconvolution.

Journal ArticleDOI
TL;DR: New upper bounds are established to significantly reduce the number of Monte-Carlo simulations in greedy-based algorithms, especially at the initial step, and a new Upper Bound based Lazy Forward algorithm (UBLF in short) is proposed for discovering the top-k influential nodes in social networks.
Abstract: Influence maximization, defined as finding a small subset of nodes that maximizes spread of influence in social networks, is NP-hard under both Independent Cascade (IC) and Linear Threshold (LT) models, where many greedy-based algorithms have been proposed with the best approximation guarantee. However, existing greedy-based algorithms are inefficient on large networks, as they demand heavy Monte-Carlo simulations of the spread functions for each node at the initial step [7]. In this paper, we establish new upper bounds to significantly reduce the number of Monte-Carlo simulations in greedy-based algorithms, especially at the initial step. We theoretically prove that the bound is tight and convergent when the summation of weights towards (or from) each node is less than 1. Based on the bound, we propose a new Upper Bound based Lazy Forward algorithm (UBLF in short) for discovering the top-k influential nodes in social networks. We test and compare UBLF with prior greedy algorithms, especially CELF [30]. Experimental results show that UBLF reduces more than 95 percent Monte-Carlo simulations of CELF and achieves about 2-10 times speedup when the seed set is small.
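The CELF baseline cited as [30] is the classic lazy-forward greedy: cached marginal gains serve as upper bounds (by submodularity) and are only re-evaluated when a node surfaces at the top of the priority queue. A minimal sketch, assuming integer node ids and a caller-supplied spread(S) estimator (e.g. Monte-Carlo simulation); the paper's UBLF additionally bounds the expensive first round, which this sketch does not show:

```python
import heapq

def lazy_greedy(nodes, spread, k):
    """CELF-style lazy greedy for influence maximization.

    spread(S) estimates the influence of seed set S; stale marginal-gain
    bounds are recomputed only when they reach the top of the heap.
    """
    selected, current = [], 0.0
    # heap entries: (-gain_upper_bound, node, round_when_bound_was_computed)
    heap = [(-spread([v]), v, 0) for v in nodes]
    heapq.heapify(heap)
    for rnd in range(1, k + 1):
        while True:
            neg_gain, v, last = heapq.heappop(heap)
            if last == rnd:                 # bound is fresh: v is the greedy choice
                selected.append(v)
                current += -neg_gain
                break
            gain = spread(selected + [v]) - current   # refresh a stale bound
            heapq.heappush(heap, (-gain, v, rnd))
    return selected
```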

Proceedings ArticleDOI
04 Jan 2015
TL;DR: In this paper, a (1 − c/e)-approximation algorithm for submodular and supermodular functions subject to a single matroid constraint is proposed, which is the best possible in the value oracle model, even in the case of a cardinality constraint.
Abstract: We design new approximation algorithms for the problems of optimizing submodular and supermodular functions subject to a single matroid constraint. Specifically, we consider the case in which we wish to maximize a nondecreasing submodular function or minimize a nonincreasing supermodular function in the setting of bounded total curvature c. In the case of submodular maximization with curvature c, we obtain a (1 − c/e)-approximation, the first improvement over the greedy (1 − e^{-c})/c-approximation of Conforti and Cornuejols from 1984, which holds for a cardinality constraint, as well as recent approaches that hold for an arbitrary matroid constraint. Our approach is based on modifications of the continuous greedy algorithm and non-oblivious local search, and allows us to approximately maximize the sum of a nonnegative, nondecreasing submodular function and a (possibly negative) linear function. We show how to reduce both submodular maximization and supermodular minimization to this general problem when the objective function has bounded total curvature. We prove that the approximation results we obtain are the best possible in the value oracle model, even in the case of a cardinality constraint. Finally, we give two concrete applications of our results in the settings of maximum entropy sampling, and the column-subset selection problem.
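The total curvature c that these guarantees depend on is typically defined as follows for a normalized nondecreasing submodular function f on ground set N; this standard definition is assumed here, since the abstract does not restate it:

```latex
% Total curvature of f; c lies in [0, 1], with c = 0 for modular f.
c = 1 - \min_{j \in N}\frac{f(N) - f(N \setminus \{j\})}{f(\{j\})}
% With this c, the classical greedy bound reads (1 - e^{-c})/c and the
% algorithm described in the abstract attains 1 - c/e.
```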

Proceedings Article
25 Jul 2015
TL;DR: This work proposes a novel approach to diversifying a list of recommended items, which maximizes the utility of the items subject to the increase in their diversity, and outperforms the baseline methods.
Abstract: The need for diversification manifests in various recommendation use cases. In this work, we propose a novel approach to diversifying a list of recommended items, which maximizes the utility of the items subject to the increase in their diversity. From a technical perspective, the problem can be viewed as maximization of a modular function on the polytope of a submodular function, which can be solved optimally by a greedy method. We evaluate our approach in an offline analysis, which incorporates a number of baselines and metrics, and in two online user studies. In all the experiments, our method outperforms the baseline methods.
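The greedy method that solves this problem optimally is, in its standard form, Edmonds' greedy algorithm for linear optimization over a polymatroid: process items in decreasing order of (modular) utility and assign each the marginal gain of the (submodular) diversity function as its coordinate. The sketch below assumes that interpretation; utility and diversity are illustrative stand-ins for the paper's actual functions, and utilities are assumed nonnegative.

```python
def greedy_over_submodular_polytope(items, utility, diversity):
    """Edmonds-style greedy for maximizing a modular function over the
    polytope of a submodular function.

    utility[i] is the nonnegative modular weight of item i; diversity(S)
    is a normalized submodular set function on lists of items. Returns the
    optimal vertex x of the polytope and the induced ranking.
    """
    order = sorted(items, key=lambda i: utility[i], reverse=True)
    x, prefix, prev = {}, [], diversity([])
    for i in order:
        prefix.append(i)
        cur = diversity(prefix)
        x[i] = cur - prev          # marginal gain defines the coordinate
        prev = cur
    return x, order
```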

Proceedings ArticleDOI
27 May 2015
TL;DR: This paper proves that the problem of answering mCK queries is NP-hard, and proposes an exact algorithm that utilizes the group found by the (2/√3 + ε)-approximation algorithm to obtain the optimal group.
Abstract: As an important type of spatial keyword query, the m-closest keywords (mCK) query finds a group of objects such that they cover all query keywords and have the smallest diameter, which is defined as the largest distance between any pair of objects in the group. The query is useful in many applications such as detecting locations of web resources. However, the existing work does not study the intractability of this problem and only provides exact algorithms, which are computationally expensive. In this paper, we prove that the problem of answering mCK queries is NP-hard. We first devise a greedy algorithm that has an approximation ratio of 2. Then, we observe that an mCK query can be approximately answered by finding the circle with the smallest diameter that encloses a group of objects together covering all query keywords. We prove that the group enclosed in the circle can answer the mCK query with an approximation ratio of 2/√3. Based on this, we develop an algorithm for finding such a circle exactly, which has a high time complexity. To improve efficiency, we propose another two algorithms that find such a circle approximately, with a ratio of 2/√3 + ε. Finally, we propose an exact algorithm that utilizes the group found by the (2/√3 + ε)-approximation algorithm to obtain the optimal group. We conduct extensive experiments using real-life datasets. The experimental results offer insights into both efficiency and accuracy of the proposed approximation algorithms, and the results also demonstrate that our exact algorithm outperforms the best known algorithm by an order of magnitude.

Journal ArticleDOI
TL;DR: A transportation problem arising in public bicycle sharing systems where a fleet of vehicles continuously performs tours moving bikes among stations is considered, a fast greedy construction heuristic is proposed and a Variable Neighborhood Descent is described that exploits a set of specifically designed neighborhood structures in a deterministic way to locally improve the solutions.
Abstract: We consider a transportation problem arising in public bicycle sharing systems: To keep rental stations from running entirely empty or full, a fleet of vehicles continuously performs tours moving bikes among stations. In the static problem variant considered in this paper, we are given initial and target fill levels for all stations, and the goal is primarily to find vehicle tours including corresponding loading instructions in order to minimize the deviations from the target fill levels. As secondary objectives we are further interested in minimizing the tours' total duration and the overall number of loading actions. For this purpose we first propose a fast greedy construction heuristic and extend it to a PILOT method that evaluates each candidate station considered for addition to the current partial tour in a refined way by looking forward via a recursive call. Next we describe a Variable Neighborhood Descent (VND) that exploits a set of specifically designed neighborhood structures in a deterministic way to locally improve the solutions. While the VND is processing the search space of candidate routes to determine the stops for vehicles at unbalanced rental stations, the number of bikes to be loaded or unloaded at each stop is derived by an efficient method. Four alternatives are considered for this embedded procedure based on a greedy heuristic, two variants of maximum flow calculations, and linear programming. Last but not least, we investigate a general Variable Neighborhood Search (VNS) and variants of a Greedy Randomized Adaptive Search Procedure (GRASP) for further diversification and extended runs. Rigorous experiments using benchmark instances derived from a real-world scenario in Vienna with up to 700 stations document the performance of the suggested approaches and individual pros and cons. While the VNS yields the best results on instances of moderate size, a PILOT/GRASP hybrid turns out to be superior on very large instances. If solutions are required in short time, the construction heuristic or PILOT method optionally followed by VND still yield reasonable results.

Journal ArticleDOI
TL;DR: The results show that GA is competitive only for pairwise testing for subjects with a small number of constraints; the results for the greedy algorithm are actually slightly superior, however, the results are critically dependent on the approach adopted to constraint handling.
Abstract: Combinatorial interaction testing (CIT) is important because it tests the interactions between the many features and parameters that make up the configuration space of software systems. Simulated Annealing (SA) and Greedy Algorithms have been widely used to find CIT test suites. From the literature, there is a widely-held belief that SA is slower, but produces more effective test suites than Greedy, and that SA cannot scale to higher strength coverage. We evaluated both algorithms on seven real-world subjects for the well-studied two-way up to the rarely-studied six-way interaction strengths. Our findings present evidence to challenge this current orthodoxy: real-world constraints allow SA to achieve higher strengths. Furthermore, there was no evidence that Greedy was less effective (in terms of time to fault revelation) compared to SA; the results for the greedy algorithm are actually slightly superior. However, the results are critically dependent on the approach adopted to constraint handling. Moreover, we have also evaluated a genetic algorithm for constrained CIT test suite generation. This is the first time strengths higher than 3 and constraint handling have been used to evaluate GA. Our results show that GA is competitive only for pairwise testing for subjects with a small number of constraints.

Journal ArticleDOI
TL;DR: A greedy algorithm called Greedy-WSC and an ant colony optimization based algorithm called ACO-WSC are presented, which attempt to select cloud combinations that are feasible and use the minimum number of clouds, and experimental results show that the proposed ant colony optimization method can effectively and efficiently find cloud combinations with a minimal number of clouds.