Greedy algorithm

About: Greedy algorithm is a research topic. Over its lifetime, 15,347 publications have been published within this topic, receiving 393,945 citations.


Papers
Proceedings Article
22 Jul 2012
TL;DR: This work presents a two-phase exploration-exploitation assignment algorithm and proves that it is competitive with respect to the optimal offline algorithm which has access to the unknown skill levels of each worker.
Abstract: We explore the problem of assigning heterogeneous tasks to workers with different, unknown skill sets in crowdsourcing markets such as Amazon Mechanical Turk. We first formalize the online task assignment problem, in which a requester has a fixed set of tasks and a budget that specifies how many times he would like each task completed. Workers arrive one at a time (with the same worker potentially arriving multiple times), and must be assigned to a task upon arrival. The goal is to allocate workers to tasks in a way that maximizes the total benefit that the requester obtains from the completed work. Inspired by recent research on the online adwords problem, we present a two-phase exploration-exploitation assignment algorithm and prove that it is competitive with respect to the optimal offline algorithm which has access to the unknown skill levels of each worker. We empirically evaluate this algorithm using data collected on Mechanical Turk and show that it performs better than random assignment or greedy algorithms. To our knowledge, this is the first work to extend the online primal-dual technique used in the online adwords problem to a scenario with unknown parameters, and the first to offer an empirical validation of an online primal-dual algorithm.

342 citations
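
For orientation, the greedy baseline that the abstract above compares against can be sketched as follows. This is a minimal illustration, not the paper's two-phase primal-dual algorithm: the function name `greedy_assign`, the `skill` table of estimated per-task benefits, and the toy data are all assumptions made for this example. Each arriving worker is simply given the open task with the highest estimated benefit.

```python
def greedy_assign(arrivals, skill, budget):
    """Assign each arriving worker to the open task with the highest estimated benefit.

    arrivals: sequence of worker ids in arrival order (hypothetical input format)
    skill:    skill[worker][task] -> estimated benefit (assumed known here)
    budget:   budget[task] -> number of completions still wanted
    """
    remaining = dict(budget)  # copy so the caller's budget is not mutated
    assignment = []
    total = 0.0
    for worker in arrivals:
        open_tasks = [task for task, left in remaining.items() if left > 0]
        if not open_tasks:
            break  # every task has reached its budget
        # Greedy step: take the locally best task for this worker.
        task = max(open_tasks, key=lambda t: skill[worker][t])
        remaining[task] -= 1
        total += skill[worker][task]
        assignment.append((worker, task))
    return assignment, total


# Toy data (invented): worker "a" is good at both tasks, "b" only at "x".
skill = {"a": {"x": 0.9, "y": 0.8}, "b": {"x": 0.9, "y": 0.1}}
assignment, total = greedy_assign(["a", "b"], skill, {"x": 1, "y": 1})
```

On this toy input the greedy rule sends "a" to "x" and leaves "b" with "y", whereas assigning "a" to "y" and "b" to "x" would yield more total benefit. This kind of myopic loss is one reason an online algorithm may be preferred over a plain greedy rule.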

Journal Article
TL;DR: A novel process mining framework is introduced and some relevant computational issues are studied in depth; the approach is founded on an iterative, hierarchical refinement of the process model, in which traces sharing similar behavior patterns are clustered together and equipped with a specialized schema.
Abstract: Process mining techniques have recently received notable attention in the literature for their ability to assist in the (re)design of complex processes by automatically discovering models that explain the events registered in some log traces provided as input. Following this line of research, the paper investigates an extension of such basic approaches, where the identification of different variants for the process is explicitly accounted for, based on the clustering of log traces. Indeed, modeling each group of similar executions with a different schema allows us to single out "conformant" models, which, specifically, minimize the number of modeled enactments that are extraneous to the process semantics. Therefore, a novel process mining framework is introduced and some relevant computational issues are studied in depth. As finding an exact solution to such an enhanced process mining problem is proven to require high computational costs in most practical cases, a greedy approach is devised. This is founded on an iterative, hierarchical refinement of the process model, where, at each step, traces sharing similar behavior patterns are clustered together and equipped with a specialized schema. The algorithm guarantees that each refinement leads to an increasingly sound model, thus attaining a monotonic search. Experimental results evidence the validity of the approach with respect to both effectiveness and scalability.

341 citations

Proceedings Article
28 Aug 2005
TL;DR: Overall the results demonstrate that PeopleNet, with its bazaar concept and peer-to-peer query propagation, can provide a simple and efficient mechanism for seeking information.
Abstract: People often seek information by asking other people even when they have access to vast reservoirs of information such as the Internet and libraries. This is because people are great sources of unique information, especially that which is location-specific, community-specific and time-specific. Social networking is effective because this type of information is often not easily available anywhere else. In this paper, we conceive a wireless virtual social network which mimics the way people seek information via social networking. PeopleNet is a simple, scalable and low-cost architecture for efficient information search in a distributed manner. It uses the infrastructure to propagate queries of a given type to users in specific geographic locations, called bazaars. Within each bazaar, the query is further propagated between neighboring nodes via peer-to-peer connectivity until it finds a matching query. The PeopleNet architecture can overlay easily on existing cellular infrastructure and entails minimal software installation. We identify three metrics for system performance: (i) probability of a match, (ii) time to find a match and (iii) number of matches found by a query. We describe two simple models, called the swap and spread models, for query propagation within a bazaar. We qualitatively argue that the swap model is better with respect to the performance metrics identified and demonstrate this via simulations. Next, we compute analytically the probability of match for the swap model. We show that the probability of match can be significantly improved if, prior to swapping queries, the nodes exchange some limited information about their buffer contents. We propose a simple greedy algorithm which uses this limited information to decide which queries to swap. We show via simulation that this algorithm achieves significantly better performance. 
Overall our results demonstrate that PeopleNet, with its bazaar concept and peer-to-peer query propagation, can provide a simple and efficient mechanism for seeking information.

337 citations

Journal Article
TL;DR: This article designs a new heuristic algorithm that is easily scalable to millions of nodes and edges and significantly outperforms all other scalable heuristics, with as much as a 100–260% increase in influence spread.
Abstract: Influence maximization, defined by Kempe et al. (SIGKDD 2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (SIGKDD 2003) and its improvements, are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spreads. In this article, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread; it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics, with as much as a 100–260% increase in influence spread.

336 citations
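
To make the greedy baseline mentioned in the abstract above concrete, here is a minimal sketch of greedy seed selection with Monte Carlo spread estimation. It assumes an independent cascade model with a single uniform activation probability; the function names, the adjacency-dict graph format, and the parameters `prob`, `runs`, and `seed` are choices made for this example, and this is not the scalable heuristic the article proposes.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """Run one independent-cascade simulation and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                # Each newly active node gets one chance to activate each inactive neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, prob, runs, rng):
    """Average the cascade size over `runs` simulations."""
    return sum(simulate_cascade(graph, seeds, prob, rng) for _ in range(runs)) / runs


def greedy_seed_selection(graph, k, prob=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for candidate in sorted(nodes - set(chosen)):
            spread = estimate_spread(graph, chosen + [candidate], prob, runs, rng)
            if spread > best_spread:
                best_node, best_spread = candidate, spread
        chosen.append(best_node)
    return chosen


# Toy directed graph (invented): node -> list of out-neighbors.
toy_graph = {0: [1, 2], 3: [4]}
seeds = greedy_seed_selection(toy_graph, k=2, prob=1.0, runs=1)
```

Every round re-simulates the cascade for every remaining candidate, so the cost grows roughly with k times the number of nodes times `runs` simulations. That is consistent with the abstract's remark that the greedy approach becomes infeasible on million-sized graphs.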

Posted Content
TL;DR: This paper proposes three cost-based heuristic algorithms: Volcano-SH and Volcano-RU, which are based on simple modifications to the Volcano search strategy, and a greedy heuristic that incorporates novel optimizations that improve efficiency greatly.
Abstract: Complex queries are becoming commonplace, with the growing use of decision support systems. These complex queries often have a lot of common sub-expressions, either within a single query, or across multiple such queries run as a batch. Multi-query optimization aims at exploiting common sub-expressions to reduce evaluation cost. Multi-query optimization has hitherto been viewed as impractical, since earlier algorithms were exhaustive, and explored a doubly exponential search space. In this paper we demonstrate that multi-query optimization using heuristics is practical, and provides significant benefits. We propose three cost-based heuristic algorithms: Volcano-SH and Volcano-RU, which are based on simple modifications to the Volcano search strategy, and a greedy heuristic. Our greedy heuristic incorporates novel optimizations that improve efficiency greatly. Our algorithms are designed to be easily added to existing optimizers. We present a performance study comparing the algorithms, using workloads consisting of queries from the TPC-D benchmark. The study shows that our algorithms provide significant benefits over traditional optimization, at a very acceptable overhead in optimization time.

336 citations


Related Topics (5)
Optimization problem: 96.4K papers, 2.1M citations, 92% related
Wireless network: 122.5K papers, 2.1M citations, 88% related
Network packet: 159.7K papers, 2.2M citations, 88% related
Wireless sensor network: 142K papers, 2.4M citations, 87% related
Node (networking): 158.3K papers, 1.7M citations, 87% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    350
2022    690
2021    809
2020    939
2019    1,006
2018    967