Author

Yanjun Zhang

Bio: Yanjun Zhang is an academic researcher from the University of California, Berkeley. The author has contributed to research in topics: Parallel algorithm & Branch and bound. The author has an h-index of 3 and has co-authored 3 publications receiving 318 citations.

Papers
Journal ArticleDOI
TL;DR: Universal randomized methods for parallelizing sequential backtrack search and branch-and-bound computation are presented and demonstrate the effectiveness of randomization in distributed parallel computation.
Abstract: Universal randomized methods for parallelizing sequential backtrack search and branch-and-bound computation are presented. These methods execute on message-passing multiprocessor systems, and require no global data structures or complex communication protocols. For backtrack search, it is shown that, uniformly on all instances, the method described in this paper is likely to yield a speed-up within a small constant factor from optimal, when all solutions to the problem instance are required. For branch-and-bound computation, it is shown that, uniformly on all instances, the execution time of this method is unlikely to exceed a certain inherent lower bound by more than a constant factor. These randomized methods demonstrate the effectiveness of randomization in distributed parallel computation. Categories and Subject Descriptors: F.2.2 (Analysis of Algorithms and Problem Complexity): Non-numerical Algorithms-computation

191 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: A universal randomized method called Local Best-First Search for parallelizing sequential branch-and-bound algorithms that shows that, uniformly on all instances, the execution time of the method is unlikely to exceed a certain inherent lower bound by more than a constant factor.
Abstract: We present a universal randomized method called Local Best-First Search for parallelizing sequential branch-and-bound algorithms. The method executes on a message-passing multiprocessor system, and requires no global data structures or complex communication protocols. We show that, uniformly on all instances, the execution time of the method is unlikely to exceed a certain inherent lower bound by more than a constant factor.
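The abstract above does not spell out the procedure, so the following is only a minimal sketch of the general idea of a randomized, queue-per-processor branch-and-bound, simulated in lockstep rounds. It assumes each processor expands the best node in its own queue and sends every child to a randomly chosen processor; that rule, the shared incumbent bound (a real message-passing system would have to broadcast it), and all names (`local_best_first`, `WEIGHTS`, `children`, `cost`, `is_leaf`) are illustrative assumptions, not the authors' stated method.

```python
import heapq
import random


def local_best_first(root, children, cost, is_leaf, num_procs=4, seed=0):
    """Simulate randomized branch-and-bound with one priority queue per processor.

    Assumes cost(child) >= cost(parent), so a node's cost is a lower bound
    on every leaf below it, and that the goal is a minimum-cost leaf.
    """
    rng = random.Random(seed)
    queues = [[] for _ in range(num_procs)]  # one local queue per simulated processor
    counter = 0                              # tie-breaker so nodes are never compared
    heapq.heappush(queues[0], (cost(root), counter, root))
    best_cost, best_leaf = float("inf"), None

    while any(queues):
        outbox = []
        for q in queues:                     # one synchronous round
            if not q:
                continue
            c, _, node = heapq.heappop(q)    # expand only the locally best node
            if c >= best_cost:
                continue                     # prune: cannot beat the incumbent
            if is_leaf(node):
                best_cost, best_leaf = c, node
                continue
            outbox.extend(children(node))
        for child in outbox:                 # send each child to a random processor
            counter += 1
            target = queues[rng.randrange(num_procs)]
            heapq.heappush(target, (cost(child), counter, child))
    return best_cost, best_leaf


# Hypothetical instance: a binary tree whose nodes are tuples of 0/1 choices.
WEIGHTS = [(3, 1), (2, 5), (4, 4), (0, 7)]   # edge cost per level and choice


def children(node):
    return [node + (0,), node + (1,)]


def cost(node):
    return sum(WEIGHTS[i][b] for i, b in enumerate(node))


def is_leaf(node):
    return len(node) == len(WEIGHTS)


best = local_best_first((), children, cost, is_leaf)
```

Because no processor ever consults another processor's queue, the only coordination in this sketch is the random placement of children, which is what keeps it free of global data structures apart from the simplified shared bound.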

100 citations

01 Jan 1989
TL;DR: This thesis presents parallel algorithms for backtrack search, branch-and-bound computation and game-tree search, and presents a randomized method called Local Best-First Search for parallelizing sequential branch-and-bound algorithms.
Abstract: This thesis is a theoretical study of parallel algorithms for combinatorial search problems. In this thesis we present parallel algorithms for backtrack search, branch-and-bound computation and game-tree search. Our model of parallel computation is a network of processors communicating via messages. Our primary interest in a parallel algorithm is its speed-up over the sequential ones. Our goal is to design parallel algorithms that achieve a speed-up proportional to the number of processors used. We first study backtrack search that enumerates all solutions to a combinatorial problem. We propose a simple randomized method for parallelizing sequential backtrack search algorithms for solving enumeration problems. We show that, uniformly on all instances, this method is likely to achieve a nearly best possible speed-up. We then study the branch-and-bound method for solving combinatorial optimization problems. We present a randomized method called Local Best-First Search for parallelizing sequential branch-and-bound algorithms. We show that, uniformly on all instances, the execution time of this method is unlikely to exceed a certain inherent lower bound by more than a constant factor. In the rest of this thesis we study the problem of evaluation of game trees in parallel. We present a class of parallel algorithms that parallelize the "left-to-right" algorithm for evaluating AND/OR trees and the $\alpha$-$\beta$ pruning algorithm for evaluating MIN/MAX trees. We prove that the algorithm achieves a linear speed-up over the left-to-right algorithm on uniform AND/OR trees when the number of processors used is close to the height of the input tree. We conjecture that the same conclusion holds for the speed-up of the algorithm over the $\alpha$-$\beta$ pruning algorithm.
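For readers unfamiliar with the sequential baseline mentioned above, here is a minimal sketch of left-to-right evaluation of an AND/OR tree: children are visited in order and evaluation stops as soon as the node's value is determined. The tree encoding (a `bool` leaf or an `(op, children)` tuple) and the `visited` list are assumptions made for illustration; the thesis's parallel algorithms are not reproduced here.

```python
def evaluate(node, visited):
    """Evaluate an AND/OR tree left to right, short-circuiting where possible.

    A node is either a bool leaf or a tuple (op, children) with op in
    {"AND", "OR"}. Every leaf actually inspected is appended to `visited`.
    """
    if isinstance(node, bool):
        visited.append(node)
        return node
    op, kids = node
    if op == "AND":
        # all() stops at the first False child
        return all(evaluate(k, visited) for k in kids)
    # any() stops at the first True child
    return any(evaluate(k, visited) for k in kids)


# Hypothetical tree with 7 leaves.
tree = ("OR", [
    ("AND", [True, False, True]),
    ("AND", [True, True]),
    ("AND", [False, True]),
])
visited = []
value = evaluate(tree, visited)
```

In this example the third leaf of the first AND node and the whole third subtree are never inspected, which is the pruning behaviour a parallel version has to preserve in order to claim a speed-up over the sequential algorithm.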

31 citations


Cited by
Journal ArticleDOI
TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.

1,688 citations

Journal ArticleDOI
TL;DR: This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time to execute a fully strict computation on $P$ processors using this scheduler is $T_1/P + O(T_\infty)$.
Abstract: This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is “work stealing,” in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically, our analysis shows that the expected time to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space required by the execution is at most $S_1 P$, where $S_1$ is the minimum serial space requirement. We also show that the expected total communication of the algorithm is at most $O(P T_\infty (1 + n_d) S_{\max})$, where $S_{\max}$ is the size of the largest activation record of any thread and $n_d$ is the maximum number of times that any thread synchronizes with its parent. This communication bound justifies the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.

1,202 citations

Proceedings ArticleDOI
01 Aug 1995
TL;DR: This paper shows that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance, and proves that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal.
Abstract: Cilk (pronounced “silk”) is a C-based runtime system for multi-threaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance. Consequently, a Cilk programmer can focus on reducing the work and critical path of his computation, insulated from load balancing and other runtime scheduling issues. We also prove that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal. The Cilk runtime system currently runs on the Connection Machine CM5 MPP, the Intel Paragon MPP, the Silicon Graphics Power Challenge SMP, and the MIT Phish network of workstations. Applications written in Cilk include protein folding, graphic rendering, backtrack search, and the *Socrates chess program, which won third prize in the 1994 ACM International Computer Chess Championship.
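To make the two quantities in this abstract concrete, the sketch below computes the work (total task time) and critical path (longest dependency chain) of a small task graph and plugs them into a simple model of the form work/P + c * critical path. The graph, the function names, and the constant `c_inf` are hypothetical; the model's shape follows the $T_1/P + O(T_\infty)$ bound quoted elsewhere on this page, with `c_inf` standing in for the unspecified constant.

```python
from functools import lru_cache


def work_and_critical_path(durations, deps):
    """Return (work, critical_path) for an acyclic task graph.

    durations: {task: time}; deps: {task: [tasks that must finish first]}.
    """
    work = sum(durations.values())

    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish time with unlimited processors.
        start = max((finish(d) for d in deps.get(task, [])), default=0)
        return start + durations[task]

    critical_path = max(finish(t) for t in durations)
    return work, critical_path


def predicted_time(work, critical_path, procs, c_inf=1.0):
    """Illustrative model: work / procs + c_inf * critical_path."""
    return work / procs + c_inf * critical_path


# Hypothetical fork-join graph: one start task, eight parallel workers, one join.
workers = [f"w{i}" for i in range(8)]
durations = {"start": 1, "join": 1, **{w: 4 for w in workers}}
deps = {"join": workers, **{w: ["start"] for w in workers}}

work, critical_path = work_and_critical_path(durations, deps)
estimate = predicted_time(work, critical_path, procs=4)
```

Here the work is 34 time units and the critical path is 6, so the model gives 34/4 + 6 = 14.5 on four processors. Shortening the critical path or reducing the work are the only two levers the model exposes, which is the point the abstract makes about what a programmer should focus on.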

985 citations

Proceedings ArticleDOI
20 Nov 1994
TL;DR: This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time $T_P$ to execute a fully strict computation on $P$ processors using this work-stealing scheduler is $T_P = O(T_1/P + T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and ...
Abstract: This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically, our analysis shows that the expected time $T_P$ to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_P = O(T_1/P + T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space $S_P$ required by the execution satisfies $S_P \le S_1 P$. We also show that the expected total communication of the algorithm is at most $O(T_\infty S_{\max} P)$, where $S_{\max}$ is the size of the largest activation record of any thread, thereby justifying the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.

660 citations