Showing papers by "Ming-Yang Kao" published in 2009

Journal Article•DOI•

Randomized fast design of short DNA words

Ming-Yang Kao¹, Manan Sanghi², Robert T. Schweller³•Institutions (3)

Northwestern University¹, Microsoft², University of Texas–Pan American³

06 Nov 2009-ACM Transactions on Algorithms

TL;DR: This article proposes a natural optimization formulation of the DNA code design problem in which the goal is to design n strings that satisfy a given set of constraints while minimizing the length of the strings.

Abstract: We consider the problem of efficiently designing sets (codes) of equal-length DNA strings (words) that satisfy certain combinatorial constraints. This problem has numerous motivations including DNA self-assembly and DNA computing. Previous work has extended results from coding theory to obtain bounds on code size for new biologically motivated constraints and has applied heuristic local search and genetic algorithm techniques for code design. This article proposes a natural optimization formulation of the DNA code design problem in which the goal is to design n strings that satisfy a given set of constraints while minimizing the length of the strings. For multiple sets of constraints, we provide simple randomized algorithms that run in time polynomial in n and any given constraint parameters, and output strings of length within a constant factor of the optimal with high probability. To the best of our knowledge, this work is the first to consider this type of optimization problem in the context of DNA code design.
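The optimization formulation in this abstract (find n words that satisfy the constraints while keeping the word length small) can be illustrated with a naive generate-and-test loop. The sketch below is not the article's algorithm and carries no approximation guarantee. It assumes a single constraint, a minimum pairwise Hamming distance `k`, which the abstract does not specify, and the function names and the `max_tries` cutoff are hypothetical.

```python
import random
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def random_code(n, length, k, rng, max_tries=1000):
    """Draw n uniform random DNA words of the given length, retrying until
    every pair differs in at least k positions. Returns None on failure."""
    for _ in range(max_tries):
        words = ["".join(rng.choice("ACGT") for _ in range(length))
                 for _ in range(n)]
        if all(hamming(u, v) >= k for u, v in combinations(words, 2)):
            return words
    return None


def shortest_length_found(n, k, seed=0):
    """Increase the word length from k until random sampling succeeds.
    The result is the shortest length this heuristic found, not the optimum."""
    rng = random.Random(seed)
    length = k
    while True:
        code = random_code(n, length, k, rng)
        if code is not None:
            return length, code
        length += 1


if __name__ == "__main__":
    length, code = shortest_length_found(n=6, k=3)
    print(length, code)
```

The loop separates the two parts of the problem statement: the constraint check (all pairs at distance at least `k`) and the objective (smallest length for which a valid set is found).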

15 citations

Journal Article•DOI•

An approximation algorithm for a bottleneck traveling salesman problem

Ming-Yang Kao¹, Manan Sanghi²•Institutions (2)

Northwestern University¹, Microsoft²

01 Sep 2009-Journal of Discrete Algorithms

TL;DR: Minimizing the total time spent to deliver loads $L_1,\ldots,L_n$ is equivalent to solving the traveling salesman problem (TSP) where the cities correspond to the loads with coordinates $(\alpha_i,\beta_i)$ and the distance from $L_i$ to $L_j$ is given by $\int_{\alpha_i}^{\beta_j} f(x)\,dx$ if $\beta_j \geq \alpha_i$ and by $\int \ldots$

12 citations

Journal Article•DOI•

On approximating four covering and packing problems

Mary V. Ashley¹, Tanya Y. Berger-Wolf¹, Piotr Berman², Wanpracha Art Chaovalitwongse³, Bhaskar DasGupta¹, Ming-Yang Kao⁴•Institutions (4)

University of Illinois at Chicago¹, Pennsylvania State University², Rutgers University³, Northwestern University⁴

01 Aug 2009-Journal of Computer and System Sciences

TL;DR: The inapproximability constant for the triangle packing problem improves upon the previous results, and the results on the maximum profit coverage problem provide almost matching upper and lower bounds on the approximation ratio, answering a question posed by Hassin and Or.

10 citations

Book Chapter•DOI•

The Closest Pair Problem under the Hamming Metric

Kerui Min¹, Ming-Yang Kao², Hong Zhu³•Institutions (3)

Fudan University¹, Northwestern University², East China Normal University³

11 Jul 2009

TL;DR: This paper shows that for $0 < D \le n^{0.294}$, the problem with the binary alphabet set can be solved within time complexity $O\left(n^{2+o(1)}\right)$, and provides an alternative approach not involving algebraic matrix multiplication, which has time complexity $O\left(n^2D/\log^2 D\right)$ with a small constant and is effective for practical use.

Abstract: Finding the closest pair among a given set of points under the Hamming metric is a fundamental problem with many applications. Let $n$ be the number of points and $D$ the dimensionality of all points. We show that for $0 < D \le n^{0.294}$, the problem, with the binary alphabet set, can be solved within time complexity $O\left(n^{2+o(1)}\right)$, whereas for $n^{0.294} < D \le n$, it can be solved within time complexity $O\left(n^{1.843} D^{0.533}\right)$. We also provide an alternative approach not involving algebraic matrix multiplication, which has time complexity $O\left(n^2D/\log^2 D\right)$ with a small constant and is effective for practical use. Moreover, for an arbitrarily large alphabet set, an algorithm with time complexity $O\left(n^2\sqrt{D}\right)$ is obtained for $0 < D \le n^{0.294}$, whereas the time complexity is $O\left(n^{1.921} D^{0.767}\right)$ for $n^{0.294} < D \le n$. In addition, the algorithms proposed in this paper provide a solution to the open problem stated by Kao et al.
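For orientation, the problem statement can be made concrete with the straightforward quadratic baseline that the bounds above improve on. The sketch below is not one of the paper's algorithms. It compares all pairs of binary points, packing each point into a Python integer so that one XOR plus a bit count gives the Hamming distance. The function name and the input format (strings of `0` and `1`) are assumptions made for illustration.

```python
from itertools import combinations


def closest_pair_hamming(points):
    """Brute-force closest pair over binary strings of equal length.

    Returns (distance, i, j) for the first pair of indices i < j that
    attains the minimum Hamming distance, or None if fewer than two
    points are given.
    """
    # Pack each D-bit string into one integer so XOR compares all bits at once.
    packed = [int(p, 2) for p in points]
    best = None
    for i, j in combinations(range(len(packed)), 2):
        # Set bits of the XOR mark exactly the differing coordinates.
        d = bin(packed[i] ^ packed[j]).count("1")
        if best is None or d < best[0]:
            best = (d, i, j)
    return best


if __name__ == "__main__":
    print(closest_pair_hamming(["0000", "1111", "0001", "1110"]))
```

This baseline examines all $n(n-1)/2$ pairs, so its running time grows as $n^2$ times the cost of one distance computation.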

9 citations

Book Chapter•DOI•

Discovering Almost Any Hidden Motif from Multiple Sequences in Polynomial Time with Low Sample Complexity and High Success Probability

Bin Fu¹, Ming-Yang Kao², Lusheng Wang³•Institutions (3)

University of Texas–Pan American¹, Northwestern University², City University of Hong Kong³

12 May 2009

TL;DR: An efficient algorithm is developed that can discover a hidden motif from a set of sequences for any alphabet $\Sigma$ with $|\Sigma| \ge 2$ and is applicable to DNA motif discovery.

Abstract: We study a natural probabilistic model for motif discovery that has been used to experimentally test the effectiveness of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G = g_1 g_2 \cdots g_m$ is a string of $m$ characters. Each background sequence is implanted with a probabilistically generated approximate copy of $G$. For a probabilistically generated approximate copy $b_1 b_2 \cdots b_m$ of $G$, every character is probabilistically generated such that the probability for $b_i \neq g_i$ is at most $\alpha$. It has been conjectured that multiple background sequences can help with finding faint motifs $G$. In this paper, we develop an efficient algorithm that can discover a hidden motif from a set of sequences for any alphabet $\Sigma$ with $|\Sigma| \ge 2$ and is applicable to DNA motif discovery. We prove that for $\alpha < \ldots$ and any constant $x \ge 8$, there exist positive constants $c_0$, $\epsilon$, $\delta_1$ and $\delta_2$ such that if the length $\rho$ of motif $G$ is at least $\delta_1 \log n$, and there are $k \ge c_0 \log n$ input sequences, then in $O(n^2 + kn)$ time this algorithm finds the motif with probability at least $1-{1\over 2^x}$ for every $G\in \Sigma^{\rho}-\Psi_{\rho, h,\epsilon}(\Sigma)$, where $\rho$ is the length of the motif, $h$ is a parameter with $\rho \ge 4h \ge \delta_2 \log n$, and $\Psi_{\rho, h,\epsilon}(\Sigma)$ is a small subset containing at most a $2^{-\Theta(\epsilon^2 h)}$ fraction of the sequences in $\Sigma^{\rho}$. The constants $c_0$, $\epsilon$, $\delta_1$ and $\delta_2$ do not depend on $x$ when $x$ is a parameter of order $O(\log n)$. Our algorithm can take any number $k$ of sequences as input.

5 citations

Journal Article•DOI•

Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences

Bin Fu, Ming-Yang Kao, Lusheng Wang

01 Nov 2009-SIAM Journal on Discrete Mathematics

TL;DR: This paper gives the first analytical proof that multiple background sequences do help with finding subtle and faint motifs, and develops an algorithm that under the probabilistic model can find the implanted motif with high probability when the number of background sequences is reasonably large.

Abstract: We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G=g_1g_2\cdots g_m$ is a string of $m$ characters. Each background sequence is implanted with a probabilistically generated approximate copy of $G$. For an approximate copy $b_1b_2\cdots b_m$ of $G$, every character $b_i$ is probabilistically generated such that the probability for $b_i \neq g_i$ is at most $\alpha$. In this paper, we give the first analytical proof that multiple background sequences do help with finding subtle and faint motifs. This work is a theoretical approach with a rigorous probabilistic analysis. We develop an algorithm that under the probabilistic model can find the implanted motif with high probability when the number of background sequences is reasonably large. Specifically, we prove that for $\alpha < \cdots > 0$ such that if the length of the motif is at least $\delta_0\log n$, the alphabet has at least $t_0$ characters, and there are at least $\delta_1\log n_0$ input sequences, then in $O(n^3)$ time our algorithm finds the motif with probability at least $1-\frac{1}{2^x}$, where $n$ is the longest length of any input sequence and $n_0\leq n$ is an upper bound for the length of the motif.
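The probabilistic model described in this abstract can be sketched as a small test-data generator. This illustrates the model only, not the discovery algorithm. Several details are assumptions that the abstract leaves open: each motif character is altered with probability exactly `alpha` (the model only requires at most $\alpha$), the implant position is uniform, every sequence has the same length `n`, and all function names are hypothetical.

```python
import random


def approximate_copy(motif, alpha, alphabet, rng):
    """Copy the motif, replacing each character with a different one
    with probability alpha (assumed exact here; the model only bounds it)."""
    out = []
    for g in motif:
        if rng.random() < alpha:
            out.append(rng.choice([c for c in alphabet if c != g]))
        else:
            out.append(g)
    return "".join(out)


def implant(motif, k, n, alpha, alphabet="ACGT", seed=0):
    """Generate k random background sequences of length n, each carrying
    one approximate copy of the motif. Returns (sequence, position) pairs."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(k):
        background = [rng.choice(alphabet) for _ in range(n)]
        pos = rng.randrange(n - len(motif) + 1)  # assumed uniform position
        copy = approximate_copy(motif, alpha, alphabet, rng)
        background[pos:pos + len(motif)] = copy  # overwrite with the implant
        sequences.append(("".join(background), pos))
    return sequences


if __name__ == "__main__":
    for seq, pos in implant("ACGTACGT", k=3, n=40, alpha=0.1):
        print(pos, seq)
```

Returning the implant position alongside each sequence makes it easy to score a motif finder against the ground truth.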

4 citations

Journal Article•DOI•

Linear-Time Haplotype Inference on Pedigrees without Recombinations and Mating Loops

Mee Yee Chan, Wun-Tat Chan, Francis Y. L. Chin¹, Stanley P. Y. Fung, Ming-Yang Kao•Institutions (1)

University of Hong Kong¹

01 Feb 2009-SIAM Journal on Computing

TL;DR: An optimal linear-time algorithm is presented to solve the haplotype inference problem for pedigree data when there are no recombinations and the pedigree has no mating loops, based on the use of graphs to capture SNP, Mendelian, and parity constraints of the given pedigree.

Abstract: In this paper, an optimal linear-time algorithm is presented to solve the haplotype inference problem for pedigree data when there are no recombinations and the pedigree has no mating loops. The approach is based on the use of graphs to capture SNP, Mendelian, and parity constraints of the given pedigree. This representation allows us to capture the constraints as the edges in a graph, rather than as a system of linear equations as in previous approaches. Graph traversals are then used to resolve the parity of these edges, resulting in an optimal running time.

4 citations

Proceedings Article•DOI•

Fast Accurate Algorithms for Tail Conditional Expectation

Bryant Chen¹, William M Y Hsu¹, Ming-Yang Kao•Institutions (1)

Northwestern University¹

14 Sep 2009

TL;DR: A key finding is that combining the techniques of tilting lattice, extrapolation and fractional steps substantially increases speed and accuracy.

Abstract: This paper proposes novel lattice algorithms to compute tail conditional expectation. To improve the naive approach, we develop tilting, trinomial, and extrapolation algorithms together with some syntheses of these algorithms. Furthermore, we introduce fractional-step lattices to prevent interpolation error in the extrapolation algorithms. A key finding is that combining the techniques of tilting lattice, extrapolation and fractional steps substantially increases speed and accuracy. We demonstrate the efficiency and accuracy of these algorithms with numerical results.

Book Chapter•DOI•

Two-Vertex Connectivity Augmentations for Graphs with a Partition Constraint (Extended Abstract)

Pei-Chi Huang¹, Hsin-Wen Wei², Yen-Chiu Chen¹, Ming-Yang Kao³, Wei-Kuan Shih¹, Tsan-sheng Hsu²•Institutions (3)

National Tsing Hua University¹, Academia Sinica², Northwestern University³

05 Dec 2009

TL;DR: This paper proposes an algorithm, running in time linear in the size of the input graph, that solves the two-vertex connectivity augmentation problem in an undirected graph whose vertices are partitioned into k sets.

Abstract: In this paper, we study the two-vertex connectivity augmentation problem in an undirected graph whose vertices are partitioned into k sets. Our objective is to add the smallest number of edges to the graph such that the resulting graph is 2-vertex connected under the constraint that each new edge is between two different sets in the partition. We propose an algorithm to solve the above augmentation problem that runs in linear time in the size of the input graph.