Showing papers by "Ming-Yang Kao" published in 2009


Journal Article
TL;DR: This article proposes a natural optimization formulation of the DNA code design problem in which the goal is to design n strings that satisfy a given set of constraints while minimizing the length of the strings.
Abstract: We consider the problem of efficiently designing sets (codes) of equal-length DNA strings (words) that satisfy certain combinatorial constraints. This problem has numerous motivations including DNA self-assembly and DNA computing. Previous work has extended results from coding theory to obtain bounds on code size for new biologically motivated constraints and has applied heuristic local search and genetic algorithm techniques for code design. This article proposes a natural optimization formulation of the DNA code design problem in which the goal is to design n strings that satisfy a given set of constraints while minimizing the length of the strings. For multiple sets of constraints, we provide simple randomized algorithms that run in time polynomial in n and any given constraint parameters, and output strings of length within a constant factor of the optimal with high probability. To the best of our knowledge, this work is the first to consider this type of optimization problem in the context of DNA code design.

15 citations


Journal Article
TL;DR: Minimizing the total time spent to deliver loads $L_1,\ldots,L_n$ is equivalent to solving the traveling salesman problem (TSP) where the cities correspond to the loads with coordinates $(\alpha_i,\beta_i)$ and the distance from $L_i$ to $L_j$ is given by $\int_{\alpha_i}^{\beta_j} f(x)\,dx$ if $\beta_j \geq \alpha_i$ and by $\int \ldots$

12 citations


Journal Article
TL;DR: The inapproximability constant for the triangle packing problem improves upon the previous results, and the results on the maximum profit coverage problem provide almost matching upper and lower bounds on the approximation ratio, answering a question posed by Hassin and Or.

10 citations


Book Chapter
11 Jul 2009
TL;DR: This paper shows that for $0 < D \leq n^{0.294}$, the problem with the binary alphabet set can be solved within time complexity $O\left(n^{2+o(1)}\right)$, and provides an alternative approach not involving algebraic matrix multiplication, which has the time complexity $O\left(n^2D/\log^2 D\right)$ with a small constant and is effective for practical use.
Abstract: Finding the closest pair among a given set of points under the Hamming metric is a fundamental problem with many applications. Let $n$ be the number of points and $D$ the dimensionality of all points. We show that for $0 < D \leq n^{0.294}$, the problem, with the binary alphabet set, can be solved within time complexity $O\left(n^{2+o(1)}\right)$, whereas for $n^{0.294} < D \leq n$, it can be solved within time complexity $O\left(n^{1.843} D^{0.533}\right)$. We also provide an alternative approach not involving algebraic matrix multiplication, which has the time complexity $O\left(n^2D/\log^2 D\right)$ with a small constant and is effective for practical use. Moreover, for an arbitrarily large alphabet set, an algorithm with the time complexity $O\left(n^2\sqrt{D}\right)$ is obtained for $0 < D \leq n^{0.294}$, whereas the time complexity is $O\left(n^{1.921} D^{0.767}\right)$ for $n^{0.294} < D \leq n$. In addition, the algorithms proposed in this paper provide a solution to the open problem stated by Kao et al.
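For orientation, the problem statement in the abstract above can be illustrated with a brute-force baseline. This is a minimal sketch, not any of the paper's algorithms: it assumes the binary alphabet, packs each point into a Python integer, and compares all pairs with XOR and a bit count. The function names are hypothetical.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two packed binary points differ."""
    return bin(a ^ b).count("1")


def closest_pair_hamming(points: list[str]) -> tuple[int, int, int]:
    """Return (distance, i, j) for the closest pair of equal-length binary strings.

    Brute-force baseline: examines all n*(n-1)/2 pairs.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    if len({len(p) for p in points}) != 1:
        raise ValueError("all points must have the same dimension D")
    # Pack each D-character binary string into a single integer.
    packed = [int(p, 2) for p in points]
    best = None
    for i, j in combinations(range(len(packed)), 2):
        d = hamming_distance(packed[i], packed[j])
        if best is None or d < best[0]:
            best = (d, i, j)
    return best
```

Calling `closest_pair_hamming(["0000", "1111", "0001", "1100"])` returns the distance together with the indices of the first pair that attains it. Such a baseline is mainly useful as a correctness reference when testing faster methods.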

9 citations


Book Chapter
12 May 2009
TL;DR: An efficient algorithm is developed that can discover a hidden motif from a set of sequences for any alphabet $\Sigma$ with $|\Sigma| \geq 2$ and is applicable to DNA motif discovery.
Abstract: We study a natural probabilistic model for motif discovery that has been used to experimentally test the effectiveness of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G = g_1 g_2 \ldots g_m$ is a string of $m$ characters. A probabilistically generated approximate copy of $G$ is implanted into each background sequence. For a probabilistically generated approximate copy $b_1 b_2 \ldots b_m$ of $G$, every character is probabilistically generated such that the probability for $b_i \neq g_i$ is at most $\alpha$. It has been conjectured that multiple background sequences can help with finding faint motifs $G$. In this paper, we develop an efficient algorithm that can discover a hidden motif from a set of sequences for any alphabet $\Sigma$ with $|\Sigma| \geq 2$ and is applicable to DNA motif discovery. We prove that for $\alpha < \ldots$ and any constant $x \geq 8$, there exist positive constants $c_0$, $\epsilon$, $\delta_1$ and $\delta_2$ such that if the length $\rho$ of motif $G$ is at least $\delta_1 \log n$, and there are $k \geq c_0 \log n$ input sequences, then in $O(n^2 + kn)$ time this algorithm finds the motif with probability at least $1-{1\over 2^x}$ for every $G\in \Sigma^{\rho}-\Psi_{\rho, h,\epsilon}(\Sigma)$, where $\rho$ is the length of the motif, $h$ is a parameter with $\rho \geq 4h \geq \delta_2 \log n$, and $\Psi_{\rho, h,\epsilon}(\Sigma)$ is a small subset of at most $2^{-\Theta(\epsilon^2 h)}$ fraction of the sequences in $\Sigma^{\rho}$. The constants $c_0$, $\epsilon$, $\delta_1$ and $\delta_2$ do not depend on $x$ when $x$ is a parameter of order $O(\log n)$. Our algorithm can take any number $k$ of sequences as input.

5 citations


Journal Article
TL;DR: This paper gives the first analytical proof that multiple background sequences do help with finding subtle and faint motifs, and develops an algorithm that under the probabilistic model can find the implanted motif with high probability when the number of background sequences is reasonably large.
Abstract: We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G=g_1g_2\cdots g_m$ is a string of $m$ characters. A probabilistically generated approximate copy of $G$ is implanted into each background sequence. For an approximate copy $b_1b_2\cdots b_m$ of $G$, every character $b_i$ is probabilistically generated such that the probability for $b_i \neq g_i$ is at most $\alpha$. In this paper, we give the first analytical proof that multiple background sequences do help with finding subtle and faint motifs. This work is a theoretical approach with a rigorous probabilistic analysis. We develop an algorithm that under the probabilistic model can find the implanted motif with high probability when the number of background sequences is reasonably large. Specifically, we prove that for $\alpha < \ldots > 0$ such that if the length of the motif is at least $\delta_0\log n$, the alphabet has at least $t_0$ characters, and there are at least $\delta_1\log n_0$ input sequences, then in $O(n^3)$ time our algorithm finds the motif with probability at least $1-\frac{1}{2^x}$, where $n$ is the length of the longest input sequence and $n_0\leq n$ is an upper bound for the length of the motif.
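The probabilistic model described in the abstract above can be sketched as a small instance generator. This is an illustrative sketch only, under assumptions not stated in the abstract: all sequences share one length `n`, background characters are uniform over the alphabet, each motif character is replaced with probability exactly `alpha` (the abstract only bounds this by $\alpha$), and the replacement is uniform over the other characters. The names `approximate_copy` and `planted_instance` are hypothetical.

```python
import random


def approximate_copy(motif: str, alphabet: str, alpha: float, rng: random.Random) -> str:
    """Copy the motif, replacing each character by a different one with probability alpha."""
    out = []
    for g in motif:
        if rng.random() < alpha:
            # Assumption: mutated characters are uniform over the other symbols.
            out.append(rng.choice([c for c in alphabet if c != g]))
        else:
            out.append(g)
    return "".join(out)


def planted_instance(motif: str, alphabet: str, k: int, n: int, alpha: float, seed: int = 0):
    """Generate k background sequences of length n, each with one implanted approximate copy.

    Returns (sequences, positions); positions[i] is where the copy starts in sequences[i].
    """
    if len(alphabet) < 2:
        raise ValueError("alphabet needs at least two characters")
    if len(motif) > n:
        raise ValueError("motif is longer than the sequences")
    rng = random.Random(seed)
    sequences, positions = [], []
    for _ in range(k):
        # Background: independent uniform characters (an assumption of this sketch).
        background = [rng.choice(alphabet) for _ in range(n)]
        start = rng.randrange(n - len(motif) + 1)
        copy = approximate_copy(motif, alphabet, alpha, rng)
        background[start:start + len(motif)] = copy  # overwrite with the implanted copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions
```

With `alpha=0.0` every sequence contains the motif verbatim at its recorded position, which gives a simple sanity check before testing a discovery program on noisier instances.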

4 citations


Journal Article
TL;DR: An optimal linear-time algorithm is presented to solve the haplotype inference problem for pedigree data when there are no recombinations and the pedigree has no mating loops, based on the use of graphs to capture SNP, Mendelian, and parity constraints of the given pedigree.
Abstract: In this paper, an optimal linear-time algorithm is presented to solve the haplotype inference problem for pedigree data when there are no recombinations and the pedigree has no mating loops. The approach is based on the use of graphs to capture SNP, Mendelian, and parity constraints of the given pedigree. This representation allows us to capture the constraints as the edges in a graph, rather than as a system of linear equations as in previous approaches. Graph traversals are then used to resolve the parity of these edges, resulting in an optimal running time.

4 citations


Proceedings Article
14 Sep 2009
TL;DR: A key finding is that combining the techniques of tilting lattice, extrapolation and fractional steps substantially increases speed and accuracy.
Abstract: This paper proposes novel lattice algorithms to compute tail conditional expectation. To improve the naive approach, we develop tilting, trinomial, and extrapolation algorithms together with some syntheses of these algorithms. Furthermore, we introduce fractional-step lattices to prevent interpolation error in the extrapolation algorithms. A key finding is that combining the techniques of tilting lattice, extrapolation and fractional steps substantially increases speed and accuracy. We demonstrate the efficiency and accuracy of these algorithms with numerical results.

Book Chapter
05 Dec 2009
TL;DR: This paper proposes an algorithm, running in time linear in the size of the input graph, for the two-vertex connectivity augmentation problem in an undirected graph whose vertices are partitioned into k sets.
Abstract: In this paper, we study the two-vertex connectivity augmentation problem in an undirected graph whose vertices are partitioned into k sets. Our objective is to add the smallest number of edges to the graph such that the resulting graph is 2-vertex connected, under the constraint that each new edge is between two different sets in the partition. We propose an algorithm to solve the above augmentation problem that runs in time linear in the size of the input graph.
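The two conditions in the problem statement above (new edges must join different partition sets, and the augmented graph must be 2-vertex connected) can be checked for a candidate solution with a short verifier. This is a sketch of a checker only, not the paper's linear-time algorithm: it tests connectivity after deleting each vertex in turn, which is not linear, and it does not test minimality of the edge set. It also assumes the convention that a 2-vertex connected graph has at least three vertices. All function names are hypothetical.

```python
from collections import deque


def is_connected(vertices, adj, removed=None):
    """Breadth-first connectivity test, optionally ignoring one removed vertex."""
    remaining = [v for v in vertices if v != removed]
    if not remaining:
        return True
    seen = {remaining[0]}
    queue = deque([remaining[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w != removed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(remaining)


def is_two_vertex_connected(vertices, edges):
    """True if the graph is connected and stays connected after deleting any one vertex."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    # Assumed convention: at least three vertices are required.
    if len(vertices) < 3 or not is_connected(vertices, adj):
        return False
    return all(is_connected(vertices, adj, removed=v) for v in vertices)


def is_valid_augmentation(vertices, edges, part, new_edges):
    """Check the partition constraint, then 2-vertex connectivity of the augmented graph."""
    if any(part[u] == part[w] for u, w in new_edges):
        return False  # a new edge joins two vertices of the same partition set
    return is_two_vertex_connected(vertices, list(edges) + list(new_edges))
```

For example, on the path 0-1-2-3 with `part = {0: "A", 1: "A", 2: "B", 3: "B"}`, adding the single edge `(0, 3)` closes a cycle and satisfies both conditions, whereas adding `(0, 1)` violates the partition constraint.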