Showing papers by "Ming-Yang Kao" published in 2006

Proceedings Article•DOI•

Hamsa: fast signature generation for zero-day polymorphic worms with provable attack resilience

Zhichun Li¹, Manan Sanghi¹, Yan Chen¹, Ming-Yang Kao¹, B. Chavez¹•Institutions (1)

21 May 2006

TL;DR: Hamsa is a network-based automated signature generation system for polymorphic worms that is fast, noise-tolerant, and attack-resilient; in the paper's evaluation it significantly outperforms Polygraph in efficiency, accuracy, and attack resilience.

Abstract: Zero-day polymorphic worms pose a serious threat to the security of Internet infrastructures. Given their rapid propagation, it is crucial to detect them at edge networks and automatically generate signatures in the early stages of infection. Most existing approaches for automatic signature generation need host information and are thus not applicable for deployment on high-speed network links. In this paper, we propose Hamsa, a network-based automated signature generation system for polymorphic worms which is fast, noise-tolerant and attack-resilient. Essentially, we propose a realistic model to analyze the invariant content of polymorphic worms which allows us to make analytical attack-resilience guarantees for the signature generation algorithm. Evaluation based on a range of polymorphic worms and polymorphic engines demonstrates that Hamsa significantly outperforms Polygraph (J. Newsome et al., 2005) in terms of efficiency, accuracy, and attack resilience.

313 citations

Proceedings Article•DOI•

Reducing tile complexity for self-assembly through temperature programming

Ming-Yang Kao¹, Robert T. Schweller¹•Institutions (1)

Northwestern University¹

22 Jan 2006

TL;DR: This work suggests that temperature change can constitute a natural, dynamic method for providing input to self-assembly systems that is potentially superior to the current technique of designing large tile sets with specific inputs hardwired into the tileset.

Abstract: We consider the tile self-assembly model and how tile complexity can be eliminated by permitting the temperature of the self-assembly system to be adjusted throughout the assembly process. To do this, we propose novel techniques for designing tile sets that permit an arbitrary length m binary number to be encoded into a sequence of O(m) temperature changes such that the tile set uniquely assembles a supertile that precisely encodes the corresponding binary number. As an application, we show how this provides a general tile set of size O(1) that is capable of uniquely assembling essentially any n × n square, where the assembled square is determined by a temperature sequence of length O(log n) that encodes a binary description of n. This yields an important decrease in tile complexity from the required Ω(log n/log log n) for almost all n when the temperature of the system is fixed. We further show that for almost all n, no tile system can simultaneously achieve both o(log n) temperature complexity and o(log n/log log n) tile complexity, showing that both versions of an optimal square building scheme have been discovered. This work suggests that temperature change can constitute a natural, dynamic method for providing input to self-assembly systems that is potentially superior to the current technique of designing large tile sets with specific inputs hardwired into the tileset.

95 citations

Posted Content•

Reducing Tile Complexity for Self-Assembly Through Temperature Programming

Ming-Yang Kao¹, Robert T. Schweller¹•Institutions (1)

Northwestern University¹

05 Feb 2006-arXiv: Computational Complexity

TL;DR: In this paper, the authors consider the tile self-assembly model and show how to reduce tile complexity by allowing the temperature of the self-assembling system to be adjusted throughout the assembly process.

Abstract: We consider the tile self-assembly model and how tile complexity can be eliminated by permitting the temperature of the self-assembly system to be adjusted throughout the assembly process. To do this, we propose novel techniques for designing tile sets that permit an arbitrary length $m$ binary number to be encoded into a sequence of $O(m)$ temperature changes such that the tile set uniquely assembles a supertile that precisely encodes the corresponding binary number. As an application, we show how this provides a general tile set of size O(1) that is capable of uniquely assembling essentially any $n\times n$ square, where the assembled square is determined by a temperature sequence of length $O(\log n)$ that encodes a binary description of $n$. This yields an important decrease in tile complexity from the required $\Omega(\frac{\log n}{\log\log n})$ for almost all $n$ when the temperature of the system is fixed. We further show that for almost all $n$, no tile system can simultaneously achieve both $o(\log n)$ temperature complexity and $o(\frac{\log n}{\log\log n})$ tile complexity, showing that both versions of an optimal square building scheme have been discovered. This work suggests that temperature change can constitute a natural, dynamic method for providing input to self-assembly systems that is potentially superior to the current technique of designing large tile sets with specific inputs hardwired into the tileset.

86 citations

Journal Article•DOI•

Design optimization methods for genomic DNA tiling arrays

Paul Bertone¹, Valery Trifonov¹, Joel Rozowsky¹, Falk Schubert¹, Olof Emanuelsson¹, John E. Karro¹, Ming-Yang Kao, Michael Snyder¹, Mark Gerstein¹•Institutions (1)

Yale University¹

01 Feb 2006-Genome Research

TL;DR: Genomic DNA tiling arrays provide unbiased coverage of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements; this work develops an efficient sequence-similarity method and two algorithms for finding an optimal tile path composed of longer sequence tiles.

Abstract: A recent development in microarray research entails the unbiased coverage, or tiling, of genomic DNA for the large-scale identification of transcribed sequences and regulatory elements. A central issue in designing tiling arrays is that of arriving at a single-copy tile path, as significant sequence cross-hybridization can result from the presence of non-unique probes on the array. Due to the fragmentation of genomic DNA caused by the widespread distribution of repetitive elements, the problem of obtaining adequate sequence coverage increases with the sizes of subsequence tiles that are to be included in the design. This becomes increasingly problematic when considering complex eukaryotic genomes that contain many thousands of interspersed repeats. The general problem of sequence tiling can be framed as finding an optimal partitioning of non-repetitive subsequences over a prescribed range of tile sizes, on a DNA sequence comprising repetitive and non-repetitive regions. Exact solutions to the tiling problem become computationally infeasible when applied to large genomes, but successive optimizations are developed that allow their practical implementation. These include an efficient method for determining the degree of similarity of many oligonucleotide sequences over large genomes, and two algorithms for finding an optimal tile path composed of longer sequence tiles. The first algorithm, a dynamic programming approach, finds an optimal tiling in linear time and space; the second applies a heuristic search to reduce the space complexity to a constant requirement. A Web resource has also been developed, accessible at http://tiling.gersteinlab.org, to generate optimal tile paths from user-provided DNA sequences.
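
The dynamic programming idea mentioned in this abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a simplified objective (maximize the number of bases covered by non-overlapping tiles that avoid repeats), and the names `best_tiling`, `repeat_mask`, `lo`, and `hi` are hypothetical. Its running time is proportional to the sequence length times the width of the tile-size range, which is linear only when that range is treated as a constant.

```python
def best_tiling(repeat_mask, lo, hi):
    """Return (covered_bases, tiles) for a simplified tiling problem.

    repeat_mask[i] is True when base i is repetitive. Tiles are half-open
    intervals (start, end) with lo <= end - start <= hi that contain no
    repetitive base and do not overlap. This objective is an assumption
    made for illustration, not the formulation used in the paper.
    """
    n = len(repeat_mask)
    # run[i] = length of the repeat-free run that ends just before position i
    run = [0] * (n + 1)
    for i in range(1, n + 1):
        run[i] = 0 if repeat_mask[i - 1] else run[i - 1] + 1

    best = [0] * (n + 1)    # best[i] = max coverage using only the first i bases
    choice = [0] * (n + 1)  # length of the tile ending at i, or 0 if none
    for i in range(1, n + 1):
        best[i] = best[i - 1]
        for length in range(lo, min(hi, run[i]) + 1):
            candidate = best[i - length] + length
            if candidate > best[i]:
                best[i] = candidate
                choice[i] = length

    # Trace back the chosen tiles from the end of the sequence.
    tiles = []
    i = n
    while i > 0:
        if choice[i]:
            tiles.append((i - choice[i], i))
            i -= choice[i]
        else:
            i -= 1
    tiles.reverse()
    return best[n], tiles
```

Calling `best_tiling(mask, 3, 4)` on a mask with two repeat-free runs returns the total coverage together with the tile intervals, so the result can be checked directly against the mask.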

60 citations

Proceedings Article•DOI•

Reverse Hashing for High-Speed Network Monitoring: Algorithms, Evaluation, and Applications

Robert T. Schweller¹, Zhichun Li¹, Yan Chen¹, Yan Gao¹, Ashish Gupta¹, Yin Zhang², Peter A. Dinda, Ming-Yang Kao, Gokhan Memik•Institutions (2)

Northwestern University¹, University of Texas at Austin²

23 Apr 2006

TL;DR: Both the analytical and experimental results show that the proposed reverse hashing scheme is able to achieve online traffic monitoring and accurate change/intrusion detection over massive data streams on high speed links, all in a manner that scales to large key space size.

Abstract: A key function for network traffic monitoring and analysis is the ability to perform aggregate queries over multiple data streams. Change detection is an important primitive which can be extended to construct many aggregate queries. The recently proposed sketches (Krishnamurthy, 2003) are among the very few that can detect heavy changes online for high speed links, and thus support various aggregate queries in both temporal and spatial domains. However, it does not preserve the keys (e.g., source IP address) of flows, making it difficult to reconstruct the desired set of anomalous keys. In an earlier abstract we proposed a framework for a reversible sketch data structure that offers hope for efficient extraction of keys (Schweller, 2004). However, this scheme is only able to detect a single heavy change key and places restrictions on the statistical properties of the key space. To address these challenges, we propose an efficient reverse hashing scheme to infer the keys of culprit flows from reversible sketches. There are two phases. The first operates online, recording the packet stream in a compact representation with negligible extra memory and few extra memory accesses. Our prototype single FPGA board implementation can achieve a throughput of over 16 Gbps for 40-byte-packet streams (the worst case). The second phase identifies heavy changes and their keys from the representation in nearly real time. We evaluate our scheme using traces from large edge routers with OC-12 or higher links. Both the analytical and experimental results show that we are able to achieve online traffic monitoring and accurate change/intrusion detection over massive data streams on high speed links, all in a manner that scales to large key space size. To the best of our knowledge, our system is the first to achieve these properties simultaneously.
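
To make the key-recovery problem in this abstract concrete, here is a minimal sketch of sketch-based change detection in the style of a k-ary sketch. It is not the reverse hashing scheme of the paper: the class `KarySketch`, the helper `heavy_changes`, the SHA-256 based hashing, and all parameter values are assumptions chosen for illustration. Note that the detector needs an explicit list of candidate keys, because the sketch itself stores only counters; removing that requirement is what the paper addresses.

```python
import hashlib
from statistics import median


class KarySketch:
    """Small k-ary style sketch: `rows` hash rows of `buckets` counters."""

    def __init__(self, rows=5, buckets=1024):
        self.rows = rows
        self.buckets = buckets
        self.table = [[0.0] * buckets for _ in range(rows)]
        self.total = 0.0

    def _bucket(self, row, key):
        # Hypothetical hash choice: one SHA-256 digest per (row, key) pair.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets

    def update(self, key, value):
        for row in range(self.rows):
            self.table[row][self._bucket(row, key)] += value
        self.total += value

    def estimate(self, key):
        # Per-row estimate corrected for the average bucket load,
        # then the median across rows.
        k = self.buckets
        per_row = [
            (self.table[row][self._bucket(row, key)] - self.total / k) / (1 - 1 / k)
            for row in range(self.rows)
        ]
        return median(per_row)

    def minus(self, other):
        # Sketches are linear, so two intervals can be subtracted bucket-wise.
        diff = KarySketch(self.rows, self.buckets)
        diff.table = [
            [a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.table, other.table)
        ]
        diff.total = self.total - other.total
        return diff


def heavy_changes(diff, candidate_keys, threshold):
    """Return the candidate keys whose estimated change exceeds the threshold."""
    return [key for key in candidate_keys if abs(diff.estimate(key)) > threshold]
```

A typical use is to build one sketch per time interval, subtract them with `minus`, and pass the difference to `heavy_changes` together with whatever candidate keys are available.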

47 citations

Book Chapter•DOI•

Linear-Time haplotype inference on pedigrees without recombinations

Mee Yee Chan¹, Wun-Tat Chan¹, Francis Y. L. Chin¹, Stanley P. Y. Fung², Ming-Yang Kao³•Institutions (3)

University of Hong Kong¹, University of Leicester², Northwestern University³

11 Sep 2006

TL;DR: A linear-time algorithm, which is optimal, is presented to solve the haplotype inference problem for pedigree data when there are no recombinations and the pedigree has no mating loops.

Abstract: In this paper, a linear-time algorithm, which is optimal, is presented to solve the haplotype inference problem for pedigree data when there are no recombinations and the pedigree has no mating loops. The approach is based on the use of graphs to capture SNP, Mendelian and parity constraints of the given pedigree.

26 citations

Book Chapter•DOI•

An approximation algorithm for a bottleneck traveling salesman problem

Ming-Yang Kao¹, Manan Sanghi¹•Institutions (1)

Northwestern University¹

29 May 2006

TL;DR: An approximation algorithm is provided for a bottleneck version of the Traveling Salesman Problem by exploiting the underlying geometry in a novel fashion, achieving an approximation ratio of (2+γ), where γ ≥ f(x)/g(x) ≥ 1/γ for all x; when f(x)=g(x), the approximation ratio is 3.

Abstract: Consider a truck running along a road. It picks up a load Li at point βi and delivers it at αi, carrying at most one load at a time. The speed on the various parts of the road in one direction is given by f(x) and that in the other direction is given by g(x). Minimizing the total time spent to deliver loads L1,...,Ln is equivalent to solving the Traveling Salesman Problem (TSP) where the cities correspond to the loads Li with coordinates (αi, βi) and the distance from Li to Lj is given by $\int^{\beta_j}_{\alpha_i} f(x)dx$ if βj ≥ αi and by $\int^{\alpha_i}_{\beta_j} g(x)dx$ if βj < αi. This case of TSP is polynomially solvable with significant real-world applications. Gilmore and Gomory obtained a polynomial time solution for this TSP [6]. However, the bottleneck version of the problem (BTSP) was left open. Recently, Vairaktarakis showed that BTSP with this distance metric is NP-complete [10]. We provide an approximation algorithm for this BTSP by exploiting the underlying geometry in a novel fashion. This also allows for an alternate analysis of Gilmore and Gomory's polynomial time algorithm for the TSP. We achieve an approximation ratio of (2+γ) where $\gamma \geq \frac{f(x)}{g(x)} \geq \frac{1}{\gamma} \; \forall x$. Note that when f(x)=g(x), the approximation ratio is 3.
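
The distance function defined in this abstract can be written out directly. The sketch below is only an illustration under stated assumptions: loads are represented as `(alpha, beta)` pairs, integrals are approximated with a midpoint rule, and `brute_force_btsp` enumerates all tours, which is feasible only for a handful of loads. It is not the approximation algorithm of the paper, and all function names are hypothetical.

```python
from itertools import permutations


def integrate(func, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (k + 0.5) * h) for k in range(steps)) * h


def distance(load_i, load_j, f, g):
    """Cost of going from the delivery point of load_i to the pickup of load_j.

    Each load is an (alpha, beta) pair: picked up at beta, delivered at alpha.
    """
    alpha_i, _ = load_i
    _, beta_j = load_j
    if beta_j >= alpha_i:
        return integrate(f, alpha_i, beta_j)
    return integrate(g, beta_j, alpha_i)


def bottleneck(tour, f, g):
    """Largest single-step distance along a cyclic tour of loads."""
    return max(
        distance(tour[k], tour[(k + 1) % len(tour)], f, g)
        for k in range(len(tour))
    )


def brute_force_btsp(loads, f, g):
    """Exhaustively find a tour minimizing the bottleneck (tiny inputs only)."""
    first, rest = loads[0], loads[1:]
    best_tour = min(
        ([first, *perm] for perm in permutations(rest)),
        key=lambda tour: bottleneck(tour, f, g),
    )
    return bottleneck(best_tour, f, g), best_tour
```

Replacing `max` with `sum` in `bottleneck` gives the ordinary tour length, which is the quantity the polynomially solvable TSP variant minimizes.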

8 citations

Journal Article•

Flexible Word Design and Graph Labeling

Ming-Yang Kao, Manan Sanghi, Robert T. Schweller

01 Jan 2006-Lecture Notes in Computer Science

TL;DR: This work considers a generalization of the code word design problem in which an input graph is given which must be labeled with equal length binary strings of minimal length such that the Hamming distance is small between words of adjacent nodes and large between words of non-adjacent nodes.

Abstract: Motivated by emerging applications for DNA code word design, we consider a generalization of the code word design problem in which an input graph is given which must be labeled with equal length binary strings of minimal length such that the Hamming distance is small between words of adjacent nodes and large between words of non-adjacent nodes. For general graphs we provide algorithms that bound the word length with respect to either the maximum degree of any vertex or the number of edges in either the input graph or its complement. We further provide multiple types of recursive, deterministic algorithms for trees and forests, and provide an improvement for forests that makes use of randomization.
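
The labeling condition in this abstract can be checked mechanically. The following sketch assumes two hypothetical thresholds, `near` (maximum Hamming distance allowed between adjacent nodes) and `far` (minimum distance required between non-adjacent nodes); the abstract does not state the exact thresholds, so these parameters are illustrative. The code only verifies a given labeling and does not construct one.

```python
from itertools import combinations


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_flexible_labeling(labels, edges, near, far):
    """Check a graph labeling against the two distance thresholds.

    labels maps each node to its binary word; edges is an iterable of node
    pairs. Adjacent nodes must be within distance `near`, and non-adjacent
    nodes must be at distance at least `far`.
    """
    edge_set = {frozenset(edge) for edge in edges}
    for u, v in combinations(labels, 2):
        d = hamming(labels[u], labels[v])
        if frozenset((u, v)) in edge_set:
            if d > near:
                return False
        elif d < far:
            return False
    return True
```

For example, a path on three nodes labeled `0000`, `0011`, `1111` satisfies the check with `near=2` and `far=3`, since only the two end nodes are non-adjacent.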

5 citations

Book Chapter•DOI•

Flexible word design and graph labeling

Ming-Yang Kao¹, Manan Sanghi¹, Robert T. Schweller¹•Institutions (1)

Northwestern University¹

18 Dec 2006

TL;DR: In this article, a generalization of the code word design problem is considered, in which an input graph is given which must be labeled with equal length binary strings of minimal length such that the Hamming distance is small between words of adjacent nodes and large between words of non-adjacent nodes.

5 citations

Journal Article•

A 6-Approximation Algorithm for Computing Smallest Common AoN-Supertree with Application to the Reconstruction of Glycan Trees

Kiyoko F. Aoki-Kinoshita, Minoru Kanehisa, Ming-Yang Kao, Xiang-Yang Li, Weizhao Wang

01 Jan 2006-Lecture Notes in Computer Science

TL;DR: In this article, a polynomial-time greedy algorithm with approximation ratio 6 is proposed for the smallest common AoN-supertree problem, which seeks the smallest possible node-labeled rooted tree (denoted LCST) such that every tree Ti in T is an all-or-nothing subtree (AoN-subtree) of LCST.

Abstract: A node-labeled rooted tree T (with root r) is an all-or-nothing subtree (called AoN-subtree) of a node-labeled rooted tree T' if (1) T is a subtree of the tree rooted at some node u (with the same label as r) of T', (2) for each internal node v of T, all the neighbors of v in T' are the neighbors of v in T. Tree T' is then called an AoN-supertree of T. Given a set T = {T1, T2, ..., Tn} of n node-labeled rooted trees, smallest common AoN-supertree problem seeks the smallest possible node-labeled rooted tree (denoted as LCST) such that every tree Ti in T is an AoN-subtree of LCST. It generalizes the smallest superstring problem and it has applications in glycobiology. We present a polynomial-time greedy algorithm with approximation ratio 6.

1 citation

Book Chapter•DOI•

A 6-approximation algorithm for computing smallest common aon-supertree with application to the reconstruction of glycan trees

Kiyoko F. Aoki-Kinoshita¹, Minoru Kanehisa², Ming-Yang Kao³, Xiang-Yang Li⁴, Weizhao Wang⁴•Institutions (4)

Soka University of America¹, University of Tokyo², Northwestern University³, Illinois Institute of Technology⁴

18 Dec 2006

TL;DR: The smallest common AoN-supertree problem seeks the smallest possible node-labeled rooted tree (denoted as ${\textbf{LCST}}$) such that every tree Ti in ${\mathcal {T}}$ is an AoN-subtree of ${\textbf{LCST}}$.

Abstract: A node-labeled rooted tree T (with root r) is an all-or-nothing subtree (called AoN-subtree) of a node-labeled rooted tree T′ if (1) T is a subtree of the tree rooted at some node u (with the same label as r) of T′, (2) for each internal node v of T, all the neighbors of v in T′ are the neighbors of v in T. Tree T′ is then called an AoN-supertree of T. Given a set ${\mathcal {T}}=\{{T}_1,{T}_2,\cdots, {T}_n\}$ of $n$ node-labeled rooted trees, smallest common AoN-supertree problem seeks the smallest possible node-labeled rooted tree (denoted as ${\textbf{LCST}}$) such that every tree Ti in ${\mathcal {T}}$ is an AoN-subtree of ${\textbf{LCST}}$. It generalizes the smallest superstring problem and it has applications in glycobiology. We present a polynomial-time greedy algorithm with approximation ratio 6.

Posted Content•

Randomized Fast Design of Short DNA Words

Ming-Yang Kao¹, Manan Sanghi², Robert T. Schweller³•Institutions (3)

Northwestern University¹, Microsoft², University of Texas–Pan American³

19 Jan 2006-arXiv: Data Structures and Algorithms

TL;DR: In this article, a natural optimization formulation of the DNA code design problem is proposed, in which the goal is to design n strings that satisfy a given set of constraints while minimizing the length of the strings.

Abstract: We consider the problem of efficiently designing sets (codes) of equal-length DNA strings (words) that satisfy certain combinatorial constraints. This problem has numerous motivations including DNA computing and DNA self-assembly. Previous work has extended results from coding theory to obtain bounds on code size for new biologically motivated constraints and has applied heuristic local search and genetic algorithm techniques for code design. This paper proposes a natural optimization formulation of the DNA code design problem in which the goal is to design n strings that satisfy a given set of constraints while minimizing the length of the strings. For multiple sets of constraints, we provide high-probability algorithms that run in time polynomial in n and any given constraint parameters, and output strings of length within a constant factor of the optimal. To the best of our knowledge, this work is the first to consider this type of optimization problem in the context of DNA code design.
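
A toy version of randomized code design helps show what "high-probability" means here. The sketch below handles only one constraint, a minimum pairwise Hamming distance, and simply resamples until the constraint holds; the paper covers several constraint sets and bounds the word length, neither of which this example attempts. The function name `random_dna_code` and its parameters are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def random_dna_code(n, length, min_distance, rng=None, max_tries=1000):
    """Sample n DNA words of the given length with pairwise distance >= min_distance.

    Words are drawn uniformly at random over A, C, G, T and the whole set is
    resampled until every pair satisfies the distance constraint.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        words = [
            "".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(n)
        ]
        if all(hamming(a, b) >= min_distance for a, b in combinations(words, 2)):
            return words
    raise RuntimeError("no valid code found; try a larger word length")
```

When the word length is generous relative to `min_distance`, a random set passes the check on the first few tries; shrinking the length makes failures more frequent, which is the trade-off the optimization formulation in the abstract is about.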

The 6th International Workshop on Algorithms in Bioinformatics

Man-Yee Chan, Joseph Wun-Tat Chan, Francis Y. L. Chin, Stanley P. Y. Fung, Ming-Yang Kao

01 Jan 2006

Session 2A-Approximation Algorithms-A 6-Approximation Algorithm for Computing Smallest Common AoN-Supertree with Application to the Reconstruction of Glycan Trees

Kiyoko F. Aoki-Kinoshita, Minoru Kanehisa, Ming-Yang Kao, Xiang-Yang Li, Weizhao Wang

01 Jan 2006

TL;DR: In this paper, a polynomial-time greedy algorithm with approximation ratio 6 is presented for the smallest common AoN-supertree problem, which generalizes the smallest superstring problem.