
Showing papers by "Wing-Kin Sung published in 2002"


Book Chapter
15 Aug 2002
TL;DR: This paper initiates the study of constructing compressed suffix arrays directly from text. The main contribution is a new construction algorithm that uses only O(n) bits of working memory and, more importantly, keeps the time complexity the same as before, i.e., O(n log n).
Abstract: With the first human DNA decoded into a sequence of about 2.8 billion base pairs, much biological research has centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing a compressed suffix array is still not an easy task, because we still have to compute the suffix array first, which needs a working memory of O(n log n) bits (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from text. The main contribution is a new construction algorithm that uses only O(n) bits of working memory and, more importantly, keeps the time complexity the same as before, i.e., O(n log n).
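As a point of reference for the space figures above, here is a minimal sketch of the classical, uncompressed suffix array built by plain comparison sorting; storing its n integer positions is exactly the O(n log n)-bit cost that compressed suffix arrays avoid. This is a baseline illustration only, not the paper's O(n)-bit construction algorithm.

```python
# Minimal baseline: build a plain (uncompressed) suffix array by sorting
# all suffixes lexicographically. Storing the n integer positions takes
# O(n log n) bits, which is the space bottleneck the paper's O(n)-bit
# construction avoids. Simple, not efficient: sorting string slices like
# this is far slower than dedicated suffix array algorithms.

def build_suffix_array(text: str) -> list[int]:
    """Return the starting positions of the suffixes of `text` in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

sa = build_suffix_array("banana$")
print(sa)  # [6, 5, 3, 1, 0, 4, 2]
```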

77 citations


Journal Article
TL;DR: A new decomposition theorem is presented for maximum weight bipartite matchings and used to design a faster matching algorithm; given a maximum weight matching of G, the weight of a maximum weight matching of G - {u} can be computed for all nodes u in O(W) time.
Abstract: Let G be a bipartite graph with positive integer weights on the edges and without isolated nodes. Let n, N, and W be the node count, the largest edge weight, and the total weight of G. Let $k(x, y) = \log x / \log(x^2/y)$. We present a new decomposition theorem for maximum weight bipartite matchings and use it to design an $O(\sqrt{n}W / k(n, W/N))$-time algorithm for computing a maximum weight matching of G. This algorithm bridges a long-standing gap between the best known time complexity of computing a maximum weight matching and that of computing a maximum cardinality matching. Given G and a maximum weight matching of G, we can further compute the weight of a maximum weight matching of G - {u} for all nodes u in O(W) time.
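For concreteness, here is a baseline sketch of the maximum weight bipartite matching problem solved with SciPy's Hungarian-method routine, an O(n^3) approach rather than the paper's algorithm. The weight matrix is made up, and absent edges are modeled as weight 0, which is harmless here because all real edge weights are positive.

```python
# Baseline illustration of maximum weight bipartite matching via the
# Hungarian method (SciPy's linear_sum_assignment with maximize=True).
# Not the paper's algorithm; just a standard solver for the same problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical weight matrix: rows = left nodes, columns = right nodes;
# a 0 entry stands for a missing edge.
W = np.array([[4, 1, 0],
              [2, 0, 3],
              [0, 5, 1]])

rows, cols = linear_sum_assignment(W, maximize=True)
# Drop zero-weight pairs so only genuine edges remain in the matching.
matched = [(r, c) for r, c in zip(rows, cols) if W[r, c] > 0]
print(matched, "total weight:", W[rows, cols].sum())  # weight 12 here
```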

60 citations


Journal Article
01 Nov 2002
TL;DR: This paper proposes a metric, based on the popularity of products and the relative importance of product attribute values, to evaluate the quality of a catalog organization, and develops an efficient greedy algorithm, GENCAT, that produces better catalog organizations under this metric.
Abstract: A good online catalog is crucial to the success of an e-commerce web site. Traditionally, an online catalog is built mainly by hand; to what extent this can be automated is a challenging problem. Recently, there have been investigations into how to reorganize an existing online catalog based on some criteria, but none of them has addressed the problem of organizing an online catalog automatically from scratch. This paper attempts to tackle this problem. We model an online catalog organization as a decision tree structure and propose a metric, based on the popularity of products and the relative importance of product attribute values, to evaluate the quality of a catalog organization. The problem is then formulated as a decision tree construction problem. Although traditional decision tree algorithms, such as C4.5, can be used to generate an online catalog organization, the resulting catalog is generally not good under our metric. An efficient greedy algorithm (GENCAT) is thus developed, and experimental results show that GENCAT produces better catalog organizations under our metric.
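The abstract does not spell out GENCAT's internals, so the following is a hypothetical sketch of a greedy catalog builder in the same spirit: at each node it picks the attribute whose value partition minimizes an assumed popularity-weighted navigation-cost proxy. The metric, attribute names, and function names here are all illustrative assumptions, not the paper's actual definitions.

```python
# Generic greedy decision-tree catalog builder (illustrative sketch).
# Each product is a dict of attribute -> value plus a 'popularity' weight.
# At every node we pick the attribute whose value partition gives the
# smallest popularity-weighted expected branch size, a stand-in for the
# paper's metric (which also weights attribute values by importance).

def split_cost(products, attr):
    groups = {}
    for p in products:
        groups.setdefault(p[attr], []).append(p)
    total_pop = sum(p['popularity'] for p in products)
    # Expected size of the branch a random (popularity-weighted) shopper enters.
    return sum(sum(q['popularity'] for q in g) / total_pop * len(g)
               for g in groups.values())

def build_catalog(products, attrs):
    if len(products) <= 1 or not attrs:
        return products  # leaf: just list the remaining products
    best = min(attrs, key=lambda a: split_cost(products, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for p in products:
        branches.setdefault((best, p[best]), []).append(p)
    return {k: build_catalog(v, rest) for k, v in branches.items()}

products = [
    {'type': 'laptop', 'brand': 'X', 'popularity': 9},
    {'type': 'laptop', 'brand': 'Y', 'popularity': 3},
    {'type': 'phone',  'brand': 'X', 'popularity': 5},
]
print(build_catalog(products, ['type', 'brand']))
```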

13 citations


Book Chapter
17 Sep 2002
TL;DR: This paper proves that the reported dramatic drop in performance is attributable to algorithmic artifacts, and presents instead an algorithm for sequence reconstruction under hybridization noise that exhibits graceful degradation of performance as the error rate increases.
Abstract: DNA sequencing-by-hybridization (SBH) is a powerful potential alternative to current sequencing by electrophoresis. Different SBH methods have been compared under the hypothesis of error-free hybridization, but both false negatives and false positives are likely to occur in practice. Under the assumption of random independent hybridization errors, Doi and Imai [3] recently concluded that the algorithms of [15], which are asymptotically optimal in the error-free case, cannot be successfully adapted to noisy conditions. In this paper we prove that the reported dramatic drop in performance is attributable to algorithmic artifacts, and we present instead an algorithm for sequence reconstruction under hybridization noise that exhibits graceful degradation of performance as the error rate increases. As a downside, the computational cost of sequence reconstruction rises noticeably under noisy conditions.
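For orientation, the idealized error-free SBH setting can be sketched as a greedy walk over the probe spectrum; this is the textbook reconstruction that breaks down under hybridization errors, not the paper's noise-tolerant algorithm. The k-mer encoding and function names below are assumptions for illustration.

```python
# Idealized, error-free SBH reconstruction by greedy extension: given the
# multiset of all k-mers of an unknown sequence (the "spectrum") and its
# first k-mer, extend one base at a time whenever exactly one spectrum
# probe matches the current (k-1)-suffix. With false positives/negatives
# in the spectrum, this naive loop fails, which is the regime the paper's
# algorithm is designed to handle.
from collections import Counter

def reconstruct(spectrum, start, length):
    probes = Counter(spectrum)
    seq = start
    probes[start] -= 1
    while len(seq) < length:
        suffix = seq[-(len(start) - 1):]
        candidates = [b for b in "ACGT" if probes[suffix + b] > 0]
        if len(candidates) != 1:
            return None  # ambiguous or missing probe: reconstruction fails
        seq += candidates[0]
        probes[suffix + candidates[0]] -= 1
    return seq

kmers = ["ACG", "CGT", "GTA", "TAC", "ACC"]
print(reconstruct(kmers, "ACG", 7))  # ACGTACC
```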

7 citations


Posted Content
TL;DR: This paper gives the first subquadratic-time algorithm for finding the non-shared edges of two phylogenies, which is then used to speed up the existing approximation algorithm for the NNI distance.
Abstract: The number of non-shared edges of two phylogenies is a basic measure of the dissimilarity between the phylogenies. The non-shared edges are also the building block for approximating a more sophisticated metric called the nearest neighbor interchange (NNI) distance. In this paper, we give the first subquadratic-time algorithm for finding the non-shared edges, which are then used to speed up the existing approximation algorithm for the NNI distance from $O(n^2)$ time to $O(n \log n)$ time. Another popular distance metric for phylogenies is the subtree transfer (STT) distance. Previous work on computing the STT distance considered degree-3 trees only. We give an approximation algorithm for the STT distance for degree-$d$ trees with arbitrary $d$ and with generalized STT operations.
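A naive way to make "non-shared edges" concrete: each internal edge induces a bipartition (split) of the leaf set, and the non-shared edges are those whose splits appear in only one of the two trees. The sketch below counts them by direct set comparison for rooted trees encoded as nested tuples; this is a simple baseline under an assumed tree encoding, not the paper's subquadratic algorithm.

```python
# Naive baseline for counting non-shared edges between two phylogenies:
# collect the leaf set below every internal node (each corresponds to an
# edge's split), then take the symmetric difference of the two split sets.
# Trees are assumed rooted and given as nested tuples with string leaves.

def splits(tree):
    """Collect the leaf set below every internal node as frozensets."""
    out = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        out.add(below)
        return below
    all_leaves = walk(tree)
    out.discard(all_leaves)  # the root's "split" is trivial
    return out

def non_shared_edges(t1, t2):
    s1, s2 = splits(t1), splits(t2)
    return len(s1 ^ s2)  # splits present in exactly one tree

t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
print(non_shared_edges(t1, t2))  # 4
```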

1 citation