Author

Haodong Hu

Bio: Haodong Hu is an academic researcher from Stony Brook University. The author has contributed to research in topics including Medicine and Upper and lower bounds. The author has an h-index of 8 and has co-authored 9 publications receiving 232 citations. Previous affiliations of Haodong Hu include Shanghai University of Finance and Economics & Microsoft.

Papers
Journal ArticleDOI
TL;DR: This article gives the first adaptive packed-memory array (APMA), which automatically adjusts to the input pattern; for sequential insertions it makes four times fewer element moves per insertion than the traditional PMA, with running times more than seven times faster.
Abstract: The packed-memory array (PMA) is a data structure that maintains a dynamic set of N elements in sorted order in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty spaces or gaps among the elements so that only a small number of elements need to be shifted around on an insert or delete. Because the elements are stored physically in sorted order in memory or on disk, the PMA can be used to support extremely efficient range queries. Specifically, the cost to scan L consecutive elements is O(1 + L/B) memory transfers. This article gives the first adaptive packed-memory array (APMA), which automatically adjusts to the input pattern. Like the traditional PMA, any pattern of updates costs only O(log² N) amortized element moves and O(1 + (log² N)/B) amortized memory transfers per update. However, the APMA performs even better on many common input distributions, achieving only O(log N) amortized element moves and O(1 + (log N)/B) amortized memory transfers. The article analyzes sequential inserts, where the insertions are to the front of the APMA; hammer inserts, where the insertions “hammer” on one part of the APMA; random inserts, where the insertions are after random elements in the APMA; and bulk inserts, where for constant α ∈ [0, 1], N^α elements are inserted after random elements in the APMA. The article then gives simulation results that are consistent with the asymptotic bounds. For sequential insertions of roughly 1.4 million elements, the APMA has four times fewer element moves per insertion than the traditional PMA and running times that are more than seven times faster.
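To make the gap idea concrete, here is a minimal PMA-style sketch in Python. It is deliberately simplified: it re-spreads the whole array on every insert instead of rebalancing the smallest acceptable window, and the density cap of 1/2 is an arbitrary illustrative choice, not the paper's thresholds.

```python
import bisect

class SimplePMA:
    """A packed-memory-array sketch: sorted elements with interspersed gaps."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity      # None marks an empty gap

    def _elements(self):
        return [x for x in self.slots if x is not None]

    def _spread(self, elems):
        # Re-spread elements evenly across the array, recreating the gaps.
        # The real PMA instead rebalances the smallest window whose density
        # is still acceptable, which is what keeps moves polylogarithmic.
        n = len(self.slots)
        self.slots = [None] * n
        for i, x in enumerate(elems):
            self.slots[i * n // len(elems)] = x

    def insert(self, key):
        elems = self._elements()
        if len(elems) + 1 > len(self.slots) // 2:   # density cap of 1/2
            self.slots = [None] * (2 * len(self.slots))
        bisect.insort(elems, key)
        self._spread(elems)

    def scan(self, lo, hi):
        # Range query: one left-to-right pass over physically sorted storage,
        # i.e. O(1 + L/B) memory transfers for L reported elements.
        return [x for x in self.slots if x is not None and lo <= x <= hi]

pma = SimplePMA()
for k in [5, 1, 9, 3, 7]:
    pma.insert(k)
print(pma.scan(2, 8))   # [3, 5, 7]
```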

53 citations

Proceedings ArticleDOI
11 Jan 2004
TL;DR: This work studies the problem of sorting integer sequences and permutations by length-weighted reversals, and gives polynomial-time algorithms to determine the optimal reversal sequence for a restricted but interesting class of sequences and cost functions.
Abstract: We study the problem of sorting integer sequences and permutations by length-weighted reversals. We consider a wide class of cost functions, namely f(l) = l^α for all α ≥ 0, where l is the length of the reversed subsequence. We present tight or nearly tight upper and lower bounds on the worst-case cost of sorting by reversals. Then we develop algorithms to approximate the optimal cost to sort a given input. Furthermore, we give polynomial-time algorithms to determine the optimal reversal sequence for a restricted but interesting class of sequences and cost functions. Our results have direct application in computational biology to the field of comparative genomics.
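As a worked example of the cost model, the sketch below applies a hand-picked reversal sequence to a permutation and totals its length-weighted cost f(l) = l^α. The reversals are chosen by inspection; this is not the paper's approximation algorithm.

```python
def reverse_segment(seq, i, j):
    """Reverse the subsequence seq[i:j] in place; its length is j - i."""
    seq[i:j] = seq[i:j][::-1]
    return j - i

def cost_of_reversals(seq, moves, alpha):
    """Apply each reversal (i, j) and total the length-weighted cost l**alpha."""
    seq = list(seq)
    total = 0.0
    for i, j in moves:
        l = reverse_segment(seq, i, j)
        total += l ** alpha
    return seq, total

# Sorting [2, 1, 4, 3] takes two length-2 reversals; under f(l) = l^2
# (alpha = 2) the total cost is 2^2 + 2^2 = 8.
seq, cost = cost_of_reversals([2, 1, 4, 3], [(0, 2), (2, 4)], alpha=2)
print(seq, cost)   # [1, 2, 3, 4] 8.0
```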

43 citations

Proceedings ArticleDOI
11 Oct 2003
TL;DR: It is shown that for a multilevel memory hierarchy, a simple cache-oblivious structure almost replicates the performance of an optimal parameterized k-level DAM structure, and it is demonstrated that as k grows, the search costs of the optimal k-level DAM search structure and the optimal cache-oblivious search structure rapidly converge.
Abstract: Tight bounds on the cost of cache-oblivious searching are proved. It is shown that no cache-oblivious search structure can guarantee that a search performs fewer than lg e · log_B N block transfers between any two levels of the memory hierarchy. This lower bound holds even if all of the block sizes are limited to be powers of 2. A modified version of the van Emde Boas layout is proposed, whose expected number of block transfers between any two levels of the memory hierarchy is arbitrarily close to [lg e + O(lg lg B / lg B)] log_B N + O(1). This factor approaches lg e ≈ 1.443 as B increases. The expectation is taken over the random placement of the first element of the structure in memory. As searching in the disk access model (DAM) can be performed in log_B N + 1 block transfers, this result shows a separation between the 2-level DAM and cache-oblivious memory-hierarchy models. By extending the DAM model to k levels, multilevel memory hierarchies can be modeled. It is shown that as k grows, the search costs of the optimal k-level DAM search structure and of the optimal cache-oblivious search structure rapidly converge. This demonstrates that for a multilevel memory hierarchy, a simple cache-oblivious structure almost replicates the performance of an optimal parameterized k-level DAM structure.
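The sketch below builds the classic (unmodified) van Emde Boas layout for a complete binary search tree, the starting point for the modified layout the paper analyzes: split the tree at half its height, lay out the top subtree, then each bottom subtree, recursively. The random placement and the paper's modification are not reproduced here.

```python
def veb_order(height):
    """BFS indices (root = 1) of a complete binary tree, listed in the
    order the van Emde Boas layout stores them."""
    def lay_out(root, h):
        if h == 1:
            return [root]
        top_h = h // 2                 # height of the top recursive subtree
        bot_h = h - top_h              # height of each bottom subtree
        order = lay_out(root, top_h)
        # BFS indices of the top subtree's leaves, left to right
        top_leaves = range(root << (top_h - 1), (root + 1) << (top_h - 1))
        for leaf in top_leaves:        # bottom subtrees hang off their children
            order += lay_out(2 * leaf, bot_h)
            order += lay_out(2 * leaf + 1, bot_h)
        return order
    return lay_out(1, height)

# Height 3: root first, then the left subtree [2, 4, 5], then the right [3, 6, 7].
print(veb_order(3))   # [1, 2, 4, 5, 3, 6, 7]
```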

34 citations

Proceedings ArticleDOI
26 Jun 2006
TL;DR: This paper gives the first adaptive packed-memory array (APMA), which automatically adjusts to the input pattern; for sequential insertions it makes four times fewer element moves per insertion than the traditional PMA, with running times more than seven times faster.
Abstract: The packed-memory array (PMA) is a data structure that maintains a dynamic set of N elements in sorted order in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty spaces or gaps among the elements so that only a small number of elements need to be shifted around on an insert or delete. Because the elements are stored physically in sorted order in memory or on disk, the PMA can be used to support extremely efficient range queries. Specifically, the cost to scan L consecutive elements is O(1 + L/B) memory transfers. This paper gives the first adaptive packed-memory array (APMA), which automatically adjusts to the input pattern. Like the original PMA, any pattern of updates costs only O(log² N) amortized element moves and O(1 + (log² N)/B) amortized memory transfers per update. However, the APMA performs even better on many common input distributions, achieving only O(log N) amortized element moves and O(1 + (log N)/B) amortized memory transfers. The paper analyzes sequential inserts, where the insertions are to the front of the APMA; hammer inserts, where the insertions "hammer" on one part of the APMA; random inserts, where the insertions are after random elements in the APMA; and bulk inserts, where for constant α ∈ [0, 1], N^α elements are inserted after random elements in the APMA. The paper then gives simulation results that are consistent with the asymptotic bounds. For sequential insertions of roughly 1.4 million elements, the APMA has four times fewer element moves per insertion than the traditional PMA and running times that are more than seven times faster.
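The adaptive part, rebalancing unevenly so that predicted insertion hot spots keep more gaps, can be illustrated with a toy spreading routine. The weighting rule below is an invented stand-in for exposition only; the APMA's actual predictor and rebalance thresholds are more involved than this.

```python
def uneven_spread(elems, capacity, hot_index):
    """Place sorted elems into `capacity` slots, budgeting a larger share
    of the gaps near hot_index, where the next inserts are predicted."""
    slots = [None] * capacity
    n = len(elems)
    spare = capacity - n                      # total gaps to hand out
    # weights[i]: relative number of gaps to leave after element i
    # (the factor 3 is arbitrary, purely for illustration)
    weights = [3 if abs(i - hot_index) <= 1 else 1 for i in range(n)]
    total = sum(weights)
    pos = 0
    for i, x in enumerate(elems):
        slots[pos] = x
        pos += 1 + spare * weights[i] // total   # element plus its gap share
    return slots

# Sequential inserts hammer the front, so predict hot_index = 0: the front
# of the array ends up with most of the empty slots.
print(uneven_spread([10, 20, 30, 40], capacity=12, hot_index=0))
# [10, None, None, None, 20, None, None, None, 30, None, 40, None]
```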

33 citations

Book ChapterDOI
05 Jul 2004
TL;DR: The main result in this paper is an optimal polynomial-time algorithm for sorting circular 0/1 sequences when the cost function is additive.
Abstract: We consider the problem of sorting linear and circular permutations and 0/1 sequences by reversals in a length-sensitive cost model. We extend the results on sorting by length-weighted reversals in two directions: we consider the signed case for linear sequences and also the signed and unsigned cases for circular sequences. We give lower and upper bounds as well as guaranteed approximation ratios for these three cases. The main result in this paper is an optimal polynomial-time algorithm for sorting circular 0/1 sequences when the cost function is additive.
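For the circular case, a reversal may wrap around the end of the sequence. The helper below performs such a wrapped reversal on a circular 0/1 sequence and charges it the additive cost f(l) = l; it is a worked example of the cost model only, not the paper's optimal algorithm.

```python
def circular_reverse(seq, start, length):
    """Reverse `length` consecutive positions of a circular sequence,
    starting at `start` and wrapping past the end if needed; the
    additive cost model charges f(l) = l for the move."""
    n = len(seq)
    idx = [(start + k) % n for k in range(length)]
    vals = [seq[i] for i in idx][::-1]
    for i, v in zip(idx, vals):
        seq[i] = v
    return length

s = [1, 0, 0, 1, 1, 0]
cost = circular_reverse(s, start=4, length=4)   # touches positions 4, 5, 0, 1
print(s, cost)   # [0, 1, 0, 1, 0, 1] 4
```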

31 citations


Cited by

Proceedings ArticleDOI
11 Jun 2020
TL;DR: ALEX is a learned index designed for read-write workloads containing a mix of point lookups, short range queries, inserts, updates, and deletes; it addresses the limitation of the original learned index of Kraska et al., which supports only static, read-only workloads.
Abstract: Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+ tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X on performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+ trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.
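The core learned-index mechanism that ALEX builds on, predicting a key's position with a model and correcting the guess with a bounded local search, can be sketched in a few lines. This toy fits one linear model over the whole key space and supports lookups only; ALEX's gapped arrays, adaptive tree of models, and update handling are all omitted.

```python
import bisect

class ToyLearnedIndex:
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # One linear model over the whole key space (endpoint fit, for brevity).
        self.slope = (n - 1) / (hi - lo) if hi > lo else 0.0
        self.lo = lo
        # max_err bounds how far the model's guess can be from the true slot.
        self.max_err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(self.slope * (key - self.lo))

    def lookup(self, key):
        guess = self._predict(key)
        a = max(0, guess - self.max_err)
        b = min(len(self.keys), guess + self.max_err + 1)
        i = a + bisect.bisect_left(self.keys[a:b], key)   # bounded local search
        return i if i < len(self.keys) and self.keys[i] == key else None

idx = ToyLearnedIndex([2, 3, 5, 7, 11, 13])
print(idx.lookup(7), idx.lookup(8))   # 3 None
```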

154 citations

Patent
06 Apr 2010
TL;DR: A high-performance dictionary data structure for storing data in a disk storage system is defined; it supports full transactional semantics, concurrent access from multiple transactions, and logging and recovery.
Abstract: A method, apparatus and computer program product for storing data in a disk storage system is presented. A high-performance dictionary data structure is defined. The dictionary data structure is stored on a disk storage system. Key-value pairs can be inserted and deleted into the dictionary data structure. Updates run faster than one insertion per disk-head movement. The structure can also be stored on any system with two or more levels of memory. The dictionary is high-performance and supports full transactional semantics, concurrent access from multiple transactions, and logging and recovery. Keys can be looked up with only a logarithmic number of transfers, even for keys that have been recently inserted or deleted. Queries can be performed on ranges of key-value pairs, including recently inserted or deleted pairs, at a constant fraction of the bandwidth of the disk.
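The headline property, updates cheaper than one disk-head movement per insertion, comes from batching. The toy two-level dictionary below buffers inserts in memory and flushes them to a sorted "disk" array in bulk, so each flush amortizes one seek over many keys. It is an invented illustration of write-optimized buffering, not the patented structure, which cascades buffers through a tree and adds the transactions, logging, and recovery described above.

```python
import bisect

class BufferedDict:
    def __init__(self, buffer_limit=4):
        self.buffer = {}              # recent inserts, held in memory
        self.disk = []                # sorted (key, value) pairs, "on disk"
        self.buffer_limit = buffer_limit

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self._flush()             # one batched "disk write" for many keys

    def _flush(self):
        merged = dict(self.disk)
        merged.update(self.buffer)    # newer values win
        self.disk = sorted(merged.items())
        self.buffer.clear()

    def lookup(self, key):
        if key in self.buffer:        # recently inserted keys stay fast
            return self.buffer[key]
        i = bisect.bisect_left(self.disk, (key,))
        if i < len(self.disk) and self.disk[i][0] == key:
            return self.disk[i][1]
        return None

d = BufferedDict()
for k in range(6):
    d.insert(k, str(k))
print(d.lookup(2), d.lookup(5))   # 2 5  (key 5 is still in the buffer)
```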

146 citations

Book
15 Aug 2011
TL;DR: This tutorial of B-tree techniques will stimulate research and development of modern B-tree indexing techniques for future data management systems.
Abstract: In summary, the core design of B-trees has remained unchanged in 40 years: balanced trees, pages or other units of I/O as nodes, efficient root-to-leaf search, splitting and merging nodes, etc. On the other hand, an enormous amount of research and development has improved every aspect of B-trees including data contents such as multi-dimensional data, access algorithms such as multi-dimensional queries, data organization within each node such as compression and cache optimization, concurrency control such as separation of latching and locking, recovery such as multi-level recovery, etc. Gray and Reuter believed in 1993 that “B-trees are by far the most important access path structure in database and file systems.” It seems that this statement remains true today. B-tree indexes are likely to gain new importance in relational databases due to the advent of flash storage. Fast access latencies permit many more random I/O operations than traditional disk storage, thus shifting the break-even point between a full-bandwidth scan and a B-tree index search, even if the scan has the benefit of columnar database storage. We hope that this tutorial of B-tree techniques will stimulate research and development of modern B-tree indexing techniques for future data management systems.
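The unchanged core the tutorial describes, nodes as units of I/O and an efficient root-to-leaf search, fits in a few lines. The sketch below hard-codes a tiny tree and omits splitting, merging, latching, and recovery entirely.

```python
import bisect

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted separator keys in this node
        self.children = children or []     # empty list => leaf node

def search(node, key):
    """Root-to-leaf search: O(log_B N) node reads for fanout-B nodes."""
    while node.children:                   # descend through internal nodes
        i = bisect.bisect_right(node.keys, key)
        node = node.children[i]
    return key in node.keys                # probe within the leaf

leaf1 = BTreeNode([1, 3])
leaf2 = BTreeNode([5, 7])
leaf3 = BTreeNode([10, 12])
root = BTreeNode([5, 10], [leaf1, leaf2, leaf3])
print(search(root, 7), search(root, 8))   # True False
```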

142 citations
