scispace - formally typeset
Author

Yonatan R. Fogel

Bio: Yonatan R. Fogel is an academic researcher from Stony Brook University. The author has contributed to research in the topics Data structure & Disk storage. The author has an h-index of 2 and has co-authored 2 publications receiving 301 citations.

Papers
Proceedings ArticleDOI
09 Jun 2007
TL;DR: A cache-aware version of the COLA, the lookahead array, is presented, which achieves the same bounds as Brodal and Fagerberg's (cache-aware) Bε-tree.
Abstract: A streaming B-tree is a dictionary that efficiently implements insertions and range queries. We present two cache-oblivious streaming B-trees, the shuttle tree and the cache-oblivious lookahead array (COLA). For block-transfer size B and on N elements, the shuttle tree implements searches in optimal O(log_{B+1} N) transfers, range queries of L successive elements in optimal O(log_{B+1} N + L/B) transfers, and insertions in O((log_{B+1} N)/B^{Θ(1/(log log B)^2)} + (log^2 N)/B) transfers, which is an asymptotic speedup over traditional B-trees if B ≥ (log N)^{1 + c log log log^2 N} for any constant c > 1. A COLA implements searches in O(log N) transfers, range queries in O(log N + L/B) transfers, and insertions in amortized O((log N)/B) transfers, matching the bounds for a (cache-aware) buffered repository tree. A partially deamortized COLA matches these bounds but reduces the worst-case insertion cost to O(log N) if memory size M = Ω(log N). We also present a cache-aware version of the COLA, the lookahead array, which achieves the same bounds as Brodal and Fagerberg's (cache-aware) Bε-tree. We compare our COLA implementation to a traditional B-tree. Our COLA implementation runs 790 times faster for random insertions, 3.1 times slower for insertions of sorted data, and 3.5 times slower for searches.

162 citations
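The COLA's amortized insertion bound comes from its level structure: level k holds a sorted array of exactly 2^k elements that is either completely full or completely empty, and an insertion merges full levels downward until it reaches an empty one. A minimal in-memory sketch of that merging scheme, assuming we ignore the paper's lookahead pointers, partial deamortization, and on-disk layout:

```python
import bisect

class ToyCOLA:
    """Toy cache-oblivious lookahead array: level k is either empty or a
    sorted list of exactly 2**k keys.  Omits the lookahead pointers and
    deamortization described in the paper."""

    def __init__(self):
        self.levels = []   # levels[k] is [] or a sorted list of 2**k keys

    def insert(self, key):
        carry = [key]      # sorted run being pushed down the levels
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry              # empty level absorbs the run
                return
            # Full level: merge it into the carry and move to level k + 1.
            carry = sorted(self.levels[k] + carry)  # stand-in for a linear merge
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # Without lookahead pointers this is a binary search per level,
        # i.e. O(log^2 N) comparisons; the paper's pointers bring the
        # search cost down to O(log N) block transfers.
        for level in self.levels:
            if level:
                i = bisect.bisect_left(level, key)
                if i < len(level) and level[i] == key:
                    return True
        return False
```

Each element takes part in O(log N) merges over its lifetime, which is where the amortized O((log N)/B) transfer bound comes from once the merges are performed block by block.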

Patent
06 Apr 2010
TL;DR: In this patent, a high-performance dictionary data structure for storing data in a disk storage system is defined; it supports full transactional semantics, concurrent access from multiple transactions, and logging and recovery.
Abstract: A method, apparatus and computer program product for storing data in a disk storage system is presented. A high-performance dictionary data structure is defined. The dictionary data structure is stored on a disk storage system. Key-value pairs can be inserted into and deleted from the dictionary data structure. Updates run faster than one insertion per disk-head movement. The structure can also be stored on any system with two or more levels of memory. The dictionary is high performance and supports full transactional semantics, concurrent access from multiple transactions, and logging and recovery. Keys can be looked up with only a logarithmic number of transfers, even for keys that have been recently inserted or deleted. Queries can be performed on ranges of key-value pairs, including recently inserted or deleted pairs, at a constant fraction of the bandwidth of the disk.

146 citations
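The "faster than one insertion per disk-head movement" claim rests on batching: updates are buffered in memory and flushed to their destinations in groups, so each disk transfer carries many insertions. A hedged two-level sketch of that idea, not the patented structure itself (which is a full multi-level tree with transactions, logging, and recovery):

```python
class BufferedToyDict:
    """Toy write-optimized dictionary: the root buffers updates in memory
    and flushes them to leaf 'pages' in batches, so one simulated disk
    write applies many insertions.  Illustrative only; routing here is by
    hash, whereas a real dictionary routes by key ranges so it can also
    answer range queries."""

    def __init__(self, num_leaves=4, buffer_cap=8):
        self.buffer = []                                   # pending (key, value) updates
        self.buffer_cap = buffer_cap
        self.leaves = [dict() for _ in range(num_leaves)]  # simulated on-disk pages
        self.disk_writes = 0                               # simulated head movements

    def put(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_cap:
            self._flush()

    def get(self, key):
        for k, v in reversed(self.buffer):     # newest buffered value wins
            if k == key:
                return v
        return self.leaves[self._leaf(key)].get(key)

    def _flush(self):
        # One batched pass applies every buffered update: far fewer
        # simulated disk writes than insertions.
        for key, value in self.buffer:
            self.leaves[self._leaf(key)][key] = value
        self.disk_writes += 1
        self.buffer = []

    def _leaf(self, key):
        return hash(key) % len(self.leaves)
```

With buffer_cap = 8, eight put() calls cost one simulated write, which is the amortization the patent abstract describes.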


Cited by
Proceedings ArticleDOI
20 May 2012
TL;DR: In this article, the authors present bLSM, a Log Structured Merge (LSM) tree with the advantages of B-Trees and log structured approaches: it has near-optimal read and scan performance, and its new "spring and gear" merge scheduler bounds write latency without impacting throughput or allowing merges to block writes for extended periods of time.
Abstract: Data management workloads are increasingly write-intensive and subject to strict latency SLAs. This presents a dilemma: Update in place systems have unmatched latency but poor write throughput. In contrast, existing log structured techniques improve write throughput but sacrifice read performance and exhibit unacceptable latency spikes. We begin by presenting a new performance metric: read fanout, and argue that, with read and write amplification, it better characterizes real-world indexes than approaches such as asymptotic analysis and price/performance. We then present bLSM, a Log Structured Merge (LSM) tree with the advantages of B-Trees and log structured approaches: (1) Unlike existing log structured trees, bLSM has near-optimal read and scan performance, and (2) its new "spring and gear" merge scheduler bounds write latency without impacting throughput or allowing merges to block writes for extended periods of time. It does this by ensuring merges at each level of the tree make steady progress without resorting to techniques that degrade read performance. We use Bloom filters to improve index performance, and find a number of subtleties arise. First, we ensure reads can stop after finding one version of a record. Otherwise, frequently written items would incur multiple B-Tree lookups. Second, many applications check for existing values at insert. Avoiding the seek performed by the check is crucial.

325 citations
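The abstract's two read-path points, consulting a Bloom filter before touching a level and stopping at the first (newest) version found, can be illustrated with a small sketch; the Bloom filter here is a generic minimal one, not bLSM's actual implementation:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter; real systems size m and k from a target
    false-positive rate."""

    def __init__(self, m=8192, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def lsm_get(key, levels):
    """Probe LSM levels newest-to-oldest; each level is a (bloom, table)
    pair.  The Bloom filter skips levels that cannot hold the key, and
    the scan stops at the first version found -- the two read-path
    subtleties the bLSM abstract highlights."""
    for bloom, table in levels:          # levels ordered newest first
        if not bloom.might_contain(key):
            continue                     # avoid a pointless level lookup
        if key in table:
            return table[key]            # newest version; stop here
    return None
```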

Proceedings Article
22 Feb 2016
TL;DR: WiscKey, as presented by the authors, is a persistent LSM-tree-based key-value store with a performance-oriented data layout that separates keys from values to minimize I/O amplification.
Abstract: We present WiscKey, a persistent LSM-tree-based key-value store with a performance-oriented data layout that separates keys from values to minimize I/O amplification. The design of WiscKey is highly SSD optimized, leveraging both the sequential and random performance characteristics of the device. We demonstrate the advantages of WiscKey with both microbenchmarks and YCSB workloads. Microbenchmark results show that WiscKey is 2.5×-111× faster than LevelDB for loading a database and 1.6×-14× faster for random lookups. WiscKey is faster than both LevelDB and RocksDB in all six YCSB workloads.

297 citations
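The key/value separation behind these numbers can be sketched in a few lines: values are appended to a value log and only small pointers enter the LSM-tree. In this sketch a plain dict stands in for the LSM-tree, and garbage collection of the log and crash consistency are omitted:

```python
class KVSeparatedStore:
    """Sketch of WiscKey-style key/value separation: an append-only value
    log holds the (large) values, while the index stores only keys plus
    <offset, length> pointers, keeping the tree small."""

    def __init__(self):
        self.vlog = bytearray()   # append-only value log
        self.index = {}           # key -> (offset, length); stands in for the LSM-tree

    def put(self, key, value: bytes):
        offset = len(self.vlog)
        self.vlog += value                        # sequential append to the log
        self.index[key] = (offset, len(value))    # small entry into the index

    def get(self, key):
        loc = self.index.get(key)
        if loc is None:
            return None
        offset, length = loc
        return bytes(self.vlog[offset:offset + length])  # one random read into the log
```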

Patent
31 Jul 2013
TL;DR: In this paper, a method and system for indexing, searching, and retrieving information from timed media files based on relevance intervals is presented, where a portion of a timed media file is returned, which is selected specifically to be relevant to the given information representations.
Abstract: A method and system for indexing, searching, and retrieving information from timed media files based upon relevance intervals. Because retrieval is based upon relevance intervals, the portion of a timed media file that is returned is selected specifically to be relevant to the given information representations, eliminating the need for a manual determination of relevance and avoiding missing relevant portions. The timed media includes streaming audio, streaming video, timed HTML, animations such as vector-based graphics, slide shows, other timed media, and combinations thereof.

276 citations
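As one plausible illustration (not the patented method), a "relevance interval" can be thought of as a padded, merged window around the moments where query terms occur in the timed media, so playback can jump straight to the relevant portions; the pad and gap values below are arbitrary:

```python
def relevance_intervals(hit_times, pad=5.0, gap=10.0):
    """Merge padded windows around query-term hit timestamps (seconds)
    into intervals a player could seek to directly.  Purely illustrative."""
    intervals = []
    for t in sorted(hit_times):
        start, end = max(0.0, t - pad), t + pad
        if intervals and start - intervals[-1][1] <= gap:
            intervals[-1][1] = max(intervals[-1][1], end)  # extend current interval
        else:
            intervals.append([start, end])                 # start a new interval
    return [tuple(iv) for iv in intervals]

# e.g. relevance_intervals([12.0, 15.5, 90.2]) -> [(7.0, 20.5), (85.2, 95.2)]
```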

Proceedings ArticleDOI
14 Oct 2017
TL;DR: The authors build PebblesDB, a high-performance key-value store, by modifying HyperLevelDB to use the FLSM data structure, and modify two widely-used NoSQL stores, MongoDB and HyperDex, to use PebblesDB as their underlying storage engine.
Abstract: Key-value stores such as LevelDB and RocksDB offer excellent write throughput, but suffer high write amplification. The write amplification problem is due to the Log-Structured Merge Trees data structure that underlies these key-value stores. To remedy this problem, this paper presents a novel data structure that is inspired by Skip Lists, termed Fragmented Log-Structured Merge Trees (FLSM). FLSM introduces the notion of guards to organize logs, and avoids rewriting data in the same level. We build PebblesDB, a high-performance key-value store, by modifying HyperLevelDB to use the FLSM data structure. We evaluate PebblesDB using micro-benchmarks and show that for write-intensive workloads, PebblesDB reduces write amplification by 2.4-3x compared to RocksDB, while increasing write throughput by 6.7x. We modify two widely-used NoSQL stores, MongoDB and HyperDex, to use PebblesDB as their underlying storage engine. Evaluating these applications using the YCSB benchmark shows that throughput is increased by 18-105% when using PebblesDB (compared to their default storage engines) while write IO is decreased by 35-55%.

200 citations
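The abstract's central idea, guards that let compaction append fragments into the next level instead of rewriting data already stored there, can be sketched as follows; PebblesDB's guard selection, metadata, and compaction triggers are omitted:

```python
import bisect

def partition_by_guards(sorted_keys, guards):
    """Split one sorted run into fragments, one per guard range."""
    fragments = [[] for _ in range(len(guards) + 1)]
    for key in sorted_keys:
        fragments[bisect.bisect_right(guards, key)].append(key)
    return fragments

def flsm_compact(level_fragments, next_level, next_guards):
    """Sketch of the FLSM idea from the abstract: data being compacted is
    fragmented along the next level's guards and *appended* there, rather
    than merge-rewritten with the data already in that level."""
    for fragment in level_fragments:
        for i, piece in enumerate(partition_by_guards(fragment, next_guards)):
            if piece:
                next_level[i].append(piece)   # append; no rewrite of existing pieces
    return next_level

# Usage sketch (hypothetical keys and guards):
# next_level = {0: [], 1: [], 2: []}
# flsm_compact([[3, 12, 27]], next_level, next_guards=[10, 20])
# -> {0: [[3]], 1: [[12]], 2: [[27]]}
```

Appending fragments instead of re-sorting whole levels is what reduces write amplification; the cost is that a guard may hold several overlapping fragments, which reads must consult.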

Proceedings ArticleDOI
14 Apr 2014
TL;DR: LevelDB is extended to explicitly leverage the multiple channels of an SSD to exploit its abundant parallelism, and the throughput of the storage system can be improved by more than 4X after applying all proposed optimization techniques.
Abstract: Various key-value (KV) stores are widely employed for data management to support Internet services as they offer higher efficiency, scalability, and availability than relational database systems. The log-structured merge tree (LSM-tree) based KV stores have attracted growing attention because they can eliminate random writes and maintain acceptable read performance. Recently, as the price per unit capacity of NAND flash decreases, solid state disks (SSDs) have been extensively adopted in enterprise-scale data centers to provide high I/O bandwidth and low access latency. However, it is inefficient to naively combine LSM-tree-based KV stores with SSDs, as the high parallelism enabled within the SSD cannot be fully exploited. Current LSM-tree-based KV stores are designed without assuming the SSD's multi-channel architecture. To address this inadequacy, we propose LOCS, a system equipped with a customized SSD design, which exposes its internal flash channels to applications, to work with the LSM-tree-based KV store, specifically LevelDB in this work. We extend LevelDB to explicitly leverage the multiple channels of an SSD to exploit its abundant parallelism. In addition, we optimize scheduling and dispatching policies for concurrent I/O requests to further improve the efficiency of data access. Compared with the scenario where a stock LevelDB runs on a conventional SSD, the throughput of the storage system can be improved by more than 4X after applying all proposed optimization techniques.

184 citations
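The dispatching question the abstract raises, how to spread concurrent KV-store I/O across the exposed flash channels, can be illustrated with a toy shortest-queue policy; this is a generic sketch, not a reproduction of the paper's actual scheduling and dispatching policies:

```python
from collections import deque

class ChannelDispatcher:
    """Toy dispatcher for write requests across exposed SSD channels.
    Reads would be pinned to the channel that already holds the data;
    writes can go anywhere, so the least-loaded queue is chosen to keep
    every channel busy and exploit the SSD's internal parallelism."""

    def __init__(self, num_channels=4):
        self.queues = [deque() for _ in range(num_channels)]

    def dispatch_write(self, request, cost=1):
        loads = [sum(c for _, c in q) for q in self.queues]
        target = loads.index(min(loads))           # shortest-queue policy
        self.queues[target].append((request, cost))
        return target

# d = ChannelDispatcher()
# d.dispatch_write("flush memtable -> sstable")    # -> 0 (all queues start empty)
```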