
Showing papers by "Michael Mitzenmacher published in 2022"


Journal Article
TL;DR: SNARF is presented, a learned range filter that efficiently supports range queries for numerical data and provides up to 50x better false positive rate than state-of-the-art range filters, such as SuRF and Rosetta, with the same space usage.
Abstract: We present Sparse Numerical Array-Based Range Filters (SNARF), a learned range filter that efficiently supports range queries for numerical data. SNARF creates a model of the data distribution to map the keys into a bit array, which is stored in a compressed form. The model, together with the compressed bit array, constitutes SNARF and is used to answer membership queries. We evaluate SNARF on multiple synthetic and real-world datasets as a stand-alone filter and by integrating it into RocksDB. For range queries, SNARF provides up to 50x better false positive rate than state-of-the-art range filters, such as SuRF and Rosetta, with the same space usage. We also evaluate SNARF in RocksDB as a filter replacement for filtering requests before they access on-disk data structures. For RocksDB, SNARF can improve the execution time of the system by up to 10x compared to SuRF and Rosetta for certain read-only workloads.
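The idea of a model-driven bit array can be sketched in a few lines. The toy below is illustrative only, not the authors' implementation: the class name, the empirical-CDF "model", and the bits-per-key parameter are assumptions for exposition, and the compression step is omitted.

```python
import bisect

class LearnedRangeFilter:
    """Toy SNARF-style filter: a learned CDF maps keys into a bit array."""
    def __init__(self, keys, bits_per_key=8):
        self.keys = sorted(keys)
        self.m = bits_per_key * len(self.keys)   # bit-array size
        self.bits = [False] * self.m
        for k in self.keys:
            self.bits[self._pos(k)] = True

    def _pos(self, key):
        # Empirical CDF as the "learned model": the key's rank, scaled
        # to the bit array. A better-fitting model spreads keys more
        # evenly and lowers the false positive rate.
        rank = bisect.bisect_left(self.keys, key)
        return min(self.m - 1, rank * self.m // len(self.keys))

    def may_contain_range(self, lo, hi):
        # Report "maybe present" iff any bit in the mapped interval is set.
        a, b = self._pos(lo), self._pos(hi)
        return any(self.bits[a:b + 1])

f = LearnedRangeFilter([10, 20, 30, 40, 50])
assert f.may_contain_range(18, 22)        # overlaps key 20
assert not f.may_contain_range(60, 70)    # no mapped bits set
```

As with any filter, "maybe present" answers can be false positives; the paper's contribution is making that rate small for a given space budget.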

5 citations


Journal ArticleDOI
TL;DR: This paper proposes QUIC-FL, a DME algorithm that is unbiased, offers fast aggregation time, and is competitive with the most accurate (slow aggregation) DME techniques.
Abstract: Distributed Mean Estimation (DME), in which $n$ clients communicate vectors to a parameter server that estimates their average, is a fundamental building block in communication-efficient federated learning. In this paper, we improve on previous DME techniques that achieve the optimal $O(1/n)$ Normalized Mean Squared Error (NMSE) guarantee by asymptotically improving the complexity for either encoding or decoding (or both). To achieve this, we formalize the problem in a novel way that allows us to use off-the-shelf mathematical solvers to design the quantization.
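A minimal baseline for the DME setting can clarify what "unbiased" means here. The sketch below is a generic one-bit stochastic quantizer, not the QUIC-FL quantizer itself; the coordinate range and function names are assumptions for illustration.

```python
import random

def encode(vec, lo=-1.0, hi=1.0):
    # Stochastically round each coordinate to {lo, hi} so that its
    # expectation equals the original value (unbiasedness).
    out = []
    for x in vec:
        p = (x - lo) / (hi - lo)              # probability of rounding up
        out.append(hi if random.random() < p else lo)
    return out

def estimate_mean(client_vecs):
    # The server averages the clients' quantized messages; with n
    # clients the normalized mean squared error shrinks as O(1/n).
    msgs = [encode(v) for v in client_vecs]
    d = len(client_vecs[0])
    return [sum(m[i] for m in msgs) / len(msgs) for i in range(d)]

random.seed(0)
est = estimate_mean([[0.2, -0.5, 0.9]] * 10000)
# With 10,000 clients, est is close to the true mean [0.2, -0.5, 0.9].
```

The paper's techniques improve on such baselines by reducing the encoding/decoding complexity needed to reach the optimal O(1/n) NMSE.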

4 citations


Journal ArticleDOI
TL;DR: This work evaluates the effectiveness of using learned models in place of traditional hash functions and finds that learned models can reduce collisions and outperform hash functions, but only for certain data distributions.
Abstract: Hashing is a fundamental operation in database management, playing a key role in the implementation of numerous core database data structures and algorithms. Traditional hash functions aim to mimic a function that maps a key to a random value, which can result in collisions, where multiple keys are mapped to the same value. There are many well-known schemes like chaining, probing, and cuckoo hashing to handle collisions. In this work, we aim to study if using learned models instead of traditional hash functions can reduce collisions and whether such a reduction translates to improved performance, particularly for indexing and joins. We show that learned models reduce collisions in some cases, which depend on how the data is distributed. To evaluate the effectiveness of learned models as hash functions, we test them with bucket chaining, linear probing, and cuckoo hash tables. We find that learned models can (1) yield a 1.4x lower probe latency, and (2) reduce the non-partitioned hash join runtime by 28% over the next best baseline for certain datasets. On the other hand, if the data distribution is not suitable, we either do not see gains or see worse performance. In summary, we find that learned models can indeed outperform hash functions, but only for certain data distributions.
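The "learned model as hash function" idea can be illustrated with a simple empirical-CDF model feeding a chained hash table. This is a sketch in the spirit of the study, not the paper's code; the class name and the sampling scheme are assumptions.

```python
import bisect

class LearnedHashTable:
    """Bucket chaining where the hash is a learned CDF of the keys."""
    def __init__(self, sample_keys, n_buckets):
        self.sample = sorted(sample_keys)
        self.n = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def _hash(self, key):
        # CDF(key) scaled to the bucket range: when the sample reflects
        # the true key distribution, output is nearly uniform and
        # collisions drop; otherwise it can be worse than a random hash.
        rank = bisect.bisect_left(self.sample, key)
        return min(self.n - 1, rank * self.n // len(self.sample))

    def insert(self, key):
        self.buckets[self._hash(key)].append(key)

    def probe(self, key):
        return key in self.buckets[self._hash(key)]

keys = list(range(0, 1000, 7))     # a distribution the model fits well
t = LearnedHashTable(keys, 64)
for k in keys:
    t.insert(k)
assert t.probe(21) and not t.probe(22)
```

The paper's finding maps directly onto this sketch: gains depend on how well the model's CDF matches the real key distribution.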

3 citations


Proceedings ArticleDOI
10 Jun 2022
TL;DR: Proteus, a novel self-designing approximate range filter that configures itself based on sampled data to optimize its false positive rate (FPR) for a given space requirement, is introduced; the accuracy of its underlying model and its ability to optimize over both synthetic workloads and real-world datasets are demonstrated empirically.
Abstract: We introduce Proteus, a novel self-designing approximate range filter, which configures itself based on sampled data in order to optimize its false positive rate (FPR) for a given space requirement. Proteus unifies the probabilistic and deterministic design spaces of state-of-the-art range filters to achieve robust performance across a larger variety of use cases. At the core of Proteus lies our Contextual Prefix FPR (CPFPR) model - a formal framework for the FPR of prefix-based filters across their design spaces. We empirically demonstrate the accuracy of our model and Proteus' ability to optimize over both synthetic workloads and real-world datasets. We further evaluate Proteus in RocksDB and show that it is able to improve end-to-end performance by as much as 5.3x over more brittle state-of-the-art methods such as SuRF and Rosetta. Our experiments also indicate that the cost of modeling is not significant compared to the end-to-end performance gains and that Proteus is robust to workload shifts.
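The self-designing step can be caricatured as "estimate FPR on a sample, then pick the cheapest configuration that meets a target." The sketch below uses a single design knob (prefix length) and a target-FPR rule; the real Proteus design space and its CPFPR model are far richer, and all names and parameters here are illustrative.

```python
import random

def estimate_fpr(keys, queries, prefix_bits):
    # FPR of a filter that stores only the top `prefix_bits` bits of
    # each 32-bit key: a negative query is a false positive when its
    # prefix collides with a stored prefix.
    stored = {k >> (32 - prefix_bits) for k in keys}
    negatives = [q for q in queries if q not in keys]
    fps = sum(1 for q in negatives if (q >> (32 - prefix_bits)) in stored)
    return fps / len(negatives) if negatives else 0.0

def self_design(keys, sample_queries, target_fpr=0.01):
    # Shortest prefix (least space) whose sampled FPR meets the target.
    for bits in range(8, 33):
        if estimate_fpr(keys, sample_queries, bits) <= target_fpr:
            return bits
    return 32

random.seed(1)
keys = {random.getrandbits(32) for _ in range(1000)}
sample = [random.getrandbits(32) for _ in range(1000)]
bits = self_design(keys, sample)   # longer prefixes cost space but cut FPR
```

Longer prefixes can only reduce the FPR (a collision on a long prefix implies one on any shorter prefix), which is why the search stops at the first configuration that is good enough.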

3 citations




Journal ArticleDOI
TL;DR: Tabula reduces overall communication by up to 9× and achieves a speedup of up to 50×, while imposing comparable storage costs, which leads to significant performance gains over garbled circuits with quantized inputs during secure inference on neural networks.
Abstract: Multiparty computation approaches to secure neural network inference traditionally rely on garbled circuits for securely executing nonlinear activation functions. However, garbled circuits require excessive communication between server and client, impose significant storage overheads, and incur large runtime penalties. To eliminate these costs, we propose an alternative to garbled circuits: Tabula, an algorithm based on secure lookup tables. Tabula leverages neural networks’ ability to be quantized and employs a secure lookup table approach to efficiently, securely, and accurately compute neural network nonlinear activation functions. Compared to garbled circuits with quantized inputs, when computing individual nonlinear functions, our experiments show Tabula uses between 35×-70× less communication, is over 100× faster, and uses a comparable amount of storage. This leads to significant performance gains over garbled circuits with quantized inputs during secure inference on neural networks: Tabula reduces overall communication by up to 9× and achieves a speedup of up to 50×, while imposing comparable storage costs.
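The quantize-then-lookup idea at Tabula's core can be shown without the cryptography: because activations are quantized to few bits, the entire nonlinear function fits in a precomputed table, and evaluation becomes a single read. The sketch below is greatly simplified and not private; the actual protocol performs this lookup obliviously on secret shares. Bit width and scale are illustrative.

```python
BITS = 8       # activation precision (assumed for illustration)
SCALE = 16     # fixed-point scale

# Precompute ReLU over every possible quantized input value
# (two's-complement encoding of signed fixed-point numbers).
TABLE = [max(0, v if v < 2 ** (BITS - 1) else v - 2 ** BITS) / SCALE
         for v in range(2 ** BITS)]

def quantize(x):
    # Clamp to the signed range, then encode as two's complement.
    v = max(-(2 ** (BITS - 1)), min(2 ** (BITS - 1) - 1, round(x * SCALE)))
    return v % (2 ** BITS)

def relu_lut(x):
    # One table read replaces the garbled-circuit evaluation of ReLU.
    return TABLE[quantize(x)]

assert relu_lut(-0.5) == 0.0
assert relu_lut(0.5) == 0.5
```

The table has only 2^BITS entries, which is why quantization is what makes the lookup-table approach affordable in storage.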

2 citations


Journal ArticleDOI
TL;DR: This paper proposes FRANCIS, a new framework for running message passing algorithms on programmable switches to enable fast reactions to network events in large networks, and exemplifies the framework’s usefulness by improving the resiliency and reaction times of clock synchronization and source-routed multicast.
Abstract: Distributed protocols are widely used to support network functions such as clock synchronization and multicast. As the network gets larger and faster, it is increasingly challenging for these protocols to react quickly to network events. The theory community has made significant progress in developing distributed message passing algorithms with improved convergence times. With the emerging programmability at switches, it now becomes feasible to adopt and adapt these theoretical advances for networking functions. In this paper, we propose FRANCIS, a new framework for running message passing algorithms on programmable switches to enable fast reactions to network events in large networks. We introduce an execution engine with computing and communication primitives for supporting message passing algorithms in P4 switches. We exemplify the framework's usefulness by improving the resiliency and reaction times of clock synchronization and source-routed multicast. In particular, our approach allows lower clock drift than Sundial and PTP, quickly recovers from multiple failures, and reduces the time uncertainty bound by up to 5x. Compared with state-of-the-art multicast solutions, our approach uses packet headers up to 33% smaller and has an order of magnitude faster reaction time. Our framework turns current and future theoretical results in message passing algorithms into viable real-world implementations on the data plane of networked switches, via useful abstractions and by breaking the problem into suitable subtasks.
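A toy simulation shows the class of message-passing algorithms the framework targets: nodes repeatedly exchange state with neighbors and converge to agreement (here, on clock offsets). FRANCIS's contribution is running such loops in P4 switch hardware; this Python model, with assumed topology and step size, only illustrates the convergence behavior.

```python
def average_consensus(offsets, neighbors, rounds=50, step=0.3):
    # Classic distributed averaging: each round, every node moves
    # toward its neighbors' values (one message per link per round).
    x = list(offsets)
    for _ in range(rounds):
        nxt = x[:]
        for i, nbrs in neighbors.items():
            nxt[i] += step * sum(x[j] - x[i] for j in nbrs)
        x = nxt
    return x

# Four switches in a ring; initial clock offsets in microseconds.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
final = average_consensus([0.0, 8.0, 4.0, 12.0], ring)
# All nodes converge near the global mean offset (6.0).
assert max(final) - min(final) < 0.01
```

On a symmetric graph the sum of offsets is preserved each round, so the common limit is exactly the initial mean; convergence speed depends on the topology, which is where the improved algorithms from the theory literature come in.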

Journal Article
TL;DR: This paper proposes a natural queueing-theory framework for making job scheduling systems incentive compatible without using monetary charges, under the assumption that each user has an estimate of their job's running time, though this estimate may be incorrect.
Abstract: For job scheduling systems, where jobs require some amount of processing and then leave the system, it is natural for each user to provide an estimate of their job’s time requirement in order to aid the scheduler. However, if there is no incentive mechanism for truthfulness, each user will be motivated to provide estimates that give their job precedence in the schedule, so that the job completes as early as possible. We examine how to make such scheduling systems incentive compatible, without using monetary charges, under a natural queueing theory framework. In our setup, each user has an estimate of their job’s running time, but it is possible for this estimate to be incorrect. We examine scheduling policies where if a job exceeds its estimate, it is with some probability “punished” and re-scheduled after other jobs, to disincentivize underestimates of job times. However, because user estimates may be incorrect (without any malicious intent), excessive punishment may incentivize users to overestimate their job times, which leads to less efficient scheduling. We describe two natural scheduling policies, BlindTrust and MeasuredTrust. We show that, for both of these policies, given the parameters of the system, we can efficiently determine the set of punishment probabilities that are incentive compatible, in that users are incentivized to provide their actual estimate of the job time. Moreover, we prove for MeasuredTrust that in the limit as estimates converge to perfect accuracy, the range of punishment probabilities that are incentive compatible converges to [0, 1]. Our formalism establishes a framework for studying further queue-based scheduling problems where job time estimates from users are utilized, and the system needs to incentivize truthful reporting of estimates.
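A toy discrete simulation makes the punishment mechanism concrete. This is not the paper's queueing analysis; the policy below (shortest-estimate-first with preempt-and-requeue punishment) and the job values are illustrative.

```python
import random

def run_schedule(jobs, p_punish, rng):
    # jobs: list of (estimate, true_time); served shortest-estimate-first.
    queue = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    remaining = [t for _, t in jobs]
    clock, finish = 0.0, [0.0] * len(jobs)
    while queue:
        i = queue.pop(0)
        est = jobs[i][0]
        if remaining[i] > est and rng.random() < p_punish:
            # Punish: preempt once the estimate is used up, requeue last.
            clock += est
            remaining[i] -= est
            queue.append(i)
        else:
            clock += remaining[i]
            remaining[i] = 0.0
            finish[i] = clock
    return finish

jobs = [(2, 2), (1, 3), (3, 3)]     # job 1 underestimates: claims 1, needs 3
no_punish = run_schedule(jobs, 0.0, random.Random(0))   # [5.0, 3.0, 8.0]
always = run_schedule(jobs, 1.0, random.Random(0))      # [3.0, 8.0, 6.0]
```

With no punishment, the underestimating job finishes first (at 3) at the honest jobs' expense; with certain punishment it finishes last (at 8). The paper's question is which intermediate punishment probabilities make truthful estimates the best strategy.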

Journal Article
TL;DR: Direct Telemetry Access is introduced, a solution for fast and efficient telemetry collection, aggregation, and indexing; it can collect and aggregate over 90M INT path traces per second with a single collector, improving over Confluo, the state-of-the-art CPU-based collector, by up to 55x.
Abstract: The emergence of programmable switches allows operators to collect a vast amount of fine-grained telemetry data in real time. However, consolidating the telemetry reports at centralized collectors to gain a network-wide view poses an immense challenge. The received data has to be transported from the switches, parsed, manipulated, and inserted in queryable data structures. As the network scales, this requires excessive CPU processing. RDMA is a transport protocol that bypasses the CPU and allows extremely high data transfer rates. Yet, RDMA is not designed for telemetry collection: it requires a stateful connection, supports only a small number of concurrent writers, and has limited writing primitives that restrict its applicability to data aggregation. We introduce Direct Telemetry Access (DTA), a solution that allows fast and efficient telemetry collection, aggregation, and indexing. Our system establishes RDMA connections only from collectors' ToR switches, called translators, that process DTA reports from all other switches. DTA features novel and expressive reporting primitives such as Key-Write, Postcarding, Append, Sketch-Merge, and Key-Increment that allow integration of telemetry systems such as INT, Marple, and others. The translators then aggregate, batch, and write the reports to collectors' memory in queryable form. As a result, our solution can collect and aggregate over 90M INT path traces per second with a single collector, improving over Confluo, the state-of-the-art CPU-based collector, by up to 55x.
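The Key-Write primitive can be sketched as a hash-addressed, best-effort key-value write into the collector's memory region. The model below is illustrative only: the slot count, hash choice, and overwrite-on-collision behavior are assumptions, and the one-sided RDMA write is simulated by a plain list store.

```python
import zlib

N_SLOTS = 1 << 16
memory = [None] * N_SLOTS        # models the collector's RDMA-exposed region

def slot(key: bytes) -> int:
    # Hash the telemetry key to a fixed memory slot.
    return zlib.crc32(key) % N_SLOTS

def key_write(key: bytes, value):
    # Best-effort store: one one-sided write, no collector CPU involved;
    # a colliding key simply overwrites the slot.
    memory[slot(key)] = (key, value)

def key_query(key: bytes):
    # The collector rehashes the key and checks the stored key matches.
    entry = memory[slot(key)]
    return entry[1] if entry and entry[0] == key else None

key_write(b"flow:10.0.0.1->10.0.0.2", [1, 4, 7])   # e.g. an INT path trace
assert key_query(b"flow:10.0.0.1->10.0.0.2") == [1, 4, 7]
assert key_query(b"flow:unknown") is None
```

Storing the key alongside the value is what makes the structure directly queryable despite collisions, which is the property that lets translators write telemetry in queryable form without collector-side parsing.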