Author

Vladimir Braverman

Bio: Vladimir Braverman is an academic researcher from Johns Hopkins University. The author has contributed to research in topics including computer science and coresets. The author has an h-index of 25 and has co-authored 158 publications receiving 2,475 citations. Previous affiliations of Vladimir Braverman include University of California, Los Angeles and Google.


Papers
Proceedings ArticleDOI
22 Aug 2016
TL;DR: This paper presents UnivMon, a framework for flow monitoring that leverages recent theoretical advances to achieve both generality and high accuracy; a range of trace-driven evaluations shows that it offers comparable (and sometimes better) accuracy relative to custom sketching solutions.
Abstract: Network management requires accurate estimates of metrics for traffic engineering (e.g., heavy hitters), anomaly detection (e.g., entropy of source addresses), and security (e.g., DDoS detection). Obtaining accurate estimates given router CPU and memory constraints is a challenging problem. Existing approaches fall in one of two undesirable extremes: (1) low fidelity general-purpose approaches such as sampling, or (2) high fidelity but complex algorithms customized to specific application-level metrics. Ideally, a solution should be both general (i.e., supports many applications) and provide accuracy comparable to custom algorithms. This paper presents UnivMon, a framework for flow monitoring which leverages recent theoretical advances and demonstrates that it is possible to achieve both generality and high accuracy. UnivMon uses an application-agnostic data plane monitoring primitive; different (and possibly unforeseen) estimation algorithms run in the control plane, and use the statistics from the data plane to compute application-level metrics. We present a proof-of-concept implementation of UnivMon using P4 and develop simple coordination techniques to provide a ``one-big-switch'' abstraction for network-wide monitoring. We evaluate the effectiveness of UnivMon using a range of trace-driven evaluations and show that it offers comparable (and sometimes better) accuracy relative to custom sketching solutions.
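To make the data-plane/control-plane split above concrete, here is a toy Python sketch in which a single application-agnostic counter structure stands in for the data plane and two independent control-plane routines derive different metrics from it. The Count-Min structure, parameter values, and candidate-key tracking are illustrative assumptions only; UnivMon's actual primitive is a universal sketch, which is more involved.

```python
import hashlib
import math

class CountMinSketch:
    """Toy application-agnostic data-plane primitive: a d x w counter
    array updated once per packet (not UnivMon's actual universal sketch)."""

    def __init__(self, d=4, w=4096):
        self.d, self.w = d, w
        self.table = [[0] * w for _ in range(d)]
        self.total = 0

    def _bucket(self, row, key):
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.w

    def update(self, key, count=1):
        self.total += count
        for row in range(self.d):
            self.table[row][self._bucket(row, key)] += count

    def estimate(self, key):
        return min(self.table[row][self._bucket(row, key)]
                   for row in range(self.d))

# Control-plane estimators: different application-level metrics computed
# from the same data-plane statistics, mirroring the architecture above.
def heavy_hitters(sk, candidates, frac):
    return {k: sk.estimate(k) for k in candidates
            if sk.estimate(k) >= frac * sk.total}

def entropy_over(sk, candidates):
    # Crude empirical-entropy estimate restricted to tracked candidate keys.
    return -sum((f / sk.total) * math.log2(f / sk.total)
                for k in candidates if (f := sk.estimate(k)) > 0)
```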

440 citations

Proceedings Article
21 Nov 2020
TL;DR: This paper introduces a novel algorithm, called FetchSGD, which compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers.
Abstract: Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.
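Since the Count Sketch is a linear map, sums and scalings (and therefore momentum and error accumulation) can be applied directly to sketch tables, which is the key insight the abstract highlights. Below is a minimal NumPy illustration of one aggregator round in that spirit; the dimensions, hyperparameters, median unsketching, and update ordering are assumptions for illustration, not the paper's reference implementation.

```python
import numpy as np

d, r, c = 10_000, 5, 500            # model dimension and sketch shape (assumed)
rng = np.random.default_rng(0)
bucket = rng.integers(0, c, size=(r, d))
sign = rng.choice([-1.0, 1.0], size=(r, d))

def sketch(vec):
    """Count Sketch is linear: sketch(a + b) == sketch(a) + sketch(b)."""
    T = np.zeros((r, c))
    for i in range(r):
        np.add.at(T[i], bucket[i], sign[i] * vec)
    return T

def unsketch(T):
    """Median-of-rows estimate of every coordinate."""
    rows = [sign[i] * T[i, bucket[i]] for i in range(r)]
    return np.median(np.stack(rows), axis=0)

rho, lr, k = 0.9, 0.1, 100          # momentum, step size, top-k (assumed)
S_m = np.zeros((r, c))              # momentum accumulator, kept in sketch space
S_e = np.zeros((r, c))              # error accumulator, kept in sketch space

def server_round(client_sketches, weights):
    """One aggregator step: merge worker sketches, run momentum and error
    accumulation on the sketches themselves, then extract a sparse update."""
    global S_m, S_e
    S_g = sum(client_sketches) / len(client_sketches)   # mergeability
    S_m = rho * S_m + S_g                               # momentum in the sketch
    S_e = S_e + lr * S_m                                # error feedback in the sketch
    est = unsketch(S_e)
    delta = np.zeros(d)
    top = np.argsort(np.abs(est))[-k:]                  # heavy coordinates
    delta[top] = est[top]
    S_e -= sketch(delta)                                # keep the residual as error
    return weights - delta
```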

169 citations

Proceedings ArticleDOI
19 Aug 2019
TL;DR: This paper presents the design and implementation of NitroSketch, a sketching framework that systematically addresses the performance bottlenecks of sketches without sacrificing robustness or generality, and implements it on three popular software platforms.
Abstract: Software switches are emerging as a vital measurement vantage point in many networked systems. Sketching algorithms or sketches, provide high-fidelity approximate measurements, and appear as a promising alternative to traditional approaches such as packet sampling. However, sketches incur significant computation overhead in software switches. Existing efforts in implementing sketches in virtual switches make sacrifices on one or more of the following dimensions: performance (handling 40 Gbps line-rate packet throughput with low CPU footprint), robustness (accuracy guarantees across diverse workloads), and generality (supporting various measurement tasks). In this work, we present the design and implementation of NitroSketch, a sketching framework that systematically addresses the performance bottlenecks of sketches without sacrificing robustness and generality. Our key contribution is the careful synthesis of rigorous, yet practical solutions to reduce the number of per-packet CPU and memory operations. We implement NitroSketch on three popular software platforms (Open vSwitch-DPDK, FD.io-VPP, and BESS) and evaluate the performance. We show that accuracy is comparable to unmodified sketches while attaining up to two orders of magnitude speedup, and up to 45% reduction in CPU usage.
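One per-packet bottleneck the abstract alludes to is touching every sketch row on every packet. The toy sketch below illustrates the general flavor of the fix: update each row only with probability p and scale the increment by 1/p, keeping estimates unbiased in expectation while cutting expected per-packet work by a factor of p. All names and parameters are assumed; NitroSketch itself uses geometric sampling and further optimizations rather than per-row coin flips.

```python
import hashlib
import random

class SampledCountMin:
    """Count-Min with probabilistic row updates: each packet updates a
    given row only with probability p, adding 1/p so counts stay unbiased
    in expectation. A rough illustration of the sampling idea only."""

    def __init__(self, d=4, w=4096, p=0.1):
        self.d, self.w, self.p = d, w, p
        self.table = [[0.0] * w for _ in range(d)]

    def _bucket(self, row, key):
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.w

    def update(self, key):
        for row in range(self.d):
            if random.random() < self.p:      # expected p*d row touches per packet
                self.table[row][self._bucket(row, key)] += 1.0 / self.p

    def estimate(self, key):
        return min(self.table[row][self._bucket(row, key)]
                   for row in range(self.d))
```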

140 citations

Posted Content
TL;DR: This work introduces a new technique for converting an offline coreset construction to the streaming setting, and provides the first generalizations of such coresets for handling outliers.
Abstract: Let $P$ be a set (called points), $Q$ be a set (called queries) and a function $ f:P\times Q\to [0,\infty)$ (called cost). For an error parameter $\epsilon>0$, a set $S\subseteq P$ with a \emph{weight function} $w:P \rightarrow [0,\infty)$ is an $\epsilon$-coreset if $\sum_{s\in S}w(s) f(s,q)$ approximates $\sum_{p\in P} f(p,q)$ up to a multiplicative factor of $1\pm\epsilon$ for every given query $q\in Q$. We construct coresets for the $k$-means clustering of $n$ input points, both in an arbitrary metric space and $d$-dimensional Euclidean space. For Euclidean space, we present the first coreset whose size is simultaneously independent of both $d$ and $n$. In particular, this is the first coreset of size $o(n)$ for a stream of $n$ sparse points in a $d \ge n$ dimensional space (e.g. adjacency matrices of graphs). We also provide the first generalizations of such coresets for handling outliers. For arbitrary metric spaces, we improve the dependence on $k$ to $k \log k$ and present a matching lower bound. For $M$-estimator clustering (special cases include the well-known $k$-median and $k$-means clustering), we introduce a new technique for converting an offline coreset construction to the streaming setting. Our method yields streaming coreset algorithms requiring the storage of $O(S + k \log n)$ points, where $S$ is the size of the offline coreset. In comparison, the previous state-of-the-art was the merge-and-reduce technique that required $O(S \log^{2a+1} n)$ points, where $a$ is the exponent in the offline construction's dependence on $\epsilon^{-1}$. For example, combining our offline and streaming results, we produce a streaming metric $k$-means coreset algorithm using $O(\epsilon^{-2} k \log k \log n)$ points of storage. The previous state-of-the-art required $O(\epsilon^{-4} k \log k \log^{6} n)$ points.
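The $\epsilon$-coreset definition above can be checked numerically for any finite family of queries. Here is a small NumPy helper specialized to Euclidean k-means cost (function names and the query format are illustrative assumptions):

```python
import numpy as np

def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum over p of w(p) * min_c ||p - c||^2."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return float(weights @ d2.min(axis=1))

def is_eps_coreset(P, S, w, queries, eps):
    """Check the definition: for every query q (a set of centers), the
    weighted cost on (S, w) is within a (1 +/- eps) factor of the cost on P."""
    unit = np.ones(len(P))
    return all(
        (1 - eps) * kmeans_cost(P, unit, q)
        <= kmeans_cost(S, w, q)
        <= (1 + eps) * kmeans_cost(P, unit, q)
        for q in queries
    )
```

A construction like the paper's would produce (S, w) via importance sampling with carefully chosen weights; plain uniform sampling will generally fail this check on skewed inputs.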

136 citations

Proceedings ArticleDOI
21 Oct 2007
TL;DR: This paper presents a new smooth histograms method that improves the approximation error rate obtained via exponential histograms and provides the first approximation algorithms for the following functions: $L_p$ norms for $p \notin [1,2]$, frequency moments, length of increasing subsequence, and geometric mean.
Abstract: In the streaming model elements arrive sequentially and can be observed only once. Maintaining statistics and aggregates is an important and non-trivial task in the model. This becomes even more challenging in the sliding windows model, where statistics must be maintained only over the most recent n elements. In their pioneering paper, Datar, Gionis, Indyk and Motwani [15] presented exponential histograms, an effective method for estimating statistics on sliding windows. In this paper we present a new smooth histograms method that improves the approximation error rate obtained via exponential histograms. Furthermore, our smooth histograms method not only captures and improves multiple previous results on sliding windows but also extends the class of functions that can be approximated on sliding windows. In particular, we provide the first approximation algorithms for the following functions: $L_p$ norms for $p \notin [1,2]$, frequency moments, length of increasing subsequence, and geometric mean.
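As a concrete toy instance of the method's mechanism, the skeleton below maintains smooth-histogram "instances" for the simplest smooth function, the count over a sliding window of size n: a middle instance is pruned once its successor's value is within a (1 - beta) factor of its predecessor's, so only a logarithmic number of instances survive. Class and variable names, and the expiry bookkeeping, are illustrative assumptions.

```python
class SmoothHistogram:
    """Toy smooth histogram for the count over the last n elements.
    Each 'instance' counts elements seen since its own start time."""

    def __init__(self, n, beta):
        self.n, self.beta = n, beta
        self.instances = []            # [start_time, value], oldest first

    def add(self, t):
        self.instances.append([t, 0])
        for inst in self.instances:    # every live instance sees element t
            inst[1] += 1
        # Prune: if instance i+2 is already within (1 - beta) of instance i,
        # the middle instance i+1 carries no extra information.
        i = 0
        while i + 2 < len(self.instances):
            if self.instances[i + 2][1] >= (1 - self.beta) * self.instances[i][1]:
                del self.instances[i + 1]
            else:
                i += 1
        # Expire: keep exactly one instance starting at or before the
        # window boundary, so the true count stays bracketed.
        while len(self.instances) > 1 and self.instances[1][0] <= t - self.n + 1:
            del self.instances[0]

    def query(self):
        # (1 + O(beta))-approximate count over the last n elements.
        return self.instances[0][1] if self.instances else 0
```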

130 citations


Cited by
Book ChapterDOI
01 Jan 1998
TL;DR: In this chapter, the authors explore questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties, using diffusion processes as a model of a Markov process with continuous sample paths.
Abstract: We explore in this chapter questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties. This endeavor is really a study of diffusion processes. Loosely speaking, the term diffusion is attributed to a Markov process which has continuous sample paths and can be characterized in terms of its infinitesimal generator.

2,446 citations

01 Jan 2009
TL;DR: The aim of the research presented in this thesis is to create new methods for design for manufacturing using several approaches of KE, and to find the beneficial and less beneficial aspects of these methods in comparison to each other and to earlier research.
Abstract: As companies strive to develop artefacts intended for services instead of traditional sell-off, new challenges in the product development process arise to promote continuous improvement and increasing market profits. This creates a focus on product life-cycle components as companies then make life-cycle commitments, where they are responsible for the function availability during the extent of the life-cycle, i.e. functional products. One of these life-cycle components is manufacturing; therefore, companies search for new approaches of success during manufacturability evaluation already in engineering design. Efforts have been made to support early engineering design, as this phase sets constraints and opportunities for manufacturing. These efforts have turned into design for manufacturing methods and guidelines. A further step to improve the life-cycle focus during early engineering design is to reuse results and use experience from earlier projects. However, because results and experiences created during project work are often not documented for reuse but only remembered by some people, there is a need for design support. Knowledge engineering (KE) is a methodology for creating knowledge-based systems, e.g. systems that enable reuse of earlier results and make available both explicit and tacit corporate knowledge, enabling the automated generation and evaluation of new engineering design solutions during early product development. There are a variety of KE approaches, such as knowledge-based engineering, case-based reasoning and programming, which have been used in research to develop design for manufacturing methods and applications. There are, however, opportunities for research that investigates several approaches and their interdependencies, to create a transparent picture of how KE can be used to support engineering design. The aim of the research presented in this thesis is to create new methods for design for manufacturing using several approaches of KE, and to find the beneficial and less beneficial aspects of these methods in comparison to each other and to earlier research. This thesis presents methods and applications for design for manufacturing using KE. KE has been employed in several ways, namely rule-based; rule-, programming- and finite element analysis (FEA)-based; and rule- and plan-based, which are tested and compared with each other. Results show that KE can be used to generate information about manufacturing in several ways. The rule-based way is suitable for supporting life-cycle commitments, as engineering design and manufacturing can be integrated with maintenance and performance predictions during early engineering design, though it is limited to the firing of production rules. The rule-, programming- and FEA-based way can be used to integrate computer-aided design tools and virtual manufacturing for non-linear stress and displacement analysis. This way may also bridge the gap between engineering designers and computational experts, even though it requires a larger programming effort than the rule-based way. The rule- and plan-based way can enable design for manufacturing in two fashions: based on earlier manufacturing plans and based on rules. Because earlier manufacturing plans, together with programming algorithms, can handle knowledge that may be more intricate to capture as rules, as opposed to the time-demanding routine work that is often automated by means of rules, several opportunities for designing for manufacturing exist.

727 citations

Proceedings ArticleDOI
25 Oct 2008
TL;DR: In this article, a stream cipher $S$ is constructed whose implementation is secure even if a bounded amount of arbitrary (adversarially chosen) information on the internal state of $S$ is leaked during computation.
Abstract: We construct a stream cipher $S$ whose implementation is secure even if a bounded amount of arbitrary (adversarially chosen) information on the internal state of $S$ is leaked during computation. This captures all possible side-channel attacks on $S$ where the amount of information leaked in a given period is bounded, but overall can be arbitrarily large. The only other assumption we make on the implementation of $S$ is that only data that is accessed during computation leaks information. The stream cipher $S$ generates its output in chunks $K_1, K_2, \ldots$, and arbitrary but bounded information leakage is modeled by allowing the adversary to adaptively choose a function $f_\ell : \{0,1\}^* \to \{0,1\}^\lambda$ before $K_\ell$ is computed; she then gets $f_\ell(\tau_\ell)$, where $\tau_\ell$ is the internal state of $S$ that is accessed during the computation of $K_\ell$. One notion of security we prove for $S$ is that $K_\ell$ is indistinguishable from random when given $K_1, \ldots, K_{\ell-1}$, $f_1(\tau_1), \ldots, f_{\ell-1}(\tau_{\ell-1})$, and also the complete internal state of $S$ after $K_\ell$ has been computed (i.e. $S$ is forward-secure). The construction is based on alternating extraction (used in the intrusion-resilient secret-sharing scheme from FOCS'07). We move this concept to the computational setting by proving a lemma that states that the output of any PRG has high HILL pseudoentropy (i.e. is indistinguishable from some distribution with high min-entropy) even if arbitrary information about the seed is leaked. The amount of leakage $\lambda$ that we can tolerate in each step depends on the strength of the underlying PRG; it is at least logarithmic, but can be as large as a constant fraction of the internal state of $S$ if the PRG is exponentially hard.
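The forward-security claim above (past chunks stay pseudorandom given the current state) has a simple shape that can be illustrated with a toy key-evolving generator; the hash-based PRG stand-in below is purely an assumption for illustration and does not implement the paper's alternating-extraction construction or model any leakage.

```python
import hashlib

def prg(state: bytes) -> tuple[bytes, bytes]:
    """Toy length-doubling PRG stand-in (a hash is used here only for
    illustration): expands the current state into (next_state, chunk)."""
    nxt = hashlib.sha256(state + b"\x00").digest()
    out = hashlib.sha256(state + b"\x01").digest()
    return nxt, out

state = bytes(32)                    # secret initial state (all zeros here)
chunks = []
for _ in range(3):
    state, K = prg(state)            # emit K_l, then overwrite the old state
    chunks.append(K)
# After the loop, 'state' alone reveals nothing about the earlier chunks,
# since each prior state was erased: that is the forward-security shape.
```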

519 citations

Journal ArticleDOI
13 May 2014
TL;DR: The techniques developed in this area are now finding applications in other areas including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation.
Abstract: Over the last decade, there has been considerable interest in designing algorithms for processing massive graphs in the data stream model. The original motivation was two-fold: a) in many applications, the dynamic graphs that arise are too large to be stored in the main memory of a single machine and b) considering graph problems yields new insights into the complexity of stream computation. However, the techniques developed in this area are now finding applications in other areas including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation. We survey the state-of-the-art results; identify general techniques; and highlight some simple algorithms that illustrate basic ideas.

405 citations