scispace - formally typeset
Author

Milan Vojnovic

Bio: Milan Vojnovic is an academic researcher at the London School of Economics and Political Science. The author has contributed to research on topics including Node (networking) and Scheduling (computing). The author has an h-index of 36 and has co-authored 122 publications receiving 6,168 citations. Previous affiliations of Milan Vojnovic include Microsoft and the University of Split.


Papers
Proceedings Article
06 Dec 2017
TL;DR: Quantized SGD (QSGD) is a family of compression schemes for gradient updates that provides convergence guarantees for convex and non-convex objectives, under asynchrony, and can be extended to stochastic variance-reduced techniques.
Abstract: Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to its excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always guarantee convergence, and it is not clear whether they can be improved. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes for gradient updates which provides convergence guarantees. QSGD allows the user to smoothly trade off communication bandwidth and convergence time: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance. We show that this trade-off is inherent, in the sense that improving it past some threshold would violate information-theoretic lower bounds. QSGD guarantees convergence for convex and non-convex objectives, under asynchrony, and can be extended to stochastic variance-reduced techniques. When applied to training deep neural networks for image classification and automated speech recognition, QSGD leads to significant reductions in end-to-end training time. For example, on 16 GPUs, we can train the ResNet-152 network to full accuracy on ImageNet 1.8x faster than the full-precision variant.

759 citations
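The bandwidth/variance trade-off described in the abstract above comes from stochastic quantization of each gradient coordinate. The following is a minimal pure-Python sketch, assuming an s-level quantizer that scales coordinates by the vector's 2-norm and rounds randomly to one of the two nearest levels so the output is unbiased. The function name `qsgd_quantize` and the parameter `s` are illustrative, and the bit-level encoding of the result is omitted.

```python
import math
import random


def qsgd_quantize(v, s, rng=random):
    """Stochastic s-level quantization of a list of floats (unbiased).

    Each coordinate is mapped to norm * sign(x) * level / s, where level is
    an integer in [0, s] chosen by randomized rounding so that the expected
    value of every output coordinate equals the input coordinate.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        r = abs(x) / norm * s          # where |x| falls on the grid 0..s
        lower = math.floor(r)          # nearest level below
        p_up = r - lower               # chance of rounding up to lower + 1
        level = lower + 1 if rng.random() < p_up else lower
        out.append(norm * math.copysign(1.0, x) * level / s)
    return out
```

Every output entry is a multiple of the norm divided by `s` and keeps the sign of its input, so only the norm, signs, and small integer levels need to be sent. A larger `s` lowers the variance of each coordinate at the cost of more bits per coordinate.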

Proceedings Article
09 Sep 2007
TL;DR: The fundamental properties that determine the basic performance metrics for opportunistic communications are examined, and empirical evidence is presented that the return time of a mobile device to its favorite location site may already explain the observed dichotomy.
Abstract: We examine the fundamental properties that determine the basic performance metrics for opportunistic communications. We first consider the distribution of inter-contact times between mobile devices. Using a diverse set of measured mobility traces, we find as an invariant property that there is a characteristic time, on the order of half a day, beyond which the distribution decays exponentially. Up to this value, the distribution in many cases follows a power law, as shown in recent work. This power-law finding was previously used to support the hypothesis that inter-contact time has a power-law tail, and that common mobility models are not adequate. However, we observe that the time scale of interest for opportunistic forwarding may be of the same order as the characteristic time, and thus the exponential tail is important. We further show that even simple models such as random walk and random waypoint can exhibit the same dichotomy in the distribution of inter-contact time as in empirical traces. Finally, we perform an extensive analysis of several properties of human mobility patterns across several dimensions, and we present empirical evidence that the return time of a mobile device to its favorite location site may already explain the observed dichotomy. Our findings suggest that existing results on the performance of forwarding schemes based on power-law tails might be overly pessimistic.

687 citations
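How an inter-contact time distribution is measured for a simple mobility model can be sketched with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's traces or methodology: two independent lazy random walks on a hypothetical n x n torus, where a contact means both walkers occupy the same cell, and an inter-contact time is the number of steps from the end of one contact to the start of the next. The names `inter_contact_times` and `ccdf` are illustrative.

```python
import random


def inter_contact_times(n, steps, seed=0):
    """Two independent lazy random walks on an n x n torus.

    Returns the inter-contact times: steps from the end of one contact
    (walkers in the same cell) to the start of the next contact.
    """
    rng = random.Random(seed)
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # lazy: may stay put
    a, b = (0, 0), (n // 2, n // 2)
    in_contact = a == b
    contact_ended_at = None
    times = []
    for t in range(1, steps + 1):
        da, db = rng.choice(moves), rng.choice(moves)
        a = ((a[0] + da[0]) % n, (a[1] + da[1]) % n)  # wrap around the torus
        b = ((b[0] + db[0]) % n, (b[1] + db[1]) % n)
        now = a == b
        if now and not in_contact and contact_ended_at is not None:
            times.append(t - contact_ended_at)  # a new contact begins
        elif in_contact and not now:
            contact_ended_at = t                # the current contact ends
        in_contact = now
    return times


def ccdf(samples, t):
    """Empirical complementary CDF, P(T > t)."""
    return sum(1 for x in samples if x > t) / len(samples)
```

Plotting `ccdf(times, t)` against `t` on log-log and log-linear axes is the usual way to tell a power-law body from an exponential tail. On a finite torus the tail necessarily decays exponentially; the toy makes no claim about reproducing the power-law portion seen in the measured traces.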

Proceedings Article
13 Mar 2005
TL;DR: A generic mobility model for independent mobiles that contains as special cases the random waypoint on convex or non-convex domains, random walk with reflection or wrapping, city section, space graph and other models is defined.
Abstract: We define "random trip", a generic mobility model for independent mobiles that contains as special cases: the random waypoint on convex or non-convex domains, random walk with reflection or wrapping, city section, space graph and other models. We use Palm calculus to study the model and give a necessary and sufficient condition for a stationary regime to exist. When this condition is satisfied, we compute the stationary regime and give an algorithm to start a simulation in steady state (perfect simulation). The algorithm does not require the knowledge of geometric constants. For the special case of random waypoint, we provide for the first time a proof and a necessary and sufficient condition for the existence of a stationary regime. Further, we extend its applicability to a broad class of non-convex and multi-site examples, and provide a ready-to-use algorithm for perfect simulation. For the special case of random walks with reflection or wrapping, we show that, in the stationary regime, the mobile location is uniformly distributed and is independent of the speed vector, and that there is no speed decay. Our framework provides a rich set of well-understood models that can be used to simulate mobile networks with independent node movements. Our perfect sampling is implemented for use with ns-2 and is freely available for download from http://ica1www.epfl.ch/RandomTrip.

503 citations
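The two boundary rules named in the abstract above, reflection and wrapping, can be illustrated with a one-dimensional toy. This is a minimal sketch, assuming steps drawn uniformly from [-d, d] on the unit interval; the step distribution, start point, and the names `reflect`, `wrap`, and `walk_positions` are assumptions and do not follow the random trip model's specification or its ns-2 implementation.

```python
import random


def reflect(x):
    """Fold a coordinate into [0, 1] as if it bounced off walls at 0 and 1."""
    x = x % 2.0                      # reflected motion repeats with period 2
    return 2.0 - x if x > 1.0 else x


def wrap(x):
    """Fold a coordinate into [0, 1) by wrapping around, as on a circle."""
    return x % 1.0


def walk_positions(steps, d, boundary, start=0.05, seed=0):
    """1-D random walk with steps uniform on [-d, d], kept in the unit
    interval by `boundary` (reflect or wrap); returns visited positions."""
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(steps):
        x = boundary(x + rng.uniform(-d, d))
        out.append(x)
    return out
```

Even when started near one wall, a long run of either walk spends roughly a quarter of its time in [0, 0.25), consistent with the abstract's statement that the stationary location is uniform. The sketch does not model speeds, so it says nothing about independence from the speed vector.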

Posted Content
TL;DR: Quantized SGD (QSGD) is proposed: a family of compression schemes for gradient updates that provides convergence guarantees, leads to significant reductions in end-to-end training time, and can be extended to stochastic variance-reduced techniques.
Abstract: Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to the excellent scalability properties of this algorithm and its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Consequently, lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always provably converge, and it is not clear whether they are optimal. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes which allow the compression of gradient updates at each node, while guaranteeing convergence under standard assumptions. QSGD allows the user to trade off compression and convergence time: it can communicate a number of bits per iteration that is sublinear in the model dimension, and can achieve asymptotically optimal communication cost. We complement our theoretical results with empirical data, showing that QSGD can significantly reduce communication cost while being competitive with standard uncompressed techniques on a variety of real tasks. In particular, experiments show that gradient quantization applied to training of deep neural networks for image classification and automated speech recognition can lead to significant reductions in communication cost and end-to-end training time. For instance, on 16 GPUs, we are able to train a ResNet-152 network on ImageNet to full accuracy 1.8x faster. Of note, we show that there exist generic parameter settings under which all known network architectures preserve or slightly improve their full accuracy when using quantization.

419 citations

Proceedings Article
24 Feb 2014
TL;DR: This work derives a novel one-pass, streaming graph partitioning algorithm and shows that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs.
Abstract: Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors, or in the cluster with the fewest non-neighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS, and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas METIS took more than 8.5 hours to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains, in terms of communication cost and runtime, of using our graph partitioner for a standard PageRank computation in a graph processing platform.

324 citations
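The interpolation between the two greedy heuristics described in the abstract above can be sketched as a one-pass assignment rule. This is a minimal sketch, assuming a score of the form |N(v) ∩ S_i| - alpha * gamma * |S_i|^(gamma - 1) and a hard capacity of nu * n / k vertices per part. The function names and the parameters `alpha`, `gamma`, and `nu` are illustrative assumptions and are not taken from the abstract.

```python
def stream_partition(stream, n, k, alpha, gamma=1.5, nu=1.1):
    """One-pass greedy partitioning of n vertices into k parts.

    `stream` yields (vertex, neighbors) pairs in arrival order. Each vertex
    goes to the non-full part maximizing
        (neighbors already in the part) - alpha * gamma * size**(gamma - 1).
    Requires nu >= 1 so that some part always has room.
    """
    capacity = nu * n / k
    sizes = [0] * k
    assign = {}
    for v, nbrs in stream:
        best, best_score = None, float("-inf")
        for i in range(k):
            if sizes[i] >= capacity:
                continue  # hard balance constraint
            gain = sum(1 for u in nbrs if assign.get(u) == i)
            score = gain - alpha * gamma * sizes[i] ** (gamma - 1)
            if score > best_score:
                best, best_score = i, score
        assign[v] = best
        sizes[best] += 1
    return assign


def cut_fraction(edges, assign):
    """Fraction of edges whose endpoints land in different parts."""
    return sum(1 for u, v in edges if assign[u] != assign[v]) / len(edges)
```

With `gamma = 1` the size penalty is the same for every part, so the rule reduces to "most neighbors"; with `gamma = 2` and `alpha = 0.5` the score becomes |N(v) ∩ S_i| - |S_i|, which is the negative of the number of non-neighbors in S_i, i.e. "fewest non-neighbors". Intermediate `gamma` values trade between the two.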


Cited by
Journal Article
TL;DR: The existence of fair end-to-end window-based congestion control protocols for packet-switched networks with first-come-first-served routers is demonstrated, with the convergence of the protocols proved using a Lyapunov function.
Abstract: In this paper, we demonstrate the existence of fair end-to-end window-based congestion control protocols for packet-switched networks with first-come-first-served routers. Our definition of fairness generalizes proportional fairness and includes arbitrarily close approximations of max-min fairness. The protocols use only information that is available to end hosts and are designed to converge reasonably fast. Our study is based on a multiclass fluid model of the network. The convergence of the protocols is proved using a Lyapunov function. The technical challenge is in the practical implementation of the protocols.

2,161 citations
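The abstract above does not spell out its fairness definition. A standard family with the two properties it names is alpha-fair utility, U(x) = x^(1 - alpha) / (1 - alpha), with U(x) = log x at alpha = 1. The sketch below assumes that family and a toy linear network of L unit-capacity links, where one long flow crosses every link and each link also carries one single-hop flow; the function names are illustrative. Setting the derivative of U(x) + L * U(1 - x) to zero gives x^(-alpha) = L * (1 - x)^(-alpha), so the long flow's rate is x = 1 / (1 + L^(1/alpha)).

```python
import math


def alpha_fair_utility(x, alpha):
    """alpha-fair utility: log(x) at alpha = 1, else x**(1 - alpha) / (1 - alpha)."""
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)


def long_flow_rate_closed_form(num_links, alpha):
    """Rate of the flow crossing all num_links unit-capacity links, each of
    which also carries one single-hop flow (from the first-order condition)."""
    return 1.0 / (1.0 + num_links ** (1.0 / alpha))


def long_flow_rate_numeric(num_links, alpha, iters=200):
    """Maximize U(x) + L * U(1 - x) over 0 < x < 1 by ternary search;
    the objective is concave for alpha > 0."""
    def total(x):
        return alpha_fair_utility(x, alpha) + num_links * alpha_fair_utility(1.0 - x, alpha)

    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if total(m1) < total(m2):
            lo = m1   # the maximum lies to the right of m1
        else:
            hi = m2   # the maximum lies to the left of m2
    return (lo + hi) / 2.0
```

At alpha = 1 (proportional fairness) the long flow gets 1 / (L + 1). As alpha grows, its rate approaches 1/2, the max-min fair share on every link, which matches the "arbitrarily close approximations of max-min fairness" in the abstract.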

Proceedings Article
02 Mar 2009
TL;DR: This paper presents the Opportunistic Networking Environment (ONE) simulator specifically designed for evaluating DTN routing and application protocols, and shows sample simulations to demonstrate the simulator's flexible support for DTN protocol evaluation.
Abstract: Delay-tolerant Networking (DTN) enables communication in sparse mobile ad-hoc networks and other challenged environments where traditional networking fails and new routing and application protocols are required. Past experience with DTN routing and application protocols has shown that their performance is highly dependent on the underlying mobility and node characteristics. Evaluating DTN protocols across many scenarios requires suitable simulation tools. This paper presents the Opportunistic Networking Environment (ONE) simulator specifically designed for evaluating DTN routing and application protocols. It allows users to create scenarios based upon different synthetic movement models and real-world traces and offers a framework for implementing routing and application protocols (already including six well-known routing protocols). Interactive visualization and post-processing tools support evaluating experiments and an emulation mode allows the ONE simulator to become part of a real-world DTN testbed. We show sample simulations to demonstrate the simulator's flexible support for DTN protocol evaluation.

2,075 citations
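The abstract's point that DTN protocol performance depends on the underlying mobility can be illustrated with a contact-driven toy. This is a minimal sketch, assuming a trace of pairwise contacts given as (time, a, b) tuples and epidemic (flooding) routing, in which every node holding the message copies it to every node it meets. It is not the ONE simulator's API, and the abstract does not say which six routing protocols ONE includes; `epidemic_delivery_time` is a hypothetical name.

```python
def epidemic_delivery_time(contacts, source, dest, created_at=0):
    """Earliest time a message created at `source` reaches `dest` when every
    carrier copies it to each node it meets. `contacts` holds (time, a, b)
    tuples; returns None if the message is never delivered."""
    if source == dest:
        return created_at
    carriers = {source}
    for t, a, b in sorted(contacts):
        if t < created_at:
            continue  # contacts before creation cannot carry the message
        if a in carriers or b in carriers:
            carriers.update((a, b))  # both endpoints now hold a copy
            if dest in carriers:
                return t
    return None
```

On the trace `[(1, "A", "B"), (2, "C", "D"), (3, "B", "C"), (5, "C", "D")]`, a message from A reaches D at time 5, while a message from D never reaches A, and one created by A at time 2 never reaches D. The order and timing of contacts, not just who meets whom, decide delivery, which is why the mobility model or trace matters when comparing protocols.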