
Showing papers on "Time complexity published in 2016"


Posted Content
TL;DR: In this paper, a factorized convolution operator was introduced to reduce the number of parameters in the discriminative correlation filter (DCF) model and a compact generative model of the training sample distribution, which significantly reduced memory and time complexity, while providing better diversity of samples.
Abstract: In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with a massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model; (ii) a compact generative model of the training sample distribution, that significantly reduces memory and time complexity, while providing better diversity of samples; (iii) a conservative model update strategy with improved robustness and reduced complexity. We perform comprehensive experiments on four benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor. When using expensive deep features, our tracker provides a 20-fold speedup and achieves a 13.0% relative gain in Expected Average Overlap compared to the top ranked method in the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.

1,069 citations


Posted Content
TL;DR: This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch, and shows that it is possible to trade computation for memory, giving a more memory-efficient training algorithm at a little extra computation cost.
Abstract: We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch. As many of the state-of-the-art models hit the upper bound of the GPU memory, our algorithm allows deeper and more complex models to be explored, and helps advance the innovations in deep learning research. We focus on reducing the memory cost to store the intermediate feature maps and gradients during training. Computation graph analysis is used for automatic in-place operation and memory sharing optimizations. We show that it is possible to trade computation for memory, giving a more memory-efficient training algorithm with a little extra computation cost. In the extreme case, our analysis also shows that the memory consumption can be reduced to O(log n) with as little as O(n log n) extra cost for forward computation. Our experiments show that we can reduce the memory cost of a 1,000-layer deep residual network from 48 GB to 7 GB with only 30 percent additional running time cost on ImageNet problems. Similarly, significant memory cost reduction is observed in training complex recurrent neural networks on very long sequences.

673 citations
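The computation-for-memory trade described above is easy to illustrate. Below is a minimal sketch, assuming a toy chain of n weightless tanh layers: only every ~√n-th activation is checkpointed on the forward pass, and each segment's activations are recomputed during the backward pass, so live memory is O(√n) at the cost of roughly one extra forward pass. The names and the toy layer are ours, not the paper's implementation.

```python
import numpy as np

def forward_segment(x, steps):
    # A segment of the toy chain: each "layer" is x -> tanh(x).
    for _ in range(steps):
        x = np.tanh(x)
    return x

def checkpointed_backward(x0, n, grad_out):
    k = max(1, int(np.sqrt(n)))            # segment length ~ sqrt(n)
    # Forward pass: keep only segment-boundary activations (checkpoints).
    checkpoints = [x0]
    x = x0
    for i in range(0, n, k):
        x = forward_segment(x, min(k, n - i))
        checkpoints.append(x)
    g = grad_out
    # Backward pass: recompute each segment's activations from its checkpoint,
    # so at most O(sqrt(n)) activations are live at any time.
    for seg in reversed(range(len(checkpoints) - 1)):
        steps = min(k, n - seg * k)
        acts = [checkpoints[seg]]
        for _ in range(steps):
            acts.append(np.tanh(acts[-1]))
        for a in reversed(acts[:-1]):       # d/da tanh(a) = 1 - tanh(a)^2
            g = g * (1.0 - np.tanh(a) ** 2)
    return g                                # gradient w.r.t. x0

x0 = np.random.randn(8)
print(checkpointed_backward(x0, n=100, grad_out=np.ones(8)))
```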


Journal ArticleDOI
TL;DR: A new two-dimensional Sine ICMIC modulation map (2D-SIMM) is proposed based on a closed-loop modulation coupling (CMC) model, and its chaotic performance is analyzed by means of phase diagram, Lyapunov exponent spectrum and complexity.

360 citations


Journal ArticleDOI
TL;DR: This work introduces quantities called graph spectral proxies, defined using the powers of the variation operator, in order to approximate the spectral content of graph signals, and formulates a direct sampling set selection approach that does not require the computation and storage of the basis elements.
Abstract: We study the problem of selecting the best sampling set for bandlimited reconstruction of signals on graphs. A frequency domain representation for graph signals can be defined using the eigenvectors and eigenvalues of variation operators that take into account the underlying graph connectivity. Smoothly varying signals defined on the nodes are of particular interest in various applications, and tend to be approximately bandlimited in the frequency basis. Sampling theory for graph signals deals with the problem of choosing the best subset of nodes for reconstructing a bandlimited signal from its samples. Most approaches to this problem require a computation of the frequency basis (i.e., the eigenvectors of the variation operator), followed by a search procedure using the basis elements. This can be impractical, in terms of storage and time complexity, for real datasets involving very large graphs. We circumvent this issue in our formulation by introducing quantities called graph spectral proxies, defined using the powers of the variation operator, in order to approximate the spectral content of graph signals. This allows us to formulate a direct sampling set selection approach that does not require the computation and storage of the basis elements. We show that our approach also provides stable reconstruction when the samples are noisy or when the original signal is only approximately bandlimited. Furthermore, the proposed approach is valid for any choice of the variation operator, thereby covering a wide range of graphs and applications. We demonstrate its effectiveness through various numerical experiments.

351 citations
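The proxy construction is compact enough to sketch. The snippet below, an illustration rather than the authors' code, evaluates an order-k spectral proxy of the form (‖L^k f‖/‖f‖)^(1/k) for a signal f using only repeated matrix-vector products with the variation operator (here the combinatorial Laplacian of a toy 4-cycle), so no eigendecomposition is needed.

```python
import numpy as np

def laplacian(A):
    # Combinatorial Laplacian L = D - A as the variation operator.
    return np.diag(A.sum(axis=1)) - A

def spectral_proxy(L, f, k=2):
    # (||L^k f|| / ||f||)^(1/k): k sparse mat-vec products, O(k|E|) each.
    g = f.astype(float)
    for _ in range(k):
        g = L @ g
    return (np.linalg.norm(g) / np.linalg.norm(f)) ** (1.0 / k)

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # 4-cycle graph
L = laplacian(A)
print(spectral_proxy(L, np.ones(4)))                     # constant signal -> 0.0
print(spectral_proxy(L, np.array([1., -1., 1., -1.])))   # highest mode -> 4.0
```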


Journal ArticleDOI
TL;DR: Very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model, are presented, showing that this model is robust to uncorrelated violations of the molecular clock.
Abstract: Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through time. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older than its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3.

329 citations
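For orientation, the classic root-to-tip regression that the paper uses as a baseline fits in a few lines: regress root-to-tip distance on tip sampling date, read the substitution rate off the slope and the root date off the x-intercept. The numbers below are invented; LSD generalizes this least-squares idea to date every internal node under temporal-precedence constraints.

```python
import numpy as np

# Made-up serially sampled tips: sampling dates and root-to-tip distances.
dates = np.array([2009.1, 2009.5, 2010.0, 2010.7, 2011.2])   # years
dists = np.array([0.0021, 0.0030, 0.0041, 0.0055, 0.0066])   # subst./site

# Least-squares line d = rate * t + b; the rate is the slope,
# and the root is dated where the line crosses d = 0.
rate, intercept = np.polyfit(dates, dists, 1)
root_date = -intercept / rate
print(f"rate ~ {rate:.4f} subst/site/year, root dated ~ {root_date:.1f}")
```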


Journal ArticleDOI
TL;DR: It is shown that, if either of two plausible average-case hardness conjectures holds, then IQP computations are hard to simulate classically up to constant additive error.
Abstract: We use the class of commuting quantum computations known as IQP (instantaneous quantum polynomial time) to strengthen the conjecture that quantum computers are hard to simulate classically. We show that, if either of two plausible average-case hardness conjectures holds, then IQP computations are hard to simulate classically up to constant additive error. One conjecture relates to the hardness of estimating the complex-temperature partition function for random instances of the Ising model; the other concerns approximating the number of zeroes of random low-degree polynomials. We observe that both conjectures can be shown to be valid in the setting of worst-case complexity. We arrive at these conjectures by deriving spin-based generalizations of the boson sampling problem that avoid the so-called permanent anticoncentration conjecture.

299 citations


Proceedings ArticleDOI
11 Apr 2016
TL;DR: LargeVis is proposed, a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then lays out the graph in the low-dimensional space, easily scaling to millions of high-dimensional data points.
Abstract: We study the problem of visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space. Much success has been reported recently by techniques that first compute a similarity structure of the data points and then project them into a low-dimensional space with the structure preserved. These two steps suffer from considerable computational costs, preventing the state-of-the-art methods such as the t-SNE from scaling to large-scale and high-dimensional data (e.g., millions of data points and hundreds of dimensions). We propose the LargeVis, a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then lays out the graph in the low-dimensional space. Compared to t-SNE, LargeVis significantly reduces the computational cost of the graph construction step and employs a principled probabilistic model for the visualization step, the objective of which can be effectively optimized through asynchronous stochastic gradient descent with a linear time complexity. The whole procedure thus easily scales to millions of high-dimensional data points. Experimental results on real-world data sets demonstrate that the LargeVis outperforms the state-of-the-art methods in both efficiency and effectiveness. The hyper-parameters of LargeVis are also much more stable over different data sets.

291 citations


Proceedings Article
24 May 2016
TL;DR: It is proved that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima: all local minima must also be global.
Abstract: Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for matrix completion has no spurious local minima: all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve matrix completion with arbitrary initialization in polynomial time.

281 citations
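The landscape result is about the factorized objective, which the hedged sketch below minimizes by plain gradient descent from a random start on synthetic data; the dimensions, sampling rate, and step size are illustrative choices, not the paper's setup.

```python
import numpy as np

# Factorized PSD matrix completion:
#   min_X  sum_{(i,j) in Omega} ((X X^T)_{ij} - M_{ij})^2
rng = np.random.default_rng(0)
n, r = 50, 3
U = rng.standard_normal((n, r))
M = U @ U.T                                  # ground-truth PSD matrix
mask = rng.random((n, n)) < 0.3              # observed entries Omega
mask = np.triu(mask) | np.triu(mask).T       # keep the mask symmetric

X = rng.standard_normal((n, r))              # arbitrary initialization
lr = 0.001
for _ in range(5000):
    R = mask * (X @ X.T - M)                 # residual on observed entries
    X -= lr * 4 * R @ X                      # gradient of the squared loss
print("observed-entry RMSE:", np.sqrt((R ** 2).sum() / mask.sum()))
```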


Proceedings ArticleDOI
10 Jan 2016
TL;DR: Spherical LSF is applied to sieving algorithms for solving the shortest vector problem (SVP) on lattices, and it is shown that this leads to a heuristic time complexity for solving SVP in dimension n of (3/2)^(n/2+o(n)) ≈ 2^(0.292n+o(n)).
Abstract: To solve the approximate nearest neighbor search problem (NNS) on the sphere, we propose a method using locality-sensitive filters (LSF), with the property that nearby vectors have a higher probability of surviving the same filter than vectors which are far apart. We instantiate the filters using spherical caps of height 1 − α, where a vector survives a filter if it is contained in the corresponding spherical cap, and where ideally each filter has an independent, uniformly random direction. For small α, these filters are very similar to the spherical locality-sensitive hash (LSH) family previously studied by Andoni et al. For larger α bounded away from 0, these filters potentially achieve a superior performance, provided we have access to an efficient oracle for finding relevant filters. Whereas existing LSH schemes are limited by a performance parameter of ρ ≥ 1/(2c² − 1) to solve approximate NNS with approximation factor c, with spherical LSF we potentially achieve smaller asymptotic values of ρ, depending on the density of the data set. For sparse data sets where the dimension is super-logarithmic in the size of the data set, we asymptotically obtain ρ = 1/(2c² − 1), while for a logarithmic dimensionality with density constant κ we obtain asymptotics of ρ ~ 1/(4κc²). To instantiate the filters and prove the existence of an efficient decoding oracle, we replace the independent filters by filters taken from certain structured random product codes. We show that the additional structure in these concatenation codes allows us to decode efficiently using techniques similar to lattice enumeration, and we can find the relevant filters with low overhead, while at the same time not significantly changing the collision probabilities of the filters. We finally apply spherical LSF to sieving algorithms for solving the shortest vector problem (SVP) on lattices, and show that this leads to a heuristic time complexity for solving SVP in dimension n of (3/2)^(n/2+o(n)) ≈ 2^(0.292n+o(n)). This asymptotically improves upon the previous best algorithms for solving SVP which use spherical LSH and cross-polytope LSH and run in time 2^(0.298n+o(n)). Experiments with the GaussSieve validate the claimed speedup and show that this method may be practical as well, as the polynomial overhead is small.

264 citations
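The filter mechanics are simple to sketch. In the illustration below (with independent random filters; the paper instead draws them from structured random product codes so the relevant filters can be found efficiently), a unit vector survives a filter u exactly when ⟨u, x⟩ ≥ α, and near-neighbor candidates are the points sharing a surviving filter with the query. All parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

d, n_filters, alpha = 16, 500, 0.5
filters = unit(rng.standard_normal((n_filters, d)))

def surviving(x):
    """Indices of filters whose spherical cap of height 1 - alpha contains x."""
    return np.flatnonzero(filters @ x >= alpha)

data = unit(rng.standard_normal((200, d)))
buckets = {}                                   # filter index -> point indices
for i, x in enumerate(data):
    for f in surviving(x):
        buckets.setdefault(f, []).append(i)

query = unit(data[0] + 0.1 * rng.standard_normal(d))    # near data[0]
candidates = {i for f in surviving(query) for i in buckets.get(f, [])}
print(0 in candidates, len(candidates))        # likely True, few candidates
```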


Journal ArticleDOI
TL;DR: An efficient algorithm is provided which approximates up to a multiplicative factor of O(log n), with n being the network size, any optimal actuator set that meets the same energy criteria; this is the best approximation factor one can achieve in polynomial time in the worst case.
Abstract: We address the problem of minimal actuator placement in a linear system subject to an average control energy bound. First, following the recent work of Olshevsky, we prove that this is NP-hard. Then, we provide an efficient algorithm which, for a given range of problem parameters, approximates up to a multiplicative factor of O(log n), with n being the network size, any optimal actuator set that meets the same energy criteria; this is the best approximation factor one can achieve in polynomial time in the worst case. Moreover, the algorithm uses a perturbed version of the involved control energy metric, which we prove to be supermodular. Next, we focus on the related problem of cardinality-constrained actuator placement for minimum control effort, where the optimal actuator set is selected so that an average input energy metric is minimized. While this is also an NP-hard problem, we use our proposed algorithm to efficiently approximate its solutions as well. Finally, we run our algorithms over large random networks to illustrate their efficiency.

233 citations
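A greedy selection driven by a perturbed energy metric, in the spirit of the supermodularity result above, can be sketched as follows; the random system, horizon, and ε are invented, and this is an illustration of the idea rather than the authors' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n, horizon, eps = 10, 20, 1e-4
A = rng.standard_normal((n, n)) / np.sqrt(n)    # random network dynamics

def gramian(S):
    # Finite-horizon controllability Gramian for actuated node set S:
    #   W = sum_{i in S} sum_{t<horizon} A^t e_i e_i^T (A^T)^t
    W = np.zeros((n, n))
    for i in S:
        b = np.zeros(n); b[i] = 1.0
        At_b = b
        for _ in range(horizon):
            W += np.outer(At_b, At_b)
            At_b = A @ At_b
    return W

def energy(S):
    # Perturbed average-energy metric trace((W_S + eps*I)^-1).
    return np.trace(np.linalg.inv(gramian(S) + eps * np.eye(n)))

selected, budget = [], 3
for _ in range(budget):                          # greedy minimization
    best = min((i for i in range(n) if i not in selected),
               key=lambda i: energy(selected + [i]))
    selected.append(best)
print("selected actuators:", selected)
```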


Posted Content
TL;DR: In this article, it was shown that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima, and that all local minima must also be global.
Abstract: Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima: all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve positive semidefinite matrix completion with arbitrary initialization in polynomial time. The result can be generalized to the setting when the observed entries contain noise. We believe that our main proof strategy can be useful for understanding geometric properties of other statistical problems involving partial or noisy observations.

Posted Content
TL;DR: In this paper, the Dense Inverse Search-based method (DIS) is proposed, which finds correspondences through an efficient search inspired by the inverse compositional image alignment of Baker and Matthews (2001).
Abstract: Most recent works in optical flow extraction focus on the accuracy and neglect the time complexity. However, in real-life visual applications, such as tracking, activity detection and recognition, the time complexity is critical. We propose a solution with very low time complexity and competitive accuracy for the computation of dense optical flow. It consists of three parts: 1) inverse search for patch correspondences; 2) dense displacement field creation through patch aggregation along multiple scales; 3) variational refinement. At the core of our Dense Inverse Search-based method (DIS) is the efficient search of correspondences inspired by the inverse compositional image alignment proposed by Baker and Matthews in 2001. DIS is competitive on standard optical flow benchmarks with large displacements. DIS runs at 300 Hz up to 600 Hz on a single CPU core, reaching the temporal resolution of the human visual system. It is order(s) of magnitude faster than state-of-the-art methods in the same range of accuracy, making DIS ideal for visual applications.

Book ChapterDOI
08 Oct 2016
TL;DR: At the core of the Dense Inverse Search-based method (DIS) is an efficient search of correspondences inspired by the inverse compositional image alignment proposed by Baker and Matthews (2001, 2004), making DIS ideal for real-time applications.
Abstract: Most recent works in optical flow extraction focus on the accuracy and neglect the time complexity. However, in real-life visual applications, such as tracking, activity detection and recognition, the time complexity is critical. We propose a solution with very low time complexity and competitive accuracy for the computation of dense optical flow. It consists of three parts: (1) inverse search for patch correspondences; (2) dense displacement field creation through patch aggregation along multiple scales; (3) variational refinement. At the core of our Dense Inverse Search-based method (DIS) is the efficient search of correspondences inspired by the inverse compositional image alignment proposed by Baker and Matthews (2001, 2004). DIS is competitive on standard optical flow benchmarks. DIS runs at 300 Hz up to 600 Hz on a single CPU core (1024 × 436 resolution; 42 Hz/46 Hz when including preprocessing: disk access, image re-scaling, gradient computation. More details in Sect. 3.1.), reaching the temporal resolution of the human visual system. It is order(s) of magnitude faster than state-of-the-art methods in the same range of accuracy, making DIS ideal for real-time applications.
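The inverse compositional step at the core of DIS can be sketched for a single, translation-only patch: the template gradient and the 2×2 Hessian are precomputed once, so each iteration costs only a patch lookup and a tiny linear solve. The bilinear sampler, synthetic image, and parameters below are our own simplifications, not the authors' code.

```python
import numpy as np

def sample(img, ys, xs):
    # Bilinear interpolation at float coordinates (ys, xs).
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    dy, dx = ys - y0, xs - x0
    return ((1-dy)*(1-dx)*img[y0, x0]   + (1-dy)*dx*img[y0, x0+1] +
            dy*(1-dx)*img[y0+1, x0]     + dy*dx*img[y0+1, x0+1])

def inverse_compositional(I, T, y, x, iters=25):
    """Find translation u such that I[y+u, x+u] patch matches template T."""
    gy, gx = np.gradient(T)                 # template gradient, precomputed once
    G = np.stack([gy.ravel(), gx.ravel()], axis=1)
    H_inv = np.linalg.inv(G.T @ G)          # 2x2 Hessian, precomputed once
    ph, pw = T.shape
    yy, xx = np.mgrid[0:ph, 0:pw]
    u = np.zeros(2)
    for _ in range(iters):
        patch = sample(I, yy + y + u[0], xx + x + u[1])
        err = (patch - T).ravel()
        du = H_inv @ (G.T @ err)
        u -= du                             # inverse compositional update
        if np.abs(du).max() < 1e-3:
            break
    return u

# Synthetic smooth test image; T is a patch of I shifted by (2, -3).
yg, xg = np.mgrid[0:64, 0:64].astype(float)
I = np.sin(yg / 5.0) + np.cos(xg / 7.0) + 0.5 * np.sin(yg * xg / 200.0)
T = I[12:28, 23:39]
print(inverse_compositional(I, T, y=10, x=26))   # ~ [2., -3.]
```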

Journal ArticleDOI
TL;DR: Experimental results based on different scales of DLRP instances demonstrate that the clustering algorithm can significantly improve the performance of KACO in terms of the quality and robustness of solutions.

Journal ArticleDOI
TL;DR: The theoretical analysis proves that the maximal clique in the CG is also the maximum one when the substrate network has sufficient resources, and the first formal proof of the NP-completeness and inapproximability result of LC-VNE is provided.
Abstract: This paper tries to solve the location-constrained virtual network embedding (LC-VNE) problem efficiently. We first investigate the complexity of LC-VNE, and by leveraging the graph bisection problem, we provide the first formal proof of the NP-completeness and inapproximability result of LC-VNE. Then, we propose two novel LC-VNE algorithms based on a compatibility graph (CG) to achieve integrated node and link mapping. In particular, in the CG, each node represents a candidate substrate path for a virtual link, and each link indicates the compatible relation between its two endnodes. Our theoretical analysis proves that the maximal clique in the CG is also the maximum one when the substrate network has sufficient resources. With CG, we reduce LC-VNE to the minimum-cost maximum clique problem, which inspires us to propose two efficient LC-VNE heuristics. Extensive numerical simulations demonstrate that compared with the existing ones, our proposed LC-VNE algorithms have significantly reduced time complexity and can provide smaller gaps to the optimal solutions, lower blocking probabilities, and higher time-average revenue as well.

Journal ArticleDOI
TL;DR: This work applies the BPD algorithm (which has approximately linear time complexity) to the network optimal attack problem and demonstrates that it has much better performance than a recently proposed collective information algorithm.
Abstract: For a network formed by nodes and undirected links between pairs of nodes, the network optimal attack problem aims at deleting a minimum number of target nodes to break the network down into many small components. This problem is intrinsically related to the feedback vertex set problem that was successfully tackled by spin-glass theory and an associated belief propagation-guided decimation (BPD) algorithm [Zhou, Eur. Phys. J. B 86, 455 (2013)]. In the present work we apply the BPD algorithm (which has approximately linear time complexity) to the network optimal attack problem and demonstrate that it has much better performance than a recently proposed collective information algorithm [Morone and Makse, Nature 524, 65 (2015)] for different types of random networks and real-world network instances. The BPD-guided attack scheme often induces an abrupt collapse of the whole network, which may make it very difficult to defend.

Proceedings ArticleDOI
26 Jun 2016
TL;DR: A general approach to speed up graph computing by reducing the CPU cache miss ratio across different graph algorithms, together with a new algorithm that reduces the time complexity and improves efficiency through new optimization techniques based on a new data structure.
Abstract: The CPU cache performance is one of the key issues to efficiency in database systems. It is reported that cache miss latency takes up half of the execution time in database systems. To improve the CPU cache performance, there are studies supporting searching, including cache-oblivious and cache-conscious trees. In this paper, we focus on CPU speedup for graph computing in general by reducing the CPU cache miss ratio for different graph algorithms. The approaches dealing with trees are not applicable to graphs, which are complex in nature. In this paper, we explore a general approach to speed up CPU computing, in order to further enhance the efficiency of the graph algorithms without changing the graph algorithms (implementations) and the data structures used. That is, we aim at designing a general solution that is not for a specific graph algorithm, nor for a specific data structure. The approach studied in this work is graph ordering, which is to find the optimal permutation among all nodes in a given graph by keeping nodes that will be frequently accessed together locally, to minimize the CPU cache miss ratio. We prove the graph ordering problem is NP-hard, and give a basic algorithm with a bounded approximation. To improve the time complexity of the basic algorithm, we further propose a new algorithm to reduce the time complexity and improve the efficiency with new optimization techniques based on a new data structure. We conducted extensive experiments to evaluate our approach in comparison with 9 other possible graph orderings (such as the one obtained by METIS) using 8 large real graphs and 9 representative graph algorithms. We confirm that our approach can achieve high performance by reducing the CPU cache miss ratios.
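A simplified greedy rendition of graph ordering conveys the idea: repeatedly append the unplaced node that shares the most edges and common neighbors with a small window of recently placed nodes, so frequently co-accessed nodes end up adjacent in memory. This sketch targets the locality objective only; it is not the paper's bounded-approximation algorithm or its optimized variant.

```python
def greedy_order(adj, w=4):
    """Greedy graph ordering: adj maps node -> set of neighbors."""
    nodes = list(adj)
    placed, order = set(), []
    cur = nodes[0]
    while len(order) < len(nodes):
        order.append(cur)
        placed.add(cur)
        window = order[-w:]                 # recently placed nodes
        best, best_score = None, -1
        for v in nodes:
            if v in placed:
                continue
            # Locality score: direct edges plus shared neighbors
            # with the nodes in the window.
            score = sum((v in adj[u]) + len(adj[v] & adj[u]) for u in window)
            if score > best_score:
                best, best_score = v, score
        if best is None:
            break
        cur = best
    return order

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
print(greedy_order(adj))
```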

Journal ArticleDOI
TL;DR: The experimental results show that the adaptive Hilbert curve insertion algorithm significantly improves the efficiency of Delaunay triangulation for both uniformly and non-uniformly distributed point cloud data, compared with the CGAL, regular grid insertion, and multi-grid insertion algorithms.

Journal ArticleDOI
TL;DR: The proposed QUATRE algorithm is a swarm-based algorithm that uses a quasi-affine transformation approach for evolution; it shows excellent performance not only on uni-modal functions, but also on multi-modal functions, even on higher-dimension optimization problems.
Abstract: This paper presents a novel evolutionary approach named the QUasi-Affine TRansformation Evolutionary (QUATRE) algorithm, a swarm-based algorithm that uses a quasi-affine transformation approach for evolution. The paper also discusses the relation between the QUATRE algorithm and other kinds of swarm-based algorithms, including Particle Swarm Optimization (PSO) variants and Differential Evolution (DE) variants. Comparisons and contrasts are made among the proposed QUATRE algorithm, state-of-the-art PSO variants, and DE variants under the CEC2013 test suite on real-parameter optimization and the CEC2008 test suite on large-scale optimization. Experiment results show that our algorithm outperforms the other algorithms not only on real-parameter optimization but also on large-scale optimization. Moreover, our algorithm has a cooperative property: to some extent it can reduce the time complexity (better performance can be achieved by reducing the number of generations required for a target optimum by increasing the particle population size, with the total number of function evaluations unchanged). In general, the proposed algorithm has excellent performance not only on uni-modal functions, but also on multi-modal functions, even on higher-dimension optimization problems.

Proceedings ArticleDOI
13 Aug 2016
TL;DR: An Incremental Kolmogorov-Smirnov algorithm that performs insertions and removals in O(log N) and computes the test statistic in O(1), a significant speed-up compared to the O(N log N) cost of the non-incremental implementation, and its use to detect concept drifts without true labels.
Abstract: Data stream research has grown rapidly over the last decade. Two major features distinguish data stream from batch learning: stream data are generated on the fly, possibly in a fast and variable rate; and the underlying data distribution can be non-stationary, leading to a phenomenon known as concept drift. Therefore, most of the research on data stream classification focuses on proposing efficient models that can adapt to concept drifts and maintain a stable performance over time. However, specifically for the classification task, the majority of such methods rely on the instantaneous availability of true labels for all already classified instances. This is a strong assumption that is rarely fulfilled in practical applications. Hence there is a clear need for efficient methods that can detect concept drifts in an unsupervised way. One possibility is the well-known Kolmogorov-Smirnov test, a statistical hypothesis test that checks whether two samples differ. This work has two main contributions. The first one is the Incremental Kolmogorov-Smirnov algorithm that allows performing the Kolmogorov-Smirnov hypothesis test instantly using two samples that change over time, where the change is an insertion and/or removal of an observation. Our algorithm employs a randomized tree and is able to perform the insertion and removal operations in O(log N) with high probability and calculate the Kolmogorov-Smirnov test in O(1), where N is the number of sample observations. This is a significant speed-up compared to the O(N log N) cost of the non-incremental implementation. The second contribution is the use of the Incremental Kolmogorov-Smirnov test to detect concept drifts without true labels. Classification algorithms adapted to use the test rely on a limited portion of those labels just to update the classification model after a concept drift is detected.
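The drift-detection scheme can be illustrated with a non-incremental stand-in: freeze a reference window, slide a recent window, and flag a drift whenever the two-sample KS test rejects. The sketch below recomputes the statistic from scratch with scipy, i.e., at the O(N log N) cost the paper's randomized tree avoids; window size and significance level are invented.

```python
from collections import deque

import numpy as np
from scipy.stats import ks_2samp

def detect_drifts(stream, window=200, alpha=0.001):
    reference, recent, drifts = None, deque(maxlen=window), []
    for t, x in enumerate(stream):
        recent.append(x)
        if len(recent) < window:
            continue
        if reference is None:
            reference = list(recent)        # freeze the reference window
            continue
        # Non-incremental KS test: the paper maintains this statistic
        # incrementally in O(log N) per insertion/removal instead.
        if ks_2samp(reference, list(recent)).pvalue < alpha:
            drifts.append(t)
            reference, recent = None, deque(maxlen=window)  # reset after drift
    return drifts

rng = np.random.default_rng(4)
stream = np.concatenate([rng.normal(0, 1, 2000), rng.normal(2, 1, 2000)])
print(detect_drifts(stream))                # flags a drift shortly after t=2000
```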

Proceedings ArticleDOI
01 Jun 2016
TL;DR: SLIC is extended to compute content-sensitive superpixels; unlike algorithms that characterize content-sensitivity by geodesic distances, manifold SLIC measures the areas of Voronoi cells on a 2-dimensional manifold M, which can be computed at a very low cost.
Abstract: Superpixels are perceptually meaningful atomic regions that can effectively capture image features. Among various methods for computing uniform superpixels, simple linear iterative clustering (SLIC) is popular due to its simplicity and high performance. In this paper, we extend SLIC to compute content-sensitive superpixels, i.e., small superpixels in content-dense regions (e.g., with high intensity or color variation) and large superpixels in content-sparse regions. Rather than the conventional SLIC method that clusters pixels in R^5, we map the image I to a 2-dimensional manifold M ⊂ R^5, whose area elements are a good measure of the content density in I. We propose an efficient method to compute restricted centroidal Voronoi tessellation (RCVT) — a uniform tessellation — on M, which induces the content-sensitive superpixels in I. Unlike other algorithms that characterize content-sensitivity by geodesic distances, manifold SLIC tackles the problem by measuring areas of Voronoi cells on M, which can be computed at a very low cost. As a result, it runs 10 times faster than the state-of-the-art content-sensitive superpixels algorithm. We evaluate manifold SLIC and seven representative methods on the BSDS500 benchmark and observe that our method outperforms the existing methods.

Journal ArticleDOI
TL;DR: This paper exploits a recent result by Petitjean et al. to allow meaningful averaging of “warped” time series, which then allows the creation of super-efficient nearest “centroid” classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives.
Abstract: A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of "warped" time series, which then allows us to create super-efficient nearest "centroid" classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors.
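A minimal sketch of the nearest-“centroid” scheme follows. For brevity the per-class prototypes here are plain pointwise means; the paper's key ingredient is instead the DTW-aware averaging of Petitjean et al., which produces centroids that respect warping. Either way, test-time cost drops from one DTW computation per training instance to one per class.

```python
import numpy as np

def dtw(a, b):
    # Classic O(nm) dynamic time warping distance.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i-1] - b[j-1]) ** 2
            D[i, j] = cost + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    return np.sqrt(D[n, m])

def fit_centroids(X, y):
    # Pointwise means as stand-in prototypes (the paper uses DTW averaging).
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    # One DTW computation per class, not per training instance.
    return min(centroids, key=lambda c: dtw(x, centroids[c]))

rng = np.random.default_rng(5)
t = np.linspace(0, 2 * np.pi, 50)
X = np.array([np.sin(t + rng.uniform(-.3, .3)) + .1 * rng.standard_normal(50)
              for _ in range(20)] +
             [np.sin(2*t + rng.uniform(-.3, .3)) + .1 * rng.standard_normal(50)
              for _ in range(20)])
y = np.array([0] * 20 + [1] * 20)
centroids = fit_centroids(X, y)
print(predict(centroids, X[0]), predict(centroids, X[25]))   # 0, 1
```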

Proceedings ArticleDOI
Shay Solomon
01 Oct 2016
TL;DR: In this paper, a randomized algorithm for maintaining maximal matching in general graphs with constant amortized update time is presented, which is essentially the best one can hope for (under the unique games conjecture) in the context of dynamic approximate vertex cover.
Abstract: Baswana, Gupta and Sen [FOCS'11] showed that fully dynamic maximal matching can be maintained in general graphs with logarithmic amortized update time. More specifically, starting from an empty graph on n fixed vertices, they devised a randomized algorithm for maintaining maximal matching over any sequence of t edge insertions and deletions with a total runtime of O(t log n) in expectation and O(t log n + n log² n) with high probability. Whether or not this runtime bound can be improved towards O(t) has remained an important open problem. Despite significant research efforts, this question has resisted numerous attempts at resolution even for basic graph families such as forests. In this paper, we resolve the question in the affirmative, by presenting a randomized algorithm for maintaining maximal matching in general graphs with constant amortized update time. The optimal runtime bound O(t) of our algorithm holds both in expectation and with high probability. As an immediate corollary, we can maintain 2-approximate vertex cover with constant amortized update time. This result is essentially the best one can hope for (under the unique games conjecture) in the context of dynamic approximate vertex cover, culminating a long line of research. Our algorithm builds on Baswana et al.'s algorithm, but is inherently different and arguably simpler. As an implication of our simplified approach, the space usage of our algorithm is linear in the (dynamic) graph size, while the space usage of Baswana et al.'s algorithm is always at least Ω(n log n). Finally, we present applications to approximate weighted matchings and to distributed networks.
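For contrast with the constant-amortized-time result, here is a naive baseline for maintaining a maximal (not maximum) matching dynamically: insertions are O(1), but deleting a matched edge triggers a scan of the freed endpoints' neighborhoods. This sketch is our illustration of the problem, not the paper's algorithm.

```python
class DynamicMaximalMatching:
    """Naive dynamic maximal matching: every edge has a matched endpoint."""

    def __init__(self):
        self.adj = {}           # vertex -> set of neighbors
        self.mate = {}          # vertex -> matched partner (if any)

    def _try_match(self, u):
        # Scan u's neighborhood for a free partner (the expensive step).
        if u in self.mate:
            return
        for v in self.adj.get(u, ()):
            if v not in self.mate:
                self.mate[u], self.mate[v] = v, u
                return

    def insert(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)
        if u not in self.mate and v not in self.mate:
            self.mate[u], self.mate[v] = v, u     # O(1)

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        if self.mate.get(u) == v:                 # a matched edge was removed:
            del self.mate[u], self.mate[v]
            self._try_match(u)                    # rematch both endpoints
            self._try_match(v)

m = DynamicMaximalMatching()
m.insert(1, 2); m.insert(2, 3); m.insert(3, 4)
m.delete(1, 2)
print(m.mate)   # a maximal matching on the remaining edges
```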

Proceedings Article
06 Jun 2016
TL;DR: The main technical result is in characterizing the Rademacher complexity of the sequence of norms that arise in the sum-of-squares relaxations to the tensor nuclear norm.
Abstract: In the noisy tensor completion problem we observe m entries (whose location is chosen uniformly at random) from an unknown n1 × n2 × n3 tensor T. We assume that T is entry-wise close to being rank r. Our goal is to fill in its missing entries using as few observations as possible. Let n = max(n1, n2, n3). We show that if m = n^{3/2} r then there is a polynomial time algorithm based on the sixth level of the sum-of-squares hierarchy for completing it. Our estimate agrees with almost all of T's entries almost exactly and works even when our observations are corrupted by noise. This is also the first algorithm for tensor completion that works in the overcomplete case when r > n, and in fact it works all the way up to r = n^{3/2}. Our proofs are short and simple and are based on establishing a new connection between noisy tensor completion (through the language of Rademacher complexity) and the task of refuting random constraint satisfaction problems. This connection seems to have gone unnoticed even in the context of matrix completion. Furthermore, we use this connection to show matching lower bounds. Our main technical result is in characterizing the Rademacher complexity of the sequence of norms that arise in the sum-of-squares relaxations to the tensor nuclear norm. These results point to an interesting new direction: Can we explore computational vs. sample complexity tradeoffs through the sum-of-squares hierarchy?

Journal ArticleDOI
TL;DR: The Bag-Of-SFA-Symbols in Vector Space classifier is presented, which is significantly more accurate than 1-NN DTW while being multiple orders of magnitude faster, making it relevant for use cases like long or large amounts of time series or real-time analytics.
Abstract: Time series classification tries to mimic the human understanding of similarity. When it comes to long or larger time series datasets, state-of-the-art classifiers reach their limits because of unreasonably high training or testing times. One representative example is the 1-nearest-neighbor dynamic time warping classifier (1-NN DTW) that is commonly used as the benchmark to compare to. It has several shortcomings: it has a quadratic time complexity in the time series length and its accuracy degenerates in the presence of noise. To reduce the computational complexity, early abandoning techniques, cascading lower bounds, or recently, a nearest centroid classifier have been introduced. Still, classification times on datasets of a few thousand time series are in the order of hours. We present our Bag-Of-SFA-Symbols in Vector Space classifier that is accurate, fast and robust to noise. We show that it is significantly more accurate than 1-NN DTW while being multiple orders of magnitude faster. Its low computational complexity combined with its good classification accuracy makes it relevant for use cases like long or large amounts of time series or real-time analytics.

Journal ArticleDOI
TL;DR: The Noisy CNN algorithm speeds training on average because the backpropagation algorithm is a special case of the generalized expectation-maximization (EM) algorithm and because such carefully chosen noise always speeds up the EM algorithm on average.

Journal ArticleDOI
TL;DR: For the uniform-speed case, in which all jobs have arbitrary power demands and must be processed at a single uniform speed, it is proved that the non-preemptive version of this problem is inapproximable within a constant factor unless P = NP; when all jobs have the same workload and the electricity prices follow a pyramidal structure, the problem can be solved in polynomial time.
Abstract: We consider the problem of scheduling jobs on a single machine to minimize the total electricity cost of processing these jobs under time-of-use electricity tariffs. For the uniform-speed case, in which all jobs have arbitrary power demands and must be processed at a single uniform speed, we prove that the non-preemptive version of this problem is inapproximable within a constant factor unless P = NP. On the other hand, when all the jobs have the same workload and the electricity prices follow a so-called pyramidal structure, we show that this problem can be solved in polynomial time. For the speed-scalable case, in which jobs can be processed at an arbitrary speed with a trade-off between speed and power demand, we show that the non-preemptive version of the problem is strongly NP-hard. We also present different approximation algorithms for this case, and test the computational performance of these approximation algorithms on randomly generated instances. In addition, for both the uniform-speed and speed-scaling cases, we show how to compute optimal schedules for the preemptive version of the problem in polynomial time.

Posted Content
TL;DR: The study of fairness in reinforcement learning is initiated, and a provably fair polynomial time algorithm is provided under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.
Abstract: We initiate the study of fairness in reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. Our fairness constraint requires that an algorithm never prefers one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: despite the fact that fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take time exponential in the number of states to achieve non-trivial approximation to the optimal policy. We then provide a provably fair polynomial time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.

Journal ArticleDOI
TL;DR: This paper presents a heuristic scheduling algorithm, named Deadline-Budget Constrained Scheduling (DBCS), with quadratic time complexity that considers two important constraints for QoS-based workflow scheduling: time and cost.

01 Jan 2016
TL;DR: Good generalized these methods and gave elegant algorithms, one class of applications of which is the calculation of Fourier series; they are applicable to certain problems in which one must multiply an N-vector by an N × N matrix that can be factored into m sparse matrices.
Abstract: An efficient method for the calculation of the interactions of a 2^m factorial experiment was introduced by Yates and is widely known by his name. The generalization to 3^m was given by Box et al. [1]. Good [2] generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series. In their full generality, Good's methods are applicable to certain problems in which one must multiply an N-vector by an N × N matrix which can be factored into m sparse matrices, where m is proportional to log N. This results in a procedure requiring a number of operations proportional to N log N rather than N². These methods are applied here to the calculation of complex Fourier series. They are useful in situations where the number of data points is, or can be chosen to be, a highly composite number. The algorithm is here derived and presented in a rather different form. Attention is given to the choice of N. It is also shown how special advantage can be obtained in the use of a binary computer with N = 2^m and how the entire calculation can be performed within the array of N data storage locations used for the given Fourier coefficients. Consider the problem of calculating the complex Fourier series X(j) = Σ_{k=0}^{N−1} A(k)·W^{jk}, j = 0, 1, …, N−1, with W = e^{2πi/N}.
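The radix-2 special case (N = 2^m) of the algorithm reduces a length-N transform to two length-N/2 transforms plus N/2 twiddle-factor multiplications, which yields the N log N operation count. A minimal recursive sketch, using the paper's convention W = e^{2πi/N}:

```python
import cmath

def fft(a):
    # Recursive radix-2 Cooley-Tukey: N must be a power of two here;
    # the paper handles general highly composite N and an in-place,
    # array-reusing formulation.
    N = len(a)
    if N == 1:
        return a[:]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * N
    for j in range(N // 2):
        w = cmath.exp(2j * cmath.pi * j / N)   # twiddle factor W^j
        out[j] = even[j] + w * odd[j]
        out[j + N // 2] = even[j] - w * odd[j]
    return out

print(fft([1, 2, 3, 4]))   # matches direct evaluation of X(j) above
```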