
Showing papers on "Statistical learning theory published in 2019"


Journal ArticleDOI
TL;DR: In this article, the authors reformulate coarse-graining as a supervised machine learning problem and use statistical learning theory to decompose the coarse-graining error and cross-validation to select and compare the performance of different models.
Abstract: Atomistic or ab initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time- and length-scales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential to match defined properties of high-resolution models or experimental data. In this paper, we reformulate coarse-graining as a supervised machine learning problem. We use statistical learning theory to decompose the coarse-graining error and cross-validation to select and compare the performance of different models. We introduce CGnets, a deep learning approach, that learns coarse-grained free energy functions and can be trained by a force-matching scheme. CGnets maintain all physically relevant invariances and allow one to incorporate prior physics knowledge to avoid sampling of unphysical structures. We show tha...
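The force-matching idea behind this training scheme can be sketched in a few lines; the quadratic toy potential, the synthetic data, and the closed-form fit below are illustrative assumptions, not the paper's CGnet model:

```python
import numpy as np

# Toy force-matching fit: learn a coarse-grained force field f_k(x) = -k*x
# (i.e. U(x) = 0.5*k*x^2) from noisy reference forces. The quadratic model
# and the synthetic data are illustrative, not the CGnet setup.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)          # coarse-grained coordinates
f_ref = -4.0 * x + rng.normal(0.0, 0.1, 200)  # reference (all-atom) forces

# Force matching = least squares on forces; for this model it is closed-form.
k = -np.sum(f_ref * x) / np.sum(x * x)
fm_loss = np.mean(((-k * x) - f_ref) ** 2)    # mean squared force error
```

The fitted stiffness k recovers the value used to generate the data, and the residual force-matching loss reduces to the noise level.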

298 citations


Journal ArticleDOI
TL;DR: Ban and Rudin take an innovative machine-learning approach to a classic problem solved by almost every newsvendor, the Big Data Newsvendor problem.
Abstract: In Ban and Rudin’s (2018) “The Big Data Newsvendor: Practical Insights from Machine Learning,” the authors take an innovative machine-learning approach to a classic problem solved by almost every c...

222 citations


Journal ArticleDOI
TL;DR: The current thrust of development in machine learning and artificial intelligence, fueled by advances in statistical learning theory over the last 20 years and commercial successes by leading big data companies, is introduced.

184 citations


Proceedings ArticleDOI
20 May 2019
TL;DR: This work addresses the problem of ε-approximate database reconstruction (ε-ADR) from range query leakage, giving attacks whose query cost scales only with the relative error ε, and is independent of the size of the database, or the number N of possible values of data items.
Abstract: We show that the problem of reconstructing encrypted databases from access pattern leakage is closely related to statistical learning theory. This new viewpoint enables us to develop broader attacks that are supported by streamlined performance analyses. First, we address the problem of ε-approximate database reconstruction (ε-ADR) from range query leakage, giving attacks whose query cost scales only with the relative error ε and is independent of the size of the database or the number N of possible values of data items. This already goes significantly beyond the state-of-the-art for such attacks, as represented by Kellaris et al. (ACM CCS 2016) and Lacharite et al. (IEEE S&P). Using real data, we show that devastatingly small numbers of queries are needed to attain very accurate database reconstruction. Finally, we generalize from ranges to consider what learning theory tells us about the impact of access pattern leakage for other classes of queries, focusing on prefix and suffix queries. We illustrate this with both concrete attacks for prefix queries and with a general lower bound for all query classes. We also show a very general reduction from reconstruction with known or chosen queries to PAC learning.

97 citations



Journal ArticleDOI
TL;DR: A novel scalable PU learning algorithm that is theoretically proven to provide the optimal solution, while showing superior computational and memory performance, is proposed and successfully applied to a large variety of real-world problems involving PU learning.
Abstract: Positive unlabeled (PU) learning is useful in various practical situations, where there is a need to learn a classifier for a class of interest from an unlabeled data set, which may contain anomalies as well as samples from unknown classes. The learning task can be formulated as an optimization problem under the framework of statistical learning theory. Recent studies have theoretically analyzed its properties and generalization performance, nevertheless, little effort has been made to consider the problem of scalability, especially when large sets of unlabeled data are available. In this work we propose a novel scalable PU learning algorithm that is theoretically proven to provide the optimal solution, while showing superior computational and memory performance. Experimental evaluation confirms the theoretical evidence and shows that the proposed method can be successfully applied to a large variety of real-world problems involving PU learning.
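The abstract does not give the paper's scalable algorithm, but the optimization problem PU learning poses can be illustrated with the standard non-negative PU risk estimator from the literature; the known class prior and the synthetic decision values below are assumptions for illustration:

```python
import numpy as np

def sigmoid_loss(z):
    # Smooth surrogate for the 0-1 loss.
    return 1.0 / (1.0 + np.exp(z))

def nn_pu_risk(g_pos, g_unl, pi):
    """Non-negative PU risk estimate for decision values g(x).
    g_pos: decision values on labelled positives
    g_unl: decision values on unlabelled data
    pi:    (assumed known) class prior of the positive class
    """
    r_pos = np.mean(sigmoid_loss(g_pos))        # positives scored positive
    r_pos_neg = np.mean(sigmoid_loss(-g_pos))   # positives scored negative
    r_unl_neg = np.mean(sigmoid_loss(-g_unl))   # unlabelled scored negative
    # Clamping at 0 keeps the negative-class risk estimate non-negative.
    return pi * r_pos + max(0.0, r_unl_neg - pi * r_pos_neg)

rng = np.random.default_rng(1)
g_pos = rng.normal(2.0, 1.0, 500)               # classifier scores positives high
g_unl = np.concatenate([rng.normal(2.0, 1.0, 300), rng.normal(-2.0, 1.0, 700)])
risk = nn_pu_risk(g_pos, g_unl, pi=0.3)
```

Minimizing such a risk over a model class is the optimization problem the statistical-learning-theory analysis studies; the paper's contribution is making that minimization scale.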

58 citations


Journal ArticleDOI
TL;DR: The Algorithmic Stability framework is relied on to prove learning bounds for unsupervised concept drift detection on data streams, and the Plover algorithm is designed to detect drifts using different measure functions, such as Statistical Moments and the Power Spectrum.
Abstract: Motivated by the Statistical Learning Theory (SLT), which provides a theoretical framework to ensure when supervised learning algorithms generalize input data, this manuscript relies on the Algorithmic Stability framework to prove learning bounds for the unsupervised concept drift detection on data streams. Based on such proof, we also designed the Plover algorithm to detect drifts using different measure functions, such as Statistical Moments and the Power Spectrum. In this way, the criterion for issuing data changes can also be adapted to better address the target task. From synthetic and real-world scenarios, we observed that each data stream may require a different measure function to identify concept drifts, according to the underlying characteristics of the corresponding application domain. In addition, we discuss the differences between our approach and others from the literature, and show illustrative results confirming the usefulness of our proposal.
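A deliberately simple moment-based detector in the spirit of Plover's measure functions can be sketched as follows; the window size, the z-score criterion, and the threshold are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def moment_drift(stream, window=100, threshold=3.0):
    """Flag window starts whose mean deviates from the first (reference)
    window's mean by more than `threshold` standard errors. A simple
    stand-in for moment-based measure functions."""
    ref = stream[:window]
    mu = ref.mean()
    se = ref.std(ddof=1) / np.sqrt(window)
    flagged = []
    for start in range(window, len(stream) - window + 1, window):
        w = stream[start:start + window]
        if abs(w.mean() - mu) / se > threshold:
            flagged.append(start)
    return flagged

rng = np.random.default_rng(2)
# Mean shift halfway through the stream: N(0,1) -> N(1.5,1).
stream = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.5, 1.0, 500)])
flagged = moment_drift(stream)
```

Windows drawn from the shifted half of the stream are flagged; swapping the mean statistic for higher moments or a power-spectrum summary changes which kinds of drift the detector is sensitive to.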

53 citations


Journal ArticleDOI
TL;DR: By accounting for the trade-off between model’s complexity and fitting ability, the proposed approach avoids the problem of over-fitting and enhances the generalization ability to non-reference alternatives and belongs to the family of preference disaggregation approaches.

51 citations


Journal ArticleDOI
TL;DR: In the LUSI paradigm, in order to construct the desired classification function, a learning machine computes statistical invariants that are specific to the problem, and then minimizes the expected error in a way that preserves these invariants; it is thus both data- and invariant-driven learning.
Abstract: This paper introduces a new learning paradigm, called Learning Using Statistical Invariants (LUSI), which is different from the classical one. In a classical paradigm, the learning machine constructs a classification rule that minimizes the probability of expected error; it is a data-driven model of learning. In the LUSI paradigm, in order to construct the desired classification function, a learning machine computes statistical invariants that are specific for the problem, and then minimizes the expected error in a way that preserves these invariants; it is thus both data- and invariant-driven learning. From a mathematical point of view, methods of the classical paradigm employ mechanisms of strong convergence of approximations to the desired function, whereas methods of the new paradigm employ both strong and weak convergence mechanisms. This can significantly increase the rate of convergence.

48 citations


Posted Content
TL;DR: A new framework, termed Bayes-Stability, is developed for proving algorithm-dependent generalization error bounds for learning general non-convex objectives and it is demonstrated that the data-dependent bounds can distinguish randomly labelled data from normal data.
Abstract: Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, which has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both the PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation to the intriguing phenomena observed in Zhang et al. (2017a). We also study the setting where the total loss is the sum of a bounded loss and an additional \ell_2 regularization term. We obtain new generalization bounds for the continuous Langevin dynamic in this setting by developing a new Log-Sobolev inequality for the parameter distribution at any time. Our new bounds are more desirable when the noisy level of the process is not small, and do not become vacuous even when T tends to infinity.
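The SGLD dynamics analyzed in the paper can be sketched on a toy quadratic loss; the step size, inverse temperature, and the loss itself are illustrative choices, and the full gradient stands in for a minibatch:

```python
import numpy as np

# Stochastic gradient Langevin dynamics (SGLD) on L(w) = 0.5*||w - w_star||^2.
# The injected Gaussian noise is the ingredient the stability-based analyses
# exploit; all constants here are illustrative.
rng = np.random.default_rng(3)
w_star = np.array([1.0, -2.0])
w = np.zeros(2)
eta, beta = 0.1, 1e4                  # step size, inverse temperature

for _ in range(500):
    grad = w - w_star                 # exact gradient stands in for a minibatch
    w = w - eta * grad + rng.normal(size=2) * np.sqrt(2.0 * eta / beta)
```

At stationarity the iterates concentrate around the minimizer with spread controlled by beta; the paper's bounds relate this noise level to generalization.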

45 citations


Journal ArticleDOI
TL;DR: A Semi-Supervised Metric Transfer Learning framework called SSMT is proposed that reduces the distribution gap between domains both statistically and geometrically by learning the instance weights, while a regularized distance metric is learned to minimize the within-class covariance and maximize the between-class covariance simultaneously for the target domain.
Abstract: A common assumption of statistical learning theory is that the training and testing data are drawn from the same distribution. However, in many real-world applications, this assumption does not hold true. Hence, a realistic strategy, Cross Domain Adaptation (DA) or Transfer Learning (TL), can be used to employ previously labelled source domain data to boost the task in the new target domain. Previous Cross Domain Adaptation methods have focused on re-weighting the instances or aligning the cross-domain distributions. However, these methods face two significant challenges: (1) there is no proper consideration of the unlabelled data of the target task, even though in the real world an abundant amount of unlabelled data is available; (2) the use of a normal Euclidean distance function fails to capture the appropriate similarity or dissimilarity between samples. To deal with these issues, we propose a Semi-Supervised Metric Transfer Learning framework called SSMT that reduces the distribution gap between domains both statistically and geometrically by learning the instance weights, while a regularized distance metric is learned to minimize the within-class covariance and maximize the between-class covariance simultaneously for the target domain. Compared with previous works, where the Mahalanobis distance metric and instance weights are learned using the labelled data or in a pipelined framework that leads to a decrease in performance, our proposed SSMT learns a regularized distance metric and instance weights by considering unlabelled data in a parallel framework. Experimental evaluation on three cross-domain visual data sets, e.g., PIE Face, handwriting digit recognition on MNIST–USPS, and object recognition, demonstrates the effectiveness of our designed approach in facilitating the unlabelled target task learning, compared to current state-of-the-art domain adaptation approaches.

Journal ArticleDOI
TL;DR: A support vector regression (a nonparametric machine learning approach)-based pitch curve is presented and its application to anomaly detection explored for wind turbine condition monitoring.
Abstract: The unexpected failure of wind turbine components leads to significant downtime and loss of revenue. To prevent this, supervisory control and data acquisition (SCADA) based condition monitoring is considered a cost-effective approach. In several studies, the wind turbine power curve has been used as a critical indicator for power performance assessment. In contrast, the application of the blade pitch angle curve has hardly been explored for wind turbine condition monitoring purposes. The blade pitch angle curve describes the nonlinear relationship between pitch angle and hub height wind speed and can be used for the detection of faults. A support vector machine (SVM) is an improved version of artificial neural networks (ANNs) and is widely used for classification- and regression-related problems. Support vector regression is a data-driven approach based on statistical learning theory and a structural risk minimization principle which provides useful nonlinear system modeling. In this paper, a support vector regression (a nonparametric machine learning approach)-based pitch curve is presented and its application to anomaly detection explored for wind turbine condition monitoring. A radial basis function (RBF) was used as the kernel function for effective SVR blade pitch curve modeling. This approach is then compared with a binned pitch curve in the identification of operational anomalies. The paper will outline the advantages and limitations of these techniques.
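The binned pitch curve baseline that the SVR approach is compared against is easy to sketch; the bin count, the 3-sigma threshold, and the synthetic pitch-vs-wind-speed curve below are assumptions for illustration:

```python
import numpy as np

def binned_pitch_curve(wind, pitch, n_bins=10):
    # Reference curve: mean and std of pitch angle per wind-speed bin.
    edges = np.linspace(wind.min(), wind.max(), n_bins + 1)
    idx = np.clip(np.digitize(wind, edges) - 1, 0, n_bins - 1)
    mean = np.array([pitch[idx == b].mean() for b in range(n_bins)])
    std = np.array([pitch[idx == b].std(ddof=1) for b in range(n_bins)])
    return edges, mean, std

def is_anomaly(wind, pitch, edges, mean, std, k=3.0):
    # Flag points further than k bin standard deviations from the bin mean.
    idx = np.clip(np.digitize(wind, edges) - 1, 0, len(mean) - 1)
    return np.abs(pitch - mean[idx]) > k * std[idx]

rng = np.random.default_rng(4)
wind = rng.uniform(4.0, 25.0, 1000)                               # m/s
pitch = np.clip(wind - 12.0, 0.0, None) + rng.normal(0.0, 0.3, 1000)
edges, mean, std = binned_pitch_curve(wind, pitch)
# A stuck blade reporting 0 deg pitch at 20 m/s should be flagged:
faulty = is_anomaly(np.array([20.0]), np.array([0.0]), edges, mean, std)
```

An SVR-based curve plays the same role as the binned reference, but models the nonlinear relationship smoothly instead of piecewise per bin.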

Posted Content
TL;DR: This work derives a procedure that allows for learning from all available sources, yet automatically suppresses irrelevant or corrupted data, and shows that this method provides significant improvements over alternative approaches from robust statistics and distributed optimization.
Abstract: Modern machine learning methods often require more data for training than a single expert can provide. Therefore, it has become a standard procedure to collect data from external sources, e.g. via crowdsourcing. Unfortunately, the quality of these sources is not always guaranteed. As additional complications, the data might be stored in a distributed way, or might even have to remain private. In this work, we address the question of how to learn robustly in such scenarios. Studying the problem through the lens of statistical learning theory, we derive a procedure that allows for learning from all available sources, yet automatically suppresses irrelevant or corrupted data. We show by extensive experiments that our method provides significant improvements over alternative approaches from robust statistics and distributed optimization.

Journal ArticleDOI
TL;DR: A regression obtained through a particular class of machine learners, based on statistical learning theory and its Bayesian variants, is proposed and applied to data from the Sanctuary of Vicoforte, which was dynamically monitored over a period of four months and modelled with finite elements to simulate structural damage.

Proceedings ArticleDOI
23 Jun 2019
TL;DR: This paper settles the sample complexity of single-parameter revenue maximization by showing matching upper and lower bounds, up to a poly-logarithmic factor, for all families of value distributions that have been considered in the literature.
Abstract: This paper settles the sample complexity of single-parameter revenue maximization by showing matching upper and lower bounds, up to a poly-logarithmic factor, for all families of value distributions that have been considered in the literature. The upper bounds are unified under a novel framework, which builds on the strong revenue monotonicity by Devanur, Huang, and Psomas (STOC 2016), and an information theoretic argument. This is fundamentally different from the previous approaches that rely on either constructing an ε-net of the mechanism space, explicitly or implicitly via statistical learning theory, or learning an approximately accurate version of the virtual values. To our knowledge, it is the first time information theoretical arguments are used to show sample complexity upper bounds, instead of lower bounds. Our lower bounds are also unified under a meta construction of hard instances.

Proceedings Article
24 May 2019
TL;DR: In this paper, the authors address the question of how to learn robustly in such scenarios, and derive a procedure that allows for learning from all available sources, yet automatically suppresses irrelevant or corrupted data.
Abstract: Modern machine learning methods often require more data for training than a single expert can provide. Therefore, it has become a standard procedure to collect data from external sources, e.g. via crowdsourcing. Unfortunately, the quality of these sources is not always guaranteed. As additional complications, the data might be stored in a distributed way, or might even have to remain private. In this work, we address the question of how to learn robustly in such scenarios. Studying the problem through the lens of statistical learning theory, we derive a procedure that allows for learning from all available sources, yet automatically suppresses irrelevant or corrupted data. We show by extensive experiments that our method provides significant improvements over alternative approaches from robust statistics and distributed optimization.

Journal ArticleDOI
TL;DR: Granular support vector machine (GSVM) is a novel machine learning model based on granular computing and statistical learning theory, and it can solve the low efficiency learning problem that exists in the traditional SVM and obtain satisfactory generalization performance, as well.
Abstract: The time complexity of the traditional support vector machine (SVM) is O(l^3), where l is the training sample size, so it cannot solve large-scale problems. Granular support vector machine (GSVM) is a novel machine learning model based on granular computing and statistical learning theory, and it can solve the low-efficiency learning problem that exists in the traditional SVM while obtaining satisfactory generalization performance as well. This paper primarily reviews the past (rudiment), present (basic model) and future (development directions) of GSVM. Firstly, we briefly introduce the basic theory of SVM and GSVM. Secondly, we describe the related research works conducted before GSVM was proposed. Next, the latest thoughts, models, algorithms and applications of GSVM are described. Finally, we note the research and development prospects of GSVM.

Journal ArticleDOI
TL;DR: The input and weight Hessians are used to quantify a network's ability to generalize to unseen data, and to show how the generalization capability of the network can be controlled during training using the learning rate, batch size and number of training iterations as controls.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Results computed on multiple UCI benchmark datasets clearly indicate the effectiveness and applicability of the proposed ISPTSVM compared to pinball support vector machine (Pin-SVM), twin bounded support vector machine (TBSVM) and SPTSVM.
Abstract: In this paper, we propose an improved version of sparse pinball twin support vector machine (SPTSVM) [1], called improved sparse pinball twin support vector machine (ISPTSVM). SPTSVM implements empirical risk minimization principle and the matrices appearing in the formulation of SPTSVM are positive semi-definite. Here, we reformulate the primal problems of SPTSVM by introducing extra regularization term to the objective function of SPTSVM. Unlike SPTSVM, structural risk minimization (SRM) principle is implemented in the proposed ISPTSVM which embodies the marrow of statistical learning theory. Also, the matrices that appear in the dual formulation of the proposed ISPTSVM are positive definite. Results computed on multiple UCI benchmark datasets clearly indicate the effectiveness and applicability of the proposed ISPTSVM compared to pinball support vector machine (Pin-SVM), twin bounded support vector machine (TBSVM) and SPTSVM.

Posted Content
TL;DR: Finite sample upper bounds are derived for the generalization error committed by specific families of reservoir computing systems when processing discrete-time inputs under various hypotheses on their dependence structure in the framework of statistical learning theory.
Abstract: We analyze the practices of reservoir computing in the framework of statistical learning theory. In particular, we derive finite sample upper bounds for the generalization error committed by specific families of reservoir computing systems when processing discrete-time inputs under various hypotheses on their dependence structure. Non-asymptotic bounds are explicitly written down in terms of the multivariate Rademacher complexities of the reservoir systems and the weak dependence structure of the signals that are being handled. This allows, in particular, to determine the minimal number of observations needed in order to guarantee a prescribed estimation accuracy with high probability for a given reservoir family. At the same time, the asymptotic behavior of the devised bounds guarantees the consistency of the empirical risk minimization procedure for various hypothesis classes of reservoir functionals.
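For intuition, the (scalar) empirical Rademacher complexity of a finite hypothesis class can be estimated by Monte Carlo; this is a simplification of the multivariate reservoir-family complexities the paper works with, and the two toy classes below are invented for illustration:

```python
import numpy as np

def empirical_rademacher(preds, n_draws=2000, seed=0):
    # preds[h, i] = h(x_i): a finite hypothesis class evaluated on n points.
    rng = np.random.default_rng(seed)
    n = preds.shape[1]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        total += np.max(preds @ sigma) / n        # sup over the class
    return total / n_draws

rng = np.random.default_rng(5)
r_small = empirical_rademacher(np.ones((1, 50)))                    # 1 hypothesis
r_large = empirical_rademacher(rng.choice([-1.0, 1.0], (200, 50)))  # 200 hypotheses
```

A single constant hypothesis cannot correlate with random signs (complexity near zero), while a large class can; the paper's generalization bounds degrade with exactly this kind of quantity.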

Proceedings ArticleDOI
25 Jul 2019
TL;DR: The results of the experimental evaluation show that SPuManTE allows the discovery of statistically significant patterns while properly accounting for uncertainties in patterns' frequencies due to the data generation process.
Abstract: We present SPuManTE, an efficient algorithm for mining significant patterns from a transactional dataset. SPuManTE controls the Family-wise Error Rate: it ensures that the probability of reporting one or more false discoveries is less than a user-specified threshold. A key ingredient of SPuManTE is UT, our novel unconditional statistical test for evaluating the significance of a pattern, that requires fewer assumptions on the data generation process and is more appropriate for a knowledge discovery setting than classical conditional tests, such as the widely used Fisher's exact test. Computational requirements have limited the use of unconditional tests in significant pattern discovery, but UT overcomes this issue by obtaining the required probabilities in a novel efficient way. SPuManTE combines UT with recent results on the supremum of the deviations of pattern frequencies from their expectations, founded in statistical learning theory. This combination allows SPuManTE to be very efficient, while also enjoying high statistical power. The results of our experimental evaluation show that SPuManTE allows the discovery of statistically significant patterns while properly accounting for uncertainties in patterns' frequencies due to the data generation process.
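As a crude point of comparison for the frequency-deviation ingredient, a union-bound-plus-Hoeffding control of the supremum deviation looks like this; the learning-theoretic bounds SPuManTE uses are sharper, and the numbers below are illustrative:

```python
import math

def max_freq_deviation(n, n_patterns, delta=0.05):
    # Union bound + Hoeffding: with probability >= 1 - delta, every one of
    # n_patterns pattern frequencies over n transactions is within this
    # epsilon of its expectation.
    return math.sqrt(math.log(2 * n_patterns / delta) / (2 * n))

eps_small = max_freq_deviation(n=10_000, n_patterns=1_000)
eps_large = max_freq_deviation(n=1_000_000, n_patterns=1_000)
```

The guaranteed deviation shrinks like 1/sqrt(n) in the number of transactions, which is what lets larger datasets support tighter significance decisions.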

Posted Content
24 Oct 2019
TL;DR: This work proposes a framework for safe reinforcement learning that can handle stochastic nonlinear dynamical systems, and proposes to use a tube-based robust nonlinear model predictive controller (NMPC) as the backup controller.
Abstract: This paper proposes a framework for safe reinforcement learning that can handle stochastic nonlinear dynamical systems. We focus on the setting where the nominal dynamics are known, and are subject to additive stochastic disturbances with known distribution. Our goal is to ensure the safety of a control policy trained using reinforcement learning, e.g., in a simulated environment. We build on the idea of model predictive shielding (MPS), where a backup controller is used to override the learned policy as needed to ensure safety. The key challenge is how to compute a backup policy in the context of stochastic dynamics. We propose to use a tube-based robust NMPC controller as the backup controller. We estimate the tubes using sampled trajectories, leveraging ideas from statistical learning theory to obtain high-probability guarantees. We empirically demonstrate that our approach can ensure safety in stochastic systems, including cart-pole and a non-holonomic particle with random obstacles.
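The model predictive shielding loop can be sketched on a one-dimensional toy system; the dynamics, bounds, and the deliberately reckless "learned" policy are all invented for illustration, and a simple saturating brake stands in for the paper's tube-based robust NMPC backup:

```python
# MPS on a 1-D toy: x' = x + u + w with |u| <= 1, disturbance |w| <= 0.1,
# and safe set |x| <= 5.
W_MAX, X_SAFE = 0.1, 5.0

def backup(x):
    return max(-1.0, min(1.0, -x))        # saturating brake toward 0

def recoverable(x, horizon=20):
    # Can the backup keep |x| <= X_SAFE under worst-case disturbances?
    for _ in range(horizon):
        if abs(x) > X_SAFE:
            return False
        x = x + backup(x) + (W_MAX if x >= 0 else -W_MAX)
    return True

def shielded(x, u_learned):
    # Use the learned action only if its worst-case successor is recoverable;
    # otherwise override with the backup controller.
    return u_learned if recoverable(x + u_learned + W_MAX) else backup(x)

x, trace = 0.0, []
for _ in range(100):
    u = shielded(x, 1.0)                  # learned policy: always push right
    x = x + u + W_MAX                     # worst-case disturbance realized
    trace.append(x)
safe = all(abs(v) <= X_SAFE for v in trace)
```

Even with a policy that always pushes toward the boundary and worst-case disturbances on every step, the shield keeps the state inside the safe set.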

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This work considers the problem of designing control laws for stochastic jump linear systems where the disturbances are drawn randomly from a finite sample space according to an unknown distribution, and adopts a distributionally robust approach to compute a mean-square stabilizing feedback gain with a given probability.
Abstract: We consider the problem of designing control laws for stochastic jump linear systems where the disturbances are drawn randomly from a finite sample space according to an unknown distribution, which is estimated from a finite sample of i.i.d. observations. We adopt a distributionally robust approach to compute a mean-square stabilizing feedback gain with a given probability. The larger the sample size, the less conservative the controller, yet our methodology gives stability guarantees with high probability, for any number of samples. Using tools from statistical learning theory, we estimate confidence regions for the unknown probability distributions (ambiguity sets) which have the shape of total variation balls centered around the empirical distribution. We use these confidence regions in the design of appropriate distributionally robust controllers and show that the associated stability conditions can be cast as a tractable linear matrix inequality (LMI) by using conjugate duality. The resulting design procedure scales gracefully with the size of the probability space and the system dimensions. Through a numerical example, we illustrate the superior sample complexity of the proposed methodology over the stochastic approach.

Journal ArticleDOI
TL;DR: This study introduces distributed statistical computing (DSC) into the design of secure multiparty protocols, which allows a linear regression model to be securely calculated over multiple datasets, limiting communication to the final step and reducing complexity.
Abstract: Background: Biomedical research often requires large cohorts and necessitates the sharing of biomedical data with researchers around the world, which raises many privacy, ethical, and legal concerns. In the face of these concerns, privacy experts are trying to explore approaches to analyzing the distributed data while protecting its privacy. Many of these approaches are based on secure multiparty computations (SMCs). SMC is an attractive approach allowing multiple parties to collectively carry out calculations on their datasets without having to reveal their own raw data; however, it incurs heavy computation time and requires extensive communication between the involved parties. Objective: This study aimed to develop usable and efficient SMC applications that meet the needs of the potential end-users and to raise general awareness about SMC as a tool that supports data sharing. Methods: We have introduced distributed statistical computing (DSC) into the design of secure multiparty protocols, which allows us to conduct computations on each of the parties’ sites independently and then combine these computations to form one estimator for the collective dataset, thus limiting communication to the final step and reducing complexity. The effectiveness of our privacy-preserving model is demonstrated through a linear regression application. Results: Our secure linear regression algorithm was tested for accuracy and performance using real and synthetic datasets. The results showed no loss of accuracy (over nonsecure regression) and very good performance (20 min for 100 million records). Conclusions: We used DSC to securely calculate a linear regression model over multiple datasets. Our experiments showed very good performance (in terms of the number of records it can handle). We plan to extend our method to other estimators such as logistic regression.
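The aggregation idea behind DSC for linear regression can be sketched by combining per-party sufficient statistics; this shows only the statistics, not the cryptographic SMC layer, and the three-party synthetic data are an assumption for illustration:

```python
import numpy as np

def local_stats(X, y):
    # Each party computes its sufficient statistics (X^T X, X^T y) locally
    # and never ships raw records.
    return X.T @ X, X.T @ y

def combine(stats):
    # One communication round: sum the statistics, then solve the normal
    # equations for the collective estimator.
    XtX = sum(s[0] for s in stats)
    Xty = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(6)
beta_true = np.array([2.0, -1.0, 0.5])
parties = []
for _ in range(3):                            # three parties, disjoint data
    X = rng.normal(size=(200, 3))
    y = X @ beta_true + rng.normal(0.0, 0.1, 200)
    parties.append((X, y))

beta_dist = combine([local_stats(X, y) for X, y in parties])
# Same answer as pooling all the raw data centrally:
X_all = np.vstack([X for X, _ in parties])
y_all = np.concatenate([y for _, y in parties])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
```

Because the normal equations are additive in the data, the distributed estimator matches the pooled one exactly, which is why no accuracy is lost relative to nonsecure regression.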

Posted Content
TL;DR: These notes gather recent results on robust statistical learning theory and stress the main principles underlying the construction and theoretical analysis of these estimators rather than provide an exhaustive account on this rapidly growing field.
Abstract: These notes gather recent results on robust statistical learning theory. The goal is to stress the main principles underlying the construction and theoretical analysis of these estimators rather than provide an exhaustive account on this rapidly growing field. The notes are the basis of lectures given at the conference StatMathAppli 2019.

Proceedings Article
01 Jan 2019
TL;DR: In this article, the authors propose a theoretical framework to deal with part-based data from a general perspective and study a novel method within the setting of statistical learning theory, which explicitly quantifies the benefits of leveraging the partbased structure of a problem on the learning rates of the proposed estimator.
Abstract: Key to structured prediction is exploiting the problem's structure to simplify the learning process. A major challenge arises when data exhibit a local structure (i.e., are made "by parts") that can be leveraged to better approximate the relation between (parts of) the input and (parts of) the output. Recent literature on signal processing, and in particular computer vision, shows that capturing these aspects is indeed essential to achieve state-of-the-art performance. However, in this context algorithms are typically derived on a case-by-case basis. In this work we propose the first theoretical framework to deal with part-based data from a general perspective and study a novel method within the setting of statistical learning theory. Our analysis is novel in that it explicitly quantifies the benefits of leveraging the part-based structure of a problem on the learning rates of the proposed estimator.

Proceedings Article
Dixian Zhu, Zhe Li, Xiaoyu Wang, Boqing Gong, Tianbao Yang
11 Apr 2019
TL;DR: A novel robust zero-sum game framework for pool-based active learning grounded on advanced statistical learning theory that avoids the issues of many previous algorithms such as inefficiency, sampling bias and sensitivity to imbalanced data distribution is presented.
Abstract: In this paper, we present a novel robust zero-sum game framework for pool-based active learning grounded on advanced statistical learning theory. Pool-based active learning usually consists of two components, namely, learning of a classifier given labeled data and querying of unlabeled data for labeling. Most previous studies on active learning consider these as two separate tasks and propose various heuristics for selecting important unlabeled data for labeling, which may render the selection of unlabeled examples sub-optimal for minimizing the classification error. In contrast, the present work formulates active learning as a unified optimization framework for learning the classifier, i.e., the querying of labels and the learning of models are unified to minimize a common objective for statistical learning. In addition, the proposed method avoids the issues of many previous algorithms such as inefficiency, sampling bias and sensitivity to imbalanced data distribution. Besides theoretical analysis, we conduct extensive experiments on benchmark datasets and demonstrate the superior performance of the proposed active learning method compared with the state-of-the-art methods.

Posted Content
TL;DR: The standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in this setting.
Abstract: Statistical learning theory has largely focused on learning and generalization given independent and identically distributed (i.i.d.) samples. Motivated by applications involving time-series data, there has been a growing literature on learning and generalization in settings where data is sampled from an ergodic process. This work has also developed complexity measures, which appropriately extend the notion of Rademacher complexity to bound the generalization error and learning rates of hypothesis classes in this setting. Rather than time-series data, our work is motivated by settings where data is sampled on a network or a spatial domain, and thus do not fit well within the framework of prior work. We provide learning and generalization bounds for data that are complexly dependent, yet their distribution satisfies the standard Dobrushin's condition. Indeed, we show that the standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in our setting. Moreover, our generalization bounds only degrade by constant factors compared to their i.i.d. analogs, and our learnability bounds degrade by log factors in the size of the training set.
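For concreteness, the empirical Rademacher complexity that this line of work builds on can be estimated by Monte Carlo for a small finite hypothesis class. The threshold classifiers and uniform sample below are illustrative stand-ins; the dependent-data setting under Dobrushin's condition is not simulated here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite class: threshold classifiers h_t(x) = sign(x - t)
# evaluated on n sample points; each row of H is one hypothesis's +/-1 outputs.
n = 50
x = np.sort(rng.uniform(0, 1, n))
thresholds = np.linspace(0, 1, 21)
H = np.sign(x[None, :] - thresholds[:, None])
H[H == 0] = 1

def empirical_rademacher(H, n_draws=2000):
    # Monte Carlo estimate of E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ],
    # with sigma_i independent uniform +/-1 (Rademacher) signs.
    n = H.shape[1]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)
        total += np.max(H @ sigma) / n
    return total / n_draws

rad = empirical_rademacher(H)
print(f"estimated empirical Rademacher complexity: {rad:.3f}")
```

By Massart's finite-class lemma the value should fall below sqrt(2 ln|H| / n), roughly 0.35 for these sizes; the paper's contribution is that this same quantity remains a meaningful complexity measure when the sample is graph- or spatially dependent rather than i.i.d.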

Proceedings Article
21 Jun 2019
TL;DR: In this article, the authors show that the standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in this setting.

Proceedings Article
01 Jan 2019
TL;DR: This work proves novel McDiarmid-type concentration inequalities for Lipschitz functions of graph-dependent random variables and demonstrates that for many types of dependent data, the forest complexity is small and thus implies good concentration.
Abstract: A crucial assumption in most statistical learning theory is that samples are independently and identically distributed (i.i.d.). However, for many real applications, the i.i.d. assumption does not hold. We consider learning problems in which examples are dependent and their dependency relation is characterized by a graph. To establish algorithm-dependent generalization theory for learning with non-i.i.d. data, we first prove novel McDiarmid-type concentration inequalities for Lipschitz functions of graph-dependent random variables. We show that concentration relies on the forest complexity of the graph, which characterizes the strength of the dependency. We demonstrate that for many types of dependent data, the forest complexity is small and thus implies good concentration. Based on our new inequalities we are able to build stability bounds for learning from graph-dependent data.
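The classical i.i.d. baseline that this abstract generalizes can be checked numerically. For the sample mean of n variables in [0, 1], each coordinate has bounded difference c_i = 1/n, so McDiarmid's inequality gives P(|f - E f| >= t) <= 2 exp(-2 n t^2). The sketch below verifies only this i.i.d. case; the paper's graph-dependent version, where the exponent is controlled by the forest complexity, is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample mean of n uniforms on [0, 1]: bounded differences c_i = 1/n,
# so McDiarmid gives P(|mean - 0.5| >= t) <= 2 exp(-2 n t^2).
n, t, trials = 100, 0.1, 5000
mcdiarmid_bound = 2 * np.exp(-2 * n * t**2)    # = 2 e^{-2}, about 0.271

samples = rng.uniform(0, 1, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - 0.5)
empirical = float(np.mean(deviations >= t))

print(f"McDiarmid bound {mcdiarmid_bound:.3f} vs empirical rate {empirical:.4f}")
```

The empirical deviation rate is far below the bound here, as expected for a sub-Gaussian mean; under graph dependence the same style of bound holds with n effectively replaced by a forest-complexity term, which is the concentration result the abstract describes.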