Showing papers on "Generalization" published in 2019

Proceedings Article•DOI•

Meta-Learning With Differentiable Convex Optimization

Kwonjoon Lee¹, Subhransu Maji², Avinash Ravichandran³, Stefano Soatto³•Institutions (3)

University of California, San Diego¹, University of Massachusetts Amherst², Amazon.com³

15 Jun 2019

TL;DR: The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories and this work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.

Abstract: Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks. Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. To efficiently solve the objective, we exploit two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem. This allows us to use high-dimensional embeddings with improved generalization at a modest increase in computational overhead. Our approach, named MetaOptNet, achieves state-of-the-art performance on miniImageNet, tieredImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks.

1,084 citations

Posted Content•

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

Sanjeev Arora¹, Simon S. Du², Wei Hu¹, Zhiyuan Li¹, Ruosong Wang²•Institutions (2)

Princeton University¹, Carnegie Mellon University²

24 Jan 2019-arXiv: Learning

TL;DR: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.

Abstract: Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. Our measure distinguishes clearly between random labels and true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent papers require sample complexity to increase (slowly) with the size, while our sample complexity is completely independent of the network size. (iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets trained via gradient descent. The key idea is to track dynamics of training and generalization via properties of a related kernel.
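
The abstract does not spell out the data-dependent complexity measure. As a rough sketch, assuming it takes the form $\sqrt{2\, y^\top (H^\infty)^{-1} y / n}$ with $H^\infty_{ij} = x_i^\top x_j (\pi - \arccos(x_i^\top x_j)) / (2\pi)$ for unit-norm inputs, it can be computed as follows. The function names and the two-point toy data are illustrative assumptions, not taken from the paper.

```python
import math

def relu_gram(xs):
    """Assumed kernel: H_ij = rho * (pi - arccos(rho)) / (2*pi), rho = x_i . x_j,
    for unit-norm inputs x_i."""
    n = len(xs)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rho = sum(a * b for a, b in zip(xs[i], xs[j]))
            rho = max(-1.0, min(1.0, rho))  # guard against round-off
            H[i][j] = rho * (math.pi - math.acos(rho)) / (2.0 * math.pi)
    return H

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = s / M[i][i]
    return z

def complexity(xs, ys):
    """sqrt(2 * y^T H^{-1} y / n) for inputs xs and labels ys."""
    z = solve(relu_gram(xs), ys)
    return math.sqrt(2.0 * sum(y * zi for y, zi in zip(ys, z)) / len(ys))

# Two nearby unit vectors (hypothetical toy data).
xs = [(1.0, 0.0), (math.cos(0.3), math.sin(0.3))]
same = complexity(xs, [1.0, 1.0])      # labels agree with the geometry
flipped = complexity(xs, [1.0, -1.0])  # labels disagree with the geometry
print(same < flipped)  # True
```

On this toy input the measure is smaller when nearby points share a label, which is the qualitative behavior the abstract reports for true versus random labels.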

476 citations

Proceedings Article•

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

Sanjeev Arora, Simon S. Du¹, Wei Hu², Zhiyuan Li², Ruosong Wang¹•Institutions (2)

Carnegie Mellon University¹, Princeton University²

24 Jan 2019

TL;DR: In this paper, a simple 2-layer ReLU network with random initialization is analyzed, and a generalization bound independent of network size is derived using a data-dependent complexity measure.

403 citations

Journal Article•DOI•

An approach toward decision-making and medical diagnosis problems using the concept of spherical fuzzy sets

Tahir Mahmood¹, Kifayat Ullah¹, Qaisar Khan¹, Naeem Jan¹•Institutions (1)

International Islamic University, Islamabad¹

01 Nov 2019-Neural Computing and Applications

TL;DR: The concepts of spherical fuzzy set (SFS) and T-spherical fuzzy set (T-SFS) are introduced as generalizations of FS, IFS and PFS, and their novelty is shown by examples and graphical comparison with earlier established concepts.

Abstract: Human opinion cannot be restricted to yes or no as depicted by conventional fuzzy set (FS) and intuitionistic fuzzy set (IFS) but it can be yes, abstain, no and refusal as explained by picture fuzzy set (PFS). In this article, the concept of spherical fuzzy set (SFS) and T-spherical fuzzy set (T-SFS) is introduced as a generalization of FS, IFS and PFS. The novelty of SFS and T-SFS is shown by examples and graphical comparison with early established concepts. Some operations of SFSs and T-SFSs along with spherical fuzzy relations are defined, and related results are conferred. Medical diagnostics and decision-making problem are discussed in the environment of SFSs and T-SFSs as practical applications.
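
The abstract does not state the defining constraints, so the sketch below assumes the commonly cited ones: for grades of membership, abstinence and non-membership $(\mu, \eta, \nu) \in [0,1]^3$, a PFS requires $\mu + \eta + \nu \le 1$, an SFS requires $\mu^2 + \eta^2 + \nu^2 \le 1$, and a T-SFS requires $\mu^t + \eta^t + \nu^t \le 1$ for some positive integer $t$. The function names are hypothetical.

```python
def in_unit_interval(*grades):
    """All grades must lie in [0, 1]."""
    return all(0.0 <= g <= 1.0 for g in grades)

def is_pfs(mu, eta, nu):
    """Picture fuzzy condition (assumed): grades sum to at most 1."""
    return in_unit_interval(mu, eta, nu) and mu + eta + nu <= 1.0

def is_t_sfs(mu, eta, nu, t):
    """T-spherical fuzzy condition (assumed): t-th powers sum to at most 1."""
    return in_unit_interval(mu, eta, nu) and mu**t + eta**t + nu**t <= 1.0

def is_sfs(mu, eta, nu):
    """Spherical fuzzy condition: the t = 2 case."""
    return is_t_sfs(mu, eta, nu, 2)

def smallest_t(mu, eta, nu, max_t=50):
    """Smallest t for which the triple is admissible, or None if none up to max_t."""
    for t in range(1, max_t + 1):
        if is_t_sfs(mu, eta, nu, t):
            return t
    return None

opinion = (0.7, 0.5, 0.4)          # (yes, abstain, no), hypothetical grades
print(is_pfs(*opinion))            # False: 0.7 + 0.5 + 0.4 = 1.6 > 1
print(is_sfs(*opinion))            # True: 0.49 + 0.25 + 0.16 = 0.90 <= 1
print(smallest_t(0.8, 0.7, 0.6))   # 4
```

The triple (0.7, 0.5, 0.4) is rejected by the PFS constraint but admitted by the SFS one, which illustrates the sense in which SFS enlarges the space of admissible opinions.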

398 citations

Journal Article•DOI•

DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

Lu Lu, Pengzhan Jin, George Em Karniadakis

08 Oct 2019-arXiv: Learning

TL;DR: This work proposes deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset, and demonstrates that DeepONet significantly reduces the generalization error compared to the fully-connected networks.

Abstract: While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can approximate accurately any nonlinear continuous operator. This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficiently large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamic systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to the fully-connected networks. We also derive theoretically the dependence of the approximation error in terms of the number of sensors (where the input function is defined) as well as the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size.
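
The branch/trunk structure described in the abstract can be sketched as a forward pass. This is a minimal, untrained illustration under several assumptions that are not from the paper: the two outputs are combined by a plain dot product, the layer sizes, `tanh` activation and random weights are arbitrary, and the final layers have no activation or bias term.

```python
import math
import random

def mlp(sizes, rng):
    """Random-weight fully connected net as a list of (W, b) layers."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers

def forward(layers, v):
    """Apply the layers; tanh on hidden layers, linear output layer."""
    for k, (W, b) in enumerate(layers):
        v = [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]
        if k < len(layers) - 1:
            v = [math.tanh(x) for x in v]
    return v

def deeponet(branch, trunk, u_sensors, y):
    """Approximate G(u)(y) as the dot product of branch and trunk outputs."""
    b = forward(branch, u_sensors)  # encodes u sampled at the m sensors
    t = forward(trunk, [y])         # encodes the output location y
    return sum(bk * tk for bk, tk in zip(b, t))

m, p = 10, 8                       # number of sensors, shared output width
rng = random.Random(0)
branch = mlp([m, 16, p], rng)
trunk = mlp([1, 16, p], rng)

# Input function u(x) = sin(pi * x) sampled at sensors x_i = i / (m - 1).
u_sensors = [math.sin(math.pi * i / (m - 1)) for i in range(m)]
pred = deeponet(branch, trunk, u_sensors, 0.5)
print(pred)
```

Because the branch net only ever sees the input function at the fixed sensor locations, every input function must be sampled at the same $m$ points, while the trunk net can be queried at any output location $y$.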

324 citations

Proceedings Article•DOI•

DLOW: Domain Flow for Adaptation and Generalization

Rui Gong¹, Wen Li¹, Yuhua Chen¹, Luc Van Gool¹•Institutions (1)

ETH Zurich¹

15 Jun 2019

TL;DR: A domain flow generation (DLOW) model is presented that bridges two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other; its effectiveness is demonstrated for both cross-domain semantic segmentation and style generalization tasks on benchmark datasets.

Abstract: In this work, we present a domain flow generation (DLOW) model to bridge two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other. The benefits of our DLOW model are two-fold. First, it is able to transfer source images into different styles in the intermediate domains. The transferred images smoothly bridge the gap between source and target domains, thus easing the domain adaptation task. Second, when multiple target domains are provided for training, our DLOW model is also able to generate new styles of images that are unseen in the training data. We implement our DLOW model based on CycleGAN. A domainness variable is introduced to guide the model to generate the desired intermediate domain images. In the inference phase, a flow of various styles of images can be obtained by varying the domainness variable. We demonstrate the effectiveness of our model for both cross-domain semantic segmentation and the style generalization tasks on benchmark datasets. Our implementation is available at https://github.com/ETHRuiGong/DLOW .

311 citations

Proceedings Article•

The role of over-parametrization in generalization of neural networks

Behnam Neyshabur, Zhiyuan Li¹, Srinadh Bhojanapalli², Yann LeCun³, Nathan Srebro²•Institutions (3)

Princeton University¹, Toyota Technological Institute at Chicago², New York University³

01 Jan 2019

289 citations

Posted Content•

Domain Generalization via Model-Agnostic Learning of Semantic Features

Qi Dou¹, Daniel Coelho de Castro¹, Konstantinos Kamnitsas¹, Ben Glocker¹•Institutions (1)

Imperial College London¹

29 Oct 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work investigates the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics, and adopts a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift.

Abstract: Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge about inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.

272 citations

Posted Content•

Fantastic Generalization Measures and Where to Find Them

Yiding Jiang¹, Behnam Neyshabur¹, Hossein Mobahi¹, Dilip Krishnan¹, Samy Bengio¹•Institutions (1)

Google¹

04 Dec 2019-arXiv: Learning

TL;DR: This work presents the first large scale study of generalization in deep networks, investigating more than 40 complexity measures taken from both theoretical bounds and empirical studies and showing surprising failures of some measures as well as promising measures for further research.

Abstract: Generalization of deep networks has been of great interest in recent years, resulting in a number of theoretically and empirically motivated complexity measures. However, most papers proposing such measures study only a small set of models, leaving open the question of whether the conclusion drawn from those experiments would remain valid in other settings. We present the first large scale study of generalization in deep networks. We investigate more than 40 complexity measures taken from both theoretical bounds and empirical studies. We train over 10,000 convolutional networks by systematically varying commonly used hyperparameters. Hoping to uncover potentially causal relationships between each measure and generalization, we analyze carefully controlled experiments and show surprising failures of some measures as well as promising measures for further research.

258 citations

Proceedings Article•DOI•

Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection

Rui Shao¹, Xiangyuan Lan¹, Jiawei Li², Pong C. Yuen¹•Institutions (2)

Southwest Baptist University¹, Hong Kong Baptist University²

15 Jun 2019

TL;DR: This work proposes to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework under a dual-force triplet-mining constraint, which ensures that the learned feature space is discriminating and shared by multiple source domains, and thus more generalized to new face presentation attacks.

Abstract: Face presentation attacks have become an increasingly critical issue in the face recognition community. Many face anti-spoofing methods have been proposed, but they cannot generalize well on "unseen" attacks. This work focuses on improving the generalization ability of face anti-spoofing methods from the perspective of the domain generalization. We propose to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework. In this framework, a multi-adversarial deep domain generalization is performed under a dual-force triplet-mining constraint. This ensures that the learned feature space is discriminative and shared by multiple source domains, and thus is more generalized to new face presentation attacks. An auxiliary face depth supervision is incorporated to further enhance the generalization ability. Extensive experiments on four public datasets validate the effectiveness of the proposed method.

245 citations

Proceedings Article•DOI•

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data

Xiangyu Yue¹, Yang Zhang², Sicheng Zhao¹, Alberto Sangiovanni-Vincentelli¹, Kurt Keutzer¹, Boqing Gong³•Institutions (3)

University of California, Berkeley¹, University of Central Florida², Google³

01 Oct 2019

TL;DR: A new approach of domain randomization and pyramid consistency to learn a model with high generalizability for semantic segmentation of real-world self-driving scenes in a domain generalization fashion is proposed.

Abstract: We propose to harness the potential of simulation for semantic segmentation of real-world self-driving scenes in a domain generalization fashion. The segmentation network is trained without any information about target domains and tested on the unseen target domains. To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. First, we propose to randomize the synthetic images with styles of real images in terms of visual appearances using auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across different "stylized" images and within an image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on generalization from GTA and SYNTHIA to Cityscapes, BDDS, and Mapillary; and our method achieves superior results over the state-of-the-art techniques. Remarkably, our generalization results are on par with or even better than those obtained by state-of-the-art simulation-to-real domain adaptation methods, which access the target domain data at training time.

Proceedings Article•

Domain Generalization via Model-Agnostic Learning of Semantic Features

Qi Dou¹, Daniel Coelho de Castro¹, Konstantinos Kamnitsas¹, Ben Glocker¹•Institutions (1)

Imperial College London¹

29 Oct 2019

TL;DR: In this paper, the authors adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift and introduce two complementary losses which explicitly regularize the semantic structure of the feature space.

Abstract: Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge of inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.

Journal Article•DOI•

Soundness and completeness of quantum root-mean-square errors

Masanao Ozawa¹•Institutions (1)

Nagoya University¹

15 Jan 2019-npj Quantum Information

TL;DR: In this paper, Ozawa proposes an improved definition for a quantum generalization of the classical root-mean-square (rms) error, which is state-dependent, operationally definable, and perfectly characterizes accurate measurements.

Abstract: Defining and measuring the error of a measurement is one of the most fundamental activities in experimental science. However, quantum theory shows a peculiar difficulty in extending the classical notion of root-mean-square (rms) error to quantum measurements. A straightforward generalization based on the noise-operator was used to reformulate Heisenberg’s uncertainty relation on the accuracy of simultaneous measurements to be universally valid and made the conventional formulation testable to observe its violation. Recently, its reliability was examined based on an anomaly that the error vanishes for some inaccurate measurements, in which the meter does not commute with the measured observable. Here, we propose an improved definition for a quantum generalization of the classical rms error, which is state-dependent, operationally definable, and perfectly characterizes accurate measurements. Moreover, it is shown that the new notion maintains the previously obtained universally valid uncertainty relations and their experimental confirmations without changing their forms and interpretations, in contrast to a prevailing view that a state-dependent formulation for measurement uncertainty relation is not tenable. An improved definition extends the notion of root-mean-square error from classical to quantum measurements. How to define and measure the error of a measurement is one of the basic characteristics of experimental science. The root-mean-square error is a frequently used metric, but extending this notion from classical to quantum measurements is not trivial. Attempts to generalize this error to quantum measurements have been made, but many approaches suffer from anomalies, which unwantedly see the error vanish for certain types of measurements. Masanao Ozawa from Nagoya University now presents an improved definition for a quantum generalization of the classical root-mean-square error, which doesn’t suffer from such limitations.

Proceedings Article•

Uniform convergence may be unable to explain generalization in deep learning

Vaishnavh Nagarajan¹, J. Zico Kolter¹•Institutions (1)

Carnegie Mellon University¹

06 Sep 2019

TL;DR: This paper shows that, even when restricted to the set of classifiers output by gradient descent, applying two-sided uniform convergence yields only a vacuous generalization guarantee larger than $1-\epsilon$.

Abstract: Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds: in practice, these bounds can increase with the training dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot "explain generalization" -- even if we take into account the implicit bias of GD to the fullest extent possible. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small $\epsilon$ in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than $1-\epsilon$. Through these findings, we cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.

Proceedings Article•DOI•

MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension

Alon Talmor¹, Jonathan Berant²•Institutions (2)

Allen Institute for Artificial Intelligence¹, Tel Aviv University²

31 May 2019

TL;DR: This paper proposes MultiQA, a BERT-based model trained on multiple reading comprehension (RC) datasets, which leads to state-of-the-art performance on five RC datasets.

Abstract: A large number of reading comprehension (RC) datasets has been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MultiQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community.

Proceedings Article•

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Yuan Cao¹, Quanquan Gu¹•Institutions (1)

University of California, Los Angeles¹

01 Jan 2019

TL;DR: In this article, the authors show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a neural tangent random feature (NTRF) model induced by the network gradient at initialization.

Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that, the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.

Proceedings Article•

Compositional generalization through meta sequence-to-sequence learning

Brenden M. Lake

01 Jan 2019

TL;DR: This article shows how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning, which solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.

Abstract: People can learn a new concept and use it compositionally, understanding how to "blicket twice" after learning how to "blicket." In contrast, powerful sequence-to-sequence (seq2seq) neural networks fail such tests of compositionality, especially when composing new concepts together with existing concepts. In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning. In this approach, models train on a series of seq2seq problems to acquire the compositional skills needed to solve new seq2seq problems. Meta seq2seq learning solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.

Journal Article•DOI•

Maximum principle preserving exponential time differencing schemes for the nonlocal Allen--Cahn equation

Qiang Du¹, Lili Ju², Xiao Li³,⁴,⁵, Zhonghua Qiao⁴•Institutions (5)

Columbia University¹, Ocean University of China², University of South Carolina³, Hong Kong Polytechnic University⁴, China Academy of Engineering Physics⁵

30 Apr 2019-SIAM Journal on Numerical Analysis

TL;DR: The nonlocal Allen--Cahn equation, a generalization of the classic Allen--Cahn equation by replacing the Laplacian with a parameterized nonlocal diffusion operator, satisfies the maximum principle.

Abstract: The nonlocal Allen--Cahn equation, a generalization of the classic Allen--Cahn equation by replacing the Laplacian with a parameterized nonlocal diffusion operator, satisfies the maximum principle ...

Posted Content•

On the Inductive Bias of Neural Tangent Kernels

Alberto Bietti¹, Julien Mairal¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

29 May 2019-arXiv: Machine Learning

TL;DR: In this article, the authors study the inductive bias of learning in the over-parameterized regime where gradient descent dynamics are governed by the neural tangent kernel, by analyzing this kernel and the corresponding function space (RKHS), and compare it to other known kernels for similar architectures.

Abstract: State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties. A recent line of work has shown that in a certain over-parameterized regime, the learning dynamics of gradient descent are governed by a certain kernel obtained at initialization, called the neural tangent kernel. We study the inductive bias of learning in such a regime by analyzing this kernel and the corresponding function space (RKHS). In particular, we study smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compare to other known kernels for similar architectures.

Posted Content•

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Yuan Cao¹, Quanquan Gu¹•Institutions (1)

University of California, Los Angeles¹

30 May 2019-arXiv: Learning

TL;DR: The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.

Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that, the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.

Journal Article•DOI•

Optimal errors and phase transitions in high-dimensional generalized linear models

Jean Barbier, Florent Krzakala¹, Nicolas Macris², Léo Miolane¹, Lenka Zdeborová¹•Institutions (2)

Centre national de la recherche scientifique¹, École Polytechnique Fédérale de Lausanne²

01 Mar 2019-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: In this paper, the mutual information (or free entropy) from which the Bayes-optimal estimation and generalization errors of generalized linear models (GLMs) are deduced is analyzed.

Abstract: Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.

Proceedings Article•DOI•

Striking the Right Balance With Uncertainty

Salman Khan¹, Munawar Hayat², Syed Waqas Zamir, Jianbing Shen³, Ling Shao•Institutions (3)

Australian National University¹, University of Canberra², Beijing Institute of Technology³

15 Jun 2019

TL;DR: This paper demonstrates that Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples, and presents a novel framework for uncertainty-based class imbalance learning that efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers.

Abstract: Learning unbiased models on imbalanced datasets is a significant challenge. Rare classes tend to get a concentrated representation in the classification space, which hampers the generalization of learned boundaries to new test examples. In this paper, we demonstrate that Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. Subsequently, we present a novel framework for uncertainty-based class imbalance learning that follows two key insights: First, classification boundaries should be extended further away from a more uncertain (rare) class to avoid over-fitting and enhance its generalization. Second, each sample should be modeled as a multivariate Gaussian distribution with a mean vector and a covariance matrix defined by the sample's uncertainty. The learned boundaries should respect not only the individual samples but also their distribution in the feature space. Our proposed approach efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers. We systematically study the class imbalance problem and derive a novel loss formulation for max-margin learning based on a Bayesian uncertainty measure. The proposed method shows significant performance improvements on six benchmark datasets for face verification, attribute prediction, digit/object classification and skin lesion detection.

Proceedings Article•

When to Trust Your Model: Model-Based Policy Optimization

Michael Janner¹, Justin Fu¹, Marvin Zhang¹, Sergey Levine²•Institutions (2)

University of California, Berkeley¹, Google²

01 Jun 2019

TL;DR: This paper studies the role of model usage in policy optimization both theoretically and empirically, and demonstrates that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms.

Abstract: Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
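
The idea of short model-generated rollouts branched from real data can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D dynamics, the biased `model_step`, the fixed `policy`, and all numeric values are hypothetical stand-ins, and the policy-update step that would consume the model-generated data is omitted.

```python
import random


def true_step(state: float, action: float) -> float:
    """Hypothetical real environment dynamics (1-D toy)."""
    return 0.9 * state + action


def model_step(state: float, action: float) -> float:
    """Hypothetical learned model: close to the true dynamics but biased."""
    return 0.88 * state + action + 0.01


def policy(state: float) -> float:
    """Hypothetical fixed policy that pushes the state toward zero."""
    return -0.5 * state


def branched_rollouts(real_states, horizon, num_branches, rng):
    """Generate short model rollouts, each starting from a real state."""
    model_buffer = []
    for _ in range(num_branches):
        state = rng.choice(real_states)  # branch point drawn from real data
        for _ in range(horizon):  # short horizon limits compounding model error
            action = policy(state)
            next_state = model_step(state, action)
            model_buffer.append((state, action, next_state))
            state = next_state
    return model_buffer


rng = random.Random(0)

# Collect a small set of real states by running the true dynamics.
real_states = []
state = 1.0
for _ in range(20):
    real_states.append(state)
    state = true_step(state, policy(state))

model_buffer = branched_rollouts(real_states, horizon=3, num_branches=10, rng=rng)
print(len(model_buffer))  # 10 branches x 3 model steps each
```

Keeping `horizon` small means each model-generated transition stays close to a state the real environment actually visited, which is the intuition behind branching from real data rather than rolling the model out from the initial state.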

Journal Article•DOI•

Optimizing Bloom Filter: Challenges, Solutions, and Comparisons

Lailong Luo¹, Deke Guo¹, Richard T. B. Ma², Ori Rottenstreich³, Xueshan Luo¹•Institutions (3)

National University of Defense Technology¹, National University of Singapore², Technion – Israel Institute of Technology³

01 Jan 2019-IEEE Communications Surveys and Tutorials

TL;DR: This survey reviews the existing literature on Bloom filter (BF) optimization, covering more than 60 variants, and conducts a comprehensive analysis and qualitative comparison from the perspectives of BF components.

Abstract: The Bloom filter (BF) has been widely used to support membership queries, i.e., to judge whether a given element ${x}$ is a member of a given set ${S}$ or not. Recent years have seen an explosion of BF designs, owing to the BF's space efficiency and its constant-time membership queries. The existing reviews and surveys mainly focus on the applications of BF, but fall short in covering the current trends, and thus lack an intrinsic understanding of the design philosophy. To this end, this survey provides an overview of BF and its variants, with an emphasis on the optimization techniques. We survey the existing variants along two dimensions, i.e., performance and generalization. To improve performance, dozens of variants are devoted to reducing false positives and implementation costs. In addition, tens of variants generalize the BF framework to more scenarios by diversifying the input sets and enriching the output functionalities. To summarize the existing efforts, we conduct an in-depth study of the existing literature on BF optimization, covering more than 60 variants. We unearth the design philosophy of these variants and elaborate how the employed optimization techniques improve BF. Furthermore, comprehensive analysis and qualitative comparison are conducted from the perspectives of BF components. Lastly, we highlight the future trends of designing BFs. This is, to the best of our knowledge, the first survey that accomplishes such goals.
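
For readers unfamiliar with the basic structure, the membership query described above can be sketched as a minimal standard Bloom filter. This is an illustrative sketch, not any variant from the survey: the class name, the default sizes, and the choice of salted SHA-256 as the hash family are all assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative sizes)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the code simple at the cost of compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # False means "definitely not in S"; True means "possibly in S".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ("alpha", "beta", "gamma"):
    bf.add(word)

print("alpha" in bf)  # inserted elements are always reported as present
```

A query touches only `num_hashes` positions regardless of how many elements were inserted, which is where the constant-time behavior comes from. The price is false positives: an element never inserted can be reported as present if all of its positions were set by other elements.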

Proceedings Article•DOI•

A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image

Fu Xiong¹, Boshen Zhang¹, Yang Xiao, Zhiguo Cao, Taidong Yu¹, Joey Tianyi Zhou², Junsong Yuan³•Institutions (3)

Huazhong University of Science and Technology¹, Agency for Science, Technology and Research², University at Buffalo³

01 Oct 2019

TL;DR: This paper proposes the Anchor-to-Joint Regression Network (A2J) for 3D hand and body pose estimation in depth images.

Abstract: For the task of 3D hand and body pose estimation in depth images, a novel anchor-based approach termed the Anchor-to-Joint regression network (A2J), with end-to-end learning ability, is proposed. Within A2J, anchor points able to capture global-local spatial context information are densely set on the depth image as local regressors for the joints. They contribute to predicting the positions of the joints in an ensemble way to enhance generalization ability. The proposed 3D articulated pose estimation paradigm is different from the state-of-the-art encoder-decoder based FCN, 3D CNN and point-set based manners. To discover informative anchor points for a given joint, an anchor proposal procedure is also proposed for A2J. Meanwhile, a 2D CNN (i.e., ResNet-50) is used as the backbone network to drive A2J, without using time-consuming 3D convolutional or deconvolutional layers. The experiments on 3 hand datasets and 2 body datasets verify A2J’s superiority. A2J also runs at a high speed of around 100 FPS on a single NVIDIA 1080Ti GPU.

Journal Article•DOI•

A Learning Framework of Adaptive Manipulative Skills From Human to Robot

Chenguang Yang¹, Chao Zeng¹, Yang Cong², Ning Wang³, Min Wang¹•Institutions (3)

South China University of Technology¹, Chinese Academy of Sciences², University of Plymouth³

01 Feb 2019-IEEE Transactions on Industrial Informatics

TL;DR: A new framework to facilitate robot skill generalization is proposed, in that the learned skills are first segmented into a sequence of subskills automatically, then each individual subskill is encoded and regulated accordingly.

Abstract: Robots are often required to generalize the skills learned from human demonstrations to fulfil new task requirements. However, skill generalization is difficult to realize in the following situations: the skill for a complex multistep task includes a number of features; some special constraints are imposed on the robots during task reproduction; and the robot faces a completely new situation quite different from the one in which the demonstrations were given. This work proposes a new framework to facilitate robot skill generalization. The basic idea is that the learned skills are first segmented into a sequence of subskills automatically, then each individual subskill is encoded and regulated accordingly. Specifically, we adapt each set of the segmented movement trajectories individually instead of the whole movement profiles, thus making skill generalization more convenient to realize. In addition, human limb stiffness estimated from surface electromyographic signals is considered in the framework for the realization of human-to-robot variable impedance control skill transfer, as well as the generalization of both movement trajectories and stiffness profiles. An experimental study has been performed to verify the effectiveness of the proposed framework.

Proceedings Article•DOI•

ContextDesc: Local Descriptor Augmentation With Cross-Modality Context

Zixin Luo¹, Tianwei Shen¹, Lei Zhou¹, Jiahui Zhang², Yao Yao¹, Shiwei Li¹, Tian Fang, Long Quan¹•Institutions (2)

Hong Kong University of Science and Technology¹, Tsinghua University²

08 Apr 2019

TL;DR: This paper proposes a unified learning framework that leverages and aggregates cross-modality contextual information, including visual context from high-level image representation and geometric context from 2D keypoint distribution.

Abstract: Most existing studies on learning local features focus on the patch-based descriptions of individual keypoints, while neglecting the spatial relations established by the keypoint locations. In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates the cross-modality contextual information, including (i) visual context from high-level image representation, and (ii) geometric context from 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews the empirical hyper-parameter search and improves the convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, yet it yields remarkable improvements on several large-scale benchmarks with diversified scenes, which demonstrates both strong practicality and generalization ability in geometric matching applications.

Proceedings Article•

Learning Action Representations for Reinforcement Learning

Yash Chandak¹, Georgios Theocharous², James Kostas¹, Scott M. Jordan¹, Philip S. Thomas¹•Institutions (2)

University of Massachusetts Amherst¹, Adobe Systems²

01 Feb 2019

TL;DR: This paper shows how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and another component that transforms these representations into actual actions. The representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken.

Abstract: Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.

Proceedings Article•

Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

Fengxiang He¹, Tongliang Liu¹, Dacheng Tao¹•Institutions (1)

University of Sydney¹

01 Jan 2019

TL;DR: A PAC-Bayes generalization bound is proved for neural networks trained by SGD; the bound correlates positively with the ratio of batch size to learning rate, which provides the theoretical foundation of the training strategy.

Abstract: Deep neural networks have achieved dramatic success based on the optimization method of stochastic gradient descent (SGD). However, it is still not clear how to tune hyper-parameters, especially batch size and learning rate, to ensure good generalization. This paper reports both theoretical and empirical evidence for a training strategy: to achieve good generalization, the ratio of batch size to learning rate should not be too large. Specifically, we prove a PAC-Bayes generalization bound for neural networks trained by SGD, which has a positive correlation with the ratio of batch size to learning rate. This correlation builds the theoretical foundation of the training strategy. Furthermore, we conduct a large-scale experiment to verify the correlation and training strategy. We trained 1,600 models based on the ResNet-110 and VGG-19 architectures with the CIFAR-10 and CIFAR-100 datasets while strictly controlling unrelated variables. Accuracies on the test sets are collected for the evaluation. Spearman's rank-order correlation coefficients and the corresponding $p$ values on 164 groups of the collected data demonstrate that the correlation is statistically significant, which fully supports the training strategy.
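
The statistic used in this evaluation, Spearman's rank-order correlation, can be sketched in a few lines of plain Python. The `runs` triples below are invented purely for illustration and are not results from the paper; the helper names `ranks` and `spearman` are likewise hypothetical, and the $p$-value computation is omitted.

```python
def ranks(values):
    """Return 1-based average ranks (tied values share the mean of their positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented (batch_size, learning_rate, test_accuracy) triples, for illustration only.
runs = [(16, 0.1, 0.93), (32, 0.1, 0.92), (64, 0.1, 0.91), (128, 0.1, 0.89), (256, 0.1, 0.86)]
ratios = [batch / lr for batch, lr, _ in runs]
accuracies = [acc for _, _, acc in runs]

rho = spearman(ratios, accuracies)
print(rho)  # negative: in this made-up data, accuracy falls as the ratio grows
```

Because the coefficient depends only on ranks, it measures whether accuracy moves monotonically with the batch-size-to-learning-rate ratio without assuming the relationship is linear.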

Journal Article•DOI•

A jamming transition from under- to over-parametrization affects generalization in deep learning

Stefano Spigler¹, Mario Geiger¹, Stéphane d'Ascoli², Levent Sagun¹, Giulio Biroli², Matthieu Wyart¹•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, École Normale Supérieure²

22 Nov 2019-Journal of Physics A

TL;DR: It is argued that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved, and it is shown that this transition is sharp for the hinge loss.
