Showing papers on "Generalization" published in 2019


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories and this work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.
Abstract: Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks. Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. To efficiently solve the objective, we exploit two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem. This allows us to use high-dimensional embeddings with improved generalization at a modest increase in computational overhead. Our approach, named MetaOptNet, achieves state-of-the-art performance on miniImageNet, tieredImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks.

1,084 citations
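
The "dual formulation" remark in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes ridge regression as the linear base learner (the abstract does not name a specific predictor), and the function names `solve`, `ridge_primal` and `ridge_dual` are hypothetical. With n support examples and d-dimensional embeddings, the primal solve is a d x d system while the dual solve is only n x n, which is the cheaper route when n is much smaller than d, as in few-shot episodes.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_primal(X, y, lam):
    """w = (X^T X + lam I)^{-1} X^T y, a d x d linear system."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, rhs)


def ridge_dual(X, y, lam):
    """w = X^T (X X^T + lam I)^{-1} y, an n x n linear system."""
    n, d = len(X), len(X[0])
    K = [[sum(p * q for p, q in zip(X[i], X[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)  # one dual coefficient per support example
    return [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(d)]


# Toy episode: 5 support embeddings of dimension 20 with +/-1 labels.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(5)]
y = [1.0, -1.0, 1.0, -1.0, 1.0]
gap = max(abs(p - q) for p, q in zip(ridge_primal(X, y, 0.5), ridge_dual(X, y, 0.5)))
print("largest primal/dual weight difference:", gap)
```

Both routes should return the same weight vector up to floating-point error; only the size of the linear system differs.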


Posted Content
TL;DR: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.
Abstract: Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. Our measure distinguishes clearly between random labels and true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent papers require sample complexity to increase (slowly) with the size, while our sample complexity is completely independent of the network size. (iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets trained via gradient descent. The key idea is to track dynamics of training and generalization via properties of a related kernel.

476 citations


Proceedings Article
24 Jan 2019
TL;DR: In this paper, a simple 2-layer ReLU network with random initialization is analyzed, and a generalization bound independent of network size is obtained using a data-dependent complexity measure.
Abstract: Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. Our measure distinguishes clearly between random labels and true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent papers require sample complexity to increase (slowly) with the size, while our sample complexity is completely independent of the network size. (iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets trained via gradient descent. The key idea is to track dynamics of training and generalization via properties of a related kernel.

403 citations


Journal ArticleDOI
TL;DR: The concepts of spherical fuzzy set (SFS) and T-spherical fuzzy set (T-SFS) are introduced as generalizations of FS, IFS and PFS, and their novelty is shown by examples and graphical comparison with earlier established concepts.
Abstract: Human opinion cannot be restricted to yes or no as depicted by conventional fuzzy set (FS) and intuitionistic fuzzy set (IFS) but it can be yes, abstain, no and refusal as explained by picture fuzzy set (PFS). In this article, the concept of spherical fuzzy set (SFS) and T-spherical fuzzy set (T-SFS) is introduced as a generalization of FS, IFS and PFS. The novelty of SFS and T-SFS is shown by examples and graphical comparison with early established concepts. Some operations of SFSs and T-SFSs along with spherical fuzzy relations are defined, and related results are conferred. Medical diagnostics and decision-making problem are discussed in the environment of SFSs and T-SFSs as practical applications.

398 citations


Journal ArticleDOI
TL;DR: This work proposes deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset, and demonstrates that DeepONet significantly reduces the generalization error compared to the fully-connected networks.
Abstract: While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can approximate accurately any nonlinear continuous operator. This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficient large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamic systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to the fully-connected networks. We also derive theoretically the dependence of the approximation error in terms of the number of sensors (where the input function is defined) as well as the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size.

324 citations
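
The branch/trunk split described in the abstract above can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' code: the class name `TinyDeepONet`, the layer sizes, and the tanh activation are hypothetical, and combining the two sub-network outputs by a dot product is an assumption that the abstract itself does not spell out. The network is untrained; the point is only the data flow.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation=None):
    """Apply one layer: activation(W x + b)."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


class TinyDeepONet:
    def __init__(self, m, y_dim, hidden=16, p=8, seed=0):
        rng = random.Random(seed)
        # Branch net: input function sampled at m fixed sensors -> p coefficients.
        self.branch = [init_layer(rng, m, hidden), init_layer(rng, hidden, p)]
        # Trunk net: output location y -> p basis values.
        self.trunk = [init_layer(rng, y_dim, hidden), init_layer(rng, hidden, p)]

    @staticmethod
    def _mlp(layers, x):
        first, second = layers
        return dense(second, dense(first, x, math.tanh))

    def __call__(self, u_at_sensors, y):
        b = self._mlp(self.branch, u_at_sensors)
        t = self._mlp(self.trunk, y)
        # Assumed combination rule: G(u)(y) is approximated by sum_k b_k * t_k.
        return sum(bk * tk for bk, tk in zip(b, t))


# Usage: evaluate the (untrained) operator network on u(x) = sin(x) at y = 0.5.
m = 10
sensors = [i / (m - 1) for i in range(m)]
model = TinyDeepONet(m=m, y_dim=1)
print(model([math.sin(x) for x in sensors], [0.5]))
```

Keeping the sensor locations fixed across all input functions is what lets the branch net take a plain vector as input.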


Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work presents a domain flow generation (DLOW) model that bridges two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other, and demonstrates its effectiveness for both cross-domain semantic segmentation and style generalization tasks on benchmark datasets.
Abstract: In this work, we present a domain flow generation (DLOW) model to bridge two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other. The benefits of our DLOW model are two-fold. First, it is able to transfer source images into different styles in the intermediate domains. The transferred images smoothly bridge the gap between source and target domains, thus easing the domain adaptation task. Second, when multiple target domains are provided for training, our DLOW model is also able to generate new styles of images that are unseen in the training data. We implement our DLOW model based on CycleGAN. A domainness variable is introduced to guide the model to generate the desired intermediate domain images. In the inference phase, a flow of various styles of images can be obtained by varying the domainness variable. We demonstrate the effectiveness of our model for both cross-domain semantic segmentation and the style generalization tasks on benchmark datasets. Our implementation is available at https://github.com/ETHRuiGong/DLOW .

311 citations



Posted Content
TL;DR: This work investigates the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics, and adopts a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift.
Abstract: Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge about inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.

272 citations


Posted Content
Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, Samy Bengio
TL;DR: This work presents the first large scale study of generalization in deep networks, investigating more than 40 complexity measures taken from both theoretical bounds and empirical studies and showing surprising failures of some measures as well as promising measures for further research.
Abstract: Generalization of deep networks has been of great interest in recent years, resulting in a number of theoretically and empirically motivated complexity measures. However, most papers proposing such measures study only a small set of models, leaving open the question of whether the conclusion drawn from those experiments would remain valid in other settings. We present the first large scale study of generalization in deep networks. We investigate more than 40 complexity measures taken from both theoretical bounds and empirical studies. We train over 10,000 convolutional networks by systematically varying commonly used hyperparameters. Hoping to uncover potentially causal relationships between each measure and generalization, we analyze carefully controlled experiments and show surprising failures of some measures as well as promising measures for further research.

258 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work proposes to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework under a dual-force triplet-mining constraint, which ensures that the learned feature space is discriminating and shared by multiple source domains, and thus more generalized to new face presentation attacks.
Abstract: Face presentation attacks have become an increasingly critical issue in the face recognition community. Many face anti-spoofing methods have been proposed, but they cannot generalize well on "unseen" attacks. This work focuses on improving the generalization ability of face anti-spoofing methods from the perspective of the domain generalization. We propose to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework. In this framework, a multi-adversarial deep domain generalization is performed under a dual-force triplet-mining constraint. This ensures that the learned feature space is discriminative and shared by multiple source domains, and thus is more generalized to new face presentation attacks. An auxiliary face depth supervision is incorporated to further enhance the generalization ability. Extensive experiments on four public datasets validate the effectiveness of the proposed method.

245 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: A new approach of domain randomization and pyramid consistency to learn a model with high generalizability for semantic segmentation of real-world self-driving scenes in a domain generalization fashion is proposed.
Abstract: We propose to harness the potential of simulation for semantic segmentation of real-world self-driving scenes in a domain generalization fashion. The segmentation network is trained without any information about target domains and tested on the unseen target domains. To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. First, we propose to randomize the synthetic images with styles of real images in terms of visual appearances using auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across different "stylized" images and within an image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on generalization from GTA and SYNTHIA to Cityscapes, BDDS, and Mapillary; and our method achieves superior results over the state-of-the-art techniques. Remarkably, our generalization results are on par with or even better than those obtained by state-of-the-art simulation-to-real domain adaptation methods, which access the target domain data at training time.

Proceedings Article
29 Oct 2019
TL;DR: In this paper, the authors adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift and introduce two complementary losses which explicitly regularize the semantic structure of the feature space.
Abstract: Generalization capability to unseen domains is crucial for machine learning models when deploying to real-world conditions. We investigate the challenging problem of domain generalization, i.e., training a model on multi-domain source data such that it can directly generalize to target domains with unknown statistics. We adopt a model-agnostic learning paradigm with gradient-based meta-train and meta-test procedures to expose the optimization to domain shift. Further, we introduce two complementary losses which explicitly regularize the semantic structure of the feature space. Globally, we align a derived soft confusion matrix to preserve general knowledge of inter-class relationships. Locally, we promote domain-independent class-specific cohesion and separation of sample features with a metric-learning component. The effectiveness of our method is demonstrated with new state-of-the-art results on two common object recognition benchmarks. Our method also shows consistent improvement on a medical image segmentation task.

Journal ArticleDOI
Masanao Ozawa
TL;DR: In this paper, Ozawa proposes an improved definition for a quantum generalization of the classical root-mean-square (rms) error, which is state-dependent, operationally definable, and perfectly characterizes accurate measurements.
Abstract: Defining and measuring the error of a measurement is one of the most fundamental activities in experimental science. However, quantum theory shows a peculiar difficulty in extending the classical notion of root-mean-square (rms) error to quantum measurements. A straightforward generalization based on the noise-operator was used to reformulate Heisenberg’s uncertainty relation on the accuracy of simultaneous measurements to be universally valid and made the conventional formulation testable to observe its violation. Recently, its reliability was examined based on an anomaly that the error vanishes for some inaccurate measurements, in which the meter does not commute with the measured observable. Here, we propose an improved definition for a quantum generalization of the classical rms error, which is state-dependent, operationally definable, and perfectly characterizes accurate measurements. Moreover, it is shown that the new notion maintains the previously obtained universally valid uncertainty relations and their experimental confirmations without changing their forms and interpretations, in contrast to a prevailing view that a state-dependent formulation for measurement uncertainty relation is not tenable. An improved definition extends the notion of root-mean-square error from classical to quantum measurements. How to define and measure the error of a measurement is one of the basic characteristics of experimental science. The root-mean-square error is a frequently used metric, but extending this notion from classical to quantum measurements is not trivial. Attempts to generalize this error to quantum measurements have been made, but many approaches suffer from anomalies, which unwantedly see the error vanish for certain types of measurements. Masanao Ozawa from Nagoya University now presents an improved definition for a quantum generalization of the classical root-mean-square error, which doesn’t suffer from such limitations.

Proceedings Article
06 Sep 2019
TL;DR: This paper shows that applying two-sided uniform convergence on the set of classifiers output by gradient descent yields only a vacuous generalization guarantee larger than $1-\epsilon$.
Abstract: Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds: in practice, these bounds can increase with the training dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot "explain generalization", even if we take into account the implicit bias of GD to the fullest extent possible. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small $\epsilon$ in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than $1-\epsilon$. Through these findings, we cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.

Proceedings ArticleDOI
31 May 2019
TL;DR: This paper proposed MultiQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five reading comprehension (RC) datasets.
Abstract: A large number of reading comprehension (RC) datasets has been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MultiQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community.

Proceedings Article
01 Jan 2019
TL;DR: In this article, the authors show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a neural tangent random feature (NTRF) model induced by the network gradient at initialization.
Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
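
The phrase "random feature model induced by the network gradient at initialization" can be made concrete with a small sketch. It is a simplification, not the paper's construction: the abstract concerns deep networks, whereas this sketch assumes a two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x) and differentiates only with respect to the first-layer weights. The function names are hypothetical. The inner product of two such gradient feature vectors is the corresponding tangent kernel at initialization.

```python
import math
import random


def init_two_layer(d, width, rng):
    """Random first-layer weights and fixed +/-1 output weights."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(width)]
    return W, a


def gradient_features(x, W, a):
    """Flattened gradient of f(x) with respect to every first-layer weight."""
    m = len(W)
    feats = []
    for w_r, a_r in zip(W, a):
        pre_activation = sum(wi * xi for wi, xi in zip(w_r, x))
        active = 1.0 if pre_activation > 0 else 0.0  # ReLU derivative
        feats.extend(a_r * active * xi / math.sqrt(m) for xi in x)
    return feats


def tangent_kernel(x1, x2, W, a):
    """Inner product of the gradient features of two inputs."""
    f1 = gradient_features(x1, W, a)
    f2 = gradient_features(x2, W, a)
    return sum(u * v for u, v in zip(f1, f2))


rng = random.Random(0)
W, a = init_two_layer(d=3, width=256, rng=rng)
x1, x2 = [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]
print(tangent_kernel(x1, x2, W, a))
```

A linear model trained on `gradient_features` with the weights frozen at initialization is the kind of random feature model the abstract refers to, in this simplified two-layer setting.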

Proceedings Article
01 Jan 2019
TL;DR: In this article, memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning, which solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.
Abstract: People can learn a new concept and use it compositionally, understanding how to "blicket twice" after learning how to "blicket." In contrast, powerful sequence-to-sequence (seq2seq) neural networks fail such tests of compositionality, especially when composing new concepts together with existing concepts. In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning. In this approach, models train on a series of seq2seq problems to acquire the compositional skills needed to solve new seq2seq problems. Meta seq2seq learning solves several of the SCAN tests for compositional learning and can learn to apply implicit rules to variables.

Journal ArticleDOI
TL;DR: The nonlocal Allen--Cahn equation, a generalization of the classic Allen--Cahn equation by replacing the Laplacian with a parameterized nonlocal diffusion operator, satisfies the maximum principle.
Abstract: The nonlocal Allen--Cahn equation, a generalization of the classic Allen--Cahn equation by replacing the Laplacian with a parameterized nonlocal diffusion operator, satisfies the maximum principle ...

Posted Content
TL;DR: In this article, the authors study the inductive bias of learning in such a regime by analyzing the neural tangent kernel and the corresponding function space (RKHS), and compare to other known kernels for similar architectures.
Abstract: State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties. A recent line of work has shown that in a certain over-parameterized regime, the learning dynamics of gradient descent are governed by a certain kernel obtained at initialization, called the neural tangent kernel. We study the inductive bias of learning in such a regime by analyzing this kernel and the corresponding function space (RKHS). In particular, we study smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compare to other known kernels for similar architectures.

Posted Content
TL;DR: The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.
Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.

Journal ArticleDOI
TL;DR: In this paper, the mutual information (or free entropy) from which the Bayes-optimal estimation and generalization errors of generalized linear models (GLMs) are deduced is analyzed.
Abstract: Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper demonstrates that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples, and presents a novel framework for uncertainty based class imbalance learning that efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers.
Abstract: Learning unbiased models on imbalanced datasets is a significant challenge. Rare classes tend to get a concentrated representation in the classification space which hampers the generalization of learned boundaries to new test examples. In this paper, we demonstrate that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. Subsequently, we present a novel framework for uncertainty based class imbalance learning that follows two key insights: First, classification boundaries should be extended further away from a more uncertain (rare) class to avoid over-fitting and enhance its generalization. Second, each sample should be modeled as a multi-variate Gaussian distribution with a mean vector and a covariance matrix defined by the sample's uncertainty. The learned boundaries should respect not only the individual samples but also their distribution in the feature space. Our proposed approach efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers. We systematically study the class imbalance problem and derive a novel loss formulation for max-margin learning based on Bayesian uncertainty measure. The proposed method shows significant performance improvements on six benchmark datasets for face verification, attribute prediction, digit/object classification and skin lesion detection.

Proceedings Article
01 Jun 2019
TL;DR: This article studies the role of model usage in policy optimization both theoretically and empirically, and shows that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms.
Abstract: Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
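
The "short model-generated rollouts branched from real data" procedure can be sketched as follows. This is an illustrative toy, not the authors' algorithm: the function `branched_rollouts`, the one-dimensional dynamics, the stand-in learned model and the fixed policy are all hypothetical. Each rollout starts from a state that was actually visited in the real environment and follows the learned model for only a few steps, which limits how far model error can compound.

```python
import random


def branched_rollouts(real_states, model_step, policy, horizon, num_branches, rng):
    """Collect model-generated transitions that branch off real states."""
    model_buffer = []
    for _ in range(num_branches):
        state = rng.choice(real_states)  # branch point taken from real data
        for _ in range(horizon):         # short rollout under the learned model
            action = policy(state)
            next_state, reward = model_step(state, action)
            model_buffer.append((state, action, reward, next_state))
            state = next_state
    return model_buffer


# Hypothetical stand-ins for a learned dynamics model and a policy.
def toy_model_step(state, action):
    next_state = state + action
    return next_state, -abs(next_state)  # reward favors staying near zero


def toy_policy(state):
    return -0.5 * state


rng = random.Random(0)
real_states = [2.0, -1.0, 0.5, 3.0]  # states observed in the real environment
buffer = branched_rollouts(real_states, toy_model_step, toy_policy,
                           horizon=3, num_branches=5, rng=rng)
print(len(buffer), "model-generated transitions")
```

In a full training loop, the transitions in `buffer` would be fed to a policy optimizer alongside (or instead of) real transitions; that part is omitted here.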

Journal ArticleDOI
TL;DR: In this article, a survey of the existing literature on BF optimization, covering more than 60 variants, is presented, and a comprehensive analysis and qualitative comparison are conducted from the perspectives of BF components.
Abstract: The Bloom filter (BF) has been widely used to support membership queries, i.e., to judge whether a given element ${x}$ is a member of a given set ${S}$ or not. Recent years have seen an explosion of BF designs due to its space efficiency and its constant-time membership queries. The existing reviews or surveys mainly focus on the applications of BF, but fall short in covering the current trends, thereby lacking intrinsic understanding of their design philosophy. To this end, this survey provides an overview of BF and its variants, with an emphasis on the optimization techniques. Basically, we survey the existing variants from two dimensions, i.e., performance and generalization. To improve the performance, dozens of variants devote themselves to reducing the false positives and implementation costs. Besides, tens of variants generalize the BF framework to more scenarios by diversifying the input sets and enriching the output functionalities. To summarize the existing efforts, we conduct an in-depth study of the existing literature on BF optimization, covering more than 60 variants. We unearth the design philosophy of these variants and elaborate how the employed optimization techniques improve BF. Furthermore, comprehensive analysis and qualitative comparison are conducted from the perspectives of BF components. Lastly, we highlight the future trends of designing BFs. This is, to the best of our knowledge, the first survey that accomplishes such goals.
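
For readers unfamiliar with the basic structure that the surveyed variants build on, here is a minimal sketch of a standard Bloom filter. It is not taken from the survey: the class name, the default sizes, and the choice of salted SHA-256 as the hash family are illustrative assumptions. A membership query can return a false positive but never a false negative.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array, all zeros

    def _positions(self, item):
        """Yield k bit positions for an item, one per salted hash."""
        data = str(item).encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        """Set the k bits associated with the item."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        """True if all k bits are set (possibly a false positive)."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter()
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
print("alpha" in bf)  # inserted items are always reported as present
```

Each query touches exactly `num_hashes` bits regardless of how many items were inserted, which is the constant-time property the abstract mentions.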

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper proposes the Anchor-to-Joint Regression Network (A2J) for 3D hand and body pose estimation in depth images.
Abstract: For 3D hand and body pose estimation in depth images, a novel anchor-based approach termed the Anchor-to-Joint regression network (A2J), with end-to-end learning ability, is proposed. Within A2J, anchor points able to capture global-local spatial context information are densely placed on the depth image as local regressors for the joints. They jointly predict the positions of the joints in an ensemble manner to enhance generalization ability. The proposed 3D articulated pose estimation paradigm differs from the state-of-the-art encoder-decoder based FCN, 3D CNN, and point-set based approaches. To discover informative anchor points for a given joint, an anchor proposal procedure is also proposed for A2J. Meanwhile, a 2D CNN (i.e., ResNet-50) is used as the backbone network to drive A2J, without time-consuming 3D convolutional or deconvolutional layers. Experiments on 3 hand datasets and 2 body datasets verify A2J's superiority. A2J also runs at around 100 FPS on a single NVIDIA 1080Ti GPU.
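The ensemble prediction described above can be sketched as follows. This is an illustration under assumptions: the abstract does not give the aggregation rule, so the softmax-weighted average of per-anchor estimates, the function name `aggregate_joint`, and the 2-D toy coordinates are all hypothetical.

```python
import math


def aggregate_joint(anchors, offsets, responses):
    """Combine per-anchor joint estimates using softmax-normalized weights.

    Each anchor votes for the joint at (anchor + offset); anchors with a
    higher response score contribute more to the final estimate.
    """
    peak = max(responses)  # subtract the max for numerical stability
    exps = [math.exp(r - peak) for r in responses]
    total = sum(exps)
    weights = [e / total for e in exps]
    x = sum(w * (a[0] + d[0]) for w, a, d in zip(weights, anchors, offsets))
    y = sum(w * (a[1] + d[1]) for w, a, d in zip(weights, anchors, offsets))
    return x, y


# Toy values: three anchors whose individual estimates all point at (2, 2).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
offsets = [(2.0, 2.0), (-2.0, 2.0), (2.0, -2.0)]
responses = [0.1, 1.0, -0.3]
joint_xy = aggregate_joint(anchors, offsets, responses)
```

Because the weights sum to one, anchors that agree yield their common estimate regardless of the response scores; when they disagree, the scores decide which anchors dominate.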

Journal ArticleDOI
TL;DR: A new framework to facilitate robot skill generalization is proposed, in that the learned skills are first segmented into a sequence of subskills automatically, then each individual subskill is encoded and regulated accordingly.
Abstract: Robots are often required to generalize the skills learned from human demonstrations to fulfil new task requirements. However, skill generalization is difficult to realize in the following situations: the skill for a complex multistep task includes a number of features; some special constraints are imposed on the robot during task reproduction; or the robot faces a completely new situation quite different from the one in which the demonstrations were given. This work proposes a new framework to facilitate robot skill generalization. The basic idea is that the learned skills are first segmented into a sequence of subskills automatically, then each individual subskill is encoded and regulated accordingly. Specifically, we adapt each set of the segmented movement trajectories individually instead of the whole movement profiles, thus making skill generalization more convenient to realize. In addition, human limb stiffness estimated from surface electromyographic signals is considered in the framework for the realization of human-to-robot variable impedance control skill transfer, as well as the generalization of both movement trajectories and stiffness profiles. An experimental study has been performed to verify the effectiveness of the proposed framework.

Proceedings ArticleDOI
08 Apr 2019
TL;DR: This paper proposes a unified learning framework that leverages and aggregates cross-modality contextual information, including visual context from high-level image representation and geometric context from 2D keypoint distribution.
Abstract: Most existing studies on learning local features focus on patch-based descriptions of individual keypoints, while neglecting the spatial relations established by their keypoint locations. In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates the cross-modality contextual information, including (i) visual context from high-level image representation, and (ii) geometric context from 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews the empirical hyper-parameter search and improves the convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, yet improves remarkably on several large-scale benchmarks with diversified scenes, which demonstrates both strong practicality and generalization ability in geometric matching applications.

Proceedings Article
01 Feb 2019
TL;DR: In this article, a policy is decomposed into a component that acts in a low-dimensional space of action representations and another component that transforms these representations into actual actions. The representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken.
Abstract: Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
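The two-component decomposition can be sketched in a few lines. This is a hedged illustration only: the abstract does not say how representations are turned into actions, so the nearest-embedding lookup, the function names, and the hand-written 2-D embeddings below are all assumptions.

```python
def internal_policy(state):
    """Hypothetical internal policy: maps a state to a point in a 2-D
    action-representation space (an identity map, for illustration)."""
    return (state[0], state[1])


def to_action(representation, action_embeddings):
    """Map a representation to the discrete action whose embedding is closest."""
    def sq_dist(embedding):
        return sum((a - b) ** 2 for a, b in zip(representation, embedding))
    return min(action_embeddings, key=lambda name: sq_dist(action_embeddings[name]))


# Hand-written embeddings standing in for learned action representations.
action_embeddings = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
}


def policy(state):
    """Overall policy: act in representation space, then map to a real action."""
    return to_action(internal_policy(state), action_embeddings)
```

Because nearby representations map to the same or neighbouring actions, what the agent learns about one action carries over to actions embedded close to it.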

Proceedings Article
01 Jan 2019
TL;DR: A PAC-Bayes generalization bound for neural networks trained by SGD is proved; the bound is positively correlated with the ratio of batch size to learning rate, which provides the theoretical foundation of the training strategy.
Abstract: Deep neural networks have achieved dramatic success based on the optimization method of stochastic gradient descent (SGD). However, it is still not clear how to tune hyper-parameters, especially batch size and learning rate, to ensure good generalization. This paper reports both theoretical and empirical evidence for a training strategy: the ratio of batch size to learning rate should be kept from growing too large in order to achieve good generalization ability. Specifically, we prove a PAC-Bayes generalization bound for neural networks trained by SGD, which has a positive correlation with the ratio of batch size to learning rate. This correlation builds the theoretical foundation of the training strategy. Furthermore, we conduct a large-scale experiment to verify the correlation and training strategy. We trained 1,600 models based on the architectures ResNet-110 and VGG-19 with the datasets CIFAR-10 and CIFAR-100, while strictly controlling unrelated variables. Accuracies on the test sets are collected for the evaluation. Spearman's rank-order correlation coefficients and the corresponding $p$ values on 164 groups of the collected data demonstrate that the correlation is statistically significant, which fully supports the training strategy.
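Spearman's rank-order correlation, the statistic used in the evaluation above, can be computed as the Pearson correlation of the ranks. The sketch below is a generic pure-Python version with average ranks for ties; the function names and the toy inputs are illustrative and are not data from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy inputs only: a strictly decreasing relationship gives rho = -1.
rho = spearman([1, 2, 3, 4], [10, 9, 7, 2])
```

In a setting like the one described, `xs` would hold the batch-size-to-learning-rate ratios and `ys` the corresponding test accuracies within one group; the significance test that yields the $p$ value is not shown.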

Journal ArticleDOI
TL;DR: It is argued that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved, and it is shown that this transition is sharp for the hinge loss.