
Showing papers on "Generalization" published in 2020


Posted Content
TL;DR: A new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) is proposed that improves the BERT and RoBERTa models using two novel techniques that significantly improve the efficiency of model pre-training and performance of downstream tasks.
Abstract: Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions, respectively. Second, an enhanced mask decoder is used to incorporate absolute positions in the decoding layer to predict the masked tokens in model pre-training. In addition, a new virtual adversarial training method is used for fine-tuning to improve models' generalization. We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). Notably, we scale up DeBERTa by training a larger version that consists of 48 Transformer layers with 1.5 billion parameters. The significant performance boost makes the single DeBERTa model surpass the human performance on the SuperGLUE benchmark (Wang et al., 2019a) for the first time in terms of macro-average score (89.9 versus 89.8), and the ensemble DeBERTa model sits atop the SuperGLUE leaderboard as of January 6, 2021, outperforming the human baseline by a decent margin (90.3 versus 89.8).
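
To make the disentangled attention mechanism concrete, here is a minimal single-head sketch in numpy. It combines the content-to-content, content-to-position, and position-to-content terms the abstract describes, using a clipped relative-distance embedding table; all names, shapes, and the single-head simplification are illustrative assumptions, not DeBERTa's actual implementation.

```python
import numpy as np

def disentangled_attention_scores(H, rel_emb, Wq, Wk, Wqr, Wkr, k):
    """Single-head sketch of DeBERTa-style disentangled attention.

    H       : (n, d) content vectors, one per token
    rel_emb : (2k, d) relative-position embeddings for clipped distances [-k, k)
    W*      : (d, d) projection matrices (illustrative shapes)
    Returns an (n, n) matrix of unnormalized attention scores.
    """
    n, d = H.shape
    Qc, Kc = H @ Wq, H @ Wk                  # content projections
    Qr, Kr = rel_emb @ Wqr, rel_emb @ Wkr    # relative-position projections

    # delta(i, j): relative distance, clipped and shifted to index rel_emb
    idx = np.clip(np.arange(n)[:, None] - np.arange(n)[None, :], -k, k - 1) + k

    c2c = Qc @ Kc.T                                      # content-to-content
    c2p = np.take_along_axis(Qc @ Kr.T, idx, axis=1)     # content-to-position
    p2c = np.take_along_axis(Kc @ Qr.T, idx, axis=1).T   # position-to-content
    return (c2c + c2p + p2c) / np.sqrt(3 * d)            # scaled sum of terms
```

A softmax over each row would then give the attention weights; the real model adds multiple heads and the enhanced mask decoder for absolute positions.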

921 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed model demonstrates better generalization ability than the existing image fusion models for fusing various types of images, such as multi-focus, infrared-visual, multi-modal medical and multi-exposure images.

524 citations


Posted Content
TL;DR: This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
Abstract: The goal of domain generalization algorithms is to predict well on distributions different from those seen during training. While a myriad of domain generalization algorithms exist, inconsistencies in experimental conditions -- datasets, architectures, and model selection criteria -- render fair and realistic comparisons difficult. In this paper, we are interested in understanding how useful domain generalization algorithms are in realistic settings. As a first step, we realize that model selection is non-trivial for domain generalization tasks. Contrary to prior work, we argue that domain generalization algorithms without a model selection strategy should be regarded as incomplete. Next, we implement DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria. We conduct extensive experiments using DomainBed and find that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets. Looking forward, we hope that the release of DomainBed, along with contributions from fellow researchers, will streamline reproducible and rigorous research in domain generalization.
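
The headline finding, that plain empirical risk minimization is a top performer, is simple to state in code. The sketch below pools all source domains and minimizes the average loss, with a logistic-regression stand-in for the network; the `domains` structure, model, and hyperparameters are illustrative assumptions, not DomainBed's API.

```python
import numpy as np

def erm_pooled(domains, epochs=100, lr=0.1):
    """ERM baseline: pool every source domain and minimize average loss.

    domains: list of (X, y) arrays, one pair per source domain, y in {0, 1}.
    A linear logistic model stands in for the real network.
    """
    X = np.vstack([Xd for Xd, _ in domains])
    y = np.concatenate([yd for _, yd in domains])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the logistic loss
    return w
```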

492 citations


Posted Content
TL;DR: A comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological difference, followed by a systematic comparison of six properties used to evaluate their superiority.
Abstract: Deep learning has achieved remarkable success in numerous domains with help from large amounts of big data. However, the quality of data labels is a concern because of the lack of high-quality labels in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 46 state-of-the-art robust training methods, all of which are categorized into seven groups according to their methodological difference, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we summarize the typically used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies.

474 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Local Deep Implicit Functions (LDIF), a 3D shape representation that decomposes space into a structured set of learned implicit functions that provides higher surface reconstruction accuracy than the state-of-the-art (OccNet), while requiring fewer than 1% of the network parameters.
Abstract: The goal of this project is to learn a 3D shape representation that enables accurate surface reconstruction, compact storage, efficient computation, consistency for similar shapes, generalization across diverse shape categories, and inference from depth camera observations. Towards this end, we introduce Local Deep Implicit Functions (LDIF), a 3D shape representation that decomposes space into a structured set of learned implicit functions. We provide networks that infer the space decomposition and local deep implicit functions from a 3D mesh or posed depth image. During experiments, we find that it provides 10.3 points higher surface reconstruction accuracy (F-Score) than the state-of-the-art (OccNet), while requiring fewer than 1% of the network parameters. Experiments on posed depth image completion and generalization to unseen classes show 15.8 and 17.8 point improvements over the state-of-the-art, while producing a structured 3D representation for each input with consistency across diverse shape collections.
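
The core decomposition is easy to sketch: the global implicit function is a weighted sum of local implicit functions, each living in a Gaussian region of support. Everything below (the element tuple, `local_fn`, isotropic Gaussians) is a simplified assumption; the paper uses scaled, oriented elements and learned deep decoders.

```python
import numpy as np

def ldif_value(x, elements):
    """Evaluate a toy LDIF at a 3D point x (shape (3,)).

    elements: list of (center, scale, local_fn), where local_fn maps a
    local coordinate to a scalar implicit value. The shape surface is a
    level set of the summed, Gaussian-weighted local functions.
    """
    total = 0.0
    for center, scale, local_fn in elements:
        w = np.exp(-np.sum((x - center) ** 2) / (2.0 * scale ** 2))
        total += w * local_fn(x - center)
    return total
```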

378 citations


Posted Content
TL;DR: In this article, the authors show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
Abstract: The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.
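The paper's central claim can be read as a statement about the Bayesian model average p(y|x, D) = ∫ p(y|x, w) p(w|D) dw: a deep ensemble is a crude but effective Monte Carlo approximation of that integral. A minimal sketch, with `member_prob_fns` as a hypothetical list of trained networks:

```python
import numpy as np

def bayesian_model_average(member_prob_fns, x):
    """Approximate p(y | x, D) by averaging the predictive distributions of
    ensemble members, each treated as one sample from p(w | D)."""
    probs = np.stack([f(x) for f in member_prob_fns])  # (M, num_classes)
    return probs.mean(axis=0)
```

The approach proposed in the paper additionally marginalizes within each basin of attraction, rather than using a single weight setting per member.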

328 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work first notices CNNs' ability to capture the high-frequency components of images, which are almost imperceptible to a human; this observation leads to multiple hypotheses about the generalization behavior of CNNs, including a potential explanation for adversarial examples.
Abstract: We investigate the relationship between the frequency spectrum of image data and the generalization behavior of convolutional neural networks (CNN). We first notice CNN's ability in capturing the high-frequency components of images. These high-frequency components are almost imperceptible to a human. Thus the observation leads to multiple hypotheses that are related to the generalization behaviors of CNN, including a potential explanation for adversarial examples, a discussion of CNN's trade-off between robustness and accuracy, and some evidence in understanding training heuristics.
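The experiments hinge on separating an image into low- and high-frequency components. A minimal sketch using a hard radial mask in Fourier space (the mask shape and radius are arbitrary assumptions, not the paper's exact protocol):

```python
import numpy as np

def split_frequencies(img, radius):
    """Split a grayscale image into low- and high-frequency components
    with a centered FFT and a circular low-pass mask."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = img - low   # residual holds the barely perceptible high frequencies
    return low, high
```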

321 citations


Journal ArticleDOI
Rene Ranftl1, Katrin Lasinger2, David Hafner1, Konrad Schindler2, Vladlen Koltun1 
TL;DR: The authors propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks.
Abstract: The success of monocular depth estimation relies on large and diverse training sets. Due to the challenges associated with acquiring dense ground-truth depth across different environments at scale, a number of datasets with distinct characteristics and biases have emerged. We develop tools that enable mixing multiple datasets during training, even if their annotations are incompatible. In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks. Armed with these tools, we experiment with six diverse training datasets, including a new, massive data source: 3D films. To demonstrate the generalization power of our approach we use zero-shot cross-dataset transfer, i.e. we evaluate on datasets that were not seen during training. The experiments confirm that mixing data from complementary sources greatly improves monocular depth estimation. Our approach clearly outperforms competing methods across diverse datasets, setting a new state of the art for monocular depth estimation.
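The key to mixing incompatible datasets is an objective invariant to each source's depth range and scale. A sketch of the idea: align the prediction to the ground truth with a closed-form least-squares scale and shift before measuring error. The actual paper works in disparity space and uses robust, trimmed variants, so treat this as an illustration of the invariance only.

```python
import numpy as np

def scale_shift_invariant_error(pred, target):
    """Solve min_{s,t} ||s * pred + t - target||^2 in closed form,
    then report the residual mean squared error."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
    return np.mean((s * pred + t - target) ** 2)
```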

300 citations


Posted Content
TL;DR: In this article, representation self-challenging (RSC) is proposed to improve cross-domain generalization of CNNs by iteratively disabling the dominant features on the training data and forcing the network to activate remaining features that correlate with labels.
Abstract: Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlate with labels. When the training and testing data are under similar distributions, their dominant features are similar, which usually facilitates decent performance on the testing data. Performance nonetheless degrades when the model is tested on samples from different distributions, leading to the challenges in cross-domain image classification. We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNN to out-of-domain data. RSC iteratively challenges (discards) the dominant features activated on the training data and forces the network to activate the remaining features that correlate with labels. This process appears to activate feature representations applicable to out-of-domain data without prior knowledge of the new domain and without learning extra network parameters. We present theoretical properties and conditions of RSC for improving cross-domain generalization. The experiments endorse the simple, effective, and architecture-agnostic nature of our RSC method.
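
A minimal sketch of the self-challenging step, assuming we already have the feature activations and the gradient of the true-class score with respect to them; the percentile and element-wise muting are simplifications of the paper's spatial and channel-wise variants.

```python
import numpy as np

def self_challenge(features, grads, drop_pct=33.0):
    """Mute the most dominant features: those whose gradient (sensitivity of
    the true-class score) is largest. The network must then re-route its
    prediction through the remaining, less dominant features."""
    thresh = np.percentile(grads, 100.0 - drop_pct)
    mask = (grads < thresh).astype(features.dtype)
    return features * mask
```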

272 citations


Journal ArticleDOI
03 Apr 2020
TL;DR: This paper proposes a novel DG approach, Deep Domain-Adversarial Image Generation (DDAIG), which augments the source training data with generated unseen-domain data to make the label classifier more robust to unknown domain changes.
Abstract: Machine learning models typically suffer from the domain shift problem when trained on a source dataset and evaluated on a target dataset of different distribution. To overcome this problem, domain generalisation (DG) methods aim to leverage data from multiple source domains so that a trained model can generalise to unseen domains. In this paper, we propose a novel DG approach based on Deep Domain-Adversarial Image Generation (DDAIG). Specifically, DDAIG consists of three components, namely a label classifier, a domain classifier and a domain transformation network (DoTNet). The goal for DoTNet is to map the source training data to unseen domains. This is achieved by having a learning objective formulated to ensure that the generated data can be correctly classified by the label classifier while fooling the domain classifier. By augmenting the source training data with the generated unseen domain data, we can make the label classifier more robust to unknown domain changes. Extensive experiments on four DG datasets demonstrate the effectiveness of our approach.
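The learning objective for DoTNet described above fits in one line: keep the label classifier right while making the domain classifier wrong. The callables and trade-off weight below are hypothetical interfaces, not the paper's code.

```python
def dotnet_objective(x_generated, y_label, y_domain, label_loss, domain_loss, lam=0.5):
    """DoTNet is trained to minimize this: low label-classification loss on the
    generated image, high domain-classification loss (hence the minus sign).
    label_loss / domain_loss are assumed to return per-sample cross-entropy."""
    return label_loss(x_generated, y_label) - lam * domain_loss(x_generated, y_domain)
```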

224 citations


Journal ArticleDOI
TL;DR: It has been confirmed by experimental results that DEL produces dynamic NN ensembles of appropriate architecture and diversity that demonstrate good generalization ability.
Abstract: This paper presents a novel dynamic ensemble learning (DEL) algorithm for designing ensemble of neural networks (NNs). DEL algorithm determines the size of ensemble, the number of individual NNs employing a constructive strategy, the number of hidden nodes of individual NNs employing a constructive–pruning strategy, and different training samples for individual NN’s learning. For diversity, negative correlation learning has been introduced and also variation of training samples has been made for individual NNs that provide better learning from the whole training samples. The major benefits of the proposed DEL compared to existing ensemble algorithms are (1) automatic design of ensemble; (2) maintaining accuracy and diversity of NNs at the same time; and (3) minimum number of parameters to be defined by user. DEL algorithm is applied to a set of real-world classification problems such as the cancer, diabetes, heart disease, thyroid, credit card, glass, gene, horse, letter recognition, mushroom, and soybean datasets. It has been confirmed by experimental results that DEL produces dynamic NN ensembles of appropriate architecture and diversity that demonstrate good generalization ability.
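The diversity mechanism named in the abstract, negative correlation learning, adds a penalty that discourages each member from agreeing with the ensemble mean. A sketch of the per-member loss; the λ weight is a conventional choice, not necessarily DEL's exact setting.

```python
import numpy as np

def ncl_member_losses(preds, y, lam=0.5):
    """Negative correlation learning: each member pays its own squared error
    plus a penalty that is lower the further it sits from the ensemble mean.

    preds: (M, N) predictions of M members on N samples; y: (N,) targets.
    """
    fbar = preds.mean(axis=0)                        # ensemble mean prediction
    mse = ((preds - y) ** 2).mean(axis=1)            # per-member accuracy term
    diversity = -((preds - fbar) ** 2).mean(axis=1)  # NCL penalty (negative)
    return mse + lam * diversity
```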

Journal ArticleDOI
03 Apr 2020
TL;DR: In this article, the authors propose a method that iteratively divides samples into latent domains via clustering, and then trains the domain-invariant feature extractor shared among the divided domains via adversarial learning.
Abstract: When domains, which represent underlying data distributions, vary during training and testing processes, deep neural networks suffer a drop in their performance. Domain generalization allows improvements in the generalization performance for unseen target domains by using multiple source domains. Conventional methods assume that the domain to which each sample belongs is known in training. However, many datasets, such as those collected via web crawling, contain a mixture of multiple latent domains, in which the domain of each sample is unknown. This paper introduces domain generalization using a mixture of multiple latent domains as a novel and more realistic scenario, where we try to train a domain-generalized model without using domain labels. To address this scenario, we propose a method that iteratively divides samples into latent domains via clustering, and which trains the domain-invariant feature extractor shared among the divided latent domains via adversarial learning. We assume that the latent domain of images is reflected in their style, and thus, utilize style features for clustering. By using these features, our proposed method successfully discovers latent domains and achieves domain generalization even if the domain labels are not given. Experiments show that our proposed method can train a domain-generalized model without using domain labels. Moreover, it outperforms conventional domain generalization methods, including those that utilize domain labels.
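The two-step recipe, cluster style features and then train adversarially against the pseudo-domain labels, can be sketched as follows for the clustering half. Style here is summarized by per-channel mean and standard deviation of convolutional feature maps, matching the abstract's assumption that latent domain is reflected in style; the feature choice and cluster count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_domain_labels(conv_feats, n_domains=3):
    """Assign each sample a latent-domain label by clustering style statistics.

    conv_feats: (N, C, H, W) feature maps from an early convolutional layer.
    """
    mu = conv_feats.mean(axis=(2, 3))            # (N, C) channel means
    sd = conv_feats.std(axis=(2, 3))             # (N, C) channel stds
    style = np.concatenate([mu, sd], axis=1)     # style descriptor per sample
    return KMeans(n_clusters=n_domains, n_init=10).fit_predict(style)
```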

Proceedings Article
30 Apr 2020
TL;DR: This work presents a unifying view and proposes an open-set method to relax current generalization assumptions, and extends the applicability of transformation-based methods to non-image data using random affine transformations.
Abstract: Anomaly detection, finding patterns that substantially deviate from those seen previously, is one of the fundamental problems of artificial intelligence. Recently, classification-based methods were shown to achieve superior results on this task. In this work, we present a unifying view and propose an open-set method to relax current generalization assumptions. Furthermore, we extend the applicability of transformation-based methods to non-image data using random affine transformations. Our method is shown to obtain state-of-the-art accuracy and is applicable to broad data types. The strong performance of our method is extensively validated on multiple datasets from different domains.

Proceedings ArticleDOI
23 Jun 2020
TL;DR: Empirical evidence shows that the proposed causal speech enhancement model, based on an encoder-decoder architecture with skip-connections, is capable of removing various kinds of background noise including stationary and non-stationary noises, as well as room reverb.
Abstract: We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is optimized on both time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly on the raw waveform which further improve model performance and its generalization abilities. We perform evaluations on several standard benchmarks, both using objective metrics and human judgements. The proposed model matches state-of-the-art performance of both causal and non causal methods while working directly on the raw waveform.
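A toy version of the encoder-decoder-with-skip-connections family the abstract describes, operating directly on raw waveforms; the layer sizes and depth are arbitrary, and the real model is substantially deeper and optimized with the time- and frequency-domain losses mentioned above.

```python
import torch
import torch.nn as nn

class TinyEnhancer(nn.Module):
    """Minimal waveform encoder-decoder with a skip connection."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv1d(1, ch, 8, stride=4, padding=2), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(ch, ch * 2, 8, stride=4, padding=2), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose1d(ch * 2, ch, 8, stride=4, padding=2), nn.ReLU())
        self.dec1 = nn.ConvTranspose1d(ch, 1, 8, stride=4, padding=2)

    def forward(self, x):        # x: (batch, 1, time), time divisible by 16
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2) + e1  # skip connection from encoder to decoder
        return self.dec1(d2)     # estimated clean waveform
```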

Posted Content
TL;DR: This paper proposes GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method.
Abstract: Graph structured data has wide applicability in various domains such as physics, chemistry, biology, computer vision, and social networks, to name a few. Recently, graph neural networks (GNN) were shown to be successful in effectively representing graph structured data because of their good performance and generalization ability. GNN is a deep learning based method that learns a node representation by combining specific nodes and the structural/topological information of a graph. However, like other deep models, explaining the effectiveness of GNN models is a challenging task because of the complex nonlinear transformations made over the iterations. In this paper, we propose GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method. GraphLIME is a generic GNN-model explanation framework that learns a nonlinear interpretable model locally in the subgraph of the node being explained. More specifically, to explain a node, we generate a nonlinear interpretable model from its N-hop neighborhood and then compute the K most representative features as the explanations of its prediction using HSIC Lasso. Through experiments on two real-world datasets, the explanations of GraphLIME are found to be substantially more descriptive than those of existing explanation methods.
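
The pipeline is: take the node's N-hop neighborhood, fit a sparse interpretable model of the GNN's outputs there, and report the top-K features. The sketch below substitutes plain Lasso for HSIC Lasso (the paper's nonlinear selector), so it is an illustrative simplification only.

```python
import numpy as np
from sklearn.linear_model import Lasso

def explain_node(neigh_feats, neigh_preds, K=5, alpha=0.01):
    """Return the K most influential feature indices for one node.

    neigh_feats: (n_neighbors, n_features) features of the N-hop neighborhood.
    neigh_preds: (n_neighbors,) the GNN's predicted probability for one class.
    """
    model = Lasso(alpha=alpha).fit(neigh_feats, neigh_preds)
    return np.argsort(-np.abs(model.coef_))[:K]
```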

Journal ArticleDOI
13 May 2020
TL;DR: In this paper, a sharp double inequality involving the ratio of generalized complete elliptic integrals of the first kind was established, which is the improvement and generalization of some previously known results.
Abstract: In the article, we establish a sharp double inequality involving the ratio of generalized complete elliptic integrals of the first kind, which is the improvement and generalization of some previously known results.

Proceedings Article
30 Apr 2020
TL;DR: A novel method to systematically construct compositional generalization benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets is introduced, and it is demonstrated how this method can be used to create new compositionality benchmarks on top of the existing SCAN dataset.
Abstract: State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and we quantitatively compare this method to other approaches for creating compositional generalization benchmarks. We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures. We find that they fail to generalize compositionally and that there is a surprisingly strong negative correlation between compound divergence and accuracy. We also demonstrate how our method can be used to create new compositionality benchmarks on top of the existing SCAN dataset, which confirms these findings.
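Both atom and compound divergence in this method are instances of one formula: 1 minus the Chernoff coefficient between the normalized frequency distributions in the train and test sets. To the best of my reading, the paper uses α = 0.5 for atoms and α = 0.1 for compounds, so treat those constants as assumptions.

```python
import numpy as np

def chernoff_divergence(p, q, alpha):
    """1 - Chernoff coefficient between two (unnormalized) frequency vectors.

    p, q: atom or compound frequencies in the train and test sets.
    alpha: assumed 0.5 for atom divergence, 0.1 for compound divergence.
    """
    p = np.asarray(p, float); q = np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    return 1.0 - np.sum(p ** alpha * q ** (1 - alpha))
```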

Posted Content
TL;DR: This work theoretically shows that only knowing how many factors have changed, but not which ones, is sufficient to learn disentangled representations, and provides practical algorithms that learn disentangled representations from pairs of images without requiring annotation of groups, individual factors, or the number of factors that have changed.
Abstract: Intelligent agents should be able to learn useful representations by observing changes in their environment. We model such observations as pairs of non-i.i.d. images sharing at least one of the underlying factors of variation. First, we theoretically show that only knowing how many factors have changed, but not which ones, is sufficient to learn disentangled representations. Second, we provide practical algorithms that learn disentangled representations from pairs of images without requiring annotation of groups, individual factors, or the number of factors that have changed. Third, we perform a large-scale empirical study and show that such pairs of observations are sufficient to reliably learn disentangled representations on several benchmark data sets. Finally, we evaluate our learned representations and find that they are simultaneously useful on a diverse suite of tasks, including generalization under covariate shifts, fairness, and abstract reasoning. Overall, our results demonstrate that weak supervision enables learning of useful disentangled representations in realistic scenarios.

Proceedings Article
Vitaly Feldman1, Chiyuan Zhang1
09 Aug 2020
TL;DR: The experiments demonstrate the significant benefits of memorization for generalization on several standard benchmarks and provide quantitative and visually compelling evidence for the theory put forth in Feldman (2019), which proposes a theoretical explanation for this phenomenon.
Abstract: Deep learning algorithms are well-known to have a propensity for fitting the training data very well and often fit even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical explanation for this phenomenon based on a combination of two insights. First, natural image and data distributions are (informally) known to be long-tailed, that is have a significant fraction of rare and atypical examples. Second, in a simple theoretical model such memorization is necessary for achieving close-to-optimal generalization error when the data distribution is long-tailed. However, no direct empirical evidence for this explanation or even an approach for obtaining such evidence were given. In this work we design experiments to test the key ideas in this theory. The experiments require estimation of the influence of each training example on the accuracy at each test example as well as memorization values of training examples. Estimating these quantities directly is computationally prohibitive but we show that closely-related subsampled influence and memorization values can be estimated much more efficiently. Our experiments demonstrate the significant benefits of memorization for generalization on several standard benchmarks. They also provide quantitative and visually compelling evidence for the theory put forth in (Feldman, 2019).
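The efficient estimator mentioned in the abstract can be sketched as follows: train many models on random subsets, and score each training example by the gap between its accuracy when included versus held out. `train_fn` is a hypothetical interface returning a prediction function, and the subset fraction and model count are illustrative.

```python
import numpy as np

def memorization_estimates(train_fn, X, y, n_models=40, frac=0.7, seed=0):
    """Subsampled memorization scores: accuracy(in training set) minus
    accuracy(held out), averaged over models trained on random subsets."""
    rng = np.random.default_rng(seed)
    n = len(X)
    acc_in = np.zeros(n); cnt_in = np.zeros(n)
    acc_out = np.zeros(n); cnt_out = np.zeros(n)
    for _ in range(n_models):
        idx = rng.random(n) < frac              # random training subset
        predict = train_fn(X[idx], y[idx])
        correct = (predict(X) == y).astype(float)
        acc_in += np.where(idx, correct, 0.0);  cnt_in += idx
        acc_out += np.where(~idx, correct, 0.0); cnt_out += ~idx
    return acc_in / np.maximum(cnt_in, 1) - acc_out / np.maximum(cnt_out, 1)
```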

Posted Content
TL;DR: This work considers a larger list of inductive biases that humans and animals exploit, focusing on those which concern mostly higher-level and sequential conscious processing, and suggests they could potentially help build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization.
Abstract: A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.

Journal ArticleDOI
TL;DR: A general panorama of the state of the art of Choquet integral generalizations is offered, showing the relations and intersections among the five classes of generalizations considered.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel system that explicitly disentangles scale from the network estimation, which achieves state-of-the-art results among self-supervised learning-based methods on KITTI Odometry and NYUv2 dataset and presents some interesting findings on the limitation of PoseNet-based relative pose estimation methods in terms of generalization ability.
Abstract: In this work, we tackle the essential problem of scale inconsistency for self supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry applications. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on PoseNet architecture, our method recovers relative pose by directly solving fundamental matrix from dense optical flow correspondence and makes use of a two-view triangulation module to recover an up-to-scale 3D structure. Then, we align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and dense reprojection check. Our whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on the KITTI Odometry and NYUv2 datasets. Furthermore, we present some interesting findings on the limitation of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.
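
The scale-alignment step, matching the predicted depth to the triangulated structure before computing losses, is simple to sketch. The median ratio below is one robust choice and may differ from the paper's exact alignment.

```python
import numpy as np

def align_depth_scale(pred_depth, tri_depth, valid):
    """Rescale the network's up-to-scale depth to match the (also up-to-scale)
    triangulated depth, so depth error and reprojection checks share a scale.

    pred_depth, tri_depth: (H, W) arrays; valid: (H, W) boolean mask of
    pixels with reliable triangulated depth.
    """
    s = np.median(tri_depth[valid] / pred_depth[valid])
    return s * pred_depth
```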

Proceedings Article
Yiding Jiang1, Behnam Neyshabur1, Dilip Krishnan1, Hossein Mobahi1, Samy Bengio1 
30 Apr 2020
TL;DR: In this article, a large scale study of generalization bounds and measures in deep networks is presented, where the authors train over two thousand CIFAR-10 networks with systematic changes in important hyper-parameters.
Abstract: Generalization of deep networks has been intensely researched in recent years, resulting in a number of theoretical bounds and empirically motivated measures. However, most papers proposing such measures only study a small set of models, leaving open the question of whether these measures are truly useful in practice. We present the first large scale study of generalization bounds and measures in deep networks. We train over two thousand CIFAR-10 networks with systematic changes in important hyper-parameters. We attempt to uncover potential causal relationships between each measure and generalization, by using rank correlation coefficient and its modified forms. We analyze the results and show that some of the studied measures are very promising for further research.
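The study's core statistic is a rank correlation between each candidate complexity measure and the observed generalization across the trained models; a minimal version using Kendall's tau (the paper also uses modified forms of it):

```python
from scipy.stats import kendalltau

def measure_vs_generalization(measure_vals, gen_gaps):
    """Kendall rank correlation between a complexity measure and the
    generalization gap, computed across a population of trained models."""
    tau, _ = kendalltau(measure_vals, gen_gaps)
    return tau
```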

Journal ArticleDOI
01 Feb 2020
TL;DR: The objective of this paper is to develop some correlation coefficients for T-spherical fuzzy sets due to the non-applicability of correlations of intuitionistic fuzzy sets and picture fuzzy sets in some certain circumstances.
Abstract: The framework of T-spherical fuzzy set is a generalization of fuzzy set, intuitionistic fuzzy set and picture fuzzy set having a great potential of dealing with uncertain events with no limitation. A T-spherical fuzzy framework can deal with phenomena of more than yes or no type; for example, consider the scenario of voting where one's voting interest is not limited to "in favor" or "against"; rather, there could be some sort of abstinence or refusal degree as well. The objective of this paper is to develop some correlation coefficients for T-spherical fuzzy sets due to the non-applicability of correlations of intuitionistic fuzzy sets and picture fuzzy sets in some certain circumstances. The fitness of new correlation coefficients has been discussed, and their generalization is studied with the help of some results. Clustering and multi-attribute decision-making algorithms have been proposed in the environment of T-spherical fuzzy sets. To demonstrate the viability of proposed algorithms and correlation coefficients, two real-life problems including a clustering problem and a multi-attribute decision-making problem have been solved. A comparative study of the newly presented and pre-existing literature is carried out, showing the superiority of the proposed work over the existing theory. Some advantages of new correlation coefficients and drawbacks of the pre-existing work are demonstrated with the help of numerical examples.
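
For readers unfamiliar with the setting: a T-spherical fuzzy element assigns membership m, abstinence i, and non-membership n with m^t + i^t + n^t <= 1. One standard way to build a correlation coefficient, shown below, normalizes an inner product of the degree vectors raised to the power t; the paper's exact coefficients may differ, so this is an illustrative construction only.

```python
import numpy as np

def tsf_correlation(A, B, t=2):
    """Illustrative correlation coefficient for two T-spherical fuzzy sets.

    A, B: arrays of shape (n, 3), each row holding (m, i, n) for one element.
    Returns a value in [0, 1], with 1 meaning perfectly correlated sets.
    """
    def energy(X, Y):
        return np.sum((X ** t) * (Y ** t))
    return energy(A, B) / np.sqrt(energy(A, A) * energy(B, B))
```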

Journal ArticleDOI
TL;DR: In this article, the authors show that the initialization causes finite-size random fluctuations of the neural net output function $f_N$ around its expectation, which affect the generalization error for classification.
Abstract: Supervised deep learning involves the training of neural networks with a large number N of parameters. For large enough N, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as N grows past a certain threshold $N^*$. Instead, empirical studies have shown that in the over-parametrized regime, generalization error keeps decreasing with N. We resolve this paradox through a new framework. We rely on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations of the neural net output function $f_N$ around its expectation $\bar{f}_N$. These affect the generalization error $\epsilon_N$ for classification: under natural assumptions, it decays to a plateau value $\epsilon_\infty$ in a power-law fashion $\sim N^{-1/2}$. This description breaks down at a so-called jamming transition $N = N^*$. At this threshold, we argue that $\|f_N\|$ diverges. This result leads to a plausible explanation for the cusp in test error known to occur at $N^*$. Our results are confirmed by extensive empirical observations on the MNIST and CIFAR image datasets. Our analysis finally suggests that, given a computational envelope, the smallest generalization error is obtained using several networks of intermediate sizes, just beyond $N^*$, and averaging their outputs.

Book ChapterDOI
23 Aug 2020
TL;DR: A new domain generalization framework that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains is presented.
Abstract: The generalization capability of neural networks across domains is crucial for real-world applications. We argue that a generalized object recognition system should well understand the relationships among different images and also the images themselves at the same time. To this end, we present a new domain generalization framework (called EISNet) that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains. To be specific, we formulate our framework with feature embedding using a multi-task learning paradigm. Besides conducting the common supervised recognition task, we seamlessly integrate a momentum metric learning task and a self-supervised auxiliary task to collectively integrate the extrinsic and intrinsic supervisions. Also, we develop an effective momentum metric learning scheme with the K-hard negative mining to boost the network generalization ability. We demonstrate the effectiveness of our approach on two standard object recognition benchmarks VLCS and PACS, and show that our EISNet achieves state-of-the-art performance.

Journal ArticleDOI
TL;DR: In this paper, a fairly comprehensive analysis is presented of the gradient descent dynamics for training two-layer neural network models when the parameters in both layers are updated, and sharp estimates of the generalization error are established for target functions in the appropriate reproducing kernel Hilbert space.
Abstract: A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is shown that gradient descent dynamics can achieve zero training loss exponentially fast regardless of the quality of the labels. In addition, it is proved that throughout the training process the functions represented by the neural network model are uniformly close to those of a kernel method. For general values of the network width and training data size, sharp estimates of the generalization error are established for target functions in the appropriate reproducing kernel Hilbert space.

Journal ArticleDOI
TL;DR: In this paper, a Prior Guided Feature Enrichment Network (PFENet) is proposed to solve the problem of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information of training classes and spatial inconsistency between query and support targets.
Abstract: State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results and hardly work on unseen classes without fine-tuning. Few-shot segmentation is thus proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. These frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information of training classes and spatial inconsistency between query and support targets. To alleviate these issues, we propose the Prior Guided Feature Enrichment Network (PFENet). It consists of novel designs of (1) a training-free prior mask generation method that not only retains generalization power but also improves model performance and (2) Feature Enrichment Module (FEM) that overcomes spatial inconsistency by adaptively enriching query features with support features and prior masks. Extensive experiments on PASCAL-5i and COCO prove that the proposed prior generation method and FEM both improve the baseline method significantly. Our PFENet also outperforms state-of-the-art methods by a large margin without efficiency loss. It is surprising that our model even generalizes to cases without labeled support samples.
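
The training-free prior mask in (1) can be sketched directly: for every query location, the prior is the best cosine similarity to any foreground support location, min-max normalized. The shapes and masking scheme below are simplifications of the paper's exact procedure.

```python
import numpy as np

def prior_mask(query_feat, support_feat, support_mask):
    """Training-free prior: per-query-pixel max cosine similarity to the
    masked support features, min-max normalized to [0, 1].

    query_feat, support_feat: (C, H, W) high-level features;
    support_mask: (H, W) binary foreground mask for the support image.
    """
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, -1)                       # (C, HW)
    s = (support_feat * support_mask).reshape(C, -1)    # masked support
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
    sim = (q.T @ s).max(axis=1)                         # best support match
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
    return sim.reshape(H, W)
```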

Book ChapterDOI
23 Aug 2020
TL;DR: Domain-specific masks for generalization (DMG) as discussed by the authors learns a balance of domain-invariant and domain-specific features to improve both in-domain and out-of-domain generalization performance.
Abstract: We introduce Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance. For domain generalization, the goal is to learn from a set of source domains to produce a single model that will best generalize to an unseen target domain. As such, many prior approaches focus on learning representations which persist across all source domains with the assumption that these domain agnostic representations will generalize well. However, often individual domains contain characteristics which are unique and when leveraged can significantly aid in-domain recognition performance. To produce a model which best generalizes to both seen and unseen domains, we propose learning domain specific masks. The masks are encouraged to learn a balance of domain-invariant and domain-specific features, thus enabling a model which can benefit from the predictive power of specialized features while retaining the universal applicability of domain-invariant features. We demonstrate competitive performance compared to naive baselines and state-of-the-art methods on both PACS and DomainNet (Our code is available at https://github.com/prithv1/DMG).
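The central mechanism, one learnable mask per source domain gating shared features, reduces to a few lines; the hard domain assignment and sigmoid squashing here are illustrative choices rather than DMG's exact formulation.

```python
import numpy as np

def apply_domain_mask(features, mask_logits, domain_id):
    """Gate shared feature channels with the mask owned by one domain.

    features: (N, C) shared representations; mask_logits: (num_domains, C)
    learnable per-domain logits, squashed to (0, 1) gates.
    """
    gate = 1.0 / (1.0 + np.exp(-mask_logits[domain_id]))  # sigmoid gate
    return features * gate
```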

Journal ArticleDOI
TL;DR: An actual MADM application is given to validate this new model, and comparisons between this novel MABAC model and two q-ROFN aggregation operators are provided to further demonstrate the merits of the q-rung orthopair fuzzy MABAC model.