Showing papers by "Yoshua Bengio published in 2020"

PDF

Open Access

Journal Article•DOI•

[...]

Ian Goodfellow¹, Jean Pouget-Abadie², Mehdi Mirza², Bing Xu², David Warde-Farley², Sherjil Ozair², Aaron Courville², Yoshua Bengio² - Show less +4 more•Institutions (2)

Google¹, Université de Montréal²

22 Oct 2020-Communications of The ACM

TL;DR: A generative adversarial networks algorithm designed to solve the generative modeling problem and its applications in medicine, education and robotics are studied.

...read moreread less

Abstract: Generative adversarial networks are a kind of artificial intelligence algorithm designed to solve the generative modeling problem. The goal of a generative model is to study a collection of training examples and learn the probability distribution that generated them. Generative Adversarial Networks (GANs) are then able to generate more examples from the estimated probability distribution. Generative models based on deep learning are common, but GANs are among the most successful generative models (especially in terms of their ability to generate realistic high-resolution images). GANs have been successfully applied to a wide variety of tasks (mostly in research settings) but continue to present unique challenges and research opportunities because they are based on game theory while most other approaches to generative modeling are based on optimization.

...read moreread less

2,447 citations

Posted Content•

Benchmarking Graph Neural Networks

[...]

Vijay Prakash Dwivedi¹, Chaitanya K. Joshi¹, Thomas Laurent², Yoshua Bengio, Xavier Bresson¹ - Show less +1 more•Institutions (2)

Nanyang Technological University¹, Loyola Marymount University²

02 Mar 2020-arXiv: Learning

TL;DR: A reproducible GNN benchmarking framework is introduced, with the facility for researchers to add new models conveniently for arbitrary datasets, and a principled investigation into the recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message passing-based graph convolutional networks (GCNs).

...read moreread less

Abstract: Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. As the field grows, it becomes critical to identify key architectures and validate new ideas that generalize to larger, more complex datasets. Unfortunately, it has been increasingly difficult to gauge the effectiveness of new models in the absence of a standardized benchmark with consistent experimental settings. In this paper, we introduce a reproducible GNN benchmarking framework, with the facility for researchers to add new models conveniently for arbitrary datasets. We demonstrate the usefulness of our framework by presenting a principled investigation into the recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message passing-based graph convolutional networks (GCNs) for a variety of graph tasks, i.e. graph regression/classification and node/link prediction, with medium-scale datasets.

...read moreread less

481 citations

Proceedings Article•

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

[...]

Boris N. Oreshkin¹, Dmitri Carpov, Nicolas Chapados², Yoshua Bengio²•Institutions (2)

McGill University¹, Université de Montréal²

30 Apr 2020

TL;DR: The proposed deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train.

...read moreread less

Abstract: We focus on solving the univariate times series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrarily to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.

...read moreread less

347 citations

Proceedings Article•DOI•

Multi-Task Self-Supervised Learning for Robust Speech Recognition

[...]

Mirco Ravanelli¹, Jianyuan Zhong², Santiago Pascual³, Pawel Swietojanski⁴, Joao Monteiro⁵, Jan Trmal⁶, Yoshua Bengio¹ - Show less +3 more•Institutions (6)

Université de Montréal¹, University of Rochester², Polytechnic University of Catalonia³, University of New South Wales⁴, Institut national de la recherche scientifique⁵, Johns Hopkins University⁶

25 Jan 2020

TL;DR: PASE+ is proposed, an improved version of PASE that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks and learns transferable representations suitable for highly mismatched acoustic conditions.

...read moreread less

Abstract: Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation.Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE as well as common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.

...read moreread less

237 citations

Journal Article•DOI•

Predicting COVID-19 Pneumonia Severity on Chest X-ray With Deep Learning

[...]

Joseph Paul Cohen¹, Lan Dao¹, Karsten Roth², Paul Morrison³, Yoshua Bengio¹, Almas F. Abbasi⁴, Beiyi Shen⁴, Hoshmand Kochi Mahsa⁴, Marzyeh Ghassemi⁵, Haifang Li⁴, Timothy Q. Duong⁴ - Show less +7 more•Institutions (5)

Université de Montréal¹, Heidelberg University², Fontbonne University³, Stony Brook University⁴, University of Toronto⁵

28 Jul 2020-Cureus

TL;DR: The results indicate that the model’s ability to gauge the severity of COVID-19 lung infections could be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU.

...read moreread less

Abstract: Introduction The need to streamline patient management for coronavirus disease-19 (COVID-19) has become more pressing than ever. Chest X-rays (CXRs) provide a non-invasive (potentially bedside) tool to monitor the progression of the disease. In this study, we present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images. Such a tool can gauge the severity of COVID-19 lung infections (and pneumonia in general) that can be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU. Methods Images from a public COVID-19 database were scored retrospectively by three blinded experts in terms of the extent of lung involvement as well as the degree of opacity. A neural network model that was pre-trained on large (non-COVID-19) chest X-ray datasets is used to construct features for COVID-19 images which are predictive for our task. Results This study finds that training a regression model on a subset of the outputs from this pre-trained chest X-ray model predicts our geographic extent score (range 0-8) with 1.14 mean absolute error (MAE) and our lung opacity score (range 0-6) with 0.78 MAE. Conclusions These results indicate that our model’s ability to gauge the severity of COVID-19 lung infections could be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU. To enable follow up work, we make our code, labels, and data available online.

...read moreread less

202 citations

Proceedings Article•DOI•

Experience Grounds Language

[...]

Yonatan Bisk¹, Ari Holtzman², Jesse Thomason³, Jacob Andreas⁴, Yoshua Bengio⁵, Joyce Y. Chai⁶, Mirella Lapata⁷, Angeliki Lazaridou⁸, Jonathan May⁹, Aleksandr Nisnevich¹⁰, Nicolas Pinto⁴, Joseph Turian⁵ - Show less +8 more•Institutions (10)

Carnegie Mellon University¹, Allen Institute for Artificial Intelligence², University of Washington³, Massachusetts Institute of Technology⁴, Université de Montréal⁵, Michigan State University⁶, University of Edinburgh⁷, Google⁸, Information Sciences Institute⁹, Microsoft¹⁰

21 Apr 2020

TL;DR: It is posited that the present success of representation learning approaches trained on large text corpora can be deeply enriched from the parallel tradition of research on the contextual and social nature of language.

...read moreread less

Abstract: Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates. Despite the incredible effectiveness of language processing models to tackle tasks after being trained on text alone, successful linguistic communication relies on a shared experience of the world. It is this shared experience that makes utterances meaningful. Natural language processing is a diverse field, and progress throughout its development has come from new representational theories, modeling techniques, data collection paradigms, and tasks. We posit that the present success of representation learning approaches trained on large, text-only corpora requires the parallel tradition of research on the broader physical and social context of language to address the deeper questions of communication.

...read moreread less

191 citations

Posted Content•

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

[...]

Miles Brundage¹, Shahar Avin², Jasmine Wang³, Haydn Belfield², Gretchen Krueger¹, Gillian K. Hadfield¹, Gillian K. Hadfield⁴, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong⁵, Tegan Maharaj, Pang Wei Koh⁶, Sara Hooker⁷, Jade Leung⁵, Andrew Trask⁵, Emma Bluemke⁵, Jonathan Lebensbold³, Cullen O'Keefe¹, Mark Koren⁶, Théo Ryffel⁸, J. B. Rubinovitz, Tamay Besiroglu², Federica Carugati⁶, Jack Clark¹, Peter Eckersley, Sarah de Haas⁷, Maritza Johnson⁷, Ben Laurie⁷, Alex Ingerman⁷, Igor Krawczuk⁹, Amanda Askell¹, Rosario Cammarota¹⁰, Andrew J. Lohn¹¹, David Krueger¹², Charlotte Stix¹³, Peter Henderson⁶, Logan Graham⁵, Carina E. A. Prunkl⁵, Bianca Martin¹, Elizabeth Seger², Noa Zilberman⁵, Seán Ó hÉigeartaigh², Frens Kroeger, Girish Sastry¹, Rebecca Kagan, Adrian Weller¹⁴, Adrian Weller², Brian Tse⁵, Elizabeth A. Barnes¹, Allan Dafoe⁵, Paul Scharre¹⁵, Ariel Herbert-Voss¹, Martijn Rasser¹⁵, Shagun Sodhani¹², Carrick Flynn, Thomas Krendl Gilbert¹⁶, Lisa Dyer, Saif Khan, Yoshua Bengio¹², Markus Anderljung⁵ - Show less +57 more•Institutions (16)

OpenAI¹, University of Cambridge², McGill University³, University of Toronto⁴, University of Oxford⁵, Stanford University⁶, Google⁷, École Normale Supérieure⁸, École Polytechnique Fédérale de Lausanne⁹, Intel¹⁰, RAND Corporation¹¹, Université de Montréal¹², Eindhoven University of Technology¹³, The Turing Institute¹⁴, Center for a New American Security¹⁵, University of California¹⁶

15 Apr 2020-arXiv: Computers and Society

TL;DR: This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems.

...read moreread less

Abstract: With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

...read moreread less

191 citations

Posted Content•

Inductive Biases for Deep Learning of Higher-Level Cognition

[...]

Anirudh Goyal, Yoshua Bengio

30 Nov 2020-arXiv: Learning

TL;DR: This work considers a larger list of inductive biases that humans and animals exploit, focusing on those which concern mostly higher-level and sequential conscious processing, and suggests they could potentially help build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization.

...read moreread less

Abstract: A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.

...read moreread less

151 citations

Posted Content•

Gradient Starvation: A Learning Proclivity in Neural Networks

[...]

Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie - Show less +2 more

18 Nov 2020-arXiv: Learning

TL;DR: This work provides a theoretical explanation for the emergence of feature imbalance in neural networks and develops guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation.

...read moreread less

Abstract: We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.

...read moreread less

115 citations

Journal Article•DOI•

The need for privacy with public digital contact tracing during the COVID-19 pandemic

[...]

Yoshua Bengio¹, Richard Janda², Yun William Yu³, Daphne Ippolito⁴, Max Jarvie, Dan Nicola Neacsu Pilat, Brooke Struck, Sekoul Krastev, Abhinav Sharma⁵ - Show less +5 more•Institutions (5)

Université de Montréal¹, McGill University², University of Toronto³, University of Pennsylvania⁴, McGill University Health Centre⁵

02 Jun 2020

106 citations

Journal Article•DOI•

BigBrain 3D atlas of cortical layers: Cortical and laminar thickness gradients diverge in sensory and motor cortices.

[...]

Konrad Wagstyl¹, Konrad Wagstyl², Konrad Wagstyl³, Stéphanie Larocque⁴, Guillem Cucurull⁴, Claude Lepage², Joseph Paul Cohen⁴, Sebastian Bludau⁵, Nicola Palomero-Gallagher⁶, Nicola Palomero-Gallagher⁵, Lindsay B. Lewis², Thomas Funck², Hannah Spitzer⁵, Timo Dickscheid⁵, Paul C. Fletcher³, Adriana Romero⁴, Adriana Romero⁷, Karl Zilles⁵, Katrin Amunts⁵, Katrin Amunts⁸, Yoshua Bengio⁴, Alan C. Evans² - Show less +18 more•Institutions (8)

Wellcome Trust Centre for Neuroimaging¹, Montreal Neurological Institute and Hospital², University of Cambridge³, Université de Montréal⁴, Forschungszentrum Jülich⁵, RWTH Aachen University⁶, McGill University⁷, University of Düsseldorf⁸

03 Apr 2020-PLOS Biology

TL;DR: This BigBrain cortical atlas was derived from a 3D histological atlas of the human brain at 20-micrometer isotropic resolution (BigBrain), using a convolutional neural network to segment, automatically, the cortical layers in both hemispheres to provide an unprecedented level of precision and detail.

...read moreread less

Abstract: Histological atlases of the cerebral cortex, such as those made famous by Brodmann and von Economo, are invaluable for understanding human brain microstructure and its relationship with functional organization in the brain. However, these existing atlases are limited to small numbers of manually annotated samples from a single cerebral hemisphere, measured from 2D histological sections. We present the first whole-brain quantitative 3D laminar atlas of the human cerebral cortex. It was derived from a 3D histological atlas of the human brain at 20-micrometer isotropic resolution (BigBrain), using a convolutional neural network to segment, automatically, the cortical layers in both hemispheres. Our approach overcomes many of the historical challenges with measurement of histological thickness in 2D, and the resultant laminar atlas provides an unprecedented level of precision and detail. We utilized this BigBrain cortical atlas to test whether previously reported thickness gradients, as measured by MRI in sensory and motor processing cortices, were present in a histological atlas of cortical thickness and which cortical layers were contributing to these gradients. Cortical thickness increased across sensory processing hierarchies, primarily driven by layers III, V, and VI. In contrast, motor-frontal cortices showed the opposite pattern, with decreases in total and pyramidal layer thickness from motor to frontal association cortices. These findings illustrate how this laminar atlas will provide a link between single-neuron morphology, mesoscale cortical layering, macroscopic cortical thickness, and, ultimately, functional neuroanatomy.

...read moreread less

Proceedings Article•

Revisiting Fundamentals of Experience Replay

[...]

William Fedus¹, Prajit Ramachandran², Rishabh Agarwal², Yoshua Bengio, Hugo Larochelle², Mark Rowland, Will Dabney - Show less +3 more•Institutions (2)

Université de Montréal¹, Google²

12 Jul 2020

TL;DR: This work presents a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio).

...read moreread less

Abstract: Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay -- greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.

...read moreread less

Posted Content•

Experience Grounds Language

[...]

21 Apr 2020-arXiv: Computation and Language

TL;DR: The authors posit that the present success of representation learning approaches trained on large, text-only corpora requires the parallel tradition of research on the broader physical and social context of language to address the deeper questions of communication.

...read moreread less

Journal Article•DOI•

Toward Training Recurrent Neural Networks for Lifelong Learning.

[...]

Shagun Sodhani¹, Sarath Chandar¹, Yoshua Bengio¹•Institutions (1)

Université de Montréal¹

01 Jan 2020-Neural Computation

TL;DR: In this article, the authors study catastrophic forgetting and capacity saturation in a parametric lifelong learning system and study these challenges in the context of sequential supervise learning, and propose a solution to solve them.

...read moreread less

Abstract: Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we study these challenges in the context of sequential supervise...

...read moreread less

Posted Content•

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

[...]

Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Yoshua Bengio, Bernhard Schölkopf, Manuel Wüthrich, Stefan Bauer - Show less +4 more

08 Oct 2020-arXiv: Robotics

TL;DR: CausalWorld is proposed, a benchmark for causal structure and transfer learning in a robotic manipulation environment that is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer.

...read moreread less

Abstract: Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.

...read moreread less

Posted Content•

Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

[...]

Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam M. J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio - Show less +8 more

26 Apr 2020-arXiv: Learning

TL;DR: A novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this challenge by embedding the concept of synthetic accessibility directly into the de noVO drug design system.

...read moreread less

Abstract: Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches exhibit a significant challenge as they do not ensure that the proposed molecular structures can be feasibly synthesized nor do they provide the synthesis routes of the proposed small molecules, thereby seriously limiting their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo drug design system. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space by subjecting commercially available small molecule building blocks to valid chemical reactions at every time step of the iterative virtual multi-step synthesis process. The proposed environment for drug discovery provides a highly challenging test-bed for RL algorithms owing to the large state space and high-dimensional continuous action space with hierarchical actions. PGFS achieves state-of-the-art performance in generating structures with high QED and penalized clogP. Moreover, we validate PGFS in an in-silico proof-of-concept associated with three HIV targets. Finally, we describe how the end-to-end training conceptualized in this study represents an important paradigm in radically expanding the synthesizable chemical space and automating the drug discovery process.

...read moreread less

Posted Content•

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

[...]

Meng Qu¹, Junkun Chen², Louis-Pascal Xhonneux¹, Yoshua Bengio¹, Jian Tang³ - Show less +1 more•Institutions (3)

Université de Montréal¹, Tsinghua University², HEC Montréal³

08 Oct 2020-arXiv: Artificial Intelligence

TL;DR: An EM-based algorithm for optimization of a probabilistic model called RNNLogic, which treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules.

...read moreread less

Abstract: This paper studies learning logic rules for reasoning on knowledge graphs. Logic rules provide interpretable explanations when used for prediction as well as being able to generalize to other tasks, and hence are critical to learn. Existing methods either suffer from the problem of searching in a large search space (e.g., neural logic programming) or ineffective optimization due to sparse rewards (e.g., techniques based on reinforcement learning). To address these limitations, this paper proposes a probabilistic model called RNNLogic. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. We develop an EM-based algorithm for optimization. In each iteration, the reasoning predictor is first updated to explore some generated logic rules for reasoning. Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step. Experiments on four datasets prove the effectiveness of RNNLogic.

...read moreread less

Posted Content•

HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery.

[...]

Michel Deudon, Alfredo Kalaitzis, Israel Goytom, Rifat Arefin, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira Ebrahimi Kahou, Julien Cornebise, Yoshua Bengio - Show less +6 more

15 Feb 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: HighRes-net is presented, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion, and shows that by learning deep representations of multiple views, it can super-resolve low-resolution signals and enhance Earth Observation data at scale.

...read moreread less

Abstract: Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet -- from deforestation, to human rights violations -- that depend on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency's MFSR competition on real-world satellite imagery.

...read moreread less

Posted Content•

Hybrid Models for Learning to Branch

[...]

Prateek Gupta, Maxime Gasse, Elias B. Khalil, M. Pawan Kumar, Andrea Lodi, Yoshua Bengio - Show less +2 more

26 Jun 2020-arXiv: Learning

TL;DR: This work addresses the first question in the negative, and addresses the second question by proposing a new hybrid architecture for efficient branching on CPU machines, which combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.

...read moreread less

Abstract: A recent Graph Neural Network (GNN) approach for learning to branch has been shown to successfully reduce the running time of branch-and-bound algorithms for Mixed Integer Linear Programming (MILP) While the GNN relies on a GPU for inference, MILP solvers are purely CPU-based This severely limits its application as many practitioners may not have access to high-end GPUs In this work, we ask two key questions First, in a more realistic setting where only a CPU is available, is the GNN model still competitive? Second, can we devise an alternate computationally inexpensive model that retains the predictive power of the GNN architecture? We answer the first question in the negative, and address the second question by proposing a new hybrid architecture for efficient branching on CPU machines The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching We evaluate our methods on four classes of MILP problems, and show that they lead to up to 26% reduction in solver running time compared to state-of-the-art methods without a GPU, while extrapolating to harder problems than it was trained on The code for this project is publicly available at this https URL

...read moreread less

Posted Content•

Object-Centric Image Generation from Layouts

[...]

Tristan Sylvain, Pengchuan Zhang¹, Yoshua Bengio, R Devon Hjelm¹, Shikhar Sharma¹ - Show less +1 more•Institutions (1)

Microsoft¹

16 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: The idea that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes well is started and an object-centric adaptation of the popular Fr{e}chet Inception Distance metric is introduced, that is better suited for multi-object images.

...read moreread less

Abstract: Despite recent impressive results on single-object and single-domain image generation, the generation of complex scenes with multiple objects remains challenging. In this paper, we start with the idea that a model must be able to understand individual objects and relationships between objects in order to generate complex scenes well. Our layout-to-image-generation method, which we call Object-Centric Generative Adversarial Network (or OC-GAN), relies on a novel Scene-Graph Similarity Module (SGSM). The SGSM learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity. We also propose changes to the conditioning mechanism of the generator that enhance its object instance-awareness. Apart from improving image quality, our contributions mitigate two failure modes in previous approaches: (1) spurious objects being generated without corresponding bounding boxes in the layout, and (2) overlapping bounding boxes in the layout leading to merged objects in images. Extensive quantitative evaluation and ablation studies demonstrate the impact of our contributions, with our model outperforming previous state-of-the-art approaches on both the COCO-Stuff and Visual Genome datasets. Finally, we address an important limitation of evaluation metrics used in previous works by introducing SceneFID -- an object-centric adaptation of the popular Fr{e}chet Inception Distance metric, that is better suited for multi-object images.

...read moreread less

Posted Content•

Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems

[...]

Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Sergey Levine, Charles Blundell, Yoshua Bengio, Michael C. Mozer - Show less +4 more

29 Jun 2020-arXiv: Learning

TL;DR: The resulting architecture is a drop-in replacement conforming to the same input-output interface as normal recurrent networks yet achieves substantially better generalization on environments that have factorized declarative and procedural knowledge, including a challenging intuitive physics benchmark.

...read moreread less

Abstract: Modeling a structured, dynamic environment like a video game requires keeping track of the objects and their states declarative knowledge) as well as predicting how objects behave (procedural knowledge) Black-box models with a monolithic hidden state often fail to apply procedural knowledge consistently and uniformly, ie, they lack systematicity For example, in a video game, correct prediction of one enemy's trajectory does not ensure correct prediction of another's We address this issue via an architecture that factorizes declarative and procedural knowledge and that imposes modularity within each form of knowledge The architecture consists of active modules called object files that maintain the state of a single object and invoke passive external knowledge sources called schemata that prescribe state updates To use a video game as an illustration, two enemies of the same type will share schemata but will have separate object files to encode their distinct state (eg, health, position) We propose to use attention to determine which object files to update, the selection of schemata, and the propagation of information between object files The resulting architecture is a drop-in replacement conforming to the same input-output interface as normal recurrent networks (eg, LSTM, GRU) yet achieves substantially better generalization on environments that have multiple object tokens of the same type, including a challenging intuitive physics benchmark

...read moreread less

Posted Content•

Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies

[...]

Giulia Zarpellon¹, Jason Jo, Andrea Lodi¹, Yoshua Bengio•Institutions (1)

École Polytechnique de Montréal¹

12 Feb 2020-arXiv: Learning

TL;DR: A novel imitation learning framework is proposed, and new input features and architectures to represent branching are introduced, and the resulting policies significantly outperform the current state-of-the-art method for "learning to branch" by effectively allowing generalization to generic unseen instances.

...read moreread less

Abstract: Branch and Bound (B&B) is the exact tree search method typically used to solve Mixed-Integer Linear Programming problems (MILPs). Learning branching policies for MILP has become an active research area, with most works proposing to imitate the strong branching rule and specialize it to distinct classes of problems. We aim instead at learning a policy that generalizes across heterogeneous MILPs: our main hypothesis is that parameterizing the state of the B&B search tree can aid this type of generalization. We propose a novel imitation learning framework, and introduce new input features and architectures to represent branching. Experiments on MILP benchmark instances clearly show the advantages of incorporating an explicit parameterization of the state of the search tree to modulate the branching decisions, in terms of both higher accuracy and smaller B&B trees. The resulting policies significantly outperform the current state-of-the-art method for "learning to branch" by effectively allowing generalization to generic unseen instances.

...read moreread less

Posted Content•

Training End-to-End Analog Neural Networks with Equilibrium Propagation

[...]

Jack D. Kendall, Ross D. Pantone, Kalpana Manickavasagam, Yoshua Bengio, Benjamin Scellier - Show less +1 more

02 Jun 2020-arXiv: Neural and Evolutionary Computing

TL;DR: It is shown mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models: they possess an energy function as a consequence of Kirchhoff's laws governing electrical circuits.

...read moreread less

Abstract: We introduce a principled method to train end-to-end analog neural networks by stochastic gradient descent. In these analog neural networks, the weights to be adjusted are implemented by the conductances of programmable resistive devices such as memristors [Chua, 1971], and the nonlinear transfer functions (or `activation functions') are implemented by nonlinear components such as diodes. We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models: they possess an energy function as a consequence of Kirchhoff's laws governing electrical circuits. This property enables us to train them using the Equilibrium Propagation framework [Scellier and Bengio, 2017]. Our update rule for each conductance, which is local and relies solely on the voltage drop across the corresponding resistor, is shown to compute the gradient of the loss function. Our numerical simulations, which use the SPICE-based Spectre simulation framework to simulate the dynamics of electrical circuits, demonstrate training on the MNIST classification task, performing comparably or better than equivalent-size software-based neural networks. Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.

...read moreread less

Proceedings Article•

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

[...]

Yoshua Bengio¹, Tristan Deleu¹, Nasim Rahaman², Nan Rosemary Ke³, Sébastien Lachapelle¹, Olexa Bilaniuk¹, Anirudh Goyal¹, Chris Pal³ - Show less +4 more•Institutions (3)

Université de Montréal¹, Heidelberg University², École Polytechnique de Montréal³

30 Apr 2020

TL;DR: In this article, a meta-learning objective that maximizes the speed of transfer on a modified distribution is proposed to learn how to modularize acquired knowledge, where the objective is to factor a joint distribution into appropriate conditionals consistent with the causal directions.

...read moreread less

Abstract: We propose to use a meta-learning objective that maximizes the speed of transfer on a modified distribution to learn how to modularize acquired knowledge. In particular, we focus on how to factor a joint distribution into appropriate conditionals, consistent with the causal directions. We explain when this can work, using the assumption that the changes in distributions are localized (e.g. to one of the marginals, for example due to an intervention on one of the variables). We prove that under this assumption of localized changes in causal mechanisms, the correct causal graph will tend to have only a few of its parameters with non-zero gradient, i.e. that need to be adapted (those of the modified variables). We argue and observe experimentally that this leads to faster adaptation, and use this property to define a meta-learning surrogate score which, in addition to a continuous parametrization of graphs, would favour correct causal graphs. Finally, motivated by the AI agent point of view (e.g. of a robot discovering its environment autonomously), we consider how the same objective can discover the causal variables themselves, as a transformation of observed low-level variables with no causal meaning. Experiments in the two-variable case validate the proposed ideas and theoretical results.

...read moreread less

Posted Content•

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

[...]

Tong Che¹, Ruixiang Zhang¹, Jascha Sohl-Dickstein², Hugo Larochelle², Liam Paull¹, Yuan Cao³, Yoshua Bengio¹ - Show less +3 more•Institutions (3)

Université de Montréal¹, Google², University of California, Los Angeles³

12 Mar 2020-arXiv: Learning

TL;DR: Discriminator Driven Latent Sampling is shown to be highly efficient compared to previous methods which work in the high-dimensional pixel space and can be applied to improve on previously trained GANs of many types and achieves a new state-of-the-art in unconditional image synthesis setting without introducing extra parameters or additional training.

...read moreread less

Abstract: We show that the sum of the implicit generator log-density $\log p_g$ of a GAN with the logit score of the discriminator defines an energy function which yields the true data density when the generator is imperfect but the discriminator is optimal, thus making it possible to improve on the typical generator (with implicit density $p_g$). To make that practical, we show that sampling from this modified density can be achieved by sampling in latent space according to an energy-based model induced by the sum of the latent prior log-density and the discriminator output score. This can be achieved by running a Langevin MCMC in latent space and then applying the generator function, which we call Discriminator Driven Latent Sampling~(DDLS). We show that DDLS is highly efficient compared to previous methods which work in the high-dimensional pixel space and can be applied to improve on previously trained GANs of many types. We evaluate DDLS on both synthetic and real-world datasets qualitatively and quantitatively. On CIFAR-10, DDLS substantially improves the Inception Score of an off-the-shelf pre-trained SN-GAN~\citep{sngan} from $8.22$ to $9.09$ which is even comparable to the class-conditional BigGAN~\citep{biggan} model. This achieves a new state-of-the-art in unconditional image synthesis setting without introducing extra parameters or additional training.

...read moreread less

Posted Content•

Multi-task self-supervised learning for Robust Speech Recognition

[...]

Mirco Ravanelli¹, Jianyuan Zhong², Santiago Pascual³, Pawel Swietojanski⁴, Joao Monteiro⁵, Jan Trmal⁶, Yoshua Bengio¹ - Show less +3 more•Institutions (6)

25 Jan 2020-arXiv: Audio and Speech Processing

TL;DR: In this paper, the authors proposed an improved version of PASE for robust speech recognition in noisy and reverberant environments, called PASE+, which employs an online speech distortion module, that contaminates the input signals with a variety of random disturbances.

...read moreread less

Abstract: Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE as well as common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.

...read moreread less

Proceedings Article•

Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

[...]

Tong Che¹, Ruixiang Zhang¹, Jascha Sohl-Dickstein², Hugo Larochelle², Liam Paull¹, Yuan Cao³, Yoshua Bengio¹ - Show less +3 more•Institutions (3)

Université de Montréal¹, Google², University of California, Los Angeles³

12 Mar 2020

TL;DR: Discriminator Driven Latent Sampling (DDLS) as mentioned in this paper is the state-of-the-art method for image synthesis, which is based on the Langevin MCMC.

...read moreread less

Proceedings Article•

On the interplay between noise and curvature and its effect on optimization and generalization

[...]

Valentin Thomas, Fabian Pedregosa¹, Bart van Merriënboer¹, Pierre-Antoine Manzagol¹, Yoshua Bengio², Nicolas Le Roux¹ - Show less +2 more•Institutions (2)

Google¹, Université de Montréal²

03 Jun 2020

TL;DR: This work clarifies the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients and explains how both curvature and noise are relevant to properly estimate the generalization gap.

...read moreread less

Posted Content•

HNHN: Hypergraph Networks with Hyperedge Neurons.

[...]

Yihe Dong¹, Will Sawin², Yoshua Bengio³•Institutions (3)

Microsoft¹, Columbia University², Université de Montréal³

22 Jun 2020-arXiv: Learning

TL;DR: HNHN is a hypergraph convolution network with nonlinear activation functions applied to both hypernodes and hyperedges, combined with a normalization scheme that can flexibly adjust the importance of high-cardinality hyperedge and high-degree vertices depending on the dataset.

...read moreread less

Abstract: Hypergraphs provide a natural representation for many real world datasets. We propose a novel framework, HNHN, for hypergraph representation learning. HNHN is a hypergraph convolution network with nonlinear activation functions applied to both hypernodes and hyperedges, combined with a normalization scheme that can flexibly adjust the importance of high-cardinality hyperedges and high-degree vertices depending on the dataset. We demonstrate improved performance of HNHN in both classification accuracy and speed on real world datasets when compared to state of the art methods.

...read moreread less

Posted Content•

DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

[...]

Timo Milbich¹, Karsten Roth¹, Karsten Roth², Homanga Bharadhwaj³, Homanga Bharadhwaj⁴, Samarth Sinha², Samarth Sinha⁴, Yoshua Bengio², Yoshua Bengio⁵, Björn Ommer¹, Joseph Paul Cohen² - Show less +7 more•Institutions (5)

Heidelberg University¹, Université de Montréal², Carnegie Mellon University³, University of Toronto⁴, Canadian Institute for Advanced Research⁵

28 Apr 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes and studies multiple complementary learning tasks, targeting conceptually different data relationships by only resorting to the available training samples and labels of a standard DML setting, resulting in strong generalization and state-of-the-art performance on multiple established DML benchmark datasets.

...read moreread less

Abstract: Visual Similarity plays an important role in many computer vision applications. Deep metric learning (DML) is a powerful framework for learning such similarities which not only generalize from training data to identically distributed test distributions, but in particular also translate to unknown test classes. However, its prevailing learning paradigm is class-discriminative supervised training, which typically results in representations specialized in separating training classes. For effective generalization, however, such an image representation needs to capture a diverse range of data characteristics. To this end, we propose and study multiple complementary learning tasks, targeting conceptually different data relationships by only resorting to the available training samples and labels of a standard DML setting. Through simultaneous optimization of our tasks we learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance on multiple established DML benchmark datasets.

...read moreread less