
Showing papers by "Kyunghyun Cho published in 2018"


Journal ArticleDOI
TL;DR: In this article, a deep learning model based on Gated Recurrent Unit (GRU) is proposed to exploit the missing values and their missing patterns for effective imputation and improving prediction performance.
Abstract: Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a. informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks using real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.

1,085 citations
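As a rough illustration of the mechanism described above, here is a minimal NumPy sketch of GRU-D's decay-based input imputation; the weight names (`W_gamma`, `b_gamma`) and the exact update rule are simplified assumptions, not the authors' full parameterization:

```python
import numpy as np

def grud_impute(x, mask, delta, x_last, x_mean, W_gamma, b_gamma):
    """Decay-based imputation fed into the GRU cell.

    x      : current observations (missing entries already zero-filled), (d,)
    mask   : 1 where observed, 0 where missing, (d,)
    delta  : time since each variable was last observed, (d,)
    x_last : last observed value per variable, (d,)
    x_mean : empirical mean per variable, (d,)
    """
    # Decay in (0, 1]; longer gaps pull the input toward the empirical mean.
    gamma = np.exp(-np.maximum(0.0, W_gamma * delta + b_gamma))
    x_hat = gamma * x_last + (1.0 - gamma) * x_mean
    return mask * x + (1.0 - mask) * x_hat

d = 4
x = np.array([0.5, 0.0, 1.2, 0.0])      # second and fourth values missing
mask = np.array([1.0, 0.0, 1.0, 0.0])
out = grud_impute(x, mask, np.ones(d), np.zeros(d), np.zeros(d),
                  np.ones(d), np.zeros(d))
```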


Posted Content
TL;DR: This paper proposes a conditional non-autoregressive neural sequence model based on iterative refinement, which is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task.
Abstract: We propose a conditional non-autoregressive neural sequence model based on iterative refinement. The proposed model is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task. We extensively evaluate the proposed model on machine translation (En-De and En-Ro) and image caption generation, and observe that it significantly speeds up decoding while maintaining the generation quality comparable to the autoregressive counterpart.

272 citations
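Conceptually, decoding amounts to emitting a full-length draft in parallel and then repeatedly denoising it. A schematic sketch, where `decoder` is a placeholder for the trained model rather than the paper's exact architecture:

```python
def iterative_refinement_decode(decoder, src, max_iters=10):
    """Non-autoregressive decoding by iterative refinement.

    decoder(src, draft) must return a full target draft in one shot;
    all positions are predicted in parallel, then repeatedly refined.
    """
    draft = decoder(src, None)          # initial parallel guess
    for _ in range(max_iters):
        refined = decoder(src, draft)   # denoise the previous draft
        if refined == draft:            # stop once the draft is stable
            break
        draft = refined
    return draft
```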


Proceedings ArticleDOI
01 Jan 2018
TL;DR: The proposed conditional non-autoregressive neural sequence model is evaluated on machine translation and image caption generation, and it is observed that it significantly speeds up decoding while maintaining the generation quality comparable to the autoregressive counterpart.
Abstract: We propose a conditional non-autoregressive neural sequence model based on iterative refinement. The proposed model is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task. We extensively evaluate the proposed model on machine translation (En-De and En-Ro) and image caption generation, and observe that it significantly speeds up decoding while maintaining the generation quality comparable to the autoregressive counterpart.

261 citations


Proceedings Article
01 Jan 2018
TL;DR: Empirical evaluation on three language pairs shows that the proposed approach significantly outperforms the baseline approach, and that the improvement is more significant when more relevant sentence pairs are retrieved.
Abstract: In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage, the retrieval stage, an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source sentence. These pairs are further filtered using a fuzzy matching score based on edit distance. In the second stage, the translation stage, a novel translation model, called search engine guided NMT (SEG-NMT), seamlessly uses both the source sentence and a set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach, and that the improvement is more significant when more relevant sentence pairs are retrieved.

150 citations
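The retrieval stage can be imitated with any search backend plus an edit-distance-style filter. A minimal sketch using Python's standard library, with `difflib`'s similarity ratio standing in for the paper's fuzzy matching score:

```python
import difflib

def retrieve_pairs(source, training_pairs, k=5, threshold=0.5):
    """Return up to k (src, tgt) pairs whose source side fuzzily
    matches `source`, ranked by a difflib similarity ratio."""
    scored = []
    for src, tgt in training_pairs:
        score = difflib.SequenceMatcher(None, source, src).ratio()
        if score >= threshold:
            scored.append((score, src, tgt))
    scored.sort(reverse=True)
    return [(src, tgt) for _, src, tgt in scored[:k]]

pairs = [("the cat sat", "le chat était assis"),
         ("the dog ran", "le chien a couru")]
print(retrieve_pairs("the cat sat down", pairs, k=1))
```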


Journal ArticleDOI
TL;DR: The authors proposed a fine-grained (or 2D) attention mechanism where each dimension of a context vector will receive a separate attention score, which improves the translation quality in terms of BLEU score.

141 citations
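The idea is that the softmax runs per hidden dimension instead of once per source position. A NumPy sketch of the scoring difference; the dot-product scorer and shapes are illustrative assumptions:

```python
import numpy as np

def fine_grained_context(H, q):
    """H: source annotations, shape (T, d); q: decoder query, shape (d,).
    Standard attention gives one weight per position; 2D attention
    gives a separate weight per (position, dimension) pair."""
    scores = H * q                                    # (T, d) per-dimension scores
    alpha = np.exp(scores - scores.max(axis=0, keepdims=True))
    alpha /= alpha.sum(axis=0, keepdims=True)         # softmax over positions, per dim
    return (alpha * H).sum(axis=0)                    # (d,) context vector

H = np.random.randn(5, 8)
q = np.random.randn(8)
print(fine_grained_context(H, q).shape)               # (8,)
```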


Proceedings ArticleDOI
01 Jan 2018
TL;DR: A transferable architecture of structural and compositional neural networks is designed to jointly represent and map event mentions and types into a shared semantic space and can select, for each event mention, the event type which is semantically closest in this space as its type.
Abstract: Most previous supervised event extraction methods have relied on features derived from manual annotations, and thus cannot be applied to new event types without extra annotation effort. We take a fresh look at event extraction and model it as a generic grounding problem: mapping each event mention to a specific type in a target event ontology. We design a transferable architecture of structural and compositional neural networks to jointly represent and map event mentions and types into a shared semantic space. Based on this new framework, we can select, for each event mention, the event type which is semantically closest in this space as its type. By leveraging manual annotations available for a small set of existing event types, our framework can be applied to new unseen event types without additional manual annotations. When tested on 23 unseen event types, our zero-shot framework, without manual annotations, achieved performance comparable to a supervised model trained from 3,000 sentences annotated with 500 event mentions.

110 citations
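Once mentions and types share a semantic space, type assignment reduces to a nearest-neighbor lookup. A minimal sketch (cosine similarity assumed; the paper's structural and compositional encoders are not reproduced):

```python
import numpy as np

def nearest_event_type(mention_vec, type_vecs, type_names):
    """Assign the event type whose embedding is closest (cosine)
    to the event-mention embedding in the shared semantic space."""
    m = mention_vec / np.linalg.norm(mention_vec)
    T = type_vecs / np.linalg.norm(type_vecs, axis=1, keepdims=True)
    return type_names[int(np.argmax(T @ m))]

types = ["Attack", "Transport", "Meet"]
vecs = np.random.randn(3, 16)
print(nearest_event_type(np.random.randn(16), vecs, types))
```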


Journal ArticleDOI
TL;DR: In this article, the authors presented an automatic proximal femur segmentation method based on deep convolutional neural networks (CNNs), which achieved a high Dice similarity score of 0.95.
Abstract: Magnetic resonance imaging (MRI) has been proposed as a complementary method to measure bone quality and assess fracture risk. However, manual segmentation of MR images of bone is time-consuming, limiting the use of MRI measurements in clinical practice. The purpose of this paper is to present an automatic proximal femur segmentation method that is based on deep convolutional neural networks (CNNs). This study had institutional review board approval and written informed consent was obtained from all subjects. A dataset of volumetric structural MR images of the proximal femur from 86 subjects was manually segmented by an expert. We performed experiments by training two different CNN architectures with varying numbers of initial feature maps, layers, and dilation rates, and tested their segmentation performance against the gold standard of manual segmentation using four-fold cross-validation. Automatic segmentation of the proximal femur using CNNs achieved a high Dice similarity score of 0.95 ± 0.02, with precision = 0.95 ± 0.02 and recall = 0.95 ± 0.03. The high segmentation accuracy provided by CNNs has the potential to help bring the use of structural MRI measurements of bone quality into clinical practice for the management of osteoporosis.

108 citations
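The reported Dice similarity score is the standard overlap measure between predicted and manual segmentation masks; a small NumPy sketch of how it is computed:

```python
import numpy as np

def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks:
    2 * |A ∩ B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0

a = np.random.rand(64, 64, 64) > 0.5
b = np.random.rand(64, 64, 64) > 0.5
print(round(dice_score(a, b), 3))   # about 0.5 in expectation for random masks
```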


Proceedings ArticleDOI
TL;DR: It is demonstrated that the increased flexibility offered by the product-of-experts model allowed it to achieve state-of-the-art performance on the Amazon review dataset, outperforming the LDA-based approach; interestingly, however, the greater modeling power offered by the recurrent neural network appears to undermine the model's ability to act as a regularizer of the product representations.
Abstract: Recent work has shown that collaborative filter-based recommender systems can be improved by incorporating side information, such as natural language reviews, as a way of regularizing the derived product representations. Motivated by the success of this approach, we introduce two different models of reviews and study their effect on collaborative filtering performance. While the previous state-of-the-art approach is based on a latent Dirichlet allocation (LDA) model of reviews, the models we explore are neural network based: a bag-of-words product-of-experts model and a recurrent neural network. We demonstrate that the increased flexibility offered by the product-of-experts model allowed it to achieve state-of-the-art performance on the Amazon review dataset, outperforming the LDA-based approach. However, interestingly, the greater modeling power offered by the recurrent neural network appears to undermine the model's ability to act as a regularizer of the product representations.

94 citations
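A product-of-experts combines expert distributions multiplicatively and renormalizes. A minimal NumPy sketch of that combination rule (the experts here are toy distributions, not the paper's bag-of-words model):

```python
import numpy as np

def product_of_experts(expert_log_probs):
    """Combine experts by summing log-probabilities and renormalizing:
    p(x) ∝ ∏_k p_k(x)."""
    logp = np.sum(expert_log_probs, axis=0)
    logp -= logp.max()                    # numerical stability
    p = np.exp(logp)
    return p / p.sum()

experts = np.log(np.array([[0.7, 0.2, 0.1],
                           [0.4, 0.4, 0.2]]))
print(product_of_experts(experts))
```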


Proceedings Article
31 May 2018
TL;DR: The authors proposed DialogWAE, a conditional Wasserstein autoencoder specially designed for dialogue modeling, which models the distribution of data by training a GAN within the latent variable space.
Abstract: Variational autoencoders (VAEs) have shown promise in data-driven conversation modeling. However, most VAE conversation models match the approximate posterior distribution over the latent variables to a simple prior such as the standard normal distribution, thereby restricting the generated responses to a relatively simple (e.g., unimodal) scope. In this paper, we propose DialogWAE, a conditional Wasserstein autoencoder (WAE) specially designed for dialogue modeling. Unlike VAEs that impose a simple distribution over the latent variables, DialogWAE models the distribution of data by training a GAN within the latent variable space. Specifically, our model samples from the prior and posterior distributions over the latent variables by transforming context-dependent random noise using neural networks, and minimizes the Wasserstein distance between the two distributions. We further develop a Gaussian mixture prior network to enrich the latent space. Experiments on two popular datasets show that DialogWAE outperforms the state-of-the-art approaches in generating more coherent, informative and diverse responses.

79 citations
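Sampling from a Gaussian mixture prior, the ingredient that enriches the latent space, can be sketched as follows; in the actual model the mixture parameters are produced by a prior network from the dialogue context, whereas here they are plain arrays:

```python
import numpy as np

def sample_gmm_prior(pis, mus, sigmas, rng=np.random.default_rng()):
    """Draw one latent z from a Gaussian mixture prior.

    pis    : mixture weights, shape (K,), summing to 1
    mus    : component means, shape (K, d)
    sigmas : component stddevs, shape (K, d)
    """
    k = rng.choice(len(pis), p=pis)                 # pick a mode
    return mus[k] + sigmas[k] * rng.standard_normal(mus.shape[1])

z = sample_gmm_prior(np.array([0.3, 0.7]),
                     np.zeros((2, 8)), np.ones((2, 8)))
```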


Journal ArticleDOI
TL;DR: In this paper, a semi-supervised variational autoencoder is proposed to generate new molecules with desired properties by sampling from the generative distribution estimated by the model.
Abstract: Although machine learning has been successfully used to propose novel molecules that satisfy desired properties, it is still challenging to explore a large chemical space efficiently. In this paper, we present a conditional molecular design method that facilitates generating new molecules with desired properties. The proposed model, which simultaneously performs both property prediction and molecule generation, is built as a semi-supervised variational autoencoder trained on a set of existing molecules with only a partial annotation. We generate new molecules with desired properties by sampling from the generative distribution estimated by the model. We demonstrate the effectiveness of the proposed model by evaluating it on drug-like molecules. The model improves the performance of property prediction by exploiting unlabeled molecules, and efficiently generates novel molecules fulfilling various target conditions.

57 citations
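Conditional generation then amounts to sampling latent codes from the prior and decoding them together with the target property. A schematic sketch, where `decoder` is a placeholder for the trained decoder network and the latent dimensionality is arbitrary:

```python
import numpy as np

def conditional_sample(decoder, y_target, latent_dim=56, n=10,
                       rng=np.random.default_rng()):
    """Sample n candidate molecules with the desired property y_target
    by drawing latent codes from the prior and decoding them together
    with the property condition."""
    # decoder(z, y) -> molecule (e.g., a SMILES string); a placeholder here.
    return [decoder(rng.standard_normal(latent_dim), y_target)
            for _ in range(n)]
```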


Posted Content
TL;DR: Pommerman, a multi-agent environment based on the classic console game Bomberman, consists of a set of scenarios, each having at least four players and containing both cooperative and competitive aspects.
Abstract: We present Pommerman, a multi-agent environment based on the classic console game Bomberman. Pommerman consists of a set of scenarios, each having at least four players and containing both cooperative and competitive aspects. We believe that success in Pommerman will require a diverse set of tools and methods, including planning, opponent/teammate modeling, game theory, and communication, and consequently can serve well as a multi-agent benchmark. To date, we have already hosted one competition, and our next one will be featured in the NIPS 2018 competition track.

Posted Content
TL;DR: A significant gap is observed between greedy search and the proposed iterative beam search augmented with selection scoring, demonstrating the importance of the search algorithm in neural dialogue generation.
Abstract: Search strategies for generating a response from a neural dialogue model have received relatively little attention compared to improving network architectures and learning algorithms in recent years. In this paper, we consider a standard neural dialogue model based on recurrent networks with an attention mechanism, and focus on evaluating the impact of the search strategy. We compare four search strategies: greedy search, beam search, iterative beam search and iterative beam search followed by selection scoring. We evaluate these strategies using human evaluation of full conversations and compare them using automatic metrics including log-probabilities, scores and diversity metrics. We observe a significant gap between greedy search and the proposed iterative beam search augmented with selection scoring, demonstrating the importance of the search algorithm in neural dialogue generation.
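Of the four strategies compared, plain beam search is the common baseline. A minimal sketch over a generic next-token scorer; `log_probs_fn` is a placeholder for the dialogue model, and length normalization is omitted:

```python
def beam_search(log_probs_fn, eos, beam_width=4, max_len=20):
    """Plain beam search; log_probs_fn(prefix) -> {token: log_prob}."""
    beams = [([], 0.0)]                          # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:     # keep finished hypotheses
                candidates.append((prefix, score))
                continue
            for tok, lp in log_probs_fn(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams[0][0]                           # highest-scoring hypothesis
```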

Proceedings ArticleDOI
15 Apr 2018
TL;DR: This work explored the limits of breast density classification with a data set coming from over 200,000 breast cancer screening exams and found that a strong convolutional neural network classifier can perform this task comparably to a human expert.
Abstract: Breast density classification is an essential part of breast cancer screening. Although a lot of prior work considered this problem as a task for learning algorithms, to our knowledge, all of them used small and not clinically realistic data both for training and evaluation of their models. In this work, we explored the limits of this task with a data set coming from over 200,000 breast cancer screening exams. We used this data to train and evaluate a strong convolutional neural network classifier. In a reader study, we found that our model can perform this task comparably to a human expert.

Proceedings ArticleDOI
29 Nov 2018
TL;DR: In this paper, the authors empirically investigate the effect of audio preprocessing on music tagging with deep neural networks and show that many commonly used input preprocessing techniques are redundant except magnitude compression.
Abstract: In this paper, we empirically investigate the effect of audio preprocessing on music tagging with deep neural networks. We perform comprehensive experiments involving audio preprocessing using different time-frequency representations, logarithmic magnitude compression, frequency weighting, and scaling. We show that many commonly used input preprocessing techniques are redundant except magnitude compression.
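Magnitude compression, the one preprocessing step found to be non-redundant, is an elementwise logarithm on the (mel-)spectrogram. A minimal sketch, with a random array standing in for a real spectrogram:

```python
import numpy as np

def compress_magnitude(spec, eps=1e-7):
    """Logarithmic magnitude compression of a (mel-)spectrogram,
    the one preprocessing step the paper finds non-redundant."""
    return np.log(spec + eps)

mel = np.abs(np.random.randn(96, 1366))   # stand-in for a mel-spectrogram
log_mel = compress_magnitude(mel)
```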

Journal ArticleDOI
TL;DR: A novel approach is proposed for finding linkages/associations between multimodal brain imaging data, such as structural MRI (sMRI) and functional MRI (fMRI); it employs a deep learning model that treats two different imaging views of the same brain like two different languages conveying some common facts.

Journal ArticleDOI
TL;DR: The proposed model outperforms long short-term memory and NTM variants; further experimental results are provided on the sequential MNIST, Stanford Natural Language Inference, associative recall, and copy tasks.
Abstract: We extend the neural Turing machine (NTM) model into a dynamic neural Turing machine (D-NTM) by introducing trainable address vectors. This addressing scheme maintains for each memory cell two separate vectors, content and address vectors.
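A rough sketch of addressing with trainable address vectors: each cell is scored by matching a query key against the concatenation of its content and address parts. The dimensions and the dot-product scorer are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dntm_address(memory_content, memory_address, key):
    """Soft addressing over memory cells, each represented by a content
    part (written by the controller) and a trainable address part."""
    cells = np.concatenate([memory_content, memory_address], axis=1)
    scores = cells @ key                        # similarity per cell
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()              # attention over cells

C = np.random.randn(128, 20)    # content vectors
A = np.random.randn(128, 8)     # trainable address vectors
w = dntm_address(C, A, np.random.randn(28))
```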

01 Sep 2018
TL;DR: Pommerman, as mentioned in this paper, is a multi-agent environment based on the classic console game Bomberman, which consists of a set of scenarios, each having at least four players and containing both cooperative and competitive aspects.
Abstract: We present Pommerman, a multi-agent environment based on the classic console game Bomberman. Pommerman consists of a set of scenarios, each having at least four players and containing both cooperative and competitive aspects. We believe that success in Pommerman will require a diverse set of tools and methods, including planning, opponent/teammate modeling, game theory, and communication, and consequently can serve well as a multi-agent benchmark. To date, we have already hosted one competition, and our next one will be featured in the NIPS 2018 competition track.

Posted Content
TL;DR: DialogWAE is proposed, a conditional Wasserstein autoencoder specially designed for dialogue modeling that models the distribution of data by training a GAN within the latent variable space and develops a Gaussian mixture prior network to enrich the latent space.
Abstract: Variational autoencoders (VAEs) have shown promise in data-driven conversation modeling. However, most VAE conversation models match the approximate posterior distribution over the latent variables to a simple prior such as the standard normal distribution, thereby restricting the generated responses to a relatively simple (e.g., unimodal) scope. In this paper, we propose DialogWAE, a conditional Wasserstein autoencoder (WAE) specially designed for dialogue modeling. Unlike VAEs that impose a simple distribution over the latent variables, DialogWAE models the distribution of data by training a GAN within the latent variable space. Specifically, our model samples from the prior and posterior distributions over the latent variables by transforming context-dependent random noise using neural networks, and minimizes the Wasserstein distance between the two distributions. We further develop a Gaussian mixture prior network to enrich the latent space. Experiments on two popular datasets show that DialogWAE outperforms the state-of-the-art approaches in generating more coherent, informative and diverse responses.

Posted Content
TL;DR: The approach, Backplay, uses a single demonstration to construct a curriculum for a given task; the authors analytically characterize the types of environments where Backplay can improve training speed and show that it compares favorably to other competitive methods known to improve sample efficiency.
Abstract: Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
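The curriculum itself is just a schedule over demonstration indices: early episodes start near the demonstration's end, later ones progressively earlier. A minimal sketch; the linear schedule and uniform sampling window are assumptions, as the paper may anneal differently:

```python
import random

def backplay_start_index(demo_len, step, total_steps):
    """Pick a start state from a demonstration: begin near its end and
    move the starting window backwards as training progresses."""
    frac = min(1.0, step / total_steps)          # 0 at the start of training
    earliest = int((1.0 - frac) * (demo_len - 1))
    return random.randint(earliest, demo_len - 1)

# Early in training we start near the goal; later, from anywhere.
print(backplay_start_index(100, step=0, total_steps=1000))     # 99
print(backplay_start_index(100, step=1000, total_steps=1000))  # 0..99
```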

Journal ArticleDOI
23 Mar 2018
TL;DR: In this article, the effects of noisy labels on deep neural networks for music classification are investigated. The authors show that networks can be effective despite relatively large error rates in ground-truth datasets, and conjecture that label noise may be the cause of varying tag-wise performance differences.
Abstract: Deep neural networks (DNNs) have been successfully applied to music classification, including music tagging. However, there are several open questions regarding the training, evaluation, and analysis of DNNs. In this paper, we investigate a specific aspect of neural networks, the effects of noisy labels, to deepen our understanding of their properties. We analyze and (re-)validate a large music tagging dataset to investigate the reliability of training and evaluation. Using a trained network, we compute label vector similarities, which are compared to ground-truth similarity. The results highlight several important aspects of music tagging and neural networks. We show that networks can be effective despite relatively large error rates in ground-truth datasets, while conjecturing that label noise can be the cause of varying tag-wise performance differences. Finally, the analysis of our trained network provides valuable insight into the relationships between music tags. These results highlight the benefit of using data-driven methods to address automatic music tagging.
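The label-vector analysis boils down to comparing pairwise tag similarities from the network against similarities from the ground truth. A small NumPy sketch of the similarity matrix computation (cosine similarity assumed):

```python
import numpy as np

def label_similarity_matrix(label_vectors):
    """Pairwise cosine similarities between per-tag vectors (e.g. the
    network's output co-activations), comparable against a ground-truth
    tag co-occurrence similarity."""
    V = label_vectors / np.linalg.norm(label_vectors, axis=1, keepdims=True)
    return V @ V.T

tags = np.random.rand(50, 100)    # 50 tags, 100-dim vectors
S = label_similarity_matrix(tags)
```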

Proceedings ArticleDOI
01 Jul 2018
TL;DR: This paper describes the authors' work for the CALCS 2018 shared task on named entity recognition on code-switched data, where the system ranked first place for MS Arabic-Egyptian named entity recognition and third place for English-Spanish.
Abstract: We describe our work for the CALCS 2018 shared task on named entity recognition on code-switched data. Our system ranked first place for MS Arabic-Egyptian named entity recognition and third place for English-Spanish.

Posted Content
TL;DR: This paper proposes a meta-learning approach for low-resource NMT, which uses the universal lexical representation to overcome the input-output mismatch across different languages, and shows that the proposed approach significantly outperforms the multilingual, transfer learning based approach.
Abstract: In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation (Gu et al., 2018) to overcome the input-output mismatch across different languages. We evaluate the proposed meta-learning strategy using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt, Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro, Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach significantly outperforms the multilingual, transfer learning based approach (Zoph et al., 2016) and enables us to train a competitive NMT system with only a fraction of training examples. For instance, the proposed approach can achieve as high as 22.04 BLEU on Romanian-English WMT'16 by seeing only 16,000 translated words (~600 parallel sentences).
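At its core, the method runs MAML's inner/outer loop over simulated low-resource translation tasks. A first-order NumPy sketch with a toy scalar loss; the real method operates on NMT parameters with a universal lexical representation:

```python
import numpy as np

def maml_step(theta, tasks, grad_fn, inner_lr=0.01, outer_lr=0.001):
    """One first-order MAML meta-update over a batch of tasks.

    grad_fn(theta, task) -> gradient of that task's loss at theta.
    """
    meta_grad = np.zeros_like(theta)
    for task in tasks:
        adapted = theta - inner_lr * grad_fn(theta, task)   # inner adaptation
        meta_grad += grad_fn(adapted, task)                 # post-adaptation signal
    return theta - outer_lr * meta_grad / len(tasks)

# Toy check: each "task" is a target point with squared-distance loss.
tasks = [np.array([1.0]), np.array([-2.0])]
grad_fn = lambda th, t: 2.0 * (th - t)
theta = maml_step(np.zeros(1), tasks, grad_fn)
```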

Posted Content
TL;DR: This work proposes a retrieval-augmented convolutional network trained with local mixup, a novel variant of the recently proposed mixup algorithm; local mixup addresses on-manifold adversarial examples by explicitly encouraging the classifier to locally behave linearly on the data manifold.
Abstract: We propose a retrieval-augmented convolutional network and propose to train it with local mixup, a novel variant of the recently proposed mixup algorithm. The proposed hybrid architecture combining a convolutional network and an off-the-shelf retrieval engine was designed to mitigate the adverse effect of off-manifold adversarial examples, while the proposed local mixup addresses on-manifold ones by explicitly encouraging the classifier to locally behave linearly on the data manifold. Our evaluation of the proposed approach against five readily-available adversarial attacks on three datasets (CIFAR-10, SVHN, and ImageNet) demonstrates improved robustness compared to the vanilla convolutional network.
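Standard mixup interpolates arbitrary training pairs; the local variant restricts interpolation to nearby points on the data manifold, here assumed to be retrieved neighbors. A NumPy sketch of one mixed sample (the paper's exact neighborhood scheme may differ):

```python
import numpy as np

def local_mixup(x, y, neighbors_x, neighbors_y, alpha=0.2,
                rng=np.random.default_rng()):
    """Mix an example only with one of its retrieved neighbors,
    encouraging locally linear behavior on the data manifold."""
    j = rng.integers(len(neighbors_x))
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x + (1.0 - lam) * neighbors_x[j]
    y_mix = lam * y + (1.0 - lam) * neighbors_y[j]
    return x_mix, y_mix

x, y = np.random.randn(32), np.array([1.0, 0.0])
nx, ny = np.random.randn(5, 32), np.tile([0.0, 1.0], (5, 1))
x_mix, y_mix = local_mixup(x, y, nx, ny)
```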

Journal ArticleDOI
TL;DR: In this paper, a novel recurrent neural network (RNN) approach is introduced to account for temporal dynamics and dependencies in brain networks observed via functional magnetic resonance imaging (fMRI).
Abstract: We introduce a novel recurrent neural network (RNN) approach to account for temporal dynamics and dependencies in brain networks observed via functional magnetic resonance imaging (fMRI). Our approach directly parameterizes temporal dynamics through recurrent connections, which can be used to formulate blind source separation with a conditional (rather than marginal) independence assumption, which we call RNN-ICA. This formulation enables us to visualize the temporal dynamics of both first order (activity) and second order (directed connectivity) information in brain networks that are widely studied in a static sense, but not well-characterized dynamically. RNN-ICA predicts dynamics directly from the recurrent states of the RNN in both task and resting state fMRI. Our results show both task-related and group-differentiating directed connectivity.

Proceedings Article
15 Feb 2018
TL;DR: In this paper, a novel multiset loss function is proposed by viewing multiset prediction from the perspective of sequential decision making; it is empirically evaluated on two families of datasets, one synthetic and the other real, with varying levels of difficulty, against various baseline loss functions.
Abstract: We study the problem of multiset prediction. The goal of multiset prediction is to train a predictor that maps an input to a multiset consisting of multiple items. Unlike existing problems in supervised learning, such as classification, ranking and sequence generation, there is no known order among items in a target multiset, and each item in the multiset may appear more than once, making this problem extremely challenging. In this paper, we propose a novel multiset loss function by viewing this problem from the perspective of sequential decision making. The proposed multiset loss function is empirically evaluated on two families of datasets, one synthetic and the other real, with varying levels of difficulty, against various baseline loss functions including reinforcement learning, sequence, and aggregated distribution matching loss functions. The experiments reveal the effectiveness of the proposed loss function over the others.
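In the sequential view, the predictor emits one item at a time, and the per-step target is the uniform distribution over the items still remaining in the multiset. A sketch of that per-step loss; bookkeeping across steps is omitted:

```python
import numpy as np
from collections import Counter

def multiset_step_loss(log_probs, remaining):
    """Cross-entropy between the uniform distribution over items still
    in the target multiset and the model's predictive distribution.

    log_probs : dict item -> log-probability from the model
    remaining : Counter of items not yet predicted
    """
    total = sum(remaining.values())
    return -sum((cnt / total) * log_probs[item]
                for item, cnt in remaining.items())

remaining = Counter({"dog": 2, "cat": 1})
log_probs = {"dog": np.log(0.5), "cat": np.log(0.3), "bird": np.log(0.2)}
print(round(multiset_step_loss(log_probs, remaining), 3))
```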

Proceedings Article
15 Feb 2018
TL;DR: A method is introduced for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator; the effectiveness of the proposed algorithm is demonstrated on discrete image and character-based natural language generation.
Abstract: Generative adversarial networks are a learning framework that rely on training a discriminator to estimate a measure of difference between a target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable with respect to the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, which can be used to improve stability of training, and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning.
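The generator's policy gradient is weighted by a density ratio estimated from the discriminator. A minimal sketch of the importance-weight computation, self-normalized over a batch, with D giving the probability a sample is real:

```python
import numpy as np

def bgan_importance_weights(d_probs, eps=1e-7):
    """Importance weights w = D / (1 - D) for generated samples,
    normalized over the batch; these weight the policy gradient
    that trains the generator on discrete data."""
    d = np.clip(d_probs, eps, 1.0 - eps)
    w = d / (1.0 - d)
    return w / w.sum()

print(bgan_importance_weights(np.array([0.2, 0.5, 0.8])))
```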

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This work addresses concerns with a model that incorporates document covariates to estimate conditional word embedding distributions and allows for hypothesis tests about the meanings of terms, and assessments as to whether a word is near or far from another conditioned on different covariate values.
Abstract: Conventional word embedding models do not leverage information from document meta-data, and they do not model uncertainty. We address these concerns with a model that incorporates document covariates to estimate conditional word embedding distributions. Our model allows for (a) hypothesis tests about the meanings of terms, (b) assessments as to whether a word is near or far from another conditioned on different covariate values, and (c) assessments as to whether estimated differences are statistically significant.

Posted Content
TL;DR: This paper proposes a simple baseline method that allows the amount of copying to be controlled without retraining; experiments indicate that the method provides a strong baseline for abstractive systems looking to obtain high ROUGE scores while minimizing overlap with the source article.
Abstract: Attention-based neural abstractive summarization systems equipped with copy mechanisms have shown promising results. Despite this success, it has been noticed that such a system generates a summary by mostly, if not entirely, copying over phrases, sentences, and sometimes multiple consecutive sentences from an input paragraph, effectively performing extractive summarization. In this paper, we verify this behavior using the latest neural abstractive summarization system, a pointer-generator network. We propose a simple baseline method that allows us to control the amount of copying without retraining. Experiments indicate that the method provides a strong baseline for abstractive systems looking to obtain high ROUGE scores while minimizing overlap with the source article, substantially reducing the n-gram overlap with the original article while keeping within 2 points of the original model's ROUGE score.
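The amount of copying can be quantified directly as n-gram overlap with the source article. A small sketch of that measurement (the control method itself is not reproduced here):

```python
def ngram_overlap(summary, article, n=3):
    """Fraction of the summary's n-grams that also occur verbatim
    in the source article."""
    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    s, a = ngrams(summary.split(), n), ngrams(article.split(), n)
    return len(s & a) / len(s) if s else 0.0

article = "the quick brown fox jumps over the lazy dog"
print(ngram_overlap("the quick brown fox sleeps", article))  # 2/3
```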

Posted Content
19 Apr 2018
TL;DR: This work considers self-driving cars coordinating with each other and focuses on how communication influences the agents' collective behavior, finding that communication helps (most) with adverse conditions.
Abstract: Interest in emergent communication has recently surged in Machine Learning. The focus of this interest has largely been either on investigating the properties of the learned protocol or on utilizing emergent communication to better solve problems that already have a viable solution. Here, we consider self-driving cars coordinating with each other and focus on how communication influences the agents' collective behavior. Our main result is that communication helps (most) with adverse conditions.

15 Feb 2018
TL;DR: In this paper, the authors propose a new policy, called a nearest neighbor policy, which does not require any optimization for simple, low-dimensional continuous control tasks and allows them to investigate the underlying difficulty of a task without being distracted by optimization difficulty.
Abstract: We design a new policy, called a nearest neighbor policy, that does not require any optimization for simple, low-dimensional continuous control tasks. As this policy does not require any optimization, it allows us to investigate the underlying difficulty of a task without being distracted by the optimization difficulty of a learning algorithm. We propose two variants: one that retrieves an entire trajectory based on a pair of initial and goal states, and the other retrieving a partial trajectory based on a pair of current and goal states. We test the proposed policies on five widely-used benchmark continuous control tasks with a sparse reward: Reacher, Half Cheetah, Double Pendulum, Cart Pole and Mountain Car. We observe that the majority (the first four) of these tasks, which have been considered difficult, are easily solved by the proposed policies with high success rates, indicating that their reported difficulties may have likely been due to optimization difficulty. Our work suggests that it is necessary to evaluate any sophisticated policy learning algorithm on more challenging problems in order to truly assess the advances from it.
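The first variant reduces to retrieving the stored trajectory whose (initial, goal) pair is nearest to the query and replaying it. A minimal NumPy sketch; the Euclidean key and verbatim replay are assumptions consistent with the description above:

```python
import numpy as np

def nearest_neighbor_policy(buffer, start, goal):
    """Retrieve the stored trajectory whose (initial, goal) pair is
    closest to the query and replay its action sequence verbatim.

    buffer : list of ((initial_state, goal_state), actions)
    """
    key = np.concatenate([start, goal])
    dists = [np.linalg.norm(np.concatenate([s, g]) - key)
             for (s, g), _ in buffer]
    return buffer[int(np.argmin(dists))][1]

buffer = [((np.zeros(2), np.ones(2)), ["up", "right"]),
          ((np.ones(2), np.zeros(2)), ["down", "left"])]
print(nearest_neighbor_policy(buffer, np.zeros(2), np.ones(2)))
```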