
Showing papers by "Dumitru Erhan" published in 2018


Proceedings Article
01 Jan 2018
TL;DR: In this paper, a stochastic variational video prediction (SV2P) method is proposed that predicts a different possible future for each sample of its latent variables, providing stochastic multi-frame prediction for real-world video.
Abstract: Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world video. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication.
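The mechanism behind SV2P is a conditional variational objective: an inference network summarizes the whole clip into a latent variable, the frame predictor is conditioned on a sample of that latent, and training maximizes an ELBO-style loss. Below is a minimal PyTorch sketch of that objective, assuming stub `encoder` and `predictor` modules; the module names, shapes, and the KL weight are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of an SV2P-style training step: an inference network
# produces q(z | full video); the frame predictor is conditioned on a
# sample of z and trained with reconstruction + KL losses.
# All module names, shapes, and hyperparameters are illustrative.

class LatentInference(nn.Module):
    def __init__(self, feat_dim=128, z_dim=8):
        super().__init__()
        self.net = nn.Linear(feat_dim, 2 * z_dim)  # outputs mean and log-variance

    def forward(self, video_features):
        mu, logvar = self.net(video_features).chunk(2, dim=-1)
        return mu, logvar

def sv2p_step(encoder, inference, predictor, video, beta=1e-3):
    """One training step on a video tensor of shape (B, T, C, H, W)."""
    feats = encoder(video)                                # summarize the whole clip
    mu, logvar = inference(feats)                         # q(z | x_{0:T})
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    pred = predictor(video[:, :1], z)                     # predict future from context + z
    recon = F.mse_loss(pred, video[:, 1:])                # reconstruction term
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl                              # ELBO-style loss
```

At test time, z would instead be drawn from the prior N(0, I); each sample of z then yields a different predicted future, which is the source of the method's stochasticity.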

332 citations


Proceedings Article
01 Jan 2018
TL;DR: This work argues that explanation methods for neural nets should work reliably in the simplest limit, linear models, and proposes a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
Abstract: DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern, since linear models are simple neural networks. We argue that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models. Based on our analysis of linear models, we propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
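The linear-model argument can be made concrete. For a linear neuron y = w'x, the weight vector w mixes signal and distractor directions; the informative "pattern" is a = cov(x, y) / var(y), and attributing w * a (elementwise) rather than w alone recovers the signal contribution. A minimal NumPy sketch of that linear-case computation follows; the function and variable names are illustrative, and this covers only the linear case, not the paper's deep-network generalization.

```python
import numpy as np

# Sketch of the linear-model analysis behind PatternNet/PatternAttribution:
# for a linear neuron y = w.x, the gradient w is not the signal direction;
# the "pattern" a = cov(x, y) / var(y) is. Names are illustrative.

def linear_pattern(X, w):
    """X: (n_samples, n_features) data; w: (n_features,) learned weights."""
    y = X @ w
    X_c = X - X.mean(axis=0)
    y_c = y - y.mean()
    cov_xy = X_c.T @ y_c / len(y)          # per-feature covariance with the output
    var_y = y_c @ y_c / len(y)
    a = cov_xy / var_y                     # pattern: estimated signal direction
    attribution = w * a                    # PatternAttribution, linear case
    return a, attribution
```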

218 citations


Posted Content
TL;DR: An empirical measure of the approximate accuracy of feature importance estimates in deep neural networks is proposed, and it is shown that some approaches do no better than the underlying method but carry a far higher computational burden.
Abstract: We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are not better than a random designation of feature importance. Only certain ensemble-based approaches---VarGrad and SmoothGrad-Squared---outperform such a random assignment of importance. The manner of ensembling remains critical: we show that some approaches do no better than the underlying method but carry a far higher computational burden.
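The two ensemble estimators the abstract singles out are simple to state: both evaluate a base attribution method on many noisy copies of the input; SmoothGrad-Squared then averages the squared maps, while VarGrad takes their variance. A short sketch follows, assuming an arbitrary `saliency_fn` (e.g., plain input gradients); the sample count and noise scale are illustrative defaults, not the paper's settings.

```python
import torch

# Sketch of the two ensemble estimators: evaluate a base saliency method
# on noisy copies of the input, then aggregate. SmoothGrad-Squared averages
# the squared maps; VarGrad takes their variance across samples.

def ensemble_importance(saliency_fn, x, n_samples=15, sigma=0.15):
    maps = []
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)   # perturbed copy of the input
        maps.append(saliency_fn(noisy))
    maps = torch.stack(maps)                      # (n_samples, *x.shape)
    smoothgrad_sq = (maps ** 2).mean(dim=0)       # SmoothGrad-Squared
    vargrad = maps.var(dim=0)                     # VarGrad
    return smoothgrad_sq, vargrad
```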

114 citations


Posted Content
TL;DR: This work develops a novel training method that jointly trains the encoder, the predictor, and the decoder together without high-level supervision, and improves upon this by using an adversarial loss in the feature space to train the predictor.
Abstract: Much of recent research has been devoted to video prediction and generation, yet most of the previous works have demonstrated only limited success in generating videos on short-term horizons. The hierarchical video prediction method by Villegas et al. (2017) is an example of a state-of-the-art method for long-term video prediction, but their method is limited because it requires ground truth annotation of high-level structures (e.g., human joint landmarks) at training time. Our network encodes the input frame, predicts a high-level encoding into the future, and then a decoder with access to the first frame produces the predicted image from the predicted encoding. The decoder also produces a mask that outlines the predicted foreground object (e.g., person) as a by-product. Unlike Villegas et al. (2017), we develop a novel training method that jointly trains the encoder, the predictor, and the decoder together without high-level supervision; we further improve upon this by using an adversarial loss in the feature space to train the predictor. Our method can predict about 20 seconds into the future and provides better results compared to Denton and Fergus (2018) and Finn et al. (2016) on the Human 3.6M dataset.
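The pipeline the abstract describes (encode the frame, roll a predictor forward in the learned high-level space, decode each predicted encoding back to pixels with access to the first frame) can be sketched as follows. The modules are stubs, and the decoder interface returning an image plus a foreground mask is an assumption based on the abstract, not the authors' code.

```python
import torch
import torch.nn as nn

# Sketch of the unsupervised hierarchical prediction pipeline: encode,
# predict in feature space, decode with first-frame access. Module
# internals, shapes, and names are illustrative stubs.

class HierarchicalPredictor(nn.Module):
    def __init__(self, encoder, predictor, decoder):
        super().__init__()
        self.encoder, self.predictor, self.decoder = encoder, predictor, decoder

    def forward(self, frames, horizon):
        """frames: (B, T_obs, C, H, W) observed context frames."""
        h = self.encoder(frames[:, -1])                # high-level encoding of last frame
        outputs = []
        for _ in range(horizon):
            h = self.predictor(h)                      # step the encoding into the future
            img, mask = self.decoder(h, frames[:, 0])  # decode with first-frame access;
            outputs.append(img)                        # mask outlines the foreground
        return torch.stack(outputs, dim=1)             # (B, horizon, C, H, W)
```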

82 citations


Proceedings Article
03 Jul 2018
TL;DR: In this paper, the authors propose a hierarchical video prediction method that jointly trains the encoder, the predictor, and the decoder together without high-level supervision, and further improves upon this by using an adversarial loss in the feature space to train the predictor.
Abstract: Much of recent research has been devoted to video prediction and generation, yet most of the previous works have demonstrated only limited success in generating videos on short-term horizons. The hierarchical video prediction method by Villegas et al. (2017) is an example of a state-of-the-art method for long-term video prediction, but their method is limited because it requires ground truth annotation of high-level structures (e.g., human joint landmarks) at training time. Our network encodes the input frame, predicts a high-level encoding into the future, and then a decoder with access to the first frame produces the predicted image from the predicted encoding. The decoder also produces a mask that outlines the predicted foreground object (e.g., person) as a by-product. Unlike Villegas et al. (2017), we develop a novel training method that jointly trains the encoder, the predictor, and the decoder together without high-level supervision; we further improve upon this by using an adversarial loss in the feature space to train the predictor. Our method can predict about 20 seconds into the future and provides better results compared to Denton and Fergus (2018) and Finn et al. (2016) on the Human 3.6M dataset.
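This conference version also highlights the adversarial loss in feature space used to train the predictor. A minimal sketch of what such a loss could look like: a small discriminator D is trained to separate encodings of real future frames from predicted encodings, and the predictor is trained to fool it. This is a generic non-saturating GAN formulation chosen for illustration, not necessarily the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

# Sketch of an adversarial loss in feature space: D discriminates real
# future-frame encodings from predicted ones. Names are illustrative.

def feature_gan_losses(D, h_real, h_pred):
    """h_real: encoder(real future frame); h_pred: predictor output."""
    logits_real = D(h_real)
    logits_fake = D(h_pred.detach())      # detach: D's loss must not update the predictor
    d_loss = (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
              + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))
    logits_gen = D(h_pred)                # predictor is trained to fool D
    g_loss = F.binary_cross_entropy_with_logits(logits_gen, torch.ones_like(logits_gen))
    return d_loss, g_loss                 # optimize D with d_loss, the predictor with g_loss
```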

77 citations


Posted Content
TL;DR: ROAR (RemOve And Retrain) is introduced, a benchmark to evaluate the accuracy of interpretability methods that estimate input feature importance in deep neural networks; averaging a set of squared noisy estimators leads to significant gains in accuracy for each method considered.
Abstract: Estimating the influence of a given feature to a model prediction is challenging. We introduce ROAR, RemOve And Retrain, a benchmark to evaluate the accuracy of interpretability methods that estimate input feature importance in deep neural networks. We remove a fraction of input features deemed to be most important according to each estimator and measure the change to the model accuracy upon retraining. The most accurate estimator will identify inputs as important whose removal causes the most damage to model performance relative to all other estimators. This evaluation produces thought-provoking results -- we find that several estimators are less accurate than a random assignment of feature importance. However, averaging a set of squared noisy estimators (a variant of a technique proposed by Smilkov et al. (2017)) leads to significant gains in accuracy for each method considered and far outperforms such a random guess.
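The ROAR protocol itself is straightforward to sketch: rank input features by an estimator's importance, replace the top fraction with an uninformative value, retrain the model from scratch on the degraded data, and measure the accuracy drop. In the sketch below, `estimator` and `train_and_eval` are hypothetical stand-ins for an attribution method and a full training-plus-evaluation run.

```python
import numpy as np

# Sketch of the ROAR benchmark: mask the pixels an estimator deems most
# important, retrain from scratch on the masked data, and compare test
# accuracy. `estimator` and `train_and_eval` are hypothetical stand-ins.

def roar_score(estimator, train_and_eval, images, labels, fraction=0.3):
    degraded = []
    for img in images:
        importance = estimator(img)                   # per-pixel importance scores
        k = int(fraction * importance.size)
        top = np.argsort(importance.ravel())[-k:]     # indices of most important pixels
        masked = img.copy().ravel()
        masked[top] = images.mean()                   # replace with an uninformative value
        degraded.append(masked.reshape(img.shape))
    return train_and_eval(np.stack(degraded), labels)  # retrained test accuracy
```

Under this protocol, a more accurate estimator removes more informative pixels, so the retrained accuracy should drop further; retraining is what separates ROAR from simply masking inputs at test time, since it controls for the distribution shift the masking introduces.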

74 citations