Top 10 papers published by Aditya Khosla from Massachusetts Institute of Technology in 2016

Proceedings Article•DOI•

Learning Deep Features for Discriminative Localization

[...]

Bolei Zhou¹, Aditya Khosla¹, Agata Lapedriza¹, Aude Oliva¹, Antonio Torralba¹ - Show less +1 more•Institutions (1)

27 Jun 2016

TL;DR: This work revisits the global average pooling layer proposed in [13], and sheds light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on imagelevel labels.

...read moreread less

Abstract: In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on imagelevel labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving classification task1.

...read moreread less

5,978 citations

Posted Content•

Deep Learning for Identifying Metastatic Breast Cancer

[...]

Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, Andrew H. Beck - Show less +1 more

18 Jun 2016-arXiv: Quantitative Methods

TL;DR: The power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses is demonstrated, by combining the deep learning system's predictions with the human pathologist's diagnoses.

...read moreread less

Abstract: The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system's predictions with the human pathologist's diagnoses increased the pathologist's AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.

...read moreread less

739 citations

Journal Article•DOI•

Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence

[...]

Radoslaw Martin Cichy¹, Aditya Khosla¹, Dimitrios Pantazis², Antonio Torralba¹, Aude Oliva¹ - Show less +1 more•Institutions (2)

Massachusetts Institute of Technology¹, McGovern Institute for Brain Research²

10 Jun 2016-Scientific Reports

TL;DR: It was shown that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams and provided an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

...read moreread less

Abstract: The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans. However, the stage-wise computations therein remain poorly understood. Here, we compared temporal (magnetoencephalography) and spatial (functional MRI) visual brain representations with representations in an artificial deep neural network (DNN) tuned to the statistics of real-world visual recognition. We showed that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams. Further investigation of crucial DNN parameters revealed that while model architecture was important, training on real-world categorization was necessary to enforce spatio-temporal hierarchical relationships with the brain. Together our results provide an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

...read moreread less

600 citations

Posted Content•

Eye Tracking for Everyone

[...]

Kyle Krafka¹, Aditya Khosla, Petr Kellnhofer², Harini Kannan, Suchendra M. Bhandarkar¹, Wojciech Matusik, Antonio Torralba - Show less +3 more•Institutions (2)

University of Georgia¹, Max Planck Society²

18 Jun 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: iTracker, a convolutional neural network for eye tracking, is trained, which achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device.

...read moreread less

Abstract: From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1450 people consisting of almost 2.5M frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device. Our model achieves a prediction error of 1.71cm and 2.53cm without calibration on mobile phones and tablets respectively. With calibration, this is reduced to 1.34cm and 2.12cm. Further, we demonstrate that the features learned by iTracker generalize well to other datasets, achieving state-of-the-art results. The code, data, and models are available at this http URL.

...read moreread less

535 citations

Proceedings Article•DOI•

Eye Tracking for Everyone

[...]

Kyle Krafka¹, Aditya Khosla, Petr Kellnhofer², Harini Kannan, Suchendra M. Bhandarkar¹, Wojciech Matusik, Antonio Torralba - Show less +3 more•Institutions (2)

University of Georgia¹, Max Planck Society²

27 Jun 2016

TL;DR: Gaze Capture as mentioned in this paper is the first large-scale dataset for eye tracking, containing data from over 1450 people consisting of almost 2:5M frames and trained iTracker, a convolutional neural network, which achieves a significant reduction in error over previous approaches while running in real time (10-15fps) on a modern mobile device.

...read moreread less

Abstract: From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1450 people consisting of almost 2:5M frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10–15fps) on a modern mobile device. Our model achieves a prediction error of 1.71cm and 2.53cm without calibration on mobile phones and tablets respectively. With calibration, this is reduced to 1.34cm and 2.12cm. Further, we demonstrate that the features learned by iTracker generalize well to other datasets, achieving state-of-the-art results. The code, data, and models are available at http://gazecapture.csail.mit.edu.

...read moreread less

473 citations

Posted Content•

Places: An Image Database for Deep Scene Understanding

[...]

Bolei Zhou¹, Aditya Khosla², Agata Lapedriza², Antonio Torralba¹, Aude Oliva¹ - Show less +1 more•Institutions (2)

Massachusetts Institute of Technology¹, Open University of Catalonia²

06 Oct 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Places Database as discussed by the authors is a collection of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world.

...read moreread less

Abstract: The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories and attributes, comprising a quasi-exhaustive list of the types of environments encountered in the world. Using state of the art Convolutional Neural Networks, we provide impressive baseline performances at scene classification. With its high-coverage and high-diversity of exemplars, the Places Database offers an ecosystem to guide future progress on currently intractable visual recognition problems.

...read moreread less

364 citations

Posted Content•

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition

[...]

Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva¹ - Show less +1 more•Institutions (1)

Massachusetts Institute of Technology¹

12 Jan 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: It was shown that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams and provided an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

...read moreread less

Abstract: The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans. However, the stage-wise computations therein remain poorly understood. Here, we compared temporal (magnetoencephalography) and spatial (functional MRI) visual brain representations with representations in an artificial deep neural network (DNN) tuned to the statistics of real-world visual recognition. We showed that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams. Further investigation of crucial DNN parameters revealed that while model architecture was important, training on real-world categorization was necessary to enforce spatio-temporal hierarchical relationships with the brain. Together our results provide an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

...read moreread less

69 citations

Journal Article•

Visualizing Object Detection Features

[...]

Hamed Pirsiavash, Tomasz Malisiewicz, Carl Vondrick, Aditya Khosla, Antonio Torralba - Show less +1 more

01 Mar 2016-Springer US

TL;DR: In this article, the authors introduce algorithms to visualize feature spaces used by object detectors by inverting a visual feature back to multiple natural images and finding that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures.

...read moreread less

Abstract: We introduce algorithms to visualize feature spaces used by object detectors. Our method works by inverting a visual feature back to multiple natural images. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector’s failures. For example, when we visualize the features for high scoring false alarms, we discovered that, although they are clearly wrong in image space, they often look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and supports that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors without improving the features. By visualizing feature spaces, we can gain a more intuitive understanding of recognition systems.

...read moreread less

43 citations

Journal Article•DOI•

Visualizing Object Detection Features

[...]

Carl Vondrick¹, Aditya Khosla¹, Hamed Pirsiavash², Tomasz Malisiewicz, Antonio Torralba¹ - Show less +1 more•Institutions (2)

Massachusetts Institute of Technology¹, University of Maryland, Baltimore County²

01 Sep 2016-International Journal of Computer Vision

TL;DR: In this article, the authors introduce algorithms to visualize feature spaces used by object detectors by inverting a visual feature back to multiple natural images and finding that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures.

...read moreread less

Abstract: We introduce algorithms to visualize feature spaces used by object detectors. Our method works by inverting a visual feature back to multiple natural images. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector's failures. For example, when we visualize the features for high scoring false alarms, we discovered that, although they are clearly wrong in image space, they often look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and supports that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors without improving the features. By visualizing feature spaces, we can gain a more intuitive understanding of recognition systems.

...read moreread less

27 citations

Posted Content•

Following Gaze Across Views.

[...]

Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba

09 Dec 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: An approach for following gaze across views by predicting where a particular person is looking throughout a scene by building an end-to-end model that solves the following sub-problems: saliency, gaze pose, and geometric relationships between views.

...read moreread less

Abstract: Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one view with a person in it and a second view of the scene, our model estimates a density for gaze location in the second view. A key aspect of our approach is an end-to-end model that solves the following sub-problems: saliency, gaze pose, and geometric relationships between views. Although our model is supervised only with gaze, we show that the model learns to solve these subproblems automatically without supervision. Experiments suggest that our approach follows gaze better than standard baselines and produces plausible results for everyday situations.

...read moreread less

5 citations

Showing papers by "Aditya Khosla published in 2016"