Author

Aditya Khosla

Other affiliations: Stanford University, Open University of Catalonia

Bio: Aditya Khosla is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research on topics including object detection and the cognitive neuroscience of visual object recognition. The author has an h-index of 39 and has co-authored 61 publications, which have received 50,417 citations. Previous affiliations of Aditya Khosla include Stanford University and the Open University of Catalonia.
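
For readers unfamiliar with the metric quoted in the bio, the h-index is the largest number h such that h of an author's papers each have at least h citations. The sketch below is a minimal illustration of that definition; the function name `h_index` is an assumption made here, and the sample input uses only the five citation counts listed on this page, not the author's full record.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Citation counts of the five papers shown on this page only (illustrative input).
listed_counts = [83, 80, 80, 75, 69]
print(h_index(listed_counts))
```

Because only five papers are listed here, this sample cannot reproduce the reported value of 39; that would require the citation counts of all 61 publications.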

Papers published on a yearly basis (chart; years shown: 2008 to 2022)

Papers

Journal Article•DOI•

A Machine Learning Approach Enables Quantitative Measurement of Liver Histology and Disease Monitoring in NASH.

Amaro Taylor-Weiner, Harsha Pokkalla, Ling Han, Catherine Jia, Ryan S Huss, Chuhan Chung, Hunter L. Elliott, Benjamin Glass, Kishalve Pethia, Oscar Carrasco-Zevallos, Chinmay Shukla, Urmila Khettry¹, Robert Najarian, Ross Taliano², G. Mani Subramanian, Robert P. Myers, Ilan Wapinski, Aditya Khosla, Murray B. Resnick², Michael Christopher Montalto, Quentin M. Anstee³, Vincent Wai-Sun Wong⁴, Michael Trauner⁵, Eric Lawitz⁶, Stephen A. Harrison⁷, Takeshi Okanoue, Manuel Romero-Gómez, Zachary Goodman⁸, Rohit Loomba⁹, Andrew H. Beck, Zobair M. Younossi⁸•Institutions (9)

Lahey Hospital & Medical Center¹, Brown University², Newcastle University³, The Chinese University of Hong Kong⁴, Medical University of Vienna⁵, University of Texas Health Science Center at San Antonio⁶, Pinnacle Financial Partners⁷, Inova Health System⁸, University of California, San Diego⁹

11 Feb 2021-Hepatology

TL;DR: There is a critical need for improved tools to assess liver pathology in order to risk stratify NASH patients and monitor treatment response.

83 citations

Journal Article•DOI•

Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes.

James A. Diao¹, Jason K Wang¹, Wan Fung Chui¹, Victoria Mountain, Sai Chowdary Gullapally, Ramprakash Srinivasan, Richard N. Mitchell¹,², Benjamin Glass, Sara Hoffman, Sudha K. Rao, Chirag Maheshwari, Abhik Lahiri, Aaditya Prakash, Ryan McLoughlin, Jennifer K. Kerner, Murray B. Resnick³, Michael Christopher Montalto, Aditya Khosla, Ilan Wapinski, Andrew H. Beck, Hunter L. Elliott, Amaro Taylor-Weiner•Institutions (3)

Harvard University¹, Brigham and Women's Hospital², Brown University³

12 Mar 2021-Nature Communications

TL;DR: This article proposes an approach based on human-interpretable image features (HIFs) for predicting clinically relevant molecular phenotypes from whole-slide histopathology images.

Abstract: Computational methods have made substantial progress in improving the accuracy and throughput of pathology workflows for diagnostic, prognostic, and genomic prediction. Still, lack of interpretability remains a significant barrier to clinical integration. We present an approach for predicting clinically-relevant molecular phenotypes from whole-slide histopathology images using human-interpretable image features (HIFs). Our method leverages >1.6 million annotations from board-certified pathologists across >5700 samples to train deep learning models for cell and tissue classification that can exhaustively map whole-slide images at two and four micron-resolution. Cell- and tissue-type model outputs are combined into 607 HIFs that quantify specific and biologically-relevant characteristics across five cancer types. We demonstrate that these HIFs correlate with well-known markers of the tumor microenvironment and can predict diverse molecular signatures (AUROC 0.601–0.864), including expression of four immune checkpoint proteins and homologous recombination deficiency, with performance comparable to ‘black-box’ methods. Our HIF-based approach provides a comprehensive, quantitative, and interpretable window into the composition and spatial architecture of the tumor microenvironment.

80 citations

Posted Content•

3D ShapeNets: A Deep Representation for Volumetric Shapes

Zhirong Wu¹, Shuran Song¹, Aditya Khosla², Fisher Yu¹, Linguang Zhang¹, Xiaoou Tang³, Jianxiong Xiao¹•Institutions (3)

Princeton University¹, Massachusetts Institute of Technology², The Chinese University of Hong Kong³

22 Jun 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network.

Abstract: 3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape representation in the loop. Apart from category recognition, recovering full 3D shapes from view-based 2.5D depth maps is also a critical part of visual understanding. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically. It naturally supports joint object recognition and shape completion from 2.5D depth maps, and it enables active object recognition through view planning. To train our 3D deep learning model, we construct ModelNet, a large-scale 3D CAD model dataset. Extensive experiments show that our 3D deep representation enables significant performance improvement over the state of the art in a variety of tasks.

80 citations
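
The voxel-grid representation mentioned in the 3D ShapeNets abstract can be illustrated with a small sketch. It only shows how a set of 3D points might be turned into binary occupancy variables on a grid; it is not the Convolutional Deep Belief Network itself, which models a probability distribution over such variables. The function names, the assumption that coordinates lie in the unit cube [0, 1], and the grid resolution are all hypothetical choices made for this example.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[int]]]


def voxelize(points: Iterable[Point], resolution: int) -> Grid:
    """Map points with coordinates in [0, 1] onto a binary occupancy grid.

    A voxel is 1 if at least one point falls inside it and 0 otherwise.
    """
    grid = [[[0] * resolution for _ in range(resolution)] for _ in range(resolution)]
    for x, y, z in points:
        # Scale each coordinate to a voxel index; clamp 1.0 into the last voxel.
        i, j, k = (min(int(c * resolution), resolution - 1) for c in (x, y, z))
        grid[i][j][k] = 1
    return grid


def count_occupied(grid: Grid) -> int:
    """Count the voxels that are marked as occupied."""
    return sum(v for plane in grid for row in plane for v in row)


# Two nearby points share a voxel; the third lands in the opposite corner.
sample_points = [(0.10, 0.10, 0.10), (0.12, 0.10, 0.10), (0.90, 0.90, 0.90)]
sample_grid = voxelize(sample_points, resolution=4)
print(count_occupied(sample_grid))
```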

Proceedings Article•DOI•

Following Gaze in Video

Adrià Recasens¹, Carl Vondrick¹, Aditya Khosla², Antonio Torralba¹•Institutions (2)

Massachusetts Institute of Technology¹, Open University of Catalonia²

01 Oct 2017

TL;DR: An approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame, using VideoGaze, a new dataset which is used as a benchmark to both train and evaluate models.

Abstract: Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density for gaze location in every frame and the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates: saliency, gaze pose, and geometric relationships between views while only using gaze as supervision. Visualizations suggest that the model learns to internally solve these intermediate tasks automatically without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video.

75 citations

Posted Content•

Deep Neural Networks predict Hierarchical Spatio-temporal Cortical Dynamics of Human Visual Object Recognition

Radoslaw Martin Cichy, Aditya Khosla, Dimitrios Pantazis, Antonio Torralba, Aude Oliva¹•Institutions (1)

Massachusetts Institute of Technology¹

12 Jan 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: It was shown that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams and provided an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

Abstract: The complex multi-stage architecture of cortical visual pathways provides the neural basis for efficient visual object recognition in humans. However, the stage-wise computations therein remain poorly understood. Here, we compared temporal (magnetoencephalography) and spatial (functional MRI) visual brain representations with representations in an artificial deep neural network (DNN) tuned to the statistics of real-world visual recognition. We showed that the DNN captured the stages of human visual processing in both time and space from early visual areas towards the dorsal and ventral streams. Further investigation of crucial DNN parameters revealed that while model architecture was important, training on real-world categorization was necessary to enforce spatio-temporal hierarchical relationships with the brain. Together our results provide an algorithmically informed view on the spatio-temporal dynamics of visual object recognition in the human visual brain.

69 citations

Cited by

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations
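
The residual reformulation described in the abstract above, in which layers learn a residual function F(x) that is added back to the layer input x, can be sketched in a few lines. This is a toy fully connected version in plain Python, written only to show the y = F(x) + x structure; the paper's networks are convolutional, any nonlinearity applied after the addition is omitted here, and the names `linear`, `relu`, and `residual_block` are assumptions made for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute y = F(x) + x, where F is a small two-layer function."""
    fx = linear(w2, relu(linear(w1, x)))  # the learned residual F(x)
    return [f + a for f, a in zip(fx, x)]  # identity shortcut adds the input back


# With all-zero weights the residual F(x) is zero, so the block is the identity.
x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))
```

The zero-weight case shows why the shortcut helps: the block starts from the identity mapping and only has to learn a correction to it.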

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

49,914 citations

Posted Content•

Deep Residual Learning for Image Recognition

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

10 Dec 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

44,703 citations

Proceedings Article•DOI•

Going deeper with convolutions

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich•Institutions (3)

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

07 Jun 2015

TL;DR: This paper proposes Inception, a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations
