Home
/
Authors
/
Nitish Srivastava

Author

Nitish Srivastava

Other affiliations: Indian Institute of Technology Kanpur, Cornell University, University of Toronto

Bio: Nitish Srivastava is an academic researcher from Apple Inc.. The author has contributed to research in topics: Generative model & Boltzmann machine. The author has an hindex of 22, co-authored 41 publications receiving 40184 citations. Previous affiliations of Nitish Srivastava include Indian Institute of Technology Kanpur & Cornell University.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations

[...]

Nitish Srivastava¹, Hanchen Jin¹, Shaden Smith², Hongbo Rong³, David H. Albonesi¹, Zhiru Zhang¹ - Show less +2 more•Institutions (3)

Cornell University¹, Microsoft², Intel³

01 Feb 2020

TL;DR: This work proposes a hardware accelerator that can accelerate both dense and sparse tensor factorizations and co-designs the hardware and a sparse storage format, which allows accessing the sparse data in vectorized and streaming fashion and maximizes the utilization of the memory bandwidth.

...read moreread less

Abstract: Tensor factorizations are powerful tools in many machine learning and data analytics applications. Tensors are often sparse, which makes sparse tensor factorizations memory bound. In this work, we propose a hardware accelerator that can accelerate both dense and sparse tensor factorizations. We co-design the hardware and a sparse storage format, which allows accessing the sparse data in vectorized and streaming fashion and maximizes the utilization of the memory bandwidth. We extract a common computation pattern that is found in numerous matrix and tensor operations and implement it in the hardware. By designing the hardware based on this common compute pattern, we can not only accelerate tensor factorizations but also mixed sparse-dense matrix operations. We show significant speedup and energy benefit over the state-of-the-art CPU and GPU implementations of tensor factorizations and over CPU, GPU and accelerators for matrix operations.

...read moreread less

77 citations

Posted Content•

Learning Generative Models with Visual Attention

[...]

Yichuan Tang¹, Nitish Srivastava¹, Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

20 Dec 2013-arXiv: Computer Vision and Pattern Recognition

TL;DR: A deep-learning based generative framework using attention that can robustly attend to the face region of novel test subjects and can learn generative models of new faces from a novel dataset of large images where the face locations are not known.

...read moreread less

Abstract: Attention has long been proposed by psychologists as important for effectively dealing with the enormous sensory stimulus available in the neocortex. Inspired by the visual attention models in computational neuroscience and the need of object-centric data for generative models, we describe for generative learning framework using attentional mechanisms. Attentional mechanisms can propagate signals from region of interest in a scene to an aligned canonical representation, where generative modeling takes place. By ignoring background clutter, generative models can concentrate their resources on the object of interest. Our model is a proper graphical model where the 2D Similarity transformation is a part of the top-down process. A ConvNet is employed to provide good initializations during posterior inference which is based on Hamiltonian Monte Carlo. Upon learning images of faces, our model can robustly attend to face regions of novel test subjects. More importantly, our model can learn generative models of new faces from a novel dataset of large images where the face locations are not known.

...read moreread less

70 citations

Proceedings Article•DOI•

Enriching textbooks through data mining

[...]

Rakesh Agrawal¹, Sreenivas Gollapudi¹, Krishnaram Kenthapadi¹, Nitish Srivastava², Raja Velu³ - Show less +1 more•Institutions (3)

Microsoft¹, Indian Institute of Technology Kanpur², Syracuse University³

17 Dec 2010

TL;DR: This work uses ideas from data mining for identifying the concepts that need augmentation as well as to determine the links to the authoritative content that should be used for augmentation in textbooks.

...read moreread less

Abstract: Textbooks play an important role in any educational system. Unfortunately, many textbooks produced in developing countries are not written well and they often lack adequate coverage of important concepts. We propose a technological solution to address this problem based on enriching textbooks with authoritative web content. We augment textbooks at the section level for key concepts discussed in the section. We use ideas from data mining for identifying the concepts that need augmentation as well as to determine the links to the authoritative content that should be used for augmentation. Our evaluation, employing textbooks from India, shows that we are able to enrich textbooks on different subjects and across different grades with high quality augmentations using automated techniques.

...read moreread less

55 citations

Proceedings Article•DOI•

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations

[...]

Nitish Srivastava¹, Hongbo Rong², Prithayan Barua³, Guanyu Feng⁴, Huanqi Cao⁴, Zhiru Zhang¹, David H. Albonesi¹, Vivek Sarkar³, Wenguang Chen⁴, Paul Petersen², Geoff Lowney², Adam W. Herr², Christopher J. Hughes², Timothy G. Mattson², Pradeep Dubey² - Show less +11 more•Institutions (4)

Cornell University¹, Intel², Georgia Institute of Technology³, Tsinghua University⁴

01 Apr 2019

TL;DR: A language and compilation framework for productively generating high-performance systolic arrays for dense tensor kernels on spatial architectures, including FPGAs and CGRAs, which decouples a functional specification from a spatial mapping, allowing programmers to quickly explore various spatial optimizations for the same function.

...read moreread less

Abstract: We present a language and compilation framework for productively generating high-performance systolic arrays for dense tensor kernels on spatial architectures, including FPGAs and CGRAs. It decouples a functional specification from a spatial mapping, allowing programmers to quickly explore various spatial optimizations for the same function. The actual implementation of these optimizations is left to a compiler. Thus, productivity and performance are achieved at the same time. We used this framework to implement several important dense tensor kernels. We implemented dense matrix multiply for an Arria-10 FPGA and a research CGRA, achieving 88% and 92% of the performance of manually written, and highly optimized expert (ninja") implementations in just 3% of their engineering time. Three other tensor kernels, including MTTKRP, TTM and TTMc, were also implemented with high performance and low design effort, and for the first time on spatial architectures."

...read moreread less

47 citations

Proceedings Article•

Capsules with Inverted Dot-Product Attention Routing

[...]

Yao-Hung Hubert Tsai¹, Nitish Srivastava², Hanlin Goh³, Ruslan Salakhutdinov¹•Institutions (3)

Carnegie Mellon University¹, Cornell University², Apple Inc.³

30 Apr 2020

TL;DR: A new routing algorithm for capsule networks is introduced, in which a child capsule is routed to a parent based only on agreement between the parent's state and the child's vote, which improves performance on benchmark datasets and performs at-par with a powerful CNN with 4x fewer parameters.

...read moreread less

Abstract: We introduce a new routing algorithm for capsule networks, in which a child capsule is routed to a parent based only on agreement between the parent's state and the child's vote. Unlike previously proposed routing algorithms, the parent's ability to reconstruct the child is not explicitly taken into account to update the routing probabilities. This simplifies the routing procedure and improves performance on benchmark datasets such as CIFAR-10 and CIFAR-100. The new mechanism 1) designs routing via inverted dot-product attention; 2) imposes Layer Normalization as normalization; and 3) replaces sequential iterative routing with concurrent iterative routing. Besides outperforming existing capsule networks, our model performs at-par with a powerful CNN (ResNet-18), using less than 25% of the parameters. On a different task of recognizing digits from overlayed digit images, the proposed capsule model performs favorably against CNNs given the same number of layers and neurons per layer. We believe that our work raises the possibility of applying capsule networks to complex real-world tasks.

...read moreread less

36 citations

…
1
2
3
4
5
6
7
…
8
9

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

[...]

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

...read moreread less

123,388 citations

Proceedings Article•

Adam: A Method for Stochastic Optimization

[...]

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

...read moreread less

111,197 citations

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

[...]

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

...read moreread less

73,978 citations

Posted Content•

Deep Residual Learning for Image Recognition

[...]

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

10 Dec 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

...read moreread less

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

...read moreread less

44,703 citations

Proceedings Article•DOI•

Going deeper with convolutions

[...]

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich - Show less +5 more•Institutions (3)

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

07 Jun 2015

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

...read moreread less

40,257 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse