Home
/
Authors
/
Ram Nevatia

Author

Ram Nevatia

Other affiliations: Facebook

Bio: Ram Nevatia is an academic researcher from University of Southern California. The author has contributed to research in topics: Object detection & Computer science. The author has an hindex of 35, co-authored 100 publications receiving 5841 citations. Previous affiliations of Ram Nevatia include Facebook.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2004
2003
2000
1998
1989
1988
1987
1979
1977

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors

[...]

Bo Wu¹, Ram Nevatia¹•Institutions (1)

University of Southern California¹

01 Nov 2007-International Journal of Computer Vision

TL;DR: This work presents an approach to automatically detect and track multiple, possibly partially occluded humans in a walking or standing pose from a single camera, which may be stationary or moving.

...read moreread less

Abstract: Detection and tracking of humans in video streams is important for many applications. We present an approach to automatically detect and track multiple, possibly partially occluded humans in a walking or standing pose from a single camera, which may be stationary or moving. A human body is represented as an assembly of body parts. Part detectors are learned by boosting a number of weak classifiers which are based on edgelet features. Responses of part detectors are combined to form a joint likelihood model that includes an analysis of possible occlusions. The combined detection responses and the part detection responses provide the observations used for tracking. Trajectory initialization and termination are both automatic and rely on the confidences computed from the detection responses. An object is tracked by data association and meanshift methods. Our system can track humans with both inter-object and scene occlusions with static or non-static backgrounds. Evaluation results on a number of images and videos and comparisons with some previous methods are given.

...read moreread less

836 citations

Proceedings Article•DOI•

Learning to associate: HybridBoosted multi-target tracker for crowded scene

[...]

Yuan Li¹, Chang Huang¹, Ram Nevatia¹•Institutions (1)

University of Southern California¹

20 Jun 2009

TL;DR: A learning-based hierarchical approach of multi-target tracking from a single camera by progressively associating detection responses into longer and longer track fragments (tracklets) and finally the desired target trajectories by virtue of a HybridBoost algorithm.

...read moreread less

Abstract: We propose a learning-based hierarchical approach of multi-target tracking from a single camera by progressively associating detection responses into longer and longer track fragments (tracklets) and finally the desired target trajectories. To define tracklet affinity for association, most previous work relies on heuristically selected parametric models; while our approach is able to automatically select among various features and corresponding non-parametric models, and combine them to maximize the discriminative power on training data by virtue of a HybridBoost algorithm. A hybrid loss function is used in this algorithm because the association of tracklet is formulated as a joint problem of ranking and classification: the ranking part aims to rank correct tracklet associations higher than other alternatives; the classification part is responsible to reject wrong associations when no further association should be done. Experiments are carried out by tracking pedestrians in challenging datasets. We compare our approach with state-of-the-art algorithms to show its improvement in terms of tracking accuracy.

...read moreread less

637 citations

Journal Article•DOI•

Video-based event recognition: activity representation and probabilistic recognition methods

[...]

Somboon Hongeng¹, Ram Nevatia¹, Francois Bremond¹•Institutions (1)

University of Southern California¹

01 Nov 2004-Computer Vision and Image Understanding

TL;DR: A new representation and recognition method for human activities that recognizes multi-agent events by propagating the constraints and likelihood of event threads in a temporal logic network and presents results on real-world data and performance characterization on perturbed data.

...read moreread less

351 citations

Proceedings Article•DOI•

An online learned CRF model for multi-target tracking

[...]

Bo Yang¹, Ram Nevatia¹•Institutions (1)

University of Southern California¹

16 Jun 2012

TL;DR: The online CRF approach is more powerful at distinguishing spatially close targets with similar appearances, as well as in dealing with camera motions, and an efficient algorithm is introduced for finding an association with low energy cost.

...read moreread less

Abstract: We introduce an online learning approach for multitarget tracking Detection responses are gradually associated into tracklets in multiple levels to produce final tracks Unlike most previous approaches which only focus on producing discriminative motion and appearance models for all targets, we further consider discriminative features for distinguishing difficult pairs of targets The tracking problem is formulated using an online learned CRF model, and is transformed into an energy minimization problem The energy functions include a set of unary functions that are based on motion and appearance models for discriminating all targets, as well as a set of pairwise functions that are based on models for differentiating corresponding pairs of tracklets The online CRF approach is more powerful at distinguishing spatially close targets with similar appearances, as well as in dealing with camera motions An efficient algorithm is introduced for finding an association with low energy cost We evaluate our approach on three public data sets, and show significant improvements compared with several state-of-art methods

...read moreread less

344 citations

Journal Article•DOI•

Detecting buildings in aerial images

[...]

A. Huertas¹, Ram Nevatia¹•Institutions (1)

University of Southern California¹

01 Feb 1988-Graphical Models \/graphical Models and Image Processing \/computer Vision, Graphics, and Image Processing

TL;DR: Detecting building structures in aerial images by using a generic model of the shapes of the structures looking for — that they are rectangular or composed of rectangular components to confirm their presence and to estimate their height.

...read moreread less

Abstract: Detecting building structures in aerial images is a task of importance for many applications. Low-level segmentation rarely gives a complete outline of the desired structures. We use a generic model of the shapes of the structures we are looking for — that they are rectangular or composed of rectangular components. We also use shadows cast by buildings to confirm their presence and to estimate their height. Our techniques have been tested on images with density typical of suburban areas.

...read moreread less

322 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

[...]

Liang-Chieh Chen¹, George Papandreou¹, Iasonas Kokkinos², Kevin Murphy¹, Alan L. Yuille³ - Show less +1 more•Institutions (3)

Google¹, University College London², Johns Hopkins University³

01 Apr 2018-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work addresses the task of semantic image segmentation with Deep Learning and proposes atrous spatial pyramid pooling (ASPP), which is proposed to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.

...read moreread less

Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First , we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second , we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third , we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed “DeepLab” system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

...read moreread less

11,856 citations

Posted Content•

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

[...]

Liang-Chieh Chen¹, George Papandreou¹, Iasonas Kokkinos², Kevin Murphy¹, Alan L. Yuille³ - Show less +1 more•Institutions (3)

Google¹, University College London², Johns Hopkins University³

02 Jun 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: DeepLab as discussed by the authors proposes atrous spatial pyramid pooling (ASPP) to segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views.

...read moreread less

Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

...read moreread less

10,120 citations

Journal Article•DOI•

Object Detection With Deep Learning: A Review

[...]

Zhong-Qiu Zhao¹, Peng Zheng¹, Shou-Tao Xu¹, Xindong Wu²•Institutions (2)

Hefei University of Technology¹, University of Louisiana at Lafayette²

28 Jan 2019-IEEE Transactions on Neural Networks

TL;DR: In this article, a review of deep learning-based object detection frameworks is provided, focusing on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.

...read moreread less

Abstract: Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.

...read moreread less

3,097 citations

Journal Article•DOI•

A survey of advances in vision-based human motion capture and analysis

[...]

Thomas B. Moeslund¹, Adrian Hilton², Volker Krüger³•Institutions (3)

Aalborg University¹, University of Surrey², Aalborg University – Copenhagen³

01 Nov 2006-Computer Vision and Image Understanding

TL;DR: This survey reviews recent trends in video-based human capture and analysis, as well as discussing open problems for future research to achieve automatic visual analysis of human movement.

...read moreread less

2,738 citations

Journal Article•DOI•

National Institute of Standards and Technology における超伝導研究及び生活

[...]

尚島影

01 Oct 2001-Ieej Transactions on Fundamentals and Materials

2,687 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse