Author

Haroon Idrees

Bio: Haroon Idrees is an academic researcher from the University of Sargodha. The author has contributed to research in the topics of Frame (networking) and TRECVID. The author has an h-index of 22 and has co-authored 53 publications receiving 2,517 citations. Previous affiliations of Haroon Idrees include International Islamic University, Islamabad and the University of Central Florida.


Papers
Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work estimates counts in an image region by combining multiple sources, including low-confidence head detections, repetition of texture elements, and frequency-domain analysis, along with the confidence of observing individuals, and enforces a global consistency constraint on counts using a Markov Random Field.
Abstract: We propose to leverage multiple sources of information to compute an estimate of the number of individuals present in an extremely dense crowd visible in a single image. Due to problems including perspective, occlusion, clutter, and few pixels per person, counting by human detection in such images is almost impossible. Instead, our approach relies on multiple sources such as low-confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with the confidence associated with observing individuals, in an image region. Second, we employ a global consistency constraint on counts using a Markov Random Field, which caters for disparity in counts in local neighborhoods and across scales. We tested our approach on a new dataset of fifty crowd images containing 64K annotated humans, with head counts ranging from 94 to 4,543. This is in stark contrast to datasets used by existing methods, which contain no more than tens of individuals. We experimentally demonstrate the efficacy and reliability of the proposed approach by quantifying the counting performance.
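As a rough illustration of the fusion-plus-consistency idea, the sketch below is my own minimal Python, not the authors' code; the confidence-weighted fusion, the quadratic smoothing energy, and the grid size are all assumptions standing in for the paper's MRF inference.

```python
# Sketch: fuse per-patch count estimates from several sources, then
# smooth them with a simple MRF-style neighborhood energy (assumed form).
import numpy as np

def fuse_counts(estimates, confidences):
    """Confidence-weighted fusion of per-patch counts.
    estimates, confidences: arrays of shape (num_sources, H, W)."""
    w = confidences / (confidences.sum(axis=0, keepdims=True) + 1e-8)
    return (w * estimates).sum(axis=0)

def mrf_smooth(counts, lam=0.5, iters=50):
    """Minimize sum (c - c0)^2 + lam * sum_{neighbors} (c_i - c_j)^2
    by Jacobi iterations; a stand-in for the paper's MRF inference."""
    c0, c = counts.copy(), counts.copy()
    for _ in range(iters):
        nb = np.zeros_like(c); deg = np.zeros_like(c)
        nb[1:] += c[:-1]; deg[1:] += 1
        nb[:-1] += c[1:]; deg[:-1] += 1
        nb[:, 1:] += c[:, :-1]; deg[:, 1:] += 1
        nb[:, :-1] += c[:, 1:]; deg[:, :-1] += 1
        c = (c0 + lam * nb) / (1.0 + lam * deg)
    return c

# Toy usage: three sources on a 4x4 grid of patches.
rng = np.random.default_rng(0)
est = rng.uniform(50, 120, size=(3, 4, 4))
conf = rng.uniform(0.2, 1.0, size=(3, 4, 4))
patch_counts = mrf_smooth(fuse_counts(est, conf))
print("total count:", patch_counts.sum())
```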

897 citations

Book ChapterDOI
08 Sep 2018
TL;DR: A novel approach is proposed that simultaneously solves the problems of counting, density map estimation, and localization of people in a given dense crowd image; it significantly outperforms the state of the art on the new dataset, the most challenging to date, with the largest number of crowd annotations and the most diverse set of scenes.
Abstract: With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision. In particular, counting in highly dense crowds is a challenging problem with far-reaching applicability in crowd safety and management, as well as in gauging the political significance of protests and demonstrations. In this paper, we propose a novel approach that simultaneously solves the problems of counting, density map estimation, and localization of people in a given dense crowd image. Our formulation is based on the important observation that the three problems are inherently related to each other, making the loss function for optimizing a deep CNN decomposable. Since localization requires high-quality images and annotations, we introduce the UCF-QNRF dataset, which overcomes the shortcomings of previous datasets and contains 1.25 million humans manually marked with dot annotations. Finally, we present evaluation measures and a comparison with recent deep CNNs, including those developed specifically for crowd counting. Our approach significantly outperforms the state of the art on the new dataset, which is the most challenging dataset with the largest number of crowd annotations in the most diverse set of scenes.
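The abstract's key claim is that relating the three tasks makes the training loss decomposable. Below is a minimal PyTorch sketch of what such a composition loss could look like; the scale pyramid, the count weight, and the function name are my assumptions, not the paper's released formulation.

```python
# Sketch: a decomposable loss tying density estimation, localization
# (via multi-scale density agreement), and counting together.
import torch
import torch.nn.functional as F

def composition_loss(pred_density, gt_density, count_weight=0.01):
    """pred_density, gt_density: (B, 1, H, W) density maps whose sums
    equal the person counts. Combines pixel-wise density losses at
    several scales with a count regression term."""
    loss = 0.0
    for scale in (1, 2, 4):  # assumed pyramid of resolutions
        p = F.avg_pool2d(pred_density, scale) * scale**2  # preserve sums
        g = F.avg_pool2d(gt_density, scale) * scale**2
        loss = loss + F.mse_loss(p, g)
    pred_count = pred_density.sum(dim=(1, 2, 3))
    gt_count = gt_density.sum(dim=(1, 2, 3))
    return loss + count_weight * F.l1_loss(pred_count, gt_count)
```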

579 citations

Journal ArticleDOI
TL;DR: The THUMOS benchmark is described in detail and an overview of data collection and annotation procedures is given, including a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

415 citations

Book ChapterDOI
05 Sep 2010
TL;DR: This paper divides the scene into grid cells, solves the tracking problem optimally within each cell using bipartite graph matching and then links tracks across cells, and uses median background modeling, which requires few frames to obtain a workable model.
Abstract: In this paper, we tackle the problem of object detection and tracking in the new and challenging domain of wide area surveillance. This problem poses several challenges: large camera motion, strong parallax, a large number of moving objects, a small number of pixels on target, single-channel data, and low video framerate. We propose a method that overcomes these challenges and evaluate it on the CLIF dataset. We use median background modeling, which requires few frames to obtain a workable model. We remove false detections due to parallax and registration errors using gradient information of the background image. In order to keep the complexity of the tracking problem manageable, we divide the scene into grid cells, solve the tracking problem optimally within each cell using bipartite graph matching, and then link tracks across cells. Besides tractability, grid cells allow us to define a set of local scene constraints, such as road orientation and object context. We use these constraints as part of the cost function for the tracking problem, which allows us to track fast-moving objects in low-framerate videos. In addition, we manually generated ground truth for four sequences and performed a quantitative evaluation of the proposed algorithm.
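To illustrate the per-cell step, here is a short Python sketch of optimal detection-to-track assignment with the Hungarian algorithm; the function name, the pure-distance cost, and the gating threshold are my assumptions (the paper's cost additionally encodes road orientation and object context).

```python
# Sketch: optimal bipartite matching between tracks and detections
# inside one grid cell.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_in_cell(track_positions, det_positions, max_dist=30.0):
    """Optimal matching between existing tracks and new detections.
    Positions are (N, 2) pixel coordinates. Returns (track_idx, det_idx)
    pairs whose cost passes a gating distance."""
    if len(track_positions) == 0 or len(det_positions) == 0:
        return []
    # Cost here is plain Euclidean distance; scene constraints would be
    # added as extra cost terms.
    cost = np.linalg.norm(
        track_positions[:, None, :] - det_positions[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]

# Toy usage
tracks = np.array([[10.0, 10.0], [50.0, 40.0]])
dets = np.array([[12.0, 11.0], [48.0, 43.0], [200.0, 200.0]])
print(match_in_cell(tracks, dets))  # [(0, 0), (1, 1)]
```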

209 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: The key idea is to learn a query-specific generative model on the features and tags of nearest neighbors using the proposed NMF-KNN approach, which imposes a consensus constraint on the coefficient matrices across different features to solve the problem of feature fusion.
Abstract: Real-world image databases such as Flickr are characterized by the continuous addition of new images. Recent approaches for image annotation, i.e., the problem of assigning tags to images, have two major drawbacks. First, either models are learned using the entire training data, or, to handle the issue of dataset imbalance, tag-specific discriminative models are trained. Such models become obsolete and require relearning when new images and tags are added to the database. Second, the task of feature fusion is typically handled with ad hoc approaches. In this paper, we present a weighted extension of Multi-view Non-negative Matrix Factorization (NMF) to address the aforementioned drawbacks. The key idea is to learn a query-specific generative model on the features and tags of nearest neighbors using the proposed NMF-KNN approach, which imposes a consensus constraint on the coefficient matrices across different features. This forces the coefficient vectors across features to be consistent and thus naturally solves the problem of feature fusion, while the weight matrices introduced in the proposed formulation alleviate the issue of dataset imbalance. Furthermore, our approach, being query-specific, is unaffected by the addition of images and tags to a database. We tested our method on two datasets used for the evaluation of image annotation and obtained competitive results.
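To make the consensus idea concrete, here is a minimal NumPy sketch of multi-view NMF with a consensus constraint on the coefficient matrices. The multiplicative updates, the regularizer weight mu, and the use of the mean as the consensus are my assumptions; the actual NMF-KNN formulation additionally includes the weight matrices the abstract mentions.

```python
# Sketch: multi-view NMF where each view's coefficient matrix H_v is
# pulled toward a shared consensus H_star.
import numpy as np

def multiview_nmf(views, rank=10, mu=0.1, iters=200, seed=0):
    """views: list of nonnegative matrices X_v (d_v x n), one per feature.
    Factorizes X_v ~ W_v @ H_v with a consensus penalty mu*||H_v - H_star||^2.
    Returns (Ws, Hs, H_star)."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    Ws = [rng.random((X.shape[0], rank)) for X in views]
    Hs = [rng.random((rank, n)) for _ in views]
    eps = 1e-9
    for _ in range(iters):
        H_star = np.mean(Hs, axis=0)  # consensus coefficients
        for v, X in enumerate(views):
            W, H = Ws[v], Hs[v]
            # Lee-Seung multiplicative updates with the consensus term.
            W *= (X @ H.T) / (W @ H @ H.T + eps)
            H *= (W.T @ X + mu * H_star) / (W.T @ W @ H + mu * H + eps)
    return Ws, Hs, np.mean(Hs, axis=0)
```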

137 citations


Cited by
Proceedings ArticleDOI
Yingying Zhang, Desen Zhou, Siqin Chen, Shenghua Gao, Yi Ma
27 Jun 2016
TL;DR: With the proposed simple MCNN model, the method outperforms all existing methods, and experiments show that the model, once trained on one dataset, can be readily transferred to a new dataset.
Abstract: This paper aims to develop a method that can accurately estimate the crowd count from an individual image with arbitrary crowd density and arbitrary perspective. To this end, we have proposed a simple but effective Multi-column Convolutional Neural Network (MCNN) architecture to map the image to its crowd density map. The proposed MCNN allows the input image to be of arbitrary size or resolution. By utilizing filters with receptive fields of different sizes, the features learned by each column CNN are adaptive to variations in people/head size due to perspective effects or image resolution. Furthermore, the true density map is computed accurately based on geometry-adaptive kernels, which do not require knowing the perspective map of the input image. Since existing crowd counting datasets do not adequately cover all the challenging situations considered in our work, we have collected and labelled a large new dataset that includes 1,198 images with about 330,000 annotated heads. On this challenging new dataset, as well as all existing datasets, we conduct extensive experiments to verify the effectiveness of the proposed model and method. In particular, with the proposed simple MCNN model, our method outperforms all existing methods. In addition, experiments show that our model, once trained on one dataset, can be readily transferred to a new dataset.
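A minimal Python sketch of the geometry-adaptive kernel idea follows: each head annotation is spread by a Gaussian whose width scales with the mean distance to its k nearest annotated neighbors. The parameter values (k=3, beta=0.3), the fallback sigma, and the function name are my assumptions for illustration.

```python
# Sketch: ground-truth density map from dot annotations using
# geometry-adaptive Gaussian kernels.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import cKDTree

def geometry_adaptive_density(points, shape, k=3, beta=0.3):
    """points: (N, 2) array of (row, col) head annotations.
    shape: (H, W) of the image. Returns a density map summing to ~N."""
    density = np.zeros(shape, dtype=np.float64)
    if len(points) == 0:
        return density
    tree = cKDTree(points)
    # Distances to k nearest neighbors (query k+1: first hit is the point itself).
    dists, _ = tree.query(points, k=min(k + 1, len(points)))
    for (r, c), d in zip(points.astype(int), dists):
        sigma = beta * d[1:].mean() if len(points) > 1 else 15.0  # assumed fallback
        impulse = np.zeros(shape)
        impulse[min(r, shape[0] - 1), min(c, shape[1] - 1)] = 1.0
        density += gaussian_filter(impulse, sigma)
    return density

# Toy usage
pts = np.array([[20, 20], [22, 25], [80, 90]])
dm = geometry_adaptive_density(pts, (100, 120))
print(round(dm.sum(), 2))  # ~3.0
```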

1,603 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A deep convolutional neural network is proposed for crowd counting; it is trained alternately on two related learning objectives, crowd density and crowd count, to obtain a better local optimum for both objectives.
Abstract: Cross-scene crowd counting is a challenging task in which no laborious data annotation is required for counting people in new target surveillance crowd scenes unseen in the training set. The performance of most existing crowd counting methods drops significantly when they are applied to an unseen scene. To address this problem, we propose a deep convolutional neural network (CNN) for crowd counting that is trained alternately on two related learning objectives, crowd density and crowd count. This switchable learning approach is able to obtain a better local optimum for both objectives. To handle an unseen target crowd scene, we present a data-driven method to fine-tune the trained CNN model for the target scene. A new dataset including 108 crowd scenes with nearly 200,000 head annotations is introduced to better evaluate the accuracy of cross-scene crowd counting methods. Extensive experiments on the proposed dataset and two existing datasets demonstrate the effectiveness and reliability of our approach.
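The switchable-learning idea can be sketched as a training loop that alternates objectives. The tiny model, the epoch-parity schedule, and all names below are my assumptions, not the paper's architecture or released code.

```python
# Sketch: alternating between a density-map objective and a
# count-regression objective during training.
import torch
import torch.nn as nn

class CrowdCNN(nn.Module):
    """Assumed minimal stand-in: a conv backbone producing a density
    map, with the global count read out as the map's sum."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1))
    def forward(self, x):
        density = torch.relu(self.features(x))
        return density, density.sum(dim=(1, 2, 3))

model = CrowdCNN()
opt = torch.optim.SGD(model.parameters(), lr=1e-4)
mse = nn.MSELoss()

def train_step(img, gt_density, gt_count, epoch):
    density, count = model(img)
    # Switch objectives between epochs (assumed schedule).
    if epoch % 2 == 0:
        loss = mse(density, gt_density)  # density objective
    else:
        loss = mse(count, gt_count)      # count objective
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```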

1,143 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: CSRNet, as discussed by the authors, is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN as the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations.
Abstract: We propose a network for Congested Scene Recognition, called CSRNet, to provide a data-driven, deep learning method that can understand highly congested scenes and perform accurate count estimation as well as produce high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN as the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is an easily trained model because of its pure convolutional structure. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and achieve state-of-the-art performance. On the ShanghaiTech Part_B dataset, CSRNet achieves 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We also extend the targeted applications to counting other objects, such as vehicles in the TRANCOS dataset. Results show that CSRNet significantly improves output quality, with 15.4% lower MAE than the previous state-of-the-art approach.
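The front-end/back-end split described in the abstract can be sketched in a few lines of PyTorch: a truncated VGG-16 front-end followed by dilated 3x3 convolutions in place of further pooling. The exact layer widths and the class name below are my assumptions; this is a CSRNet-like sketch, not the released weights or code.

```python
# Sketch: CNN front-end for feature extraction + dilated-conv back-end
# that enlarges receptive fields without pooling.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class CSRNetLike(nn.Module):
    def __init__(self):
        super().__init__()
        # Front-end: VGG-16 layers up through conv4_3 (1/8 resolution).
        self.frontend = nn.Sequential(*list(vgg16(weights=None).features)[:23])
        # Back-end: dilated 3x3 convolutions (dilation=2) keep spatial size.
        def dconv(cin, cout):
            return [nn.Conv2d(cin, cout, 3, padding=2, dilation=2),
                    nn.ReLU(inplace=True)]
        self.backend = nn.Sequential(
            *dconv(512, 512), *dconv(512, 512), *dconv(512, 512),
            *dconv(512, 256), *dconv(256, 128), *dconv(128, 64),
            nn.Conv2d(64, 1, 1))  # 1-channel density map
    def forward(self, x):
        return self.backend(self.frontend(x))

# Toy usage: density map at 1/8 input resolution.
y = CSRNetLike()(torch.randn(1, 3, 256, 256))
print(y.shape)  # torch.Size([1, 1, 32, 32])
```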

1,120 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, where actions are localized in space and time, resulting in 1.59M action labels with multiple labels per person occurring frequently.
Abstract: This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video clips, where actions are localized in space and time, resulting in 1.59M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.8% mAP, underscoring the need for developing new approaches for video understanding.

850 citations