Author

K. S. Venkatesh

Bio: K. S. Venkatesh is an academic researcher from Indian Institute of Technology Kanpur. The author has contributed to research in topics: Depth map & Image segmentation. The author has an h-index of 12 and has co-authored 117 publications receiving 511 citations. Previous affiliations of K. S. Venkatesh include Kingston University & Indian Institutes of Technology.


Papers
Journal ArticleDOI
TL;DR: Simulation and experimental results show the usefulness of the new method for generating paths in rough terrains, and demonstrate that the new method is superior to the conventional potential field method.
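The TL;DR compares the new method against the conventional artificial potential field approach; as a point of reference, below is a minimal sketch of that classical baseline (the gains, influence radius, and step size are illustrative values, not parameters from the paper):

```python
import numpy as np

def potential_field_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=2.0):
    """Classical artificial potential field: attractive pull toward the goal
    plus repulsive push away from obstacles within influence radius rho0."""
    # Attractive force: negative gradient of 0.5 * k_att * ||pos - goal||^2
    force = -k_att * (pos - goal)
    for obs in obstacles:
        diff = pos - obs
        rho = np.linalg.norm(diff)
        if 0 < rho < rho0:
            # Repulsive force: negative gradient of 0.5 * k_rep * (1/rho - 1/rho0)^2
            force += k_rep * (1.0 / rho - 1.0 / rho0) * (diff / rho**3)
    return force

# Example: one gradient-descent step on the combined potential
pos = np.array([0.0, 0.0])
goal = np.array([10.0, 5.0])
obstacles = [np.array([4.0, 2.0])]
pos = pos + 0.05 * potential_field_force(pos, goal, obstacles)
```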

66 citations

Proceedings Article
01 Jan 2018
TL;DR: This paper investigates the problem of Domain Shift in action videos, an area that has remained under-explored, and proposes two new approaches named Action Modeling on Latent Subspace (AMLS) and Deep Adversarial Action Adaptation (DAAA).
Abstract: In the general settings of supervised learning, human action recognition has been a widely studied topic. The classifiers learned in this setting assume that the training and test data have been sampled from the same underlying probability distribution. However, in most of the practical scenarios, this assumption is not true, resulting in a suboptimal performance of the classifiers. This problem, referred to as Domain Shift, has been extensively studied, but mostly for image/object classification task. In this paper, we investigate the problem of Domain Shift in action videos, an area that has remained under-explored, and propose two new approaches named Action Modeling on Latent Subspace (AMLS) and Deep Adversarial Action Adaptation (DAAA). In the AMLS approach, the action videos in the target domain are modeled as a sequence of points on a latent subspace and adaptive kernels are successively learned between the source domain point and the sequence of target domain points on the manifold. In the DAAA approach, an end-to-end adversarial learning framework is proposed to align the two domains. The action adaptation experiments were conducted using various combinations of multi-domain action datasets, including six common classes of Olympic Sports and UCF50 datasets and all classes of KTH, MSR and our own SonyCam datasets. In this paper, we have achieved consistent improvements over chosen baselines and obtained some state-of-the-art results for the datasets.
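The end-to-end adversarial alignment in DAAA is described only at a high level here; a common way to realize such alignment is a domain discriminator trained through a gradient-reversal layer. The PyTorch sketch below illustrates that general pattern (the layer sizes, class count, and loss wiring are assumptions, not the paper's actual architecture):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the
    backward pass, so the feature extractor learns domain-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class AdversarialAdapter(nn.Module):
    def __init__(self, feat_dim=512, num_classes=6, lam=1.0):
        super().__init__()
        self.lam = lam
        self.classifier = nn.Linear(feat_dim, num_classes)          # action labels
        self.domain_disc = nn.Sequential(                           # source vs. target
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, feats):
        class_logits = self.classifier(feats)
        domain_logits = self.domain_disc(GradReverse.apply(feats, self.lam))
        return class_logits, domain_logits

# Training step (sketch): cross-entropy on source action labels plus
# cross-entropy on source/target domain labels, backpropagated jointly.
```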

64 citations

Journal ArticleDOI
TL;DR: In this paper, a method of estimating the number of people in high density crowds from still images is presented, which uses multiple sources, viz. interest points (SIFT), Fourier analysis, wavelet decomposition, GLCM features and low confidence head detections, to estimate the counts.
Abstract: We present a method of estimating the number of people in high density crowds from still images. The method estimates counts by fusing information from multiple sources. Most of the existing work on crowd counting deals with very small crowds (tens of individuals) and uses temporal information from videos. Our method uses only still images to estimate the counts in high density images (hundreds to thousands of individuals). At this scale, we cannot rely on only one set of features for count estimation. We, therefore, use multiple sources, viz. interest points (SIFT), Fourier analysis, wavelet decomposition, GLCM features and low confidence head detections, to estimate the counts. Each of these sources gives a separate estimate of the count along with confidences and other statistical measures, which are then combined to obtain the final estimate. We test our method on an existing dataset of fifty images containing over 64000 individuals. Further, we added another fifty annotated images of crowds and tested on the complete dataset of one hundred images containing over 87000 individuals. The counts per image range from 81 to 4633. We report the performance in terms of mean absolute error, which is a measure of the accuracy of the method, and mean normalised absolute error, which is a measure of its robustness.
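The fusion step above combines per-source counts and confidences; a minimal sketch of one plausible fusion rule, a confidence-weighted average, is shown below (the source names, counts, and confidence values are purely illustrative, and the paper's actual combination uses additional statistical measures):

```python
import numpy as np

def fuse_counts(estimates, confidences):
    """Combine per-source crowd-count estimates into one value using a
    confidence-weighted average (an illustrative fusion rule only)."""
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    weights = weights / weights.sum()
    return float(np.dot(weights, estimates))

# Hypothetical estimates for one image patch from the five sources:
# SIFT interest points, Fourier analysis, wavelets, GLCM texture, head detections
counts = [1320, 1510, 1280, 1450, 980]
confs  = [0.8,  0.6,  0.7,  0.5,  0.3]
print(fuse_counts(counts, confs))  # single fused count for the patch
```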

42 citations

01 Jan 2005
TL;DR: In this article, a comprehensive set of predicates describing possible surveillance event primitives, including entry/exit, partial or complete occlusions by background objects, crowding, splitting of agents and algorithm failures resulting from track loss, is evaluated based on the fractional overlaps between the localized regions and foreground blobs.
Abstract: Tracking multiple agents in a monocular visual surveillance system is often challenged by the phenomenon of occlusions. Agents entering the field of view can undergo two different forms of occlusions, either caused by crowding or due to obstructions by background objects at finite distances from the camera. The agents are primarily detected as foreground blobs and are characterized by their motion history and weighted color histograms. These features are further used for localizing them in subsequent frames through motion prediction assisted mean shift tracking. A number of Boolean predicates are evaluated based on the fractional overlaps between the localized regions and foreground blobs. We construct predicates describing a comprehensive set of possible surveillance event primitives including entry/exit, partial or complete occlusions by background objects, crowding, splitting of agents and algorithm failures resulting from track loss. Instantiation of these event primitives followed by selective feature updates enables us to develop an effective scheme for tracking multiple agents in relatively unconstrained environments.
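The Boolean predicates above are driven by fractional overlaps between tracked regions and foreground blobs; the toy sketch below illustrates that idea with axis-aligned boxes (the box representation, thresholds, and event names are simplifications, not the paper's exact definitions):

```python
def fractional_overlap(region, blob):
    """Fraction of the tracked region's area covered by a foreground blob.
    Boxes are (x, y, w, h); all names and thresholds here are illustrative."""
    ax, ay, aw, ah = region
    bx, by, bw, bh = blob
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (ix * iy) / float(aw * ah)

def classify_event(region, blobs, full=0.8, partial=0.2):
    """Toy event-primitive predicates in the spirit of the paper:
    complete occlusion / track loss, crowding, partial occlusion, tracked."""
    overlaps = [fractional_overlap(region, b) for b in blobs]
    covering = [o for o in overlaps if o > partial]
    if not covering:
        return "track_loss_or_complete_occlusion"
    if len(covering) > 1:
        return "crowding"
    return "tracked" if covering[0] > full else "partial_occlusion"
```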

26 citations

Posted Content
TL;DR: In this article, a method of estimating the number of people in high density crowds from still images is presented, which uses multiple sources, viz. interest points (SIFT), Fourier analysis, wavelet decomposition, GLCM features and low confidence head detections, to estimate the counts.
Abstract: We present a method of estimating the number of people in high density crowds from still images. The method estimates counts by fusing information from multiple sources. Most of the existing work on crowd counting deals with very small crowds (tens of individuals) and uses temporal information from videos. Our method uses only still images to estimate the counts in high density images (hundreds to thousands of individuals). At this scale, we cannot rely on only one set of features for count estimation. We, therefore, use multiple sources, viz. interest points (SIFT), Fourier analysis, wavelet decomposition, GLCM features and low confidence head detections, to estimate the counts. Each of these sources gives a separate estimate of the count along with confidences and other statistical measures, which are then combined to obtain the final estimate. We test our method on an existing dataset of fifty images containing over 64000 individuals. Further, we added another fifty annotated images of crowds and tested on the complete dataset of one hundred images containing over 87000 individuals. The counts per image range from 81 to 4633. We report the performance in terms of mean absolute error, which is a measure of the accuracy of the method, and mean normalised absolute error, which is a measure of its robustness.

24 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: The book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This survey reviews recent trends in video-based human capture and analysis and discusses open problems for future research toward automatic visual analysis of human movement.

2,738 citations

Proceedings ArticleDOI
Yingying Zhang, Desen Zhou, Siqin Chen, Shenghua Gao, Yi Ma
27 Jun 2016
TL;DR: With the proposed simple MCNN model, the method outperforms all existing methods and experiments show that the model, once trained on one dataset, can be readily transferred to a new dataset.
Abstract: This paper aims to develop a method that can accurately estimate the crowd count from an individual image with arbitrary crowd density and arbitrary perspective. To this end, we have proposed a simple but effective Multi-column Convolutional Neural Network (MCNN) architecture to map the image to its crowd density map. The proposed MCNN allows the input image to be of arbitrary size or resolution. By utilizing filters with receptive fields of different sizes, the features learned by each column CNN are adaptive to variations in people/head size due to perspective effect or image resolution. Furthermore, the true density map is computed accurately based on geometry-adaptive kernels, which do not require knowledge of the perspective map of the input image. Since existing crowd counting datasets do not adequately cover all the challenging situations considered in our work, we have collected and labelled a large new dataset that includes 1198 images with about 330,000 heads annotated. On this challenging new dataset, as well as all existing datasets, we conduct extensive experiments to verify the effectiveness of the proposed model and method. In particular, with the proposed simple MCNN model, our method outperforms all existing methods. In addition, experiments show that our model, once trained on one dataset, can be readily transferred to a new dataset.
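The multi-column idea, parallel CNN columns with different kernel sizes fused into a single density map, can be sketched compactly; the PyTorch snippet below is a scaled-down illustration (channel counts and depths are far smaller than the published MCNN):

```python
import torch
import torch.nn as nn

def column(channels, k):
    """One CNN column with kernel size k; different k per column gives
    receptive fields suited to different head sizes (simplified)."""
    pad = k // 2
    return nn.Sequential(
        nn.Conv2d(3, channels, k, padding=pad), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(channels, channels, k, padding=pad), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(channels, channels // 2, k, padding=pad), nn.ReLU(),
    )

class MiniMCNN(nn.Module):
    """Three columns with small/medium/large kernels; their feature maps are
    concatenated and fused by a 1x1 convolution into a crowd density map."""
    def __init__(self):
        super().__init__()
        self.cols = nn.ModuleList([column(16, 3), column(16, 5), column(16, 7)])
        self.fuse = nn.Conv2d(3 * 8, 1, kernel_size=1)

    def forward(self, x):                       # x: (B, 3, H, W), any size
        feats = torch.cat([c(x) for c in self.cols], dim=1)
        return self.fuse(feats)                 # density map at 1/4 resolution

# Estimated count for an image = density_map.sum() over spatial dimensions.
```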

1,603 citations

Patent
05 Apr 2006
TL;DR: In this paper, a video surveillance system extracts video primitives and extracts event occurrences from the video primitives using event discriminators, and the system can undertake a response, such as an alarm, based on the extracted event occurrences.
Abstract: A video surveillance system extracts video primitives and extracts event occurrences from the video primitives using event discriminators. The system can undertake a response, such as an alarm, based on extracted event occurrences.
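The primitive/discriminator/response pipeline in the patent abstract can be illustrated with a toy example; the schema, rule, and response below are hypothetical, chosen only to show the control flow:

```python
from dataclasses import dataclass

@dataclass
class Primitive:
    """A video primitive: a low-level observation extracted per frame
    (illustrative fields; the patent does not fix a schema)."""
    kind: str          # e.g. "object_enters", "object_in_zone"
    zone: str
    timestamp: float

def tripwire_discriminator(p: Primitive) -> bool:
    """An event discriminator: a rule applied to primitives to decide
    whether an event occurrence (and thus a response) should fire."""
    return p.kind == "object_enters" and p.zone == "restricted"

def process(primitives):
    for p in primitives:
        if tripwire_discriminator(p):
            print(f"ALARM at t={p.timestamp}: entry into {p.zone}")

process([Primitive("object_enters", "restricted", 12.4)])
```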

599 citations

Journal ArticleDOI
TL;DR: A comprehensive review of the state-of-the-art computer vision for traffic video with a critical analysis and an outlook to future research directions is presented.
Abstract: Automatic video analysis from urban surveillance cameras is a fast-emerging field based on computer vision techniques. We present here a comprehensive review of the state-of-the-art computer vision for traffic video with a critical analysis and an outlook to future research directions. This field is of increasing relevance for intelligent transport systems (ITSs). The decreasing hardware cost and, therefore, the increasing deployment of cameras have opened a wide application field for video analytics. Several monitoring objectives such as congestion, traffic rule violation, and vehicle interaction can be targeted using cameras that were typically originally installed for human operators. Systems for the detection and classification of vehicles on highways have successfully been using classical visual surveillance techniques such as background estimation and motion tracking for some time. The urban domain is more challenging with respect to traffic density, lower camera angles that lead to a high degree of occlusion, and the variety of road users. Methods from object categorization and 3-D modeling have inspired more advanced techniques to tackle these challenges. There is no commonly used data set or benchmark challenge, which makes the direct comparison of the proposed algorithms difficult. In addition, evaluation under challenging weather conditions (e.g., rain, fog, and darkness) would be desirable but is rarely performed. Future work should be directed toward robust combined detectors and classifiers for all road users, with a focus on realistic conditions during evaluation.
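The classical background-estimation pipeline the review refers to can be sketched with standard OpenCV calls; the snippet below is a generic illustration (the video path, morphology kernel, and area threshold are arbitrary choices, not from the survey):

```python
import cv2

# Classical background-estimation pipeline of the kind the survey describes
# for highway vehicle detection (parameters are illustrative).
cap = cv2.VideoCapture("traffic.mp4")          # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)             # foreground = moving vehicles
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    vehicles = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]

cap.release()
```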

579 citations