
Showing papers on "Object-class detection" published in 2018


Journal ArticleDOI
TL;DR: In this article, a convolutional neural network (CNN) was employed to jointly regress to 3D bounding box coordinates and object pose for object detection and orientation estimation tasks.
Abstract: The goal of this paper is to perform 3D object detection in the context of autonomous driving. Our method aims at generating a set of high-quality 3D object proposals by exploiting stereo imagery. We formulate the problem as minimizing an energy function that encodes object size priors, placement of objects on the ground plane, as well as several depth-informed features that reason about free space, point cloud densities, and distance to the ground. We then exploit a convolutional neural network (CNN) on top of these proposals to perform object detection. In particular, the CNN exploits context and depth information to jointly regress to 3D bounding box coordinates and object pose. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. When combined with the CNN, our approach outperforms all existing results in object detection and orientation estimation tasks for all three KITTI object classes. Furthermore, we also experiment with the setting where LIDAR information is available, and show that using both LIDAR and stereo leads to the best result.

319 citations
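As a toy illustration of the proposal-scoring idea (not the paper's actual energy, whose features and learned weights are computed from real stereo point clouds), a linear energy over hypothetical depth-informed features might look like this:

```python
import numpy as np

def proposal_energy(features, weights):
    """Lower energy = better 3D proposal; features is (n_proposals, n_features)."""
    return features @ weights

# hypothetical columns: [point-cloud density in box, free-space violation, height-prior error]
features = np.array([
    [0.9, 0.1, 0.05],   # dense box resting on the ground plane -> good
    [0.2, 0.7, 0.60],   # sparse box floating above the ground  -> bad
])
weights = np.array([-1.0, 2.0, 1.5])   # reward density, penalize violations

energies = proposal_energy(features, weights)
ranked = np.argsort(energies)          # keep the lowest-energy proposals
print(energies, ranked)
```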


Journal ArticleDOI
TL;DR: This paper proposes a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction.
Abstract: The visual cues from multiple support regions of different sizes and resolutions are complementary in classifying a candidate box in object detection. Effective integration of local and contextual visual cues from these regions has become a fundamental problem in object detection. In this paper, we propose a gated bi-directional CNN (GBD-Net) to pass messages among features from different support regions during both feature learning and feature extraction. Such message passing can be implemented through convolution between neighboring support regions in two directions and can be conducted in various layers. Therefore, local and contextual visual patterns can validate each other's existence by learning their nonlinear relationships, and their close interactions are modeled in a richer way. It is also shown that message passing is not always helpful but depends on individual samples. Gated functions are therefore needed to control message transmission, whose on-or-off states are controlled by extra visual evidence from the input sample. The effectiveness of GBD-Net is shown through experiments on three object detection datasets: ImageNet, Pascal VOC2007, and Microsoft COCO. Besides the GBD-Net, this paper also details our approach in winning the ImageNet object detection challenge of 2016, with source code provided at https://github.com/craftGBD/craftGBD. In this winning system, the modified GBD-Net, a new pretraining scheme, and better region proposal designs are provided. We also show the effectiveness of different network structures and existing techniques for object detection, such as multi-scale testing, left-right flipping, bounding box voting, NMS, and context.

136 citations
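Of the standard techniques listed at the end of the abstract, non-maximum suppression (NMS) is the easiest to make concrete. A minimal greedy NMS sketch in NumPy (the box format and IoU threshold are conventional choices, not the paper's exact settings):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression; boxes are (N, 4) as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thr]   # drop heavily overlapping boxes
    return keep
```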


Journal ArticleDOI
TL;DR: Strong evidence is found that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.
Abstract: Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors. This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We improve on this previous work by incorporating knowledge about object similarities from visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories should exhibit more common transferable properties than dissimilar categories; e.g., a better cat detector would result from transferring the differences between a dog classifier and a dog detector than from transferring those of the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object similarity based knowledge transfer methods outperforms the baseline methods. We found strong evidence that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.

66 citations
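A hypothetical sketch of the transfer step, assuming classifier and detector weights live in the same feature space; the similarity values, dimensions, and weight vectors below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128                                     # feature dimension (illustrative)
src_cls = rng.normal(size=(5, d))           # source-class classifier weights
src_det = src_cls + rng.normal(scale=0.1, size=(5, d))  # source-class detector weights

visual_sim = np.array([0.9, 0.7, 0.2, 0.1, 0.05])    # e.g., from appearance features
semantic_sim = np.array([0.8, 0.6, 0.3, 0.2, 0.10])  # e.g., from word embeddings
sim = 0.5 * visual_sim + 0.5 * semantic_sim          # simple complementary mix

tgt_cls = rng.normal(size=d)                # target class: classifier only, no boxes
delta = (sim[:, None] * (src_det - src_cls)).sum(0) / sim.sum()
tgt_det = tgt_cls + delta                   # similarity-weighted transferred detector
```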


Journal ArticleDOI
TL;DR: A material-based salient object detection method is proposed that can effectively distinguish objects with similar perceived color but different spectral responses; it outperforms several existing hyperspectral salient object detection approaches as well as state-of-the-art methods proposed for RGB images.

64 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present the design details of a deep learning system for unconstrained face recognition, including modules for face detection, association, alignment, and face verification.
Abstract: Over the last five years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems. This has been made possible due to the availability of large annotated datasets, a better understanding of the non-linear mapping between input images and class labels, as well as the affordability of GPUs. In this paper, we present the design details of a deep learning system for unconstrained face recognition, including modules for face detection, association, alignment, and face verification. The quantitative performance evaluation is conducted using the IARPA Janus Benchmark A (IJB-A), the JANUS Challenge Set 2 (JANUS CS2), and the Labeled Faces in the Wild (LFW) dataset. The IJB-A dataset includes real-world unconstrained faces of 500 subjects with significant pose and illumination variations, which make it much harder than the LFW and YouTube Faces datasets. JANUS CS2 is the extended version of IJB-A, which contains not only all the images/frames of IJB-A but also the original videos. Some open issues regarding DCNNs for face verification problems are then discussed.

46 citations
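A minimal sketch of the verification module alone, assuming face embeddings produced by some DCNN; the embedding size and threshold are placeholders, not values from the paper:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_a = np.random.randn(256)    # embedding of face A (stand-in for a DCNN output)
emb_b = np.random.randn(256)    # embedding of face B
same_person = cosine_similarity(emb_a, emb_b) > 0.55   # hypothetical threshold
```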


Journal ArticleDOI
TL;DR: This paper proposes to incorporate a saliency map into an incremental subspace analysis framework in which the saliency map makes the estimated background have less of a chance than the foreground to contain salient objects.
Abstract: Moving object detection is key to intelligent video analysis. On the one hand, what moves includes not only interesting objects but also noise and cluttered background. On the other hand, moving objects without rich texture are prone to being missed. Therefore, the results of many moving object detection algorithms contain undesirable false alarms and missed alarms. To reduce the false alarms and missed alarms, in this paper we propose to incorporate a saliency map into an incremental subspace analysis framework in which the saliency map makes the estimated background have less of a chance than the foreground (i.e., moving objects) to contain salient objects. The proposed objective function systematically takes into account the properties of sparsity, low rank, connectivity, and saliency. An alternating minimization algorithm is proposed to seek the optimal solutions. The experimental results on both the Perception Test Images Sequences dataset and the Wallflower dataset demonstrate that the proposed method is effective in reducing false alarms and missed alarms.

36 citations


Journal ArticleDOI
TL;DR: This work proposes a novel spatiotemporal RGB-D video segmentation framework that automatically segments and tracks objects with continuity and consistency over time and leverages scale-invariant feature transform (SIFT) flow and bilateral representation to solve inconsistency under occlusion.
Abstract: RGB-D video segmentation is important for many applications, including scene understanding, object tracking, and robotic grasping. However, segmenting RGB-D frames over a long video sequence into a globally consistent segmentation is still a challenging problem. Current methods often lose pixel correspondences between frames under occlusion and, thus, fail to generate consistent and continuous segmentation results. To address this problem, we propose a novel spatiotemporal RGB-D video segmentation framework that automatically segments and tracks objects with continuity and consistency over time. Our approach first produces consistent segments in some keyframes by region clustering, and then propagates the segmentation result to the whole video sequence via a mask propagation scheme in bilateral space. Instead of exploiting local optical flow information to establish correspondences between adjacent frames, we leverage scale-invariant feature transform (SIFT) flow and bilateral representation to resolve inconsistency under occlusion. Moreover, our method automatically extracts multiple objects of interest and tracks them without any user input hint. A variety of experiments demonstrate the effectiveness and robustness of our proposed method.

33 citations


Patent
Mohamed N. Ahmed
23 Mar 2018
TL;DR: In an approach to face recognition in an image, one or more computer processors receive an image that includes at least one face and one or more face parts, and the computer processors extract, from the clustered images, face descriptors.
Abstract: In an approach to face recognition in an image, one or more computer processors receive an image that includes at least one face and one or more face parts. The one or more computer processors detect the one or more face parts in the image with a face component model. The one or more computer processors cluster the detected one or more face parts with one or more stored images. The one or more computer processors extract, from the clustered images, one or more face descriptors. The one or more computer processors determine a recognition score of the at least one face, based, at least in part, on the extracted one or more face descriptors.

27 citations


Journal ArticleDOI
Mai Xu, Yun Ren, Zulin Wang, Jingxian Liu, Xiaoming Tao
TL;DR: This paper proposes adopting a particle filter (PF) to model a dynamic Gaussian mixture model (DGMM) for saliency detection in face videos; the experimental results show that the resulting PF-DGMM approach significantly outperforms other state-of-the-art approaches in saliency detection on face videos.
Abstract: Recently, videoconferencing has become popular in multimedia systems, such as FaceTime and Skype. In videoconferencing, almost every frame contains a human face. Therefore, it is important to predict human visual attention on face videos by saliency detection, as saliency may be used as a guide to the region of interest for content-based applications of face videos. In this paper, we propose a data-driven approach for saliency detection in face videos. From the data-driven perspective, we first establish an eye-tracking database that contains fixations of 76 face videos viewed by 40 subjects. Upon analysis of our database, we find that visual attention is significantly attracted by faces in videos. More importantly, the attention distribution within face regions varies with mouth movement. Since previous work has shown that face saliency in still images can be efficiently modeled using a Gaussian mixture model (GMM), the variation of visual attention in videos can be modeled by a dynamic GMM (DGMM). Accordingly, we propose adopting the particle filter (PF) to model the DGMM for saliency detection in face videos, which we call PF-DGMM. Finally, the experimental results show that our PF-DGMM approach significantly outperforms other state-of-the-art approaches in saliency detection on face videos.

17 citations
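A generic bootstrap particle filter sketch for tracking a single time-varying parameter (say, the mean of one DGMM component) under a random-walk motion model; this is the textbook PF, not the paper's exact PF-DGMM formulation, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
particles = rng.normal(0.0, 1.0, N)         # initial hypotheses for the mean
weights = np.full(N, 1.0 / N)

def pf_step(particles, weights, z, proc_std=0.1, obs_std=0.3):
    particles = particles + rng.normal(0, proc_std, N)     # predict (random walk)
    lik = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)  # Gaussian likelihood
    weights = weights * lik
    weights /= weights.sum()                               # update
    idx = rng.choice(N, N, p=weights)                      # resample
    return particles[idx], np.full(N, 1.0 / N)

for z in [0.2, 0.35, 0.5, 0.45]:             # fake fixation-derived observations
    particles, weights = pf_step(particles, weights, z)
print(particles.mean())                       # posterior estimate of the mean
```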


Journal ArticleDOI
TL;DR: Boosted Random Ferns are introduced to rapidly build discriminative classifiers for learning and detecting object categories; they can be trained very efficiently, densely evaluated for all image locations in about 0.1 seconds, and provide detection rates similar to competing approaches that require expensive and significantly slower processing.
Abstract: In this paper we introduce Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from an instance level to a category level and still retain efficiency. First, we define binary features in the histogram-of-oriented-gradients domain (as opposed to the intensity domain), allowing for a better representation of intra-class variability. Second, both the positions where ferns are evaluated within the sliding window and the locations of the binary features for each fern are not chosen completely at random; instead, we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution, which adapts the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at a low computational cost. And finally, we show that training can be performed online, for sequentially arriving images. Overall, the resulting classifier can be trained very efficiently, densely evaluated for all image locations in about 0.1 seconds, and provides detection rates similar to competing approaches that require expensive and significantly slower processing. We demonstrate the effectiveness of our approach through thorough experimentation on publicly available datasets, in which we compare against the state of the art, for tasks of both 2D detection and 3D multi-view estimation.

15 citations
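A bare-bones single random fern, assuming a precomputed feature vector such as a flattened HOG descriptor of a window; the boosted test selection, feature sharing, and sliding-window machinery of the paper are omitted:

```python
import numpy as np

rng = np.random.default_rng(2)
k, dim = 6, 288                               # fern size and feature length (toy)
pairs = rng.integers(0, dim, size=(k, 2))     # the fern's k binary comparisons

def fern_index(feat):
    bits = feat[pairs[:, 0]] > feat[pairs[:, 1]]
    return int(bits @ (1 << np.arange(k)))    # k-bit code in [0, 2^k)

def train_table(pos_feats, neg_feats):
    counts = np.ones((2, 2 ** k))             # [neg, pos] with Laplace smoothing
    for f in neg_feats: counts[0, fern_index(f)] += 1
    for f in pos_feats: counts[1, fern_index(f)] += 1
    probs = counts / counts.sum(1, keepdims=True)
    return np.log(probs[1] / probs[0])        # per-code log-likelihood ratio

score = train_table(rng.random((100, dim)), rng.random((100, dim)))
print(score[fern_index(rng.random(dim))])     # score of one test window
```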


Journal ArticleDOI
TL;DR: A preprocessing method for camera nodes named Preprocessing-based Multi-Face Detection (PMFD) is proposed, which extracts the bounding box of each object's face using a boosting-based face detection algorithm and sends only the faces' information to the base station.
Abstract: Recently, advances in hardware such as CMOS camera nodes have led to the development of Visual Sensor Networks (VSNs) that process sensed data and transmit the useful information to the base station for completing subsequent tasks. Today, detecting objects and sending useful information to the base station for object recognition has emerged as an important and challenging issue in VSNs. Our investigations show that the face's information is adequate for completing object recognition. According to the literature, many approaches have been proposed for object detection and sending useful information to the base station so that subsequent tasks like object recognition can be completed. However, in most of them, the lack of preprocessing methods in camera nodes causes the network to face a large volume of data. For example, when there is more than one object within each camera node's field-of-view, conventional works deliver the empty spaces among objects to the base station. Also, most of them send the whole information about each object to the base station, while sending only each object's face information is adequate for completing object recognition. Therefore, in this paper, a preprocessing method for camera nodes named Preprocessing-based Multi-Face Detection (PMFD) is proposed. Our method extracts the bounding box of each object's face using a boosting-based face detection algorithm and sends only the faces' information to the base station. The simulation results show that the PMFD method has acceptable preprocessing time complexity and injects a low volume of traffic into the network. Consequently, the PMFD method prolongs the network lifetime in comparison with state-of-the-art algorithms.
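The camera-node preprocessing can be approximated with OpenCV's boosting-based (Viola-Jones) cascade detector: detect faces, crop them, and transmit only the crops. The cascade file ships with OpenCV; the transmission call is a hypothetical stand-in for the network layer:

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.jpg")                 # one captured frame (placeholder)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

crops = [frame[y:y + h, x:x + w] for (x, y, w, h) in faces]
# for crop in crops: send_to_base_station(crop)   # hypothetical transport call
```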

Journal ArticleDOI
TL;DR: Two virtual 'axis-symmetrical' face images are generated from an original face image, and the collaborative representation based classification (CRC) method is adopted to perform classification.
Abstract: Research on automatic face recognition has attracted much attention from many researchers because of human faces' uniqueness and usability. However, in real-world applications, the acquisition of face images is affected by illumination changes, facial expression variations, different postures, and other environmental factors, resulting in a limited number of collected face images. This situation has become an obstacle to the development of face recognition technology. Therefore, in this paper, we utilize the information of the left-half face and the right-half face to generate two virtual 'axis-symmetrical' face images from an original face image and adopt the collaborative representation based classification (CRC) method to perform classification. The first and second virtual face images convey more information of the right-half face and left-half face, respectively. Experiments have been performed on the Extended Yale_B, ORL, AR, and FERET face databases, and the experimental results show that our method can improve recognition accuracy effectively.
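A minimal sketch of the two virtual 'axis-symmetrical' faces, followed by a ridge-regression form of collaborative representation scored by class-wise reconstruction residual; shapes and the regularization value are illustrative:

```python
import numpy as np

def virtual_faces(img):                       # img: (h, w) grayscale face
    h, w = img.shape
    left, right = img[:, : w // 2], img[:, w - w // 2 :]
    v1 = np.hstack([left, left[:, ::-1]])     # left half + its mirror
    v2 = np.hstack([right[:, ::-1], right])   # mirrored right half + right half
    return v1, v2

def crc_classify(X, labels, y, lam=0.01):
    """X: (d, n) training faces as columns; labels: (n,) ints; y: (d,) test face."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    alpha = np.linalg.solve(A, X.T @ y)       # collaborative coding over all classes
    residuals = {c: np.linalg.norm(y - X[:, labels == c] @ alpha[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)  # class with the smallest residual
```

One natural way to use the virtual images, though the paper's exact pipeline may differ, is to append them to X as extra training columns for their class.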

Journal ArticleDOI
01 Jan 2018
TL;DR: A unified algorithm framework called group object detection and tracking is presented, which detects moving objects by robust principal component analysis (RPCA) and a Graph Cut algorithm while simultaneously tracking objects via fractal analysis.
Abstract: Automatic video analysis is a hot research topic in the field of computer vision and has broad application prospects. It usually consists of three key steps: object detection, object tracking, and behavior recognition. Usually, object detection is considered merely a precondition of object tracking, and the interaction between them is very limited. So, existing video analysis solutions treat them as independent procedures and execute them separately. Actually, object detection and tracking are related, and an effective combination of them can improve the performance of video analysis. This paper mainly studies object detection and tracking, and tries to utilize the outputs of each to optimize the performance of the other. For this purpose, a unified algorithm framework called group object detection and tracking is presented, which detects moving objects by robust principal component analysis (RPCA) and a Graph Cut algorithm and tracks objects via fractal analysis simultaneously. The multi-fractal spectrum (MFS) constraint and Graph Cut improve the completeness of object detection, which yields more exact tracking features. At the same time, successful tracking results can in turn provide optimal constraints for object detection. Therefore, object detection and tracking are grouped and can be improved through an iterative RPCA algorithm. The experimental results on both simulated and real sequences demonstrate that the proposed algorithm is more robust and outperforms state-of-the-art algorithms in object detection and tracking.
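A minimal RPCA sketch via the classic inexact augmented Lagrange multiplier scheme: a matrix D whose columns are vectorized frames is split into a low-rank background L and a sparse foreground S. The Graph Cut refinement and the fractal-spectrum tracking constraints of the paper are omitted, and the step-size heuristic is a common default, not necessarily the paper's:

```python
import numpy as np

def shrink(x, tau):                       # soft-thresholding operator
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0)

def rpca(D, n_iter=100):
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(D).sum()   # common step-size heuristic
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt       # singular value thresholding
        S = shrink(D - L + Y / mu, lam / mu)       # sparse foreground update
        Y = Y + mu * (D - L - S)                   # dual ascent on the residual
    return L, S

D = np.random.rand(1000, 40)              # 40 vectorized frames (toy data)
background, foreground = rpca(D)
```

Thresholding the foreground S then yields a moving-object mask of the kind the paper refines with Graph Cut.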

Journal ArticleDOI
TL;DR: A face deduplication system is proposed that combines face detection with face quality evaluation to obtain the highest-quality face image of a person.
Abstract: Video surveillance systems based on face analysis have played an increasingly important role in the security industry. Compared with identification methods based on other physical characteristics, face verification is easy for people to accept. In video surveillance scenes, it is common to capture multiple faces belonging to the same person. We cannot get good face recognition results if we use all the images without considering image quality. In order to solve this problem, we propose a face deduplication system that combines face detection with face quality evaluation to obtain the highest-quality face image of a person. The experimental results in this paper also show that our method can effectively detect faces and select high-quality face images, thereby improving the accuracy of face recognition.
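A sketch of the quality-selection step only, scoring sharpness by the variance of the Laplacian (a common blur measure; the paper's actual quality model may differ):

```python
import cv2

def quality(face_bgr):
    """Higher = sharper; variance of the Laplacian as a simple quality proxy."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def best_face(face_crops):                # crops of one person from the video
    return max(face_crops, key=quality)   # keep only the highest-quality face
```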

Journal ArticleDOI
TL;DR: A novel blob reconstruction method is introduced that overcomes the mentioned limitation through optical flow based nullification, bifurcation, and unification of detected blobs.
Abstract: Significant research has been devoted to the detection of moving objects in image sequences. Detected moving objects usually contain some errors (some pixels belonging to the object are marked as non-object and vice versa). To achieve refined detection of moving objects in video, post-processing of the binary blobs detected as objects in every frame is needed. This article introduces a novel blob reconstruction method that overcomes the mentioned limitation through optical flow based nullification, bifurcation, and unification of detected blobs. To support the claimed performance of the proposed method, a comparison is made with ten widely used object detection methods on twenty-four standard moving-object scene videos, based on standard measures such as accuracy, precision, recall, and F-measure. The results clearly indicate the efficacy of the proposed method. Following this, a preliminary case study on placodal cell migration during the early development of the ectodermal organs of humans and mice was carried out using the proposed model, which tracks the cell migration promisingly.
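A sketch of the nullification step alone, assuming a binary detection mask and dense Farneback optical flow; bifurcation and unification are not shown, and the motion threshold is illustrative:

```python
import cv2
import numpy as np

def nullify_static_blobs(prev_gray, gray, mask, min_flow=0.5):
    """Remove blobs whose region shows almost no optical flow (likely noise)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)             # per-pixel motion magnitude
    n, labels = cv2.connectedComponents(mask.astype(np.uint8))
    keep = np.zeros(mask.shape, dtype=bool)
    for blob in range(1, n):                       # label 0 is the background
        region = labels == blob
        if mag[region].mean() >= min_flow:         # enough motion -> keep blob
            keep |= region
    return keep
```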

Journal ArticleDOI
TL;DR: This paper proposes a cryptographic algorithm for the kernel method that processes encrypted images without decrypting them, so the owner of the images can have them processed by classifiers belonging to other people without leaking the content of the images, while the owner also learns nothing about the classifier.
Abstract: With the advance of computer vision, technologies such as face detection and human detection have been used widely. However, when processing photos through computer vision technologies, we have to face a privacy-related problem: people do not want their photos to be distributed to others, even to take advantage of computer vision. Since the kernel method has been widely used in object classifiers, we propose a cryptographic algorithm for the kernel method that processes encrypted images without decrypting them. The owner of the images can thus have them processed by classifiers belonging to other people without leaking the content of the images to those people, and the owner also learns nothing about the classifier. In this paper, we analyze the security, correctness, and efficiency of our proposed cryptographic algorithms, and then validate their effectiveness through face detection experiments.
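As a much-simplified sketch of the setting, restricted to a linear kernel: with an additively homomorphic cryptosystem (here the python-paillier library, which is an assumption for illustration, not the paper's construction), a classifier owner can compute a score on encrypted features without seeing them:

```python
# pip install phe   (python-paillier, additively homomorphic encryption)
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=2048)

x = [0.12, -0.50, 0.33]                  # image owner's private feature vector
enc_x = [pub.encrypt(v) for v in x]      # encrypted features sent to classifier owner

w, b = [0.70, 0.10, -0.40], 0.05         # classifier owner's private parameters
enc_score = sum(xe * wi for xe, wi in zip(enc_x, w)) + b   # homomorphic dot product

print(priv.decrypt(enc_score))           # only the image owner can decrypt the score
```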

Journal ArticleDOI
TL;DR: In this article, the random finite set based multi-Bernoulli filter with a detectionless likelihood function was applied to frame-to-frame tracking of space objects observed in electro-optical imagery.
Abstract: This paper applies the random finite set based multi-Bernoulli filter with a detectionless likelihood function to frame-to-frame tracking of space objects observed in electro-optical imagery for sp...

Dissertation
02 Jul 2018
TL;DR: An end-to-end multitask objective that jointly learns object-action relationships is introduced, and an action tubelet detector is proposed that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do.
Abstract: The rise of deep learning has facilitated remarkable progress in video understanding. This thesis addresses three important tasks of video understanding: video object detection, joint object and action detection, and spatio-temporal action localization. Object class detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding boxes from still images. Recently, video has been used as an alternative source of data. Yet, training an object detector on one domain (either still images or videos) and testing on the other results in a significant performance gap compared to training and testing on the same domain. In the first part of this thesis, we examine the reasons behind this performance gap. We define and evaluate several domain shift factors: spatial location accuracy, appearance diversity, image quality, aspect distribution, and object size and camera framing. We examine the impact of these factors by comparing detection performance before and after cancelling them out. The results show that all five factors affect the performance of the detectors and that their combined effect explains the performance gap. While most existing approaches for detection in videos focus on objects or human actions separately, in the second part of this thesis we aim at detecting non-human-centric actions, i.e., objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting object-action pairs in videos, and show that both tasks of object and action detection benefit from this joint learning. In experiments on the A2D dataset, we obtain state-of-the-art results on segmentation of object-action pairs. In the third part, we are the first to propose an action tubelet detector that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do. In the same way that modern detectors rely on anchor boxes, our tubelet detector is based on anchor cuboids: it takes a sequence of frames as input and outputs tubelets, i.e., sequences of bounding boxes with associated scores. Our tubelet detector outperforms all state-of-the-art methods on the UCF-Sports, J-HMDB, and UCF-101 action localization datasets, especially at high overlap thresholds. The improvement in detection performance is explained by both more accurate scores and more precise localization.

Dissertation
02 Jul 2018
TL;DR: This thesis proposes an active search strategy for efficient object class detection and a part detection approach that exploits object context, and complements part appearance with the object appearance, its class, and the expected relative location of the parts inside it.
Abstract: Objects and parts are crucial elements for achieving automatic image understanding. The goal of the object detection task is to recognize and localize all the objects in an image. Similarly, semantic part detection attempts to recognize and localize the object parts. This thesis proposes four contributions. The first two make object detection more efficient by using active search strategies guided by image context. The last two involve parts. One of them explores the emergence of parts in neural networks trained for object detection, whereas the other improves on part detection by adding object context. First, we present an active search strategy for efficient object class detection. Modern object detectors evaluate a large set of windows using a window classifier. Instead, our search sequentially chooses which window to evaluate next based on all the information gathered before. This results in a significant reduction in the number of window evaluations necessary to detect the objects in the image. We guide our search strategy using image context and the score of the classifier. In our second contribution, we extend this active search to jointly detect pairs of object classes that appear close in the image, exploiting the valuable information that one class can provide about the location of the other. This leads to an even further reduction in the number of necessary evaluations for the smaller, more challenging classes. In the third contribution of this thesis, we study whether semantic parts emerge in Convolutional Neural Networks trained for different visual recognition tasks, especially object detection. We perform two quantitative analyses that provide a deeper understanding of their internal representation by investigating the responses of the network filters. Moreover, we explore several connections between discriminative power and semantics, which provides further insights on the role of semantic parts in the network. Finally, the last contribution is a part detection approach that exploits object context. We complement part appearance with the object appearance, its class, and the expected relative location of the parts inside it. We significantly outperform approaches that use part appearance alone in this challenging task.

Book ChapterDOI
01 Jan 2018
TL;DR: The purpose of this paper is to survey methods by which objects can be efficiently detected from any given video sequence, along with the preferred deep learning libraries for doing so.
Abstract: One of the challenging topics in the field of computer vision is the detection of stationary/non-stationary objects in a video sequence. The outcome of detection, tracking, and learning must be free from ambiguity. To effectively detect a moving object, the background information should first be subtracted from the video. However, for high-definition video, modeling techniques suffer from high computation and memory costs, which may decrease performance measures such as accuracy and efficiency in identifying the object. Identifying definite structure in a large amount of unstructured data is a prerequisite problem to be solved. The task of finding structure in large amounts of data is achieved using Deep Learning, 'which is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound, and text'. The purpose of this paper is to survey methods by which objects can be efficiently detected from any given video sequence, along with the preferred deep learning libraries.
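As a runnable starting point for the classical background-subtraction stage the chapter discusses, OpenCV's Gaussian-mixture subtractor works out of the box; the file name is a placeholder:

```python
import cv2

cap = cv2.VideoCapture("video.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)         # 255 = foreground, 127 = shadow
    fg_mask = cv2.medianBlur(fg_mask, 5)      # light cleanup of speckle noise
    # downstream: find contours / hand candidate regions to a deep detector
cap.release()
```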

Journal ArticleDOI
TL;DR: This paper presents a structured approach for efficiently exploiting the perspective information of a scene to enhance the detection of objects in monocular systems; it defines a finite grid of 3D positions on the dominant ground plane and computes occupancy maps from which object location estimates are extracted.
Abstract: This paper presents a structured approach for efficiently exploiting the perspective information of a scene to enhance the detection of objects in monocular systems. It defines a finite grid of 3D positions on the dominant ground plane and computes occupancy maps from which object location estimates are extracted. This method works on top of any detection method, either pixel-wise (e.g., background subtraction) or region-wise (e.g., detection-by-classification), which can be linked to the proposed scheme with minimal fine tuning. Its flexibility thus allows for applying this approach in a wide variety of applications and sectors, such as surveillance applications (e.g., person detection) or driver assistance systems (e.g., vehicle or pedestrian detection). Extensive results provide evidence of its excellent performance and its ease of use in combination with different image processing techniques.
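A sketch of the grid idea, assuming a calibrated image-to-ground homography H: project each detection's foot point (bottom-center of its box) onto the ground plane and accumulate an occupancy map. H, the boxes, the grid extent, and the cell size below are placeholders:

```python
import cv2
import numpy as np

H = np.eye(3)                                   # image-to-ground homography (placeholder)
grid = np.zeros((50, 50))                       # finite grid on the dominant ground plane
cell = 0.2                                      # cell size in meters (illustrative)

boxes = np.array([[100, 80, 140, 220],          # detections as [x1, y1, x2, y2]
                  [300, 90, 360, 260]], dtype=np.float32)
feet = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2, boxes[:, 3]], axis=1)

ground = cv2.perspectiveTransform(feet[None], H.astype(np.float32))[0]
for gx, gy in ground:                           # accumulate the occupancy map
    i, j = int(gy / cell), int(gx / cell)
    if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
        grid[i, j] += 1                         # peaks = likely object locations
```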

Journal ArticleDOI
TL;DR: A computer vision method is presented for a mobile robot to find humans in a scene, achieving high detection accuracy and fast detection speed on both standard test datasets and real-life images.
Abstract: A computer vision method is presented for a mobile robot to find humans in a scene. Face detection is used to confirm humans. In order to reduce the search region, an optical flow algorithm is used to segment the image in advance. Asymmetry problems in face detection are explained, and corresponding solutions are put forward using a bootstrapping strategy and an asymmetric AdaBoost algorithm. In addition, Fisher discriminant analysis further improves the performance of face detection. Multi-view face models are trained to accommodate practical face detection applications. Finally, experiments demonstrate that our multi-view face detector achieves high detection accuracy and fast detection speed on both standard test datasets and real-life images.

Journal ArticleDOI
01 Feb 2018
TL;DR: The experimental results show that the corresponding face images can be retrieved according to the input face sketch, and that super-resolution can effectively enhance the image quality and detail information of the pseudo-photo.
Abstract: Considering the crucial role of face image in modern intelligent system, face image retrieval has attracted more attention for authentication, surveillance, law enforcement, and security control. In most cases, we cannot obtain the suspect’s face image directly and the best substitute is a face sketch of criminal suspect drawn by artist according to eyewitness description. It is a key step in the criminal investigation process to narrow down criminal suspect using the face sketch. At first, the face sketch is transformed to a pseudo-photo for subsequent utilization. Transformation is performed according to the classic eigenface algorithm and enhanced by super-resolution. Matching between reconstructed pseudo-photo and real face photographs is performed by Hash encoding and iterative quantization. We carried out our ideas on two public face databases, and the sketch face images are generated by photo-shopping software program. The experimental results show that the corresponding face images can be retrieved according to the input face sketch and super-resolution can effectively enhance the image quality and detail information of the pseudo-photo. Hash encoding and iterative quantization achieve the quick search of approximate face images.