
Showing papers on "Object-class detection" published in 2017


Proceedings Article
24 Mar 2017
TL;DR: In this paper, a subcategory-aware CNN was proposed for object detection and pose estimation, achieving state-of-the-art performance on both tasks on commonly used benchmarks.
Abstract: In Convolutional Neural Network (CNN)-based object detection methods, region proposal becomes a bottleneck when objects exhibit significant scale variation, occlusion or truncation. In addition, these methods mainly focus on 2D object detection and cannot estimate detailed properties of objects. In this paper, we propose subcategory-aware CNNs for object detection. We introduce a novel region proposal network that uses subcategory information to guide the proposal generating process, and a new detection network for joint detection and subcategory classification. By using subcategories related to object pose, we achieve state-of-the-art performance on both detection and pose estimation on commonly used benchmarks.

276 citations


Book Chapter
TL;DR: A face detection approach named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN) is presented to handle challenges such as heavy occlusion, extremely low resolution, and exceptional pose variation; it groups multi-scale information in both region proposal and RoI detection and allows explicit body contextual reasoning inspired by the human vision system.
Abstract: Robust face detection in the wild is one of the ultimate components to support various facial related problems, e.g., unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, 3D facial model construction, etc. Although the face detection problem has been intensely studied for decades with various commercial applications, it still meets problems in some real-world scenarios due to numerous challenges, e.g., heavy facial occlusions, extremely low resolutions, strong illumination, exceptional pose variations, image or video compression artifacts, etc. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to the region-based CNNs, our proposed network consists of the region proposal component and the region-of-interest (RoI) detection component. However, apart from that network, there are two main contributions in our proposed network that play a significant role in achieving state-of-the-art performance in face detection. First, the multi-scale information is grouped both in region proposal and RoI detection to deal with tiny face regions. Second, our proposed network allows explicit body contextual reasoning in the network, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e., the WIDER FACE Dataset, which contains a high degree of variability, as well as the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our proposed approach trained on the WIDER FACE Dataset outperforms strong baselines on the WIDER FACE Dataset by a large margin, and consistently achieves competitive results on FDDB against the recent state-of-the-art face detection methods.

256 citations


Journal Article
TL;DR: This paper revisits the co-saliency detection task and advances its development into a new phase, where the problem setting is generalized to allow the image group to contain objects in an arbitrary number of categories and the algorithms need to simultaneously detect multi-class co-salient objects from such complex data.
Abstract: With the goal of discovering the common and salient objects from the given image group, co-saliency detection has received tremendous research interest in recent years. However, as most of the existing co-saliency detection methods are performed based on the assumption that all the images in the given image group should contain co-salient objects in only one category, they can hardly be applied in practice, particularly for the large-scale image set obtained from the Internet. To address this problem, this paper revisits the co-saliency detection task and advances its development into a new phase, where the problem setting is generalized to allow the image group to contain objects in an arbitrary number of categories and the algorithms need to simultaneously detect multi-class co-salient objects from such complex data. To solve this new challenge, we decompose it into two sub-problems, i.e., how to identify subgroups of relevant images and how to discover relevant co-salient objects from each subgroup, and propose a novel co-saliency detection framework to correspondingly address the two sub-problems via two-stage multi-view spectral rotation co-clustering. Comprehensive experiments on two publicly available benchmarks demonstrate the effectiveness of the proposed approach. Notably, it can even outperform the state-of-the-art co-saliency detection methods, which are performed based on the image subgroups carefully separated by human labor.

207 citations


Proceedings Article
01 Jul 2017
TL;DR: A framework for object detection in videos is proposed, which consists of a novel tubelet proposal network to efficiently generate spatiotemporal proposals, and a Long Short-term Memory network that incorporates temporal information from tubelet proposals for achieving high object detection accuracy in videos.
Abstract: Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset. Different from object detection in static images, temporal information in videos is vital for object detection. To fully utilize temporal information, state-of-the-art methods [15, 14] are based on spatiotemporal tubelets, which are essentially sequences of associated bounding boxes across time. However, the existing methods have major limitations in generating tubelets in terms of quality and efficiency. Motion-based [14] methods are able to obtain dense tubelets efficiently, but the lengths are generally only several frames, which is not optimal for incorporating long-term temporal information. Appearance-based [15] methods, usually involving generic object tracking, could generate long tubelets, but are usually computationally expensive. In this work, we propose a framework for object detection in videos, which consists of a novel tubelet proposal network to efficiently generate spatiotemporal proposals, and a Long Short-term Memory (LSTM) network that incorporates temporal information from tubelet proposals for achieving high object detection accuracy in videos. Experiments on the large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed framework for object detection in videos.
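
The abstract above describes tubelets as sequences of associated bounding boxes across time. As a rough illustration of that data structure only, the following sketch links per-frame boxes greedily by intersection-over-union (IoU). It is not the paper's tubelet proposal network; the function names `iou` and `link_tubelet` and the `min_iou` threshold are assumptions made for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tubelet(frames: List[List[Box]], start: Box, min_iou: float = 0.5) -> List[Box]:
    """Greedily extend a tubelet from `start` (a box in frame 0) through later frames.

    In each later frame, the box with the highest IoU to the previous tubelet box
    is appended; linking stops when no box reaches `min_iou`.
    """
    tubelet = [start]
    for boxes in frames[1:]:
        best = max(boxes, key=lambda b: iou(tubelet[-1], b), default=None)
        if best is None or iou(tubelet[-1], best) < min_iou:
            break
        tubelet.append(best)
    return tubelet


# Example: an object drifting right for three frames, then disappearing.
frames = [
    [(0, 0, 10, 10)],
    [(1, 0, 11, 10), (50, 50, 60, 60)],
    [(2, 0, 12, 10)],
    [(40, 40, 50, 50)],
]
tubelet = link_tubelet(frames, frames[0][0])
```

Greedy linking of this kind tends to produce short tubelets when detections are missed, which is consistent with the length limitation the abstract attributes to existing approaches.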

177 citations


Journal Article
TL;DR: Presented as the first approach to incorporate deep neural networks for tool detection and localization in robot-assisted surgery (RAS) videos, this architecture applies a region proposal network (RPN) and a multimodal two-stream convolutional network for object detection to jointly predict objectness and localization on a fusion of image and temporal motion cues.
Abstract: Video understanding of robot-assisted surgery (RAS) videos is an active research area. Modeling the gestures and skill level of surgeons presents an interesting problem. The insights drawn may be applied in effective skill acquisition, objective skill assessment, real-time feedback, and human–robot collaborative surgeries. We propose a solution to the tool detection and localization open problem in RAS video understanding, using a strictly computer vision approach and the recent advances of deep learning. We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos. To the best of our knowledge, this approach will be the first to incorporate deep neural networks for tool detection and localization in RAS videos. Our architecture applies a region proposal network (RPN) and a multimodal two stream convolutional network for object detection to jointly predict objectness and localization on a fusion of image and temporal motion cues. Our results with an average precision of 91% and a mean computation time of 0.1 s per test frame detection indicate that our study is superior to conventionally used methods for medical imaging while also emphasizing the benefits of using RPN for precision and efficiency. We also introduce a new data set, ATLAS Dione, for RAS video understanding. Our data set provides video data of ten surgeons from Roswell Park Cancer Institute, Buffalo, NY, USA, performing six different surgical tasks on the daVinci Surgical System (dVSS) with annotations of robotic tools per frame.

165 citations


Proceedings Article
26 Jul 2017
TL;DR: The paper introduces the basic concept and architecture of CNNs, some public object detection datasets, and the concept of evaluation criteria; it then reviews current research achievements and ideas in object detection, summarizing the important progress and discussing future directions.
Abstract: With the development of intelligent devices and social media, the volume of data on the Internet has grown rapidly. As an important aspect of image processing, object detection has become a popular international research field. In recent years, the powerful feature learning and transfer learning abilities of Convolutional Neural Networks (CNNs) have received growing interest within the computer vision community, leading to a series of important breakthroughs in object detection. How to apply CNNs to object detection for better performance is therefore a significant topic to survey. First, the paper introduces the basic concept and architecture of CNNs. Second, it surveys methods for solving the existing problems of conventional object detection, mainly analyzing detection algorithms based on region proposal and on regression. Third, it mentions some means of improving the performance of object detection. Then the paper introduces some public object detection datasets and the concept of evaluation criteria. Finally, it reviews the current research achievements and ideas in object detection, summarizing the important progress and discussing future directions.

150 citations


Journal Article
TL;DR: A highly efficient and robust integrated geospatial object detection framework based on faster region-based convolutional neural network (Faster R-CNN) is proposed in this paper, which realizes the integrated procedure by sharing features between the region proposal generation stage and the object detection stage.
Abstract: Geospatial object detection from high spatial resolution (HSR) remote sensing imagery is a significant and challenging problem when further analyzing object-related information for civil and engineering applications. However, the computational efficiency and the separate region generation and localization steps are two big obstacles for the performance improvement of the traditional convolutional neural network (CNN)-based object detection methods. Although recent object detection methods based on CNN can extract features automatically, these methods still separate the feature extraction and detection stages, resulting in high time consumption and low efficiency. As a significant influencing factor, the acquisition of a large quantity of manually annotated samples for HSR remote sensing imagery objects requires expert experience, which is expensive and unreliable. Despite the progress made in natural image object detection fields, the complex object distribution makes it difficult to directly deal with the HSR remote sensing imagery object detection task. To solve the above problems, a highly efficient and robust integrated geospatial object detection framework based on faster region-based convolutional neural network (Faster R-CNN) is proposed in this paper. The proposed method realizes the integrated procedure by sharing features between the region proposal generation stage and the object detection stage. In addition, a pre-training mechanism is utilized to improve the efficiency of the multi-class geospatial object detection by transfer learning from the natural imagery domain to the HSR remote sensing imagery domain. Extensive experiments and comprehensive evaluations on a publicly available 10-class object detection dataset were conducted to evaluate the proposed method.

139 citations


Journal Article
TL;DR: This paper proposes a novel multi-task learning (MTL) method to jointly model object detection and distance prediction with a Cartesian product-based multi-task combination strategy, and mathematically proves that this strategy is superior to the linear multi-task combination strategy usually used in MTL models when the tasks are not independent.

124 citations


Proceedings Article
10 Apr 2017
TL;DR: Faster Regions with CNNs (R-CNNs), a state-of-the-art algorithm, is applied to detect not one or two but hundreds of object types in near real-time.
Abstract: Real-time object detection is crucial for many applications of Unmanned Aerial Vehicles (UAVs) such as reconnaissance and surveillance, search-and-rescue, and infrastructure inspection. In the last few years, Convolutional Neural Networks (CNNs) have emerged as a powerful class of models for recognizing image content, and are widely considered in the computer vision community to be the de facto standard approach for most problems. However, object detection based on CNNs is extremely computationally demanding, typically requiring high-end Graphics Processing Units (GPUs) that require too much power and weight, especially for a lightweight and low-cost drone. In this paper, we propose moving the computation to an off-board computing cloud, while keeping low-level object detection and short-term navigation onboard. We apply Faster Regions with CNNs (R-CNNs), a state-of-the-art algorithm, to detect not one or two but hundreds of object types in near real-time.

112 citations


Proceedings Article
01 Sep 2017
TL;DR: A new deep learning based face recognition attendance system that is composed of several essential steps developed using today's most advanced techniques: CNN cascade for face detection and CNN for generating face embeddings.
Abstract: In the interest of recent accomplishments in the development of deep convolutional neural networks (CNNs) for face detection and recognition tasks, a new deep learning based face recognition attendance system is proposed in this paper. The entire process of developing a face recognition model is described in detail. This model is composed of several essential steps developed using today's most advanced techniques: CNN cascade for face detection and CNN for generating face embeddings. The primary goal of this research was the practical employment of these state-of-the-art deep learning approaches for face recognition tasks. Due to the fact that CNNs achieve the best results for larger datasets, which is not the case in a production environment, the main challenge was applying these methods on smaller datasets. A new approach for image augmentation for face recognition tasks is proposed. The overall accuracy was 95.02% on a small dataset of the original face images of employees in the real-time environment. The proposed face recognition model could be integrated in another system with or without some minor alterations as a supporting or a main component for monitoring purposes.

109 citations


Proceedings Article
24 Mar 2017
TL;DR: This paper systematically investigates the potential for aerial images of Fast R-CNN and Faster R-CNN, which achieve top-performing results on common detection benchmark datasets, and proposes a new network that clearly outperforms state-of-the-art methods for vehicle detection in aerial images.
Abstract: Vehicle detection in aerial images is a crucial image processing step for many applications like screening of large areas. In recent years, several deep learning based frameworks have been proposed for object detection. However, these detectors were developed for datasets that considerably differ from aerial images. In this paper, we systematically investigate the potential of Fast R-CNN and Faster R-CNN for aerial images, which achieve top performing results on common detection benchmark datasets. Therefore, the applicability of 8 state-of-the-art object proposals methods used to generate a set of candidate regions and of both detectors is examined. Relevant adaptations of the object proposals methods are provided. To overcome shortcomings of the original approach in case of handling small instances, we further propose our own network that clearly outperforms state-of-the-art methods for vehicle detection in aerial images. All experiments are performed on two publicly available datasets to account for differing characteristics such as ground sampling distance, number of objects per image and varying backgrounds.

Proceedings Article
23 Oct 2017
TL;DR: The results show that each component in the multispectral image was individually useful for the task of object detection when applied to different types of objects, and the mean average precision (mAP) of multispectral object detection is 13% higher than that of RGB-only object detection.
Abstract: Recently, researchers have actively conducted studies on mobile robot technologies that involve autonomous driving. To implement an automatic mobile robot (e.g., an automated driving vehicle) in traffic, robustly detecting various types of objects such as cars, people, and bicycles in various conditions such as daytime and nighttime is necessary. In this paper, we propose the use of multispectral images as input information for object detection in traffic. Multispectral images are composed of RGB images, near-infrared images, middle-infrared images, and far-infrared images and have multilateral information as a whole. For example, some objects that cannot be visually recognized in the RGB image can be detected in the far-infrared image. To train our multispectral object detection system, we need a multispectral dataset for object detection in traffic. Since such a dataset does not currently exist, in this study we generated our own multispectral dataset. In addition, we propose a multispectral ensemble detection pipeline to fully use the features of multispectral images. The pipeline is divided into two parts: the single-spectral detection model and the ensemble part. We conducted two experiments in this work. In the first experiment, we evaluate our single-spectral object detection model. Our results show that each component in the multispectral image was individually useful for the task of object detection when applied to different types of objects. In the second experiment, we evaluate the entire multispectral object detection system and show that the mean average precision (mAP) of multispectral object detection is 13% higher than that of RGB-only object detection.

Journal Article
TL;DR: This paper proposes a new feature descriptor called common encoding model for heterogeneous face recognition, which is able to capture common discriminant information, such that the large modality gap can be significantly reduced at the feature extraction stage.
Abstract: Heterogeneous face recognition is an important, yet challenging problem in face recognition community. It refers to matching a probe face image to a gallery of face images taken from alternate imaging modality. The major challenge of heterogeneous face recognition lies in the great discrepancies between different image modalities. Conventional face feature descriptors, e.g., local binary patterns, histogram of oriented gradients, and scale-invariant feature transform, are mostly designed in a handcrafted way and thus generally fail to extract the common discriminant information from the heterogeneous face images. In this paper, we propose a new feature descriptor called common encoding model for heterogeneous face recognition, which is able to capture common discriminant information, such that the large modality gap can be significantly reduced at the feature extraction stage. Specifically, we turn a face image into an encoded one with the encoding model learned from the training data, where the difference of the encoded heterogeneous face images of the same person can be minimized. Based on the encoded face images, we further develop a discriminant matching method to infer the hidden identity information of the cross-modality face images for enhanced recognition performance. The effectiveness of the proposed approach is demonstrated (on several public-domain face datasets) in two typical heterogeneous face recognition scenarios: matching NIR faces to VIS faces and matching sketches to photographs.

Proceedings Article
01 Jan 2017
TL;DR: Various face detection algorithms, such as Viola-Jones, SMQT features & SNOW Classifier, Neural Network-Based Face Detection and Support Vector Machine-Based face detection, are discussed and analyzed, and then compared based on the precision and recall values calculated using the DetEval software.
Abstract: With the tremendous increase in video and image databases, there is a great need for automatic understanding and examination of data by intelligent systems, as doing so manually is becoming infeasible. Narrowing this down to one specific domain, among the most specific objects that can be traced in images are people, i.e., faces. Face detection is becoming a challenge owing to its increasing use in a number of applications. It is the first step for face recognition, face analysis, and detection of other facial features. In this paper, various face detection algorithms are discussed and analyzed, such as Viola-Jones, SMQT features & SNOW Classifier, Neural Network-Based Face Detection, and Support Vector Machine-Based face detection. All these face detection methods are compared based on the precision and recall values calculated using the DetEval software, which works with precise values of the bounding boxes around the faces to give accurate results.

Journal Article
TL;DR: In this paper, a novel superpixel-based face sketch–photo synthesis method is presented that estimates face structures through image segmentation: face images are first segmented into superpixels, which are then dilated to enhance the compatibility of neighboring superpixels.
Abstract: Face sketch–photo synthesis technique has attracted growing attention in many computer vision applications, such as law enforcement and digital entertainment. Existing methods either simply perform the face sketch–photo synthesis on the holistic image or divide the face image into regular rectangular patches ignoring the inherent structure of the face image. In view of such situations, this paper presents a novel superpixel-based face sketch–photo synthesis method by estimating the face structures through image segmentation. In our proposed method, face images are first segmented into superpixels, which are then dilated to enhance the compatibility of neighboring superpixels. Each input face image induces a specific graphical structure modeled by Markov networks. We employ a two-stage synthesis process to learn the face structures through Markov networks constructed from two scales of dilation, respectively. Experiments on several public databases demonstrate that our proposed face sketch–photo synthesis method achieves superior performance compared with the state-of-the-art methods.

Journal Article
TL;DR: Experimental results show that the proposed scheme can outperform modern moving object detection methods in terms of precision, recall, F-measure, and other measurements.
Abstract: Moving object extraction is the core of event detection in video surveillance. Although many related methods have been proposed to extract moving objects, even advanced applications still encounter cavity problems, which are false detection and deficiencies resulting from cavities inside the body or fragmented foreground objects. In this paper, an entirely new structure for extracting moving objects is proposed. This scheme is based on the concepts of hysteresis thresholding and motion compensation, which constitute spatial and temporal compensations, respectively. Experimental results show that the proposed scheme can outperform modern moving object detection methods in terms of precision, recall, F-measure, and other measurements.
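
The abstract names hysteresis thresholding as the spatial half of the scheme. The sketch below shows only the generic technique applied to a frame-difference map: pixels at or above a high threshold seed the foreground, and pixels at or above a low threshold are kept only if they connect to a seed. It makes no claim about the authors' exact formulation or their motion compensation step; the function name, the 8-connectivity choice, and the thresholds are assumptions for illustration.

```python
from collections import deque
from typing import List


def hysteresis_threshold(diff: List[List[float]], low: float, high: float) -> List[List[bool]]:
    """Return a foreground mask for a 2D difference map using two thresholds."""
    rows, cols = len(diff), len(diff[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Strong pixels (>= high) are accepted outright and used as seeds.
    for r in range(rows):
        for c in range(cols):
            if diff[r][c] >= high:
                mask[r][c] = True
                queue.append((r, c))

    # Weak pixels (>= low) are accepted only when 8-connected to an accepted pixel.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    if not mask[nr][nc] and diff[nr][nc] >= low:
                        mask[nr][nc] = True
                        queue.append((nr, nc))
    return mask


# Example: the weak pixels next to the strong one survive; the isolated one does not.
diff = [
    [0, 3, 9, 3, 0],
    [0, 0, 0, 0, 0],
    [3, 0, 0, 0, 0],
]
mask = hysteresis_threshold(diff, low=2, high=8)
```

Keeping weak responses that touch strong ones is what lets this kind of thresholding fill gaps inside a moving body, which relates to the cavity problem the abstract mentions.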

Proceedings Article
01 Jan 2017
TL;DR: This paper reviews the challenges and aspects of object detection and tracking, tasks central to applications such as video surveillance in dynamic environments (sports, public safety, and traffic management).
Abstract: Image processing is a method of extracting useful information from an image by converting it into digital form and performing operations on it. Object detection and tracking are important and challenging tasks in applications such as video surveillance and vehicle navigation. Video surveillance is a technology that works in dynamic environments in various events such as sports, public safety, and traffic management. This paper reviews the various challenges and aspects of detection and tracking of objects.

Proceedings Article
Liping Yuan, Zhiyi Qu, Yufeng Zhao, Hongshuai Zhang, Qing Nian
25 Mar 2017
TL;DR: Experimental results show that the proposed Convolutional Neural Network, built on TensorFlow, an open source deep learning framework, has better recognition accuracy and higher robustness in complex environments.
Abstract: Face recognition is a hot research field in computer vision, and it has a high practical value for the detection and recognition of specific sensitive characters. Research found that in traditional hand-crafted features, there are uncontrolled environments such as pose, facial expression, illumination and occlusion influencing the accuracy of recognition and it has poor performance, so the deep learning method is adopted. On the basis of face detection, a Convolutional Neural Network (CNN) based on TensorFlow, an open source deep learning framework, is proposed for face recognition. Experimental results show that the proposed method has better recognition accuracy and higher robustness in complex environment.

Journal Article
TL;DR: This letter presents a novel weighted low-rank matrix recovery (WLRR) model for salient object detection, which shows competitive results as compared with 24 state-of-the-art methods.
Abstract: Image-based salient object detection is a useful and important technique, which can promote the efficiency of several applications such as object detection, image classification/retrieval, object co-segmentation, and content-based image editing. In this letter, we present a novel weighted low-rank matrix recovery (WLRR) model for salient object detection. In order to facilitate efficient salient objects-background separation, a high-level background prior map is estimated by employing the property of the color, location, and boundary connectivity, and then this prior map is ensembled into a weighting matrix which indicates the likelihood that each image region belongs to the background. The final salient object detection task is formulated as the WLRR model with the weighting matrix. Both quantitative and qualitative experimental results on three challenging datasets show competitive results as compared with 24 state-of-the-art methods.

Journal Article
TL;DR: A fast and robust face occlusion detection algorithm for ATM surveillance, which is demonstrated to be effective and efficient in handling arbitrarily occluded faces.

Journal Article
TL;DR: Compared with existing state-of-the-art pedestrian detection algorithms, the proposed fusion saliency-based region of interest (ROI) detection method demonstrates a much higher pedestrian detection rate with a comparably short processing time.
Abstract: Night time pedestrian detection is more and more important in advanced driver assistant systems (ADAS). Traditional pedestrian detection algorithms in far infrared (FIR) images lack accuracy and have long processing times. Focusing on this issue, in this paper, a visual saliency-based pedestrian detection algorithm is proposed. First, areas that contain suspected pedestrians are detected using a fusion saliency-based method. Then, the sub-image of the suspected pedestrian is used as an input to a histogram of local intensity difference feature and cross kernel-based support vector machine classifier to make a final determination. Experiments performed using a real FIR road image data set demonstrated that the proposed fusion saliency-based region of interest (ROI) detection method has the largest pedestrian inclusion rate and the smallest ROI proportion compared with three other methods. Besides, compared with existing state-of-the-art pedestrian detection algorithms, the proposed method demonstrates a much higher pedestrian detection rate with a comparably short processing time.

Journal Article
TL;DR: An effective method for accurate object detection, inspired by the mechanism of memory and prediction in the human brain, is proposed, together with a memory-based prediction model specially designed to predict potential object locations in surveillance scenes.

Journal Article
TL;DR: A novel approach to face recognition which simultaneously tackles three combined challenges: 1) uneven illumination; 2) partial occlusion; and 3) limited training data, and it is shown that the new method performs competitively even when the training images are corrupted.
Abstract: In this paper, we introduce a novel approach to face recognition which simultaneously tackles three combined challenges: 1) uneven illumination; 2) partial occlusion; and 3) limited training data. The new approach performs lighting normalization, occlusion de-emphasis and finally face recognition, based on finding the largest matching area (LMA) at each point on the face, as opposed to traditional fixed-size local area-based approaches. Robustness is achieved with novel approaches for feature extraction, LMA-based face image comparison and unseen data modeling. On the extended YaleB and AR face databases for face identification, our method using only a single training image per person, outperforms other methods using a single training image, and matches or exceeds methods which require multiple training images. On the labeled faces in the wild face verification database, our method outperforms comparable unsupervised methods. We also show that the new method performs competitively even when the training images are corrupted.

Proceedings Article
06 Jun 2017
TL;DR: This paper introduces an improved scheme for generating anchor proposals and proposes a modification to Faster R-CNN which leverages higher-resolution feature maps for small objects and evaluates the approach on the FlickrLogos dataset.
Abstract: Many modern approaches for object detection are two-staged pipelines. The first stage identifies regions of interest which are then classified in the second stage. Faster R-CNN is such an approach for object detection which combines both stages into a single pipeline. In this paper we apply Faster R-CNN to the task of company logo detection. Motivated by its weak performance on small object instances, we examine in detail both the proposal and the classification stage with respect to a wide range of object sizes. We investigate the influence of feature map resolution on the performance of those stages. Based on theoretical considerations, we introduce an improved scheme for generating anchor proposals and propose a modification to Faster R-CNN which leverages higher-resolution feature maps for small objects. We evaluate our approach on the FlickrLogos dataset improving the RPN performance from 0.52 to 0.71 (MABO) and the detection performance from 0.52 to 0.67 (mAP).
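
The abstract reports proposal quality as MABO (mean average best overlap). As commonly defined, the average best overlap for a class is the mean, over that class's ground-truth boxes, of the highest IoU achieved by any proposal; MABO is the mean of those per-class values. The sketch below computes it for a single image under that definition. The function names, the box layout, and the single-image simplification are assumptions for illustration, not the paper's evaluation code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_best_overlap(gt_boxes: List[Box], proposals: List[Box]) -> float:
    """Mean over ground-truth boxes of the best IoU with any proposal."""
    if not gt_boxes:
        return 0.0
    best = [max((iou(g, p) for p in proposals), default=0.0) for g in gt_boxes]
    return sum(best) / len(best)


def mabo(gt_by_class: Dict[str, List[Box]], proposals: List[Box]) -> float:
    """Mean of the per-class average best overlap (classes without boxes are skipped)."""
    scores = [average_best_overlap(b, proposals) for b in gt_by_class.values() if b]
    return sum(scores) / len(scores) if scores else 0.0


# Example: one logo is matched exactly, the other only by a half-height proposal.
gt = {"logo_a": [(0, 0, 10, 10)], "logo_b": [(20, 20, 30, 30)]}
proposals = [(0, 0, 10, 10), (20, 20, 30, 25)]
score = mabo(gt, proposals)
```

Because each ground-truth box contributes only its best-matching proposal, MABO measures how well the proposal stage covers objects and is independent of the classifier that follows.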

Proceedings ArticleDOI
01 Aug 2017
TL;DR: Results show this method is quite fast and effective in detecting cars in real-time CCTV footage.
Abstract: In this paper we describe a vehicle detection technique that can be used for traffic surveillance systems. An intelligent traffic surveillance system, equipped with electronic devices, works by communicating with moving vehicles about traffic conditions, monitoring rules and regulations, and avoiding collisions between cars. The first step in this process is therefore the detection of cars. The system detects vehicles using Haar-like features, which are generally used for face detection. Haar feature-based cascade classifiers are an effective object detection method first proposed by Viola and Jones. It is a machine-learning-based technique that uses a set of positive and negative images for training. Results show this method is quite fast and effective in detecting cars in real-time CCTV footage.
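
To make the notion of a Haar-like feature concrete, the following sketch computes a two-rectangle feature from an integral image, the building block of Viola-Jones cascades. It does not reproduce the paper's trained detector; the function names, the "left half minus right half" sign convention, and the toy image are assumptions chosen for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 4x4 patch: bright left half, dark right half (a vertical edge).
patch = [[10, 10, 0, 0] for _ in range(4)]
ii = integral_image(patch)
print(two_rect_horizontal(ii, 0, 0, 4, 4))  # 80
```

Because each rectangle sum needs only four table lookups regardless of its size, thousands of such features can be evaluated per window, which is what makes cascade classifiers fast enough for video.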

Proceedings ArticleDOI
02 Jun 2017
TL;DR: A novel dataset from target real-world surveillance videos is constructed automatically and incrementally through face detection, tracking, labeling and purifying, and a convolutional neural network is fine-tuned on the labeled dataset.
Abstract: Robust face recognition in real-world surveillance videos is a challenging but important issue due to the needs of practical applications such as security monitoring. While current face recognition systems perform well in relatively constrained scenes, they tend to suffer from variations in pose, illumination or facial expression in real-world surveillance videos. In this paper, we propose a deep learning method for face recognition in real-world surveillance videos. First, a novel dataset from target real-world surveillance videos is constructed automatically and incrementally through face detection, tracking, labeling and purifying. Then, a convolutional neural network is fine-tuned on the labeled dataset. On the testing dataset collected from the campus surveillance system, the fine-tuned network achieves a recognition accuracy of 92.1%, clearly outperforming the network without fine-tuning, which achieves a recognition accuracy of 83.6%.

Proceedings ArticleDOI
17 Jul 2017
TL;DR: It is demonstrated that Convolutional Neural Networks can also be used for face detection from low-resolution thermal images acquired with a portable camera, and the current state of the art in the area of image classification and face tracking in thermography was significantly outperformed.
Abstract: Recently, capabilities of many computer vision tasks have significantly improved due to advances in Convolutional Neural Networks. In our research, we demonstrate that they can also be used for face detection from low-resolution thermal images acquired with a portable camera. The physical size of the camera used in our research allows for embedding it in a wearable device or indoor remote monitoring solution for elderly and disabled people. The benefits of the proposed architecture were experimentally verified on thermal video sequences acquired in various scenarios to address possible limitations of remote diagnostics: movements of the person performing the diagnosis and movements of the examined person. The achieved short processing time (42.05 ± 0.21 ms) along with high model accuracy (false positives: 0.43%; true positives for the patient focused on a certain task: 89.2%) clearly indicates that the current state of the art in the area of image classification and face tracking in thermography was significantly outperformed.

Journal ArticleDOI
TL;DR: It is proved that good results can be obtained by exploiting color and texture information in a multi-stage process: pre-selection, fine-selection and post processing.
Abstract: An approach for candidate pre-selection based on corners and color information. A robust approach for object detection and recognition; Bag of Words and Deep Neural Networks are compared. A post-processing step, used to combine multiple detections of the same object. A deep experimental evaluation on the complex Grozi-120 public dataset. Object detection and recognition are challenging computer vision tasks receiving great attention due to the large number of applications. This work focuses on the detection/recognition of products on supermarket shelves; this framework has a number of practical applications such as providing additional product/price information to the user or guiding visually impaired customers during shopping. The automatic creation of planograms (i.e., actual layout of products on shelves) is also useful for commercial analysis and management of large stores. Although in many object detection/recognition contexts it can be assumed that training images are representative of the real operational conditions, in our scenario such an assumption is not realistic because the only training images available are acquired in well-controlled conditions. This gap between the training and test data makes the object detection and recognition tasks far more complex and requires very robust techniques. In this paper we prove that good results can be obtained by exploiting color and texture information in a multi-stage process: pre-selection, fine-selection and post-processing. For fine-selection we compared a classical Bag of Words technique with a more recent Deep Neural Networks approach and found interesting outcomes. Extensive experiments on datasets of varying complexity are discussed to highlight the main issues characterizing this problem, and to guide toward the practical development of a real application.

Journal ArticleDOI
Zhe Chen, Zhen Zhang, Fengzhao Dai, Yang Bu, Huibin Wang
03 Aug 2017-Sensors
TL;DR: The global contrast of various features is used to initially identify the region of interest (ROI), which is then filtered by the image segmentation method, producing the final underwater object detection results.
Abstract: In this paper, we propose an underwater object detection method using monocular vision sensors. In addition to commonly used visual features such as color and intensity, we investigate the potential of underwater object detection using light transmission information. The global contrast of various features is used to initially identify the region of interest (ROI), which is then filtered by the image segmentation method, producing the final underwater object detection results. We test the performance of our method with diverse underwater datasets. Samples of the datasets are acquired by a monocular camera with different qualities (such as resolution and focal length) and setups (viewing distance, viewing angle, and optical environment). It is demonstrated that our ROI detection method is necessary and can largely remove the background noise and significantly increase the accuracy of our underwater object detection method.
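
The idea of using global contrast to pick an initial region of interest can be sketched as follows. The abstract does not give the authors' formulation, so everything here is an assumption: contrast is taken as a pixel's mean absolute difference from every pixel in a single scalar feature map, and the ROI is thresholded at the mean contrast. The names `global_contrast` and `roi_mask` are hypothetical.

```python
def global_contrast(feature):
    """Per-pixel global contrast of a 2D scalar feature map.

    Each output value is the mean absolute difference between that pixel
    and every pixel in the map (an assumed, generic definition).
    """
    values = [v for row in feature for v in row]
    n = len(values)
    return [[sum(abs(v - u) for u in values) / n for v in row] for row in feature]


def roi_mask(saliency):
    """Mark pixels whose contrast exceeds the mean contrast (assumed rule)."""
    values = [v for row in saliency for v in row]
    threshold = sum(values) / len(values)
    return [[v > threshold for v in row] for row in saliency]


# Toy feature map: a single bright pixel on a uniform background.
feature = [[0, 0, 0],
           [0, 9, 0],
           [0, 0, 0]]
mask = roi_mask(global_contrast(feature))
print(mask[1][1], mask[0][0])  # True False
```

In a multi-feature setting such as the one described above (color, intensity, light transmission), one such map per feature would be computed and combined before segmentation; how the paper combines them is not stated in the abstract.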

Proceedings ArticleDOI
21 Jul 2017
TL;DR: The paper focuses on the framework design and working principles of the models, analyzes model performance in terms of real-time capability and detection accuracy, discusses the challenges in object detection based on deep learning, and offers some solutions for reference.
Abstract: Object detection based on deep learning is an important application of deep learning technology, characterized by its strong capability for feature learning and feature representation compared with traditional object detection methods. The paper first introduces the classical methods in object detection and expounds the relation and difference between the classical methods and the deep learning methods. It then introduces the emergence of object detection methods based on deep learning and elaborates on the most typical current methods. In describing the methods, the paper focuses on the framework design and working principles of the models and analyzes model performance in terms of real-time capability and detection accuracy. Finally, it discusses the challenges in object detection based on deep learning and offers some solutions for reference.