
Showing papers in "IET Computer Vision in 2017"


Journal ArticleDOI
TL;DR: The authors present an analytical framework to classify and to evaluate these methods based on some important functional measures, and a categorisation of the state-of-the-art approaches in deep learning for human action recognition is presented.
Abstract: A study is carried out on one of the most important issues in human action recognition: how to create proper data representations with a high-level abstraction from high-dimensional, noisy video data. Most of the recent successful studies in this area focus on deep learning. Deep learning methods have gained superiority over other approaches in the field of image recognition. In this survey, the authors first investigate the role of deep learning in both image and video processing and recognition. Owing to the variety and abundance of deep learning methods, the authors discuss them in a comparative form. For this purpose, the authors present an analytical framework to classify and to evaluate these methods based on some important functional measures. Furthermore, a categorisation of the state-of-the-art approaches in deep learning for human action recognition is presented. The authors summarise the significantly related works in each approach and discuss their performance.

60 citations


Journal ArticleDOI
TL;DR: The proposed framework performs well in classifying the digital mammograms as normal, benign or malignant and its subclasses as well and exhibits significant improvement in performance over the traditional methods.
Abstract: In this study, a novel deep learning-based framework for classifying digital mammograms is introduced. The development of this methodology is based on deep learning strategies that model the presence of tumour tissues with level sets. It is difficult to robustly segment mammogram images due to the low contrast between normal and lesion tissues. Therefore, the Chan-Vese level set method is used to extract the initial contour of mammograms, and a deep learning convolutional neural network (DL-CNN) algorithm is used to learn the features of mammary-specific masses and microcalcification clusters. To increase the classification accuracy and reduce false positives, a well-known fully complex-valued relaxation network classifier is used in the last stage of the DL-CNN network. Experimental results using the standard benchmark breast cancer datasets (MIAS and BCDR) show that the proposed method exhibits significant improvement in performance over traditional methods. The accuracy, sensitivity, specificity and AUC achieved are 99%, 0.9875, 1.0 and 0.9815, respectively. The proposed framework performs well in classifying digital mammograms as normal, benign or malignant, as well as their subclasses.
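
The two-stage idea above (a Chan-Vese contour to localise candidate tissue, then a CNN to classify it) can be sketched roughly as follows. This is a minimal illustration using scikit-image's chan_vese and an assumed toy network; the layer sizes and patch size are assumptions, not the authors' DL-CNN or their relaxation-network classifier.

```python
# Sketch: Chan-Vese initialisation, then CNN classification of the region.
import numpy as np
import torch
import torch.nn as nn
from skimage.segmentation import chan_vese

def lesion_mask(mammogram: np.ndarray) -> np.ndarray:
    # Chan-Vese level set gives the initial contour / candidate region.
    return chan_vese(mammogram.astype(float), mu=0.25)

class SmallCNN(nn.Module):
    # Stand-in for the paper's DL-CNN; depth and widths are assumptions.
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, n_classes))

    def forward(self, x):
        return self.head(self.features(x))

if __name__ == "__main__":
    img = np.random.rand(64, 64)                # toy stand-in for a mammogram patch
    roi = torch.from_numpy((img * lesion_mask(img)).astype("float32"))[None, None]
    logits = SmallCNN()(roi)                    # normal / benign / malignant scores
    print(logits.shape)
```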

56 citations


Journal ArticleDOI
TL;DR: The authors identify and introduce existing, publicly available, benchmark datasets and software resources that fuse colour and depth data for MHT and present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets.
Abstract: Multiple human tracking (MHT) is a fundamental task in many computer vision applications. Appearance-based approaches, primarily formulated on RGB data, are constrained and affected by problems arising from occlusions and/or illumination variations. In recent years, the arrival of cheap RGB-depth devices has led to many new approaches to MHT, and many of these integrate colour and depth cues to improve each and every stage of the process. In this survey, the authors present the common processing pipeline of these methods and review their methodology based (a) on how they implement this pipeline and (b) on what role depth plays within each stage of it. They identify and introduce existing, publicly available, benchmark datasets and software resources that fuse colour and depth data for MHT. Finally, they present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets.

42 citations


Journal ArticleDOI
TL;DR: This study addresses the automatic multi-person tracking problem in complex scenes from a single, static, uncalibrated camera using a sequential tracking-by-detection framework, which can be applied to real-time applications.
Abstract: This study addresses the automatic multi-person tracking problem in complex scenes from a single, static, uncalibrated camera. In contrast with offline tracking approaches, a novel online multi-person tracking method is proposed based on a sequential tracking-by-detection framework, which can be applied to real-time applications. A two-stage data association is first developed to handle the drifting targets stemming from occlusions and people's abrupt motion changes. Subsequently, a novel online appearance learning is developed by using the incremental/decremental support vector machine with an adaptive training sample collection strategy to ensure reliable data association and rapid learning. Experimental results show the effectiveness and robustness of the proposed method while demonstrating its compatibility with real-time applications.
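
The data association step at the heart of such a tracking-by-detection pipeline can be illustrated with a minimal sketch. Here a plain Euclidean cost and the Hungarian algorithm stand in for the paper's two-stage association and incremental SVM appearance scores, which are not reproduced.

```python
# Minimal one-stage association sketch: assign detections to tracks by
# solving a linear assignment problem over a position cost (an assumption).
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, max_cost=50.0):
    # Cost: Euclidean distance between track predictions and detections.
    cost = np.linalg.norm(tracks[:, None] - detections[None, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]

tracks = np.array([[10.0, 12.0], [40.0, 42.0]])
detections = np.array([[11.0, 13.0], [80.0, 90.0], [39.0, 41.0]])
print(associate(tracks, detections))   # [(0, 0), (1, 2)]
```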

34 citations


Journal ArticleDOI
TL;DR: The authors propose a fitting-based optimisation method for salient object detection algorithms that analyses the quantitative relationship between saliency and ground truth values, and uses the derived relationship to fit the saliency values to the original saliency maps.
Abstract: To overcome some major problems with traditional saliency evaluation metrics, full-reference image quality assessment (IQA) metrics, which have similar but stricter objectives, are used. Inspired by the root mean absolute error, the authors propose a fitting-based optimisation method for salient object detection algorithms. Their algorithm analyses the quantitative relationship between saliency and ground truth values, and uses the derived relationship to fit the saliency values to the original saliency maps. This ensures that the resulting images, which are composed of fitted values, are closer to the ground truth. The proposed algorithm first computes the statistics of the ground truth and saliency maps computed by each salient object detection algorithm. These statistics are used to compute the parameters of four fitting models, which generally agree with the characteristics of the statistical data. For a new saliency map, they use the fitting model with the computed parameters to obtain the fitted saliency values, which are confined to the range [0, 255]. Finally, they evaluate their saliency optimisation algorithm using traditional evaluation metrics, IQA metrics, and a content-based image retrieval application. The results show that the proposed approach improves the quality of the optimised saliency maps.
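
A rough sketch of the fitting idea, under the assumption of a single cubic polynomial standing in for the paper's four fitting models: learn a mapping from saliency values to ground-truth values on training pairs, apply it to a new map, and clip to [0, 255].

```python
# Sketch: fit saliency-to-ground-truth mapping, apply, clip to [0, 255].
import numpy as np

def fit_saliency_model(saliency_maps, ground_truths, degree=3):
    s = np.concatenate([m.ravel() for m in saliency_maps]).astype(float)
    g = np.concatenate([m.ravel() for m in ground_truths]).astype(float)
    return np.polynomial.polynomial.polyfit(s, g, degree)   # least-squares fit

def optimise(saliency_map, coeffs):
    fitted = np.polynomial.polynomial.polyval(saliency_map.astype(float), coeffs)
    return np.clip(fitted, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
sal = [rng.integers(0, 256, (32, 32)) for _ in range(4)]
gt = [(m > 128).astype(float) * 255 for m in sal]           # toy ground truth
coeffs = fit_saliency_model(sal, gt)
print(optimise(sal[0], coeffs).dtype)
```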

34 citations


Journal ArticleDOI
TL;DR: A novel two-stream fully convolutional network architecture for action recognition is designed that can significantly reduce parameters while maintaining performance, achieving state-of-the-art results on two challenging datasets.
Abstract: Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have established impressive results for many image recognition tasks. CNNs usually contain millions of parameters, which are prone to overfitting when trained on small datasets. Therefore, CNNs do not produce superior performance over traditional methods for action recognition. In this study, the authors design a novel two-stream fully convolutional network architecture for action recognition which can significantly reduce parameters while maintaining performance. To exploit the advantage of spatial-temporal features, a linear weighted fusion method is used to fuse the two streams' feature maps, and a video pooling method is adopted to construct the video-level features. At the same time, the authors also demonstrate that improved dense trajectories have a significant impact on action recognition. The authors' method achieves state-of-the-art performance on two challenging datasets: UCF101 (93.0%) and HMDB51 (70.2%).
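
The linear weighted fusion and video pooling steps can be sketched as follows; the fusion weight and the max-pooling choice are assumptions, not the paper's exact configuration.

```python
# Sketch: weighted sum of two-stream feature maps, then video-level pooling.
import numpy as np

def linear_weighted_fusion(spatial_maps, temporal_maps, w=0.6):
    # Element-wise weighted sum of same-shaped feature maps.
    return w * spatial_maps + (1.0 - w) * temporal_maps

def video_pooling(frame_features):
    # Max-pool frame-level features over time into a video-level descriptor.
    return frame_features.max(axis=0)

frames, c, h, wd = 16, 64, 7, 7
fused = linear_weighted_fusion(np.random.rand(frames, c, h, wd),
                               np.random.rand(frames, c, h, wd))
video_feature = video_pooling(fused.reshape(frames, -1))
print(video_feature.shape)   # (c * h * w,)
```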

29 citations


Journal ArticleDOI
TL;DR: A localisation-segmentation framework is presented to cater for small-object segmentation of the left ventricle (LV) cavity from cardiac magnetic resonance (CMR) images, showing the potential of deep learning approaches for this particular application.
Abstract: This work conducts a feasibility study of deep learning approaches for automatic segmentation of the left ventricle (LV) cavity from cardiac magnetic resonance (CMR) images. Automatic LV cavity segmentation is a challenging task, partially due to the small size of the object compared to the large CMR image background, especially at the apex. To cater for small-object segmentation, the authors present a localisation-segmentation framework that first locates the object in the large full image, then segments the object within the small cropped region of interest. The localisation is performed by a deep regression model based on convolutional neural networks, while the segmentation is done by deep neural networks based on the U-Net architecture. They also employ the Dice loss function for the training process of the segmentation models, to investigate its effect on the segmentation performance. The deep learning models are trained and evaluated using public endocardium-annotated CMR datasets from the York University and MICCAI 2009 LV Challenge websites. The average Dice metric values of the authors' proposed framework are 0.91 and 0.93, respectively, on these two databases. These results are promising compared to the best results achieved by the current state of the art, which shows the potential of deep learning approaches for this particular application.
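
The Dice loss mentioned above is standard and easy to state; a minimal PyTorch version follows (the smoothing constant is a common assumption, not taken from the paper).

```python
# Minimal Dice loss sketch for binary segmentation training.
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    # pred: sigmoid probabilities, target: binary mask, both (N, 1, H, W).
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (union + eps)
    return 1 - dice.mean()

pred = torch.rand(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(dice_loss(pred, target).item())
```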

28 citations


Journal ArticleDOI
TL;DR: The devised sparse representation method employs both the original and virtual training samples to improve classification accuracy, since the two kinds of training samples allow sample information to be fully exploited and satisfactory robustness to be obtained.
Abstract: Among all image representation and classification methods, sparse representation has proven to be an extremely powerful tool. However, a limited number of training samples is an unavoidable problem for sparse representation methods. Many efforts have been devoted to improving the performance of sparse representation methods. In this study, the authors propose a novel framework to improve the classification accuracy of sparse representation methods. They first introduce the concept of approximations of all training samples (i.e. virtual training samples). The advantage of this is that the application of virtual training samples allows noise in the original training samples to be partially reduced. They then propose an efficient and competent objective function to disclose more discriminant information between different classes, which is very significant for obtaining a better classification result. The devised sparse representation method employs both the original and virtual training samples to improve classification accuracy, since the two kinds of training samples allow sample information to be fully exploited and satisfactory robustness to be obtained. The experimental results on the JAFFE, ORL, Columbia Object Image Library (COIL-100), AR and CMU PIE databases show that the proposed method outperforms state-of-the-art image classification methods.
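
A loose sketch of the virtual-training-sample idea: each original sample gets a noise-reduced approximation, and both are used as dictionary atoms for sparse-representation classification. The blending rule and the OMP solver below are assumptions of this example, not the authors' objective function.

```python
# Sketch: sparse-representation classification over original + virtual samples.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def make_virtual(X, alpha=0.5):
    # Virtual samples as noise-reduced approximations: a blend of each
    # sample with the global training mean (one simple assumption).
    return alpha * X + (1 - alpha) * X.mean(axis=0)

def src_classify(x, X, y, n_nonzero=10):
    # Label = class whose atoms give the smallest reconstruction residual.
    D = np.vstack([X, make_virtual(X)])
    labels = np.concatenate([y, y])
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False).fit(D.T, x)
    coef = omp.coef_
    residuals = {c: np.linalg.norm(x - D[labels == c].T @ coef[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 64))                    # 40 training samples, 64-dim
y = np.repeat(np.arange(4), 10)                  # 4 classes
print(src_classify(X[0] + 0.1 * rng.normal(size=64), X, y))
```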

23 citations


Journal ArticleDOI
TL;DR: A large margin relative distance learning (LMRDL) method which learns the metric from triplet constraints, so that the problem of imbalanced sample pairs can be bypassed.
Abstract: Distance metric learning has achieved great success in person re-identification. Most existing methods that learn metrics from pairwise constraints suffer from the problem of imbalanced data. In this study, the authors present a large margin relative distance learning (LMRDL) method which learns the metric from triplet constraints, so that the problem of imbalanced sample pairs can be bypassed. Different from existing triplet-based methods, LMRDL employs an improved triplet loss that enforces penalisation on the triplets with minimal inter-class distance, and this leads to a more stringent constraint to guide the learning. To suppress the large variations of pedestrians' appearance in different camera views, the authors propose to learn the metric over the intra-class subspace. The proposed method is formulated as a logistic metric learning problem with a positive semi-definite constraint, and the authors derive an efficient optimisation scheme to solve it based on the accelerated proximal gradient approach. Experimental results show that the proposed method achieves state-of-the-art performance on three challenging datasets (VIPeR, PRID450S, and GRID).
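
The improved triplet constraint, which penalises the triplets with minimal inter-class distance, can be sketched as a hinge over a learned Mahalanobis distance. The loss below is an illustrative reading, not the paper's exact formulation.

```python
# Sketch: large-margin triplet hinge under a learned metric M = L^T L.
import numpy as np

def triplet_hinge(anchor, positive, negatives, L, margin=1.0):
    d = lambda a, b: np.sum((L @ (a - b)) ** 2)      # learned squared distance
    d_neg = min(d(anchor, n) for n in negatives)     # minimal inter-class distance
    return max(0.0, margin + d(anchor, positive) - d_neg)

rng = np.random.default_rng(0)
L = np.eye(8)                                        # identity metric to start
a, p = rng.normal(size=8), rng.normal(size=8)
negs = rng.normal(size=(5, 8))
print(triplet_hinge(a, p, negs, L))
```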

22 citations


Journal ArticleDOI
TL;DR: This work describes several important enhancements made in the original framework related to the pre-processing steps, feature calculation and training setup and proposes the augmented framework, which stands out in terms of the detection accuracy and computational complexity compared to contemporary detectors.
Abstract: The histogram intersection kernel support vector machine (SVM) is accepted as a better discriminator than its linear counterpart when used for pedestrian detection in images and video frames. Its computational complexity has, however, limited its use in practical real-time detectors. To circumvent this problem, prior work proposed a low-complexity detection framework based on integer-only histogram of oriented gradients features, which allow a look-up-table-based implementation of the kernel SVM, leading to further simplification without compromising detection performance. This work describes several important enhancements made to the original framework related to the pre-processing steps, feature calculation and training setup. As a result, the augmented framework proposed in this study stands out in terms of detection accuracy and computational complexity compared to contemporary detectors. The best detector described in this study achieves 8% and 2% lower miss rates (MRs) on the ETH and INRIA pedestrian datasets, respectively, compared to the well-known boosting-cascades-based aggregate channel feature detector, despite avoiding complex floating-point operations. Moreover, the proposed detector performs markedly better in scenarios where fewer than 10^-2 false positives per image are desired, as demonstrated through the MR versus false positives curves.
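
The look-up-table trick works because the histogram intersection kernel decision function decomposes per dimension; with integer features, each dimension's partial sum can be precomputed into a table indexed by the feature value. A minimal sketch (feature ranges and sizes are assumptions):

```python
# Sketch: LUT implementation of a histogram intersection kernel SVM.
import numpy as np

def build_tables(support_vectors, dual_coefs, max_val):
    # tables[d][v] = sum_i alpha_i * min(sv_i[d], v), for v = 0..max_val
    n_dim = support_vectors.shape[1]
    tables = np.zeros((n_dim, max_val + 1))
    for v in range(max_val + 1):
        tables[:, v] = dual_coefs @ np.minimum(support_vectors, v)
    return tables

def decision(x, tables, bias=0.0):
    # One table look-up per dimension replaces all kernel evaluations.
    return tables[np.arange(len(x)), x].sum() + bias

rng = np.random.default_rng(0)
svs = rng.integers(0, 16, size=(20, 36))         # integer HOG-like features
alphas = rng.normal(size=20)
tables = build_tables(svs, alphas, max_val=15)
print(decision(rng.integers(0, 16, size=36), tables))
```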

20 citations


Journal ArticleDOI
TL;DR: This work uses a pretrained deep residual neural network to extract features, and then utilises a sparse partial least-squares regression approach to estimate ages from the side-view of face images.
Abstract: In recent years, automatic facial age estimation has gained popularity due to its numerous applications. Much work has been done on frontal images and, lately, minimal estimation errors have been achieved on most of the benchmark databases. However, in reality, images obtained in unconstrained environments are not always frontal. For instance, when conducting a demographic study or crowd analysis, one may get profile images of the face. To the best of the authors' knowledge, no attempt has been made to estimate ages from the side view of face images. Here, the authors address this by using a pretrained deep residual neural network to extract features, and then utilise a sparse partial least-squares regression approach to estimate ages. Despite having less information compared with frontal images, the results show that the extracted deep features achieve promising performance.
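
The pipeline, pretrained residual features followed by PLS regression, might look like the following sketch; sklearn's dense PLSRegression stands in for the sparse PLS variant used in the paper, and the backbone choice and dimensions are assumptions.

```python
# Sketch: ResNet feature extraction + partial least-squares age regression.
import numpy as np
import torch
import torchvision.models as models
from sklearn.cross_decomposition import PLSRegression

resnet = models.resnet18(weights=None)       # pretrained weights in practice
resnet.fc = torch.nn.Identity()              # expose the 512-d feature layer
resnet.eval()

def extract(images):                         # images: (N, 3, 224, 224) tensor
    with torch.no_grad():
        return resnet(images).numpy()

X = extract(torch.rand(20, 3, 224, 224))     # toy profile-face batch
ages = np.random.uniform(18, 70, size=20)
pls = PLSRegression(n_components=8).fit(X, ages)
print(pls.predict(X[:3]).ravel())
```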

Journal ArticleDOI
TL;DR: The face location algorithm is developed to improve the reliability of face detection and extract a face region with a high proportion of skin and facial structure estimation is exploited to further reduce the impact of non-skin factors on dynamic skin colour modelling.
Abstract: Reliable and accurate facial skin extraction is the most critical and urgent issue for adaptive skin detection. Aiming at resolving this issue, the authors propose an adaptive skin detection method using face location and facial structure estimation. The face location algorithm is developed to improve the reliability of face detection and extract a face region with a high proportion of skin. Facial structure estimation is exploited to further reduce the impact of non-skin factors on dynamic skin colour modelling. The colour space distribution model of extracted facial skin is very close to that of real facial skin. Finally, the skin in an image is obtained by using a hybrid colour space strategy. Extensive experimental comparisons with some state-of-the-art methods have shown the superior performance of the proposed method.

Journal ArticleDOI
TL;DR: The aim of optimisation is to distribute particles in high-likelihood area according to the cognitive effect and improve quality of particles, while the objective of dynamic resampling is to maintain diversity in the particle set.
Abstract: Particle filters (PFs) are sequential Monte Carlo methods that use a particle representation of the state-space model to implement the recursive Bayesian filter for non-linear and non-Gaussian systems. Owing to this property, PFs have been extensively used for object tracking in recent years. Although PFs provide a robust object tracking framework, they suffer from shortcomings. Particle degeneracy and the particle impoverishment brought about by the resampling step result in a poor construction of the posterior probability density function (PDF) of the state. To overcome these problems, this work amalgamates two characteristics of population-based heuristic optimisation algorithms, exploration and exploitation, with a PF implementing a dynamic resampling method. The aim of the optimisation is to distribute particles in the high-likelihood area according to the cognitive effect and improve the quality of particles, while the objective of dynamic resampling is to maintain diversity in the particle set. This work uses the very efficient spider monkey optimisation to achieve this. Furthermore, to test the efficiency of the proposed algorithm, experiments were carried out on a one-dimensional state estimation problem, the bearings-only tracking problem, standard videos and synthesised videos. The metrics obtained show that the proposed algorithm outperforms the simple PF, particle swarm optimisation-based PF, and cuckoo search-based PF, and effectively handles different challenges inherent in object tracking.
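
For context, here is a minimal bootstrap particle filter with systematic resampling, the step whose degeneracy/impoverishment trade-off the paper targets; the spider monkey optimisation move that repositions particles is not reproduced.

```python
# Sketch: bootstrap particle filter for a 1D state with Gaussian likelihood.
import numpy as np

def systematic_resample(weights, rng):
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)

def pf_step(particles, weights, observation, rng, noise=0.5):
    particles = particles + rng.normal(0, noise, size=particles.shape)  # predict
    weights = weights * np.exp(-0.5 * (observation - particles) ** 2)   # update
    weights /= weights.sum()
    idx = systematic_resample(weights, rng)     # fixes degeneracy, but
    return particles[idx], np.full(len(idx), 1 / len(idx))  # impoverishes

rng = np.random.default_rng(0)
particles, weights = rng.normal(0, 1, 100), np.full(100, 0.01)
for obs in [0.2, 0.4, 0.7]:
    particles, weights = pf_step(particles, weights, obs, rng)
print(particles.mean())
```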

Journal ArticleDOI
TL;DR: The authors propose a novel, semi-supervised multi-label active learning (SSMAL) method that combines automated annotation with human annotation to reduce the annotation workload associated with the active learning process.
Abstract: Multi-label image classification has attracted considerable attention in machine learning recently. Active learning is widely used in multi-label learning because it can effectively reduce the human annotation workload required to construct high-performance classifiers. However, annotation by experts is costly, especially when the number of labels in a dataset is large. Inspired by the idea of semi-supervised learning, in this study, the authors propose a novel, semi-supervised multi-label active learning (SSMAL) method that combines automated annotation with human annotation to reduce the annotation workload associated with the active learning process. In SSMAL, they capture three aspects of potentially useful information – classification prediction information, label correlation information, and example spatial information – and they use this information to develop an effective strategy for automated annotation of selected unlabelled example-label pairs. The experimental results obtained in this study demonstrate the effectiveness of the authors' proposed approach.

Journal ArticleDOI
TL;DR: The 2D singularities of masses and their surrounding regions are analysed with the Ripplet-II transform to quantify the texture information of mammographic regions and classify them as benign or malignant.
Abstract: Masses are one of the prevalent early signs of breast cancer visible in mammograms. However, their variation in shape, size, and appearance often hinders the proper diagnosis of mammographic masses. This study analyses the 2D singularities of masses and their surrounding regions with the Ripplet-II transform to classify them as benign or malignant. Since benign and malignant masses may change the orientation patterns of normal breast tissues differently, several textural features, including Ripplet-II coefficients and statistical co-variates derived from the Ripplet-II transformed images, are extracted to quantify the texture information of mammographic regions. The important features are then selected using a stepwise logistic regression technique and evaluated using linear discriminant analysis and a support vector machine with ten-fold cross-validation. The best performance, in terms of an area under the receiver operating characteristic curve of 0.91 ± 0.01 and 0.83 ± 0.01 and an accuracy of 87.28 ± 0.02 and 75.60 ± 0.01, is obtained with the proposed method when experimenting with 58 images from the mini-MIAS and 200 images from the Digital Database for Screening Mammography databases, respectively.

Journal ArticleDOI
TL;DR: The authors explore different feature spaces by employing features commonly used in object detection to improve the performance of detector in feature space and propose a robust scale estimation algorithm that estimates the size of the object in the current frame.
Abstract: In this study, the authors propose two kinds of improvements to a baseline tracker that employs the tracking-by-detection framework. First, they explore different feature spaces by employing features commonly used in object detection to improve the performance of the detector. Second, they propose a robust scale estimation algorithm that estimates the size of the object in the current frame. Their experimental results on the challenging online tracking benchmark (OTB-13) dataset show that a reduced-dimensionality histogram of oriented gradients boosts the performance of the tracker. The proposed scale estimation algorithm provides a significant gain and reduces the failures of the tracker in challenging scenarios. The improved tracker is compared with 13 state-of-the-art trackers. The quantitative and qualitative results show that the performance of the tracker is comparable with the state of the art against initialisation errors, variations in illumination, scale and motion, out-of-plane and in-plane rotations, deformations and low resolution.

Journal ArticleDOI
TL;DR: A system that directly transcribes scene text images to text without character segmentation is developed, achieving competitive performance compared with the state of the art on several public scene text datasets, including both lexicon-based and lexicon-free ones.
Abstract: Text recognition in natural scenes remains a challenging problem due to the highly variable appearance of text in unconstrained conditions. The authors develop a system that directly transcribes scene text images to text without character segmentation. They formulate the problem as sequence labelling. They build a convolutional recurrent neural network by using deep convolutional neural networks (CNNs) for modelling text appearance and recurrent neural networks (RNNs) for sequence dynamics. The two models are complementary in modelling capabilities and are therefore integrated together to form the segmentation-free system. They train a Gaussian mixture model-hidden Markov model to supervise the training of the CNN model. The system is data driven and needs no hand-labelled training data. Their method has several appealing properties: (i) it can recognise arbitrary-length text images; (ii) the recognition process does not involve sophisticated character segmentation; (iii) it is trained on scene text images with only word-level transcriptions; (iv) it can recognise both lexicon-based and lexicon-free text. The proposed system achieves competitive performance compared with the state of the art on several public scene text datasets, including both lexicon-based and lexicon-free ones.
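
The convolutional recurrent architecture can be sketched as a CNN that collapses the image height into per-column features feeding a bidirectional LSTM; all layer sizes below are assumptions, and the GMM-HMM-supervised training the paper describes is omitted.

```python
# Sketch: convolutional recurrent network producing per-column label scores.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_chars=37):              # 26 letters + 10 digits + blank
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.rnn = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.out = nn.Linear(512, n_chars)

    def forward(self, x):                        # x: (N, 1, 32, W)
        f = self.cnn(x)                          # (N, 128, 8, W/2)
        f = f.permute(0, 3, 1, 2).flatten(2)     # (N, W/2, 128*8)
        h, _ = self.rnn(f)
        return self.out(h)                       # per-column label scores

print(CRNN()(torch.rand(2, 1, 32, 100)).shape)   # (2, 50, 37)
```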

Journal ArticleDOI
TL;DR: A computer-aided diagnosis system to differentiate between four breast imaging reporting and data system (Bi-RADS) classes in digitised mammograms is proposed, inspired by the approach of the doctor during the radiologic examination.
Abstract: The goal of this study is to propose a computer-aided diagnosis system to differentiate between the four breast imaging reporting and data system (BI-RADS) classes in digitised mammograms. This system is inspired by the approach of the doctor during the radiologic examination, as agreed in BI-RADS, where masses are described by their form, their boundary and their density. The segmentation of masses in the authors' approach is manual, because it is assumed that detection has already been performed. Once the segmented region is available, the feature extraction process can be carried out. 22 visual characteristics are automatically computed from shape, edge and textural properties; only one human feature is used in this study, which is the patient's age. Classification is finally done using a multi-layer perceptron according to two separate schemes; the first one classifies masses to distinguish between the four BI-RADS classes (2, 3, 4 and 5), while the second one classifies abnormalities into two classes (benign and malignant). The proposed approach has been evaluated on 480 mammographic masses extracted from the digital database for screening mammography, and the obtained results are encouraging.
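
The classification stage, a multi-layer perceptron over the 23 features (22 visual descriptors plus age), can be illustrated with a small sklearn sketch on synthetic data; the hidden-layer size and training settings are assumptions.

```python
# Sketch: MLP over 23 features predicting the BI-RADS class.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 23))                        # 22 visual features + age
y = rng.integers(2, 6, size=200)                 # BI-RADS classes 2..5
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
print(clf.predict(X[:5]))
```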

Journal ArticleDOI
TL;DR: This study addresses the problem of efficiently combining the joint, RGB and depth modalities of the Kinect sensor in order to recognise human actions with a multi-layered fusion scheme that builds specialised local and global SVM models and iteratively fuses their different scores.
Abstract: This study addresses the problem of efficiently combining the joint, RGB and depth modalities of the Kinect sensor in order to recognise human actions. For this purpose, a multi-layered fusion scheme concatenates different specific features, builds specialised local and global SVM models and then iteratively fuses their different scores. The authors essentially contribute at two levels: (i) they combine the performance of local descriptors with the strength of global bag-of-visual-words representations, and are then able to generate improved local decisions that allow the handling of noisy frames; (ii) they also study the performance of multiple fusion schemes guided by different feature concatenations, Fisher vector representation concatenation and late iterative score fusion. To prove the efficiency of their approach, they have evaluated their experiments on two challenging public datasets: CAD-60 and CGC-2014. Competitive results are obtained for both benchmarks.

Journal ArticleDOI
TL;DR: A new algorithm is proposed to generalise intra-class variations of multi-sample subjects to single-sample subjects by a deep autoencoder and reconstruct new samples.
Abstract: One sample per person (OSPP) face recognition is a challenging problem in the face recognition community. A lack of samples is the main reason for the failure of most algorithms in OSPP. In this study, the authors propose a new algorithm to generalise the intra-class variations of multi-sample subjects to single-sample subjects by deep autoencoder and to reconstruct new samples. In the proposed algorithm, a generalised deep autoencoder is first trained with all images in the gallery, then a class-specific deep autoencoder (CDA) is fine-tuned for each single-sample subject with its single sample. Samples of the multi-sample subject that is most similar to the single-sample subject are input to the corresponding CDA to reconstruct new samples. For classification, minimum L2 distance, principal component analysis, a sparse representation-based classifier and softmax regression are used. Experiments on the Extended Yale Face Database B, the AR database and the CMU PIE database are provided to show the validity of the proposed algorithm.
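
The generalise-then-reconstruct idea can be sketched with a small autoencoder: fine-tune on the single sample, then feed the most similar multi-sample subject's images through it to obtain new variations; all sizes and the training budget below are assumptions.

```python
# Sketch: class-specific autoencoder fine-tuning and sample reconstruction.
import torch
import torch.nn as nn

class DeepAE(nn.Module):
    def __init__(self, dim=1024, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, hidden))
        self.dec = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def fine_tune(ae, single_sample, steps=100, lr=1e-3):
    # Class-specific deep autoencoder: adapt the generalised AE to one subject.
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(ae(single_sample), single_sample)
        loss.backward()
        opt.step()
    return ae

ae = fine_tune(DeepAE(), torch.rand(1, 1024))
new_samples = ae(torch.rand(5, 1024)).detach()   # variations from a similar subject
print(new_samples.shape)
```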

Journal ArticleDOI
TL;DR: Experimental results show how the proposed approach outperforms state-of-the-art methods and provides an accurate segmentation and labelling of RGBD data.
Abstract: We present an approach for segmentation and semantic labelling of RGBD data that exploits geometrical cues together with deep learning techniques. An initial over-segmentation is performed using spectral clustering, and a set of non-uniform rational B-spline surfaces is fitted to the extracted segments. Then a convolutional neural network (CNN) receives as input colour and geometry data together with the surface fitting parameters. The network is made of nine convolutional stages followed by a softmax classifier and produces a vector of descriptors for each sample. In the next step, an iterative merging algorithm recombines the output of the over-segmentation into larger regions matching the various elements of the scene. The couples of adjacent segments with higher similarity according to the CNN features are candidates for merging, and the surface fitting accuracy is used to detect which couples of segments belong to the same surface. Finally, a set of labelled segments is obtained by combining the segmentation output with the descriptors from the CNN. Experimental results show how the proposed approach outperforms state-of-the-art methods and provides an accurate segmentation and labelling.

Journal ArticleDOI
TL;DR: A novel extended social force model-based mean shift tracking algorithm is proposed in which the pedestrian environment is fully taken into consideration; the algorithm achieves encouraging performance when obstacles exist.
Abstract: It has been shown that the mean shift tracking algorithm can achieve excellent results in pedestrian tracking tasks. It empirically estimates the target position in the current frame by locating the maximum of a density function in the local neighbourhood of the target position from the previous frame. However, this method only considers the target's past trajectory, without considering the influence of the pedestrian environment, when applied to pedestrian tracking. In practice, pedestrians always keep a safe distance away from obstacles when planning their paths. To address the issue of obstacle avoidance, this paper proposes a novel extended social force model-based mean shift tracking algorithm in which the pedestrian environment is fully taken into consideration. First, an extended social force model is presented to quantify the interaction between pedestrian and obstacle by means of force. Furthermore, directional weights and speed weights are introduced to adjust the strength of the force according to differences in individual perspectives and relative velocities. Finally, the initial target position is predicted by Newton's laws of motion, and the mean shift method is then integrated to track the target position. Experimental results show that this algorithm achieves encouraging performance when obstacles exist.
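
An obstacle-repulsion force of the kind the extended social force model quantifies might be sketched as below; the exponential form, the constants and the directional weighting rule are illustrative assumptions, not the paper's exact model.

```python
# Sketch: social-force-style obstacle repulsion with a directional weight.
import numpy as np

def obstacle_force(pos, vel, obstacle, A=2.0, B=0.5, fov_weight=0.4):
    diff = pos - obstacle
    dist = np.linalg.norm(diff)
    direction = diff / dist
    force = A * np.exp(-dist / B) * direction        # exponential repulsion
    # Directional weight: obstacles behind the walking direction matter less.
    in_view = np.dot(vel / np.linalg.norm(vel), -direction) > 0
    return force if in_view else fov_weight * force

pos, vel = np.array([0.0, 0.0]), np.array([1.0, 0.0])
print(obstacle_force(pos, vel, np.array([0.8, 0.2])))
```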

Journal ArticleDOI
TL;DR: The proposed discriminative feature learning scheme achieves satisfying recognition results, reaching accuracy rates as high as 91.87% on the CK+ database, 82.24% on the KDEF database, and 78.94% on the CMU Multi-PIE database in the LOSO scenario, performing better than the comparison methods.
Abstract: Recently, researchers have proposed different feature descriptors to achieve robust performance for facial expression recognition (FER). However, finding a discriminative feature descriptor remains one of the critical tasks. In this paper, we propose a discriminative feature learning scheme to improve the representation power of expressions. First, we obtain a discriminative feature matrix (DFM) based on a pixel difference representation. Subsequently, all DFMs corresponding to the training samples are used to construct a discriminative feature dictionary (DFD). Next, the DFD is projected onto a vertical two-dimensional linear discriminant analysis (V-2DLDA) space to compute the between- and within-class scatter, because V-2DLDA works well with the DFD in matrix representation and achieves good efficiency. Finally, a nearest neighbour (NN) classifier is used to determine the labels of the query samples. The DFD represents local feature changes that are robust to expression, illumination, etc. Besides, we exploit V-2DLDA to find an optimal projection matrix, since it not only preserves the discriminative features but also reduces the dimensionality. The proposed method achieves satisfying recognition results, reaching accuracy rates as high as 91.87% on the CK+ database, 82.24% on the KDEF database, and 78.94% on the CMU Multi-PIE database in the LOSO scenario, performing better than the comparison methods.

Journal ArticleDOI
TL;DR: An object-based semantic image representation is integrated into a deep features-based retrieval framework to select the relevant images and a novel phrase selection paradigm and sentence generation model which depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework are presented.
Abstract: In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep features-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model which depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results compared with the state-of-the-art models.

Journal ArticleDOI
TL;DR: The proposed multiple object tracking approach has the capability to deal with long-term and complete occlusion without any prior training of the shape and motion model of the objects, and is cost effective in terms of memory and/or computation compared with existing state-of-the-art techniques.
Abstract: This study presents a novel multiple object tracking (MOT) approach that models an object's appearance based on K-means, while introducing a new statistical measure for the association of objects after occlusion. The proposed method is tested on several standard datasets covering complex situations in both indoor and outdoor environments. The experimental results show that the proposed model successfully tracks multiple objects in the presence of occlusion with high accuracy. Moreover, the presented work has the capability to deal with long-term and complete occlusion without any prior training of the shape and motion model of the objects. The accuracy of the proposed method is comparable with that of existing state-of-the-art techniques, as it successfully deals with all MOT cases in the standard datasets. Most importantly, the proposed method is cost effective in terms of memory and/or computation compared with existing state-of-the-art techniques. These traits make the proposed system very useful for real-time embedded video surveillance platforms, especially those with low memory/compute resources.

Journal ArticleDOI
TL;DR: A new method for transmission tower detection that involves the use of visual features and the linear content of the scene and a descriptor based on a grid of two-dimensional feature descriptors that is useful not only for object detection, but also for tracking the area of interest.
Abstract: In this study, the authors propose a new method for transmission tower detection that involves the use of visual features and the linear content of the scene. For this process, they developed a descriptor based on a grid of two-dimensional feature descriptors that is useful not only for object detection, but also for tracking the area of interest. For detection and classification, they used a support vector machine. The experiments were conducted with a dataset of real-world images from transmission tower videos, which were used to validate the strategy by comparing it with the ground truth. The results showed that the proposed method is fast and appropriate for tower detection in video sequences of environments that include rural and urban areas. The detection took less than 50 ms and was faster than other methods.

Journal ArticleDOI
TL;DR: A novel hypothesis generation scheme that uses a voting and penalisation mechanism to accurately select a true-positive candidate is proposed and achieves 100% detection accuracy on German TSD benchmark and achieves 4.0% better detection accuracy, when compared with other well-known methods (under partially occluded settings), on KTSD dataset.
Abstract: In advanced driver assistance systems, accurate detection of traffic signs plays an important role in extracting information about the road ahead. However, traffic signs are persistently occluded by vehicles, trees, and other structures on the road. The performance of a detector decreases drastically when occlusions are encountered, especially when it is trained using full object templates. Therefore, we propose a new method called discriminative patches (d-patches), which is a traffic sign detection (TSD) framework with occlusion-handling capability. D-patches are those regions of an object that possess more discriminative features than their surroundings. They are mined during training and are used for classification instead of the full object templates. Furthermore, we observe that the distribution of redundant detections around a true positive is different from that around a false positive. Based on this observation, we propose a novel hypothesis generation scheme that uses a voting and penalisation mechanism to accurately select a true-positive candidate. We also introduce a new Korean TSD (KTSD) dataset with several evaluation settings to facilitate detector evaluation under different conditions. The proposed method achieves 100% detection accuracy on the German TSD benchmark and 4.0% better detection accuracy than other well-known methods (under partially occluded settings) on the KTSD dataset.

Journal ArticleDOI
TL;DR: The quantitative and qualitative analyses of the experimental results show the superiority of the proposed technique over the conventional and state-of-the-art methods.
Abstract: In this study, a novel single-image super-resolution (SR) method is proposed, which uses a dictionary generated from pairs of high-resolution (HR) images and their corresponding low-resolution (LR) representations. First, HR and LR dictionaries are created by dividing HR and LR images into patches. Afterwards, when performing SR, the distance between every patch of the input LR image and the available LR patches in the LR dictionary is calculated. The LR dictionary patch with the minimum distance to the input LR patch is taken, and its counterpart from the HR dictionary is passed through an illumination enhancement process, resulting in consistency of illumination between neighbouring patches. This process is applied to all patches of the LR image. Finally, in order to remove the blocking effect caused by merging the patches, an average of the obtained HR image and the interpolated image is calculated. Furthermore, it is shown that the size of the dictionaries can be reduced to a great degree. The speed of the system is improved by 62.5%. The quantitative and qualitative analyses of the experimental results show the superiority of the proposed technique over conventional and state-of-the-art methods.
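
The core dictionary look-up can be sketched in a few lines: find the nearest LR dictionary patch and return its HR counterpart. The illumination-enhancement and merging steps are omitted, and the patch sizes are assumptions.

```python
# Sketch: nearest-patch look-up between paired LR and HR dictionaries.
import numpy as np

def super_resolve_patch(lr_patch, lr_dict, hr_dict):
    # lr_dict: (n, p*p) flattened LR patches; hr_dict: matching HR patches.
    dists = np.linalg.norm(lr_dict - lr_patch.ravel(), axis=1)
    return hr_dict[np.argmin(dists)]

rng = np.random.default_rng(0)
lr_dict = rng.random((500, 25))                  # 5x5 LR patches
hr_dict = rng.random((500, 100))                 # paired 10x10 HR patches
hr_patch = super_resolve_patch(rng.random((5, 5)), lr_dict, hr_dict)
print(hr_patch.shape)
```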

Journal ArticleDOI
TL;DR: A novel class-wise two-dimensional principal component analysis (PCA)-based face recognition algorithm that can successively detect and recognise faces in not only images but also in video files is presented.
Abstract: Interest in biometric identification systems has led to many face recognition task-oriented studies. These studies often address the detection of face images taken from a camera and the recognition of faces via extracted meaningful features. To meet the requirement of defining data with fewer features, principal component analysis (PCA)-based techniques are widely used due to their efficiency and simplicity. There is remarkable interest in extending this traditional technique in various ways to exploit its efficiency. From this viewpoint, this study is specifically focused on PCA-based face recognition techniques. By enhancing the methods in the reviewed studies, a novel class-wise two-dimensional PCA-based face recognition algorithm is presented. Unlike the traditional method, this method generates more than one subspace by considering within-class scattering. A system based on the presented approach can successively detect and recognise faces not only in images but also in video files. In addition, analyses were conducted to evaluate the efficiency of the proposed algorithm and its extension compared with the other PCA-based methods addressed. On the basis of the experimental results, it is clear that the presented approach and its extension are superior to the compared PCA-based algorithms.
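
A simplified reading of class-wise 2DPCA: compute one 2DPCA subspace per class from that class's image scatter matrix and assign a query to the class with the smallest reconstruction residual. This sketch is an interpretation, not the authors' exact algorithm.

```python
# Sketch: per-class 2DPCA subspaces with residual-based classification.
import numpy as np

def twodpca(images, k):
    mean = images.mean(axis=0)
    G = sum((a - mean).T @ (a - mean) for a in images) / len(images)
    _, vecs = np.linalg.eigh(G)
    return mean, vecs[:, -k:]                    # top-k image scatter directions

def residual(img, mean, W):
    proj = (img - mean) @ W                      # 2DPCA feature matrix
    return np.linalg.norm((img - mean) - proj @ W.T)

rng = np.random.default_rng(0)
classes = {c: rng.random((10, 20, 20)) + c for c in range(3)}
models = {c: twodpca(X, k=4) for c, X in classes.items()}
query = classes[1][0]
print(min(models, key=lambda c: residual(query, *models[c])))
```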

Journal ArticleDOI
TL;DR: It is shown that a webcam-based VOG system can provide similar accuracy to that of a head-mounted IR-based VOG system, and it is proved that the authors' iris localisation algorithm outperforms current state-of-the-art methods on the popular BioID dataset in terms of accuracy.
Abstract: Video-oculography (VOG) is a tool providing diagnostic information about the progress of diseases that cause regression of vergence eye movements, such as Parkinson's disease (PD). The majority of existing systems are based on sophisticated infra-red (IR) devices. In this study, the authors show that a webcam-based VOG system can provide accuracy similar to that of a head-mounted IR-based VOG system. They also prove that the authors' iris localisation algorithm outperforms current state-of-the-art methods on the popular BioID dataset in terms of accuracy. The proposed system consists of a set of image processing algorithms: face detection, facial feature localisation and iris localisation. They have performed examinations on patients suffering from PD using their system, with a JAZZ-novo head-mounted device with an IR sensor as reference. In the experiments, they obtained a mean correlation of 0.841 between the results from their method and those from the JAZZ-novo. They have shown that the accuracy of their visual system is similar to the accuracy of IR head-mounted devices. In the future, they plan to extend their experiments to inexpensive high-frame-rate cameras, which can potentially provide more diagnostic parameters.