
Showing papers in "IET Computer Vision" in 2014


Journal ArticleDOI
TL;DR: Experimental results show that just a small number of ZM features is sufficient to achieve recognition rates that rival other established methods, and so ZM features can be regarded as a powerful discriminatory feature for automatic target recognition applications relevant to SAR imagery.
Abstract: In the present study, a new algorithm for automatic target recognition (ATR) in synthetic aperture radar (SAR) images has been proposed. First, moving and stationary target acquisition and recognition (MSTAR) image chips have been segmented and then passed through a number of preprocessing stages such as histogram equalisation and position and size normalisation. Second, feature extraction based on Zernike moments (ZMs), which have linear transformation invariance properties and robustness in the presence of noise, has been introduced for the first time. Third, a genetic algorithm-based feature selection and a support vector machine classifier have been presented to select the optimal feature subset of ZMs and so decrease the computational complexity. Experimental results demonstrate the efficiency of the proposed approach in target recognition of SAR imagery. The obtained results show that just a small number of ZM features is sufficient to achieve recognition rates that rival other established methods, and so ZM features can be regarded as a powerful discriminatory feature for automatic target recognition applications relevant to SAR imagery. Furthermore, for noisy images the classifier performs fairly well until the signal-to-noise ratio falls below 5 dB.

113 citations
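The pipeline this abstract describes (preprocessed chips, Zernike-moment features, a classifier) can be illustrated compactly. The snippet below is only an illustrative sketch, not the authors' implementation: it computes rotation-invariant Zernike moment magnitudes straight from the standard radial-polynomial definition and hands them to an SVM; the genetic-algorithm feature selection and the MSTAR-specific preprocessing are omitted, and all names and parameters are assumptions.

```python
# Illustrative sketch (not the authors' code): Zernike-moment features for
# segmented, size-normalised SAR target chips, followed by an SVM classifier.
# Feature order, SVM settings and the omitted GA-based selection are assumptions.
import numpy as np
from math import factorial
from sklearn.svm import SVC

def radial_poly(rho, n, m):
    """Zernike radial polynomial R_nm(rho) for n >= m >= 0 with n - m even."""
    R = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k)))
        R += c * rho ** (n - 2 * k)
    return R

def zernike_magnitudes(chip, max_order=8):
    """Rotation-invariant |Z_nm| computed over the unit disk inscribed in the chip."""
    chip = chip.astype(float)
    h, w = chip.shape
    y, x = np.mgrid[-1:1:h * 1j, -1:1:w * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    mask = rho <= 1.0
    dxdy = (2.0 / w) * (2.0 / h)                  # area element of one pixel
    feats = []
    for n in range(max_order + 1):
        for m in range(0, n + 1):
            if (n - m) % 2:
                continue                          # R_nm is defined only for n - m even
            kernel = radial_poly(rho, n, m) * np.exp(-1j * m * theta)
            z = (n + 1) / np.pi * np.sum(chip[mask] * kernel[mask]) * dxdy
            feats.append(np.abs(z))               # the magnitude is rotation invariant
    return np.array(feats)

# chips: list of 2-D target chips, labels: class names (placeholders)
# X = np.stack([zernike_magnitudes(c) for c in chips])
# clf = SVC(kernel='rbf').fit(X, labels)
```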


Journal ArticleDOI
TL;DR: A novel framework for face recognition based on the fusion of global and local HOG features is proposed; in comparison with 12 state-of-the-art face recognition approaches, the proposed method achieves the highest average recognition rate.
Abstract: The histogram of oriented gradients (HOG) descriptor was initially applied to human detection and achieved great success. In recent years, the HOG descriptor has also been applied to face recognition. However, compared with other sophisticated feature descriptors such as LBP and Gabor, there is still considerable research space in the application of HOG features to face recognition. There are two main contributions. On the one hand, the main parameters characterising the HOG descriptor for face recognition are statistically analysed, which does not seem to have been discussed clearly in the literature so far. On the other hand, a novel framework for face recognition based on the fusion of global and local HOG features is proposed. Face images are first illumination normalised by the DoG filter. Secondly, global and local HOG features are extracted by PCA + LDA or LDA within different frameworks. Finally, at the decision level, global and local classifiers are built with the nearest neighbour classifier, after which the two classifiers are fused by a weighted sum rule. Experimental results on two large-scale face databases, FERET and CAS-PEAL-R1, show that, in comparison with 12 state-of-the-art face recognition approaches, the proposed method achieves the highest average recognition rate.

55 citations
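A minimal sketch of the kind of pipeline this abstract describes follows: DoG illumination normalisation, global and local HOG features, PCA + LDA projection, nearest-neighbour decisions and a weighted-sum fusion of the two scores. Filter sigmas, cell sizes, the 2 x 2 local split and the fusion weights are illustrative assumptions, not the authors' settings.

```python
# Illustrative sketch (not the authors' code): global/local HOG face features
# with DoG normalisation, PCA + LDA projection, nearest-neighbour decisions and
# weighted-sum score fusion. All parameters below are assumed values.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.neighbors import KNeighborsClassifier

def dog_normalise(img, s1=1.0, s2=2.0):
    """Difference-of-Gaussians illumination normalisation (assumed sigmas)."""
    return gaussian_filter(img, s1) - gaussian_filter(img, s2)

def global_hog(img):
    return hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def local_hog(img, rows=2, cols=2):
    """Concatenate HOG over a coarse grid of sub-regions (assumed 2 x 2 split)."""
    h, w = img.shape
    parts = [img[i * h // rows:(i + 1) * h // rows, j * w // cols:(j + 1) * w // cols]
             for i in range(rows) for j in range(cols)]
    return np.concatenate([global_hog(p) for p in parts])

def train_branch(X, y, n_pca=100):
    """PCA then LDA, with a nearest-neighbour classifier on the projected features."""
    pca = PCA(n_components=min(n_pca, X.shape[0] - 1, X.shape[1])).fit(X)
    lda = LDA().fit(pca.transform(X), y)
    knn = KNeighborsClassifier(n_neighbors=1).fit(lda.transform(pca.transform(X)), y)
    return lambda Z: knn.predict_proba(lda.transform(pca.transform(Z)))

# faces, labels and the test feature matrices below are placeholders.
# norm = [dog_normalise(f) for f in faces]
# score_g = train_branch(np.stack([global_hog(f) for f in norm]), labels)
# score_l = train_branch(np.stack([local_hog(f) for f in norm]), labels)
# fused = 0.6 * score_g(test_global) + 0.4 * score_l(test_local)   # weighted-sum fusion
```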


Journal ArticleDOI
TL;DR: This study proposes a method for single image haze removal using a content-adaptive dark channel and post enhancement, and demonstrates that the proposed method significantly improves the visibility of hazy images.
Abstract: As a challenging problem, image haze removal plays an important role in computer vision applications. The dark channel prior has been widely studied for haze removal since it is simple and effective; however, it still suffers from over-saturation, artefacts and a dark look. To resolve these problems, this study proposes a method for single image haze removal using a content-adaptive dark channel and post enhancement. The main contributions of this work are as follows: first, an associative filter, which can transfer the structures of a reference image and the grey levels of a coarse image to the filtering output, is employed to compute the dark channel efficiently and effectively. Secondly, the dark channel confidence is utilised to restrict the dark channel based on the content of the image. Finally, a post enhancement method is devised to map the luminance of the restored haze-free image while preserving local contrast. Experimental results demonstrate that the proposed method significantly improves the visibility of hazy images.

51 citations
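For context, the classical (non-adaptive) dark channel prior that this method restricts can be written down in a few lines: the dark channel is a local minimum over colour channels and a spatial window, and the transmission follows from it. This is a generic sketch of that prior, not the authors' content-adaptive variant or their associative filter; the window size and omega are conventional choices.

```python
# Minimal sketch of the classical dark channel prior (He et al.), shown only to
# illustrate the quantity the content-adaptive method restricts; the patch size
# and omega are conventional choices, not the authors' parameters.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """img: H x W x 3 array in [0, 1]. Min over RGB, then a local min filter."""
    return minimum_filter(img.min(axis=2), size=patch)

def estimate_transmission(img, atmosphere, patch=15, omega=0.95):
    """t(x) = 1 - omega * dark_channel(I / A); omega keeps a little haze for depth cues."""
    normed = img / np.maximum(atmosphere, 1e-6)
    return 1.0 - omega * dark_channel(normed, patch)

def recover(img, atmosphere, t, t0=0.1):
    """Invert the haze model I = J*t + A*(1 - t), clamping t to avoid noise blow-up."""
    t = np.clip(t, t0, 1.0)[..., None]
    return (img - atmosphere) / t + atmosphere
```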


Journal ArticleDOI
TL;DR: This study summarises several developments in the recent literature and discusses the various methods available for person re-identification to achieve higher accuracy rates and lower computational costs.
Abstract: Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all of the existing re-identification approaches, feature vectors are extracted from segmented still images or video frames. Different similarity or dissimilarity measures have been applied to these vectors. Some methods have used simple constant metrics, whereas others have utilised models to obtain optimised metrics. Some have created models based on local colour or texture information, and others have built models based on the gait of people. In general, the main objective of all these approaches is to achieve higher accuracy rates and lower computational costs. This study summarises several developments in the recent literature and discusses the various methods available for person re-identification. Specifically, their advantages and disadvantages are mentioned and compared.

37 citations


Journal ArticleDOI
TL;DR: The challenges and applications of visual tracking are introduced, and the state-of-the-art online-learning-based tracking methods are discussed by category, including detailed descriptions of representative methods in each category, and their pros and cons are examined.
Abstract: Visual tracking is a popular and challenging topic in computer vision and robotics. Owing to changes in the appearance of the target and the complicated variations that may occur in various scenes, an online learning scheme is necessary for an advanced visual tracking framework to adopt. This paper briefly introduces the challenges and applications of visual tracking and focuses on discussing the state-of-the-art online-learning-based tracking methods by category. We provide detailed descriptions of representative methods in each category and examine their pros and cons. Moreover, several of the most representative algorithms are implemented to provide a quantitative reference. Finally, we outline several trends for future visual tracking research.

32 citations


Journal ArticleDOI
TL;DR: This study proposes a novel FER algorithm that exploits the structural characteristics and the texture information hidden in the image space, and shows significant advantages of the proposed method over existing ones.
Abstract: Facial expression recognition (FER) plays an important role in human-computer interaction. Recent years have witnessed an increasing number of approaches for FER, but these approaches usually do not consider the effect of individual differences on the recognition result. When a face image changes from neutral to a certain expression, the changing information, constituted of the structural characteristics and the texture information, can provide rich and important clues not seen in either face image alone. Therefore it is believed to be of great importance for machine vision. This study proposes a novel FER algorithm that exploits the structural characteristics and the texture information hidden in the image space. Firstly, the feature points are marked by an active appearance model. Secondly, three facial features, namely the feature point distance ratio coefficient, the connection angle ratio coefficient and the skin deformation energy parameter, are proposed to eliminate the differences among individuals. Finally, a radial basis function neural network is utilised as the classifier for the FER. Extensive experimental results on the Cohn-Kanade database and the Beihang University (BHU) facial expression database show the significant advantages of the proposed method over existing ones.

28 citations
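The geometric part of the feature set lends itself to a compact illustration: ratios of inter-landmark distances and angles between a neutral frame and an expressive frame cancel much of the individual facial geometry. The sketch below assumes landmark arrays such as an AAM fit would produce and only guesses at how the ratio coefficients could be formed; it is not the authors' exact definition, and the skin deformation energy term and the RBF network are omitted.

```python
# Illustrative sketch: distance-ratio and angle-ratio coefficients between a
# neutral and an expressive frame, computed from AAM landmarks (N x 2 arrays).
# The chosen landmark pairs/triples are placeholders, not the authors' selection.
import numpy as np

def distance_ratios(neutral, expr, pairs):
    """Ratio of landmark-pair distances in the expressive frame vs. the neutral one."""
    def dist(pts, i, j):
        return np.linalg.norm(pts[i] - pts[j])
    return np.array([dist(expr, i, j) / max(dist(neutral, i, j), 1e-9) for i, j in pairs])

def angle_ratios(neutral, expr, triples):
    """Ratio of the angles at the middle landmark of each (i, j, k) triple."""
    def angle(pts, i, j, k):
        a, b = pts[i] - pts[j], pts[k] - pts[j]
        cosang = np.dot(a, b) / max(np.linalg.norm(a) * np.linalg.norm(b), 1e-9)
        return np.arccos(np.clip(cosang, -1.0, 1.0))
    return np.array([angle(expr, *t) / max(angle(neutral, *t), 1e-9) for t in triples])

# landmarks_neutral, landmarks_expr: (N, 2) arrays from an AAM fit (placeholders)
# features = np.concatenate([
#     distance_ratios(landmarks_neutral, landmarks_expr, pairs=[(36, 45), (48, 54)]),
#     angle_ratios(landmarks_neutral, landmarks_expr, triples=[(48, 51, 54)]),
# ])  # would then feed an RBF-network classifier
```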


Journal ArticleDOI
TL;DR: This study shows how stereo and time-of-flight images can be combined for non-destructive automatic leaf area measurements, and that combining stereo and ToF images gives superior qualitative and quantitative results.
Abstract: Leaf area measurements are commonly obtained by destructive and laborious practices. This study shows how stereo and time-of-flight (ToF) images can be combined for non-destructive automatic leaf area measurements. The authors focus on some challenging plant images captured in a greenhouse environment, and show that even state-of-the-art stereo methods produce unsatisfactory results. By transforming the depth information in a ToF image into a localised search range for dense stereo, a global optimisation strategy is adopted to produce smooth, discontinuity-preserving results. They also use edges of colour and disparity images for automatic leaf detection and develop a smoothing method necessary for accurately estimating surface area. In addition to showing that combining stereo and ToF images gives superior qualitative and quantitative results, 149 automatic leaf area measurements obtained with the authors' system in a validation trial have a correlation of 0.97 with the true values, and the root-mean-square error is 10.97 cm², which is 9.3% of the average leaf area. Their approach could potentially be applied to combining other modalities of images with large differences in image resolution and camera position.

27 citations
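The coupling step, turning ToF depth into a per-pixel disparity search range for the stereo matcher, follows from the pinhole relation d = f·B/Z. The sketch below illustrates that conversion under assumed calibration variables; it is not the authors' optimisation or smoothing pipeline.

```python
# Illustrative sketch: convert a (registered) ToF depth map into a localised
# disparity search range for dense stereo. f_px, baseline and the margin are
# assumed calibration values, not the authors' settings.
import numpy as np

def disparity_search_range(tof_depth, f_px, baseline_m, margin_px=3, min_depth=0.1):
    """d = f * B / Z, widened by a fixed margin to tolerate ToF noise."""
    z = np.maximum(tof_depth, min_depth)          # avoid division by zero
    d = f_px * baseline_m / z                     # predicted disparity (pixels)
    d_min = np.clip(d - margin_px, 0, None)
    d_max = d + margin_px
    return d_min, d_max                           # per-pixel range for the stereo matcher
```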


Journal ArticleDOI
TL;DR: An active contour model for vascular segmentation is proposed by defining a new local feature-fitting energy function; by introducing this feature into the fitting process, the model exhibits greater accuracy than existing models.
Abstract: An active contour model for vascular segmentation is proposed by defining a new local feature-fitting energy function. A vesselness filter is applied to the image in a directional Hessian-based framework. The filter output, as a feature, expresses the degree of correspondence of each pixel to a vessel structure. By using intensity information obtained from local regions, the proposed model is able to handle intensity inhomogeneity in images. In addition, by introducing this feature into the fitting process, the model exhibits greater accuracy when compared with existing models. Experimental results on synthetic images and coronary X-ray angiograms verify the desirable performance of the proposed model.

24 citations
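The vesselness feature that drives the fitting energy belongs to the family of Hessian-based filters such as Frangi's; a readily available stand-in is shown below. scikit-image's frangi filter is an assumed substitute for the authors' specific directional Hessian-based filter, and the active contour itself is not shown.

```python
# Illustrative sketch: a Hessian-based vesselness map (Frangi filter) of the kind
# used as the local fitting feature; scikit-image's implementation stands in for
# the authors' directional Hessian-based filter. File path is a placeholder.
from skimage import io, img_as_float
from skimage.filters import frangi

angiogram = img_as_float(io.imread('angiogram.png', as_gray=True))
vesselness = frangi(angiogram, sigmas=range(1, 6))   # multi-scale tubular response
# 'vesselness' would then drive the local feature-fitting energy of the contour.
```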


Journal ArticleDOI
TL;DR: A novel visual similarity measurement called adaptive region matching (ARM) has been developed; to simultaneously decrease the negative influence of interference regions and the loss of important information, a region importance index is constructed and the semantically meaningful region (SMR) is introduced.
Abstract: This study deals with the problem of similarity matching in region-based image retrieval (RBIR). A novel visual similarity measurement called adaptive region matching (ARM) has been developed. To simultaneously decrease the negative influence of interference regions and the loss of important information, a region importance index is constructed and the semantically meaningful region (SMR) is introduced. Moreover, ARM automatically performs SMR-to-image matching or image-to-image matching. Extensive experiments on the Corel-1000, Caltech-256 and University of Washington (UW) databases demonstrate that the proposed ARM is more flexible and more efficient than the existing visual similarity measurements that were originally developed for RBIR.

23 citations


Journal ArticleDOI
TL;DR: A novel discriminative method is proposed, which models the action of each person with a large-scale global feature and local body part features, to capture such interdependencies for recognising the interaction of two people.
Abstract: This study addresses the problem of recognising human interactions between two people. The main difficulties lie in the partial occlusion of body parts and the motion ambiguity in interactions. The authors observed that the interdependencies existing at both the action level and the body part level can greatly help disambiguate similar individual movements and facilitate human interaction recognition. Accordingly, they propose a novel discriminative method, which models the action of each person with a large-scale global feature and local body part features, to capture such interdependencies for recognising the interaction of two people. A variant of the multi-class AdaBoost method is proposed to automatically discover class-specific discriminative three-dimensional body parts. The proposed approach is tested on the authors' newly introduced BIT-Interaction dataset and the UT-Interaction dataset. The results show that the proposed model is quite effective in recognising human interactions.

22 citations


Journal ArticleDOI
TL;DR: A new optimal colour-based, mean-shift algorithm for tracking objects that has the advantages of decreased processing time and improved tracking accuracy is presented.
Abstract: The mean-shift method is widely used to locate a target object quickly in sequential images. The mean-shift algorithm takes advantage of a colour distribution with uniform quantisation. However, this quantisation method ignores the close relationship of colour statistics. The uniform distribution also results in a colour histogram with many empty bins, which introduces additional computational cost in the tracking procedure. To reduce the number of these redundant, empty bins, the authors present a new optimal colour-based mean-shift algorithm for tracking objects. In the proposed method, the optimal colours are extracted by a histogram agglomeration, which clusters three-dimensional (3D) colour histogram bins with the frequency ratios of 3D colour values. After obtaining the optimal colours in an RGB colour histogram, the target image is represented by the indices of the optimal colours. The mean-shift algorithm then creates a confidence map in a candidate image based on the optimal colour histogram in the target image. It then finds the peak of the confidence map near the previous position of the object area. Comparative experiments with the conventional mean-shift method showed that the proposed method has the advantages of decreased processing time and improved tracking accuracy.
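OpenCV's stock mean-shift tracker illustrates the confidence-map-and-peak-seeking loop that this abstract refines; the optimal-colour quantisation itself is the paper's contribution and is not reproduced. The sketch below therefore uses a plain hue histogram and should be read as the conventional baseline the authors improve on; the window, bin count and termination criteria are assumed values.

```python
# Baseline sketch (conventional mean-shift, not the optimal-colour variant):
# build a target histogram, back-project it to get a confidence map, and let
# cv2.meanShift find its peak near the previous window. Paths/values are placeholders.
import cv2

cap = cv2.VideoCapture('video.avi')
ok, frame = cap.read()
x, y, w, h = 200, 150, 80, 80                     # assumed initial target window
roi_hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([roi_hsv], [0], None, [32], [0, 180])   # 32 hue bins (many may stay empty)
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    confidence = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)  # confidence map
    _, window = cv2.meanShift(confidence, window, term)              # peak near last position
```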

Journal ArticleDOI
TL;DR: Experimental results of the proposed despeckling algorithm, based on the non-subsampled contourlet transform, show that the proposed method preserves edges and image structural details better than existing methods.
Abstract: Speckle noise reduction is an important preprocessing stage for ultrasound medical image processing. In this paper, a despeckling algorithm is proposed based on the non-subsampled contourlet transform. This transform has the properties of high directionality, anisotropy and translation invariance, which can be controlled by non-subsampled filter banks. This study aims to denoise speckle noise in ultrasound images using adaptive binary morphological operations, in order to preserve edges, contours and textures. In morphological operations, the structuring element plays an important role in image enhancement. In this work, different shapes of structuring element have been analysed and the filtering parameters have been changed adaptively depending on the nature of the image and the amount of noise in it. Experimental results of the proposed method on natural images, Field II simulated images and real ultrasound images show that it is able to preserve edges and image structural details better than existing methods.

Journal ArticleDOI
TL;DR: A novel method for shadow detection and removal using the discrete wavelet transform (DWT) is proposed; DWT is chosen because of its multi-resolution property, which decomposes an image into four different bands without loss of spatial information.
Abstract: Shadow detection and removal is an important problem in computer vision. The real challenge in moving shadow detection and removal is to classify moving shadow points, which are often misclassified as moving object points in video sequences. Various shadow detection and removal algorithms have been proposed for images, but only a few works address moving objects. In this study, a novel method for shadow detection and removal is proposed using the discrete wavelet transform (DWT). The authors have used the DWT because of its multi-resolution property, which decomposes an image into four different bands without loss of spatial information. For the detection and removal of shadows, they have proposed a new threshold in the form of the relative standard deviation. The value of the threshold is determined automatically and does not require any supervised learning or manual calibration. The proposed method is flexible and depends on only one parameter, namely the wavelet coefficients. Results of shadow detection and removal from moving objects after applying the proposed method are compared with the results of other state-of-the-art methods in terms of visual performance and a number of quantitative performance parameters. The proposed method is found to be better and more robust than the other methods.
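The DWT step and a relative-standard-deviation threshold can be sketched directly with PyWavelets; how the threshold is applied to separate shadow pixels below is an assumed simplification for illustration, not the authors' exact rule.

```python
# Illustrative sketch: single-level 2-D DWT of a foreground region and a
# relative-standard-deviation (RSD) threshold; the way the threshold labels
# shadow candidates here is a simplification, not the authors' exact rule.
import numpy as np
import pywt

def rsd(x):
    """Relative standard deviation: std / mean (the form of the paper's threshold)."""
    m = np.mean(x)
    return np.std(x) / m if m != 0 else 0.0

def dwt_shadow_candidates(grey_region):
    """Decompose into LL, LH, HL, HH bands and flag low-detail pixels as shadow candidates."""
    LL, (LH, HL, HH) = pywt.dwt2(grey_region.astype(float), 'haar')
    detail = np.abs(LH) + np.abs(HL) + np.abs(HH)     # combined detail-band magnitude
    threshold = rsd(detail)                           # automatic, data-driven threshold
    normalised = detail / max(np.mean(detail), 1e-9)
    return normalised < threshold                     # half-resolution candidate mask
```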

Journal ArticleDOI
TL;DR: The shape prior is coupled with the intensity information to enhance the segmentation results, and the method is validated on synthetic and clinical images with various challenges such as noise, occlusion and missing information.
Abstract: In this study, a novel probabilistic, geometric and dynamic shape-based level sets method is proposed. The shape prior is coupled with the intensity information to enhance the segmentation results. The two-dimensional principal component analysis method is applied to the training shapes to represent the shape variation with a sufficient number of shape projections in the training step. The shape model is constructed using the implicit representation of the projected shapes. A new energy functional is proposed (i) to embed the shape model into the image domain and (ii) to estimate the shape coefficients. The proposed method is validated on synthetic and clinical images with various challenges such as noise, occlusion and missing information. The authors compare their method with some related works. Experiments show that the proposed segmentation method is more accurate and robust than the alternatives under different challenges.

Journal ArticleDOI
TL;DR: In this study, the problem of vehicle detection, tracking and speed estimation in nighttime traffic surveillance videos captured in highly reflective environments is considered, and a robust algorithm is proposed which uses vehicle headlights as prominent features.
Abstract: In this study, the problem of vehicle detection, tracking and speed estimation in nighttime traffic surveillance videos captured in highly reflective environments is considered. A robust algorithm is proposed which uses vehicle headlights as prominent features. The proposed algorithm consists of three main stages. In the first stage, bright objects are segmented by thresholding the grey-scale image. An effective algorithm is then applied to distinguish between vehicle lights and lights reflected on the road and on vehicle bodies. In the second stage, the segmented bright objects are tracked using their spatial characteristics and their shapes, and then their speeds are estimated. To correct the camera perspective effect and reduce computational complexity, a projective transformation is used. In the third stage, the lights of each vehicle are grouped and paired using their positions and speeds. Motorbikes are also identified among the unpaired lights in this stage. Finally, the proposed real-time system is implemented in C and applied to videos captured by traffic surveillance cameras on some highways in Iran. Experimental results reveal that the accuracy of the proposed vehicle detection algorithm is more than 98%.
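The first stage, segmenting bright blobs and rejecting reflections, maps naturally onto standard OpenCV operations. The sketch below covers only that stage under assumed threshold and size parameters; the area and fill-ratio test standing in for the reflection filter is a simplification, and tracking, perspective correction and light pairing are omitted.

```python
# Illustrative sketch of the bright-object segmentation stage: threshold the
# grey-scale frame and keep compact, sufficiently large blobs as headlight
# candidates. Threshold and size limits are assumed values.
import cv2

def headlight_candidates(frame_bgr, thresh=220, min_area=30, min_fill=0.4):
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, bright = cv2.threshold(grey, thresh, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(bright, connectivity=8)
    candidates = []
    for i in range(1, n):                                  # label 0 is the background
        x, y, w, h, area = stats[i]
        fill = area / float(w * h)                         # road reflections tend to be
        if area >= min_area and fill >= min_fill:          # elongated or sparse blobs
            candidates.append(tuple(centroids[i]))
    return candidates
```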

Journal ArticleDOI
Jinsheng Xiao, Li Wenhao, Guoxiong Liu, Shih-Lung Shaw, Yongqin Zhang
TL;DR: Experimental results show that the proposed algorithm, with less computational cost, reduces the halo effect significantly and achieves natural colour and rich details, and outperforms the state-of-the-art methods in terms of visual quality and objective indicators.
Abstract: To solve the problems of low efficiency and poor results of current tone mapping methods for high dynamic range images, the authors propose a hierarchical tone mapping algorithm based on a colour appearance model. A discrete Gaussian kernel is used to speed up the bilateral filter. The operation of tone compression in the RGB colour space is adopted to correct colour casts. The extreme values of the pixels are also adjusted in the detail layer. Moreover, after the tone mapping, the colour saturation is enhanced in image regions with rich details and sharp edges. Experimental results show that the proposed algorithm, with less computational cost, reduces the halo effect significantly and achieves natural colour and rich details. It outperforms the state-of-the-art methods in terms of visual quality and objective indicators.
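The base/detail split at the heart of such methods can be illustrated with a bilateral filter: compress the base (large-scale) layer in the log domain and add the detail layer back. The sketch is the generic decomposition the hierarchical algorithm accelerates, with assumed parameters; the Gaussian-kernel speed-up, the RGB tone compression and the saturation enhancement described above are not reproduced.

```python
# Generic base/detail tone-mapping sketch (bilateral filtering in the log
# domain), illustrating the layer split the hierarchical method speeds up.
# The compression factor and filter parameters are assumed values.
import cv2
import numpy as np

def tonemap_bilateral(hdr_rgb, compression=0.5):
    """hdr_rgb: float32 H x W x 3 radiance map. Returns a displayable [0, 1] image."""
    lum = hdr_rgb.mean(axis=2) + 1e-6
    log_lum = np.log10(lum).astype(np.float32)
    base = cv2.bilateralFilter(log_lum, d=9, sigmaColor=0.4, sigmaSpace=8)
    detail = log_lum - base                               # fine structure, kept as is
    new_log_lum = compression * (base - base.max()) + detail
    new_lum = 10.0 ** new_log_lum
    out = hdr_rgb * (new_lum / lum)[..., None]            # reapply colour ratios
    return np.clip(out, 0.0, 1.0)
```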

Journal ArticleDOI
TL;DR: This study shows that the slice theorem is valid within integer fields, via modulo arithmetic, using a circulant theory of the Radon transform (RT), and provides a representation of images as discrete projections that is always exact and real-valued.
Abstract: This study presents an integer-only algorithm to exactly recover an image from its discrete projected views that can be computed with the same computational complexity as the fast Fourier transform (FFT). Most discrete transforms for image reconstruction rely on the FFT, via the Fourier slice theorem (FST), in order to compute reconstructions with low computational complexity. Consequently, complex arithmetic and floating point representations are needed, the latter of which is susceptible to round-off errors. This study shows that the slice theorem is valid within integer fields, via modulo arithmetic, using a circulant theory of the Radon transform (RT). The resulting number-theoretic RT (NRT) provides a representation of images as discrete projections that is always exact and real-valued. The NRT is ideally suited as part of a discrete tomographic algorithm, an encryption scheme, or for when numerical overflow is likely, such as when computing a large number of convolutions on the projections. The low computational complexity of the NRT algorithm also provides an efficient method to generate discrete projected views of image data.
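The flavour of an exact, integer-only projection scheme can be conveyed with the closely related finite Radon transform on a prime-sized p x p grid, where every pixel is recovered exactly by summing the projections through it. The sketch below implements that classical construction for illustration; it is not the authors' NRT and uses plain integer sums rather than modulo arithmetic on the projection values.

```python
# Sketch of the finite Radon transform (FRT) on a p x p image, p prime: p + 1
# exact integer projections and an exact integer inversion. This is a classical
# relative of the NRT described above, shown for illustration only.
import numpy as np

def frt(img, p):
    """Projections R[m, t] = sum over x of img[x, (t + m*x) mod p], plus a row-sum projection."""
    R = np.zeros((p + 1, p), dtype=np.int64)
    x = np.arange(p)
    for m in range(p):
        for t in range(p):
            R[m, t] = img[x, (t + m * x) % p].sum()
    R[p, :] = img.sum(axis=1)          # the 'perpendicular' projection, indexed by x
    return R

def ifrt(R, p):
    """Exact inversion: each pixel lies on p + 1 lines, every other pixel on exactly one."""
    total = R[0].sum()                 # sum of the whole image (any projection sums to it)
    img = np.zeros((p, p), dtype=np.int64)
    for x in range(p):
        for y in range(p):
            s = sum(R[m, (y - m * x) % p] for m in range(p)) + R[p, x]
            img[x, y] = (s - total) // p
    return img

p = 7
img = np.random.randint(0, 256, (p, p))
assert np.array_equal(ifrt(frt(img, p), p), img)   # exact, integer-only recovery
```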

Journal ArticleDOI
TL;DR: A biological vision-based facial description, namely perceived facial images, is proposed to extract features from human face images, and a good neural network classifier architecture can be obtained.
Abstract: This study presents a modified constructive training algorithm for the multilayer perceptron (MLP), which is applied to the face recognition problem. An incremental training procedure has been employed where the training patterns are learned incrementally. The algorithm starts with a small number of training patterns and a single hidden layer with an initial number of neurons. During training, the number of hidden neurons is increased when the mean square error (MSE) on the training data (TD) is not reduced to a predefined threshold value. Input patterns are trained incrementally until all patterns of the TD are learned. The aim of this algorithm is to determine the adequate initial number of hidden neurons, the suitable number of training patterns in the subsets of each class, the number of iterations during the training step and the MSE threshold value. The proposed algorithm is applied in the classification stage of a face recognition system. For the feature extraction stage, this paper proposes to use a biological vision-based facial description, namely perceived facial images, to extract features from human face images. Gabor features and Zernike moments have been used in order to determine the best feature extractor. The proposed approach is tested on the Cohn-Kanade Facial Expression Database. Experimental results indicate that a good neural network classifier architecture can be obtained. The effectiveness of the proposed method compared with a fixed MLP architecture has also been demonstrated.
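A rough sketch of the constructive idea (grow the hidden layer while the training MSE stays above a threshold, then admit the next batch of patterns) is given below. It uses scikit-learn's MLPClassifier as a stand-in for the authors' MLP, so "growing" simply means retraining with more neurons, and all thresholds, batch sizes and the integer-label assumption are illustrative, not the authors' settings.

```python
# Rough sketch of constructive training: grow the hidden layer while the training
# MSE stays above a threshold, then admit more training patterns. MLPClassifier is
# a stand-in for the authors' MLP; thresholds and batch sizes are assumed values.
import numpy as np
from sklearn.neural_network import MLPClassifier

def constructive_train(X, y, start_patterns=50, batch=50, start_hidden=5,
                       grow_by=5, mse_threshold=0.05, max_hidden=200):
    """Assumes y holds integer labels 0..K-1 so they index predict_proba columns."""
    n_used, n_hidden, clf = min(start_patterns, len(X)), start_hidden, None
    while True:
        Xs, ys = X[:n_used], y[:n_used]
        while True:
            clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500).fit(Xs, ys)
            mse = np.mean((1.0 - clf.predict_proba(Xs)[np.arange(len(ys)), ys]) ** 2)
            if mse <= mse_threshold or n_hidden >= max_hidden:
                break
            n_hidden += grow_by                        # constructive step: add neurons
        if n_used == len(X):
            break
        n_used = min(n_used + batch, len(X))           # incremental step: add patterns
    return clf, n_hidden
```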

Journal ArticleDOI
TL;DR: A new method for non-parametric background segmentation of dynamic scenes is proposed, and it is demonstrated that the proposed method outperforms the state-of-the-art for background segmentation in dynamic scenes.
Abstract: Detecting moving objects from the background in video sequences is the first step of many image applications. The background can be divided into two types according to whether its pixel values are variable or not: static and dynamic. Correctly detecting moving foreground objects in dynamic scenes is a difficult problem because of the similarity between the moving foreground and the variable background. In this study, a new method for non-parametric background segmentation of dynamic scenes is proposed. Here the background is described by two interrelated models. One of them, called the self-model, concerns the recently observed pixel values at the same position; the other, called the neighbourhood-model, is described by the pixel values of the neighbourhood. The authors' method can accurately detect the dynamic background. To correctly detect as many foreground pixels as possible, the authors also propose an adaptive threshold for the foreground decision based on the background characteristics. All of the above detection processes can be done in real time. Experimental results on a public dataset demonstrate that the proposed method outperforms the state-of-the-art for background segmentation in dynamic scenes.

Journal ArticleDOI
TL;DR: This study presents an approach to hand detection and tracking that exploits two different video streams: the depth one and the colour one, and is designed to maintain a low computational cost and is optimised to efficiently execute HCI tasks.
Abstract: Hand detection and gesture recognition are two of the most studied topics in human–computer interaction (HCI). The increasing availability of sensors able to provide real-time depth measurements, such as time-of-flight cameras or the more recent Kinect, has helped researchers to find more and more efficient solutions for these issues. With the main aim of implementing effective gesture-based interaction systems, this study presents an approach to hand detection and tracking that exploits two different video streams: the depth one and the colour one. Both hand and gesture recognition are based only on geometrical and colour constraints, and no learning phase is needed. The use of a Kalman filter to track hands guarantees system robustness even in the presence of several people in the scene. The entire procedure is designed to maintain a low computational cost and is optimised to efficiently execute HCI tasks. As use cases, two common applications are described: a virtual keyboard and a three-dimensional object manipulation virtual environment. These applications have been tested with a representative sample of non-trained users to assess the usability and flexibility of the system.
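The Kalman filtering step that keeps each hand's track stable can be set up with OpenCV's built-in filter and a constant-velocity state [x, y, vx, vy]. The matrices and noise levels below are assumed values, and the depth-plus-colour hand detection that supplies the measurements is not shown.

```python
# Illustrative sketch: a constant-velocity Kalman filter for one hand centroid,
# as could be used to keep tracks stable when several people are in the scene.
# Noise magnitudes are assumed values, not the authors' settings.
import cv2
import numpy as np

kf = cv2.KalmanFilter(4, 2)                     # state [x, y, vx, vy], measurement [x, y]
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

def track_step(measured_xy):
    """Predict, then correct with the hand centroid found from depth and colour cues."""
    prediction = kf.predict()
    if measured_xy is not None:                  # detection may fail in some frames
        kf.correct(np.array(measured_xy, dtype=np.float32).reshape(2, 1))
    return prediction[:2].ravel()                # smoothed (x, y) for the HCI layer
```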

Journal ArticleDOI
TL;DR: The authors propose to use a set of unscented Kalman filters to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation.
Abstract: The authors present a system that automatically detects, recognises and tracks text in natural scenes in real time. The focus of the authors' method is on large text found in outdoor environments, such as shop signs, street names, billboards and so on. Building on their previously developed techniques for scene text detection and orientation estimation, the main contribution of this work is a complete end-to-end scene text reading system based on text tracking. They propose to use a set of unscented Kalman filters to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The system is designed for continuous, unsupervised operation in a handheld or wearable system over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised to maximise usage of the available processing power and achieve real-time operation. They demonstrate the performance of the system on sequences recorded in outdoor scenarios.

Journal ArticleDOI
TL;DR: Experimental results on a real-world dataset, with both partially occluded and non-occluded data, show the high performance of the proposed method compared with other state-of-the-art methods.
Abstract: One of the main challenges in pedestrian classification is partial occlusion. This study presents a new method for pedestrian classification with partial occlusion handling. The proposed method involves a set of part-based classifiers trained on histogram of oriented gradients features derived from a non-occluded pedestrian dataset. The score of each part classifier is then employed to weight the features used to train a second-stage full-body classifier. The full-body classifier, based on a local weighted linear kernel support vector machine, is trained using both non-occluded and artificially generated partially occluded pedestrian data. The new kernel makes it possible to focus on the non-occluded parts and reduce the impact of the occluded ones. Experimental results on a real-world dataset, with both partially occluded and non-occluded data, show the high performance of the proposed method compared with other state-of-the-art methods.
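The local weighted linear kernel idea (down-weighting the feature blocks that the part classifiers flag as occluded) can be expressed through scikit-learn's support for callable kernels. The block layout and the fixed per-part weights below are assumptions for illustration; the paper derives the weights per sample from the part classifier scores.

```python
# Illustrative sketch: a part-weighted linear kernel for an SVM, down-weighting
# HOG blocks that part classifiers consider occluded. For simplicity the weights
# are fixed per part here, whereas the paper derives them per sample.
import numpy as np
from sklearn.svm import SVC

def make_weighted_linear_kernel(part_slices, part_weights):
    """K(x, z) = sum_p w_p * <x_p, z_p> over the feature blocks of each body part."""
    def kernel(X, Z):
        K = np.zeros((X.shape[0], Z.shape[0]))
        for sl, w in zip(part_slices, part_weights):
            K += w * X[:, sl] @ Z[:, sl].T
        return K
    return kernel

# Assumed layout: head, torso, legs blocks inside the concatenated HOG vector.
part_slices = [slice(0, 500), slice(500, 1500), slice(1500, 3000)]
part_weights = [1.0, 0.9, 0.3]          # e.g. legs judged occluded by the part classifiers
svm = SVC(kernel=make_weighted_linear_kernel(part_slices, part_weights))
# svm.fit(X_train, y_train); svm.predict(X_test)   # X_*: stacked HOG feature vectors
```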

Journal ArticleDOI
TL;DR: The experimental results show that the proposed gradient descent method with an adaptive momentum term is effective and efficient and can be applied to extract a single object in real images.
Abstract: In active contour models (snakes), various vector force fields replacing the gradient of the original external energy in the equations of motion are a popular way to extract the object boundary. The gradient descent method is usually used to obtain the equations of motion by minimising the energy functional. However, it often suffers from local minima when extracting complex geometries because of the non-convex functional. A gradient descent method with an adaptive momentum term is proposed in this study. First, an acceleration function of the evolution is defined. Then, the adaptive momentum term is obtained by calculating the product of the edge stopping function and the defined acceleration function. Finally, the adaptive momentum is made compatible with the snakes. The edge stopping function is used to decide the influence region of the momentum, whereas the defined acceleration function determines its magnitude. The adaptive momentum is used to extract complex geometries (such as deep concavities) when it is added to snakes such as gradient vector field or vector field convolution snakes. The proposed method also accelerates the rate of convergence. It can be applied to extract a single object in real images. The experimental results show that the proposed method is effective and efficient.

Journal ArticleDOI
TL;DR: The HGP descriptor, which takes the confidence of the gradient phase into account, is more discriminative and less sensitive to the normalisation process than most local descriptors, which degrade significantly without a proper normalisation.
Abstract: Gradient-based local descriptors have received much attention in recent years and have been successfully used in many applications such as human detection and face recognition. The advantages of local descriptors are their resistance to local geometric and photometric errors and their robustness to expression variations. In this paper, the authors propose a new local descriptor called the histogram of gradient phases (HGP), which has some intriguing properties compared with existing local descriptors such as the histogram of oriented gradients and DAISY for face recognition under unconstrained conditions. In contrast with the histogram of oriented gradients descriptor, the orientation histogram is computed from the estimated gradient phase distributions instead of weighting the votes by the gradient magnitudes. In this paper, the phase distributions are estimated by means of the gradient phases, and the variances are determined by the estimated gradient signal-to-noise ratios of the pixels in a local region. The HGP descriptor, which takes the confidence of the gradient phase into account, is more discriminative and less sensitive to the normalisation process than most local descriptors, which degrade significantly without a proper normalisation. The simulation results show that the proposed HGP descriptor achieves better performance and is more robust than existing local descriptors.

Journal ArticleDOI
TL;DR: A local stereo matching algorithm is presented whose performance is insensitive to changes in radiometric conditions between the input images and which is more robust to such differences than other algorithms.
Abstract: The authors present a local stereo matching algorithm whose performance is insensitive to changes in radiometric conditions between the input images. First, a prior on the disparities is built by combining the DAISY descriptor and Census filtering. Then, a Census-based cost aggregation with a self-adaptive window is performed. Finally, the maximum a-posteriori estimation is carried out to compute the disparity. The authors’ algorithm is compared with both local and global stereo matching algorithms (NLCA, ELAS, ANCC, AdaptWeight and CSBP) by using Middlebury datasets. The results show that the proposed algorithm achieves high-accuracy dense disparity estimations and is more robust to radiometric differences between input images than other algorithms.
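Census filtering, one of the radiometrically robust ingredients used to build the prior, is easy to show in isolation: each pixel becomes a bit string recording which neighbours are darker, and matching costs are Hamming distances between such strings. The sketch below is a plain Census cost for a single candidate disparity with an assumed window size; DAISY, the self-adaptive aggregation window and the MAP estimation are not included.

```python
# Illustrative sketch: 3 x 3 Census transform and a per-pixel Hamming matching
# cost for one candidate disparity. The window radius is an assumed parameter.
import numpy as np

def census_transform(img, radius=1):
    """Encode each pixel as a bit string saying which neighbours are darker than it."""
    h, w = img.shape
    code = np.zeros((h, w), dtype=np.uint32)
    pad = np.pad(img, radius, mode='edge')
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = pad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            code = (code << 1) | (neighbour < img).astype(np.uint32)
    return code

def census_cost(left, right, disparity):
    """Hamming distance between left Census codes and right codes shifted by the disparity."""
    cl, cr = census_transform(left), census_transform(right)
    xor = cl ^ np.roll(cr, disparity, axis=1)          # simplistic border handling
    bytes_view = xor.view(np.uint8).reshape(*xor.shape, 4)
    return np.unpackbits(bytes_view, axis=-1).sum(axis=-1)
```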

Journal ArticleDOI
TL;DR: A novel approach is proposed to increase the representational power of the 3D face reconstruction model by deforming a set of examples in the training dataset; it improves the RP of the standard PCA-based model and outperforms it with a 95% confidence level.
Abstract: Example-based statistical face models using principal component analysis (PCA) have been widely deployed for three-dimensional (3D) face reconstruction and face recognition. The two factors generally of concern with such models are the size of the training dataset and the selection of the examples in the training set. The representational power (RP) of an example-based model is its capability to depict a new 3D face for a given 2D face image. The RP of the model can be increased by correspondingly increasing the number of training samples. In this contribution, a novel approach is proposed to increase the RP of the 3D face reconstruction model by deforming a set of examples in the training dataset. A PCA-based 3D face model is adapted for each new near-frontal input face image to reconstruct the 3D face shape. Further, an extended Tikhonov regularisation method has been employed to reconstruct 3D face shapes from a set of facial points. The results show that the proposed adaptive PCA-based model considerably improves the RP of the standard PCA-based model and outperforms it with a 95% confidence level.

Journal ArticleDOI
TL;DR: A non-linear tensor-based model based on multi-linear decomposition is proposed that maps the high-dimensional image space onto a low-dimensional pose manifold and achieves high accuracy in pose estimation and multi-view face recognition.
Abstract: Although estimating face pose and recognising identity are common human abilities, they are still a challenge in the computer vision context. In this study, the authors aim to overcome these difficulties by learning a non-linear tensor-based model based on multi-linear decomposition. The proposed model maps the high-dimensional image space onto a low-dimensional pose manifold. To preserve the actual distance along the manifold shape, a graph-based distance measure is proposed. Also, to compensate for the limited number of training poses, mirrored images are added to the training set to improve the recognition accuracy. For performance evaluation of the proposed method, experiments are run on three well-known face databases using three different manifold shapes and two different distance measures. Eight training data modes are chosen so that the influential parameters are studied comprehensively. The obtained results confirm the effectiveness of the proposed model in achieving high accuracy in pose estimation and multi-view face recognition, even with different training poses for different identities.

Journal ArticleDOI
TL;DR: The authors propose a multiple subsequence combination (MSC) method that divides the video into several consecutive subsequences, applies part-based and bag of visual words approaches to classify each subsequence, and combines subsequence labels to assign an action label to the video.
Abstract: Human action recognition is an active research area with applications in several domains such as visual surveillance, video retrieval and human–computer interaction. Current approaches assign action labels to video streams considering the whole video as a single sequence but, in some cases, the large variability between frames may lead to misclassifications. The authors propose a multiple subsequence combination (MSC) method that divides the video into several consecutive subsequences. It applies part-based and bag-of-visual-words approaches to classify each subsequence. Then, it combines the subsequence labels to assign an action label to the video. The proposed approach was tested on the KTH, UCF Sports, YouTube and Robo-Kitchen datasets, which have large differences in terms of video length, object appearance and pose, object scale, viewpoint, background, as well as the number, type and complexity of the actions performed. Two main results were achieved. First, the MSC approach performs better than classifying the video as a whole, even when few subsequences are used. Second, the approach is robust and stable since, for each dataset, its performance is comparable to that of state-of-the-art part-based approaches.
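The final combination step can be illustrated with a simple confidence-weighted vote over the subsequence labels; the weighting rule below is an assumption for illustration, not necessarily the authors' combination scheme.

```python
# Illustrative sketch of the label-combination step: each subsequence votes with
# its classifier confidence and the video takes the highest-scoring action label.
# The confidence-weighted vote is an assumed combination rule.
from collections import defaultdict

def combine_subsequence_labels(predictions):
    """predictions: list of (action_label, confidence) pairs, one per subsequence."""
    scores = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence
    return max(scores, key=scores.get)

# e.g. combine_subsequence_labels([('run', 0.7), ('walk', 0.6), ('run', 0.8)]) -> 'run'
```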

Journal ArticleDOI
TL;DR: Experimental results on a large image database demonstrate that the authors' technique for computing the Euler number significantly outperforms earlier approaches in terms of the number of basic arithmetic operations needed per pixel.
Abstract: The authors propose two equations based on pixel geometry and connectivity properties, which can be used to compute, efficiently, the Euler number of a binary digital image with either thick or thin boundaries. While computing this feature, the authors' technique extracts the underlying topological information provided by the shape pixels of the given image. The correctness of computing the Euler number using the new equations is also established theoretically. The performance of the proposed method is compared against other available alternatives. Experimental results on a large image database demonstrate that the authors' technique for computing the Euler number significantly outperforms earlier approaches in terms of the number of basic arithmetic operations needed per pixel. Both equations are specialised only for the 4-connectivity case.
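For comparison, the classical way to obtain the Euler number from local pixel geometry is Gray's bit-quad counting, which inspects only 2 x 2 neighbourhoods. The sketch below implements that standard formula for the 4-connectivity case; it is a baseline of the same flavour, not the authors' two new equations.

```python
# Classical bit-quad Euler number (Gray's formula) for a binary image, shown as
# the standard counterpart of the pixel-geometry approach described above.
# 4-connectivity: E = (Q1 - Q3 + 2*QD) / 4.
import numpy as np

def euler_number_4(binary):
    """binary: 2-D array of 0/1. Counts 2x2 patterns over the zero-padded image."""
    b = np.pad(binary.astype(np.int32), 1)
    q = b[:-1, :-1] + b[:-1, 1:] + b[1:, :-1] + b[1:, 1:]        # ones per 2x2 quad
    diag = ((b[:-1, :-1] == b[1:, 1:]) & (b[:-1, 1:] == b[1:, :-1])
            & (b[:-1, :-1] != b[:-1, 1:]))                        # checkerboard quads
    q1 = np.sum(q == 1)
    q3 = np.sum(q == 3)
    qd = np.sum(diag)
    return (q1 - q3 + 2 * qd) // 4

# Example: two diagonal pixels form two 4-connected components, no holes -> E = 2.
print(euler_number_4(np.array([[1, 0], [0, 1]])))
```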

Journal ArticleDOI
TL;DR: The authors present a method to extract moving objects in image sequences based on a graph cuts algorithm defined on a spatiotemporal superpixel neighbourhood, which gives better segmentation performance than other state-of-the-art methods.
Abstract: In this study, the authors present a method to extract moving objects in image sequences. The proposed approach is based on a graph cuts algorithm defined on a spatiotemporal superpixel neighbourhood. Presegmented superpixels are partitioned into foreground and background while preserving temporal and spatial coherence. It achieves this goal in three steps. First, instead of operating at the pixel level, superpixels are advocated as the basic units of the authors' segmentation scheme. Second, within the graph cuts framework, two superpixel-based data terms and two superpixel-based smoothness terms are proposed to solve the segmentation problem. Finally, the proposed method yields the segmentation of all the superpixels within the video volume by the graph cuts algorithm. To illustrate the advantages of this approach, the quantitative and qualitative results are compared with other state-of-the-art methods. The experimental results show that the proposed method gives better segmentation performance than these methods.