
Showing papers in "Iet Computer Vision in 2012"


Journal ArticleDOI
TL;DR: The authors prove that the weights assigned to pixels in the target candidate region by BWH are proportional to those without background information, that is, BWH does not introduce any new information because the mean-shift iteration formula is invariant to the scale transformation of weights.
Abstract: The background-weighted histogram (BWH) algorithm proposed by Comaniciu et al. attempts to reduce the interference of background in target localisation in mean-shift tracking. However, the authors prove that the weights assigned to pixels in the target candidate region by BWH are proportional to those without background information, that is, BWH does not introduce any new information because the mean-shift iteration formula is invariant to the scale transformation of weights. Then a corrected BWH (CBWH) formula is proposed by transforming only the target model but not the target candidate model. The CBWH scheme can effectively reduce background's interference in target localisation. The experimental results show that CBWH can lead to faster convergence and more accurate localisation than the usual target representation in mean-shift tracking. Even if the target is not well initialised, the proposed algorithm can still robustly track the object, which is hard to achieve by the conventional target representation.

192 citations
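The scale-invariance argument above is easy to verify numerically. The sketch below (plain Python, hypothetical pixel positions and weights) shows that one mean-shift iteration in its simplest uniform-kernel form, a weighted mean of pixel positions, is unchanged when every weight is multiplied by a constant, which is why BWH's uniform rescaling of the weights adds no information.

```python
def mean_shift_location(positions, weights):
    """One mean-shift step (uniform kernel): the new location is the
    weighted mean of the candidate-region pixel positions."""
    total = sum(weights)
    x = sum(p[0] * w for p, w in zip(positions, weights)) / total
    y = sum(p[1] * w for p, w in zip(positions, weights)) / total
    return (x, y)

# Hypothetical candidate-region pixels and weights w_i; in mean-shift
# tracking w_i is typically sqrt(q_u / p_u) for the colour bin u of
# pixel i.
pixels = [(0, 0), (1, 0), (0, 1), (2, 2)]
weights = [0.5, 1.0, 1.0, 2.0]

loc = mean_shift_location(pixels, weights)
# Multiplying every weight by a constant (the net effect of BWH)
# leaves the estimated location unchanged:
scaled = mean_shift_location(pixels, [3.7 * w for w in weights])
assert all(abs(a - b) < 1e-9 for a, b in zip(loc, scaled))
```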


Journal ArticleDOI
TL;DR: In this article, a scale and orientation adaptive mean shift tracking (SOAMST) algorithm is proposed to address the problem of how to estimate the scale and orientation changes of the target under the mean shift tracking framework.
Abstract: A scale and orientation adaptive mean shift tracking (SOAMST) algorithm is proposed in this study to address the problem of how to estimate the scale and orientation changes of the target under the mean shift tracking framework. In the original mean shift tracking algorithm, the position of the target can be well estimated, whereas the scale and orientation changes cannot be adaptively estimated. Considering that the weight image derived from the target model and the candidate model can represent the possibility that a pixel belongs to the target, the authors show that the original mean shift tracking algorithm can be derived using the zeroth- and first-order moments of the weight image. With the zeroth-order moment and the Bhattacharyya coefficient between the target model and candidate model, a simple and effective method is proposed to estimate the scale of the target. Then an approach, which utilises the estimated area and the second-order central moments, is proposed to adaptively estimate the width, height and orientation changes of the target. Extensive experiments are performed to test the proposed method and validate its robustness to the scale and orientation changes of the target.

130 citations
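The moment computation the abstract relies on can be sketched in a few lines. Below, a toy weight image (hypothetical values) yields the zeroth-order moment, which SOAMST uses as an area/scale estimate, and the first-order moments, whose ratios give the target position; the second-order central moments (not shown) would then give width, height and orientation.

```python
def moments(weight_image):
    """Zeroth- and first-order moments of a weight image.

    weight_image: 2-D list of weights w(x, y), one per pixel.
    Returns (M00, centroid_x, centroid_y): M00 approximates the
    target area; the centroid is the new position estimate.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(weight_image):
        for x, w in enumerate(row):
            m00 += w
            m10 += x * w
            m01 += y * w
    return m00, m10 / m00, m01 / m00

# Toy 3x3 weight image with mass concentrated at the centre pixel
w = [[0.0, 0.1, 0.0],
     [0.1, 1.0, 0.1],
     [0.0, 0.1, 0.0]]
area, cx, cy = moments(w)
```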


Journal ArticleDOI
TL;DR: A review of whether LBP or its derivatives perform better for face recognition; experiments show that LDP and LTP perform much better than the other LBP-based models.
Abstract: Texture is the surface property that is used to identify and recognise objects. This property is widely used in many applications including texture-based face recognition systems, surveillance, identity verification and so on. The local binary pattern (LBP) texture method is the most successful for face recognition. Owing to the great success of LBP, many variants of it have recently been proposed for texture analysis. Some of the derivatives of LBP are the multivariate local binary pattern, centre-symmetric local binary pattern, local binary pattern variance, dominant local binary pattern, advanced local binary pattern, local texture pattern (LTP) and local derivative pattern (LDP). In this scenario, it is essential to review whether LBP or its derivatives perform better for face recognition. Different LBP-based models are evaluated against real-time challenges such as illumination changes, rotations, angle variations and facial expression variations. Experiments were conducted on the Japanese female facial expression, YALE and FRGC version 2 databases. The results show that LDP and LTP perform much better than the other LBP-based models.

72 citations
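For reference, the basic LBP operator that all the listed variants build on can be sketched as follows; the clockwise neighbour ordering and the >= comparison are one common convention, not necessarily the exact one used in the paper.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code of the centre pixel of a 3x3 patch.

    Each neighbour contributes one bit: 1 if it is >= the centre
    value. Neighbours are read clockwise starting at the top-left.
    """
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code

# Flat patch: every neighbour >= centre, so all 8 bits are set
assert lbp_code([[5, 5, 5], [5, 5, 5], [5, 5, 5]]) == 255
```

A texture descriptor is then the histogram of these codes over an image region.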


Journal ArticleDOI
TL;DR: A method that remedies problems of mean-shift tracking and presents an easy to implement, robust and efficient tracking method that can be used for automated static camera video surveillance applications is proposed and it is shown that the proposed method is superior to the standard mean- shift.
Abstract: Mean-shift tracking plays an important role in computer vision applications because of its robustness, ease of implementation and computational efficiency. In this study, a fully automatic multiple-object tracker based on the mean-shift algorithm is presented. The foreground is extracted using a mixture of Gaussians followed by shadow and noise removal, both to initialise the object trackers and to serve as a kernel mask that makes the system more efficient by decreasing the search area and the number of iterations needed to converge to the new location of an object. By using foreground detection, new objects entering the field of view and objects leaving the scene can be detected. Trackers are automatically refreshed to solve the potential problems that may occur because of changes in objects' size and shape, to handle occlusion and splits between the tracked objects, and to detect newly emerging objects as well as objects that leave the scene. Using a shadow removal method increases the tracking accuracy. As a result, a method is proposed that remedies problems of mean-shift tracking and provides an easy-to-implement, robust and efficient tracking approach for automated static-camera video surveillance applications. Additionally, it is shown that the proposed method is superior to standard mean-shift.

54 citations
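The foreground-extraction step can be illustrated with a per-pixel background model. The sketch below is a single-Gaussian simplification of the mixture-of-Gaussians model the paper uses (the threshold k and learning rate alpha are hypothetical): a pixel is foreground when it deviates from the learned mean by more than k standard deviations, and only background pixels update the model.

```python
import math

def update_background(mu, var, x, alpha=0.05, k=2.5):
    """Per-pixel background test and update (single-Gaussian
    simplification of the mixture-of-Gaussians model).

    Returns (is_foreground, new_mu, new_var).
    """
    is_fg = abs(x - mu) > k * math.sqrt(var)
    if not is_fg:  # only background observations update the model
        mu = (1 - alpha) * mu + alpha * x
        var = (1 - alpha) * var + alpha * (x - mu) ** 2
    return is_fg, mu, var

mu, var = 100.0, 16.0          # learned background: mean 100, std 4
fg, mu, var = update_background(mu, var, 101.0)   # small change: background
assert not fg
fg, _, _ = update_background(mu, var, 160.0)      # large change: foreground
assert fg
```

A full mixture model keeps several (mean, variance, weight) triples per pixel and matches each observation against all of them; OpenCV's BackgroundSubtractorMOG2 is a widely used implementation of that idea.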


Journal ArticleDOI
TL;DR: Results on a database of 40 people show that bodyprints are very robust to changes of pose, point of view and illumination, and potential applications include tracking people with networks of non-overlapping cameras.
Abstract: This study proposes the concept of bodyprints to perform re-identification of people in surveillance videos. Bodyprints are obtained using calibrated depth-colour cameras such as the Kinect. The authors' results on a database of 40 people show that bodyprints are very robust to changes of pose, point of view and illumination. Potential applications include tracking people with networks of non-overlapping cameras.

51 citations


Journal ArticleDOI
TL;DR: A new method based on geometric and transient optical flow features and their comparison and integration for facial expression recognition is proposed and achieves an advanced feature representation for the accurate and robust classification of facial expressions.
Abstract: Facial expression recognition is a useful feature in modern human-computer interaction (HCI). In order to build efficient and reliable recognition systems, face detection, feature extraction and classification have to be robustly realised. Addressing the latter two issues, this work proposes a new method based on geometric and transient optical flow features and illustrates their comparison and integration for facial expression recognition. In the authors' method, photogrammetric techniques are used to extract three-dimensional (3-D) features from every image frame, which are regarded as a geometric feature vector. Additionally, optical flow-based motion detection is carried out between consecutive images, which yields the transient features. Artificial neural network and support vector machine classification results demonstrate the high performance of the proposed method. In particular, through the use of 3-D normalisation and colour information, the proposed method achieves an advanced feature representation for the accurate and robust classification of facial expressions.

51 citations


Journal ArticleDOI
TL;DR: The authors introduce a new procedure for finding the vanishing point based on visual information and K-means clustering, importing the minimum possible information into the Hough space by using only two pixels of each line.
Abstract: One of the main challenges in steering a vehicle or a robot is the detection of the appropriate heading. Many solutions have been proposed during the past few decades to overcome the difficulties of intelligent navigation platforms. In this study, the authors introduce a new procedure for finding the vanishing point based on visual information and K-means clustering. Unlike other solutions, the authors do not need to find the intersection of lines to extract the vanishing point. This reduces the complexity and the processing time of our algorithm to a large extent. The authors import the minimum possible information into the Hough space by using only two pixels of each line (the start and end points) instead of the hundreds of pixels that form a line. This reduces the mathematical complexity of our algorithm while maintaining very efficient functioning. The most important and unique characteristic of our algorithm is the reuse of the processed data for other important tasks in navigation, such as mapping and localisation.

43 citations
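The K-means step of the procedure can be sketched as follows; the 2-D points here are hypothetical stand-ins for the per-line endpoint features the authors feed into the Hough space, since the abstract does not spell out the exact representation.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means on 2-D points: returns k cluster centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centres]
            clusters[d.index(min(d))].append(p)
        # move each centre to the mean of its cluster
        for i, cl in enumerate(clusters):
            if cl:
                centres[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centres

# Two well-separated groups of hypothetical endpoint features
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centres = sorted(kmeans(pts, 2))
```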


Journal ArticleDOI
TL;DR: The role and advancement of saliency algorithms over the past decade are surveyed, with an outline of the datasets and performance measures utilised as well as the computational techniques pervasive in the literature.
Abstract: Salient image regions permit non-uniform allocation of computational resources. The selection of a commensurate set of salient regions is often a step taken in the initial stages of many computer vision algorithms, thereby facilitating object recognition, visual search and image matching. In this study, the authors survey the role and advancement of saliency algorithms over the past decade. The authors first offer a concise introduction to saliency. Next, the authors present a summary of the saliency literature, cast into respective categories and then further differentiated by domain, computational method, features, context and use of scale. The authors then discuss the achievements and limitations of the current state of the art. This information is augmented by an outline of the datasets and performance measures utilised, as well as the computational techniques pervasive in the literature.

40 citations


Journal ArticleDOI
TL;DR: By applying one-class classification techniques with three-dimensional (3-D) features, the authors obtain a more efficient fall detection system with acceptable performance, as shown in the experimental part.
Abstract: In this study, the authors introduce a video-based robust fall detection system for monitoring an elderly person in a smart room environment. Video features, namely the centroid and orientation of a voxel person, are extracted. The boundary method, which is an example of a one-class classification technique, is then used to determine whether the incoming features lie in the 'fall region' of the feature space, thereby effectively distinguishing a fall from other activities, such as walking, sitting, standing, crouching or lying. Four different types of boundary methods, k-centre, kth nearest neighbour, one-class support vector machine and single-class minimax probability machine (SCMPM), are assessed on representative test datasets. The comparison is made on the following three aspects: (i) true positive rate, false positive rate and geometric mean in detection; (ii) robustness to noise in the training dataset; and (iii) the computational time for the test phase. From the comparison results, the authors show that the SCMPM achieves the best overall performance. By applying one-class classification techniques with three-dimensional (3-D) features, a more efficient fall detection system with acceptable performance can be obtained, as shown in the experimental part, while avoiding the drawbacks of other traditional fall detection methods.

34 citations
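The 'boundary method' idea can be illustrated with its simplest instance, a nearest-neighbour data description: a feature vector is accepted as a fall if it lies within some radius of a training sample. The features and radius below are hypothetical; the paper's k-centre, kth-nearest-neighbour, one-class SVM and SCMPM variants draw more refined boundaries around the same kind of training set.

```python
def fits_boundary(train, x, radius):
    """One-class 'boundary method' in its simplest form: accept a
    feature vector if it lies within `radius` of any training sample
    (a nearest-neighbour data description)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return min(dist(t, x) for t in train) <= radius

# Hypothetical 2-D fall features (e.g. centroid height, orientation)
falls = [(0.2, 80.0), (0.25, 85.0), (0.3, 78.0)]
assert fits_boundary(falls, (0.22, 82.0), radius=5.0)      # a fall
assert not fits_boundary(falls, (1.5, 5.0), radius=5.0)    # walking
```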


Journal ArticleDOI
Jihua Zhu1, Shaoyi Du1, Zuyi Yuan1, Liu Yaxiong1, Liang Ma1 
TL;DR: A robust affine iterative closest point (ICP) algorithm based on bidirectional distance is proposed for the registration of m-dimensional (m-D) point sets.
Abstract: This study proposes a robust affine iterative closest point (ICP) algorithm based on bidirectional distance for the registration of m-dimensional (m-D) point sets. Since the affine registration problem can be formulated as a least-squares (LS) problem by incorporating an affine transformation, this study first analyses the ill-posed nature of affine registration and turns it into a well-posed problem by introducing the bidirectional distance into the LS formulation. Then, the corresponding affine ICP algorithm is proposed to solve the well-posed problem. By using the bidirectional distance, the proposed algorithm can directly estimate the affine transformation and converge monotonically to a local minimum from any given initial parameters. To obtain the desired global minimum, good initial parameters can be estimated by the independent component analysis (ICA) technique. The proposed approach makes no geometric assumptions on the point sets, so it is a general framework for affine registration of m-D point sets. Experimental results demonstrate its robustness and accuracy compared with the current state-of-the-art approaches.

29 citations
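The ICP loop itself alternates two steps: match each data point to its nearest model point, then solve a least-squares problem for the transformation. The sketch below strips the paper's affine case down to a translation-only estimate (a deliberate simplification; the affine and bidirectional-distance machinery is not reproduced), with toy point sets that start close enough for the local search to succeed.

```python
def icp_translation(model, data, iters=10):
    """Translation-only ICP (a simplification of the affine case):
    alternate nearest-neighbour matching and least-squares alignment.
    Returns the translation (tx, ty) that maps `data` onto `model`."""
    tx = ty = 0.0
    for _ in range(iters):
        moved = [(x + tx, y + ty) for x, y in data]
        # nearest model point for each (moved) data point
        pairs = []
        for p in moved:
            q = min(model, key=lambda m: (m[0] - p[0]) ** 2 + (m[1] - p[1]) ** 2)
            pairs.append((p, q))
        # least-squares translation update is the mean residual
        tx += sum(q[0] - p[0] for p, q in pairs) / len(pairs)
        ty += sum(q[1] - p[1] for p, q in pairs) / len(pairs)
    return tx, ty

model = [(0, 0), (1, 0), (0, 1)]
data = [(0.3, 0.2), (1.3, 0.2), (0.3, 1.2)]   # model shifted by (0.3, 0.2)
tx, ty = icp_translation(model, data)
# ICP is a local search, so the initial offset must be small
assert abs(tx + 0.3) < 1e-9 and abs(ty + 0.2) < 1e-9
```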


Journal ArticleDOI
M. Xu1, J. Lu1
TL;DR: Experiments show that D-RANSAC is superior to RANSAC and R-RANSAC in computational complexity and accuracy in most cases, particularly when the inlier proportion is below 65%.
Abstract: Many low- or middle-level three-dimensional reconstruction algorithms involve a robust estimation and selection step whereby the parameters of the best model are estimated and inliers fitting this model are selected. The RANSAC (RANdom SAmple Consensus) algorithm is the most widely used robust algorithm for this task. A new version of RANSAC, called distributed RANSAC (D-RANSAC), is proposed to save computation time and improve accuracy. The authors compare their results with those of classical RANSAC and randomised RANSAC (R-RANSAC). Experiments show that D-RANSAC is superior to RANSAC and R-RANSAC in computational complexity and accuracy in most cases, particularly when the inlier proportion is below 65%.
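For context, the classical RANSAC baseline that D-RANSAC improves on can be sketched for 2-D line fitting (hypothetical data): sample a minimal set, hypothesise a model, count inliers, and keep the best hypothesis.

```python
import random

def ransac_line(points, iters=200, tol=0.1, seed=1):
    """Classical RANSAC for a line y = a*x + b: repeatedly fit the
    line through two random points and keep the model with the most
    inliers (points within `tol` of the line)."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical sample: skip in this simple sketch
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [p for p in points if abs(p[1] - (a * p[0] + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b), inliers
    return best, best_inliers

# Points on y = 2x + 1 plus two gross outliers
pts = [(x, 2 * x + 1) for x in range(8)] + [(2, 40), (5, -30)]
(a, b), inliers = ransac_line(pts)
```

In practice a final least-squares refit over the inlier set follows; R-RANSAC and D-RANSAC change how hypotheses are tested and distributed, not this core loop.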

Journal ArticleDOI
TL;DR: A novel region-based method to recognise human actions that takes advantage of naturally formed negative regions with simple shapes, simplifying action classification, making the work robust to segmentation errors and distinguishing it from other approaches.
Abstract: The authors propose a novel region-based method to recognise human actions. Other region-based approaches work on the silhouette of the human body, which is termed the positive space according to art theory. In contrast, the authors investigate and analyse the regions surrounding the human body, termed the negative space, for human action recognition. This concept takes advantage of the naturally formed negative regions, which have simple shapes, simplifying action classification. Negative space is less sensitive to segmentation errors, overcoming some limitations of silhouette-based methods such as leaks or holes in the silhouette caused by background segmentation. An inexpensive semantic-level description can be generated from the negative space that supports fast and accurate action recognition. The proposed system obtained 100% accuracy on the Weizmann human action dataset and the robust sequence dataset. On the KTH dataset the system achieved 94.67% accuracy. Furthermore, 95% accuracy can be achieved even when half of the negative space regions are ignored. This makes the work robust with respect to segmentation errors and distinctive from other approaches.

Journal ArticleDOI
TL;DR: This paper explores the problem of learning a shape class prototype from a set of class exemplars which may not share a single local image feature, and solves the many-to-many graph matching problem in a different way.
Abstract: The mainstream object categorisation community relies heavily on object representations consisting of local image features, due to their ease of recovery and their attractive invariance properties. Object categorisation is therefore formulated as finding, that is, 'detecting', a one-to-one correspondence between image and model features. This assumption breaks down for categories in which two exemplars may not share a single local image feature. Even when objects are represented as more abstract image features, a collection of features at one scale (in one image) may correspond to a single feature at a coarser scale (in the second image). Effective object categorisation therefore requires the ability to match features many-to-many. In this paper, we review our progress on three independent object categorisation problems, each formulated as a graph matching problem and each solving the many-to-many graph matching problem in a different way. First, we explore the problem of learning a shape class prototype from a set of class exemplars which may not share a single local image feature. Next, we explore the problem of matching two graphs in which correspondence exists only at higher levels of abstraction, and describe a low-dimensional, spectral encoding of graph structure that captures the abstract shape of a graph. Finally, we embed graphs into geometric spaces, reducing the many-to-many graph-matching problem to a weighted point matching problem, for which efficient many-to-many matching algorithms exist.

Journal ArticleDOI
TL;DR: This study provides an object tracking method for video sequences based on the curvelet transform, which greatly improves tracking accuracy and efficiency compared with traditional methods.
Abstract: This study provides an object tracking method for video sequences based on the curvelet transform. The wavelet transform has been widely used for object tracking, but it cannot describe curve discontinuities well. We have used the curvelet transform for tracking, which is done using the energy of curvelet coefficients in a sequence of frames. The proposed method is simple and does not rely on any parameter other than the curvelet coefficients. Compared with a number of schemes such as the Kalman filter, particle filter, Bayesian methods, template model, corrected background-weighted histogram, joint colour-texture histogram and covariance-based tracking methods, the proposed method effectively extracts the features in the target region, which characterise and represent the target more robustly. The experimental results validate that the proposed method greatly improves tracking accuracy and efficiency compared with traditional methods.

Journal ArticleDOI
TL;DR: A novel identification architecture that uses hand geometry as a soft biometric to accelerate the identification process and ensure the system's scalability is proposed and a new feature binarisation technique is proposed that guarantees that the Hamming distance between transformed binary features is proportional to the difference between their real values.
Abstract: This study proposes a biometric system for personal identification based on three biometric characteristics from the hand, namely: the palmprint, finger surfaces and hand geometry. A protection scheme is applied to the biometric template data to guarantee its revocability, security and diversity among different biometric systems. An error-correcting code (ECC), a cryptographic hash function (CHF) and a binarisation module are the core of the template protection scheme. Since the ECC and CHF operate on binary data, an additional feature binarisation step is required. This study proposes: (i) a novel identification architecture that uses hand geometry as a soft biometric to accelerate the identification process and ensure the system's scalability; and (ii) a new feature binarisation technique that guarantees that the Hamming distance between transformed binary features is proportional to the difference between their real values. The proposed system achieves promising recognition and speed performances on two publicly available hand image databases.
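The binarisation property the authors require, a Hamming distance proportional to the difference of the underlying real values, is worth illustrating. A classic code with exactly this property is the unary (thermometer) code sketched below; this is an illustrative stand-in, not necessarily the paper's own binarisation technique.

```python
def thermometer(value, levels):
    """Unary (thermometer) encoding of an integer in [0, levels]:
    the Hamming distance between two codes equals the absolute
    difference between the encoded values."""
    return [1] * value + [0] * (levels - value)

def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

c3, c7 = thermometer(3, 10), thermometer(7, 10)
assert hamming(c3, c7) == 7 - 3
```

A quantised real-valued feature encoded this way can therefore be compared in the binary domain without distorting the original distance structure, which is what the ECC/CHF pipeline needs.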

Journal ArticleDOI
TL;DR: In this article, a novel depth from defocus (DFD) method considering different images with fixed camera parameters is given, in which the blurring imaging model is constructed from the relative blurring and the diffusion equation, and the relation between depth and blurring is derived.
Abstract: Owing to space limitations and strict requirements on operation, depth measurement using a single visual sensor is necessary in many applications, such as mini-robots, precision processing and micro/nano-manipulation. Depth from defocus (DFD), a typical method applied in depth reconstruction, has been extensively researched and has developed greatly in recent years. However, all the existing DFD algorithms have focused only on the situation in which images are blurred under different camera parameters (i.e. focal length or radius of the lens), which makes these algorithms inapplicable in cases where any change of camera parameters is absolutely forbidden. Therefore a novel DFD method considering different images with fixed camera parameters is given. First, the blurring imaging model is constructed with the relative blurring and the diffusion equation. Secondly, the relation between depth and blurring is discussed. Subsequently, the depth measurement problem is transformed into an optimisation issue. Finally, simulations and experiments are conducted to show the feasibility and effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: The authors consider the general problem of robust pedestrian detection irrespective of background, reviewing the state of the art, showing some representative results and suggesting ways forward.
Abstract: The significant progress in visual surveillance has been motivated by the need to emulate some of the human ability to monitor activity in human-made environments, particularly in the contexts of security and safety. The rapid rise in numbers of cameras installed in public and private places makes such automation desirable, at least to reduce CCTV workload. Real-world applications of visual surveillance impose the need of robust real-time solutions, able to deal with a wide range of circumstances and environmental conditions. Conventional approaches work based on what has become known as motion (or change) detection followed by tracking (in single or multiple camera systems). Objects of interest are represented by rectangular blobs and decisions on whether something might be interesting are based on rules or learned patterns of presence and trajectories of such blobs. There is growing interest in looking 'inside the box' for applications that are concerned with detailed human activity recognition and with robust detection of people even when image backgrounds change, as is the case of a moving camera. In this study, the authors consider the general problem of robust pedestrian detection irrespective of background, reviewing the state of the art, showing some representative results and suggesting ways forward.

Journal ArticleDOI
TL;DR: This study presents an algorithm for the automatic detection of circular shapes in complicated and noisy images without recourse to conventional Hough transform principles. It is based on Learning Automata (LA), a probabilistic optimisation method that explores an unknown random environment by progressively improving performance via a reinforcement signal.
Abstract: The outcome of Turing's seminal work, originally proposed as a simple operational definition of intelligence, delivered several computer applications for solving complex engineering problems such as object detection and pattern recognition. Among such issues, circle detection over digital images has received considerable attention from the computer vision community over the last few years. This study presents an algorithm for the automatic detection of circular shapes from complicated and noisy images with no consideration of conventional Hough transform principles. The proposed algorithm is based on Learning Automata (LA), a probabilistic optimisation method that explores an unknown random environment by progressively improving its performance via a reinforcement signal. The approach uses the encoding of three non-collinear points as a candidate circle over the edge image. A reinforcement signal indicates whether such candidate circles are actually present in the edge map. Guided by the values of this reinforcement signal, the probability set of the encoded candidate circles is modified through the LA algorithm so that they can fit the actual circles on the edge map. Experimental results over several complex synthetic and natural images have validated the efficiency of the proposed technique regarding accuracy, speed and robustness.
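The candidate encoding is concrete enough to sketch: three non-collinear edge points determine a unique circle (their circumcircle), which the LA search then scores against the edge map. Below is the standard closed-form circumcircle computation, with test points chosen for easy checking.

```python
def circle_from_points(p1, p2, p3):
    """Circle through three non-collinear points (the candidate
    encoding used by the LA search): returns (cx, cy, r)."""
    ax, ay = p1
    bx, by = p2
    qx, qy = p3
    d = 2 * (ax * (by - qy) + bx * (qy - ay) + qx * (ay - by))
    if d == 0:
        raise ValueError('points are collinear')
    ux = ((ax**2 + ay**2) * (by - qy) + (bx**2 + by**2) * (qy - ay)
          + (qx**2 + qy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (qx - bx) + (bx**2 + by**2) * (ax - qx)
          + (qx**2 + qy**2) * (bx - ax)) / d
    r = ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5
    return ux, uy, r

# Three points on the unit circle centred at the origin
cx, cy, r = circle_from_points((1, 0), (0, 1), (-1, 0))
assert abs(cx) < 1e-9 and abs(cy) < 1e-9 and abs(r - 1) < 1e-9
```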

Journal ArticleDOI
TL;DR: The proposed method for estimating the number of people waiting at regular open bus stops by means of image processing is shown to yield better pedestrian count estimates than those obtained using milestone detectors, and requires model fitting procedures that can be easily implemented without very large datasets for proper classifier training.
Abstract: Automated bus fleet scheduling and dispatch require accurate measurements of current passenger demand. This study presents an effective holistic approach for estimating the number of people waiting at regular open bus stops by means of image processing. This is a non-trivial problem because of several varying conditions that complicate the detection process, such as illumination, crowdedness and different people poses, to name a few. The proposed method estimates the pedestrian count using measurements of foreground areas corrected by perspective. Four approaches are evaluated to find the best mapping between the area measurements and the people count. These mappings include two parametric (standard linear regression model, linear discriminant analysis) and two non-parametric (probabilistic neural network, k-nearest neighbours) approaches. This study also evaluates the performance of the algorithm when thermal and panoramic catadioptric cameras are used instead of standard perspective colour cameras. The proposed method is shown to yield better pedestrian count estimates than those obtained using milestone detectors, and requires model fitting procedures that can be easily implemented without requiring very large datasets for proper classifier training. The approach can also be employed to count people in other public spaces, such as buildings and crosswalks.
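The simplest of the four mappings, the standard linear regression model, can be sketched directly: fit count ≈ a·area + b by least squares on calibration data (the area/count pairs below are hypothetical).

```python
def fit_line(areas, counts):
    """Least-squares fit of count ≈ a*area + b, the standard
    linear-regression mapping from foreground area to people count."""
    n = len(areas)
    mx = sum(areas) / n
    my = sum(counts) / n
    a = (sum(x * y for x, y in zip(areas, counts)) - n * mx * my) / \
        (sum(x * x for x in areas) - n * mx * mx)
    return a, my - a * mx

# Hypothetical calibration data: foreground area (pixels) vs people
areas = [400, 800, 1200, 1600]
counts = [1, 2, 3, 4]
a, b = fit_line(areas, counts)
assert abs(a * 2000 + b - 5) < 1e-9   # extrapolate: about 5 people
```

The perspective correction mentioned in the abstract happens before this step, so that a person contributes roughly the same area regardless of distance from the camera.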

Journal ArticleDOI
TL;DR: The authors recommend that when image blocks are to be used for forensic investigations, they should be taken from the image centre before SPN extraction is performed in order to reduce false-positive rate.
Abstract: The sensor pattern noise (SPN) is a unique attribute of the content of images that can facilitate identification of source digital imaging devices. Owing to its potential in forensic applications, it has drawn much attention in the digital forensic community. Although much work has been done on the applications of the SPN, investigations into its characteristics have been largely overlooked in the literature. In this study, the authors aim to fill this gap by providing insight into the characteristic dependency of the SPN quality on its location in images. They have observed that the SPN components at the image periphery are not reliable for the task of source camera identification, and tend to cause higher false-positive rates. Empirical evidence is presented in this work. The authors suspect that this location-dependent SPN quality degradation has strong connection with the so-called ‘vignetting effect’, as both exhibit the same type of location dependency. The authors recommend that when image blocks are to be used for forensic investigations, they should be taken from the image centre before SPN extraction is performed in order to reduce false-positive rate.

Journal ArticleDOI
TL;DR: An improved non-local framework is presented, which retains corner and end-point region information and can obtain more accurate results when segmenting images with bias field and noise.
Abstract: Intensity inhomogeneities cause considerable difficulties in the quantitative analysis of magnetic resonance images (MRIs). Consequently, intensity inhomogeneity estimation is a necessary step before quantitative analysis of MR data can be undertaken. This study proposes a new energy minimisation framework for simultaneous estimation of the intensity inhomogeneities and segmentation. The method was formulated by modifying the objective function of the standard fuzzy c-means algorithm to compensate for intensity inhomogeneities by using basis functions and to compensate for noise by using improved non-local information. The energy function depends on the coefficients of the basis functions, the membership ratios, the centroids of the tissues and improved non-local information in the image. Intensity inhomogeneity estimation and image segmentation are achieved simultaneously by minimising this energy. The non-local framework has been widely used to provide non-local information; however, the traditional framework considers only neighbouring patch information, which loses information at corner and end points. This study presents an improved non-local framework that retains corner and end-point region information. Experimental results on both real MRIs and simulated MR data show that the authors' method can obtain more accurate results when segmenting images with bias field and noise.

Journal ArticleDOI
TL;DR: In this article, the stability of EKF-based 3D pose estimators is analysed in detail, and a composite technique is proposed to guarantee the stability and robustness of the procedure.
Abstract: Three-dimensional (3D) pose estimation of a rigid object by only one camera has a vital role in visual servoing systems, and the extended Kalman filter (EKF) is widely used for this task in unstructured environments. In this study, the stability of EKF-based 3D pose estimators is analysed in detail. The most challenging issue of the state-of-the-art EKF-based 3D pose estimators is the possibility of divergence because of measurement and model noise. By analysing the stability of conventional EKF-based pose estimators, a composite technique is proposed to guarantee the stability of the procedure. In the proposed technique, the non-linear, uncertain estimation problem is decomposed into a non-linear, certain observation problem in addition to a linear, uncertain estimation problem. The first part is handled using an extended Kalman observer and the second part is accomplished by a simple Kalman filter. Finally, some experimental and simulation results are given in order to verify the robustness of the method and to compare the performance of the proposed method in noisy and uncertain environments with that of conventional techniques.

Journal ArticleDOI
TL;DR: The experimental results show the feasibility and effectiveness of the proposed framework for multi-layer data registration and volumetric reconstruction of an object inside a scene, together with a method to estimate the translation vectors among virtual cameras.
Abstract: A novel approach for three-dimensional (3D) volumetric reconstruction of an object inside a scene is proposed. A camera network is used to observe the scene. Each camera within the network is rigidly coupled with an inertial sensor (IS). A virtual camera is defined for each IS-camera couple using the concept of infinite homography, by fusion of inertial and visual information. Using the inertial data and without a planar-ground assumption, a set of virtual horizontal planes is defined. The intersections of these inertial-based virtual planes with the object are registered using the concept of planar homography. Moreover, a method to estimate the translation vectors among the virtual cameras is proposed, which requires only the relative heights of two 3D points in the scene with respect to one of the cameras and their correspondences on the image planes. Different experimental results for the proposed 3D reconstruction method are provided on two different types of scenarios. In the first type, a single IS-camera couple is used and placed in different locations around the object. In the second type, the 3D reconstruction of a walking person (dynamic case) is performed, where a set of installed cameras in a smart room is used for the data acquisition. Moreover, a set of experiments is simulated to analyse the accuracy of the translation estimation method. The experimental results show the feasibility and effectiveness of the proposed framework for the purpose of multi-layer data registration and volumetric reconstruction.

Journal ArticleDOI
TL;DR: A new fast stochastic motion estimation technique is proposed that requires 5% of the total computations required by the full-search algorithm, and results in a quality that outperforms most of the well-known fast searching algorithms.
Abstract: Many fast search motion estimation algorithms have been developed to reduce the computational cost required by full-search algorithms. Fast search motion estimation techniques often converge to a local minimum, providing a significant reduction in computational cost. The motion vector measurement process in fast search algorithms is subject to noise and matching errors, so researchers have investigated the use of Kalman filtering to seek optimal estimates. In this work, the authors propose a new fast stochastic motion estimation technique that requires only 5% of the computations of the full-search algorithm while delivering quality that outperforms most well-known fast search algorithms. The measured motion vectors are obtained using a simplified hierarchical search block-matching algorithm and are used as the measurement part of the Kalman filter. For the prediction part of the filter, it is assumed that the motion vector of a current block can be predicted from its four neighbouring blocks. Using the predicted and measured motion vectors, the best estimates of the motion vectors are obtained. Using standard accuracy measurements, results show that the performance of the proposed technique approaches that of the full-search algorithm.
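The fusion of a predicted motion vector (from the four neighbouring blocks) with a measured one can be sketched as a one-step scalar Kalman update. The noise variances `q` and `r` below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fuse_motion_vector(mv_neighbours, mv_measured, q=0.5, r=1.0):
    """Blend a predicted motion vector (average of four causal neighbouring
    blocks) with a noisy block-matching measurement via a one-step Kalman
    update. q and r are assumed process / measurement noise variances."""
    mv_pred = np.mean(mv_neighbours, axis=0)   # prediction from neighbouring blocks
    p_pred = q                                 # predicted error variance
    k = p_pred / (p_pred + r)                  # Kalman gain
    return mv_pred + k * (np.asarray(mv_measured) - mv_pred)
```

A small `r` (a trusted block match) pulls the estimate towards the measurement; a small `q` (smooth motion fields) pulls it towards the neighbourhood prediction.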

Journal ArticleDOI
TL;DR: The experimental results show that SLDP can give reasonable semantic results and achieves competitive performance compared with some techniques such as PCA, LPP, neighbourhood preserving embedding (NPE) and the recently proposed unified sparse subspace learning (USSL).
Abstract: One of the major disadvantages of linear dimensionality reduction algorithms, such as principal component analysis (PCA) and linear discriminant analysis (LDA), is that the projections lack physical interpretation. Moreover, which features or variables play an important role in feature extraction and classification in classical linear dimensionality reduction methods is still not well investigated. This paper proposes a novel supervised learning method called sparse local discriminant projections (SLDP) for linear dimensionality reduction. Unlike recent manifold-learning-based methods such as locality preserving projections (LPP), SLDP introduces a sparse constraint into the objective function and integrates the local geometry, discriminant information and within-class geometry to obtain the sparse projections. The sparse projections can be efficiently computed by the Elastic Net. Most importantly, the sparse projections learned by SLDP have a direct physical interpretation and provide discriminant knowledge and insight into the extracted features. The experimental results show that SLDP gives reasonable semantic results and achieves competitive performance compared with techniques such as PCA, LPP, neighbourhood preserving embedding (NPE) and the recently proposed unified sparse subspace learning (USSL).
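The role of the sparsity penalty can be illustrated with a lasso-style iterative soft-thresholding (ISTA) solver. This is a stand-in for the Elastic Net step of SLDP (which additionally carries an l2 penalty) and is a sketch, not the authors' implementation; the penalty weight and iteration count are illustrative.

```python
import numpy as np

def ista_sparse_direction(X, y, lam=0.1, lr=None, n_iter=500):
    """Iterative soft-thresholding for min_w 0.5*||Xw - y||^2 + lam*||w||_1.

    The l1 penalty drives irrelevant loadings to exactly zero, which is
    what makes a sparse projection directly interpretable: the surviving
    non-zero entries name the features that matter."""
    n, d = X.shape
    if lr is None:
        lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size from the Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w
```

On data where only one variable drives the response, the returned loading vector is zero everywhere else, which is exactly the interpretability property the abstract emphasises.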

Journal ArticleDOI
TL;DR: A novel face recognition algorithm based on geometric features, called the robust estimation system, is proposed to alleviate the one-sample-per-subject problem; it greatly improves recognition performance compared with existing methods.
Abstract: In this study, the authors propose a novel face recognition algorithm based on geometric features, called the robust estimation system, to alleviate the one-sample-per-subject problem. The application adopts both local and global information for robust estimation. The authors utilise the original images from the ORL and Yale databases for evaluation. The images of the FERET database are pre-processed to extract the pure face region and to apply an affine transformation. The face images are roughly divided into the four block images that are most significant for the face: left eye, right eye, nose and mouth. Feature extraction using the magnitude of first-order gradients, based on geometric features, is well suited to estimation from a single sample. In the classification stage, local features are putatively matched before applying global random sample consensus (RANSAC) robust estimation, with the aim of identifying the fundamental matrix between two matched face images. Finally, similarity scores are calculated, and the candidate awarded the highest score is designated the correct subject. Experiments were conducted on the FERET, ORL and Yale databases to demonstrate the efficiency of the proposed method. The experimental results show that the algorithm greatly improves recognition performance compared with existing methods.
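The hypothesise-and-verify principle of RANSAC, which the abstract applies to fundamental-matrix estimation between matched faces, can be sketched on the simpler problem of 2-D line fitting. The tolerance and iteration count below are illustrative assumptions.

```python
import numpy as np

def ransac_line(points, n_iter=200, tol=0.05, seed=0):
    """Generic RANSAC loop illustrated on 2-D line fitting: repeatedly
    hypothesise a model from a minimal sample (two points) and keep the
    hypothesis with the largest consensus set. The same principle, with a
    larger minimal sample, underlies fundamental-matrix estimation."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        norm = np.hypot(*d)
        if norm == 0:
            continue
        # perpendicular distance of every point to the line through p and q
        diff = points - p
        dist = np.abs(d[0] * diff[:, 1] - d[1] * diff[:, 0]) / norm
        inliers = dist < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

The putative local matches in the paper play the role of the noisy point set here: RANSAC lets a few gross mismatches be outvoted by the geometrically consistent majority.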

Journal ArticleDOI
TL;DR: Experimental results prove the authors' claim that an accurate estimation of objects' depth in a scene can be obtained by taking into account extracted features' distribution over the target's surface.
Abstract: During the last decade, a wealth of research was devoted to building integrated vision systems capable of both recognising objects and providing their spatial information. Object recognition and pose estimation are among the most popular and challenging tasks in computer vision. Towards this end, in this work the authors propose a novel algorithm for estimating objects' depth. Moreover, they comparatively study two widely used local feature approaches, namely the scale-invariant feature transform (SIFT) and the speeded-up robust features (SURF) algorithm, in the particular application of locating an object in a scene relative to the camera, based on the proposed algorithm. Experimental results support the authors' claim that an accurate estimate of an object's depth in a scene can be obtained by taking into account the distribution of the extracted features over the target's surface.
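As a point of reference only (the paper's method instead exploits the distribution of matched features over the target's surface), the basic pinhole relation between apparent size and depth can be stated in a single line; the numbers used with it are hypothetical.

```python
def depth_from_apparent_size(focal_px, real_width_m, pixel_width):
    """Pinhole-camera baseline: an object of known physical width W imaged
    with focal length f (in pixels) at apparent width w (in pixels) lies at
    depth Z = f * W / w. Illustrative only, not the paper's algorithm."""
    return focal_px * real_width_m / pixel_width
```

Feature-based methods such as the one proposed effectively refine this relation by measuring apparent extent from the spread of matched keypoints rather than from a raw bounding box.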

Journal ArticleDOI
TL;DR: A two-tier search method is proposed which achieves similar accuracy with a speed increase of two orders of magnitude and is evaluated using real input and the results obtained using the different approaches are compared.
Abstract: Vision-based hand pose estimation presents unique challenges, particularly if high-fidelity reconstruction is desired. Searching large databases of synthetic pose candidates for items similar to the input offers an attractive means of attaining this goal. The earth mover's distance is a perceptually meaningful measure of dissimilarity that has shown great promise in content-based image retrieval. In general, however, it is a computationally expensive operation and must be used sparingly. The authors investigate a way of economising on its use while preserving much of the accuracy of its naive application to searching for hand pose candidates in large synthetic databases. In particular, a two-tier search method is proposed which achieves similar accuracy with a speed increase of two orders of magnitude. The system performance is evaluated using real input, and the results obtained using the different approaches are compared.
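A two-tier search of this flavour can be sketched as a cheap histogram prefilter followed by exact EMD on the surviving shortlist; for 1-D histograms of equal mass, the EMD reduces to the l1 distance between cumulative distributions. This is a generic sketch, not the paper's specific tiers, and the `keep` fraction is an illustrative assumption.

```python
import numpy as np

def emd_1d(p, q):
    """Exact earth mover's distance between two 1-D histograms of equal
    mass: the l1 distance between their cumulative distributions."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def two_tier_search(query, database, keep=0.1):
    """Two-tier retrieval sketch: a cheap l1 histogram distance prunes the
    database, and the costlier EMD re-ranks only the survivors."""
    db = np.asarray(database, float)
    coarse = np.abs(db - query).sum(axis=1)            # tier 1: cheap filter
    n_keep = max(1, int(len(db) * keep))
    shortlist = np.argsort(coarse)[:n_keep]
    fine = [emd_1d(query, db[i]) for i in shortlist]   # tier 2: exact EMD
    return shortlist[int(np.argmin(fine))]
```

The speed-up comes from applying the expensive metric to a small fraction of the database; accuracy is preserved as long as the cheap tier rarely prunes the true nearest neighbour.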

Journal ArticleDOI
TL;DR: A video-based identity authentication framework via signature verification and pen-grasping posture analysis that achieves both low false-rejection rates and low false-acceptance rates for a database containing both unskilled and skilled imitation signatures.
Abstract: This article proposes a video-based identity authentication framework via signature verification and pen-grasping posture analysis. The authors consider the case of using a camera instead of a pressure-sensitive tablet to acquire signatures. The proposed verification method is useful when pressure-sensitive digitising tablets are not available. Moreover, video-based handwritten signature verification captures more information than the signature trajectories alone: the entire writing process and the pen-grasping posture are personalised features that cannot be easily imitated or forged. The authors analyse the signature trajectories using curvelets and the pen-grasping posture using modified motion energy images to perform user-dependent identity authentication. The proposed system achieves both low false-rejection rates and low false-acceptance rates for a database containing both unskilled and skilled imitation signatures.
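The classical motion energy image (MEI), which the authors modify for posture analysis, can be sketched as the union of thresholded inter-frame differences; the threshold below is an illustrative assumption, and this is the textbook form rather than the paper's modified variant.

```python
import numpy as np

def motion_energy_image(frames, thresh=10):
    """Motion energy image (MEI): the union of thresholded inter-frame
    differences, summarising where any motion occurred over the sequence.

    frames : array of shape (T, H, W) of grey-scale frames
    returns a binary (H, W) map, 1 where motion was detected."""
    frames = np.asarray(frames, float)
    diffs = np.abs(np.diff(frames, axis=0))            # frame-to-frame differences
    return (diffs > thresh).any(axis=0).astype(np.uint8)
```

The resulting binary silhouette of accumulated motion is what makes the pen-grasping posture comparable across signing sessions.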

Journal ArticleDOI
TL;DR: In this paper, the authors propose to combine the geodesic and Euclidean descriptors into a single descriptor in order to fit the texture as well as possible; classification results on several textures from the VisTex and Brodatz databases show that this approach outperforms the classical pattern spectrum descriptor and does not require previous training.
Abstract: Mathematical morphology can be used to extract a shape-size distribution called the pattern spectrum (PS) for texture description purposes. However, the structuring element (SE) used to compute it does not vary across the image, and therefore it does not capture the image's geometrical variations. The authors' proposal consists of computing an SE at each pixel whose size and shape vary according to two distance criteria, a geodesic distance and a Euclidean distance, in order to fit the texture as well as possible. By combining the geodesic and Euclidean descriptors into a single descriptor, classification results on several textures from the VisTex and Brodatz databases show that this approach outperforms the classical PS and the geodesic and Euclidean descriptors used separately; in contrast with other adaptive methods, it does not require previous training.
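For contrast with the authors' per-pixel adaptive SEs, the classical fixed-SE pattern spectrum can be sketched in 1-D as the signal mass removed as the width of a flat opening grows. This is a hedged illustration of the baseline the paper improves upon, not the proposed method; the SE sizes are illustrative.

```python
import numpy as np

def opening_1d(f, size):
    """Flat grey-scale opening: erosion (sliding min) then dilation (sliding max)."""
    def slide(g, op):
        pad = size // 2
        gp = np.pad(g, pad, mode='edge')
        win = np.lib.stride_tricks.sliding_window_view(gp, size)
        return op(win, axis=1)
    return slide(slide(f, np.min), np.max)

def pattern_spectrum(f, max_size=9):
    """Classical (non-adaptive) pattern spectrum: the mass removed from the
    signal as the opening's structuring element grows, revealing at which
    scales the texture's bright structures live."""
    sizes = range(1, max_size + 1, 2)       # odd SE widths: 1, 3, 5, ...
    masses = [opening_1d(np.asarray(f, float), s).sum() for s in sizes]
    return -np.diff(masses)                 # mass removed at each scale step
```

A narrow spike contributes to the spectrum at a small scale and a broad plateau at a larger one; the paper's adaptive SEs refine this by letting the probing shape follow the local texture geometry.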