Showing papers by "Yap-Peng Tan" published in 2013


Journal Article•DOI•
TL;DR: This paper proposes a novel discriminative multimanifold analysis (DMMA) method that learns discriminative features from image patches, partitioning each enrolled face image into several nonoverlapping patches to form an image set for each sample per person.
Abstract: Conventional appearance-based face recognition methods usually assume that there are multiple samples per person (MSPP) available for discriminative feature extraction during the training phase. In many practical face recognition applications such as law enforcement, e-passport, and ID card identification, this assumption, however, may not hold as there is only a single sample per person (SSPP) enrolled or recorded in these systems. Many popular face recognition methods fail to work well in this scenario because there are not enough samples for discriminant learning. To address this problem, we propose in this paper a novel discriminative multimanifold analysis (DMMA) method by learning discriminative features from image patches. First, we partition each enrolled face image into several nonoverlapping patches to form an image set for each sample per person. Then, we formulate the SSPP face recognition as a manifold-manifold matching problem and learn multiple DMMA feature spaces to maximize the manifold margins of different persons. Finally, we present a reconstruction-based manifold-manifold distance to identify the unlabeled subjects. Experimental results on three widely used face databases are presented to demonstrate the efficacy of the proposed approach.

326 citations
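
The first step described in the abstract above, partitioning an enrolled face image into nonoverlapping patches to form an image set, can be sketched as follows. This is only an illustration of the partitioning step, not the DMMA method itself; the function name `partition_into_patches`, the patch size, and the choice to drop incomplete border patches are assumptions, since the abstract does not specify them.

```python
from typing import List

Image = List[List[float]]


def partition_into_patches(image: Image, patch_h: int, patch_w: int) -> List[Image]:
    """Split a 2-D image into nonoverlapping patch_h x patch_w patches.

    Rows and columns that do not fill a whole patch are dropped. This border
    handling is an assumption made for the sketch.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - patch_h + 1, patch_h):
        for left in range(0, cols - patch_w + 1, patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches


# Toy 4x4 "face image" whose pixel values are 0..15 in row-major order.
face = [[r * 4 + c for c in range(4)] for r in range(4)]
patch_set = partition_into_patches(face, 2, 2)  # image set for this one sample
```

With a real gallery, each enrolled image would yield one such patch set, and the sets (rather than single images) would be what the manifold-manifold matching operates on.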


Journal Article•DOI•
TL;DR: An ordinary preserving manifold analysis approach is proposed to seek a low-dimensional subspace such that the samples with similar label values are projected to be as close as possible and those with dissimilar label values as far as possible, simultaneously.
Abstract: We propose in this paper an ordinary preserving manifold analysis approach for human age and head pose estimation. While a large number of manifold learning algorithms have been proposed in the literature and some of them have been successfully applied to age/pose estimation, the ordinary characteristics of the age/pose information of samples have not been fully exploited to learn the low-dimensional discriminative features for these estimation tasks. To address this, we propose an ordinary preserving manifold analysis approach to seek a low-dimensional subspace such that the samples with similar label values (i.e., small age/pose difference) are projected to be as close as possible and those with dissimilar label values (i.e., large age/pose difference) as far as possible, simultaneously. Subsequently, we learn a multiple linear regression model to uncover the relation of these low-dimensional features and the ground-truth values of samples for age/pose estimation. Experimental results on facial age estimation, gait-based human age estimation, and head pose estimation are presented to demonstrate the efficacy of our proposed approach.

109 citations


Journal Article•DOI•
TL;DR: A cost-sensitive subspace analysis approach for face recognition that uses a cost matrix specifying different costs corresponding to different types of misclassifications to achieve a minimum overall recognition loss by performing recognition in these learned low-dimensional subspaces.
Abstract: Conventional subspace-based face recognition methods seek low-dimensional feature subspaces to achieve high classification accuracy and assume the same loss from different types of misclassification. This assumption, however, may not hold in many practical face recognition systems as different types of misclassification could lead to different losses. Motivated by this concern, this paper proposes a cost-sensitive subspace analysis approach for face recognition. Our approach incorporates a cost matrix, which specifies different costs corresponding to different types of misclassifications, into two popular and widely used discriminative subspace analysis methods and devises the cost-sensitive linear discriminant analysis (CSLDA) and cost-sensitive marginal Fisher analysis (CSMFA) methods, to achieve a minimum overall recognition loss by performing recognition in these learned low-dimensional subspaces. To better exploit the complementary information from multiple features for improved face recognition, we further propose a multiview cost-sensitive subspace analysis approach by seeking a common feature subspace to fuse multiple face features to improve the recognition performance. Extensive experimental results demonstrate the effectiveness of our proposed methods.

63 citations
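
To make the idea of a cost matrix concrete, here is a minimal sketch of a minimum-expected-cost decision rule. It is not CSLDA or CSMFA, which build the costs into subspace learning; it only shows how unequal misclassification costs can change a decision. The function name, the two-class setup, and the cost values are all hypothetical.

```python
from typing import List, Tuple


def min_expected_cost_label(
    posteriors: List[float], cost: List[List[float]]
) -> Tuple[int, List[float]]:
    """Pick the label with the smallest expected misclassification cost.

    posteriors[i] is the estimated probability that the true class is i.
    cost[i][j] is the cost of predicting class j when the true class is i.
    """
    n = len(posteriors)
    expected = [
        sum(posteriors[i] * cost[i][j] for i in range(n)) for j in range(n)
    ]
    best = min(range(n), key=lambda j: expected[j])
    return best, expected


# Hypothetical two-class example: class 0 = impostor, class 1 = enrolled person.
# Accepting an impostor (true 0, predicted 1) is assumed ten times as costly
# as rejecting an enrolled person (true 1, predicted 0).
cost = [[0.0, 10.0],
        [1.0, 0.0]]
posteriors = [0.3, 0.7]
label, expected = min_expected_cost_label(posteriors, cost)
```

In this toy case the most probable class is 1, yet the rule returns 0 because a false acceptance is assumed to be far more expensive than a false rejection.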


Proceedings Article•DOI•
Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
01 Dec 2013
TL;DR: A new partial face recognition approach is proposed by using feature set matching, which is able to align partial face patches to holistic gallery faces automatically and is robust to occlusions and illumination changes.
Abstract: Over the past two decades, a number of face recognition methods have been proposed in the literature. Most of them use holistic face images to recognize people. However, human faces are easily occluded by other objects in many real-world scenarios and we have to recognize the person of interest from his/her partial faces. In this paper, we propose a new partial face recognition approach by using feature set matching, which is able to align partial face patches to holistic gallery faces automatically and is robust to occlusions and illumination changes. Given each gallery image and probe face patch, we first detect key points and extract their local features. Then, we propose a Metric Learned Extended Robust Point Matching (MLERPM) method to discriminatively match local feature sets of a pair of gallery and probe samples. Lastly, the similarity of two faces is converted into the distance between two feature sets. Experimental results on three public face databases are presented to show the effectiveness of the proposed approach.

54 citations


Journal Article•DOI•
TL;DR: A method based on locality repulsion projections (LRP) and a sparse reconstruction-based similarity measure (SRSM) to address the problem of SSPP face recognition using multiple probe images is proposed.
Abstract: For many practical face recognition systems such as law enforcement, e-passport, and ID card identification, there is usually only a single sample per person (SSPP) enrolled in these systems, and many existing face recognition methods may fail to work well because there are not enough samples for discriminative feature extraction in this scenario. However, the probe samples of these face recognition systems are usually captured on the spot, and it is possible to collect multiple face images per person for on-location probing, which is potentially useful to improve the recognition performance. In this paper, we propose a method based on locality repulsion projections (LRP) and a sparse reconstruction-based similarity measure (SRSM) to address the problem of SSPP face recognition using multiple probe images. The LRP method is motivated by our observation that similar face images from different people may lie in a locality in the feature space and cause misclassifications. We design the method with the aim of separating the samples of different classes within a neighborhood through subspace projections for easier classification. To better characterize the similarity between each gallery face and the probe image set, we propose an SRSM method for assigning a label to each probe image set. Experimental results on five widely used face datasets are presented to demonstrate the effectiveness of the proposed approach.

38 citations


Proceedings Article•DOI•
22 Apr 2013
TL;DR: To better extract complementary information from different facial features, multiple ordinal ranking models are constructed, each corresponding to a feature set, and aggregated into an effective age estimator.
Abstract: In this paper, we propose a multi-feature ordinal ranking (MFOR) method for facial age estimation. Different from most existing facial age estimation approaches where age estimation is treated as a classification or a regression problem, we formulate facial age estimation as a group of ordinal ranking subproblems, and each subproblem derives a separating hyperplane to divide face instances into two groups: samples with age larger than k and samples with age no larger than k. To better extract complementary information from different facial features, we construct multiple ordinal ranking models, each corresponding to a feature set, and aggregate them into an effective age estimator. Experimental results on two public face aging datasets are presented to demonstrate the efficacy of the proposed method.

38 citations
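
The ordinal ranking formulation in the abstract above can be illustrated with a small sketch: each binary subproblem answers "is the age larger than k?", and the predicted age is the lowest age plus the number of positive answers. The fusion rule used here (summing the signed scores of the feature-specific rankers at each threshold) is an assumption for illustration only; the abstract does not state how MFOR aggregates its models. The function name and score values are hypothetical.

```python
from typing import List


def predict_age(scores_per_feature: List[List[float]], min_age: int = 1) -> int:
    """Combine binary 'older than k' rankers into a single age estimate.

    scores_per_feature[f][k] is the signed decision score of the k-th binary
    ranker trained on feature set f; a positive score means "age is larger
    than the k-th threshold". Scores are fused by summation across feature
    sets (an assumed rule), and each positive fused score adds one year.
    """
    n_thresholds = len(scores_per_feature[0])
    votes = 0
    for k in range(n_thresholds):
        fused = sum(scores[k] for scores in scores_per_feature)
        if fused > 0:
            votes += 1
    return min_age + votes


# Hypothetical scores from two feature sets over five age thresholds (1..5).
feature_a = [2.0, 1.5, 0.4, -0.3, -1.2]
feature_b = [1.0, 0.8, -0.1, -0.5, -2.0]
estimated_age = predict_age([feature_a, feature_b])
```

Here the fused scores are positive for the first three thresholds only, so the estimate is 1 + 3 = 4.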


Journal Article•DOI•
TL;DR: A scalable resource allocation framework for streaming scalable videos over multiuser multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) networks is proposed to achieve differentiated service objectives for different scalable video layers; it handles fairness and efficiency better in different scenarios than conventional schemes.
Abstract: In this paper, we propose a scalable resource allocation framework for streaming scalable videos over multiuser multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) networks. We exploit the utilities of scalable videos produced by the scalable extension of H.264/AVC (SVC) and investigate the multidimensional diversities of the multiuser MIMO-OFDM wireless networks. First, we study the rate-utility relationship of SVC via a packet prioritization scheme. Based on the rate-utility analysis, a scalable resource-allocation framework is proposed to achieve differentiated service objectives for different scalable video layers. To provide users with fair opportunities to acquire basic viewing experience, a fair scheme is designed to guarantee that each user is entitled to max-min fairness in having their base layer video packets received. After all users have their base layer packets successfully scheduled, resources are distributed to exploit the network efficiency. The two schemes are integrated into a unified bit loading and power allocation solution to enhance the practicability of the scalable framework. Experimental results confirm that the proposed scheme handles fairness and efficiency better in different scenarios than the conventional schemes.

37 citations
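
For readers unfamiliar with max-min fairness, the following sketch shows the generic textbook "progressive filling" allocation of a single shared capacity among users with different demands. It is not the paper's MIMO-OFDM bit loading and power allocation scheme; the function name, the single scalar resource, and the numbers are assumptions chosen to keep the example small.

```python
from typing import List


def max_min_fair(capacity: float, demands: List[float]) -> List[float]:
    """Max-min fair split of one shared capacity (textbook progressive filling).

    Users are visited from the smallest demand upward. A user whose demand
    fits within an equal share of what remains is fully satisfied; once the
    smallest remaining demand no longer fits, all remaining users receive
    the same equal share.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])
    while unsatisfied and remaining > 1e-12:
        share = remaining / len(unsatisfied)
        i = unsatisfied[0]
        if demands[i] <= share:
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            unsatisfied.pop(0)
        else:
            for j in unsatisfied:
                alloc[j] = share
            remaining = 0.0
            unsatisfied = []
    return alloc


# Four users share 10 units; demands are listed in arbitrary user order.
allocation = max_min_fair(10, [5, 2, 4, 2.6])
```

The two smallest demands (2 and 2.6) are met in full, and the two larger ones split the remaining 5.4 units equally, so no user can gain without reducing a user who already has less.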


Proceedings Article•DOI•
01 Nov 2013
TL;DR: A robust partial face recognition approach based on local feature representation is presented, where the similarity between each probe patch and gallery face is computed by using the instance-to-class distance with the sparse constraint.
Abstract: We present a new face recognition approach from partial face patches by using an instance-to-class distance. While numerous face recognition methods have been proposed over the past two decades, most of them recognize persons from whole face images. In many real world applications, partial faces usually occur in unconstrained scenarios such as visual surveillance systems. Hence, it is very important to recognize an arbitrary facial patch to enhance the intelligence of such systems. In this paper, we develop a robust partial face recognition approach based on local feature representation, where the similarity between each probe patch and gallery face is computed by using the instance-to-class distance with the sparse constraint. Experiments on two popular face datasets are presented to show the efficacy of our proposed method.

24 citations


Journal Article•DOI•
TL;DR: This letter presents a hybrid saliency detection method for images that automatically predicts the saliency regions based on low-level and high-level cues, and compares the algorithm to several state-of-the-art saliency detection methods on the well-known 1000-image EPFL database.
Abstract: Saliency information interpreted from the visual stimuli can predict the attentional behaviour of human perception, thus playing a key role in visual signal processing. In this letter, we present a hybrid saliency detection method for images by which we automatically predict the saliency regions based on low-level and high-level cues. Unlike existing bottom-up and top-down attentional methods, we consider the high-level cue imposed by the photographer. Based on this assumption, we estimate the defocus map of the image and integrate it with other low-level features based on the Bayesian framework. We compare our algorithm to several state-of-the-art saliency detection methods based on the well-known 1000-image EPFL database, and demonstrate the superior performance of our proposed algorithm.

19 citations


Journal Article•DOI•
TL;DR: A new Multiview Subspace Representation (MSR) method is devised which considers gait sequences collected from different views of the same subject as a feature set and extracts a linear subspace to describe the feature set.

17 citations


Journal Article•DOI•
TL;DR: Experimental results show that the proposed QoE-aware scalability adaptation scheme significantly outperforms the conventional adaptation schemes in terms of QoE and provides a useful methodology to estimate video QoE, which is important for QoE-aware scalable video streaming.

Journal Article•DOI•
TL;DR: This paper jointly adapts the transmission rate, transmission power, and retransmission limit to minimize the transmission energy while ensuring that the video frame is reliably delivered before a deadline, and demonstrates that allowing some retransmissions in the joint optimization consumes less energy without compromising the target reliability.
Abstract: In wireless video transmissions, encoded video frames are often large in data load and truncated into many transport packets (TPs) for reliable transmissions. These TPs are to be delivered before a deadline at a certain reliability that depends on the importance of the video frame. High power and bitrate transmission schemes are often deployed to ensure low loss rate, but at the cost of substantial energy consumption. This paper addresses the energy-minimizing transmission policy for highly reliable transmissions of a group of TPs with a common deadline. We jointly adapt the transmission rate, transmission power, and retransmission limit to minimize the transmission energy while ensuring that the video frame is reliably delivered before a deadline. In a slow fading channel, we formulate a deterministic transmission policy that allocates a retransmission limit to the TPs and jointly optimizes with the transmission bitrate and power. Contrary to the intuition that avoids packet loss and retransmission to preserve energy, we demonstrate that allowing some retransmissions in the joint optimization consumes less energy, without compromising the target reliability. In a Rayleigh fading channel, a conventional approach adopts a supportable transmission rate and declares off-channel at deep fading. We generalize the approach to include various transmission schemes at different fading states and propose the probabilistic combination of these schemes. Given the fading statistics and a channel state, our policy determines to pause or to deploy a proper transmission scheme such that the video frame transmission is energy minimized, timely, and highly reliable. Extensive simulations confirm that the proposed transmission policies consume less energy than existing methods. In particular, when the deadline or the target reliability is tightened, the proposed policies yield even higher energy efficiency.

Proceedings Article•DOI•
26 May 2013
TL;DR: This paper aims to identify people from various activities such as eating, jumping, and weaving by applying an adaptive discriminant analysis method to project AEI features into a low-dimensional subspace, such that the intra-class variations (activities performed by the same person) are minimized and the interclass variations (activities performed by different persons) are maximized, simultaneously.
Abstract: We investigate in this paper the problem of activity-based human identification. Different from most existing gait recognition methods where only human walking activity is considered and utilized for person identification, we aim to identify people from various activities such as eating, jumping, and weaving. For each video clip, we first extract binary human body masks by using background subtraction, followed by computing the average energy image (AEI) features to represent each video clip. Then, a mapping is learned by applying an adaptive discriminant analysis (ADA) method to project AEI features into a low-dimensional subspace, such that the intra-class variations (activities performed by the same person) are minimized and the interclass variations (activities performed by different persons) are maximized, simultaneously. Moreover, interclass samples with large similarity difference are deemphasized and those with small difference are emphasized, such that more discriminative information can be used for recognition. Experimental results on three publicly available databases show the efficacy of our proposed approach.
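
As a rough illustration of the clip representation mentioned in the abstract above, the sketch below computes an average energy image as the pixel-wise mean of the binary body masks in a clip. The abstract does not define the AEI precisely, so this definition (by analogy with the gait energy image), the function name, and the toy masks are assumptions.

```python
from typing import List

Mask = List[List[int]]


def average_energy_image(masks: List[Mask]) -> List[List[float]]:
    """Pixel-wise mean of equally sized binary masks (assumed AEI definition)."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / n for c in range(cols)]
        for r in range(rows)
    ]


# Three toy 2x2 body masks standing in for the frames of one video clip.
clip = [
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 1]],
]
aei = average_energy_image(clip)
```

Pixels that belong to the body in every frame get the value 1.0, while pixels covered only occasionally get fractional values, so the result summarizes both shape and motion in a single image.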

Proceedings Article•DOI•
19 May 2013
TL;DR: To identify the opportunities and challenges in fast-growing mobile media computing, several emerging topics including mobile visual search, retargeting, mobile video streaming, and cloud-based mobile media computing are discussed.
Abstract: In this paper, we review recent advances in mobile media communication, processing, and analysis. To identify the opportunities and challenges in fast-growing mobile media computing, we discuss several emerging topics including mobile visual search, retargeting, mobile video streaming, and cloud-based mobile media computing. Considering the infrastructure of mobile devices versus servers, we identify essential concerns in mobile media computing such as wireless bandwidth consumption, mobile energy saving, media adaptation for better quality of service, and the shift of computational load from mobile devices to servers. With mobile apps for diverse media consumption booming, it is envisioned that mobile media research and development will bring about significant achievements in the traditional topics of communication, processing, and analytics.

Proceedings Article•DOI•
19 May 2013
TL;DR: A cross-scene abnormal event detection method that adopts a Bag of Words model with a Spatial Pyramid Matching (SPM) kernel, together with SIFT features and an SVM classifier, to detect events of concern in public scenes that have not been learned beforehand.
Abstract: This paper presents a cross-scene abnormal event detection method that adopts a Bag of Words (BoW) model with a Spatial Pyramid Matching (SPM) kernel, together with SIFT features and an SVM classifier. Different from existing abnormal event detection methods, which detect abnormal events occurring in a well-learned scene, we aim to detect events of concern in public scenes that have not been learned beforehand. Our method is motivated by the fact that the patterns of notable events are similar, so the learned models should be transferable to examine events in other unlearned public scenes. To learn the patterns of an abnormal event, we divide the proposed method into two steps: feature coding and spatial pooling. In the feature coding step, the codebook is generated and the features are quantized based on small patches. In the spatial pooling step, the patches are concatenated to exploit the spatial information of local regions. The intersection kernel is used with an SVM classifier. Experimental results on two benchmark databases demonstrate the efficacy of our proposed approach.
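
The intersection kernel mentioned in the abstract above is, in its standard form, the sum of element-wise minima of two histograms. The sketch below shows that standard definition on toy bag-of-words histograms; the function name and histogram values are hypothetical, and the spatial pyramid weighting used by SPM is deliberately left out.

```python
from typing import List


def intersection_kernel(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection kernel: sum of element-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy L1-normalized codeword histograms for two video patches.
hist_a = [0.5, 0.3, 0.2]
hist_b = [0.2, 0.3, 0.5]
similarity = intersection_kernel(hist_a, hist_b)       # 0.2 + 0.3 + 0.2
self_similarity = intersection_kernel(hist_a, hist_a)  # equals sum(hist_a)
```

A Gram matrix built from this function over all training histograms could then be handed to an SVM implementation that accepts precomputed kernels.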

Proceedings Article•DOI•
15 Jul 2013
TL;DR: A new graph-based sparse coding and embedding (GSCE) method for activity-based human identification that learns a mapping to project each frame into a low-dimensional subspace to speed up the quantization procedure, such that more discriminative information can be further exploited for classification.
Abstract: In this paper, we propose a new graph-based sparse coding and embedding (GSCE) method for activity-based human identification. Different from human activity recognition, which recognizes different types of human activities such as walking, running, eating, and drinking, in this study we aim to identify persons from their activities. To the best of our knowledge, this problem has seldom been investigated in the literature. Given a training set of video clips, we first extract the human body mask in each frame and learn a codebook to quantize these masks into a histogram feature by using a graph-based sparse coding technique to better preserve the similarity information of different frames within the same video clip. Moreover, we also learn a mapping to project each frame into a low-dimensional subspace to speed up the quantization procedure, such that more discriminative information can be further exploited for classification. Experimental results on three databases are presented to show the efficacy of the proposed method.

Proceedings Article•DOI•
19 May 2013
TL;DR: This paper presents an efficient algorithm to extract the SVC bitstream by a packet prioritization scheme, which is based on the analysis of packet dependencies (PD) in encoding of full scalability.
Abstract: The Scalable Video Coding (SVC) standard offers multiple scalabilities while maintaining high coding efficiency. However, the joint coding of multiple scalabilities complicates the rate adaptation as the SVC packets possess different priorities. To address this challenge, we propose in this paper a fast rate adaptation scheme for SVC in the absence of the original video sequence. Specifically, we present an efficient algorithm to extract the SVC bitstream by a packet prioritization scheme, which is based on the analysis of packet dependencies (PD) in encoding of full scalability. Experimental results demonstrate that the proposed scheme achieves significant improvement in PSNR over the basic bit extraction scheme in SVC and attains comparable PSNR performance to the Quality Layer based rate adaptation scheme while significantly reducing the computational cost.

Proceedings Article•DOI•
15 Jul 2013
TL;DR: Experimental results on three widely used face datasets are presented to show the effectiveness of the proposed CRMMD method, where each gallery and probe sample is a set of face images captured from varying poses, illuminations and expressions.
Abstract: In this paper, we propose a new collaborative reconstruction-based manifold-manifold distance (CRMMD) method for face recognition with image sets, where each gallery and probe sample is a set of face images captured from varying poses, illuminations and expressions. Given each face image set, we first model it as a nonlinear manifold, and the recognition task is then converted into a manifold-manifold matching problem. For each manifold, we divide it into several clusters and describe each cluster by using a local model. Then, we use the local models from each gallery manifold to collaboratively reconstruct each local model of the testing manifold and the minimal reconstruction error is used for classification. Experimental results on three widely used face datasets are presented to show the effectiveness of the proposed method.

Proceedings Article•DOI•
15 Jul 2013
TL;DR: This paper proposes an optimal hybrid pricing scheme which allows a balanced tradeoff between fairness and efficiency in network service and derives the optimal rate and power allocation for video coding and transmission such that the network service charge and video distortion are minimized under a power constraint.
Abstract: Video chat is a power- and rate-intensive application which requires efficient resource utilization. Unlike video streaming, which is generally one-way, video chats involve distributed two-way traffic relayed via base stations. In this paper, we propose a distributed rate and power allocation framework for joint coding and transmission in wireless video chats. The base station imposes a service charge, which considers relay transmission power as a cost, for relaying video bitstreams. For clients, we derive the optimal rate and power allocation for video coding and transmission such that the network service charge and video distortion are minimized under a power constraint. For the base station, existing pricing schemes could not ensure fairness and efficiency simultaneously. We propose an optimal hybrid pricing scheme which allows a balanced tradeoff between fairness and efficiency in network service. Network dynamics of video chats can be analyzed in the Stackelberg game framework, and shown to converge to the Stackelberg equilibrium. Extensive simulations confirm the performance analysis of the proposed solutions and the network dynamics.

Proceedings Article•DOI•
01 Nov 2013
TL;DR: This work proposes a model that describes the power-rate-distortion characteristic of complexity-scalable video coding more accurately and demonstrates that the model with online updates can be applied to solve the rate and power allocation problem optimally in joint coding and transmission.
Abstract: Wireless video chat is a power-consuming, high-bitrate application. To prolong the operational lifetime, optimal rate and power allocations in joint coding and transmission are necessary. We exploit low motion and high inter-frame correlation in video chats to determine a complexity-scalable video coding adaptation which is Pareto optimal. We propose a model that describes the power-rate-distortion (PRD) characteristic of complexity-scalable video coding more accurately. As video contents are non-stationary, we formulate an online algorithm for updating the model's parameters. We demonstrate that the PRD model with online updates can be applied to solve the rate and power allocation problem optimally in joint coding and transmission. Simulation results confirm that the model describes PRD characteristics more accurately via online recursive updates. The resource allocation scheme based on the PRD model yields better video quality than recent methods in a resource-constrained wireless video chat application.