
Showing papers by "Yue Ming published in 2016"


Journal ArticleDOI
TL;DR: Extensive experimental results show that the MS-PCANet model can efficiently extract high-level feature representations and outperforms state-of-the-art face/expression recognition methods on multi-modality benchmark face-related datasets.
Abstract: It is well known that higher-level features can represent the abstract semantics of the original data. We propose a multiple-scale combined deep learning network that learns a set of high-level feature representations through each stage of a convolutional neural network for face recognition, named the Multi-Scaled Principal Component Analysis (PCA) Network (MS-PCANet). There are two main differences between our model and traditional deep learning networks. On the one hand, we obtain the prefixed filter kernels by learning the principal components of image patches using PCA, nonlinearly process the convolutional outputs with simple binary hashing, and pool them with the spatial pyramid pooling method. On the other hand, in our model the output features of several stages are fed to the classifier. The purpose of combining feature representations from multiple stages is to provide multi-scale features to the classifier, since the features in the later stages are more global and invariant than those in the early stages. Therefore, our MS-PCANet feature compactly encodes both holistic abstract information and local specific information. Extensive experimental results show that our MS-PCANet model can efficiently extract high-level feature representations and outperforms state-of-the-art face/expression recognition methods on multi-modality benchmark face-related datasets.
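The PCA filter-bank stage described above can be made concrete with a short sketch. This is a minimal, single-stage illustration in the spirit of MS-PCANet/PCANet, assuming grayscale inputs; the patch size, filter count, non-overlapping patch sampling, and plain histogram pooling (standing in for spatial pyramid pooling) are illustrative choices, not the paper's settings.

```python
# Minimal sketch of a single PCA filter-bank stage in the spirit of MS-PCANet /
# PCANet. Patch size, filter count and pooling are illustrative, not the paper's
# exact configuration.
import numpy as np
from scipy.signal import convolve2d

def learn_pca_filters(images, patch_size=7, n_filters=8):
    """Learn convolution kernels as the leading principal components of image patches."""
    patches, k = [], patch_size
    for img in images:
        H, W = img.shape
        for i in range(0, H - k + 1, k):          # non-overlapping sampling keeps it small
            for j in range(0, W - k + 1, k):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())       # remove the patch mean
    X = np.asarray(patches)                        # (n_patches, k*k)
    # Right singular vectors of the patch matrix = PCA filters
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_filters].reshape(n_filters, k, k)

def pca_stage_features(img, filters):
    """Convolve, binarize and hash the responses, then pool them into a histogram."""
    responses = [convolve2d(img, f, mode='same') for f in filters]
    bits = [(r > 0).astype(np.int64) for r in responses]    # simple binary hashing
    coded = sum(b << i for i, b in enumerate(bits))          # one integer code per pixel
    # A plain histogram of the hash codes stands in for spatial pyramid pooling
    hist, _ = np.histogram(coded, bins=2 ** len(filters), range=(0, 2 ** len(filters)))
    return hist.astype(np.float32)

# Usage sketch (placeholder data):
# faces = [np.random.rand(64, 64) for _ in range(20)]
# filters = learn_pca_filters(faces)
# feature = pca_stage_features(faces[0], filters)
```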

21 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: The issue of face anti-spoofing is explored, with good accuracy, by utilizing optical flow vectors on two types of spoofing attacks: photos and videos shown on high-resolution electronic screens.
Abstract: Spoofing attacks can easily deceive face recognition systems. In this paper, we explore the issue of face anti-spoofing, achieving good accuracy by utilizing optical flow vectors on two types of attacks: photos and videos shown on high-resolution electronic screens. The key idea is to calculate the displacement of the optical flow vectors between two successive frames of a face video and accumulate this displacement over a certain number of frames. Under stable lighting, the displacement sum differs between real access and spoofing attacks. We experiment on REPLAY-ATTACK, a common and popular face spoofing database, and obtain good performance. We conclude that spoofing attacks and real faces exhibit different optical flow motion trends, and that our method provides a reasonable estimate when facing a broad set of face attacks.
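A minimal sketch of the displacement-sum idea, assuming OpenCV's dense Farneback optical flow: per-pixel flow magnitude is accumulated over a fixed number of successive frames and compared against a threshold. The frame count, Farneback parameters, and threshold are illustrative, not the paper's values.

```python
# Accumulate dense optical-flow displacement over a short face clip; real access
# and replayed photos/videos tend to produce different sums under stable lighting.
import cv2
import numpy as np

def flow_displacement_sum(frames, n_frames=25):
    """Sum the optical-flow displacement magnitude over up to n_frames frames."""
    total = 0.0
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:n_frames]:
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude = np.linalg.norm(flow, axis=2)   # per-pixel displacement length
        total += magnitude.sum()
        prev = curr
    return total

# A simple threshold tuned on training data can then separate live from spoofed:
# is_live = flow_displacement_sum(frames) > LIVENESS_THRESHOLD   # hypothetical threshold
```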

15 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: This paper describes a system for human-computer interaction based on local SURF image features, using threshold segmentation and bag-of-words algorithms to reduce the dimensionality of the feature space.
Abstract: Hand gesture recognition plays an important role in human-computer interaction. To facilitate the understanding of computer-vision-based hand gesture recognition, this paper describes a system for human-computer interaction based on local SURF image features, and we use threshold segmentation and bag-of-words algorithms to reduce the dimensionality of the feature space. A Leap Motion sensor is used to efficiently collect 800 hand gesture images covering 8 gesture types. On this self-built database, we carry out experiments with SURF, LBP, and geometric structure features, using an SVM, an RBF neural network, and a BP neural network to evaluate performance and improve accuracy. The experimental results show a recognition accuracy of 99.5% using the RBF neural network.
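A minimal sketch of the SURF + bag-of-words portion of such a pipeline, assuming an opencv-contrib build for SURF; the vocabulary size, Hessian threshold, and the SVM with an RBF kernel (standing in for the paper's RBF neural network) are illustrative choices.

```python
# Bag-of-words over SURF descriptors, then a classifier on the word histograms.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def surf_descriptors(gray_images):
    """Extract SURF descriptors per image (requires an opencv-contrib build)."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    all_desc = []
    for img in gray_images:
        _, desc = surf.detectAndCompute(img, None)
        all_desc.append(desc if desc is not None else np.empty((0, 64), np.float32))
    return all_desc

def bow_histograms(desc_per_image, vocab_size=100):
    """Cluster descriptors into a visual vocabulary and build normalized word histograms."""
    kmeans = KMeans(n_clusters=vocab_size, n_init=10)
    kmeans.fit(np.vstack([d for d in desc_per_image if len(d)]))
    hists = []
    for desc in desc_per_image:
        words = kmeans.predict(desc) if len(desc) else np.array([], dtype=int)
        hist, _ = np.histogram(words, bins=vocab_size, range=(0, vocab_size))
        hists.append(hist / max(hist.sum(), 1))
    return np.asarray(hists), kmeans

# Usage sketch:
# hists, vocab = bow_histograms(surf_descriptors(train_images))
# clf = SVC(kernel='rbf').fit(hists, train_labels)
```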

13 citations


Journal ArticleDOI
TL;DR: A 3D motion scale-invariant feature transform is developed to describe depth and motion information, serving as a more effective descriptor for RGB and depth videos.
Abstract: The human vision system receives RGB and depth information at the same time and can make accurate judgments about human behaviors. An ordinary camera, however, loses information when the 3D scene is projected onto a 2D plane. The depth and RGB information collected simultaneously by Kinect can provide more discriminative information about human behaviors than traditional cameras. Therefore, RGB-D cameras have long been regarded as key to solving human behavior recognition. In this paper, we develop a 3D motion scale-invariant feature transform to describe the depth and motion information; it serves as a more effective descriptor for RGB and depth videos. A Hidden Markov Model is utilized to improve the accuracy of human behavior recognition. Experiments show that our framework provides richer discriminative information for behavior analysis and achieves better recognition performance.
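A minimal sketch of the HMM classification step, assuming the hmmlearn library: one Gaussian HMM per behavior class is fit on sequences of per-frame descriptors (the 3D motion SIFT extraction itself is not shown), and a test sequence is assigned to the class whose model scores it highest. The number of states and the covariance type are illustrative.

```python
# One Gaussian HMM per behavior class; classification by maximum log-likelihood.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_behavior_hmms(sequences_by_class, n_states=5):
    """sequences_by_class: dict mapping label -> list of (T_i, D) descriptor sequences."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                        # stack all sequences of this class
        lengths = [len(s) for s in seqs]           # hmmlearn needs per-sequence lengths
        model = GaussianHMM(n_components=n_states, covariance_type='diag', n_iter=30)
        model.fit(X, lengths)
        models[label] = model
    return models

def classify_sequence(models, seq):
    """Pick the class whose HMM assigns the test sequence the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(seq))
```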

12 citations


Journal ArticleDOI
TL;DR: A facial depth recovery method is proposed to construct a facial depth map from stereoscopic videos, together with a novel feature descriptor, called the Local Mesh Scale-Invariant Feature Transform (LMSIFT), that reflects the different face recognition abilities of different facial regions.

11 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: This paper proposes a novel method that fuses rhythm features with gammatone frequency cepstral coefficient (GFCC) features, improving recognition rates on the authors' music database by 21.3% on average compared with predominant melody, MFCC, and GFCC features.
Abstract: Rhythm information plays an important role among music features, but its use still has a long way to go. Most current research in this field is based on a single feature, which is unstable. In this paper, we propose a novel method that addresses this by fusing rhythm features with gammatone frequency cepstral coefficient (GFCC) features. After pre-processing, which includes detecting the beginning of each song and removing silent parts using energy and zero-crossing rate, our training and testing features are generated by fusing rhythm features, including pitch, tempo, etc., with GFCC features. Furthermore, we present several ways to measure the rhythmic similarity between two or more songs, which allows similar songs to be retrieved from a large collection. For recognition, we choose the Dynamic Time Warping (DTW) algorithm to calculate the distance between the test music and the music database, and then obtain a ranking list based on this distance. It is demonstrated that we can improve recognition rates by 21.3% on average on our music database by using rhythm features fused with GFCC features, compared with predominant melody, MFCC, and GFCC features. Our music database contains 500 songs, of which we choose 100 as testing music.
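The ranking step can be sketched directly: a standard dynamic time warping distance between frame-level feature sequences, followed by a sort. Feature extraction (the rhythm and GFCC fusion) is not shown; the sequence shapes and the plain Euclidean local cost are assumptions of this sketch, not the paper's exact configuration.

```python
# Classic DTW distance between two (T, D) feature sequences, then database ranking.
import numpy as np

def dtw_distance(a, b):
    """DTW with Euclidean local cost between frame-level feature vectors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def rank_database(query, database):
    """Return database indices sorted by DTW distance to the query sequence."""
    distances = [dtw_distance(query, song) for song in database]
    return np.argsort(distances)
```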

9 citations


Journal ArticleDOI
TL;DR: This paper proposes a simple and efficient iterative quantization binary codes (IQBC) feature learning method to learn a discriminative binary face descriptor in a data-driven way, and demonstrates that the IQBC descriptor outperforms other state-of-the-art face descriptors.
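As a rough illustration of learning binary codes by iterative quantization, the sketch below follows the generic ITQ procedure (PCA projection plus an alternately optimized orthogonal rotation). It is offered only as a stand-in: the paper's IQBC objective and training details are not reproduced here and may differ.

```python
# Generic ITQ-style binary coding: PCA projection, then alternate between
# binarizing and solving an orthogonal Procrustes problem for the rotation.
import numpy as np

def itq_binary_codes(X, n_bits=64, n_iters=50, seed=0):
    """Learn a PCA projection plus a rotation that reduces quantization loss."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                        # center the descriptors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_bits].T                              # (D, n_bits) PCA projection
    V = Xc @ W
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal init
    for _ in range(n_iters):
        B = np.sign(V @ R)                         # binary codes under the current rotation
        U, _, Wt = np.linalg.svd(V.T @ B)          # Procrustes update of the rotation
        R = U @ Wt
    codes = (V @ R) > 0                            # final binary descriptor per sample
    return codes.astype(np.uint8), W, R
```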

8 citations


Book ChapterDOI
20 Nov 2016
TL;DR: This paper introduces the recent weighted non-locally self-similarity (WNLSS) method, which was originally proposed to remove noise from natural images, into the face deblurring model, and shows that the properties of its data fidelity term and regularization term also fit the face deblurring problem well.
Abstract: The human face is one of the most interesting subjects in various computer vision tasks. In recent years, significant progress has been made on the generic image deblurring problem, but existing popular sparse-representation-based deblurring methods are not able to achieve excellent results on blurry face images. The failure of these methods mainly stems from the lack of local/non-local self-similarity prior knowledge. There are many similar non-local patches in the neighborhood of a given patch in a face image; this property should therefore be effectively exploited to obtain a good estimate of the sparse coding coefficients. In this paper, we introduce the recent weighted non-locally self-similarity (WNLSS) method [1], which was originally proposed to remove noise from natural images, into the face deblurring model. There are two terms in the WNLSS sparse representation model: a data fidelity term and a regularization term. Based on theoretical analysis, we show that the properties of the data fidelity term and the regularization term also fit the face deblurring problem well. The results further demonstrate that the WNLSS method achieves excellent performance on both synthetic and real blurred face datasets.
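To make the two-term split concrete, the following is a generic sketch of a sparse-representation deblurring objective with a weighted non-local self-similarity regularizer; the symbols and the exact form of the regularizer are illustrative and may differ from the WNLSS model of [1].

```latex
% Generic two-term objective: data fidelity + weighted non-local self-similarity
% regularizer. Written only to make the split concrete, not the model of [1] verbatim.
\hat{\boldsymbol{\alpha}} \;=\; \arg\min_{\boldsymbol{\alpha}}\;
  \underbrace{\bigl\lVert \mathbf{y} - \mathbf{k} \otimes (\mathbf{D}\boldsymbol{\alpha}) \bigr\rVert_2^2}_{\text{data fidelity}}
  \;+\; \lambda\,
  \underbrace{\sum_{i} \Bigl\lVert \boldsymbol{\alpha}_i - \sum_{j \in \mathcal{N}(i)} w_{ij}\,\boldsymbol{\alpha}_j \Bigr\rVert_1}_{\text{weighted non-local self-similarity}}
```

Here y denotes the observed blurry face, k the blur kernel, D a patch dictionary, α_i the sparse code of patch i, N(i) its set of similar non-local patches, and w_ij the similarity weights between patches.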

6 citations


Posted Content
TL;DR: This paper proposes a new activation function, Multiple Parametric Exponential Linear Units (MPELU), aiming to generalize and unify the rectified and exponential linear units, and presents a deep MPELU residual architecture that achieves state-of-the-art performance on the CIFAR-10/100 datasets.
Abstract: Activation functions are crucial to the recent successes of deep neural networks. In this paper, we first propose a new activation function, Multiple Parametric Exponential Linear Units (MPELU), aiming to generalize and unify the rectified and exponential linear units. As the generalized form, MPELU shares the advantages of the Parametric Rectified Linear Unit (PReLU) and the Exponential Linear Unit (ELU), leading to better classification performance and convergence properties. In addition, weight initialization is very important for training very deep networks. The existing methods laid a solid foundation for networks using rectified linear units but not for exponential linear units. This paper complements the current theory and extends it to a wider range. Specifically, we put forward an initialization method that enables the training of very deep networks using exponential linear units. Experiments demonstrate that the proposed initialization not only helps the training process but also leads to better generalization performance. Finally, utilizing the proposed activation function and initialization, we present a deep MPELU residual architecture that achieves state-of-the-art performance on the CIFAR-10/100 datasets. The code is available at this https URL.
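A minimal sketch of the MPELU forward pass, assuming the commonly stated form with identity on the positive side and alpha*(exp(beta*x) - 1) on the negative side, where alpha and beta are learnable in the actual network (plain constants here); the initial values and the listed special cases are illustrative.

```python
# MPELU forward pass: identity for x > 0, alpha*(exp(beta*x) - 1) otherwise.
# In the network alpha and beta are trainable (per channel or shared); here they
# are ordinary floats for clarity.
import numpy as np

def mpelu(x, alpha=1.0, beta=1.0):
    """MPELU activation; np.minimum guards the exp against overflow on large positive x."""
    return np.where(x > 0, x, alpha * (np.exp(beta * np.minimum(x, 0.0)) - 1.0))

# Special cases recovered by fixing the parameters (approximately):
#   alpha = 0               -> ReLU
#   beta -> 0               -> PReLU-like, with negative slope about alpha*beta
#   alpha = 1, beta = 1     -> ELU
print(mpelu(np.array([-2.0, -0.5, 0.0, 1.5])))
```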

1 citation