
Showing papers by "Yue Ming published in 2015"


Journal ArticleDOI
TL;DR: A robust regional bounding spherical descriptor (RBSR) is proposed to facilitate 3D face recognition and emotion analysis, and the three largest available databases, FRGC v2, CASIA and BU-3DFE, are used for the performance comparison.

42 citations


Proceedings ArticleDOI
21 Jul 2015
TL;DR: This paper proposes an effective deep learning framework, named Stacked PCA Network (SPCANet), built by stacking the output features learned at each stage of a Convolutional Neural Network (CNN).
Abstract: High-level features can represent the semantics of the original data, offering a plausible way to avoid hand-crafted features for face recognition. This paper proposes an effective deep learning framework that stacks the output features learned at each stage of a Convolutional Neural Network (CNN). Unlike traditional deep learning networks, we use Principal Component Analysis (PCA) to obtain the filter kernels of the convolutional layers, so the model is named Stacked PCA Network (SPCANet). Our SPCANet model follows the basic architecture of the CNN, comprising three layers in each stage: a convolutional filter layer, a nonlinear processing layer and a feature pooling layer. First, in the convolutional filter layer, PCA instead of stochastic gradient descent (SGD) is employed to learn the filter kernels, and the output of all cascaded convolutional filter layers is used as the input of the nonlinear processing layer. Second, the nonlinear processing layer is also simplified: we use a hashing method for nonlinear processing. Third, block-based histograms instead of max-pooling are employed in the feature pooling layer. In the final output layer, the outputs of all stages are stacked together as the final feature output of the model. Extensive experiments conducted on many different face recognition scenarios demonstrate the effectiveness of the proposed approach.
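The PCA filter-learning step described above can be sketched in a few lines: collect zero-mean image patches, compute their covariance, and keep the leading eigenvectors as convolutional kernels. This is a minimal illustration under assumptions of my own (patch size, number of sampled patches, random toy data), not the paper's implementation.

```python
import numpy as np

def pca_filters(images, patch=7, n_filters=8, rng=None):
    """Learn convolutional filter kernels with PCA, PCANet-style:
    the leading eigenvectors of the patch covariance become kernels."""
    rng = np.random.default_rng(rng)
    patches = []
    for img in images:
        H, W = img.shape
        # sample a few patch locations per image (dense sampling also works)
        for _ in range(100):
            y = rng.integers(0, H - patch + 1)
            x = rng.integers(0, W - patch + 1)
            p = img[y:y + patch, x:x + patch].ravel()
            patches.append(p - p.mean())           # remove the patch mean
    X = np.stack(patches)                          # (n_patches, patch*patch)
    cov = X.T @ X / len(X)                         # patch covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    kernels = eigvecs[:, ::-1][:, :n_filters]      # largest eigenvalues first
    return kernels.T.reshape(n_filters, patch, patch)

# usage: learn 8 kernels from random stand-in "images"
imgs = [np.random.default_rng(i).random((32, 32)) for i in range(10)]
K = pca_filters(imgs, patch=7, n_filters=8, rng=0)
print(K.shape)  # (8, 7, 7)
```

Because the kernels are eigenvectors of a symmetric matrix, they come out orthonormal, which is one practical appeal of PCA over SGD-trained kernels.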

42 citations


Journal ArticleDOI
05 May 2015-PLOS ONE
TL;DR: The proposed 3D human behavior recognition method achieves rapid and efficient recognition of behavior videos and is robust to differences in environmental color, lighting and other factors.
Abstract: With the rapid development of 3D somatosensory technology, human behavior recognition has become an important research field, and human behavior feature analysis has evolved from traditional 2D features to 3D features. To improve the performance of human activity recognition, a recognition method is proposed based on hybrid texture-edge local pattern coding for feature extraction and the integration of RGB and depth video information. The paper focuses on background subtraction of RGB and depth video sequences, extracting and integrating historical images of the behavior outlines, feature extraction and classification. The method achieves rapid and efficient recognition of behavior videos. Extensive experiments show that the proposed method is faster and attains a higher recognition rate, and that it is robust to differences in environmental color, lighting and other factors. Moreover, the mixed texture-edge uniform local binary pattern feature can be used in most 3D behavior recognition tasks.
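The texture-coding idea behind the abstract's "uniform local binary pattern" can be illustrated with the basic 3x3 LBP: compare each pixel's eight neighbours to the centre to form an 8-bit code, and call a code "uniform" when its circular bit string has at most two 0/1 transitions. This is a generic sketch of standard LBP, not the paper's mixed texture-edge variant.

```python
import numpy as np

def lbp_8(img):
    """Basic 3x3 local binary pattern: one 8-bit code per interior pixel,
    built from comparing its 8 neighbours to the centre value."""
    c = img[1:-1, 1:-1]
    # neighbour offsets, clockwise from top-left
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((n >= c).astype(np.uint8) << bit)
    return code

def is_uniform(code):
    """A pattern is 'uniform' if its circular 8-bit string contains at
    most two 0->1 or 1->0 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2

img = np.arange(25, dtype=np.uint8).reshape(5, 5)
codes = lbp_8(img)
print(codes.shape)  # (3, 3)
```

Uniform patterns cover most texture micro-structures (edges, corners, flat spots), which is why LBP histograms are usually binned over uniform codes only.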

20 citations


Patent
23 Apr 2015
TL;DR: In this article, a three-dimensional facial recognition method and system is presented. The method includes: performing pose estimation on an input binocular vision image pair using a 3D facial reference model, to obtain a pose parameter and a virtual image pair of the 3D reference model with respect to the input image pair; reconstructing a facial depth image of the binocular vision image pair using the virtual image pair as prior information; detecting, according to the pose parameter, a local grid scale-invariant feature descriptor corresponding to each interest point in the facial depth image; and generating a recognition result from the detected descriptors and annotated training data.
Abstract: The present disclosure provides a three-dimensional facial recognition method and system. The method includes: performing pose estimation on an input binocular vision image pair by using a three-dimensional facial reference model, to obtain a pose parameter and a virtual image pair of the three-dimensional facial reference model with respect to the binocular vision image pair; reconstructing a facial depth image of the binocular vision image pair by using the virtual image pair as prior information; detecting, according to the pose parameter, a local grid scale-invariant feature descriptor corresponding to an interest point in the facial depth image; and generating a recognition result of the binocular vision image pair according to the detected local grid scale-invariant feature descriptor and training data having attached category annotations. The present disclosure can reduce computational costs and required storage space.

15 citations


Journal ArticleDOI
TL;DR: A complete framework for hand activity recognition that combines depth information is presented for fine-motion analysis; an improved graph cuts method is introduced for hand location and tracking over time, giving consistently better performance in real-world applications involving fine-motion analysis.

15 citations


Patent
14 Oct 2015
TL;DR: In this article, a three-dimensional face image feature extraction method and system is presented. The method comprises: performing face region segmentation to obtain a group of face regions; projecting each face region onto a corresponding regional bounding sphere; and obtaining an expression of each face region from its regional bounding sphere, recorded as a regional bounding spherical descriptor.
Abstract: The invention provides a three-dimensional face image feature extraction method and system. The method comprises the steps of: performing face region segmentation to obtain a group of face regions; projecting each face region onto a corresponding regional bounding sphere; obtaining an expression of each face region from its regional bounding sphere, and recording it as the regional bounding spherical descriptor of that face region; calculating the weight of the regional bounding spherical descriptor of each face region; and obtaining the features of the three-dimensional face image from the expressions of the face regions and their corresponding weights. With this method, the extracted features of the three-dimensional face image support face identification and sentiment analysis at the same time. In addition, the invention also provides a three-dimensional face image feature extraction system.
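The final step above, combining per-region descriptors with per-region weights into one feature, can be sketched as a weighted concatenation. The weighting scheme here (normalize, scale, concatenate) is an assumption for illustration; the patent abstract does not give the actual formula.

```python
import numpy as np

def combine_regions(descriptors, weights):
    """Weight each regional descriptor and concatenate into one feature
    vector. The normalize-then-scale rule is an illustrative assumption."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to 1
    return np.concatenate([wi * d for wi, d in zip(w, descriptors)])

# usage: two toy regional descriptors, the second weighted 3x the first
regions = [np.ones(3), 2 * np.ones(3)]
feat = combine_regions(regions, weights=[1.0, 3.0])
print(feat)  # [0.25 0.25 0.25 1.5 1.5 1.5]
```

Concatenation (rather than summing) preserves which region each component came from, which matters when different regions carry identity versus expression cues.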

8 citations


Patent
21 Apr 2015
TL;DR: In this article, a hand motion identification method is proposed. The method includes: obtaining a to-be-identified video; locating and tracking the hand area in the video; extracting a red-green-blue (RGB) video and a depth information video of the hand; detecting feature points in these videos; and representing each feature point with a 3D Mesh motion scale-invariant feature transform (MoSIFT) feature descriptor.
Abstract: A hand motion identification method includes: obtaining a to-be-identified video; locating and tracking the hand area in the to-be-identified video, and extracting a red-green-blue (RGB) video and a depth information video of the hand; detecting the RGB video and the depth information video of the hand to obtain feature points; representing each feature point by a 3D Mesh motion scale-invariant feature transform (MoSIFT) feature descriptor; and comparing the 3D Mesh MoSIFT feature descriptors of the feature points with the 3D Mesh MoSIFT feature descriptors of a positive sample obtained through prior training, to obtain the hand motion category in the to-be-identified video.
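The final comparison step, matching query descriptors against trained positive-sample descriptors to pick a motion category, can be realized as nearest-neighbour matching. This is a generic stand-in with toy data and labels of my own; the abstract does not specify the actual matching rule or distance.

```python
import numpy as np

def classify_by_nn(query_desc, trained_desc, labels):
    """Assign each query descriptor the label of its nearest trained
    descriptor under Euclidean distance (an assumed matching rule)."""
    # pairwise squared distances, shape (n_query, n_train)
    d = ((query_desc[:, None, :] - trained_desc[None, :, :]) ** 2).sum(-1)
    nearest = d.argmin(axis=1)
    return [labels[i] for i in nearest]

# usage: two trained descriptors with hypothetical motion labels
train = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = ["wave", "grasp"]
queries = np.array([[0.5, -0.2], [9.0, 11.0]])
print(classify_by_nn(queries, train, labels))  # ['wave', 'grasp']
```

In practice one would vote over all feature points in a video rather than classify each descriptor independently.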

6 citations


Book ChapterDOI
24 Aug 2015
TL;DR: This work takes advantage of the strength of Spectral Graph Theory in classification and proposes a novel deep learning framework for face analysis, the Spectral Regression Discriminant Analysis Network (SRDANet), which performs face recognition and expression recognition with 2D/3D facial images simultaneously with excellent results.
Abstract: In this work, we take advantage of the strength of Spectral Graph Theory in classification and propose a novel deep learning framework for face analysis called the Spectral Regression Discriminant Analysis Network (SRDANet). Our SRDANet model shares the basic architecture of a Convolutional Neural Network (CNN), comprising three components: a convolutional filter layer, a nonlinear processing layer and a feature pooling layer. It differs from traditional deep learning networks in that, in the convolutional layer, we extract the leading eigenvectors from patches of the facial image and use them as filter kernels, instead of randomly initializing kernels and updating them by stochastic gradient descent (SGD). The output of all cascaded convolutional filter layers is used as the input of the nonlinear processing layer, where we use a hashing method for nonlinear processing. In the feature pooling layer, block-based histograms are employed to pool the output features instead of max-pooling. Finally, the output of the feature pooling layer is taken as the final feature output of the model. Unlike previous single-task research on face analysis, the proposed approach performs face recognition and expression recognition with 2D/3D facial images simultaneously, with excellent results. Extensive experiments on many face analysis databases demonstrate the effectiveness of the SRDANet model: Extended Yale B, PIE and ORL are used for 2D face recognition, FRGC v2 for 3D face recognition and BU-3DFE for 3D expression recognition.
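The hashing and block-histogram stages that both SPCANet and SRDANet describe can be sketched together: binarize each filter response map, pack the bits into one integer code per pixel, then histogram the codes over non-overlapping blocks. Block size and the toy response maps below are illustrative assumptions.

```python
import numpy as np

def hash_and_pool(feature_maps, block=4):
    """Binary hashing + block-based histogram pooling.

    feature_maps: list of same-shape 2D filter responses.
    Returns the concatenated per-block histograms of the hash codes."""
    n_filters = len(feature_maps)
    # 1) hashing: threshold each map at 0, pack bits into a code image
    codes = np.zeros(feature_maps[0].shape, dtype=np.int64)
    for bit, fmap in enumerate(feature_maps):
        codes |= ((fmap > 0).astype(np.int64) << bit)
    # 2) histogram the 2**n_filters possible codes over each block
    H, W = codes.shape
    hists = []
    for y in range(0, H - block + 1, block):
        for x in range(0, W - block + 1, block):
            blk = codes[y:y + block, x:x + block]
            hists.append(np.bincount(blk.ravel(), minlength=2 ** n_filters))
    return np.concatenate(hists)

# usage: 3 toy 8x8 response maps -> 4 blocks x 8 bins = 32-D feature
maps = [np.random.default_rng(i).standard_normal((8, 8)) for i in range(3)]
feat = hash_and_pool(maps, block=4)
print(feat.shape)  # (32,)
```

The histogram step is what makes the feature translation-tolerant within each block, playing the role that max-pooling plays in a standard CNN.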

5 citations


Patent
01 Apr 2015
TL;DR: In this paper, a method for extracting a characteristic of a 3D face image is proposed, which involves performing face area division to obtain a group of face areas and projecting each face area onto a corresponding regional bounding sphere.
Abstract: A method for extracting a characteristic of a three-dimensional face image includes: performing face area division, to obtain a group of face areas; projecting each face area onto a corresponding regional bounding sphere; obtaining an indication of the corresponding face area according to the regional bounding sphere, and recording the indication as a regional bounding spherical descriptor of the face area; calculating a weight of the regional bounding spherical descriptor of the face area for each face area; and obtaining a characteristic of a three-dimensional face image according to the indication of the face area and the corresponding weight.

5 citations


Patent
28 Oct 2015
TL;DR: In this article, a three-dimensional face identification method and system based on attitude estimation of an input binocular visual image pair via a 3D facial reference model is presented.
Abstract: The invention provides a three-dimensional face identification method and system. The method comprises: performing attitude estimation on an input binocular visual image pair via a three-dimensional facial reference model, to obtain an attitude parameter and a virtual image pair of the binocular visual image pair relative to the three-dimensional facial reference model; rebuilding a facial depth image of the binocular visual image pair by taking the virtual image pair as prior information; detecting local-mesh scale-invariant feature descriptors corresponding to characteristic points of the facial depth image according to the attitude parameter; and generating an identification result of the binocular visual image pair according to the detected local-mesh scale-invariant feature descriptors and training data with category labels. The invention reduces computational expense and required storage space.

4 citations


Proceedings ArticleDOI
25 Jun 2015
TL;DR: The experimental results demonstrate that the proposed algorithm can effectively make up for the deficiencies of traditional activity recognition algorithms and provides excellent results on databases of various complexities.
Abstract: Nowadays, more and more activity recognition algorithms improve recognition performance by combining RGB and depth information. Although the space-time volumes (STV) algorithm and the space-time local features algorithm can combine RGB and depth information effectively, they also have their own defects: they incur expensive computational cost and are not suitable for modeling non-periodic activity. In this paper, we propose a novel algorithm for three-dimensional human activity recognition that combines spatial-domain local texture features and spatio-temporal local texture features. On the one hand, to extract spatial local texture features, we mix the RGB and depth image sequences after applying ViBe (Visual Background extractor) and a binarization operator, obtain the RGB-MOHBBI and depth-MOHBBI respectively, and perform an intersection operation on them; we then extract an LBP feature from the mixed MOHBBI to describe the spatial-domain feature. On the other hand, we follow the same background subtraction and binarization method to process the RGB and depth image sequences and obtain the spatio-temporal local texture features: we project the three-dimensional image volume onto the X-T and Y-T planes to get spatio-temporal behavior-volume change images, to which we apply the LBP operator to extract features representing human activity in the spatio-temporal domain. Finally, we combine the two local features extracted by the LBP algorithm into one integrated feature as the final output of our model. Extensive experiments are conducted on the BUPT Arm Activity Dataset and the BUPT Arm And Finger Activity Dataset. The results demonstrate that the proposed algorithm can effectively make up for the deficiencies of traditional activity recognition algorithms and provides excellent results on databases of various complexities.
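The X-T and Y-T projections described above can be sketched by summing a binary behaviour volume of shape (T, H, W) over one spatial axis at a time; the paper then applies an LBP operator to the resulting projection images. The toy volume below is an assumption for illustration.

```python
import numpy as np

def st_projections(volume):
    """Project a binary behaviour volume (T, H, W) onto the X-T and Y-T
    planes by summing out one spatial axis each."""
    xt = volume.sum(axis=1)   # collapse rows (Y)    -> shape (T, W)
    yt = volume.sum(axis=2)   # collapse columns (X) -> shape (T, H)
    return xt, yt

# usage: a 5-frame toy volume with a static 2x3 silhouette blob
vol = np.zeros((5, 4, 6), dtype=np.uint8)
vol[:, 1:3, 2:5] = 1
xt, yt = st_projections(vol)
print(xt.shape, yt.shape)  # (5, 6) (5, 4)
```

Because summation preserves the total silhouette mass per frame, motion shows up in these projections as a changing profile along the time axis.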

Patent
04 Nov 2015
TL;DR: In this paper, a hand motion identification method and apparatus is proposed that acquires feature points from hand RGB and depth-information video pairs and compares their three-dimensional grid motion SIFT feature descriptors with descriptors obtained in advance through training.
Abstract: The invention relates to a hand motion identification method and apparatus. The method comprises the following steps: acquiring a video to be identified; locating and tracking the hand area in the video and extracting hand RGB video and depth information video pairs; detecting the pairs to acquire feature points; expressing the feature points with three-dimensional grid motion SIFT (scale-invariant feature transformation) feature descriptors; and comparing these descriptors with the positive-class descriptors acquired in advance through training, to obtain the class of hand motion in the video. By extracting feature points that include depth information, the method and apparatus greatly improve hand identification accuracy; expressing the feature points accurately with three-dimensional grid motion SIFT feature descriptors improves the accuracy further.

Book ChapterDOI
01 Jan 2015
TL;DR: Experimental results showed that the proposed web content extraction method can effectively and accurately extract web content in different themes.
Abstract: Current web content extraction methods mostly focus on single-theme pages and adapt poorly to multi-theme pages. To overcome this issue, this paper proposes a web content extraction method based on punctuation distribution and HTML tag similarity. Based on the observation that most punctuation appears in the main text areas of web pages but rarely in noise areas, an algorithm for obtaining the minimum text area is presented. Furthermore, for multi-theme pages, the paper proposes an approach to extract the title and content of each theme by further dividing the minimum text area into sub-theme areas based on tag similarity. Experimental results show that the proposed method can effectively and accurately extract web content across different themes.
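The punctuation-distribution cue above can be illustrated with a toy filter: score each page segment by its punctuation density and keep the dense ones as candidate main text. The punctuation set, threshold and segments here are assumptions; the real method segments the page via the DOM and then refines by tag similarity.

```python
PUNCT = set("。，！？；：,.!?;:\"'")

def punctuation_density(text):
    """Fraction of characters that are punctuation; main-text areas tend
    to score high, navigation/noise areas low."""
    if not text:
        return 0.0
    return sum(ch in PUNCT for ch in text) / len(text)

def pick_content(segments, threshold=0.02):
    """Keep segments whose punctuation density exceeds a threshold --
    a toy stand-in for minimum-text-area selection; the threshold value
    is an assumed parameter."""
    return [s for s in segments if punctuation_density(s) > threshold]

# usage: a navigation bar versus a sentence of body text
segments = [
    "Home | About | Contact | Login",
    "The method works well in practice. It handles multi-theme pages, "
    "splitting each theme by tag similarity.",
]
print(pick_content(segments))
```

Note the PUNCT set includes full-width Chinese punctuation, since the cue was proposed for pages where main text is punctuated prose in either script.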

Book ChapterDOI
24 Aug 2015
TL;DR: Experimental results based on common international 3D face databases demonstrate the effectiveness, robustness and universality of the proposed algorithm.
Abstract: In this paper, a robust 3D local SIFT feature is proposed for 3D face recognition. To preprocess the original 3D face data, facial region segmentation is first performed by fusing curvature characteristics and a shape band mechanism. We then design a new local descriptor for the extracted regions, called the 3D local Scale-Invariant Feature Transform (3D LSIFT). Key point detection based on 3D LSIFT effectively reflects the geometric characteristics of the 3D facial surface by encoding the gray and depth information captured in the 3D face data, and the 3D LSIFT descriptor is extended to describe the discrimination between 3D faces. Experimental results on common international 3D face databases demonstrate the effectiveness, robustness and universality of the proposed algorithm.