Bio: Alifu Kuerban is an academic researcher from Xinjiang University. The author has contributed to research in topics: Facial recognition system & Computer science. The author has an h-index of 3 and has co-authored 5 publications receiving 175 citations.
TL;DR: A novel Point-to-Set Correlation Learning (PSCL) method is proposed and experimentally shown to be a promising baseline method for V2S/S2V face recognition on COX Face DB.
Abstract: Face recognition with still face images has been widely studied, while research on video-based face recognition remains relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), which respectively take a video or a still image as query or target. To the best of our knowledge, few datasets and evaluation protocols provide benchmarks for all three scenarios. To facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, to benchmark the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that our COX Face DB is a good benchmark database for evaluation. Note: COX Face DB was constructed by the Institute of Computing Technology, Chinese Academy of Sciences (CAS), under the sponsorship of OMRON Social Solutions Co. Ltd. (OSS), with the support of Xinjiang University.
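To make the S2V scenario concrete, the following is a minimal sketch of rank-1 identification in which each gallery subject is a single still-image feature vector and each probe is a set of video-frame feature vectors. It uses a plain minimum Euclidean point-to-set distance, which is a generic illustrative baseline and not the PSCL method; the function names and the toy 2-D features are assumptions made for the example.

```python
import numpy as np

def point_to_set_distance(point, frames):
    """Smallest Euclidean distance between one feature vector and a set of frames."""
    return float(np.min(np.linalg.norm(frames - point, axis=1)))

def rank1_s2v(gallery_stills, probe_videos):
    """Return, for each probe video, the index of the closest gallery still.

    gallery_stills: array of shape (n_subjects, d), one still feature per subject.
    probe_videos:   list of arrays, each of shape (n_frames, d).
    """
    predictions = []
    for frames in probe_videos:
        distances = [point_to_set_distance(still, frames) for still in gallery_stills]
        predictions.append(int(np.argmin(distances)))
    return predictions

# Toy data (hypothetical features): two enrolled subjects, two probe videos.
gallery = np.array([[0.0, 0.0], [10.0, 10.0]])
videos = [
    np.array([[9.0, 9.0], [11.0, 10.5]]),  # frames near subject 1
    np.array([[0.5, 0.0], [3.0, 3.0]]),    # frames near subject 0
]
predicted = rank1_s2v(gallery, videos)
```

Swapping the roles of the two arguments (a video as the gallery entry and a still as the probe) gives the corresponding V2S setting under the same distance.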
05 Nov 2012
TL;DR: Evaluation results not only show the grand challenges of the COX-S2V, but also validate the effectiveness of the proposed PaLo-LDA method over the competitive methods.
Abstract: In this paper, we explore the real-world Still-to-Video (S2V) face recognition scenario, where only very few (in many cases, a single) still images per person are enrolled into the gallery, while it is usually possible to capture one or multiple video clips as the probe. A typical application of S2V is mug-shot based watch list screening. Generally, in this scenario, the still images are collected in a controlled environment and are thus of high quality and resolution, in frontal view, with normal lighting and neutral expression. In contrast, the testing video frames are of low resolution and low quality, possibly blurred, and captured under poor lighting in non-frontal view. We reveal that S2V face recognition has been heavily overlooked in the past. Therefore, we provide a benchmark in terms of both a large-scale dataset and a new solution to the problem. Specifically, we collect (and release) a new dataset named COX-S2V, which contains 1,000 subjects, each with one high-quality photo and four video clips captured to simulate a video surveillance scenario. Together with the database, a clear evaluation protocol is designed for benchmarking. In addition, to address this problem, we propose a novel method named Partial and Local Linear Discriminant Analysis (PaLo-LDA). We then evaluate the method on COX-S2V and compare it with several classic methods, including LDA, LPP, and ScSR. Evaluation results not only show the grand challenges of COX-S2V, but also validate the effectiveness of the proposed PaLo-LDA method over the competing methods.
10 Jul 2020
TL;DR: This work presents an object detection method based on the single-shot detector (SSD), which focuses on accurate and real-time face mask detection in supermarkets, and proposes a Feature Enhancement Module (FEM) to strengthen the deep features learned from CNN models.
Abstract: Object detection, which aims to automatically mark the coordinates of objects of interest in pictures or videos, is an extension of image classification. In recent years, it has been widely used in intelligent traffic management, intelligent monitoring systems, military object detection, and surgical instrument positioning in medical navigation surgery. COVID-19, caused by a novel coronavirus that broke out at the end of 2019, poses a serious threat to public health. Many countries require everyone to wear a mask in public to prevent the spread of the coronavirus. To effectively prevent the spread of the coronavirus, we present an object detection method based on the single-shot detector (SSD), which focuses on accurate and real-time face mask detection in supermarkets. We make contributions in the following three aspects: 1) presenting a lightweight backbone network for feature extraction, which is based on SSD and spatially separable convolution, aiming to improve the detection speed and meet the requirements of real-time detection; 2) proposing a Feature Enhancement Module (FEM) to strengthen the deep features learned from CNN models, aiming to enhance the feature representation of small objects; 3) constructing COVID-19Mask, a large-scale dataset for detecting whether shoppers are wearing masks, by collecting images in two supermarkets. The experimental results illustrate the high detection precision and real-time performance of the proposed algorithm.
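The abstract does not spell out the backbone, but the general idea behind spatially separable convolution can be sketched as follows: a k x k kernel that factors into a column vector and a row vector can be applied as two cheaper 1-D passes. The kernel values, image size, and helper names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, written out for clarity."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def separable_conv2d(image, col, row):
    """Apply a k x 1 column filter, then a 1 x k row filter."""
    vertical_pass = conv2d_valid(image, col[:, None])
    return conv2d_valid(vertical_pass, row[None, :])

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
col = np.array([1.0, 2.0, 1.0])
row = np.array([-1.0, 0.0, 1.0])
full_kernel = np.outer(col, row)  # 3 x 3 kernel with 9 weights

full = conv2d_valid(image, full_kernel)      # one 3 x 3 pass
separable = separable_conv2d(image, col, row)  # two passes, 3 + 3 weights
```

Both paths produce the same 6 x 6 output, while the separable form stores 6 weights instead of 9 and needs fewer multiplications per output pixel, which is where the speed benefit for a lightweight backbone would come from.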
02 Oct 2009
TL;DR: This article describes the composition of the frame net according to the description content, describes and classifies the semantic roles of frame elements in the modern Uyghur FrameNet, and determines the semantic role labeling system, laying a good foundation for syntactic and semantic recognition and analysis in the Uyghur FrameNet.
Abstract: This article offers a preliminary discussion of, and a first attempt at, a frame-semantic description system and its content for Uyghur as a source language. It describes the composition of the frame net according to the description content, describes and classifies the semantic roles of frame elements in the modern Uyghur FrameNet, and determines the semantic role labeling system, laying a good foundation for syntactic and semantic recognition and analysis in the Uyghur FrameNet. It also explores a feasible, cognition-based method and approach for building the Uyghur FrameNet.
01 Mar 2021
TL;DR: Wang et al. developed a sign language translation animation library based on unified, standardized, common-grammar sign language, which helps deaf people watch videos with Chinese subtitles.
Abstract: Sign language is a common language that deaf and mute people use to communicate. To let deaf people watch video without barriers, the system combines video playback with sign language translation, solving the problem of how deaf people obtain video information. Subtitle processing and Chinese word segmentation are used to find the corresponding words in a sign language dictionary, and the transcoded result is sent to Unity through a socket mechanism, realizing sign language translation driven by video subtitles. When using the system, deaf people can watch video with Chinese subtitles and see clear, smooth, and natural sign language translation animation. The sign language translation animation library is based on unified, standardized, common-grammar sign language. Users can easily get the information in a video and gradually standardize their use of sign language. The movie sign language translation system is of special significance for improving the quality of life of hearing-impaired people and has practical value for social progress.
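The subtitle-to-sign pipeline described above (segment the subtitle, look each word up in a sign dictionary, forward the resulting codes) can be sketched roughly as below. The abstract does not name the segmentation algorithm, so this sketch swaps in simple forward maximum matching; the dictionary entries, the `SIGN_…` codes, and the JSON message layout are all hypothetical, and the actual socket transfer to Unity is left out.

```python
import json

# Hypothetical sign dictionary: Chinese word -> animation code.
SIGN_DICTIONARY = {
    "我": "SIGN_0001",
    "喜欢": "SIGN_0002",
    "看": "SIGN_0003",
    "电影": "SIGN_0004",
}

def segment(text, vocabulary, max_word_len):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words

def subtitle_to_sign_codes(subtitle, dictionary):
    """Segment a subtitle line and keep the codes of words found in the dictionary."""
    max_len = max(len(word) for word in dictionary)
    words = segment(subtitle, dictionary, max_len)
    return [dictionary[word] for word in words if word in dictionary]

def encode_message(codes):
    """Serialize the codes as UTF-8 JSON bytes, ready to be written to a socket."""
    return json.dumps({"signs": codes}).encode("utf-8")

codes = subtitle_to_sign_codes("我喜欢看电影", SIGN_DICTIONARY)
message = encode_message(codes)
```

In this sketch, words missing from the dictionary are silently skipped; a real system would need some policy for them, such as fingerspelling or a placeholder animation.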
01 Jan 1993
TL;DR: A Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components, achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces.
Abstract: Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.
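For orientation, the standard triplet loss that such improvements build on can be sketched as follows. This is the conventional formulation with squared Euclidean distances and a margin, not the improved variant proposed in the paper, whose exact form the abstract does not give; the margin value and the toy embeddings are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss over a batch of embeddings of shape (n, d).

    Pulls each anchor toward its positive (same identity) and pushes it away
    from its negative (different identity) until the gap exceeds the margin.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy embeddings (hypothetical): the positive is close, the negative is far.
anchor = np.array([[0.0, 0.0]])
near = np.array([[1.0, 0.0]])
far = np.array([[3.0, 0.0]])

easy_loss = triplet_loss(anchor, near, far)  # constraint already satisfied
hard_loss = triplet_loss(anchor, far, near)  # constraint violated
```

A triplet that already satisfies the margin contributes zero loss, so training signal comes only from violating triplets, which is why triplet selection matters in practice.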
TL;DR: Major deep learning concepts pertinent to face image analysis and face recognition are reviewed, and a concise overview of studies on specific face recognition problems is provided, such as handling variations in pose, age, illumination, expression, and heterogeneous face matching.
06 Jul 2015
TL;DR: This paper proposes a novel metric learning approach that works directly on logarithms of SPD matrices by learning a tangent map that directly transforms the matrix logarithms from the original tangent space to a new, more discriminative tangent space.
Abstract: The manifold of Symmetric Positive Definite (SPD) matrices has been successfully used for data representation in image set classification. By endowing the SPD manifold with Log-Euclidean Metric, existing methods typically work on vector-forms of SPD matrix logarithms. This however not only inevitably distorts the geometrical structure of the space of SPD matrix logarithms but also brings low efficiency especially when the dimensionality of SPD matrix is high. To overcome this limitation, we propose a novel metric learning approach to work directly on logarithms of SPD matrices. Specifically, our method aims to learn a tangent map that can directly transform the matrix logarithms from the original tangent space to a new tangent space of more discriminability. Under the tangent map framework, the novel metric learning can then be formulated as an optimization problem of seeking a Mahalanobis-like matrix, which can take the advantage of traditional metric learning techniques. Extensive evaluations on several image set classification tasks demonstrate the effectiveness of our proposed metric learning method.
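As a rough illustration of the quantities involved, the sketch below computes the Log-Euclidean distance between two SPD matrices and a distance after a linear tangent map of the assumed form W^T log(S) W. With Q = W W^T, the squared mapped distance equals tr(Q D Q D) for D = log(S1) - log(S2), which is one way a Mahalanobis-like matrix Q can appear. The specific form of the map, the random W, and the helper names are assumptions for the example; no learning is performed.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(S)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_distance(S1, S2):
    """Frobenius norm of the difference of the two matrix logarithms."""
    return float(np.linalg.norm(spd_log(S1) - spd_log(S2), ord="fro"))

def mapped_sq_distance(S1, S2, W):
    """Squared distance after the assumed tangent map log(S) -> W^T log(S) W."""
    diff = spd_log(S1) - spd_log(S2)
    return float(np.linalg.norm(W.T @ diff @ W, ord="fro") ** 2)

def random_spd(n, rng):
    """Random well-conditioned SPD matrix (illustrative data only)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(0)
S1, S2 = random_spd(3, rng), random_spd(3, rng)
W = rng.standard_normal((3, 2))  # maps 3 x 3 logarithms to 2 x 2
Q = W @ W.T                      # Mahalanobis-like matrix

diff = spd_log(S1) - spd_log(S2)
trace_form = float(np.trace(Q @ diff @ Q @ diff))
mapped = mapped_sq_distance(S1, S2, W)
```

Because the mapped distance depends on W only through Q, the search can be posed over a positive semidefinite Q, which is what makes traditional Mahalanobis metric learning techniques applicable.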
TL;DR: In this paper, a Riemannian network architecture is proposed for symmetric positive definite (SPD) matrix learning, where bilinear mapping layers are used to transform the input SPD matrices to more desirable SPD matrices, eigenvalue rectification layers are exploited to apply a non-linear activation function to the new SPD matrices, and an eigenvalue logarithm layer is designed to perform Riemannian computing on the resulting SPD matrices for regular output layers.
Abstract: Symmetric Positive Definite (SPD) matrix learning methods have become popular in many image and video processing tasks, thanks to their ability to learn appropriate statistical representations while respecting Riemannian geometry of underlying SPD manifolds. In this paper we build a Riemannian network architecture to open up a new direction of SPD matrix non-linear learning in a deep model. In particular, we devise bilinear mapping layers to transform input SPD matrices to more desirable SPD matrices, exploit eigenvalue rectification layers to apply a non-linear activation function to the new SPD matrices, and design an eigenvalue logarithm layer to perform Riemannian computing on the resulting SPD matrices for regular output layers. For training the proposed deep network, we exploit a new backpropagation with a variant of stochastic gradient descent on Stiefel manifolds to update the structured connection weights and the involved SPD matrix data. We show through experiments that the proposed SPD matrix network can be simply trained and outperform existing SPD matrix learning and state-of-the-art methods in three typical visual classification tasks.
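The three layer types named above can be sketched as a forward pass in a few lines. This is a minimal illustration under assumptions: the weight is taken to be a fixed semi-orthogonal matrix obtained from a QR factorization, the rectification threshold `eps` is an arbitrary choice, and training on Stiefel manifolds is not shown.

```python
import numpy as np

def bimap(X, W):
    """Bilinear mapping layer: W X W^T, with W of shape (d_out, d_in)."""
    return W @ X @ W.T

def reeig(X, eps=1e-4):
    """Eigenvalue rectification: clamp eigenvalues from below at eps."""
    eigvals, eigvecs = np.linalg.eigh(X)
    return (eigvecs * np.maximum(eigvals, eps)) @ eigvecs.T

def logeig(X):
    """Eigenvalue logarithm: maps an SPD matrix to a flat (Euclidean) space."""
    eigvals, eigvecs = np.linalg.eigh(X)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

rng = np.random.default_rng(0)
d_in, d_out = 5, 3

# Illustrative SPD input and a semi-orthogonal weight (orthonormal rows).
A = rng.standard_normal((d_in, d_in))
X = A @ A.T + d_in * np.eye(d_in)
W = np.linalg.qr(rng.standard_normal((d_in, d_out)))[0].T

hidden = reeig(bimap(X, W))
features = logeig(hidden)                         # symmetric d_out x d_out matrix
vector = features[np.triu_indices(d_out)]         # flattened for a regular output layer
```

The rectification step plays a role loosely analogous to ReLU: it is the non-linearity, and it also keeps the matrix safely positive definite before the logarithm is taken.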