scispace - formally typeset
Search or ask a question
Author

Ming Shao

Bio: Ming Shao is an academic researcher from University of Massachusetts Dartmouth. The author has contributed to research in topics: Facial recognition system & Kinship. The author has an hindex of 31, co-authored 121 publications receiving 3298 citations. Previous affiliations of Ming Shao include Northeastern University & University of Massachusetts Amherst.


Papers
More filters
Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents a multi-stream bi-directional recurrent neural network for fine-grained action detection that significantly outperforms state-of-the-art action detection methods on both datasets.
Abstract: We present a multi-stream bi-directional recurrent neural network for fine-grained action detection. Recently, twostream convolutional neural networks (CNNs) trained on stacked optical flow and image frames have been successful for action recognition in videos. Our system uses a tracking algorithm to locate a bounding box around the person, which provides a frame of reference for appearance and motion and also suppresses background noise that is not within the bounding box. We train two additional streams on motion and appearance cropped to the tracked bounding box, along with full-frame streams. Our motion streams use pixel trajectories of a frame as raw features, in which the displacement values corresponding to a moving scene point are at the same spatial position across several frames. To model long-term temporal dynamics within and between actions, the multi-stream CNN is followed by a bi-directional Long Short-Term Memory (LSTM) layer. We show that our bi-directional LSTM network utilizes about 8 seconds of the video sequence to predict an action label. We test on two action detection datasets: the MPII Cooking 2 Dataset, and a new MERL Shopping Dataset that we introduce and make available to the community with this paper. The results demonstrate that our method significantly outperforms state-of-the-art action detection methods on both datasets.

468 citations

Journal ArticleDOI
TL;DR: Extensive experiments on synthetic data, and important computer vision problems such as face recognition application and visual domain adaptation for object recognition demonstrate the superiority of the proposed approach over the existing, well-established methods.
Abstract: It is expensive to obtain labeled real-world visual data for use in training of supervised algorithms. Therefore, it is valuable to leverage existing databases of labeled data. However, the data in the source databases is often obtained under conditions that differ from those in the new task. Transfer learning provides techniques for transferring learned knowledge from a source domain to a target domain by finding a mapping between them. In this paper, we discuss a method for projecting both source and target data to a generalized subspace where each target sample can be represented by some combination of source samples. By employing a low-rank constraint during this transfer, the structure of source and target domains are preserved. This approach has three benefits. First, good alignment between the domains is ensured through the use of only relevant data in some subspace of the source domain in reconstructing the data in the target domain. Second, the discriminative power of the source domain is naturally passed on to the target domain. Third, noisy information will be filtered out during knowledge transfer. Extensive experiments on synthetic data, and important computer vision problems such as face recognition application and visual domain adaptation for object recognition demonstrate the superiority of the proposed approach over the existing, well-established methods.

293 citations

Journal ArticleDOI
TL;DR: Experimental results have shown that the proposed algorithms can effectively annotate the kin relationships among people in an image and semantic context can further improve the accuracy.
Abstract: There is an urgent need to organize and manage images of people automatically due to the recent explosion of such data on the Web in general and in social media in particular. Beyond face detection and face recognition, which have been extensively studied over the past decade, perhaps the most interesting aspect related to human-centered images is the relationship of people in the image. In this work, we focus on a novel solution to the latter problem, in particular the kin relationships. To this end, we constructed two databases: the first one named UB KinFace Ver2.0, which consists of images of children, their young parents and old parents, and the second one named FamilyFace. Next, we develop a transfer subspace learning based algorithm in order to reduce the significant differences in the appearance distributions between children and old parents facial images. Moreover, by exploring the semantic relevance of the associated metadata, we propose an algorithm to predict the most likely kin relationships embedded in an image. In addition, human subjects are used in a baseline study on both databases. Experimental results have shown that the proposed algorithms can effectively annotate the kin relationships among people in an image and semantic context can further improve the accuracy.

239 citations

Proceedings ArticleDOI
16 Jul 2011
TL;DR: Experimental results show that the hypothesis on the role of young parents is valid and transfer learning is effective to enhance the verification accuracy and the large gap between distributions can be significantly reduced and kinship verification problem becomesmore discriminative.
Abstract: Because of the inevitable impact factors such as pose, expression, lighting and aging on faces, identity verification through faces is still an unsolved problem. Research on biometrics raises an even challenging problem--is it possible to determine the kinship merely based on face images? A critical observation that faces of parents captured while they were young are more alike their children's compared with images captured when they are old has been revealed by genetics studies. This enlightens us the following research. First, a new kinship database named UB KinFace composed of child, young parent and old parent face images is collected from Internet. Second, an extended transfer subspace learning method is proposed aiming at mitigating the enormous divergence of distributions between children and old parents. The key idea is to utilize an intermediate distribution close to both the source and target distributions to bridge them and reduce the divergence. Naturally the young parent set is suitable for this task. Through this learning process, the large gap between distributions can be significantly reduced and kinship verification problem becomesmore discriminative. Experimental results show that our hypothesis on the role of young parents is valid and transfer learning is effective to enhance the verification accuracy.

154 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work forms a novel framework to jointly seek a low-rank embedding and semantic dictionary to link visual features with their semantic representations, which manages to capture shared features across different observed classes.
Abstract: Zero-shot learning for visual recognition has received much interest in the most recent years. However, the semantic gap across visual features and their underlying semantics is still the biggest obstacle in zero-shot learning. To fight off this hurdle, we propose an effective Low-rank Embedded Semantic Dictionary learning (LESD) through ensemble strategy. Specifically, we formulate a novel framework to jointly seek a low-rank embedding and semantic dictionary to link visual features with their semantic representations, which manages to capture shared features across different observed classes. Moreover, ensemble strategy is adopted to learn multiple semantic dictionaries to constitute the latent basis for the unseen classes. Consequently, our model could extract a variety of visual characteristics within objects, which can be well generalized to unknown categories. Extensive experiments on several zero-shot benchmarks verify that the proposed model can outperform the state-of-the-art approaches.

129 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This survey revisits feature selection research from a data perspective and reviews representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data, and categorizes them into four main groups: similarity- based, information-theoretical-based, sparse-learning-based and statistical-based.
Abstract: Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data-mining and machine-learning problems. The objectives of feature selection include building simpler and more comprehensible models, improving data-mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented some substantial challenges and opportunities to feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data. Methodologically, to emphasize the differences and similarities of most existing feature selection algorithms for conventional data, we categorize them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based, and statistical-based methods. To facilitate and promote the research in this community, we also present an open source feature selection repository that consists of most of the popular feature selection algorithms (http://featureselection.asu.edu/). Also, we use it as an example to show how to evaluate feature selection algorithms. At the end of the survey, we present a discussion about some open problems and challenges that require more attention in future research.

1,566 citations

Book ChapterDOI
08 Oct 2016
TL;DR: This paper introduces new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell, and proposes a more powerful tree-structure based traversal method.
Abstract: 3D action recognition – analysis of human actions based on 3D skeleton data – becomes popular recently due to its succinctness, robustness, and view-invariant representation. Recent attempts on this problem suggested to develop RNN-based learning methods to model the contextual dependency in the temporal domain. In this paper, we extend this idea to spatio-temporal domains to analyze the hidden sources of action-related information within the input data over both domains concurrently. Inspired by the graphical structure of the human skeleton, we further propose a more powerful tree-structure based traversal method. To handle the noise and occlusion in 3D skeleton data, we introduce new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell. Our method achieves state-of-the-art performance on 4 challenging benchmark datasets for 3D human action analysis.

1,230 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: Quantitative and qualitative evaluation on both controlled and in-the-wild databases demonstrate the superiority of DR-GAN over the state of the art.
Abstract: The large pose discrepancy between two face images is one of the key challenges in face recognition. Conventional approaches for pose-invariant face recognition either perform face frontalization on, or learn a pose-invariant representation from, a non-frontal face image. We argue that it is more desirable to perform both tasks jointly to allow them to leverage each other. To this end, this paper proposes Disentangled Representation learning-Generative Adversarial Network (DR-GAN) with three distinct novelties. First, the encoder-decoder structure of the generator allows DR-GAN to learn a generative and discriminative representation, in addition to image synthesis. Second, this representation is explicitly disentangled from other face variations such as pose, through the pose code provided to the decoder and pose estimation in the discriminator. Third, DR-GAN can take one or multiple images as the input, and generate one unified representation along with an arbitrary number of synthetic images. Quantitative and qualitative evaluation on both controlled and in-the-wild databases demonstrate the superiority of DR-GAN over the state of the art.

1,016 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide a thorough overview on using a class of advanced machine learning techniques, namely deep learning (DL), to facilitate the analytics and learning in the IoT domain.
Abstract: In the era of the Internet of Things (IoT), an enormous amount of sensing devices collect and/or generate various sensory data over time for a wide range of fields and applications. Based on the nature of the application, these devices will result in big or fast/real-time data streams. Applying analytics over such data streams to discover new information, predict future insights, and make control decisions is a crucial process that makes IoT a worthy paradigm for businesses and a quality-of-life improving technology. In this paper, we provide a thorough overview on using a class of advanced machine learning techniques, namely deep learning (DL), to facilitate the analytics and learning in the IoT domain. We start by articulating IoT data characteristics and identifying two major treatments for IoT data from a machine learning perspective, namely IoT big data analytics and IoT streaming data analytics. We also discuss why DL is a promising approach to achieve the desired analytics in these types of data and applications. The potential of using emerging DL techniques for IoT data analytics are then discussed, and its promises and challenges are introduced. We present a comprehensive background on different DL architectures and algorithms. We also analyze and summarize major reported research attempts that leveraged DL in the IoT domain. The smart IoT devices that have incorporated DL in their intelligence background are also discussed. DL implementation approaches on the fog and cloud centers in support of IoT applications are also surveyed. Finally, we shed light on some challenges and potential directions for future research. At the end of each section, we highlight the lessons learned based on our experiments and review of the recent literature.

903 citations