SciSpace (formerly Typeset)
Author

Xiaoqing Feng

Bio: Xiaoqing Feng is an academic researcher from Zhejiang University of Finance and Economics. The author has contributed to research in the topics of Autoencoder and Feature learning, has an h-index of 1, and has co-authored 1 publication receiving 66 citations.

Papers
Journal Article
TL;DR: A two-stage deep-network framework for multimodal video classification, built on stacked contractive autoencoders, achieves better performance than state-of-the-art methods.
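The building block here is the contractive autoencoder, which adds the squared Frobenius norm of the encoder's Jacobian to the reconstruction loss so the learned features are insensitive to small input perturbations. Below is a minimal sketch of a single layer with a sigmoid encoder, for which the penalty has a closed form; the paper's two-stage multimodal framework is not reproduced, and all dimensions and the weight lam are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContractiveAE(nn.Module):
    """One contractive autoencoder layer: sigmoid encoder, linear decoder."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return self.dec(h), h

    def loss(self, x, lam=1e-4):
        x_hat, h = self(x)
        recon = ((x_hat - x) ** 2).sum(dim=1).mean()
        # For a sigmoid encoder h = sigmoid(Wx + b), the Jacobian entry is
        # J_ji = h_j * (1 - h_j) * W_ji, so ||J||_F^2 has this closed form:
        dh = (h * (1 - h)) ** 2                     # (batch, hid)
        w_sq = (self.enc.weight ** 2).sum(dim=1)    # (hid,)
        contractive = (dh * w_sq).sum(dim=1).mean()
        return recon + lam * contractive
```

Stacking follows the usual greedy recipe: train one layer, then feed its hidden codes as the input to the next layer.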

86 citations


Cited by
Journal Article
TL;DR: This work first classifies deep multimodal learning architectures and then discusses methods to fuse learned multimodal representations in deep-learning architectures.
Abstract: The success of deep learning has been a catalyst to solving increasingly complex machine-learning problems, which often involve multiple data modalities. We review recent advances in deep multimodal learning and highlight the state of the art, as well as gaps and challenges in this active research field. We first classify deep multimodal learning architectures and then discuss methods to fuse learned multimodal representations in deep-learning architectures. We highlight two areas of research, regularization strategies and methods that learn or optimize multimodal fusion structures, as exciting areas for future work.
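As a concrete illustration of the simplest fusion scheme such surveys cover, the sketch below joins two modality-specific encoders by concatenating their learned representations before a shared classifier head. The modality names, feature dimensions, and layer sizes are illustrative assumptions, not a reference architecture from the survey.

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Joint representation: concatenate two modality-specific encodings."""
    def __init__(self, dim_audio=128, dim_visual=512, hid=128, n_classes=10):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_audio, hid), nn.ReLU())
        self.enc_v = nn.Sequential(nn.Linear(dim_visual, hid), nn.ReLU())
        self.head = nn.Linear(2 * hid, n_classes)

    def forward(self, x_a, x_v):
        # Fuse the learned unimodal representations, then classify jointly.
        z = torch.cat([self.enc_a(x_a), self.enc_v(x_v)], dim=1)
        return self.head(z)
```

More elaborate fusion structures, including learned or optimized ones, replace the fixed concatenation step above; that is one of the future-work directions the survey highlights.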

529 citations

Journal Article
TL;DR: The superiority of the MESAE over state-of-the-art shallow and deep emotion classifiers is demonstrated under different sizes of available physiological data.

306 citations

Journal Article
TL;DR: This survey highlights the key issues of newly developed technologies, such as encoder-decoder models, generative adversarial networks, and attention mechanisms, from a multimodal representation learning perspective; to the best of the authors' knowledge, these have never been reviewed previously.
Abstract: Multimodal representation learning, which aims to narrow the heterogeneity gap among different modalities, plays an indispensable role in the utilization of ubiquitous multimodal data. Due to its powerful representation ability with multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years. In this paper, we provide a comprehensive survey of deep multimodal representation learning, which has never before been the sole focus of a review. To facilitate the discussion of how the heterogeneity gap is narrowed, we categorize deep multimodal representation learning methods into three frameworks according to the underlying structures in which different modalities are integrated: joint representation, coordinated representation, and encoder-decoder. Additionally, we review some typical models in this area, ranging from conventional models to newly developed technologies. This paper highlights the key issues of newly developed technologies, such as encoder-decoder models, generative adversarial networks, and attention mechanisms, from a multimodal representation learning perspective, which, to the best of our knowledge, have never been reviewed previously, even though they have become the major focus of much contemporary research. For each framework or model, we discuss its basic structure, learning objective, application scenarios, key issues, advantages, and disadvantages, so that both new and experienced researchers can benefit from this survey. Finally, we suggest some important directions for future work.
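For the coordinated-representation framework named in the abstract, a common learning objective is a ranking loss that pulls paired embeddings from two modalities together while pushing mismatched pairs apart. The sketch below shows one such hinge-based cosine-similarity loss; it is a generic illustration of the framework under assumed inputs (precomputed image and text embeddings), not a specific model from the survey.

```python
import torch
import torch.nn.functional as F

def coordinated_ranking_loss(z_img, z_txt, margin=0.2):
    """Hinge ranking loss over a batch of paired image/text embeddings.

    Matched pairs sit on the diagonal of the similarity matrix; any
    mismatched pair scoring within `margin` of its match incurs a cost.
    """
    z_img = F.normalize(z_img, dim=1)
    z_txt = F.normalize(z_txt, dim=1)
    sim = z_img @ z_txt.t()                   # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)             # matched-pair similarity per row
    cost = (margin + sim - pos).clamp(min=0)  # margin violations only
    cost.fill_diagonal_(0)                    # don't penalize the match itself
    return cost.mean()
```

Joint-representation methods instead fuse modalities into one shared vector (as in the concatenation sketch earlier), while encoder-decoder methods translate one modality into another.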

243 citations

Journal Article
TL;DR: Autoencoders (AEs) have emerged as an alternative to manifold learning for nonlinear feature fusion, and can be used to generate reduced feature sets through the fusion of the original ones.
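The fusion idea is an undercomplete autoencoder: the encoder compresses the original feature set into a smaller nonlinear code that serves as the fused, reduced feature set, and reconstruction error drives the training. A minimal sketch, with illustrative dimensions:

```python
import torch
import torch.nn as nn

class FusionAE(nn.Module):
    """Undercomplete AE: fuses the original features into a reduced code."""
    def __init__(self, in_dim=100, code_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        code = self.encoder(x)          # reduced, nonlinearly fused features
        return self.decoder(code), code
```

After training on reconstruction loss, the `code` output replaces the original features in downstream models, analogously to how a manifold-learning projection would be used.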

209 citations

Journal Article
TL;DR: Results fully demonstrate that the stacked SAE-based diagnosis method extracts more discriminative high-level features and performs better in rotating machinery fault diagnosis than traditional machine learning methods with shallow architectures.
Abstract: As a breakthrough in the field of machine fault diagnosis, deep learning has great potential to extract more abstract and discriminative features automatically, without much prior knowledge, compared with other methods such as signal processing and analysis-based methods and machine learning methods with shallow architectures. One of the most important measures of the extracted features is whether they capture more information from the inputs while avoiding redundancy, so as to be representative. Thus, a stacked sparse autoencoder (SAE)-based machine fault diagnosis method is proposed in this paper. The penalty term of the SAE helps mine essential information and avoid redundancy. To help the constructed diagnosis network mine more abstract and representative high-level features, the collected non-stationary and transient signals are preprocessed with ensemble empirical mode decomposition and autoregressive (AR) models to obtain AR parameters, which are extracted from the intrinsic mode functions (IMFs) and regarded as the low-level features serving as inputs to the proposed diagnosis network. Only the first four IMFs are considered, because fault information is mainly reflected in the high-frequency IMFs. Experiments and comparisons are conducted to validate the superiority of the presented diagnosis network. Results fully demonstrate that the stacked SAE-based diagnosis method extracts more discriminative high-level features and performs better in rotating machinery fault diagnosis than traditional machine learning methods with shallow architectures.
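The sparsity penalty in a sparse autoencoder is typically a KL-divergence term that pushes the average activation of each hidden unit toward a small target value rho. The sketch below shows a single SAE layer with that penalty; it assumes the EEMD/AR preprocessing (AR parameters of the first four IMFs forming the input vector x) has been done upstream, and the hyperparameters rho and beta are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

def kl_sparsity(h, rho=0.05, eps=1e-8):
    """KL(rho || rho_hat) summed over hidden units, for sigmoid activations."""
    rho_hat = h.mean(dim=0).clamp(eps, 1 - eps)  # mean activation per unit
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

class SparseAE(nn.Module):
    """One sparse autoencoder layer; stack greedily for the deep network."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.dec = nn.Linear(hid_dim, in_dim)

    def loss(self, x, beta=3.0):
        h = torch.sigmoid(self.enc(x))
        x_hat = self.dec(h)
        recon = ((x_hat - x) ** 2).sum(dim=1).mean()
        return recon + beta * kl_sparsity(h)
```

Driving rho_hat toward a small rho keeps most hidden units inactive for any given input, which is what lets the penalty suppress redundant features while retaining the essential information.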

201 citations