Author

Tong Zhang

Bio: Tong Zhang is an academic researcher from South China University of Technology. The author has contributed to research on topics including convolutional neural networks and graphs (abstract data type). The author has an h-index of 28, has co-authored 118 publications, and has received 2618 citations. Previous affiliations of Tong Zhang include the University of Macau and the Hong Kong University of Science and Technology.


Papers
Journal Article
TL;DR: The proposed two-layer RNN model provides an effective way to exploit both the spatial and temporal dependencies of the input signals for emotion recognition, and experimental results demonstrate that the proposed STRNN method compares favorably with state-of-the-art methods.
Abstract: In this paper, we propose a novel deep learning framework, called spatial–temporal recurrent neural network (STRNN), to integrate feature learning from both the spatial and temporal information of signal sources into a unified spatial–temporal dependency model. In STRNN, to capture the spatially co-occurring variations of human emotions, a multidirectional recurrent neural network (RNN) layer is employed to capture long-range contextual cues by traversing the spatial regions of each temporal slice along different directions. A bidirectional temporal RNN layer is then used to learn discriminative features characterizing the temporal dependencies of the sequences produced by the spatial RNN layer. To further select the salient regions that are most discriminative for emotion recognition, we impose a sparse projection on the hidden states of the spatial and temporal domains to improve the model's discriminative ability. Consequently, the proposed two-layer RNN model provides an effective way to exploit both the spatial and temporal dependencies of the input signals for emotion recognition. Experimental results on public electroencephalogram and facial expression emotion datasets demonstrate that the proposed STRNN method compares favorably with state-of-the-art methods.
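
To make the two-layer structure concrete, here is a minimal sketch in the spirit of the description above: a spatial RNN scans the regions of each temporal slice, and a bidirectional temporal RNN scans the resulting slice-level features. This is an illustrative simplification, not the paper's implementation: the multidirectional spatial traversal is reduced to a single bidirectional scan over a fixed region ordering, the sparse projection is omitted, and all layer sizes are arbitrary.

```python
# Minimal sketch of a two-layer spatial-temporal RNN (simplifying assumptions
# noted above). Input: (batch, time, regions, feat_dim) region features.
import torch
import torch.nn as nn

class STRNNSketch(nn.Module):
    def __init__(self, feat_dim=32, hidden=64, n_classes=6):
        super().__init__()
        # Spatial RNN: scans the regions within one temporal slice.
        self.spatial_rnn = nn.GRU(feat_dim, hidden, batch_first=True,
                                  bidirectional=True)
        # Temporal RNN: scans the sequence of slice-level features.
        self.temporal_rnn = nn.GRU(2 * hidden, hidden, batch_first=True,
                                   bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        b, t, r, d = x.shape
        slices = x.reshape(b * t, r, d)           # one sequence per slice
        _, h = self.spatial_rnn(slices)           # h: (2, b*t, hidden)
        slice_feats = h.permute(1, 0, 2).reshape(b, t, -1)
        out, _ = self.temporal_rnn(slice_feats)   # (b, t, 2*hidden)
        return self.classifier(out.mean(dim=1))   # pool over time

# 4 clips, 10 temporal slices, 16 spatial regions, 32-dim region features
logits = STRNNSketch()(torch.randn(4, 10, 16, 32))
```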

335 citations

Journal Article
Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, Jingwei Yan, Keyu Yan
TL;DR: A novel deep neural network (DNN)-driven feature learning method is proposed and applied to multi-view facial expression recognition (FER), and the experimental results show that the algorithm outperforms state-of-the-art methods.
Abstract: In this paper, a novel deep neural network (DNN)-driven feature learning method is proposed and applied to multi-view facial expression recognition (FER). In this method, scale-invariant feature transform (SIFT) features corresponding to a set of landmark points are first extracted from each facial image. Then, a feature matrix consisting of the extracted SIFT feature vectors is used as input data and fed to a well-designed DNN model that learns optimal discriminative features for expression classification. The proposed DNN model employs several layers to characterize the relationship between the SIFT feature vectors and their corresponding high-level semantic information. By training the DNN model, we are able to learn a set of optimal features that are well suited to classifying facial expressions across different facial views. To evaluate the effectiveness of the proposed method, two non-frontal facial expression databases, namely BU-3DFE and Multi-PIE, are used to test our method, and the experimental results show that our algorithm outperforms the state-of-the-art methods.
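
The pipeline can be illustrated with a short sketch: SIFT descriptors at facial landmarks form a feature matrix that a small feed-forward network maps to expression classes. The landmark count, layer sizes, and class count below are assumptions for illustration, not the paper's exact architecture.

```python
# Illustrative SIFT-matrix-to-DNN pipeline (sizes are assumed, not the
# paper's). A real run would replace the random tensor with descriptors
# extracted at detected landmarks.
import torch
import torch.nn as nn

N_LANDMARKS, SIFT_DIM, N_EXPRESSIONS = 49, 128, 6  # assumed values

model = nn.Sequential(
    nn.Flatten(),                                   # (B, 49, 128) -> (B, 6272)
    nn.Linear(N_LANDMARKS * SIFT_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, N_EXPRESSIONS),                  # expression logits
)

sift_matrix = torch.randn(8, N_LANDMARKS, SIFT_DIM)  # stand-in for real SIFT features
logits = model(sift_matrix)
```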

262 citations

Book Chapter
Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang
08 Sep 2018
TL;DR: This paper proposes a novel recurrent fusion network (RFNet) for the image captioning task, which can exploit the interactions among the outputs of the image encoders and generate new compact and informative representations for the decoder.
Abstract: Recently, much progress has been made in image captioning, and an encoder-decoder framework has been adopted by all the state-of-the-art models. Under this framework, an input image is encoded by a convolutional neural network (CNN) and then translated into natural language with a recurrent neural network (RNN). Existing models built on this framework employ only one kind of CNN, e.g., ResNet or Inception-X, which describes the image content from only one specific viewpoint. Thus, the semantic meaning of the input image cannot be comprehensively understood, which limits performance. In this paper, to exploit the complementary information from multiple encoders, we propose a novel recurrent fusion network (RFNet) for the image captioning task. The fusion process in our model can exploit the interactions among the outputs of the image encoders and generate new compact and informative representations for the decoder. Experiments on the MSCOCO dataset demonstrate the effectiveness of our proposed RFNet, which sets a new state of the art for image captioning.
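
A small sketch may clarify the multi-encoder idea: features from several CNN encoders are projected to a shared space and fused into one compact representation for the caption decoder. The fusion below is a simple learned attention over encoders, used here as a stand-in for the paper's recurrent fusion procedure; encoder dimensions and the fused size are assumptions.

```python
# Hypothetical multi-encoder fusion (attention stand-in, not RFNet itself).
import torch
import torch.nn as nn

class EncoderFusionSketch(nn.Module):
    def __init__(self, enc_dims=(2048, 1536), fused_dim=512):
        super().__init__()
        # One projection per encoder into a shared fused space.
        self.projs = nn.ModuleList([nn.Linear(d, fused_dim) for d in enc_dims])
        self.score = nn.Linear(fused_dim, 1)   # attention weight per encoder

    def forward(self, feats):
        # feats: list of (batch, enc_dim) global features, one per encoder
        proj = torch.stack([p(f) for p, f in zip(self.projs, feats)], dim=1)
        w = torch.softmax(self.score(torch.tanh(proj)), dim=1)
        return (w * proj).sum(dim=1)           # (batch, fused_dim) for the decoder

# e.g., ResNet-style and Inception-style global features
fused = EncoderFusionSketch()([torch.randn(4, 2048), torch.randn(4, 1536)])
```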

242 citations

Proceedings Article
01 Sep 2017
TL;DR: A novel multi-scale convolutional neural network (MSCNN) for single-image crowd counting is proposed; it generates scale-relevant features for higher crowd counting performance in a single-column architecture, which is both accurate and cost-effective for practical applications.
Abstract: Crowd counting on static images is a challenging problem due to scale variations. Recently, deep neural networks have been shown to be effective for this task. However, existing neural-network-based methods often use multi-column or multi-network models to extract scale-relevant features, which complicates optimization and wastes computation. To this end, we propose a novel multi-scale convolutional neural network (MSCNN) for single-image crowd counting. Based on multi-scale blobs, the network is able to generate scale-relevant features for higher crowd counting performance in a single-column architecture, which is both accurate and cost-effective for practical applications. Experimental results show that our method outperforms the state-of-the-art methods in both accuracy and robustness with far fewer parameters.
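
The single-column multi-scale idea can be sketched as follows: one "multi-scale blob" applies parallel convolutions with different receptive fields and concatenates their outputs, so one column handles several scales. Kernel sizes, channel counts, and depth below are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of an inception-style multi-scale blob for density-map crowd
# counting (all sizes assumed).
import torch
import torch.nn as nn

class MultiScaleBlob(nn.Module):
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        # Parallel branches with different receptive fields; padding
        # preserves the spatial size so outputs can be concatenated.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )

    def forward(self, x):
        return torch.relu(torch.cat([b(x) for b in self.branches], dim=1))

counter = nn.Sequential(
    MultiScaleBlob(3, 16),        # 3 -> 48 channels
    MultiScaleBlob(48, 16),       # 48 -> 48 channels
    nn.Conv2d(48, 1, 1),          # 1x1 conv to a crowd density map
)
density = counter(torch.randn(1, 3, 224, 224))
count = density.sum()             # predicted count = integral of the density map
```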

191 citations

Posted Content
TL;DR: This paper evaluates the robustness of state-of-the-art face recognition models in the decision-based black-box attack setting, where the attackers have no access to the model parameters and gradients, but can only acquire hard-label predictions by sending queries to the target model.
Abstract: Face recognition has made remarkable progress in recent years due to the great improvement of deep convolutional neural networks (CNNs). However, deep CNNs are vulnerable to adversarial examples, which can cause serious consequences in real-world face recognition applications with security-sensitive purposes. Adversarial attacks are widely studied because they can identify the vulnerability of models before they are deployed. In this paper, we evaluate the robustness of state-of-the-art face recognition models in the decision-based black-box attack setting, where the attackers have no access to the model parameters and gradients, but can only acquire hard-label predictions by sending queries to the target model. This attack setting is more practical in real-world face recognition systems. To improve the efficiency of previous methods, we propose an evolutionary attack algorithm, which can model the local geometries of the search directions and reduce the dimension of the search space. Extensive experiments demonstrate the effectiveness of the proposed method, which induces a minimal perturbation to an input face image with fewer queries. We also successfully apply the proposed method to attack a real-world face recognition system.
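
A condensed sketch of a decision-based attack loop in the spirit described above: only hard-label queries are used, and a candidate perturbation is accepted when it stays adversarial while moving closer to the original image. The covariance adaptation that models local geometry in the paper is reduced here to a fixed-variance Gaussian proposal, and `query_label` is a hypothetical black-box oracle, not a real API.

```python
# Simplified decision-based evolutionary attack loop (assumptions above).
import numpy as np

def evolutionary_attack(x_orig, x_adv, query_label, true_id,
                        steps=1000, sigma=0.01):
    """x_orig, x_adv: float arrays in [0, 1]; x_adv starts adversarial.
    query_label(x) -> hard-label identity prediction (hypothetical oracle)."""
    for _ in range(steps):
        # Propose a mutation biased toward the original image.
        step_to_orig = 0.01 * (x_orig - x_adv)
        noise = sigma * np.random.randn(*x_adv.shape)
        candidate = np.clip(x_adv + step_to_orig + noise, 0.0, 1.0)
        # Accept only if still misclassified (dodging) and closer to x_orig.
        still_adversarial = query_label(candidate) != true_id
        closer = (np.linalg.norm(candidate - x_orig)
                  < np.linalg.norm(x_adv - x_orig))
        if still_adversarial and closer:
            x_adv = candidate   # smaller perturbation, same wrong label
    return x_adv
```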

180 citations


Cited by
Journal Article
TL;DR: A systematic survey of recent advances in image super-resolution (SR) using deep learning, which roughly groups existing SR studies into three major categories: supervised SR, unsupervised SR, and domain-specific SR.
Abstract: Image super-resolution (SR) is an important class of image processing techniques for enhancing the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress in image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey of recent advances in image super-resolution using deep learning approaches. In general, we can roughly group existing SR studies into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues that should be further addressed by the community.

837 citations

Journal Article
TL;DR: Practical suggestions on the selection of many hyperparameters are provided in the hope that they will promote or guide the deployment of deep learning to EEG datasets in future research.
Abstract: Objective: Electroencephalography (EEG) analysis has been an important tool in neuroscience, with applications in basic research, neural engineering (e.g., brain-computer interfaces, BCIs), and even commercial products. Many of the analytical tools used in EEG studies have used machine learning to uncover relevant information for neural classification and neuroimaging. Recently, the availability of large EEG data sets and advances in machine learning have both led to the deployment of deep learning architectures, especially in the analysis of EEG signals and in understanding the information they may contain about brain functionality. The robust automatic classification of these signals is an important step towards making the use of EEG more practical in many applications and less reliant on trained professionals. Towards this goal, a systematic review of the literature on deep learning applications to EEG classification was performed to address the following critical questions: (1) Which EEG classification tasks have been explored with deep learning? (2) What input formulations have been used for training the deep networks? (3) Are there specific deep learning network structures suitable for specific types of tasks? Approach: A systematic literature review of EEG classification using deep learning was performed on the Web of Science and PubMed databases, resulting in 90 identified studies. Those studies were analyzed based on type of task, EEG preprocessing methods, input type, and deep learning architecture. Main results: For EEG classification tasks, convolutional neural networks, recurrent neural networks, and deep belief networks outperform stacked auto-encoders and multi-layer perceptron neural networks in classification accuracy. The tasks that used deep learning fell into six general groups: emotion recognition, motor imagery, mental workload, seizure detection, event-related potential detection, and sleep scoring. For each type of task, we describe the specific input formulation, major characteristics, and end classifier recommendations found through this review. Significance: This review summarizes the current practices and performance outcomes in the use of deep learning for EEG classification. Practical suggestions on the selection of many hyperparameters are provided in the hope that they will promote or guide the deployment of deep learning to EEG datasets in future research.

777 citations