
Showing papers on "Softmax function" published in 2015


Posted Content
TL;DR: This work proposes a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes, shows that the approach is more effective than a CNN followed by a softmax classifier, and finds that using only word embeddings as input features is enough to achieve state-of-the-art results.
Abstract: Relation classification is an important semantic processing task for which state-of-the-art systems still rely on costly handcrafted features. In this work we tackle the relation classification task using a convolutional neural network that performs classification by ranking (CR-CNN). We propose a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes. We perform experiments using the SemEval-2010 Task 8 dataset, which is designed for the task of classifying the relationship between two nominals marked in a sentence. Using CR-CNN, we outperform the state of the art for this dataset and achieve an F1 of 84.1 without using any costly handcrafted features. Additionally, our experimental results show that: (1) our approach is more effective than a CNN followed by a softmax classifier; (2) omitting the representation of the artificial class Other improves both precision and recall; and (3) using only word embeddings as input features is enough to achieve state-of-the-art results if we consider only the text between the two target nominals.
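
The paper's central ingredient is the pairwise ranking loss. Below is a minimal NumPy sketch of that loss; the scaling factor and margins (gamma = 2, m_pos = 2.5, m_neg = 0.5) follow the settings reported in the paper but should be treated as tunable assumptions, and skipping the positive term for the artificial class Other mirrors finding (2) above.

```python
import numpy as np

def cr_cnn_ranking_loss(scores, pos_class, neg_class,
                        gamma=2.0, m_pos=2.5, m_neg=0.5):
    """Pairwise ranking loss in the spirit of CR-CNN.

    scores    : 1-D array of per-class network scores for one example
    pos_class : index of the correct class, or None for the artificial
                class Other (whose positive term the paper omits)
    neg_class : index of the highest-scoring incorrect class
    """
    # Penalize a high score for the strongest wrong class ...
    loss = np.log1p(np.exp(gamma * (m_neg + scores[neg_class])))
    # ... and a low score for the correct class, unless it is Other.
    if pos_class is not None:
        loss += np.log1p(np.exp(gamma * (m_pos - scores[pos_class])))
    return loss
```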

485 citations


Proceedings ArticleDOI
Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue
13 Oct 2015
TL;DR: A hybrid deep learning framework for video classification is proposed that can model static spatial information, short-term motion, and long-term temporal clues in videos.
Abstract: Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNN). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid learning framework that can model several important aspects of the video data. We also show that (1) combining the spatial and the short-term motion features in the regularized fusion network is better than direct classification and fusion using the CNN with a softmax layer, and (2) the sequence-based LSTM is highly complementary to the traditional classification strategy without considering the temporal frame orders. Extensive experiments are conducted on two popular and challenging benchmarks, the UCF-101 Human Actions and the Columbia Consumer Videos (CCV). On both benchmarks, our framework achieves very competitive performance: 91.3% on the UCF-101 and 83.5% on the CCV.

411 citations


Posted Content
TL;DR: OpenMax adapts Meta-Recognition concepts to the activation patterns in the penultimate layer of the network to estimate the probability of an input being from an unknown class.
Abstract: Deep networks have produced significant gains for various visual recognition problems, leading to high-impact academic and commercial applications. Recent work highlighted that it is easy to generate images that humans would never classify as a particular object class, yet networks classify such images with high confidence as that given class: deep networks are easily fooled by images humans do not consider meaningful. The closed-set nature of deep networks forces them to choose from one of the known classes, leading to such artifacts. Recognition in the real world is open set, i.e. the recognition system should reject unknown/unseen classes at test time. We present a methodology to adapt deep networks for open set recognition by introducing a new model layer, OpenMax, which estimates the probability of an input being from an unknown class. A key element of estimating the unknown probability is adapting Meta-Recognition concepts to the activation patterns in the penultimate layer of the network. OpenMax allows rejection of "fooling" and unrelated open set images presented to the system; OpenMax greatly reduces the number of obvious errors made by a deep network. We prove that the OpenMax concept provides bounded open space risk, thereby formally providing an open set recognition solution. We evaluate the resulting open set deep networks using pre-trained networks from the Caffe Model-zoo on ImageNet 2012 validation data, and thousands of fooling and open set images. The proposed OpenMax model significantly outperforms the open set recognition accuracy of basic deep networks as well as deep networks with thresholding of SoftMax probabilities.
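
As a rough illustration of the recalibration described above, here is a simplified OpenMax sketch built on SciPy's Weibull fit. It is a sketch under stated assumptions: the paper additionally damps the Weibull weight by each class's rank among the top activations (omitted here for clarity), and tail_size and alpha are hypothetical defaults.

```python
import numpy as np
from scipy.stats import weibull_min

def fit_weibull_tail(dists, tail_size=20):
    """EVT step: fit a Weibull to the largest distances between correctly
    classified training activations and their class mean (MAV)."""
    tail = np.sort(dists)[-tail_size:]
    shape, _, scale = weibull_min.fit(tail, floc=0)
    return shape, scale

def openmax_probs(av, class_means, weibulls, alpha=10):
    """Recalibrate a penultimate-layer activation vector and append an
    'unknown' score, then apply softmax over known classes + unknown."""
    v_hat = av.astype(float).copy()
    unknown = 0.0
    for c in np.argsort(av)[::-1][:alpha]:          # top-alpha classes
        shape, scale = weibulls[c]
        d = np.linalg.norm(av - class_means[c])     # distance to class MAV
        w = 1.0 - weibull_min.cdf(d, shape, scale=scale)
        unknown += av[c] * (1.0 - w)                # mass moved to unknown
        v_hat[c] = av[c] * w
    z = np.append(v_hat, unknown)
    e = np.exp(z - z.max())
    return e / e.sum()
```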

368 citations


Proceedings ArticleDOI
01 Jul 2015
TL;DR: This article proposes a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes and achieves state-of-the-art performance on the SemEval-2010 Task 8 dataset.
Abstract: Relation classification is an important semantic processing task for which state-of-the-art systems still rely on costly handcrafted features. In this work we tackle the relation classification task using a convolutional neural network that performs classification by ranking (CR-CNN). We propose a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes. We perform experiments using the SemEval-2010 Task 8 dataset, which is designed for the task of classifying the relationship between two nominals marked in a sentence. Using CR-CNN, we outperform the state of the art for this dataset and achieve an F1 of 84.1 without using any costly handcrafted features. Additionally, our experimental results show that: (1) our approach is more effective than a CNN followed by a softmax classifier; (2) omitting the representation of the artificial class Other improves both precision and recall; and (3) using only word embeddings as input features is enough to achieve state-of-the-art results if we consider only the text between the two target nominals.

353 citations


Proceedings Article
25 Jul 2015
TL;DR: This paper proposes a supervised representation learning method based on deep autoencoders for transfer learning, consisting of an embedding layer that minimizes the difference between domains explicitly and a label encoding layer that encodes label information in learning the representation.
Abstract: Transfer learning has attracted a lot of attention in the past decade. One crucial research issue in transfer learning is how to find a good representation for instances of different domains such that the divergence between domains can be reduced with the new representation. Recently, deep learning has been proposed to learn more robust or higher-level features for transfer learning. However, to the best of our knowledge, most of the previous approaches neither minimize the difference between domains explicitly nor encode label information in learning the representation. In this paper, we propose a supervised representation learning method based on deep autoencoders for transfer learning. The proposed deep autoencoder consists of two encoding layers: an embedding layer and a label encoding layer. In the embedding layer, the distance between the distributions of the embedded instances from the source and target domains is minimized in terms of KL-divergence. In the label encoding layer, label information of the source domain is encoded using a softmax regression model. Extensive experiments conducted on three real-world image datasets demonstrate the effectiveness of our proposed method compared with several state-of-the-art baseline methods.
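
A hedged sketch of the composite objective the abstract outlines: reconstruction error plus a KL term between the domains' embedding distributions plus a softmax (cross-entropy) loss on source labels. The symmetrized KL, the use of mean activations as distribution estimates, and the trade-off weights alpha and beta are my assumptions, not the paper's exact formulation; the embeddings are assumed nonnegative (e.g. sigmoid activations).

```python
import numpy as np

def kl(p, q, eps=1e-8):
    """KL divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def transfer_objective(src_emb, tgt_emb, recon_err, src_ce_loss,
                       alpha=1.0, beta=1.0):
    """Composite transfer-learning objective (sketch).

    src_emb, tgt_emb : embedding-layer activations per domain (n x k)
    recon_err        : autoencoder reconstruction error (scalar)
    src_ce_loss      : softmax cross-entropy on labeled source data
    """
    p = src_emb.mean(axis=0); p = p / p.sum()   # source embedding profile
    q = tgt_emb.mean(axis=0); q = q / q.sum()   # target embedding profile
    return recon_err + alpha * (kl(p, q) + kl(q, p)) + beta * src_ce_loss
```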

291 citations


Journal ArticleDOI
TL;DR: This paper proposes a vehicle type classification method using a semisupervised convolutional neural network on vehicle frontal-view images, introducing sparse Laplacian filter learning to obtain the network's filters from large amounts of unlabeled data.
Abstract: In this paper, we propose a vehicle type classification method using a semisupervised convolutional neural network from vehicle frontal-view images. In order to capture rich and discriminative information of vehicles, we introduce sparse Laplacian filter learning to obtain the filters of the network with large amounts of unlabeled data. Serving as the output layer of the network, the softmax classifier is trained by multitask learning with small amounts of labeled data. For a given vehicle image, the network can provide the probability of each type to which the vehicle belongs. Unlike traditional methods by using handcrafted visual features, our method is able to automatically learn good features for the classification task. The learned features are discriminative enough to work well in complex scenes. We build the challenging BIT-Vehicle dataset, including 9850 high-resolution vehicle frontal-view images. Experimental results on our own dataset and a public dataset demonstrate the effectiveness of the proposed method.

282 citations


Proceedings ArticleDOI
Siqin Tao, Tao Zhang, Jun Yang, Xueqian Wang, Weining Lu
28 Jul 2015
TL;DR: An integrated deep neural network method consisting of ten networks with different structure parameters is proposed; it shows better generalization capability, strong robustness, and a markedly reduced sensitivity to noise.
Abstract: As bearings are among the most common components of mechanical structures, it is helpful to research bearing faults and diagnose them as early as possible, before greater losses are incurred. This paper proposes a deep neural network framework for bearing fault diagnosis based on a stacked autoencoder and softmax regression. The simulation results verify the feasibility of the algorithm and show excellent classification performance. In addition, the deep neural network exhibits strong robustness and markedly reduces the impact of noise. Finally, an integrated deep neural network method consisting of ten networks with different structure parameters is proposed, which has better generalization capability.
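
For concreteness, here is a generic softmax-regression classifier of the kind such stacked-autoencoder pipelines place on top of the learned features. This is a minimal NumPy sketch, not the authors' training setup; the learning rate and epoch count are placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, y, n_classes, lr=0.1, epochs=200):
    """Batch-gradient softmax regression on feature matrix X (n x d)
    with integer labels y; returns the weight matrix W (d x n_classes)."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]                  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / n           # cross-entropy gradient
    return W
```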

98 citations


Proceedings ArticleDOI
25 May 2015
TL;DR: This paper introduces how to train a Deep Belief Network using Restricted Boltzmann Machines, combines it with a softmax classifier, and applies it to handwritten digit recognition.
Abstract: The Deep Belief Network is a deep learning algorithm. It is an effective method for addressing the problems of neural networks with many layers, such as slow learning and overfitting. In this paper, we introduce how to train a Deep Belief Network using Restricted Boltzmann Machines. Furthermore, we combine the Deep Belief Network with a softmax classifier and use it for the recognition of handwritten digits.
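
A minimal sketch of the RBM building block: one contrastive-divergence (CD-1) update, the standard way each DBN layer is pre-trained before the softmax classifier is attached on top. The learning rate is a placeholder, and the paper does not spell out its exact training recipe here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.05, rng=np.random.default_rng(0)):
    """One CD-1 update for a binary RBM (updates W, b, c in place).

    v0 : binary visible vector, e.g. one flattened digit image
    W  : (n_visible, n_hidden) weights; b, c: visible/hidden biases
    """
    h_prob = sigmoid(v0 @ W + c)                     # up-pass
    h0 = (rng.random(h_prob.shape) < h_prob) * 1.0   # sample hidden units
    v_prob = sigmoid(h0 @ W.T + b)                   # down-pass (reconstruction)
    h1_prob = sigmoid(v_prob @ W + c)                # second up-pass
    W += lr * (np.outer(v0, h_prob) - np.outer(v_prob, h1_prob))
    b += lr * (v0 - v_prob)
    c += lr * (h_prob - h1_prob)
```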

94 citations


Journal ArticleDOI
TL;DR: A general-purpose no-reference (NR) image quality assessment (IQA) framework based on a deep neural network is presented, together with insight into the operation of the network and intuitive explanations of how and why it works well.

91 citations


Proceedings ArticleDOI
19 Apr 2015
TL;DR: Noise contrastive estimation (NCE) is explored in RNNLM training; it is insensitive to the output layer size, yielding a doubling in training speed on a GPU and a 56-fold speedup in test-time evaluation on a CPU.
Abstract: In recent years recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data used, and their possible application areas, is the computational cost of training. A significant part of this cost is associated with the softmax function at the output layer, as it requires a normalization term to be explicitly calculated. This impacts both training and testing speed, especially when a large output vocabulary is used. To address this problem, noise contrastive estimation (NCE) is explored in RNNLM training. NCE does not require the above normalization during either training or testing, and it is insensitive to the output layer size. On a large-vocabulary conversational telephone speech recognition task, a doubling in training speed on a GPU and a 56-fold speedup in test-time evaluation on a CPU were obtained.
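
The point of NCE is to replace softmax normalization with a binary discrimination between data and noise samples. A sketch of the per-step loss follows, assuming unnormalized output-layer scores and a known noise distribution such as the unigram; the variable names are mine, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(score_target, scores_noise, logq_target, logq_noise, k):
    """NCE loss for one prediction step.

    score_target : unnormalized score of the observed next word
    scores_noise : scores of the k sampled noise words (array)
    logq_*       : log-probabilities under the noise distribution
    """
    pos = np.log(sigmoid(score_target - np.log(k) - logq_target))
    neg = np.sum(np.log(sigmoid(-(scores_noise - np.log(k) - logq_noise))))
    return -(pos + neg)   # no normalization over the vocabulary needed
```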

87 citations


Book ChapterDOI
05 Oct 2015
TL;DR: A novel convolutional neural network (CNN) based method for optic cup and disc segmentation, built on an entropy-based sampling technique, that outperforms existing methods on the public DRISHTI-GS data set.
Abstract: We propose a novel convolutional neural network (CNN) based method for optic cup and disc segmentation. To reduce computational complexity, an entropy-based sampling technique is introduced that gives superior results over uniform sampling. Filters are learned over several layers, with the output of previous layers serving as the input to the next layer. A softmax logistic regression classifier is subsequently trained on the output of all learned filters. In several error metrics, the proposed algorithm outperforms existing methods on the public DRISHTI-GS data set.

Posted Content
Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue
TL;DR: This work proposes a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos, and achieves very competitive performance on two popular and challenging benchmarks.
Abstract: Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNN). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid learning framework that can model several important aspects of the video data. We also show that (1) combining the spatial and the short-term motion features in the regularized fusion network is better than direct classification and fusion using the CNN with a softmax layer, and (2) the sequence-based LSTM is highly complementary to the traditional classification strategy without considering the temporal frame orders. Extensive experiments are conducted on two popular and challenging benchmarks, the UCF-101 Human Actions and the Columbia Consumer Videos (CCV). On both benchmarks, our framework achieves the best performance reported to date: 91.3% on the UCF-101 and 83.5% on the CCV.

Book ChapterDOI
22 Jun 2015
TL;DR: This research shows the effectiveness of using pre-trained word vectors and the advantage of leveraging Twitter corpora for the unsupervised learning phase and achieves comparable performance to state-of-the-art methods on the Twitter2015 set.
Abstract: In the work presented in this paper, we conduct experiments on sentiment analysis of Twitter messages using a deep convolutional neural network. The network is trained on top of pre-trained word embeddings obtained by unsupervised learning on large text corpora. We use a CNN with multiple filters of varying window sizes, on top of which we add two fully connected layers with dropout and a softmax layer. Our research shows the effectiveness of using pre-trained word vectors and the advantage of leveraging Twitter corpora for the unsupervised learning phase. The experimental evaluation is conducted on benchmark datasets provided by the SemEval 2015 competition for the Sentiment Analysis in Twitter task. Although the presented approach does not depend on hand-crafted features, we achieve performance comparable to state-of-the-art methods on the Twitter2015 set, with an F1 score of 64.85%.

Journal ArticleDOI
TL;DR: A model-based approach to distributed computing for multinomial logistic (softmax) regression for text analysis, which treats counts for each response category as independent Poisson regressions via plug-in estimates for fixed effects shared across categories.
Abstract: This article introduces a model-based approach to distributed computing for multinomial logistic (softmax) regression. We treat counts for each response category as independent Poisson regressions via plug-in estimates for fixed effects shared across categories. The work is driven by the high-dimensional-response multinomial models that are used in analysis of a large number of random counts. Our motivating applications are in text analysis, where documents are tokenized and the token counts are modeled as arising from a multinomial dependent upon document attributes. We estimate such models for a publicly available data set of reviews from Yelp, with text regressed onto a large set of explanatory variables (user, business, and rating information). The fitted models serve as a basis for exploring the connection between words and variables of interest, for reducing dimension into supervised factor scores, and for prediction. We argue that the approach herein provides an attractive option for social scientists and other text analysts who wish to bring familiar regression tools to bear on text data.
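
The identity underlying this distributed strategy is that independent Poisson counts, conditioned on their total, recover the multinomial logit (softmax) model, so each category's Poisson regression can be fit in parallel. In generic notation (mine, not the article's):

```latex
% Independent Poisson regressions per category j; conditioning the count
% vector on its row total m_i returns the softmax (multinomial logit) model.
\[
  c_{ij} \sim \mathrm{Pois}\!\left(e^{\mu_i + x_i^\top \beta_j}\right)
  \;\Longrightarrow\;
  (c_{i1},\dots,c_{iD}) \mid m_i \sim \mathrm{Multinomial}(m_i,\, p_i),
  \qquad
  p_{ij} = \frac{e^{x_i^\top \beta_j}}{\sum_k e^{x_i^\top \beta_k}},
  \quad
  m_i = \sum_j c_{ij}.
\]
```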

Posted Content
TL;DR: In this paper, the authors propose a novel approach explicitly designed to address a number of subtle yet important issues that have stymied earlier distance metric learning (DML) algorithms; it maintains an explicit model of the distributions of the different classes in representation space.
Abstract: Distance metric learning (DML) approaches learn a transformation to a representation space where distance is in correspondence with a predefined notion of similarity. While such models offer a number of compelling benefits, it has been difficult for these to compete with modern classification algorithms in performance and even in feature extraction. In this work, we propose a novel approach explicitly designed to address a number of subtle yet important issues which have stymied earlier DML algorithms. It maintains an explicit model of the distributions of the different classes in representation space. It then employs this knowledge to adaptively assess similarity, and achieve local discrimination by penalizing class distribution overlap. We demonstrate the effectiveness of this idea on several tasks. Our approach achieves state-of-the-art classification results on a number of fine-grained visual recognition datasets, surpassing the standard softmax classifier and outperforming triplet loss by a relative margin of 30-40%. In terms of computational performance, it alleviates training inefficiencies in the traditional triplet loss, reaching the same error in 5-30 times fewer iterations. Beyond classification, we further validate the saliency of the learnt representations via their attribute concentration and hierarchy recovery properties, achieving 10-25% relative gains on the softmax classifier and 25-50% on triplet loss in these tasks.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: A probabilistic graphical model (PGM) for prognosis and diagnosis of breast cancer is proposed and extensive experiments using real-world databases show promising results in comparison to Support Vector Machines and k-Nearest Neighbors classifiers, for classifying tumors and predicting events like recurrence and metastasis.
Abstract: We propose a probabilistic graphical model (PGM) for prognosis and diagnosis of breast cancer. PGMs are suitable for building predictive models in medical applications, as they are powerful tools for making decisions under uncertainty from big data with missing attributes and noisy evidence. Previous work relied mostly on clinical data to create a predictive model. Moreover, practical knowledge of an expert was needed to build the structure of a model, which may not be accurate. In our opinion, since cancer is basically a genetic disease, the integration of microarray and clinical data can improve the accuracy of a predictive model. However, since microarray data is high-dimensional, including genomic variables may lead to poor results for structure and parameter learning due to the curse of dimensionality and small sample size problems. We address these problems by applying manifold learning and a deep belief network (DBN) to microarray data. First, we construct a PGM and a DBN using clinical and microarray data, and extract the structure of the clinical model automatically by applying a structure learning algorithm to the clinical data. Then, we integrate these two models using softmax nodes. Extensive experiments using real-world databases, such as METABRIC and NKI, show promising results in comparison to Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN) classifiers, for classifying tumors and predicting events like recurrence and metastasis.

Proceedings ArticleDOI
19 Apr 2015
TL;DR: This paper shows that GMM can be easily integrated into the deep neural network framework by exploiting its equivalence with the log-linear mixture model (LMM), which can be transformed to a large softmax layer followed by a summation pooling layer.
Abstract: In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast to this, in the tandem approach neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that GMM can be easily integrated into the deep neural network framework. By exploiting its equivalence with the log-linear mixture model (LMM), GMM can be transformed to a large softmax layer followed by a summation pooling layer. Theoretical and experimental results indicate that the jointly trained and optimally chosen GMM and bottleneck tandem features cannot perform worse than a hybrid model. Thus, the question “hybrid vs. tandem” simplifies to optimizing the output layer of a neural network. Speech recognition experiments are carried out on a broadcast news and conversations task using up to 12 feed-forward hidden layers with sigmoid and rectified linear unit activation functions. The evaluation of the LMM layer shows recognition gains over the classic softmax output.
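
A sketch of the stated equivalence: with a pooled covariance, each (state, component) pair contributes a log-linear score, so state posteriors reduce to one large softmax followed by summation pooling over components. The parameterization below (one weight vector and bias per pair) is a minimal reading of the LMM, not the authors' implementation.

```python
import numpy as np

def lmm_posteriors(x, Lam, bias, state_of):
    """HMM state posteriors from a log-linear mixture model.

    x        : feature vector (d,)
    Lam      : (n_pairs, d) log-linear weights, one row per (state, comp)
    bias     : (n_pairs,) per-pair biases (absorb priors and Gaussian terms)
    state_of : (n_pairs,) int array mapping each pair to its HMM state
    """
    z = Lam @ x + bias                       # scores for all pairs
    e = np.exp(z - z.max())
    comp_post = e / e.sum()                  # softmax over all pairs
    post = np.zeros(state_of.max() + 1)
    np.add.at(post, state_of, comp_post)     # sum-pool within each state
    return post
```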

Proceedings ArticleDOI
19 Apr 2015
TL;DR: A new type of deep neural network (DNN) is presented that uses a support vector machine (SVM) at the top layer for classification; its effectiveness is verified on the TIMIT task for continuous speech recognition.
Abstract: A new type of deep neural network (DNN) is presented in this paper. Traditional DNNs use multinomial logistic regression (softmax activation) at the top layer for classification. The new DNN instead uses a support vector machine (SVM) at the top layer. Two training algorithms are proposed at the frame and sequence level to learn the parameters of the SVM and DNN under the maximum-margin criterion. In frame-level training, the new model is shown to be related to the multiclass SVM with DNN features; in sequence-level training, it is related to the structured SVM with DNN features and HMM state transition features. Its decoding process is similar to the DNN-HMM hybrid system, but with frame-level posterior probabilities replaced by scores from the SVM. We term the new model the deep neural support vector machine (DNSVM). We have verified its effectiveness on the TIMIT task for continuous speech recognition.

Journal ArticleDOI
TL;DR: A new image retrieval method for binary images based on Deep Belief Networks (DBN) and a Softmax classifier is proposed, which first classifies the image data set into categories with the DBN and Softmax classifier, then classifies the query image in the same way and returns the images in that category as the similar images.
Abstract: Currently, the common methods for image retrieval are content-based, but the abilities of these methods to represent image features are very limited. In this paper, a new image retrieval method for binary images based on Deep Belief Networks (DBN) and a Softmax classifier is proposed, which first classifies the image data set into categories with the DBN and Softmax classifier, then classifies the query image in the same way; images in the same category are returned as the similar images of the query image. Unlike existing image retrieval models, the new method aims to provide a more effective representation and extraction measure by simulating the architecture of the human visual system, and it is not necessary to set the threshold manually as in most existing methods based on Hamming distance computation. Experimental results show that the proposed method can achieve better recall and precision than some existing methods, such as perceptual hash a...

Journal ArticleDOI
TL;DR: A novel coupled autoassociative neural network is proposed for learning a target-to-source image representation for heterogeneous face recognition, and it is shown that the learned image representation (common latent features) produced by the coupled autoassociative network yields competitive cross-modal face recognition results.
Abstract: Several models have been previously suggested for learning correlated representations between source and target modalities. In this paper, we propose a novel coupled autoassociative neural network for learning a target-to-source image representation for heterogeneous face recognition. This coupled network is unique because a cross-modal transformation is learned by forcing the hidden units (latent features) of two neural networks to be as similar as possible, while simultaneously preserving information from the input. The effectiveness of this model is demonstrated using multiple existing heterogeneous face recognition databases. Moreover, the empirical results show that the learned image representation (common latent features) produced by the coupled autoassociative network yields competitive cross-modal face recognition results. These results are obtained by training a softmax classifier using only the latent features from the source domain and testing using only the latent features from the target domain.

Proceedings ArticleDOI
Fuzhen Zhuang, Dan Luo, Xin Jin, Hui Xiong, Ping Luo, Qing He
14 Nov 2015
TL;DR: This paper proposes a feature representation learning framework, which has the ability in combining the autoencoders, an effective way to learn good representation by using large amount of unlabeled data, and model parameter regularization methods into a unified model for multi-task learning.
Abstract: Multi-task learning aims at learning multiple related but different tasks. In general, there are two ways for multi-task learning. One is to exploit the small set of labeled data from all tasks to learn a shared feature space for knowledge sharing. In this way, the focus is on the labeled training samples while the large amount of unlabeled data is not sufficiently considered. The other focuses on how to share model parameters among multiple tasks based on the original feature space. Here, the question is whether it is possible to combine the advantages of both approaches and develop a method that can simultaneously learn a shared subspace for multiple tasks and learn the prediction models in this subspace. To this end, in this paper, we propose a feature representation learning framework that combines autoencoders, an effective way to learn good representations from large amounts of unlabeled data, with model parameter regularization methods in a unified model for multi-task learning. Specifically, all the tasks share the same encoding and decoding weights to find their latent feature representations, based on which a regularized multi-task softmax regression method is used to find a distinct prediction model for each task. Also, some commonalities are considered in the prediction models according to the relatedness of the multiple tasks. There are several advantages of the proposed model: 1) it can make full use of large amounts of unlabeled data from all the tasks to learn satisfying representations; 2) the learning of distinct prediction models can benefit from the success of the autoencoder; 3) since we incorporate the labeled information into the softmax regression method, the learning of the feature representation is indeed semi-supervised. Therefore, our model is a semi-supervised autoencoder for multi-task learning (SAML for short). Finally, extensive experiments on three real-world data sets demonstrate the effectiveness of the proposed framework. Moreover, the feature representation obtained by this model can be used by other methods to obtain improved results.

Journal ArticleDOI
TL;DR: This letter proposes a softmax regression-based feature fusion method that learns distinct weights for different features, enabling the estimation of object-to-class similarity measures and the conditional probabilities that each object belongs to different classes.
Abstract: Various types of features can be extracted from very high resolution remote sensing images for object classification. It has been widely acknowledged that the classification performance can benefit from proper feature fusion. In this letter, we propose a softmax regression-based feature fusion method by learning distinct weights for different features. Our fusion method enables the estimation of object-to-class similarity measures and the conditional probabilities that each object belongs to different classes. Moreover, we introduce an approximate method for calculating the class-to-class similarities between different classes. Finally, the obtained fusion and similarity information are integrated into a marginalized kernel to build a support vector machine classifier. The advantages of our method are validated on QuickBird imagery.

Posted Content
TL;DR: An approach is studied where uni-modal deep networks are trained separately and their final hidden layers are fused to obtain a joint feature space in which another deep network is built, demonstrating the tremendous value of the visual channel in phone classification even in audio with a high signal-to-noise ratio.
Abstract: In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers are fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error rate (PER) of 41% under clean conditions on the IBM large-vocabulary audio-visual studio dataset, this fusion model achieves a PER of 35.83%, demonstrating the tremendous value of the visual channel in phone classification even in audio with a high signal-to-noise ratio. Second, we present a new deep network architecture that uses a bilinear softmax layer to account for class-specific correlations between modalities. We show that combining the posteriors from the bilinear networks with those from the fused model mentioned above results in a further significant phone error rate reduction, yielding a final PER of 34.03%.
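
A hypothetical minimal reading of the bilinear softmax layer: class scores come from a bilinear form of the audio and visual feature vectors, with one interaction matrix per class. The tensor layout below is my assumption, not the paper's specification.

```python
import numpy as np

def bilinear_softmax(a, v, T):
    """Class posteriors from a bilinear audio-visual interaction.

    a : audio feature vector (da,)
    v : visual feature vector (dv,)
    T : (n_classes, da, dv) tensor, one interaction matrix per class
    """
    scores = np.einsum('i,cij,j->c', a, T, v)   # score_c = a' T_c v
    e = np.exp(scores - scores.max())
    return e / e.sum()
```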

Journal ArticleDOI
TL;DR: The use of a sparse autoencoder to learn fault features improves classification performance significantly with a small amount of training data, achieving 183 correct classifications out of 186 test cases across four fault classes.
Abstract: A unique technique is proposed based on sparse autoencoders for automated fault detection and classification using the acoustic signal generated by internal combustion (IC) engines. This technique does not require any hand-engineered feature extraction or feature selection from the acoustic data, as is usually done. The proposed technique uses a sparse autoencoder for unsupervised feature extraction from the training data. The training and testing data sets are then transformed by these extracted features before being passed to softmax regression, which classifies unknown engines into healthy and faulty classes. The use of the sparse autoencoder to learn fault features improves classification performance significantly with a small amount of training data. The technique is tested on an industrial IC engine data set, with an overall classification performance of 183 correct classifications out of 186 test cases across four different fault classes.

Proceedings Article
07 Dec 2015
TL;DR: An algorithmic approach is developed that computes the exact loss, the gradient update for the output weights, and the gradient for backpropagation, all in O(d²) per example instead of O(Dd), remarkably without ever computing the D-dimensional output.
Abstract: An important class of problems involves training deep neural networks with sparse prediction targets of very high dimension D. These occur naturally in e.g. neural language models or the learning of word-embeddings, often posed as predicting the probability of next words among a vocabulary of size D (e.g. 200 000). Computing the equally large, but typically non-sparse D-dimensional output vector from a last hidden layer of reasonable dimension d (e.g. 500) incurs a prohibitive O(Dd) computational cost for each example, as does updating the D × d output weight matrix and computing the gradient needed for backpropagation to previous layers. While efficient handling of large sparse network inputs is trivial, the case of large sparse targets is not, and has thus so far been sidestepped with approximate alternatives such as hierarchical softmax or sampling-based approximations during training. In this work we develop an original algorithmic approach which, for a family of loss functions that includes squared error and spherical softmax, can compute the exact loss, gradient update for the output weights, and gradient for backpropagation, all in O(d²) per example instead of O(Dd), remarkably without ever computing the D-dimensional output. The proposed algorithm yields a speedup of D/(4d), i.e. two orders of magnitude for typical sizes, for that critical part of the computations that often dominates the training time in this kind of network architecture.
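
The heart of the trick, shown for the squared-error member of the loss family: with a cached Gram matrix Q = WᵀW, the exact loss against a K-sparse D-dimensional target costs O(d² + Kd) instead of O(Dd). This sketch covers only the loss identity; the full algorithm also keeps W in an implicitly factored form so that the weight and Gram-matrix updates stay O(d²) per example.

```python
import numpy as np

def sparse_sq_loss(h, W, Q, idx, vals):
    """Exact ||W h - y||^2 for a K-sparse target without forming W h.

    h    : d-dim hidden vector
    W    : (D, d) output weight matrix (only K rows are touched)
    Q    : cached Gram matrix W.T @ W, shape (d, d)
    idx  : indices of the K nonzero target entries
    vals : the K nonzero target values
    """
    # ||W h - y||^2 = h' Q h - 2 y_S' (W_S h) + ||y||^2
    return h @ Q @ h - 2.0 * (vals @ (W[idx] @ h)) + vals @ vals
```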

Proceedings Article
01 Jan 2015
TL;DR: A novel neural network to predict what a user will buy next given her shopping history is proposed, which elegantly involves both the user's general interest and the sequential dependencies between items for prediction.
Abstract: One crucial task in recommendation is to predict what a user will buy next given her shopping history. In this paper, we propose a novel neural network to complete this task. The model consists of an embedding layer, a hidden layer and an output layer. Firstly, the distributed representations of the user and the items bought before are obtained and used to form a feature vector by the embedding layer. Then the hidden layer transforms the feature vector to another space by a non-linear operator. Finally, the softmax operator is adopted to output the probabilities of next items. We can see that the model elegantly involves both the user's general interest and the sequential dependencies between items for prediction. Experimental results on two real datasets prove the effectiveness of our model.
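
A hedged sketch of the three-layer architecture the abstract describes: embedding, a nonlinear hidden layer, and a softmax output over candidate next items. How the user and item embeddings are combined into the feature vector is an assumption here, since the abstract does not spell it out.

```python
import numpy as np

def next_item_probs(user_emb, item_embs, W_h, W_o):
    """Probabilities of the next item given user and purchase history.

    user_emb  : (du,) user embedding
    item_embs : (n, di) embeddings of previously bought items
    W_h       : (dh, du + di) hidden-layer weights
    W_o       : (n_items, dh) output-layer weights
    """
    x = np.concatenate([user_emb, item_embs.mean(axis=0)])  # feature vector
    h = np.tanh(W_h @ x)                                    # hidden layer
    z = W_o @ h
    e = np.exp(z - z.max())
    return e / e.sum()                                      # softmax output
```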

Patent
11 Mar 2015
TL;DR: A face identification method based on a random pooling convolutional neural network is proposed, in which features of a face image are quickly extracted by the network and classified with a softmax classifier.
Abstract: The invention discloses a face identification method based on a random pooling convolutional neural network. In this method, features of a face image are quickly extracted by the random pooling convolutional neural network and cascaded to perform face identification. Selection strategies and steps for the new pooling values are adopted when constructing the convolutional neural network, after which supervised training is carried out with a softmax classifier. The probability distribution used in the sampling process is based on energy, which speeds up feature extraction by the convolutional neural network and improves the generalization of the trained network. Convolutional neural network training based on random pooling is simple and accurate, and can promote wide application of random pooling in extracting face identification features.

Journal ArticleDOI
TL;DR: A supervised replicated softmax model (sRSM), based on restricted Boltzmann machines and distributed representations, is proposed to learn naturally discriminative topics and is evaluated for the recognition of categorical or continuous emotional attributes via within and cross-corpus experiments.
Abstract: Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated a better success than frame-level features for recognition-related tasks. Conventionally, such features are obtained via a brute-force collection of statistics over frames, thereby losing important local information in the process which affects the performance. To overcome these limitations, a novel feature extraction approach using latent topic models (LTMs) is presented in this study. Speech is assumed to comprise of a mixture of emotion-specific topics, where the latter capture emotionally salient information from the co-occurrences of frame-level acoustic features and yield better descriptors. Specifically, a supervised replicated softmax model (sRSM), based on restricted Boltzmann machines and distributed representations, is proposed to learn naturally discriminative topics. The proposed features are evaluated for the recognition of categorical or continuous emotional attributes via within and cross-corpus experiments conducted over acted and spontaneous expressions. In a within-corpus scenario, sRSM outperforms competing LTMs, while obtaining a significant improvement of 16.75% over popular statistics-based turn-level features for valence-based classification, which is considered to be a difficult task using only speech. Further analyses with respect to the turn duration show that the improvement is even more significant, 35%, on longer turns (>6 s), which is highly desirable for current turn-based practices. In a cross-corpus scenario, two novel adaptation-based approaches, instance selection, and weight regularization are proposed to reduce the inherent bias due to varying annotation procedures and cultural perceptions across databases. Experimental results indicate a natural, yet less severe, deterioration in performance - only 2.6% and 2.7%, thereby highlighting the generalization ability of the proposed features.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: Experiments show that the deeper backpropagation through the speaker dependent layer is necessary for improved recognition performance, and the speaker adaptively and jointly trained BN-GMM results in a 5% relative improvement over a very strong speaker-independent hybrid baseline on the Quaero English broadcast news and conversations task, and on the 300-hour Switchboard task.
Abstract: In the tandem approach, the output of a neural network (NN) serves as input features to a Gaussian mixture model (GMM), aiming to improve the emission probability estimates. As shown in our previous work, a GMM with pooled covariance matrix can be integrated into a neural network framework as a softmax layer with hidden variables, which allows for joint estimation of both neural network and Gaussian mixture parameters. Here, this approach is extended to include speaker adaptive training (SAT) by introducing a speaker dependent neural network layer. Error backpropagation beyond this speaker dependent layer realizes the adaptive training of the Gaussian parameters as well as the optimization of the bottleneck (BN) tandem features of the underlying acoustic model, simultaneously. In this study, after initialization by constrained maximum likelihood linear regression (CMLLR), the speaker dependent layer itself is kept constant during the joint training. Experiments show that the deeper backpropagation through the speaker dependent layer is necessary for improved recognition performance. The speaker adaptively and jointly trained BN-GMM results in a 5% relative improvement over a very strong speaker-independent hybrid baseline on the Quaero English broadcast news and conversations task, and on the 300-hour Switchboard task.

Patent
22 Jul 2015
TL;DR: In this paper, a deep-learning rolling bearing fault diagnosis method based on an SDA (stacked denoising autoencoder) and Softmax regression is proposed for judging the operating conditions and fault types of rolling bearings.
Abstract: The invention provides a deep-learning rolling bearing fault diagnosis method based on an SDA (stacked denoising autoencoder) and Softmax regression. Building on an analysis of rolling bearing faults, and addressing the limited precision and robustness of current classification algorithms, the method draws on knowledge from the related field of image pattern recognition and adopts a deep-learning approach based on a multilayer neural network. A stacked denoising autoencoder model learns to express the original data while part of the input is masked, and the reconstructed data are fed into a Softmax regression model to judge the operating conditions and fault types of rolling bearings. Analysis of the test results shows that the method can be effectively applied to fault diagnosis of rolling bearings.