Deep autoencoder neural networks for gene ontology annotation predictions
Citations
DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks
A State-of-the-Art Survey on Deep Learning Theory and Architectures
Ten quick tips for machine learning in computational biology
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches.
Deep Learning with Convolutional Neural Networks Applied to Electromyography Data: A Resource for the Classification of Movements for Prosthetic Hands
References
Gene Ontology: tool for the unification of biology
Singular value decomposition and least squares solutions
Torch7: A Matlab-like Environment for Machine Learning
Neural networks and principal component analysis: learning from examples without local minima
Auto-association by multilayer perceptrons and singular value decomposition
Related Papers (5)
Reducing the Dimensionality of Data with Neural Networks
A fast learning algorithm for deep belief nets
Frequently Asked Questions (20)
Q2. What are the future works mentioned in the paper "Deep autoencoder neural networks for gene ontology annotation predictions" ?
Future work will address the advantages and issues of applying the same methods to the prediction of multiple terminologies, not only annotations.
Q3. How do the authors learn the parameters of the autoencoder?
The authors learn the parameters of the autoencoder by performing stochastic gradient descent to minimize the reconstruction error, the MSE between a and â.
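The training procedure described above can be sketched as follows. This is a minimal single-hidden-layer autoencoder trained by stochastic gradient descent on the reconstruction MSE between a and â; the matrix sizes, toy data, weight initialization, and tanh activation are illustrative assumptions, not the paper's actual configuration (the learning rate of 0.01 and 25 passes follow the setup mentioned in Q5).

```python
import numpy as np

# Sketch of an autoencoder trained with SGD on reconstruction MSE.
# Sizes, data, and activation are illustrative assumptions.
rng = np.random.default_rng(0)
n_genes, n_terms, n_hidden = 200, 50, 10
A = (rng.random((n_genes, n_terms)) < 0.1).astype(float)  # toy binary annotation matrix

W_e = rng.normal(0, 0.1, (n_terms, n_hidden))   # encoder weights
b_e = np.zeros(n_hidden)
W_d = rng.normal(0, 0.1, (n_hidden, n_terms))   # decoder weights
b_d = np.zeros(n_terms)
lr = 0.01                                        # learning rate, as in Q5

def forward(a):
    h = np.tanh(a @ W_e + b_e)   # compressed representation in the bottleneck
    a_hat = h @ W_d + b_d        # linear reconstruction â
    return h, a_hat

def mse(a, a_hat):
    return np.mean((a - a_hat) ** 2)

losses = [mse(A, forward(A)[1])]                 # loss before training
for epoch in range(25):                          # 25 passes over the data
    for i in rng.permutation(n_genes):           # stochastic: one profile at a time
        a = A[i]
        h, a_hat = forward(a)
        err = a_hat - a                          # gradient of MSE w.r.t. â (up to a constant)
        dh = (err @ W_d.T) * (1 - h ** 2)        # backprop through tanh
        W_d -= lr * np.outer(h, err); b_d -= lr * err
        W_e -= lr * np.outer(a, dh); b_e -= lr * dh
    losses.append(mse(A, forward(A)[1]))
```

After training, the reconstruction error should be lower than at initialization, confirming that the SGD updates reduce the MSE objective.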
Q4. What is the way to improve gene function annotation databases?
One approach to improving gene function annotation databases like GO is to use patterns in the known annotations to predict new annotations.
Q5. How many iterations did the neural network learn?
Autoencoder neural networks were trained with the free GPU-accelerated software package Torch7 [21], using stochastic gradient descent with a learning rate of 0.01 for 25 iterations.
Q6. What is the common hyperparameter for tSVD?
For tSVD, the number of singular values is a hyper-parameter that determines the rank of the final prediction matrix, and is usually chosen through cross-validation.
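One common way to choose this hyper-parameter, sketched below under toy assumptions (the matrix, the 20% hold-out split, and the candidate ranks are all illustrative): hide a random subset of known entries, reconstruct the matrix at several ranks, and keep the rank that best recovers the hidden entries.

```python
import numpy as np

# Toy rank selection for tSVD via held-out entries (assumed setup).
rng = np.random.default_rng(1)
A = (rng.random((60, 40)) < 0.15).astype(float)  # toy binary annotation matrix

mask = rng.random(A.shape) < 0.2    # hold out ~20% of entries for validation
A_train = A * ~mask                  # held-out entries zeroed out

U, s, Vt = np.linalg.svd(A_train, full_matrices=False)

def truncated(k):
    # Rank-k approximation built from the top k singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k]

# Validation error on the hidden entries for each candidate rank.
errors = {k: np.mean((truncated(k)[mask] - A[mask]) ** 2) for k in (2, 5, 10, 20, 40)}
best_k = min(errors, key=errors.get)
```

At full rank the reconstruction reproduces the training matrix exactly, so intermediate ranks are where generalization to the hidden entries can differ.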
Q7. What is the advantage of deep neural networks over shallow machine learning methods?
Deep networks of multiple hidden layers have an advantage over shallow machine learning methods in that they are able to model complex data with greater efficiency.
Q8. What is the way to predict gene function annotations?
Neural networks have more expressive power, and may be better suited for discovering the underlying patterns in gene function annotation data.
Q9. What is the definition of a hidden layer in an autoencoder?
A small hidden layer in an autoencoder network creates an information bottleneck, forcing the network to compress the data into a low-dimensional representation.
Q10. What is the SVD of the matrix A?
The SVD of the matrix A is given by A = UΣVᵀ (Equation 3), where U is an m × m unitary matrix (i.e., UᵀU = I), Σ is a non-negative diagonal matrix of size m × n, and V is an n × n unitary matrix (i.e., VᵀV = I).
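This decomposition can be checked numerically on a small random matrix (the matrix and its dimensions here are arbitrary choices for illustration):

```python
import numpy as np

# Numerical check of A = U Σ Vᵀ with UᵀU = I and VᵀV = I (Equation 3).
rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(A)        # full SVD: U is m×m, Vt is n×n
Sigma = np.zeros((m, n))           # Σ is m×n, singular values on the diagonal
Sigma[:n, :n] = np.diag(s)

ok_recon = np.allclose(U @ Sigma @ Vt, A)       # A = U Σ Vᵀ
ok_U = np.allclose(U.T @ U, np.eye(m))          # UᵀU = I
ok_V = np.allclose(Vt @ Vt.T, np.eye(n))        # VᵀV = I
```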
Q11. What is the way to predict gene functions?
It can be used to predict both inaccuracies and missing gene functions: a large value of Ã(i, j) suggests that gene i should be annotated with term j, whereas a value close to zero suggests the opposite.
Q12. How can the authors use Ã to predict gene-to-term annotations?
In order to better comprehend why à can be used to predict gene-to-term annotations, the authors highlight that an alternative expression of Equation (4) can be obtained using basic linear algebra manipulations:
Q13. What are the two algorithms used in this paper?
In this section the authors describe the two annotation-prediction algorithms used in this paper: Truncated Singular Value Decomposition and Autoencoder Neural Network.
Q14. What was the training and testing procedure for the tSVD and autoencoder algorithms?
Training and testing were performed on the unfolded matrices described in Equation 2 to eliminate the possibility of trivial predictions.
Q15. What is the advantage of the autoencoder approach?
The approach has numerous advantages: (1) autoencoders can be trained online with very large datasets, (2) they can be trained quickly using graphics processors, and (3) the number and size of the hidden layers provides an easy way of controlling the complexity of the model.
Q16. What is the MSE of a hidden layer?
MSE(a, â) = ||a − â||₂² = ||a − (W_d · h + bias_d)||₂² (Equation 10). When the hidden layer has fewer dimensions than a, the autoencoder learns a compressed representation of the training data.
Q17. What is the threshold for making binary predictions?
To make binary predictions, the authors set a threshold τ such that Ã(i, j) > τ is interpreted as a prediction that gene i should be annotated with feature j.
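This thresholding step is a simple element-wise comparison; the small Ã and the value of τ below are made-up numbers for illustration only.

```python
import numpy as np

# Turning the real-valued reconstruction Ã into binary annotation
# predictions with a threshold τ. Values here are illustrative.
A_tilde = np.array([[0.91, 0.08, 0.55],
                    [0.02, 0.76, 0.30]])
tau = 0.5
predictions = A_tilde > tau   # True where gene i is predicted to have term j
```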
Q18. what is the optimal rank-k approximation of a?
The matrix Ã is the optimal rank-k approximation of A, i.e. the one that minimizes the norm ‖A − Ã‖ (in either the spectral norm or the Frobenius norm) subject to the rank constraint.
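This optimality property (the Eckart–Young theorem) can be illustrated numerically: the truncated-SVD approximation has a Frobenius error equal to the tail singular values, and no other rank-k matrix (here, a random rank-k competitor, an assumed stand-in) does better.

```python
import numpy as np

# Numerical illustration of the optimal rank-k approximation property.
rng = np.random.default_rng(3)
A = rng.normal(size=(8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]        # truncated-SVD rank-k approximation Ã

# A random rank-k competitor matrix for comparison.
B = rng.normal(size=(8, k)) @ rng.normal(size=(k, 6))

err_svd = np.linalg.norm(A - A_k)        # Frobenius-norm error of Ã
err_rand = np.linalg.norm(A - B)         # error of the random competitor
```

The truncated-SVD error equals the square root of the sum of the squared discarded singular values, and is never larger than the competitor's error.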
Q19. What is the way to predict gene-to-term annotations?
Based on Equation (5), the i-th row of Ã can be written as ãᵢᵀ = aᵢᵀ V_k V_kᵀ (Equation 7). Thus, the original annotation profile is first transformed into the eigen-term domain, retaining only the first k eigen-terms through multiplication by V_k, and then mapped back to the original domain by means of V_kᵀ.
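The equivalence between this row-wise projection and the truncated-SVD reconstruction can be verified directly (the matrix and rank below are arbitrary toy choices):

```python
import numpy as np

# Check that ã_i = a_i V_k V_kᵀ (Equation 7) matches the rank-k
# reconstruction Ã = U_k Σ_k V_kᵀ, row by row.
rng = np.random.default_rng(4)
A = rng.normal(size=(7, 5))
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k].T                         # n×k matrix whose columns are the top eigen-terms
A_k = (U[:, :k] * s[:k]) @ Vt[:k]     # Ã via truncated SVD

# Each row: project into the eigen-term domain, then map back.
A_proj = A @ Vk @ Vk.T
```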
Q20. How many annotations are added to the GO database?
These are far from complete, and new annotations are added regularly; over a third of the biological process annotations have been added within the last four years.