
Deep Autoencoder Neural Networks
for Gene Ontology Annotation Predictions
Davide Chicco
Politecnico di Milano
Dipartimento di Elettronica
Informazione Bioingegneria
Milan, Italy
davide.chicco@gmail.com
Peter Sadowski
University of California, Irvine
Dept. of Computer Science
Institute for Genomics and
Bioinformatics
Irvine, CA, USA
peter.j.sadowski@uci.edu
Pierre Baldi
University of California, Irvine
Dept. of Computer Science
Institute for Genomics and
Bioinformatics
Irvine, CA, USA
pfbaldi@ics.uci.edu
ABSTRACT
The annotation of genomic information is a major challenge in biology and bioinformatics. Existing databases of known gene functions are incomplete and prone to errors, and the biomolecular experiments needed to improve these databases are slow and costly. While computational methods are not a substitute for experimental verification, they can help in two ways: algorithms can aid in the curation of gene annotations by automatically suggesting inaccuracies, and they can predict previously-unidentified gene functions, accelerating the rate of gene function discovery. In this work, we develop an algorithm that achieves both goals using deep autoencoder neural networks. With experiments on gene annotation data from the Gene Ontology project, we show that deep autoencoder networks achieve better performance than other standard machine learning methods, including the popular truncated singular value decomposition.
Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning; J.3 [Life and Medical Sciences]: Biology and Genetics; H.2.8 [Database Applications]: Data mining

Keywords
biomolecular annotations, matrix-completion, autoencoders, neural networks, Gene Ontology, truncated singular value decomposition, principal component analysis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
BCB'14, September 20-23, 2014, Newport Beach, CA, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2894-4/14/09 ...$15.00.
http://dx.doi.org/10.1145/2649387.2649442.

1. INTRODUCTION
In bioinformatics, a controlled gene function annotation is a binary matrix associating genes or gene products with functional features from a controlled vocabulary. These annotations are important for effective communication in biomedical research, and lay the groundwork for bioinformatics software tools and data mining investigations. The in vitro biomolecular experiments used to validate gene functions are expensive, so the development of computational methods to identify errors and prioritize new biomolecular experiments is a worthwhile area of research [1].
The Gene Ontology project (GO) is a bioinformatics initiative to characterize all the important features of genes and gene products within a cell [2] [3]. GO is composed of three controlled vocabularies structured as mostly-separate sub-ontologies: biological processes, cellular components, and molecular functions. Each GO sub-ontology is structured as a directed acyclic graph of features (nodes) and ontological relationships (edges). In January 2014, GO contained about 39,000 terms, with more than 25,450 biological process, 3,350 cellular component, and 9,650 molecular function terms. However, GO annotations are constantly being revised and added as new experimental evidence is produced.
One approach to improving gene function annotation databases like GO is to use patterns in the known annotations to predict new annotations. This can be viewed as a matrix-completion problem, in which one attempts to recover a matrix with some underlying structure from noisy observations. Machine learning algorithms have proved very successful in similar applications, such as the famous million-dollar Netflix Prize awarded in 2009. Many machine learning algorithms have already been applied to gene function annotation ([4] [5] [6] [7] [8] [9]), but to the best of our knowledge deep autoencoder neural networks have not. Deep networks with multiple hidden layers have an advantage over shallow machine learning methods in that they can model complex data more efficiently. They have proven their usefulness in fields such as vision and speech recognition, and promise to yield similar performance gains in other machine learning applications with complex underlying structure in the data.
A popular algorithm for matrix completion is the truncated singular value decomposition method (tSVD). Khatri et al. first used this method for GO annotation prediction [10], and one of the authors of this work has extended their method with gene clustering and term-term similarity weights [11] [12]. However, the tSVD method can be viewed as a special linear case of a more general approach using autoencoders [13] [14] [15]. Deep, non-linear autoencoder neural networks have more expressive power, and may be better suited for discovering the underlying patterns in gene function annotation data.

In this paper, we summarize the tSVD and autoencoder methods, show how they can be used to predict annotations, and compare their performance on six separate GO datasets.
2. SYSTEM AND METHODS
In this section we describe the two annotation-prediction algorithms used in this paper: Truncated Singular Value Decomposition and the Autoencoder Neural Network.
2.1 Truncated Singular Value Decomposition
Truncated Singular Value Decomposition (tSVD) [16] is a matrix factorization method that produces a low-rank approximation to a matrix. Define $A_d \in \{0,1\}^{m \times n}$ to be a matrix of annotations. The $m$ rows of $A_d$ correspond to genes, while the $n$ columns correspond to GO features, such that

$$A_d(i, j) = \begin{cases} 1 & \text{if gene } i \text{ is annotated with feature } j, \\ 0 & \text{otherwise.} \end{cases} \quad (1)$$
When features are organized into ontologies, sometimes only the most specific feature is specified, and the more general features (ancestors) are implicit. Thus, in this work we consider a modified matrix $A$ defined as

$$A(i, j) = \begin{cases} 1 & \text{if gene } i \text{ is annotated with feature } j \text{ or with any descendant of } j, \\ 0 & \text{otherwise.} \end{cases} \quad (2)$$
The $i$-th row of the $A$ matrix ($a_i^T$) contains all the direct and indirect annotations of gene $i$. The $j$-th column encodes the list of genes that have been annotated (directly or indirectly) to feature $j$. This process is sometimes defined as annotation unfolding [17].
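To make the unfolding step concrete, here is a minimal sketch in Python/NumPy. It assumes the ontology DAG is supplied as a mapping from each term index to its parent term indices; the `parents` dict and the function name are illustrative only, not part of the paper's implementation:

```python
import numpy as np

def unfold(A_d, parents):
    """Propagate direct annotations (Eq. 1) upward to every ancestor,
    producing the unfolded matrix A of Eq. 2.

    A_d     : (m, n) 0/1 NumPy array of direct gene-to-term annotations
    parents : dict mapping term index j to a list of parent term indices
              (the ontology DAG); assumed acyclic, as in GO
    """
    anc = {}  # memoized ancestor sets

    def ancestors(j):
        if j not in anc:
            s = set()
            for p in parents.get(j, []):
                s.add(p)
                s |= ancestors(p)
            anc[j] = s
        return anc[j]

    A = A_d.copy()
    # for every direct annotation (i, j), also annotate gene i
    # with every ancestor of term j
    for i, j in zip(*np.nonzero(A_d)):
        for a in ancestors(j):
            A[i, a] = 1
    return A
```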
Predictions are produced by computing the SVD of the matrix $A$ and truncating the less-significant singular values. The SVD of the matrix $A$ is given by

$$A = U \Sigma V^T \quad (3)$$

where $U$ is an $m \times m$ unitary matrix (i.e., $U^T U = I$), $\Sigma$ is a non-negative diagonal matrix of size $m \times n$, and $V^T$ is an $n \times n$ unitary matrix (i.e., $V^T V = I$). Conventionally, the entries along the diagonal of $\Sigma$ (the singular values) are sorted in non-increasing order. The number $r \le p$ of non-zero singular values is equal to the rank of the matrix $A$, where $p = \min(m, n)$. For a positive integer $k < r$, the tSVD matrix $\tilde{A}$ is given by

$$\tilde{A} = U_k \Sigma_k V_k^T \quad (4)$$

where $U_k$ ($V_k$) is an $m \times k$ ($n \times k$) matrix achieved by retaining the first $k$ columns of $U$ ($V$), and $\Sigma_k$ is a $k \times k$ diagonal matrix with the $k$ largest singular values along the diagonal. The decomposition of the matrices and the difference between SVD and tSVD are represented in Fig. 1. The matrix $\tilde{A}$ is the optimal rank-$k$ approximation of $A$, i.e., the one that minimizes the norm (either the spectral norm or the Frobenius norm) $\|A - \tilde{A}\|$ subject to the rank constraint.
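As an illustration, Equation (4) can be computed in a few lines with NumPy's SVD routine. This is a generic sketch of the tSVD reconstruction, not the exact code used in the paper; the toy matrix is made up for demonstration:

```python
import numpy as np

def tsvd_predict(A, k):
    """Rank-k tSVD reconstruction of the unfolded annotation matrix A (Eq. 4)."""
    # full_matrices=False gives U: (m, p), s: (p,), Vt: (p, n), p = min(m, n)
    U, s, Vt = np.linalg.svd(A.astype(float), full_matrices=False)
    # keep the k largest singular values (already sorted in non-increasing order)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# toy example: 4 genes x 3 terms
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
A_tilde = tsvd_predict(A, k=2)   # real-valued scores, one per gene-term pair
```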
Figure 1: An illustration of the Singular Value Decomposition (upper green image) and the Truncated SVD reconstruction (lower blue image) of the $A$ matrix. In the classical SVD decomposition, $A \in \{0,1\}^{m \times n}$, $U \in \mathbb{R}^{m \times m}$, $\Sigma \in \mathbb{R}^{m \times n}$, $V^T \in \mathbb{R}^{n \times n}$. In the truncated decomposition, where $k \in \mathbb{N}$ is the truncation level, $U_k \in \mathbb{R}^{m \times k}$, $\Sigma_k \in \mathbb{R}^{k \times k}$, $V_k^T \in \mathbb{R}^{k \times n}$, and the output matrix $\tilde{A} \in \mathbb{R}^{m \times n}$.
The matrix $\tilde{A}$ is real-valued and can be interpreted as a model of the noisy, incomplete observations. It can be used to predict both inaccuracies and missing gene functions: a large value of $\tilde{a}_{ij}$ suggests that gene $i$ should be annotated with term $j$, whereas a value close to zero suggests the opposite. The choice of the truncation parameter $k$ controls the complexity of the model and affects the predictions. Khatri et al. use a fixed value of $k = 500$ in [10] [18] [19], while one of the authors of this paper has developed a new discrete optimization algorithm to select the best truncation level on the basis of the ROC AUCs, described in [20].
To better understand why $\tilde{A}$ can be used to predict gene-to-term annotations, we note that an alternative expression of Equation (4) can be obtained using basic linear algebra manipulations:

$$\tilde{A} = A V_k V_k^T \quad (5)$$
Additionally, the SVD of the matrix $A$ is related to the eigen-decomposition of the symmetric matrices $T = A^T A$ and $G = A A^T$. The columns of $V_k$ ($U_k$) are a set of $k$ eigenvectors corresponding to the $k$ largest eigenvalues of the matrix $T$ ($G$). The matrix $T$ has a simple interpretation in our context. In fact,

$$T(j_1, j_2) = \sum_{i=1}^{m} A(i, j_1) \cdot A(i, j_2) \quad (6)$$
i.e., $T(j_1, j_2)$ is the number of genes annotated with both terms, $j_1$ and $j_2$. Consequently, $T(j_1, j_2)$ indicates the (un-normalized) correlation between term pairs, and it can be interpreted as a similarity score of the terms $j_1$ and $j_2$, the computation of which is based exclusively on the use of these terms in available annotations. The eigenvectors of $T$ (i.e., the columns of $V_k$) are a reduced set of eigen-terms. Intuitively, if two terms co-occur frequently, they are likely to be mapped to the same eigen-term. Based on Equation (5), the $i$-th row of $\tilde{A}$ can be written as

$$\tilde{a}_i^T = a_i^T V_k V_k^T \quad (7)$$

Figure 2: An autoencoder neural network with d
hidden layers. The number of input units is equal to
the number of output units, while there are usually
fewer units in each hidden layer.
Thus, the original annotation profile is first transformed into the eigen-term domain, retaining only the first $k$ eigen-terms through the multiplication with $V_k$, and then mapped back to the original domain by means of $V_k^T$. This corresponds to projecting the original vector $a_i^T$ onto the $k$-dimensional subspace spanned by the columns of $V_k$.
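The identities above are easy to verify numerically. The following sketch, reusing the toy matrix from the earlier NumPy example, checks that Equations (4) and (5) give the same reconstruction and builds the term-term co-occurrence matrix $T$ of Equation (6):

```python
import numpy as np

# same toy annotation matrix as above: 4 genes x 3 terms
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Vk = Vt[:k, :].T                       # (n, k); its columns are the eigen-terms

# Equation (4) and Equation (5) give the same rank-k reconstruction
lhs = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
rhs = A @ Vk @ Vk.T
assert np.allclose(lhs, rhs)

# Equation (6): T[j1, j2] = number of genes annotated with both terms
T = A.T @ A
```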
2.2 Autoencoder Neural Network
An autoencoder is a feed-forward artificial neural network with the same input and target output. A small hidden layer in an autoencoder network creates an information bottleneck, forcing the network to compress the data into a low-dimensional representation. As with the tSVD method, this model of the data can be used to make predictions.

For a simple autoencoder with a single hidden layer, the vector of hidden unit activities, $h$, is given by

$$h = f(W_e \cdot a + bias_e) \quad (8)$$

where $f$ is the activation function (we use the logistic sigmoid function in this work), $W_e$ is a parameter matrix, and $bias_e$ is a vector of bias parameters. The hidden representation of the data is then mapped back into the space of $a$ using the decoding function

$$\hat{a} = f(W_d \cdot h + bias_d) \quad (9)$$

where $W_d$ is the decoding matrix and $bias_d$ is a vector of bias parameters. We learn the parameters of the autoencoder by performing stochastic gradient descent to minimize the reconstruction error, the MSE between $a$ and $\hat{a}$:

$$MSE(a, \hat{a}) = \|a - \hat{a}\|_2^2 = \|a - (W_d \cdot h + bias_d)\|_2^2 \quad (10)$$
When the hidden layer has fewer dimensions than $a$, the autoencoder learns a compressed representation of the training data. In fact, an autoencoder with $k$ linear hidden units will learn to project the data onto its first $k$ principal components, and the decoded data matrix is exactly the tSVD matrix with the top $k$ singular values [14]. Non-linear hidden units allow an autoencoder to learn more complex encoding functions, as do additional hidden layers.

As in the tSVD approach, the matrix $A$ is an array of $m$ gene profiles with $n$ possible features defined in Equation 2, such that gene profile $a_i$ is the $i$-th row of $A$. An autoencoder is trained to learn these gene profiles and produces a prediction matrix $\tilde{A}$ as described in Fig. 3.
Given the input matrix $A \in \{0,1\}^{m \times n}$, where rows and columns correspond to genes and features, respectively:

1. Fix a number $h$ of hidden units ($h \in \mathbb{N}$, $h < m$) and a number $d$ of hidden layers ($d \in \{1, \ldots, maxhl\}$).
2. Training: for each gene profile $a_i$ of $A$, where $i \in [1, m]$:
   (a) for each training iteration:
      i. for each of the $d$ hidden layers: compute the hidden activation $h_i$ from the input $a_i$ (Equation 8)
      ii. compute the reconstructed output $\hat{a}_i$ from the hidden activation $h_i$ (Equation 9)
      iii. compute the error gradient (Equation 10)
      iv. back-propagate the error gradient to update the weight parameters
3. Testing: for each gene profile $a_i$ of $A$, where $i \in [1, m]$:
   (a) autoencode $a_i$ and produce $\hat{a}_i$
   (b) set $\hat{a}_i$ as the $i$-th row of the output matrix $\tilde{A}$

Figure 3: Overview of the autoencoder neural network algorithm.
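The procedure in Fig. 3 can be sketched in plain NumPy. The original experiments used Torch7 (Section 2.4); the single-hidden-layer reimplementation below only illustrates Equations (8)-(10) with the stated initialization, learning rate, and L2 regularization, and the default hyper-parameter values (`h`, `l2`) are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Autoencoder:
    """Single-hidden-layer autoencoder implementing Equations (8)-(10)."""

    def __init__(self, n, h, lr=0.01, l2=1e-4):      # l2 strength is a placeholder
        # Section 2.4: weights initialized uniformly from [0, 1]
        self.We = rng.uniform(0.0, 1.0, (h, n))      # encoder weights
        self.be = np.zeros(h)
        self.Wd = rng.uniform(0.0, 1.0, (n, h))      # decoder weights
        self.bd = np.zeros(n)
        self.lr, self.l2 = lr, l2

    def forward(self, a):
        h = sigmoid(self.We @ a + self.be)           # Eq. (8)
        a_hat = sigmoid(self.Wd @ h + self.bd)       # Eq. (9)
        return h, a_hat

    def sgd_step(self, a):
        h, a_hat = self.forward(a)
        # gradient of the squared error (Eq. 10) through the output sigmoid
        d_out = 2.0 * (a_hat - a) * a_hat * (1.0 - a_hat)
        d_hid = (self.Wd.T @ d_out) * h * (1.0 - h)
        self.Wd -= self.lr * (np.outer(d_out, h) + self.l2 * self.Wd)
        self.bd -= self.lr * d_out
        self.We -= self.lr * (np.outer(d_hid, a) + self.l2 * self.We)
        self.be -= self.lr * d_hid

def autoencoder_predict(A, h=50, iters=25):
    """Train on the rows of A (Fig. 3, Training), then reconstruct every
    row to build the prediction matrix (Fig. 3, Testing)."""
    m, n = A.shape
    ae = Autoencoder(n, h)
    for _ in range(iters):
        for a in A.astype(float):
            ae.sgd_step(a)
    return np.vstack([ae.forward(a)[1] for a in A.astype(float)])
```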
2.3 Predictions
The tSVD and autoencoder both provide a prediction matrix $\tilde{A}$ of real values, with larger values indicating a higher predicted likelihood. For an ROC curve analysis, only the relative ordering of these predictions is relevant. To make binary predictions, we set a threshold $\tau$ such that $\tilde{A}(i, j) > \tau$ is interpreted as a prediction that gene $i$ should be annotated with feature $j$.
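In code, both the thresholding rule and the ranking of candidate annotations might look as follows. This is a sketch under the assumption that `A` is the unfolded 0/1 matrix and `A_tilde` the prediction matrix from either method:

```python
import numpy as np

def binary_predictions(A_tilde, tau):
    """Binarize the prediction matrix with threshold tau (Section 2.3)."""
    return A_tilde > tau

def top_candidates(A, A_tilde, n_top=100):
    """Rank gene-term pairs that are 0 in A by predicted score, highest first;
    these are the candidate new annotations."""
    scores = np.where(A == 0, A_tilde, -np.inf)    # ignore known annotations
    order = np.argsort(scores, axis=None)[::-1][:n_top]
    return [tuple(np.unravel_index(ix, A.shape)) for ix in order]
```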
2.4 Autoencoder Training Details
Autoencoder neural networks were trained with the free GPU-accelerated software package Torch7 [21], using stochastic gradient descent with a learning rate of 0.01 for 25 iterations. L2 regularization was applied to all weights, which were initialized randomly from the uniform distribution over [0, 1]. The hidden unit activation function is the logistic sigmoid.
2.5 Datasets
The GO database contains annotation datasets for a variety of species, and for each of the three GO sub-ontologies: Biological Processes (BP), Molecular Functions (MF), and Cellular Components (CC). We focused on the Bos taurus (cattle) and Gallus gallus (red junglefowl) gene sets, which are available from the Genomic and Proteomic Data Warehouse (GPDW) [22] [23]. We use the July 2009 version of the datasets for analyzing and selecting hyper-parameters, and the March 2013 version for comparing prediction algorithms. Table 1 describes the size and number of annotations in each version. We exclude all annotations that are flagged as IEA (inferred from electronic annotation) or ND (no biological data available), and all feature terms and genes that do not appear in both dataset versions.
Each sub-ontology is rooted at a term bearing the sub-ontology name (BP, CC, or MF). In January 2014, GO contained about 39,000 terms describing gene and gene product features, with more than 25,450 BP, 9,650 MF, and 3,350 CC terms. However, these annotations are far from complete, and new annotations are added regularly; over a third of the biological process annotations have been added within the last four years.
3. RESULTS AND DISCUSSION
We perform two separate experiments. First, we analyze the effects of hyper-parameters for both the tSVD and autoencoder algorithms on a validation set created by holding out (removing) 10% of the annotations from the July 2009 database; then we test the prediction algorithms on new annotations that were added in the 2013 version. In both cases, the goal is to identify missing annotations within the large set of negative training examples. Fig. 4 visually describes the analysis procedure.
Figure 4: A flowchart of our analysis, with the hyper-parameter selection and validation procedure on the left, and the test procedure on the right. A rounded rectangle represents an operation, repeated in a cycle if attached to a sharp rectangle. A parallelogram represents an output production step, and a cylinder represents an interaction with the database.
3.1 Hyper-Parameter Analysis
For tSVD, the number of singular values is a hyper-parameter that determines the rank of the final prediction matrix, and it is usually chosen through cross-validation. In an autoencoder network, the analogous hyper-parameter is the number of hidden units. These hyper-parameters control the complexity of the model; keeping a large number of singular values or using a large number of hidden units yields a very accurate reconstruction of the input data matrix, but overfits to noise, such as missing annotations and inaccuracies. Figure 5 and Fig. 6 show that there is often an optimal hyper-parameter of this type. The best hyper-parameters for each dataset are shown in Table 2.

The curves for each type of sub-ontology behave similarly. For the Cellular Component annotation datasets, the autoencoder algorithm always outperforms tSVD, regardless of the number of singular values. For the Molecular Function datasets, the autoencoder and tSVD have similar AUCs when the number of singular values is in the range [20, 50], while autoencoder networks outperform tSVD in the other intervals. For the Biological Process datasets, the autoencoders outperform tSVD only when using the maximum possible number of hidden units.
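A hyper-parameter sweep of this kind can be sketched as follows, using scikit-learn's `roc_auc_score` (an assumed dependency). The hold-out protocol (hide 10% of the 1-entries, score how well they are recovered against the remaining zeros) follows the description above, though the exact masking details of the paper's experiments may differ:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def holdout_auc(A, reconstruct, ks, frac=0.1, seed=0):
    """Hide `frac` of the 1-entries of A, reconstruct with each setting in `ks`,
    and score how well the hidden annotations are recovered (ROC AUC).
    `reconstruct(A_train, k)` can be any method above (tSVD, autoencoder)."""
    rng = np.random.default_rng(seed)
    ones = np.argwhere(A == 1)
    held = ones[rng.choice(len(ones), size=int(frac * len(ones)), replace=False)]
    A_train = A.copy()
    A_train[held[:, 0], held[:, 1]] = 0

    # labels: held-out 1s (positives) vs. the true 0s of A; known 1s excluded
    mask = (A_train == 0)
    y_true = A[mask]
    return {k: roc_auc_score(y_true, reconstruct(A_train, k)[mask]) for k in ks}

# e.g. holdout_auc(A, tsvd_predict, ks=[10, 50, 90])
#      holdout_auc(A, autoencoder_predict, ks=[100, 300, 465])
```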
3.2 Predictive Accuracy
We test the tSVD and autoencoder algorithms on a set of annotations added to the database between July 2009 and March 2013. Training and testing were performed on the unfolded matrices described in Equation 2, to eliminate the possibility of trivial predictions. The performance metric is the percentage of the top 100 predictions from each method that were added to the database during this period. The results are displayed in Table 3, along with results from four other state-of-the-art algorithms from the computational gene annotation literature:
1. tSVD with gene clustering (SIM1) [24] [25]
2. tSVD with gene clustering and term-term similarity
weights (SIM2) [24] [25]
3. Probabilistic Latent Semantic Analysis (pLSA) [26]
4. Latent Dirichlet Allocation (LDA) [27]
Overall, the tSVD-based techniques (tSVD, SIM1, SIM2) achieve similar performance, and LDA appears comparable to these methods. The pLSA algorithm performs slightly better than these methods on most of the datasets, and the autoencoder networks are consistently the best. The autoencoder networks improve performance by +6% to +36% with respect to the second-best method.
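The top-100 metric used in Table 3 can be expressed directly in terms of the two database versions. A sketch, reusing `top_candidates` from the Section 2.3 snippet, and assuming the gene and term indices of the two matrices are already aligned:

```python
def top100_hit_rate(A_2009, A_2013, A_tilde, n_top=100):
    """Fraction of the n_top highest-scoring candidate annotations
    (gene-term pairs that are 0 in the 2009 matrix) that appear as 1
    in the 2013 matrix."""
    candidates = top_candidates(A_2009, A_tilde, n_top)
    hits = sum(A_2013[i, j] == 1 for (i, j) in candidates)
    return hits / n_top
```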
3.3 Novel Predictions
We examine the predicted annotations with the highest likelihood scores that are not already present in the GO database. Many of the predicted annotations are rather obvious high-level descriptive features such as cellular process, so we list the three interesting predictions with the highest likelihood in Table 4, where we define an interesting prediction as an annotation whose distance from the root node of the ontology tree is greater than two.
4. CONCLUSIONS
Gene function annotation databases are an essential tool in biomedical research, yet existing databases are incomplete and contain inaccuracies. In this work, we have shown that deep autoencoder neural networks achieve better performance on GO annotation prediction than other standard machine learning methods, including the popular truncated singular value decomposition. The approach has numerous advantages: (1) autoencoders can be trained online with very large datasets, (2) they can be trained quickly using graphics processors, and (3) the number and size of the hidden layers provide an easy way of controlling the complexity of the model. Future work will address advantages and issues related to the application of the same methods to the prediction of multi-terminologies, not only annotations.

Table 1: Quantitative characteristics of the considered annotation datasets in the July 2009 database version versus the March 2013 database version used for testing. Numbers do not include annotations inferred from electronic annotations (IEA), those for which no biological data is available (ND), obsolete terms, or obsolete genes. #gs is the number of genes; #fs is the number of biological function features; #as is the number of annotations; ∆ is the difference in the number of annotations of the #gs genes and the #fs features between the two database versions, and ∆% is the percentage difference.

                          July 2009          March 2013      #as comparison
Dataset             #gs    #fs     #as         #as             ∆        ∆%
Bos taurus CC       497     493    8,003        9,683         1,680    20.99%
Bos taurus MF       543     856    4,295        6,394         2,099    48.87%
Bos taurus BP       512   2,719   17,145       27,075         9,930    57.92%
Gallus gallus CC    260     344    3,717        3,798            81     2.18%
Gallus gallus MF    309     501    2,358        2,654           256    10.86%
Gallus gallus BP    275   1,824    8,350       11,984         3,634    43.52%
Figure 5: AUC values for the tSVD and autoencoder predictions with different hyper-parameter choices (number of singular values and number of hidden units, respectively) for Bos taurus Cellular Components (5a), Molecular Functions (5b), and Biological Processes (5c). For comparison purposes, we use an autoencoder with a single hidden layer.

Figure 6: AUC values for the tSVD and autoencoder predictions with different hyper-parameter choices (number of singular values and number of hidden units, respectively) for Gallus gallus Cellular Components (6a), Molecular Functions (6b), and Biological Processes (6c). For comparison purposes, we use an autoencoder with a single hidden layer.
Table 2: Hyper-parameters were optimized separately for each algorithm and dataset. We select the number k of singular values for tSVD; the number of clusters c for the SIM1 and SIM2 methods as described in [24]; the number of topics t in pLSA as described in [26]; the number of topics t in LDA as described in [27]; and the number of hidden units h in each of d hidden layers for the autoencoder (AE) algorithm.

                   tSVD   SIM   pLSA/LDA       AE
Dataset              k     c        t       h      d
Bos taurus CC       90     3       12      465     2
Bos taurus MF       71     3       13      302     3
Bos taurus BP      241     5      112      500     2
Gallus gallus CC    51     3       25      258     3
Gallus gallus MF    41     2       74      271     3
Gallus gallus BP   111     3      126      253     2

References
- M. Ashburner et al. Gene Ontology: tool for the unification of biology. Nature Genetics, 2000.
- G. H. Golub and C. Reinsch. Singular value decomposition and least squares solutions. Numerische Mathematik, 1970.
- R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like Environment for Machine Learning. NIPS BigLearn Workshop, 2011.
- P. Baldi and K. Hornik. Neural networks and principal component analysis: learning from examples without local minima. Neural Networks, 1989.
- H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 1988.