Papers by Mikael Bodén published in 2005

Open Access

Journal Article

Prediction of subcellular localization using sequence-biased recurrent networks

Mikael Bodén¹, John Hawkins¹

15 May 2005, Bioinformatics

TL;DR: This work contrasts the use of feed forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model, and demonstrates that recurrent networks improve the overall prediction performance.

Abstract:
Motivation: Targeting peptides direct nascent proteins to their specific subcellular compartment. Knowledge of targeting signals enables informed drug design and reliable annotation of gene products. However, due to the low similarity of such sequences and the dynamical nature of the sorting process, the computational prediction of subcellular localization of proteins is challenging.
Results: We contrast the use of feed forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model. The models are evaluated in terms of performance at the residue level and at the sequence level, and demonstrate that recurrent networks improve the overall prediction performance. Compared to the original results reported for TargetP, an ensemble of the tested models increases the accuracy by 6 and 5% on non-plant and plant data, respectively.
Availability: The Protein Prowler incorporating the recurrent network predictor described in this paper is available online at http://pprowler.imb.uq.edu.au/
Contact: mikael@itee.uq.edu.au
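
The abstract does not say how the tested models were combined into an ensemble. As a hedged illustration only, one common scheme is to average each model's per-class scores for a protein and choose the highest-scoring class. The class labels, the scores, and the function name `ensemble_predict` below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def ensemble_predict(model_outputs: List[Dict[str, float]]) -> str:
    """Average per-class scores across models and return the top class.

    Assumption: every model emits one score per localisation class for the
    same protein. This is a generic averaging ensemble, not necessarily the
    combination scheme used in the paper.
    """
    if not model_outputs:
        raise ValueError("need at least one model output")
    classes = model_outputs[0].keys()
    averaged = {
        c: sum(out[c] for out in model_outputs) / len(model_outputs)
        for c in classes
    }
    # Pick the class with the highest mean score.
    return max(averaged, key=averaged.get)


# Hypothetical scores from a feedforward and a recurrent model for one protein.
ff = {"mTP": 0.40, "SP": 0.35, "other": 0.25}
rnn = {"mTP": 0.20, "SP": 0.60, "other": 0.20}
print(ensemble_predict([ff, rnn]))  # SP
```

Here the two models disagree, and the averaged scores (mTP 0.30, SP 0.475, other 0.225) settle the prediction.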

146 citations

Journal Article

The Applicability of Recurrent Neural Networks for Biological Sequence Analysis

John Hawkins¹, Mikael Bodén¹•Institutions (1)

University of Queensland¹

01 Jul 2005, IEEE/ACM Transactions on Computational Biology and Bioinformatics

TL;DR: This paper argues that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset, and demonstrates that this bias can be exploited using a data set of protein sequences containing several classes of subcellular localization targeting peptides.

Abstract: Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feedforward networks, recurrent neural networks generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of a specific recurrent architecture becomes more critical.

45 citations

Proceedings Article

BLOMAP: An encoding of amino acids which improves signal peptide cleavage site prediction

Stefan Maetschke, Michael Towsey, Mikael Bodén

01 Jan 2005

TL;DR: A comparison of several standard encoding methods shows that, for cleavage site prediction, the frequently used orthonormal encoding is inferior to other methods.

Abstract: Research on cleavage site prediction for signal peptides has focused mainly on applying different classification algorithms to improve prediction accuracy. This paper addresses the fundamental issue of amino acid encoding, that is, how to represent amino acid sequences in the most beneficial way for machine learning algorithms. A comparison of several standard encoding methods shows that, for cleavage site prediction, the frequently used orthonormal encoding is inferior to other methods. The best results are achieved with a new encoding method named BLOMAP, based on the BLOSUM62 substitution matrix, combined with a Naive Bayes classifier.
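
For reference, orthonormal (one-hot) encoding, the baseline the abstract reports as inferior, represents each of the 20 standard amino acids as a 20-dimensional unit vector, and an input window is the concatenation of its residues' vectors. The sketch below assumes an arbitrary alphabet ordering; the names `orthonormal_encode` and `flatten` are illustrative. BLOMAP itself is derived from BLOSUM62 and is not reproduced here.

```python
from typing import List

# Arbitrary but fixed ordering of the 20 standard amino acids (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthonormal_encode(sequence: str) -> List[List[int]]:
    """Encode each residue as a 20-dimensional one-hot (orthonormal) vector."""
    vectors = []
    for residue in sequence.upper():
        if residue not in INDEX:
            raise ValueError(f"unsupported residue: {residue!r}")
        vec = [0] * len(AMINO_ACIDS)
        vec[INDEX[residue]] = 1
        vectors.append(vec)
    return vectors


def flatten(vectors: List[List[int]]) -> List[int]:
    """Concatenate per-residue vectors into one classifier input window."""
    return [x for vec in vectors for x in vec]


window = orthonormal_encode("AGA")
print(len(flatten(window)))  # 60
```

Because any two distinct one-hot vectors are orthogonal, this encoding carries no information about biochemical similarity between residues; a substitution-matrix-based encoding such as BLOMAP is one way to inject that information.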

30 citations

Journal Article

Improved access to sequential motifs: a note on the architectural bias of recurrent networks

Mikael Bodén¹, John Hawkins¹

University of Queensland¹

01 Mar 2005, IEEE Transactions on Neural Networks

TL;DR: By experimentation, it is shown that the bias of recurrent neural networks, recently analyzed by Tino et al. and by Hammer and Tino, offers superior access to motifs compared with the standardly used feedforward neural networks.

Abstract: For many biological sequence problems, the available data occupy only sparse regions of the problem space. To use machine learning effectively for the analysis of sparse data, we must employ architectures with an appropriate bias. By experimentation we show that the bias of recurrent neural networks, recently analyzed by Tino et al. and by Hammer and Tino, offers superior access to motifs (sequential patterns) compared with the feedforward neural networks standardly used in bioinformatics.
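
A toy illustration of the kind of bias referred to here: when the recurrent weights are small enough that the state update is a contraction, the network maps sequences that share a recent suffix to nearby states even without training, so the state space is organised by recent history. The weights `W` and `U`, the two-letter alphabet, and the sequences below are hypothetical; this is a sketch of the general effect, not a reproduction of the paper's experiments.

```python
import math
from typing import List, Sequence

# Hypothetical 2-unit network. Recurrent weights are kept small so the state
# update is a contraction: the max absolute row sum of W is 0.4 < 1.
W = [[0.3, 0.1],
     [0.1, 0.2]]
# Input weights for a two-letter alphabet (one vector per symbol).
U = {"A": [1.0, -0.5], "B": [-0.5, 1.0]}


def run(sequence: str) -> List[float]:
    """Return the final state of h_t = tanh(W h_{t-1} + U[x_t]), h_0 = 0."""
    h = [0.0, 0.0]
    for symbol in sequence:
        h = [
            math.tanh(sum(W[i][j] * h[j] for j in range(2)) + U[symbol][i])
            for i in range(2)
        ]
    return h


def max_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Max-norm distance between two states."""
    return max(abs(x - y) for x, y in zip(a, b))


suffix = "ABBABA"
# Different histories, same last six symbols: states end up very close.
same_suffix = max_dist(run("AAAA" + suffix), run("BBBB" + suffix))
# Same history, different last symbol: states end up far apart.
diff_last = max_dist(run("AAAAABBABA"), run("AAAAABBABB"))
print(same_suffix < 0.01, diff_last > 0.5)  # True True
```

Since tanh is 1-Lipschitz, each shared symbol shrinks the gap between two states by a factor of at most 0.4, so after six shared symbols the gap is below 2 × 0.4⁶ ≈ 0.008 regardless of the earlier history.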

12 citations

Book Chapter

Exploiting sequence dependencies in the prediction of peroxisomal proteins

Mark Wakabayashi, John Hawkins¹, Stefan Maetschke¹, Mikael Bodén¹

University of Queensland¹

06 Jul 2005

TL;DR: A range of machine learning algorithms are benchmarked, and it is shown that a classifier – based on the Support Vector Machine – produces more accurate results when dependencies between the conserved motif and the preceding section are exploited.

Abstract: Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct motifs at the end of the amino acid sequence. PTS1 peroxisomal proteins have a well conserved tripeptide at the C-terminal end. However, the preceding residues in the sequence arguably play a crucial role in targeting the protein to the peroxisome. Previous work in applying machine learning to the prediction of peroxisomal matrix proteins has failed to capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms, and show that a classifier – based on the Support Vector Machine – produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We publish an updated and rigorously curated data set that results in increased prediction accuracy of most tested models.
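
To make the motif/context distinction concrete, the sketch below checks for a C-terminal tripeptide matching the commonly cited PTS1 consensus [SAC][KRH][LM] and returns the residues that precede it. The consensus pattern is a simplification (real PTS1 signals vary more), and the `upstream` window length of 9, the function name `split_pts1`, and the example sequences are illustrative assumptions, not values from the paper.

```python
import re
from typing import Optional, Tuple

# Commonly cited PTS1 consensus at the extreme C-terminus (simplified):
# one of S/A/C, then one of K/R/H, then L or M.
PTS1_CONSENSUS = re.compile(r"[SAC][KRH][LM]$")


def split_pts1(sequence: str, upstream: int = 9) -> Optional[Tuple[str, str]]:
    """If the sequence ends in a consensus PTS1 tripeptide, return
    (upstream_residues, tripeptide); otherwise return None.

    `upstream` is an illustrative window length, not a value from the paper.
    """
    seq = sequence.upper()
    match = PTS1_CONSENSUS.search(seq)
    if match is None:
        return None
    start = match.start()
    return seq[max(0, start - upstream):start], seq[start:]


print(split_pts1("MSTAVQLGHKASKL"))  # ('TAVQLGHKA', 'SKL')
print(split_pts1("MSTAVQLGHKASKE"))  # None
```

A motif match alone is not evidence of peroxisomal targeting; the upstream residues returned here are the kind of context a classifier can combine with the tripeptide.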

5 citations

Proceedings Article

Predicting Peroxisomal Proteins

John Hawkins¹, Mikael Bodén¹

University of Queensland¹

01 Jan 2005

TL;DR: This paper reports on the development of an SVM classifier with a separately trained logistic output function that uses an input window containing 12 consecutive residues at the C-terminus and the amino acid composition of the full sequence to predict peroxisomal proteins.

Abstract: PTS1 proteins are peroxisomal matrix proteins that have a well-conserved targeting motif at the C-terminal end. However, this motif is also present in many non-peroxisomal proteins, so predicting peroxisomal proteins involves distinguishing fake PTS1 signals from genuine ones. In this paper we report on the development of an SVM classifier with a separately trained logistic output function. The model uses an input window containing 12 consecutive residues at the C-terminus and the amino acid composition of the full sequence. The final model gives a Matthews Correlation Coefficient of 0.77, representing an increase of 54% compared with the well-known PeroxiP predictor. We test the model by applying it to several proteomes of eukaryotes for which there is no evidence of a peroxisome, producing a false positive rate of 0.088%.
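
The abstract names the model inputs (a 12-residue C-terminal window plus whole-sequence amino acid composition) and the evaluation metric (Matthews Correlation Coefficient), but not how residues are encoded. The sketch below assumes a one-hot encoding of the window and an arbitrary alphabet ordering, giving 12 × 20 + 20 = 260 features; the SVM and its logistic output function are omitted. `features` and `mcc` are illustrative names.

```python
import math
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet ordering
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def features(sequence: str, window: int = 12) -> List[float]:
    """One-hot encode the last `window` residues (assumed encoding) and
    append the 20-dim amino acid composition of the whole sequence.
    Non-standard residues raise KeyError."""
    seq = sequence.upper()
    if len(seq) < window:
        raise ValueError("sequence shorter than the C-terminal window")
    vec: List[float] = []
    for residue in seq[-window:]:
        one_hot = [0.0] * 20
        one_hot[INDEX[residue]] = 1.0
        vec.extend(one_hot)
    counts = [0] * 20
    for residue in seq:
        counts[INDEX[residue]] += 1
    vec.extend(c / len(seq) for c in counts)
    return vec


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


x = features("MSTAVQLGHKASKL")
print(len(x))  # 260
print(mcc(tp=10, tn=10, fp=0, fn=0))  # 1.0
```

MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy when, as here, true PTS1 proteins are rare relative to look-alike non-peroxisomal sequences.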

5 citations

Computational biology and bioinformatics

Q. Yang, A. Schliep, C. Steinhoff, A. Schönhuth, A. Kundaje, M. Middendorf, F. Gao, C. Wiggins, C. Leslie, S. Kaski, Janne Nikkilä, J. Sinkkonen, Leo Lahti, C. Roos, J. Zhang, Wen Gao, J. Cai, S. He, R. Zeng, R. Chen, E. Keedwell, A. Narayanan, John Hawkins, Mikael Bodén, Biological Validation, M. Gustafsson, A. Lombardi, C. Demir, B. Yener

01 Jan 2005

2 citations

Proceedings Article

Heuristic Algorithm for Computing Reversal Distance with MultiGene Families via Binary Integer Programming

Jakkarin Suksawatchon¹, Chidchanok Lursinsap, Mikael Bodén

Chulalongkorn University¹

01 Jan 2005

TL;DR: A new heuristic algorithm is proposed to compute the reversal distance between two genomes with multigene families via the concept of binary integer programming without removing gene duplicates.

Abstract: Hannenhalli and Pevzner developed the first polynomial-time algorithm for the combinatorial problem of sorting signed genomic data by reversals. Their algorithm computes the minimum number of reversals required to rearrange one genome into another when there are no duplicated genes. In this paper, we show how to extend the Hannenhalli-Pevzner approach to genomes with multigene families. We propose a new heuristic algorithm that computes the reversal distance between two genomes with multigene families via binary integer programming, without removing gene duplicates. Experimental results on simulated and real biological data demonstrate that the proposed algorithm finds the reversal distance accurately.
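
Neither the Hannenhalli-Pevzner algorithm nor the paper's binary integer programming heuristic fits in a short sketch. The code below shows only the basic objects for the duplicate-free case: a genome as a signed permutation of 1..n, a reversal (reverse a segment and flip its signs), and the classical breakpoint lower bound on reversal distance. It does not handle multigene families; the function names are illustrative.

```python
from typing import List


def apply_reversal(perm: List[int], i: int, j: int) -> List[int]:
    """Reverse perm[i..j] (inclusive) and flip the signs of that segment,
    as a reversal does to a signed genome."""
    segment = [-g for g in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


def breakpoints(perm: List[int]) -> int:
    """Count breakpoints of a signed permutation of 1..n framed by 0 and n+1.
    A neighbouring pair (a, b) is an adjacency when b - a == 1."""
    framed = [0] + perm + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_lower_bound(perm: List[int]) -> int:
    """A reversal changes only the two pairs at its ends, so it removes at
    most two breakpoints: distance >= ceil(breakpoints / 2)."""
    return (breakpoints(perm) + 1) // 2


genome = [-2, -1, 3]
print(breakpoints(genome), reversal_lower_bound(genome))  # 2 1
print(apply_reversal(genome, 0, 1))  # [1, 2, 3]
```

In this example the bound is tight: one reversal of the first two genes sorts the genome.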

1 citation

Proceedings Article

Detecting residues in targeting peptides

Mikael Bodén¹, John Hawkins

University of Queensland¹

01 Jan 2005

TL;DR: This work can be seen as building upon the currently popular series of predictors SignalP and TargetP, by exploiting the inherent bias for sequential pattern recognition exhibited by recurrent networks.

Abstract: Knowledge of targeting signals is of immense importance for understanding the cellular processes by which proteins are sorted and transported. This paper presents a system of recurrent neural networks which demonstrate an ability to detect residues belonging to specific targeting peptides with greater accuracy than current feed forward models. The system can subsequently be used for determining sub-cellular localisation of proteins and for understanding the factors underlying translocation. The work can be seen as building upon the currently popular series of predictors SignalP and TargetP, by exploiting the inherent bias for sequential pattern recognition exhibited by recurrent networks.

1 citation

Proceedings Article

Predicting Structural Disruption of Proteins Caused by Crossover

Denis C. Bauer¹, Mikael Bodén¹, Ricarda Thier, Zheng Yuan

University of Queensland¹

01 Jan 2005

TL;DR: A machine learning model is presented that predicts a structural disruption score from a protein’s primary structure using a two-step approach; the results indicate that SCHEMA can be replaced with little loss of precision.

Abstract: We present a machine learning model that predicts a structural disruption score from a protein’s primary structure. SCHEMA was introduced by Frances Arnold and colleagues as a method for determining putative recombination sites of a protein on the basis of the full (PDB) description of its structure. The present method provides an alternative to SCHEMA that is able to determine the same score from sequence data only. Avoiding the need to resolve the full structure enables the exploration of as yet unresolved and even hypothetical sequences for protein design efforts. The SCHEMA score is derived from the primary structure in two steps: first predicting a secondary structure from the sequence, and then predicting the SCHEMA score from the predicted secondary structure. The correlation coefficient for the prediction is 0.88, which indicates that SCHEMA can be replaced with little loss of precision.
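
For context on the target quantity: as SCHEMA is commonly formulated, a chimera's disruption score counts residue pairs that are in contact in the parent structure and whose amino acid pair in the chimera does not occur at those positions in any parent. The sketch below computes that count from a precomputed contact list. The toy parents, contacts, and the name `schema_disruption` are hypothetical, and this computes the structure-based score itself rather than the paper's sequence-only prediction of it.

```python
from typing import Iterable, List, Tuple


def schema_disruption(
    chimera: str,
    parents: List[str],
    contacts: Iterable[Tuple[int, int]],
) -> int:
    """Count contacts (i, j) whose residue pair in the chimera does not occur
    at the same positions in any parent.

    Assumptions: all sequences are aligned and of equal length, and
    `contacts` lists residue pairs in contact in a parent structure.
    """
    disrupted = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((p[i], p[j]) != pair for p in parents):
            disrupted += 1
    return disrupted


parents = ["ACDE", "AKDF"]
contacts = [(0, 3), (1, 2), (1, 3)]
chimera = "ACDF"  # first three residues from parent 1, last from parent 2
print(schema_disruption(chimera, parents, contacts))  # 1
```

Only the contact (1, 3) is disrupted: the pair (C, F) appears in neither parent, whereas (A, F) and (C, D) are both inherited intact.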