Showing papers on "Munich Information Center for Protein Sequences" published in 2013


Journal ArticleDOI
TL;DR: This study investigates the effect of different genome annotations on RNA-seq quantification, provides guidelines for choosing a genome annotation based on research focus, and demonstrates that the choice of human genome annotation results in different gene expression estimates.
Abstract: Genome annotation is a crucial component of RNA-seq data analysis. Much effort has been devoted to producing an accurate and rational annotation of the human genome. An annotated genome provides a comprehensive catalogue of genomic functional elements. Currently, at least six human genome annotations are publicly available, including AceView Genes, Ensembl Genes, H-InvDB Genes, RefSeq Genes, UCSC Known Genes, and Vega Genes. Characteristics of these annotations differ because of variations in annotation strategies and information sources. When performing RNA-seq data analysis, researchers need to choose a genome annotation. However, the effect of genome annotation choice on downstream RNA-seq expression estimates is still unclear. This study (1) investigates the effect of different genome annotations on RNA-seq quantification and (2) provides guidelines for choosing a genome annotation based on research focus. We define the complexity of human genome annotations in terms of the number of genes, isoforms, and exons. This definition facilitates an investigation of potential relationships between complexity and variations in RNA-seq quantification. We apply several evaluation metrics to demonstrate the impact of genome annotation choice on RNA-seq expression estimates. In the mapping stage, the least complex genome annotation, RefSeq Genes, appears to have the highest percentage of uniquely mapped short sequence reads. In the quantification stage, RefSeq Genes results in the most stable expression estimates in terms of the average coefficient of variation over all genes. Stable expression estimates in the quantification stage translate to accurate statistics for detecting differentially expressed genes. We observe that RefSeq Genes produces the most accurate fold-change measures with respect to a ground truth of RT-qPCR gene expression estimates. 
Based on the observed variations in the mapping, quantification, and differential expression calling stages, we demonstrate that the selection of human genome annotation results in different gene expression estimates. When conducting research that emphasizes reproducible and robust gene expression estimates, a less complex genome annotation may be preferred. However, simpler genome annotations may limit opportunities for identifying or characterizing novel transcriptional or regulatory mechanisms. When conducting research that aims to be more exploratory, a more complex genome annotation may be preferred.
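The stability metric named in the abstract, the average coefficient of variation (CV) over all genes, can be illustrated with a minimal sketch. The abstract does not give the exact computation, so this assumes the CV of a gene is the sample standard deviation of its expression estimates across replicates divided by their mean, and that genes with a zero mean are skipped. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return sample standard deviation / mean, or None if the mean is zero."""
    m = mean(values)
    if m == 0:
        return None  # CV is undefined for a zero mean (assumed handling)
    return stdev(values) / m


def average_cv(expression):
    """Average the per-gene CV over all genes that have a defined CV.

    `expression` maps a gene name to its expression estimates across
    replicates (at least two values per gene).
    """
    cvs = []
    for replicates in expression.values():
        cv = coefficient_of_variation(replicates)
        if cv is not None:
            cvs.append(cv)
    return mean(cvs) if cvs else None


# Hypothetical toy data: two genes, three replicates each.
toy_expression = {
    "geneA": [10.0, 10.0, 10.0],  # perfectly stable, CV = 0
    "geneB": [5.0, 10.0, 15.0],   # mean 10, sample SD 5, CV = 0.5
}

print(average_cv(toy_expression))
```

Under these assumptions, a lower average CV for one annotation than for another would correspond to what the abstract calls more stable expression estimates.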

48 citations


Journal ArticleDOI
TL;DR: This review provides an overview of bioinformatics methods and tools for the analysis of metazoan mitochondrial genomes, with special emphasis on substitution models or data treatments that reduce certain systematic biases typical of metazoan mitogenomes.

39 citations


Journal ArticleDOI
TL;DR: The main novelties distinguishing PRIMOS from other secondary PPI databases are the reassessment of known PPIs and the capacity to validate personal experimental data using the authors' peer-reviewed, homology-based validation approach.
Abstract: Steady improvements in proteomics present a bioinformatic challenge to retrieve, store, and process the accumulating and often redundant amount of information. In particular, a large-scale comparison and analysis of protein–protein interaction (PPI) data requires tools for data interpretation as well as validation. At this juncture, the Protein Interaction and Molecule Search (PRIMOS) platform represents a novel web portal that unifies six primary PPI databases (BIND, Biomolecular Interaction Network Database; DIP, Database of Interacting Proteins; HPRD, Human Protein Reference Database; IntAct; MINT, Molecular Interaction Database; and MIPS, Munich Information Center for Protein Sequences) into a single consistent repository, which currently includes more than 196,700 redundancy-removed PPIs. PRIMOS supports three advanced search strategies centering on disease-relevant PPIs, on inter- and intra-organismal crosstalk relations (e.g., pathogen–host interactions), and on highly connected protein no...

11 citations


Journal ArticleDOI
01 Jan 2013

1 citation