InParanoid 7: new algorithms and tools for eukaryotic orthology analysis
Gabriel Östlund, Thomas Schmitt, Kristoffer Forslund, Tina Köstler, David N. Messina, Sanjit Roopra, Oliver Frings, Erik L. L. Sonnhammer
TL;DR: A two-pass BLAST approach was developed for homology assignment that makes use of high-precision compositional score matrix adjustment but avoids the alignment truncation that sometimes follows.
Abstract:
The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and more sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare these resources because they lack standardized source data and use incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.
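The two-pass idea in the abstract can be illustrated with a minimal sketch. It is not InParanoid's actual code, and it assumes the passes combine as follows: a first pass with compositional adjustment decides which protein pairs count as homologs, and a second pass without adjustment supplies the untruncated alignment for those pairs only. The `Hit` record, the function name `two_pass_hits`, the `min_score` cutoff and the example hits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise hit; a hypothetical, simplified record."""
    query: str
    subject: str
    bit_score: float
    q_start: int
    q_end: int


def two_pass_hits(adjusted_hits, unadjusted_hits, min_score=40.0):
    """Combine two search passes (assumed scheme, see lead-in).

    adjusted_hits:   hits from the pass with compositional adjustment,
                     used only to decide which pairs are accepted.
    unadjusted_hits: hits from the pass without adjustment, used for
                     the alignment coordinates and scores.
    """
    accepted = {
        (h.query, h.subject)
        for h in adjusted_hits
        if h.bit_score >= min_score
    }
    best = {}
    for h in unadjusted_hits:
        key = (h.query, h.subject)
        if key not in accepted:
            continue  # pair was not supported by the high-precision pass
        if key not in best or h.bit_score > best[key].bit_score:
            best[key] = h
    return best


# Invented example: pass 1 truncates the A-X alignment and rejects A-Y;
# pass 2 reports both pairs at full length.
pass1 = [Hit("A", "X", 120.0, 10, 80)]
pass2 = [Hit("A", "X", 150.0, 1, 200), Hit("A", "Y", 90.0, 5, 60)]
result = two_pass_hits(pass1, pass2)
for (query, subject), hit in result.items():
    print(query, subject, hit.q_start, hit.q_end, hit.bit_score)
```

In this sketch the adjusted pass acts purely as a filter, so a pair that scores well only without adjustment (here A-Y) is dropped, while an accepted pair keeps the longer alignment from the unadjusted pass.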
Citations
Journal Article
OrthoFinder: phylogenetic orthology inference for comparative genomics
David M. Emms, Steven L. Kelly
TL;DR: This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics.
Posted Content
OrthoFinder: phylogenetic orthology inference for comparative genomics
David M. Emms, Steven L. Kelly
TL;DR: This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomic statistics.
Journal Article
The genome of woodland strawberry (Fragaria vesca)
Vladimir Shulaev, Daniel J. Sargent, Ross N. Crowhurst, Todd C. Mockler, Otto Folkerts, Arthur L. Delcher, Pankaj Jaiswal, Keithanne Mockaitis, Aaron Liston, Shrinivasrao P. Mane, Paul Burns, Thomas M. Davis, Janet P. Slovin, Nahla V. Bassil, Roger P. Hellens, Clive Evans, Tim Harkins, Chinnappa D. Kodira, Brian Desany, Oswald Crasta, Roderick V. Jensen, Andrew C. Allan, Todd P. Michael, João C. Setubal, Jean-Marc Celton, D. Jasper G. Rees, Kelly P. Williams, Sarah H. Holt, Juan Jairo Ruiz Rojas, Mithu Chatterjee, Bo Liu, Herman Silva, Lee A. Meisel, Avital Adato, Sergei A. Filichkin, Michela Troggio, Roberto Viola, Tia-Lynn Ashman, Hao Wang, Palitha Dharmawardhana, Justin Elser, Rajani Raja, Henry D. Priest, Douglas W. Bryant, Samuel E. Fox, Scott A. Givan, Larry J. Wilhelm, Sushma Naithani, Alan Christoffels, David Y. Salama, Jade Carter, Elena Lopez Girona, Anna Zdepski, Wenqin Wang, Randall A. Kerstetter, Wilfried Schwab, Schuyler S. Korban, Jahn Davik, Amparo Monfort, Beatrice Denoyes-Rothan, Pere Arús, Ron Mittler, Barry S. Flinn, Asaph Aharoni, Jeffrey L. Bennetzen, Steven L. Salzberg, Allan W. Dickerman, Riccardo Velasco, Mark Borodovsky, Richard E. Veilleux, Kevin M. Folta
TL;DR: New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted, and macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes.
Journal Article
A global genetic interaction network maps a wiring diagram of cellular function
Michael Costanzo, Benjamin VanderSluis, Elizabeth N. Koch, Anastasia Baryshnikova, Carles Pons, Guihong Tan, Wen Wang, Matej Usaj, Julia Hanchard, Susan D. Lee, Vicent Pelechano, Erin B. Styles, Maximilian Billmann, Jolanda van Leeuwen, Nydia Van Dyk, Zhen Yuan Lin, Elena Kuzmin, Justin Nelson, Jeff S. Piotrowski, Tharan Srikumar, Sondra Bahr, Yiqun Chen, Raamesh Deshpande, Christoph F. Kurat, Sheena C. Li, Zhijian Li, Mojca Mattiazzi Usaj, Hiroki Okada, Natasha Pascoe, Bryan Joseph San Luis, Sara Sharifpoor, Emira Shuteriqi, Scott W. Simpkins, Jamie Snider, Harsha Garadi Suresh, Yizhao Tan, Hongwei Zhu, Noël Malod-Dognin, Vuk Janjić, Natasa Przulj, Olga G. Troyanskaya, Igor Stagljar, Tian Xia, Yoshikazu Ohya, Anne-Claude Gingras, Brian Raught, Michael Boutros, Lars M. Steinmetz, Claire Moore, Adam P. Rosebrock, Amy A. Caudy, Chad L. Myers, Brenda J. Andrews, Charles Boone
TL;DR: A global genetic interaction network highlights the functional organization of a cell, provides a resource for predicting gene and pathway function, and shows how coherent sets of negative or positive genetic interactions connect protein complexes and pathways to map a functional wiring diagram of the cell.
Journal Article
A census of human soluble protein complexes.
Pierre C. Havugimana, G. Traver Hart, Tamás Nepusz, Haixuan Yang, Andrei L. Turinsky, Zhihua Li, Peggy I. Wang, Daniel R. Boutz, Vincent Fong, Sadhna Phanse, Mohan Babu, Stephanie A. Craig, Pingzhao Hu, Cuihong Wan, James Vlasblom, Vaqaar Un Nisa Dar, Alexandr Bezginov, Greg W. Clark, Gabriel C. Wu, Shoshana J. Wodak, Elisabeth R. M. Tillier, Alberto Paccanaro, Edward M. Marcotte, Andrew Emili
TL;DR: Whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.