The COG database: an updated version includes eukaryotes
Roman L. Tatusov, Natalie D. Fedorova, John D. Jackson, Aviva R. Jacobs, Boris Kiryutin, Eugene V. Koonin, Dmitri M. Krylov, Raja Mazumder, Sergei L. Mekhedov, Anastasia N. Nikolskaya, B Sridhar Rao, Sergei Smirnov, Alexander V. Sverdlov, Sona Vasudevan, Yuri I. Wolf, Jodie J. Yin, Darren A. Natale
TLDR
A major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes is described; it is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.
Abstract:
The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system, based on orthologous relationships between genes, appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes, and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The KOGs include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant (Arabidopsis thaliana), two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.
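The coverage figures quoted in the abstract can be checked with a few lines of arithmetic, and the notion of a conserved core can be illustrated on a phyletic pattern (the set of species represented in a cluster). The following is a minimal sketch, not the authors' procedure: the protein counts are taken from the abstract, while the function names, the species abbreviations, and the five toy clusters are hypothetical and exist only for illustration.

```python
# Counts quoted in the abstract.
COG_PROTEINS, COG_TOTAL = 138_458, 185_505
KOG_PROTEINS, KOG_TOTAL = 59_838, 110_655


def coverage(in_clusters: int, total: int) -> float:
    """Percentage of predicted proteins that belong to some cluster."""
    return 100.0 * in_clusters / total


def core_fraction(patterns: dict[str, set[str]], species: set[str]) -> float:
    """Fraction of clusters whose phyletic pattern includes every species."""
    core = [cid for cid, present in patterns.items() if species <= present]
    return len(core) / len(patterns)


# Hypothetical abbreviations for the 7 eukaryotic genomes (an assumption).
SPECIES = {"Cel", "Dme", "Hsa", "Ath", "Sce", "Spo", "Ecu"}

# Hypothetical toy clusters; these are NOT real KOG data.
TOY_PATTERNS = {
    "toy1": set(SPECIES),          # present in all seven species
    "toy2": {"Cel", "Dme", "Hsa"},  # animals only
    "toy3": SPECIES - {"Ecu"},     # missing from the microsporidian
    "toy4": {"Sce", "Spo"},        # fungi only
    "toy5": {"Ath", "Hsa", "Dme"},
}

cog_pct = coverage(COG_PROTEINS, COG_TOTAL)
kog_pct = coverage(KOG_PROTEINS, KOG_TOTAL)
toy_core = core_fraction(TOY_PATTERNS, SPECIES)

print(f"COG coverage: {cog_pct:.1f}%")
print(f"KOG coverage: {kog_pct:.1f}%")
print(f"Toy conserved core: {toy_core:.0%} of clusters")
```

On the quoted counts, the two coverage values round to the 75% and ~54% stated in the abstract. The toy core fraction only reflects the invented patterns and says nothing about the real KOG set.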
Citations
Journal Article
Database resources of the National Center for Biotechnology Information
David L. Wheeler,Deanna M. Church,Ron Edgar,Scott Federhen,Wolfgang Helmberg,Thomas L. Madden,Joan Pontius,Gregory D. Schuler,Lynn M. Schriml,Edwin Sequeira,Tugba O. Suzek,Tatiana Tatusova,Lukas Wagner +12 more
TL;DR: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website.
Journal Article
A human gut microbial gene catalogue established by metagenomic sequencing
Junjie Qin,Ruiqiang Li,Jeroen Raes,Manimozhiyan Arumugam,Kristoffer Sølvsten Burgdorf,Chaysavanh Manichanh,Trine Nielsen,Nicolas Pons,Florence Levenez,Takuji Yamada,Daniel R. Mende,Junhua Li,Junming Xu,Shaochuan Li,Dongfang Li,Jianjun Cao,Bo Wang,Huiqing Liang,Huisong Zheng,Yinlong Xie,Julien Tap,Patricia Lepage,Marcelo Bertalan,Jean-Michel Batto,Torben Hansen,Denis Le Paslier,Allan Linneberg,H. Bjørn Nielsen,Eric Pelletier,Pierre Renault,Thomas Sicheritz-Pontén,Keith Turner,Hongmei Zhu,Chang Yu,Shengting Li,Min Jian,Yan Zhou,Yingrui Li,Xiuqing Zhang,Songgang Li,Nan Qin,Huanming Yang,Jian Wang,Søren Brunak,Joël Doré,Francisco Guarner,Karsten Kristiansen,Oluf Pedersen,Julian Parkhill,Jean Weissenbach,Peer Bork,S. Dusko Ehrlich,Jun Wang +52 more
TL;DR: The Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals are described, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species.
Journal Article
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
Felipe A. Simão,Robert M. Waterhouse,Panagiotis Ioannidis,Evgenia V. Kriventseva,Evgeny M. Zdobnov +4 more
TL;DR: The authors propose a measure for quantitative assessment of genome assembly and annotation completeness based on evolutionarily informed expectations of gene content, and implement the assessment procedure in open-source software, with sets of Benchmarking Universal Single-Copy Orthologs.
Journal Article
Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments
Gerard Talavera,Jose Castresana +1 more
TL;DR: This work examines whether phylogenetic reconstruction improves after alignment cleaning. Cleaned alignments produce better topologies although, paradoxically, with lower bootstrap support, which indicates that divergent and problematic alignment regions, when present, may lead to topologies that appear better supported but are in fact more biased.
Journal Article
Metagenomic Analysis of the Human Distal Gut Microbiome
Steven R. Gill,Mihai Pop,Robert T. DeBoy,Paul B. Eckburg,Peter J. Turnbaugh,Buck S. Samuel,Jeffrey I. Gordon,David A. Relman,Claire M. Fraser-Liggett,Karen E. Nelson +11 more
TL;DR: Using metabolic function analyses of identified genes, the human genome is compared with the average content of previously sequenced microbial genomes, and it is concluded that humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes.
Related Papers (5)
Gene Ontology: tool for the unification of biology
M Ashburner,Catherine A. Ball,Judith A. Blake,David Botstein,Heather Butler,J. M. Cherry,Allan Peter Davis,Kara Dolinski,Selina S. Dwight,J.T. Eppig,Midori A. Harris,David P. Hill,Laurie Issel-Tarver,Andrew Kasarskis,Suzanna E. Lewis,John C. Matese,Joel E. Richardson,M. Ringwald,Gerald M. Rubin,Gavin Sherlock +19 more