Showing papers on "Genome" published in 2005

Open Access

Journal Article

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

Adam Siepel¹, Gill Bejerano, Jakob Skou Pedersen², Angie S. Hinrichs, Minmei Hou, Kate R. Rosenbloom, Hiram Clawson, John Spieth, LaDeana W. Hillier, Stephen Richards, George M. Weinstock, Richard K. Wilson, Richard A. Gibbs, W. James Kent, Webb Miller, David Haussler

University of California, Santa Cruz¹, Aarhus University²

01 Aug 2005-Genome Research

TL;DR: A comprehensive search for conserved elements in vertebrate genomes is conducted using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes), with conserved elements predicted by a two-state phylogenetic hidden Markov model (phylo-HMM).

Abstract: We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.

3,719 citations
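
The phastCons abstract describes predicting conserved elements from a two-state phylo-HMM. A minimal sketch of the decoding step, assuming the per-column log-likelihoods under the conserved and nonconserved phylogenetic models are already computed (the tree-likelihood calculation and the maximum-likelihood fitting are not shown), is a Viterbi pass followed by extraction of runs of the conserved state. The function names and the transition parameters `mu` (conserved to nonconserved) and `nu` (nonconserved to conserved) are illustrative, not the program's actual interface.

```python
import math
from typing import List, Sequence, Tuple


def viterbi_two_state(loglik_cons: Sequence[float], loglik_noncons: Sequence[float],
                      mu: float, nu: float) -> List[int]:
    """Most likely state path (1 = conserved, 0 = nonconserved) for a two-state HMM.

    loglik_cons[i] / loglik_noncons[i] are assumed, precomputed log-likelihoods of
    alignment column i under the conserved / nonconserved phylogenetic models.
    """
    if not (0 < mu < 1 and 0 < nu < 1):
        raise ValueError("mu and nu must lie strictly between 0 and 1")
    n = len(loglik_cons)
    if n == 0:
        return []
    # trans[from][to] in log space
    trans = [[math.log(1 - nu), math.log(nu)],   # from nonconserved
             [math.log(mu), math.log(1 - mu)]]   # from conserved
    pi_cons = nu / (mu + nu)                     # stationary probability of "conserved"
    init = [math.log(1 - pi_cons), math.log(pi_cons)]
    emit = [loglik_noncons, loglik_cons]

    score = [init[s] + emit[s][0] for s in (0, 1)]
    back = []                                    # back[i - 1][t] = best predecessor of state t at column i
    for i in range(1, n):
        ptr, new = [], []
        for t in (0, 1):
            best = max((0, 1), key=lambda s: score[s] + trans[s][t])
            ptr.append(best)
            new.append(score[best] + trans[best][t] + emit[t][i])
        back.append(ptr)
        score = new

    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):                   # trace back from the last column
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def conserved_elements(path: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open (start, end) column ranges where the path is in the conserved state."""
    elements, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            elements.append((start, i))
            start = None
    if start is not None:
        elements.append((start, len(path)))
    return elements
```

In this sketch, small `mu` and `nu` make state switches expensive, so isolated well-conserved columns are smoothed away while longer conserved runs survive as elements.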

Journal Article

The map-based sequence of the rice genome

Takashi Matsumoto¹, Jianzhong Wu¹, Hiroyuki Kanamori¹, Yuichi Katayose¹, and 262 others (25 institutions)

11 Aug 2005-Nature

TL;DR: A map-based, finished quality sequence that covers 95% of the 389 Mb rice genome, including virtually all of the euchromatin and two complete centromeres, and finds evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes.

Abstract: Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.

3,423 citations

Journal Article

The Transcriptional Landscape of the Mammalian Genome

Piero Carninci, Takeya Kasukawa¹, Shintaro Katayama, Julian Gough, and 194 others (36 institutions)

02 Sep 2005-Science

TL;DR: Detailed polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.

Abstract: This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.

3,412 citations

Journal Article

Genome-Wide Identification and Testing of Superior Reference Genes for Transcript Normalization in Arabidopsis

Tomasz Czechowski¹, Mark Stitt¹, Thomas Altmann¹, Michael K. Udvardi¹, Wolf-Rüdiger Scheible¹

Max Planck Society¹

01 Sep 2005-Plant Physiology

TL;DR: Hundreds of Arabidopsis genes were found that outperform traditional reference genes in terms of expression stability throughout development and under a range of environmental conditions, and the developed PCR primers or hybridization probes for the novel reference genes will enable better normalization and quantification of transcript levels in Arabidopsis in the future.

Abstract: Gene transcripts with invariant abundance during development and in the face of environmental stimuli are essential reference points for accurate gene expression analyses, such as RNA gel-blot analysis or quantitative reverse transcription-polymerase chain reaction (PCR). An exceptionally large set of data from Affymetrix ATH1 whole-genome GeneChip studies provided the means to identify a new generation of reference genes with very stable expression levels in the model plant species Arabidopsis (Arabidopsis thaliana). Hundreds of Arabidopsis genes were found that outperform traditional reference genes in terms of expression stability throughout development and under a range of environmental conditions. Most of these were expressed at much lower levels than traditional reference genes, making them very suitable for normalization of gene expression over a wide range of transcript levels. Specific and efficient primers were developed for 22 genes and tested on a diverse set of 20 cDNA samples. Quantitative reverse transcription-PCR confirmed superior expression stability and lower absolute expression levels for many of these genes, including genes encoding a protein phosphatase 2A subunit, a coatomer subunit, and a ubiquitin-conjugating enzyme. The developed PCR primers or hybridization probes for the novel reference genes will enable better normalization and quantification of transcript levels in Arabidopsis in the future.

2,694 citations
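
Ranking candidate reference genes by expression stability across many arrays can be illustrated with a coefficient-of-variation ranking. This is a minimal sketch assuming the coefficient of variation (population standard deviation divided by mean) as the stability measure and a hypothetical `rank_by_cv` helper; the study's exact selection criteria may differ.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def rank_by_cv(expression: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank genes from most to least stable by coefficient of variation.

    expression maps a gene ID to its expression values across samples
    (assumed already normalized per array). Genes with non-positive mean
    expression are skipped because their CV is undefined.
    """
    cvs = {}
    for gene, values in expression.items():
        m = mean(values)
        if m <= 0:
            continue
        cvs[gene] = pstdev(values) / m
    # lowest CV first = most stable expression across conditions
    return sorted(cvs.items(), key=lambda kv: kv[1])
```

Because the coefficient of variation is scale-free, a low-abundance gene can rank as highly as an abundant one, which matters when the most stable candidates are expressed at lower levels than traditional reference genes.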

Journal Article

Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary

Xizeng Mao¹, Tao Cai¹, John G. Olyarchuk¹, Liping Wei¹

Peking University¹

01 Oct 2005-Bioinformatics

TL;DR: A KO-Based Annotation System (KOBAS) is developed that can automatically annotate a set of sequences with KO terms and identify both the most frequent and the statistically significantly enriched pathways.

Abstract: Motivation: High-throughput technologies such as DNA sequencing and microarrays have created the need for automated annotation of large sets of genes, including whole genomes, and automated identification of pathways. Ontologies, such as the popular Gene Ontology (GO), provide a common controlled vocabulary for these types of automated analysis. Yet, while GO offers tremendous value, it also has certain limitations such as the lack of direct association with pathways. Results: We demonstrated the use of the KEGG Orthology (KO), part of the KEGG suite of resources, as an alternative controlled vocabulary for automated annotation and pathway identification. We developed a KO-Based Annotation System (KOBAS) that can automatically annotate a set of sequences with KO terms and identify both the most frequent and the statistically significantly enriched pathways. Results from both whole genome and microarray gene cluster annotations with KOBAS are comparable and complementary to known annotations. KOBAS is a freely available standalone Python program that can contribute significantly to genome annotation and microarray analysis. Availability: Supplementary data and the KOBAS system are available at http://genome.cbi.pku.edu.cn/download.html Contact: weilp@mail.cbi.pku.edu.cn

2,595 citations
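
KOBAS is described as identifying statistically significantly enriched pathways. One common way to test a single pathway is a one-sided hypergeometric test. The sketch below assumes that test (it is not necessarily the exact statistic KOBAS implements), uses hypothetical helper names, and requires Python 3.8+ for `math.comb`.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population size M, n successes, N draws)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def pathway_enrichment(study: Set[str], background: Set[str], pathway: Set[str]) -> float:
    """One-sided p-value that `pathway` is over-represented in `study`.

    All counts are restricted to the annotated `background` gene set.
    """
    M = len(background)
    n = len(pathway & background)
    N = len(study & background)
    k = len(study & pathway & background)
    return hypergeom_sf(k, M, n, N)
```

When many pathways are tested at once, the resulting p-values would normally also be corrected for multiple testing; that step is omitted here.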

Journal Article

Genome sequence, comparative analysis and haplotype structure of the domestic dog

Kerstin Lindblad-Toh¹, Claire M. Wade¹˒², Tarjei S. Mikkelsen³, and 238 others (11 institutions)

08 Dec 2005-Nature

TL;DR: A high-quality draft genome sequence of the domestic dog is reported, together with a dense map of single nucleotide polymorphisms (SNPs) across breeds, to shed light on the structure and evolution of genomes and genes.

Abstract: Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

2,431 citations

Journal Article

Initial sequence of the chimpanzee genome and comparison with the human genome

Tarjei S. Mikkelsen, LaDeana W. Hillier, Evan E. Eichler, Michael C. Zody, David B. Jaffe, Shiaw-Pyng Yang¹, Wolfgang Enard¹, Ines Hellmann, Kerstin Lindblad-Toh, Tasha K. Altheide, Nicoletta Archidiacono, Peer Bork, Jonathan Butler, Jean L. Chang, Ze Cheng, Asif T. Chinwalla, Pieter J. de Jong, Kimberley D. Delehaunty, Catrina Fronick, Lucinda L. Fulton¹, Yoav Gilad, Gustavo Glusman, Sante Gnerre, Tina Graves, Toshiyuki Hayakawa, Karen E. Hayden, Xiaoqiu Huang, Hongkai Ji, W. James Kent, Mary Claire King, Edward J. Kulbokas, Ming K. Lee, Ge Liu, Carlos López-Otín, Kateryna D. Makova, Orna Man, Elaine R. Mardis, Evan Mauceli, Tracie L. Miner, William E. Nash, Joanne O. Nelson¹, Svante Pääbo, Nick Patterson, Craig Pohl, Katherine S. Pollard¹, Kay Prüfer, Xose S. Puente, David Reich, Mariano Rocchi, Kate R. Rosenbloom, Maryellen Ruvolo, Daniel J. Richter, Stephen F. Schaffner, Arian F.A. Smit, Scott M. Smith, Mikita Suyama, James E. Taylor, David Torrents, Eray Tüzün, Ajit Varki, Gloria Velasco, Mario Ventura, John W. Wallis, Michael C. Wendl, Richard K. Wilson, Eric S. Lander, Robert H. Waterston

Max Planck Society¹

01 Sep 2005-Nature

TL;DR: It is found that the patterns of evolution in human and chimpanzee protein-coding genes are highly correlated and dominated by the fixation of neutral and slightly deleterious alleles.

Abstract: Here we present a draft genome sequence of the common chimpanzee (Pan troglodytes). Through comparison with the human genome, we have generated a largely complete catalogue of the genetic differences ...

2,267 citations

Journal Article

Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”

Hervé Tettelin, Vega Masignani, Michael J. Cieslewicz¹, Claudio Donati, Duccio Medini, Naomi L. Ward², Samuel V. Angiuoli³, Jonathan Crabtree³, Amanda L. Jones⁴, A. Scott Durkin³, Robert T. DeBoy³, Tanja M. Davidsen³, Marirosa Mora, Maria Scarselli, Immaculada Margarit Y Ros, Jeremy Peterson³, Christopher R. Hauser³, Jaideep P. Sundaram³, William C. Nelson³, Ramana Madupu³, Lauren M. Brinkac³, Robert J. Dodson³, M. J. Rosovitz³, Steven A. Sullivan³, Sean C. Daugherty³, Daniel H. Haft³, Jeremy D. Selengut³, Michelle L. Gwinn³, Liwei Zhou³, Nikhat Zafar³, Hoda Khouri³, Diana Radune³, George Dimitrov³, Kisha Watkins³, Kevin J. B. O'Connor⁵, Shannon Smith³, Teresa Utterback³, Owen White³, Craig E. Rubens⁴, Guido Grandi, Lawrence C. Madoff¹, Dennis L. Kasper¹, John L. Telford, Michael R. Wessels¹, Rino Rappuoli, Claire M. Fraser⁶

Harvard University¹, University of Maryland, Baltimore County², J. Craig Venter Institute³, Boston Children's Hospital⁴, Johns Hopkins University⁵, George Washington University⁶

27 Sep 2005-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans, was generated, and mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.

Abstract: The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.

2,092 citations
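
The pan-genome abstract splits genes into a core shared by all isolates and a dispensable remainder. A minimal sketch of how core and pan-genome sizes evolve as genomes are added, assuming genes have already been clustered into shared family IDs (the clustering and the paper's curve-fitting extrapolation are not shown), is:

```python
from typing import List, Set, Tuple


def core_and_pan_curves(genomes: List[Set[str]]) -> List[Tuple[int, int, int]]:
    """For genomes added in the given order, return (n_genomes, core_size, pan_size).

    Each genome is a set of gene-family IDs (assumed precomputed orthology clusters).
    core = families present in every genome so far; pan = families present in any.
    """
    curves: List[Tuple[int, int, int]] = []
    core: Set[str] = set()
    pan: Set[str] = set()
    for n, genes in enumerate(genomes, start=1):
        core = set(genes) if n == 1 else core & genes
        pan = pan | genes
        curves.append((n, len(core), len(pan)))
    return curves
```

In practice such curves would be averaged over many orderings of the genomes before fitting; a pan-genome size that keeps growing with each added genome is what the abstract calls a vast gene reservoir.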

Journal Article

Galaxy: A platform for interactive large-scale genome analysis

Belinda Giardine¹, Cathy Riemer¹, Ross C. Hardison, Richard Burhans¹, Laura Elnitski², Prachi Shah¹˒², Yi Zhang¹, Daniel Blankenberg, Istvan Albert, James Taylor¹, Webb Miller¹, W. James Kent³, Anton Nekrutenko

Pennsylvania State University¹, National Institutes of Health², University of California, Santa Cruz³

01 Oct 2005-Genome Research

TL;DR: An interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results.

Abstract: Accessing and analyzing the exponentially expanding genomic sequence and functional data pose a challenge for biomedical researchers. Here we describe an interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results. The heart of Galaxy is a flexible history system that stores the queries from each user; performs operations such as intersections, unions, and subtractions; and links to other computational tools. Galaxy can be accessed at http://g2.bx.psu.edu.

2,071 citations
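
The Galaxy abstract lists intersections, unions, and subtractions as core history operations. A minimal sketch of that interval algebra for a single chromosome, assuming half-open [start, end) coordinates and hypothetical function names (this is not Galaxy's own code), is:

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    out: List[Interval] = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def union(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    return merge(list(a) + list(b))


def intersect(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Bases covered by both sets, via a linear sweep over merged inputs."""
    a, b = merge(a), merge(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        if a[i][1] < b[j][1]:   # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out


def subtract(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Bases covered by `a` but not by `b`."""
    b = merge(b)
    out: List[Interval] = []
    for s, e in merge(a):
        cur = s
        for bs, be in b:
            if be <= cur or bs >= e:
                continue
            if bs > cur:
                out.append((cur, bs))
            cur = max(cur, be)
        if cur < e:
            out.append((cur, e))
    return out
```

For example, subtracting coding exons from conserved elements in this way would leave the noncoding conserved bases; genome-wide use would apply these functions chromosome by chromosome.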

Journal Article

Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals

Xiaohui Xie¹, Jun Lu¹, Edward J. Kulbokas¹, Todd R. Golub¹, Vamsi K. Mootha¹, Kerstin Lindblad-Toh¹, Eric S. Lander¹˒², Manolis Kellis¹˒²

Broad Institute¹, Massachusetts Institute of Technology²

17 Mar 2005-Nature

TL;DR: In this article, a comparative analysis of the human, mouse, rat and dog genomes is presented to create a systematic catalogue of common regulatory motifs in promoters and 3' untranslated regions (3' UTRs).

Abstract: Comprehensive identification of all functional elements encoded in the human genome is a fundamental need in biomedical research. Here, we present a comparative analysis of the human, mouse, rat and dog genomes to create a systematic catalogue of common regulatory motifs in promoters and 3' untranslated regions (3' UTRs). The promoter analysis yields 174 candidate motifs, including most previously known transcription-factor binding sites and 105 new motifs. The 3'-UTR analysis yields 106 motifs likely to be involved in post-transcriptional regulation. Nearly one-half are associated with microRNAs (miRNAs), leading to the discovery of many new miRNA genes and their likely target genes. Our results suggest that previous estimates of the number of human miRNA genes were low, and that miRNAs regulate at least 20% of human genes. The overall results provide a systematic view of gene regulation in the human, which will be refined as additional mammalian genomes become available.

1,954 citations

Journal Article

Genomic insights that advance the species definition for prokaryotes

Konstantinos T. Konstantinidis¹, James M. Tiedje

Michigan State University¹

15 Feb 2005-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The average nucleotide identity (ANI) of the shared genes between two strains was found to be a robust means to compare genetic relatedness among strains, and ANI values of approximately 94% were found to correspond to the traditional 70% DNA-DNA reassociation standard of the current species definition.

Abstract: To help advance the species definition for prokaryotes, we have compared the gene content of 70 closely related and fully sequenced bacterial genomes to identify whether species boundaries exist, and to determine the role of the organism's ecology on its shared gene content. We found the average nucleotide identity (ANI) of the shared genes between two strains to be a robust means to compare genetic relatedness among strains, and that ANI values of ≈94% corresponded to the traditional 70% DNA–DNA reassociation standard of the current species definition. At the 94% ANI cutoff, current species include only moderately homogeneous strains, e.g., most of the >4-Mb genomes share only 65–90% of their genes, apparently as a result of the strains having evolved in different ecological settings. Furthermore, diagnostic genetic signatures (boundaries) are evident between groups of strains of the same species, and the intergroup genetic similarity can be as high as 98–99% ANI, indicating that justifiable species might be found even among organisms that are nearly identical at the nucleotide level. Notably, a large fraction, e.g., up to 65%, of the differences in gene content within species is associated with bacteriophage and transposase elements, revealing an important role of these elements during bacterial speciation. Our findings are consistent with a definition for species that would include a more homogeneous set of strains than provided by the current definition and one that considers the ecology of the strains in addition to their evolutionary distance.
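
The ANI measure above averages the nucleotide identity of genes shared by two strains and compares it with the ~94% species cutoff. A minimal sketch, assuming best-match alignments per shared gene are already available as (percent identity, aligned fraction) pairs; the filtering thresholds `min_identity` and `min_coverage` are hypothetical parameters, not values taken from the abstract:

```python
from statistics import mean
from typing import Dict, Tuple


def average_nucleotide_identity(hits: Dict[str, Tuple[float, float]],
                                min_identity: float = 60.0,
                                min_coverage: float = 0.7) -> float:
    """Mean percent identity over shared genes that pass the (assumed) filters.

    hits maps a gene ID to (percent_identity, aligned_fraction_of_gene_length)
    for its best match in the other genome.
    """
    kept = [pid for pid, cov in hits.values() if pid >= min_identity and cov >= min_coverage]
    if not kept:
        raise ValueError("no shared genes pass the identity/coverage filters")
    return mean(kept)


def same_species(ani: float, threshold: float = 94.0) -> bool:
    """Apply the ~94% ANI cutoff reported to match 70% DNA-DNA reassociation."""
    return ani >= threshold
```

Filtering by identity and coverage keeps the average restricted to genuinely shared genes, so strain-specific genes (for example phage or transposase content) affect gene-content comparisons but not the ANI value itself.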

Journal Article

The Genome of the African Trypanosome Trypanosoma brucei

Matthew Berriman¹, Elodie Ghedin²˒³, Christiane Hertz-Fowler¹, Gaëlle Blandin³, Hubert Renauld¹, Daniella Castanheira Bartholomeu³, Nicola Lennard¹, Elisabet Caler³, N. Hamlin¹, Brian J. Haas³, Ulrike Böhme¹, Linda Hannick³, Martin Aslett¹, Joshua Shallom³, Lucio Marcello⁴, Lihua Hou³, Bill Wickstead⁵, U. Cecilia M. Alsmark⁶, Claire Arrowsmith¹, Rebecca Atkin¹, Andrew Barron¹, Frédéric Bringaud⁷, Karen Brooks¹, Mark Carrington⁸, Inna Cherevach¹, Tracey-Jane Chillingworth¹, Carol Churcher¹, Louise Clark¹, Craig Corton¹, Ann Cronin¹, Robert L. Davies¹, Jonathon Doggett¹, Appolinaire Djikeng³, Tamara Feldblyum³, Mark C. Field⁸, Audrey Fraser¹, Ian Goodhead¹, Zahra Hance¹, David Harper¹, Barbara Harris¹, Heidi Hauser¹, Jessica B. Hostetler³, Al Ivens¹, Kay Jagels¹, David W. Johnson¹, Justin Johnson³, Kristine Jones³, Arnaud Kerhornou¹, Hean Koo³, Natasha Larke¹, Scott M. Landfear⁹, Christopher Larkin³, Vanessa Leech⁸, Alexandra Line¹, Angela Lord¹, Annette MacLeod⁴, P. Mooney¹, Sharon Moule¹, David M. A. Martin¹⁰, Gareth W. Morgan¹¹, Karen Mungall¹, Halina Norbertczak¹, Doug Ormond¹, Grace Pai³, Christopher S. Peacock¹, Jeremy Peterson³, Michael A. Quail¹, Ester Rabbinowitsch¹, Marie-Adèle Rajandream¹, Chris P. Reitter⁸, Steven L. Salzberg³, Mandy Sanders¹, Seth Schobel³, Sarah Sharp¹, Mark Simmonds¹, Anjana J. Simpson³, Luke J. Tallon³, C. Michael R. Turner⁴, Andrew Tait⁴, Adrian Tivey¹, Susan Van Aken³, Danielle Walker¹, David Wanless³, Shiliang Wang³, Brian White¹, Owen White³, Sally Whitehead¹, John Woodward¹, Jennifer R. Wortman³, Mark Raymond Adams¹², T. Martin Embley⁶, Keith Gull⁵, Elisabetta Ullu¹³, J. David Barry⁴, Alan H. Fairlamb¹⁰, Fred R. Opperdoes¹⁴, Barclay G. Barrell¹, John E. Donelson¹⁵, Neil Hall³˒¹⁶, Claire M. Fraser³, Sara E. Melville⁸, Najib M. El-Sayed²˒³

Wellcome Trust Sanger Institute¹, George Washington University², J. Craig Venter Institute³, University of Glasgow⁴, University of Oxford⁵, Newcastle University⁶, University of Bordeaux⁷, University of Cambridge⁸, Oregon Health & Science University⁹, University of Dundee¹⁰, Imperial College London¹¹, Case Western Reserve University¹², Yale University¹³, Université catholique de Louvain¹⁴, University of Iowa¹⁵, Wellcome Trust¹⁶

15 Jul 2005-Science

TL;DR: Comparisons of the cytoskeleton and endocytic trafficking systems of Trypanosoma brucei with those of humans and other eukaryotic organisms reveal major differences.

Abstract: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including ∼900 pseudogenes and ∼1700 T. brucei–specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.

Journal Article

De novo identification of repeat families in large genomes

Alkes L. Price¹, Neil C. Jones¹, Pavel A. Pevzner¹

University of California, San Diego¹

01 Jan 2005-Bioinformatics

TL;DR: A new method for de novo identification of repeat families via extension of consensus seeds is developed, which enables a rigorous definition of repeat boundaries, a key issue in repeat analysis.

Abstract: "Every time we compare two species that are closer to each other than either is to humans, we get nearly killed by unmasked repeats." (Webb Miller, personal communication) Motivation: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a rigorous definition of repeat boundaries, a key issue in repeat analysis. Results: Our RepeatScout algorithm is more sensitive and is orders of magnitude faster than RECON, the dominant tool for de novo repeat family identification in newly sequenced genomes. Using RepeatScout, we estimate that ∼2% of the human genome and 4% of mouse and rat genomes consist of previously unannotated repetitive sequence. Availability: Source code is available for download at http://www-cse.ucsd.edu/groups/bioinformatics/software.html Contact: ppevzner@cs.ucsd.edu
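
The RepeatScout abstract describes identifying repeat families by extending consensus seeds. The sketch below is a deliberately simplified, ungapped illustration: it picks frequent l-mers as seeds and greedily extends a seed to the right while one base dominates across its occurrences. RepeatScout's actual alignment scoring and boundary definition are not reproduced; `frequent_lmers`, `extend_right`, and `min_fraction` are hypothetical names and parameters.

```python
from collections import Counter
from typing import List, Tuple


def frequent_lmers(seq: str, l: int, min_count: int) -> List[Tuple[str, int]]:
    """Return (l-mer, count) pairs seen at least min_count times, most frequent first."""
    seq = seq.upper()
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    frequent = [(kmer, c) for kmer, c in counts.items() if c >= min_count and "N" not in kmer]
    return sorted(frequent, key=lambda kc: (-kc[1], kc[0]))


def extend_right(seq: str, seed: str, min_fraction: float = 0.8, max_len: int = 1000) -> str:
    """Greedy ungapped consensus extension of `seed` to the right.

    At each step, the most common next base across all seed occurrences is
    appended if it is supported by at least min_fraction of the occurrences.
    """
    seq = seq.upper()
    starts = [i for i in range(len(seq) - len(seed) + 1) if seq.startswith(seed, i)]
    if not starts:
        return seed
    consensus = seed
    while len(consensus) < max_len:
        pos = len(consensus)
        bases = [seq[s + pos] for s in starts if s + pos < len(seq)]
        if not bases:
            break
        base, count = Counter(bases).most_common(1)[0]
        if count / len(starts) < min_fraction:
            break  # support has dropped: treat this as the repeat boundary
        consensus += base
    return consensus
```

The stopping rule is where a boundary definition enters: extension halts once too few copies still agree, which is a crude stand-in for the more rigorous boundary criterion the abstract mentions.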

Journal Article

The genome sequence of the rice blast fungus Magnaporthe grisea

Ralph A. Dean¹, Nicholas J. Talbot², Daniel J. Ebbole³, Mark L. Farman⁴, Thomas K. Mitchell¹, Marc J. Orbach⁵, Michael R. Thon³, Resham Kulkarni¹˒⁶, Jin-Rong Xu⁷, Huaqin Pan¹, Nick D. Read⁸, Yong-Hwan Lee⁹, Ignazio Carbone¹, Doug Brown¹, Yeonyee Oh¹, Nicole M. Donofrio¹, Jun Seop Jeong¹, Darren M. Soanes², Slavica Djonovic³, Elena A. Kolomiets³, Cathryn J. Rehmeyer⁴, Weixi Li⁴, Michael W. Harding⁵, Soonok Kim⁹, Marc-Henri Lebrun¹⁰, Heidi U. Böhnert¹⁰, Sean J. Coughlan¹¹, Jonathan Butler¹², Sarah E. Calvo¹², Li-Jun Ma¹², Robert Nicol¹², Seth Purcell¹², Chad Nusbaum¹², James E. Galagan¹², Bruce W. Birren¹²

North Carolina State University¹, University of Exeter², Texas A&M University³, University of Kentucky⁴, University of Arizona⁵, Research Triangle Park⁶, Purdue University⁷, University of Edinburgh⁸, Seoul National University⁹, Bayer¹⁰, Agilent Technologies¹¹, Broad Institute¹²

21 Apr 2005-Nature

TL;DR: The draft sequence of the M. grisea genome is reported; invasion and proliferation of active transposable elements in the genome reflect the clonal nature of this fungus imposed by widespread rice cultivation, and analysis of the gene set provides insight into the adaptations required by a fungus to cause disease.

Abstract: Magnaporthe grisea is the most destructive pathogen of rice worldwide and the principal model organism for elucidating the molecular basis of fungal disease of plants. Here, we report the draft sequence of the M. grisea genome. Analysis of the gene set provides an insight into the adaptations required by a fungus to cause disease. The genome encodes a large and diverse set of secreted proteins, including those defined by unusual carbohydrate-binding domains. This fungus also possesses an expanded family of G-protein-coupled receptors, several new virulence-associated genes and large suites of enzymes involved in secondary metabolism. Consistent with a role in fungal pathogenesis, the expression of several of these genes is upregulated during the early stages of infection-related development. The M. grisea genome has been subject to invasion and proliferation of active transposable elements, reflecting the clonal nature of this fungus imposed by widespread rice cultivation.

Journal Article

Complementing the Genome with an “Exposome”: The Outstanding Challenge of Environmental Exposure Measurement in Molecular Epidemiology

Christopher Paul Wild¹

American Association For Cancer Research¹

01 Aug 2005-Cancer Epidemiology, Biomarkers & Prevention

TL;DR: The sequencing and mapping of the human genome provides a foundation for the elucidation of gene expression and protein function, and the identification of the biochemical pathways implicated in the natural history of chronic diseases.

Abstract: The sequencing and mapping of the human genome provides a foundation for the elucidation of gene expression and protein function, and the identification of the biochemical pathways implicated in the natural history of chronic diseases, including cancer, diabetes, and vascular and neurodegenerative

Journal Article

Fungal secondary metabolism — from biochemistry to genomics

Nancy P. Keller¹, Geoffrey Turner², Joan W. Bennett³

University of Wisconsin-Madison¹, University of Sheffield², Tulane University³

01 Dec 2005-Nature Reviews Microbiology

TL;DR: Questions are addressed, including which evolutionary pressures led to gene clustering, why closely related species produce different profiles of secondary metabolites, and whether fungal genomics will accelerate the discovery of new pharmacologically active natural products.

Abstract: Much of natural product chemistry concerns a group of compounds known as secondary metabolites. These low-molecular-weight metabolites often have potent physiological activities. Digitalis, morphine and quinine are plant secondary metabolites, whereas penicillin, cephalosporin, ergotrate and the statins are equally well known fungal secondary metabolites. Although chemically diverse, all secondary metabolites are produced by a few common biosynthetic pathways, often in conjunction with morphological development. Recent advances in molecular biology, bioinformatics and comparative genomics have revealed that the genes encoding specific fungal secondary metabolites are clustered and often located near telomeres. In this review, we address some important questions, including which evolutionary pressures led to gene clustering, why closely related species produce different profiles of secondary metabolites, and whether fungal genomics will accelerate the discovery of new pharmacologically active natural products.

Journal Article

Reverse engineering of regulatory networks in human B cells

Katia Basso, Adam A. Margolin, Gustavo Stolovitzky¹, Ulf Klein, Riccardo Dalla-Favera², Andrea Califano

IBM¹, Columbia University²

20 Mar 2005-Nature Genetics

TL;DR: The reconstruction of regulatory networks from expression profiles of human B cells is reported, suggestive of a hierarchical, scale-free network, where a few highly interconnected genes (hubs) account for most of the interactions.

Abstract: Cellular phenotypes are determined by the differential activity of networks linking coregulated genes. Available methods for the reverse engineering of such networks from genome-wide expression profiles have been successful only in the analysis of lower eukaryotes with simple genomes. Using a new method called ARACNe (algorithm for the reconstruction of accurate cellular networks), we report the reconstruction of regulatory networks from expression profiles of human B cells. The results are suggestive of a hierarchical, scale-free network, where a few highly interconnected genes (hubs) account for most of the interactions. Validation of the network against available data led to the identification of MYC as a major hub, which controls a network comprising known target genes as well as new ones, which were biochemically validated. The newly identified MYC targets include some major hubs. This approach can be generally useful for the analysis of normal and pathologic networks in mammalian cells.
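
ARACNe is generally described as first estimating mutual information (MI) between every pair of genes and then pruning edges with the data processing inequality (DPI): in any fully connected triplet of genes, the weakest edge is assumed to reflect an indirect interaction. The minimal sketch below covers only that pruning step on a precomputed MI matrix; `dpi_prune` is a hypothetical helper, and the MI estimator, the MI threshold and the tolerance are assumptions rather than the published settings.

```python
from itertools import combinations


def dpi_prune(mi, threshold=0.0, tolerance=0.0):
    """Prune a mutual-information network with the data processing inequality.

    mi: symmetric square matrix (list of lists) of pairwise MI values.
    threshold: edges with MI <= threshold are dropped first (assumed value).
    tolerance: an edge survives unless its MI is below (1 - tolerance) times
        the smaller of the other two edges in the triplet (assumed value).
    Returns the retained edges as (i, j) tuples with i < j.
    """
    n = len(mi)
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] > threshold}
    removed = set()
    for i, j, k in combinations(range(n), 3):
        triangle = [(i, j), (i, k), (j, k)]
        if not all(e in edges for e in triangle):
            continue  # the DPI step only applies to fully connected triplets
        weakest = min(triangle, key=lambda e: mi[e[0]][e[1]])
        others = [mi[a][b] for a, b in triangle if (a, b) != weakest]
        if mi[weakest[0]][weakest[1]] < min(others) * (1 - tolerance):
            removed.add(weakest)  # marked as indirect; removal is applied at the end
    return edges - removed


# Toy matrix: genes 0 and 2 are related only through gene 1.
mi = [[0.0, 0.9, 0.2],
      [0.9, 0.0, 0.8],
      [0.2, 0.8, 0.0]]
print(sorted(dpi_prune(mi)))  # [(0, 1), (1, 2)]
```

Collecting removals first and subtracting them afterwards keeps the result independent of the order in which triplets are visited.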

Journal Article•DOI•

Two rounds of whole genome duplication in the ancestral vertebrate.

Paramvir S. Dehal¹, Jeffrey L. Boore¹ ²•Institutions (2)

Lawrence Berkeley National Laboratory¹, University of California, Berkeley²

06 Sep 2005-PLOS Biology

TL;DR: The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, and the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage is highlighted.

Abstract: The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, and then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish–tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of four-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

Journal Article•DOI•

The genome of the kinetoplastid parasite, Leishmania major.

Alasdair Ivens¹, Christopher S. Peacock¹, Elizabeth A. Worthey², Lee Murphy¹, Gautam Aggarwal², Matthew Berriman¹, Ellen Sisk², Marie-Adèle Rajandream¹, Ellen Adlem¹, Rita Aert³, Atashi Anupama², Zina Apostolou, Philip Attipoe², Nathalie Bason¹, Christopher Bauser⁴, Alfred Beck⁵, Stephen M. Beverley⁶, Gabriella Bianchettin⁷, K. Borzym⁵, G. Bothe⁴, Carlo V. Bruschi⁷ ⁸, Matt Collins¹, Eithon Cadag², Laura Ciarloni⁷, Christine Clayton, Richard M.R. Coulson⁹, Ann Cronin¹, Angela K. Cruz¹⁰, Robert L. Davies¹, Javier G. De Gaudenzi¹¹, Deborah E. Dobson⁶, Andreas Duesterhoeft, Gholam Fazelina², Nigel Fosker¹, Alberto C.C. Frasch¹¹, Audrey Fraser¹, Monika Fuchs, Claudia Gabel, Arlette Goble¹, André Goffeau¹², David Harris¹, Christiane Hertz-Fowler¹, Helmut Hilbert, David Horn¹³, Yiting Huang², Sven Klages⁵, Andrew J Knights¹, Michael Kube⁵, Natasha Larke¹, Lyudmila Litvin², Angela Lord¹, Tin Louie², Marco A. Marra, David Masuy¹², Keith R. Matthews¹⁴, Shulamit Michaeli, Jeremy C. Mottram¹⁵, Silke Müller-Auer, Heather Munden², Siri Nelson², Halina Norbertczak¹, Karen Oliver¹, Susan O'Neil¹, Martin Pentony², Thomas M. Pohl⁴, Claire Price¹, Bénédicte Purnelle¹², Michael A. Quail¹, Ester Rabbinowitsch¹, Richard Reinhardt⁵, Michael A. Rieger, Joel Rinta², Johan Robben³, Laura Robertson², Jeronimo C. Ruiz¹⁰, Simon Rutter¹, David L. Saunders¹, Melanie Schäfer, Jacquie Schein, David C. Schwartz¹⁶, Kathy Seeger¹, Amber Seyler², Sarah Sharp¹, Heesun Shin, Dhileep Sivam², Rob Squares¹, Steve Squares¹, Valentina Tosato⁷, Christy Vogt², Guido Volckaert³, Rolf Wambutt, T. Warren¹, Holger Wedler, John Woodward¹, Shiguo Zhou¹⁶, Wolfgang Zimmermann, Deborah F. Smith¹⁷, Jenefer M. Blackwell¹⁸, Kenneth Stuart² ¹⁹, Bart Barrell¹, Peter J. Myler² ¹⁹•Institutions (19)

Wellcome Trust Sanger Institute¹, Seattle Biomed², Katholieke Universiteit Leuven³, GATC Biotech⁴, Max Planck Society⁵, Washington University in St. Louis⁶, University of Trieste⁷, International Centre for Genetic Engineering and Biotechnology⁸, European Bioinformatics Institute⁹, University of São Paulo¹⁰, National Scientific and Technical Research Council¹¹, Université catholique de Louvain¹², University of London¹³, University of Edinburgh¹⁴, University of Glasgow¹⁵, University of Wisconsin-Madison¹⁶, University of York¹⁷, University of Cambridge¹⁸, University of Washington¹⁹

15 Jul 2005-Science

TL;DR: The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Tritryp genomes suggest that the mechanisms regulating RNA polymerase II–directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling.

Abstract: Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.
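
The strand-specific polycistronic organization can be made concrete by grouping position-ordered genes into runs that lie on the same strand; the boundaries between runs then mark candidate strand-switch regions. This is a minimal sketch over hypothetical gene records (`Gene`, `directional_clusters` and `strand_switches` are illustrative names, and the coordinates are invented), not the annotation pipeline used for L. major.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Gene:
    name: str
    start: int   # start coordinate on the chromosome
    strand: str  # "+" or "-"


def directional_clusters(genes):
    """Split position-ordered genes into consecutive runs that share a strand."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [list(run) for _, run in groupby(ordered, key=lambda g: g.strand)]


def strand_switches(clusters):
    """Name the flanking genes at each change of strand between adjacent clusters."""
    return [(left[-1].name, right[0].name) for left, right in zip(clusters, clusters[1:])]


# Hypothetical chromosome with two clusters on opposite strands.
genes = [Gene("g1", 100, "-"), Gene("g2", 900, "-"), Gene("g3", 2500, "+"),
         Gene("g4", 4000, "+"), Gene("g5", 6000, "+")]
clusters = directional_clusters(genes)
print([[g.name for g in c] for c in clusters])  # [['g1', 'g2'], ['g3', 'g4', 'g5']]
print(strand_switches(clusters))                # [('g2', 'g3')]
```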

Journal Article•DOI•

The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease

Najib M. El-Sayed¹, Peter J. Myler² ³, Daniella Castanheira Bartholomeu⁴, Daniel Nilsson⁵, Gautam Aggarwal², Anh-Nhi Tran⁵, Elodie Ghedin¹, Elizabeth A. Worthey², Arthur L. Delcher, Gaëlle Blandin⁴, Scott J. Westenberger⁶, Elisabet Caler⁴, Gustavo C. Cerqueira⁷, Carole Branche⁵, Brian J. Haas⁴, Atashi Anupama², Erik Arner⁵, Lena Åslund⁸, Philip Attipoe², Esteban J. Bontempi⁵, Frédéric Bringaud⁹, Peter Burton¹⁰, Eithon Cadag², David A. Campbell⁶, Mark Carrington¹¹, Jonathan Crabtree⁴, Hamid Darban⁵, José Franco da Silveira¹², Pieter J. de Jong¹³, Kimberly Edwards⁵, Paul T. Englund¹⁴, Gholam Fazelina², Tamara Feldblyum⁴, Marcela Ferella⁵, Alberto C.C. Frasch¹⁵, Keith Gull¹⁶, David Horn¹⁷, Lihua Hou⁴, Yiting Huang², Ellen Kindlund⁵, Michele M. Klingbeil¹⁸, Sindy Kluge⁵, Hean Koo⁴, Daniela R. Lacerda¹⁹, Mariano J. Levin²⁰, Hernan Lorenzi²⁰, Tin Louie², Carlos Renato Machado⁷, Richard McCulloch¹⁰, Alan McKenna⁵, Yumi Mizuno⁵, Jeremy C. Mottram¹⁰, Siri Nelson², Stephen Ochaya⁵, Kazutoyo Osoegawa¹³, Grace Pai⁴, Marilyn Parsons² ³, Martin Pentony², Ulf Pettersson⁸, Mihai Pop⁴, José Luis Ramírez²¹, Joel Rinta², Laura Robertson², Steven L. Salzberg, Daniel O. Sánchez¹⁵, Amber Seyler², Reuben Sunil Kumar Sharma¹¹, Jyoti Shetty⁴, Anjana J. Simpson⁴, Ellen Sisk², Martti T. Tammi⁵ ²², Rick L. Tarleton²³, Santuza M. R. Teixeira⁷, Susan Van Aken⁴, Christy Vogt², Pauline N. Ward¹⁰, Bill Wickstead¹⁶, Jennifer R. Wortman⁴, Owen White⁴, Claire M. Fraser⁴, Kenneth Stuart² ³, Björn Andersson⁵•Institutions (23)

15 Jul 2005-Science

TL;DR: Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.

Abstract: Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.

Journal Article•DOI•

Distribution and intensity of constraint in mammalian genomic sequence

Gregory M. Cooper¹, Eric A. Stone, George Asimenos, Eric D. Green, Serafim Batzoglou, Arend Sidow•Institutions (1)

Stanford University¹

01 Jul 2005-Genome Research

TL;DR: A number of elements in this region that have undergone intense purifying selection throughout mammalian evolution are described, and it is shown that these important elements are more numerous than previously thought.

Abstract: Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the results of such an analysis on an alignment of sequences from 29 mammalian species. The alignment captures ∼3.9 neutral substitutions per site and spans ∼1.9 Mbp of the human genome. We identify constrained elements from 3 bp to over 1 kbp in length, covering ∼5.5% of the human locus. Our estimate for the total amount of nonexonic constraint experienced by this locus is roughly twice that for exonic constraint. Constrained elements tend to cluster, and we identify large constrained regions that correspond well with known functional elements. While constraint density inversely correlates with mobile element density, we also show the presence of unambiguously constrained elements overlapping mammalian ancestral repeats. In addition, we describe a number of elements in this region that have undergone intense purifying selection throughout mammalian evolution, and we show that these important elements are more numerous than previously thought. These results were obtained with Genomic Evolutionary Rate Profiling (GERP), a statistically rigorous and biologically transparent framework for constrained element identification. GERP identifies regions at high resolution that exhibit nucleotide substitution deficits, and measures these deficits as “rejected substitutions.” Rejected substitutions reflect the intensity of past purifying selection and are used to rank and characterize constrained elements. We anticipate that GERP and the types of analyses it facilitates will provide further insights and improved annotation for the human genome as mammalian genome sequence data become richer.
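
Rejected substitutions can be read as a simple per-column difference: the number of substitutions expected under neutrality (about 3.9 per site for this alignment, per the abstract) minus the number actually inferred. The sketch below takes the per-column observed values as given, whereas GERP infers them from the phylogeny, and it calls candidate elements as maximal runs of positive scores with an assumed minimum total; the helper names, the run rule and the observed values are all hypothetical.

```python
NEUTRAL_RATE = 3.9  # expected neutral substitutions per site, from the abstract


def rejected_substitutions(observed, expected=NEUTRAL_RATE):
    """Per-column rejected substitutions: expected neutral minus observed."""
    return [expected - obs for obs in observed]


def candidate_elements(rs, min_total):
    """Maximal runs of columns with RS > 0 whose summed RS reaches min_total.

    The run rule and min_total are illustrative assumptions.
    Returns (start, end, total) tuples; end is exclusive.
    """
    elements, start = [], None
    for pos, score in enumerate(rs + [0.0]):  # sentinel closes a trailing run
        if score > 0 and start is None:
            start = pos
        elif score <= 0 and start is not None:
            total = sum(rs[start:pos])
            if total >= min_total:
                elements.append((start, pos, total))
            start = None
    return elements


observed = [4.5, 0.4, 0.2, 0.9, 5.0, 3.0, 4.2]  # hypothetical per-column estimates
rs = rejected_substitutions(observed)
print([(s, e, round(t, 2)) for s, e, t in candidate_elements(rs, min_total=5.0)])
# [(1, 4, 10.2)]
```

The isolated positive column at index 5 falls below the assumed minimum total, which is one crude way of trading resolution against false positives.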

Journal Article•DOI•

Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae

James E. Galagan¹, Sarah E. Calvo¹, Christina A. Cuomo¹, Li-Jun Ma¹, Jennifer R. Wortman², Serafim Batzoglou³, Su-In Lee³, Meray Baştürkmen⁴, Christina C. Spevak⁴, John Clutterbuck⁵, Vladimir V. Kapitonov⁶, Jerzy Jurka⁶, Claudio Scazzocchio⁷, Mark L. Farman⁸, Jonathan Butler¹, Seth Purcell¹, Steve Harris⁹, Gerhard H. Braus¹⁰, Oliver W. Draht¹⁰, Silke Busch¹⁰, Christophe d'Enfert¹¹, Christiane Bouchier¹¹, Gustavo H. Goldman¹², Deborah Bell-Pedersen¹³, Sam Griffiths-Jones¹⁴, John H. Doonan¹⁵, Jae-Hyuk Yu¹⁶, Kay Vienken¹⁷, Arnab Pain¹⁴, Michael Freitag¹⁸, Eric U. Selker¹⁸, David B. Archer¹⁹, Miguel A. Peñalva²⁰, Berl R. Oakley²¹, Michelle Momany²², Toshihiro Tanaka²³, Toshitaka Kumagai²⁴, Kiyoshi Asai²⁴, Masayuki Machida²⁴, William C. Nierman²⁵, David W. Denning²⁶, Mark X. Caddick²⁷, Michael J. Hynes²⁸, Mathieu Paoletti¹⁹, Reinhard Fischer¹⁷ ²⁹, Bruce L. Miller³⁰, Paul S. Dyer¹⁹, Matthew S. Sachs⁴, Stephen A. Osmani²¹, Bruce W. Birren¹•Institutions (30)

Broad Institute¹, J. Craig Venter Institute², Stanford University³, Oregon Health & Science University⁴, University of Glasgow⁵, Genetic Information Research Institute⁶, Institut Universitaire de France⁷, University of Kentucky⁸, University of Nebraska–Lincoln⁹, University of Göttingen¹⁰, Pasteur Institute¹¹, University of São Paulo¹², Texas A&M University¹³, Wellcome Trust Sanger Institute¹⁴, John Innes Centre¹⁵, University of Wisconsin-Madison¹⁶, Max Planck Society¹⁷, University of Oregon¹⁸, University of Nottingham¹⁹, Spanish National Research Council²⁰, Ohio State University²¹, University of Georgia²², Tokyo Institute of Technology²³, National Institute of Advanced Industrial Science and Technology²⁴, George Washington University²⁵, University of Manchester²⁶, University of Liverpool²⁷, University of Melbourne²⁸, Karlsruhe Institute of Technology²⁹, University of Idaho³⁰

22 Dec 2005-Nature

TL;DR: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution; a comparative study of A. nidulans with Aspergillus fumigatus and Aspergillus oryzae (used in the production of sake, miso and soy sauce) provides new insight into eukaryotic genome evolution and gene regulation.

Abstract: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.
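
An upstream open reading frame (uORF) is an ORF whose start codon lies in the 5' UTR, upstream of the annotated start codon. A minimal scan, assuming ATG starts, the standard stop codons and a transcript written in DNA letters, could look like this; `find_uorfs` is a hypothetical helper, and it does not reproduce the comparative or experimental evidence the study used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript, cds_start):
    """Find ATG-initiated ORFs that begin upstream of the main start codon.

    transcript: mRNA written in DNA letters, 5' to 3'.
    cds_start: 0-based index of the annotated start codon.
    Returns (start, stop_end) pairs, with stop_end exclusive; stop_end is None
    when no in-frame stop codon occurs before the end of the transcript.
    """
    seq = transcript.upper()
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        stop_end = None
        for pos in range(start + 3, len(seq) - 2, 3):  # walk in frame, codon by codon
            if seq[pos:pos + 3] in STOP_CODONS:
                stop_end = pos + 3
                break
        uorfs.append((start, stop_end))
    return uorfs


# Hypothetical transcript: a short uORF (ATG AAA TAG) precedes the main ATG at index 13.
print(find_uorfs("GGATGAAATAGCCATGCCCTAA", cds_start=13))  # [(2, 11)]
```

uORFs whose stop codon lies beyond the main start (overlapping uORFs) are reported too, since they are scanned to the end of the transcript.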

Journal Article•DOI•

The genome of the social amoeba Dictyostelium discoideum

Ludwig Eichinger¹, Justin A. Pachebat¹ ², Gernot Glöckner, Marie-Adèle Rajandream³, Richard Sucgang⁴, Matthew Berriman³, J. Song⁴, Rolf Olsen⁵, Karol Szafranski, Qikai Xu⁴, Budi Tunggal¹, Sarah K. Kummerfeld², Martin Madera², Bernard Anri Konfortov², Francisco Rivero¹, Alan T. Bankier², Rüdiger Lehmann, N. Hamlin³, Robert L. Davies³, Pascale Gaudet⁶, Petra Fey⁶, Karen E Pilcher⁶, Guokai Chen⁴, David L. Saunders³, Erica Sodergren⁴, P. Davis³, Arnaud Kerhornou³, X. Nie⁴, Neil Hall³, Christophe Anjard⁵, Lisa Hemphill⁴, Nathalie Bason³, Patrick Farbrother¹, Brian A. Desany⁴, Eric M. Just⁶, Takahiro Morio⁷, René Rost⁸, Carol Churcher³, J. Cooper³, Stephen F. Haydock⁹, N. van Driessche⁴, Ann Cronin³, Ian Goodhead³, Donna M. Muzny⁴, T. Mourier³, Arnab Pain³, Mingyang Lu⁴, D. Harper³, R. Lindsay⁴, Heidi Hauser³, Kylie R. James³, M. Quiles⁴, M. Madan Babu², Tsuneyuki Saito¹⁰, Carmen Buchrieser¹¹, A. Wardroper² ¹², Marius Felder, M. Thangavelu, D. Johnson³, Andrew J Knights³, H. Loulseged⁴, Karen Mungall³, Karen Oliver³, Claire Price³, Michael A. Quail³, Hideko Urushihara⁷, Judith Hernandez⁴, Ester Rabbinowitsch³, David Steffen⁴, Mandy Sanders³, Jun Ma⁴, Yuji Kohara¹³, Sarah Sharp³, Mark Simmonds³, S. Spiegler³, Adrian Tivey³, Sumio Sugano¹⁴, Brian White³, Danielle Walker³, John Woodward³, Thomas Winckler, Yoshiaki Tanaka⁷, Gad Shaulsky⁴, Michael Schleicher⁸, George M. Weinstock⁴, André Rosenthal, Edward C. Cox¹⁵, Rex L. Chisholm⁶, Richard A. Gibbs⁴, William F. Loomis⁵, Matthias Platzer, Robert R. Kay², Jeffrey G. Williams¹⁶, Paul H. Dear², Angelika A. Noegel¹, Bart Barrell³, Adam Kuspa⁴•Institutions (16)

University of Cologne¹, Laboratory of Molecular Biology², Wellcome Trust Sanger Institute³, Baylor College of Medicine⁴, University of California, San Diego⁵, Northwestern University⁶, University of Tsukuba⁷, Ludwig Maximilian University of Munich⁸, University of Cambridge⁹, Hokkaido University¹⁰, Pasteur Institute¹¹, University of York¹², National Institute of Genetics¹³, University of Tokyo¹⁴, Princeton University¹⁵, University of Dundee¹⁶

05 May 2005-Nature

TL;DR: A proteome-based phylogeny shows that the amoebozoa diverged from the animal–fungal lineage after the plant–animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.

Abstract: The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal-fungal lineage after the plant-animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.
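
Single-residue (homopolymeric) runs are the simplest kind of repetitive amino acid tract and can be located with a regular-expression backreference. The sketch below reports such runs at or above an assumed minimum length; `homopolymer_tracts` is a hypothetical helper, the test sequence is invented, and a fuller survey would also need to handle imperfect and multi-residue repeats.

```python
import re


def homopolymer_tracts(protein, min_length=10):
    """Return (start, residue, length) for single-residue runs of at least min_length.

    min_length defaults to an assumed cutoff of 10 residues.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One residue followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(protein.upper())]


protein = "MST" + "N" * 12 + "Q" * 5 + "AK"  # hypothetical sequence
print(homopolymer_tracts(protein))                # [(3, 'N', 12)]
print(homopolymer_tracts(protein, min_length=5))  # [(3, 'N', 12), (15, 'Q', 5)]
```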

Journal Article•DOI•

Polyploidy and genome evolution in plants.

Keith L. Adams¹, Jonathan F. Wendel²•Institutions (2)

University of British Columbia¹, Iowa State University²

01 Apr 2005-Current Opinion in Plant Biology

TL;DR: Evidence now shows that genes retained in duplicate typically diversify in function or undergo subfunctionalization, with some duplicate genes more prone to retention than others.

Journal Article•DOI•

Identification of microRNAs of the herpesvirus family

Sébastien Pfeffer¹, Alain Sewer², Mariana Lagos-Quintana¹, Robert L. Sheridan³, Chris Sander³, Friedrich A. Grässer, Linda F. van Dyk⁴, C. Kiong Ho³ ⁵, Stewart Shuman³, Minchen Chien, James J. Russo, Jingyue Ju⁶, Glenn Randall¹, Brett D. Lindenbach¹, Charles M. Rice¹, Viviana Simon⁷, David D. Ho⁷, Mihaela Zavolan², Thomas Tuschl¹•Institutions (7)

Rockefeller University¹, University of Basel², Memorial Sloan Kettering Cancer Center³, University of Colorado Hospital⁴, University at Buffalo⁵, Columbia University⁶, Aaron Diamond AIDS Research Center⁷

16 Feb 2005-Nature Methods

TL;DR: To identify other miRNA genes in pathogenic viruses, a new miRNA gene prediction method was combined with small-RNA cloning from several virus-infected cell types, and miRNAs were predicted in several large DNA viruses.

Abstract: Epstein-Barr virus (EBV or HHV4), a member of the human herpesvirus (HHV) family, has recently been shown to encode microRNAs (miRNAs). In contrast to most eukaryotic miRNAs, these viral miRNAs do not have close homologs in other viral genomes or in the genome of the human host. To identify other miRNA genes in pathogenic viruses, we combined a new miRNA gene prediction method with small-RNA cloning from several virus-infected cell types. We cloned ten miRNAs in the Kaposi sarcoma-associated virus (KSHV or HHV8), nine miRNAs in the mouse gammaherpesvirus 68 (MHV68) and nine miRNAs in the human cytomegalovirus (HCMV or HHV5). These miRNA genes are expressed individually or in clusters from either polymerase (pol) II or pol III promoters, and share no substantial sequence homology with one another or with the known human miRNAs. Generally, we predicted miRNAs in several large DNA viruses, and we could neither predict nor experimentally identify miRNAs in the genomes of small RNA viruses or retroviruses.

Journal Article•DOI•

Genome-wide association studies: theoretical and practical concerns

William Y.S. Wang¹, Bryan J. Barratt¹ ², David Clayton¹, John A. Todd¹•Institutions (2)

University of Cambridge¹, AstraZeneca²

01 Feb 2005-Nature Reviews Genetics

TL;DR: The main factors — including models of the allelic architecture of common diseases, sample size, map density and sample-collection biases — that need to be taken into account in order to optimize the cost efficiency of identifying genuine disease-susceptibility loci are outlined.

Abstract: To fully understand the allelic variation that underlies common diseases, complete genome sequencing for many individuals with and without disease is required. This is still not technically feasible. However, recently it has become possible to carry out partial surveys of the genome by genotyping large numbers of common SNPs in genome-wide association studies. Here, we outline the main factors - including models of the allelic architecture of common diseases, sample size, map density and sample-collection biases - that need to be taken into account in order to optimize the cost efficiency of identifying genuine disease-susceptibility loci.
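
The effect of sample size can be explored with the usual normal approximation for an allelic case-control test, in which each individual contributes two alleles. The sketch assumes equal numbers of cases and controls, a genotyped SNP that is itself the causal variant (so map density and linkage disequilibrium are ignored), and a per-test significance level of 5 × 10⁻⁸ chosen only for illustration; the allele frequencies and the helper name `allelic_test_power` are hypothetical, not values from the review.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(p_controls, p_cases, n_per_group, alpha):
    """Approximate power of a two-sided allelic (two-proportion) z-test.

    Each individual contributes two alleles, so each group has 2 * n_per_group.
    """
    std_normal = NormalDist()
    alleles = 2 * n_per_group
    se = sqrt((p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / alleles)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(p_cases - p_controls) / se
    # Probability that the statistic falls beyond the critical value in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical risk allele: frequency 0.30 in controls, 0.35 in cases.
for n in (500, 2000, 5000):
    print(n, round(allelic_test_power(0.30, 0.35, n, alpha=5e-8), 3))
```

When the two allele frequencies are equal the function returns alpha, which is a quick sanity check on the formula.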

Journal Article•DOI•

Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution

Jill Cheng¹, Philipp Kapranov¹, Jorg Drenkow¹, Sujit Dike¹, Shane Brubaker¹, Sandeep Patel¹, Jeffrey Long¹, David Stern¹, Hari Tammana¹, Gregg Helt¹, Victor Sementchenko¹, Antonio Piccolboni¹, Stefan Bekiranov¹, Dione K. Bailey¹, Madhavan Ganesh¹, Srinka Ghosh¹, Ian Bell¹, Daniela S. Gerhard, Thomas R. Gingeras¹•Institutions (1)

Affymetrix¹

20 May 2005-Science

TL;DR: The transcribed portions of the human genome are predominantly composed of interlaced networks of both poly A+ and poly A– annotated transcripts and unannotated transcripts of unknown function, which has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.

Abstract: Sites of transcription of polyadenylated and nonpolyadenylated RNAs for 10 human chromosomes were mapped at 5-base pair resolution in eight cell lines. Unannotated, nonpolyadenylated transcripts comprise the major proportion of the transcriptional output of the human genome. Of all transcribed sequences, 19.4, 43.7, and 36.9% were observed to be polyadenylated, nonpolyadenylated, and bimorphic, respectively. Half of all transcribed sequences are found only in the nucleus and for the most part are unannotated. Overall, the transcribed portions of the human genome are predominantly composed of interlaced networks of both poly A+ and poly A- annotated transcripts and unannotated transcripts of unknown function. This organization has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.

Journal Article•DOI•

Concerted and birth-and-death evolution of multigene families.

Masatoshi Nei¹, Alejandro P. Rooney•Institutions (1)

Pennsylvania State University¹

14 Nov 2005-Annual Review of Genetics

TL;DR: Until around 1990, most multigene families were thought to be subject to concerted evolution, in which all member genes of a family evolve as a unit in concert, but phylogenetic analysis of MHC and other immune system genes showed a quite different evolutionary pattern, and a new model called birth-and-death evolution was proposed.

Abstract: Until around 1990, most multigene families were thought to be subject to concerted evolution, in which all member genes of a family evolve as a unit in concert. However, phylogenetic analysis of MHC and other immune system genes showed a quite different evolutionary pattern, and a new model called birth-and-death evolution was proposed. In this model, new genes are created by gene duplication and some duplicate genes stay in the genome for a long time, whereas others are inactivated or deleted from the genome. Later investigations have shown that most non-rRNA genes including highly conserved histone or ubiquitin genes are subject to this type of evolution. However, the controversy over the two models is still continuing because the distinction between the two models becomes difficult when sequence differences are small. Unlike concerted evolution, the model of birth-and-death evolution can give some insights into the origins of new genetic systems or new phenotypic characters.
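
The birth-and-death model can be illustrated with a toy discrete-time simulation: at each step functional genes may duplicate, and every functional copy may then be inactivated into a pseudogene, deleted, or retained. The per-step probabilities, the choice that pseudogenes only accumulate (they are never deleted here), and the helper name `simulate_birth_death` are assumptions for illustration; the review does not specify this parameterization.

```python
import random


def simulate_birth_death(n_genes, steps, p_dup, p_inactivate, p_delete, seed=0):
    """Toy birth-and-death process for a single multigene family.

    Each step, every functional gene duplicates with probability p_dup; then every
    functional copy (old or new) becomes a pseudogene with probability p_inactivate,
    is deleted with probability p_delete, or is kept. Pseudogenes only accumulate.
    Returns (functional, pseudogenes) counts for the start and after each step.
    """
    rng = random.Random(seed)  # fixed seed makes runs reproducible
    functional, pseudo = n_genes, 0
    history = [(functional, pseudo)]
    for _ in range(steps):
        births = sum(rng.random() < p_dup for _ in range(functional))
        kept = 0
        for _ in range(functional + births):
            r = rng.random()
            if r < p_inactivate:
                pseudo += 1          # inactivated but still in the genome
            elif r < p_inactivate + p_delete:
                pass                 # deleted from the genome
            else:
                kept += 1
        functional = kept
        history.append((functional, pseudo))
    return history


history = simulate_birth_death(n_genes=10, steps=100, p_dup=0.02,
                               p_inactivate=0.01, p_delete=0.01, seed=1)
print(history[-1])  # (functional genes, pseudogenes) after 100 steps
```

Varying p_dup against p_inactivate + p_delete lets the toy family expand, contract or turn over while pseudogenes accumulate.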

Journal Article•DOI•

Genome sequencing and analysis of Aspergillus oryzae

Masayuki Machida¹, Kiyoshi Asai¹, Motoaki Sano¹, Toshihiro Tanaka², Toshitaka Kumagai¹, Goro Terai¹ ³, Ken Ichi Kusumoto, Toshihide Arima, Osamu Akita, Yutaka Kashiwagi, Keietsu Abe⁴, Katsuya Gomi⁴, Hiroyuki Horiuchi⁵, Katsuhiko Kitamoto⁵, Tetsuo Kobayashi⁶, Michio Takeuchi⁷, David W. Denning⁸, James E. Galagan⁹, William C. Nierman¹⁰, Jiujiang Yu¹¹, David B. Archer¹², Joan W. Bennett¹³, Deepak Bhatnagar¹¹, Thomas E. Cleveland¹¹, Natalie D. Fedorova¹⁴, Osamu Gotoh¹, Hiroshi Horikawa², Akira Hosoyama², Masayuki Ichinomiya⁵, Rie Igarashi², Kazuhiro Iwashita, Praveen R. Juvvadi⁵, Masashi Kato⁶, Yumiko Kato², Taishin Kin¹, Akira Kokubun², Hiroshi Maeda⁴, Noriko Maeyama², Jun-ichi Maruyama⁵, Hideki Nagasaki¹, Tasuku Nakajima⁴, Ken Oda, Kinya Okada¹, Ian T. Paulsen¹⁴, Kazutoshi Sakamoto, Toshihiko Sawano², Mikio Takahashi², Kumiko Takase¹, Yasunobu Terabayashi¹, Jennifer R. Wortman¹⁴, Osamu Yamada, Youhei Yamagata⁴, Hideharu Anazawa, Yoji Hata, Yoshinao Koide, Takashi Komori³, Yasuji Koyama¹⁵, Toshitaka Minetoki, Sivasundaram Suharnan, Akimitsu Tanaka, Katsumi Isono², Satoru Kuhara¹⁶, Naotake Ogasawara¹⁷, Hisashi Kikuchi²•Institutions (17)

National Institute of Advanced Industrial Science and Technology¹, National Institute of Technology and Evaluation², Intec, Inc.³, Tohoku University⁴, University of Tokyo⁵, Nagoya University⁶, Tokyo University of Agriculture and Technology⁷, University of Manchester⁸, Broad Institute⁹, George Washington University¹⁰, Agricultural Research Service¹¹, University of Nottingham¹², Tulane University¹³, J. Craig Venter Institute¹⁴, Kikkoman¹⁵, Kyushu University¹⁶, Nara Institute of Science and Technology¹⁷

22 Dec 2005-Nature

TL;DR: Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.

Abstract: The genome of Aspergillus oryzae, a fungus important for the production of traditional fermented foods and beverages in Japan, has been sequenced. The ability to secrete large amounts of proteins and the development of a transformation system have facilitated the use of A. oryzae in modern biotechnology. Although both A. oryzae and Aspergillus flavus belong to the section Flavi of the subgenus Circumdati of Aspergillus, A. oryzae, unlike A. flavus, does not produce aflatoxin, and its long history of use in the food industry has proved its safety. Here we show that the 37-megabase (Mb) genome of A. oryzae contains 12,074 genes and is expanded by 7-9 Mb in comparison with the genomes of Aspergillus nidulans and Aspergillus fumigatus. Comparison of the three aspergilli species revealed the presence of syntenic blocks and A. oryzae-specific blocks (lacking synteny with A. nidulans and A. fumigatus) in a mosaic manner throughout the genome of A. oryzae. The blocks of A. oryzae-specific sequence are enriched for genes involved in metabolism, particularly those for the synthesis of secondary metabolites. Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.

Journal Article•DOI•

A Fine-Scale Map of Recombination Rates and Hotspots Across the Human Genome

Simon Myers¹, Leonardo Bottolo¹, Colin Freeman¹, Gil McVean¹, Peter Donnelly¹•Institutions (1)

University of Oxford¹

14 Oct 2005-Science

TL;DR: A high-resolution genetic map of the human genome is presented, based on statistical analyses of genetic variation data, and more than 25,000 recombination hotspots are identified, together with motifs and sequence contexts that play a role in hotspot activity.

Abstract: Genetic maps, which document the way in which recombination rates vary over a genome, are an essential tool for many genetic analyses. We present a high-resolution genetic map of the human genome, based on statistical analyses of genetic variation data, and identify more than 25,000 recombination hotspots, together with motifs and sequence contexts that play a role in hotspot activity. Differences between the behavior of recombination rates over large (megabase) and small (kilobase) scales lead us to suggest a two-stage model for recombination in which hotspots are stochastic features, within a framework in which large-scale rates are constrained.
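
A genetic map follows from recombination rates by summing rate times physical length across intervals. The sketch below converts per-interval rates (cM/Mb) into cumulative map positions (cM) and flags intervals whose rate is at least a fixed multiple of the median as candidate hotspots; the 10-fold rule, the toy intervals and the helper names are assumptions for illustration, not the statistical hotspot test used in the paper.

```python
from itertools import accumulate
from statistics import median


def genetic_map(boundaries_bp, rates_cm_per_mb):
    """Cumulative genetic position (cM) at each interval boundary.

    boundaries_bp: sorted physical positions; interval i spans boundaries i and i + 1.
    rates_cm_per_mb: one recombination rate per interval.
    """
    if len(rates_cm_per_mb) != len(boundaries_bp) - 1:
        raise ValueError("need exactly one rate per interval")
    lengths_mb = [(b - a) / 1e6 for a, b in zip(boundaries_bp, boundaries_bp[1:])]
    increments = [rate * length for rate, length in zip(rates_cm_per_mb, lengths_mb)]
    return [0.0] + list(accumulate(increments))


def candidate_hotspots(rates_cm_per_mb, fold=10.0):
    """Indices of intervals whose rate is at least `fold` times the median (assumed rule)."""
    background = median(rates_cm_per_mb)
    return [i for i, rate in enumerate(rates_cm_per_mb) if rate >= fold * background]


# Hypothetical 2-Mb segment with a 2-kb interval of elevated rate.
boundaries = [0, 1_000_000, 1_002_000, 2_000_000]
rates = [1.0, 50.0, 0.5]
print([round(x, 3) for x in genetic_map(boundaries, rates)])  # [0.0, 1.0, 1.1, 1.599]
print(candidate_hotspots(rates))                              # [1]
```

In this toy example the 2-kb interval contributes 0.1 cM, as much map length as about 100 kb at a background rate of 1 cM/Mb, which is why hotspots dominate fine-scale maps.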
