Showing papers on "Genome published in 2006"


Journal ArticleDOI
02 Jun 2006-Science
TL;DR: Using metabolic function analyses of identified genes, the human genome is compared with the average content of previously sequenced microbial genomes, leading to the conclusion that humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes.
Abstract: The human intestinal microbiota is composed of 10^13 to 10^14 microorganisms whose collective genome ("microbiome") contains at least 100 times as many genes as our own genome. We analyzed approximately 78 million base pairs of unique DNA sequence and 2062 polymerase chain reaction-amplified 16S ribosomal DNA sequences obtained from the fecal DNAs of two healthy adults. Using metabolic function analyses of identified genes, we compared our human genome with the average content of previously sequenced microbial genomes. Our microbiome has significantly enriched metabolism of glycans, amino acids, and xenobiotics; methanogenesis; and 2-methyl-D-erythritol 4-phosphate pathway-mediated biosynthesis of vitamins and isoprenoids. Thus, humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes.

4,111 citations


Journal ArticleDOI
Gerald A. Tuskan, Stephen P. DiFazio, Stefan Jansson, Joerg Bohlmann, Igor V. Grigoriev, Uffe Hellsten, Nicholas H. Putnam, Steven G. Ralph, Stephane Rombauts, Asaf Salamov, Jacquie Schein, Lieven Sterck, Andrea Aerts, Rishikeshi Bhalerao, Rishikesh P. Bhalerao, Damien Blaudez, Wout Boerjan, Annick Brun, Amy M. Brunner, Victor Busov, Malcolm M. Campbell, John E. Carlson, Michel Chalot, Jarrod Chapman, G.-L. Chen, Dawn Cooper, Pedro M. Coutinho, Jérémy Couturier, Sarah F. Covert, Quentin C. B. Cronk, R. Cunningham, John M. Davis, Sven Degroeve, Annabelle Déjardin, Claude W. dePamphilis, John C. Detter, Bill Dirks, Inna Dubchak, Sébastien Duplessis, Jürgen Ehlting, Brian E. Ellis, Karla C Gendler, David Goodstein, Michael Gribskov, Jane Grimwood, Andrew Groover, Lee E. Gunter, Björn Hamberger, Berthold Heinze, Yrjö Helariutta, Bernard Henrissat, D. Holligan, Robert A. Holt, Wenyu Huang, N. Islam-Faridi, Steven J.M. Jones, M. Jones-Rhoades, Richard A. Jorgensen, Chandrashekhar P. Joshi, Jaakko Kangasjärvi, Jan Karlsson, Colin T. Kelleher, Robert Kirkpatrick, Matias Kirst, Annegret Kohler, Udaya C. Kalluri, Frank W. Larimer, Jim Leebens-Mack, Jean-Charles Leplé, Philip F. LoCascio, Y. Lou, Susan Lucas, Francis Martin, Barbara Montanini, Carolyn A. Napoli, David R. Nelson, C D Nelson, Kaisa Nieminen, Ove Nilsson, V. Pereda, Gary F. Peter, Ryan N. Philippe, Gilles Pilate, Alexander Poliakov, J. Razumovskaya, Paul G. Richardson, Cécile Rinaldi, Kermit Ritland, Pierre Rouzé, D. Ryaboy, Jeremy Schmutz, J. Schrader, Bo Segerman, H. Shin, Asim Siddiqui, Fredrik Sterky, Astrid Terry, Chung-Jui Tsai, Edward C. Uberbacher, Per Unneberg, Jorma Vahala, Kerr Wall, Susan R. Wessler, Guojun Yang, T. Yin, Carl J. Douglas, Marco A. Marra, Göran Sandberg, Y. Van de Peer, Daniel S. Rokhsar
15 Sep 2006-Science
TL;DR: The draft genome of the black cottonwood tree, Populus trichocarpa, is reported, with more than 45,000 putative protein-coding genes identified.
Abstract: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.

4,025 citations


Journal ArticleDOI
TL;DR: Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.
Abstract: The first wave of information from the analysis of the human genome revealed SNPs to be the main source of genetic and phenotypic human variation. However, the advent of genome-scanning technologies has now uncovered an unexpectedly large extent of what we term 'structural variation' in the human genome. This comprises microscopic and, more commonly, submicroscopic variants, which include deletions, duplications and large-scale copy-number variants - collectively termed copy-number variants or copy-number polymorphisms - as well as insertions, inversions and translocations. Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.

1,804 citations


Journal ArticleDOI
24 Mar 2006-Cell
TL;DR: A screen based on high-content imaging was developed to identify genes required for mitotic progression in human cancer cells and applied to an arrayed set of 5,000 unique shRNA-expressing lentiviruses that target 1,028 human genes, providing a widely applicable resource for loss-of-function screens.

1,760 citations


Journal ArticleDOI
26 Oct 2006-Nature
TL;DR: The genome sequence of the honeybee Apis mellifera is reported; population genetics suggests a novel African origin for the species A. mellifera and offers insights into whether Africanized bees spread throughout the New World via hybridization or displacement.
Abstract: Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and insights into whether Africanized bees spread throughout the New World via hybridization or displacement.

1,673 citations


Journal ArticleDOI
TL;DR: The most striking feature of the USA300 genome is the horizontal acquisition of a novel mobile genetic element that encodes an arginine deiminase pathway and an oligopeptide permease system that could contribute to growth and survival of USA300.

1,507 citations


Journal ArticleDOI
TL;DR: The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms, and is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data.
Abstract: The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.

1,332 citations


Journal ArticleDOI
TL;DR: These tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini.
Abstract: Mammalian promoters can be separated into two classes, conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.

1,324 citations


Journal ArticleDOI
01 Apr 2006-Obesity
TL;DR: The 12th update of the human obesity gene map is presented, which incorporates published results up to the end of October 2005, and shows putative loci on all chromosomes except Y.
Abstract: This paper presents the 12th update of the human obesity gene map, which incorporates published results up to the end of October 2005. Evidence from single-gene mutation obesity cases, Mendelian disorders exhibiting obesity as a clinical feature, transgenic and knockout murine models relevant to obesity, quantitative trait loci (QTL) from animal cross-breeding experiments, association studies with candidate genes, and linkages from genome scans is reviewed. As of October 2005, 176 human obesity cases due to single-gene mutations in 11 different genes have been reported, 50 loci related to Mendelian syndromes relevant to human obesity have been mapped to a genomic region, and causal genes or strong candidates have been identified for most of these syndromes. There are 244 genes that, when mutated or expressed as transgenes in the mouse, result in phenotypes that affect body weight and adiposity. The number of QTLs reported from animal models currently reaches 408. The number of human obesity QTLs derived from genome scans continues to grow, and we now have 253 QTLs for obesity-related phenotypes from 61 genome-wide scans. A total of 52 genomic regions harbor QTLs supported by two or more studies. The number of studies reporting associations between DNA sequence variation in specific genes and obesity phenotypes has also increased considerably, with 426 findings of positive associations with 127 candidate genes. A promising observation is that 22 genes are each supported by at least five positive studies. The obesity gene map shows putative loci on all chromosomes except Y. The electronic version of the map with links to useful publications and relevant sites can be found at http://obesitygene.pbrc.edu.

1,205 citations


Journal ArticleDOI
13 Jan 2006-Cell
TL;DR: A robust approach is described that couples chromatin immunoprecipitation (ChIP) with the paired-end ditag (PET) sequencing strategy for unbiased and precise global localization of transcription-factor binding sites (TFBS).

1,180 citations


Journal ArticleDOI
TL;DR: A high-throughput 3C approach, 3C-Carbon Copy (5C), is presented that employs microarrays or quantitative DNA sequencing using 454-technology as detection methods; it should be widely applicable for large-scale mapping of cis- and trans- interaction networks of genomic elements and for the study of higher-order chromosome structure.
Abstract: Physical interactions between genetic elements located throughout the genome play important roles in gene regulation and can be identified with the Chromosome Conformation Capture (3C) methodology. 3C converts physical chromatin interactions into specific ligation products, which are quantified individually by PCR. Here we present a high-throughput 3C approach, 3C-Carbon Copy (5C), that employs microarrays or quantitative DNA sequencing using 454-technology as detection methods. We applied 5C to analyze a 400-kb region containing the human beta-globin locus and a 100-kb conserved gene desert region. We validated 5C by detection of several previously identified looping interactions in the beta-globin locus. We also identified a new looping interaction in K562 cells between the beta-globin Locus Control Region and the gamma-beta-globin intergenic region. Interestingly, this region has been implicated in the control of developmental globin gene switching. 5C should be widely applicable for large-scale mapping of cis- and trans- interaction networks of genomic elements and for the study of higher-order chromosome structure.

Journal ArticleDOI
23 Nov 2006-Nature
TL;DR: This study characterized the in vivo enhancer activity of a large group of non-coding elements in the human genome that are conserved in human-pufferfish, Takifugu (Fugu) rubripes, or ultraconserved in human-mouse-rat.
Abstract: Identifying the sequences that direct the spatial and temporal expression of genes and defining their function in vivo remains a significant challenge in the annotation of vertebrate genomes. One major obstacle is the lack of experimentally validated training sets. In this study, we made use of extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements, and characterized the in vivo enhancer activity of a large group of non-coding elements in the human genome that are conserved in human-pufferfish, Takifugu (Fugu) rubripes, or ultraconserved in human-mouse-rat. We tested 167 of these extremely conserved sequences in a transgenic mouse enhancer assay. Here we report that 45% of these sequences functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While directing expression in a broad range of anatomical structures in the embryo, the majority of the 75 enhancers directed expression to various regions of the developing nervous system. We identified sequence signatures enriched in a subset of these elements that targeted forebrain expression, and used these features to rank all approximately 3,100 non-coding elements in the human genome that are conserved between human and Fugu. The testing of the top predictions in transgenic mice resulted in a threefold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of human gene enhancers that have been characterized in vivo, and illustrate the utility of such training sets for a variety of biological applications, including decoding the regulatory vocabulary of the human genome.
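The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration using the numbers quoted above (167 sequences tested, 45% positive); the variable names are hypothetical and not taken from the paper.

```python
# Figures quoted in the abstract above (variable names are illustrative).
tested_sequences = 167
enhancer_fraction = 0.45  # "45% of these sequences functioned ... as enhancers"

# Implied number of reproducible tissue-specific enhancers,
# which the abstract later refers to as "the 75 enhancers".
implied_enhancers = round(tested_sequences * enhancer_fraction)

print(implied_enhancers)
```

The rounded product agrees with the count of 75 enhancers mentioned later in the same abstract.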

Journal ArticleDOI
Jörg Kämper, Regine Kahmann, Michael Bölker, Li-Jun Ma, Thomas Brefort, Barry J. Saville, Flora Banuett, James W. Kronstad, Scott E. Gold, Olaf Müller, Michael H. Perlin, Han A. B. Wösten, Ronald P. de Vries, Jose Ruiz-Herrera, Cristina G. Reynaga-Peña, Karen M. Snetselaar, Michael P. McCann, José Pérez-Martín, Michael Feldbrügge, Christoph W. Basse, Gero Steinberg, José I. Ibeas, William K. Holloman, Plinio Guzmán, Mark L. Farman, Jason E. Stajich, Rafael Sentandreu, Juan Manuel González-Prieto, John C. Kennell, Lazaro Molina, Jan Schirawski, Artemio Mendoza-Mendoza, Doris Greilinger, Karin Münch, Nicole Rössel, Mario Scherer, Miroslav Vranes, Oliver Ladendorf, Volker Vincon, Uta Fuchs, Björn Sandrock, Shaowu Meng, Eric C.H. Ho, Matt J. Cahill, Kylie J. Boyce, Jana Klose, Steven J. Klosterman, Heine J. Deelstra, Lucila Ortiz-Castellanos, Weixi Li, Patricia Sánchez-Alonso, Peter Schreier, Isolde Häuser-Hahn, Martin Vaupel, Edda Koopmann, Gabi Friedrich, Hartmut Voss, Thomas Schlüter, Jonathan Margolis, Darren Mark Platt, Candace Swimmer, Andreas Gnirke, Feng Chen, Valentina Vysotskaia, Gertrud Mannhaupt, Ulrich Güldener, Martin Münsterkötter, Dirk Haase, Matthias Oesterheld, Hans-Werner Mewes, Evan Mauceli, David DeCaprio, Claire M. Wade, Jonathan Butler, Sarah Young, David B. Jaffe, Sarah E. Calvo, Chad Nusbaum, James E. Galagan, Bruce W. Birren
02 Nov 2006-Nature
TL;DR: The discovery of the secreted protein gene clusters and the functional demonstration of their decisive role in the infection process illuminate previously unknown mechanisms of pathogenicity operating in biotrophic fungi.
Abstract: Ustilago maydis is a ubiquitous pathogen of maize and a well-established model organism for the study of plant-microbe interactions. This basidiomycete fungus does not use aggressive virulence strategies to kill its host. U. maydis belongs to the group of biotrophic parasites (the smuts) that depend on living tissue for proliferation and development. Here we report the genome sequence for a member of this economically important group of biotrophic fungi. The 20.5-million-base U. maydis genome assembly contains 6,902 predicted protein-encoding genes and lacks pathogenicity signatures found in the genomes of aggressive pathogenic fungi, for example a battery of cell-wall-degrading enzymes. However, we detected unexpected genomic features responsible for the pathogenicity of this organism. Specifically, we found 12 clusters of genes encoding small secreted proteins with unknown function. A significant fraction of these genes exists in small gene families. Expression analysis showed that most of the genes contained in these clusters are regulated together and induced in infected tissue. Deletion of individual clusters altered the virulence of U. maydis in five cases, ranging from a complete lack of symptoms to hypervirulence. Despite years of research into the mechanism of pathogenicity in U. maydis, no 'true' virulence factors had been previously identified. Thus, the discovery of the secreted protein gene clusters and the functional demonstration of their decisive role in the infection process illuminate previously unknown mechanisms of pathogenicity operating in biotrophic fungi. Genomic analysis is, similarly, likely to open up new avenues for the discovery of virulence determinants in other pathogens.

Journal ArticleDOI
06 Apr 2006-Nature
TL;DR: This work uses environmental genomics—the reconstruction of genomic data directly from the environment—to assemble the genome of the uncultured anammox bacterium Kuenenia stuttgartiensis from a complex bioreactor community, and identifies candidate genes responsible for ladderane biosynthesis and biological hydrazine metabolism.
Abstract: Ten years ago a fortuitous discovery led to the identification of oceanic bacteria capable of anaerobic ammonium oxidation (anammox). It was soon recognized that the anammox reaction has great ecological significance, as it is responsible for removing up to 50% of fixed nitrogen from the oceans. The genome of the anammox bacterium Kuenenia stuttgartiensis has now been sequenced in a remarkable feat of what is called environmental genomics. Anammox bacteria grow very slowly and are not available in pure culture. For genome analysis an inoculum of wastewater sludge was grown in a bioreactor for one year, clocking up 10–15 generations. The DNA of the whole microbial community was sequenced and the genome of this one anammox bacterium was deduced from the results. With the genome sequence known, it will be possible to gain insight into the metabolism and evolution of these important bacteria. The genome of Kuenenia stuttgartiensis has been sequenced to learn more about anaerobic ammonium oxidation. Anaerobic ammonium oxidation (anammox) has become a main focus in oceanography and wastewater treatment. It is also the nitrogen cycle's major remaining biochemical enigma. Among its features, the occurrence of hydrazine as a free intermediate of catabolism, the biosynthesis of ladderane lipids and the role of cytoplasm differentiation are unique in biology. Here we use environmental genomics (the reconstruction of genomic data directly from the environment) to assemble the genome of the uncultured anammox bacterium Kuenenia stuttgartiensis from a complex bioreactor community. The genome data illuminate the evolutionary history of the Planctomycetes and allow us to expose the genetic blueprint of the organism's special properties. Most significantly, we identified candidate genes responsible for ladderane biosynthesis and biological hydrazine metabolism, and discovered unexpected metabolic versatility.

Journal ArticleDOI
10 Nov 2006-Science
TL;DR: The sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology, is reported; it yields insights into the evolution of deuterostomes.
Abstract: We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.

Journal ArticleDOI
01 Sep 2006-Science
TL;DR: Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oomycete avirulence genes.
Abstract: Draft genome sequences have been determined for the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Oomycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oomycete avirulence genes.

Journal ArticleDOI
TL;DR: An experimental approach called miRNA serial analysis of gene expression (miRAGE) is developed and used to perform the largest experimental analysis of human miRNAs to date; the results suggest that the human genome contains many more miRNAs than currently identified.
Abstract: MicroRNAs (miRNAs) are a class of small noncoding RNAs that have important regulatory roles in multicellular organisms. The public miRNA database contains 321 human miRNA sequences, 234 of which have been experimentally verified. To explore the possibility that additional miRNAs are present in the human genome, we have developed an experimental approach called miRNA serial analysis of gene expression (miRAGE) and used it to perform the largest experimental analysis of human miRNAs to date. Sequence analysis of 273,966 small RNA tags from human colorectal cells allowed us to identify 200 known mature miRNAs, 133 novel miRNA candidates, and 112 previously uncharacterized miRNA* forms. To aid in the evaluation of candidate miRNAs, we disrupted the Dicer locus in three human colorectal cancer cell lines and examined known and novel miRNAs in these cells. These studies suggest that the human genome contains many more miRNAs than currently identified and provide an approach for the large-scale experimental cloning of novel human miRNAs in human tissues.

Journal ArticleDOI
TL;DR: An overview of amino acid substitution (AAS) prediction methods, which use sequence and/or structure to predict the effect of an AAS on protein function, and the utility of AAS prediction methods for Mendelian and complex diseases as well as their broader applications for understanding protein function.
Abstract: Nonsynonymous single nucleotide polymorphisms (nsSNPs) are coding variants that introduce amino acid changes in their corresponding proteins. Because nsSNPs can affect protein function, they are believed to have the largest impact on human health compared with SNPs in other regions of the genome. Therefore, it is important to distinguish those nsSNPs that affect protein function from those that are functionally neutral. Here we provide an overview of amino acid substitution (AAS) prediction methods, which use sequence and/or structure to predict the effect of an AAS on protein function. Most methods predict approximately 25–30% of human nsSNPs to negatively affect protein function, and such nsSNPs tend to be rare in the population. We discuss the utility of AAS prediction methods for Mendelian and complex diseases as well as their broader applications for understanding protein function.

Journal ArticleDOI
TL;DR: The complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain, is determined; it indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons.
Abstract: We determined the complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain. Our analysis indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons. These mobile elements are putatively responsible for the acquisition by C. difficile of an extensive array of genes involved in antimicrobial resistance, virulence, host interaction and the production of surface structures. The metabolic capabilities encoded in the genome show multiple adaptations for survival and growth within the gut environment. The extreme genome variability was confirmed by whole-genome microarray analysis; it may reflect the organism's niche in the gut and should provide information on the evolution of virulence in this organism.

Journal ArticleDOI
TL;DR: The present bioinformatic research proposes a strategy to answer the question of how many and which proteins encoded in the human genome may require zinc for their physiological function by a combination of approaches, which include searching in the proteome for the zinc-binding patterns that are obtained from all available X-ray data.
Abstract: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, or for regulation of their activities or for structural purposes. Genome sequencing projects have provided a huge number of protein primary sequences, but, even though several different elaborate analyses and annotations have been enabled by a rich and ever-increasing portfolio of bioinformatic tools, metal-binding properties remain difficult to predict as well as to investigate experimentally. Consequently, the present knowledge about metalloproteins is only partial. The present bioinformatic research proposes a strategy to answer the question of how many and which proteins encoded in the human genome may require zinc for their physiological function. This is achieved by a combination of approaches, which include: (i) searching in the proteome for the zinc-binding patterns that, in turn, are obtained from all available X-ray data; (ii) using libraries of metal-binding protei...

Journal ArticleDOI
TL;DR: Using global transposon mutagenesis, this work identifies 382 of the 482 M. genitalium protein-coding genes as essential, plus five sets of disrupted genes that encode proteins with potentially redundant essential functions, such as phosphate transport.
Abstract: Mycoplasma genitalium has the smallest genome of any organism that can be grown in pure culture. It has a minimal metabolism and little genomic redundancy. Consequently, its genome is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. Using global transposon mutagenesis, we isolated and characterized gene disruption mutants for 100 different nonessential protein-coding genes. None of the 43 RNA-coding genes were disrupted. Herein, we identify 382 of the 482 M. genitalium protein-coding genes as essential, plus five sets of disrupted genes that encode proteins with potentially redundant essential functions, such as phosphate transport. Genes encoding proteins of unknown function constitute 28% of the essential protein-coding genes set. Disruption of some genes accelerated M. genitalium growth.
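The gene counts in this abstract are internally consistent, which a short calculation makes explicit. This is a minimal sketch that uses only the numbers quoted above; the variable names are hypothetical, and the unknown-function count is an estimate derived from the stated 28%, not a figure reported in the abstract.

```python
# Counts quoted in the abstract above (variable names are illustrative).
protein_coding_genes = 482
essential_genes = 382

# Genes that are not essential; the abstract reports disruption mutants
# for 100 different nonessential protein-coding genes.
nonessential_genes = protein_coding_genes - essential_genes

# Share of protein-coding genes identified as essential, as a whole percent.
essential_percent = round(100 * essential_genes / protein_coding_genes)

# Estimated number of essential genes of unknown function,
# derived from the 28% stated in the abstract.
unknown_function_estimate = round(0.28 * essential_genes)

print(nonessential_genes, essential_percent, unknown_function_estimate)
```

On these figures, roughly four in five protein-coding genes are essential, and about 107 of the essential genes encode proteins of unknown function.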

Journal ArticleDOI
TL;DR: The complete genome sequence of an ancient member of the green lineage, the unicellular green alga Ostreococcus tauri, is unveiled, making O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry.
Abstract: The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry.

Journal ArticleDOI
TL;DR: One or more new sequencing technologies are expected to become the mainstay of future research and to bring DNA sequencing centre stage as a routine tool in genetic research in the coming years.

Journal ArticleDOI
TL;DR: This review focuses on the monophyletic group of animal RNA viruses united in the order Nidovirales, which includes the viruses with the largest known RNA genomes, referred to in the review as ‘large’ nidoviruses.

01 Aug 2006
TL;DR: Comparative genome analysis is used to identify candidate enhancer elements in the human genome, coupled with experimental determination of their in vivo enhancer activity in transgenic mice; the resulting experimentally validated training sets are expected to provide a basis for a wide range of downstream computational and functional studies of enhancer function.
Abstract: Despite the known existence of distant-acting cis-regulatory elements in the human genome, only a small fraction of these elements has been identified and experimentally characterized in vivo. This paucity of enhancer collections with defined activities has thus hindered computational approaches for the genome-wide prediction of enhancers and their functions. To fill this void, we utilize comparative genome analysis to identify candidate enhancer elements in the human genome coupled with the experimental determination of their in vivo enhancer activity in transgenic mice (1). These data are available through the VISTA Enhancer Browser (http://enhancer.lbl.gov). This growing database currently contains over 250 experimentally tested DNA fragments, of which more than 100 have been validated as tissue-specific enhancers. For each positive enhancer, we provide digital images of whole-mount embryo staining at embryonic day 11.5 and an anatomical description of the reporter gene expression pattern. Users can retrieve elements near single genes of interest, search for enhancers that target reporter gene expression to a particular tissue, or download entire collections of enhancers with a defined tissue specificity or conservation depth. These experimentally validated training sets are expected to provide a basis for a wide range of downstream computational and functional studies of enhancer function.

Journal ArticleDOI
09 Nov 2006-Nature
TL;DR: It is shown that in the unicellular eukaryote Paramecium tetraurelia, a ciliate, most of the nearly 40,000 genes arose through at least three successive whole-genome duplications.
Abstract: The duplication of entire genomes has long been recognized as having great potential for evolutionary novelties, but the mechanisms underlying their resolution through gene loss are poorly understood. Here we show that in the unicellular eukaryote Paramecium tetraurelia, a ciliate, most of the nearly 40,000 genes arose through at least three successive whole-genome duplications. Phylogenetic analysis indicates that the most recent duplication coincides with an explosion of speciation events that gave rise to the P. aurelia complex of 15 sibling species. We observed that gene loss occurs over a long timescale, not as an initial massive event. Genes from the same metabolic pathway or protein complex have common patterns of gene loss, and highly expressed genes are over-retained after all duplications. The conclusion of this analysis is that many genes are maintained after whole-genome duplication not because of functional innovation but because of gene dosage constraints.

Journal ArticleDOI
TL;DR: This work describes a systematic method for using dense SNP genotype data to discover deletions and its application to data from the International HapMap Consortium to characterize and catalogue segregating deletion variants across the human genome.
Abstract: The locations and properties of common deletion variants in the human genome are largely unknown. We describe a systematic method for using dense SNP genotype data to discover deletions and its application to data from the International HapMap Consortium to characterize and catalogue segregating deletion variants across the human genome. We identified 541 deletion variants (94% novel) ranging from 1 kb to 745 kb in size; 278 of these variants were observed in multiple, unrelated individuals, 120 in the homozygous state. The coding exons of ten expressed genes were found to be commonly deleted, including multiple genes with roles in sex steroid metabolism, olfaction and drug response. These common deletion polymorphisms typically represent ancestral mutations that are in linkage disequilibrium with nearby SNPs, meaning that their association to disease can often be evaluated in the course of SNP-based whole-genome association studies.

Journal ArticleDOI
TL;DR: By quantifying RNA expression on both strands of the complete genome of Saccharomyces cerevisiae using a high-density oligonucleotide tiling array, this study identifies the boundary, structure, and level of coding and noncoding transcripts.
Abstract: There is abundant transcription from eukaryotic genomes unaccounted for by protein coding genes. A high-resolution genome-wide survey of transcription in a well annotated genome will help relate transcriptional complexity to function. By quantifying RNA expression on both strands of the complete genome of Saccharomyces cerevisiae using a high-density oligonucleotide tiling array, this study identifies the boundary, structure, and level of coding and noncoding transcripts. A total of 85% of the genome is expressed in rich media. Apart from expected transcripts, we found operon-like transcripts, transcripts from neighboring genes not separated by intergenic regions, and genes with complex transcriptional architecture where different parts of the same gene are expressed at different levels. We mapped the positions of 3' and 5' UTRs of coding genes and identified hundreds of RNA transcripts distinct from annotated genes. These nonannotated transcripts, on average, have lower sequence conservation and lower rates of deletion phenotype than protein coding genes. Many other transcripts overlap known genes in antisense orientation, and for these pairs global correlations were discovered: UTR lengths correlated with gene function, localization, and requirements for regulation; antisense transcripts overlapped 3' UTRs more than 5' UTRs; UTRs with overlapping antisense tended to be longer; and the presence of antisense associated with gene function. These findings may suggest a regulatory role of antisense transcription in S. cerevisiae. Moreover, the data show that even this well studied genome has transcriptional complexity far beyond current annotation.

Journal ArticleDOI
14 Jul 2006-Science
TL;DR: H-NS provides a previously unrecognized mechanism of bacterial defense against foreign DNA, enabling the acquisition of DNA from exogenous sources while avoiding detrimental consequences from unregulated expression of newly acquired genes.
Abstract: Horizontal gene transfer plays a major role in microbial evolution. However, newly acquired sequences can decrease fitness unless integrated into preexisting regulatory networks. We found that the histone-like nucleoid structuring protein (H-NS) selectively silences horizontally acquired genes by targeting sequences with GC content lower than the resident genome. Mutations in hns are lethal in Salmonella unless accompanied by compensatory mutations in other regulatory loci. Thus, H-NS provides a previously unrecognized mechanism of bacterial defense against foreign DNA, enabling the acquisition of DNA from exogenous sources while avoiding detrimental consequences from unregulated expression of newly acquired genes. Characteristic GC/AT ratios of bacterial genomes may facilitate discrimination between a cell's own DNA and foreign DNA.

Book
15 Jan 2006
TL;DR: This book discusses selfish genetic elements, including three ways to achieve "drive", within-individual kinship conflicts, rates of spread, effects on the host population, and how selfish genetic elements are studied.
Abstract: Preface
1. SELFISH GENETIC ELEMENTS Genetic Cooperation and Conflict Three Ways to Achieve "Drive" Within-Individual Kinship Conflicts Rates of Spread Effects on the Host Population The Study of Selfish Genetic Elements Design of This Book
2. AUTOSOMAL KILLERS The t Haplotype Discovery Structure of the t Haplotype History and Distribution Genetics of Drive Importance of Mating System and Gamete Competition Fate of Resistant Alleles Selection for Inversions Recessive Lethals in t Complexes Enhancers and Suppressors t and the Major Histocompatibility Complex Heterozygous (+/t) Fitness Effects: Sex Antagonistic? Accounting for t Frequencies in Nature Other Gamete Killers Segregation Distorter in Drosophila Spore Killers in Fungi Incidence of Gamete Killers Maternal-Effect Killers Medea in Flour Beetles HSR, scat+, and OmDDK in Mice The Evolution of Maternal-Effect Killers Gestational Drive? Gametophyte Factors in Plants
3. SELFISH SEX CHROMOSOMES Sex Chromosome Drive in the Diptera Killer X Chromosomes Killer Y Chromosomes Taxonomic Distribution of Killer Sex Chromosomes Evolutionary Cycles of Sex Determination Feminizing X (and Y) Chromosomes in Rodents The Varying Lemming The Wood Lemming Other Murids Other Conflicts: Sex Ratios and Mate Choice
4. GENOMIC IMPRINTING Imprinting and Parental Investment in Mammals Igf2 and Igf2r: Oppositely Imprinted, Oppositely Acting Growth Factors in Mice Growth Effects of Imprinted Genes in Mice and Humans Evolution of the Imprinting Apparatus The Mechanisms of Imprinting Involve Methylation and Are Complex Conflict Between Different Components of the Imprinting Machinery History of Conflict Reflected in the Imprinting Apparatus Evolutionary Turnover of the Imprinting Apparatus Intralocus Interactions, Polar Overdominance, and Paramutation Transmission Ratio Distortion at Imprinted Loci Biparental Imprinting and Other Possibilities Other Traits: Social Interactions after the Period of Parental Investment Maternal Behavior in Mice Inbreeding and Dispersal Kin Recognition Functional Interpretation of Tissue Effects in Chimeric Mice Deceit and Selves-Deception Imprinting and the Sex Chromosomes Genomic Imprinting in Other Taxa Flowering Plants Other Taxa Predicted to Have Imprinting
5. SELFISH MITOCHONDRIAL DNA Mitochondrial Genomics: A Primer Mitochondrial Selection within the Individual "Petite" Mutations in Yeast Within-Individual Selection and the Evolution of Uniparental Inheritance Within-Individual Selection under Uniparental Inheritance DUI: Mother-to-Daughter and Father-to-Son mtDNA Inheritance in Mussels Cytoplasmic Male Sterility Uniparental Inheritance Implies Unisexual Selection Disproportionate Role of mtDNA in Plant Male Sterility Mechanisms of Mitochondrial Action and Nuclear Reaction CMS and Restorers in Natural Populations CMS, Masculinization, and the Evolution of Separate Sexes Pollen Limitation, Frequency Dependence, and Local Extinction Resource Reallocation Versus Inbreeding Avoidance Importance of Mutational Variation CMS and Paternal Transmission Other Traces of Mito-Nuclear Conflict Mitochondria and Apoptosis Mitochondria and Germ Cell Determination Mitochondria and RNA Editing
6. GENE CONVERSION AND HOMING Biased Gene Conversion Molecular Mechanisms Effective Selection Coefficients Due to BGC in Fungi BGC and Genome Evolution BGC and Evolution of the Meiotic Machinery Homing and Retrohoming How HEGs Home HEGs Usually Associated with Self-Splicing Introns or Inteins HEGs and Host Mating System Evolutionary Cycle of Horizontal Transmission, Degeneration, and Loss HEG Domestication and Mating-Type Switching in Yeast Group II Introns Artificial HEGs As Tools for Population Genetic Engineering The Basic Construct Increasing the Load Preventing Natural Resistance and Horizontal Transmission Population Genetic Engineering Other Uses
7. TRANSPOSABLE ELEMENTS Molecular Structure and Mechanisms DNA Transposons LINEs and SINEs LTR Retroelements Population Biology and Natural Selection Transposition Rates Low But Greater Than Excision Rates Natural Selection on the Host Slows the Spread of Transposable Elements Rapid Spread of P Elements in D. melanogaster Net Reproductive Rate a Function of Transposition Rate and Effect on Host Fitness Reducing Harm to the Host Transposition Rate and Copy Number "Regulation" Selection for Self-Recognition Defective and Repressor Elements Extinction of Active Elements in Host Species Horizontal Transmission and Long-Term Persistence Transposable Elements in Inbred and Outcrossed Populations Beneficial Inserts Rates of Fixation Transposable Elements and Host Evolution Transposable Elements and Chromosomal Rearrangements Transposable Elements and Genome Size Co-Option of Transposable Element Functions and Host Defenses Transposable Elements As Parasites, Not Host Adaptations or Mutualists Origins Ancient, Chimeric, and Polyphyletic Origins
8. FEMALE DRIVE Selfish Centromeres and Female Meiosis Abnormal Chromosome 10 of Maize Other Knobs in Maize Deleterious Effects of Knobs in Maize Knobs, Supernumerary Segments, and Neocentromeres in Other Species Meiosis-Specific Centromeres and Holocentric Chromosomes Selfish Centromeres and Meiosis I The Importance of Centromere Number: Robertsonian Translocations in Mammals Sperm-Dependent Female Drive? Female Drive and Karyotype Evolution Polar Bodies Rejoining the Germline
9. B CHROMOSOMES Drive Types of Drive Genetics of A and B Factors Affecting B Drive Transmission Rates in Well-Studied Species Absence of Drive Degree of Outcrossing and Drive Effects on the Phenotype Effects on Genome Size, Cell Size, and Cell Cycle Effects on the External Phenotype Disappearance from Somatic Tissue B Number and the Odd-Even Effect Negative Effects of Bs More Pronounced under Harsher Conditions Is the Sex of Drive Associated with the Sex of Phenotypic Effect? B Effects on Recombination Among the As Pairing of A Chromosomes in Hybrids Neutral and Beneficial Bs Beneficial B Chromosomes B Chromosomes in Eyprepocnemis plorans: A Case of Continuous Neutralization? Structure and Content Size Polymorphism Heterochromatin Genes Tandem Repeats The Origin of Bs A Factors Associated with B Presence Genome Size Chromosome Number Ploidy Shape of A Chromosomes Bs and the Sex Ratio Paternal Sex Ratio (PSR) in Nasonia X-B Associations in Orthoptera Has the Drosophila Y Evolved from a B? Other Effects of Bs on the Sex Ratio Male Sterility in Plantago
10. GENOMIC EXCLUSION Paternal Genome Loss in Males, or Parahaplodiploidy PGL in Mites PGL in Scale Insects PGL in the Coffee Borer Beetle PGL in Springtails? Evolution of PGL PGL and Haplodiploidy Sciarid Chromosome System Notable Features of the Sciarid System An Evolutionary Hypothesis Mechanisms PGL in Gall Midges Hybridogenesis, or Hemiclonal Reproduction The Topminnow Poeciliopsis The Water Frog Rana esculenta The Stick Insect Bacillus rossius-grandii Evolution of Hybridogenesis Androgenesis, or Maternal Genome Loss The Conifer Cupressus dupreziana The Clam Corbicula The Stick Insect Bacillus rossius-grandii
11. SELFISH CELL LINEAGES Mosaics Somatic Cell Lineage Selection: Cancer and the Adaptive Immune System Cell Lineage Selection in the Germline Evolution of the Germline Selfish Genes and Germline-Limited DNA Chimeras Taxonomic Survey of Chimerism Somatic Chimerism and Polar Bodies
12. SUMMARY AND FUTURE DIRECTIONS Logic of Selfish Genetic Elements Molecular Genetics Selfish Genes and Sex Fate of a Selfish Gene within a Species Movement between Species Distribution among Species Role in Host Evolution The Hidden World of Selfish Genetic Elements
References Glossary Index