
Showing papers on "Genome published in 2007"


Journal ArticleDOI
14 Jun 2007-Nature
TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.
Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

5,091 citations


Journal ArticleDOI
29 Jun 2007-Cell
TL;DR: A relatively small set of miRNAs, many of which are ubiquitously expressed, account for most of the differences in miRNA profiles between cell lineages and tissues.

3,687 citations


Journal ArticleDOI
TL;DR: It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available; the results also reveal extensive gene diversity within the current concept of "species".
Abstract: DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.

3,471 citations
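
The ANI cut-off reported in the entry above can be illustrated with a minimal sketch. It assumes the shared genes have already been aligned into equal-length sequence pairs, which is a simplification (the abstract does not describe the alignment procedure, and real ANI pipelines handle gaps and partial matches). The function names and the toy sequences are hypothetical; only the 95 % threshold comes from the abstract.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def average_nucleotide_identity(gene_pairs) -> float:
    """Mean percent identity over the genes common to two strains."""
    identities = [percent_identity(a, b) for a, b in gene_pairs]
    return sum(identities) / len(identities)


# From the abstract: the 70 % DDH species cut-off corresponded to 95 % ANI.
ANI_SPECIES_CUTOFF = 95.0


def same_species(gene_pairs) -> bool:
    """Apply the ANI cut-off to a list of aligned gene pairs (hypothetical helper)."""
    return average_nucleotide_identity(gene_pairs) >= ANI_SPECIES_CUTOFF


# Toy data: one identical gene pair and one pair with a single mismatch in 10 bases.
toy_pairs = [("ACGTACGTAC", "ACGTACGTAC"), ("ACGTACGTAC", "ACGTACGTAA")]
print(average_nucleotide_identity(toy_pairs), same_species(toy_pairs))
```

Averaging per-gene identities, rather than pooling all bases, is one possible design choice; it weights every shared gene equally regardless of length.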


Journal ArticleDOI
26 Aug 2007-Nature
TL;DR: A high-quality draft of the genome sequence of grapevine is obtained from a highly homozygous genotype, revealing the contribution of three ancestral genomes to the grapevine haploid content and explaining the chronology of previously described whole-genome duplication events in the evolution of flowering plants.
Abstract: The analysis of the first plant genomes provided unexpected evidence for genome duplication events in species that had previously been considered as true diploids on the basis of their genetics. These polyploidization events may have had important consequences in plant evolution, in particular for species radiation and adaptation and for the modulation of functional capacities. Here we report a high-quality draft of the genome sequence of grapevine (Vitis vinifera) obtained from a highly homozygous genotype. The draft sequence of the grapevine genome is the fourth one produced so far for flowering plants, the second for a woody species and the first for a fruit crop (cultivated for both fruit and beverage). Grapevine was selected because of its important place in the cultural heritage of humanity beginning during the Neolithic period. Several large expansions of gene families with roles in aromatic features are observed. The grapevine genome has not undergone recent genome duplication, thus enabling the discovery of ancestral traits and features of the genetic organization of flowering plants. This analysis reveals the contribution of three ancestral genomes to the grapevine haploid content. This ancestral arrangement is common to many dicotyledonous plants but is absent from the genome of rice, which is a monocotyledon. Furthermore, we explain the chronology of previously described whole-genome duplication events in the evolution of flowering plants.

3,311 citations


Journal ArticleDOI
08 Jun 2007-Science
TL;DR: A large-scale chromatin immunoprecipitation assay based on direct ultrahigh-throughput DNA sequencing was developed, which was then used to map in vivo binding of the neuron-restrictive silencer factor (NRSF; also known as REST) to 1946 locations in the human genome.
Abstract: In vivo protein-DNA interactions connect each transcription factor with its direct targets to form a gene network scaffold. To map these protein-DNA interactions comprehensively across entire mammalian genomes, we developed a large-scale chromatin immunoprecipitation assay (ChIPSeq) based on direct ultrahigh-throughput DNA sequencing. This sequence census method was then used to map in vivo binding of the neuron-restrictive silencer factor (NRSF; also known as REST, for repressor element–1 silencing transcription factor) to 1946 locations in the human genome. The data display sharp resolution of binding position [±50 base pairs (bp)], which facilitated our finding motifs and allowed us to identify noncanonical NRSF-binding motifs. These ChIPSeq data also have high sensitivity and specificity [ROC (receiver operator characteristic) area ≥ 0.96] and statistical confidence (P <10^(–4)), properties that were important for inferring new candidate interactions. These include key transcription factors in the gene network that regulates pancreatic islet cell development.

2,789 citations


Journal ArticleDOI
TL;DR: The interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella.
Abstract: Motivation: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host. Results: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella. Availability: Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer Contact: adelcher@umiacs.umd.edu

2,738 citations
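
The idea of separating host from endosymbiont DNA with a Markov-model discriminator, as in the entry above, can be sketched as follows. This is not Glimmer's interpolated Markov model: it swaps in a plain fixed-order Markov chain with add-one smoothing and compares log-likelihoods under two trained models. The class name, the order, and the toy training sequences are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class MarkovModel:
    """Fixed-order Markov chain over nucleotides (a simplification of an IMM)."""

    def __init__(self, order: int = 2, alphabet: str = "ACGT"):
        self.order = order
        self.alphabet = alphabet
        # counts[context][next_base] = number of times next_base followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                self.counts[seq[i - self.order:i]][seq[i]] += 1

    def log_likelihood(self, seq: str) -> float:
        total = 0.0
        for i in range(self.order, len(seq)):
            ctx = self.counts.get(seq[i - self.order:i], {})
            n = sum(ctx.values())
            # Add-one smoothing so unseen transitions keep a non-zero probability.
            p = (ctx.get(seq[i], 0) + 1) / (n + len(self.alphabet))
            total += math.log(p)
        return total


def classify(seq, model_a, model_b, labels=("A", "B")):
    """Assign seq to whichever model gives it the higher log-likelihood."""
    if model_a.log_likelihood(seq) >= model_b.log_likelihood(seq):
        return labels[0]
    return labels[1]


# Toy training data standing in for host and symbiont sequence sets.
host = MarkovModel()
host.train(["ATATATATATATATATATAT"])
symbiont = MarkovModel()
symbiont.train(["GCGCGCGCGCGCGCGCGCGC"])
print(classify("ATATATAT", host, symbiont, ("host", "symbiont")))
```

An interpolated model would additionally blend predictions from several context lengths, weighting each by how much training data supports it.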


Journal Article
TL;DR: In this paper, the coding exons of the family of 518 protein kinases were sequenced in 210 cancers of diverse histological types to explore the nature of the information that will be derived from cancer genome sequencing.
Abstract: AACR Centennial Conference: Translational Cancer Medicine; Nov 4-8, 2007; Singapore. PL02-05. All cancers are due to abnormalities in DNA. The availability of the human genome sequence has led to the proposal that resequencing of cancer genomes will reveal the full complement of somatic mutations and hence all the cancer genes. To explore the nature of the information that will be derived from cancer genome sequencing we have sequenced the coding exons of the family of 518 protein kinases, ~1.3 Mb DNA per cancer sample, in 210 cancers of diverse histological types. Despite the screen being directed toward the coding regions of a gene family that has previously been strongly implicated in oncogenesis, the results indicate that the majority of somatic mutations detected are “passengers”. There is considerable variation in the number and pattern of these mutations between individual cancers, indicating substantial diversity of processes of molecular evolution between cancers. The imprints of exogenous mutagenic exposures, mutagenic treatment regimes and DNA repair defects can all be seen in the distinctive mutational signatures of individual cancers. This systematic mutation screen and others have previously yielded a number of cancer genes that are frequently mutated in one or more cancer types and which are now anticancer drug targets (for example BRAF, PIK3CA, and EGFR). However, detailed analyses of the data from our screen additionally suggest that there exist a large number of additional “driver” mutations which are distributed across a substantial number of genes. It therefore appears that cells may be able to utilise mutations in a large repertoire of potential cancer genes to acquire the neoplastic phenotype. However, many of these genes are employed only infrequently. These findings may have implications for future anticancer drug development.

2,737 citations


Journal ArticleDOI
08 Mar 2007-Nature
TL;DR: More than 1,000 somatic mutations found in 274 megabases of DNA corresponding to the coding exons of 518 protein kinase genes in 210 diverse human cancers reveal the evolutionary diversity of cancers and implicate a larger repertoire of cancer genes than previously anticipated.
Abstract: Cancers arise owing to mutations in a subset of genes that confer growth advantage. The availability of the human genome sequence led us to propose that systematic resequencing of cancer genomes for mutations would lead to the discovery of many additional cancer genes. Here we report more than 1,000 somatic mutations found in 274 megabases (Mb) of DNA corresponding to the coding exons of 518 protein kinase genes in 210 diverse human cancers. There was substantial variation in the number and pattern of mutations in individual cancers reflecting different exposures, DNA repair defects and cellular origins. Most somatic mutations are likely to be 'passengers' that do not contribute to oncogenesis. However, there was evidence for 'driver' mutations contributing to the development of the cancers studied in approximately 120 genes. Systematic sequencing of cancer genomes therefore reveals the evolutionary diversity of cancers and implicates a larger repertoire of cancer genes than previously anticipated.

2,732 citations


Journal ArticleDOI
12 Jul 2007-Nature
TL;DR: The generation and validation of a genome-wide library of Drosophila melanogaster RNAi transgenes is reported, enabling the conditional inactivation of gene function in specific tissues of the intact organism and opening up the prospect of systematically analysing gene functions in any tissue and at any stage of the Drosophila lifespan.
Abstract: Forward genetic screens in model organisms have provided important insights into numerous aspects of development, physiology and pathology. With the availability of complete genome sequences and the introduction of RNA-mediated gene interference (RNAi), systematic reverse genetic screens are now also possible. Until now, such genome-wide RNAi screens have mostly been restricted to cultured cells and ubiquitous gene inactivation in Caenorhabditis elegans. This powerful approach has not yet been applied in a tissue-specific manner. Here we report the generation and validation of a genome-wide library of Drosophila melanogaster RNAi transgenes, enabling the conditional inactivation of gene function in specific tissues of the intact organism. Our RNAi transgenes consist of short gene fragments cloned as inverted repeats and expressed using the binary GAL4/UAS system. We generated 22,270 transgenic lines, covering 88% of the predicted protein-coding genes in the Drosophila genome. Molecular and phenotypic assays indicate that the majority of these transgenes are functional. Our transgenic RNAi library thus opens up the prospect of systematically analysing gene functions in any tissue and at any stage of the Drosophila lifespan.

2,721 citations


Journal ArticleDOI
Sabeeha S. Merchant1, Simon E. Prochnik2, Olivier Vallon3, Elizabeth H. Harris4, Steven J. Karpowicz1, George B. Witman5, Astrid Terry2, Asaf Salamov2, Lillian K. Fritz-Laylin6, Laurence Maréchal-Drouard7, Wallace F. Marshall8, Liang-Hu Qu9, David R. Nelson10, Anton A. Sanderfoot11, Martin H. Spalding12, Vladimir V. Kapitonov13, Qinghu Ren, Patrick J. Ferris14, Erika Lindquist2, Harris Shapiro2, Susan Lucas2, Jane Grimwood15, Jeremy Schmutz15, Pierre Cardol3, Pierre Cardol16, Heriberto Cerutti17, Guillaume Chanfreau1, Chun-Long Chen9, Valérie Cognat7, Martin T. Croft18, Rachel M. Dent6, Susan K. Dutcher19, Emilio Fernández20, Hideya Fukuzawa21, David González-Ballester22, Diego González-Halphen23, Armin Hallmann, Marc Hanikenne16, Michael Hippler24, William Inwood6, Kamel Jabbari25, Ming Kalanon26, Richard Kuras3, Paul A. Lefebvre11, Stéphane D. Lemaire27, Alexey V. Lobanov17, Martin Lohr28, Andrea L Manuell29, Iris Meier30, Laurens Mets31, Maria Mittag32, Telsa M. Mittelmeier33, James V. Moroney34, Jeffrey L. Moseley22, Carolyn A. Napoli33, Aurora M. Nedelcu35, Krishna K. Niyogi6, Sergey V. Novoselov17, Ian T. Paulsen, Greg Pazour5, Saul Purton36, Jean-Philippe Ral7, Diego Mauricio Riaño-Pachón37, Wayne R. Riekhof, Linda A. Rymarquis38, Michael Schroda, David B. Stern39, James G. Umen14, Robert D. Willows40, Nedra F. Wilson41, Sara L. Zimmer39, Jens Allmer42, Janneke Balk18, Katerina Bisova43, Chong-Jian Chen9, Marek Eliáš44, Karla C Gendler33, Charles R. Hauser45, Mary Rose Lamb46, Heidi K. Ledford6, Joanne C. Long1, Jun Minagawa47, M. Dudley Page1, Junmin Pan48, Wirulda Pootakham22, Sanja Roje49, Annkatrin Rose50, Eric Stahlberg30, Aimee M. Terauchi1, Pinfen Yang51, Steven G. Ball7, Chris Bowler25, Carol L. Dieckmann33, Vadim N. Gladyshev17, Pamela J. Green38, Richard A. Jorgensen33, Stephen P. Mayfield29, Bernd Mueller-Roeber37, Sathish Rajamani30, Richard T. Sayre30, Peter Brokstein2, Inna Dubchak2, David Goodstein2, Leila Hornick2, Y. Wayne Huang2, Jinal Jhaveri2, Yigong Luo2, Diego Martinez2, Wing Chi Abby Ngau2, Bobby Otillar2, Alexander Poliakov2, Aaron Porter2, Lukasz Szajkowski2, Gregory Werner2, Kemin Zhou2, Igor V. Grigoriev2, Daniel S. Rokhsar2, Daniel S. Rokhsar6, Arthur R. Grossman22
University of California, Los Angeles1, United States Department of Energy2, University of Paris3, Duke University4, University of Massachusetts Medical School5, University of California, Berkeley6, Centre national de la recherche scientifique7, University of California, San Francisco8, Sun Yat-sen University9, University of Tennessee Health Science Center10, University of Minnesota11, Iowa State University12, Genetic Information Research Institute13, Salk Institute for Biological Studies14, Stanford University15, University of Liège16, University of Nebraska–Lincoln17, University of Cambridge18, Washington University in St. Louis19, University of Córdoba (Spain)20, Kyoto University21, Carnegie Institution for Science22, National Autonomous University of Mexico23, University of Münster24, École Normale Supérieure25, University of Melbourne26, University of Paris-Sud27, University of Mainz28, Scripps Research Institute29, Ohio State University30, University of Chicago31, University of Jena32, University of Arizona33, Louisiana State University34, University of New Brunswick35, University College London36, University of Potsdam37, Delaware Biotechnology Institute38, Boyce Thompson Institute for Plant Research39, Macquarie University40, Oklahoma State University Center for Health Sciences41, İzmir University of Economics42, Academy of Sciences of the Czech Republic43, Charles University in Prague44, St. Edward's University45, University of Puget Sound46, Hokkaido University47, Tsinghua University48, Washington State University49, Appalachian State University50, Marquette University51
12 Oct 2007-Science
TL;DR: Analyses of the Chlamydomonas genome advance the understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.
Abstract: Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the approximately 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.

2,554 citations


Journal ArticleDOI
Andrew G. Clark, Michael B. Eisen, Douglas Smith, and 426 more authors (70 institutions)
08 Nov 2007-Nature
TL;DR: These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution.
Abstract: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

Journal ArticleDOI
23 Feb 2007-Cell
TL;DR: Current research efforts are reviewed, with an emphasis on large-scale studies, emerging technologies, and challenges ahead.

Journal ArticleDOI
TL;DR: This study reports a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data, and defines a set of conserved protein families that occur in a wide range of eukaryotes and presents a mapping procedure that accurately identifies their exon-intron structures in a novel genomic sequence.
Abstract: Motivation: The numbers of finished and ongoing genome projects are increasing at a rapid rate, and providing the catalog of genes for these new genomes is a key challenge. Obtaining a set of well-characterized genes is a basic requirement in the initial steps of any genome annotation process. An accurate set of genes is needed in order to learn about species-specific properties, to train gene-finding programs, and to validate automatic predictions. Unfortunately, many new genome projects lack comprehensive experimental data to derive a reliable initial set of genes. Results: In this study, we report a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data. We define a set of conserved protein families that occur in a wide range of eukaryotes, and present a mapping procedure that accurately identifies their exon-intron structures in a novel genomic sequence. CEGMA includes the use of profile-hidden Markov models to ensure the reliability of the gene structures. Our procedure allows one to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even those in draft stages. Availability: Software and data sets are available online at http://korflab.ucdavis.edu/Datasets.

Journal ArticleDOI
TL;DR: A modified version of the Celera assembler is developed to facilitate the identification and comparison of alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy spans 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome.
Abstract: Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
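
The 22% non-SNP share quoted in the abstract above can be checked against the itemized counts. The sketch below uses only the five event classes the abstract enumerates with exact numbers; segmental duplications and copy number variation regions are mentioned without counts and are therefore left out, so this is an approximate consistency check rather than a reproduction of the authors' calculation.

```python
# Event counts as itemized in the abstract.
variant_counts = {
    "SNP": 3_213_401,
    "block substitution": 53_823,
    "heterozygous indel": 292_102,
    "homozygous indel": 559_473,
    "inversion": 90,
}

total = sum(variant_counts.values())
non_snp = total - variant_counts["SNP"]
non_snp_share = 100.0 * non_snp / total

print(f"{total:,} itemized events, {non_snp:,} non-SNP ({non_snp_share:.1f}%)")
```

The itemized total is about 4.12 million events, in line with the "more than 4.1 million DNA variants" stated in the abstract, and the non-SNP share rounds to 22%.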

Journal ArticleDOI
TL;DR: Nine newly explored regions of the chloroplast genome offer levels of variation better than the best regions identified in an earlier study and are therefore likely to be the best choices for molecular studies at low taxonomic levels.
Abstract: Although the chloroplast genome contains many noncoding regions, relatively few have been exploited for interspecific phylogenetic and intraspecific phylogeographic studies. In our recent evaluation of the phylogenetic utility of 21 noncoding chloroplast regions, we found the most widely used noncoding regions are among the least variable, but the more variable regions have rarely been employed. That study led us to conclude that there may be unexplored regions of the chloroplast genome that have even higher relative levels of variability. To explore the potential variability of previously unexplored regions, we compared three pairs of single-copy chloroplast genome sequences in three disparate angiosperm lineages: Atropa vs. Nicotiana (asterids); Lotus vs. Medicago (rosids); and Saccharum vs. Oryza (monocots). These three separate sequence alignments highlighted 13 mutational hotspots that may be more variable than the best regions of our former study. These 13 regions were then selected for a more detailed analysis. Here we show that nine of these newly explored regions (rpl32-trnL((UAG)), trnQ((UUG))-5'rps16, 3'trnV((UAC))-ndhC, ndhF-rpl32, psbD-trnT((GGU)), psbJ-petA, 3'rps16-5'trnK((UUU)), atpI-atpH, and petL-psbE) offer levels of variation better than the best regions identified in our earlier study and are therefore likely to be the best choices for molecular studies at low taxonomic levels.

Journal ArticleDOI
TL;DR: New insights have been gained into how silencing in eukaryotic cells has been co-opted to serve essential functions in 'host' cells, highlighting the importance of TEs in the epigenetic regulation of the genome.
Abstract: Overlapping epigenetic mechanisms have evolved in eukaryotic cells to silence the expression and mobility of transposable elements (TEs). Owing to their ability to recruit the silencing machinery, TEs have served as building blocks for epigenetic phenomena, both at the level of single genes and across larger chromosomal regions. Important progress has been made recently in understanding these silencing mechanisms. In addition, new insights have been gained into how this silencing has been co-opted to serve essential functions in 'host' cells, highlighting the importance of TEs in the epigenetic regulation of the genome.

Journal ArticleDOI
06 Jul 2007-Science
TL;DR: A comparative analysis of the draft genome of an emerging cnidarian model, the starlet sea anemone Nematostella vectensis, suggests that gene “inventions” along the lineage leading to animals were likely already well integrated with preexisting eukaryotic genes in the eumetazoan progenitor.
Abstract: Sea anemones are seemingly primitive animals that, along with corals, jellyfish, and hydras, constitute the oldest eumetazoan phylum, the Cnidaria. Here, we report a comparative analysis of the draft genome of an emerging cnidarian model, the starlet sea anemone Nematostella vectensis. The sea anemone genome is complex, with a gene repertoire, exon-intron structure, and large-scale gene linkage more similar to vertebrates than to flies or nematodes, implying that the genome of the eumetazoan ancestor was similarly complex. Nearly one-fifth of the inferred genes of the ancestor are eumetazoan novelties, which are enriched for animal functions like cell signaling, adhesion, and synaptic transmission. Analysis of diverse pathways suggests that these gene "inventions" along the lineage leading to animals were likely already well integrated with preexisting eukaryotic genes in the eumetazoan progenitor.

Journal ArticleDOI
TL;DR: ChIP-seq identified 41,582 and 11,004 putative STAT1-binding regions in stimulated and unstimulated cells, respectively, and found 24 of the 34 loci known to contain STAT1 interferon-responsive binding sites; ChIP-seq targets were enriched in sequences similar to known STAT1 binding motifs.
Abstract: We developed a method, ChIP-sequencing (ChIP-seq), combining chromatin immunoprecipitation (ChIP) and massively parallel sequencing to identify mammalian DNA sequences bound by transcription factors in vivo. We used ChIP-seq to map STAT1 targets in interferon-γ (IFN-γ)–stimulated and unstimulated human HeLa S3 cells, and compared the method's performance to ChIP-PCR and to ChIP-chip for four chromosomes. By ChIP-seq, using 15.1 and 12.9 million uniquely mapped sequence reads, and an estimated false discovery rate of less than 0.001, we identified 41,582 and 11,004 putative STAT1-binding regions in stimulated and unstimulated cells, respectively. Of the 34 loci known to contain STAT1 interferon-responsive binding sites, ChIP-seq found 24 (71%). ChIP-seq targets were enriched in sequences similar to known STAT1 binding motifs. Comparisons with two ChIP-PCR data sets suggested that ChIP-seq sensitivity was between 70% and 92% and specificity was at least 95%.

Journal ArticleDOI
13 Apr 2007-Science
TL;DR: The genome sequence of an Indian-origin Macaca mulatta female is determined and compared with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families.
Abstract: The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.

Journal ArticleDOI
19 Oct 2007-Science
TL;DR: High-throughput and massive paired-end mapping (PEM) was used to map structural variants (SVs) in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome, documenting that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function.
Abstract: Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) ∼3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.
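
The core signal behind paired-end mapping, as described in the abstract above, is that the two ends of a roughly 3-kb fragment should map roughly 3 kb apart on the reference. A minimal sketch of that idea follows. The tolerance value, the `MappedPair` type, and the single-pair classification are assumptions for illustration; the abstract does not give the actual cutoffs, and a real analysis would require several concordant supporting pairs before calling a variant.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Reference coordinates of the two ends of one sequenced fragment (hypothetical type)."""
    start: int
    end: int
    same_strand: bool = False  # True if the ends map in an unexpected relative orientation


def classify_pair(pair: MappedPair, expected_span: int = 3000, tolerance: int = 1000) -> str:
    """Label one read pair by how its mapped span compares with the fragment size."""
    if pair.same_strand:
        return "inversion candidate"
    span = pair.end - pair.start
    if span > expected_span + tolerance:
        # Ends map further apart on the reference than the fragment is long:
        # the sample is missing sequence that the reference contains.
        return "deletion candidate"
    if span < expected_span - tolerance:
        # Ends map closer together: the sample carries extra sequence.
        return "insertion candidate"
    return "concordant"


pairs = [MappedPair(1000, 4000), MappedPair(1000, 9000), MappedPair(1000, 2500)]
print([classify_pair(p) for p in pairs])
```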

Journal ArticleDOI
TL;DR: It is shown that the promoter regions (1 kb upstream of the transcription start site, TSS) of genes are significantly enriched in quadruplex motifs relative to the rest of the genome, with >40% of human gene promoters containing one or more quadruplex motifs.
Abstract: Certain G-rich DNA sequences readily form four-stranded structures called G-quadruplexes. These sequence motifs are located in telomeres as a repeated unit, and elsewhere in the genome, where their function is currently unknown. It has been proposed that G-quadruplexes may be directly involved in gene regulation at the level of transcription. In support of this hypothesis, we show that the promoter regions (1 kb upstream of the transcription start site, TSS) of genes are significantly enriched in quadruplex motifs relative to the rest of the genome, with >40% of human gene promoters containing one or more quadruplex motifs. Furthermore, these promoter quadruplexes strongly associate with nuclease hypersensitive sites identified throughout the genome via biochemical measurement. Regions of the human genome that are both nuclease hypersensitive and within promoters show a remarkable (230-fold) enrichment of quadruplex elements, compared to the rest of the genome. These quadruplex motifs identified in promoter regions also show an interesting structural bias towards more stable forms. These observations support the proposal that promoter G-quadruplexes are directly involved in the regulation of gene expression.

Journal ArticleDOI
23 Feb 2007-Cell
TL;DR: The functional relevance of spatial and temporal genome organization is discussed at three hierarchical levels: the organization of nuclear processes, the higher-order organization of the chromatin fiber, and the spatial arrangement of genomes within the cell nucleus.

Journal ArticleDOI
TL;DR: Through incorporation of multiple transcript and proteomic expression data sets, the Institute for Genomic Research has been able to annotate 24,799 genes (31,739 gene models), representing ∼50% of the total gene models, as expressed in the rice genome.
Abstract: In The Institute for Genomic Research Rice Genome Annotation project (http://rice.tigr.org), we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42,653 non-transposable element-related genes encoding 49,472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13,237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24,799 genes (31,739 gene models), representing approximately 50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads.

Journal ArticleDOI
Vishvanath Nene, Jennifer R. Wortman, Daniel Lawson, Brian J. Haas, Chinnappa D. Kodira, Zhijian Jake Tu, Brendan J. Loftus, Zhiyong Xi, Karyn Megy, Manfred Grabherr, Quinghu Ren, Evgeny M. Zdobnov, Neil F. Lobo, Kathryn S. Campbell, Susan E. Brown, Maria de Fatima Bonaldo, Jingsong Zhu, Steven P. Sinkins, David G. Hogenkamp, Paolo Amedeo, Peter Arensburger, Peter W. Atkinson, Shelby L. Bidwell, Jim Biedler, Ewan Birney, Robert V. Bruggner, Javier Costas, Monique R. Coy, Jonathan Crabtree, Matt Crawford, Becky deBruyn, David DeCaprio, Karin Eiglmeier, Eric Eisenstadt, Hamza El-Dorry, William M. Gelbart, Suely Lopes Gomes, Martin Hammond, Linda Hannick, James R. Hogan, Michael H. Holmes, David M. Jaffe, J. Spencer Johnston, Ryan C. Kennedy, Hean Koo, Saul A. Kravitz, Evgenia V. Kriventseva, David Kulp, Kurt LaButti, Eduardo Lee, Song Li, Diane D. Lovin, Chunhong Mao, Evan Mauceli, Carlos Frederico Martins Menck, Jason R. Miller, Philip Montgomery, Akio Mori, Ana L. T. O. Nascimento, Horacio Naveira, Chad Nusbaum, Sinéad B. O'Leary, Joshua Orvis, Mihaela Pertea, Hadi Quesneville, Kyanne R. Reidenbach, Yu-Hui Rogers, Charles Roth, Jennifer R. Schneider, Michael C. Schatz, Martin Shumway, Mario Stanke, Eric O. Stinson, Jose M. C. Tubio, Janice P. Vanzee, Sergio Verjovski-Almeida, Doreen Werner, Owen White, Stefan Wyder, Qiandong Zeng, Qi Zhao, Yongmei Zhao, Catherine A. Hill, Alexander S. Raikhel, Marcelo B. Soares, Dennis L. Knudson, Norman H. Lee, James E. Galagan, Steven L. Salzberg, Ian T. Paulsen, George Dimopoulos, Frank H. Collins, Bruce W. Birren, Claire M. Fraser-Liggett, David W. Severson
22 Jun 2007-Science
TL;DR: A draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, is presented; at approximately 1376 million base pairs, it is about 5 times the size of the genome of the malaria vector Anopheles gambiae.
Abstract: We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at approximately 1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of approximately 4 to 6 increase in average gene length and in sizes of intergenic regions relative to An. gambiae and Drosophila melanogaster. Nonetheless, chromosomal synteny is generally maintained among all three insects, although conservation of orthologous gene order is higher (by a factor of approximately 2) between the mosquito species than between either of them and the fruit fly. An increase in genes encoding odorant binding, cytochrome P450, and cuticle domains relative to An. gambiae suggests that members of these protein families underpin some of the biological differences between the two mosquito species.

Journal ArticleDOI
TL;DR: The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up to a full chromosome and includes assembly data, genes and gene predictions, mRNA and EST alignments, and comparative genomics, regulation, expression and variation data.
Abstract: The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up to a full chromosome and includes assembly data, genes and gene predictions, mRNA and EST alignments, and comparative genomics, regulation, expression and variation data. The database is optimized for fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. In the past year, 22 new assemblies and several new sets of human variation annotation have been released. New features include VisiGene, a fully integrated in situ hybridization image browser; phyloGif, for drawing evolutionary tree diagrams; a redesigned Custom Track feature; an expanded SNP annotation track; and many new display options. The Genome Browser, other tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.

Journal ArticleDOI
07 Jun 2007-Nature
TL;DR: A high-quality draft genome sequence of a small egg-laying freshwater teleost, medaka, revealed that eight major interchromosomal rearrangements took place in a remarkably short period of ∼50 Myr after the whole-genome duplication event in the teleost ancestor and afterwards, intriguingly, the medaka genome preserved its ancestral karyotype for more than 300 Myr.
Abstract: The medaka fish (Oryzias latipes) is a popular pet in Japan and more recently a laboratory model organism for developmental genetics and evolutionary biology. Now the medaka's genome has been sequenced and analysed by a large Japanese consortium. Cichlids and stickleback, which are emerging model systems for understanding the genetic basis of vertebrate speciation, are evolutionarily closer to medaka than zebrafish, so the medaka's genome sequence will yield valuable insights into 400 million years of vertebrate genome evolution. Teleosts comprise more than half of all vertebrate species and have adapted to a variety of marine and freshwater habitats. Their genome evolution and diversification are important subjects for the understanding of vertebrate evolution. Although draft genome sequences of two pufferfishes have been published, analysis of more fish genomes is desirable. Here we report a high-quality draft genome sequence of a small egg-laying freshwater teleost, medaka (Oryzias latipes). Medaka is native to East Asia and an excellent model system for a wide range of biology, including ecotoxicology, carcinogenesis, sex determination and developmental genetics. In the assembled medaka genome (700 megabases), which is less than half of the zebrafish genome, we predicted 20,141 genes, including ∼2,900 new genes, using 5′-end serial analysis of gene expression tag information. We found single nucleotide polymorphisms (SNPs) at an average rate of 3.42% between the two inbred strains derived from two regional populations; this is the highest SNP rate seen in any vertebrate species. Analyses based on the dense SNP information show a strict genetic separation of 4 million years (Myr) between the two populations, and suggest that differential selective pressures acted on specific gene categories. Four-way comparisons with the human, pufferfish (Tetraodon), zebrafish and medaka genomes revealed that eight major interchromosomal rearrangements took place in a remarkably short period of ∼50 Myr after the whole-genome duplication event in the teleost ancestor and afterwards, intriguingly, the medaka genome preserved its ancestral karyotype for more than 300 Myr.

Book
01 Jan 2007
TL;DR: The Origin of Eukaryotes; Genome Size and Organismal Complexity; The Human Genome; Why Population Size Matters; Three Keys to Chromosomal Integrity; The Nucleotide-composition Landscape; Mobile Genetic Elements; Genomic Expansion by Gene Duplication; Genes in Pieces; Transcription and Regulatory-region Complexity; Expansion and Contraction of Organelle Genomes
Abstract: The Origin of Eukaryotes; Genome Size and Organismal Complexity; The Human Genome; Why Population Size Matters; Three Keys to Chromosomal Integrity; The Nucleotide-composition Landscape; Mobile Genetic Elements; Genomic Expansion by Gene Duplication; Genes in Pieces; Transcription and Regulatory-region Complexity; Expansion and Contraction of Organelle Genomes; Sex Chromosome Evolution; Genomfart

Journal ArticleDOI
TL;DR: This review focuses on DNA-mediated or class 2 transposons and emphasizes how this class of elements is distinguished from other types of mobile elements in terms of their structure, amplification dynamics, and genomic effect.
Abstract: Transposable elements are mobile genetic units that exhibit broad diversity in their structure and transposition mechanisms. Transposable elements occupy a large fraction of many eukaryotic genomes and their movement and accumulation represent a major force shaping the genes and genomes of almost all organisms. This review focuses on DNA-mediated or class 2 transposons and emphasizes how this class of elements is distinguished from other types of mobile elements in terms of their structure, amplification dynamics, and genomic effect. We provide an up-to-date outlook on the diversity and taxonomic distribution of all major types of DNA transposons in eukaryotes, including Helitrons and Mavericks. We discuss some of the evolutionary forces that influence their maintenance and diversification in various genomic environments. Finally, we highlight how the distinctive biological features of DNA transposons have contributed to shape genome architecture and led to the emergence of genetic innovations in different eukaryotic lineages.

Journal ArticleDOI
19 Dec 2007-PLOS ONE
TL;DR: A high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens.
Abstract: Background. Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings. We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions. Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape.

Journal ArticleDOI
TL;DR: Phylogenetic trees from multiple methods provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales, and the plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids.
Abstract: Angiosperms are the largest and most successful clade of land plants with >250,000 species distributed in nearly every terrestrial habitat. Many phylogenetic studies have been based on DNA sequences of one to several genes, but, despite decades of intensive efforts, relationships among early diverging lineages and several of the major clades remain either incompletely resolved or weakly supported. We performed phylogenetic analyses of 81 plastid genes in 64 sequenced genomes, including 13 new genomes, to estimate relationships among the major angiosperm clades, and the resulting trees are used to examine the evolution of gene and intron content. Phylogenetic trees from multiple methods, including model-based approaches, provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales. The plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids. Resolution of relationships among the major clades of angiosperms provides the necessary framework for addressing numerous evolutionary questions regarding the rapid diversification of angiosperms. Gene and intron content are highly conserved among the early diverging angiosperms and basal eudicots, but 62 independent gene and intron losses are limited to the more derived monocot and eudicot clades. Moreover, a lineage-specific correlation was detected between rates of nucleotide substitutions, indels, and genomic rearrangements.