
Showing papers on "Genome published in 2010"


Journal Article
04 Mar 2010-Nature
TL;DR: The Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals are described, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species.
Abstract: To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

9,268 citations


Journal Article
14 Jan 2010-Nature
TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations


Journal Article
07 May 2010-Science
TL;DR: A draft sequence of the Neandertal genome shows that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.
Abstract: Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.

3,575 citations


Journal Article
25 Jun 2010-PLOS ONE
TL;DR: A new method is described for aligning two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss; it demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement and segmental gain and loss.
Abstract: Background: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms.

3,302 citations


Journal Article
Andre Franke1, Dermot P.B. McGovern2, Jeffrey C. Barrett3, Kai Wang4, Graham L. Radford-Smith5, Tariq Ahmad6, Charlie W. Lees7, Tobias Balschun1, James Lee8, Rebecca L. Roberts9, Carl A. Anderson3, Joshua C. Bis10, Suzanne Bumpstead3, David Ellinghaus1, Eleonora M. Festen11, Michel Georges12, Todd Green13, Talin Haritunians2, Luke Jostins3, Anna Latiano14, Christopher G. Mathew15, Grant W. Montgomery5, Natalie J. Prescott15, Soumya Raychaudhuri13, Jerome I. Rotter2, Philip Schumm16, Yashoda Sharma17, Lisa A. Simms5, Kent D. Taylor2, David C. Whiteman5, Cisca Wijmenga11, Robert N. Baldassano4, Murray L. Barclay9, Theodore M. Bayless18, Stephan Brand19, Carsten Büning20, Albert Cohen21, Jean Frederick Colombel22, Mario Cottone, Laura Stronati, Ted Denson23, Martine De Vos24, Renata D'Incà, Marla Dubinsky2, Cathryn Edwards25, Timothy H. Florin26, Denis Franchimont27, Richard B. Gearry9, Jürgen Glas19, Jürgen Glas28, Jürgen Glas22, André Van Gossum27, Stephen L. Guthery29, Jonas Halfvarson30, Hein W. Verspaget31, Jean-Pierre Hugot32, Amir Karban33, Debby Laukens24, Ian C. Lawrance34, Marc Lémann32, Arie Levine35, Cécile Libioulle12, Edouard Louis12, Craig Mowat36, William G. Newman37, Julián Panés, Anne M. Phillips36, Deborah D. Proctor17, Miguel Regueiro38, Richard K Russell39, Paul Rutgeerts40, Jeremy D. Sanderson41, Miquel Sans, Frank Seibold42, A. Hillary Steinhart43, Pieter C. F. Stokkers44, Leif Törkvist45, Gerd A. Kullak-Ublick46, David C. Wilson7, Thomas D. Walters43, Stephan R. Targan2, Steven R. Brant18, John D. Rioux47, Mauro D'Amato45, Rinse K. Weersma11, Subra Kugathasan48, Anne M. Griffiths43, John C. Mansfield49, Severine Vermeire40, Richard H. Duerr38, Mark S. Silverberg43, Jack Satsangi7, Stefan Schreiber1, Judy H. Cho17, Vito Annese14, Hakon Hakonarson4, Mark J. Daly13, Miles Parkes8 
TL;DR: A meta-analysis of six Crohn's disease genome-wide association studies identified 30 new susceptibility loci, and a series of in silico analyses, together with manual curation, implicated functionally interesting candidate genes within these loci, including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP.
Abstract: We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.
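The genome-wide significance cut-off quoted in the abstract (P < 5 × 10⁻⁸) is conventionally motivated as a Bonferroni correction of a 0.05 error rate over roughly one million independent tests. The sketch below only illustrates that arithmetic; the one-million figure and the function names are assumptions for illustration, not details taken from the study.

```python
# Conventional rationale for the genome-wide significance threshold:
# a Bonferroni correction of alpha = 0.05 across an assumed round
# number of independent common-variant tests.
ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000  # assumption, not from the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at alpha."""
    return alpha / n_tests


def is_genome_wide_significant(p_value: float, threshold: float) -> bool:
    """True when an association P value falls below the threshold."""
    return p_value < threshold


THRESHOLD = bonferroni_threshold(ALPHA, ASSUMED_INDEPENDENT_TESTS)

if __name__ == "__main__":
    print(f"threshold = {THRESHOLD:.1e}")
    print(is_genome_wide_significant(1e-9, THRESHOLD))
```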

2,482 citations


Journal Article
02 Jul 2010-Science
TL;DR: The design, synthesis, and assembly of the 1.08-mega-base pair Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information and its transplantation into a M. capricolum recipient cell to create new cells that are controlled only by the synthetic chromosome are reported.
Abstract: We report the design, synthesis, and assembly of the 1.08-mega-base pair Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information and its transplantation into a M. capricolum recipient cell to create new M. mycoides cells that are controlled only by the synthetic chromosome. The only DNA in the cells is the designed synthetic DNA sequence, including "watermark" sequences and other designed gene deletions and polymorphisms, and mutations acquired during the building process. The new cells have expected phenotypic properties and are capable of continuous self-replication.

2,256 citations


Journal Article
TL;DR: The new disease/drug information resource named KEGG MEDICUS can be used as a reference knowledge base for computational analysis of molecular networks, especially by integrating large-scale experimental datasets.
Abstract: Most human diseases are complex multi-factorial diseases resulting from the combination of various genetic and environmental factors. In the KEGG database resource (http://www.genome.jp/kegg/), diseases are viewed as perturbed states of the molecular system, and drugs as perturbants to the molecular system. Disease information is computerized in two forms: pathway maps and gene/molecule lists. The KEGG PATHWAY database contains pathway maps for the molecular systems in both normal and perturbed states. In the KEGG DISEASE database, each disease is represented by a list of known disease genes, any known environmental factors at the molecular level, diagnostic markers and therapeutic drugs, which may reflect the underlying molecular system. The KEGG DRUG database contains chemical structures and/or chemical components of all drugs in Japan, including crude drugs and TCM (Traditional Chinese Medicine) formulas, and drugs in the USA and Europe. This database also captures knowledge about two types of molecular networks: the interaction network with target molecules, metabolizing enzymes, other drugs, etc. and the chemical structure transformation network in the history of drug development. The new disease/drug information resource named KEGG MEDICUS can be used as a reference knowledge base for computational analysis of molecular networks, especially by integrating large-scale experimental datasets.

2,181 citations


Journal Article
TL;DR: A broad range of outcomes has resulted from the application of the same core technology: targeted genome cleavage by engineered, sequence-specific zinc finger nucleases followed by gene modification during subsequent repair.
Abstract: Reverse genetics in model organisms such as Drosophila melanogaster, Arabidopsis thaliana, zebrafish and rats, efficient genome engineering in human embryonic stem and induced pluripotent stem cells, targeted integration in crop plants, and HIV resistance in immune cells - this broad range of outcomes has resulted from the application of the same core technology: targeted genome cleavage by engineered, sequence-specific zinc finger nucleases followed by gene modification during subsequent repair. Such 'genome editing' is now established in human cells and a number of model organisms, thus opening the door to a range of new experimental and therapeutic possibilities.

2,074 citations


Journal Article
Thomas J. Hudson, Warwick Anderson, Axel Aretz, +270 more; Institutions (92)
15 Apr 2010
TL;DR: Systematic studies of more than 25,000 cancer genomes will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
Abstract: The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

2,041 citations


Journal Article
01 Apr 2010-Nature
TL;DR: It is concluded that the heritability void left by genome-wide association studies will not be accounted for by common CNVs, and 30 loci with CNVs that are candidates for influencing disease susceptibility are identified.
Abstract: Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.

1,892 citations


Journal Article
TL;DR: This protocol describes a fast and reliable method for the preparation of barcoded ("indexed") sequencing libraries for Illumina's Genome Analyzer platform, which avoids expensive library preparation kits and can be performed in a 96-well plate setup using multi-channel pipettes, requiring not more than two or three days of lab work.
Abstract: The large amount of DNA sequence data generated by high-throughput sequencing technologies often allows multiple samples to be sequenced in parallel on a single sequencing run. This is particularly true if subsets of the genome are studied rather than complete genomes. In recent years, target capture from sequencing libraries has largely replaced polymerase chain reaction (PCR) as the preferred method of target enrichment. Parallelizing target capture and sequencing for multiple samples requires the incorporation of sample-specific barcodes into sequencing libraries, which is necessary to trace back the sample source of each sequence. This protocol describes a fast and reliable method for the preparation of barcoded ("indexed") sequencing libraries for Illumina's Genome Analyzer platform. The protocol avoids expensive commercial library preparation kits and can be performed in a 96-well plate setup using multi-channel pipettes, requiring not more than two or three days of lab work. Libraries can be prepared from any type of double-stranded DNA, even if present in subnanogram quantity.
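The reason for adding sample-specific barcodes, tracing each sequence back to its source sample, can be sketched as a small demultiplexing routine. This is a minimal illustration under stated assumptions: the barcode sequences, sample names, one-mismatch tolerance and the `(index_read, sequence)` input layout are all hypothetical and are not taken from the protocol.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real index sequences would come
# from the oligo list actually used to build the libraries.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode uniquely matches, else None."""
    hits = [
        sample
        for barcode, sample in barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group (index_read, sequence) pairs by sample of origin."""
    bins = defaultdict(list)
    for index_read, sequence in reads:
        sample = assign_sample(index_read, barcodes, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append(sequence)
    return dict(bins)


if __name__ == "__main__":
    toy_reads = [("ACGTAC", "AAAA"), ("ACGTAA", "CCCC"), ("NNNNNN", "TTTT")]
    print(demultiplex(toy_reads, BARCODES))
```

Tolerating a mismatch only makes sense when the barcodes differ from each other at more positions than the tolerance allows, which is why ambiguous matches are sent to the undetermined bin here.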

Journal Article
06 Aug 2010-Cell
TL;DR: A model is proposed whereby transcription factors activate lincRNAs that serve as key repressors by physically associating with repressive complexes and modulating their localization to sets of previously active genes.

Journal Article
14 Jan 2010-Nature
TL;DR: The genomes of a malignant melanoma and a lymphoblastoid cell line from the same person are sequenced, providing the first comprehensive catalogue of somatic mutations from an individual cancer.
Abstract: All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.

Journal Article
John P. Vogel1, David F. Garvin2, Todd C. Mockler2, Jeremy Schmutz, Daniel S. Rokhsar3, Michael W. Bevan4, Kerrie Barry5, Susan Lucas5, Miranda Harmon-Smith5, Kathleen Lail5, Hope Tice5, Jane Grimwood, Neil McKenzie4, Naxin Huo6, Yong Q. Gu6, Gerard R. Lazo6, Olin D. Anderson6, Frank M. You7, Ming-Cheng Luo7, Jan Dvorak7, Jonathan M. Wright4, Melanie Febrer4, Dominika Idziak8, Robert Hasterok8, Erika Lindquist5, Mei Wang5, Samuel E. Fox2, Henry D. Priest2, Sergei A. Filichkin2, Scott A. Givan2, Douglas W. Bryant2, Jeff H. Chang2, Haiyan Wu9, Wei Wu10, An-Ping Hsia10, Patrick S. Schnable9, Anantharaman Kalyanaraman11, Brad Barbazuk12, Todd P. Michael, Samuel P. Hazen13, Jennifer N. Bragg6, Debbie Laudencia-Chingcuanco6, Yiqun Weng14, Georg Haberer, Manuel Spannagl, Klaus F. X. Mayer, Thomas Rattei15, Therese Mitros3, Sang-Jik Lee16, Jocelyn K. C. Rose16, Lukas A. Mueller16, Thomas L. York16, Thomas Wicker17, Jan P. Buchmann17, Jaakko Tanskanen18, Alan H. Schulman18, Heidrun Gundlach, Michael W. Bevan4, Antonio Costa de Oliveira19, Luciano da C. Maia19, William R. Belknap6, Ning Jiang, Jinsheng Lai9, Liucun Zhu20, Jianxin Ma20, Cheng Sun21, Ellen J. Pritham21, Jérôme Salse, Florent Murat, Michael Abrouk, Rémy Bruggmann, Joachim Messing, Noah Fahlgren2, Christopher M. Sullivan2, James C. Carrington2, Elisabeth J. Chapman, Greg D. May22, Jixian Zhai23, Matthias Ganssmann23, Sai Guna Ranjan Gurazada23, Marcelo A German23, Blake C. Meyers23, Pamela J. Green23, Ludmila Tyler3, Jiajie Wu7, James A. Thomson6, Shan Chen13, Henrik Vibe Scheller24, Jesper Harholt25, Peter Ulvskov25, Jeffrey A. Kimbrel2, Laura E. Bartley24, Peijian Cao24, Ki-Hong Jung26, Manoj Sharma24, Miguel E. Vega-Sánchez24, Pamela C. Ronald24, Chris Dardick6, Stefanie De Bodt27, Wim Verelst27, Dirk Inzé27, Maren Heese28, Arp Schnittger28, Xiaohan Yang29, Udaya C. Kalluri29, Gerald A. Tuskan29, Zhihua Hua14, Richard D. Vierstra14, Yu Cui9, Shuhong Ouyang9, Qixin Sun9, Zhiyong Liu9, Alper Yilmaz30, Erich Grotewold30, Richard Sibout31, Kian Hématy31, Grégory Mouille31, Herman Höfte31, Todd P. Michael, Jérôme Pelloux32, Devin O'Connor3, James C. Schnable3, Scott C. Rowe3, Frank G. Harmon3, Cynthia L. Cass33, John C. Sedbrook33, Mary E. Byrne4, Sean Walsh4, Janet Higgins4, Pinghua Li16, Thomas P. Brutnell16, Turgay Unver34, Hikmet Budak34, Harry Belcram, Mathieu Charles, Boulos Chalhoub, Ivan Baxter35
11 Feb 2010-Nature
TL;DR: The high-quality genome sequence will help Brachypodium reach its potential as an important model system for developing new energy and food crops and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat.
Abstract: Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

Journal Article
03 Jun 2010-Nature
TL;DR: This study demonstrates the feasibility of GWA studies in A. thaliana and suggests that the approach will be appropriate for many other organisms, particularly when inbred lines are available.
Abstract: Although pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases, genome-wide association (GWA) studies have, owing to advances in genotyping and sequencing technology, become an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available, because once these lines have been genotyped they can be phenotyped multiple times, making it possible (as well as extremely cost effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly self-fertilizing model plant known to harbour considerable genetic variation for many adaptively important traits. Our results are dramatically different from those of human GWA studies, in that we identify many common alleles of major effect, but they are also, in many cases, harder to interpret because confounding by complex genetics and population structure make it difficult to distinguish true associations from false. However, a-priori candidates are significantly over-represented among these associations as well, making many of them excellent candidates for follow-up experiments. Our study demonstrates the feasibility of GWA studies in A. thaliana and suggests that the approach will be appropriate for many other organisms.

Journal Article
TL;DR: There is such a diversity of DNA methylation profiling techniques that it can be challenging to select one, and this Review discusses the different approaches and their relative merits and introduces considerations for data analysis.
Abstract: Methylation of cytosine bases in DNA provides a layer of epigenetic control in many eukaryotes that has important implications for normal biology and disease. Therefore, profiling DNA methylation across the genome is vital to understanding the influence of epigenetics. There has been a revolution in DNA methylation analysis technology over the past decade: analyses that previously were restricted to specific loci can now be performed on a genome-scale and entire methylomes can be characterized at single-base-pair resolution. However, there is such a diversity of DNA methylation profiling techniques that it can be challenging to select one. This Review discusses the different approaches and their relative merits and introduces considerations for data analysis.

Journal Article
TL;DR: GERP++ is an efficient and effective tool that provides both nucleotide- and element-level constraint scores within deep multiple sequence alignments; it predicts a higher constrained fraction of the human genome than earlier estimates, largely due to the annotation of longer constrained elements, which improves one-to-one correspondence between predicted elements and known functional sequences.
Abstract: Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves one to one correspondence between predicted elements with known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
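The generic bottom-up idea the abstract starts from (score each alignment position, then call segments of contiguous, highly scoring positions) can be sketched as follows. This is the simple thresholding baseline only, not GERP++'s maximum likelihood scoring or its dynamic programming step; the function name, score threshold and minimum length are illustrative assumptions.

```python
def constrained_segments(scores, min_score=2.0, min_length=3):
    """Return half-open (start, end) runs of contiguous high-scoring positions.

    scores: per-position constraint scores along an alignment.
    min_score and min_length are illustrative values, not published ones.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= min_score:
            if start is None:
                start = i  # open a new candidate run
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None  # close the run at the first low score
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


if __name__ == "__main__":
    toy_scores = [0.1, 2.5, 3.0, 2.2, 0.0, 2.9, 2.8, -1.0, 2.1, 2.4, 2.6, 3.1]
    print(constrained_segments(toy_scores))
```

A fixed threshold like this breaks an element at any single low-scoring position, which is one motivation for evaluating a richer set of candidate breakpoints as the abstract describes.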

Journal Article
18 Mar 2010-Nature
TL;DR: Comparison of genomes of three phenotypically diverse Fusarium species revealed lineage-specific genomic regions in F. oxysporum that include four entire chromosomes and account for more than one-quarter of the genome, putting the evolution of fungal pathogenicity into a new perspective.
Abstract: Fusarium species are among the most important phytopathogenic and toxigenic fungi. To understand the molecular underpinnings of pathogenicity in the genus Fusarium, we compared the genomes of three phenotypically diverse species: Fusarium graminearum, Fusarium verticillioides and Fusarium oxysporum f. sp. lycopersici. Our analysis revealed lineage-specific (LS) genomic regions in F. oxysporum that include four entire chromosomes and account for more than one-quarter of the genome. LS regions are rich in transposons and genes with distinct evolutionary profiles but related to pathogenicity, indicative of horizontal acquisition. Experimentally, we demonstrate the transfer of two LS chromosomes between strains of F. oxysporum, converting a non-pathogenic strain into a pathogen. Transfer of LS chromosomes between otherwise genetically isolated strains explains the polyphyletic origin of host specificity and the emergence of new pathogenic lineages in F. oxysporum. These findings put the evolution of fungal pathogenicity into a new perspective.

Journal Article
01 Jan 2010-Science
TL;DR: A genome sequencing platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs is described.
Abstract: Genome sequencing of large numbers of individuals promises to advance the understanding, treatment, and prevention of human diseases, among other applications. We describe a genome sequencing platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs. We sequenced three human genomes with this platform, generating an average of 45- to 87-fold coverage per genome and identifying 3.2 to 4.5 million sequence variants per genome. Validation of one genome data set demonstrates a sequence accuracy of about 1 false variant per 100 kilobases. The high accuracy, affordable cost of $4400 for sequencing consumables, and scalability of this platform enable complete human genome sequencing for the detection of rare variants in large-scale genetic studies.

Journal Article
Stephen Richards, Richard A. Gibbs, Nicole M. Gerardo, Nancy A. Moran, +220 more; Institutions (58)
TL;DR: The genome of the pea aphid shows remarkable levels of gene duplication and equally remarkable gene absences that shed light on aspects of aphid biology, most especially its symbiosis with Buchnera.
Abstract: Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.

Journal Article
TL;DR: This work investigates state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH and finds that some distance formulas are very robust against missing fractions of genomic information.
Abstract: The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis of inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones regarding the correlation with and error ratios in emulating DDH. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.
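To make the idea of deriving an intergenomic distance from high-scoring segment pairs (HSPs) concrete, the sketch below computes two simple distances from a list of HSPs. The formulas are plausible illustrations chosen for this example and are not claimed to be the ones evaluated in the paper; the `(aligned_length, identical_positions)` input layout and the assumption that HSPs do not overlap are also hypothetical.

```python
def hsp_distances(hsps_xy, hsps_yx, len_x, len_y):
    """Two illustrative genome-to-genome distances from HSP summaries.

    hsps_xy / hsps_yx: lists of (aligned_length, identical_positions) for
    genome x searched against y and vice versa, assumed non-overlapping.
    len_x / len_y: genome lengths in base pairs.
    Returns (coverage_distance, identity_distance), each in [0, 1].
    """
    all_hsps = list(hsps_xy) + list(hsps_yx)
    total_hsp_length = sum(length for length, _ in all_hsps)
    total_identities = sum(identical for _, identical in all_hsps)

    # Fraction of both genomes NOT covered by any HSP.
    coverage_distance = 1.0 - total_hsp_length / (len_x + len_y)

    # Fraction of mismatching positions within the HSPs themselves;
    # with no HSPs at all the genomes are treated as maximally distant.
    if total_hsp_length == 0:
        identity_distance = 1.0
    else:
        identity_distance = 1.0 - total_identities / total_hsp_length
    return coverage_distance, identity_distance


if __name__ == "__main__":
    toy = [(400, 380), (100, 90)]
    print(hsp_distances(toy, toy, len_x=1000, len_y=1000))
```

The identity-based variant depends only on the aligned regions, so it is the kind of formula one would expect to tolerate incompletely sequenced genomes better than a coverage-based one.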

Journal Article
TL;DR: An algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities is described and its accuracy assessed; applying the new method could add several thousands of new genes to existing annotations of several human and mouse gut metagenomes.
Abstract: We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. Original version of the method was proposed in 1999 and has been used since for (i) reconstructing codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With advent of new prokaryotic genomes en masse it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. Also, we show that as a result of application of the new method, several thousands of new genes could be added to existing annotations of several human and mouse gut metagenomes.
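The core idea of estimating model parameters from the dependency between oligonucleotide frequencies and genome nucleotide composition can be sketched with the simplest possible case: a degree-one polynomial relating one codon's frequency to genome GC content. Everything concrete here is an assumption for illustration: the training values are toy numbers, not measurements, and the paper's actual polynomial and logistic approximations are not reproduced.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_frequency(gc_content, slope, intercept):
    """Predicted codon frequency for a fragment, clamped to [0, 1]."""
    return min(1.0, max(0.0, slope * gc_content + intercept))


# Toy training data (hypothetical): genome GC content of reference genomes
# and the frequency of one codon in their protein-coding regions.
TOY_GC = [0.30, 0.40, 0.50, 0.60, 0.70]
TOY_CODON_FREQ = [0.010, 0.015, 0.020, 0.025, 0.030]

if __name__ == "__main__":
    slope, intercept = fit_line(TOY_GC, TOY_CODON_FREQ)
    # Estimate the codon frequency for a short fragment with 55% GC.
    print(predict_frequency(0.55, slope, intercept))
```

In this picture, a short anonymous fragment needs only its GC content to obtain a full set of codon-frequency parameters, which is what makes the approach usable when the sequence is too short for self-training.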

Journal ArticleDOI
TL;DR: This Review focuses on the methodological considerations for characterizing somatic genome alterations in cancer and the future prospects for these approaches.
Abstract: Cancer is fundamentally a disease of the genome, and so high-throughput sequencing technologies offer great potential for improving our understanding of the biology and treatment of cancer. Experimental strategies, computational approaches and cancer-specific considerations for detecting different types of genomic alterations are discussed. Cancers are caused by the accumulation of genomic alterations. Therefore, analyses of cancer genome sequences and structures provide insights for understanding cancer biology, diagnosis and therapy. The application of second-generation DNA sequencing technologies (also known as next-generation sequencing), through whole-genome, whole-exome and whole-transcriptome approaches, is allowing substantial advances in cancer genomics. These methods are facilitating an increase in the efficiency and resolution of detection of each of the principal types of somatic cancer genome alterations, including nucleotide substitutions, small insertions and deletions, copy number alterations, chromosomal rearrangements and microbial infections. This Review focuses on the methodological considerations for characterizing somatic genome alterations in cancer and the future prospects for these approaches.

Journal ArticleDOI
Ruiqiang Li, Wei Fan, Geng Tian, Hongmei Zhu, Lin He, Jing Cai, Quanfei Huang, Qingle Cai, Bo Li, Yinqi Bai, Zhihe Zhang, Ya-Ping Zhang, Wen Wang, Jun Li, Fuwen Wei, Heng Li, Min Jian, Jianwen Li, Zhaolei Zhang, Rasmus Nielsen, Dawei Li, Wanjun Gu, Zhentao Yang, Zhaoling Xuan, Oliver A. Ryder, Frederick C. Leung, Yan Zhou, Jianjun Cao, Xiao Sun, Yonggui Fu, Xiaodong Fang, Xiaosen Guo, Bo Wang, Rong Hou, Fujun Shen, Bo Mu, Peixiang Ni, Runmao Lin, Wubin Qian, Guo-Dong Wang, Chang Yu, Wenhui Nie, Jinhuan Wang, Zhigang Wu, Huiqing Liang, Jiumeng Min, Qi Wu, Shifeng Cheng, Jue Ruan, Mingwei Wang, Zhongbin Shi, Ming Wen, Binghang Liu, Xiaoli Ren, Huisong Zheng, Dong Dong, Kathleen Cook, Gao Shan, Hao Zhang, Carolin Kosiol, Xueying Xie, Zuhong Lu, Hancheng Zheng, Yingrui Li, Cynthia C. Steiner, Tommy Tsan-Yuk Lam, Siyuan Lin, Qinghui Zhang, Guoqing Li, Jing Tian, Timing Gong, Hongde Liu, Dejin Zhang, Lin Fang, Chen Ye, Juanbin Zhang, Wenbo Hu, Anlong Xu, Yuanyuan Ren, Guojie Zhang, Michael William Bruford, Qibin Li, Lijia Ma, Yiran Guo, Na An, Yujie Hu, Yang Zheng, Yongyong Shi, Zhiqiang Li, Qing Liu, Yanling Chen, Jing Zhao, Ning Qu, Shancen Zhao, Feng Tian, Xiaoling Wang, Haiyin Wang, Lizhi Xu, Xiao Liu, Tomas Vinar, Yajun Wang, Tak-Wah Lam, Siu-Ming Yiu, Shiping Liu, Hemin Zhang, Desheng Li, Yan Huang, Xia Wang, Guohua Yang, Zhi Jiang, Junyi Wang, Nan Qin, Li Li, Jingxiang Li, Lars Bolund, Karsten Kristiansen, Gane Ka-Shu Wong, Maynard V. Olson, Xiuqing Zhang, Songgang Li, Huanming Yang, Jing Wang, Jun Wang
21 Jan 2010-Nature
TL;DR: Using next-generation sequencing technology alone, a draft sequence of the giant panda genome is generated and assembled, indicating that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition.
Abstract: Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.

Journal ArticleDOI
TL;DR: Phenomics should be recognized and pursued as an independent discipline to enable the development and adoption of high-throughput and high-dimensional phenotyping.
Abstract: A key goal of biology is to understand phenotypic characteristics, such as health, disease and evolutionary fitness. Phenotypic variation is produced through a complex web of interactions between genotype and environment, and such a 'genotype-phenotype' map is inaccessible without the detailed phenotypic data that allow these interactions to be studied. Despite this need, our ability to characterize phenomes - the full set of phenotypes of an individual - lags behind our ability to characterize genomes. Phenomics should be recognized and pursued as an independent discipline to enable the development and adoption of high-throughput and high-dimensional phenotyping.

Journal ArticleDOI
Sushmita Roy, Jason Ernst, Peter V. Kharchenko, Pouya Kheradpour, Nicolas Nègre, Matthew L. Eaton, Jane M. Landolin, Christopher A. Bristow, Lijia Ma, Michael F. Lin, Stefan Washietl, Bradley I. Arshinoff, Ferhat Ay, Patrick E. Meyer, Nicolas Robine, Nicole L. Washington, Luisa Di Stefano, Eugene Berezikov, Christopher D. Brown, Rogerio Candeias, Joseph W. Carlson, Adrian Carr, Irwin Jungreis, Daniel Marbach, Rachel Sealfon, Michael Y. Tolstorukov, Sebastian Will, Artyom A. Alekseyenko, Carlo G. Artieri, Benjamin W. Booth, Angela N. Brooks, Qi Dai, Carrie A. Davis, Michael O. Duff, X. Feng, Andrey A. Gorchakov, Tingting Gu, Jorja G. Henikoff, Philipp Kapranov, Renhua Li, Heather K. MacAlpine, John H. Malone, Aki Minoda, Jared T. Nordman, Katsutomo Okamura, Marc D. Perry, Sara K. Powell, Nicole C. Riddle, Akiko Sakai, Anastasia Samsonova, Jeremy E. Sandler, Yuri B. Schwartz, Noa Sher, Rebecca Spokony, David Sturgill, Marijke J. van Baren, Kenneth H. Wan, Li Yang, Charles Yu, Elise A. Feingold, Peter J. Good, Mark S. Guyer, Rebecca F. Lowdon, Kami Ahmad, Justen Andrews, Bonnie Berger, Steven E. Brenner, Michael R. Brent, Lucy Cherbas, Sarah C. R. Elgin, Thomas R. Gingeras, Robert L. Grossman, Roger A. Hoskins, Thomas C. Kaufman, W. J. Kent, Mitzi I. Kuroda, Terry L. Orr-Weaver, Norbert Perrimon, Vincenzo Pirrotta, James W. Posakony, Bing Ren, Steven Russell, Peter Cherbas, Brenton R. Graveley, Suzanna E. Lewis, Gos Micklem, Brian Oliver, Peter J. Park, Susan E. Celniker, Steven Henikoff, Gary H. Karpen, Eric C. Lai, David M. MacAlpine, Lincoln Stein, Kevin P. White, Manolis Kellis
24 Dec 2010-Science
TL;DR: The Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project maps transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines, more than tripling the annotated portion of the Drosophila genome.
Abstract: To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

Journal ArticleDOI
11 Mar 2010-Nature
TL;DR: A novel differential approach (dRNA-seq) selective for the 5′ end of primary transcripts is used to build a genome-wide map of H. pylori transcriptional start sites and operons, revealing hundreds of start sites within operons and opposite to annotated genes and establishing a paradigm for mapping and annotating the primary transcriptomes of many living species.
Abstract: Genome sequencing of Helicobacter pylori has revealed the potential proteins and genetic diversity of this prevalent human pathogen, yet little is known about its transcriptional organization and noncoding RNA output. Massively parallel cDNA sequencing (RNA-seq) has been revolutionizing global transcriptomic analysis. Here, using a novel differential approach (dRNA-seq) selective for the 5' end of primary transcripts, we present a genome-wide map of H. pylori transcriptional start sites and operons. We discovered hundreds of transcriptional start sites within operons, and opposite to annotated genes, indicating that complexity of gene expression from the small H. pylori genome is increased by uncoupling of polycistrons and by genome-wide antisense transcription. We also discovered an unexpected number of approximately 60 small RNAs including the epsilon-subdivision counterpart of the regulatory 6S RNA and associated RNA products, and potential regulators of cis- and trans-encoded target messenger RNAs. Our approach establishes a paradigm for mapping and annotating the primary transcriptomes of many living species.

Journal ArticleDOI
30 Apr 2010-Science
TL;DR: Whole-genome sequencing of a family of four narrowed the candidate genes for two recessive Mendelian disorders, Miller syndrome and primary ciliary dyskinesia, to only four, demonstrating the value of complete genome sequencing in families.
Abstract: We analyzed the whole-genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors (resulting in >99.999% accuracy), and identify very rare single-nucleotide polymorphisms. We also directly estimated a human intergeneration mutation rate of approximately 1.1 × 10^-8 per position per haploid genome. Both offspring in this family have two recessive disorders: Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the value of complete genome sequencing in families.

Journal ArticleDOI
20 May 2010-Nature
TL;DR: A method to globally capture intra- and inter-chromosomal interactions is developed and applied to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae, which recapitulates known features of genome organization, thereby validating the method, and identifies new features.
Abstract: Layered on top of information conveyed by DNA sequence and chromatin are higher order structures that encompass portions of chromosomes, entire chromosomes, and even whole genomes. Interphase chromosomes are not positioned randomly within the nucleus, but instead adopt preferred conformations. Disparate DNA elements co-localize into functionally defined aggregates or 'factories' for transcription and DNA replication. In budding yeast, Drosophila and many other eukaryotes, chromosomes adopt a Rabl configuration, with arms extending from centromeres adjacent to the spindle pole body to telomeres that abut the nuclear envelope. Nonetheless, the topologies and spatial relationships of chromosomes remain poorly understood. Here we developed a method to globally capture intra- and inter-chromosomal interactions, and applied it to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae. The map recapitulates known features of genome organization, thereby validating the method, and identifies new features. Extensive regional and higher order folding of individual chromosomes is observed. Chromosome XII exhibits a striking conformation that implicates the nucleolus as a formidable barrier to interaction between DNA sequences at either end. Inter-chromosomal contacts are anchored by centromeres and include interactions among transfer RNA genes, among origins of early DNA replication and among sites where chromosomal breakpoints occur. Finally, we constructed a three-dimensional model of the yeast genome. Our findings provide a glimpse of the interface between the form and function of a eukaryotic genome.

Journal ArticleDOI
TL;DR: Recent progress in plant abiotic stress studies, especially in the post-genomic era, is summarized and new perspectives on research directions for the next decade are offered; the complete Arabidopsis genome sequence has facilitated access to essential information for all genes.
Abstract: Understanding abiotic stress responses in plants is an important and challenging topic in plant research. Physiological and molecular biological analyses have allowed us to draw a picture of abiotic stress responses in various plants, and determination of the Arabidopsis genome sequence has had a great impact on this research field. The availability of the complete genome sequence has facilitated access to essential information for all genes, e.g. gene products and their function, transcript levels, putative cis-regulatory elements, and alternative splicing patterns. These data have been obtained from comprehensive transcriptome analyses and studies using full-length cDNA collections and T-DNA- or transposon-tagged mutant lines, which were also enhanced by genome sequence information. Moreover, studies on novel regulatory mechanisms involving use of small RNA molecules, chromatin modulation and genomic DNA modification have enabled us to recognize that plants have evolved complicated and sophisticated systems in response to complex abiotic stresses. Integrated data obtained with various 'omics' approaches have provided a more comprehensive picture of abiotic stress responses. In addition, research on stress responses in various plant species other than Arabidopsis has increased our knowledge regarding the mechanisms of plant stress tolerance in nature. Based on this progress, improvements in crop stress tolerance have been attempted by means of gene transfer and marker-assisted breeding. In this review, we summarize recent progress in abiotic stress studies, especially in the post-genomic era, and offer new perspectives on research directions for the next decade.