
Showing papers by "J. Craig Venter Institute" published in 2007


Journal Article
29 Jun 2007-Cell
TL;DR: A relatively small set of miRNAs, many of which are ubiquitously expressed, account for most of the differences in miRNA profiles between cell lineages and tissues.

3,687 citations


Journal Article
Andrew G. Clark1, Michael B. Eisen2,3, Douglas Smith, and 426 more authors (70 institutions)
08 Nov 2007-Nature
TL;DR: These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution.
Abstract: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

2,057 citations


Journal Article
TL;DR: A metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition, which yielded an extensive dataset consisting of 7.7 million sequencing reads.
Abstract: The world's oceans contain a complex mixture of micro-organisms that are for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed "fragment recruitment," addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed "extreme assembly," made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference. 
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.

1,982 citations


Journal Article
TL;DR: A modified version of the Celera assembler is developed to facilitate the identification and comparison of alternate alleles within this individual diploid genome, and a novel haplotype assembly strategy is used to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome.
Abstract: Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
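
The 22% non-SNP share quoted in this abstract can be checked against the variant counts it lists. The short sketch below is only an arithmetic check, under the assumption that the share is taken over the five enumerated event classes (segmental duplications and copy number variation regions are not counted in the abstract, so they are left out here); the dictionary keys are illustrative names, not terms from the paper.

```python
# Variant counts as listed in the abstract; key names are illustrative.
variant_counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}

total_events = sum(variant_counts.values())
non_snp_events = total_events - variant_counts["snp"]

# Assumption: the share is computed over these five classes only.
non_snp_share = non_snp_events / total_events

print(f"{total_events:,} events, non-SNP share {non_snp_share:.1%}")
```

Under that assumption the total comes to a little over 4.1 million events, consistent with the "more than 4.1 million DNA variants" stated in the abstract.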

1,843 citations


Journal Article
13 Apr 2007-Science
TL;DR: The genome sequence of an Indian-origin Macaca mulatta female is determined and compared with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families.
Abstract: The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.

1,297 citations


Journal Article
Vishvanath Nene1, Jennifer R. Wortman1, Daniel Lawson, Brian J. Haas1, Chinnappa D. Kodira2, Zhijian Jake Tu3, Brendan J. Loftus, Zhiyong Xi4, Karyn Megy, Manfred Grabherr2, Quinghu Ren1, Evgeny M. Zdobnov, Neil F. Lobo5, Kathryn S. Campbell6, Susan E. Brown7, Maria de Fatima Bonaldo8, Jingsong Zhu9, Steven P. Sinkins10, David G. Hogenkamp11, Paolo Amedeo1, Peter Arensburger9, Peter W. Atkinson9, Shelby L. Bidwell1, Jim Biedler3, Ewan Birney, Robert V. Bruggner5, Javier Costas, Monique R. Coy3, Jonathan Crabtree1, Matt Crawford2, Becky deBruyn5, David DeCaprio2, Karin Eiglmeier12, Eric Eisenstadt1, Hamza El-Dorry13, William M. Gelbart6, Suely Lopes Gomes13, Martin Hammond, Linda Hannick1, James R. Hogan5, Michael H. Holmes1, David M. Jaffe2, J. Spencer Johnston, Ryan C. Kennedy5, Hean Koo1, Saul A. Kravitz, Evgenia V. Kriventseva14, David Kulp15, Kurt LaButti2, Eduardo Lee1, Song Li3, Diane D. Lovin5, Chunhong Mao3, Evan Mauceli2, Carlos Frederico Martins Menck13, Jason R. Miller1, Philip Montgomery2, Akio Mori5, Ana L. T. O. Nascimento16, Horacio Naveira17, Chad Nusbaum2, Sinéad B. O'Leary2, Joshua Orvis1, Mihaela Pertea, Hadi Quesneville, Kyanne R. Reidenbach11, Yu-Hui Rogers, Charles Roth12, Jennifer R. Schneider5, Michael C. Schatz, Martin Shumway1, Mario Stanke, Eric O. Stinson5, Jose M. C. Tubio, Janice P. Vanzee11, Sergio Verjovski-Almeida13, Doreen Werner18, Owen White1, Stefan Wyder14, Qiandong Zeng2, Qi Zhao1, Yongmei Zhao1, Catherine A. Hill11, Alexander S. Raikhel9, Marcelo B. Soares8, Dennis L. Knudson7, Norman H. Lee, James E. Galagan2, Steven L. Salzberg, Ian T. Paulsen1, George Dimopoulos4, Frank H. Collins5, Bruce W. Birren2, Claire M. Fraser-Liggett, David W. Severson5 
22 Jun 2007-Science
TL;DR: Presents a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at approximately 1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae.
Abstract: We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at approximately 1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of approximately 4 to 6 increase in average gene length and in sizes of intergenic regions relative to An. gambiae and Drosophila melanogaster. Nonetheless, chromosomal synteny is generally maintained among all three insects, although conservation of orthologous gene order is higher (by a factor of approximately 2) between the mosquito species than between either of them and the fruit fly. An increase in genes encoding odorant binding, cytochrome P450, and cuticle domains relative to An. gambiae suggests that members of these protein families underpin some of the biological differences between the two mosquito species.

1,107 citations


Journal Article
TL;DR: Sequence similarity clustering of proteins from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling sequences adds a great deal of diversity to known protein families and sheds light on their evolution.
Abstract: Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

871 citations


Journal Article
21 Sep 2007-Science
TL;DR: It is shown that some of these inserted Wolbachia genes are transcribed within eukaryotic cells lacking endosymbionts, potentially providing a mechanism for acquisition of new genes and functions.
Abstract: Although common among bacteria, lateral gene transfer (the movement of genes between distantly related organisms) is thought to occur only rarely between bacteria and multicellular eukaryotes. However, the presence of endosymbionts, such as Wolbachia pipientis, within some eukaryotic germlines may facilitate bacterial gene transfers to eukaryotic host genomes. We therefore examined host genomes for evidence of gene transfer events from Wolbachia bacteria to their hosts. We found and confirmed transfers into the genomes of four insect and four nematode species that range from nearly the entire Wolbachia genome (>1 megabase) to short (<500 base pairs) insertions. Potential Wolbachia-to-host transfers were also detected computationally in three additional sequenced insect genomes. We also show that some of these inserted Wolbachia genes are transcribed within eukaryotic cells lacking endosymbionts. Therefore, heritable lateral gene transfer occurs into eukaryotic hosts from their prokaryote symbionts, potentially providing a mechanism for acquisition of new genes and functions.

772 citations


Journal Article
12 Jan 2007-Science
TL;DR: The genome sequence of the protist Trichomonas vaginalis predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.
Abstract: We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.

751 citations


Journal Article
TL;DR: Comparison of the Ostreococcus lucimarinus and O. tauri genomes suggests that horizontal gene transfer may be involved in altering the cell-surface characteristics of each species, and that selenoenzymes, novel fusion proteins, and loss of some major protein families, including ones associated with chromatin, are likely important adaptations for achieving a small cell size.
Abstract: The smallest known eukaryotes, at ≈1-μm diameter, are Ostreococcus tauri and related species of marine phytoplankton. The genome of Ostreococcus lucimarinus has been completed and compared with that of O. tauri. This comparison reveals surprising differences across orthologous chromosomes in the two species from highly syntenic chromosomes in most cases to chromosomes with almost no similarity. Species divergence in these phytoplankton is occurring through multiple mechanisms acting differently on different chromosomes and likely including acquisition of new genes through horizontal gene transfer. We speculate that this latter process may be involved in altering the cell-surface characteristics of each species. In addition, the genome of O. lucimarinus provides insights into the unique metal metabolism of these organisms, which are predicted to have a large number of selenocysteine-containing proteins. Selenoenzymes are more catalytically active than similar enzymes lacking selenium, and thus the cell may require less of that protein. As reported here, selenoenzymes, novel fusion proteins, and loss of some major protein families including ones associated with chromatin are likely important adaptations for achieving a small cell size.

612 citations


Journal Article
21 Sep 2007-Science
TL;DR: In this article, the authors sequenced the ∼90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predicted ∼11,500 protein coding genes in 71 Mb of robustly assembled sequence.
Abstract: Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the ∼90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict ∼11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during ∼350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.

Journal Article
TL;DR: SSAKE is a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. It is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets.
Abstract: Summary: Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: http://www.bcgsc.ca/bioinfo/software/ssake
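
The idea of greedy extension by longest overlap, as described in this abstract, can be illustrated with a toy sketch. This is not SSAKE's implementation: a plain dictionary of read prefixes stands in for the prefix tree, only exact overlaps are used, extension is 3' only, and reverse complements and coverage are ignored. The function names and the `min_overlap` parameter are hypothetical.

```python
from collections import defaultdict


def build_prefix_index(reads, min_overlap):
    """Map every prefix of length >= min_overlap to the reads starting with it.

    A dictionary is used here as a simple stand-in for a prefix tree.
    """
    index = defaultdict(set)
    for read in reads:
        for k in range(min_overlap, len(read) + 1):
            index[read[:k]].add(read)
    return index


def extend_contig(seed, unused, index, min_overlap, max_len):
    """Greedily extend `seed` at its 3' end, preferring the longest overlap."""
    contig = seed
    extended = True
    while extended:
        extended = False
        # Try the longest possible suffix first, then shorter ones.
        for k in range(min(len(contig), max_len), min_overlap - 1, -1):
            candidates = sorted(index.get(contig[-k:], set()) & unused)
            if candidates:
                read = candidates[0]  # deterministic tie-break for the sketch
                contig += read[k:]    # append only the non-overlapping tail
                unused.discard(read)
                extended = True
                break
    return contig


def assemble(reads, min_overlap=3):
    """Assemble reads into contigs, seeding in input order (toy example)."""
    reads = list(dict.fromkeys(reads))  # drop duplicates, keep order
    index = build_prefix_index(reads, min_overlap)
    max_len = max(map(len, reads), default=0)
    unused = set(reads)
    contigs = []
    for seed in reads:
        if seed in unused:
            unused.discard(seed)
            contigs.append(extend_contig(seed, unused, index, min_overlap, max_len))
    return contigs


print(assemble(["ATGGC", "GGCTA", "CTAAC"], min_overlap=3))
```

Because this sketch extends in one direction only, the result depends on which read is used as the seed; a real assembler would also extend in the 5' direction.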

Journal Article
TL;DR: The results showed that a few key cytochromes play a role in all of the processes but that their degrees of participation in each process are very different, suggesting a very complex picture of electron transfer to solid and soluble substrates by S. oneidensis MR-1.
Abstract: Shewanella oneidensis MR-1 is a gram-negative facultative anaerobe capable of utilizing a broad range of electron acceptors, including several solid substrates. S. oneidensis MR-1 can reduce Mn(IV) and Fe(III) oxides and can produce current in microbial fuel cells. The mechanisms that are employed by S. oneidensis MR-1 to execute these processes have not yet been fully elucidated. Several different S. oneidensis MR-1 deletion mutants were generated and tested for current production and metal oxide reduction. The results showed that a few key cytochromes play a role in all of the processes but that their degrees of participation in each process are very different. Overall, these data suggest a very complex picture of electron transfer to solid and soluble substrates by S. oneidensis MR-1.

Journal Article
TL;DR: The genome sequences and new annotation of two different isolates of strain D39 and the corrected sequence of strain R6 are reported and the implications of the D39 genome sequences to studies of pneumococcal physiology and pathogenesis are presented and discussed.
Abstract: Streptococcus pneumoniae (pneumococcus) is a leading human respiratory pathogen that causes a variety of serious mucosal and invasive diseases. D39 is an historically important serotype 2 strain that was used in experiments by Avery and coworkers to demonstrate that DNA is the genetic material. Although isolated nearly a century ago, D39 remains extremely virulent in murine infection models and is perhaps the strain used most frequently in current studies of pneumococcal pathogenesis. To date, the complete genome sequences have been reported for only two S. pneumoniae strains: TIGR4, a recent serotype 4 clinical isolate, and laboratory strain R6, an avirulent, unencapsulated derivative of strain D39. We report here the genome sequences and new annotation of two different isolates of strain D39 and the corrected sequence of strain R6. Comparisons of these three related sequences allowed deduction of the likely sequence of the D39 progenitor and mutations that arose in each isolate. Despite its numerous repeated sequences and IS elements, the serotype 2 genome has remained remarkably stable during cultivation, and one of the D39 isolates contains only five relatively minor mutations compared to the deduced D39 progenitor. In contrast, laboratory strain R6 contains 71 single-base-pair changes, six deletions, and four insertions and has lost the cryptic pDP1 plasmid compared to the D39 progenitor strain. Many of these mutations are in or affect the expression of genes that play important roles in regulation, metabolism, and virulence. The nature of the mutations that arose spontaneously in these three strains, the relative global transcription patterns determined by microarray analyses, and the implications of the D39 genome sequences to studies of pneumococcal physiology and pathogenesis are presented and discussed.

Journal Article
TL;DR: The Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) database for metagenomic data deposition is an important first step in developing methods for monitoring microbial communities.
Abstract: Microbes are responsible for most of the chemical transformations that are crucial to sustaining life on Earth. Their ability to inhabit almost any environmental niche suggests that they possess an incredible diversity of physiological capabilities. However, we have little to no information on a majority of the millions of microbial species that are predicted to exist, mainly because of our inability to culture them in the laboratory. A growing discipline called metagenomics allows us to study these uncultured organisms by deciphering their genetic information from DNA that is extracted directly from their environment, thus effectively bypassing the laboratory culture step. Metagenomics allows us to address the questions “who's there?”, “what are they doing?”, and “how are they doing it?”, offering insights into the evolutionary history as well as previously unrecognized physiological abilities of uncultured communities. Studies such as the J. Craig Venter Institute's Global Ocean Sampling (GOS) expedition (in this issue) reveal a remarkable breadth and depth of microbial diversity in the oceans. To date, researchers have made significant but largely preliminary inroads into understanding the biogeography of microbial populations across ecosystems. We know even less about the dynamic physiological processes and complex interactions that impact global carbon cycles and ocean productivity. Marine microbes are thought to act as part of the biological conduit that transports carbon dioxide from the surface to the deep oceanic realms. By removing carbon from the atmosphere and sequestering it (in the form of organic matter), marine microorganisms may significantly affect global climate. Although we now have numerous global and real-time methods to measure physical and chemical parameters within the ocean, few methods or concepts have been developed to measure important microbial processes on a global scale. 
Even if the technology to make such measurements existed, we would presently not know what to measure or how to interpret those measurements. We need a systematic way to explore the structure and function of ocean ecosystems, and their impact on global carbon processing and climate. Metagenomics has the potential to shed light on the genetic controls of these processes by investigating the key players, their roles, and community compositions that may change as a function of time, climate, nutrients, carbon dioxide, and anthropogenic factors. These studies include a substantial informatics component, requiring researchers to take on complex computational and mathematical challenges. Nonetheless, microbiologists have been quick to seize upon this modern technique, resulting in a deluge of sequence data, and an ever-widening gap between the rates of collecting data and interpreting it. The Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) project [1] is an important first step in attempting to bridge these gaps and in developing global methods for monitoring microbial communities in the ocean and their response to environmental changes. The aim is to create a rich, distinctive data repository and bioinformatics tools resource that will address many of the unique challenges of metagenomics and enable researchers to unravel the biology of environmental microorganisms (Figure 1). CAMERA's database includes environmental metagenomic and genomic sequence data, associated environmental parameters (“metadata”), precomputed search results, and software tools to support powerful cross-analysis of environmental samples.
Figure 1. Schematic of Intended Core Functions of the CAMERA Project
The initial release will include data and tools associated with the companion set of GOS expedition publications [2–4]; metagenome data from the Hawaii Ocean Time Series Station ALOHA [5] and marine viromes from four different oceanic regions [6]; standard nonredundant sequence databases (e.g., nrnt for nucleotides and nraa for amino acids [7]); and collections of microbial genome sequences, including a set of 155 marine microbial genomes funded by the Gordon and Betty Moore Foundation. The focal point for the CAMERA project is its Web site: http://camera.calit2.net. We invite the research community to submit its metagenomics data to CAMERA, and are establishing mechanisms to streamline this process. Here we describe some of the key challenges and features of the CAMERA project.

Journal Article
03 Aug 2007-Science
TL;DR: This work completely replaced the genome of a bacterial cell with one from another species by transplanting a whole genome as naked DNA into Mycoplasma capricolum cells by polyethylene glycol–mediated transformation.
Abstract: As a step toward propagation of synthetic genomes, we completely replaced the genome of a bacterial cell with one from another species by transplanting a whole genome as naked DNA. Intact genomic DNA from Mycoplasma mycoides large colony (LC), virtually free of protein, was transplanted into Mycoplasma capricolum cells by polyethylene glycol-mediated transformation. Cells selected for tetracycline resistance, carried by the M. mycoides LC chromosome, contain the complete donor genome and are free of detectable recipient genomic sequences. These cells that result from genome transplantation are phenotypically identical to the M. mycoides LC donor strain as judged by several criteria.

Journal Article
TL;DR: Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly, and generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo.
Abstract: Despite rapidly decreasing costs and innovative technologies, sequencing of angiosperm genomes is not yet undertaken lightly. Generating larger amounts of sequence data more quickly does not address the difficulties of sequencing and assembling complex genomes de novo. The cotton ( Gossypium spp.)

Journal Article
TL;DR: Survey sequencing and comparative analysis of the elephant shark genome are described, showing that the degree of conserved synteny and sequence conservation between the human and elephant shark genomes is higher than that between human and teleost fish genomes.
Abstract: Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4× coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element–like and long interspersed element–like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and sequence conservation between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark contains four putative Hox clusters, indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

Journal Article
TL;DR: Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.
Abstract: Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-μl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells.

Journal ArticleDOI
TL;DR: New ‘massively parallel’ sequencing methods are greatly increasing sequencing capacity, but further innovations are needed to achieve the ‘thousand dollar genome’ that many feel is prerequisite to personalized genomic medicine.
Abstract: Fifteen years elapsed between the discovery of the double helix (1953) and the first DNA sequencing (1968). Modern DNA sequencing began in 1977, with development of the chemical method of Maxam and Gilbert and the dideoxy method of Sanger, Nicklen and Coulson, and with the first complete DNA sequence (phage φX174), which demonstrated that sequence could give profound insights into genetic organization. Incremental improvements allowed sequencing of molecules >200 kb (human cytomegalovirus), leading to an avalanche of data that demanded computational analysis and spawned the field of bioinformatics. The US Human Genome Project spurred sequencing activity. By 1992 the first ‘sequencing factory’ was established, and others soon followed. The first complete cellular genome sequences, from bacteria, appeared in 1995 and other eubacterial, archaebacterial and eukaryotic genomes were soon sequenced. Competition between the public Human Genome Project and Celera Genomics produced working drafts of the human genome sequence, published in 2001, but refinement and analysis of the human genome sequence will continue for the foreseeable future. New ‘massively parallel’ sequencing methods are greatly increasing sequencing capacity, but further innovations are needed to achieve the ‘thousand dollar genome’ that many feel is prerequisite to personalized genomic medicine. These advances will also allow new approaches to a variety of problems in biology, evolution and the environment.

Journal ArticleDOI
TL;DR: This huge phylogenetic and functional space is explored to cast light on the ancient evolution of this superfamily of enzymes built on a common protein kinase–like (PKL) fold and serves as a model for further structural and functional analysis of enzyme evolution.
Abstract: The eukaryotic protein kinase (ePK) domain mediates the majority of signaling and coordination of complex events in eukaryotes. By contrast, most bacterial signaling is thought to occur through structurally unrelated histidine kinases, though some ePK-like kinases (ELKs) and small molecule kinases are known in bacteria. Our analysis of the Global Ocean Sampling (GOS) dataset reveals that ELKs are as prevalent as histidine kinases and may play an equally important role in prokaryotic behavior. By combining GOS and public databases, we show that the ePK is just one subset of a diverse superfamily of enzymes built on a common protein kinase–like (PKL) fold. We explored this huge phylogenetic and functional space to cast light on the ancient evolution of this superfamily, its mechanistic core, and the structural basis for its observed diversity. We cataloged 27,677 ePKs and 18,699 ELKs, and classified them into 20 highly distinct families whose known members suggest regulatory functions. GOS data more than tripled the count of ELK sequences and enabled the discovery of novel families and classification and analysis of all ELKs. Comparison between and within families revealed ten key residues that are highly conserved across families. However, all but one of the ten residues has been eliminated in one family or another, indicating great functional plasticity. We show that loss of a catalytic lysine in two families is compensated by distinct mechanisms both involving other key motifs. This diverse superfamily serves as a model for further structural and functional analysis of enzyme evolution.

Journal ArticleDOI
TL;DR: Identification of the mechanism for chimera formation provides new insight into the MDA reaction and suggests methods to reduce chimeras, particularly for whole genome sequencing.
Abstract: Multiple Displacement Amplification (MDA) is a method used for amplifying limiting DNA sources. The high molecular weight amplified DNA is ideal for DNA library construction. While this has enabled genomic sequencing from one or a few cells of unculturable microorganisms, the process is complicated by the tendency of MDA to generate chimeric DNA rearrangements in the amplified DNA. Determining the source of the DNA rearrangements would be an important step towards reducing or eliminating them. Here, we characterize the major types of chimeras formed by carrying out an MDA whole genome amplification from a single E. coli cell and sequencing by the 454 Life Sciences method. Analysis of 475 chimeras revealed the predominant reaction mechanisms that create the DNA rearrangements. The highly branched DNA synthesized in MDA can assume many alternative secondary structures. DNA strands extended on an initial template can be displaced, becoming available to prime on a second template and creating the chimeras. Evidence supports a model in which branch migration can displace 3'-ends, freeing them to prime on the new templates. More than 85% of the resulting DNA rearrangements were inverted sequences with intervening deletions that the model predicts. Intramolecular rearrangements were favored, with displaced 3'-ends reannealing to single-stranded 5'-strands contained within the same branched DNA molecule. In over 70% of the chimeric junctions, the 3' termini had initiated priming at complementary sequences of 2–21 nucleotides (nts) in the new templates. Formation of chimeras is an important limitation to the MDA method, particularly for whole genome sequencing. Identification of the mechanism for chimera formation provides new insight into the MDA reaction and suggests methods to reduce chimeras. The 454 sequencing approach used here will provide a rapid method to assess the utility of reaction modifications.

Journal ArticleDOI
Takeshi Itoh1, Takeshi Itoh2, Tsuyoshi Tanaka1, Roberto A. Barrero, Chisato Yamasaki2, Yasuyuki Fujii2, Phillip Hilton2, Baltazar A. Antonio1, Hideo Aono, Rolf Apweiler, Richard Bruskiewich3, Thomas E. Bureau4, Frances A. Burr5, Antonio Costa de Oliveira6, Galina Fuks7, Takuya Habara2, Georg Haberer, Bin Han, Erimi Harada2, Aiko T. Hiraki2, Hirohiko Hirochika1, Douglas R. Hoen4, Hiroki Hokari2, Satomi Hosokawa, Yue-Ie C. Hsing8, Hiroshi Ikawa9, Kazuho Ikeo, Tadashi Imanishi2, Tadashi Imanishi10, Yukiyo Ito, Pankaj Jaiswal11, Masako Kanno2, Yoshihiro Kawahara12, Yoshihiro Kawahara2, Toshiyuki Kawamura2, Hiroaki Kawashima2, Jitendra P. Khurana13, Shoshi Kikuchi1, Setsuko Komatsu1, Kanako O. Koyanagi10, Hiromi Kubooka2, Damien Lieberherr14, Yao-Cheng Lin8, David M. Lonsdale, Takashi Matsumoto1, Akihiro Matsuya2, W. Richard McCombie15, Joachim Messing7, Akio Miyao1, Nicola Mulder, Yoshiaki Nagamura1, Jongmin Nam16, Jongmin Nam17, Nobukazu Namiki, Hisataka Numa1, Shin Nurimoto2, Claire O'Donovan, Hajime Ohyanagi9, Toshihisa Okido, Satoshi Oota, Naoki Osato, Lance E. Palmer15, Lance E. Palmer18, Francis Quetier19, Saurabh Raghuvanshi13, Naomi Saichi2, Hiroaki Sakai2, Hiroaki Sakai1, Yasumichi Sakai9, Katsumi Sakata9, Tetsuya Sakurai, Fumihiko Sato2, Yoshiharu Sato2, Heiko Schoof20, Heiko Schoof21, Motoaki Seki, Michie Shibata, Yuji Shimizu9, Kazuo Shinozaki, Yuji Shinso2, Nagendra K. Singh22, Brian Smith-White23, Jun-ichi Takeda2, Motohiko Tanino2, Tatiana Tatusova23, Supat Thongjuea24, Fusano Todokoro2, Mika Tsugane, Akhilesh K. Tyagi13, Apichart Vanavichit24, Aihui Wang25, Rod A. Wing, Kaori Yamaguchi2, Mayu Yamamoto, Naoyuki Yamamoto2, Yeisoo Yu26, Hao Zhang2, Qiang Zhao, Kenichi Higo1, Benjamin Burr5, Takashi Gojobori2, Takuji Sasaki1 
TL;DR: The results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.
Abstract: We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.

Journal ArticleDOI
01 Aug 2007-Genetics
TL;DR: It is demonstrated that there are few genes, if any, whose expression is linearly correlated with ploidy and can be dramatically changed by ploidy alteration, and that alteration of ploidy caused subtle expression changes in a substantial percentage of genes in the potato genome.
Abstract: Polyploidy is remarkably common in the plant kingdom and polyploidization is a major driving force for plant genome evolution. Polyploids may contain genomes from different parental species (allopolyploidy) or include multiple sets of the same genome (autopolyploidy). Genetic and epigenetic changes associated with allopolyploidization have been a major research subject in recent years. However, we know little about the genetic impact imposed by autopolyploidization. We developed a synthetic autopolyploid series in potato (Solanum phureja) that includes one monoploid (1x) clone, two diploid (2x) clones, and one tetraploid (4x) clone. Cell size and organ thickness were positively correlated with the ploidy level. However, the 2x plants were generally the most vigorous and the 1x plants exhibited less vigor compared to the 2x and 4x individuals. We analyzed the transcriptomic variation associated with this autopolyploid series using a potato cDNA microarray containing ∼9000 genes. Statistically significant expression changes were observed among the ploidies for ∼10% of the genes in both leaflet and root tip tissues. However, most changes were associated with the monoploid and were within the twofold level. Thus, alteration of ploidy caused subtle expression changes of a substantial percentage of genes in the potato genome. We demonstrated that there are few genes, if any, whose expression is linearly correlated with the ploidy and can be dramatically changed because of ploidy alteration.
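
Note: the phrase "within the twofold level" can be read as an absolute log2 expression ratio of at most 1. The sketch below illustrates that reading only; the function name and the intensity values are hypothetical and are not taken from the potato microarray data.

```python
import math


def within_twofold(expr_a: float, expr_b: float) -> bool:
    """Return True if two expression levels differ by no more than twofold.

    Both inputs are assumed to be positive, already-normalized intensities
    (an assumption made for this illustration).
    """
    if expr_a <= 0 or expr_b <= 0:
        raise ValueError("expression values must be positive")
    log2_ratio = math.log2(expr_a / expr_b)
    return abs(log2_ratio) <= 1.0


if __name__ == "__main__":
    # Hypothetical intensities for one gene in a 1x and a 2x clone.
    print(within_twofold(150.0, 100.0))  # 1.5-fold change, inside the twofold band
    print(within_twofold(100.0, 250.0))  # 2.5-fold change, outside the band
```

Working on the log2 scale treats up- and down-regulation symmetrically, which is why a 2.5-fold decrease is flagged the same way a 2.5-fold increase would be.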

Journal ArticleDOI
TL;DR: Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction, which will greatly accelerate the pace of sequencing from uncultured microbes.

Journal ArticleDOI
TL;DR: The results support the notion that marine AAnP populations are complex and dynamic, and compose an important fraction of bacterioplankton assemblages in certain oceanic areas.
Abstract: Aerobic anoxygenic photosynthetic bacteria (AAnP) were recently proposed to be significant contributors to global oceanic carbon and energy cycles. However, AAnP abundance, spatial distribution, diversity and potential ecological importance remain poorly understood. Here we present metagenomic data from the Global Ocean Sampling expedition indicating that AAnP diversity and abundance vary in different oceanic regions. Furthermore, we show for the first time that the composition of AAnP assemblages changes between different oceanic regions, with specific bacterial assemblages adapted to open ocean or coastal areas, respectively. Our results support the notion that marine AAnP populations are complex and dynamic, and compose an important fraction of bacterioplankton assemblages in certain oceanic areas.

Journal ArticleDOI
TL;DR: ETP gene clusters appear to have a single origin and have been inherited relatively intact rather than assembling independently in the different ascomycete lineages, suggesting that a progenitor ETP gene cluster assembled within an ancestral taxon.
Abstract: Genes responsible for biosynthesis of fungal secondary metabolites are usually tightly clustered in the genome and co-regulated with metabolite production. Epipolythiodioxopiperazines (ETPs) are a class of secondary metabolite toxins produced by disparate ascomycete fungi and implicated in several animal and plant diseases. Gene clusters responsible for their production have previously been defined in only two fungi. Fungal genome sequence data have been surveyed for the presence of putative ETP clusters and cluster data have been generated from several fungal taxa where genome sequences are not available. Phylogenetic analysis of cluster genes has been used to investigate the assembly and heredity of these gene clusters. Putative ETP gene clusters are present in 14 ascomycete taxa, but absent in numerous other ascomycetes examined. These clusters are discontinuously distributed in ascomycete lineages. Gene content is not absolutely fixed; however, common genes are identified and phylogenies of six of these are separately inferred. In each phylogeny almost all cluster genes form monophyletic clades with non-cluster fungal paralogues being the nearest outgroups. This relatedness of cluster genes suggests that a progenitor ETP gene cluster assembled within an ancestral taxon. Within each of the cluster clades, the cluster genes group together in consistent subclades; however, these relationships do not always reflect the phylogeny of ascomycetes. Micro-synteny of several of the genes within the clusters provides further support for these subclades. ETP gene clusters appear to have a single origin and have been inherited relatively intact rather than assembling independently in the different ascomycete lineages. This progenitor cluster has given rise to a small number of distinct phylogenetic classes of clusters that are represented in a discontinuous pattern throughout ascomycetes. The disjunct heredity of these clusters is discussed with consideration of multiple instances of independent cluster loss and lateral transfer of gene clusters between lineages.

Journal ArticleDOI
TL;DR: This work establishes that the primary cea mutation arose as a single disease allele in a common ancestor of herding breeds as well as highlights the value of comparative population analysis for refining regions of linkage.
Abstract: The features of modern dog breeds that increase the ease of mapping common diseases, such as reduced heterogeneity and extensive linkage disequilibrium, may also increase the difficulty associated with fine mapping and identifying causative mutations. One way to address this problem is by combining data from multiple breeds segregating the same trait after initial linkage has been determined. The multibreed approach increases the number of potentially informative recombination events and reduces the size of the critical haplotype by taking advantage of shortened linkage disequilibrium distances found across breeds. In order to identify breeds that likely share a trait inherited from the same ancestral source, we have used cluster analysis to divide 132 breeds of dog into five primary breed groups. We then use the multibreed approach to fine-map Collie eye anomaly (cea), a complex disorder of ocular development that was initially mapped to a 3.9-cM region on canine chromosome 37. Combined genotypes from affected individuals from four breeds of a single breed group significantly narrowed the candidate gene region to a 103-kb interval spanning only four genes. Sequence analysis revealed that all affected dogs share a homozygous deletion of 7.8 kb in the NHEJ1 gene. This intronic deletion spans a highly conserved binding domain to which several developmentally important proteins bind. This work both establishes that the primary cea mutation arose as a single disease allele in a common ancestor of herding breeds and highlights the value of comparative population analysis for refining regions of linkage.


Journal ArticleDOI
TL;DR: Lsr2 appears to regulate several important pathways in mycobacteria by preferentially binding to AT-rich sequences, including genes induced by antibiotics and those associated with inducible multi-drug tolerance.
Abstract: Multi-drug tolerance is a key phenotypic property that complicates the sterilization of mammals infected with Mycobacterium tuberculosis. Previous studies have established that iniBAC, an operon that confers multi-drug tolerance to M. bovis BCG through an associated pump-like activity, is induced by the antibiotics isoniazid (INH) and ethambutol (EMB). An improved understanding of the functional role of antibiotic-induced genes and the regulation of drug tolerance may be gained by studying the factors that regulate antibiotic-mediated gene expression. An M. smegmatis strain containing a lacZ gene fused to the promoter of M. tuberculosis iniBAC (PiniBAC) was subjected to transposon mutagenesis. Mutants with constitutive expression and increased EMB-mediated induction of PiniBAC::lacZ mapped to the lsr2 gene (MSMEG6065), which encodes a small basic protein of unknown function that is highly conserved among mycobacteria. These mutants had a marked change in colony morphology and generated a new polar lipid. Complementation with multi-copy M. tuberculosis lsr2 (Rv3597c) returned PiniBAC expression to baseline, reversed the observed morphological and lipid changes, and repressed PiniBAC induction by EMB to below that of the control M. smegmatis strain. Microarray analysis of an lsr2 knockout confirmed upregulation of M. smegmatis iniA and demonstrated upregulation of genes involved in cell wall and metabolic functions. Fully 121 of 584 genes induced by EMB treatment in wild-type M. smegmatis were upregulated (“hyperinduced”) to even higher levels by EMB in the M. smegmatis lsr2 knockout. The most highly upregulated genes and gene clusters had adenine-thymine (AT)–rich 5-prime untranslated regions. In M. tuberculosis, overexpression of lsr2 repressed INH-mediated induction of all three iniBAC genes, as well as another annotated pump, efpA. 
The low molecular weight and basic properties of Lsr2 (pI 10.69) suggested that it was a histone-like protein, although it did not exhibit sequence homology with other proteins in this class. Consistent with other histone-like proteins, Lsr2 bound DNA with a preference for circular DNA, forming large oligomers, inhibited DNase I activity, and introduced a modest degree of supercoiling into relaxed plasmids. Lsr2 also inhibited in vitro transcription and topoisomerase I activity. Lsr2 represents a novel class of histone-like proteins that inhibit a wide variety of DNA-interacting enzymes. Lsr2 appears to regulate several important pathways in mycobacteria by preferentially binding to AT-rich sequences, including genes induced by antibiotics and those associated with inducible multi-drug tolerance. An improved understanding of the role of lsr2 may provide important insights into the mechanisms of action of antibiotics and the way that mycobacteria adapt to stresses such as antibiotic treatment.