Showing papers by "J. Craig Venter Institute" published in 2005


Journal ArticleDOI
Takashi Matsumoto1, Jianzhong Wu1, Hiroyuki Kanamori1, Yuichi Katayose1 +262 more (25 institutions)
11 Aug 2005-Nature
TL;DR: A map-based, finished quality sequence that covers 95% of the 389 Mb rice genome, including virtually all of the euchromatin and two complete centromeres, and finds evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes.
Abstract: Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.

3,423 citations


Journal ArticleDOI
Piero Carninci, Takeya Kasukawa1, Shintaro Katayama, Julian Gough +194 more (36 institutions)
02 Sep 2005-Science
TL;DR: Detailed polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
Abstract: This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.

3,412 citations


Journal ArticleDOI
TL;DR: The genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans, was generated. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
Abstract: The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
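The core/dispensable split described in this abstract can be illustrated with a few lines of set arithmetic. The following is a minimal sketch, not the authors' pipeline: the function name `partition_pan_genome`, the strain names, and the gene-family identifiers are all hypothetical, and it assumes genes have already been grouped into families shared across strains.

```python
from collections import Counter


def partition_pan_genome(genomes):
    """Split gene families into core, partially shared, and strain-specific sets.

    `genomes` maps a strain name to the set of gene-family identifiers it carries
    (an assumed input format; real data would first need ortholog clustering).
    """
    # Count in how many strains each gene family occurs.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    n = len(genomes)
    core = {g for g, c in counts.items() if c == n}            # present in every strain
    unique = {g for g, c in counts.items() if c == 1 and n > 1}  # present in one strain only
    shared = set(counts) - core - unique                        # present in some, not all
    return core, shared, unique


# Hypothetical toy data: three strains, six gene families.
toy = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g6"},
}
core, shared, unique = partition_pan_genome(toy)
```

In this toy input the dispensable genome is the union of `shared` and `unique`; the pan-genome is the union of all three sets.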

2,092 citations


Journal ArticleDOI
Matthew Berriman1, Elodie Ghedin2, Elodie Ghedin3, Christiane Hertz-Fowler1, Gaëlle Blandin3, Hubert Renauld1, Daniella Castanheira Bartholomeu3, Nicola Lennard1, Elisabet Caler3, N. Hamlin1, Brian J. Haas3, Ulrike Böhme1, Linda Hannick3, Martin Aslett1, Joshua Shallom3, Lucio Marcello4, Lihua Hou3, Bill Wickstead5, U. Cecilia M. Alsmark6, Claire Arrowsmith1, Rebecca Atkin1, Andrew Barron1, Frédéric Bringaud7, Karen Brooks1, Mark Carrington8, Inna Cherevach1, Tracey-Jane Chillingworth1, Carol Churcher1, Louise Clark1, Craig Corton1, Ann Cronin1, Robert L. Davies1, Jonathon Doggett1, Appolinaire Djikeng3, Tamara Feldblyum3, Mark C. Field8, Audrey Fraser1, Ian Goodhead1, Zahra Hance1, David Harper1, Barbara Harris1, Heidi Hauser1, Jessica B. Hostetler3, Al Ivens1, Kay Jagels1, David W. Johnson1, Justin Johnson3, Kristine Jones3, Arnaud Kerhornou1, Hean Koo3, Natasha Larke1, Scott M. Landfear9, Christopher Larkin3, Vanessa Leech8, Alexandra Line1, Angela Lord1, Annette MacLeod4, P. Mooney1, Sharon Moule1, David M. A. Martin10, Gareth W. Morgan11, Karen Mungall1, Halina Norbertczak1, Doug Ormond1, Grace Pai3, Christopher S. Peacock1, Jeremy Peterson3, Michael A. Quail1, Ester Rabbinowitsch1, Marie-Adèle Rajandream1, Chris P Reitter8, Steven L. Salzberg3, Mandy Sanders1, Seth Schobel3, Sarah Sharp1, Mark Simmonds1, Anjana J. Simpson3, Luke J. Tallon3, C. Michael R. Turner4, Andrew Tait4, Adrian Tivey1, Susan Van Aken3, Danielle Walker1, David Wanless3, Shiliang Wang3, Brian White1, Owen White3, Sally Whitehead1, John Woodward1, Jennifer R. Wortman3, Mark Raymond Adams12, T. Martin Embley6, Keith Gull5, Elisabetta Ullu13, J. David Barry4, Alan H. Fairlamb10, Fred R. Opperdoes14, Barclay G. Barrell1, John E. Donelson15, Neil Hall3, Neil Hall16, Claire M. Fraser3, Sara E. Melville8, Najib M. El-Sayed2, Najib M. El-Sayed3 
15 Jul 2005-Science
TL;DR: Comparisons of the cytoskeleton and endocytic trafficking systems of Trypanosoma brucei with those of humans and other eukaryotic organisms reveal major differences.
Abstract: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including ∼900 pseudogenes and ∼1700 T. brucei–specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.

1,631 citations


Journal ArticleDOI
William C. Nierman1, William C. Nierman2, Arnab Pain3, Michael J. Anderson4, Jennifer R. Wortman1, Jennifer R. Wortman2, H. Stanley Kim2, H. Stanley Kim1, Javier Arroyo5, Matthew Berriman3, Keietsu Abe6, David B. Archer7, Clara Bermejo5, Joan W. Bennett8, Paul Bowyer4, Dan Chen2, Dan Chen1, Matthew Collins3, Richard Coulsen, Robert L. Davies3, Paul S. Dyer7, Mark L. Farman9, Nadia Fedorova2, Nadia Fedorova1, Natalie D. Fedorova2, Natalie D. Fedorova1, T. Feldblyum1, T. Feldblyum2, Reinhard Fischer10, Nigel Fosker3, Audrey Fraser3, José Luis García11, María Josefa Marcos García12, Ariette Goble3, Gustavo H. Goldman13, Katsuya Gomi6, Sam Griffith-Jones3, R. Gwilliam3, Brian J. Haas2, Brian J. Haas1, Hubertus Haas14, David Harris3, H. Horiuchi15, Jiaqi Huang1, Jiaqi Huang2, Sean Humphray3, Javier Jiménez12, Nancy P. Keller15, H. Khouri1, H. Khouri2, Katsuhiko Kitamoto16, Tetsuo Kobayashi17, Sven Konzack10, Resham Kulkarni2, Resham Kulkarni1, Toshitaka Kumagai18, Anne Lafton19, Jean-Paul Latgé19, Weixi Li9, Angela Lord3, Charles Lu2, Charles Lu1, William H. Majoros2, William H. Majoros1, Gregory S. May20, Bruce L. Miller21, Yasmin Ali Mohamoud1, Yasmin Ali Mohamoud2, María Molina5, Michel Monod22, Isabelle Mouyna19, Stephanie Mulligan2, Stephanie Mulligan1, Lee Murphy3, Susan O'Neil3, Ian T. Paulsen1, Ian T. Paulsen2, Miguel A. Peñalva11, Mihaela Pertea2, Mihaela Pertea1, Claire Price3, Bethan L. Pritchard4, Michael A. Quail3, Ester Rabbinowitsch3, Neil Rawlins3, Marie Adele Rajandream3, Utz Reichard23, Hubert Renauld3, Geoffrey D. Robson4, Santiago Rodríguez de Córdoba11, José Manuel Rodríguez-Peña5, Catherine M. Ronning1, Catherine M. Ronning2, Simon Rutter3, Steven L. Salzberg2, Steven L. Salzberg1, Miguel del Nogal Sánchez12, Juan C. Sánchez-Ferrero11, David L. Saunders3, Kathy Seeger3, Rob Squares3, S. Squares3, Michio Takeuchi24, Fredj Tekaia19, Geoffrey Turner25, Carlos R. Vázquez de Aldana12, J. Weidman2, J. 
Weidman1, Owen White2, Owen White1, John Woodward3, Jae-Hyuk Yu15, Claire M. Fraser2, Claire M. Fraser1, James E. Galagan26, Kiyoshi Asai18, Masayuki Machida18, Neil Hall2, Neil Hall3, Bart Barrell3, David W. Denning4 
22 Dec 2005-Nature
TL;DR: The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus and revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype.
Abstract: Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus.

1,356 citations


Journal ArticleDOI
Najib M. El-Sayed1, Peter J. Myler2, Peter J. Myler3, Daniella Castanheira Bartholomeu4, Daniel Nilsson5, Gautam Aggarwal2, Anh-Nhi Tran5, Elodie Ghedin1, Elizabeth A. Worthey2, Arthur L. Delcher, Gaëlle Blandin4, Scott J. Westenberger6, Elisabet Caler4, Gustavo C. Cerqueira7, Carole Branche5, Brian J. Haas4, Atashi Anupama2, Erik Arner5, Lena Åslund8, Philip Attipoe2, Esteban J. Bontempi5, Frédéric Bringaud9, Peter Burton10, Eithon Cadag2, David A. Campbell6, Mark Carrington11, Jonathan Crabtree4, Hamid Darban5, José Franco da Silveira12, Pieter J. de Jong13, Kimberly Edwards5, Paul T. Englund14, Gholam Fazelina2, Tamara Feldblyum4, Marcela Ferella5, Alberto C.C. Frasch15, Keith Gull16, David Horn17, Lihua Hou4, Yiting Huang2, Ellen Kindlund5, Michele M. Klingbeil18, Sindy Kluge5, Hean Koo4, Daniela R. Lacerda19, Mariano J. Levin20, Hernan Lorenzi20, Tin Louie2, Carlos Renato Machado7, Richard McCulloch10, Alan McKenna5, Yumi Mizuno5, Jeremy C. Mottram10, Siri Nelson2, Stephen Ochaya5, Kazutoyo Osoegawa13, Grace Pai4, Marilyn Parsons2, Marilyn Parsons3, Martin Pentony2, Ulf Pettersson8, Mihai Pop4, José Luis Ramírez21, Joel Rinta2, Laura Robertson2, Steven L. Salzberg, Daniel O. Sánchez15, Amber Seyler2, Reuben Sunil Kumar Sharma11, Jyoti Shetty4, Anjana J. Simpson4, Ellen Sisk2, Martti T. Tammi22, Martti T. Tammi5, Rick L. Tarleton23, Santuza M. R. Teixeira7, Susan Van Aken4, Christy Vogt2, Pauline N. Ward10, Bill Wickstead16, Jennifer R. Wortman4, Owen White4, Claire M. Fraser4, Kenneth Stuart2, Kenneth Stuart3, Björn Andersson5 
15 Jul 2005-Science
TL;DR: Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
Abstract: Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.

1,349 citations


Journal ArticleDOI
22 Dec 2005-Nature
TL;DR: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution, and a comparative study with Aspergillus fumigatus and Aspergillus oryzae, used in the production of sake, miso and soy sauce, provides new insight into eukaryotic genome evolution and gene regulation.
Abstract: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.

1,297 citations


Journal ArticleDOI
22 Dec 2005-Nature
TL;DR: Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.
Abstract: The genome of Aspergillus oryzae, a fungus important for the production of traditional fermented foods and beverages in Japan, has been sequenced. The ability to secrete large amounts of proteins and the development of a transformation system have facilitated the use of A. oryzae in modern biotechnology. Although both A. oryzae and Aspergillus flavus belong to the section Flavi of the subgenus Circumdati of Aspergillus, A. oryzae, unlike A. flavus, does not produce aflatoxin, and its long history of use in the food industry has proved its safety. Here we show that the 37-megabase (Mb) genome of A. oryzae contains 12,074 genes and is expanded by 7-9 Mb in comparison with the genomes of Aspergillus nidulans and Aspergillus fumigatus. Comparison of the three aspergilli species revealed the presence of syntenic blocks and A. oryzae-specific blocks (lacking synteny with A. nidulans and A. fumigatus) in a mosaic manner throughout the genome of A. oryzae. The blocks of A. oryzae-specific sequence are enriched for genes involved in metabolism, particularly those for the synthesis of secondary metabolites. Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.

1,149 citations


Journal ArticleDOI
TL;DR: A consortium of ten laboratories from the Washington, DC–Baltimore, USA, area was formed to compare data obtained from three widely used platforms using identical RNA samples to demonstrate that there are relatively large differences in data obtained in labs using the same platform, but that the results from the best-performing labs agree rather well.
Abstract: Microarray technology is a powerful tool for measuring RNA expression for thousands of genes at once. Various studies have been published comparing competing platforms with mixed results: some find agreement, others do not. As the number of researchers starting to use microarrays and the number of cross-platform meta-analysis studies rapidly increases, appropriate platform assessments become more important. Here we present results from a comparison study that offers important improvements over those previously described in the literature. In particular, we noticed that none of the previously published papers consider differences between labs. For this study, a consortium of ten laboratories from the Washington, DC–Baltimore, USA, area was formed to compare data obtained from three widely used platforms using identical RNA samples. We used appropriate statistical analysis to demonstrate that there are relatively large differences in data obtained in labs using the same platform, but that the results from the best-performing labs agree rather well.

897 citations


Journal ArticleDOI
15 Jul 2005-Science
TL;DR: A comparison of three trypanosomatid genomes reveals a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters, and finds no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.
Abstract: A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that, along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions, have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.

761 citations


Journal ArticleDOI
25 Feb 2005-Science
TL;DR: Comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes, and the genome is rich in transposons, many of which cluster at candidate centromeric regions.
Abstract: Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its ∼20-megabase genome, which contains ∼6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.

Journal ArticleDOI
TL;DR: Analysis of this first sequenced endosymbiont genome from a filarial nematode provides insight into endosymbiont evolution and additionally provides new potential targets for elimination of cutaneous and lymphatic human filarial disease.
Abstract: Complete genome DNA sequence and analysis is presented for Wolbachia, the obligate alpha-proteobacterial endosymbiont required for fertility and survival of the human filarial parasitic nematode Brugia malayi. Although, quantitatively, the genome is even more degraded than those of closely related Rickettsia species, Wolbachia has retained more intact metabolic pathways. The ability to provide riboflavin, flavin adenine dinucleotide, heme, and nucleotides is likely to be Wolbachia's principal contribution to the mutualistic relationship, whereas the host nematode likely supplies amino acids required for Wolbachia growth. Genome comparison of the Wolbachia endosymbiont of B. malayi (wBm) with the Wolbachia endosymbiont of Drosophila melanogaster (wMel) shows that they share similar metabolic trends, although their genomes show a high degree of genome shuffling. In contrast to wMel, wBm contains no prophage and has a reduced level of repeated DNA. Both Wolbachia have lost a considerable number of membrane biogenesis genes that apparently make them unable to synthesize lipid A, the usual component of proteobacterial membranes. However, differences in their peptidoglycan structures may reflect the mutualistic lifestyle of wBm in contrast to the parasitic lifestyle of wMel. The smaller genome size of wBm, relative to wMel, may reflect the loss of genes required for infecting host cells and avoiding host defense systems. Analysis of this first sequenced endosymbiont genome from a filarial nematode provides insight into endosymbiont evolution and additionally provides new potential targets for elimination of cutaneous and lymphatic human filarial disease.

Journal ArticleDOI
TL;DR: The full sequencing and functional expression of a marine natural-product pathway from an obligate symbiont is presented, and a related cluster was identified in Trichodesmium erythraeum IMS101, an important bloom-forming cyanobacterium.
Abstract: Prochloron spp. are obligate cyanobacterial symbionts of many didemnid family ascidians. It has been proposed that the cyclic peptides of the patellamide class found in didemnid extracts are synthesized by Prochloron spp., but studies in which host and symbiont cells are separated and chemically analyzed to identify the biosynthetic source have yielded inconclusive results. As part of the Prochloron didemni sequencing project, we identified patellamide biosynthetic genes and confirmed their function by heterologous expression of the whole pathway in Escherichia coli. The primary sequence of patellamides A and C is encoded on a single ORF that resembles a precursor peptide. We propose that this prepatellamide is heterocyclized to form thiazole and oxazoline rings, and the peptide is cleaved to yield the two cyclic patellamides, A and C. This work represents the full sequencing and functional expression of a marine natural-product pathway from an obligate symbiont. In addition, a related cluster was identified in Trichodesmium erythraeum IMS101, an important bloom-forming cyanobacterium.

Journal ArticleDOI
TL;DR: An addendum to the paper "Standardizing global gene expression analysis between laboratories and across platforms".
Abstract: Addendum: Standardizing global gene expression analysis between laboratories and across platforms

Journal ArticleDOI
TL;DR: In this article, the authors describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution.
Abstract: Prochlorococcus is a marine cyanobacterium that numerically dominates the mid-latitude oceans and is the smallest known oxygenic phototroph. Numerous isolates from diverse areas of the world’s oceans have been studied and shown to be physiologically and genetically distinct. All isolates described thus far can be assigned to either a tightly clustered high-light (HL)-adapted clade, or a more divergent low-light (LL)-adapted group. The 16S rRNA sequences of the entire Prochlorococcus group differ by at most 3%, and the four initially published genomes revealed patterns of genetic differentiation that help explain physiological differences among the isolates. Here we describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution. There are 1,273 genes that represent the core shared by all 12 genomes. They are apparently sufficient, according to metabolic reconstruction, to encode a functional cell. We describe a phylogeny for all 12 isolates by subjecting their complete proteomes to three different phylogenetic analyses. For each non-core gene, we used a maximum parsimony method to estimate which ancestor likely first acquired or lost each gene. Many of the genetic differences among isolates, especially for genes involved in outer membrane synthesis and nutrient transport, are found within the same clade. Nevertheless, we identified some genes defining HL and LL ecotypes, and clades within these broad ecotypes, helping to demonstrate the basis of HL and LL adaptations in Prochlorococcus. Furthermore, our estimates of gene gain events allow us to identify highly variable genomic islands that are not apparent through simple pairwise comparisons. 
These results emphasize the functional roles, especially those connected to outer membrane synthesis and transport, that dominate the flexible genome and set it apart from the core. Besides identifying islands and demonstrating their role throughout the history of Prochlorococcus, reconstruction of past gene gains and losses shows that much of the variability exists at the "leaves of the tree," between the most closely related strains. Finally, the identification of core and flexible genes from this 12-genome comparison is largely consistent with the relative frequency of Prochlorococcus genes found in global ocean metagenomic databases, further closing the gap between our understanding of these organisms in the lab and the wild.
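The abstract mentions a maximum parsimony method for placing gene gains and losses on the phylogeny but does not say which algorithm was used. As an illustration only, the sketch below uses the classic Fitch bottom-up pass on a rooted binary tree to count the minimum number of presence/absence changes for one gene; the tree shape, isolate names, and the function `fitch_states` are hypothetical and are not taken from the paper.

```python
def fitch_states(tree, leaf_state):
    """Bottom-up Fitch pass for one binary character (gene present = 1, absent = 0).

    `tree` is a nested 2-tuple of leaf names (assumed rooted and binary);
    `leaf_state` maps each leaf name to 0 or 1.
    Returns (candidate_state_set_at_this_node, minimum_changes_below_it).
    """
    if isinstance(tree, str):                      # leaf: its observed state, no changes
        return {leaf_state[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_states(left, leaf_state)
    right_set, right_changes = fitch_states(right, leaf_state)
    common = left_set & right_set
    if common:                                     # children agree: no extra change needed
        return common, left_changes + right_changes
    # Children disagree: keep both options and count one gain or loss.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical four-isolate tree and presence/absence pattern for a single gene.
tree = ((("iso_A", "iso_B"), "iso_C"), "iso_D")
presence = {"iso_A": 1, "iso_B": 1, "iso_C": 0, "iso_D": 0}
root_states, min_changes = fitch_states(tree, presence)
```

For this toy pattern a single event suffices: the gene is inferred absent at the root and gained once on the branch leading to the two isolates that carry it. A full reconstruction would add a top-down pass to assign states to internal nodes.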

Journal ArticleDOI
20 Oct 2005-Nature
TL;DR: A new, large-scale sequencing effort to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations is reported, encompassing a total of 2,821,103 nucleotides.
Abstract: Influenza viruses are remarkably adept at surviving in the human population over a long timescale. The human influenza A virus continues to thrive even among populations with widespread access to vaccines, and continues to be a major cause of morbidity and mortality. The virus mutates from year to year, making the existing vaccines ineffective on a regular basis, and requiring that new strains be chosen for a new vaccine. Less-frequent major changes, known as antigenic shift, create new strains against which the human population has little protective immunity, thereby causing worldwide pandemics. The most recent pandemics include the 1918 'Spanish' flu, one of the most deadly outbreaks in recorded history, which killed 30-50 million people worldwide, the 1957 'Asian' flu, and the 1968 'Hong Kong' flu. Motivated by the need for a better understanding of influenza evolution, we have developed flexible protocols that make it possible to apply large-scale sequencing techniques to the highly variable influenza genome. Here we report the results of sequencing 209 complete genomes of the human influenza A virus, encompassing a total of 2,821,103 nucleotides. In addition to increasing markedly the number of publicly available, complete influenza virus genomes, we have discovered several anomalies in these first 209 genomes that demonstrate the dynamic nature of influenza transmission and evolution. This new, large-scale sequencing effort promises to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations. All data from this project are being deposited, without delay, in public archives.

Journal ArticleDOI
TL;DR: A phylogenetic analysis of 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York State, United States demonstrated that multiple lineages can co-circulate, persist, and reassort in epidemiologically significant ways, and underscore the importance of genomic analyses for future influenza surveillance.
Abstract: Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York State, United States, and observed multiple co-circulating clades with different population frequencies. Strikingly, phylogenies inferred for individual gene segments revealed that multiple reassortment events had occurred among these clades, such that one clade of H3N2 viruses present at least since 2000 had provided the hemagglutinin gene for all those H3N2 viruses sampled after the 2002–2003 influenza season. This reassortment event was the likely progenitor of the antigenically variant influenza strains that caused the A/Fujian/411/2002-like epidemic of the 2003–2004 influenza season. However, despite sharing the same hemagglutinin, these phylogenetically distinct lineages of viruses continue to co-circulate in the same population. These data, derived from the first large-scale analysis of H3N2 viruses, convincingly demonstrate that multiple lineages can co-circulate, persist, and reassort in epidemiologically significant ways, and underscore the importance of genomic analyses for future influenza surveillance.

Journal ArticleDOI
01 Jul 2005-Science
TL;DR: The genome sequence of Theileria parva is reported, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand.
Abstract: We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.

Journal ArticleDOI
TL;DR: Two model legumes, Medicago truncatula and Lotus japonicus, are currently targets of large-scale genome sequencing projects and the prospect of integrating genome information from Mt and Lj is exciting.
Abstract: Two model legumes, Medicago truncatula ( Mt ) and Lotus japonicus ( Lj ), are currently targets of large-scale genome sequencing projects. As a result, legumes are one of few plant families with extensive genome sequence in multiple species. The prospect of integrating genome information from Mt and

Journal ArticleDOI
TL;DR: This work describes a computational method for miRNA prediction and the results of its application to the discovery of novel mammalian miRNAs, and shows that although the overall miRNA content in the observed clusters is very similar across the three considered species, the internal organization of the clusters changes in evolution.
Abstract: MicroRNAs (miRNAs) are endogenous 21 to 23-nucleotide RNA molecules that regulate protein-coding gene expression in plants and animals via the RNA interference pathway. Hundreds of them have been identified in the last five years and very recent works indicate that their total number is still larger. Therefore miRNAs gene discovery remains an important aspect of understanding this new and still widely unknown regulation mechanism. Bioinformatics approaches have proved to be very useful toward this goal by guiding the experimental investigations. In this work we describe our computational method for miRNA prediction and the results of its application to the discovery of novel mammalian miRNAs. We focus on genomic regions around already known miRNAs, in order to exploit the property that miRNAs are occasionally found in clusters. Starting with the known human, mouse and rat miRNAs we analyze 20 kb of flanking genomic regions for the presence of putative precursor miRNAs (pre-miRNAs). Each genome is analyzed separately, allowing us to study the species-specific identity and genome organization of miRNA loci. We only use cross-species comparisons to make conservative estimates of the number of novel miRNAs. Our ab initio method predicts between fifty and hundred novel pre-miRNAs for each of the considered species. Around 30% of these already have experimental support in a large set of cloned mammalian small RNAs. The validation rate among predicted cases that are conserved in at least one other species is higher, about 60%, and many of them have not been detected by prediction methods that used cross-species comparisons. A large fraction of the experimentally confirmed predictions correspond to an imprinted locus residing on chromosome 14 in human, 12 in mouse and 6 in rat. Our computational tool can be accessed on the world-wide-web. Our results show that the assumption that many miRNAs occur in clusters is fruitful for the discovery of novel miRNAs. 
Additionally we show that although the overall miRNA content in the observed clusters is very similar across the three considered species, the internal organization of the clusters changes in evolution.

Journal ArticleDOI
TL;DR: Conditions are described for rolling-circle amplification (RCA) of individual DNA molecules 5–7 kb in size by >10^9-fold, using φ29 DNA polymerase. The method allows cell-free cloning of individual synthetic DNA molecules that cannot be cloned in Escherichia coli, and may also speed genome sequencing by eliminating the need for biological cloning.
Abstract: We describe conditions for rolling-circle amplification (RCA) of individual DNA molecules 5-7 kb in size by >10^9-fold, using phi29 DNA polymerase. The principal difficulty with amplification of small amounts of template by RCA using phi29 DNA polymerase is "background" DNA synthesis that usually occurs when template is omitted, or at low template concentrations. Reducing the reaction volume while keeping the amount of template fixed increases the template concentration, resulting in a suppression of background synthesis. Cell-free cloning of single circular molecules by using phi29 DNA polymerase was achieved by carrying out the amplification reactions in very small volumes, typically 600 nl. This procedure allows cell-free cloning of individual synthetic DNA molecules that cannot be cloned in Escherichia coli, for example synthetic phage genomes carrying lethal mutations. It also allows cell-free cloning of genomic DNA isolated from bacteria. This DNA can be sequenced directly from the phi29 DNA polymerase reaction without further amplification. In contrast to PCR amplification, RCA using phi29 DNA polymerase does not produce mutant jackpots, and the high processivity of the enzyme eliminates stuttering at homopolymer tracts. Cell-free cloning has many potential applications to both natural and synthetic DNA. These include environmental DNA samples that have proven difficult to clone and synthetic genes encoding toxic products. The method may also speed genome sequencing by eliminating the need for biological cloning.

Journal ArticleDOI
TL;DR: A high frequency of closely adjacent, apparent double crossover events that may represent gene conversions is detected, along with large regions of genetic homogeneity among the archetypal clonal lineages, reflecting the relatively few genetic outbreeding events that have occurred since their recent origin.
Abstract: Toxoplasma gondii is a highly successful protozoan parasite in the phylum Apicomplexa, which contains numerous animal and human pathogens. T.gondii is amenable to cellular, biochemical, molecular and genetic studies, making it a model for the biology of this important group of parasites. To facilitate forward genetic analysis, we have developed a high-resolution genetic linkage map for T.gondii. The genetic map was used to assemble the scaffolds from a 10X shotgun whole genome sequence, thus defining 14 chromosomes with markers spaced at ∼300 kb intervals across the genome. Fourteen chromosomes were identified comprising a total genetic size of ∼592 cM and an average map unit of ∼104 kb/cM. Analysis of the genetic parameters in T.gondii revealed a high frequency of closely adjacent, apparent double crossover events that may represent gene conversions. In addition, we detected large regions of genetic homogeneity among the archetypal clonal lineages, reflecting the relatively few genetic outbreeding events that have occurred since their recent origin. Despite these unusual features, linkage analysis proved to be effective in mapping the loci determining several drug resistances. The resulting genome map provides a framework for analysis of complex traits such as virulence and transmission, and for comparative population genetic studies.
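As a rough consistency check on the map figures quoted in this abstract, the total genetic size can be multiplied by the average map unit to get the physical size the map implies. This is only an illustrative sketch, not the authors' calculation; the function name is hypothetical and the inputs are the rounded values given above.

```python
def implied_genome_size_mb(genetic_size_cm: float, kb_per_cm: float) -> float:
    """Physical size (Mb) implied by a genetic map length and an average map unit."""
    # cM * (kb / cM) gives kb; divide by 1000 to convert kb to Mb.
    return genetic_size_cm * kb_per_cm / 1000.0


# Rounded figures from the abstract: ~592 cM total, ~104 kb/cM on average.
size_mb = implied_genome_size_mb(592, 104)
print(f"Implied physical size: {size_mb:.1f} Mb")
```

Because both inputs are approximate, the result should be read as an order-of-magnitude check rather than a genome size estimate.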

Journal ArticleDOI
TL;DR: The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized.
Abstract: We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.

Journal ArticleDOI
Nathalie Choisne1, Nadia Demange1, Gisela Orjeda1, Sylvie Samain1, Angélique D'Hont2, Laurence Cattolico1, Eric Pelletier1, Arnaud Couloux1, Béatrice Segurens1, Patrick Wincker1, Claude Scarpelli1, Jean Weissenbach1, Marcel Salanoubat1, Nagendra K. Singh3, Trilochan Mohapatra3, Tilak Raj Sharma3, Kishor Gaikwad3, Alok Singh3, Vivek Dalal3, Subodh K. Srivastava3, Anupam Dixit3, Ajit K. Pal3, Irfan Ahmad Ghazi3, Mahavir Yadav3, Awadhesh Pandit3, Ashutosh Bhargava3, K. Sureshbabu3, Rekha Dixit3, Harvinder Singh3, Suresh C. Swain3, Sumita Pal3, M. Ragiba3, Pradeep K. Singh3, Vibha Singhal3, Sangeeta D. Mendiratta3, Kamlesh Batra3, Saurabh Raghuvanshi4, Amitabh Mohanty4, Arvind K. Bharti4, Anupama Gaur4, Vikrant Gupta4, Dibyendu Kumar4, Ravi Vydianathan4, Shuba Vij4, Anita Kapur4, Parul Khurana4, Sulabha Sharma4, Paramjit Khurana4, Jitendra P. Khurana4, Akhilesh K. Tyagi4, Qiaoping Yuan5, Shu Ouyang5, Jia Liu5, Wei Zhu5, Aihui Wang5, Haining Lin5, John P. Hamilton5, Brian J. Haas5, Jennifer R. Wortman5, Kristine Jones5, Mary Kim5, Larry Overton5, Tamara Tsitrin5, Douglas Fadrosh5, Jayati Bera5, Bruce Weaver5, Shaohua Jin5, Shivani Johri5, Matt Reardon5, Hue Vuong5, Luke J. Tallon5, Susan Van Aken5, Matthew R. Lewis5, Teresa Utterback5, Tamara Feldblyum5, Victoria Zismann5, Stacey E. Iobst5, Joseph Hsiao5, Aymeric R. De Vazeille5, Steven L. Salzberg5, Owen White5, Claire M. Fraser5, C. Robin Buell5, Yeisoo Yu6, Teri Rambo6, Jennifer Currie6, Kristi Collura6, Hyeran Kim6, Diana Stum6, Wenming Wang6, Dave Kudrna6, Christopher Mueller6, Rod A. Wing6, Melissa Kramer7, Lori Spiegel7, Lidia Nascimento7, R. Preston7, Theresa Zutavern7, Joachim Messing8 
TL;DR: Based on syntenic alignments of these chromosomes, rice chromosome 11 and 12 do not appear to have resulted from a single whole-genome duplication event as previously suggested.
Abstract: Background: Rice is an important staple food and, with the smallest cereal genome, serves as a reference species for studies on the evolution of cereals and other grasses. Therefore, decoding its entire genome will be a prerequisite for applied and basic research on this species and all other cereals. Results: We have determined and analyzed the complete sequences of two of its chromosomes, 11 and 12, which total 55.9 Mb (14.3% of the entire genome length), based on a set of overlapping clones. A total of 5,993 non-transposable element related genes are present on these chromosomes. Among them are 289 disease resistance-like and 28 defense-response genes, a higher proportion of these categories than on any other rice chromosome. A three-Mb segment on both chromosomes resulted from a duplication 7.7 million years ago (mya), the most recent large-scale duplication in the rice genome. Paralogous gene copies within this segmental duplication can be aligned with genomic assemblies from sorghum and maize. Although these gene copies are preserved on both chromosomes, their expression patterns have diverged. When the gene order of rice chromosomes 11 and 12 was compared to wheat gene loci, significant synteny between these orthologous regions was detected, illustrating the presence of conserved genes alternating with recently evolved genes. Conclusion: Because the resistance and defense response genes, enriched on these chromosomes relative to the whole genome, also occur in clusters, they provide a preferred target for breeding durable disease resistance in rice and the isolation of their allelic variants. The recent duplication of a large chromosomal segment coupled with the high density of disease resistance gene clusters makes this the most recently evolved part of the rice genome. Based on syntenic alignments of these chromosomes, rice chromosome 11 and 12 do not appear to have resulted from a single whole-genome duplication event as previously suggested.

Journal ArticleDOI
TL;DR: In this article, the authors report on the sequence analysis of members of the receptor tyrosine kinase (RTK) gene family in the genomes of glioblastoma brain tumors.
Abstract: It is now clear that tyrosine kinases represent attractive targets for therapeutic intervention in cancer. Recent advances in DNA sequencing technology now provide the opportunity to survey mutational changes in cancer in a high-throughput and comprehensive manner. Here we report on the sequence analysis of members of the receptor tyrosine kinase (RTK) gene family in the genomes of glioblastoma brain tumors. Previous studies have identified a number of molecular alterations in glioblastoma, including amplification of the RTK epidermal growth factor receptor. We have identified mutations in two other RTKs: (i) fibroblast growth receptor 1, including the first mutations in the kinase domain in this gene observed in any cancer, and (ii) a frameshift mutation in the platelet-derived growth factor receptor-α gene. Fibroblast growth receptor 1, platelet-derived growth factor receptor-α, and epidermal growth factor receptor are all potential entry points to the phosphatidylinositol 3-kinase and mitogen-activated protein kinase intracellular signaling pathways already known to be important for neoplasia. Our results demonstrate the utility of applying DNA sequencing technology to systematically assess the coding sequence of genes within cancer genomes.

Journal ArticleDOI
15 Sep 2005-Nature
TL;DR: A sequencing system has been developed that can read 25 million bases of genetic code — the entire genome of some fungi — within four hours, and may provide an alternative approach to DNA sequencing.
Abstract: A sequencing system has been developed that can read 25 million bases of genetic code — the entire genome of some fungi — within four hours. The technique may provide an alternative approach to DNA sequencing. The race is on for a big prize: the job of providing the world's DNA sequencing laboratories with the successor to the ‘Sanger-based’ technology that gave us the first wave of genome sequences. One technology in the frame is that produced by 454 Life Sciences Corporation of Branford, Connecticut. Today's technology reads 67,000 base pairs per hour; this new approach is 100 times faster, reading 6 million base pairs per hour. The improved performance results from using picolitre-sized chemical reactors, enhanced light-emitting sequencing chemistries and complex informatics. Further miniaturization of the system is planned. Such leaps in technology may one day make it possible to analyse an individual's genome before designing therapy: the ultimate in personalized medicine.
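The throughput figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is an illustration only: it uses the rounded numbers from the text, and the constant and function names are hypothetical.

```python
OLD_RATE_BP_PER_HOUR = 67_000  # Sanger-based rate quoted in the abstract
RUN_BASES = 25_000_000         # bases read in one run, as quoted
RUN_HOURS = 4                  # duration of that run, as quoted


def rate_bp_per_hour(bases: int, hours: float) -> float:
    """Average read rate in base pairs per hour."""
    return bases / hours


new_rate = rate_bp_per_hour(RUN_BASES, RUN_HOURS)
speedup = new_rate / OLD_RATE_BP_PER_HOUR
print(f"New rate: {new_rate:,.0f} bp/hour, about {speedup:.0f}x the older rate")
```

On these rounded inputs the ratio comes out somewhat below 100, which is consistent with the abstract's "100 times faster" being an approximate figure.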

Journal ArticleDOI
TL;DR: A stochastic model for initiation of DNA replication in the fission yeast is proposed and it is demonstrated that at least half of intergenes have potential origin activity and that the relative ability of an intergene to function as an origin is governed primarily by AT content and length.
Abstract: Origins of DNA replication in Schizosaccharomyces pombe lack a specific consensus sequence analogous to the Saccharomyces cerevisiae autonomously replicating sequence (ARS) consensus, raising the question of how they are recognized by the replication machinery. Because all well characterized S. pombe origins are located in intergenic regions, we analyzed the sequence properties and biological activity of such regions. The AT content of intergenes is very high (≈70%), and runs of A's or T's occur with a significantly greater frequency than expected. Additionally, the two DNA strands in intergenes display compositional asymmetry that strongly correlates with the direction of transcription of flanking genes. Importantly, the sequence properties of known S. pombe origins of DNA replication are similar to those of intergenes in general. In functional studies, we assayed the in vivo origin activity of 26 intergenes in a 68-kb region of S. pombe chromosome 2. We also assayed the origin activity of sets of randomly chosen intergenes with the same length or AT content. Our data demonstrate that at least half of intergenes have potential origin activity and that the relative ability of an intergene to function as an origin is governed primarily by AT content and length. We propose a stochastic model for initiation of DNA replication in the fission yeast. In this model, the number of AT tracts in a given sequence is the major determinant of its probability of binding SpORC and serving as a replication origin. A similar model may explain some features of origins of DNA replication in metazoans.
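The two sequence properties this abstract highlights, AT content and runs of A's or T's, are straightforward to compute. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the minimum run length of 4 is an arbitrary illustrative threshold, since the abstract does not define what counts as an AT tract.

```python
import re


def at_content(seq: str) -> float:
    """Fraction of bases in seq that are A or T (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def count_homopolymer_at_runs(seq: str, min_len: int = 4) -> int:
    """Count runs of consecutive A's or consecutive T's of at least min_len bases.

    min_len is an assumed threshold chosen for illustration only.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return len(pattern.findall(seq.upper()))


# Toy sequence, not real S. pombe data.
example = "AAAATTTTGCGCAAATTTTT"
print(at_content(example))                 # 16 of 20 bases are A or T
print(count_homopolymer_at_runs(example))  # AAAA, TTTT and TTTTT qualify; AAA does not
```

A real analysis would apply such measures to annotated intergenic regions and compare the observed run counts against an expectation based on base composition, which this sketch does not attempt.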

Journal ArticleDOI
TL;DR: It is demonstrated that shear-induced cyclooxygenase (COX)-2 suppresses phosphatidylinositol 3-kinase (PI3-K) activity, which represses antioxidant response element (ARE)/NF-E2 related factor 2 (Nrf2)-mediated transcriptional response in human chondrocytes, which contributes to their apoptosis.
Abstract: Fluid shear exerts anti-inflammatory and anti-apoptotic effects on endothelial cells by inducing the coordinated expression of phase 2 detoxifying and antioxidant genes. In contrast, high shear is pro-apoptotic in chondrocytes and promotes matrix degradation and cartilage destruction. We have analyzed the mechanisms regulating shear-mediated chondrocyte apoptosis by cDNA microarray technology and bioinformatics. We demonstrate that shear-induced cyclooxygenase (COX)-2 suppresses phosphatidylinositol 3-kinase (PI3-K) activity, which represses antioxidant response element (ARE)/NF-E2 related factor 2 (Nrf2)-mediated transcriptional response in human chondrocytes. The resultant decrease in antioxidant capacity of sheared chondrocytes contributes to their apoptosis. Phase 2 inducers, and to a lesser extent COX-2-selective inhibitors, negate the shear-mediated suppression of ARE-driven phase 2 activity and apoptosis. The abrogation of shear-induced COX-2 expression by PI3-K activity and/or stimulation of the Nrf2/ARE pathway suggests the existence of PI3-K/Nrf2/ARE negative feedback loops that potentially interfere with c-Jun N-terminal kinase 2 activity upstream of COX-2. Reconstructing the signaling network regulating shear-induced chondrocyte apoptosis may provide insights to optimize conditions for culturing artificial cartilage in bioreactors and for developing therapeutic strategies for arthritic disorders.

Journal ArticleDOI
TL;DR: MPSS analysis has resulted in a significant extension of the knowledge of CT antigens, leading to the discovery of a distinctive X-linked CT-antigen gene family.
Abstract: Massively parallel signature sequencing (MPSS) generates millions of short sequence tags corresponding to transcripts from a single RNA preparation. Most MPSS tags can be unambiguously assigned to genes, thereby generating a comprehensive expression profile of the tissue of origin. From the comparison of MPSS data from 32 normal human tissues, we identified 1,056 genes that are predominantly expressed in the testis. Further evaluation by using MPSS tags from cancer cell lines and EST data from a wide variety of tumors identified 202 of these genes as candidates for encoding cancer/testis (CT) antigens. Of these genes, the expression in normal tissues was assessed by RT-PCR in a subset of 166 intron-containing genes, and those with confirmed testis-predominant expression were further evaluated for their expression in 21 cancer cell lines. Thus, 20 CT or CT-like genes were identified, with several exhibiting expression in five or more of the cancer cell lines examined. One of these genes is a member of a CT gene family that we designated as CT45. The CT45 family comprises six highly similar (>98% cDNA identity) genes that are clustered in tandem within a 125-kb region on Xq26.3. CT45 was found to be frequently expressed in both cancer cell lines and lung cancer specimens. Thus, MPSS analysis has resulted in a significant extension of our knowledge of CT antigens, leading to the discovery of a distinctive X-linked CT-antigen gene family.

Journal ArticleDOI
TL;DR: These resources, developed as a part of the Cancer Chromosome Aberration Project (CCAP) initiative, aid the search for new cancer‐associated genes and foster insights into the causes and consequences of genetic alterations in cancer.
Abstract: To catalog data on chromosomal aberrations in cancer derived from emerging molecular cytogenetic techniques and to integrate these data with genome maps, we have established two resources, the NCI and NCBI SKY/M-FISH & CGH Database and the Cancer Chromosomes database. The goal of the former is to allow investigators to submit and analyze clinical and research cytogenetic data. It contains a karyotype parser tool, which automatically converts the ISCN short-form karyotype into an internal representation displayed in detailed form and as a colored ideogram with band overlay, and also has a tool to compare CGH profiles from multiple cases. The Cancer Chromosomes database integrates the SKY/M-FISH & CGH Database with the Mitelman Database of Chromosome Aberrations in Cancer and the Recurrent Chromosome Aberrations in Cancer database. These three datasets can now be searched seamlessly by use of the Entrez search and retrieval system for chromosome aberrations, clinical data, and reference citations. Common diagnoses, anatomic sites, chromosome breakpoints, junctions, numerical and structural abnormalities, and bands gained and lost among selected cases can be compared by use of the "similarity" report. Because the model used for CGH data is a subset of the karyotype data, it is now possible to examine the similarities between CGH results and karyotypes directly. All chromosomal bands are directly linked to the Entrez Map Viewer database, providing integration of cytogenetic data with the sequence assembly. These resources, developed as a part of the Cancer Chromosome Aberration Project (CCAP) initiative, aid the search for new cancer-associated genes and foster insights into the causes and consequences of genetic alterations in cancer.