Showing papers by "J. Craig Venter Institute" published in 2005
••
TL;DR: A map-based, finished quality sequence that covers 95% of the 389 Mb rice genome, including virtually all of the euchromatin and two complete centromeres, and finds evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes.
Abstract: Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.
3,423 citations
••
TL;DR: Detailed polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
Abstract: This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
3,412 citations
••
TL;DR: The genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans, was generated, and mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
Abstract: The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
2,092 citations
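The core/dispensable split described in the abstract above can be illustrated with plain set operations. The sketch below is not the authors' pipeline: the function name `pan_genome_summary`, the strain labels, and the gene names are hypothetical, and it assumes genes have already been grouped into shared families so that a family name means the same thing in every strain.

```python
from typing import Dict, Set


def pan_genome_summary(strains: Dict[str, Set[str]]) -> dict:
    """Split gene families into core, dispensable, and strain-specific sets.

    `strains` maps a strain label to the set of gene families it carries.
    Assumes at least one strain and no empty gene sets.
    """
    gene_sets = list(strains.values())
    core = set.intersection(*gene_sets)   # families present in every strain
    pan = set().union(*gene_sets)         # families present in any strain
    dispensable = pan - core              # partially shared or strain-specific

    strain_specific = {}
    for name, genes in strains.items():
        others = [g for other, g in strains.items() if other != name]
        strain_specific[name] = genes - set().union(*others)

    # Share of each single genome that belongs to the core.
    core_fraction = {name: len(core) / len(genes) for name, genes in strains.items()}
    return {
        "core": core,
        "pan": pan,
        "dispensable": dispensable,
        "strain_specific": strain_specific,
        "core_fraction": core_fraction,
    }


# Hypothetical toy input, not data from the paper.
toy_strains = {
    "s1": {"geneA", "geneB", "geneC", "geneX"},
    "s2": {"geneA", "geneB", "geneC", "geneY"},
    "s3": {"geneA", "geneB", "geneC", "geneX", "geneZ"},
}
summary = pan_genome_summary(toy_strains)
print(sorted(summary["core"]))
print(summary["core_fraction"])
```

Tracking how the strain-specific sets grow as strains are added is the kind of count an extrapolation like the one in the abstract would start from. The extrapolation model itself is not given in this listing and is not sketched here.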
••
Wellcome Trust Sanger Institute1, George Washington University2, J. Craig Venter Institute3, University of Glasgow4, University of Oxford5, Newcastle University6, University of Bordeaux7, University of Cambridge8, Oregon Health & Science University9, University of Dundee10, Imperial College London11, Case Western Reserve University12, Yale University13, Université catholique de Louvain14, University of Iowa15, Wellcome Trust16
TL;DR: Comparisons of the cytoskeleton and endocytic trafficking systems of Trypanosoma brucei with those of humans and other eukaryotic organisms reveal major differences.
Abstract: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including ∼900 pseudogenes and ∼1700 T. brucei–specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.
1,631 citations
••
Washington University in St. Louis1, J. Craig Venter Institute2, Wellcome Trust Sanger Institute3, University of Manchester4, Complutense University of Madrid5, Tohoku University6, University of Nottingham7, Tulane University8, University of Kentucky9, Max Planck Society10, Spanish National Research Council11, University of Salamanca12, University of São Paulo13, Innsbruck Medical University14, University of Wisconsin-Madison15, University of Tokyo16, Nagoya University17, National Institute of Advanced Industrial Science and Technology18, Pasteur Institute19, University of Texas MD Anderson Cancer Center20, University of Idaho21, University of Lausanne22, University of Göttingen23, Tokyo University of Agriculture and Technology24, University of Sheffield25, Broad Institute26
TL;DR: The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus and revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype.
Abstract: Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus.
1,356 citations
••
George Washington University1, Seattle Biomed2, University of Washington3, J. Craig Venter Institute4, Karolinska Institutet5, University of California, Los Angeles6, Universidade Federal de Minas Gerais7, Uppsala University8, Centre national de la recherche scientifique9, University of Glasgow10, University of Cambridge11, Federal University of São Paulo12, Children's Hospital Oakland Research Institute13, Johns Hopkins University School of Medicine14, National Research Council15, University of Oxford16, University of London17, University of Massachusetts Amherst18, Oswaldo Cruz Foundation19, University of Buenos Aires20, Central University of Venezuela21, National University of Singapore22, University of Georgia23
TL;DR: Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
Abstract: Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
1,349 citations
••
Broad Institute1, J. Craig Venter Institute2, Stanford University3, Oregon Health & Science University4, University of Glasgow5, Genetic Information Research Institute6, Institut Universitaire de France7, University of Kentucky8, University of Nebraska–Lincoln9, University of Göttingen10, Pasteur Institute11, University of São Paulo12, Texas A&M University13, Wellcome Trust Sanger Institute14, John Innes Centre15, University of Wisconsin-Madison16, Max Planck Society17, University of Oregon18, University of Nottingham19, Spanish National Research Council20, Ohio State University21, University of Georgia22, Tokyo Institute of Technology23, National Institute of Advanced Industrial Science and Technology24, George Washington University25, University of Manchester26, University of Liverpool27, University of Melbourne28, Karlsruhe Institute of Technology29, University of Idaho30
TL;DR: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution, and a comparative study of Aspergillus nidulans with Aspergillus fumigatus and Aspergillus oryzae, used in the production of sake, miso and soy sauce, provides new insight into eukaryotic genome evolution and gene regulation.
Abstract: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.
1,297 citations
••
National Institute of Advanced Industrial Science and Technology1, National Institute of Technology and Evaluation2, Intec, Inc.3, Tohoku University4, University of Tokyo5, Nagoya University6, Tokyo University of Agriculture and Technology7, University of Manchester8, Broad Institute9, George Washington University10, Agricultural Research Service11, University of Nottingham12, Tulane University13, J. Craig Venter Institute14, Kikkoman15, Kyushu University16, Nara Institute of Science and Technology17
TL;DR: Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.
Abstract: The genome of Aspergillus oryzae, a fungus important for the production of traditional fermented foods and beverages in Japan, has been sequenced. The ability to secrete large amounts of proteins and the development of a transformation system have facilitated the use of A. oryzae in modern biotechnology. Although both A. oryzae and Aspergillus flavus belong to the section Flavi of the subgenus Circumdati of Aspergillus, A. oryzae, unlike A. flavus, does not produce aflatoxin, and its long history of use in the food industry has proved its safety. Here we show that the 37-megabase (Mb) genome of A. oryzae contains 12,074 genes and is expanded by 7-9 Mb in comparison with the genomes of Aspergillus nidulans and Aspergillus fumigatus. Comparison of the three aspergilli species revealed the presence of syntenic blocks and A. oryzae-specific blocks (lacking synteny with A. nidulans and A. fumigatus) in a mosaic manner throughout the genome of A. oryzae. The blocks of A. oryzae-specific sequence are enriched for genes involved in metabolism, particularly those for the synthesis of secondary metabolites. Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.
1,149 citations
••
TL;DR: A consortium of ten laboratories from the Washington, DC–Baltimore, USA, area was formed to compare data obtained from three widely used platforms using identical RNA samples to demonstrate that there are relatively large differences in data obtained in labs using the same platform, but that the results from the best-performing labs agree rather well.
Abstract: Microarray technology is a powerful tool for measuring RNA expression for thousands of genes at once. Various studies have been published comparing competing platforms with mixed results: some find agreement, others do not. As the number of researchers starting to use microarrays and the number of cross-platform meta-analysis studies rapidly increases, appropriate platform assessments become more important. Here we present results from a comparison study that offers important improvements over those previously described in the literature. In particular, we noticed that none of the previously published papers consider differences between labs. For this study, a consortium of ten laboratories from the Washington, DC–Baltimore, USA, area was formed to compare data obtained from three widely used platforms using identical RNA samples. We used appropriate statistical analysis to demonstrate that there are relatively large differences in data obtained in labs using the same platform, but that the results from the best-performing labs agree rather well.
897 citations
••
George Washington University1, University of Washington2, Seattle Biomed3, J. Craig Venter Institute4, Wellcome Trust Sanger Institute5, Karolinska Institutet6, Newcastle University7, Centre national de la recherche scientifique8, Universidade Federal de Minas Gerais9, Medical Research Council10, University of Cambridge11, University of Iowa12
TL;DR: A comparison of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major reveals a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters, and no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.
Abstract: A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that, along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions, have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.
761 citations
••
J. Craig Venter Institute1, Stanford University2, International School for Advanced Studies3, Duke University4, Washington University in St. Louis5, Saint Louis University6, University of British Columbia7, Boston Children's Hospital8, University of Texas Health Science Center at San Antonio9, Pasteur Institute10, National Institutes of Health11, University of Düsseldorf12, Connecticut Agricultural Experiment Station13, Boston University14, University of California, Santa Cruz15
TL;DR: Comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes, and the genome is rich in transposons, many of which cluster at candidate centromeric regions.
Abstract: Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its ∼20-megabase genome, which contains ∼6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.
••
TL;DR: Analysis of this first sequenced endosymbiont genome from a filarial nematode provides insight into endosymbiont evolution and additionally provides new potential targets for elimination of cutaneous and lymphatic human filarial disease.
Abstract: Complete genome DNA sequence and analysis is presented for Wolbachia, the obligate alpha-proteobacterial endosymbiont required for fertility and survival of the human filarial parasitic nematode Brugia malayi. Although, quantitatively, the genome is even more degraded than those of closely related Rickettsia species, Wolbachia has retained more intact metabolic pathways. The ability to provide riboflavin, flavin adenine dinucleotide, heme, and nucleotides is likely to be Wolbachia's principal contribution to the mutualistic relationship, whereas the host nematode likely supplies amino acids required for Wolbachia growth. Genome comparison of the Wolbachia endosymbiont of B. malayi (wBm) with the Wolbachia endosymbiont of Drosophila melanogaster (wMel) shows that they share similar metabolic trends, although their genomes show a high degree of genome shuffling. In contrast to wMel, wBm contains no prophage and has a reduced level of repeated DNA. Both Wolbachia have lost a considerable number of membrane biogenesis genes that apparently make them unable to synthesize lipid A, the usual component of proteobacterial membranes. However, differences in their peptidoglycan structures may reflect the mutualistic lifestyle of wBm in contrast to the parasitic lifestyle of wMel. The smaller genome size of wBm, relative to wMel, may reflect the loss of genes required for infecting host cells and avoiding host defense systems. Analysis of this first sequenced endosymbiont genome from a filarial nematode provides insight into endosymbiont evolution and additionally provides new potential targets for elimination of cutaneous and lymphatic human filarial disease.
••
TL;DR: The full sequencing and functional expression of a marine natural-product pathway from an obligate symbiont is presented, and a related cluster was identified in Trichodesmium erythraeum IMS101, an important bloom-forming cyanobacterium.
Abstract: Prochloron spp. are obligate cyanobacterial symbionts of many didemnid family ascidians. It has been proposed that the cyclic peptides of the patellamide class found in didemnid extracts are synthesized by Prochloron spp., but studies in which host and symbiont cells are separated and chemically analyzed to identify the biosynthetic source have yielded inconclusive results. As part of the Prochloron didemni sequencing project, we identified patellamide biosynthetic genes and confirmed their function by heterologous expression of the whole pathway in Escherichia coli. The primary sequence of patellamides A and C is encoded on a single ORF that resembles a precursor peptide. We propose that this prepatellamide is heterocyclized to form thiazole and oxazoline rings, and the peptide is cleaved to yield the two cyclic patellamides, A and C. This work represents the full sequencing and functional expression of a marine natural-product pathway from an obligate symbiont. In addition, a related cluster was identified in Trichodesmium erythraeum IMS101, an important bloom-forming cyanobacterium.
••
TL;DR: An addendum to the study on standardizing global gene expression analysis between laboratories and across platforms.
Abstract: Addendum: Standardizing global gene expression analysis between laboratories and across platforms
••
TL;DR: In this article, the authors describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution.
Abstract: Prochlorococcus is a marine cyanobacterium that numerically dominates the mid-latitude oceans and is the smallest known oxygenic phototroph. Numerous isolates from diverse areas of the world’s oceans have been studied and shown to be physiologically and genetically distinct. All isolates described thus far can be assigned to either a tightly clustered high-light (HL)-adapted clade, or a more divergent low-light (LL)-adapted group. The 16S rRNA sequences of the entire Prochlorococcus group differ by at most 3%, and the four initially published genomes revealed patterns of genetic differentiation that help explain physiological differences among the isolates. Here we describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution. There are 1,273 genes that represent the core shared by all 12 genomes. They are apparently sufficient, according to metabolic reconstruction, to encode a functional cell. We describe a phylogeny for all 12 isolates by subjecting their complete proteomes to three different phylogenetic analyses. For each non-core gene, we used a maximum parsimony method to estimate which ancestor likely first acquired or lost each gene. Many of the genetic differences among isolates, especially for genes involved in outer membrane synthesis and nutrient transport, are found within the same clade. Nevertheless, we identified some genes defining HL and LL ecotypes, and clades within these broad ecotypes, helping to demonstrate the basis of HL and LL adaptations in Prochlorococcus. Furthermore, our estimates of gene gain events allow us to identify highly variable genomic islands that are not apparent through simple pairwise comparisons. 
These results emphasize the functional roles, especially those connected to outer membrane synthesis and transport that dominate the flexible genome and set it apart from the core. Besides identifying islands and demonstrating their role throughout the history of Prochlorococcus, reconstruction of past gene gains and losses shows that much of the variability exists at the ‘‘leaves of the tree,’’ between the most closely related strains. Finally, the identification of core and flexible genes from this 12-genome comparison is largely consistent with the relative frequency of Prochlorococcus genes found in global ocean metagenomic databases, further closing the gap between our understanding of these organisms in the lab and the wild.
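The abstract above says a maximum parsimony method was used to place gene gains and losses on the tree, but this listing does not say which one. As an illustration only, the sketch below uses Fitch's small-parsimony algorithm (a swapped-in, standard choice, not necessarily the authors' method) on a presence/absence pattern for one gene. The tree shape, the isolate labels `HL1`, `HL2`, `LL1`, `LL2`, and the function name `fitch` are all hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf label (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, presence: Dict[str, int]) -> Tuple[Set[int], int]:
    """Bottom-up pass of Fitch parsimony for one binary character.

    Returns the candidate state set at the root of `tree` and the
    minimum number of gain/loss events needed below it.
    """
    if isinstance(tree, str):
        return {presence[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, presence)
    right_states, right_changes = fitch(right, presence)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra event needed here.
        return shared, left_changes + right_changes
    # Children disagree: one gain or loss must occur on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical four-isolate tree: two high-light and two low-light isolates.
toy_tree: Tree = (("HL1", "HL2"), ("LL1", "LL2"))

# Gene present (1) only in the high-light isolates: one event separates the clades.
clade_gene = {"HL1": 1, "HL2": 1, "LL1": 0, "LL2": 0}
# Gene present in a single isolate: one event at a leaf of the tree.
leaf_gene = {"HL1": 1, "HL2": 0, "LL1": 0, "LL2": 0}

clade_states, clade_events = fitch(toy_tree, clade_gene)
leaf_states, leaf_events = fitch(toy_tree, leaf_gene)
print(clade_states, clade_events)
print(leaf_states, leaf_events)
```

Both toy genes need a single event, but in the second case the root state set excludes presence, so the event is a gain on the branch leading to one isolate. That is the "leaves of the tree" pattern the abstract describes.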
••
TL;DR: A new, large-scale sequencing effort to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations is reported, encompassing a total of 2,821,103 nucleotides.
Abstract: Influenza viruses are remarkably adept at surviving in the human population over a long timescale. The human influenza A virus continues to thrive even among populations with widespread access to vaccines, and continues to be a major cause of morbidity and mortality. The virus mutates from year to year, making the existing vaccines ineffective on a regular basis, and requiring that new strains be chosen for a new vaccine. Less-frequent major changes, known as antigenic shift, create new strains against which the human population has little protective immunity, thereby causing worldwide pandemics. The most recent pandemics include the 1918 'Spanish' flu, one of the most deadly outbreaks in recorded history, which killed 30-50 million people worldwide, the 1957 'Asian' flu, and the 1968 'Hong Kong' flu. Motivated by the need for a better understanding of influenza evolution, we have developed flexible protocols that make it possible to apply large-scale sequencing techniques to the highly variable influenza genome. Here we report the results of sequencing 209 complete genomes of the human influenza A virus, encompassing a total of 2,821,103 nucleotides. In addition to increasing markedly the number of publicly available, complete influenza virus genomes, we have discovered several anomalies in these first 209 genomes that demonstrate the dynamic nature of influenza transmission and evolution. This new, large-scale sequencing effort promises to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations. All data from this project are being deposited, without delay, in public archives.
••
TL;DR: A phylogenetic analysis of 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York State, United States demonstrated that multiple lineages can co-circulate, persist, and reassort in epidemiologically significant ways, and underscore the importance of genomic analyses for future influenza surveillance.
Abstract: Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York State, United States, and observed multiple co-circulating clades with different population frequencies. Strikingly, phylogenies inferred for individual gene segments revealed that multiple reassortment events had occurred among these clades, such that one clade of H3N2 viruses present at least since 2000 had provided the hemagglutinin gene for all those H3N2 viruses sampled after the 2002–2003 influenza season. This reassortment event was the likely progenitor of the antigenically variant influenza strains that caused the A/Fujian/411/2002-like epidemic of the 2003–2004 influenza season. However, despite sharing the same hemagglutinin, these phylogenetically distinct lineages of viruses continue to co-circulate in the same population. These data, derived from the first large-scale analysis of H3N2 viruses, convincingly demonstrate that multiple lineages can co-circulate, persist, and reassort in epidemiologically significant ways, and underscore the importance of genomic analyses for future influenza surveillance.
••
TL;DR: The genome sequence of Theileria parva is reported, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand.
Abstract: We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.
••
TL;DR: Two model legumes, Medicago truncatula and Lotus japonicus, are currently targets of large-scale genome sequencing projects and the prospect of integrating genome information from Mt and Lj is exciting.
Abstract: Two model legumes, Medicago truncatula (Mt) and Lotus japonicus (Lj), are currently targets of large-scale genome sequencing projects. As a result, legumes are one of few plant families with extensive genome sequence in multiple species. The prospect of integrating genome information from Mt and
••
TL;DR: This work describes a computational method for miRNA prediction and the results of its application to the discovery of novel mammalian miRNAs, and shows that although the overall miRNA content in the observed clusters is very similar across the three considered species, the internal organization of the clusters changes in evolution.
Abstract: MicroRNAs (miRNAs) are endogenous 21 to 23-nucleotide RNA molecules that regulate protein-coding gene expression in plants and animals via the RNA interference pathway. Hundreds of them have been identified in the last five years, and very recent work indicates that their total number is larger still. Therefore, miRNA gene discovery remains an important aspect of understanding this new and still largely unknown regulation mechanism. Bioinformatics approaches have proved to be very useful toward this goal by guiding the experimental investigations. In this work we describe our computational method for miRNA prediction and the results of its application to the discovery of novel mammalian miRNAs. We focus on genomic regions around already known miRNAs, in order to exploit the property that miRNAs are occasionally found in clusters. Starting with the known human, mouse and rat miRNAs we analyze 20 kb of flanking genomic regions for the presence of putative precursor miRNAs (pre-miRNAs). Each genome is analyzed separately, allowing us to study the species-specific identity and genome organization of miRNA loci. We only use cross-species comparisons to make conservative estimates of the number of novel miRNAs. Our ab initio method predicts between fifty and one hundred novel pre-miRNAs for each of the considered species. Around 30% of these already have experimental support in a large set of cloned mammalian small RNAs. The validation rate among predicted cases that are conserved in at least one other species is higher, about 60%, and many of them have not been detected by prediction methods that used cross-species comparisons. A large fraction of the experimentally confirmed predictions correspond to an imprinted locus residing on chromosome 14 in human, 12 in mouse and 6 in rat. Our computational tool can be accessed on the World Wide Web. Our results show that the assumption that many miRNAs occur in clusters is fruitful for the discovery of novel miRNAs.
Additionally we show that although the overall miRNA content in the observed clusters is very similar across the three considered species, the internal organization of the clusters changes in evolution.
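The first step of the search described above, collecting the genomic regions that flank known miRNAs, might be sketched as follows. This is only an illustration, not the authors' tool: the function name `flanking_windows`, the example coordinates, and the reading of "20 kb" as a per-side flank (`flank_bp`) are all assumptions, and the pre-miRNA prediction itself is not shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates in base pairs


def flanking_windows(loci: List[Interval], chrom_len: int,
                     flank_bp: int = 20_000) -> List[Interval]:
    """Extend each known locus by flank_bp on both sides, clip the result
    to the chromosome, and merge windows that overlap or touch."""
    extended = sorted(
        (max(0, start - flank_bp), min(chrom_len, end + flank_bp))
        for start, end in loci
    )
    merged: List[Interval] = []
    for start, end in extended:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: grow it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coordinates of three known miRNAs on one chromosome.
known = [(100_000, 100_080), (115_000, 115_090), (900_000, 900_075)]
windows = flanking_windows(known, chrom_len=910_000)
print(windows)
```

Merging matters because clustered miRNAs produce overlapping flanks; without it the same region would be scanned more than once.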
••
TL;DR: Conditions are described for rolling-circle amplification (RCA) of individual DNA molecules 5–7 kb in size by >10^9-fold, using φ29 DNA polymerase. The procedure allows cell-free cloning of individual synthetic DNA molecules that cannot be cloned in Escherichia coli, and may also speed genome sequencing by eliminating the need for biological cloning.
Abstract: We describe conditions for rolling-circle amplification (RCA) of individual DNA molecules 5-7 kb in size by >10^9-fold, using phi29 DNA polymerase. The principal difficulty with amplification of small amounts of template by RCA using phi29 DNA polymerase is "background" DNA synthesis that usually occurs when template is omitted, or at low template concentrations. Reducing the reaction volume while keeping the amount of template fixed increases the template concentration, resulting in a suppression of background synthesis. Cell-free cloning of single circular molecules by using phi29 DNA polymerase was achieved by carrying out the amplification reactions in very small volumes, typically 600 nl. This procedure allows cell-free cloning of individual synthetic DNA molecules that cannot be cloned in Escherichia coli, for example synthetic phage genomes carrying lethal mutations. It also allows cell-free cloning of genomic DNA isolated from bacteria. This DNA can be sequenced directly from the phi29 DNA polymerase reaction without further amplification. In contrast to PCR amplification, RCA using phi29 DNA polymerase does not produce mutant jackpots, and the high processivity of the enzyme eliminates stuttering at homopolymer tracts. Cell-free cloning has many potential applications to both natural and synthetic DNA. These include environmental DNA samples that have proven difficult to clone and synthetic genes encoding toxic products. The method may also speed genome sequencing by eliminating the need for biological cloning.
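The volume argument above is simple arithmetic, sketched here for a single template molecule. The 600 nl figure comes from the abstract; the 20 µl comparison volume is a hypothetical "conventional" reaction size chosen only for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration(n_molecules: float, volume_litres: float) -> float:
    """Molar concentration of n_molecules dissolved in volume_litres."""
    return n_molecules / (AVOGADRO * volume_litres)


small = molar_concentration(1, 600e-9)  # one molecule in 600 nl (from the abstract)
large = molar_concentration(1, 20e-6)   # one molecule in 20 µl (hypothetical comparison)
fold = small / large                    # concentration gain from shrinking the volume

print(f"600 nl: {small:.2e} M, 20 µl: {large:.2e} M, gain: {fold:.1f}x")
```

With the template amount fixed, the concentration gain is just the ratio of the two volumes.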
••
TL;DR: A high frequency of closely adjacent, apparent double crossover events that may represent gene conversions is detected, along with large regions of genetic homogeneity among the archetypal clonal lineages, reflecting the relatively few genetic outbreeding events that have occurred since their recent origin.
Abstract: Toxoplasma gondii is a highly successful protozoan parasite in the phylum Apicomplexa, which contains numerous animal and human pathogens. T. gondii is amenable to cellular, biochemical, molecular and genetic studies, making it a model for the biology of this important group of parasites. To facilitate forward genetic analysis, we have developed a high-resolution genetic linkage map for T. gondii. The genetic map was used to assemble the scaffolds from a 10X shotgun whole genome sequence, thus defining 14 chromosomes with markers spaced at ∼300 kb intervals across the genome. Fourteen chromosomes were identified comprising a total genetic size of ∼592 cM and an average map unit of ∼104 kb/cM. Analysis of the genetic parameters in T. gondii revealed a high frequency of closely adjacent, apparent double crossover events that may represent gene conversions. In addition, we detected large regions of genetic homogeneity among the archetypal clonal lineages, reflecting the relatively few genetic outbreeding events that have occurred since their recent origin. Despite these unusual features, linkage analysis proved to be effective in mapping the loci determining several drug resistances. The resulting genome map provides a framework for analysis of complex traits such as virulence and transmission, and for comparative population genetic studies.
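The map figures quoted above imply an approximate physical genome size, which can be checked with a few lines of arithmetic. The inputs are the rounded values from the abstract, so the outputs are rough estimates only; the variable names are illustrative.

```python
genetic_size_cm = 592  # total genetic size in centimorgans (from the abstract)
kb_per_cm = 104        # average map unit in kb per cM (from the abstract)
n_chromosomes = 14     # from the abstract

physical_kb = genetic_size_cm * kb_per_cm          # implied physical size in kb
physical_mb = physical_kb / 1000                   # same value in Mb
mean_chromosome_mb = physical_mb / n_chromosomes   # average chromosome size in Mb

print(f"implied genome size: {physical_mb:.1f} Mb")
print(f"mean chromosome size: {mean_chromosome_mb:.1f} Mb")
```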
••
TL;DR: The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized.
Abstract: We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.
••
TL;DR: Based on syntenic alignments of these chromosomes, rice chromosome 11 and 12 do not appear to have resulted from a single whole-genome duplication event as previously suggested.
Abstract: Background: Rice is an important staple food and, with the smallest cereal genome, serves as a reference species for studies on the evolution of cereals and other grasses. Therefore, decoding its entire genome will be a prerequisite for applied and basic research on this species and all other cereals. Results: We have determined and analyzed the complete sequences of two of its chromosomes, 11 and 12, which total 55.9 Mb (14.3% of the entire genome length), based on a set of overlapping clones. A total of 5,993 non-transposable element related genes are present on these chromosomes. Among them are 289 disease resistance-like and 28 defense-response genes, a higher proportion of these categories than on any other rice chromosome. A three-Mb segment on both chromosomes resulted from a duplication 7.7 million years ago (mya), the most recent large-scale duplication in the rice genome. Paralogous gene copies within this segmental duplication can be aligned with genomic assemblies from sorghum and maize. Although these gene copies are preserved on both chromosomes, their expression patterns have diverged. When the gene order of rice chromosomes 11 and 12 was compared to wheat gene loci, significant synteny between these orthologous regions was detected, illustrating the presence of conserved genes alternating with recently evolved genes. Conclusion: Because the resistance and defense response genes, enriched on these chromosomes relative to the whole genome, also occur in clusters, they provide a preferred target for breeding durable disease resistance in rice and the isolation of their allelic variants. The recent duplication of a large chromosomal segment coupled with the high density of disease resistance gene clusters makes this the most recently evolved part of the rice genome. Based on syntenic alignments of these chromosomes, rice chromosome 11 and 12 do not appear to have resulted from a single whole-genome duplication event as previously suggested.
••
TL;DR: In this article, the authors report on the sequence analysis of members of the receptor tyrosine kinase (RTK) gene family in the genomes of glioblastoma brain tumors.
Abstract: It is now clear that tyrosine kinases represent attractive targets for therapeutic intervention in cancer. Recent advances in DNA sequencing technology now provide the opportunity to survey mutational changes in cancer in a high-throughput and comprehensive manner. Here we report on the sequence analysis of members of the receptor tyrosine kinase (RTK) gene family in the genomes of glioblastoma brain tumors. Previous studies have identified a number of molecular alterations in glioblastoma, including amplification of the RTK epidermal growth factor receptor. We have identified mutations in two other RTKs: (i) fibroblast growth factor receptor 1, including the first mutations in the kinase domain in this gene observed in any cancer, and (ii) a frameshift mutation in the platelet-derived growth factor receptor-α gene. Fibroblast growth factor receptor 1, platelet-derived growth factor receptor-α, and epidermal growth factor receptor are all potential entry points to the phosphatidylinositol 3-kinase and mitogen-activated protein kinase intracellular signaling pathways already known to be important for neoplasia. Our results demonstrate the utility of applying DNA sequencing technology to systematically assess the coding sequence of genes within cancer genomes.
••
TL;DR: A sequencing system has been developed that can read 25 million bases of genetic code — the entire genome of some fungi — within four hours, and may provide an alternative approach to DNA sequencing.
Abstract: A sequencing system has been developed that can read 25 million bases of genetic code — the entire genome of some fungi — within four hours. The technique may provide an alternative approach to DNA sequencing. The race is on for a big prize: the job of providing the world's DNA sequencing laboratories with the successor to the ‘Sanger-based’ technology that gave us the first wave of genome sequences. One technology in the frame is that produced by 454 Life Sciences Corporation of Branford, Connecticut. Today's technology reads 67,000 base pairs per hour; this new approach is 100 times faster, reading 6 million base pairs per hour. The improved performance results from using picolitre-sized chemical reactors, enhanced light-emitting sequencing chemistries and complex informatics. Further miniaturization of the system is planned. Such leaps in technology may one day make it possible to analyse an individual's genome before designing therapy: the ultimate in personalized medicine.
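The throughput figures quoted above can be cross-checked with a short calculation. All inputs are taken from the abstract; the variable names are illustrative.

```python
bases_read = 25_000_000  # bases read in one run (from the abstract)
run_hours = 4            # run duration in hours (from the abstract)
sanger_rate = 67_000     # base pairs per hour quoted for existing technology

new_rate = bases_read / run_hours  # base pairs per hour for the new system
speedup = new_rate / sanger_rate   # ratio to the quoted existing rate

print(f"new rate: {new_rate:,.0f} bp/h, speed-up: {speedup:.0f}x")
```

The result, about 6.25 million base pairs per hour and a roughly 93-fold speed-up, agrees with the rounded "6 million" and "100 times faster" in the text.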
••
TL;DR: A stochastic model for initiation of DNA replication in the fission yeast is proposed and it is demonstrated that at least half of intergenes have potential origin activity and that the relative ability of an intergene to function as an origin is governed primarily by AT content and length.
Abstract: Origins of DNA replication in Schizosaccharomyces pombe lack a specific consensus sequence analogous to the Saccharomyces cerevisiae autonomously replicating sequence (ARS) consensus, raising the question of how they are recognized by the replication machinery. Because all well characterized S. pombe origins are located in intergenic regions, we analyzed the sequence properties and biological activity of such regions. The AT content of intergenes is very high (≈70%), and runs of A's or T's occur with a significantly greater frequency than expected. Additionally, the two DNA strands in intergenes display compositional asymmetry that strongly correlates with the direction of transcription of flanking genes. Importantly, the sequence properties of known S. pombe origins of DNA replication are similar to those of intergenes in general. In functional studies, we assayed the in vivo origin activity of 26 intergenes in a 68-kb region of S. pombe chromosome 2. We also assayed the origin activity of sets of randomly chosen intergenes with the same length or AT content. Our data demonstrate that at least half of intergenes have potential origin activity and that the relative ability of an intergene to function as an origin is governed primarily by AT content and length. We propose a stochastic model for initiation of DNA replication in the fission yeast. In this model, the number of AT tracts in a given sequence is the major determinant of its probability of binding SpORC and serving as a replication origin. A similar model may explain some features of origins of DNA replication in metazoans.
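The two sequence properties the abstract highlights, AT content and runs of A's or T's, are easy to compute for any candidate intergene. The sketch below is only an illustration: the function names, the minimum run length of 4, and the example sequence are assumptions, and it does not implement the authors' stochastic model or their definition of an AT tract.

```python
import re
from typing import List


def at_content(seq: str) -> float:
    """Fraction of bases in seq that are A or T (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def homopolymer_at_runs(seq: str, min_len: int = 4) -> List[str]:
    """Return every run of at least min_len consecutive A's or consecutive T's."""
    pattern = rf"A{{{min_len},}}|T{{{min_len},}}"
    return [m.group(0) for m in re.finditer(pattern, seq.upper())]


# Hypothetical 16-base example sequence.
example = "AAAATTGCATTTTTCG"
print(at_content(example))           # fraction of A/T bases
print(homopolymer_at_runs(example))  # runs of A's or T's of length >= 4
```

Features like these could serve as inputs to a model in which longer, more AT-rich intergenes are more likely to act as origins, which is the qualitative relationship the abstract reports.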
••
TL;DR: It is demonstrated that shear-induced cyclooxygenase (COX)-2 suppresses phosphatidylinositol 3-kinase (PI3-K) activity, which represses antioxidant response element (ARE)/NF-E2 related factor 2 (Nrf2)-mediated transcriptional response in human chondrocytes, which contributes to their apoptosis.
Abstract: Fluid shear exerts anti-inflammatory and anti-apoptotic effects on endothelial cells by inducing the coordinated expression of phase 2 detoxifying and antioxidant genes. In contrast, high shear is pro-apoptotic in chondrocytes and promotes matrix degradation and cartilage destruction. We have analyzed the mechanisms regulating shear-mediated chondrocyte apoptosis by cDNA microarray technology and bioinformatics. We demonstrate that shear-induced cyclooxygenase (COX)-2 suppresses phosphatidylinositol 3-kinase (PI3-K) activity, which represses antioxidant response element (ARE)/NF-E2 related factor 2 (Nrf2)-mediated transcriptional response in human chondrocytes. The resultant decrease in antioxidant capacity of sheared chondrocytes contributes to their apoptosis. Phase 2 inducers, and to a lesser extent COX-2-selective inhibitors, negate the shear-mediated suppression of ARE-driven phase 2 activity and apoptosis. The abrogation of shear-induced COX-2 expression by PI3-K activity and/or stimulation of the Nrf2/ARE pathway suggests the existence of PI3-K/Nrf2/ARE negative feedback loops that potentially interfere with c-Jun N-terminal kinase 2 activity upstream of COX-2. Reconstructing the signaling network regulating shear-induced chondrocyte apoptosis may provide insights to optimize conditions for culturing artificial cartilage in bioreactors and for developing therapeutic strategies for arthritic disorders.
••
TL;DR: MPSS analysis has resulted in a significant extension of the knowledge of CT antigens, leading to the discovery of a distinctive X-linked CT-antigen gene family.
Abstract: Massively parallel signature sequencing (MPSS) generates millions of short sequence tags corresponding to transcripts from a single RNA preparation. Most MPSS tags can be unambiguously assigned to genes, thereby generating a comprehensive expression profile of the tissue of origin. From the comparison of MPSS data from 32 normal human tissues, we identified 1,056 genes that are predominantly expressed in the testis. Further evaluation by using MPSS tags from cancer cell lines and EST data from a wide variety of tumors identified 202 of these genes as candidates for encoding cancer/testis (CT) antigens. Of these genes, the expression in normal tissues was assessed by RT-PCR in a subset of 166 intron-containing genes, and those with confirmed testis-predominant expression were further evaluated for their expression in 21 cancer cell lines. Thus, 20 CT or CT-like genes were identified, with several exhibiting expression in five or more of the cancer cell lines examined. One of these genes is a member of a CT gene family that we designated as CT45. The CT45 family comprises six highly similar (>98% cDNA identity) genes that are clustered in tandem within a 125-kb region on Xq26.3. CT45 was found to be frequently expressed in both cancer cell lines and lung cancer specimens. Thus, MPSS analysis has resulted in a significant extension of our knowledge of CT antigens, leading to the discovery of a distinctive X-linked CT-antigen gene family.
••
TL;DR: These resources, developed as a part of the Cancer Chromosome Aberration Project (CCAP) initiative, aid the search for new cancer‐associated genes and foster insights into the causes and consequences of genetic alterations in cancer.
Abstract: To catalog data on chromosomal aberrations in cancer derived from emerging molecular cytogenetic techniques and to integrate these data with genome maps, we have established two resources, the NCI and NCBI SKY/M-FISH & CGH Database and the Cancer Chromosomes database. The goal of the former is to allow investigators to submit and analyze clinical and research cytogenetic data. It contains a karyotype parser tool, which automatically converts the ISCN short-form karyotype into an internal representation displayed in detailed form and as a colored ideogram with band overlay, and also has a tool to compare CGH profiles from multiple cases. The Cancer Chromosomes database integrates the SKY/M-FISH & CGH Database with the Mitelman Database of Chromosome Aberrations in Cancer and the Recurrent Chromosome Aberrations in Cancer database. These three datasets can now be searched seamlessly by use of the Entrez search and retrieval system for chromosome aberrations, clinical data, and reference citations. Common diagnoses, anatomic sites, chromosome breakpoints, junctions, numerical and structural abnormalities, and bands gained and lost among selected cases can be compared by use of the "similarity" report. Because the model used for CGH data is a subset of the karyotype data, it is now possible to examine the similarities between CGH results and karyotypes directly. All chromosomal bands are directly linked to the Entrez Map Viewer database, providing integration of cytogenetic data with the sequence assembly. These resources, developed as a part of the Cancer Chromosome Aberration Project (CCAP) initiative, aid the search for new cancer-associated genes and foster insights into the causes and consequences of genetic alterations in cancer.