Showing papers by "J. Craig Venter Institute" published in 2001


Journal ArticleDOI
20 Jul 2001-Science
TL;DR: A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low–guanine/cytosine Gram-positive species.
Abstract: The 2,160,837-base pair genome sequence of an isolate of Streptococcus pneumoniae, a Gram-positive pathogen that causes pneumonia, bacteremia, meningitis, and otitis media, contains 2236 predicted coding regions; of these, 1440 (64%) were assigned a biological role. Approximately 5% of the genome is composed of insertion sequences that may contribute to genome rearrangements through uptake of foreign DNA. Extracellular enzyme systems for the metabolism of polysaccharides and hexosamines provide a substantial source of carbon and nitrogen for S. pneumoniae and also damage host tissues and facilitate colonization. A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low-guanine/cytosine (GC) Gram-positive species. Several surface-exposed proteins that may serve as potential vaccine candidates were identified. Comparative genome hybridization with DNA arrays revealed strain differences in S. pneumoniae that could contribute to differences in virulence and antigenicity.

1,409 citations


Journal ArticleDOI
Jun Kawai, Akira Shinagawa, K. Shibata, Masayasu Yoshino, Masayoshi Itoh, Y. Ishii, Takahiro Arakawa, A. Hara, Yoshifumi Fukunishi, Hideaki Konno, Jun Adachi, S. Fukuda, Katsunori Aizawa, Masaki Izawa, Kenichiro Nishi, H. Kiyosawa, S. Kondo, Itaru Yamanaka, Takashi Saito, Yasushi Okazaki, Takashi Gojobori, Hidemasa Bono, Takeya Kasukawa, Rintaro Saito, Koji Kadota, Hideo Matsuda, Michael Ashburner, Serge Batalov, Thomas L. Casavant, W. Fleischmann, Terry Gaasterland, Carmela Gissi, Benjamin L. King, Hiromi Kochiwa, P. Kuehl, Simon L. Lewis, Y. Matsuo, Itoshi Nikaido, Graziano Pesole, John Quackenbush, Lynn M. Schriml, F. Staubli, R. Suzuki, Masaru Tomita, Lukas Wagner, Takanori Washio, K. Sakai, Toshihisa Okido, Masaaki Furuno, H. Aono, Richard M. Baldarelli, Gregory S. Barsh, Judith A. Blake, Dario Boffelli, N. Bojunga, Piero Carninci, M. F. De Bonaldo, Michael J. Brownstein, Carol J. Bult, Christopher D.M. Fletcher, Masaki Fujita, Manuela Gariboldi, Stefano Gustincich, David E. Hill, Marion A. Hofmann, David A. Hume, Mamoru Kamiya, Norman H. Lee, Paul A. Lyons, Luigi Marchionni, Jun Mashima, J. Mazzarelli, Peter Mombaerts, P. Nordone, Brian Z. Ring, M. Ringwald, Ivan Rodriguez, Naoaki Sakamoto, H. Sasaki, K. Sato, Christian Schönbach, Tsukasa Seya, Y. Shibata, Kai-Florian Storch, Harukazu Suzuki, Kazuhito Toyo-oka, Kuan Hong Wang, Charles J. Weitz, Charles A. Whittaker, L. Wilming, Anthony Wynshaw-Boris, K. Yoshida, Y. Hasegawa, Hideya Kawaji, S. Kohtsuki, Yoshihide Hayashizaki
08 Feb 2001-Nature
TL;DR: The first RIKEN clone collection, one of the largest described for any organism, is described; analysis of these cDNAs extends known gene families and identifies new ones.
Abstract: The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

700 citations


Journal ArticleDOI
TL;DR: C. crescentus is, to the authors' knowledge, the first free-living α-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.
Abstract: The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living alpha-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.

543 citations


Journal ArticleDOI
TL;DR: The Comprehensive Microbial Resource (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes.
Abstract: The Comprehensive Microbial Resource or CMR (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence and annotation for complete and publicly available bacterial and archaeal genomes. In addition to displaying the original annotation from GenBank, the CMR makes available secondary automated structural and functional annotation across all genomes to provide consistent data types necessary for effective mining of genomic data. Precomputed homology searches are stored to allow meaningful genome comparisons. The CMR supplies users with over 50 different tools to utilize the sequence and annotation data across one or more of the 571 currently available genomes. At the gene level users can view the gene annotation and underlying evidence. Genome level information includes whole genome graphical displays, biochemical pathway maps and genome summary data. Comparative tools display analysis between genomes with homology and genome alignment tools, and searches across the accessions, annotation, and evidence assigned to all genes/genomes are available. The data and tools on the CMR aid genomic research and analysis, and the CMR is included in over 200 scientific publications. The code underlying the CMR website and the CMR database are freely available for download with no license restrictions.

502 citations


Journal ArticleDOI
TL;DR: It is found that lateral gene transfer has played a fundamental role in the evolution of S. aureus, demonstrating that methicillin-resistant strains have evolved multiple independent times, rather than from a single ancestral strain.
Abstract: An emerging theme in medical microbiology is that extensive variation exists in gene content among strains of many pathogenic bacterial species. However, this topic has not been investigated on a genome scale with strains recovered from patients with well-defined clinical conditions. Staphylococcus aureus is a major human pathogen and also causes economically important infections in cows and sheep. A DNA microarray representing >90% of the S. aureus genome was used to characterize genomic diversity, evolutionary relationships, and virulence gene distribution among 36 strains of divergent clonal lineages, including methicillin-resistant strains and organisms causing toxic shock syndrome. Genetic variation in S. aureus is very extensive, with ≈22% of the genome comprised of dispensable genetic material. Eighteen large regions of difference were identified, and 10 of these regions have genes that encode putative virulence factors or proteins mediating antibiotic resistance. We find that lateral gene transfer has played a fundamental role in the evolution of S. aureus. The mec gene has been horizontally transferred into distinct S. aureus chromosomal backgrounds at least five times, demonstrating that methicillin-resistant strains have evolved multiple independent times, rather than from a single ancestral strain. This finding resolves a long-standing controversy in S. aureus research. The epidemic of toxic shock syndrome that occurred in the 1970s was caused by a change in the host environment, rather than rapid geographic dissemination of a new hypervirulent strain. DNA microarray analysis of large samples of clinically characterized strains provides broad insights into evolution, pathogenesis, and disease emergence.

463 citations


Journal ArticleDOI
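TL;DR: Computational analysis of the genome and predicted proteome of the halophilic archaeon Halobacterium sp. NRC-1 reveals characteristics relevant to life in an extreme environment distinguished by hypersalinity and high solar radiation.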
Abstract: The genome of the halophilic archaeon Halobacterium sp. NRC-1 and predicted proteome have been analyzed by computational methods and reveal characteristics relevant to life in an extreme environment distinguished by hypersalinity and high solar radiation: (1) The proteome is highly acidic, with a median pI of 4.9 and mostly lacking basic proteins. This characteristic correlates with high surface negative charge, determined through homology modeling, as the major adaptive mechanism of halophilic proteins to function in nearly saturating salinity. (2) Codon usage displays the expected GC bias in the wobble position and is consistent with a highly acidic proteome. (3) Distinct genomic domains of NRC-1 with bacterial character are apparent by whole proteome BLAST analysis, including two gene clusters coding for a bacterial-type aerobic respiratory chain. This result indicates that the capacity of halophiles for aerobic respiration may have been acquired through lateral gene transfer. (4) Two regions of the large chromosome were found with relatively lower GC composition and overrepresentation of IS elements, similar to the minichromosomes. These IS-element-rich regions of the genome may serve to exchange DNA between the three replicons and promote genome evolution. (5) GC-skew analysis showed evidence for the existence of two replication origins in the large chromosome. This finding and the occurrence of multiple chromosomes indicate a dynamic genome organization with eukaryotic character.
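The GC-skew analysis mentioned in point (5) can be illustrated with a minimal sketch. The abstract does not describe the procedure, so the following assumes the standard windowed definition, skew = (G - C) / (G + C), with a running cumulative sum; the function names, window size, and step are hypothetical choices for illustration and are not taken from the paper.

```python
def gc_skew(seq, window=1000, step=None):
    """Return a list of (window_start, skew) pairs.

    skew = (G - C) / (G + C) within each window; a window that
    contains no G or C is reported as 0.0.
    Window size and step are illustrative assumptions.
    """
    step = step or window
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew(skews):
    """Running sum of the per-window skew values."""
    total = 0.0
    running = []
    for _, value in skews:
        total += value
        running.append(total)
    return running


if __name__ == "__main__":
    # Toy sequence: G-rich first half, C-rich second half.
    windows = gc_skew("GGGGCCCC", window=4)
    print(windows)
    print(cumulative_skew(windows))
```

In analyses of this kind, sign changes in the per-window skew, or extrema in the cumulative curve, are commonly read as candidate replication origins or termini; whether the authors used this exact formulation is not stated in the abstract.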

318 citations


Journal ArticleDOI
15 Feb 2001-Nature
TL;DR: This resource represents the first comprehensive integration of cytogenetic, radiation hybrid, linkage and sequence maps of the human genome and provides an independent validation of the sequence map and framework for contig order and orientation.
Abstract: We have placed 7,600 cytogenetically defined landmarks on the draft sequence of the human genome to help with the characterization of genes altered by gross chromosomal aberrations that cause human disease. The landmarks are large-insert clones mapped to chromosome bands by fluorescence in situ hybridization. Each clone contains a sequence tag that is positioned on the genomic sequence. This genome-wide set of sequence-anchored clones allows structural and functional analyses of the genome. This resource represents the first comprehensive integration of cytogenetic, radiation hybrid, linkage and sequence maps of the human genome; provides an independent validation of the sequence map and framework for contig order and orientation; surveys the genome for large-scale duplications, which are likely to require special attention during sequence assembly; and allows a stringent assessment of sequence differences between the dark and light bands of chromosomes. It also provides insight into large-scale chromatin structure and the evolution of chromosomes and gene families and will accelerate our understanding of the molecular bases of human disease and cancer.

314 citations


Journal ArticleDOI
08 Jun 2001-Science
TL;DR: About 40 genes were found to be exclusively shared by humans and bacteria and are candidate examples of horizontal transfer from bacteria to vertebrates.
Abstract: The human genome was analyzed for evidence that genes had been laterally transferred into the genome from prokaryotic organisms. Protein sequence comparisons of the proteomes of human, fruit fly, nematode worm, yeast, mustard weed, eukaryotic parasites, and all completed prokaryote genomes were performed, and all genes shared between human and each of the other groups of organisms were collected. About 40 genes were found to be exclusively shared by humans and bacteria and are candidate examples of horizontal transfer from bacteria to vertebrates. Gene loss combined with sample size effects and evolutionary rate variation provide an alternative, more biologically plausible explanation.

301 citations


Journal ArticleDOI
TL;DR: It is demonstrated that fiber-fluorescence in situ hybridization is a powerful technique to analyze large repetitive regions in the higher eukaryotic genomes and is a valuable complement to ongoing large genome sequencing projects.
Abstract: Previously conducted sequence analysis of Arabidopsis thaliana (ecotype Columbia-0) reported an insertion of 270-kb mtDNA into the pericentric region on the short arm of chromosome 2. DNA fiber-based fluorescence in situ hybridization analyses reveal that the mtDNA insert is 618 ± 42 kb, ≈2.3 times greater than that determined by contig assembly and sequencing analysis. Portions of the mitochondrial genome previously believed to be absent were identified within the insert. Sections of the mtDNA are repeated throughout the insert. The cytological data illustrate that DNA contig assembly by using bacterial artificial chromosomes tends to produce a minimal clone path by skipping over duplicated regions, thereby resulting in sequencing errors. We demonstrate that fiber-fluorescence in situ hybridization is a powerful technique to analyze large repetitive regions in the higher eukaryotic genomes and is a valuable complement to ongoing large genome sequencing projects.

209 citations


Journal ArticleDOI
TL;DR: The new system, RBSfinder, is tested on a validated set of genes from Escherichia coli, for which it improves the accuracy of start site locations predicted by computational gene finding systems from the range 67-77% to 90% correct.
Abstract: As the pace of genome sequencing has accelerated, the need for highly accurate gene prediction systems has grown. Computational systems for identifying genes in prokaryotic genomes have sensitivities of 98-99% or higher (Delcher et al., Nucleic Acids Res., 27, 4636-4641, 1999). These accuracy figures are calculated by comparing the locations of verified stop codons to the predictions. Determining the accuracy of start codon prediction is more problematic, however, due to the relatively small number of start sites that have been confirmed by independent, non-computational methods. Nonetheless, the accuracy of gene finders at predicting the exact gene boundaries at both the 5' and 3' ends of genes is of critical importance for microbial genome annotation, especially in light of the important signaling information that is sometimes found on the 5' end of a protein coding region. In this paper we propose a probabilistic method to improve the accuracy of gene identification systems at finding precise translation start sites. The new system, RBSfinder, is tested on a validated set of genes from Escherichia coli, for which it improves the accuracy of start site locations predicted by computational gene finding systems from the range 67-77% to 90% correct.

208 citations


Journal ArticleDOI
TL;DR: It is shown that pachytene chromosome-based fluorescence in situ hybridization analysis is the most effective approach to integrate DNA sequences with euchromatic and heterochromatic features in the rice genome.
Abstract: Rice (Oryza sativa L.) will be the first major crop, as well as the first monocot plant species, to be completely sequenced. Integration of DNA sequence-based maps with cytological maps will be essential to fully characterize the rice genome. We have isolated a set of 24 chromosomal arm-specific bacterial artificial chromosomes to facilitate rice chromosome identification. A standardized rice karyotype was constructed using meiotic pachytene chromosomes of O. sativa spp. japonica rice var. Nipponbare. This karyotype is anchored by centromere-specific and chromosomal arm-specific cytological landmarks and is fully integrated with the most saturated rice genetic linkage maps in which Nipponbare was used as one of the mapping parents. An ideogram depicting the distribution of heterochromatin in the rice genome was developed based on the patterns of 4',6-diamidino-2-phenylindole staining of the Nipponbare pachytene chromosomes. The majority of the heterochromatin is distributed in the pericentric regions with some rice chromosomes containing a significantly higher proportion of heterochromatin than other chromosomes. We showed that pachytene chromosome-based fluorescence in situ hybridization analysis is the most effective approach to integrate DNA sequences with euchromatic and heterochromatic features.

Patent
29 Oct 2001
TL;DR: The invention provides proteins from group B streptococcus (Streptococcus agalactiae) and group A streptococcus (Streptococcus pyogenes), including amino acid sequences and corresponding nucleotide sequences, for use in vaccines, immunogenic compositions, and diagnostics.
Abstract: The invention provides proteins from group B streptococcus (Streptococcus agalactiae) and group A streptococcus (Streptococcus pyogenes), including amino acid sequences and the corresponding nucleotide sequences. Data are given to show that the proteins are useful antigens for vaccines, immunogenic compositions, and/or diagnostics. The proteins are also targets for antibiotics.

Journal ArticleDOI
TL;DR: A high-quality cDNA library of the Plasmodium sporozoite stage is constructed using the rodent malaria parasite P. yoelii, an important model for malaria vaccine development; ESTs for three proteins that may be involved in host cell invasion are identified and their expression in sporozoites is documented.
Abstract: Most studies of gene expression in Plasmodium have been concerned with asexual and/or sexual erythrocytic stages. Identification and cloning of genes expressed in the preerythrocytic stages lag far behind. We have constructed a high quality cDNA library of the Plasmodium sporozoite stage by using the rodent malaria parasite P. yoelii, an important model for malaria vaccine development. The technical obstacles associated with limited amounts of RNA material were overcome by PCR-amplifying the transcriptome before cloning. Contamination with mosquito RNA was negligible. Generation of 1,972 expressed sequence tags (EST) resulted in a total of 1,547 unique sequences, allowing insight into sporozoite gene expression. The circumsporozoite protein (CS) and the sporozoite surface protein 2 (SSP2) are well represented in the data set. A blastx search with all tags of the nonredundant protein database gave only 161 unique significant matches (P(N) ≤ 10−4), whereas 1,386 of the unique sequences represented novel sporozoite-expressed genes. We identified ESTs for three proteins that may be involved in host cell invasion and documented their expression in sporozoites. These data should facilitate our understanding of the preerythrocytic Plasmodium life cycle stages and the development of preerythrocytic vaccines.

Journal Article
TL;DR: The complete genome sequences of 36 microorganisms have now been published and this wealth of genome data has enabled the development of comparative genomic and functional genomic approaches to investigate the biology of these organisms.
Abstract: The complete genome sequences of 36 microorganisms have now been published and this wealth of genome data has enabled the development of comparative genomic and functional genomic approaches to investigate the biology of these organisms. Comparative genomic analyses of membrane transport systems have revealed that transporter substrate specificities correlate with an organism’s lifestyle. The types and numbers of predicted drug efflux systems vary dramatically amongst sequenced organisms. Microarray and gene knockout studies to date have suggested that predicted drug efflux genes often appear to be a) nonessential and b) expressed at detectable levels under standard laboratory growth conditions.

Journal ArticleDOI
TL;DR: The identification and characterization of 12,267 potential variants (SNPs and other small insertions/deletions) of human chromosome 22, discovered in the overlaps of 460 clones used for the chromosome sequencing are reported.
Abstract: The recent publication of the complete sequence of human chromosome 22 provides a platform from which to investigate genomic sequence variation. We report the identification and characterization of 12,267 potential variants (SNPs and other small insertions/deletions) of human chromosome 22, discovered in the overlaps of 460 clones used for the chromosome sequencing. We found, on average, 1 potential variant every 1.07 kb and approximately 18% of the potential variants involve insertions/deletions. The SNPs have been positioned both relative to each other, and to genes, predicted genes, repeat sequences, other genetic markers, and the 2730 SNPs previously identified on the chromosome. A subset of the SNPs were verified experimentally using either PCR–RFLP or genomic Invader assays. These experiments confirmed 92% of the potential variants in a panel of 92 individuals. [Details of the SNPs and RFLP assays can be found at http://www.sanger.ac.uk and in dbSNP.]
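The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above (12,267 potential variants, 1 variant per 1.07 kb, approximately 18% insertions/deletions); the implied total overlap length and indel count are derived estimates, not values reported in the abstract, and the function names are hypothetical.

```python
def implied_overlap_kb(variants, kb_per_variant):
    """Total sequence length (kb) implied by a variant count and an
    average spacing of one variant per kb_per_variant kilobases."""
    return variants * kb_per_variant


def implied_indel_count(variants, indel_fraction):
    """Approximate number of variants that are insertions/deletions."""
    return round(variants * indel_fraction)


if __name__ == "__main__":
    variants = 12_267        # from the abstract
    kb_per_variant = 1.07    # from the abstract
    indel_fraction = 0.18    # "approximately 18%" in the abstract

    overlap = implied_overlap_kb(variants, kb_per_variant)
    print(f"Implied clone overlap: about {overlap / 1000:.1f} Mb")
    print(f"Implied indel count: about {implied_indel_count(variants, indel_fraction)}")
```

The 92% confirmation rate applies only to the experimentally tested subset, whose size the abstract does not give, so it is deliberately left out of the calculation.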

Journal ArticleDOI
TL;DR: The comprehensive analysis of the genome sequence of the plant Arabidopsis thaliana has been completed recently and much remains to be done to refine the analysis of encoded genes and define the functions of encoded proteins systematically.

Journal ArticleDOI
TL;DR: A large-scale BAC end-sequencing project at The Institute for Genomic Research has generated one of the most extensive sets of sequence markers for the mouse genome to date, and analyses indicate that the high-quality mouse BAC end sequences will be a valuable resource to the community.
Abstract: Because of the high stability (Shizuya et al. 1992; Kim et al. 1996a,b), libraries constructed in bacterial artificial chromosome (BAC) vectors have become the standard clone sets in high-throughput genomic sequencing projects of organisms with large genomes. End sequences from BACs provide highly specific markers. A genome sequencing approach (Venter et al. 1996) has been described, in which a clone contig is extended by selecting the minimally overlapping clones in each direction by searching the finished BAC sequence against a BAC end sequence (BES) database. Because BACs (an average insert size of 150 kb) are sufficiently large to traverse most tandem arrays of homology units and repeats, BESs are useful in genome assembly and chromosome walking and have been used extensively to confirm, join, and order existing contigs (International Human Genome Sequencing Consortium 2001a). The whole-genome shotgun sequencing strategy relies on BESs as the primary scaffold onto which the end sequences from the smaller clones are assembled (Venter et al. 1998, 2001). The mouse and the human share many fundamental biological processes. Consequently, the mouse has been used frequently in medical research and is the best model system for studying human disease. Additionally, the mouse genome sequence facilitates the accurate annotation of the human genome. As such, National Institutes of Health (NIH) launched a mouse genome-sequencing project in October, 1999 (http://www.nhgri.nih.gov/NEWS/MouseRelease.htm). Compared with the human, significantly fewer large-scale mapping efforts have been conducted for the mouse and much less data are available to the community (Hudson et al. 1995; Dietrich et al. 1996; Schuler et al. 1996; McCarthy et al. 1997; Stewart et al. 1997; Deloukas et al. 1998; Van Etten et al. 1999; International Human Genome Mapping Consortium 2001a; Olivier et al. 2001). 
A large-scale BAC end-sequencing project generates an extensive set of random markers across the genome in an inexpensive and rapid fashion, and will be crucial to the success of the combined strategy of BAC-based sequencing and a moderate level of whole-genome shotgun sequencing that is being used for the mouse genome. The Institute for Genomic Research (TIGR) is the only center conducting large-scale BAC end-sequencing for the mouse, in which the aim of the project is to generate accurate BES pairs from 170,000 RPCI-23 clones (Osoegawa et al. 2000) and 130,000 RPCI-24 clones to support the mouse genome sequencing project. The same set of clones has been fingerprinted at the Genome Sequencing Centre of British Columbia Cancer Research Centre at Vancouver Canada (http://www.bcgsc.bc.ca/projects/mouse_mapping/). We have approached the goal of the project and have generated ∼450,000 sequences (http://www.tigr.org/tdb/bac_ends/mouse/bac_end_intro.html). To provide a better characterization of this valuable resource, we conducted comprehensive quality assessment and sequence analyses as described below.
