Journal Article

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

TL;DR: The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed to improve understanding of the biology of this slow-growing pathogen and to help in the design of new prophylactic and therapeutic interventions.
Abstract: Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.
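The G + C content mentioned in the abstract is a simple ratio, (G + C) / (A + C + G + T). A minimal Python sketch, assuming the genome is already available as a plain string (FASTA parsing is left out) and that ambiguous symbols such as N should be ignored; the function name `gc_content` is illustrative, not taken from the paper.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return (G + C) / (A + C + G + T), ignoring any other symbols (e.g. N)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (counts["G"] + counts["C"]) / acgt


# Toy example only; this is not real H37Rv sequence.
print(gc_content("GGCCAATT"))  # 0.5
```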

Citations
Journal Article
TL;DR: A new greedy alignment algorithm is introduced with particularly good performance and it is shown that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data.
Abstract: For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

4,628 citations
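The abstract does not spell out the algorithm, so what follows is only a sketch of the general "greedy" idea for near-identical sequences, not the published method: a furthest-reaching-diagonal edit-distance computation in the style of Ukkonen and Myers, with unit costs for mismatches, insertions and deletions (an assumption; the paper's scoring scheme and its pruning of poor diagonals are not reproduced). Because runs of matching characters are followed for free, the work grows with the number of differences D (roughly O((n + m) * D)) rather than with the product of the sequence lengths, which is why this style of method pays off when sequences differ only by sparse sequencing errors. The name `greedy_edit_distance` is hypothetical.

```python
def greedy_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (mismatch, insertion, deletion each cost 1),
    computed greedily: for each cost d, keep only the furthest row reachable
    on every diagonal k = j - i, then slide along exact matches for free."""
    n, m = len(a), len(b)
    target = m - n            # diagonal that contains the end cell (n, m)
    UNREACHED = -(10 ** 9)    # sentinel for diagonals not reached at cost d - 1

    def slide(i: int, k: int) -> int:
        # Follow a run of identical characters along diagonal k at no cost.
        j = i + k
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        return i

    furthest = {0: slide(0, 0)}
    if target == 0 and furthest[0] == n:
        return 0
    for d in range(1, n + m + 1):
        nxt = {}
        for k in range(-d, d + 1):
            i = max(
                furthest.get(k, UNREACHED) + 1,      # mismatch: consume a[i] and b[j]
                furthest.get(k - 1, UNREACHED),      # insertion: consume b[j] only
                furthest.get(k + 1, UNREACHED) + 1,  # deletion: consume a[i] only
            )
            i = min(i, n, m - k)                     # stay inside the grid
            if i < max(0, -k):
                continue                             # diagonal k not reachable yet
            nxt[k] = slide(i, k)
        furthest = nxt
        if furthest.get(target, UNREACHED) >= n:
            return d
    raise AssertionError("unreachable: the distance never exceeds len(a) + len(b)")


print(greedy_edit_distance("kitten", "sitting"))  # 3
```

The result matches what a full dynamic-programming table gives for the same unit costs; only the amount of work differs.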


Cites methods from "Deciphering the biology of Mycobact..."

  • ...The H37Rv sequence is 4,441,529 nucleotides, while the other sequence is roughly the same length, though at the time we obtained the sequence (March 3, 1999), it consisted of 42 contigs....

  • ...In typical Blast fashion (Altschul et al., 1990), the program begins by making a table of all 12-mers in the H37Rv sequence; this took about 15 seconds on our Sun Ultra-30 workstation (296 MHz)....

  • ...Part of the contig aligns with the start of the H37Rv, and part with the end (the genomes are circular, so the starting point for the sequence record is largely arbitrary)....

  • ...We used that program to align the genomic sequence of Mycobacterium tuberculosis, strain H37Rv (Cole et al., 1998), with the sequence being generated at The Institute for Genomic Research from another M. tuberculosis strain that is 99% identical....

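The excerpts above mention two practical details: a Blast-style table of every 12-mer in the reference, and the fact that a circular genome has an arbitrary start coordinate, so a contig can straddle the end and the start of the record. A minimal Python sketch of both, assuming an in-memory dictionary index (the cited program's actual data structures are not described here); appending the first k - 1 bases to the end is one simple way, chosen here for illustration, to index k-mers that cross the junction. `build_kmer_index` and `seed_hits` are hypothetical names.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int = 12, circular: bool = True) -> dict[str, list[int]]:
    """Map every k-mer of `genome` to its 0-based start positions.
    If `circular`, the first k - 1 bases are appended so that k-mers crossing
    the arbitrary start/end junction are indexed too."""
    text = genome + genome[: k - 1] if circular else genome
    index = defaultdict(list)
    for pos in range(len(text) - k + 1):
        index[text[pos:pos + k]].append(pos)
    return dict(index)


def seed_hits(query: str, index: dict[str, list[int]], k: int = 12) -> list[tuple[int, int]]:
    """All (query_pos, genome_pos) pairs sharing an exact k-mer: the seeds an
    aligner would then try to extend."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits


# Toy circular "genome" (k = 4 instead of 12); the query wraps from its end to its start.
index = build_kmer_index("ACGTTGCA", k=4)
print(seed_hits("GCAACG", index, k=4))  # [(0, 5), (1, 6), (2, 7)]
```

All three seeds lie on one diagonal (genome position minus query position = 5), which is the signal an aligner would extend; with a linear index the junction-crossing k-mer "AACG" would be missed.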

Journal Article
31 Aug 2000-Nature
TL;DR: It is proposed that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.
Abstract: Pseudomonas aeruginosa is a ubiquitous environmental bacterium that is one of the top three causes of opportunistic human infections. A major factor in its prominence as a pathogen is its intrinsic resistance to antibiotics and disinfectants. Here we report the complete sequence of P. aeruginosa strain PAO1. At 6.3 million base pairs, this is the largest bacterial genome sequenced, and the sequence provides insights into the basis of the versatility and intrinsic drug resistance of P. aeruginosa. Consistent with its larger genome size and environmental adaptability, P. aeruginosa contains the highest proportion of regulatory genes observed for a bacterial genome and a large number of genes involved in the catabolism, transport and efflux of organic compounds as well as four potential chemotaxis systems. We propose that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.

4,220 citations

Journal Article
09 May 2002-Nature
TL;DR: The 8,667,507 base pair linear chromosome of Streptomyces coelicolor is reported, containing the largest number of genes so far discovered in a bacterium.
Abstract: Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

3,077 citations


Cites background from "Deciphering the biology of Mycobact..."

  • ...lipopeptide calcium-dependent antibiotic (CDA) of Streptomyces coelicolor A3(2)....

  • ...Spore colour in Streptomyces coelicolor A3(2) involves the developmentally...

  • ...Streptomyces coelicolor A3(2) chromosome....

  • ...The strain used, M145, is a prototrophic derivative of strain A3(2) lacking its two plasmids (SCP1, linear, 365 kb, AL590463, AL590464; and SCP2, circular, 31 kb, AL645771, which have been sequenced separately)....

  • ...synthase III (FabH) is essential for fatty acid biosynthesis in Streptomyces coelicolor A3(2)....


Journal Article
TL;DR: Transposon site hybridization (TraSH) is used to comprehensively identify the genes that Mycobacterium tuberculosis, the causative agent of tuberculosis, requires for optimal growth; a surprisingly high proportion of these genes lack identifiable orthologues in other bacteria, suggesting that the minimal gene set required for survival varies greatly between organisms with different evolutionary histories.
Abstract: Despite over a century of research, tuberculosis remains a leading cause of infectious death worldwide. Faced with increasing rates of drug resistance, the identification of genes that are required for the growth of this organism should provide new targets for the design of antimycobacterial agents. Here, we describe the use of transposon site hybridization (TraSH) to comprehensively identify the genes required by the causative agent, Mycobacterium tuberculosis, for optimal growth. These genes include those that can be assigned to essential pathways as well as many of unknown function. The genes important for the growth of M. tuberculosis are largely conserved in the degenerate genome of the leprosy bacillus, Mycobacterium leprae, indicating that non-essential functions have been selectively lost since this bacterium diverged from other mycobacteria. In contrast, a surprisingly high proportion of these genes lack identifiable orthologues in other bacteria, suggesting that the minimal gene set required for survival varies greatly between organisms with different evolutionary histories.

2,362 citations

Journal Article
TL;DR: An automatic method for recognizing natively disordered regions from amino acid sequence is described and benchmarked against the predictors assessed at the latest Critical Assessment of techniques for protein Structure Prediction (CASP) experiment; it represents a statistically significant improvement on the methods evaluated on the same targets at CASP.

1,946 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
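The BLAST summary centres on the maximal segment pair (MSP): the highest-scoring pair of equal-length segments from two sequences, scored without gaps. BLAST approximates alignments that optimize this measure heuristically; the Python sketch below instead computes the MSP score exactly by brute force, which can serve as a reference on small inputs. The +1/-1 match/mismatch scores are an assumption standing in for a real substitution matrix, and `msp_score` is a hypothetical name.

```python
def msp_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Maximal segment pair (MSP) score: the best score over all pairs of
    equal-length, ungapped segments of a and b. Computed exhaustively by a
    maximum-subarray (Kadane) scan along every diagonal, O(len(a) * len(b))."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):  # diagonal: j - i == offset
        i = max(0, -offset)
        j = i + offset
        running = 0
        while i < len(a) and j < len(b):
            step = match if a[i] == b[j] else mismatch
            running = max(0, running + step)  # restart when the segment turns negative
            best = max(best, running)
            i += 1
            j += 1
    return best


print(msp_score("TTACGTTT", "ACGT"))  # 4
```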

Journal Article
TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Abstract: We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.

12,432 citations
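RDF2 judges a similarity score by comparing it with scores obtained against shuffled sequences that keep local composition. The Python sketch below illustrates that idea under stated assumptions: shuffling within fixed, non-overlapping windows, a toy diagonal-identity score in place of a real FASTA score, and a z-score summary. The window size, trial count and all function names are hypothetical, and this is not RDF2's actual procedure.

```python
import random
import statistics


def local_shuffle(seq: str, window: int, rng: random.Random) -> str:
    """Shuffle residues within consecutive, non-overlapping windows so that
    local composition is preserved."""
    pieces = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        pieces.append("".join(chunk))
    return "".join(pieces)


def best_diagonal_identity(a: str, b: str) -> int:
    """Toy similarity score (hypothetical stand-in for a FASTA score):
    the largest number of identical residues on any single diagonal."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        i = max(0, -offset)
        matches = 0
        while i < len(a) and i + offset < len(b):
            matches += int(a[i] == b[i + offset])
            i += 1
        best = max(best, matches)
    return best


def shuffle_z_score(a, b, score=best_diagonal_identity,
                    window=10, trials=100, seed=0) -> float:
    """How many standard deviations the real score lies above the scores
    obtained against locally shuffled copies of b."""
    rng = random.Random(seed)
    observed = score(a, b)
    background = [score(a, local_shuffle(b, window, rng)) for _ in range(trials)]
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    return float("inf") if sd == 0 else (observed - mean) / sd
```

Shuffling only within windows keeps, for example, a GC-rich stretch GC-rich, so a high score driven purely by biased composition is not mistaken for evidence of relatedness.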

Journal Article
TL;DR: tRNAscan-SE is a program that identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases.
Abstract: We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.

9,629 citations
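The tRNAscan-SE abstract describes a cascade: fast prefilters propose candidates, and only those candidates reach the slow, highly selective covariance model. A minimal Python sketch of that control flow, assuming a window passes if any prefilter flags it (an assumption made for illustration); the prefilters and scoring function in the example are toy placeholders, not tRNA detectors, and `cascade_search` is a hypothetical name.

```python
from typing import Callable, Iterable


def cascade_search(
    windows: Iterable[str],
    prefilters: list[Callable[[str], bool]],
    final_score: Callable[[str], float],
    threshold: float,
) -> tuple[list[str], int]:
    """Two-stage search: a window is passed to the expensive final model only
    if at least one cheap prefilter flags it. Returns the accepted windows and
    the number of expensive model calls made."""
    hits = []
    expensive_calls = 0
    for window in windows:
        if not any(flag(window) for flag in prefilters):
            continue  # rejected cheaply; the slow model never sees it
        expensive_calls += 1
        if final_score(window) >= threshold:
            hits.append(window)
    return hits, expensive_calls


# Toy stand-ins only: real prefilters and the covariance model are far richer.
hits, calls = cascade_search(
    ["AAAA", "GGAA", "GGGG", "AGAG"],
    prefilters=[lambda w: "GG" in w, lambda w: w.startswith("A") and w.endswith("G")],
    final_score=lambda w: w.count("G"),
    threshold=3,
)
print(hits, calls)  # ['GGGG'] 3
```

For scale, if the quoted rate of about 30,000 bp/s applied uniformly, scanning the 4,411,529 bp H37Rv genome would take roughly 4,411,529 / 30,000 ≈ 147 s, about two and a half minutes.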

Journal Article
05 Sep 1997-Science
TL;DR: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented; comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families, and many families of similar genes within E. coli are also evident.
Abstract: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer.

7,723 citations
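The E. coli abstract notes that guanines are oriented with respect to the local direction of replication. A common way to visualise this kind of strand bias is GC skew, (G - C) / (G + C), computed in windows and summed along the chromosome; the minimum of the cumulative skew is often used as a rough pointer to the replication origin. A minimal Python sketch, assuming non-overlapping windows of a hypothetical size; it illustrates the general measure and is not the analysis reported in the paper.

```python
from itertools import accumulate


def gc_skew_windows(seq: str, window: int) -> list[float]:
    """(G - C) / (G + C) for consecutive non-overlapping windows.
    Windows with no G or C get 0.0; a trailing partial window is dropped."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def min_cumulative_skew_window(seq: str, window: int) -> int:
    """Index of the window where cumulative GC skew is lowest, a common rough
    indicator of the replication origin on a circular chromosome."""
    cumulative = list(accumulate(gc_skew_windows(seq, window)))
    return min(range(len(cumulative)), key=cumulative.__getitem__)


# Toy sequence: C-rich first half, G-rich second half.
toy = "CCCC" * 3 + "GGGG" * 3
print(gc_skew_windows(toy, 4))             # [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
print(min_cumulative_skew_window(toy, 4))  # 2
```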


"Deciphering the biology of Mycobact..." cites this paper for background

  • ...This represents the second-largest bacterial genome sequence currently available (after that of Escherichia coli...

Journal Article
F. Kunst, Naotake Ogasawara, Ivan Moszer, Alessandra M. Albertini + 151 more (30 institutions)
20 Nov 1997-Nature
TL;DR: Bacillus subtilis is the best-characterized member of the Gram-positive bacteria; its genome contains at least ten prophages or prophage remnants, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.
Abstract: Bacillus subtilis is the best-characterized member of the Gram-positive bacteria. Its genome of 4,214,810 base pairs comprises 4,100 protein-coding genes. Of these protein-coding genes, 53% are represented once, while a quarter of the genome corresponds to several gene families that have been greatly expanded by gene duplication, the largest family containing 77 putative ATP-binding transport proteins. In addition, a large proportion of the genetic capacity is devoted to the utilization of a variety of carbon sources, including many plant-derived molecules. The identification of five signal peptidase genes, as well as several genes for components of the secretion apparatus, is important given the capacity of Bacillus strains to secrete large amounts of industrially important enzymes. Many of the genes are involved in the synthesis of secondary metabolites, including antibiotics, that are more typically associated with Streptomyces species. The genome contains at least ten prophages or remnants of prophages, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.

3,753 citations
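Statements such as "53% are represented once" and "the largest family containing 77 putative ATP-binding transport proteins" depend on grouping genes into families. One simple way to do that, sketched here in Python as an assumption rather than the paper's method, is single-linkage clustering of pairwise similarity hits with a union-find structure. Where the hits come from (for example an all-against-all BLAST or FASTA search with some cutoff), the gene names and the function names are all hypothetical.

```python
def gene_families(genes: list[str], similar_pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Single-linkage grouping with union-find: genes connected by any chain
    of similar pairs end up in the same family. Largest families first."""
    parent = {gene: gene for gene in genes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families: dict[str, list[str]] = {}
    for gene in genes:
        families.setdefault(find(gene), []).append(gene)
    return sorted(families.values(), key=len, reverse=True)


def singleton_fraction(families: list[list[str]]) -> float:
    """Fraction of genes that belong to a one-member family."""
    total = sum(len(f) for f in families)
    return sum(1 for f in families if len(f) == 1) / total


# Hypothetical genes and similarity hits.
fams = gene_families(["g1", "g2", "g3", "g4", "g5"], [("g1", "g2"), ("g2", "g3")])
print(fams)                      # [['g1', 'g2', 'g3'], ['g4'], ['g5']]
print(singleton_fraction(fams))  # 0.4
```

Single linkage is the simplest choice but can chain unrelated genes together through shared domains; stricter criteria (for example requiring coverage over most of both proteins) are a common refinement.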
