Showing papers on "Genome" published in 2001


Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations


Journal Article
J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, and 269 more authors (12 institutions)
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
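
The coverage figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up here, and the "implied genome size" is an inference from the quoted numbers rather than a value stated in the abstract.

```python
# Figures quoted in the abstract.
total_bases = 14.8e9             # bp of sequence generated by Celera
num_reads = 27_271_853           # high-quality sequence reads
celera_coverage = 5.11           # fold coverage from Celera reads
shredded_public_coverage = 2.9   # fold coverage from shredded public data
consensus_length = 2.91e9        # bp in the consensus sequence

# Genome size implied by "14.8 billion bp = 5.11-fold coverage"
# (an inference for illustration, not a number from the abstract).
implied_genome_size = total_bases / celera_coverage

# Average read length implied by the read count.
mean_read_length = total_bases / num_reads

# Adding the two data sources gives the "eightfold" effective coverage.
combined_coverage = celera_coverage + shredded_public_coverage

# One difference per 1250 bp between two haploid genomes, scaled to the consensus.
expected_differences = consensus_length / 1250

print(f"implied genome size: {implied_genome_size / 1e9:.2f} Gb")
print(f"mean read length: {mean_read_length:.0f} bp")
print(f"combined coverage: {combined_coverage:.2f}-fold")
print(f"expected differences between two haploid genomes: {expected_differences:,.0f}")
```

The implied genome size comes out at about 2.9 Gb, close to the 2.91-Gb consensus length, and the mean read length is roughly 543 bp.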

12,098 citations


Journal Article
17 May 2001-Nature
TL;DR: This review summarizes the main DNA caretaking systems and their impact on genome stability and carcinogenesis.
Abstract: The early notion that cancer is caused by mutations in genes critical for the control of cell growth implied that genome stability is important for preventing oncogenesis. During the past decade, knowledge about the mechanisms by which genes erode and the molecular machinery designed to counteract this time-dependent genetic degeneration has increased markedly. At the same time, it has become apparent that inherited or acquired deficiencies in genome maintenance systems contribute significantly to the onset of cancer. This review summarizes the main DNA caretaking systems and their impact on genome stability and carcinogenesis.

3,898 citations


Journal Article
14 Dec 2001-Science
TL;DR: Synthetic genetic array (SGA) analysis, a method for systematic construction of double mutants in which a query mutation is crossed to an array of ∼4700 deletion mutants, is developed; its systematic application should produce a global map of gene function.
Abstract: In Saccharomyces cerevisiae, more than 80% of the ∼6200 predicted genes are nonessential, implying that the genome is buffered from the phenotypic consequences of genetic perturbation. To evaluate function, we developed a method for systematic construction of double mutants, termed synthetic genetic array (SGA) analysis, in which a query mutation is crossed to an array of ∼4700 deletion mutants. Inviable double-mutant meiotic progeny identify functional relationships between genes. SGA analysis of genes with roles in cytoskeletal organization (BNI1, ARP2, ARC40, BIM1), DNA synthesis and repair (SGS1, RAD27), or uncharacterized functions (BBC1, NBP2) generated a network of 291 interactions among 204 genes. Systematic application of this approach should produce a global map of gene function.

2,164 citations



Journal Article
25 Jan 2001-Nature
TL;DR: It is found that lateral gene transfer is far more extensive than previously anticipated and 1,387 new genes encoded in strain-specific clusters of diverse sizes were found in O157:H7, including candidate virulence factors, alternative metabolic capacities, several prophages and other new functions—all of which could be targets for surveillance.
Abstract: The bacterium Escherichia coli O157:H7 is a worldwide threat to public health and has been implicated in many outbreaks of haemorrhagic colitis, some of which included fatalities caused by haemolytic uraemic syndrome. Close to 75,000 cases of O157:H7 infection are now estimated to occur annually in the United States. The severity of disease, the lack of effective treatment and the potential for large-scale outbreaks from contaminated food supplies have propelled intensive research on the pathogenesis and detection of E. coli O157:H7 (ref. 4). Here we have sequenced the genome of E. coli O157:H7 to identify candidate genes responsible for pathogenesis, to develop better methods of strain detection and to advance our understanding of the evolution of E. coli, through comparison with the genome of the non-pathogenic laboratory strain E. coli K-12 (ref. 5). We find that lateral gene transfer is far more extensive than previously anticipated. In fact, 1,387 new genes encoded in strain-specific clusters of diverse sizes were found in O157:H7. These include candidate virulence factors, alternative metabolic capacities, several prophages and other new functions, all of which could be targets for surveillance.

2,011 citations


Journal Article
TL;DR: The resultant primer set is suitable for all influenza A viruses to generate full-length cDNAs, to subtype viruses, to sequence their DNA, and to construct expression plasmids for reverse genetics systems.
Abstract: To systematically identify and analyze the 15 HA and 9 NA subtypes of influenza A virus, we need reliable, simple methods that not only characterize partial sequences but analyze the entire influenza A genome. We designed primers based on the fact that the 15 and 21 terminal segment specific nucleotides of the genomic viral RNA are conserved between all influenza A viruses and unique for each segment. The primers designed for each segment contain influenza virus specific nucleotides at their 3'-end and non-influenza virus nucleotides at the 5'-end. With this set of primers, we were able to amplify all eight segments of N1, N2, N4, N5, and N8 subtypes. For N3, N6, N7, and N9 subtypes, the segment specific sequences of the neuraminidase genes are different. Therefore, we optimized the primer design to allow the amplification of those neuraminidase genes as well. The resultant primer set is suitable for all influenza A viruses to generate full-length cDNAs, to subtype viruses, to sequence their DNA, and to construct expression plasmids for reverse genetics systems.

1,924 citations


Journal Article
TL;DR: The current knowledge of the human ABC genes, their role in inherited disease, and understanding of the topology of these genes within the membrane are reviewed.
Abstract: The ATP-binding cassette (ABC) transporter superfamily contains membrane proteins that translocate a variety of substrates across extra- and intra-cellular membranes. Genetic variation in these genes is the cause of or contributor to a wide variety of human disorders with Mendelian and complex inheritance, including cystic fibrosis, neurological disease, retinal degeneration, cholesterol and bile transport defects, anemia, and drug response. Conservation of the ATP-binding domains of these genes has allowed the identification of new members of the superfamily based on nucleotide and protein sequence homology. Phylogenetic analysis is used to divide all 48 known ABC transporters into seven distinct subfamilies of proteins. For each gene, the precise map location on human chromosomes, expression data, and localization within the superfamily has been determined. These data allow predictions to be made as to potential functions or disease phenotypes associated with each protein. In this paper, we review the current state of knowledge on all human ABC genes in inherited disease and drug resistance. In addition, the availability of the complete Drosophila genome sequence allows the comparison of the known human ABC genes with those in the fly genome. The combined data enable an evolutionary analysis of the superfamily. Complete characterization of all ABC from the human genome and from model organisms will lead to important insights into the physiology and the molecular basis of many human disorders.

1,751 citations


Journal Article
22 Feb 2001-Nature
TL;DR: Comparing the 3.27-megabase genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis provides clear explanations for these properties and reveals an extreme case of reductive evolution.
Abstract: Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

1,620 citations


Journal Article
TL;DR: A set of 200 Class I SSR markers was developed and integrated into the existing microsatellite map of rice, providing immediate links between the genetic, physical, and sequence-based maps.
Abstract: A total of 57.8 Mb of publicly available rice (Oryza sativa L.) DNA sequence was searched to determine the frequency and distribution of different simple sequence repeats (SSRs) in the genome. SSR loci were categorized into two groups based on the length of the repeat motif. Class I, or hypervariable markers, consisted of SSRs ≥20 bp, and Class II, or potentially variable markers, consisted of SSRs ≥12 bp and <20 bp. The occurrence of Class I SSRs in end-sequences of EcoRI- and HindIII-digested BAC clones was one SSR per 40 kb, whereas in continuous genomic sequence (represented by 27 fully sequenced BAC and PAC clones), the frequency was one SSR every 16 kb. Class II SSRs were estimated to occur every 3.7 kb in BAC ends and every 1.9 kb in fully sequenced BAC and PAC clones. GC-rich trinucleotide repeats (TNRs) were most abundant in protein-coding portions of ESTs and in fully sequenced BACs and PACs, whereas AT-rich TNRs showed no such preference, and di- and tetranucleotide repeats were most frequently found in noncoding, intergenic regions of the rice genome. Microsatellites with poly(AT)n repeats represented the most abundant and polymorphic class of SSRs but were frequently associated with the Micropon family of miniature inverted-repeat transposable elements (MITEs) and were difficult to amplify. A set of 200 Class I SSR markers was developed and integrated into the existing microsatellite map of rice, providing immediate links between the genetic, physical, and sequence-based maps. This contribution brings the number of microsatellite markers that have been rigorously evaluated for amplification, map position, and allelic diversity in Oryza spp. to a total of 500.
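
The Class I / Class II length thresholds can be illustrated with a small sketch. The abstract does not describe the search procedure the authors used, so the regular expression below (perfect repeats of 2 to 4 bp motifs) and the function names `classify_ssr` and `find_ssrs` are assumptions made purely for illustration.

```python
import re

def classify_ssr(length_bp):
    """Apply the length thresholds from the abstract to a repeat tract."""
    if length_bp >= 20:
        return "Class I"    # hypervariable markers
    if length_bp >= 12:
        return "Class II"   # potentially variable markers
    return None             # too short to count as either class

# Hypothetical detector: a 2-4 bp motif followed by one or more exact copies.
# Imperfect repeats are not handled, and a mononucleotide run such as "AAAA"
# would be reported as a dinucleotide repeat.
SSR_PATTERN = re.compile(r"([ACGT]{2,4}?)\1+")

def find_ssrs(sequence):
    """Return (motif, start, length_bp, class) for each classified repeat tract."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        length_bp = match.end() - match.start()
        ssr_class = classify_ssr(length_bp)
        if ssr_class is not None:
            hits.append((match.group(1), match.start(), length_bp, ssr_class))
    return hits

# Toy sequence: (AT)11 = 22 bp, (CAG)5 = 15 bp, (GA)4 = 8 bp.
example = "C" + "AT" * 11 + "T" + "CAG" * 5 + "T" + "GA" * 4 + "C"
ssr_hits = find_ssrs(example)
for hit in ssr_hits:
    print(hit)
```

In the toy sequence the 22-bp (AT) tract falls in Class I, the 15-bp (CAG) tract in Class II, and the 8-bp (GA) tract is below both thresholds and is dropped.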

1,495 citations


Journal Article
26 Oct 2001-Science
TL;DR: A large number of predicted genes encoding surface and secreted proteins, transporters, and transcriptional regulators are found, consistent with the ability of both species to adapt to diverse environments.
Abstract: Listeria monocytogenes is a food-borne pathogen with a high mortality rate that has also emerged as a paradigm for intracellular parasitism. We present and compare the genome sequences of L. monocytogenes (2,944,528 base pairs) and a nonpathogenic species, L. innocua (3,011,209 base pairs). We found a large number of predicted genes encoding surface and secreted proteins, transporters, and transcriptional regulators, consistent with the ability of both species to adapt to diverse environments. The presence of 270 L. monocytogenes and 149 L. innocua strain-specific genes (clustered in 100 and 63 islets, respectively) suggests that virulence in Listeria results from multiple gene acquisition and deletion events.

Journal Article
20 Jul 2001-Science
TL;DR: A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low–guanine/cytosine Gram-positive species.
Abstract: The 2,160,837-base pair genome sequence of an isolate of Streptococcus pneumoniae, a Gram-positive pathogen that causes pneumonia, bacteremia, meningitis, and otitis media, contains 2236 predicted coding regions; of these, 1440 (64%) were assigned a biological role. Approximately 5% of the genome is composed of insertion sequences that may contribute to genome rearrangements through uptake of foreign DNA. Extracellular enzyme systems for the metabolism of polysaccharides and hexosamines provide a substantial source of carbon and nitrogen for S. pneumoniae and also damage host tissues and facilitate colonization. A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low-guanine/cytosine (GC) Gram-positive species. Several surface-exposed proteins that may serve as potential vaccine candidates were identified. Comparative genome hybridization with DNA arrays revealed strain differences in S. pneumoniae that could contribute to differences in virulence and antigenicity.

Journal Article
TL;DR: The complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak is reported and compared with that of a benign laboratory strain, K-12 MG1655; a 4.1-Mb sequence highly conserved between the two strains is identified, which may represent the fundamental backbone of the E. coli chromosome.
Abstract: Escherichia coli O157:H7 is a major food-borne infectious pathogen that causes diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome. Here we report the complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak, and the results of genomic comparison with a benign laboratory strain, K-12 MG1655. The chromosome is 5.5 Mb in size, 859 kb larger than that of K-12. We identified a 4.1-Mb sequence highly conserved between the two strains, which may represent the fundamental backbone of the E. coli chromosome. The remaining 1.4-Mb sequence comprises O157:H7-specific sequences, most of which are horizontally transferred foreign DNAs. The predominant role of bacteriophages in the emergence of O157:H7 is evident from the presence of 24 prophages and prophage-like elements that occupy more than half of the O157:H7-specific sequences. The O157:H7 chromosome encodes 1632 proteins and 20 tRNAs that are not present in K-12. Among these, at least 131 proteins are assumed to have virulence-related functions. Genome-wide codon usage analysis suggested that the O157:H7-specific tRNAs are involved in the efficient expression of the strain-specific genes. The complete set of genes specific to O157:H7 presented here sheds new light on the pathogenicity and the physiology of O157:H7, and will open a way to fully understand the molecular mechanisms underlying O157:H7 infection.

Journal Article
25 Oct 2001-Nature
TL;DR: The 4,809,037-bp genome of a multidrug-resistant S. typhi strain (CT18) is sequenced, revealing the presence of hundreds of insertions and deletions compared with the Escherichia coli genome, ranging in size from single genes to large islands.
Abstract: Salmonella enterica serovar Typhi (S. typhi) is the aetiological agent of typhoid fever, a serious invasive bacterial disease of humans with an annual global burden of approximately 16 million cases, leading to 600,000 fatalities. Many S. enterica serovars actively invade the mucosal surface of the intestine but are normally contained in healthy individuals by the local immune defence mechanisms. However, S. typhi has evolved the ability to spread to the deeper tissues of humans, including liver, spleen and bone marrow. Here we have sequenced the 4,809,037-base pair (bp) genome of a S. typhi (CT18) that is resistant to multiple drugs, revealing the presence of hundreds of insertions and deletions compared with the Escherichia coli genome, ranging in size from single genes to large islands. Notably, the genome sequence identifies over two hundred pseudogenes, several corresponding to genes that are known to contribute to virulence in Salmonella typhimurium. This genetic degradation may contribute to the human-restricted host range for S. typhi. CT18 harbours a 218,150-bp multiple-drug-resistance incH1 plasmid (pHCM1), and a 106,516-bp cryptic plasmid (pHCM2), which shows recent common ancestry with a virulence plasmid of Yersinia pestis.

Journal Article
Chung-I Wu
TL;DR: Significantly, the genetic architecture underlying RI, the patterns of species hybridization and the molecular signature of speciation genes all appear to support the view that RI is one of the manifestations of differential adaptation, as Darwin (1859) suggested.
Abstract: The unit of adaptation is usually thought to be a gene or set of interacting genes, rather than the whole genome, and this may be true of species differentiation. Defining species on the basis of reproductive isolation (RI), on the other hand, is a concept best applied to the entire genome. The biological species concept (BSC; Mayr, 1963) stresses the isolation aspect of speciation on the basis of two fundamental genetic assumptions: the number of loci underlying species differentiation is large and the whole genome behaves as a cohesive, or coadapted genetic unit. Under these tenets, the exchange of any part of the genomes between diverging groups is thought to destroy their integrity. Hence, the maintenance of each species’ genome cohesiveness by isolating mechanisms has become the central concept of species. In contrast, the Darwinian view of speciation is about differential adaptation to different natural or sexual environments. RI is viewed as an important by-product of differential adaptation and complete RI across the whole genome need not be considered as the most central criterion of speciation. The emphasis on natural and sexual selection thus makes the Darwinian view compatible with the modern genic concept of evolution. Genetic and molecular analyses of speciation in the last decade have yielded surprisingly strong support for the neo-Darwinian view of extensive genetic differentiation and epistasis during speciation. However, the extent falls short of what BSC requires in order to achieve whole-genome ‘cohesiveness’. Empirical observations suggest that the gene is the unit of species differentiation. Significantly, the genetic architecture underlying RI, the patterns of species hybridization and the molecular signature of speciation genes all appear to support the view that RI is one of the manifestations of differential adaptation, as Darwin (1859, Chap. 8) suggested.
The nature of this adaptation may be as much the result of sexual selection as natural selection. In the light of studies since its early days, BSC may now need a major revision by shifting the emphasis from isolation at the level of whole genome to differential adaptation at the genic level. With this revision, BSC would in fact be close to Darwin’s original concept of speciation.

Journal Article
04 Oct 2001-Nature
TL;DR: The evidence of ongoing genome fluidity, expansion and decay suggests Y. pestis is a pathogen that has undergone large-scale genetic flux and provides a unique insight into the ways in which new and highly virulent pathogens evolve.
Abstract: The Gram-negative bacterium Yersinia pestis is the causative agent of the systemic invasive infectious disease classically referred to as plague, and has been responsible for three human pandemics: the Justinian plague (sixth to eighth centuries), the Black Death (fourteenth to nineteenth centuries) and modern plague (nineteenth century to the present day). The recent identification of strains resistant to multiple drugs and the potential use of Y. pestis as an agent of biological warfare mean that plague still poses a threat to human health. Here we report the complete genome sequence of Y. pestis strain CO92, consisting of a 4.65-megabase (Mb) chromosome and three plasmids of 96.2 kilobases (kb), 70.3 kb and 9.6 kb. The genome is unusually rich in insertion sequences and displays anomalies in GC base-composition bias, indicating frequent intragenomic recombination. Many genes seem to have been acquired from other bacteria and viruses (including adhesins, secretion systems and insecticidal toxins). The genome contains around 150 pseudogenes, many of which are remnants of a redundant enteropathogenic lifestyle. The evidence of ongoing genome fluidity, expansion and decay suggests Y. pestis is a pathogen that has undergone large-scale genetic flux and provides a unique insight into the ways in which new and highly virulent pathogens evolve.

Journal Article
TL;DR: It is becoming clear that alternative splicing has an extremely important role in expanding protein diversity and might therefore partially underlie the apparent discrepancy between gene number and organismal complexity.

Journal Article
27 Jul 2001-Science
TL;DR: The annotated DNA sequence of the α-proteobacterium Sinorhizobium meliloti, the symbiont of alfalfa, is presented; analysis indicates that all three elements of its tripartite genome (a chromosome and two megaplasmids) contribute, in varying degrees, to symbiosis and reveals how this genome may have emerged during evolution.
Abstract: The scarcity of usable nitrogen frequently limits plant growth. A tight metabolic association with rhizobial bacteria allows legumes to obtain nitrogen compounds by bacterial reduction of dinitrogen (N2) to ammonium (NH4+). We present here the annotated DNA sequence of the alpha-proteobacterium Sinorhizobium meliloti, the symbiont of alfalfa. The tripartite 6.7-megabase (Mb) genome comprises a 3.65-Mb chromosome, and 1.35-Mb pSymA and 1.68-Mb pSymB megaplasmids. Genome sequence analysis indicates that all three elements contribute, in varying degrees, to symbiosis and reveals how this genome may have emerged during evolution. The genome sequence will be useful in understanding the dynamics of interkingdom associations and of life in soil environments.
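
As a quick consistency check on the replicon sizes quoted in this abstract, the three elements can be summed and expressed as shares of the whole genome. The dictionary and variable names below are illustrative only.

```python
# Replicon sizes in megabases, as quoted in the abstract.
replicons_mb = {"chromosome": 3.65, "pSymA": 1.35, "pSymB": 1.68}

# Total genome size; the abstract rounds this to 6.7 Mb.
total_mb = sum(replicons_mb.values())

# Fraction of the genome carried by each replicon.
shares = {name: size / total_mb for name, size in replicons_mb.items()}

print(f"total: {total_mb:.2f} Mb")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The sum is 6.68 Mb, which rounds to the 6.7 Mb stated in the abstract; the two megaplasmids together account for a little under half of the genome.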

Journal Article
TL;DR: The fixation and long-term persistence of horizontally transferred genes suggests that they confer a selective advantage on the recipient organism; in most cases the nature of this advantage remains unclear, but detailed examination of several cases of acquisition of eukaryotic genes by bacteria seems to reveal the evolutionary forces involved.
Abstract: Comparative analysis of bacterial, archaeal, and eukaryotic genomes indicates that a significant fraction of the genes in the prokaryotic genomes have been subject to horizontal transfer. In some cases, the amount and source of horizontal gene transfer can be linked to an organism's lifestyle. For example, bacterial hyperthermophiles seem to have exchanged genes with archaea to a greater extent than other bacteria, whereas transfer of certain classes of eukaryotic genes is most common in parasitic and symbiotic bacteria. Horizontal transfer events can be classified into distinct categories of acquisition of new genes, acquisition of paralogs of existing genes, and xenologous gene displacement whereby a gene is displaced by a horizontally transferred ortholog from another lineage (xenolog). Each of these types of horizontal gene transfer is common among prokaryotes, but their relative contributions differ in different lineages. The fixation and long-term persistence of horizontally transferred genes suggests that they confer a selective advantage on the recipient organism. In most cases, the nature of this advantage remains unclear, but detailed examination of several cases of acquisition of eukaryotic genes by bacteria seems to reveal the evolutionary forces involved. Examples include isoleucyl-tRNA synthetases whose acquisition from eukaryotes by several bacteria is linked to antibiotic resistance, ATP/ADP translocases acquired by intracellular parasitic bacteria, Chlamydia and Rickettsia, apparently from plants, and proteases that may be implicated in chlamydial pathogenesis.

Journal Article
TL;DR: The genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration and indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of the Salmonella-Escherichia group.
Abstract: Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.]

Journal Article
TL;DR: This work has created a phylogenetically arranged report on rRNA gene copy number for a diverse collection of prokaryotic microorganisms in an attempt to understand the evolutionary implications of rRNA operon redundancy.
Abstract: The Ribosomal RNA Operon Copy Number Database (rrndb) is an Internet-accessible database containing annotated information on rRNA operon copy number among prokaryotes. Gene redundancy is uncommon in prokaryotic genomes, yet the rRNA genes can vary from one to as many as 15 copies. Despite the widespread use of 16S rRNA gene sequences for identification of prokaryotes, information on the number and sequence of individual rRNA genes in a genome is not readily accessible. In an attempt to understand the evolutionary implications of rRNA operon redundancy, we have created a phylogenetically arranged report on rRNA gene copy number for a diverse collection of prokaryotic microorganisms. Each entry (organism) in the rrndb contains detailed information linked directly to external websites including the Ribosomal Database Project, GenBank, PubMed and several culture collections. Data contained in the rrndb will be valuable to researchers investigating microbial ecology and evolution using 16S rRNA gene sequences. The rrndb web site is directly accessible on the WWW at http://rrndb.cme.msu.edu.

Journal Article
TL;DR: The 1,852,442-bp sequence of an M1 strain of Streptococcus pyogenes, a Gram-positive pathogen, has been determined and contains 1,752 predicted protein-encoding genes; more than 40 putative virulence-associated genes were identified, consistent with the observation that S. pyogenes is responsible for a wider variety of human disease than any other bacterial species.
Abstract: The 1,852,442-bp sequence of an M1 strain of Streptococcus pyogenes, a Gram-positive pathogen, has been determined and contains 1,752 predicted protein-encoding genes. Approximately one-third of these genes have no identifiable function, with the remainder falling into previously characterized categories of known microbial function. Consistent with the observation that S. pyogenes is responsible for a wider variety of human disease than any other bacterial species, more than 40 putative virulence-associated genes have been identified. Additional genes have been identified that encode proteins likely associated with microbial "molecular mimicry" of host characteristics and involved in rheumatic fever or acute glomerulonephritis. The complete or partial sequence of four different bacteriophage genomes is also present, with each containing genes for one or more previously undiscovered superantigen-like proteins. These prophage-associated genes encode at least six potential virulence factors, emphasizing the importance of bacteriophages in horizontal gene transfer and a possible mechanism for generating new strains with increased pathogenic potential.

Journal Article
22 Nov 2001-Nature
TL;DR: The DNA sequences of the 11 chromosomes of the ∼2.9-megabase (Mb) genome of Encephalitozoon cuniculi are reported, and it is hypothesized that microsporidia have retained a mitochondrion-derived organelle.
Abstract: Microsporidia are obligate intracellular parasites infesting many animal groups. Lacking mitochondria and peroxisomes, these unicellular eukaryotes were first considered a deeply branching protist lineage that diverged before the endosymbiotic event that led to mitochondria. The discovery of a gene for a mitochondrial-type chaperone combined with molecular phylogenetic data later implied that microsporidia are atypical fungi that lost mitochondria during evolution. Here we report the DNA sequences of the 11 chromosomes of the approximately 2.9-megabase (Mb) genome of Encephalitozoon cuniculi (1,997 potential protein-coding genes). Genome compaction is reflected by reduced intergenic spacers and by the shortness of most putative proteins relative to their eukaryote orthologues. The strong host dependence is illustrated by the lack of genes for some biosynthetic pathways and for the tricarboxylic acid cycle. Phylogenetic analysis lends substantial credit to the fungal affiliation of microsporidia. Because the E. cuniculi genome contains genes related to some mitochondrial functions (for example, Fe-S cluster assembly), we hypothesize that microsporidia have retained a mitochondrion-derived organelle.

Journal ArticleDOI
TL;DR: The arrays provide precise measurement in cell lines and clinical material, so that they can reliably detect and quantify high-level amplifications and single-copy alterations in diploid, polyploid and heterogeneous backgrounds.
Abstract: We have assembled arrays of approximately 2,400 BAC clones for measurement of DNA copy number across the human genome. The arrays provide precise measurement (s.d. of log2 ratios=0.05-0.10) in cell lines and clinical material, so that we can reliably detect and quantify high-level amplifications and single-copy alterations in diploid, polyploid and heterogeneous backgrounds.
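As a rough illustration of why the noise level quoted in the abstract above is enough to resolve single-copy changes, the sketch below computes the ideal log2 ratio for a given copy number and expresses it in units of the reported standard deviation. It assumes a pure sample measured against a diploid reference with no normalization offset; the function names are hypothetical and do not come from the paper.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Ideal log2(test/reference) ratio for a pure sample (illustrative assumption)."""
    return math.log2(test_copies / reference_copies)


def shift_in_sd(test_copies: int, sd: float, reference_copies: int = 2) -> float:
    """Size of the expected shift, measured in standard deviations of the log2 ratio."""
    return abs(expected_log2_ratio(test_copies, reference_copies)) / sd


# Single-copy gain and loss against a diploid reference.
gain = expected_log2_ratio(3)
loss = expected_log2_ratio(1)

# Worst case of the reported noise range (s.d. = 0.10).
gain_sd = shift_in_sd(3, sd=0.10)
loss_sd = shift_in_sd(1, sd=0.10)

print(f"gain: {gain:.3f} ({gain_sd:.1f} s.d.)")
print(f"loss: {loss:.3f} ({loss_sd:.1f} s.d.)")
```

Under these assumptions a single-copy gain shifts the ratio by several standard deviations even at the upper end of the reported noise range. Polyploid or heterogeneous samples would compress these shifts, which this sketch does not model.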

Journal ArticleDOI
TL;DR: It is demonstrated how the intracellular concentrations of metabolites can reveal phenotypes for proteins active in metabolic regulation, and this approach to functional analysis, using comparative metabolomics, is called FANCY—an abbreviation for functional analysis by co-responses in yeast.
Abstract: A large proportion of the 6,000 genes present in the genome of Saccharomyces cerevisiae, and of those sequenced in other organisms, encode proteins of unknown function. Many of these genes are "silent," that is, they show no overt phenotype, in terms of growth rate or other fluxes, when they are deleted from the genome. We demonstrate how the intracellular concentrations of metabolites can reveal phenotypes for proteins active in metabolic regulation. Quantification of the change of several metabolite concentrations relative to the concentration change of one selected metabolite can reveal the site of action, in the metabolic network, of a silent gene. In the same way, comprehensive analyses of metabolite concentrations in mutants, providing "metabolic snapshots," can reveal functions when snapshots from strains deleted for unstudied genes are compared to those deleted for known genes. This approach to functional analysis, using comparative metabolomics, we call FANCY—an abbreviation for functional analysis by co-responses in yeast.

Journal ArticleDOI
TL;DR: This work proposes a merger of genomics and genetics into 'genetical genomics', which involves expression profiling and marker-based fingerprinting of each individual of a segregating population, and exploits all the statistical tools used in the analysis of quantitative trait loci.

Journal ArticleDOI
John Douglas McPherson1, Marco A. Marra2, Marco A. Marra1, LaDeana W. Hillier1, Robert H. Waterston1, Asif T. Chinwalla1, John W. Wallis1, Mandeep Sekhon1, Kristine M. Wylie1, Elaine R. Mardis1, Richard K. Wilson1, Robert S. Fulton1, Tamara A. Kucaba1, Caryn Wagner-McPherson1, William B. Barbazuk1, Simon G. Gregory3, Sean Humphray3, Lisa French3, R Evans3, Graeme Bethel3, Adam Whittaker3, Jane L. Holden3, Owen T. McCann3, Andrew Dunham3, Carol Soderlund4, Carol Scott3, David R. Bentley3, Gregory D. Schuler5, Hsiu Chuan Chen5, Wonhee Jang5, Eric D. Green5, Jacquelyn R. Idol5, Valerie Maduro5, Kate Montgomery6, Eunice Lee6, Ashley Miller6, Suzanne Emerling6, Raju Kucherlapati6, Richard A. Gibbs7, Steve Scherer7, J. Harley Gorrell7, Erica Sodergren7, Kerstin P. Clerc-Blankenburg7, Paul E. Tabor7, S. Naylor8, Dawn Garcia8, J. de Jong9, J. de Jong10, J. de Jong11, Joseph J. Catanese10, Joseph J. Catanese9, Joseph J. Catanese11, Norma J. Nowak10, Kazutoyo Osoegawa10, Kazutoyo Osoegawa11, Kazutoyo Osoegawa9, Shizhen Qin12, Lee Rowen12, Anuradha Madan12, Monica Dors12, Leroy Hood12, Barbara J. Trask13, Cynthia Friedman13, Hillary Massa13, Vivian G. Cheung14, Ilan R. Kirsch5, Thomas Reid5, Raluca Yonescu5, Jean Weissenbach, Thomas Brüls, Roland Heilig, Elbert Branscomb15, Anne S. Olsen15, Norman A. Doggett15, Jan Fang Cheng15, Trevor Hawkins15, Richard M. Myers16, Jin Shang16, Lucía Ramírez16, Jeremy Schmutz16, Olivia Velasquez16, Kami Dixon16, Nancy E. Stone16, David R. Cox16, David Haussler17, W. James Kent17, Terrence S. Furey17, Sanja Rogic17, Scot Kennedy17, Steven J.M. Jones2, André Rosenthal5, Gaiping Wen5, Markus Schilhabel5, Gernot Gloeckner5, Gerald Nyakatura5, Reiner Siebert18, Brigitte Schlegelberger18, Julie R. Korenberg19, Xiao Ning Chen19, Asao Fujiyama, Masahira Hattori, Atsushi Toyoda, Tetsushi Yada, Hong Seok Park, Yoshiyuki Sakaki, Nobuyoshi Shimizu20, Shuichi Asakawa20, Kazuhiko Kawasaki20, Takashi Sasaki20, Ai Shintani20, Atsushi Shimizu20, Kazunori Shibuya20, Jun Kudoh20, Shinsei Minoshima20, Juliane Ramser21, Peter Seranski21, Céline Hoff21, Annemarie Poustka21, Richard Reinhardt21, Hans Lehrach21
15 Feb 2001-Nature
TL;DR: The construction of the whole-genome bacterial artificial chromosome (BAC) map and its integration with previous landmark maps and information from mapping efforts focused on specific chromosomal regions are reported.
Abstract: The human genome is by far the largest genome to be sequenced, and its size and complexity present many challenges for sequence assembly. The International Human Genome Sequencing Consortium constructed a map of the whole genome to enable the selection of clones for sequencing and for the accurate assembly of the genome sequence. Here we report the construction of the whole-genome bacterial artificial chromosome (BAC) map and its integration with previous landmark maps and information from mapping efforts focused on specific chromosomal regions. We also describe the integration of sequence data with the map.

Journal ArticleDOI
TL;DR: The successful application of the microarray technology platform to the analysis of DNA polymorphisms is presented and the potential of a high-throughput genome analysis method called Diversity Array Technology (DArT) is demonstrated.
Abstract: Here we present the successful application of the microarray technology platform to the analysis of DNA polymorphisms. Using the rice genome as a model, we demonstrate the potential of a high-throughput genome analysis method called Diversity Array Technology, DArT. In the format presented here, the technology assays for the presence (or amount) of a specific DNA fragment in a representation derived from the total genomic DNA of an organism or a population of organisms. Two different approaches are presented: the first involves contrasting two representations on a single array, while the second involves contrasting a representation with a reference DNA fragment common to all elements of the array. The Diversity Panels created using this method allow genetic fingerprinting of any organism or group of organisms belonging to the gene pool from which the panel was developed. Diversity Arrays enable rapid and economical application of a highly parallel, solid-state genotyping technology to any genome or complex genomic mixtures.

Journal ArticleDOI
TL;DR: Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity.
Abstract: The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC 824 has been determined by the shotgun approach. The genome consists of a 3.94-Mb chromosome and a 192-kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria. However, the C. acetobutylicum genome also contains a significant number of predicted operons that are shared with distantly related bacteria and archaea but not with B. subtilis. Phylogenetic analysis is compatible with the dissemination of such operons by horizontal transfer. The enzymes of the solventogenesis pathway and of the cellulosome of C. acetobutylicum comprise a new set of metabolic capacities not previously represented in the collection of complete genomes. These enzymes show a complex pattern of evolutionary affinities, emphasizing the role of lateral gene exchange in the evolution of the unique metabolic profile of the bacterium. Many of the sporulation genes identified in B. subtilis are missing in C. acetobutylicum, which suggests major differences in the sporulation process. Thus, comparative analysis reveals both significant conservation of the genome organization and pronounced differences in many systems that reflect unique adaptive strategies of the two gram-positive bacteria.

Journal ArticleDOI
05 Oct 2001-Science
TL;DR: Oligonucleotide microarrays were used to map the detailed topography of chromosome replication in the budding yeast Saccharomyces cerevisiae, finding that the two ends of each of the 16 chromosomes are highly correlated in their times of replication.
Abstract: Oligonucleotide microarrays were used to map the detailed topography of chromosome replication in the budding yeast Saccharomyces cerevisiae. The times of replication of thousands of sites across the genome were determined by hybridizing replicated and unreplicated DNAs, isolated at different times in S phase, to the microarrays. Origin activations take place continuously throughout S phase but with most firings near mid-S phase. Rates of replication fork movement vary greatly from region to region in the genome. The two ends of each of the 16 chromosomes are highly correlated in their times of replication. This microarray approach is readily applicable to other organisms, including humans.