Author

William S. Hayes

Bio: William S. Hayes is an academic researcher from Georgia Institute of Technology. The author has contributed to research on the topics Gene and Genome. The author has an h-index of 8 and has co-authored 9 publications receiving 4,107 citations.

Papers
Journal Article
07 Aug 1997-Nature
TL;DR: Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, and for DNA restriction and modification, and consistent with its restricted niche, it has a few regulatory networks, and a limited metabolic repertoire and biosynthetic capacity.
Abstract: Helicobacter pylori, strain 26695, has a circular genome of 1,667,867 base pairs and 1,590 predicted coding sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins were identified, underscoring the potential complexity of host-pathogen interaction. Based on the large number of sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive evolution. Consistent with its restricted niche, H. pylori has a few regulatory networks, and a limited metabolic repertoire and biosynthetic capacity. Its survival in acid conditions depends, in part, on its ability to establish a positive inside-membrane potential in low pH.

3,577 citations

Journal Article
TL;DR: By comparing proteins encoded by the two bacterial genomes, it is shown that extensive gene shuffling and variation in the extent of gene paralogy are major trends in bacterial evolution; this comparison has also allowed us to deduce crucial aspects of the largely uncharacterized metabolism of H. influenzae.

302 citations

Journal Article
TL;DR: The method, GeneMark-Genesis, which learns the parameters of Markov models of protein-coding and noncoding regions from anonymous bacterial genomic sequence is presented and it is shown that the atypical model allows one to predict genes that escape identification by the typical model.
Abstract: In this report we address the problem of accurate statistical modeling of DNA sequences, either coding or noncoding, for a bacterial species whose genome (or a large portion) was sequenced but not yet characterized experimentally. Availability of these models is critical for successful solution of the genome annotation task by statistical methods of gene finding. We present the method, GeneMark-Genesis, which learns the parameters of Markov models of protein-coding and noncoding regions from anonymous bacterial genomic sequence. These models are subsequently used in the GeneMark and GeneMark.hmm gene-finding programs. Although there is basically one model of a noncoding region for a given genome, several models of protein-coding region are automatically obtained by GeneMark-Genesis. The diversity of protein-coding models reflects the diversity of oligonucleotide compositions, particularly the diversity of codon usage strategies observed in genes from one and the same genome. In the simplest and the most important case, there are just two gene models, typical and atypical. We show that the atypical model allows one to predict genes that escape identification by the typical model. Many genes predicted by the atypical model appear to be horizontally transferred genes. The early versions of GeneMark-Genesis were used for annotating the genomes of Methanococcus jannaschii and Helicobacter pylori. We report the results of accuracy testing of the full-scale version of GeneMark-Genesis on 10 completely sequenced bacterial genomes. Interestingly, the GeneMark.hmm program that employed the typical and atypical models defined by GeneMark-Genesis was able to predict 683 new atypical genes with 176 of them confirmed by similarity search.
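The idea of scoring a sequence against several Markov models (typical coding, atypical coding, noncoding) can be illustrated with a minimal sketch. This is not the GeneMark-Genesis implementation: it assumes simple homogeneous second-order Markov chains with add-one smoothing, whereas the published programs use more elaborate models, and the function names (`train_markov`, `log_likelihood`, `classify`) and the toy training sequences are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from training sequences.

    Returns a function prob(context, base). Smoothing with a pseudocount
    keeps unseen transitions from getting zero probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})  # .get avoids creating new entries
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob

def log_likelihood(seq, prob, order=2):
    """Log-probability of seq under one model (first `order` bases skipped)."""
    return sum(math.log(prob(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))

def classify(seq, models, order=2):
    """Return the name of the best-scoring model and all model scores."""
    scores = {name: log_likelihood(seq, p, order) for name, p in models.items()}
    return max(scores, key=scores.get), scores

# Toy training data (hypothetical): GC-rich "typical" genes,
# AT-rich "atypical" genes, and a mixed "noncoding" background.
models = {
    "typical": train_markov(["GCGGCGGCGGCGGCGGCG"]),
    "atypical": train_markov(["ATATTATAATATTATAAT"]),
    "noncoding": train_markov(["ACGTTGCAACGTTGCAAC"]),
}
label, scores = classify("GGCGGCGGC", models)
```

In this sketch a region is assigned to whichever model gives it the highest log-likelihood, which mirrors how a second, atypical coding model can pick up genes whose composition the typical model scores poorly.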

125 citations

Journal Article
19 Jan 1952-Nature
TL;DR: The development of nutritionally independent prototroph colonies from mixed cultures of doubly dependent mutant strains of Bact. coli K 12 was first demonstrated in 1946; the evidence supports the theory that the genetic transfer is mediated by sexual conjugation.
Abstract: The development of nutritionally independent prototroph colonies from mixed cultures of doubly dependent mutant strains of Bact. coli K 12 was first demonstrated in 1946 (ref. 1). Back mutation to prototrophism did not occur when the mutants were cultured separately. Since the pattern of unselected marker characters in prototrophs was usually different from that in either mutant, the phenomenon was clearly due to genetic recombination. The incompetence in recombination of culture filtrates (unlike type transformation in Pneumococcus), and recent evidence for the occasional occurrence of diploid heterozygous prototrophs (ref. 2), strongly support the current theory that the genetic transfer is mediated by sexual conjugation. Attempts to reproduce the phenomenon in other strains and species have failed, though successful out-crossing of K 12 mutants with a strain of Bact. acidi lactici has been reported (ref. 3).

88 citations

Journal Article
TL;DR: GeneLynx is a meta-database providing an extensive collection of hyperlinks to human gene-specific information in diverse databases available on the Internet, and a communal curation system for user-aided annotation.
Abstract: GeneLynx is a meta-database providing an extensive collection of hyperlinks to human gene-specific information in diverse databases available on the Internet. The GeneLynx project is based on the simple notion that given any gene-specific identifier (accession number, gene name, text, or sequence), scientists should be able to access a single location that provides a set of links to all the publicly available information pertinent to the specified human gene. GeneLynx was implemented as an extensible relational database with an intuitive and user-friendly Web interface. The data are automatically extracted from more than 40 external resources, using appropriate approaches to maximize coverage of the available data. Construction and curation of the system is mediated by a custom set of software tools. An indexing utility is provided to facilitate the establishment of hyperlinks in external databases. A unique feature of the GeneLynx system is a communal curation system for user-aided annotation. GeneLynx can be accessed freely at http://www.genelynx.org.

56 citations


Cited by
Journal Article
J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, +269 more (12 institutions)
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal Article
05 Sep 1997-Science
TL;DR: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented and reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident.
Abstract: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer.

7,723 citations

Journal Article
TL;DR: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.
Abstract: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.

4,886 citations

Journal Article
F. Kunst, Naotake Ogasawara, Ivan Moszer, Alessandra M. Albertini, +151 more (30 institutions)
20 Nov 1997-Nature
TL;DR: The genome of Bacillus subtilis, the best-characterized member of the Gram-positive bacteria, contains at least ten prophages or remnants of prophages, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.
Abstract: Bacillus subtilis is the best-characterized member of the Gram-positive bacteria. Its genome of 4,214,810 base pairs comprises 4,100 protein-coding genes. Of these protein-coding genes, 53% are represented once, while a quarter of the genome corresponds to several gene families that have been greatly expanded by gene duplication, the largest family containing 77 putative ATP-binding transport proteins. In addition, a large proportion of the genetic capacity is devoted to the utilization of a variety of carbon sources, including many plant-derived molecules. The identification of five signal peptidase genes, as well as several genes for components of the secretion apparatus, is important given the capacity of Bacillus strains to secrete large amounts of industrially important enzymes. Many of the genes are involved in the synthesis of secondary metabolites, including antibiotics, that are more typically associated with Streptomyces species. The genome contains at least ten prophages or remnants of prophages, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.

3,753 citations

Journal Article
TL;DR: This review summarizes developments in the field since the previous review and discusses how the asymmetric lipopolysaccharide-phospholipid bilayer of the outer membrane can retard the entry of lipophilic compounds, drawing on increasing knowledge of the chemistry of lipopolysaccharide from diverse organisms and of the way in which lipopolysaccharide structure is modified by environmental conditions.
Abstract: Gram-negative bacteria characteristically are surrounded by an additional membrane layer, the outer membrane. Although outer membrane components often play important roles in the interaction of symbiotic or pathogenic bacteria with their host organisms, the major role of this membrane must usually be to serve as a permeability barrier to prevent the entry of noxious compounds and at the same time to allow the influx of nutrient molecules. This review summarizes the development in the field since our previous review (H. Nikaido and M. Vaara, Microbiol. Rev. 49:1-32, 1985) was published. With the discovery of protein channels, structural knowledge enables us to understand in molecular detail how porins, specific channels, TonB-linked receptors, and other proteins function. We are now beginning to see how the export of large proteins occurs across the outer membrane. With our knowledge of the lipopolysaccharide-phospholipid asymmetric bilayer of the outer membrane, we are finally beginning to understand how this bilayer can retard the entry of lipophilic compounds, owing to our increasing knowledge about the chemistry of lipopolysaccharide from diverse organisms and the way in which lipopolysaccharide structure is modified by environmental conditions.

3,585 citations