Showing papers in "Genome Research in 1998"

Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities

TL;DR: A base-calling program for automated sequencer traces, phred, with improved accuracy is described. It appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry.

Abstract: The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry.

7,627 citations

Journal Article•DOI•

Brent Ewing¹, Philip Green¹•Institutions (1)

University of Washington¹

Consed: A Graphical Tool for Sequence Finishing

TL;DR: The ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data, is developed and implemented in the base-calling program.

Abstract: Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
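The per-base error probabilities described above are conventionally reported as phred quality values, Q = -10 log10(p). The abstract does not state this formula; it is the standard convention associated with phred. The following is a minimal sketch of that conversion, not the authors' implementation, and the function names are hypothetical.

```python
import math


def error_prob_to_quality(p_error: float) -> float:
    """Convert an error probability to a phred-style quality value.

    Uses the conventional definition Q = -10 * log10(p).
    """
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def quality_to_error_prob(quality: float) -> float:
    """Invert the conversion: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


def expected_errors(qualities) -> float:
    """Expected number of miscalled bases in a read, given per-base qualities."""
    return sum(quality_to_error_prob(q) for q in qualities)
```

Under this convention a quality of 20 corresponds to an estimated 1-in-100 chance that the base-call is wrong, and a quality of 30 to 1 in 1000. Summing the per-base probabilities gives an expected error count for a read, which is one way such values can guide downstream review.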

5,334 citations

Journal Article•DOI•

David Gordon¹, Chris Abajian¹, Philip Green¹•Institutions (1)

University of Washington¹

Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process.

TL;DR: A finishing tool, consed, is described; it uses error probabilities from phred and phrap as an objective criterion to guide the entire finishing process.

Abstract: Sequencing of large clones or small genomes is generally done by the shotgun approach (Anderson et al. 1982). This has two phases: (1) a shotgun phase in which a number of reads are generated from random subclones and assembled into contigs, followed by (2) a directed, or finishing phase in which the assembly is inspected for correctness and for various kinds of data anomalies (such as contaminant reads, unremoved vector sequence, and chimeric or deleted reads), additional data are collected to close gaps and resolve low quality regions, and editing is performed to correct assembly or base-calling errors. Finishing is currently a bottleneck in large-scale sequencing efforts, and throughput gains will depend both on reducing the need for human intervention and making it as efficient as possible. We have developed a finishing tool, consed, which attempts to implement these principles. A distinguishing feature relative to other programs is the use of error probabilities from our programs phred and phrap as an objective criterion to guide the entire finishing process. More information is available at http://www.genome.washington.edu/consed/consed.html.

3,486 citations

Journal Article•DOI•

Richard W Michelmore¹, Blake C. Meyers¹•Institutions (1)

University of California, Davis¹

01 Nov 1998-Genome Research

TL;DR: A new model, adapted and expanded from one proposed for the evolution of vertebrate major histocompatibility complex and immunoglobulin gene families, is proposed; it emphasizes divergent selection acting on arrays of solvent-exposed residues in the LRR, resulting in evolution of individual R genes within a haplotype.

Abstract: Classical genetic and molecular data show that genes determining disease resistance in plants are frequently clustered in the genome. Genes for resistance (R genes) to diverse pathogens cloned from several species encode proteins that have motifs in common. These motifs indicate that R genes are part of signal-transduction systems. Most of these R genes encode a leucine-rich repeat (LRR) region. Sequences encoding putative solvent-exposed residues in this region are hypervariable and have elevated ratios of nonsynonymous to synonymous substitutions; this suggests that they have evolved to detect variation in pathogen-derived ligands. Generation of new resistance specificities previously had been thought to involve frequent unequal crossing-over and gene conversions. However, comparisons between resistance haplotypes reveal that orthologs are more similar than paralogs implying a low rate of sequence homogenization from unequal crossing-over and gene conversion. We propose a new model adapted and expanded from one proposed for the evolution of vertebrate major histocompatibility complex and immunoglobulin gene families. Our model emphasizes divergent selection acting on arrays of solvent-exposed residues in the LRR resulting in evolution of individual R genes within a haplotype. Intergenic unequal crossing-over and gene conversions are important but are not the primary mechanisms generating variation.

1,022 citations

Journal Article•DOI•

A DNA Polymorphism Discovery Resource for Research on Human Genetic Variation

Francis S. Collins¹, Lisa D. Brooks¹, Aravinda Chakravarti•Institutions (1)

A computer program for aligning a cDNA sequence with a genomic DNA sequence.

01 Dec 1998-Genome Research

TL;DR: A large number of mapped SNPs will be valuable as markers throughout the genome for finding SNPs that do affect gene function, as linkage disequilibrium over tens to hundreds of kilobases is expected to be found in many regions of the human genome.

Abstract: perform association analysis on many affected and unaffected individuals, which would require hundreds of thousands of variants spread over the entire genome (Risch and Merikangas 1996). Such a large number of variants is currently not available. The DNA Polymorphism Discovery Resource is designed to promote their discovery. About 90% of sequence variants in humans are differences in single bases of DNA, called single nucleotide polymorphisms (SNPs). SNPs in the coding regions of genes (cSNPs) or in regulatory regions are more likely to cause functional differences than SNPs elsewhere. Although most SNPs do not affect gene function, a large number of mapped SNPs will be valuable as markers throughout the genome for finding SNPs that do affect gene function, as linkage disequilibrium over tens to hundreds of kilobases is expected to be found in many regions of the human genome. Both SNPs and cSNPs can be identified by using the DNA Polymorphism Discovery Resource. When two random chromosomes are

836 citations

Journal Article•DOI•

Liliana Florea¹, George Hartzell, Zheng Zhang¹, Gerald M. Rubin, Webb Miller¹•Institutions (1)

Pennsylvania State University¹

01 Sep 1998-Genome Research

TL;DR: A freely available computer program solves the problem of efficiently aligning a transcribed and spliced DNA sequence with a genomic sequence containing that gene, allowing for introns in the genomic sequence and a relatively small number of sequencing errors.

Abstract: We address the problem of efficiently aligning a transcribed and spliced DNA sequence with a genomic sequence containing that gene, allowing for introns in the genomic sequence and a relatively small number of sequencing errors. A freely available computer program, described herein, solves the problem for a 100-kb genomic sequence in a few seconds on a workstation.

764 citations

Journal Article•DOI•

Phylogenomics: Improving Functional Predictions for Uncharacterized Genes by Evolutionary Analysis

Jonathan A. Eisen¹•Institutions (1)

Stanford University¹

Transposable Elements and Genome Organization: A Comprehensive Survey of Retrotransposons Revealed by the Complete Saccharomyces cerevisiae Genome Sequence

TL;DR: It is suggested that functional predictions can be greatly improved by focusing on how the genes became similar in sequence (i.e., evolution) rather than on the sequence similarity itself.

Abstract: The ability to accurately predict gene function based on gene sequence is an important tool in many areas of biological research. Such predictions have become particularly important in the genomics age in which numerous gene sequences are generated with little or no accompanying experimentally determined functional information. Almost all functional prediction methods rely on the identification, characterization, and quantification of sequence similarity between the gene of interest and genes for which functional information is available. Because sequence is the prime determining factor of function, sequence similarity is taken to imply similarity of function. There is no doubt that this assumption is valid in most cases. However, sequence similarity does not ensure identical functions, and it is common for groups of genes that are similar in sequence to have diverse (although usually related) functions. Therefore, the identification of sequence similarity is frequently not enough to assign a predicted function to an uncharacterized gene; one must have a method of choosing among similar genes with different functions. In such cases, most functional prediction methods assign likely functions by quantifying the levels of similarity among genes. I suggest that functional predictions can be greatly improved by focusing on how the genes became similar in sequence (i.e., evolution) rather than on the sequence similarity itself. It is well established that many aspects of comparative biology can benefit from evolutionary studies (Felsenstein 1985), and comparative molecular biology is no exception (e.g., Altschul et al. 1989; Goldman et al. 1996). In this commentary, I discuss the use of evolutionary information in the prediction of gene function. 
To appreciate the potential of a phylogenomic approach to the prediction of gene function, it is necessary to first discuss how gene sequence is commonly used to predict gene function and some general features about gene evolution.

608 citations

Journal Article•DOI•

Jin M. Kim¹, Swathi Vanguri², Jef D. Boeke³, Abram Gabriel², Daniel F. Voytas¹•Institutions (3)

Iowa State University¹, Rutgers University², Johns Hopkins University³

The Relative Power of Family-Based and Case-Control Designs for Linkage Disequilibrium Studies of Complex Human Diseases I. DNA Pooling

TL;DR: A genome-wide survey of Saccharomyces cerevisiae retrotransposons offers the first opportunity to view organizational and evolutionary trends among retrotransposons at the genome level, and it is hoped the compiled data will serve as a starting point for further investigation and for comparison to other, more complex genomes.

Abstract: We conducted a genome-wide survey of Saccharomyces cerevisiae retrotransposons and identified a total of 331 insertions, including 217 Ty1, 34 Ty2, 41 Ty3, 32 Ty4, and 7 Ty5 elements. Eighty-five percent of insertions were solo long terminal repeats (LTRs) or LTR fragments. Overall, retrotransposon sequences constitute >377 kb or 3.1% of the genome. Independent evolution of retrotransposon sequences was evidenced by the identification of a single-base pair insertion/deletion that distinguishes the highly similar Ty1 and Ty2 LTRs and the identification of a distinct Ty1 subfamily (Ty1'). Whereas Ty1, Ty2, and Ty5 LTRs displayed a broad range of sequence diversity (typically ranging from 70%‐99% identity), Ty3 and Ty4 LTRs were highly similar within each element family (most sharing >96% nucleotide identity). Therefore, Ty3 and Ty4 may be more recent additions to the S. cerevisiae genome and perhaps entered through horizontal transfer or past polyploidization events. Distribution of Ty elements is distinctly nonrandom: 90% of Ty1, 82% of Ty2, 95% of Ty3, and 88% of Ty4 insertions were found within 750 bases of tRNA genes or other genes transcribed by RNA polymerase III. tRNA genes are the principal determinant of retrotransposon distribution, and there are, on average, 1.2 insertions per tRNA gene. Evidence for recombination was found near many Ty elements, particularly those not associated with tRNA gene targets. For these insertions, 5'- and 3'-flanking sequences were often duplicated and rearranged among multiple chromosomes, indicating that recombination between retrotransposons can influence genome organization. S. cerevisiae offers the first opportunity to view organizational and evolutionary trends among retrotransposons at the genome level, and we hope our compiled data will serve as a starting point for further investigation and for comparison to other, more complex genomes.

539 citations

Journal Article•DOI•

Neil Risch¹, Jun Teng•Institutions (1)

Stanford University¹

01 Dec 1998-Genome Research

TL;DR: It is demonstrated that for sibships with parents, only the parents require individual genotyping to derive the TDT statistic, whereas all the offspring can be pooled, which can potentially lead to considerable savings in genotyping, especially for multiplex sibships.

Abstract: We consider statistics for analyzing a variety of family-based and nonfamily-based designs for detecting linkage disequilibrium of a marker with a disease susceptibility locus. These designs include sibships with parents, sibships without parents, and use of unrelated controls. We also provide formulas for and evaluate the relative power of different study designs using these statistics. In this first paper in the series, we derive statistical tests based on data derived from DNA pooling experiments and describe their characteristics. Although designs based on affected and unaffected sibs without parents are usually robust to population stratification, they suffer a loss of power compared with designs using parents or unrelateds as controls. Although increasing the number of unaffected sibs improves power, the increase is generally not substantial. Designs including sibships with multiple affected sibs are typically the most powerful, with any of these control groups, when the disease allele frequency is low. When the allele frequency is high, however, designs with unaffected sibs as controls do not retain this advantage. In designs with parents, having an affected parent has little impact on the power, except for rare dominant alleles, where the power is increased compared with families with no affected parents. Finally, we also demonstrate that for sibships with parents, only the parents require individual genotyping to derive the TDT statistic, whereas all the offspring can be pooled. This can potentially lead to considerable savings in genotyping, especially for multiplex sibships. The formulas and tables we derive should provide some guidance to investigators designing nuclear family-based linkage disequilibrium studies for complex diseases.

399 citations

Journal Article•DOI•

Predicting gene regulatory elements in silico on a genomic scale.

Alvis Brāzma¹, Inge Jonassen, Jaak Vilo, Esko Ukkonen•Institutions (1)

European Bioinformatics Institute¹

01 Nov 1998-Genome Research

TL;DR: A new sequence pattern discovery algorithm is developed that searches exhaustively for a priori unknown regular expression-type patterns that are over-represented in a given set of sequences.

Abstract: We performed a systematic analysis of gene upstream regions in the yeast genome for occurrences of regular expression-type patterns with the goal of identifying potential regulatory elements. To achieve this goal, we have developed a new sequence pattern discovery algorithm that searches exhaustively for a priori unknown regular expression-type patterns that are over-represented in a given set of sequences. We applied the algorithm in two cases, (1) discovery of patterns in the complete set of >6000 sequences taken upstream of the putative yeast genes and (2) discovery of patterns in the regions upstream of the genes with similar expression profiles. In the first case, we looked for patterns that occur more frequently in the gene upstream regions than in the genome overall. In the second case, first we clustered the upstream regions of all the genes by similarity of their expression profiles on the basis of publicly available gene expression data and then looked for sequence patterns that are over-represented in each cluster. In both cases we considered each pattern that occurred at least in some minimum number of sequences, and rated them on the basis of their over-representation. Among the highest rating patterns, most have matches to substrings in known yeast transcription factor-binding sites. Moreover, several of them are known to be relevant to the expression of the genes from the respective clusters. Experiments on simulated data show that the majority of the discovered patterns are not expected to occur by chance.
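The rating step described above (score each pattern by how over-represented it is in a target set of upstream sequences relative to a background set, keeping only patterns that occur in some minimum number of sequences) can be illustrated with a small sketch. This is not the authors' algorithm, which searches exhaustively for a priori unknown patterns; the sketch only scores candidate patterns that are supplied to it. The function names, the frequency-ratio score, and the toy sequences are all assumptions made for illustration.

```python
import math
import re


def count_matching(pattern, sequences):
    """Number of sequences containing at least one match of the regex."""
    rx = re.compile(pattern)
    return sum(1 for seq in sequences if rx.search(seq))


def overrepresentation(pattern, target, background):
    """Ratio of the pattern's match frequency in target vs. background."""
    if not target or not background:
        raise ValueError("both sequence sets must be non-empty")
    f_target = count_matching(pattern, target) / len(target)
    f_background = count_matching(pattern, background) / len(background)
    if f_background == 0.0:
        # Pattern never seen in the background: unbounded ratio if it
        # occurs in the target at all, otherwise zero.
        return math.inf if f_target > 0.0 else 0.0
    return f_target / f_background


def rank_patterns(patterns, target, background, min_hits=1):
    """Score candidate patterns and sort by over-representation, highest first.

    Patterns matching fewer than min_hits target sequences are dropped.
    """
    scored = [
        (p, overrepresentation(p, target, background))
        for p in patterns
        if count_matching(p, target) >= min_hits
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical, not from the paper).
target = ["ACGTGA", "TTCGTG", "AAAAAA"]
background = ["ACGTGA", "TTTTTT", "GGGGGG", "AAAAAA"]
ranked = rank_patterns(["AAAA", "CG[TA]G"], target, background)
```

A simple frequency ratio is used here only because it is easy to read; a real analysis would more likely use a statistical test that accounts for sample size.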

368 citations

Journal Article•DOI•

Reading Bits of Genetic Information: Methods for Single-Nucleotide Polymorphism Analysis

Ulf Landegren¹, Mats Nilsson, Pui-Yan Kwok•Institutions (1)

Uppsala University¹

01 Aug 1998-Genome Research

TL;DR: This article covers methods for single-nucleotide polymorphism analysis.

Abstract: Reading bits of genetic information: Methods for single-nucleotide polymorphism analysis.

Journal Article•DOI•

The synuclein family.

Christian Lavedan¹•Institutions (1)

Models of Molecular Evolution and Phylogeny

01 Sep 1998-Genome Research

TL;DR: The present review offers a synopsis of the current state of knowledge of all synuclein family members in different species.

Abstract: The synuclein gene family recently came into the spotlight, when one of its members, alpha-synuclein, was found to be mutated in several families with autosomal dominant Parkinson's disease (PD). A peptide of the alpha-synuclein protein had been characterized previously as a major component of amyloid plaques in brains of patients with Alzheimer's disease (AD). The mechanism by which this presynaptic protein is involved in the two most common neurodegenerative disorders, AD and PD, remains unclear. Remarkably, another member of this gene family, gamma-synuclein, has been shown to be overexpressed in breast carcinomas and may also be overexpressed in ovarian cancer. The possible involvement of the synuclein proteins in the etiology of common human diseases has raised exciting questions and is the subject of intense investigation. Details of the properties of any member of the synuclein family may provide useful information for understanding the characteristics and function of other family members. The present review offers a synopsis of the current state of knowledge of all synuclein family members in different species.

Journal Article•DOI•

Pietro Liò¹, Nick Goldman•Institutions (1)

University of Cambridge¹

01 Dec 1998-Genome Research

TL;DR: The mathematical models used to describe the patterns of DNA base substitution and amino acid replacement are discussed, including the analysis of observed DNA base and amino acid mutation patterns, the concept of site heterogeneity, and the incorporation of structural biology data, all of which have become particularly important in recent years.

Abstract: Phylogenetic reconstruction is a fast-growing field that is enriched by different statistical approaches and by findings and applications in a broad range of biological areas. Fundamental to these are the mathematical models used to describe the patterns of DNA base substitution and amino acid replacement. These may become some of the basic models for comparative genome research. We discuss these models, including the analysis of observed DNA base and amino acid mutation patterns, the concept of site heterogeneity, and the incorporation of structural biology data, all of which have become particularly important in recent years. We also describe the use of such models in phylogenetic reconstruction and statistical methods for the comparison of different models.

Journal Article•DOI•

Novel Families of Putative Protein Kinases in Bacteria and Archaea: Evolution of the “Eukaryotic” Protein Kinase Superfamily

Christopher J. Leonard¹, L. Aravind, Eugene V. Koonin•Institutions (1)

Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic mycobacterium dna arrays

01 Oct 1998-Genome Research

TL;DR: From the phylogenetic distribution of the proteins encoded by the completely sequenced bacterial and archaeal genomes, the existence of an ancestral protein kinase prior to the divergence of eukaryotes, bacteria, and archaea is inferred.

Abstract: The central role of serine/threonine and tyrosine protein kinases in signal transduction and cellular regulation in eukaryotes is well established and widely documented. Considerably less is known about the prevalence and role of these protein kinases in bacteria and archaea. In order to examine the evolutionary origins of the eukaryotic-type protein kinase (ePK) superfamily, we conducted an extensive analysis of the proteins encoded by the completely sequenced bacterial and archaeal genomes. We detected five distinct families of known and predicted putative protein kinases with representatives in bacteria and archaea that share a common ancestry with the eukaryotic protein kinases. Four of these protein families have not been identified previously as protein kinases. From the phylogenetic distribution of these families, we infer the existence of an ancestral protein kinase(s) prior to the divergence of eukaryotes, bacteria, and archaea.

Journal Article•DOI•

Thomas R. Gingeras¹, Ghassan Ghandour, Eugene Wang, Anthony Berno, Peter M. Small, Francis Drobniewski, David Alland, Edward Desmond, Mark Holodniy, Jorg Drenkow•Institutions (1)

Affymetrix¹

Analogous Enzymes: Independent Inventions in Enzyme Evolution

TL;DR: This work demonstrates the general point that DNA microarrays that sequence important genomic regions (such as drug resistance or pathogenicity islands) can simultaneously identify species and provide some insight into the organism's population structure.

Abstract: High-density oligonucleotide arrays can be used to rapidly examine large amounts of DNA sequence in a high throughput manner. An array designed to determine the specific nucleotide sequence of 705 bp of the rpoB gene of Mycobacterium tuberculosis accurately detected rifampin resistance associated with mutations of 44 clinical isolates of M. tuberculosis. The nucleotide sequence diversity in 121 Mycobacterial isolates (comprised of 10 species) was examined by both conventional dideoxynucleotide sequencing of the rpoB and 16S genes and by analysis of the rpoB oligonucleotide array hybridization patterns. Species identification for each of the isolates was similar irrespective of whether 16S sequence, rpoB sequence, or the pattern of rpoB hybridization was used. However, for several species, the number of alleles in the 16S and rpoB gene sequences provided discordant estimates of the genetic diversity within a species. In addition to confirming the array's intended utility for sequencing the region of M. tuberculosis that confers rifampin resistance, this work demonstrates that this array can identify the species of nontuberculous Mycobacteria. This demonstrates the general point that DNA microarrays that sequence important genomic regions (such as drug resistance or pathogenicity islands) can simultaneously identify species and provide some insight into the organism's population structure.

Journal Article•DOI•

Michael Y. Galperin¹, D. Roland Walker, Eugene V. Koonin•Institutions (1)

Emerging Patterns of Comparative Genome Organization in Some Mammalian Species as Revealed by Zoo-FISH

01 Aug 1998-Genome Research

TL;DR: Recruitment of enzymes that catalyze a similar but distinct reaction seems to be a major scenario for the evolution of analogous enzymes, which should be taken into account for functional annotation of genomes.

Abstract: It is known that the same reaction may be catalyzed by structurally unrelated enzymes. We performed a systematic search for such analogous (as opposed to homologous) enzymes by evaluating sequence conservation among enzymes with the same enzyme classification (EC) number using sensitive, iterative sequence database search methods. Enzymes without detectable sequence similarity to each other were found for 105 EC numbers (a total of 243 distinct proteins). In 34 cases, independent evolutionary origin of the suspected analogous enzymes was corroborated by showing that they possess different structural folds. Analogous enzymes were found in each class of enzymes, but their overall distribution on the map of biochemical pathways is patchy, suggesting multiple events of gene transfer and selective loss in evolution, rather than acquisition of entire pathways catalyzed by a set of unrelated enzymes. Recruitment of enzymes that catalyze a similar but distinct reaction seems to be a major scenario for the evolution of analogous enzymes, which should be taken into account for functional annotation of genomes. For many analogous enzymes, the bacterial form of the enzyme is different from the eukaryotic one; such enzymes may be promising targets for the development of new antibacterial drugs.

Journal Article•DOI•

Bhanu P. Chowdhary¹, Terje Raudsepp, Lutz Frönicke, Harry Scherthan•Institutions (1)

Swedish University of Agricultural Sciences¹

01 Jun 1998-Genome Research

TL;DR: The use of Zoo-FISH to identify regions of chromosomal homology has allowed the transfer of information from map-rich species such as human and mouse to a wide variety of other species, and provided a basis for developing a picture of the ancestral mammalian karyotype.

Abstract: Although gene maps for a variety of evolutionarily diverged mammalian species have expanded rapidly during the past few years, until recently it has been difficult to precisely define chromosomal segments that are homologous between species. A solution to this problem has come from the development of Zoo-FISH, also known as cross-species chromosome painting. The use of Zoo-FISH to identify regions of chromosomal homology has allowed the transfer of information from map-rich species such as human and mouse to a wide variety of other species. From a Zoo-FISH analysis spanning four mammalian orders (Primates, Artiodactyla, Carnivora, and Perissodactyla), and involving eight species (human, pig, cattle, Indian muntjac, cat, American mink, harbor seal, and horse), three distinct classes of synteny conservation have been designated: (1) conservation of whole chromosome synteny, (2) conservation of large chromosomal blocks, and (3) conservation of neighboring segment combinations. This analysis has also made it possible to identify a set of chromosome segments (based on human chromosome equivalents) that probably made up the karyotype of the common ancestor of the four orders. This approach provides a basis for developing a picture of the ancestral mammalian karyotype, but a full understanding will depend on studies encompassing more diverse combinations of mammalian orders.

Journal Article•DOI•

Dispersed Repetitive DNA Has Spread to New Genomes Since Polyploid Formation in Cotton

X.-P. Zhao¹, Yang Si, Robert E. Hanson, Charles F. Crane, H. J. Price², David M. Stelly, Jonathan F. Wendel, Andrew H. Paterson•Institutions (2)

Plant Genome Mapping Laboratory¹, Iowa State University²

Complete Genomic Sequence and Analysis of the Prion Protein Gene Region from Three Mammalian Species

TL;DR: The discovery of A-genome repeats in G. gossypioides adds genome-wide support to a suggestion previously based on evidence from only a single genetic locus that this species may be either the closest living descendant of the New World cotton ancestor, or an adulterated relic of polyploid formation.

Abstract: Polyploid formation has played a major role in the evolution of many plant and animal genomes; however, surprisingly little is known regarding the subsequent evolution of DNA sequences that become newly united in a common nucleus. Of particular interest is the repetitive DNA fraction, which accounts for most nuclear DNA in higher plants and animals and which can be remarkably different, even in closely related taxa. In one recently formed polyploid, cotton (Gossypium barbadense L.; AD genome), 83 non-cross-hybridizing DNA clones contain dispersed repeats that are estimated to comprise about 24% of the nuclear DNA. Among these, 64 (77%) are largely restricted to diploid taxa containing the larger A genome and collectively account for about half of the difference in DNA content between Old World (A) and New World (D) diploid ancestors of cultivated AD tetraploid cotton. In tetraploid cotton, FISH analysis showed that some A-genome dispersed repeats appear to have spread to D-genome chromosomes. Such spread may also account for the finding that one, and only one, D-genome diploid cotton, Gossypium gossypioides, contains moderate levels of (otherwise) A-genome-specific repeats in addition to normal levels of D-genome repeats. The discovery of A-genome repeats in G. gossypioides adds genome-wide support to a suggestion previously based on evidence from only a single genetic locus that this species may be either the closest living descendant of the New World cotton ancestor, or an adulterated relic of polyploid formation. Spread of dispersed repeats in the early stages of polyploid formation may provide a tag to identify diploid progenitors of a polyploid. 
Although most repetitive clones do not correspond to known DNA sequences, 4 correspond to known transposons, most contain internal subrepeats, and at least 12 (including 2 of the possible transposons) hybridize to mRNAs expressed at readily discernible levels in cotton seedlings, implicating transposition as one possible mechanism of spread. Integration of molecular, phylogenetic, and cytogenetic analysis of dispersed repetitive DNA may shed new light on evolution of other polyploid genomes, as well as providing valuable landmarks for many aspects of genome analysis.

Journal Article•DOI•

Inyoul Lee¹, David Westaway, Arian F.A. Smit, Kai Wang, Jason Seto, Lei Chen, Chetana Acharya, Mike Ankener, Dale Baskin, Carol L. Cooper, Hong Yao, Stanley B. Prusiner, Leroy Hood•Institutions (1)

University of Washington¹

01 Oct 1998-Genome Research

TL;DR: 10⁵ bp of DNA from clones containing human, sheep, and mouse PrP genes isolated in cosmids or lambda phage is sequenced, and sequences in noncoding DNA that are conserved between the three species and may represent biologically functional sites are identified.

Abstract: The prion protein (PrP), first identified in scrapie-infected rodents, is encoded by a single exon of a single-copy chromosomal gene. In addition to the protein-coding exon, PrP genes in mammals contain one or two 5'-noncoding exons. To learn more about the genomic organization of regions surrounding the PrP exons, we sequenced 10⁵ bp of DNA from clones containing human, sheep, and mouse PrP genes isolated in cosmids or lambda phage. Our findings are as follows: (1) Although the human PrP transcript does not include the untranslated exon 2 found in its mouse and sheep counterparts, the large intron of the human PrP gene contains an exon 2-like sequence flanked by consensus splice acceptor and donor sites. (2) The mouse Prnpᵃ but not the Prnpᵇ allele found in 44 inbred lines contains a 6593 nucleotide retroviral genome inserted into the anticoding strand of intron 2. This intracisternal A-particle element is flanked by duplications of an AAGGCT nucleotide motif. (3) We found that the PrP gene regions contain from 40% to 57% genome-wide repetitive elements that independently increased the size of the locus in all three species by numerous mutations. The unusually long sheep PrP 3'-untranslated region contains a "fossil" 1.2-kb mariner transposable element. (4) We identified sequences in noncoding DNA that are conserved between the three species and may represent biologically functional sites.

Journal Article•DOI•

A Homogeneous, Ligase-Mediated DNA Diagnostic Test

Xiangning Chen¹, Kenneth J. Livak, Pui-Yan Kwok•Institutions (1)

Washington University in St. Louis¹

TL;DR: The development of a homogeneous DNA detection method that requires no further manipulations after the initial reaction is set up and can be automated for high-throughput genotyping in large-scale population studies is described.

Abstract: Single-nucleotide variations are the most widely distributed genetic markers in the human genome. A subset of these variations, the substitution mutations, are responsible for most genetic disorders. As single nucleotide polymorphism (SNP) markers are being developed for molecular diagnosis of genetic disorders and large-scale population studies for genetic analysis of complex traits, a simple, sensitive, and specific test for single nucleotide changes is highly desirable. In this report we describe the development of a homogeneous DNA detection method that requires no further manipulations after the initial reaction is set up. This assay, named dye-labeled oligonucleotide ligation (DOL), combines the PCR and the oligonucleotide ligation reaction in a two-stage thermal cycling sequence with fluorescence resonance energy transfer (FRET) detection monitored in real time. Because FRET occurs only when the donor and acceptor dyes are in close proximity, one can infer the genotype or mutational status of a DNA sample by monitoring the specific ligation of dye-labeled oligonucleotide probes. We have successfully applied the DOL assay to genotype 10 SNPs or mutations. By designing the PCR primers and ligation probes in a consistent manner, multiple assays can be done under the same thermal cycling conditions. The standardized design and execution of the DOL assay means that it can be automated for high-throughput genotyping in large-scale population studies.

Journal Article•DOI•

Gene discovery by EST sequencing in Toxoplasma gondii reveals sequences restricted to the Apicomplexa

James W. Ajioka¹, John C. Boothroyd², Brian P. Brunk³, Adrian B. Hehl², Ledeana Hillier⁴, Ian D. Manger², Marco A. Marra⁴, G. Christian Overton³, David S. Roos³, Kiew Lian Wan, Robert H. Waterston⁴, L. David Sibley⁴•Institutions (4)

University of Cambridge¹, Stanford University², University of Pennsylvania³, Washington University in St. Louis⁴

01 Jan 1998-Genome Research

TL;DR: To accelerate gene discovery and facilitate genetic mapping in the protozoan parasite Toxoplasma gondii, >7000 new ESTs from the 5' ends of randomly selected tachyzoite cDNAs are generated, with success in identifying new genes.

Abstract: To accelerate gene discovery and facilitate genetic mapping in the protozoan parasite Toxoplasma gondii, we have generated >7000 new ESTs from the 5' ends of randomly selected tachyzoite cDNAs. Comparison of the ESTs with the existing gene databases identified possible functions for more than 500 new T. gondii genes by virtue of sequence motifs shared with conserved protein families, including factors involved in transcription, translation, protein secretion, signal transduction, cytoskeleton organization, and metabolism. Despite this success in identifying new genes, more than 50% of the ESTs correspond to genes of unknown function, reflecting the divergent evolutionary status of this parasite. A newly recognized class of genes was identified based on its similarity to sequences known only from other members of the same phylum, therefore identifying sequences that are apparently restricted to the Apicomplexa. Such genes may underlie pathways common to this group of medically important parasites, therefore identifying potential targets for intervention.

Journal Article•DOI•

Molecular Basis for the Dominant White Phenotype in the Domestic Pig

Stefan L. Marklund¹, James W. Kijas, Heriberto Rodriguez-Martinez, Lars Rönnstrand, Keiko Funa, Maria Moller, Dirk Lange, Inger Edfors-Lilja, Leif Andersson•Institutions (1)

Swedish University of Agricultural Sciences¹

01 Aug 1998-Genome Research

TL;DR: The dominant white phenotype in domestic pigs is caused by two mutations in the KIT gene encoding the mast/stem cell growth factor receptor (MGF), one gene duplication associated with a partially dominant phenotype and a splice mutation in one of the copies leading to the fully dominant allele.

Abstract: The change of phenotypic traits in domestic animals and crops as a response to selective breeding mimics the much slower evolutionary change in natural populations. Here, we describe that the dominant white phenotype in domestic pigs is caused by two mutations in the KIT gene encoding the mast/stem cell growth factor receptor (MGF), one gene duplication associated with a partially dominant phenotype and a splice mutation in one of the copies leading to the fully dominant allele. The splice mutation is a G to A substitution in the first nucleotide of intron 17 and leads to skipping of exon 17. The duplication is most likely a regulatory mutation affecting KIT expression, whereas the splice mutation is expected to cause a receptor with impaired or absent tyrosine kinase activity. Immunocytochemistry showed that this variant form is expressed in 17- to 19-day-old pig embryos. Hundreds of millions of white pigs around the world are assumed to be heterozygous or homozygous for the two mutations. [The EMBL accession numbers for porcine KIT1*0101, KIT1*0202, KIT2*0202, and KIT2*0101 are AJ223228-AJ223231, respectively.]

Journal Article•DOI•

Overlapping Genomic Sequences: A Treasure Trove of Single-Nucleotide Polymorphisms

Patricia Taillon-Miller¹, Zhijie Gu, Qun Li, LaDeana W. Hillier, Pui-Yan Kwok•Institutions (1)

Washington University in St. Louis¹

01 Jul 1998-Genome Research

TL;DR: The results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost effective, requiring only the two simple steps of developing STSs around the known SNPs and characterizing them in the appropriate populations.

Abstract: An efficient strategy to develop a dense set of single-nucleotide polymorphism (SNP) markers is to take advantage of the human genome sequencing effort currently under way. Our approach is based on the fact that bacterial artificial chromosomes (BACs) and P1-based artificial chromosomes (PACs) used in long-range sequencing projects come from diploid libraries. If the overlapping clones sequenced are from different lineages, one is comparing the sequences from 2 homologous chromosomes in the overlapping region. We have analyzed in detail every SNP identified while sequencing three sets of overlapping clones found on chromosome 5p15.2, 7q21-7q22, and 13q12-13q13. In the 200.6 kb of DNA sequence analyzed in these overlaps, 153 SNPs were identified. Computer analysis for repetitive elements and suitability for STS development yielded 44 STSs containing 68 SNPs for further study. All 68 SNPs were confirmed to be present in at least one of the three (Caucasian, African-American, Hispanic) populations studied. Furthermore, 42 of the SNPs tested (62%) were informative in at least one population, 32 (47%) were informative in two or more populations, and 23 (34%) were informative in all three populations. These results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost effective, requiring only the two simple steps of developing STSs around the known SNPs and characterizing them in the appropriate populations.
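The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The following is only a sketch in Python (a language chosen here for illustration, not one used by the authors); the variable and function names are hypothetical, and the numbers are copied from the abstract. It recomputes the reported percentages and derives the average SNP spacing, a figure the abstract itself does not state.

```python
# Figures copied from the abstract above; all names are illustrative only.
overlap_kb = 200.6   # kb of overlapping sequence analyzed
snps_found = 153     # SNPs identified in the overlaps
snps_tested = 68     # SNPs carried forward in the 44 STSs

# Number of tested SNPs that were informative, by population breadth.
informative = {
    "at least one population": 42,
    "two or more populations": 32,
    "all three populations": 23,
}


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Derived value (not stated in the abstract): average spacing between SNPs.
kb_per_snp = overlap_kb / snps_found

for label, count in informative.items():
    print(f"{label}: {count}/{snps_tested} = {percent(count, snps_tested)}%")
print(f"average spacing: one SNP per {kb_per_snp:.2f} kb")
```

Under these inputs the rounded percentages agree with the 62%, 47%, and 34% reported in the abstract, and the spacing works out to roughly one SNP per 1.3 kb of overlap.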

Journal Article•DOI•

Comparative Gene Mapping: A Fine-Scale Survey of Chromosome Rearrangements between Ruminants and Humans

Laurent Schibler¹, Daniel Vaiman, A. Oustry, Corinne Giraud-Delville, Edmond P. Cribiu•Institutions (1)

Institut national de la recherche agronomique¹

01 Sep 1998-Genome Research

TL;DR: This comprehensive map of goat chromosomes will speed up positional cloning projects in domestic ruminants and clarify some aspects of mammalian chromosomal evolution.

Abstract: A total of 202 genes were cytogenetically mapped to goat chromosomes, multiplying by five the total number of regional gene localizations in domestic ruminants (255). This map encompasses 249 and 173 common anchor loci regularly spaced along human and murine chromosomes, respectively, which makes it possible to perform a genome-wide comparison between three mammalian orders. Twice as many rearrangements as revealed by ZOO-FISH were observed. The average size of conserved fragments could be estimated at 27 and 8 cM with humans and mice, respectively. The positions of evolutionary breakpoints often correspond with human chromosome sites known to be vulnerable to rearrangement in neoplasia. Furthermore, 75 microsatellite markers, 30 of which were isolated from gene-containing bacterial artificial chromosomes (BACs), were added to the previous goat genetic map, achieving 88% genome coverage. Finally, 124 microsatellites were cytogenetically mapped, which made it possible to physically anchor and orient all the linkage groups. We believe that this comprehensive map will speed up positional cloning projects in domestic ruminants and clarify some aspects of mammalian chromosomal evolution. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. G40978‐G41020, AF083170‐AF083184, AF088286, AF08287, AF083401‐AF083406, AF082884, and AF082885.]

Journal Article•DOI•

Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss

Hugh M. Robertson¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

TL;DR: Phylogenetic analyses of the str and stl families, and comparisons with a few orthologs in Caenorhabditis briggsae, reveal ongoing processes of gene duplication, diversification, and movement.

Abstract: The str family of genes encoding seven-transmembrane G-protein-coupled or serpentine receptors related to the ODR-10 diacetyl chemoreceptor is very large, with at least 197 members in the Caenorhabditis elegans genome. The closely related stl family has 43 genes, and both families are distantly related to the srd family with 55 genes. Analysis of the structures of these genes indicates that a third of them are clearly or likely pseudogenes. Preliminary surveys of other candidate chemoreceptor families indicate that as many as 800 genes and pseudogenes or 6% of the genome might encode 550 functional chemoreceptors constituting 4% of the C. elegans protein complement. Phylogenetic analyses of the str and stl families, and comparisons with a few orthologs in Caenorhabditis briggsae, reveal ongoing processes of gene duplication, diversification, and movement. The reconstructed ancestral gene structures for these two families have eight introns each, four of which are homologous. Mapping of intron distributions on the phylogenetic tree reveals that each intron has been lost many times independently. Most of these introns were lost individually, which might best be explained by precise in-frame deletions involving nonhomologous recombination between short direct repeats at their termini. [Alignment of the putatively functional proteins in the str and stl families is available from Pfam (http://genome.wustl.edu/Pfam); alignments of all translations are available at http://cshl.org/gr; alignments of the genes are available from the author at hughrobe@uiuc.edu]

Journal Article•DOI•

A map of 75 human ribosomal protein genes

Naoya Kenmochi¹, Tomoko Kawaguchi, Steve Rozen, Elizabeth Davis, Nathan Goodman, Thomas J. Hudson, Tatsuo Tanaka, David C. Page•Institutions (1)

Massachusetts Institute of Technology¹

TL;DR: This map provides a foundation for the study of the possible roles of ribosomal protein deficiencies in chromosomal and Mendelian disorders.

Abstract: We mapped 75 genes that collectively encode >90% of the proteins found in human ribosomes. Because localization of ribosomal protein genes (rp genes) is complicated by the existence of processed pseudogenes, multiple strategies were devised to identify PCR-detectable sequence-tagged sites (STSs) at introns. In some cases we exploited specific, pre-existing information about the intron/exon structure of a given human rp gene or its homolog in another vertebrate. When such information was unavailable, selection of PCR primer pairs was guided by general insights gleaned from analysis of all mammalian rp genes whose intron/exon structures have been published. For many genes, PCR amplification of introns was facilitated by use of YAC pool DNAs rather than total human genomic DNA as templates. We then assigned the rp gene STSs to individual human chromosomes by typing human‐rodent hybrid cell lines. The genes were placed more precisely on the physical map of the human genome by typing of radiation hybrids or screening YAC libraries. Fifty-one previously unmapped rp genes were localized, and 24 previously reported rp gene localizations were confirmed, refined, or corrected. Though functionally related and coordinately expressed, the 75 mapped genes are widely dispersed: Both sex chromosomes and at least 20 of the 22 autosomes carry one or more rp genes. Chromosome 19, known to have a high gene density, contains an unusually large number of rp genes (12). This map provides a foundation for the study of the possible roles of ribosomal protein deficiencies in chromosomal and Mendelian disorders. [The sequence data described in this paper have been submitted to GenBank. They are listed in Table 1.]

Journal Article•DOI•

Sequencing Multimegabase-Template DNA with BigDye Terminator Chemistry

Cheryl R. Heiner¹, Kathryn L. Hunkapiller, Shiaw-Min Chen, John I. Glass, Ellson Y. Chen•Institutions (1)

Applied Biosystems¹

TL;DR: Using the recently introduced BigDye terminators, large-template DNA can be directly sequenced with custom primers on automated instruments without additional manipulations of template DNA, thereby bypassing tedious subcloning steps.

Abstract: In microbial genome or large-insert clone sequencing projects that use the predominant random subclone sequencing strategy, progress tends to decrease dramatically at late stages as one confronts gaps. At these points, DNA is under-represented or unstable in subclones (E.Y. Chen et al. 1996; Chissoe et al. 1997). Further sequencing with additional random subclones is then inefficient at best, and one must frequently employ alternative cloning systems or additional methods like long-range PCR to recover missing DNA (C.N. Chen et al. 1996). The variability of performance of these methods and the necessity for custom-tailored work tend to hamper the late stages of sequencing efforts. In contrast, if one can sequence directly from genomic DNA (or large-insert clones such as BACs or PACs) with walking primers, cumbersome work to fill gaps could be completed in a much shorter time. As an example, in a recent project to sequence the 750-kb genome of Ureaplasma urealyticum (J. Glass, in prep.), assemblage of ∼13,000 sequence reads and combinatorial PCR reactions to join contigs left two gaps. No λ, pUC, or M13 subclones were recovered that spanned the gaps, nor were PCR products derived with any of several sets of flanking primers. The difficulty of cloning these segments is probably attributable to repeated sequences in and near the two gaps, but the high sensitivity of the recently introduced BigDye terminator (Rosenblum et al. 1997) permitted direct sequencing of the gap regions on genomic U. urealyticum DNA templates. Using the conditions described in this report, two gaps of 259 and 121 bp were sequenced from both strands with walking primers to complete the project of 751,723 bp. Direct sequencing was further tested for larger templates, and good results were reproducibly obtained with 1.2-Mb Mycoplasma fermentans, 2.3-Mb Streptococcus pneumoniae, and 4.6-Mb Escherichia coli genomic DNA (see example in Fig. 1). In addition, several difficult gaps in sequencing projects with BAC clones, ranging in size from 140 to 250 kb, have also been filled in this manner. Essentially the method is applicable whenever 2–3 μg of high-quality large-template DNA is available.

Figure 1: Sequencing of E. coli K12 strain genomic DNA with BigDye terminators. Approximately 3 μg of E. coli DNA was sequenced with an apaG gene primer (5′-GTTCCCACACTCATTCATTA) using the conditions described in the text.

Journal Article•DOI•

Reconstruction of Amino Acid Biosynthesis Pathways from the Complete Genome Sequence

Hidemasa Bono¹, Hiroyuki Ogata¹, Susumu Goto¹, Minoru Kanehisa¹•Institutions (1)

Kyoto University¹