scispace - formally typeset

Showing papers on "Sequence analysis" published in 1999


Journal Article
TL;DR: A new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size is presented and its ability to detect tandem repeats that have undergone extensive mutational change is demonstrated.
Abstract: A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm’s speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human β T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.

6,577 citations
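The tandem-repeat paper above models a repeat by the percent identity between adjacent pattern copies. That idea can be illustrated with a minimal Python sketch.

This is a simplified illustration under stated assumptions, not the published algorithm:

- Each copy is compared only with the copy immediately before it.
- Identity is ungapped (Hamming), so indels are ignored.
- A fixed threshold, the hypothetical `min_identity` parameter, replaces the paper's statistically based recognition criteria.
- The candidate period range must be supplied, unlike the published method.

```python
def find_tandem_repeats(seq, min_period=1, max_period=6, min_identity=0.8, min_copies=2):
    """Return (start, period, copies) tuples for runs of approximate tandem copies.

    Simplified illustration, not the published algorithm: each copy is compared
    only with the copy immediately before it, using ungapped (Hamming) identity,
    so indels between copies are not modelled.
    """
    seq = seq.upper()
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + 2 * period <= len(seq):
            copies, j = 1, i
            # Extend the run while each next window matches the previous one closely enough.
            while j + 2 * period <= len(seq):
                prev = seq[j:j + period]
                nxt = seq[j + period:j + 2 * period]
                identity = sum(a == b for a, b in zip(prev, nxt)) / period
                if identity < min_identity:
                    break
                copies += 1
                j += period
            if copies >= min_copies:
                hits.append((i, period, copies))
                i = j + period  # resume scanning after the last reported copy
            else:
                i += 1
    return hits


# Hypothetical CAG-repeat fragment with one substituted copy (CTG).
print(find_tandem_repeats("CAGCAGCTGCAG", 3, 3, min_identity=0.6))  # [(0, 3, 4)]
```

With the default threshold of 0.8, the substituted CTG copy breaks the run after two copies. This illustrates why a tolerant identity model matters for repeats that have undergone mutational change.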


Journal Article
TL;DR: The results clearly delineate the technical boundaries of current approaches for quantitative analysis of protein expression and reveal that quantitative mRNA transcript data alone are insufficient to predict protein expression levels.
Abstract: The description of the state of a biological system by the quantitative measurement of the system constituents is an essential but largely unexplored area of biology. With recent technical advances including the development of differential display-PCR (21), of cDNA microarray and DNA chip technology (20, 27), and of serial analysis of gene expression (SAGE) (34, 35), it is now feasible to establish global and quantitative mRNA expression profiles of cells and tissues in species for which the sequence of all the genes is known. However, there is emerging evidence which suggests that mRNA expression patterns are necessary but are by themselves insufficient for the quantitative description of biological systems. This evidence includes discoveries of posttranscriptional mechanisms controlling the protein translation rate (15), the half-lives of specific proteins or mRNAs (33), and the intracellular location and molecular association of the protein products of expressed genes (32). Proteome analysis, defined as the analysis of the protein complement expressed by a genome (26), has been suggested as an approach to the quantitative description of the state of a biological system by the quantitative analysis of protein expression profiles (36). Proteome analysis is conceptually attractive because of its potential to determine properties of biological systems that are not apparent by DNA or mRNA sequence analysis alone. Such properties include the quantity of protein expression, the subcellular location, the state of modification, and the association with ligands, as well as the rate of change with time of such properties. In contrast to the genomes of a number of microorganisms (for a review, see reference 11) and the transcriptome of Saccharomyces cerevisiae (35), which have been entirely determined, no proteome map has been completed to date. 
The most common implementation of proteome analysis is the combination of two-dimensional gel electrophoresis (2DE) (isoelectric focusing-sodium dodecyl sulfate [SDS]-polyacrylamide gel electrophoresis) for the separation and quantitation of proteins with analytical methods for their identification. 2DE permits the separation, visualization, and quantitation of thousands of proteins reproducibly on a single gel (18, 24). By itself, 2DE is strictly a descriptive technique. The combination of 2DE with protein analytical techniques has added the possibility of establishing the identities of separated proteins (1, 2) and thus, in combination with quantitative mRNA analysis, of correlating quantitative protein and mRNA expression measurements of selected genes. The recent introduction of mass spectrometric protein analysis techniques has dramatically enhanced the throughput and sensitivity of protein identification to a level which now permits the large-scale analysis of proteins separated by 2DE. The techniques have reached a level of sensitivity that permits the identification of essentially any protein that is detectable in the gels by conventional protein staining (9, 29). Current protein analytical technology is based on the mass spectrometric generation of peptide fragment patterns that are idiotypic for the sequence of a protein. Protein identity is established by correlating such fragment patterns with sequence databases (10, 22, 37). Sophisticated computer software (8) has automated the entire process such that proteins are routinely identified with no human interpretation of peptide fragment patterns. In this study, we have analyzed the mRNA and protein levels of a group of genes expressed in exponentially growing cells of the yeast S. cerevisiae. Protein expression levels were quantified by metabolic labeling of the yeast proteins to a steady state, followed by 2DE and liquid scintillation counting of the selected, separated protein species. 
Separated proteins were identified by in-gel tryptic digestion of spots with subsequent analysis by microspray liquid chromatography-tandem mass spectrometry (LC-MS/MS) and sequence database searching. The corresponding mRNA transcript levels were calculated from SAGE frequency tables (35). This study, for the first time, explores a quantitative comparison of mRNA transcript and protein expression levels for a relatively large number of genes expressed in the same metabolic state. The resultant correlation is insufficient for prediction of protein levels from mRNA transcript levels. We have also compared the relative amounts of protein and mRNA with the respective codon bias values for the corresponding genes. This comparison indicates that codon bias by itself is insufficient to accurately predict either the mRNA or the protein expression levels of a gene. In addition, the results demonstrate that only highly expressed proteins are detectable by 2DE separation of total cell lysates and that therefore the construction of complete proteome maps with current technology will be very challenging, irrespective of the type of organism.

3,947 citations
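A common way to quantify the mRNA/protein comparison described above is a correlation coefficient between the two sets of per-gene levels. The sketch below computes a Pearson correlation on log10-transformed values.

It rests on these assumptions:

- The log transform and the use of Pearson's r are choices made for illustration, not necessarily the statistics used in the study.
- All input values must be positive.
- The numbers are hypothetical.

The squared coefficient r² is the fraction of variance in (log) protein level that a linear fit on (log) mRNA level would explain. A moderate r therefore still leaves much of the protein variation unaccounted for.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def log_correlation(mrna_levels, protein_levels):
    """Correlate log10-transformed levels; every value must be positive."""
    return pearson([math.log10(v) for v in mrna_levels],
                   [math.log10(v) for v in protein_levels])


# Hypothetical per-gene values (mRNA copies per cell, protein copies per cell).
mrna = [2.0, 5.0, 12.0, 40.0, 150.0]
protein = [1.0e3, 8.0e4, 3.0e4, 2.0e5, 9.0e5]
r = log_correlation(mrna, protein)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```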


Journal Article
TL;DR: Using intermediate sequences to find links between more distant families was almost as successful as direct comparison: pairs were predicted to be homologous when their respective sequence families had proteins in common. All findings are applicable to automatic database searches.
Abstract: Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.

1,679 citations


Journal Article
TL;DR: Small subunit rRNA sequence data were generated for 27 strains of cyanobacteria and incorporated into a phylogenetic analysis of 1,377 aligned sequence positions, finding all plastids cluster as a strongly supported monophyletic group arising near the root of the cyanobacterial line of descent.
Abstract: Small subunit rRNA sequence data were generated for 27 strains of cyanobacteria and incorporated into a phylogenetic analysis of 1,377 aligned sequence positions from a diverse sampling of 53 cyanobacteria and 10 photosynthetic plastids. Tree inference was carried out using a maximum likelihood method with correction for site-to-site variation in evolutionary rate. Confidence in the inferred phylogenetic relationships was determined by construction of a majority-rule consensus tree based on alternative topologies not considered to be statistically significantly different from the optimal tree. The results are in agreement with earlier studies in the assignment of individual taxa to specific sequence groups. Several relationships not previously noted among sequence groups are indicated, whereas other relationships previously supported are contradicted. All plastids cluster as a strongly supported monophyletic group arising near the root of the cyanobacterial line of descent.

1,301 citations
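The cyanobacterial study above summarises its alternative topologies as a majority-rule consensus tree. The core of that step can be sketched in Python.

The sketch makes these assumptions:

- The alternative topologies are already available.
- Each tree is rooted and written as a collection of clades.
- Each clade is a frozenset of taxon names.
- Taxon labels are hypothetical.
- It does not reproduce the maximum likelihood search or the significance testing used to pick the topologies.

```python
from collections import Counter


def majority_rule_consensus(trees, threshold=0.5):
    """Return {clade: support} for clades present in more than `threshold` of the trees.

    Each tree is an iterable of clades; each clade is a frozenset of taxon
    names (rooted trees assumed). With the default threshold of 0.5, every
    retained clade occurs in a majority of trees, so the retained clades are
    mutually compatible and form a valid tree.
    """
    n = len(trees)
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))  # count each clade at most once per tree
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Hypothetical taxa A-D and three alternative topologies.
ab, abc, abd = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
trees = [[ab, abc], [ab, abd], [ab, abc]]
for clade, support in majority_rule_consensus(trees).items():
    print(sorted(clade), round(support, 2))
```

Here {A, B} is retained with support 1.0 and {A, B, C} with support 0.67. The conflicting clade {A, B, D}, found in only one of three trees, is dropped.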


Journal Article
Ian Dunham, Nobuyoshi Shimizu, Bruce A. Roe, S. Chissoe, +220 more (15 institutions)
02 Dec 1999-Nature
TL;DR: The sequence of the euchromatic part of human chromosome 22 is reported, which consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.
Abstract: Knowledge of the complete genomic DNA sequence of an organism allows a systematic approach to defining its genetic components. The genomic sequence provides access to the complete structures of all genes, including those without known function, their control elements, and, by inference, the proteins they encode, as well as all other biologically important sequences. Furthermore, the sequence is a rich and permanent source of information for the design of further biological studies of the organism and for the study of evolution through cross-species sequence comparison. The power of this approach has been amply demonstrated by the determination of the sequences of a number of microbial and model organisms. The next step is to obtain the complete sequence of the entire human genome. Here we report the sequence of the euchromatic part of human chromosome 22. The sequence obtained consists of 12 contiguous segments spanning 33.4 megabases, contains at least 545 genes and 134 pseudogenes, and provides the first view of the complex chromosomal landscapes that will be found in the rest of the genome.

1,075 citations


Journal Article
TL;DR: The HIV RT and Protease Sequence Database is an on-line relational database that catalogues evolutionary and drug-related human immunodeficiency virus reverse transcriptase (RT) and protease sequence variation.
Abstract: The HIV RT and Protease Sequence Database is an online relational database that catalogs evolutionary and drug-related human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease sequence variation (http://hivdb.stanford.edu ). The database contains a compilation of nearly all published HIV RT and protease sequences including International Collaboration database submissions (e.g., GenBank) and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. The database is curated and sequences are annotated with data from >230 literature references. Users can retrieve additional data and view alignments of sequence sets meeting specific criteria (e.g., treatment history, subtype, presence of a particular mutation). A gene-specific sequence analysis program, new user-defined queries and nearly 2000 additional sequences were added in 1999.

980 citations
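The HIV database entry above describes retrieving sequence sets that meet criteria such as subtype, treatment history or presence of a particular mutation. A hypothetical Python sketch of that kind of query follows. It uses the conventional mutation notation of reference residue, position and variant residue (for example, I3V).

It assumes:

- Protein sequences are already aligned to a gap-free reference.
- Gaps in the query are skipped.
- The record fields (`id`, `subtype`, `treated`, `seq`), the function names and the toy reference fragment are all hypothetical. They do not describe the database's actual schema or query interface.

```python
def list_mutations(reference, query, first_position=1):
    """List substitutions in `query` relative to `reference`, e.g. 'I3V'.

    Both sequences must already be aligned to the same length; the reference is
    assumed gap-free, and gap ('-') positions in the query are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for offset, (ref_aa, query_aa) in enumerate(zip(reference, query)):
        if query_aa != "-" and query_aa != ref_aa:
            mutations.append(f"{ref_aa}{offset + first_position}{query_aa}")
    return mutations


def select_isolates(records, reference, mutation, **criteria):
    """Return ids of records that match every metadata criterion and carry `mutation`."""
    selected = []
    for record in records:
        if any(record.get(key) != value for key, value in criteria.items()):
            continue
        if mutation in list_mutations(reference, record["seq"]):
            selected.append(record["id"])
    return selected


# Hypothetical records and a toy 6-residue reference fragment.
reference = "PQITLW"
records = [
    {"id": "iso1", "subtype": "B", "treated": True, "seq": "PQVTLW"},
    {"id": "iso2", "subtype": "C", "treated": True, "seq": "PQVTLW"},
    {"id": "iso3", "subtype": "B", "treated": False, "seq": "PQITLW"},
]
print(select_isolates(records, reference, "I3V", subtype="B", treated=True))  # ['iso1']
```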


Journal Article
TL;DR: The usage of the newer macrolides has increased dramatically over the last few years, which has led to increased exposure of bacterial populations to macrolides; in addition, the nomenclature for the rRNA methylase (erm) genes that confer macrolide resistance has varied and has been inconsistent.
Abstract: Macrolides are 14-membered (erythromycin and clarithromycin), 15-membered (azithromycin), or 16-membered (josamycin, spiramycin, and tylosin) lactones to which amino and/or neutral sugars are attached via glycosidic bonds. Erythromycin was introduced in 1952 as the first macrolide antibiotic. Unfortunately, within a year, erythromycin-resistant (Emr) staphylococci from the United States, Europe, and Japan were described (101). Erythromycin is produced by Saccharopolyspora erythraea, while the newer macrolides are semisynthetic molecules with substitutions on the lactone. The newer derivatives, such as clarithromycin and azithromycin, have improved intracellular and tissue penetration, are more stable, are better absorbed, have a lower incidence of gastrointestinal side effects, and are less likely to interact with other drugs. They are usable against a wider range of infectious bacteria, such as Legionella, Chlamydia, Haemophilus, and some Mycobacterium species (not M. tuberculosis), and their pharmacokinetics provide for less frequent dosing than erythromycin (21, 47, 96, 97). As a result, the usage of the newer macrolides has increased dramatically over the last few years, which has led to increased exposure of bacterial populations to macrolides (101–103, 107). Macrolides inhibit protein synthesis by stimulating dissociation of the peptidyl-tRNA molecule from the ribosomes during elongation (101, 103). This results in chain termination and a reversible stoppage of protein synthesis. The first mechanism of macrolide resistance described was due to posttranscriptional modification of the 23S rRNA by adenine-N6 methyltransferases (101–103). These enzymes add one or two methyl groups to a single adenine (A2058 in Escherichia coli) in the 23S rRNA moiety. Over the last 30 years, a number of adenine-N6-methyltransferases from different species, genera, and isolates have been described.
In general, genes encoding these methylases have been designated erm (erythromycin ribosome methylation), although there are exceptions, especially in the antibiotic-producing organisms (see Tables 1 and 3) (103). As the number of erm genes described has grown, the nomenclature for these genes has varied and has been inconsistent (Table 1). In some cases, unrelated genes have been given the same letter designation, while in other cases, highly related genes (>90% identity) have been given different names.
TABLE 1 rRNA methylase genes involved in MLSB resistance
TABLE 3 Location of antibiotic resistance genes
The binding site in the 50S ribosomal subunit for erythromycin overlaps the binding site of the newer macrolides, as well as the structurally unrelated lincosamides and streptogramin B antibiotics. The modification by methylase(s) reduces the binding of all three classes of antibiotics, which results in resistance against macrolides, lincosamides, and streptogramin B antibiotics (MLSB). The rRNA methylases are the best studied among macrolide resistance mechanisms (47, 101–103). However, a variety of other mechanisms have been described which also confer resistance (Table 2). Many of these alternative mechanisms of resistance confer resistance to only one or two of the antibiotic classes of the MLSB complex.
TABLE 2 Efflux and inactivating genes
In this review, we suggest a new nomenclature for naming MLS genes and propose to use the rules developed for identifying and naming new tetracycline resistance genes (51, 52). This system, with a few recent modifications, was originally designed because of the ability of two genes to be distinguished uniquely by DNA-DNA probe methodology (51). It was generally found that two genes with <80% amino acid sequence identity provided enough variability in nucleotide sequence to permit distinct probes to be designed.
Although many investigators are likely to sequence new genes, probe technology allows rapid identification of isolates containing potentially new genes and provides a reliable way to screen populations and determine the frequency of any one resistance determinant. Therefore, we continued this paradigm by assigning two genes of ≥80% amino acid identity to the same class and same letter designation, while two genes that show ≤79% amino acid identity are given a different letter designation. Table 1 shows the results of the classification, with some classes having members with little variability, while others, like classes A and O, show a greater range of homology at both the DNA and amino acid levels. As new gene sequences emerge, ideally they will need to be compared by oligonucleotide probe hybridization and/or sequence analysis against the bank of known genes before a new designation is assigned. If multiple genes are available in any one class, especially when there is a range as in class A, then all representative members of the class should be examined, not just one. To confirm that the proposed name or number for the newly discovered resistance determinant has not been used by another investigator, please contact M. C. Roberts for this information. A similar request has been made for new tet genes (52).

846 citations
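The macrolide nomenclature rule above has two parts: a new gene joins an existing class if it shares at least 80% amino acid identity with it, and it must be checked against all representative members of a class, not just one. Both parts can be sketched in Python.

The sketch assumes:

- The sequences are already aligned.
- Identity is computed over gap-free columns only, which is one of several possible identity definitions.
- The function names and toy sequences are hypothetical.
- It does not replace the probe hybridization or the registry check with M. C. Roberts described in the text.

```python
def percent_identity(a, b):
    """Percent identity of two aligned, equal-length protein sequences.

    Columns with a gap ('-') in either sequence are excluded (an assumption;
    other identity definitions exist).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def assign_class(new_seq, classes, cutoff=80.0):
    """Return the letter of the class whose best-matching member reaches `cutoff`%
    identity, or None if no class qualifies (a candidate for a new letter).

    `classes` maps a letter to a list of aligned member sequences; every member
    is compared, not just one representative.
    """
    best_letter, best_identity = None, -1.0
    for letter, members in classes.items():
        top = max(percent_identity(new_seq, member) for member in members)
        if top >= cutoff and top > best_identity:
            best_letter, best_identity = letter, top
    return best_letter


# Hypothetical, pre-aligned toy sequences.
classes = {"A": ["MKLLVV", "MKLIVV"], "B": ["QQQQQQ"]}
print(assign_class("MKLIVA", classes))  # prints A (83% to the second member, 67% to the first)
```

Had only the first class A member been checked, the new sequence would have fallen below the cutoff. It would then have been wrongly flagged for a new letter.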


Journal Article
01 Sep 1999-Genetics
TL;DR: The results show that Drosophila genes have a wide range of sensitivity to inactivation by P elements, provide a rationale for greatly expanding the BDGP primary collection based entirely on insertion site sequencing, and predict that this approach can bring >85% of all Drosophila open reading frames under experimental control.
Abstract: A fundamental goal of genetics and functional genomics is to identify and mutate every gene in model organisms such as Drosophila melanogaster. The Berkeley Drosophila Genome Project (BDGP) gene disruption project generates single P-element insertion strains that each mutate unique genomic open reading frames. Such strains strongly facilitate further genetic and molecular studies of the disrupted loci, but it has remained unclear if P elements can be used to mutate all Drosophila genes. We now report that the primary collection has grown to contain 1045 strains that disrupt more than 25% of the estimated 3600 Drosophila genes that are essential for adult viability. Of these P insertions, 67% have been verified by genetic tests to cause the associated recessive mutant phenotypes, and the validity of most of the remaining lines is predicted on statistical grounds. Sequences flanking >920 insertions have been determined to exactly position them in the genome and to identify 376 potentially affected transcripts from collections of EST sequences. Strains in the BDGP collection are available from the Bloomington Stock Center and have already assisted the research community in characterizing >250 Drosophila genes. The likely identity of 131 additional genes in the collection is reported here. Our results show that Drosophila genes have a wide range of sensitivity to inactivation by P elements, and provide a rationale for greatly expanding the BDGP primary collection based entirely on insertion site sequencing. We predict that this approach can bring >85% of all Drosophila open reading frames under experimental control.

808 citations


Journal Article
TL;DR: High levels of expression in the S1-M1-80 cells and in the human breast cancer subline, MCF-7 AdVp3000, are consistent with the identification of a new ATP binding cassette transporter, which is overexpressed in mitoxantrone-resistant cells.
Abstract: Reports of multiple distinct mitoxantrone-resistant sublines without overexpression of P-glycoprotein or the multidrug-resistance associated protein have raised the possibility of the existence of another major transporter conferring drug resistance. In the present study, a cDNA library from mitoxantrone-resistant S1-M1-80 human colon carcinoma cells was screened by differential hybridization. Two cDNAs of different lengths were isolated and designated MXR1 and MXR2. Sequencing revealed a high degree of homology between the cDNAs and Expressed Sequence Tag sequences previously identified as belonging to an ATP binding cassette transporter. The predicted amino acid sequence showed homology to the Drosophila white gene and its homologues. Northern analysis using either cDNA as a probe demonstrated high levels of expression in the S1-M1-80 cells and in the human breast cancer subline, MCF-7 AdVp3000. Levels were lower in earlier steps of selection and in partial revertants. The gene is amplified 10- to 12-fold in the MCF-7 AdVp3000 cells, but not in the S1-M1-80 cells. These studies are consistent with the identification of a new ATP binding cassette transporter, which is overexpressed in mitoxantrone-resistant cells.

806 citations


Journal Article
16 Dec 1999-Nature
TL;DR: The sequence of chromosome 2 from the Columbia ecotype is reported in two gap-free assemblies (contigs) of 3.6 and 16 megabases, the latter of which represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date.
Abstract: Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.

792 citations


Journal Article
TL;DR: Coimmunoprecipitation experiments indicate that these HDAC proteins are not components of the previously identified HDAC1 and HDAC2 NRD and mSin3A complexes; however, HDAC4 and HDAC5 associate with HDAC3 in vivo, which suggests that the human class II HDAC enzymes may function in cellular processes distinct from those of HDAC1 and HDAC2.
Abstract: Gene expression is in part controlled by chromatin remodeling factors and the acetylation state of nucleosomal histones. The latter process is regulated by histone acetyltransferases and histone deacetylases (HDACs). Previously, three human and five yeast HDAC enzymes had been identified. These can be categorized into two classes: the first class represented by yeast Rpd3-like proteins and the second by yeast Hda1-like proteins. Human HDAC1, HDAC2, and HDAC3 proteins are members of the first class, whereas no class II human HDAC proteins had been identified. The amino acid sequence of Hda1p was used to search the GenBank/expressed sequence tag databases to identify partial sequences from three putative class II human HDAC proteins. The corresponding full-length cDNAs were cloned and defined as HDAC4, HDAC5, and HDAC6. These proteins possess certain features present in the conserved catalytic domains of class I human HDACs, but also contain additional sequence domains. Interestingly, HDAC6 contains an internal duplication of two catalytic domains, which appear to function independently of each other. These class II HDAC proteins have differential mRNA expression in human tissues and possess in vitro HDAC activity that is inhibited by trichostatin A. Coimmunoprecipitation experiments indicate that these HDAC proteins are not components of the previously identified HDAC1 and HDAC2 NRD and mSin3A complexes. However, HDAC4 and HDAC5 associate with HDAC3 in vivo. This finding suggests that the human class II HDAC enzymes may function in cellular processes distinct from those of HDAC1 and HDAC2.

Journal Article
TL;DR: The nucleotide binding site (NBS) is a characteristic domain of many plant resistance gene products, and the wide distribution of NBS-encoding sequences in the plant kingdom and their prevalence in the Arabidopsis and rice genomes indicate that they are ancient, diverse and common in plants.
Abstract: The nucleotide binding site (NBS) is a characteristic domain of many plant resistance gene products. An increasing number of NBS-encoding sequences are being identified through gene cloning, PCR amplification with degenerate primers, and genome sequencing projects. The NBS domain was analyzed from 14 known plant resistance genes and more than 400 homologs, representing 26 genera of monocotyledonous, dicotyledonous and one coniferous species. Two distinct groups of diverse sequences were identified, indicating divergence during evolution and an ancient origin for these sequences. One group comprised sequences encoding an N-terminal domain with Toll/Interleukin-1 receptor homology (TIR), including the known resistance genes N, M, L6, RPP1 and RPP5. Surprisingly, this group was entirely absent from monocot species in searches of both random genomic sequences and large collections of ESTs. A second group contained monocot and dicot sequences, including the known resistance genes RPS2, RPM1, I2, Mi, Dm3, Pi-B, Xa1, RPP8, RPS5 and Prf. Amino acid signatures in the conserved motifs comprising the NBS domain clearly distinguished these two groups. The Arabidopsis genome is estimated to contain approximately 200 genes that encode related NBS motifs; TIR sequences were more abundant and outnumbered non-TIR sequences threefold. The Arabidopsis NBS sequences currently in the databases are located in approximately 21 genomic clusters and 14 isolated loci. NBS-encoding sequences may be more prevalent in rice. The wide distribution of these sequences in the plant kingdom and their prevalence in the Arabidopsis and rice genomes indicate that they are ancient, diverse and common in plants. Sequence inferences suggest that these genes encode a novel class of nucleotide-binding proteins.
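The NBS entry above relies on amino acid signatures in conserved motifs. A first step in that kind of analysis is scanning protein sequences for a motif pattern. The sketch below scans for the generic P-loop (Walker A) nucleotide-binding consensus GxxxxGK[S/T] with a regular expression.

The sketch makes these assumptions:

- This generic consensus stands in for the group-specific NBS signatures the study used. Those signatures are not given in the abstract, so the sketch does not distinguish TIR from non-TIR sequences.
- The example sequence is hypothetical.

```python
import re

# Generic P-loop (Walker A) consensus G-x(4)-G-K-[S/T]; the lookahead lets
# overlapping matches be reported.
P_LOOP = re.compile(r"(?=(G.{4}GK[ST]))")


def find_p_loops(protein):
    """Return (0-based start, matched motif) for each P-loop-like match."""
    return [(m.start(), m.group(1)) for m in P_LOOP.finditer(protein.upper())]


print(find_p_loops("MAGMGGVGKTTLAR"))  # [(2, 'GMGGVGKT')]
```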

Journal Article
TL;DR: A PCR-based approach to sequencing complete mitochondrial genomes is described along with a set of 86 primers designed primarily for avian mitochondrial DNA, which should make available a wider variety of mitochondrial genes for studies based on smaller data sets.

Journal Article
TL;DR: The presence of substantial variations across the entire genome and in sgmRNA processing indicates that PRRSV has evolved independently on separate continents, and suggests that changes in swine husbandry and management may have contributed to the emergence of PRRS.
Abstract: Porcine reproductive and respiratory syndrome virus (PRRSV) is a recently described arterivirus responsible for disease in swine worldwide. Comparative sequence analysis of 3′-terminal structural genes of the single-stranded RNA viral genome revealed the presence of two genotypic classes of PRRSV, represented by the prototype North American and European strains, VR-2332 and Lelystad virus (LV), respectively. To better understand the evolution and pathogenicity of PRRSV, we obtained the 12,066-base 5′-terminal nucleotide sequence of VR-2332, encoding the viral replication activities, and compared it to those of LV and other arteriviruses. VR-2332 and LV differ markedly in the 5′ leader and sections of the open reading frame (ORF) 1a region. The ORF 1b sequence was nearly colinear but varied in similarity of proteins encoded in identified regions. Furthermore, molecular and biochemical analysis of subgenomic mRNA (sgmRNA) processing revealed extensive variation in the number of sgmRNAs which may be generated during infection and in the lengths of noncoding sequence between leader-body junctions and the translation-initiating codon AUG. In addition, VR-2332 and LV select different leader-body junction sites from a pool of similar candidate sites to produce sgmRNA 7, encoding the viral nucleocapsid protein. The presence of substantial variations across the entire genome and in sgmRNA processing indicates that PRRSV has evolved independently on separate continents. The near-simultaneous global emergence of a new swine disease caused by divergently evolved viruses suggests that changes in swine husbandry and management may have contributed to the emergence of PRRS.
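The PRRSV entry above compares two quantities: the candidate leader-body junction sites and the length of noncoding sequence between a junction and the translation-initiating AUG. Both can be computed from a sequence with a short Python sketch.

The sketch makes these assumptions:

- The junction motif is supplied by the caller.
- The "junction end" is the base just past the motif.
- The first downstream AUG in any frame counts as the start codon.
- The example sequence and motif are hypothetical, not actual VR-2332 or LV sequence.

```python
def noncoding_length(genome, junction_end):
    """Nucleotides between a junction (0-based index just past it) and the first
    downstream AUG, or None if no AUG follows."""
    rna = genome.upper().replace("T", "U")
    start = rna.find("AUG", junction_end)
    return None if start == -1 else start - junction_end


def utr_lengths(genome, junction_motif):
    """Map each candidate junction site (0-based start of `junction_motif`) to the
    noncoding length between the end of the motif and the next AUG."""
    rna = genome.upper().replace("T", "U")
    motif = junction_motif.upper().replace("T", "U")
    sites = [i for i in range(len(rna) - len(motif) + 1) if rna.startswith(motif, i)]
    return {site: noncoding_length(rna, site + len(motif)) for site in sites}


# Hypothetical sequence and junction motif.
print(utr_lengths("GGUUAACCAAAUGCCUUAACCGAUGA", "UUAACC"))  # {2: 2, 15: 1}
```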

Journal Article
TL;DR: The results suggest that EDS1 functions upstream of salicylic acid-dependent PR1 mRNA accumulation and is not required for jasmonic acid-induced PDF1.2 mRNA expression.
Abstract: A major class of plant disease resistance (R) genes encodes leucine-rich-repeat proteins that possess a nucleotide binding site and amino-terminal similarity to the cytoplasmic domains of the Drosophila Toll and human IL-1 receptors. In Arabidopsis thaliana, EDS1 is indispensable for the function of these R genes. The EDS1 gene was cloned by targeted transposon tagging and found to encode a protein that has similarity in its amino-terminal portion to the catalytic site of eukaryotic lipases. Thus, hydrolase activity, possibly on a lipid-based substrate, is anticipated to be central to EDS1 function. The predicted EDS1 carboxyl terminus has no significant sequence homologies, although analysis of eight defective eds1 alleles reveals it to be essential for EDS1 function. Two plant defense pathways have been defined previously that depend on salicylic acid, a phenolic compound, or jasmonic acid, a lipid-derived molecule. We examined the expression of EDS1 mRNA and marker mRNAs (PR1 and PDF1.2, respectively) for these two pathways in wild-type and eds1 mutant plants after different challenges. The results suggest that EDS1 functions upstream of salicylic acid-dependent PR1 mRNA accumulation and is not required for jasmonic acid-induced PDF1.2 mRNA expression.

Journal Article
TL;DR: The CLV2 gene encodes a receptor-like protein (RLP), with a presumed extracellular domain composed of leucine-rich repeats similar to those found in plant and animal receptors, but with a very short predicted cytoplasmic tail.
Abstract: The CLAVATA2 (CLV2) gene regulates both meristem and organ development in Arabidopsis. We isolated the CLV2 gene and found that it encodes a receptor-like protein (RLP), with a presumed extracellular domain composed of leucine-rich repeats similar to those found in plant and animal receptors, but with a very short predicted cytoplasmic tail. RLPs lacking cytoplasmic signaling domains have not been previously shown to regulate development in plants. Our prior work has demonstrated that the CLV1 receptor-like kinase (RLK) is present as a disulfide-linked multimer in vivo. We report that CLV2 is required for the normal accumulation of CLV1 protein and its assembly into protein complexes, indicating that CLV2 may form a heterodimer with CLV1 to transduce extracellular signals. Sequence analysis suggests that the charged residue in the predicted transmembrane domain of CLV2 may be a common feature of plant RLPs and RLKs. In addition, the chromosomal region in which CLV2 is located contains an extremely high rate of polymorphism, with 50 nucleotide and 15 amino acid differences between Landsberg erecta and Columbia ecotypes within the CLV2 coding sequence.
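The CLV2 entry above reports separate nucleotide and amino acid difference counts between two ecotypes' coding sequences. Both counts can be obtained with a short Python sketch that translates the sequences using the standard genetic code.

The sketch makes these assumptions:

- The two coding sequences are gap-free and of equal length.
- The example sequences are hypothetical.

A synonymous change such as AAA to AAG (both Lys) adds a nucleotide difference without an amino acid difference. This is why the nucleotide count can exceed the amino acid count.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a DNA coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def count_differences(cds_a, cds_b):
    """Return (nucleotide differences, amino acid differences) between two
    gap-free coding sequences of equal length."""
    if len(cds_a) != len(cds_b):
        raise ValueError("coding sequences must have the same length")
    nt = sum(x != y for x, y in zip(cds_a.upper(), cds_b.upper()))
    aa = sum(x != y for x, y in zip(translate(cds_a), translate(cds_b)))
    return nt, aa


print(count_differences("ATGAAACTG", "ATGAAGCCG"))  # (2, 1)
```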

Journal ArticleDOI
TL;DR: The sequence homologies and putative subcellular localization of spastin suggest that this ATPase is involved in the assembly or function of nuclear protein complexes.
Abstract: Autosomal dominant hereditary spastic paraplegia (AD-HSP) is a genetically heterogeneous neurodegenerative disorder characterized by progressive spasticity of the lower limbs. Among the four loci causing AD-HSP identified so far, the SPG4 locus at chromosome 2p21–p22 has been shown to account for 40–50% of all AD-HSP families. Using a positional cloning strategy based on obtaining sequence of the entire SPG4 interval, we identified a candidate gene encoding a new member of the AAA protein family, which we named spastin. Sequence analysis of this gene in seven SPG4-linked pedigrees revealed several DNA modifications, including missense, nonsense and splice-site mutations. Both SPG4 and its mouse orthologue were shown to be expressed early and ubiquitously in fetal and adult tissues. The sequence homologies and putative subcellular localization of spastin suggest that this ATPase is involved in the assembly or function of nuclear protein complexes.

Journal ArticleDOI
TL;DR: A number of Arabidopsis Expressed Sequence Tags (ESTs) have been identified that encode gene products bearing remarkable similarity to SCR throughout their carboxyl-termini, indicating that SCR is the prototype of a novel gene family.
Abstract: Mutations at the SCARECROW (SCR) locus in Arabidopsis thaliana result in defective radial patterning in the root and shoot. The SCR gene product contains sequences which suggest that it is a transcription factor. A number of Arabidopsis Expressed Sequence Tags (ESTs) have been identified that encode gene products bearing remarkable similarity to SCR throughout their carboxyl-termini, indicating that SCR is the prototype of a novel gene family. These ESTs have been designated SCARECROW-LIKE (SCL). The gene products of the GIBBERELLIN-INSENSITIVE (GAI) and the REPRESSOR of ga1-3 (RGA) loci show high structural and sequence similarity to SCR and the SCLs. Sequence analysis of the products of the GRAS (GAI, RGA, SCR) gene family indicates that they share a variable amino-terminus and a highly conserved carboxyl-terminus that contains five recognizable motifs. The SCLs have distinct patterns of expression, but all of those analyzed show expression in the root. One of them, SCL3, has a tissue-specific pattern of expression in the root similar to SCR. The importance of the GRAS gene family in plant biology has been established by the functional analyses of SCR, GAI and RGA.

Journal ArticleDOI
TL;DR: While 18S-rDNA sequences do not always provide the taxonomic resolution to identify fungal species and strains, they do provide information on the diversity and dynamics of groups of related species in environmental samples with sufficient resolution to produce discrete bands which can be separated by TGGE.
Abstract: Like bacteria, fungi play an important role in the soil ecosystem. As only a small fraction of the fungi present in soil can be cultured, conventional microbiological techniques yield only limited information on the composition and dynamics of fungal communities in soil. DNA-based methods do not depend on the culturability of microorganisms, and therefore they offer an attractive alternative for the study of complex fungal community structures. For this purpose, we designed various PCR primers that allow the specific amplification of fungal 18S-ribosomal-DNA (rDNA) sequences, even in the presence of nonfungal 18S rDNA. DNA was extracted from the wheat rhizosphere, and 18S rDNA gene banks were constructed in Escherichia coli by cloning PCR products generated with primer pairs EF4-EF3 (1.4 kb) and EF4-fung5 (0.5 kb). Fragments of 0.5 kb from the cloned inserts were sequenced and compared to known rDNA sequences. Sequences from all major fungal taxa were amplified by using both primer pairs. As predicted by computer analysis, primer pair EF4-EF3 appeared slightly biased to amplify Basidiomycota and Zygomycota, whereas EF4-fung5 amplified mainly Ascomycota. The 61 clones that were sequenced matched the sequences of 24 different species in the Ribosomal Database Project (RDP) database. Similarity values ranged from 0.676 to 1. Temperature gradient gel electrophoresis (TGGE) analysis of the fungal community in the wheat rhizosphere of a microcosm experiment was carried out after amplification of total DNA with both primer pairs. This resulted in reproducible, distinctive fingerprints, confirming the difference in amplification specificity. Clear banding patterns were obtained with soil and rhizosphere samples by using both primer sets in combination. By comparing the electrophoretic mobility of community fingerprint bands to that of the bands obtained with separate clones, some could be tentatively identified. 
While 18S-rDNA sequences do not always provide the taxonomic resolution to identify fungal species and strains, they do provide information on the diversity and dynamics of groups of related species in environmental samples with sufficient resolution to produce discrete bands which can be separated by TGGE. This combination of 18S-rDNA PCR amplification and TGGE community analysis should allow study of the diversity, composition, and dynamics of the fungal community in bulk soil and in the rhizosphere.

Journal ArticleDOI
TL;DR: The family of zinc finger domains described here is sufficient for the construction of 17 million novel proteins that bind the 5'-(GNN)6-3' family of DNA sequences and should allow for the rapid construction of novel gene switches and provide the basis for a universal system for gene control.
Abstract: We have taken a comprehensive approach to the generation of novel DNA binding zinc finger domains of defined specificity. Herein we describe the generation and characterization of a family of zinc finger domains developed for the recognition of each of the 16 possible 3-bp DNA binding sites having the sequence 5′-GNN-3′. Phage display libraries of zinc finger proteins were created and selected under conditions that favor enrichment of sequence-specific proteins. Zinc finger domains recognizing a number of sequences required refinement by site-directed mutagenesis that was guided by both phage selection data and structural information. In many cases, residues not expected to make base-specific contacts had effects on specificity. A number of these domains demonstrate exquisite specificity and discriminate between sequences that differ by a single base with >100-fold loss in affinity. We conclude that the three helical positions −1, 3, and 6 of a zinc finger domain are insufficient to allow for the fine specificity of the DNA binding domain to be predicted. These domains are functionally modular and may be recombined with one another to create polydactyl proteins capable of binding 18-bp sequences with subnanomolar affinity. The family of zinc finger domains described here is sufficient for the construction of 17 million novel proteins that bind the 5′-(GNN)6-3′ family of DNA sequences. These materials and methods should allow for the rapid construction of novel gene switches and provide the basis for a universal system for gene control.
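The figure of 17 million follows from simple combinatorics. There is one domain for each of the 16 5'-GNN-3' triplets, and a polydactyl protein has six finger positions, so there are 16^6 possible arrays, each addressing 6 × 3 = 18 bp. The Python check below assumes that every domain is freely interchangeable at every position. Because the abstract notes that context can affect specificity, the count is an upper bound on distinct designs. It does not guarantee that each design binds as intended.

```python
from itertools import product

BASES = "ACGT"
FINGERS = 6  # six-finger (polydactyl) protein recognizing 5'-(GNN)6-3'

# One zinc finger domain per 3-bp subsite of the form 5'-GNN-3'.
gnn_sites = ["G" + x + y for x, y in product(BASES, repeat=2)]

n_sites = len(gnn_sites)          # 16 possible GNN triplets
n_proteins = n_sites ** FINGERS   # independent choice of domain at each finger
target_length = 3 * FINGERS       # base pairs spanned by the six-finger array

print(f"{n_sites} domains -> {n_proteins:,} six-finger proteins, {target_length}-bp targets")
# prints: 16 domains -> 16,777,216 six-finger proteins, 18-bp targets
```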

Journal ArticleDOI
TL;DR: A new prediction technique locating potential GPI-modification sites in precursor sequences has been applied for large-scale protein sequence database searches and has been implemented in the prototype software "big-Pi predictor" which may find application as a genome annotation and target selection tool.

Journal ArticleDOI
TL;DR: Results elucidate, for the first time, a molecular mechanism mediating phase variation in staphylococci, and demonstrate that a naturally occurring insertion sequence element is actively involved in the modulation of expression of a Staphylococcus virulence factor.
Abstract: Biofilm formation of Staphylococcus epidermidis on smooth polymer surfaces has been shown to be mediated by the ica operon. Upon activation of this operon, a polysaccharide intercellular adhesin (PIA) is synthesized that supports bacterial cell-to-cell contacts and triggers the production of thick, multilayered biofilms. Thus, the ica gene cluster represents a genetic determinant that significantly contributes to the virulence of specific Staphylococcus epidermidis strains. PIA synthesis has recently been reported to undergo phase variation. In this study, biofilm-forming Staphylococcus epidermidis strains and their PIA-negative phase variants were analysed genetically to investigate the molecular mechanisms of phase variation. We have characterized biofilm-negative variants by Southern hybridization with ica-specific probes, polymerase chain reaction and nucleotide sequencing. The data obtained in these analyses suggested that in approximately 30% of the variants the loss of biofilm formation was due to the inactivation of either the icaA or the icaC gene by the insertion of the insertion sequence element IS256. Furthermore, it was shown that the transposition of IS256 into the ica operon is a reversible process. After repeated passages of the PIA-negative insertional mutants, the biofilm-forming phenotype could be restored. Nucleotide sequence analyses of the revertants confirmed the complete excision of IS256, including the initially duplicated 8 bp target sites. These results elucidate, for the first time, a molecular mechanism mediating phase variation in staphylococci, and they demonstrate that a naturally occurring insertion sequence element is actively involved in the modulation of expression of a Staphylococcus virulence factor.
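The reversibility reported for IS256 depends on the geometry of the insertion. On insertion, the element duplicates an 8-bp target site. Precise excision then removes the element together with one copy of that duplication, which restores the original sequence. The Python sketch below models only that string-level bookkeeping, not the transposition biology. The host sequence is made up. `<IS256>` is a placeholder token, not the element's sequence. The function names are hypothetical.

```python
TSD_LEN = 8  # length of the IS256 target-site duplication reported in the abstract


def insert_element(host: str, pos: int, element: str, tsd_len: int = TSD_LEN) -> str:
    """Insert `element` at a target site starting at `pos`, duplicating the site.

    Result layout: left flank + target copy + element + target copy + right flank.
    """
    target = host[pos:pos + tsd_len]
    return host[:pos] + target + element + target + host[pos + tsd_len:]


def precise_excision(seq: str, element: str, tsd_len: int = TSD_LEN) -> str:
    """Remove the element plus one copy of the flanking direct repeat."""
    i = seq.find(element)
    if i < tsd_len:
        raise ValueError("element not found with room for a left flanking repeat")
    left = seq[i - tsd_len:i]
    right = seq[i + len(element):i + len(element) + tsd_len]
    if left != right:
        raise ValueError("flanks are not an exact target-site duplication")
    return seq[:i] + seq[i + len(element) + tsd_len:]


host = "ATGCGTACGTTAGCCATGGCA"  # made-up host sequence
mutant = insert_element(host, 5, "<IS256>")
revertant = precise_excision(mutant, "<IS256>")
print(mutant)             # ATGCGTACGTTAG<IS256>TACGTTAGCCATGGCA
print(revertant == host)  # True
```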

Journal ArticleDOI
TL;DR: Using consensus regions in gene sequences encoding the two forms of nitrite reductase (Nir), a key enzyme in the denitrification pathway, two sets of PCR primers were designed to amplify cd1- and Cu-nir.
Abstract: Using consensus regions in gene sequences encoding the two forms of nitrite reductase (Nir), a key enzyme in the denitrification pathway, we designed two sets of PCR primers to amplify cd1- and Cu-nir. The primers were evaluated by screening defined denitrifying strains, denitrifying isolates from wastewater treatment plants, and extracts from activated sludge. Sequence relationships of nir genes were also established. The cd1 primers were designed to amplify a 778 to 799-bp region of cd1-nir in the six published sequences. Likewise, the Cu primers amplified a 473-bp region in seven of the eight published Cu-nir sequences. Together, the two sets of PCR primers amplified nir genes in nine species within four genera, as well as in four of the seven sludge isolates. The primers did not amplify genes of nondenitrifying strains. The Cu primers amplified the expected fragment in all 13 sludge samples, but cd1-nir fragments were only obtained in five samples. PCR products of the expected sizes were verified as nir genes after hybridization to DNA probes, except in one case. The sequenced nir fragments were related to other nir sequences, demonstrating that the primers amplified the correct gene. The selected primer sites for Cu-nir were conserved, while broad-range primers targeting conserved regions of cd1-nir seem to be difficult to find. We also report on the existence of Cu-nir in Paracoccus denitrificans Pd1222.
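Primers derived from consensus regions commonly carry degenerate positions written in IUPAC ambiguity codes. The abstract does not give the nir primer sequences, so the primer and template below are hypothetical. The sketch screens a degenerate primer against one strand of a template. At each candidate site it counts the positions where the template base is not allowed by the primer's code. A reverse primer would be checked against the reverse complement.

```python
# IUPAC nucleotide ambiguity codes -> allowed template bases
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> int:
    """Count positions where the template base is not permitted by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def find_sites(primer: str, template: str, max_mismatches: int = 0) -> list[int]:
    """0-based start positions on this strand where the primer matches within tolerance."""
    k = len(primer)
    return [i for i in range(len(template) - k + 1)
            if mismatches(primer, template[i:i + k]) <= max_mismatches]


primer = "GCNTAYGG"                  # hypothetical degenerate primer
template = "TTGCATACGGAAGCGTACGGT"   # hypothetical template strand
print(find_sites(primer, template))  # [2, 12]
```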

Journal ArticleDOI
TL;DR: Comparative sequence analysis of amplified rpoB DNAs can be used efficiently to identify clinical isolates of mycobacteria in parallel with traditional culture methods and as a supplement to 16S rDNA gene analysis.
Abstract: The genus Mycobacterium comprises a wide range of organisms, including obligate parasites causing serious human and animal diseases, opportunistic pathogens, and saprophytic species found in nature. Human infections are caused mainly by slowly growing mycobacteria that need more than 7 days to form visible colonies on solid media. Traditionally, the definitive diagnosis of mycobacterial infections has been dependent on the isolation and identification of causative agents and requires a series of specialized physiological and biochemical tests. The procedures for these tests are complex, laborious, and usually impeded by the slow growth of mycobacteria in clinical laboratories. In particular, Mycobacterium leprae has not been cultivated in vitro. There have been increasing numbers of reports of infections caused by mycobacteria other than M. tuberculosis (MOTT), especially in association with human immunodeficiency virus infection. These are rarely disease-associated, previously unknown, or newly recognized mycobacteria that are not easy to identify; moreover, due to their phenotypic similarity to certain species, they cannot be easily characterized by the conventional methods of identification. However, mycobacterial systematics may help in the differentiation and identification of these phenotypically similar mycobacterial species. Descriptive taxonomic analyses have been used to classify mycobacterial species. For a clearer definition of species boundaries, macromolecular comparisons, particularly of 16S rRNA, which is highly conserved across organisms, have been used to determine phylogenetic relationships. Mycobacterial phylogenetic analysis based on sequences of 16S rRNA (25, 31) or the 16S rRNA gene (16S rDNA) (27) has helped to define mycobacterial species. 
This analysis demonstrated the usefulness of genotypic studies, especially when conventional procedures are inapplicable, such as for the differentiation and identification of novel and uncultivable mycobacteria (19). It has been suggested, however, that for the delineation of species boundaries, 16S rRNA-based phylogenetic analysis has its limitations (11). Ambiguous results due to the presence of two different 16S rRNA genes in a single organism would also limit the use of 16S rDNA sequencing in the identification of mycobacterial species (23, 26). Doubts about its usefulness were raised because M. kansasii, a pathogenic mycobacterium, could not be distinguished from nonpathogenic M. gastri by this means (35). A similar result was observed in 23S rRNA sequence analysis; the sequence for M. kansasii was identical to that for M. celatum (32). rpoB encodes the β subunit of RNA polymerase. The rpoB nucleotide sequences of three mycobacterial species were previously known (15, 16, 22). Missense mutations within a limited region of rpoB are known to be associated with rifampin resistance in M. tuberculosis (34). Recently, the rpoB gene was used as an alternative tool to identify mycobacteria (14). However, only a limited number of reference species (five slowly growing and five rapidly growing species) in the genus Mycobacterium were used. In the present study, rpoB DNAs (342 bp) comprising a highly conserved region throughout the eubacteria (5) were amplified from the 44 reference strains of mycobacteria. Their nucleotide sequences (306 bp) were directly determined and compared to study their phylogenetic relationships. To demonstrate the feasibility of this method, in which rpoB sequences are compared and a phylogenetic tree with reference species is inferred, the procedure was applied to clinical isolates. We suggest that this procedure is a useful identification method that can be completed within two working days.
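The study compares rpoB sequences and infers a phylogenetic tree with reference species. A simpler stand-in, which is not the authors' procedure, is to assign an isolate provisionally to the reference with the smallest uncorrected p-distance over an aligned region. The 10-bp sequences and species labels below are invented for illustration. Real use would first align the 306-bp rpoB fragments.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (name, distance) for the reference closest to the query."""
    return min(((name, p_distance(query, seq)) for name, seq in references.items()),
               key=lambda pair: pair[1])


# Invented 10-bp "aligned fragments"; not real rpoB sequences.
references = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGAAC",
    "species_C": "TCGAACGTTC",
}
print(nearest_reference("ACGTACGTAT", references))  # ('species_A', 0.1)
```

A nearest-reference call gives no measure of support. The tree-based comparison described in the abstract is better suited to isolates that fall between species.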

Journal ArticleDOI
24 Dec 1999-Science
TL;DR: High-precision genetic mapping was used to define the regions that contain centromere functions on each natural chromosome in Arabidopsis thaliana, and the DNA within the centromeres was not merely structural but also encoded several expressed genes.
Abstract: High-precision genetic mapping was used to define the regions that contain centromere functions on each natural chromosome in Arabidopsis thaliana. These regions exhibited dramatic recombinational repression and contained complex DNA surrounding large arrays of 180-base pair repeats. Unexpectedly, the DNA within the centromeres was not merely structural but also encoded several expressed genes. The regions flanking the centromeres were densely populated by repetitive elements yet experienced normal levels of recombination. The genetically defined centromeres were well conserved among Arabidopsis ecotypes but displayed limited sequence homology between different chromosomes, excluding repetitive DNA. This investigation provides a platform for dissecting the role of individual sequences in centromeres in higher eukaryotes.

Journal ArticleDOI
Klaus F. X. Mayer, C. Schüller, R. Wambutt, George Murphy +230 more (21 institutions)
16 Dec 1999-Nature
TL;DR: Analysis of 17.38 megabases of unique sequence, representing about 17% of the Arabidopsis genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements.
Abstract: The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.

Journal ArticleDOI
TL;DR: CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods.
Abstract: Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
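CRITICA's noncomparative component weighs dicodon (hexanucleotide) usage in coding frames against usage in other contexts. The sketch below illustrates only that idea. It counts in-frame hexamers in sequences assumed to be coding and counts background hexamers at every position of other sequences. It then sums per-dicodon log-odds for each reading frame, using add-one pseudocounts. This is not CRITICA's scoring function. It omits the comparative alignment evidence and the iterative retraining described in the abstract, and the training sequences are toy examples.

```python
import math
from collections import Counter


def dicodons(seq: str, frame: int) -> list[str]:
    """Overlapping hexamers that start on codon boundaries of the given frame (0, 1, 2)."""
    return [seq[i:i + 6] for i in range(frame, len(seq) - 5, 3)]


def train(coding: list[str], background: list[str]) -> tuple[Counter, Counter]:
    """In-frame hexamer counts from coding examples; all-position counts from background."""
    coding_counts = Counter(h for s in coding for h in dicodons(s, 0))
    background_counts = Counter(s[i:i + 6] for s in background for i in range(len(s) - 5))
    return coding_counts, background_counts


def frame_score(seq: str, frame: int, coding_counts: Counter,
                background_counts: Counter, pseudo: float = 1.0) -> float:
    """Sum of log(P_coding / P_background) over in-frame dicodons, with pseudocounts."""
    vocab = 4 ** 6
    n_c = sum(coding_counts.values()) + pseudo * vocab
    n_b = sum(background_counts.values()) + pseudo * vocab
    score = 0.0
    for h in dicodons(seq, frame):
        score += (math.log((coding_counts[h] + pseudo) / n_c)
                  - math.log((background_counts[h] + pseudo) / n_b))
    return score


# Toy training data; the real program derives dicodon usage iteratively from the data.
coding_counts, background_counts = train(["GCTGCTGCTGCT"], ["ATATATATATAT"])
scores = [frame_score("GCTGCTGCT", f, coding_counts, background_counts) for f in range(3)]
print(max(range(3), key=lambda f: scores[f]))  # 0
```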

Journal ArticleDOI
TL;DR: The epidemiology of tick-borne encephalitis virus was investigated by comparative sequence analysis of virus strains isolated in endemic areas of Europe and Asia and three genetic lineages could be clearly distinguished, corresponding to a European, a Far Eastern and a Siberian subtype.
Abstract: The epidemiology of tick-borne encephalitis virus was investigated by comparative sequence analysis of virus strains isolated in endemic areas of Europe and Asia. Phylogenetic relationships were determined from the nucleotide and amino acid sequences of the major envelope (E) protein of 16 newly sequenced strains and nine previously published sequences. Three genetic lineages could be clearly distinguished, corresponding to a European, a Far Eastern and a Siberian subtype. Amino acids characteristic of each of the subtypes ('signature' amino acids) were identified and their location in the atomic structure of protein E was determined. The degree of variation between strains within subtypes was low and exhibited a maximum of only 2.2% at the amino acid level. A maximum difference of 5.6% was found between the three subtypes, which is in the range of variation reported for other flaviviruses.
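'Signature' residues can be found mechanically from an alignment. A position qualifies for a subtype when all of the subtype's members share one amino acid that no strain from any other subtype carries at that position. The Python sketch below uses that assumed definition, since the abstract does not state the exact criterion. The short aligned sequences, two per group, are invented and are not TBEV E protein data.

```python
def signature_residues(groups: dict[str, list[str]]) -> dict[str, list[tuple[int, str]]]:
    """Map each group to (1-based position, residue) pairs that are invariant within
    the group and absent at that position from every sequence in the other groups."""
    result = {}
    for name, members in groups.items():
        others = [s for other, seqs in groups.items() if other != name for s in seqs]
        signatures = []
        for pos in range(len(members[0])):
            residues = {s[pos] for s in members}
            if len(residues) == 1:
                (res,) = residues
                if all(s[pos] != res for s in others):
                    signatures.append((pos + 1, res))
        result[name] = signatures
    return result


# Invented aligned fragments; not real TBEV E protein sequences.
groups = {
    "European": ["AKTLV", "AKTLI"],
    "Far Eastern": ["ARTSV", "ARTSV"],
    "Siberian": ["AKNSV", "AKNSV"],
}
print(signature_residues(groups))
# {'European': [(4, 'L')], 'Far Eastern': [(2, 'R')], 'Siberian': [(3, 'N')]}
```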

Book ChapterDOI
TL;DR: A comparative sequence analysis with 298 available receiver domain sequences of cognate response regulators demonstrates a significant correlation between kinase and regulator subfamilies, suggesting that different subclasses of His-Asp phosphorelay systems have evolved independently of one another.
Abstract: Signal transduction in microorganisms and plants is often mediated by His-Asp phosphorelay systems. Two conserved families of proteins are centrally involved: histidine protein kinases and phospho-aspartyl response regulators. The kinases generally function in association with sensory elements that regulate their activities in response to environmental signals. A sequence analysis with 348 histidine kinase domains reveals that this family consists of distinct subgroups. A comparative sequence analysis with 298 available receiver domain sequences of cognate response regulators demonstrates a significant correlation between kinase and regulator subfamilies. These findings suggest that different subclasses of His-Asp phosphorelay systems have evolved independently of one another.

Journal ArticleDOI
TL;DR: The cytochrome P450 14α-demethylase, encoded by the ERG11 (CYP51) gene, is the primary target for the azole class of antifungals; of the mutations found in the two N-terminal regions, only Y132H was demonstrated to be of importance for azole resistance.
Abstract: The cytochrome P450 14α-demethylase, encoded by the ERG11 (CYP51) gene, is the primary target for the azole class of antifungals. Changes in the azole affinity of this enzyme caused by amino acid substitutions have been reported as a resistance mechanism. Nine Candida albicans strains were used in this study. The ERG11 base sequences of seven isolates, of which only two were azole-sensitive, were determined. The ERG11 base sequences of the other two strains have been published previously. In these seven isolates, 12 different amino acid substitutions were identified, of which six have not been described previously (A149V, D153E, E165Y, S279F, V452A and G465S). In addition, 16 silent mutations were found. Two different biochemical assays, subcellular sterol biosynthesis and CO binding to reduced microsomal fractions, were used to evaluate the sensitivity of the cytochromes to fluconazole and itraconazole. Enzyme preparations from four isolates showed reduced itraconazole susceptibility, whereas more pronounced resistance to fluconazole was observed in five isolates. A three-dimensional model of C. albicans Cyp51p was used to position all 29 reported substitutions (98 substitutions in total, identified in 53 sequences). These 29 substitutions were not randomly distributed over the sequence but clustered in three regions, from amino acids 105 to 165, from 266 to 287 and from 405 to 488, suggesting the existence of hotspot regions. Of the mutations found in the two N-terminal regions, only Y132H was demonstrated to be of importance for azole resistance. In the C-terminal region, three mutations are associated with resistance, suggesting that the non-characterized substitutions found in this region should be prioritized for further analysis.
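The hotspot analysis amounts to mapping the position of each substitution onto the three reported regions. The sketch below parses substitutions written in the one-letter wild type/position/mutant notation used in the abstract. It then bins them using the abstract's boundaries (105-165, 266-287, 405-488), which it assumes are inclusive. The region labels and function names are illustrative. The example list contains only substitutions named in the abstract: four fall in region 1, one in region 2 and two in region 3.

```python
import re
from collections import Counter
from typing import Optional

# Hotspot boundaries from the abstract (amino acid positions, treated as inclusive).
HOTSPOTS = {
    "region 1": (105, 165),
    "region 2": (266, 287),
    "region 3": (405, 488),
}

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse(sub: str) -> tuple[str, int, str]:
    """Split one-letter notation such as 'Y132H' into (wild type, position, mutant)."""
    m = SUBSTITUTION.fullmatch(sub)
    if m is None:
        raise ValueError(f"not a substitution in wild type/position/mutant form: {sub!r}")
    wild_type, pos, mutant = m.groups()
    return wild_type, int(pos), mutant


def hotspot(sub: str) -> Optional[str]:
    """Name of the hotspot region containing the substitution, or None."""
    _, pos, _ = parse(sub)
    for name, (start, end) in HOTSPOTS.items():
        if start <= pos <= end:
            return name
    return None


# Substitutions named in the abstract.
named = ["Y132H", "A149V", "D153E", "E165Y", "S279F", "V452A", "G465S"]
print(Counter(hotspot(s) for s in named))
```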