Showing papers on "Sequence analysis" published in 2001



Journal Article
20 Jul 2001-Science
TL;DR: A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low-guanine/cytosine Gram-positive species.
Abstract: The 2,160,837-base pair genome sequence of an isolate of Streptococcus pneumoniae, a Gram-positive pathogen that causes pneumonia, bacteremia, meningitis, and otitis media, contains 2236 predicted coding regions; of these, 1440 (64%) were assigned a biological role. Approximately 5% of the genome is composed of insertion sequences that may contribute to genome rearrangements through uptake of foreign DNA. Extracellular enzyme systems for the metabolism of polysaccharides and hexosamines provide a substantial source of carbon and nitrogen for S. pneumoniae and also damage host tissues and facilitate colonization. A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low-guanine/cytosine (GC) Gram-positive species. Several surface-exposed proteins that may serve as potential vaccine candidates were identified. Comparative genome hybridization with DNA arrays revealed strain differences in S. pneumoniae that could contribute to differences in virulence and antigenicity.

1,409 citations


Journal Article
TL;DR: It is concluded that small RNAs are much more widespread than previously imagined and that these versatile molecules may play important roles in the fine-tuning of cell responses to changing environments.

786 citations


Journal Article
TL;DR: Only 37% of the Anabaena genes showed significant sequence similarity to those of Synechocystis, indicating a high degree of divergence of the gene information between the two cyanobacterial strains.
Abstract: The nucleotide sequence of the entire genome of a filamentous cyanobacterium, Anabaena sp. strain PCC 7120, was determined. The genome of Anabaena consisted of a single chromosome (6,413,771 bp) and six plasmids, designated pCC7120α (408,101 bp), pCC7120β (186,614 bp), pCC7120γ (101,965 bp), pCC7120δ (55,414 bp), pCC7120ε (40,340 bp), and pCC7120ζ (5,584 bp). The chromosome bears 5368 potential protein-encoding genes, four sets of rRNA genes, 48 tRNA genes representing 42 tRNA species, and 4 genes for small structural RNAs. The predicted products of 45% of the potential protein-encoding genes showed sequence similarity to known and predicted proteins of known function, and 27% to translated products of hypothetical genes. The remaining 28% lacked significant similarity to genes for known and predicted proteins in the public DNA databases. More than 60 genes involved in various processes of heterocyst formation and nitrogen fixation were assigned to the chromosome based on their similarity to the reported genes. One hundred and ninety-five genes coding for components of two-component signal transduction systems, nearly 2.5 times as many as those in Synechocystis sp. PCC 6803, were identified on the chromosome. Only 37% of the Anabaena genes showed significant sequence similarity to those of Synechocystis, indicating a high degree of divergence of the gene information between the two cyanobacterial strains.

699 citations


Journal Article
TL;DR: A comparative analysis of the structural relationships among vertebrate PTP domains is presented and a comprehensive resource for sequence analysis of phosphotyrosine-specific PTPs is provided.
Abstract: With the current access to the whole genomes of various organisms and the completion of the first draft of the human genome, there is a strong need for a structure-function classification of protein families as an initial step in moving from DNA databases to a comprehensive understanding of human biology. As a result of the explosion in nucleic acid sequence information and the concurrent development of methods for high-throughput functional characterization of gene products, the genomic revolution also promises to provide a new paradigm for drug discovery, enabling the identification of molecular drug targets in a significant number of human diseases. This molecular view of diseases has contributed to the importance of combining primary sequence data with three-dimensional structure and has increased the awareness of computational homology modeling and its potential to elucidate protein function. In particular, when important proteins or novel therapeutic targets are identified—like the family of protein tyrosine phosphatases (PTPs) (reviewed in reference 53)—a structure-function classification of such protein families becomes an invaluable framework for further advances in biomedical science. Here, we present a comparative analysis of the structural relationships among vertebrate PTP domains and provide a comprehensive resource for sequence analysis of phosphotyrosine-specific PTPs.

690 citations


Journal Article
TL;DR: This method successfully distinguished rRNA gene sequence libraries from soil and bioreactors and correctly failed to find differences between libraries of the same composition.
Abstract: To determine the significance of differences between clonal libraries of environmental rRNA gene sequences, differences between homologous coverage curves, CX(D), and heterologous coverage curves, CXY(D), were calculated by a Cramer-von Mises-type statistic and compared by a Monte Carlo test procedure. This method successfully distinguished rRNA gene sequence libraries from soil and bioreactors and correctly failed to find differences between libraries of the same composition.

648 citations
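
The coverage-curve comparison described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a precomputed pairwise distance matrix, and it assumes that coverage at distance D means the fraction of sequences in library X that have a neighbour within D, either in X itself (homologous) or in library Y (heterologous). All function names are hypothetical.

```python
import random

def coverage(dist, rows, cols, d, exclude_self):
    """Fraction of sequences in `rows` with a neighbour in `cols` within distance d."""
    covered = 0
    for i in rows:
        if any(dist[i][j] <= d for j in cols if not (exclude_self and i == j)):
            covered += 1
    return covered / len(rows)

def delta_c(dist, x, y, thresholds):
    """Cramer-von Mises-type statistic: summed squared gap between
    homologous coverage C_X(D) and heterologous coverage C_XY(D)."""
    return sum(
        (coverage(dist, x, x, d, True) - coverage(dist, x, y, d, False)) ** 2
        for d in thresholds
    )

def monte_carlo_p(dist, x, y, thresholds, n_perm=999, seed=0):
    """Shuffle sequences between the two libraries and count how often the
    shuffled statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = delta_c(dist, x, y, thresholds)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if delta_c(dist, px, py, thresholds) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `x` and `y` are lists of row indices into `dist`, and `thresholds` is the grid of distances D at which the two curves are compared. A small p-value suggests the libraries differ in composition. The choice of grid and of permutation count is left to the user in this sketch.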


Journal Article
20 Jul 2001-Virology
TL;DR: The collective information on WSSV and the phylogenetic analysis on the viral DNA polymerase suggest that WSSV differs profoundly from all presently known viruses and that it is a representative of a new virus family.

582 citations


Journal Article
TL;DR: The first complete genome sequence of a marine invertebrate virus, white spot bacilliform virus (WSBV), is reported; sequence analysis indicates that WSBV differs from all known viruses, although a few genes display a weak homology to herpesvirus genes.
Abstract: We report the first complete genome sequence of a marine invertebrate virus. White spot bacilliform virus (WSBV; or white spot syndrome virus) is a major shrimp pathogen with a high mortality rate and a wide host range. Its double-stranded circular DNA genome of 305,107 bp contains 181 open reading frames (ORFs). Nine homologous regions containing 47 repeated minifragments that include direct repeats, atypical inverted repeat sequences, and imperfect palindromes were identified. This is the largest animal virus that has been completely sequenced. Although WSBV is morphologically similar to insect baculovirus, the two viruses are not detectably related at the amino acid level. Rather, some WSBV genes are more homologous to eukaryotic genes than viral genes. In fact, sequence analysis indicates that WSBV differs from all known viruses, although a few genes display a weak homology to herpesvirus genes. Most of the ORFs encode proteins that bear no homology to any known proteins, either suggesting that WSBV represents a novel class of viruses or perhaps implying a significant evolutionary distance between marine and terrestrial viruses. The most unique feature of WSBV is the presence of an intact collagen gene, a gene encoding an extracellular matrix protein of animal cells that has never been found in any viruses. Determination of the genome of WSBV will facilitate a better understanding of the molecular mechanism underlying the pathogenesis of the WSBV virus and will also provide useful information concerning the evolution and divergence of marine and terrestrial animal viruses at the molecular level.

542 citations


Journal Article
04 Jan 2001-Oncogene
TL;DR: A novel sequence, designated ASPL, was found fused in-frame to TFE3 exon 4 (type 1 fusion) or exon 3 (type 2 fusion) in ASPS; RT-PCR detected the ASPL-TFE3 fusion transcript in all 12 cases, establishing the utility of this assay in the diagnosis of ASPS and supporting ASPL-TFE3 as the oncogenically significant fusion product.
Abstract: Alveolar soft part sarcoma (ASPS) is an unusual tumor with highly characteristic histopathology and ultrastructure, controversial histogenesis, and enigmatic clinical behavior. Recent cytogenetic studies have identified a recurrent der(17) due to a non-reciprocal t(X;17)(p11.2;q25) in this sarcoma. To define the interval containing the Xp11.2 break, we first performed FISH on ASPS cases using YAC probes for OATL1 (Xp11.23) and OATL2 (Xp11.21), and cosmid probes from the intervening genomic region. This localized the breakpoint to a 160 kb interval. The prime candidate within this previously fully sequenced region was TFE3, a transcription factor gene known to be fused to translocation partners on 1 and X in some papillary renal cell carcinomas. Southern blotting using a TFE3 genomic probe identified non-germline bands in several ASPS cases, consistent with rearrangement and possible fusion of TFE3 with a gene on 17q25. Amplification of the 5' portion of cDNAs containing the 3' portion of TFE3 in two different ASPS cases identified a novel sequence, designated ASPL, fused in-frame to TFE3 exon 4 (type 1 fusion) or exon 3 (type 2 fusion). Reverse transcriptase PCR using a forward primer from ASPL and a TFE3 exon 4 reverse primer detected an ASPL-TFE3 fusion transcript in all ASPS cases (12/12: 9 type 1, 3 type 2), establishing the utility of this assay in the diagnosis of ASPS. Using appropriate primers, the reciprocal fusion transcript, TFE3-ASPL, was detected in only one of 12 cases, consistent with the non-reciprocal nature of the translocation in most cases, and supporting ASPL-TFE3 as its oncogenically significant fusion product. ASPL maps to chromosome 17, is ubiquitously expressed, and matches numerous ESTs (Unigene cluster Hs.84128) but no named genes. 
The ASPL cDNA open reading frame encodes a predicted protein of 476 amino acids that contains within its carboxy-terminal portion a UBX-like domain that shows significant similarity to predicted proteins of unknown function in several model organisms. The ASPL-TFE3 fusion replaces the N-terminal portion of TFE3 by the fused ASPL sequences, while retaining the TFE3 DNA-binding domain, implicating transcriptional deregulation in the pathogenesis of this tumor, consistent with the biology of several other translocation-associated sarcomas. Oncogene (2001) 20, 48-57.

524 citations


Journal Article
TL;DR: A comparative sequence analysis algorithm detects novel structural RNA genes by testing the pattern of substitutions observed in a pairwise alignment of two homologous sequences; the results suggest that this approach detects noncoding RNA genes with a fair degree of reliability.
Abstract: Background: Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive.

515 citations


Journal Article
TL;DR: The results illustrate that ARM and HEAT-repeat proteins, while having a common phylogenetic origin, have since diverged significantly; evolutionary scenarios that could account for the great diversity of repeats observed are discussed.

Journal Article
TL;DR: RT-PCR analysis showed that all genes, including seg and sei, belong to an operon, designated the enterotoxin gene cluster (egc), identifying egc as a putative nursery of enterotoxin genes.
Abstract: The recently described staphylococcal enterotoxins (SE) G and I were originally identified in two separate strains of Staphylococcus aureus. We have previously shown that the corresponding genes seg and sei are present in S. aureus in tandem orientation, on a 3.2-kb DNA fragment (Jarraud, J. et al. 1999. J. Clin. Microbiol. 37:2446-2449). Sequence analysis of seg-sei intergenic DNA and flanking regions revealed three enterotoxin-like open reading frames related to seg and sei, designated sek, sel, and sem, and two pseudogenes, psi ent1 and psi ent2. RT-PCR analysis showed that all these genes, including seg and sei, belong to an operon, designated the enterotoxin gene cluster (egc). Recombinant SEG, SEI, SEK, SEL, and SEM showed superantigen activity, each with a specific V beta pattern. Distribution studies of genes encoding superantigens in clinical S. aureus isolates showed that most strains harbored such genes and in particular the enterotoxin gene cluster, whatever the disease they caused. Phylogenetic analysis of enterotoxin genes indicated that they all potentially derived from this cluster, identifying egc as a putative nursery of enterotoxin genes.

Journal Article
TL;DR: The TIGR Gene Indices are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes.
Abstract: While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.

Journal Article
TL;DR: In this review, several genomic approaches that are being used to identify regulatory sequences in mammalian genomes are highlighted.
Abstract: With the continuing accomplishments of the human genome project, high-throughput strategies to identify DNA sequences that are important in mammalian gene regulation are becoming increasingly feasible. In contrast to the historic, labour-intensive, wet-laboratory methods for identifying regulatory sequences, many modern approaches are heavily focused on the computational analysis of large genomic data sets. Data from inter-species genomic sequence comparisons and genome-wide expression profiling, integrated with various computational tools, are poised to contribute to the decoding of genomic sequence and to the identification of those sequences that orchestrate gene regulation. In this review, we highlight several genomic approaches that are being used to identify regulatory sequences in mammalian genomes.

Journal Article
TL;DR: Results indicate that within an apparently homogeneous population, as determined by macroscale comparison and nucleotide sequence analysis, remarkable genetic differences exist among single-colony isolates of H. pylori.
Abstract: Isolates of the gastric pathogen Helicobacter pylori harvested from different individuals are highly polymorphic. Strain variation also has been observed within a single host. To more fully ascertain the extent of H. pylori genetic diversity within the ecological niche of its natural host, we harvested additional isolates of the sequenced H. pylori strain J99 from its human source patient after a 6-year interval. Randomly amplified polymorphic DNA PCR and DNA sequencing of four unlinked loci indicated that these isolates were closely related to the original strain. In contrast, microarray analysis revealed differences in genetic content among all of the isolates that were not detected by randomly amplified polymorphic DNA PCR or sequence analysis. Several ORFs from loci scattered throughout the chromosome in the archival strain did not hybridize with DNA from the recent strains, including multiple ORFs within the J99 plasticity zone. In addition, DNA from the recent isolates hybridized with probes for ORFs specific for the other fully sequenced H. pylori strain 26695, including a putative traG homolog. Among the additional J99 isolates, patterns of genetic diversity were distinct both when compared with each other and to the original prototype isolate. These results indicate that within an apparently homogeneous population, as determined by macroscale comparison and nucleotide sequence analysis, remarkable genetic differences exist among single-colony isolates of H. pylori. Direct evidence that H. pylori has the capacity to lose and possibly acquire exogenous DNA is consistent with a model of continuous microevolution within its cognate host.

Journal Article
TL;DR: For many years, sequencing of the 16S ribosomal RNA (rRNA) gene has served as an important tool for determining phylogenetic relationships between bacteria; the features of this molecular target that make it a useful phylogenetic tool also make it useful for bacterial detection and identification in the clinical laboratory.

Journal Article
TL;DR: Environmental sequence data and TM7-specific FISH analysis indicate that members of the TM7 division are present in a variety of terrestrial, aquatic, and clinical habitats, and a highly atypical 16S rRNA base substitution suggests that TM7 bacteria, like Archaea, may be streptomycin resistant at the ribosome level.
Abstract: A molecular approach was used to investigate a recently described candidate division of the domain Bacteria, TM7, currently known only from environmental 16S ribosomal DNA sequence data. A number of TM7-specific primers and probes were designed and evaluated. Fluorescence in situ hybridization (FISH) of a laboratory scale bioreactor using two independent TM7-specific probes revealed a conspicuous sheathed-filament morphotype, fortuitously enriched in the reactor. Morphologically, the filament matched the description of the Eikelboom morphotype 0041-0675 widely associated with bulking problems in activated-sludge wastewater treatment systems. Transmission electron microscopy of the bioreactor sludge demonstrated that the sheathed-filament morphotype had a typical gram-positive cell envelope ultrastructure. Therefore, TM7 is only the third bacterial lineage recognized to have gram-positive representatives. TM7-specific FISH analysis of two full-scale wastewater treatment plant sludges, including the one used to seed the laboratory scale reactor, indicated the presence of a number of morphotypes, including sheathed filaments. TM7-specific PCR clone libraries prepared from the two full-scale sludges yielded 23 novel TM7 sequences. Three subdivisions could be defined based on these data and publicly available sequences. Environmental sequence data and TM7-specific FISH analysis indicate that members of the TM7 division are present in a variety of terrestrial, aquatic, and clinical habitats. A highly atypical base substitution (Escherichia coli position 912; C to U) for bacterial 16S rRNAs was present in almost all TM7 sequences, suggesting that TM7 bacteria, like Archaea, may be streptomycin resistant at the ribosome level.

Journal Article
TL;DR: The identification and cloning of all functional human odorant receptor genes is an important initial step in understanding receptor-ligand specificity and combinatorial encoding of odorant stimuli in human olfaction.
Abstract: The mammalian olfactory apparatus is able to recognize and distinguish thousands of structurally diverse volatile chemicals. This chemosensory function is mediated by a very large family of seven-transmembrane olfactory (odorant) receptors encoded by approximately 1,000 genes, the majority of which are believed to be pseudogenes in humans. The strategy of our sequence database mining for full-length, functional candidate odorant receptor genes was based on the high overall sequence similarity and presence of a number of conserved sequence motifs in all known mammalian odorant receptors as well as the absence of introns in their coding sequences. We report here the identification and physical cloning of 347 putative human full-length odorant receptor genes. Comparative sequence analysis of the predicted gene products allowed us to identify and define a number of consensus sequence motifs and structural features of this vast family of receptors. A new nomenclature for human odorant receptors based on their chromosomal localization and phylogenetic analysis is proposed. We believe that these sequences represent the essentially complete repertoire of functional human odorant receptors. The identification and cloning of all functional human odorant receptor genes is an important initial step in understanding receptor-ligand specificity and combinatorial encoding of odorant stimuli in human olfaction.

Journal Article
TL;DR: A PCR-based assay allows the detection of staphylococci at the genus level by targeting the tuf gene, which encodes the elongation factor Tu; the amplified sequences showed sufficient interspecies polymorphism to generate genus- and species-specific capture probes.
Abstract: We have developed a PCR-based assay which allows the detection of staphylococci at the genus level by targeting the tuf gene, which encodes the elongation factor Tu. Degenerate PCR primers derived from consensus regions of several tuf genes were used to amplify a target region of 884 bp from 11 representative staphylococcal species. Subsequently, the entire nucleotide sequence of these amplicons was determined. The analysis of a multiple alignment of these sequences revealed regions conserved among staphylococci but distinct from those of other gram-positive bacteria genetically related to staphylococci. PCR primers complementary to these regions could amplify specifically and efficiently a DNA fragment of 370 bp for all of 27 different staphylococcal species tested. There was no amplification with genomic DNA prepared from 53 nonstaphylococcal species tested to verify the specificity of the assay (20 gram positive and 33 gram negative). Furthermore, this assay amplified efficiently all 27 American Type Culture Collection (ATCC) staphylococcal reference strains as well as 307 clinical isolates of staphylococci from the Quebec City region. Analysis of the multiple sequence alignment for the 884-bp fragment for the 11 staphylococcal species as well as comparison of the sequences for the 370-bp amplicon from five unrelated ATCC and clinical strains for each of the species S. aureus, S. epidermidis, S. haemolyticus, S. hominis, and S. saprophyticus demonstrated sufficient interspecies polymorphism to generate genus- and species-specific capture probes. This sequence information allowed the development of Staphylococcus-specific and species-specific (targeting S. aureus, S. epidermidis, S. haemolyticus, S. hominis, or S. saprophyticus) capture probes hybridizing to the 370-bp amplicon. In conclusion, this PCR assay is suitable for detection of staphylococci at both genus and species levels.

Journal Article
TL;DR: It is concluded that the functional homology of human KIR and mouse Ly49 genes arose by convergent evolution, and that NK receptor immunogenetics has interesting parallels with the major histocompatibility complex (MHC), in which some of the polymorphic genes are ligands for NK molecules.
Abstract: The two sets of inhibitory and activating natural killer (NK) receptor genes belong either to the Ig or to the C-type lectin superfamilies. Both are extensive and diverse, comprising genes of varying degrees of relatedness, indicative of a process of iterative duplication. We have constructed gene maps to help understand how and when NK receptor genes developed and the nature of their polymorphism. A cluster of over 15 C-type lectin genes, the natural killer complex is located on human chromosome 12p13.1, syntenic with a region in mouse that borders multiple Ly49 loci. The equivalent locus in man is occupied by a single pseudogene, LY49L. The immunoglobulin superfamily of loci, the leukocyte receptor complex (LRC), on chromosome 19q13.4, contains many polymorphic killer cell immunoglobulin-like receptor (KIR) genes as well as multiple related sequences. These include immunoglobulin-like transcript (ILT) (or leukocyte immunoglobulin-like receptor genes), leukocyte-associated inhibitory receptor genes (LAIR), NKp46, Fc alphaR and the platelet glycoprotein receptor VI locus, which encodes a collagen-binding molecule. KIRs are expressed mostly on NK cells and some T cells. The other LRC loci are more widely expressed. Further centromeric of the LRC are sets of additional loci with weak sequence similarity to the KIRs, including the extensive CD66(CEA) and Siglec families. The LRC-syntenic region in mice contains no orthologues of KIRs. Some of the KIR genes are highly polymorphic in terms of sequence as well as for presence/absence of genes on different haplotypes. Some anchor loci, such as KIR2DL4, are present on most haplotypes. A few ILT loci, such as ILT5 and ILT8, are polymorphic, but only ILT6 exhibits presence/absence variation. This knowledge of the genomic organisation of the extensive NK superfamilies underpins efforts to understand the functions of the encoded NK receptor molecules. 
It leads to the conclusion that the functional homology of human KIR and mouse Ly49 genes arose by convergent evolution. NK receptor immunogenetics has interesting parallels with the major histocompatibility complex (MHC) in which some of the polymorphic genes are ligands for NK molecules. There are hints of an ancient genetic relationship between NK receptor genes and MHC-paralogous regions on chromosomes 1, 9 and 19. The picture that emerges from both complexes is of eternal evolutionary restlessness, presumably in response to resistance to disease.

Journal Article
TL;DR: To best evaluate the taxonomic status of NTM species submitted to the authors' reference laboratory, a 16S rRNA sequence database was created by sequencing 121 American Type Culture Collection strains encompassing 92 species of mycobacteria; 122 clinical NTM strains were then evaluated against this database and against the freely available Ribosomal Differentiation of Medical Microorganisms (RIDOM) service.
Abstract: The use of the 16S rRNA gene for identification of nontuberculous mycobacteria (NTM) provides a faster and better ability to accurately identify them in addition to contributing significantly in the discovery of new species. Despite their associated problems, many rely on the use of public sequence databases for sequence comparisons. To best evaluate the taxonomic status of NTM species submitted to our reference laboratory, we have created a 16S rRNA sequence database by sequencing 121 American Type Culture Collection strains encompassing 92 species of mycobacteria, and have also included chosen unique mycobacterial sequences from public sequence repositories. In addition, the Ribosomal Differentiation of Medical Microorganisms (RIDOM) service has made freely available on the Internet mycobacterial identification by 16S rRNA analysis. We have evaluated 122 clinical NTM species using our database, comparing >1,400 bp of the 16S gene, and the RIDOM database, comparing ∼440 bp. The breakdown of analysis was as follows: 61 strains had a sequence with 100% similarity to the type strain of an established species, 19 strains showed a 1- to 5-bp divergence from an established species, 11 strains had sequences corresponding to uncharacterized strain sequences in public databases, and 31 strains represented unique sequences. Our experience with analysis of the 16S rRNA gene of patient strains has shown that clear-cut results are not the rule. As many clinical, research, and environmental laboratories currently employ 16S-based identification of bacteria, including mycobacteria, a freely available quality-controlled database such as that provided by RIDOM is essential to accurately identify species or detect true sequence variations leading to the discovery of new species.
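
The breakdown reported above (exact matches, 1- to 5-bp divergence, and more distant sequences) rests on counting base differences against a reference sequence. A minimal sketch of that bookkeeping follows. It assumes the query and reference are already aligned to equal length and that a gap character counts as a difference. The function names and category labels are hypothetical, and the 5-bp cutoff is taken only from the wording of the abstract.

```python
def count_differences(query, reference):
    """Number of aligned positions at which the two sequences differ."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for q, r in zip(query, reference) if q != r)

def percent_similarity(query, reference):
    """Percentage of aligned positions that are identical."""
    diffs = count_differences(query, reference)
    return 100.0 * (len(query) - diffs) / len(query)

def classify(query, reference, max_divergence=5):
    """Sort a query into one of three illustrative categories."""
    diffs = count_differences(query, reference)
    if diffs == 0:
        return "identical to reference"
    if diffs <= max_divergence:
        return "minor divergence"
    return "no close match"
```

In practice the reference would be the closest entry found in a curated database, and the alignment step, omitted here, does most of the work.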

Journal Article
TL;DR: The results confirm that the sodA gene constitutes a highly discriminative target sequence for differentiating closely related bacterial species, and demonstrate the usefulness of this method for rapid and accurate species identification of CNS isolates, although it does not allow discrimination of subspecies.
Abstract: Simple PCR and sequencing assays that utilize a single pair of degenerate primers were used to characterize a 429-bp-long DNA fragment internal (sodAint) to the sodA gene encoding the manganese-dependent superoxide dismutase in 40 coagulase-negative staphylococcal (CNS) type strains. The topology of the phylogenetic tree obtained was in general agreement with that which was inferred from an analysis of their 16S rRNA or hsp60 gene sequences. Sequence analysis revealed that the staphylococcal sodA genes exhibit a higher divergence than does the corresponding 16S ribosomal DNA. These results confirm that the sodA gene constitutes a highly discriminative target sequence for differentiating closely related bacterial species. Clinical isolates that could not be identified at the species level by phenotypical tests were identified by use of this database. These results demonstrate the usefulness of this method for rapid and accurate species identification of CNS isolates, although it does not allow discrimination of subspecies. The sodA sequence polymorphisms observed with staphylococcal species offer good opportunities for the development of assays based on DNA chip technologies.

Journal Article
TL;DR: The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs and iterative searches can be conducted to enrich collections of homologous RNAs.

Journal Article
TL;DR: This study gives an example of how much the selection of different variable regions combined with different specificities of the flanking “universal” primers can affect a PCR-based microbial community analysis.
Abstract: Genetic profiling techniques of microbial communities based on PCR-amplified signature genes, such as denaturing gradient gel electrophoresis or single-strand-conformation polymorphism (SSCP) analysis, are normally done with PCR products of less than 500-bp. The most common target for diversity analysis, the small-subunit rRNA genes, however, are larger, and thus, only partial sequences can be analyzed. Here, we compared the results obtained by PCR targeting different variable (V) regions (V2 and V3, V4 and V5, and V6 to V8) of the bacterial 16S rRNA gene with primers hybridizing to evolutionarily conserved flanking regions. SSCP analysis of single-stranded PCR products generated from 13 different bacterial species showed fewer bands with products containing V4-V5 (average, 1.7 bands per organism) than with V2-V3 (2.2 bands) and V6-V8 (2.3 bands). We found that the additional bands (>1 per organism) were caused by intraspecies operon heterogeneities or by more than one conformation of the same sequence. Community profiles, generated by PCR-SSCP from bacterial-cell consortia extracted from rhizospheres of field-grown maize (Zea mays), were analyzed by cloning and sequencing of the dominant bands. A total of 48 sequences could be attributed to 34 different strains from 10 taxonomical groups. Independent of the primer pairs, we found proteobacteria (α, β, and γ subgroups) and members of the genus Paenibacillus (low G+C gram-positive) to be the dominant organisms. Other groups, however, were only detected with single primer pairs. This study gives an example of how much the selection of different variable regions combined with different specificities of the flanking “universal” primers can affect a PCR-based microbial community analysis.

Journal ArticleDOI
01 May 2001-RNA
TL;DR: The authors' western blot assays confirmed the presence of specific antibodies to a new HCV antigen encoded, at least in part, in an alternate reading frame (ARF) overlapping the core-encoding region.
Abstract: Many viruses have overlapping genes and/or regions in which a nucleic acid signal is embedded in a coding sequence. To search for dual-use regions in the hepatitis C virus (HCV), we developed a facile computer-based sequence analysis method to map dual-use regions in coding sequences. Eight diverse full-length HCV RNA and polyprotein sequences were aligned and analyzed. A cluster of unusually conserved synonymous codons was found in the core-encoding region, indicating a potential overlapping open reading frame (ORF). Four peptides (A1, A2, A3, and A4) representing this alternate reading frame protein (ARFP), two others from the HCV core protein, and one from bovine serum albumin (BSA) were conjugated to BSA and used in western blots to test sera for specific antibodies from 100 chronic HCV patients, 44 healthy controls, and 60 patients with non-HCV liver disease. At a 1:20,000 dilution, specific IgGs to three of the four ARFP peptides were detected in chronic HCV sera. Reactivity to either the A1 or A3 peptides (both ARFP derived) was significantly associated with chronic HCV infection, when compared to non-HCV liver disease serum samples (10/100 versus 1/60; p < 0.025). Antibodies to A4 were not detected in any serum sample. Our western blot assays confirmed the presence of specific antibodies to a new HCV antigen encoded, at least in part, in an alternate reading frame (ARF) overlapping the core-encoding region. Because this novel HCV protein stimulates specific immune responses, it has potential value in diagnostic tests and as a component of vaccines. This protein is predicted to be highly basic and may play a role in HCV replication, pathogenesis, and carcinogenesis.
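
The abstract does not describe the authors' program, so the following is only a minimal sketch of the general idea of scanning an alignment for conserved synonymous codons. It assumes the input is a list of in-frame, gap-free coding sequences of equal length; the function name `codon_column_labels` and the three labels are hypothetical and not taken from the paper.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_column_labels(aligned_seqs):
    """Label each codon column of an in-frame DNA alignment.

    'identical'     : every sequence has the same codon
    'synonymous'    : codons differ but all encode the same amino acid
    'nonsynonymous' : at least two different amino acids (or an unknown codon)
    """
    length = len(aligned_seqs[0])
    labels = []
    for i in range(0, length - length % 3, 3):
        codons = {s[i:i + 3].upper() for s in aligned_seqs}
        aminos = {CODON_TABLE.get(c, "X") for c in codons}
        if len(codons) == 1:
            labels.append("identical")
        elif len(aminos) == 1 and "X" not in aminos:
            labels.append("synonymous")
        else:
            labels.append("nonsynonymous")
    return labels


# Toy alignment (hypothetical data, not HCV sequences).
toy = ["ATGCTTAAA", "ATGCTCAAG", "ATGCTGGAA"]
labels = codon_column_labels(toy)
```

A long run of 'identical' columns at positions where synonymous changes would normally be tolerated is the kind of signal the abstract refers to; a real analysis would also need a statistical baseline, which this sketch omits.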

Journal ArticleDOI
TL;DR: Using cross-species techniques, cloned, sequenced, and characterized equine melanocortin-1-receptor (MC1R) and agouti-signaling-protein (ASIP) and completed a partial sequence of tyrosinase-related protein 1 (TYRP1).
Abstract: Coat color genetics, when successfully adapted and applied to different mammalian species, provides a good demonstration of the powerful concept of comparative genetics. Using cross-species techniques, we have cloned, sequenced, and characterized equine melanocortin-1-receptor (MC1R) and agouti-signaling-protein (ASIP), and completed a partial sequence of tyrosinase-related protein 1 (TYRP1). The coding sequences and parts of the flanking regions of those genes were systematically analyzed in 40 horses and mutations typed in a total of 120 horses. Our panel represented 22 different horse breeds, including 11 different coat colors of Equus caballus. The comparison of a 1721-bp genomic fragment of MC1R among the 11 coat color phenotypes revealed no sequence difference apart from the known chestnut allele (C901T). In particular, no dominant black (E^D) mutation was found. In a 4994-bp genomic fragment covering the three putative exons, two introns and parts of the 5′- and 3′-UTRs of ASIP, two intronic base substitutions (SNP-A845G and C2374A), a point mutation in the 3′-UTR (A4734G), and an 11-bp deletion in exon 2 (ADEx2) were detected. The deletion was found to be homozygous and completely associated with horse recessive black coat color (A^a/A^a) in 24 black horses out of 9 different breeds from our panel. The frameshift initiated by ADEx2 is believed to alter the regular coding sequence, acting as a loss-of-function ASIP mutation. In TYRP1 a base substitution was detected in exon 2 (C189T), causing a threonine to methionine change of yet unknown function, and an SNP (A1188G) was found in intron 2.

Journal ArticleDOI
TL;DR: Comparative amino acid sequence alignments revealed that PEDV is most closely related to human coronavirus (HCoV)-229E and transmissible gastroenteritis virus (TGEV) and less related to murine hepatitis virus (MHV) and infectious bronchitis virus (IBV).
Abstract: The sequence of the replicase gene of porcine epidemic diarrhoea virus (PEDV) has been determined. This completes the sequence of the entire genome of strain CV777, which was found to be 28,033 nucleotides (nt) in length (excluding the poly A-tail). A cloning strategy, which involves primers based on conserved regions in the predicted ORF1 products from other coronaviruses whose genome sequence has been determined, was used to amplify the equivalent, but as yet unknown, sequence of PEDV. Primary sequences derived from these products were used to design additional primers resulting in the amplification and sequencing of the entire ORF1 of PEDV. Analysis of the nucleotide sequences revealed a small open reading frame (ORF) located near the 5′ end (nt 99–137), and two large, slightly overlapping ORFs, ORF1a (nt 297–12650) and ORF1b (nt 12605–20641). The ORF1a and ORF1b sequences overlapped at a potential ribosomal frameshift site. The amino acid sequence analysis suggested the presence of several functional motifs within the putative ORF1 protein. By analogy to other coronavirus replicase gene products, three protease and one growth factor-like motif were seen in ORF1a, and one polymerase domain, one metal ion-binding domain, and one helicase motif could be assigned within ORF1b. Comparative amino acid sequence alignments revealed that PEDV is most closely related to human coronavirus (HCoV)-229E and transmissible gastroenteritis virus (TGEV) and less related to murine hepatitis virus (MHV) and infectious bronchitis virus (IBV). These results thus confirm and extend the findings from sequence analysis of the structural genes of PEDV.
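
The ORF1a and ORF1b coordinates quoted in the abstract can be checked with a little arithmetic. The sketch below assumes the coordinates are 1-based and inclusive (the abstract does not say so explicitly); the helper names `orf_summary` and `overlap_nt` are hypothetical.

```python
def orf_summary(start, end):
    """Summarise an ORF given 1-based inclusive nucleotide coordinates."""
    length = end - start + 1
    return {
        "length_nt": length,
        "whole_codons": length % 3 == 0,  # True if the span is a multiple of 3
        "frame": (start - 1) % 3,         # reading frame 0, 1, or 2
    }


def overlap_nt(a, b):
    """Number of nucleotides shared by two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Coordinates as given in the abstract.
orf1a = (297, 12650)
orf1b = (12605, 20641)

summary_1a = orf_summary(*orf1a)
summary_1b = orf_summary(*orf1b)
shared = overlap_nt(orf1a, orf1b)
# Frame of ORF1b relative to ORF1a, expressed as -1, 0, or +1.
relative_shift = (summary_1b["frame"] - summary_1a["frame"] + 1) % 3 - 1
```

Under that assumption the two ORFs share 46 nt and ORF1b lies in the -1 frame relative to ORF1a, which is consistent with the overlap at a potential ribosomal frameshift site described above.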

Journal ArticleDOI
TL;DR: Overall, these results suggest that lysogenic conversion is a major mechanism driving the evolution of Salmonella bacteria.
Abstract: Gene transfer between separate lineages of a bacterial pathogen can promote recombinational divergence and the emergence of new pathogenic variants. Temperate bacteriophages, by virtue of their ability to carry foreign DNA, are potential key players in this process. Our previous work has shown that representative strains of Salmonella typhimurium (LT2, ATCC14028 and SL1344) are lysogenic for two temperate bacteriophages: Gifsy-1 and Gifsy-2. Several lines of evidence suggested that both elements carry genes that contribute to Salmonella virulence. One such gene, on the Gifsy-2 prophage, codes for the [Cu, Zn] superoxide dismutase SodCI. Other putative pathogenicity determinants were uncovered more recently. These include genes for known or presumptive type III-translocated proteins and a locus, duplicated on both prophages, showing sequence similarity to a gene involved in Salmonella enteropathogenesis (pipA). In addition to Gifsy-1 and Gifsy-2, each of the above strains was found to harbour a specific set of prophages also carrying putative pathogenicity determinants. A phage released from strain LT2 and identified as phage Fels-1 carries the nanH gene and a novel sodC gene, which was named sodCIII. Strain ATCC14028 releases a lambdoid phage, named Gifsy-3, which contains the phoP/phoQ-activated pagJ gene and the gene for the secreted leucine-rich repeat protein SspH1. Finally, a phage specifically released from strain SL1344 was identified as SopEPhi. Most phage-associated loci transferred efficiently between Salmonella strains of the same or different serovars. Overall, these results suggest that lysogenic conversion is a major mechanism driving the evolution of Salmonella bacteria.

Journal ArticleDOI
TL;DR: Surprisingly, several of the genes previously reported to be essential for a self-replicating minimal cell are missing in the M. pulmonis genome although this one is larger than the other mycoplasma genomes fully sequenced until now.
Abstract: Mycoplasma pulmonis is a wall-less eubacterium belonging to the Mollicutes (trivial name, mycoplasmas) and responsible for murine respiratory diseases. The genome of strain UAB CTIP is composed of a single circular 963 879 bp chromosome with a G + C content of 26.6 mol%, i.e. the lowest reported among bacteria, Ureaplasma urealyticum apart. This genome contains 782 putative coding sequences (CDSs) covering 91.4% of its length and a function could be assigned to 486 CDSs whilst 92 matched the gene sequences of hypothetical proteins, leaving 204 CDSs without significant database match. The genome contains a single set of rRNA genes and only 29 tRNA genes. The replication origin oriC was localized by sequence analysis and by using the G + C skew method. Sequence polymorphisms within stretches of repeated nucleotides generate phase-variable protein antigens whilst a recombinase gene is likely to catalyse the site-specific DNA inversions in major M. pulmonis surface antigens. Furthermore, a hemolysin, secreted nucleases and a glyco-protease are predicted virulence factors. Surprisingly, several of the genes previously reported to be essential for a self-replicating minimal cell are missing in the M. pulmonis genome although this one is larger than the other mycoplasma genomes fully sequenced until now.
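
The G + C skew method mentioned in the abstract can be illustrated with a short sketch. This is a generic version, not the authors' procedure: it computes (G − C)/(G + C) in non-overlapping windows and reports where the cumulative skew is lowest, which in many bacterial genomes lies near the replication origin. The window size, the function names, and the treatment of the chromosome as linear rather than circular are all assumptions made for illustration.

```python
def gc_skew_windows(seq, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C contributes a skew of zero.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew_minimum(seq, window=1000):
    """Return the 0-based position at which the cumulative GC skew is lowest.

    The position is the end of the window that brings the running sum
    to its minimum, i.e. a coarse candidate location for the origin.
    """
    total = 0.0
    best_total = float("inf")
    best_pos = 0
    for i, skew in enumerate(gc_skew_windows(seq, window)):
        total += skew
        if total < best_total:
            best_total = total
            best_pos = (i + 1) * window
    return best_pos


# Toy sequence (hypothetical): a C-rich half followed by a G-rich half.
toy_genome = "C" * 50 + "G" * 50
candidate_origin = cumulative_skew_minimum(toy_genome, window=10)
```

On a real genome the window size trades resolution against noise, and the result would normally be cross-checked against sequence features such as DnaA boxes, as the abstract's mention of sequence analysis suggests.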

Journal ArticleDOI
TL;DR: A defined template mixture of seven closely related 16S rDNA clones was used in a PCR-cloning experiment to assess and track sources of artifactual sequence variation in 16S rDNA clone libraries and may partially explain the high degree of microheterogeneity typical of sequence clusters detected in environmental clone libraries.
Abstract: A defined template mixture of seven closely related 16S rDNA clones was used in a PCR-cloning experiment to assess and track sources of artifactual sequence variation in 16S rDNA clone libraries. At least 14% of the recovered clones contained aberrations. Artifact sources were polymerase errors, a mutational hot spot, and cloning of heteroduplexes and chimeras. These data may partially explain the high degree of microheterogeneity typical of sequence clusters detected in environmental clone libraries.