Showing papers on "Sequence analysis" published in 1996


Journal ArticleDOI
25 Oct 1996-Science
TL;DR: The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration and provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history.
Abstract: The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.

4,254 citations


Journal ArticleDOI
TL;DR: The sequence determination of the entire genome of the Synechocystis sp. strain PCC6803 was completed.
Abstract: The sequence determination of the entire genome of the Synechocystis sp. strain PCC6803 was completed. The total length of the genome finally confirmed was 3,573,470 bp, including the previously reported sequence of 1,003,450 bp from map position 64% to 92% of the genome. The entire sequence was assembled from the sequences of the physical map-based contigs of cosmid clones and of lambda clones and long PCR products which were used for gap-filling. The accuracy of the sequence was guaranteed by analysis of both strands of DNA through the entire genome. The authenticity of the assembled sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA using the assembled sequence data. To predict the potential protein-coding regions, analysis of open reading frames (ORFs), analysis by the GeneMark program and similarity search to databases were performed. As a result, a total of 3,168 potential protein genes were assigned on the genome, in which 145 (4.6%) were identical to reported genes and 1,257 (39.6%) and 340 (10.8%) showed similarity to reported and hypothetical genes, respectively. The remaining 1,426 (45.0%) had no apparent similarity to any genes in databases. Among the potential protein genes assigned, 128 were related to the genes participating in photosynthetic reactions. The sum of the sequences coding for potential protein genes occupies 87% of the genome length. By adding rRNA and tRNA genes, therefore, the genome has a very compact arrangement of protein- and RNA-coding regions. A notable feature on the gene organization of the genome was that 99 ORFs, which showed similarity to transposase genes and could be classified into 6 groups, were found spread all over the genome, and at least 26 of them appeared to remain intact. The result implies that rearrangement of the genome occurred frequently during and after establishment of this species.

2,523 citations


Journal ArticleDOI
TL;DR: The cag region may encode a novel H. pylori secretion system for the export of virulence determinants, and transposon inactivation of several of the cagI genes abolishes induction of IL-8 expression in gastric epithelial cell lines.
Abstract: cagA, a gene that codes for an immunodominant antigen, is present only in Helicobacter pylori strains that are associated with severe forms of gastroduodenal disease (type I strains). We found that the genetic locus that contains cagA (cag) is part of a 40-kb DNA insertion that likely was acquired horizontally and integrated into the chromosomal glutamate racemase gene. This pathogenicity island is flanked by direct repeats of 31 bp. In some strains, cag is split into a right segment (cagI) and a left segment (cagII) by a novel insertion sequence (IS605). In a minority of H. pylori strains, cagI and cagII are separated by an intervening chromosomal sequence. Nucleotide sequencing of the 23,508 base pairs that form the cagI region and the extreme 3' end of the cagII region reveals the presence of 19 ORFs that code for proteins predicted to be mostly membrane associated with one gene (cagE), which is similar to the toxin-secretion gene of Bordetella pertussis, ptlC, and the transport systems required for plasmid transfer, including the virB4 gene of Agrobacterium tumefaciens. Transposon inactivation of several of the cagI genes abolishes induction of IL-8 expression in gastric epithelial cell lines. Thus, we believe the cag region may encode a novel H. pylori secretion system for the export of virulence determinants.

1,860 citations


Journal ArticleDOI
TL;DR: Four children with severe mycobacterial infections had a mutation in the gene for interferon-gamma receptor 1 that leads to the absence of receptors on cell surfaces and a functional defect in the up-regulation of tumor necrosis factor alpha by macrophages in response to interferon-gamma.
Abstract: BACKGROUND: Genetic differences in immune responses may affect susceptibility to mycobacterial infection, but no specific genes have been implicated in humans. We studied four children who had an unexplained genetic susceptibility to mycobacterial infection and who appeared to have inherited the same recessive mutation from a common ancestor. METHODS: We used microsatellite analysis, immunofluorescence studies, and sequence analysis to study the affected patients, unaffected family members, and normal controls. RESULTS: A genome search using microsatellite markers identified a region on chromosome 6q in which the affected children were all homozygous for eight markers. The gene for interferon-gamma receptor 1 maps to this region. Immunofluorescence studies showed that the receptor was absent on leukocytes from the affected children. Sequence analysis of complementary DNA for the gene for interferon-gamma receptor 1 revealed a point mutation at nucleotide 395 that introduces a stop codon and results in a truncated protein that lacks the transmembrane and cytoplasmic domains. CONCLUSIONS: Four children with severe mycobacterial infections had a mutation in the gene for interferon-gamma receptor 1 that leads to the absence of receptors on cell surfaces and a functional defect in the up-regulation of tumor necrosis factor alpha by macrophages in response to interferon-gamma. The interferon-gamma pathway is important in the response to intracellular pathogens such as mycobacteria.

1,178 citations


Journal ArticleDOI
TL;DR: The entire genome of the bacterium Mycoplasma pneumoniae M129 has been sequenced; a functional classification is tentatively assigned to a large number of ORFs, and the biochemical and physiological properties of this bacterium are deduced.
Abstract: The entire genome of the bacterium Mycoplasma pneumoniae M129 has been sequenced. It has a size of 816,394 base pairs with an average G+C content of 40.0 mol%. We predict 677 open reading frames (ORFs) and 39 genes coding for various RNA species. Of the predicted ORFs, 75.9% showed significant similarity to genes/proteins of other organisms while only 9.9% did not reveal any significant similarity to gene sequences in databases. This permitted us tentatively to assign a functional classification to a large number of ORFs and to deduce the biochemical and physiological properties of this bacterium. The reduction of the genome size of M. pneumoniae during its reductive evolution from ancestral bacteria can be explained by the loss of complete anabolic (e.g. no amino acid synthesis) and metabolic pathways. Therefore, M. pneumoniae depends in nature on an obligate parasitic lifestyle which requires the provision of exogenous essential metabolites. All the major classes of cellular processes and metabolic pathways are briefly described. For a number of activities/functions present in M. pneumoniae according to experimental evidence, the corresponding genes could not be identified by similarity search. For instance we failed to identify genes/proteins involved in motility, chemotaxis and management of oxidative stress.

1,136 citations


Book ChapterDOI
TL;DR: For genomic studies, it is essential to view compositional bias in the context of many types of other features, such as recognizable functional sites, transcripts, coding sequences, and homologies, which are being integrated into software packages that have graphic multilevel browsing facilities and include zoom functions.
Abstract: Publisher Summary: This chapter discusses the analysis of compositionally biased regions in sequence databases. The programs SEG and PSEG are tuned for amino acid sequences, and NSEG for nucleotide sequences. The programs can be applied to either individual sequences, including whole chromosomes if appropriate, or entire sequence databases. Compositional complexity is based only on residue composition, regardless of the patterns or periodicity of sequence repetitiveness. This contrasts with some alternative methods that use counts of k-grams to define residue patterns and clustering. Complexity, pattern, and periodicity are distinct abstract attributes of simple sequences. For genomic studies, it is essential to view compositional bias in the context of many types of other features, such as recognizable functional sites, transcripts, coding sequences, and homologies. For this purpose, the SEG family of programs is being integrated into software packages, or workbenches, that have graphic multilevel browsing facilities and include zoom functions.
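The idea that compositional complexity depends only on residue composition, not on pattern or periodicity, can be illustrated with a small sketch. This is not the SEG algorithm itself: it simply scores each sliding window by the Shannon entropy of its residue counts. The function names, the window length, and the threshold are illustrative assumptions.

```python
import math
from collections import Counter


def window_entropy(seq, window=12):
    """Shannon entropy (in bits) of the residue composition of each full window.

    Only the residue counts inside a window matter, so the order of the
    residues within the window does not change the score.
    """
    scores = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        entropy = -sum((c / window) * math.log2(c / window) for c in counts.values())
        scores.append(entropy)
    return scores


def low_complexity_windows(seq, window=12, threshold=2.2):
    """Start positions of windows whose entropy is at or below the threshold.

    The window length and threshold are illustrative assumptions, not values
    taken from the chapter.
    """
    return [i for i, h in enumerate(window_entropy(seq, window)) if h <= threshold]
```

Because only counts are used, `window_entropy("AABB", 4)` and `window_entropy("ABAB", 4)` give the same score, which is the contrast with k-gram methods that the abstract draws.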

786 citations


Journal ArticleDOI
TL;DR: The flaA gene is a single transcriptional unit, and FlaA is needed for crossing the fish integument and may play a role in virulence after invasion of the host.
Abstract: A flagellin gene from the fish pathogen Vibrio anguillarum was cloned, sequenced, and mutagenized. The DNA sequence suggests that the flaA gene encodes a 40.1-kDa protein and is a single transcriptional unit. A polar mutation and four in-frame deletion mutations (180 bp deleted from the 5' end of the gene, 153 bp deleted from the 3' end of the gene, a double deletion of both the 180- and 153-bp deletions, and 942 bp deleted from the entire gene) were made. Compared with the wild type, all mutants were partially motile, and a shortening of the flagellum was seen by electron microscopy. Wild-type phenotypes were regained when the mutations were transcomplemented with the flaA gene. Protein analysis indicated that the flaA gene corresponds to a 40-kDa protein and that the flagellum consists of three additional flagellin proteins with molecular masses of 41, 42, and 45 kDa. N-terminal sequence analysis confirmed that the additional proteins were flagellins with N termini that are 82 to 88% identical to the N terminus of FlaA. Virulence studies showed that the N-terminal deletion, the double deletion, and the 942-bp deletion increased the 50% lethal dose between 70- and 700-fold via immersion infection, whereas infection via intraperitoneal injection showed no loss in virulence. In contrast, the polar mutant and the carboxy-terminal deletion mutant showed approximately a 10^4-fold increase in the 50% lethal dose by both immersion and intraperitoneal infection. In summary, FlaA is needed for crossing the fish integument and may play a role in virulence after invasion of the host.

661 citations


Journal ArticleDOI
TL;DR: The study involved 95 M type reference GAS strains and a survey of 74 recent clinical isolates to accurately deduce emm types corresponding to the majority of the known group A streptococcal (GAS) M serotypes.
Abstract: Rapid sequence analysis of specific PCR products was used to accurately deduce emm types corresponding to the majority of the known group A streptococcal (GAS) M serotypes. The study involved 95 M type reference GAS strains and a survey of 74 recent clinical isolates. A high percentage of agreement between M type serology and the previously published 5' sequences of the emm genes of M type reference strains was noted. The 5' sequences for six established M protein genes--the emm-32, emm-34, emm-38, emm-40, emm-42, and emm-71 genes--were determined to supplement the existing emm sequence database. Rapid sequence analysis differentiated serologically M-nontypeable strains and was used to establish the probable.

607 citations


Journal ArticleDOI
TL;DR: A culture-independent survey of the soil microbial diversity in a clover-grass pasture in southern Wisconsin was conducted by sequence analysis of a universal clone library of genes coding for small-subunit rRNA (rDNA); distance matrices calculated from the sequence alignments were used to display the enormous microbial diversity found in this soil in two ways, as phylogenetic trees and as multidimensional-scaling plots.
Abstract: A culture-independent survey of the soil microbial diversity in a clover-grass pasture in southern Wisconsin was conducted by sequence analysis of a universal clone library of genes coding for small-subunit rRNA (rDNA). A rapid and efficient method for extraction of DNA from soils which resulted in highly purified DNA with minimal shearing was developed. Universal small-subunit-rRNA primers were used to amplify DNA extracted from the pasture soil. The PCR products were cloned into pGEM-T, and either hypervariable or conserved regions were sequenced. The relationships of 124 sequences to those of cultured organisms of known phylogeny were determined. Of the 124 clones sequenced, 98.4% were from the domain Bacteria. Two of the rDNA sequences were derived from eukaryotic organelles. Two of the 124 sequences were of nuclear origin, one being fungal and the other a plant sequence. No sequences of the domain Archaea were found. Within the domain Bacteria, three kingdoms were highly represented: the Proteobacteria (16.1%), the Cytophaga-Flexibacter-Bacteroides group (21.8%), and the low G+C-content gram-positive group (21.8%). Some kingdoms, such as the Thermotogales, the green nonsulfur group, Fusobacteria, and the Spirochaetes, were absent. A large number of the sequences (39.4%) were distributed among several clades that are not among the major taxa described by Olsen et al. (G.J. Olsen, C.R. Woese, and R. Overbeek, J. Bacteriol., 176:1-6, 1994). From the alignments of the sequence data, distance matrices were calculated to display the enormous microbial diversity found in this soil in two ways, as phylogenetic trees and as multidimensional-scaling plots.
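The step from aligned sequences to a distance matrix can be sketched as follows. The abstract does not say which distance measure was used, so this example assumes the simplest one, the uncorrected p-distance (the proportion of differing sites among compared positions, skipping gaps). The function names and the gap character are illustrative assumptions.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped. This uncorrected
    p-distance is an assumed measure, chosen only for illustration.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return differences / compared


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances for a list of aligned sequences."""
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(aligned[i], aligned[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix like this is the usual input both for distance-based tree building and for multidimensional scaling, the two displays named in the abstract.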

602 citations


Journal ArticleDOI
TL;DR: The authors' studies show a clear division of T. cruzi into two major lineages presenting a high phylogenetic divergence and hypotheses are discussed to explain the origin of the two lineages as well as isolates that are hybrid for group 1 and 2 rDNA markers.

548 citations


Journal ArticleDOI
TL;DR: Comparative sequence analysis of housefly strains carrying kdr or the more potent super-kdr factor revealed two amino acid mutations that correlate with these resistance phenotypes, and suggest a binding site for pyrethroids at the intracellular mouth of the channel pore in a region known to be important for channel inactivation.
Abstract: We report the isolation of cDNA clones containing the full 6.3-kb coding sequence of the para-type sodium channel gene of the housefly, Musca domestica. This gene has been implicated as the site of knockdown resistance (kdr), an important resistance mechanism that confers nerve insensitivity to DDT and pyrethroid insecticides. The cDNAs predict a polypeptide of 2108 amino acids with close sequence homology (92% identity) to the Drosophila para sodium channel, and around 50% homology to vertebrate sodium channels. Only one major splice form of the housefly sodium channel was detected, in contrast to the Drosophila para transcript which has been reported to undergo extensive alternative splicing. Comparative sequence analysis of housefly strains carrying kdr or the more potent super-kdr factor revealed two amino acid mutations that correlate with these resistance phenotypes. Both mutations are located in domain II of the sodium channel. A leucine to phenylalanine replacement in the hydrophobic IIS6 transmembrane segment was found in two independent kdr strains and six super-kdr strains of diverse geographic origin, while an additional methionine to threonine replacement within the intracellular IIS4-S5 loop was found only in the super-kdr strains. Neither mutation was present in five pyrethroid-sensitive strains. The mutations suggest a binding site for pyrethroids at the intracellular mouth of the channel pore in a region known to be important for channel inactivation.

Journal ArticleDOI
TL;DR: An established culture-based method was compared with direct amplification and partial sequencing of cloned 16S rRNA genes from a human fecal specimen; when the number of PCR cycles was minimized, there was good agreement between culturing bacteria and sampling rDNA directly.
Abstract: Human colonic biota is a complex microbial ecosystem that serves as a host defense. Unlike most microbial ecosystems, its composition has been studied extensively by relatively efficient culture methods. We have compared an established culture-based method with direct amplification and partial sequencing of cloned 16S rRNA genes from a human fecal specimen. Nine cycles of PCR were also compared with 35 cycles. Colonies and cloned amplicons were classified by comparing their ribosomal DNA (rDNA; DNA coding for rRNA) sequences with rDNA sequences of known phylogeny. Quantitative culture recovered 58% of the microscopic count. The 48 colonies identified gave 21 rDNA sequences; it was estimated that 72% of the rDNA sequences from the total population of culturable cells would match these 21 sampled sequences (72% coverage). Fifty 9-cycle clones gave 27 sequences and 59% coverage of cloned rDNAs. Thirty-nine rDNAs cloned after 35 cycles of PCR gave 13 sequences for 74% coverage. Thus, the representation of the ecosystem after 35 cycles of PCR was distorted and lacked diversity. However, when the number of temperature cycles was minimized, biodiversity was preserved, and there was good agreement between culturing bacteria and sampling rDNA directly.
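The coverage figures quoted above estimate what fraction of the whole population would match a sequence already sampled. The abstract does not name the estimator, so the sketch below assumes one common choice, Good's estimator, which is one minus the fraction of samples whose sequence was seen exactly once. The function name and the example labels are hypothetical.

```python
from collections import Counter


def goods_coverage(labels):
    """Good's coverage estimate: 1 - (number of singletons / sample size).

    `labels` holds one sequence-type label per sampled clone or colony.
    Using Good's estimator here is an assumption; the abstract does not
    state how its coverage values were calculated.
    """
    if not labels:
        raise ValueError("at least one sampled item is required")
    counts = Counter(labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1 - singletons / len(labels)
```

For example, a hypothetical sample of seven clones labeled `["a", "a", "a", "b", "b", "c", "d"]` has two singletons, so the estimated coverage is 1 - 2/7.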

Journal ArticleDOI
TL;DR: The results indicate that the HCV genome RNA terminates with a highly conserved RNA element which is likely to be required for authentic HCV replication and recovery of infectious RNA from cDNA.
Abstract: Previous reports suggest that the hepatitis C virus (HCV) genome RNA terminates with homopolymer tracts of either poly(U) or poly(A). By ligation of synthetic oligonucleotides followed by reverse transcription-PCR, cDNA cloning, and sequence analysis, we determined the 3'-terminal sequence of HCV genome RNA. Our results show that the HCV 3' nontranslated region consists of four elements (positive sense, 5' to 3'): (i) a short sequence with significant variability among genotypes, (ii) a homopolymeric poly(U) tract, (iii) a polypyrimidine stretch consisting of mainly U with interspersed C residues, (iv) a novel sequence of 98 bases. This latter nucleotide sequence is not present in human genomic DNA and is highly conserved among HCV genotypes. The 3'-terminal 46 bases are predicted to form a stable stem-loop structure. Using a quantitative-competitive reverse transcription-PCR assay, we show that a substantial fraction of HCV genome RNAs from a high-specific-infectivity inoculum contain this 3'-terminal sequence element. These results indicate that the HCV genome RNA terminates with a highly conserved RNA element which is likely to be required for authentic HCV replication and recovery of infectious RNA from cDNA.

Journal ArticleDOI
TL;DR: The presence of methanogenic bacteria was assessed in peat and soil cores taken from upland moors; the sequences recovered from the peat formed two clusters on the end of long branches within the methanogen radiation that are distinct from each other.
Abstract: The presence of methanogenic bacteria was assessed in peat and soil cores taken from upland moors. The sampling area was largely covered by blanket bog peat together with small areas of red-brown limestone and peaty gley. A 30-cm-deep core of each soil type was taken, and DNA was extracted from 5-cm transverse sections. Purified DNA was subjected to PCR amplification with primers IAf and 1100Ar, which specifically amplify 1.1 kb of the archaeal 16S rRNA gene, and ME1 and ME2, which were designed to amplify a 0.75-kb region of the alpha-subunit gene for methyl coenzyme M reductase (MCR). Amplification with both primer pairs was obtained only with DNA extracted from the two deepest sections of the blanket bog peat core. This is consistent with the notion that anaerobiosis is required for activity and survival of the methanogen population. PCR products from both amplifications were cloned, and the resulting transformants were screened with specific oligonucleotide probes internal to the MCR or archaeal 16S rRNA PCR product. Plasmid DNA was extracted from probe-positive clones of both types and the insert was sequenced. The DNA sequences of 8 MCR clones were identical, as were those of 16 of the 17 16S rRNA clones. One clone showed marked variation from the remainder in specific regions of the sequence. From a comparison of these two different 16S rRNA sequences, an oligonucleotide was synthesized that was 100% homologous to a sequence region of the first 16 clones but had six mismatches with the variant. This probe was used to screen primary populations of PCR clones, and all of those that were probe negative were checked for the presence of inserts, which were then sequenced. By using this strategy, further novel methanogen 16S rRNA variants were identified and analyzed. The sequences recovered from the peat formed two clusters on the end of long branches within the methanogen radiation that are distinct from each other. These cannot be placed directly with sequences from any cultured taxa for which sequence information is available.

Journal ArticleDOI
TL;DR: The feasibility of HXGPRT as both a positive and negative selectable marker for stable transformation of T. gondii was demonstrated under selection with mycophenolic acid, and kinetic analysis of purified recombinant enzyme revealed no significant differences between the two isoforms.

Book ChapterDOI
TL;DR: Although there are several different comparison programs available (e.g., BLASTP, FASTA, SSEARCH, and BLITZ) that can be used with different scoring systems, the following search protocol should identify homologous sequences whenever they can be found.
Abstract: Although there are several different comparison programs available (e.g., BLASTP, FASTA, SSEARCH, and BLITZ) that can be used with different scoring systems (e.g., PAM120, PAM250, BLOSUM50, BLOSUM62) and different databases (e.g., PIR, SWISS-PROT, GenPept), the following search protocol should identify homologous sequences whenever they can be found. 1. Always compare protein sequences if the genes encode proteins. Protein sequence comparison will typically double the evolutionary lookback time over DNA sequence comparison. 2. Search several sequence databases using a rapid sequence comparison program (e.g., BLASTP or FASTA, ktup = 2). Well-curated databases like PIR or SWISS-PROT tend to have fewer redundant sequences, which improves the statistical significance of a match, but they are less comprehensive and up-to-date than GenPept. 3. If there is good agreement between the distribution of scores and the theoretical distribution, and the alignments do not include "simple sequence" domains, accept sequences with FASTA E() values or BLASTP P() values below 0.02 as homologous. 4. If no library sequences are found with E values below 0.02, perform additional searches with FASTA, ktup = 1, or SSEARCH. If library sequences with E values less than 0.02 are found, the sequences are probably homologous, unless a low-complexity domain is aligned. However, sequences with similarity scores from 0.02 to 10.0 may be homologous as well. To characterize these more distantly related sequences, select "marginal" library sequences and use them to search the databases. Additional family members should have E values less than 0.05. 5. Homologous sequences share a common ancestor, and thus a common protein fold. Depending on the evolutionary distance and divergence path, two or more homologous sequences may have very few absolutely conserved residues. However, if homology has been inferred between A and B, between B and C, and between C and D, A and D must be homologous, even if they share no significant similarity. 6. Sequences with marginal E values should also be tested using the PRSS program. Compare the query and library sequences using at least 200 (and preferably 1000) shuffles. Shuffles using a window (-w) of 10-20 are more stringent than a uniform shuffle. Use the E value after 1000 shuffles to confirm an inference of homology. 7. Homologous sequences are usually similar over an entire sequence or domain, typically sharing 20-25% or greater identity for more than 200 residues. Matches that are more than 50% identical in a 20- to 40-amino acid region occur frequently by chance and do not indicate homology. By following these steps, one will very rarely assert that two sequences are homologous when in fact they are not. However, these criteria are stringent; distantly related homologous sequences may fail to be detected because their similarity is not statistically significant. These tests are biased toward missing some distantly related sequences to avoid the possibility of misidentifying unrelated ones. In most database searches, the ratio of unrelated to related sequences is more than 4000:1 (e.g., 10 related and 40,000 unrelated sequences). Thus, one is more likely to mistakenly identify two sequences as related than to overlook a genuine relationship, and our conservative evaluation criteria reflect that bias.
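The thresholds in steps 3 and 4 of the protocol above can be sketched as a small triage function. It only encodes the cutoffs quoted in the abstract (0.02 for acceptance, 0.02 to 10.0 for marginal hits that need follow-up); the function name, argument names, and returned labels are hypothetical, and the low-complexity flag is assumed to come from a separate check.

```python
def triage_hit(e_value, low_complexity=False, accept=0.02, marginal_max=10.0):
    """Classify one database hit using the E-value cutoffs quoted in the protocol.

    The returned labels are illustrative names, not terms from the chapter:
      "homologous"             - E below the acceptance cutoff, no low-complexity domain
      "suspect_low_complexity" - E below the cutoff but a low-complexity domain is aligned
      "marginal"               - E between the cutoff and marginal_max; follow up by
                                 re-searching with this hit and by shuffling (e.g. PRSS)
      "no_evidence"            - E above marginal_max
    """
    if e_value < 0:
        raise ValueError("E values cannot be negative")
    if e_value < accept:
        return "suspect_low_complexity" if low_complexity else "homologous"
    if e_value <= marginal_max:
        return "marginal"
    return "no_evidence"
```

A hit at E = 0.5, for instance, is classified as `"marginal"`: not accepted outright, but kept for the follow-up searches and shuffle tests described in steps 4 and 6.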

Journal ArticleDOI
TL;DR: The conserved N- or RPS2-homologous NBS sequences and their positional associations with mapped soybean-resistance genes suggest that a number of the soybean disease-resistance genes may belong to this superfamily.
Abstract: The tobacco N and Arabidopsis RPS2 genes, among several recently cloned disease-resistance genes, share highly conserved structure, a nucleotide-binding site (NBS). Using degenerate oligonucleotide primers for the NBS region of N and RPS2, we have amplified and cloned the NBS sequences from soybean. Each of these PCR-derived NBS clones detected low- or moderate-copy soybean DNA sequences and belongs to 1 of 11 different classes. Sequence analysis showed that all PCR clones encode three motifs (P-loop, kinase-2, and kinase-3a) of NBS nearly identical to those in N and RPS2. The intervening region between P-loop and kinase-3a of the 11 classes has high (26% average) amino acid sequence similarity to the N gene although not as high (19% average) to RPS2. These 11 classes represent a superfamily of NBS-containing soybean genes that are homologous to N and RPS2. Each class or subfamily was assessed for its positional association with known soybean disease-resistance genes through near-isogenic line assays, followed by linkage analysis in F2 populations using restriction fragment length polymorphisms. Five of the 11 subfamilies have thus far been mapped to the vicinity of known soybean genes for resistance to potyviruses (Rsv1 and Rpv), Phytophthora root rot (Rps1, Rps2, and Rps3), and powdery mildew (rmd). The conserved N- or RPS2-homologous NBS sequences and their positional associations with mapped soybean-resistance genes suggest that a number of the soybean disease-resistance genes may belong to this superfamily. The candidate subfamilies of NBS-containing genes identified by genetic mapping should greatly facilitate the molecular cloning of disease-resistance genes.

Journal ArticleDOI
TL;DR: The sequence analysis and results obtained using various DNA polymerases appear to support the slipped strand displacement model as a potential explanation for how these stutter products are generated.
Abstract: The PCR amplification of tetranucleotide short tandem repeat (STR) loci typically produces a minor product band 4 bp shorter than the corresponding main allele band; this is referred to as the stutter band. Sequence analysis of the main and stutter bands for two sample alleles of the STR locus vWA reveals that the stutter band lacks one repeat unit relative to the main allele. Sequencing results also indicate that the number and location of the different 4 bp repeat units vary between samples containing a typical versus low proportion of stutter product. The results also suggest that the proportion of stutter product relative to the main allele increases as the number of uninterrupted core repeat units increases. The sequence analysis and results obtained using various DNA polymerases appear to support the slipped strand displacement model as a potential explanation for how these stutter products are generated.

Journal ArticleDOI
B Springer, L Stockman, K Teschner, G D Roberts, Erik C. Böttger
TL;DR: It is concluded that molecular typing by 16S rRNA sequence determination is not only more rapid (12 to 36 h versus 4 to 8 weeks) but also more accurate than traditional typing.
Abstract: Previous studies have indicated that the conventional tests used for the identification of mycobacteria may (i) frequently result in erroneous identification and (ii) underestimate the diversity within the genus Mycobacterium. To address this issue in a more systematic fashion, a study comparing phenotypic and molecular methods for the identification of mycobacteria was initiated. Focus was given to isolates which were difficult to identify to species level and which yielded inconclusive results by conventional tests performed under day-to-day routine laboratory conditions. Traditional methods included growth rate, colonial morphology, pigmentation, biochemical profiles, and gas-liquid chromatography of short-chain fatty acids. Molecular identification was done by PCR-mediated partial sequence analysis of the gene encoding the 16S rRNA. A total of 34 isolates was included in this study; 13 of the isolates corresponded to established species, and 21 isolates corresponded to previously uncharacterized taxa. For five isolates, phenotypic and molecular analyses gave identical results. For five isolates, minor discrepancies were present; four isolates remained unidentified after biochemical testing. For 20 isolates, major discrepancies between traditional and molecular typing methods were observed. Retrospective analysis of the data revealed that the discrepant results were without exception due to erroneous biochemical test results or interpretations. In particular, phenotypic identification schemes were compromised with regard to the recognition of previously undescribed taxa. We conclude that molecular typing by 16S rRNA sequence determination is not only more rapid (12 to 36 h versus 4 to 8 weeks) but also more accurate than traditional typing.

Journal ArticleDOI
TL;DR: The cloning, sequence analysis, tissue distribution, and functional expression of the K-Cl cotransport protein, KCC1, demonstrate that the KCC1 cDNAs encode a widely expressed K-Cl cotransporter with the characteristics of the K-Cl transporter that has been characterized in red cells.

Journal ArticleDOI
TL;DR: The characterized genes display extensive diversity in sequence and expression pattern and this information was utilized to determine potential structural, functional and evolutionary relationships to previously characterized members of the ABC superfamily.
Abstract: As an approach to characterizing all human ATP-binding cassette (ABC) superfamily genes, a search of the human expressed sequence tag (EST) database was performed using sequences from known ABC genes. A total of 105 clones, containing sequences of potential ABC genes, were identified, representing 21 distinct genes. This brings the total number of characterized human ABC genes from 12 to 33. The new ABC genes were mapped by PCR on somatic cell and radiation hybrid panels and yeast artificial chromosomes (YACs). The genes are located on human chromosomes 1, 2, 3, 4, 6, 7, 10, 12, 13, 14, 16, 17 and X; at locations distinct from previously mapped members of the superfamily. The characterized genes display extensive diversity in sequence and expression pattern and this information was utilized to determine potential structural, functional and evolutionary relationships to previously characterized members of the ABC superfamily.

Journal ArticleDOI
TL;DR: EPS expression in the non-EPS-producing heterologous host, Lactococcus lactis MG1363, showed that within the 15.25-kb region, a region with a size of 14.52 kb encoding the 13 genes epsA to epsM was capable of directing EPS synthesis and secretion in this host.
Abstract: We report the identification and characterization of the eps gene cluster of Streptococcus thermophilus Sfi6 required for exopolysaccharide (EPS) synthesis. This report is the first genetic work concerning EPS production in a food microorganism. The EPS secreted by this strain consists of the following tetrasaccharide repeating unit: -->3)-beta-D-Galp-(1-->3)-[alpha-D-Galp-(1-->6)]-beta-D- D-Galp-(1-->3)-alpha-D-Galp-D-GalpNAc-(1-->. The genetic locus was identified by Tn916 mutagenesis in combination with a plate assay to identify Eps mutants. Sequence analysis of the gene region, which was obtained from subclones of a genomic library of Sfi6, revealed a 15.25-kb region encoding 15 open reading frames. EPS expression in the non-EPS-producing heterologous host, Lactococcus lactis MG1363, showed that within the 15.25-kb region, a region with a size of 14.52 kb encoding the 13 genes epsA to epsM was capable of directing EPS synthesis and secretion in this host. Homology searches of the predicted proteins in the Swiss-Prot database revealed high homology (40 to 68% identity) for epsA, B, C, D, and E and the genes involved in capsule synthesis in Streptococcus pneumoniae and Streptococcus agalactiae. Moderate to low homology (37 to 18% identity) was detected for epsB, D, F, and H and the genes involved in capsule synthesis in Staphylococcus aureus; for epsC, D, and E and the genes involved in exopolysaccharide I (EPSI) synthesis in Rhizobium meliloti; for epsC to epsJ and the genes involved in lipopolysaccharide synthesis in members of the Enterobacteriaceae; and finally for epsK and lipB of Neisseria meningitidis. Genes (epsJ, epsL, and epsM) for which the predicted proteins showed little or no homology with proteins in the Swiss-Prot database were shown to be involved in EPS synthesis by single-crossover gene disruption experiments.

Journal ArticleDOI
TL;DR: It is shown, using sequence analysis methods, that bacterial and plant PLDs show significant sequence similarities both to each other, and to two other classes of phospholipid‐specific enzymes, bacterial cardiolipin synthases, and eukaryotic and bacterial phosphatidylserine synthases, indicating that these enzymes form an homologous family.
Abstract: Phosphatidylcholine-specific phospholipase D (PLD) enzymes catalyze hydrolysis of phospholipid phosphodiester bonds, and also transphosphatidylation of phospholipids to acceptor alcohols. Bacterial and plant PLD enzymes have not been shown previously to be homologues or to be homologous to any other protein. Here we show, using sequence analysis methods, that bacterial and plant PLDs show significant sequence similarities both to each other, and to two other classes of phospholipid-specific enzymes, bacterial cardiolipin synthases, and eukaryotic and bacterial phosphatidylserine synthases, indicating that these enzymes form an homologous family. This family is suggested also to include two Poxviridae proteins of unknown function (p37K and protein K4), a bacterial endonuclease (nuc), an Escherichia coli putative protein (o338) containing an N-terminal domain showing similarities with helicase motifs V and VI, and a Synechocystis sp. putative protein with a C-terminal domain likely to possess a DNA-binding function. Surprisingly, four regions of sequence similarity that occur once in nuc and o338, appear twice in all other homologues, indicating that the latter molecules are bi-lobed, having evolved from an ancestor or ancestors that underwent a gene duplication and fusion event. It is suggested that, for each of these enzymes, conserved histidine, lysine, aspartic acid, and/or asparagine residues may be involved in a two-step ping pong mechanism involving an enzyme-substrate intermediate.

Journal ArticleDOI
28 Nov 1996-Gene
TL;DR: This work determined the nucleotide sequence 3.8 kb upstream and 5.2 kb downstream of the toxin genes A and B of Clostridium difficile and defined the pathogenicity locus (PaLoc) as a distinct genetic element.

Journal ArticleDOI
15 Feb 1996-Nature
TL;DR: Possible strategies for a systematic approach to the discovery of gene function are discussed in the context of the yeast Saccharomyces cerevisiae, a model eukaryote whose genome sequence will soon be completed.
Abstract: Genome sequencing is leading to the discovery of new genes at a rate 50-100 times greater than that achieved by classical genetics, but the biological function of almost half of these genes is completely unknown. In order fully to exploit genome sequence data, a systematic approach to the discovery of gene function is required. Possible strategies are discussed here in the context of functional analysis in the yeast Saccharomyces cerevisiae, a model eukaryote whose genome sequence will soon be completed.

Journal ArticleDOI
TL;DR: The data establish that the close biological relationship of HHV-6 and HHV-7 is reflected at the genetic level, where there is a very high degree of conservation of genetic content and encoded amino acid sequences.
Abstract: Human herpesvirus 7 (HHV-7) is a recently isolated betaherpesvirus that is prevalent in the human population, with primary infection usually occurring in early childhood. HHV-7 is related to human herpesvirus 6 (HHV-6) in terms of both biological and, from limited prior DNA sequence analysis, genetic criteria. However, extensive analysis of the HHV-7 genome has not been reported, and the precise phylogenetic relationship of HHV-7 to the other human betaherpesviruses HHV-6 and human cytomegalovirus has not been determined. Here I report on the determination and analysis of the complete DNA sequence of HHV-7 strain JI. The data establish that the close biological relationship of HHV-6 and HHV-7 is reflected at the genetic level, where there is a very high degree of conservation of genetic content and encoded amino acid sequences. The data also delineate loci of divergence between the HHV-6 and HHV-7 genomes, which occur at the genome termini in the region of the terminal direct-repeat elements and within limited regions of the unique component. Of potential significance with respect to biological and evolutionary divergence of HHV-6 and HHV-7 are notable structural differences in putative transcriptional regulatory genes specified by the direct-repeat and immediate-early region A loci of these viruses and the absence of an equivalent of the HHV-6 adeno-associated virus type 2 rep gene homolog in HHV-7.

Journal ArticleDOI
TL;DR: Polymerase chain reaction-based assays are developed to distinguish all seven CYP2C9 cDNA sequences, and their allele frequencies in the Caucasian population are determined, allowing the prediction of CYP2C9 phenotype and thus identifying those individuals who may exhibit different drug pharmacokinetics for CYP2C9 substrates.
Abstract: Cytochrome P450 CYP2C9 metabolizes a wide variety of clinically important drugs, including phenytoin, tolbutamide, warfarin and a large number of non-steroidal anti-inflammatory drugs. Previous studies have shown that even relatively conservative changes in the amino acid composition of this enzyme can affect both its activity and substrate specificity. To date, six different human CYP2C9 cDNA sequences, as well as the highly homologous CYP2C10 sequence, have been reported, suggesting that the CYP2C9 gene is polymorphic. Only nine single base substitutions in the coding region of CYP2C9 account for the differences seen between the CYP2C9 proteins. In this report we have developed polymerase chain reaction (PCR)-based assays to distinguish all seven sequences, and have determined their allele frequencies in the Caucasian population. Of the seven sequences studied in one hundred individuals, only three appeared to be CYP2C9 alleles. These alleles, termed CYP2C9*1, CYP2C9*2 and CYP2C9*3, had allele frequencies of 0.79, 0.125 and 0.085, respectively. The CYP2C10 gene could not be found in any of the samples studied. The assays developed here will allow the prediction of CYP2C9 phenotype, thus identifying those individuals who may exhibit different drug pharmacokinetics for CYP2C9 substrates.

Journal ArticleDOI
15 Mar 1996-Virology
TL;DR: This paper proposes a new hemagglutinin (HA) subtype, H15, and evaluates amino acid sequence data as a basis for determining the HA subtypes of influenza A viruses.

Journal ArticleDOI
TL;DR: A gene coding for a catalase-peroxidase activity was identified on a 9.7 kb SmaI DNA fragment derived from the large plasmid pO157 of enterohaemorrhagic Escherichia coli (EHEC) O157:H7 strain EDL 933, and the newly discovered enzyme was designated KatP, to indicate its plasmid origin.
Abstract: A gene coding for a catalase-peroxidase activity was identified on a 9.7 kb SmaI DNA fragment derived from the large plasmid pO157 of enterohaemorrhagic Escherichia coli (EHEC) O157:H7 strain EDL 933. Nucleotide sequencing revealed an ORF of 2208 bp and predicted a 736 amino acid polypeptide with a molecular mass of 81.8 kDa. This putative protein was found to be highly homologous to members of the bacterial bifunctional catalase-peroxidase family. Analysis of its amino acid sequence revealed the presence of characteristic peroxidase 1 and 2 motifs. In addition, an N-terminal signal sequence was found, suggesting that the catalase-peroxidase is transported through the cytoplasmic membrane. EHEC catalase-peroxidase activities were investigated in cytoplasmic and periplasmic crude extracts as well as in culture supernatants from wild-type and recombinant E. coli strains. EHEC-specific catalase-peroxidase activity was detected primarily in the periplasm in strain EDL 933. The newly discovered enzyme was designated KatP, to indicate its plasmid origin. PCR analysis of representative strains of all enteric E. coli pathogroups (i.e. enterohaemorrhagic, enterotoxigenic, enteropathogenic, enteroaggregative and enteroinvasive E. coli) revealed a close association between the occurrence of EHEC-haemolysin and the katP gene in Shiga-like-toxin-producing E. coli O157 strains.

Journal ArticleDOI
05 Apr 1996-Cell
TL;DR: In this article, the authors established conditions where 12- and 23-spacer signal sequences are both necessary for cleavage and showed that the RAG proteins determine both aspects of the specificity of V(D)J recombination: the recognition of a single signal sequence and the correct 12/23 coupling in a pair of signals.