Showing papers on "GenBank" published in 1994


Book ChapterDOI
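TL;DR: A collection of chapters on sequence analysis software (GCG, MicroGenie, PC/GENE and FASTA), covering database searching, conversion between sequence formats, and submission of nucleotide sequence data to EMBL/GenBank/DDBJ.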
Abstract: GCG: Fragment Assembly Programs. GCG: Drawing Linear Restriction Maps. GCG: Drawing Circular Restriction Maps. GCG: Displaying Restriction Sites and Possible Translations in a DNA Sequence. GCG: Assembly of Sequences into New Sequence Constructs. GCG: Comparison of Sequences. GCG: Production of Multiple-Sequence Alignment. GCG: Database Searching. GCG: Pattern Recognition. GCG: Translation of DNA Sequence. GCG: Analysis of Protein Sequences. GCG: The Analysis of RNA Secondary Structure. GCG: Preparing Sequence Data for Publication. MicroGenie: Introduction and Restriction Enzyme Analysis. MicroGenie: Shotgun DNA Sequencing. MicroGenie: Translation. MicroGenie: Protein Analysis. MicroGenie: Homology Searches. PC/GENE: Sequence Entry and Assembly. PC/GENE: Restriction Enzyme Analysis. PC/GENE: Translation and Searches for Protein Coding Regions. PC/GENE: Sequence Comparisons and Homologies. PC/GENE: Database Searches. PC/GENE: Searches for Functional Sites in Nucleic Acids and Proteins. Using the FASTA Program to Search Protein and DNA Sequence Databases. Converting Between Sequence Formats. Obtaining Software via INTERNET. Submission of Nucleotide Sequence Data to EMBL/GenBank/DDBJ. Index.

1,106 citations


Journal ArticleDOI
TL;DR: Characterization of the 5-HTT gene will help advance molecular pharmacologic studies of 5-HT uptake regulation and facilitate investigations of its role in psychiatric disorders.
Abstract: The gene encoding the human serotonin transporter (5-HTT) has been isolated and characterized. The human 5-HTT gene is composed of 14 exons spanning approximately 31 kb. The sequence of all exons including adjacent intronic sequences and a tandem repeat DNA polymorphism (VNTR) has been determined and deposited in the EMBL/GenBank data base with the accession numbers X76753 to X76762. The characterization of the 5-HTT gene will help advance molecular pharmacologic studies of 5-HT uptake regulation and facilitate investigations of its role in psychiatric disorders.

625 citations


Journal ArticleDOI
TL;DR: Three regions of sequence similarity have been reported in several protein and small-molecule S-adenosylmethionine-dependent methyltransferases, and it is suggested that these conserved regions contribute to the binding of the substrate S-adenosylmethionine and/or the product S-adenosylhomocysteine.

475 citations


Journal ArticleDOI
TL;DR: A protocol for the prediction of the coding sequences of unidentified human genes based on the double selection and sequence analysis of cDNA clones with inserts carrying unreported 5'-terminal sequences and with insert sizes corresponding to nearly full-length transcripts is established.
Abstract: We established a protocol for the prediction of the coding sequences of unidentified human genes based on the double selection and sequence analysis of cDNA clones with inserts carrying unreported 5'-terminal sequences and with insert sizes corresponding to nearly full-length transcripts. By applying the protocol, cDNA clones with inserts longer than 2 kb were isolated from a cDNA library of human immature myeloid cell line KG-1, and the coding sequences of 40 new genes were predicted. A computer search of the sequences indicated that 20 genes contained sequences similar to known genes in the GenBank/EMBL databases. The sequences of the remaining 20 genes were entirely new, and characteristic protein motifs or domains were identified in 32 genes. Other sequence features noted were that the coding sequences of 23 genes were followed by relatively long stretches of 3'-untranslated sequences and that 5 genes contained repetitive sequences in their 3'-untranslated regions. The chromosomal location of these genes has been determined. By increasing the scale of the above analysis, the coding sequences of many unidentified genes can be predicted.

406 citations


Journal ArticleDOI
18 Nov 1994-Science
TL;DR: DNA isolated from 80-million-year-old bone fragments found in strata of the Upper Cretaceous Blackhawk Formation in the roof of an underground coal mine in eastern Utah demonstrates that small fragments of DNA may survive in bone for millions of years.
Abstract: DNA was extracted from 80-million-year-old bone fragments found in strata of the Upper Cretaceous Blackhawk Formation in the roof of an underground coal mine in eastern Utah. This DNA was used as the template in a polymerase chain reaction that amplified and sequenced a portion of the gene encoding mitochondrial cytochrome b. These sequences differ from all other cytochrome b sequences investigated, including those in the GenBank and European Molecular Biology Laboratory databases. DNA isolated from these bone fragments and the resulting gene sequences demonstrate that small fragments of DNA may survive in bone for millions of years.

271 citations


Journal ArticleDOI
TL;DR: HOVERGEN allows the user to easily select sets of homologous vertebrate genes, and thus is particularly useful for comparative sequence analysis or molecular evolution studies.
Abstract: Comparison of homologous genes is a major step for many studies related to genome structure, function or evolution. Similarity search programs easily find genes homologous to a given sequence. However, only very tedious manual procedures allow the retrieval of all sets of homologous genes sequenced for a given set of species. Moreover, this search often generates errors due to the complexity of data to be managed simultaneously: phylogenetic trees, alignments, taxonomy, sequences and related information. HOVERGEN helps to solve these problems by integrating all this information. HOVERGEN corresponds to GenBank sequences from all vertebrate species, with some data corrected, clarified, or completed, notably to address the problem of redundancy. Coding sequences have been classified in gene families. Protein multiple alignments and phylogenetic trees have been calculated for each family. Sequences and related information have been structured in an ACNUC database which permits complex selections. A graphical interface has been developed to visualize and edit trees. Genes are displayed in color, according to their taxonomy. Users have direct access to all information attached to sequences and to multiple alignments simply by clicking on genes. This graphical tool thus gives rapid and simple access to all data necessary to interpret homology relationships between genes. HOVERGEN allows the user to easily select sets of homologous vertebrate genes, and thus is particularly useful for comparative sequence analysis or molecular evolution studies.

226 citations


Journal ArticleDOI
TL;DR: The databases and services of the European Bioinformatics Institute (EBI) are described; the EBI maintains and distributes the EMBL Nucleotide Sequence Database, Europe's primary nucleotide sequence data resource.
Abstract: This paper describes the databases and services of the European Bioinformatics Institute (EBI). In collaboration with DDBJ and GenBank/NCBI, the EBI maintains and distributes the EMBL Nucleotide Sequence Database, Europe's primary nucleotide sequence data resource. The EBI also maintains and distributes the SWISS-PROT Protein Sequence Database, in collaboration with Amos Bairoch of the University of Geneva. Over thirty additional specialist molecular biology databases, as well as software and documentation of interest to molecular biologists, are also available. The EBI network services include database searching, entry retrieval, and sequence similarity searching facilities.

184 citations


Journal ArticleDOI
TL;DR: Thirty of the bovine-derived microsatellite systems gave specific and polymorphic products in sheep, adding to the number of useful markers in that species.
Abstract: Microsatellites or simple sequence repeat (SSR) polymorphisms are used widely in the construction of linkage maps in many species. High levels of polymorphism coupled with the ease of analysis of the polymerase chain reaction (PCR) have resulted in this type of marker being one of the most widely used for genetic analysis. In this paper we describe 58 polymorphic bovine microsatellites that were isolated from insert size selected bovine genomic libraries. Primer sequences, number of alleles, and heterozygosity levels in cattle reference families are reported. Chromosomal locations for 47 of these microsatellites as well as for 7 previously described systems derived from entries in the GenBank or EMBL databases have been determined. The markers map to 24 syntenic or chromosomal locations. Polymorphic bovine microsatellites were estimated to occur, on average, every 320 kb, and there is no evidence of clustering in the genome. Thirty of the bovine-derived microsatellite systems gave specific and polymorphic products in sheep, adding to the number of useful markers in that species.

161 citations


Journal ArticleDOI
TL;DR: A comparison of the human cDNA sequence with the GenBank expressed sequence tag (EST) data base has identified a relative from human skeletal muscle, EST25263, which is probably a human homologue of the published mouse syntrophin 2.
Abstract: Duchenne and Becker muscular dystrophies are caused by defects of dystrophin, which forms a part of the membrane cytoskeleton of specialized cells such as muscle. It has been previously shown that the dystrophin-associated protein A1 (59-kDa DAP) is actually a heterogeneous group of phosphorylated proteins consisting of an acidic (alpha-A1) and a distinct basic (beta-A1) component. Partial peptide sequence of the A1 complex purified from rabbit muscle permitted the design of oligonucleotide probes that were used to isolate a cDNA for one human isoform of A1. This cDNA encodes a basic A1 isoform that is distinct from the recently described syntrophins in Torpedo and mouse and is expressed in many tissues with at least five distinct mRNA species of 5.9, 4.8, 4.3, 3.1, and 1.5 kb. A comparison of our human cDNA sequence with the GenBank expressed sequence tag (EST) data base has identified a relative from human skeletal muscle, EST25263, which is probably a human homologue of the published mouse syntrophin 2. We have mapped the human basic component of A1 and EST25263 genes to chromosomes 8q23-24 and 16, respectively.

120 citations


Journal ArticleDOI
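TL;DR: Partial sequences of cDNA clones (expressed sequence tags) from human heart libraries were used to identify genes expressed in the heart; clones matching known genes were catalogued according to their putative structural and cellular functions, and cDNA probes from reverse-transcribed mRNAs of fetal and adult hearts were used to study differential expression of selected clones in cardiac development.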
Abstract: The heart, which is composed of all the cellular components of the circulatory system, is a representative organ for obtaining genes expressed in the cardiovascular system in normal and disease states. We used partial sequences of cDNA clones, or expressed sequence tags, to identify and tag genes expressed in this organ. More than 3500 partial sequences representing > 3000 cDNA clones have been obtained from either the 5' or 3' end of inserts derived from human heart cDNA libraries. Of 3132 cDNA clones analyzed by sequence similarity searching against the GenBank/EMBL data bases, 1485 (47.4%) were found to represent additional, previously undiscovered genes, whereas 267 clones were matched to human brain expressed sequence tags. Clones matching to known genes were catalogued according to their putative structural and cellular functions. cDNA probes from reverse-transcribed mRNAs of fetal and adult hearts were used to study differential expression of selected clones in cardiac development. Cataloguing genes expressed in the heart may provide insight into the genes involved in health and cardiovascular disease.

114 citations


Journal ArticleDOI
01 May 1994-Genomics
TL;DR: A comparison of the positions of microsatellites in human vs rodent homologous sequences indicates that some arrays are not extensively conserved for long periods of time, even when they form parts of protein coding sequences.

Journal ArticleDOI
TL;DR: The analysis of a number of randomly selected cDNAs by a combination of measuring mRNA expression, 'single-pass' sequencing (SPS), and genome mapping provides synergistic information for eventually deducing the actual function of these types of clones.
Abstract: As one component of a maize genome project, we report the analysis of a number of randomly selected cDNAs, by a combination of measuring mRNA expression, 'single-pass' sequencing (SPS), and genome mapping. Etiolated seedling (490) and membrane-free polysomal endosperm cDNA clones (576) were evaluated for their transcription levels by hybridizing with a probe prepared from total mRNA and categorized as corresponding to abundantly or rarely expressed mRNAs and as either constitutive or tissue-specific. A total 313 clones from the two libraries were submitted to 'single-pass' sequencing from the presumed 5' end of the mRNA and the nucleotide sequence compared with the GenBank database. About 61% of the clones showed no significant similarities within GenBank, 14% of the clones exhibited a high degree of similarity, while the remaining 25% exhibited a lesser degree of similarity. The chromosomal location of more than 300 clones was determined by RFLP mapping using standard populations. The results demonstrate that a combination of analyses provides synergistic information in eventually deducing the actual function of these types of clones.

Journal ArticleDOI
TL;DR: Examination of the sequence database for several prokaryotic and eukaryotic organisms demonstrates that coding sequences with in-phase, 100% overlapping antisense ORFs are present in every genome studied so far.
Abstract: Long Open Reading Frames (ORFs) in antisense DNA strands have been reported in the literature as being rare events. However, an extensive analysis of the GenBank database revealed that a substantial number of genes from several species contain an in-phase ORF in the antisense strand, that overlaps entirely the coding sequence of the sense strand, or even extends beyond. The findings described in this paper show that this is a frequent, non-random phenomenon, which is primarily dependent on codon usage, and to a lesser extent on gene size and GC content. Examination of the sequence database for several prokaryotic and eukaryotic organisms, demonstrates that coding sequences with in-phase, 100% overlapping antisense ORFs are present in every genome studied so far.
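
One way to read "in-phase, 100% overlapping antisense ORF" is that the reverse complement of a coding sequence, split at the same codon boundaries, contains no stop codon. The Python sketch below checks that condition under this assumed reading. The function names are illustrative and are not taken from the paper, and no start codon is required on the antisense strand.

```python
# Minimal sketch (assumed interpretation, not the authors' program).
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_full_antisense_orf(cds: str) -> bool:
    """True if the antisense strand, read in phase with the sense codons,
    has no stop codon anywhere along the full length of the CDS."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    antisense = reverse_complement(cds)
    # Because the length is a multiple of 3, these codons are the reverse
    # complements of the sense codons, i.e. they share codon boundaries.
    codons = (antisense[i:i + 3] for i in range(0, len(antisense), 3))
    return not any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(has_full_antisense_orf("ATGGCCGAATAA"))
    print(has_full_antisense_orf("ATGTTAGCCTAA"))
```

Under this reading, a coding sequence fails the test exactly when it uses one of the sense codons TTA, CTA or TCA (the reverse complements of TAA, TAG and TGA), which is consistent with the abstract's statement that the phenomenon depends primarily on codon usage.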

Journal ArticleDOI
01 Mar 1994-Yeast
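TL;DR: Two members of the ABC gene superfamily, MDL1 and MDL2, were cloned from Saccharomyces cerevisiae; each is predicted to encode a 'half-molecule' ABC protein, and preliminary analysis of null mutants constructed by gene replacement indicates that the MDL genes are not essential for viability of yeast.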
Abstract: ATP-binding cassette (ABC) transporters share significant sequence identity within their ATP-binding domains. Degenerate oligonucleotides based on highly conserved portions of the ATP-binding domain genes were used to clone portions of two members of the ABC gene superfamily from Saccharomyces cerevisiae DNA. These genes were designated MDL1 and MDL2 (for multidrug resistance-like). Each MDL gene is predicted to encode a single set of transmembrane domains and a single ATP-binding domain, thus the MDL gene products are 'half-molecule' ABC proteins. The two genes were mapped to precise regions on chromosomes XII and XVI and show a considerable similarity to the mammalian P-glycoprotein/multidrug resistance (MDR) and peptide transporter (TAP) genes. Preliminary analysis of null mutants constructed by gene replacement has indicated that the MDL genes are not essential for viability of yeast. The sequences have been deposited in the GenBank data library under Accession Numbers L16958 (Locus YSCBCSA) and L16959 (Locus YSCBCSB).

Journal ArticleDOI
TL;DR: Statistical features of 210 Schizosaccharomyces pombe DNA sequences extracted from GenBank were characterized, and a rule-based interactive computer program for finding introns, INTRON.PLOT, was developed and used to successfully analyze 7 newly sequenced genes.
Abstract: A database of 210 Schizosaccharomyces pombe DNA sequences (524,794 bp) was extracted from GenBank (release number 81.0) and examined by a number of methods in order to characterize statistical features of these sequences that might serve as signals or constraints for messenger RNA splicing. The statistical information compiled includes splicing signal (donor, acceptor and branch site) profiles, translational initiation start profile, exon/intron length distributions, ORF distribution, CDS size distribution, codon usage table, and 6-tuple distribution. The information content of the various signals are also presented. A rule-based interactive computer program for finding introns called INTRON.PLOT has been developed and was used to successfully analyze 7 newly sequenced genes.
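
The abstract does not say how the signal profiles or their information content were computed. As a hedged illustration only, the Python sketch below builds a per-position base-frequency profile from aligned sites and scores each position as 2 minus its Shannon entropy in bits, assuming a uniform background and no small-sample correction. The example donor-site strings are made up for demonstration and are not data from the paper.

```python
import math
from collections import Counter


def position_profile(sites):
    """Per-position base frequencies for equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(site[i].upper() for site in sites)
        total = sum(counts.values())
        profile.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return profile


def information_content(profile):
    """Bits per position: 2 - H, where H is the Shannon entropy of the column.
    Assumes a uniform background; no small-sample correction is applied."""
    bits = []
    for column in profile:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    # Hypothetical aligned donor-site fragments, for illustration only.
    example_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTATGT"]
    for pos, bits in enumerate(information_content(position_profile(example_sites))):
        print(pos, round(bits, 3))
```

A fully conserved position scores 2 bits and a position where all four bases are equally frequent scores 0 bits, so summing the per-position values gives a rough measure of how distinctive a signal is.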

Journal ArticleDOI
TL;DR: The first two group I introns found in the nsrDNA from the genus Acanthamoeba are described; they are in different locations in the genes and have no significant primary sequence similarity to each other.
Abstract: The discovery of group I introns in small subunit nuclear rDNA (nsrDNA) is becoming more common as the effort to generate phylogenies based upon nsrDNA sequences grows. In this paper we describe the discovery of the first two group I introns in the nsrDNA from the genus Acanthamoeba. The introns are in different locations in the genes, and have no significant primary sequence similarity to each other. They are identified as group I introns by the conserved P, Q, R and S sequences (1), and the ability to fit the sequences to a consensus secondary structure model for the group I introns (1, 2). Both introns are absent from the mature srRNA. A BLAST search (3) of nucleic acid sequences present in GenBank and EMBL revealed that the A. griffini intron was most similar to the nsrDNA group I intron of the green alga Dunaliella parva. A similar search found that the A. lenticulata intron was not similar to any of the other reported group I introns.

Journal ArticleDOI
11 Oct 1994-Gene
TL;DR: The deduced PAU1, PAU2 and PAU3 aa sequences are all highly homologous with the SRP1 aa sequence, which contains eight serine-rich tandem repeats of 12 aa each, at its C terminus, leading to the suggested name of seripauperins for this family of genes.

Journal ArticleDOI
TL;DR: A human fetal heart cDNA library was constructed in the lambda gt22A expression vector, and partial cDNA sequences, or expressed sequence tags (ESTs), were searched against the GenBank and EMBL databases to identify novel genes expressed in the human cardiovascular system.

Journal ArticleDOI
TL;DR: A repetitive element in C. elegans has been found that bears high homology to the element mariner of Drosophila mauritiana, and this class of elements has now been described in insects, planaria and nematodes.
Abstract: A repetitive element in C. elegans has been found that bears high homology to the element mariner of Drosophila mauritiana (EMBL accession number X77804). This element is present in about 20 copies in the N2 strain of C. elegans, and appears in roughly equal copy numbers in the related strain BO and in the hybrid strains RW7097 and TR679. There is only one copy of this MLE in three related species of Caenorhabditis. A cDNA of this mariner-like element (MLE) codes for a protein with 58% homology to the Drosophila transposase. The mariner-like element is not mobile in N2. This class of elements has now been described in insects, planaria and nematodes (GenBank accession number M98552 and this report).

Journal ArticleDOI
01 Sep 1994-Yeast
TL;DR: A gene whose expression enables yeast cells to overcome the growth inhibition produced by 2-deoxyglucose has been isolated; it contains an open reading frame of 738 bp that may code for a protein of 27 100 Da.
Abstract: We have isolated a gene whose expression enables yeast cells to overcome the inhibition of growth produced by the presence of 2-deoxyglucose. The gene contains an open reading frame of 738 bp that may code for a protein of 27 100 Da. Cells carrying this gene contain high levels of a specific 2-deoxyglucose-6-phosphate phosphatase. The expression of this phosphatase is increased by the presence of 2-deoxyglucose and is constant along the growth curve. The sequence reported here has the GenBank accession number U03107.

Journal ArticleDOI
TL;DR: The computational tools developed in this study may help to design more efficient ASOs by decreasing their nonspecific binding activity.
Abstract: Antisense oligonucleotides (ASOs) are capable of blocking the expression of targeted genes and are potential antitumor and antiviral therapeutic agents. The specificity of ASO gene inhibition is compromised when homology to other sequences allows the selected ASO to bind to nontargeted mRNAs. To reduce this nonspecific activity, an ASO should target a sequence that is predicted to be unlikely to occur in other mRNAs. The probability of a sequence being unique can be predicted by determining the genomic frequency of short stretches of sequences contained within the target sequence. Two computer programs, OLIGOMER and HEXAGRAPH, were developed for this analysis. OLIGOMER was used to analyze the genomic frequencies of di-, tri-, and hexamers in more than 24 million nucleotides from 8 different genomes in GenBank. A mathematical model was developed that predicts the genomic frequency of longer oligomers on the basis of the observed frequencies of shorter oligomers. The second program, HEXAGRAPH, was used to g...
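
The abstract does not specify the mathematical model used by OLIGOMER. As a generic stand-in, the Python sketch below estimates the expected frequency of a longer oligomer from observed mono- and dinucleotide frequencies using a first-order Markov chain. This is a common textbook approach and an assumption here, not the paper's model; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_freqs(seq: str, k: int) -> dict:
    """Relative frequencies of all overlapping k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def markov_expected_freq(oligo: str, mono: dict, di: dict) -> float:
    """First-order Markov estimate:
    P(w) = p(w1) * product over i of p(w_i w_{i+1}) / p(w_i)."""
    oligo = oligo.upper()
    prob = mono.get(oligo[0], 0.0)
    for first, second in zip(oligo, oligo[1:]):
        p_first = mono.get(first, 0.0)
        if p_first == 0.0:
            return 0.0  # base never observed, so the oligomer is unseen
        prob *= di.get(first + second, 0.0) / p_first
    return prob


if __name__ == "__main__":
    genome = "ACGT" * 50  # toy sequence, for illustration only
    mono = kmer_freqs(genome, 1)
    di = kmer_freqs(genome, 2)
    print(markov_expected_freq("ACGTAC", mono, di))
    print(markov_expected_freq("AAAAAA", mono, di))
```

An antisense target whose estimated frequency is low under such a model would be expected to occur rarely elsewhere in the genome, which is the selection criterion the abstract describes.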

Journal ArticleDOI
01 Jul 1994-Genomics
TL;DR: The findings indicate that the matches in the testis transcript population appear to be identifying a different spectrum of gene sequences.

Journal ArticleDOI
01 Jul 1994-Yeast
TL;DR: The DNA sequence of the LTE1 gene on the left arm of chromosome I of Saccharomyces cerevisiae has been determined and the derived amino acid sequence has significant similarities to the amino acid sequences of the guanine nucleotide releasing factor isolated from a rat brain library.
Abstract: The DNA sequence of the LTE1 gene on the left arm of chromosome I of Saccharomyces cerevisiae has been determined. The LTE1 open reading frame comprises 4305 bp that can be translated into 1435 amino acid residues. The position of this open reading frame corresponds well to that of a 4·7 kb transcript that has been mapped to this position. The derived amino acid sequence has significant similarities to the amino acid sequence of the guanine nucleotide releasing factor isolated from a rat brain library. The carboxy-terminus of the LTE1 protein also shows similarities to other guanine nucleotide exchange factors of the S. cerevisiae CDC25 family. The sequence has been deposited in the GenBank data library under Accession Number L20125.

Journal ArticleDOI
15 Jan 1994-Genomics
TL;DR: To study the connection among NotI linking clones, CpG islands, and genes, the sequence surrounding 143 NotI sites was determined; the results suggest that NotI sites have a much stronger association with genes.

Journal ArticleDOI
TL;DR: This update of the Escherichia coli database (ECD release 20) represents another substantial increase in sequence information and now also makes it possible to find the exact physical location of each individual gene or regulatory region, even where nomenclature is discrepant.
Abstract: We have compiled the DNA sequence data for E. coli available from the GenBank and EMBL data libraries and independently from the literature. Starting with this update of our Escherichia coli database (ECD release 20) we provide major changes compared to previous issues. This update not only represents another substantial increase in sequence information, it also now makes it possible to find the exact physical location of each individual gene or regulatory region, even where nomenclature is discrepant. In order to save space this printed version no longer contains the database itself, but we provide several examples. The complete database is publicly available in electronic form together with a self-explanatory application program or as a flat file. The complete compilation including a full set of genetic map data and the E. coli protein index can be obtained in machine readable form from the EMBL data library as a part of the CD-ROM issue of the EMBL sequence database, released and updated every three months. After deletion of all detected overlaps, a total of 2,878,364 individual bp had been determined by the end of June 1994. This corresponds to a total of 60.98% of the entire E. coli chromosome consisting of about 4,720 kbp. This number may actually be higher by 9161 bp derived from other strains of E. coli.
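
The coverage figure in this abstract can be reproduced from the stated numbers. The short Python check below assumes the chromosome size is taken as exactly 4,720 kbp, which the abstract gives only as an approximation; the variable and function names are illustrative.

```python
DETERMINED_BP = 2_878_364      # non-overlapping bp determined by end of June 1994
CHROMOSOME_BP = 4_720 * 1_000  # "about 4,720 kbp" (treated as exact here)
OTHER_STRAINS_BP = 9_161       # additional bp derived from other E. coli strains


def percent_covered(determined_bp: int, genome_bp: int) -> float:
    """Share of the genome covered by determined sequence, in percent."""
    return 100.0 * determined_bp / genome_bp


if __name__ == "__main__":
    print(f"{percent_covered(DETERMINED_BP, CHROMOSOME_BP):.2f}%")  # prints 60.98%
    with_other = DETERMINED_BP + OTHER_STRAINS_BP
    print(f"{percent_covered(with_other, CHROMOSOME_BP):.2f}%")
```

The second line shows how the percentage would shift if the 9161 bp from other strains were counted as well.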

Journal ArticleDOI
01 Sep 1994-Plasmid
TL;DR: The restriction enzyme and genetic map of the antibiotic-resistance region of plasmid pSa is related to Tn21 integrons by the insertion of 5.4 kb containing a chloramphenicol resistance gene (catII) and a 1.1-kb direct repeat.

Journal ArticleDOI
01 Nov 1994-Yeast
TL;DR: The REV7 gene was cloned by complementation of the rev7-2 mutant defect and sequenced; it encodes a predicted protein of Mr 28 759 that is unlike any other protein in the NCBI non-redundant protein sequence database and is inessential for viability.
Abstract: The function of the REV7 gene is required for DNA damage-induced mutagenesis in budding yeast, Saccharomyces cerevisiae, and is therefore thought to promote replication past sites of mutagen damage in the DNA template. We have cloned this gene by complementation of the rev7-2 mutant defect, and determined its sequence. REV7 encodes a predicted protein of Mr 28 759 which is unlike any other protein in the NCBI non-redundant protein sequence data base, and which is inessential for viability. The sequence of the 3·88 kb yeast genomic fragment containing REV7 has been deposited in GenBank under accession number U07228.

Journal ArticleDOI
01 Apr 1994-Yeast
TL;DR: The complete DNA sequence of cosmid clone pUKG148, comprising 28 600 base pairs, was determined from an ordered set of subclones; it contains 22 open reading frames longer than 100 amino acids, of which five are entirely covered by other, longer reading frames.
Abstract: The complete DNA sequence of cosmid clone pUKG148 comprising 28 600 base pairs was determined from an ordered set of subclones. The sequence contains 22 open reading frames longer than 100 amino acids of which five are entirely covered by other, longer reading frames. YKL054 exhibits 25% homology at the amino acid level to a number of plant storage proteins of the glutenin type, YKL056 is 40% homologous to a translationally controlled mammalian tumour protein, YKL058 (TOA2) is identical to the small subunit of transcription factor TFIIA from yeast and YKL060 is identical to the FBA1 gene also from yeast, already sequenced but not mapped to chromosome XI. The remaining 13 open reading frames show weak or no homology to known genes. The nucleotide sequence data have been deposited in the EMBL and GenBank data libraries under Accession Number X75781.

Journal ArticleDOI
01 Apr 1994-Yeast
TL;DR: The ability of this gene, in two copies per cell, to reverse the csg1 defect suggests it may have a role in regulating Ca2+ homeostasis.
Abstract: We have isolated, sequenced, mapped and disrupted a novel gene, CCC1, from Saccharomyces cerevisiae. This gene displays non-allelic complementation of the Ca2+-sensitive phenotype conferred by the csg1 mutation. The ability of this gene, in two copies per cell, to reverse the csg1 defect suggests it may have a role in regulating Ca2+ homeostasis. The sequence of CCC1 indicates that it encodes a 322 amino acid, membrane-associated protein. The CCC1 gene is located on the right arm of chromosome XII. The sequence has been deposited in the GenBank data library under Accession Number L24112.

Journal ArticleDOI
TL;DR: A computer-aided homology search of the GenBank nucleotide database using the amino acid sequence of human acyl CoA-binding protein (ACBP)/diazepam-binding inhibitor (DBI)-endozepine as a probe revealed that a genomic fragment containing the gene encoding the mallard duck S-acyl fatty acid synthase thioesterase also contains sequences which encode the duck homolog of ACBP/DBI.
Abstract: A computer-aided homology search of the GenBank nucleotide database using the amino acid sequence of human acyl CoA-binding protein (ACBP)/diazepam-binding inhibitor (DBI)-endozepine as a ...