Journal Article

Recognition of protein coding regions in DNA sequences

11 Sep 1982 - Nucleic Acids Research (Oxford University Press) - Vol. 10, Iss. 17, pp. 5303-5318
TL;DR: The test has been thoroughly proven on 400,000 bases of sequence data: it misclassifies 5% of the regions tested and gives an answer of "No Opinion" one fifth of the time.
Abstract: We give a test for protein coding regions which is based on simple and universal differences between protein-coding and noncoding DNA. The test is simple enough to use without a computer and is completely objective. The test has been thoroughly proven on 400,000 bases of sequence data: it misclassifies 5% of the regions tested and gives an answer of "No Opinion" one fifth of the time. We predict some new coding and noncoding regions in published sequences.
Citations
Journal Article
TL;DR: A group of programs that will interact with each other has been developed for the Digital Equipment Corporation VAX computer using the VMS operating system.
Abstract: The University of Wisconsin Genetics Computer Group (UWGCG) has been organized to develop computational tools for the analysis and publication of biological sequence data. A group of programs that will interact with each other has been developed for the Digital Equipment Corporation VAX computer using the VMS operating system. The programs available and the conditions for transfer are described.

14,575 citations

Journal Article
28 May 1993 - Science
TL;DR: A gene discovered by positional cloning has been identified as the von Hippel-Lindau (VHL) disease tumor suppressor gene, and a restriction fragment encompassing the gene showed rearrangements in 28 of 221 VHL kindreds.
Abstract: A gene discovered by positional cloning has been identified as the von Hippel-Lindau (VHL) disease tumor suppressor gene. A restriction fragment encompassing the gene showed rearrangements in 28 of 221 VHL kindreds. Eighteen of these rearrangements were due to deletions in the candidate gene, including three large nonoverlapping deletions. Intragenic mutations were detected in cell lines derived from VHL patients and from sporadic renal cell carcinomas. The VHL gene is evolutionarily conserved and encodes two widely expressed transcripts of approximately 6 and 6.5 kilobases. The partial sequence of the inferred gene product shows no homology to other proteins, except for an acidic repeat domain found in the procyclic surface membrane glycoprotein of Trypanosoma brucei.

2,714 citations

Journal Article
21 Jun 1991 - Science
TL;DR: Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs), which will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing.
Abstract: Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor. Forty-six ESTs were mapped to chromosomes after amplification by the polymerase chain reaction. This fast approach to cDNA characterization will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, and serve as a resource in diverse biological research fields.

2,375 citations

Journal Article
TL;DR: The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
Abstract: Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.

2,209 citations

Journal Article
TL;DR: The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step, and is appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
Abstract: Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.

1,798 citations

References
Journal Article
TL;DR: This paper describes the organization of protein-coding sequences into split genes, a small number of which are expressed in the chicken oviduct, and discusses generalizations and molecular evolution.

3,865 citations

Journal Article
TL;DR: The complete nucleotide sequence of the 16S RNA gene from the rrnB cistron of Escherichia coli has been determined by using three rapid DNA sequencing methods, and discrepancies may be explained by heterogeneity among 16S rRNA sequences from different cistrons.
Abstract: The complete nucleotide sequence of the 16S RNA gene from the rrnB cistron of Escherichia coli has been determined by using three rapid DNA sequencing methods. Nearly all of the structure has been confirmed by two to six independent sequence determinations on both DNA strands. The length of the 16S rRNA chain inferred from the DNA sequence is 1541 nucleotides, in close agreement with previous estimates. We note discrepancies between this sequence and the most recent version of it reported from direct RNA sequencing [Ehresmann, C., Stiegler, P., Carbon, P. & Ebel, J.P. (1977) FEBS Lett. 84, 337-341]. A few of these may be explained by heterogeneity among 16S rRNA sequences from different cistrons. No nucleotide sequences were found in the 16S rRNA gene that cannot be reconciled with RNase digestion products of mature 16S rRNA.

2,326 citations

Journal Article
TL;DR: Comparison of the sequence of λrifd18 with sequences from other isolates of the rrnB operon provides direct evidence for structural rearrangements within rRNA operons.

1,683 citations

Journal Article
TL;DR: The nucleic acid sequence bank now contains 161 mRNAs, with 43 new genes added, and internal regulation of mRNA expression by different third base choices between quartet and duet codons is proposed for bacterial genes.
Abstract: The nucleic acid sequence bank now contains 161 mRNAs; 43 new genes are added. One sequence, that of B. mori fibroin, is dropped due to uncertainty about the starting point for translation. Frequencies of all codons are given for each gene added and for each genome type in the total bank. A new series of correspondence analyses on codon use is presented, substantiating the genome hypothesis. Internal regulation of mRNA expression by different third base choices between quartet and duet codons is proposed for bacterial genes.

910 citations

Journal Article
TL;DR: A conceptual framework for developing a theory of translational initiation and three approaches to a higher order approximation are described.
Abstract: Contents: Introduction; Biochemistry of Translational Initiation; A Conceptual Framework: Ribosome Binding Site Strengths; The Nature of Ribosome Binding Sites (A First Approximation; Three Approaches to a Higher Order Approximation: Genetics and biochemistry, Statistics); Determinants: Sequence and/or Structure (Pathways to Initiation; Unstructured Signals; Structured Initiation Regions); Translational Regulation (Unstructured RNAs; RNAs Whose Structures Seem Obvious; Complex Translational Regulation).

754 citations