Showing papers by "Roderic Guigó" published in 1997


Book Chapter
01 Jan 1997
TL;DR: This chapter reviews the sequence-based measures indicative of protein-coding function in genomic DNA and finds that amino acid usage and codon preference carry substantial information about coding function, but neither measure appears to be as discriminating as codon usage.
Abstract (publisher summary): This chapter reviews the sequence-based measures indicative of protein-coding function in genomic DNA. A coding statistic can be defined as a function that, given a DNA sequence, computes a real number related to the likelihood that the sequence codes for a protein. Model-dependent coding statistics are likely to capture more of the specific features of coding DNA, since they depend on more parameters. It is suggested that model-dependent coding statistics may be more powerful in discriminating coding from noncoding DNA. A DNA sequence can be partitioned into a sequence of consecutive nonoverlapping codons in three different ways, depending on the nucleotide at which the grouping of nucleotides into codons starts. It is found that amino acid usage and codon preference carry substantial information about coding function, but neither of these measures appears to be as discriminating as codon usage. The distribution of base frequencies at codon positions can be assumed to statistically describe a prototypical codon. The measures based on base compositional bias between codon positions are also elaborated.
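
The abstract's two central ideas, the three ways of partitioning a sequence into codons and a coding statistic that maps a sequence to a real number, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the chapter's own formulation: the function names, the toy training sequence, the uniform 1/64 background model, and the `floor` value used for codons unseen in training are all hypothetical.

```python
from collections import Counter
from math import log


def codons_in_frame(seq, frame):
    """Split seq into consecutive nonoverlapping codons starting at offset frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def estimate_codon_freq(coding_seqs):
    """Estimate codon frequencies from sequences assumed to be in-frame coding DNA."""
    counts = Counter()
    for s in coding_seqs:
        counts.update(codons_in_frame(s, 0))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_score(seq, frame, coding_freq, floor=1e-4):
    """Average log-likelihood ratio of the codons in one frame:
    coding model versus a uniform background (assumed, 1/64 per codon).
    Codons absent from coding_freq get the hypothetical probability floor."""
    codons = codons_in_frame(seq, frame)
    if not codons:
        return 0.0
    background = 1.0 / 64
    total = sum(log(coding_freq.get(c, floor) / background) for c in codons)
    return total / len(codons)


# Toy example: a made-up training sequence and a made-up query.
toy_freq = estimate_codon_freq(["ATGGCCGCCATGGCC"])
query = "ATGGCCA"
scores = [codon_usage_score(query, f, toy_freq) for f in range(3)]
best_frame = max(range(3), key=lambda f: scores[f])
```

In this toy setting the frame whose codons match the training codons scores highest. A realistic codon usage table would be estimated from a large set of known genes, and the background model would usually reflect noncoding sequence composition rather than a uniform distribution.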

89 citations


Journal Article
TL;DR: As the Human Genome Project enters the large-scale sequencing phase, computational gene identification methods are becoming essential for the automatic analysis and annotation of large uncharacterized genomic sequences.

55 citations


Journal Article
TL;DR: With increasing frequency, the DNA sequence of a large region of the human genome is known before the biologically relevant features that it encodes – protein-coding genes, in particular – have been fully characterized, and characterization by computational analysis is substantially less expensive than by experimental means.
Abstract: In the positional cloning approach to the identification of human genes, the gene responsible for a given phenotype is first mapped to its genomic location, and then a variety of experimental methods is used to identify, within this region, the DNA sequences directly encoding the gene. In this approach, obtaining the sequence of the gene is thus one of the last steps in the gene identification process, and it is not uncommon for it to be carried out only partially. However, progress in DNA sequencing technology is altering this picture. Obtaining the continuous sequence of a large genomic region known to contain a gene is already feasible [1], and, as the Human Genome Project enters the large-scale sequencing phase, large uncharacterized genomic sequences – typically a few hundred kilobases – are being obtained in a highly automatic way (visit, for instance, the Human Sequencing page at the Sanger Center Web site, "http://www.sanger.ac.uk/humanseq/"). With increasing frequency, the DNA sequence of a large region of the human genome is thus known before the biologically relevant features that it encodes – protein-coding genes, in particular – have been fully characterized. Obviously, characterizing such features by computational analysis is substantially less expensive than by experimental means. Not surprisingly, the problem of identifying genes in DNA sequences is attracting increasing attention within the field of computational biology. The problem is particularly relevant to molecular medicine, since determining the coding DNA segments may become the rate-limiting step in the process of the identification of disease genes localized to a given …

25 citations