Journal Article

Functional annotation of hypothetical proteins - A review.

29 Dec 2006, Bioinformation (Biomedical Informatics Publishing Group), Vol. 1, Iss. 8, pp. 335-338
TL;DR: Some of the recent and popular approaches developed in Bioinformatics to predict functions for hypothetical proteins are discussed, including automated genome sequence analysis and annotation.
Abstract: The complete human genome sequences in public databases provide ways to understand the blueprint of life. As of June 29, 2006, complete genomes are available for 27 archaea, 326 bacteria and 21 eukaryotes, and sequencing of 316 bacterial, 24 archaeal and 126 eukaryotic genomes is in progress. Traditional biochemical and molecular experiments can assign accurate functions to the genes in these genomes; however, the process is time-consuming and costly. Despite several efforts, only 50-60% of genes have been annotated in most completely sequenced genomes. Automated genome sequence analysis and annotation may provide ways to understand these genomes. Thus, determining protein function is one of the challenging problems of the post-genome era, and it demands that bioinformatics develop efficient tools to predict the functions of un-annotated protein sequences. Here, we discuss some of the recent and popular approaches developed in Bioinformatics to predict functions for hypothetical proteins.
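To make the 50-60% figure concrete: annotation coverage is the share of a genome's predicted genes that carry an assigned function. The minimal Python sketch below assumes, hypothetically, that unannotated genes can be recognized by placeholder product names such as "hypothetical protein" or "uncharacterized protein"; real annotation pipelines apply richer criteria, and the helper name `annotation_coverage` is illustrative only.

```python
# Placeholder product names treated as "no assigned function" (assumption).
UNANNOTATED_LABELS = ("hypothetical protein", "uncharacterized protein")


def annotation_coverage(products: list[str]) -> float:
    """Fraction of genes whose product name contains no placeholder label."""
    if not products:
        return 0.0
    annotated = sum(
        1
        for p in products
        if not any(label in p.lower() for label in UNANNOTATED_LABELS)
    )
    return annotated / len(products)


if __name__ == "__main__":
    genes = [
        "DNA polymerase III subunit beta",
        "hypothetical protein",
        "conserved hypothetical protein",
        "ABC transporter ATP-binding protein",
        "Uncharacterized protein",
    ]
    print(annotation_coverage(genes))  # 2 of 5 genes named -> 0.4
```

Matching by substring rather than exact name lets variants such as "conserved hypothetical protein" count as unannotated as well.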


Citations
Journal Article
TL;DR: A parsimony approach, called MinPath (Minimal set of Pathways), is developed for biological pathway reconstructions using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset.
Abstract: A common biological pathway reconstruction approach—as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and the functional annotation of metagenomic sequences—starts with the identification of protein functions or families (e.g., KO families for the KEGG database and the FIG families for the SEED database) in the query sequences, followed by a direct mapping of the identified protein families onto pathways. Given a predicted patchwork of individual biochemical steps, some metric must be applied in deciding what pathways actually exist in the genome or metagenome represented by the sequences. Commonly, and straightforwardly, a complete biological pathway can be identified in a dataset if at least one of the steps associated with the pathway is found. We report, however, that this naive mapping approach leads to an inflated estimate of biological pathways, and thus overestimates the functional diversity of the sample from which the DNA sequences are derived. We developed a parsimony approach, called MinPath (Minimal set of Pathways), for biological pathway reconstructions using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset. MinPath identified far fewer pathways for the genomes collected in the KEGG database—as compared to the naive mapping approach—eliminating some obviously spurious pathway annotations. Results from applying MinPath to several metagenomes indicate that the common methods used for metagenome annotation may significantly overestimate the biological pathways encoded by microbial communities.

410 citations


Cites background from "Functional annotation of hypothetic..."

  • ...After all, many bacterial genomes have fewer than 60% of their genes assigned to a proposed function [9,10]....

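The contrast between naive pathway mapping and the parsimony idea behind MinPath can be sketched in Python. Under naive mapping, a pathway is reported if at least one of its families is found; under the parsimony view, one reports the smallest set of pathways that together account for every identified family. The sketch below solves that as a brute-force minimum set cover, which is practical only for toy inputs and is not MinPath's own implementation; the pathway names and family IDs (`K1` to `K6`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: pathway -> protein families (KO-like IDs) it contains.
PATHWAYS: dict[str, set[str]] = {
    "glycolysis": {"K1", "K2", "K3"},
    "gluconeogenesis": {"K1", "K2", "K4"},
    "pentose_phosphate": {"K2", "K5"},
    "rare_pathway": {"K1", "K6"},
}


def naive_mapping(found: set[str], pathways: dict[str, set[str]]) -> set[str]:
    """Report every pathway that contains at least one identified family."""
    return {name for name, fams in pathways.items() if fams & found}


def minimal_pathways(found: set[str], pathways: dict[str, set[str]]) -> set[str]:
    """Smallest set of pathways that together contain every identified family
    that maps to some pathway (brute-force minimum set cover; toy inputs only)."""
    mappable = found & set().union(*pathways.values())
    names = sorted(pathways)
    # Try all pathway subsets in order of increasing size; the first cover wins.
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(pathways[n] for n in combo))
            if mappable <= covered:
                return set(combo)
    return set(names)  # not reached: all pathways together cover `mappable`


if __name__ == "__main__":
    found = {"K1", "K2", "K3", "K5"}
    print(sorted(naive_mapping(found, PATHWAYS)))
    # ['gluconeogenesis', 'glycolysis', 'pentose_phosphate', 'rare_pathway']
    print(sorted(minimal_pathways(found, PATHWAYS)))
    # ['glycolysis', 'pentose_phosphate']
```

With these hypothetical families, naive mapping reports all four pathways because shared families such as `K1` and `K2` trigger every pathway that contains them. The parsimonious answer keeps only two, which illustrates the inflation effect the abstract describes.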

Journal Article
TL;DR: The high duplication of CAZy domains coupled with the ability to acquire foreign genes by LGT may have allowed the bacterium to rapidly adapt to changing plant biomass-rich environments.
Abstract: Caldicellulosiruptor bescii DSM 6725 utilizes various polysaccharides and grows efficiently on untreated high-lignin grasses and hardwood at an optimum temperature of 80 °C. It is a promising anaerobic bacterium for studying high-temperature biomass conversion. Its genome contains 2666 protein-coding sequences organized into 1209 operons. Expression of 2196 genes (83%) was confirmed experimentally. At least 322 genes appear to have been obtained by lateral gene transfer (LGT). Putative functions were assigned to 364 conserved/hypothetical protein (C/HP) genes. The genome contains 171 and 88 genes related to carbohydrate transport and utilization, respectively. Growth on cellulose led to the up-regulation of 32 carbohydrate-active (CAZy), 61 sugar transport, 25 transcription factor and 234 C/HP genes. Some C/HPs were overproduced on cellulose or xylan, suggesting their involvement in polysaccharide conversion. A unique feature of the genome is enrichment with genes encoding multi-modular, multi-functional CAZy proteins organized into one large cluster, the products of which are proposed to act synergistically on different components of plant cell walls and to aid the ability of C. bescii to convert plant biomass. The high duplication of CAZy domains coupled with the ability to acquire foreign genes by LGT may have allowed the bacterium to rapidly adapt to changing plant biomass-rich environments.

112 citations

Journal Article
TL;DR: Comparisons with genes previously identified to be associated with diapause in the Dipteran Sarcophaga crassipalpis and with caste differentiation in bumble bees demonstrate an intriguing interplay between pathways underpinning adaptation to environmental extremes and the evolution of sociality in insects.
Abstract: Diapause is the key adaptation allowing insects to survive unfavourable conditions and inhabit an array of environments. Physiological changes during diapause are largely conserved across species and are hypothesized to be regulated by a conserved suite of genes (a 'toolkit'). Furthermore, it is hypothesized that in social insects, this toolkit was co-opted to mediate caste differentiation between long-lived, reproductive, diapause-capable queens and short-lived, sterile workers. Using Bombus terrestris queens, we examined the physiological and transcriptomic changes associated with diapause and CO2 treatment, which causes queens to bypass diapause. We performed comparative analyses with genes previously identified to be associated with diapause in the Dipteran Sarcophaga crassipalpis and with caste differentiation in bumble bees. As in Diptera, diapause in bumble bees is associated with physiological and transcriptional changes related to nutrient storage, stress resistance and core metabolic pathways. There is a significant overlap, both at the level of transcript and gene ontology, between the genetic mechanisms mediating diapause in B. terrestris and S. crassipalpis, reaffirming the existence of a conserved insect diapause genetic toolkit. However, a substantial proportion (10%) of the differentially regulated transcripts in diapausing queens have no clear orthologs in other species, and key players regulating diapause in Diptera (juvenile hormone and vitellogenin) appear to have distinct functions in bumble bees. We also found a substantial overlap between genes related to caste determination and diapause in bumble bees. Thus, our studies demonstrate an intriguing interplay between pathways underpinning adaptation to environmental extremes and the evolution of sociality in insects.

92 citations

Journal Article
TL;DR: High-throughput proteomic approaches based on a shotgun strategy, together with high-resolution mass spectrometers, have modified the authors' view of exoproteomes; the review describes how these new approaches should be exploited to obtain the maximum useful information from a sample, whatever its origin.
Abstract: The term 'exoproteome' describes the protein content that can be found in the extracellular proximity of a given biological system. These proteins arise from cellular secretion, other protein export mechanisms or cell lysis, but only the most stable proteins in this environment will remain in abundance. It has been shown that these proteins reflect the physiological state of the cells in a given condition and are indicators of how living systems interact with their environments. High-throughput proteomic approaches based on a shotgun strategy, and high-resolution mass spectrometers, have modified the authors' view of exoproteomes. In the present review, the authors describe how these new approaches should be exploited to obtain the maximum useful information from a sample, whatever its origin. The methodologies used for studying secretion from model cell lines derived from eukaryotic, multicellular organisms, virulence determinants of pathogens and environmental bacteria and their relationships with their habitats are illustrated with several examples. The implication of such data, in terms of proteogenomics and the discovery of novel protein functions, is discussed.

85 citations

Journal Article
TL;DR: This critical review discusses the complex nexus of chitin and chitinase and assesses both their pathogenic and utilitarian aspects.

69 citations