
Showing papers by "Walter R. Gilks" published in 2007


Journal Article
TL;DR: It is proposed that nematode genomes contain an alternative set of CNEs that share sequence characteristics, but not identity, with their vertebrate counterparts, and reflect the parallel evolution of alternative enhancers for a common set of developmental regulatory genes in different animal groups.
Abstract: Background: The human genome contains thousands of non-coding sequences that are often more conserved between vertebrate species than protein-coding exons. These highly conserved non-coding elements (CNEs) are associated with genes that coordinate development, and have been proposed to act as transcriptional enhancers. Despite their extreme sequence conservation in vertebrates, sequences homologous to CNEs have not been identified in invertebrates. Results: Here we report that nematode genomes contain an alternative set of CNEs that share sequence characteristics, but not identity, with their vertebrate counterparts. CNEs thus represent a very unusual class of sequences that are extremely conserved within specific animal lineages yet are highly divergent between lineages. Nematode CNEs are also associated with developmental regulatory genes, and include well-characterized enhancers and transcription factor binding sites, supporting the proposed function of CNEs as cis-regulatory elements. Most remarkably, 40 of 156 human CNE-associated genes with invertebrate orthologs are also associated with CNEs in both worms and flies. Conclusion: A core set of genes that regulate development is associated with CNEs across three animal groups (worms, flies and vertebrates). We propose that these CNEs reflect the parallel evolution of alternative enhancers for a common set of developmental regulatory genes in different animal groups. This 're-wiring' of gene regulatory networks containing key developmental coordinators was probably a driving force during the evolution of animal body plans. CNEs may, therefore, represent the genomic traces of these 'hard-wired' core gene regulatory networks that specify the development of each alternative animal body plan.

119 citations


Journal Article
TL;DR: Using a previously developed automated method for enzyme annotation, the authors re-annotate the ENZYME database, analyse local error rates per class, and demonstrate that the method correctly re-annotates 91% of all Enzyme Classification (EC) classes with high coverage.
Abstract: Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: http://www.genomes.org/services/corrie/ .

20 citations
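
The class counts quoted in the abstract above can be checked with a few lines of arithmetic. This is only a consistency check on the reported figures; the variable names are illustrative and do not come from the paper.

```python
# Counts as reported in the abstract (ENZYME re-annotation control experiment).
total_classes = 827          # EC classes considered
correct = 755                # correctly re-annotated classes
with_false_positives = 44    # classes containing false positives
not_represented = 28         # classes not represented

# The three groups should partition the full set of classes.
assert correct + with_false_positives + not_represented == total_classes

fraction_correct = correct / total_classes
print(f"{fraction_correct:.1%} of EC classes correctly re-annotated")
```

The fraction rounds to the 91% stated in the abstract.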


Journal Article
TL;DR: An inventory is collected, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may enlighten the process, and what is currently occurring in these areas is briefly reviewed.
Abstract: This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing; therefore all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may enlighten the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas: gene regulatory network construction; evolution of regulatory DNA; studies of its structural and statistical informational properties; and finally, regulatory RNA.

9 citations


Journal Article
TL;DR: A pronounced information pattern around CNE borders is found; although the CNEs themselves are AT rich and have high entropy, they are flanked by GC-rich regions of low entropy (complexity). As in human promoter regions, the TBP, NF-Y and some other binding motifs are clustered around CNE boundaries, which may suggest a possible transcription regulatory function of CNEs.
Abstract: Recently, a set of highly conserved non-coding elements (CNEs) has been derived from a comparison between the genomes of the puffer fish, Takifugu or Fugu rubripes, and man. In order to facilitate the identification of these conserved elements in silico, we characterize them by a number of statistical features. We found a pronounced information pattern around CNE borders; although the CNEs themselves are AT rich and have high entropy (complexity), they are flanked by GC-rich regions of low entropy (complexity). We also identified the most abundant motifs within and around CNEs, and identified those that group around their borders. As in human promoter regions, the TBP, NF-Y and some other binding motifs are clustered around CNE boundaries, which may suggest a possible transcription regulatory function of CNEs.

7 citations
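
The kind of profile described in the abstract above (base composition and entropy along a sequence) can be sketched with a sliding window. This is a minimal illustration under stated assumptions: it uses the Shannon entropy of single-nucleotide composition, which may differ from the entropy or complexity measure used in the paper, and the function names, window size and step are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the single-nucleotide composition.

    Assumption: the paper's entropy/complexity measure may be defined
    differently; this is the simplest composition-based variant.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


def window_profile(seq: str, window: int = 20, step: int = 10):
    """Return (start, GC fraction, entropy) for each window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, gc_fraction(chunk), shannon_entropy(chunk)))
    return profile


if __name__ == "__main__":
    # Toy sequence, made up for illustration only.
    toy = "GGGGCGGGGCGGGGGCGGGG" + "ATTACGATTAGCATATTGCA" + "CCCCGCCCCCGCCCCGCCCC"
    for start, gc, h in window_profile(toy, window=20, step=20):
        print(f"start={start:3d}  GC={gc:.2f}  entropy={h:.2f} bits")
```

Plotting such a profile across an element and its flanks is one way to look for the border pattern the abstract reports.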


Journal Article
TL;DR: An efficient algorithm for the inversion of covariance matrices that arise in the context of phylogenetic tree construction is described and it is shown how under these assumptions the covariance tensor for a tree with n leaves can be inverted in O(n^2) operations.
Abstract: We describe an efficient algorithm for the inversion of covariance matrices that arise in the context of phylogenetic tree construction. Phylogenetic trees describe the evolutionary relationships between species, and their construction is computationally demanding. Many approaches involve the symmetric matrix of evolutionary distances between species. Regarding these distances as random variables, the corresponding set of variances and covariances form a rank-4 tensor, and the inner product defined by its inverse can be used to assign statistical scores to candidate trees. We describe a natural set of assumptions for the phylogenetic tree under construction, and show how under these assumptions the covariance tensor for a tree with n leaves can be inverted in O(n^2) operations. In addition to presenting the inversion algorithm, we hope this article will open algebraic and computational problems from the field of phylogeny to a wider audience.

1 citation
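
To make the scoring idea in the abstract above concrete, the sketch below evaluates the inner product defined by the inverse covariance using a naive dense solve. It is a baseline for comparison, not the paper's algorithm: flattening the n(n-1)/2 pairwise distances into a vector and solving the dense system costs far more than the O(n^2) operations the paper reports under its assumptions. The helper names `pair_index` and `gls_score` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pair_index(n: int):
    """List the unordered pairs (i, j), i < j, in a fixed order.

    Flattening pairs this way turns the rank-4 covariance tensor into an
    m-by-m matrix with m = n * (n - 1) / 2.
    """
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def gls_score(observed: np.ndarray, fitted: np.ndarray, cov: np.ndarray) -> float:
    """Quadratic form r^T C^{-1} r for residuals r = observed - fitted.

    observed, fitted: symmetric n-by-n distance matrices.
    cov: m-by-m covariance matrix of the flattened pairwise distances.
    Naive dense solve; NOT the efficient inversion described in the paper.
    """
    pairs = pair_index(observed.shape[0])
    r = np.array([observed[i, j] - fitted[i, j] for i, j in pairs])
    return float(r @ np.linalg.solve(cov, r))


if __name__ == "__main__":
    # Toy 3-leaf example with made-up distances and independent unit variances.
    observed = np.array([[0.0, 1.0, 2.0],
                         [1.0, 0.0, 3.0],
                         [2.0, 3.0, 0.0]])
    fitted = np.zeros((3, 3))
    print(gls_score(observed, fitted, np.eye(3)))
```

A lower score indicates that the candidate tree's fitted distances agree better with the observed ones, weighted by the assumed covariance structure.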


Journal Article
TL;DR: In this paper, the mapping of sequences directly to a functional classification, instead of mapping functions to a sequence clustering, is considered; Correspondence Indicators are defined as measures of the relationship between sequence and function, and two Bayesian approaches are formulated to estimate the probability that a sequence of unknown function belongs to a functional class.
Abstract: BACKGROUND: One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on the detection of sequence similarity and the premise that functional properties are conserved during evolution. Most automatic approaches developed to date rely on the identification of clusters of homologous proteins and the mapping of new proteins onto these clusters, which are expected to share functional characteristics. RESULTS: Here, we invert the logic of this process, by considering the mapping of sequences directly to a functional classification instead of mapping functions to a sequence clustering. In this mode, the starting point is a database of proteins labelled according to a functional classification scheme, and the subsequent use of sequence similarity allows defining the membership of new proteins to these functional classes. In this framework, we define the Correspondence Indicators as measures of relationship between sequence and function and further formulate two Bayesian approaches to estimate the probability for a sequence of unknown function to belong to a functional class. This approach allows the parametrisation of different sequence search strategies and provides a direct measure of annotation error rates. We validate this approach with a database of enzymes labelled by their corresponding four-digit EC numbers and analyse specific cases. CONCLUSION: The performance of this method is significantly higher than the simple strategy consisting of transferring the annotation from the highest scoring BLAST match and is expected to find applications in automated functional annotation pipelines.

1 citation
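
The general shape of a Bayesian class-membership estimate, as mentioned in the abstract above, can be sketched with Bayes' rule. This is a generic illustration only: it does not reproduce the paper's Correspondence Indicators or either of its two Bayesian formulations. The function name `class_posterior` is hypothetical, and the priors and likelihoods are made-up toy values.

```python
def class_posterior(priors: dict, likelihoods: dict) -> dict:
    """Posterior P(class | evidence) via Bayes' rule.

    priors: P(class), e.g. class frequencies in a labelled database.
    likelihoods: P(evidence | class), e.g. the probability of the observed
        sequence-similarity evidence given each class (assumed supplied).
    """
    unnormalised = {c: priors[c] * likelihoods.get(c, 0.0) for c in priors}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("evidence has zero probability under every class")
    return {c: v / total for c, v in unnormalised.items()}


if __name__ == "__main__":
    # Toy values for two four-digit EC classes; the numbers are invented.
    priors = {"1.1.1.1": 0.5, "2.7.1.1": 0.5}
    likelihoods = {"1.1.1.1": 0.8, "2.7.1.1": 0.2}
    posterior = class_posterior(priors, likelihoods)
    best = max(posterior, key=posterior.get)
    # Under this toy model, 1 - posterior[best] is the probability that
    # assigning the query to the best class is wrong.
    print(best, posterior[best], 1.0 - posterior[best])
```

Reporting the posterior alongside the assigned class gives a per-annotation error estimate, which is the kind of quantity the abstract contrasts with simply transferring the top BLAST match.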