
Showing papers by "Douglas B. Kell" published in 2005


Journal ArticleDOI
TL;DR: In this article, the authors review the battery of techniques available for validating clustering results, with a particular focus on their application to post-genomic data analysis.
Abstract: Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics. Results: This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation. Availability: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/ Contact: J.Handl@postgrad.manchester.ac.uk Supplementary information: Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/

884 citations
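
One widely used index from this battery is the silhouette width; the sketch below (assuming scikit-learn, with synthetic data standing in for post-genomic measurements) shows how such an index can guide the choice of the number of clusters rather than relying on visual inspection alone.

```python
# Choose the number of clusters k by mean silhouette width instead of by eye.
# Synthetic data; a minimal sketch, not the review's own experiments.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=0)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean silhouette width in [-1, 1]

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k} (score {scores[best_k]:.3f})")
```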


Journal ArticleDOI
TL;DR: This article reviews the principles, experimental approaches and scientific outcomes obtained with metabolic footprinting, a useful and convenient strategy for studying the inner structure and behaviour of a system.
Abstract: One element of classical systems analysis treats a system as a black or grey box, the inner structure and behaviour of which can be analysed and modelled by varying an internal or external condition, probing it from outside and studying the effect of the variation on the external observables. The result is an understanding of the inner make-up and workings of the system. The equivalent of this in biology is to observe what a cell or system excretes under controlled conditions - the 'metabolic footprint' or exometabolome - as this is readily and accurately measurable. Here, we review the principles, experimental approaches and scientific outcomes that have been obtained with this useful and convenient strategy.

432 citations


Journal ArticleDOI
TL;DR: A comprehensive comparison of total metabolites in field-grown GM and conventional potato tubers, using a hierarchical approach that begins with rapid metabolome "fingerprinting" to guide more detailed profiling of metabolites where significant differences are suspected.
Abstract: There is current debate whether genetically modified (GM) plants might contain unexpected, potentially undesirable changes in overall metabolite composition. However, appropriate analytical technology and acceptable metrics of compositional similarity require development. We describe a comprehensive comparison of total metabolites in field-grown GM and conventional potato tubers using a hierarchical approach initiating with rapid metabolome “fingerprinting” to guide more detailed profiling of metabolites where significant differences are suspected. Central to this strategy are data analysis procedures able to generate validated, reproducible metrics of comparison from complex metabolome data. We show that, apart from targeted changes, the GM potatoes in this study appear substantially equivalent to traditional cultivars.

371 citations
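
A hedged sketch of the "fingerprint first, profile where significant" logic: screen every peak for a GM-versus-conventional difference and control the false-discovery rate before committing to detailed profiling. The data, effect sizes and the Benjamini-Hochberg correction used here are illustrative assumptions, not the authors' actual procedure.

```python
# Screen metabolite peaks for group differences; flag FDR-significant ones
# for follow-up profiling. Entirely synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_peaks, n_reps = 200, 8
gm = rng.normal(0.0, 1.0, size=(n_peaks, n_reps))
conv = rng.normal(0.0, 1.0, size=(n_peaks, n_reps))
gm[:3] += 3.0  # a few strongly altered peaks

t, p = stats.ttest_ind(gm, conv, axis=1)

# Benjamini-Hochberg step-up at q = 0.05
order = np.argsort(p)
m = len(p)
thresh = 0.05 * np.arange(1, m + 1) / m
passed = p[order] <= thresh
flagged = order[: passed.nonzero()[0].max() + 1] if passed.any() else []
print(f"peaks flagged for detailed profiling: {sorted(flagged)}")
```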


Journal ArticleDOI
TL;DR: This short tutorial review and position paper seeks to set out some of the elements of “best practice” in the optimal acquisition of biological variables, and in the means by which they may be turned into reliable knowledge.
Abstract: Metabolomics, like other omics methods, produces huge datasets of biological variables, often accompanied by the necessary metadata. However, regardless of the form in which these are produced they are merely the ground substance for assisting us in answering biological questions. In this short tutorial review and position paper we seek to set out some of the elements of "best practice" in the optimal acquisition of such data, and in the means by which they may be turned into reliable knowledge. Many of these steps involve the solution of what amount to combinatorial optimization problems, and methods developed for these, especially those based on evolutionary computing, are proving valuable. This is done in terms of a "pipeline" that goes from the design of good experiments, through instrumental optimization, data storage and manipulation, the chemometric data processing methods in common use, and the necessary means of validation and cross-validation for giving conclusions that are credible and likely to be robust when applied in comparable circumstances to samples not used in their generation.

159 citations
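
The validation step this pipeline insists on can be illustrated in a few lines: a model fitted to many more variables than samples looks perfect on its training data, and only held-out data reveals that it has learned nothing. A minimal sketch assuming scikit-learn, with random "spectra" that carry no class signal at all.

```python
# Overfitting demo: 60 samples, 500 variables, labels with no information.
# Training accuracy is typically near 1.0; hold-out accuracy is near chance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 500))      # 60 samples, 500 "spectral" variables
y = rng.integers(0, 2, size=60)     # labels carry no signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print(f"training accuracy: {model.score(X_tr, y_tr):.2f}")   # looks impressive
print(f"hold-out accuracy: {model.score(X_te, y_te):.2f}")   # ~0.5, i.e. chance
```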


Journal ArticleDOI
TL;DR: In this article, the authors describe an entirely automated (closed-loop) strategy for optimizing instrumental parameters, and apply it to the optimization of gas chromatographic separations of the metabolomes of human serum and of yeast fermentation broths.
Abstract: The number of instrumental parameters controlling modern analytical apparatus can be substantial, and varying them systematically to optimize a particular chromatographic separation, for example, is out of the question because of the astronomical number of combinations that are possible (i.e., the “search space” is very large). However, heuristic methods, such as those based on evolutionary computing, can be used to explore such search spaces efficiently. We here describe the implementation of an entirely automated (closed-loop) strategy for doing this and apply it to the optimization of gas chromatographic separations of the metabolomes of human serum and of yeast fermentation broths. Without human intervention, the Robot Chromatographer system (i) initializes the settings on the instrument, (ii) controls the analytical run, (iii) extracts the variables defining the analytical performance (specifically the number of peaks, signal/noise ratio, and run time), (iv) chooses (via the PESA-II multiobjective genetic algorithm) ...

140 citations
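
A sketch of the closed-loop idea, not the authors' Robot Chromatographer: propose settings, "run" them, score the run on several objectives, and keep only the non-dominated settings. For brevity, random proposals stand in for PESA-II's evolutionary search, and simulate() is a made-up surrogate for a real analytical run.

```python
# Closed-loop multiobjective optimization with a Pareto archive.
import random

def simulate(temp_ramp, flow):
    """Hypothetical instrument response: (peaks, signal/noise, run time)."""
    peaks = 100 - abs(temp_ramp - 12) * 4 + random.uniform(-3, 3)
    snr = 50 - abs(flow - 1.2) * 20 + random.uniform(-2, 2)
    run_time = 60 - temp_ramp * 2          # faster ramps shorten the run
    return peaks, snr, -run_time           # negate so "bigger is better"

def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

archive = []  # (settings, objectives) pairs on the current Pareto front
for _ in range(200):                       # the closed loop
    settings = (random.uniform(2, 20), random.uniform(0.5, 2.0))
    objs = simulate(*settings)
    if not any(dominates(o, objs) for _, o in archive):
        archive = [(s, o) for s, o in archive if not dominates(objs, o)]
        archive.append((settings, objs))

print(f"{len(archive)} non-dominated settings found")
```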


Journal ArticleDOI
TL;DR: Genetic programming, a powerful data-mining method, is exploited to identify patterns of metabolites that distinguish plasma from patients with pre-eclampsia from that of healthy, matched controls; identification of the metabolites involved may lead to an improved understanding of the aetiological basis of pre-eclampsia and thus the development of targeted therapies.
Abstract: Pre-eclampsia is a multi-system disorder of pregnancy with major maternal and perinatal implications. Emerging therapeutic strategies are most likely to be maximally effective if commenced weeks or even months prior to the clinical presentation of the disease. Although widespread plasma alterations precede the clinical onset of pre-eclampsia, no single plasma constituent has emerged as a sensitive or specific predictor of risk. Consequently, currently available methods of identifying the condition prior to clinical presentation are of limited clinical use. We have exploited genetic programming, a powerful data mining method, to identify patterns of metabolites that distinguish plasma taken from patients with pre-eclampsia from that taken from healthy, matched controls. High-resolution gas chromatography time-of-flight mass spectrometry (GC-tof-MS) was performed on 87 plasma samples from women with pre-eclampsia and 87 matched controls. Normalised peak intensity data were fed into the Genetic Programming (GP) system, which was set up to produce a model that gave an output of 1 for patients and 0 for controls. The model was trained on 50% of the data generated and tested on a separate hold-out set of 50%. The model generated by GP from the GC-tof-MS data identified a metabolomic pattern that could be used to produce two simple rules that together discriminate pre-eclampsia from normal pregnant controls using just 3 of the metabolite peak variables, with a sensitivity of 100% and a specificity of 98%. Thus, pre-eclampsia can be diagnosed at the level of small-molecule metabolism in blood plasma. These findings justify a prospective assessment of metabolomic technology as a screening tool for pre-eclampsia, while identification of the metabolites involved may lead to an improved understanding of the aetiological basis of pre-eclampsia and thus the development of targeted therapies.

124 citations
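
How such figures are computed can be sketched briefly: train on half the samples, apply the learned rule to the held-out half, and report sensitivity and specificity. The single-peak threshold rule below, and the numbers it produces, are illustrative stand-ins for the paper's genetic-programming model and its 100%/98% result.

```python
# Hold-out evaluation of a simple threshold classifier; synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
cases = rng.normal(2.0, 1.0, size=87)     # one informative peak, elevated in cases
controls = rng.normal(0.0, 1.0, size=87)
x = np.concatenate([cases, controls])
y = np.concatenate([np.ones(87), np.zeros(87)])  # 1 = pre-eclampsia

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=2)
# hypothetical rule: midpoint between the training class means
threshold = (x_tr[y_tr == 1].mean() + x_tr[y_tr == 0].mean()) / 2
pred = (x_te >= threshold).astype(float)

tp = ((pred == 1) & (y_te == 1)).sum()
fn = ((pred == 0) & (y_te == 1)).sum()
tn = ((pred == 0) & (y_te == 0)).sum()
fp = ((pred == 1) & (y_te == 0)).sum()
print(f"sensitivity: {tp / (tp + fn):.2f}, specificity: {tn / (tn + fp):.2f}")
```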


Journal ArticleDOI
TL;DR: This paper generalises a previously described model of the error-prone polymerase chain reaction (PCR) to conditions of arbitrarily variable amplification efficiency and initial population size, and restricts the extent to which the model may explore sequence space for a prescribed set of parameters.

122 citations
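
A toy simulation of the kind of process being modelled: per-cycle amplification with a per-base error rate, with both the amplification efficiency and the initial population size free to vary. The parameter values are illustrative assumptions, not the paper's model.

```python
# Error-prone PCR toy: mutations accumulate as the pool is amplified.
import numpy as np

rng = np.random.default_rng(3)
seq_len, n_cycles, error_rate = 300, 15, 1e-3
mutations = np.zeros(10, dtype=int)       # initial population of 10 templates

for cycle in range(n_cycles):
    efficiency = rng.uniform(0.6, 0.9)    # variable amplification efficiency
    parents = mutations[rng.random(mutations.size) < efficiency]
    new_errors = rng.binomial(seq_len, error_rate, size=parents.size)
    mutations = np.concatenate([mutations, parents + new_errors])

print(f"pool size: {mutations.size}, mean mutations/molecule: {mutations.mean():.2f}")
```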


Journal ArticleDOI
TL;DR: This study assesses the utility of vacuum-packed polythene bags as a convenient, flexible and cost-effective alternative to fixed-volume glass vessels for lab-scale silage studies.
Abstract: Aims: To determine the utility of vacuum-packed polythene bags as a convenient, flexible and cost-effective alternative to fixed-volume glass vessels for lab-scale silage studies. Methods and Results: Using perennial ryegrass or red clover forage, similar fermentations (as assessed by pH measurement) occurred in glass-tube and vacuum-packed silos over a 35-day period. As vacuum-packing devices allow modification of initial packing density, the effect of four different settings (initial packing densities of 0.397, 0.435, 0.492 and 0.534 g cm⁻³) on the silage fermentation over 16 days was examined. Significant differences in pH decline and lactate accumulation were observed at different vacuum settings. Gas accumulation was apparent within all bags, and changes in bag volume with time were observed to vary according to initial packing density. Conclusions: Vacuum-packed silos do provide a realistic model system for lab-scale silage fermentations. Significance and Impact of the Study: Use of vacuum-packed silos holds potential for lab-scale evaluations of silage fermentations, allowing higher throughput of samples, more consistent packing, as well as the possibility of investigating the effects of different initial packing densities and the use of different wrapping materials.

70 citations


01 Jan 2005
TL;DR: The authors' role in part is to explain why this type of mathematical model is both useful and important, and why it will likely become part of the standard armory of successful biologists.
Abstract: The use of models in biology is at once both familiar and arcane. It is familiar because, as we shall argue, biologists presently and regularly use models as abstractions of reality: diagrams, laws, graphs, plots, relationships, chemical formulae and so on are all essentially models of some external reality that we are trying to describe and understand (Fig. 1.1). In the same way we use and speak of ‘model organisms’ such as baker’s yeast or Arabidopsis thaliana, whose role lies in being similar to many organisms without being the same as any other one. Indeed, our theories and hypotheses about biological objects and systems are in one sense also just models. Yet the use of models is for most biologists arcane because familiarity with a subset of model types, especially quantitative mathematical models, has lain outside the mainstream during the last 50 years of the purposely reductionist and qualitative era of molecular biology. It is largely these types of model that are an integral part of the ‘new’ (and not-so-new) Systems Biology and on which much of the rest of this book concentrates. Since all such models are developed for some kind of a purpose, our role in part is to explain why this type of mathematical model is both useful and important, and will likely become part of the standard armory of successful biologists.

61 citations


Journal ArticleDOI
27 Dec 2005
TL;DR: Analysis of dynamic models of part of the NF-kappaB signalling pathway reveals a level of complexity that is not apparent from study of their individual parameters alone and points to the value of manipulating multiple elements of complex networks to achieve desired physiological effects.
Abstract: In previous work, we studied the behaviour of a model of part of the NF-kappaB signalling pathway. The model displayed oscillations that varied in number, amplitude and frequency when its parameters were varied. Sensitivity analysis showed that just nine of the 64 reaction parameters were mainly responsible for the control of the oscillations when these parameters were varied individually. However, the control of the properties of any complex system is distributed, and, as many of these reactions are highly non-linear, we expect that their interactions will be too. Pairwise modulation of these nine parameters gives a search space some 50 times smaller (81 against 4096) than that required for the pairwise modulation of all 64 reactions, and this permitted their study (which would otherwise have been effectively intractable). Strikingly synergistic effects were observed, in which the effect of one of the parameters was strongly (and even qualitatively) dependent on the values of another parameter. Regions of parameter space could be found in which the amplitude, but not the frequency (timing), of oscillations varied, and vice versa. Such modelling will permit the design and performance of experiments aimed at disentangling the role of the dynamics of oscillations, rather than simply their amplitude, in determining cell fate. Overall, the analyses reveal a level of complexity in these dynamic models that is not apparent from study of their individual parameters alone and point to the value of manipulating multiple elements of complex networks to achieve desired physiological effects.

57 citations
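
The combinatorial saving, and the kind of pairwise scan involved, can be sketched with a toy model: 9 × 9 = 81 pairwise settings for the key parameters against 64 × 64 = 4096 for all of them. The damped oscillator below is a deliberately crude stand-in for the NF-kappaB model; it reproduces the qualitative point that one parameter can move amplitude while leaving frequency untouched.

```python
# Pairwise parameter scan over fold-changes of two parameters of a toy model.
import itertools
import numpy as np

print(9 ** 2, "pairwise settings for the 9 key parameters")    # 81
print(64 ** 2, "pairwise settings for all 64 parameters")      # 4096

def oscillation(k_freq, k_damp):
    """Toy damped oscillator: k_freq sets frequency, k_damp sets amplitude decay."""
    t = np.linspace(0, 10, 1000)
    x = np.exp(-k_damp * t) * np.sin(2 * np.pi * k_freq * t)
    return x.max(), k_freq  # (amplitude proxy, frequency)

folds = [0.5, 1.0, 2.0]  # fold-changes applied to each parameter of a pair
for f1, f2 in itertools.product(folds, repeat=2):
    amp, freq = oscillation(1.0 * f1, 0.3 * f2)
    print(f"k_freq x{f1}, k_damp x{f2} -> amplitude {amp:.2f}, frequency {freq:.2f} Hz")
```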


Journal ArticleDOI
TL;DR: The authors highlight closed-loop machine learning and its use in the optimization of scientific instrumentation, the ability to obtain high-quality, quasi-continuous optical images of cells, and the finding that many biological signals may be frequency- rather than amplitude-encoded.
Abstract: In answering the question 'Systems Biology--will it work?' (which it self-evidently has already), it is appropriate to highlight advances in philosophy, in new technique development and in novel findings. In terms of philosophy, we see that systems biology involves an iterative interplay between linked activities--for instance, between theory and experiment, between induction and deduction and between measurements of parameters and variables--with more emphasis than has perhaps been common now being focused on the first in each of these pairs. In technique development, we highlight closed-loop machine learning and its use in the optimization of scientific instrumentation, and the ability to effect high-quality and quasi-continuous optical images of cells. This leads to many important and novel findings. In the first case, these may involve new biomarkers for disease, whereas in the second case, we have determined that many biological signals may be frequency- rather than amplitude-encoded. This leads to a very different view of how signalling 'works' (equations such as that of Michaelis and Menten, which use only amplitudes, i.e. concentrations, are inadequate descriptors), lays emphasis on the signal-processing network elements that lie 'downstream' of what are traditionally considered the signals, and allows one simply to understand how cross-talk may be avoided between pathways which nevertheless use common signalling elements. The language of cells is much richer than we had supposed, and we are now well placed to decode it.
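
The frequency-versus-amplitude point admits a compact illustration: two signals of identical amplitude but different frequency are indistinguishable to an amplitude readout, yet trivially separated in the frequency domain. A sketch with synthetic signals, assuming only numpy.

```python
# Frequency- vs amplitude-encoding: same amplitude, different spectra.
import numpy as np

t = np.linspace(0, 10, 2000)
slow = np.sin(2 * np.pi * 0.5 * t)   # amplitude 1, 0.5 Hz
fast = np.sin(2 * np.pi * 2.0 * t)   # amplitude 1, 2.0 Hz

for name, sig in [("slow", slow), ("fast", fast)]:
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
    peak = freqs[spectrum.argmax()]
    print(f"{name}: amplitude {sig.max():.2f}, dominant frequency {peak:.2f} Hz")
```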


Journal ArticleDOI
TL;DR: maxdLoad2 and maxdBrowse are portable, compatible with all common operating systems and major database servers, and provide a powerful, flexible package for the annotation of microarray experiments and a convenient dissemination environment.
Abstract: maxdLoad2 is a relational database schema and Java® application for microarray experimental annotation and storage. It is compliant with all standards for microarray meta-data capture, including the specification of what data should be recorded, extensive use of standard ontologies and support for data exchange formats. The output from maxdLoad2 is of a form acceptable for submission to the ArrayExpress microarray repository at the European Bioinformatics Institute. maxdBrowse is a PHP web-application that makes the contents of maxdLoad2 databases accessible via web-browser, the command line and web-service environments. It thus acts as both a dissemination and data-mining tool.

Journal ArticleDOI
TL;DR: A simple approach to the screening of metabolic information that will be valuable in generating metabolomic data is demonstrated; the technique was subsequently used to generate metabolic footprints from cell-free supernatants and enabled the discrimination of haploid yeast single-gene deletants (mutants).
Abstract: The importance of metabolomic data in functional genomic investigations is increasingly becoming evident, as is its utility in novel biomarker discovery. We demonstrate a simple approach to the screening of metabolic information that we believe will be valuable in generating metabolomic data. Laser desorption ionisation mass spectrometry on porous silicon was effective in detecting 22 of 30 metabolites in a mixture in the negative-ion mode and 19 of 30 metabolites in the positive-ion mode, without the employment of any prior analyte separation steps. Overall, 26 of the 30 metabolites could be covered between the positive and negative-ion modes. Although the response for the metabolites at a given concentration differed, it was possible to generate direct quantitative information for a given analyte in the mixture. This technique was subsequently used to generate metabolic footprints from cell-free supernatants and, when combined with chemometric analysis, enabled us to discriminate haploid yeast single-gene deletants (mutants). In particular, the metabolic footprint of a deletion mutant in a gene encoding a transcriptional activator (Gln3p) showed increased levels of peaks, including one corresponding to glutamate, compared to the other mutants and the wild-type strain tested, enabling its discrimination based on metabolic information.
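
The chemometric step can be sketched generically: project each footprint (a vector of peak intensities) into a low-dimensional space where a mutant with altered excretion separates from the wild type. Synthetic intensities and a plain PCA, assuming scikit-learn; not the authors' exact analysis.

```python
# Discriminating footprints by PCA; one peak (e.g. glutamate) is elevated.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
wild_type = rng.normal(1.0, 0.1, size=(10, 30))   # 10 replicates, 30 peaks
mutant = rng.normal(1.0, 0.1, size=(10, 30))
mutant[:, 5] += 1.0                               # hypothetical elevated peak

X = np.vstack([wild_type, mutant])
scores = PCA(n_components=2).fit_transform(X)
print("PC1 wild-type mean:", scores[:10, 0].mean().round(2))
print("PC1 mutant mean:  ", scores[10:, 0].mean().round(2))
```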

Journal ArticleDOI
01 Apr 2005 - Science
TL;DR: In this paper, the experimental data showed no correlation between NF-kappaB (RelA) expression level and oscillation dynamics, and a small change to the computational model used by Barken et al. to generate their theoretical data reduced the apparent discrepancies.
Abstract: Our experimental data show no correlation between NF-kappaB (RelA) expression level and oscillation dynamics. We show that a small change to the computational model used by Barken et al. to generate their theoretical data reduces the apparent discrepancies. Cell-system differences, and possible compensatory changes to normal signalling in their genetically engineered knockout cells, may explain differences between the two studies.
