
Showing papers by "Douglas B. Kell published in 2007"


Journal ArticleDOI
TL;DR: A list of some of the simpler checks that might improve one’s confidence that a candidate biomarker is not simply a statistical artefact is provided, and a series of preferred tests and visualisation tools that can assist readers and authors in assessing papers are suggested.
Abstract: Many metabolomics, and other high-content or high-throughput, experiments are set up such that the primary aim is the discovery of biomarker metabolites that can discriminate, with a certain level of certainty, between nominally matched ‘case’ and ‘control’ samples. However, it is unfortunately very easy to find markers that are apparently persuasive but that are in fact entirely spurious, and there are well-known examples in the proteomics literature. The main types of danger are not entirely independent of each other, but include bias, inadequate sample size (especially relative to the number of metabolite variables and to the required statistical power to prove that a biomarker is discriminant), excessive false discovery rate due to multiple hypothesis testing, inappropriate choice of particular numerical methods, and overfitting (generally caused by the failure to perform adequate validation and cross-validation). Many studies fail to take these into account, and thereby fail to discover anything of true significance (despite their claims). We summarise these problems, and provide pointers to a substantial existing literature that should assist in the improved design and evaluation of metabolomics experiments, thereby allowing robust scientific conclusions to be drawn from the available data. We provide a list of some of the simpler checks that might improve one’s confidence that a candidate biomarker is not simply a statistical artefact, and suggest a series of preferred tests and visualisation tools that can assist readers and authors in assessing papers. These tools can be applied to individual metabolites by using multiple univariate tests performed in parallel across all metabolite peaks. They may also be applied to the validation of multivariate models. We stress in particular that classical p-values such as “p < 0.05”, which are often used in biomedicine, are far too optimistic when multiple tests are done simultaneously (as in metabolomics).
Ultimately it is desirable that all data and metadata are available electronically, as this allows the entire community to assess conclusions drawn from them. These analyses apply to all high-dimensional ‘omics’ datasets.
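The multiple-testing point above can be sketched in a few lines of Python. This is an illustrative aside, not the paper's own procedure: invented p-values for 100 hypothetical metabolite peaks are filtered with a raw p < 0.05 cut-off and then with a Benjamini-Hochberg false-discovery-rate correction, showing how many apparent "discoveries" the naive threshold admits.

```python
# Illustrative sketch (not from the paper): raw "p < 0.05" versus a
# Benjamini-Hochberg false-discovery-rate correction when many metabolite
# peaks are tested simultaneously. All p-values below are invented.

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean 'significant' flag per p-value at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    threshold_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: largest rank whose p-value is under alpha*rank/m.
        if pvals[i] <= alpha * rank / m:
            threshold_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= threshold_rank:
            significant[i] = True
    return significant

# 100 hypothetical peak p-values: two genuinely small, the rest spread out.
pvals = [0.0001, 0.0004] + [0.01 + 0.01 * k for k in range(98)]
naive = sum(p < 0.05 for p in pvals)    # raw threshold
fdr = sum(benjamini_hochberg(pvals))    # FDR-controlled
print(naive, fdr)
```

With these invented values the raw threshold flags 6 peaks, while the FDR-controlled procedure keeps only the 2 genuinely small p-values.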

747 citations


Journal ArticleDOI
TL;DR: The goal of this group is to define the reporting requirements associated with the statistical analysis of metabolite data with respect to other measured/collected experimental data (often called meta-data).
Abstract: The goal of this group is to define the reporting requirements associated with the statistical analysis (including univariate, multivariate, informatics, machine learning etc.) of metabolite data with respect to other measured/collected experimental data (often called meta-data). These definitions will embrace as many aspects of a complete metabolomics study as possible at this time. In chronological order this will include: Experimental Design, both in terms of sample collection/matching, and data acquisition scheduling of samples through whichever spectroscopic technology used; Deconvolution (if required); Pre-processing, for example, data cleaning, outlier detection, row/column scaling, or other transformations; Definition and parameterization of subsequent visualizations and Statistical/Machine learning Methods applied to the dataset; If required, a clear definition of the Model Validation Scheme used (including how data are split into training/validation/test sets); Formal indication on whether the data analysis has been Independently Tested (either by experimental reproduction, or blind hold out test set). Finally, data interpretation and the visual representations and hypotheses obtained from the data analyses.

397 citations


Journal ArticleDOI
TL;DR: This work constitutes a first comprehensive systems biology study on growth-rate control in the eukaryotic cell and has direct implications for advanced studies on cell growth, in vivo regulation of metabolic fluxes for comprehensive metabolic engineering, and for the design of genome-scale systems biology models of the eukaryotic cell.
Abstract: Background: Cell growth underlies many key cellular and developmental processes, yet a limited number of studies have been carried out on cell-growth regulation. Comprehensive studies at the transcriptional, proteomic and metabolic levels under defined controlled conditions are currently lacking. Results: Metabolic control analysis is being exploited in a systems biology study of the eukaryotic cell. Using chemostat culture, we have measured the impact of changes in flux (growth rate) on the transcriptome, proteome, endometabolome and exometabolome of the yeast Saccharomyces cerevisiae. Each functional genomic level shows clear growth-rate-associated trends and discriminates between carbon-sufficient and carbon-limited conditions. Genes consistently and significantly upregulated with increasing growth rate are frequently

289 citations


Journal ArticleDOI
TL;DR: This paper reviews the application of multiobjective optimization in the fields of bioinformatics and computational biology and identifies five distinct "contexts," giving rise to multiple objectives.
Abstract: This paper reviews the application of multiobjective optimization in the fields of bioinformatics and computational biology. A survey of existing work, organized by application area, forms the main body of the review, following an introduction to the key concepts in multiobjective optimization. An original contribution of the review is the identification of five distinct "contexts" giving rise to multiple objectives: these are used to explain the reasons behind the use of multiobjective optimization in each application area and also to point the way to potential future uses of the technique.

274 citations


Journal ArticleDOI
TL;DR: The findings demonstrate the power of data-driven metabolomics approaches to identify biomarkers of heart failure, rather than of renal disease per se; three further compounds were also excellent discriminators between patients and controls.
Abstract: There is intense interest in the identification of novel biomarkers which improve the diagnosis of heart failure. Serum samples from 52 patients with systolic heart failure (EF <40% plus signs and symptoms of failure) and 57 controls were analyzed by gas chromatography-time-of-flight mass spectrometry and the raw data reduced to 272 statistically robust metabolite peaks. 38 peaks showed a significant difference between case and control (p < 5·10⁻⁵). Two such metabolites were pseudouridine, a modified nucleotide present in t- and rRNA and a marker of cell turnover, and the tricarboxylic acid cycle intermediate 2-oxoglutarate. Furthermore, three additional compounds were also excellent discriminators between patients and controls: 2-hydroxy-2-methylpropanoic acid, erythritol and 2,4,6-trihydroxypyrimidine. Although renal disease may be associated with heart failure, and metabolites associated with renal disease and other markers were also elevated (e.g. urea, creatinine and uric acid), there was no correlation within the patient group between these metabolites and our heart failure biomarkers, indicating that these were indeed biomarkers of heart failure and not renal disease per se. These findings demonstrate the power of data-driven metabolomics approaches to identify such markers of disease.

156 citations


Journal ArticleDOI
TL;DR: In this article, data sets acquired using GC with time-of-flight MS (GC-TOF-MS) were processed using three different deconvolution software packages (LECO ChromaTOF, AMDIS and SpectralWorks AnalyzerPro).
Abstract: Traditional options available for deconvolution of data from gas chromatography-mass spectrometry (GC-MS) experiments have mostly been confined to semi-automated methods, which cannot compete with high-throughput and rapid analysis in metabolomics. In the present study, data sets acquired using GC with time-of-flight MS (GC-TOF-MS) were processed using three different deconvolution software packages (LECO ChromaTOF, AMDIS and SpectralWorks AnalyzerPro). We assessed the extent of detection and identification, the agreement of qualitative results, and the flexibility and productivity of these programs in practical use. We made comparisons using data from the analysis of a test-mixture solution of 36 endogenous metabolites with a wide range of relative concentration ratios. We detected differences in the number of components identified and the accuracy of deconvolution. Using the AMDIS Search program, the resulting mass spectra after deconvolution were searched against the author-constructed retention index/mass spectral libraries containing both the mass spectra and the retention indices of derivatives of a set of metabolites. We based analyte identifications on both retention indices and spectral similarity. The results showed that there were large differences in the numbers of components identified and the qualitative results from the three programs. AMDIS and ChromaTOF produced a large number of false positives, while AnalyzerPro produced some false negatives. We found that, in these three software packages, component width is the most important parameter for predicting the accuracy of the deconvoluted result.

143 citations


Journal ArticleDOI
01 Aug 2007-Yeast
TL;DR: Comparison of metabolomic and genomic analyses of the nine brewing yeasts identified metabolomics as a powerful tool for separating genotypically and phenotypically similar strains.
Abstract: The characterization of industrial yeast strains by examining their metabolic footprints (exometabolomes) was investigated and compared to genome-based discriminatory methods. A group of nine industrial brewing yeasts was studied by comparing their metabolic footprints, genetic fingerprints and comparative genomic hybridization profiles. Metabolic footprinting was carried out by both direct injection mass spectrometry (DIMS) and gas chromatography time-of-flight mass spectrometry (GC-TOF-MS), with data analysed by principal components analysis (PCA) and canonical variates analysis (CVA). The genomic profiles of the nine yeasts were compared by PCR-restriction fragment length polymorphism (PCR-RFLP) analysis, genetic fingerprinting using amplified fragment length polymorphism (AFLP) analysis and microarray comparative genome hybridizations (CGH). Comparison of the metabolomic and genomic analyses of the nine brewing yeasts identified metabolomics as a powerful tool for separating genotypically and phenotypically similar strains. For some strains, discrimination that was not achieved genomically was observed metabolomically.
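The footprint-discrimination idea can be illustrated with a toy principal components analysis, written here from scratch with NumPy rather than with the chemometrics software the paper used; the "strains", metabolite counts and intensity values are all invented.

```python
# Hypothetical sketch of the footprint-comparison idea: project simulated
# metabolic footprints onto principal components and check that the two
# strain groups separate along PC1. All data below are invented.
import numpy as np

rng = np.random.default_rng(0)
# 6 "strains": 3 ale-like, 3 lager-like, 50 metabolite intensities each.
ale = rng.normal(loc=0.0, scale=0.2, size=(3, 50))
lager = rng.normal(loc=1.0, scale=0.2, size=(3, 50))
X = np.vstack([ale, lager])

Xc = X - X.mean(axis=0)                # mean-centre each metabolite
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ vt[0]                    # scores on the first principal component

# The groups separate: all ale scores fall on one side of all lager scores.
separated = (scores[:3].max() < scores[3:].min()) or \
            (scores[3:].max() < scores[:3].min())
print(separated)
```

Real footprints are noisier and higher-dimensional, which is why the paper pairs PCA with supervised canonical variates analysis, but the projection step is the same in spirit.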

110 citations


Journal ArticleDOI
TL;DR: This paper outlines a method for developing a kinetic model for a metabolic network, based solely on the knowledge of reaction stoichiometries, and observes an excellent agreement between the real and approximate models.
Abstract: Two divergent modelling methodologies have been adopted to increase our understanding of metabolism and its regulation. Constraint-based modelling highlights the optimal path through a stoichiometric network within certain physicochemical constraints. Such an approach requires minimal biological data to make quantitative inferences about network behaviour; however, constraint-based modelling is unable to give an insight into cellular substrate concentrations. In contrast, kinetic modelling aims to characterize fully the mechanics of each enzymatic reaction. This approach suffers because parameterizing mechanistic models is both costly and time-consuming. In this paper, we outline a method for developing a kinetic model for a metabolic network, based solely on the knowledge of reaction stoichiometries. Fluxes through the system, estimated by flux balance analysis, are allowed to vary dynamically according to linlog kinetics. Elasticities are estimated from stoichiometric considerations. When compared to a popular branched model of yeast glycolysis, we observe an excellent agreement between the real and approximate models, despite the absence of (and indeed the requirement for) experimental data for kinetic constants. Moreover, using this particular methodology affords us analytical forms for steady state determination, stability analyses and studies of dynamical behaviour.
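The lin-log rate law at the heart of this approach is compact enough to state directly: flux responds linearly to the logarithm of metabolite levels relative to a reference state. The sketch below uses invented elasticities and concentrations, not values from the paper.

```python
# Minimal sketch of a lin-log rate law: v = v_ref * (1 + sum_i eps_i *
# ln(x_i / x_ref_i)). Elasticities and concentrations are illustrative.
import math

def linlog_rate(v_ref, elasticities, x, x_ref):
    """Lin-log approximation of an enzymatic rate around a reference state."""
    return v_ref * (1.0 + sum(
        e * math.log(xi / xr) for e, xi, xr in zip(elasticities, x, x_ref)))

v_ref = 1.0           # reference flux (arbitrary units)
eps = [0.5, -0.3]     # elasticities: substrate positive, product negative
x_ref = [1.0, 1.0]    # reference concentrations

# At the reference state the rate equals the reference flux exactly.
print(linlog_rate(v_ref, eps, x_ref, x_ref))
# Doubling the substrate raises the rate; the log keeps the response tame.
print(linlog_rate(v_ref, eps, [2.0, 1.0], x_ref) > v_ref)
```

Because the rate is linear in log-concentrations, steady states and stability can be examined analytically, which is the advantage the abstract points to.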

108 citations


Journal ArticleDOI
TL;DR: This work extended the closed-loop optimization system that was previously developed for one-dimensional GC-TOF-MS to comprehensive two-dimensional (GCxGC) chromatography, and improved the number of metabolites observable relative to those in 1D GC by some 3-fold.
Abstract: Metabolomics seeks to measure potentially all the metabolites in a biological sample, and consequently, we need to develop and optimize methods to increase significantly the number of metabolites we can detect. We extended the closed-loop (iterative, automated) optimization system that we had previously developed for one-dimensional GC-TOF-MS (O'Hagan, S.; Dunn, W. B.; Brown, M.; Knowles, J. D.; Kell, D. B. Anal. Chem. 2005, 77, 290-303) to comprehensive two-dimensional (GCxGC) chromatography. The heuristic approach used was a multiobjective version of the efficient global optimization algorithm. In just 300 automated runs, we improved the number of metabolites observable relative to those in 1D GC by some 3-fold. The optimized conditions allowed for the detection of over 4000 raw peaks, of which some 1800 were considered to be real metabolite peaks and not impurities or peaks with a signal/noise ratio of less than 5. A variety of computational methods served to explain the basis for the improvement. This closed-loop optimization strategy is a generic and powerful approach for the optimization of any analytical instrumentation.

98 citations


01 Jan 2007
TL;DR: Bioinformatics is a discipline that uses computational and mathematical techniques to store, manage, and analyze biological data in order to answer biological questions; it is an in silico science.
Abstract: Bioinformatics is a discipline that uses computational and mathematical techniques to store, manage, and analyze biological data in order to answer biological questions. Bioinformatics has over 850 databases [154] and numerous tools that work over those databases and local data to produce even more data themselves. In order to perform an analysis, a bioinformatician uses one or more of these resources to gather, filter, and transform data to answer a question. Thus, bioinformatics is an in silico science.

94 citations


Journal ArticleDOI
TL;DR: There is an emerging recognition of the importance of modelling large‐scale biochemical systems, with the ‘digital human’ an obviously desirable goal, and existing and developing standards are beginning to permit the principled storage and exchange of biochemical network models.
Abstract: There is an emerging recognition of the importance of modelling large-scale biochemical systems, with the 'digital human' an obviously desirable goal. This will then permit researchers to analyse the behaviour of such systems in silico so as to be able to perform 'what-if?' experiments prior to determining whether they are actually worthwhile or not, and for understanding whether a particular model does in fact describe or predict experimental observations. Existing and developing standards such as SBML are beginning to permit the principled storage and exchange of biochemical network models, while environments for effecting distributed workflows (such as Taverna) will allow us to link together these models and their behaviour. This allows the local experts to work on those parts of cellular or organellar biochemistry on which they have most expertise, while making their results available to the community as a whole. This kind of architecture permits the distributed yet integrated goal of an evolving 'digital human' model to be realized.

Journal ArticleDOI
TL;DR: There are many reasons why it is appropriate to concentrate on the metabolome (BOX 1), the most significant being that it hinges upon the properties of networks, and is thus an issue of systems biology.
Abstract: With the mainstream concentration during the reductionist molecular biology era being on qualitative studies of macromolecules, metabolism has become the Cinderella subject of this period [1]. Howe...

Book ChapterDOI
01 Jan 2007
TL;DR: The chapter summarizes the philosophical status of a variety of sciences, such as biology, physics, and molecular biology, that have made systems biology possible.
Abstract: Publisher Summary The chapter summarizes the philosophical status of a variety of sciences, such as biology, physics, and molecular biology, that have made systems biology possible. The philosophy of physics states that science deduces predictions from hypotheses that can be verified. More generally, physics aims to explain multiple phenomena on the basis of simpler and fewer principles. While theoretical physics is both respectable and a major part of the activities of the physicists, theoretical biology is a minor part of modern biology. Much of that science of biology accepted the diversity that appeared to inhabit the biosphere: organisms were classified and compared, and their behavior was studied in the sense of establishing correlations between properties. The hypotheses and the activities of molecular biology became (intentionally) largely qualitative, and the concepts became comparative so that their tests (verifications/falsifications) could give a digital “yes/no” answer. It is now possible to purify many or most of the water-soluble macromolecules that are active in living cells and determine their structure by X-ray crystallography. On the other hand, cell biology defines the organization of life at the cellular level in qualitative terms of its molecules.

Proceedings ArticleDOI
09 Jul 2007
TL;DR: Based on a simplified model of the IκBα-NF-κB signal transduction pathway, global sensitivity analysis has been performed to identify those parameters that exert significant control on the system outputs.
Abstract: Based on a simplified model of the (TNF-α-mediated) IκBα-NF-κB signal transduction pathway, global sensitivity analysis has been performed to identify those parameters that exert significant control on the system outputs. The permutation operation in the Morris method is modified to work for log-uniformly sampled parameters. The identified sensitive parameters are then estimated using multivariable search such that the output of the model matches experimental data representing the nuclear concentration of NF-κB. Such parameter tuning leads to much better agreement between the model and the experimental time series relative to those previously published. This shows the importance of global sensitivity analysis in Systems Biology models.
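The log-uniform sampling idea can be sketched briefly: kinetic parameters that span orders of magnitude are sampled uniformly in log10-space, and a one-at-a-time elementary effect (the core operation of the Morris method) is then computed. The toy model and parameter bounds below are invented, not the paper's NF-κB model.

```python
# Sketch of log-uniform parameter sampling plus a single Morris-style
# elementary effect for a toy model output. Bounds and model are invented.
import math
import random

random.seed(1)

def log_uniform(lo, hi):
    """Sample uniformly in log10-space between bounds 0 < lo < hi."""
    return 10 ** random.uniform(math.log10(lo), math.log10(hi))

def toy_output(k1, k2):
    """Stand-in for a simulated readout, e.g. peak nuclear concentration."""
    return k1 / (k1 + k2)

# Elementary effect of k1: perturb it by a factor of 10 at a sampled point.
k1 = log_uniform(1e-3, 1e1)
k2 = log_uniform(1e-3, 1e1)
effect = toy_output(10 * k1, k2) - toy_output(k1, k2)
print(effect > 0)   # increasing k1 raises this particular output
```

A full Morris screening repeats such perturbations over many sampled trajectories and ranks parameters by the mean and spread of their elementary effects.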

Proceedings ArticleDOI
07 Jul 2007
TL;DR: Genetic programming is used to develop a function which predicts the detectability of peptides from their calculated physico-chemical properties and the success of this method is found to be highly dependent on the initial selection of input parameters.
Abstract: The accurate quantification of proteins is important in several areas of cell biology, biotechnology and medicine. Both relative and absolute quantification of proteins is often determined following mass spectrometric analysis of one or more of their constituent peptides. However, in order for quantification to be successful, it is important that the experimenter knows which peptides are readily detectable under the mass spectrometric conditions used for analysis. In this paper, genetic programming is used to develop a function which predicts the detectability of peptides from their calculated physico-chemical properties. Classification is carried out in two stages: the selection of a good classifier using the AUROC objective function and the setting of an appropriate threshold. This allows the user to select the balance point between conflicting priorities in an intuitive way. The success of this method is found to be highly dependent on the initial selection of input parameters. The use of brood recombination and a modified version of the multi-objective FOCUS method are also investigated. While neither has a significant effect on predictive accuracy, the use of the FOCUS method leads to considerably more compact solutions.
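The AUROC objective used for classifier selection has a simple rank-based reading: it is the probability that a randomly chosen detectable ("positive") peptide is scored above a randomly chosen undetectable one. The scores below are invented for illustration.

```python
# Sketch of the rank-based AUROC: the fraction of (positive, negative)
# score pairs ranked correctly, counting ties as half. Scores are invented.

def auroc(pos_scores, neg_scores):
    """AUROC as the pairwise win fraction of positives over negatives."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.4]   # classifier scores for detectable peptides
neg = [0.7, 0.3, 0.2]   # scores for undetectable peptides

print(auroc(pos, neg))
```

Because AUROC is threshold-free, it suits the paper's two-stage scheme: first pick the classifier with the best AUROC, then choose a decision threshold to balance the competing priorities.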





01 Jan 2007
TL;DR: Sensitivity analysis is normally used to analyze how sensitive a system is with respect to the change of parameters or initial conditions and is perhaps best known in systems biology via the formalism of metabolic control analysis.
Abstract: Sensitivity analysis is normally used to analyze how sensitive a system is with respect to the change of parameters or initial conditions and is perhaps best known in systems biology via the formalism of metabolic control analysis [1, 2]. The nuclear factor κB (NF-κB) signalling pathway is an important cellular signalling pathway, in which protein phosphorylation is a major factor controlling the activation of further downstream events. The NF-κB proteins regulate numerous genes that play important roles in inter- and intra-cellular signalling, cellular stress responses, cell growth, survival, and apoptosis. As such, its specificity and its role in the temporal control of gene expression are of crucial physiological interest.

Journal ArticleDOI
TL;DR: Two mathematical models are created in order to better understand the mechanisms of PKA activation and to investigate the complete cAMP pathway.
Abstract: Background The small, diffusible molecule cAMP plays a key signalling role in almost all organisms. In S. cerevisiae, cAMP is synthesized by adenylate cyclase [1], and hydrolyzed by the phosphodiesterases Pde1p and Pde2p [2,3]. The only function of cAMP in yeast is to activate PKA (Protein Kinase A). A molecule of PKA is a tetramer consisting of two catalytic (C) and two regulatory (R) subunits. Cyclic AMP binds to the R subunit, allowing its dissociation from C, allowing C to become catalytically active. PKA is believed to activate Pde1p [4], as well as indirectly inhibit the activity of adenylate cyclase [5]. We have created two mathematical models (one simplified and one detailed) in order to better understand the mechanisms of PKA activation. We have also created a model to investigate the complete cAMP pathway.
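A very reduced version of such a pathway model can be sketched in a few lines: cAMP synthesized at a constant rate by adenylate cyclase and hydrolyzed first-order by the phosphodiesterases, integrated with forward Euler. This is not one of the paper's models; all rate constants are invented.

```python
# Toy one-variable cAMP balance (not the paper's models): constant
# synthesis by adenylate cyclase, first-order hydrolysis by the
# phosphodiesterases, integrated with forward Euler. Rates are invented.

def simulate_camp(synth=2.0, hydrolysis=0.5, dt=0.01, steps=5000):
    camp = 0.0
    for _ in range(steps):
        camp += dt * (synth - hydrolysis * camp)   # d[cAMP]/dt
    return camp

# The trajectory approaches the steady state synth / hydrolysis = 4.0.
print(round(simulate_camp(), 3))   # → 4.0
```

The paper's detailed models add the PKA tetramer binding steps and the feedbacks via Pde1p and adenylate cyclase, but the integration scheme generalizes directly to those extra state variables.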