
Showing papers by "Douglas B. Kell published in 2008"


Journal ArticleDOI
TL;DR: This work describes how it has produced a consensus metabolic network reconstruction for S. cerevisiae, and places special emphasis on referencing molecules to persistent databases or using database-independent forms, such as SMILES or InChI strings, as this permits their chemical structure to be represented unambiguously and in a manner that permits automated reasoning.
Abstract: Genomic data allow the large-scale manual or semi-automated assembly of metabolic network reconstructions, which provide highly curated organism-specific knowledge bases. Although several genome-scale network reconstructions describe Saccharomyces cerevisiae metabolism, they differ in scope and content, and use different terminologies to describe the same chemical entities. This makes comparisons between them difficult and underscores the desirability of a consolidated metabolic network that collects and formalizes the 'community knowledge' of yeast metabolism. We describe how we have produced a consensus metabolic network reconstruction for S. cerevisiae. In drafting it, we placed special emphasis on referencing molecules to persistent databases or using database-independent forms, such as SMILES or InChI strings, as this permits their chemical structure to be represented unambiguously and in a manner that permits automated reasoning. The reconstruction is readily available via a publicly accessible database and in the Systems Biology Markup Language (http://www.comp-sys-bio.org/yeastnet). It can be maintained as a resource that serves as a common denominator for studying the systems biology of yeast. Similar strategies should benefit communities studying genome-scale metabolic networks of other organisms.
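The abstract's point about database-independent identifiers can be sketched in a few lines: two reconstructions that name the same compound differently can still be merged unambiguously when each species carries an InChI string. The reconstruction entries below are illustrative, not taken from the paper.

```python
# Two hypothetical reconstructions naming the same metabolites differently;
# their InChI strings, however, match exactly.
recon_a = {"H2O": "InChI=1S/H2O/h1H2",
           "EtOH": "InChI=1S/C2H6O/c1-2-3/h3H,1-2H3"}
recon_b = {"water": "InChI=1S/H2O/h1H2",
           "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,1-2H3"}

def merge_by_inchi(*reconstructions):
    """Group synonymous species names under their shared InChI string."""
    merged = {}
    for recon in reconstructions:
        for name, inchi in recon.items():
            merged.setdefault(inchi, set()).add(name)
    return merged

merged = merge_by_inchi(recon_a, recon_b)
```

Matching on the structure-derived identifier rather than the name is what makes the consolidation amenable to automated reasoning.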

605 citations


Repository
TL;DR: It is argued that the role of poorly liganded iron has been rather underappreciated in the past, and that in combination with peroxide and superoxide its activity underpins the behaviour of a great many physiological processes that degrade over time.
Abstract: The production of peroxide and superoxide is an inevitable consequence of aerobic metabolism, and while these particular "reactive oxygen species" (ROSs) can exhibit a number of biological effects, they are not of themselves excessively reactive and thus they are not especially damaging at physiological concentrations. However, their reactions with poorly liganded iron species can lead to the catalytic production of the very reactive and dangerous hydroxyl radical, which is exceptionally damaging, and a major cause of chronic inflammation. We review the considerable and wide-ranging evidence for the involvement of this combination of (su)peroxide and poorly liganded iron in a large number of physiological and indeed pathological processes and inflammatory disorders, especially those involving the progressive degradation of cellular and organismal performance. These diseases share a great many similarities and thus might be considered to have a common cause (i.e. iron-catalysed free radical and especially hydroxyl radical generation). The studies reviewed include those focused on a series of cardiovascular, metabolic and neurological diseases, where iron can be found at the sites of plaques and lesions, as well as studies showing the significance of iron to aging and longevity. The effective chelation of iron by natural or synthetic ligands is thus of major physiological (and potentially therapeutic) importance. As systems properties, we need to recognise that physiological observables have multiple molecular causes, and studying them in isolation leads to inconsistent patterns of apparent causality when it is the simultaneous combination of multiple factors that is responsible. This explains, for instance, the decidedly mixed effects of antioxidants that have been observed in such studies.

451 citations


Journal ArticleDOI
TL;DR: Evidence is discussed supporting the idea that rather than being an exception, carrier-mediated and active uptake of drugs may be more common than is usually assumed and the implications for drug discovery and development are considered.
Abstract: It is generally thought that many drug molecules are transported across biological membranes via passive diffusion at a rate related to their lipophilicity. However, the types of biophysical forces involved in the interaction of drugs with lipid membranes are no different from those involved in their interaction with proteins, and so arguments based on lipophilicity could also be applied to drug uptake by membrane transporters or carriers. In this article, we discuss the evidence supporting the idea that rather than being an exception, carrier-mediated and active uptake of drugs may be more common than is usually assumed - including a summary of specific cases in which drugs are known to be taken up into cells via defined carriers - and consider the implications for drug discovery and development.

438 citations


Journal ArticleDOI
TL;DR: A range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places are examined.
Abstract: Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as “thought in cold storage,” and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.

206 citations


Journal ArticleDOI
TL;DR: Two parameters (snthresh and bw) had a significant effect on the number of peaks detected and the peak area reproducibility for the dataset used, and the results showed both the instruments and XCMS to be applicable to the reproducible and valid detection of disease biomarkers present in serum.

197 citations


Journal ArticleDOI
TL;DR: Based on this small pilot study, the UK Biobank sampling, transport and fractionation protocols are considered suitable to provide samples that can produce scientifically robust and valid data in metabolomic studies.
Abstract: BACKGROUND: The stability of mammalian serum and urine in large metabolomic investigations is essential for accurate, valid and reproducible studies. The stability of mammalian serum and urine, either processed immediately by freezing at -80 degrees C or stored at 4 degrees C for 24 h before being frozen, was compared in a pilot metabolomic study of samples from 40 separate healthy volunteers. METHODS: Metabolic profiling with GC-TOF-MS was performed for serum and urine samples collected from 40 volunteers and stored at -80 degrees C or 4 degrees C for 24 h before being frozen at -80 degrees C. Subsequent Wilcoxon rank sum test and Principal Components Analysis (PCA) methods were used to assess whether differences in the metabolomes were detected between samples stored at 4 degrees C for 0 or 24 h. RESULTS: More than 700 unique metabolite peaks were detected, with over 200 metabolite peaks detected in any one sample. PCA and Wilcoxon rank sum tests of serum and urine data showed, as a general observation, that the variance associated with the replicate analysis per sample (analytical variance) was of the same magnitude as the variance observed between samples stored at 4 degrees C for 0 or 24 h. From a functional point of view, the metabolomic composition of the majority of samples did not change in a statistically significant manner when stored under the two different conditions. CONCLUSIONS: Based on this small pilot study, the UK Biobank sampling, transport and fractionation protocols are considered suitable to provide samples that can produce scientifically robust and valid data in metabolomic studies.
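The per-metabolite comparison described here (0 h vs. 24 h storage) rests on the Wilcoxon rank-sum test. A minimal stdlib sketch of that test, using the normal approximation and average ranks for ties, looks like this; it is an illustration of the statistic, not the study's own code.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation,
    with average ranks assigned to tied values."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):              # assign average ranks to tied runs
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])                # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return z, p
```

Applied per metabolite peak across the 0 h and 24 h groups, a large p-value is consistent with the study's conclusion that storage at 4 degrees C for 24 h did not significantly alter the profiles.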

137 citations


Journal ArticleDOI
TL;DR: It is shown that any such system can be treated as a ‘communication channel’ for which the associations between inputs and outputs can be quantified via a decomposition of their mutual information into different components characterizing the main effect of individual inputs and their interactions.
Abstract: Most systems can be represented as networks that couple a series of nodes to each other via one or more edges, with typically unknown equations governing their quantitative behaviour. A major question then pertains to the importance of each of the elements that act as system inputs in determining the output(s). We show that any such system can be treated as a ‘communication channel’ for which the associations between inputs and outputs can be quantified via a decomposition of their mutual information into different components characterizing the main effect of individual inputs and their interactions. Unlike variance-based approaches, our novel methodology can easily accommodate correlated inputs.
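The decomposition the abstract describes can be illustrated on a toy system. For discrete inputs, the joint mutual information I(X1,X2;Y) can be compared with the single-input terms I(X1;Y) and I(X2;Y); the surplus is attributable to the interaction. The XOR example below is a standard illustration (the whole output is carried by the interaction), not an example from the paper.

```python
from collections import Counter
from math import log2

def mutual_info(pairs):
    """I(A;B) in bits, estimated from a list of (a, b) samples."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(c / n * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())

# Toy system: y = x1 XOR x2, enumerated uniformly.
data = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)]

i1 = mutual_info([(x1, y) for x1, _, y in data])           # main effect of x1
i2 = mutual_info([(x2, y) for _, x2, y in data])           # main effect of x2
itot = mutual_info([((x1, x2), y) for x1, x2, y in data])  # joint information
interaction = itot - i1 - i2   # information carried only by the combination
```

Here both main effects are zero while the joint term is one bit, so the entire input-output association is an interaction, something a variance decomposition on each input alone would miss.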

124 citations


Journal ArticleDOI
TL;DR: Using competition experiments in continuous cultures grown in different nutrient environments, genes that show haploinsufficiency or haploproficiency phenotypes are identified; haploinsufficient genes are over-represented on chromosome III, which determines a yeast's mating type, and the concentration of haploinsufficient genes there may be a mechanism to prevent its loss.
Abstract: Using competition experiments in continuous cultures grown in different nutrient environments (glucose limited, ammonium limited, phosphate limited and white grape juice), we identified genes that show haploinsufficiency phenotypes (reduced growth rate when hemizygous) or haploproficiency phenotypes (increased growth rate when hemizygous). Haploproficient genes (815, 1,194, 733 and 654 in glucose-limited, ammonium-limited, phosphate-limited and white grape juice environments, respectively) frequently show that phenotype in a specific environmental context. For instance, genes encoding components of the ubiquitination pathway or the proteasome show haploproficiency in nitrogen-limited conditions where protein conservation may be beneficial. Haploinsufficiency is more likely to be observed in all environments, as is the case with genes determining polar growth of the cell. Haploproficient genes seem randomly distributed in the genome, whereas haploinsufficient genes (685, 765, 1,277 and 217 in glucose-limited, ammonium-limited, phosphate-limited and white grape juice environments, respectively) are over-represented on chromosome III. This chromosome determines a yeast's mating type, and the concentration of haploinsufficient genes there may be a mechanism to prevent its loss.

98 citations


Journal ArticleDOI
TL;DR: This is the first study to identify, in an unbiased manner, a series of small-molecular-weight metabolites that effectively detect preeclampsia in plasma and provides new insights into the pathology of this condition and raises the possibility of the development of a predictive test.
Abstract: In a previous study, the ability of gas chromatography time-of-flight mass spectrometry to detect potential metabolic biomarkers in preeclampsia was demonstrated. In this study, the authors sought to validate their preliminary findings in an entirely different patient cohort using a complementary, novel, and powerful combination of analytical tools (ultra performance liquid chromatography and LTQ Orbitrap mass spectrometry system). Eight metabolites that appeared in the authors' previous patient cohort were identified as being statistically significant (P < .01) as discriminatory biomarkers. The chemical identities of these 8 metabolites were established using authentic chemical standards. They included uric acid, 2-oxoglutarate, glutamate, and alanine. This is the first study to identify, in an unbiased manner, a series of small-molecular-weight metabolites that effectively detect preeclampsia in plasma. The identity of these metabolites provides new insights into the pathology of this condition and raises the possibility of the development of a predictive test.

97 citations


Journal ArticleDOI
TL;DR: In this article, optimal and robust experimental design strategies based on sensitivity analysis were developed for the IκB-NF-κB signal transduction model, and the initial IKK intensity was calculated using an optimal experimental design process.
Abstract: Experimental design for cellular networks based on sensitivity analysis is studied in this work. Both optimal and robust experimental design strategies are developed for the IκB-NF-κB signal transduction model. Based on local sensitivity analysis, the initial IKK intensity is calculated using an optimal experimental design process, and several scalarization measures of the Fisher information matrix are compared. Global sensitivity analysis and robust experimental design techniques are then developed to consider parametric uncertainties in the model. The modified Morris method is employed in global sensitivity analysis, and a semidefinite programming method is exploited to implement the robust experimental design for the problem of measurement set selection. The parametric impacts on the oscillatory behavior of NF-κB in the nucleus are also discussed. © 2008 Wiley Periodicals, Inc. Int J Chem Kinet 40: 730–741, 2008
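The global sensitivity step here uses the (modified) Morris method. Its core idea, sampling elementary effects per parameter and summarising their magnitude and spread, can be sketched as follows on a stand-in model; the toy objective and the parameter count are illustrative, not the paper's NF-κB model.

```python
import random

def morris_mu_star(f, n_params, n_traj=200, delta=0.1, seed=1):
    """Crude Morris screening on the unit hypercube: for each parameter,
    average the absolute elementary effect over random base points."""
    rng = random.Random(seed)
    effects = [[] for _ in range(n_params)]
    for _ in range(n_traj):
        x = [rng.uniform(0, 1 - delta) for _ in range(n_params)]
        fx = f(x)
        for i in range(n_params):
            xp = list(x)
            xp[i] += delta                      # perturb one parameter
            effects[i].append((f(xp) - fx) / delta)
    return [sum(abs(e) for e in es) / len(es) for es in effects]

# Stand-in model: output strongly sensitive to p0, weakly to p1.
mu = morris_mu_star(lambda p: 10 * p[0] + 0.1 * p[1] ** 2, n_params=2)
```

Parameters with large mean effect (and, in the full method, large spread, indicating nonlinearity or interactions) are the ones worth constraining experimentally.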

80 citations


Journal ArticleDOI
TL;DR: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows.
Abstract: Background: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. 
Conclusions: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.

Journal ArticleDOI
01 Jan 2008
TL;DR: The landscape adaptive particle swarm optimizer (LAPSO) is an efficient method to escape from convergence to local optima and approaches the global optimum rapidly on the problems used.
Abstract: Several modified particle swarm optimizers are proposed in this paper. In DVPSO, a distribution vector is used in the update of velocity. This vector is adjusted automatically according to the distribution of particles in each dimension. In COPSO, the probabilistic use of a 'crossing over' update is introduced to escape from local minima. The landscape adaptive particle swarm optimizer (LAPSO) combines these two schemes with the aim of achieving more robust and efficient search. Empirical performance comparisons between these new modified PSO methods, and also the inertia weight PSO (IFPSO), the constriction factor PSO (CFPSO) and a covariance matrix adaptation evolution strategy (CMAES) are presented on several benchmark problems. All the experimental results show that LAPSO is an efficient method to escape from convergence to local optima and approaches the global optimum rapidly on the problems used.
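For orientation, the baseline these variants modify is the inertia-weight PSO: each particle's velocity mixes inertia with stochastic pulls toward its personal best and the swarm best. The sketch below is that baseline on a sphere function, with illustrative hyperparameters; it is not the paper's DVPSO, COPSO or LAPSO.

```python
import random

def pso(f, dim, n=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal inertia-weight PSO minimising f over [-5, 5]^dim."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    vs = [[0.0] * dim for _ in range(n)]
    pbest = [list(x) for x in xs]          # personal bests
    pval = [f(x) for x in xs]
    g = min(range(n), key=lambda i: pval[i])
    gbest, gval = list(pbest[g]), pval[g]  # swarm best
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pval[i]:
                pbest[i], pval[i] = list(xs[i]), val
                if val < gval:
                    gbest, gval = list(xs[i]), val
    return gbest, gval

best, val = pso(lambda x: sum(xi * xi for xi in x), dim=3)
```

The variants in the paper alter the velocity update (a per-dimension distribution vector in DVPSO, a probabilistic 'crossing over' move in COPSO) precisely because this baseline update can stagnate in local minima.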

Journal ArticleDOI
TL;DR: Differences indicate that up-regulation of metabolites in the mycelia of S. hirsutum may be connected to a defensive role or to stress, and proof of principle for the employment of metabolic profiling for biological discovery studies of metabolites produced by fungi is shown.
Abstract: The paper presents the first proof-of-principle study of metabolite profiles of the interacting mycelial fronts of a wood decomposer basidiomycete, Stereum hirsutum, paired with two competitor basidiomycetes, Coprinus disseminatus and C. micaceus, using TLC and GC-TOF-MS profiling. GC-TOF-MS profiles were information rich, with a total of 190 metabolite peaks detected and more than 120 metabolite peaks detected per sample. The metabolite profiles were able to discriminate between the interactions of S. hirsutum with the two species of Coprinus. In confrontation with C. micaceus, where S. hirsutum mycelial fronts always overgrew those of C. micaceus, there were down-regulations of metabolites in the interaction zone, compared to monocultures of both S. hirsutum and C. micaceus. In contrast, in pairings with C. disseminatus, whose mycelia overgrew those of S. hirsutum, there were some up-regulations compared with monoculture controls, the majority of the metabolites being characteristic of the S. hirsutum monoculture profile. These differences indicate that up-regulation of metabolites in the mycelia of S. hirsutum may be connected to a defensive role or to stress. The results also show proof of principle for the employment of metabolic profiling for biological discovery studies of metabolites produced by fungi that could be applied to natural product screening programmes.

Journal ArticleDOI
01 Aug 2008-Placenta
TL;DR: It is concluded that metabolomic strategies offer a novel approach to investigate placental function; when conducted under carefully controlled conditions, with appropriate statistical analysis, metabolic differences can be identified in placental explants in response to altered O2 tension.

Journal ArticleDOI
TL;DR: The Taverna workflow system has been extended to enable it to use and invoke Java classes and methods as tasks within Taverna workflows, demonstrated by a workflow in which libSBML is used to map gene expression data onto a metabolic pathway represented as an SBML model.
Abstract: Summary: Many data manipulation processes involve the use of programming libraries. These processes may beneficially be automated due to their repeated use. A convenient type of automation is in the form of workflows that also allow such processes to be shared amongst the community. The Taverna workflow system has been extended to enable it to use and invoke Java classes and methods as tasks within Taverna workflows. These classes and methods are selected for use during workflow construction by a Java Doclet application called the API Consumer. This selection is stored as an XML file, which enables Taverna to present the subset of the API for use in the composition of workflows. The ability of Taverna to invoke Java classes and methods is demonstrated by a workflow in which we use libSBML to map gene expression data onto a metabolic pathway represented as an SBML model. Availability: Taverna and the API Consumer application can be freely downloaded from http://taverna.sourceforge.net Contact: peter.li@manchester.ac.uk Supplementary information: Supplementary data and documentation are available from http://www.mcisb.org/software/taverna/libsbml/index.html

Journal ArticleDOI
TL;DR: A text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.
Abstract: Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, it is time-consuming and nontrivial to construct these resources manually. We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the needs for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms. A total of 5,699 and 2,612 new terms were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts. We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.
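One common core of corpus-based term acquisition is scoring candidate words by how much more frequent they are in the domain corpus than in a background corpus. The sketch below shows that idea only; the real pipeline described in the abstract adds multi-word terms, linguistic filters and curation, and the tiny corpora here are invented for illustration.

```python
from collections import Counter

def term_candidates(domain_docs, background_docs, min_ratio=1.5):
    """Rank words by domain-vs-background relative frequency, a crude
    'termhood' score, with add-one smoothing for unseen background words."""
    dom = Counter(w for d in domain_docs for w in d.lower().split())
    bg = Counter(w for d in background_docs for w in d.lower().split())
    n_dom, n_bg = sum(dom.values()), sum(bg.values())
    scores = {}
    for w, c in dom.items():
        ratio = (c / n_dom) / ((bg[w] + 1) / (n_bg + len(bg)))
        if ratio >= min_ratio:
            scores[w] = ratio
    return sorted(scores, key=scores.get, reverse=True)

# Invented toy corpora: NMR-flavoured text vs. general English.
terms = term_candidates(
    ["the chemical shift was referenced to trimethylsilane",
     "free induction decay acquired with a cryoprobe probe"],
    ["the cat sat on the mat", "it was a dark and stormy night the end"])
```

Technology-specific words surface at the top of the ranking while common function words are filtered out, which is the behaviour a vocabulary curator then exploits.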

Journal ArticleDOI
TL;DR: The constraint that the estimated parameters should be within given bounds and as close as possible to stated nominal values is introduced, and this deterministic 'proximate parameter tuning' algorithm turns out to be exceptionally effective.
Abstract: It is commonly the case in biochemical modelling that we have knowledge of the qualitative 'structure' of a model and some measurements of the time series of the variables of interest (concentrations and fluxes), but little or no knowledge of the model's parameters. This is, then, a system identification problem that is commonly addressed by running a model with estimated parameters and assessing how far the model's behaviour is from the 'target' behaviour of the variables, and adjusting parameters iteratively until a good fit is achieved. The issue is that most of these problems are grossly underdetermined, such that many combinations of parameters can be used to fit a given set of variables. We introduce the constraint that the estimated parameters should be within given bounds and as close as possible to stated nominal values. This deterministic 'proximate parameter tuning' algorithm turns out to be exceptionally effective, and we illustrate its utility for models of p38 signalling, of yeast glycolysis and for a benchmark dataset describing the thermal isomerisation of alpha-pinene.
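The constraint described, staying within bounds and close to nominal values, amounts to adding a proximity penalty to the misfit objective. The sketch below shows that objective with a placeholder random-search optimiser and an invented underdetermined toy problem; it is not the paper's deterministic algorithm.

```python
import random

def proximate_tune(residual, nominal, bounds, lam=0.01, iters=20000, seed=0):
    """Minimise misfit^2 + lam * distance-from-nominal^2 inside hard bounds,
    using a simple random-walk hill climber as a stand-in optimiser."""
    rng = random.Random(seed)
    def cost(p):
        prox = sum((pi - ni) ** 2 for pi, ni in zip(p, nominal))
        return residual(p) ** 2 + lam * prox
    best, best_c = list(nominal), None
    best_c = cost(best)
    for _ in range(iters):
        p = [min(hi, max(lo, b + rng.gauss(0, 0.3)))   # clamp to bounds
             for b, (lo, hi) in zip(best, bounds)]
        c = cost(p)
        if c < best_c:
            best, best_c = p, c
    return best

# Underdetermined toy: the data constrain only the product k1*k2 = 6, so
# the nominal values (2, 2) select among the infinitely many exact fits.
p = proximate_tune(lambda p: p[0] * p[1] - 6.0, nominal=[2.0, 2.0],
                   bounds=[(0.0, 10.0)] * 2)
```

Without the proximity term, any pair with product 6 would fit equally well; with it, the estimate settles near the symmetric solution closest to the nominal values, which is the essence of the approach.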

Journal ArticleDOI
TL;DR: SBML allows for distributed analysis of biochemical networks using loosely coupled workflows, and with the advent of the Internet the various software modules that one might use to analyze biochemical models can reside on entirely different computers and even on different continents.

Journal ArticleDOI
TL;DR: MCMC-based parameter estimation is proposed as a method to help in inferring parameter distributions, taking into account uncertainties in the initial conditions and in the measurement data, and the inferred parameter distributions are used to predict changes in the network via a simple classification method.
Abstract: Motivation: Genetic modifications or pharmaceutical interventions can influence multiple sites in metabolic pathways, and often these are ‘distant’ from the primary effect. In this regard, the ability to identify target and off-target effects of a specific compound or gene therapy is both a major challenge and critical in drug discovery. Results: We applied Markov Chain Monte Carlo (MCMC) for parameter estimation and perturbation identification in the kinetic modeling of metabolic pathways. Variability in the steady-state measurements in cells taken from a population can be caused by differences in initial conditions within the population, by variation of parameters among individuals and by possible measurement noise. MCMC-based parameter estimation is proposed as a method to help in inferring parameter distributions, taking into account uncertainties in the initial conditions and in the measurement data. The inferred parameter distributions are then used to predict changes in the network via a simple classification method. The proposed technique is applied to analyze changes in the pathways of pyruvate metabolism of mutants of Lactococcus lactis, based on previously published experimental data. Availability: MATLAB code used in the simulations is available from ftp://anonymous@dbkweb.mib.man.ac.uk/pub/Bioinformatics_BJ.zip Contact: bayujw@ieee.org Supplementary information: Supplementary data are available at Bioinformatics online.
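The inferential machinery here, sampling a parameter's posterior distribution rather than finding a single point estimate, can be sketched with a random-walk Metropolis sampler on a toy first-order decay model. The model, noise level and priors below are invented for illustration; the paper applies the idea to kinetic models of L. lactis pyruvate metabolism.

```python
import math
import random

def metropolis(logpost, x0, n=8000, step=0.1, seed=0):
    """Random-walk Metropolis sampler over a scalar parameter."""
    rng = random.Random(seed)
    x, lp = x0, logpost(x0)
    samples = []
    for _ in range(n):
        xp = x + rng.gauss(0, step)
        lpp = logpost(xp)
        if math.log(rng.random()) < lpp - lp:   # accept/reject
            x, lp = xp, lpp
        samples.append(x)
    return samples

# Synthetic data: y(t) = exp(-k t) + noise, true k = 0.5.
true_k, sigma = 0.5, 0.05
ts = [i * 0.5 for i in range(10)]
data_rng = random.Random(42)
ys = [math.exp(-true_k * t) + data_rng.gauss(0, sigma) for t in ts]

def logpost(k):
    """Gaussian likelihood with a flat prior on k > 0."""
    if k <= 0:
        return float("-inf")
    sse = sum((y - math.exp(-k * t)) ** 2 for t, y in zip(ts, ys))
    return -sse / (2 * sigma ** 2)

samples = metropolis(logpost, x0=1.0)
post_mean = sum(samples[2000:]) / len(samples[2000:])  # discard burn-in
```

The spread of the retained samples, not just their mean, is the point: it quantifies how well the data pin down the rate constant, which is what the paper's classification step consumes.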

Proceedings ArticleDOI
12 Jul 2008
TL;DR: It is demonstrated that genotype-fitness correlations may be used to estimate optimum population sizes for the six problems; this is an important step towards the development of an adaptive algorithm that can respond to the perceived landscape in 'real-time', i.e. during the evolutionary search process itself.
Abstract: The main aim of landscape analysis has been to quantify the 'hardness' of problems. Early steps have been made towards extending this into Genetic Programming. However, few attempts have been made to extend the use of landscape analysis into the prediction of ways to make a problem easy, through the optimal setting of control parameters. This paper introduces a new class of landscape metrics, which we call 'Genotype-Fitness Correlations'. An example of this family of metrics is applied to six real-world regression problems. It is demonstrated that genotype-fitness correlations may be used to estimate optimum population sizes for the six problems. We believe that this application of a landscape metric as guidance in the setting of control parameters is an important step towards the development of an adaptive algorithm that can respond to the perceived landscape in 'real-time', i.e. during the evolutionary search process itself.
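One crude member of the genotype-fitness correlation family can be sketched as the correlation between genotype distance and fitness difference over sampled pairs: on a smooth landscape the two track each other, on a rugged one they do not. The metric definition and onemax test landscape below are illustrative, not the paper's own formulation.

```python
import random

def hamming(a, b):
    """Genotype distance for fixed-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def genotype_fitness_correlation(genotypes, fitness, pairs=2000, seed=0):
    """Pearson correlation between genotype distance and absolute fitness
    difference over random pairs from a population sample."""
    rng = random.Random(seed)
    ds, fs = [], []
    for _ in range(pairs):
        a, b = rng.sample(genotypes, 2)
        ds.append(hamming(a, b))
        fs.append(abs(fitness(a) - fitness(b)))
    n = len(ds)
    md, mf = sum(ds) / n, sum(fs) / n
    cov = sum((d - md) * (f - mf) for d, f in zip(ds, fs))
    vd = sum((d - md) ** 2 for d in ds)
    vf = sum((f - mf) ** 2 for f in fs)
    return cov / (vd * vf) ** 0.5

# Smooth toy landscape (onemax): fitness is the number of 1-bits.
pop_rng = random.Random(7)
pop = [[pop_rng.randint(0, 1) for _ in range(20)] for _ in range(100)]
corr = genotype_fitness_correlation(pop, fitness=sum)
```

A high correlation suggests an easy, smooth landscape where small populations suffice; a low one signals ruggedness, which is the kind of signal the paper maps onto population-size settings.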


Book ChapterDOI
22 Sep 2008
TL;DR: A new method for automatically identifying the structure of an unknown molecule from its nuclear magnetic resonance (NMR) spectrum that does not need prior training or use spectrum prediction; does not rely on expert rules; and avoids enumeration of all possible candidate structures.
Abstract: Identifying the structure of unknown molecules is an important activity in the pharmaceutical industry where it underpins the production of new drugs and the analysis of complex biological samples. We present here a new method for automatically identifying the structure of an unknown molecule from its nuclear magnetic resonance (NMR) spectrum. In the technique, an ant colony optimization algorithm is used to search iteratively the highly-constrained space of feasible molecular structures, evaluating each one by reference to NMR information on known molecules stored (in a raw form) in a database. Unlike existing structure elucidation systems, ours: does not need prior training or use spectrum prediction; does not rely on expert rules; and avoids enumeration of all possible candidate structures. We describe the important elements of the system here and include results on a preliminary test set of molecules. Whilst the results are currently too limited to allow parameter studies or comparison to other methods, they nevertheless indicate the system is working acceptably and shows considerable promise.
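The ant colony mechanic at the heart of the method, building candidates choice by choice under pheromone guidance, with evaporation and score-proportional reinforcement, can be sketched on a toy "match the target" problem. The scoring function and search space below are invented stand-ins for the paper's spectrum-matching and constrained structure space.

```python
import random

def aco_search(score, choices, length, ants=20, iters=60, rho=0.1, seed=0):
    """Minimal ant colony optimisation: ants assemble candidates one
    position at a time, sampling each choice in proportion to pheromone;
    pheromone evaporates (rho) and is reinforced by each candidate's score."""
    rng = random.Random(seed)
    tau = [[1.0] * len(choices) for _ in range(length)]   # pheromone table
    best, best_s = None, float("-inf")
    for _ in range(iters):
        for _ in range(ants):
            cand = []
            for pos in range(length):
                r = rng.random() * sum(tau[pos])          # roulette wheel
                acc, idx = 0.0, 0
                for idx, t in enumerate(tau[pos]):
                    acc += t
                    if r <= acc:
                        break
                cand.append(choices[idx])
            s = score(cand)
            if s > best_s:
                best, best_s = cand, s
            for pos, c in enumerate(cand):                # reinforcement
                tau[pos][choices.index(c)] += s
        for pos in range(length):                         # evaporation
            tau[pos] = [(1 - rho) * t for t in tau[pos]]
    return best, best_s

# Toy 'spectrum match' score: fraction of positions agreeing with a target.
target = list("ABBABAAB")
best, s = aco_search(lambda c: sum(a == b for a, b in zip(c, target)) / len(target),
                     choices=["A", "B"], length=len(target))
```

The pheromone table lets good partial decisions accumulate evidence across iterations, which is how the method avoids enumerating every candidate structure while still converging on high-scoring ones.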