Author

Douglas B. Kell

Bio: Douglas B. Kell is an academic researcher at the University of Liverpool. He has contributed to research on the topics of dielectrics and systems biology, has an h-index of 111, and has co-authored 634 publications receiving 50,335 citations. His previous affiliations include the Max Planck Society and the University of Wales.


Papers
Journal ArticleDOI
TL;DR: This work constructs the first practical system that extracts both events and associated, detailed meta-knowledge information from the biomedical literature; the extracted meta-knowledge can be used to refine search systems, providing an extra search layer beyond entities and assertions.
Abstract: Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP’09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP’09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
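As a rough illustration of the meta-knowledge assignment step, the sketch below treats it as supervised text classification over the sentence containing an extracted event. The category labels, training sentences, and TF-IDF/logistic-regression pipeline are illustrative assumptions, not EventMine-MK's actual features, classifier, or corpus.

```python
# A minimal sketch of meta-knowledge assignment as text classification.
# Labels and training examples are hypothetical illustrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: the sentence surrounding an extracted
# event, paired with one meta-knowledge dimension (knowledge type).
sentences = [
    "These results demonstrate that X phosphorylates Y.",
    "We hypothesise that X may phosphorylate Y.",
    "X is known to phosphorylate Y (Smith et al., 1999).",
    "X did not phosphorylate Y under these conditions.",
]
labels = ["Observation", "Hypothesis", "Fact", "Observation"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

# Assign a knowledge-type label to the context of a new event.
print(clf.predict(["Our data suggest that A might regulate B."]))
```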

76 citations

Journal ArticleDOI
TL;DR: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows.
Abstract: Background: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Conclusions: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
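The workflow's central analysis step can be pictured in a few lines. The sketch below substitutes a SciPy t-test for the R/Rserve step and synthetic expression values for real maxdLoad2 queries; both substitutions are assumptions made purely for illustration.

```python
# A sketch of the analysis step: identify differentially expressed genes
# from a microarray expression matrix. In the paper this runs in R via
# Taverna's RShell processor; a per-gene t-test stands in here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(100)]
control = rng.normal(8.0, 1.0, size=(100, 4))   # 4 control arrays
treated = rng.normal(8.0, 1.0, size=(100, 4))   # 4 treated arrays
treated[:5] += 3.0                               # spike 5 genes for the demo

# Two-sample t-test per gene, as the R step does for real data.
t, p = stats.ttest_ind(treated, control, axis=1)
for gene, pv in zip(genes, p):
    if pv < 0.01:                                # illustrative cutoff
        print(gene, f"p={pv:.2e}")
```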

74 citations

Journal ArticleDOI
TL;DR: Data is brought together that suggests that many supposedly non-communicable, chronic and inflammatory diseases are exacerbated by the presence of dormant or persistent bacteria, and that measures designed to assess and to inhibit or remove such organisms (or their access to iron) might be of much therapeutic benefit.
Abstract: For bacteria, replication mainly involves growth by binary fission. However, in a very great many natural environments there are examples of phenotypically dormant, non-growing cells that do not replicate immediately and that are phenotypically ‘nonculturable’ on media that normally admit their growth. They thereby evade detection by conventional culture-based methods. Such dormant cells may also be observed in laboratory cultures and in clinical microbiology. They are usually more tolerant to stresses such as antibiotics, and in clinical microbiology they are typically referred to as ‘persisters’. Bacterial cultures necessarily share a great deal of relatedness, and inclusive fitness theory implies that there are conceptual evolutionary advantages in trading a variation in growth rate against its mean, equivalent to hedging one’s bets. There is much evidence that bacteria exploit this strategy widely. We here bring together data that show the commonality of these phenomena across environmental, laboratory and clinical microbiology. Considerable evidence, using methods similar to those common in environmental microbiology, now suggests that many supposedly non-communicable, chronic and inflammatory diseases are exacerbated (if not indeed largely caused) by the presence of dormant or persistent bacteria (the ability of whose components to cause inflammation is well known). This dormancy (and resuscitation therefrom) often reflects the extent of the availability of free iron. Together, these phenomena can provide a ready explanation for the continuing inflammation common to such chronic diseases and its correlation with iron dysregulation. This implies that measures designed to assess and to inhibit or remove such organisms (or their access to iron) might be of much therapeutic benefit.
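The bet-hedging argument can be made concrete with a toy simulation: keeping a small dormant fraction sacrifices mean growth but protects long-run (geometric-mean) fitness when lethal stresses arrive at random. All rates and probabilities below are invented for illustration, not drawn from the paper.

```python
# Toy bet-hedging model: growing cells double in benign steps but die in
# stress steps; dormant ('persister') cells idle but survive stress.
import numpy as np

rng = np.random.default_rng(1)

def long_run_growth(dormant_frac, n_steps=10_000, p_stress=0.1):
    """Average per-step log fitness (log of geometric-mean growth)."""
    log_growth = 0.0
    for _ in range(n_steps):
        if rng.random() < p_stress:
            # Antibiotic pulse: only dormant cells survive (90% of them).
            factor = dormant_frac * 0.9
        else:
            # Benign step: growing cells double, dormant cells persist.
            factor = (1 - dormant_frac) * 2.0 + dormant_frac
        log_growth += np.log(factor)
    return log_growth / n_steps

for f in (0.001, 0.01, 0.1):
    print(f"dormant fraction {f:.3f}: log fitness {long_run_growth(f):+.3f}")
```

Under these invented parameters the near-pure grower has negative long-run fitness while modest dormant fractions are positive, which is the sense in which variance in growth rate is traded against its mean.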

73 citations

Journal ArticleDOI
TL;DR: It is argued that stress‐induced iron dysregulation and its ability to awaken dormant, non‐replicating microbes with which the host has become infected are among the causes of chronic inflammation and its attendant inflammatory cytokines.
Abstract: Since the successful conquest of many acute, communicable (infectious) diseases through the use of vaccines and antibiotics, the currently most prevalent diseases are chronic and progressive in nature, and are all accompanied by inflammation. These diseases include neurodegenerative (e.g. Alzheimer's, Parkinson's), vascular (e.g. atherosclerosis, pre-eclampsia, type 2 diabetes) and autoimmune (e.g. rheumatoid arthritis and multiple sclerosis) diseases that may appear to have little in common. In fact they all share significant features, in particular chronic inflammation and its attendant inflammatory cytokines. Such effects do not happen without underlying and initially 'external' causes, and it is of interest to seek these causes. Taking a systems approach, we argue that these causes include (i) stress-induced iron dysregulation, and (ii) its ability to awaken dormant, non-replicating microbes with which the host has become infected. Other external causes may be dietary. Such microbes are capable of shedding small, but functionally significant amounts of highly inflammagenic molecules such as lipopolysaccharide and lipoteichoic acid. Sequelae include significant coagulopathies, not least the recently discovered amyloidogenic clotting of blood, leading to cell death and the release of further inflammagens. The extensive evidence discussed here implies, as was found with ulcers, that almost all chronic, infectious diseases do in fact harbour a microbial component. What differs is simply the microbes and the anatomical location from and at which they exert damage. This analysis offers novel avenues for diagnosis and treatment.

72 citations

Journal ArticleDOI
TL;DR: Neural networks are used to correct for pyrolysis mass spectrometer instrumental drift itself, so that neural network or other multivariate calibration models created using previously collected data can be used to give accurate estimates of determinand concentration or the nature of bacteria from newly acquired pyrolyses mass spectra.
Abstract: For pyrolysis mass spectrometry (PyMS) to be used for the routine identification of microorganisms, for quantifying determinands in biological and biotechnological systems, and in the production of useful mass spectral libraries, it is paramount that newly acquired spectra be compared to those previously collected. Neural network and other multivariate calibration models have been used to relate mass spectra to the biological features of interest. As commonly observed, however, mass spectral fingerprints showed a lack of long-term reproducibility, due to instrumental drift in the mass spectrometer; when identical materials were analyzed by PyMS at dates from 4 to 20 months apart, neural network models produced at earlier times could not be used to give accurate estimates of determinand concentrations or bacterial identities. Neural networks, however, can be used to correct for pyrolysis mass spectrometer instrumental drift itself, so that neural network or other multivariate calibration models created using previously collected data can be used to give accurate estimates of determinand concentrations or bacterial identities from newly acquired pyrolysis mass spectra.
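One way to picture the drift-correction idea: learn a mapping from newly acquired spectra of shared calibration standards back onto their originally recorded spectra, after which the old calibration model can be applied unchanged. The sketch below uses synthetic spectra and scikit-learn stand-ins, not the paper's actual network architecture or data.

```python
# Minimal drift-correction sketch with synthetic stand-in spectra.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
old_spectra = rng.random((60, 150))            # intensities over 150 m/z bins
conc = rng.random(60)                          # determinand concentrations
calibration = Ridge().fit(old_spectra, conc)   # stand-in for the old model

def drift(x):
    # Simulated instrumental drift: gain and offset change, plus noise.
    return 0.8 * x + 0.05 + rng.normal(0, 0.01, x.shape)

# Months later: re-run a set of shared calibration standards.
new_standards = drift(old_spectra[:20])

# Learn the correction mapping new-instrument spectra back to old ones.
corrector = MLPRegressor(hidden_layer_sizes=(64,), max_iter=5000,
                         random_state=0).fit(new_standards, old_spectra[:20])

# Correct a freshly acquired spectrum, then apply the old model as-is.
fresh = drift(old_spectra[30:31])
print(calibration.predict(corrector.predict(fresh)))
```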

72 citations


Cited by
28 Jul 2005
TL;DR: Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion.

Abstract: Antigenic variation allows many pathogenic microorganisms to evade host immune responses. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion. Each haploid genome encodes roughly 60 members of the var gene family, and switching transcription between different var gene variants provides the molecular basis for antigenic variation.

18,940 citations

Journal ArticleDOI
TL;DR: A simple and highly efficient method to disrupt chromosomal genes in Escherichia coli, in which PCR primers provide the homology to the targeted gene(s); the procedure should be widely useful, especially in genome analysis of E. coli and other bacteria.
Abstract: We have developed a simple and highly efficient method to disrupt chromosomal genes in Escherichia coli in which PCR primers provide the homology to the targeted gene(s). In this procedure, recombination requires the phage lambda Red recombinase, which is synthesized under the control of an inducible promoter on an easily curable, low copy number plasmid. To demonstrate the utility of this approach, we generated PCR products by using primers with 36- to 50-nt extensions that are homologous to regions adjacent to the gene to be inactivated and template plasmids carrying antibiotic resistance genes that are flanked by FRT (FLP recognition target) sites. By using the respective PCR products, we made 13 different disruptions of chromosomal genes. Mutants of the arcB, cyaA, lacZYA, ompR-envZ, phnR, pstB, pstCA, pstS, pstSCAB-phoU, recA, and torSTRCAD genes or operons were isolated as antibiotic-resistant colonies after the introduction into bacteria carrying a Red expression plasmid of synthetic (PCR-generated) DNA. The resistance genes were then eliminated by using a helper plasmid encoding the FLP recombinase which is also easily curable. This procedure should be widely useful, especially in genome analysis of E. coli and other bacteria because the procedure can be done in wild-type cells.
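The primer-design step lends itself to a short sketch: each primer concatenates a 36-50 nt stretch of chromosomal homology with a constant priming site for the FRT-flanked resistance cassette. The flanking sequences below are placeholders, and the priming-site sequences are written from memory and should be verified against the template plasmid maps before use.

```python
# Sketch of knockout-primer construction for the lambda Red procedure.
def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Cassette priming sites (from memory; verify against the plasmid maps).
P1 = "GTGTAGGCTGGAGCTGCTTC"
P2 = "CATATGAATATCCTCCTTAG"

def knockout_primers(upstream, downstream, homology=40):
    """upstream/downstream: chromosomal sequence flanking the target ORF."""
    fwd = upstream[-homology:] + P1            # 5'-flank homology + P1
    rev = revcomp(downstream[:homology]) + P2  # 3'-flank homology + P2
    return fwd, rev

up = "ACGT" * 15    # placeholder: 60 nt immediately 5' of the target gene
down = "TTAG" * 15  # placeholder: 60 nt immediately 3' of the target gene
fwd, rev = knockout_primers(up, down)
print(fwd, rev, sep="\n")
```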

14,389 citations

Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
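The fourth category's mail-filtering example reduces to a few lines of supervised learning over a user's past decisions. The messages, labels, and bag-of-words/naive-Bayes choice below are invented for illustration only.

```python
# Toy per-user mail filter learned from kept/rejected messages.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "Meeting moved to 3pm, agenda attached",
    "WIN a FREE prize, click now",
    "Quarterly report draft for review",
    "Cheap meds, limited offer, act now",
]
rejected = [0, 1, 0, 1]  # 1 = the user rejected the message

filter_model = make_pipeline(CountVectorizer(), MultinomialNB())
filter_model.fit(messages, rejected)

# Retraining on each new decision keeps the rules current automatically.
print(filter_model.predict(["Click now to claim your free prize"]))
```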

13,246 citations

Journal ArticleDOI
TL;DR: A practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics, which makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries.
Abstract: The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics.
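The portal's programmatic access can be sketched with a single REST call. The endpoint path and JSON field names below reflect the public web API as I recall it and are assumptions to be checked against the portal's current API documentation.

```python
# Minimal sketch of querying the cBioPortal web API for available studies.
import requests

resp = requests.get(
    "https://www.cbioportal.org/api/studies",  # endpoint path: assumption
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# Field names ("studyId", "name") are assumed from memory.
for study in resp.json()[:5]:
    print(study.get("studyId"), "-", study.get("name", ""))
```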

10,947 citations