
Showing papers by "David S. Wishart published in 2006"


Journal ArticleDOI
TL;DR: DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug data with comprehensive drug target information and is fully searchable supporting extensive text, sequence, chemical structure and relational query searches.
Abstract: DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14 000 protein or drug target sequences are linked to these drug entries. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. Many data fields are hyperlinked to other databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and a variety of structure viewing applets. The database is fully searchable supporting extensive text, sequence, chemical structure and relational query searches. Potential applications of DrugBank include in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. DrugBank is available at http://redpoll.pharmacy.ualberta.ca/drugbank/.
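The field-structured, text-searchable DrugCard model described above can be sketched as a small data structure with a naive full-text search. The field names and example entries below are hypothetical simplifications for illustration, not DrugBank's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class DrugCard:
    """Minimal stand-in for a DrugBank DrugCard entry (hypothetical field names)."""
    drugbank_id: str
    name: str
    drug_type: str                                        # e.g. "small molecule" or "biotech"
    groups: list = field(default_factory=list)            # e.g. ["approved", "experimental"]
    target_sequences: list = field(default_factory=list)  # linked drug target sequences

def text_search(cards, query):
    """Naive text search across a few fields, mimicking a simple text/relational query."""
    q = query.lower()
    return [c for c in cards
            if q in c.name.lower()
            or q in c.drug_type.lower()
            or any(q in g.lower() for g in c.groups)]

cards = [
    DrugCard("DB00001", "Lepirudin", "biotech", ["approved"]),
    DrugCard("DB01050", "Ibuprofen", "small molecule", ["approved"]),
]
hits = text_search(cards, "small molecule")  # matches only the Ibuprofen card
```

A real query layer would of course index the ~80 fields rather than scanning them linearly; the sketch only shows the shape of the record and the search.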

3,087 citations


Journal ArticleDOI
TL;DR: A broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis is conducted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies.
Abstract: Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allows computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies.
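The "learn from past examples" idea the review surveys can be shown with a deliberately tiny classifier. The nearest-centroid method and the synthetic two-biomarker data below are illustrative choices of my own, not methods or data taken from the review.

```python
import math

def nearest_centroid_fit(X, y):
    """Compute a per-class mean ("centroid") over biomarker vectors."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign the class whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return min(centroids, key=lambda lab: dist(centroids[lab], x))

# Synthetic two-biomarker data: "relapse" cases have elevated marker levels.
train_X = [[1.0, 1.2], [0.9, 1.0], [3.0, 3.1], [3.2, 2.9]]
train_y = ["no-relapse", "no-relapse", "relapse", "relapse"]
model = nearest_centroid_fit(train_X, train_y)
pred = nearest_centroid_predict(model, [3.1, 3.0])  # elevated markers -> "relapse"
```

Real cancer prognosis models are trained on thousands of noisy features, which is exactly why the more elaborate methods the review compares (SVMs, neural networks, decision trees) are needed; the sketch only fixes the train/predict vocabulary.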

967 citations


Journal ArticleDOI
TL;DR: A snapshot analysis based on the most recent genome sequences of two E.coli K-12 strains allows comparison of their genotypes and mutant status of alleles.
Abstract: The goal of this group project has been to coordinate and bring up-to-date information on all genes of Escherichia coli K-12. Annotation of the genome of an organism entails identification of genes, the boundaries of genes in terms of precise start and end sites, and description of the gene products. Known and predicted functions were assigned to each gene product on the basis of experimental evidence or sequence analysis. Since both kinds of evidence are constantly expanding, no annotation is complete at any moment in time. This is a snapshot analysis based on the most recent genome sequences of two E. coli K-12 strains. An accurate and up-to-date description of E. coli K-12 genes is of particular importance to the scientific community because experimentally determined properties of its gene products provide fundamental information for annotation of innumerable genes of other organisms. Availability of the complete genome sequence of two K-12 strains allows comparison of their genotypes and mutant status of alleles.

636 citations


Journal ArticleDOI
TL;DR: A web server, called PREDITOR, which greatly accelerates and simplifies the determination of torsion angle restraints including phi, psi, omega and chi angles and is 35 times faster and up to 20% more accurate than any existing method.
Abstract: Every year between 500 and 1000 peptide and protein structures are determined by NMR and deposited into the Protein Data Bank. However, the process of NMR structure determination continues to be a manually intensive and time-consuming task. One of the most tedious and error-prone aspects of this process involves the determination of torsion angle restraints including phi, psi, omega and chi angles. Most methods require many days of additional experiments, painstaking measurements or complex calculations. Here we wish to describe a web server, called PREDITOR, which greatly accelerates and simplifies this task. PREDITOR accepts sequence and/or chemical shift data as input and generates torsion angle predictions (with predicted errors) for phi, psi, omega and chi-1 angles. PREDITOR combines sequence alignment methods with advanced chemical shift analysis techniques to generate its torsion angle predictions. The method is fast (<40 s per protein) and accurate, with 88% of phi/psi predictions being within 30 degrees of the correct values, 84% of chi-1 predictions being correct and 99.97% of omega angles being correct. PREDITOR is 35 times faster and up to 20% more accurate than any existing method. PREDITOR also provides accurate assessments of the torsion angle errors so that the torsion angle constraints can be readily fed into standard structure refinement programs, such as CNS, XPLOR, AMBER and CYANA. Other unique features of PREDITOR include dihedral angle prediction via PDB structure mapping, automated chemical shift re-referencing (to improve accuracy), prediction of proline cis/trans states and a simple user interface. The PREDITOR website is located at: http://wishart.biology.ualberta.ca/preditor.
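The accuracy figures above (e.g. 88% of phi/psi predictions within 30 degrees) imply an angle comparison that handles circular wraparound, since phi and psi live on a ±180° scale. A minimal sketch of that scoring, with invented example values:

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (handles wraparound,
    so 175 vs -175 is 10 degrees apart, not 350)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def within_tolerance(predicted, observed, tol=30.0):
    return angle_diff(predicted, observed) <= tol

def phi_psi_accuracy(pred_pairs, obs_pairs, tol=30.0):
    """Fraction of residues whose predicted phi AND psi are both within tol degrees."""
    ok = sum(1 for (pp, ps), (op, os) in zip(pred_pairs, obs_pairs)
             if within_tolerance(pp, op, tol) and within_tolerance(ps, os, tol))
    return ok / len(pred_pairs)
```

Nothing here is PREDITOR's algorithm; it is only the evaluation arithmetic behind "within 30 degrees of the correct values".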

187 citations


Journal ArticleDOI
TL;DR: This work has developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process that is approximately 4–5% better than any other method currently available.
Abstract: The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high. We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non-sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%. By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at http://wishart.biology.ualberta.ca/proteus.
For high throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.
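The "jury-of-experts" consensus and the Q3 score can both be sketched directly: a per-residue majority vote over several predictors' helix/strand/coil (H/E/C) strings, scored against the observed states. The three-expert example strings below are invented.

```python
from collections import Counter

def consensus(predictions):
    """Per-residue majority vote over several predictors' H/E/C strings."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*predictions))

def q3(predicted, observed):
    """Q3: percentage of residues assigned the correct H/E/C state."""
    same = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * same / len(observed)

# Three hypothetical experts disagree on one residue; the vote resolves it.
experts = ["HHHHCC", "HHHECC", "HHHHCC"]
cons = consensus(experts)
score = q3(cons, "HHHHCC")
```

PROTEUS's actual jury weighs its experts rather than counting them equally, so treat the plain vote as the simplest possible instance of the idea.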

146 citations


Proceedings Article
16 Jul 2006
TL;DR: A framework, ExplainD, is described for explaining decisions made by classifiers that use additive evidence, which applies to many widely used classifiers, including linear discriminants and many additive models.
Abstract: Machine-learned classifiers are important components of many data mining and knowledge discovery systems. In several application domains, an explanation of the classifier's reasoning is critical for the classifier's acceptance by the end-user. We describe a framework, ExplainD, for explaining decisions made by classifiers that use additive evidence. ExplainD applies to many widely used classifiers, including linear discriminants and many additive models. We demonstrate our ExplainD framework using implementations of naive Bayes, linear support vector machine, and logistic regression classifiers on example applications. ExplainD uses a simple graphical explanation of the classification process to provide visualizations of the classifier decisions, visualization of the evidence for those decisions, the capability to speculate on the effect of changes to the data, and the capability, wherever possible, to drill down and audit the source of the evidence. We demonstrate the effectiveness of ExplainD in the context of a deployed web-based system (Proteome Analyst) and using a downloadable Python-based implementation.
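The additive-evidence idea ExplainD visualizes reduces to per-feature contributions w_i * x_i whose sum (plus a bias) is the linear decision value. A minimal sketch with hypothetical feature names and weights, not ExplainD's actual interface:

```python
def explain_additive(weights, x, bias=0.0):
    """Per-feature evidence contributions w_i * x_i for an additive classifier.

    For a linear model the decision value is bias + sum(contributions);
    positive contributions push toward the positive class, negative ones away.
    This per-feature breakdown is what makes the decision auditable.
    """
    contributions = {name: w * x[name] for name, w in weights.items()}
    decision = bias + sum(contributions.values())
    return contributions, decision

weights = {"marker_a": 1.5, "marker_b": -2.0}   # learned weights (hypothetical)
x = {"marker_a": 2.0, "marker_b": 0.5}          # one instance's feature values
contrib, score = explain_additive(weights, x, bias=0.1)
```

The same decomposition applies to naive Bayes and logistic regression once their scores are written in log-odds form, which is why the paper can cover all three with one framework.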

126 citations


Journal ArticleDOI
TL;DR: The key advantages of this protocol over existing methods for studying protein dynamics are that it does not require prior knowledge of a protein's tertiary structure, it is not sensitive to the protein's overall tumbling, and it does not require additional NMR measurements beyond the standard experiments for backbone assignments.
Abstract: We present a protocol for predicting protein flexibility from NMR chemical shifts. The protocol consists of (i) ensuring that the chemical shift assignments are correctly referenced or, if not, performing a reference correction using information derived from the chemical shift index, (ii) calculating the random coil index (RCI), and (iii) predicting the expected root mean square fluctuations (RMSFs) and order parameters (S2) of the protein from the RCI. The key advantages of this protocol over existing methods for studying protein dynamics are that (i) it does not require prior knowledge of a protein's tertiary structure, (ii) it is not sensitive to the protein's overall tumbling and (iii) it does not require additional NMR measurements beyond the standard experiments for backbone assignments. When chemical shift assignments are available, protein flexibility parameters, such as S2 and RMSF, can be calculated within 1–2 h using a spreadsheet program.
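The RCI step of the protocol can be sketched end to end: average the absolute secondary chemical shifts, invert, and map to an order parameter. The weights, the floor value, and the linear RCI-to-S2 mapping below are placeholder assumptions for illustration, not the paper's published calibration.

```python
def random_coil_index(secondary_shifts, weights=None, floor=0.5):
    """Illustrative RCI: inverse of a weighted average of absolute secondary shifts.

    secondary_shifts: dict nucleus -> |observed - random-coil| shift (ppm).
    Small secondary shifts mean random-coil-like (flexible) residues, so the
    inverse is large there. Weights and floor are placeholders, not the
    published calibration; the floor just avoids division blow-up.
    """
    weights = weights or {"CA": 1.0, "CO": 1.0, "CB": 1.0}
    total_w = sum(weights[n] for n in secondary_shifts)
    avg = sum(weights[n] * abs(s) for n, s in secondary_shifts.items()) / total_w
    return 1.0 / max(avg, floor)

def order_parameter(rci, scale=0.1):
    """Illustrative mapping from RCI to a model-free order parameter S2
    (assumed linear here; the real calibration is empirical)."""
    return max(0.0, 1.0 - scale * rci)

rigid = random_coil_index({"CA": 3.0, "CO": 2.0, "CB": 2.5})   # large shifts
floppy = random_coil_index({"CA": 0.2, "CO": 0.1, "CB": 0.1})  # near random coil
```

The point of the sketch is only the direction of the relationships: flexible residues sit near random-coil shifts, get a high RCI, and therefore a low predicted S2, all without any structure or relaxation data.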

79 citations


Journal ArticleDOI
TL;DR: The combined challenge of revising existing annotations and extracting useful information from the flood of new genome sequences will necessitate more reliance on completely automated systems.

66 citations


Journal ArticleDOI
TL;DR: The application of metabolomics to kidney transplant monitoring is still very much in its infancy, but there are a number of easily measured metabolites in both urine and serum that can provide reliable indications of organ function, organ injury, and immunosuppressive drug toxicity.
Abstract: Purpose of review: The success of any given kidney transplant is closely tied to the ability to monitor patients and responsively change their medications. Transplant monitoring is still, however, dependent on relatively old technologies: serum creatinine levels, urine output, blood pressure, blood gl…

56 citations


Journal ArticleDOI
TL;DR: A program, called SHIFTOR, that is able to accurately predict a large number of protein torsion angles using only 1H, 13C and 15N chemical shift assignments as input and its predictions are approximately 20% better than existing methods.
Abstract: Torsion angle restraints are frequently used in the determination and refinement of protein structures by NMR. These restraints may be obtained by J coupling, cross-correlation measurements, nuclear Overhauser effects (NOEs) or secondary chemical shifts. Currently most backbone (phi/psi) torsion angles are determined using a combination of J(HNHalpha) couplings and chemical shift measurements while most side-chain (chi1) angles and cis/trans peptide bond angles (omega) are determined via NOEs. The dependency on multiple experimental (and computational) methods to obtain different torsion angle restraints is both time-consuming and error prone. The situation could be greatly improved if the determination of all torsion angles (phi, psi, chi and omega) could be made via a single type of measurement (i.e. chemical shifts). Here we describe a program, called SHIFTOR, that is able to accurately predict a large number of protein torsion angles (phi, psi, omega, chi1) using only 1H, 13C and 15N chemical shift assignments as input. Overall, the program is 100x faster and its predictions are approximately 20% better than existing methods. The program is also capable of predicting chi1 angles with 81% accuracy and omega angles with 100% accuracy. SHIFTOR exploits many of the recent developments and observations regarding chemical shift dependencies as well as using information in the Protein Databank to improve the quality of its shift-derived torsion angle predictions. SHIFTOR is available as a freely accessible web server at http://wishart.biology.ualberta.ca/shiftor.
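The secondary chemical shifts SHIFTOR builds on are simply observed minus random-coil values, and their sign carries backbone information. The sketch below uses approximate textbook Cα random-coil shifts and a crude threshold rule that is far simpler than SHIFTOR's actual method; values and threshold are illustrative.

```python
# Approximate Calpha random-coil reference shifts (ppm) for two residue types --
# rough textbook values, shown only for illustration.
RANDOM_COIL_CA = {"A": 52.5, "G": 45.1}

def secondary_shift(residue, observed_ca):
    """Secondary shift = observed - random-coil value for that residue type."""
    return observed_ca - RANDOM_COIL_CA[residue]

def crude_backbone_guess(residue, observed_ca, threshold=0.7):
    """Toy phi/psi hint from one Calpha shift: positive secondary shifts lean
    helical, negative lean extended, small ones are called coil."""
    d = secondary_shift(residue, observed_ca)
    if d > threshold:
        return "helical"
    if d < -threshold:
        return "extended"
    return "coil"
```

SHIFTOR combines 1H, 13C and 15N shifts with database-derived dependencies rather than thresholding one nucleus, but this is the raw signal it starts from.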

48 citations


Proceedings ArticleDOI
01 Dec 2006
TL;DR: The developed BioSpider is essentially an automated report generator designed specifically to tabulate and summarize data on biomolecules - both large and small, and is believed to be a particularly valuable tool for researchers in metabolomics.
Abstract: One of the growing challenges in life science research lies in finding useful, descriptive or quantitative data about newly reported biomolecules (genes, proteins, metabolites and drugs). An even greater challenge is finding information that connects these genes, proteins, drugs or metabolites to each other. Much of this information is scattered through hundreds of different databases, abstracts or books and almost none of it is particularly well integrated. While some efforts are being undertaken at the NCBI and EBI to integrate many different databases together, this still falls short of the goal of having some kind of human-readable synopsis that summarizes the state of knowledge about a given biomolecule - especially small molecules. To address this shortfall, we have developed BioSpider. BioSpider is essentially an automated report generator designed specifically to tabulate and summarize data on biomolecules - both large and small. Specifically, BioSpider allows users to type in almost any kind of biological or chemical identifier (protein/gene name, sequence, accession number, chemical name, brand name, SMILES string, InChI string, CAS number, etc.) and it returns an in-depth synoptic report (approximately 3-30 pages in length) about that biomolecule and any other biomolecule it may target. This summary includes physico-chemical parameters, images, models, data files, descriptions and predictions concerning the query molecule. BioSpider uses a web-crawler to scan through dozens of public databases and employs a variety of specially developed text mining tools and locally developed prediction tools to find, extract and assemble data for its reports. Because of its breadth, depth and comprehensiveness, we believe BioSpider will prove to be a particularly valuable tool for researchers in metabolomics. BioSpider is available at: www.biospider.ca
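Accepting "almost any kind" of identifier implies a dispatch step that first guesses what kind of identifier the query is. The regex heuristics below are simplified assumptions for illustration, not BioSpider's real logic.

```python
import re

def classify_identifier(query):
    """Crude guess at the type of a BioSpider-style query string.

    Each pattern is a deliberately simplified heuristic: CAS numbers are
    digit groups joined by hyphens, InChI strings are self-labelling,
    long runs of amino-acid letters look like protein sequences, and
    letter-prefixed digit runs look like database accession numbers.
    """
    if re.fullmatch(r"\d{2,7}-\d{2}-\d", query):
        return "CAS number"
    if re.fullmatch(r"InChI=.+", query):
        return "InChI string"
    if re.fullmatch(r"[ACDEFGHIKLMNPQRSTVWY]{20,}", query):
        return "protein sequence"
    if re.fullmatch(r"[A-Z]{1,3}_?\d{5,}(\.\d+)?", query):
        return "accession number"
    return "name or SMILES"
```

A production crawler would then route each identifier type to the databases that accept it (PubChem for CAS/SMILES, GenBank for accessions, and so on) and merge the returned records into the synoptic report.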


Proceedings ArticleDOI
01 Dec 2006
TL;DR: This year's Pacific Symposium in Biocomputing solicited papers that focused specifically on describing novel methods for the acquisition, management and analysis of metabolomic data, particularly interested in papers that covered one of the five following topics: metabolomics databases; 2) metabolomics LIMS; 3) spectral analysis tools for metabolomics; 4) medical or applied metabolomics.
Abstract: 1. Session Background and Motivation This marks the first time that the Pacific Symposium in Biocomputing has hosted a session specifically devoted to the emerging computational needs of metabolomics. Metabolomics, or metabonomics as it is sometimes called, is a relatively new field of “omics” research concerned with the high-throughput identification and quantification of the small molecule metabolites in the metabolome (i.e. the complete complement of all small molecule metabolites found in a specific cell, organ or organism). It is a close counterpart to the genome, the transcriptome and the proteome. Together these four “omes” constitute the building blocks of systems biology. Even though metabolomics is primarily concerned with tracking and identifying chemicals as opposed to genes or proteins, it still shares many of the same computational needs with genomics, proteomics and transcriptomics. For instance, just like the other “omics” fields, metabolomics needs electronically accessible and searchable databases, it needs software to handle or process data from various high-throughput instruments such as NMR spectrometers or mass spectrometers, it needs laboratory information management systems (LIMS) to manage the data, and it needs software tools to predict or find information about metabolite properties, pathways, relationships or functions. These computational needs are just beginning to be addressed by members of the metabolomics community. As a result we believed that a PSB session devoted to this topic could address a number of important issues concerning both the emerging computational needs and the nascent computational trends in metabolomics. This year we solicited papers that focused specifically on describing novel methods for the acquisition, management and analysis of metabolomic data.
We were particularly interested in papers that covered one of the following five topics: 1) metabolomics databases; 2) metabolomics LIMS; 3) spectral analysis tools for metabolomics; 4) medical or applied metabolomics