Author

Marc R. Wilkins

Bio: Marc R. Wilkins is an academic researcher from University of New South Wales. The author has contributed to research in topics: Proteome & Methylation. The author has an h-index of 54, co-authored 249 publications receiving 19797 citations. Previous affiliations of Marc R. Wilkins include Geneva College & Swiss Institute of Bioinformatics.


Papers
Book Chapter
TL;DR: Describes the protein identification and analysis software available through the ExPASy World Wide Web server, which draws on the extensive annotation in the Swiss-Prot database.
Abstract: Protein identification and analysis software performs a central role in the investigation of proteins from two-dimensional (2-D) gels and mass spectrometry. For protein identification, the user matches certain empirically acquired information against a protein database to define a protein as already known or as novel. For protein analysis, information in protein databases can be used to predict certain properties about a protein, which can be useful for its empirical investigation. The two processes are thus complementary. Although there are numerous programs available for those applications, we have developed a set of original tools with a few main goals in mind. Specifically, these are: 1. To utilize the extensive annotation available in the Swiss-Prot database wherever possible, in particular the position-specific annotation in the Swiss-Prot feature tables to take into account posttranslational modifications and protein processing. 2. To develop tools specifically, but not exclusively, applicable to proteins prepared by two dimensional gel electrophoresis and peptide mass fingerprinting experiments. 3. To make all tools available on the World-Wide Web (WWW), and freely usable by the scientific community. In this chapter we give details about protein identification and analysis software that is available through the ExPASy World Wide Web server.

8,007 citations

Journal Article
TL;DR: A 1996 review of progress with proteome projects, addressing why all proteins expressed by a genome should be identified and how to do it.
Abstract: (1996). Progress with Proteome Projects: Why all Proteins Expressed by a Genome Should be Identified and How To Do It. Biotechnology and Genetic Engineering Reviews: Vol. 13, No. 1, pp. 19-50.

1,158 citations

Journal Article
TL;DR: A protein map of the smallest known self‐replicating organism, Mycoplasma genitalium, revealed a high proportion of acidic proteins; combining amino acid analysis with peptide-mass fingerprinting allowed proteins to be identified prior to detection of their respective genes via the M. genitalium sequencing initiative.
Abstract: A protein map of the smallest known self-replicating organism, Mycoplasma genitalium (Class: Mollicutes), revealed a high proportion of acidic proteins. Amino acid composition was used to putatively identify, or provide unique parameters, for 50 gene products separated by two-dimensional gel electrophoresis. A further 19 proteins were subjected to peptide-mass fingerprinting using matrix-assisted laser desorption ionisation-time of flight (MALDI-TOF) mass spectrometry and 4 were subjected to N-terminal Edman degradation. The majority of M. genitalium proteins remain uncharacterised. However, the combined approach of amino acid analysis and peptide-mass fingerprinting allowed gene products to be linked to homologous genes in a variety of organisms. This has allowed proteins to be identified prior to detection of their respective genes via the M. genitalium sequencing initiative. The principle of ‘hierarchical’ analysis for the mass screening of proteins and the analysis of microbial genomes via their protein complement or ‘proteome’ is detailed. Here, characterisation of gene products depends upon the quickest and most economical technologies being employed initially, so as to determine if a large number of proteins are already present in both homologous and heterologous species databases. Initial screening, which lends itself to automation and robotics, can then be followed by more time and cost intensive procedures, when necessary.

955 citations

Journal Article
TL;DR: Single protein spots, from polyvinylidene difluoride blots of micropreparative E. coli 2-D gels, were rapidly and economically identified by matching their amino acid composition, estimated pI and molecular weight against all E. coli entries in the SWISS-PROT database.
Abstract: Separation and identification of proteins by two-dimensional (2-D) electrophoresis can be used for protein-based gene expression analysis. In this report single protein spots, from polyvinylidene difluoride blots of micropreparative E. coli 2-D gels, were rapidly and economically identified by matching their amino acid composition, estimated pI and molecular weight against all E. coli entries in the SWISS-PROT database. Thirty proteins from an E. coli 2-D map were analyzed and identities assigned. Three of the proteins were unknown. By protein sequencing analysis, 20 of the 27 proteins were correctly identified. Importantly, correct identifications showed unambiguous “correct” score patterns, while incorrect protein identifications also showed distinctive score patterns, indicating that the protein must be identified by other means. These techniques allow large-scale screening of the protein complement of simple organisms, or tissues in normal and disease states. The computer program described here is accessible via the World Wide Web at URL address (http://expasy.hcuge.ch/).

897 citations

Journal Article
TL;DR: The processes and principles underpinning the development of guidance modules for reporting the use of techniques such as gel electrophoresis and mass spectrometry are described and the ramifications for various interest groups such as experimentalists, funders, publishers and the private sector are discussed.
Abstract: Both the generation and the analysis of proteomics data are now widespread, and high-throughput approaches are commonplace. Protocols continue to increase in complexity as methods and technologies evolve and diversify. To encourage the standardized collection, integration, storage and dissemination of proteomics data, the Human Proteome Organization's Proteomics Standards Initiative develops guidance modules for reporting the use of techniques such as gel electrophoresis and mass spectrometry. This paper describes the processes and principles underpinning the development of these modules; discusses the ramifications for various interest groups such as experimentalists, funders, publishers and the private sector; addresses the issue of overlap with other reporting guidelines; and highlights the criticality of appropriate tools and resources in enabling 'MIAPE-compliant' reporting.

703 citations


Cited by
Journal Article
TL;DR: The Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines target the reliability of results to help ensure the integrity of the scientific literature, promote consistency between laboratories, and increase experimental transparency.
Abstract: Background: Currently, a lack of consensus exists on how best to perform and interpret quantitative real-time PCR (qPCR) experiments. The problem is exacerbated by a lack of sufficient experimental detail in many publications, which impedes a reader’s ability to evaluate critically the quality of the results presented or to repeat the experiments. Content: The Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines target the reliability of results to help ensure the integrity of the scientific literature, promote consistency between laboratories, and increase experimental transparency. MIQE is a set of guidelines that describe the minimum information necessary for evaluating qPCR experiments. Included is a checklist to accompany the initial submission of a manuscript to the publisher. By providing all relevant experimental conditions and assay characteristics, reviewers can assess the validity of the protocols used. Full disclosure of all reagents, sequences, and analysis methods is necessary to enable other investigators to reproduce results. MIQE details should be published either in abbreviated form or as an online supplement. Summary: Following these guidelines will encourage better experimental practice, allowing more reliable and unequivocal interpretation of qPCR results.

12,469 citations

Journal Article
TL;DR: This protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results, which takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.
Abstract: Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

10,913 citations

Journal Article
TL;DR: An environment for comparative protein modeling is developed that consists of SWISS‐MODEL, a server for automated comparative protein modeling, and the Swiss‐PdbViewer, a sequence-to-structure workbench that provides a large selection of structure analysis and display tools.
Abstract: Comparative protein modeling is increasingly gaining interest since it is of great assistance during the rational design of mutagenesis experiments. The availability of this method, and the resulting models, has however been restricted by the availability of expensive computer hardware and software. To overcome these limitations, we have developed an environment for comparative protein modeling that consists of SWISS-MODEL, a server for automated comparative protein modeling and of the SWISS-PdbViewer, a sequence to structure workbench. The Swiss-PdbViewer not only acts as a client for SWISS-MODEL, but also provides a large selection of structure analysis and display tools. In addition, we provide the SWISS-MODEL Repository, a database containing more than 3500 automatically generated protein models. By making such tools freely available to the scientific community, we hope to increase the use of protein structures and models in the process of experiment design.

10,713 citations

01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler (specialized for single-cell data) and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal Article
TL;DR: A new computer program, Mascot, is presented, which integrates all three types of search for protein identification by searching a sequence database using mass spectrometry data, and the scoring algorithm is probability based.
Abstract: Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.

8,195 citations