Showing papers by "Henning Hermjakob" published in 2007


Journal ArticleDOI
TL;DR: IntAct is an open source database and software suite for modeling, storing and analyzing molecular interaction data that features over 126 000 binary interactions extracted from over 2100 scientific publications and makes extensive use of controlled vocabularies.
Abstract: IntAct is an open source database and software suite for modeling, storing and analyzing molecular interaction data. The data available in the database originates entirely from published literature and is manually annotated by expert biologists to a high level of detail, including experimental methods, conditions and interacting domains. The database features over 126 000 binary interactions extracted from over 2100 scientific publications and makes extensive use of controlled vocabularies. The web site provides tools allowing users to search, visualize and download data from the repository. IntAct supports and encourages local installations as well as direct data submission and curation collaborations. IntAct source code and data are freely available from http://www.ebi.ac.uk/intact.

815 citations


Journal ArticleDOI
TL;DR: The processes and principles underpinning the development of guidance modules for reporting the use of techniques such as gel electrophoresis and mass spectrometry are described and the ramifications for various interest groups such as experimentalists, funders, publishers and the private sector are discussed.
Abstract: Both the generation and the analysis of proteomics data are now widespread, and high-throughput approaches are commonplace. Protocols continue to increase in complexity as methods and technologies evolve and diversify. To encourage the standardized collection, integration, storage and dissemination of proteomics data, the Human Proteome Organization's Proteomics Standards Initiative develops guidance modules for reporting the use of techniques such as gel electrophoresis and mass spectrometry. This paper describes the processes and principles underpinning the development of these modules; discusses the ramifications for various interest groups such as experimentalists, funders, publishers and the private sector; addresses the issue of overlap with other reporting guidelines; and highlights the criticality of appropriate tools and resources in enabling 'MIAPE-compliant' reporting.

703 citations


Journal ArticleDOI
TL;DR: It is shown that dysbindin and DISC1 share common PPIs suggesting they may affect common biological processes and that the function of schizophrenia risk genes may converge.
Abstract: Disrupted in Schizophrenia 1 (DISC1) is a schizophrenia risk gene associated with cognitive deficits in both schizophrenics and the normal ageing population. In this study, we have generated a network of protein–protein interactions (PPIs) around DISC1. This has been achieved by utilising iterative yeast two-hybrid (Y2H) screens, combined with detailed pathway and functional analysis. This so-called ‘DISC1 interactome’ contains many novel PPIs and provides a molecular framework to explore the function of DISC1. The network implicates DISC1 in processes of cytoskeletal stability and organisation, intracellular transport and cell-cycle/division. In particular, DISC1 looks to have a PPI profile consistent with that of an essential synaptic protein, which fits well with the underlying molecular pathology observed at the synaptic level and the cognitive deficits seen behaviourally in schizophrenics. Utilising a similar approach with dysbindin (DTNBP1), a second schizophrenia risk gene, we show that dysbindin and DISC1 share common PPIs, suggesting they may affect common biological processes and that the function of schizophrenia risk genes may converge.

419 citations


Journal ArticleDOI
TL;DR: It is argued that clinical proteomics is not just a collection of studies dealing with analysis of clinical samples; rather, its essence should be to address clinically relevant questions and to improve the state of the art, both in diagnosis and in therapy of diseases.
Abstract: The aim of this manuscript is to initiate a constructive discussion about the definition of clinical proteomics, study requirements, pitfalls and (potential) use. Furthermore, we hope to stimulate proposals for the optimal use of future opportunities and seek unification of the approaches in clinical proteomic studies. We have outlined our collective views about the basic principles that should be considered in clinical proteomic studies, including sample selection, choice of technology and appropriate quality control, and the need for collaborative interdisciplinary efforts involving clinicians and scientists. Furthermore, we propose guidelines for the critical aspects that should be included in published reports. Our hope is that, as a result of stimulating discussion, a consensus will be reached amongst the scientific community leading to guidelines for the studies, similar to those already published for mass spectrometric sequencing data. We contend that clinical proteomics is not just a collection of studies dealing with analysis of clinical samples. Rather, the essence of clinical proteomics should be to address clinically relevant questions and to improve the state-of-the-art, both in diagnosis and in therapy of diseases.

303 citations


Journal ArticleDOI
TL;DR: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes.
Abstract: Molecular interaction information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to these data very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions. The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format MITAB2.5 has been developed for the benefit of users who require only minimal information in an easy to access configuration. The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.

274 citations
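
As an illustration of why a tab-delimited format such as MITAB2.5 suits lightweight scripting, the following is a minimal Python sketch of reading one row. It is not taken from the paper: the column labels are our own names, the order is assumed to follow the 15-column MITAB2.5 layout, and the example row uses made-up identifiers. The split on "|" is deliberately naive and would mishandle a "|" inside a quoted value.

```python
# Column labels are this sketch's own names (hypothetical); the order is
# assumed to follow the 15-column MITAB2.5 layout.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence",
]


def parse_mitab25_line(line):
    """Split one MITAB2.5 row into a dict mapping column label to a list of values."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(MITAB25_COLUMNS):
        raise ValueError(
            f"expected {len(MITAB25_COLUMNS)} columns, got {len(fields)}"
        )
    record = {}
    for name, raw in zip(MITAB25_COLUMNS, fields):
        # "-" marks an empty column; "|" separates multiple values.
        # Naive split: a "|" inside a quoted value would be split as well.
        record[name] = [] if raw == "-" else raw.split("|")
    return record


# Made-up row for illustration only; none of these identifiers are real records.
EXAMPLE_LINE = "\t".join([
    "uniprotkb:P12345",
    "uniprotkb:Q67890",
    "refseq:NP_000001|ensembl:ENSP00000000001",
    "-",
    "-",
    "-",
    'psi-mi:"MI:0018"(two hybrid)',
    "Doe et al. (2007)",
    "pubmed:00000000",
    "taxid:9606(human)",
    "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(intact)',
    "intact:EBI-0000000",
    "-",
])

record = parse_mitab25_line(EXAMPLE_LINE)
print(record["id_a"], record["id_b"], record["alt_ids_a"])
```

Keeping every column as a list, even when it holds a single value, means downstream code does not need to special-case multi-valued fields.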


Journal ArticleDOI
TL;DR: MIMIx, the minimum information required for reporting a molecular interaction experiment, is proposed, which will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.
Abstract: A wealth of molecular interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques. These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the experiment. Here we propose MIMIx, the minimum information required for reporting a molecular interaction experiment. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of molecular interaction data in public databases, thereby improving access to valuable interaction data.

270 citations


Journal ArticleDOI
TL;DR: ProteomeBinders is a new European consortium aiming to establish a comprehensive resource of well-characterized affinity reagents, including but not limited to antibodies, for analysis of the human proteome.
Abstract: ProteomeBinders is a new European consortium aiming to establish a comprehensive resource of well-characterized affinity reagents, including but not limited to antibodies, for analysis of the human proteome. Given the huge diversity of the proteome, the scale of the project is potentially immense but nevertheless feasible in the context of a pan-European or even worldwide coordination.

238 citations


Journal ArticleDOI
TL;DR: It is found that alternative splicing in human genes is more frequent than has commonly been suggested, and it is demonstrated that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts.
Abstract: Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.

218 citations


Journal ArticleDOI
TL;DR: The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.
Abstract: The PRIDE (http://www.ebi.ac.uk/pride) database of protein and peptide identifications was previously described in the NAR Database Special Edition in 2006. Since this publication, the volume of public data in the PRIDE relational database has increased by more than an order of magnitude. Several significant public datasets have been added, including identifications and processed mass spectra generated by the HUPO Brain Proteome Project and the HUPO Liver Proteome Project. The PRIDE software development team has made several significant changes and additions to the user interface and tool set associated with PRIDE. The focus of these changes has been to facilitate the submission process and to improve the mechanisms by which PRIDE can be queried. The PRIDE team has developed a Microsoft Excel workbook that allows the required data to be collated in a series of relatively simple spreadsheets, with automatic generation of PRIDE XML at the end of the process. The ability to query PRIDE has been augmented by the addition of a BioMart interface allowing complex queries to be constructed. Collaboration with groups outside the EBI has been fruitful in extending PRIDE, including an approach to encode iTRAQ quantitative data in PRIDE XML.

141 citations


Journal ArticleDOI
TL;DR: The Protein Identifier Cross-Reference (PICR) service is a web application that provides interactive and programmatic access to a mapping algorithm that uses the UniProt Archive as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc.
Abstract: Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs.

136 citations


Journal ArticleDOI
TL;DR: Adoption of FuGE by multiple standards bodies will enable uniform reporting of common parts of functional genomics workflows, simplify data-integration efforts and ease the burden on researchers seeking to fulfill multiple minimum reporting requirements.
Abstract: The Functional Genomics Experiment data model (FuGE) has been developed to facilitate convergence of data standards for high-throughput, comprehensive analyses in biology. FuGE models the components of an experimental activity that are common across different technologies, including protocols, samples and data. FuGE provides a foundation for describing entire laboratory workflows and for the development of new data formats. The Microarray Gene Expression Data society and the Proteomics Standards Initiative have committed to using FuGE as the basis for defining their respective standards, and other standards groups, including the Metabolomics Standards Initiative, are evaluating FuGE in their development efforts. Adoption of FuGE by multiple standards bodies will enable uniform reporting of common parts of functional genomics workflows, simplify data-integration efforts and ease the burden on researchers seeking to fulfill multiple minimum reporting requirements. Such advances are important for transparent data management and mining in functional genomics and systems biology.

Journal ArticleDOI
TL;DR: The potential submitter is walked through the various routes by which data may be deposited with the databases and the tools which have been developed to assist in this process are described.
Abstract: The ever-increasing generation of, and corresponding interest in, molecular interaction data has led to the establishment of a number of high-quality molecular interaction databases which manually curate interaction data extracted from the literature. In order to effectively share the curation load, and ensure that data is stored in and accessible from multiple sources, these databases have united to form the IMEx consortium. All of the IMEx databases also accept direct deposition of interaction data from authors prior to publication, thus both assisting the scientist in preparing the dataset for publication and ensuring that its subsequent representation in the public domain databases is fully accurate. This article walks the potential submitter through the various routes by which data may be deposited with the databases and describes the tools which have been developed to assist in this process.

Journal ArticleDOI
TL;DR: The effort to merge the existing mass spectrometry XML interchange formats, mzData and mzXML, into one single standard mzML yielded significant progress and the preliminary design of AnalysisXML was extended to include several new use cases and better support for quantification information.
Abstract: Over the last five years, the Human Proteome Organisation Proteomics Standards Initiative (HUPO PSI) has produced and released community-accepted XML interchange formats in the fields of mass spectrometry, molecular interactions and gel electrophoresis, has led the field in the discussion of the minimum information with which such data should be annotated and is now in the process of publishing much of this information. At this 4th Spring workshop, the emphasis was on consolidating this effort, refining and improving the existing models and pushing these forward to align with more broadly encompassing efforts such as FuGE (Jones, A.R., Pizarro, A., Spellman, P., Miller, M., FuGE Working Group FuGE: Functional Genomics Experiment Object Model. OMICS 2006, 10, 179-184) and the Ontology for Biomedical Investigation (OBI). The effort to merge the existing mass spectrometry XML interchange formats, mzData and mzXML, into one single standard mzML yielded significant progress. Also, the preliminary design of AnalysisXML was extended to include several new use cases and better support for quantification information. Finally, the Molecular Interaction group discussed the development of a molecular interaction scoring system with accompanying gold standard data test sets.

Journal ArticleDOI
TL;DR: The web interface used to support HUPO‐PSI document processes for reviewing MIAPE documents, specifications, community practice and informational documents is presented.
Abstract: The Human Proteome Organisation's Proteomics Standards Initiative (HUPO-PSI) has recently developed formal document processes for reviewing MIAPE documents, specifications, community practice and informational documents. These document workflows rely on community participation as well as more traditional expert review. We here present the web interface used to support these document processes, and explain briefly how interested parties can participate in the review process. The HUPO-PSI website can be found at http://www.psidev.info.

Journal ArticleDOI
TL;DR: The Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI) was tasked with the creation of data standards and interchange formats to allow both the exchange and storage of such data irrespective of the hardware and software from which it was generated.
Abstract: The amount of data currently being generated by proteomics laboratories around the world is increasing exponentially, making it ever more critical that scientists are able to exchange, compare and retrieve datasets when re-evaluation of their original conclusions becomes important. Only a fraction of this data is published in the literature and important information is being lost every day as data formats become obsolete. The Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI) was tasked with the creation of data standards and interchange formats to allow both the exchange and storage of such data irrespective of the hardware and software from which it was generated. This article will provide an update on the work of this group, the creation and implementation of these standards and the standards-compliant data repositories being established as a result of their efforts.

Journal ArticleDOI
TL;DR: This Opinion attempts to shed some light on some of the underlying issues of proteomics, and proposes certain guidelines authors can adhere to in order to allow others to validate their findings.
Abstract: The field of proteomics has gained considerable momentum over the last years as new technologies and better instrumentation allowed the field to mature from what resembled a cottage industry into a high-throughput means to identify, characterize and quantify hundreds of proteins. The identifications and (relative) quantitation values obtained are often controversial however, as various techniques and different software platforms are used in the many laboratories worldwide. This Opinion attempts to shed some light on some of the underlying issues, and proposes certain guidelines authors can adhere to in order to allow others to validate their findings.

Journal ArticleDOI
TL;DR: An extension to the PRIDE and mzData XML schema is proposed to accommodate the concept of multiple samples per experiment, and in addition, capture the intensities of the iTRAQ™ reporter ions in the entry.
Abstract: Background Proteomics continues to play a critical role in post-genomic science as continued advances in mass spectrometry and analytical chemistry support the separation and identification of increasing numbers of peptides and proteins from their characteristic mass spectra. In order to facilitate the sharing of this data, various standard formats have been, and continue to be, developed. Still not fully mature however, these are not yet able to cope with the increasing number of quantitative proteomic technologies that are being developed.

Journal ArticleDOI
TL;DR: Standardization is a continuously moving target: in a fast-evolving field like proteomics, consistent review and revision of standards are necessary, and standards serve as essential tools to enable both data quality assessment and subsequent data reuse.

Journal ArticleDOI
TL;DR: Minimum reporting requirements have been developed and are now maturing to the point where they have been submitted for journal publication after prolonged exposure to community input via the PSI website.
Abstract: Since its conception in April 2002, the Human Proteome Organisation Proteomics Standards Initiative has contributed to the development of community standards for proteomics in a collaborative and very dynamic manner, resulting in the publication and increasing adoption of a number of interchange formats and controlled vocabularies. Repositories supporting these formats are being established or are already operational. In parallel with this, minimum reporting requirements have been developed and are now maturing to the point where they have been submitted for journal publication after prolonged exposure to community input via the PSI website.

Book ChapterDOI
TL;DR: An increasing need for common data standards that will allow the interchange of data between different instrumentation, search engines, and between laboratory databases could lead to the establishment of data repositories from where benchmark datasets could be accessed and reanalyzed.
Abstract: The ever-increasing volumes of proteomic data now being produced by laboratories across the world have resulted in major issues in data storage and accessibility. The further demands of multilaboratory initiatives have highlighted issues when collaborators cannot import data generated within the same project but generated by different hardware types and processed by laboratory-specific workflows and analysis packages. There is an increasing need for common data standards that will allow the interchange of data between different instrumentation, search engines, and laboratory databases. This could then lead to the establishment of data repositories from where benchmark datasets could be accessed and reanalyzed. The Human Proteome Organization is currently supporting efforts to establish such standards. The work of the Proteomics Standards Initiative has led to the development of the mzData XML interchange standard and is now broadening its scope to produce a spectral analysis output format, mzIdent. Accompanying controlled vocabularies allow the accurate yet systematic representation of metadata throughout both schemas.

Journal ArticleDOI
TL;DR: In the version of the article originally published, Manfred Koegl's name was misspelled and Zoltan Konthur's affiliation was incorrect; it should be Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.
Abstract: Nat. Methods 4, 13–17 (2007); published online 28 December 2006; corrected after print 18 January 2007. In the version of the article originally published, Manfred Koegl's name was misspelled. Additionally, Zoltan Konthur's affiliation was incorrect; it should be Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.

Proceedings Article
01 Jan 2007
TL;DR: A novel way of analyzing the information contained in proteomics experiments with a 'latent semantic analysis' is presented, able to overcome the fundamental difficulties of analyzing such divergent and heterogeneous data emerging from large-scale proteomics studies employing a vast spectrum of different sample treatment and mass-spectrometry technologies.
Abstract: Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the amount of identifications available to the community. Despite the considerable body of information amassed, very few successful analyses have been performed and published on this data, levelling off the ultimate value of these projects far below their potential. In order to illustrate that these repositories should be considered sources of detailed knowledge instead of data graveyards, we here present a novel way of analyzing the information contained in proteomics experiments with a 'latent semantic analysis'. We apply this information retrieval approach to the peptide identification data contributed by the Plasma Proteome Project. Interestingly, this analysis is able to overcome the fundamental difficulties of analyzing such divergent and heterogeneous data emerging from large-scale proteomics studies employing a vast spectrum of different sample treatment and mass-spectrometry technologies. Moreover, it yields several concrete recommendations for optimizing proteomics project planning as well as the choice of technologies used in the experiments. It is clear from these results that the analysis of large bodies of publicly available proteomics data holds great promise and is currently

01 Jan 2007
TL;DR: This work first used seven organisms separately to transfer interactions onto Synechocystis, then they combined the predictions and are now investigating the validation of these predictions using domain-domain interactions and functional annotation.
Abstract: To better understand the response of the model cyanobacterium Synechocystis to environmental stresses, we aim at building regulatory networks by a combination of gene expression data and protein-protein interactions. The experimental generation of the interaction network remains difficult, but some large-scale interaction networks are available for a number of model organisms, and systematic transfer of protein-protein interactions has become a central task of functional genomics. Consequently, we have investigated the domain of network inference and validation through the use of protein orthologs using the concept of "interologs". In this way we expect to quantitatively expand the Synechocystis protein interaction network. We first used seven organisms separately to transfer interactions onto Synechocystis, then we combined the predictions and are now investigating the validation of these predictions using domain-domain interactions and functional annotation.
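
The interolog idea described in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch under our own assumptions, not the authors' pipeline: the function names, the ortholog-table shape (source protein mapped to a list of target proteins) and all protein and organism names below are hypothetical. An interaction is transferred whenever both partners have an ortholog in the target organism, and predictions from several source organisms are then combined by recording which organisms support each predicted pair.

```python
def transfer_interologs(source_interactions, orthologs):
    """Predict target-organism interactions from source-organism interactions.

    source_interactions: iterable of (protein_a, protein_b) pairs.
    orthologs: dict mapping a source protein to a list of target proteins.
    Returns a set of frozensets, so that (x, y) and (y, x) are the same pair.
    """
    predicted = set()
    for a, b in source_interactions:
        # Transfer only when both partners have at least one ortholog.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(frozenset((target_a, target_b)))
    return predicted


def combine_predictions(per_organism):
    """Map each predicted pair to the set of source organisms supporting it."""
    support = {}
    for organism, pairs in per_organism.items():
        for pair in pairs:
            support.setdefault(pair, set()).add(organism)
    return support


# Hypothetical toy data: two source organisms, one shared prediction.
yeast_pairs = transfer_interologs(
    [("yA", "yB"), ("yA", "yC")],
    {"yA": ["s1"], "yB": ["s2", "s3"]},  # yC has no ortholog, so it is skipped
)
ecoli_pairs = transfer_interologs(
    [("eA", "eB")],
    {"eA": ["s1"], "eB": ["s2"]},
)
support = combine_predictions({"yeast": yeast_pairs, "ecoli": ecoli_pairs})
for pair, organisms in support.items():
    print(sorted(pair), sorted(organisms))
```

Counting the supporting organisms per pair is one simple way to rank predictions before any further validation, for example against domain-domain interactions or functional annotation as the abstract mentions.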