
The BioPAX community standard for pathway data sharing

Emek Demir, +94 more
- 01 Sep 2010 - 
- Vol. 28, Iss: 9, pp 935-942
Abstract
Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.


nature biotechnology VOLUME 28 NUMBER 9 SEPTEMBER 2010 935
PERSPECTIVE
Increasingly powerful technologies, including genome-wide molecular
measurements, have accelerated progress toward a complete map of
molecular interaction networks in cells and between cells of many organisms.
The growing scale of these maps requires their representation in
a form suitable for computer processing, storage and dissemination
by means of software systems. The BioPAX project aims to facilitate
knowledge representation, systematic collection, integration and wide
distribution of pathway data from heterogeneous information sources.
This will enable these data to be incorporated into distributed biological
information systems that support visualization and analysis.
BioPAX supports efforts working toward a complete representation of
basic cellular processes. Biology has come a long way since the
Boehringer-Mannheim wall chart of metabolic pathways (ref. 1) and the
Nicholson Metabolic Map (ref. 2). Since then, several groups have developed
methods and databases for organizing pathway information (refs. 3–16), but
only recently have groups collaborated as part of the BioPAX project to
develop a generally accepted standard way of representing these pathway
maps. Complete molecular process maps must include all interactions,
reactions, dependencies, influence and information flow between pools of
molecules in cells and between cells. For ease of use and simplicity of
presentation, such network maps are often organized in terms of subnetworks
or pathways. Pathways are models delineated within the entire cellular
biochemical network that help us describe and understand specific
biological processes. Thus, a useful definition of a pathway is a set of
interactions between physical or genetic cell components, often describing
a cause-and-effect or time-dependent process, that explains observable
biological phenomena. How do we represent these pathways in a generally
accepted and computable form?
The BioPAX community standard for pathway data sharing
Emek Demir (1,2,*), Michael P Cary (1), Suzanne Paley (3), Ken Fukuda (4),
Christian Lemer (5), Imre Vastrik (6), Guanming Wu (7), Peter D'Eustachio (8),
Carl Schaefer (9), Joanne Luciano (10), Frank Schacherer (11),
Irma Martinez-Flores (12), Zhenjun Hu (13), Veronica Jimenez-Jacinto (12),
Geeta Joshi-Tope (14), Kumaran Kandasamy (15), Alejandra C Lopez-Fuentes (16),
Huaiyu Mi (17), Elgar Pichler (18), Igor Rodchenkov (19),
Andrea Splendiani (20,21), Sasha Tkachev (22), Jeremy Zucker (23),
Gopal Gopinath (24), Harsha Rajasimha (25,26), Ranjani Ramakrishnan (27),
Imran Shah (28), Mustafa Syed (29), Nadia Anwar (1), Özgün Babur (1,2),
Michael Blinov (30), Erik Brauner (31), Dan Corwin (32), Sylva Donaldson (19),
Frank Gibbons (31), Robert Goldberg (33), Peter Hornbeck (22),
Augustin Luna (34), Peter Murray-Rust (35), Eric Neumann (36),
Oliver Ruebenacker (37), Matthias Samwald (38,39), Martijn van Iersel (40),
Sarala Wimalaratne (41), Keith Allen (42), Burk Braun (11),
Michelle Whirl-Carrillo (43), Kei-Hoi Cheung (44), Kam Dahlquist (45),
Andrew Finney (46), Marc Gillespie (47), Elizabeth Glass (29), Li Gong (43),
Robin Haw (7), Michael Honig (48), Olivier Hubaut (5), David Kane (49),
Shiva Krupa (50), Martina Kutmon (51), Julie Leonard (42), Debbie Marks (52),
David Merberg (53), Victoria Petri (54), Alex Pico (55), Dean Ravenscroft (56),
Liya Ren (14), Nigam Shah (57), Margot Sunshine (34), Rebecca Tang (43),
Ryan Whaley (43), Stan Letovksy (58), Kenneth H Buetow (59),
Andrey Rzhetsky (60), Vincent Schachter (61), Bruno S Sobral (25),
Ugur Dogrusoz (2), Shannon McWeeney (27), Mirit Aladjem (34), Ewan Birney (6),
Julio Collado-Vides (12), Susumu Goto (62), Michael Hucka (63),
Nicolas Le Novère (6), Natalia Maltsev (29), Akhilesh Pandey (15),
Paul Thomas (17), Edgar Wingender (64), Peter D Karp (3), Chris Sander (1)
& Gary D Bader (19,*)
A full list of author affiliations appears at the end of this paper.
Published online 9 September 2010; corrected after print 7 December 2010 and
10 April 2012; doi:10.1038/nbt.1666
npg
© 2012 Nature America, Inc. All rights reserved.

Challenges posed by the many fragmented pathway databases
The total volume of pathway data mapped by biologists and stored
in databases has entered a rapid growth phase, with the number of
online resources for pathways and molecular interactions increasing
70%, from 190 in 2006 to 325 in 2010 (ref. 17). In addition, molecular
profiling methods, such as RNA profiling using microarrays, or protein
quantification using mass spectrometry, provide large amounts of
information about the dynamics of cellular pathway components and
increase the power of pathway analysis techniques (refs. 18,19). However,
this growth poses a formidable challenge for pathway data collection and
curation as well as for database, visualization and analysis software,
as these data are often fragmented.
The principal motivation for building pathway databases and software
tools is to facilitate qualitative and quantitative analysis and
modeling of large biological systems using a computational approach.
Over 300 pathway or molecular interaction–related data resources (ref. 17)
and many visualization and analysis software tools (refs. 3,20–22) have been
developed. Unfortunately, most of these databases and tools were
originally developed to use their own pathway representation language,
resulting in a heterogeneous set of resources that are extremely
difficult to combine and use. This has occurred because many different
research groups, each with their own system for representing
biomolecules and their interactions in a pathway, work independently
to collect pathway data recorded in the literature (estimated from
text-mining projects, ref. 23, to be present in at least 10% of the >20
million articles currently indexed by PubMed). As a result, researchers
waste time collecting information from different sources and converting
it from one form of representation to another. Fragmented pathway data
also carry a substantial opportunity cost. For instance,
visualization and analysis tools developed for one pathway database
cannot be reused for others, making software development efforts
more expensive. Therefore, it is imperative to develop computational
methods to cope with both the magnitude and fragmented nature of
this expanding, valuable pathway information. Whereas independent
research efforts are needed to find the best ways to represent pathways,
community coordination and agreement on standard semantics are necessary
to be able to efficiently integrate pathway data from
multiple sources on a large scale.
BioPAX requirements and implementation
A common, inclusive and computable pathway data language is
necessary to share knowledge about pathway maps and to facilitate
integration and use for hypothesis testing in biology (ref. 24). A shared
language facilitates communication by reducing the number of translations
required to exchange data between multiple sources (Fig. 1).
Developing such a representation is challenging owing to the variety
of pathways in biology and the diverse uses of pathway information.
Pathway representations frequently use abstractions for metabolic,
signaling, gene regulation, protein interaction and genetic interaction,
and these serve as a starting point toward a shared language (ref. 25). Also,
several variants of this common language may be required to answer
relevant research questions in distinct fields of biology, each covering
unique levels of detail addressing different uses, but these should be
rooted in common principles and must remain compatible.
BioPAX addresses these challenges. We developed BioPAX as a
shared language to facilitate communication between diverse software
systems and to establish standard knowledge representation of
pathway information. BioPAX supports representation of metabolic
and signaling pathways, molecular and genetic interactions and gene
regulation. Relationships between genes, small molecules, complexes
and their states (e.g., post-translational protein modifications, mRNA
splice variants, cellular location) are described, including the results
of events. Details about the BioPAX language are available in online
documentation at http://www.biopax.org/. The BioPAX language
provides terms and descriptions to represent many aspects of biological
pathways and their annotation. It is implemented as an ontology,
a formal system of describing knowledge (Box 1) that helps structure
pathway data so that they are more easily processed by computer
software (Fig. 2). It provides a standard syntax used for data exchange
that is based on OWL (Web Ontology Language) (Box 1). Finally, it
provides a validator that uses a set of rules to verify whether a BioPAX
document is complete, consistent and free of common errors. BioPAX
is the only community standard for biological pathway exchange to
and from databases, but it is related to other standards (discussed
below in the “What is not covered?” section).
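As a rough illustration of what an OWL-based exchange syntax and rule-based checking can look like in practice, the sketch below parses a tiny hand-written RDF/XML fragment with Python's standard library and applies one toy rule: every referenced resource must be defined in the same document. The fragment, the single rule and the use of the BioPAX Level 3 namespace URI are assumptions made for illustration; this is not the BioPAX validator and it does not reproduce its rule set.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace for BioPAX Level 3 terms (illustrative).
BP = "http://www.biopax.org/release/biopax-level3.owl#"

# Hand-written toy fragment: reaction1 refers to AKT1.2, which is never defined.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Protein rdf:ID="AKT1.1"/>
  <bp:BiochemicalReaction rdf:ID="reaction1">
    <bp:left rdf:resource="#AKT1.2"/>
    <bp:right rdf:resource="#AKT1.1"/>
  </bp:BiochemicalReaction>
</rdf:RDF>"""

root = ET.fromstring(DOC)

# Class names of the top-level individuals, with the namespace stripped.
classes = [child.tag.split("}")[1] for child in root]

# Toy rule: every rdf:resource must point at an rdf:ID defined in the document.
id_key = f"{{{RDF}}}ID"
ref_key = f"{{{RDF}}}resource"
defined = {el.get(id_key) for el in root.iter() if el.get(id_key)}
referenced = {el.get(ref_key).lstrip("#") for el in root.iter() if el.get(ref_key)}
dangling = sorted(referenced - defined)

print("classes:", classes)
print("dangling references:", dangling)
```

A real document would carry many more classes and properties, and a real validator would apply many more rules; the point here is only that a formal syntax makes such checks mechanical.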
Example of a pathway in BioPAX
Pathway models are generally described with text and with network
diagrams. Here we use the AKT signaling pathway (refs. 26,27) as an example
to show how a typical pathway diagram that can only be interpreted
by people (Fig. 3, top left) would be represented using BioPAX (Fig. 3,
right). The AKT pathway is a cell surface receptor–activated signaling
cascade that transduces external signals to intracellular events through
a series of steps including protein-protein interactions and protein
kinase–mediated phosphorylation. The pathway eventually activates
transcription factors, which turn on genes to promote cell survival.
By representing the pathway using the BioPAX language (Fig. 3 and
Supplementary Tables 1 and 2), it can be analyzed by computational
approaches, such as pathway analysis of gene expression data.
Representing a pathway using the BioPAX language sometimes
necessitates being more explicit to avoid capturing inconsistent data.
For instance, the typical notion of an ‘active protein’ is dependent
on context, as the same molecule could be active in one cellular
context, such as a cellular compartment with a set of potentially
interacting molecules, and inactive in another context. Thus, capturing
the specific mechanism of activation, such as phosphorylation
modification, is usually required, and the presence of downstream
events that include the modified form signifies that the molecule is
active. Interactions where the mechanism of action is unknown can
also be specified.
What does BioPAX include?
BioPAX covers all major concepts familiar to biologists studying pathways,
including metabolic and signaling pathways, gene regulatory
networks and genetic and molecular interactions (Supplementary
Table 3). The BioPAX language is distributed as an ontology definition
(Fig. 4) with associated documentation, a validator for checking
a BioPAX document for errors and other software tools (Table 1).
[Figure 1 diagram. Labels: Software, Database, Scientist, Efficient Communication, BioPAX.]
Figure 1 BioPAX is a shared language for biological pathways. BioPAX
reduces the effort required to efficiently communicate between pathway
users, databases and software tools. Without a shared language, each
system must speak the language of all other systems in the worst case
(black lines). With a shared language, each system only needs to speak
that language (central red box).
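The saving described in Figure 1 can be made concrete with a little arithmetic. The sketch below assumes, as a simplification not stated in the paper, that without a shared language every ordered pair of systems needs its own converter, and that with one each system needs a single importer and a single exporter. The function names are illustrative.

```python
def pairwise_translators(n_systems: int) -> int:
    """Converters needed when every system translates directly to every other one."""
    return n_systems * (n_systems - 1)


def shared_language_translators(n_systems: int) -> int:
    """Converters needed when each system only reads and writes one shared format."""
    return 2 * n_systems


# 325 is the number of online pathway resources reported for 2010 in the text.
for n in (3, 10, 325):
    print(n, pairwise_translators(n), shared_language_translators(n))
```

Under these assumptions the direct approach grows quadratically with the number of systems, whereas the shared-language approach grows linearly.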
Pathway abstractions frequently used in several pathway databases
and software programs are supported as follows:
• Metabolic pathways are described using the ‘enzyme, substrate,
product’ abstraction (ref. 28), where substrates and products of a
biochemical reaction are often small molecules. An enzyme, often a protein,
catalyzes the reaction, and inhibitors and activators can modulate the
catalysis event. Metabolic pathways use BioPAX classes: PhysicalEntity,
Conversion, Catalysis, Modulation, Pathway.
• Signaling pathways involve molecules and complexes participating
in biochemical reactions, binding, transportation and catalysis events
(Fig. 3) (refs. 5,9,29–31). These pathways may also include descriptions of
molecular states (such as cellular location, covalent and noncovalent
modifications, as well as fragments of sequence cleaved from a precursor) and
generic molecules (such as the family of homologous Wnt proteins).
Signaling pathways use BioPAX classes: PhysicalEntity, Conversion,
Control, Catalysis, Modulation, MolecularInteraction, Pathway.
• Gene regulatory networks involve transcription and translation
events and their control (refs. 12,14). Transcription, translation and other
template-directed reactions involving DNA or RNA are captured in a
‘template reaction’ in BioPAX, which maps a template to its encoded
products (e.g., DNA to mRNA). Multiple sequence regions on a
single strand of the template, such as promoters, terminators, open
reading frames, operons and various reaction machinery binding
sites, are active in a template reaction. Transcription factors
(generally proteins and complexes), microRNAs and other molecules
participate in a ‘template reaction regulation’ event. Gene regulatory
networks use BioPAX classes: PhysicalEntity, TemplateReaction,
TemplateReactionRegulation.
• Molecular interactions, notably protein-protein (refs. 32–36) and
protein-DNA interactions (ref. 37), involve two or more ‘physical
entities’. BioPAX follows the standard representation scheme of the
Proteomics Standards Initiative Molecular Interaction (PSI-MI)
format (ref. 38). Molecular interactions use BioPAX classes: PhysicalEntity,
MolecularInteraction.
• Genetic interactions occur between two genes when the phenotypic
consequence of perturbing both genes is different than expected
given the phenotypes of each single gene perturbation (ref. 39). BioPAX
represents this as a pair of genes that participate in a genetic
interaction measured using an observed ‘phenotype’. Genetic interactions
use BioPAX classes: Gene, GeneticInteraction.
Metabolic-, signaling- and gene regulatory–pathway abstractions
are process oriented. They imply a temporal order and can be thought
of as extensions of the standard chemical reaction pathway notation
to accommodate biological information. Molecular and genetic
interactions, however, imply a static network of connections among system
components, instead of the temporally ordered process of reactions
that defines a metabolic or signaling pathway. BioPAX supports
combining these different types of data into a single model that is useful
to gain a more complete view of a cellular process.
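The correspondence between the five abstractions above and the BioPAX classes they use can be written down as a small lookup table. The sketch below only restates the class lists given in the text; the dictionary layout, the key strings and the helper function are illustrative choices, not part of BioPAX.

```python
# Class lists copied from the bullet list above; keys are informal labels.
ABSTRACTION_CLASSES = {
    "metabolic pathway": [
        "PhysicalEntity", "Conversion", "Catalysis", "Modulation", "Pathway"],
    "signaling pathway": [
        "PhysicalEntity", "Conversion", "Control", "Catalysis", "Modulation",
        "MolecularInteraction", "Pathway"],
    "gene regulatory network": [
        "PhysicalEntity", "TemplateReaction", "TemplateReactionRegulation"],
    "molecular interaction": ["PhysicalEntity", "MolecularInteraction"],
    "genetic interaction": ["Gene", "GeneticInteraction"],
}

# The text describes the first three abstractions as process oriented
# and the last two as static networks of connections.
PROCESS_ORIENTED = {
    "metabolic pathway", "signaling pathway", "gene regulatory network"}


def abstractions_using(biopax_class: str) -> list:
    """Return the abstractions whose class list mentions the given class."""
    return sorted(name for name, classes in ABSTRACTION_CLASSES.items()
                  if biopax_class in classes)


print(abstractions_using("MolecularInteraction"))
print(abstractions_using("Gene"))
```

A table like this makes it easy to see which classes are shared across abstractions (PhysicalEntity appears in four of the five) when combining different data types into one model.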
[Figure 2 diagram. Panels: Traditional and Computable pathway information
processing. Labels: data observations, prior models, scientist, publication
about a biological process, BioPAX ontology, formalize, BioPAX record,
publish, data, software, use, scientists.]
Figure 2 BioPAX enables computational data gathering, publication
and use of information about biological processes. Traditional pathway
information processing: observations, considered in light of prior models,
are published as text and figures. Computable pathway information processing:
the scientist's description is represented using a formal, computable
framework (ontology) and published in a format readable by computer
software for analysis by scientists.
Box 1 What is an ontology?
An ontology is a formal system for representing knowledge (ref. 64). Such representation is required for computer software to make use of
information. Example ontologies include organism taxonomies (ref. 65) and the Gene Ontology (ref. 40). A formal representation allows consistent
communication of knowledge among individuals or computer systems and helps manage complexity in information processing as knowledge
is broken down into clear concepts that can be considered independently. Ontologies also enable integration of knowledge between
independent resources linked on the World Wide Web. Such linked, structured data form the basis of the semantic web, an extension of
the web that promises improved information management and search capability (ref. 61). Representing and sharing knowledge using ontologies
is simplified by availability of the standard web ontology language (OWL; http://www.w3.org/TR/owl-features/). Tools to edit OWL, such
as Protégé (ref. 63), have been developed by the semantic web community and adopted in the life sciences. Implementing BioPAX using OWL
enables both the ontology and the individuals and values to be stored in the same XML-based format, which makes data transmission
easier. Using OWL also enables BioPAX users to take advantage of existing software tools for editing, transmitting, querying, reasoning
about and visualizing OWL data.
An ontology is composed of classes, properties (representing relations) and restrictions and is used to define individuals (instances
of classes, also known as objects) and values for their properties. Classes (also known as concepts or types) are often arranged into a
hierarchy (or taxonomy) where child classes are more specific than, and inherit the properties of, parent classes. For example, in
BioPAX, the BiochemicalReaction class is a subclass of the Conversion class. Classes may have properties (also known as fields,
attributes or slots), which express possible relations to other classes (that is, they may have values of specific types). For example,
a SmallMolecule is related to the ChemicalStructure class by the property structure. Restrictions (also known as constraints) define
allowable values and connections within an ontology. For example, molecularWeight must be a positive number. Individuals are
instances of classes where values occupy the properties of those instances. BioPAX defines the classes, properties and restrictions
required to represent biological pathways and leaves creation of the individuals to users (data providers and consumers).
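The vocabulary in Box 1 maps loosely onto ordinary object-oriented code, which may help readers who are new to ontologies. The sketch below mirrors the three examples given in the box (BiochemicalReaction as a subclass of Conversion, the structure property linking SmallMolecule to ChemicalStructure, and the positive molecularWeight restriction). Attaching molecularWeight to SmallMolecule, the field names and the glucose individual are assumptions for illustration, and Python classes are only an analogy for OWL, not an equivalent.

```python
from dataclasses import dataclass
from typing import Optional


class Conversion:
    """Parent class: child classes inherit its left and right properties."""

    def __init__(self, left, right):
        self.left = list(left)
        self.right = list(right)


class BiochemicalReaction(Conversion):
    """Child class: more specific than Conversion, inherits its properties."""


@dataclass
class ChemicalStructure:
    data: str  # hypothetical field holding a structure string


@dataclass
class SmallMolecule:
    name: str
    structure: Optional[ChemicalStructure] = None  # property: relation to a class
    molecular_weight: Optional[float] = None       # property with a restriction

    def __post_init__(self):
        # Restriction from Box 1: molecularWeight must be a positive number.
        if self.molecular_weight is not None and self.molecular_weight <= 0:
            raise ValueError("molecularWeight must be a positive number")


# Individuals: instances of classes with values filled in by a data provider.
glucose = SmallMolecule("glucose", ChemicalStructure("C6H12O6"), 180.16)
reaction = BiochemicalReaction(left=[glucose], right=[])

print(isinstance(reaction, Conversion), reaction.left[0].name)
```

In this analogy the class definitions play the role of the ontology that BioPAX supplies, and the objects created at the bottom play the role of the individuals that data providers and consumers create.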
BioPAX provides many additional constructs, not shown in Figure 4,
that are used to store extra details, such as database cross-references,
chemical structure, experimental forms of molecules, sequence feature
locations and links to controlled vocabulary terms in other ontologies
(Supplementary Fig. 1). BioPAX reuses a number of standard controlled
vocabularies defined by other groups. For example, Gene Ontology (ref. 40)
is used to describe cellular location, PSI-MI vocabularies (ref. 38) are used to
define evidence codes, experimental forms, interaction types, relationship
types and sequence modifications, and Sequence Ontology (ref. 41) is used
to define types of sequence regions, such as a promoter region on DNA
involved in transcription of a gene. Other useful controlled vocabularies
can be referenced, such as the molecule role ontology (ref. 42).
BioPAX defines additional semantics that are currently only captured
in documentation. For instance, physical entities represent
pools of molecules and not individual molecules, corresponding to
typical semantics used when describing pathways in textbooks or
databases. A molecular pool is a set of molecules in a bounded area
of the cell, thus it has a concentration. Pools can be heterogeneous
and can overlap, as in the case of a protein existing in multiple
phosphorylation states.
BioPAX also defines a range of constructs that are represented as
ontology classes. Some of these represent biological entities, such as
proteins, and are organized into classes that conceptualize the pathway
knowledge domain. Others are used to represent annotations
and properties of the database representation of biological entities.
For instance, BioPAX provides ‘xref’ classes to represent different
kinds of references to databases that can be useful for data integration.
These are represented as subclasses of UtilityClass for convenience.
A future version of BioPAX would ideally capture these semantics
and structure these concepts more formally.
Uses of pathway data encoded in BioPAX
Once pathway data are translated into a standard computable language,
such as BioPAX, it is easier for software to access them and thereby
support browsing, retrieval, visualization and analysis (Fig. 5). This
enables efficient reuse of data in different ways, avoiding the
time-consuming and often frustrating task of translating them between
formats (Fig. 1). Additionally, it enables uses that would be impractical
without a standard format, such as those dependent on combining all
available pathway data.
BioPAX can be used to help aggregate large pathway data sets by
reducing the required collection and translation effort, for instance
using software such as cPath (ref. 43). Typical biological queries, such as
‘What reactions involve my protein of interest?’ generate more complete
answers when querying these larger pathway data sets. Another
frequent use is to find pathways that are active in a particular
biological context, such as a cell state determined by a genome-scale
molecular profile measurement. For instance, pathways with multiple
differentially expressed genes may be transcriptionally active
in one biological condition and not in another. Functional genomics
and pathway data can be imported into software and combined
for visualization and analysis to find interesting network regions.
A typical workflow involves overlaying molecular profiling data, such
as mRNA transcript profiles, on a network of interacting proteins
to identify transcriptionally active network regions, which may
represent active pathways (ref. 44). A number of recent papers have used
this pathway analysis workflow to highlight genes and pathways
that are active in specific model organisms or diseased tissues, such
as breast cancer, using gene and protein expression, copy number
variants and single-nucleotide polymorphisms (refs. 19,44–49). BioPAX has
also been used in a number of these studies to collect and integrate
large amounts of pathway information from multiple databases for
analysis. For instance, protein expression data were combined with
pathway information to highlight the importance of apoptosis in a
mouse model of heart disease (ref. 50). Multiple groups have found that
tumor-associated mutations are significantly related by pathway
information (refs. 47,48). And recently, in a study of rare copy number
variants in 996 individuals with autism spectrum disorder, a core set of
neuronal development–related pathways was found to link dozens
of rare mutations to autism that were not significantly linked to the
disorder on their own by traditional single-gene association
statistics (ref. 49). These studies highlight the importance of pathway
information in explaining the functional consequence of mutations in human
disease. BioPAX pathway data can also be converted into simulation
models, for instance using differential equations (ref. 51) or rule-based
modeling languages (ref. 52), to predict how a biological system may
function after a gene is knocked out.

Table 1 What is included in BioPAX
Content | Description
Ontology specification | Web Ontology Language (OWL) XML file, developed using free Protégé ontology editor software (ref. 63).
Language documentation | Explanation of BioPAX entities, example documentation, best practice recommendations, use cases and instructions for carrying out frequently used technical tasks.
Example files | Example files for biochemical pathway, protein and genetic interaction, protein phosphorylation, insulin maturation, gene regulation and generic molecules in OWL XML.
Graphical representation | Recommendations for graphical representation using Systems Biology Graphical Notation (SBGN) as a guide.
Paxtools software | Java programming library supporting import/export, conversion and validation. Can be used to add BioPAX support to software.
List of data sources and supporting software | Databases making data available in BioPAX format, software systems for storing, visualizing and analyzing BioPAX pathways.

[Figure 3 diagrams. Labels: AKT and AKT1 with phosphorylation (P) sites Thr308
and Ser473, hsp90/HSP90, PDK1, PDK2 and PP2A. BioPAX records shown at right:]
rAKT1 is a ProteinReference
  has standard-name “AKT1”
  has name “PKB”
  has xref Uniprot-P31749
AKT1.1 is a Protein
  has proteinReference rAKT1
  has notFeature p@308
  has notFeature p@473
AKT1.2 is a Protein
  has proteinReference rAKT1
  has feature p@308
  has notFeature p@473
reaction1 is a BiochemicalReaction
  has left AKT1.2
  has right AKT1.1
  is left-to-right
catalysis1 is a Catalysis
  has controller PP2A.1
  has controlled reaction1
  has direction irr-left-to-right
assembly1 is a ComplexAssembly
  has left HSP90.1
  has left AKT1.3
  has right complex1
  is reversible
complex1 is a Complex
  has component AKT1.4
  has component HSP90.2
HSP90.2 is a Protein
  has proteinReference rHSP90
  is boundTo AKT1.4
AKT1.4 is a Protein
  has proteinReference rAKT1
  has feature p@308
  has feature p@473
  is boundTo HSP90.2
p@308 is a ModificationFeature
  has featureLocation AKT1-308
  has modificationType phosphorylation
Figure 3 The AKT pathway as represented by a traditional method (top left;
from http://www.biocarta.com/), a formalized SBGN diagram (left; from
http://www.sbgn.org/, ref. 62) and using the BioPAX language (right). An
important advantage of the BioPAX representation is that it can be interpreted
by computer software and used in multiple ways, including automatic diagram
creation, information retrieval and analysis. Online documentation at
http://www.biopax.org/ contains more details about how to represent diverse
types of biological pathways. Actual samples of pathway data in BioPAX
OWL XML format are available in Supplementary Tables 1 and 2.
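A few of the records shown in Figure 3 (rAKT1, AKT1.1, AKT1.2, reaction1 and catalysis1) can be mirrored in a handful of Python data classes to show how a query such as ‘What reactions involve my protein of interest?’ becomes mechanical once the data are structured. The class layout, field names and helper functions below are illustrative assumptions; they are not the Paxtools API or the BioPAX object model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinReference:
    """Identity shared by all states of a protein (rAKT1 in Figure 3)."""
    ident: str
    standard_name: str


@dataclass(frozen=True)
class Protein:
    """One state of a protein, described by features it has and lacks."""
    ident: str
    reference: ProteinReference
    features: frozenset = frozenset()
    not_features: frozenset = frozenset()


@dataclass(frozen=True)
class BiochemicalReaction:
    ident: str
    left: tuple
    right: tuple


@dataclass(frozen=True)
class Catalysis:
    ident: str
    controller: str
    controlled: BiochemicalReaction


r_akt1 = ProteinReference("rAKT1", "AKT1")
akt1_1 = Protein("AKT1.1", r_akt1, not_features=frozenset({"p@308", "p@473"}))
akt1_2 = Protein("AKT1.2", r_akt1, features=frozenset({"p@308"}),
                 not_features=frozenset({"p@473"}))
reaction1 = BiochemicalReaction("reaction1", left=(akt1_2,), right=(akt1_1,))
catalysis1 = Catalysis("catalysis1", controller="PP2A.1", controlled=reaction1)


def reactions_involving(reference, reactions):
    """IDs of reactions with at least one participant sharing the reference."""
    return [r.ident for r in reactions
            if any(p.reference == reference for p in r.left + r.right)]


def features_removed(reaction):
    """Features present on the left side but absent on the right side."""
    before = set().union(*(p.features for p in reaction.left))
    after = set().union(*(p.features for p in reaction.right))
    return before - after


print(reactions_involving(r_akt1, [reaction1]))
print(features_removed(reaction1), "removed; controller:", catalysis1.controller)
```

Read this way, reaction1 removes the phosphorylation feature at position 308 from AKT1, and catalysis1 names PP2A.1 as its controller, which matches the records in the figure.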
BioPAX is useful for exchanging information among and between
data providers and analysis software. Pathway database groups
can share the effort of pathway curation by making their pathways
available in BioPAX format and exchanging them with others. For
example, pathways in BioPAX format from the Reactome (ref. 8) database
are imported by the US National Cancer Institute/Nature Pathway
Information Database (ref. 9). Data providers can use existing
BioPAX-enabled software to add useful new features to their systems. For
example, the Cytoscape network visualization software (ref. 20) can read and
display BioPAX-formatted data as a network. The Reactome group
used this feature to create a pathway visualization tool for their
website. Because Reactome data were available in BioPAX format, and
Cytoscape could already read BioPAX format, this new feature was
easy to implement.
The Paxtools Java programming library for BioPAX has been
developed to help software developers readily support the import,
export and validation of BioPAX-formatted data for various uses in
their software (http://www.biopax.org/paxtools/). Using Paxtools
and other tools, a range of BioPAX-compatible software has been
developed, including browsers, visualizers, querying engines,
editors and converters (Supplementary Table 4). For instance,
the ChiBE and VisANT pathway-visualization tools read BioPAX
format (ref. 22), and the WikiPathways website (ref. 53), a community wiki
for pathways, is working on using BioPAX to help import pathways
from several sources, including manually edited pathways
from biologists. The Pathway Tools software (ref. 21) and CellDesigner
pathway editor (ref. 54) are developing support for BioPAX-based data
exchange. In addition, tools for the storage and querying of
Resource Description Framework (http://www.w3.org/RDF/) data
sets, generated within the Semantic Web community, can be used
to effectively process BioPAX data.
What is not covered?
The BioPAX language uses a discrete representation of biological pathways.
Dynamic and quantitative aspects of biological processes, including temporal
aspects of feedback loops and calcium waves, are not supported.
However, BioPAX addresses this need by coordinating work (as described
below) with the SBML and CellML mathematical modeling language
communities (refs. 55,56) and a growing software tool set supporting
biological process simulation (ref. 57). Detailed information about
experimental evidence supporting elements of a pathway map is useful for
evaluating the quality of pathway data. This information is only included in
BioPAX for molecular interactions, because that was already defined by the
PSI-MI language (ref. 58) and it was reused. The BioPAX work group makes use of
PSI-MI–controlled vocabularies and other concepts and works with
the PSI-MI work group to build these vocabularies in areas of shared
interest, such as genetic interactions. Although BioPAX does not aim to
standardize how pathways are visualized, work is coordinated with the
Entity
Pathway
Interaction
TemplateReaction
Control
Catalysis
TemplateReactionRegulation
Modulation
Transport
DegradationTransportWithBiochemicalReaction
Biochemical
Reaction
Complex
Assembly
Conversion
MolecularInteraction GeneticInteraction
Gene
Protein
DNA
RNA
PhysicalEntity
Complex
Protein properties
availability (String*)
name (String*)
-comment (String*)
xref (Xref*)
data Source (Provenance*)
evidence (Evidence*)
feature (Entity-Feature*)
not Feature (Entity-Feature*)
member Physical Entity (Protein*)
cellular Location
(Cellular-Location-Vocabulary*)
entity Reference (Protein-Reference)
Small
Molecule
Figure 4 High-level view of the BioPAX ontology.
Classes are shown as boxes; arrows represent
inheritance relationships. The three main
types of classes in BioPAX are Pathway (red),
Interaction (green) and PhysicalEntity and
Gene (blue). For brevity, the properties of the
Protein class only are shown as an example at
the top right. Asterisks indicate that multiple
values for the property are allowed. Refer to
BioPAX documentation at http://www.biopax.org/
for full details of all classes and properties.
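
The inheritance structure in Figure 4 can be mirrored, very loosely, as a class hierarchy. The Python sketch below only illustrates how subclass relationships and multi-valued properties fit together. It is not the OWL definition or any library's API, it covers just a handful of the classes, and the parent assigned to each class and the snake_case field names are assumptions based on a reading of the figure and the Protein properties it lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    # Starred properties in Figure 4 allow multiple values; modeled as lists.
    name: List[str] = field(default_factory=list)
    comment: List[str] = field(default_factory=list)


@dataclass
class Pathway(Entity):
    pass


@dataclass
class Interaction(Entity):
    pass


@dataclass
class Conversion(Interaction):
    pass


@dataclass
class BiochemicalReaction(Conversion):
    pass


@dataclass
class Transport(Conversion):
    pass


@dataclass
class TransportWithBiochemicalReaction(Transport, BiochemicalReaction):
    # Assumed here to inherit from both parents.
    pass


@dataclass
class PhysicalEntity(Entity):
    cellular_location: List[str] = field(default_factory=list)


@dataclass
class Protein(PhysicalEntity):
    # Unstarred in the figure, so modeled as a single optional value.
    entity_reference: Optional[str] = None


if __name__ == "__main__":
    # "ProteinReference_AKT1" is a made-up identifier.
    akt1 = Protein(name=["AKT1"], entity_reference="ProteinReference_AKT1")
    print(isinstance(akt1, PhysicalEntity), isinstance(akt1, Interaction))
```

A Protein instance is a PhysicalEntity and an Entity but not an Interaction, which is the kind of distinction the three color-coded groups in the figure express.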
[Figure 5 diagram. Panels: Data exchange between database groups (Database 1, Export, Import, Database 2); Pathway visualization from database (Database 1, Export, Visualization software; example drawing with AKT1, sites 308 and 473 marked P, PDK1, PDK2, PP2A and HSP90); Pathway analysis of genomics data (Genomics data, Pathway data, Analysis software, Find active pathways).]
Figure 5 Example uses of pathway information in BioPAX format.
Red-colored boxes or lines indicate the use of BioPAX.
© 2012 Nature America, Inc. All rights reserved.

Frequently Asked Questions (11)
Q1. What are the different types of signals that are captured in BioPAX?

translation and other template-directed reactions involving DNA or RNA are captured in a ‘template reaction’ in BioPAX, which maps a template to its encoded products (e.g., DNA to mRNA). 

BioPAX can be used to help aggregate large pathway data sets by reducing the required collection and translation effort, for instance using software such as cPath43. 

For instance, future BioPAX levels should capture cell-cell interactions, be better at describing pathways where sub-processes are not known or need not be represented, more closely integrate third-party controlled vocabularies and ontologies to ease their use and better encode semantics for easier data validation and reasoning. 

Easy-to-use tools for tasks like pathway editing must also be developed so that biologists can share their data in BioPAX format without substantial resource investment. 

Using OWL also enables BioPAX users to take advantage of existing software tools for editing, transmitting, querying, reasoning about and visualizing OWL data. 

The Paxtools Java programming library for BioPAX has been developed to help software developers readily support the import, export and validation of BioPAX-formatted data for various uses in their software (http://www.biopax.org/paxtools/). 

Once pathway data are translated into a standard computable language, such as BioPAX, it is easier for software to access them and thereby support browsing, retrieval, visualization and analysis (Fig. 5).

It provides a validator that uses a set of rules to verify whether a BioPAX document is complete, consistent and free of common errors.

Representing a pathway using the BioPAX language sometimes necessitates being more explicit to avoid capturing inconsistent data. 

All authors helped develop the BioPAX language, ontology, documentation and examples by participating in workshops or on mailing lists and/or provided data in BioPAX format and/or wrote software that supports BioPAX.