What are some of the goals of the BioPAX community?

For instance, future BioPAX levels should capture cell-cell interactions, be better at describing pathways where sub-processes are not known or need not be represented, more closely integrate third-party controlled vocabularies and ontologies to ease their use and better encode semantics for easier data validation and reasoning.

How can biologists use the BioPAX language?

Easy-to-use tools for tasks like pathway editing must also be developed so that biologists can share their data in BioPAX format without substantial resource investment.

What did the authors contribute to the development of BioPAX?

All authors helped develop the BioPAX language, ontology, documentation and examples by participating in workshops or on mailing lists and/or provided data in BioPAX format and/or wrote software that supports BioPAX.

(Open Access) The BioPAX community standard for pathway data sharing (2010) | Emek Demir

nature biotechnology VOLUME 28 NUMBER 9 SEPTEMBER 2010 9 3 5

P E R S P E C T I V E

Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.

Increasingly powerful technologies, including genome-wide molecular measurements, have accelerated progress toward a complete map of molecular interaction networks in cells and between cells of many organisms. The growing scale of these maps requires their representation in a form suitable for computer processing, storage and dissemination by means of software systems. The BioPAX project aims to facilitate knowledge representation, systematic collection, integration and wide distribution of pathway data from heterogeneous information sources. This will enable these data to be incorporated into distributed biological information systems that support visualization and analysis.

BioPAX supports efforts working toward a complete representation of basic cellular processes. Biology has come a long way since the Boehringer-Mannheim wall chart of metabolic pathways and the Nicholson Metabolic Map. Since then, several groups have developed methods and databases for organizing pathway information (refs. 3–16), but only recently have groups collaborated as part of the BioPAX project to develop a generally accepted standard way of representing these pathway maps. Complete molecular process maps must include all interactions, reactions, dependencies, influence and information flow between pools of molecules in cells and between cells. For ease of use and simplicity of presentation, such network maps are often organized in terms of subnetworks or pathways. Pathways are models delineated within the entire cellular biochemical network that help us describe and understand specific biological processes. Thus, a useful definition of a pathway is a set of interactions between physical or genetic cell components, often describing a cause-and-effect or time-dependent process, that explains observable biological phenomena. How do we represent these pathways in a generally accepted and computable form?

The BioPAX community standard for pathway data sharing

Emek Demir, Michael P Cary, Suzanne Paley, Ken Fukuda, Christian Lemer, Imre Vastrik, Guanming Wu, Peter D’Eustachio, Carl Schaefer, Joanne Luciano, Frank Schacherer, Irma Martinez-Flores, Zhenjun Hu, Veronica Jimenez-Jacinto, Geeta Joshi-Tope, Kumaran Kandasamy, Alejandra C Lopez-Fuentes, Huaiyu Mi, Elgar Pichler, Igor Rodchenkov, Andrea Splendiani, Sasha Tkachev, Jeremy Zucker, Gopal Gopinath, Harsha Rajasimha, Ranjani Ramakrishnan, Imran Shah, Mustafa Syed, Nadia Anwar, Özgün Babur, Michael Blinov, Erik Brauner, Dan Corwin, Sylva Donaldson, Frank Gibbons, Robert Goldberg, Peter Hornbeck, Augustin Luna, Peter Murray-Rust, Eric Neumann, Oliver Ruebenacker, Matthias Samwald, Martijn van Iersel, Sarala Wimalaratne, Keith Allen, Burk Braun, Michelle Whirl-Carrillo, Kei-Hoi Cheung, Kam Dahlquist, Andrew Finney, Marc Gillespie, Elizabeth Glass, Li Gong, Robin Haw, Michael Honig, Olivier Hubaut, David Kane, Shiva Krupa, Martina Kutmon, Julie Leonard, Debbie Marks, David Merberg, Victoria Petri, Alex Pico, Dean Ravenscroft, Liya Ren, Nigam Shah, Margot Sunshine, Rebecca Tang, Ryan Whaley, Stan Letovsky, Kenneth H Buetow, Andrey Rzhetsky, Vincent Schachter, Bruno S Sobral, Ugur Dogrusoz, Shannon McWeeney, Mirit Aladjem, Ewan Birney, Julio Collado-Vides, Susumu Goto, Michael Hucka, Nicolas Le Novère, Natalia Maltsev, Akhilesh Pandey, Paul Thomas, Edgar Wingender, Peter D Karp, Chris Sander & Gary D Bader

A full list of author affiliations appears at the end of this paper.

Published online 9 September 2010; corrected after print 7 December 2010 and 10 April 2012; doi:10.1038/nbt.1666

Challenges posed by the many fragmented pathway databases

The total volume of pathway data mapped by biologists and stored in databases has entered a rapid growth phase, with the number of online resources for pathways and molecular interactions increasing 70%, from 190 in 2006 to 325 in 2010 (ref. 17). In addition, molecular profiling methods, such as RNA profiling using microarrays, or protein quantification using mass spectrometry, provide large amounts of information about the dynamics of cellular pathway components and increase the power of pathway analysis techniques (refs. 18,19). However, this growth poses a formidable challenge for pathway data collection and curation as well as for database, visualization and analysis software, as these data are often fragmented.

The principal motivation for building pathway databases and software tools is to facilitate qualitative and quantitative analysis and modeling of large biological systems using a computational approach. Over 300 pathway or molecular interaction–related data resources and many visualization and analysis software tools (refs. 3,20–22) have been developed. Unfortunately, most of these databases and tools were originally developed to use their own pathway representation language, resulting in a heterogeneous set of resources that are extremely difficult to combine and use. This has occurred because many different research groups, each with their own system for representing biomolecules and their interactions in a pathway, work independently to collect pathway data recorded in the literature (estimated from text-mining projects to be present in at least 10% of the >20 million articles currently indexed by PubMed). As a result, researchers waste time collecting information from different sources and converting it from one form of representation to another. Fragmented pathway data result in a substantial opportunity cost. For instance, visualization and analysis tools developed for one pathway database cannot be reused for others, making software development efforts more expensive. Therefore, it is imperative to develop computational methods to cope with both the magnitude and fragmented nature of this expanding, valuable pathway information. Whereas independent research efforts are needed to find the best ways to represent pathways, community coordination and agreement on standard semantics are necessary to be able to efficiently integrate pathway data from multiple sources on a large scale.

BioPAX requirements and implementation

A common, inclusive and computable pathway data language is necessary to share knowledge about pathway maps and to facilitate integration and use for hypothesis testing in biology. A shared language facilitates communication by reducing the number of translations required to exchange data between multiple sources (Fig. 1). Developing such a representation is challenging owing to the variety of pathways in biology and the diverse uses of pathway information. Pathway representations frequently use abstractions for metabolic, signaling, gene regulation, protein interaction and genetic interaction, and these serve as a starting point toward a shared language. Also, several variants of this common language may be required to answer relevant research questions in distinct fields of biology, each covering unique levels of detail addressing different uses, but these should be rooted in common principles and must remain compatible.

BioPAX addresses these challenges. We developed BioPAX as a shared language to facilitate communication between diverse software systems and to establish standard knowledge representation of pathway information. BioPAX supports representation of metabolic and signaling pathways, molecular and genetic interactions and gene regulation. Relationships between genes, small molecules, complexes and their states (e.g., post-translational protein modifications, mRNA splice variants, cellular location) are described, including the results of events. Details about the BioPAX language are available in online documentation at http://www.biopax.org/. The BioPAX language provides terms and descriptions to represent many aspects of biological pathways and their annotation. It is implemented as an ontology, a formal system of describing knowledge (Box 1) that helps structure pathway data so that they are more easily processed by computer software (Fig. 2). It provides a standard syntax used for data exchange that is based on OWL (Web Ontology Language) (Box 1). Finally, it provides a validator that uses a set of rules to verify whether a BioPAX document is complete, consistent and free of common errors. BioPAX is the only community standard for biological pathway exchange to and from databases, but it is related to other standards (discussed below in the “What is not covered?” section).
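The idea of checking a document against a set of rules can be sketched in a few lines of Python. This is not the BioPAX validator: the record layout (plain dictionaries keyed by identifier), the property names checked and the two rules are assumptions made purely for illustration.

```python
# Hypothetical sketch of rule-based checking; NOT the actual BioPAX validator.
# Records are plain dicts keyed by identifier; this layout is an assumption.

def rule_has_type(record_id, record, all_records):
    """Completeness: every record must declare a class."""
    if not record.get("type"):
        return [f"{record_id}: missing type"]
    return []


def rule_references_resolve(record_id, record, all_records):
    """Consistency: every referenced identifier must exist in the document."""
    errors = []
    for prop in ("left", "right", "controller", "controlled"):
        for target in record.get(prop, []):
            if target not in all_records:
                errors.append(f"{record_id}: {prop} points to unknown '{target}'")
    return errors


RULES = [rule_has_type, rule_references_resolve]


def validate(all_records):
    """Apply every rule to every record and collect the error messages."""
    errors = []
    for record_id, record in all_records.items():
        for rule in RULES:
            errors.extend(rule(record_id, record, all_records))
    return errors


# Toy document: catalysis1 refers to a controller that was never defined.
document = {
    "AKT1.1": {"type": "Protein"},
    "AKT1.2": {"type": "Protein"},
    "reaction1": {"type": "BiochemicalReaction",
                  "left": ["AKT1.2"], "right": ["AKT1.1"]},
    "catalysis1": {"type": "Catalysis",
                   "controller": ["PP2A.1"], "controlled": ["reaction1"]},
}

for message in validate(document):
    print(message)
```

Keeping each rule as a separate function means new checks can be added to the list without touching the loop that applies them.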

Example of a pathway in BioPAX

Pathway models are generally described with text and with network diagrams. Here we use the AKT signaling pathway (refs. 26,27) as an example to show how a typical pathway diagram that can only be interpreted by people (Fig. 3, top left) would be represented using BioPAX (Fig. 3, right). The AKT pathway is a cell surface receptor–activated signaling cascade that transduces external signals to intracellular events through a series of steps including protein-protein interactions and protein kinase–mediated phosphorylation. The pathway eventually activates transcription factors, which turn on genes to promote cell survival. By representing the pathway using the BioPAX language (Fig. 3 and Supplementary Tables 1 and 2), it can be analyzed by computational approaches, such as pathway analysis of gene expression data.

Representing a pathway using the BioPAX language sometimes necessitates being more explicit to avoid capturing inconsistent data. For instance, the typical notion of an ‘active protein’ is dependent on context, as the same molecule could be active in one cellular context, such as a cellular compartment with a set of potentially interacting molecules, and inactive in another context. Thus, capturing the specific mechanism of activation, such as phosphorylation modification, is usually required, and the presence of downstream events that include the modified form signifies that the molecule is active. Interactions where the mechanism of action is unknown can also be specified.

What does BioPAX include?

BioPAX covers all major concepts familiar to biologists studying pathways, including metabolic and signaling pathways, gene regulatory networks and genetic and molecular interactions (Supplementary Table 3). The BioPAX language is distributed as an ontology definition (Fig. 4) with associated documentation, a validator for checking a BioPAX document for errors and other software tools (Table 1).

Figure 1 BioPAX is a shared language for biological pathways. BioPAX reduces the effort required to efficiently communicate between pathway users, databases and software tools. Without a shared language, each system must speak the language of all other systems in the worst case (black lines). With a shared language, each system only needs to speak that language (central red box).
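The saving described in Figure 1 can be made concrete with a short calculation. The sketch below assumes that translators are directional (one per ordered pair of systems in the worst case, and one importer plus one exporter per system when a shared language exists); that counting convention is an assumption for illustration, not a figure taken from the paper.

```python
def translators_without_shared_language(n_systems):
    """Worst case: every system needs a translator to each of the others."""
    return n_systems * (n_systems - 1)


def translators_with_shared_language(n_systems):
    """Each system needs one exporter to and one importer from the shared language."""
    return 2 * n_systems


for n in (3, 10, 300):
    print(n,
          translators_without_shared_language(n),
          translators_with_shared_language(n))
```

Under this convention the direct approach grows quadratically with the number of systems, whereas the shared-language approach grows linearly.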


Pathway abstractions frequently used in several pathway databases and software programs are supported as follows:

• Metabolic pathways are described using the ‘enzyme, substrate, product’ abstraction where substrates and products of a biochemical reaction are often small molecules. An enzyme, often a protein, catalyzes the reaction, and inhibitors and activators can modulate the catalysis event. Metabolic pathways use BioPAX classes: PhysicalEntity, Conversion, Catalysis, Modulation, Pathway.

• Signaling pathways involve molecules and complexes participating in biochemical reactions, binding, transportation and catalysis events (Fig. 3; refs. 5,9,29–31). These pathways may also include descriptions of molecular states (such as cellular location, covalent and noncovalent modifications, as well as fragments of sequence cleaved from a precursor) and generic molecules (such as the family of homologous Wnt proteins). Signaling pathways use BioPAX classes: PhysicalEntity, Conversion, Control, Catalysis, Modulation, MolecularInteraction, Pathway.

• Gene regulatory networks involve transcription and translation events and their control (refs. 12,14). Transcription, translation and other template-directed reactions involving DNA or RNA are captured in a ‘template reaction’ in BioPAX, which maps a template to its encoded products (e.g., DNA to mRNA). Multiple sequence regions on a single strand of the template, such as promoters, terminators, open reading frames, operons and various reaction machinery binding sites, are active in a template reaction. Transcription factors (generally proteins and complexes), microRNAs and other molecules participate in a ‘template reaction regulation’ event. Gene regulatory networks use BioPAX classes: PhysicalEntity, TemplateReaction, TemplateReactionRegulation.

• Molecular interactions, notably protein-protein (refs. 32–36) and protein-DNA interactions, involve two or more ‘physical entities’. BioPAX follows the standard representation scheme of the Proteomics Standards Initiative Molecular Interaction (PSI-MI) format. Molecular interactions use BioPAX classes: PhysicalEntity, MolecularInteraction.

• Genetic interactions occur between two genes when the phenotypic consequence of perturbing both genes is different than expected given the phenotypes of each single gene perturbation. BioPAX represents this as a pair of genes that participate in a ‘genetic interaction’ measured using an observed ‘phenotype’. Genetic interactions use BioPAX classes: Gene, GeneticInteraction.

Metabolic-, signaling- and gene regulatory–pathway abstractions are process oriented. They imply a temporal order and can be thought of as extensions of the standard chemical reaction pathway notation to accommodate biological information. Molecular and genetic interactions, however, imply a static network of connections among system components, instead of the temporally ordered process of reactions that defines a metabolic or signaling pathway. BioPAX supports combining these different types of data into a single model that is useful to gain a more complete view of a cellular process.
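The correspondence between the five pathway abstractions and the BioPAX classes they use can be written down as a simple lookup table. The class lists below are copied from the text above; the dictionary layout and the helper function name are hypothetical and serve only as a sketch.

```python
# Abstraction -> BioPAX classes, as listed in the text above.
ABSTRACTION_CLASSES = {
    "metabolic pathway": [
        "PhysicalEntity", "Conversion", "Catalysis", "Modulation", "Pathway"],
    "signaling pathway": [
        "PhysicalEntity", "Conversion", "Control", "Catalysis",
        "Modulation", "MolecularInteraction", "Pathway"],
    "gene regulatory network": [
        "PhysicalEntity", "TemplateReaction", "TemplateReactionRegulation"],
    "molecular interaction": ["PhysicalEntity", "MolecularInteraction"],
    "genetic interaction": ["Gene", "GeneticInteraction"],
}


def abstractions_using(class_name):
    """Return, in alphabetical order, the abstractions that use a given class."""
    return sorted(name for name, classes in ABSTRACTION_CLASSES.items()
                  if class_name in classes)


print(abstractions_using("MolecularInteraction"))
print(abstractions_using("PhysicalEntity"))
```

A lookup like this makes it easy to see which classes are shared between the process-oriented abstractions and the static interaction networks.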

Figure 2 BioPAX enables computational data gathering, publication and use of information about biological processes. Traditional pathway information processing: observations considering prior models published as text and figures. Computable pathway information processing: scientist’s description represented using formal, computable framework (ontology) published in a format readable by computer software for analysis by scientists.

Box 1 What is an ontology?

An ontology is a formal system for representing knowledge. Such representation is required for computer software to make use of information. Example ontologies include organism taxonomies and the Gene Ontology. A formal representation allows consistent communication of knowledge among individuals or computer systems and helps manage complexity in information processing as knowledge is broken down into clear concepts that can be considered independently. Ontologies also enable integration of knowledge between independent resources linked on the World Wide Web. Such linked, structured data form the basis of the semantic web, an extension of the web that promises improved information management and search capability. Representing and sharing knowledge using ontologies is simplified by availability of the standard web ontology language (OWL; http://www.w3.org/TR/owl-features/). Tools to edit OWL, such as Protégé, have been developed by the semantic web community and adopted in the life sciences. Implementing BioPAX using OWL enables both the ontology and the individuals and values to be stored in the same XML-based format, which makes data transmission easier. Using OWL also enables BioPAX users to take advantage of existing software tools for editing, transmitting, querying, reasoning about and visualizing OWL data.

An ontology is composed of classes, properties (representing relations) and restrictions and is used to define individuals (instances of classes, also known as objects) and values for their properties. Classes (also known as concepts or types) are often arranged into a hierarchy (or taxonomy) where child classes are more specific than, and inherit the properties of, parent classes. For example, in BioPAX, the BiochemicalReaction class is a subclass of the Conversion class. Classes may have properties (also known as fields, attributes or slots), which express possible relations to other classes (that is, they may have values of specific types). For example, a SmallMolecule is related to the ChemicalStructure class by the property structure. Restrictions (also known as constraints) define allowable values and connections within an ontology. For example, molecularWeight must be a positive number. Individuals are instances of classes where values occupy the properties of those instances. BioPAX defines the classes, properties and restrictions required to represent biological pathways and leaves creation of the individuals to users (data providers and consumers).
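The vocabulary of Box 1 (classes, properties, restrictions and individuals) maps loosely onto ordinary object-oriented code. The Python sketch below reuses the three examples from the box; it is only an analogy, since OWL semantics differ from those of a programming language, and attaching molecularWeight directly to SmallMolecule is a simplifying assumption rather than a statement about where the BioPAX ontology defines it.

```python
class Conversion:
    """Parent class with two properties, left and right."""

    def __init__(self, left, right):
        self.left = list(left)
        self.right = list(right)


class BiochemicalReaction(Conversion):
    """Child class: more specific than Conversion and inherits its properties."""


class ChemicalStructure:
    def __init__(self, structure_data):
        self.structure_data = structure_data


class SmallMolecule:
    def __init__(self, name, structure, molecular_weight):
        # Property with a typed value: structure must be a ChemicalStructure.
        if not isinstance(structure, ChemicalStructure):
            raise TypeError("structure must be a ChemicalStructure")
        # Restriction: molecularWeight must be a positive number.
        if molecular_weight <= 0:
            raise ValueError("molecularWeight must be a positive number")
        self.name = name
        self.structure = structure
        self.molecular_weight = molecular_weight


# Individuals are instances whose properties are filled with values.
glucose = SmallMolecule("glucose", ChemicalStructure("C6H12O6"), 180.16)
reaction = BiochemicalReaction(left=[glucose], right=[])
print(isinstance(reaction, Conversion))
```

As in the box, the class definitions play the role of the ontology, while `glucose` and `reaction` are individuals supplied by a user.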


BioPAX provides many additional constructs, not shown in Figure 4, that are used to store extra details, such as database cross-references, chemical structure, experimental forms of molecules, sequence feature locations and links to controlled vocabulary terms in other ontologies (Supplementary Fig. 1). BioPAX reuses a number of standard controlled vocabularies defined by other groups. For example, Gene Ontology is used to describe cellular location, PSI-MI vocabularies are used to define evidence codes, experimental forms, interaction types, relationship types and sequence modifications, and Sequence Ontology is used to define types of sequence regions, such as a promoter region on DNA involved in transcription of a gene. Other useful controlled vocabularies can be referenced, such as the molecule role ontology.

BioPAX defines additional semantics that are currently only captured in documentation. For instance, physical entities represent pools of molecules and not individual molecules, corresponding to typical semantics used when describing pathways in textbooks or databases. A molecular pool is a set of molecules in a bounded area of the cell, thus it has a concentration. Pools can be heterogeneous and can overlap, as in the case of a protein existing in multiple phosphorylation states.

BioPAX also defines a range of constructs that are represented as ontology classes. Some of these represent biological entities, such as proteins, and are organized into classes that conceptualize the pathway knowledge domain. Others are used to represent annotations and properties of the database representation of biological entities. For instance, BioPAX provides ‘xref’ classes to represent different kinds of references to databases that can be useful for data integration. These are represented as subclasses of UtilityClass for convenience. A future version of BioPAX would ideally capture these semantics and structure these concepts more formally.

Uses of pathway data encoded in BioPAX

Once pathway data are translated into a standard computable language, such as BioPAX, it is easier for software to access them and thereby support browsing, retrieval, visualization and analysis (Fig. 5). This enables efficient reuse of data in different ways, avoiding the time-consuming and often frustrating task of translating them between formats (Fig. 1). Additionally, it enables uses that would be impractical without a standard format, such as those dependent on combining all available pathway data.
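Once several sources share one record layout, combining them and asking which interactions involve a given molecule takes only a few lines. The sketch below uses made-up records: the source names, identifiers and participant lists are hypothetical and do not come from any real database export.

```python
from collections import defaultdict

# Hypothetical exports from two databases, already in one shared layout.
source_a = [
    {"id": "a:reaction1", "participants": {"AKT1", "PP2A"}},
    {"id": "a:assembly1", "participants": {"AKT1", "HSP90"}},
]
source_b = [
    {"id": "b:reaction7", "participants": {"AKT1", "PDK1"}},
    {"id": "b:reaction9", "participants": {"PDK2", "HSP90"}},
]


def build_index(*sources):
    """Map each molecule name to the set of interaction ids that mention it."""
    index = defaultdict(set)
    for source in sources:
        for interaction in source:
            for molecule in interaction["participants"]:
                index[molecule].add(interaction["id"])
    return index


index = build_index(source_a, source_b)
print(sorted(index["AKT1"]))
```

Querying the merged index returns interactions from both sources, which is the kind of more complete answer that a single source alone could not give.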

BioPAX can be used to help aggregate large pathway data sets by reducing the required collection and translation effort, for instance using software such as cPath. Typical biological queries, such as ‘What reactions involve my protein of interest?’ generate more complete answers when querying these larger pathway data sets. Another frequent use is to find pathways that are active in a particular biological context, such as a cell state determined by a genome-scale molecular profile measurement. For instance, pathways with multiple differentially expressed genes may be transcriptionally active in one biological condition and not in another. Functional genomics and pathway data can be imported into software and combined for visualization and analysis to find interesting network regions. A typical workflow involves overlaying molecular profiling data, such as mRNA transcript profiles, on a network of interacting proteins to identify transcriptionally active network regions, which may represent active pathways. A number of recent papers have used this pathway analysis workflow to highlight genes and pathways that are active in specific model organisms or diseased tissues, such as breast cancer, using gene and protein expression, copy number variants and single-nucleotide polymorphisms (refs. 19,44–49). BioPAX has also been used in a number of these studies to collect and integrate large amounts of pathway information from multiple databases for analysis. For instance, protein expression data were combined with pathway information to highlight the importance of apoptosis in a mouse model of heart disease. Multiple groups have found that
tumor-associated mutations are significantly related by pathway information (refs. 47,48). And recently, in a study of rare copy number variants in 996 individuals with autism spectrum disorder, a core set of neuronal development–related pathways was found to link dozens of rare mutations to autism that were not significantly linked to the disorder on their own by traditional single-gene association statistics. These studies highlight the importance of pathway information in explaining the functional consequence of mutations in human disease. BioPAX pathway data can also be converted into simulation models, for instance using differential equations or rule-based modeling languages, to predict how a biological system may function after a gene is knocked out.

Table 1 What is included in BioPAX

Content: Description
Ontology specification: Web Ontology Language (OWL) XML file, developed using free Protégé ontology editor software.
Language documentation: Explanation of BioPAX entities, example documentation, best practice recommendations, use cases and instructions for carrying out frequently used technical tasks.
Example files: Example files for biochemical pathway, protein and genetic interaction, protein phosphorylation, insulin maturation, gene regulation and generic molecules in OWL XML.
Graphical representation: Recommendations for graphical representation using Systems Biology Graphical Notation (SBGN) as a guide.
Paxtools software: Java programming library supporting import/export, conversion and validation. Can be used to add BioPAX support to software.
List of data sources and supporting software: Databases making data available in BioPAX format, software systems for storing, visualizing and analyzing BioPAX pathways.

rAKT1 is a ProteinReference
    has standard-name “AKT1”
    has name “PKB”
    has xref Uniprot-P31749

AKT1.1 is a Protein
    has proteinReference rAKT1
    has notFeature p@308
    has notFeature p@473

AKT1.2 is a Protein
    has proteinReference rAKT1
    has feature p@308
    has notFeature p@473

AKT1.4 is a Protein
    has proteinReference rAKT1
    has feature p@308
    has feature p@473
    is boundTo HSP90.2

HSP90.2 is a Protein
    has proteinReference rHSP90
    is boundTo AKT1.4

p@308 is a ModificationFeature
    has featureLocation AKT1-308
    has modificationType phosphorylation

reaction1 is a BiochemicalReaction
    has left AKT1.2
    has right AKT1.1
    is left-to-right

catalysis1 is a Catalysis
    has controller PP2A.1
    has controlled reaction1
    has direction irr-left-to-right

assembly1 is a ComplexAssembly
    has left HSP90.1
    has left AKT1.3
    has right complex1
    is reversible

complex1 is a Complex
    has component AKT1.4
    has component HSP90.2

Figure 3 The AKT pathway as represented by a traditional method (top left; from http://www.biocarta.com/), a formalized SBGN diagram (left; from http://www.sbgn.org/) and using the BioPAX language (right). An important advantage of the BioPAX representation is that it can be interpreted by computer software and used in multiple ways, including automatic diagram creation, information retrieval and analysis. Online documentation at http://www.biopax.org/ contains more details about how to represent diverse types of biological pathways. Actual samples of pathway data in BioPAX OWL XML format are available in Supplementary Tables 1 and 2.
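Because the records shown in Figure 3 follow a regular ‘X is a Y / has property value’ pattern, even the simplified text form can be processed by software. The sketch below parses a three-record excerpt of the figure and works out which feature is removed by reaction1. The parser, the helper names and the dictionary layout are assumptions for illustration; real BioPAX data are exchanged as OWL XML, not in this text form.

```python
# Excerpt of the Figure 3 records, flattened to one statement per line.
FIGURE_3_EXCERPT = """\
AKT1.1 is a Protein
has proteinReference rAKT1
has notFeature p@308
has notFeature p@473
AKT1.2 is a Protein
has proteinReference rAKT1
has feature p@308
has notFeature p@473
reaction1 is a BiochemicalReaction
has left AKT1.2
has right AKT1.1
"""


def parse_records(text):
    """Turn 'X is a Y' / 'has property value' lines into a dict of records."""
    records = {}
    current = None
    for line in text.splitlines():
        words = line.split()
        if len(words) >= 4 and words[1:3] == ["is", "a"]:
            current = {"type": words[3], "properties": []}
            records[words[0]] = current
        elif len(words) >= 3 and words[0] == "has" and current is not None:
            current["properties"].append((words[1], " ".join(words[2:])))
    return records


def values(record, prop):
    """All values a record holds for one property name."""
    return [value for name, value in record["properties"] if name == prop]


records = parse_records(FIGURE_3_EXCERPT)
reaction = records["reaction1"]
before = records[values(reaction, "left")[0]]
after = records[values(reaction, "right")[0]]

# Features present on the left-hand form but absent on the right-hand form.
lost = set(values(before, "feature")) - set(values(after, "feature"))
print(lost)
```

In this excerpt the left-hand form AKT1.2 carries the feature p@308 and the right-hand form AKT1.1 does not, so the reaction removes the modification at position 308.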

BioPAX is useful for exchanging information among and between data providers and analysis software. Pathway database groups can share the effort of pathway curation by making their pathways available in BioPAX format and exchanging them with others. For example, pathways in BioPAX format from the Reactome database are imported by the US National Cancer Institute/Nature Pathway Information Database. Data providers can use existing BioPAX-enabled software to add useful new features to their systems. For example, the Cytoscape network visualization software can read and display BioPAX-formatted data as a network. The Reactome group used this feature to create a pathway visualization tool for their website. Because Reactome data were available in BioPAX format, and Cytoscape could already read BioPAX format, this new feature was easy to implement.

The Paxtools Java programming library for BioPAX has been developed to help software developers readily support the import, export and validation of BioPAX-formatted data for various uses in their software (http://www.biopax.org/paxtools/). Using Paxtools and other tools, a range of BioPAX-compatible software has been developed, including browsers, visualizers, querying engines, editors and converters (Supplementary Table 4). For instance, the ChiBE and VisANT pathway-visualization tools read BioPAX format, and the WikiPathways website, a community wiki for pathways, is working on using BioPAX to help import pathways from several sources, including manually edited pathways from biologists. The Pathway Tools software and CellDesigner pathway editor are developing support for BioPAX-based data exchange. In addition, tools for the storage and querying of Resource Description Framework (http://www.w3.org/RDF/) data sets, generated within the Semantic Web community, can be used to effectively process BioPAX data.
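Because BioPAX is serialized as XML, general-purpose XML tooling can already extract simple facts from a document. The sketch below uses only the Python standard library, not Paxtools. The embedded snippet is hand-written for illustration and is not taken from the Supplementary Tables; the Level 3 namespace URI and the displayName property are stated here as assumptions that should be checked against the BioPAX documentation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed BioPAX Level 3 namespace; verify against the official documentation.
BP_NS = "http://www.biopax.org/release/biopax-level3.owl#"

# Hand-written illustrative snippet, not real database output.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:bp="{BP_NS}">
  <bp:Protein rdf:ID="AKT1_1">
    <bp:displayName>AKT1</bp:displayName>
  </bp:Protein>
  <bp:Protein rdf:ID="HSP90_2">
    <bp:displayName>HSP90</bp:displayName>
  </bp:Protein>
  <bp:BiochemicalReaction rdf:ID="reaction1"/>
</rdf:RDF>
"""


def list_proteins(xml_text):
    """Return {rdf:ID: displayName} for every top-level bp:Protein element."""
    root = ET.fromstring(xml_text)
    proteins = {}
    for element in root.findall(f"{{{BP_NS}}}Protein"):
        identifier = element.get(f"{{{RDF_NS}}}ID")
        proteins[identifier] = element.findtext(f"{{{BP_NS}}}displayName")
    return proteins


print(list_proteins(SNIPPET))
```

A plain XML parser is enough for simple listings like this one; a dedicated library such as Paxtools or an RDF toolkit is the better fit when references between elements have to be followed.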

What is not covered?

The BioPAX language uses a discrete representation of biological pathways. Dynamic and quantitative aspects of biological processes, including temporal aspects of feedback loops and calcium waves, are not supported. However, BioPAX addresses this need by coordinating work (as described below) with the SBML and CellML mathematical modeling language communities (refs. 55,56) and a growing software tool set supporting biological process simulation. Detailed information about experimental evidence supporting elements of a pathway map is useful for evaluating the quality of pathway data. This information is only included in BioPAX for molecular interactions, because that was already defined by the PSI-MI language and it was reused. The BioPAX work group makes use of PSI-MI–controlled vocabularies and other concepts and works with the PSI-MI work group to build these vocabularies in areas of shared interest, such as genetic interactions. Although BioPAX does not aim to standardize how pathways are visualized, work is coordinated with the

Classes shown in Figure 4: Entity, Pathway, Interaction, TemplateReaction, Control, Catalysis, TemplateReactionRegulation, Modulation, Conversion, BiochemicalReaction, ComplexAssembly, Degradation, Transport, TransportWithBiochemicalReaction, MolecularInteraction, GeneticInteraction, Gene, PhysicalEntity, Protein, DNA, RNA, Complex, SmallMolecule.

Protein properties: availability (String*), name (String*), comment (String*), xref (Xref*), dataSource (Provenance*), evidence (Evidence*), feature (EntityFeature*), notFeature (EntityFeature*), memberPhysicalEntity (Protein*), cellularLocation (CellularLocationVocabulary*), entityReference (ProteinReference).

Figure 4 High-level view of the BioPAX ontology. Classes are shown as boxes, and arrows represent inheritance relationships. The three main types of classes in BioPAX are Pathway (red), Interaction (green) and PhysicalEntity and Gene (blue). For brevity, the properties of the Protein class only are shown as an example at the top right. Asterisks indicate that multiple values for the property are allowed. Refer to BioPAX documentation at http://www.biopax.org/ for full details of all classes and properties.
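The inheritance relationships in Figure 4 can be held in a small parent map and walked to find every ancestor of a class. The class names below are those that appear in the figure, but the extracted text does not preserve the arrows, so the parent links are an assumption based on the usual reading of the BioPAX Level 3 hierarchy and should be checked against the ontology itself.

```python
# Child -> parents. Edges are ASSUMED; only the class names come from Figure 4.
PARENTS = {
    "Pathway": ["Entity"],
    "Interaction": ["Entity"],
    "PhysicalEntity": ["Entity"],
    "Gene": ["Entity"],
    "Control": ["Interaction"],
    "Conversion": ["Interaction"],
    "MolecularInteraction": ["Interaction"],
    "GeneticInteraction": ["Interaction"],
    "TemplateReaction": ["Interaction"],
    "Catalysis": ["Control"],
    "Modulation": ["Control"],
    "TemplateReactionRegulation": ["Control"],
    "BiochemicalReaction": ["Conversion"],
    "ComplexAssembly": ["Conversion"],
    "Degradation": ["Conversion"],
    "Transport": ["Conversion"],
    "TransportWithBiochemicalReaction": ["Transport", "BiochemicalReaction"],
    "Protein": ["PhysicalEntity"],
    "DNA": ["PhysicalEntity"],
    "RNA": ["PhysicalEntity"],
    "Complex": ["PhysicalEntity"],
    "SmallMolecule": ["PhysicalEntity"],
}


def ancestors(class_name):
    """Breadth-first list of all classes a given class inherits from."""
    found = []
    queue = list(PARENTS.get(class_name, []))
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(PARENTS.get(parent, []))
    return found


print(ancestors("Catalysis"))
print(ancestors("TransportWithBiochemicalReaction"))
```

The list-valued map allows a class to have more than one parent, which the sketch uses for TransportWithBiochemicalReaction.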

Figure 5 Example uses of pathway information in BioPAX format. Red-colored boxes or lines indicate the use of BioPAX. Panels: data exchange between database groups; pathway visualization from database; pathway analysis of genomics data.
