A human phenome-interactome network of protein complexes implicated in genetic disorders

doi:10.1038/NBT1295

NATURE BIOTECHNOLOGY VOLUME 25 NUMBER 3 MARCH 2007


Kasper Lage (1,6), E Olof Karlberg (1,6), Zenia M Størling (1), Páll Í Ólason (1), Anders G Pedersen (1), Olga Rigina (1), Anders M Hinsby (1), Zeynep Tümer (2), Flemming Pociot (3,4), Niels Tommerup (2), Yves Moreau (5) & Søren Brunak (1)

We performed a systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human disease to create a phenome-interactome network. This was done by integrating quality-controlled interactions of human proteins with a validated, computationally derived phenotype similarity score, permitting identification of previously unknown complexes likely to be associated with disease. Using a phenomic ranking of protein complexes linked to human disease, we developed a Bayesian predictor that in 298 of 669 linkage intervals correctly ranks the known disease-causing protein as the top candidate, and in 870 intervals with no identified disease-causing gene, provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease. Our publicly available draft of protein complexes associated with pathology comprises 506 complexes, which reveal functional relationships between disease-promoting genes that will inform future experimentation.

Several diseases with overlapping clinical manifestations are caused by mutations in different genes that are part of the same functional module. In such instances, the clinical overlap can be attributed to mutations in single genes rendering the complete module dysfunctional (ref. 1). This concept has been applied to searches for disease genes by several computational methods, including, for example, schemes based on Gene Ontology annotations and gene expression data (refs. 2–12). The advent of proteome-wide interaction screens in model organisms has revealed the modularity of the cellular interactome and that many genes exert their functions as components of protein complexes such as cellular machines, rigid structures, dynamic signaling or metabolic networks and post-translational modification systems (ref. 13).

Analyses involving model organisms, and more recently humans, show that direct and indirect interactions often occur between protein pairs responsible for similar phenotypes (refs. 14–22). In humans this relationship can, for example, be observed in various inherited ataxias (ref. 20). These findings hint at the widespread association of protein complexes with human disease and the likelihood that defects in several proteins, alone or in combination, can cause overlapping clinical manifestations. Systematic investigation of these complexes would help to elucidate cellular mechanisms underlying various disorders and prioritize positional candidates identified, for example, by linkage analysis or association studies.

Our strategy is predicated on the simple assumption that mutations in different members of a protein complex (predicted from protein-protein interaction data) lead to comparable phenotypes, the similarities of which can be automatically recognized by text mining. Computational integration of phenotypic data with a high-confidence interaction network of human proteins is required to perform such an analysis for many human diseases simultaneously. This creates a phenome-interactome network. However, there is no single standard vocabulary for phenotypic annotation in humans. Furthermore, protein interaction data are noisy, are scattered among different databases and contain many false positive interactions (ref. 23). Additionally, only a few large-scale protein interaction studies have been finalized for the human proteome (refs. 24,25), rendering the coverage of human protein interaction data too low for a systematic study of protein complexes associated with human disease. Thus, extensive data integration, including conservative incorporation of protein interaction data from model organisms, streamlining of human phenotype data and thorough testing of the resulting method, is required for the systematic investigation of protein complexes associated with human disease.

RESULTS

Construction of a quality-controlled interaction network of human proteins and implementation of a thoroughly benchmarked computational phenotype similarity score allowed us to analyze a human phenome-interactome network. The results show that the 506 disease-associated protein complexes span a wide range of inherited disease categories. We furthermore trained a Bayesian predictor to prioritize candidates in 870 linkage intervals by assigning candidates to protein complexes and ranking these complexes based on the phenotypes associated with their members by text mining. The key steps in our approach are illustrated in Figure 1. Four disease-specific case studies are presented to illustrate how the complexes can be exploited to generate novel hypotheses, which directly suggest specific validation experiments involving particular patient-derived materials.

1 Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark.

2 Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, The Panum Institute, University of Copenhagen, Blegdamsvej 3, DK-2200, Copenhagen N, Denmark.

3 Institute for Clinical Science, University of Lund, SE–22100 Lund, Sweden.

4 Steno Diabetes Center, Niels Steensesvej 2, DK-2820 Gentofte, Denmark.

5 Department of Electrical Engineering, Faculty of Engineering, Katholieke Universiteit Leuven, B–3001 Heverlee, Belgium.

6 These authors contributed equally to this work.

Correspondence should be addressed to S.B. (brunak@cbs.dtu.dk).

Published online 7 March 2007; doi:10.1038/nbt1295

ANALYSIS


Measuring phenotype similarity scores

Text mining techniques are well suited for investigating phenotype-genotype relationships (refs. 8,11,12,14,26–28). Inspired by such techniques, we created a scoring scheme that quantitatively measures the phenotypic overlap of Online Mendelian Inheritance in Man (OMIM; ref. 29) records (Supplementary Fig. 1 online). For every record we created a phenotype vector consisting of weighted medical terms present in the record, which represent the phenotype described in that particular record. The parsing of the OMIM records was done using MetaMap Transfer (MMTx; ref. 30), a program that maps text to the Unified Medical Language System (UMLS; ref. 31) metathesaurus (MTH) concepts. The pairwise phenotypic overlap between records was quantified by calculating the cosine of the angle between normalized vector pairs (ref. 32), which is a standard measure in such analyses. Essentially, the method amounts to detecting words (from the UMLS vocabulary) that are (i) common to the description of the two phenotypes and (ii) do not occur too frequently among all phenotype descriptions and thus are informative about the phenotype under consideration.
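The vector comparison described above can be illustrated with a minimal Python sketch. It assumes a TF-IDF-style weighting (term count times the log of inverse record frequency), which is a common choice but is not confirmed here as the weighting used in this work; the record identifiers and term lists are invented, and real input would be UMLS concepts produced by MMTx.

```python
import math
from collections import Counter


def build_vectors(records):
    """Map each record id to a dict of concept -> weight.

    Assumed weighting: term count * log(N / record frequency). Concepts that
    occur in every record get weight 0 and so carry no information.
    """
    n_records = len(records)
    record_freq = Counter()
    for concepts in records.values():
        record_freq.update(set(concepts))
    vectors = {}
    for rec_id, concepts in records.items():
        counts = Counter(concepts)
        vectors[rec_id] = {
            c: tf * math.log(n_records / record_freq[c]) for c, tf in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy records; "disease" appears everywhere and is therefore uninformative.
records = {
    "rec_A": ["night blindness", "retinal dystrophy", "disease"],
    "rec_B": ["night blindness", "visual field constriction", "disease"],
    "rec_C": ["muscular atrophy", "bulbar paralysis", "disease"],
}
vectors = build_vectors(records)
sim_ab = cosine(vectors["rec_A"], vectors["rec_B"])
sim_ac = cosine(vectors["rec_A"], vectors["rec_C"])
```

In this toy example the two retinal records share an informative term and receive a positive score, whereas the retinal and motor neuron records share only the ubiquitous term and score zero, which mirrors criteria (i) and (ii).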

Even though our approach is comparable to successful methods reported in other contexts (ref. 28), there are a number of problems surrounding the use of MMTx and UMLS (ref. 33), and it is not obvious that the cosine distance between phenotype vectors can accurately capture and quantify the phenotypic overlap between record pairs. To evaluate the reliability of our method, we extracted a large set of ~7,000 OMIM record pairs, which had a high degree of phenotypic overlap. This assertion of phenotypic overlap was based on a combination of the opinion of expert OMIM curators and experts familiar with the diseases under consideration (Supplementary Methods online). To evaluate the phenotypic overlap of record pairs in this set, we manually curated 100 random record pairs. This evaluation showed that over 90% of the pairs consist of records with a high degree of phenotypic overlap (Supplementary Table 1 online).

The reliability of the phenotype similarity score was then tested by fitting a calibration curve of the score against the overlap with the OMIM record pairs (that is, the percentage of the pairs with a given score found among the record pairs). This demonstrates their direct correlation (Supplementary Fig. 2 online). The higher the phenotype similarity score between records measured by our text-mining scheme, the higher the probability that the records had been independently evaluated to have a phenotypic overlap by the OMIM curators, so that indeed the constructed phenotype vectors and scoring scheme produce a reliable measure of phenotypic overlap between OMIM records.
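As a rough illustration of such a calibration, the sketch below bins scored record pairs by score and reports, per bin, the fraction of pairs that also appear in a curated reference set. The equal-width binning, the function name `calibration_curve` and the toy pairs are assumptions made for this example; the fitting procedure actually used is described only in the supplementary material.

```python
def calibration_curve(scored_pairs, reference_pairs, n_bins=5):
    """Return (bin_lower, bin_upper, fraction_in_reference, n_pairs) per non-empty bin.

    scored_pairs: dict mapping (record_a, record_b) -> similarity score in [0, 1].
    reference_pairs: set of frozensets holding curated high-overlap record pairs.
    """
    bins = [[0, 0] for _ in range(n_bins)]  # [hits, total] for each score bin
    for pair, score in scored_pairs.items():
        idx = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx][1] += 1
        if frozenset(pair) in reference_pairs:
            bins[idx][0] += 1
    curve = []
    for i, (hits, total) in enumerate(bins):
        if total:
            curve.append((i / n_bins, (i + 1) / n_bins, hits / total, total))
    return curve


# Invented toy data: six scored pairs, three of which are in the reference set.
scored_pairs = {
    ("r1", "r2"): 0.95, ("r1", "r3"): 0.85, ("r2", "r3"): 0.55,
    ("r1", "r4"): 0.45, ("r2", "r4"): 0.15, ("r3", "r4"): 0.05,
}
reference_pairs = {frozenset(p) for p in [("r1", "r2"), ("r1", "r3"), ("r1", "r4")]}
curve = calibration_curve(scored_pairs, reference_pairs, n_bins=2)
```

A score is well calibrated in this sense when the reference fraction rises with the score bin, as it does in the toy data (one third in the lower bin, two thirds in the upper bin).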

Constructing a scored network of human protein interactions

We created a human protein interaction network by pooling human interaction data from several of the largest databases and increased the coverage by transferring data from model organisms. We then devised and tested a network-wide confidence score for all interactions. This score relies on network topology and furthermore considers (i) that interactions from large-scale experiments generally contain more false positives than interactions from small-scale experiments (ref. 23), and (ii) that interactions are more reliable if they have been reproduced in more than one independent interaction experiment (ref. 23). The reliability of this score as a measure of interaction confidence was confirmed by fitting a calibration curve of the score against overlap with a high-confidence set of about 35,000 human interactions (Supplementary Fig. 3 online). The resulting network contains ~343,000 unique interactions between ~8,500 human proteins. Of these, ~62,000 are high-confidence interactions.

Testing the predictor on 1,404 linkage intervals

We trained a Bayesian predictor to rank known disease-causing proteins in linkage intervals, by assigning candidates to protein complexes and ranking these complexes based on the phenotypes assigned to their members by text mining. The predictor was validated by fivefold cross-validation on a total of 1,404 linkage intervals containing an average of 109 candidates and including one candidate known to be involved in the particular disease. For ranking candidates, the Bayesian predictor takes as input the patient phenotype (e.g., Leber congenital amaurosis) and a linkage interval, and the candidates are ranked by the following three steps (Fig. 1). First, a given positional candidate is queried for high-scoring interaction partners (termed a virtual pull-down of the protein). These interaction partners compose the candidate complex. Second, proteins known to be involved in disease are identified in the candidate complex, and pairwise scores of the phenotypic overlap between diseases of these proteins and the candidate phenotype are assigned. Third, based on the phenotypes represented in the candidate complex, the Bayesian predictor awards a posterior probability score to the candidate in the complex. All candidates in the linkage interval are ranked on the basis of this score. The biological interpretation of a high-scoring candidate is that this protein is likely to be involved in the molecular pathology of the disorder of interest, because it is part of a high-confidence candidate complex in which some proteins are known to be involved in highly similar (or identical) disorders.
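The three steps can be sketched in Python as follows. This is only an outline under stated assumptions: the confidence cutoff of 0.5 is arbitrary, all protein and disease names are invented, and the third step uses the maximum phenotype similarity in the complex as a simple stand-in for the trained Bayesian predictor, whose model is not specified in this section.

```python
def virtual_pulldown(candidate, interactions, min_confidence=0.5):
    """Step 1: return partners of `candidate` whose interaction confidence passes the cutoff."""
    partners = set()
    for (a, b), confidence in interactions.items():
        if confidence < min_confidence:
            continue
        if a == candidate:
            partners.add(b)
        elif b == candidate:
            partners.add(a)
    return partners


def score_candidate(candidate, patient_phenotype, interactions,
                    disease_annotations, similarity, min_confidence=0.5):
    """Steps 2 and 3, with a placeholder score.

    The paper awards a posterior probability from a Bayesian predictor; the
    maximum similarity used here is NOT that model, only a stand-in.
    """
    partners = virtual_pulldown(candidate, interactions, min_confidence)
    overlaps = [similarity(patient_phenotype, disease)
                for partner in partners
                for disease in disease_annotations.get(partner, ())]
    return max(overlaps, default=0.0)


def rank_interval(candidates, patient_phenotype, interactions,
                  disease_annotations, similarity):
    """Score every positional candidate and sort, best first."""
    scored = [(score_candidate(c, patient_phenotype, interactions,
                               disease_annotations, similarity), c)
              for c in candidates]
    # Ties are broken alphabetically so the ranking is reproducible.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Toy data: every name, confidence and similarity value below is invented.
interactions = {
    ("CAND_1", "PROT_A"): 0.9,
    ("CAND_1", "PROT_B"): 0.8,
    ("CAND_2", "PROT_C"): 0.7,
    ("CAND_3", "PROT_A"): 0.2,  # below the confidence cutoff, so ignored
}
disease_annotations = {"PROT_A": ["disease_X"], "PROT_C": ["disease_Y"]}
toy_similarity = {("patient_phenotype", "disease_X"): 0.85,
                  ("patient_phenotype", "disease_Y"): 0.10}


def similarity(p, q):
    return toy_similarity.get((p, q), 0.0)


ranking = rank_interval(["CAND_1", "CAND_2", "CAND_3"], "patient_phenotype",
                        interactions, disease_annotations, similarity)
```

In the toy interval, the first candidate ranks highest because its complex contains a protein annotated with a disease similar to the patient phenotype, while the third candidate scores zero because its only interaction falls below the confidence cutoff.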

Performance of the Bayesian model relying on phenomic scoring of protein complexes associated with disease

The results of prioritizing candidates in the 1,404 test linkage intervals show that the predictor has both good precision and recall (Fig. 2a). For each disease, we consider the known disease gene as the relevant gene. Our method makes a prediction for a disease if the top-scoring gene for this disease has a score above the threshold of 0.1. This threshold is chosen because predictions scoring below 0.1 approximate the chance of picking the correct gene randomly. The retrieved gene is then this top-scoring gene. Precision (at a given threshold) is the proportion of relevant genes among all retrieved genes (no. of relevant genes retrieved/no. of genes retrieved). Recall is the fraction of the relevant genes that have been retrieved at the same threshold (no. of relevant genes retrieved/no. of relevant genes). For the 1,404 linkage intervals, there are 669 different predictions with a score above 0.1. Among these, there were 298 correctly identified disease genes, so that the precision at this threshold is 45% (that is, 45% of the candidates that ranked number one with a score above 0.1 are correctly identified as genes causing disease) (Fig. 2a), a level of precision far superior to random prediction. At this threshold, the recall is 21%. A plot of precision versus prediction score cutoff shows proportionality between the score and the chance that the candidate is correct. Candidates scoring above 0.9 are correct in more than 65% of the cases (Fig. 2a). Thus, high-scoring candidates are very likely to be correct, and the score awarded to a candidate is a direct indication of the chance that the gene contributes to the disease in question.
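The precision and recall figures at the 0.1 threshold follow directly from the counts given above, as the short Python sketch below shows. The function name and the per-interval scores (0.5 and 0.05) are placeholders chosen only so that the counts of 298 correct and 669 retrieved predictions among 1,404 intervals are reproduced; they are not values from the benchmark.

```python
def precision_recall_at(top_predictions, threshold):
    """Precision and recall when each interval contributes its top-ranked gene.

    top_predictions: list of (score, is_correct) for the top gene per interval.
    Each interval has exactly one relevant gene, so the number of relevant
    genes equals the number of intervals.
    """
    retrieved = [is_correct for score, is_correct in top_predictions if score > threshold]
    n_correct = sum(retrieved)
    precision = n_correct / len(retrieved) if retrieved else float("nan")
    recall = n_correct / len(top_predictions)
    return precision, recall


# Placeholder scores arranged to match the reported counts.
top_predictions = (
    [(0.5, True)] * 298      # correct top candidates scoring above 0.1
    + [(0.5, False)] * 371   # incorrect top candidates scoring above 0.1 (669 - 298)
    + [(0.05, False)] * 735  # intervals with no prediction above 0.1 (1,404 - 669)
)
precision, recall = precision_recall_at(top_predictions, threshold=0.1)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# prints "precision = 44.5%, recall = 21.2%"
```

These values round to the 45% precision and 21% recall quoted in the text.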

There were two main types of failures to identify the relevant genes. Either the proteins coded by the relevant genes do not have an interaction partner that is involved in a relevant phenotype (which applies to 59% of all intervals), or there is a gene in the region considered a better candidate by the predictor (which applies to 26% of all intervals). These 26% could in theory be correct predictions, as suggested by manual inspection of false predictions with high posterior probabilities. By far the most common failure is the lack of interaction partners involved in similar diseases. In 75% of such cases there were no candidates that scored above the threshold of 0.1. These failures could either be due to a lack of data or because some disease proteins do not interact with proteins involved in similar diseases. It seems most likely that the failures are due to a combination of both.


We also tested a predictor trained on large-scale protein interaction data from which bias related to human diseases was eliminated (Supplementary Methods online). Here we observed a comparable precision to the predictor trained on the full protein interaction data set (Fig. 2b). Using these data, the precision above 0.1 is 25%, and above 0.9, it is 58%. Therefore, although the performance is slightly lower, it is still very high. These results illustrate the value of large-scale protein interaction data from model organisms, if subjected to stringent quality control. The much lower recall (2.3%) is to be expected with less data. This shows that it is possible to accurately identify disease genes using data from model organisms that were not produced specifically to investigate disease relationships.

Because mutational analysis of candidates in linkage intervals is extremely demanding in terms of resources, our method should be valuable for identifying highly likely candidates and thereby facilitating the discovery of novel genes involved in human disease.

Predicting novel disease gene candidates

OMIM contains 870 intervals linked to diseases for which there are no confirmed disease-causing genes. We ranked the genes in these intervals by the method depicted in Figure 1. The full set of predictions above the threshold of 0.1 can be seen in the Supplementary Data. We present the best-scoring candidates made by our predictor in Supplementary Table 2 online. In each of the 91 represented intervals at least one candidate scores above 0.2. In some intervals there are also candidates scoring in the range 0.1–0.2; these are included for completeness, so the table contains a total of 113 candidates in 91 intervals.

All predictions in Supplementary Table 2 were followed up by independent literature studies, where we investigated the distance of the predicted gene to the closest published high-resolution marker. Seven genes were located >20 Mb from such markers (labeled * in Supplementary Table 2 online). We also investigated whether the candidates had previously been associated with the respective disorders, and whether there were inconsistencies between candidates we proposed and those proposed by other groups for the same diseases and intervals.

Twenty-four of the predictions point to genes that are most likely true positives, but where the causative mutation has not yet been identified (annotated with “2” or “2#” in Supplementary Table 2 online). In these cases, our predictions should be seen as further evidence that the genes are involved in the respective diseases. Seven predictions point to genes where a causative mutation has been identified (annotated with “3” in Supplementary Table 2 online). Together, these constitute 31 predictions most likely to be true. Of these, 25 are the best scoring in the interval, and 6 are scored second or lower. Sixteen predictions point to genes for which literature studies show that a different gene is strongly incriminated in the disease, most likely rendering the prediction wrong (annotated with “1#” in Supplementary Table 2 online). Of these, 11 are the best-scoring candidate in the interval and 5 score second or lower. When considering only the best-scoring candidate in each interval (as we have done in the benchmark), 25 are most likely true positives and 11 are most likely false positives. Thus, the precision is 69% (25 of 36), which is even better than the precision in the benchmark, where predictions above 0.2 have a precision of 49%. Sixty-six of the candidates belong to intervals where there is no evidence in the literature regarding a gene(s) that contributes to the pathology. We consider these as novel candidates. All complexes underlying the candidates scoring 0.1 or above are available for download from the database supporting this work.

[Figure 1 labels: linkage interval, with N candidates found in genetic studies to be associated with the patient phenotype; patient phenotype (Leber congenital amaurosis); pairwise similarity of protein phenotype and patient phenotype; not involved in similar disease.]

Figure 1 Steps in scoring each candidate in a linkage interval. First, a virtual pull-down of each candidate identifies putative protein complexes including the candidate. Each complex is named the candidate complex. Second, proteins responsible for promoting disease are identified in the candidate complex, and the pairwise similarity to the patient phenotype is measured by text-mining. In this case, proteins that are involved in different disorders comparable to Leber congenital amaurosis are colored according to the clinical overlap with this phenotype. The last step involves scoring and ranking the candidates by the Bayesian predictor. Each candidate is scored based on phenotypes associated with the proteins in the candidate complex, and all candidates in the interval are ranked based on this score.

To exemplify the candidate protein complexes underlying the scoring of the Bayesian predictor, we present four case studies of the novel candidates from Supplementary Table 2 online. Similar analysis can be carried out for all 506 complexes in the data set, pointing to specific approaches toward validation of the proposed relationships.

Case studies

Retinitis pigmentosa is a clinically and genetically heterogeneous group of disorders. Common traits are night blindness, constricted visual field and retinal dystrophy. In an associated interval on 2p15–p11 (ref. 34), the Bayesian predictor points to LOC130951 with a score of 0.5232. This protein is uncharacterized but evolutionarily conserved, and it is putatively involved in the disease based on an interaction with CRX (refs. 25,35; Fig. 3a). CRX is a homeobox transcription factor known to be involved in retinitis pigmentosa and cone rod dystrophy (ref. 36). The candidature of LOC130951 is not obvious, and because both interaction studies reporting the interaction to CRX are large scale, including thousands of interactions, it seems unlikely that LOC130951 would have been chosen as a suitable candidate by manual investigation of the interval.

Epithelial ovarian cancer arises as a result of genetic alterations in the ovarian surface epithelium. In an associated interval on 3p25–p22 (ref. 37), the Bayesian predictor points to Fanconi anemia group D2 protein (FANCD2) with a score of 0.9981. This protein is placed in a complex with breast cancer type 2 susceptibility protein (BRCA2), breast cancer type 1 susceptibility protein (BRCA1) and nibrin isoform 1 (NBN), all of which are involved in ovarian cancer, breast cancer or chromosomal instability disorders (refs. 38–41; Fig. 3b). Furthermore, other proteins involved in cancer can be identified in the complex (Supplementary Data and Supplementary Fig. 4 online). FANCD2 is part of the BRCA pathway in cisplatin-sensitive cells (ref. 42) and is known to be involved in different types of cancer (ref. 43). However, to our knowledge, a mutation in this gene has never been demonstrated in epithelial ovarian cancer, and we consider it to be a likely candidate in epithelial ovarian cancer in families with linkage to 3p25–p22.

Inflammatory bowel disease is characterized by chronic, relapsing intestinal inflammation. In an associated interval on 6p (refs. 44,45), the Bayesian predictor points to receptor-interacting serine/threonine protein kinase (RIPK1) as the most likely candidate with a score of 0.9984 (Fig. 3c). The candidate complex includes the signaling proteins tumor necrosis factor receptor 2 (TNFRSF1B), tumor necrosis factor precursor (TNF) and tumor necrosis factor receptor precursor (TNFRSF1A), all known to be associated with inflammatory bowel disease or other inflammatory disorders. Furthermore, other proteins involved in inflammation and immune responses can be observed in the complex (Supplementary Data and Supplementary Fig. 5 online). We thus identified a positional candidate, which is placed centrally in a complex of proteins known to be involved in inflammatory bowel disease and other types of inflammation. We note that RIPK1 lies 20.6 Mb from the closest high-resolution marker published. However, considering that all of 6p was screened for candidates, and that several genes lying far from the published markers are most likely true predictions in Supplementary Table 2 online, we believe that RIPK1 is a very likely candidate involved in inflammatory bowel disease.

Amyotrophic lateral sclerosis (ALS) with frontotemporal dementia is a degenerative motor neuron disorder characterized by muscular atrophy, progressive motor neuron function loss and bulbar paralysis. In many families, hereditary ALS is associated with frontotemporal dementia and linkage has been shown to an area on 9q21–q22 (ref. 46). Here, the Bayesian predictor points to two likely candidates: bicaudal D homolog 2 (BICD2) and cytoplasmic isoleucyl-tRNA synthetase (IARS), scoring 0.4351 and 0.2154, respectively. Although BICD2 is scored highest, both candidates are awarded good scores and are plausible candidates for contributing to ALS associated with dementia. However, investigation of the candidate complexes suggests that BICD2 is more likely to be involved in nonfamilial ALS not associated with dementia, because it is part of a complex with dynactin, which is associated with ALS without dementia. IARS is in a complex with superoxide dismutase 1, a protein known to be involved in familial ALS (ref. 47) including dementia (Fig. 3d). Also, the IARS complex contains molecular chaperones and other proteins that have been connected to the disease and other types of dementias (Supplementary Fig. 6 online), and the interaction data underlying the complex are highly reproducible (Supplementary Data online). Both candidates are likely, but the candidate complex underlying IARS is seemingly more relevant to familial ALS, and it is plausible that IARS could be involved in the disease in families with linkage to 9q21–q22. Because little is known about this disorder, the complex revealed here is an interesting new lead concerning its underlying causes.

[Figure 2: panel a plots precision against recall; panel b plots precision against prediction score cutoff.]

Figure 2 Performance of the Bayesian predictor. (a) A plot of recall of the predictor against precision shows that precision for high-scoring candidates can approach 65%. We also trained a predictor only on large-scale data where we had removed all data that were related to diseases that were represented in the test set. (b) Precision is plotted against prediction score cutoff for both the predictor trained on all protein interaction data in our network (green line) and the predictor trained only on unbiased large-scale data (blue line). The precision of these two approaches is comparable, showing that it is possible to find disease genes with very high precision, even with unbiased large-scale data inferred from model organisms, if these data are scored correctly.

These case studies indicate the value of data mining our phenome-interactome network and integrating interaction data across multiple organisms for positional candidate prioritization. In the case of retinitis pigmentosa and ALS with frontotemporal dementia, the predictor identifies nonobvious candidates in novel putative complexes supported by a network of reproducible interaction data from humans and multiple model organisms. In the cases of inflammatory bowel disease and epithelial ovarian cancer, we identify partly characterized complexes, where several members are known to be involved in the patient phenotype. However, because there are ~500 positional candidates in the case of inflammatory bowel disease, it would require extensive literature studies to reveal this network and candidate by manual data integration. We thus believe that RIPK1 would probably not have been identified as a good candidate despite prior knowledge of its involvement in a known network contributing to inflammatory responses.

DISCUSSION

We have recently witnessed the emergence of integrative methods for identifying probable disease genes in linkage intervals associated with disease based on data integration involving, for example, Gene Ontology categories and expression data (refs. 2–12). Traditionally these methods are compared by measuring average fold enrichment of positional probability (Supplementary Methods online). If a method ranks the true candidate in the top 10% of all candidates in 50% of the linkage intervals, there is a tenfold enrichment in the successfully predicted intervals and fivefold enrichment on average. We show that our method increases the probability 108.8 times for the successful predictions and 23.1 times on average, significantly outperforming the other computational methods for positional candidate prioritization, which report 5.6–31.2 times enrichment in the successful linkage intervals and 3.8–19.4 times enrichment on average (Supplementary Table 3 online). The most common failure of our method to correctly identify the disease gene results from the inability to find interaction partners associated with a similar phenotype as the relevant protein. This could result from either a lack of data or the failure of these proteins to interact with proteins involved in similar phenotypes. In 75% of these cases, failure to identify another candidate scoring over 0.1 eliminates the possibility of an incorrect prediction.
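The fold-enrichment arithmetic in the worked example above can be written out as a small Python sketch. The function reproduces the tenfold and fivefold figures from the text. The second call is only a reading of the reported numbers, not a statement of the supplementary method: it assumes that the 108.8-fold figure corresponds to ranking the true gene first among roughly 109 candidates per interval, and that the success rate equals the benchmark recall of 298 of 1,404 intervals.

```python
def fold_enrichment(top_fraction, success_rate):
    """Return (enrichment in successful intervals, enrichment averaged over all intervals).

    top_fraction: fraction of the candidate list within which the true gene is ranked.
    success_rate: fraction of linkage intervals in which that ranking is achieved.
    """
    in_successful = 1.0 / top_fraction
    on_average = in_successful * success_rate
    return in_successful, on_average


# Worked example from the text: top 10% of candidates in 50% of intervals.
example = fold_enrichment(top_fraction=0.10, success_rate=0.50)

# Assumed reading of the reported figures (see the lead-in above).
ours_successful, ours_average = fold_enrichment(top_fraction=1 / 108.8,
                                                success_rate=298 / 1404)
```

Under these assumptions the average enrichment comes out at about 23.1, in line with the figure reported in the text.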

Our ability to assign candidates to high-confidence protein complexes and rank these complexes in terms of phenomics has permitted us to present a first draft of 506 protein complexes associated with human disease. The success of our method can be attributed to a combination of factors. First, we integrate experimental protein interaction data with a phenotype similarity scheme, thereby taking advantage of the complete clinical spectrum of related human diseases. Also, we use

Figure 3 Case studies of four candidate complexes. (a–d) These candidate complexes are subjected to virtual pull-down with the best-scoring candidate in retinitis pigmentosa 28 (RP28) (a), epithelial ovarian cancer (EOC) (b), inflammatory bowel disease (IBD) (c) and a high-scoring candidate in amyotrophic lateral sclerosis (ALS) with frontotemporal dementia (d). Solid black circles (c) represent proteins that are the high-scoring candidates in the four disorders. Numbered circles are proteins that interact with the candidate proteins. Colored nodes are proteins identified by our phenotype-similarity scheme. Gray proteins are not predicted by our phenotype-similarity scheme to be implicated in any disease.
