Abstract:
The amount of omics data in the public domain is increasing every year. Public availability of datasets is growing in all disciplines, because it is considered good scientific practice (e.g. to enable reproducibility) and/or it is mandated by funding agencies and scientific journals. Science is now a data-intensive discipline and therefore, new and innovative ways for data management, data sharing, and for discovering novel datasets are increasingly required. In 2016, we released the first version of the Omics Discovery Index (www.omicsdi.org) as a lightweight system to aggregate datasets across multiple public omics data resources. OmicsDI integrates genomics, transcriptomics, proteomics, metabolomics, and multi-omics datasets, as well as computational models of biological processes. Here, we propose a set of novel metrics to quantify the impact of biomedical datasets. A complete framework (now integrated into OmicsDI) has been implemented in order to provide and evaluate those metrics. Finally, we propose a set of recommendations for authors, journals, and data resources to promote an optimal quantification of the impact of datasets.
TL;DR: This work discusses recent approaches, existing tools, and potential caveats in the integration of omics datasets for development of standardized analytical pipelines that could be adopted by the global omics research community.
TL;DR: The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data.
TL;DR: The diverse applications of mass spectrometry-based proteomics in innate immunity to define communication patterns of the innate immune cells during health and disease are explored, and the emerging role of proteomics in immune-based drug discovery is presented.
TL;DR: A large set of open-access MD trajectories of phosphatidylcholine (PC) lipid bilayers are used to benchmark the conformational dynamics in several contemporary MD models (force fields) against nuclear magnetic resonance (NMR) data available in the literature: effective correlation times and spin-lattice relaxation rates.
TL;DR: A novel deep learning approach to feature selection that addresses both challenges simultaneously and discovers relevant features that provide superior prediction performance compared to the state-of-the-art benchmarks in practical scenarios where there is often limited labeled data and high correlations among features.
TL;DR: The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data and provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments.
TL;DR: The FAIR Data Principles are a set of data reuse principles that focus on enhancing the ability of machines to automatically find and use data, in addition to supporting its reuse by individuals.
TL;DR: The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt), whose aim is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces.
TL;DR: The Reactome Knowledgebase provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations—an extended version of a classic metabolic map, in a single consistent data model.
TL;DR: The developments in PRIDE resources and related tools are summarized, and a brief update is given on the resources under development, 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information on the peptide and protein identification data available in PRIDE Archive.
Frequently Asked Questions (11)
Q1. What are the contributions mentioned in the paper "Quantifying the impact of public omics data" ?
Here, the authors propose a set of novel metrics to quantify the attention and impact of biomedical datasets. They also propose a set of recommendations for authors, journals and data resources to promote an optimal quantification of the impact of datasets.
Q2. What is the method to shrink the original values of a distribution to a range?
The MinMaxScaler is a robust method to shrink the original values of a distribution into a fixed range, so that each value falls between 0 and 1.
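As a minimal sketch of the idea, min-max scaling maps each value x to (x - min) / (max - min); the paper presumably relies on an existing implementation such as scikit-learn's MinMaxScaler, but the transformation itself is simple:

```python
def min_max_scale(values):
    """Linearly rescale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: all values identical
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Example: hypothetical citation counts for four datasets
citations = [0, 4, 16, 40]
print(min_max_scale(citations))  # -> [0.0, 0.1, 0.4, 1.0]
```

Because the scaled value depends on the observed minimum and maximum, metrics from very different scales (e.g. downloads vs. reanalyses) become directly comparable after scaling.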
Q3. How many datasets are stored in OmicsDI?
At the time of writing (March 2019), OmicsDI stores just over 454,200 datasets from 16 different public data resources (https://www.omicsdi.org/database).
Q4. What is the purpose of the new OmicsDI system?
The newly implemented OmicsDI dataset claiming system enables authors, research groups, scientific consortia and research institutions to organise datasets under a unique OmicsDI profile, and for datasets to be added to their own ORCID profiles as well.
Q5. How many datasets contain connections to knowledge-based resources?
More than 53% of the datasets contain biological connections that can be traced to knowledge-based resources, such as Ensembl [15], UniProt [16] or IntAct [17].
Q6. What can be the way to assess the impact of a dataset?
The correct tracking of datasets in a database by other data resources can help to assess its impact, since it demonstrates that the data they store is actively re-used by (and thus is relevant to) the community.
Q7. What is the importance of reporting scientific impact?
Reporting scientific impact is increasingly relevant for individuals, and reporting aggregated information has become essential for research groups, scientific consortia, institutions and public data resources, among others, in order to assess the importance, excellence and relevance of their work.
Q8. What are the main reasons for the reanalysis of datasets?
The appropriate and accurate referencing of the original datasets in other resources facilitates the reproducibility and traceability of the results, as well as recognition for the authors who generated the original dataset [32].
Q9. What is the standard deviation for the citation rate for proteomics datasets?
The standard deviation indicates that in transcriptomics some datasets get significantly more attention from the community than others (STD = 16), whereas for proteomics datasets the citation rate is much more homogeneous (STD = 1.7).
Q10. What are the five metrics that can be used to estimate the impact of datasets?
The authors have formulated five metrics that can be used to estimate the impact of datasets (Fig. 5). The first is the number of reanalyses (reanalyses): a reanalysis can be generally defined as the complete or partial re-use of an original dataset (A) using a different analysis protocol, stored either in the same or in another public data resource (B) (Fig. 5).
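As a purely hypothetical illustration (the paper's exact scoring formula is not reproduced here, and the metric names below are assumed), the five raw counts for each dataset could be min-max scaled across the whole collection and then averaged into a single comparable score:

```python
# Hypothetical sketch: combine five impact metrics (names assumed for
# illustration) into one score by scaling each metric across all datasets
# to [0, 1] and averaging. This is NOT the paper's exact formula.
METRICS = ["reanalyses", "citations", "connections", "views", "downloads"]

def impact_scores(datasets):
    """datasets: list of dicts mapping metric name -> raw count."""
    scaled = []
    for name in METRICS:
        vals = [d[name] for d in datasets]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1  # avoid division by zero when all values match
        scaled.append([(v - lo) / span for v in vals])
    # average the five scaled metrics per dataset
    return [sum(col) / len(METRICS) for col in zip(*scaled)]

a = {"reanalyses": 2, "citations": 10, "connections": 5, "views": 100, "downloads": 50}
b = {"reanalyses": 0, "citations": 0, "connections": 0, "views": 0, "downloads": 0}
print(impact_scores([a, b]))  # -> [1.0, 0.0]
```

Scaling each metric before averaging prevents high-volume metrics such as views from dominating rarer but arguably stronger signals such as reanalyses.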
Q11. How can researchers create their own profile in OmicsDI?
Analogously to services such as Google Scholar and ResearchGate for publications, the authors have implemented a mechanism that enables researchers to create their own profile in OmicsDI by claiming their own datasets.