Author

Erik M. van Mulligen

Other affiliations: Nanyang Technological University, Erasmus University Rotterdam

Bio: Erik M. van Mulligen is an academic researcher from Erasmus University Medical Center. The author has contributed to research on topics including the Unified Medical Language System and annotation. The author has an h-index of 31 and has co-authored 81 publications, which have received 7247 citations. Previous affiliations of Erik M. van Mulligen include Nanyang Technological University and Erasmus University Rotterdam.

Papers published on a yearly basis (chart not reproduced; years listed: 1990, 1994, 1996, and 1998 through 2023)

Papers

Journal Article•DOI•

The FAIR Guiding Principles for scientific data management and stewardship

Mark Wilkinson¹, Michel Dumontier², IJsbrand Jan Aalbersberg³, Gabrielle Appleton³, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Olavo Bonino da Silva Santos⁴, Philip E. Bourne⁵, Jildau Bouwman, Anthony J. Brookes⁶, Timothy Clark⁷, Mercè Crosas⁷, Ingrid Dillo, Olivier G. Dumon³, Scott C. Edmunds⁸, Chris T. Evelo⁹, Richard Finkers¹⁰, Alejandra Gonzalez-Beltran¹¹, Alasdair J. G. Gray¹², Paul Groth³, Carole Goble¹³, Jeffrey S. Grethe¹⁴, Jaap Heringa, Peter A C 't Hoen¹⁵, Rob Hooft, Tobias Kuhn⁴, Ruben Kok, Joost N. Kok¹⁶, Scott J. Lusher, Maryann E. Martone¹⁴, Albert Mons, Abel L. Packer¹⁷, Bengt Persson¹⁸, Philippe Rocca-Serra¹¹, Marco Roos¹⁵, Rene van Schaik¹⁹, Susanna-Assunta Sansone¹¹, Erik Anthony Schultes¹⁵, Thierry Sengstag²⁰, Ted Slater²¹, George Strawn, Morris A. Swertz²², Mark Thompson¹⁵, Johan van der Lei²³, Erik M. van Mulligen²³, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft¹⁶, Jun Zhao¹¹, Barend Mons¹⁵, Barend Mons²³

Technical University of Madrid¹, Stanford University², Elsevier³, VU University Amsterdam⁴, National Institutes of Health⁵, University of Leicester⁶, Harvard University⁷, Beijing Genomics Institute⁸, Maastricht University⁹, Wageningen University and Research Centre¹⁰, University of Oxford¹¹, Heriot-Watt University¹², University of Manchester¹³, University of California, San Diego¹⁴, Leiden University Medical Center¹⁵, Leiden University¹⁶, Federal University of São Paulo¹⁷, Science for Life Laboratory¹⁸, Bayer¹⁹, Swiss Institute of Bioinformatics²⁰, Cray²¹, University Medical Center Groningen²², Erasmus University Rotterdam²³

15 Mar 2016-Scientific Data

TL;DR: The FAIR Data Principles as mentioned in this paper are a set of data reuse principles that focus on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.

Abstract: There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.

7,602 citations

Journal Article•DOI•

Addendum: The FAIR Guiding Principles for scientific data management and stewardship

Mark Wilkinson¹, Michel Dumontier², IJsbrand Jan Aalbersberg³, Gabrielle Appleton³, Myles Axton, Arie Baak, Niklas Blomberg, Jan Willem Boiten, Luiz Olavo Bonino da Silva Santos⁴, Philip E. Bourne⁵, Jildau Bouwman, Anthony J. Brookes⁶, Timothy Clark⁷, Mercè Crosas⁷, Ingrid Dillo, Olivier G. Dumon³, Scott C. Edmunds⁸, Chris T. Evelo⁹, Richard Finkers¹⁰, Alejandra Gonzalez-Beltran¹¹, Alasdair J. G. Gray¹², Paul Groth³, Carole Goble¹³, Jeffrey S. Grethe¹⁴, Jaap Heringa, Peter A C 't Hoen¹⁵, Rob Hooft, Tobias Kuhn⁴, Ruben Kok, Joost N. Kok¹⁶, Scott J. Lusher, Maryann E. Martone¹⁴, Albert Mons, Abel L. Packer¹⁷, Bengt Persson¹⁸, Philippe Rocca-Serra¹¹, Marco Roos¹⁵, Rene van Schaik¹⁹, Susanna-Assunta Sansone¹¹, Erik Anthony Schultes¹⁶, Thierry Sengstag²⁰, Ted Slater²¹, George Strawn, Morris A. Swertz²², Mark Thompson¹⁵, Johan van der Lei²³, Erik M. van Mulligen²³, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft¹⁶, Jun Zhao¹¹, Barend Mons¹⁵, Barend Mons²³

19 Mar 2019-Scientific Data

TL;DR: The FAIR Data Principles as discussed by the authors are a set of data reuse principles that focus on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.

Abstract: There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders-representing academia, industry, funding agencies, and scholarly publishers-have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.

220 citations

Journal Article•DOI•

The value of data

Barend Mons, Herman H. H. B. M. van Haagen¹, Christine Chichester², P.A.C. ’t Hoen¹, Johan T. den Dunnen¹, Gert-Jan B. van Ommen¹, Erik M. van Mulligen³, Bharat Singh³, Bharat Singh², Rob Hooft², Marco Roos¹, Marco Roos², Joel Hammond⁴, Bruce Kiesel⁴, Belinda Giardine⁵, Jan Velterop, Paul Groth⁶, Erik Anthony Schultes¹

Leiden University Medical Center¹, Netherlands Bioinformatics Centre², Erasmus University Rotterdam³, Thomson Reuters⁴, Pennsylvania State University⁵, University of Amsterdam⁶

01 Apr 2011-Nature Genetics

TL;DR: The social challenge facing us is to maintain the value of traditional narrative publications and their relationship to the datasets they report upon while at the same time developing appropriate metrics for citation of data and data constructs.

Abstract: Data citation and the derivation of semantic constructs directly from datasets have now both found their place in scientific communication. The social challenge facing us is to maintain the value of traditional narrative publications and their relationship to the datasets they report upon while at the same time developing appropriate metrics for citation of data and data constructs.

174 citations

Journal Article•DOI•

Calling on a million minds for community annotation in WikiProteins.

Barend Mons, Michael Ashburner¹, Michael Ashburner², Christine Chichester³, Christine Chichester⁴, Erik M. van Mulligen⁵, Marc Weeber, Johan T. den Dunnen⁴, Gert-Jan B. van Ommen⁴, Mark A. Musen⁶, Matthew Cockerill⁷, Henning Hermjakob², Albert Mons, Abel L. Packer, Roberto Carlos dos Santos Pacheco, Suzanna E. Lewis¹, Suzanna E. Lewis², Alfred Berkeley, William Melton, Nickolas Barris, Jimmy Wales, Gerard Meijssen, Erik Moeller, Peter Jan Roes, Katy Börner⁸, Amos Marc Bairoch³

Lawrence Berkeley National Laboratory¹, European Bioinformatics Institute², Swiss Institute of Bioinformatics³, Leiden University⁴, Erasmus University Rotterdam⁵, Stanford University⁶, BioMed Central⁷, Indiana University⁸

28 May 2008-Genome Biology

TL;DR: A 'million minds' are called on to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery in a Wiki-based system.

Abstract: WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at http://www.wikiprofessional.org.

153 citations

Journal Article•DOI•

A dictionary to identify small molecules and drugs in free text

Kristina Hettne¹, Rob H. Stierum¹, Martijn J. Schuemie¹, Peter J. M. Hendriksen¹, Bob J. A. Schijvenaars¹, Erik M. van Mulligen¹, Jos C. S. Kleinjans¹, Jan A. Kors¹

Erasmus University Medical Center¹

01 Nov 2009-Bioinformatics

TL;DR: A dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus is developed.

Abstract: Motivation: From the scientific community, a lot of effort has been spent on the correct identification of gene and protein names in text, while less effort has been spent on the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representation of chemical information in the literature and map the chemicals to their database identifiers. Results: We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. Rule-based term filtering, manual check of highly frequent terms and disambiguation rules were applied. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the different processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40 (0.80 for trivial names)); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to the performance of the combined dictionary. Availability: The combined dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web site http://www.biosemantics.org/chemlist. Contact: k.hettne@erasmusmc.nl Supplementary information: Supplementary data are available at Bioinformatics online.

151 citations

Cited by

Journal Article•DOI•

The Pfam protein families database in 2019.

Sara El-Gebali¹, Jaina Mistry¹, Alex Bateman¹, Sean R. Eddy², Aurelien Luciani¹, Simon C. Potter¹, Matloob Qureshi¹, Lorna Richardson¹, Gustavo A. Salazar¹, Alfredo Smart¹, Erik L. L. Sonnhammer³, Layla Hirsh⁴, Layla Hirsh⁵, Lisanna Paladin⁵, Damiano Piovesan⁵, Silvio C. E. Tosatto⁵, Robert D. Finn¹

European Bioinformatics Institute¹, Harvard University², Science for Life Laboratory³, Pontifical Catholic University of Peru⁴, University of Padua⁵

08 Jan 2019-Nucleic Acids Research

TL;DR: A significant comparison to the structural classification database that led to the creation of 825 new families based on their set of uncharacterized families (EUFs) was carried out and Pfam entries were connected to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms.

Abstract: The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.

3,617 citations

Journal Article•DOI•

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.

Annalisa Buniello¹, Jacqueline A. L. MacArthur¹, Maria Cerezo¹, Laura W. Harris¹, James D. Hayhurst¹, Cinzia Malangone¹, Aoife McMahon¹, Joannella Morales¹, Edward Mountjoy², Edward Mountjoy³, Elliot Sollis¹, Daniel Suveges¹, Olga Vrousgou¹, Patricia L. Whetzel¹, M. Ridwan Amode¹, Jose A. Guillen¹, Harpreet Singh Riat¹, Stephen J. Trevanion¹, Peggy Hall⁴, Heather Junkins⁴, Paul Flicek¹, Tony Burdett¹, Lucia A. Hindorff⁴, Fiona Cunningham¹, Helen Parkinson¹

European Bioinformatics Institute¹, Wellcome Trust Sanger Institute², University of Oxford³, National Institutes of Health⁴

08 Jan 2019-Nucleic Acids Research

TL;DR: Data access is improved with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database.

Abstract: The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10⁹ individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

2,878 citations

Journal Article•DOI•

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Jinhyuk Lee¹, Wonjin Yoon¹, Sungdong Kim², Donghyeon Kim¹, Sunkyu Kim¹, Chan Ho So¹, Jaewoo Kang¹

Korea University¹, Naver Corporation²

25 Jan 2019-Bioinformatics

TL;DR: This article proposed BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora.

Abstract: Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. Availability and implementation: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.

2,680 citations

Journal Article•DOI•

PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews.

Matthew J. Page¹, David Moher², Patrick M.M. Bossuyt³, Isabelle Boutron⁴, Tammy Hoffmann⁵, Cynthia D. Mulrow⁶, Larissa Shamseer², Jennifer Tetzlaff, Elie A. Akl⁷, Sue E. Brennan¹, Roger Chou⁸, Julie Glanville⁹, Jeremy M. Grimshaw¹⁰, Asbjørn Hróbjartsson¹¹, Manoj M. Lalu¹⁰, Tianjing Li¹², Elizabeth Loder¹³, Evan Mayo-Wilson¹⁴, Steve McDonald¹, Luke A McGuinness¹⁵, Lesley A. Stewart⁹, James Thomas¹⁶, Andrea C. Tricco¹⁷, Vivian Welch², Penny Whiting¹⁵, Joanne E. McKenzie¹

Monash University¹, University of Ottawa², University of Amsterdam³, University of Paris⁴, Bond University⁵, University of Texas Health Science Center at San Antonio⁶, American University of Beirut⁷, Oregon Health & Science University⁸, University of York⁹, Ottawa Hospital Research Institute¹⁰, University of Southern Denmark¹¹, Johns Hopkins University¹², Brigham and Women's Hospital¹³, Indiana University¹⁴, University of Bristol¹⁵, University College London¹⁶, University of Toronto¹⁷

29 Mar 2021-BMJ

TL;DR: The preferred reporting items for systematic reviews and meta-analyses (PRISMA 2020) as mentioned in this paper was developed to facilitate transparent and complete reporting of systematic reviews, and has been updated to reflect recent advances in systematic review methodology and terminology.

Abstract: The methods and results of systematic reviews should be reported in sufficient detail to allow users to assess the trustworthiness and applicability of the review findings. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement was developed to facilitate transparent and complete reporting of systematic reviews and has been updated (to PRISMA 2020) to reflect recent advances in systematic review methodology and terminology. Here, we present the explanation and elaboration paper for PRISMA 2020, where we explain why reporting of each item is recommended, present bullet points that detail the reporting recommendations, and present examples from published reviews. We hope that changes to the content and structure of PRISMA 2020 will facilitate uptake of the guideline and lead to more transparent, complete, and accurate reporting of systematic reviews.

2,217 citations

Journal Article•DOI•

The Gene Ontology resource: enriching a GOld mine

Seth Carbon, Eric Douglass, Benjamin M. Good, Deepak Unni +176 more

08 Jan 2021-Nucleic Acids Research

TL;DR: A historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations is made available to maintain consistency with other ontologies.

Abstract: The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings, and, to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.

1,988 citations
