Home
/
Authors
/
Donghong Ji

Author

Donghong Ji

Other affiliations: Institute for Infocomm Research Singapore

Bio: Donghong Ji is an academic researcher from Wuhan University. The author has contributed to research in topics: Computer science & Relationship extraction. The author has an hindex of 26, co-authored 132 publications receiving 2837 citations. Previous affiliations of Donghong Ji include Institute for Infocomm Research Singapore.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2008
2007
2006
2005
2004

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The CHEMDNER corpus of chemicals and drugs and its annotation principles.

[...]

Martin Krallinger, Obdulia Rabal¹, Florian Leitner², Miguel Vazquez, David Salgado, Zhiyong Lu³, Robert Leaman³, Yanan Lu⁴, Donghong Ji⁴, Daniel M. Lowe, Roger A. Sayle, Riza Theresa Batista-Navarro, Rafal Rak, Torsten Huber⁵, Tim Rocktäschel⁶, Sérgio Matos⁷, David Campos⁷, Buzhou Tang⁸, Hua Xu⁹, Tsendsuren Munkhdalai¹⁰, Keun Ho Ryu¹⁰, S. V. Ramanan¹¹, Senthil Nathan¹¹, Slavko Žitnik¹², Marko Bajec¹², Lutz Weber, Matthias Irmer, Saber A. Akhondi¹³, Jan A. Kors¹³, Shuo Xu, Xin An¹⁴, Utpal Kumar Sikdar¹⁵, Asif Ekbal¹⁵, Masaharu Yoshioka¹⁶, Thaer M. Dieb¹⁶, Miji Choi¹⁷, Karin Verspoor¹⁷, Madian Khabsa¹⁸, C. Lee Giles¹⁸, Hongfang Liu¹⁹, Komandur Elayavilli Ravikumar¹⁹, Andre Lamurias²⁰, Francisco M. Couto²⁰, Hong-Jie Dai²¹, Richard Tzong-Han Tsai²², Caglar Ata²³, Tolga Can²³, Anabel Usié, Rui Alves, Isabel Segura-Bedmar²⁴, Paloma Martínez²⁴, Julen Oyarzabal¹, Alfonso Valencia - Show less +49 more•Institutions (24)

University of Navarra¹, Technical University of Madrid², National Institutes of Health³, Wuhan University⁴, Humboldt University of Berlin⁵, University College London⁶, University of Aveiro⁷, Harbin Institute of Technology Shenzhen Graduate School⁸, University of Texas Health Science Center at Houston⁹, Chungbuk National University¹⁰, Indian Institute of Technology Madras¹¹, University of Ljubljana¹², Erasmus University Medical Center¹³, Beijing Forestry University¹⁴, Indian Institute of Technology Patna¹⁵, Hokkaido University¹⁶, University of Melbourne¹⁷, Pennsylvania State University¹⁸, University of Rochester¹⁹, University of Lisbon²⁰, Taipei Medical University²¹, National Central University²², Middle East Technical University²³, Charles III University of Madrid²⁴

19 Jan 2015-Journal of Cheminformatics

TL;DR: The CHEMDNER corpus is presented, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.

...read moreread less

Abstract: The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/

...read moreread less

368 citations

Journal Article•DOI•

A neural joint model for entity and relation extraction from biomedical text

[...]

Fei Li¹, Meishan Zhang², Guohong Fu², Donghong Ji¹•Institutions (2)

Wuhan University¹, Heilongjiang University²

31 Mar 2017-BMC Bioinformatics

TL;DR: It is demonstrated that the model based on neural networks is effective for biomedical entity and relation extraction and parameter sharing is an alternative method for neural models to jointly process this task.

...read moreread less

Abstract: Extracting biomedical entities and their relations from text has important applications on biomedical research. Previous work primarily utilized feature-based pipeline models to process this task. Many efforts need to be made on feature engineering when feature-based models are employed. Moreover, pipeline models may suffer error propagation and are not able to utilize the interactions between subtasks. Therefore, we propose a neural joint model to extract biomedical entities as well as their relations simultaneously, and it can alleviate the problems above. Our model was evaluated on two tasks, i.e., the task of extracting adverse drug events between drug and disease entities, and the task of extracting resident relations between bacteria and location entities. Compared with the state-of-the-art systems in these tasks, our model improved the F1 scores of the first task by 5.1% in entity recognition and 8.0% in relation extraction, and that of the second task by 9.2% in relation extraction. The proposed model achieves competitive performances with less work on feature engineering. We demonstrate that the model based on neural networks is effective for biomedical entity and relation extraction. In addition, parameter sharing is an alternative method for neural models to jointly process this task. Our work can facilitate the research on biomedical text mining.

...read moreread less

238 citations

Proceedings Article•

Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information

[...]

Guodong Zhou, Min Zhang, Donghong Ji, Qiaoming Zhu

01 Jun 2007

TL;DR: Evaluation on the ACE RDC corpora shows that the dynamic context-sensitive tree span is much more suitable for relation extraction than SPT and the tree kernel outperforms the state-of-the-art Collins and Duffy’s convolution tree kernel.

...read moreread less

Abstract: This paper proposes a tree kernel with contextsensitive structured parse tree information for relation extraction. It resolves two critical problems in previous tree kernels for relation extraction in two ways. First, it automatically determines a d ynamic context-sensitive tree span for relation extraction by extending the widely -used Shortest Path-enclosed Tree (SPT) to include necessary context information outside SPT. Second, it pr oposes a context -sensitive convolution tree kernel, which enumerates both context-free and contextsensitive sub-trees by consid ering their ancestor node paths as their contexts. Moreover, this paper evaluates the complementary nature between our tree kernel and a state -of-the-art linear kernel. Evaluation on the ACE RDC corpora shows that our dynamic context-sensitive tree span is much more suitable for relation extraction than SPT and our tree kernel outperforms the state-of-the-art Collins and Duffy’s convolution tree kernel. It also shows that our tree kernel achieves much better performance than the state-of-the-art linear kernels . Finally, it shows that feature-based and tree kernel-based methods much complement each other and the composite kernel can well integrate both flat and structured features.

...read moreread less

212 citations

Journal Article•DOI•

Neural networks for deceptive opinion spam detection

[...]

Yafeng Ren¹, Donghong Ji²•Institutions (2)

Guangdong University of Foreign Studies¹, Wuhan University²

01 Apr 2017-Information Sciences

TL;DR: This work empirically explore a neural network model to learn document-level representation for detecting deceptive opinion spam and shows that the proposed method outperforms state-of-the-art methods.

...read moreread less

181 citations

Proceedings Article•DOI•

Dependency Graph Enhanced Dual-transformer Structure for Aspect-based Sentiment Classification.

[...]

Hao Tang¹, Donghong Ji¹, Chenliang Li¹, Qiji Zhou¹•Institutions (1)

Wuhan University¹

01 Jul 2020

TL;DR: A dual-transformer structure is devised in DGEDT to support mutual reinforcement between the flat representation learning and graph-based representation learning, and to allow the dependency graph to guide the representation learning of the transformer encoder and vice versa.

...read moreread less

Abstract: Aspect-based sentiment classification is a popular task aimed at identifying the corresponding emotion of a specific aspect. One sentence may contain various sentiments for different aspects. Many sophisticated methods such as attention mechanism and Convolutional Neural Networks (CNN) have been widely employed for handling this challenge. Recently, semantic dependency tree implemented by Graph Convolutional Networks (GCN) is introduced to describe the inner connection between aspects and the associated emotion words. But the improvement is limited due to the noise and instability of dependency trees. To this end, we propose a dependency graph enhanced dual-transformer network (named DGEDT) by jointly considering the flat representations learnt from Transformer and graph-based representations learnt from the corresponding dependency graph in an iterative interaction manner. Specifically, a dual-transformer structure is devised in DGEDT to support mutual reinforcement between the flat representation learning and graph-based representation learning. The idea is to allow the dependency graph to guide the representation learning of the transformer encoder and vice versa. The results on five datasets demonstrate that the proposed DGEDT outperforms all state-of-the-art alternatives with a large margin.

...read moreread less

171 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

Collapse

Cited by

PDF

Open Access

More filters

Semi-Supervised Learning Literature Survey

[...]

Xiaojin Zhu

01 Jan 2005

4,189 citations

Journal Article•DOI•

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

[...]

Ranjay Krishna¹, Yuke Zhu¹, Oliver Groth², Justin Johnson¹, Kenji Hata¹, Joshua Kravitz¹, Stephanie Chen¹, Yannis Kalantidis³, Li-Jia Li, David A. Shamma⁴, Michael S. Bernstein¹, Li Fei-Fei¹ - Show less +8 more•Institutions (4)

Stanford University¹, Dresden University of Technology², Yahoo!³, Centrum Wiskunde & Informatica⁴

01 May 2017-International Journal of Computer Vision

TL;DR: The Visual Genome dataset as mentioned in this paper contains over 108k images where each image has an average of $35$35 objects, $26$26 attributes, and $21$21 pairwise relationships between objects.

...read moreread less

Abstract: Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked "What vehicle is the person riding?", computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) to answer correctly that "the person is riding a horse-drawn carriage." In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 108K images where each image has an average of $$35$$35 objects, $$26$$26 attributes, and $$21$$21 pairwise relationships between objects. We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and questions answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.

...read moreread less

3,842 citations

Proceedings Article•DOI•

Distant supervision for relation extraction without labeled data

[...]

Mike D. Mintz¹, Steven Bills¹, Rion Snow¹, Dan Jurafsky¹•Institutions (1)

Stanford University¹

02 Aug 2009

TL;DR: This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.

...read moreread less

Abstract: Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision. For each pair of entities that appears in some Freebase relation, we find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier. Our algorithm combines the advantages of supervised IE (combining 400,000 noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain). Our model is able to extract 10,000 instances of 102 relations at a precision of 67.6%. We also analyze feature performance, showing that syntactic parse features are particularly helpful for relations that are ambiguous or lexically distant in their expression.

...read moreread less

2,965 citations

Journal Article•DOI•

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

[...]

Jinhyuk Lee¹, Wonjin Yoon¹, Sungdong Kim², Donghyeon Kim¹, Sunkyu Kim¹, Chan Ho So¹, Jaewoo Kang¹ - Show less +3 more•Institutions (2)

Korea University¹, Naver Corporation²

25 Jan 2019-Bioinformatics

TL;DR: This article proposed BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora.

...read moreread less

Abstract: Motivation Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. Availability and implementation We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.

...read moreread less

2,680 citations

Journal Article•DOI•

Word sense disambiguation: A survey

[...]

Roberto Navigli¹•Institutions (1)

Sapienza University of Rome¹

23 Feb 2009-ACM Computing Surveys

TL;DR: This work introduces the reader to the motivations for solving the ambiguity of words and provides a description of the task, and overviews supervised, unsupervised, and knowledge-based approaches.

...read moreread less

Abstract: Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.

...read moreread less

2,178 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse