Author

Xiaofei Sun

Other affiliations: Harbin Institute of Technology
Bio: Xiaofei Sun is an academic researcher from Stony Brook University. The author has contributed to research in topics: Language model & Computer science. The author has an h-index of 10 and has co-authored 44 publications receiving 483 citations. Previous affiliations of Xiaofei Sun include Harbin Institute of Technology.

Papers
Proceedings ArticleDOI
01 Jul 2020
TL;DR: This paper proposes to use dice loss in place of the standard cross-entropy objective for data-imbalanced NLP tasks. Dice loss is based on the Sørensen–Dice coefficient or the Tversky index, attaches similar importance to false positives and false negatives, and is more immune to the data-imbalance issue.
Abstract: Many NLP tasks such as tagging and machine reading comprehension face a severe data-imbalance issue: negative examples significantly outnumber positive examples, and the huge number of easy-negative examples overwhelms training. The most commonly used cross-entropy (CE) criterion is actually an accuracy-oriented objective, and thus creates a discrepancy between training and test: at training time, each training instance contributes equally to the objective function, while at test time the F1 score is concerned more with positive examples. In this paper, we propose to use dice loss in place of the standard cross-entropy objective for data-imbalanced NLP tasks. Dice loss is based on the Sørensen–Dice coefficient or the Tversky index, which attaches similar importance to false positives and false negatives and is more immune to the data-imbalance issue. To further alleviate the dominating influence of easy-negative examples in training, we propose to associate training examples with dynamically adjusted weights that de-emphasize easy-negative examples. Theoretical analysis shows that this strategy narrows the gap between the F1 score in evaluation and the dice loss in training. With the proposed training objective, we observe significant performance boosts on a wide range of data-imbalanced NLP tasks. Notably, we achieve SOTA results on CTB5, CTB6 and UD1.4 for part-of-speech tagging; SOTA results on CoNLL03, OntoNotes5.0, MSRA and OntoNotes4.0 for named entity recognition; along with competitive results on machine reading comprehension and paraphrase identification.
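As a rough illustration of the idea, the sketch below implements a dice-style loss with an example-level down-weighting factor in PyTorch. The exact weighting factor and smoothing constant are illustrative assumptions, not the paper's precise formulation.

```python
# A minimal sketch of a dice-style loss for imbalanced binary token
# classification. The (1 - p)**alpha factor down-weighting easy examples
# and the smoothing constant are assumptions for illustration.
import torch


def self_adjusting_dice_loss(logits, targets, alpha=1.0, smooth=1.0):
    """logits: (N,) raw scores; targets: (N,) 0/1 labels."""
    probs = torch.sigmoid(logits)
    # Confident (easy) predictions contribute less to the numerator/denominator.
    weighted = ((1.0 - probs) ** alpha) * probs
    numerator = 2.0 * weighted * targets + smooth
    denominator = weighted + targets + smooth
    return 1.0 - (numerator / denominator).mean()


if __name__ == "__main__":
    logits = torch.randn(8, requires_grad=True)
    targets = torch.tensor([0., 0., 0., 0., 0., 0., 1., 1.])  # imbalanced labels
    loss = self_adjusting_dice_loss(logits, targets)
    loss.backward()
    print(float(loss))
```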

195 citations

Proceedings Article
29 Jan 2019
TL;DR: Glyce, glyph vectors for Chinese character representations, is presented, and it is shown that glyph-based models consistently outperform word/char ID-based models across a wide range of Chinese NLP tasks.
Abstract: It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this gap by presenting Glyce, glyph vectors for Chinese character representations. We make three major innovations: (1) we use historical Chinese scripts (e.g., bronzeware script, seal script, traditional Chinese, etc.) to enrich the pictographic evidence in characters; (2) we design CNN structures (called tianzege-CNN) tailored to Chinese character image processing; and (3) we use image classification as an auxiliary task in a multi-task learning setup to increase the model's ability to generalize. We show that glyph-based models are able to consistently outperform word/char ID-based models in a wide range of Chinese NLP tasks. When combined with BERT, we are able to set new state-of-the-art results for a variety of Chinese NLP tasks, including language modeling, tagging (NER, CWS, POS), sentence pair classification (BQ, LCQMC, XNLI, NLPCC-DBQA), single sentence classification (ChnSentiCorp, the Fudan corpus, iFeng), dependency parsing, and semantic role labeling. For example, the proposed model achieves an F1 score of 81.6 on the OntoNotes NER dataset, +1.5 over BERT; it achieves an almost perfect accuracy of 99.8% on the Fudan corpus for text classification.
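The sketch below gives a rough sense of what a glyph encoder in this spirit might look like: a small CNN over character bitmaps whose feature map is pooled to a 2x2 grid echoing the tianzege layout of a Chinese character. Layer sizes and the pooling choice are illustrative assumptions, not the published architecture.

```python
# A rough sketch of a glyph encoder: CNN over character bitmaps with a
# 2x2 "tianzege"-style pooling. All hyperparameters are assumptions.
import torch
import torch.nn as nn


class GlyphEncoder(nn.Module):
    def __init__(self, out_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=5, padding=2),   # grayscale glyph -> feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.tianzege_pool = nn.AdaptiveMaxPool2d(2)       # pool to a 2x2 grid
        self.proj = nn.Linear(128 * 2 * 2, out_dim)

    def forward(self, glyphs):                              # (B, 1, H, W) character bitmaps
        feats = self.tianzege_pool(self.conv(glyphs))
        return self.proj(feats.flatten(1))                  # (B, out_dim) glyph vectors


if __name__ == "__main__":
    imgs = torch.rand(4, 1, 24, 24)      # four fake 24x24 character images
    print(GlyphEncoder()(imgs).shape)    # torch.Size([4, 256])
```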

123 citations

Proceedings ArticleDOI
01 Aug 2021
TL;DR: By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which exploits the massive amount of raw data, and transductive learning.
Abstract: In this work, we propose BertGCN, a model that combines large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph over the dataset and represents documents as nodes using BERT representations. By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of the massive amount of raw data, and transductive learning, which jointly learns representations for both training data and unlabeled test data by propagating label influence through graph convolution. Experiments show that BertGCN achieves SOTA performance on a wide range of text classification datasets.
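A minimal sketch of the joint setup, assuming PyTorch: BERT features for document nodes feed both a one-layer graph convolution over the document graph and a plain linear classifier, and the two predictions are interpolated. The dense adjacency, single GCN layer, and the interpolation weight `lam` are simplifications, not the paper's exact training procedure.

```python
# A simplified BertGCN-style model: interpolate a GCN classifier over
# the document graph with a plain classifier on BERT features.
import torch
import torch.nn as nn


class BertGCNSketch(nn.Module):
    def __init__(self, hidden=768, num_classes=5, lam=0.7):
        super().__init__()
        self.gcn = nn.Linear(hidden, num_classes)   # one-layer GCN: A_hat @ X @ W
        self.cls = nn.Linear(hidden, num_classes)   # plain classifier on BERT features
        self.lam = lam

    def forward(self, x, a_hat):
        # x: (N, hidden) BERT features for N document nodes
        # a_hat: (N, N) normalized adjacency of the document graph
        gcn_logits = a_hat @ self.gcn(x)
        bert_logits = self.cls(x)
        probs = self.lam * gcn_logits.softmax(-1) + (1 - self.lam) * bert_logits.softmax(-1)
        return probs.log()                           # interpolated prediction


if __name__ == "__main__":
    x = torch.randn(10, 768)               # stand-in for BERT [CLS] vectors
    a = torch.eye(10)                       # trivial graph for the demo
    print(BertGCNSketch()(x, a).shape)      # torch.Size([10, 5])
```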

102 citations

Proceedings ArticleDOI
01 Jul 2019
TL;DR: The authors show that word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting.
Abstract: Segmenting a chunk of text into words is usually the first step of processing Chinese text, but its necessity has rarely been explored. In this paper, we ask the fundamental question of whether Chinese word segmentation (CWS) is necessary for deep learning-based Chinese Natural Language Processing. We benchmark neural word-based models which rely on word segmentation against neural char-based models which do not involve word segmentation in four end-to-end NLP benchmark tasks: language modeling, machine translation, sentence matching/paraphrase and text classification. Through direct comparisons between these two types of models, we find that char-based models consistently outperform word-based models. Based on these observations, we conduct comprehensive experiments to study why word-based models underperform char-based models in these deep learning-based NLP tasks. We show that it is because word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting. We hope this paper could encourage researchers in the community to rethink the necessity of word segmentation in deep learning-based Chinese Natural Language Processing.
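A toy illustration of the OOV argument above: a compound never seen as a single word during training is out-of-vocabulary for a word-based model, while every character it contains may already be known to a char-based model. The two-sentence "corpus" below is fabricated for the example; real vocabulary-size gaps are far larger.

```python
# Toy comparison of word-level vs character-level OOV behavior.
train_words = [["我们", "喜欢", "自然", "语言", "处理"],
               ["他们", "研究", "机器", "翻译"]]
test_words = [["我们", "研究", "语言处理"]]       # "语言处理" never seen as one word

word_vocab = {w for sent in train_words for w in sent}
char_vocab = {c for sent in train_words for w in sent for c in w}

oov_words = [w for s in test_words for w in s if w not in word_vocab]
oov_chars = [c for s in test_words for w in s for c in w if c not in char_vocab]

print(oov_words)   # ['语言处理'] -- the word-based model must back off to <UNK>
print(oov_chars)   # []           -- the char-based model has seen every character
```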

91 citations

Posted Content
TL;DR: This paper proposes content-enhanced network embedding (CENE), which is capable of jointly leveraging the network structure and the content information, and shows that its models outperform all existing network embedding methods, demonstrating the merits of content information and joint learning.
Abstract: This paper investigates the problem of network embedding, which aims at learning low-dimensional vector representations of nodes in networks. Most existing network embedding methods rely solely on the network structure, i.e., the linkage relationships between nodes, but ignore the rich content information associated with nodes, which is common in real-world networks and beneficial to describing the characteristics of a node. In this paper, we propose content-enhanced network embedding (CENE), which is capable of jointly leveraging the network structure and the content information. Our approach integrates text modeling and structure modeling in a general framework by treating the content information as a special kind of node. Experiments on several real-world networks with application to node classification show that our models outperform all existing network embedding methods, demonstrating the merits of content information and joint learning.
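A minimal sketch of the joint objective, assuming PyTorch: structural (node-node) edges and content (node-text) edges are scored in one shared embedding space. The bag-of-words text encoder and the logistic edge loss are simplifying assumptions rather than CENE's exact model.

```python
# Sketch: content treated as a special kind of node, with node-node and
# node-text edges sharing one embedding space and one loss form.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CENESketch(nn.Module):
    def __init__(self, num_nodes, vocab_size, dim=128):
        super().__init__()
        self.node_emb = nn.Embedding(num_nodes, dim)       # structural node vectors
        self.word_emb = nn.EmbeddingBag(vocab_size, dim)   # averaged words = content node

    def edge_loss(self, u, v, label):
        score = (self.node_emb(u) * self.node_emb(v)).sum(-1)
        return F.binary_cross_entropy_with_logits(score, label)

    def content_loss(self, u, text, label):
        score = (self.node_emb(u) * self.word_emb(text)).sum(-1)
        return F.binary_cross_entropy_with_logits(score, label)


if __name__ == "__main__":
    model = CENESketch(num_nodes=100, vocab_size=5000)
    u, v = torch.tensor([3]), torch.tensor([7])
    text = torch.tensor([[11, 42, 7, 99]])                 # word ids of node 3's text
    pos = torch.ones(1)
    loss = model.edge_loss(u, v, pos) + model.content_loss(u, text, pos)
    loss.backward()
    print(float(loss))
```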

83 citations


Cited by
Journal ArticleDOI
TL;DR: Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure as discussed by the authors, and a significant amount of progress has been made toward this emerging network analysis paradigm.
Abstract: Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, significant progress has been made toward this emerging network analysis paradigm. In this survey, we focus on categorizing and then reviewing the current development of network embedding methods, and point out future research directions. We first summarize the motivation of network embedding. We discuss the classical graph embedding algorithms and their relationship with network embedding. Afterwards, and primarily, we provide a comprehensive overview of a large number of network embedding methods in a systematic manner, covering structure- and property-preserving network embedding methods, network embedding methods with side information, and advanced information-preserving network embedding methods. Moreover, several evaluation approaches for network embedding and some useful online resources, including network data sets and software, are also reviewed. Finally, we discuss the framework of exploiting these network embedding methods to build an effective system and point out some potential future directions.

929 citations

Journal ArticleDOI
TL;DR: A comprehensive review on existing deep learning techniques for NER is provided in this paper, where the authors systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder.
Abstract: Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, organization, etc. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems were successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.

474 citations

Posted Content
TL;DR: This paper provides a comprehensive review of existing deep learning techniques for NER, introduces NER resources including tagged NER corpora and off-the-shelf NER tools, and systematically categorizes existing works based on a taxonomy along three axes.
Abstract: Named entity recognition (NER) is the task of identifying mentions of rigid designators in text belonging to predefined semantic types such as person, location, organization, etc. NER serves as the foundation for many natural language applications such as question answering, text summarization, and machine translation. Early NER systems achieved considerable success at the cost of human engineering in designing domain-specific features and rules. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.

381 citations

Posted Content
Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu, Jiwei Li
TL;DR: This paper proposes a unified framework capable of handling both flat and nested NER tasks: instead of treating NER as a sequence labeling problem, the authors formulate it as a machine reading comprehension (MRC) task.
Abstract: The task of named entity recognition (NER) is normally divided into nested NER and flat NER depending on whether named entities are nested or not. Models are usually developed separately for the two tasks, since sequence labeling models, the most widely used backbone for flat NER, are only able to assign a single label to a particular token, which is unsuitable for nested NER where a token may be assigned several labels. In this paper, we propose a unified framework that is capable of handling both flat and nested NER tasks. Instead of treating the task of NER as a sequence labeling problem, we propose to formulate it as a machine reading comprehension (MRC) task. For example, extracting entities with the PER label is formalized as extracting answer spans to the question "which person is mentioned in the text?". This formulation naturally tackles the entity overlapping issue in nested NER: extracting two overlapping entities of different categories requires answering two independent questions. Additionally, since the query encodes informative prior knowledge, this strategy facilitates the process of entity extraction, leading to better performance not only for nested NER but also for flat NER. We conduct experiments on both nested and flat NER datasets. Experimental results demonstrate the effectiveness of the proposed formulation. We achieve large performance gains over current SOTA models on nested NER datasets, i.e., +1.28, +2.55, +5.44, +6.37 respectively on ACE04, ACE05, GENIA and KBP17, along with SOTA results on flat NER datasets, i.e., +0.24, +1.95, +0.21, +1.49 respectively on English CoNLL 2003, English OntoNotes 5.0, Chinese MSRA, Chinese OntoNotes 4.0.
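A minimal sketch of the MRC formulation: a natural-language query per entity type is paired with the sentence, and start/end classifiers over the encoded tokens extract answer spans. The random tensor below stands in for BERT encodings of "[CLS] query [SEP] text"; the actual model's span-matching component is omitted, and the query wordings beyond PER are illustrative.

```python
# Sketch: NER as machine reading comprehension with one query per
# entity type and start/end span classifiers over token encodings.
import torch
import torch.nn as nn

QUERIES = {"PER": "which person is mentioned in the text?",
           "LOC": "which location is mentioned in the text?"}   # LOC wording assumed


class MRCSpanHead(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.start = nn.Linear(hidden, 1)   # logit that a token starts a span
        self.end = nn.Linear(hidden, 1)     # logit that a token ends a span

    def forward(self, token_reprs):         # (B, T, hidden), e.g. BERT([query; text])
        return self.start(token_reprs).squeeze(-1), self.end(token_reprs).squeeze(-1)


if __name__ == "__main__":
    reprs = torch.randn(1, 32, 768)         # stand-in for encoded "[CLS] query [SEP] text"
    start_logits, end_logits = MRCSpanHead()(reprs)
    # One query per entity type: overlapping entities of different types
    # are simply answers to different, independent questions.
    print(QUERIES["PER"], start_logits.shape, end_logits.shape)
```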

282 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: Context-Aware Network Embedding (CANE), a novel NE model that learns context-aware embeddings for vertices with a mutual attention mechanism and is expected to model the semantic relationships between vertices more precisely, is presented.
Abstract: Network embedding (NE) is playing a critical role in network analysis, due to its ability to represent vertices with efficient low-dimensional embedding vectors. However, existing NE models aim to learn a fixed, context-free embedding for each vertex and neglect the diverse roles a vertex plays when interacting with other vertices. In this paper, we assume that one vertex usually shows different aspects when interacting with different neighbor vertices, and should own different embeddings respectively. Therefore, we present Context-Aware Network Embedding (CANE), a novel NE model to address this issue. CANE learns context-aware embeddings for vertices with a mutual attention mechanism and is expected to model the semantic relationships between vertices more precisely. In experiments, we compare our model with existing NE models on three real-world datasets. Experimental results show that CANE achieves significant improvement over state-of-the-art methods on link prediction and comparable performance on vertex classification. The source code and datasets can be obtained from https://github.com/thunlp/CANE.
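A rough sketch of the mutual attention step, assuming PyTorch: a correlation matrix between the word vectors of two linked vertices' texts decides which words to emphasize on each side, producing edge-specific (context-aware) text embeddings. The matrix name and pooling choices follow a generic attentive-pooling pattern and are assumptions, not CANE verbatim.

```python
# Sketch of mutual attention between the texts of two linked vertices.
import torch
import torch.nn as nn


class MutualAttention(nn.Module):
    def __init__(self, dim=100):
        super().__init__()
        self.A = nn.Parameter(torch.randn(dim, dim) * 0.01)  # attentive matrix (assumed name)

    def forward(self, p, q):
        # p: (m, dim) word vectors of vertex u's text; q: (n, dim) of vertex v's text
        f = torch.tanh(p @ self.A @ q.t())           # (m, n) word-word correlation scores
        alpha = f.mean(dim=1).softmax(dim=0)         # importance of u's words given v
        beta = f.mean(dim=0).softmax(dim=0)          # importance of v's words given u
        return alpha @ p, beta @ q                   # context-aware text embeddings


if __name__ == "__main__":
    att = MutualAttention()
    u_text, v_text = torch.randn(12, 100), torch.randn(9, 100)
    u_ctx, v_ctx = att(u_text, v_text)
    print(u_ctx.shape, v_ctx.shape)                  # torch.Size([100]) torch.Size([100])
```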

257 citations