Andrew MacKinlay

Other affiliations: NICTA, University of Melbourne

Bio: Andrew MacKinlay is an academic researcher from IBM. The author has contributed to research in topics: Parsing & Biomedical text mining. The author has an h-index of 13, co-authored 36 publications receiving 679 citations. Previous affiliations of Andrew MacKinlay include NICTA & University of Melbourne.

Papers

Proceedings Article

How Noisy Social Media Text, How Diffrnt Social Media Sources?

Timothy Baldwin¹, Paul Cook¹, Marco Lui¹, Andrew MacKinlay², Li Wang²

University of Melbourne¹, NICTA²

01 Oct 2013

TL;DR: This work investigates how linguistically noisy text is across a range of social media sources (YouTube comments, Twitter posts, web user forum posts, blog posts and Wikipedia), compared to a reference corpus of edited English text.

Abstract: While various claims have been made about text in social media text being noisy, there has never been a systematic study to investigate just how linguistically noisy or otherwise it is over a range of social media sources. We explore this question empirically over popular social media text types, in the form of YouTube comments, Twitter posts, web user forum posts, blog posts and Wikipedia, which we compare to a reference corpus of edited English text. We first extract out various descriptive statistics from each data type (including the distribution of languages, average sentence length and proportion of out-of-vocabulary words), and then investigate the proportion of grammatical sentences in each, based on a linguistically-motivated parser. We also investigate the relative similarity between different data types.

234 citations

Proceedings Article

Reconsidering language identification for written language resources

Baden Hughes¹, Timothy Baldwin¹, Steven Bird¹, Jeremy Nicholson¹, Andrew MacKinlay¹

University of Melbourne¹

01 May 2006

TL;DR: A review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.

Abstract: The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.

108 citations

Proceedings Article

Investigating Public Health Surveillance using Twitter

Antonio Jimeno Yepes¹, Andrew MacKinlay¹, Bo Han¹

IBM¹

01 Jul 2015

TL;DR: A system which can identify medical named entities in a real-time stream of Twitter posts and determine their geographic locations is presented, as well as preliminary experiments in using this information for health surveillance purposes.

Abstract: Microblog services such as Twitter are an attractive source of data for public health surveillance, as they avoid the legal and technical obstacles to accessing the more obvious and targeted sources of health information. Only a tiny fraction of tweets may contain useful public health information but in Twitter this is offset by the sheer volume of tweets posted. We present a system which can identify medical named entities in a real-time stream of Twitter posts and determine their geographic locations, as well as preliminary experiments in using this information for health surveillance purposes.

41 citations

Proceedings Article

Named Entity Recognition with Stack Residual LSTM and Trainable Bias Decoding.

Quan Tran, Andrew MacKinlay¹, Antonio Jimeno Yepes¹

IBM¹

23 Jun 2017

TL;DR: This paper introduced residual connections between the Stacked Recurrent Neural Network model to address the degradation problem of deep neural networks and a bias decoding mechanism to adapt to non-differentiable and externally computed objectives, such as the entity-based F-measure.

Abstract: Recurrent Neural Network models are the state-of-the-art for Named Entity Recognition (NER). We present two innovations to improve the performance of these models. The first innovation is the introduction of residual connections between the Stacked Recurrent Neural Network model to address the degradation problem of deep neural networks. The second innovation is a bias decoding mechanism that allows the trained system to adapt to non-differentiable and externally computed objectives, such as the entity-based F-measure. Our work improves the state-of-the-art results for both Spanish and English languages on the standard train/development/test split of the CoNLL 2003 Shared Task NER dataset.

37 citations

Journal Article

Identifying Diseases, Drugs, and Symptoms in Twitter.

Antonio Jimeno-Yepes¹, Andrew MacKinlay¹, Bo Han¹, Qiang Chen¹

IBM¹

01 Jan 2015-Studies in health technology and informatics

TL;DR: The manual annotation results show that it is possible to perform high-quality annotation despite the complexity of medical terminology and the lack of context in a tweet, and the capability of state-of-the-art approaches to reproduce the annotations in the data set is evaluated.

Abstract: Social media sites, such as Twitter, are a rich source of many kinds of information, including health-related information. Accurate detection of entities such as diseases, drugs, and symptoms could be used for biosurveillance (e.g. monitoring of flu) and identification of adverse drug events. However, a critical assessment of performance of current text mining technology on Twitter has not been done yet in the medical domain. Here, we study the development of a Twitter data set annotated with relevant medical entities which we have publicly released. The manual annotation results show that it is possible to perform high-quality annotation despite the complexity of medical terminology and the lack of context in a tweet. Furthermore, we have evaluated the capability of state-of-the-art approaches to reproduce the annotations in the data set. The best methods achieve F-scores of 55-66%. The data analysis and the preliminary results provide valuable insights on identifying medical entities in Twitter for various applications.

35 citations

Cited by

Journal Article

The Cambridge Grammar of the English Language

H.G.A. Hughes

01 Jan 2003

1,739 citations

Journal Article

A Survey on Deep Learning for Named Entity Recognition

Jing Li¹, Aixin Sun¹, Jianglei Han, Chenliang Li²

Nanyang Technological University¹, Wuhan University²

17 Mar 2020-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A comprehensive review on existing deep learning techniques for NER is provided in this paper, where the authors systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder.

Abstract: Named entity recognition (NER) is the task to identify text spans that mention named entities, and to classify them into predefined categories such as person, location, organization etc. NER serves as the basis for a variety of natural language applications such as question answering, text summarization, and machine translation. Although early NER systems are successful in producing decent recognition accuracy, they often require much human effort in carefully designing rules or features. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review on existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.

474 citations

Book Chapter

Part-of-speech tagging from 97% to 100%: is it time for some linguistics?

Christopher D. Manning¹

Stanford University¹

20 Feb 2011

TL;DR: It is suggested and demonstrated that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained, that is, from improved descriptive linguistics.

Abstract: I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance and examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or better features in a discriminative sequence classifier. The prospects for further gains from semisupervised learning also seem quite limited. Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained. That is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process. The status of some words may not be able to be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis.

398 citations

Posted Content

A Survey on Deep Learning for Named Entity Recognition

Jing Li¹, Aixin Sun¹, Jianglei Han¹, Chenliang Li²

Nanyang Technological University¹, Wuhan University²

22 Dec 2018-arXiv: Computation and Language

TL;DR: A comprehensive review of existing deep learning techniques for NER, which introduces NER resources (tagged NER corpora and off-the-shelf NER tools) and systematically categorizes existing works based on a taxonomy along three axes.

Abstract: Named entity recognition (NER) is the task to identify mentions of rigid designators from text belonging to predefined semantic types such as person, location, organization etc. NER always serves as the foundation for many natural language applications such as question answering, text summarization, and machine translation. Early NER systems got a huge success in achieving good performance with the cost of human engineering in designing domain-specific features and rules. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review on existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.

381 citations

Journal Article

Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review

Kory Kreimeyer¹, Matthew Foster¹, Abhishek Pandey¹, Nina Arya¹, Gwendolyn Halford², Sandra F Jones³, Richard A. Forshee¹, Mark Walderhaug¹, Taxiarchis Botsis¹

Center for Biologics Evaluation and Research¹, Food and Drug Administration², Centers for Disease Control and Prevention³

01 Sep 2017-Journal of Biomedical Informatics

TL;DR: This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.

342 citations
