Home
/
Authors
/
Catherine Blake

Author

Catherine Blake

University of Illinois at Urbana–Champaign

Other affiliations: National Institutes of Health, University of North Carolina at Chapel Hill, National Center for Supercomputing Applications ...read more

Bio: Catherine Blake is an academic researcher from University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Sentence & Information system. The author has an hindex of 15, co-authored 63 publications receiving 13446 citations. Previous affiliations of Catherine Blake include National Institutes of Health & University of North Carolina at Chapel Hill.

Papers published on a yearly basis

2022
2021
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2007
2006
2005
2003
2002
2001
2000
1998

Papers

PDF

Open Access

More filters

UCI Repository of machine learning databases

[...]

Catherine Blake

01 Jan 1998

12,940 citations

Journal Article•DOI•

Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles

[...]

Catherine Blake¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Apr 2010-Journal of Biomedical Informatics

TL;DR: An automated approach that uses syntax and semantics to identify explicit claims automatically and measure the degree to which each feature contributes to the overall precision and recall is introduced, showing that a combination of semantics and syntax is required to achieve the best system performance.

...read moreread less

80 citations

Journal Issue•DOI•

Collaborative information synthesis I: A model of information behaviors of scientists in medicine and public health

[...]

Catherine Blake¹, Wanda Pratt²•Institutions (2)

University of North Carolina at Chapel Hill¹, University of Washington²

01 Nov 2006-Journal of the Association for Information Science and Technology

TL;DR: The CIS model emerges from a rich collection of qualitative data including interviews, electronic recordings of meetings, meeting minutes, e-mail communications, and extraction worksheets and suggests that scientists provide two information constructs: a hypothesis projection and context information.

...read moreread less

Abstract: Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. One activity that consumes much of a scientist's time is developing models that balance contradictory and redundant evidence. Driven by our desire to understand the information behaviors of this important user group, and the behaviors of scientific discovery in general, we conducted an observational study of academic research scientists as they resolved different experimental results reported in the biomedical literature. This article is the first of two that reports our findings. In this article, we introduce the Collaborative Information Synthesis (CIS) model that reflects the salient information behaviors that we observed. The CIS model emerges from a rich collection of qualitative data including interviews, electronic recordings of meetings, meeting minutes, e-mail communications, and extraction worksheets. Our findings suggest that scientists provide two information constructs: a hypothesis projection and context information. They also engage in four critical tasks: retrieval, extraction, verification, and analysis. The findings also suggest that science is not an individual but rather a collaborative activity and that scientists use the results of one analysis to inform new analyses. In Part 2, we compare and contrast existing information and cognitive models that have inadvertently reported synthesis, and then provide five recommendations that will enable designers to build information systems that support the important synthesis activity. © 2006 Wiley Periodicals, Inc.

...read moreread less

67 citations

Proceedings Article•DOI•

Better rules, fewer features: a semantic approach to selecting features from text

[...]

Catherine Blake¹, Wanda Pratt¹•Institutions (1)

University of California, Irvine¹

29 Nov 2001

TL;DR: The preliminary findings indicate that bi-directional association rules based on concepts or keywords are more plausible and more useful than those based on word features.

...read moreread less

Abstract: The choice of features used to represent a domain has a profound effect on the quality of the model produced; yet, few researchers have investigated the relationship between the features used to represent text and the quality of the final model We explored this relationship for medical texts by comparing association rules based on features with three different semantic levels: (1) words (2) manually assigned keywords and (3) automatically selected medical concepts Our preliminary findings indicate that bi-directional association rules based on concepts or keywords are more plausible and more useful than those based on word features The concept and keyword representations also required 90% fewer features than the word representation This drastic dimensionality reduction suggests that this approach is well suited to large textual corpora of medical text, such as parts of the Web

...read moreread less

60 citations

Journal Article•DOI•

Text mining

[...]

Catherine Blake¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Jan 2011

59 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

I and i

[...]

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.

...read moreread less

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

...read moreread less

33,785 citations

Journal Article•DOI•

SMOTE: synthetic minority over-sampling technique

[...]

Nitesh V. Chawla¹, Kevin W. Bowyer², Lawrence O. Hall¹, W. Philip Kegelmeyer³•Institutions (3)

University of South Florida¹, University of Notre Dame², Sandia National Laboratories³

01 Jan 2002-Journal of Artificial Intelligence Research

TL;DR: In this article, a method of over-sampling the minority class involves creating synthetic minority class examples, which is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.

...read moreread less

Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of oversampling the minority (abnormal)cla ss and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space)tha n only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space)t han varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC)and the ROC convex hull strategy.

...read moreread less

17,313 citations

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Journal Article•DOI•

SMOTE: Synthetic Minority Over-sampling Technique

[...]

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, W.P. Kegelmeyer

09 Jun 2011-arXiv: Artificial Intelligence

...read moreread less

Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.

...read moreread less

11,512 citations

Journal Article•

Statistical Comparisons of Classifiers over Multiple Data Sets

[...]

Janez Demšar

01 Dec 2006-Journal of Machine Learning Research

TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.

...read moreread less

Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.

...read moreread less

10,306 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse