Home
/
Authors
/
William R. Hersh

Author

William R. Hersh

Other affiliations: Brigham and Women's Hospital, Stanford University, University of Portland ...read more

Bio: William R. Hersh is an academic researcher from Oregon Health & Science University. The author has contributed to research in topics: Health informatics & Health care. The author has an hindex of 66, co-authored 343 publications receiving 15514 citations. Previous affiliations of William R. Hersh include Brigham and Women's Hospital & Stanford University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

OHSUMED: an interactive retrieval evaluation and new large test collection for research

[...]

William R. Hersh¹, Chris Buckley², T. J. Leone¹, David H. Hickam¹•Institutions (2)

Oregon Health & Science University¹, Cornell University²

01 Aug 1994

TL;DR: A series of information retrieval experiments was carried out with a computer installed in a medical practice setting for relatively inexperienced physician end-users using a commercial MEDLINE product based on the vector space model, finding that these physicians searched just as effectively as more experienced searchers using Boolean searching.

...read moreread less

Abstract: A series of information retrieval experiments was carried out with a computer installed in a medical practice setting for relatively inexperienced physician end-users. Using a commercial MEDLINE product based on the vector space model, these physicians searched just as effectively as more experienced searchers using Boolean searching. The results of this experiment were subsequently used to create a new large medical test collection, which was used in experiments with the SMART retrieval system to obtain baseline performance data as well as compare SMART with the other searchers.

...read moreread less

894 citations

Journal Article•DOI•

A survey of current work in biomedical text mining

[...]

Aaron Cohen¹, William R. Hersh¹•Institutions (1)

Oregon Health & Science University¹

01 Mar 2005-Briefings in Bioinformatics

TL;DR: The major challenge of biomedical text mining over the next 5-10 years will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

...read moreread less

Abstract: The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5–10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

...read moreread less

782 citations

Journal Article•DOI•

Caveats for the use of operational electronic health record data in comparative effectiveness research.

[...]

William R. Hersh¹, Mark G. Weiner², Peter J. Embi³, Judith R. Logan¹, Philip R. O. Payne³, Elmer V. Bernstam⁴, Harold P Lehmann⁵, George Hripcsak⁶, Timothy H. Hartzog⁷, James J. Cimino⁸, Joel H. Saltz⁹ - Show less +7 more•Institutions (9)

Oregon Health & Science University¹, AstraZeneca², Ohio State University³, University of Texas Health Science Center at Houston⁴, Johns Hopkins University⁵, Columbia University⁶, Medical University of South Carolina⁷, National Institutes of Health⁸, Emory University⁹

01 Aug 2013-Medical Care

TL;DR: A list of caveats is developed to inform would-be users of such data as well as provide an informatics roadmap that aims to insure this opportunity to augment comparative effectiveness research can be best leveraged.

...read moreread less

Abstract: The growing amount of data in operational electronic health record systems provides unprecedented opportunity for its reuse for many tasks, including comparative effectiveness research. However, there are many caveats to the use of such data. Electronic health record data from clinical settings may be inaccurate, incomplete, transformed in ways that undermine their meaning, unrecoverable for research, of unknown provenance, of insufficient granularity, and incompatible with research protocols. However, the quantity and real-world nature of these data provide impetus for their use, and we develop a list of caveats to inform would-be users of such data as well as provide an informatics roadmap that aims to insure this opportunity to augment comparative effectiveness research can be best leveraged.

...read moreread less

437 citations

Proceedings Article•

TREC 2005 Genomics Track Overview.

[...]

William R. Hersh, Aaron Cohen, Jianji Yang¹, Ravi Teja Bhupatiraju¹, Phoebe M. Roberts², Marti A. Hearst³ - Show less +2 more•Institutions (3)

Oregon Health & Science University¹, Biogen Idec², University of California, Berkeley³

01 Dec 2005

TL;DR: The TREC 2005 Genomics Track featured two tasks, an ad hoc retrieval task and four subtasks in text categorization as mentioned in this paper, which used a full-text document collection with training and test sets consisting of about 6,000 biomedical journal articles each.

...read moreread less

Abstract: The TREC 2005 Genomics Track featured two tasks, an ad hoc retrieval task and four subtasks in text categorization. The ad hoc retrieval task utilized a 10-year, 4.5-million document subset of the MEDLINE bibliographic database, with 50 topics conforming to five generic topic types. The categorization task used a full-text document collection with training and test sets consisting of about 6,000 biomedical journal articles each. Participants aimed to triage the documents into categories representing data resources in the Mouse Genome Informatics database, with performance assessed via a utility measure.

...read moreread less

395 citations

Journal Article•DOI•

Computerized Physician Order Entry in U.S. Hospitals: Results of a 2002 Survey

[...]

Joan S. Ash¹, Paul Gorman¹, Veena Seshadri¹, William R. Hersh¹•Institutions (1)

Oregon Health & Science University¹

21 Nov 2003-Journal of the American Medical Informatics Association

TL;DR: Despite increasing consensus about the desirability of computerized physician order entry (CPOE) use, these data indicate that only 9.6% of U.S. hospitals presently have CPOE completely available.

...read moreread less

301 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Book•

Foundations of Statistical Natural Language Processing

[...]

Christopher D. Manning¹, Hinrich Schütze²•Institutions (2)

Stanford University¹, PARC²

28 May 1999

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.

...read moreread less

Abstract: Statistical approaches to processing natural language text have become dominant in recent years This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear The book contains all the theory and algorithms needed for building NLP tools It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications

...read moreread less

9,295 citations

Journal Article•DOI•

Machine learning in automated text categorization

[...]

Fabrizio Sebastiani

01 Mar 2002-ACM Computing Surveys

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

...read moreread less

Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

...read moreread less

7,539 citations

Journal Article•DOI•

Rayyan-a web and mobile app for systematic reviews.

[...]

Mourad Ouzzani¹, Hossam M. Hammady¹, Zbys Fedorowicz², Ahmed K. Elmagarmid¹•Institutions (2)

Qatar Computing Research Institute¹, Cochrane Collaboration²

05 Dec 2016-Systematic Reviews

TL;DR: The strongest features of the app, identified and reported in user feedback, were its ability to help in screening and collaboration as well as the time savings it affords to users.

...read moreread less

Abstract: Synthesis of multiple randomized controlled trials (RCTs) in a systematic review can summarize the effects of individual outcomes and provide numerical answers about the effectiveness of interventions. Filtering of searches is time consuming, and no single method fulfills the principal requirements of speed with accuracy. Automation of systematic reviews is driven by a necessity to expedite the availability of current best evidence for policy and clinical decision-making. We developed Rayyan ( http://rayyan.qcri.org ), a free web and mobile app, that helps expedite the initial screening of abstracts and titles using a process of semi-automation while incorporating a high level of usability. For the beta testing phase, we used two published Cochrane reviews in which included studies had been selected manually. Their searches, with 1030 records and 273 records, were uploaded to Rayyan. Different features of Rayyan were tested using these two reviews. We also conducted a survey of Rayyan’s users and collected feedback through a built-in feature. Pilot testing of Rayyan focused on usability, accuracy against manual methods, and the added value of the prediction feature. The “taster” review (273 records) allowed a quick overview of Rayyan for early comments on usability. The second review (1030 records) required several iterations to identify the previously identified 11 trials. The “suggestions” and “hints,” based on the “prediction model,” appeared as testing progressed beyond five included studies. Post rollout user experiences and a reflexive response by the developers enabled real-time modifications and improvements. The survey respondents reported 40% average time savings when using Rayyan compared to others tools, with 34% of the respondents reporting more than 50% time savings. In addition, around 75% of the respondents mentioned that screening and labeling studies as well as collaborating on reviews to be the two most important features of Rayyan. As of November 2016, Rayyan users exceed 2000 from over 60 countries conducting hundreds of reviews totaling more than 1.6M citations. Feedback from users, obtained mostly through the app web site and a recent survey, has highlighted the ease in exploration of searches, the time saved, and simplicity in sharing and comparing include-exclude decisions. The strongest features of the app, identified and reported in user feedback, were its ability to help in screening and collaboration as well as the time savings it affords to users. Rayyan is responsive and intuitive in use with significant potential to lighten the load of reviewers.

...read moreread less

7,527 citations

Journal Article•DOI•

Evaluating collaborative filtering recommender systems

[...]

Jonathan L. Herlocker¹, Joseph A. Konstan², Loren Terveen², John Riedl²•Institutions (2)

Oregon State University¹, University of Minnesota²

01 Jan 2004-ACM Transactions on Information Systems

TL;DR: The key decisions in evaluating collaborative filtering recommender systems are reviewed: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole.

...read moreread less

Abstract: Recommender systems have been evaluated in many, often incomparable, ways. In this article, we review the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, we present empirical results from the analysis of various accuracy metrics on one content domain where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalency class were strongly correlated, while metrics from different equivalency classes were uncorrelated.

...read moreread less

5,686 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse