Author

Edward A. Fox

Bio: Edward A. Fox is an academic researcher from Virginia Tech. The author has contributed to research on topics including Digital library & Metadata. The author has an h-index of 53 and has co-authored 522 publications receiving 13,862 citations. Previous affiliations of Edward A. Fox include University of Maryland, College Park and Cornell University.


Papers
Proceedings Article
01 Jan 1994
TL;DR: This paper describes one method that has been shown to increase performance by combining the similarity values from five different retrieval runs using both vector space and P-norm extended boolean retrieval methods.
Abstract: The TREC-2 project at Virginia Tech focused on methods for combining the evidence from multiple retrieval runs to improve performance over any single retrieval method. This paper describes one such method that has been shown to increase performance by combining the similarity values from five different retrieval runs using both vector space and P-norm extended Boolean retrieval methods.
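The combination idea can be sketched as score-level fusion: sum each document's similarity values across runs and re-rank by the combined score. This is a minimal CombSUM-style sketch under assumed comparable score scales, not the paper's exact method; the run contents are illustrative.

```python
def fuse_runs(runs):
    """Combine similarity values from multiple retrieval runs by
    summing each document's scores across runs (a CombSUM-style
    fusion; the paper's exact weighting is not specified here)."""
    fused = {}
    for run in runs:                      # each run: {doc_id: similarity}
        for doc_id, score in run.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + score
    # rank documents by the combined similarity, best first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# toy example: a vector-space run and a P-norm run scoring three documents
vector_run = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
pnorm_run = {"d1": 0.5, "d2": 0.7}
ranking = fuse_runs([vector_run, pnorm_run])
```

A document retrieved highly by several independent runs rises in the fused ranking even if no single run put it first.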

1,106 citations

Journal ArticleDOI
TL;DR: A new, extended Boolean information retrieval system is introduced which is intermediate between the Boolean system of query processing and the vector processing model, and laboratory tests indicate that the extended system produces better retrieval output than either the Boolean or the vector processing systems.
Abstract: In conventional information retrieval, Boolean combinations of index terms are used to formulate the users' information requests. While any document is in principle retrievable by a Boolean query, the amount of output obtainable by Boolean processing is difficult to control, and the retrieved items are not ranked in any presumed order of importance to the user population. In the vector processing model of retrieval, the retrieved items are easily ranked in decreasing order of the query-record similarity, but the queries themselves are unstructured and expressed as simple sets of weighted index terms. A new, extended Boolean information retrieval system is introduced which is intermediate between the Boolean system of query processing and the vector processing model. The query structure inherent in the Boolean system is preserved, while at the same time weighted terms may be incorporated into both queries and stored documents; the retrieved output can also be ranked in strict similarity order with the user queries. A conventional retrieval system can be modified to make use of the extended system. Laboratory tests indicate that the extended system produces better retrieval output than either the Boolean or the vector processing systems.
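The intermediate model the abstract describes is the P-norm model, whose standard similarity formulas can be sketched as follows (assuming term weights normalized to [0, 1]). With p = 1 the operators reduce to a vector-like average; as p grows they approach strict Boolean behavior.

```python
def pnorm_or(weights, p):
    """Extended Boolean OR: ((w1^p + ... + wn^p) / n)^(1/p)."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1 / p)

def pnorm_and(weights, p):
    """Extended Boolean AND: 1 - (((1-w1)^p + ... + (1-wn)^p) / n)^(1/p)."""
    n = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / n) ** (1 / p)

# A document matching one of two OR'ed terms scores between the strict
# Boolean extremes 0 and 1: with p = 2 it gets sqrt(1/2) ~ 0.707.
score = pnorm_or([1.0, 0.0], p=2)
```

Because the score varies smoothly with the term weights, retrieved documents can be ranked in strict similarity order, which plain Boolean processing cannot do.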

909 citations

Journal ArticleDOI
TL;DR: This report outlines IBM’s perspective on key supporting technologies and on the unique challenges highlighted by the emergence of digital libraries.
Abstract: [The article opens with a table of digital-library topic areas, flattened during extraction; its entries are: Abstracting, Accessibility, Agents, Annotation, Archive, Billing/charging, Browsing, Catalog, Classification, Clustering, Commercial service, Content conversion, Copyright clearance, Courseware, Database, Diagrams (e.g., CAD), Digital video, Discipline-level library, Distributed processing, Document analysis, Document model, Economic study, Education support, Electronic publishing, Ethnographic study, Filtering, Geographic information system, Hypermedia, Hypertext, Image processing, Indexing, Information retrieval, Intellectual property rights, Interactive, Knowbot, Knowledge base, Library science, Mediator, Multilingual, Multimedia stream playback, Multimedia systems, Multimodal, National library, Navigation, Object-oriented, OCR, OODB support, Personalization, Preservation, Privacy, Publisher library, Repository, Scalability, Searching, Security, Sociological study, Standard, Storage, Subscription, Sustainability, Training support, Usability, Virtual (integration), Visualization, World-Wide Web.] … its characterization of digital libraries. Many important projects and perspectives have been omitted. Here we give some pointers to aid further exploration, and of course we encourage interested readers to attend the numerous conferences and workshops scheduled in this field, many sponsored by or in cooperation with ACM and its SIGs. One early journal special issue is introduced in [6]. It includes articles on copyright and intellectual property rights, a subscription model for handling funds transfer related to digital libraries, a description of the evolution of the WAIS search system in general and its interfaces in particular, an overview of the Right Pages system and its use of OCR and document analysis algorithms, and an early overview of the Envision system [7]. We note that to many, intellectual property rights issues and ways to obtain revenue streams to sustain digital libraries are the most important open problems. The largest digital library conference makes its proceedings available over the WWW [9].
These contain many insightful discussions, proposals of new research ideas, descriptions of base technologies, and explanations of how the broad concept of a digital library fits in with the needs of specific user communities and the information they require. Readers can find a variety of works on agents, architectures, catalogs, collaboration, compression, document analysis from OCR and page images, document structure, electronic journals, heterogeneous sources, knowledge-based approaches, library science, numerical data collections, object stores, and organizational usability. For more details on the origins of the Digital Library Initiative, and for a variety of perspectives on open research problems, we refer the reader to [5]. This work also has numerous pointers to people, projects, institutions, and other reference works in the area. For a perspective on the role the computer industry should have in this field, see [10]. This report outlines IBM’s perspective on key supporting technologies and on the unique challenges highlighted by the emergence of digital libraries. We expect considerable interest from the corporate sector as well as from government agencies in this important area of information technology. For lack of space, we have had to omit many publications on networking and storage technologies, sociological and ethnographic studies, library and information science, OCR and document analysis or conversion, and rights management. These and other works are needed to round out the discussion of digital libraries. However, we encourage you to read the rest of this issue as a good starting point for your future studies of this important field. We invite you to not only use but also help in the creation of a future World Digital Library System!

654 citations

Journal ArticleDOI
TL;DR: Findings from an exploratory study conducted with government officials in Arlington, VA between June and December 2010 are presented, with the broad goal of understanding social media use by government officials as well as community organizations, businesses, and the public at large.

497 citations

18 Jul 1995
TL;DR: This work assesses the potential of proxy servers to cache documents retrieved with the HTTP protocol, and finds that a proxy server really functions as a second level cache, and its hit rate may tend to decline with time after initial loading given a more or less constant set of users.
Abstract: As the number of World-Wide Web users grows, so does the number of connections made to servers. This increases both network load and server load. Caching can reduce both loads by migrating copies of server files closer to the clients that use those files. Caching can either be done at a client or in the network (by a proxy server or gateway). We assess the potential of proxy servers to cache documents retrieved with the HTTP protocol. We monitored traffic corresponding to three types of educational workloads over a one-semester period, and used this as input to a cache simulation. Our main findings are (1) that with our workloads a proxy has a 30-50% maximum possible hit rate no matter how it is designed; (2) that when the cache is full and a document is replaced, least recently used (LRU) is a poor policy, but simple variations can dramatically improve hit rate and reduce cache size; (3) that a proxy server really functions as a second-level cache, and its hit rate may tend to decline with time after initial loading given a more or less constant set of users; and (4) that certain tuning configuration parameters for a cache may have little benefit.
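The cache simulation described above can be reproduced in miniature. The sketch below replays a request trace through a fixed-size LRU cache and reports the hit rate; the trace and capacity are toy values, and a real proxy cache would be sized in bytes rather than documents.

```python
from collections import OrderedDict

def simulate_lru(requests, capacity):
    """Replay a trace of document requests through an LRU cache of
    fixed capacity (counted in documents for simplicity) and
    return the fraction of requests served from the cache."""
    cache = OrderedDict()
    hits = 0
    for doc in requests:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[doc] = True
    return hits / len(requests)

# toy trace: repeated requests to a small set of documents
rate = simulate_lru(["a", "b", "a", "c", "a", "b"], capacity=2)
```

Swapping in a different eviction policy (e.g. evicting the largest document first) only changes the `popitem` line, which is what makes trace-driven simulation convenient for comparing replacement policies.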

495 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules.
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
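The mail-filtering example can be made concrete with a bag-of-words sketch: count word occurrences in messages the user kept versus rejected, and flag new mail whose words are more typical of rejected messages. This is an illustrative toy with made-up messages, not a production filter; a real system would use something like naive Bayes with smoothing.

```python
from collections import Counter

def train_filter(kept, rejected):
    """Learn per-word counts from messages the user kept vs rejected
    (a bag-of-words sketch of the mail-filtering idea in the text)."""
    kept_counts = Counter(w for m in kept for w in m.split())
    rejected_counts = Counter(w for m in rejected for w in m.split())
    return kept_counts, rejected_counts

def looks_rejected(message, kept_counts, rejected_counts):
    """Flag a message whose words occur more often in rejected mail."""
    kept_score = sum(kept_counts[w] for w in message.split())
    rejected_score = sum(rejected_counts[w] for w in message.split())
    return rejected_score > kept_score

# hypothetical training messages for illustration
kc, rc = train_filter(
    kept=["meeting at noon", "project report draft"],
    rejected=["free money offer", "free prize claim"],
)
```

As the user keeps rejecting or keeping new messages, retraining on the growing history updates the rules automatically, which is the maintenance burden the passage says learning removes from the user.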

13,246 citations

Journal ArticleDOI
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Abstract: The experimental evidence accumulated over the past 20 years indicates that text-indexing systems based on the assignment of appropriately weighted single terms produce retrieval results that are superior to those obtainable with other more elaborate text representations. These results depend crucially on the choice of effective term weighting systems. This paper summarizes the insights gained in automatic term weighting, and provides baseline single-term indexing models with which other more elaborate content analysis procedures can be compared.
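A standard instance of weighted single-term indexing is tf-idf. The sketch below computes raw term frequency times log inverse document frequency; this specific variant is illustrative, since the paper compares several weighting schemes rather than prescribing one.

```python
import math

def tfidf_weights(docs):
    """Assign tf-idf weights to the single terms of each document:
    term frequency within the document times log(N / document
    frequency) across the collection."""
    n = len(docs)
    df = {}                               # in how many documents each term appears
    for doc in docs:
        for term in set(doc.split()):
            df[term] = df.get(term, 0) + 1
    weighted = []
    for doc in docs:
        tokens = doc.split()
        tf = {t: tokens.count(t) for t in set(tokens)}
        weighted.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weighted

# toy three-document collection
w = tfidf_weights(["cat sat mat", "cat ran", "dog ran far"])
```

Terms that occur in many documents get low weights (here "cat" appears in two of three documents), while terms confined to one document get the full log(N) factor, which is exactly the discrimination behavior effective weighting schemes aim for.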

9,460 citations

Book
01 Jan 2009

8,216 citations

Journal ArticleDOI
01 Jun 2010
TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.
Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.
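The K-means algorithm mentioned above is short enough to sketch in full: Lloyd's iteration, alternating between assigning each point to its nearest centroid and recomputing each centroid as its cluster's mean. The points, k, and iteration count below are toy values.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm for K-means on 2-D points, initialized by
    sampling k distinct points from the data."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                          + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # update step: move each centroid to its cluster's mean
        for i, cl in enumerate(clusters):
            if cl:                        # keep old centroid if cluster empties
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# two well-separated toy clusters
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = kmeans(pts, k=2)
```

The ill-posedness the abstract mentions shows up even here: a different random seed can yield a different local optimum, which is one reason K-means is usually run from several initializations.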

6,601 citations