Home
/
Authors
/
Usama M. Fayyad

Author

Usama M. Fayyad

Other affiliations: University of Michigan, California Institute of Technology, University of Minnesota ...read more

Bio: Usama M. Fayyad is an academic researcher from Northeastern University. The author has contributed to research in topics: Knowledge extraction & Cluster analysis. The author has an hindex of 58, co-authored 135 publications receiving 24868 citations. Previous affiliations of Usama M. Fayyad include University of Michigan & California Institute of Technology.

Papers published on a yearly basis

2023
2022
2020
2017
2014
2012
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988

Papers

PDF

Open Access

More filters

Journal Article•DOI•

From Data Mining to Knowledge Discovery in Databases

[...]

Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth

15 Mar 1996-Ai Magazine

TL;DR: An overview of this emerging field is provided, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases.

...read moreread less

Abstract: ■ Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.

...read moreread less

4,782 citations

Proceedings Article•

Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning

[...]

Usama M. Fayyad, Keki B. Irani

01 Sep 1993

TL;DR: This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals.

...read moreread less

Abstract: Since most real-world applications of classification learning involve continuous-valued attributes, properly addressing the discretization process is an important problem. This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals.

...read moreread less

3,192 citations

Proceedings Article•

From data mining to knowledge discovery: an overview

[...]

Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth

01 Feb 1996

2,710 citations

Journal Article•DOI•

The KDD process for extracting useful knowledge from volumes of data

[...]

Usama M. Fayyad¹, Gregory Piatetsky-Shapiro, Padhraic Smyth¹•Institutions (1)

California Institute of Technology¹

01 Nov 1996-Communications of The ACM

1,857 citations

Proceedings Article•

Refining Initial Points for K-Means Clustering

[...]

Paul S. Bradley¹, Usama M. Fayyad²•Institutions (2)

University of Wisconsin-Madison¹, Microsoft²

24 Jul 1998

TL;DR: A procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution that allows the iterative algorithm to converge to a “better” local minimum.

...read moreread less

Abstract: Practical approaches to clustering use an iterative procedure (e.g. K-Means, EM) which converges to one of numerous local minima. It is known that these iterative techniques are especially sensitive to initial starting conditions. We present a procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution. The refined initial starting condition allows the iterative algorithm to converge to a “better” local minimum. The procedure is applicable to a wide class of clustering algorithms for both discrete and continuous data. We demonstrate the application of this method to the popular K-Means clustering algorithm and show that refined initial starting points indeed lead to improved solutions. Refinement run time is considerably lower than the time required to cluster the full database. The method is scalable and can be coupled with a scalable clustering algorithm to address the large-scale clustering problems in data mining.

...read moreread less

1,214 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

I and i

[...]

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.

...read moreread less

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

...read moreread less

33,785 citations

Book•

Data Mining: Concepts and Techniques

[...]

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

...read moreread less

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

...read moreread less

23,600 citations

Book•

Data Mining: Practical Machine Learning Tools and Techniques

[...]

Ian H. Witten, Eibe Frank, Mark Hall

25 Oct 1999

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.

...read moreread less

Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research. *Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects *Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods *Includes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks-in an updated, interactive interface. Algorithms in toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

...read moreread less

20,196 citations

Journal Article•DOI•

Maps of Dust Infrared Emission for Use in Estimation of Reddening and Cosmic Microwave Background Radiation Foregrounds

[...]

David J. Schlegel¹, Douglas P. Finkbeiner², Marc Davis²•Institutions (2)

Durham University¹, University of California, Berkeley²

20 Jun 1998-The Astrophysical Journal

TL;DR: In this article, a reprocessed composite of the COBE/DIRBE and IRAS/ISSA maps, with the zodiacal foreground and confirmed point sources removed, is presented.

...read moreread less

Abstract: We present a full-sky 100 μm map that is a reprocessed composite of the COBE/DIRBE and IRAS/ISSA maps, with the zodiacal foreground and confirmed point sources removed. Before using the ISSA maps, we remove the remaining artifacts from the IRAS scan pattern. Using the DIRBE 100 and 240 μm data, we have constructed a map of the dust temperature so that the 100 μm map may be converted to a map proportional to dust column density. The dust temperature varies from 17 to 21 K, which is modest but does modify the estimate of the dust column by a factor of 5. The result of these manipulations is a map with DIRBE quality calibration and IRAS resolution. A wealth of filamentary detail is apparent on many different scales at all Galactic latitudes. In high-latitude regions, the dust map correlates well with maps of H I emission, but deviations are coherent in the sky and are especially conspicuous in regions of saturation of H I emission toward denser clouds and of formation of H2 in molecular clouds. In contrast, high-velocity H I clouds are deficient in dust emission, as expected. To generate the full-sky dust maps, we must first remove zodiacal light contamination, as well as a possible cosmic infrared background (CIB). This is done via a regression analysis of the 100 μm DIRBE map against the Leiden-Dwingeloo map of H I emission, with corrections for the zodiacal light via a suitable expansion of the DIRBE 25 μm flux. This procedure removes virtually all traces of the zodiacal foreground. For the 100 μm map no significant CIB is detected. At longer wavelengths, where the zodiacal contamination is weaker, we detect the CIB at surprisingly high flux levels of 32 ± 13 nW m-2 sr-1 at 140 μm and of 17 ± 4 nW m-2 sr-1 at 240 μm (95% confidence). This integrated flux ~2 times that extrapolated from optical galaxies in the Hubble Deep Field. The primary use of these maps is likely to be as a new estimator of Galactic extinction. To calibrate our maps, we assume a standard reddening law and use the colors of elliptical galaxies to measure the reddening per unit flux density of 100 μm emission. We find consistent calibration using the B-R color distribution of a sample of the 106 brightest cluster ellipticals, as well as a sample of 384 ellipticals with B-V and Mg line strength measurements. For the latter sample, we use the correlation of intrinsic B-V versus Mg2 index to tighten the power of the test greatly. We demonstrate that the new maps are twice as accurate as the older Burstein-Heiles reddening estimates in regions of low and moderate reddening. The maps are expected to be significantly more accurate in regions of high reddening. These dust maps will also be useful for estimating millimeter emission that contaminates cosmic microwave background radiation experiments and for estimating soft X-ray absorption. We describe how to access our maps readily for general use.

...read moreread less

15,988 citations

Monograph•DOI•

Causality: models, reasoning, and inference

[...]

Judea Pearl¹•Institutions (1)

University of California, Los Angeles¹

14 Sep 2009-Tijdschrift Voor Filosofie

TL;DR: The art and science of cause and effect have been studied in the social sciences for a long time as mentioned in this paper, see, e.g., the theory of inferred causation, causal diagrams and the identification of causal effects.

...read moreread less

Abstract: 1. Introduction to probabilities, graphs, and causal models 2. A theory of inferred causation 3. Causal diagrams and the identification of causal effects 4. Actions, plans, and direct effects 5. Causality and structural models in the social sciences 6. Simpson's paradox, confounding, and collapsibility 7. Structural and counterfactual models 8. Imperfect experiments: bounds and counterfactuals 9. Probability of causation: interpretation and identification Epilogue: the art and science of cause and effect.

...read moreread less

12,606 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse