Book

Data Mining

01 Jan 2008
TL;DR: This chapter discusses generalized estimating equations (GEEs), computed with PROC GENMOD in SAS, and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC.
Abstract: [...] logistic regression, and it concerns studying the effect of covariates on the risk of disease. The chapter includes generalized estimating equations (GEEs) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC. As a prelude to the following chapter on repeated-measures data, Chapter 5 presents time series analysis. The material on repeated-measures analysis uses linear additive models with GEEs and PROC MIXED in SAS for linear mixed-effects models. Chapter 7 is about survival data analysis. All computing throughout the book is done using SAS procedures.
Citations
Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations

Journal ArticleDOI
TL;DR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weakness in two examples.
Abstract: Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weakness in two examples. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 14-23. DOI: 10.1002/widm.8. This article is categorized under: Technologies > Classification; Technologies > Machine Learning; Technologies > Prediction; Technologies > Statistical Fundamentals.

16,974 citations
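
The recursive partitioning described in the classification and regression trees abstract above can be illustrated with a minimal sketch. It assumes a single numeric feature, a constant (mean) prediction in each partition, and squared error as the split criterion; the function names `sse`, `best_split`, `build_tree`, and `predict` are hypothetical and are not taken from the cited article or from any library.

```python
from statistics import mean

def sse(ys):
    """Sum of squared differences between each value and the mean."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (threshold, error) for the best binary split, or None if no split exists."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct feature values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = (t, err)
    return best

def build_tree(xs, ys, depth=2):
    """Recursively partition the data; each leaf predicts the mean of its targets."""
    split = best_split(xs, ys) if depth > 0 else None
    if split is None:
        return {"predict": mean(ys)}
    t, _ = split
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        "right": build_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    }

def predict(tree, x):
    """Follow the thresholds down to a leaf and return its constant prediction."""
    while "predict" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["predict"]

# Toy data (made up for illustration): two well-separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = build_tree(xs, ys)
```

A classification tree would follow the same recursion but replace the squared-error criterion with a misclassification-based cost and the leaf mean with a majority class.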

01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: A basic taxonomy of feature selection techniques is provided, together with a discussion of their use, variety, and potential in a number of both common and upcoming bioinformatics applications.
Abstract: Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common as well as upcoming bioinformatics applications. Contact: yvan.saeys@psb.ugent.be Supplementary information: http://bioinformatics.psb.ugent.be/supplementary_data/yvsae/fsreview

4,706 citations

Journal ArticleDOI
TL;DR: This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A, and then introduces and characterizes the six articles that comprise this special issue in terms of the proposed BI&A research framework.
Abstract: Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.

4,610 citations

References
01 Jan 2002

9,314 citations

Book
01 Jan 1998
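TL;DR: This book surveys the history of mathematical statistics from 1750 onward, covering direct probability, inverse probability as developed by Bayes and Laplace, the normal distribution, the method of least squares, and the central limit theorem, and selected topics in estimation theory up to Fisher.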
Abstract: DIRECT PROBABILITY, 1750-1805. Some Results and Tools in Probability Theory (By Bernoulli, de Moivre, and Laplace). The Distribution of the Arithmetic Mean, 1756-1781. Chance or Design: Tests of Significance. Theory of Errors and Methods of Estimation. Fitting of Equations to Data, 1750-1805. INVERSE PROBABILITY BY BAYES AND LAPLACE, WITH COMMENTS ON LATER DEVELOPMENTS. Induction and Probability: The Philosophical Background. Bayes, Price, and the Essay, 1764-1765. Equiprobability, Equipossibility, and Inverse Probability. Laplace's Applications of the Principle of Inverse Probability in 1774. Laplace's General Theory of Inverse Probability. The Equiprobability Model and the Inverse Probability Model for Games of Chance. Laplace's Methods of Asymptotic Expansion, 1781 and 1785. Laplace's Analysis of Binomially Distributed Observations. Laplace's Theory of Statistical Prediction. Laplace's Sample Survey of the Population of France and the Distribution of the Ratio Estimator. THE NORMAL DISTRIBUTION, THE METHOD OF LEAST SQUARES, AND THE CENTRAL LIMIT THEOREM. GAUSS AND LAPLACE, 1809-1828. The Early History of the Central Limit Theorem, 1810-1853. Derivations of the Normal Distribution as a Law of Error. Gauss's Linear Normal Model and the Method of Least Squares, 1809 and 1811. Laplace's Large-Sample Theory of Linear Estimation, 1811-1827. Gauss's Theory of Linear Unbiased Minimum Variance Estimation, 1823-1828. SELECTED TOPICS IN ESTIMATION THEORY, 1830-1930. On Error and Estimation Theory, 1830-1890. Bienaymé's Proof of the Multivariate Central Limit Theorem and His Defense of Laplace's Theory of Linear Estimation, 1852 and 1853. Cauchy's Method for Determining the Number of Terms to be Included in the Linear Model and for Estimating the Parameters, 1835-1853. Orthogonalization and Polynomial Regression. Statistical Laws in the Social and Biological Sciences, Poisson, Quetelet, and Galton, 1830-1890. Sampling Distributions under Normality. Fisher's Theory of Estimation, 1912-1935, and His Immediate Precursors. References. Index.

325 citations

Journal ArticleDOI

117 citations


"Data Mining" refers to background in this paper

  • ...The author states in his Preface (p. v) that the book “complements the analysis of these procedures in Stigler (1986).”...
