Journal Article
Bayesian Classifiers Programmed in SQL
TL;DR: This work introduces two classifiers programmed in SQL: naive Bayes and a classifier based on class decomposition using K-means clustering. The SQL implementation achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
Abstract:
The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
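The abstract splits the work into model computation and scoring, both expressed as SQL queries over a table in horizontal layout (one column per dimension). The following is a minimal sketch of that split for naive Bayes, not the authors' implementation: it assumes Gaussian class-conditional densities, two numeric dimensions (x1, x2), and Python's sqlite3 module, and the table, column, and function names (train, test, nb_model, nb_ln) are hypothetical. Model computation is one GROUP BY pass that derives each class's count, means, and variances from sums and sums of squares. Scoring cross-joins every point with every class, computes a log-posterior up to a constant, and keeps the class with the largest value.

```python
import math
import sqlite3


def create_tables(conn, train_rows, test_rows):
    # Hypothetical horizontal layout: point id i, one column per dimension, class g.
    conn.execute("CREATE TABLE train (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL, g TEXT)")
    conn.execute("CREATE TABLE test (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
    conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?)", train_rows)
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", test_rows)


def build_model(conn, eps=1e-9):
    # Model computation: one aggregation pass per class.
    # Variance = (sum of squares / N) - mean^2; eps guards against zero variance.
    conn.execute(f"""
        CREATE TABLE nb_model AS
        SELECT g,
               COUNT(*) AS n,
               AVG(x1) AS mu1,
               AVG(x2) AS mu2,
               SUM(x1 * x1) / COUNT(*) - AVG(x1) * AVG(x1) + {eps} AS var1,
               SUM(x2 * x2) / COUNT(*) - AVG(x2) * AVG(x2) + {eps} AS var2
        FROM train
        GROUP BY g
    """)


def score(conn):
    # nb_ln is a Python-registered natural log (SQLite's built-in math
    # functions are a compile-time option, so we do not rely on them).
    conn.create_function("nb_ln", 1, math.log)
    # Log-posterior per (point, class), omitting -0.5*ln(2*pi) per dimension
    # because it is identical for every class and does not change the argmax.
    return conn.execute("""
        WITH scores AS (
            SELECT t.i AS i, m.g AS g,
                   nb_ln(m.n * 1.0 / (SELECT COUNT(*) FROM train))
                   - 0.5 * (nb_ln(m.var1) + (t.x1 - m.mu1) * (t.x1 - m.mu1) / m.var1)
                   - 0.5 * (nb_ln(m.var2) + (t.x2 - m.mu2) * (t.x2 - m.mu2) / m.var2)
                   AS logp
            FROM test t CROSS JOIN nb_model m
        )
        SELECT s.i, s.g
        FROM scores s
        JOIN (SELECT i, MAX(logp) AS best FROM scores GROUP BY i) b
          ON s.i = b.i AND s.logp = b.best
        ORDER BY s.i
    """).fetchall()


conn = sqlite3.connect(":memory:")
train_rows = [(1, 1.0, 1.0, "a"), (2, 1.2, 0.8, "a"), (3, 0.8, 1.1, "a"),
              (4, 5.0, 5.0, "b"), (5, 5.2, 4.9, "b"), (6, 4.8, 5.1, "b")]
test_rows = [(1, 1.0, 1.0), (2, 5.0, 5.0), (3, 0.9, 1.2)]
create_tables(conn, train_rows, test_rows)
build_model(conn)
predictions = score(conn)
print(predictions)  # [(1, 'a'), (2, 'b'), (3, 'a')]
```

Keeping one column per dimension lets each per-class statistic be a single aggregate in one scan, and lets the scoring query evaluate the whole per-point expression within a single row. A vertical layout (one row per point and dimension) would need an extra join and GROUP BY per point to reassemble the same sum.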
Citations
Journal Article
Statistical Model Computation with UDFs
TL;DR: This work introduces techniques to efficiently compute fundamental statistical models inside a DBMS exploiting User-Defined Functions (UDFs), and studies the computation of linear regression, PCA, clustering, and Naive Bayes.
Journal Article
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Carlos Ordonez, Zhibo Chen, et al.
TL;DR: This work proposes simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row.
Journal Article
Evaluating association rules and decision trees to predict multiple target attributes
Carlos Ordonez, Kai Zhao, et al.
TL;DR: This work conducts an extensive experimental evaluation on a real medical data set to mine rules predicting disease on multiple heart arteries, and shows that association rules, compared to decision trees, tend to have higher confidence, involve larger subsets of the data set, are better suited to multiple target attributes, and work better with user-defined binning.
Journal Article
Optimal Data Center Scheduling for Quality of Service Management in Sensor-Cloud
TL;DR: The proposed work concentrates on the networking facets of sensor-cloud infrastructures and determines an optimal decision rule for electing a particular data center (DC) that gathers data from various VSs and transmits it to the end-user application.
Journal Article
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
TL;DR: This paper presents techniques to support horizontal aggregations through SQL queries that include CASE, SPJ and PIVOT and shows that these constructs are capable of generating data sets that can be used for further data mining operations.
References
Journal Article
The Elements of Statistical Learning
TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.
Proceedings Article
Scaling clustering algorithms to large databases
TL;DR: This work presents a scalable clustering framework, applicable to a wide class of iterative clustering algorithms, that requires at most one scan of the database; the framework is instantiated and numerically justified with the popular K-means clustering algorithm.
Journal Article
Integrating K-means clustering with a relational DBMS using SQL
TL;DR: This work introduces three SQL implementations of the popular K-means clustering algorithm, including an optimized version based on improved data organization, efficient indexing, sufficient statistics, and rewritten queries, and an incremental version that uses the optimized version as a building block and provides fast convergence and automated reseeding.
Book Chapter
ATLAS: a small but complete SQL extension for data mining and data streams
TL;DR: This chapter presents ATLAS, a powerful database language and system that enables users to develop complete data-intensive applications in structured query language (SQL) by writing new aggregates and table functions in SQL, rather than in procedural languages as in current object-relational systems.