Journal Article

Bayesian Classifiers Programmed in SQL

TL;DR: This work introduces two Bayesian classifiers programmed in SQL: naive Bayes and a classifier based on class decomposition using K-means clustering. The resulting Bayesian classifier achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
Abstract
The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
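The two tasks named in the abstract, model computation and scoring, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a Gaussian naive Bayes model over two numeric dimensions, uses SQLite through Python's standard `sqlite3` module, and all table and column names (`X`, `Z`, `NB`, `S`, `x1`, `x2`, `g`) as well as the helper function `py_ln` are hypothetical. The training table uses a horizontal layout, one row per point and one column per dimension.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Not every SQLite build ships ln(), so register a Python helper.
# The name py_ln is an assumption made for this sketch.
conn.create_function("py_ln", 1, math.log)

conn.executescript("""
-- Training set in horizontal layout: point id i, class g, dimensions x1, x2.
CREATE TABLE X (i INTEGER PRIMARY KEY, g INTEGER, x1 REAL, x2 REAL);
INSERT INTO X VALUES
  (1, 0, 1.0, 2.0), (2, 0, 1.5, 1.8), (3, 0, 0.8, 2.4), (4, 0, 1.2, 2.1),
  (5, 1, 5.0, 8.0), (6, 1, 5.5, 7.6), (7, 1, 4.7, 8.3), (8, 1, 5.2, 7.9);

-- Unlabeled points to score.
CREATE TABLE Z (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL);
INSERT INTO Z VALUES (1, 1.1, 2.0), (2, 5.1, 8.0);

-- Model computation: one GROUP BY pass yields the class prior and the
-- per-dimension mean and (population) variance for every class.
CREATE TABLE NB AS
SELECT g,
       COUNT(*) * 1.0 / (SELECT COUNT(*) FROM X) AS prior,
       AVG(x1) AS mu1, AVG(x1 * x1) - AVG(x1) * AVG(x1) AS var1,
       AVG(x2) AS mu2, AVG(x2 * x2) - AVG(x2) * AVG(x2) AS var2
FROM X
GROUP BY g;

-- Scoring: pair each point with each class and sum log-probabilities.
-- The constant -0.5 * ln(2*pi) per dimension is dropped because it is
-- the same for every class and does not change the winner.
CREATE TABLE S AS
SELECT Z.i AS i, NB.g AS g,
       py_ln(NB.prior)
       - 0.5 * py_ln(NB.var1) - (Z.x1 - NB.mu1) * (Z.x1 - NB.mu1) / (2 * NB.var1)
       - 0.5 * py_ln(NB.var2) - (Z.x2 - NB.mu2) * (Z.x2 - NB.mu2) / (2 * NB.var2)
       AS logp
FROM Z CROSS JOIN NB;
""")

# Predicted class per point: the class whose log-probability is largest.
predictions = conn.execute("""
SELECT S.i, S.g
FROM S
JOIN (SELECT i, MAX(logp) AS best FROM S GROUP BY i) AS M
  ON S.i = M.i AND S.logp = M.best
ORDER BY S.i
""").fetchall()
print(predictions)
```

Summing logarithms stands in for the product of per-dimension probabilities, since SQL has no product aggregate. A real data set would also need a guard against zero variance, which this sketch omits.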


Citations
Journal Article

Statistical Model Computation with UDFs

TL;DR: This work introduces techniques to efficiently compute fundamental statistical models inside a DBMS exploiting User-Defined Functions (UDFs), and studies the computation of linear regression, PCA, clustering, and Naive Bayes.
Journal Article

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

TL;DR: This work proposes simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row.
Journal Article

Evaluating association rules and decision trees to predict multiple target attributes

TL;DR: This work conducts an extensive experimental evaluation on a real medical data set to mine rules predicting disease on multiple heart arteries. It shows that association rules, compared to decision trees, tend to have higher confidence, involve larger subsets of the data set, are better for multiple target attributes, and work better with user-defined binning.
Journal Article

Optimal Data Center Scheduling for Quality of Service Management in Sensor-Cloud

TL;DR: The proposed work concentrates on the networking facets of sensor-cloud infrastructures and determines an optimal decision rule for electing a particular DC that congregates data from various VSs and transmits it to the end-user application.
Journal Article

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

TL;DR: This paper presents techniques to support horizontal aggregations through SQL queries that include CASE, SPJ and PIVOT and shows that these constructs are capable of generating data sets that can be used for further data mining operations.