Bayesian Classifiers Programmed in SQL

doi:10.1109/TKDE.2009.127

Journal Article

Carlos Ordonez, +1 more

01 Jan 2010

IEEE Transactions on Knowledge and Data ...

Vol. 22, Iss. 1, pp. 139-144

TL;DR: This work introduces two classifiers programmed in SQL, naive Bayes and a classifier based on class decomposition using K-means clustering; the resulting Bayesian classifier achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.

Abstract:

The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
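The abstract splits the work into two tasks: model computation and scoring a data set. The sketch below shows what those two steps can look like for a Gaussian naive Bayes model written in plain SQL and run on SQLite through Python's standard sqlite3 module. It assumes a horizontal layout with two numeric dimensions, a per-class normal distribution for each dimension, and hypothetical names (train, newdata, nb_model, py_log, classify). It is not the authors' implementation: it omits their table layouts, indexing alternatives, and query optimizations, as well as the K-means-based classifier.

```python
import math
import sqlite3

# Hypothetical schema: horizontal layout, two numeric dimensions (x1, x2)
# and a class label g. This is only one of several possible layouts.
SCHEMA = """
CREATE TABLE train   (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL, g TEXT);
CREATE TABLE newdata (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL);
"""

# Model computation: one GROUP BY pass yields prior, mean and variance per
# class and dimension. Variance uses E[x^2] - E[x]^2; a tiny epsilon guards
# against a zero variance for a constant attribute.
MODEL_SQL = """
CREATE TABLE nb_model AS
SELECT g,
       CAST(COUNT(*) AS REAL) / (SELECT COUNT(*) FROM train) AS prior,
       AVG(x1) AS m1,
       SUM(x1 * x1) / COUNT(*) - AVG(x1) * AVG(x1) + 1e-9 AS v1,
       AVG(x2) AS m2,
       SUM(x2 * x2) / COUNT(*) - AVG(x2) * AVG(x2) + 1e-9 AS v2
FROM train
GROUP BY g
"""

# Scoring: log prior plus one Gaussian log-density term per dimension
# (naive independence assumption), then keep the best class for each point.
SCORE_SQL = """
WITH scores AS (
  SELECT t.i, m.g,
         py_log(m.prior)
         - 0.5 * (py_log(2 * 3.141592653589793 * m.v1) + (t.x1 - m.m1) * (t.x1 - m.m1) / m.v1)
         - 0.5 * (py_log(2 * 3.141592653589793 * m.v2) + (t.x2 - m.m2) * (t.x2 - m.m2) / m.v2)
           AS score
  FROM newdata AS t CROSS JOIN nb_model AS m
)
SELECT s.i, s.g
FROM scores AS s
JOIN (SELECT i, MAX(score) AS best FROM scores GROUP BY i) AS b
  ON s.i = b.i AND s.score = b.best
ORDER BY s.i
"""


def classify(train_rows, new_rows):
    """Fit the model in SQL on train_rows, then score new_rows in SQL."""
    conn = sqlite3.connect(":memory:")
    # LN() is only built into SQLite when compiled with math functions, so
    # Python's math.log is registered under the hypothetical name py_log.
    conn.create_function("py_log", 1, math.log)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?)", train_rows)
    conn.executemany("INSERT INTO newdata VALUES (?, ?, ?)", new_rows)
    conn.execute(MODEL_SQL)
    result = conn.execute(SCORE_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    train = [(1, 1.0, 1.0, "a"), (2, 1.2, 0.8, "a"), (3, 0.8, 1.1, "a"),
             (4, 5.0, 5.2, "b"), (5, 5.3, 4.9, "b"), (6, 4.8, 5.0, "b")]
    print(classify(train, [(101, 1.0, 0.9), (102, 5.1, 5.0)]))
```

With this toy data the two new points land in classes a and b respectively. Comparing scores in log space avoids the underflow that multiplying many small densities would cause; exact ties would return more than one row for a point.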

Citations

Journal Article

Statistical Model Computation with UDFs

Carlos Ordonez

01 Dec 2010

IEEE Transactions on Knowledge and Data ...

TL;DR: This work introduces techniques to efficiently compute fundamental statistical models inside a DBMS exploiting User-Defined Functions (UDFs), and studies the computation of linear regression, PCA, clustering, and Naive Bayes.

Journal Article

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Carlos Ordonez, +1 more

01 Apr 2012

IEEE Transactions on Knowledge and Data ...

TL;DR: This work proposes simple yet powerful methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row.

Journal Article

Evaluating association rules and decision trees to predict multiple target attributes

Carlos Ordonez, +1 more

TL;DR: This work conducts an extensive experimental evaluation on a real medical data set to mine rules predicting disease on multiple heart arteries, and shows that association rules, compared to decision trees, tend to have higher confidence, involve larger subsets of the data set, are better suited to multiple target attributes, and work better with user-defined binning.

Journal Article

Optimal Data Center Scheduling for Quality of Service Management in Sensor-Cloud

Subarna Chatterjee, +2 more

01 Jan 2019

IEEE Transactions on Cloud Computing

TL;DR: The proposed work concentrates on the networking facets of sensor-cloud infrastructures and determines an optimal decision rule for selecting a particular data center (DC) that gathers data from various VSs and transmits it to the end-user application.

Journal Article

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

V. Pradeep Kumar, +1 more

01 Jan 2012

IOSR Journal of Computer Engineering

TL;DR: This paper presents techniques to support horizontal aggregations through SQL queries built with CASE, SPJ (select-project-join), and PIVOT constructs, and shows that these constructs can generate data sets suitable for further data mining operations.
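Two of the citing papers describe horizontal aggregations, which return one row per group with one column per value of a pivot attribute. The following is a minimal sketch of two of the named strategies, CASE and SPJ, written as Python functions that generate SQL and run it on SQLite through the standard sqlite3 module (SQLite has no PIVOT operator, so that variant is omitted). The sales table, its columns, and the function names case_sql, spj_sql and demo are hypothetical, and pivot values are assumed to be trusted strings that are also valid in column aliases. This illustrates the idea, not the cited papers' code generators.

```python
import sqlite3


def _lit(v):
    # Quote a pivot value as an SQL string literal (doubling embedded quotes).
    return "'" + str(v).replace("'", "''") + "'"


def case_sql(table, key, pivot, value, pivot_values):
    # CASE method: one SUM(CASE ...) column per pivot value, one GROUP BY.
    cols = [f'SUM(CASE WHEN {pivot} = {_lit(v)} THEN {value} ELSE NULL END) AS "{value}_{v}"'
            for v in pivot_values]
    return (f"SELECT {key}, {', '.join(cols)} FROM {table} "
            f"GROUP BY {key} ORDER BY {key}")


def spj_sql(table, key, pivot, value, pivot_values):
    # SPJ method: F0 holds the distinct keys; each Fk is a vertical aggregate
    # restricted to one pivot value, left-outer-joined back onto F0.
    cols = [f"F0.{key}"]
    joins = []
    for k, v in enumerate(pivot_values, start=1):
        cols.append(f'F{k}.agg AS "{value}_{v}"')
        joins.append(
            f"LEFT OUTER JOIN (SELECT {key}, SUM({value}) AS agg FROM {table} "
            f"WHERE {pivot} = {_lit(v)} GROUP BY {key}) AS F{k} "
            f"ON F0.{key} = F{k}.{key}")
    return (f"SELECT {', '.join(cols)} "
            f"FROM (SELECT DISTINCT {key} FROM {table}) AS F0 "
            + " ".join(joins) + f" ORDER BY F0.{key}")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store INTEGER, dept TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(1, "food", 10.0), (1, "food", 5.0),
                      (1, "toys", 3.0), (2, "toys", 7.0)])
    # The generated column list depends on the distinct pivot values present.
    depts = [r[0] for r in conn.execute("SELECT DISTINCT dept FROM sales ORDER BY dept")]
    by_case = conn.execute(case_sql("sales", "store", "dept", "amount", depts)).fetchall()
    by_spj = conn.execute(spj_sql("sales", "store", "dept", "amount", depts)).fetchall()
    conn.close()
    return by_case, by_spj


if __name__ == "__main__":
    print(demo())
```

Both queries return the same rows here; store 2 gets NULL (None in Python) under food because it has no matching rows. The CASE version computes every column in a single GROUP BY, while the SPJ version runs one aggregation per pivot value and then joins the results.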

Related Papers

Statistical Model Computation with UDFs

Integrating K-means clustering with a relational DBMS using SQL

PIVOT and UNPIVOT: optimization and execution strategies in an RDBMS

Integrating association rule mining with relational database systems: alternatives and implications

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS
