Parallel Algorithms for Predictive Modeling

Markus Hegland
pp. 323–362
TLDR
It turns out that even smaller granularity parallelism can be exploited effectively in the problems considered, and the development is illustrated by four examples of nonparametric regression techniques.
Abstract
Parallel computing enables the analysis of very large data sets using large collections of flexible models with many variables. The computational methods are based on ideas from computational linear algebra and can draw on the extensive research on parallel algorithms in this area. Many algorithms for the direct and iterative solution of penalised least squares problems and for updating can be applied, and methods for both dense and sparse problems are applicable. An important property of the algorithms is their scalability, i.e., their ability to solve larger problems in the same time using hardware which grows linearly with the problem size. While in most cases large granularity parallelism is to be preferred, it turns out that even smaller granularity parallelism can be exploited effectively in the problems considered. The development is illustrated by four examples of nonparametric regression techniques. In the first example, additive models are considered. While the backfitting method contains dependencies which inhibit parallel execution, it turns out that parallelisation over the data leads to a viable method, akin to the bagging algorithm without replacement, which is known to have superior statistical properties in many cases. The second example considers radial basis function fitting with thin plate splines. Here the direct approach turns out to be non-scalable, but an approximation with finite elements is shown to be scalable and parallelises well. One of the most popular algorithms in data mining is MARS (Multivariate Adaptive Regression Splines); this is discussed in the third example. MARS has been modified to use a multiscale approach, and a parallel algorithm with a small granularity has been seen to give good results. The final example considers the current research area of sparse grids. Sparse grids take up many ideas from the previous examples and, in fact, can be considered as a generalisation of MARS and additive models. They are naturally parallel when the combination technique is used. We discuss limitations and improvements of the combination technique.
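To make the data-parallel idea behind the first example concrete, the sketch below partitions a data set into disjoint blocks, fits an independent model to each block in a separate process, and averages the results, which is the "bagging without replacement" structure the abstract refers to. This is a minimal illustration and not the paper's implementation: the choice of Python, ordinary least squares standing in for the additive-model fit, and the names fit_block, parallel_fit, and n_blocks are all assumptions made here for clarity.

    # Minimal sketch (assumed Python, not the paper's code) of data-parallel
    # fitting: split the data into disjoint blocks, fit one model per block
    # in parallel, and average the fitted coefficients.
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def fit_block(block):
        """Fit a least squares model on a single data block (X, y)."""
        X, y = block
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef

    def parallel_fit(X, y, n_blocks=4):
        """Partition (X, y) into disjoint blocks and fit them concurrently.

        Each block is a sample of the data drawn without replacement, so
        the averaged block models form a bagging-without-replacement
        ensemble.
        """
        blocks = list(zip(np.array_split(X, n_blocks),
                          np.array_split(y, n_blocks)))
        with ProcessPoolExecutor(max_workers=n_blocks) as pool:
            coefs = list(pool.map(fit_block, blocks))
        return np.mean(coefs, axis=0)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(10_000, 5))
        true_coef = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
        y = X @ true_coef + 0.1 * rng.normal(size=10_000)
        print(parallel_fit(X, y))  # should approximate true_coef

Because the block fits are independent, the work scales with the number of processors. Note that averaging per-block coefficients is only meaningful for models that are linear in their parameters; for general additive models one would instead average the per-block predictions.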

Citations
Posted Content

Parallel MARS Algorithm Based on B-splines

TL;DR: This work investigates one possible way of improving Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm, designed for flexible modelling of high-dimensional data, by using B-splines instead of truncated power basis functions.

The integrated delivery of large-scale data mining: the ACSys data mining project

TL;DR: The Australian Government's Cooperative Research Centre for Advanced Computational Systems (ACSys) links industry and research, focusing on the deployment of high-performance computers for data mining.
References
Book

Matrix Computations

Gene H. Golub, Charles F. Van Loan
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Journal Article

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.