XG: a data-driven computation grid for enterprise-scale mining

doi:10.1007/11546924_81

Book Chapter

Radu Sion and four co-authors

pp. 828-837

TL;DR

The paper shows how an architecture that fuses the data and computation layers can deliver significant speedups for data processing jobs, such as analysis and mining over large data sets. This contrasts with existing Grid solutions, which interact with data layers mainly as external “storage”.

Abstract:

In this paper we introduce a novel architecture for data processing, based on a functional fusion between a data and a computation layer. We show how such an architecture can be leveraged to offer significant speedups for data processing jobs such as data analysis and mining over large data sets. One novel contribution of our solution is its data-driven approach: the computation infrastructure is controlled from within the data layer. Grid compute job submission events originate within the query processor on the DBMS side and are, in effect, controlled by the data processing job being performed. This allows the early deployment of on-the-fly data aggregation techniques, minimizing the amount of data transferred to and from compute nodes. It stands in stark contrast to existing Grid solutions, which interact with data layers mainly as external “storage”. We validate this in a scenario derived from a real business deployment, involving financial customer profiling using common types of data analytics (e.g., linear regression analysis). Experimental results show significant speedups. For example, using a grid of only 12 non-dedicated nodes, we observed a speedup of approximately 1000% in a scenario involving complex linear regression analysis data mining computations for commercial customer profiling.
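The idea of aggregating data early, inside the data layer, before shipping work to compute nodes can be illustrated with linear regression, the analytic used in the validation scenario. The following is a minimal sketch under stated assumptions, not the XG implementation: it assumes a single-predictor least-squares model and rows already partitioned across data nodes, and every name in it (`Moments`, `aggregate_partition`, `fit`, `fit_partitioned`) is hypothetical. Each partition is reduced to five additive sums next to the data, so only those sums, rather than the raw rows, would need to travel to a compute node.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Moments:
    """Hypothetical helper: sufficient statistics for fitting y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Moments") -> "Moments":
        # The sums are additive, so partial aggregates from any set of
        # partitions combine exactly, in any order.
        return Moments(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def aggregate_partition(rows: Iterable[Tuple[float, float]]) -> Moments:
    """Assumed to run next to the data: reduces a partition to five numbers."""
    m = Moments()
    for x, y in rows:
        m.n += 1
        m.sx += x
        m.sy += y
        m.sxx += x * x
        m.sxy += x * y
    return m


def fit(m: Moments) -> Tuple[float, float]:
    """Assumed to run on a compute node: solves the normal equations."""
    denom = m.n * m.sxx - m.sx ** 2
    if denom == 0:
        raise ValueError("predictor has zero variance")
    b = (m.n * m.sxy - m.sx * m.sy) / denom
    a = (m.sy - b * m.sx) / m.n
    return a, b


def fit_partitioned(partitions: List[List[Tuple[float, float]]]) -> Tuple[float, float]:
    """Merges per-partition aggregates, then fits once on the combined sums."""
    total = Moments()
    for part in partitions:
        total = total.merge(aggregate_partition(part))
    return fit(total)


if __name__ == "__main__":
    # Rows lie exactly on y = 1 + 2x, split across two hypothetical data nodes.
    print(fit_partitioned([[(0, 1), (1, 3)], [(2, 5), (3, 7)]]))  # (1.0, 2.0)
```

Because the sums are additive, the fitted coefficients do not depend on how rows are split across partitions. This property is what makes it safe to aggregate inside the data layer before any data moves.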

Citations

Journal Article

A grid-based approach for enterprise-scale data mining

Ramesh Natarajan and two co-authors

TL;DR: The goal of this paper is to describe an algorithmic decomposition of data mining kernels between the data storage and compute grids that makes it possible to exploit the parallelism on the respective grids in a simple way, while minimizing the data transfer between these grids.

Book Chapter

XG: a grid-enabled query processing engine

Radu Sion and three co-authors

TL;DR: By integrating scheduling intelligence into the data layer itself, the authors show that it is possible to provide a close-to-optimal solution to the more general grid trade-off between required data replication costs and computation speedup benefits.
