Journal Article

The cgmCUBE project: Optimizing parallel data cube generation for ROLAP

01 Jan 2006-Distributed and Parallel Databases (Kluwer Academic Publishers)-Vol. 19, Iss: 1, pp 29-62
TL;DR: This paper discusses the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP), and discusses new algorithmic and system optimizations relating to a thorough optimization of the underlying sequential cube construction method.
Abstract: On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour.
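For readers new to the term, the data cube mentioned in the abstract can be pictured as the set of all group-by aggregations over the dimension attributes, so d dimensions yield 2^d views. The following is a minimal in-memory Python sketch of that idea only. It is not the cgmCUBE algorithm, and the sample rows, the column names, and the `build_cube` helper are hypothetical illustrations.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Naively compute every group-by (view) of a small fact table.

    Returns a dict mapping each subset of `dimensions` (as a tuple) to a
    dict of {group key: summed measure}. Illustrative only: it rescans
    the input for every view and shares no sorts or partial aggregates,
    which an optimized cube generator would do.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for view in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in view)
                totals[key] += row[measure]
            cube[view] = dict(totals)
    return cube


# Hypothetical sample data: three dimensions and one measure.
sales = [
    {"store": "A", "product": "pen", "year": 2005, "amount": 3},
    {"store": "A", "product": "ink", "year": 2005, "amount": 5},
    {"store": "B", "product": "pen", "year": 2006, "amount": 7},
]

cube = build_cube(sales, ("store", "product", "year"), "amount")
print(len(cube))         # number of views: 2**3
print(cube[("store",)])  # totals per store
print(cube[()])          # grand total over all rows
```

Because the number of views doubles with every added dimension, a naive rescan like this quickly becomes impractical, which is the cost that sequential optimizations and parallel load balancing aim to reduce.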
Citations
01 Jan 2002

9,314 citations

Proceedings Article
28 Oct 2013
TL;DR: Open problems and actual research trends in the field of Data Warehousing and OLAP over Big Data are highlighted and several novel research directions arising in this field are derived.
Abstract: In this paper, we highlight open problems and actual research trends in the field of Data Warehousing and OLAP over Big Data, an emerging term in Data Warehousing and OLAP research. We also derive several novel research directions arising in this field, and put emphasis on possible contributions to be achieved by future research efforts.

120 citations

Proceedings Article
09 Oct 2013
TL;DR: Three important aspects of Big Data research are discussed, namely OLAP over Big Data, Big Data Posting, and Privacy of Big Data, and future research directions are depicted, hence implicitly defining a research agenda aiming at leading future challenges in this research field.
Abstract: Recently, a great deal of interest for Big Data has risen, mainly driven from a widespread number of research problems strongly related to real-life applications and systems, such as representing, modeling, processing, querying and mining massive, distributed, large-scale repositories (mostly being of unstructured nature). Inspired by this main trend, in this paper we discuss three important aspects of Big Data research, namely OLAP over Big Data, Big Data Posting, and Privacy of Big Data. We also depict future research directions, hence implicitly defining a research agenda aiming at leading future challenges in this research field.

113 citations


Cites background from "The cgmCUBE project: Optimizing par..."

  • ...Here, scientists and researchers produce huge amounts of data per-day via experiments (e.g., think of disciplines like high-energy physics, astronomy, biology, bio-medicine, and so forth) but extracting useful knowledge for decision making purposes from these massive, large-scale data…...


Journal Article
TL;DR: 21.1 I/O Errors; 21.2 Files and Handles; 21.3 Opening and Closing Files; 21.4 Determining the Size of a File; 21.5 Detecting the End of Input; 21.6 Buffering Operations; 21.7 Repositioning Handles; 21.9 Text Input and Output; 21.10 Examples; 21.11 Library IO
Abstract: 21.1 I/O Errors; 21.2 Files and Handles; 21.3 Opening and Closing Files; 21.4 Determining the Size of a File; 21.5 Detecting the End of Input; 21.6 Buffering Operations; 21.7 Repositioning Handles; 21.8 Handle Properties; 21.9 Text Input and Output; 21.10 Examples; 21.11 Library IO

60 citations

Journal Article
01 Oct 2009
TL;DR: This paper introduces a very effective compression technique for multidimensional data cubes, and the system Hand-OLAP, which exploits this technique to allow handheld devices to extract and browse compressed two-dimensional OLAP views coming from multidimensional data cubes stored on a remote OLAP server localized on the wired network.
Abstract: The main drawbacks of handheld devices (small storage space, small size of the display screen, discontinuance of the connection to the WLAN etc) are often incompatible with the need of querying and browsing information extracted from enormous amounts of data which are accessible through the network. In this application scenario, data compression and summarization have a leading role: data in a lossy compressed format can be transmitted more efficiently than the original ones, and can be effectively stored in handheld devices (setting the compression ratio accordingly). In this paper, we introduce a very effective compression technique for multidimensional data cubes, and the system Hand-OLAP, which exploits this technique to allow handheld devices to extract and browse compressed two-dimensional OLAP views coming from multidimensional data cubes stored on a remote OLAP server localized on the wired network. Hand-OLAP effectively and efficiently enables OLAP in mobile environments, and also enlarges the potentialities of Decision Support Systems by taking advantage from the "naturally" decentralized nature of such environments. The idea which the system is based on is: rather than querying the original multidimensional data cubes, it may be more convenient to generate a compressed OLAP view of them, store such view into the handheld device, and query it locally (off-line), thus obtaining approximate answers that are suitable for OLAP applications.

57 citations


Cites background from "The cgmCUBE project: Optimizing par..."

  • ...By looking at the active literature, a possible research direction could be the one drawn by recent distributed and parallel data cube compression methodologies (e.g., Dehne et al. 2001, 2004)....


References
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Book
01 Jan 1990
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Abstract: From the Publisher: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures. Like the first edition, this text can also be used for self-study by technical professionals since it discusses engineering issues in algorithm design as well as the mathematical aspects. In its new edition, Introduction to Algorithms continues to provide a comprehensive introduction to the modern study of algorithms. The revision has been updated to reflect changes in the years since the book's original publication. New chapters on the role of algorithms in computing and on probabilistic analysis and randomized algorithms have been included. Sections throughout the book have been rewritten for increased clarity, and material has been added wherever a fuller explanation has seemed useful or new information warrants expanded coverage. As in the classic first edition, this new edition of Introduction to Algorithms presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers. Further, the algorithms are presented in pseudocode to make the book easily accessible to students from all programming language backgrounds. Each chapter presents an algorithm, a design technique, an application area, or a related topic. The chapters are not dependent on one another, so the instructor can organize his or her use of the book in the way that best suits the course's needs. Additionally, the new edition offers a 25% increase over the first edition in the number of problems, giving the book 155 problems and over 900 exercises that reinforce the concepts the students are learning.

21,651 citations

01 Jan 2005

19,250 citations

01 Jan 1950
TL;DR: A First Course in Probability (8th ed.) by S. Ross is a lively text that covers the basic ideas of probability theory including those needed in statistics.
Abstract: Office hours: MWF, immediately after class or early afternoon (time TBA). We will cover the mathematical foundations of probability theory. The basic terminology and concepts of probability theory include: random experiments, sample or outcome spaces (discrete and continuous case), events and their algebra, probability measures, conditional probability A First Course in Probability (8th ed.) by S. Ross. This is a lively text that covers the basic ideas of probability theory including those needed in statistics. Theoretical concepts are introduced via interesting concrete examples. In 394 I will begin my lectures with the basics of probability theory in Chapter 2. However, your first assignment is to review Chapter 1, which treats elementary counting methods. They are used in applications in Chapter 2. I expect to cover Chapters 2-5 plus portions of 6 and 7. You are encouraged to read ahead. In lectures I will not be able to cover every topic and example in Ross, and conversely, I may cover some topics/examples in lectures that are not treated in Ross. You will be responsible for all material in my lectures, assigned reading, and homework, including supplementary handouts if any.

10,221 citations


"The cgmCUBE project: Optimizing par..." refers methods in this paper

  • ...While we have developed our own method during the course of implementing the larger system, the spirit (and results) are very similar to those described in [11]....
