
Showing papers on "Online analytical processing published in 1996"


Proceedings Article
03 Sep 1996
TL;DR: In this article, the authors present fast algorithms for computing a collection of group-bys for the CUBE operator, which is equivalent to the union of a number of standard group-by operations, and show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations.
Abstract: At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem - computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward methods.
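The CUBE definition above (one group-by per subset of the attribute list, 2^d in total) can be sketched naively as follows. The fact table, attribute names, and rows are made up for illustration; the paper's contribution is precisely to avoid running these 2^d group-bys independently by sharing sorts, hashes, and intermediate results:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical mini fact table: (product, region, sales)
rows = [
    ("pen", "east", 10),
    ("pen", "west", 5),
    ("ink", "east", 7),
]

def cube(rows, dims, measure_idx):
    """Naive CUBE: one independent SUM group-by per subset of `dims`.

    This is the 2^d-group-bys definition of the operator; the paper's
    algorithms instead share work across related group-bys.
    """
    results = {}
    d = len(dims)
    for k in range(d + 1):
        for subset in combinations(range(d), k):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in subset)  # project onto subset
                agg[key] += row[measure_idx]
            results[tuple(dims[i] for i in subset)] = dict(agg)
    return results

out = cube(rows, ["product", "region"], 2)
print(out[()])            # grand total over all rows
print(out[("product",)])  # group by product only
```

The empty subset yields the grand total, the full subset the finest-grained group-by; every other combination sits between them in the hierarchy the paper exploits.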

608 citations


Journal ArticleDOI
01 Sep 1996
TL;DR: This paper will show that an MDD provides significant advantages over a ROLAP, such as several orders of magnitude faster data retrieval, several times faster calculation, much less disk space, and less programming effort.
Abstract: Many people ask about the difference between implementing On-Line Analytical Processing (OLAP) with a Relational Database Management System (ROLAP) versus a Multidimensional Database (MDD). In this paper, we will show that an MDD provides significant advantages over a ROLAP, such as several orders of magnitude faster data retrieval, several orders of magnitude faster calculation, much less disk space, and less programming effort.

222 citations


Proceedings Article
01 Jan 1996
TL;DR: In this paper, the authors compare the work done in these two areas and argue for the support of a Statistical Object data type as one of the fundamental structures that object-oriented data models and systems should support.
Abstract: During the 1980’s there was a lot of activity in the area of Statistical Databases, focusing mostly on socioeconomic applications, such as census data, national production and consumption patterns, etc. In the 1990’s the area of On-Line Analytical Processing (OLAP) was introduced for the analysis of transaction-based business data, such as retail store transactions. Both areas deal with the representation and support of data in a multi-dimensional space. Much of the OLAP literature does not refer to the Statistical Database literature, perhaps because the connection between analyzing business data and socioeconomic data is not obvious. Furthermore, there are papers published in one area or the other whose results can be applied in both application areas. In this paper, we compare the work done in these two areas. We discuss concepts used in the conceptual modeling of the data and operations over them, efficient physical organization and access methods, as well as privacy issues. We point out the terminology used and the correspondence between terms. We identify which research aspects are emphasized in each of these areas and the reasons for that. We conclude by arguing for the support of a Statistical Object data type as one of the fundamental structures that object-oriented data models and systems should support.

188 citations


Book
06 Dec 1996
TL;DR: Leading practitioners of AI in the financial services industry present detailed, up-to-date coverage of all major AI techniques, including genetic algorithms, neural networks, rule-based systems, fuzzy logic, case-based systems and machine learning algorithms.
Abstract: Advanced techniques for transforming data into decisions. Today's knowledge-intensive decision support tools can make organizations dramatically more intelligent, by utilizing the information organizations already possess, taking advantage of today's increasingly widespread data mining and warehousing systems, and leveraging mature and powerful artificial intelligence technologies. This is the first book to cover these new approaches in depth, for both the business decision maker and the technologist. Authors Vasant Dhar and Roger Stein are leading practitioners of AI in the financial services industry. They present detailed, up-to-date coverage of all major AI techniques, including genetic algorithms, neural networks, rule-based systems, fuzzy logic, case-based systems and machine learning algorithms. They discuss advanced OLAP and data warehousing systems, and show how to select the most appropriate tool for each business challenge. The book includes detailed checklists of key organizational and technical issues to consider, several practical case studies, and a common methodology and analysis techniques.

149 citations


Patent
07 May 1996
TL;DR: This patent describes a neural intelligent mail query agent that includes an online analytical processing system for accessing and analyzing data in at least one database, a query-by-mail system coupled to the online analytical processing system for receiving and processing queries from users for information derived from databases, and a neural network for providing learning capabilities in response to remote mail queries.
Abstract: A neural intelligent mail query agent. The neural intelligent mail query agent includes an online analytical processing system for accessing and analyzing data in at least one database, a query-by-mail system coupled to the online analytical processing system for receiving and processing queries from users for information derived from databases, and a neural network coupled to the remote query-by-mail system for providing learning capabilities in response to the remote mail queries. An expert system provides inference functions and the neural network is trained using a data stream from the databases as it is generated by the received mail queries. The neural network reports intelligence abstracts to the query-by-mail system as well as reports and organizes new rules constructed from the intelligence abstracts and existing rules.

107 citations


Book
01 Apr 1996
TL;DR: Enterprise computing professionals will discover how to design and build an effective data warehousing system with Rob Mattison's well-organized, step-by-step plan.
Abstract: From the Publisher: Data warehousing is a key information management strategy for businesses that want to stay competitive into the next century. With this practical, information-packed sourcebook, enterprise computing professionals will discover how to design and build an effective data warehousing system. Author Rob Mattison provides a well-organized, step-by-step plan and offers a valuable, vendor-independent examination of the available tools and products. He explains the key concepts of business process reengineering, client/server technology, systems architecture, online analytical processing (OLAP), and decision support systems (DSS) that are critical to data warehousing.

56 citations


Proceedings ArticleDOI
18 Jun 1996
TL;DR: This work proposes new methods to deal with disk resident extendible arrays and introduces a new index data structure for keeping track of the extensions, and a performance analysis is conducted for array extension and retrievals.
Abstract: Online analytical processing (OLAP) is becoming increasingly important as today's organizations frequently make business decisions based on statistical analysis of their enterprise data. This data is multidimensional and is derived from transactional data using various levels of aggregation. As the business model changes frequently, the multidimensional arrays must be extended in terms of the value ranges of each dimension and even new dimensions. We propose new methods to deal with disk resident extendible arrays. A new index data structure for keeping track of the extensions is introduced, and a performance analysis is conducted for array extension and retrievals.
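As a rough in-memory illustration of the idea (not the paper's actual disk-resident index structure), an extendible array can grow along any dimension by allocating one new contiguous segment per extension and recording that segment in a small history index, so existing data never has to be relocated:

```python
class ExtendibleArray2D:
    """Toy 2-D extendible array; segment layout and index are illustrative.

    Each extension along a dimension allocates one contiguous segment
    covering the new index range (and the full current range of the
    other dimension), then records it in a history index. Every cell
    belongs to exactly one segment, so lookups never move data.
    """

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        # history index entries: (history counter, dim extended, origin, segment)
        self.segments = [(0, None, (0, 0), [[0] * cols for _ in range(rows)])]
        self.history = 1

    def extend(self, dim, count):
        if dim == 0:   # add `count` new rows spanning all current columns
            seg = [[0] * self.cols for _ in range(count)]
            self.segments.append((self.history, 0, (self.rows, 0), seg))
            self.rows += count
        else:          # add `count` new columns spanning all current rows
            seg = [[0] * count for _ in range(self.rows)]
            self.segments.append((self.history, 1, (0, self.cols), seg))
            self.cols += count
        self.history += 1

    def _locate(self, i, j):
        # scan the history index for the segment containing (i, j)
        for _h, _dim, (r0, c0), seg in self.segments:
            if r0 <= i < r0 + len(seg) and c0 <= j < c0 + len(seg[0]):
                return seg, i - r0, j - c0
        raise IndexError((i, j))

    def __setitem__(self, ij, v):
        seg, r, c = self._locate(*ij)
        seg[r][c] = v

    def __getitem__(self, ij):
        seg, r, c = self._locate(*ij)
        return seg[r][c]

a = ExtendibleArray2D(2, 2)
a[1, 1] = 5
a.extend(0, 1)   # business model change: new row range
a.extend(1, 2)   # and two new columns
a[2, 3] = 7
```

A real implementation would replace the linear scan with the paper's index structure and map segments to disk pages; the point here is only that extension is append-only.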

41 citations


Proceedings Article
02 Aug 1996
TL;DR: The work reported here is a summary of results appearing in the following two papers: V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Implementing data cubes efficiently," to appear in 1996 SIGMOD.
Abstract: Data cubes are specialized database management systems designed to support multidimensional data for such purposes as decision support and data mining. For a given mix of queries, we can optimize the implementation of a data cube by materializing some projections of the cube. A greedy approach turns out to be very effective; it is both polynomial-time as a function of the number of possible views to materialize and guaranteed to come close to the optimum choice of views. The work reported here is a summary of results appearing in the following two papers: V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Implementing data cubes efficiently." To appear in 1996 SIGMOD. An extended version is available by anonymous ftp from db.stanford.edu as pub/harinarayan/1995/cube.ps. H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Index selection for OLAP." Available by anonymous ftp from db.stanford.edu as pub/hgupta/1996/CubeIndex.ps.
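The greedy view-selection heuristic summarized above can be sketched on a toy lattice. The view names and row counts below are hypothetical (loosely modeled on the part/supplier/customer example in the Harinarayan et al. paper), and a view's benefit is taken as the total reduction in the cost of answering every query it can cover:

```python
# Toy lattice: view -> (row count, set of group-by queries it can answer).
# Sizes are made up for illustration; "psc" is the full base cube.
views = {
    "psc":  (6_000_000, {"psc", "pc", "ps", "sc", "p", "s", "c", "none"}),
    "pc":   (6_000_000, {"pc", "p", "c", "none"}),
    "ps":   (800_000,   {"ps", "p", "s", "none"}),
    "sc":   (6_000_000, {"sc", "s", "c", "none"}),
    "p":    (200_000,   {"p", "none"}),
    "s":    (10_000,    {"s", "none"}),
    "c":    (100,       {"c", "none"}),
    "none": (1,         {"none"}),
}

def greedy_select(views, root, k):
    """Greedily pick k views (beyond the root) maximizing total benefit."""
    # cost[q] = size of the cheapest already-materialized view answering q
    cost = {q: views[root][0] for q in views}
    chosen = [root]
    for _ in range(k):
        best, best_gain = None, 0
        for v, (size, answers) in views.items():
            if v in chosen:
                continue
            # benefit: summed cost reduction over all queries v covers
            gain = sum(max(cost[q] - size, 0) for q in answers)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            break       # no remaining view improves anything
        chosen.append(best)
        size = views[best][0]
        for q in views[best][1]:
            cost[q] = min(cost[q], size)
    return chosen

print(greedy_select(views, "psc", 2))
```

With these numbers the first pick is "ps" (it cheaply covers four queries) and the second is "c", illustrating the paper's point that the greedy choice is not simply the smallest view but the one with the largest aggregate benefit at each step.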

27 citations


Proceedings Article
04 Aug 1996
TL;DR: This talk will describe efforts to develop a new generation of data mining systems where users specify what to search for simply by providing the system with training examples, and letting the system automatically learn what to do, and then focus on two specific applications in scientific data analysis to illustrate the potential, limitations, challenges, and promise of KDD.
Abstract: (Overview of invited talk.) Knowledge Discovery in Databases (KDD) is a new field of research concerned with the extraction of high-level information (knowledge) from low-level data (usually stored in large databases) [1]. It is an area of interest to researchers and practitioners from many fields including: AI, statistics, pattern recognition, databases, visualization, and high-performance and parallel computing. The basic problem is to search databases for patterns or models that can be useful in accomplishing one or more goals. Examples of such goals include: prediction (e.g. regression and classification), descriptive or generative modeling (e.g. clustering), data summarization (e.g. report generation), or visualization of either data or extracted knowledge (e.g. to support decision making or exploratory data analysis). KDD is a process that includes many steps. Among these steps are: data preparation and cleaning, data selection and sampling, preprocessing and transformation, data mining to extract patterns and models, interpretation and evaluation of extracted information, and finally evaluation, rendering, or use of final extracted knowledge. Note that under this view, data mining constitutes one of the steps of the overall KDD process. The other steps are essential to make the application of data mining possible, and to make the results useful. Within data mining, methods for deriving patterns or extracting models originate from statistics, machine learning, statistical pattern recognition, uncertainty management, and database methods such as on-line analytical processing (OLAP) or association rules [2]. The process is typically highly interactive and may involve many iterations before useful knowledge is extracted from the underlying data.
This talk will give an overview and summary of the rapidly growing field of KDD, and then focus on two specific applications in scientific data analysis to illustrate the potential, limitations, challenges, and promise of KDD. An overview of the KDD process is given in [3]. Today’s science instruments are capable of gathering huge amounts of data, making traditional human-based comprehensive analysis an infeasible endeavor. This has been a primary motivation to develop tools to automate science data analysis tasks. The talk will describe efforts to develop a new generation of data mining systems where users specify what to search for simply by providing the system with training examples, letting the system automatically learn what to do. The system would then automatically sift through the data and catalog objects of interest for analysis purposes. The learn-from-example approach is a natural solution to a problem we call the query formulation problem in the exploration and analysis of image data [4]: How does one express a query for objects that are typically only recognized by visual intuition? Translating human visual intuition to pixel-level algorithmic constraints is a difficult problem. By asking the user to simply “show” the system examples of objects of interest, then letting the system figure out how to formulate the appropriate query, we believe the problem can be surmounted in certain circumstances. Two applications at JPL will be used to illustrate the learning techniques and their effects. The first targets automating the cataloging of sky objects in a digitized sky survey consisting of three terabytes of image data and containing on the order of two billion sky objects.
The Sky Image Cataloging and Analysis Tool (SKICAT) [5] allows for automated and accurate classification, enabling the automated cataloging of an estimated two billion sky objects, the majority of which are too faint for visual recognition by astronomers. This represents an instance where learning algorithms solved a significant and difficult scientific analysis problem. Several new results in astronomy have been achieved based on the SKICAT catalog [6]. Recent results of the application of SKICAT to help in the discovery of new objects in the Universe include the discovery of 16 new high-redshift quasars: some of the furthest and oldest objects detectable by today’s instruments [7]. The second system we describe is called JARtool (JPL Adaptive Recognition Tool) [8]. JARtool is being initially developed to detect and catalog an estimated one million small volcanoes (<15 km in diameter) visible in a database consisting of over 30,000 images of the planet Venus. The images were collected by the Magellan spacecraft using synthetic aperture radar (SAR) to penetrate the permanent gaseous cloud cover that obscures the planet’s surface in the optical range. Work at JPL’s Machine Learning Systems Group continues to extend data mining techniques to automate analysis in other areas of science including: cataloging of Sun spots, remote-sensing detection of earthquake faults [9], spatiotemporal analysis of atmospheric data, and others (see http://www-aig.jpl.nasa.gov/mls/ for descriptions of ongoing work).

24 citations




Proceedings ArticleDOI
26 Feb 1996
TL;DR: Future versions of the Oracle Server will provide an open and extensible framework for supporting complex data domains including, but not limited to, text, image, spatial, video, and OLAP.
Abstract: Future versions of the Oracle Server will provide an open and extensible framework for supporting complex data domains including, but not limited to, text, image, spatial, video, and OLAP. This framework encompasses features for defining, storing, updating, indexing, and retrieving complex forms of data with full transaction semantics. The underpinning for these features is an extended Oracle Server that is an object-relational database management system (ORDBMS).

Proceedings ArticleDOI
12 Nov 1996
TL;DR: It is argued for the support of a Statistical Object data type as one of the fundamental structures that object-oriented data models and systems should support.
Abstract: During the 1980’s there was a lot of activity in the area of Statistical Databases, focusing mostly on socioeconomic applications, such as census data, national production and consumption patterns, etc. In the 1990’s the area of On-Line Analytical Processing (OLAP) was introduced for the analysis of transaction-based business data, such as retail store transactions. Both areas deal with the representation and support of data in a multi-dimensional space. Much of the OLAP literature does not refer to the Statistical Database literature, perhaps because the connection between analyzing business data and socioeconomic data is not obvious. Furthermore, there are papers published in one area or the other whose results can be applied in both application areas. In this paper, we compare the work done in these two areas. We discuss concepts used in the conceptual modeling of the data and operations over them, efficient physical organization and access methods, as well as privacy issues. We point out the terminology used and the correspondence between terms. We identify which research aspects are emphasized in each of these areas and the reasons for that. We conclude by arguing for the support of a Statistical Object data type as one of the fundamental structures that object-oriented data models and systems should support.


Book
01 Jan 1996
TL;DR: This book discusses the power of relational databases, the expressiveness of objects, and how data warehousing and OLAP extend the relational model.
Abstract: 1. The Power of Relational Databases; 2. The Expressiveness of Objects; 3. Connecting Objects to Relational Databases; 4. Data Warehousing and OLAP: Extending the Relational Model; 5. Object/Relational Database Hybrids; 6. Object-Oriented Databases; 7. Reusable and Distributed Objects; 8. Sample Applications Using Objects; 9. Running an Object-Oriented Project; 10. Conclusion.