
Showing papers on "Online analytical processing published in 2001"


Book ChapterDOI
12 Jul 2001
TL;DR: An ad-hoc grouping hierarchy, based on the spatial index at the finest spatial granularity, is constructed and incorporated into the lattice model, and efficient methods to process arbitrary aggregations are presented.
Abstract: Spatial databases store information about the position of individual objects in space. In many applications however, such as traffic supervision or mobile communications, only summarized data, like the number of cars in an area or phones serviced by a cell, is required. Although this information can be obtained from transactional spatial databases, its computation is expensive, rendering online processing inapplicable. Driven by the non-spatial paradigm, spatial data warehouses can be constructed to accelerate spatial OLAP operations. In this paper we consider the star-schema and we focus on the spatial dimensions. Unlike the non-spatial case, the groupings and the hierarchies can be numerous and unknown at design time, therefore the well-known materialization techniques are not directly applicable. In order to address this problem, we construct an ad-hoc grouping hierarchy based on the spatial index at the finest spatial granularity. We incorporate this hierarchy in the lattice model and present efficient methods to process arbitrary aggregations. We finally extend our technique to moving objects by employing incremental update methods.

367 citations


Patent
03 Apr 2001
TL;DR: In this article, the authors present an automated method for deriving OLAP dimensions from the normalized relational table and from the results of the OLAP measures derivation.
Abstract: A Relational Database Management System (RDBMS) having any arbitrary structure is translated into a multi-dimensional data model suitable for performing OLAP operations upon. If a relational table defining the relational model includes any tables with cardinality of 1,1 or 0,1, the tables are merged into a single table. If the relational table is not normalized, then normalization is performed and a relationship between the original table and the normalized table is created. If the relational table is normalized, but not by dependence between columns, such as in the dimension table in a snowflake schema, the normalization process is performed using the foreign key in order to generate the normalized table. Once the normalized table is generated, OLAP measures are derived from the normalized relational table by an automated method. In addition, OLAP dimensions are derived from the normalized relational table and the results of the OLAP measures derivation by an automated method according to the present invention. According to an aspect, it is possible to associate a member of a dimension to another member of the same or another dimension. According to another aspect, it is possible to create a new dimension of analysis, the members of which are all the different values that a scalar expression can take on. According to yet another aspect, it is possible to access the various instances of a Reporting Object as members in an OLAP dimension. According to a further aspect, it is possible to apply opaque filters or a combination of them to the data that underlies analysis.

344 citations


Journal ArticleDOI
TL;DR: Multidimensional database technology will increasingly be applied where analysis results are fed directly into other systems, thereby eliminating humans from the loop; when coupled with the need for continuous updates, this poses stringent performance requirements not met by current technology.
Abstract: Multidimensional database technology is a key factor in the interactive analysis of large amounts of data for decision making purposes. In contrast to previous technologies, these databases view data as multidimensional cubes that are particularly well suited for data analysis. Multidimensional models categorize data either as facts with associated numerical measures or as textual dimensions that characterize the facts. Queries aggregate measure values over a range of dimension values to provide results such as total sales per month of a given product. Multidimensional database technology is being applied to distributed data and to new types of data that current technology often cannot adequately analyze. For example, classic techniques such as preaggregation cannot ensure fast query response times when data-such as that obtained from sensors or GPS-equipped moving objects-changes continuously. Multidimensional database technology will increasingly be applied where analysis results are fed directly into other systems, thereby eliminating humans from the loop. When coupled with the need for continuous updates, this context poses stringent performance requirements not met by current technology.
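The fact/dimension/measure structure described in the abstract can be sketched in a few lines. The fact rows, dimension names, and `aggregate` helper below are hypothetical illustrations of the model, not any particular product's API:

```python
from collections import defaultdict

# Hypothetical fact table: each fact carries dimension values
# (product, month) and a numeric measure (sales).
facts = [
    {"product": "widget", "month": "2001-01", "sales": 120.0},
    {"product": "widget", "month": "2001-01", "sales": 80.0},
    {"product": "widget", "month": "2001-02", "sales": 50.0},
    {"product": "gadget", "month": "2001-01", "sales": 200.0},
]

def aggregate(facts, dims, measure):
    """Sum a measure over all facts, grouped by the given dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dims)
        totals[key] += fact[measure]
    return dict(totals)

# "Total sales per month of a given product", as in the abstract:
per_product_month = aggregate(facts, ["product", "month"], "sales")
```

Pre-aggregation, which the abstract notes fails for continuously changing data, amounts to caching the output of calls like this one.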

303 citations


Journal ArticleDOI
TL;DR: An approach is proposed that provides a theoretical foundation for the use of object-oriented databases and object-relational databases in data warehouse, multidimensional database, and online analytical processing applications, and introduces a set of minimal constraints and extensions to the Unified Modeling Language for representing multidimensional modeling properties for these applications.
Abstract: The authors propose an approach that provides a theoretical foundation for the use of object-oriented databases and object-relational databases in data warehouse, multidimensional database, and online analytical processing applications. This approach introduces a set of minimal constraints and extensions to the Unified Modeling Language for representing multidimensional modeling properties for these applications. Multidimensional modeling offers two benefits. First, the model closely parallels how data analyzers think and, therefore, helps users understand data. Second, multidimensional modeling helps predict what final users want to do, thereby facilitating performance improvements. The authors are using their approach to create an automatic implementation of a multidimensional model. They plan to integrate commercial online-analytical-processing tool facilities within their GOLD model case tool as well, a task that involves data warehouse prototyping and sample data generation issues.

298 citations


Journal ArticleDOI
TL;DR: The data model and query evaluation techniques discussed in this paper can be implemented using relational database technology and are also capable of exploiting multidimensional query processing techniques such as pre-aggregation.

275 citations


Book
01 Jan 2001
TL;DR: This book discusses the design and management of data warehousing, and the role of metadata in this process.
Abstract: Foreword. Preface. PART 1: OVERVIEW AND CONCEPTS. The Compelling Need for Data Warehousing. Data Warehouse: The Building Blocks. Trends in Data Warehousing. PART 2: PLANNING AND REQUIREMENTS. Planning and Project Management. Defining the Business Requirements. Requirements as the Driving Force for Data Warehousing. PART 3: ARCHITECTURE AND INFRASTRUCTURE. The Architectural Components. Infrastructure as the Foundation for Data Warehousing. The Significant Role of Metadata. PART 4: DATA DESIGN AND DATA PREPARATION. Principles of Dimensional Modeling. Dimensional Modeling: Advanced Topics. Data Extraction, Transformation, and Loading. Data Quality: A Key to Success. PART 5: INFORMATION ACCESS AND DELIVERY. Matching Information to the Classes of Users. OLAP in the Data Warehouse. Data Warehousing and the Web. Data Mining Basics. PART 6: IMPLEMENTATION AND MAINTENANCE. The Physical Design Process. Data Warehouse Deployment. Growth and Maintenance. Appendix A: Project Life Cycle Steps and Checklists. Appendix B: Critical Factors for Success. Appendix C: Guidelines for Evaluating Vendor Solutions. References. Glossary. Index.

214 citations


Patent
31 May 2001
TL;DR: In this paper, improved techniques for generating multi-dimensional output using a relational source database are disclosed; the techniques allow generation of the instructions needed to access the source database in order to produce multidimensional output, and these instructions do not need to be stored and can be generated dynamically.
Abstract: Improved techniques for generating multidimensional output using a relational source database are disclosed. The techniques allow generation of the instructions needed to access the source database in order to produce multidimensional output. The instructions do not need to be stored and can be generated dynamically. As a result, relational databases can be accessed without requiring additional programming and/or changes to the relational database. The source database can be a relational database that is accessed by a variety of conceptual accessing techniques (e.g., SQL). A Meta-data manager can be used to access the source database, as well as interact with a Meta-data modeler and a Meta-data outliner. The Meta-data modeler can be used to define a Meta-model. The Meta-data outliner can be used to create one or more Meta-outlines. The Meta-model and Meta-outline, together, provide a mechanism for describing the semantics of relational and OLAP models in a way that bi-directional access to physical structures in databases can be accomplished without having to rely on 'hard-coded' data values.

211 citations


Patent
28 Feb 2001
TL;DR: In this article, a stand-alone aggregation server for multidimensional databases (MDDBs) is presented, which can uniformly distribute data elements among a plurality of processors for balanced loading and processing, and therefore is highly scalable.
Abstract: Improved method of and apparatus for aggregating data elements in multidimensional databases (MDDB). In one aspect of the present invention, the apparatus is realized in the form of a high-performance stand-alone (i.e. external) aggregation server which can be plugged into conventional OLAP systems to achieve significant improvements in system performance. In accordance with the principles of the present invention, the stand-alone aggregation server contains a scalable MDDB and a high-performance aggregation engine that are integrated into the modular architecture of the aggregation server. The stand-alone aggregation server of the present invention can uniformly distribute data elements among a plurality of processors, for balanced loading and processing, and therefore is highly scalable. The stand-alone aggregation server of the present invention can be used to realize (i) an improved MDDB for supporting on-line analytical processing (OLAP) operations, (ii) an improved Internet URL Directory for supporting on-line information searching operations by Web-enabled client machines, as well as (iii) diverse types of MDDB-based systems for supporting real-time control of processes in response to complex states of information reflected in the MDDB. In another aspect of the present invention, the apparatus is integrated within a database management system (DBMS). The improved DBMS can be used to achieve a significant increase in system performance (e.g. decreased access/search time), user flexibility and ease of use. The improved DBMS system of the present invention can be used to realize an improved Data Warehouse for supporting on-line analytical processing (OLAP) operations or to realize an improved informational database system, operational database system, or the like.

191 citations


Journal ArticleDOI
Charu C. Aggarwal1, Philip S. Yu
TL;DR: The problem of online mining of association rules in a large database of sales transactions is discussed, with the use of nonredundant association rules helping significantly in the reduction of irrelevant noise in the data mining process.
Abstract: We discuss the problem of online mining of association rules in a large database of sales transactions. The online mining is performed by preprocessing the data effectively in order to make it suitable for repeated online queries. We store the preprocessed data in such a way that online processing may be done by applying a graph theoretic search algorithm whose complexity is proportional to the size of the output. The result is an online algorithm which is independent of the size of the transactional data and the size of the preprocessed data. The algorithm is almost instantaneous in the size of the output. The algorithm also supports techniques for quickly discovering association rules from large itemsets. The algorithm is capable of finding rules with specific items in the antecedent or consequent. These association rules are presented in a compact form, eliminating redundancy. The use of nonredundant association rules helps significantly in the reduction of irrelevant noise in the data mining process.

128 citations


Proceedings ArticleDOI
29 Nov 2001
TL;DR: It is shown that the e-commerce domain can provide all the right ingredients for successful data mining and an integrated architecture for supporting this integration is described, which can dramatically reduce the pre-processing, cleaning, and data understanding effort in knowledge discovery projects.
Abstract: We show that the e-commerce domain can provide all the right ingredients for successful data mining. We describe an integrated architecture for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We emphasize the need for data collection at the application server layer (not the Web server) in order to support logging of data and metadata that is essential to the discovery process. We describe the data transformation bridges required from the transaction processing systems and customer event streams (e.g., clickstreams) to the data warehouse. We detail the mining workbench, which needs to provide multiple views of the data through reporting, data mining algorithms, visualization, and OLAP. We conclude with a set of challenges.

125 citations


Proceedings Article
11 Sep 2001
TL;DR: A new operator is proposed that can automatically generalize from a specific problem case in detailed data and return the broadest context in which the problem occurs, summarized compactly as all possible maximal generalizations along various roll-up paths around the case.
Abstract: In this paper we propose a new operator for advanced exploration of large multidimensional databases. The proposed operator can automatically generalize from a specific problem case in detailed data and return the broadest context in which the problem occurs. Such a functionality would be useful to an analyst who, after observing a problem case, say a drop in sales for a product in a store, would like to find the exact scope of the problem. With existing tools he would have to manually search around the problem tuple trying to draw a pattern. This process is both tedious and imprecise. Our proposed operator can automate these manual steps and return in a single step a compact and easy-to-interpret summary of all possible maximal generalizations along various roll-up paths around the case. We present a flexible cost-based framework that can generalize various kinds of behaviour (not simply drops) while requiring little additional customization from the user. We design an algorithm that can work efficiently on large multidimensional hierarchical data cubes so as to be usable in an interactive setting.
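A minimal sketch of the core idea (not the paper's cost-based algorithm): starting from a problem cell, roll up along a single hypothetical `store -> city -> region -> ALL` path for as long as the aggregated drop persists. All names and figures below are invented for illustration:

```python
# Hypothetical single roll-up path: store -> city -> region -> ALL.
parent = {"store1": "city_A", "city_A": "region_X", "region_X": "ALL"}

# Sales change per node, already aggregated per level (negative = drop).
change = {"store1": -10, "city_A": -25, "region_X": -3, "ALL": +40}

def broadest_drop(cell):
    """Climb the hierarchy while the aggregated drop still holds,
    returning the highest ancestor (inclusive) that still shows it."""
    best = cell
    node = cell
    while node in parent:
        node = parent[node]
        if change.get(node, 0) < 0:
            best = node
        else:
            break
    return best
```

The operator in the paper does this over all roll-up paths at once and returns the maximal generalizations; this sketch handles only one path.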

01 Jan 2001
TL;DR: The central concept of the model is the Multidimensional Aggregation Cube, which gives a broad and flexible definition to the notion of a multidimensional cube, and it is shown that MAC offers a unique combination of modeling skills.
Abstract: In this paper we address the issue of conceptual modeling of data used in multidimensional analysis. We view the problem from the end-user point of view and we describe a set of requirements for the conceptual modeling of real-world OLAP scenarios. Based on those requirements we then define a new conceptual model that intends to capture the static properties of the involved information. In its definition we use a minimal set of well-understood OLAP concepts like dimensions, levels, hierarchies, measures and cubes. The central concept of the model is the Multidimensional Aggregation Cube (MAC), which gives a broad and flexible definition to the notion of a multidimensional cube. We evaluate our model against other existing multidimensional models and show that MAC offers a unique combination of modeling skills. Our main contribution is the definition of the basic concepts of our model; the set of requirements and the evaluation of all related models against those requirements represent an additional result.

Journal ArticleDOI
01 Dec 2001
TL;DR: Today's data warehouse and OLAP systems offer little support to automate decision tasks that occur frequently and for which well-established decision procedures are available; such functionality can be provided by extending the conventional data warehouse architecture with analysis rules.
Abstract: Conventional data warehouses are passive. All tasks related to analysing data and making decisions must be carried out manually by analysts. Today's data warehouse and OLAP systems offer little support to automate decision tasks that occur frequently and for which well-established decision procedures are available. Such functionality can be provided by extending the conventional data warehouse architecture with analysis rules, which mimic the work of an analyst during decision making. Analysis rules extend the basic event/condition/action (ECA) rule structure with mechanisms to analyse data multidimensionally and to make decisions. The resulting architecture is called active data warehouse.

Book ChapterDOI
16 Nov 2001
TL;DR: This paper proposes to extend traditional two-dimensional user/item recommender systems to support multiple dimensions, as well as comprehensive profiling and hierarchical aggregation (OLAP) capabilities.
Abstract: In this paper, we present a new data-warehousing-based approach to recommender systems. In particular, we propose to extend traditional two-dimensional user/item recommender systems to support multiple dimensions, as well as comprehensive profiling and hierarchical aggregation (OLAP) capabilities. We also introduce a new recommendation query language RQL that can express complex recommendations taking into account the proposed extensions. We describe how these extensions are integrated into a framework that facilitates more flexible and comprehensive user interactions with recommender systems.

Journal ArticleDOI
18 Jul 2001
TL;DR: This paper presents an approach to specification of OLAP DBs based on web data, using Unified Modeling Language (UML) as a basis for so-called UML snowflake diagrams that precisely capture the multidimensional structure of the data.
Abstract: On-Line Analytical Processing (OLAP) enables analysts to gain insight into data through fast and interactive access to a variety of possible views on information, organized in a dimensional model. The demand for data integration is rapidly becoming larger as more and more information sources appear in modern enterprises. In the data warehousing approach, selected information is extracted in advance and stored in a repository. This approach is used because of its high performance. However, in many situations a logical (rather than physical) integration of data is preferable. Previous Web-based data integration efforts have focused almost exclusively on the logical level of data models, creating a need for techniques focused on the conceptual level. Also, previous integration techniques for Web-based data have not addressed the special needs of OLAP tools such as handling dimensions with hierarchies. Extensible Markup Language (XML) is fast becoming the new standard for data representation and exchange on the World Wide Web. The rapid emergence of XML data on the Web, e.g., business-to-business (B2B) e-commerce, is making it necessary for OLAP and other data analysis tools to handle XML data as well as traditional data formats. Based on a real-world case study, the paper presents an approach to the conceptual specification of OLAP DBs based on Web data. Unlike previous work, this approach takes special OLAP issues such as dimension hierarchies and correct aggregation of data into account. Additionally, an integration architecture that allows the logical integration of XML and relational data sources for use by OLAP tools is presented.

Journal ArticleDOI
TL;DR: This article presents DynaMat, a system that manages dynamic collections of materialized aggregate views in a data warehouse, and shows how to derive an efficient update plan with respect to the available maintenance window, the different update policies for the views and the dependencies that exist among them.
Abstract: Materialized aggregate views represent a set of redundant entities in a data warehouse that are frequently used to accelerate On-Line Analytical Processing (OLAP). Due to the complex structure of the data warehouse and the different profiles of the users who submit queries, there is need for tools that will automate and ease the view selection and management processes. In this article we present DynaMat, a system that manages dynamic collections of materialized aggregate views in a data warehouse. At query time, DynaMat utilizes a dedicated disk space for storing computed aggregates that are further engaged for answering new queries. Queries are executed independently or can be bundled within a multiquery expression. In the latter case, we present an execution mechanism that exploits dependencies among the queries and the materialized set to further optimize their execution. During updates, DynaMat reconciles the current materialized view selection and refreshes the most beneficial subset of it within a given maintenance window. We show how to derive an efficient update plan with respect to the available maintenance window, the different update policies for the views and the dependencies that exist among them.
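One building block behind this kind of view management is derivability: a query grouped on a set of dimensions can be answered from a cached aggregate view whose grouping attributes form a superset, by rolling the view up further. The sketch below is a simplified illustration of that test, not DynaMat's actual selection or update policy:

```python
def answerable_from(query_dims, cached_views):
    """Return a cached view that can answer a query grouped on
    query_dims: any view whose grouping attributes are a superset
    can simply be rolled up further. Returns None if no view fits."""
    q = set(query_dims)
    for view in cached_views:
        if q <= set(view):
            return view
    return None

# Hypothetical cache of materialized group-bys:
cached = [("product", "month"), ("region",)]
```

DynaMat additionally tracks such dependencies among the cached views themselves to pick the most beneficial subset to refresh within the maintenance window.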

Proceedings ArticleDOI
09 Nov 2001
TL;DR: This work presents a technique that automates cube design given the data warehouse, functional dependency information, and sample OLAP queries expressed in the general form, and constructs complete but minimal cubes with low risks related to sparsity and incorrect aggregations.
Abstract: An On-Line Analytical Processing (OLAP) user often follows a train of thought, posing a sequence of related queries against the data warehouse. Although their details are not known in advance, the general form of those queries is apparent beforehand. Thus, the user can outline the relevant portion of the data posing generalised queries against a cube representing the data warehouse. Since existing OLAP design methods are not suitable for non-professionals, we present a technique that automates cube design given the data warehouse, functional dependency information, and sample OLAP queries expressed in the general form. The method constructs complete but minimal cubes with low risks related to sparsity and incorrect aggregations. After the user has given queries, the system will suggest a cube design. The user can accept it or improve it by giving more queries. The method is also suitable for improving existing cubes using respective real MDX queries.

Proceedings ArticleDOI
01 May 2001
TL;DR: This paper investigates the approach of using low cost PC cluster to parallelize the computation of iceberg-cube queries and recommends a “recipe” which uses PT as the default algorithm, but may also deploy ASL under specific circumstances.
Abstract: In this paper, we investigate the approach of using a low-cost PC cluster to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets where it is assumed that the total cube has not been precomputed. The algorithmic space we explore considers trade-offs between parallelism, computation and I/O. Our main contribution is the development and a comprehensive evaluation of various novel, parallel algorithms. Specifically: (1) Algorithm RP is a straightforward parallel version of BUC [BR99]; (2) Algorithm BPP attempts to reduce I/O by outputting results in a more efficient way; (3) Algorithm ASL, which maintains cells in a cuboid in a skiplist, is designed to put the utmost priority on load balancing; and (4) alternatively, Algorithm PT load-balances by using binary partitioning to divide the cube lattice as evenly as possible. We present a thorough performance evaluation of all these algorithms on a variety of parameters, including the dimensionality of the cube, the sparseness of the cube, the selectivity of the constraints, the number of processors, and the size of the dataset. A key finding is that it is not a one-algorithm-fits-all situation. We recommend a “recipe” which uses PT as the default algorithm, but may also deploy ASL under specific circumstances.
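For reference, an iceberg-cube query keeps only the aggregate cells meeting a support threshold. The naive sequential sketch below (invented data, not the paper's code) enumerates every group-by; BUC and the parallel algorithms above exist precisely to prune and distribute this work:

```python
from collections import Counter
from itertools import combinations

def iceberg_cube(rows, dims, minsup):
    """Naive sequential iceberg cube: for every non-empty subset of
    dimensions, keep only cells whose support meets the threshold."""
    result = {}
    for k in range(1, len(dims) + 1):
        for sub in combinations(dims, k):
            counts = Counter(tuple(r[d] for d in sub) for r in rows)
            for cell, support in counts.items():
                if support >= minsup:
                    result[(sub, cell)] = support
    return result

rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 2}]
cube = iceberg_cube(rows, ("a", "b"), minsup=2)
```

On a d-dimensional dataset this enumerates 2^d - 1 group-bys, which is why pruning (support is anti-monotone as dimensions are added) and parallel partitioning matter.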

Journal Article
TL;DR: This article focuses on one analysis, which was performed by a team of physicians and computer science researchers, using a commercially available on-line analytical processing (OLAP) tool in conjunction with proprietary data mining techniques developed by HAL researchers.
Abstract: Healthcare provider organizations are faced with a rising number of financial pressures. Both administrators and physicians need help analyzing large numbers of clinical and financial data when making decisions. To assist them, Rush-Presbyterian-St. Luke's Medical Center and Hitachi America, Ltd. (HAL), Inc., have partnered to build an enterprise data warehouse and perform a series of case study analyses. This article focuses on one analysis, which was performed by a team of physicians and computer science researchers, using a commercially available on-line analytical processing (OLAP) tool in conjunction with proprietary data mining techniques developed by HAL researchers. The initial objective of the analysis was to discover how to use data mining techniques to make business decisions that can influence cost, revenue, and operational efficiency while maintaining a high level of care. Another objective was to understand how to apply these techniques appropriately and to find a repeatable method for analyzing data and finding business insights. The process used to identify opportunities and effect changes is described.

Book ChapterDOI
12 Jul 2001
TL;DR: The properties of topological relationships between 2D spatial objects are analyzed with respect to pre-aggregation, showing why traditional pre-aggregation techniques do not work in this setting; this knowledge is then used to significantly extend previous work on pre-aggregation for irregular data structures.
Abstract: Data warehouses are becoming increasingly popular in the spatial domain, where they are used to analyze large amounts of spatial information for decision-making purposes. The data warehouse must provide very fast response times if popular analysis tools such as On-Line Analytical Processing (OLAP) [2] are to be applied successfully. In order for the data analysis to have an adequate performance, pre-aggregation, i.e., pre-computation of partial query answers, is used to speed up query processing. Normally, the data structures in the data warehouse have to be very "well-behaved" in order for pre-aggregation to be feasible. However, this is not the case in many spatial applications. In this paper, we analyze the properties of topological relationships between 2D spatial objects with respect to pre-aggregation and show why traditional pre-aggregation techniques do not work in this setting. We then use this knowledge to significantly extend previous work on pre-aggregation for irregular data structures to also cover special spatial issues such as partially overlapping areas.

Proceedings ArticleDOI
02 Apr 2001
TL;DR: The MD-join provides a clean separation between group definition and aggregate computation, allowing great flexibility in the expression of OLAP queries, and several algebraic transformations that allow relational algebra queries that include MD-joins to be optimized.
Abstract: OLAP queries (i.e. group-by or cube-by queries with aggregation) have proven to be valuable for data analysis and exploration. Many decision support applications need very complex OLAP queries, requiring a fine degree of control over both the group definition and the aggregates that are computed. For example, suppose that the user has access to a data cube whose measure attribute is Sum(Sales). Then the user might wish to compute the sum of sales in New York and the sum of sales in California for those data cube entries in which Sum(Sales)>$1,000,000. This type of complex OLAP query is often difficult to express and difficult to optimize using standard relational operators (including standard aggregation operators). In this paper, we propose the MD-join operator for complex OLAP queries. The MD-join provides a clean separation between group definition and aggregate computation, allowing great flexibility in the expression of OLAP queries. In addition, the MD-join has a simple and easily optimizable implementation, while the equivalent relational algebra expression is often complex and difficult to optimize. We present several algebraic transformations that allow relational algebra queries that include MD-joins to be optimized.
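The example query from the abstract can be phrased in two explicit steps, mirroring MD-join's separation of group definition from aggregate computation. The table and column names below are hypothetical, and this Python sketch only illustrates the semantics, not the operator's implementation:

```python
# Hypothetical base table of sales facts.
sales = [
    {"product": "p1", "state": "NY", "amount": 700_000},
    {"product": "p1", "state": "CA", "amount": 600_000},
    {"product": "p2", "state": "NY", "amount": 300_000},
]

# Step 1: group definition -- one group per product.
groups = sorted({r["product"] for r in sales})

# Step 2: aggregate computation -- several aggregates per group over
# the base table, then the HAVING-style filter from the example.
def md_join_example(groups, rows):
    out = {}
    for g in groups:
        mine = [r for r in rows if r["product"] == g]
        out[g] = {
            "total": sum(r["amount"] for r in mine),
            "ny": sum(r["amount"] for r in mine if r["state"] == "NY"),
            "ca": sum(r["amount"] for r in mine if r["state"] == "CA"),
        }
    # Keep only cube entries with Sum(Sales) > $1,000,000.
    return {g: a for g, a in out.items() if a["total"] > 1_000_000}

answer = md_join_example(groups, sales)
```

Expressing the NY/CA conditional sums in standard relational algebra requires self-joins or nested subqueries, which is the optimization difficulty the paper targets.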

Journal ArticleDOI
TL;DR: In this article, the authors propose a model of a data cube and an algebra to support OLAP operations on this cube; the model is simple and intuitive, and the algebra provides a means to concisely express complex OLAP queries.
Abstract: Data warehousing and On-Line Analytical Processing (OLAP) are two of the most significant new technologies in the business data processing arena. A data warehouse, or decision support database, can be defined as a "very large" repository of historical data pertaining to an organization. OLAP refers to the technique of performing complex analysis over the information stored in a data warehouse. The complexity of queries required to support OLAP applications makes it difficult to implement using standard relational database technology. Moreover, currently there is no standard conceptual model for OLAP. There clearly is a need for such a model and an algebra as evidenced by the numerous SQL extensions offered by many vendors of OLAP products. In this paper we address this issue by proposing a model of a data cube and an algebra to support OLAP operations on this cube. The model we present is simple and intuitive, and the algebra provides a means to concisely express complex OLAP queries.

Journal ArticleDOI
TL;DR: Based on experience with several OLAP tools, a more pragmatic approach is developed to the design of multidimensional information systems, one that lets managers make the most of their companies' information assets.
Abstract: Managers see information as a critical resource and require systems that let them exploit it for competitive advantage. One way to better use organizational information is via online analytical processing and multidimensional databases (MDDBs). OLAP and MDDBs present summarized information from company databases. They use multidimensional structures that let managers slice and dice views of company performance data and drill down into trouble spots. For over a decade, proponents have touted these tools as the ultimate executive information system, but most of the hype comes from product vendors themselves. Based on our experience with several OLAP tools, we have developed a more pragmatic approach to the design of multidimensional information systems that lets managers make the most of their companies' information assets.

Book ChapterDOI
27 Nov 2001
TL;DR: This work approaches the issue from the application side by introducing a methodology and a language for conceptual OLAP security design, noting that the relational model is predominant in operational systems while OLAP systems use the non-traditional multidimensional model.
Abstract: Traditionally data warehouses were queried by high level users (executive management, business analysts) only. As the range of potential users with data warehouse access is steadily growing, this assumption is no longer appropriate and the necessity of proper access control mechanisms arises. The security capabilities of available commercial OLAP systems are highly proprietary and the syntax of their security constraints is not suitable for design and documentation purposes. Also, approaches trying to derive the access control policies from the operational data sources have not been very successful, as the relational model is predominant in operational systems while OLAP systems make use of the non-traditional multidimensional model. Access control schemes do not map easily. We approach the issue from the application side by introducing a methodology and a language for conceptual OLAP security design.

Patent
07 Nov 2001
TL;DR: An improved method of and apparatus for aggregating data having at least one dimension logically organized into multiple hierarchies of items is presented in this paper, whereby such multiple hierarchies of items are transformed into a single hierarchy that is functionally equivalent to the multiple hierarchies.
Abstract: An improved method of and apparatus for aggregating data having at least one dimension logically organized into multiple hierarchies of items, whereby such multiple hierarchies of items are transformed into a single hierarchy that is functionally equivalent to the multiple hierarchies. In the hierarchy transformation process, a given child item is linked with a parent item in the single hierarchy when no other child item linked to the parent item has a child item in common with the given child item. In the event that at least one other child item linked to the parent item has a child item in common with the given child item, the given child item is not linked with the parent item in the single hierarchy. The improved data aggregation mechanism of the present invention achieves a significant increase in system performance (e.g. decreased access/search time). Moreover, the improved data aggregation mechanism of the present invention may be integrated into a standalone data aggregation server supporting an OLAP system (one or more OLAP servers and clients), or may be integrated into a database management system (DBMS), thus achieving improved user flexibility and ease of use. The improved DBMS system of the present invention can be used to realize an improved Data Warehouse for supporting on-line analytical processing (OLAP) operations or to realize an improved informational database system, operational database system, or the like.
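The linking rule described in the abstract can be sketched directly in Python: a child is linked to a parent in the merged hierarchy only if no previously linked sibling shares a child with it, which prevents any leaf from being aggregated twice under the same parent. This is a simplified, hypothetical reading of the patent's rule, with invented item names:

```python
def link_children(candidate_children, children_of):
    """Keep a candidate child under a parent only if it shares no
    grandchild with a child already linked (the rule above)."""
    linked, covered = [], set()
    for child in candidate_children:
        grandkids = children_of.get(child, set())
        if covered.isdisjoint(grandkids):
            linked.append(child)
            covered |= grandkids
        # otherwise linking it would double-count items; skip it
    return linked

# Two source hierarchies both roll 'NY' up under 'USA'; linking
# both 'East' and 'NY-Region' would aggregate NY twice.
kept = link_children(
    ["East", "NY-Region", "West"],
    {"East": {"NY", "Boston"}, "NY-Region": {"NY"}, "West": {"LA"}},
)
# kept == ["East", "West"]
```

The functional equivalence claimed by the patent rests on exactly this property: every leaf is reachable through the single hierarchy exactly once.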

Journal ArticleDOI
TL;DR: The system PARSIMONY, a parallel and scalable infrastructure for multidimensional online analytical processing, is described; it is used for both OLAP and data mining, and parallel algorithms for data mining on the multidimensional cube structure, for attribute-oriented association rules and decision-tree-based classification, are developed.

Patent
02 Nov 2001
TL;DR: In this article, the authors present a system for dealing with complex planning calculations based on data warehouse or PDR data where some aggregated data or forecast data might be changed without directly manipulating the underlying data, and where there may be several relationships linking the data.
Abstract: This invention addresses the need for dealing with complex planning calculations based on data warehouse or Planning Data Repository (PDR) data where some aggregated data or forecast data might be changed without directly manipulating the underlying data, and where there may be several relationships linking the data. The system is able to deal with complex relationships along more than one axis or dimension. A number of iterations are typically used involving both back-solving and 'forward-solving'. The subset of cells that needs to be recalculated is identified before the steps of back-solving and/or forward-solving using parent/child tables. Scanning these tables for potential dependencies is much simpler and faster than looking at the actual formulae or functions relating the cells. The step of creating the parent/child tables is carried out in advance of the actual calculation by parsing all the relationships (formulae and functions) and summarising the dependencies between cells in the parent/child tables.
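The parent/child dependency table described above can be sketched as a simple adjacency map: formulae are parsed once into a table, and the subset of cells needing recalculation is found by scanning that table transitively rather than re-parsing the formulae. Cell names and the `formulas` layout are hypothetical:

```python
from collections import defaultdict

# Formulae, parsed once in advance: total = east + west ; margin = total - cost
formulas = {"total": ["east", "west"], "margin": ["total", "cost"]}

# Invert into the parent/child table: which cells depend on a given cell.
dependents = defaultdict(set)
for parent, children in formulas.items():
    for child in children:
        dependents[child].add(parent)

def cells_to_recalculate(changed):
    """Scan the parent/child table transitively from a changed cell."""
    dirty, stack = set(), [changed]
    while stack:
        cell = stack.pop()
        for parent in dependents.get(cell, ()):
            if parent not in dirty:
                dirty.add(parent)
                stack.append(parent)
    return dirty

# Changing 'east' dirties 'total' and, through it, 'margin'.
# cells_to_recalculate("east") == {"total", "margin"}
```

The scan touches only set lookups, which is why identifying the dirty subset this way is cheaper than evaluating the formulae themselves.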

Proceedings ArticleDOI
18 Jul 2001
TL;DR: This work characterizes the case where aggregation functions can be correctly applied to macrodata (data cubes) that are computed from the microdata.
Abstract: Aggregation functions are a class of generic functions which must be usable in any database application. We characterize the case where aggregation functions can be correctly applied to macrodata (data cubes) that are computed from the microdata.
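A concrete instance of the distinction the paper studies: SUM recombines correctly from pre-aggregated macrodata, while AVG computed over averages is in general wrong unless each macro cell also carries enough information (here, sum and count). The numbers below are an illustrative sketch:

```python
micro = [10, 20, 30, 40, 50, 60]
groups = [micro[:2], micro[2:]]   # pre-aggregated partitions

# SUM is distributive: the sum of partial sums equals the true sum.
assert sum(sum(g) for g in groups) == sum(micro)

# AVG is not: averaging the group averages ignores group sizes.
avg = lambda xs: sum(xs) / len(xs)
avg_of_avgs = avg([avg(g) for g in groups])   # (15 + 45) / 2 = 30.0
true_avg = avg(micro)                         # 210 / 6 = 35.0
assert avg_of_avgs != true_avg

# AVG becomes recomputable if each macro cell stores (sum, count):
partials = [(sum(g), len(g)) for g in groups]
recombined = sum(s for s, _ in partials) / sum(c for _, c in partials)
assert recombined == true_avg
```

Characterizing which functions behave like SUM (or can be repaired like AVG) is exactly the question the paper addresses.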

Book ChapterDOI
04 Jan 2001
TL;DR: The proposed approach provides a modular framework for combining one-dimensional aggregation techniques to create space-optimal high-dimensional data cubes, and a large variety of cost tradeoffs can be generated, making it easy to find the right configuration based on the application requirements.
Abstract: Applications like Online Analytical Processing depend heavily on the ability to quickly summarize large amounts of information. Techniques were proposed recently that speed up aggregate range queries on MOLAP data cubes by storing pre-computed aggregates. These approaches try to handle data cubes of any dimensionality by dealing with all dimensions at the same time and treat the different dimensions uniformly. The algorithms are typically complex, and it is difficult to prove their correctness and to analyze their performance. We present a new technique to generate Iterative Data Cubes (IDC) that addresses these problems. The proposed approach provides a modular framework for combining one-dimensional aggregation techniques to create space-optimal high-dimensional data cubes. A large variety of cost tradeoffs for high-dimensional IDC can be generated, making it easy to find the right configuration based on the application requirements.
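The flavor of combining a one-dimensional pre-aggregation technique per dimension can be shown with the classic prefix-sum cube: applying a 1-D prefix-sum pass along each axis lets any range-sum query be answered with a constant number of lookups. This is a sketch of the general idea, not the paper's IDC construction itself:

```python
def prefix_sum_2d(cube):
    """Precompute p[i][j] = sum of cube[0..i][0..j], one pass per dimension."""
    n, m = len(cube), len(cube[0])
    p = [row[:] for row in cube]
    for i in range(n):                  # 1-D prefix sums along rows
        for j in range(1, m):
            p[i][j] += p[i][j - 1]
    for j in range(m):                  # then the same 1-D pass along columns
        for i in range(1, n):
            p[i][j] += p[i - 1][j]
    return p

def range_sum(p, i1, j1, i2, j2):
    """Sum over cube[i1..i2][j1..j2] via four lookups (inclusion-exclusion)."""
    total = p[i2][j2]
    if i1 > 0: total -= p[i1 - 1][j2]
    if j1 > 0: total -= p[i2][j1 - 1]
    if i1 > 0 and j1 > 0: total += p[i1 - 1][j1 - 1]
    return total

cube = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
p = prefix_sum_2d(cube)
# range_sum(p, 1, 1, 2, 2) == 5 + 6 + 8 + 9 == 28
```

Because each dimension is handled by an independent 1-D pass, swapping in a different 1-D technique per dimension (the modularity IDC exploits) changes the query/update cost tradeoff without touching the other dimensions.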

Book ChapterDOI
08 Sep 2001
TL;DR: This paper shows how to translate a TOLAP program to SQL, and presents a real-life case study, a medical center in Buenos Aires, to show how the proposed temporal multidimensional model and query language can address problems that occur in real situations and that current nontemporal commercial systems cannot deal with.
Abstract: Commercial OLAP systems usually treat OLAP dimensions as static entities. In practice, dimension updates are often necessary in order to adapt the multidimensional database to changing requirements. In earlier work we proposed a temporal multidimensional model and TOLAP, a query language supporting it, accounting for dimension updates and schema evolution at a high level of abstraction. In this paper we present our implementation of the model and the query language. We show how to translate a TOLAP program to SQL, and present a real-life case study, a medical center in Buenos Aires. We apply our implementation to this case study in order to show how our approach can address problems that occur in real situations and that current nontemporal commercial systems cannot deal with. We present results on query and dimension update performance, and briefly describe a visualization tool that allows editing and running TOLAP queries, performing dimension updates, and browsing dimensions across time.