
Showing papers on "Online analytical processing" published in 2000


Proceedings Article
10 Sep 2000
TL;DR: This paper presents an end-to-end solution to the problem of selecting materialized views and indexes for SQL databases, and describes results of extensive experimental evaluation that demonstrate the effectiveness of the techniques.
Abstract: Automatically selecting an appropriate set of materialized views and indexes for SQL databases is a non-trivial task. A judicious choice must be cost-driven and influenced by the workload experienced by the system. Although there has been work in materialized view selection in the context of multidimensional (OLAP) databases, no past work has looked at the problem of building an industry-strength tool for automated selection of materialized views and indexes for SQL workloads. In this paper, we present an end-to-end solution to the problem of selecting materialized views and indexes. We describe results of extensive experimental evaluation that demonstrate the effectiveness of our techniques. Our solution is implemented as part of a tuning wizard that ships with Microsoft SQL Server 2000.

690 citations
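The workload-driven, cost-based flavor of this selection problem can be illustrated with a tiny greedy sketch. All candidate names, benefit figures, and sizes below are invented for illustration; the tuning wizard shipped with SQL Server 2000 uses its own candidate generation and cost model.

```python
# Greedy, cost-driven selection of materialized views/indexes under a storage
# budget. A sketch only: the benefit and size numbers are made up, not the
# algorithm actually used by the paper's tuning wizard.

def select_views(candidates, budget):
    """Pick candidates maximizing workload benefit per unit of storage."""
    chosen, used = [], 0
    remaining = dict(candidates)
    # Repeatedly take the candidate with the best benefit/size ratio that fits.
    while remaining:
        name, (benefit, size) = max(
            remaining.items(), key=lambda kv: kv[1][0] / kv[1][1]
        )
        del remaining[name]
        if used + size <= budget:
            chosen.append(name)
            used += size
    return chosen

# Hypothetical candidates: name -> (estimated workload benefit, size in MB)
candidates = {
    "idx_orders_date":    (900, 50),
    "mv_sales_by_region": (700, 200),
    "mv_daily_totals":    (300, 20),
    "idx_customer_name":  (100, 10),
}
print(select_views(candidates, budget=100))
```

With a 100 MB budget the large region view is skipped even though its absolute benefit is high, because its benefit per megabyte is poor.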


Patent
31 Mar 2000
TL;DR: In this paper, a method, system and article of manufacture are provided that enhance the ability to aggregate, analyze and report data from a multidimensional database held in memory; the database consists of a hierarchy of data items, each representing an account.
Abstract: A method, system and article of manufacture are provided which enhance the ability to aggregate, analyze and report data from a multidimensional database in a memory. The database consists of a hierarchy of data items, each item representing an account. Each account may hold either numerical data (figures) or multimedia content. The invention is particularly useful in assisting financial controllers of a multinational company or other organization in gathering data and generating reports. The invention complies with existing on-line analytical processing (OLAP) standards, and improves upon conventional OLAP methods, software and systems. The present invention includes some innovative new features, such as: incubescent data structure, joint dimensions, delegation using "datareqs," cycle preservation, enrichment and instant consolidation. Navigation and communication tools are also included.

361 citations


Patent
21 Jan 2000
TL;DR: A personal intelligence network that actively delivers highly personalized and timely informational and transactional data from an OLAP based channel database system to individuals via e-mail, spreadsheet programs (over e-mail), pager, telephone, mobile phone, fax, personal digital assistants, HTML e-mail and other formats as mentioned in this paper.
Abstract: A personal intelligence network that actively delivers highly personalized and timely informational and transactional data from an OLAP based channel database system to individuals via e-mail, spreadsheet programs (over e-mail), pager, telephone, mobile phone, fax, personal digital assistants, HTML e-mail and other formats.

215 citations


Journal ArticleDOI
TL;DR: Methods for spatial OLAP are studied by integrating nonspatial OLAP methods with spatial database implementation techniques; several strategies are proposed, including approximation and selective materialization of the spatial objects resulting from spatial OLAP operations.
Abstract: With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study the methods for spatial data warehousing and OLAP of spatial data. In this paper, we study methods for spatial OLAP, by integrating nonspatial OLAP methods with spatial database implementation techniques. A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed. Methods for the computation of spatial data cubes and analytical processing on such spatial data cubes are studied, with several strategies being proposed, including approximation and selective materialization of the spatial objects resulting from spatial OLAP operations. The focus of our study is on a method for spatial cube construction, called object-based selective materialization, which is different from cuboid-based selective materialization (proposed in previous studies of nonspatial data cube construction). Rather than using a cuboid as an atomic structure during the selective materialization, we explore granularity on a much finer level: that of a single cell of a cuboid. Several algorithms are proposed for object-based selective materialization of spatial data cubes, and a performance study has demonstrated the effectiveness of these techniques.

196 citations
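The cell-level (object-based) materialization decision that the abstract contrasts with cuboid-based schemes can be caricatured as follows. The access frequencies, merge costs, and threshold are all invented; the paper develops its own cost model and algorithms.

```python
# Cell-level (object-based) selective materialization, sketched.
# Unlike cuboid-based schemes that materialize a whole cuboid or none of it,
# the decision here is made per cell. All numbers are illustrative only.

def materialize_cells(cells, threshold):
    """Materialize a cell's merged spatial object only when the expected
    saving (access frequency x on-the-fly merge cost) beats the threshold."""
    return {
        cell for cell, (freq, merge_cost) in cells.items()
        if freq * merge_cost >= threshold
    }

# cell id -> (estimated access frequency, cost of merging its spatial objects)
cells = {
    ("2000-Q1", "rain"): (50, 8),   # hot cell, expensive merge -> materialize
    ("2000-Q1", "snow"): (2, 9),    # rarely touched -> compute on the fly
    ("2000-Q2", "rain"): (40, 1),   # cheap to merge -> on the fly is fine
}
print(materialize_cells(cells, threshold=100))
```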


01 Jan 2000
TL;DR: This paper shows how to systematically derive a conceptual warehouse schema that is even in generalized multidimensional normal form from an operational database.
Abstract: A data warehouse is an integrated and time-varying collection of data derived from operational data and primarily used in strategic decision making by means of online analytical processing (OLAP) techniques. Although it is generally agreed that warehouse design is a non-trivial problem and that multidimensional data models and star or snowflake schemata are relevant in this context, hardly any methods exist to date for deriving such a schema from an operational database. In this paper, we fill this gap by showing how to systematically derive a conceptual warehouse schema that is even in generalized multidimensional normal form.

191 citations


Proceedings Article
10 Sep 2000
TL;DR: This paper discusses the major issues of a UB-Tree integration and favors the kernel integration because of the tight coupling with the query optimizer, which allows for optimal usage of the UB-Tree in execution plans.
Abstract: Multidimensional access methods have shown high potential for significant performance improvements in various application domains. However, only few approaches have made their way into commercial products. In commercial database management systems (DBMSs) the B-Tree is still the prevalent indexing technique. Integrating new indexing methods into existing database kernels is in general a very complex and costly task. Exceptions exist, as our experience of integrating the UB-Tree into TransBase, a commercial DBMS, shows. The UB-Tree is a very promising multidimensional index, which has shown its superiority over traditional access methods in different scenarios, especially in OLAP applications. In this paper we discuss the major issues of a UB-Tree integration. As we will show, the complexity and cost of this task is reduced significantly due to the fact that the UB-Tree relies on the classical B-Tree. Even though commercial DBMSs provide interfaces for index extensions, we favor the kernel integration because of the tight coupling with the query optimizer, which allows for optimal usage of the UB-Tree in execution plans. Measurements on a real-world data warehouse show that the kernel integration leads to an additional performance improvement compared to our prototype implementation and competing index methods.

190 citations


Proceedings Article
10 Sep 2000

156 citations


Proceedings Article
01 Jan 2000
TL;DR: A tool for enhanced exploration of OLAP data that is adaptive to a user’s prior knowledge of the data that will enable much faster assimilation of all significant information in the data compared to existing manual explorations.
Abstract: In this paper we present a tool for enhanced exploration of OLAP data that is adaptive to a user’s prior knowledge of the data. The tool continuously keeps track of the parts of the cube that a user has visited. The information in these scattered visited parts of the cube is pieced together to form a model of the user’s expected values in the unvisited parts. The mathematical foundation for this modeling is provided by the classical Maximum Entropy principle. At any time, the user can query for the most surprising unvisited parts of the cube. The most surprising values are defined as those which if known to the user would bring the new expected values closest to the actual values. This process of updating the user’s context based on visited parts and querying for regions to explore further continues in a loop until the user’s mental model perfectly matches the actual cube. We believe and prove through experiments that such a user-in-the-loop exploration will enable much faster assimilation of all significant information in the data compared to existing manual explorations.

149 citations
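A heavily simplified version of the "most surprising unvisited part" query: under a Maximum-Entropy model constrained only by a known grand total, the user's expected value for each unvisited cell is the unexplained remainder spread uniformly. The four-cell cube below is invented; the paper's model handles far richer constraints and aggregation levels.

```python
# Toy "most surprising unvisited cell" under a MaxEnt expectation.
# Cube values are invented for illustration; only the grand-total constraint
# is modeled, unlike the paper's full Maximum Entropy formulation.

def most_surprising(cube, visited, grand_total):
    unvisited = [c for c in cube if c not in visited]
    remainder = grand_total - sum(cube[c] for c in visited)
    expected = remainder / len(unvisited)       # MaxEnt: uniform spread
    # Surprise = how far the actual value sits from the user's expectation.
    return max(unvisited, key=lambda c: abs(cube[c] - expected))

cube = {"north": 10, "south": 90, "east": 30, "west": 30}
visited = {"north"}            # the user has already seen north = 10
print(most_surprising(cube, visited, grand_total=160))
```

Having seen only "north", the user expects 50 in each remaining region, so "south" (actual 90) is the most surprising cell to visit next.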


Proceedings Article
10 Sep 2000
TL;DR: This paper introduces icicles, a new class of samples that tune themselves to a dynamic workload and shows, analytically, that for a certain class of queries reflected by the workload, icicles yield more accurate answers.
Abstract: Approximate query answering systems provide very fast alternatives to OLAP systems when applications are tolerant to small errors in query answers. Current sampling-based approaches to approximately answer aggregate queries over foreign key joins suffer from the following drawback. All tuples in relations are deemed equally important for answering queries even though, in reality, OLAP queries exhibit locality in their data access. Consequently, they may waste precious real estate by sampling tuples that are not required at all or required very rarely. In this paper, we introduce icicles, a new class of samples that tune themselves to a dynamic workload. Intuitively, the probability of a tuple being present in an icicle is proportional to its importance for answering queries in the workload. Therefore, an icicle consists of more tuples from a subset of the relation that is required to answer more queries in the workload. Consequently, the accuracy of approximate answers obtained by using icicles is better than a static uniform random sample. We show, analytically, that for a certain class of queries reflected by the workload, icicles yield more accurate answers. In a detailed experimental study, we examine the validity and performance of icicles.

148 citations
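The core intuition of icicles, a tuple's sampling probability growing with its importance to the workload, can be sketched with weighted sampling. The tuple ids and hit counts are invented, and real icicles are maintained incrementally against a dynamic workload rather than redrawn from counts like this.

```python
# Workload-tuned sampling in the spirit of icicles: a tuple's chance of being
# in the sample grows with how often past queries touched it. Hit counts are
# invented; the actual icicle maintenance algorithm differs.
import random

def draw_sample(hit_counts, sample_size, seed=0):
    """Sample tuple ids with probability proportional to workload hits."""
    rng = random.Random(seed)
    tuples = list(hit_counts)
    weights = [hit_counts[t] for t in tuples]
    return rng.choices(tuples, weights=weights, k=sample_size)

# tuple id -> number of workload queries that needed it
hit_counts = {"t1": 100, "t2": 100, "t3": 1, "t4": 1}
sample = draw_sample(hit_counts, sample_size=1000)
# Hot tuples t1/t2 should dominate the sample, unlike a uniform sample.
print(sample.count("t1") + sample.count("t2"))
```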


Book
01 Jan 2000
TL;DR: In this book, the authors add a new series of end-of-chapter exercises and let instructors choose among three sets of alternative technologies, for example ODBC, OLE DB, IIS, and ASP in Chapter 15 or JDBC, JSP, and MySQL in Chapter 16.
Abstract: From the Book: According to Alan Greenspan, Chairman of the U.S. Federal Reserve, information technology has enabled unprecedented increases in business productivity. While the Internet takes most of the credit, behind the scenes database technology plays a vital role. After all, the Internet is only a communication system; much of its value lies in the data and information transmitted to and from databases. News of the dot-com bust may cause students to wonder if the value of these technologies will decline accordingly. Nothing could be further from the truth. Lou Gerstner, Chairman of IBM, stated several years ago that the true benefits of the Internet and related technologies will occur only after those technologies have been embraced by mainstream, corporate America—by the so-called "old economy" companies. Major opportunities for database technology (and for future database practitioners) lie in applying this technology now, to every kind of business and business activity. All of which means there has never been a better time to study database processing. From personal databases on desktops to large interorganizational databases distributed on computers worldwide, databases are increasingly important business assets. Marketing, sales, production, operations, finance, accounting, management, and indeed all business disciplines, are using database technology to gain increased productivity in their respective activities. Moreover, after the frenzy of new technologies and products in recent years, the key elements of modern database management have now become clear. Conceptual knowledge of data modeling and database design continue to be essential; equally, the relational model and SQL are as important as in the past. Database administration, especially the technology supporting multi-user database management, has increased in importance because all databases that use the new technologies are multi-user. 
Additionally, technology for publishing databases on the web, especially three-tier and multi-tier architectures, XML, Active Server Pages (ASP), and Java Server Pages (JSP) have emerged as winners among many contenders for database publishing. In concert with these technologies, both ODBC with OLE DB and JDBC continue their importance. In short, database technology is more important than ever, and the basic technologies that need to be taught have become clearer than any time in the past five years. FEATURES OF THIS EDITION In accordance with these remarks, the second half of this text has been completely rewritten. Almost all of Chapters 11 through 16 is new. The major tasks of database administration are surveyed in Chapter 11 and then illustrated for Oracle in Chapter 12, and again for SQL Server in Chapter 13. Then, Chapter 14 surveys the basic technologies for database publishing on the Web and these technologies are then illustrated for ODBC, OLE DB, IIS, and ASP in Chapter 15 and again for JDBC, JSP, and MySQL in Chapter 16. Chapter 17 includes information on OLAP, while Chapter 18 introduces Oracle's new object-relational constructs. Addressing all of these topics in a single term is a challenge, and I believe we need seriously to consider devoting a full year to the database class. Meanwhile, if you have just one term and time is short, this edition has been written to enable you to choose among three sets of alternative technologies. Specifically, regarding data modeling, the text addresses the entity-relationship model and the semantic object model. If time is short, you might want to cover only the E-R model because it is far more popular. Similarly, regarding multi-user databases, pick either Oracle in Chapter 12 or SQL Server in Chapter 13 depending on the needs of graduates in your community. Finally, regarding Web publishing, if time constrains your course, choose either IIS, ASP, and ODBC in Chapter 15; or Java, JDBC, and JSP in Chapter 16. 
No loss of continuity will occur if you select only one of any of these three pairs. Of course, if you're not constrained by time, all of these topics are important. This edition also includes a new series of end-of-chapter exercises. These concern a small company that markets, sells, produces, and supports a line of camping stoves. The goal of these exercises is to enable the students to apply the knowledge gained from each chapter to a small, realistic, but constrained application. CHAPTER-BY-CHAPTER OVERVIEW This text consists of seven parts. Part I introduces database processing. Chapter 1 illustrates sample applications, defines basic terms, and sketches the history of database processing. Chapter 2 then illustrates the development of a simple database and application using Microsoft Access XP. The second part concerns data modeling. Chapter 3 discusses the entity-relationship model and shows how this model has been integrated with UML, or the Unified Modeling Language. Chapter 4 presents the semantic object model, a data modeling alternative to the E-R model. Database design is the subject of Part III. Chapter 5 discusses the relational model and normalization. Chapter 6 then applies the ideas from Chapters 3 and 5 to transform entity-relationship models into relational database designs. Chapter 7 applies the ideas from Chapters 4 and 5 to transform semantic object models into relational database designs. The next part addresses the fundamentals of relational database implementation. Chapter 8 presents an overview, Chapter 9 addresses procedural SQL, and Chapter 10 describes the design of relational database applications. Part V considers multi-user database management. Chapter 11 describes database administration and discusses important issues of multi-user database processing including concurrency control, security, and backup and recovery. The ideas presented in Chapter 11 are then illustrated for Oracle in Chapter 12. 
Chapter 12 also illustrates SQL for data definition. Chapter 13 also mirrors the discussion of Chapter 11 to illustrate multi-user database management using SQL Server. Database publishing on the Web is next addressed in Part VI. Chapter 14 lays the foundations of network processing, multi-tier architectures and XML. Chapter 15 then applies these concepts using Microsoft technology including ODBC, OLE DB, IIS, and ASP. Chapter 16 applies the concepts of Chapter 14 using Java; it includes JDBC, JSP, and MySQL. Concepts are illustrated with examples using Linux and Apache Tomcat. Chapter 17 then addresses issues of data administration and discusses OLAP. Part VII contains only one chapter, which addresses object-oriented database processing. New to this chapter is a discussion of Oracle's object-relational features and functions. Appendix A contains a brief survey of data structures and Appendix B illustrates the use of Tabledesigner, a product that can be used to develop semantic object models and convert them into database designs and ASP pages.

136 citations


Patent
13 Jan 2000
TL;DR: In this paper, a method for graphically analyzing relationships in data (103) from one or more data sources of an enterprise is presented. The method is especially useful in conjunction with a meta-model based technique for modeling the enterprise data.
Abstract: According to the invention, techniques are provided for visualizing customer data (103) contained in databases (6), data marts and data warehouses (8). In an exemplary embodiment, the invention provides a method for graphically analyzing relationships in data (103) from one or more data sources of an enterprise. The method can be used with many popular visualization tools (21), such as On-Line Analytical Processing (OLAP) tools (2) and the like. The method is especially useful in conjunction with a meta-model (103) based technique for modeling the enterprise data. The enterprise is typically a business activity (21), but can also be other loci of human activity (10). Embodiments according to the invention can display data from a variety of sources in order to provide visual representations of data in a data warehousing environment (8).

Patent
28 Feb 2000
TL;DR: In this article, an improved method of and apparatus for aggregating data elements in multidimensional databases (MDDB) realized in the form of a high-performance stand-alone (i.e. external) aggregation server which can be plugged-into conventional OLAP systems to achieve significant improvements in system performance.
Abstract: An improved method of and apparatus for aggregating data elements in multidimensional databases (MDDB) realized in the form of a high-performance stand-alone (i.e. external) aggregation server which can be plugged-into conventional OLAP systems to achieve significant improvements in system performance. In accordance with the principles of the present invention, the stand-alone aggregation server contains a scalable MDDB and a high-performance aggregation engine that are integrated into the modular architecture of the aggregation server. The stand-alone aggregation server of the present invention can uniformly distribute data elements among a plurality of processors, for balanced loading and processing, and therefore is highly scalable. The stand-alone aggregation server of the present invention can be used to realize (i) an improved MDDB for supporting on-line analytical processing (OLAP) operations, (ii) an improved Internet URL Directory for supporting on-line information searching operations by Web-enabled client machines, as well as (iii) diverse types of MDDB-based systems for supporting real-time control of processes in response to complex states of information reflected in the MDDB.
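The claim of uniformly distributing data elements among processors suggests plain hash partitioning, sketched below purely as an assumption: the patent does not disclose its actual scheme, and the `partition` function and key shapes here are hypothetical.

```python
# Balanced distribution of data elements across processors via hashing.
# An assumption for illustration only; the patent's distribution scheme
# is not specified in the abstract.
import hashlib

def partition(key, n_processors):
    """Map a data element's key to one of n processors, near-uniformly."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % n_processors

# Hypothetical multidimensional cell keys
keys = [("2000-01", "paris", "bikes"), ("2000-01", "rome", "bikes")]
print([partition(k, 4) for k in keys])
```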

Book ChapterDOI
04 Sep 2000
TL;DR: The PROMISE (Predicting User Behavior in Multidimensional Information System Environments) approach as discussed by the authors deploys information about characteristic patterns in the user's multidimensional data access in order to improve caching algorithms of OLAP systems.
Abstract: This paper discusses the PROMISE (Predicting User Behavior in Multidimensional Information System Environments) approach, that deploys information about characteristic patterns in the user's multidimensional data access in order to improve caching algorithms of OLAP systems. The paper motivates this approach by presenting results of an analysis of the user behavior in a real-world OLAP environment. Further contributions of this paper are a model for characteristic OLAP query patterns based on Markov Models and a corresponding OLAP query prediction algorithm.
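A first-order Markov chain over query labels gives the flavor of the prediction algorithm. The query log below is invented and stands in for the real-world traces the paper analyzes; PROMISE's pattern model is richer than this sketch.

```python
# Markov-model query prediction in the PROMISE spirit: count transitions
# between observed OLAP queries (reduced to opaque labels) and predict the
# most likely successor, e.g. so a cache can prefetch it.
from collections import Counter, defaultdict

class QueryPredictor:
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def train(self, query_log):
        # Count how often query b directly follows query a.
        for a, b in zip(query_log, query_log[1:]):
            self.transitions[a][b] += 1

    def predict(self, current):
        """Most frequent successor of `current`, or None if unseen."""
        nxt = self.transitions.get(current)
        return nxt.most_common(1)[0][0] if nxt else None

log = ["sales_by_region", "drill_city", "sales_by_region",
       "drill_city", "roll_up_year"]
p = QueryPredictor()
p.train(log)
print(p.predict("sales_by_region"))
```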

Book ChapterDOI
21 Jun 2000
TL;DR: This paper first proposes a conceptual multidimensional data model, which is able to represent and capture natural hierarchical relationships among members within a dimension as well as the relationships between dimension members and measure data values, and uses UML (Unified Modeling Language) to model it in the context of object oriented databases.
Abstract: Online Analytical Processing (OLAP) data is frequently organized in the form of multidimensional data cubes each of which is used to examine a set of data values, called measures, associated with multiple dimensions and their multiple levels. In this paper, we first propose a conceptual multidimensional data model, which is able to represent and capture natural hierarchical relationships among members within a dimension as well as the relationships between dimension members and measure data values. Hereafter, dimensions and data cubes with their operators are formally introduced. Afterward, we use UML (Unified Modeling Language) to model the conceptual multidimensional model in the context of object oriented databases.

Proceedings ArticleDOI
01 Nov 2000
TL;DR: The goal of this paper is to introduce an OLAP security design methodology, pointing out fields that require further research work, and presenting possible access control requirements categorized by their complexity.
Abstract: With the use of data warehousing and online analytical processing (OLAP) for decision support applications new security issues arise. The goal of this paper is to introduce an OLAP security design methodology, pointing out fields that require further research work. We present possible access control requirements categorized by their complexity. OLAP security mechanisms and their implementations in commercial systems are presented and checked for their suitability to address the requirements. Traditionally data warehouses were queried by high level users (executive management, business analysts) only. As the range of potential users with data warehouse access is steadily growing, this assumption is no longer appropriate and the necessity of proper access control mechanisms arises. However, a data warehouse is primarily built as an open system. Especially exploratory OLAP analysis requires this open nature; security controls may hinder the analytical discovery process.

Patent
27 Jun 2000
TL;DR: An active cache as mentioned in this paper is an OLAP system cache that can not only answer queries that match data stored in the cache, but can also handle queries that require aggregation or other computation of the stored data.
Abstract: An “active cache”, for use by On-Line Analytic Processing (OLAP) systems, that can not only answer queries that match data stored in the cache, but can also answer queries that require aggregation or other computation of the data stored in the cache.
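The distinguishing feature, answering queries by computing over cached data rather than only on an exact match, reduces, for additive measures, to rolling a cached aggregate up to a coarser grouping. The cached cube below is invented for illustration.

```python
# An "active cache" move in miniature: a cached (month, city) aggregate is
# rolled up to answer a (month,) query without touching the backend.
# Data and grouping levels are made up.
from collections import defaultdict

def roll_up(cached, keep_positions):
    """Aggregate a cached cube to a coarser grouping by summing."""
    out = defaultdict(float)
    for key, value in cached.items():
        coarse = tuple(key[i] for i in keep_positions)
        out[coarse] += value
    return dict(out)

# cached aggregate at (month, city) granularity
cached = {("jan", "paris"): 10.0, ("jan", "rome"): 5.0, ("feb", "paris"): 7.0}
print(roll_up(cached, keep_positions=[0]))   # answer a (month,) query
```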

Proceedings ArticleDOI
01 Nov 2000
TL;DR: The paper lists typical mismatches between the data model of commercial OLAP tools and conceptual graphical modeling notations, and proposes methods to overcome these expressive differences during the generation process.
Abstract: Generating tool-specific schemata and configuration information for OLAP database tools from conceptual graphical models is an important prerequisite for comprehensive tool support for computer-aided data warehouse engineering (CAWE). This paper describes the design and implementation of such a generation component in the context of our BabelFish data warehouse design tool environment. It identifies the principal issues that are involved in the design and implementation of such a component and discusses possible solutions. The paper lists typical mismatches between the data model of commercial OLAP tools and conceptual graphical modeling notations, and proposes methods to overcome these expressive differences during the generation process. Further topics are the use of graph grammars for specifying and parsing graphical MD schema descriptions and the integration of the generation process into a metadata centered modeling tool environment.

Patent
18 Dec 2000
TL;DR: A computer-implemented data mining system includes an Interface Tier, an Analysis Tier, and a Database Tier as discussed by the authors, which includes an On-Line Analytic Processing (OLAP) Client that provides a user interface for generating SQL statements that retrieve data from a database, and an Analysis Client that displays results from a data mining algorithm.
Abstract: A computer-implemented data mining system includes an Interface Tier, an Analysis Tier, and a Database Tier. The Interface Tier supports interaction with users, and includes an On-Line Analytic Processing (OLAP) Client that provides a user interface for generating SQL statements that retrieve data from a database, and an Analysis Client that displays results from a data mining algorithm. The Analysis Tier performs one or more data mining algorithms, and includes an OLAP Server that schedules and prioritizes the SQL statements received from the OLAP Client, an Analytic Server that schedules and invokes the data mining algorithm to analyze the data retrieved from the database, and a Learning Engine that performs a Learning step of the data mining algorithm. The Database Tier stores and manages the databases, and includes an Inference Engine that performs an Inference step of the data mining algorithm, a relational database management system (RDBMS) that performs the SQL statements against a Data Mining View to retrieve the data from the database, and a Model Results Table that stores the results of the data mining algorithm.

Patent
06 Sep 2000
TL;DR: In this paper, the push-down of the filtering operation into the window sort operation corresponding to a target ranking function is described for optimizing the computation of OLAP ranking functions, and the pushdown technique may be employed when a predetermined set of pushdown conditions are met.
Abstract: Techniques are described for optimizing the computation of OLAP ranking functions. The techniques involve push-down of the filtering operation into the window sort operation corresponding to a target ranking function. The push-down technique may be employed when a predetermined set of push-down conditions are met.

Book ChapterDOI
27 Mar 2000
TL;DR: It is shown that for multi-cube workloads substantial performance improvements can be realized by using the multi-cube algorithms, and that algorithms and cost models developed for single-cube precomputation must be extended to deal well with the multi-cube case.
Abstract: OLAP applications use precomputation of aggregate data to improve query response time. While this problem has been well-studied in the recent database literature, to our knowledge all previous work has focussed on the special case in which all aggregates are computed from a single cube (in a star schema, this corresponds to there being a single fact table). This is unfortunate, because many real-world applications require aggregates over multiple fact tables. In this paper, we fill this gap by analyzing the issues that arise in multi-cube data models. Then we examine performance issues by studying the precomputation problem for multi-cube systems. We show that this problem is significantly more complex than the single-cube precomputation problem, and that algorithms and cost models developed for single-cube precomputation must be extended to deal well with the multi-cube case. Our results from a prototype implementation show that for multi-cube workloads substantial performance improvements can be realized by using the multi-cube algorithms.

Journal ArticleDOI
01 Jul 2000
TL;DR: This work proposes a new approach to implement an index structure on top of a commercial relational database system, which maps the index structure to a relational database design and simulates the behavior of the index structure using triggers and stored procedures.
Abstract: Efficient query processing is one of the basic needs for data mining algorithms. Clustering algorithms, association rule mining algorithms and OLAP tools all rely on efficient query processors being able to deal with high-dimensional data. Inside such a query processor, multidimensional index structures are used as a basic technique. As the implementation of such an index structure is a difficult and time-consuming task, we propose a new approach to implement an index structure on top of a commercial relational database system. In particular, we map the index structure to a relational database design and simulate the behavior of the index structure using triggers and stored procedures. This can be easily done for a very large class of multidimensional index structures. To demonstrate the feasibility and efficiency, we implemented an X-tree on top of Oracle8. We ran several experiments on large databases and recorded a performance improvement up to a factor of 11.5 compared to a sequential scan of the database.
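The paper's approach, keeping the multidimensional index inside ordinary relations and maintaining it with triggers so that no kernel changes are needed, can be miniaturized with SQLite. A flat 2-D grid stands in here for the X-tree the paper implements on Oracle8, and all table and column names are invented.

```python
# Simulating a multidimensional index in plain relations, kept in sync by a
# trigger. A 2-D grid replaces the paper's X-tree; names are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
    CREATE TABLE grid (cx INTEGER, cy INTEGER, point_id INTEGER);
    CREATE INDEX grid_idx ON grid (cx, cy);
    -- Trigger keeps the simulated index in sync with the data table.
    CREATE TRIGGER points_ins AFTER INSERT ON points BEGIN
        INSERT INTO grid VALUES (new.x / 10, new.y / 10, new.id);
    END;
""")
db.executemany("INSERT INTO points (x, y) VALUES (?, ?)",
               [(3, 4), (12, 4), (55, 60)])
# A region query answered via the index relation instead of a full scan.
hits = db.execute("""
    SELECT p.x, p.y FROM grid g JOIN points p ON p.id = g.point_id
    WHERE g.cx = 0 AND g.cy = 0
""").fetchall()
print(hits)
```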

Book ChapterDOI
05 Jun 2000
TL;DR: This paper proposes a logical model for cubes based on the key observation that a cube is not a self-existing entity, but rather a view over an underlying data set, and accompanies this model with syntactic characterisations for the problem of cube usability.
Abstract: It is commonly agreed that multidimensional data cubes form the basic logical data model for OLAP applications. Still, there seems to be no agreement on a common model for cubes. In this paper we propose a logical model for cubes based on the key observation that a cube is not a self-existing entity, but rather a view over an underlying data set. We accompany our model with syntactic characterisations for the problem of cube usability. To this end, we have developed algorithms to check whether (a) the marginal conditions of two cubes are appropriate for a rewriting, in the presence of aggregation hierarchies and (b) an implication exists between two selection conditions that involve different levels of aggregation of the same dimension hierarchy. Finally, we present a rewriting algorithm for the cube usability problem.

Patent
Li-Wen Chen1, Hwa Chung Feng1
21 Mar 2000
TL;DR: In this paper, a meta-model based technique for modeling the enterprise data is proposed to create a dynamic customer profile by analyzing relationships in data from one or more data sources of an enterprise.
Abstract: According to the invention, techniques are provided for profiling human behavior based upon analyzing data contained in databases, data marts and data warehouses. In an exemplary embodiment, the invention provides for creating a dynamic customer profile by analyzing relationships in data from one or more data sources of an enterprise. The method can be used with many popular visualization tools, such as On Line Analytical Processing (OLAP) tools and the like. The method is especially useful in conjunction with a meta-model based technique for modeling the enterprise data. The enterprise is typically a business activity, but can also be other loci of human activity. The human behavior profiled is typically that of a customer, but can be any other type of human behavior. Embodiments according to the invention can display data from a variety of sources in order to provide visual representations of data in a data warehousing environment.

Journal ArticleDOI
TL;DR: The current business environment of the data warehouse is investigated, including OLAP, data mining, data visualization and other technologies, and the importance of data warehouse management and maintenance and its future developments.
Abstract: Data warehousing is the technological trend for the corporate decision support process. This article investigates the current business environment of the data warehouse, including OLAP, data mining, data visualization and other technologies. This article also analyzes the importance of data warehouse management and maintenance and its future developments.

Patent
10 Mar 2000
TL;DR: In this article, a distributed OLAP-based method and system for generating association rules is proposed for processing transaction data to generate summary information, customer profiles, and association rules, which can be provided to the data warehouse/OLAP stations for business planning.
Abstract: A distributed OLAP-based method and system for generating association rules. An architecture is provided for processing transaction data to generate summary information, customer profiles, and association rules. The distributed system includes at least two layers of data warehouse/OLAP stations: local data-warehouse OLAP stations (LDOSs) and a global data-warehouse OLAP station (GDOS). The LDOSs perform local data mining and summarization, and the GDOS merges, mines, and summarizes the input data received from the LDOSs. The summarized data is then utilized by the GDOS to generate association rules that can be provided to the LDOSs for business planning.
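The LDOS/GDOS split can be sketched as follows (the station names come from the abstract, but the counts, threshold, and merge logic are invented for illustration): each local station ships itemset support counts upward, and the global station merges them and keeps the globally frequent itemsets.

```python
from collections import Counter

# Local summaries: itemset -> support count, one Counter per LDOS.
ldos_a = Counter({frozenset({"bread"}): 40, frozenset({"bread", "milk"}): 25})
ldos_b = Counter({frozenset({"bread"}): 30, frozenset({"bread", "milk"}): 10})

def merge_and_filter(local_summaries, min_support):
    # GDOS role: merge the local counts, then keep only itemsets whose
    # global support clears the threshold.
    global_counts = Counter()
    for summary in local_summaries:
        global_counts.update(summary)
    return {itemset: count for itemset, count in global_counts.items()
            if count >= min_support}

frequent = merge_and_filter([ldos_a, ldos_b], min_support=50)
print(frequent)  # {frozenset({'bread'}): 70}
```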

Journal ArticleDOI
16 May 2000
TL;DR: This work investigated how and why analysts currently explore the data cube, then automated these tasks with advanced operators that can be invoked interactively like existing simple operators, delivered as a toolkit attached to an OLAP product.
Abstract: The goal of the i3 (eye cube) project is to enhance multidimensional database products with a suite of advanced operators to automate data analysis tasks that are currently handled through manual exploration. Most OLAP products are rather simplistic and rely heavily on the user's intuition to manually drive the discovery process. Such ad hoc user-driven exploration gets tedious and error-prone as data dimensionality and size increase. We first investigated how and why analysts currently explore the data cube, and then automated these explorations using advanced operators that can be invoked interactively like existing simple operators. Our proposed suite of extensions appears in the form of a toolkit attached to an OLAP product. At this demo we will present three such operators: DIFF, RELAX and INFORM, with illustrations from real-life datasets.
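A DIFF-style operator might behave roughly as follows (a speculative sketch; the abstract does not define the operators, so the data, ranking criterion, and function shape are all assumptions): given a change in an aggregate between two cube states, rank the detail cells by their contribution to the change instead of making the analyst drill down manually.

```python
# Two snapshots of the same cube slice: region -> aggregate sales.
before = {"east": 100, "west": 80, "north": 60}
after = {"east": 40, "west": 78, "north": 61}

def diff(before, after, top=2):
    # Rank detail cells by absolute contribution to the overall change,
    # automating the drill-down an analyst would otherwise do by hand.
    contrib = {key: after[key] - before[key] for key in before}
    return sorted(contrib.items(), key=lambda kv: abs(kv[1]),
                  reverse=True)[:top]

print(diff(before, after))  # [('east', -60), ('west', -2)]
```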

Patent
30 Jun 2000
TL;DR: In this article, a method, a computer system and a computer program product are disclosed for transforming general On-Line Analytical Processing (OLAP) hierarchies into summarizable hierarchies that support pre-aggregation, enabling fast query response times for aggregation queries without excessive storage use, even when the hierarchies are originally irregular.
Abstract: A method, a computer system and a computer program product for transforming general On-Line Analytical Processing (OLAP) hierarchies into summarizable hierarchies that support pre-aggregation are disclosed, by which fast query response times for aggregation queries without excessive storage use are made possible even when the hierarchies are originally irregular. Pre-aggregation is essential for ensuring adequate response time during data analysis. Most OLAP systems adopt the practical pre-aggregation approach, as opposed to full pre-aggregation, of materializing only select combinations of aggregates and then re-using these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. The present invention significantly extends the scope of practical pre-aggregation by transforming irregular dimension hierarchies and fact-dimension relationships into well-behaved structures that enable practical pre-aggregation.
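The core transformation idea can be illustrated in miniature (the hierarchy data and placeholder scheme below are invented, not the patent's actual procedure): an irregular hierarchy in which some paths skip a level is padded with placeholder members so that every path reaches the leaf level, after which lower-level aggregates can be safely re-used.

```python
# Irregular hierarchy: country GR has no city below it, so aggregates
# computed at the city level would silently miss GR's facts.
children = {"EU": ["DK", "GR"], "DK": ["Aalborg"], "GR": []}

def balance(children):
    # Pad each childless internal node with a placeholder child so that
    # all paths have the same length and roll-ups become summarizable.
    fixed = {}
    for parent, kids in children.items():
        fixed[parent] = list(kids) if kids else [f"placeholder:{parent}"]
    return fixed

print(balance(children))
# {'EU': ['DK', 'GR'], 'DK': ['Aalborg'], 'GR': ['placeholder:GR']}
```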

Book ChapterDOI
27 Mar 2000
TL;DR: It turns out that hybrid is superior to full replication, even without updates, and coordinator-based routing has good scaleup properties for scenarios with complex analysis queries.
Abstract: This article quantifies the benefit from simple data organization schemes and elementary query routing techniques for the PowerDB engine, a system that coordinates a cluster of databases. We report on evaluations for a specific scenario: the workload contains OLAP queries, OLTP queries, and simple updates, borrowed from the TPC-R benchmark. We investigate affinity of OLAP queries and different routing strategies for such queries. We then compare two simple data placement schemes, namely full replication and a hybrid one combining partial replication with partitioning. We run different experiments with queries only, with updates only, and with queries running concurrently with simple updates. It turns out that hybrid is superior to full replication, even without updates. Our overall conclusion is that coordinator-based routing has good scaleup properties for scenarios with complex analysis queries.
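A coordinator-based router of the kind evaluated above might be sketched like this (a toy model; the node names, queue-length heuristic, and routing policy are assumptions, not PowerDB's actual strategies): under full replication, updates fan out to every node, while each OLAP query goes to the currently least-loaded node.

```python
# Per-node queues of pending work, coordinated centrally.
queues = {"node1": [], "node2": [], "node3": []}

def route(statement, kind):
    # Updates must reach every full replica; read-only OLAP queries are
    # routed to the node with the shortest queue ("least loaded first").
    if kind == "update":
        targets = list(queues)
    else:
        targets = [min(queues, key=lambda node: len(queues[node]))]
    for node in targets:
        queues[node].append(statement)
    return targets

route("Q1", "olap")
route("Q2", "olap")
route("U1", "update")
print({node: len(q) for node, q in queues.items()})
# {'node1': 2, 'node2': 2, 'node3': 1}
```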

Proceedings ArticleDOI
01 Feb 2000
TL;DR: A scalable data-warehouse/OLAP framework is developed, and the notion of dynamic data warehousing for managing information at different aggregation levels with different life spans is introduced, demonstrating the practical value of the above framework in supporting an important class of telecommunication business intelligence applications.
Abstract: In a telecommunication network, hundreds of millions of call detail records (CDRs) are generated daily. Applications such as tandem traffic analysis require the collection and mining of CDRs on a continuous basis. The data volumes and data flow rates pose serious scalability and performance challenges. This has motivated us to develop a scalable data-warehouse/OLAP framework, and based on this framework, tackle the issue of scaling the whole operation chain, including data cleansing, loading, maintenance, access and analysis. We introduce the notion of dynamic data warehousing for managing information at different aggregation levels with different life spans. We use OLAP servers, together with the associated multidimensional databases, as a computation platform for data caching, reduction and aggregation, in addition to data analysis. The framework supports parallel computation for scaling up data mining, and supports incremental OLAP for providing continuous data mining. A tandem traffic analysis engine is implemented on the proposed framework. In addition to the parallel and incremental computation architecture, we provide a set of application-specific optimization mechanisms for scaling performance. These mechanisms fit well into the above framework. Our experience demonstrates the practical value of the above framework in supporting an important class of telecommunication business intelligence applications.
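The incremental-OLAP idea at the heart of the framework can be sketched as follows (the CDR schema and aggregate are invented for illustration): as each batch of call detail records arrives, the affected aggregates are updated in place rather than recomputed from scratch, which is what makes continuous analysis over high data flow rates feasible.

```python
from collections import defaultdict

# Running aggregate: (origin, destination) -> call count.
cube = defaultdict(int)

def load_batch(cdrs):
    # Incremental refresh: fold each new CDR into the existing aggregate
    # instead of rebuilding the cube over all historical records.
    for origin, dest, duration_s in cdrs:
        cube[(origin, dest)] += 1

load_batch([("A", "B", 30), ("A", "B", 12), ("B", "C", 5)])
load_batch([("A", "B", 7)])  # a later batch only touches its own cells
print(cube[("A", "B")], cube[("B", "C")])  # 3 1
```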