
Showing papers on "Data warehouse" published in 1998


Journal Article
TL;DR: Your company decides to build a data warehouse and you are designated the project manager; you have specific questions that need specific answers, and building a data warehouse is an extremely complex process.
Abstract: Your company decides to build a data warehouse and you are designated the project manager. What are your first steps? You’ve read the books, attended the conferences, and perused the trade publications. Now you have to act. There are numerous vendors, all touting the wonders of their products, but you have specific questions that need specific answers, and building a data warehouse is an extremely complex process. Questions you have to weigh fall into the following general categories:

1,272 citations


Book
01 Jan 1998
TL;DR: Drawing upon their experiences with numerous data warehouse implementations, Ralph Kimball and his coauthors show you all the practical details involved in planning, designing, developing, deploying, and growing data warehouses.
Abstract: The Chess Pieces. PROJECT MANAGEMENT AND REQUIREMENTS. The Business Dimensional Lifecycle. Project Planning and Management. Collecting the Requirements. DATA DESIGN. A First Course on Dimensional Modeling. A Graduate Course on Dimensional Modeling. Building Dimensional Models. ARCHITECTURE. Introducing Data Warehouse Architecture. Back Room Technical Architecture. Architecture for the Front Room. Infrastructure and Metadata. A Graduate Course on the Internet and Security. Creating the Architecture Plan and Selecting Products. IMPLEMENTATION. A Graduate Course on Aggregates. Completing the Physical Design. Data Staging. Building End User Applications. DEPLOYMENT AND GROWTH. Planning the Deployment. Maintaining and Growing the Data Warehouse. Appendices. Index.

547 citations


Proceedings Article
24 Aug 1998
TL;DR: It can be proven that the incremental algorithm yields the same result as DBSCAN, which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database.
Abstract: Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, it is highly desirable to perform these updates incrementally. In this paper, we present the first incremental clustering algorithm. Our algorithm is based on the clustering algorithm DBSCAN which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database. Due to the density-based nature of DBSCAN, the insertion or deletion of an object affects the current clustering only in the neighborhood of this object. Thus, efficient algorithms can be given for incremental insertions and deletions to an existing clustering. Based on the formal definition of clusters, it can be proven that the incremental algorithm yields the same result as DBSCAN. A performance evaluation of IncrementalDBSCAN on a spatial database as well as on a WWW-log database is presented, demonstrating the efficiency of the proposed algorithm. IncrementalDBSCAN yields significant speed-up factors over DBSCAN even for large numbers of daily updates in a data warehouse.
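The locality argument is easy to see in code. Below is a minimal, hypothetical Python sketch of the insertion case only (deletion is handled symmetrically); the names eps, min_pts, and region_query are illustrative, and the real IncrementalDBSCAN additionally treats border objects carefully and serves region queries from a spatial index such as an R*-tree.

```python
from math import dist

def region_query(points, q, eps):
    """Indices of all points within distance eps of q."""
    return [i for i, p in enumerate(points) if dist(p, q) <= eps]

def insert(points, labels, p, eps, min_pts, next_id):
    """Insert p and repair the clustering locally, DBSCAN-style.

    labels[i] is a cluster id or -1 for noise. Only objects in the
    eps-neighborhood of p can change their core-point property, so the
    update touches just that neighborhood.
    """
    points.append(p)
    labels.append(-1)
    neighborhood = region_query(points, p, eps)
    # "Update seeds": core points in the neighborhood after the insertion.
    seeds = [i for i in neighborhood
             if len(region_query(points, points[i], eps)) >= min_pts]
    if not seeds:
        return next_id                    # p is noise (for now)
    seed_clusters = {labels[i] for i in seeds if labels[i] != -1}
    if not seed_clusters:                 # case "creation": brand-new cluster
        new_id, next_id = next_id, next_id + 1
    elif len(seed_clusters) == 1:         # case "absorption" into one cluster
        new_id = seed_clusters.pop()
    else:                                 # case "merge": p bridges clusters
        new_id = min(seed_clusters)
        labels[:] = [new_id if l in seed_clusters else l for l in labels]
    for i in seeds:                       # relabel the affected region only
        labels[i] = new_id
        for j in region_query(points, points[i], eps):
            if labels[j] == -1:           # noise becomes a border point
                labels[j] = new_id
    return next_id
```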

538 citations


Proceedings Article
01 Jun 1998
TL;DR: This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution, and considers their application to providing fast approximate answers to hot list queries.
Abstract: In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.
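As a concrete illustration, here is a small, hedged Python sketch of maintaining a concise sample under insertions; the threshold-raising factor and the footprint accounting are simplifications of the paper's scheme, and counting samples differ in that every later occurrence of a value already in the sample is counted rather than sampled.

```python
import random

def concise_sample(stream, footprint):
    """Maintain a concise sample over an insertion stream.

    A concise sample is a uniform random sample in which duplicate
    sampled values are stored as a single (value, count) entry, so the
    same footprint can represent many more sample points.
    """
    tau = 1.0                          # inverse sampling rate
    sample = {}                        # value -> number of sample points
    for v in stream:
        if random.random() < 1.0 / tau:
            sample[v] = sample.get(v, 0) + 1   # duplicates cost no entry
        while len(sample) > footprint:         # raise tau and subsample
            new_tau = tau * 1.5                # growth factor is a choice
            for value in list(sample):
                # Each existing sample point survives w.p. tau / new_tau.
                survivors = sum(random.random() < tau / new_tau
                                for _ in range(sample[value]))
                if survivors:
                    sample[value] = survivors
                else:
                    del sample[value]
            tau = new_tau
    return sample, tau   # counts scaled by tau estimate true frequencies
```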

515 citations


Journal Article
01 Oct 1998
TL;DR: This paper evaluates Active Disk architectures which integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk.
Abstract: Several application and technology trends indicate that it might be both profitable and feasible to move computation closer to the data that it processes. In this paper, we evaluate Active Disk architectures which integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk. The key idea is to offload the bulk of the processing to the disk-resident processors and to use the host processor primarily for coordination, scheduling and combination of results from individual disks. To program Active Disks, we propose a stream-based programming model which allows disklets to be executed efficiently and safely. Simulation results for a suite of six algorithms from three application domains (commercial data warehouses, image processing and satellite data processing) indicate that for these algorithms, Active Disks outperform conventional-disk architectures.
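A sketch of the stream-based idea follows, with hypothetical names; the paper's disklets run inside the drive under a safe runtime, which plain Python can only gesture at.

```python
from typing import Callable, Iterable, Iterator

def select_disklet(records: Iterable[dict]) -> Iterator[dict]:
    """A 'disklet': application code that filters records as they stream
    off the platter, e.g. the scan side of a warehouse selection query."""
    for rec in records:
        if rec["amount"] > 1000:
            yield rec

def host(disks: list, disklet: Callable) -> list:
    """The host only coordinates and combines per-disk partial results;
    in a real Active Disk system the disklets run in parallel, one per
    drive, and only the filtered stream crosses the interconnect."""
    combined = []
    for stream in disks:
        combined.extend(disklet(stream))
    return combined

disk1 = [{"amount": 500}, {"amount": 2000}]   # simulated disk contents
disk2 = [{"amount": 1500}, {"amount": 10}]
print(host([disk1, disk2], select_disklet))   # [{'amount': 2000}, {'amount': 1500}]
```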

402 citations


Proceedings Article
01 Jun 1998
TL;DR: This work comprehensively studies the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point, and concludes that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two.
Abstract: Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries using Association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach which performance-wise is a factor of 3 to 4 worse than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors like automatic parallelization, development ease, portability and inter-operability.
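For flavor, here is one of the simpler SQL-92 formulations, a "K-way join" counting support for candidate 2-itemsets, run through sqlite3 so the sketch is self-contained; the table layout transactions(tid, item) and candidates(item1, item2) is an assumption, not the paper's exact schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE transactions(tid INTEGER, item TEXT);
CREATE TABLE candidates(item1 TEXT, item2 TEXT);
INSERT INTO transactions VALUES (1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a');
INSERT INTO candidates VALUES ('a','b');
""")
# K-way join: one copy of the transaction table per item in the candidate;
# a transaction supports a candidate if it contains every one of its items.
rows = con.execute("""
    SELECT c.item1, c.item2, COUNT(*) AS supp
    FROM candidates c, transactions t1, transactions t2
    WHERE t1.item = c.item1 AND t2.item = c.item2 AND t1.tid = t2.tid
    GROUP BY c.item1, c.item2
    HAVING COUNT(*) >= 2          -- minimum support threshold
""").fetchall()
print(rows)   # [('a', 'b', 2)]
```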

396 citations


Proceedings Article
06 Jan 1998
TL;DR: A graphical conceptual model for data warehouses, called Dimensional Fact model, is presented and a semi-automated methodology to build it from the pre-existing entity/relationship schemes describing a database is proposed.
Abstract: Data warehousing systems enable enterprise managers to acquire and integrate information from heterogeneous sources and to query very large databases efficiently. Building a data warehouse requires adopting design and implementation techniques completely different from those underlying information systems. We present a graphical conceptual model for data warehouses, called Dimensional Fact model, and propose a semi-automated methodology to build it from the pre-existing entity/relationship schemes describing a database. Our conceptual model consists of tree-structured fact schemes whose basic elements are facts, attributes, dimensions and hierarchies; other features which may be represented on fact schemes are the additivity of fact attributes along dimensions, the optionality of dimension attributes and the existence of non-dimension attributes. Compatible fact schemes may be overlapped in order to relate and compare data. Fact schemes may be integrated with information of the conjectured workload, expressed in terms of query patterns, to be used as the input of a design phase whose output are the logical and physical schemes of the data warehouse.
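As a rough illustration of what a fact scheme carries, here is a hypothetical Python encoding; the class names and the per-measure additivity tags are our simplification of the DFM notation, which also covers optional and non-dimension attributes.

```python
from dataclasses import dataclass, field

@dataclass
class Hierarchy:
    """A dimension and its tree of attributes, finest level first,
    e.g. date -> month -> quarter -> year."""
    name: str
    levels: list

@dataclass
class FactScheme:
    fact: str
    measures: dict                      # fact attribute -> additivity
    dimensions: list = field(default_factory=list)

sale = FactScheme(
    fact="SALE",
    measures={"qty_sold": "SUM", "unit_price": "AVG"},  # additivity tags
    dimensions=[
        Hierarchy("date", ["date", "month", "quarter", "year"]),
        Hierarchy("product", ["product", "type", "category"]),
        Hierarchy("store", ["store", "city", "country"]),
    ],
)
```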

353 citations


Book Chapter
19 Nov 1998
TL;DR: The Multidimensional Entity Relationship (ME/R) model as mentioned in this paper is a specialization of the E/R model that allows the representation of the multidimensional data view inherent to OLAP, namely the separation of qualifying and quantifying data and the complex structure of dimensions.
Abstract: Multidimensional data modeling plays a key role in the design of a data warehouse. We argue that the Entity Relationship Model is not suited for multidimensional conceptual modeling because the semantics of the main characteristics of the paradigm cannot be adequately represented. Consequently, we present a specialization of the E/R model — called Multidimensional Entity Relationship (ME/R) Model. In order to express the multidimensional structure of the data we define two specialized relationship sets and a specialized entity set. The resulting ME/R model allows the adequate conceptual representation of the multidimensional data view inherent to OLAP, namely the separation of qualifying and quantifying data and the complex structure of dimensions. We demonstrate the usability of the ME/R model by an example taken from an actual project dealing with the analysis of vehicle repairs.

289 citations


Proceedings Article
02 Jun 1998
TL;DR: This work presents a novel approach to conceptual modeling for Information Integration, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources.
Abstract: Information Integration is one of the core problems in distributed databases, cooperative information systems, and data warehousing, which are key areas in the software development industry. Two critical factors for the design and maintenance of applications requiring Information Integration are conceptual modeling of the domain, and reasoning support over the conceptual representation. We demonstrate that Knowledge Representation and Reasoning techniques can play an important role for both of these factors, by proposing a Description Logic based framework for Information Integration. We show that the development of successful Information Integration solutions requires not only resorting to very expressive Description Logics, but also significantly extending them. We present a novel approach to conceptual modeling for Information Integration, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources. Moreover, we devise inference procedures for the fundamental reasoning services, namely relation and concept subsumption, and query containment. Finally, we present a methodological framework for Information Integration, which can be applied in several contexts, and highlights the role of reasoning services within the design process.

288 citations


Patent
18 Nov 1998
TL;DR: A system and method is presented for the collection of marketing data which simultaneously captures, at a point-of-sale (120), data pertaining to a specific customer transaction.
Abstract: A system and method for the collection of marketing data which simultaneously captures, at a point-of-sale (120), data pertaining to a specific customer transaction. An electronic invoice, containing line item data, and identified by the payment vehicle number, is created and transmitted to a credit authorization location (165) for credit authorization. The credit authorization location forwards the invoice to a data warehouse (185), which may be located in a location remote from the credit authorization location. The data warehouse comprises related data structures to facilitate analysis and searching of collected data.

286 citations


Book
01 Jun 1998
TL;DR: Building on the dimensional modeling techniques of The Data Warehouse Toolkit, Kimball and his coauthors address the larger issues of delivering complete data marts and data warehouses, covering planning, designing, developing, deploying, and growing them.
Abstract: From the Publisher: In The Data Warehouse Toolkit, Ralph Kimball showed you how to use dimensional modeling to design effective and usable data warehouses. Now, he carries these techniques to the larger issues of delivering complete data marts and data warehouses. Drawing upon their experiences with numerous data warehouse implementations, he and his coauthors show you all the practical details involved in planning, designing, developing, deploying, and growing data warehouses.

Patent
27 Apr 1998
TL;DR: A data replication component automatically creates the different subcomponents of a warehouse request by accessing various links stored by the repository tool and displays a visual representation of the subcomponents and their relationships to each other to the administrator.
Abstract: A method and system for facilitating the creation of warehouse requests in a data warehouse system. During the design of the data warehouse tables, a repository tool is used for storing a number of new objects such as source and target databases, source and target tables and warehouse requests that are graphically defined and linked together by an administrator with the repository tool. The resulting visual design is drawn so as to serve as input for each warehouse request to be generated. The administrator invokes a data replication component that operatively couples to the repository tool, signaling that the warehouse request is to be implemented. The data replication component automatically creates the different subcomponents of the request by accessing various links stored by the repository tool and displays a visual representation of the subcomponents and their relationships to each other to the administrator. Thereafter, the replication component provides access to menu screens enabling the administrator to visualize each of the subcomponents of the request and their properties, and to make modifications to such subcomponents to complete configuration of all request subcomponents. Subsequently, the warehouse request can be scheduled to execute and populate the warehouse tables.

Patent
25 Sep 1998
TL;DR: In this paper, a data warehousing infrastructure for telecommunications priced call detail data is integrated with a Web/Internet based reporting system providing a common GUI enabling the requesting, customizing, scheduling and viewing of various types of priced call details data reports.
Abstract: A data warehousing infrastructure for telecommunications priced call detail data is integrated with a Web/Internet based reporting system providing a common GUI enabling the requesting, customizing, scheduling and viewing of various types of priced call detail data reports. Such an infrastructure performs an extraction process to obtain only those billing detail records of entitled customers, and a harvesting process for transforming the billing records into a star schema format for storage in one or more operational data storage devices. The system is integrated with a database server supporting expedient and accurate access to the customer's telecommunications priced call detail data for priced call detail data report generation.

Proceedings Article
01 Nov 1998
TL;DR: This paper outlines a general methodological framework for data warehouse design, based on the Dimensional Fact Model (DFM), which suggests that conceptual design is carried out semi-automatically starting from the operational database scheme.
Abstract: Though designing a data warehouse requires techniques completely different from those adopted for operational systems, no significant effort has been made so far to develop a complete and consistent design methodology for data warehouses. In this paper we outline a general methodological framework for data warehouse design, based on our Dimensional Fact Model (DFM). After analyzing the existing information system and collecting the user requirements, conceptual design is carried out semi-automatically starting from the operational database scheme. A workload is then characterized in terms of data volumes and expected queries, to be used as the input of the logical and physical design phases whose output is the final scheme for the data warehouse.

Book Chapter
01 Jan 1998
TL;DR: OLAP mining is a mechanism which integrates on-line analytical processing with data mining so that mining can be performed in different portions of databases or data warehouses and at different levels of abstraction, at the user's fingertips.
Abstract: OLAP mining is a mechanism which integrates on-line analytical processing (OLAP) with data mining so that mining can be performed in different portions of databases or data warehouses and at different levels of abstraction, at the user's fingertips. With the rapid development of data warehouse and OLAP technologies in the database industry, it is promising to develop OLAP mining mechanisms.

Journal Article
01 Mar 1998
TL;DR: In this article, a data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses, including characterization, comparison, association, classification, prediction, and clustering.
Abstract: Great efforts have been made in the Intelligent Database Systems Research Lab on the research and development of efficient data mining methods and the construction of on-line analytical data mining systems. Our work has focused on the integration of data mining and OLAP technologies and the development of scalable, integrated, and multiple data mining functions. A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data mining functions, including characterization, comparison, association, classification, prediction, and clustering. It also builds up a user-friendly, interactive data mining environment and a set of knowledge visualization tools. In-depth research has been performed on the efficiency and scalability of data mining methods. Moreover, the research has been extended to spatial data mining, multimedia data mining, text mining, and Web mining, with several new data mining system prototypes constructed or under construction, including GeoMiner, MultiMediaMiner, and WebLogMiner. This article summarizes our research and development activities in the last several years and shares our experiences and lessons with the readers.

Book
27 Jul 1998
TL;DR: This book discusses data mining techniques and tools used in the field, along with future trends in visual data mining.
Abstract: DEFINING THE DATA MINING APPROACH. What is Data Mining? Understanding Data Modeling. Defining the Problems to be Solved. DATA PREPARATION AND ANALYSIS. Accessing and Preparing the Data. Visual Methods for Analyzing Data. Nonvisual Analytical Methods. ASSESSING DATA MINING TOOLS AND TECHNOLOGIES. Link Analysis Tools. Landscape Visualization Tools. Quantitative Data Mining Tools. Future Trends in Visual Data Mining. CASE STUDIES. Mapping the Human Genome. Telecommunication Services. Banking and Finance. Retail Data Mining. Financial Market Data Mining. Money Laundering and Other Financial Crimes. Appendix. What's on the CD--ROM. Index.

Book Chapter
15 Apr 1998
TL;DR: A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed, along with several strategies, including approximation and partial materialization of the spatial objects resulting from spatial OLAP operations.
Abstract: On-line analytical processing (OLAP) has gained popularity in the database industry. With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study methods for spatial data warehousing and on-line analytical processing of spatial data. In this paper, we study methods for spatial OLAP, by integrating nonspatial on-line analytical processing (OLAP) methods with spatial database implementation techniques. A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed. Methods for computation of spatial data cubes and analytical processing on such spatial data cubes are studied, with several strategies proposed, including approximation and partial materialization of the spatial objects resulting from spatial OLAP operations. Some techniques for selective materialization of the spatial computation results are worked out, and a performance study demonstrates the effectiveness of these techniques.

Patent
15 Jul 1998
TL;DR: In this paper, a computer software architecture is proposed to automatically optimize the throughput of the data extraction/transformation/loading (ETL) process in data warehousing applications. This architecture has a componentized aspect and a pipeline-based aspect, where each transformation component automatically stages or streams its data to optimize ETL throughput.
Abstract: A computer software architecture to automatically optimize the throughput of the data extraction/transformation/loading (ETL) process in data warehousing applications. This architecture has a componentized aspect and a pipeline-based aspect. The componentized aspect refers to the fact that every transformation used in this architecture is built up from transformation components selected from an extensible set of transformation components. Besides simplifying source code maintenance and adjustment for data warehouse users, these transformation components also provide these users with the building blocks to effectively construct pertinent and functionally sophisticated transformations in a pipelined manner. Within a pipeline, each transformation component automatically stages or streams its data to optimize ETL throughput. Furthermore, each transformation either pushes data to another transformation component, pulls data from another transformation component, or performs a push/pull operation on the data. The pipelining, staging/streaming, and pushing/pulling features of the transformation components thereby effectively optimize the throughput of the ETL process.
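A toy sketch of the pipelined component idea using Python generators; the component names are invented, and the patented architecture decides stage-versus-stream automatically rather than by hand as here.

```python
def extract(rows):
    for row in rows:                 # source component: streams rows downstream
        yield row

def clean(rows):
    for row in rows:                 # streaming transformation: row at a time
        if row.get("amount") is not None:
            yield {**row, "amount": float(row["amount"])}

def stage(rows):
    buffered = list(rows)            # staging transformation: materializes,
    buffered.sort(key=lambda r: r["amount"])   # e.g. because sorting needs all rows
    yield from buffered

def load(rows, target):
    target.extend(rows)              # sink component: loads the warehouse table

warehouse = []
source = [{"amount": "3"}, {"amount": None}, {"amount": "1"}]
load(stage(clean(extract(source))), warehouse)
print(warehouse)   # [{'amount': 1.0}, {'amount': 3.0}]
```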

Book
01 Jan 1998
TL;DR: Decision Support in the Data Warehouse demystifies data warehousing's technical jargon and provides a complete framework for building, maintaining, and using a data warehouse for decision support.
Abstract: From the Publisher: Decision Support in the Data Warehouse demystifies data warehousing's technical jargon and provides a complete framework for building, maintaining, and using a data warehouse for decision support. This is the first book that integrates building and operating a data warehouse; developing decision support applications using the warehouse; and using the right warehouse tools. The book clearly describes the business and technical issues important to data warehousing success. They are brought to life with up-to-the-minute case studies drawn from today's leading organizations. Learn how to have a strategic business impact with your warehouse.

Journal Article
01 Mar 1998
TL;DR: The authors summarize the versatility of relational views, used both as a specification technique and as an execution plan for deriving the data of a warehouse: a redundant collection of data replicated from several possibly distributed and loosely coupled source databases, organized to answer OLAP queries.
Abstract: A data warehouse is a redundant collection of data replicated from several possibly distributed and loosely coupled source databases, organized to answer OLAP queries. Relational views are used both as a specification technique and as an execution plan for the derivation of the warehouse data. In this position paper, we summarize the versatility of relational views and their potential.

Patent
10 Feb 1998
TL;DR: A scheme for automatic data conversion definition generation based on data features such as a decision tree or a statistical feature is presented, enabling quick data analysis in a visual multidimensional data analysis tool.
Abstract: A scheme for automatic data conversion definition generation based on data features such as a decision tree or a statistical feature, so as to enable quick data analysis in a visual multidimensional data analysis tool. In an apparatus for converting data stored in a database or files into graphic data according to the data conversion definition and displaying the graphic data, a definition generation assistance device for automatically generating the data conversion definition is provided. This definition generation assistance device extracts a data feature of the data from the scheme and contents of the database or files, and automatically generates the data conversion definition according to the extracted data feature, the data conversion definition being formed by an attribute mapping definition defining combinations of data attributes and graphic data parameters, and a data conversion method definition defining a method for converting a value of each data attribute into a value of a corresponding graphic data parameter.

Proceedings Article
24 Aug 1998
TL;DR: It is shown that for nearly all types of database updates, it is more efficient to apply the incremental maintenance algorithm to the view than to recompute the view from the database, even when there are thousands of updates.
Abstract: Semistructured data is not strictly typed like relational or object-oriented data and may be irregular or incomplete. It often arises in practice, e.g., when heterogeneous data sources are integrated or data is taken from the World Wide Web. Views over semistructured data can be used to filter the data and to restructure (or provide structure to) it. To achieve fast query response time, these views are often materialized. This paper proposes an incremental maintenance algorithm for materialized views over semistructured data. We use the graph-based data model OEM and the query language Lorel, developed at Stanford, as the framework for our work. Our algorithm produces a set of queries that compute the updates to the view based upon an update of the source. We develop an analytic cost model and compare the cost of executing our incremental maintenance algorithm to that of recomputing the view. We show that for nearly all types of database updates, it is more efficient to apply our incremental maintenance algorithm to the view than to recompute the view from the database, even when there are thousands of updates.
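The paper's algorithm derives delta queries over OEM graphs in Lorel; as a drastically simplified illustration of the underlying principle, the Python sketch below maintains a flat selection view from single-record updates instead of recomputing it from the source.

```python
def recompute(source, predicate):
    """The baseline the paper compares against: rebuild the view."""
    return {k: v for k, v in source.items() if predicate(v)}

def apply_update(source, view, predicate, key, new_value):
    """Propagate one source update (insert/modify/delete) to the view."""
    if new_value is None:            # deletion
        source.pop(key, None)
        view.pop(key, None)
    else:                            # insertion or modification
        source[key] = new_value
        if predicate(new_value):
            view[key] = new_value    # enters (or stays in) the view
        else:
            view.pop(key, None)      # falls out of the view

source = {1: 10, 2: 99}
pred = lambda v: v > 50
view = recompute(source, pred)               # {2: 99}
apply_update(source, view, pred, 1, 75)      # touch only the changed record
assert view == recompute(source, pred) == {1: 75, 2: 99}
```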

Book
07 Jan 1998
TL;DR: In this book, Bill Inmon, Claudia Imhoff, and Ryan Sousa introduce a practical and proven framework that shows companies how to leverage data warehousing solutions to build a company-wide information ecosystem.
Abstract: From the Publisher: From traditional data warehousing to data marts and operational data stores, a dizzying array of architectures and tools are now available to help enterprises strategically use and manage information. Each has its unique costs and benefits associated with delivering value to the business. But, despite all the hype, not all solutions are equally well suited to every company's needs. In Corporate Information Factory, Bill Inmon, Claudia Imhoff, and Ryan Sousa introduce a practical and proven framework that shows companies how to leverage these solutions to build a company-wide information ecosystem.

Journal Article
TL;DR: The authors illustrate their approach to the attribute-centric query problem with ACT/DB, a database for managing clinical trials data, based on metadata supporting a query front end that essentially hides the EAV/non-EAV nature of individual attributes from the user.

Proceedings Article
01 Jun 1998
TL;DR: This work considers a restricted class of higher order views, shows the power of these views in integrating legacy structures, gives conditions under which a higher order view is usable for answering a query, and provides query translation algorithms.
Abstract: Schematic heterogeneity arises when information that is represented as data under one schema, is represented within the schema (as metadata) in another. Schematic heterogeneity is an important class of heterogeneity that arises frequently in integrating legacy data in federated or data warehousing applications. Traditional query languages and view mechanisms are insufficient for reconciling and translating data between schematically heterogeneous schemas. Higher order query languages, that permit quantification over schema labels, have been proposed to permit querying and restructuring of data between schematically disparate schemas. We extend this work by considering how these languages can be used in practice. Specifically, we consider a restricted class of higher order views and show the power of these views in integrating legacy structures. Our results provide insights into the properties of restructuring transformations required to resolve schematic discrepancies. In addition, we show how the use of these views permits schema browsing and new forms of data independence that are important for global information systems. Furthermore, these views provide a framework for integrating semi-structured and unstructured queries, such as keyword searches, into a structured querying environment. We show how these views can be used with minimal extensions to existing query engines. We give conditions under which a higher order view is usable for answering a query and provide query translation algorithms.

Journal Article
Sung Ho Ha, Sang Chan Park
TL;DR: This paper presents the data mining process, from data extraction to knowledge interpretation, along with data mining tasks and corresponding algorithms, and proposes a new marketing strategy that fully utilizes the knowledge resulting from data mining.
Abstract: Data mining, which is also referred to as knowledge discovery in databases, is the process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions. In this paper, we present the data mining process, from data extraction to knowledge interpretation, along with data mining tasks and corresponding algorithms. Before applying data mining techniques to a real-world application, we build a data mart on the enterprise Intranet. RFM (recency, frequency, and monetary) data extracted from the data mart are used extensively for our analysis. We then propose a new marketing strategy that fully utilizes the knowledge resulting from data mining.
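As an illustration of the RFM inputs such an analysis relies on, here is a small, hypothetical Python sketch computing raw recency, frequency, and monetary values per customer; binning these into scores (e.g. quintiles) would be a typical next step, though the paper's exact scheme may differ.

```python
from datetime import date

def rfm_values(purchases, today):
    """purchases: list of (customer_id, purchase_date, amount)."""
    per_customer = {}
    for cust, when, amount in purchases:
        rec = per_customer.setdefault(cust, {"last": when, "freq": 0, "money": 0.0})
        rec["last"] = max(rec["last"], when)   # most recent purchase
        rec["freq"] += 1                       # number of purchases
        rec["money"] += amount                 # total spend
    return {
        cust: {
            "recency_days": (today - rec["last"]).days,
            "frequency": rec["freq"],
            "monetary": rec["money"],
        }
        for cust, rec in per_customer.items()
    }

purchases = [("c1", date(1998, 1, 5), 120.0), ("c1", date(1998, 3, 2), 80.0),
             ("c2", date(1997, 11, 20), 500.0)]
print(rfm_values(purchases, today=date(1998, 4, 1)))
```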

Proceedings Article
30 Nov 1998
TL;DR: A prototype, MultiMediaMiner, mines high-level multimedia information and knowledge from large multimedia databases, supporting multiple kinds of knowledge, including summarization, classification, and association, in image and video databases.
Abstract: Data mining is a young but flourishing field. Many algorithms and applications exist to mine different types of data and extract different types of knowledge. Mining multimedia data is, however, at an experimental stage. We have implemented a prototype for mining high-level multimedia information and knowledge from large multimedia databases. MultiMediaMiner has been designed based on our years of experience in the research and development of a relational data mining system, DBMiner, in the Intelligent Database Systems Research Laboratory, and a Content-Based Image Retrieval system from Digital Libraries, C-BIRD, in the Vision and Media Laboratory. MultiMediaMiner includes the construction of multimedia data cubes which facilitate multiple dimensional analysis of multimedia data, and the mining of multiple kinds of knowledge, including summarization, classification, and association, in image and video databases. The images and video clips used in our experiments were collected by crawling the WWW. Many challenges have yet to be overcome, such as the large number of dimensions and the existence of multi-valued dimensions.

Proceedings Article
23 Feb 1998
TL;DR: This work defines simple views and materialized views for such graph-structured data, analyzes options for representing record identity and references in the view, and develops incremental maintenance algorithms for these views.
Abstract: This paper studies the problem of maintaining materialized views of graph-structured data. The base data consists of records containing identifiers of other records. The data could represent traditional objects (with methods, attributes and a class hierarchy), but it could also represent a lower-level data structure. We define simple views and materialized views for such graph-structured data, analyzing options for representing record identity and references in the view. We develop incremental maintenance algorithms for these views.

Proceedings Article
26 Aug 1998
TL;DR: This work lists requirements that a formal model and a corresponding query language must fulfill to be suitable for OLAP and discusses four approaches that come closest to these requirements, thus providing a systematic overview.
Abstract: Multidimensional database technology is becoming more and more important in conjunction with data warehouses and OLAP analysis. What is still lacking is a commonly accepted formal foundation. Such a model can serve as a basis for future research and standardization. Recently a multitude of interesting proposals on this topic have been published. OLAP applications have some special requirements that do not apply to other areas of multidimensional analysis (e.g. GIS, PACS). We list requirements that a formal model and a corresponding query language must fulfill to be suitable for OLAP. We compare four approaches that come closest to our requirements. After a brief description we discuss their suitability as a formal foundation for OLAP, thus providing a systematic overview. Finally, we propose directions for further research.