
Showing papers on "Data warehouse" published in 1998


Journal Article
TL;DR: Your company decides to build a data warehouse and you are designated the project manager; you have specific questions that need specific answers, and building a data warehouse is an extremely complex process.
Abstract: Your company decides to build a data warehouse and you are designated the project manager. What are your first steps? You’ve read the books, attended the conferences, and perused the trade publications. Now you have to act. There are numerous vendors, all touting the wonders of their products, but you have specific questions that need specific answers, and building a data warehouse is an extremely complex process. Questions you have to weigh fall into the following general categories:

1,272 citations


Book
01 Jan 1998
TL;DR: Drawing upon their experiences with numerous data warehouse implementations, Ralph Kimball and his coauthors show you all the practical details involved in planning, designing, developing, deploying, and growing data warehouses.
Abstract: The Chess Pieces. PROJECT MANAGEMENT AND REQUIREMENTS. The Business Dimensional Lifecycle. Project Planning and Management. Collecting the Requirements. DATA DESIGN. A First Course on Dimensional Modeling. A Graduate Course on Dimensional Modeling. Building Dimensional Models. ARCHITECTURE. Introducing Data Warehouse Architecture. Back Room Technical Architecture. Architecture for the Front Room. Infrastructure and Metadata. A Graduate Course on the Internet and Security. Creating the Architecture Plan and Selecting Products. IMPLEMENTATION. A Graduate Course on Aggregates. Completing the Physical Design. Data Staging. Building End User Applications. DEPLOYMENT AND GROWTH. Planning the Deployment. Maintaining and Growing the Data Warehouse. Appendices. Index.

547 citations


Proceedings Article
24 Aug 1998
TL;DR: It can be proven that the incremental algorithm yields the same result as DBSCAN, which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database.
Abstract: Data warehouses provide a great deal of opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in a batch mode, e.g., during the night. Then, all patterns derived from the warehouse by some data mining algorithm have to be updated as well. Due to the very large size of the databases, it is highly desirable to perform these updates incrementally. In this paper, we present the first incremental clustering algorithm. Our algorithm is based on the clustering algorithm DBSCAN which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database. Due to the density-based nature of DBSCAN, the insertion or deletion of an object affects the current clustering only in the neighborhood of this object. Thus, efficient algorithms can be given for incremental insertions and deletions to an existing clustering. Based on the formal definition of clusters, it can be proven that the incremental algorithm yields the same result as DBSCAN. A performance evaluation of IncrementalDBSCAN on a spatial database as well as on a WWW-log database is presented, demonstrating the efficiency of the proposed algorithm. IncrementalDBSCAN yields significant speed-up factors over DBSCAN even for large numbers of daily updates in a data warehouse.
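The locality argument is easy to see in code. Below is a minimal, hypothetical Python sketch of the insertion case only (deletion is handled symmetrically); the names eps, min_pts, and region_query are illustrative, and the real IncrementalDBSCAN additionally treats border objects carefully and serves region queries from a spatial index such as an R*-tree.

```python
from math import dist

def region_query(points, q, eps):
    """Indices of all points within distance eps of q."""
    return [i for i, p in enumerate(points) if dist(p, q) <= eps]

def insert(points, labels, p, eps, min_pts, next_id):
    """Insert p and repair the clustering locally, DBSCAN-style.

    labels[i] is a cluster id or -1 for noise. Only objects in the
    eps-neighborhood of p can change their core-point property, so the
    update touches just that neighborhood.
    """
    points.append(p)
    labels.append(-1)
    neighborhood = region_query(points, p, eps)
    # "Update seeds": core points in the neighborhood after the insertion.
    seeds = [i for i in neighborhood
             if len(region_query(points, points[i], eps)) >= min_pts]
    if not seeds:
        return next_id                    # p is noise (for now)
    seed_clusters = {labels[i] for i in seeds if labels[i] != -1}
    if not seed_clusters:                 # case "creation": brand-new cluster
        new_id, next_id = next_id, next_id + 1
    elif len(seed_clusters) == 1:         # case "absorption" into one cluster
        new_id = seed_clusters.pop()
    else:                                 # case "merge": p bridges clusters
        new_id = min(seed_clusters)
        labels[:] = [new_id if l in seed_clusters else l for l in labels]
    for i in seeds:                       # relabel the affected region only
        labels[i] = new_id
        for j in region_query(points, points[i], eps):
            if labels[j] == -1:           # noise becomes a border point
                labels[j] = new_id
    return next_id
```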

538 citations


Proceedings Article
01 Jun 1998
TL;DR: This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution, and considers their application to providing fast approximate answers to hot list queries.
Abstract: In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.
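As a concrete illustration, here is a small, hedged Python sketch of maintaining a concise sample under insertions; the threshold-raising factor and the footprint accounting are simplifications of the paper's scheme, and counting samples differ in that every later occurrence of a value already in the sample is counted rather than sampled.

```python
import random

def concise_sample(stream, footprint):
    """Maintain a concise sample over an insertion stream.

    A concise sample is a uniform random sample in which duplicate
    sampled values are stored as a single (value, count) entry, so the
    same footprint can represent many more sample points.
    """
    tau = 1.0                          # inverse sampling rate
    sample = {}                        # value -> number of sample points
    for v in stream:
        if random.random() < 1.0 / tau:
            sample[v] = sample.get(v, 0) + 1   # duplicates cost no entry
        while len(sample) > footprint:         # raise tau and subsample
            new_tau = tau * 1.5                # growth factor is a choice
            for value in list(sample):
                # Each existing sample point survives w.p. tau / new_tau.
                survivors = sum(random.random() < tau / new_tau
                                for _ in range(sample[value]))
                if survivors:
                    sample[value] = survivors
                else:
                    del sample[value]
            tau = new_tau
    return sample, tau   # counts scaled by tau estimate true frequencies
```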

515 citations


Journal Article
01 Oct 1998
TL;DR: This paper evaluates Active Disk architectures which integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk.
Abstract: Several application and technology trends indicate that it might be both profitable and feasible to move computation closer to the data that it processes. In this paper, we evaluate Active Disk architectures which integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk. The key idea is to offload the bulk of the processing to the disk-resident processors and to use the host processor primarily for coordination, scheduling and combination of results from individual disks. To program Active Disks, we propose a stream-based programming model which allows disklets to be executed efficiently and safely. Simulation results for a suite of six algorithms from three application domains (commercial data warehouses, image processing and satellite data processing) indicate that for these algorithms, Active Disks outperform conventional-disk architectures.
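A sketch of the stream-based idea follows, with hypothetical names; the paper's disklets run inside the drive under a safe runtime, which plain Python can only gesture at.

```python
from typing import Callable, Iterable, Iterator

def select_disklet(records: Iterable[dict]) -> Iterator[dict]:
    """A 'disklet': application code that filters records as they stream
    off the platter, e.g. the scan side of a warehouse selection query."""
    for rec in records:
        if rec["amount"] > 1000:
            yield rec

def host(disks: list, disklet: Callable) -> list:
    """The host only coordinates and combines per-disk partial results;
    in a real Active Disk system the disklets run in parallel, one per
    drive, and only the filtered stream crosses the interconnect."""
    combined = []
    for stream in disks:
        combined.extend(disklet(stream))
    return combined

disk1 = [{"amount": 500}, {"amount": 2000}]   # simulated disk contents
disk2 = [{"amount": 1500}, {"amount": 10}]
print(host([disk1, disk2], select_disklet))   # [{'amount': 2000}, {'amount': 1500}]
```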

402 citations


Proceedings Article
01 Jun 1998
TL;DR: This work comprehensively studies the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point, and concludes that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two.
Abstract: Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries using Association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach which performance-wise is a factor of 3 to 4 worse than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors like automatic parallelization, development ease, portability and inter-operability.
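For flavor, here is one of the simpler SQL-92 formulations, a "K-way join" counting support for candidate 2-itemsets, run through sqlite3 so the sketch is self-contained; the table layout transactions(tid, item) and candidates(item1, item2) is an assumption, not the paper's exact schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE transactions(tid INTEGER, item TEXT);
CREATE TABLE candidates(item1 TEXT, item2 TEXT);
INSERT INTO transactions VALUES (1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a');
INSERT INTO candidates VALUES ('a','b');
""")
# K-way join: one copy of the transaction table per item in the candidate;
# a transaction supports a candidate if it contains every one of its items.
rows = con.execute("""
    SELECT c.item1, c.item2, COUNT(*) AS supp
    FROM candidates c, transactions t1, transactions t2
    WHERE t1.item = c.item1 AND t2.item = c.item2 AND t1.tid = t2.tid
    GROUP BY c.item1, c.item2
    HAVING COUNT(*) >= 2          -- minimum support threshold
""").fetchall()
print(rows)   # [('a', 'b', 2)]
```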

396 citations


Proceedings Article
06 Jan 1998
TL;DR: A graphical conceptual model for data warehouses, called Dimensional Fact model, is presented and a semi-automated methodology to build it from the pre-existing entity/relationship schemes describing a database is proposed.
Abstract: Data warehousing systems enable enterprise managers to acquire and integrate information from heterogeneous sources and to query very large databases efficiently. Building a data warehouse requires adopting design and implementation techniques completely different from those underlying information systems. We present a graphical conceptual model for data warehouses, called Dimensional Fact model, and propose a semi-automated methodology to build it from the pre-existing entity/relationship schemes describing a database. Our conceptual model consists of tree-structured fact schemes whose basic elements are facts, attributes, dimensions and hierarchies; other features which may be represented on fact schemes are the additivity of fact attributes along dimensions, the optionality of dimension attributes and the existence of non-dimension attributes. Compatible fact schemes may be overlapped in order to relate and compare data. Fact schemes may be integrated with information of the conjectured workload, expressed in terms of query patterns, to be used as the input of a design phase whose output are the logical and physical schemes of the data warehouse.
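As a rough illustration of what a fact scheme carries, here is a hypothetical Python encoding; the class names and the per-measure additivity tags are our simplification of the DFM notation, which also covers optional and non-dimension attributes.

```python
from dataclasses import dataclass, field

@dataclass
class Hierarchy:
    """A dimension and its tree of attributes, finest level first,
    e.g. date -> month -> quarter -> year."""
    name: str
    levels: list

@dataclass
class FactScheme:
    fact: str
    measures: dict                      # fact attribute -> additivity
    dimensions: list = field(default_factory=list)

sale = FactScheme(
    fact="SALE",
    measures={"qty_sold": "SUM", "unit_price": "AVG"},  # additivity tags
    dimensions=[
        Hierarchy("date", ["date", "month", "quarter", "year"]),
        Hierarchy("product", ["product", "type", "category"]),
        Hierarchy("store", ["store", "city", "country"]),
    ],
)
```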

353 citations


Book Chapter
19 Nov 1998
TL;DR: The Multidimensional Entity Relationship (ME/R) model as mentioned in this paper is a specialization of the E/R model that allows the representation of the multidimensional data view inherent to OLAP, namely the separation of qualifying and quantifying data and the complex structure of dimensions.
Abstract: Multidimensional data modeling plays a key role in the design of a data warehouse. We argue that the Entity Relationship Model is not suited for multidimensional conceptual modeling because the semantics of the main characteristics of the paradigm cannot be adequately represented. Consequently, we present a specialization of the E/R model — called Multidimensional Entity Relationship (ME/R) Model. In order to express the multidimensional structure of the data we define two specialized relationship sets and a specialized entity set. The resulting ME/R model allows the adequate conceptual representation of the multidimensional data view inherent to OLAP, namely the separation of qualifying and quantifying data and the complex structure of dimensions. We demonstrate the usability of the ME/R model by an example taken from an actual project dealing with the analysis of vehicle repairs.

289 citations


Proceedings Article
02 Jun 1998
TL;DR: This work presents a novel approach to conceptual modeling for Information Integration, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources.
Abstract: Information Integration is one of the core problems in distributed databases, cooperative information systems, and data warehousing, which are key areas in the software development industry. Two critical factors for the design and maintenance of applications requiring Information Integration are conceptual modeling of the domain, and reasoning support over the conceptual representation. We demonstrate that Knowledge Representation and Reasoning techniques can play an important role for both of these factors, by proposing a Description Logic based framework for Information Integration. We show that the development of successful Information Integration solutions requires not only resorting to very expressive Description Logics, but also significantly extending them. We present a novel approach to conceptual modeling for Information Integration, which allows for suitably modeling the global concepts of the application, the individual information sources, and the constraints among different sources. Moreover, we devise inference procedures for the fundamental reasoning services, namely relation and concept subsumption, and query containment. Finally, we present a methodological framework for Information Integration, which can be applied in several contexts, and highlights the role of reasoning services within the design process.

288 citations


Patent
18 Nov 1998
TL;DR: A system and method is presented for the collection of marketing data which simultaneously captures, at a point-of-sale (120), data pertaining to a specific customer transaction.
Abstract: A system and method for the collection of marketing data which simultaneously captures, at a point-of-sale (120), data pertaining to a specific customer transaction. An electronic invoice, containing line item data, and identified by the payment vehicle number, is created and transmitted to a credit authorization location (165) for credit authorization. The credit authorization location forwards the invoice to a data warehouse (185), which may be located in a location remote from the credit authorization location. The data warehouse comprises related data structures to facilitate analysis and searching of collected data.

286 citations


Book
01 Jun 1998
TL;DR: Building on the dimensional modeling techniques of The Data Warehouse Toolkit, Kimball and his coauthors address the larger issues of delivering complete data marts and data warehouses, covering planning, designing, developing, deploying, and growing them.
Abstract: From the Publisher: In The Data Warehouse Toolkit, Ralph Kimball showed you how to use dimensional modeling to design effective and usable data warehouses. Now, he carries these techniques to the larger issues of delivering complete data marts and data warehouses. Drawing upon their experiences with numerous data warehouse implementations, he and his coauthors show you all the practical details involved in planning, designing, developing, deploying, and growing data warehouses.

Patent
27 Apr 1998
TL;DR: A data replication component automatically creates the different subcomponents of a warehouse request by accessing various links stored by the repository tool and displays a visual representation of the subcomponents and their relationships to each other to the administrator.
Abstract: A method and system for facilitating the creation of warehouse requests in a data warehouse system. During the design of the data warehouse tables, a repository tool is used for storing a number of new objects such as source and target databases, source and target tables and warehouse requests that are graphically defined and linked together by an administrator with the repository tool. The resulting visual design is drawn so as to serve as input for each warehouse request to be generated. The administrator invokes a data replication component that operatively couples to the repository tool, signaling that the warehouse request is to be implemented. The data replication component automatically creates the different subcomponents of the request by accessing various links stored by the repository tool and displays a visual representation of the subcomponents and their relationships to each other to the administrator. Thereafter, the replication component provides access to menu screens enabling the administrator to visualize each of the subcomponents of the request and their properties, and to make modifications to such subcomponents to complete configuration of all request subcomponents. Subsequently, the warehouse request can be scheduled to execute and populate the warehouse tables.

Patent
25 Sep 1998
TL;DR: In this paper, a data warehousing infrastructure for telecommunications priced call detail data is integrated with a Web/Internet based reporting system providing a common GUI enabling the requesting, customizing, scheduling and viewing of various types of priced call details data reports.
Abstract: A data warehousing infrastructure for telecommunications priced call detail data is integrated with a Web/Internet based reporting system providing a common GUI enabling the requesting, customizing, scheduling and viewing of various types of priced call detail data reports. Such an infrastructure performs an extraction process to obtain only those billing detail records of entitled customers, and a harvesting process for transforming the billing records into a star schema format for storage in one or more operational data storage devices. The system is integrated with a database server supporting expedient and accurate access to the customer's telecommunications priced call detail data for priced call detail data report generation.

Proceedings Article
01 Nov 1998
TL;DR: This paper outlines a general methodological framework for data warehouse design, based on the Dimensional Fact Model (DFM), which suggests that conceptual design is carried out semi-automatically starting from the operational database scheme.
Abstract: Though designing a data warehouse requires techniques completely different from those adopted for operational systems, no significant effort has been made so far to develop a complete and consistent design methodology for data warehouses. In this paper we outline a general methodological framework for data warehouse design, based on our Dimensional Fact Model (DFM). After analyzing the existing information system and collecting the user requirements, conceptual design is carried out semi-automatically starting from the operational database scheme. A workload is then characterized in terms of data volumes and expected queries, to be used as the input of the logical and physical design phases whose output is the final scheme for the data warehouse.

Book Chapter
01 Jan 1998
TL;DR: OLAP mining is a mechanism which integrates on-line analytical processing with data mining so that mining can be performed in different portions of databases or data warehouses and at different levels of abstraction, at the user's fingertips.
Abstract: OLAP mining is a mechanism which integrates on-line analytical processing (OLAP) with data mining so that mining can be performed in different portions of databases or data warehouses and at different levels of abstraction, at the user's fingertips. With the rapid development of data warehouse and OLAP technologies in the database industry, it is promising to develop OLAP mining mechanisms.

Journal Article
01 Mar 1998
TL;DR: In this article, a data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses, including characterization, comparison, association, classification, prediction, and clustering.
Abstract: Great efforts have been made in the Intelligent Database Systems Research Lab on the research and development of efficient data mining methods and the construction of on-line analytical data mining systems. Our work has focused on the integration of data mining and OLAP technologies and the development of scalable, integrated, and multiple data mining functions. A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data mining functions, including characterization, comparison, association, classification, prediction, and clustering. It also builds up a user-friendly, interactive data mining environment and a set of knowledge visualization tools. In-depth research has been performed on the efficiency and scalability of data mining methods. Moreover, the research has been extended to spatial data mining, multimedia data mining, text mining, and Web mining, with several new data mining system prototypes constructed or under construction, including GeoMiner, MultiMediaMiner, and WebLogMiner. This article summarizes our research and development activities in the last several years and shares our experiences and lessons with the readers.

Book
27 Jul 1998
TL;DR: This book discusses data mining techniques and tools used in the field, along with future trends in visual data mining.
Abstract: DEFINING THE DATA MINING APPROACH. What is Data Mining? Understanding Data Modeling. Defining the Problems to be Solved. DATA PREPARATION AND ANALYSIS. Accessing and Preparing the Data. Visual Methods for Analyzing Data. Nonvisual Analytical Methods. ASSESSING DATA MINING TOOLS AND TECHNOLOGIES. Link Analysis Tools. Landscape Visualization Tools. Quantitative Data Mining Tools. Future Trends in Visual Data Mining. CASE STUDIES. Mapping the Human Genome. Telecommunication Services. Banking and Finance. Retail Data Mining. Financial Market Data Mining. Money Laundering and Other Financial Crimes. Appendix. What's on the CD--ROM. Index.

Book Chapter
15 Apr 1998
TL;DR: A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed, along with several strategies, including approximation and partial materialization of the spatial objects resulting from spatial OLAP operations.
Abstract: On-line analytical processing (OLAP) has gained popularity in the database industry. With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study methods for spatial data warehousing and on-line analytical processing of spatial data. In this paper, we study methods for spatial OLAP, by integrating nonspatial on-line analytical processing (OLAP) methods with spatial database implementation techniques. A spatial data warehouse model, which consists of both spatial and nonspatial dimensions and measures, is proposed. Methods for computation of spatial data cubes and analytical processing on such spatial data cubes are studied, with several strategies proposed, including approximation and partial materialization of the spatial objects resulting from spatial OLAP operations. Some techniques for selective materialization of the spatial computation results are worked out, and a performance study demonstrates the effectiveness of these techniques.

Patent
15 Jul 1998
TL;DR: In this paper, a computer software architecture is proposed to automatically optimize the throughput of the data extraction/transformation/loading (ETL) process in data warehousing applications. This architecture has a componentized aspect and a pipeline-based aspect, where each transformation component automatically stages or streams its data to optimize ETL throughput.
Abstract: A computer software architecture to automatically optimize the throughput of the data extraction/transformation/loading (ETL) process in data warehousing applications. This architecture has a componentized aspect and a pipeline-based aspect. The componentized aspect refers to the fact that every transformation used in this architecture is built up from transformation components selected from an extensible set of transformation components. Besides simplifying source code maintenance and adjustment for data warehouse users, these transformation components also provide these users with the building blocks to effectively construct pertinent and functionally sophisticated transformations in a pipelined manner. Within a pipeline, each transformation component automatically stages or streams its data to optimize ETL throughput. Furthermore, each transformation either pushes data to another transformation component, pulls data from another transformation component, or performs a push/pull operation on the data. The pipelining, staging/streaming, and pushing/pulling features of the transformation components thereby effectively optimize the throughput of the ETL process.
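A toy sketch of the pipelined component idea using Python generators; the component names are invented, and the patented architecture decides stage-versus-stream automatically rather than by hand as here.

```python
def extract(rows):
    for row in rows:                 # source component: streams rows downstream
        yield row

def clean(rows):
    for row in rows:                 # streaming transformation: row at a time
        if row.get("amount") is not None:
            yield {**row, "amount": float(row["amount"])}

def stage(rows):
    buffered = list(rows)            # staging transformation: materializes,
    buffered.sort(key=lambda r: r["amount"])   # e.g. because sorting needs all rows
    yield from buffered

def load(rows, target):
    target.extend(rows)              # sink component: loads the warehouse table

warehouse = []
source = [{"amount": "3"}, {"amount": None}, {"amount": "1"}]
load(stage(clean(extract(source))), warehouse)
print(warehouse)   # [{'amount': 1.0}, {'amount': 3.0}]
```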

Book
01 Jan 1998
TL;DR: Decision Support in the Data Warehouse demystifies data warehousing's technical jargon and provides a complete framework for building, maintaining, and using a data warehouse for decision support.
Abstract: From the Publisher: Decision Support in the Data Warehouse demystifies data warehousing's technical jargon and provides a complete framework for building, maintaining, and using a data warehouse for decision support. This is the first book that integrates building and operating a data warehouse; developing decision support applications using the warehouse; and using the right warehouse tools. The book clearly describes the business and technical issues important to data warehousing success. They are brought to life with up-to-the-minute case studies drawn from today's leading organizations. Learn how to have a strategic business impact with your warehouse.

Journal Article
01 Mar 1998
TL;DR: The authors summarize the versatility of relational views, used both as a specification technique and as an execution plan for deriving the data of a warehouse: a redundant collection of data replicated from several possibly distributed and loosely coupled source databases, organized to answer OLAP queries.
Abstract: A data warehouse is a redundant collection of data replicated from several possibly distributed and loosely coupled source databases, organized to answer OLAP queries. Relational views are used both as a specification technique and as an execution plan for the derivation of the warehouse data. In this position paper, we summarize the versatility of relational views and their potential.

Patent
10 Feb 1998
TL;DR: A scheme for automatic data conversion definition generation based on data features such as a decision tree or a statistical feature is presented, enabling quick data analysis in a visual multidimensional data analysis tool.
Abstract: A scheme for automatic data conversion definition generation based on data features such as a decision tree or a statistical feature, so as to enable quick data analysis in a visual multidimensional data analysis tool. In an apparatus for converting data stored in a database or files into graphic data according to the data conversion definition and displaying the graphic data, a definition generation assistance device for automatically generating the data conversion definition is provided. This definition generation assistance device extracts a data feature of the data from the scheme and contents of the database or files, and automatically generates the data conversion definition according to the extracted data feature, the data conversion definition being formed by an attribute mapping definition defining combinations of data attributes and graphic data parameters, and a data conversion method definition defining a method for converting a value of each data attribute into a value of a corresponding graphic data parameter.

Proceedings Article
24 Aug 1998
TL;DR: It is shown that for nearly all types of database updates, it is more efficient to apply the incremental maintenance algorithm to the view than to recompute the view from the database, even when there are thousands of updates.
Abstract: Semistructured data is not strictly typed like relational or object-oriented data and may be irregular or incomplete. It often arises in practice, e.g., when heterogeneous data sources are integrated or data is taken from the World Wide Web. Views over semistructured data can be used to filter the data and to restructure (or provide structure to) it. To achieve fast query response time, these views are often materialized. This paper proposes an incremental maintenance algorithm for materialized views over semistructured data. We use the graph-based data model OEM and the query language Lorel, developed at Stanford, as the framework for our work. Our algorithm produces a set of queries that compute the updates to the view based upon an update of the source. We develop an analytic cost model and compare the cost of executing our incremental maintenance algorithm to that of recomputing the view. We show that for nearly all types of database updates, it is more efficient to apply our incremental maintenance algorithm to the view than to recompute the view from the database, even when there are thousands of updates.
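The paper's algorithm derives delta queries over OEM graphs in Lorel; as a drastically simplified illustration of the underlying principle, the Python sketch below maintains a flat selection view from single-record updates instead of recomputing it from the source.

```python
def recompute(source, predicate):
    """The baseline the paper compares against: rebuild the view."""
    return {k: v for k, v in source.items() if predicate(v)}

def apply_update(source, view, predicate, key, new_value):
    """Propagate one source update (insert/modify/delete) to the view."""
    if new_value is None:            # deletion
        source.pop(key, None)
        view.pop(key, None)
    else:                            # insertion or modification
        source[key] = new_value
        if predicate(new_value):
            view[key] = new_value    # enters (or stays in) the view
        else:
            view.pop(key, None)      # falls out of the view

source = {1: 10, 2: 99}
pred = lambda v: v > 50
view = recompute(source, pred)               # {2: 99}
apply_update(source, view, pred, 1, 75)      # touch only the changed record
assert view == recompute(source, pred) == {1: 75, 2: 99}
```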

Book
07 Jan 1998
TL;DR: In this book, Bill Inmon, Claudia Imhoff, and Ryan Sousa introduce a practical and proven framework that shows companies how to leverage data warehousing solutions to build a company-wide information ecosystem.
Abstract: From the Publisher: From traditional data warehousing to data marts and operational data stores, a dizzying array of architectures and tools are now available to help enterprises strategically use and manage information. Each has its unique costs and benefits associated with delivering value to the business. But, despite all the hype, not all solutions are equally well suited to every company's needs. In Corporate Information Factory, Bill Inmon, Claudia Imhoff, and Ryan Sousa introduce a practical and proven framework that shows companies how to leverage these solutions to build a company-wide information ecosystem.

Journal Article
TL;DR: The authors illustrate their approach to the attribute-centric query problem with ACT/DB, a database for managing clinical trials data, based on metadata supporting a query front end that essentially hides the EAV/non-EAV nature of individual attributes from the user.

Proceedings Article
01 Jun 1998
TL;DR: This work considers a restricted class of higher order views, shows the power of these views in integrating legacy structures, gives conditions under which a higher order view is usable for answering a query, and provides query translation algorithms.
Abstract: Schematic heterogeneity arises when information that is represented as data under one schema, is represented within the schema (as metadata) in another. Schematic heterogeneity is an important class of heterogeneity that arises frequently in integrating legacy data in federated or data warehousing applications. Traditional query languages and view mechanisms are insufficient for reconciling and translating data between schematically heterogeneous schemas. Higher order query languages, that permit quantification over schema labels, have been proposed to permit querying and restructuring of data between schematically disparate schemas. We extend this work by considering how these languages can be used in practice. Specifically, we consider a restricted class of higher order views and show the power of these views in integrating legacy structures. Our results provide insights into the properties of restructuring transformations required to resolve schematic discrepancies. In addition, we show how the use of these views permits schema browsing and new forms of data independence that are important for global information systems. Furthermore, these views provide a framework for integrating semi-structured and unstructured queries, such as keyword searches, into a structured querying environment. We show how these views can be used with minimal extensions to existing query engines. We give conditions under which a higher order view is usable for answering a query and provide query translation algorithms.

Journal Article
Sung Ho Ha, Sang Chan Park
TL;DR: This paper presents the data mining process, from data extraction to knowledge interpretation, along with data mining tasks and corresponding algorithms, and proposes a new marketing strategy that fully utilizes the knowledge resulting from data mining.
Abstract: Data mining, which is also referred to as knowledge discovery in databases, is the process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions. In this paper, we present the data mining process, from data extraction to knowledge interpretation, along with data mining tasks and corresponding algorithms. Before applying data mining techniques to a real-world application, we build a data mart on the enterprise Intranet. RFM (recency, frequency, and monetary) data extracted from the data mart are used extensively for our analysis. We then propose a new marketing strategy that fully utilizes the knowledge resulting from data mining.
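As an illustration of the RFM inputs such an analysis relies on, here is a small, hypothetical Python sketch computing raw recency, frequency, and monetary values per customer; binning these into scores (e.g. quintiles) would be a typical next step, though the paper's exact scheme may differ.

```python
from datetime import date

def rfm_values(purchases, today):
    """purchases: list of (customer_id, purchase_date, amount)."""
    per_customer = {}
    for cust, when, amount in purchases:
        rec = per_customer.setdefault(cust, {"last": when, "freq": 0, "money": 0.0})
        rec["last"] = max(rec["last"], when)   # most recent purchase
        rec["freq"] += 1                       # number of purchases
        rec["money"] += amount                 # total spend
    return {
        cust: {
            "recency_days": (today - rec["last"]).days,
            "frequency": rec["freq"],
            "monetary": rec["money"],
        }
        for cust, rec in per_customer.items()
    }

purchases = [("c1", date(1998, 1, 5), 120.0), ("c1", date(1998, 3, 2), 80.0),
             ("c2", date(1997, 11, 20), 500.0)]
print(rfm_values(purchases, today=date(1998, 4, 1)))
```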

Proceedings Article
30 Nov 1998
TL;DR: A prototype, MultiMediaMiner, mines high-level multimedia information and knowledge from large multimedia databases, supporting multiple kinds of knowledge, including summarization, classification, and association, in image and video databases.
Abstract: Data mining is a young but flourishing field. Many algorithms and applications exist to mine different types of data and extract different types of knowledge. Mining multimedia data is, however, at an experimental stage. We have implemented a prototype for mining high-level multimedia information and knowledge from large multimedia databases. MultiMediaMiner has been designed based on our years of experience in the research and development of a relational data mining system, DBMiner, in the Intelligent Database Systems Research Laboratory, and a Content-Based Image Retrieval system from Digital Libraries, C-BIRD, in the Vision and Media Laboratory. MultiMediaMiner includes the construction of multimedia data cubes which facilitate multiple dimensional analysis of multimedia data, and the mining of multiple kinds of knowledge, including summarization, classification, and association, in image and video databases. The images and video clips used in our experiments were collected by crawling the WWW. Many challenges have yet to be overcome, such as the large number of dimensions and the existence of multi-valued dimensions.

Proceedings Article
23 Feb 1998
TL;DR: This work defines simple views and materialized views for such graph-structured data, analyzes options for representing record identity and references in the view, and develops incremental maintenance algorithms for these views.
Abstract: This paper studies the problem of maintaining materialized views of graph-structured data. The base data consists of records containing identifiers of other records. The data could represent traditional objects (with methods, attributes and a class hierarchy), but it could also represent a lower-level data structure. We define simple views and materialized views for such graph-structured data, analyzing options for representing record identity and references in the view. We develop incremental maintenance algorithms for these views.

Proceedings Article
26 Aug 1998
TL;DR: This work lists requirements that a formal model and a corresponding query language must fulfill to be suitable for OLAP and discusses four approaches that come closest to these requirements, thus providing a systematic overview.
Abstract: Multidimensional database technology is becoming more and more important in conjunction with data warehouses and OLAP analysis. What is still lacking is a commonly accepted formal foundation. Such a model can serve as a basis for future research and standardization. Recently a multitude of interesting proposals on this topic have been published. OLAP applications have some special requirements that do not apply to other areas of multidimensional analysis (e.g. GIS, PACS). We list requirements that a formal model and a corresponding query language must fulfill to be suitable for OLAP. We compare four approaches that come closest to our requirements. After a brief description we discuss their suitability as a formal foundation for OLAP, thus providing a systematic overview. Finally, we propose directions for further research.