scispace - formally typeset
Search or ask a question

From enterprise models to dimensional models: a methodology for data warehouse and data mart design.

TL;DR: A method for developing dimensional models from traditional Entity Relationship models, which can be used to design data warehouses and data marts based on enterprise data models is described.
Abstract: This paper describes a method for developing dimensional models from traditional Entity Relationship models. This can be used to design data warehouses and data marts based on enterprise data models. The first step of the method involves classifying entities in the data model into a number of categories. The second step involves identifying hierarchies that exist in the model. The final step involves collapsing these hierarchies and aggregating transaction data to form dimensional models. A number of design alternatives are presented, including a flat schema, a terraced schema, a star schema and a snowflake schema. We also define a new type of schema called a star cluster schema. This is a restricted form of snowflake schema, which minimises the number of tables while avoiding overlap between different dimensional hierarchies. Individual schemas can be collected together to form constellations or galaxies. We illustrate the method using a simple example.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
08 Nov 2002
TL;DR: The proposed conceptual model is customized for the tracing of inter-attribute relationships and the respective ETL activities in the early stages of a data warehouse project and constructed in a customizable and extensible manner, so that the designer can enrich it with his own re-occurring patterns forETL activities.
Abstract: Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. In this paper, we focus on the problem of the definition of ETL activities and provide formal foundations for their conceptual representation. The proposed conceptual model is (a) customized for the tracing of inter-attribute relationships and the respective ETL activities in the early stages of a data warehouse project; (b) enriched with a 'palette' of a set of frequently used ETL activities, like the assignment of surrogate keys, the check for null values, etc; and (c) constructed in a customizable and extensible manner, so that the designer can enrich it with his own re-occurring patterns for ETL activities.

393 citations

Journal ArticleDOI
01 Apr 2008
TL;DR: This paper proposes GRAnD, a goal-oriented approach to requirement analysis for data warehouses based on the Tropos methodology, which can be employed within both a demand-driven and a mixed supply/demand-driven design framework.
Abstract: Several surveys indicate that a significant percentage of data warehouses fail to meet business objectives or are outright failures. One of the reasons for this is that requirement analysis is typically overlooked in real projects. In this paper we propose GRAnD, a goal-oriented approach to requirement analysis for data warehouses based on the Tropos methodology. Two different perspectives are integrated for requirement analysis: organizational modeling, centered on stakeholders, and decisional modeling, focused on decision makers. Our approach can be employed within both a demand-driven and a mixed supply/demand-driven design framework.

215 citations


Cites background from "From enterprise models to dimension..."

  • ...• Information requirements for DW applications are dificult to specify because decision processes are flexibly structured, poorly shared across large organizations, jealously guarded by managers, and unstable over time in keeping up with evolving business processes....

    [...]

Proceedings ArticleDOI
04 Nov 2005
TL;DR: This paper proposes a goal-oriented approach to requirement analysis for data warehouses, based on the Tropos methodology, which can be employed within both a demand-driven and a mixed supply/demand-driven design framework.
Abstract: Several surveys indicate that a significant percentage of data warehouses fail to meet business objectives or are outright failures. One of the reasons for this is that requirement analysis is typically overlooked in real projects. In this paper we propose a goal-oriented approach to requirement analysis for data warehouses, based on the Tropos methodology. Two different perspectives are integrated for requirement analysis: organizational modeling, centered on stakeholders, and decisional modeling, focused on decision makers. Our approach can be employed within both a demand-driven and a mixed supply/demand-driven design framework: in the second case, while the operational sources are still explored to shape hierarchies, user requirements play a fundamental role in restricting the area of interest for analysis and in choosing facts, dimensions, and measures. The methodology proposed, supported by a prototype, is described with reference to a real case study.

210 citations


Cites background from "From enterprise models to dimension..."

  • ...• Supply-driven (also called data-driven) approaches design the DW starting from a detailed analysis of the data sources [11, 8, 19]....

    [...]

  • ...Other supply-driven approaches were proposed in [11], [8], and [19], where conceptual design of the DW is rooted in the schema of operational sources and is carried out starting, respectively, from the identification of measures, from the selection of facts, and from a classification of the operational entities....

    [...]

01 Jan 2002
TL;DR: These algorithms provide a foundation for a software tool to create and evaluate data warehouse conceptual schemas and propose a guideline of manual steps to refine a conceptual schema to suit additional user needs.
Abstract: The popularity of data warehouses for analysis of data has grown tremendously, but much of the creation of data warehouses is done manually. We propose and illustrate algorithms for automatic conceptual schema development and evaluation. Our creation algorithm uses an enterprise schema of an operational database as a starting point for source-driven data warehouse schema design. Candidate conceptual schemas are created using the ME/R model, extended to note where additional user input can be used to further refine a schema. Our evaluation algorithm follows a user-driven requirements approach that utilizes queries to guide selection of candidate schemas most likely to meet user needs. In addition, we propose a guideline of manual steps to refine a conceptual schema to suit additional user needs, for example, the level of detail needed for date fields. The algorithms are illustrated using the TPC-H Benchmark schema and queries. Our algorithms provide a foundation for a software tool to create and evaluate data warehouse conceptual schemas.

176 citations

References
More filters
01 Jan 1970
TL;DR: A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced and certain operations on relations are discussed and applied to the problems of redundancy and consistency in the user's model.
Abstract: Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.

2,395 citations


"From enterprise models to dimension..." refers background in this paper

  • ...This introduces redundancy in the form of a transitive dependency, which is a violation to third normal form (Codd, 1970)....

    [...]

  • ...The objective of normalisation is to minimise data redundancy (Codd, 1970)....

    [...]

Journal ArticleDOI
TL;DR: Y the authors' company decides to build a data warehouse and you are designated the project manager, and you have specific questions that need specific answers, and building a data Warehouse is an extremely complex process.
Abstract: Y our company decides to build a data warehouse and you are designated the project manager. What are your first steps? You’ve read the books, attended the conferences, and perused the trade publications. Now you have to act. There are numerous vendors, all touting the wonders of their products, but you have specific questions that need specific answers, and building a data warehouse is an extremely complex process. Questions you have to weigh fall into the following general categories:

1,272 citations

Book
17 Aug 1991
TL;DR: This chapter discusses Conceptual Design, Logical Design, and Design Tools for Database Design, as well as Joint Data and Functional Analysis, and Improving the Quality of a Database Schema.
Abstract: I. CONCEPTUAL DATABASE DESIGN. 1. An Introduction to Database Design. 2. Data Modeling Concepts. 3. Methodologies for Conceptual Design. 4. View Design. 5. View Integration. 6. Improving the Quality of a Database Schema. 7. Schema Documentation and Maintenance. II. FUNCTIONAL ANALYSIS FOR DATABASE DESIGN. 1. Functional Analysis Using the Dataflow Model. 2. Joint Data and Functional Analysis. 3. Case Study. III. LOGICAL DESIGN AND DESIGN TOOLS. 1. High-Level Logical Design Using the Entity-Relationship Model. 2. Logical Design for the Relational Model. 3. Logical Design for the Network Model. 4. Logical Design for the Hierarchical Model. 5. Database Design Tools. Index. 0805302441T04062001

1,018 citations

Book
15 Mar 1996
TL;DR: This definitive guide succinctly explains how to build a data warehouse by using actual case studies of existing data warehouses developed for specific types of business applications such as retail, manufacturing, banking, insurance, subcriptions and airline reservations.
Abstract: This definitive guide succinctly explains how to build a data warehouse by using actual case studies of existing data warehouses developed for specific types of business applications such as retail, manufacturing, banking, insurance, subcriptions and airline reservations. Describes a powerful new model of data warehouse design, the dimensional data warehouse, that provides readers with the ability to quickly analyze complex information in order to make sound decisions. The accompanying CD-ROM includes a toolkit for building dimensional data warehouses and examples of all the databases discussed in the text.

839 citations


"From enterprise models to dimension..." refers background in this paper

  • ...According to Kimball (1996, 1997), the data warehousing (OLAP1) environment is profoundly different from the operational (OLTP2) environment and techniques used to design operational databases are inappropriate for designing data warehouses....

    [...]

  • ...Option 5: Star Cluster Schema Kimball (1996) argues that “snowflaking” is undesirable, because it adds complexity to the schema and requires extra joins....

    [...]

  • ...Kimball (1996) claims that use of star schemas to design data warehouses results in 80% of queries being single table browses....

    [...]

Book
10 May 1996

513 citations