
Showing papers on "Data warehouse" published in 1995


Proceedings ArticleDOI
02 Dec 1995
TL;DR: This paper motivates the concept of a data warehouse, outlines a general data warehousing architecture, and proposes a number of technical issues arising from the architecture that are suitable topics for exploratory research.
Abstract: The topic of data warehousing encompasses architectures, algorithms, and tools for bringing together selected data from multiple databases or other information sources into a single repository, called a data warehouse, suitable for direct querying or analysis. In recent years data warehousing has become a prominent buzzword in the database industry, but attention from the database research community has been limited. In this paper we motivate the concept of a data warehouse, we outline a general data warehousing architecture, and we propose a number of technical issues arising from the architecture that we believe are suitable topics for exploratory research.

772 citations
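
The architecture this abstract motivates, selected data pulled from several sources into a single repository that is then queried directly, can be made concrete with a small sketch. The tables, column names, and the use of SQLite below are assumptions for illustration only, not anything from the paper.

```python
# Minimal sketch of the extract-and-consolidate step behind a data warehouse.
# All table and column names are hypothetical; sqlite3 stands in for the
# heterogeneous sources and for the warehouse repository.
import sqlite3

def make_source(rows):
    """Create an in-memory 'operational' source with a few sales rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (day TEXT, product TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db

source_a = make_source([("1995-01-02", "widget", 120.0), ("1995-01-03", "gadget", 80.0)])
source_b = make_source([("1995-01-02", "widget", 45.0)])

# The warehouse: one repository holding selected data from every source.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (source TEXT, day TEXT, product TEXT, amount REAL)")

for name, src in (("A", source_a), ("B", source_b)):
    for row in src.execute("SELECT day, product, amount FROM sales"):
        warehouse.execute("INSERT INTO sales VALUES (?, ?, ?, ?)", (name, *row))

# Direct querying and analysis run against the consolidated copy, not the sources.
for product, total in warehouse.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product"):
    print(product, total)
```

Running the sketch prints consolidated per-product totals drawn from both sources, which is the kind of direct analysis the repository is meant to support.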


Book
03 Feb 1995
TL;DR: The reader will find expanded coverage of data warehousing, data mining, on-line analytical processing, and an entirely new chapter on intelligent agents in this fifth edition.
Abstract: From the Publisher: This book is widely known for its comprehensive treatment of decision support theory and how it is applied. Through four editions, this book has defined the course and set the standard for up-to-date coverage of the latest decision support theories and practices by managers and organizations. This fifth edition has been streamlined and updated throughout to reflect new computing technologies. Chapter 9 has been completely rewritten to focus on the Internet and intranets. The reader will find expanded coverage of data warehousing, data mining, on-line analytical processing, and an entirely new chapter on intelligent agents (Ch. 19). Internet-related topics and links to Internet exercises and cases appear throughout the new edition.

676 citations


Book
01 Jan 1995
TL;DR: The authors have redesigned the distributed systems design chapter to also address Internet-based application design topics not covered in the other chapters: they cover Internet application design standards, how to maintain site consistency, security issues, and data warehousing, among other topics.
Abstract: From the Publisher: NEW OR EXPANDED CONTENT COVERAGE TO KEEP YOU ON THE LEADING EDGE… Increased focus on "make versus buy" and systems integration. More and more systems development involves the use of packages in combination with legacy applications and new modules. Chapter 11 shows how companies deal with these issues. Coverage of Internet-based systems. The authors have redesigned the distributed systems design chapter (now Chapter 16) to also address Internet-based application design topics not covered in the other chapters: they cover Internet application design standards, how to maintain site consistency, security issues, and data warehousing, among other topics. Expanded coverage of process modeling techniques. Chapter 8 now includes an introduction to business process modeling and functional hierarchy modeling as alternatives to data flow diagramming. These three process-modeling techniques are compared so you know when to use each in practice. Unlike other SAD texts, Modern Systems Analysis and Design has continually offered strong coverage of RAD, an important element in systems design. FEATURES THAT MAKE THIS EDITION AN INDISPENSABLE RESOURCE: Expanded and updated coverage of systems analysis as a profession. Updated coverage of codes of conduct and new material on how systems professionals approach business problems with ethical considerations. Updated information on career paths with the latest information gathered from professional societies. Net Search exercises: new margin icons for Net Search exercises on the Web site can be found in every chapter; the icon signals when a topic in the text has a corresponding Net Search exercise on the Web site. Integration of electronic commerce into the running cases: one of three fictional running cases in the text, Pine Valley Furniture, is a furniture company founded in 1980 that now, in the Third Edition, has decided to explore electronic commerce as an avenue to increase its market share. Broadway Entertainment Company, Inc. (BEC), a fictional video and record retailer, is a project case that allows you to study and develop a Web-based customer relationship management system.

449 citations


Journal ArticleDOI
TL;DR: This paper surveys the technical challenges to integrating heterogeneous biological data sources, classifies the approaches, and critiques the available tools and methodologies.
Abstract: Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.

200 citations


Journal Article
TL;DR: The goal of the data warehousing project at Stanford (the WHIPS project) is to develop algorithms and tools for the efficient collection and integration of information from heterogeneous and autonomous sources, including legacy sources.
Abstract: The goal of the data warehousing project at Stanford (the WHIPS project) is to develop algorithms and tools for the efficient collection and integration of information from heterogeneous and autonomous sources, including legacy sources. In this paper we give a brief overview of the WHIPS project, and we describe some of the research problems being addressed in the initial phase of the project.

179 citations


Book
01 Oct 1995
TL;DR: In Building a Data Warehouse for Decision Support, Second Edition, as discussed by the authors, a team of the world's leading experts presents a start-to-finish, state-of-the-art guide to designing and implementing data warehouses.
Abstract: From the Publisher: In Building a Data Warehouse for Decision Support, Second Edition, a team of the world's leading experts presents a start-to-finish, state-of-the-art guide to designing and implementing data warehouses. You'll find up-to-the-minute solutions-oriented recommendations for the entire data warehouse development lifecycle, including best practices for requirements gathering and identifying business objectives; critical success factors - why and how they affect the success of your project; planning, scoping, and managing your data warehouse project; and managing metadata for a production data warehouse.

173 citations


Book
27 Oct 1995
TL;DR: The operational data store as mentioned in this paper is a dynamic architectural construct designed specifically for high-speed, integrated operational processing and achieves at the operational level what the data warehouse does at the strategic/managerial level.
Abstract: From the Publisher: The operational data store, or ODS, is a dynamic architectural construct designed specifically for high-speed, integrated operational processing. It achieves at the operational level what the data warehouse does at the strategic/managerial level. Now, from a team of experts headed by W. H. Inmon, "father of the data warehouse," here is the first comprehensive guide to understanding, building, and using operational data stores.

100 citations


Journal ArticleDOI
TL;DR: This paper presents a prototype system, called Repository-based Executive Information System (R-EIS), designed to integrate, rather than replace, various existing application systems to support managerial information delivery.
Abstract: Most executive information systems (EIS) focus on the on-line delivery of information to executives. Clicking on icons or command buttons, executives can browse through a series of screens of tabular or graphical information organized in a hierarchical structure. There is, however, no underlying model to guide the navigation of the diverse data and applications used by executives. A repository-based and model-driven EIS that captures integrated organization and information system (IOIS) models of an enterprise is therefore needed. This paper presents a prototype system, called Repository-based Executive Information System (R-EIS), designed to integrate, rather than replace, various existing application systems to support managerial information delivery. This integrated IOIS model can be used directly by executives to assist them in navigating a model from a business perspective in order to gain access to applications and information to support their decision making. The navigation paths become contexts for the information delivered. Executives who can use an explicit organization model in their navigation of large information bases may learn more about their businesses and therefore facilitate organizational learning.

44 citations



Journal ArticleDOI
01 Apr 1995
TL;DR: InterViso, one of the first commercial federated database products, is discussed, which provides a value added layer above connectivity products to handle views across databases, schema translation, and transaction management.
Abstract: Connectivity products are finally available to provide the "highways" between computers containing data. IBM has provided strong validation of the concept with their "Information Warehouse." DBMS vendors are providing gateways into their products, and SQL is being retrofitted on many older DBMSs to make it easier to access data from standard 4GL products and application development systems. The next step needed for data integration is to provide (1) a common data dictionary with a conceptual schema across the data to mask the many differences that occur when databases are developed independently and (2) a server that can access and integrate the databases using information from the data dictionary. In this article, we discuss InterViso, one of the first commercial federated database products. InterViso is based on Mermaid, which was developed at SDC and Unisys (Templeton et al., 1987b). It provides a value added layer above connectivity products to handle views across databases, schema translation, and transaction management.

39 citations
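
As a rough sketch of the two pieces the article highlights, a common data dictionary that maps a conceptual schema onto each source's local names and a server that uses that mapping to answer cross-database views, consider the following. The dictionary entries and the translate function are hypothetical and are not InterViso's or Mermaid's actual interfaces.

```python
# Hedged sketch: a conceptual schema mapped onto two independently developed
# databases, in the spirit of the data dictionary plus integration server the
# article describes. All names and structures are invented for illustration.

# Conceptual attribute -> (table, column) in each source database.
DICTIONARY = {
    "customer_name": {"crm_db":   ("clients",   "full_name"),
                      "sales_db": ("customers", "cust_nm")},
    "city":          {"crm_db":   ("clients",   "city"),
                      "sales_db": ("customers", "city_name")},
}

def translate(attributes, source):
    """Rewrite a conceptual projection into SQL against one source's local schema."""
    table = DICTIONARY[attributes[0]][source][0]
    cols = ", ".join(f"{DICTIONARY[a][source][1]} AS {a}" for a in attributes)
    return f"SELECT {cols} FROM {table}"

# A cross-database view is then the union of the per-source translations; the
# integration server would run each statement at its source and merge the rows.
for src in ("crm_db", "sales_db"):
    print(src, "->", translate(["customer_name", "city"], src))
```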


Proceedings Article
20 Aug 1995
TL;DR: Recon incorporates several data mining modules into a single, uniform framework: data visualization, deductive databases, and rule induction; their collaborative use yields superior error detection over the application of any single data mining technique.
Abstract: To aid in making investment decisions, financial analysts purchase large amounts of data from financial data providers. They use this historical data to develop financial models for making predictions and assessing risk about current data. Unfortunately, these databases often contain errors and omissions of important information. Analysts are dependent upon the quality of these databases: missing data can prevent computation of key values and, more dangerously, incorrect data can cause their models to produce erroneous results without the analyst's knowledge. Because of the importance of accurate data, and the large volume of data involved, data providers and consumers need to develop advanced methods for data cleaning: the process of identifying and correcting incomplete and incorrect information in databases. This paper describes how the Recon data mining system has been used to clean financial databases. Recon incorporates several data mining modules into a single, uniform framework: data visualization, deductive databases, and rule induction. The data visualization component supports the visual detection of outliers and other unusual phenomena. The deductive database enables analysts to capture, codify, and apply corporate knowledge about data integrity. The rule induction module creates error detection rules by generalizing from known errors to detect suspicious data entries in the rest of the data. The collaborative use of these three modules yields superior error detection over the application of any single data mining technique.
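
As a loose illustration of the rule-based portion of that framework, the sketch below applies a few hand-written integrity rules to flag suspicious records. The records, fields, and rules are invented; Recon itself induces such rules from known errors and combines them with visualization and a deductive database, which this toy example does not attempt.

```python
# Toy data-cleaning pass in the spirit of Recon's error-detection rules.
# Records, fields, and rules are invented; a real system would induce rules
# from known errors rather than hard-code them.
records = [
    {"ticker": "ACME", "price": 31.5, "shares_outstanding": 1_200_000},
    {"ticker": "BOLT", "price": -4.0, "shares_outstanding": 800_000},   # bad price
    {"ticker": "CRUX", "price": None, "shares_outstanding": 500_000},   # missing value
]

rules = [
    ("missing price",  lambda r: r["price"] is None),
    ("negative price", lambda r: r["price"] is not None and r["price"] <= 0),
    ("implausibly small float", lambda r: r["shares_outstanding"] < 1_000),
]

def detect_errors(rows):
    """Return (ticker, rule name) pairs for every rule a record violates."""
    return [(r["ticker"], name) for r in rows for name, test in rules if test(r)]

print(detect_errors(records))  # [('BOLT', 'negative price'), ('CRUX', 'missing price')]
```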

Journal ArticleDOI
TL;DR: The intuition and theory necessary for identifying and designing views that are efficiently maintainable using partial information are given and these techniques can be used to minimize remote data access and often to completely avoid remote access.
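
A tiny sketch of the idea behind that TL;DR: a materialized view that can be refreshed from the update (the delta) alone, without contacting the remote base data. The view definition and delta format below are assumptions chosen only to make "maintainable using partial information" concrete.

```python
# Hedged sketch of a self-maintainable view: a SUM-per-group view refreshed
# from inserted rows alone, with no remote access to the base table.
view = {"widget": 165.0, "gadget": 80.0}   # materialized SUM(amount) GROUP BY product

def apply_delta(view, inserted_rows):
    """Fold newly inserted base rows into the view without rereading the base table."""
    for product, amount in inserted_rows:
        view[product] = view.get(product, 0.0) + amount
    return view

# The warehouse receives only the delta from the remote source and stays current.
print(apply_delta(view, [("widget", 10.0), ("sprocket", 5.0)]))
```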

Proceedings ArticleDOI
22 May 1995
TL;DR: To understand why the data warehouse must replace old legacy applications for effective information processing, it is necessary to understand the root causes of the difficulty in getting information in the first place.
Abstract: Corporations worldwide are finding that understanding and managing rapidly growing, enterprisewide data is critical for making timely decisions and responding to changing business conditions. To manage and use business information competitively, many companies are establishing decision support systems built around a data warehouse of subject-oriented, integrated, historical information. In order to understand why the data warehouse must replace old legacy applications for effective information processing, it is necessary to understand the root causes of the difficulty in getting information in the first place. The first difficulty in getting information from the base of old applications is that those old applications were shaped around business requirements that were relevant as much as twenty-five years ago. These applications that were shaped yesterday do not reflect today's business. The second reason why older applications are so hard to use as a basis for information is that those applications were shaped around the clerical needs of the corporation. A clerically focused application of necessity does not have the historical foundation required to support a long-term view. Another reason why the clerical perspective of applications does not support management's need for information is that the clerical community focuses on detailed data. While detailed data is fine for the day-to-day clerical needs of the organization, management needs to see summary data in order to identify trends, challenges and opportunities. Yet another reason why the clerical perspective of applications does not suffice for management's need for information is that the clerically-oriented applications were built an application at a time, and there was little or no integration from one application to the next. The result is that the old legacy applications cannot easily or reliably be combined to produce a unified perspective of data. For these basic reasons, the older foundation of applications will not suffice as a basis for the important informational processing that organizations need to do in order to become efficient, competitive corporations. Nothing short of an entire change in architecture and a fundamental restructuring of the applications foundation will suffice. Fortunately there is an alternative architecture, which consists of a separation of processing into two broad categories: operational processing and decision-support processing. At the heart of decision-support (DSS) processing is the structure known as the data warehouse. The data warehouse contains data which has been gathered and integrated from the legacy systems environment. There are different levels of data within the data warehouse. Some data is very detailed. Other data is summarized. Older detailed data is placed in secondary storage. In addition there is a component of the data warehouse known as "meta data." Meta data, or information about data, is a directory of what the contents of the data warehouse are and where those contents came from.
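
The levels of data and the "meta data" directory described at the end of this abstract can be sketched roughly as follows; every name, field, and entry below is hypothetical and serves only to make the structure concrete.

```python
# Hedged sketch of warehouse data levels and a metadata directory recording
# what each part contains and which legacy systems it was gathered from.
# The dataclass and example entries are purely illustrative.
from dataclasses import dataclass

@dataclass
class MetadataEntry:
    name: str                   # what the contents are
    level: str                  # "detailed", "summarized", or "archived detail"
    storage: str                # primary vs. secondary storage
    source_systems: list[str]   # legacy applications the data came from

directory = [
    MetadataEntry("order_line_items", "detailed",        "primary",   ["order_entry"]),
    MetadataEntry("monthly_sales",    "summarized",      "primary",   ["order_entry", "billing"]),
    MetadataEntry("orders_1988_1990", "archived detail", "secondary", ["order_entry"]),
]

# Management works from the summary level; clerical detail and old history sit
# at other levels, and the directory records where everything originated.
for entry in directory:
    if entry.level == "summarized":
        print(entry.name, "<-", ", ".join(entry.source_systems))
```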



Book ChapterDOI
28 Oct 1995
TL;DR: Process management will play a larger role in the development of adequate exploration environments, because such environments will be integrations of numerous types of systems that focus on smaller aspects of the overall problem.
Abstract: A recurrent theme from the Second Database Issues for Data Visualization workshop was the importance of interactively exploring databases using numerous tools and techniques. Database exploration is a discovery process where relevant information or knowledge is identified and extracted from data. It is related to the field of Knowledge Discovery in Databases (KDD), and emphasizes the process of knowledge discovery: the development of hypotheses about the data, and the validation of those hypotheses. Discovery is not only possible from analytic tools, but also from graphical, textual, numeric, and tabular presentations of data. Flexibility in data processing and output presentation are fundamental requirements of any data exploration environment. A shared sentiment among workshop participants was that database exploration requires the cooperation of database management, data analysis and data visualization facilities, as shown in Figure 1. Interaction is also central to database exploration. The user must interact with data to discover information. User-data interactions must, then, be supported by an integrated exploration system. Because of the potential complexity of such a system, interactions occur at many levels between the data, system and user. These include interactions among software modules and user-data interfaces. Process management will play a larger role in the development of adequate exploration environments, because such environments will be integrations of numerous types of systems that focus on smaller aspects of the overall problem. If we are to realize any benefits from such an integration, the whole must be greater than the sum of the components.


Proceedings ArticleDOI
TL;DR: This paper looks at the problem of storing and retrieving medical image data from a warehouse; the approach is to store the most frequently needed information abstracts at the top of the pyramid and more detailed, storage-consuming data toward the bottom.
Abstract: As our applications continue to become more sophisticated, the demand for more storage continues to rise. Hence many businesses are looking toward data warehousing technology to satisfy their storage needs. A warehouse is different from a conventional database and hence deserves a different approach to storing data that might be retrieved at a later point in time. In this paper we look at the problem of storing and retrieving medical image data from a warehouse. We regard the warehouse as a pyramid with fast storage devices at the top and slower storage devices at the bottom. Our approach is to store the most frequently needed information abstracts at the top of the pyramid and more detailed, storage-consuming data toward the bottom. This information is linked for browsing purposes. Similarly, during retrieval, the user is given a sample representation of the detailed data with a browse option and, as required, more and more detail is made available.
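
A rough sketch of the pyramid the paper describes, with compact abstracts on fast storage linked down to bulky detail on slower storage, follows. The tier names, record contents, and the browse function are illustrative assumptions, not the authors' design.

```python
# Rough sketch of a storage pyramid for medical images: small abstracts on
# fast tiers, full detail on slow tiers, with links downward for browsing.
# Tier names, record fields, and the example image are invented.
pyramid = {
    "ram_cache":       {"img42": {"thumbnail": "64x64 preview",
                                  "next": ("magnetic_disk", "img42")}},
    "magnetic_disk":   {"img42": {"region_of_interest": "512x512 crop",
                                  "next": ("optical_jukebox", "img42")}},
    "optical_jukebox": {"img42": {"full_study": "raw 2048x2048 slices",
                                  "next": None}},
}

def browse(image_id, max_levels):
    """Follow the links downward, returning progressively more detailed data."""
    key, results = ("ram_cache", image_id), []
    while key is not None and len(results) < max_levels:
        tier, img = key
        record = pyramid[tier].get(img)
        if record is None:
            break
        results.append({k: v for k, v in record.items() if k != "next"})
        key = record["next"]
    return results

print(browse("img42", max_levels=2))  # abstract first, then one level of detail
```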


Book ChapterDOI
28 Oct 1995
TL;DR: A comprehensive scientific data model is required for seamless integration of various components of a scientific database system which includes visualization, data analysis, and data management.
Abstract: Visualization is one of the most important activities involved in modern exploratory data analysis. Traditional database data models, in their current forms, are inadequate to satisfy the data modeling need of exploratory data analysis in general and visualization in particular. A comprehensive scientific data model is required for seamless integration of various components of a scientific database system which includes visualization, data analysis, and data management.

Patent
15 Sep 1995
TL;DR: A warehouse database hub interface (23) is connected to the database, and a user generates queries based on the schema provided by the warehouse database hub interface (23).
Abstract: A database warehouse (27) includes a database having data arranged in data tables (11-16), e.g., fact tables and reference tables. A warehouse database hub interface (23) is connected to the database. The warehouse database hub interface (23) presents to a user a schema of the data in the database warehouse (27). The schema consists of virtual tables (31-34). The arrangement of the data in the virtual tables (31-34) is different from the arrangement of the data in the fact tables and the reference tables. A user generates queries based on the schema provided by the warehouse database hub interface (23). In response to such a query for particular information stored in the database warehouse (27), the warehouse database hub interface (23) modifies the query to take into account pre-computed values and the arrangement of the data within the database warehouse (27). Then the warehouse database hub interface (23) queries the database warehouse (27) using the modified query to obtain the particular information from the database warehouse (27). Finally, the warehouse database hub interface (23) forwards the particular information obtained from the database warehouse (27) to the user.
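
A very rough sketch of the query-rewriting idea in the patent: the user poses a query against a virtual table, and the hub substitutes a pre-computed summary table when one covers the request, falling back to the underlying fact table otherwise. The table names and the single rewrite rule are hypothetical.

```python
# Hedged sketch of a hub that rewrites queries over virtual tables to exploit
# pre-computed values in the physical warehouse. Names and the rewrite rule
# are invented; the patent describes the idea, not this code.

# (virtual table, aggregate expression) -> physical summary table holding it.
PRECOMPUTED = {
    ("sales_v", "SUM(amount)"): "sales_monthly_summary",
}

def rewrite(virtual_table, select_expr, group_by):
    """Replace a virtual-table aggregate with a lookup on a summary table when possible."""
    summary = PRECOMPUTED.get((virtual_table, select_expr))
    if summary is not None:
        # The summary table already stores the aggregate per group.
        return f"SELECT {group_by}, total_amount FROM {summary}"
    # Fall back to scanning the physical fact table behind the virtual table.
    return f"SELECT {group_by}, {select_expr} FROM sales_fact GROUP BY {group_by}"

print(rewrite("sales_v", "SUM(amount)", "month"))  # uses the pre-computed summary
print(rewrite("sales_v", "AVG(amount)", "month"))  # no summary, falls back to the fact table
```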

Journal ArticleDOI
TL;DR: A pre-physical design step is described, and a set of heuristics is proposed in order to obtain a refined database design.

Book ChapterDOI
28 Oct 1995
TL;DR: Knowledge Discovery in Databases is a relatively new research area that employs a variety of tools to explore and identify structure and patterns in large databases.
Abstract: The government, corporate, and industrial communities are faced with an ever increasing number of databases. These databases need not only to be managed, but also explored. The first requires secure access to distributed heterogeneous multimedia databases with rich metadata while meeting timing constraints. The second requires exploratory tools supporting the identification of domain and mission critical elements such as patterns in data access (e.g., security breach determinations), patterns in data (e.g., marketing and clustering), or patterns in transactions (e.g., data compression), to cite a few. Knowledge Discovery in Databases is a relatively new research area that employs a variety of tools to explore and identify structure and patterns in these large databases. Often the data is preprocessed to facilitate such computations (data warehousing). The data is then mined for specific rules that are built incrementally and often steered by users with a specific set of goals in mind.

01 Jan 1995
TL;DR: The system in use at the Planetary Plasma Interactions (PPI) node of the NASA Planetary Data System (PDS) is based on the object-oriented Distributed Inventory Tracking and Data Ordering Specification (DITDOS), which describes data inventories in a storage independent way.
Abstract: The analysis of space science data often requires researchers to work with many different types of data. For instance, correlative analysis can require data from multiple instruments on a single spacecraft, multiple spacecraft, and ground-based data. Typically, data from each source are available in a different format and have been written on a different type of computer, and so much effort must be spent to read the data and convert it to the computer and format that the researchers use in their analysis. The large and ever-growing amount of data and the large investment by the scientific community in software that requires a specific data format make using standard data formats impractical. A format-independent approach to accessing and analyzing disparate data is key to being able to deliver data to a diverse community in a timely fashion. The system in use at the Planetary Plasma Interactions (PPI) node of the NASA Planetary Data System (PDS) is based on the object-oriented Distributed Inventory Tracking and Data Ordering Specification (DITDOS), which describes data inventories in a storage independent way. The specifications have been designed to make it possible to build DITDOS compliant inventories that can exist on portable media such as CD-ROM's. The portable media can be moved within a system, or from system to system, and still be used without modification. Several applications have been developed to work with DITDOS compliant data holdings. One is a windows-based client/server application, which helps guide the user in the selection of data. A user can select a data base, then a data set, then a specific data file, and then either order the data and receive it immediately if it is online or request that it be brought online if it is not. A user can also view data by any of the supported methods. DITDOS makes it possible to use already existing applications for data-specific actions, and this is done whenever possible. Another application is a stand-alone tool to assist in the extraction of data from portable media, such as CD-ROM's. In addition to the applications, there is a set of libraries that can facilitate building new DITDOS compliant applications.
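
The storage-independent inventory idea can be sketched as a small interface: the inventory describes holdings without committing to where they physically live, and an order either delivers online data immediately or queues a request to bring offline media online. The classes, fields, and example holdings below are assumptions for illustration and are not the DITDOS specification.

```python
# Hedged sketch of a storage-independent data inventory, loosely inspired by
# the DITDOS description above. All classes, fields, and behaviors are invented.
from dataclasses import dataclass

@dataclass
class Holding:
    dataset: str
    filename: str
    location: str  # e.g. "online:/archive/..." or "cdrom:VOLUME_ID"

    @property
    def online(self) -> bool:
        return self.location.startswith("online:")

class Inventory:
    def __init__(self, holdings):
        self._holdings = holdings

    def list_datasets(self):
        return sorted({h.dataset for h in self._holdings})

    def order(self, dataset, filename):
        """Return data immediately if online, else queue a request to stage it."""
        for h in self._holdings:
            if (h.dataset, h.filename) == (dataset, filename):
                if h.online:
                    return f"delivering {h.location}"
                return f"staging request queued for {h.location}"
        raise KeyError(f"{dataset}/{filename} not in inventory")

inv = Inventory([
    Holding("PLASMA-WAVE", "day190.dat", "online:/archive/day190.dat"),
    Holding("PLASMA-WAVE", "day191.dat", "cdrom:PWS_0042"),
])
print(inv.list_datasets())
print(inv.order("PLASMA-WAVE", "day191.dat"))  # offline copy -> staging request
```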

Proceedings Article
11 Sep 1995
TL;DR: Data warehouses are repositories that integrate and summarize historical and reference data from numerous sources; warehoused data can be analyzed along several dimensions such as time, product, and geography to identify trends and gain competitive advantage.
Abstract: Data warehouses are repositories that integrate and summarize historical and reference data from numerous sources. Warehoused data can be analyzed along several dimensions such as time, product, and geography to identify trends and gain competitive advantage. … relational and non-relational data stores. Because the warehouse integrates gateways and distributed SQL processing, the warehouse can load data from an operational system with an SQL interface … select statement.


Journal ArticleDOI
TL;DR: This paper discusses the types of metadata encountered and the problems of dealing with them, and describes an alternative approach based on textual markup rather than, for example, the relational model.
Abstract: With many types of scientific data, the amount of descriptive and qualifying information associated with the data values is quite variable and potentially large compared with the number of actual data values. This problem has been found to be particularly acute when dealing with data about the nutrient composition of foods, and a system—based on textual markup rather than, for example, the relational model—has been developed to deal with it. This paper discusses the types of metadata encountered and the problems associated with dealing with them, and then describes this alternative approach. The approach described has been installed in several locations around the world, and is in preliminary use as a tool for interchanging data among different databases as well as local database management.

Proceedings ArticleDOI
15 May 1995
TL;DR: This paper discusses a data warehouse facility known as the "Information Utility" ("IU") which was constructed to meet the land related information needs of government agencies, utility companies and the public in Manitoba.
Abstract: This paper discusses a data warehouse facility known as the "Information Utility" ("IU") which was constructed to meet the land related information needs of government agencies, utility companies and the public in Manitoba. The IU forms the hub of the Manitoba Land Related Information System ("MLRIS"); a multi-participant system with a mandate to provide access to common spatial information. The concept of the IU was formed in the late 1980s with the understanding that many agencies all required access to the large and complex datasets that describe the land and infrastructure base. In exploring the architecture of the IU this paper describes the main system components including the repository, the data catalogue, the data exchange manager, query and browse applications, and the system administration modules.