
Showing papers on "Data mart published in 2015"


Book
15 Sep 2015
TL;DR: "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to creating a technical data warehouse layer.
Abstract: The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations for creating a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss how to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes; important data warehouse technologies and practices; and Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture. The book provides a complete introduction to data warehousing, applications, and the business context so readers can get up and running fast; explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse; demystifies Data Vault modeling with beginning, intermediate, and advanced techniques; and discusses the advantages of the Data Vault approach over other techniques, including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0.

37 citations
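The book describes loading each Data Vault layer with SSIS. As a language-neutral illustration of the core idea (not the book's SSIS packages), the following sketch loads a hypothetical customer hub and satellite with Python's standard sqlite3 module; the table names, columns, and MD5-based hash key are illustrative assumptions.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Illustrative Data Vault structures: a hub keyed by a hash of the business key,
# and a satellite holding descriptive attributes with load metadata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,   -- hash key derived from the business key
    customer_id TEXT NOT NULL,      -- business key from the source system
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk TEXT NOT NULL,
    load_dts    TEXT NOT NULL,
    name        TEXT,
    city        TEXT,
    record_src  TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_dts)
);
""")

def hash_key(business_key: str) -> str:
    """Deterministic hash key, in the spirit of Data Vault 2.0 hash keys."""
    return hashlib.md5(business_key.upper().encode("utf-8")).hexdigest()

def load_customer(row: dict, record_src: str) -> None:
    hk = hash_key(row["customer_id"])
    now = datetime.now(timezone.utc).isoformat()
    # Hub load: insert the business key only if it is not already known.
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
        (hk, row["customer_id"], now, record_src),
    )
    # Satellite load: append a new descriptive row, preserving history.
    conn.execute(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
        (hk, now, row["name"], row["city"], record_src),
    )

load_customer({"customer_id": "C-1001", "name": "Acme Ltd", "city": "Oslo"}, "CRM")
conn.commit()
print(conn.execute("SELECT customer_id FROM hub_customer").fetchall())
```

A production load would also compare attribute hash diffs before inserting satellite rows; the sketch omits that step for brevity.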


Proceedings ArticleDOI
02 Mar 2015
TL;DR: A summary of one such business case from the Financial Services Industry where traditional ETL silos were broken down to support structurally dynamic, ever-expanding and changing data usage needs, employing ontology and semantic techniques such as RDF/RDFS, SPARQL, OWL and the related stack.
Abstract: We are seeing a sea change coming in terms of financial information aggregation and consumption; this could potentially be a game changer in the financial services space, with a focus on the ability to commoditize data. The Financial Services Industry deals with a tremendous amount of data that varies in its structure, volume and purpose. The data is generated in the ecosystem (its customers, its own accounts, partner trades, securities transactions, etc.) and is handled by many systems, each having its own perspective. Front-office systems handle the transactional behavior of the data; middle-office systems, which typically work with a drop-copy of the data, subject it to intense processing, business logic and computations (such as inventory positions, fee calculations, commissions); and back-office systems deal with reconciliation, cleansing, exception management, etc. Then there are the analytic systems, which are concerned with auditing, compliance reporting as well as business analytics. Data that flows through this ecosystem gets aggregated, transformed, and transported time and again. Traditional approaches to managing such data leverage Extract-Transform-Load (ETL) technologies to set up data marts, where each data mart serves a specific purpose (such as reconciliation or analytics). The result is a proliferation of transformations and marts in the organization. The need is to have architectures and IT systems that can aggregate data from many such sources without making any assumptions on HOW, WHERE or WHEN this data will be used. The incoming data is semantically annotated and stored in a triple store within the storage tier, which offers the ability to store, query and draw inferences using the ontology. There is a probable need for a Big Data solution here that helps ease data liberation and co-location. This paper is a summary of one such business case from the Financial Services Industry where traditional ETL silos were broken down to support structurally dynamic, ever-expanding and changing data usage needs, employing ontology and semantic techniques such as RDF/RDFS, SPARQL, OWL and the related stack.

10 citations
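The abstract describes semantically annotating incoming financial records as triples and querying the triple store. A minimal sketch with the rdflib Python library is shown below; the ontology terms (ex:Trade, ex:hasNotional, ex:bookedBy) and the example record are invented for illustration and are not taken from the paper.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/finance#")  # illustrative ontology namespace

g = Graph()
g.bind("ex", EX)

# Annotate one incoming trade record as triples in the store.
trade = EX["T-42"]
g.add((trade, RDF.type, EX.Trade))
g.add((trade, EX.hasInstrument, Literal("XYZ Corp 5Y Bond")))
g.add((trade, EX.hasNotional, Literal(1_000_000, datatype=XSD.integer)))
g.add((trade, EX.bookedBy, EX.FixedIncomeDesk))

# Downstream consumers query the triple store directly instead of a purpose-built mart.
query = """
PREFIX ex: <http://example.org/finance#>
SELECT ?trade ?notional WHERE {
    ?trade a ex:Trade ;
           ex:hasNotional ?notional ;
           ex:bookedBy ex:FixedIncomeDesk .
}
"""
for row in g.query(query):
    print(row.trade, row.notional)
```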


Proceedings ArticleDOI
27 May 2015
TL;DR: The conclusion from this research is that the use of a dynamic ETL process (using ETL metadata) is required when the ETL process is dealing with an operational system that is still unstable and likely to change its database schema.
Abstract: The extraction-transformation-loading (ETL) process in data warehouse development performs data extraction from various resources, transforms the data into a suitable format and loads it into the data warehouse storage. Within the ETL process, a data cleansing function handles data redundancy, inconsistency and integrity. The ETL process moves data from the source to the integration layer (the data store in the data warehouse). In the integration layer, the data can be grouped into smaller, more specific scopes for the requirements of other repositories called data marts. A data warehouse reporting program is associated with a data mart as its data source. In this research, the data warehouse is built to handle the ETL process, and metadata is built to support it. The metadata construction for ETL processes leads to ETL programs with a high degree of reusability. The conclusion from this research is that the use of a dynamic ETL process (using ETL metadata) is required when the ETL process is dealing with an operational system that is still unstable and likely to change its database schema. A dynamic ETL process is also needed to address the increasing demand for reports from users.

10 citations
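The core idea is that the mapping logic lives in metadata rather than in hard-coded ETL programs, so a source schema change only touches the metadata. A small hypothetical sketch of that pattern in Python (the tables, columns, and transform names are illustrative, not the paper's):

```python
import sqlite3

# Hypothetical ETL metadata: source column -> target column plus a named transform.
etl_metadata = [
    {"source_col": "cust_name", "target_col": "customer_name", "transform": "strip"},
    {"source_col": "birth_dt",  "target_col": "birth_date",    "transform": "none"},
]

TRANSFORMS = {"strip": lambda v: v.strip() if isinstance(v, str) else v,
              "none":  lambda v: v}

def run_etl(src_conn, tgt_conn, src_table, tgt_table, metadata):
    """Build the extract and load statements from metadata at run time."""
    src_cols = ", ".join(m["source_col"] for m in metadata)
    tgt_cols = ", ".join(m["target_col"] for m in metadata)
    placeholders = ", ".join("?" for _ in metadata)
    for row in src_conn.execute(f"SELECT {src_cols} FROM {src_table}"):
        values = [TRANSFORMS[m["transform"]](v) for m, v in zip(metadata, row)]
        tgt_conn.execute(
            f"INSERT INTO {tgt_table} ({tgt_cols}) VALUES ({placeholders})", values)

# Usage with in-memory databases standing in for the operational system and warehouse.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (cust_name TEXT, birth_dt TEXT)")
src.execute("INSERT INTO customers VALUES ('  Alice ', '1990-01-01')")
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE dim_customer (customer_name TEXT, birth_date TEXT)")
run_etl(src, tgt, "customers", "dim_customer", etl_metadata)
print(tgt.execute("SELECT * FROM dim_customer").fetchall())
```

Renaming a source column or adding a new mapping then amounts to editing `etl_metadata`, which is the reusability argument the paper makes for metadata-driven ETL.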


Patent
04 Mar 2015
TL;DR: In this paper, a financial service management information platform is proposed to realize cross-system and cross-department data information sharing on the basis of completing basic data merging and integration for each legal-person organization.

Abstract: The invention discloses a financial service management information platform which comprises a foundation platform and an application platform. The financial service management information platform can realize cross-system and cross-department data information sharing on the basis of completing basic data merging and integration for each legal-person organization. Data application marts with different service themes are established through a unified data view, the data application marts comprising a business report data mart, a business index system and business analysis data marts with different topics. As a result, the service level of the rural credit system is improved, and smart cooperative financial services are realized while guaranteeing good extensibility, safety, stability and high availability of the system.

9 citations


Book ChapterDOI
26 May 2015
TL;DR: A new methodology composed of schema matching and schema mapping is proposed, which compares the elements of the two schemas using a new semantic measure to generate the mapping rules, and applies them to ensure the automatic merging of the schemas.
Abstract: The schema integration technique offers the possibility to unify the representation of several schemas into one global schema. In this work, we present two contributions. The first is about automating this technique to reduce human intervention. The second is about applying this technique to generate a data warehouse schema from data mart schemas. To meet our goals, we propose a new methodology composed of schema matching and schema mapping. The first technique compares the elements of the two schemas using a new semantic measure to generate the mapping rules. The second transforms the mapping rules into queries and applies them to ensure the automatic merging of the schemas.

8 citations
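The matching step compares elements of two data mart schemas with a semantic measure and derives mapping rules. The paper's measure is not reproduced here; the sketch below substitutes a simple string-similarity ratio to show the overall shape of matching followed by rule generation, with invented attribute names.

```python
from difflib import SequenceMatcher

# Attribute names of two independently developed data mart schemas (illustrative).
mart_a = ["customer_id", "order_date", "total_amount"]
mart_b = ["cust_id", "date_of_order", "amount_total", "region"]

def similarity(a: str, b: str) -> float:
    # Stand-in for the paper's semantic measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.4
mapping_rules = []
for col_a in mart_a:
    best = max(mart_b, key=lambda col_b: similarity(col_a, col_b))
    score = similarity(col_a, best)
    if score >= THRESHOLD:
        mapping_rules.append((col_a, best, round(score, 2)))

# Each rule says "merge these two attributes into one in the global schema".
for rule in mapping_rules:
    print(rule)
```

In the methodology described in the chapter, such rules would then be translated into queries whose execution performs the actual merge of the schemas.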


Journal ArticleDOI
TL;DR: The crux of the proposed work is delivering an enhanced and exclusive innovative model intended to strengthen security measures, which have at times been found wanting, while also ensuring improved accessibility using a hashing modus operandi.

Abstract: A data warehouse is a set of integrated databases designed to support decision-making and problem solving, holding highly condensed data. Data warehousing is an increasingly popular theme for contemporary researchers, given current inclinations in industry and the executive purview. The crux of the proposed work is delivering an enhanced and exclusive innovative model intended to strengthen security measures, which have at times been found wanting, while also ensuring improved accessibility using a hashing modus operandi. A new algorithm was engendered using the concept of protein synthesis, prevalently studied in genetics, that is, in the field of biotechnology, wherein three steps are observed, namely DNA Replication, Translation and Transcription. In the proposed algorithm, the two latter steps, Translation and Transcription, have been taken into account and the concept has been used for competent encryption and proficient decryption of data. The Central Dogma Model is the name of the explicit model that accounts for and elucidates the course of action of protein synthesis using the codons which compose the RNA and the DNA and are implicated in numerous bio-chemical processes in living organisms. It can be observed that a dual stratum of encryption and decryption mechanisms has thereby been employed for optimal security. The formulation of the hashing modus operandi ensures a considerable diminution of access time, keeping in mind the apt retrieval of all indispensable data from the data vaults. The pertinent application of the proposed model with enhanced security lies in its significant service to a variety of organizations where the accrual of protected data is of extreme importance, including educational organizations, corporate houses, medical establishments, private establishments and so on.

7 citations


Proceedings ArticleDOI
15 Jul 2015
TL;DR: This paper describes a process to develop and publish a scorecard on the semantic web, based on the ISO 2789:2013 standard and using Linked Data technologies, in such a way that it can be linked to related datasets across the web.
Abstract: Open access journals collect, preserve and publish scientific information in digital form, but it is still difficult, not only for users but also for digital libraries, to evaluate the usage and impact of this kind of publication. This problem can be tackled by introducing Key Performance Indicators (KPIs), allowing us to objectively measure the performance of the journals relative to the objectives pursued. In addition, Linked Data technologies constitute an opportunity to enrich the information provided by KPIs, connecting them to relevant datasets across the web. This paper describes a process to develop and publish a scorecard on the semantic web, based on the ISO 2789:2013 standard and using Linked Data technologies, in such a way that it can be linked to related datasets. Furthermore, methodological guidelines with their associated activities are presented. The proposed process was applied to the open journal system of a university, including the definition of the KPIs linked to the institutional strategies; the extraction, cleaning and loading of data from the data sources into a data mart; the transformation of the data into RDF (Resource Description Framework); and the publication of the data by means of a SPARQL endpoint using the OpenLink Virtuoso application. Additionally, the RDF Data Cube vocabulary has been used to publish the multidimensional data on the web. The visualization was made using CubeViz, a faceted browser, to present the KPIs in interactive charts.

6 citations
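The RDF Data Cube vocabulary mentioned in the abstract models each KPI value as a qb:Observation attached to a dataset and to dimension and measure properties. A minimal rdflib sketch follows; the KPI, dimension URIs and the numeric value are invented for illustration and are not taken from the paper's scorecard.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/kpi#")  # illustrative namespace

g = Graph()
g.bind("qb", QB)
g.bind("ex", EX)

# The dataset that groups all scorecard observations.
g.add((EX.JournalKpiDataset, RDF.type, QB.DataSet))

# One observation: a hypothetical "downloads per article" KPI for 2015.
obs = EX["obs-2015-downloadsPerArticle"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX.JournalKpiDataset))
g.add((obs, EX.refPeriod, Literal("2015", datatype=XSD.gYear)))
g.add((obs, EX.indicator, EX.downloadsPerArticle))
g.add((obs, EX.value, Literal(37.5, datatype=XSD.decimal)))

# Serialized Turtle like this is what a SPARQL endpoint (e.g., Virtuoso) would expose.
print(g.serialize(format="turtle"))
```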


Patent
Qiang Zhu1, Yan Liu1, Songtao Guo1, Shaobo Liu1, Guan Wang1, Sui Yan1, Kuisong Tong1 
31 Dec 2015
TL;DR: In this article, the authors present techniques for generating and deploying a computer model with few inputs from a user, and for creating a data mart that multiple computer models may leverage in order to decrease the time required to generate subsequent computer models.
Abstract: Techniques are provided for generating and deploying a computer model with few inputs from a user. Techniques are also provided for creating a data mart that multiple computer models may leverage in order to decrease the time required to generate subsequent computer models.

5 citations


Book ChapterDOI
11 Oct 2015
TL;DR: An application called Drug Encyclopedia is presented, which is built on top of a data mart represented as Linked Data and enables physicians to search and browse clinically relevant information about medicinal products and drugs.
Abstract: The information about drugs is scattered among various resources, and accessing it is hard for end users. In this paper we present an application called Drug Encyclopedia, which is built on top of a data mart represented as Linked Data and enables physicians to search and browse clinically relevant information about medicinal products and drugs. The application has been running for more than a year and has attracted many users. We describe the requirement-driven development, the creation of the data mart, the evaluation of the application, and the lessons learned.

5 citations


Patent
05 Aug 2015
Abstract: The utility model relates to the technical field of information systems, and in particular to an integrated platform for operation monitoring, analysis and display. The platform comprises an ETL server, a data resource management application server, a data resource management database server, an ODS database server, a data warehouse server, a data mart server, an index monitoring server, an index analysis server, an index monitoring application server, a large-screen visualization application server, a visualization display database server, graphics workstations and an external large display screen. Each of the database servers is arranged on a SAN storage network and is connected to a disk array through a fibre optic switch so that its data is regularly backed up to a tape library. The whole platform resides within the power grid company's information intranet; each server is identified by its IP address, accesses the main network, and then accesses the information intranet through a four-layer switch at the network core layer. The platform has a rational structure and strong practicality, and improves operating efficiency.

5 citations


Proceedings Article
01 Jan 2015
TL;DR: The Hadoop framework is presented as a solution to the problem of integrating research information in the Business Intelligence field; it can collect, explore, process and structure the aforementioned information to fulfil a function equivalent to a data mart in a Business Intelligence system.
Abstract: Most of the information collected in different fields by the Instituto de Investigacion Biomedica de A Coruna (INIBIC) is classified as unstructured due to its high volume and heterogeneity. This situation, linked to the recent requirement of integrating it with the medical information, makes it necessary to implement specific architectures to collect and organize it before it can be analysed. The purpose of this article is to present the Hadoop framework as a solution to the problem of integrating research information in the Business Intelligence field. This framework can collect, explore, process and structure the aforementioned information, which allows us to develop a function equivalent to a data mart in a Business Intelligence system.
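One common entry point for this kind of processing is Hadoop Streaming, where the map and reduce steps are plain scripts reading stdin. The hypothetical pair below counts records per study identifier, assuming tab-separated input whose first field is a study ID; the field layout is an assumption, not INIBIC's actual data.

```python
import sys

def mapper() -> None:
    """Map step: emit (study_id, 1) for every tab-separated input record."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")

def reducer() -> None:
    """Reduce step: sum the counts; Hadoop Streaming delivers keys already sorted."""
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if current_key is not None and key != current_key:
            print(f"{current_key}\t{count}")
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    # Run as "python3 count_by_study.py map" or "python3 count_by_study.py reduce".
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()
```

The same script would typically be submitted as the -mapper and -reducer commands of a Hadoop Streaming job, with the aggregated output feeding the mart-like layer the paper describes.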

Journal Article
TL;DR: A data mart is a persistent physical store of operational and aggregated (statistically processed) data that supports businesspeople in making decisions based primarily on analyses of past activities and results.
Abstract: A data mart is a persistent physical store of operational and aggregated (statistically processed) data that supports businesspeople in making decisions based primarily on analyses of past activities and results. A data mart contains a predefined subset of enterprise data organized for rapid analysis and reporting. Data warehousing has come into being because the file structure of the large mainframe core business systems is inimical to information retrieval. The purpose of the data warehouse is to combine core business data and data from other sources in a format that facilitates reporting and decision support. In just a few years, data warehouses have evolved from large, centralized data repositories to subject-specific, but independent, data marts and now to dependent marts that load data from a central repository of data staging files that has previously extracted data from the institution's operational business systems (e.g., student record, finance and human resource systems).

Patent
25 Nov 2015
TL;DR: In this article, a method and a device for data sharing among data marts is presented. The method comprises that a data document is extracted from a data warehouse, and the data documents are copied into a preset mart storage sharing area and taken as a copy document of the data document; and an external table corresponding to the copy document inside the mart sharing area is established in a data mart server.
Abstract: The invention discloses a method and a device for data sharing among data marts The method comprises that a data document is extracted from a data warehouse, and the data document is copied into a preset mart storage sharing area and taken as a copy document of the data document; and an external table corresponding to the copy document inside the mart storage sharing area is established in a data mart server, wherein the external table comprises a storage path which points at the copy document inside the mart storage sharing area, so that the data marts can obtain the copy document corresponding to the external table by checking the external table The method and the device for the data sharing among the data marts provided by the invention realize the purpose of the data sharing among the multiple data marts, avoid wasting in resource storage, and solve the problem about consistency of data access among the data marts
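The patent does not name a specific database engine. Hive-style external tables are one common way to realize the described pattern: the copy document lives in a shared storage area and each mart defines an external table whose LOCATION points at it, so no second physical copy per mart is needed. The sketch below is illustrative DDL under that assumption; the schema, database name, and path are hypothetical.

```python
# Illustrative Hive-style DDL for the pattern described in the patent: the shared
# copy document sits in a mart storage sharing area, and each data mart defines an
# external table whose storage path points at that copy.
SHARED_AREA = "/warehouse/shared_mart_area/customer_orders"  # hypothetical path

external_table_ddl = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS mart_sales.customer_orders (
    order_id    STRING,
    customer_id STRING,
    amount      DECIMAL(12, 2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{SHARED_AREA}'
"""

# Each mart server would execute this statement against its own catalog; dropping
# the external table later removes only the table definition, not the shared copy.
print(external_table_ddl)
```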

DOI
01 Jan 2015
TL;DR: The developed graduate data mart is capable of displaying graduate data by academic year, GPA, years of study, and student transfer status; as a result, those data help the management of the university in filling out the accreditation form.

Abstract: Universitas Muhammadiyah Yogyakarta (UMY) is a large, high-grade educational institution. During the period 1998-2014, UMY produced about 20550 graduates. However, this large number of graduates is not supported by a good data storage system, even though those data are needed for filling out the accreditation form. For that reason, an integrated data storage system has to be built to provide graduate data as needed, namely a graduate data mart. The development of the graduate data mart uses the SDLC waterfall method. This method involves several stages: requirements analysis, system design, implementation, testing, and maintenance, which must be done sequentially. If there is an error, the process must be repeated from the beginning to fix it. The development of the graduate data mart uses the Operational Data Store (ODS) and Dimensional Data Store (DDS) architectures. Those architectures are selected because they support the transactional level. By using those architectures, the graduate data mart is capable of displaying graduate data by academic year, GPA, years of study, and student transfer status. As a result, those data help the management of the university in filling out the accreditation form.


Book ChapterDOI
26 May 2015
TL;DR: This paper proposes a Data Warehouse Architecture Reference Model (DWARM), which unifies known architectural styles and provides options for adaptation to fit particular purposes of a developed data warehouse system.
Abstract: A common taxonomy of data warehouse architectures comprises five basic approaches: Centralized, Independent Data Mart, Federated, Hub-and-Spoke and Data Mart Bus. However, in many real-world cases, an applied data warehouse architecture can be a combination of these. In this paper we propose a Data Warehouse Architecture Reference Model (DWARM), which unifies known architectural styles and provides options for adaptation to fit the particular purposes of a developed data warehouse system. The model comprises 11 layers grouping containers (data stores, sources and consumers), as well as processes, covering the typical functional groups: ETL, data storage, data integration and delivery. An actual data warehouse architecture can be obtained by tailoring (removing unnecessary components) and instantiating (creating required layers and components of a given type).


Proceedings ArticleDOI
09 Nov 2015
TL;DR: The results show that the use of the database schema created according to the experimental data modeling approach had a positive impact on querying performance in several cases; the magnitude of the impact, however, varied depending on each query's resulting dataset size.
Abstract: The paper deals with the assessment of an experimental data modeling approach intended to support agile-oriented data modeling. The approach is based on the Anchor Data Modeling technique and is applied to a multidimensional data model. The assessed approach is expected to facilitate more effective execution of queries in the data mart environment. The emphasis is placed on the comparison of query execution performance using database schemas built with the traditional and the experimental approach, respectively. The tests are done in the environment of selected modern Business Intelligence tools, using two test queries with varying output dataset sizes. The results show that the use of the database schema created according to the experimental data modeling approach had a positive impact on querying performance in several cases. The magnitude of the impact on querying performance, however, varied depending on each query's resulting dataset size.
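Anchor-style modeling splits each entity into a narrow anchor table plus one historized table per attribute, so queries reassemble entities through joins. A small sqlite3 sketch of that shape is given below purely to illustrate the kind of schema and query being measured; it is not the paper's test schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anchor: one row per product identity.
CREATE TABLE pr_product (pr_id INTEGER PRIMARY KEY);

-- One attribute table per property, each historized with a change timestamp.
CREATE TABLE pr_nam_product_name (
    pr_id INTEGER REFERENCES pr_product(pr_id),
    name  TEXT,
    changed_at TEXT
);
CREATE TABLE pr_prc_product_price (
    pr_id INTEGER REFERENCES pr_product(pr_id),
    price REAL,
    changed_at TEXT
);

INSERT INTO pr_product VALUES (1);
INSERT INTO pr_nam_product_name VALUES (1, 'Widget', '2015-01-01');
INSERT INTO pr_prc_product_price VALUES (1, 9.90, '2015-01-01');
INSERT INTO pr_prc_product_price VALUES (1, 11.50, '2015-06-01');
""")

# Reassemble the latest view of the entity by joining anchor and attribute tables;
# a star-schema equivalent would read one wide dimension row instead.
latest = conn.execute("""
    SELECT p.pr_id, n.name, c.price
    FROM pr_product p
    JOIN pr_nam_product_name n ON n.pr_id = p.pr_id
    JOIN pr_prc_product_price c ON c.pr_id = p.pr_id
    WHERE c.changed_at = (SELECT MAX(changed_at)
                          FROM pr_prc_product_price WHERE pr_id = p.pr_id)
""").fetchall()
print(latest)
```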


Book ChapterDOI
01 Jan 2015
TL;DR: The paper presents the process of transformation, storage and analysis for enlarging a data mart based on users' requests, concretely the data mart for personal transport.

Abstract: The creation of an information system is a complex and long process, but it is only the first step in its existence. Most information systems go through a certain development during their life cycle. Very often the users define requests for the enlargement of the functionality and of the volume of the displayed data. The paper presents the process of transformation, storage and analysis for enlarging a data mart based on users' requests, concretely the data mart for personal transport. Terms from the field of Business Intelligence, and the systems used for acquiring data, analysing data and creating forms and reports, are explained.

Journal ArticleDOI
TL;DR: The paper deals with core research on the architecture of the ETL process as applied in a BI environment, along with the advent of metadata at each corresponding layer, which can be applicable to all BI scenarios.
Abstract: This paper deals with core research on the architecture of the ETL process as applied in a BI environment, along with the advent of metadata at each corresponding layer, which can be applicable to all BI scenarios. The management of the extraction process has been done using several operators which help in reducing its complexity. New operators have been developed to easily understand each and every layer of the extraction process. ETL stands for extraction, transformation and loading, and it plays a vital role in the area of business intelligence. Extraction is the process of extracting heterogeneous data from disparate source systems for further analysis in a data warehouse environment. Transformation is the process of storing data in a correct, unambiguous and consistent format which is compatible with the format of the existing data warehouse. Loading is the process that loads data into the end target, which may be a data warehouse or a data mart. Keywords: Transformation, Loading, Business Intelligence

Book ChapterDOI
01 Jan 2015
TL;DR: The recommended approach is ADA with the hybrid dimensional-normalized model, which accommodates various data stores: hub-and-spoke, system of integration, and system of analytics.
Abstract: The data architecture defines the data along with the schemas, integration, transformations, storage, and workflow required to enable the analytical requirements of the information architecture. A solid data architecture is a blueprint that helps align your company's data with its business strategies. The data architecture guides how the data is collected, integrated, enhanced, stored, and delivered to business people who use it to do their jobs. It helps make data available, accurate, and complete so it can be used for business decision-making. The major choices are Enterprise Data Warehouse (EDW)-only, independent data marts, Ralph Kimball's enterprise data bus architecture, Bill Inmon's Corporate Information Factory (CIF), hub-and-spoke, analytical data architecture (ADA). The recommended approach is ADA with the hybrid dimensional-normalized model, which accommodates various data stores: hub-and-spoke, system of integration, and system of analytics. An architecture may include an operational data store, which is a database that enables non-traditional data warehousing functions such as real-time operational reporting or refining unstructured data.

Journal ArticleDOI
TL;DR: In this paper, black box and white box testing techniques concerning Data Mart are discussed and the test cases designed using these techniques are very successful to detect previously undetected errors or faults in a Data Mart.
Abstract: A Data Warehouse is a logical stockroom which accumulates and maintains enormous volumes of data. It is very important for an enterprise that needs to analyze data obtained from heterogeneous sources to take tactical decisions. Any enterprise that wants to expand, survive and beat out the competition must have control over its data. Any error or inconsistency in the data may lead to improper decisions which may cause immense losses to the enterprise. Data Warehouse testing is carried out to eliminate the errors and inconsistencies that arise due to data being collected from disparate sources in different formats. Data Warehouse testing is an expensive and time-consuming practice, as exhaustive testing is not possible. Therefore the concept of the Data Mart comes into existence. A Data Mart is a specialized subset of a Data Warehouse which fulfills the data requirements of a specific group. Testing a Data Mart is a much easier and more manageable process compared to testing a Data Warehouse. In order to test the data mart, there are a number of strategies that allow us to select a set of test cases and test data which are very effective in detecting errors. In our paper we briefly discuss black box and white box testing techniques concerning the Data Mart. We have explored black box techniques to select test cases as they provide a systematic approach to uncovering a great number of errors. We have also proposed designing the test cases using Boundary Value Analysis, Equivalence Class Partitioning and a combination of both techniques. The test cases designed using these techniques are very successful in detecting previously undetected errors or faults in a Data Mart. They are also proficient for testing the worst case and the robustness of a Data Mart.
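Boundary Value Analysis, mentioned in the abstract, derives test values at and just around the edges of each input range. A small hypothetical sketch, using a data mart column specified to accept discount percentages from 0 to 100 (the column and range are assumptions, not from the paper):

```python
def boundary_values(minimum: int, maximum: int) -> list[int]:
    """Classic boundary value analysis: min, min+1, a nominal value, max-1, max."""
    nominal = (minimum + maximum) // 2
    return [minimum, minimum + 1, nominal, maximum - 1, maximum]

def robustness_values(minimum: int, maximum: int) -> list[int]:
    """Robustness testing adds the invalid values just outside the range."""
    return [minimum - 1] + boundary_values(minimum, maximum) + [maximum + 1]

# Test data for a hypothetical discount_percent column constrained to 0..100.
print(boundary_values(0, 100))    # [0, 1, 50, 99, 100]
print(robustness_values(0, 100))  # [-1, 0, 1, 50, 99, 100, 101]
```

Equivalence Class Partitioning would complement this by picking one representative value from each valid and invalid class (e.g., one value below 0, one inside 0..100, one above 100).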

Book ChapterDOI
01 Jan 2015
TL;DR: This paper adapts a schema evolution method to address the data warehouse evolution given a data warehouse is built as a multidimensional schema and focuses on evolution operations in dimensional tables, which includes changes in levels, hierarchies, and paths.
Abstract: An organization collects current and historical data for a data warehouse from disparate sources across the organization to support management in making decisions. The data sources change their contents and structure dynamically to reflect business changes or organizational requirements, which causes data warehouse evolution in order to provide consistent analytical results. This evolution may cause changes in the contents or the schema of a data warehouse. This paper adapts a schema evolution method to address data warehouse evolution, given that a data warehouse is built as a multidimensional schema. While existing works have identified and developed schema evolution operations based on conceptual models to enforce schema correctness, only a few have developed software tools to enforce the schema correctness of those operations. They are also coupled with specific DBMSs (e.g., SQL Server) and provide limited GUI capability. This paper aims to develop a web-based implementation to support data warehouse schema evolution. It offers all the benefits that Internet browser-based applications provide, allowing users to design, view, and modify data warehouse schemas graphically. This work focuses on evolution operations in dimensional tables, which include changes in levels, hierarchies, and paths. Schema correctness for each schema evolution operation is ensured by procedural code implemented in PHP.

Posted Content
TL;DR: The paper examines and classifies the heterogeneities which can occur during the integration of independently developed data marts; four methods for heterogeneity detection are proposed and discussed, and conclusions about the advantages of the proposed methods are drawn.
Abstract: The paper focuses on the detection of heterogeneities between multi-dimensional data marts. In many cases, data which resides in multiple and independently developed data marts is needed for decision-making. The multi-dimensional model introduces, in addition to the ER data model, dimension and fact entities. As a result of the multi-dimensional model elements, two groups of heterogeneities have been identified: dimension and fact. The former depends on differences between the dimensions' hierarchies, their members, the names of the members, their levels and dimensions. The latter kind of heterogeneity occurs when facts in different data marts have different names, values (inconsistent measures), formats or even different scales. Therefore, the paper examines and classifies the heterogeneities which can occur during the integration of independently developed data marts, and four methods for heterogeneity detection are proposed and discussed. The methods are as follows: a method for metadata extraction, a method for detecting schema-instance heterogeneities, a method for detecting heterogeneities among dimensions and a method for detecting heterogeneities among facts. The paper ends with conclusions about the advantages of the proposed methods for heterogeneity detection during the integration of data marts.
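One of the listed heterogeneity classes concerns dimension members that differ between independently developed marts (missing members, different member names). A simple, hypothetical illustration of detecting such differences by comparing member sets extracted from two marts; the dimension and its members are invented, and the paper's actual detection methods are richer than this set comparison.

```python
# Hypothetical "country" dimension members extracted from two independent data marts.
members_mart_a = {"Germany", "France", "United Kingdom", "Spain"}
members_mart_b = {"Germany", "France", "UK", "Italy"}

only_in_a = members_mart_a - members_mart_b
only_in_b = members_mart_b - members_mart_a
shared    = members_mart_a & members_mart_b

# Members present in only one mart signal either missing members or naming
# heterogeneities ("United Kingdom" vs "UK") that must be reconciled before merging.
print("shared:", sorted(shared))
print("only in mart A:", sorted(only_in_a))
print("only in mart B:", sorted(only_in_b))
```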

Journal ArticleDOI
01 Mar 2015-ComTech
TL;DR: Implementing tuning for the funding database can help the bank improve the performance of the database system and decrease the time needed to produce analytical reports.
Abstract: The purpose of this research is to analyze, design, and implement tuning for the funding database, including the data mart processing that will be used in the formation of analytical reports. The research method used is a literature study of a variety of journals, books, e-books, and articles on the internet. Fact-finding techniques are also applied by analyzing, collecting, and examining documents, interviews, and observations. Other methods, such as SQL tuning, partitioning, and indexing, are used to analyze and design the database. The result obtained from this research is a tuning implementation for the funding database which, if implemented, will improve performance significantly. The conclusion is that implementing tuning for the funding database can help the bank improve the performance of the database system and decrease the time needed to produce analytical reports.
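Of the tuning techniques mentioned, indexing is the easiest to illustrate portably. The sqlite3 sketch below (not the paper's banking schema) shows how adding an index changes the query plan for a typical data mart filter; partitioning and vendor-specific SQL tuning would be applied analogously on the production engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE funding_fact (
        account_id INTEGER,
        branch_id  INTEGER,
        balance    REAL,
        as_of_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO funding_fact VALUES (?, ?, ?, ?)",
    [(i, i % 50, 1000.0 + i, "2015-03-01") for i in range(10_000)],
)

query = "SELECT SUM(balance) FROM funding_fact WHERE branch_id = ?"

# Query plan before tuning: a full table scan.
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", (7,)).fetchall())

# Tuning step: add an index on the filtered column.
conn.execute("CREATE INDEX idx_funding_branch ON funding_fact(branch_id)")

# Query plan after tuning: the index is used to locate matching rows.
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", (7,)).fetchall())
print(conn.execute(query, (7,)).fetchone())
```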


Patent
06 May 2015
TL;DR: In this paper, a decision support system for a freezing-method-based shaft sinking project is presented, comprising a source data layer mainly used for storing data detected before and during freezing construction, a data acquisition layer responsible for extracting data from the source data layer, and a data management layer that establishes data marts to reduce data processing workload according to different themes.
Abstract: The invention discloses a freezing control decision support system for a freezing-method-based shaft sinking project. The system comprises a source data layer, mainly used for storing data detected before and during freezing construction; a data acquisition layer, responsible for extracting data from the source data layer and then loading the cleaned and converted data into a freezing project data warehouse; and a data management layer, which establishes data marts to reduce data processing workload according to different themes. Multi-layered analysis and mining of the data are realized through data mining and OLAP (on-line analytical processing) tools in a data analysis layer, the acquired knowledge is put into a knowledge base, and auxiliary decisions for certain analyses are achieved through knowledge inference. Comprehensive decisions from multiple models are achieved by a model base. A data display layer provides analysis results to the relevant decision-making personnel through a front-end display tool (graphical user interface). The system integrates various kinds of detection, analysis and control in the field of freezing control and provides powerful decision support for freezing construction personnel to control freezing.

Posted ContentDOI
10 Oct 2015-viXra
TL;DR: This paper proposes Hyper ETL with an integration of decision-making methodologies and a fuzzy optimization technique; an intelligent data repository with soft computing is presented, covering similarity metrics that are commonly used to improve the efficiency of data storage.
Abstract: The data warehouse is one of the components of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. The data warehouse is a consolidation of all data marts within the enterprise. Information is always accumulated in the dimensional model. In this paper, an intelligent data repository with soft computing is presented. It covers similarity metrics that are commonly used to improve the efficiency of data storage. It also covers multiple decision-making methodologies to improve the efficiency of decision-making. The paper reviews the literature on Extract, Transform and Load with the Data Warehouse. Moreover, the hybridization of ETL with fuzzy optimization, the Markov decision model, decision-making criteria and the Decision Matrix is also reviewed. The Decision Matrix is a mathematical tool to deal with uncertainty and vagueness in decision systems and has been applied successfully in many fields. This paper proposes Hyper ETL with an integration of decision-making methodologies and a fuzzy optimization technique.

01 Jan 2015
TL;DR: Adopting a Data Vault (DV)-based Enterprise Data Warehouse (EDW) can simplify and enhance various aspects of testing, and curtail delays common in non-DV based DW projects.
Abstract: Data warehouse (DW) projects are undertakings that require integration of disparate sources of data, a well-defined mapping of the source data to the reconciled data, and effective Extract, Transform, and Load (ETL) processes. Owing to the complexity of data warehouse projects, great emphasis must be placed on an agile-based approach with properly developed and executed test plans throughout the various stages of designing, developing, and implementing the data warehouse to mitigate against budget overruns, missed deadlines, low customer satisfaction, and outright project failures. Yet, there are often attempts to test the data warehouse exactly like traditional back-end databases and legacy applications, or to downplay the role of quality assurance (QA) and testing, which only serve to fuel the frustration and mistrust of data warehouse and business intelligence (BI) systems. In spite of this, there are a number of steps that can be taken to ensure DW/BI solutions are successful, highly trusted, and stable. In particular, adopting a Data Vault (DV)-based Enterprise Data Warehouse (EDW) can simplify and enhance various aspects of testing, and curtail delays common in non-DV based DW projects. A major area of focus in this research is raw DV loads from source systems, keeping transformations to a minimum in the ETL process which loads the DV from the source. Certain load errors, classified as permissible errors and enforced by business rules, are kept in the Data Vault until correct values are supplied. Major transformation activities are pushed further downstream to the next ETL process which loads and refreshes the Data Mart (DM) from the Data Vault. INDEX WORDS: Data warehouse testing, Data Vault, Data Mart, Business Intelligence, Quality Assurance, Raw Data Vault loads