
Showing papers on "Data mart published in 2008"


Journal ArticleDOI
TL;DR: A novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNP) in widespread use in human population genetics: SPSmart (SNPs for Population Studies).
Abstract: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10⁹ genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
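
A minimal sketch of the kind of pre-computed summary statistics such a data mart stores, assuming biallelic SNPs reduced to per-population allele counts; the function names, input layout, and counts are illustrative, not SPSmart's actual schema (Python):

# Hedged sketch: per-population indices for one biallelic SNP, of the kind
# a data mart like SPSmart pre-computes. The input layout
# (population -> (count of allele A, count of allele B)) is a hypothetical
# simplification of the real genotype tables.

def allele_freq(count_a, count_b):
    """Frequency of allele A from raw allele counts."""
    total = count_a + count_b
    return count_a / total if total else 0.0

def expected_het(p):
    """Expected heterozygosity for a biallelic locus, 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)

def fst(pop_counts):
    """Unweighted Wright's Fst, (Ht - mean Hs) / Ht, across populations."""
    freqs = [allele_freq(a, b) for a, b in pop_counts.values()]
    p_bar = sum(freqs) / len(freqs)          # mean allele frequency
    h_t = expected_het(p_bar)                # total expected heterozygosity
    h_s = sum(expected_het(p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t if h_t else 0.0

pops = {"CEU": (120, 80), "YRI": (60, 140), "CHB": (90, 110)}
print({p: round(allele_freq(a, b), 2) for p, (a, b) in pops.items()})
print("Fst:", round(fst(pops), 4))   # 0.0606 for these made-up counts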

116 citations


Journal ArticleDOI
TL;DR: This paper starts by tackling the basic issue of matching heterogeneous dimensions and provides a number of general properties that a dimension matching should fulfill, and proposes two different approaches to the problem of integration that try to enforce matchings satisfying these properties.
Abstract: In this paper we address the problem of integrating independent and possibly heterogeneous data warehouses, a problem that has received little attention so far, but that arises very often in practice. We start by tackling the basic issue of matching heterogeneous dimensions and provide a number of general properties that a dimension matching should fulfill. We then propose two different approaches to the problem of integration that try to enforce matchings satisfying these properties. The first approach refers to a scenario of loosely coupled integration, in which we just need to identify the common information between data sources and perform join operations over the original sources. The goal of the second approach is the derivation of a materialized view built by merging the sources, and refers to a scenario of tightly coupled integration in which queries are performed against the view. We also illustrate architecture and functionality of a practical system that we have developed to demonstrate the effectiveness of our integration strategies.
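
One of the paper's central ideas is that a matching between heterogeneous dimensions should respect their roll-up hierarchies. Below is a toy sketch of such a coherence check, under the assumption that each dimension is given as a member-to-parent map; the data structures and example members are invented and far simpler than the paper's formalism (Python):

# Hedged sketch of one property a dimension matching should satisfy:
# coherence with roll-up. If two dimension members are matched, rolling
# both up one level should yield members that are also matched.

def rolls_up_coherently(matching, rollup_a, rollup_b):
    """Check that matched members stay matched after one roll-up step."""
    for a, b in matching:
        parent_a, parent_b = rollup_a.get(a), rollup_b.get(b)
        if parent_a and parent_b and (parent_a, parent_b) not in matching:
            return False
    return True

# Two 'Location' dimensions: city -> country roll-up in each warehouse.
rollup_a = {"Rome": "Italy", "Milan": "Italy", "Lyon": "France"}
rollup_b = {"Roma": "Italia", "Milano": "Italia", "Lione": "Francia"}
matching = {("Rome", "Roma"), ("Milan", "Milano"),
            ("Italy", "Italia"), ("France", "Francia")}
print(rolls_up_coherently(matching, rollup_a, rollup_b))  # True: coherent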

56 citations


Journal ArticleDOI
TL;DR: The figures reported in the paper should support the notion that BGIS-related systems' applications are potentially a good investment and worthy of considerable research in the knowledge management fields.
Abstract: The purpose of this paper is to acquaint practitioners of information exploration with the need for Business and e-Government Intelligence Systems (BGIS) and the role such intelligence plays in competitive market research and industry, through a comparison of vendors, advantages and disadvantages, costs and benefits, and some future insights. A review of the applied literature focuses on utilising Business Intelligence (BI) as a competitive tool in an online retrieval environment. While the growth of BI systems may be dramatic [actual (2003, $5.3 billion; 2004, $5.6 billion) and predicted growth (2005, $6 billion; 2006, $6.5 billion; 2007, $7 billion; 2008, $7.3 billion)], the associated costs may be equally stunning, especially in end-user query, reporting, analysis, and data-mining applications and in packaged data mart and/or warehousing applications. However, the figures reported in the paper should support the notion that BGIS-related systems' applications are potentially a good investment and worthy of considerable research in the knowledge management fields.

23 citations


Journal ArticleDOI
TL;DR: This article provides guidelines for the design and development of similar complex data marts in the agricultural sector, particularly in the field of livestock management, with special reference to animal resource management.

20 citations


Book ChapterDOI
20 Oct 2008
TL;DR: The approach consists of linking information requirements, elicited using goal-oriented requirement engineering, to specific data marts, which are automatically translated into the implementation of the corresponding data repositories by means of model-driven engineering techniques.
Abstract: A corporate data warehouse is a repository that provides decision makers with a large amount of historical data concerning the overall enterprise strategy. In order to customize the data warehouse, many organizations develop concrete data marts focused on a particular department or business process. However, their integrated development is still an open problem for many organizations, due to the technical and organizational challenges involved in designing these repositories as a complete solution. Therefore, we present here a design approach for building both the corporate data warehouse and the data marts from users' requirements in an integrated way. Our approach consists of linking information requirements, elicited using goal-oriented requirement engineering, to specific data marts, which are automatically translated into the implementation of the corresponding data repositories by means of model-driven engineering techniques. Its great advantage is that users' requirements are captured from the very early development stages of a data-warehousing project and automatically translated into the entire data-warehousing platform.
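
A toy illustration of the model-driven step, assuming a requirement model that names one fact with its measures and dimensions and is translated into star-schema DDL; the model layout and generated SQL are invented stand-ins for the MDA-style transformations the approach actually relies on (Python):

# Hedged sketch of the model-driven idea: a toy requirement model for one
# data mart (a fact with measures and dimensions) translated into SQL DDL.

requirement = {
    "fact": "sales",
    "measures": ["amount", "quantity"],
    "dimensions": ["date", "product", "store"],
}

def to_star_schema_ddl(req):
    """Generate CREATE TABLE statements for a star schema from the model."""
    stmts = [f"CREATE TABLE dim_{d} ({d}_id INTEGER PRIMARY KEY, name TEXT);"
             for d in req["dimensions"]]
    cols = [f"{d}_id INTEGER REFERENCES dim_{d}" for d in req["dimensions"]]
    cols += [f"{m} NUMERIC" for m in req["measures"]]
    stmts.append(f"CREATE TABLE fact_{req['fact']} ({', '.join(cols)});")
    return "\n".join(stmts)

print(to_star_schema_ddl(requirement))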

11 citations


Proceedings Article
20 Feb 2008
TL;DR: The paper examines some common platforms supporting Business Intelligence activities in order to establish evaluation criteria for system selection, and reports experimental results showing the advantages and drawbacks of each considered system.
Abstract: The paper examines some common platforms supporting Business Intelligence activities in order to establish evaluation criteria for system selection. The evaluation applies a software measurement method based on analysis of the functional complexity of the platforms. The study was performed on an academic data warehouse that uses historical data available in legacy databases. Experimental results are reported which show the advantages and drawbacks of each considered system.

11 citations


Journal Article
TL;DR: A set of evaluation criteria is described and used to compare some popular OLAP systems that support Business Intelligence; the criteria involve critical aspects such as information delivery, system and user administration, and OLAP queries.
Abstract: A set of evaluation criteria is described and used to compare some popular OLAP systems that support Business Intelligence. These criteria involve critical aspects such as information delivery, system and user administration, and OLAP queries. The measurement method is based on functional complexity analysis. Experiments were carried out using a data warehouse in an academic environment, and the results highlight the weaknesses and strengths of each compared system.

10 citations


Posted Content
01 Jan 2008
TL;DR: The objectives are to understand what a data warehouse is, examine the reasons for building one, appreciate the implications of the convergence of Web technologies with those of the data warehouse, and examine the steps for building a Web-enabled data warehouse.
Abstract: In this paper, our objectives are to understand what a data warehouse is, examine the reasons for building one, appreciate the implications of the convergence of Web technologies with those of the data warehouse, and examine the steps for building a Web-enabled data warehouse. The web revolution has propelled the data warehouse onto the main stage, because in many situations the data warehouse must be the engine that controls or analyzes the web experience. In order to step up to this new responsibility, the data warehouse must adjust. The nature of the data warehouse needs to be somewhat different. As a result, our data warehouses are becoming data webhouses. The data warehouse is becoming the infrastructure that supports customer relationship management (CRM), and the data warehouse is being asked to make the customer clickstream available for analysis. This rebirth of data warehousing architecture is called the data webhouse.

8 citations


Book ChapterDOI
01 Jan 2008
TL;DR: Data warehousing has been increasingly recognized as an effective tool for organizations to transform data into useful information for strategic decision-making and to achieve competitive advantages via data warehousing, data warehouse management is crucial.
Abstract: As internal and external demands on information from managers are increasing rapidly, especially the information that is processed to serve managers’ specific needs, regular databases and decision support systems (DSS) cannot provide the information needed. Data warehouses came into existence to meet these needs, consolidating and integrating information from many internal and external sources and arranging it in a meaningful format for making accurate business decisions (Martin, 1997). In the past five years, there has been a significant growth in data warehousing (Hoffer, Prescott, & McFadden, 2005). Correspondingly, this occurrence has brought up the issue of data warehouse administration and management. Data warehousing has been increasingly recognized as an effective tool for organizations to transform data into useful information for strategic decision-making. To achieve competitive advantages via data warehousing, data warehouse management is crucial (Ma, Chou, & Yen, 2000).

7 citations


Proceedings Article
02 May 2008
TL;DR: The authors describe a proposal of the architecture of a Business Intelligence system and the flow of data processing for a university.
Abstract: The traditional users of data warehouses were banks, financial services, or chains of supermarkets. Institutional organizations (e.g. academies), by contrast, did not in the past use their large amounts of transactional data for strategic decision making. The optimal management of a university can now be considered as critical as the management of a big enterprise; in fact, the factors affecting the management of a university are the same as those involved in business processes. The paper describes a proposal of the architecture of a Business Intelligence system and the flow of data processing for our University.

4 citations


Journal Article
TL;DR: The authors present a geoscience spatial data warehouse architecture that conforms to China's national conditions and has five levels, i.e. the data source, spatial ETL, spatial data storage, application service based on SOA, and client application, together with a three-level physical deployment scheme.
Abstract: The authors took the geoscience spatial data warehouse as a scheme of data integration in order to integrate the multi-source, heterogeneous and dispersed geological data of China and provide effective data for resource assessment. They for the first time present a geoscience spatial data warehouse architecture that conforms to China's national conditions and has five levels, i.e. the data source, spatial ETL, spatial data storage, application service based on SOA, and client application. The authors designed a three-level (state, administrative regions and provinces) physical deployment scheme for the geoscience spatial data warehouse system according to the administrative regions of China's geological work and the distribution of data. It can realize the objectives of geoscience data integration. Research results show that this is a complete and feasible geoscience data integration scheme that conforms to the actual situation of geoscience in China.

Book
07 Aug 2008
TL;DR: The proposed data mart based information system has proven to be useful and effective in the particular application domain of clinical research in heart surgery and integrates the current and historical data from all relevant data sources without imposing any considerable operational or liability contract risk for the existing hospital information systems.
Abstract: The proposed data mart based information system has proven to be useful and effective in the particular application domain of clinical research in heart surgery. In contrast to common data warehouse systems, which are focused primarily on administrative, managerial, and executive decision making, the primary objective of the designed and implemented data mart was to provide an ongoing, consolidated and stable research basis. Besides detail-oriented patient data, aggregated data are also incorporated in order to fulfill multiple purposes. Owing to the chosen concept, this technique integrates the current and historical data from all relevant data sources without imposing any considerable operational or liability contract risk for the existing hospital information systems (HIS). In this way, possible resistance from the persons in charge can be minimized and the project-specific goals effectively met. The challenges of isolated data sources, securing a high data quality, data with partial redundancy and consistency, valuable legacy data in special file formats, and privacy protection regulations are met with the proposed data mart architecture. The applicability was demonstrated in several fields, including (i) permitting easy comprehensive medical research, (ii) assessing preoperative risks of adverse surgical outcomes, (iii) gaining insights into historical performance changes, (iv) monitoring surgical results, (v) improving risk estimation, and (vi) generating new knowledge from observational studies. The data mart approach makes it possible to turn redundant data from the electronically available hospital data sources into valuable information. On the one hand, redundancies are used to detect inconsistencies within and across HIS. On the other hand, redundancies are used to derive attributes from several data sources which originally did not contain the desired semantic meaning. Appropriate verification tools help to inspect the extraction and transformation processes in order to ensure a high data quality. Based on the verification data stored during data mart assembly, various aspects can be inspected on the basis of an individual case, a group, or a specific rule. Invalid values or inconsistencies must be corrected in the primary source databases by the health professionals. Because all modifications are automatically transferred to the data mart system in a subsequent cycle, a consolidated and stable research database is achieved throughout the system in a persistent manner. In the past, performing comprehensive observational studies at the Heart Institute Lahr had been extremely time consuming and therefore limited. Several attempts had already been made to extract and combine data from the electronically available data sources. Depending on the desired scientific task, the processes to extract and connect the data were often rebuilt and modified. Consequently, the semantics and the definitions of the research data changed from one study to the next. Additionally, it was very difficult to maintain an overview of all data variants and derived research data sets. With the implementation of the presented data mart system, the most time- and effort-consuming processes in conducting observational studies could be replaced, so the research basis remains stable and leads to reliable results.
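
A small sketch of the verification idea described above: attributes stored redundantly in more than one hospital source are compared case by case, and conflicts are reported for correction in the primary systems. Source names, record layout, and values are hypothetical (Python):

# Hedged sketch: redundant attributes occurring in two hospital sources
# are compared per case; mismatches are flagged so health professionals
# can correct them in the primary databases.

def find_inconsistencies(source_a, source_b, attrs):
    """Yield (case_id, attribute, value_a, value_b) for conflicting values."""
    for case_id, rec_a in source_a.items():
        rec_b = source_b.get(case_id)
        if rec_b is None:
            continue
        for attr in attrs:
            if attr in rec_a and attr in rec_b and rec_a[attr] != rec_b[attr]:
                yield case_id, attr, rec_a[attr], rec_b[attr]

his = {"P001": {"birth_year": 1948, "sex": "M"},
       "P002": {"birth_year": 1955, "sex": "F"}}
surgical_db = {"P001": {"birth_year": 1948, "sex": "M"},
               "P002": {"birth_year": 1959, "sex": "F"}}  # conflicting year

for conflict in find_inconsistencies(his, surgical_db, ["birth_year", "sex"]):
    print("inconsistent:", conflict)   # P002 birth_year 1955 vs 1959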

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter provides an overview of the history of data warehousing with a focus on the first-generation data warehouse, which evolved to include disciplined data ETL from legacy applications in a granular, historical, integrated data warehouse.
Abstract: This chapter provides an overview of the history of data warehousing. Data warehousing has come a long way since the frustrating days when user data was limited to operational application data that was accessible only through an IT department intermediary. Data warehousing has evolved to meet the needs of end users who require integrated, historical, granular, flexible, and accurate information. The first-generation data warehouse evolved to include disciplined data ETL (extract/transform/load) from legacy applications in a granular, historical, integrated data warehouse. With the growing popularity of data warehousing came numerous changes—volumes of data, a spiral development approach, heuristic processing, and more. As the evolution of data warehousing continued, some mutant forms emerged like active data warehousing, federated data warehousing, star schema data warehouses, and data mart data warehouses. While each of these mutant forms of data warehousing has some advantages, they also have introduced a host of new and significant disadvantages. Therefore, the time for the next generation of data warehousing has come.

01 Jan 2008
TL;DR: A suitable use of multidimensional data analysis (MDA) is proposed to investigate the associations characterizing the indicators/attributes of the system, in order to assess the impact of an adopted policy by measuring system performance.
Abstract: The present paper focuses on ex post analysis to assess the impact of an adopted policy by measuring system performance. Since accurate impact assessment requires in-depth knowledge of the structure underlying the system, this contribution proposes a suitable use of multidimensional data analysis (MDA) to investigate the associations characterizing the indicators/attributes of the system. The general aim is to identify homogeneous subsets of objects that are described by subsets of attributes. This approach was planned to study student performance in Italian universities, with a focus on student careers. The example data set is a data mart selected from the University of Macerata database and refers to the students of the Economics Faculty from 2001 to 2007.

17 Oct 2008
TL;DR: Time Histograms with Interactive Selection of Time Unit and Dimension (THISTUD) is an improved time-histogram technique that can be used to improve the usability and efficiency of sales data analysis methods.
Abstract: Many researchers are working on improving the usability and efficiency of the sales data analysis methods required by users – analytical staff and managers. In this paper, we present an improved technique of time histograms. We have named the proposed method Time Histograms with Interactive Selection of Time Unit and Dimension (THISTUD). The modifications performed and the interactive user interface developed are described. The system is tested on a data warehouse that includes ten years of data. The sales data mart created and the visualization results are also presented in the paper.
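
A minimal sketch of the core operation behind such interactive time histograms: re-binning the same timestamped sales records at a user-selected time unit. The record format and the set of units are illustrative assumptions, not the authors' implementation (Python):

# Hedged sketch: counting sales per bucket of a user-chosen time unit,
# the re-binning step an interactive time-histogram tool would perform.
from collections import Counter
from datetime import datetime

def time_histogram(records, unit):
    """Count sales per bucket of the chosen time unit."""
    key = {
        "year":    lambda t: t.strftime("%Y"),
        "month":   lambda t: t.strftime("%Y-%m"),
        "weekday": lambda t: t.strftime("%A"),
        "hour":    lambda t: t.strftime("%H"),
    }[unit]
    return Counter(key(t) for t, _amount in records)

sales = [(datetime(2008, 3, 14, 10), 19.9), (datetime(2008, 3, 15, 11), 5.0),
         (datetime(2008, 4, 2, 10), 7.5)]
print(time_histogram(sales, "month"))  # Counter({'2008-03': 2, '2008-04': 1})
print(time_histogram(sales, "hour"))   # Counter({'10': 2, '11': 1})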

Proceedings Article
06 Nov 2008
TL;DR: Initial design of a "patient-specific" hybrid system (physiological-causal probabilistic) of adaptive diabetes models and insulin treatment algorithms will be presented.
Abstract: Constantly changing diabetes care standards make it challenging to deliver care adapted to the unique condition of the individual patient. The availability of large amounts of data from patients' electronic medical records makes it possible to individualize diabetes management. An initial design of a "patient-specific" hybrid system (physiological-causal probabilistic) of adaptive diabetes models and insulin treatment algorithms will be presented. The system is uniquely derived and tested using a diabetes data mart of about 33,000 patients.

Journal Article
TL;DR: Building a data mart of product design cases for corporations, a knowledge processing method for product design is presented; a new discretization method is put forward, and the accuracy and efficiency of case retrieval are improved.
Abstract: A data mart is a cheap way to provide management analysis for product design knowledge processing. Building a data mart of product design cases for corporations, a knowledge processing method for product design is presented. Designers input queries into a Case-Based Reasoning (CBR) system; On-Line Analytical Processing (OLAP) then drills down and finds similar cases. Knowledge reduction techniques are adopted to reduce the similar cases retrieved by OLAP, which improves CBR. Rough set theory is applied to calculate the importance degree of each feature attribute and to remove redundant ones. To deal with quantitative features, a new discretization method is put forward, which improves the accuracy and efficiency of case retrieval. Finally, an example is presented.
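
A simplified sketch of the rough-set step: each feature attribute is ranked by how much its removal reduces the ability of the remaining attributes to discriminate design cases (a basic dependency-degree idea). The case data and attribute names are invented, and this is much cruder than the paper's method (Python):

# Hedged sketch: rough-set-style attribute significance. An attribute is
# important if dropping it makes previously distinguishable cases collide.

def partitions(cases, attrs):
    """Group case ids by their values on the given attributes."""
    groups = {}
    for cid, (features, _label) in cases.items():
        key = tuple(features[a] for a in attrs)
        groups.setdefault(key, []).append(cid)
    return groups.values()

def dependency(cases, attrs):
    """Fraction of cases whose attribute values determine the case label."""
    consistent = 0
    for group in partitions(cases, attrs):
        labels = {cases[cid][1] for cid in group}
        if len(labels) == 1:
            consistent += len(group)
    return consistent / len(cases)

cases = {  # case id -> (feature values, design-case label)
    1: ({"material": "steel", "size": "L"}, "caseA"),
    2: ({"material": "steel", "size": "S"}, "caseB"),
    3: ({"material": "alloy", "size": "L"}, "caseC"),
    4: ({"material": "alloy", "size": "L"}, "caseC"),
}
all_attrs = ["material", "size"]
for a in all_attrs:
    rest = [x for x in all_attrs if x != a]
    print(a, "significance:",
          dependency(cases, all_attrs) - dependency(cases, rest))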

Patent
07 Feb 2008
TL;DR: In this paper, the authors present methods and systems for assembling, managing, and using a continuous translational data system. But they do not provide a detailed description of the system itself.
Abstract: Provided are methods and systems for assembling, managing, and using a continuous translational data system.

01 Jan 2008
TL;DR: A design research project was undertaken to demonstrate that an Access-based data mart could successfully streamline this report generating process and demonstrate the need to eliminate excessive detail and deliver highly summarized reports.
Abstract: Hospitals and medical centers participate in a physician profiling process. This process is important to ensure that physicians are providing safe care and to comply with regulations. One medical center was struggling with the ongoing generation of physician performance reports that were an important part of the profiling process. A design research project was undertaken to demonstrate that an Access-based data mart could successfully streamline this report generating process. The research also demonstrated the need to eliminate excessive detail and deliver highly summarized reports. In addition, the research provided thorough documentation of the entire data mart development approach. This documentation can serve as a resource for future research and/or for other medical centers that might be struggling to manage the profiling report requirements.
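
A small sketch of the reporting pattern the study argues for: delivering a highly summarized per-physician view rather than detailed rows. SQLite stands in for the Access data mart here, and the table, columns, and figures are invented (Python):

# Hedged sketch: a summarized physician-profiling report built with one
# aggregation query, in the spirit of "eliminate excessive detail".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (physician TEXT, los_days REAL, complication INTEGER);
INSERT INTO cases VALUES
  ('Dr. A', 3.0, 0), ('Dr. A', 5.5, 1), ('Dr. B', 2.0, 0), ('Dr. B', 2.5, 0);
""")
summary = conn.execute("""
    SELECT physician,
           COUNT(*)                    AS n_cases,
           ROUND(AVG(los_days), 1)     AS avg_los,
           ROUND(AVG(complication), 2) AS complication_rate
    FROM cases GROUP BY physician
""").fetchall()
for row in summary:
    print(row)   # e.g. ('Dr. A', 2, 4.3, 0.5)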

Proceedings ArticleDOI
30 Oct 2008
TL;DR: This article describes the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, and shows that the updating of these data marts is straightforward, permitting easy implementation of new external data and the computation of new statistical indices.
Abstract: Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. To address this limitation, we propose building in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we propose building a set of data processing scripts that would deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen) that can be stripped into single genotypes and then grouped into populations. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates up to elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. This article describes the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, and shows that the updating of these data marts is straightforward, permitting easy implementation of new external data and the computation of new statistical indices.
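
A minimal sketch of the stripping-and-grouping step described above: raw genotype strings are split into single alleles and tallied per population before any statistics are computed. The input rows imitate a HapMap-style dump but are invented, as are the sample IDs (Python):

# Hedged sketch: split genotype strings into alleles, group by population,
# then derive a simple allele frequency from the tallies.
from collections import Counter, defaultdict

raw = [  # (sample id, population, genotype at one SNP) - illustrative only
    ("NA12891", "CEU", "AG"), ("NA12892", "CEU", "AA"),
    ("NA18501", "YRI", "GG"), ("NA18502", "YRI", "AG"),
]

def allele_counts_by_population(rows):
    """Split each genotype into alleles and tally them per population."""
    counts = defaultdict(Counter)
    for _sample, pop, genotype in rows:
        counts[pop].update(genotype)   # 'AG' -> alleles 'A' and 'G'
    return counts

counts = allele_counts_by_population(raw)
for pop, tally in counts.items():
    freq_a = tally["A"] / sum(tally.values())
    print(pop, dict(tally), "freq(A) =", round(freq_a, 2))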


Book ChapterDOI
01 Jan 2008
TL;DR: This chapter focuses on topics like data marts, monitoring the DW 2.0 environment, moving data from one data mart to another, what to do about bad data, the speed of the movement of data within DW 2,0, and data warehouse utilities.
Abstract: This chapter focuses on topics like data marts, monitoring the DW 2.0 environment, moving data from one data mart to another, what to do about bad data, the speed of the movement of data within DW 2.0, and data warehouse utilities. DW 2.0 is presented as a representation of the base data that resides at the core of the DW 2.0 enterprise data warehouse. However, there are independent structures that use that data for analytical purposes. The exploration facility is one such structure. Another structure that takes data from DW 2.0 is the data mart. Data marts contain departmental data for the purpose of decision making. There are many reasons for creating a data mart, including that the cost of machine cycles is low, that the end user has control, and that the performance of the DW 2.0 environment is enhanced. When bad data enters the DW 2.0 environment, the source of the bad data should be identified and corrected; a balancing entry can be created, a value may be reset, or actual corrections can be made to the data.