
Showing papers on "Data mart published in 2009"


Journal ArticleDOI
TL;DR: The main concepts and terminology of temporal databases are introduced, and the open research issues are discussed, also in connection with their implementation in commercial tools.
Abstract: Data warehouses are information repositories specialized in supporting decision making. Since the decisional process typically requires an analysis of historical trends, time and its management acquire huge importance. In this paper we consider the variety of issues, often grouped under the term temporal data warehousing, implied by the need for accurately describing how information changes over time in data warehousing systems. We recognize that, with reference to a three-level architecture, these issues can be classified into a few topics, namely: handling data/schema changes in the data warehouse, handling data/schema changes in the data mart, querying temporal data, and designing temporal data warehouses. After introducing the main concepts and terminology of temporal databases, we separately survey these topics. Finally, we discuss the open research issues, also in connection with their implementation in commercial tools.

94 citations




Journal ArticleDOI
TL;DR: The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories.
Abstract: Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes, we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), splitting them into single genotypes, grouping these into populations, and then merging them with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices, from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes at low computational cost, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases in order to build a dedicated data mart. Updating the data structure is straightforward, as is implementing new external data and computing supplementary statistical indices of interest.

72 citations
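As a rough illustration of the kind of genotype-processing script the abstract describes, the sketch below groups single genotypes by SNP and population and computes allele frequencies. The record layout and population labels are invented for the example and are not taken from HapMap, Perlegen, or dbSNP.

```python
from collections import Counter, defaultdict

# Hypothetical raw genotype records: (snp_id, population, genotype).
# The layout is invented for illustration; real repository exports differ.
records = [
    ("rs0001", "POP_A", "AA"), ("rs0001", "POP_A", "AG"), ("rs0001", "POP_A", "GG"),
    ("rs0001", "POP_B", "AA"), ("rs0001", "POP_B", "AA"), ("rs0001", "POP_B", "AG"),
]

def allele_frequencies(records):
    """Group single genotypes by (SNP, population) and estimate allele frequencies."""
    counts = defaultdict(Counter)
    for snp_id, population, genotype in records:
        for allele in genotype:          # split a diploid genotype into its two alleles
            counts[(snp_id, population)][allele] += 1
    freqs = {}
    for key, allele_counts in counts.items():
        total = sum(allele_counts.values())
        freqs[key] = {a: n / total for a, n in allele_counts.items()}
    return freqs

for (snp_id, pop), f in allele_frequencies(records).items():
    print(snp_id, pop, {a: round(p, 3) for a, p in f.items()})
```

More elaborate indices (e.g. differentiation tests between populations) would build on these per-population frequency estimates.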


Patent
27 Feb 2009
TL;DR: A marketer may query a segmentation data mart with a user-defined rule built from parameters selected from fields available in the data mart, and is presented with a count from which the marketer can determine whether the segment will be cost effective for the marketing campaign.
Abstract: Remote segmentation applied to a segmentation data mart allows a marketer to create a personalized email campaign for a selected segment of customers. Segmentation data is collected from a plurality of third party sources, imported and cleansed. The marketer may query a data mart with a user-defined rule created with parameters selected from fields available in the data mart. The marketer submits the query and is presented with a count with which the marketer may determine if the segment will be cost effective for the marketing campaign. If the count is acceptable, the query is saved. Later, when the marketer creates the email message for a particular campaign, s/he assigns the segment to the campaign. When the campaign is released, the query extracts email addresses currently meeting the criteria of the query and uses the addresses for distributing the email.

52 citations
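A minimal sketch of the query-then-count workflow the patent abstract describes is given below. The customer fields, rule format, and operators are invented for illustration and do not reflect the patented system.

```python
import operator

# Hypothetical, simplified segmentation data mart rows; field names are invented.
customers = [
    {"email": "a@example.com", "age": 34, "state": "CA", "last_purchase_days": 12},
    {"email": "b@example.com", "age": 52, "state": "NY", "last_purchase_days": 200},
    {"email": "c@example.com", "age": 29, "state": "CA", "last_purchase_days": 45},
]

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def matches(row, rule):
    """A rule is a list of (field, op, value) conditions, all of which must hold."""
    return all(OPS[op](row[field], value) for field, op, value in rule)

# User-defined rule: Californian customers who purchased in the last 60 days.
rule = [("state", "=", "CA"), ("last_purchase_days", "<", 60)]

count = sum(matches(c, rule) for c in customers)
print("segment size:", count)             # marketer judges cost-effectiveness from this count

if count > 0:                              # if acceptable, extract addresses at campaign release
    addresses = [c["email"] for c in customers if matches(c, rule)]
    print(addresses)
```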


Book ChapterDOI
10 Nov 2009
TL;DR: A novel approach to personalizing OLAP systems at the conceptual level based on the underlying multidimensional model of the data warehouse, a user model and a set of personalization rules is presented.
Abstract: Data warehouses rely on multidimensional models in order to provide decision makers with appropriate structures to intuitively analyze data with OLAP technologies. However, data warehouses may be potentially large, and multidimensional structures become increasingly complex to understand at a glance. Even if a departmental data warehouse (also known as a data mart) is used, these structures would still be too complex. As a consequence, acquiring the required information is more costly than expected, and decision makers using OLAP tools may get frustrated. In this context, current approaches for data warehouse design are focused on deriving a unique OLAP schema for all analysts from their previously stated information requirements, which is not enough to lighten the complexity of the decision making process. To overcome this drawback, we argue for personalizing multidimensional models for OLAP technologies according to the continuously changing user characteristics, context, requirements and behaviour. In this paper, we present a novel approach to personalizing OLAP systems at the conceptual level based on the underlying multidimensional model of the data warehouse, a user model and a set of personalization rules. The great advantage of our approach is that a personalized OLAP schema is provided for each decision maker, contributing to better satisfying their specific analysis needs. Finally, we show the applicability of our approach through a sample scenario based on our CASE tool for data warehouse development.

42 citations
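The toy sketch below illustrates the general idea of combining a multidimensional model, a user model, and personalization rules to derive a per-user schema. The schema, roles, and rules are invented for the example and are not the authors' conceptual model or CASE tool.

```python
# Hypothetical multidimensional schema: a fact with its dimensions and hierarchy levels.
schema = {
    "fact": "Sales",
    "dimensions": {
        "Time":     ["Day", "Month", "Year"],
        "Product":  ["Item", "Category"],
        "Store":    ["Store", "City", "Country"],
        "Customer": ["Customer", "Segment"],
    },
}

# Hypothetical user models; roles and interests are invented for the example.
users = {
    "maria": {"role": "regional_manager", "interests": {"Store", "Time"}},
    "li":    {"role": "category_manager", "interests": {"Product", "Time"}},
}

def personalize(schema, user):
    """Apply simple personalization rules: keep only the dimensions the user cares about,
    and start regional managers at the City level rather than at individual stores."""
    personal = {"fact": schema["fact"], "dimensions": {}}
    for dim, levels in schema["dimensions"].items():
        if dim in user["interests"]:
            if user["role"] == "regional_manager" and dim == "Store":
                levels = [lvl for lvl in levels if lvl != "Store"]   # hide the finest level
            personal["dimensions"][dim] = levels
    return personal

for name, user in users.items():
    print(name, personalize(schema, user))
```

Each decision maker thus sees a smaller schema tailored to their role, which is the effect the paper argues for at the conceptual level.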


Journal ArticleDOI
TL;DR: The SPORE Head and Neck Neoplasm Virtual Biorepository is a robust translational biomedical informatics tool that can facilitate basic science, clinical, and translational research.
Abstract: The Specialized Program of Research Excellence (SPORE) in Head and Neck Cancer neoplasm virtual biorepository is a bioinformatics-supported system that incorporates data from various clinical, pathological, and molecular systems into a single architecture based on a set of common data elements (CDEs) that provides semantic and syntactic interoperability of data sets. The components of this annotation tool include the common data elements (CDEs), which are derived from the College of American Pathologists (CAP) checklist and North American Association of Central Cancer Registries (NAACCR) standards. The Data Entry Tool is a portable and flexible Oracle-based data entry device, an easily mastered web-based tool. The Data Query Tool helps investigators and researchers search de-identified information within the warehouse/resource through a "point and click" interface, enabling only the selected data elements to be copied into a data mart using a multidimensional model built from the warehouse's relational structure. The SPORE Head and Neck Neoplasm Database contains multimodal datasets that are accessible to investigators via an easy-to-use query tool. The database currently holds 6553 cases and 10607 tumor accessions. Among these, there are 965 metastatic, 4227 primary, 1369 recurrent, and 483 new primary cases. Data disclosure is strictly regulated by user authorization. The SPORE Head and Neck Neoplasm Virtual Biorepository is a robust translational biomedical informatics tool that can facilitate basic science, clinical, and translational research. The Data Query Tool acts as a central source providing a mechanism for researchers to efficiently find clinically annotated datasets and biospecimens relevant to their research areas. The tool protects patient privacy by revealing only de-identified data in accordance with regulations and approvals of the IRB and scientific review committee.

14 citations
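A small sketch of the "copy only selected, de-identified elements into a data mart" idea is shown below. The record fields, identifier list, and predicate are hypothetical and are not the actual SPORE CDEs or query tool.

```python
# Hypothetical warehouse records; field names are invented and do not reflect the SPORE CDEs.
cases = [
    {"case_id": 101, "name": "REDACTED-1", "mrn": "12345", "site": "larynx",
     "stage": "II", "status": "primary", "age": 61},
    {"case_id": 102, "name": "REDACTED-2", "mrn": "67890", "site": "tongue",
     "stage": "III", "status": "recurrent", "age": 54},
]

IDENTIFIERS = {"name", "mrn"}          # elements that must never leave the warehouse

def query_datamart(cases, selected_elements, predicate):
    """Copy only the selected, non-identifying elements of matching cases into the result."""
    allowed = [e for e in selected_elements if e not in IDENTIFIERS]
    return [{e: c[e] for e in allowed} for c in cases if predicate(c)]

# A researcher asks for site and stage of recurrent cases; identifiers are withheld.
result = query_datamart(cases, ["case_id", "site", "stage", "mrn"],
                        lambda c: c["status"] == "recurrent")
print(result)
```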


Proceedings ArticleDOI
23 Aug 2009
TL;DR: This research empirically confirms that organizational and operational factors affect the successful implementation of a Data Warehouse and concludes that the high level of correlation between the success factors of the implementation and the success Factors of the system does not exist according to the criteria established by Hinkle.
Abstract: The growth of the Internet and the expansion of global markets have transformed the economies of industrial societies. Services provided by these economies are based on knowledge and information management, implying a changing role for information systems. Information systems provide companies with the communication and the analytical tools to manage business on a global scale. They are the backbone for the new products and services that are provided by knowledge-based economies. Information systems permit businesses to adopt more flexible and decentralized structures [8]. Corporate organizations have implemented data warehouse projects to improve their ability to assess, understand and analyze their business operations [11]. A data warehouse recovers data from multiple operational or transactional systems; the data is integrated and stored, allowing the creation of a new product (information). This study builds on previous research that will help to identify the important elements that permit a better understanding of data warehouse implementation. The research will also discuss previous data warehouse models that were successfully implemented and determine their application in a Puerto Rican corporation. This research empirically confirms that organizational and operational factors affect the successful implementation of a Data Warehouse. It concludes that, according to the criteria established by Hinkle [6], a high level of correlation between the success factors of the implementation and the success factors of the system does not exist.

11 citations


Journal ArticleDOI
TL;DR: An OLAP system can help the telecommunications company to get better insight into its customers' behavior and improve its marketing campaigns and pricing strategies.
Abstract: In order to succeed in the market, telecommunications companies are not competing solely on price. They have to expand their services based on their knowledge of customers' needs gained through the use of call detail records (CDR) and customer demographics. All the data should be stored together in the CDR data mart. The paper covers the topic of its design and development in detail and especially focuses on the conceptual/logical/physical trilogy. Some other design problems are also discussed. An important area is the problem involving time. This is why the implication of time in data warehousing is carefully considered. The CDR data mart provides the platform for Online Analytical Processing (OLAP) analysis. As it is presented in this paper, an OLAP system can help the telecommunications company to get better insight into its customers' behavior and improve its marketing campaigns and pricing strategies.

10 citations
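To make the design discussion concrete, here is a minimal sketch of a CDR-style star schema and one OLAP-style roll-up query, built in an in-memory SQLite database. The table and column names are invented for illustration and are not the schema proposed in the paper.

```python
import sqlite3

# A minimal, hypothetical star schema for a CDR data mart; names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT, tariff_plan TEXT);
CREATE TABLE fact_cdr     (time_id INTEGER, customer_id INTEGER,
                           call_seconds INTEGER, charge REAL);
INSERT INTO dim_time VALUES (1, '2009-03-01', '2009-03', 2009), (2, '2009-04-01', '2009-04', 2009);
INSERT INTO dim_customer VALUES (1, 'residential', 'basic'), (2, 'business', 'premium');
INSERT INTO fact_cdr VALUES (1, 1, 300, 1.2), (1, 2, 1200, 5.0), (2, 2, 600, 2.4);
""")

# OLAP-style roll-up: total call time and charge per month and customer segment.
rows = conn.execute("""
    SELECT t.month, c.segment, SUM(f.call_seconds) AS seconds, SUM(f.charge) AS charge
    FROM fact_cdr f
    JOIN dim_time t     ON f.time_id = t.time_id
    JOIN dim_customer c ON f.customer_id = c.customer_id
    GROUP BY t.month, c.segment
""").fetchall()
print(rows)
```

The time dimension is modeled as an explicit table precisely because, as the abstract notes, the implications of time need careful treatment in the data mart design.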


Book ChapterDOI
01 Jan 2009
TL;DR: The back-end tools of a data Warehouse are pieces of software responsible for the extraction of data from several sources, their cleansing, customization, and insertion into a data warehouse.
Abstract: The back-end tools of a data warehouse are pieces of software responsible for the extraction of data from several sources, their cleansing, customization, and insertion into a data warehouse. They are known under the general term extraction, transformation and loading (ETL) tools. In all the phases of an ETL process (extraction and exportation, transformation and cleaning, and loading), individual issues arise and, along with the problems and constraints that concern the overall ETL process, make its lifecycle a very complex task.

6 citations
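The toy pipeline below illustrates the three ETL phases the chapter names (extraction, transformation and cleaning, loading) end to end. The source rows, cleansing rules, and target table are invented for the example.

```python
import sqlite3

# Hypothetical source rows as they might arrive from an operational system.
source_rows = [
    {"order_id": "1", "amount": " 120.50 ", "country": "us"},
    {"order_id": "2", "amount": "80",       "country": "DE"},
    {"order_id": "2", "amount": "80",       "country": "DE"},   # duplicate to be removed
]

def extract():
    """Extraction phase: read rows from the source system."""
    return list(source_rows)

def transform(rows):
    """Transformation and cleaning: trim, convert types, normalize codes, drop duplicates."""
    seen, cleaned = set(), []
    for r in rows:
        if r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        cleaned.append((int(r["order_id"]), float(r["amount"].strip()), r["country"].upper()))
    return cleaned

def load(rows, conn):
    """Loading phase: insert the cleansed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM fact_orders").fetchall())
```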


Patent
21 Jul 2009
TL;DR: In this paper, the authors present a system in which a medical device selects less than all of its stored information and provides the selected subset of information to a data mart for storage, processing, and/or communication to one or more interested parties.
Abstract: Embodiments of the present invention provide a system in which a medical device selects less than all of its stored information and provides the selected subset of information to a data mart for storage, processing, and/or communication to one or more interested parties. In many embodiments, customers, patients, or even components of the medical device or of the remote patient management system can access selected medical device information (e.g., customers can access medical device information tailored to the care they are providing to one or more patients). In many embodiments, customers can receive such medical device information according to a schedule that best suits their care (or whenever they desire such information, irrespective of a schedule). In many embodiments, providing less than full transmissions to the data mart reduces the strain on medical device batteries.

6 citations
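A rough sketch of the "select less than all stored information" idea follows: recent readings for requested metrics, plus anything flagged as an alert, are packaged for a scheduled upload to the data mart. The record layout and selection criteria are hypothetical and are not taken from the patent.

```python
from datetime import datetime, timedelta

# Hypothetical readings stored on a device; field names are invented for illustration.
stored = [
    {"ts": datetime(2009, 7, 20, 8, 0), "metric": "heart_rate", "value": 72,  "alert": False},
    {"ts": datetime(2009, 7, 21, 8, 0), "metric": "heart_rate", "value": 145, "alert": True},
    {"ts": datetime(2009, 7, 21, 9, 0), "metric": "battery",    "value": 81,  "alert": False},
]

def select_subset(records, since, metrics):
    """Select less than all stored information: only recent records for requested metrics,
    plus anything flagged as an alert, to keep each transmission small."""
    return [r for r in records
            if r["alert"] or (r["ts"] >= since and r["metric"] in metrics)]

# A scheduled daily upload tailored to what the care provider asked for.
payload = select_subset(stored, since=datetime(2009, 7, 21) - timedelta(days=1),
                        metrics={"heart_rate"})
print(f"transmitting {len(payload)} of {len(stored)} stored records to the data mart")
```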


Book ChapterDOI
02 Jun 2009
TL;DR: Experiences with developing and running a life sciences data management system in a productive environment are reported on.
Abstract: The advances in information technology boosted life sciences research towards systems biology which aims at studying complex interactions in biological systems in an integrative way. Steadily improving high-throughput instruments (genome sequencers, mass spectrometers etc.) and analysis software produce large amounts of experimental data. In this paper, we report on experiences with developing and running a life sciences data management system in a productive environment.

Patent
23 Mar 2009
TL;DR: In this article, a data transformation can be constructed that summarizes by the original and by the associated dimensions in feeds in an analytical data mart (cube) that includes all the dimensions.
Abstract: Not all facts in a data warehouse are described by the same set of dimensions. However, there can be associations between the data dimensions and other dimensions. By maintaining a set of relationships that are capable of linking the dimensional keys used in existing data to the keys of an associated dimension, a data transformation can be constructed that summarizes by the original and by the associated dimensions and feeds an analytical data mart (cube) that includes all the dimensions. This cube can then be consolidated and analyzed in a slice-and-dice fashion as though all the dimensions were independent. Data transformed in this manner can be analyzed alongside data from a source that is keyed by all of the dimensions.
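A small sketch of the idea follows: facts keyed only by one dimension are mapped through an association table to an extra dimension and then summarized by both, yielding cube cells keyed by all dimensions. The data, mapping, and allocation weights are invented for the example and are not the patented method.

```python
from collections import defaultdict

# Facts keyed only by product; region is not recorded on these rows (hypothetical data).
facts = [
    {"product": "P1", "amount": 100.0},
    {"product": "P2", "amount": 50.0},
    {"product": "P1", "amount": 25.0},
]

# Association table linking the existing dimension key to an associated dimension,
# with an allocation weight when one product maps to several regions.
product_to_region = {
    "P1": [("North", 0.6), ("South", 0.4)],
    "P2": [("North", 1.0)],
}

def summarize_with_associated_dimension(facts, mapping):
    """Summarize by the original dimension and by the associated dimension,
    producing cube cells keyed by (product, region)."""
    cube = defaultdict(float)
    for f in facts:
        for region, weight in mapping[f["product"]]:
            cube[(f["product"], region)] += f["amount"] * weight
    return dict(cube)

print(summarize_with_associated_dimension(facts, product_to_region))
# {('P1', 'North'): 75.0, ('P1', 'South'): 50.0, ('P2', 'North'): 50.0}
```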


Journal ArticleDOI
TL;DR: This paper attempts to provide an initial concept of a data mining model that is likely to be used in various departments, including libraries, of teaching institutes.
Abstract: Organisations, be they industry, business, or even educational institutes, need to improve their information inventory systems so as to survive in the competitive environment. Organisations have to increase their efficiency and effectiveness in maintaining the cycle of activities, in their planning, decision-making processes, and analytical needs. There are several ways to achieve this goal; one of them is data mining, which is able to make predictions using existing data in their databases in order to forecast future demand. In addition, with data mining they would be able to determine which activity is more important and what trend is prevailing. An information system, based on both World Wide Web technology and a 3-tiered architecture, is proposed herein to meet the above requirements. This paper attempts to provide an initial concept of a data mining model that is likely to be used in various departments, including libraries, of teaching institutes. The initial concepts covered by the paper are the appropriate data warehouse schema, the data mining tasks and techniques that are best suited, and applications. http://dx.doi.org/10.14429/djlit.29.242

Proceedings ArticleDOI
11 Jun 2009
TL;DR: The concepts of data warehouse, data mart, and on-line analytical processing (OLAP) are introduced into the domain of environmental decision support systems, together with a universal OLAP tool whose functions include SQL, classifying statistics, and data visualization.
Abstract: The concepts of data warehouse, data mart, and on-line analytical processing (OLAP) are introduced into the domain of environmental decision support systems. A water environmental data mart (WEDM) is developed with a city's water environmental operation database as the data source and a star schema as the data framework. Two main parts are included in WEDM: the extract-cleanse-transform-load tool, and a universal OLAP tool whose functions include SQL, classifying statistics, and data visualization. The WEDM provides multidimensional, multi-level, integrated, dynamic, and flexible querying and analysis, which was not offered by the previous environmental operation database system. Three examples illustrate how WEDM provides decision support information.
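To illustrate the multidimensional, multi-level querying the abstract mentions, the sketch below rolls the same (invented) water-quality observations up at two different levels of a station/time hierarchy. The measures and dimension levels are hypothetical, not the WEDM schema.

```python
from collections import defaultdict

# Hypothetical water-quality observations with station and time attributes.
observations = [
    {"station": "S1", "river": "R1", "month": "2009-05", "year": 2009, "cod_mg_l": 18.0},
    {"station": "S2", "river": "R1", "month": "2009-05", "year": 2009, "cod_mg_l": 25.0},
    {"station": "S3", "river": "R2", "month": "2009-06", "year": 2009, "cod_mg_l": 12.0},
]

def rollup(rows, group_by, measure):
    """Aggregate a measure at an arbitrary combination of dimension levels (mean per group)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        key = tuple(r[level] for level in group_by)
        sums[key] += r[measure]
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# The same data analyzed at two different levels of the hierarchy.
print(rollup(observations, ["river", "month"], "cod_mg_l"))  # fine-grained
print(rollup(observations, ["year"], "cod_mg_l"))            # rolled up to the year level
```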


Book
15 Jul 2009
TL;DR: This book is a practical tutorial for Analysis Services that shows readers how to solve problems commonly encountered while designing cubes, and explains which features of Analysis Services work well and which should be avoided.
Abstract: Design and implement fast, scalable and maintainable cubes. A real-world guide to designing cubes with Analysis Services 2008: model dimensions and measure groups in BI Development Studio; implement security, drill-through, and MDX calculations; learn how to deploy, monitor, and performance-tune your cube; filled with best practices and useful hints and tips.
In Detail: Microsoft's SQL Server Analysis Services 2008 is an OLAP server that allows users to analyze business data quickly and easily. However, designing cubes in Analysis Services can be a complex task: it's all too easy to make mistakes early on in development that lead to serious problems when the cube is in production. Learning the best practices for cube design before you start your project will help you avoid these problems and ensure that your project is a success. This book offers practical advice on how to go about designing and building fast, scalable, and maintainable cubes that will meet your users' requirements and help make your Business Intelligence project a success. This book gives readers insight into the best practices for designing and building Microsoft Analysis Services 2008 cubes. It also provides details about server architecture, performance tuning, security, and administration of an Analysis Services solution. In this book, you will learn how to design and implement Analysis Services cubes. Starting from designing a data mart for Analysis Services, through the creation of dimensions and measure groups, to putting the cube into production, we'll explore the whole of the development lifecycle. This book is an invaluable guide for anyone who is planning to use Microsoft Analysis Services 2008 in a Business Intelligence project.
What you will learn from this book: build a data mart suitable for use with Analysis Services; create and configure an Analysis Services project in Business Intelligence Development Studio; use the Dimension Wizard and the Dimension Editor to build dimensions; create measure groups and associate them with dimensions; add calculations to the cube, including implementing currency conversion and a date tool dimension; explore the security model, including dimension security and cell security, and implement dynamic security; tune queries to get the best possible performance; automate processing and partition creation; monitor your cube to see who's actually using it.
Approach: This is a practical tutorial for Analysis Services that shows readers how to solve problems commonly encountered while designing cubes, and explains which features of Analysis Services work well and which should be avoided. The book walks through the whole cube development lifecycle, from building dimensions, cubes and calculations to tuning and moving the cube into production.
Who this book is written for: This book is aimed at Analysis Services developers who already have some experience but who want to go into more detail on advanced topics, and who want to learn best practices for cube design.

Book ChapterDOI
TL;DR: This paper reviews data warehouse appliances by surveying thirteen products offered today, assessing the common characteristics among them and proposing a classification for DA offerings, in the hope it will help define a useful benchmark for DAs.
Abstract: The success of Business Intelligence (BI) applications depends on two factors, the ability to analyze data ever more quickly and the ability to handle ever increasing volumes of data. Data Warehouse (DW) and Data Mart (DM) installations that support BI applications have historically been built using traditional architectures either designed from the ground up or based on customized reference system designs. The advent of Data Warehouse Appliances (DA) brings packaged software and hardware solutions that address performance and scalability requirements for certain market segments. The differences between DAs and custom installations make direct comparisons between them impractical and suggest the need for a targeted DA benchmark. In this paper we review data warehouse appliances by surveying thirteen products offered today. We assess the common characteristics among them and propose a classification for DA offerings. We hope our results will help define a useful benchmark for DAs.

Journal ArticleDOI
TL;DR: The idea of using statistical methods to model federated data marts is presented; advantages include quick query responses without accessing external servers, user-defined accuracy of the approximate query answers, and a network-efficient method for periodic updates.
Abstract: Global businesses deal with large amounts of business data stored in potentially hundreds of distributed systems. It is challenging to allow end-users to issue online analytical processing (OLAP) queries that retrieve suitable information over a worldwide network. This article presents the idea of using statistical methods to model federated data marts. Once data marts are modelled, reduced sets of distributed data can be imported and used to approximately reconstruct a federated data mart. Approximate queries can then be answered from the reconstructed federated data mart. Advantages of this design include: quick query responses without accessing external servers; user-defined accuracy of the approximate query answers; and a network-efficient method for periodic updates. A proof of concept is presented using large data sets used for marketing analysis purposes.
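The sketch below illustrates the general approach of answering a query approximately from a reduced statistical summary instead of the remote detail data. The summary used here (a coarse histogram) and the data are invented for the example; the paper's actual statistical models may differ.

```python
import random

random.seed(0)

# Pretend this detailed fact table lives in a remote data mart and is too big to ship.
remote_sales = [random.gauss(100, 20) for _ in range(10_000)]

def build_reduced_model(values, bins=20):
    """Reduce the remote data to a small summary: (bin centre, count) pairs."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    return list(zip(centres, counts))

def approximate_sum(model, predicate):
    """Answer SUM(value) WHERE predicate(value) from the reduced model only."""
    return sum(centre * count for centre, count in model if predicate(centre))

model = build_reduced_model(remote_sales)            # this is all that is imported locally
exact = sum(v for v in remote_sales if v > 120)
approx = approximate_sum(model, lambda v: v > 120)
print(f"exact={exact:.0f} approx={approx:.0f}")
```

The accuracy of the answer can be traded against the size of the imported summary (here, the number of bins), which is the kind of user-defined accuracy the abstract mentions.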


Proceedings ArticleDOI
25 Jul 2009
TL;DR: This paper describes a data modeling method for the prediction system of specialty setting, considering both the requirements and the reality, based on the data warehouse theory and technology.
Abstract: It is important to set up a basic database that can support a nationwide prediction system of specialty setting. Given the challenges of massive and multi-source data, the key question we must solve is how to satisfy users' complex query demands at different levels and from different angles. Based on data warehouse theory and technology, we have conducted an in-depth study of the data modeling of the data mart in the prediction system of specialty setting. In this paper, we describe a data modeling method for the prediction system of specialty setting, considering both the requirements and the reality. The study provides a foundation for further research on the specialty setting prediction system.

Patent
30 Apr 2009
TL;DR: In this article, a post marketing surveillance methodology using a data mart is provided to present data mart construction methodology for the PMS(Postmarketing Surveillance) to analyze the correlation and the frequency of the side effect according to the use of the corresponding medicine.
Abstract: A development of a post marketing surveillance methodology using a data mart is provided to present data mart construction methodology for the PMS(Postmarketing Surveillance) to analyze the correlation and the frequency of the side effect according to the use of the corresponding medicine. The target patients are extracted from the bulk hospital database to search medical records of patients who has taken the electrocardiogram(ECG) among the patients who have prescribed a medicine, for example levofloxacin, which is known as causing abnormal reaction to cardiovascular in a non-clinical test. The validity of data and the quality of value are strengthened through the quality control of data and quality control of ERD.

Journal Article
TL;DR: A uniform and standard ocean data system architecture is proposed for effectively managing ocean data from the national and provincial levels to the city level, fulfilling the requirements of the foundation data platform of "digital ocean".
Abstract: The characteristics of polymorphism, diversity, multiple sources and great volume make ocean data quite different from other data. How to store and manage ocean data more effectively and reasonably is the key to constructing the ocean integrated management system and the "digital ocean" prototype system, and it is also a pressing problem. This paper organizes the various ocean data on resources, environment, economy, management, etc., proposes a uniform and standard ocean data system architecture for effectively managing ocean data from the national and provincial levels to the city level, and fulfills the requirements of the foundation data platform of "digital ocean".

Proceedings ArticleDOI
22 Jun 2009
TL;DR: This paper presents a strategy for and the development of a warehouse model for the Students' Nourishment Information System in Croatia; the main goal remains the same - fast and simple data access.
Abstract: There are two basic design models that can be used for building a data warehouse: the bottom-up and the top-down model. The choice between them depends on how data are organized in the transactional database and on whether faster or cheaper data warehouse development is preferred. Depending on the choice, the solution has to be fit for purpose with respect to data usage and the analytical operations that will later be applied to the warehouse data. The main goal remains the same - fast and simple data access. This paper presents a strategy for and the development of a warehouse model for the Students' Nourishment Information System in Croatia.

01 Jan 2009
TL;DR: The Emergo Project assesses Psychology students using a data mart with multiple-choice questions from national exams and students' answers, identifying patterns in the evolution of correct answers across semesters enrolled.
Abstract: National-level, objective assessment in higher education has been a practice in Brazil since 1996, surviving political shifts that frequently dismantle public policies. This paper presents the Emergo Project – the assessment of Psychology students using a data mart with multiple-choice questions from national exams and students' answers. We run two annual examinations, giving individual feedback and discussing aggregate results with faculty and students. We identified patterns in the evolution of correct answers across semesters enrolled – Growing, Decreasing, Peak, Constant, and Other. Actual results in the national exam suggest that the feedback and discussions might have helped achieve superior performance standards.
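As a loose illustration of classifying the evolution of correct answers into the patterns named in the abstract, the sketch below labels a sequence of per-semester correct-answer rates. The thresholds and rules are invented and are not the project's actual criteria.

```python
def classify_pattern(rates, tol=0.03):
    """Classify the evolution of correct-answer rates across semesters.
    Thresholds and rules are illustrative, not the Emergo Project's actual criteria."""
    diffs = [b - a for a, b in zip(rates, rates[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "Constant"
    if all(d >= -tol for d in diffs) and any(d > tol for d in diffs):
        return "Growing"
    if all(d <= tol for d in diffs) and any(d < -tol for d in diffs):
        return "Decreasing"
    peak = rates.index(max(rates))
    if 0 < peak < len(rates) - 1 and max(rates) - min(rates) > tol:
        return "Peak"
    return "Other"

# Hypothetical proportions of correct answers across five semesters.
print(classify_pattern([0.42, 0.45, 0.50, 0.55, 0.61]))   # Growing
print(classify_pattern([0.60, 0.58, 0.52, 0.49, 0.45]))   # Decreasing
print(classify_pattern([0.45, 0.60, 0.47, 0.44, 0.43]))   # Peak
```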

01 Dec 2009
TL;DR: In this paper, an attempt is made to design a data mart for utilizing micro-data through data analysis, and guidelines are provided for the development of a statistics information service for research and development.
Abstract: The survey of research and development is conducted to estimate the national status of science and technology, and the micro-data obtained from the survey have been used to generate indicators supplied in the form of printed materials. However, the survey micro-data have not been managed in a systematic way, so end users could not acquire and manipulate statistical information for their own purposes. In this paper, an attempt is made to design a data mart for utilizing micro-data through data analysis, and guidelines are provided for the development of a statistics information service for research and development.

Book ChapterDOI
06 May 2009
TL;DR: This work proposes a minable data warehouse that integrates the preprocessing stage in a data mining technique within the cleansing and transformation process in a Data Warehouse, and presents a proposed framework using a synthetically generated dataset and a classical datamining technique called Apriori to discover association rules within instant messaging datasets.
Abstract: Data warehouses have been widely used in various capacities such as large corporations or public institutions. These systems contain large and rich datasets that are often used by several data mining techniques to discover interesting patterns. However, before data mining techniques can be applied to data warehouses, arduous and convoluted preprocessing techniques must be completed. Thus, we propose a minable data warehouse that integrates the preprocessing stage in a data mining technique within the cleansing and transformation process in a data warehouse. This framework will allow data mining techniques to be computed without any additional preprocessing steps. We present our proposed framework using a synthetically generated dataset and a classical data mining technique called Apriori to discover association rules within instant messaging datasets.