Topic

Data mart

About: Data mart is a research topic. Over its lifetime, 559 publications have been published within this topic, receiving 8,550 citations.


Papers
Patent
05 Apr 2004
TL;DR: A data analysis workbench enables a user to define a data analysis process comprising an extract sub-process that obtains transactional data from a source system, a load sub-process that provides the extracted data to a data warehouse or data mart, a data mining analysis sub-process that operates on the extracted transactional data, and a deployment sub-process that makes the data mining results accessible to another computer program.
Abstract: A data analysis workbench enables a user to define a data analysis process that includes an extract sub-process to obtain transactional data from a source system, a load sub-process for providing the extracted data to a data warehouse or data mart, a data mining analysis sub-process to use the obtained transactional data, and a deployment sub-process to make the data mining results accessible by another computer program. Common settings used by each of the sub-processes are defined, as are specialized settings relevant to each of the sub-processes. The invention also enables a user to define an order in which the defined sub-processes are to be executed.
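A minimal sketch of the sub-process pattern the abstract describes, assuming hypothetical class and method names (the patent does not specify an API): common settings shared across sub-processes, specialized settings per sub-process, and a user-defined execution order.

```python
# Hypothetical sketch of the workbench's sub-process pattern;
# all names are illustrative, not taken from the patent.

class SubProcess:
    """Base class: merges common settings with specialized ones."""
    def __init__(self, common, specialized=None):
        self.settings = {**common, **(specialized or {})}

    def run(self, data=None):
        raise NotImplementedError


class Extract(SubProcess):
    def run(self, data=None):
        # Obtain transactional data from the source system.
        return [{"order_id": 1, "amount": 9.99}]  # placeholder rows


class Load(SubProcess):
    def run(self, data):
        # Provide the extracted data to a data warehouse or data mart.
        print(f"loading {len(data)} rows into {self.settings['target']}")
        return data


class Mine(SubProcess):
    def run(self, data):
        # Data mining analysis over the loaded transactional data.
        return {"row_count": len(data)}


class Deploy(SubProcess):
    def run(self, results):
        # Make the mining results accessible to another program.
        print(f"publishing results: {results}")
        return results


common = {"source": "erp_db", "target": "sales_mart"}
# The user defines the order in which the sub-processes execute.
pipeline = [Extract(common), Load(common), Mine(common), Deploy(common)]

data = None
for step in pipeline:
    data = step.run(data)
```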

38 citations

Book ChapterDOI
24 Jun 2004
TL;DR: Comparison against a physician's manual results indicates that the proposed system can be used for non-critical case-finding applications, such as finding appropriate patients for clinical trials.
Abstract: This paper presents a novel information retrieval system designed specifically for medical case-finding applications. The proposed system begins by extracting medical information from free-text narrative reports and storing it in a predefined relational clinical data mart. The extraction is performed using a medical thesaurus and regular expression pattern matching. Following the extraction phase, inclusion/exclusion criteria are provided to the system through a physician-friendly user interface. The system converts the entered criteria into a single SQL command, which can then be executed on the relational data mart. To achieve the response time required for on-line analysis, the system implements several caching mechanisms. The proposed system has been evaluated on a real-world database, and its performance has been compared to the results obtained manually by a physician. The comparison indicates that the proposed system can be used for non-critical case-finding applications such as finding appropriate patients for clinical trials.
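A hedged sketch of the two steps the abstract names: extraction by regular-expression pattern match against a thesaurus, and conversion of inclusion/exclusion criteria into a single SQL query. The table and column names are assumptions for illustration, not taken from the paper.

```python
import re

# Toy extraction: pull diagnosis terms from a free-text narrative
# report using a tiny "thesaurus" of known terms plus a regex match.
THESAURUS = {"diabetes mellitus": "DM", "hypertension": "HTN"}
PATTERN = re.compile("|".join(map(re.escape, THESAURUS)), re.IGNORECASE)

def extract_codes(report: str) -> list[str]:
    return [THESAURUS[m.group(0).lower()] for m in PATTERN.finditer(report)]

# Toy criteria-to-SQL conversion: inclusion terms become IN (...),
# exclusion terms become NOT IN (...), over an assumed clinical data
# mart table named `diagnoses`.
def criteria_to_sql(include: list[str], exclude: list[str]) -> str:
    inc = ", ".join(f"'{c}'" for c in include)
    exc = ", ".join(f"'{c}'" for c in exclude)
    sql = f"SELECT patient_id FROM diagnoses WHERE code IN ({inc})"
    if exclude:
        sql += (" AND patient_id NOT IN (SELECT patient_id FROM diagnoses"
                f" WHERE code IN ({exc}))")
    return sql

print(extract_codes("History of Diabetes Mellitus, no hypertension."))
print(criteria_to_sql(include=["DM"], exclude=["HTN"]))
```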

38 citations

Book
15 Sep 2015
TL;DR: "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to creating a technical data warehouse layer.
Abstract: The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations for creating a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (the data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience, and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss how to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes; important data warehouse technologies and practices; and Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture. The book provides a complete introduction to data warehousing, applications, and the business context so readers can get up and running fast; explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse; demystifies Data Vault modeling with beginning, intermediate, and advanced techniques; and discusses the advantages of the Data Vault approach over other techniques, including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0.
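For readers unfamiliar with the Data Vault structures the book covers, here is a minimal illustrative sketch of the hub/satellite split: a hub holds the business key, a satellite holds the descriptive attributes. The table layouts, field names, and hashing choice are invented for illustration, not taken from the book.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(business_key: str) -> str:
    # Deterministic surrogate key derived from the business key.
    return hashlib.md5(business_key.encode()).hexdigest()

def to_hub_and_satellite(record: dict, business_key_field: str):
    """Split a staged record into a hub row and a satellite row."""
    now = datetime.now(timezone.utc).isoformat()
    bk = record[business_key_field]
    hub_row = {"hash_key": hash_key(bk), business_key_field: bk,
               "load_date": now, "record_source": "stage"}
    sat_row = {"hash_key": hub_row["hash_key"], "load_date": now,
               **{k: v for k, v in record.items()
                  if k != business_key_field}}
    return hub_row, sat_row

hub, sat = to_hub_and_satellite(
    {"customer_no": "C-1001", "name": "Acme", "city": "Berlin"},
    business_key_field="customer_no",
)
print(hub)
print(sat)
```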

37 citations

Patent
25 Jun 2001
TL;DR: A method and system for performing real-time transformations of dynamically growing databases is described, in which a session identified as a real-time session is initialized and repeatedly executes a persistent data transport pipeline.
Abstract: A method and system for performing real-time transformations of dynamically increasing databases are described. A session, identified as a real-time session, is initialized. The real-time session repeatedly executes a persistent (e.g., continually running) data transport pipeline of the analytic application. The data transport pipeline extracts data from a changing database, transforms the data, and writes the transformed data to storage (e.g., a data warehouse or data mart). The data transport pipeline is executed at the end of each time interval in a plurality of contiguous time intervals occurring during the real-time session. The data transport pipeline remains running after it is executed, until the real-time session is completed. Accordingly, new data are transformed in a timely manner, and processing resources are not consumed by repeatedly re-establishing (re-initializing) the data transport pipeline.
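A minimal sketch of the execution model the patent describes: the pipeline object is built once (so initialization cost is paid once), then re-executed at the end of each contiguous time interval until the session completes. All names here are hypothetical.

```python
import time

class DataTransportPipeline:
    """Persistent pipeline: initialized once, re-executed per interval."""
    def __init__(self):
        self.last_seen = 0  # high-water mark into the changing source

    def execute(self, source: list[dict]) -> list[dict]:
        # Extract only the new rows, transform them, "write" to the mart.
        new_rows = source[self.last_seen:]
        self.last_seen = len(source)
        transformed = [{**row, "loaded": True} for row in new_rows]
        print(f"wrote {len(transformed)} new rows to the data mart")
        return transformed

source_table = []                   # stands in for the growing database
pipeline = DataTransportPipeline()  # initialized once, stays running

for interval in range(3):           # three contiguous session intervals
    source_table.append({"id": interval})  # new data arrives
    time.sleep(0.1)                 # end of the time interval
    pipeline.execute(source_table)  # pipeline runs but is not re-built
```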

36 citations

Journal ArticleDOI
01 Mar 2012
TL;DR: This work argues for considering spatiality as a personalization feature within a formal design process, so that each decision maker can access their own personalized SMD schema with the required spatial structures and instances, ready to be analyzed at a glance.
Abstract: Spatial data warehouses (SDW) rely on extended multidimensional (MD) models in order to provide decision makers with appropriate structures to intuitively explore spatial data using different analysis techniques such as OLAP (On-Line Analytical Processing) or data mining. Current development approaches focus on defining a unique and static spatial multidimensional (SMD) schema at the conceptual level, over which all decision makers fulfill their spatial information needs. However, accounting for the spatiality required by each decision maker is likely to result in a potentially misleading SMD schema (even if a departmental DW or data mart is being defined). Furthermore, the spatial needs of each decision maker may change over time or depending on the context, requiring the SMD schema to be continuously updated with changes that can hamper decision making. Therefore, if a unique and static SMD schema is designed, acquiring the required spatial information is more costly than expected for decision makers, and they may become frustrated during the analysis. To overcome these drawbacks, we argue for considering spatiality as a personalization feature within a formal design process. In this way, each decision maker is able to access their own personalized SMD schema with the required spatial structures and instances, ready to be analyzed at a glance. Our approach considers several novel artifacts: (i) a UML profile for spatial multidimensional modeling at the conceptual level; (ii) a spatial-aware user model to define decision-maker profiles; and (iii) a spatial personalization language to express the spatial needs of decision makers as personalization rules. The definition of personalized SMD schemas by means of these artifacts is formally specified using the Software Process Engineering Metamodel (SPEM) standard. Finally, the applicability of our approach is shown through a running example based on our Eclipse-based tool for SDW development.
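A toy sketch of the personalization idea: per-decision-maker rules filter the spatial elements of a shared SMD schema into a personalized one. The schema and rule representations below are invented for illustration; the paper defines them with a UML profile and SPEM, not Python.

```python
# Shared SMD schema: measures plus dimensions, some of them spatial.
SMD_SCHEMA = {
    "measures": ["sales_amount"],
    "dimensions": {
        "store":  {"spatial": True,  "geometry": "point"},
        "region": {"spatial": True,  "geometry": "polygon"},
        "time":   {"spatial": False, "geometry": None},
    },
}

# One rule set per decision maker: which spatial geometries they analyze.
USER_RULES = {
    "regional_manager":  {"keep_geometries": {"polygon"}},
    "logistics_planner": {"keep_geometries": {"point", "polygon"}},
}

def personalize(schema: dict, user: str) -> dict:
    """Derive a personalized SMD schema from the user's rules."""
    keep = USER_RULES[user]["keep_geometries"]
    dims = {name: d for name, d in schema["dimensions"].items()
            if not d["spatial"] or d["geometry"] in keep}
    return {"measures": schema["measures"], "dimensions": dims}

# The regional manager's schema keeps only polygon-based spatial dims.
print(personalize(SMD_SCHEMA, "regional_manager"))
```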

35 citations


Network Information
Related Topics (5)

Topic                     Papers     Citations    Related
Information system        107.5K     1.8M         77%
The Internet              213.2K     3.8M         72%
Scheduling (computing)    78.6K      1.3M         72%
Cloud computing           156.4K     1.9M         71%
Software                  130.5K     2M           70%
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2021    13
2020    20
2019    26
2018    23
2017    26
2016    27