
Showing papers on "Data warehouse" published in 2010


Journal Article
TL;DR: Data mining is the search for new, valuable, and nontrivial information in large volumes of data, a cooperative effort of humans and computers. Data-mining activities can be placed into one of two categories: predictive data mining, which produces a model of the system described by the given data set, or descriptive data mining, which produces new, nontrivial information based on the available data set.
Abstract: Understand the need for analyses of large, complex, information-rich data sets. Identify the goals and primary tasks of the data-mining process. Describe the roots of data-mining technology. Recognize the iterative character of a data-mining process and specify its basic steps. Explain the influence of data quality on a data-mining process. Establish the relation between data warehousing and data mining. Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers. In practice, the two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns describing the data that can be interpreted by humans. Therefore, it is possible to put data-mining activities into one of two categories: Predictive data mining, which produces the model of the system described by the given data set, or Descriptive data mining, which produces new, nontrivial information based on the available data set.
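
As a hedged illustration of the two categories described above, the sketch below fits a predictive model and a descriptive clustering on a synthetic data set using scikit-learn; the data, model choices, and thresholds are assumptions, not taken from the text.

```python
# Minimal sketch of predictive vs. descriptive data mining on assumed toy data.
# Requires numpy and scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier   # predictive: model a target
from sklearn.cluster import KMeans                # descriptive: find patterns

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                     # two numeric attributes
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # known outcome for training

# Predictive data mining: build a model of the system described by the data,
# then use it to predict unknown/future values of the variable of interest.
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print("predicted outcome for a new record:", clf.predict([[0.5, -0.2]])[0])

# Descriptive data mining: derive new, human-interpretable information
# (here, natural groupings) from the available data set alone.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```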

4,646 citations


Proceedings ArticleDOI
01 Mar 2010
TL;DR: Hive is presented, an open-source data warehousing solution built on top of Hadoop that supports queries expressed in a SQL-like declarative language - HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop.
Abstract: The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [1] is a popular open-source map-reduce implementation which is being used in companies like Yahoo, Facebook etc. to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language - HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop. In addition, HiveQL enables users to plug in custom map-reduce scripts into queries. The language includes a type system with support for tables containing primitive types, collections like arrays and maps, and nested compositions of the same. The underlying IO libraries can be extended to query data in custom formats. Hive also includes a system catalog - Metastore - that contains schemas and statistics, which are useful in data exploration, query optimization and query compilation. In Facebook, the Hive warehouse contains tens of thousands of tables and stores over 700TB of data and is being used extensively for both reporting and ad-hoc analyses by more than 200 users per month.
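
To give a feel for the kind of queries the abstract describes, here is an illustrative snippet of HiveQL held in a Python string; the table, columns, and transform script are hypothetical, and the snippet only prints the text rather than submitting it to a Hive deployment.

```python
# Illustrative HiveQL only; table names, columns and the transform script are
# assumptions, not taken from the paper or from any real warehouse.
hiveql = """
CREATE TABLE page_views (user_id BIGINT, url STRING, ts STRING)
PARTITIONED BY (dt STRING);

-- A SQL-like aggregation; Hive compiles this into map-reduce jobs on Hadoop.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE dt = '2010-03-01'
GROUP BY url;

-- HiveQL also lets users plug custom map-reduce scripts into a query.
SELECT TRANSFORM (user_id, url)
USING 'python my_mapper.py'        -- hypothetical user script
AS (user_id, normalized_url)
FROM page_views;
"""
print(hiveql)
```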

959 citations


Proceedings ArticleDOI
06 Jun 2010
TL;DR: This paper presents how Scribe, Hadoop and Hive together form the cornerstones of the log collection, storage and analytics infrastructure at Facebook, enabling a data warehouse that stores more than 15PB of data and loads more than 60TB of new data every day.
Abstract: Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis of data and creation of business intelligence dashboards by analysts across the company, a number of Facebook's site features are also based on analyzing large data sets. These features range from simple reporting applications like Insights for the Facebook Advertisers, to more advanced kinds such as friend recommendations. In order to support this diversity of use cases on the ever-increasing amount of data, a flexible infrastructure that scales up in a cost-effective manner is critical. We have leveraged, authored and contributed to a number of open source technologies in order to address these requirements at Facebook. These include Scribe, Hadoop and Hive, which together form the cornerstones of the log collection, storage and analytics infrastructure at Facebook. In this paper we present how these systems have come together and enabled us to implement a data warehouse that stores more than 15PB of data (2.5PB after compression) and loads more than 60TB of new data (10TB after compression) every day. We discuss the motivations behind our design choices, the capabilities of this solution, the challenges that we face in day-to-day operations, and future capabilities and improvements that we are working on.
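
A quick back-of-the-envelope check of the figures quoted above (15PB raw vs. 2.5PB compressed, 60TB vs. 10TB per day) shows the implied compression ratio:

```python
# Back-of-the-envelope check of the ratios implied by the figures in the
# abstract (15PB -> 2.5PB total, 60TB -> 10TB per day).
total_raw_pb, total_compressed_pb = 15, 2.5
daily_raw_tb, daily_compressed_tb = 60, 10

print(f"historical store: {total_raw_pb / total_compressed_pb:.1f}x compression")
print(f"daily load:       {daily_raw_tb / daily_compressed_tb:.1f}x compression")
# At ~10TB/day of compressed data, a year of loads adds roughly:
print(f"~{daily_compressed_tb * 365 / 1000:.1f}PB of compressed data per year")
```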

455 citations


Patent
06 Aug 2010
TL;DR: In this patent, a system identifies a second account identifier of a user from a second user identifier, based on the mapping data between first user identifiers and first account identifiers, to facilitate targeted advertising using the profile of the user and/or to provide information about certain transactions of the user related to a previously presented advertisement.
Abstract: In one aspect, a system includes a transaction handler to process transactions, a data warehouse to store transaction data recording the transactions processed at the transaction handler and to store mapping data between first user identifiers and first account identifiers, a profile generator to generate a profile of a user based on the transaction data, and a portal coupled to the transaction handler to receive a query identifying a second user identifier used by the first tracker to track online activities of a user. The system is to identify a second account identifier of the user from the second user identifier based on the mapping data between the first user identifiers and the first account identifiers to facilitate targeted advertising using the profile of the user and/or to provide information about certain transactions of the user related to a previously presented advertisement.
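
The mapping described in the claim is essentially a chained lookup from a tracker-side identifier to an account identifier to a profile; a minimal sketch, with entirely hypothetical identifiers and field names:

```python
# Minimal sketch of the identifier-mapping step described in the claim.
# All identifiers and profile fields are hypothetical.
first_user_to_account = {          # mapping data stored in the data warehouse
    "user-123": "acct-9001",
    "user-456": "acct-9002",
}
tracker_to_first_user = {          # links the tracker's (second) identifier
    "cookie-abc": "user-123",      # to the first user identifier
}
profiles = {"acct-9001": {"top_category": "travel"}}  # built from transactions

def resolve_profile(second_user_id: str):
    """Resolve tracker id -> first user id -> account id -> profile."""
    first_user_id = tracker_to_first_user.get(second_user_id)
    account_id = first_user_to_account.get(first_user_id)
    return profiles.get(account_id)

print(resolve_profile("cookie-abc"))   # -> {'top_category': 'travel'}
```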

390 citations


Patent
06 Aug 2010
TL;DR: In this patent, a data warehouse is coupled with a portal and a score evaluator to determine a second value for a first propensity score based on transaction data recording payment transactions of at least one user identified by user data.
Abstract: In one aspect, a computing apparatus includes: a transaction handler to process transactions; a data warehouse to store transaction data recording the transactions processed at the transaction handler; and a portal to receive a request from a client device over a network, the request including user data identifying at least one user. The client device has activity data recording activities of the user, and has the capability to determine from the activity data a first value for a first propensity score of the user. The computing apparatus further includes a score evaluator coupled to the data warehouse and the portal to determine a second value for the first propensity score based on transaction data recording payment transactions of the at least one user identified by the user data. The portal is configured to provide information based on the second value in response to the request.

270 citations


Journal ArticleDOI
TL;DR: The CDW platform would be a promising infrastructure to make full use of the TCM clinical data for scientific hypothesis generation, and promote the development of TCM from individualized empirical knowledge to large-scale evidence-based medicine.

210 citations


Book
01 Mar 2010
TL;DR: A complete and comprehensive handbook for the application of data mining techniques in marketing and customer relationship management that combines a technical and a business perspective, bridging the gap between data mining and its use in marketing.
Abstract: A complete and comprehensive handbook for the application of data mining techniques in marketing and customer relationship management. It combines a technical and a business perspective, bridging the gap between data mining and its use in marketing. It guides readers through all the phases of the data mining process, presenting a solid data mining methodology, data mining best practices and recommendations for the use of the data mining results for effective marketing. It answers the crucial question of 'what data to use' by proposing mining data marts and full lists of KPIs for all major industries. Data mining algorithms are presented in a simple and comprehensive way for business users along with real-world application examples from all major industries. The book is mainly addressed to marketers, business analysts and data mining practitioners who are looking for a how-to guide on data mining. It presents the authors' knowledge and experience from the "data mining trenches", revealing the secrets for data mining success.

184 citations


Patent
18 Aug 2010
TL;DR: In this patent, social media is aggregated from a plurality of social media websites, analyzed for sentiment, categorized by topic and user demographics, and archived in a data warehouse; various interfaces are provided to query and generate reports on the archived data.
Abstract: Systems and methods are provided to collect, analyze and report social media aggregated from a plurality of social media websites. Social media is retrieved from social media websites, analyzed for sentiment, and categorized by topic and user demographics. The data is then archived in a data warehouse and various interfaces are provided to query and generate reports on the archived data. In some embodiments, the system further recognizes alert conditions and sends alerts to interested users. In some embodiments, the system further recognizes situations where users can be influenced to view a company or its products in a more favorable light, and automatically posts responsive social media to one or more social media websites.
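
A toy sketch of the collect, analyze, and archive flow described above; the keyword-based sentiment rule, the topic assignment, and the in-memory stand-in for the data warehouse are illustrative assumptions only:

```python
# Toy sketch of the collect -> analyze -> archive flow; the keyword-based
# sentiment rule and the in-memory "warehouse" are illustrative assumptions.
POSITIVE, NEGATIVE = {"love", "great", "fast"}, {"hate", "slow", "broken"}

def analyze(post: dict) -> dict:
    words = set(post["text"].lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    post["sentiment"] = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    post["topic"] = "support" if "broken" in words else "general"
    return post

warehouse = []                                   # stand-in for the data warehouse
for raw in [{"site": "twitter", "text": "Love the new app, so fast"},
            {"site": "forum",   "text": "The checkout page is broken"}]:
    warehouse.append(analyze(raw))

# A simple "interface" to query the archived data, e.g. negative posts.
print([p for p in warehouse if p["sentiment"] == "negative"])
```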

180 citations


Patent
03 Aug 2010
TL;DR: In this patent, a system includes a transaction handler to process transactions, a data warehouse to store transaction data recording the transactions processed at the transaction handler, a profile generator to generate a profile of a user based on the transaction data, an advertisement selector to identify an advertisement based on the profile of the user, and a portal coupled to the transaction handler to provide the advertisement for presentation to the user in connection with information about the transaction.
Abstract: In one aspect, a system includes a transaction handler to process transactions, a data warehouse to store transaction data recording the transactions processed at the transaction handler, a profile generator to generate a profile of a user based on the transaction data, an advertisement selector to identify an advertisement based on the profile of the user in response to the transaction handler processing a transaction of the user, and a portal coupled to the transaction handler to provide the advertisement for presentation to the user in connection with information about the transaction of the user. In one example, the profile includes a plurality of values representing aggregated spending of the user in various areas to summarize the transactions of the user.
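
The "plurality of values representing aggregated spending of the user in various areas" amounts to summing transaction amounts per spending area; a hedged sketch with made-up transactions and category names:

```python
# Sketch of building a spending profile as aggregated amounts per area.
# Transaction records and category names are made up for illustration.
from collections import defaultdict

transactions = [
    {"user": "u1", "area": "grocery", "amount": 82.10},
    {"user": "u1", "area": "travel",  "amount": 431.00},
    {"user": "u1", "area": "grocery", "amount": 19.45},
]

def build_profile(txns):
    """Aggregate spending per area into the kind of profile the claim describes."""
    profile = defaultdict(float)
    for t in txns:
        profile[t["area"]] += t["amount"]
    return dict(profile)

profile = build_profile(t for t in transactions if t["user"] == "u1")
print(profile)   # e.g. {'grocery': 101.55, 'travel': 431.0}
```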

174 citations


Proceedings ArticleDOI
19 Apr 2010
TL;DR: PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines; it enhances the scalability and flexibility of the current I/O stack on HEC platforms.
Abstract: Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high performance storage and in order to be useful to science end users, such data must be organized in its layout, indexed, sorted, and otherwise manipulated for subsequent data presentation, visualization, and detailed analysis. In addition, scientists desire to gain insights into selected data characteristics 'hidden' or 'latent' in these massive datasets while data is being produced by simulations. PreDatA, short for Preparatory Data Analytics, is an approach to preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the machine as 'staging' nodes and by staging simulations' output data through these nodes, PreDatA can exploit their computational power to perform select data manipulations with lower latency than attainable by first moving data into file systems and storage. Such in-transit manipulations are supported by the PreDatA middleware through asynchronous data movement to reduce write latency, application-specific operations on streaming data that are able to discover latent data characteristics, and appropriate data reorganization and metadata annotation to speed up subsequent data access. PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, as well as for data exchange between concurrently running simulations.
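
The staging-node idea, computing summaries on output data while it streams toward storage rather than after it lands in files, can be sketched as a producer thread handing chunks to an asynchronous staging consumer; this is a generic illustration of the pattern, not the PreDatA middleware API:

```python
# Generic illustration of in-transit processing on a "staging" worker:
# the simulation pushes output chunks asynchronously; the staging side
# computes cheap summary statistics (latent characteristics) before the
# data reaches storage. Not the PreDatA API, only the pattern it describes.
import queue, threading, random

chunks = queue.Queue()

def simulation_producer(n_chunks=5):
    for step in range(n_chunks):
        data = [random.gauss(0, 1) for _ in range(1000)]   # one output chunk
        chunks.put((step, data))                           # asynchronous hand-off
    chunks.put(None)                                       # end-of-stream marker

def staging_consumer():
    while (item := chunks.get()) is not None:
        step, data = item
        # In-transit characterization plus metadata annotation would go here.
        print(f"step {step}: n={len(data)} min={min(data):.2f} max={max(data):.2f}")

threading.Thread(target=simulation_producer).start()
staging_consumer()
```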

173 citations


Proceedings ArticleDOI
09 Apr 2010
TL;DR: This paper proposes four data mining models for the Internet of Things: a multi-layer data mining model, a distributed data mining model, a Grid-based data mining model, and a data mining model from a multi-technology integration perspective.
Abstract: In this paper, we propose four data mining models for the Internet of Things: a multi-layer data mining model, a distributed data mining model, a Grid-based data mining model, and a data mining model from a multi-technology integration perspective. Among them, the multi-layer model includes four layers: 1) a data collection layer, 2) a data management layer, 3) an event processing layer, and 4) a data mining service layer. The distributed data mining model addresses the problems that arise from data being deposited at different sites. The Grid-based data mining model uses a Grid framework to realize the functions of data mining. The data mining model from the multi-technology integration perspective describes a corresponding framework for the future Internet. Several key issues in data mining for the IoT are also discussed.
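
A hedged sketch of how the four layers of the multi-layer model could be wired together; only the layering comes from the paper, while the function bodies are placeholder stubs:

```python
# Skeleton of the paper's four-layer model; the bodies are placeholder stubs.
def data_collection_layer():
    """Layer 1: gather raw readings from devices/sensors (stubbed here)."""
    return [{"sensor": "s1", "value": 21.5}, {"sensor": "s1", "value": 35.0}]

def data_management_layer(readings):
    """Layer 2: store/organize the data (here: a trivial in-memory 'table')."""
    return {"readings": readings}

def event_processing_layer(store):
    """Layer 3: turn raw data into events, e.g. threshold crossings."""
    return [r for r in store["readings"] if r["value"] > 30]

def data_mining_service_layer(events):
    """Layer 4: expose mined results (here: a simple count per sensor)."""
    summary = {}
    for e in events:
        summary[e["sensor"]] = summary.get(e["sensor"], 0) + 1
    return summary

print(data_mining_service_layer(
    event_processing_layer(data_management_layer(data_collection_layer()))))
```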

Journal ArticleDOI
01 Sep 2010
TL;DR: This paper describes a data warehouse system, called Cheetah, built on top of MapReduce and designed specifically for the authors' online advertising application to allow various simplifications and custom optimizations, and describes a stack of optimization techniques ranging from data compression and access methods to multi-query optimization and exploiting materialized views.
Abstract: Large-scale data analysis has become increasingly important for many enterprises. Recently, a new distributed computing paradigm, called MapReduce, and its open source implementation Hadoop, has been widely adopted due to its impressive scalability and flexibility to handle structured as well as unstructured data. In this paper, we describe our data warehouse system, called Cheetah, built on top of MapReduce. Cheetah is designed specifically for our online advertising application to allow various simplifications and custom optimizations. First, we take a fresh look at the data warehouse schema design. In particular, we define a virtual view on top of the common star or snowflake data warehouse schema. This virtual view abstraction not only allows us to design a SQL-like but much more succinct query language, but also makes it easier to support many advanced query processing features. Next, we describe a stack of optimization techniques ranging from data compression and access method to multi-query optimization and exploiting materialized views. In fact, each node with commodity hardware in our cluster is able to process raw data at 1GBytes/s. Lastly, we show how to seamlessly integrate Cheetah into any ad-hoc MapReduce jobs. This allows MapReduce developers to fully leverage the power of both MapReduce and data warehouse technologies.
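
The virtual-view abstraction, exposing the joined star schema as one wide table so that queries stay succinct, might look roughly like the following; the schema, column names, and SQL text are invented and are not Cheetah's actual query language:

```python
# Illustrative only: a virtual view flattening a star schema so queries can be
# written against one wide table. Schema and columns are invented; this is not
# Cheetah's query language, which the paper describes as SQL-like but more succinct.
virtual_view = """
CREATE VIEW ad_events AS            -- the "virtual view" over the star schema
SELECT f.event_time, f.impressions, f.clicks, f.cost,
       a.campaign_name, p.publisher_name, g.country
FROM   impressions_fact f
JOIN   advertiser_dim a ON f.advertiser_id = a.advertiser_id
JOIN   publisher_dim  p ON f.publisher_id  = p.publisher_id
JOIN   geo_dim        g ON f.geo_id        = g.geo_id;
"""

query = """
-- Analysts then query the flat view without spelling out the joins each time.
SELECT campaign_name, SUM(clicks) / SUM(impressions) AS ctr
FROM   ad_events
WHERE  country = 'US'
GROUP BY campaign_name;
"""
print(virtual_view, query)
```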

Patent
23 Nov 2010
TL;DR: In this patent, a system includes a transaction handler, a data warehouse to store transaction data recording transactions processed at the transaction handler and to store account data identifying an account of a user, and a portal to receive a user selection of a first portion of an advertisement and, in response, to present a user interface.
Abstract: In one aspect, a system includes a transaction handler, a data warehouse to store transaction data recording transactions processed at the transaction handler and to store account data identifying an account of the user, and a portal to receive a user selection of a first portion of an advertisement and, in response, to present a user interface. The advertisement provides an offer and includes a second portion which when selected directs the user to a website of an advertiser. The data warehouse is to store data associating the offer with the account data of the user in response to a request made in the user interface. The system is to monitor transactions processed at the transaction handler to identify a purchase paid via the account and eligible for the redemption of the offer. The transaction handler is to provide statement credits to the user, if the payment transaction is identified.

Proceedings ArticleDOI
06 Jun 2010
TL;DR: In DataPath, queries do not request data, and data are automatically pushed onto processors, where they are then processed by any interested computation, making for a very lean and fast database system.
Abstract: Since the 1970's, database systems have been "compute-centric". When a computation needs the data, it requests the data, and the data are pulled through the system. We believe that this is problematic for two reasons. First, requests for data naturally incur high latency as the data are pulled through the memory hierarchy, and second, it makes it difficult or impossible for multiple queries or operations that are interested in the same data to amortize the bandwidth and latency costs associated with their data access. In this paper, we describe a purely-push based, research prototype database system called DataPath. DataPath is "data-centric". In DataPath, queries do not request data. Instead, data are automatically pushed onto processors, where they are then processed by any interested computation. We show experimentally on a multi-terabyte benchmark that this basic design principle makes for a very lean and fast database system.
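
The push-based principle can be illustrated with a small observer-style loop in which a single scan pushes each data chunk to every registered computation; this is a generic sketch of the idea, not DataPath's implementation:

```python
# Generic sketch of the "data-centric"/push-based idea: one pass over the data
# pushes each chunk to every interested computation, amortizing the access cost.
# This is not DataPath's implementation.
class SumQuery:
    def __init__(self):
        self.total = 0.0
    def consume(self, chunk):
        self.total += sum(chunk)

class CountQuery:
    def __init__(self):
        self.rows = 0
    def consume(self, chunk):
        self.rows += len(chunk)

queries = [SumQuery(), CountQuery()]           # concurrent interested computations
table = [[1.0, 2.0, 3.0], [4.0, 5.0]]          # the "table", stored as chunks

for chunk in table:                            # one pass over the data...
    for q in queries:                          # ...pushed to every computation
        q.consume(chunk)

print(queries[0].total, queries[1].rows)       # 15.0 5
```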

Patent
10 Aug 2010
TL;DR: In this patent, a transaction handler processes transactions, a data warehouse stores transaction data recording the transactions processed at the transaction handler, and a profile generator identifies a set of user clusters based on the transaction data.
Abstract: In one aspect, a computing apparatus includes: a transaction handler to process transactions; a data warehouse to store transaction data recording the transactions processed at the transaction handler; a profile generator to identify a set of user clusters based on transaction data; and a portal to enroll users and identify preferred communication channels of the users, receive offers from a plurality of entities, present data identifying the set of user clusters to the entities, receive bids on the clusters from the entities in accordance with types of the offers, based on the bids determine winning entities for a predetermined time period, and provide offers of the winning entities to respective enrolled users in respective clusters during the predetermined time period, using preferred communication channels of the respective enrolled users.

Patent
18 Oct 2010
TL;DR: In this patent, a data warehouse stores transaction data recording the transactions processed by a transaction handler, and a portal coupled with the data warehouse receives one or more parameters as input and provides spending activity information for presentation in response to the input.
Abstract: In one aspect, a computing apparatus includes: a transaction handler to process transactions, a data warehouse to store transaction data recording the transactions processed by the transaction handler, a portal coupled with the data warehouse to receive one or more parameters as an input and to provide spending activity information for presentation as a response to the input, and an analytics engine coupled with the portal and the data warehouse to analyze spending activities of a user based on the transaction data and the one or more parameters to generate the spending activity information regarding transactions in a plurality of accounts of the user.

Journal ArticleDOI
Nayem Rahman
TL;DR: An Extract-Transform-Load (ETL) metadata model is proposed that archives load observation timestamps and other useful load parameters, together with recommended algorithms and techniques for incremental refreshes that enable table loading while ensuring data consistency and integrity and improving load performance.
Abstract: Incremental load is an important factor for successful data warehousing. Lack of standardized incremental refresh methodologies can lead to poor analytical results, which can be unacceptable to an organization's analytical community. Successful data warehouse implementation depends on consistent metadata as well as incremental data load techniques. If consistent load timestamps are maintained and efficient transformation algorithms are used, it is possible to refresh databases with complete accuracy and with little or no manual checking. This paper proposes an Extract-Transform-Load (ETL) metadata model that archives load observation timestamps and other useful load parameters. The author also recommends algorithms and techniques for incremental refreshes that enable table loading while ensuring data consistency and integrity and improving load performance. In addition to significantly improving quality in incremental load techniques, these methods will save a substantial amount of data warehouse system resources.
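
A minimal sketch of a timestamp-driven incremental refresh of the kind the paper argues for, using SQLite so it is self-contained; the table and column names are assumptions rather than the paper's actual metadata model:

```python
# Sketch of an incremental refresh driven by load timestamps. Table and column
# names are assumptions; the paper's ETL metadata model archives such load
# observation timestamps and other load parameters.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL, update_ts TEXT);
CREATE TABLE dw_orders     (id INTEGER PRIMARY KEY, amount REAL, update_ts TEXT);
CREATE TABLE etl_metadata  (table_name TEXT PRIMARY KEY, last_load_ts TEXT);
INSERT INTO etl_metadata VALUES ('dw_orders', '2010-01-01T00:00:00');
INSERT INTO source_orders VALUES (1, 10.0, '2009-12-31T10:00:00'),
                                 (2, 25.0, '2010-01-02T08:30:00');
""")

def incremental_refresh(conn, now="2010-01-03T00:00:00"):
    (last_ts,) = conn.execute(
        "SELECT last_load_ts FROM etl_metadata WHERE table_name='dw_orders'").fetchone()
    # Load only rows that changed after the last recorded load observation.
    conn.execute("""INSERT OR REPLACE INTO dw_orders
                    SELECT id, amount, update_ts FROM source_orders
                    WHERE update_ts > ?""", (last_ts,))
    conn.execute("UPDATE etl_metadata SET last_load_ts=? WHERE table_name='dw_orders'", (now,))
    conn.commit()

incremental_refresh(db)
print(db.execute("SELECT * FROM dw_orders").fetchall())   # only the changed row
```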

Patent
07 Oct 2010
TL;DR: In this patent, a data warehouse stores transaction data, and purchase details associated with an authorization request are stored in the data warehouse in response to a determination that the account identifier in the request is associated with consent data.
Abstract: In one aspect, a computing apparatus includes: a transaction handler to process transactions; a portal to receive, from users, consent data that identifies account identifiers of the users; a data warehouse to store transaction data recording the transactions and store purchase details for at least some of the transactions; and a profile generator to generate profiles based on the transaction data and the purchase details stored in the data warehouse. In response to an authorization request received in the transaction handler for a payment transaction identifying a first account identifier, the system is to use the transaction handler to request purchase details associated with the authorization request from the merchant via a response to the authorization request, and receive and store the purchase details associated with the authorization request in the data warehouse, in response to a determination that the first account identifier is associated with consent data.

Journal ArticleDOI
01 Nov 2010
TL;DR: A user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks is introduced, based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain.
Abstract: The data warehouse design task needs to consider both the end-user requirements and the organization data sources. For this reason, the data warehouse design has been traditionally considered a reengineering process, guided by requirements, from the data sources. Most current design methods available demand highly-expressive end-user requirements as input, in order to carry out the exploration and analysis of the data sources. However, eliciting the end-user information requirements can prove to be a demanding task. Importantly, in the data warehousing context, the analysis capabilities of the target data warehouse depend on what kind of data is available in the data sources. Thus, in those scenarios where the analysis capabilities of the data sources are not (fully) known, it is possible to help the data warehouse designer to identify and elicit unknown analysis capabilities. In this paper we introduce a user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks. Our proposal is based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain. It starts by fully analyzing the data sources to identify, without considering requirements yet, the multidimensional knowledge they capture (i.e., data likely to be analyzed from a multidimensional point of view). Next, we propose to exploit this knowledge in order to support the requirements elicitation task. In this way, we are already conciliating requirements with the data sources, and we are able to fully exploit the analysis capabilities of the sources. Once requirements are clear, we automatically create the data warehouse conceptual schema according to the multidimensional knowledge extracted from the sources.

Book
02 Jun 2010
TL;DR: This lecture gives an overview of recent research in stream processing, ranging from answering simple queries on high-speed streams to loading real-time data feeds into a streaming warehouse for off-line analysis.
Abstract: Many applications process high volumes of streaming data, among them Internet traffic analysis, financial tickers, and transaction log mining. In general, a data stream is an unbounded data set that is produced incrementally over time, rather than being available in full before its processing begins. In this lecture, we give an overview of recent research in stream processing, ranging from answering simple queries on high-speed streams to loading real-time data feeds into a streaming warehouse for off-line analysis. We will discuss two types of systems for end-to-end stream processing: Data Stream Management Systems (DSMSs) and Streaming Data Warehouses (SDWs). A traditional database management system typically processes a stream of ad-hoc queries over relatively static data. In contrast, a DSMS evaluates static (long-running) queries on streaming data, making a single pass over the data and using limited working memory. In the first part of this lecture, we will discuss research problems in DSMSs, such as continuous query languages, non-blocking query operators that continually react to new data, and continuous query optimization. The second part covers SDWs, which combine the real-time response of a DSMS (by loading new data as soon as they arrive) with a data warehouse's ability to manage Terabytes of historical data on secondary storage. Table of Contents: Introduction / Data Stream Management Systems / Streaming Data Warehouses / Conclusions
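
The DSMS side of the picture, a long-running query evaluated in a single pass over unbounded data with bounded memory, can be sketched as a sliding-window aggregate; this is an assumed example, not tied to any particular DSMS or SDW:

```python
# Sketch of a continuous (long-running) query over a stream: a sliding-window
# average computed in a single pass with bounded memory. Illustrative only.
from collections import deque

def windowed_average(stream, window_size=3):
    window = deque(maxlen=window_size)        # the only state the query keeps
    for value in stream:                      # single pass, reacts to each arrival
        window.append(value)
        yield sum(window) / len(window)       # continuously emitted result

ticks = [10.0, 12.0, 11.0, 13.0, 20.0]        # e.g. a financial ticker
for avg in windowed_average(iter(ticks)):
    print(f"{avg:.2f}")
```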

01 Jan 2010
TL;DR: The purpose of the paper is to identify the reasons for data deficiencies, non-availability, or reachability problems at all the aforementioned stages of data warehousing and to formulate a descriptive classification of these causes.
Abstract: Data warehousing is gaining in eminence as organizations become aware of the benefits of decision-oriented and business-intelligence-oriented databases. However, there is one key stumbling block to the rapid development and implementation of quality data warehouses, namely data quality issues at the various stages of data warehousing. Specifically, problems arise in populating a warehouse with quality data. Over time, many researchers have contributed to the study of data quality issues, but no research has collectively gathered all the causes of data quality problems at all the phases of data warehousing, viz. 1) data sources, 2) data integration and data profiling, 3) data staging and ETL, and 4) data warehouse modeling and schema design. The purpose of this paper is to identify the reasons for data deficiencies, non-availability, or reachability problems at all the aforementioned stages of data warehousing and to formulate a descriptive classification of these causes. We have identified a possible set of causes of data quality issues from an extensive literature review and in consultation with data warehouse practitioners working at renowned IT giants in India. We hope this will help developers and implementers of warehouses to examine and analyze these issues before moving ahead with data integration and data warehouse solutions for quality decision-oriented and business-intelligence-oriented applications.

Patent
George Candea, Neoklis Polyzotis
12 May 2010
TL;DR: In this patent, a method concurrently executes a set of multiple queries through a processor to improve resource usage within a data warehouse system, permits a group of users of the system to simultaneously run queries, and applies a high-concurrency query operator to continuously optimize a large number of concurrent queries for highly concurrent dynamic workloads.
Abstract: In one embodiment, a method includes concurrently executing a set of multiple queries, through a processor, to improve a resource usage within a data warehouse system. The method also includes permitting a group of users of the data warehouse system to simultaneously run a set of queries. In addition, the method includes applying a high-concurrency query operator to continuously optimize a large number of concurrent queries for a set of highly concurrent dynamic workloads.

Journal ArticleDOI
TL;DR: The book offers a principled overview of key implementation techniques that are particularly important to multidimensional databases, including materialized views, bitmap indices, join indices, and star join processing.
Abstract: The present book's subject is multidimensional data models and data modeling concepts as they are applied in real data warehouses. The book aims to present the most important concepts within this subject in a precise and understandable manner. The book's coverage of fundamental concepts includes data cubes and their elements, such as dimensions, facts, and measures and their representation in a relational setting; it includes architecture-related concepts; and it includes the querying of multidimensional databases. The book also covers advanced multidimensional concepts that are considered to be particularly important. This coverage includes advanced dimension-related concepts such as slowly changing dimensions, degenerate and junk dimensions, outriggers, parent-child hierarchies, and unbalanced, non-covering, and non-strict hierarchies. The book offers a principled overview of key implementation techniques that are particularly important to multidimensional databases, including materialized views, bitmap indices, join indices, and star join processing. The book ends with a chapter that presents the literature on which the book is based and offers further readings for those readers who wish to engage in more in-depth study of specific aspects of the book's subject. Table of Contents: Introduction / Fundamental Concepts / Advanced Concepts / Implementation Issues / Further Readings
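
Of the implementation techniques listed, a bitmap index is compact enough to show in a few lines: one bitmap per distinct dimension value, with set bits marking the rows carrying that value. The sketch below uses Python integers as bitmaps and made-up data:

```python
# Tiny bitmap-index sketch: one bitmap (here a Python int) per distinct value,
# bit i set when row i has that value. Data and column are made up.
rows = ["US", "DE", "US", "FR", "DE", "US"]    # a low-cardinality dimension column

bitmaps = {}
for i, value in enumerate(rows):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)

def rows_matching(bitmap):
    return [i for i in range(len(rows)) if bitmap >> i & 1]

# Equality predicate: just read the bitmap.
print("country = 'US' ->", rows_matching(bitmaps["US"]))          # [0, 2, 5]
# Disjunction (country IN ('DE','FR')): bitwise OR of the two bitmaps.
print("country IN (DE,FR) ->", rows_matching(bitmaps["DE"] | bitmaps["FR"]))
```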

Book
08 Feb 2010
TL;DR: These practical, hands-on articles are fully updated to reflect current practices and terminology and cover the complete lifecycle including project planning, requirements gathering, dimensional modeling, ETL, and business intelligence and analytics.
Abstract: An unparalleled collection of recommended guidelines for data warehousing and business intelligence pioneered by Ralph Kimball and his team of colleagues from the Kimball Group. Recognized and respected throughout the world as the most influential leaders in the data warehousing industry, Ralph Kimball and the Kimball Group have written articles covering more than 250 topics that define the field of data warehousing. For the first time, the Kimball Group's incomparable advice, design tips, and best practices have been gathered in this remarkable collection of articles, which spans a decade of data warehousing innovation. Each group of articles is introduced with original commentaries that explain their role in the overall lifecycle methodology developed by the Kimball Group. These practical, hands-on articles are fully updated to reflect current practices and terminology and cover the complete lifecycle, including project planning, requirements gathering, dimensional modeling, ETL, and business intelligence and analytics. This easily referenced collection is nothing less than vital if you are involved with data warehousing or business intelligence in any capacity.

Journal ArticleDOI
TL;DR: Evaluation of cell culture stage-specific models indicates that production performance can be reliably predicted days prior to harvest, and implementation of this methodology on the manufacturing floor can facilitate a real-time decision making process and thereby improve the robustness of large scale bioprocesses.

Journal ArticleDOI
01 Sep 2010
TL;DR: The most relevant step in the framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements, and is a fully automatic approach that handles and analyzes the end-user requirements automatically.
Abstract: It is widely accepted that the conceptual schema of a data warehouse must be structured according to the multidimensional model. Moreover, it has been suggested that the ideal scenario for deriving the multidimensional conceptual schema of the data warehouse would consist of a hybrid approach (i.e., a combination of data-driven and requirement-driven paradigms). Thus, the resulting multidimensional schema would satisfy the end-user requirements and would be conciliated with the data sources. Most current methods follow either a data-driven or requirement-driven paradigm and only a few use a hybrid approach. Furthermore, hybrid methods are unbalanced and do not benefit from all of the advantages brought by each paradigm. In this paper we present our approach for multidimensional design. The most relevant step in our framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements. MDBE introduces several advantages over previous approaches, which can be summarized as three main contributions. (i) The MDBE method is a fully automatic approach that handles and analyzes the end-user requirements automatically. (ii) Unlike data-driven methods, we focus on data of interest to the end-user. However, the user may not be aware of all the potential analyses of the data sources and, in contrast to requirement-driven approaches, MDBE can propose new multidimensional knowledge related to concepts already queried by the user. (iii) Finally, MDBE proposes meaningful multidimensional schemas derived from a validation process. Therefore, the proposed schemas are sound and meaningful.

Patent
24 Sep 2010
TL;DR: The clinical informatics platform may include a data extraction facility that gathers clinical data from numerous sources, a data mapping facility that identifies and maps key data elements and links data over time, data normalization facility to normalize the clinical data and, optionally, de-identify the data, a flexible data warehouse for storing raw clinical data or longitudinal patient data, and a clinical analytics facility for data mining, analytic model building, patient risk identification, benchmarking, performing quality assurance, and patient tracking.
Abstract: The clinical analytics platform automates the capture, extraction, and reporting of data required for certain quality measures, provides real-time clinical surveillance, clinical dashboards, tracking lists, and alerts for specific, high-priority conditions, and offers dynamic, ad-hoc quality reporting capabilities. The clinical informatics platform may include a data extraction facility that gathers clinical data from numerous sources, a data mapping facility that identifies and maps key data elements and links data over time, a data normalization facility to normalize the clinical data and, optionally, de-identify the data, a flexible data warehouse for storing raw clinical data or longitudinal patient data, a clinical analytics facility for data mining, analytic model building, patient risk identification, benchmarking, performing quality assurance, and patient tracking, and a graphical user interface for presenting clinical analytics in an actionable format.

Proceedings ArticleDOI
01 Aug 2010
TL;DR: This paper identifies the critical roles of organizational routines and organization-wide capabilities for identifying, resourcing and implementing business analytics-based competitive actions in delivering performance gains and competitive advantage.
Abstract: Business analytics has the potential to deliver performance gains and competitive advantage. However, a theoretically grounded model identifying the factors and processes involved in realizing those performance gains has not been clearly articulated in the literature. This paper draws on the literature on dynamic capabilities to develop such a theoretical framework. It identifies the critical roles of organizational routines and organization-wide capabilities for identifying, resourcing and implementing business analytics-based competitive actions in delivering performance gains and competitive advantage. A theoretical framework and propositions for future research are developed.

Journal ArticleDOI
01 May 2010
TL;DR: The research suggests an overall model for predicting the data warehouse architecture selection decision and identifies the various contextual factors that affect the selection decision.
Abstract: Even though data warehousing has been in existence for over a decade, companies are still uncertain about a critical decision - which data warehouse architecture to implement? Based on the existing literature, theory, and interviews with experts, a research model was created that identifies the various contextual factors that affect the selection decision. The results from the field survey and multinomial logistic regression suggest that various combinations of organizational factors influence data warehouse architecture selection. The strategic view of the data warehouse prior to implementation emerged as a key determinant. The research suggests an overall model for predicting the data warehouse architecture selection decision.
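
For readers unfamiliar with the analysis method mentioned, multinomial logistic regression predicts a categorical outcome (here, the architecture choice) from several explanatory factors; the sketch below uses synthetic data and hypothetical labels, not the study's survey variables:

```python
# Sketch of the kind of analysis reported: multinomial logistic regression
# relating contextual factors to an architecture choice. The data below is
# synthetic; it is not the study's survey data or its actual factor set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Hypothetical contextual factors, e.g. strategic view, org size, data volume.
X = rng.normal(size=(120, 3))
# Hypothetical architecture labels: 0=independent marts, 1=bus, 2=hub-and-spoke.
y = rng.integers(0, 3, size=120)

model = LogisticRegression(max_iter=1000).fit(X, y)   # multinomial for 3 classes
print("predicted architecture for one firm:", model.predict(X[:1])[0])
print("class probabilities:", model.predict_proba(X[:1]).round(2))
```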

Patent
30 Sep 2010
TL;DR: In this patent, a data warehouse stores transaction data, and a profile generator generates a profile including a plurality of values representing aggregated spending in various spending areas to summarize transactions in a geographical area.
Abstract: In one aspect, a computing apparatus includes: a transaction handler to process transactions; a data warehouse to store transaction data recording the transactions processed at the transaction handler; a profile generator to generate, based on the transaction data, a profile including a plurality of values representing aggregated spending in various spending areas to summarize transactions in a geographical area; and a portal to receive advertisement data from an advertiser and to create an advertisement campaign based on the profile to deliver advertisements to users in the geographical area on behalf of the advertiser using one or more media channels.