
Showing papers on "Online analytical processing published in 2010"


Proceedings ArticleDOI
06 Jun 2010
TL;DR: The focus of this work is on transaction processing (i.e., read and update workloads), rather than analytics or OLAP workloads, which have recently gained a great deal of attention.
Abstract: Cloud computing promises a number of advantages for the deployment of data-intensive applications. One important promise is reduced cost with a pay-as-you-go business model. Another promise is (virtually) unlimited throughput by adding servers if the workload increases. This paper lists alternative architectures to effect cloud computing for database applications and reports on the results of a comprehensive evaluation of existing commercial cloud services that have adopted these architectures. The focus of this work is on transaction processing (i.e., read and update workloads), rather than analytics or OLAP workloads, which have recently gained a great deal of attention. The results are surprising in several ways. Most importantly, it seems that all major vendors have adopted a different architecture for their cloud services. As a result, the cost and performance of the services vary significantly depending on the workload.

291 citations


Journal ArticleDOI
TL;DR: The CDW platform would be a promising infrastructure to make full use of the TCM clinical data for scientific hypothesis generation, and promote the development of TCM from individualized empirical knowledge to large-scale evidence-based medicine.

210 citations


Journal ArticleDOI
01 Nov 2010
TL;DR: A user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks is introduced, based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain.
Abstract: The data warehouse design task needs to consider both the end-user requirements and the organization data sources. For this reason, the data warehouse design has been traditionally considered a reengineering process, guided by requirements, from the data sources. Most current design methods available demand highly-expressive end-user requirements as input, in order to carry out the exploration and analysis of the data sources. However, eliciting the end-user information requirements can prove to be an arduous task. Importantly, in the data warehousing context, the analysis capabilities of the target data warehouse depend on what kind of data is available in the data sources. Thus, in those scenarios where the analysis capabilities of the data sources are not (fully) known, it is possible to help the data warehouse designer to identify and elicit unknown analysis capabilities. In this paper, we introduce a user-centered approach to support the end-user requirements elicitation and the data warehouse multidimensional design tasks. Our proposal is based on a reengineering process that derives the multidimensional schema from a conceptual formalization of the domain. It starts by fully analyzing the data sources to identify, without considering requirements yet, the multidimensional knowledge they capture (i.e., data likely to be analyzed from a multidimensional point of view). Next, we propose to exploit this knowledge in order to support the requirements elicitation task. In this way, we are already conciliating requirements with the data sources, and we are able to fully exploit the analysis capabilities of the sources. Once requirements are clear, we automatically create the data warehouse conceptual schema according to the multidimensional knowledge extracted from the sources.

96 citations


01 Jan 2010
TL;DR: In this paper, a query algebra for multidimensional analyses is presented, which supports complex analyses through advanced operators and binary operators, and a graphical language, based on this algebra, is also provided to ease the specification of multidimensional queries.
Abstract: This article deals with multidimensional analyses. Analyzed data are designed according to a conceptual model as a constellation of facts and dimensions, which are composed of multi-hierarchies. This model supports a query algebra defining a minimal core of operators, which produce multidimensional tables for displaying analyzed data. This user-oriented algebra supports complex analyses through advanced operators and binary operators. A graphical language, based on this algebra, is also provided to ease the specification of multidimensional queries. These graphical manipulations are expressed from a constellation schema and they produce multidimensional tables.

91 citations


Journal ArticleDOI
TL;DR: The book offers a principled overview of key implementation techniques that are particularly important to multidimensional databases, including materialized views, bitmap indices, join indices, and star join processing.
Abstract: The present book's subject is multidimensional data models and data modeling concepts as they are applied in real data warehouses. The book aims to present the most important concepts within this subject in a precise and understandable manner. The book's coverage of fundamental concepts includes data cubes and their elements, such as dimensions, facts, and measures and their representation in a relational setting; it includes architecture-related concepts; and it includes the querying of multidimensional databases. The book also covers advanced multidimensional concepts that are considered to be particularly important. This coverage includes advanced dimension-related concepts such as slowly changing dimensions, degenerate and junk dimensions, outriggers, parent-child hierarchies, and unbalanced, non-covering, and non-strict hierarchies. The book offers a principled overview of key implementation techniques that are particularly important to multidimensional databases, including materialized views, bitmap indices, join indices, and star join processing. The book ends with a chapter that presents the literature on which the book is based and offers further readings for those readers who wish to engage in more in-depth study of specific aspects of the book's subject. Table of Contents: Introduction / Fundamental Concepts / Advanced Concepts / Implementation Issues / Further Readings
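To make the cube vocabulary concrete, here is a loose illustration (not taken from the book): a Python/pandas sketch of a tiny fact table with two dimensions and one measure, rolled up with subtotals, which is roughly what a materialized view or star join would precompute in a relational setting. All table and column names are invented.

```python
import pandas as pd

# A tiny fact table: two dimensions (region, product) and one measure (sales).
facts = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["Bolt",  "Nut",   "Bolt",  "Nut"],
    "sales":   [100,     150,     200,     50],
})

# Aggregate the measure along both dimensions; margins=True adds the
# subtotal and grand-total cells a data cube would also contain.
cube = facts.pivot_table(index="region", columns="product",
                         values="sales", aggfunc="sum", margins=True)
print(cube)
```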

89 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: The most relevant step in the framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements, and is a fully automatic approach that handles and analyzes the end-user requirements.
Abstract: It is widely accepted that the conceptual schema of a data warehouse must be structured according to the multidimensional model. Moreover, it has been suggested that the ideal scenario for deriving the multidimensional conceptual schema of the data warehouse would consist of a hybrid approach (i.e., a combination of data-driven and requirement-driven paradigms). Thus, the resulting multidimensional schema would satisfy the end-user requirements and would be conciliated with the data sources. Most current methods follow either a data-driven or requirement-driven paradigm and only a few use a hybrid approach. Furthermore, hybrid methods are unbalanced and do not benefit from all of the advantages brought by each paradigm. In this paper we present our approach for multidimensional design. The most relevant step in our framework is Multidimensional Design by Examples (MDBE), which is a novel method for deriving multidimensional conceptual schemas from relational sources according to end-user requirements. MDBE introduces several advantages over previous approaches, which can be summarized as three main contributions. (i) The MDBE method is a fully automatic approach that handles and analyzes the end-user requirements automatically. (ii) Unlike data-driven methods, we focus on data of interest to the end-user. However, the user may not be aware of all the potential analyses of the data sources and, in contrast to requirement-driven approaches, MDBE can propose new multidimensional knowledge related to concepts already queried by the user. (iii) Finally, MDBE proposes meaningful multidimensional schemas derived from a validation process. Therefore, the proposed schemas are sound and meaningful.

85 citations


Journal Article
TL;DR: In this paper, the authors present a case study in which the evaluation of an investment in on-line analytical processing (OLAP) technology in the company Melamin was made through a qualitative approach.
Abstract: Several arguments can be found in business intelligence literature that the use of business intelligence systems can bring multiple benefits, for example, via faster and easier access to information, savings in information technology (‘IT’) and greater customer satisfaction all the way through to the improved competitiveness of enterprises. Yet, most of these benefits are often very difficult to measure because of their indirect and delayed effects on business success. On top of the difficulties in justifying investments in IT, particularly business intelligence (‘BI’), business executives generally want to know whether the investment is worth the money and if it can be economically justified. In looking for an answer to this question, various methods of evaluating investments can be employed. We can use the classic return on investment (‘ROI’) calculation, cost-benefit analysis, the net present value (‘NPV’) method, the internal rate of return (‘IRR’) and others. However, it often appears in business practice that the use of these methods alone is inappropriate, insufficient or unfeasible for evaluating an investment in business intelligence systems. Therefore, for this purpose, more appropriate methods are those based mainly on a qualitative approach, such as case studies, empirical analyses, user satisfaction analyses, and others that can be employed independently or can help us complete the whole picture in conjunction with the previously mentioned methods. Since there is no universal approach to the evaluation of an investment in information technology and business intelligence, it is necessary to approach each case in a different way based on the specific circumstances and purpose of the evaluation. This paper presents a case study in which the evaluation of an investment in on-line analytical processing (‘OLAP’) technology in the company Melamin was made through a qualitative approach.
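For readers unfamiliar with the classic quantitative methods the paper contrasts with its qualitative approach, here is a minimal Python sketch of NPV and IRR; the cash flows are invented and the code is illustrative only, not part of the case study.

```python
def npv(rate, cashflows):
    """Net present value: cash flows discounted at `rate`; cashflows[0]
    is the (negative) initial investment at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return via bisection: the rate at which NPV = 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid  # NPV still positive: the break-even rate is higher
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical BI project: 100k invested, 40k of yearly benefits for 4 years.
flows = [-100_000, 40_000, 40_000, 40_000, 40_000]
print(f"NPV at 10%: {npv(0.10, flows):,.0f}")  # positive, so worth doing
print(f"IRR: {irr(flows):.1%}")
```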

81 citations


Journal ArticleDOI
TL;DR: The authors propose the GeoCube model, which enriches the SOLAP concepts of spatial measure and spatial dimension and takes into account the semantic component of geographic information.
Abstract: Introducing spatial data into multidimensional models leads to the concept of Spatial OLAP (SOLAP). Existing SOLAP models do not completely integrate the semantic component of geographic information (alphanumeric attributes and relationships) or the flexibility of spatial analysis into multidimensional analysis. In this paper, the authors propose the GeoCube model and its associated operators to overcome these limitations. GeoCube enriches the SOLAP concepts of spatial measure and spatial dimension and takes into account the semantic component of geographic information. The authors define geographic measures and dimensions as geographic and/or complex objects belonging to hierarchy schemas. GeoCube's algebra extends SOLAP operators with five new operators, i.e., Classify, Specialize, Permute, OLAP-Buffer and OLAP-Overlay. In addition to classical drill-and-slice OLAP operators, GeoCube provides two operators for navigating the hierarchy of the measures, and two spatial analysis operators that dynamically modify the structure of the geographic hypercube. Finally, to exploit the symmetrical representation of dimensions and measures, GeoCube provides an operator capable of permuting dimension and measure. In this paper, GeoCube is presented using environmental data on the pollution of the Venetian Lagoon.

63 citations


01 Jan 2010
TL;DR: In this paper, the authors describe the development and implementation of an Air Quality Data Mart for Ontario, Canada using an Online Analytical Processing (OLAP) tool and evaluate the functionality of the tool by extracting the data across several dimensions.
Abstract: This thesis describes the development and implementation of an Air Quality Data Mart for Ontario, Canada using an Online Analytical Processing (OLAP) tool. It is followed by a case study which presents comparisons of air quality between the urban and rural areas, peak and non-peak hours, and working days and weekends for various cities in Ontario. The purpose of this study is to develop a user-friendly tool for historical air quality data and evaluate the functionality of the tool by extracting the data across several dimensions. The data for air quality is available on the Ontario Ministry of Environment website for 43 monitoring stations across Ontario. This data is in the form of static Hyper Text Markup Language (HTML) pages which cannot be used for analytical purposes. The air quality data mart was developed using open source OLAP. The database was designed using a multidimensional modeling approach. The OLAP server “Mondrian” was used as the presentation server, whereas the “Openi” client was used as an end-user tool for this study. The different functions available in this data mart are: rollup, drill down, and slice and dice the data across several dimensions such as time, location and pollutant. The most important conclusion of this thesis is the successful implementation of an air quality data mart with the possibility to extract accurate historical air quality data. The data in the form of a data mart provides numerous advantages, where it can be analyzed according to the required analytical perspective for a given city or cities. The only drawback of having data in the form of a data mart is that, if the data is drilled down to the finest precision, i.e., to the hour (depending on the number of dimensions selected), the resulting chart will be very crowded, but the generated report will present a complete overview of the analysis.
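To make the data mart's operations concrete, the following Python/pandas sketch imitates roll-up, drill-down and slice on an air-quality-shaped table; the column names and values are invented, and the thesis itself used Mondrian and the Openi client rather than pandas.

```python
import pandas as pd

# Invented sample shaped like the thesis' dimensions: time, location, pollutant.
df = pd.DataFrame({
    "city":      ["Toronto", "Toronto", "Ottawa", "Ottawa"],
    "hour":      [8, 14, 8, 14],
    "pollutant": ["O3", "NO2", "O3", "NO2"],
    "reading":   [31.0, 18.5, 27.2, 12.9],
})

# Roll-up: average reading per city, across all hours and pollutants.
print(df.groupby("city")["reading"].mean())

# Drill-down: bring the hour level back in.
print(df.groupby(["city", "hour"])["reading"].mean())

# Slice: fix one member of the pollutant dimension.
print(df[df["pollutant"] == "O3"])
```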

60 citations


Book
07 Dec 2010
TL;DR: Business Intelligence Techniques is a compilation of chapters written by experts in the various areas that provide a comprehensive overview of how to exploit accounting data in the business environment.
Abstract: Modern businesses generate huge volumes of accounting data on a daily basis. The recent advancements in information technology have given organizations the ability to capture and store data in an efficient and effective manner. However, there is a widening gap between this data storage and usage of the data. Business intelligence techniques can help an organization obtain and process relevant accounting data quickly and cost efficiently. Such techniques include: query and reporting tools, online analytical processing (OLAP), statistical analysis, text mining, data mining, and visualization. Business Intelligence Techniques is a compilation of chapters written by experts in the various areas. While these chapters stand on their own, taken together they provide a comprehensive overview of how to exploit accounting data in the business environment.

60 citations


Journal ArticleDOI
01 Jul 2010
TL;DR: The compression strategy proposed in ECM-DS lays the foundation for a novel class of intelligent applications over data streams where the knowledge on actual streams is integrated with and correlated to the knowledge related to expired events that are considered critical for the target OLAP analysis scenario.
Abstract: An innovative event-based lossy compression model for effective and efficient OLAP over data streams, called ECM-DS, is presented and experimentally assessed in this paper. The main novelty of our compression approach with respect to traditional data stream compression techniques relies on exploiting the semantics of the reference application scenario in order to drive the compression process by means of the “degree of interestingness” of events occurring in the target stream. This finally improves the quality of retrieved approximate answers to OLAP queries over data streams, and, in turn, the quality of complex knowledge discovery tasks over data streams developed on top of ECM-DS, and implemented via ad-hoc data stream mining algorithms. Overall, the compression strategy we propose in this research lays the foundation for a novel class of intelligent applications over data streams where the knowledge on actual streams is integrated with and correlated to the knowledge related to expired events that are considered critical for the target OLAP analysis scenario. Finally, a comprehensive experimental evaluation over several classes of data stream sets clearly confirms the benefits deriving from the event-based data stream compression approach proposed in ECM-DS.
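ECM-DS itself is not specified here in reproducible detail, but the core idea, keeping events deemed interesting exact while lossily summarizing the rest, can be sketched as below; the scoring function and threshold are invented placeholders, not the paper's model.

```python
# Sketch of event-driven lossy compression: events scoring above a threshold
# are kept verbatim; the rest are collapsed into a lossy aggregate.

def interestingness(event):
    return event["value"]          # placeholder: real scoring is domain-specific

def compress(stream, threshold=100):
    kept, summary = [], {"count": 0, "total": 0}
    for event in stream:
        if interestingness(event) >= threshold:
            kept.append(event)     # critical events survive exactly
        else:
            summary["count"] += 1  # the rest only as aggregated statistics
            summary["total"] += event["value"]
    return kept, summary

events = [{"value": v} for v in (5, 250, 12, 180, 7)]
print(compress(events))  # keeps the 250 and 180 events; 3 others summarized
```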

01 Jan 2010
TL;DR: This paper provides an overview of Data warehousing, Data Mining, OLAP, OLTP technologies, exploring the features, applications and the architecture of Data Warehousing.
Abstract: This paper provides an overview of Data warehousing, Data Mining, OLAP, OLTP technologies, exploring the features, applications and the architecture of Data Warehousing. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by the operational databases. Data warehouses provide on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitates effective data mining. Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. OLTP is customer-oriented and is used for transaction and query processing by clerks, clients and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives and analysts. Data warehousing and OLAP have emerged as leading technologies that facilitate data storage, organization and then, significant retrieval. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications.

Proceedings ArticleDOI
30 Oct 2010
TL;DR: An innovative framework based on flexible sampling-based data cube compression techniques for computing privacy preserving OLAP aggregations on data cubes while allowing approximate answers to be efficiently evaluated over such aggregations is proposed.
Abstract: In this paper we propose an innovative framework based on flexible sampling-based data cube compression techniques for computing privacy preserving OLAP aggregations on data cubes while allowing approximate answers to be efficiently evaluated over such aggregations. In our proposal, this scenario is accomplished by means of the so-called accuracy/privacy contract, which determines how OLAP aggregations must be accessed while balancing the accuracy of approximate answers against the privacy of sensitive ranges of multidimensional data.
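As a rough illustration of the general sampling idea (not the authors' specific accuracy/privacy contract), the following Python sketch answers a SUM query from a random sample and scales by the inverse sampling rate, so individual cells never need to be returned exactly; all numbers are invented.

```python
import random

random.seed(0)
cells = [random.randint(1, 100) for _ in range(10_000)]  # invented cube cells

rate = 0.05                                # 5% sampling rate
sample = random.sample(cells, int(len(cells) * rate))

estimate = sum(sample) / rate              # scale up by the inverse rate
print(f"approximate SUM: {estimate:,.0f}  (exact: {sum(cells):,})")
```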

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This tutorial presents an organized picture on how to turn a database into one or a set of organized heterogeneous information networks, how information networks can be used for data cleaning, data consolidation, and data quality improvement, and how to discover various kinds of knowledge from information networks.
Abstract: Most people consider a database to be merely a data repository that supports data storage and retrieval. Actually, a database contains rich, inter-related, multi-typed data and information, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology. In this tutorial, we introduce database-oriented information network analysis methods and demonstrate how information networks can be used to improve data quality and consistency, facilitate data integration, and generate interesting knowledge. This tutorial presents an organized picture on how to turn a database into one or a set of organized heterogeneous information networks, how information networks can be used for data cleaning, data consolidation, and data quality improvement, how to discover various kinds of knowledge from information networks, how to perform OLAP in information networks, and how to transform database data into knowledge by information network analysis. Moreover, we present interesting case studies on real datasets, including DBLP and Flickr, and show how interesting and organized knowledge can be generated from database-oriented information networks.

Journal ArticleDOI
TL;DR: A decision support system containing the methodology, Weighted and Layered workflow evaluation (WaLwFA), extended to incorporate business intelligence using C4.5 and association rule algorithms is described.
Abstract: Business performance measurements, decision support systems (DSS) and online analytical processing (OLAP) have a common goal, i.e., to assist decision-makers during the decision-making process. Integrating DSS and OLAP into existing business performance measurements aims to improve the accuracy of analysis and provide an in-depth, multi-angle view of data. This paper describes a decision support system containing our methodology, Weighted and Layered workflow evaluation (WaLwFA), extended to incorporate business intelligence using C4.5 and association rule algorithms. C4.5 produces more comprehensible decision trees by showing only important attributes. Furthermore, C4.5 trees can be transformed into IF-THEN rules. However, association rules are preferred as data can be described in rules of multiple granularities. Sorting rules based on the rules' complexities permits OLAP to navigate through layers of complexities to extract rules of relevant sizes and to view data from multidimensional perspectives in each layer. Experimental results on an airline domain are presented.
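The layering idea, sorting rules by complexity so that OLAP can navigate from coarse to detailed rules, can be sketched in a few lines of Python; the rules below are invented and this is not the WaLwFA implementation.

```python
from collections import defaultdict

# Invented IF-THEN rules: (antecedent attributes, consequent).
rules = [
    ({"route"},                         "delay"),
    ({"route", "weather"},              "delay"),
    ({"route", "weather", "aircraft"},  "on-time"),
    ({"season"},                        "full-flight"),
]

# Layer rules by complexity (antecedent size) so an OLAP front end can
# drill from simple, general rules down to detailed, specific ones.
layers = defaultdict(list)
for antecedent, consequent in rules:
    layers[len(antecedent)].append((sorted(antecedent), consequent))

for depth in sorted(layers):
    print(f"layer {depth}: {layers[depth]}")
```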

Journal ArticleDOI
TL;DR: A solution to this drawback is presented, consisting of an extension to the Object Constraint Language (OCL) that has been developed to include a set of predefined OLAP operators, which can be used to define platform-independent OLAP queries as a part of the specification of the data warehouse conceptual multidimensional model.

Journal ArticleDOI
06 Dec 2010
TL;DR: In this article, the authors review web-based business intelligence approaches for small and middle-sized enterprises in decision making, and discuss the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools).
Abstract: Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises around the world. Although many studies have been conducted on the need of decision support systems (DSSs) for small businesses, most of them adopt existing solutions and approaches, which are appropriate for large-scale enterprises but are inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools), relevant for small and middle-sized enterprises in decision making.

Proceedings ArticleDOI
26 Apr 2010
TL;DR: UDFs are proposed that re-factor analytical processing on RDF graphs in a way that enables more parallelized processing and perform a look-ahead processing to reduce the cost of subsequent operators in the query execution plan.
Abstract: In order to exploit the growing amount of RDF data in decision-making, there is an increasing demand for analytics-style processing of such data. RDF data is modeled as a labeled graph that represents a collection of binary relations (triples). In this context, analytical queries can be interpreted as consisting of three main constructs, namely pattern matching, grouping and aggregation, and require several join operations to reassemble them into n-ary relations relevant to the given query, unlike traditional OLAP systems where data is suitably organized. MapReduce-based parallel processing systems like Pig have gained success in processing scalable analytical workloads. However, these systems offer only relational algebra style operators, which would require an iterative n-tuple reassembly process in which intermediate results need to be materialized. This leads to high I/O costs that negatively impact performance. In this paper, we propose UDFs that (i) re-factor analytical processing on RDF graphs in a way that enables more parallelized processing and (ii) perform a look-ahead processing to reduce the cost of subsequent operators in the query execution plan. These functions have been integrated into the Pig Latin function library, and the experimental results show up to 50% improvement in execution times for certain classes of queries. An important impact of this work is that it could serve as the foundation for additional physical operators in systems such as Pig for more efficient graph processing.
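The key move described above, replacing iterative self-joins over the triple table with a grouping pass that reassembles each subject's properties into an n-ary record, can be loosely sketched in plain Python (the actual work is a set of Pig Latin UDFs; the triples and property names here are invented):

```python
from collections import defaultdict

# Invented RDF triples: (subject, predicate, object).
triples = [
    ("p1", "type",  "Product"), ("p1", "price", 30), ("p1", "region", "EU"),
    ("p2", "type",  "Product"), ("p2", "price", 50), ("p2", "region", "EU"),
]

# One grouping pass reassembles each subject's binary relations into an
# n-ary record, instead of joining the triple table against itself per property.
records = defaultdict(dict)
for s, p, o in triples:
    records[s][p] = o

# Grouping + aggregation over the reassembled records: total price per region.
totals = defaultdict(int)
for rec in records.values():
    totals[rec["region"]] += rec["price"]
print(dict(totals))   # {'EU': 80}
```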

Journal ArticleDOI
TL;DR: This survey paper presents an overview of the different proposals that use XML within data warehousing technology, which range from using XML data sources for regular warehouses to those using full XML warehousing solutions.

Proceedings ArticleDOI
01 Mar 2010
TL;DR: This work demonstrates a framework that transforms the traditional data cube model into a trajectory warehouse, T-WAREHOUSE, a system that incorporates all the required steps for Visual Trajectory Data Warehousing, from trajectory reconstruction and ETL processing to Visual OLAP analysis on mobility data.
Abstract: Technological advances in sensing technologies and wireless telecommunication devices enable novel research fields related to the management of trajectory data. As it usually happens in the data management world, the challenge after storing the data is the implementation of appropriate analytics for extracting useful knowledge. However, traditional data warehousing systems and techniques were not designed for analyzing trajectory data. Thus, in this work, we demonstrate a framework that transforms the traditional data cube model into a trajectory warehouse. As a proof-of-concept, we implemented T-WAREHOUSE, a system that incorporates all the required steps for Visual Trajectory Data Warehousing, from trajectory reconstruction and ETL processing to Visual OLAP analysis on mobility data.

Proceedings ArticleDOI
30 Oct 2010
TL;DR: This work presents a GPU-based cube data structure and algorithms for fast multidimensional aggregation, implemented using Nvidia's CUDA framework, and shows a substantial speedup over state-of-the-art sequential algorithms.
Abstract: Multidimensional aggregation is one of the most important computational building blocks and hence also a potential performance bottleneck in Online Analytic Processing (OLAP). In order to deliver fast query responses for interactive operations such as slicing, dicing, roll-up and drill-down, it is essential that aggregates along the relevant dimensions of a data cube can be calculated as efficiently as possible. General-purpose computing on graphics processing units (GPGPU) is a recent trend used in many computing domains with the potential for tremendous speedups through the massively data-parallel computation available on such devices. We present a GPU-based cube data structure and algorithms for fast multidimensional aggregation, implemented using Nvidia's CUDA framework. Our experimental tests show a substantial speedup over state-of-the-art sequential algorithms. Moreover, the performance gain is particularly high in cases exposing the weaknesses of traditional algorithms, i.e. when the number of base cells involved in an aggregation is large.
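The paper's implementation is CUDA-based; as a rough CPU stand-in for the operation the GPU parallelizes, the numpy sketch below aggregates a small invented 3-dimensional cube along one or more dimensions, which is the per-cell reduction a GPU kernel would distribute across threads.

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented 3-D cube: dimensions (time, region, product); base cells hold a measure.
cube = rng.integers(0, 100, size=(12, 4, 50))

# Roll-up over the product dimension: each output cell aggregates 50 base cells.
by_time_region = cube.sum(axis=2)      # shape (12, 4)

# Aggregating away two dimensions at once yields a coarser cuboid.
by_time = cube.sum(axis=(1, 2))        # shape (12,)
print(by_time_region.shape, by_time.shape)
```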

Proceedings ArticleDOI
01 Mar 2010
TL;DR: A novel E-Cube model is demonstrated that combines CEP and OLAP techniques for multi-dimensional event pattern analysis at different abstraction levels and a London transit scenario is given to demonstrate the utility and performance of this proposed technology.
Abstract: Many modern applications including tag based mass transit systems, RFID-based supply chain management systems and online financial feeds require special purpose event stream processing technology to analyze vast amounts of sequential multi-dimensional data available in real-time data feeds. Traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while Complex Event Processing (CEP) systems are designed for sequence detection and do not support OLAP operations. We will demonstrate a novel E-Cube model that combines CEP and OLAP techniques for multi-dimensional event pattern analysis at different abstraction levels. A London transit scenario will be given to demonstrate the utility and performance of this proposed technology.

Proceedings ArticleDOI
26 Oct 2010
TL;DR: Visual Cube and multi-dimensional OLAP of image collections, such as web images indexed in search engines, product images and photos shared on social networks, are proposed and efficient algorithms are developed to construct Visual Cube.
Abstract: On-Line Analytical Processing (OLAP) has shown great success in many industry applications, including sales, marketing, management, financial data analysis, etc. In this paper, we propose Visual Cube and multi-dimensional OLAP of image collections, such as web images indexed in search engines (e.g., Google and Bing), product images (e.g. Amazon) and photos shared on social networks (e.g., Facebook and Flickr). It provides online responses to user requests with summarized statistics of image information and handles rich semantics related to image visual features. A clustering structure measure is proposed to help users freely navigate and explore images. Efficient algorithms are developed to construct Visual Cube. In addition, we introduce the new issue of Cell Overlapping in data cube and present efficient solutions for Visual Cube computation and OLAP operations. Extensive experiments are conducted and the results show good performance of our algorithms.

Journal ArticleDOI
TL;DR: An analytical model using Petri Net for distributed data management in a data warehouse to ease the OLAP (Online Analytical Processing) operations is proposed and some of the properties like safeness, boundedness, liveness and conservativeness are verified.
Abstract: A decision maker wants a pool of data at their fingertips while making decisions. In state-of-the-art applications, decision making is no longer a centralized process. Distribution of resources is a challenge for system designers. Besides, for timely analysis of the distributed data, a robust query processing system along with the physical storage and schema definitions is also necessary. In the present state of business practice, skills, technologies, processes and applications all fall under the umbrella of competitive intelligence. In short, BI aims to support better business decision-making. In this paper we propose an analytical model using Petri Nets for distributed data management in a data warehouse to ease OLAP (Online Analytical Processing) operations. Some of the properties of the model, like safeness, boundedness, liveness and conservativeness, are also verified.

Patent
22 Nov 2010
TL;DR: In this paper, the authors present an OLAP execution model using relational operations, where the first query is generated by the OLAP and the second query is received in a relational engine coupled to the datastore.
Abstract: In one embodiment the present invention includes an OLAP execution model using relational operations. In one embodiment, the present invention includes, a method comprising receiving a first query in an online analytic processor (OLAP) executing on one or more computers, the OLAP generating and comprising a model specifying a graph defining a plurality of nodes and a plurality of tiers, each node corresponding to a different operation on data. A second query is generated by the OLAP. The second query includes a plurality of layered subqueries each corresponding to one of the nodes in the graph for specifying the different operations on data. The second query is received in a relational engine coupled to the datastore. The relational engine executes the second query, and in accordance therewith, retrieves data.
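To illustrate the layered-subqueries idea in this patent abstract, here is a hedged Python sketch that walks a tiny node graph tier by tier and emits one nested relational query; the node operations and table names are invented, and this is not the patented implementation.

```python
# Invented node graph: each tier's node consumes the previous tier's output;
# tier 0 reads the base table, later tiers wrap it in layered subqueries.
nodes = [
    ("t0", "SELECT region, product, sales FROM sales_fact"),
    ("t1", "SELECT region, SUM(sales) AS sales FROM {prev} GROUP BY region"),
    ("t2", "SELECT region FROM {prev} WHERE sales > 1000"),
]

query = None
for name, template in nodes:
    # Each tier wraps the previous tier's query as a layered subquery.
    query = template.format(prev=f"({query}) AS {name}_in") if query else template

print(query)  # one relational query, one nested subquery per later tier
```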

Book ChapterDOI
12 Dec 2010
TL;DR: The storage system and the processing engine are loosely coupled, and have been designed to handle two types of workload simultaneously, namely data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP respectively).
Abstract: The Cloud is fast gaining popularity as a platform for deploying Software as a Service (SaaS) applications. In principle, the Cloud provides unlimited compute resources, enabling deployed services to scale seamlessly. Moreover, the pay-as-you-go model in the Cloud reduces the maintenance overhead of the applications. Given the advantages of the Cloud, it is attractive to migrate existing software to this new platform. However, challenges remain as most software applications need to be redesigned to embrace the Cloud. In this paper, we present an overview of our current on-going work in developing epiC - an elastic and efficient power-aware data-intensive Cloud system. We discuss the design issues and the implementation of epiC's storage system and processing engine. The storage system and the processing engine are loosely coupled, and have been designed to handle two types of workload simultaneously, namely data-intensive analytical jobs and online transactions (commonly referred to as OLAP and OLTP respectively). The processing of large-scale analytical jobs in epiC adopts a phase-based processing strategy, which provides fine-grained fault tolerance, while the processing of queries adopts indexing and filter-and-refine strategies.

Journal ArticleDOI
TL;DR: A generic multidimensional schema is proposed to analyze the results of a simulation model, which can guide modelers in designing specific data warehouses, and an adaptation of an OLAP client tool to provide an adequate visualization of data is proposed.
Abstract: This paper examines the multidimensional modeling of a data warehouse for simulation results. Environmental dynamics modeling is used to study complex scenarios like urbanization, climate change and deforestation while allowing decision makers to understand and predict the evolution of the environment in response to potential value changes in a large number of influence variables. In this context, exploring simulation models produces a huge volume of data, which must often be studied extensively at different levels of aggregation; there is thus a great need for tools and methodologies specifically adapted to the storage and analysis of such complex data. Data warehousing systems provide technologies for managing simulation results from different sources. Moreover, OLAP technologies allow one to analyze and compare these results and their corresponding models. In this paper, the authors propose a generic multidimensional schema to analyze the results of a simulation model, which can guide modelers in designing specific data warehouses, and an adaptation of an OLAP client tool to provide an adequate visualization of data. As an example, a data warehouse for the analysis of results produced from a savanna simulation model is implemented using a Relational OLAP architecture.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: A MapReduceMerge-based parallel data cube construction method with a read-optimized data storage strategy which is more suitable for OLAP and can ensure good load balancing and reduce the large amount of data movement compared with traditional approaches.
Abstract: The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems. However, as the size of data grows, the time it takes to construct data cubes becomes a significant performance bottleneck. Therefore, we need a parallel pre-computation approach to further improve the performance of OLAP. Current parallel approaches can be grouped into two categories: work partitioning and data partitioning. However, the first cannot guarantee load balance among processors, and the second produces massive data movement between processors. This paper proposes a MapReduceMerge-based parallel data cube construction method with a read-optimized data storage strategy which is more suitable for OLAP. Our method can ensure good load balancing and reduce the large amount of data movement compared with traditional approaches. MapReduceMerge is an extension of MapReduce, a programming model that enables easy development of parallel applications to process massive data on large clusters, and the key element of Hadoop (a cloud computing framework) used to support the businesses of Facebook in cloud environments. We modify the original MapReduceMerge framework to make it meet the needs of cuboid construction and show the implementation in detail through an example of 2-dimension cuboid construction. In the meantime, we discuss the optimization for the construction of multi-dimension cuboids.
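The map/reduce split the method builds on can be sketched in plain Python: a mapper emits (group-by key, measure) pairs and a reducer sums them into the cells of a 2-dimension cuboid. The records are invented, and this omits the merge phase and the read-optimized storage strategy that are the paper's actual contribution.

```python
from collections import defaultdict

# Invented fact records: (date, store, amount).
records = [("2010-01", "s1", 10), ("2010-01", "s2", 20),
           ("2010-02", "s1", 5),  ("2010-01", "s1", 7)]

def mapper(record):
    date, store, amount = record
    yield (date, store), amount          # key = the cuboid's dimension values

def reducer(key, values):
    return key, sum(values)              # one aggregated cuboid cell

# Shuffle: group mapper output by key, as the framework would between phases.
groups = defaultdict(list)
for rec in records:
    for key, value in mapper(rec):
        groups[key].append(value)

cuboid = dict(reducer(k, vs) for k, vs in groups.items())
print(cuboid)  # {('2010-01','s1'): 17, ('2010-01','s2'): 20, ('2010-02','s1'): 5}
```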

Journal ArticleDOI
TL;DR: The UML class diagram of a GDW metamodel and its formal specifications are discussed and the Geographical Multidimensional Query Language (GeoMDQL) is proposed, based on well-known standards such as the MultiDimensional eXpressions (MDX) language and OGC simple features specification for SQL.

Proceedings ArticleDOI
19 Nov 2010
TL;DR: Integrated contextual information is proposed as the foundation concept of a multidimensional recommendation model, and the Online Analytical Processing (OLAP) ability of data warehousing is used to resolve the conflicts among hierarchy ratings.
Abstract: Recommender systems utilize the past experiences and preferences of target customers as a basis to offer personalized recommendations for them, as well as to resolve the information overload problem. Personalized recommendation methods are primarily classified into the content-based recommendation approach and the collaborative filtering recommendation approach. Both recommendation approaches have their own advantages, drawbacks and complementarities. Because conventional recommendation techniques don't consider contextual information, the real reason why a customer likes a specific product cannot be understood. Therefore, in reality, this often decreases the accuracy of the recommendation results and degrades the recommendation quality. In this paper, we propose integrated contextual information as the foundation concept of a multidimensional recommendation model and use the Online Analytical Processing (OLAP) ability of data warehousing to resolve the conflicts among hierarchy ratings. By establishing additional user profiles and multidimensional analysis, this work hopes to find the key factors affecting user perceptions.