
Showing papers on "Online analytical processing published in 2013"


Patent
16 May 2013
TL;DR: In this article, the authors present a method for generating a database query for stored information and then generating an Online Analytical Processing (OLAP) element to represent the information received from the query.
Abstract: A method is provided in one example and includes generating a query for a database for information stored in the database. The information relates to data discovered through a capture system. The method further includes generating an Online Analytical Processing (OLAP) element to represent information received from the query. A rule based on the OLAP element is generated and the rule affects data management for one or more documents that satisfy the rule. In more specific embodiments, the method further includes generating a capture rule that defines items the capture system should capture. The method also includes generating a discovery rule that defines objects the capture system should register. In still other embodiments, the method includes developing a policy based on the rule, where the policy identifies how one or more documents are permitted to traverse a network.

147 citations


Proceedings ArticleDOI
28 Oct 2013
TL;DR: Open problems and actual research trends in the field of Data Warehousing and OLAP over Big Data are highlighted and several novel research directions arising in this field are derived.
Abstract: In this paper, we highlight open problems and actual research trends in the field of Data Warehousing and OLAP over Big Data, an emerging term in Data Warehousing and OLAP research. We also derive several novel research directions arising in this field, and put emphasis on possible contributions to be achieved by future research efforts.

120 citations


Proceedings ArticleDOI
09 Oct 2013
TL;DR: Three important aspects of Big Data research are discussed, namely OLAP over Big Data, Big Data Posting, and Privacy of Big Data; future research directions are also depicted, hence implicitly defining a research agenda aimed at leading future challenges in this research field.
Abstract: Recently, a great deal of interest for Big Data has risen, mainly driven from a widespread number of research problems strongly related to real-life applications and systems, such as representing, modeling, processing, querying and mining massive, distributed, large-scale repositories (mostly being of unstructured nature). Inspired by this main trend, in this paper we discuss three important aspects of Big Data research, namely OLAP over Big Data, Big Data Posting, and Privacy of Big Data. We also depict future research directions, hence implicitly defining a research agenda aiming at leading future challenges in this research field.

113 citations


Book ChapterDOI
29 Aug 2013
TL;DR: This paper proposes the notion of process cubes where events and process models are organized using different dimensions and each cell in the process cube corresponds to a set of events and can be used to discover a process model, to check conformance with respect to some process models, or to discover bottlenecks.
Abstract: Recent breakthroughs in process mining research make it possible to discover, analyze, and improve business processes based on event data. The growth of event data provides many opportunities but also imposes new challenges. Process mining is typically done for an isolated well-defined process in steady-state. However, the boundaries of a process may be fluid and there is a need to continuously view event data from different angles. This paper proposes the notion of process cubes where events and process models are organized using different dimensions. Each cell in the process cube corresponds to a set of events and can be used to discover a process model, to check conformance with respect to some process model, or to discover bottlenecks. The idea is related to the well-known OLAP (Online Analytical Processing) data cubes and associated operations such as slice, dice, roll-up, and drill-down. However, there are also significant differences because of the process-related nature of event data. For example, process discovery based on events is incomparable to computing the average or sum over a set of numerical values. Moreover, dimensions related to process instances (e.g. cases are split into gold and silver customers), subprocesses (e.g. acquisition versus delivery), organizational entities (e.g. backoffice versus frontoffice), and time (e.g., 2010, 2011, 2012, and 2013) are semantically different and it is challenging to slice, dice, roll-up, and drill-down process mining results efficiently.

95 citations
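The slice and dice operations on a process cube can be sketched on a toy event log. This is a minimal illustration with hypothetical dimension names, not the paper's implementation:

```python
from collections import defaultdict

# Toy event log: each event carries dimension attributes (all hypothetical).
events = [
    {"case": 1, "activity": "register", "customer": "gold",   "year": 2012},
    {"case": 1, "activity": "ship",     "customer": "gold",   "year": 2012},
    {"case": 2, "activity": "register", "customer": "silver", "year": 2013},
    {"case": 3, "activity": "register", "customer": "gold",   "year": 2013},
]

def dice(events, **filters):
    """Keep only events whose attributes match all given filter values."""
    return [e for e in events
            if all(e[dim] in vals for dim, vals in filters.items())]

def cells(events, *dims):
    """Group events into process-cube cells along the given dimensions."""
    cube = defaultdict(list)
    for e in events:
        cube[tuple(e[d] for d in dims)].append(e)
    return cube

gold_2013 = dice(events, customer={"gold"}, year={2013})
by_customer = cells(events, "customer")
```

Each cell's event list would then feed a process discovery or conformance checking algorithm, which is the step that has no counterpart in classical numerical OLAP.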


Journal ArticleDOI
Tim Kraska1
TL;DR: With the increasing importance of big data, many new systems have been developed to "solve" the big data challenge but at the same time, famous database researchers argue that there is nothing new about these systems and that they're actually a step backward.
Abstract: With the increasing importance of big data, many new systems have been developed to "solve" the big data challenge. At the same time, famous database researchers argue that there is nothing new about these systems and that they're actually a step backward. This article sheds some light on this discussion.

79 citations


Proceedings ArticleDOI
22 Jul 2013
TL;DR: This paper explores the convergence of Data Warehousing, OLAP and data-intensive Cloud Infrastructures in the context of so-called analytics over Big Data.
Abstract: This paper explores the convergence of Data Warehousing, OLAP and data-intensive Cloud Infrastructures in the context of so-called analytics over Big Data. The paper briefly reviews some state-of-the-art proposals, highlights open research issues and, finally, draws possible research directions in this scientific field.

76 citations


Journal ArticleDOI
TL;DR: A hybrid OLAP-association rule mining based quality management system (HQMS) to extract defect patterns in the garment industry and the results indicate that the HQMS contributes significantly to the formulation of quality improvement in the industry.
Abstract: In today's garment industry, garment defects have to be minimized so as to fulfill the expectations of demanding customers who seek products of high quality but low cost. However, without any data mining tools to manage massive data related to quality, it is difficult to investigate the hidden patterns among defects, which are important information for improving the quality of garments. This paper presents a hybrid OLAP-association rule mining based quality management system (HQMS) to extract defect patterns in the garment industry. The mined results indicate the relationships among defects, which serve as a reference for defect prediction, root cause identification and the formulation of proactive measures for quality improvement. Because real-time access to the desired information is crucial for survival under severe competition, the system is equipped with Online Analytical Processing (OLAP) features so that manufacturers are able to explore the required data in a timely manner. The integration of OLAP and association rule mining allows data mining to be applied on a multidimensional basis. A pilot run of the HQMS is undertaken in a garment manufacturing company to demonstrate how OLAP and association rule mining are effective in discovering patterns among product defects. The results indicate that the HQMS contributes significantly to the formulation of quality improvement in the industry.

57 citations
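The association-rule side of such a hybrid system reduces to support and confidence computed over defect records; a minimal sketch with hypothetical defect codes:

```python
# Hypothetical defect records: the set of defect codes observed per garment.
records = [
    {"broken_stitch", "open_seam"},
    {"broken_stitch", "open_seam", "stain"},
    {"broken_stitch"},
    {"stain"},
]

def support(itemset, records):
    """Fraction of records containing every item in the itemset."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, records) / support(antecedent, records)

s = support({"broken_stitch", "open_seam"}, records)
c = confidence({"broken_stitch"}, {"open_seam"}, records)
```

In the HQMS setting, the OLAP layer would restrict `records` to one cube cell (e.g., a fabric type and time period) before mining, which is what makes the rule mining multidimensional.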


Journal ArticleDOI
TL;DR: A user-centric conceptual model for data warehouses and OLAP systems, called the Cube Algebra, that takes the cube metaphor literally and provides the knowledge worker with high-level cube objects and related concepts is proposed.
Abstract: The lack of an appropriate conceptual model for data warehouses and OLAP systems has led to the tendency to deploy logical models, for example star, snowflake, and constellation schemas, as conceptual models. ER model extensions, UML extensions, special graphical user interfaces, and dashboards have been proposed as conceptual approaches. However, they introduce their own problems, are somewhat complex and difficult to understand, and are not always user-friendly. They also impose a steep learning curve, and most of them address only structural design, without considering the associated operations. Therefore, they are not really an improvement and, in the end, only represent a reflection of the logical model. The essential drawback of offering this system-centric view as a user concept is that knowledge workers are confronted with the full and overwhelming complexity of these systems as well as with complicated and user-unfriendly query languages such as SQL OLAP and MDX. In this article, the authors propose a user-centric conceptual model for data warehouses and OLAP systems, called the Cube Algebra. It takes the cube metaphor literally and provides the knowledge worker with high-level cube objects and related concepts. A novel query language leverages well-known high-level operations such as roll-up, drill-down, slice, and drill-across. As a result, the logical and physical levels are hidden from the unskilled end user.

49 citations


Book ChapterDOI
25 Sep 2013
TL;DR: This paper investigates solutions relying on data partitioning schemes for parallel building of OLAP data cubes, suitable to novel Big Data environments, and proposes the framework OLAP*, along with the associated benchmark TPC-H*d, a suitable transformation of the well-known data warehouse benchmark T PC-H.
Abstract: In this paper, we investigate solutions relying on data partitioning schemes for parallel building of OLAP data cubes, suitable to novel Big Data environments, and we propose the framework OLAP*, along with the associated benchmark TPC-H*d, a suitable transformation of the well-known data warehouse benchmark TPC-H. We demonstrate through performance measurements the efficiency of the proposed framework, developed on top of the ROLAP server Mondrian.

46 citations
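The core idea behind partition-based parallel cube building, namely that partial aggregates computed independently per partition merge into the full cube, can be sketched as follows (toy data; the actual OLAP* framework and its Mondrian integration are not shown):

```python
from collections import Counter

# Hypothetical fact rows: (product, region, sales).
rows = [("shirt", "EU", 10), ("shirt", "US", 5),
        ("pants", "EU", 7),  ("shirt", "EU", 3)]

def partition(rows, n):
    """Hash-partition rows so each worker can aggregate its share alone."""
    parts = [[] for _ in range(n)]
    for r in rows:
        parts[hash(r[:2]) % n].append(r)
    return parts

def local_aggregate(part):
    """Per-partition cube fragment: grouping key -> summed measure."""
    agg = Counter()
    for product, region, sales in part:
        agg[(product, region)] += sales
    return agg

# Partial aggregates merge by addition, so the result is independent of how
# rows were partitioned -- the property parallel cube building relies on.
cube = sum((local_aggregate(p) for p in partition(rows, 3)), Counter())
```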


Book ChapterDOI
26 May 2013
TL;DR: An empirical argument for the need of OLAP-to-SPARQL engines for analytical query capabilities in industry is given, and the performance gain of RDF aggregate views that, similar to aggregate tables in ROLAP, materialise parts of the data cube is measured.
Abstract: Statistics published as Linked Data promise efficient extraction, transformation and loading (ETL) into a database for decision support. The predominant way to implement analytical query capabilities in industry are specialised engines that translate OLAP queries to SQL queries on a relational database using a star schema (ROLAP). A more direct approach than ROLAP is to load Statistical Linked Data into an RDF store and to answer OLAP queries using SPARQL. However, we assume that general-purpose triple stores – just as typical relational databases – are no perfect fit for analytical workloads and need to be complemented by OLAP-to-SPARQL engines. To give an empirical argument for the need of such an engine, we first compare the performance of our generated SPARQL and of ROLAP SQL queries. Second, we measure the performance gain of RDF aggregate views that, similar to aggregate tables in ROLAP, materialise parts of the data cube.

44 citations
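An OLAP-to-SPARQL translation of the kind the paper benchmarks can be sketched as a roll-up rendered into a SPARQL GROUP BY query. The `ex:` property IRIs below are hypothetical placeholders, loosely in the style of statistical Linked Data, not the paper's generated queries:

```python
def rollup_to_sparql(measure, dims):
    """Render a roll-up over the given dimensions as a SPARQL aggregation
    query (property IRIs are hypothetical placeholders)."""
    vars_ = " ".join(f"?{d}" for d in dims)
    patterns = "\n".join(f"  ?obs ex:{d} ?{d} ." for d in dims)
    return (f"SELECT {vars_} (SUM(?m) AS ?total)\n"
            f"WHERE {{\n  ?obs ex:{measure} ?m .\n{patterns}\n}}\n"
            f"GROUP BY {vars_}")

q = rollup_to_sparql("sales", ["year", "region"])
```

The paper's point is that running such generated queries directly on a general-purpose triple store tends to lose against ROLAP SQL, which motivates materialised RDF aggregate views.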


Journal ArticleDOI
TL;DR: Semantically enriching process execution data can successfully raise analysis from the syntactic to the semantic level, and enable multiple perspectives of analysis on business processes.
Abstract: Purpose – The purpose of this paper is to propose a solution to the problem of a lack of machine-processable semantics in business process management. Design/methodology/approach – The paper introduces a methodology that combines domain and company‐specific ontologies and databases to obtain multiple levels of abstraction for process mining and analysis. The authors evaluated this approach with a real case study from the apparel domain, using a prototype system and techniques developed in the Process Mining Framework (ProM). The results of this approach are compared with similar research. Findings – Semantically enriching process execution data can successfully raise analysis from the syntactic to the semantic level, and enable multiple perspectives of analysis on business processes. Combining this approach with complementary research in semantic business process management (SBPM) can provide results comparable to multidimensional analysis in data warehouse and online analytical processing (OLAP) technologies.


Journal Article
TL;DR: The HyPerScript transaction programming language, the main-memory indexing technique ART, which is decisive for high transaction processing performance, and HyPer’s transaction management that allows heterogeneous workloads consisting of short pre-canned transactions, OLAP-style queries, and long interactive transactions are surveyed.
Abstract: Two emerging hardware trends have re-initiated the development of in-core database systems: ever increasing main-memory capacities and vast multi-core parallel processing power. Main-memory capacities of several TB make it possible to retain all transactional data of even the largest applications in-memory on one (or a few) servers. The vast computational power, in combination with low data management overhead, yields unprecedented transaction performance, which allows transaction processing to be pushed (away from application servers) into the database server and still “leaves room” for additional query processing directly on the transactional data. Thereby, the often postulated goal of real-time business intelligence, where decision makers have access to the latest version of the transactional state, becomes feasible. In this paper we survey the HyPerScript transaction programming language, the main-memory indexing technique ART, which is decisive for high transaction processing performance, and HyPer’s transaction management that allows heterogeneous workloads consisting of short pre-canned transactions, OLAP-style queries, and long interactive transactions.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This work attempts to extend the established OLAP technology to allow multidimensional analysis of social media data by integrating text and opinion mining methods into the data warehousing system and by exploiting various knowledge discovery techniques to deal with semi-structured and unstructured data from social media.
Abstract: Social networks are platforms where millions of users interact frequently and share a variety of digital content with each other. Users express their feelings and opinions on every topic of interest. These opinions carry important value for personal, academic and commercial applications, but the volume and the speed at which they are produced make it challenging for researchers and the underlying technologies to provide useful insights into such data. We attempt to extend the established OLAP (On-line Analytical Processing) technology to allow multidimensional analysis of social media data by integrating text and opinion mining methods into the data warehousing system and by exploiting various knowledge discovery techniques to deal with semi-structured and unstructured data from social media. The capabilities of OLAP are extended by semantic enrichment of the underlying dataset to discover new measures and dimensions for building data cubes, and by supporting up-to-date analysis of evolving as well as historical social media data. The benefits of such an analysis platform are demonstrated by building a data warehouse for the Twitter social network, dynamically enriching the underlying dataset and enabling multidimensional analysis.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: ScyPer is presented, a Scale-out of the authors' HyPer main memory database system that horizontally scales out on shared-nothing hardware, aimed at sustaining the superior OLTP throughput of a single HyPer server, and providing elastic OLAP throughput by provisioning additional servers on-demand, e.g., in the Cloud.
Abstract: Ever increasing main memory sizes and the advent of multi-core parallel processing have fostered the development of in-core databases. Even the transactional data of large enterprises can be retained in-memory on a single server. Modern in-core databases like our HyPer system achieve best-of-breed OLTP throughput that is sufficient for the lion's share of applications. Remaining server resources are used for OLAP query processing on the latest transactional data, i.e., real-time business analytics. While OLTP performance of a single server is sufficient, an increasing demand for OLAP throughput can only be satisfied economically by a scale-out. In this work we present ScyPer, a Scale-out of our HyPer main memory database system that horizontally scales out on shared-nothing hardware. With ScyPer we aim at (i) sustaining the superior OLTP throughput of a single HyPer server, and (ii) providing elastic OLAP throughput by provisioning additional servers on-demand, e.g., in the Cloud.

Proceedings ArticleDOI
19 Jun 2013
TL;DR: Current state-of-the-art BI components (tools) are discussed, and hospitals' advances in their businesses through BI solutions are outlined, focusing on the inter-relationship of business needs and IT technologies.
Abstract: The healthcare environment is growing to include not only traditional information systems, but also a business intelligence platform. For executive leaders, consultants, and analysts, there is no longer a need to spend hours designing and developing typical reports or charts; the entire solution can be completed using Business Intelligence (BI) software. This paper discusses current state-of-the-art BI components (tools) and outlines hospitals' advances in their businesses through BI solutions, focusing on the inter-relationship of business needs and IT technologies. We also present a case study that illustrates the transformation of a traditional online transactional processing (OLTP) system towards an online analytical processing (OLAP) solution.

Book ChapterDOI
30 Aug 2013
TL;DR: This paper proposes comparative process mining using process cubes, and focuses on educational data, which allows for the comparison of students watching video lectures given by the first author and differences between male and female students, between different parts of the course, and between Dutch students and international students.
Abstract: Process mining techniques enable the analysis of a wide variety of processes using event data. For example, event logs can be used to automatically learn a process model (e.g., a Petri net or BPMN model). Next to the automated discovery of the real underlying process, there are process mining techniques to analyze bottlenecks, to uncover hidden inefficiencies, to check compliance, to explain deviations, to predict performance, and to guide users towards “better” processes. Dozens (if not hundreds) of process mining techniques are available and their value has been proven in many case studies. However, existing techniques focus on the analysis of a single process rather than the comparison of different processes. In this paper, we propose comparative process mining using process cubes. An event has attributes referring to the dimensions of the process cube. Through slicing, dicing, rolling-up, and drilling-down we can view event data from different angles and produce process mining results that can be compared. To illustrate the process cube concept, we focus on educational data. In particular, we analyze data of students watching video lectures given by the first author. The dimensions of the process cube allow us to compare the process of students that passed the course versus the process of students that failed. We can also analyze differences between male and female students, between different parts of the course, and between Dutch students and international students. The initial analysis provided in this paper is used to elicit requirements for better tool support facilitating comparative process mining.

Journal ArticleDOI
TL;DR: A multidimensional data model to integrate sentiment data extracted from opinion posts in a traditional corporate data warehouse and a new sentiment data extraction method that applies semantic annotation as a means to facilitate the integration of both types of data is presented.
Abstract: Web opinion feeds have become one of the most popular information sources users consult before buying products or contracting services. Negative opinions about a product can have a high impact in its sales figures. As a consequence, companies are more and more concerned about how to integrate opinion data in their business intelligence models so that they can predict sales figures or define new strategic goals. After analysing the requirements of this new application, this paper proposes a multidimensional data model to integrate sentiment data extracted from opinion posts in a traditional corporate data warehouse. Then, a new sentiment data extraction method that applies semantic annotation as a means to facilitate the integration of both types of data is presented. In this method, Wikipedia is used as the main knowledge resource, together with some well-known lexicons of opinion words and other corporate data and metadata stores describing the company products like, for example, technical specifications and user manuals. The resulting information system allows users to perform new analysis tasks by using the traditional OLAP-based data warehouse operators. We have developed a case study over a set of real opinions about digital devices which are offered by a wholesale dealer. Over this case study, the quality of the extracted sentiment data is evaluated, and some query examples that illustrate the potential uses of the integrated model are provided.

Proceedings ArticleDOI
28 Oct 2013
TL;DR: A contextual text cube model denoted CXT-Cube is proposed which considers several contextual factors during the OLAP analysis in order to better consider the contextual information associated with textual data.
Abstract: Traditional data warehousing technologies and On-Line Analytical Processing (OLAP) are unable to analyze textual data. Moreover, as the OLAP queries of a decision-maker are generally related to a context, contextual information must be taken into account during the exploitation of data warehouses. Thus, we propose a contextual text cube model denoted CXT-Cube, which takes several contextual factors into account during OLAP analysis in order to better exploit the contextual information associated with textual data. CXT-Cube is characterized by several contextual dimensions, each one related to a contextual factor. In addition, we extend our aggregation OLAP operator for textual data, ORank (OLAP-Rank), to consider all the contextual factors defined in our CXT-Cube model. To validate our model, we perform an experimental study, and the preliminary results show the importance of our approach for integrating textual data into a data warehouse and improving decision-making.

Book ChapterDOI
02 Apr 2013
TL;DR: A text cube approach to studying different kinds of human, social and cultural behavior (HSCB) embedded in the Twitter stream is discussed, including public sentiment in a U.S. city and political sentiment in the Arab Spring.
Abstract: Twitter is a microblogging website that has been useful as a source for human social behavioral analysis, such as political sentiment analysis, user influence, and spread of news. In this paper, we discuss a text cube approach to studying different kinds of human, social and cultural behavior (HSCB) embedded in the Twitter stream. Text cube is a new way to organize data (e.g., Twitter text) in multiple dimensions and multiple hierarchies for efficient information query and visualization. With the HSCB measures defined in a cube, users are able to view statistical reports and perform online analytical processing. Along with viewing and analyzing Twitter text using cubes and charts, we have also added the capability to display the contents of the cube on a heat map. The degree of opacity is directly proportional to the value of the behavioral, social or cultural measure. This kind of map allows the analyst to focus attention on hotspots of concern in a region of interest. In addition, the text cube architecture supports the development of data mining models using the data taken from cubes. We provide several case studies to illustrate the text cube approach, including public sentiment in a U.S. city and political sentiment in the Arab Spring.
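The heat-map rule described, opacity directly proportional to the measure, can be sketched with simple min-max scaling. The `floor` parameter is our assumption, added so that low-valued cells stay faintly visible; the paper does not specify the scaling:

```python
def opacities(measures, floor=0.1):
    """Map cube-cell measure values to opacities in [floor, 1],
    proportional to the value (min-max scaling; `floor` is our choice)."""
    lo, hi = min(measures), max(measures)
    if hi == lo:
        return [1.0] * len(measures)
    return [floor + (1 - floor) * (m - lo) / (hi - lo) for m in measures]

vals = opacities([0, 5, 10])
```

On a map, each region's opacity would come from the sentiment or behavioral measure of its corresponding text-cube cell, so hotspots stand out.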

01 Jan 2013
TL;DR: This demo shows that ScyPer achieves a near-linear scale-out of OLAP query throughput with the number of active nodes, sustains a constant OLTP throughput, and offers real-time analytical capabilities through market-leading query response times and periodically forked TX-consistent virtual memory snapshots with sub-second lifetime durations.
Abstract: ScyPer is an abbreviation for Scaled-out HyPer, a version of the HyPer main memory hybrid OLTP&OLAP database system that horizontally scales out on shared-nothing commodity hardware. Our demo shows that ScyPer a) achieves a near-linear scale-out of OLAP query throughput with the number of active nodes, b) sustains a constant OLTP throughput, c) is resilient to node failures, and d) offers real-time analytical capabilities through market-leading query response times and periodically forked TX-consistent virtual memory snapshots with sub-second lifetime durations.

Proceedings ArticleDOI
18 Mar 2013
TL;DR: Results are presented for an automatic design tool that is aimed at column-oriented DBMSes on OLAP workloads and the key problem is selecting proper sort orders and compression schemes for the columns as well as appropriate pre-join views.
Abstract: Good database design is typically a very difficult and costly process. As database systems get more complex and as the amount of data under management grows, the stakes increase accordingly. Past research produced a number of design tools capable of automatically selecting secondary indexes and materialized views for a known workload. However, a significant bulk of research on automated database design has been done in the context of row-store DBMSes. While this work has produced effective design tools, new specialized database architectures demand a rethinking of automated design algorithms. In this paper, we present results for an automatic design tool that is aimed at column-oriented DBMSes on OLAP workloads. In particular, we have chosen a commercial column store DBMS that supports data sorting. In this setting, the key problem is selecting proper sort orders and compression schemes for the columns as well as appropriate pre-join views. This paper describes our automatic design algorithms as well as the results of some experiments using it on realistic data sets.
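Why sort-order selection matters for column-store compression can be illustrated with run counts under run-length encoding, one of the compression schemes such a design tool must weigh. This is a toy illustration, not the tool's actual cost model:

```python
def rle_runs(column):
    """Number of runs under run-length encoding: fewer runs means the
    column compresses better, which is why sort order matters."""
    return sum(1 for i, v in enumerate(column) if i == 0 or v != column[i - 1])

col = ["b", "a", "b", "a"]
unsorted_runs = rle_runs(col)        # one run per value here
sorted_runs = rle_runs(sorted(col))  # equal values now adjacent
```

A design tool generalizes this trade-off: a sort order that compresses one column well may scatter another, so orders, compression schemes, and pre-join views must be chosen jointly against the workload.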

Book ChapterDOI
26 Aug 2013
TL;DR: This paper proposes a framework to predict the most likely next query and recommend this to the user based on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric.
Abstract: In Business Intelligence systems, users interact with data warehouses by formulating OLAP queries aimed at exploring multidimensional data cubes. Being able to predict the most likely next queries would provide a way to recommend interesting queries to users on the one hand, and could improve the efficiency of OLAP sessions on the other. In particular, query recommendation would proactively guide users in data exploration and improve the quality of their interactive experience. In this paper, we propose a framework to predict the most likely next query and recommend this to the user. Our framework relies on a probabilistic user behavior model built by analyzing previous OLAP sessions and exploiting a query similarity metric. To gain insight in the recommendation precision and on what parameters it depends, we evaluate our approach using different quality assessments.
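A first-order probabilistic user behavior model of the kind such a framework builds can be sketched as transition counts over past sessions. The session data is hypothetical, and the paper's query-similarity metric and quality assessments are omitted:

```python
from collections import Counter, defaultdict

# Hypothetical past OLAP sessions, each a sequence of query identifiers.
sessions = [["q1", "q2", "q3"], ["q1", "q2", "q4"], ["q2", "q3"]]

# Count observed transitions between consecutive queries.
transitions = defaultdict(Counter)
for s in sessions:
    for cur, nxt in zip(s, s[1:]):
        transitions[cur][nxt] += 1

def recommend(query):
    """Most likely next query after `query`, or None if never observed."""
    nxt = transitions.get(query)
    return nxt.most_common(1)[0][0] if nxt else None

best = recommend("q2")
```

The full framework goes further, smoothing these estimates with a similarity metric so that queries never seen verbatim can still borrow statistics from similar ones.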

Proceedings ArticleDOI
28 Oct 2013
TL;DR: ProtOLAP is proposed, a tool-assisted fast prototyping methodology that enables quick and reliable test and validation of data warehouse schemata in situations where data supply is collected on users' demand and users' ICT skills are minimal.
Abstract: The approaches to data warehouse design are based on the assumption that source data are known in advance and available. While this assumption is true in common project situations, in some peculiar contexts it is not. This is the case of the French national project for analysis of energetic agricultural farms, that is the case study of this paper. Here, the above-mentioned methods can hardly be applied because source data can only be identified and collected once user requirements indicate a need. Besides, the users involved in this project found it very hard to express their analysis needs in abstract terms, i.e., without visualizing sample results of queries, which in turn would require availability of source data. To solve this deadlock we propose ProtOLAP, a tool-assisted fast prototyping methodology that enables quick and reliable test and validation of data warehouse schemata in situations where data supply is collected on users' demand and users' ICT skills are minimal. To this end, users manually feed sample realistic data into a prototype created by designers, then they access and explore these sample data using pivot tables to validate the prototype.

Proceedings ArticleDOI
23 Dec 2013
TL;DR: CR-OLAP is introduced, a Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors.
Abstract: In contrast to queries for on-line transaction processing (OLTP) systems that typically access only a small portion of a database, OLAP queries may need to aggregate large portions of a database, which often leads to performance issues. In this paper we introduce CR-OLAP, a Cloud based Real-time OLAP system based on a new distributed index structure for OLAP, the distributed PDCR tree, that utilizes a cloud infrastructure consisting of (m + 1) multi-core processors. With increasing database size, CR-OLAP dynamically increases m to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies which are so central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as “report the total sales in all stores located in California and New York during the months February-May of all years”. We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with increasing number of processors, even for complex queries. For example, on an Amazon EC2 cloud instance with eight processors, for a TPC-DS OLAP query stream on a data warehouse with 80 million tuples where every OLAP query aggregates more than 50% of the database, CR-OLAP achieved a query latency of 0.3 seconds, which can be considered a real-time response.
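The semantics of the example query, aggregating over selections on dimension hierarchies, can be sketched on toy facts. The distributed PDCR tree that makes such queries fast at scale is not modeled here; this only shows what the query computes:

```python
# Hypothetical fact rows mirroring "total sales in all stores located in
# California and New York during the months February-May of all years".
facts = [
    {"state": "CA", "month": 3,  "year": 2011, "sales": 100},
    {"state": "NY", "month": 2,  "year": 2012, "sales": 50},
    {"state": "TX", "month": 4,  "year": 2012, "sales": 70},
    {"state": "CA", "month": 11, "year": 2012, "sales": 40},
]

def total_sales(facts, states, months):
    """Aggregate the measure over hierarchy selections; the year level is
    left unconstrained, as in the example query."""
    return sum(f["sales"] for f in facts
               if f["state"] in states and f["month"] in months)

t = total_sales(facts, {"CA", "NY"}, set(range(2, 6)))
```

A naive scan like this touches every fact row; the point of the distributed PDCR tree is to prune whole subtrees of the dimension hierarchies across (m + 1) processors instead.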

Journal ArticleDOI
01 Jan 2013
TL;DR: In this paper, a multi-dimensional model for opinion mining is proposed that integrates customers' characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table before representing them along multiple dimensions: customers, products, time, and location.
Abstract: Online business, or Electronic Commerce (EC), is popular among customers today; as a result, a large number of product reviews have been posted online. This information is valuable not only to prospective customers deciding whether to buy a product, but also to companies gathering information on customer satisfaction with their products. Opinion mining captures customer reviews and separates them into subjective expressions (sentiment words) and objective expressions (no sentiment words). This paper proposes a novel multi-dimensional model for opinion mining that integrates customers’ characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table before representing them along multiple dimensions: customers, products, time, and location. Data warehouse techniques such as OLAP and data cubes are used to analyze the opinionated sentences. A comprehensive way to calculate customers’ orientation on product features and attributes is also presented.
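The orientation calculation can be illustrated with a minimal sketch, assuming each opinionated sentence has already been reduced to a (product, feature, polarity) fact; the paper's cube adds customer, time, and location dimensions on top of the same idea. All data and names here are hypothetical.

```python
# Illustrative scoring of customers' orientation on product features from
# pre-extracted sentiment facts: (product, feature, polarity in {-1, +1}).

reviews = [
    ("phone", "battery", +1),
    ("phone", "battery", -1),
    ("phone", "battery", +1),
    ("phone", "screen",  +1),
]

def orientation(facts, product, feature):
    """Mean polarity in [-1, 1] for one (product, feature) cell of the cube;
    0.0 when no opinionated sentence mentions that feature."""
    scores = [p for prod, feat, p in facts if prod == product and feat == feature]
    return sum(scores) / len(scores) if scores else 0.0
```

In the full model, the same aggregation would be sliced by the customer, time, and location dimensions, e.g. "orientation on battery life among repeat customers in Q4".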

Patent
20 Mar 2013
TL;DR: In this paper, an on-line analytical processing (OLAP) massive multidimensional data dimension storage method is presented: data are partitioned by dimension, dimension hierarchical encoding is built, and an HD File dimension storage file structure is designed so that only the data corresponding to the relevant dimensions need to be accessed for aggregation calculation, avoiding the retrieval of unrelated data.
Abstract: The invention discloses an on-line analytical processing (OLAP) dimension storage method for massive multidimensional data. First, the OLAP multidimensional data are partitioned by dimension, dimension hierarchical encoding is built, and an HD File dimension storage file structure is designed, so that only the data corresponding to the relevant dimensions need to be accessed for aggregation calculation and the retrieval of unrelated data is avoided. Second, a B+ tree index based on the dimension hierarchical encoding is built to rapidly locate the dimension storage data, saving input/output (I/O) overhead. Finally, an efficient parallel query algorithm is designed, further improving OLAP query performance. The result is an efficient, easy-to-use, and scalable dimension storage method for massive-data analysis applications such as scientific experimental statistics, environmental meteorology, and bioinformatics computing.

Proceedings Article
03 May 2013
TL;DR: The indexed table-at-a-time processing model allows the efficient construction of composed operators like the multi-way select-join-group and speeds up the processing of complex OLAP queries, so that the approach outperforms state-of-the-art in-memory databases.
Abstract: Modern database systems have to process huge amounts of data while providing results with low latency. To achieve this, data is nowadays typically held entirely in main memory, to benefit from its high bandwidth and low access latency, which could never be reached with disks. Current in-memory databases are usually column stores that exchange columns or vectors between operators and suffer from high tuple reconstruction overhead. In this paper, we present the indexed table-at-a-time processing model, which makes indexes first-class citizens of the database system. The processing model comprises the concepts of intermediate indexed tables and cooperative operators, which make indexes the common data exchange format between plan operators. To keep the intermediate index materialization costs low, we employ optimized prefix trees that offer balanced read/write performance. The indexed table-at-a-time processing model allows the efficient construction of composed operators like the multi-way select-join-group. Such operators speed up the processing of complex OLAP queries, so that our approach outperforms state-of-the-art in-memory databases.
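The "indexes as the data exchange format" idea can be sketched as follows. A plain dict stands in for the paper's optimized prefix tree, and the operator names are invented for illustration: the select operator hands its result downstream already indexed, so the cooperative group operator probes the index directly instead of rebuilding a hash table.

```python
# Sketch of cooperative operators exchanging an intermediate *indexed*
# table: the filter emits rows pre-indexed on the grouping column, and the
# aggregation consumes that index without any re-partitioning step.

orders = [  # (customer_id, amount); values are illustrative
    (1, 10.0), (2, 5.0), (1, 7.5), (3, 3.0),
]

def select_and_index(rows, min_amount):
    """Filter rows and hand them downstream already indexed by customer_id."""
    index = {}
    for cid, amount in rows:
        if amount >= min_amount:
            index.setdefault(cid, []).append(amount)
    return index

def group_sum(indexed):
    """Cooperative operator: consumes the index produced upstream."""
    return {cid: sum(amounts) for cid, amounts in indexed.items()}

totals = group_sum(select_and_index(orders, min_amount=5.0))  # {1: 17.5, 2: 5.0}
```

A real implementation would fuse further steps (e.g. the join of a multi-way select-join-group) against the same intermediate index rather than materializing each stage.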

Proceedings ArticleDOI
11 Aug 2013
TL;DR: The proposed EventCube demo will show the power of the system not only on the originally targeted ASRS (Aviation Safety Reporting System) data sets, but also on news datasets collected from multiple news agencies and academic datasets constructed from DBLP and web data.
Abstract: A large portion of real-world data is either text or structured (e.g., relational) data. Moreover, such data objects are often linked together (e.g., structured specifications of products linked with the corresponding product descriptions and customer comments). Even for text data such as news, typed entities can be extracted with entity extraction tools. The EventCube project constructs TextCube and TopicCube from interconnected structured and text data (or from text data via entity extraction and dimension building), and performs multidimensional search and analysis on such datasets in an informative, powerful, and user-friendly manner. The proposed EventCube demo will show the power of the system not only on the originally targeted ASRS (Aviation Safety Reporting System) data sets, but also on news datasets collected from multiple news agencies and academic datasets constructed from DBLP and web data. The system has high potential to be extended in many powerful ways and to serve as a general platform for search, OLAP (online analytical processing), and data mining on integrated text and structured data. After the system demo at the conference, the system will be put on the web for public access and evaluation.
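A text cube of this kind can be sketched minimally as cells addressed by tuples of dimension values, each holding a term-frequency measure that rolls up along any dimension. This is an illustrative simplification of the TextCube idea, with made-up documents and only two dimensions (source, year), not EventCube's actual data model.

```python
# Minimal text-cube sketch: each cell is keyed by a dimension tuple and
# stores term frequencies, so multidimensional search can roll term counts
# up along any dimension.

from collections import Counter

docs = [  # (source, year, text); contents are illustrative
    ("ASRS", 2011, "runway incursion at night"),
    ("ASRS", 2012, "runway excursion in rain"),
    ("news", 2012, "weather delay and rain"),
]

def build_cube(docs):
    cube = {}
    for source, year, text in docs:
        cell = cube.setdefault((source, year), Counter())
        cell.update(text.split())
    return cube

def rollup(cube, source=None):
    """Aggregate term counts over all cells matching the given source slice;
    source=None rolls up over everything."""
    out = Counter()
    for (src, _year), cell in cube.items():
        if source is None or src == source:
            out += cell
    return out

asrs_terms = rollup(build_cube(docs), source="ASRS")  # "runway" counted twice
```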

Book ChapterDOI
01 Jan 2013
TL;DR: The prototypical implementation of the DMIS, focusing on the data warehouse and OLAP functionalities for customer feedback processes, proved the suitability and effectiveness of the proposed overall architecture.
Abstract: Information and communication technologies (ICTs) play a crucial role in increasing the knowledge base of destination stakeholders. Organisational learning and managerial effectiveness can particularly be enhanced by applying methods of business intelligence (BI). Although huge amounts of data are available in tourism destinations, these valuable knowledge sources typically remain unused. The described problem is solved by conceptualizing, prototypically implementing, and testing a novel destination management information system (DMIS) that applies methods of BI and data warehousing for the leading Swedish ski destination, Åre. As a central DMIS component, the destination-wide data warehouse (DW), its underlying multi-dimensional data model, the technical architecture, and critical implementation issues are discussed. Finally, the prototypical implementation of the DMIS, focusing on the data warehouse and OLAP functionalities for customer feedback processes, proved the suitability and effectiveness of the proposed overall architecture.