
Showing papers presented at "Business Intelligence for the Real-Time Enterprise" in 2008


Book ChapterDOI
24 Aug 2008
TL;DR: Operational reporting requires data at the level of individual transactions, which is often impractical to maintain in the data warehouse given the resulting size and update frequency, and using an ODS as the source for operational reporting exhibits an information latency similar to that of informational reporting.
Abstract: Operational reporting differs from informational reporting in that its scope is day-to-day operations, and it thus requires data at the level of individual transactions. It is often not desirable to maintain data at such a detailed level in the data warehouse, due to both the exploding size of the warehouse and the update frequency required for operational reports. Using an ODS as the source for operational reporting exhibits a similar information latency.

94 citations


Book ChapterDOI
24 Aug 2008
TL;DR: This work focuses on a different kind of business intelligence, which spontaneously correlates data from a company’s data warehouse with "external" information sources that may come from the corporate intranet, be acquired from an external vendor, or be derived from the internet.
Abstract: Traditional business intelligence has focused on creating dimensional models and data warehouses where, after a high modeling and creation cost, structurally similar queries are processed on a regular basis. So-called "ad-hoc" queries aggregate data from one or several dimensional models, but fail to incorporate other external information that is not considered in the pre-defined data model. We focus on a different kind of business intelligence, which spontaneously correlates data from a company’s data warehouse with "external" information sources that may come from the corporate intranet, be acquired from an external vendor, or be derived from the internet. Such situational applications are usually short-lived programs created for a small group of users with a specific business need. We will showcase the state of the art for situational applications as well as the impact of Web 2.0 on these applications. We will also present examples and research challenges that the information management research community needs to address in order to arrive at a platform for Situational Business Intelligence.
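The spontaneous correlation the abstract describes can be sketched in a few lines: join a warehouse extract with an externally sourced table at query time, without a pre-defined dimensional model. All data and names below are invented for illustration.

```python
# A minimal "situational BI" mashup: correlate warehouse facts with an
# external data source fetched ad hoc (e.g. scraped from the intranet
# or bought from a vendor). Everything here is made-up sample data.
warehouse_sales = [
    {"city": "Berlin", "revenue": 120_000},
    {"city": "Madrid", "revenue": 95_000},
]
# External feed, not part of any dimensional model in the warehouse.
external_population = {"Berlin": 3_600_000, "Madrid": 3_300_000}

# Correlate the two sources on the fly to answer a one-off question:
# revenue per capita, something the warehouse alone cannot compute.
report = [
    {
        "city": row["city"],
        "revenue_per_capita": row["revenue"] / external_population[row["city"]],
    }
    for row in warehouse_sales
    if row["city"] in external_population
]
```

The point of the sketch is that the join key and the external source are chosen at query time, which is exactly what fixed dimensional models do not accommodate.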

50 citations


Book ChapterDOI
24 Aug 2008
TL;DR: How a massively parallel system like Greenplum Database can be used for MapReduce-like data processing in addition to conventional application areas is discussed.
Abstract: In this presentation we discuss trends and challenges for data warehousing beyond conventional application areas. In particular, we discuss how a massively parallel system like Greenplum Database can be used for MapReduce-like data processing.

47 citations


Book ChapterDOI
24 Aug 2008
TL;DR: This paper takes an evolutionary approach to obtain a better understanding of the role of real-time business intelligence in the context of enterprise-wide information infrastructures and proposes a reference architecture for building a real-time business intelligence system.
Abstract: Real-time Business Intelligence has emerged as a new technology solution to provide timely, data-driven analysis of enterprise-wide data and information. Such analysis is needed for both tactical and strategic decision-making tasks within an enterprise. Unfortunately, there is no clarity about the critical technology components that distinguish a real-time business intelligence system from traditional data warehousing and business intelligence solutions. In this paper, we take an evolutionary approach to obtain a better understanding of the role of real-time business intelligence in the context of enterprise-wide information infrastructures. We then propose a reference architecture for building a real-time business intelligence system. Using this reference architecture, we identify the key research and development challenges in the areas of data-stream analysis, complex event processing, and real-time data integration that must be overcome to make real-time business intelligence a reality.

20 citations


Book ChapterDOI
24 Aug 2008
TL;DR: This paper describes how Unified Famous Objects (UFOs), a schema abstraction similar to business objects, are used as a data model, how to reason about flows of mappings over UFOs, and how to create and deploy transformations into different run-time engines.
Abstract: The Clio project at IBM Almaden investigates foundational aspects of data transformation, with particular emphasis on the design and execution of schema mappings. We now use Clio as part of a broader data-flow framework in which mappings are just one component. These data-flows express complex transformations between several source and target schemas and require multiple mappings to be specified. This paper describes research issues we have encountered as we try to create and run these mapping-based data-flows. In particular, we describe how we use Unified Famous Objects (UFOs), a schema abstraction similar to business objects, as our data model, how we reason about flows of mappings over UFOs, and how we create and deploy transformations into different run-time engines.
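Schema mappings of the kind Clio derives are commonly formalized as source-to-target tuple-generating dependencies. The relation names below are a generic illustration, not taken from the paper:

```latex
% A source-to-target tgd: every employee tuple in the source schema
% forces a department tuple (with some manager m) and a membership
% tuple in the target schema.
\forall e\, \forall d \;\Big( \mathrm{Emp}(e, d) \;\rightarrow\;
    \exists m \;\big( \mathrm{Dept}(d, m) \wedge \mathrm{WorksIn}(e, d) \big) \Big)
```

A data-flow in the framework described above would compose several such mappings, with each mapping accounting for one source/target schema pair.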

18 citations


Book ChapterDOI
24 Aug 2008
TL;DR: A novel QoS-aware Real-Time Publish-Subscribe (QRTPS) service, compatible with DDS, is proposed for distributed real-time data acquisition and implemented on the Agilor active real-time database using objects and RECA rules.
Abstract: Many complex distributed real-time applications need complicated processing and sharing of an extensive amount of data under critical timing constraints. In this paper, we present a comprehensive overview of the Data Distribution Service standard (DDS) and describe its QoS features for developing real-time applications. An overview of an active real-time database (ARTDB) named Agilor is also provided. To express QoS policies efficiently in Agilor, a Real-time ECA (RECA) rule model is presented, based on the common ECA rule model. We then propose a novel QoS-aware Real-Time Publish-Subscribe (QRTPS) service, compatible with DDS, for distributed real-time data acquisition. Furthermore, QRTPS is implemented on Agilor using objects and RECA rules. To illustrate the benefits of QRTPS for real-time data acquisition, an example application is presented.
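The flavor of an event-condition-action rule carrying a timing constraint can be sketched as follows. This is a toy illustration of the general ECA-with-QoS idea, not Agilor's actual API; every name in it is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class RtEcaRule:
    """A minimal event-condition-action rule with a deadline-style
    timing constraint, loosely in the spirit of a real-time ECA model.
    All names here are illustrative, not from the paper or Agilor."""
    event: str                          # event type the rule subscribes to
    condition: Callable[[dict], bool]   # predicate over the event payload
    action: Callable[[dict], None]      # side effect to run when it holds
    deadline_ms: float                  # QoS: max tolerated event age

    def fire(self, payload: dict) -> bool:
        # Enforce the latency QoS: discard samples older than the deadline.
        age_ms = (time.time() - payload["ts"]) * 1000.0
        if age_ms > self.deadline_ms:
            return False                # QoS violated; stale data skipped
        if self.condition(payload):
            self.action(payload)
            return True
        return False

# Usage: alert on a fresh over-threshold temperature sample.
alerts = []
rule = RtEcaRule(
    event="temperature",
    condition=lambda p: p["value"] > 100.0,
    action=lambda p: alerts.append(p["value"]),
    deadline_ms=50.0,
)
rule.fire({"ts": time.time(), "value": 120.0})        # fresh: fires
rule.fire({"ts": time.time() - 1.0, "value": 130.0})  # ~1s old: dropped
```

The deadline check is what distinguishes such a rule from a plain ECA rule: a logically matching event is still discarded if delivering it would violate the timing constraint.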

14 citations


Book ChapterDOI
24 Aug 2008
TL;DR: The problem of view selection for workloads of conjunctive queries under bag semantics is investigated, with the aim of limiting the search space of candidate viewsets, and the authors begin delineating the boundary between query workloads for which certain restricted search spaces suffice.
Abstract: In this paper, we investigate the problem of view selection for workloads of conjunctive queries under bag semantics. In particular, we aim to limit the search space of candidate viewsets. In that respect, we begin delineating the boundary between query workloads for which certain restricted search spaces suffice; they suffice in the sense that they do not compromise optimality, i.e., they contain at least one of the optimal solutions. We start with the general case, where we give a tight condition that candidate views can be required to satisfy while the search space, thus limited, still contains at least one optimal solution. Preliminary experiments show that this reduces the size of the search space significantly. Then we study special cases. We show that for chain-query workloads, taking only chain views may miss all optimal solutions, whereas if we further limit the queries to be path queries (i.e., chain queries over a single binary relation), then path views suffice. This last result shows that in the case of path queries, taking query subexpressions suffices.
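For readers unfamiliar with the terminology, the query classes involved can be illustrated in standard conjunctive-query notation; the concrete queries below are generic examples, not taken from the paper:

```latex
% A chain query joins binary relations in sequence:
Q(x, z) \;\leftarrow\; R(x, y),\; S(y, z)
% A path query is a chain query over a single binary relation E:
P(x, w) \;\leftarrow\; E(x, y),\; E(y, z),\; E(z, w)
% A path view that is a subexpression of P and can be reused
% when rewriting P (it covers the first two atoms):
V(x, z) \;\leftarrow\; E(x, y),\; E(y, z)
```

The paper's last result says that for path queries such as \(P\), restricting candidate views to subexpressions like \(V\) loses no optimal solution, whereas for general chain queries the analogous restriction to chain views can.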

2 citations


Book ChapterDOI
24 Aug 2008
TL;DR: Some issues and approaches in integrating large-scale information extraction and analytical tasks with parallel data management, especially in highly scaled-out architectures, are discussed.
Abstract: To effectively handle the scale of processing required by information extraction and analytical tasks in an era of information explosion, the key is to partition the data streams and apply computation to each partition in parallel. Even though the concept of MapReduce has been around for some time and is well known in the functional programming literature, it was Google that demonstrated that this very high-level abstraction is especially suitable for data-intensive computation and admits very high-performance implementations. If one observes the behavior of a query plan on a modern shared-nothing parallel database system such as Teradata or HP NeoView, one notices that it also offers large-scale parallel processing while maintaining the high-level abstraction of a declarative query language. The correspondence between the MapReduce parallel-processing paradigm and the paradigm of parallel query processing has been observed. In addition to integrated schema management and a declarative query language, the strengths of parallel SQL engines include workload management, richer expressive power, and richer parallel-processing patterns. Compared to the MapReduce paradigm, however, parallel query processing has focused on the native, built-in, algebraic query operators supported in the SQL language. Parallel query processing engines lack the ability to efficiently handle dynamically defined procedures. While "user-defined functions" in SQL can be used to inject dynamically defined procedures, standard SQL's support for flexible invocation of these functions, and their efficient implementation, especially in a highly scaled-out architecture, are not adequate. This paper discusses some issues and approaches in integrating large-scale information extraction and analytical tasks with parallel data management.
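The correspondence the abstract notes between the two paradigms can be made concrete with a toy word count, computed once in MapReduce style and once as a declarative GROUP BY. This is a sketch using SQLite in place of the Teradata/NeoView engines discussed; the data is invented.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

docs = ["real time data", "real time analytics", "data warehousing"]

# --- MapReduce formulation: map emits (word, 1), reduce sums per key ---
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n       # in a real system, one reducer per key
    return dict(counts)

mr_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))

# --- Equivalent declarative formulation: GROUP BY in a SQL engine ---
# A parallel SQL engine would hash-partition rows on `word`, giving the
# same shuffle-then-aggregate execution pattern as MapReduce.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [(w,) for d in docs for w in d.split()],
)
sql_counts = dict(
    conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word")
)

assert mr_counts == sql_counts  # the two paradigms compute the same result
```

What the SQL side cannot express as naturally is the case where `map_phase` is an arbitrary, dynamically defined procedure (say, an information-extraction step), which is exactly the user-defined-function gap the abstract points at.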

2 citations


Book ChapterDOI
Timothy Michael Tully
24 Aug 2008
TL;DR: It is shown that by using a combination of novel JavaScript instrumentation techniques, as well as an automated, standardized reporting system on top of a near real-time inter-colo event collection mechanism, Yahoo! is nearing its real-time reporting goals.
Abstract: Yahoo! is on track to realize its goal of real-time enterprise-level reporting. Accessing real-time reports allows executives and decision makers to program content and advertising in a way that benefits both the business and the end user. This paper describes our legacy architecture, as well as a new, low latency pipeline. In particular, we show that by using a combination of novel JavaScript instrumentation techniques, as well as an automated, standardized reporting system on top of a near real-time inter-colo event collection mechanism, Yahoo! is nearing its real-time reporting goals.