
Showing papers presented at "Business Intelligence for the Real-Time Enterprise" in 2008


Book ChapterDOI
24 Aug 2008
TL;DR: Operational reporting requires data at the level of individual transactions, which is often impractical to maintain in the data warehouse given the resulting size and update frequency, and using an ODS as the source for operational reporting exhibits an information latency similar to that of informational reporting.
Abstract: Operational reporting differs from informational reporting in that its scope is day-to-day operations, and it thus requires data at the level of individual transactions. It is often not desirable to maintain data at such a detailed level in the data warehouse, due to both the exploding size of the warehouse and the update frequency required for operational reports. Using an ODS as the source for operational reporting exhibits a similar information latency.

94 citations


Book ChapterDOI
24 Aug 2008
TL;DR: This work focuses on a different kind of business intelligence, which spontaneously correlates data from a company’s data warehouse with "external" information sources that may come from the corporate intranet, be acquired from an external vendor, or be derived from the internet.
Abstract: Traditional business intelligence has focused on creating dimensional models and data warehouses where, after a high modeling and creation cost, structurally similar queries are processed on a regular basis. So-called "ad-hoc" queries aggregate data from one or several dimensional models, but fail to incorporate other external information that is not considered in the pre-defined data model. We focus on a different kind of business intelligence, which spontaneously correlates data from a company’s data warehouse with "external" information sources that may come from the corporate intranet, be acquired from an external vendor, or be derived from the internet. Such situational applications are usually short-lived programs created for a small group of users with a specific business need. We will showcase the state of the art for situational applications as well as the impact of Web 2.0 on these applications. We will also present examples and research challenges that the information management research community needs to address in order to arrive at a platform for Situational Business Intelligence.
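The spontaneous correlation the abstract describes can be sketched in a few lines: join a warehouse extract with an externally sourced table at query time, without a pre-defined dimensional model. All data and names below are invented for illustration.

```python
# A minimal "situational BI" mashup: correlate warehouse facts with an
# external data source fetched ad hoc (e.g. scraped from the intranet
# or bought from a vendor). Everything here is made-up sample data.
warehouse_sales = [
    {"city": "Berlin", "revenue": 120_000},
    {"city": "Madrid", "revenue": 95_000},
]
# External feed, not part of any dimensional model in the warehouse.
external_population = {"Berlin": 3_600_000, "Madrid": 3_300_000}

# Correlate the two sources on the fly to answer a one-off question:
# revenue per capita, something the warehouse alone cannot compute.
report = [
    {
        "city": row["city"],
        "revenue_per_capita": row["revenue"] / external_population[row["city"]],
    }
    for row in warehouse_sales
    if row["city"] in external_population
]
```

The point of the sketch is that the join key and the external source are chosen at query time, which is exactly what fixed dimensional models do not accommodate.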

50 citations


Book ChapterDOI
24 Aug 2008
TL;DR: How a massively parallel system like Greenplum Database can be used for MapReduce-like data processing in addition to conventional application areas is discussed.
Abstract: In this presentation we discuss trends and challenges for data warehousing beyond conventional application areas. In particular, we discuss how a massively parallel system like Greenplum Database can be used for MapReduce-like data processing.

47 citations


Book ChapterDOI
24 Aug 2008
TL;DR: This paper takes an evolutionary approach to obtain a better understanding of the role of real-time business intelligence in the context of enterprise-wide information infrastructures and proposes a reference architecture for building a real-time business intelligence system.
Abstract: Real-time Business Intelligence has emerged as a new technology solution to provide timely, data-driven analysis of enterprise-wide data and information. Such analysis is needed for both tactical and strategic decision-making tasks within an enterprise. Unfortunately, there is no clarity about the critical technology components that distinguish a real-time business intelligence system from traditional data warehousing and business intelligence solutions. In this paper, we take an evolutionary approach to obtain a better understanding of the role of real-time business intelligence in the context of enterprise-wide information infrastructures. We then propose a reference architecture for building a real-time business intelligence system. Using this reference architecture, we identify the key research and development challenges in the areas of data-stream analysis, complex event processing, and real-time data integration that must be overcome to make real-time business intelligence a reality.

20 citations


Book ChapterDOI
24 Aug 2008
TL;DR: This paper describes how Unified Famous Objects (UFOs), a schema abstraction similar to business objects, are used as a data model, how to reason about flows of mappings over UFOs, and how to create and deploy transformations into different run-time engines.
Abstract: The Clio project at IBM Almaden investigates foundational aspects of data transformation, with particular emphasis on the design and execution of schema mappings. We now use Clio as part of a broader data-flow framework in which mappings are just one component. These data-flows express complex transformations between several source and target schemas and require multiple mappings to be specified. This paper describes research issues we have encountered as we try to create and run these mapping-based data-flows. In particular, we describe how we use Unified Famous Objects (UFOs), a schema abstraction similar to business objects, as our data model, how we reason about flows of mappings over UFOs, and how we create and deploy transformations into different run-time engines.
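Schema mappings of the kind Clio derives are commonly formalized as source-to-target tuple-generating dependencies. The relation names below are a generic illustration, not taken from the paper:

```latex
% A source-to-target tgd: every employee tuple in the source schema
% forces a department tuple (with some manager m) and a membership
% tuple in the target schema.
\forall e\, \forall d \;\Big( \mathrm{Emp}(e, d) \;\rightarrow\;
    \exists m \;\big( \mathrm{Dept}(d, m) \wedge \mathrm{WorksIn}(e, d) \big) \Big)
```

A data-flow in the framework described above would compose several such mappings, with each mapping accounting for one source/target schema pair.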

18 citations


Book ChapterDOI
24 Aug 2008
TL;DR: A novel QoS-aware Real-Time Publish-Subscribe (QRTPS) service, compatible with DDS, is proposed for distributed real-time data acquisition and implemented on the Agilor active real-time database using objects and RECA rules.
Abstract: Many complex distributed real-time applications need complicated processing and sharing of an extensive amount of data under critical timing constraints. In this paper, we present a comprehensive overview of the Data Distribution Service standard (DDS) and describe its QoS features for developing real-time applications. An overview of an active real-time database (ARTDB) named Agilor is also provided. To express QoS policies efficiently in Agilor, a Real-time ECA (RECA) rule model is presented, based on the common ECA rule model. We then propose a novel QoS-aware Real-Time Publish-Subscribe (QRTPS) service, compatible with DDS, for distributed real-time data acquisition. Furthermore, QRTPS is implemented on Agilor using objects and RECA rules. To illustrate the benefits of QRTPS for real-time data acquisition, an example application is presented.
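The flavor of an event-condition-action rule carrying a timing constraint can be sketched as follows. This is a toy illustration of the general ECA-with-QoS idea, not Agilor's actual API; every name in it is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class RtEcaRule:
    """A minimal event-condition-action rule with a deadline-style
    timing constraint, loosely in the spirit of a real-time ECA model.
    All names here are illustrative, not from the paper or Agilor."""
    event: str                          # event type the rule subscribes to
    condition: Callable[[dict], bool]   # predicate over the event payload
    action: Callable[[dict], None]      # side effect to run when it holds
    deadline_ms: float                  # QoS: max tolerated event age

    def fire(self, payload: dict) -> bool:
        # Enforce the latency QoS: discard samples older than the deadline.
        age_ms = (time.time() - payload["ts"]) * 1000.0
        if age_ms > self.deadline_ms:
            return False                # QoS violated; stale data skipped
        if self.condition(payload):
            self.action(payload)
            return True
        return False

# Usage: alert on a fresh over-threshold temperature sample.
alerts = []
rule = RtEcaRule(
    event="temperature",
    condition=lambda p: p["value"] > 100.0,
    action=lambda p: alerts.append(p["value"]),
    deadline_ms=50.0,
)
rule.fire({"ts": time.time(), "value": 120.0})        # fresh: fires
rule.fire({"ts": time.time() - 1.0, "value": 130.0})  # ~1s old: dropped
```

The deadline check is what distinguishes such a rule from a plain ECA rule: a logically matching event is still discarded if delivering it would violate the timing constraint.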

14 citations


Book ChapterDOI
24 Aug 2008
TL;DR: The problem of view selection for workloads of conjunctive queries under bag semantics is investigated, with the aim of limiting the search space of candidate viewsets, and the authors begin delineating the boundary between query workloads for which certain restricted search spaces suffice.
Abstract: In this paper, we investigate the problem of view selection for workloads of conjunctive queries under bag semantics. In particular, we aim to limit the search space of candidate viewsets. In that respect, we begin delineating the boundary between query workloads for which certain restricted search spaces suffice; they suffice in the sense that they do not compromise optimality, i.e., they contain at least one of the optimal solutions. We start with the general case, where we give a tight condition that candidate views can be required to satisfy while the search space, thus limited, still contains at least one optimal solution. Preliminary experiments show that this reduces the size of the search space significantly. Then we study special cases. We show that for chain-query workloads, taking only chain views may miss all optimal solutions, whereas if we further limit the queries to be path queries (i.e., chain queries over a single binary relation), then path views suffice. This last result shows that in the case of path queries, taking query subexpressions suffices.
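For readers unfamiliar with the terminology, the query classes involved can be illustrated in standard conjunctive-query notation; the concrete queries below are generic examples, not taken from the paper:

```latex
% A chain query joins binary relations in sequence:
Q(x, z) \;\leftarrow\; R(x, y),\; S(y, z)
% A path query is a chain query over a single binary relation E:
P(x, w) \;\leftarrow\; E(x, y),\; E(y, z),\; E(z, w)
% A path view that is a subexpression of P and can be reused
% when rewriting P (it covers the first two atoms):
V(x, z) \;\leftarrow\; E(x, y),\; E(y, z)
```

The paper's last result says that for path queries such as \(P\), restricting candidate views to subexpressions like \(V\) loses no optimal solution, whereas for general chain queries the analogous restriction to chain views can.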

2 citations


Book ChapterDOI
24 Aug 2008
TL;DR: Some issues and approaches in integrating large-scale information extraction and analytical tasks with parallel data management, especially in highly scaled-out architectures, are discussed.
Abstract: To effectively handle the scale of processing required by information extraction and analytical tasks in an era of information explosion, the key is to partition the data streams and apply computation to each partition in parallel. Even though the concept of MapReduce has been around for some time and is well known in the functional programming literature, it was Google that demonstrated that this very high-level abstraction is especially suitable for data-intensive computation and admits very high-performance implementations. If one observes the behavior of a query plan on a modern shared-nothing parallel database system such as Teradata or HP NeoView, one notices that it also offers large-scale parallel processing while maintaining the high-level abstraction of a declarative query language. The correspondence between the MapReduce parallel-processing paradigm and the paradigm of parallel query processing has been observed. In addition to integrated schema management and a declarative query language, the strengths of parallel SQL engines include workload management, richer expressive power, and richer parallel-processing patterns. Compared to the MapReduce paradigm, however, parallel query processing has focused on the native, built-in, algebraic query operators supported in the SQL language. Parallel query processing engines lack the ability to efficiently handle dynamically defined procedures. While "user-defined functions" in SQL can be used to inject dynamically defined procedures, standard SQL's support for flexible invocation of these functions, and their efficient implementation, especially in a highly scaled-out architecture, are not adequate. This paper discusses some issues and approaches in integrating large-scale information extraction and analytical tasks with parallel data management.
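The correspondence the abstract notes between the two paradigms can be made concrete with a toy word count, computed once in MapReduce style and once as a declarative GROUP BY. This is a sketch using SQLite in place of the Teradata/NeoView engines discussed; the data is invented.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

docs = ["real time data", "real time analytics", "data warehousing"]

# --- MapReduce formulation: map emits (word, 1), reduce sums per key ---
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n       # in a real system, one reducer per key
    return dict(counts)

mr_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))

# --- Equivalent declarative formulation: GROUP BY in a SQL engine ---
# A parallel SQL engine would hash-partition rows on `word`, giving the
# same shuffle-then-aggregate execution pattern as MapReduce.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [(w,) for d in docs for w in d.split()],
)
sql_counts = dict(
    conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word")
)

assert mr_counts == sql_counts  # the two paradigms compute the same result
```

What the SQL side cannot express as naturally is the case where `map_phase` is an arbitrary, dynamically defined procedure (say, an information-extraction step), which is exactly the user-defined-function gap the abstract points at.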

2 citations


Book ChapterDOI
Timothy Michael Tully
24 Aug 2008
TL;DR: It is shown that by using a combination of novel JavaScript instrumentation techniques, as well as an automated, standardized reporting system on top of a near real-time inter-colo event collection mechanism, Yahoo! is nearing its real-time reporting goals.
Abstract: Yahoo! is on track to realize its goal of real-time enterprise-level reporting. Accessing real-time reports allows executives and decision makers to program content and advertising in a way that benefits both the business and the end user. This paper describes our legacy architecture, as well as a new, low latency pipeline. In particular, we show that by using a combination of novel JavaScript instrumentation techniques, as well as an automated, standardized reporting system on top of a near real-time inter-colo event collection mechanism, Yahoo! is nearing its real-time reporting goals.