Conference

Statistical and Scientific Database Management 

About: Statistical and Scientific Database Management is an academic conference. It publishes mainly in the areas of query optimization and data management. Over its lifetime, the conference has published 1,177 papers, which have received 22,480 citations.


Papers
Proceedings Article
21 Jun 2004
TL;DR: The Kepler scientific workflow system provides domain scientists with an easy-to-use yet powerful system for capturing scientific workflows (SWFs), a formalization of the ad-hoc process that a scientist may go through to get from raw data to publishable results.
Abstract: Most scientists conduct analyses and run models in several different software and hardware environments, mentally coordinating the export and import of data from one environment to another. The Kepler scientific workflow system provides domain scientists with an easy-to-use yet powerful system for capturing scientific workflows (SWFs). SWFs are a formalization of the ad-hoc process that a scientist may go through to get from raw data to publishable results. Kepler attempts to streamline the workflow creation and execution process so that scientists can design, execute, monitor, re-run, and communicate analytical procedures repeatedly with minimal effort. Kepler is unique in that it seamlessly combines high-level workflow design with execution and runtime interaction, access to local and remote data, and local and remote service invocation. SWFs are superficially similar to business process workflows but have several challenges not present in the business workflow scenario. For example, they often operate on large, complex and heterogeneous data, can be computationally intensive and produce complex derived data products that may be archived for use in reparameterized runs or other workflows. Moreover, unlike business workflows, SWFs are often dataflow-oriented as witnessed by a number of recent academic systems (e.g., DiscoveryNet, Taverna and Triana) and commercial systems (Scitegic/Pipeline-Pilot, Inforsense). In a sense, SWFs are often closer to signal-processing and data streaming applications than they are to control-oriented business workflow applications.
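
A minimal sketch of the dataflow style described above (plain Python, not Kepler's actual API; the actor and pipeline names are hypothetical): each step is an independent actor that consumes the previous actor's output, so the whole analysis can be designed once and re-run end to end with one call.

    # Minimal dataflow-style workflow sketch (hypothetical; not Kepler's
    # actual API). Each "actor" is an independent step, and the pipeline
    # chains them so the analysis can be re-run, monitored, or
    # reparameterized with minimal effort.

    def acquire(source):
        # Stand-in for importing raw data from a local or remote source.
        return [float(v) for v in source]

    def clean(values):
        # Drop obviously bad records before analysis.
        return [v for v in values if v >= 0]

    def summarize(values):
        # Derive a publishable result from the cleaned data.
        return {"n": len(values), "mean": sum(values) / len(values)}

    def run_workflow(source, actors):
        # Dataflow execution: each actor consumes the previous one's output.
        data = source
        for actor in actors:
            data = actor(data)
        return data

    print(run_workflow(["1.5", "-2.0", "3.5"], [acquire, clean, summarize]))
    # -> {'n': 2, 'mean': 2.5}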

746 citations

Proceedings Article
24 Jul 2002
TL;DR: The Chimera virtual data system is developed, which combines a virtual data catalog for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database.
Abstract: A lot of scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures. We hypothesize that explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data"). To explore this idea, we have developed the Chimera virtual data system, which combines a virtual data catalog for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database. We couple the Chimera system with distributed "data grid" services to enable on-demand execution of computation schedules constructed from database queries. We have applied this system to two challenge problems, the reconstruction of simulated collision event data from a high-energy physics experiment, and searching digital sky survey data for galactic clusters, with promising results.
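
A sketch of the virtual-data idea under stated assumptions (the catalog class and method names below are hypothetical, not Chimera's actual schema or language): derived data is recorded as a procedure plus its inputs, so it can be documented for provenance and regenerated on demand.

    # Sketch of a virtual data catalog (hypothetical; not Chimera's actual
    # schema or language). Derived data is recorded as (procedure, inputs),
    # enabling provenance documentation and on-demand regeneration.

    class VirtualDataCatalog:
        def __init__(self):
            self.derivations = {}   # name -> (procedure, input names)
            self.materialized = {}  # name -> concrete value

        def put(self, name, value):
            # Register already-materialized (e.g., measured) data.
            self.materialized[name] = value

        def register(self, name, procedure, inputs=()):
            # Record how a data product is derived, without computing it.
            self.derivations[name] = (procedure, tuple(inputs))

        def get(self, name):
            # On-demand ("virtual") data: if not materialized, recursively
            # derive it from its recorded procedure and inputs.
            if name not in self.materialized:
                procedure, inputs = self.derivations[name]
                self.materialized[name] = procedure(*(self.get(i) for i in inputs))
            return self.materialized[name]

    catalog = VirtualDataCatalog()
    catalog.put("raw", [3, 1, 2])
    catalog.register("sorted", sorted, inputs=["raw"])
    catalog.register("largest", lambda xs: xs[-1], inputs=["sorted"])
    print(catalog.get("largest"))  # derives "sorted", then "largest" -> 3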

665 citations

Proceedings Article
29 Jul 2013
TL;DR: This paper describes the implementation of GPS and its novel features, presents experimental results on the performance effects of both static and dynamic graph partitioning schemes, and describes the compilation of a high-level domain-specific programming language to GPS, enabling easy expression of complex algorithms.
Abstract: GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. This paper serves the dual role of describing the GPS system, and presenting techniques and experimental results for graph partitioning in distributed graph-processing systems like GPS. GPS is similar to Google's proprietary Pregel system, with three new features: (1) an extended API to make global computations more easily expressed and more efficient; (2) a dynamic repartitioning scheme that reassigns vertices to different workers during the computation, based on messaging patterns; and (3) an optimization that distributes adjacency lists of high-degree vertices across all compute nodes to improve performance. In addition to presenting the implementation of GPS and its novel features, we also present experimental results on the performance effects of both static and dynamic graph partitioning schemes, and we describe the compilation of a high-level domain-specific programming language to GPS, enabling easy expression of complex algorithms.
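
A toy illustration of the vertex-centric, superstep-based model that GPS shares with Pregel (a single-machine Python sketch, not the GPS API; in GPS the messages would cross worker boundaries determined by the partitioning scheme): single-source shortest paths, where each vertex updates its value from incoming messages and propagates improvements.

    # Toy vertex-centric (Pregel-style) superstep loop (hypothetical
    # sketch; not the GPS API). Vertices exchange messages between
    # supersteps and stop sending once their value stops improving.

    import math

    def sssp(edges, source):
        # edges: {vertex: [(neighbor, weight), ...]}
        dist = {v: math.inf for v in edges}
        messages = {source: [0]}
        while messages:
            next_messages = {}
            for v, incoming in messages.items():
                candidate = min(incoming)
                if candidate < dist[v]:
                    dist[v] = candidate
                    # Propagate the improvement along outgoing edges; in a
                    # distributed run these messages would cross workers.
                    for neighbor, weight in edges.get(v, []):
                        next_messages.setdefault(neighbor, []).append(candidate + weight)
            messages = next_messages  # advance to the next superstep
        return dist

    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
    print(sssp(graph, "a"))  # -> {'a': 0, 'b': 1, 'c': 2}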

541 citations

Proceedings Article
01 Jul 1998
TL;DR: The objective of the Databases fOr MovINg Objects (DOMINO) project is to build an envelope containing a critical set of capabilities that are needed by moving object database applications and are lacking in existing DBMSs.
Abstract: Consider a database that represents information about moving objects and their location. For example, for a database representing the location of taxi-cabs a typical query may be: retrieve the free cabs that are currently within 1 mile of 33 N. Michigan Ave., Chicago (to pickup a customer). In the military, moving object database applications arise in the context of the digital battlefield and in the civilian industry they arise in transportation systems. Currently, moving object database applications are being developed in an ad hoc fashion. Database management system (DBMS) technology provides a potential foundation upon which to develop these applications, however DBMSs are currently not used for this purpose. The reason is that there is a critical set of capabilities that are needed by moving object database applications and are lacking in existing DBMSs. The objective of our Databases fOr MovINg Objects (DOMINO) project is to build an envelope containing these capabilities on top of existing DBMSs. We describe the problems and our proposed solutions.
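
The taxi-cab query above can be sketched as follows (hypothetical Python, not DOMINO's actual model; distances are simplified to planar Euclidean miles): each cab stores a motion function rather than a static location, so its position is computed at query time — one of the capabilities conventional DBMSs lack.

    # Sketch of a moving-object range query (hypothetical; not DOMINO's
    # actual model). Each cab stores a start position, velocity, and
    # timestamp instead of a static location, so its current position
    # is evaluated at query time.

    import math

    class Cab:
        def __init__(self, cab_id, x, y, vx, vy, t0, free=True):
            self.cab_id, self.free = cab_id, free
            self.x, self.y, self.vx, self.vy, self.t0 = x, y, vx, vy, t0

        def position_at(self, t):
            # Linear motion model: position as a function of time.
            dt = t - self.t0
            return (self.x + self.vx * dt, self.y + self.vy * dt)

    def free_cabs_within(cabs, point, radius_miles, t):
        px, py = point
        hits = []
        for cab in cabs:
            cx, cy = cab.position_at(t)
            if cab.free and math.hypot(cx - px, cy - py) <= radius_miles:
                hits.append(cab.cab_id)
        return hits

    cabs = [Cab("cab1", 0.0, 0.0, 0.1, 0.0, t0=0),   # heading east, 0.1 mi/min
            Cab("cab2", 5.0, 5.0, 0.0, 0.0, t0=0)]   # parked far away
    print(free_cabs_within(cabs, point=(0.5, 0.0), radius_miles=1.0, t=3))
    # -> ['cab1']: at t=3, cab1 is at (0.3, 0.0), within 1 mile of the point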

515 citations

Proceedings Article
11 Aug 1997
TL;DR: This paper introduces a framework for precisely specifying the context in which statistical objects are defined, using a three-step process to define normalized statistical objects.
Abstract: The summarizability of OLAP (online analytical processing) and statistical databases is an extremely important property, because violating this condition can lead to erroneous conclusions and decisions. In this paper, we explore the conditions for summarizability. We introduce a framework for precisely specifying the context in which statistical objects are defined. We use a three-step process to define normalized statistical objects. Using this framework, we identify three necessary conditions for summarizability. We provide specific tests for each of the conditions that can be verified either from semantic knowledge or by checking the statistical database itself. We also provide the reasoning for our belief that these three summarizability conditions are sufficient as well.
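
A small worked example of why violating summarizability leads to erroneous conclusions (illustrative data; the disjointness requirement shown here is one of the conditions of this kind the paper formalizes): rolling a count up over overlapping age groups double-counts patients, while a disjoint and complete grouping sums correctly.

    # Worked example of a summarizability violation (illustrative data).
    # Rolling up a patient count over overlapping age groups double-counts
    # the 30-40 band, so the "total" exceeds the true number of patients:
    # category values must partition the domain (be disjoint and complete)
    # for SUM/COUNT roll-ups to be summarizable.

    patients = [25, 32, 38, 45, 61]  # ages of five distinct patients

    overlapping = {"under 40": lambda a: a < 40,
                   "30 to 50": lambda a: 30 <= a <= 50,
                   "over 50":  lambda a: a > 50}

    disjoint = {"under 30": lambda a: a < 30,
                "30 to 50": lambda a: 30 <= a <= 50,
                "over 50":  lambda a: a > 50}

    for name, groups in [("overlapping", overlapping), ("disjoint", disjoint)]:
        counts = {g: sum(1 for a in patients if in_g(a)) for g, in_g in groups.items()}
        print(name, counts, "rolled-up total =", sum(counts.values()))
    # overlapping: total = 7, but only 5 patients exist (32 and 38 counted twice)
    # disjoint:    total = 5, matching the true patient count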

392 citations

Performance Metrics
No. of papers from the Conference in previous years
Year    Papers
2021    32
2020    32
2019    27
2018    33
2017    42
2016    28