Conference

Statistical and Scientific Database Management 

About: Statistical and Scientific Database Management is an academic conference. It publishes mainly in the areas of query optimization and data management. Over its lifetime, the conference has published 1,177 papers, which have received 22,480 citations.


Papers
Proceedings Article
21 Jun 2004
TL;DR: The Kepler scientific workflow system provides domain scientists with an easy-to-use yet powerful system for capturing scientific workflows (SWFs), a formalization of the ad-hoc process that a scientist may go through to get from raw data to publishable results.
Abstract: Most scientists conduct analyses and run models in several different software and hardware environments, mentally coordinating the export and import of data from one environment to another. The Kepler scientific workflow system provides domain scientists with an easy-to-use yet powerful system for capturing scientific workflows (SWFs). SWFs are a formalization of the ad-hoc process that a scientist may go through to get from raw data to publishable results. Kepler attempts to streamline the workflow creation and execution process so that scientists can design, execute, monitor, re-run, and communicate analytical procedures repeatedly with minimal effort. Kepler is unique in that it seamlessly combines high-level workflow design with execution and runtime interaction, access to local and remote data, and local and remote service invocation. SWFs are superficially similar to business process workflows but have several challenges not present in the business workflow scenario. For example, they often operate on large, complex and heterogeneous data, can be computationally intensive and produce complex derived data products that may be archived for use in reparameterized runs or other workflows. Moreover, unlike business workflows, SWFs are often dataflow-oriented as witnessed by a number of recent academic systems (e.g., DiscoveryNet, Taverna and Triana) and commercial systems (Scitegic/Pipeline-Pilot, Inforsense). In a sense, SWFs are often closer to signal-processing and data streaming applications than they are to control-oriented business workflow applications.
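
A minimal sketch of the dataflow style described above (plain Python, not Kepler's actual API; the actor and pipeline names are hypothetical): each step is an independent actor that consumes the previous actor's output, so the whole analysis can be designed once and re-run end to end with one call.

    # Minimal dataflow-style workflow sketch (hypothetical; not Kepler's
    # actual API). Each "actor" is an independent step, and the pipeline
    # chains them so the analysis can be re-run, monitored, or
    # reparameterized with minimal effort.

    def acquire(source):
        # Stand-in for importing raw data from a local or remote source.
        return [float(v) for v in source]

    def clean(values):
        # Drop obviously bad records before analysis.
        return [v for v in values if v >= 0]

    def summarize(values):
        # Derive a publishable result from the cleaned data.
        return {"n": len(values), "mean": sum(values) / len(values)}

    def run_workflow(source, actors):
        # Dataflow execution: each actor consumes the previous one's output.
        data = source
        for actor in actors:
            data = actor(data)
        return data

    print(run_workflow(["1.5", "-2.0", "3.5"], [acquire, clean, summarize]))
    # -> {'n': 2, 'mean': 2.5}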

746 citations

Proceedings Article
24 Jul 2002
TL;DR: The Chimera virtual data system is developed, which combines a virtual data catalog for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database.
Abstract: A lot of scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures. We hypothesize that explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data"). To explore this idea, we have developed the Chimera virtual data system, which combines a virtual data catalog for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database. We couple the Chimera system with distributed "data grid" services to enable on-demand execution of computation schedules constructed from database queries. We have applied this system to two challenge problems, the reconstruction of simulated collision event data from a high-energy physics experiment, and searching digital sky survey data for galactic clusters, with promising results.
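
A sketch of the virtual-data idea under stated assumptions (the catalog class and method names below are hypothetical, not Chimera's actual schema or language): derived data is recorded as a procedure plus its inputs, so it can be documented for provenance and regenerated on demand.

    # Sketch of a virtual data catalog (hypothetical; not Chimera's actual
    # schema or language). Derived data is recorded as (procedure, inputs),
    # enabling provenance documentation and on-demand regeneration.

    class VirtualDataCatalog:
        def __init__(self):
            self.derivations = {}   # name -> (procedure, input names)
            self.materialized = {}  # name -> concrete value

        def put(self, name, value):
            # Register already-materialized (e.g., measured) data.
            self.materialized[name] = value

        def register(self, name, procedure, inputs=()):
            # Record how a data product is derived, without computing it.
            self.derivations[name] = (procedure, tuple(inputs))

        def get(self, name):
            # On-demand ("virtual") data: if not materialized, recursively
            # derive it from its recorded procedure and inputs.
            if name not in self.materialized:
                procedure, inputs = self.derivations[name]
                self.materialized[name] = procedure(*(self.get(i) for i in inputs))
            return self.materialized[name]

    catalog = VirtualDataCatalog()
    catalog.put("raw", [3, 1, 2])
    catalog.register("sorted", sorted, inputs=["raw"])
    catalog.register("largest", lambda xs: xs[-1], inputs=["sorted"])
    print(catalog.get("largest"))  # derives "sorted", then "largest" -> 3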

665 citations

Proceedings Article
29 Jul 2013
TL;DR: This paper describes the implementation of GPS and its novel features, presents experimental results on the performance effects of both static and dynamic graph partitioning schemes, and describes the compilation of a high-level domain-specific programming language to GPS, enabling easy expression of complex algorithms.
Abstract: GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. This paper serves the dual role of describing the GPS system, and presenting techniques and experimental results for graph partitioning in distributed graph-processing systems like GPS. GPS is similar to Google's proprietary Pregel system, with three new features: (1) an extended API to make global computations more easily expressed and more efficient; (2) a dynamic repartitioning scheme that reassigns vertices to different workers during the computation, based on messaging patterns; and (3) an optimization that distributes adjacency lists of high-degree vertices across all compute nodes to improve performance. In addition to presenting the implementation of GPS and its novel features, we also present experimental results on the performance effects of both static and dynamic graph partitioning schemes, and we describe the compilation of a high-level domain-specific programming language to GPS, enabling easy expression of complex algorithms.
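
A toy illustration of the vertex-centric, superstep-based model that GPS shares with Pregel (a single-machine Python sketch, not the GPS API; in GPS the messages would cross worker boundaries determined by the partitioning scheme): single-source shortest paths, where each vertex updates its value from incoming messages and propagates improvements.

    # Toy vertex-centric (Pregel-style) superstep loop (hypothetical
    # sketch; not the GPS API). Vertices exchange messages between
    # supersteps and stop sending once their value stops improving.

    import math

    def sssp(edges, source):
        # edges: {vertex: [(neighbor, weight), ...]}
        dist = {v: math.inf for v in edges}
        messages = {source: [0]}
        while messages:
            next_messages = {}
            for v, incoming in messages.items():
                candidate = min(incoming)
                if candidate < dist[v]:
                    dist[v] = candidate
                    # Propagate the improvement along outgoing edges; in a
                    # distributed run these messages would cross workers.
                    for neighbor, weight in edges.get(v, []):
                        next_messages.setdefault(neighbor, []).append(candidate + weight)
            messages = next_messages  # advance to the next superstep
        return dist

    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
    print(sssp(graph, "a"))  # -> {'a': 0, 'b': 1, 'c': 2}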

541 citations

Proceedings Article
01 Jul 1998
TL;DR: The objective of the Databases fOr MovINg Objects (DOMINO) project is to build an envelope containing a critical set of capabilities that are needed by moving object database applications and are lacking in existing DBMSs.
Abstract: Consider a database that represents information about moving objects and their location. For example, for a database representing the location of taxi-cabs a typical query may be: retrieve the free cabs that are currently within 1 mile of 33 N. Michigan Ave., Chicago (to pickup a customer). In the military, moving object database applications arise in the context of the digital battlefield and in the civilian industry they arise in transportation systems. Currently, moving object database applications are being developed in an ad hoc fashion. Database management system (DBMS) technology provides a potential foundation upon which to develop these applications, however DBMSs are currently not used for this purpose. The reason is that there is a critical set of capabilities that are needed by moving object database applications and are lacking in existing DBMSs. The objective of our Databases fOr MovINg Objects (DOMINO) project is to build an envelope containing these capabilities on top of existing DBMSs. We describe the problems and our proposed solutions.
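
The taxi-cab query above can be sketched as follows (hypothetical Python, not DOMINO's actual model; distances are simplified to planar Euclidean miles): each cab stores a motion function rather than a static location, so its position is computed at query time — one of the capabilities conventional DBMSs lack.

    # Sketch of a moving-object range query (hypothetical; not DOMINO's
    # actual model). Each cab stores a start position, velocity, and
    # timestamp instead of a static location, so its current position
    # is evaluated at query time.

    import math

    class Cab:
        def __init__(self, cab_id, x, y, vx, vy, t0, free=True):
            self.cab_id, self.free = cab_id, free
            self.x, self.y, self.vx, self.vy, self.t0 = x, y, vx, vy, t0

        def position_at(self, t):
            # Linear motion model: position as a function of time.
            dt = t - self.t0
            return (self.x + self.vx * dt, self.y + self.vy * dt)

    def free_cabs_within(cabs, point, radius_miles, t):
        px, py = point
        hits = []
        for cab in cabs:
            cx, cy = cab.position_at(t)
            if cab.free and math.hypot(cx - px, cy - py) <= radius_miles:
                hits.append(cab.cab_id)
        return hits

    cabs = [Cab("cab1", 0.0, 0.0, 0.1, 0.0, t0=0),   # heading east, 0.1 mi/min
            Cab("cab2", 5.0, 5.0, 0.0, 0.0, t0=0)]   # parked far away
    print(free_cabs_within(cabs, point=(0.5, 0.0), radius_miles=1.0, t=3))
    # -> ['cab1']: at t=3, cab1 is at (0.3, 0.0), within 1 mile of the point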

515 citations

Proceedings Article
11 Aug 1997
TL;DR: This paper introduces a framework for precisely specifying the context in which statistical objects are defined, using a three-step process to define normalized statistical objects.
Abstract: The summarizability of OLAP (online analytical processing) and statistical databases is an extremely important property, because violating this condition can lead to erroneous conclusions and decisions. In this paper, we explore the conditions for summarizability. We introduce a framework for precisely specifying the context in which statistical objects are defined. We use a three-step process to define normalized statistical objects. Using this framework, we identify three necessary conditions for summarizability. We provide specific tests for each of the conditions that can be verified either from semantic knowledge or by checking the statistical database itself. We also provide the reasoning for our belief that these three summarizability conditions are sufficient as well.
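
A small worked example of why violating summarizability leads to erroneous conclusions (illustrative data; the disjointness requirement shown here is one of the conditions of this kind the paper formalizes): rolling a count up over overlapping age groups double-counts patients, while a disjoint and complete grouping sums correctly.

    # Worked example of a summarizability violation (illustrative data).
    # Rolling up a patient count over overlapping age groups double-counts
    # the 30-40 band, so the "total" exceeds the true number of patients:
    # category values must partition the domain (be disjoint and complete)
    # for SUM/COUNT roll-ups to be summarizable.

    patients = [25, 32, 38, 45, 61]  # ages of five distinct patients

    overlapping = {"under 40": lambda a: a < 40,
                   "30 to 50": lambda a: 30 <= a <= 50,
                   "over 50":  lambda a: a > 50}

    disjoint = {"under 30": lambda a: a < 30,
                "30 to 50": lambda a: 30 <= a <= 50,
                "over 50":  lambda a: a > 50}

    for name, groups in [("overlapping", overlapping), ("disjoint", disjoint)]:
        counts = {g: sum(1 for a in patients if in_g(a)) for g, in_g in groups.items()}
        print(name, counts, "rolled-up total =", sum(counts.values()))
    # overlapping: total = 7, but only 5 patients exist (32 and 38 counted twice)
    # disjoint:    total = 5, matching the true patient count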

392 citations

Performance Metrics
No. of papers from the Conference in previous years
Year    Papers
2021    32
2020    32
2019    27
2018    33
2017    42
2016    28