Topic

Workflow

About: Workflow is a research topic. Over its lifetime, 31,996 publications have been published within this topic, receiving 498,339 citations.

Papers

Proceedings Article

Corleone: hands-off crowdsourcing for entity matching

Chaitanya Gokhale¹, Sanjib Das¹, AnHai Doan¹, Jeffrey F. Naughton¹, Narasimhan Rampalli², Jude W. Shavlik¹, Xiaojin Zhu¹

University of Wisconsin-Madison¹, Walmart Labs²

18 Jun 2014

TL;DR: Describes Corleone, a hands-off crowdsourcing (HOC) solution for entity matching (EM) that uses the crowd in all major steps of the EM process, and discusses the implications of this work for executing crowdsourced RDBMS joins, cleaning learning models, and soliciting complex information types from crowd workers.

Abstract: Recent approaches to crowdsourcing entity matching (EM) are limited in that they crowdsource only parts of the EM workflow, requiring a developer to execute the remaining parts. Consequently, these approaches do not scale to the growing EM need at enterprises and crowdsourcing startups, and cannot handle scenarios where ordinary users (i.e., the masses) want to leverage crowdsourcing to match entities. In response, we propose the notion of hands-off crowdsourcing (HOC), which crowdsources the entire workflow of a task, thus requiring no developers. We show how HOC can represent a next logical direction for crowdsourcing research, scale up EM at enterprises and crowdsourcing startups, and open up crowdsourcing for the masses. We describe Corleone, a HOC solution for EM, which uses the crowd in all major steps of the EM process. Finally, we discuss the implications of our work to executing crowdsourced RDBMS joins, cleaning learning models, and soliciting complex information types from crowd workers.

251 citations

Journal Article

From Centralized Workflow Specification to Distributed Workflow Execution

Peter Muth, Dirk Wodtke, Jeanine Weissenfels, Angelika Kotz Dittrich¹, Gerhard Weikum

Union Bank of Switzerland¹

01 Mar 1998

TL;DR: Develops an algorithm for transforming a centralized state and activity chart into a provably equivalent partitioned one suitable for distributed execution, together with a synchronization scheme that guarantees an execution equivalent to a non-distributed one.

Abstract: Current workflow management systems fall short of supporting large-scale distributed, enterprise-wide applications. We present a scalable, rigorously founded approach to enterprise-wide workflow management, based on the distributed execution of state and activity charts. By exploiting the formal semantics of state and activity charts, we develop an algorithm for transforming a centralized state and activity chart into a provably equivalent partitioned one, suitable for distributed execution. A synchronization scheme is developed that guarantees an execution equivalent to a non-distributed one. This basic solution is further refined in order to reduce communication overhead and exploit parallelism between partitions whenever possible. The developed synchronization schemes are compared in terms of the number and size of synchronization messages.

251 citations

Journal Article

Cost optimized provisioning of elastic resources for application workflows

Eun-Kyu Byun¹, Yang-Suk Kee², Jin-Soo Kim³, Seungryoul Maeng¹

KAIST¹, Oracle Corporation², Sungkyunkwan University³

01 Oct 2011-Future Generation Computer Systems

TL;DR: This paper suggests an architecture for the automatic execution of large-scale workflow-based applications on dynamically and elastically provisioned computing resources using the core algorithm named PBTS (Partitioned Balanced Time Scheduling), which estimates the minimum number of computing hosts required to execute a workflow within a user-specified finish time.

250 citations

Journal Issue

Scientific workflow management and the Kepler system: Research Articles

Bertram Ludäscher¹, Ilkay Altintas¹, Chad Berkley², Dan Higgins², Efrat Jaeger¹, Matthew B. Jones², Edward A. Lee³, Jing Tao¹, Yang Zhao³

San Diego Supercomputer Center¹, University of California, Santa Barbara², University of California, Berkeley³

15 Aug 2006-Concurrency and Computation: Practice and Experience

TL;DR: Describes characteristics of and requirements for scientific workflows as identified in a number of application projects, along with key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research.

Abstract: Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery ‘pipelines’. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. ‘the Grid’). However, this infrastructure is only a means to an end and ideally scientists should not be too concerned with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps including computationally intensive jobs on high-performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system, currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open source project, and we always welcome related projects and new contributors to join. Copyright © 2005 John Wiley & Sons, Ltd.

250 citations

Book

The Foundations for Provenance on the Web

Luc Moreau¹

University of Southampton¹

01 Jan 2010

TL;DR: This monograph contends that provenance can and should reliably be tracked and exploited on the Web, and investigates the necessary foundations to achieve such a vision, as well as identifying an open approach and a model for provenance.

Abstract: Provenance, i.e., the origin or source of something, is becoming an important concern, since it offers the means to verify data products, to infer their quality, to analyse the processes that led to them, and to decide whether they can be trusted. For instance, provenance enables the reproducibility of scientific results; provenance is necessary to track attribution and credit in curated databases; and, it is essential for reasoners to make trust judgements about the information they use over the Semantic Web. As the Web allows information sharing, discovery, aggregation, filtering and flow in an unprecedented manner, it also becomes very difficult to identify, reliably, the original source that produced an information item on the Web. Since the emerging use of provenance in niche applications is undoubtedly demonstrating the benefits of provenance, this monograph contends that provenance can and should reliably be tracked and exploited on the Web, and investigates the necessary foundations to achieve such a vision. Multiple data sources have been used to compile the largest bibliographical database on provenance so far. This large corpus permits the analysis of emerging trends in the research community. Specifically, the CiteSpace tool identifies clusters of papers that constitute research fronts, from which characteristics are extracted to structure a foundational framework for provenance on the Web. Such an endeavour requires a multi-disciplinary approach, since it requires contributions from many computer science sub-disciplines, but also other non-technical fields given the human challenge that is anticipated. To develop such a vision, it is necessary to provide a definition of provenance that applies to the Web context. A conceptual definition of provenance is expressed in terms of processes, and is shown to generalise various definitions of provenance commonly encountered. 
Furthermore, by bringing realistic distributed systems assumptions, this definition is refined as a query over assertions made by applications. Given that the majority of work on provenance has been undertaken by the database, workflow and e-science communities, some of their work is reviewed, contrasting approaches, and focusing on important topics believed to be crucial for bringing provenance to the Web, such as abstraction, collections, storage, queries, workflow evolution, semantics and activities involving human interactions. However, provenance approaches developed in the context of databases and workflows essentially deal with closed systems. By that, it is meant that workflow or database management systems are in full control of the data they manage, and track their provenance within their own scope, but not beyond. In the context of the Web, a broader approach is required by which chunks of provenance representation can be brought together to describe the provenance of information flowing across multiple systems. For this purpose, this monograph puts forward the Open Provenance Vision, which is an approach that consists of controlled vocabulary, serialisation formats and interfaces to allow the provenance of individual systems to be expressed, connected in a coherent fashion, and queried seamlessly. In this context, the Open Provenance Model is an emerging community-driven representation of provenance, which has been actively used by some 20 teams to exchange provenance information, in line with the Open Provenance Vision. After identifying an open approach and a model for provenance, techniques to expose provenance over the Web are investigated. In particular, Semantic Web technologies are discussed since they have been successfully exploited to express, query and reason over provenance. 
Symmetrically, Semantic Web technologies such as RDF, underpinning the Linked Data effort, are analysed since they offer their own difficulties with respect to provenance. A powerful argument for provenance is that it can help make systems transparent, so that it becomes possible to determine whether a particular use of information is appropriate under a set of rules. Such capability helps make systems and information accountable. To offer accountability, provenance itself must be authentic, and rely on security approaches, which are described in the monograph. This is then followed by systems where provenance is the basis of an auditing mechanism to check past processes against rules or regulations. In practice, not all users want to check and audit provenance, instead, they may rely on measures of quality or trust; hence, emerging provenance-based approaches to compute trust and quality of data are reviewed.

248 citations

Performance Metrics

Papers: 45,561

Citations: 582,986

No. of papers in the topic in previous years
Year	Papers
2024	1
2023	4,414
2022	9,010
2021	1,461
2020	1,579
2019	1,702
