
Showing papers on "Workflow published in 2018"


Journal ArticleDOI
TL;DR: While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps, which lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.
Abstract: quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multi-threading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data. The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.

617 citations


Journal ArticleDOI
TL;DR: The basic workflow to explain how to apply machine learning technology in the networking domain is summarized and a selective survey of the latest representative advances with explanations of their design principles and benefits is provided.
Abstract: Recently, machine learning has been used in every possible field to leverage its amazing power. For a long time, networking and distributed computing systems have been the key infrastructure providing efficient computational resources for machine learning. Networking itself can also benefit from this promising technology. This article focuses on the application of machine learning in networking (MLN), which can not only help solve intractable old networking problems but also stimulate new network applications. We summarize the basic workflow to explain how to apply machine learning technology in the networking domain. Then we provide a selective survey of the latest representative advances with explanations of their design principles and benefits. These advances are divided into several network design objectives, and detailed information on how they perform in each step of the MLN workflow is presented. Finally, we shed light on the new opportunities in networking design and community building of this new interdiscipline. Our goal is to provide a broad research guideline on networking with machine learning to help motivate researchers to develop innovative algorithms, standards and frameworks.
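As a rough illustration of the basic MLN workflow summarized above (collect network data, extract features, train a model, evaluate and iterate), the following Python sketch classifies synthetic traffic flows. The flow features, labels, and data are invented placeholders for illustration, not material from the article.

```python
# Minimal sketch of the ML-for-networking (MLN) workflow described above:
# collect flow records -> extract features -> train a model -> evaluate.
# The flow features and labels here are synthetic placeholders, not from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Steps 1-2: "collected" flow records with simple per-flow features
#   [mean packet size, flow duration (s), packets per second]
n_flows = 1000
features = rng.uniform([40, 0.01, 1], [1500, 60.0, 5000], size=(n_flows, 3))
# Synthetic ground-truth labels, e.g. 0 = bulk transfer, 1 = interactive traffic
labels = (features[:, 2] > 1000).astype(int)

# Step 3: train/validate split and model fitting
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Step 4: evaluation, feeding back into feature/model redesign if accuracy is poor
print("classification accuracy:", accuracy_score(y_test, model.predict(X_test)))
```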

328 citations


Journal ArticleDOI
TL;DR: The Computational 2D Materials Database (C2DB) as discussed by the authors organises structural, thermodynamic, elastic, electronic, magnetic, and optical properties of around 1500 two-dimensional materials distributed over more than 30 different crystal structures.
Abstract: We introduce the Computational 2D Materials Database (C2DB), which organises a variety of structural, thermodynamic, elastic, electronic, magnetic, and optical properties of around 1500 two-dimensional materials distributed over more than 30 different crystal structures. Material properties are systematically calculated by state-of-the-art density functional theory and many-body perturbation theory (G$_0$W$_0$ and the Bethe-Salpeter equation for $\sim$200 materials) following a semi-automated workflow for maximal consistency and transparency. The C2DB is fully open and can be browsed online or downloaded in its entirety. In this paper, we describe the workflow behind the database, present an overview of the properties and materials currently available, and explore trends and correlations in the data. Moreover, we identify a large number of new potentially synthesisable 2D materials with interesting properties targeting applications within spintronics, (opto-)electronics, and plasmonics. The C2DB offers a comprehensive and easily accessible overview of the rapidly expanding family of 2D materials and forms an ideal platform for computational modeling and design of new 2D materials and van der Waals heterostructures.

241 citations


Journal ArticleDOI
TL;DR: The challenges of a ‘Big Data’ approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance, are assessed.
Abstract: Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a 'Big Data' approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence-only or presence-absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi-source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter- or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide. These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi-source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA-based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as approaching technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. 
Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals.
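To make one of the workflow steps above concrete, here is a minimal sketch (using invented occurrence records and a simple one-degree latitude/longitude grid, not the GEO BON infrastructure or any of the cited monitoring projects) of aggregating heterogeneous species observations onto a common spatio-temporal grid with pandas.

```python
# Illustrative sketch of one EBV workflow step: harmonizing raw occurrence
# records from different sources onto a common spatial grid and time period.
# The records and the 1-degree grid are invented for illustration only.
import pandas as pd

records = pd.DataFrame({
    "species": ["Parus major", "Parus major", "Turdus merula", "Turdus merula"],
    "lat":     [52.37, 52.91, 52.41, 55.95],
    "lon":     [4.90, 4.47, 4.84, -3.19],
    "year":    [2016, 2016, 2017, 2017],
    "source":  ["eBird", "national survey", "eBird", "citizen science"],
})

# Snap coordinates to a 1-degree grid cell (a stand-in for a common spatial resolution)
records["grid_lat"] = records["lat"].floordiv(1).astype(int)
records["grid_lon"] = records["lon"].floordiv(1).astype(int)

# Aggregate to a per-cell, per-year record count per species.
# Real EBV products would additionally correct for detection bias and sampling effort.
ebv_cube = (records
            .groupby(["species", "year", "grid_lat", "grid_lon"])
            .size()
            .rename("n_records")
            .reset_index())
print(ebv_cube)
```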

212 citations


Journal ArticleDOI
TL;DR: Statistics are a vital aspect of psychological science, but the process of analyzing, formatting and reporting results is often fastidious, time-consuming and prone to errors, resulting in frustration and aversion.
Abstract: Statistics are a vital aspect of psychological science. Unfortunately, the process of analyzing, formatting and reporting results is often fastidious, time-consuming and prone to errors, resulting in frustration and aversion. On top of that, many available tools for professionals and students are either overpriced, too complex (i.e., displaying vast amounts of raw information neither demanded nor needed by the user) or too basic (i.e., not supporting advanced statistical procedures). These factors contribute to the reproducibility crisis in psychological science (Chambers et al., 2014; Etz & Vandekerckhove, 2016; Szucs & Ioannidis, 2016).

187 citations


Journal ArticleDOI
TL;DR: The ability of the workflow to streamline retrosynthesis pathway design and its major role in reshaping the design, build, test and learn pipeline by driving the process toward the objective of optimizing bioproduction are demonstrated.

158 citations


Journal ArticleDOI
TL;DR: The experimental results show that the GA-PSO algorithm decreases the total execution time of the workflow tasks, in comparison with the GA, PSO, HSGA, WSGA, and MTCT algorithms, and reduces the execution cost.
Abstract: The cloud computing environment provides several on-demand services and resource sharing for clients. Business processes are managed using workflow technology over the cloud, which represents one of the challenges in using the resources in an efficient manner due to the dependencies between the tasks. In this paper, a Hybrid GA-PSO algorithm is proposed to allocate tasks to the resources efficiently. The Hybrid GA-PSO algorithm aims to reduce the makespan and the cost and balance the load of the dependent tasks over the heterogeneous resources in cloud computing environments. The experimental results show that the GA-PSO algorithm decreases the total execution time of the workflow tasks, in comparison with GA, PSO, HSGA, WSGA, and MTCT algorithms. Furthermore, it reduces the execution cost. In addition, it improves the load balancing of the workflow application over the available resources. Finally, the obtained results also proved that the proposed algorithm converges to optimal solutions faster and with higher quality compared to other algorithms.
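To make the task-to-VM encoding concrete, here is a minimal genetic-algorithm sketch that maps workflow tasks to VMs with makespan as the fitness. It is a simplified stand-in, not the paper's Hybrid GA-PSO: the PSO velocity update, the monetary cost term, the load-balancing objective, and task dependencies are omitted, and all task/VM figures are invented.

```python
# Simplified GA sketch for mapping workflow tasks to VMs (makespan only).
# This is not the paper's Hybrid GA-PSO: the PSO step, monetary cost and
# load-balancing terms are omitted, and all task/VM figures are invented.
import random

random.seed(0)
task_length = [8, 5, 3, 9, 6, 4, 7, 2]      # abstract work units per task
vm_speed = [1.0, 2.0, 4.0]                   # work units processed per time unit

def makespan(assignment):
    """Finish time of the busiest VM for a task->VM assignment (ignores dependencies)."""
    load = [0.0] * len(vm_speed)
    for task, vm in enumerate(assignment):
        load[vm] += task_length[task] / vm_speed[vm]
    return max(load)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(assignment, rate=0.1):
    return [random.randrange(len(vm_speed)) if random.random() < rate else vm
            for vm in assignment]

# Evolve a population of random assignments toward a low makespan.
population = [[random.randrange(len(vm_speed)) for _ in task_length] for _ in range(30)]
for _ in range(100):
    population.sort(key=makespan)
    parents = population[:10]                       # elitist selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = min(population, key=makespan)
print("best assignment:", best, "makespan:", round(makespan(best), 2))
```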

154 citations


Journal ArticleDOI
TL;DR: This work proposes a dynamic cost-effective deadline-constrained heuristic algorithm for scheduling a scientific workflow in a public cloud that aims to exploit the advantages offered by cloud computing while taking into account the virtual machine performance variability and instance acquisition delay.
Abstract: Cloud computing, a distributed computing paradigm, enables delivery of IT resources over the Internet and follows the pay-as-you-go billing model. Workflow scheduling is one of the most challenging problems in cloud computing. Although workflow scheduling on distributed systems such as grids and clusters has been extensively studied, these solutions are not viable for a cloud environment, because a cloud environment differs from other distributed environments in two major ways: on-demand resource provisioning and a pay-as-you-go pricing model. Thus, to achieve the true benefits of workflow orchestration on cloud resources, novel approaches that can capitalize on the advantages and address the challenges specific to a cloud environment need to be developed. This work proposes a dynamic cost-effective deadline-constrained heuristic algorithm for scheduling a scientific workflow in a public cloud. The proposed technique aims to exploit the advantages offered by cloud computing while taking into account the virtual machine (VM) performance variability and instance acquisition delay to identify a just-in-time schedule of a deadline-constrained scientific workflow at lesser cost. Performance evaluation on some well-known scientific workflows shows that the proposed algorithm delivers better performance in comparison to the current state-of-the-art heuristics.
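The core idea of cost-aware, deadline-constrained VM selection can be sketched as follows: pick the cheapest instance type whose predicted finish time, including an acquisition delay, still meets the task's sub-deadline. This is a minimal sketch with invented VM prices, speeds, and delay, not the paper's algorithm, which additionally handles run-time performance variability.

```python
# Sketch of cheapest-VM-that-meets-the-deadline selection for one workflow task.
# VM types, prices, speeds and the acquisition delay are invented placeholders;
# the paper's algorithm additionally handles performance variability at run time.
VM_TYPES = [
    # (name, price per hour, relative speed)
    ("small",  0.05, 1.0),
    ("medium", 0.12, 2.0),
    ("large",  0.30, 4.0),
]
ACQUISITION_DELAY_H = 0.02   # assumed VM start-up delay, in hours

def pick_vm(task_work, sub_deadline_h):
    """Cheapest VM type finishing `task_work` units before the task sub-deadline."""
    feasible = []
    for name, price, speed in VM_TYPES:
        runtime_h = ACQUISITION_DELAY_H + task_work / speed
        if runtime_h <= sub_deadline_h:
            feasible.append((price * runtime_h, name, runtime_h))
    if not feasible:
        # No type meets the deadline: fall back to the fastest one.
        name, price, speed = max(VM_TYPES, key=lambda t: t[2])
        return name, ACQUISITION_DELAY_H + task_work / speed
    cost, name, runtime_h = min(feasible)
    return name, runtime_h

print(pick_vm(task_work=3.0, sub_deadline_h=2.0))   # the cheaper "medium" type suffices
print(pick_vm(task_work=3.0, sub_deadline_h=0.8))   # needs the faster, pricier "large" type
```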

149 citations


Journal ArticleDOI
TL;DR: This review is envisioned to provide an essential framework on modeling techniques to supplement the experimental optimization process and highlight fundamental modeling strategies, considerations, and results, as well as validation techniques using experimental data.
Abstract: Next generation, additively-manufactured metallic parts will be designed with application-optimized geometry, composition, and functionality. Manufacturers and researchers have investigated various techniques for increasing the reliability of the metal-AM process to create these components; however, understanding and manipulating the complex phenomena that occur within the printed component during processing remains a formidable challenge, limiting the use of these unique design capabilities. Among various approaches, thermomechanical modeling has emerged as a technique for increasing the reliability of metal-AM processes; however, most literature is specialized and challenging to interpret for users unfamiliar with numerical modeling techniques. This review article highlights fundamental modeling strategies, considerations, and results, as well as validation techniques using experimental data. A discussion of emerging research areas where simulation will enhance the metal-AM optimization process is presented, as well as a potential modeling workflow for process optimization. This review is envisioned to provide an essential framework on modeling techniques to supplement the experimental optimization process.
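For readers unfamiliar with the modeling side, the thermal part of most thermomechanical metal-AM models is governed by transient heat conduction with a moving heat source, whose temperature history then drives the mechanical (residual stress and distortion) analysis; a standard form is

$$\rho c_p \frac{\partial T}{\partial t} = \nabla \cdot \left( k \nabla T \right) + Q(\mathbf{x}, t),$$

where $\rho$ is the density, $c_p$ the specific heat, $k$ the thermal conductivity, and $Q$ the volumetric heat input from the moving laser or electron beam.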

146 citations


Journal ArticleDOI
TL;DR: This work highlights use cases, computing systems, workflow needs, and concludes by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
Abstract: Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.

144 citations


Journal ArticleDOI
TL;DR: This work proposes pipsCloud, which combines recent cloud computing and HPC techniques to enable large-scale remote sensing (RS) data processing as on-demand real-time services, and adopts an adaptive RS data analysis workflow management system.

Journal ArticleDOI
TL;DR: A workflow and a few empirical case studies of Chinese word segmentation rules for the Conditional Random Fields model are presented, showing the potential of leveraging natural language processing and knowledge graph technologies for geoscience.

Journal ArticleDOI
TL;DR: This suite combines three well-tested components (a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines) to achieve an unprecedented level of computational reproducibility.
Abstract: Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components—a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines—to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.
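As a hedged sketch of how these components fit together, each pipeline step can be run inside a pinned container image so the analysis is re-runnable on another machine. The container images, tool names, and file paths below are hypothetical, and this is not the paper's implementation (which builds on packaged bioinformatics software plus a workflow system); it only illustrates the isolation-plus-orchestration idea, assuming Docker is installed.

```python
# Minimal sketch of orchestrating a two-step analysis where every step runs
# inside a pinned container image. Images, tools and paths are hypothetical;
# the paper's suite uses portable software packages plus a workflow system.
import os
import subprocess

STEPS = [
    # (step name, pinned container image, command inside the container)
    ("quality_control", "example.org/qc-tool:1.2.0",
     "qc-tool --in /data/reads.fastq --out /data/reads.clean.fastq"),
    ("alignment", "example.org/aligner:0.9.1",
     "aligner --ref /data/ref.fa --in /data/reads.clean.fastq --out /data/aln.bam"),
]

def run_step(name, image, command, workdir="./data"):
    """Run one pipeline step in an isolated, versioned container environment."""
    print(f"[{name}] running in {image}")
    mount = f"{os.path.abspath(workdir)}:/data"
    subprocess.run(
        ["docker", "run", "--rm", "-v", mount, image] + command.split(),
        check=True,   # fail fast so a broken step cannot silently corrupt results
    )

if __name__ == "__main__":
    for name, image, command in STEPS:
        run_step(name, image, command)
```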

Journal ArticleDOI
TL;DR: Covidence works well and is well suited for more rigorous systematic reviews, where methodology must be adhered to and documented at each stage, and Rayyan works as a viable upgrade from a workflow using only EndNote and/or Excel.
Abstract: Health sciences librarians from two institutions conducted an assessment of Covidence, a subscription-based systematic review tool, and Rayyan, a free competitor, for abilities, strengths, and limitations. Covidence mirrors the multiphase review process, including data extraction, directly in its design. Rayyan, on the other hand, does not easily mirror this process and really only aids with the reference screening phases. Rayyan takes a minimalist approach, placing more of the logistical and workflow burden on the users themselves. Many of the peripheral features (e.g., highlighting, tagging, etc.) are comparable. Covidence works well and is well suited for more rigorous systematic reviews, where methodology must be adhered to and documented at each stage. In spite of some limited functionality and clunky features, Rayyan is a good free alternative for article screening and works as a viable upgrade from a workflow using only EndNote and/or Excel.

Proceedings ArticleDOI
25 Jun 2018
TL;DR: The vision of a knowledge graph for science is proposed, a possible infrastructure for such a knowledge graph is presented, and early attempts towards an implementation of the infrastructure are described.
Abstract: The document-centric workflows in science have reached (or already exceeded) the limits of adequacy. This is emphasized by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. This presents an opportunity to rethink the dominant paradigm of document-centric scholarly information communication and transform it into knowledge-based information flows by representing and expressing information through semantically rich, interlinked knowledge graphs. At the core of knowledge-based information flows is the creation and evolution of information models that establish a common understanding of information communicated between stakeholders as well as the integration of these technologies into the infrastructure and processes of search and information exchange in the research library of the future. By integrating these models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work as information and research results can be seamlessly interlinked with each other and better matched to complex information needs. Furthermore, research results become directly comparable and easier to reuse. As our main contribution, we propose the vision of a knowledge graph for science, present a possible infrastructure for such a knowledge graph as well as our early attempts towards an implementation of the infrastructure.
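To give a concrete flavor of knowledge-based information flows, research contributions can be expressed as machine-readable triples and queried directly, making results comparable across papers. This is a toy sketch using the rdflib library with invented resources and property names; it is not the infrastructure proposed in the paper.

```python
# Toy sketch of representing scholarly information as a knowledge graph.
# Resources and property names are invented; this is not the paper's infrastructure.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/science/")
g = Graph()
g.bind("ex", EX)

# A paper, the problem it addresses, the method it uses, and a reported result
g.add((EX.paper42, EX.addressesProblem, EX.imageSegmentation))
g.add((EX.paper42, EX.usesMethod, EX.convolutionalNetwork))
g.add((EX.paper42, EX.reportsMetric, Literal("Dice = 0.91")))
g.add((EX.paper57, EX.addressesProblem, EX.imageSegmentation))
g.add((EX.paper57, EX.usesMethod, EX.randomForest))

# Papers become directly comparable: which papers address the same problem, and how?
query = """
SELECT ?paper ?method WHERE {
    ?paper ex:addressesProblem ex:imageSegmentation .
    ?paper ex:usesMethod ?method .
}"""
for paper, method in g.query(query, initNs={"ex": EX}):
    print(paper, "uses", method)
```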

Journal ArticleDOI
TL;DR: A novel service workflow reconfiguration architecture is designed to provide guidance, which ranges from monitoring to recommendations for project implementation, and experiments are conducted to demonstrate the effectiveness and efficiency of the proposed method.

Journal ArticleDOI
TL;DR: The Bsoft software package, developed over 20 years for analyzing electron micrographs, offers a full workflow for validated single particle analysis with extensive functionality, enabling customization for specific cases.
Abstract: Cryo-electron microscopy (cryoEM) is becoming popular as a tool to solve biomolecular structures with the recent availability of direct electron detectors allowing automated acquisition of high resolution data. The Bsoft software package, developed over 20 years for analyzing electron micrographs, offers a full workflow for validated single particle analysis with extensive functionality, enabling customization for specific cases. With the increasing use of cryoEM and its automation, proper validation of the results is a bigger concern. The three major validation approaches, independent data sets, resolution-limited processing, and coherence testing, can be incorporated into any Bsoft workflow. Here, the main workflow is divided into four phases: (i) micrograph preprocessing, (ii) particle picking, (iii) particle alignment and reconstruction, and (iv) interpretation. Each of these phases represents a conceptual unit that can be automated, followed by a check point to assess the results. The aim in the first three phases is to reconstruct one or more validated maps at the best resolution possible. Map interpretation then involves identification of components, segmentation, quantification, and modeling. The algorithms in Bsoft are well established, with future plans focused on ease of use, automation and institutionalizing validation.

Journal ArticleDOI
TL;DR: The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research, and the paper discusses the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors’ collaborative research.
Abstract: Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during the generation of data, the development of algorithms, and the evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS). The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research. These ideas include experimental design principles of randomization and local control as well as the principle of stability to gain reproducibility and interpretability of algorithms and data results. We discuss the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors’ collaborative research.
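As a small illustration of the "R" in PQRS (representativeness of training data), and not a method from the paper, one can compare the distribution of a feature in the training sample against the target population before trusting an algorithm trained on it; the feature and both samples below are simulated.

```python
# Toy illustration of checking representativeness of training data (the "R" in PQRS).
# The feature and the populations are simulated; this is not the paper's method.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Feature values (e.g., driving speeds, patient ages) in the deployment population
population = rng.normal(loc=50, scale=10, size=5000)
# Training data collected under different conditions (shifted distribution)
training = rng.normal(loc=58, scale=10, size=500)

stat, p_value = ks_2samp(training, population)
print(f"KS statistic = {stat:.3f}, p = {p_value:.2g}")
if p_value < 0.01:
    print("Training sample is unlikely to represent the population;"
          " re-sample or re-weight before relying on the model.")
```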

Proceedings ArticleDOI
03 Jan 2018
TL;DR: The research reveals that a tamper-proof process history for improved auditability, automation of manual process steps, and the decentralized nature of the system can be major advantages of a Blockchain solution for cross-organizational workflow management.
Abstract: Bringing Blockchain technology and business process management together, we follow the Design Science Research approach and design, implement, and evaluate a Blockchain prototype for cross-organizational workflow management together with a German bank. For the use case of a documentary letter of credit, we describe the status quo of the process, identify areas of improvement, implement a Blockchain solution, and compare both workflows. The prototype illustrates that the process, which today is paper-based and involves high manual effort, can be significantly improved. Our research reveals that a tamper-proof process history for improved auditability, automation of manual process steps, and the decentralized nature of the system can be major advantages of a Blockchain solution for cross-organizational workflow management. Further, our research provides insights into how Blockchain technology can be used for business process management in general.
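As a minimal illustration of the "tamper-proof process history" idea, each workflow step can be linked to the hash of its predecessor, so any later modification of a recorded step is detectable. This is a plain hash chain in Python, not the actual Blockchain prototype built with the bank, and the letter-of-credit steps are illustrative placeholders.

```python
# Minimal hash-chain sketch of a tamper-evident process history for a
# cross-organizational workflow. This is not the paper's prototype; the
# letter-of-credit steps below are illustrative placeholders.
import hashlib
import json
import time

def add_step(history, actor, action):
    """Append a workflow step whose hash covers the previous step's hash."""
    prev_hash = history[-1]["hash"] if history else "0" * 64
    record = {"actor": actor, "action": action,
              "timestamp": time.time(), "prev_hash": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    history.append(record)

def verify(history):
    """Recompute every hash; any tampered step breaks the chain."""
    prev_hash = "0" * 64
    for record in history:
        body = {k: v for k, v in record.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps({**body, "prev_hash": prev_hash}, sort_keys=True).encode()
        ).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

history = []
add_step(history, "importer_bank", "issue letter of credit")
add_step(history, "exporter", "present shipping documents")
add_step(history, "importer_bank", "release payment")
print("history valid:", verify(history))

history[1]["action"] = "present forged documents"   # tampering attempt
print("history valid after tampering:", verify(history))
```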

Journal ArticleDOI
TL;DR: A meta-heuristic based algorithm for workflow scheduling that considers minimization of makespan and cost and introduces a new factor called cost time equivalence to make the bi-objective optimization more realistic.

Journal ArticleDOI
TL;DR: Signac as discussed by the authors is a framework designed to assist in the integration of various specialized data formats, tools and workflows, simplifying data access and modification through a homogeneous data interface that is largely agnostic to the data source.

Journal ArticleDOI
TL;DR: This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies.
Abstract: This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies. It was developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE program. The structure of the roadmap presented here follows the “life of a paper” workflow and includes the categories Pre-submission, Submission, Production, and Publication. The roadmap is intended to be publisher-agnostic so that all publishers can use this as a starting point when implementing JDDCP-compliant data citation. Authors reading this roadmap will also better know what to expect from publishers and how to enable their own data citations to gain maximum impact, as well as complying with what will become increasingly common funder mandates on data transparency.

Journal ArticleDOI
TL;DR: A case study of a VR-integrated collaboration workflow serves as an example of how firms can overcome the challenge of benefiting from this new technology, and shows that the investment in new hardware and software and resistance to the adoption of new technologies are the main obstacles to its wide adoption.
Abstract: A successful project delivery based on building information modeling (BIM) methods depends on efficient collaboration. This relies mainly on the visualization of a BIM model, which can appear on different mediums. Visualization on mediums such as computer screens lacks some degree of immersion, which may prevent the full utilization of the model. Another problem with conventional collaboration methods, such as the BIM big room, is the need for the physical presence of participants in a room. Virtual reality, as the most immersive medium for visualizing a model, has the promise of becoming a regular part of the construction industry. The virtual presence of collaborators in a VR environment eliminates the need for their physical presence. Simulation of on-site tasks can address a number of issues during construction, such as the feasibility of operations. As consumer VR tools have only recently become available on the market, little research has been done on their actual employment in architecture, engineering and construction (AEC) practices. This paper investigates the application of a VR-based workflow in a real project. The authors collaborated with a software company to evaluate some of its advanced VR software features, such as simulation of an on-site task. A case study of a VR-integrated collaboration workflow serves as an example of how firms can overcome the challenge of benefiting from this new technology. A group of AEC professionals involved in a project were invited to take part in the experiment, utilizing their actual project BIM models. The feedback from the experiment confirmed the supposed benefits of a VR collaboration method. Although the participants of the study were from a wide range of disciplines, they could find benefits of the technology in their practice. The results also showed that an experimental method of clash detection via simulation could actually be practical. The simulation of on-site tasks and the perception of architectural spaces at a 1:1 scale are assets unique to VR application in AEC practices. Nevertheless, the study shows that the investment in new hardware and software and resistance to the adoption of new technologies are the main obstacles to its wide adoption. Further work in the computer industry is required to make these technologies more affordable.

Journal ArticleDOI
TL;DR: An integration attempt is made to provide the details of a workflow, developed and tested experimentally, for construction waste reduction at the design stage; the major finding is that the proposed workflow reduces the volume of post-optimization paneling waste by at least 2%.

Journal ArticleDOI
TL;DR: The design and implementation of the Data Mining Cloud Framework (DMCF), a data analysis system that integrates a visual workflow language and a parallel runtime with the Software-as-a-Service (SaaS) model is described.
Abstract: The extraction of useful information from data is often a complex process that can be conveniently modeled as a data analysis workflow. When very large data sets must be analyzed and/or complex data mining algorithms must be executed, data analysis workflows may take very long times to complete their execution. Therefore, efficient systems are required for the scalable execution of data analysis workflows, by exploiting the computing services of the Cloud platforms where data is increasingly being stored. The objective of the paper is to demonstrate how Cloud software technologies can be integrated to implement an effective environment for designing and executing scalable data analysis workflows. We describe the design and implementation of the Data Mining Cloud Framework (DMCF), a data analysis system that integrates a visual workflow language and a parallel runtime with the Software-as-a-Service (SaaS) model. DMCF was designed taking into account the needs of real data mining applications, with the goal of simplifying the development of data mining applications compared to generic workflow management systems that are not specifically designed for this domain. The result is a high-level environment that, through an integrated visual workflow language, minimizes the programming effort, making it easier for domain experts to use common patterns specifically designed for the development and parallel execution of data mining applications. The DMCF's visual workflow language, system architecture and runtime mechanisms are presented. We also discuss several data mining workflows developed with DMCF and the scalability obtained executing such workflows on a public Cloud.
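For a feel of how a workflow runtime parallelizes independent tasks, the generic sketch below runs each task of a small data-analysis DAG as soon as all of its dependencies have completed, using Python's concurrent.futures. The task names and work are placeholders; this is not DMCF's actual runtime or its visual language.

```python
# Generic sketch of a parallel workflow runtime: run each task as soon as its
# dependencies finish. Task names and work are placeholders, not DMCF itself.
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Workflow DAG: task -> set of tasks it depends on
DAG = {
    "load":    set(),
    "clean":   {"load"},
    "train_a": {"clean"},
    "train_b": {"clean"},
    "compare": {"train_a", "train_b"},
}

def run_task(name):
    time.sleep(0.1)          # placeholder for real data-analysis work
    return name

def execute(dag, max_workers=4):
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(dag):
            # Launch every task whose dependencies are all finished
            for task, deps in dag.items():
                if task not in done and task not in running and deps <= done:
                    running[task] = pool.submit(run_task, task)
            finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
            for fut in finished:
                task = fut.result()
                print("finished:", task)
                done.add(task)
                del running[task]
    return done

execute(DAG)
```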

Journal ArticleDOI
TL;DR: A major SimVascular (SV) release is introduced that includes a new graphical user interface (GUI) designed to improve user experience; major changes to the software platform are described and features added in this new release are outlined.
Abstract: Patient-specific simulation plays an important role in cardiovascular disease research, diagnosis, surgical planning and medical device design, as well as education in cardiovascular biomechanics. SimVascular is an open-source software package encompassing an entire cardiovascular modeling and simulation pipeline from image segmentation, three-dimensional (3D) solid modeling, and mesh generation, to patient-specific simulation and analysis. SimVascular is widely used for cardiovascular basic science and clinical research as well as education, following increased adoption by users and development of a GATEWAY web portal to facilitate educational access. Initial efforts of the project focused on replacing commercial packages with open-source alternatives and adding increased functionality for multiscale modeling, fluid-structure interaction (FSI), and solid modeling operations. In this paper, we introduce a major SimVascular (SV) release that includes a new graphical user interface (GUI) designed to improve user experience. Additional improvements include enhanced data/project management, interactive tools to facilitate user interaction, new boundary condition (BC) functionality, a plug-in mechanism to increase modularity, a new 3D segmentation tool, and new computer-aided design (CAD)-based solid modeling capabilities. Here, we focus on major changes to the software platform and outline features added in this new release. We also briefly describe our recent experiences using SimVascular in the classroom for bioengineering education.

Journal ArticleDOI
TL;DR: A case study based on real-world third-party IaaS clouds and some well-known scientific workflows shows that the proposed approach outperforms traditional approaches, especially those considering time-invariant or bounded performance only.
Abstract: Cloud computing is becoming an increasingly popular platform for the execution of scientific applications such as scientific workflows. In contrast to grids and other traditional high-performance computing systems, clouds provide a customizable infrastructure where scientific workflows can provision desired resources ahead of the execution and set up a required software environment on virtual machines (VMs). Nevertheless, various challenges, especially quality-of-service prediction and optimal scheduling, are yet to be addressed. Existing studies mainly consider workflow tasks to be executed with VMs having time-invariant, stochastic, or bounded performance and focus on minimizing workflow execution time or execution cost while meeting the quality-of-service requirements. This work considers time-varying performance and aims at minimizing the execution cost of workflows deployed on Infrastructure-as-a-Service clouds while satisfying Service-Level Agreements with users. We employ time-series-based approaches to capture dynamic performance fluctuations, feed a genetic algorithm with predicted performance of VMs, and generate schedules at run time. A case study based on real-world third-party IaaS clouds and some well-known scientific workflows shows that our proposed approach outperforms traditional approaches, especially those considering time-invariant or bounded performance only.
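A minimal sketch of the idea of feeding predicted, time-varying VM performance into schedule evaluation: a simple moving-average forecast over invented performance measurements drives a runtime and cost estimate for one task. This is only illustrative; the paper uses proper time-series models feeding a genetic algorithm.

```python
# Sketch: forecast time-varying VM performance from recent measurements and use
# the forecast to estimate the cost of running a task. Measurements are invented;
# the paper uses time-series models whose predictions feed a genetic algorithm.
def moving_average_forecast(history, window=3):
    """Predict the next performance sample as the mean of the last `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Observed relative performance of one VM over the last scheduling intervals
# (1.0 = nominal speed; values below 1.0 mean the VM is running slower).
vm_performance = [1.00, 0.95, 0.80, 0.78, 0.82]

predicted_perf = moving_average_forecast(vm_performance)
nominal_runtime_h = 2.0                      # task runtime at nominal speed
price_per_hour = 0.12                        # invented on-demand price

expected_runtime_h = nominal_runtime_h / predicted_perf
expected_cost = expected_runtime_h * price_per_hour
print(f"predicted performance: {predicted_perf:.2f}")
print(f"expected runtime: {expected_runtime_h:.2f} h, expected cost: ${expected_cost:.3f}")
```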

Journal ArticleDOI
TL;DR: This work argues that the main goal of doing visual analytics is to build a mental and/or formal model of a certain piece of reality reflected in data, and proposes a detailed conceptual framework in which the visual analytics process is considered as a goal‐oriented workflow producing a model as a result.
Abstract: To complement the currently existing definitions and conceptual frameworks of visual analytics, which focus mainly on activities performed by analysts and types of techniques they use, we attempt to define the expected results of these activities. We argue that the main goal of doing visual analytics is to build a mental and/or formal model of a certain piece of reality reflected in data. The purpose of the model may be to understand, to forecast or to control this piece of reality. Based on this model-building perspective, we propose a detailed conceptual framework in which the visual analytics process is considered as a goal-oriented workflow producing a model as a result. We demonstrate how this framework can be used for performing an analytical survey of the visual analytics research field and identifying the directions and areas where further research is needed.

Journal ArticleDOI
TL;DR: A framework is presented for leveraging the growing ubiquity of devices that can be considered part of the Internet of Things (IoT) to inform real-time decision-making on the construction site, providing a practical and sensor-agnostic implementation of operation-level decisions by utilizing IoT networks along with advancements in modeling and simulation tools.

Journal ArticleDOI
TL;DR: A new energy-aware scheduling algorithm for time-constrained workflow tasks is proposed using the DVFS method, in which the host reduces its operating frequency by using different voltage levels; the algorithm performs more efficiently when evaluated on metrics such as energy utilization, average execution time, average resource utilization and average SLA violations.
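A small sketch of the DVFS idea referenced above, with invented voltage/frequency levels and a toy energy model rather than the paper's algorithm: pick the lowest frequency level that still finishes the task before its deadline, since dynamic power grows roughly with the square of the voltage times the frequency.

```python
# Sketch of DVFS-based frequency selection for one deadline-constrained task.
# Voltage/frequency levels and the energy model (P ~ C * V^2 * f) are illustrative;
# this is not the paper's scheduling algorithm.
LEVELS = [
    # (frequency in GHz, voltage in V), sorted from slowest to fastest
    (1.0, 0.9),
    (1.6, 1.0),
    (2.4, 1.2),
]
SWITCHING_CAPACITANCE = 1.0   # arbitrary constant in this toy model

def pick_level(cycles_giga, deadline_s):
    """Lowest-energy (lowest-frequency) level that still meets the deadline."""
    for freq, volt in LEVELS:
        runtime = cycles_giga / freq
        if runtime <= deadline_s:
            energy = SWITCHING_CAPACITANCE * volt**2 * freq * runtime
            return freq, volt, runtime, energy
    return None                                    # no level meets the deadline

print(pick_level(cycles_giga=3.2, deadline_s=4.0))  # the slow, low-voltage level suffices
print(pick_level(cycles_giga=3.2, deadline_s=1.5))  # needs the fastest level
```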