
Showing papers on "Workflow published in 2021"


Journal ArticleDOI
TL;DR: It is shown how the popular workflow management system Snakemake can be used to guarantee reproducibility, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Abstract: Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
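The rule-based representation the abstract describes can be illustrated with a minimal Snakefile sketch. File names, the seqtk trimming command, and the plotting script here are hypothetical placeholders, not taken from the paper:

```python
# Minimal Snakefile sketch. Each rule declares its inputs and outputs;
# Snakemake infers the dependency DAG from them and re-runs only
# out-of-date steps.
SAMPLES = ["a", "b"]

rule all:
    input:
        "plots/summary.pdf"   # the final target pulls in everything else

rule trim:
    input:
        "raw/{sample}.fastq"
    output:
        "trimmed/{sample}.fastq"
    shell:
        "seqtk trimfq {input} > {output}"

rule plot:
    input:
        expand("trimmed/{sample}.fastq", sample=SAMPLES)
    output:
        "plots/summary.pdf"
    script:
        "scripts/plot.py"     # rule scripts may be Python or R, mixing languages
```

Because inputs and outputs are declared, `snakemake --cores 4` can parallelize independent branches, and conda or container directives (not shown) can pin tool versions for reproducibility.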

519 citations


Journal ArticleDOI
TL;DR: In the authors’ perspective, in situ monitoring of AM processes will significantly benefit from the object detection ability of ML, and data sharing of AM would enable faster adoption of ML in AM.
Abstract: Additive manufacturing (AM) or 3D printing is growing rapidly in the manufacturing industry and has gained a lot of attention from various fields owing to its ability to fabricate parts with complex features. The reliability of 3D printed parts has been the focus of researchers seeking to realize AM as an end-part production tool. Machine learning (ML) has been applied in various aspects of AM to improve the whole design and manufacturing workflow, especially in the era of Industry 4.0. In this review article, various types of ML techniques are first introduced. This is then followed by a discussion of their use in various aspects of AM such as design for 3D printing, material tuning, process optimization, in situ monitoring, cloud service, and cybersecurity. Potential applications in the biomedical, tissue engineering, and building and construction fields will be highlighted. The challenges faced by ML in AM, such as computational cost, standards for qualification and data acquisition techniques, will also be discussed. In the authors’ perspective, in situ monitoring of AM processes will significantly benefit from the object detection ability of ML. As a large data set is crucial for ML, data sharing of AM would enable faster adoption of ML in AM. Standards for the shared data are needed to facilitate easy sharing of data. The use of ML in AM will become more mature and widely adopted as better data acquisition techniques and more powerful computer chips for ML are developed.

229 citations


Journal ArticleDOI
TL;DR: The workflows designed to enable researchers to interpret data can constrain the biological questions that can be asked; five years after its first publication, the community-led anvi’o platform is maturing into an open software ecosystem that reduces these constraints in ‘omics data analyses.
Abstract: Big data abound in microbiology, but the workflows designed to enable researchers to interpret data can constrain the biological questions that can be asked. Five years after anvi’o was first published, this community-led multi-omics platform is maturing into an open software ecosystem that reduces constraints in ‘omics data analyses.

220 citations


Journal ArticleDOI
TL;DR: The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization.
Abstract: Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols has led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).

209 citations


Journal ArticleDOI
TL;DR: In this article, a workflow for preprocessing single-cell RNA-sequencing data that balances efficiency and accuracy is described, based on the kallisto and bustools programs.
Abstract: We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.

170 citations


Journal ArticleDOI
TL;DR: In this article, a global single-cell mass spectrometry-based proteomics approach is proposed for large-scale single cell analyses, which can provide insights into the molecular basis for cellular heterogeneity.
Abstract: Large-scale single-cell analyses are of fundamental importance in order to capture biological heterogeneity within complex cell systems, but have largely been limited to RNA-based technologies. Here we present a comprehensive benchmarked experimental and computational workflow, which establishes global single-cell mass spectrometry-based proteomics as a tool for large-scale single-cell analyses. By exploiting a primary leukemia model system, we demonstrate both through pre-enrichment of cell populations and through a non-enriched unbiased approach that our workflow enables the exploration of cellular heterogeneity within this aberrant developmental hierarchy. Our approach is capable of consistently quantifying ~1000 proteins per cell across thousands of individual cells using limited instrument time. Furthermore, we develop a computational workflow (SCeptre) that effectively normalizes the data, integrates available FACS data and facilitates downstream analysis. The approach presented here lays a foundation for implementing global single-cell proteomics studies across the world. Single-cell proteomics can provide insights into the molecular basis for cellular heterogeneity. Here, the authors develop a multiplexed single-cell proteomics and computational workflow, and show that their strategy captures the cellular hierarchies in an Acute Myeloid Leukemia culture model.

149 citations


Journal ArticleDOI
TL;DR: This article proposes secure AIoT for implicit group recommendations (SAIoT-GR), which introduces a collaborative Bayesian network model and a noncooperative game as its algorithms and is able to maximize the advantages of its hardware and software modules.
Abstract: The emergence of Artificial Intelligence of Things (AIoT) has provided novel insights for many social computing applications such as group recommender systems. As the distances between people have been greatly shortened, there has been more general demand for the provision of personalized services aimed at groups instead of individuals. The existing methods for capturing group-level preference features from individuals have mostly been established via aggregation and face two challenges: secure data management workflows are absent, and implicit preference feedback is ignored. To tackle these current difficulties, this paper proposes secure AIoT for implicit group recommendations (SAIoT-GR). For the hardware module, a secure IoT structure is developed as the bottom support platform. For the software module, a collaborative Bayesian network model and noncooperative game are introduced as algorithms. This secure AIoT architecture is able to maximize the advantages of the two modules. In addition, a large number of experiments are carried out to evaluate the performance of SAIoT-GR in terms of efficiency and robustness.

117 citations


Journal ArticleDOI
TL;DR: This study develops an unceRtainty-aware Online Scheduling Algorithm (ROSA) to schedule dynamic and multiple workflows with deadlines; ROSA performs better than the five compared algorithms with respect to costs, deviation, resource utilization, and fairness.
Abstract: Scheduling workflows in cloud service environments has attracted great enthusiasm, and various approaches have been reported up to now. However, these approaches often ignored the uncertainties in the scheduling environment, such as the uncertain task start/execution/finish time, the uncertain data transfer time among tasks, and the sudden arrival of new workflows. Ignoring these uncertain factors often leads to the violation of workflow deadlines and increases the service renting costs of executing workflows. This study is devoted to improving the performance of cloud service platforms by minimizing uncertainty propagation when scheduling workflow applications that have both uncertain task execution time and data transfer time. To be specific, a novel scheduling architecture is designed to control the count of workflow tasks directly waiting on each service instance (e.g., virtual machine and container). Once a task is completed, its start/execution/finish time are available, which means its uncertainties disappear and will not affect the subsequent waiting tasks on the same service instance. Thus, controlling the count of waiting tasks on service instances can prohibit the propagation of uncertainties. Based on this architecture, we develop an unceRtainty-aware Online Scheduling Algorithm (ROSA) to schedule dynamic and multiple workflows with deadlines. The proposed ROSA skillfully integrates both proactive and reactive strategies. During the execution of the generated baseline schedules, the reactive strategy in ROSA is dynamically called to produce new proactive baseline schedules for dealing with uncertainties. Then, on the basis of real-world workflow traces, five groups of simulation experiments are carried out to compare ROSA with five typical algorithms.
The comparison results reveal that ROSA performs better than the five compared algorithms with respect to costs (up to 56 percent), deviation (up to 70 percent), resource utilization (up to 37 percent), and fairness (up to 37 percent).
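The core architectural idea, capping the number of tasks directly waiting on each service instance so that a finished task's uncertainties cannot propagate to later tasks, can be sketched as follows. This is a toy illustration with invented names, not the ROSA algorithm itself:

```python
from collections import deque

class ServiceInstance:
    """Toy service instance with a cap on directly-waiting tasks."""
    def __init__(self, name, cap=2):
        self.name = name
        self.cap = cap          # max tasks allowed to wait on this instance
        self.waiting = deque()  # tasks already committed to this instance

    def can_accept(self):
        return len(self.waiting) < self.cap

def dispatch(tasks, instances):
    """Greedy sketch: assign each ready task to an instance with spare
    waiting capacity; tasks that find none stay in a global pool, so their
    start times are not yet bound to any instance's uncertain timeline."""
    pool = []
    for task in tasks:
        target = next((i for i in instances if i.can_accept()), None)
        if target is None:
            pool.append(task)   # deferred: re-dispatched on a later event
        else:
            target.waiting.append(task)
    return pool
```

With two instances of capacity 2 and five tasks, one task remains pooled, to be dispatched once an instance completes a task and its actual finish time becomes known.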

116 citations


Journal ArticleDOI
TL;DR: In this article, a cloud-edge-based dynamic reconfiguration approach for service workflows in mobile e-commerce environments is proposed, in which the value and cost attributes of a service are considered and a long short-term memory (LSTM) neural network is used to predict the stability of services.
Abstract: The emergence of mobile service composition meets the current needs for real-time eCommerce. However, the requirements for eCommerce, such as safety and timeliness, are becoming increasingly strict. Thus, the cloud-edge hybrid computing model has been introduced to accelerate information processing, especially in a mobile scenario. However, the mobile environment is characterized by limited resource storage and users who frequently move, and these characteristics strongly affect the reliability of service composition running in this environment. Consequently, applications are likely to fail if inappropriate services are invoked. To ensure that the composite service can operate normally, traditional dynamic reconfiguration methods tend to focus on cloud services scheduling. Unfortunately, most of these approaches cannot support timely responses to dynamic changes. In this article, a cloud-edge-based dynamic reconfiguration approach for service workflows in mobile eCommerce environments is proposed. First, the service quality concept is extended. Specifically, the value and cost attributes of a service are considered. The value attribute is used to assess the stability of the service for some time to come, and the cost attribute is the cost of a service invocation. Second, a long short-term memory (LSTM) neural network is used to predict the stability of services, which is related to the calculation of the value attribute. Then, in view of the limited available equipment resources, a method for calculating the cost of calling a service is introduced. Third, candidate services are selected by considering both service stability and the cost of service invocation, thus yielding a dynamic reconfiguration scheme that is more suitable for the cloud-edge environment.
Finally, a series of comparative experiments were carried out, and the experimental results prove that the method proposed in this article offers higher stability, less energy consumption, and more accurate service prediction.

93 citations


Journal ArticleDOI
TL;DR: To make machine-learning analyses in the life sciences more computationally reproducible, standards based on data, model and code publication, programming best practices and workflow automation are proposed.
Abstract: To make machine-learning analyses in the life sciences more computationally reproducible, we propose standards based on data, model and code publication, programming best practices and workflow automation. By meeting these standards, the community of researchers applying machine-learning methods in the life sciences can ensure that their analyses are worthy of trust.

88 citations


Book ChapterDOI
01 Jan 2021
TL;DR: This chapter introduces Duet, the authors' tool for easier federated learning (FL) for scientists and data owners, and provides a proof-of-concept demonstration of an FL workflow by showing how to train a convolutional neural network.
Abstract: PySyft is an open-source multi-language library enabling secure and private machine learning by wrapping and extending popular deep learning frameworks such as PyTorch in a transparent, lightweight, and user-friendly manner. Its aim is both to help popularize privacy-preserving techniques in machine learning by making them as accessible as possible via Python bindings and common tools familiar to researchers and data scientists, and to be extensible such that new Federated Learning (FL), Multi-Party Computation, or Differential Privacy methods can be flexibly and simply implemented and integrated. This chapter introduces the methods available within the PySyft library and describes their implementations. We then provide a proof-of-concept demonstration of an FL workflow using an example of how to train a convolutional neural network. Next, we review the use of PySyft in the academic literature to date and discuss future use cases and development plans. Most importantly, we introduce Duet: our tool for easier FL for scientists and data owners.

Journal ArticleDOI
TL;DR: MaxDIA as mentioned in this paper is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment, which is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries.
Abstract: MaxDIA is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment. Using spectral libraries, MaxDIA achieves deep proteome coverage with substantially better coefficients of variation in protein quantification than other software. MaxDIA is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries. This is the foundation of discovery DIA: hypothesis-free analysis of DIA samples without a library and with reliable FDR control. MaxDIA performs three- or four-dimensional feature detection of fragment data, and scoring of matches is augmented by machine learning on the features of an identification. MaxDIA's bootstrap DIA workflow performs multiple rounds of matching with increasing quality of recalibration and stringency of matching to the library. Combining MaxDIA with two new technologies, BoxCar acquisition and trapped ion mobility spectrometry, leads to deep and accurate proteome quantification.

Journal ArticleDOI
TL;DR: A principled Bayesian workflow is introduced that provides guidelines and checks for valid data analysis, avoiding overfitting complex models to noise, and capturing relevant data structure in a probabilistic model.
Abstract: Experiments in research on memory, language, and in other areas of cognitive science are increasingly being analyzed using Bayesian methods. This has been facilitated by the development of probabilistic programming languages such as Stan, and easily accessible front-end packages such as brms. The utility of Bayesian methods, however, ultimately depends on the relevance of the Bayesian model, in particular whether or not it accurately captures the structure of the data and the data analyst's domain expertise. Even with powerful software, the analyst is responsible for verifying the utility of their model. To demonstrate this point, we introduce a principled Bayesian workflow (Betancourt, 2018) to cognitive science. Using a concrete working example, we describe basic questions one should ask about the model: prior predictive checks, computational faithfulness, model sensitivity, and posterior predictive checks. The running example for demonstrating the workflow is data on reading times with a linguistic manipulation of object versus subject relative clause sentences. This principled Bayesian workflow also demonstrates how to use domain knowledge to inform prior distributions. It provides guidelines and checks for valid data analysis, avoiding overfitting complex models to noise, and capturing relevant data structure in a probabilistic model. Given the increasing use of Bayesian methods, we aim to discuss how these methods can be properly employed to obtain robust answers to scientific questions. All data and code accompanying this article are available from https://osf.io/b2vx9/. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
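A prior predictive check, the first question in such a workflow, can be sketched in a few lines: draw parameters from the priors, simulate fake reading-time data, and ask whether the simulations are plausible before seeing any real data. The lognormal model and the specific priors below are illustrative assumptions, not the article's exact choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior predictive check (illustrative priors): reading times modeled as
# lognormal, with intercept and condition-effect priors on the log-ms scale.
n_sims, n_obs = 1000, 100
alpha = rng.normal(6.0, 0.5, size=n_sims)          # prior: mean log reading time
beta = rng.normal(0.0, 0.1, size=n_sims)           # prior: relative-clause effect
sigma = np.abs(rng.normal(0.0, 0.5, size=n_sims))  # prior: residual sd

x = rng.choice([-0.5, 0.5], size=n_obs)            # sum-coded condition

# simulate one dataset per prior draw and collect a summary statistic
sim_means = np.empty(n_sims)
for s in range(n_sims):
    rt = rng.lognormal(alpha[s] + beta[s] * x, sigma[s])
    sim_means[s] = rt.mean()

# if the bulk of simulated mean reading times is wildly implausible
# (e.g., minutes long), the priors should be tightened before fitting
print(np.quantile(sim_means, [0.05, 0.5, 0.95]))
```

In practice the same check is run through brms or Stan with `sample_prior = "only"`; the numpy version above just makes the logic of the check explicit.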

Journal ArticleDOI
TL;DR: A comprehensive review was conducted on GBI-targeted studies enlisting ENVI-met as the primary tool, providing researchers with an overview of the ENVI-met methodology and recommendations to refine research on GBI thermal effects.

Journal ArticleDOI
TL;DR: Current limitations and challenges are discussed, including advances in network implementations, applications to unconventional resources, dataset acquisition and synthetic training, extrapolative potential, accuracy loss from soft computing, and the computational cost of 3D Deep Learning.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a cloud workflow scheduling approach which combines particle swarm optimization and idle time slot-aware rules, to minimize the execution cost of a workflow application under a deadline constraint.
Abstract: Workflow scheduling is a key issue and remains a challenging problem in cloud computing. Faced with the large number of virtual machine (VM) types offered by cloud providers, cloud users need to choose the most appropriate VM type for each task. Multiple task scheduling sequences exist in a workflow application. Different task scheduling sequences have a significant impact on the scheduling performance. It is not easy to determine the most appropriate set of VM types for tasks and the best task scheduling sequence. Besides, the idle time slots on VM instances should be used fully to increase resource utilization and save the execution cost of a workflow. This paper considers these three aspects simultaneously and proposes a cloud workflow scheduling approach which combines particle swarm optimization (PSO) and idle time slot-aware rules, to minimize the execution cost of a workflow application under a deadline constraint. A new particle encoding is devised to represent the VM type required by each task and the scheduling sequence of tasks. An idle time slot-aware decoding procedure is proposed to decode a particle into a scheduling solution. To handle tasks' invalid priorities caused by the randomness of PSO, a repair method is used to repair those priorities to produce valid task scheduling sequences. The proposed approach is compared with state-of-the-art cloud workflow scheduling algorithms. Experiments show that the proposed approach outperforms the comparative algorithms in terms of both the execution cost and the success rate in meeting the deadline.
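The particle encoding and repair idea can be illustrated with a small sketch: each particle carries a VM-type index and a continuous priority per task, and decoding releases only tasks whose predecessors have finished, which repairs orderings that would otherwise violate precedence. This is an illustrative simplification (no idle-time-slot reuse or cost model is shown):

```python
import numpy as np

def decode(vm_choice, priority, deps):
    """Decode a particle into a valid task order.
    vm_choice[i]: VM type index chosen for task i.
    priority[i]:  continuous priority evolved by PSO.
    deps[i]:      set of tasks that must finish before task i.
    Only tasks whose parents are done are released, which repairs
    the random orderings PSO produces into feasible sequences."""
    n = len(priority)
    done, order = set(), []
    while len(order) < n:
        ready = [i for i in range(n) if i not in done and deps[i] <= done]
        nxt = max(ready, key=lambda i: priority[i])  # highest priority first
        order.append(nxt)
        done.add(nxt)
    return [(t, vm_choice[t]) for t in order]

# toy workflow DAG: task 0 -> {1, 2} -> 3
deps = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
rng = np.random.default_rng(0)
schedule = decode(rng.integers(0, 3, size=4), rng.random(4), deps)
```

A fitness function would then price each `(task, vm_type)` pair against the deadline; here only the feasibility-preserving decode step is shown.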

Journal ArticleDOI
29 Nov 2021
TL;DR: Chatbots have the potential to be integrated into clinical practice by working alongside health practitioners to reduce costs, refine workflow efficiencies, and improve patient outcomes; further research and interdisciplinary collaboration could advance this technology to dramatically improve the quality of care for patients, rebalance the workload for clinicians, and revolutionize the practice of medicine.
Abstract: Background: Chatbots are a timely topic applied in various fields, including medicine and health care, for human-like knowledge transfer and communication. Machine learning, a subset of artificial intelligence, has proven particularly applicable in health care, with the ability for complex dialog management and conversational flexibility. Objective: This review article aims to report on the recent advances and current trends in chatbot technology in medicine. A brief historical overview, along with the developmental progress and design characteristics, is first introduced. The focus will be on cancer therapy, with in-depth discussions and examples of diagnosis, treatment, monitoring, patient support, workflow efficiency, and health promotion. In addition, this paper will explore the limitations and areas of concern, highlighting ethical, moral, security, technical, and regulatory standards and evaluation issues to explain the hesitancy in implementation. Methods: A search of the literature published in the past 20 years was conducted using the IEEE Xplore, PubMed, Web of Science, Scopus, and OVID databases. The screening of chatbots was guided by the open-access Botlist directory for health care components and further divided according to the following criteria: diagnosis, treatment, monitoring, support, workflow, and health promotion. Results: Even after addressing these issues and establishing the safety or efficacy of chatbots, human elements in health care will not be replaceable. Therefore, chatbots have the potential to be integrated into clinical practice by working alongside health practitioners to reduce costs, refine workflow efficiencies, and improve patient outcomes. Other applications in pandemic support, global health, and education are yet to be fully explored.
Conclusions: Further research and interdisciplinary collaboration could advance this technology to dramatically improve the quality of care for patients, rebalance the workload for clinicians, and revolutionize the practice of medicine.

Journal ArticleDOI
TL;DR: patRoon, as discussed by the authors, is a new R-based open-source software platform which provides comprehensive, fully tailored and straightforward non-target analysis workflows, making the use, evaluation and mixing of well-tested algorithms seamless by harmonizing various common (primarily open) software tools under a consistent interface.
Abstract: Mass spectrometry-based non-target analysis is increasingly adopted in environmental sciences to screen and identify numerous chemicals simultaneously in highly complex samples. However, current data processing software either lack functionality for environmental sciences, solve only part of the workflow, are not openly available and/or are restricted in input data formats. In this paper we present patRoon, a new R-based open-source software platform, which provides comprehensive, fully tailored and straightforward non-target analysis workflows. This platform makes the use, evaluation and mixing of well-tested algorithms seamless by harmonizing various common (primarily open) software tools under a consistent interface. In addition, patRoon offers various functionality and strategies to simplify and perform automated processing of complex (environmental) data effectively. patRoon implements several effective optimization strategies to significantly reduce computational times. The ability of patRoon to perform time-efficient and automated non-target data annotation of environmental samples is demonstrated with a simple and reproducible workflow using open-access data of spiked samples from a drinking water treatment plant study. In addition, the ability to easily use, combine and evaluate different algorithms was demonstrated for three commonly used feature finding algorithms. This article, combined with already published works, demonstrates that patRoon helps make comprehensive (environmental) non-target analysis readily accessible to a wider community of researchers.

Journal ArticleDOI
Guanjie Wang, Liyu Peng, Kaiqi Li, Linggang Zhu, Jian Zhou, Naihua Miao, Zhimei Sun
TL;DR: An open-source computational platform named ALKEMIE, an acronym for Artificial Learning and Knowledge Enhanced Materials Informatics Engineering, enables easy access to data-driven techniques for broad communities; its elaborately designed, user-friendly graphical user interface makes the workflow and dataflow maneuverable and transparent, making it easy to use for scientists with broad backgrounds.

Journal ArticleDOI
TL;DR: A digital twin-based assembly data management and process traceability approach for complex products is proposed, and the Digital Twin-based Assembly Process Management and Control System (DT-APMCS) is designed to verify the efficiency of the proposed approach.

Journal ArticleDOI
TL;DR: In this paper, the authors present a review of the main bottom-up physics-based UBEM tools, comparing them from a user-oriented perspective, focusing on the required inputs, the reported outputs, the exploited workflow, the applicability of each tool, and the potential users.
Abstract: Regulations corroborate the importance of retrofitting existing building stocks or constructing new energy-efficient districts. There is, thus, a need for modeling tools to evaluate energy scenarios to better manage and design cities, and numerous methodologies and tools have been developed. Among them, Urban Building Energy Modeling (UBEM) tools allow the energy simulation of buildings at large scales. Choosing an appropriate UBEM tool, balancing the level of complexity, accuracy, usability, and computing needs, remains a challenge for users. The review focuses on the main bottom-up physics-based UBEM tools, comparing them from a user-oriented perspective. Five categories are used: (i) the required inputs, (ii) the reported outputs, (iii) the exploited workflow, (iv) the applicability of each tool, and (v) the potential users. Moreover, a critical discussion is proposed focusing on interests and trends in research and development. The results highlighted major differences between UBEM tools that must be considered to choose the proper one for an application. Barriers to adoption of UBEM tools include the need for a standardized ontology, a common three-dimensional city model, a standard procedure to collect data, and a standard set of test cases. This feeds into future development of UBEM tools to support cities' sustainability goals.

Journal ArticleDOI
TL;DR: An online multi-workflow scheduling framework, named NOSF, is proposed to schedule deadline-constrained workflows with random arrivals and uncertain task execution time; it significantly outperforms two state-of-the-art algorithms in terms of reducing VM rental costs and deadline violation probability.
Abstract: Cloud has become an important platform for executing numerous deadline-constrained scientific applications generally represented by workflow models. It provides scientists a simple and cost-efficient method of running workflows on their rental Virtual Machines (VMs) anytime and anywhere. Since pay-as-you-go is a dominating pricing solution in clouds, extensive research efforts have been devoted to minimizing the monetary cost of executing workflows by designing tailored VM allocation mechanisms. However, most of them assume that the task execution time in clouds is static and can be estimated in advance, which is impractical in real scenarios due to performance fluctuation of VMs. In this paper, we propose an oNline multi-workflOw Scheduling Framework, named NOSF, to schedule deadline-constrained workflows with random arrivals and uncertain task execution time. In NOSF, the workflow scheduling process consists of three phases, including workflow preprocessing, VM allocation and feedback process. Built upon the new framework, a deadline-aware heuristic algorithm is then developed to elastically provision suitable VMs for workflow execution, with the objective of minimizing the rental cost and improving resource utilization. Simulation results demonstrate that the proposed algorithm significantly outperforms two state-of-the-art algorithms in terms of reducing VM rental costs and deadline violation probability, as well as improving the resource utilization efficiency.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed algorithm reduces makespan, enhances resource utilization, and improves load balancing, compared to MOHEFT and MCP, the well-known workflow scheduling algorithms of the literature.
Abstract: Cloud computing is one of the most popular distributed environments, in which multiple powerful and heterogeneous resources are used by different user applications. Task scheduling and resource provisioning are two important challenges of the cloud environment, together called cloud resource management. Resource management is a major problem, especially for scientific workflows, due to their heavy calculations and the dependency between their operations. Several algorithms and methods have been developed to manage cloud resources. In this paper, the combination of state-action-reward-state-action (SARSA) learning and a genetic algorithm is used to manage cloud resources. In the first step, the intelligent agents schedule the tasks during the learning process by exploring the workflow. Then, in the resource provisioning step, each resource is assigned to an agent, and its utilization is maximized in the learning process of its corresponding agent. This is conducted by selecting the most appropriate set of tasks that maximizes the utilization of the resource. A genetic algorithm is utilized for convergence of the agents of the proposed method and to achieve global optimization. The fitness function exploited by this genetic algorithm seeks to achieve more efficient resource utilization and better load balancing by observing the deadlines of the tasks. The experimental results show that the proposed algorithm reduces makespan, enhances resource utilization, and improves load balancing, compared to MOHEFT and MCP, the well-known workflow scheduling algorithms of the literature.
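The state-action-reward-state-action (SARSA) learning mentioned above rests on a one-line update rule, Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]. A minimal tabular sketch follows; the toy state and action names are illustrative and unrelated to the paper's actual state space or its genetic-algorithm hybrid:

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy SARSA: nudge Q(s, a) toward the bootstrapped target
    r + gamma * Q(s', a'), where a' is the action actually taken next."""
    old = Q.get((s, a), 0.0)
    target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = old + alpha * (target - old)

# one illustrative transition: assigning a task to an idle resource
# yields reward 1.0, after which the agent chooses to wait
Q = {}
sarsa_update(Q, "vm_idle", "assign_task", 1.0, "vm_busy", "wait")
```

Being on-policy, SARSA evaluates the policy the agent actually follows, which is why the next action a' (here "wait") enters the update rather than the greedy maximum used by Q-learning.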

Posted Content
TL;DR: In this article, a narrative review of interpretability methods for deep learning models in medical image analysis is presented, organized by the type of explanation generated and by technical similarity.
Abstract: Artificial intelligence has emerged as a useful aid in numerous clinical applications for diagnosis and treatment decisions. Deep neural networks have shown the same or better performance than clinicians in many tasks, owing to the rapid increase in available data and computational power. In order to conform to the principles of trustworthy AI, it is essential that an AI system be transparent, robust, fair, and accountable. Current deep neural solutions are referred to as black boxes due to a lack of understanding of the specifics of their decision-making process. Therefore, there is a need to ensure the interpretability of deep neural networks before they can be incorporated into the routine clinical workflow. In this narrative review, we used systematic keyword searches and domain expertise to identify nine types of interpretability methods that have been applied to deep learning models for medical image analysis, grouped by the type of explanation generated and by technical similarity. Furthermore, we report the progress made toward evaluating the explanations produced by the various interpretability methods. Finally, we discuss limitations, provide guidelines for using interpretability methods, and outline future directions concerning the interpretability of deep neural networks for medical image analysis.
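One of the simplest perturbation-based interpretability methods such reviews cover is occlusion sensitivity: mask part of the input and record how much the model's output score drops. The sketch below uses a stand-in weighted-sum "model" so the expected saliency is known; it is not taken from the review.

```python
# Occlusion sensitivity, a simple perturbation-based interpretability
# method: mask one region at a time and record how much the model's
# score drops. The "model" here is a stand-in, not a real classifier.

def occlusion_map(image, model, baseline=0.0):
    """image: flat list of pixel values; returns per-pixel score drops."""
    full_score = model(image)
    drops = []
    for i in range(len(image)):
        occluded = list(image)
        occluded[i] = baseline          # mask one pixel
        drops.append(full_score - model(occluded))
    return drops

# Toy "model": a weighted sum, so the weights are the ground-truth saliency.
weights = [0.1, 0.8, 0.1]
model = lambda img: sum(w * p for w, p in zip(weights, img))

print(occlusion_map([1.0, 1.0, 1.0], model))
```

The largest drop appears at the pixel with the largest weight, which is exactly the attribution one hopes an interpretability method recovers. With real images, whole patches rather than single pixels are occluded, and the drops are rendered as a heatmap.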

Journal ArticleDOI
TL;DR: In this article, the authors provide guidance for performing chemometric analysis to detect and extract information relating to the chemical differences between biological samples, such as whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not.
Abstract: Raman spectroscopy is increasingly being used in biology, forensics, diagnostics, pharmaceutics and food science applications. This growth is triggered not only by improvements in the computational and experimental setups but also by the development of chemometric techniques. Chemometric techniques are the analytical processes used to detect and extract information from subtle differences in Raman spectra obtained from related samples. This information could be used to find out, for example, whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not. Chemometric techniques include spectral processing (ensuring that the spectra used for the subsequent computational processes are as clean as possible) as well as the statistical analysis of the data required for finding the spectral differences that are most useful for differentiation between, for example, different cell types. For Raman spectra, this analysis process is not yet standardized, and there are many confounding pitfalls. This protocol provides guidance on how to perform a Raman spectral analysis: how to avoid these pitfalls, and strategies to circumvent problematic issues. The protocol is divided into four parts: experimental design, data preprocessing, data learning and model transfer. We exemplify our workflow using three example datasets where the spectra from individual cells were collected in single-cell mode, and one dataset where the data were collected from a raster scanning–based Raman spectral imaging experiment of mouse tissue. Our aim is to help move Raman-based technologies from proof-of-concept studies toward real-world applications. Raman spectroscopy is increasingly being used in biological assays and studies. This protocol provides guidance for performing chemometric analysis to detect and extract information relating to the chemical differences between biological samples.
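The protocol's data-preprocessing stage typically involves baseline correction followed by normalization so that spectra from different samples become comparable. The sketch below uses a deliberately crude straight-line baseline; real pipelines use more robust estimators (e.g. polynomial fitting or asymmetric least squares), so treat this only as an illustration of the stage, not the protocol's method.

```python
# Simplified spectral preprocessing in the spirit of the protocol's
# "data preprocessing" stage: crude baseline removal followed by
# vector (L2) normalization. Real pipelines use more robust baselines;
# this is only a sketch.
import math

def remove_baseline(spectrum):
    """Subtract a straight line through the first and last points."""
    n = len(spectrum)
    slope = (spectrum[-1] - spectrum[0]) / (n - 1)
    return [y - (spectrum[0] + slope * i) for i, y in enumerate(spectrum)]

def l2_normalize(spectrum):
    norm = math.sqrt(sum(y * y for y in spectrum))
    return [y / norm for y in spectrum] if norm else spectrum

raw = [10.0, 11.0, 15.0, 13.0, 14.0]   # hypothetical intensities
corrected = remove_baseline(raw)
print(l2_normalize(corrected))          # → [0.0, 0.0, 1.0, 0.0, 0.0]
```

After these steps, the statistical analysis (e.g. PCA or discriminant models) operates on spectra whose differences reflect chemistry rather than instrument drift or overall intensity.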

Journal ArticleDOI
TL;DR: An interactive visual analysis workflow for the end-to-end analysis of Imaging Mass Cytometry data that was developed in close collaboration with domain expert partners is presented and the effectiveness of the workflow and ImaCytE is shown.
Abstract: Tissue functionality is determined by the characteristics of tissue-resident cells and their interactions within their microenvironment. Imaging Mass Cytometry offers the opportunity to distinguish cell types with high precision and link them to their spatial location in intact tissues at sub-cellular resolution. This technology produces large amounts of spatially resolved high-dimensional data, which constitutes a serious challenge for the data analysis. We present an interactive visual analysis workflow for the end-to-end analysis of Imaging Mass Cytometry data that was developed in close collaboration with domain expert partners. We implemented the presented workflow in an interactive visual analysis tool, ImaCytE. Our workflow is designed to allow the user to discriminate cell types according to their protein expression profiles and analyze their cellular microenvironments, aiding in the formulation or verification of hypotheses on tissue architecture and function. Finally, we show the effectiveness of our workflow and ImaCytE through a case study performed by a collaborating specialist.
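One computational step behind the microenvironment analysis described here can be sketched as a neighborhood composition profile: for each cell, count the cell types found within a fixed spatial radius. This is a generic illustration, not ImaCytE's implementation; the coordinates and type labels are invented.

```python
# A sketch of one step behind microenvironment analysis (not ImaCytE's
# implementation): for each cell, count the cell types found within a
# fixed radius, giving a neighborhood composition profile.
from collections import Counter
from math import hypot

def neighbourhood_profile(cells, radius):
    """cells: list of (x, y, cell_type); returns one Counter per cell."""
    profiles = []
    for i, (xi, yi, _) in enumerate(cells):
        nearby = Counter(t for j, (xj, yj, t) in enumerate(cells)
                         if j != i and hypot(xi - xj, yi - yj) <= radius)
        profiles.append(nearby)
    return profiles

cells = [(0, 0, "T"), (1, 0, "B"), (2, 0, "B"), (10, 10, "T")]
print(neighbourhood_profile(cells, radius=1.5))
```

Cells with similar profiles can then be grouped, which is one way to surface recurring tissue "motifs" such as immune cells surrounding tumor cells.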

Journal ArticleDOI
TL;DR: Workflow managers have been developed to simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing as mentioned in this paper.
Abstract: The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not just for experiments, but also for computational analysis. However, transforming data into information involves running a large number of tools, optimizing parameters, and integrating dynamically changing reference data. Workflow managers were developed in response to such challenges. They simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing. In this Perspective, we highlight key features of workflow managers, compare commonly used approaches for bioinformatics workflows, and provide a guide for computational and noncomputational users. We outline community-curated pipeline initiatives that enable novice and experienced users to perform complex, best-practice analyses without having to manually assemble workflows. In sum, we illustrate how workflow managers contribute to making computational analysis in biomedical research shareable, scalable, and reproducible.
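At their core, the workflow managers surveyed here execute a directed acyclic graph of analysis steps in dependency order. The minimal executor below illustrates only that core; real managers such as Snakemake or Nextflow add caching, software environments, containers, and cluster dispatch. The step names are invented.

```python
# At their core, workflow managers run a directed acyclic graph of steps
# in dependency order. A minimal sketch (no caching, containers, or
# cluster dispatch, which real managers add on top of this):

def run_workflow(deps, actions):
    """deps: step -> list of prerequisite steps; actions: step -> callable.
    Returns the order in which steps were executed."""
    done, order = set(), []

    def run(step):
        if step in done:
            return
        for d in deps.get(step, []):
            run(d)                       # run prerequisites first
        actions[step]()
        done.add(step)
        order.append(step)

    for step in deps:
        run(step)
    return order

log = []
deps = {"align": ["trim"], "trim": ["fetch"], "fetch": [], "report": ["align"]}
actions = {s: (lambda s=s: log.append(s)) for s in deps}
print(run_workflow(deps, actions))  # → ['fetch', 'trim', 'align', 'report']
```

Because dependencies are explicit, a manager can also rerun only the steps downstream of a changed input, which is what makes large analyses both scalable and reproducible.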

Journal ArticleDOI
TL;DR: A dependable algorithm for scheduling workflow applications on CPCS that uses slack to recover failed tasks and allows all tasks to share the available slack in the system to improve soft-error reliability.
Abstract: Cyber–physical cloud systems (CPCS) are integrations of cyber–physical systems (CPS) and cloud computing infrastructures. Integrating CPS into cloud computing infrastructures could improve the performance in many aspects. However, new reliability and security challenges are also introduced. This fact highlights the need to develop novel methodologies to tackle these challenges in CPCS. To this end, this article is oriented toward enhancing the soft-error reliability of real-time workflows on CPCS while satisfying the lifetime reliability, security, and real-time constraints. In this article, we propose a dependable algorithm for scheduling workflow applications on CPCS. The proposed algorithm uses slack to recover failed tasks and allows all tasks to share the available slack in the system. To improve soft-error reliability, the algorithm first determines the priority of tasks, then assigns the maximum frequency to each task, and finally assigns the recoveries to tasks dynamically. Slack also can be used to utilize security services for satisfying system security requirements. The lifetime reliability constraint is met by dynamically scaling down the operating frequency of low-priority tasks. Extensive experiments on real-world workflow benchmarks demonstrate that the proposed scheme reduces the probability of failure by up to 52.1% and improves the scheduling feasibility by up to 83.5% compared to a number of representative approaches.
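The slack concept used by the algorithm can be sketched simply: slack is the time left between the workflow's worst-case finish time and its deadline, and that slack can fund re-executions (recoveries) of tasks, handed out here greedily in priority order. This is an illustrative simplification, not the paper's scheduler; the task set is invented.

```python
# Illustrative slack-sharing sketch (not the paper's scheduler): slack is
# the time between the workflow's worst-case finish and its deadline, and
# it can fund re-executions (recoveries) of tasks, in priority order.

def assign_recoveries(tasks, deadline):
    """tasks: list of (name, exec_time, priority), higher priority first.
    Returns {name: number_of_recovery_slots} (at most one per task here)."""
    slack = deadline - sum(t for _, t, _ in tasks)
    recoveries = {name: 0 for name, _, _ in tasks}
    for name, exec_time, _ in sorted(tasks, key=lambda t: -t[2]):
        if slack >= exec_time:           # enough slack for one re-execution
            recoveries[name] += 1
            slack -= exec_time
    return recoveries

tasks = [("t1", 3, 2), ("t2", 2, 3), ("t3", 4, 1)]   # hypothetical workflow
print(assign_recoveries(tasks, deadline=14))  # → {'t1': 1, 't2': 1, 't3': 0}
```

With 5 units of slack, the two highest-priority tasks each get a recovery slot and the lowest-priority task gets none; the paper additionally spends slack on security services and frequency scaling, which this sketch omits.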

Journal ArticleDOI
TL;DR: By identifying aspects of machine learning, which can be reused from project to project, open-source tools which help in specific parts of the pipeline, and possible combinations, an overview of support in MLOps is given.
Abstract: Nowadays, machine learning projects have become more and more relevant to various real-world use cases. The success of complex neural network models depends on many factors, so the need for structured, machine learning-centric project development management arises. Due to the multitude of tools available for different operational phases, responsibilities and requirements become increasingly unclear. In this work, Machine Learning Operations (MLOps) technologies and tools for every part of the overall project pipeline, as well as the roles involved, are examined and clearly defined. With a focus on the interconnectivity of specific tools and their comparison against well-selected MLOps requirements, model performance, input data, and system quality metrics are briefly discussed. By identifying the aspects of machine learning that can be reused from project to project, the open-source tools that help in specific parts of the pipeline, and possible combinations, an overview of support in MLOps is given. Deep learning has revolutionized the field of image processing, and building an automated machine learning workflow for object detection is of great interest to many organizations. To this end, a simple MLOps workflow for object detection on images is portrayed.
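One concern that recurs across the MLOps tools surveyed is run tracking: recording each training run's configuration and resulting metrics so results stay reproducible and comparable. The stdlib sketch below illustrates the concept only; real stacks delegate this to dedicated tools such as MLflow or DVC, and the configuration fields shown are invented.

```python
# A tiny run-tracking sketch of one reusable MLOps concern: recording a
# run's configuration and metrics so results stay reproducible and
# comparable. Real stacks delegate this to dedicated tracking tools.
import hashlib
import json

def log_run(config, metrics):
    """Return a run record with a stable ID derived only from the config."""
    payload = json.dumps(config, sort_keys=True)   # key order-independent
    run_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
    return {"run_id": run_id, "config": config, "metrics": metrics}

record = log_run({"model": "detector-v1", "lr": 0.001, "epochs": 10},
                 {"mAP": 0.62})
print(record["run_id"])
```

Deriving the ID from the configuration means two runs with identical settings are recognized as the same experiment, which is one small building block of the reproducibility MLOps aims for.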

Journal ArticleDOI
TL;DR: In this paper, the authors provide guidelines for interpreting single-cell transcriptomic maps to identify cell types, states and other biologically relevant patterns with the objective of creating an annotated map of cells.
Abstract: Single-cell transcriptomics can profile thousands of cells in a single experiment and identify novel cell types, states and dynamics in a wide variety of tissues and organisms. Standard experimental protocols and analysis workflows have been developed to create single-cell transcriptomic maps from tissues. This tutorial focuses on how to interpret these data to identify cell types, states and other biologically relevant patterns with the objective of creating an annotated map of cells. We recommend a three-step workflow including automatic cell annotation (wherever possible), manual cell annotation and verification. Frequently encountered challenges are discussed, as well as strategies to address them. Guiding principles and specific recommendations for software tools and resources that can be used for each step are covered, and an R notebook is included to help run the recommended workflow. Basic familiarity with computer software is assumed, and basic knowledge of programming (e.g., in the R language) is recommended. This tutorial provides guidelines for interpreting single-cell transcriptomic maps to identify cell types, states and other biologically relevant patterns.
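The automatic cell annotation step recommended as the first part of the workflow often scores each cluster's expressed genes against known marker-gene sets and assigns the best-matching label. The sketch below is a simplified illustration of that idea; the marker lists are illustrative, not a curated reference, and real tools also handle ambiguous and novel types.

```python
# Simplified marker-based automatic annotation (in the spirit of the
# tutorial's first step): score each cluster's expressed genes against
# known marker sets and label it with the best-matching cell type.
# Marker lists here are illustrative, not a curated reference.

def annotate(cluster_genes, markers):
    """cluster_genes: set of genes expressed in a cluster;
    markers: cell_type -> set of marker genes. Returns the best label."""
    def score(cell_type):
        hits = len(cluster_genes & markers[cell_type])
        return hits / len(markers[cell_type])   # fraction of markers seen
    return max(markers, key=score)

markers = {
    "T cell": {"CD3D", "CD3E", "CD2"},
    "B cell": {"CD79A", "MS4A1", "CD19"},
}
print(annotate({"CD3D", "CD3E", "ACTB"}, markers))  # → T cell
```

This is why the tutorial pairs automatic annotation with manual review and verification: a cluster that matches no marker set well, or two sets equally, still receives a label and needs a human check.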