
Showing papers on "Workflow published in 2021"


Journal ArticleDOI
TL;DR: It is shown how the popular workflow management system Snakemake can be used to guarantee reproducibility, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Abstract: Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
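The rule-based representation the abstract describes can be illustrated with a minimal Snakefile sketch. File names, the seqtk trimming command, and the plotting script here are hypothetical placeholders, not taken from the paper:

```python
# Minimal Snakefile sketch. Each rule declares its inputs and outputs;
# Snakemake infers the dependency DAG from them and re-runs only
# out-of-date steps.
SAMPLES = ["a", "b"]

rule all:
    input:
        "plots/summary.pdf"   # the final target pulls in everything else

rule trim:
    input:
        "raw/{sample}.fastq"
    output:
        "trimmed/{sample}.fastq"
    shell:
        "seqtk trimfq {input} > {output}"

rule plot:
    input:
        expand("trimmed/{sample}.fastq", sample=SAMPLES)
    output:
        "plots/summary.pdf"
    script:
        "scripts/plot.py"     # rule scripts may be Python or R, mixing languages
```

Because inputs and outputs are declared, `snakemake --cores 4` can parallelize independent branches, and conda or container directives (not shown) can pin tool versions for reproducibility.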

519 citations


Journal ArticleDOI
TL;DR: In the authors’ perspective, in situ monitoring of AM processes will significantly benefit from the object detection ability of ML, and data sharing of AM would enable faster adoption of ML in AM.
Abstract: Additive manufacturing (AM) or 3D printing is growing rapidly in the manufacturing industry and has gained a lot of attention from various fields owing to its ability to fabricate parts with complex features. The reliability of 3D printed parts has been the focus of researchers seeking to realize AM as an end-part production tool. Machine learning (ML) has been applied in various aspects of AM to improve the whole design and manufacturing workflow, especially in the era of Industry 4.0. In this review article, various types of ML techniques are first introduced. This is then followed by a discussion of their use in various aspects of AM such as design for 3D printing, material tuning, process optimization, in situ monitoring, cloud service, and cybersecurity. Potential applications in the biomedical, tissue engineering, and building and construction fields will be highlighted. The challenges faced by ML in AM, such as computational cost, standards for qualification and data acquisition techniques, will also be discussed. In the authors’ perspective, in situ monitoring of AM processes will significantly benefit from the object detection ability of ML. As a large data set is crucial for ML, data sharing of AM would enable faster adoption of ML in AM. Standards for the shared data are needed to facilitate easy sharing of data. The use of ML in AM will become more mature and widely adopted as better data acquisition techniques and more powerful computer chips for ML are developed.

229 citations


Journal ArticleDOI
TL;DR: The workflows designed to enable researchers to interpret data can constrain the biological questions that can be asked; five years after its first publication, the community-led anvi’o platform is maturing into an open software ecosystem that reduces these constraints in ‘omics data analyses.
Abstract: Big data abound in microbiology, but the workflows designed to enable researchers to interpret data can constrain the biological questions that can be asked. Five years after anvi’o was first published, this community-led multi-omics platform is maturing into an open software ecosystem that reduces constraints in ‘omics data analyses.

220 citations


Journal ArticleDOI
TL;DR: The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization.
Abstract: Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols has led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).

209 citations


Journal ArticleDOI
TL;DR: In this article, a workflow for preprocessing single-cell RNA-sequencing data that balances efficiency and accuracy is described, based on the kallisto and bustools programs.
Abstract: We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.

170 citations


Journal ArticleDOI
TL;DR: In this article, a global single-cell mass spectrometry-based proteomics approach is proposed for large-scale single cell analyses, which can provide insights into the molecular basis for cellular heterogeneity.
Abstract: Large-scale single-cell analyses are of fundamental importance in order to capture biological heterogeneity within complex cell systems, but have largely been limited to RNA-based technologies. Here we present a comprehensive benchmarked experimental and computational workflow, which establishes global single-cell mass spectrometry-based proteomics as a tool for large-scale single-cell analyses. By exploiting a primary leukemia model system, we demonstrate both through pre-enrichment of cell populations and through a non-enriched unbiased approach that our workflow enables the exploration of cellular heterogeneity within this aberrant developmental hierarchy. Our approach is capable of consistently quantifying ~1000 proteins per cell across thousands of individual cells using limited instrument time. Furthermore, we develop a computational workflow (SCeptre) that effectively normalizes the data, integrates available FACS data and facilitates downstream analysis. The approach presented here lays a foundation for implementing global single-cell proteomics studies across the world. Single-cell proteomics can provide insights into the molecular basis for cellular heterogeneity. Here, the authors develop a multiplexed single-cell proteomics and computational workflow, and show that their strategy captures the cellular hierarchies in an Acute Myeloid Leukemia culture model.

149 citations


Journal ArticleDOI
TL;DR: This article proposes secure AIoT for implicit group recommendations (SAIoT-GR), which introduces a collaborative Bayesian network model and a noncooperative game as its algorithms and is able to maximize the advantages of its hardware and software modules.
Abstract: The emergence of Artificial Intelligence of Things (AIoT) has provided novel insights for many social computing applications such as group recommender systems. As the distances between people have been greatly shortened, there has been more general demand for the provision of personalized services aimed at groups instead of individuals. The existing methods for capturing group-level preference features from individuals have mostly been established via aggregation and face two challenges: secure data management workflows are absent, and implicit preference feedback is ignored. To tackle these current difficulties, this paper proposes secure AIoT for implicit group recommendations (SAIoT-GR). For the hardware module, a secure IoT structure is developed as the bottom support platform. For the software module, a collaborative Bayesian network model and noncooperative game are introduced as algorithms. This secure AIoT architecture is able to maximize the advantages of the two modules. In addition, a large number of experiments are carried out to evaluate the performance of SAIoT-GR in terms of efficiency and robustness.

117 citations


Journal ArticleDOI
TL;DR: This study develops an unceRtainty-aware Online Scheduling Algorithm (ROSA) to schedule dynamic and multiple workflows with deadlines; ROSA performs better than the five compared algorithms with respect to costs, deviation, resource utilization, and fairness.
Abstract: Scheduling workflows in cloud service environments has attracted great enthusiasm, and various approaches have been reported up to now. However, these approaches often ignored the uncertainties in the scheduling environment, such as the uncertain task start/execution/finish time, the uncertain data transfer time among tasks, and the sudden arrival of new workflows. Ignoring these uncertain factors often leads to the violation of workflow deadlines and increases the service renting costs of executing workflows. This study is devoted to improving the performance of cloud service platforms by minimizing uncertainty propagation when scheduling workflow applications that have both uncertain task execution time and data transfer time. To be specific, a novel scheduling architecture is designed to control the count of workflow tasks directly waiting on each service instance (e.g., virtual machine and container). Once a task is completed, its start/execution/finish time are available, which means its uncertainties disappear and will not affect the subsequent waiting tasks on the same service instance. Thus, controlling the count of waiting tasks on service instances can prohibit the propagation of uncertainties. Based on this architecture, we develop an unceRtainty-aware Online Scheduling Algorithm (ROSA) to schedule dynamic and multiple workflows with deadlines. The proposed ROSA skillfully integrates both proactive and reactive strategies. During the execution of the generated baseline schedules, the reactive strategy in ROSA is dynamically called to produce new proactive baseline schedules for dealing with uncertainties. Then, on the basis of real-world workflow traces, five groups of simulation experiments are carried out to compare ROSA with five typical algorithms.
The comparison results reveal that ROSA performs better than the five compared algorithms with respect to costs (up to 56 percent), deviation (up to 70 percent), resource utilization (up to 37 percent), and fairness (up to 37 percent).
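The core architectural idea, capping the number of tasks directly waiting on each service instance so that a finished task's uncertainties cannot propagate to later tasks, can be sketched as follows. This is a toy illustration with invented names, not the ROSA algorithm itself:

```python
from collections import deque

class ServiceInstance:
    """Toy service instance with a cap on directly-waiting tasks."""
    def __init__(self, name, cap=2):
        self.name = name
        self.cap = cap          # max tasks allowed to wait on this instance
        self.waiting = deque()  # tasks already committed to this instance

    def can_accept(self):
        return len(self.waiting) < self.cap

def dispatch(tasks, instances):
    """Greedy sketch: assign each ready task to an instance with spare
    waiting capacity; tasks that find none stay in a global pool, so their
    start times are not yet bound to any instance's uncertain timeline."""
    pool = []
    for task in tasks:
        target = next((i for i in instances if i.can_accept()), None)
        if target is None:
            pool.append(task)   # deferred: re-dispatched on a later event
        else:
            target.waiting.append(task)
    return pool
```

With two instances of capacity 2 and five tasks, one task remains pooled, to be dispatched once an instance completes a task and its actual finish time becomes known.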

116 citations


Journal ArticleDOI
TL;DR: In this article, a cloud-edge-based dynamic reconfiguration approach for service workflows in mobile e-commerce environments is proposed, in which the value and cost attributes of a service are considered and a long short-term memory (LSTM) neural network is used to predict the stability of services.
Abstract: The emergence of mobile service composition meets the current needs for real-time eCommerce. However, the requirements for eCommerce, such as safety and timeliness, are becoming increasingly strict. Thus, the cloud-edge hybrid computing model has been introduced to accelerate information processing, especially in a mobile scenario. However, the mobile environment is characterized by limited resource storage and users who frequently move, and these characteristics strongly affect the reliability of service composition running in this environment. Consequently, applications are likely to fail if inappropriate services are invoked. To ensure that the composite service can operate normally, traditional dynamic reconfiguration methods tend to focus on cloud services scheduling. Unfortunately, most of these approaches cannot support timely responses to dynamic changes. In this article, a cloud-edge-based dynamic reconfiguration approach for service workflows in mobile eCommerce environments is proposed. First, the service quality concept is extended. Specifically, the value and cost attributes of a service are considered. The value attribute is used to assess the stability of the service for some time to come, and the cost attribute is the cost of a service invocation. Second, a long short-term memory (LSTM) neural network is used to predict the stability of services, which is related to the calculation of the value attribute. Then, in view of the limited available equipment resources, a method for calculating the cost of calling a service is introduced. Third, candidate services are selected by considering both service stability and the cost of service invocation, thus yielding a dynamic reconfiguration scheme that is more suitable for the cloud-edge environment.
Finally, a series of comparative experiments were carried out, and the experimental results prove that the method proposed in this article offers higher stability, less energy consumption, and more accurate service prediction.

93 citations


Journal ArticleDOI
TL;DR: To make machine-learning analyses in the life sciences more computationally reproducible, standards based on data, model and code publication, programming best practices and workflow automation are proposed.
Abstract: To make machine-learning analyses in the life sciences more computationally reproducible, we propose standards based on data, model and code publication, programming best practices and workflow automation. By meeting these standards, the community of researchers applying machine-learning methods in the life sciences can ensure that their analyses are worthy of trust.

88 citations


Book ChapterDOI
01 Jan 2021
TL;DR: This chapter introduces Duet, the authors' tool for easier federated learning (FL) for scientists and data owners, and provides a proof-of-concept demonstration of an FL workflow by showing how to train a convolutional neural network.
Abstract: PySyft is an open-source multi-language library enabling secure and private machine learning by wrapping and extending popular deep learning frameworks such as PyTorch in a transparent, lightweight, and user-friendly manner. Its aim is both to help popularize privacy-preserving techniques in machine learning by making them as accessible as possible via Python bindings and common tools familiar to researchers and data scientists, and to be extensible such that new Federated Learning (FL), Multi-Party Computation, or Differential Privacy methods can be flexibly and simply implemented and integrated. This chapter introduces the methods available within the PySyft library and describes their implementations. We then provide a proof-of-concept demonstration of an FL workflow using an example of how to train a convolutional neural network. Next, we review the use of PySyft in the academic literature to date and discuss future use cases and development plans. Most importantly, we introduce Duet: our tool for easier FL for scientists and data owners.

Journal ArticleDOI
TL;DR: MaxDIA as mentioned in this paper is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment, which is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries.
Abstract: MaxDIA is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment. Using spectral libraries, MaxDIA achieves deep proteome coverage with substantially better coefficients of variation in protein quantification than other software. MaxDIA is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries. This is the foundation of discovery DIA: hypothesis-free analysis of DIA samples without a library and with reliable FDR control. MaxDIA performs three- or four-dimensional feature detection of fragment data, and scoring of matches is augmented by machine learning on the features of an identification. MaxDIA's bootstrap DIA workflow performs multiple rounds of matching with increasing quality of recalibration and stringency of matching to the library. Combining MaxDIA with two new technologies, BoxCar acquisition and trapped ion mobility spectrometry, leads to deep and accurate proteome quantification.

Journal ArticleDOI
TL;DR: A principled Bayesian workflow is introduced that provides guidelines and checks for valid data analysis, avoiding overfitting complex models to noise, and capturing relevant data structure in a probabilistic model.
Abstract: Experiments in research on memory, language, and in other areas of cognitive science are increasingly being analyzed using Bayesian methods. This has been facilitated by the development of probabilistic programming languages such as Stan, and easily accessible front-end packages such as brms. The utility of Bayesian methods, however, ultimately depends on the relevance of the Bayesian model, in particular whether or not it accurately captures the structure of the data and the data analyst's domain expertise. Even with powerful software, the analyst is responsible for verifying the utility of their model. To demonstrate this point, we introduce a principled Bayesian workflow (Betancourt, 2018) to cognitive science. Using a concrete working example, we describe basic questions one should ask about the model: prior predictive checks, computational faithfulness, model sensitivity, and posterior predictive checks. The running example for demonstrating the workflow is data on reading times with a linguistic manipulation of object versus subject relative clause sentences. This principled Bayesian workflow also demonstrates how to use domain knowledge to inform prior distributions. It provides guidelines and checks for valid data analysis, avoiding overfitting complex models to noise, and capturing relevant data structure in a probabilistic model. Given the increasing use of Bayesian methods, we aim to discuss how these methods can be properly employed to obtain robust answers to scientific questions. All data and code accompanying this article are available from https://osf.io/b2vx9/. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
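A prior predictive check, the first question in such a workflow, can be sketched in a few lines: draw parameters from the priors, simulate fake reading-time data, and ask whether the simulations are plausible before seeing any real data. The lognormal model and the specific priors below are illustrative assumptions, not the article's exact choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Prior predictive check (illustrative priors): reading times modeled as
# lognormal, with intercept and condition-effect priors on the log-ms scale.
n_sims, n_obs = 1000, 100
alpha = rng.normal(6.0, 0.5, size=n_sims)          # prior: mean log reading time
beta = rng.normal(0.0, 0.1, size=n_sims)           # prior: relative-clause effect
sigma = np.abs(rng.normal(0.0, 0.5, size=n_sims))  # prior: residual sd

x = rng.choice([-0.5, 0.5], size=n_obs)            # sum-coded condition

# simulate one dataset per prior draw and collect a summary statistic
sim_means = np.empty(n_sims)
for s in range(n_sims):
    rt = rng.lognormal(alpha[s] + beta[s] * x, sigma[s])
    sim_means[s] = rt.mean()

# if the bulk of simulated mean reading times is wildly implausible
# (e.g., minutes long), the priors should be tightened before fitting
print(np.quantile(sim_means, [0.05, 0.5, 0.95]))
```

In practice the same check is run through brms or Stan with `sample_prior = "only"`; the numpy version above just makes the logic of the check explicit.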

Journal ArticleDOI
TL;DR: A comprehensive review was conducted on GBI-targeted studies enlisting ENVI-met as the primary tool, providing researchers with an overview of the ENVI-met methodology and recommendations to refine research on GBI thermal effects.

Journal ArticleDOI
TL;DR: Current limitations and challenges are discussed, including advances in network implementations, applications to unconventional resources, dataset acquisition and synthetic training, extrapolative potential, accuracy loss from soft computing, and the computational cost of 3D Deep Learning.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a cloud workflow scheduling approach which combines particle swarm optimization and idle time slot-aware rules, to minimize the execution cost of a workflow application under a deadline constraint.
Abstract: Workflow scheduling is a key issue and remains a challenging problem in cloud computing. Faced with the large number of virtual machine (VM) types offered by cloud providers, cloud users need to choose the most appropriate VM type for each task. Multiple task scheduling sequences exist in a workflow application. Different task scheduling sequences have a significant impact on the scheduling performance. It is not easy to determine the most appropriate set of VM types for tasks and the best task scheduling sequence. Besides, the idle time slots on VM instances should be used fully to increase resource utilization and save the execution cost of a workflow. This paper considers these three aspects simultaneously and proposes a cloud workflow scheduling approach which combines particle swarm optimization (PSO) and idle time slot-aware rules, to minimize the execution cost of a workflow application under a deadline constraint. A new particle encoding is devised to represent the VM type required by each task and the scheduling sequence of tasks. An idle time slot-aware decoding procedure is proposed to decode a particle into a scheduling solution. To handle tasks' invalid priorities caused by the randomness of PSO, a repair method is used to repair those priorities to produce valid task scheduling sequences. The proposed approach is compared with state-of-the-art cloud workflow scheduling algorithms. Experiments show that the proposed approach outperforms the comparative algorithms in terms of both the execution cost and the success rate in meeting the deadline.
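The particle encoding and repair idea can be illustrated with a small sketch: each particle carries a VM-type index and a continuous priority per task, and decoding releases only tasks whose predecessors have finished, which repairs orderings that would otherwise violate precedence. This is an illustrative simplification (no idle-time-slot reuse or cost model is shown):

```python
import numpy as np

def decode(vm_choice, priority, deps):
    """Decode a particle into a valid task order.
    vm_choice[i]: VM type index chosen for task i.
    priority[i]:  continuous priority evolved by PSO.
    deps[i]:      set of tasks that must finish before task i.
    Only tasks whose parents are done are released, which repairs
    the random orderings PSO produces into feasible sequences."""
    n = len(priority)
    done, order = set(), []
    while len(order) < n:
        ready = [i for i in range(n) if i not in done and deps[i] <= done]
        nxt = max(ready, key=lambda i: priority[i])  # highest priority first
        order.append(nxt)
        done.add(nxt)
    return [(t, vm_choice[t]) for t in order]

# toy workflow DAG: task 0 -> {1, 2} -> 3
deps = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
rng = np.random.default_rng(0)
schedule = decode(rng.integers(0, 3, size=4), rng.random(4), deps)
```

A fitness function would then price each `(task, vm_type)` pair against the deadline; here only the feasibility-preserving decode step is shown.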

Journal ArticleDOI
29 Nov 2021
TL;DR: Chatbots have the potential to be integrated into clinical practice by working alongside health practitioners to reduce costs, refine workflow efficiencies, and improve patient outcomes; further research and interdisciplinary collaboration could advance this technology to dramatically improve the quality of care for patients, rebalance the workload for clinicians, and revolutionize the practice of medicine.
Abstract: Background: Chatbots are a timely topic applied in various fields, including medicine and health care, for human-like knowledge transfer and communication. Machine learning, a subset of artificial intelligence, has proven particularly applicable in health care, with the ability for complex dialog management and conversational flexibility. Objective: This review article aims to report on the recent advances and current trends in chatbot technology in medicine. A brief historical overview, along with the developmental progress and design characteristics, is first introduced. The focus will be on cancer therapy, with in-depth discussions and examples of diagnosis, treatment, monitoring, patient support, workflow efficiency, and health promotion. In addition, this paper will explore the limitations and areas of concern, highlighting ethical, moral, security, technical, and regulatory standards and evaluation issues to explain the hesitancy in implementation. Methods: A search of the literature published in the past 20 years was conducted using the IEEE Xplore, PubMed, Web of Science, Scopus, and OVID databases. The screening of chatbots was guided by the open-access Botlist directory for health care components and further divided according to the following criteria: diagnosis, treatment, monitoring, support, workflow, and health promotion. Results: Even after addressing these issues and establishing the safety or efficacy of chatbots, human elements in health care will not be replaceable. Therefore, chatbots have the potential to be integrated into clinical practice by working alongside health practitioners to reduce costs, refine workflow efficiencies, and improve patient outcomes. Other applications in pandemic support, global health, and education are yet to be fully explored.
Conclusions: Further research and interdisciplinary collaboration could advance this technology to dramatically improve the quality of care for patients, rebalance the workload for clinicians, and revolutionize the practice of medicine.

Journal ArticleDOI
TL;DR: patRoon, as discussed by the authors, is a new R-based open-source software platform which provides comprehensive, fully tailored and straightforward non-target analysis workflows, making the use, evaluation and mixing of well-tested algorithms seamless by harmonizing various common (primarily open) software tools under a consistent interface.
Abstract: Mass spectrometry-based non-target analysis is increasingly adopted in environmental sciences to screen and identify numerous chemicals simultaneously in highly complex samples. However, current data processing software either lack functionality for environmental sciences, solve only part of the workflow, are not openly available and/or are restricted in input data formats. In this paper we present patRoon, a new R-based open-source software platform, which provides comprehensive, fully tailored and straightforward non-target analysis workflows. This platform makes the use, evaluation and mixing of well-tested algorithms seamless by harmonizing various common (primarily open) software tools under a consistent interface. In addition, patRoon offers various functionality and strategies to simplify and perform automated processing of complex (environmental) data effectively. patRoon implements several effective optimization strategies to significantly reduce computational times. The ability of patRoon to perform time-efficient and automated non-target data annotation of environmental samples is demonstrated with a simple and reproducible workflow using open-access data of spiked samples from a drinking water treatment plant study. In addition, the ability to easily use, combine and evaluate different algorithms was demonstrated for three commonly used feature finding algorithms. This article, combined with already published works, demonstrates that patRoon helps make comprehensive (environmental) non-target analysis readily accessible to a wider community of researchers.

Journal ArticleDOI
Guanjie Wang, Liyu Peng, Kaiqi Li, Linggang Zhu, Jian Zhou, Naihua Miao, Zhimei Sun
TL;DR: An open-source computational platform named ALKEMIE, an acronym for Artificial Learning and Knowledge Enhanced Materials Informatics Engineering, enables easy access to data-driven techniques for broad communities; its elaborately designed, user-friendly graphical user interface makes the workflow and dataflow maneuverable and transparent, making it easy to use for scientists with broad backgrounds.

Journal ArticleDOI
TL;DR: A digital twin-based assembly data management and process traceability approach for complex products is proposed, and the Digital Twin-based Assembly Process Management and Control System (DT-APMCS) is designed to verify the efficiency of the proposed approach.

Journal ArticleDOI
TL;DR: In this paper, the authors present a review of the main bottom-up physics-based UBEM tools, comparing them from a user-oriented perspective, focusing on the required inputs, the reported outputs, the exploited workflow, the applicability of each tool, and the potential users.
Abstract: Regulations corroborate the importance of retrofitting existing building stocks or constructing new energy-efficient districts. There is, thus, a need for modeling tools to evaluate energy scenarios to better manage and design cities, and numerous methodologies and tools have been developed. Among them, Urban Building Energy Modeling (UBEM) tools allow the energy simulation of buildings at large scales. Choosing an appropriate UBEM tool, balancing the level of complexity, accuracy, usability, and computing needs, remains a challenge for users. The review focuses on the main bottom-up physics-based UBEM tools, comparing them from a user-oriented perspective. Five categories are used: (i) the required inputs, (ii) the reported outputs, (iii) the exploited workflow, (iv) the applicability of each tool, and (v) the potential users. Moreover, a critical discussion is proposed focusing on interests and trends in research and development. The results highlighted major differences between UBEM tools that must be considered to choose the proper one for an application. Barriers to adoption of UBEM tools include the need for a standardized ontology, a common three-dimensional city model, a standard procedure to collect data, and a standard set of test cases. This feeds into future development of UBEM tools to support cities' sustainability goals.

Journal ArticleDOI
TL;DR: An online multi-workflow scheduling framework, named NOSF, is proposed to schedule deadline-constrained workflows with random arrivals and uncertain task execution time; it significantly outperforms two state-of-the-art algorithms in terms of reducing VM rental costs and deadline violation probability.
Abstract: Cloud has become an important platform for executing numerous deadline-constrained scientific applications generally represented by workflow models. It provides scientists a simple and cost-efficient method of running workflows on their rental Virtual Machines (VMs) anytime and anywhere. Since pay-as-you-go is a dominating pricing solution in clouds, extensive research efforts have been devoted to minimizing the monetary cost of executing workflows by designing tailored VM allocation mechanisms. However, most of them assume that the task execution time in clouds is static and can be estimated in advance, which is impractical in real scenarios due to performance fluctuation of VMs. In this paper, we propose an oNline multi-workflOw Scheduling Framework, named NOSF, to schedule deadline-constrained workflows with random arrivals and uncertain task execution time. In NOSF, the workflow scheduling process consists of three phases, including workflow preprocessing, VM allocation and feedback process. Built upon the new framework, a deadline-aware heuristic algorithm is then developed to elastically provision suitable VMs for workflow execution, with the objective of minimizing the rental cost and improving resource utilization. Simulation results demonstrate that the proposed algorithm significantly outperforms two state-of-the-art algorithms in terms of reducing VM rental costs and deadline violation probability, as well as improving the resource utilization efficiency.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed algorithm reduces makespan, enhances resource utilization, and improves load balancing, compared to MOHEFT and MCP, the well-known workflow scheduling algorithms of the literature.
Abstract: Cloud computing is one of the most popular distributed environments, in which multiple powerful and heterogeneous resources are used by different user applications. Task scheduling and resource provisioning are two important challenges of the cloud environment, together called cloud resource management. Resource management is a major problem, especially for scientific workflows, due to their heavy calculations and the dependency between their operations. Several algorithms and methods have been developed to manage cloud resources. In this paper, the combination of state-action-reward-state-action (SARSA) learning and a genetic algorithm is used to manage cloud resources. In the first step, the intelligent agents schedule the tasks during the learning process by exploring the workflow. Then, in the resource provisioning step, each resource is assigned to an agent, and its utilization is maximized in the learning process of its corresponding agent. This is conducted by selecting the most appropriate set of tasks that maximizes the utilization of the resource. A genetic algorithm is utilized for convergence of the agents of the proposed method and to achieve global optimization. The fitness function exploited by this genetic algorithm seeks to achieve more efficient resource utilization and better load balancing by observing the deadlines of the tasks. The experimental results show that the proposed algorithm reduces makespan, enhances resource utilization, and improves load balancing, compared to MOHEFT and MCP, the well-known workflow scheduling algorithms of the literature.
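The state-action-reward-state-action (SARSA) learning mentioned above rests on a one-line update rule, Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]. A minimal tabular sketch follows; the toy state and action names are illustrative and unrelated to the paper's actual state space or its genetic-algorithm hybrid:

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy SARSA: nudge Q(s, a) toward the bootstrapped target
    r + gamma * Q(s', a'), where a' is the action actually taken next."""
    old = Q.get((s, a), 0.0)
    target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = old + alpha * (target - old)

# one illustrative transition: assigning a task to an idle resource
# yields reward 1.0, after which the agent chooses to wait
Q = {}
sarsa_update(Q, "vm_idle", "assign_task", 1.0, "vm_busy", "wait")
```

Being on-policy, SARSA evaluates the policy the agent actually follows, which is why the next action a' (here "wait") enters the update rather than the greedy maximum used by Q-learning.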

Posted Content
TL;DR: In this article, a narrative review of interpretability methods for deep learning models in medical image analysis is presented, organized by the type of explanation generated and by technical similarity.
Abstract: Artificial intelligence has emerged as a useful aid in numerous clinical applications for diagnosis and treatment decisions. Deep neural networks have shown the same or better performance than clinicians in many tasks, owing to the rapid increase in available data and computational power. In order to conform to the principles of trustworthy AI, it is essential that an AI system be transparent, robust, fair, and accountable. Current deep neural solutions are referred to as black boxes due to a lack of understanding of the specifics of their decision-making process. Therefore, there is a need to ensure the interpretability of deep neural networks before they can be incorporated into the routine clinical workflow. In this narrative review, we used systematic keyword searches and domain expertise to identify nine types of interpretability methods that have been applied to deep learning models for medical image analysis, grouped by the type of explanation generated and by technical similarity. Furthermore, we report the progress made toward evaluating the explanations produced by the various interpretability methods. Finally, we discuss limitations, provide guidelines for using interpretability methods, and outline future directions concerning the interpretability of deep neural networks for medical image analysis.
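One of the simplest perturbation-based interpretability methods such reviews cover is occlusion sensitivity: mask part of the input and record how much the model's output score drops. The sketch below uses a stand-in weighted-sum "model" so the expected saliency is known; it is not taken from the review.

```python
# Occlusion sensitivity, a simple perturbation-based interpretability
# method: mask one region at a time and record how much the model's
# score drops. The "model" here is a stand-in, not a real classifier.

def occlusion_map(image, model, baseline=0.0):
    """image: flat list of pixel values; returns per-pixel score drops."""
    full_score = model(image)
    drops = []
    for i in range(len(image)):
        occluded = list(image)
        occluded[i] = baseline          # mask one pixel
        drops.append(full_score - model(occluded))
    return drops

# Toy "model": a weighted sum, so the weights are the ground-truth saliency.
weights = [0.1, 0.8, 0.1]
model = lambda img: sum(w * p for w, p in zip(weights, img))

print(occlusion_map([1.0, 1.0, 1.0], model))
```

The largest drop appears at the pixel with the largest weight, which is exactly the attribution one hopes an interpretability method recovers. With real images, whole patches rather than single pixels are occluded, and the drops are rendered as a heatmap.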

Journal ArticleDOI
TL;DR: In this article, the authors provide guidance for performing chemometric analysis to detect and extract information relating to the chemical differences between biological samples, such as whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not.
Abstract: Raman spectroscopy is increasingly being used in biology, forensics, diagnostics, pharmaceutics and food science applications. This growth is triggered not only by improvements in the computational and experimental setups but also by the development of chemometric techniques. Chemometric techniques are the analytical processes used to detect and extract information from subtle differences in Raman spectra obtained from related samples. This information could be used to find out, for example, whether a mixture of bacterial cells contains different species, or whether a mammalian cell is healthy or not. Chemometric techniques include spectral processing (ensuring that the spectra used for the subsequent computational processes are as clean as possible) as well as the statistical analysis of the data required for finding the spectral differences that are most useful for differentiation between, for example, different cell types. For Raman spectra, this analysis process is not yet standardized, and there are many confounding pitfalls. This protocol provides guidance on how to perform a Raman spectral analysis: how to avoid these pitfalls, and strategies to circumvent problematic issues. The protocol is divided into four parts: experimental design, data preprocessing, data learning and model transfer. We exemplify our workflow using three example datasets where the spectra from individual cells were collected in single-cell mode, and one dataset where the data were collected from a raster scanning–based Raman spectral imaging experiment of mouse tissue. Our aim is to help move Raman-based technologies from proof-of-concept studies toward real-world applications. Raman spectroscopy is increasingly being used in biological assays and studies. This protocol provides guidance for performing chemometric analysis to detect and extract information relating to the chemical differences between biological samples.
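The protocol's data-preprocessing stage typically involves baseline correction followed by normalization so that spectra from different samples become comparable. The sketch below uses a deliberately crude straight-line baseline; real pipelines use more robust estimators (e.g. polynomial fitting or asymmetric least squares), so treat this only as an illustration of the stage, not the protocol's method.

```python
# Simplified spectral preprocessing in the spirit of the protocol's
# "data preprocessing" stage: crude baseline removal followed by
# vector (L2) normalization. Real pipelines use more robust baselines;
# this is only a sketch.
import math

def remove_baseline(spectrum):
    """Subtract a straight line through the first and last points."""
    n = len(spectrum)
    slope = (spectrum[-1] - spectrum[0]) / (n - 1)
    return [y - (spectrum[0] + slope * i) for i, y in enumerate(spectrum)]

def l2_normalize(spectrum):
    norm = math.sqrt(sum(y * y for y in spectrum))
    return [y / norm for y in spectrum] if norm else spectrum

raw = [10.0, 11.0, 15.0, 13.0, 14.0]   # hypothetical intensities
corrected = remove_baseline(raw)
print(l2_normalize(corrected))          # → [0.0, 0.0, 1.0, 0.0, 0.0]
```

After these steps, the statistical analysis (e.g. PCA or discriminant models) operates on spectra whose differences reflect chemistry rather than instrument drift or overall intensity.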

Journal ArticleDOI
TL;DR: An interactive visual analysis workflow for the end-to-end analysis of Imaging Mass Cytometry data that was developed in close collaboration with domain expert partners is presented and the effectiveness of the workflow and ImaCytE is shown.
Abstract: Tissue functionality is determined by the characteristics of tissue-resident cells and their interactions within their microenvironment. Imaging Mass Cytometry offers the opportunity to distinguish cell types with high precision and link them to their spatial location in intact tissues at sub-cellular resolution. This technology produces large amounts of spatially resolved high-dimensional data, which constitutes a serious challenge for the data analysis. We present an interactive visual analysis workflow for the end-to-end analysis of Imaging Mass Cytometry data that was developed in close collaboration with domain expert partners. We implemented the presented workflow in an interactive visual analysis tool, ImaCytE. Our workflow is designed to allow the user to discriminate cell types according to their protein expression profiles and analyze their cellular microenvironments, aiding in the formulation or verification of hypotheses on tissue architecture and function. Finally, we show the effectiveness of our workflow and ImaCytE through a case study performed by a collaborating specialist.
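One computational step behind the microenvironment analysis described here can be sketched as a neighborhood composition profile: for each cell, count the cell types found within a fixed spatial radius. This is a generic illustration, not ImaCytE's implementation; the coordinates and type labels are invented.

```python
# A sketch of one step behind microenvironment analysis (not ImaCytE's
# implementation): for each cell, count the cell types found within a
# fixed radius, giving a neighborhood composition profile.
from collections import Counter
from math import hypot

def neighbourhood_profile(cells, radius):
    """cells: list of (x, y, cell_type); returns one Counter per cell."""
    profiles = []
    for i, (xi, yi, _) in enumerate(cells):
        nearby = Counter(t for j, (xj, yj, t) in enumerate(cells)
                         if j != i and hypot(xi - xj, yi - yj) <= radius)
        profiles.append(nearby)
    return profiles

cells = [(0, 0, "T"), (1, 0, "B"), (2, 0, "B"), (10, 10, "T")]
print(neighbourhood_profile(cells, radius=1.5))
```

Cells with similar profiles can then be grouped, which is one way to surface recurring tissue "motifs" such as immune cells surrounding tumor cells.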

Journal ArticleDOI
TL;DR: Workflow managers have been developed to simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing as mentioned in this paper.
Abstract: The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not just for experiments, but also for computational analysis. However, transforming data into information involves running a large number of tools, optimizing parameters, and integrating dynamically changing reference data. Workflow managers were developed in response to such challenges. They simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing. In this Perspective, we highlight key features of workflow managers, compare commonly used approaches for bioinformatics workflows, and provide a guide for computational and noncomputational users. We outline community-curated pipeline initiatives that enable novice and experienced users to perform complex, best-practice analyses without having to manually assemble workflows. In sum, we illustrate how workflow managers contribute to making computational analysis in biomedical research shareable, scalable, and reproducible.
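At their core, the workflow managers surveyed here execute a directed acyclic graph of analysis steps in dependency order. The minimal executor below illustrates only that core; real managers such as Snakemake or Nextflow add caching, software environments, containers, and cluster dispatch. The step names are invented.

```python
# At their core, workflow managers run a directed acyclic graph of steps
# in dependency order. A minimal sketch (no caching, containers, or
# cluster dispatch, which real managers add on top of this):

def run_workflow(deps, actions):
    """deps: step -> list of prerequisite steps; actions: step -> callable.
    Returns the order in which steps were executed."""
    done, order = set(), []

    def run(step):
        if step in done:
            return
        for d in deps.get(step, []):
            run(d)                       # run prerequisites first
        actions[step]()
        done.add(step)
        order.append(step)

    for step in deps:
        run(step)
    return order

log = []
deps = {"align": ["trim"], "trim": ["fetch"], "fetch": [], "report": ["align"]}
actions = {s: (lambda s=s: log.append(s)) for s in deps}
print(run_workflow(deps, actions))  # → ['fetch', 'trim', 'align', 'report']
```

Because dependencies are explicit, a manager can also rerun only the steps downstream of a changed input, which is what makes large analyses both scalable and reproducible.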

Journal ArticleDOI
TL;DR: A dependable algorithm for scheduling workflow applications on CPCS that uses slack to recover failed tasks and allows all tasks to share the available slack in the system to improve soft-error reliability.
Abstract: Cyber–physical cloud systems (CPCS) are integrations of cyber–physical systems (CPS) and cloud computing infrastructures. Integrating CPS into cloud computing infrastructures could improve the performance in many aspects. However, new reliability and security challenges are also introduced. This fact highlights the need to develop novel methodologies to tackle these challenges in CPCS. To this end, this article is oriented toward enhancing the soft-error reliability of real-time workflows on CPCS while satisfying the lifetime reliability, security, and real-time constraints. In this article, we propose a dependable algorithm for scheduling workflow applications on CPCS. The proposed algorithm uses slack to recover failed tasks and allows all tasks to share the available slack in the system. To improve soft-error reliability, the algorithm first determines the priority of tasks, then assigns the maximum frequency to each task, and finally assigns the recoveries to tasks dynamically. Slack also can be used to utilize security services for satisfying system security requirements. The lifetime reliability constraint is met by dynamically scaling down the operating frequency of low-priority tasks. Extensive experiments on real-world workflow benchmarks demonstrate that the proposed scheme reduces the probability of failure by up to 52.1% and improves the scheduling feasibility by up to 83.5% compared to a number of representative approaches.
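The slack concept used by the algorithm can be sketched simply: slack is the time left between the workflow's worst-case finish time and its deadline, and that slack can fund re-executions (recoveries) of tasks, handed out here greedily in priority order. This is an illustrative simplification, not the paper's scheduler; the task set is invented.

```python
# Illustrative slack-sharing sketch (not the paper's scheduler): slack is
# the time between the workflow's worst-case finish and its deadline, and
# it can fund re-executions (recoveries) of tasks, in priority order.

def assign_recoveries(tasks, deadline):
    """tasks: list of (name, exec_time, priority), higher priority first.
    Returns {name: number_of_recovery_slots} (at most one per task here)."""
    slack = deadline - sum(t for _, t, _ in tasks)
    recoveries = {name: 0 for name, _, _ in tasks}
    for name, exec_time, _ in sorted(tasks, key=lambda t: -t[2]):
        if slack >= exec_time:           # enough slack for one re-execution
            recoveries[name] += 1
            slack -= exec_time
    return recoveries

tasks = [("t1", 3, 2), ("t2", 2, 3), ("t3", 4, 1)]   # hypothetical workflow
print(assign_recoveries(tasks, deadline=14))  # → {'t1': 1, 't2': 1, 't3': 0}
```

With 5 units of slack, the two highest-priority tasks each get a recovery slot and the lowest-priority task gets none; the paper additionally spends slack on security services and frequency scaling, which this sketch omits.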

Journal ArticleDOI
TL;DR: By identifying aspects of machine learning, which can be reused from project to project, open-source tools which help in specific parts of the pipeline, and possible combinations, an overview of support in MLOps is given.
Abstract: Nowadays, machine learning projects have become more and more relevant to various real-world use cases. The success of complex neural network models depends on many factors, so the need for structured, machine learning-centric project development management arises. Due to the multitude of tools available for different operational phases, responsibilities and requirements become increasingly unclear. In this work, Machine Learning Operations (MLOps) technologies and tools for every part of the overall project pipeline, as well as the roles involved, are examined and clearly defined. With a focus on the interconnectivity of specific tools and their comparison against well-selected MLOps requirements, model performance, input data, and system quality metrics are briefly discussed. By identifying the aspects of machine learning that can be reused from project to project, the open-source tools that help in specific parts of the pipeline, and possible combinations, an overview of support in MLOps is given. Deep learning has revolutionized the field of image processing, and building an automated machine learning workflow for object detection is of great interest to many organizations. To this end, a simple MLOps workflow for object detection on images is portrayed.
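One concern that recurs across the MLOps tools surveyed is run tracking: recording each training run's configuration and resulting metrics so results stay reproducible and comparable. The stdlib sketch below illustrates the concept only; real stacks delegate this to dedicated tools such as MLflow or DVC, and the configuration fields shown are invented.

```python
# A tiny run-tracking sketch of one reusable MLOps concern: recording a
# run's configuration and metrics so results stay reproducible and
# comparable. Real stacks delegate this to dedicated tracking tools.
import hashlib
import json

def log_run(config, metrics):
    """Return a run record with a stable ID derived only from the config."""
    payload = json.dumps(config, sort_keys=True)   # key order-independent
    run_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
    return {"run_id": run_id, "config": config, "metrics": metrics}

record = log_run({"model": "detector-v1", "lr": 0.001, "epochs": 10},
                 {"mAP": 0.62})
print(record["run_id"])
```

Deriving the ID from the configuration means two runs with identical settings are recognized as the same experiment, which is one small building block of the reproducibility MLOps aims for.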

Journal ArticleDOI
TL;DR: In this paper, the authors provide guidelines for interpreting single-cell transcriptomic maps to identify cell types, states and other biologically relevant patterns with the objective of creating an annotated map of cells.
Abstract: Single-cell transcriptomics can profile thousands of cells in a single experiment and identify novel cell types, states and dynamics in a wide variety of tissues and organisms. Standard experimental protocols and analysis workflows have been developed to create single-cell transcriptomic maps from tissues. This tutorial focuses on how to interpret these data to identify cell types, states and other biologically relevant patterns with the objective of creating an annotated map of cells. We recommend a three-step workflow including automatic cell annotation (wherever possible), manual cell annotation and verification. Frequently encountered challenges are discussed, as well as strategies to address them. Guiding principles and specific recommendations for software tools and resources that can be used for each step are covered, and an R notebook is included to help run the recommended workflow. Basic familiarity with computer software is assumed, and basic knowledge of programming (e.g., in the R language) is recommended. This tutorial provides guidelines for interpreting single-cell transcriptomic maps to identify cell types, states and other biologically relevant patterns.
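The automatic cell annotation step recommended as the first part of the workflow often scores each cluster's expressed genes against known marker-gene sets and assigns the best-matching label. The sketch below is a simplified illustration of that idea; the marker lists are illustrative, not a curated reference, and real tools also handle ambiguous and novel types.

```python
# Simplified marker-based automatic annotation (in the spirit of the
# tutorial's first step): score each cluster's expressed genes against
# known marker sets and label it with the best-matching cell type.
# Marker lists here are illustrative, not a curated reference.

def annotate(cluster_genes, markers):
    """cluster_genes: set of genes expressed in a cluster;
    markers: cell_type -> set of marker genes. Returns the best label."""
    def score(cell_type):
        hits = len(cluster_genes & markers[cell_type])
        return hits / len(markers[cell_type])   # fraction of markers seen
    return max(markers, key=score)

markers = {
    "T cell": {"CD3D", "CD3E", "CD2"},
    "B cell": {"CD79A", "MS4A1", "CD19"},
}
print(annotate({"CD3D", "CD3E", "ACTB"}, markers))  # → T cell
```

This is why the tutorial pairs automatic annotation with manual review and verification: a cluster that matches no marker set well, or two sets equally, still receives a label and needs a human check.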