scispace - formally typeset

Showing papers on "Workflow published in 2019"


Journal ArticleDOI
Eric J. Topol
TL;DR: Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient–doctor relationship or facilitate its erosion remains to be seen.
Abstract: The use of artificial intelligence, and the deep-learning subtype in particular, has been enabled by the use of labeled big data, along with markedly enhanced computing power and cloud storage, across all sectors. In medicine, this is beginning to have an impact at three levels: for clinicians, predominantly via rapid, accurate image interpretation; for health systems, by improving workflow and the potential for reducing medical errors; and for patients, by enabling them to process their own data to promote health. The current limitations, including bias, privacy and security, and lack of transparency, along with the future directions of these applications will be discussed in this article. Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient-doctor relationship or facilitate its erosion remains to be seen.

2,574 citations


Journal ArticleDOI
TL;DR: The steps of a typical single‐cell RNA‐seq analysis, including pre‐processing (quality control, normalization, data correction, feature selection, and dimensionality reduction) and cell‐ and gene‐level downstream analysis, are detailed.
Abstract: Single-cell RNA-seq has enabled gene expression to be studied at an unprecedented resolution. The promise of this technology is attracting a growing user base for single-cell analysis methods. As more analysis tools are becoming available, it is becoming increasingly difficult to navigate this landscape and produce an up-to-date workflow to analyse one's data. Here, we detail the steps of a typical single-cell RNA-seq analysis, including pre-processing (quality control, normalization, data correction, feature selection, and dimensionality reduction) and cell- and gene-level downstream analysis. We formulate current best-practice recommendations for these steps based on independent comparison studies. We have integrated these best-practice recommendations into a workflow, which we apply to a public dataset to further illustrate how these steps work in practice. Our documented case study can be found at https://www.github.com/theislab/single-cell-tutorial. This review will serve as a workflow tutorial for new entrants into the field, and help established users update their analysis pipelines.
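As a toy illustration of the pre-processing steps listed above (not the authors' pipeline, which uses dedicated single-cell tools; all thresholds here are invented), the following sketch applies a cell-level quality-control filter, depth normalization, and a log transform to a synthetic count matrix:

```python
import numpy as np

def preprocess(counts, min_genes=200, target_sum=1e4):
    """Toy single-cell preprocessing on a cells x genes count matrix."""
    # Quality control: drop cells expressing too few genes.
    genes_per_cell = (counts > 0).sum(axis=1)
    kept = counts[genes_per_cell >= min_genes]
    # Depth normalization: scale each cell to a common total count.
    totals = kept.sum(axis=1, keepdims=True)
    norm = kept / totals * target_sum
    # Variance-stabilizing log transform.
    return np.log1p(norm)

rng = np.random.default_rng(0)
X = rng.poisson(0.5, size=(50, 300))  # synthetic count matrix
Z = preprocess(X, min_genes=100)
```

Feature selection and dimensionality reduction would follow on `Z`; they are omitted here for brevity.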

1,180 citations


Journal ArticleDOI
TL;DR: The Basis Set Exchange has been rewritten, utilizing modern software design and best practices, and the website updated to use the current generation of web development libraries.
Abstract: The Basis Set Exchange (BSE) has been a prominent fixture in the quantum chemistry community. First publicly available in 2007, it is recognized by both users and basis set creators as the de facto source for information related to basis sets. This popular resource has been rewritten, utilizing modern software design and best practices. The basis set data has been separated into a stand-alone library with an accessible API, and the Web site has been updated to use the current generation of web development libraries. The general layout and workflow of the Web site is preserved, while helpful features requested by the user community have been added. Overall, this design should increase adaptability and lend itself well into the future as a dependable resource for the computational chemistry community. This article will discuss the decision to rewrite the BSE, the new architecture and design, and the new features that have been added.

1,016 citations


Proceedings ArticleDOI
27 May 2019
TL;DR: A study conducted on observing software teams at Microsoft as they develop AI-based applications finds that various Microsoft teams have integrated this workflow into preexisting, well-evolved, Agile-like software engineering processes, providing insights about several essential engineering challenges that organizations may face in creating large-scale AI solutions for the marketplace.
Abstract: Recent advances in machine learning have stimulated widespread interest within the Information Technology sector on integrating AI capabilities into software and services. This goal has forced organizations to evolve their development processes. We report on a study that we conducted on observing software teams at Microsoft as they develop AI-based applications. We consider a nine-stage workflow process informed by prior experiences developing AI applications (e.g., search and NLP) and data science tools (e.g. application diagnostics and bug reporting). We found that various Microsoft teams have integrated this workflow into preexisting, well-evolved, Agile-like software engineering processes, providing insights about several essential engineering challenges that organizations may face in creating large-scale AI solutions for the marketplace. We collected some best practices from Microsoft teams to address these challenges. In addition, we have identified three aspects of the AI domain that make it fundamentally different from prior software application domains: 1) discovering, managing, and versioning the data needed for machine learning applications is much more complex and difficult than other types of software engineering, 2) model customization and model reuse require very different skills than are typically found in software teams, and 3) AI components are more difficult to handle as distinct modules than traditional software components: models may be "entangled" in complex ways and experience non-monotonic error behavior. We believe that the lessons learned by Microsoft teams will be valuable to other organizations.

597 citations


Journal ArticleDOI
TL;DR: Visualization is helpful in each of these stages of the Bayesian workflow and it is indispensable when drawing inferences from the types of modern, high dimensional models that are used by applied researchers.
Abstract: Bayesian data analysis is about more than just computing a posterior distribution, and Bayesian visualization is about more than trace plots of Markov chains. Practical Bayesian data analysis, like all data analysis, is an iterative process of model building, inference, model checking and evaluation, and model expansion. Visualization is helpful in each of these stages of the Bayesian workflow and it is indispensable when drawing inferences from the types of modern, high dimensional models that are used by applied researchers.
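The model-checking stage described above can be illustrated with a minimal posterior predictive check. This is a generic textbook-style example (not code from the paper), using a conjugate Beta-Binomial model so that no MCMC is needed:

```python
import random

random.seed(1)

# Observed data: y successes in n Bernoulli trials.
n, y = 20, 14

# Conjugate Beta(1, 1) prior gives a Beta(1 + y, 1 + n - y) posterior.
def posterior_draw():
    return random.betavariate(1 + y, 1 + n - y)

# Posterior predictive replicate: redraw a dataset for one posterior theta.
def replicate():
    theta = posterior_draw()
    return sum(random.random() < theta for _ in range(n))

y_rep = [replicate() for _ in range(2000)]

# A graphical check would overlay y on the histogram of y_rep; numerically,
# the observed value should sit inside the bulk of the replicates.
ranked = sorted(y_rep)
lo = ranked[int(0.025 * len(y_rep))]
hi = ranked[int(0.975 * len(y_rep))]
```

In a full Bayesian workflow this comparison is drawn as a plot; the interval endpoints here are just its numerical summary.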

440 citations


Journal ArticleDOI
TL;DR: NGPhylogeny.fr is developed to be more flexible in terms of tools and workflows, easily installable, and more scalable, and is managed and run by an underlying Galaxy workflow system, which makes workflows more scalable in terms of number of jobs and size of data.
Abstract: Phylogeny.fr, created in 2008, has been designed to facilitate the execution of phylogenetic workflows, and is nowadays widely used. However, since its development, user needs have evolved, new tools and workflows have been published, and the number of jobs has increased dramatically, thus promoting new practices, which motivated its refactoring. We developed NGPhylogeny.fr to be more flexible in terms of tools and workflows, easily installable, and more scalable. It integrates numerous tools in their latest version (e.g. TNT, FastME, MrBayes, etc.) as well as new ones designed in the last ten years (e.g. PhyML, SMS, FastTree, trimAl, BOOSTER, etc.). These tools cover a large range of usage (sequence searching, multiple sequence alignment, model selection, tree inference and tree drawing) and a large panel of standard methods (distance, parsimony, maximum likelihood and Bayesian). They are integrated in workflows, which have been already configured ('One click'), can be customized ('Advanced'), or are built from scratch ('A la carte'). Workflows are managed and run by an underlying Galaxy workflow system, which makes workflows more scalable in terms of number of jobs and size of data. NGPhylogeny.fr is deployable on any server or personal computer, and is freely accessible at https://ngphylogeny.fr.

391 citations


Journal ArticleDOI
04 Jun 2019
TL;DR: The presented workflow allows the production of a set of preprocessed Sentinel-1 GRD data, offering a benchmark for the development of new products and operational down-streaming services based on consistent Copernicus Sentinel- 1 GRD datasets, with the aim of providing reliable information of interest to a wide range of communities.
Abstract: The Copernicus Programme has become the world’s largest space data provider, providing complete, free and open access to satellite data, mainly acquired by Sentinel satellites. Sentinel-1 Synthetic Aperture Radar (SAR) data have improved spatial resolution and high revisit frequency, making them useful for a wide range of applications. While few research applications need Sentinel-1 Ground Range Detected (GRD) data with few corrections applied, a wider range of users needs products with a standard set of corrections applied. In order to facilitate the exploitation of Sentinel-1 GRD products, there is the need to standardise procedures to preprocess SAR data to a higher processing level. A standard generic workflow to preprocess Copernicus Sentinel-1 GRD data is presented here. The workflow aims to apply a series of standard corrections, and to apply a precise orbit of acquisition, remove thermal and image border noise, perform radiometric calibration, and apply range Doppler and terrain correction. Additionally, the workflow allows spatial snapping of Sentinel-1 GRD products to Sentinel-2 MSI data grids, in order to promote the use of satellite virtual constellations by means of data fusion techniques. The presented workflow allows the production of a set of preprocessed Sentinel-1 GRD data, offering a benchmark for the development of new products and operational down-streaming services based on consistent Copernicus Sentinel-1 GRD datasets, with the aim of providing reliable information of interest to a wide range of communities.
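The correction chain described above can be sketched as an ordered pipeline. The published workflow is built from SAR processing operators (e.g. in ESA SNAP); every function name below is an illustrative placeholder, and each step simply records that its correction was applied to a toy product record:

```python
# Hypothetical sketch of the Sentinel-1 GRD preprocessing chain; step names
# mirror the corrections listed in the abstract, not any real operator API.

def apply_orbit_file(p):
    p["corrections"].append("precise_orbit")
    return p

def remove_thermal_noise(p):
    p["corrections"].append("thermal_noise_removed")
    return p

def remove_border_noise(p):
    p["corrections"].append("border_noise_removed")
    return p

def calibrate(p):
    p["corrections"].append("radiometric_calibration")
    return p

def terrain_correct(p):
    p["corrections"].append("range_doppler_terrain_correction")
    return p

def snap_to_s2_grid(p):
    p["grid"] = "Sentinel-2_MSI"  # spatial snapping to the Sentinel-2 grid
    return p

PIPELINE = [apply_orbit_file, remove_thermal_noise, remove_border_noise,
            calibrate, terrain_correct, snap_to_s2_grid]

def preprocess_grd(product):
    for step in PIPELINE:
        product = step(product)
    return product

result = preprocess_grd({"id": "S1_GRD_scene", "corrections": [], "grid": None})
```

The point is the fixed ordering: orbit and noise corrections precede calibration, which precedes terrain correction and grid snapping.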

238 citations


Journal ArticleDOI
TL;DR: A deep-Q-network model in a multi-agent reinforcement learning setting to guide the scheduling of multi-workflows over infrastructure-as-a-service clouds and experimental results suggest that the proposed approach outperforms traditional ones, e.g., non-dominated sorting genetic algorithm-II, multi-objective particle swarm optimization, and game-theoretic-based greedy algorithms, in terms of optimality of scheduling plans generated.
Abstract: Cloud Computing provides an effective platform for executing large-scale and complex workflow applications with a pay-as-you-go model. Nevertheless, various challenges, especially its optimal scheduling for multiple conflicting objectives, are yet to be addressed properly. The existing multi-objective workflow scheduling approaches are still limited in many ways, e.g., encoding is restricted by prior experts' knowledge when handling a dynamic real-time problem, which strongly influences the performance of scheduling. In this paper, we apply a deep-Q-network model in a multi-agent reinforcement learning setting to guide the scheduling of multi-workflows over infrastructure-as-a-service clouds. To optimize multi-workflow completion time and user's cost, we consider a Markov game model, which takes the number of workflow applications and heterogeneous virtual machines as state input and the maximum completion time and cost as rewards. The game model is capable of seeking for correlated equilibrium between make-span and cost criteria without prior experts' knowledge and converges to the correlated equilibrium policy in a dynamic real-time environment. To validate our proposed approach, we conduct extensive case studies based on multiple well-known scientific workflow templates and Amazon EC2 cloud. The experimental results clearly suggest that our proposed approach outperforms traditional ones, e.g., non-dominated sorting genetic algorithm-II, multi-objective particle swarm optimization, and game-theoretic-based greedy algorithms, in terms of optimality of scheduling plans generated.
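A drastically simplified stand-in for the scheduling idea above: the paper uses a deep-Q-network in a multi-agent Markov game, while this sketch reduces each task to a stateless tabular Q-learning problem over two VM types. VM speeds, prices, and the joint time-plus-cost reward are invented for illustration:

```python
import random

random.seed(0)

VM_SPEED = [2.0, 1.0]    # ops per time unit for each VM type (invented)
VM_COST = [4.0, 1.0]     # price per time unit for each VM type (invented)
TASKS = [4.0, 2.0, 6.0]  # workload (ops) of each workflow task

Q = {(t, vm): 0.0 for t in range(len(TASKS)) for vm in range(2)}
alpha, eps = 0.1, 0.2    # learning rate and exploration probability

def reward(task, vm):
    time = TASKS[task] / VM_SPEED[vm]
    cost = time * VM_COST[vm]
    return -(time + cost)  # penalize completion time and monetary cost jointly

for _ in range(5000):
    for t in range(len(TASKS)):
        if random.random() < eps:
            vm = random.randrange(2)                   # explore
        else:
            vm = max((0, 1), key=lambda a: Q[(t, a)])  # exploit
        Q[(t, vm)] += alpha * (reward(t, vm) - Q[(t, vm)])

# Greedy policy after learning: which VM type each task should use.
policy = [max((0, 1), key=lambda a: Q[(t, a)]) for t in range(len(TASKS))]
```

With these invented prices the slower, cheaper VM has the smaller combined penalty for every task, so the learned policy assigns all tasks to it; the paper's setting is far richer (states, multiple agents, correlated equilibria).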

203 citations


Journal ArticleDOI
TL;DR: A novel multiobjective ant colony system based on a co-evolutionary multiple populations for multiple objectives framework is proposed, which adopts two colonies to deal with these two objectives, respectively.
Abstract: Cloud workflow scheduling is significantly challenging due to not only the large scale of workflow but also the elasticity and heterogeneity of cloud resources. Moreover, the pricing model of clouds makes the execution time and execution cost two critical issues in the scheduling. This paper models the cloud workflow scheduling as a multiobjective optimization problem that optimizes both execution time and execution cost. A novel multiobjective ant colony system based on a co-evolutionary multiple populations for multiple objectives framework is proposed, which adopts two colonies to deal with these two objectives, respectively. Moreover, the proposed approach incorporates with the following three novel designs to efficiently deal with the multiobjective challenges: 1) a new pheromone update rule based on a set of nondominated solutions from a global archive to guide each colony to search its optimization objective sufficiently; 2) a complementary heuristic strategy to avoid a colony only focusing on its corresponding single optimization objective, cooperating with the pheromone update rule to balance the search of both objectives; and 3) an elite study strategy to improve the solution quality of the global archive to help further approach the global Pareto front. Experimental simulations are conducted on five types of real-world scientific workflows and consider the properties of Amazon EC2 cloud platform. The experimental results show that the proposed algorithm performs better than both some state-of-the-art multiobjective optimization approaches and the constrained optimization approaches.
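A minimal sketch (not the authors' full algorithm) of design 1) above: pheromone reinforcement driven by a global archive of nondominated solutions. A solution is a (time, cost) pair; pheromone evaporates everywhere and is deposited only along edges used by archive members, pulling the search toward the Pareto front. All edge names and constants are invented:

```python
def dominates(a, b):
    """Pareto dominance for minimization of (time, cost)."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def nondominated(solutions):
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

def update_pheromone(tau, archive, rho=0.1, deposit=1.0):
    """Evaporate everywhere, then reinforce edges used by archive solutions."""
    for edge in tau:
        tau[edge] *= (1 - rho)
    for sol, edges in archive:
        for edge in edges:
            # Better (smaller) objectives deposit more pheromone.
            tau[edge] += rho * deposit / (1 + sol[0] + sol[1])
    return tau

sols = [((4, 9), [("t1", "vm1")]),   # nondominated
        ((6, 5), [("t1", "vm2")]),   # nondominated
        ((7, 8), [("t1", "vm3")])]   # dominated by (6, 5)
archive = [(s, e) for s, e in sols if s in nondominated([x[0] for x in sols])]
tau = update_pheromone({("t1", "vm1"): 1.0, ("t1", "vm2"): 1.0, ("t1", "vm3"): 1.0},
                       archive)
```

The dominated solution's edge only evaporates, while archive edges gain pheromone, which is the guidance mechanism the abstract describes.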

190 citations


Journal ArticleDOI
TL;DR: This paper studies the joint optimization of cost and makespan of scheduling workflows in IaaS clouds, and proposes a novel workflow scheduling scheme which closely integrates the fuzzy dominance sort mechanism with the list scheduling heuristic HEFT.

182 citations


Journal ArticleDOI
19 Mar 2019-Sensors
TL;DR: This work proposed a generic approach to enabling spatiotemporal capabilities in information services for smart cities, adopted a multidisciplinary approach to achieving data integration and real-time processing, and developed a reference architecture for the development of event-driven applications.
Abstract: Smart cities are urban environments where Internet of Things (IoT) devices provide a continuous source of data about urban phenomena such as traffic and air pollution. The exploitation of the spatial properties of data enables situation and context awareness. However, the integration and analysis of data from IoT sensing devices remain a crucial challenge for the development of IoT applications in smart cities. Existing approaches provide no or limited ability to perform spatial data analysis, even when spatial information plays a significant role in decision making across many disciplines. This work proposes a generic approach to enabling spatiotemporal capabilities in information services for smart cities. We adopted a multidisciplinary approach to achieving data integration and real-time processing, and developed a reference architecture for the development of event-driven applications. This type of application seamlessly integrates IoT sensing devices, complex event processing, and spatiotemporal analytics through a processing workflow for the detection of geographic events. Through the implementation and testing of a system prototype, built upon an existing sensor network, we demonstrated the feasibility, performance, and scalability of event-driven applications to achieve real-time processing capabilities and detect geographic events.
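The kind of event-driven rule such an architecture evaluates can be sketched as a spatial filter combined with a threshold condition over a sensor stream. All names, coordinates, and the pollution threshold below are hypothetical:

```python
def in_bbox(lon, lat, bbox):
    """Point-in-bounding-box test (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def detect_events(stream, bbox, threshold):
    """CEP-style geographic event rule: spatial filter + threshold condition."""
    return [r for r in stream
            if in_bbox(r["lon"], r["lat"], bbox) and r["pm25"] > threshold]

readings = [
    {"sensor": "a", "lon": 4.35, "lat": 50.85, "pm25": 62.0},  # inside, above
    {"sensor": "b", "lon": 4.36, "lat": 50.84, "pm25": 18.0},  # inside, below
    {"sensor": "c", "lon": 5.10, "lat": 51.20, "pm25": 75.0},  # outside region
]
events = detect_events(readings, bbox=(4.30, 50.80, 4.40, 50.90), threshold=50.0)
```

A production complex-event-processing engine would evaluate such rules continuously over the live stream rather than over a list in memory.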

Journal ArticleDOI
TL;DR: Apollo, as discussed by the authors, is an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region.
Abstract: Genome annotation is the process of identifying the location and function of a genome's encoded features. Improving the biological accuracy of annotation is a complex and iterative process requiring researchers to review and incorporate multiple sources of information such as transcriptome alignments, predictive models based on sequence profiles, and comparisons to features found in related organisms. Because rapidly decreasing costs are enabling an ever-growing number of scientists to incorporate sequencing as a routine laboratory technique, there is widespread demand for tools that can assist in the deliberative analytical review of genomic information. To this end, we present Apollo, an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform. Some of Apollo's newer user interface features include support for real-time collaboration, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region in a manner similar to Google Docs. Its technical architecture enables Apollo to be integrated into multiple existing genomic analysis pipelines and heterogeneous laboratory workflow platforms. Finally, we consider the implications that Apollo and related applications may have on how the results of genome research are published and made accessible.

Journal ArticleDOI
TL;DR: In this article, a self-adaptive discrete particle swarm optimization algorithm with genetic algorithm operators (GA-DPSO) was proposed to optimize the data transmission time when placing data for a scientific workflow.
Abstract: Compared to traditional distributed computing environments such as grids, cloud computing provides a more cost-effective way to deploy scientific workflows. Each task of a scientific workflow requires several large datasets that are located in different datacenters, resulting in serious data transmission delays. Edge computing reduces the data transmission delays and supports the fixed storing manner for scientific workflow private datasets, but there is a bottleneck in its storage capacity. It is a challenge to combine the advantages of both edge computing and cloud computing to rationalize the data placement of scientific workflow, and optimize the data transmission time across different datacenters. In this study, a self-adaptive discrete particle swarm optimization algorithm with genetic algorithm operators (GA-DPSO) was proposed to optimize the data transmission time when placing data for a scientific workflow. This approach considered the characteristics of data placement combining edge computing and cloud computing. In addition, it considered the factors impacting transmission delay, such as the bandwidth between datacenters, the number of edge datacenters, and the storage capacity of edge datacenters. The crossover and mutation operators of the genetic algorithm were adopted to avoid the premature convergence of traditional particle swarm optimization algorithm, which enhanced the diversity of population evolution and effectively reduced the data transmission time. The experimental results show that the data placement strategy based on GA-DPSO can effectively reduce the data transmission time during workflow execution combining edge computing and cloud computing.
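The GA-DPSO idea above can be caricatured in a few lines: a particle is a discrete placement of datasets onto datacenters, and particle movement is replaced by GA operators (crossover toward the global best, plus mutation). Dataset sizes, bandwidths, the edge storage cap, and the fitness model are all invented for illustration:

```python
import random

random.seed(2)

SIZE = [5, 3, 8, 2, 7, 4]                  # dataset sizes (invented)
BANDWIDTH = {"edge": 10.0, "cloud": 2.0}   # effective transfer bandwidth
EDGE_CAPACITY = 15                         # edge storage bottleneck
DCS = list(BANDWIDTH)

def transfer_time(placement):
    """Fitness: total transfer time, infinite if edge storage is exceeded."""
    if sum(SIZE[i] for i, dc in enumerate(placement) if dc == "edge") > EDGE_CAPACITY:
        return float("inf")
    return sum(SIZE[i] / BANDWIDTH[dc] for i, dc in enumerate(placement))

def crossover(p, best):
    """Uniform crossover pulling a particle toward the global best."""
    return [b if random.random() < 0.5 else a for a, b in zip(p, best)]

def mutate(p, rate=0.1):
    """Random reassignment, guarding against premature convergence."""
    return [random.choice(DCS) if random.random() < rate else a for a in p]

swarm = [[random.choice(DCS) for _ in SIZE] for _ in range(20)]
gbest = min(swarm, key=transfer_time)
for _ in range(200):
    swarm = [mutate(crossover(p, gbest)) for p in swarm]
    gbest = min(swarm + [gbest], key=transfer_time)
```

Any feasible placement that keeps some data at the edge beats the all-cloud baseline under this toy model, which is the trade-off the paper optimizes at scale.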

Journal ArticleDOI
TL;DR: Current DL approaches and research directions in rapidly advancing ultrasound technology are reviewed and the outlook on future directions and trends for DL techniques to further improve diagnosis, reduce health care cost, and optimize ultrasound clinical workflow is presented.
Abstract: Ultrasound is the most commonly used imaging modality in clinical practice because it is a nonionizing, low-cost, and portable point-of-care imaging tool that provides real-time images. Artificial intelligence (AI)–powered ultrasound is becoming more mature and getting closer to routine clinical applications in recent times because of an increased need for efficient and objective acquisition and evaluation of ultrasound images. Because ultrasound images involve operator-, patient-, and scanner-dependent variations, the adaptation of classical machine learning methods to clinical applications becomes challenging. With their self-learning ability, deep-learning (DL) methods are able to harness exponentially growing graphics processing unit computing power to identify abstract and complex imaging features. This has given rise to tremendous opportunities such as providing robust and generalizable AI models for improving image acquisition, real-time assessment of image quality, objective diagnosis and detection of diseases, and optimizing ultrasound clinical workflow. In this report, the authors review current DL approaches and research directions in rapidly advancing ultrasound technology and present their outlook on future directions and trends for DL techniques to further improve diagnosis, reduce health care cost, and optimize ultrasound clinical workflow.

Journal ArticleDOI
TL;DR: PhenoMeNal is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud and constitutes a keystone solution in cloud e-infrastructures available for metabolomics.
Abstract: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological, and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories, and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution. PhenoMeNal (Phenome and Metabolome aNalysis) is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud. PhenoMeNal seamlessly integrates a wide array of existing open-source tools that are tested and packaged as Docker containers through the project's continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated, and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi, and Pachyderm. PhenoMeNal is a unique and complete solution for setting up cloud e-infrastructures through easy-to-use web interfaces that can be scaled to any custom public and private cloud environment. By harmonizing and automating software installation and configuration and through ready-to-use scientific workflow user interfaces, PhenoMeNal has succeeded in providing scientists with workflow-driven, reproducible, and shareable metabolomics data analysis platforms that are interfaced through standard data formats, representative datasets, versioned, and have been tested for reproducibility and interoperability. The elastic implementation of PhenoMeNal further allows easy adaptation of the infrastructure to other application areas and 'omics research domains. Overall, PhenoMeNal constitutes a keystone solution in cloud e-infrastructures available for metabolomics.

Posted Content
TL;DR: A novel process mining library that aims to bridge the gap between commercial and open-source process mining tools, providing integration with state-of-the-art data science libraries, e.g., pandas, numpy, scipy and scikit-learn is presented.
Abstract: Process mining, i.e., a sub-field of data science focusing on the analysis of event data generated during the execution of (business) processes, has seen a tremendous change over the past two decades. Starting off in the early 2000's, with limited to no tool support, nowadays, several software tools, i.e., both open-source, e.g., ProM and Apromore, and commercial, e.g., Disco, Celonis, ProcessGold, etc., exist. The commercial process mining tools provide limited support for implementing custom algorithms. Moreover, both commercial and open-source process mining tools are often only accessible through a graphical user interface, which hampers their usage in large-scale experimental settings. Initiatives such as RapidProM provide process mining support in the scientific workflow-based data science suite RapidMiner. However, these offer limited to no support for algorithmic customization. In the light of the aforementioned, in this paper, we present a novel process mining library, i.e. Process Mining for Python (PM4Py) that aims to bridge this gap, providing integration with state-of-the-art data science libraries, e.g., pandas, numpy, scipy and scikit-learn. We provide a global overview of the architecture and functionality of PM4Py, accompanied by some representative examples of its usage.
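As a self-contained stand-in for the kind of primitive a process mining library like PM4Py exposes, the snippet below computes the directly-follows relation over a toy event log (each trace is the ordered activity sequence of one case). This is a generic process-mining building block, not PM4Py's own API:

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

# Hypothetical event log: three cases of a small approval process.
log = [
    ["register", "check", "decide", "pay"],
    ["register", "check", "decide", "reject"],
    ["register", "decide", "pay"],
]
dfg = directly_follows(log)
```

Discovery algorithms (alpha miner, inductive miner, etc.) build on exactly this relation, and integration with pandas makes computing it over large logs straightforward.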

Journal ArticleDOI
TL;DR: The study explores how employing distributed ledger technology (DLT) could be advantageous in the BIM working environment by reinforcing network security, providing more reliable data storage and management of permissions, and ensuring change tracing and data ownership.
Abstract: Blockchain is a relatively new concept that originated from the first cryptocurrency known as Bitcoin and was soon noted to have a much wider range of applications than just serving as the platform for digital cryptocurrency. A blockchain (BC) is essentially a decentralized ledger that records every transaction made in the network, known as a ‘block’, the body of which comprises encrypted data of the entire transaction history. The implementation of decentralized technology in any industry would require augmented security, enforce accountability, and could potentially accelerate a shift in workflow dynamics from the current hierarchical structure to a decentralized, cooperative chain of command and effect a cultural and societal change by encouraging trust and transparency. This paper presents an evaluation survey of blockchain technology and its applications in the built environment and examines the potential integration with the BIM process. Moreover, the study explores how employing distributed ledger technology (DLT) could be advantageous in the BIM working environment by reinforcing network security, providing more reliable data storage and management of permissions, and ensuring change tracing and data ownership. The study discusses the basic fundamentals of distributed ledgers, their potential future applications and current advances, and their classification based on inherent characteristics of consensus reaching and permission management. Furthermore, the paper evaluates the potential application of BC technologies in enhancing the framework for automating the construction design review process such as smart contract technologies and Hyperledger Fabric, as well as discussing the pros, cons, and possible future research directions.
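The ledger structure described above reduces to a hash chain: each block commits to the full history via its predecessor's hash, so tampering with any recorded transaction invalidates every later block. The sketch below is a toy illustration (the "transactions" are hypothetical BIM design-review records), not any production blockchain:

```python
import hashlib
import json

def block_hash(block):
    """Deterministic hash of a block's canonical JSON serialization."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    prev = block_hash(chain[-1]) if chain else "0" * 64  # genesis marker
    chain.append({"prev_hash": prev, "transactions": transactions})
    return chain

def verify(chain):
    """Check that every block's prev_hash matches its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, [{"design_rev": "v1", "approved_by": "architect"}])
append_block(chain, [{"design_rev": "v2", "approved_by": "engineer"}])
ok = verify(chain)
chain[0]["transactions"][0]["approved_by"] = "intruder"  # tamper with history
tampered = verify(chain)
```

This is the change-tracing property the study highlights for BIM: altering a past approval breaks the chain and is immediately detectable.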

Journal ArticleDOI
TL;DR: A smart contract for establishing the trust of process executions that fits into the IoT environment is presented and a consensus approach with selected validators extended from Practical Byzantine Fault Tolerance (PBFT) is introduced to address time and prejudice challenges.
Abstract: Innovating business processes involves cutting-edge technologies where the Internet of Things (IoT) and Blockchain are technological breakthroughs. IoT is envisioned as a global network infrastructure consisting of numerous connected devices over the Internet. Many attempts have been made to improve and adapt business workflows for best utilizing IoT services. One possible solution is to digitize and automate internal processes using IoT services, in which Blockchain smart contract is a viable solution to establish the trust of process executions without intermediaries. Modern business processes are composed of disparate services; many of them tend to be delivered based on IoT. Interoperating with such services poses major challenges: 1) time for finality settlement of transactions is unpredictable and usually experiencing delay; 2) several implementations of permissioned Blockchain pose a major concern of trust regarding nodes that perform consensus; and 3) trust of process executions and IoT information is the major factor to the success of modern business processes, which require the composition of distributed IoT services. Traditional business processes are mostly managed by a single entity, which induces the problem of trust of process executions. In this paper, a smart contract for establishing the trust of process executions that fits into the IoT environment is presented. A consensus approach with selected validators extended from Practical Byzantine Fault Tolerance (PBFT) is introduced to address time and prejudice challenges.
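The quorum rule behind PBFT-style consensus with validators can be sketched directly (this is standard PBFT arithmetic, not code from the paper): with n = 3f + 1 validators tolerating f Byzantine nodes, a request commits once at least 2f + 1 matching votes are collected.

```python
def quorum(n):
    f = (n - 1) // 3  # maximum number of tolerable Byzantine validators
    return 2 * f + 1

def committed(votes, n):
    """votes maps each validator to the result digest it voted for."""
    tally = {}
    for digest in votes.values():
        tally[digest] = tally.get(digest, 0) + 1
    return any(count >= quorum(n) for count in tally.values())

n = 4  # smallest PBFT deployment: tolerates f = 1 faulty validator
ok = committed({"v1": "d1", "v2": "d1", "v3": "d1", "v4": "d2"}, n)
stalled = committed({"v1": "d1", "v2": "d1", "v3": "d2", "v4": "d2"}, n)
```

Restricting consensus to a selected validator set, as the paper proposes, keeps n small and the commit latency predictable, at the cost of the trust concerns the abstract notes.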

Journal ArticleDOI
TL;DR: A new framework for an integrated assessment modeling platform aimed at facilitating the highest level of openness for scientific analysis is presented, bridging the need for transparency with efficient data processing and powerful numerical solvers.
Abstract: The MESSAGE Integrated Assessment Model (IAM) developed by IIASA has been a central tool of energy-environment-economy systems analysis in the global scientific and policy arena. It played a major role in the Assessment Reports of the Intergovernmental Panel on Climate Change (IPCC); it provided marker scenarios of the Representative Concentration Pathways (RCPs) and the Shared Socio-Economic Pathways (SSPs); and it underpinned the analysis of the Global Energy Assessment (GEA). Alas, to provide relevant analysis for current and future challenges, numerical models of human and earth systems need to support higher spatial and temporal resolution, facilitate integration of data sources and methodologies across disciplines, and become open and transparent regarding the underlying data, methods, and the scientific workflow. In this manuscript, we present the building blocks of a new framework for an integrated assessment modeling platform; the 'ecosystem' comprises: i) an open-source GAMS implementation of the MESSAGEix energy system model integrated with the MACRO economic model; ii) a Java/database backend for version-controlled data management; iii) interfaces for the scientific programming languages Python & R for efficient input data and results processing workflows; and iv) a web-browser-based user interface for model/scenario management and intuitive 'drag-and-drop' visualization of results. The framework aims to facilitate the highest level of openness for scientific analysis, bridging the need for transparency with efficient data processing and powerful numerical solvers. The platform is geared towards easy integration of data sources and models across disciplines, spatial scales and temporal disaggregation levels. All tools apply best-practice in collaborative software development, and comprehensive documentation of all building blocks and scripts is generated directly from the GAMS equations and the Java/Python/R source code.

Journal ArticleDOI
TL;DR: This paper presents how several recent techniques relying on machine and deep learning can be used to analyze the activities taking place during surgery, using videos captured from either endoscopic or ceiling-mounted cameras.
Abstract: Recent years have seen tremendous progress in artificial intelligence (AI), such as with the automatic and real-time recognition of objects and activities in videos in the field of computer vision. Due to its increasing digitalization, the operating room (OR) promises to directly benefit from this progress in the form of new assistance tools that can enhance the abilities and performance of surgical teams. Key for such tools is the recognition of the surgical workflow, because efficient assistance by an AI system requires this system to be aware of the surgical context, namely of all activities taking place inside the operating room. We present here how several recent techniques relying on machine and deep learning can be used to analyze the activities taking place during surgery, using videos captured from either endoscopic or ceiling-mounted cameras. We also present two potential clinical applications that we are developing at the University of Strasbourg with our clinical partners.

Journal ArticleDOI
TL;DR: A new version (version 2) of the genomic dose-response analysis software, BMDExpress, has been created, which addresses the increasing use of transcriptomic dose- response data in toxicology, drug design, risk assessment and translational research.
Abstract: Summary: A new version (version 2) of the genomic dose-response analysis software, BMDExpress, has been created. The software addresses the increasing use of transcriptomic dose-response data in toxicology, drug design, risk assessment and translational research. In this new version, we have implemented additional statistical filtering options (e.g. Williams' trend test), curve fitting models, Linux and Macintosh compatibility and support for additional transcriptomic platforms with up-to-date gene annotations. Furthermore, we have implemented extensive data visualizations, on-the-fly data filtering, and a batch-wise analysis workflow. We have also significantly re-engineered the code base to reflect contemporary software engineering practices and streamline future development. The first version of BMDExpress was developed in 2007 to meet an unmet demand for easy-to-use transcriptomic dose-response analysis software. Since its original release, however, transcriptomic platforms, technologies, pathway annotations and quantitative methods for data analysis have undergone a large change necessitating a significant re-development of BMDExpress. To that end, as of 2016, the National Toxicology Program assumed stewardship of BMDExpress. The result is a modernized and updated BMDExpress 2 that addresses the needs of the growing toxicogenomics user community. Availability and implementation: BMDExpress 2 is available at https://github.com/auerbachs/BMDExpress-2/releases. Supplementary information: Supplementary data are available at Bioinformatics online.
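BMDExpress fits parametric models to dose-response data and derives benchmark doses (BMDs) from them. As a stdlib-only illustration of the underlying idea (this is not BMDExpress code, and the parameter values are hypothetical), the benchmark dose under a Hill model can be solved for in closed form:

```python
def hill(dose, gamma, nu, k, n):
    """Hill model response: baseline gamma plus a saturating dose term."""
    return gamma + nu * dose**n / (k**n + dose**n)

def hill_bmd(bmr, nu, k, n):
    """Dose at which the Hill term equals the benchmark response (BMR):
    solve nu * d^n / (k^n + d^n) = BMR for d."""
    if not 0 < bmr < nu:
        raise ValueError("BMR must lie strictly between 0 and the asymptote nu")
    return k * (bmr / (nu - bmr)) ** (1.0 / n)

# Hypothetical fit: asymptote nu=2.0, half-maximal dose k=10, Hill coefficient n=1.5
bmd = hill_bmd(bmr=0.1, nu=2.0, k=10.0, n=1.5)
```

Plugging the result back into the model recovers the benchmark response, which is a convenient sanity check on any such closed-form inversion.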

Journal ArticleDOI
TL;DR: A deadline and cost-aware scheduling algorithm that minimizes the execution cost of a workflow under deadline constraints in the infrastructure as a service (IaaS) model and performs well compared to state-of-the-art algorithms is proposed.
Abstract: Large-scale applications of Internet of things (IoT), which require considerable computing tasks and storage resources, are increasingly deployed in cloud environments. Compared with the traditional computing model, characteristics of the cloud such as pay-as-you-go, unlimited expansion, and dynamic acquisition offer distinct conveniences for applications built on the IoT architecture. One of the major challenges is to satisfy the quality of service requirements while assigning resources to tasks. In this paper, we propose a deadline and cost-aware scheduling algorithm that minimizes the execution cost of a workflow under deadline constraints in the infrastructure as a service (IaaS) model. Considering the virtual machine (VM) performance variation and acquisition delay, we first divide tasks into different levels according to the topological structure so that no dependency exists between tasks at the same level. Three strings are used to code the genes in the proposed algorithm to better reflect the heterogeneous and resilient characteristics of cloud environments. Then, HEFT (heterogeneous earliest finish time) is used to generate individuals with the minimum completion time and cost. Novel schemes are developed for crossover and mutation to increase the diversity of the solutions. Based on this process, a task scheduling method that considers cost and deadlines is proposed. Experiments on workflows that simulate the structured tasks of the IoT demonstrate that our algorithm achieves a high success rate and performs well compared to state-of-the-art algorithms.
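The first step the abstract describes, dividing tasks into dependency-free levels by topological structure, can be sketched in a few lines (the four-task workflow below is a made-up example, not from the paper):

```python
from collections import defaultdict

def topological_levels(tasks, deps):
    """Group tasks into levels so that no task in a level depends on another
    task in the same level; deps maps a task to the set of tasks it depends on
    (its predecessors in the workflow DAG)."""
    level = {}
    remaining = set(tasks)
    while remaining:
        # Tasks whose dependencies are all already levelled form the next wave
        ready = {t for t in remaining
                 if all(d in level for d in deps.get(t, ()))}
        if not ready:
            raise ValueError("dependency cycle detected")
        for t in ready:
            level[t] = 1 + max((level[d] for d in deps.get(t, ())), default=0)
        remaining -= ready
    groups = defaultdict(list)
    for t in sorted(level):
        groups[level[t]].append(t)
    return dict(groups)

# Hypothetical workflow: A fans out to B and C, which join again at D.
levels = topological_levels("ABCD", {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}})
```

Because each task's level is one more than the maximum level of its predecessors, two tasks in the same level can never depend on each other, which is exactly the property the scheduler needs.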

Journal ArticleDOI
TL;DR: The state of the art is presented to identify emerging research topics, challenges, and promising applications in integrating BCT into the development of BPM.
Abstract: In addition to functionalities, business process management (BPM) involves several key indicators such as openness, security, flexibility, and scalability. Optimizing system performance is becoming a great challenge for an ever-increasing large-scale distributed application system in the digital economy in the Internet of Things (IoT) era. In a centralized BPM, many indicators, such as security and openness, or cost and flexibility, conflict with each other. For example, inviting new partners across enterprises, domains, and regions to form a service workflow exposes new risks and needs additional security mechanisms for scrutiny; enhancing the flexibility of business workflow compositions increases the cost of security assurance. Blockchain technology (BCT) has shed light on the development of vital solutions to various BPM problems. BCT has to be integrated with other BPM system components that often involve IoT devices to implement specified functionalities related to the application. Currently, the potentials of using BCT have been explored, although still at an early stage. In this paper, the state of the art is presented to identify emerging research topics, challenges, and promising applications in integrating BCT into the development of BPM.

Journal ArticleDOI
TL;DR: The new functionality developed since the introduction of Pyteomics, a freely available open-source library providing Python interfaces to proteomic data, is summarized.
Abstract: Many of the novel ideas that drive today’s proteomic technologies are focused essentially on experimental or data-processing workflows. The latter are implemented and published in a number of ways, from custom scripts and programs, to projects built using general-purpose or specialized workflow engines; a large part of routine data processing is performed manually or with custom scripts that remain unpublished. Facilitating the development of reproducible data-processing workflows becomes essential for increasing the efficiency of proteomic research. To assist in overcoming the bioinformatics challenges in the daily practice of proteomic laboratories, 5 years ago we developed and announced Pyteomics, a freely available open-source library providing Python interfaces to proteomic data. We summarize the new functionality of Pyteomics developed during the time since its introduction.
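Pyteomics exposes this kind of calculation through its mass module; as a dependency-free sketch of what a monoisotopic peptide mass computation involves (the residue table below is truncated to the amino acids used and is illustrative, not a substitute for the library), one might write:

```python
# Monoisotopic residue masses (Da) for a few amino acids; a real library
# such as Pyteomics covers all residues, isotopes, and modifications.
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}
WATER = 18.010565  # mass of H2O added for the peptide termini

def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide: residue masses plus water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

mass = peptide_mass("PEPTIDE")  # the classic test peptide, ~799.36 Da
```

Keeping such routines behind simple, well-documented Python functions is what lets ad-hoc proteomics scripts grow into reproducible workflows.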

Journal ArticleDOI
TL;DR: A new heuristic scheduling algorithm, Budget Deadline Aware Scheduling (BDAS), is introduced that addresses eScience workflow scheduling under budget and deadline constraints in Infrastructure as a Service (IaaS) clouds while introducing a tunable cost-time trade off over heterogeneous instances.
Abstract: Basic science is becoming ever more computationally intensive, increasing the need for large-scale compute and storage resources, be they within a High Performance Computer cluster, or more recently within the cloud. In most cases, large scale scientific computation is represented as a workflow for scheduling and runtime provisioning. Such scheduling becomes an even more challenging problem on cloud systems due to the dynamic nature of the cloud, in particular, the elasticity, the pricing models (both static and dynamic), the non-homogeneous resource types, the vast array of services, and virtualization. This mapping of workflow tasks onto a set of provisioned instances is an example of the general scheduling problem and is NP-complete. In addition, we need to ensure that certain runtime constraints are met - the most typical being the cost of the computation and the time that the computation requires to complete. In this article, we introduce a new heuristic scheduling algorithm, Budget Deadline Aware Scheduling (BDAS), that addresses eScience workflow scheduling under budget and deadline constraints in Infrastructure as a Service (IaaS) clouds. The novelty of our work is satisfying both budget and deadline constraints while introducing a tunable cost-time trade off over heterogeneous instances. In addition, we study the stability and robustness of our algorithm by performing sensitivity analysis. The results demonstrate that overall BDAS finds a viable schedule for more than 40,000 test cases, satisfying both defined constraints: budget and deadline. Moreover, our algorithm achieves a 17.0-23.8 percent higher success rate when compared to state-of-the-art algorithms.

Journal ArticleDOI
TL;DR: This paper proposes a single-objective workflow scheduling optimization approach called DCOH (deadline-constrained cost optimization for hybrid clouds) for minimizing the monetary cost of scheduling workflows under deadline constraints, and a multi-objective workflow scheduling optimization approach called MOH (multi-objective optimization for hybrid clouds) that considers makespan and monetary cost simultaneously.

Journal ArticleDOI
TL;DR: Caterpillar, as discussed by the authors, is a blockchain-based BPMN execution engine that supports the creation of instances of a process model and allows users to monitor the state of process instances and to execute tasks thereof.
Abstract: Blockchain platforms, such as Ethereum, allow a set of actors to maintain a ledger of transactions without relying on a central authority and to deploy scripts, called smart contracts, that are executed whenever certain transactions occur. These features can be used as basic building blocks for executing collaborative business processes between mutually untrusting parties. However, implementing business processes using the low-level primitives provided by blockchain platforms is cumbersome and error-prone. In contrast, established business process management systems, such as those based on the standard Business Process Model and Notation (BPMN), provide convenient abstractions for rapid development of process-oriented applications. This article demonstrates how to combine the advantages of a business process management system with those of a blockchain platform. The article introduces a blockchain-based BPMN execution engine, namely Caterpillar. Like any BPMN execution engine, Caterpillar supports the creation of instances of a process model and allows users to monitor the state of process instances and to execute tasks thereof. The specificity of Caterpillar is that the state of each process instance is maintained on the (Ethereum) blockchain and the workflow routing is performed by smart contracts generated by a BPMN-to-Solidity compiler. The Caterpillar compiler supports a large array of BPMN constructs, including subprocesses, multi-instance activities and event handlers. The paper describes the architecture of Caterpillar, and the interfaces it provides to support the monitoring of process instances, the allocation and execution of work items, and the execution of service tasks.
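Caterpillar's actual routing logic is compiled into Solidity smart contracts; as a language-agnostic sketch of the underlying token-based idea (AND-join semantics only, with a hypothetical four-task process, not Caterpillar's generated code), the state such a contract tracks can be modeled as:

```python
class ProcessInstance:
    """Toy sketch of token-based workflow routing: a task is enabled once
    all of its predecessors have completed (AND-join semantics only)."""

    def __init__(self, flows):
        self.flows = flows  # task -> list of successor tasks
        self.preds = {}
        for src, succs in flows.items():
            for s in succs:
                self.preds.setdefault(s, set()).add(src)
        self.completed = set()

    def enabled(self):
        """Tasks whose predecessors (if any) have all completed."""
        tasks = set(self.flows) | set(self.preds)
        return {t for t in tasks - self.completed
                if self.preds.get(t, set()) <= self.completed}

    def complete(self, task):
        if task not in self.enabled():
            raise ValueError(f"task {task!r} is not enabled")
        self.completed.add(task)

# Hypothetical process: A fans out to B and C in parallel, joined again at D.
p = ProcessInstance({"A": ["B", "C"], "B": ["D"], "C": ["D"]})
```

On chain, the `completed` set would live in contract storage and `complete` would be a transaction, which is what makes the instance state tamper-evident between mutually untrusting parties.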

Journal ArticleDOI
TL;DR: Meta-analyses provide informed estimates for biological outcomes and the range of their variability, which are critical for the hypothesis generation and evidence-driven design of translational studies, as well as development of computational models.
Abstract: Basic life science literature is rich with information; however, methodical, quantitative attempts to organize this information are rare. Unlike clinical research, where consolidation efforts are facilitated by systematic review and meta-analysis, the basic sciences seldom use such rigorous quantitative methods. The goal of this study is to present a brief theoretical foundation, computational resources and workflow outline along with a working example for performing systematic or rapid reviews of basic research followed by meta-analysis. Conventional meta-analytic techniques are extended to accommodate methods and practices found in basic research. Emphasis is placed on handling heterogeneity that is inherently prevalent in studies that use diverse experimental designs and models. We introduce MetaLab, a meta-analytic toolbox developed in MATLAB R2016b which implements the methods described in this work and is provided for researchers and statisticians in a Git repository (https://github.com/NMikolajewicz/MetaLab). Through the course of the manuscript, a rapid review of intracellular ATP concentrations in osteoblasts is used as an example to demonstrate workflow, intermediate and final outcomes of basic research meta-analyses. In addition, the features pertaining to larger datasets are illustrated with a systematic review of mechanically-stimulated ATP release kinetics in mammalian cells. We discuss the criteria required to ensure outcome validity, as well as exploratory methods to identify influential experimental and biological factors. Thus, meta-analyses provide informed estimates for biological outcomes and the range of their variability, which are critical for the hypothesis generation and evidence-driven design of translational studies, as well as development of computational models.
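MetaLab implements its methods in MATLAB; the core random-effects pooling step that it and similar tools perform can be sketched in a few lines using the standard DerSimonian-Laird estimator (the effect sizes and variances below are made-up inputs, not data from the paper):

```python
import math

def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate.
    effects: per-study effect sizes; variances: their within-study variances.
    Returns (pooled estimate, standard error, between-study variance tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    # Cochran's Q heterogeneity statistic and the DL estimate of tau^2
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    wr = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * e for wi, e in zip(wr, effects)) / sum(wr)
    se = math.sqrt(1.0 / sum(wr))
    return pooled, se, tau2
```

Note the `max(0.0, ...)` truncation: when the studies are more homogeneous than chance would predict, the method-of-moments estimate of tau^2 goes negative and is clamped to zero, reducing the result to the fixed-effect estimate.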

Posted ContentDOI
17 Jun 2019-bioRxiv
TL;DR: This work investigates algorithm choices for the challenges of pre-processing, and describes a workflow that balances efficiency and accuracy, and demonstrates its flexibility by showing how it can be used for RNA velocity analyses.
Abstract: Analysis of single-cell RNA-seq data begins with the pre-processing of reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.
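In the kallisto and bustools workflow, bustools handles barcode correction, sorting, and counting of BUS records at scale; the essence of the final counting step, collapsing PCR duplicates into one count per unique (barcode, UMI, gene) triple, can be sketched with made-up records (the barcodes and gene names below are illustrative only):

```python
from collections import Counter

def count_matrix(records):
    """Collapse (cell barcode, UMI, gene) records into per-cell gene counts:
    each distinct (barcode, UMI, gene) triple contributes one count, so PCR
    duplicates of the same molecule are counted once."""
    molecules = {(bc, umi, gene) for bc, umi, gene in records}
    counts = Counter()
    for bc, umi, gene in molecules:
        counts[(bc, gene)] += 1
    return counts

reads = [
    ("AAAC", "TTGG", "GENE1"),
    ("AAAC", "TTGG", "GENE1"),   # PCR duplicate: same molecule, counted once
    ("AAAC", "CCGA", "GENE1"),   # second molecule of GENE1 in the same cell
    ("AAAC", "TTGG", "GENE2"),   # same UMI, different gene: distinct molecule
    ("GGTT", "TTGG", "GENE1"),   # different cell barcode
]
counts = count_matrix(reads)
```

The real tools additionally correct barcodes against a whitelist and resolve multi-mapping reads, which is where most of the engineering effort for speed and memory efficiency goes.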

Journal ArticleDOI
TL;DR: In this review, the smart planning tools in current clinical use are summarized in 3 main categories: automated rule implementation and reasoning, modeling of prior knowledge in clinical practice, and multicriteria optimization.
Abstract: Treatment planning is an essential step of the radiotherapy workflow. It has become more sophisticated over the past couple of decades with the help of computer science, enabling planners to design highly complex radiotherapy plans to minimize the normal tissue damage while preserving sufficient tumor control. As a result, treatment planning has become more labor intensive, requiring hours or even days of planner effort to optimize an individual patient case in a trial-and-error fashion. More recently, artificial intelligence has been utilized to automate and improve various aspects of medical science. For radiotherapy treatment planning, many algorithms have been developed to better support planners. These algorithms focus on automating the planning process and/or optimizing dosimetric trade-offs, and they have already made great impact on improving treatment planning efficiency and plan quality consistency. In this review, the smart planning tools in current clinical use are summarized in 3 main categories: automated rule implementation and reasoning, modeling of prior knowledge in clinical practice, and multicriteria optimization. Novel artificial intelligence-based treatment planning applications, such as deep learning-based algorithms and emerging research directions, are also reviewed. Finally, the challenges of artificial intelligence-based treatment planning are discussed for future work.