
Showing papers on "Workflow published in 2018"


Journal ArticleDOI
TL;DR: While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps, which lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.
Abstract: quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multi-threading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data. The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, furthermore, quanteda lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.

617 citations


Journal ArticleDOI
TL;DR: The basic workflow to explain how to apply machine learning technology in the networking domain is summarized and a selective survey of the latest representative advances with explanations of their design principles and benefits is provided.
Abstract: Recently, machine learning has been used in every possible field to leverage its amazing power. For a long time, networking and distributed computing systems have been the key infrastructure providing efficient computational resources for machine learning. Networking itself can also benefit from this promising technology. This article focuses on the application of machine learning in networking (MLN), which can not only help solve intractable old networking problems but also stimulate new network applications. We summarize the basic workflow to explain how to apply machine learning technology in the networking domain. Then we provide a selective survey of the latest representative advances with explanations of their design principles and benefits. These advances are divided into several network design objectives, and detailed information on how they perform in each step of the MLN workflow is presented. Finally, we shed light on the new opportunities in networking design and community building of this new interdiscipline. Our goal is to provide a broad research guideline on networking with machine learning to help motivate researchers to develop innovative algorithms, standards and frameworks.
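As a rough illustration of the basic MLN workflow summarized above (collect network data, extract features, train a model, evaluate and iterate), the following Python sketch classifies synthetic traffic flows. The flow features, labels, and data are invented placeholders for illustration, not material from the article.

```python
# Minimal sketch of the ML-for-networking (MLN) workflow described above:
# collect flow records -> extract features -> train a model -> evaluate.
# The flow features and labels here are synthetic placeholders, not from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Steps 1-2: "collected" flow records with simple per-flow features
#   [mean packet size, flow duration (s), packets per second]
n_flows = 1000
features = rng.uniform([40, 0.01, 1], [1500, 60.0, 5000], size=(n_flows, 3))
# Synthetic ground-truth labels, e.g. 0 = bulk transfer, 1 = interactive traffic
labels = (features[:, 2] > 1000).astype(int)

# Step 3: train/validate split and model fitting
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Step 4: evaluation, feeding back into feature/model redesign if accuracy is poor
print("classification accuracy:", accuracy_score(y_test, model.predict(X_test)))
```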

328 citations


Journal ArticleDOI
TL;DR: The Computational 2D Materials Database (C2DB) as discussed by the authors organises structural, thermodynamic, elastic, electronic, magnetic, and optical properties of around 1500 two-dimensional materials distributed over more than 30 different crystal structures.
Abstract: We introduce the Computational 2D Materials Database (C2DB), which organises a variety of structural, thermodynamic, elastic, electronic, magnetic, and optical properties of around 1500 two-dimensional materials distributed over more than 30 different crystal structures. Material properties are systematically calculated by state-of-the-art density functional theory and many-body perturbation theory (G$_0$W$_0$ and the Bethe-Salpeter equation for $\sim$200 materials) following a semi-automated workflow for maximal consistency and transparency. The C2DB is fully open and can be browsed online or downloaded in its entirety. In this paper, we describe the workflow behind the database, present an overview of the properties and materials currently available, and explore trends and correlations in the data. Moreover, we identify a large number of new potentially synthesisable 2D materials with interesting properties targeting applications within spintronics, (opto-)electronics, and plasmonics. The C2DB offers a comprehensive and easily accessible overview of the rapidly expanding family of 2D materials and forms an ideal platform for computational modeling and design of new 2D materials and van der Waals heterostructures.

241 citations


Journal ArticleDOI
TL;DR: The challenges of a ‘Big Data’ approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance, are assessed.
Abstract: Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a 'Big Data' approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence-only or presence-absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi-source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter- or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide. These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi-source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA-based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as approaching technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. 
Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals.
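To make one of the workflow steps above concrete, here is a minimal sketch (using invented occurrence records and a simple one-degree latitude/longitude grid, not the GEO BON infrastructure or any of the cited monitoring projects) of aggregating heterogeneous species observations onto a common spatio-temporal grid with pandas.

```python
# Illustrative sketch of one EBV workflow step: harmonizing raw occurrence
# records from different sources onto a common spatial grid and time period.
# The records and the 1-degree grid are invented for illustration only.
import pandas as pd

records = pd.DataFrame({
    "species": ["Parus major", "Parus major", "Turdus merula", "Turdus merula"],
    "lat":     [52.37, 52.91, 52.41, 55.95],
    "lon":     [4.90, 4.47, 4.84, -3.19],
    "year":    [2016, 2016, 2017, 2017],
    "source":  ["eBird", "national survey", "eBird", "citizen science"],
})

# Snap coordinates to a 1-degree grid cell (a stand-in for a common spatial resolution)
records["grid_lat"] = records["lat"].floordiv(1).astype(int)
records["grid_lon"] = records["lon"].floordiv(1).astype(int)

# Aggregate to a per-cell, per-year record count per species.
# Real EBV products would additionally correct for detection bias and sampling effort.
ebv_cube = (records
            .groupby(["species", "year", "grid_lat", "grid_lon"])
            .size()
            .rename("n_records")
            .reset_index())
print(ebv_cube)
```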

212 citations


Journal ArticleDOI
TL;DR: Statistics are a vital aspect of psychological science, but the process of analyzing, formatting and reporting results is often fastidious, time-consuming and prone to errors, resulting in frustration and aversion.
Abstract: Statistics are a vital aspect of psychological science. Unfortunately, the process of analyzing, formatting and reporting results is often fastidious, time-consuming and prone to errors, resulting in frustration and aversion. On top of that, many available tools for professionals and students are either overpriced, too complex (i.e., displaying vast amounts of raw information neither demanded nor needed by the user) or too basic (i.e., not supporting advanced statistical procedures). These factors contribute to the reproducibility crisis in psychological science (Chambers et al., 2014; Etz & Vandekerckhove, 2016; Szucs & Ioannidis, 2016).

187 citations


Journal ArticleDOI
TL;DR: The ability of the workflow to streamline retrosynthesis pathway design and its major role in reshaping the design, build, test and learn pipeline by driving the process toward the objective of optimizing bioproduction are demonstrated.

158 citations


Journal ArticleDOI
TL;DR: The experimental results show that the GA-PSO algorithm decreases the total execution time of the workflow tasks, in comparison with the GA, PSO, HSGA, WSGA, and MTCT algorithms, and reduces the execution cost.
Abstract: The cloud computing environment provides several on-demand services and resource sharing for clients. Business processes are managed using workflow technology over the cloud, which represents one of the challenges in using the resources in an efficient manner due to the dependencies between the tasks. In this paper, a Hybrid GA-PSO algorithm is proposed to allocate tasks to the resources efficiently. The Hybrid GA-PSO algorithm aims to reduce the makespan and the cost and balance the load of the dependent tasks over the heterogeneous resources in cloud computing environments. The experimental results show that the GA-PSO algorithm decreases the total execution time of the workflow tasks, in comparison with GA, PSO, HSGA, WSGA, and MTCT algorithms. Furthermore, it reduces the execution cost. In addition, it improves the load balancing of the workflow application over the available resources. Finally, the obtained results also proved that the proposed algorithm converges to optimal solutions faster and with higher quality compared to other algorithms.
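To make the task-to-VM encoding concrete, here is a minimal genetic-algorithm sketch that maps workflow tasks to VMs with makespan as the fitness. It is a simplified stand-in, not the paper's Hybrid GA-PSO: the PSO velocity update, the monetary cost term, the load-balancing objective, and task dependencies are omitted, and all task/VM figures are invented.

```python
# Simplified GA sketch for mapping workflow tasks to VMs (makespan only).
# This is not the paper's Hybrid GA-PSO: the PSO step, monetary cost and
# load-balancing terms are omitted, and all task/VM figures are invented.
import random

random.seed(0)
task_length = [8, 5, 3, 9, 6, 4, 7, 2]      # abstract work units per task
vm_speed = [1.0, 2.0, 4.0]                   # work units processed per time unit

def makespan(assignment):
    """Finish time of the busiest VM for a task->VM assignment (ignores dependencies)."""
    load = [0.0] * len(vm_speed)
    for task, vm in enumerate(assignment):
        load[vm] += task_length[task] / vm_speed[vm]
    return max(load)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(assignment, rate=0.1):
    return [random.randrange(len(vm_speed)) if random.random() < rate else vm
            for vm in assignment]

# Evolve a population of random assignments toward a low makespan.
population = [[random.randrange(len(vm_speed)) for _ in task_length] for _ in range(30)]
for _ in range(100):
    population.sort(key=makespan)
    parents = population[:10]                       # elitist selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = min(population, key=makespan)
print("best assignment:", best, "makespan:", round(makespan(best), 2))
```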

154 citations


Journal ArticleDOI
TL;DR: This work proposes a dynamic cost-effective deadline-constrained heuristic algorithm for scheduling a scientific workflow in a public cloud that aims to exploit the advantages offered by cloud computing while taking into account the virtual machine performance variability and instance acquisition delay.
Abstract: Cloud computing, a distributed computing paradigm, enables delivery of IT resources over the Internet and follows the pay-as-you-go billing model. Workflow scheduling is one of the most challenging problems in cloud computing. Although workflow scheduling on distributed systems such as grids and clusters has been extensively studied, these solutions are not viable for a cloud environment, because a cloud environment differs from other distributed environments in two major ways: on-demand resource provisioning and a pay-as-you-go pricing model. Thus, to achieve the true benefits of workflow orchestration on cloud resources, novel approaches that can capitalize on the advantages and address the challenges specific to a cloud environment need to be developed. This work proposes a dynamic cost-effective deadline-constrained heuristic algorithm for scheduling a scientific workflow in a public cloud. The proposed technique aims to exploit the advantages offered by cloud computing while taking into account the virtual machine (VM) performance variability and instance acquisition delay to identify a just-in-time schedule of a deadline-constrained scientific workflow at lesser cost. Performance evaluation on some well-known scientific workflows shows that the proposed algorithm delivers better performance in comparison to the current state-of-the-art heuristics.
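The core idea of cost-aware, deadline-constrained VM selection can be sketched as follows: pick the cheapest instance type whose predicted finish time, including an acquisition delay, still meets the task's sub-deadline. This is a minimal sketch with invented VM prices, speeds, and delay, not the paper's algorithm, which additionally handles run-time performance variability.

```python
# Sketch of cheapest-VM-that-meets-the-deadline selection for one workflow task.
# VM types, prices, speeds and the acquisition delay are invented placeholders;
# the paper's algorithm additionally handles performance variability at run time.
VM_TYPES = [
    # (name, price per hour, relative speed)
    ("small",  0.05, 1.0),
    ("medium", 0.12, 2.0),
    ("large",  0.30, 4.0),
]
ACQUISITION_DELAY_H = 0.02   # assumed VM start-up delay, in hours

def pick_vm(task_work, sub_deadline_h):
    """Cheapest VM type finishing `task_work` units before the task sub-deadline."""
    feasible = []
    for name, price, speed in VM_TYPES:
        runtime_h = ACQUISITION_DELAY_H + task_work / speed
        if runtime_h <= sub_deadline_h:
            feasible.append((price * runtime_h, name, runtime_h))
    if not feasible:
        # No type meets the deadline: fall back to the fastest one.
        name, price, speed = max(VM_TYPES, key=lambda t: t[2])
        return name, ACQUISITION_DELAY_H + task_work / speed
    cost, name, runtime_h = min(feasible)
    return name, runtime_h

print(pick_vm(task_work=3.0, sub_deadline_h=2.0))   # the cheaper "medium" type suffices
print(pick_vm(task_work=3.0, sub_deadline_h=0.8))   # needs the faster, pricier "large" type
```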

149 citations


Journal ArticleDOI
TL;DR: This review is envisioned to provide an essential framework on modeling techniques to supplement the experimental optimization process and highlight fundamental modeling strategies, considerations, and results, as well as validation techniques using experimental data.
Abstract: Next generation, additively-manufactured metallic parts will be designed with application-optimized geometry, composition, and functionality. Manufacturers and researchers have investigated various techniques for increasing the reliability of the metal-AM process to create these components; however, understanding and manipulating the complex phenomena that occur within the printed component during processing remains a formidable challenge, limiting the use of these unique design capabilities. Among various approaches, thermomechanical modeling has emerged as a technique for increasing the reliability of metal-AM processes; however, most literature is specialized and challenging to interpret for users unfamiliar with numerical modeling techniques. This review article highlights fundamental modeling strategies, considerations, and results, as well as validation techniques using experimental data. A discussion of emerging research areas where simulation will enhance the metal-AM optimization process is presented, as well as a potential modeling workflow for process optimization. This review is envisioned to provide an essential framework on modeling techniques to supplement the experimental optimization process.
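For readers unfamiliar with the modeling side, the thermal part of most thermomechanical metal-AM models is governed by transient heat conduction with a moving heat source, whose temperature history then drives the mechanical (residual stress and distortion) analysis; a standard form is

$$\rho c_p \frac{\partial T}{\partial t} = \nabla \cdot \left( k \nabla T \right) + Q(\mathbf{x}, t),$$

where $\rho$ is the density, $c_p$ the specific heat, $k$ the thermal conductivity, and $Q$ the volumetric heat input from the moving laser or electron beam.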

146 citations


Journal ArticleDOI
TL;DR: This work highlights use cases, computing systems, workflow needs, and concludes by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
Abstract: Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, and workflow needs, and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.

144 citations


Journal ArticleDOI
TL;DR: This work proposes pipsCloud, which combines recent cloud computing and HPC techniques to enable large-scale remote sensing (RS) data processing as on-demand real-time services, and adopts an adaptive RS data analysis workflow management system.

Journal ArticleDOI
TL;DR: A workflow and a few empirical case studies of Chinese word segmentation rules for the Conditional Random Fields model are presented, showing the potential of leveraging natural language processing and knowledge graph technologies for geoscience.

Journal ArticleDOI
TL;DR: This suite combines three well-tested components (a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines) to achieve an unprecedented level of computational reproducibility.
Abstract: Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components—a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines—to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.
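As a hedged sketch of how these components fit together, each pipeline step can be run inside a pinned container image so the analysis is re-runnable on another machine. The container images, tool names, and file paths below are hypothetical, and this is not the paper's implementation (which builds on packaged bioinformatics software plus a workflow system); it only illustrates the isolation-plus-orchestration idea, assuming Docker is installed.

```python
# Minimal sketch of orchestrating a two-step analysis where every step runs
# inside a pinned container image. Images, tools and paths are hypothetical;
# the paper's suite uses portable software packages plus a workflow system.
import os
import subprocess

STEPS = [
    # (step name, pinned container image, command inside the container)
    ("quality_control", "example.org/qc-tool:1.2.0",
     "qc-tool --in /data/reads.fastq --out /data/reads.clean.fastq"),
    ("alignment", "example.org/aligner:0.9.1",
     "aligner --ref /data/ref.fa --in /data/reads.clean.fastq --out /data/aln.bam"),
]

def run_step(name, image, command, workdir="./data"):
    """Run one pipeline step in an isolated, versioned container environment."""
    print(f"[{name}] running in {image}")
    mount = f"{os.path.abspath(workdir)}:/data"
    subprocess.run(
        ["docker", "run", "--rm", "-v", mount, image] + command.split(),
        check=True,   # fail fast so a broken step cannot silently corrupt results
    )

if __name__ == "__main__":
    for name, image, command in STEPS:
        run_step(name, image, command)
```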

Journal ArticleDOI
TL;DR: Covidence works well and is well suited for more rigorous systematic reviews, where methodology must be adhered to and documented at each stage, and Rayyan works as a viable upgrade from a workflow using only EndNote and/or Excel.
Abstract: Health sciences librarians from two institutions conducted an assessment of Covidence, a subscription-based systematic review tool, and Rayyan, a free competitor, for abilities, strengths, and limitations. Covidence mirrors the multiphase review process, including data extraction, directly in its design. Rayyan, on the other hand, does not easily mirror this process and really only aids with the reference screening phases. Rayyan takes a minimalist approach, placing more of the logistical and workflow burden on the users themselves. Many of the peripheral features (e.g., highlighting, tagging, etc.) are comparable. Covidence works well and is well suited for more rigorous systematic reviews, where methodology must be adhered to and documented at each stage. In spite of some limited functionality and clunky features, Rayyan is a good free alternative for article screening and works as a viable upgrade from a workflow using only EndNote and/or Excel.

Proceedings ArticleDOI
25 Jun 2018
TL;DR: The vision of a knowledge graph for science is proposed, a possible infrastructure for such a knowledge graph is presented, and early attempts towards an implementation of the infrastructure are described.
Abstract: The document-centric workflows in science have reached (or already exceeded) the limits of adequacy. This is emphasized by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. This presents an opportunity to rethink the dominant paradigm of document-centric scholarly information communication and transform it into knowledge-based information flows by representing and expressing information through semantically rich, interlinked knowledge graphs. At the core of knowledge-based information flows is the creation and evolution of information models that establish a common understanding of information communicated between stakeholders as well as the integration of these technologies into the infrastructure and processes of search and information exchange in the research library of the future. By integrating these models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work as information and research results can be seamlessly interlinked with each other and better matched to complex information needs. Furthermore, research results become directly comparable and easier to reuse. As our main contribution, we propose the vision of a knowledge graph for science, present a possible infrastructure for such a knowledge graph as well as our early attempts towards an implementation of the infrastructure.
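To give a concrete flavor of knowledge-based information flows, research contributions can be expressed as machine-readable triples and queried directly, making results comparable across papers. This is a toy sketch using the rdflib library with invented resources and property names; it is not the infrastructure proposed in the paper.

```python
# Toy sketch of representing scholarly information as a knowledge graph.
# Resources and property names are invented; this is not the paper's infrastructure.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/science/")
g = Graph()
g.bind("ex", EX)

# A paper, the problem it addresses, the method it uses, and a reported result
g.add((EX.paper42, EX.addressesProblem, EX.imageSegmentation))
g.add((EX.paper42, EX.usesMethod, EX.convolutionalNetwork))
g.add((EX.paper42, EX.reportsMetric, Literal("Dice = 0.91")))
g.add((EX.paper57, EX.addressesProblem, EX.imageSegmentation))
g.add((EX.paper57, EX.usesMethod, EX.randomForest))

# Papers become directly comparable: which papers address the same problem, and how?
query = """
SELECT ?paper ?method WHERE {
    ?paper ex:addressesProblem ex:imageSegmentation .
    ?paper ex:usesMethod ?method .
}"""
for paper, method in g.query(query, initNs={"ex": EX}):
    print(paper, "uses", method)
```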

Journal ArticleDOI
TL;DR: A novel service workflow reconfiguration architecture is designed to provide guidance, which ranges from monitoring to recommendations for project implementation, and experiments are conducted to demonstrate the effectiveness and efficiency of the proposed method.

Journal ArticleDOI
TL;DR: The Bsoft software package, developed over 20 years for analyzing electron micrographs, offers a full workflow for validated single particle analysis with extensive functionality, enabling customization for specific cases.
Abstract: Cryo-electron microscopy (cryoEM) is becoming popular as a tool to solve biomolecular structures with the recent availability of direct electron detectors allowing automated acquisition of high resolution data. The Bsoft software package, developed over 20 years for analyzing electron micrographs, offers a full workflow for validated single particle analysis with extensive functionality, enabling customization for specific cases. With the increasing use of cryoEM and its automation, proper validation of the results is a bigger concern. The three major validation approaches, independent data sets, resolution-limited processing, and coherence testing, can be incorporated into any Bsoft workflow. Here, the main workflow is divided into four phases: (i) micrograph preprocessing, (ii) particle picking, (iii) particle alignment and reconstruction, and (iv) interpretation. Each of these phases represents a conceptual unit that can be automated, followed by a check point to assess the results. The aim in the first three phases is to reconstruct one or more validated maps at the best resolution possible. Map interpretation then involves identification of components, segmentation, quantification, and modeling. The algorithms in Bsoft are well established, with future plans focused on ease of use, automation and institutionalizing validation.

Journal ArticleDOI
TL;DR: The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research, and the paper discusses the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors’ collaborative research.
Abstract: Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during the generation of data, the development of algorithms, and the evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS). The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research. These ideas include experimental design principles of randomization and local control as well as the principle of stability to gain reproducibility and interpretability of algorithms and data results. We discuss the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors’ collaborative research.
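As a small illustration of the "R" in PQRS (representativeness of training data), and not a method from the paper, one can compare the distribution of a feature in the training sample against the target population before trusting an algorithm trained on it; the feature and both samples below are simulated.

```python
# Toy illustration of checking representativeness of training data (the "R" in PQRS).
# The feature and the populations are simulated; this is not the paper's method.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Feature values (e.g., driving speeds, patient ages) in the deployment population
population = rng.normal(loc=50, scale=10, size=5000)
# Training data collected under different conditions (shifted distribution)
training = rng.normal(loc=58, scale=10, size=500)

stat, p_value = ks_2samp(training, population)
print(f"KS statistic = {stat:.3f}, p = {p_value:.2g}")
if p_value < 0.01:
    print("Training sample is unlikely to represent the population;"
          " re-sample or re-weight before relying on the model.")
```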

Proceedings ArticleDOI
03 Jan 2018
TL;DR: The research reveals that a tamper-proof process history for improved auditability, automation of manual process steps, and the decentralized nature of the system can be major advantages of a Blockchain solution for cross-organizational workflow management.
Abstract: Bringing Blockchain technology and business process management together, we follow the Design Science Research approach and design, implement, and evaluate a Blockchain prototype for cross-organizational workflow management together with a German bank. For the use case of a documentary letter of credit, we describe the status quo of the process, identify areas of improvement, implement a Blockchain solution, and compare both workflows. The prototype illustrates that the process, which today is paper-based and involves high manual effort, can be significantly improved. Our research reveals that a tamper-proof process history for improved auditability, automation of manual process steps, and the decentralized nature of the system can be major advantages of a Blockchain solution for cross-organizational workflow management. Further, our research provides insights into how Blockchain technology can be used for business process management in general.
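As a minimal illustration of the "tamper-proof process history" idea, each workflow step can be linked to the hash of its predecessor, so any later modification of a recorded step is detectable. This is a plain hash chain in Python, not the actual Blockchain prototype built with the bank, and the letter-of-credit steps are illustrative placeholders.

```python
# Minimal hash-chain sketch of a tamper-evident process history for a
# cross-organizational workflow. This is not the paper's prototype; the
# letter-of-credit steps below are illustrative placeholders.
import hashlib
import json
import time

def add_step(history, actor, action):
    """Append a workflow step whose hash covers the previous step's hash."""
    prev_hash = history[-1]["hash"] if history else "0" * 64
    record = {"actor": actor, "action": action,
              "timestamp": time.time(), "prev_hash": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    history.append(record)

def verify(history):
    """Recompute every hash; any tampered step breaks the chain."""
    prev_hash = "0" * 64
    for record in history:
        body = {k: v for k, v in record.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps({**body, "prev_hash": prev_hash}, sort_keys=True).encode()
        ).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

history = []
add_step(history, "importer_bank", "issue letter of credit")
add_step(history, "exporter", "present shipping documents")
add_step(history, "importer_bank", "release payment")
print("history valid:", verify(history))

history[1]["action"] = "present forged documents"   # tampering attempt
print("history valid after tampering:", verify(history))
```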

Journal ArticleDOI
TL;DR: A meta-heuristic based algorithm for workflow scheduling that considers minimization of makespan and cost and introduces a new factor called cost time equivalence to make the bi-objective optimization more realistic.

Journal ArticleDOI
TL;DR: Signac as discussed by the authors is a framework designed to assist in the integration of various specialized data formats, tools and workflows, simplifying data access and modification through a homogeneous data interface that is largely agnostic to the data source.

Journal ArticleDOI
TL;DR: This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies.
Abstract: This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies. It was developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE program. The structure of the roadmap presented here follows the “life of a paper” workflow and includes the categories Pre-submission, Submission, Production, and Publication. The roadmap is intended to be publisher-agnostic so that all publishers can use this as a starting point when implementing JDDCP-compliant data citation. Authors reading this roadmap will also better know what to expect from publishers and how to enable their own data citations to gain maximum impact, as well as complying with what will become increasingly common funder mandates on data transparency.

Journal ArticleDOI
TL;DR: A case study of a VR-integrated collaboration workflow serves as an example of how firms can overcome the challenge of benefiting from this new technology, and shows that the investment in new hardware and software and resistance to the adoption of new technologies are the main obstacles to its wide adoption.
Abstract: A successful project delivery based on building information modeling (BIM) methods depends on efficient collaboration. This relies mainly on the visualization of a BIM model, which can appear on different mediums. Visualization on mediums such as computer screens lacks some degree of immersion, which may prevent the full utilization of the model. Another problem with conventional collaboration methods, such as the BIM big room, is the need for the physical presence of participants in a room. Virtual reality, as the most immersive medium for visualizing a model, has the promise of becoming a regular part of the construction industry. The virtual presence of collaborators in a VR environment eliminates the need for their physical presence. Simulation of on-site tasks can address a number of issues during construction, such as the feasibility of operations. As consumer VR tools have only recently become available on the market, little research has been done on their actual employment in architecture, engineering and construction (AEC) practices. This paper investigates the application of a VR-based workflow in a real project. The authors collaborated with a software company to evaluate some of its advanced VR software features, such as simulation of an on-site task. A case study of a VR-integrated collaboration workflow serves as an example of how firms can overcome the challenge of benefiting from this new technology. A group of AEC professionals involved in a project were invited to take part in the experiment, utilizing their actual project BIM models. The feedback from the experiment confirmed the supposed benefits of a VR collaboration method. Although the participants of the study were from a wide range of disciplines, they could find benefits of the technology in their practice. The results also showed that an experimental method of clash detection via simulation could actually be practical. The simulation of on-site tasks and the perception of architectural spaces at a 1:1 scale are assets unique to VR application in AEC practices. Nevertheless, the study shows that the investment in new hardware and software and resistance to the adoption of new technologies are the main obstacles to its wide adoption. Further work in the computer industry is required to make these technologies more affordable.

Journal ArticleDOI
TL;DR: An integration attempt is made to provide the details of a workflow, developed and tested experimentally, for construction waste reduction at the design stage; the major finding is that the proposed workflow reduces the volume of post-optimization paneling waste by at least 2%.

Journal ArticleDOI
TL;DR: The design and implementation of the Data Mining Cloud Framework (DMCF), a data analysis system that integrates a visual workflow language and a parallel runtime with the Software-as-a-Service (SaaS) model is described.
Abstract: The extraction of useful information from data is often a complex process that can be conveniently modeled as a data analysis workflow. When very large data sets must be analyzed and/or complex data mining algorithms must be executed, data analysis workflows may take very long times to complete their execution. Therefore, efficient systems are required for the scalable execution of data analysis workflows, by exploiting the computing services of the Cloud platforms where data is increasingly being stored. The objective of the paper is to demonstrate how Cloud software technologies can be integrated to implement an effective environment for designing and executing scalable data analysis workflows. We describe the design and implementation of the Data Mining Cloud Framework (DMCF), a data analysis system that integrates a visual workflow language and a parallel runtime with the Software-as-a-Service (SaaS) model. DMCF was designed taking into account the needs of real data mining applications, with the goal of simplifying the development of data mining applications compared to generic workflow management systems that are not specifically designed for this domain. The result is a high-level environment that, through an integrated visual workflow language, minimizes the programming effort, making it easier for domain experts to use common patterns specifically designed for the development and parallel execution of data mining applications. The DMCF's visual workflow language, system architecture and runtime mechanisms are presented. We also discuss several data mining workflows developed with DMCF and the scalability obtained executing such workflows on a public Cloud.
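For a feel of how a workflow runtime parallelizes independent tasks, the generic sketch below runs each task of a small data-analysis DAG as soon as all of its dependencies have completed, using Python's concurrent.futures. The task names and work are placeholders; this is not DMCF's actual runtime or its visual language.

```python
# Generic sketch of a parallel workflow runtime: run each task as soon as its
# dependencies finish. Task names and work are placeholders, not DMCF itself.
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Workflow DAG: task -> set of tasks it depends on
DAG = {
    "load":    set(),
    "clean":   {"load"},
    "train_a": {"clean"},
    "train_b": {"clean"},
    "compare": {"train_a", "train_b"},
}

def run_task(name):
    time.sleep(0.1)          # placeholder for real data-analysis work
    return name

def execute(dag, max_workers=4):
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(dag):
            # Launch every task whose dependencies are all finished
            for task, deps in dag.items():
                if task not in done and task not in running and deps <= done:
                    running[task] = pool.submit(run_task, task)
            finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
            for fut in finished:
                task = fut.result()
                print("finished:", task)
                done.add(task)
                del running[task]
    return done

execute(DAG)
```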

Journal ArticleDOI
TL;DR: A major SimVascular (SV) release is introduced that includes a new graphical user interface (GUI) designed to improve user experience; major changes to the software platform are described and features added in this new release are outlined.
Abstract: Patient-specific simulation plays an important role in cardiovascular disease research, diagnosis, surgical planning and medical device design, as well as education in cardiovascular biomechanics. SimVascular is an open-source software package encompassing an entire cardiovascular modeling and simulation pipeline from image segmentation, three-dimensional (3D) solid modeling, and mesh generation, to patient-specific simulation and analysis. SimVascular is widely used for cardiovascular basic science and clinical research as well as education, following increased adoption by users and development of a GATEWAY web portal to facilitate educational access. Initial efforts of the project focused on replacing commercial packages with open-source alternatives and adding increased functionality for multiscale modeling, fluid-structure interaction (FSI), and solid modeling operations. In this paper, we introduce a major SimVascular (SV) release that includes a new graphical user interface (GUI) designed to improve user experience. Additional improvements include enhanced data/project management, interactive tools to facilitate user interaction, new boundary condition (BC) functionality, a plug-in mechanism to increase modularity, a new 3D segmentation tool, and new computer-aided design (CAD)-based solid modeling capabilities. Here, we focus on major changes to the software platform and outline features added in this new release. We also briefly describe our recent experiences using SimVascular in the classroom for bioengineering education.

Journal ArticleDOI
TL;DR: A case study based on real-world third-party IaaS clouds and some well-known scientific workflows shows that the proposed approach outperforms traditional approaches, especially those considering time-invariant or bounded performance only.
Abstract: Cloud computing is becoming an increasingly popular platform for the execution of scientific applications such as scientific workflows. In contrast to grids and other traditional high-performance computing systems, clouds provide a customizable infrastructure where scientific workflows can provision desired resources ahead of the execution and set up a required software environment on virtual machines (VMs). Nevertheless, various challenges, especially quality-of-service prediction and optimal scheduling, are yet to be addressed. Existing studies mainly consider workflow tasks to be executed with VMs having time-invariant, stochastic, or bounded performance and focus on minimizing workflow execution time or execution cost while meeting the quality-of-service requirements. This work considers time-varying performance and aims at minimizing the execution cost of workflows deployed on Infrastructure-as-a-Service clouds while satisfying Service-Level Agreements with users. We employ time-series-based approaches to capture dynamic performance fluctuations, feed a genetic algorithm with predicted performance of VMs, and generate schedules at run time. A case study based on real-world third-party IaaS clouds and some well-known scientific workflows shows that our proposed approach outperforms traditional approaches, especially those considering time-invariant or bounded performance only.
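A minimal sketch of the idea of feeding predicted, time-varying VM performance into schedule evaluation: a simple moving-average forecast over invented performance measurements drives a runtime and cost estimate for one task. This is only illustrative; the paper uses proper time-series models feeding a genetic algorithm.

```python
# Sketch: forecast time-varying VM performance from recent measurements and use
# the forecast to estimate the cost of running a task. Measurements are invented;
# the paper uses time-series models whose predictions feed a genetic algorithm.
def moving_average_forecast(history, window=3):
    """Predict the next performance sample as the mean of the last `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Observed relative performance of one VM over the last scheduling intervals
# (1.0 = nominal speed; values below 1.0 mean the VM is running slower).
vm_performance = [1.00, 0.95, 0.80, 0.78, 0.82]

predicted_perf = moving_average_forecast(vm_performance)
nominal_runtime_h = 2.0                      # task runtime at nominal speed
price_per_hour = 0.12                        # invented on-demand price

expected_runtime_h = nominal_runtime_h / predicted_perf
expected_cost = expected_runtime_h * price_per_hour
print(f"predicted performance: {predicted_perf:.2f}")
print(f"expected runtime: {expected_runtime_h:.2f} h, expected cost: ${expected_cost:.3f}")
```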

Journal ArticleDOI
TL;DR: This work argues that the main goal of doing visual analytics is to build a mental and/or formal model of a certain piece of reality reflected in data, and proposes a detailed conceptual framework in which the visual analytics process is considered as a goal‐oriented workflow producing a model as a result.
Abstract: To complement the currently existing definitions and conceptual frameworks of visual analytics, which focus mainly on activities performed by analysts and types of techniques they use, we attempt to define the expected results of these activities. We argue that the main goal of doing visual analytics is to build a mental and/or formal model of a certain piece of reality reflected in data. The purpose of the model may be to understand, to forecast or to control this piece of reality. Based on this model-building perspective, we propose a detailed conceptual framework in which the visual analytics process is considered as a goal-oriented workflow producing a model as a result. We demonstrate how this framework can be used for performing an analytical survey of the visual analytics research field and identifying the directions and areas where further research is needed.

Journal ArticleDOI
TL;DR: A framework is presented for leveraging the growing ubiquity of devices that can be considered part of the Internet of Things (IoT) to inform real-time decision-making on the construction site, providing a practical and sensor-agnostic implementation of operation-level decisions by utilizing IoT networks along with advancements in modeling and simulation tools.

Journal ArticleDOI
TL;DR: A new energy-aware scheduling algorithm for time-constrained workflow tasks is proposed using the DVFS method, in which the host reduces its operating frequency by using different voltage levels; the algorithm performs more efficiently when evaluated on metrics such as energy utilization, average execution time, average resource utilization and average SLA violations.
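A small sketch of the DVFS idea referenced above, with invented voltage/frequency levels and a toy energy model rather than the paper's algorithm: pick the lowest frequency level that still finishes the task before its deadline, since dynamic power grows roughly with the square of the voltage times the frequency.

```python
# Sketch of DVFS-based frequency selection for one deadline-constrained task.
# Voltage/frequency levels and the energy model (P ~ C * V^2 * f) are illustrative;
# this is not the paper's scheduling algorithm.
LEVELS = [
    # (frequency in GHz, voltage in V), sorted from slowest to fastest
    (1.0, 0.9),
    (1.6, 1.0),
    (2.4, 1.2),
]
SWITCHING_CAPACITANCE = 1.0   # arbitrary constant in this toy model

def pick_level(cycles_giga, deadline_s):
    """Lowest-energy (lowest-frequency) level that still meets the deadline."""
    for freq, volt in LEVELS:
        runtime = cycles_giga / freq
        if runtime <= deadline_s:
            energy = SWITCHING_CAPACITANCE * volt**2 * freq * runtime
            return freq, volt, runtime, energy
    return None                                    # no level meets the deadline

print(pick_level(cycles_giga=3.2, deadline_s=4.0))  # the slow, low-voltage level suffices
print(pick_level(cycles_giga=3.2, deadline_s=1.5))  # needs the fastest level
```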