
Showing papers on "Workflow" published in 2016


Proceedings ArticleDOI
01 Jan 2016
TL;DR: Jupyter notebooks are presented: a document format for publishing code, results and explanations in a form that is both readable and executable.
Abstract: It is increasingly necessary for researchers in all fields to write computer code, and in order to reproduce research results, it is important that this code is published. We present Jupyter notebooks, a document format for publishing code, results and explanations in a form that is both readable and executable. We discuss various tools and use cases for notebook documents.
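As a minimal illustration of the notebook document format described above, the following Python sketch uses the nbformat library to assemble a two-cell notebook and save it as an .ipynb file; the cell contents and filename are arbitrary examples, not taken from the paper.

```python
# Minimal sketch: build a small Jupyter notebook programmatically with nbformat.
# The cell contents and output filename are arbitrary illustrative choices.
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

nb = new_notebook()
nb.cells = [
    new_markdown_cell("# Analysis\nA short explanation of the computation below."),
    new_code_cell("x = [1, 2, 3]\nprint(sum(x))"),
]

# Write the notebook as a .ipynb file (JSON on disk), readable and executable in Jupyter.
with open("example.ipynb", "w", encoding="utf-8") as f:
    nbformat.write(nb, f)
```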

2,145 citations



Journal ArticleDOI
TL;DR: An evolutionary multi-objective optimization (EMO)-based algorithm is proposed to solve this workflow scheduling problem on an infrastructure as a service (IaaS) platform and can achieve significantly better solutions than existing state-of-the-art QoS optimization scheduling algorithms in most cases.
Abstract: Cloud computing provides promising platforms for executing large applications, offering enormous computational resources on demand. In a Cloud model, users are charged based on their usage of resources and the required quality of service (QoS) specifications. Although there are many existing workflow scheduling algorithms in traditional distributed or heterogeneous computing environments, they are difficult to apply directly to Cloud environments, since the Cloud differs from traditional heterogeneous environments in its service-based resource management and pay-per-use pricing strategies. In this paper, we highlight such difficulties, and model the workflow scheduling problem, which optimizes both makespan and cost, as a Multi-objective Optimization Problem (MOP) for Cloud environments. We propose an evolutionary multi-objective optimization (EMO)-based algorithm to solve this workflow scheduling problem on an infrastructure as a service (IaaS) platform. Novel schemes for problem-specific encoding and population initialization, fitness evaluation and genetic operators are proposed in this algorithm. Extensive experiments on real-world workflows and randomly generated workflows show that the schedules produced by our evolutionary algorithm exhibit greater stability on most of the workflows with the instance-based IaaS computing and pricing models. The results also show that our algorithm can achieve significantly better solutions than existing state-of-the-art QoS optimization scheduling algorithms in most cases. The conducted experiments are based on the on-demand instance types of Amazon EC2; however, the proposed algorithm can easily be extended to the resources and pricing models of other IaaS services.
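To make the two objectives concrete, the sketch below evaluates a candidate task-to-instance assignment on makespan and monetary cost under a simple instance-based, pay-per-hour pricing model. It is an illustrative toy, not the paper's EMO algorithm: the instance catalogue and workloads are made up, and task dependencies are ignored for brevity.

```python
# Illustrative sketch (not the paper's algorithm): evaluate one candidate schedule
# on the two objectives optimized in the paper, makespan and monetary cost,
# under a simple instance-based, pay-per-hour pricing model.
import math

# Hypothetical instance catalogue: relative speed and price per billed hour.
INSTANCES = {"small": (1.0, 0.10), "large": (2.0, 0.40)}

def evaluate(schedule, task_work):
    """schedule maps task -> instance name; task_work maps task -> work units.
    Tasks assigned to the same instance run sequentially; instances run in parallel.
    Task dependencies are ignored for brevity. Returns (makespan_hours, cost)."""
    busy = {}  # instance name -> accumulated running time in hours
    for task, inst in schedule.items():
        speed, _ = INSTANCES[inst]
        busy[inst] = busy.get(inst, 0.0) + task_work[task] / speed
    makespan = max(busy.values())
    cost = sum(math.ceil(hours) * INSTANCES[inst][1] for inst, hours in busy.items())
    return makespan, cost

work = {"t1": 3.0, "t2": 1.5, "t3": 2.0}
print(evaluate({"t1": "large", "t2": "small", "t3": "small"}, work))
```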

321 citations


Journal ArticleDOI
TL;DR: This work introduces EAGER, a time-efficient pipeline, which greatly simplifies the analysis of large-scale genomic data sets and provides features to preprocess, map, authenticate, and assess the quality of ancient DNA samples.
Abstract: The automated reconstruction of genome sequences in ancient genome analysis is a multifaceted process. Here we introduce EAGER, a time-efficient pipeline, which greatly simplifies the analysis of large-scale genomic data sets. EAGER provides features to preprocess, map, authenticate, and assess the quality of ancient DNA samples. Additionally, EAGER comprises tools to genotype samples to discover, filter, and analyze variants. EAGER encompasses both state-of-the-art tools for each step as well as new complementary tools tailored for ancient DNA data within a single integrated solution in an easily accessible format.

298 citations


Journal ArticleDOI
TL;DR: A comprehensive survey and analysis of state-of-the-art workflow scheduling schemes for simple and scientific workflows in cloud computing, with a classification of the proposed schemes based on the type of scheduling algorithm applied in each scheme.

203 citations


Journal ArticleDOI
TL;DR: CcpNmr version-3, the latest software release from the Collaborative Computational Project for NMR, is presented, designed to be simple, functional and flexible, and aims to ensure that routine tasks can be performed in a straightforward manner.
Abstract: NMR spectroscopy is an indispensably powerful technique for the analysis of biomolecules under ambient conditions, for both structural and functional studies. However, in practice the complexity of the technique has often frustrated its application by non-specialists. In this paper, we present CcpNmr version-3, the latest software release from the Collaborative Computational Project for NMR, for all aspects of NMR data analysis, including liquid- and solid-state NMR data. This software has been designed to be simple, functional and flexible, and aims to ensure that routine tasks can be performed in a straightforward manner. We have designed the software according to modern software engineering principles and leveraged the capabilities of modern graphics libraries to simplify a variety of data analysis tasks. We describe the process of backbone assignment as an example of the flexibility and simplicity of implementing workflows, as well as the toolkit used to create the necessary graphics for this workflow. The package can be downloaded from www.ccpn.ac.uk/v3-software/downloads and is freely available to all non-profit organisations.

184 citations


DOI
08 Jul 2016
TL;DR: The Common Workflow Language (CWL) is designed to express workflows for data-intensive science, such as Bioinformatics, Medical Imaging, Chemistry, Physics, and Astronomy.
Abstract: The Common Workflow Language (CWL) is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. Our goal is to create specifications that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility. CWL builds on technologies such as JSON-LD and Avro for data modeling and Docker for portable runtime environments. CWL is designed to express workflows for data-intensive science, such as Bioinformatics, Medical Imaging, Chemistry, Physics, and Astronomy. This is v1.0 of the CWL tool and workflow specification, released on 2016-07-08.
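For orientation, the following Python sketch writes a minimal CWL v1.0 CommandLineTool description wrapping the Unix echo command; since CWL builds on JSON-LD, the document can be emitted as JSON rather than YAML. The tool and file names are illustrative, and the commented invocation assumes a CWL runner such as cwltool is installed.

```python
# Minimal sketch of a CWL v1.0 CommandLineTool description, emitted as JSON
# (CWL documents are JSON-LD based and may be written in YAML or JSON).
# This wraps the Unix `echo` command; the tool and file names are illustrative.
import json

tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {"type": "string", "inputBinding": {"position": 1}}
    },
    "outputs": [],
}

with open("echo-tool.cwl", "w") as f:
    json.dump(tool, f, indent=2)

# A CWL runner such as cwltool could then execute it, e.g.:
#   cwltool echo-tool.cwl --message "hello"
```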

169 citations


01 Jan 2016
Design and Control of Workflow Processes: Business Process Management for the Service Industry

165 citations


Journal ArticleDOI
TL;DR: The R/Bioconductor package systemPipeR accelerates the extraction of reproducible analysis results from NGS experiments by making efficient use of existing software resources without limiting the user to a set of predefined methods or environments.
Abstract: Next-generation sequencing (NGS) has revolutionized how research is carried out in many areas of biology and medicine. However, the analysis of NGS data remains a major obstacle to the efficient utilization of the technology, as it requires complex multi-step processing of big data, demanding considerable computational expertise from users. While substantial effort has been invested in the development of software dedicated to the individual analysis steps of NGS experiments, insufficient resources are currently available for integrating the individual software components within the widely used R/Bioconductor environment into automated workflows capable of running the analysis of most types of NGS applications from start to finish in a time-efficient and reproducible manner. To address this need, we have developed the R/Bioconductor package systemPipeR. It is an extensible environment for both building and running end-to-end analysis workflows with automated report generation for a wide range of NGS applications. Its unique features include a uniform workflow interface across different NGS applications, automated report generation, and support for running both R and command-line software on local computers and computer clusters. A flexible sample annotation infrastructure efficiently handles complex sample sets and experimental designs. To simplify the analysis of widely used NGS applications, the package provides pre-configured workflows and reporting templates for RNA-Seq, ChIP-Seq, VAR-Seq and Ribo-Seq. Additional workflow templates will be provided in the future. systemPipeR accelerates the extraction of reproducible analysis results from NGS experiments. By combining the capabilities of many R/Bioconductor and command-line tools, it makes efficient use of existing software resources without limiting the user to a set of predefined methods or environments. systemPipeR is freely available for all common operating systems from Bioconductor (http://bioconductor.org/packages/devel/systemPipeR).

159 citations


Journal ArticleDOI
TL;DR: The PPOF and simulation-based workflow help to make generative modeling, informed by powerful energy and lighting simulation engines, more accessible to designers working on regular projects and schedules to create high-performance buildings.

158 citations


Journal ArticleDOI
TL;DR: Current standardization efforts and quality management initiatives from companies, organizations and societies, in the form of published studies and ongoing projects, will exert a decisive influence on the traceability and reproducibility of sequence data.
Abstract: DNA sequencing continues to evolve quickly even after more than 30 years. Many new platforms have suddenly appeared and formerly established systems have vanished in almost the same manner. Since the establishment of next-generation sequencing devices, this progress has gained momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. As a consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we list and summarize current standardization efforts and quality management initiatives from companies, organizations and societies in the form of published studies and ongoing projects. These comprise, on the one hand, quality documentation issues such as technical notes, accreditation checklists and guidelines for the validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps, with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing are crucial. Additionally, these efforts will exert a decisive influence on the traceability and reproducibility of sequence data.

Journal ArticleDOI
01 Apr 2016
TL;DR: An energy consumption model is presented for applications deployed across cloud computing platforms, and a corresponding energy-aware resource allocation algorithm is proposed for virtual machine scheduling to accomplish scientific workflow executions.
Abstract: Scientific workflows are often deployed across multiple cloud computing platforms due to their large-scale characteristic. This can be technically achieved by expanding a cloud platform. However, it is still a challenge to conduct scientific workflow executions in an energy-aware fashion across cloud platforms or even inside a cloud platform, since the cloud platform expansion will make the energy consumption a big concern. In this paper, we propose an Energy-aware Resource Allocation method, named EnReal, to address the above challenge. Basically, we leverage the dynamic deployment of virtual machines for scientific workflow executions. Specifically, an energy consumption model is presented for applications deployed across cloud computing platforms, and a corresponding energy-aware resource allocation algorithm is proposed for virtual machine scheduling to accomplish scientific workflow executions. Experimental evaluation demonstrates that the proposed method is both effective and efficient.
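The paper's EnReal model is not reproduced here; as a generic illustration of how such an energy estimate can be formed, the sketch below applies a linear utilization-based power model, E = (P_idle + (P_max - P_idle) * u) * t, summed over virtual machines. All power figures and workloads are made-up example values.

```python
# Generic illustration (not EnReal's actual model): estimate the energy consumed
# by a set of virtual machines using a linear utilization-based power model,
# E = sum over VMs of (P_idle + (P_max - P_idle) * utilization) * runtime.
# Power figures and workloads below are made-up example values.

def vm_energy_wh(p_idle_w, p_max_w, utilization, runtime_h):
    """Energy in watt-hours for one VM at a constant average utilization."""
    power_w = p_idle_w + (p_max_w - p_idle_w) * utilization
    return power_w * runtime_h

vms = [
    {"util": 0.8, "hours": 2.0},   # compute-heavy workflow task
    {"util": 0.3, "hours": 5.0},   # mostly idle staging VM
]
total = sum(vm_energy_wh(100.0, 250.0, vm["util"], vm["hours"]) for vm in vms)
print(f"estimated energy: {total:.1f} Wh")
```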

Journal ArticleDOI
TL;DR: The Geoscience Paper of the Future (GPF) as discussed by the authors is an approach to fully document, share, and cite all their research products including data, software, and computational provenance.
Abstract: Geoscientists now live in a world rich with digital data and methods, and their computational research cannot be fully captured in traditional publications. The Geoscience Paper of the Future (GPF) presents an approach to fully document, share, and cite all their research products including data, software, and computational provenance. This article proposes best practices for GPF authors to make data, software, and methods openly accessible, citable, and well documented. The publication of digital objects empowers scientists to manage their research products as valuable scientific assets in an open and transparent way that enables broader access by other scientists, students, decision makers, and the public. Improving documentation and dissemination of research will accelerate the pace of scientific discovery by improving the ability of others to build upon published work.

Journal ArticleDOI
TL;DR: This paper extensively surveys existing SWFS approaches in cloud and grid computing and provides a classification of cost optimization aspects and parameters of SWFS, categorized into monetary and temporal cost parameters based on various scheduling stages, to help researchers and practitioners choose the most appropriate cost optimization approach.

Journal ArticleDOI
TL;DR: Using imaging, genetic and healthcare data, examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols are provided.
Abstract: Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be ‘team science’.

Proceedings ArticleDOI
25 Mar 2016
TL;DR: The experiments on OpenStack, a popular open-source cloud infrastructure, show that CloudSeer's efficiency and problem-detection capability are suitable for online monitoring.
Abstract: Cloud infrastructures provide a rich set of management tasks that operate computing, storage, and networking resources in the cloud. Monitoring the executions of these tasks is crucial for cloud providers to promptly find and understand problems that compromise cloud availability. However, such monitoring is challenging because there are multiple distributed service components involved in the executions. This paper presents CloudSeer, an approach that enables effective workflow monitoring. It takes a lightweight, non-intrusive approach that works purely on the interleaved logs that widely exist in cloud infrastructures. CloudSeer first builds an automaton for the workflow of each management task based on normal executions, and then it checks log messages against a set of automata for workflow divergences in a streaming manner. Divergences found during the checking process indicate potential execution problems, which may or may not be accompanied by error log messages. For each potential problem, CloudSeer outputs necessary context information, including the affected task automaton and related log messages hinting at where the problem occurs, to help further diagnosis. Our experiments on OpenStack, a popular open-source cloud infrastructure, show that CloudSeer's efficiency and problem-detection capability are suitable for online monitoring.
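As a much-simplified illustration of the idea (not CloudSeer's implementation), the sketch below checks an interleaved log stream against one hand-written task automaton, expressed as an ordered list of expected log-message patterns, and reports a divergence when the sequence is broken or an error appears. The patterns and log lines are hypothetical.

```python
# Highly simplified sketch of the idea behind CloudSeer (not its implementation):
# each management task has an automaton over expected log-message patterns; the
# interleaved log stream is checked in order and divergences are reported.
import re

# Hypothetical expected sequence of log patterns for one "boot instance" task.
BOOT_TASK = [r"scheduling instance", r"allocating network", r"instance .* booted"]

def check_stream(log_lines, expected=BOOT_TASK):
    state = 0
    for line in log_lines:
        if state < len(expected) and re.search(expected[state], line):
            state += 1
        elif "ERROR" in line:
            return f"divergence: error before step {state}: {line!r}"
    if state < len(expected):
        return f"divergence: task stopped after step {state} of {len(expected)}"
    return "task completed as expected"

logs = ["scheduling instance i-1", "allocating network for i-1", "ERROR timeout on i-1"]
print(check_stream(logs))
```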

Journal ArticleDOI
TL;DR: A real-time workflow fault-tolerant model that extends the traditional PB model by incorporating cloud characteristics is established, and a dynamic fault-tolerant scheduling algorithm, FASTER, is proposed for real-time workflows in the virtualized cloud.
Abstract: Clouds are becoming an important platform for scientific workflow applications. However, with many nodes being deployed in clouds, managing reliability of resources becomes a critical issue, especially for the real-time scientific workflow execution where deadlines should be satisfied. Therefore, fault tolerance in clouds is extremely essential. The PB (primary backup) based scheduling is a popular technique for fault tolerance and has effectively been used in the cluster and grid computing. However, applying this technique for real-time workflows in a virtualized cloud is much more complicated and has rarely been studied. In this paper, we address this problem. We first establish a real-time workflow fault-tolerant model that extends the traditional PB model by incorporating the cloud characteristics. Based on this model, we develop approaches for task allocation and message transmission to ensure faults can be tolerated during the workflow execution. Finally, we propose a dynamic fault-tolerant scheduling algorithm, FASTER, for real-time workflows in the virtualized cloud. FASTER has three key features: 1) it employs a backward shifting method to make full use of the idle resources and incorporates task overlapping and VM migration for high resource utilization, 2) it applies the vertical/horizontal scaling-up technique to quickly provision resources for a burst of workflows, and 3) it uses the vertical scaling-down scheme to avoid unnecessary and ineffective resource changes due to fluctuated workflow requests. We evaluate our FASTER algorithm with synthetic workflows and workflows collected from the real scientific and business applications and compare it with six baseline algorithms. The experimental results demonstrate that FASTER can effectively improve the resource utilization and schedulability even in the presence of node failures in virtualized clouds.

Journal ArticleDOI
TL;DR: The OECD QSAR Toolbox is a software application intended to be used by governments, the chemical industry and other stakeholders in filling gaps in (eco)toxicity data needed for assessing the hazards of chemicals.
Abstract: The OECD QSAR Toolbox is a software application intended to be used by governments, the chemical industry and other stakeholders in filling gaps in (eco)toxicity data needed for assessing the hazards of chemicals. The development and release of the Toolbox is a cornerstone in the computerization of hazard assessment, providing an 'all inclusive' tool for the application of category approaches, such as read-across and trend analysis, in a single software application, free of charge. The Toolbox incorporates theoretical knowledge, experimental data and computational tools from various sources into a logical workflow. The main steps of this workflow are substance identification, identification of relevant structural characteristics and potential toxic mechanisms of interaction (i.e. profiling), identification of other chemicals that have the same structural characteristics and/or mechanism (i.e. building a category), data collection for the chemicals in the category and use of the existing experimental data to fill the data gap(s). The description of the Toolbox workflow and its main functionalities is the scope of the present article.
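As a toy illustration of the read-across step in this workflow (not the Toolbox's actual profilers or databases), the sketch below fills a missing endpoint value for a target chemical with the average of analogues that share the same hypothetical structural profile.

```python
# Toy illustration of the read-across idea used in the Toolbox workflow
# (not its actual profilers or databases): a chemical with a missing endpoint
# value borrows the average of analogues that share the same structural profile.

category = [  # hypothetical analogues sharing one structural profile
    {"name": "analogue A", "profile": "alkyl ester", "ec50_mg_l": 12.0},
    {"name": "analogue B", "profile": "alkyl ester", "ec50_mg_l": 9.5},
    {"name": "analogue C", "profile": "alkyl ester", "ec50_mg_l": 14.2},
]
target = {"name": "target chemical", "profile": "alkyl ester", "ec50_mg_l": None}

def read_across(target, category):
    values = [c["ec50_mg_l"] for c in category
              if c["profile"] == target["profile"] and c["ec50_mg_l"] is not None]
    return sum(values) / len(values) if values else None

print(f"filled data gap: EC50 ~ {read_across(target, category):.1f} mg/L")
```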

Proceedings Article
16 Mar 2016
TL;DR: The design and implementation of Critical Feature Analytics is presented and it is demonstrated that CFA leads to significant improvements in video quality; e.g., 32% less buffering time and 12% higher bitrate than a random decision maker.
Abstract: Many prior efforts have suggested that Internet video Quality of Experience (QoE) could be dramatically improved by using data-driven prediction of video quality for different choices (e.g., CDN or bitrate) to make optimal decisions. However, building such a prediction system is challenging on two fronts. First, the relationships between video quality and observed session features can be quite complex. Second, video quality changes dynamically. Thus, we need a prediction model that is (a) expressive enough to capture these complex relationships and (b) capable of updating quality predictions in near real-time. Unfortunately, several seemingly natural solutions (e.g., simple machine learning approaches and simple network models) fail on one or more fronts. Thus, the potential benefits promised by these prior efforts remain unrealized. We address these challenges and present the design and implementation of Critical Feature Analytics (CFA). The design of CFA is driven by domain-specific insights that video quality is typically determined by a small subset of critical features whose criticality persists over several tens of minutes. This enables a scalable and accurate workflow where we automatically learn critical features for different sessions on coarse-grained timescales, while updating quality predictions in near real-time. Using a combination of a real-world pilot deployment and trace-driven analysis, we demonstrate that CFA leads to significant improvements in video quality; e.g., 32% less buffering time and 12% higher bitrate than a random decision maker.
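As a toy version of the prediction idea (not CFA itself), the sketch below estimates buffering for a new session from recent sessions that match it on a small set of assumed critical features, here CDN and bitrate; all session data are hypothetical.

```python
# Toy sketch of the prediction idea behind CFA (not the actual system): estimate
# quality for a new session from recent sessions that match it on a small set of
# "critical" features, here assumed to be CDN and bitrate.
from statistics import median

CRITICAL = ("cdn", "bitrate")  # assumed critical features for this illustration

history = [  # recent sessions: features plus observed buffering ratio
    {"cdn": "A", "bitrate": 1080, "isp": "X", "buffering": 0.02},
    {"cdn": "A", "bitrate": 1080, "isp": "Y", "buffering": 0.04},
    {"cdn": "B", "bitrate": 1080, "isp": "X", "buffering": 0.12},
]

def predict_buffering(session, history, critical=CRITICAL):
    matches = [h["buffering"] for h in history
               if all(h[f] == session[f] for f in critical)]
    return median(matches) if matches else None

print(predict_buffering({"cdn": "A", "bitrate": 1080, "isp": "Z"}, history))
```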

Journal ArticleDOI
TL;DR: This paper presents Deadline-Budget Constrained Scheduling (DBCS), a heuristic scheduling algorithm with quadratic time complexity that considers two important constraints for QoS-based workflow scheduling: time and cost.

Journal ArticleDOI
TL;DR: This paper proposes a security- and cost-aware scheduling (SCAS) algorithm based on the meta-heuristic optimization technique particle swarm optimization (PSO), with a coding strategy devised to minimize the total workflow execution cost while meeting the deadline and risk rate constraints.

Journal ArticleDOI
19 Mar 2016
TL;DR: A fully data-driven, real-time method is proposed for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge.
Abstract: With the intention of extending the perception and action of surgical staff inside the operating room, the medical community has expressed a growing interest towards context-aware systems. Requiring an accurate identification of the surgical workflow, such systems make use of data from a diverse set of available sensors. In this paper, we propose a fully data-driven and real-time method for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge. We also introduce new validation metrics for assessment of workflow detection. The segmentation and recognition are based on a four-stage process. Firstly, during the learning time, a Surgical Process Model is automatically constructed from data annotations to guide the following process. Secondly, data samples are described using a combination of low-level visual cues and instrument information. Then, in the third stage, these descriptions are employed to train a set of AdaBoost classifiers capable of distinguishing one surgical phase from others. Finally, AdaBoost responses are used as input to a Hidden semi-Markov Model in order to obtain a final decision. On the MICCAI EndoVis challenge laparoscopic dataset we achieved a precision and a recall of 91 % in classification of 7 phases. Compared to the analysis based on one data type only, a combination of visual features and instrument signals allows better segmentation, reduction of the detection delay and discovery of the correct phase order.
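The sketch below illustrates only the per-phase classification stage, using scikit-learn: one-vs-rest AdaBoost classifiers score each time sample, roughly corresponding to the paper's third stage. The hidden semi-Markov smoothing stage is omitted, and the features are random stand-ins for the visual cues and instrument signals.

```python
# Sketch of the per-phase classification stage only (the HSMM smoothing stage is
# omitted): one-vs-rest AdaBoost classifiers score each time sample. Features
# here are random stand-ins for visual cues and instrument-usage signals.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))            # 500 samples, 12 fused features
y = rng.integers(0, 7, size=500)          # 7 surgical phases

clf = OneVsRestClassifier(AdaBoostClassifier(n_estimators=50))
clf.fit(X[:400], y[:400])
scores = clf.decision_function(X[400:])   # per-phase responses, e.g. input to an HSMM
print(scores.shape)                       # (100, 7)
```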

Journal ArticleDOI
TL;DR: A stringent workflow of quality control steps during and after acquisition of T1-weighted images is proposed, which enables researchers dealing with populations that are typically affected by motion artifacts to enhance data quality and maximize sample sizes.
Abstract: In structural magnetic resonance imaging motion artifacts are common, especially when not scanning healthy young adults. It has been shown that motion affects the analysis with automated image-processing techniques (e.g. FreeSurfer). This can bias results. Several developmental and adult studies have found reduced volume and thickness of gray matter due to motion artifacts. Thus, quality control is necessary in order to ensure an acceptable level of quality and to define exclusion criteria of images (i.e. determine participants with most severe artifacts). However, information about the quality control workflow and image exclusion procedure is largely lacking in the current literature and the existing rating systems differ. Here we propose a stringent workflow of quality control steps during and after acquisition of T1-weighted images, which enables researchers dealing with populations that are typically affected by motion artifacts to enhance data quality and maximize sample sizes. As an underlying aim we established a thorough quality control rating system for T1-weighted images and applied it to the analysis of developmental clinical data using the automated processing pipeline FreeSurfer. This hands-on workflow and quality control rating system will aid researchers in minimizing motion artifacts in the final data set, and therefore enhance the quality of structural magnetic resonance imaging studies.

Patent
05 Jan 2016
TL;DR: A system is described that includes a plurality of data collection devices to receive and provide retail store inventory data, rate of sale data, and incoming product inventory data in real time, and a programmed computer coupled to receive the data from the plurality of devices and execute code to generate and prioritize restocking workflow activities as a function of the received data.
Abstract: A system includes a plurality of data collection devices to receive and provide retail store product inventory data, rate of sale data, and incoming product inventory data in real time; and a programmed computer coupled to receive the data from the plurality of data collection devices and execute code to generate and prioritize restocking workflow activities as a function of the received data. The programmed computer further provides the restocking workflow activities to at least one of the data collection devices to direct a worker to restock a product.

Patent
11 Jan 2016
TL;DR: A system and method are presented for accurately and fairly assessing a worker's travel performance by analyzing the worker's voice dialog.
Abstract: Logistical operations (e.g., warehouses) may use a voice-enabled workflow to facilitate the work tasks of a staff (i.e., population) of workers. Typically, it is necessary for a worker to travel from location-to-location to complete assigned work tasks. As such, a worker's time spent travelling often correlates with the worker's overall work performance. Understanding the worker's travel performance is highly desirable, but computing a fair and accurate travel-performance metric is difficult. One reason for this is that the distance a worker travels is often unknown. The present invention embraces a system and method for accurately and fairly assessing a worker's travel performance by analyzing the worker's voice dialog.

Journal ArticleDOI
TL;DR: This paper proposes a meta-heuristic cost effective genetic algorithm that minimizes the execution cost of the workflow while meeting the deadline in cloud computing environment, and develops novel schemes for encoding, population initialization, crossover, and mutations operators of genetic algorithm.
Abstract: Cloud computing is becoming an increasingly admired paradigm that delivers high-performance computing resources over the Internet to solve large-scale scientific problems, but it still has various challenges that need to be addressed to execute scientific workflows. The existing research has mainly focused on minimizing finishing time (makespan) or cost while meeting the quality of service requirements. However, most of it does not consider essential characteristics of the cloud and major issues, such as virtual machine (VM) performance variation and acquisition delay. In this paper, we propose a meta-heuristic cost-effective genetic algorithm that minimizes the execution cost of the workflow while meeting the deadline in a cloud computing environment. We develop novel schemes for the encoding, population initialization, crossover, and mutation operators of the genetic algorithm. Our proposal considers all the essential characteristics of the cloud as well as VM performance variation and acquisition delay. Performance evaluation on some well-known scientific workflows of different sizes, such as Montage, LIGO, CyberShake, and Epigenomics, shows that our proposed algorithm performs better than the current state-of-the-art algorithms.

Patent
06 Jan 2016
TL;DR: A system has a domain expert component library stored on a computer-readable storage device, containing connectable components that create a mobile workflow-based application; a domain expert user interface for assembling components into a workflow sequence; and a developer user interface that receives an intermediate representation of a workflow application based on the workflow sequence and provides a software programming language environment to perform data manipulation changes to the intermediate representation.
Abstract: A system has a domain expert component library stored on a computer readable storage device, the component library containing connectable components that create a mobile workflow based application; a domain expert user interface coupled to the domain expert component library to facilitate assembly of components in a workflow sequence; and a developer user interface coupled to: receive an intermediate representation of a workflow application based on the workflow sequence, and provide a software programming language environment to perform data manipulation changes to the intermediate representation to create the mobile workflow based application.

Patent
05 Jan 2016
TL;DR: In this article, the authors present techniques, software, apparatuses, and systems configured for application development for an application using multiple primary user interfaces (PUIs) in one or more embodiments.
Abstract: Generally discussed herein are techniques, software, apparatuses, and systems configured for application development for an application using multiple primary user interfaces. In one or more embodiments, a method can include receiving data indicating a plurality of workflow activities to be used in an application, each of the workflow activities including data corresponding to a configuration of a view model module and a list of views to be associated with the configuration, receiving data indicating a plurality of primary user interface views to associate with each of the workflow activities, receiving data indicating a connection between two of the workflow activities of the plurality of workflow activities, and producing an application model based on the received data indicating the plurality of workflow activities, the data indicating the connection between the two workflow activities, and the data indicating the plurality of primary user interface views.

Patent
05 Jan 2016
TL;DR: A method is presented that detects an event published to a workflow activity by a voice-based dialog view, where the event indicates a state of asset retrieval; navigates to a built-in asset retrieval work activity; retrieves an asset; and dismisses the workflow activity to revert to the workflow activity associated with the voice-based dialog view.
Abstract: A method includes detecting an event published to a workflow activity by a voice-based dialog view, wherein the event indicates a state of asset retrieval; navigating to a built-in asset retrieval work activity; retrieving an asset; and dismissing the workflow activity to revert to a workflow activity associated with the voice-based dialog view.

Proceedings ArticleDOI
07 May 2016
TL;DR: This work directly incorporates user behavior via clicks gathered automatically from telemetry data related to the actual product use in the field and uses mixed models, a statistical approach that incorporates these clustered workflows to create five representative personas.
Abstract: User Experience (UX) research teams following a user-centered design approach harness personas to better understand a user's workflow by examining that user's behavior, goals, needs, wants, and frustrations. To create target personas, these researchers rely on workflow data from surveys, self-reports, interviews, and user observation. However, this data is not directly related to user behavior, weakly reflects a user's actual workflow in the product, is costly to collect, is limited to a few hundred responses, and is outdated as soon as a persona's workflows evolve. To address these limitations we present a quantitative, bottom-up, data-driven approach to create personas. First, we directly incorporate user behavior via clicks gathered automatically from telemetry data related to the actual product use in the field; since the data collection is automatic, it is also cost effective. Next, we aggregate 35 million clicks from 2400 users into 39,000 clickstreams and then structure them into 10 workflows via hierarchical clustering; we thus base our personas on a large data sample. Finally, we use mixed models, a statistical approach that incorporates these clustered workflows, to create five representative personas; updating our mixed model ensures that these personas remain current. We also validated these personas with our product's user behavior experts to ensure that the workflows and the persona goals represent actual product use.
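As a toy version of the clustering step described above (not the study's pipeline or data), the sketch below groups hypothetical per-user click-frequency profiles into workflow clusters with hierarchical (agglomerative) clustering.

```python
# Toy version of the clustering step described above (not the study's pipeline):
# per-user click-frequency vectors over product commands are grouped into a
# small number of workflow clusters with hierarchical (agglomerative) clustering.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
# Hypothetical data: 200 users x 30 UI commands, counts of clicks per command.
clicks = rng.poisson(lam=2.0, size=(200, 30)).astype(float)
profiles = clicks / clicks.sum(axis=1, keepdims=True)  # normalise to usage profiles

workflows = AgglomerativeClustering(n_clusters=10).fit_predict(profiles)
print(np.bincount(workflows))  # users per workflow cluster
```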