
Showing papers on "Workflow published in 2013"


Proceedings ArticleDOI
23 Feb 2013
TL;DR: This paper outlines a framework that will enable crowd work that is complex, collaborative, and sustainable, and lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.
Abstract: Paid crowd work offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale. But it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework. Can we foresee a future crowd workplace in which we would want our children to participate? This paper frames the major challenges that stand in the way of this goal. Drawing on theory from organizational behavior and distributed computing, as well as direct feedback from workers, we outline a framework that will enable crowd work that is complex, collaborative, and sustainable. The framework lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.

836 citations


Journal ArticleDOI
TL;DR: An update to the Taverna tool suite is provided, highlighting new features and developments in the workbench and the Taverna Server.
Abstract: The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.

724 citations


Journal ArticleDOI
TL;DR: A characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics is provided, based on novel workflow profiling tools that provide detailed information about the various computational tasks that are present in the workflow.

648 citations


Journal ArticleDOI
TL;DR: Two workflow scheduling algorithms are proposed which aim to minimize the workflow execution cost while meeting a deadline; they have a polynomial time complexity, which makes them suitable options for scheduling large workflows in IaaS Clouds.
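Purely as a hedged illustration of the general idea behind deadline-constrained, cost-minimizing scheduling (not the authors' algorithms), the sketch below distributes an overall deadline over a chain of tasks in proportion to their workload and picks the cheapest VM type that still meets each sub-deadline. The task sizes, VM speeds, and prices are invented.

```python
# Illustrative sketch only: greedy deadline-distribution scheduling on a task chain.
# VM types, prices, and task sizes are hypothetical, not taken from the paper.

TASKS = [("t1", 100.0), ("t2", 250.0), ("t3", 80.0)]        # (name, workload in MI)
VM_TYPES = [("small", 10.0, 0.05), ("medium", 25.0, 0.12),  # (name, speed MI/s, $/s)
            ("large", 50.0, 0.30)]
DEADLINE = 30.0  # seconds for the whole chain

def schedule(tasks, vm_types, deadline):
    # Distribute the deadline proportionally to each task's workload.
    total_work = sum(w for _, w in tasks)
    plan, cost = [], 0.0
    for name, work in tasks:
        sub_deadline = deadline * work / total_work
        # Cheapest VM type whose runtime fits the sub-deadline.
        feasible = [(price * work / speed, vm, work / speed)
                    for vm, speed, price in vm_types if work / speed <= sub_deadline]
        if not feasible:
            raise ValueError(f"no VM type meets the sub-deadline of task {name}")
        task_cost, vm, runtime = min(feasible)
        plan.append((name, vm, runtime))
        cost += task_cost
    return plan, cost

if __name__ == "__main__":
    plan, cost = schedule(TASKS, VM_TYPES, DEADLINE)
    for name, vm, runtime in plan:
        print(f"{name} -> {vm} ({runtime:.1f} s)")
    print(f"total cost: ${cost:.2f}")
```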

580 citations


Journal ArticleDOI
TL;DR: Reflex is a specific implementation of astronomical scientific workflows within the Kepler workflow engine, and the overall design choices and methods can also be applied to other environments for running automated science workflows.
Abstract: Data from complex modern astronomical instruments often consist of a large number of different science and calibration files, and their reduction requires a variety of software tools. The execution chain of the tools represents a complex workflow that needs to be tuned and supervised, often by individual researchers that are not necessarily experts for any specific instrument. The efficiency of data reduction can be improved by using automatic workflows to organise data and execute the sequence of data reduction steps. To realize such efficiency gains, we designed a system that allows intuitive representation, execution and modification of the data reduction workflow, and has facilities for inspection and interaction with the data. The European Southern Observatory (ESO) has developed Reflex, an environment to automate data reduction workflows. Reflex is implemented as a package of customized components for the Kepler workflow engine. Kepler provides the graphical user interface to create an executable flowchart-like representation of the data reduction process. Key features of Reflex are a rule-based data organiser, infrastructure to re-use results, thorough book-keeping, data progeny tracking, interactive user interfaces, and a novel concept to exploit information created during data organisation for the workflow execution. Reflex includes novel concepts to increase the efficiency of astronomical data processing. While Reflex is a specific implementation of astronomical scientific workflows within the Kepler workflow engine, the overall design choices and methods can also be applied to other environments for running automated science workflows.

574 citations


Journal ArticleDOI
TL;DR: Reflex as discussed by the authors is an environment to automate data reduction workflows for astronomical data processing, which includes a rule-based data organiser, infrastructure to re-use results, thorough book-keeping, data progeny tracking, interactive user interfaces, and a novel concept to exploit information created during data organisation for the workflow execution.
Abstract: Context. Data from complex modern astronomical instruments often consist of a large number of different science and calibration files, and their reduction requires a variety of software tools. The execution chain of the tools represents a complex workflow that needs to be tuned and supervised, often by individual researchers that are not necessarily experts for any specific instrument. Aims. The efficiency of data reduction can be improved by using automatic workflows to organise data and execute a sequence of data reduction steps. To realize such efficiency gains, we designed a system that allows intuitive representation, execution and modification of the data reduction workflow, and has facilities for inspection and interaction with the data. Methods. The European Southern Observatory (ESO) has developed Reflex, an environment to automate data reduction workflows. Reflex is implemented as a package of customized components for the Kepler workflow engine. Kepler provides the graphical user interface to create an executable flowchart-like representation of the data reduction process. Key features of Reflex are a rule-based data organiser, infrastructure to re-use results, thorough book-keeping, data progeny tracking, interactive user interfaces, and a novel concept to exploit information created during data organisation for the workflow execution. Results. Automated workflows can greatly increase the efficiency of astronomical data reduction. In Reflex, workflows can be run non-interactively as a first step. Subsequent optimization can then be carried out while transparently re-using all unchanged intermediate products. We found that such workflows enable the reduction of complex data by non-expert users and minimize mistakes due to book-keeping errors. Conclusions. Reflex includes novel concepts to increase the efficiency of astronomical data processing. While Reflex is a specific implementation of astronomical scientific workflows within the Kepler workflow engine, the overall design choices and methods can also be applied to other environments for running automated science workflows.
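Reflex itself is built on Kepler and ESO's pipeline recipes; the snippet below is only a schematic Python sketch of what a rule-based data organiser does, grouping raw frames by header keywords into categories a workflow can consume. The keyword names and rules are invented for illustration and are not Reflex's implementation.

```python
# Schematic sketch of a rule-based data organiser (not Reflex's implementation).
# Each "header" is a dict of keyword/value pairs; the rules and keywords are invented.

RULES = [
    ("BIAS",    lambda h: h.get("OBJECT") == "BIAS"),
    ("FLAT",    lambda h: h.get("OBJECT") == "FLAT" and h.get("LAMP") == "ON"),
    ("SCIENCE", lambda h: h.get("OBJECT") not in ("BIAS", "FLAT")),
]

def organise(headers):
    """Group frames into categories using the first matching rule."""
    groups = {name: [] for name, _ in RULES}
    for filename, header in headers.items():
        for name, rule in RULES:
            if rule(header):
                groups[name].append(filename)
                break
    return groups

if __name__ == "__main__":
    frames = {
        "f001.fits": {"OBJECT": "BIAS"},
        "f002.fits": {"OBJECT": "FLAT", "LAMP": "ON"},
        "f003.fits": {"OBJECT": "NGC1365", "EXPTIME": 300},
    }
    for category, files in organise(frames).items():
        print(category, files)
```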

569 citations


Journal ArticleDOI
16 Apr 2013
TL;DR: The aim of this paper is to show how MITK evolved into a software system that is able to cover all steps of a clinical workflow including data retrieval, image analysis, diagnosis, treatment planning, intervention support, and treatment control.
Abstract: The Medical Imaging Interaction Toolkit (MITK) has been available as open-source software for almost 10 years now. In this period the requirements of software systems in the medical image processing domain have become increasingly complex. The aim of this paper is to show how MITK evolved into a software system that is able to cover all steps of a clinical workflow including data retrieval, image analysis, diagnosis, treatment planning, intervention support, and treatment control. MITK provides modularization and extensibility on different levels. In addition to the original toolkit, a module system, micro services for small, system-wide features, a service-oriented architecture based on the Open Services Gateway initiative (OSGi) standard, and an extensible and configurable application framework allow MITK to be used, extended and deployed as needed. A refined software process was implemented to deliver high-quality software, ease the fulfillment of regulatory requirements, and enable teamwork in mixed-competence teams. MITK has been applied by a worldwide community and integrated into a variety of solutions, either at the toolkit level or as an application framework with custom extensions. The MITK Workbench has been released as a highly extensible and customizable end-user application. Optional support for tool tracking, image-guided therapy, diffusion imaging as well as various external packages (e.g. CTK, DCMTK, OpenCV, SOFA, Python) is available. MITK has also been used in several FDA/CE-certified applications, which demonstrates the high-quality software and rigorous development process. MITK provides a versatile platform with a high degree of modularization and interoperability and is well suited to meet the challenging tasks of today’s and tomorrow’s clinically motivated research.

359 citations


Journal ArticleDOI
TL;DR: The design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters is presented.

319 citations


Journal ArticleDOI
TL;DR: It is outlined how microscopy images can be converted into a data representation suitable for machine learning, and various state-of-the-art machine-learning algorithms are introduced, highlighting recent applications in image-based screening.
Abstract: Recent advances in microscope automation provide new opportunities for high-throughput cell biology, such as image-based screening. Highly complex image analysis tasks often make the implementation of static and predefined processing rules a cumbersome effort. Machine-learning methods, instead, seek to use intrinsic data structure, as well as the expert annotations of biologists, to infer models that can be used to solve versatile data analysis tasks. Here, we explain how machine-learning methods work and what needs to be considered for their successful application in cell biology. We outline how microscopy images can be converted into a data representation suitable for machine learning, and then introduce various state-of-the-art machine-learning algorithms, highlighting recent applications in image-based screening. Our Commentary aims to provide the biologist with a guide to the application of machine learning to microscopy assays and we therefore include extensive discussion on how to optimize experimental workflow as well as the data analysis pipeline.
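As a minimal, hedged illustration of the pipeline the Commentary describes (images converted to feature vectors, then used to train a classifier), the sketch below extracts trivial intensity features from synthetic images with NumPy and fits a scikit-learn random forest. Real screens would use proper segmentation, richer morphological features, and curated annotations.

```python
# Minimal sketch: turn images into feature vectors and train a classifier.
# Synthetic data only; real image-based screens need segmentation and richer features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def features(image):
    # Toy representation: mean, standard deviation, and fraction of bright pixels.
    return [image.mean(), image.std(), (image > 0.8).mean()]

# Two fake phenotype classes with slightly different intensity statistics.
images = [rng.random((64, 64)) * (0.8 if label == 0 else 1.0)
          for label in range(2) for _ in range(100)]
labels = [0] * 100 + [1] * 100

X = np.array([features(img) for img in images])
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```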

296 citations


Journal ArticleDOI
TL;DR: The hierarchical scheduling strategy is being implemented in the SwinDeW-C cloud workflow system and demonstrates satisfactory performance, and the experimental results show that the overall performance of the ACO-based scheduling algorithm is better than the others on three basic measurements: the optimisation rate on makespan, the optimisation rate on cost and the CPU time.
Abstract: A cloud workflow system is a type of platform service which facilitates the automation of distributed applications based on the novel cloud infrastructure. One of the most important aspects which differentiate a cloud workflow system from its other counterparts is the market-oriented business model. This is a significant innovation which brings many challenges to conventional workflow scheduling strategies. To investigate such an issue, this paper proposes a market-oriented hierarchical scheduling strategy in cloud workflow systems. Specifically, the service-level scheduling deals with the Task-to-Service assignment where tasks of individual workflow instances are mapped to cloud services in the global cloud markets based on their functional and non-functional QoS requirements; the task-level scheduling deals with the optimisation of the Task-to-VM (virtual machine) assignment in local cloud data centres, where the overall running cost of cloud workflow systems will be minimised given the satisfaction of QoS constraints for individual tasks. Based on our hierarchical scheduling strategy, a package-based random scheduling algorithm is presented as the candidate service-level scheduling algorithm and three representative metaheuristic-based scheduling algorithms including genetic algorithm (GA), ant colony optimisation (ACO), and particle swarm optimisation (PSO) are adapted, implemented and analysed as the candidate task-level scheduling algorithms. The hierarchical scheduling strategy is being implemented in our SwinDeW-C cloud workflow system and is demonstrating satisfactory performance. Meanwhile, the experimental results show that the overall performance of the ACO-based scheduling algorithm is better than the others on three basic measurements: the optimisation rate on makespan, the optimisation rate on cost and the CPU time.
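As a schematic, hedged sketch of the two-level structure described above (not the paper's GA/ACO/PSO implementations), the code below does a package-style random Task-to-Service assignment among QoS-feasible services, followed by a greedy Task-to-VM placement on the cheapest VM with spare capacity. Services, VMs, prices, and latency bounds are invented.

```python
# Schematic two-level scheduling sketch (not the paper's algorithms).
# Level 1: random choice of a QoS-feasible service per task.
# Level 2: greedy placement on the cheapest VM with spare capacity in that service's data centre.
import random

random.seed(1)

SERVICES = {                      # service -> (offered latency bound, data centre)
    "svcA": (5.0, "dc1"),
    "svcB": (2.0, "dc2"),
}
VMS = {                           # data centre -> list of (vm, $/task, capacity)
    "dc1": [("vm1", 0.10, 2), ("vm2", 0.20, 2)],
    "dc2": [("vm3", 0.15, 3)],
}
TASKS = [("t1", 6.0), ("t2", 3.0), ("t3", 5.5)]   # (task, required latency bound)

def service_level(tasks):
    """Task-to-Service: random choice among QoS-feasible services."""
    return {t: random.choice([s for s, (lat, _) in SERVICES.items() if lat <= bound])
            for t, bound in tasks}

def task_level(assignment):
    """Task-to-VM: cheapest VM with remaining capacity in the chosen service's data centre."""
    remaining = {vm: cap for vms in VMS.values() for vm, _, cap in vms}
    placement = {}
    for task, service in assignment.items():
        dc = SERVICES[service][1]
        vm, price, _ = min((v for v in VMS[dc] if remaining[v[0]] > 0), key=lambda v: v[1])
        remaining[vm] -= 1
        placement[task] = (service, vm, price)
    return placement

if __name__ == "__main__":
    for task, (service, vm, price) in task_level(service_level(TASKS)).items():
        print(task, "->", service, vm, f"${price:.2f}")
```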

277 citations


Proceedings ArticleDOI
27 Apr 2013
TL;DR: Cascade is an automated workflow that allows crowd workers to spend as little as 20 seconds each while collectively making a taxonomy, and it is shown that on three datasets its quality is 80-90% of that of experts.
Abstract: Taxonomies are a useful and ubiquitous way of organizing information. However, creating organizational hierarchies is difficult because the process requires a global understanding of the objects to be categorized. Usually one is created by an individual or a small group of people working together for hours or even days. Unfortunately, this centralized approach does not work well for the large, quickly changing datasets found on the web. Cascade is an automated workflow that allows crowd workers to spend as little as 20 seconds each while collectively making a taxonomy. We evaluate Cascade and show that on three datasets its quality is 80-90% of that of experts. Cascade has a competitive cost to expert information architects, despite taking six times more human labor. Fortunately, this labor can be parallelized such that Cascade will run in as fast as four minutes instead of hours or days.
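As a purely schematic sketch (the phase names, voting rule, and simulated workers below are assumptions, not a reproduction of Cascade), a crowd taxonomy workflow of this kind can be pictured as small task types chained together: workers suggest category labels, votes pick the best label, and items are placed under the winning labels.

```python
# Schematic crowd-taxonomy workflow; phase names and rules are illustrative assumptions.
from collections import Counter

def generate(item, workers):
    """Phase 1: each worker suggests a category label for an item."""
    return [w(item) for w in workers]

def select_best(suggestions):
    """Phase 2: majority vote picks the winning label."""
    return Counter(suggestions).most_common(1)[0][0]

def categorize(items, workers):
    """Phase 3: place every item under its winning label."""
    taxonomy = {}
    for item in items:
        label = select_best(generate(item, workers))
        taxonomy.setdefault(label, []).append(item)
    return taxonomy

if __name__ == "__main__":
    # Simulated workers: trivial keyword heuristics standing in for human judgement.
    workers = [
        lambda item: "animal" if item in {"cat", "dog", "sparrow"} else "other",
        lambda item: "animal" if item in {"cat", "dog"} else "other",
        lambda item: "other",
    ]
    print(categorize(["cat", "dog", "sparrow", "table"], workers))
```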

Journal ArticleDOI
TL;DR: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats, which supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics.
Abstract: Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. Availability: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl.
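The OWL and OBO releases above can be queried with standard ontology tooling; as one hedged example (the owlready2 library and the label query are my assumptions, not something the article prescribes), the snippet below loads the OWL release and searches concept labels.

```python
# Hedged sketch: loading and querying EDAM with owlready2 (library choice is an assumption).
from owlready2 import get_ontology

# Load the published OWL release directly from its URL.
edam = get_ontology("http://edamontology.org/EDAM.owl").load()

# Find concepts whose label mentions "alignment" and print a few of them.
hits = edam.search(label="*alignment*")
for concept in hits[:10]:
    print(concept.iri, concept.label)
```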

01 Jan 2013
TL;DR: This paper presents a framework for computing activity deadlines so that the overall process deadline is met and all external time constraints are satisfied.
Abstract: Time management is a critical component of workflow-based process management. Important aspects of time management include planning of workflow process execution in time, estimating workflow execution duration, avoiding deadline violations, and satisfying all external time constraints such as fixed-date constraints and upper and lower bounds for time intervals between activities. In this paper, we present a framework for computing activity deadlines so that the overall process deadline is met and all external time constraints are satisfied.
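The framework itself is not reproduced here; as a minimal sketch of the underlying idea (a pass that turns an overall process deadline into per-activity deadlines), consider a simple activity chain with estimated durations. The numbers and the proportional-slack rule are illustrative assumptions.

```python
# Minimal sketch: derive per-activity deadlines from an overall process deadline
# on a simple activity chain. Durations and the slack rule are illustrative.

ACTIVITIES = [("receive", 2.0), ("review", 5.0), ("approve", 1.0)]  # (name, expected hours)
PROCESS_DEADLINE = 12.0  # hours from process start

def activity_deadlines(activities, process_deadline):
    total = sum(d for _, d in activities)
    slack = process_deadline - total
    if slack < 0:
        raise ValueError("overall deadline cannot be met even without waiting times")
    deadlines, clock = [], 0.0
    for name, duration in activities:
        # Distribute the slack proportionally to each activity's duration.
        clock += duration + slack * duration / total
        deadlines.append((name, clock))
    return deadlines

if __name__ == "__main__":
    for name, deadline in activity_deadlines(ACTIVITIES, PROCESS_DEADLINE):
        print(f"{name}: finish by hour {deadline:.1f}")
```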

Journal ArticleDOI
TL;DR: The present companion articles were designed to provide a short, practically oriented introduction to the concepts of design-based stereology and recommendations for choosing the most appropriate methods to investigate a number of important disease models.
Abstract: The growing awareness of the importance of accurate morphometry in lung research has recently motivated the publication of guidelines set forth by a combined task force of the American Thoracic Society and the European Respiratory Society (20). This official ATS/ERS Research Policy Statement provides general recommendations on which stereological methods are to be used in quantitative microscopy of the lung. However, to integrate stereology into a particular experimental study design, investigators are left with the problem of how to implement this in practice. Specifically, different animal models of human lung disease require the use of different stereological techniques and may determine the mode of lung fixation, tissue processing, preparation of sections, and other things. Therefore, the present companion articles were designed to provide a short, practically oriented introduction to the concepts of design-based stereology (Part 1) and to provide recommendations for choosing the most appropriate methods to investigate a number of important disease models (Part 2). Worked examples with illustrative images will facilitate the practical performance of equivalent analyses. Study algorithms provide comprehensive surveys to ensure that no essential step gets lost during the multistage workflow. Thus, with this review, we hope to close the gap between theory and practice and enhance the use of stereological techniques in pulmonary research.

Journal ArticleDOI
TL;DR: The OpenMOLE DSL is presented through the example of a toy model exploration and through the automated calibration of a real-world complex-system model in the field of geography.

Journal ArticleDOI
TL;DR: This work advances the idea of service-oriented modeling by presenting a design for a modeling service that builds from the Open Geospatial Consortium Web Processing Service (WPS) protocol, and demonstrates how the WPS protocol can be used to create modeling services, and how these modeling services can be brought into workflow environments using generic client-side code.
Abstract: Environmental modeling often requires the use of multiple data sources, models, and analysis routines coupled into a workflow to answer a research question. Coupling these computational resources can be accomplished using various tools, each requiring the developer to follow a specific protocol to ensure that components are linkable. Despite these coupling tools, it is not always straightforward to create a modeling workflow due to platform dependencies, computer architecture requirements, and programming language incompatibilities. A service-oriented approach that enables individual models to operate and interact with others using web services is one method for overcoming these challenges. This work advances the idea of service-oriented modeling by presenting a design for a modeling service that builds from the Open Geospatial Consortium (OGC) Web Processing Service (WPS) protocol. We demonstrate how the WPS protocol can be used to create modeling services, and then demonstrate how these modeling services can be brought into workflow environments using generic client-side code. We implemented this approach within the HydroModeler environment, a model coupling tool built on the Open Modeling Interface standard (version 1.4), and show how a hydrology model can be hosted as a WPS web service and used within a client-side workflow. The primary advantage of this approach is that the server-side software follows an established standard that can be leveraged and reused within multiple workflow environments and decision support systems.
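To make the client side of such a service concrete, here is a hedged sketch using plain HTTP against the standard OGC WPS 1.0.0 key-value-pair interface; the server URL and process identifier are hypothetical placeholders, and this is not the paper's HydroModeler client code.

```python
# Hedged sketch of a WPS client interaction using plain HTTP (OGC WPS 1.0.0 KVP).
# The server URL and process identifier are hypothetical placeholders.
import requests
import xml.etree.ElementTree as ET

WPS_URL = "http://example.org/wps"          # hypothetical WPS endpoint
OWS = "{http://www.opengis.net/ows/1.1}"    # OWS namespace used in WPS responses

# 1. Ask the service which processes it offers.
caps = requests.get(WPS_URL, params={"service": "WPS", "request": "GetCapabilities"})
root = ET.fromstring(caps.content)
identifiers = [el.text for el in root.iter(OWS + "Identifier")]
print("processes offered:", identifiers)

# 2. Ask for the inputs/outputs of one process before building an Execute request.
desc = requests.get(WPS_URL, params={
    "service": "WPS",
    "request": "DescribeProcess",
    "version": "1.0.0",
    "identifier": "HydrologyModel",         # hypothetical process name
})
print(desc.status_code, len(desc.content), "bytes of process description")
```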

Journal ArticleDOI
TL;DR: This study proposes a new approach for multi-objective workflow scheduling in clouds, and presents the hybrid PSO algorithm to optimize the scheduling performance, based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption.
Abstract: We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.
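As a rough worked illustration of the DVFS trade-off described above (not the paper's model or its hybrid PSO), dynamic power is commonly approximated as proportional to V²f, so a lower voltage/frequency level lengthens a task's runtime but reduces its energy. The voltage/frequency pairs and the workload below are invented.

```python
# Illustrative DVFS trade-off: energy ~ V^2 * f * time, with time ~ work / f.
# Voltage/frequency pairs and the workload are invented for illustration.

DVFS_LEVELS = [(1.2, 2.0), (1.0, 1.4), (0.8, 0.8)]   # (voltage V, frequency GHz)
WORK = 10.0                                           # task size in "GHz-seconds"

def runtime(work, freq):
    return work / freq

def energy(work, volt, freq, k=1.0):
    # Dynamic power ~ k * V^2 * f; energy = power * runtime.
    return k * volt ** 2 * freq * runtime(work, freq)

if __name__ == "__main__":
    for volt, freq in DVFS_LEVELS:
        print(f"V={volt} V, f={freq} GHz: "
              f"runtime {runtime(WORK, freq):.1f} s, energy {energy(WORK, volt, freq):.2f} J")
```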

Journal ArticleDOI
01 Dec 2013
TL;DR: A novel heuristic is proposed to find a feasible plan for the execution of a workflow, allowing providers to decide whether they can agree with the specific constraints set by the user; it is evaluated using simulation with four different real-world workflow applications.
Abstract: In this paper, we assume an environment with multiple, heterogeneous resources, which provide services of different capabilities and of a different cost. Users want to make use of these services to execute a workflow application, within a certain deadline and budget. The problem considered in this paper is to find a feasible plan for the execution of the workflow which would allow providers to decide whether they can agree with the specific constraints set by the user. If they agree to admit the workflow, providers can allocate services for its execution in a way that both deadline and budget constraints are met while account is also taken of the existing load in the provider's environment (confirmed reservations from other users whose requests have been accepted). A novel heuristic is proposed and evaluated using simulation with four different real-world workflow applications.

Journal ArticleDOI
TL;DR: Significant improvements in TMH have rapidly reduced obstacles for its use, and it is important to grow and disseminate data underscoring the promise and effectiveness of TMH, integrate videoconferencing capabilities into electronic medical record platforms, expand TMH reimbursement, and modify licensure standards.
Abstract: Many providers are hesitant to use telemental health technologies. When providers are queried, various barriers are presented, such as the clinician's skepticism about the effectiveness of telemental health (TMH), viewing telehealth technologies as inconvenient, or reporting difficulties with medical reimbursement. Provider support for TMH is critical to its diffusion because clinicians often serve as the initial gatekeepers to telehealth implementation and program success. In this article, we address provider concerns in three broad domains: (1) personal barriers, (2) clinical workflow and technology barriers, and (3) licensure, credentialing, and reimbursement barriers. We found evidence that, although many barriers have been discussed in the literature for years, advancements in TMH have rapidly reduced obstacles for its use. Improvements include extensive opportunities for training, a growing evidence base supporting positive TMH outcomes, and transformations in technologies that improve prov

Journal ArticleDOI
TL;DR: A dynamic critical‐path‐based adaptive workflow scheduling algorithm for grids, which determines efficient mapping of workflow tasks to grid resources dynamically by calculating the critical path in the workflow task graph at every step is proposed.
Abstract: Effective scheduling is a key concern for the execution of performance-driven grid applications such as workflows. In this paper, we first define the workflow scheduling problem and describe the existing heuristic-based and metaheuristic-based workflow scheduling strategies in grids. Then, we propose a dynamic critical-path-based adaptive workflow scheduling algorithm for grids, which determines efficient mapping of workflow tasks to grid resources dynamically by calculating the critical path in the workflow task graph at every step. Using simulation, we compared the performance of the proposed approach with that of the existing approaches discussed in this paper, for different types and sizes of workflows. The results demonstrate that the heuristic-based scheduling techniques can adapt to the dynamic nature of resources and avoid performance degradation in dynamically changing grid environments. Finally, we outline a hybrid heuristic combining the features of the proposed adaptive scheduling technique with metaheuristics for optimizing execution cost and time as well as meeting the users' requirements to efficiently manage the dynamism and heterogeneity of the hybrid cloud environment.
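To make the core step concrete, the sketch below computes the critical (longest) path of a small task DAG with illustrative runtimes; the full algorithm in the paper recomputes this as tasks complete and resource estimates change, which is not reproduced here.

```python
# Sketch: longest (critical) path in a workflow task DAG; runtimes are illustrative.
from functools import lru_cache

RUNTIME = {"a": 3, "b": 2, "c": 4, "d": 1}
EDGES = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}   # task -> successors

@lru_cache(maxsize=None)
def longest_from(task):
    """Length and route of the longest path starting at `task` (including its own runtime)."""
    best_len, best_path = 0, []
    for succ in EDGES[task]:
        length, path = longest_from(succ)
        if length > best_len:
            best_len, best_path = length, path
    return RUNTIME[task] + best_len, [task] + best_path

if __name__ == "__main__":
    length, path = max(longest_from(t) for t in RUNTIME)
    print("critical path:", " -> ".join(path), f"(length {length})")
```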

Proceedings ArticleDOI
18 May 2013
TL;DR: Seahawk is an Eclipse plugin that supports an integrated and largely automated approach to assist programmers using Stack Overflow, and formulates queries automatically from the active context in the IDE, presents a ranked and interactive list of results, and lets users import code samples in discussions through drag & drop.
Abstract: Services, such as Stack Overflow, offer a web platform to programmers for discussing technical issues, in the form of Questions and Answers (Q&A). Since Q&A services store the discussions, the generated “crowd knowledge” can be accessed and consumed by a large audience for a long time. Nevertheless, Q&A services are detached from the development environments used by programmers: Developers have to tap into this crowd knowledge through web browsers and cannot smoothly integrate it into their workflow. This situation hinders part of the benefits of Q&A services. To better leverage the crowd knowledge of Q&A services, we created Seahawk, an Eclipse plugin that supports an integrated and largely automated approach to assist programmers using Stack Overflow. Seahawk formulates queries automatically from the active context in the IDE, presents a ranked and interactive list of results, lets users import code samples in discussions through drag & drop and link Stack Overflow discussions and source code persistently as a support for teamwork. Video Demo URL: http://youtu.be/DkqhiU9FYPI.
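Seahawk itself is an Eclipse/Java plugin; purely to illustrate the kind of query it builds from the IDE context, the hedged Python sketch below hits the public Stack Exchange search API with terms extracted from a code snippet. The term-extraction heuristic is an invented stand-in for Seahawk's context analysis.

```python
# Hedged sketch: querying Stack Overflow's public API with terms taken from code context.
# The term-extraction heuristic is an assumption, not Seahawk's actual analysis.
import re
import requests

def context_terms(code, limit=5):
    """Crude heuristic: most frequent identifiers in the active editor's code."""
    words = re.findall(r"[A-Za-z_][A-Za-z_0-9]{3,}", code)
    ranked = sorted(set(words), key=words.count, reverse=True)
    return " ".join(ranked[:limit])

def search_stackoverflow(query):
    resp = requests.get(
        "https://api.stackexchange.com/2.3/search/advanced",
        params={"order": "desc", "sort": "relevance", "q": query, "site": "stackoverflow"},
    )
    return [item["title"] for item in resp.json().get("items", [])]

if __name__ == "__main__":
    snippet = "BufferedReader reader = new BufferedReader(new FileReader(path));"
    for title in search_stackoverflow(context_terms(snippet))[:5]:
        print(title)
```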

Journal ArticleDOI
TL;DR: KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform; it allows for efficient cross-vendor structural cheminformatics, and its ease of use and modularity enable researchers to automate routine tasks and data analysis.
Abstract: Background: Cheminformaticians have to routinely process and analyse libraries of small molecules. Among other things, that includes the standardization of molecules, calculation of various descriptors, visualisation of molecular structures, and downstream analysis. For this purpose, scientific workflow platforms such as the Konstanz Information Miner can be used if provided with the right plug-in. A workflow-based cheminformatics tool provides the advantage of ease-of-use and interoperability between complementary cheminformatics packages within the same framework, hence facilitating the analysis process. Results: KNIME-CDK comprises functions for molecule conversion to/from common formats, generation of signatures, fingerprints, and molecular properties. It is based on the Chemistry Development Toolkit and uses the Chemical Markup Language for persistence. A comparison with the cheminformatics plug-in RDKit shows that KNIME-CDK supports a similar range of chemical classes and adds new functionality to the framework. We describe the design and integration of the plug-in, and demonstrate the usage of the nodes on ChEBI, a library of small molecules of biological interest. Conclusions: KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform. KNIME-CDK is built on top of the open-source Chemistry Development Toolkit and allows for efficient cross-vendor structural cheminformatics. Its ease-of-use and modularity enable researchers to automate routine tasks and data analysis, bringing complementary cheminformatics functionality to the workflow environment.
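KNIME-CDK exposes this functionality as KNIME nodes backed by the Java CDK library. As a rough, hedged illustration of the same kind of per-molecule operations (format parsing, property calculation, fingerprints), here is an RDKit-based Python sketch; RDKit is mentioned in the article only as a comparison plug-in, and this is not the KNIME-CDK API.

```python
# Hedged sketch of typical per-molecule cheminformatics operations using RDKit
# (RDKit is only the comparison plug-in in the article; this is not the KNIME-CDK API).
from rdkit import Chem
from rdkit.Chem import Descriptors, AllChem

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]   # ethanol, benzene, aspirin

for smi in smiles:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:                      # skip unparsable structures
        continue
    weight = Descriptors.MolWt(mol)      # a simple molecular property
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
    print(f"{smi}: MW={weight:.1f}, bits set={fp.GetNumOnBits()}")
```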

Journal ArticleDOI
TL;DR: This work applies knowledge management (KM) to guarantee SLAs and low resource wastage in Clouds, designing and implementing two methods, Case-Based Reasoning (CBR) and a rule-based approach, which prove their feasibility as KM techniques and show major improvements towards CBR.

Posted Content
TL;DR: This research builds a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sourced gazetteer entries and introduces a provenance-based trust model for quality assurance.
Abstract: Traditional gazetteers are built and maintained by authoritative mapping agencies. In the age of Big Data, it is possible to construct gazetteers in a data-driven approach by mining rich volunteered geographic information (VGI) from the Web. In this research, we build a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sourced gazetteer entries. Using experiments based on geotagged datasets in Flickr, we find that the MapReduce-based workflow running on the spatially enabled Hadoop cluster can reduce the processing time compared with traditional desktop-based operations by an order of magnitude. We demonstrate how to use such a novel spatial-computing infrastructure to facilitate gazetteer research. In addition, we introduce a provenance-based trust model for quality assurance. This work offers new insights on enriching future gazetteers with the use of Hadoop clusters, and makes contributions in connecting GIS to the cloud computing environment for the next frontier of Big Geo-Data analytics.
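The platform described above runs on a spatially enabled Hadoop cluster; as a minimal, hedged illustration of the MapReduce style of geoprocessing (not the authors' workflow), the Hadoop Streaming-compatible mapper and reducer below count geotagged records per 1-degree grid cell. The tab-separated input layout (latitude, longitude, tag) is an assumption, not the authors' schema.

```python
# Minimal Hadoop Streaming-style sketch: count geotagged records per 1-degree grid cell.
# Input layout (lat<TAB>lon<TAB>tag) is an assumption, not the authors' schema.
import sys

def mapper(lines):
    """Emit 'cell<TAB>1' for each geotagged record."""
    for line in lines:
        try:
            lat, lon, _tag = line.rstrip("\n").split("\t")
            cell = f"{int(float(lat))}_{int(float(lon))}"
        except ValueError:
            continue                      # skip malformed records
        yield f"{cell}\t1"

def reducer(lines):
    """Sum counts per cell (input assumed sorted by key, as Hadoop guarantees)."""
    current, total = None, 0
    for line in lines:
        cell, count = line.rstrip("\n").split("\t")
        if cell != current and current is not None:
            yield f"{current}\t{total}"
            total = 0
        current = cell
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    stream = mapper(sys.stdin) if stage == "map" else reducer(sys.stdin)
    sys.stdout.writelines(line + "\n" for line in stream)
```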

01 Jan 2013
TL;DR: The Neuroscience Gateway hides or eliminates, from the point of view of the users, all the administrative and technical barriers, makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines, and handles the running of jobs and data management and retrieval.
Abstract: The last few decades have seen the emergence of computational neuroscience as a mature field in which researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyberinfrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge due to issues with acquiring computer time on these machines located at national supercomputer centers, dealing with their complex user interfaces, and dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate all of these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines, and it handles the running of jobs and data management and retrieval. This paper describes the architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research using high performance computing at the back end.

Book ChapterDOI
01 Jan 2013
TL;DR: The RFLP approach will be presented as the baseline for model-based design with Systems Engineering, which enables close interaction and collaboration between the different engineering disciplines, renders resources and processes more efficient, enhances quality, and ensures that the target system ultimately meets the requirements, while reducing design cycle time and engineering lead time.
Abstract: Today, coping with the different workflows, methods and tools of this inter-disciplinary approach to product development throughout a product’s life-cycle is the key challenge for a company. There is evidently a need for requirements engineering and management, as well as model-based design and engineering. More specifically, however, what is required is a unique and integrated methodology for requirements engineering and management, functional and logical design, as well as physical design in different domains for the multi-disciplinary development process based on a Systems Engineering approach early in the design process. In this paper, the RFLP approach (Requirements – Functional – Logical – Physical) will be presented as the baseline for model-based design with Systems Engineering, which enables close interaction and collaboration between the different engineering disciplines, renders resources and processes more efficient, enhances quality, and ensures that the target system ultimately meets the requirements, while reducing design cycle time and engineering lead time.
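As a small, hedged illustration only (the element names and link structure are hypothetical, not a data model prescribed by the chapter), the RFLP idea can be pictured as traceability links from requirements through functional and logical elements down to physical components.

```python
# Hypothetical sketch of RFLP traceability (Requirements - Functional - Logical - Physical).
from dataclasses import dataclass, field

@dataclass
class Element:
    layer: str                      # "R", "F", "L" or "P"
    name: str
    satisfies: list = field(default_factory=list)   # upstream elements this one realizes

# Requirement -> Function -> Logical component -> Physical component (names invented).
req = Element("R", "Vehicle shall stop within 40 m from 100 km/h")
fun = Element("F", "Decelerate vehicle", satisfies=[req])
log = Element("L", "Hydraulic brake circuit", satisfies=[fun])
phy = Element("P", "Brake caliper assembly", satisfies=[log])

def trace(element):
    """Walk the traceability chain back to the originating requirement."""
    chain = [element]
    while chain[-1].satisfies:
        chain.append(chain[-1].satisfies[0])
    return " <- ".join(f"{e.layer}: {e.name}" for e in reversed(chain))

print(trace(phy))
```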

Journal ArticleDOI
01 Dec 2013
TL;DR: GATE Teamware enables users to carry out complex corpus annotation projects, involving distributed annotator teams, and has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects.
Abstract: This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects, involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated, which makes them more efficient. The user interface is simple to learn, aimed at non-experts, and runs in an ordinary web browser, without the need for additional software installation. GATE Teamware has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects. It is available as an on-demand service on GateCloud.net, as well as open source for self-installation.

Journal ArticleDOI
TL;DR: In order to introduce a methodology intended to process point cloud data in a BIM environment with high accuracy, this paper describes some experiences in the documentation of monumental sites, generated through a plug-in written for Autodesk Revit and codenamed GreenSpider after its capability to lay out points in space as if they were nodes of an ideal cobweb.
Abstract: . Since their introduction, modeling tools aimed to architectural design evolved in today’s "digital multi-purpose drawing boards" based on enhanced parametric elements able to originate whole buildings within virtual environments. Semantic splitting and elements topology are features that allow objects to be "intelligent" (i.e. self-aware of what kind of element they are and with whom they can interact), representing this way basics of Building Information Modeling (BIM), a coordinated, consistent and always up to date workflow improved in order to reach higher quality, reliability and cost reductions all over the design process. Even if BIM was originally intended for new architectures, its attitude to store semantic inter-related information can be successfully applied to existing buildings as well, especially if they deserve particular care such as Cultural Heritage sites. BIM engines can easily manage simple parametric geometries, collapsing them to standard primitives connected through hierarchical relationships: however, when components are generated by existing morphologies, for example acquiring point clouds by digital photogrammetry or laser scanning equipment, complex abstractions have to be introduced while remodeling elements by hand, since automatic feature extraction in available software is still not effective. In order to introduce a methodology destined to process point cloud data in a BIM environment with high accuracy, this paper describes some experiences on monumental sites documentation, generated through a plug-in written for Autodesk Revit and codenamed GreenSpider after its capability to layout points in space as if they were nodes of an ideal cobweb.

Journal ArticleDOI
TL;DR: PhyloGenerator is an open‐source, stand‐alone Python program that makes use of pre‐existing sequence data and taxonomic information to largely automate the estimation of phylogenies, and is a step towards an open, reproducible phylogenetic workflow.
Abstract: Summary 1. Ecologists increasingly wish to use phylogenies, but are hampered by the technical challenge of phylogeny estimation. 2. We present phyloGenerator, an open-source, stand-alone Python program, that makes use of pre-existing sequence data and taxonomic information to largely automate the estimation of phylogenies. 3. phyloGenerator allows nonspecialists to quickly and easily produce robust, repeatable, and defensible phylogenies without requiring an extensive knowledge of phylogenetics. Experienced phylogeneticists may also find it useful as a tool to conduct exploratory analyses. 4. phyloGenerator performs a number of ‘sanity checks’ on users’ output, but users should still check their outputs carefully; we give some advice on how to do so. 5. By linking a number of tools in a common framework, phyloGenerator is a step towards an open, reproducible phylogenetic workflow. 6. Bundled downloads for Windows and Mac OSX, along with the source code and an install script for Linux, can be found at http://willpearse.github.io/phyloGenerator (note the capital ‘G’).

Journal ArticleDOI
TL;DR: The NGS-based workflow developed meets the sensitivity and specificity requirements for the genetic diagnosis of HBOCS and improves on the cost-effectiveness of current approaches.
Abstract: Next-generation sequencing (NGS) is changing genetic diagnosis due to its huge sequencing capacity and cost-effectiveness. The aim of this study was to develop an NGS-based workflow for routine diagnostics for hereditary breast and ovarian cancer syndrome (HBOCS), to improve genetic testing for BRCA1 and BRCA2. An NGS-based workflow was designed using BRCA MASTR kit amplicon libraries followed by GS Junior pyrosequencing. Data analysis combined the freely available Variant Identification Pipeline software and ad hoc R scripts, including a cascade of filters to generate coverage and variant calling reports. A BRCA homopolymer assay was performed in parallel. A research scheme was designed in two parts. A Training Set of 28 DNA samples containing 23 unique pathogenic mutations and 213 other variants (33 unique) was used. The workflow was validated in a set of 14 samples from HBOCS families in parallel with the current diagnostic workflow (Validation Set). The NGS-based workflow developed permitted the identification of all pathogenic mutations and genetic variants, including those located in or close to homopolymers. The use of NGS for detecting copy-number alterations was also investigated. The workflow meets the sensitivity and specificity requirements for the genetic diagnosis of HBOCS and improves on the cost-effectiveness of current approaches.
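The study's own pipeline combines the Variant Identification Pipeline with ad hoc R scripts; as a generic, hedged illustration of what a "cascade of filters" over variant calls looks like (the thresholds and fields below are invented and this is not the authors' code), consider:

```python
# Generic sketch of a cascade of filters over variant calls; thresholds and fields
# are invented, not the study's Variant Identification Pipeline or R scripts.

VARIANTS = [   # toy variant calls: position, read coverage, fraction of reads supporting the variant
    {"pos": "chr17:41245466", "coverage": 480, "allele_freq": 0.49},
    {"pos": "chr13:32907420", "coverage": 35,  "allele_freq": 0.51},   # low coverage
    {"pos": "chr17:41246012", "coverage": 510, "allele_freq": 0.04},   # likely noise
]

FILTERS = [
    ("min_coverage",    lambda v: v["coverage"] >= 50),
    ("min_allele_freq", lambda v: v["allele_freq"] >= 0.20),
]

def apply_cascade(variants, filters):
    """Annotate each call with 'PASS' or the name of the first filter it fails."""
    results = []
    for v in variants:
        failed = next((name for name, check in filters if not check(v)), None)
        results.append((v["pos"], failed or "PASS"))
    return results

if __name__ == "__main__":
    for pos, status in apply_cascade(VARIANTS, FILTERS):
        print(pos, status)
```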