
Showing papers on "Workflow published in 2022"


Posted Content · DOI
13 Jun 2022 · bioRxiv
TL;DR: The Computational Anatomy Toolbox (CAT) is introduced - a powerful suite of tools for morphometric analyses with an intuitive graphical user interface, but also usable as a shell script.
Abstract: A large range of sophisticated brain image analysis tools have been developed by the neuroscience community, greatly advancing the field of human brain mapping. Here we introduce the Computational Anatomy Toolbox (CAT) - a powerful suite of tools for morphometric analyses with an intuitive graphical user interface, but also usable as a shell script. CAT is suitable for beginners, casual users, experts, and developers alike providing a comprehensive set of analysis options, workflows, and integrated pipelines. The available analysis streams – illustrated on an example dataset – allow for voxel-based, surface-based, as well as region-based morphometric analyses. Importantly, CAT includes various quality control options and covers the entire analysis workflow, from cross-sectional or longitudinal data processing, to the statistical analysis, and visualization of results. The overarching aim of this article is to provide a complete description of CAT, while, at the same time, offering a citable standard reference.

269 citations


Journal Article · DOI
TL;DR: In this paper, the authors present a comprehensive tutorial written in the form of a step-by-step guide that proceeds from experimental planning through sample selection and handling, instrument setup, data acquisition, and spectra analysis to results presentation.
Abstract: There is a growing concern within the surface science community that the massive increase in the number of XPS articles over the last few decades is accompanied by a decrease in work quality including in many cases meaningless chemical bond assignment. Should this trend continue, it would have disastrous consequences for scientific research. While there are many factors responsible for this situation, the lack of insight of physical principles combined with seeming ease of XPS operation and insufficient training are certainly the major ones. To counter that, we offer a comprehensive tutorial written in the form of a step-by-step guide starting from experimental planning, through sample selection and handling, instrument setup, data acquisition, spectra analysis, and results presentation. Six application examples highlight the broad range of research questions that can be answered by XPS. The topic selection and the discussion level are intended to be accessible for novices yet challenging possible preconceptions of experienced practitioners. The analyses of thin film samples are chosen for model cases as this is from where the bulk of XPS reports presently emanate and also where the author's key expertise lies. At the same time, the majority of discussed topics is applicable to surface science in general and is, thus, of relevance for the analyses of any type of sample and material class. The tutorial contains ca. 160 original spectra and over 290 references for further reading. Particular attention is paid to the correct workflow, development of good research practices, and solid knowledge of factors that impact the quality and reliability of the obtained information. What matters in the end is that the conclusions from the analysis can be trusted. Our aspiration is that after reading this tutorial each practitioner will be able to perform error-free data analysis and draw meaningful insights from the rich well of XPS.

143 citations


Journal Article · DOI
Enis Afgan, Anton Nekrutenko, Björn Grüning, Daniel Blankenberg, Jeremy Goecks, Michael C. Schatz, Alexander E. Ostrovsky, Alexandru Mahmoud, Andrew Lonie, Anna Syme, Anne Fouilloux, Anthony Bretaudeau, Anup Kumar, Arthur C. Eschenlauer, Assunta D. Desanto, Aysam Guerler, Beatriz Serrano-Solano, Bérénice Batut, Bradley W. Langhorst, Bridget Carr, Bryan Raubenolt, Cameron J. Hyde, Catherine J. Bromhead, Christopher B. Barnett, Coline Royaux, Cristóbal L. García Gallardo, Daniel Fornika, Dannon Baker, Dave Bouvier, Dave Clements, David A. de Lima Morais, David Lopez Tabernero, Delphine Larivière, E. Nasr, Federico Zambelli, Florian Heyl, Fotis Psomopoulos, Frederik Coppens, Gareth Price, Gianmauro Cuccuru, Gildas Le Corguillé, Gregory Von Kuster, Gulsum Gudukbay, Helena Rasche, Hans-Rudolf Hotz, Ignacio Eguinoa, Igor V. Makunin, Isuru Ranawaka, James Taylor, Jayadev Joshi, Jennifer Hillman-Jackson, John Chilton, Kaivan Kamali, Keith Suderman, Krzysztof Poterlowicz, Yvan Le Bras, Lucille Lopez-Delisle, Luke Sargent, Madeline E. Bassetti, M. A. Tangaro, Marius Van Den Beek, Martin Čech, Matthias Bernt, Matthias Fahrner, Mehmet Tekman, Melanie Föll, Michael R. Crusoe, Miguel Angel Roncoroni, N. K. Kucher, Nathaniel Coraor, Nicholas Stoler, Nick Rhodes, Nicola Soranzo, Niko Pinter, Nuwan Goonasekera, Pablo Moreno, Pavankumar Videm, Petera Melanie, Pietro Mandreoli, Pratik D. Jagtap, Qiang Gu, Ralf J. M. Weber, Ross Lazarus, Ruben H.P. Vorderman, Saskia Hiltemann, Sergey Golitsynskiy, Shilpa Garg, Simon Bray, Simon Gladman, Simone Leo, Subina Mehta, Timothy J. Griffin, Vahid Jalili, Yves Vandenbrouck, Vi-Kwei Wen, Vijaykrishna Nagampalli, W. Bacon, W. L. De Koning, Wolf-Martin Maier, P. J. Briggs 
TL;DR: Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools.
Abstract: Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.

124 citations
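
For readers who want to script against a Galaxy server rather than use the browser interface, the sketch below uses BioBlend, the Python client for the Galaxy API. The server URL and API key are placeholders, and the histories and workflows returned depend entirely on the account queried, so treat this as a minimal sketch of the API rather than a prescribed workflow.

```python
# Minimal BioBlend sketch: list histories and workflows on a Galaxy server.
# The URL and API key are placeholders; substitute your own instance and credentials.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

for history in gi.histories.get_histories():
    print("history:", history["name"], history["id"])

for workflow in gi.workflows.get_workflows():
    print("workflow:", workflow["name"], workflow["id"])
```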


Journal Article · DOI
TL;DR: SLEAP as mentioned in this paper is a machine learning system for multi-animal pose tracking that enables versatile workflows for data labeling, model training and inference on previously unseen data, and features a graphical user interface, a standardized data model, a reproducible configuration system, over 30 model architectures, two approaches to part grouping and two approaches to identity tracking.
Abstract: The desire to understand how the brain generates and patterns behavior has driven rapid methodological innovation in tools to quantify natural animal behavior. While advances in deep learning and computer vision have enabled markerless pose estimation in individual animals, extending these to multiple animals presents unique challenges for studies of social behaviors or animals in their natural environments. Here we present Social LEAP Estimates Animal Poses (SLEAP), a machine learning system for multi-animal pose tracking. This system enables versatile workflows for data labeling, model training and inference on previously unseen data. SLEAP features an accessible graphical user interface, a standardized data model, a reproducible configuration system, over 30 model architectures, two approaches to part grouping and two approaches to identity tracking. We applied SLEAP to seven datasets across flies, bees, mice and gerbils to systematically evaluate each approach and architecture, and we compare it with other existing approaches. SLEAP achieves greater accuracy and speeds of more than 800 frames per second, with latencies of less than 3.5 ms at full 1,024 × 1,024 image resolution. This makes SLEAP usable for real-time applications, which we demonstrate by controlling the behavior of one animal on the basis of the tracking and detection of social interactions with another animal.

116 citations


Journal Article · DOI
TL;DR: Galaxy as mentioned in this paper is a mature, browser accessible workbench for scientific computing that enables scientists to share, analyze and visualize their own data with minimal technical impediments; recent technical developments include an improved user interface for launching large-scale analyses with many files.
Abstract: Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.

110 citations


Journal Article · DOI
TL;DR: A comprehensive survey of techniques for testing machine learning systems (Machine Learning Testing, or ML testing) is provided in this article, which covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (i.e., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation).
Abstract: This paper provides a comprehensive survey of techniques for testing machine learning systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing.

98 citations
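
The survey's notion of testing properties such as robustness can be illustrated with a small, generic check: a trained model's predictions should not flip under a negligible input perturbation. The example below uses scikit-learn on the Iris dataset purely for illustration; the noise scale and tolerance are arbitrary choices, not values taken from the surveyed papers.

```python
# Generic robustness check: predictions should be stable under a tiny perturbation.
# Illustrative only; the noise scale and tolerance below are arbitrary.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

rng = np.random.default_rng(0)
X_perturbed = X + rng.normal(scale=1e-3, size=X.shape)

flip_rate = np.mean(model.predict(X) != model.predict(X_perturbed))
print(f"fraction of predictions changed: {flip_rate:.3f}")
assert flip_rate < 0.01, "robustness property violated at this perturbation scale"
```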


Proceedings Article · DOI
27 Apr 2022
TL;DR: It was found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online.
Abstract: Recent advances in Large Language Models (LLM) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, a LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlighted several promising directions for improving the design of Copilot based on our observations and participants’ feedback.

86 citations


Journal Article · DOI
TL;DR: SLEAP as discussed by the authors is a machine learning system for multi-animal pose tracking that enables versatile workflows for data labeling, model training and inference on previously unseen data, and features a graphical user interface, a standardized data model, a reproducible configuration system, over 30 model architectures, two approaches to part grouping and two approaches to identity tracking.
Abstract: The desire to understand how the brain generates and patterns behavior has driven rapid methodological innovation in tools to quantify natural animal behavior. While advances in deep learning and computer vision have enabled markerless pose estimation in individual animals, extending these to multiple animals presents unique challenges for studies of social behaviors or animals in their natural environments. Here we present Social LEAP Estimates Animal Poses (SLEAP), a machine learning system for multi-animal pose tracking. This system enables versatile workflows for data labeling, model training and inference on previously unseen data. SLEAP features an accessible graphical user interface, a standardized data model, a reproducible configuration system, over 30 model architectures, two approaches to part grouping and two approaches to identity tracking. We applied SLEAP to seven datasets across flies, bees, mice and gerbils to systematically evaluate each approach and architecture, and we compare it with other existing approaches. SLEAP achieves greater accuracy and speeds of more than 800 frames per second, with latencies of less than 3.5 ms at full 1,024 × 1,024 image resolution. This makes SLEAP usable for real-time applications, which we demonstrate by controlling the behavior of one animal on the basis of the tracking and detection of social interactions with another animal.

86 citations


Journal Article · DOI
TL;DR: In this article, the authors argue that animal ecologists can capitalize on large datasets generated by modern sensors by combining machine learning approaches with domain knowledge, which could improve inputs for ecological models and lead to integrated hybrid modeling tools.
Abstract: Inexpensive and accessible sensors are accelerating data acquisition in animal ecology. These technologies hold great potential for large-scale ecological understanding, but are limited by current processing approaches which inefficiently distill data into relevant information. We argue that animal ecologists can capitalize on large datasets generated by modern sensors by combining machine learning approaches with domain knowledge. Incorporating machine learning into ecological workflows could improve inputs for ecological models and lead to integrated hybrid modeling tools. This approach will require close interdisciplinary collaboration to ensure the quality of novel approaches and train a new generation of data scientists in ecology and conservation.

83 citations


Journal Article · DOI
TL;DR: The Clinical Knowledge Graph (CKG) as mentioned in this paper is an open-source platform comprising close to 20 million nodes and 220 million relationships that represent relevant experimental data, public databases and literature.
Abstract: Implementing precision medicine hinges on the integration of omics data, such as proteomics, into the clinical decision-making process, but the quantity and diversity of biomedical data, and the spread of clinically relevant knowledge across multiple biomedical databases and publications, pose a challenge to data integration. Here we present the Clinical Knowledge Graph (CKG), an open-source platform currently comprising close to 20 million nodes and 220 million relationships that represent relevant experimental data, public databases and literature. The graph structure provides a flexible data model that is easily extendable to new nodes and relationships as new databases become available. The CKG incorporates statistical and machine learning algorithms that accelerate the analysis and interpretation of typical proteomics workflows. Using a set of proof-of-concept biomarker studies, we show how the CKG might augment and enrich proteomics data and help inform clinical decision-making.

75 citations
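
Since the CKG is distributed as a graph database, programmatic access follows the usual Cypher-over-driver pattern. The sketch below assumes a local Neo4j instance loaded with the graph; the connection details, node labels and relationship type (Protein, Disease, ASSOCIATED_WITH) are illustrative assumptions and may not match the actual CKG schema.

```python
# Hedged sketch of querying a CKG-style Neo4j graph with the official Python driver.
# Connection details, node labels and the relationship type are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (p:Protein)-[:ASSOCIATED_WITH]->(d:Disease {name: $disease})
RETURN p.name AS protein
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query, disease="hepatocellular carcinoma"):
        print(record["protein"])

driver.close()
```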


Journal Article · DOI
TL;DR: Li et al. as mentioned in this paper propose secure Artificial Intelligence of Things for implicit group recommendations (SAIoT-GR), an architecture that combines a secure Internet of Things hardware module with a collaborative Bayesian network and noncooperative game software module so that the advantages of the two modules are maximized.
Abstract: The emergence of Artificial Intelligence of Things (AIoT) has provided novel insights for many social computing applications, such as group recommender systems. As the distances between people have been greatly shortened, there has been more general demand for the provision of personalized services aimed at groups instead of individuals. The existing methods for capturing group-level preference features from individuals have mostly been established via aggregation and face two challenges: 1) secure data management workflows are absent and 2) implicit preference feedback is ignored. To tackle these current difficulties, this article proposes secure AIoT for implicit group recommendations (SAIoT-GRs). For the hardware module, a secure Internet of Things structure is developed as the bottom support platform. For the software module, a collaborative Bayesian network model and noncooperative game are introduced as algorithms. This secure AIoT architecture is able to maximize the advantages of the two modules. In addition, a large number of experiments are carried out to evaluate the performance of SAIoT-GR in terms of efficiency and robustness.

Journal Article · DOI
04 Feb 2022 · Genetics
TL;DR: A broad overview of the current state of WormBase in terms of data types, curation workflows, analyses, and tools is provided, including exciting new advances for the analysis of single-cell data, text mining and visualization, and the new community collaboration forum.
Abstract: WormBase (www.wormbase.org) is the central repository for the genetics and genomics of the nematode Caenorhabditis elegans. We provide the research community with data and tools to facilitate the use of C. elegans and related nematodes as model organisms for studying human health, development, and many aspects of fundamental biology. Throughout our 22-year history, we have continued to evolve to reflect progress and innovation in the science and technologies involved in the study of C. elegans. We strive to incorporate new data types and richer data sets, and to provide integrated displays and services that avail the knowledge generated by the published nematode genetics literature. Here, we provide a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including exciting new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum. Concurrently, we continue the integration and harmonization of infrastructure, processes, and tools with the Alliance of Genome Resources, of which WormBase is a founding member.

Journal Article · DOI
16 Mar 2022 · iMeta
TL;DR: Integrated network analysis pipeline (iNAP) as discussed by the authors is an online analysis pipeline for generating and analyzing comprehensive ecological networks in microbiome studies, which is implemented in two sections, that is, network construction and network analysis, and integrates many open access tools.
Abstract: Integrated network analysis pipeline (iNAP) is an online analysis pipeline for generating and analyzing comprehensive ecological networks in microbiome studies. It is implemented in two sections, that is, network construction and network analysis, and integrates many open-access tools. Network construction contains multiple feasible alternatives, including correlation-based approaches (Pearson's correlation and Spearman's rank correlation along with random matrix theory, and sparse correlations for compositional data) and conditional dependence-based methods (extended local similarity analysis and sparse inverse covariance estimation for ecological association inference), while network analysis provides topological structures at different levels and the potential effects of environmental factors on network structures. Considering the full workflow, from microbiome data set to network result, iNAP contains the molecular ecological network analysis pipeline and interdomain ecological network analysis pipeline (IDENAP), which correspond to the intradomain and interdomain associations of microbial species at multiple taxonomic levels. Here, we describe the detailed workflow by taking IDENAP as an example and show the comprehensive steps to assist researchers to conduct the relevant analyses using their own data sets. Afterwards, some auxiliary tools facilitating the pipeline are introduced to effectively aid in the switch from local analysis to online operations. Therefore, iNAP, as an easy-to-use platform that provides multiple network-associated tools and approaches, can enable researchers to better understand the organization of microbial communities. iNAP is available at http://mem.rcees.ac.cn:8081 with free registration.
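
The correlation-based network construction step described above reduces, in its simplest form, to computing pairwise associations between taxa and keeping edges that pass a cutoff. The sketch below builds such a Spearman network on simulated abundances with scipy and networkx; it is a generic illustration, not the iNAP implementation, which can additionally derive the cutoff via random matrix theory.

```python
# Generic sketch of correlation-based ecological network construction.
# Simulated abundance data; fixed threshold instead of an RMT-derived cutoff.
import numpy as np
import networkx as nx
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
abundance = rng.poisson(lam=10, size=(50, 30))  # 50 samples x 30 taxa

rho, pval = spearmanr(abundance)  # 30 x 30 matrices over the taxa columns

G = nx.Graph()
G.add_nodes_from(range(abundance.shape[1]))
for i in range(rho.shape[0]):
    for j in range(i + 1, rho.shape[1]):
        if abs(rho[i, j]) >= 0.6 and pval[i, j] < 0.05:
            G.add_edge(i, j, weight=float(rho[i, j]))

print(G.number_of_nodes(), "taxa,", G.number_of_edges(), "edges")
```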

Journal Article · DOI
TL;DR: neuromaps as mentioned in this paper is a toolbox for accessing, transforming and analyzing structural and functional brain annotations, which includes curated reference maps and biological ontologies of the human brain, such as molecular, microstructural, electrophysiological, developmental and functional ontologies.
Abstract: Imaging technologies are increasingly used to generate high-resolution reference maps of brain structure and function. Comparing experimentally generated maps to these reference maps facilitates cross-disciplinary scientific discovery. Although recent data sharing initiatives increase the accessibility of brain maps, data are often shared in disparate coordinate systems, precluding systematic and accurate comparisons. Here we introduce neuromaps, a toolbox for accessing, transforming and analyzing structural and functional brain annotations. We implement functionalities for generating high-quality transformations between four standard coordinate systems. The toolbox includes curated reference maps and biological ontologies of the human brain, such as molecular, microstructural, electrophysiological, developmental and functional ontologies. Robust quantitative assessment of map-to-map similarity is enabled via a suite of spatial autocorrelation-preserving null models. neuromaps combines open-access data with transparent functionality for standardizing and comparing brain maps, providing a systematic workflow for comprehensive structural and functional annotation enrichment analysis of the human brain.
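
The spatial autocorrelation-preserving null models mentioned in the abstract are commonly spin tests: one map is randomly rotated on the sphere and the map-to-map correlation is recomputed against each rotated copy. The sketch below illustrates that idea generically with numpy/scipy on simulated data; it does not use the neuromaps API, and the coordinates and maps are toy stand-ins for real surface annotations.

```python
# Generic spin-test sketch: compare two spherical maps against rotation-based nulls.
# Coordinates and maps are simulated stand-ins for brain surface annotations.
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
coords = rng.normal(size=(500, 3))
coords /= np.linalg.norm(coords, axis=1, keepdims=True)   # points on the unit sphere

map_a = coords[:, 0] + 0.2 * rng.normal(size=500)          # two toy "annotations"
map_b = coords[:, 0] + 0.2 * rng.normal(size=500)
observed, _ = pearsonr(map_a, map_b)

tree = cKDTree(coords)
null = []
for seed in range(1000):
    rotated = Rotation.random(random_state=seed).apply(coords)
    _, nearest = tree.query(rotated)                        # reassign to nearest original point
    r, _ = pearsonr(map_a[nearest], map_b)
    null.append(r)

p_spin = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + len(null))
print(f"r = {observed:.2f}, spin-test p = {p_spin:.3f}")
```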

Journal Article · DOI
TL;DR: In this article, a chip-DIA workflow was proposed to profile the proteomes of adherent and non-adherent malignant cells, with good reproducibility and <16% missing values between runs.
Abstract: Single-cell proteomics can reveal cellular phenotypic heterogeneity and cell-specific functional networks underlying biological processes. Here, we present a streamlined workflow combining microfluidic chips for all-in-one proteomic sample preparation and data-independent acquisition (DIA) mass spectrometry (MS) for proteomic analysis down to the single-cell level. The proteomics chips enable multiplexed and automated cell isolation/counting/imaging and sample processing in a single device. Combining chip-based sample handling with DIA-MS using project-specific mass spectral libraries, we profile on average ~1,500 protein groups across 20 single mammalian cells. Applying the chip-DIA workflow to profile the proteomes of adherent and non-adherent malignant cells, we cover a dynamic range of 5 orders of magnitude with good reproducibility and <16% missing values between runs. Taken together, the chip-DIA workflow offers all-in-one cell characterization, analytical sensitivity and robustness, and the option to add additional functionalities in the future, thus providing a basis for advanced single-cell proteomics applications.

Journal Article · DOI
TL;DR: In this paper, a chip-DIA workflow was proposed to profile the proteomes of adherent and non-adherent malignant cells, with good reproducibility and <16% missing values between runs.
Abstract: Single-cell proteomics can reveal cellular phenotypic heterogeneity and cell-specific functional networks underlying biological processes. Here, we present a streamlined workflow combining microfluidic chips for all-in-one proteomic sample preparation and data-independent acquisition (DIA) mass spectrometry (MS) for proteomic analysis down to the single-cell level. The proteomics chips enable multiplexed and automated cell isolation/counting/imaging and sample processing in a single device. Combining chip-based sample handling with DIA-MS using project-specific mass spectral libraries, we profile on average ~1,500 protein groups across 20 single mammalian cells. Applying the chip-DIA workflow to profile the proteomes of adherent and non-adherent malignant cells, we cover a dynamic range of 5 orders of magnitude with good reproducibility and <16% missing values between runs. Taken together, the chip-DIA workflow offers all-in-one cell characterization, analytical sensitivity and robustness, and the option to add additional functionalities in the future, thus providing a basis for advanced single-cell proteomics applications.

Journal Article · DOI
TL;DR: In this paper, the authors identify nine different types of interpretability methods that have been used for understanding deep learning models in medical image analysis applications, grouped according to the type of explanations generated and their technical similarities.

Journal Article · DOI
TL;DR: In this paper, the authors describe three frontier research areas facilitating ethically responsible and legally compliant medical AI: complex networks and their inference, graph causal models and counterfactuals, and explainability methods.

Journal Article · DOI
TL;DR: In this article, the authors describe three complementary Frontier Research Areas: (1) Complex Networks and their Inference, (2) Graph causal models and counterfactuals, and (3) Verification and Explainability methods.

Journal Article · DOI
TL;DR: In this article, the authors put forth nine dimensions along which clinically validated digital health tools should be examined by health systems prior to adoption, and propose strategies for selecting digital health tools and planning for their implementation in this setting.
Abstract: In recent years, the number of digital health tools with the potential to significantly improve delivery of healthcare services has grown tremendously. However, the use of these tools in large, complex health systems remains comparatively limited. The adoption and implementation of digital health tools at an enterprise level is a challenge; few strategies exist to help tools cross the chasm from clinical validation to integration within the workflows of a large health system. Many previously proposed frameworks for digital health implementation are difficult to operationalize in these dynamic organizations. In this piece, we put forth nine dimensions along which clinically validated digital health tools should be examined by health systems prior to adoption, and propose strategies for selecting digital health tools and planning for implementation in this setting. By evaluating prospective tools along these dimensions, health systems can evaluate which existing digital health solutions are worthy of adoption, ensure they have sufficient resources for deployment and long-term use, and devise a strategic plan for implementation.

Journal Article · DOI
TL;DR: In this article, a two-dimensional peak-picking algorithm and neural network-based generation of optimized spectral libraries for dia-PASEF data are presented; the approach is particularly beneficial for fast proteomic experiments and those with low sample amounts.
Abstract: The dia-PASEF technology uses ion mobility separation to reduce signal interferences and increase sensitivity in proteomic experiments. Here we present a two-dimensional peak-picking algorithm and generation of optimized spectral libraries, as well as take advantage of neural network-based processing of dia-PASEF data. Our computational platform boosts proteomic depth by up to 83% compared to previous work, and is specifically beneficial for fast proteomic experiments and those with low sample amounts. It quantifies over 5300 proteins in single injections recorded at 200 samples per day throughput using Evosep One chromatography system on a timsTOF Pro mass spectrometer and almost 9000 proteins in single injections recorded with a 93-min nanoflow gradient on timsTOF Pro 2, from 200 ng of HeLa peptides. A user-friendly implementation is provided through the incorporation of the algorithms in the DIA-NN software and by the FragPipe workflow for spectral library generation.

Journal Article · DOI
TL;DR: In this article, the authors present work led by the NASA Earth Science Data Systems Working Groups and the ESIP machine learning cluster to give a comprehensive overview of AI in the Earth sciences and to provide AI practitioners in the geosciences, at all levels, with an overall big picture.

Journal Article · DOI
TL;DR: Kraken as mentioned in this paper is a discovery platform covering monodentate organophosphorus(III) ligands, providing comprehensive physicochemical descriptors based on representative conformer ensembles.
Abstract: The design of molecular catalysts typically involves reconciling multiple conflicting property requirements, largely relying on human intuition and local structural searches. However, the vast number of potential catalysts requires pruning of the candidate space by efficient property prediction with quantitative structure-property relationships. Data-driven workflows embedded in a library of potential catalysts can be used to build predictive models for catalyst performance and serve as a blueprint for novel catalyst designs. Herein we introduce kraken, a discovery platform covering monodentate organophosphorus(III) ligands providing comprehensive physicochemical descriptors based on representative conformer ensembles. Using quantum-mechanical methods, we calculated descriptors for 1558 ligands, including commercially available examples, and trained machine learning models to predict properties of over 300000 new ligands. We demonstrate the application of kraken to systematically explore the property space of organophosphorus ligands and how existing data sets in catalysis can be used to accelerate ligand selection during reaction optimization.
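
The descriptor-to-property modeling described above is, structurally, a standard supervised-regression task: descriptor vectors in, target property out. The sketch below shows that shape with scikit-learn on simulated descriptors; it is not the kraken code, and the feature matrix and target are invented for illustration (kraken's real descriptors come from DFT-level conformer ensembles).

```python
# Generic sketch of property prediction from ligand descriptors.
# Descriptors and the target property are simulated; not the kraken models.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1558, 20))                              # 1558 ligands x 20 toy descriptors
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=1558)    # toy target property

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

print("held-out R^2:", round(r2_score(y_test, model.predict(X_test)), 3))
```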

Proceedings Article · DOI
13 May 2022
TL;DR: It is found that the rate with which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers’ perception of productivity.
Abstract: Neural code synthesis has reached a point where snippet generation is accurate enough to be considered for integration into human software development workflows. Commercial products aim to increase programmers’ productivity, without being able to measure it directly. In this case study, we asked users of GitHub Copilot about its impact on their productivity, and sought to find a reflection of their perception in directly measurable user data. We find that the rate with which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers’ perception of productivity.
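
The study contrasts the acceptance rate (shown vs. accepted completions) with persistence-style metrics (how much accepted code survives later edits). A toy computation over an invented event log makes the two quantities concrete:

```python
# Toy contrast between acceptance rate and persistence; the log below is invented.
log = [
    {"shown": True, "accepted": True,  "chars_accepted": 120, "chars_surviving": 90},
    {"shown": True, "accepted": False, "chars_accepted": 0,   "chars_surviving": 0},
    {"shown": True, "accepted": True,  "chars_accepted": 60,  "chars_surviving": 60},
]

acceptance_rate = sum(e["accepted"] for e in log) / sum(e["shown"] for e in log)
accepted = sum(e["chars_accepted"] for e in log)
persistence = sum(e["chars_surviving"] for e in log) / accepted if accepted else 0.0

print(f"acceptance rate: {acceptance_rate:.2f}")  # 2 of 3 shown -> 0.67
print(f"persistence:     {persistence:.2f}")      # 150 of 180 chars survive -> 0.83
```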

Journal Article · DOI
TL;DR: This review provides a detailed introduction to exosomes, including their physical and chemical properties and their roles in normal biological processes and in disease progression, and summarizes some of the ongoing clinical trials.
Abstract: Exosomes are extracellular vesicles that share components of their parent cells and are attractive in biotechnology and biomedical research as potential disease biomarkers as well as therapeutic agents. Crucial to realizing this potential is the ability to manufacture high‐quality exosomes; however, unlike biologics such as proteins, exosomes lack standardized Good Manufacturing Practices for their processing and characterization. Furthermore, there is a lack of well‐characterized reference exosome materials to aid in selection of methods for exosome isolation, purification, and analysis. This review informs exosome research and technology development by comparing exosome processing and characterization methods and recommending exosome workflows. This review also provides a detailed introduction to exosomes, including their physical and chemical properties, roles in normal biological processes and in disease progression, and summarizes some of the on‐going clinical trials.

Journal Article · DOI
TL;DR: The Deep Docking (DD) platform as discussed by the authors enables up to 100-fold acceleration of structure-based virtual screening by docking only a subset of a chemical library, iteratively synchronized with a ligand-based prediction of the remaining docking scores.
Abstract: With the recent explosion of chemical libraries beyond a billion molecules, more efficient virtual screening approaches are needed. The Deep Docking (DD) platform enables up to 100-fold acceleration of structure-based virtual screening by docking only a subset of a chemical library, iteratively synchronized with a ligand-based prediction of the remaining docking scores. This method results in hundreds- to thousands-fold virtual hit enrichment (without significant loss of potential drug candidates) and hence enables the screening of billion molecule-sized chemical libraries without using extraordinary computational resources. Herein, we present and discuss the generalized DD protocol that has been proven successful in various computer-aided drug discovery (CADD) campaigns and can be applied in conjunction with any conventional docking program. The protocol encompasses eight consecutive stages: molecular library preparation, receptor preparation, random sampling of a library, ligand preparation, molecular docking, model training, model inference and the residual docking. The standard DD workflow enables iterative application of stages 3-7 with continuous augmentation of the training set, and the number of such iterations can be adjusted by the user. A predefined recall value allows for control of the percentage of top-scoring molecules that are retained by DD and can be adjusted to control the library size reduction. The procedure takes 1-2 weeks (depending on the available resources) and can be completely automated on computing clusters managed by job schedulers. This open-source protocol, at https://github.com/jamesgleave/DD_protocol , can be readily deployed by CADD researchers and can significantly accelerate the effective exploration of ultra-large portions of a chemical space.
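
The eight-stage DD protocol is, at its core, an active-learning loop: dock a random sample, train a surrogate model on the resulting scores, predict scores for the rest of the library, and keep only molecules the model ranks favorably before docking again. The sketch below captures that loop in Python with a stand-in scoring function (`mock_dock`) and a simple surrogate; it is a conceptual sketch, not the published protocol's code.

```python
# Conceptual sketch of a Deep Docking-style active-learning loop.
# `mock_dock`, the toy fingerprints and the fixed quantile cutoff are invented
# stand-ins for a real docking program, real descriptors and the recall-based cutoff.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_library = 100_000
features = rng.normal(size=(n_library, 32))                     # toy fingerprints
true_scores = features @ rng.normal(size=32) + rng.normal(scale=0.5, size=n_library)

def mock_dock(idx):
    """Stand-in for an expensive docking run on the selected molecules."""
    return true_scores[idx]

remaining = np.arange(n_library)
train_idx = np.array([], dtype=int)
train_scores = np.array([])

for iteration in range(3):
    sample = rng.choice(remaining, size=2_000, replace=False)          # random sampling
    train_idx = np.concatenate([train_idx, sample])
    train_scores = np.concatenate([train_scores, mock_dock(sample)])   # docking
    remaining = np.setdiff1d(remaining, sample)

    surrogate = GradientBoostingRegressor().fit(features[train_idx], train_scores)
    predicted = surrogate.predict(features[remaining])                 # model inference
    cutoff = np.quantile(predicted, 0.25)                              # keep best-ranked quarter
    remaining = remaining[predicted <= cutoff]                         # lower score = better
    print(f"iteration {iteration}: {remaining.size} molecules left to consider")
```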

Posted Content · DOI
18 Mar 2022 · bioRxiv
TL;DR: An easy-to-use web service, Hiplot, is proposed, equipped with comprehensive and interactive biomedical data visualization functions (230+) including basic statistics, multi-omics, regression, clustering, dimensional reduction, meta-analysis, survival analysis, and risk modeling.
Abstract: Modern web techniques provide an unprecedented opportunity for leveraging complex biomedical data generating in clinical, omics, and mechanism experiments. Currently, the functions for carrying out publication-ready biomedical data visualization represent primary technical hurdles in the state-of-art omics-based web services, whereas the demand for visualization-based interactive data mining is ever-growing. Here, we propose an easy-to-use web service, Hiplot (https://hiplot.com.cn), equipping with comprehensive and interactive biomedical data visualization functions (230+) including basic statistics, multi-omics, regression, clustering, dimensional reduction, meta-analysis, survival analysis, risk modeling, etc. We used the demo and real datasets to demonstrate the usage workflow and the core functions of Hiplot. It permits users to conveniently and interactively complete a few specialized visualization tasks that previously could only be done by senior bioinformatics or biostatistics researchers. A modern web client with efficient user interfaces and interaction methods has been implemented based on the custom components library and the extensible plugin system. The versatile output can also be produced in different environments via using the cross-platform portable command-line interface (CLI) program, Hctl. A switchable view between the editable data table and the file uploader/path selection could facilitate data importing, previewing, and exporting, while the plumber-based response strategy significantly reduced the time costs for generating basic scientific graphics. Diversified layouts, themes/styles, and color palettes in this website allow users to create high-quality and publication-ready graphics. Researchers devoted to both life and data science may benefit from the emerging web service.

Journal Article · DOI
TL;DR: In this paper, the authors propose an enhanced firefly algorithm adapted for tackling workflow scheduling challenges in a cloud-edge environment, which overcomes observed deficiencies of the original firefly metaheuristic by incorporating genetic operators and a quasi-reflection-based learning procedure.
Abstract: Edge computing is a novel technology, which is closely related to the concept of Internet of Things. This technology brings computing resources closer to the location where they are consumed by end-users-to the edge of the cloud. In this way, response time is shortened and lower network bandwidth is utilized. Workflow scheduling must be addressed to accomplish these goals. In this paper, we propose an enhanced firefly algorithm adapted for tackling workflow scheduling challenges in a cloud-edge environment. Our proposed approach overcomes observed deficiencies of original firefly metaheuristics by incorporating genetic operators and quasi-reflection-based learning procedure. First, we have validated the proposed improved algorithm on 10 modern standard benchmark instances and compared its performance with original and other improved state-of-the-art metaheuristics. Secondly, we have performed simulations for a workflow scheduling problem with two objectives-cost and makespan. We performed comparative analysis with other state-of-the-art approaches that were tested under the same experimental conditions. Algorithm proposed in this paper exhibits significant enhancements over the original firefly algorithm and other outstanding metaheuristics in terms of convergence speed and results' quality. Based on the output of conducted simulations, the proposed improved firefly algorithm obtains prominent results and managed to establish improvement in solving workflow scheduling in cloud-edge by reducing makespan and cost compared to other approaches.
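
For context, the base metaheuristic the authors enhance works by moving each dimmer firefly toward brighter ones, with attractiveness decaying as beta0·exp(-gamma·r^2) plus a small random step. The sketch below is the standard firefly algorithm on a toy continuous objective, not the enhanced variant with genetic operators and quasi-reflection-based learning, and not a workflow-scheduling encoding.

```python
# Standard firefly algorithm on a toy minimization problem (sphere function).
# Not the enhanced variant from the paper and not a workflow-scheduling encoding.
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
dim, n_fireflies, iterations = 10, 20, 100
beta0, gamma, alpha = 1.0, 1.0, 0.2

pop = rng.uniform(-5, 5, size=(n_fireflies, dim))
fitness = np.array([sphere(x) for x in pop])

for _ in range(iterations):
    for i in range(n_fireflies):
        for j in range(n_fireflies):
            if fitness[j] < fitness[i]:                     # firefly j is brighter (better)
                r = np.linalg.norm(pop[i] - pop[j])
                beta = beta0 * np.exp(-gamma * r ** 2)      # attractiveness decays with distance
                pop[i] += beta * (pop[j] - pop[i]) + alpha * rng.uniform(-0.5, 0.5, dim)
                fitness[i] = sphere(pop[i])

print("best objective value found:", round(float(fitness.min()), 6))
```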

Journal Article · DOI
TL;DR: The count of open-source software packages hosted by the Comprehensive R Archive Network (CRAN) that use key spatial data handling packages has now passed 1,000, as mentioned in this paper, and providing a comprehensive review of these packages is beyond the scope of an article.
Abstract: The count of open source software packages hosted by the Comprehensive R Archive Network (CRAN) using key spatial data handling packages has now passed 1,000. Providing a comprehensive review of these packages is beyond the scope of an article. Consequently, this review takes the form of a comparative case study, reproducing some of the approach and workflow of a spatial analysis of a data set including almost all the census tracts in the coterminous United States. The case study moves from visualization and the construction of a spatial weights matrix, to exploratory spatial data analysis and spatial regression. For comparison, implementations of the same steps in PySAL and GeoDa are interwoven, and points of convergence and divergence noted and discussed. Conclusions are drawn about the usefulness of open source software, the significance of sharing contributions both in software implementation but also more broadly in reproducible research, and in opportunities for exchanging ideas and solutions with other research domains.
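
On the Python side of the comparison, the "spatial weights matrix, then exploratory spatial data analysis" step sketched in the abstract typically looks like the PySAL snippet below. The shapefile path and the "income" column are placeholders for the reader's own polygon data, not the census-tract dataset used in the article.

```python
# Sketch of the spatial weights -> Moran's I step with PySAL (geopandas + libpysal + esda).
# "tracts.shp" and the "income" column are placeholders for the user's own data.
import geopandas as gpd
import libpysal
from esda.moran import Moran

gdf = gpd.read_file("tracts.shp")                   # any polygon layer with a numeric column

w = libpysal.weights.Queen.from_dataframe(gdf)      # queen-contiguity spatial weights
w.transform = "r"                                   # row-standardize

moran = Moran(gdf["income"], w)
print(f"Moran's I = {moran.I:.3f}, pseudo p-value = {moran.p_sim:.3f}")
```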

Journal Article · DOI
TL;DR: Omni-ATAC as mentioned in this paper is an optimized protocol for the assay for transposase-accessible chromatin using sequencing (ATAC-seq) that is applicable across a broad range of cell and tissue types.
Abstract: The assay for transposase-accessible chromatin using sequencing (ATAC-seq) provides a simple and scalable way to detect the unique chromatin landscape associated with a cell type and how it may be altered by perturbation or disease. ATAC-seq requires a relatively small number of input cells and does not require a priori knowledge of the epigenetic marks or transcription factors governing the dynamics of the system. Here we describe an updated and optimized protocol for ATAC-seq, called Omni-ATAC, that is applicable across a broad range of cell and tissue types. The ATAC-seq workflow has five main steps: sample preparation, transposition, library preparation, sequencing and data analysis. This protocol details the steps to generate and sequence ATAC-seq libraries, with recommendations for sample preparation and downstream bioinformatic analysis. ATAC-seq libraries for roughly 12 samples can be generated in 10 h by someone familiar with basic molecular biology, and downstream sequencing analysis can be implemented using benchmarked pipelines by someone with basic bioinformatics skills and with access to a high-performance computing environment.