
Showing papers on "Software" published in 2017


Journal ArticleDOI
TL;DR: Because MMseqs2 needs no random memory access in its innermost loop, its runtime scales almost inversely with the number of cores used, which enables sensitive protein sequence searching for the analysis of massive data sets.
Abstract: Sequencing costs have dropped much faster than Moore's law in the past decade, and sensitive sequence searching has become the main bottleneck in the analysis of large (meta)genomic datasets. While previous methods sacrificed sensitivity for speed gains, the parallelized, open-source software MMseqs2 overcomes this trade-off: In three-iteration profile searches it reaches 50% higher sensitivity than BLAST at 83-fold speed and the same sensitivity as PSI-BLAST at 270 times its speed. MMseqs2 therefore offers great potential to increase the fraction of annotatable (meta)genomic sequences.
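
The iterative profile search described above is driven from the command line; below is a minimal sketch of invoking such a search from Python. It assumes the `mmseqs` binary is on the PATH and that the `easy-search` workflow accepts `--num-iterations` for iterative searches; the file names are placeholders, and the exact options should be checked against `mmseqs easy-search -h` for the installed version.

```python
# Minimal sketch of driving an MMseqs2 search from Python via its CLI.
# Assumes the `mmseqs` binary is on PATH and that `easy-search` accepts
# `--num-iterations` for iterative (profile) searches; verify the options
# against the installed MMseqs2 version.
import subprocess
from pathlib import Path

def run_mmseqs_search(query_fasta: str, target_fasta: str,
                      result_tsv: str = "result.m8",
                      tmp_dir: str = "tmp",
                      iterations: int = 3,
                      threads: int = 8) -> None:
    """Run an iterative MMseqs2 search and write tabular (BLAST-like) hits."""
    Path(tmp_dir).mkdir(exist_ok=True)
    cmd = [
        "mmseqs", "easy-search",
        query_fasta, target_fasta, result_tsv, tmp_dir,
        "--num-iterations", str(iterations),   # assumed flag for profile iterations
        "--threads", str(threads),
    ]
    subprocess.run(cmd, check=True)

# Example (hypothetical file names):
# run_mmseqs_search("queries.fasta", "uniref50.fasta")
```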

1,371 citations


Journal ArticleDOI
TL;DR: The R package 'dagitty' is introduced, which provides access to all of the capabilities of the DAGitty web application within the R platform for statistical computing, and also offers several new functions that enable epidemiologists to detect causal misspecifications in DAGs and make robust inferences that remain valid for a range of different DAGs.
Abstract: Directed acyclic graphs (DAGs), which offer systematic representations of causal relationships, have become an established framework for the analysis of causal inference in epidemiology, often being used to determine covariate adjustment sets for minimizing confounding bias. DAGitty is a popular web application for drawing and analysing DAGs. Here we introduce the R package 'dagitty', which provides access to all of the capabilities of the DAGitty web application within the R platform for statistical computing, and also offers several new functions. We describe how the R package 'dagitty' can be used to: evaluate whether a DAG is consistent with the dataset it is intended to represent; enumerate 'statistically equivalent' but causally different DAGs; and identify exposure-outcome adjustment sets that are valid for causally different but statistically equivalent DAGs. This functionality enables epidemiologists to detect causal misspecifications in DAGs and make robust inferences that remain valid for a range of different DAGs. The R package 'dagitty' is available through the comprehensive R archive network (CRAN) at [https://cran.r-project.org/web/packages/dagitty/]. The source code is available on github at [https://github.com/jtextor/dagitty]. The web application 'DAGitty' is free software, licensed under the GNU general public licence (GPL) version 2 and is available at [http://dagitty.net/].
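
Since this listing contains no R code, the core idea behind adjustment-set checking (does a candidate covariate set d-separate exposure and outcome once the edge carrying the causal effect is removed?) is illustrated below in Python with networkx rather than with the 'dagitty' package itself. The example DAG is hypothetical, and `nx.d_separated` requires networkx 2.8 or later (it was later renamed `is_d_separator`).

```python
# Illustrative sketch (not the 'dagitty' R package): checking whether a
# candidate covariate set d-separates exposure and outcome in a DAG.
import networkx as nx

# X -> Y is the causal path of interest; Z confounds X and Y.
g = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "Y")])

# Remove the edge carrying the causal effect, then ask whether the candidate
# set blocks all remaining (backdoor) paths between X and Y.
backdoor_graph = g.copy()
backdoor_graph.remove_edge("X", "Y")

print(nx.d_separated(backdoor_graph, {"X"}, {"Y"}, set()))   # False: {} is not a valid adjustment set
print(nx.d_separated(backdoor_graph, {"X"}, {"Y"}, {"Z"}))   # True: adjusting for Z blocks the backdoor path
```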

1,039 citations


Posted Content
TL;DR: A new simulator built on Unreal Engine that offers physically and visually realistic simulations for autonomous vehicles in the real world and that is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols.
Abstract: Developing and testing algorithms for autonomous vehicles in the real world is an expensive and time-consuming process. Also, in order to utilize recent advances in machine intelligence and deep learning, we need to collect a large amount of annotated training data in a variety of conditions and environments. We present a new simulator built on Unreal Engine that offers physically and visually realistic simulations for both of these goals. Our simulator includes a physics engine that can operate at a high frequency for real-time hardware-in-the-loop (HITL) simulations with support for popular protocols (e.g. MavLink). The simulator is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols. In addition, the modular design enables various components to be easily usable independently in other projects. We demonstrate the simulator by first implementing a quadrotor as an autonomous vehicle and then experimentally comparing the software components with real-world flights.
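
The quadrotor demonstration described above is scripted against the simulator's client API. Assuming the simulator in question is the one released as AirSim, whose Python client is installed and connected to a running simulation, a minimal flight script might look like the sketch below; the method names follow the commonly documented client API and should be verified against the installed package version.

```python
# Minimal sketch of flying the simulated quadrotor through a Python API.
# Assumes the simulator is AirSim with its `airsim` Python client installed
# and a simulation already running; verify method names against your version.
import airsim

client = airsim.MultirotorClient()      # connect to a running simulator
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()            # blocking take-off
# NED coordinates: negative z is up; fly to 10 m altitude at 5 m/s.
client.moveToPositionAsync(0, 0, -10, 5).join()

state = client.getMultirotorState()     # kinematics, e.g. for HITL comparisons
print(state.kinematics_estimated.position)

client.armDisarm(False)
client.enableApiControl(False)
```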

979 citations


Book ChapterDOI
15 May 2017
TL;DR: In this paper, the authors present a new simulator built on Unreal Engine that offers physically and visually realistic simulations for autonomous vehicles in real-world environments, including a physics engine that can operate at a high frequency for real-time hardware-in-the-loop (HITL) simulations with support for popular protocols (e.g., MavLink).
Abstract: Developing and testing algorithms for autonomous vehicles in the real world is an expensive and time-consuming process. Also, in order to utilize recent advances in machine intelligence and deep learning, we need to collect a large amount of annotated training data in a variety of conditions and environments. We present a new simulator built on Unreal Engine that offers physically and visually realistic simulations for both of these goals. Our simulator includes a physics engine that can operate at a high frequency for real-time hardware-in-the-loop (HITL) simulations with support for popular protocols (e.g., MavLink). The simulator is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols. In addition, the modular design enables various components to be easily usable independently in other projects. We demonstrate the simulator by first implementing a quadrotor as an autonomous vehicle and then experimentally comparing the software components with real-world flights.

938 citations


Journal ArticleDOI
TL;DR: PlatEMO as discussed by the authors is a MATLAB platform for evolutionary multi-objective optimization, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators.
Abstract: Over the last three decades, a large number of evolutionary algorithms have been developed for solving multi-objective optimization problems. However, there is a lack of an up-to-date and comprehensive software platform for researchers to properly benchmark existing algorithms and for practitioners to apply selected algorithms to solve their real-world problems. The demand for such a common tool becomes even more urgent when the source code of many proposed algorithms has not been made publicly available. To address these issues, we have developed a MATLAB platform for evolutionary multi-objective optimization, called PlatEMO, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators. With a user-friendly graphical user interface, PlatEMO enables users to easily compare several evolutionary algorithms at one time and collect statistical results in Excel or LaTeX files. More importantly, PlatEMO is completely open source, such that users are able to develop new algorithms on the basis of it. This paper introduces the main features of PlatEMO and illustrates how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators. Source code of PlatEMO is now available at: http://bimk.ahu.edu.cn/index.php?s=/Index/Software/index.html.

915 citations


Posted Content
TL;DR: The main features of PlatEMO are introduced, and how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators is illustrated.
Abstract: Over the last three decades, a large number of evolutionary algorithms have been developed for solving multi-objective optimization problems. However, there is a lack of an up-to-date and comprehensive software platform for researchers to properly benchmark existing algorithms and for practitioners to apply selected algorithms to solve their real-world problems. The demand for such a common tool becomes even more urgent when the source code of many proposed algorithms has not been made publicly available. To address these issues, we have developed a MATLAB platform for evolutionary multi-objective optimization, called PlatEMO, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators. With a user-friendly graphical user interface, PlatEMO enables users to easily compare several evolutionary algorithms at one time and collect statistical results in Excel or LaTeX files. More importantly, PlatEMO is completely open source, such that users are able to develop new algorithms on the basis of it. This paper introduces the main features of PlatEMO and illustrates how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators. Source code of PlatEMO is now available at: http://bimk.ahu.edu.cn/index.php?s=/Index/Software/index.html.

828 citations


Journal ArticleDOI
TL;DR: A new method and convenient tools for determining sample size and power in mediation models are proposed and demonstrated and will allow researchers to quickly and easily determine power and sample size for simple and complex mediation models.
Abstract: Mediation analyses abound in social and personality psychology. Current recommendations for assessing power and sample size in mediation models include using a Monte Carlo power analysis simulation and testing the indirect effect with a bootstrapped confidence interval. Unfortunately, these methods have rarely been adopted by researchers due to limited software options and the computational time needed. We propose a new method and convenient tools for determining sample size and power in mediation models. We demonstrate our new method through an easy-to-use application that implements the method. These developments will allow researchers to quickly and easily determine power and sample size for simple and complex mediation models.
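
The proposed approach combines a Monte Carlo power simulation with a bootstrapped confidence interval for the indirect effect. The sketch below illustrates that general idea in Python for a simple X -> M -> Y model; the path coefficients, sample size, and replication counts are arbitrary illustrative values, not the authors' defaults.

```python
# Generic sketch of a Monte Carlo power analysis for a simple mediation model
# (X -> M -> Y), not the authors' application: simulate data under assumed
# path coefficients, test the indirect effect a*b with a percentile bootstrap
# CI, and report the proportion of replications whose CI excludes zero.
import numpy as np

def mediation_power(n=100, a=0.3, b=0.3, n_sims=200, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        ab_boot = np.empty(n_boot)
        for i in range(n_boot):
            idx = rng.integers(0, n, n)
            xs, ms, ys = x[idx], m[idx], y[idx]
            a_hat = np.polyfit(xs, ms, 1)[0]                       # slope of M on X
            b_hat = np.linalg.lstsq(
                np.column_stack([ms, xs, np.ones(n)]), ys, rcond=None)[0][0]  # slope of Y on M, controlling X
            ab_boot[i] = a_hat * b_hat
        lo, hi = np.percentile(ab_boot, [2.5, 97.5])
        rejections += (lo > 0) or (hi < 0)
    return rejections / n_sims

# Example: estimated power for n = 100 with small-to-medium paths.
# print(mediation_power(n=100))
```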

615 citations


Journal ArticleDOI
TL;DR: The Splatter Bioconductor package is presented for simple, reproducible, and well-documented simulation of scRNA-seq data and provides an interface to multiple simulation methods including Splat, the authors' own simulation, based on a gamma-Poisson distribution.
Abstract: As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.
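
Splat's counts are built on a gamma-Poisson model. The snippet below illustrates only that distributional idea in Python, drawing per-gene means from a gamma distribution and Poisson counts around them; it is not the Splat model itself, which additionally models library sizes, expression outliers, dropouts, and differentiation paths, and all parameter values here are arbitrary.

```python
# Tiny illustration of the gamma-Poisson idea behind scRNA-seq count
# simulation: gene-level means are gamma-distributed and per-cell counts are
# Poisson around those means. NOT the Splat model itself; parameters arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_cells = 2000, 300

gene_means = rng.gamma(shape=0.6, scale=3.0, size=n_genes)       # gene-level means
cell_factors = rng.lognormal(mean=0.0, sigma=0.3, size=n_cells)  # crude library-size variation

# Expected expression for every gene/cell pair, then Poisson noise.
mu = np.outer(gene_means, cell_factors)
counts = rng.poisson(mu)

print(counts.shape, counts.mean(), (counts == 0).mean())         # matrix size, mean count, sparsity
```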

568 citations


Journal ArticleDOI
TL;DR: The CMS trigger system consists of two levels designed to select events of potential physics interest from a GHz (MHz) interaction rate of proton-proton (heavy ion) collisions.
Abstract: This paper describes the CMS trigger system and its performance during Run 1 of the LHC. The trigger system consists of two levels designed to select events of potential physics interest from a GHz (MHz) interaction rate of proton-proton (heavy ion) collisions. The first level of the trigger is implemented in hardware, and selects events containing detector signals consistent with an electron, photon, muon, tau lepton, jet, or missing transverse energy. A programmable menu of up to 128 object-based algorithms is used to select events for subsequent processing. The trigger thresholds are adjusted to the LHC instantaneous luminosity during data taking in order to restrict the output rate to 100 kHz, the upper limit imposed by the CMS readout electronics. The second level, implemented in software, further refines the purity of the output stream, selecting an average rate of 400 Hz for offline event storage. The objectives, strategy and performance of the trigger system during the LHC Run 1 are described.

532 citations


Journal ArticleDOI
05 Dec 2017-PeerJ
TL;DR: The release of SDMtoolbox 2.0 allows researchers to use the most current ArcGIS software and MaxEnt software, and reduces the amount of time that would be spent developing common solutions.
Abstract: SDMtoolbox 2.0 is a software package for spatial studies of ecology, evolution, and genetics. The release of SDMtoolbox 2.0 allows researchers to use the most current ArcGIS and MaxEnt software, and reduces the amount of time that would be spent developing common solutions. The central aim of this software is to automate complicated and repetitive spatial analyses in an intuitive graphical user interface. One core tenet is the careful parameterization of species distribution models (SDMs) to maximize each model's discriminatory ability and minimize overfitting. This includes careful processing of occurrence data, environmental data, and model parameterization. This program directly interfaces with MaxEnt, one of the most powerful and widely used species distribution modeling software programs, although SDMtoolbox 2.0 is not limited to species distribution modeling or restricted to modeling in MaxEnt. Many of the SDM pre- and post-processing tools have 'universal' analogs for use with any modeling software. The current version contains a total of 79 scripts that harness the power of ArcGIS for macroecology, landscape genetics, and evolutionary studies. For example, these tools allow for biodiversity quantification (such as species richness or corrected weighted endemism), generation of least-cost paths and corridors among shared haplotypes, assessment of the significance of spatial randomizations, and enforcement of dispersal limitations of SDMs projected into future climates, to name only a few functions contained in SDMtoolbox 2.0. Lastly, dozens of generalized tools exist for batch processing and conversion of GIS data types or formats, which are broadly useful to any ArcMap user.

451 citations


Journal ArticleDOI
TL;DR: Automated tomographic reconstruction is now possible in the IMOD software package, including the merging of tomograms taken around two orthogonal axes, and a user interface for batch processing of tilt series is added to the Etomo program in IMOD.

Journal ArticleDOI
TL;DR: In this article, a Monte Carlo approach is proposed to improve the accuracy of SfM-based DEMs and minimise the associated field effort by robust determination of suitable lower-density deployments of ground control.

Proceedings Article
16 Aug 2017
TL;DR: A new, yet critical, side-channel attack, branch shadowing, is presented that reveals fine-grained control flows (branch granularity) in an enclave, together with two novel exploitation techniques: a last branch record (LBR)-based history-inferring technique and an advanced programmable interrupt controller (APIC)-based technique that controls the execution of an enclave in a fine-grained manner.
Abstract: Intel has introduced a hardware-based trusted execution environment, Intel Software Guard Extensions (SGX), that provides a secure, isolated execution environment, or enclave, for a user program without trusting any underlying software (e.g., an operating system) or firmware. Researchers have demonstrated that SGX is vulnerable to a page-fault-based attack. However, the attack only reveals page-level memory accesses within an enclave. In this paper, we explore a new, yet critical, side-channel attack, branch shadowing, that reveals fine-grained control flows (branch granularity) in an enclave. The root cause of this attack is that SGX does not clear branch history when switching from enclave to non-enclave mode, leaving fine-grained traces for the outside world to observe, which gives rise to a branch-prediction side channel. However, exploiting this channel in practice is challenging because 1) measuring branch execution time is too noisy for distinguishing fine-grained control-flow changes and 2) pausing an enclave right after it has executed the code block we target requires sophisticated control. To overcome these challenges, we develop two novel exploitation techniques: 1) a last branch record (LBR)-based history-inferring technique and 2) an advanced programmable interrupt controller (APIC)-based technique to control the execution of an enclave in a fine-grained manner. An evaluation against RSA shows that our attack infers each private key bit with 99.8% accuracy. Finally, we thoroughly study the feasibility of hardware-based solutions (i.e., branch history flushing) and propose a software-based approach that mitigates the attack.

Journal ArticleDOI
TL;DR: This work introduces CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner.
Abstract: Most existing dimensionality reduction and clustering packages for single-cell RNA-seq (scRNA-seq) data deal with dropouts by heavy modeling and computational machinery. Here, we introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner. Using a range of simulated and real data, we show that CIDR improves the standard principal component analysis and outperforms the state-of-the-art methods, namely t-SNE, ZIFA, and RaceID, in terms of clustering accuracy. CIDR typically completes within seconds when processing a data set of hundreds of cells and minutes for a data set of thousands of cells. CIDR can be downloaded at https://github.com/VCCRI/CIDR .

Journal ArticleDOI
TL;DR: This survey describes and compares the areas of research that have been explored thus far, drawing out common aspects, trends and directions future research should take to address open problems and challenges.
Abstract: App Store Analysis studies information about applications obtained from app stores. App stores provide a wealth of information derived from users that would not exist had the applications been distributed via previous software deployment methods. App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these forms of software repositories. Findings from App Store Analysis have a direct and actionable impact on the software teams that develop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security and testing. This survey describes and compares the areas of research that have been explored thus far, drawing out common aspects, trends and directions future research should take to address open problems and challenges.

Journal ArticleDOI
TL;DR: Through a standard approach to data collection and procedure reporting, researchers and practitioners will be able to make more confident comparisons from their data, which will improve the understanding and impact these devices can have on athlete performance.
Abstract: Athlete-tracking devices that include global positioning system (GPS) and microelectrical mechanical system (MEMS) components are now commonplace in sport research and practice. These devices provide large amounts of data that are used to inform decision making on athlete training and performance. However, the data obtained from these devices are often provided without clear explanation of how these metrics are obtained. At present, there is no clear consensus regarding how these data should be handled and reported in a sport context. Therefore, the aim of this review was to examine the factors that affect the data produced by these athlete-tracking devices and to provide guidelines for collecting, processing, and reporting of data. Many factors including device sampling rate, positioning and fitting of devices, satellite signal, and data-filtering methods can affect the measures obtained from GPS and MEMS devices. Therefore researchers are encouraged to report device brand/model, sampling frequency, number of satellites, horizontal dilution of precision, and software/firmware versions in any published research. In addition, details of inclusion/exclusion criteria for data obtained from these devices are also recommended. Considerations for the application of speed zones to evaluate the magnitude and distribution of different locomotor activities recorded by GPS are also presented, alongside recommendations for both industry practice and future research directions. Through a standard approach to data collection and procedure reporting, researchers and practitioners will be able to make more confident comparisons from their data, which will improve the understanding and impact these devices can have on athlete performance.
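
One practical way to follow the reporting recommendations above is to store the relevant acquisition details as structured metadata alongside each exported tracking file. The sketch below captures the fields the review asks researchers to report; the field names and example values are our own, not a standard schema.

```python
# Sketch of recording the device/processing details this review recommends
# reporting (brand/model, sampling frequencies, satellite count, HDOP,
# software/firmware versions, filtering, inclusion/exclusion criteria) as
# structured metadata next to exported tracking data. Field names and values
# are illustrative, not a standard schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class TrackingSessionMetadata:
    device_brand: str
    device_model: str
    gps_sampling_rate_hz: float
    mems_sampling_rate_hz: float
    mean_satellites: float
    mean_hdop: float
    software_version: str
    firmware_version: str
    filtering_method: str
    exclusion_criteria: str

meta = TrackingSessionMetadata(
    device_brand="ExampleBrand", device_model="X-10",      # hypothetical device
    gps_sampling_rate_hz=10.0, mems_sampling_rate_hz=100.0,
    mean_satellites=11.2, mean_hdop=0.9,
    software_version="2.3.1", firmware_version="7.80",
    filtering_method="manufacturer default",
    exclusion_criteria="files with mean HDOP > 1.5 or fewer than 8 satellites",
)

with open("session_metadata.json", "w") as fh:
    json.dump(asdict(meta), fh, indent=2)
```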

Journal ArticleDOI
TL;DR: An R package, metaX, is developed that is capable of end-to-end metabolomics data analysis through a set of interchangeable modules and provides several functions, such as peak picking and annotation, data quality assessment, missing value imputation, data normalization, univariate and multivariate statistics, power analysis and sample size estimation.
Abstract: Non-targeted metabolomics based on mass spectrometry enables high-throughput profiling of the metabolites in a biological sample. The large amount of data generated from mass spectrometry requires intensive computational processing for annotation of mass spectra and identification of metabolites. Computational analysis tools that are fully integrated with multiple functions and are easily operated by users who lack extensive knowledge in programing are needed in this research field. We herein developed an R package, metaX, that is capable of end-to-end metabolomics data analysis through a set of interchangeable modules. Specifically, metaX provides several functions, such as peak picking and annotation, data quality assessment, missing value imputation, data normalization, univariate and multivariate statistics, power analysis and sample size estimation, receiver operating characteristic analysis, biomarker selection, pathway annotation, correlation network analysis, and metabolite identification. In addition, metaX offers a web-based interface ( http://metax.genomics.cn ) for data quality assessment and normalization method evaluation, and it generates an HTML-based report with a visualized interface. The metaX utilities were demonstrated with a published metabolomics dataset on a large scale. The software is available for operation as either a web-based graphical user interface (GUI) or in the form of command line functions. The package and the example reports are available at http://metax.genomics.cn/ . The pipeline of metaX is platform-independent and is easy to use for analysis of metabolomics data generated from mass spectrometry.

Journal ArticleDOI
TL;DR: NewGene is software designed to eliminate many of the difficulties commonly involved in constructing large international relations data sets by providing a highly flexible platform on which users can construct datasets for international relations research using pre-loaded data or by incorporating their own data.
Abstract: This paper introduces a complete redesign of the popular EUGene software, called NewGene. Like EUGene, NewGene is software designed to eliminate many of the difficulties commonly involved in constructing large international relations data sets. NewGene is a stand-alone Microsoft Windows- and OS X-based program for the construction of annual, monthly, and daily data sets for the variety of decision-making units (e.g. countries, leaders, organizations, etc.) used in quantitative studies of international relations. It also provides users the ability to construct units of analysis ranging from monads (e.g. country-year), to dyads (e.g. country1-country2-year), to extra-dyadic observations called k-ads (e.g. country1-country2-…-countryk-year). NewGene’s purpose is to provide a highly flexible platform on which users can construct datasets for international relations research using pre-loaded data or by incorporating their own data.
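
The monad, dyad, and k-ad units of analysis described above can be illustrated with a few lines of Python; the snippet below generates those structures from a toy country-year list and is not NewGene's own code or pre-loaded data.

```python
# Illustration of the unit-of-analysis structures NewGene constructs
# (monads, directed dyads, and k-ads), built here with itertools from a toy
# country-year list; not NewGene's code or data.
from itertools import permutations, combinations

countries = ["USA", "CHN", "RUS"]   # toy example
years = [2016, 2017]

monads = [(c, y) for y in years for c in countries]                            # country-year
dyads  = [(a, b, y) for y in years for a, b in permutations(countries, 2)]     # directed dyad-year
k_ads  = [(combo, y) for y in years for combo in combinations(countries, 3)]   # k-ad-year (k = 3)

print(len(monads), len(dyads), len(k_ads))   # 6 monads, 12 directed dyads, 2 k-ads
```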

Journal ArticleDOI
TL;DR: Wordbank as mentioned in this paper is a structured database of parent-report data combined with a browsable web interface for exploring patterns of vocabulary growth at the level of both individual children and particular words.
Abstract: The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.


Journal ArticleDOI
01 Dec 2017-Nature
TL;DR: This work shows that by using simple local assembly rules that are modified and applied recursively throughout a hierarchical, multistage assembly process, a small and constant set of unique DNA strands can be used to create DNA origami arrays of increasing size and with arbitrary patterns.
Abstract: Self-assembled DNA nanostructures enable nanometre-precise patterning that can be used to create programmable molecular machines and arrays of functional materials. DNA origami is particularly versatile in this context because each DNA strand in the origami nanostructure occupies a unique position and can serve as a uniquely addressable pixel. However, the scale of such structures has been limited to about 0.05 square micrometres, hindering applications that demand a larger layout and integration with more conventional patterning methods. Hierarchical multistage assembly of simple sets of tiles can in principle overcome this limitation, but so far has not been sufficiently robust to enable successful implementation of larger structures using DNA origami tiles. Here we show that by using simple local assembly rules that are modified and applied recursively throughout a hierarchical, multistage assembly process, a small and constant set of unique DNA strands can be used to create DNA origami arrays of increasing size and with arbitrary patterns. We illustrate this method, which we term ‘fractal assembly’, by producing DNA origami arrays with sizes of up to 0.5 square micrometres and with up to 8,704 pixels, allowing us to render images such as the Mona Lisa and a rooster. We find that self-assembly of the tiles into arrays is unaffected by changes in surface patterns on the tiles, and that the yield of the fractal assembly process corresponds to about 0.95^(m − 1) for arrays containing m tiles. When used in conjunction with a software tool that we developed that converts an arbitrary pattern into DNA sequences and experimental protocols, our assembly method is readily accessible and will facilitate the construction of sophisticated materials and devices with sizes similar to that of a bacterium using DNA nanostructures.
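
The reported yield scaling of about 0.95^(m − 1) for an array of m tiles translates into a quick back-of-the-envelope estimate; the array sizes below are illustrative values, not figures quoted from the paper.

```python
# Back-of-the-envelope use of the reported yield scaling (~0.95**(m - 1) for an
# m-tile array); the array sizes are illustrative only.
for m in (4, 16, 64):
    print(m, round(0.95 ** (m - 1), 3))
# 4 tiles  -> ~0.857
# 16 tiles -> ~0.463
# 64 tiles -> ~0.039
```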

Journal ArticleDOI
TL;DR: A software defined space-air-ground integrated network architecture for supporting diverse vehicular services in a seamless, efficient, and cost-effective manner is proposed.
Abstract: This article proposes a software defined space-air-ground integrated network architecture for supporting diverse vehicular services in a seamless, efficient, and cost-effective manner. First, the motivations and challenges for integration of space-air-ground networks are reviewed. Second, a software defined network architecture with a layered structure is presented. To protect the legacy services in the satellite, aerial, and terrestrial segments, resources in each segment are sliced through network slicing to achieve service isolation. Then available resources are put into a common and dynamic space-air-ground resource pool, which is managed by hierarchical controllers to accommodate vehicular services. Finally, a case study is carried out, followed by discussion on some open research topics.

Proceedings ArticleDOI
21 Aug 2017
TL;DR: Themis as discussed by the authors is a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior, and generates efficient test suites to measure discrimination.
Abstract: This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.
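
The causal notion of discrimination that Themis measures, whether changing only a protected input attribute changes the software's output, can be sketched generically as below. This is not the Themis implementation (which adds schema-driven test generation, pruning, and adaptive sampling); the input schema and the decision function are toy placeholders.

```python
# Generic sketch of causal-discrimination measurement in fairness testing:
# sample random inputs from a schema, flip only the protected attribute, and
# count how often the system under test changes its decision. Not the Themis
# implementation; schema and decision function are toy placeholders.
import random

SCHEMA = {
    "age": list(range(18, 80)),
    "income": list(range(10_000, 200_000, 5_000)),
    "gender": ["female", "male"],      # protected attribute
}

def decision(applicant: dict) -> bool:
    """Toy system under test: a (deliberately biased) loan approval rule."""
    threshold = 60_000 if applicant["gender"] == "female" else 50_000
    return applicant["income"] >= threshold

def causal_discrimination(protected: str, n_tests: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    flipped = 0
    for _ in range(n_tests):
        applicant = {k: rng.choice(v) for k, v in SCHEMA.items()}
        original = decision(applicant)
        for alt in SCHEMA[protected]:
            if alt != applicant[protected] and decision({**applicant, protected: alt}) != original:
                flipped += 1
                break
    return flipped / n_tests

print(f"causal discrimination w.r.t. gender: {causal_discrimination('gender'):.3f}")
```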

Journal ArticleDOI
TL;DR: NMRbox is a shared resource for NMR software and computation that employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud.

Journal ArticleDOI
TL;DR: In this article, the authors present a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.
Abstract: Author summary Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for reproducibility, but researchers new to computing often don't know where to start. This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide variety of published sources from our daily lives and from our work with volunteer organizations that have delivered workshops to over 11,000 people since 2010.

Proceedings ArticleDOI
25 Jul 2017
TL;DR: This paper proposes a framework called Defect Prediction via Convolutional Neural Network (DP-CNN), which leverages deep learning for effective feature generation and evaluates the method on seven open source projects in terms of F-measure in defect prediction.
Abstract: To improve software reliability, software defect prediction is utilized to assist developers in finding potential bugs and allocating their testing efforts. Traditional defect prediction studies mainly focus on designing hand-crafted features, which are input into machine learning classifiers to identify defective code. However, these hand-crafted features often fail to capture the semantic and structural information of programs. Such information is important in modeling program functionality and can lead to more accurate defect prediction. In this paper, we propose a framework called Defect Prediction via Convolutional Neural Network (DP-CNN), which leverages deep learning for effective feature generation. Specifically, based on the programs' Abstract Syntax Trees (ASTs), we first extract token vectors, which are then encoded as numerical vectors via mapping and word embedding. We feed the numerical vectors into a Convolutional Neural Network to automatically learn semantic and structural features of programs. After that, we combine the learned features with traditional hand-crafted features for accurate software defect prediction. We evaluate our method on seven open source projects in terms of F-measure in defect prediction. The experimental results show that, on average, DP-CNN improves the state-of-the-art method by 12%.
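
The pipeline described above (embed AST token sequences, learn features with a CNN, then concatenate them with hand-crafted metrics for classification) can be sketched compactly in PyTorch. The vocabulary size, sequence length, layer sizes, and feature dimensions below are illustrative choices rather than the paper's configuration.

```python
# Compact PyTorch sketch of the described architecture: embed AST token
# sequences, learn features with a 1-D convolution + pooling, concatenate them
# with traditional hand-crafted metrics, and output a defect-proneness logit.
# All sizes are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class DefectCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=30, n_filters=10,
                 kernel_size=5, n_handcrafted=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.classifier = nn.Sequential(
            nn.Linear(n_filters + n_handcrafted, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, token_ids, handcrafted):
        x = self.embed(token_ids)                   # (batch, seq_len, embed_dim)
        x = self.conv(x.transpose(1, 2))            # (batch, n_filters, seq_len - k + 1)
        x = self.pool(torch.relu(x)).squeeze(-1)    # (batch, n_filters)
        x = torch.cat([x, handcrafted], dim=1)      # fuse CNN and hand-crafted features
        return self.classifier(x).squeeze(-1)       # defect-proneness logit

# Toy forward pass with random data (hypothetical shapes).
model = DefectCNN()
logits = model(torch.randint(1, 5000, (8, 200)), torch.randn(8, 20))
print(logits.shape)   # torch.Size([8])
```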

Journal ArticleDOI
TL;DR: An extensive review of the many different works in the field of software vulnerability analysis and discovery that utilize machine-learning and data-mining techniques is provided, discussing both advantages and shortcomings in this domain.
Abstract: Software security vulnerabilities are one of the critical issues in the realm of computer security. Due to their potential high severity impacts, many different approaches have been proposed in the past decades to mitigate the damages of software vulnerabilities. Machine-learning and data-mining techniques are also among the many approaches to address this issue. In this article, we provide an extensive review of the many different works in the field of software vulnerability analysis and discovery that utilize machine-learning and data-mining techniques. We review different categories of works in this domain, discuss both advantages and shortcomings, and point out challenges and some uncharted territories in the field.

Posted Content
TL;DR: From a light scan of literature, it is demonstrated that there is considerable scope to infuse more results from the social and behavioural sciences into explainable AI, and some key results from these fields that are relevant to explainableAI are presented.
Abstract: In his seminal book `The Inmates are Running the Asylum: Why High-Tech Products Drive Us Crazy And How To Restore The Sanity' [2004, Sams Indianapolis, IN, USA], Alan Cooper argues that a major reason why software is often poorly designed (from a user perspective) is that programmers are in charge of design decisions, rather than interaction designers. As a result, programmers design software for themselves, rather than for their target audience, a phenomenon he refers to as the `inmates running the asylum'. This paper argues that explainable AI risks a similar fate. While the re-emergence of explainable AI is positive, this paper argues most of us as AI researchers are building explanatory agents for ourselves, rather than for the intended users. But explainable AI is more likely to succeed if researchers and practitioners understand, adopt, implement, and improve models from the vast and valuable bodies of research in philosophy, psychology, and cognitive science, and if evaluation of these models is focused more on people than on technology. From a light scan of literature, we demonstrate that there is considerable scope to infuse more results from the social and behavioural sciences into explainable AI, and present some key results from these fields that are relevant to explainable AI.

Journal ArticleDOI
TL;DR: OpenMEE (Open Meta-analyst for Ecology and Evolution) is a cross-platform, easy-to-use graphical user interface that gives E&E researchers access to the diverse and advanced statistical functionalities offered in R, without requiring knowledge of R programming.
Abstract: Meta-analysis and meta-regression are statistical methods for synthesizing and modelling the results of different studies, and are critical research synthesis tools in ecology and evolutionary biology (E&E). However, many E&E researchers carry out meta-analyses using software that is limited in its statistical functionality and is not easily updatable. It is likely that these software limitations have slowed the uptake of new methods in E&E and limited the scope and quality of inferences from research syntheses. We developed OpenMEE: Open Meta-analyst for Ecology and Evolution to address the need for advanced, easy-to-use software for meta-analysis and meta-regression. OpenMEE has a cross-platform, easy-to-use graphical user interface (GUI) that gives E&E researchers access to the diverse and advanced statistical functionalities offered in R, without requiring knowledge of R programming. OpenMEE offers a suite of advanced meta-analysis and meta-regression methods for synthesizing continuous and categorical data, including meta-regression with multiple covariates and their interactions, phylogenetic analyses, and simple missing data imputation. OpenMEE also supports data importing and exporting, exploratory data analysis, graphing of data, and summary table generation. As intuitive, open-source, free software for advanced methods in meta-analysis, OpenMEE meets the current and pressing needs of the E&E community for teaching meta-analysis and conducting high-quality syntheses. Because OpenMEE's statistical components are written in R, new methods and packages can be rapidly incorporated into the software. To fully realize the potential of OpenMEE, we encourage community development with an aim to advance the capabilities of meta-analyses in E&E.

Journal ArticleDOI
TL;DR: The essential algorithmic aspects of the structure from motion and image dense matching problems are discussed from the implementation and the user’s viewpoints.
Abstract: The publication familiarizes the reader with MicMac, a free, open-source photogrammetric software suite for 3D reconstruction. A brief history of the tool, its organisation, and its unique features vis-a-vis other software tools are highlighted. The essential algorithmic aspects of the structure-from-motion and image dense matching problems are discussed from the implementation and the user's viewpoints.