
Showing papers on "Software" published in 2017


Journal ArticleDOI
TL;DR: Because MMseqs2 needs no random memory access in its innermost loop, its runtime scales almost inversely with the number of cores used, which enables sensitive protein sequence searching for the analysis of massive data sets.
Abstract: Sequencing costs have dropped much faster than Moore's law in the past decade, and sensitive sequence searching has become the main bottleneck in the analysis of large (meta)genomic datasets. While previous methods sacrificed sensitivity for speed gains, the parallelized, open-source software MMseqs2 overcomes this trade-off: In three-iteration profile searches it reaches 50% higher sensitivity than BLAST at 83-fold speed and the same sensitivity as PSI-BLAST at 270 times its speed. MMseqs2 therefore offers great potential to increase the fraction of annotatable (meta)genomic sequences.
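
The iterative profile search described above is driven from the command line; below is a minimal sketch of invoking such a search from Python. It assumes the `mmseqs` binary is on the PATH and that the `easy-search` workflow accepts `--num-iterations` for iterative searches; the file names are placeholders, and the exact options should be checked against `mmseqs easy-search -h` for the installed version.

```python
# Minimal sketch of driving an MMseqs2 search from Python via its CLI.
# Assumes the `mmseqs` binary is on PATH and that `easy-search` accepts
# `--num-iterations` for iterative (profile) searches; verify the options
# against the installed MMseqs2 version.
import subprocess
from pathlib import Path

def run_mmseqs_search(query_fasta: str, target_fasta: str,
                      result_tsv: str = "result.m8",
                      tmp_dir: str = "tmp",
                      iterations: int = 3,
                      threads: int = 8) -> None:
    """Run an iterative MMseqs2 search and write tabular (BLAST-like) hits."""
    Path(tmp_dir).mkdir(exist_ok=True)
    cmd = [
        "mmseqs", "easy-search",
        query_fasta, target_fasta, result_tsv, tmp_dir,
        "--num-iterations", str(iterations),   # assumed flag for profile iterations
        "--threads", str(threads),
    ]
    subprocess.run(cmd, check=True)

# Example (hypothetical file names):
# run_mmseqs_search("queries.fasta", "uniref50.fasta")
```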

1,371 citations


Journal ArticleDOI
TL;DR: The R package 'dagitty' is introduced, which provides access to all of the capabilities of the DAGitty web application within the R platform for statistical computing, and also offers several new functions that enable epidemiologists to detect causal misspecifications in DAGs and make robust inferences that remain valid for a range of different DAGs.
Abstract: Directed acyclic graphs (DAGs), which offer systematic representations of causal relationships, have become an established framework for the analysis of causal inference in epidemiology, often being used to determine covariate adjustment sets for minimizing confounding bias. DAGitty is a popular web application for drawing and analysing DAGs. Here we introduce the R package 'dagitty', which provides access to all of the capabilities of the DAGitty web application within the R platform for statistical computing, and also offers several new functions. We describe how the R package 'dagitty' can be used to: evaluate whether a DAG is consistent with the dataset it is intended to represent; enumerate 'statistically equivalent' but causally different DAGs; and identify exposure-outcome adjustment sets that are valid for causally different but statistically equivalent DAGs. This functionality enables epidemiologists to detect causal misspecifications in DAGs and make robust inferences that remain valid for a range of different DAGs. The R package 'dagitty' is available through the comprehensive R archive network (CRAN) at [https://cran.r-project.org/web/packages/dagitty/]. The source code is available on github at [https://github.com/jtextor/dagitty]. The web application 'DAGitty' is free software, licensed under the GNU general public licence (GPL) version 2 and is available at [http://dagitty.net/].
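
Since this listing contains no R code, the core idea behind adjustment-set checking (does a candidate covariate set d-separate exposure and outcome once the edge carrying the causal effect is removed?) is illustrated below in Python with networkx rather than with the 'dagitty' package itself. The example DAG is hypothetical, and `nx.d_separated` requires networkx 2.8 or later (it was later renamed `is_d_separator`).

```python
# Illustrative sketch (not the 'dagitty' R package): checking whether a
# candidate covariate set d-separates exposure and outcome in a DAG.
import networkx as nx

# X -> Y is the causal path of interest; Z confounds X and Y.
g = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "Y")])

# Remove the edge carrying the causal effect, then ask whether the candidate
# set blocks all remaining (backdoor) paths between X and Y.
backdoor_graph = g.copy()
backdoor_graph.remove_edge("X", "Y")

print(nx.d_separated(backdoor_graph, {"X"}, {"Y"}, set()))   # False: {} is not a valid adjustment set
print(nx.d_separated(backdoor_graph, {"X"}, {"Y"}, {"Z"}))   # True: adjusting for Z blocks the backdoor path
```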

1,039 citations


Posted Content
TL;DR: A new simulator built on Unreal Engine that offers physically and visually realistic simulations for autonomous vehicles in the real world and that is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols.
Abstract: Developing and testing algorithms for autonomous vehicles in the real world is an expensive and time-consuming process. Also, in order to utilize recent advances in machine intelligence and deep learning, we need to collect a large amount of annotated training data in a variety of conditions and environments. We present a new simulator built on Unreal Engine that offers physically and visually realistic simulations for both of these goals. Our simulator includes a physics engine that can operate at a high frequency for real-time hardware-in-the-loop (HITL) simulations with support for popular protocols (e.g. MavLink). The simulator is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols. In addition, the modular design enables various components to be easily usable independently in other projects. We demonstrate the simulator by first implementing a quadrotor as an autonomous vehicle and then experimentally comparing the software components with real-world flights.
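
The quadrotor demonstration described above is scripted against the simulator's client API. Assuming the simulator in question is the one released as AirSim, whose Python client is installed and connected to a running simulation, a minimal flight script might look like the sketch below; the method names follow the commonly documented client API and should be verified against the installed package version.

```python
# Minimal sketch of flying the simulated quadrotor through a Python API.
# Assumes the simulator is AirSim with its `airsim` Python client installed
# and a simulation already running; verify method names against your version.
import airsim

client = airsim.MultirotorClient()      # connect to a running simulator
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()            # blocking take-off
# NED coordinates: negative z is up; fly to 10 m altitude at 5 m/s.
client.moveToPositionAsync(0, 0, -10, 5).join()

state = client.getMultirotorState()     # kinematics, e.g. for HITL comparisons
print(state.kinematics_estimated.position)

client.armDisarm(False)
client.enableApiControl(False)
```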

979 citations


Book ChapterDOI
15 May 2017
TL;DR: In this paper, the authors present a new simulator built on Unreal Engine that offers physically and visually realistic simulations for autonomous vehicles in real-world environments, including a physics engine that can operate at a high frequency for real-time hardware-in-the-loop (HITL) simulations with support for popular protocols (e.g., MavLink).
Abstract: Developing and testing algorithms for autonomous vehicles in the real world is an expensive and time-consuming process. Also, in order to utilize recent advances in machine intelligence and deep learning, we need to collect a large amount of annotated training data in a variety of conditions and environments. We present a new simulator built on Unreal Engine that offers physically and visually realistic simulations for both of these goals. Our simulator includes a physics engine that can operate at a high frequency for real-time hardware-in-the-loop (HITL) simulations with support for popular protocols (e.g., MavLink). The simulator is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols. In addition, the modular design enables various components to be easily usable independently in other projects. We demonstrate the simulator by first implementing a quadrotor as an autonomous vehicle and then experimentally comparing the software components with real-world flights.

938 citations


Journal ArticleDOI
TL;DR: PlatEMO as discussed by the authors is a MATLAB platform for evolutionary multi-objective optimization, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators.
Abstract: Over the last three decades, a large number of evolutionary algorithms have been developed for solving multi-objective optimization problems. However, there is a lack of an up-to-date and comprehensive software platform for researchers to properly benchmark existing algorithms and for practitioners to apply selected algorithms to solve their real-world problems. The demand for such a common tool becomes even more urgent when the source code of many proposed algorithms has not been made publicly available. To address these issues, we have developed a MATLAB platform for evolutionary multi-objective optimization, called PlatEMO, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators. With a user-friendly graphical user interface, PlatEMO enables users to easily compare several evolutionary algorithms at one time and collect statistical results in Excel or LaTeX files. More importantly, PlatEMO is completely open source, such that users are able to develop new algorithms on the basis of it. This paper introduces the main features of PlatEMO and illustrates how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators. Source code of PlatEMO is now available at: http://bimk.ahu.edu.cn/index.php?s=/Index/Software/index.html.

915 citations


Posted Content
TL;DR: The main features of PlatEMO are introduced, and how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators is illustrated.
Abstract: Over the last three decades, a large number of evolutionary algorithms have been developed for solving multi-objective optimization problems. However, there is a lack of an up-to-date and comprehensive software platform for researchers to properly benchmark existing algorithms and for practitioners to apply selected algorithms to solve their real-world problems. The demand for such a common tool becomes even more urgent when the source code of many proposed algorithms has not been made publicly available. To address these issues, we have developed a MATLAB platform for evolutionary multi-objective optimization, called PlatEMO, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators. With a user-friendly graphical user interface, PlatEMO enables users to easily compare several evolutionary algorithms at one time and collect statistical results in Excel or LaTeX files. More importantly, PlatEMO is completely open source, such that users are able to develop new algorithms on the basis of it. This paper introduces the main features of PlatEMO and illustrates how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators. Source code of PlatEMO is now available at: http://bimk.ahu.edu.cn/index.php?s=/Index/Software/index.html.

828 citations


Journal ArticleDOI
TL;DR: A new method and convenient tools for determining sample size and power in mediation models are proposed and demonstrated and will allow researchers to quickly and easily determine power and sample size for simple and complex mediation models.
Abstract: Mediation analyses abound in social and personality psychology. Current recommendations for assessing power and sample size in mediation models include using a Monte Carlo power analysis simulation and testing the indirect effect with a bootstrapped confidence interval. Unfortunately, these methods have rarely been adopted by researchers due to limited software options and the computational time needed. We propose a new method and convenient tools for determining sample size and power in mediation models. We demonstrate our new method through an easy-to-use application that implements the method. These developments will allow researchers to quickly and easily determine power and sample size for simple and complex mediation models.
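
The proposed approach combines a Monte Carlo power simulation with a bootstrapped confidence interval for the indirect effect. The sketch below illustrates that general idea in Python for a simple X -> M -> Y model; the path coefficients, sample size, and replication counts are arbitrary illustrative values, not the authors' defaults.

```python
# Generic sketch of a Monte Carlo power analysis for a simple mediation model
# (X -> M -> Y), not the authors' application: simulate data under assumed
# path coefficients, test the indirect effect a*b with a percentile bootstrap
# CI, and report the proportion of replications whose CI excludes zero.
import numpy as np

def mediation_power(n=100, a=0.3, b=0.3, n_sims=200, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        ab_boot = np.empty(n_boot)
        for i in range(n_boot):
            idx = rng.integers(0, n, n)
            xs, ms, ys = x[idx], m[idx], y[idx]
            a_hat = np.polyfit(xs, ms, 1)[0]                       # slope of M on X
            b_hat = np.linalg.lstsq(
                np.column_stack([ms, xs, np.ones(n)]), ys, rcond=None)[0][0]  # slope of Y on M, controlling X
            ab_boot[i] = a_hat * b_hat
        lo, hi = np.percentile(ab_boot, [2.5, 97.5])
        rejections += (lo > 0) or (hi < 0)
    return rejections / n_sims

# Example: estimated power for n = 100 with small-to-medium paths.
# print(mediation_power(n=100))
```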

615 citations


Journal ArticleDOI
TL;DR: The Splatter Bioconductor package is presented for simple, reproducible, and well-documented simulation of scRNA-seq data and provides an interface to multiple simulation methods including Splat, the authors' own simulation, based on a gamma-Poisson distribution.
Abstract: As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.
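
Splat's counts are built on a gamma-Poisson model. The snippet below illustrates only that distributional idea in Python, drawing per-gene means from a gamma distribution and Poisson counts around them; it is not the Splat model itself, which additionally models library sizes, expression outliers, dropouts, and differentiation paths, and all parameter values here are arbitrary.

```python
# Tiny illustration of the gamma-Poisson idea behind scRNA-seq count
# simulation: gene-level means are gamma-distributed and per-cell counts are
# Poisson around those means. NOT the Splat model itself; parameters arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_cells = 2000, 300

gene_means = rng.gamma(shape=0.6, scale=3.0, size=n_genes)       # gene-level means
cell_factors = rng.lognormal(mean=0.0, sigma=0.3, size=n_cells)  # crude library-size variation

# Expected expression for every gene/cell pair, then Poisson noise.
mu = np.outer(gene_means, cell_factors)
counts = rng.poisson(mu)

print(counts.shape, counts.mean(), (counts == 0).mean())         # matrix size, mean count, sparsity
```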

568 citations


Journal ArticleDOI
TL;DR: The CMS trigger system consists of two levels designed to select events of potential physics interest from a GHz (MHz) interaction rate of proton-proton (heavy ion) collisions.
Abstract: This paper describes the CMS trigger system and its performance during Run 1 of the LHC. The trigger system consists of two levels designed to select events of potential physics interest from a GHz (MHz) interaction rate of proton-proton (heavy ion) collisions. The first level of the trigger is implemented in hardware, and selects events containing detector signals consistent with an electron, photon, muon, tau lepton, jet, or missing transverse energy. A programmable menu of up to 128 object-based algorithms is used to select events for subsequent processing. The trigger thresholds are adjusted to the LHC instantaneous luminosity during data taking in order to restrict the output rate to 100 kHz, the upper limit imposed by the CMS readout electronics. The second level, implemented in software, further refines the purity of the output stream, selecting an average rate of 400 Hz for offline event storage. The objectives, strategy and performance of the trigger system during the LHC Run 1 are described.

532 citations


Journal ArticleDOI
05 Dec 2017-PeerJ
TL;DR: The release of SDMtoolbox 2.0 allows researchers to use the most current ArcGIS software and MaxEnt software, and reduces the amount of time that would be spent developing common solutions.
Abstract: SDMtoolbox 2.0 is a software package for spatial studies of ecology, evolution, and genetics. The release of SDMtoolbox 2.0 allows researchers to use the most current ArcGIS and MaxEnt software, and reduces the amount of time that would be spent developing common solutions. The central aim of this software is to automate complicated and repetitive spatial analyses in an intuitive graphical user interface. One core tenet is the careful parameterization of species distribution models (SDMs) to maximize each model's discriminatory ability and minimize overfitting. This includes careful processing of occurrence data, environmental data, and model parameterization. This program directly interfaces with MaxEnt, one of the most powerful and widely used species distribution modeling software programs, although SDMtoolbox 2.0 is not limited to species distribution modeling or restricted to modeling in MaxEnt. Many of the SDM pre- and post-processing tools have 'universal' analogs for use with any modeling software. The current version contains a total of 79 scripts that harness the power of ArcGIS for macroecology, landscape genetics, and evolutionary studies. For example, these tools allow for biodiversity quantification (such as species richness or corrected weighted endemism), generation of least-cost paths and corridors among shared haplotypes, assessment of the significance of spatial randomizations, and enforcement of dispersal limitations of SDMs projected into future climates, to name only a few functions contained in SDMtoolbox 2.0. Lastly, dozens of generalized tools exist for batch processing and conversion of GIS data types or formats, which are broadly useful to any ArcMap user.

451 citations


Journal ArticleDOI
TL;DR: Automated tomographic reconstruction is now possible in the IMOD software package, including the merging of tomograms taken around two orthogonal axes, and a user interface for batch processing of tilt series is added to the Etomo program in IMOD.

Journal ArticleDOI
TL;DR: In this article, a Monte Carlo approach is proposed to improve the accuracy of SfM-based DEMs and minimise the associated field effort by robust determination of suitable lower-density deployments of ground control.

Proceedings Article
16 Aug 2017
TL;DR: A new, yet critical, side-channel attack, branch shadowing, is presented that reveals fine-grained control flows (branch granularity) in an enclave, together with two novel exploitation techniques: a last branch record (LBR)-based history-inferring technique and an advanced programmable interrupt controller (APIC)-based technique that controls the execution of an enclave in a fine-grained manner.
Abstract: Intel has introduced a hardware-based trusted execution environment, Intel Software Guard Extensions (SGX), that provides a secure, isolated execution environment, or enclave, for a user program without trusting any underlying software (e.g., an operating system) or firmware. Researchers have demonstrated that SGX is vulnerable to a page-fault-based attack. However, the attack only reveals page-level memory accesses within an enclave. In this paper, we explore a new, yet critical, side-channel attack, branch shadowing, that reveals fine-grained control flows (branch granularity) in an enclave. The root cause of this attack is that SGX does not clear branch history when switching from enclave to non-enclave mode, leaving fine-grained traces for the outside world to observe, which gives rise to a branch-prediction side channel. However, exploiting this channel in practice is challenging because 1) measuring branch execution time is too noisy for distinguishing fine-grained control-flow changes and 2) pausing an enclave right after it has executed the code block we target requires sophisticated control. To overcome these challenges, we develop two novel exploitation techniques: 1) a last branch record (LBR)-based history-inferring technique and 2) an advanced programmable interrupt controller (APIC)-based technique to control the execution of an enclave in a fine-grained manner. An evaluation against RSA shows that our attack infers each private key bit with 99.8% accuracy. Finally, we thoroughly study the feasibility of hardware-based solutions (i.e., branch history flushing) and propose a software-based approach that mitigates the attack.

Journal ArticleDOI
TL;DR: This work introduces CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner.
Abstract: Most existing dimensionality reduction and clustering packages for single-cell RNA-seq (scRNA-seq) data deal with dropouts by heavy modeling and computational machinery. Here, we introduce CIDR (Clustering through Imputation and Dimensionality Reduction), an ultrafast algorithm that uses a novel yet very simple implicit imputation approach to alleviate the impact of dropouts in scRNA-seq data in a principled manner. Using a range of simulated and real data, we show that CIDR improves the standard principal component analysis and outperforms the state-of-the-art methods, namely t-SNE, ZIFA, and RaceID, in terms of clustering accuracy. CIDR typically completes within seconds when processing a data set of hundreds of cells and minutes for a data set of thousands of cells. CIDR can be downloaded at https://github.com/VCCRI/CIDR .

Journal ArticleDOI
TL;DR: This survey describes and compares the areas of research that have been explored thus far, drawing out common aspects, trends and directions future research should take to address open problems and challenges.
Abstract: App Store Analysis studies information about applications obtained from app stores. App stores provide a wealth of information derived from users that would not exist had the applications been distributed via previous software deployment methods. App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these forms of software repositories. Findings from App Store Analysis have a direct and actionable impact on the software teams that develop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security and testing. This survey describes and compares the areas of research that have been explored thus far, drawing out common aspects, trends and directions future research should take to address open problems and challenges.

Journal ArticleDOI
TL;DR: Through a standard approach to data collection and procedure reporting, researchers and practitioners will be able to make more confident comparisons from their data, which will improve the understanding and impact these devices can have on athlete performance.
Abstract: Athlete-tracking devices that include global positioning system (GPS) and microelectrical mechanical system (MEMS) components are now commonplace in sport research and practice. These devices provide large amounts of data that are used to inform decision making on athlete training and performance. However, the data obtained from these devices are often provided without clear explanation of how these metrics are obtained. At present, there is no clear consensus regarding how these data should be handled and reported in a sport context. Therefore, the aim of this review was to examine the factors that affect the data produced by these athlete-tracking devices and to provide guidelines for collecting, processing, and reporting of data. Many factors including device sampling rate, positioning and fitting of devices, satellite signal, and data-filtering methods can affect the measures obtained from GPS and MEMS devices. Therefore researchers are encouraged to report device brand/model, sampling frequency, number of satellites, horizontal dilution of precision, and software/firmware versions in any published research. In addition, details of inclusion/exclusion criteria for data obtained from these devices are also recommended. Considerations for the application of speed zones to evaluate the magnitude and distribution of different locomotor activities recorded by GPS are also presented, alongside recommendations for both industry practice and future research directions. Through a standard approach to data collection and procedure reporting, researchers and practitioners will be able to make more confident comparisons from their data, which will improve the understanding and impact these devices can have on athlete performance.
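
One practical way to follow the reporting recommendations above is to store the relevant acquisition details as structured metadata alongside each exported tracking file. The sketch below captures the fields the review asks researchers to report; the field names and example values are our own, not a standard schema.

```python
# Sketch of recording the device/processing details this review recommends
# reporting (brand/model, sampling frequencies, satellite count, HDOP,
# software/firmware versions, filtering, inclusion/exclusion criteria) as
# structured metadata next to exported tracking data. Field names and values
# are illustrative, not a standard schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class TrackingSessionMetadata:
    device_brand: str
    device_model: str
    gps_sampling_rate_hz: float
    mems_sampling_rate_hz: float
    mean_satellites: float
    mean_hdop: float
    software_version: str
    firmware_version: str
    filtering_method: str
    exclusion_criteria: str

meta = TrackingSessionMetadata(
    device_brand="ExampleBrand", device_model="X-10",      # hypothetical device
    gps_sampling_rate_hz=10.0, mems_sampling_rate_hz=100.0,
    mean_satellites=11.2, mean_hdop=0.9,
    software_version="2.3.1", firmware_version="7.80",
    filtering_method="manufacturer default",
    exclusion_criteria="files with mean HDOP > 1.5 or fewer than 8 satellites",
)

with open("session_metadata.json", "w") as fh:
    json.dump(asdict(meta), fh, indent=2)
```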

Journal ArticleDOI
TL;DR: An R package, metaX, is developed that is capable of end-to-end metabolomics data analysis through a set of interchangeable modules and provides several functions, such as peak picking and annotation, data quality assessment, missing value imputation, data normalization, univariate and multivariate statistics, power analysis and sample size estimation.
Abstract: Non-targeted metabolomics based on mass spectrometry enables high-throughput profiling of the metabolites in a biological sample. The large amount of data generated from mass spectrometry requires intensive computational processing for annotation of mass spectra and identification of metabolites. Computational analysis tools that are fully integrated with multiple functions and are easily operated by users who lack extensive knowledge in programing are needed in this research field. We herein developed an R package, metaX, that is capable of end-to-end metabolomics data analysis through a set of interchangeable modules. Specifically, metaX provides several functions, such as peak picking and annotation, data quality assessment, missing value imputation, data normalization, univariate and multivariate statistics, power analysis and sample size estimation, receiver operating characteristic analysis, biomarker selection, pathway annotation, correlation network analysis, and metabolite identification. In addition, metaX offers a web-based interface ( http://metax.genomics.cn ) for data quality assessment and normalization method evaluation, and it generates an HTML-based report with a visualized interface. The metaX utilities were demonstrated with a published metabolomics dataset on a large scale. The software is available for operation as either a web-based graphical user interface (GUI) or in the form of command line functions. The package and the example reports are available at http://metax.genomics.cn/ . The pipeline of metaX is platform-independent and is easy to use for analysis of metabolomics data generated from mass spectrometry.

Journal ArticleDOI
TL;DR: NewGene is software designed to eliminate many of the difficulties commonly involved in constructing large international relations data sets by providing a highly flexible platform on which users can construct datasets for international relations research using pre-loaded data or by incorporating their own data.
Abstract: This paper introduces a complete redesign of the popular EUGene software, called NewGene. Like EUGene, NewGene is software designed to eliminate many of the difficulties commonly involved in constructing large international relations data sets. NewGene is a stand-alone Microsoft Windows- and OS X-based program for the construction of annual, monthly, and daily data sets for the variety of decision-making units (e.g. countries, leaders, organizations, etc.) used in quantitative studies of international relations. It also provides users the ability to construct units of analysis ranging from monads (e.g. country-year), to dyads (e.g. country1-country2-year), to extra-dyadic observations called k-ads (e.g. country1-country2-…-countryk-year). NewGene’s purpose is to provide a highly flexible platform on which users can construct datasets for international relations research using pre-loaded data or by incorporating their own data.
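
The monad, dyad, and k-ad units of analysis described above can be illustrated with a few lines of Python; the snippet below generates those structures from a toy country-year list and is not NewGene's own code or pre-loaded data.

```python
# Illustration of the unit-of-analysis structures NewGene constructs
# (monads, directed dyads, and k-ads), built here with itertools from a toy
# country-year list; not NewGene's code or data.
from itertools import permutations, combinations

countries = ["USA", "CHN", "RUS"]   # toy example
years = [2016, 2017]

monads = [(c, y) for y in years for c in countries]                            # country-year
dyads  = [(a, b, y) for y in years for a, b in permutations(countries, 2)]     # directed dyad-year
k_ads  = [(combo, y) for y in years for combo in combinations(countries, 3)]   # k-ad-year (k = 3)

print(len(monads), len(dyads), len(k_ads))   # 6 monads, 12 directed dyads, 2 k-ads
```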

Journal ArticleDOI
TL;DR: Wordbank as mentioned in this paper is a structured database of parent-report data combined with a browsable web interface for exploring patterns of vocabulary growth at the level of both individual children and particular words.
Abstract: The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.


Journal ArticleDOI
01 Dec 2017-Nature
TL;DR: This work shows that by using simple local assembly rules that are modified and applied recursively throughout a hierarchical, multistage assembly process, a small and constant set of unique DNA strands can be used to create DNA origami arrays of increasing size and with arbitrary patterns.
Abstract: Self-assembled DNA nanostructures enable nanometre-precise patterning that can be used to create programmable molecular machines and arrays of functional materials. DNA origami is particularly versatile in this context because each DNA strand in the origami nanostructure occupies a unique position and can serve as a uniquely addressable pixel. However, the scale of such structures has been limited to about 0.05 square micrometres, hindering applications that demand a larger layout and integration with more conventional patterning methods. Hierarchical multistage assembly of simple sets of tiles can in principle overcome this limitation, but so far has not been sufficiently robust to enable successful implementation of larger structures using DNA origami tiles. Here we show that by using simple local assembly rules that are modified and applied recursively throughout a hierarchical, multistage assembly process, a small and constant set of unique DNA strands can be used to create DNA origami arrays of increasing size and with arbitrary patterns. We illustrate this method, which we term ‘fractal assembly’, by producing DNA origami arrays with sizes of up to 0.5 square micrometres and with up to 8,704 pixels, allowing us to render images such as the Mona Lisa and a rooster. We find that self-assembly of the tiles into arrays is unaffected by changes in surface patterns on the tiles, and that the yield of the fractal assembly process corresponds to about 0.95^(m − 1) for arrays containing m tiles. When used in conjunction with a software tool that we developed that converts an arbitrary pattern into DNA sequences and experimental protocols, our assembly method is readily accessible and will facilitate the construction of sophisticated materials and devices with sizes similar to that of a bacterium using DNA nanostructures.
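
The reported yield scaling of about 0.95^(m − 1) for an array of m tiles translates into a quick back-of-the-envelope estimate; the array sizes below are illustrative values, not figures quoted from the paper.

```python
# Back-of-the-envelope use of the reported yield scaling (~0.95**(m - 1) for an
# m-tile array); the array sizes are illustrative only.
for m in (4, 16, 64):
    print(m, round(0.95 ** (m - 1), 3))
# 4 tiles  -> ~0.857
# 16 tiles -> ~0.463
# 64 tiles -> ~0.039
```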

Journal ArticleDOI
TL;DR: A software defined space-air-ground integrated network architecture for supporting diverse vehicular services in a seamless, efficient, and cost-effective manner is proposed.
Abstract: This article proposes a software defined space-air-ground integrated network architecture for supporting diverse vehicular services in a seamless, efficient, and cost-effective manner. First, the motivations and challenges for integration of space-air-ground networks are reviewed. Second, a software defined network architecture with a layered structure is presented. To protect the legacy services in the satellite, aerial, and terrestrial segments, resources in each segment are sliced through network slicing to achieve service isolation. Then available resources are put into a common and dynamic space-air-ground resource pool, which is managed by hierarchical controllers to accommodate vehicular services. Finally, a case study is carried out, followed by discussion on some open research topics.

Proceedings ArticleDOI
21 Aug 2017
TL;DR: Themis as discussed by the authors is a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior, and generates efficient test suites to measure discrimination.
Abstract: This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.
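
The causal notion of discrimination that Themis measures, whether changing only a protected input attribute changes the software's output, can be sketched generically as below. This is not the Themis implementation (which adds schema-driven test generation, pruning, and adaptive sampling); the input schema and the decision function are toy placeholders.

```python
# Generic sketch of causal-discrimination measurement in fairness testing:
# sample random inputs from a schema, flip only the protected attribute, and
# count how often the system under test changes its decision. Not the Themis
# implementation; schema and decision function are toy placeholders.
import random

SCHEMA = {
    "age": list(range(18, 80)),
    "income": list(range(10_000, 200_000, 5_000)),
    "gender": ["female", "male"],      # protected attribute
}

def decision(applicant: dict) -> bool:
    """Toy system under test: a (deliberately biased) loan approval rule."""
    threshold = 60_000 if applicant["gender"] == "female" else 50_000
    return applicant["income"] >= threshold

def causal_discrimination(protected: str, n_tests: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    flipped = 0
    for _ in range(n_tests):
        applicant = {k: rng.choice(v) for k, v in SCHEMA.items()}
        original = decision(applicant)
        for alt in SCHEMA[protected]:
            if alt != applicant[protected] and decision({**applicant, protected: alt}) != original:
                flipped += 1
                break
    return flipped / n_tests

print(f"causal discrimination w.r.t. gender: {causal_discrimination('gender'):.3f}")
```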

Journal ArticleDOI
TL;DR: NMRbox is a shared resource for NMR software and computation that employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud.

Journal ArticleDOI
TL;DR: In this article, the authors present a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.
Abstract: Author summary Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for reproducibility, but researchers new to computing often don't know where to start. This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide variety of published sources from our daily lives and from our work with volunteer organizations that have delivered workshops to over 11,000 people since 2010.

Proceedings ArticleDOI
25 Jul 2017
TL;DR: This paper proposes a framework called Defect Prediction via Convolutional Neural Network (DP-CNN), which leverages deep learning for effective feature generation and evaluates the method on seven open source projects in terms of F-measure in defect prediction.
Abstract: To improve software reliability, software defect prediction is utilized to assist developers in finding potential bugs and allocating their testing efforts. Traditional defect prediction studies mainly focus on designing hand-crafted features, which are input into machine learning classifiers to identify defective code. However, these hand-crafted features often fail to capture the semantic and structural information of programs. Such information is important in modeling program functionality and can lead to more accurate defect prediction. In this paper, we propose a framework called Defect Prediction via Convolutional Neural Network (DP-CNN), which leverages deep learning for effective feature generation. Specifically, based on the programs' Abstract Syntax Trees (ASTs), we first extract token vectors, which are then encoded as numerical vectors via mapping and word embedding. We feed the numerical vectors into a Convolutional Neural Network to automatically learn semantic and structural features of programs. After that, we combine the learned features with traditional hand-crafted features for accurate software defect prediction. We evaluate our method on seven open source projects in terms of F-measure in defect prediction. The experimental results show that, on average, DP-CNN improves the state-of-the-art method by 12%.
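
The pipeline described above (embed AST token sequences, learn features with a CNN, then concatenate them with hand-crafted metrics for classification) can be sketched compactly in PyTorch. The vocabulary size, sequence length, layer sizes, and feature dimensions below are illustrative choices rather than the paper's configuration.

```python
# Compact PyTorch sketch of the described architecture: embed AST token
# sequences, learn features with a 1-D convolution + pooling, concatenate them
# with traditional hand-crafted metrics, and output a defect-proneness logit.
# All sizes are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class DefectCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=30, n_filters=10,
                 kernel_size=5, n_handcrafted=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.classifier = nn.Sequential(
            nn.Linear(n_filters + n_handcrafted, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, token_ids, handcrafted):
        x = self.embed(token_ids)                   # (batch, seq_len, embed_dim)
        x = self.conv(x.transpose(1, 2))            # (batch, n_filters, seq_len - k + 1)
        x = self.pool(torch.relu(x)).squeeze(-1)    # (batch, n_filters)
        x = torch.cat([x, handcrafted], dim=1)      # fuse CNN and hand-crafted features
        return self.classifier(x).squeeze(-1)       # defect-proneness logit

# Toy forward pass with random data (hypothetical shapes).
model = DefectCNN()
logits = model(torch.randint(1, 5000, (8, 200)), torch.randn(8, 20))
print(logits.shape)   # torch.Size([8])
```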

Journal ArticleDOI
TL;DR: An extensive review of the many different works in the field of software vulnerability analysis and discovery that utilize machine-learning and data-mining techniques is provided, discussing both advantages and shortcomings in this domain.
Abstract: Software security vulnerabilities are one of the critical issues in the realm of computer security. Due to their potential high severity impacts, many different approaches have been proposed in the past decades to mitigate the damages of software vulnerabilities. Machine-learning and data-mining techniques are also among the many approaches to address this issue. In this article, we provide an extensive review of the many different works in the field of software vulnerability analysis and discovery that utilize machine-learning and data-mining techniques. We review different categories of works in this domain, discuss both advantages and shortcomings, and point out challenges and some uncharted territories in the field.

Posted Content
TL;DR: From a light scan of literature, it is demonstrated that there is considerable scope to infuse more results from the social and behavioural sciences into explainable AI, and some key results from these fields that are relevant to explainableAI are presented.
Abstract: In his seminal book `The Inmates are Running the Asylum: Why High-Tech Products Drive Us Crazy And How To Restore The Sanity' [2004, Sams Indianapolis, IN, USA], Alan Cooper argues that a major reason why software is often poorly designed (from a user perspective) is that programmers are in charge of design decisions, rather than interaction designers. As a result, programmers design software for themselves, rather than for their target audience, a phenomenon he refers to as the `inmates running the asylum'. This paper argues that explainable AI risks a similar fate. While the re-emergence of explainable AI is positive, this paper argues most of us as AI researchers are building explanatory agents for ourselves, rather than for the intended users. But explainable AI is more likely to succeed if researchers and practitioners understand, adopt, implement, and improve models from the vast and valuable bodies of research in philosophy, psychology, and cognitive science, and if evaluation of these models is focused more on people than on technology. From a light scan of literature, we demonstrate that there is considerable scope to infuse more results from the social and behavioural sciences into explainable AI, and present some key results from these fields that are relevant to explainable AI.

Journal ArticleDOI
TL;DR: OpenMEE (Open Meta-analyst for Ecology and Evolution) is a cross-platform, easy-to-use graphical user interface that gives E&E researchers access to the diverse and advanced statistical functionalities offered in R, without requiring knowledge of R programming.
Abstract: Meta-analysis and meta-regression are statistical methods for synthesizing and modelling the results of different studies, and are critical research synthesis tools in ecology and evolutionary biology (E&E). However, many E&E researchers carry out meta-analyses using software that is limited in its statistical functionality and is not easily updatable. It is likely that these software limitations have slowed the uptake of new methods in E&E and limited the scope and quality of inferences from research syntheses. We developed OpenMEE: Open Meta-analyst for Ecology and Evolution to address the need for advanced, easy-to-use software for meta-analysis and meta-regression. OpenMEE has a cross-platform, easy-to-use graphical user interface (GUI) that gives E&E researchers access to the diverse and advanced statistical functionalities offered in R, without requiring knowledge of R programming. OpenMEE offers a suite of advanced meta-analysis and meta-regression methods for synthesizing continuous and categorical data, including meta-regression with multiple covariates and their interactions, phylogenetic analyses, and simple missing data imputation. OpenMEE also supports data importing and exporting, exploratory data analysis, graphing of data, and summary table generation. As intuitive, open-source, free software for advanced methods in meta-analysis, OpenMEE meets the current and pressing needs of the E&E community for teaching meta-analysis and conducting high-quality syntheses. Because OpenMEE's statistical components are written in R, new methods and packages can be rapidly incorporated into the software. To fully realize the potential of OpenMEE, we encourage community development with an aim to advance the capabilities of meta-analyses in E&E.

Journal ArticleDOI
TL;DR: The essential algorithmic aspects of the structure from motion and image dense matching problems are discussed from the implementation and the user’s viewpoints.
Abstract: The publication familiarizes the reader with MicMac, a free, open-source photogrammetric software suite for 3D reconstruction. A brief history of the tool, its organisation, and its unique features vis-a-vis other software tools are highlighted. The essential algorithmic aspects of the structure-from-motion and image dense matching problems are discussed from the implementation and the user's viewpoints.