
Showing papers on "Software" published in 2018


Journal ArticleDOI
TL;DR: The Molecular Evolutionary Genetics Analysis (MEGA) software implements many analytical methods and tools for phylogenomics and phylomedicine and has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses.
Abstract: The Molecular Evolutionary Genetics Analysis (MEGA) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of MEGA to enable cross-platform use on Microsoft Windows and Linux operating systems. MEGA X does not require virtualization or emulation software and provides a uniform user experience across platforms. MEGA X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. MEGA X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.

21,952 citations


Journal ArticleDOI
TL;DR: This article highlights specific advances in ChimeraX in the areas of visualization and usability, performance, and extensibility.
Abstract: UCSF ChimeraX is next-generation software for the visualization and analysis of molecular structures, density maps, 3D microscopy, and associated data. It addresses challenges in the size, scope, and disparate types of data attendant with cutting-edge experimental methods, while providing advanced options for high-quality rendering (interactive ambient occlusion, reliable molecular surface calculations, etc.) and professional approaches to software design and distribution. This article highlights some specific advances in the areas of visualization and usability, performance, and extensibility. ChimeraX is free for noncommercial use and is available from http://www.rbvi.ucsf.edu/chimerax/ for Windows, Mac, and Linux.

2,866 citations


Journal ArticleDOI
TL;DR: The basic principles of radiometric geochronology as implemented in a new software package called IsoplotR, which was designed to be free, flexible and future-proof, are reviewed.
Abstract: This paper reviews the basic principles of radiometric geochronology as implemented in a new software package called IsoplotR, which was designed to be free, flexible and future-proof. IsoplotR is free because it is written in non-proprietary languages (R, JavaScript and HTML) and is released under the GPL license. The program is flexible because its graphical user interface (GUI) is separated from the command line functionality, and because its code is completely open for inspection and modification. To increase future-proofness, the software is built on free and platform-independent foundations that adhere to international standards, have existed for several decades, and continue to grow in popularity. IsoplotR currently includes functions for U-Pb, Pb-Pb, 40Ar/39Ar, Rb-Sr, Sm-Nd, Lu-Hf, Re-Os, U-Th-He, fission track and U-series disequilibrium dating. It implements isochron regression in two and three dimensions, visualises multi-aliquot datasets as cumulative age distributions, kernel density estimates and radial plots, and calculates weighted mean ages using a modified Chauvenet outlier detection criterion that accounts for the analytical uncertainties in heteroscedastic datasets. Overdispersion of geochronological data with respect to these analytical uncertainties can be attributed to either a proportional underestimation of the analytical uncertainties, or to an additive geological scatter term. IsoplotR keeps track of error correlations of the isotopic ratio measurements within aliquots of the same samples. It uses a statistical framework that will allow it to handle error correlations between aliquots in the future. Other ongoing developments include the implementation of alternative user interfaces and the integration of IsoplotR with other data reduction software.

1,320 citations
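
A note on the weighted-mean machinery mentioned above: IsoplotR's modified Chauvenet criterion is not reproduced here, but the underlying inverse-variance weighted mean of ages t_i with analytical uncertainties σ_i takes the standard textbook form, together with the MSWD statistic commonly used to diagnose the overdispersion the abstract discusses:

```latex
\bar{t} = \frac{\sum_{i=1}^{n} t_i/\sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2},
\qquad
\sigma_{\bar{t}} = \Bigl(\sum_{i=1}^{n} 1/\sigma_i^2\Bigr)^{-1/2},
\qquad
\mathrm{MSWD} = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(t_i-\bar{t})^2}{\sigma_i^2}
```

An MSWD well above 1 signals overdispersion relative to the stated analytical uncertainties, which the abstract says IsoplotR attributes either to proportionally underestimated uncertainties or to an additive geological scatter term.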


Proceedings ArticleDOI
30 Mar 2018
TL;DR: In this article, a new open source platform for end-to-end speech processing named ESPnet is introduced, which mainly focuses on automatic speech recognition (ASR), and adopts widely used dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine.
Abstract: This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as its main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains the overall architecture of the software platform, several important functionalities that differentiate ESPnet from other open-source ASR toolkits, and experimental results on major ASR benchmarks.

806 citations


Journal ArticleDOI
07 Mar 2018-eLife
TL;DR: New open-source software called cisTEM (computational imaging system for transmission electron microscopy) for the processing of data for high-resolution electron cryo-microscopy and single-particle averaging is developed, optimized to enable processing of typical datasets on a high-end, CPU-based workstation in half a day or less, comparable to GPU-accelerated processing.
Abstract: We have developed new open-source software called cisTEM (computational imaging system for transmission electron microscopy) for the processing of data for high-resolution electron cryo-microscopy and single-particle averaging. cisTEM features a graphical user interface that is used to submit jobs, monitor their progress, and display results. It implements a full processing pipeline including movie processing, image defocus determination, automatic particle picking, 2D classification, ab-initio 3D map generation from random parameters, 3D classification, and high-resolution refinement and reconstruction. Some of these steps implement newly-developed algorithms; others were adapted from previously published algorithms. The software is optimized to enable processing of typical datasets (2000 micrographs, 200k–300k particles) on a high-end, CPU-based workstation in half a day or less, comparable to GPU-accelerated processing. Jobs can also be scheduled on large computer clusters using flexible run profiles that can be adapted for most computing environments. cisTEM is available for download from cistem.org.

746 citations


Journal ArticleDOI
TL;DR: The program includes an improved CRISPR array detection tool facilitating expert validation based on a rating system, prediction ofCRISPR orientation and a Cas protein detection and typing tool updated to match the latest classification scheme of these systems.
Abstract: CRISPR (clustered regularly interspaced short palindromic repeats) arrays and their associated (Cas) proteins confer bacteria and archaea adaptive immunity against exogenous mobile genetic elements, such as phages or plasmids. CRISPRCasFinder allows the identification of both CRISPR arrays and Cas proteins. The program includes: (i) an improved CRISPR array detection tool facilitating expert validation based on a rating system, (ii) prediction of CRISPR orientation and (iii) a Cas protein detection and typing tool updated to match the latest classification scheme of these systems. CRISPRCasFinder can either be used online or as a standalone tool compatible with Linux operating system. All third-party software packages employed by the program are freely available. CRISPRCasFinder is available at https://crisprcas.i2bc.paris-saclay.fr.

740 citations


Journal ArticleDOI
TL;DR: A new X-ray diffraction data-analysis package is presented with a description of the algorithms and examples of its application to biological and chemical crystallography.
Abstract: The DIALS project is a collaboration between Diamond Light Source, Lawrence Berkeley National Laboratory and CCP4 to develop a new software suite for the analysis of crystallographic X-ray diffraction data, initially encompassing spot finding, indexing, refinement and integration. The design, core algorithms and structure of the software are introduced, alongside results from the analysis of data from biological and chemical crystallography experiments.

733 citations


Journal ArticleDOI
TL;DR: An overview of the MSM field to date is presented, presented for a general audience as a timeline of key developments in the field, and the current frontiers of methods development are highlighted, as well as exciting applications in experimental design and drug discovery.
Abstract: Markov state models (MSMs) are a powerful framework for analyzing dynamical systems, such as molecular dynamics (MD) simulations, that have gained widespread use over the past several decades. This perspective offers an overview of the MSM field to date, presented for a general audience as a timeline of key developments in the field. We sequentially address early studies that motivated the method, canonical papers that established the use of MSMs for MD analysis, and subsequent advances in software and analysis protocols. The derivation of a variational principle for MSMs in 2013 signified a turning point from expertise-driven MSM building to a systematic, objective protocol. The variational approach, combined with best practices for model selection and open-source software, enabled a wide range of MSM analysis for applications such as protein folding and allostery, ligand binding, and protein–protein association. To conclude, the current frontiers of methods development are highlighted, as well as exciting applications in experimental design and drug discovery.

555 citations
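
For readers new to MSMs, the core objects are compact. As a hedged textbook sketch (not any specific package's estimator): the maximum-likelihood transition matrix at lag time τ comes from transition counts c_ij(τ) between states, powers of it propagate state populations, and its eigenvalues yield implied timescales:

```latex
T_{ij}(\tau) = \frac{c_{ij}(\tau)}{\sum_{k} c_{ik}(\tau)},
\qquad
\mathbf{p}^{\top}(t+n\tau) = \mathbf{p}^{\top}(t)\,T(\tau)^{n},
\qquad
t_k = -\frac{\tau}{\ln \lambda_k(\tau)}
```

The variational principle mentioned in the abstract supplies the objective criterion for choosing state decompositions that best resolve the slow eigenvalues λ_k.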


Journal ArticleDOI
TL;DR: The Adaptive Poisson-Boltzmann Solver (APBS) software was developed to solve the equations of continuum electrostatics for large biomolecular assemblages and has had impact in the study of a broad range of chemical, biological and biomedical applications.
Abstract: The Adaptive Poisson-Boltzmann Solver (APBS) software was developed to solve the equations of continuum electrostatics for large biomolecular assemblages, work that has had impact in the study of a broad range of chemical, biological, and biomedical applications. APBS addresses the three key technology challenges for understanding solvation and electrostatics in biomedical applications: accurate and efficient models for biomolecular solvation and electrostatics, robust and scalable software for applying those theories to biomolecular systems, and mechanisms for sharing and analyzing biomolecular electrostatics data in the scientific community. To address new research applications and keep pace with advancing computational capabilities, we have continually updated APBS and its suite of accompanying software since its release in 2001. In this article, we discuss the models and capabilities that have recently been implemented within the APBS software package, including a Poisson-Boltzmann analytical and a semi-analytical solver, an optimized boundary element solver, a geometry-based geometric flow solvation model, a graph theory-based algorithm for determining pKa values, and an improved web-based visualization tool for viewing electrostatics.

541 citations
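
The central equation these solvers discretize is worth stating. In one common dimensionless form (unit and coefficient conventions vary between references, so treat this as a sketch rather than APBS's exact formulation), the nonlinear Poisson-Boltzmann equation for the electrostatic potential φ reads:

```latex
-\nabla \cdot \bigl[\epsilon(\mathbf{r})\,\nabla\phi(\mathbf{r})\bigr]
+ \bar{\kappa}^{2}(\mathbf{r})\,\sinh\phi(\mathbf{r})
= \rho_f(\mathbf{r})
```

Here ε is the position-dependent dielectric coefficient, κ̄² the ion-accessibility-modified screening coefficient, and ρ_f the fixed biomolecular charge density; linearizing sinh φ ≈ φ gives the linearized PBE that many fast solvers target.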


Proceedings ArticleDOI
28 May 2018
TL;DR: DeepCom applies Natural Language Processing (NLP) techniques to learn from a large code corpus and generates comments for Java methods from the learned features.
Abstract: During software maintenance, code comments help developers comprehend programs and reduce additional time spent on reading and navigating source code. Unfortunately, these comments are often mismatched, missing or outdated in the software projects. Developers have to infer the functionality from the source code. This paper proposes a new approach named DeepCom to automatically generate code comments for Java methods. The generated comments aim to help developers understand the functionality of Java methods. DeepCom applies Natural Language Processing (NLP) techniques to learn from a large code corpus and generates comments from learned features. We use a deep neural network that analyzes structural information of Java methods for better comment generation. We conduct experiments on a large-scale Java corpus built from 9,714 open source projects from GitHub. We evaluate the experimental results on a machine translation metric. Experimental results demonstrate that our method DeepCom outperforms the state-of-the-art by a substantial margin.

541 citations
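
DeepCom's actual architecture (which encodes structural information from Java ASTs) is not reproduced here, but the abstract describes the general class of neural encoder-decoder models over token sequences. A minimal, hedged PyTorch sketch of that model class, with hypothetical names and sizes:

```python
# Generic encoder-decoder sketch of the *class* of model DeepCom belongs to;
# DeepCom's actual AST-based architecture differs. Names/sizes are made up.
import torch
import torch.nn as nn

class Seq2SeqCommenter(nn.Module):  # hypothetical name
    def __init__(self, code_vocab, text_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(code_vocab, dim)
        self.tgt_emb = nn.Embedding(text_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, text_vocab)

    def forward(self, code_tokens, comment_tokens):
        _, h = self.encoder(self.src_emb(code_tokens))          # encode code
        dec, _ = self.decoder(self.tgt_emb(comment_tokens), h)  # teacher forcing
        return self.out(dec)                                    # per-step logits

model = Seq2SeqCommenter(code_vocab=30000, text_vocab=20000)
code = torch.randint(0, 30000, (2, 50))     # batch of tokenized Java methods
comment = torch.randint(0, 20000, (2, 12))  # batch of target comments
logits = model(code, comment)               # shape (2, 12, 20000)
```

Training would minimize cross-entropy between the logits and the reference comment tokens; DeepCom's contribution lies in what it feeds the encoder, not in this generic skeleton.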


Journal ArticleDOI
01 Sep 2018-Test
TL;DR: This comparison will consider the implementations of global Moran's I, Getis–Ord G and Geary's C, and local $I_i$ and $G_i$, available in a range of software including CrimeStat, GeoDa, ArcGIS, PySAL and R contributed packages.
Abstract: Functions to calculate measures of spatial association, especially measures of spatial autocorrelation, have been made available in many software applications. Measures may be global, applying to the whole data set under consideration, or local, applying to each observation in the data set. Methods of statistical inference may also be provided, but these will, like the measures themselves, depend on the support of the observations, chosen assumptions, and the way in which spatial association is represented; spatial weights are often used as a representational technique. In addition, assumptions may be made about the underlying mean model, and about error distributions. Different software implementations may choose to expose these choices to the analyst, but the sets of choices available may vary between these implementations, as may default settings. This comparison will consider the implementations of global Moran's I, Getis–Ord G and Geary's C, and local $I_i$ and $G_i$, available in a range of software including CrimeStat, GeoDa, ArcGIS, PySAL and R contributed packages.
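
For reference alongside the comparison, the global measures have standard definitions (textbook forms; individual packages differ in the scaling of the weights w_ij and in inference, which is the paper's point):

```latex
I = \frac{n}{\sum_i\sum_j w_{ij}}
\cdot
\frac{\sum_i\sum_j w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2},
\qquad
C = \frac{n-1}{2\sum_i\sum_j w_{ij}}
\cdot
\frac{\sum_i\sum_j w_{ij}\,(x_i-x_j)^2}{\sum_i (x_i-\bar{x})^2}
```

Under the null of no spatial autocorrelation, Moran's I has expectation -1/(n-1) and Geary's C expectation 1; as the abstract notes, the sampling distributions used for inference depend on the chosen weights, assumptions and mean model, which is where implementations diverge.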

Journal ArticleDOI
TL;DR: Owing to its good performance, convenience, number of descriptors, and a lax licensing constraint, Mordred is a promising choice of molecular descriptor calculation software that can be utilized for cheminformatics studies, such as those on quantitative structure–property relationships.
Abstract: Molecular descriptors are widely employed to present molecular characteristics in cheminformatics. Various molecular-descriptor-calculation software programs have been developed. However, users of those programs must contend with several issues, including software bugs, insufficient update frequencies, and software licensing constraints. To address these issues, we propose Mordred, a newly developed descriptor-calculation software application that can calculate more than 1800 two- and three-dimensional descriptors. It is freely available via GitHub. Mordred can be easily installed and used in the command line interface, as a web application, or as a high-flexibility Python package on all major platforms (Windows, Linux, and macOS). Performance benchmark results show that Mordred is at least twice as fast as the well-known PaDEL-Descriptor, and it can calculate descriptors for large molecules, which cannot be accomplished by other software. Owing to its good performance, convenience, number of descriptors, and a lax licensing constraint, Mordred is a promising choice of molecular descriptor calculation software that can be utilized for cheminformatics studies, such as those on quantitative structure–property relationships.
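
A minimal usage sketch based on Mordred's documented Python API at the time of publication (a Calculator over the descriptors module, with RDKit supplying molecules); details may differ across versions, so treat the exact calls as approximate:

```python
# Minimal Mordred sketch; API per its documentation around release time,
# so treat the exact calls as approximate rather than authoritative.
from rdkit import Chem
from mordred import Calculator, descriptors

calc = Calculator(descriptors, ignore_3D=True)  # register the 2D descriptors
mol = Chem.MolFromSmiles("c1ccccc1O")           # phenol, as a small example
result = calc(mol)                              # compute all registered descriptors

print(len(calc.descriptors), "descriptors registered")
print(result.asdict()["SLogP"])                 # one descriptor value by name
```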

Proceedings Article
09 Apr 2018
TL;DR: The design of AccelNet is presented, including the hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.
Abstract: Modern cloud architectures rely on each server running its own networking stack to implement policies such as tunneling for virtual networks, security, and load balancing. However, these networking stacks are becoming increasingly complex as features are added and as network speeds increase. Running these stacks on CPU cores takes away processing power from VMs, increasing the cost of running cloud services, and adding latency and variability to network performance. We present Azure Accelerated Networking (AccelNet), our solution for offloading host networking to hardware, using custom Azure SmartNICs based on FPGAs. We define the goals of AccelNet, including programmability comparable to software, and performance and efficiency comparable to hardware. We show that FPGAs are the best current platform for offloading our networking stack as ASICs do not provide sufficient programmability, and embedded CPU cores do not provide scalable performance, especially on single network flows. Azure SmartNICs implementing AccelNet have been deployed on all new Azure servers since late 2015 in a fleet of >1M hosts. The AccelNet service has been available for Azure customers since 2016, providing consistent <15µs VM-VM TCP latencies and 32Gbps throughput, which we believe represents the fastest network available to customers in the public cloud. We present the design of AccelNet, including our hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.

Journal ArticleDOI
TL;DR: A web-based, user-friendly software application, StructureSelector, is developed to calculate the four appealing alternative statistics together with the commonly used Ln Pr(X|K) and ΔK statistics.
Abstract: Inferences of population genetic structure are of great importance to the fields of ecology and evolutionary biology. The program STRUCTURE has been widely used to infer population genetic structure. However, previous studies demonstrated that uneven sampling often leads to wrong inferences on hierarchical structure. The most widely used ΔK method tends to identify the uppermost hierarchy of population structure. Recently, four alternative statistics (MedMedK, MedMeaK, MaxMedK and MaxMeaK) were proposed, which appear to be more accurate than the previously used methods for both even and uneven sampling data. However, the lack of easy-to-use software limits the use of these appealing new estimators. Here, we developed a web-based, user-friendly software application, StructureSelector, to calculate the four appealing alternative statistics together with the commonly used Ln Pr(X|K) and ΔK statistics. StructureSelector accepts the result files of STRUCTURE, ADMIXTURE or fastStructure as input files. It reports the "best" K for each estimator, and the results are available as HTML or tab-separated tables. The program can also generate graphical representations for specific K, which can be easily downloaded from the server. The software is freely available at http://lmme.qdio.ac.cn/StructureSelector/.
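
For context, the widely used ΔK statistic the abstract refers to is, in Evanno et al.'s formulation (stated here from memory, so verify against the original), the mean absolute second difference of the log-likelihood L(K) = Ln Pr(X|K), normalized by its spread across replicate runs at each K:

```latex
\Delta K = \frac{\mathrm{mean}\bigl(\,\lvert L''(K)\rvert\,\bigr)}{\mathrm{sd}\bigl(L(K)\bigr)},
\qquad
L''(K) = L(K+1) - 2L(K) + L(K-1)
```

Its tendency to pick out the uppermost hierarchical level, noted in the abstract, is what motivated the four alternative statistics the software adds.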

Proceedings ArticleDOI
15 Oct 2018
TL;DR: In this paper, the authors surveyed the recent research literature, assessed the experimental evaluations carried out by 32 fuzzing papers, and found problems in every evaluation they considered, concluding that the general problems in existing experimental evaluations can indeed translate to actual wrong or misleading assessments.
Abstract: Fuzz testing has enjoyed great success at discovering security critical bugs in real software. Recently, researchers have devoted significant effort to devising new fuzzing techniques, strategies, and algorithms. Such new ideas are primarily evaluated experimentally so an important question is: What experimental setup is needed to produce trustworthy results? We surveyed the recent research literature and assessed the experimental evaluations carried out by 32 fuzzing papers. We found problems in every evaluation we considered. We then performed our own extensive experimental evaluation using an existing fuzzer. Our results showed that the general problems we found in existing experimental evaluations can indeed translate to actual wrong or misleading assessments. We conclude with some guidelines that we hope will help improve experimental evaluations of fuzz testing algorithms, making reported results more robust.
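
Among the guidelines the paper argues for are multiple trials and statistical comparison instead of single runs. A hedged sketch of that practice with synthetic numbers (the Mann-Whitney U test shown here is one common choice for such comparisons, not necessarily the paper's exact prescription):

```python
# Illustration of one evaluation guideline: compare fuzzers over repeated
# trials with a statistical test, never a single run. Numbers are synthetic.
from scipy.stats import mannwhitneyu

# unique crashes found in 10 independent 24h trials per fuzzer (made up)
fuzzer_a = [12, 15, 9, 14, 13, 11, 16, 12, 10, 14]
fuzzer_b = [10, 11, 8, 12, 9, 10, 13, 9, 11, 10]

stat, p = mannwhitneyu(fuzzer_a, fuzzer_b, alternative="two-sided")
print(f"U={stat}, p={p:.4f}")  # small p: the gap is unlikely to be noise
```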

Journal Article
TL;DR: This instalment of the Software Technology department discusses how the digital transformation is affecting software technology and the software industry.
Abstract: This instalment of the Software Technology department discusses how the digital transformation is affecting software technology and the software industry.

Journal ArticleDOI
TL;DR: The basic functionality of PyPSA is described, including the formulation of the full power flow equations and the multi-period optimisation of operation and investment with linear power flow equations.
Abstract: Python for Power System Analysis (PyPSA) is a free software toolbox for simulating and optimising modern electrical power systems over multiple periods. PyPSA includes models for conventional generators with unit commitment, variable renewable generation, storage units, coupling to other energy sectors, and mixed alternating and direct current networks. It is designed to be easily extensible and to scale well with large networks and long time series. In this paper the basic functionality of PyPSA is described, including the formulation of the full power flow equations and the multi-period optimisation of operation and investment with linear power flow equations. PyPSA is positioned in the existing free software landscape as a bridge between traditional power flow analysis tools for steady-state analysis and full multi-period energy system models. The functionality is demonstrated on two open datasets of the transmission system in Germany (based on SciGRID) and Europe (based on GridKit). Funding statement: This research was conducted as part of the CoNDyNet project, which is supported by the German Federal Ministry of Education and Research under grant no. 03SF0472C. The responsibility for the contents lies solely with the authors.
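
A minimal sketch in PyPSA's documented component style (Network/add/lopf, as in the releases contemporary with the paper); the component names and values below are illustrative only:

```python
# Minimal PyPSA sketch; component names and parameter values are made up,
# and the API follows the documentation of release-era versions.
import pypsa

n = pypsa.Network()
n.add("Bus", "bus0")
n.add("Generator", "gen0", bus="bus0", p_nom=100.0, marginal_cost=50.0)
n.add("Load", "load0", bus="bus0", p_set=80.0)

n.lopf()                 # linear optimal power flow on the default snapshot
print(n.generators_t.p)  # dispatch of gen0 (should meet the 80 MW load)
```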

Proceedings ArticleDOI
27 May 2018
TL;DR: This paper surveys program repair techniques, whose key idea is to automatically repair software systems by producing an actual fix that can be validated by testers before it is finally accepted, or adapted to properly fit the system.
Abstract: Despite their growing complexity and increasing size, modern software applications must satisfy strict release requirements that impose short bug fixing and maintenance cycles, putting significant pressure on developers who are responsible for producing high-quality software in a timely manner. To reduce developers' workload, repairing and healing techniques have been extensively investigated as solutions for efficiently repairing and maintaining software in the last few years. In particular, repairing solutions have been able to automatically produce useful fixes for several classes of bugs that might be present in software programs. A range of algorithms, techniques, and heuristics have been integrated, experimented with, and studied, producing a heterogeneous and articulated research framework where automatic repair techniques are proliferating. This paper organizes the knowledge in the area by surveying a body of 108 papers about automatic software repair techniques, illustrating the algorithms and the approaches, comparing them on representative examples, and discussing the open challenges and the empirical evidence reported so far.

Proceedings ArticleDOI
16 Nov 2018
TL;DR: This paper proposes a mutation testing framework specialized for DL systems to measure the quality of test data, and designs a set of model-level mutation operators that directly inject faults into DL models without a training process.
Abstract: Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance to gain confidence in the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, in the same spirit as mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults into the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data can be evaluated by analyzing to what extent the injected faults are detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models.
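
To make the model-level idea concrete: one operator family in this spirit perturbs trained weights directly, skipping retraining. A hedged numpy sketch of such a Gaussian-fuzzing-style operator (parameters and naming are mine, not the paper's):

```python
# Sketch of a model-level mutation operator in the spirit of weight-level
# Gaussian fuzzing: perturb a fraction of trained weights, then check whether
# the test set "kills" the mutant. Not the authors' code; rate/scale made up.
import numpy as np

def gaussian_fuzz(weights, rate=0.01, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mutated = weights.copy()
    mask = rng.random(weights.shape) < rate   # select ~1% of the weights
    noise = rng.normal(0.0, scale, size=weights.shape)
    mutated[mask] += noise[mask]
    return mutated

w = np.random.default_rng(1).normal(size=(128, 10))  # stand-in trained layer
w_mut = gaussian_fuzz(w)
print("weights changed:", int((w != w_mut).sum()))
```

A mutant is "killed" when inputs the original model handles correctly are mishandled by the mutated model; the kill ratio over many mutants then scores the test data.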

Book ChapterDOI
02 Jul 2018
TL;DR: SiSEC 2018 as mentioned in this paper was focused on audio and pursued the effort towards scaling up and making it easier to prototype audio separation software in an era of machine-learning-based systems.
Abstract: This paper reports the organization and results for the 2018 community-based Signal Separation Evaluation Campaign (SiSEC 2018). This year's edition was focused on audio and pursued the effort towards scaling up and making it easier to prototype audio separation software in an era of machine-learning-based systems. For this purpose, we prepared a new music separation database: MUSDB18, featuring close to 10 h of audio. Additionally, open-source software was released to automatically load, process and report performance on MUSDB18. Furthermore, a new official Python version for the BSS Eval toolbox was released, along with reference implementations for three oracle separation methods: ideal binary mask, ideal ratio mask, and multichannel Wiener filter. We finally report the results obtained by the participants.
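
Of the three oracle methods released with the campaign, the ideal ratio mask has a particularly compact definition. A hedged numpy sketch on magnitude spectrograms (mask conventions vary, e.g. magnitude vs. power ratios, so this is illustrative rather than the campaign's reference code):

```python
# Sketch of an ideal ratio mask (IRM) oracle: given the true source
# spectrograms, weight the mixture so each source is approximately recovered.
# Magnitude-ratio convention shown; the SiSEC reference code may differ.
import numpy as np

rng = np.random.default_rng(0)
S1 = np.abs(rng.normal(size=(257, 100)))  # stand-in source 1 magnitudes
S2 = np.abs(rng.normal(size=(257, 100)))  # stand-in source 2 magnitudes
mix = S1 + S2                             # idealized additive mixture

irm1 = S1 / (S1 + S2 + 1e-12)             # per time-frequency-bin weight
est1 = irm1 * mix                         # estimate of source 1
print(float(np.abs(est1 - S1).max()))     # ~0 in this additive toy case
```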

Journal ArticleDOI
TL;DR: It is found that significant averaging over many lattice configurations with different random atomic displacements is required to prevent bias in atom counting from simulations, and a strategy is developed for determining the amount of required averaging, based on the estimated signal variance and the expected signal gain per atom in a column.

Book ChapterDOI
14 Dec 2018
TL;DR: In this paper, the authors propose to combine computer code and software documentation in the same document; the code and documentation are identified by different markers, and one can either compile the code and mix the results with the documentation, or extract the source code from the document.
Abstract: Reproducibility is the ultimate standard by which scientific findings are judged. From the computer science perspective, reproducible research is often related to literate programming [13], a paradigm conceived by Donald Knuth, whose basic idea is to combine computer code and software documentation in the same document; the code and documentation can be identified by different special markers. We can either compile the code and mix the results with documentation, or extract the source code from the document. To some extent, this implies reproducibility because everything is generated automatically from computer code, and the code can reflect all the details about computing.
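
The "extract the source code" half of the paradigm (tangling) is easy to illustrate. A hedged Python sketch using noweb-style chunk markers, one established convention for the "special markers" the chapter mentions (the chapter itself is not tied to this syntax):

```python
# Sketch of "tangling": extract code chunks from a literate document whose
# chunks follow the noweb convention (<<name>>= ... @).
import re

doc = """Prose explaining the program.

<<hello>>=
print("hello from the literate document")
@

More prose, then a second chunk.

<<arith>>=
x = 2 + 2
print(x)
@
"""

chunks = re.findall(r"<<[^>]*>>=\n(.*?)^@$", doc, flags=re.DOTALL | re.MULTILINE)
source = "".join(chunks)  # the "tangled" program
exec(source)              # runs the extracted code: prints the greeting and 4
```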

Proceedings ArticleDOI
11 Jul 2018
TL;DR: In this article, a deep feature representation learning based approach was proposed to detect vulnerabilities in C and C++ open-source code using machine learning techniques, and they evaluated their approach on code from real software packages and the NIST SATE IV benchmark dataset.
Abstract: Increasing numbers of software vulnerabilities are discovered every year whether they are reported publicly or discovered internally in proprietary code. These vulnerabilities can pose serious risk of exploit and result in system compromise, information leaks, or denial of service. We leveraged the wealth of C and C++ open-source code available to develop a large-scale function-level vulnerability detection system using machine learning. To supplement existing labeled vulnerability datasets, we compiled a vast dataset of millions of open-source functions and labeled it with carefully-selected findings from three different static analyzers that indicate potential exploits. Using these datasets, we developed a fast and scalable vulnerability detection tool based on deep feature representation learning that directly interprets lexed source code. We evaluated our tool on code from both real software packages and the NIST SATE IV benchmark dataset. Our results demonstrate that deep feature representation learning on source code is a promising approach for automated software vulnerability detection.
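
The phrase "directly interprets lexed source code" points at the front end such a system needs: C source becomes a token sequence, which becomes integer ids for an embedding layer. A hedged sketch of that step only (the paper's lexer and models are not reproduced):

```python
# Toy front end for a source-level vulnerability detector: lex C code into
# tokens and map them to integer ids a neural model would embed.
# The paper's actual lexer is more careful; this regex is illustrative.
import re

c_func = "int f(char *s) { return strlen(s) + 1; }"
tokens = re.findall(r"[A-Za-z_]\w*|\d+|==|!=|[^\sA-Za-z_]", c_func)
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]

print(tokens)
print(ids)  # integer sequence a neural classifier would embed and score
```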

Journal ArticleDOI
TL;DR: It is argued that this package facilitates the use of spiking networks for large-scale machine learning problems, and some simple examples of using BindsNET in practice are shown.
Abstract: The development of spiking neural network simulation software is a critical component enabling the modeling of neural systems and the development of biologically inspired algorithms. Existing software frameworks support a wide range of neural functionality, software abstraction levels, and hardware devices, yet are typically not suitable for rapid prototyping or application to problems in the domain of machine learning. In this paper, we describe a new Python package for the simulation of spiking neural networks, specifically geared toward machine learning and reinforcement learning. Our software, called BindsNET, enables rapid building and simulation of spiking networks and features user-friendly, concise syntax. BindsNET is built on the PyTorch deep neural networks library, facilitating the implementation of spiking neural networks on fast CPU and GPU computational platforms. Moreover, the BindsNET framework can be adjusted to utilize other existing computing and hardware backends; e.g., TensorFlow and SpiNNaker. We provide an interface with the OpenAI gym library, allowing for training and evaluation of spiking networks on reinforcement learning environments. We argue that this package facilitates the use of spiking networks for large-scale machine learning problems and show some simple examples by using BindsNET in practice.
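
A hedged sketch in the style of BindsNET's introductory examples (a Network with an Input layer, a LIF layer, and a Connection); module paths and argument names follow my reading of the documentation and may differ between versions:

```python
# Hedged BindsNET sketch modeled on its introductory examples; module paths
# and argument names may differ by version (e.g., some releases spell the
# run() input argument "inpts" rather than "inputs").
import torch
from bindsnet.network import Network
from bindsnet.network.nodes import Input, LIFNodes
from bindsnet.network.topology import Connection

net = Network()
net.add_layer(Input(n=100), name="X")    # spike-train input layer
net.add_layer(LIFNodes(n=50), name="Y")  # leaky integrate-and-fire neurons
net.add_connection(
    Connection(source=net.layers["X"], target=net.layers["Y"],
               w=0.05 * torch.rand(100, 50)),
    source="X", target="Y",
)

spikes = torch.bernoulli(0.1 * torch.ones(250, 100))  # 250 time steps of input
net.run(inputs={"X": spikes}, time=250)               # simulate the network
```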

Proceedings ArticleDOI
26 Oct 2018
TL;DR: PyDriller, a Python framework that eases the process of mining Git, is presented and compared against the state-of-the-art Python framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% less LOC and significantly lower complexity.
Abstract: Software repositories contain historical and valuable information about the overall development of software systems. Mining software repositories (MSR) is nowadays considered one of the most interesting growing fields within software engineering. MSR focuses on extracting and analyzing data available in software repositories to uncover interesting, useful, and actionable information about the system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repositories. In this paper, we present PyDriller, a Python framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% less LOC and significantly lower complexity. URL: https://github.com/ishepard/pydriller Materials: https://doi.org/10.5281/zenodo.1327363 Pre-print: https://doi.org/10.5281/zenodo.1327411
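
A hedged usage sketch against PyDriller's documented entry point at release time (RepositoryMining and traverse_commits; later versions renamed the class Repository):

```python
# Hedged PyDriller sketch; API per release-era documentation
# (newer versions renamed RepositoryMining to Repository).
from pydriller import RepositoryMining

repo_url = "https://github.com/ishepard/pydriller"  # the paper's own repo
for commit in RepositoryMining(repo_url).traverse_commits():
    print(commit.hash[:8], commit.author.name, "-", commit.msg[:60])
```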

Journal ArticleDOI
TL;DR: A meta’omic analysis environment and collection of individual software tools with the capacity to process raw shotgun sequencing data into actionable microbial community feature profiles, summary reports, and publication-ready figures.
Abstract: Summary bioBakery is a meta'omic analysis environment and collection of individual software tools with the capacity to process raw shotgun sequencing data into actionable microbial community feature profiles, summary reports, and publication-ready figures. It includes a collection of pre-configured analysis modules also joined into workflows for reproducibility. Availability and implementation bioBakery (http://huttenhower.sph.harvard.edu/biobakery) is publicly available for local installation as individual modules and as a virtual machine image. Each individual module has been developed to perform a particular task (e.g. quantitative taxonomic profiling or statistical analysis), and they are provided with source code, tutorials, demonstration data, and validation results; the bioBakery virtual image includes the entire suite of modules and their dependencies pre-installed. Images are available for both Amazon EC2 and Google Compute Engine. All software is open source under the MIT license. bioBakery is actively maintained with a support group at biobakery-users@googlegroups.com and new tools being added upon their release. Contact chuttenh@hsph.harvard.edu. Supplementary information Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: The author has implemented a free, web-based AHP online system with noteworthy features, allowing for the detailed analysis of decision problems, and his intention was to provide a complete and free software tool for educational and research purposes.
Abstract: The Analytic Hierarchy Process (AHP) remains a popular multi-criteria decision method. The author has implemented a free, web-based AHP online system with noteworthy features, allowing for the detailed analysis of decision problems. Besides standard functions, like flexible decision hierarchies, support to improve inconsistent judgments, and alternative evaluation and sensitivity analysis, the software can handle group input, calculate group consensus based on Shannon α- and β-entropy, and estimate weight uncertainties based on randomized small variations of input judgments. In addition, different AHP judgment scales can be applied a posteriori, and alternative evaluation can be done using the weighted sum model (WSM) or weighted product model (WPM). This flexibility opens up opportunities to study decision projects under various parameters. The author's intention was to provide a complete and free software tool for educational and research purposes where calculations and algorithms are well documented and all input data and results can be exported in an open format for further processing or presentation. The article describes the basic concept and structure of the software and the underlying mathematical algorithms and methods. Challenges and practical experiences during the implementation, validation and productive phase of the software are highlighted.
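
The "underlying mathematical algorithms" center on standard AHP machinery. In textbook form (the software's own documentation should be consulted for its exact variants), the priority weights are the principal eigenvector of the reciprocal pairwise comparison matrix A, and consistency is scored via the consistency index and ratio:

```latex
A\mathbf{w} = \lambda_{\max}\mathbf{w}, \quad a_{ji} = 1/a_{ij},
\qquad
\mathrm{CI} = \frac{\lambda_{\max} - n}{n-1},
\qquad
\mathrm{CR} = \frac{\mathrm{CI}}{\mathrm{RI}}
```

Here RI is the mean consistency index of random matrices of the same order; a consistency ratio CR below roughly 0.1 is conventionally taken as acceptable, which is the basis for the "support to improve inconsistent judgments" mentioned above.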

Journal ArticleDOI
TL;DR: NeuroMatic is an open-source software toolkit that performs data acquisition, data analysis and simulations of electrophysiological properties of the nervous system, and has the advantage of working within Igor Pro, a platform-independent environment that includes an extensive library of built-in functions.
Abstract: Acquisition, analysis and simulation of electrophysiological properties of the nervous system require multiple software packages. This makes it difficult to conserve experimental metadata and track the analysis performed. It also complicates certain experimental approaches such as online analysis. To address this, we developed NeuroMatic, an open-source software toolkit that performs data acquisition (episodic, continuous and triggered recordings), data analysis (spike rasters, spontaneous event detection, curve fitting, stationarity) and simulations (stochastic synaptic transmission, synaptic short-term plasticity, integrate-and-fire and Hodgkin-Huxley-like single-compartment models). The merging of a wide range of tools into a single package facilitates a more integrated style of research, from the development of online analysis functions during data acquisition, to the simulation of synaptic conductance trains during dynamic-clamp experiments. Moreover, NeuroMatic has the advantage of working within Igor Pro, a platform-independent environment that includes an extensive library of built-in functions, a history window for reviewing the user's workflow and the ability to produce publication-quality graphics. Since its original release, NeuroMatic has been used in a wide range of scientific studies and its user base has grown considerably. NeuroMatic version 3.0 can be found at http://www.neuromatic.thinkrandom.com and https://github.com/SilverLabUCL/NeuroMatic.

Posted Content
TL;DR: Slalom as mentioned in this paper is a framework that securely delegates execution of all linear layers in a DNN from a TEE to a faster, yet untrusted, co-located processor.
Abstract: As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations. A pragmatic solution comes from Trusted Execution Environments (TEEs), which use hardware and software protections to isolate sensitive computations from the untrusted software stack. However, these isolation guarantees come at a price in performance, compared to untrusted alternatives. This paper initiates the study of high performance execution of Deep Neural Networks (DNNs) in TEEs by efficiently partitioning DNN computations between trusted and untrusted devices. Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate Slalom by running DNNs in an Intel SGX enclave, which selectively delegates work to an untrusted GPU. For canonical DNNs (VGG16, MobileNet and ResNet variants) we obtain 6x to 20x increases in throughput for verifiable inference, and 4x to 11x for verifiable and private inference.
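
The verifiable-inference piece builds on an efficient check for outsourced matrix products: the TEE can verify C = AB with a few matrix-vector products rather than recomputing the full multiplication. A hedged numpy sketch of that Freivalds-style primitive (Slalom itself works over a finite field with quantized weights, which this toy float version omits):

```python
# Freivalds-style verification of an outsourced matrix product: checking
# C == A @ B costs a few O(n^2) mat-vec products instead of an O(n^3)
# recompute. Toy float version; Slalom's actual protocol is finite-field.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(256, 256))
B = rng.normal(size=(256, 256))
C = A @ B                                        # result claimed by untrusted GPU

r = rng.integers(0, 2, size=256).astype(float)   # random 0/1 probe vector
print("honest accepted:", np.allclose(A @ (B @ r), C @ r))

C_bad = C.copy()
C_bad[0, :] += 1.0                               # tamper with an entire row
print("tampered accepted:", np.allclose(A @ (B @ r), C_bad @ r))
# A wrong product is rejected with probability >= 1/2 per probe; repeating
# the probe drives the false-accept rate down exponentially.
```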

Journal ArticleDOI
TL;DR: shinyCircos, an R/Shiny application providing a graphical user interface for interactive creation of Circos plots, can be easily installed either on computers for personal use or on local or public servers to provide online use to the community.
Abstract: Summary Creation of Circos plots is one of the most efficient approaches to visualize genomic data. However, the installation and use of existing tools to make Circos plots are challenging for users lacking coding experience. To address this issue, we developed an R/Shiny application, shinyCircos, a graphical user interface for the interactive creation of Circos plots. shinyCircos can be easily installed either on computers for personal use or on local or public servers to provide online use to the community. Furthermore, various types of Circos plots can be easily generated and decorated with simple mouse clicks. Availability and implementation shinyCircos and its manual are freely available at https://github.com/venyao/shinyCircos. shinyCircos is deployed at https://yimingyu.shinyapps.io/shinycircos/ and http://shinycircos.ncpgr.cn/ for online use. Contact diana1983941@mail.hzau.edu.cn or yaowen@henau.edu.cn.