Showing papers on "Software published in 2014"

PDF

Open Access

Journal Article•DOI•

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

[...]

Heidelberg Institute for Theoretical Studies¹

01 May 2014-Bioinformatics

TL;DR: This work presents some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees.

...read moreread less

Abstract: Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting postanalyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU

...read moreread less

23,838 citations

Book•

A primer on partial least squares structural equation modeling (PLS-SEM)

[...]

Joseph F. Hair, G. Tomas M. Hult, Christian M. Ringle, Marko Sarstedt

01 Jan 2014

TL;DR: The Second Edition of this practical guide to partial least squares structural equation modeling is designed to be easily understood by those with limited statistical and mathematical training who want to pursue research opportunities in new ways.

...read moreread less

Abstract: With applications using SmartPLS (www.smartpls.com)—the primary software used in partial least squares structural equation modeling (PLS-SEM)—this practical guide provides concise instructions on how to use this evolving statistical technique to conduct research and obtain solutions. Featuring the latest research, new examples, and expanded discussions throughout, the Second Edition is designed to be easily understood by those with limited statistical and mathematical training who want to pursue research opportunities in new ways.

...read moreread less

13,621 citations

Journal Article•DOI•

BEAST 2: A Software Platform for Bayesian Evolutionary Analysis

[...]

Remco R. Bouckaert¹, Joseph Heled¹, Denise Kühnert¹, Timothy G. Vaughan², Chieh-Hsi Wu¹, Dong Xie¹, Marc A. Suchard³, Andrew Rambaut⁴, Alexei J. Drummond¹ - Show less +5 more•Institutions (4)

University of Auckland¹, Massey University², University of California, Los Angeles³, University of Edinburgh⁴

10 Apr 2014-PLOS Computational Biology

TL;DR: BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform.

...read moreread less

Abstract: We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example BEAST and BEAUti use exactly the same XML file format.

...read moreread less

5,183 citations

Journal Article•DOI•

The design and verification of Mumax3

[...]

Arne Vansteenkiste, Jonathan Leliaert, Mykola Dvornik, Felipe Garcia-Sanchez, Bartel Van Waeyenberge - Show less +1 more

30 Jun 2014-arXiv: Computational Physics

TL;DR: In this paper, the authors report on the design, verification and performance of mumax3, an open-source GPU-accelerated micromagnetic simulation program that solves the time and space dependent magnetization evolution in nano-to micro-scale magnets using a finite-difference discretization.

...read moreread less

Abstract: We report on the design, verification and performance of mumax3, an open-source GPU-accelerated micromagnetic simulation program. This software solves the time- and space dependent magnetization evolution in nano- to micro scale magnets using a finite-difference discretization. Its high performance and low memory requirements allow for large-scale simulations to be performed in limited time and on inexpensive hardware. We verified each part of the software by comparing results to analytical values where available and to micromagnetic standard problems. mumax3 also offers specific extensions like MFM image generation, moving simulation window, edge charge removal and material grains.

...read moreread less

2,209 citations

Journal Article•DOI•

The design and verification of MuMax3

[...]

Arne Vansteenkiste¹, Jonathan Leliaert¹, Mykola Dvornik¹, Mathias Helsen¹, Felipe Garcia-Sanchez², Bartel Van Waeyenberge¹ - Show less +2 more•Institutions (2)

Ghent University¹, Centre national de la recherche scientifique²

20 Oct 2014-AIP Advances

TL;DR: The design, verification and performance of MUMAX3, an open-source GPU-accelerated micromagnetic simulation program that solves the time- and space dependent magnetization evolution in nano- to micro scale magnets using a finite-difference discretization is reported on.

...read moreread less

Abstract: We report on the design, verification and performance of MUMAX3, an open-source GPU-accelerated micromagnetic simulation program. This software solves the time- and space dependent magnetization evolution in nano- to micro scale magnets using a finite-difference discretization. Its high performance and low memory requirements allow for large-scale simulations to be performed in limited time and on inexpensive hardware. We verified each part of the software by comparing results to analytical values where available and to micromagnetic standard problems. MUMAX3 also offers specific extensions like MFM image generation, moving simulation window, edge charge removal and material grains.

...read moreread less

2,116 citations

Journal Article•DOI•

Kubios HRV - Heart rate variability analysis software

[...]

Mika P. Tarvainen, Juha-Pekka Niskanen¹, Jukka A. Lipponen¹, Perttu O. Ranta-aho¹, Pasi A. Karjalainen¹ - Show less +1 more•Institutions (1)

University of Eastern Finland¹

01 Jan 2014-Computer Methods and Programs in Biomedicine

TL;DR: Kubios HRV is an advanced and easy to use software for heart rate variability (HRV) analysis that includes an adaptive QRS detection algorithm and tools for artifact correction, trend removal and analysis sample selection.

...read moreread less

1,841 citations

Journal Article•DOI•

ANGSD: Analysis of Next Generation Sequencing Data

[...]

Thorfinn Sand Korneliussen¹, Anders Albrechtsen², Rasmus Nielsen³, Rasmus Nielsen¹•Institutions (3)

American Museum of Natural History¹, University of Copenhagen², University of California, Berkeley³

25 Nov 2014-BMC Bioinformatics

TL;DR: A multithreaded program suite called ANGSD that can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.

...read moreread less

Abstract: High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously. We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods. The open source c/c++ program ANGSD is available at http://www.popgen.dk/angsd . The program is tested and validated on GNU/Linux systems. The program facilitates multiple input formats including BAM and imputed beagle genotype probability files. The program allow the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.

...read moreread less

1,795 citations

Qualitative content analysis: theoretical foundation, basic procedures and software solution

[...]

Philipp Mayring

01 Jan 2014

1,655 citations

Journal Article•DOI•

Advanced methods of microscope control using μManager software.

[...]

Arthur D. Edelstein¹, Mark A. Tsuchida¹, Nenad Amodaj, Henry Pinkard¹, Ronald D. Vale¹, Nico Stuurman¹ - Show less +2 more•Institutions (1)

University of California, San Francisco¹

07 Nov 2014

TL;DR: A guide to using some of the recently added advanced μManager features, including hardware synchronization, simultaneous use of multiple cameras, projection of patterned light onto a specimen, live slide mapping, imaging with multi-well plates, particle localization and tracking, and high-speed imaging.

...read moreread less

Abstract: μManager is an open-source, cross-platform desktop application, to control a wide variety of motorized microscopes, scientific cameras, stages, illuminators, and other microscope accessories. Since its inception in 2005, μManager has grown to support a wide range of microscopy hardware and is now used by thousands of researchers around the world. The application provides a mature graphical user interface and offers open programming interfaces to facilitate plugins and scripts. Here, we present a guide to using some of the recently added advanced μManager features, including hardware synchronization, simultaneous use of multiple cameras, projection of patterned light onto a specimen, live slide mapping, imaging with multi-well plates, particle localization and tracking, and high-speed imaging.

...read moreread less

1,547 citations

Journal Article•DOI•

NeEstimator v2: re-implementation of software for the estimation of contemporary effective population size (Ne ) from genetic data.

[...]

C. Do, Robin S. Waples¹, David Peel², G. M. Macbeth³, Bree J. Tillett⁴, Jennifer R. Ovenden⁵ - Show less +2 more•Institutions (5)

National Marine Fisheries Service¹, Hobart Corporation², Department of Agriculture, Fisheries and Forestry³, Australian Institute of Marine Science⁴, University of Queensland⁵

01 Jan 2014-Molecular Ecology Resources

TL;DR: NeEstimator v2 includes three single‐sample estimators (updated versions of the linkage disequilibrium and heterozygote‐excess methods, and a new method based on molecular coancestry), as well as the two‐sample (moment‐based temporal) method.

...read moreread less

Abstract: NeEstimator v2 is a completely revised and updated implementation of software that produces estimates of contemporary effective population size, using several different methods and a single input file. NeEstimator v2 includes three single-sample estimators (updated versions of the linkage disequilibrium and heterozygote-excess methods, and a new method based on molecular coancestry), as well as the two-sample (moment-based temporal) method. New features include the following: (i) an improved method for accounting for missing data; (ii) options for screening out rare alleles; (iii) confidence intervals for all methods; (iv) the ability to analyse data sets with large numbers of genetic markers (10000 or more); (v) options for batch processing large numbers of different data sets, which will facilitate cross-method comparisons using simulated data; and (vi) correction for temporal estimates when individuals sampled are not removed from the population (Plan I sampling). The user is given considerable control over input data and composition, and format of output files. The freely available software has a new JAVA interface and runs under MacOS, Linux and Windows.

...read moreread less

1,515 citations

Book•

Interactive Theorem Proving and Program Development: Coq'Art: The Calculus of Inductive Constructions

[...]

Yves Bertot, Pierre Castran

12 Mar 2014

TL;DR: A practical introduction to the development of proofs and certified programs using Coq can be found in this paper, which is an invaluable tool for researchers, students, and engineers interested in formal methods and the developing of zero-fault software.

...read moreread less

Abstract: A practical introduction to the development of proofs and certified programs using Coq. An invaluable tool for researchers, students, and engineers interested in formal methods and the development of zero-fault software.

...read moreread less

Book Chapter•DOI•

Visualizing Bibliometric Networks

[...]

Nees Jan van Eck¹, Ludo Waltman¹•Institutions (1)

Leiden University¹

01 Jan 2014

TL;DR: This chapter provides an introduction to the topic of visualizing bibliometric networks and focuses specifically on two software tools: VOSviewer and CitNetExplorer.

...read moreread less

Abstract: This chapter provides an introduction to the topic of visualizing bibliometric networks. First, the most commonly studied types of bibliometric networks (i.e., citation, co-citation, bibliographic coupling, keyword co-occurrence, and coauthorship networks) are discussed, and three popular visualization approaches (i.e., distance-based, graph-based, and timeline-based approaches) are distinguished. Next, an overview is given of a number of software tools that can be used for visualizing bibliometric networks. In the second part of the chapter, the focus is specifically on two software tools: VOSviewer and CitNetExplorer. The techniques used by these tools to construct, analyze, and visualize bibliometric networks are discussed. In addition, tutorials are offered that demonstrate in a step-by-step manner how both tools can be used. Finally, the chapter concludes with a discussion of the limitations and the proper use of bibliometric network visualizations and with a summary of some ongoing and future developments.

...read moreread less

Journal Article•DOI•

GPOPS-II: A MATLAB Software for Solving Multiple-Phase Optimal Control Problems Using hp-Adaptive Gaussian Quadrature Collocation Methods and Sparse Nonlinear Programming

[...]

Michael A. Patterson¹, Anil V. Rao¹•Institutions (1)

University of Florida¹

27 Oct 2014-ACM Transactions on Mathematical Software

TL;DR: A general-purpose MATLAB software program called GPOPS--II is described for solving multiple-phase optimal control problems using variable-order Gaussian quadrature collocation methods.

...read moreread less

Abstract: A general-purpose MATLAB software program called GPOPS--II is described for solving multiple-phase optimal control problems using variable-order Gaussian quadrature collocation methods. The software employs a Legendre-Gauss-Radau quadrature orthogonal collocation method where the continuous-time optimal control problem is transcribed to a large sparse nonlinear programming problem (NLP). An adaptive mesh refinement method is implemented that determines the number of mesh intervals and the degree of the approximating polynomial within each mesh interval to achieve a specified accuracy. The software can be interfaced with either quasi-Newton (first derivative) or Newton (second derivative) NLP solvers, and all derivatives required by the NLP solver are approximated using sparse finite-differencing of the optimal control problem functions. The key components of the software are described in detail and the utility of the software is demonstrated on five optimal control problems of varying complexity. The software described in this article provides researchers a useful platform upon which to solve a wide variety of complex constrained optimal control problems.

...read moreread less

Journal Article•DOI•

DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data

[...]

Jean-Marie Cornuet¹, Pierre Pudlo¹, Julien Veyssier¹, Alexandre Dehne-Garcia¹, Mathieu Gautier¹, Raphaël Leblois¹, Jean-Michel Marin¹, Arnaud Estoup¹ - Show less +4 more•Institutions (1)

University of Montpellier¹

15 Apr 2014-Bioinformatics

TL;DR: DIYABC v2.0 implements a number of new features and analytical methods, including efficient Bayesian model choice using linear discriminant analysis on summary statistics and the serial launching of multiple post-processing analyses.

...read moreread less

Abstract: DIYABC is a software package for a comprehensive analysis of population history using approximate Bayesian computation on DNA polymorphism data. Version 2.0 implements a number of new features and analytical methods. It allows (i) the analysis of single nucleotide polymorphism data at large number of loci, apart from microsatellite and DNA sequence data, (ii) efficient Bayesian model choice using linear discriminant analysis on summary statistics and (iii) the serial launching of multiple post-processing analyses. DIYABC v2.0 also includes a user-friendly graphical interface with various new options. It can be run on three operating systems: GNU/Linux, Microsoft Windows and Apple Os X. Freely available with a detailed notice document and example projects to academic users at http://www1.montpellier.inra.fr/CBGP/diyabc CONTACT: estoup@supagro.inra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

...read moreread less

Journal Article•DOI•

Kwant: a software package for quantum transport

[...]

Christoph Groth, Michael Wimmer¹, Anton R. Akhmerov¹, Anton R. Akhmerov², Xavier Waintal - Show less +1 more•Institutions (2)

Leiden University¹, Harvard University²

26 Jun 2014-New Journal of Physics

TL;DR: Kwant as mentioned in this paper is a Python package for numerical quantum transport calculations that can be used to simulate any dimensionality and geometry with a tight-binding model, including lattices, symmetries, electrodes, orbital/spin/electron-hole degrees of freedom.

...read moreread less

Abstract: Kwant is a Python package for numerical quantum transport calculations. It aims to be a user-friendly, universal, and high-performance toolbox for the simulation of physical systems of any dimensionality and geometry that can be described by a tight-binding model. Kwant has been designed such that the natural concepts of the theory of quantum transport (lattices, symmetries, electrodes, orbital/spin/electron-hole degrees of freedom) are exposed in a simple and transparent way. Defining a new simulation setup is very similar to describing the corresponding mathematical model. Kwant offers direct support for calculations of transport properties (conductance, noise, scattering matrix), dispersion relations, modes, wave functions, various Greenʼs functions, and out-of-equilibrium local quantities. Other computations involving tight-binding Hamiltonians can be implemented easily thanks to its extensible and modular nature. Kwant is free software available at http://kwant-project.org/.

...read moreread less

Journal Article•DOI•

PopGenome: An Efficient Swiss Army Knife for Population Genomic Analyses in R

[...]

Bastian Pfeifer¹, Ulrich Wittelsbürger¹, Sebastian E. Ramos-Onsins, Martin J. Lercher¹•Institutions (1)

University of Düsseldorf¹

01 Jul 2014-Molecular Biology and Evolution

TL;DR: PopGenome is a population genomics package for the R software environment that offers a wide range of diverse population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination.

...read moreread less

Abstract: Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and single-nucleotide polymorphism (SNP) data sets in most common formats, including those used by the HapMap, 1000 human genomes, and 1001 Arabidopsis genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of diverse population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson’s MS and Ewing’s MSMS programs to assess statistical significance based on coalescent simulations. PopGenome’s integration in R facilitates effortless and reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analyses methods into the PopGenome framework. PopGenome and R are freely available from CRAN (http://cran.r-project.org/) for all major operating systems under the GNU General Public License.

...read moreread less

Journal Article•DOI•

DREAM.3D: A Digital Representation Environment for the Analysis of Microstructure in 3D

[...]

Michael A. Groeber¹, Michael A. Jackson•Institutions (1)

Air Force Research Laboratory¹

01 Apr 2014

TL;DR: The approach to building a generalized representation strategy for digital microstructures and the barriers encountered when trying to integrate a set of existing software tools to create an expandable codebase are discussed.

...read moreread less

Abstract: This paper presents a software environment for processing, segmenting, quantifying, representing and manipulating digital microstructure data. The paper discusses the approach to building a generalized representation strategy for digital microstructures and the barriers encountered when trying to integrate a set of existing software tools to create an expandable codebase.

...read moreread less

Journal Article•DOI•

A reconfigurable fabric for accelerating large-scale datacenter services

[...]

Andrew Putnam¹, Adrian M. Caulfield¹, Eric S. Chung¹, Derek Chiou², Kypros Constantinides¹, John Demme³, Hadi Esmaeilzadeh⁴, Jeremy Fowers¹, Gopi Prashanth Gopal¹, Jan Gray¹, Michael Haselman¹, Scott Hauck⁵, Stephen F. Heil¹, Amir Hormati⁶, Joo-Young Kim¹, Sitaram Lanka¹, James R. Larus⁷, Eric C. Peterson¹, Simon Pope¹, Aaron L. Smith¹, Jason Thong¹, Phillip Yi Xiao¹, Doug Burger¹ - Show less +19 more•Institutions (7)

Microsoft¹, University of Texas at Austin², Columbia University³, Georgia Institute of Technology⁴, University of Washington⁵, Google⁶, École Polytechnique Fédérale de Lausanne⁷

14 Jun 2014

TL;DR: The requirements and architecture of the fabric are described, the critical engineering challenges and solutions needed to make the system robust in the presence of failures are detailed, and the performance, power, and resilience of the system when ranking candidate documents are measured.

...read moreread less

Abstract: Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurablefabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cablesIn this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the largescale reconfigurable fabric improves the ranking throughput of each server by a factor of 95% for a fixed latency distribution--- or, while maintaining equivalent throughput, reduces the tail latency by 29%

...read moreread less

Book Chapter•DOI•

3D Slicer: A Platform for Subject-Specific Image Analysis, Visualization, and Clinical Support

[...]

Ron Kikinis¹, Steve Pieper, Kirby G. Vosburgh¹•Institutions (1)

Brigham and Women's Hospital¹

01 Jan 2014

TL;DR: 3D Slicer provides a set of interactive tools and a stable platform that can quickly incorporate new analysis techniques and evolve to serve more sophisticated real-time applications while remaining compatible with the latest hardware and software generations of host computer systems.

...read moreread less

Abstract: 3D Slicer is an open-source platform for the analysis and display of information derived from medical imaging and similar data sets. Such advanced software environments are in daily use by researchers and clinicians and in many nonmedical applications. 3D Slicer is unique through serving clinical users, multidisciplinary clinical research terms, and software architects within a single technology structure and user community. Functions such as interactive visualization, image registration, and model-based analysis are now being complemented by more advanced capabilities, most notably in neurological imaging and intervention. These functions, originally limited to offline use by technical factors, are integral to large scale, rapidly developing research studies, and they are being increasingly integrated into the management and delivery of care. This activity has been led by a community of basic, applied, and clinical scientists and engineers, from both academic and commercial perspectives. 3D Slicer, a free open-source software package, is based in this community; 3D Slicer provides a set of interactive tools and a stable platform that can quickly incorporate new analysis techniques and evolve to serve more sophisticated real-time applications while remaining compatible with the latest hardware and software generations of host computer systems.

...read moreread less

Journal Article•DOI•

Matlab Software for Spatial Panels

[...]

J. Paul Elhorst

01 Jul 2014-International Regional Science Review

TL;DR: This article extends Matlab routines to include the bias correction procedure proposed by Lee and Yu if the spatial panel data model contains spatial and/or time-period fixed effects, the direct and indirect effects estimates of the explanatory variables proposed by LeSage and Pace, and a selection framework to determine which spatialpanel data model best describes the data.

...read moreread less

Abstract: Elhorst provides Matlab routines to estimate spatial panel data models at his website. This article extends these routines to include the bias correction procedure proposed by Lee and Yu if the sp...

...read moreread less

Journal Article•DOI•

Best Practices for Scientific Computing

[...]

Greg Wilson¹, D. A. Aruliah², C. Titus Brown³, Neil Chue Hong, Matt Davis⁴, Richard T. Guy⁵, Steven H. D. Haddock⁶, Kathryn D. Huff⁷, Ian M. Mitchell⁸, Mark D. Plumbley⁹, Ben Waugh¹⁰, Ethan P. White¹¹, Paul P. H. Wilson¹² - Show less +9 more•Institutions (12)

Mozilla Foundation¹, University of Ontario Institute of Technology², Michigan State University³, Space Telescope Science Institute⁴, University of Toronto⁵, Monterey Bay Aquarium Research Institute⁶, University of California, Berkeley⁷, University of British Columbia⁸, Queen Mary University of London⁹, University College London¹⁰, Utah State University¹¹, University of Wisconsin-Madison¹²

07 Jan 2014-PLOS Biology

TL;DR: A set of best practices for scientific software development, based on research and experience, that will improve scientists' productivity and the reliability of their software are described.

...read moreread less

Abstract: Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software. Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around developing new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, combining disparate datasets to assess synthetic problems, and other computational tasks. Scientists typically develop their own software for these purposes because doing so requires substantial domain-specific knowledge. As a result, recent studies have found that scientists typically spend 30% or more of their time developing software [1],[2]. However, 90% or more of them are primarily self-taught [1],[2], and therefore lack exposure to basic software development practices such as writing maintainable code, using version control and issue trackers, code reviews, unit testing, and task automation. We believe that software is just another kind of experimental apparatus [3] and should be built, checked, and used as carefully as any physical apparatus. However, while most scientists are careful to validate their laboratory and field equipment, most do not know how reliable their software is [4],[5]. This can lead to serious errors impacting the central conclusions of published research [6]: recent high-profile retractions, technical comments, and corrections because of errors in computational methods include papers in Science [7],[8], PNAS [9], the Journal of Molecular Biology [10], Ecology Letters [11],[12], the Journal of Mammalogy [13], Journal of the American College of Cardiology [14], Hypertension [15], and The American Economic Review [16]. In addition, because software is often used for more than a single project, and is often reused by other scientists, computing errors can have disproportionate impacts on the scientific process. This type of cascading impact caused several prominent retractions when an error from another group's code was not discovered until after publication [6]. As with bench experiments, not everything must be done to the most exacting standards; however, scientists need to be aware of best practices both to improve their own approaches and for reviewing computational work by others. This paper describes a set of practices that are easy to adopt and have proven effective in many research settings. Our recommendations are based on several decades of collective experience both building scientific software and teaching computing to scientists [17],[18], reports from many other groups [19]–, guidelines for commercial and open source software development [26],, and on empirical studies of scientific computing [28]–[31] and software development in general (summarized in [32]). None of these practices will guarantee efficient, error-free software development, but used in concert they will reduce the number of errors in scientific software, make it easier to reuse, and save the authors of the software time and effort that can used for focusing on the underlying scientific questions. Our practices are summarized in Box 1; labels in the main text such as “(1a)” refer to items in that summary. For reasons of space, we do not discuss the equally important (but independent) issues of reproducible research, publication and citation of code and data, and open science. We do believe, however, that all of these will be much easier to implement if scientists have the skills we describe. Box 1. Summary of Best Practices Write programs for people, not computers. A program should not require its readers to hold more than a handful of facts in memory at once. Make names consistent, distinctive, and meaningful. Make code style and formatting consistent. Let the computer do the work. Make the computer repeat tasks. Save recent commands in a file for re-use. Use a build tool to automate workflows. Make incremental changes. Work in small steps with frequent feedback and course correction. Use a version control system. Put everything that has been created manually in version control. Don't repeat yourself (or others). Every piece of data must have a single authoritative representation in the system. Modularize code rather than copying and pasting. Re-use code instead of rewriting it. Plan for mistakes. Add assertions to programs to check their operation. Use an off-the-shelf unit testing library. Turn bugs into test cases. Use a symbolic debugger. Optimize software only after it works correctly. Use a profiler to identify bottlenecks. Write code in the highest-level language possible. Document design and purpose, not mechanics. Document interfaces and reasons, not implementations. Refactor code in preference to explaining how it works. Embed the documentation for a piece of software in that software. Collaborate. Use pre-merge code reviews. Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems. Use an issue tracking tool. Write Programs for People, Not Computers Scientists writing software need to write code that both executes correctly and can be easily read and understood by other programmers (especially the author's future self). If software cannot be easily read and understood, it is much more difficult to know that it is actually doing what it is intended to do. To be productive, software developers must therefore take several aspects of human cognition into account: in particular, that human working memory is limited, human pattern matching abilities are finely tuned, and human attention span is short [33]–[37]. First, a program should not require its readers to hold more than a handful of facts in memory at once (1a). Human working memory can hold only a handful of items at a time, where each item is either a single fact or a “chunk” aggregating several facts [33],[34], so programs should limit the total number of items to be remembered to accomplish a task. The primary way to accomplish this is to break programs up into easily understood functions, each of which conducts a single, easily understood, task. This serves to make each piece of the program easier to understand in the same way that breaking up a scientific paper using sections and paragraphs makes it easier to read. Second, scientists should make names consistent, distinctive, and meaningful (1b). For example, using non-descriptive names, like a and foo, or names that are very similar, like results and results2, is likely to cause confusion. Third, scientists should make code style and formatting consistent (1c). If different parts of a scientific paper used different formatting and capitalization, it would make that paper more difficult to read. Likewise, if different parts of a program are indented differently, or if programmers mix CamelCaseNaming and pothole_case_naming, code takes longer to read and readers make more mistakes [35],[36].

...read moreread less

TopoToolbox 2 – MATLAB-based software for topographic analysis and modeling in Earth surface sciences

[...]

Wolfgang Schwanghart, Dirk Scherler

15 Jan 2014

TL;DR: The introduction of a novel technique to store flow directions as topologically ordered vectors of indices enables calculation of flow-related attributes such as flow accumulation ∼20 times faster than conventional algorithms while at the same time reducing memory overhead to 33% of that required by the previous version.

...read moreread less

Abstract: TopoToolbox is a MATLAB program for the analysis of digital elevation models (DEMs). With the release of version 2, the software adopts an object-oriented programming (OOP) approach to work with gridded DEMs and derived data such as flow directions and stream networks. The introduction of a novel technique to store flow directions as topologically ordered vectors of indices enables calculation of flow-related attributes such as flow accumulation ∼20 times faster than conventional algorithms while at the same time reducing memory overhead to 33% of that required by the previous version. Graphical user interfaces (GUIs) enable visual exploration and interaction with DEMs and derivatives and provide access to tools targeted at fluvial and tectonic geomorphologists. With its new release, TopoToolbox has become a more memory-efficient and faster tool for basic and advanced digital terrain analysis that can be used as a framework for building hydrological and geomorphological models in MATLAB.

...read moreread less

Journal Article•DOI•

hroot: Hamburg Registration and Organization Online Tool☆

[...]

Olaf Bock¹, Ingmar Baetge, Andreas Nicklisch¹•Institutions (1)

University of Hamburg¹

01 Oct 2014-European Economic Review

TL;DR: Hamburg Registration and Organization Online Tool (Hroot) as discussed by the authors is a web-based software designed for managing participants of economic experiments, which provides important features to assure a randomized invitation process based on a filtered, pre-specified subject pool.

...read moreread less

Book•

Neural Networks: An Introduction

[...]

Berndt Müller¹, Joachim Reinhardt•Institutions (1)

Duke University¹

12 Mar 2014

TL;DR: The concepts of neural-network models and techniques of parallel distributed processing are comprehensively presented in a three-step approach and the reader is introduced to "neural" information processing, such as associative memory, perceptrons, feature-sensitive networks, learning strategies and practical applications.

...read moreread less

Abstract: The concepts of neural-network models and techniques of parallel distributed processing are comprehensively presented in a three-step approach. After a brief overview of the neural structure of the brain and the history of neural-network modelling, the reader is introduced to "neural" information processing, such as associative memory, perceptrons, feature-sensitive networks, learning strategies and practical applications. Part 2 covers more advanced subjects such as spin glasses, the mean-field theory of the Hopfield model, and the space of interactions in neural networks. The self-contained final part discusses seven programmes that provide practical demonstrations of neural-network models and their learning strategies. Software is included with the text on a 5 1/4-inch MS-DOS diskette and can be run using Borland's TURBO-C 2.0 compiler, the Microsoft C compiler (5.0), or compatible compilers.

...read moreread less

Journal Article•DOI•

Short Communication: TopoToolbox 2 – MATLAB-based software for topographic analysis and modeling in Earth surface sciences

[...]

Wolfgang Schwanghart¹, Wolfgang Schwanghart², Dirk Scherler³•Institutions (3)

University of Potsdam¹, Aarhus University², California Institute of Technology³

15 Jan 2014

TL;DR: TopoToolbox as discussed by the authors is a MATLAB program for the analysis of digital elevation models (DEMs) that adopts an object-oriented programming (OOP) approach to work with gridded DEMs and derived data such as flow directions and stream networks.

...read moreread less

Abstract: . TopoToolbox is a MATLAB program for the analysis of digital elevation models (DEMs). With the release of version 2, the software adopts an object-oriented programming (OOP) approach to work with gridded DEMs and derived data such as flow directions and stream networks. The introduction of a novel technique to store flow directions as topologically ordered vectors of indices enables calculation of flow-related attributes such as flow accumulation ∼20 times faster than conventional algorithms while at the same time reducing memory overhead to 33% of that required by the previous version. Graphical user interfaces (GUIs) enable visual exploration and interaction with DEMs and derivatives and provide access to tools targeted at fluvial and tectonic geomorphologists. With its new release, TopoToolbox has become a more memory-efficient and faster tool for basic and advanced digital terrain analysis that can be used as a framework for building hydrological and geomorphological models in MATLAB.

...read moreread less

Book•

Threat Modeling: Designing for Security

[...]

Adam Shostack

12 Feb 2014

TL;DR: Threat Modeling: Designing for Security is a unique how-to for security and software developers who need to design secure products and systems and test their designs and offers actionable how to advice not tied to any specific software, operating system, or programming language.

...read moreread less

Abstract: The only security book to be chosen as a Dr. Dobbs Jolt Award Finalist since Bruce Schneier's Secrets and Lies and Applied Cryptography! Adam Shostack is responsible for security development lifecycle threat modeling at Microsoft and is one of a handful of threat modeling experts in the world. Now, he is sharing his considerable expertise into this unique book. With pages of specific actionable advice, he details how to build better security into the design of systems, software, or services from the outset. You'll explore various threat modeling approaches, find out how to test your designs against threats, and learn effective ways to address threats that have been validated at Microsoft and other top companies.Systems security managers, you'll find tools and a framework for structured thinking about what can go wrong. Software developers, you'll appreciate the jargon-free and accessible introduction to this essential skill. Security professionals, you'll learn to discern changing threats and discover the easiest ways to adopt a structured approach to threat modeling. * Provides a unique how-to for security and software developers who need to design secure products and systems and test their designs * Explains how to threat model and explores various threat modeling approaches, such as asset-centric, attacker-centric and software-centric * Provides effective approaches and techniques that have been proven at Microsoft and elsewhere * Offers actionable how-to advice not tied to any specific software, operating system, or programming language * Authored by a Microsoft professional who is one of the most prominent threat modeling experts in the worldAs more software is delivered on the Internet or operates on Internet-connected devices, the design of secure software is absolutely critical. Make sure you're ready with Threat Modeling: Designing for Security.

...read moreread less

Journal Article•DOI•

Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations.

[...]

Stefano Piana¹, John L. Klepeis¹, David E. Shaw², David E. Shaw¹•Institutions (2)

D. E. Shaw Research¹, Columbia University²

01 Feb 2014-Current Opinion in Structural Biology

TL;DR: The extent to which current force fields reproduce (and fail to reproduce) certain relevant properties for which such comparisons are possible is examined.

...read moreread less

Journal Article•DOI•

Reference-free cell mixture adjustments in analysis of DNA methylation data

[...]

Eugene Andres Houseman¹, John Molitor¹, Carmen J. Marsit¹•Institutions (1)

Dartmouth College¹

15 May 2014-Bioinformatics

TL;DR: This work proposes a method for conducting epigenome-wide association studies analysis when a reference dataset is unavailable, including a bootstrap method for estimating standard errors and demonstrates that it can perform as well as or better than methods that make explicit use of reference datasets.

...read moreread less

Abstract: Motivation: Recently there has been increasing interest in the effects of cell mixture on the measurement of DNA methylation, specifically the extent to which small perturbations in cell mixture proportions can register as changes in DNA methylation. A recently published set of statistical methods exploits this association to infer changes in cell mixture proportions, and these methods are presently being applied to adjust for cell mixture effect in the context of epigenome-wide association studies. However, these adjustments require the existence of reference datasets, which may be laborious or expensive to collect. For some tissues such as placenta, saliva, adipose or tumor tissue, the relevant underlying cell types may not be known. Results: We propose a method for conducting epigenome-wide association studies analysis when a reference dataset is unavailable, including a bootstrap method for estimating standard errors. We demonstrate via simulation study and several real data analyses that our proposed method can perform as well as or better than methods that make explicit use of reference datasets. In particular, it may adjust for detailed cell type differences that may be unavailable even in existing reference datasets. Availability and implementation: Software is available in the R package RefFreeEWAS. Data for three of four examples were obtained from Gene Expression Omnibus (GEO), accession numbers GSE37008, GSE42861 and GSE30601, while reference data were obtained from GEO accession number GSE39981. Contact: andres.houseman@oregonstate.edu Supplementary information: Supplementary data are available at Bioinformatics online.

...read moreread less

Journal Article•DOI•

WinCSD: software package for crystallographic calculations (Version 4)

[...]

Lev Akselrud¹, Yuri Grin¹•Institutions (1)

Max Planck Society¹

01 Apr 2014-Journal of Applied Crystallography

TL;DR: WinCSD covers the complete spectrum of crystallographic calculations, including powder diffraction pattern deconvolution, crystal structure solution and refinement in 3 + d space, refinement of the multipole model and electron density studies from diffraction data, and graphical representation of crystallography information.

...read moreread less

Abstract: The fourth version of the program package WinCSD is multi-purpose computer software for crystallographic calculations using single-crystal and powder X-ray and neutron diffraction data. The software environment and the graphical user interface are built using the platform of the Microsoft .NET Framework, which grants independence from changing Windows operating systems and allows for transferring to other operating systems. Graphic applications use the three-dimensional OpenGL graphics language. WinCSD covers the complete spectrum of crystallographic calculations, including powder diffraction pattern deconvolution, crystal structure solution and refinement in 3 + d space, refinement of the multipole model and electron density studies from diffraction data, and graphical representation of crystallographic information.

...read moreread less

Journal Article•DOI•

A guide to handling missing data in cost-effectiveness analysis conducted within randomised controlled trials.

[...]

Rita Faria¹, Manuel Gomes², David Epstein¹, David Epstein³, Ian R. White⁴ - Show less +1 more•Institutions (4)

University of York¹, University of London², University of Granada³, Medical Research Council⁴

29 Jul 2014-PharmacoEconomics

TL;DR: Practical guidance on how to handle missing data in within-trial CEAs following a principled approach is provided, which is implemented in three stages: descriptive analysis to inform the assumption on the missing data mechanism; how to choose between alternative methods given their underlying assumptions; and methods for sensitivity analysis.

...read moreread less

Abstract: Missing data are a frequent problem in cost-effectiveness analysis (CEA) within a randomised controlled trial. Inappropriate methods to handle missing data can lead to misleading results and ultimately can affect the decision of whether an intervention is good value for money. This article provides practical guidance on how to handle missing data in within-trial CEAs following a principled approach: (i) the analysis should be based on a plausible assumption for the missing data mechanism, i.e. whether the probability that data are missing is independent of or dependent on the observed and/or unobserved values; (ii) the method chosen for the base-case should fit with the assumed mechanism; and (iii) sensitivity analysis should be conducted to explore to what extent the results change with the assumption made. This approach is implemented in three stages, which are described in detail: (1) descriptive analysis to inform the assumption on the missing data mechanism; (2) how to choose between alternative methods given their underlying assumptions; and (3) methods for sensitivity analysis. The case study illustrates how to apply this approach in practice, including software code. The article concludes with recommendations for practice and suggestions for future research.

...read moreread less

Collapse