
Showing papers on "Software" published in 2015


Book
01 May 2015
TL;DR: An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.
Abstract: Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

4,492 citations
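
A minimal sketch of the core idea that MSV builds on: scoring ungapped local alignment segments as the best-scoring contiguous run of per-position match scores along a diagonal. This is a deliberately simplified toy in plain Python, not HMMER3's striped vector-parallel implementation, and the tiny profile and scores below are made up for illustration.

```python
# Toy illustration (not HMMER3's SIMD code): score the best single ungapped
# local alignment segment between a small "profile" and a target sequence by
# applying Kadane's rule along each diagonal. MSV generalizes this idea to an
# optimal sum of multiple such segments.

def best_ungapped_segment(profile_scores, target):
    """profile_scores: dict mapping (position, residue) -> toy match score.
    target: string of residues. Returns the best single-segment score."""
    qlen = 1 + max(pos for pos, _ in profile_scores)
    best = 0.0
    # Each diagonal corresponds to a fixed offset between target and profile positions.
    for offset in range(-(qlen - 1), len(target)):
        run = 0.0
        for j in range(qlen):
            i = j + offset
            if 0 <= i < len(target):
                run = max(0.0, run + profile_scores.get((j, target[i]), -1.0))
                best = max(best, run)
    return best

# Example with a 3-position toy profile favouring the motif "ACG":
toy_profile = {(0, "A"): 2.0, (1, "C"): 2.0, (2, "G"): 2.0}
print(best_ungapped_segment(toy_profile, "TTACGTT"))  # -> 6.0
```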


Journal ArticleDOI
TL;DR: The ImageJ project is used as a case study of how open‐source software fosters its suites of software tools, making multitudes of image‐analysis technology easily accessible to the scientific community.
Abstract: Technology in microscopy advances rapidly, enabling increasingly affordable, faster, and more precise quantitative biomedical imaging, which necessitates correspondingly more-advanced image processing and analysis techniques. A wide range of software is available-from commercial to academic, special-purpose to Swiss army knife, small to large-but a key characteristic of software that is suitable for scientific inquiry is its accessibility. Open-source software is ideal for scientific endeavors because it can be freely inspected, modified, and redistributed; in particular, the open-software platform ImageJ has had a huge impact on the life sciences, and continues to do so. From its inception, ImageJ has grown significantly due largely to being freely available and its vibrant and helpful user community. Scientists as diverse as interested hobbyists, technical assistants, students, scientific staff, and advanced biology researchers use ImageJ on a daily basis, and exchange knowledge via its dedicated mailing list. Uses of ImageJ range from data visualization and teaching to advanced image processing and statistical analysis. The software's extensibility continues to attract biologists at all career stages as well as computer scientists who wish to effectively implement specific image-processing algorithms. In this review, we use the ImageJ project as a case study of how open-source software fosters its suites of software tools, making multitudes of image-analysis technology easily accessible to the scientific community. We specifically explore what makes ImageJ so popular, how it impacts the life sciences, how it inspires other projects, and how it is self-influenced by coevolving projects within the ImageJ ecosystem.

2,081 citations


Journal ArticleDOI
07 May 2015
TL;DR: This paper discusses aspects of recruiting subjects for economic laboratory experiments, and shows how the Online Recruitment System for Economic Experiments can help.
Abstract: This paper discusses aspects of recruiting subjects for economic laboratory experiments, and shows how the Online Recruitment System for Economic Experiments can help. The software package provides experimenters with a free, convenient, and very powerful tool to organize their experiments and sessions.

1,974 citations


Journal ArticleDOI
TL;DR: The RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines and offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job.
Abstract: The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

1,666 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present the LHAPDF-6 library, a ground-up re-engineering of the PDFLIB/LHAPDF paradigm for PDF access which removes all limits on use of concurrent PDF sets, massively reduces static memory requirements, offers improved CPU performance, and fixes fundamental bugs in multi-set access to PDF metadata.
Abstract: The Fortran LHAPDF library has been a long-term workhorse in particle physics, providing standardised access to parton density functions for experimental and phenomenological purposes alike, following on from the venerable PDFLIB package. During Run 1 of the LHC, however, several fundamental limitations in LHAPDF’s design have become deeply problematic, restricting the usability of the library for important physics-study procedures and providing dangerous avenues by which to silently obtain incorrect results. In this paper we present the LHAPDF 6 library, a ground-up re-engineering of the PDFLIB/LHAPDF paradigm for PDF access which removes all limits on use of concurrent PDF sets, massively reduces static memory requirements, offers improved CPU performance, and fixes fundamental bugs in multi-set access to PDF metadata. The new design, restricted for now to interpolated PDFs, uses centralised numerical routines and a powerful cascading metadata system to decouple software releases from provision of new PDF data and allow completely general parton content. More than 200 PDF sets have been migrated from LHAPDF 5 to the new universal data format, via a stringent quality control procedure. LHAPDF 6 is supported by many Monte Carlo generators and other physics programs, in some cases via a full set of compatibility routines, and is recommended for the demanding PDF access needs of LHC Run 2 and beyond.

1,563 citations
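
LHAPDF 6 ships Python bindings alongside its C++ API; a minimal sketch of querying an interpolated PDF is shown below. It assumes the lhapdf module and the named PDF set (CT14nlo, used here only as an example) are installed locally.

```python
# Minimal sketch of querying an interpolated PDF through LHAPDF 6's Python
# bindings (assumes the lhapdf module and the example set are installed).
import lhapdf

pdf = lhapdf.mkPDF("CT14nlo", 0)       # central member of an installed set (set name is an example)
x, Q = 1e-3, 100.0                     # momentum fraction and scale in GeV
gluon_xf = pdf.xfxQ(21, x, Q)          # x*f(x, Q) for PDG ID 21 (gluon)
alpha_s = pdf.alphasQ(Q)               # strong coupling associated with the set

print(f"x*g(x={x}, Q={Q} GeV) = {gluon_xf:.4g}, alpha_s(Q) = {alpha_s:.4f}")
```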


Journal ArticleDOI
TL;DR: MDTraj is a modern, lightweight, and fast software package for analyzing MD simulations that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python.

1,480 citations
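
A minimal sketch of a typical MDTraj analysis in Python; the trajectory and topology file names are placeholders.

```python
# Minimal sketch of a typical MDTraj workflow (file names are placeholders).
import mdtraj as md

traj = md.load("trajectory.xtc", top="topology.pdb")    # load an MD trajectory
print(traj)                                             # frames, atoms, time step

ca = traj.topology.select("name CA")                    # alpha-carbon atom indices
rmsd = md.rmsd(traj, traj, frame=0, atom_indices=ca)    # RMSD to the first frame, in nm
rg = md.compute_rg(traj)                                # radius of gyration per frame
print(rmsd[:5], rg[:5])
```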


Journal ArticleDOI
TL;DR: Ncorr is an open-source subset-based 2D DIC package that amalgamates modern DIC algorithms proposed in the literature with additional enhancements and several applications of Ncorr that both validate it and showcase its capabilities are discussed.
Abstract: Digital Image Correlation (DIC) is an important and widely used non-contact technique for measuring material deformation. Considerable progress has been made in recent decades in both developing new experimental DIC techniques and in enhancing the performance of the relevant computational algorithms. Despite this progress, there is a distinct lack of freely available, high-quality, flexible DIC software. This paper documents a new DIC software package, Ncorr, that is meant to fill that crucial gap. Ncorr is an open-source subset-based 2D DIC package that amalgamates modern DIC algorithms proposed in the literature with additional enhancements. Several applications of Ncorr that both validate it and showcase its capabilities are discussed.

1,184 citations
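
Ncorr itself is a MATLAB package; the toy Python sketch below only illustrates the basic subset-matching idea behind 2D DIC (locating a reference subset in a deformed image by maximizing normalized cross-correlation over integer shifts), not Ncorr's actual algorithms, which add subpixel interpolation and more robust correlation criteria.

```python
# Toy illustration of subset matching in 2D DIC (integer shifts only, no
# boundary handling, not Ncorr's implementation).
import numpy as np

def match_subset(reference, deformed, center, half):
    """Return the integer displacement (du, dv) of the (2*half+1)^2 subset
    centered at `center` in `reference` that best matches `deformed`."""
    r, c = center
    sub = reference[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    sub = (sub - sub.mean()) / (sub.std() + 1e-12)
    best, best_shift = -np.inf, (0, 0)
    for du in range(-5, 6):              # small search window for the toy example
        for dv in range(-5, 6):
            win = deformed[r + du - half:r + du + half + 1,
                           c + dv - half:c + dv + half + 1].astype(float)
            if win.shape != sub.shape:
                continue
            win = (win - win.mean()) / (win.std() + 1e-12)
            score = float((sub * win).mean())    # normalized cross-correlation
            if score > best:
                best, best_shift = score, (du, dv)
    return best_shift

# Synthetic check: shift a random image by (2, 3) pixels and recover the shift.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
defo = np.roll(ref, shift=(2, 3), axis=(0, 1))
print(match_subset(ref, defo, center=(32, 32), half=8))  # -> (2, 3)
```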


Journal ArticleDOI
TL;DR: Dioptas is a Python-based program for on-the-fly data processing and exploration of two-dimensional X-ray diffraction area detector data, specifically designed for the large amount of data collected at XRD beamlines at synchrotrons.
Abstract: The amount of data collected during synchrotron X-ray diffraction (XRD) experiments is constantly increasing. Most of the time, the data are collected with image detectors, which necessitates the use of image reduction/integration routines to extract structural information from measured XRD patterns. This step turns out to be a bottleneck in the data processing procedure due to a lack of suitable software packages. In particular, fast-running synchrotron experiments require online data reduction and analysis in real time so that experimental parameters can be adjusted interactively. Dioptas is a Python-based program for on-the-fly data processing and exploration of two-dimensional X-ray diffraction area detector data, specifically designed for the large amount of data collected at XRD beamlines at synchrotrons. Its fast data reduction algorithm and graphical data exploration capabilities make it ideal for online data processing during XRD experiments and batch post-processing of large numbers of images.

1,163 citations
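
Dioptas performs its image integration with the pyFAI azimuthal-integration library; the sketch below shows the corresponding 2D-to-1D reduction step done directly with pyFAI, with placeholder file names for the calibration and detector image.

```python
# Minimal sketch of the 2D -> 1D reduction step that Dioptas performs,
# expressed directly with pyFAI (file names are placeholders).
import fabio
import pyFAI

ai = pyFAI.load("calibration.poni")               # detector geometry from a pyFAI .poni file
image = fabio.open("detector_image.tif").data     # 2D area-detector image

# Azimuthal integration of the area-detector image into a 1D diffraction pattern
two_theta, intensity = ai.integrate1d(image, npt=2000, unit="2th_deg")
```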


Journal ArticleDOI
TL;DR: Qualimap 2 represents a next step in the QC analysis of HTS data, along with comprehensive single-sample analysis of alignment data, and includes new modes that allow simultaneous processing and comparison of multiple samples.
Abstract: Motivation: Detection of random errors and systematic biases is a crucial step of a robust pipeline for processing high-throughput sequencing (HTS) data. Bioinformatics software tools capable of performing this task are available, either for general analysis of HTS data or targeted to a specific sequencing technology. However, most of the existing QC instruments only allow processing of one sample at a time. Results: Qualimap 2 represents a next step in the QC analysis of HTS data. Along with comprehensive single-sample analysis of alignment data, it includes new modes that allow simultaneous processing and comparison of multiple samples. As with the first version, the new features are available via both graphical and command line interface. Additionally, it includes a large number of improvements proposed by the user community. Availability and implementation: The implementation of the software along with documentation is freely available at http://www.qualimap.org. Contact: meyer@mpiib-berlin.mpg.de. Supplementary information: Supplementary data are available at Bioinformatics online.

1,154 citations


Journal ArticleDOI
TL;DR: An extension of a set previously used by the CheckMol software, covering in addition heterocyclic compound classes and periodic table groups, is described; the article demonstrates that EFG can be efficiently used to develop and interpret structure-activity relationship models.
Abstract: The article describes a classification system termed "extended functional groups" (EFG), which is an extension of a set previously used by the CheckMol software and additionally covers heterocyclic compound classes and periodic table groups. The functional groups are defined as SMARTS patterns and are available as part of the ToxAlerts tool (http://ochem.eu/alerts) of the On-line CHEmical database and Modeling (OCHEM) environment platform. The article describes the motivation and the main ideas behind this extension and demonstrates that EFG can be efficiently used to develop and interpret structure-activity relationship models.

1,024 citations
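
Since the EFG definitions are SMARTS patterns, matching them against structures can be done with any cheminformatics toolkit that understands SMARTS. The sketch below uses RDKit (not part of the paper's OCHEM/ToxAlerts platform) and a generic carboxylic-acid SMARTS as the example pattern, not an actual EFG entry.

```python
# Sketch of matching a SMARTS-defined functional group against a structure
# with RDKit; the SMARTS below is a generic carboxylic-acid pattern used only
# for illustration, not an EFG definition from the paper.
from rdkit import Chem

carboxylic_acid = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

print(aspirin.HasSubstructMatch(carboxylic_acid))    # True
print(aspirin.GetSubstructMatches(carboxylic_acid))  # matched atom indices
```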


Journal ArticleDOI
Pierre Hirel
TL;DR: Atomsk is a unified program that allows one to generate, convert, and transform atomic systems for the purposes of ab initio calculations, classical atomistic simulations, or visualization, in the areas of computational physics and chemistry.

Journal ArticleDOI
TL;DR: In such nonstationary environments, where the probabilistic properties of the data change over time, a non-adaptive model trained under the false stationarity assumption is bound to become obsolete in time, and perform sub-optimally at best, or fail catastrophically at worst.
Abstract: The prevalence of mobile phones, the internet-of-things technology, and networks of sensors has led to an enormous and ever increasing amount of data that are now more commonly available in a streaming fashion [1]-[5]. Often, it is assumed - either implicitly or explicitly - that the process generating such a stream of data is stationary, that is, the data are drawn from a fixed, albeit unknown probability distribution. In many real-world scenarios, however, such an assumption is simply not true, and the underlying process generating the data stream is characterized by an intrinsic nonstationary (or evolving or drifting) phenomenon. The nonstationarity can be due, for example, to seasonality or periodicity effects, changes in the users' habits or preferences, hardware or software faults affecting a cyber-physical system, thermal drifts or aging effects in sensors. In such nonstationary environments, where the probabilistic properties of the data change over time, a non-adaptive model trained under the false stationarity assumption is bound to become obsolete in time, and perform sub-optimally at best, or fail catastrophically at worst.
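
As a toy illustration of why a fixed model degrades in a nonstationary stream, the sketch below monitors a data stream and flags windows whose mean departs from a reference window. This simple z-style check is only meant to make the drift idea concrete; it is not a method proposed in the article.

```python
# Toy drift monitor: flag stream windows whose mean departs from a reference
# window (deliberately simple; not a method from the article).
import numpy as np

def detect_mean_drift(stream, ref_size=200, win_size=200, threshold=4.0):
    ref = np.asarray(stream[:ref_size], dtype=float)
    alerts = []
    for end in range(ref_size + win_size, len(stream) + 1, win_size):
        win = np.asarray(stream[end - win_size:end], dtype=float)
        # standard error of the difference of the two window means
        se = np.sqrt(ref.var() / ref_size + win.var() / win_size)
        z = abs(win.mean() - ref.mean()) / (se + 1e-12)
        if z > threshold:
            alerts.append(end)
    return alerts

rng = np.random.default_rng(1)
stationary = rng.normal(0.0, 1.0, 1000)
drifted = rng.normal(1.5, 1.0, 1000)          # abrupt change in the generating distribution
print(detect_mean_drift(np.concatenate([stationary, drifted])))
```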

Proceedings ArticleDOI
17 Aug 2015
TL;DR: The authors built a centralized control mechanism based on a global configuration pushed to all datacenter switches; modular hardware design coupled with simple, robust software allowed the design to also support inter-cluster and wide-area networks.
Abstract: We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale networks. Second, much of the general, but complex, decentralized network routing and management protocols supporting arbitrary deployment scenarios were overkill for single-operator, pre-planned datacenter networks. We built a centralized control mechanism based on a global configuration pushed to all datacenter switches. Third, modular hardware design coupled with simple, robust software allowed our design to also support inter-cluster and wide-area networks. Our datacenter networks run at dozens of sites across the planet, scaling in capacity by 100x over ten years to more than 1Pbps of bisection bandwidth.
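
A rough sketch of the scaling arithmetic behind multi-stage Clos fabrics built from identical commodity switches, here for a two-stage leaf-spine layout; the radix, uplink count, and link speed are illustrative numbers, not the parameters of the networks described in the paper.

```python
# Toy sketch of leaf-spine (two-stage Clos) capacity arithmetic with identical
# commodity switches (numbers are illustrative only).
def leaf_spine_capacity(radix, uplinks_per_leaf, link_gbps):
    """Each leaf splits its `radix` ports between servers and spine-facing uplinks."""
    server_ports_per_leaf = radix - uplinks_per_leaf
    num_spines = uplinks_per_leaf              # one uplink per spine from each leaf
    num_leaves = radix                         # each spine port hosts one leaf
    servers = num_leaves * server_ports_per_leaf
    bisection_gbps = num_leaves * uplinks_per_leaf * link_gbps / 2
    oversubscription = server_ports_per_leaf / uplinks_per_leaf
    return servers, bisection_gbps, oversubscription

servers, bisection, oversub = leaf_spine_capacity(radix=64, uplinks_per_leaf=32, link_gbps=40)
print(f"{servers} servers, {bisection / 1000:.1f} Tb/s bisection, {oversub:.1f}:1 oversubscription")
```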

Journal ArticleDOI
TL;DR: The development and the present state of the “tps” series of software for use in geometric morphometrics on Windows-based computers are described; these programs have been used in hundreds of studies of mammals and other organisms.
Abstract: The development and the present state of the “tps” series of software for use in geometric morphometrics on Windows-based computers are described. These programs have been used in hundreds of studies in mammals and other organisms.

Posted Content
TL;DR: oTree is open-source, online software for implementing interactive experiments in the laboratory, online, in the field, or combinations thereof; www.oTree.org provides the source code, a library of standard game templates, and demo games that can be played by anyone.
Abstract: oTree is an open-source and online software for implementing interactive experiments in the laboratory, online, the field or combinations thereof. oTree does not require installation of software on subjects’ devices; it can run on any device that has a web browser, be that a desktop computer, a tablet or a smartphone. Deployment can be internet-based without a shared local network, or local-network-based even without internet access. For coding, Python is used, a popular, open-source programming language. www.oTree.org provides the source code, a library of standard game templates and demo games which can be played by anyone.
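
A minimal sketch of the models module of an oTree app, following the structure documented for oTree; exact field options differ between oTree versions, and the app and field names here are invented for illustration.

```python
# Minimal sketch of an oTree app's models module (classic format; the app and
# field names are made up, and details vary across oTree versions).
from otree.api import (
    models, BaseConstants, BaseSubsession, BaseGroup, BasePlayer,
)

class Constants(BaseConstants):
    name_in_url = 'guessing_game'
    players_per_group = 3
    num_rounds = 1

class Subsession(BaseSubsession):
    pass

class Group(BaseGroup):
    average_guess = models.FloatField()

class Player(BasePlayer):
    guess = models.IntegerField(min=0, max=100)
```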

Journal ArticleDOI
TL;DR: DIA-Umpire enables targeted extraction of quantitative information based on peptides initially identified in only a subset of the samples, resulting in more consistent quantification across multiple samples.
Abstract: As a result of recent improvements in mass spectrometry (MS), there is increased interest in data-independent acquisition (DIA) strategies in which all peptides are systematically fragmented using wide mass-isolation windows ('multiplex fragmentation'). DIA-Umpire (http://diaumpire.sourceforge.net/), a comprehensive computational workflow and open-source software for DIA data, detects precursor and fragment chromatographic features and assembles them into pseudo-tandem MS spectra. These spectra can be identified with conventional database-searching and protein-inference tools, allowing sensitive, untargeted analysis of DIA data without the need for a spectral library. Quantification is done with both precursor- and fragment-ion intensities. Furthermore, DIA-Umpire enables targeted extraction of quantitative information based on peptides initially identified in only a subset of the samples, resulting in more consistent quantification across multiple samples. We demonstrated the performance of the method with control samples of varying complexity and publicly available glycoproteomics and affinity purification-MS data.

01 Jan 2015
TL;DR: MNE-Python is an open-source software package that addresses this challenge by providing state-of-the-art algorithms implemented in Python that cover multiple methods of data preprocessing, source localization, statistical analysis, and estimation of functional connectivity between distributed brain regions.
Abstract: Magnetoencephalography and electroencephalography (M/EEG) measure the weak electromagnetic signals generated by neuronal activity in the brain. Using these signals to characterize and locate neural activation in the brain is a challenge that requires expertise in physics, signal processing, statistics, and numerical methods. As part of the MNE software suite, MNE-Python is an open-source software package that addresses this challenge by providing state-of-the-art algorithms implemented in Python that cover multiple methods of data preprocessing, source localization, statistical analysis, and estimation of functional connectivity between distributed brain regions. All algorithms and utility functions are implemented in a consistent manner with well-documented interfaces, enabling users to create M/EEG data analysis pipelines by writing Python scripts. Moreover, MNE-Python is tightly integrated with the core Python libraries for scientific computation (NumPy, SciPy) and visualization (matplotlib and Mayavi), as well as the greater neuroimaging ecosystem in Python via the Nibabel package. The code is provided under the new BSD license allowing code reuse, even in commercial products. Although MNE-Python has only been under heavy development for a couple of years, it has rapidly evolved with expanded analysis capabilities and pedagogical tutorials because multiple labs have collaborated during code development to help share best practices. MNE-Python also gives easy access to preprocessed datasets, helping users to get started quickly and facilitating reproducibility of methods by other researchers. Full documentation, including dozens of examples, is available at http://martinos.org/mne.
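
A minimal sketch of a basic MNE-Python preprocessing and epoching pipeline; the FIF file name is a placeholder (it is assumed to contain a stim channel), and the filter and epoch settings are generic example values.

```python
# Minimal sketch of an MNE-Python preprocessing/epoching pipeline
# (file name is a placeholder for an M/EEG recording in FIF format).
import mne

raw = mne.io.read_raw_fif("sample_raw.fif", preload=True)
raw.filter(l_freq=1.0, h_freq=40.0)                  # band-pass filter

events = mne.find_events(raw)                        # trigger events from the stim channel
epochs = mne.Epochs(raw, events, tmin=-0.2, tmax=0.5,
                    baseline=(None, 0), preload=True)
evoked = epochs.average()                            # evoked response
evoked.plot()
```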

Journal ArticleDOI
01 Feb 2015
TL;DR: Machine learning techniques have the ability to predict software fault proneness and can be used by software practitioners and researchers; however, the application of machine learning techniques in software fault prediction is still limited, and more studies should be carried out in order to obtain well-formed and generalizable results.
Abstract: Highlights: Reviews studies from 1991 to 2013 to assess the application of ML techniques for SFP. Identifies seven categories of ML techniques. Identifies 64 studies to answer the established research questions. Selects primary studies according to a quality assessment of the studies. The systematic literature review performs the following: summarizes ML techniques for SFP models; assesses the performance accuracy and capability of ML techniques for constructing SFP models; compares the ML and statistical techniques; compares the performance accuracy of different ML techniques; summarizes the strengths and weaknesses of the ML techniques; and provides future guidelines to software practitioners and researchers. Background: Software fault prediction is the process of developing models that can be used by software practitioners in the early phases of the software development life cycle for detecting faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults. Method: In this study we perform a systematic review of studies from January 1991 to October 2013 in the literature that use machine learning techniques for software fault prediction. We assess the performance capability of the machine learning techniques in existing research for software fault prediction. We also compare the performance of the machine learning techniques with the statistical techniques and with other machine learning techniques. Further, the strengths and weaknesses of the machine learning techniques are summarized. Results: In this paper we have identified 64 primary studies and seven categories of machine learning techniques. The results prove the prediction capability of the machine learning techniques for classifying modules/classes as fault prone or not fault prone. The models using the machine learning techniques for estimating software fault proneness outperform the traditional statistical models. Conclusion: Based on the results obtained from the systematic review, we conclude that machine learning techniques have the ability to predict software fault proneness and can be used by software practitioners and researchers. However, the application of machine learning techniques in software fault prediction is still limited, and more studies should be carried out in order to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work.
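
As a concrete (and heavily simplified) illustration of the kind of model the reviewed studies build, the sketch below trains a fault-proneness classifier on static code metrics with scikit-learn; the CSV file and column names are hypothetical placeholders, and the setup is not taken from any particular primary study.

```python
# Illustrative sketch (not from the review itself): training and evaluating a
# fault-proneness classifier on per-module code metrics with scikit-learn.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = pd.read_csv("module_metrics.csv")          # hypothetical file, one row per module
X = data.drop(columns=["defective"])              # e.g. LOC, cyclomatic complexity, coupling
y = data["defective"]                             # 1 = fault prone, 0 = not fault prone

clf = RandomForestClassifier(n_estimators=300, random_state=0)
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(f"10-fold mean AUC: {auc.mean():.3f}")
```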

Journal ArticleDOI
TL;DR: Practical guidelines for verification and validation of NMS models and simulations are established that researchers, clinicians, reviewers, and others can adopt to evaluate the accuracy and credibility of modeling studies.
Abstract: Computational modeling and simulation of neuromusculoskeletal (NMS) systems enables researchers and clinicians to study the complex dynamics underlying human and animal movement. NMS models use equations derived from physical laws and biology to help solve challenging real-world problems, from designing prosthetics that maximize running speed to developing exoskeletal devices that enable walking after a stroke. NMS modeling and simulation has proliferated in the biomechanics research community over the past 25 years, but the lack of verification and validation standards remains a major barrier to wider adoption and impact. The goal of this paper is to establish practical guidelines for verification and validation of NMS models and simulations that researchers, clinicians, reviewers, and others can adopt to evaluate the accuracy and credibility of modeling studies. In particular, we review a general process for verification and validation applied to NMS models and simulations, including careful formulation of a research question and methods, traditional verification and validation steps, and documentation and sharing of results for use and testing by other researchers. Modeling the NMS system and simulating its motion involves methods to represent neural control, musculoskeletal geometry, muscle-tendon dynamics, contact forces, and multibody dynamics. For each of these components, we review modeling choices and software verification guidelines; discuss variability, errors, uncertainty, and sensitivity relationships; and provide recommendations for verification and validation by comparing experimental data and testing robustness. We present a series of case studies to illustrate key principles. In closing, we discuss challenges the community must overcome to ensure that modeling and simulation are successfully used to solve the broad spectrum of problems that limit human mobility.

Proceedings ArticleDOI
09 Nov 2015
TL;DR: In this paper, a comparison of the main existing test input generation tools for Android apps is presented, based on four metrics: ease of use, ability to work on multiple platforms, code coverage, and ability to detect faults.
Abstract: Like all software, mobile applications ("apps") must be adequately tested to gain confidence that they behave correctly. Therefore, in recent years, researchers and practitioners alike have begun to investigate ways to automate app testing. In particular, because of Android's open source nature and its large share of the market, a great deal of research has been performed on input generation techniques for apps that run on the Android operating system. At this point in time, there are in fact a number of such techniques in the literature, which differ in the way they generate inputs, the strategy they use to explore the behavior of the app under test, and the specific heuristics they use. To better understand the strengths and weaknesses of these existing approaches, and to get general insight on ways they could be made more effective, in this paper we perform a thorough comparison of the main existing test input generation tools for Android. In our comparison, we evaluate the effectiveness of these tools, and their corresponding techniques, according to four metrics: ease of use, ability to work on multiple platforms, code coverage, and ability to detect faults. Our results provide a clear picture of the state of the art in input generation for Android apps and identify future research directions that, if suitably investigated, could lead to more effective and efficient testing tools for Android.

Journal ArticleDOI
TL;DR: An open platform using commodity vehicles and sensors is introduced to facilitate the development of autonomous vehicles and presents algorithms, software libraries, and datasets required for scene recognition, path planning, and vehicle control.
Abstract: Autonomous vehicles are an emerging application of automotive technology. They can recognize the scene, plan the path, and control the motion by themselves while interacting with drivers. Although they receive considerable attention, components of autonomous vehicles are not accessible to the public but instead are developed as proprietary assets. To facilitate the development of autonomous vehicles, this article introduces an open platform using commodity vehicles and sensors. Specifically, the authors present algorithms, software libraries, and datasets required for scene recognition, path planning, and vehicle control. This open platform allows researchers and developers to study the basis of autonomous vehicles, design new algorithms, and test their performance using the common interface.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the perceptual integration of a large variety of view points, consisting of integrated structural, hardware, and software innovations.
Abstract: We present an approach to capture the 3D structure and motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent, (2) subtle motion needs to be measured over a space large enough to host a social group, and (3) human appearance and configuration variation is immense. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the perceptual integration of a large variety of view points. We present a modularized system designed around this principle, consisting of integrated structural, hardware, and software innovations. The system takes, as input, 480 synchronized video streams of multiple people engaged in social activities, and produces, as output, the labeled time-varying 3D structure of anatomical landmarks on individuals in the space. The algorithmic contributions include a hierarchical approach for generating skeletal trajectory proposals, and an optimization framework for skeletal reconstruction with trajectory re-association.

Journal ArticleDOI
TL;DR: This work focuses on the computational aspects of super-resolution microscopy and presents a comprehensive evaluation of localization software packages, reflecting the various tradeoffs of SMLM software packages and helping users to choose the software that fits their needs.
Abstract: The quality of super-resolution images obtained by single-molecule localization microscopy (SMLM) depends largely on the software used to detect and accurately localize point sources. In this work, we focus on the computational aspects of super-resolution microscopy and present a comprehensive evaluation of localization software packages. Our philosophy is to evaluate each package as a whole, thus maintaining the integrity of the software. We prepared synthetic data that represent three-dimensional structures modeled after biological components, taking excitation parameters, noise sources, point-spread functions and pixelation into account. We then asked developers to run their software on our data; most responded favorably, allowing us to present a broad picture of the methods available. We evaluated their results using quantitative and user-interpretable criteria: detection rate, accuracy, quality of image reconstruction, resolution, software usability and computational resources. These metrics reflect the various tradeoffs of SMLM software packages and help users to choose the software that fits their needs.
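
A toy sketch of the sort of metrics used in such comparisons: match detections to ground-truth emitter positions within a tolerance, then report detection rate and localization RMSE. This simple nearest-neighbour matching is for illustration only and is not the evaluation pipeline used in the paper.

```python
# Toy evaluation of a set of localizations against ground truth
# (nearest-neighbour matching within a tolerance; illustrative only).
import numpy as np
from scipy.spatial import cKDTree

def evaluate_localizations(truth, detected, tol=30.0):
    """truth, detected: (N, 2) arrays of positions in nm; tol: match radius in nm."""
    tree = cKDTree(detected)
    dist, _ = tree.query(truth, distance_upper_bound=tol)   # inf where nothing lies within tol
    matched = np.isfinite(dist)
    detection_rate = matched.mean()
    rmse = np.sqrt(np.mean(dist[matched] ** 2)) if matched.any() else float("nan")
    return detection_rate, rmse

rng = np.random.default_rng(2)
truth = rng.uniform(0, 10_000, size=(100, 2))               # emitters in a 10 um field
detected = truth[:80] + rng.normal(0, 10, size=(80, 2))     # 80 % detected, ~10 nm localization error
print(evaluate_localizations(truth, detected))
```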

Book
04 Nov 2015
TL;DR: Evidence-Based Software Engineering and Systematic Reviews provides a clear introduction to the use of an evidence-based model for software engineering research and practice, explaining the roles of primary studies as elements of an over-arching evidence model, rather than as disjointed elements in the empirical spectrum.
Abstract: In the decade since the idea of adapting the evidence-based paradigm for software engineering was first proposed, it has become a major tool of empirical software engineering. Evidence-Based Software Engineering and Systematic Reviews provides a clear introduction to the use of an evidence-based model for software engineering research and practice. The book explains the roles of primary studies (experiments, surveys, case studies) as elements of an over-arching evidence model, rather than as disjointed elements in the empirical spectrum. Supplying readers with a clear understanding of empirical software engineering best practices, it provides up-to-date guidance on how to conduct secondary studies in software engineering, replacing the existing 2004 and 2007 technical reports. The book is divided into three parts. The first part discusses the nature of evidence and the evidence-based practices centered on a systematic review, both in general and as applying to software engineering. The second part examines the different elements that provide inputs to a systematic review (usually considered as forming a secondary study), especially the main forms of primary empirical study currently used in software engineering. The final part provides practical guidance on how to conduct systematic reviews (the guidelines), drawing together accumulated experiences to guide researchers and students in planning and conducting their own studies. The book includes an extensive glossary and an appendix that provides a catalogue of reviews that may be useful for practice and teaching.

Journal ArticleDOI
TL;DR: In this paper, the authors show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model.
Abstract: Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of multiple imputation may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing multiple imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.
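
The paper's compatible-imputation approach is distributed as Stata software; the Python sketch below only illustrates standard fully conditional specification (chained-equations) imputation with scikit-learn's IterativeImputer, not the substantive-model-compatible variant the authors propose.

```python
# Sketch of standard fully conditional specification (chained-equations)
# imputation; the paper's substantive-model-compatible variant is provided as
# Stata software and is not reproduced here.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
X[:, 2] = 0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)

X_missing = X.copy()
X_missing[rng.random(500) < 0.3, 0] = np.nan          # 30 % of covariate 0 missing at random

# One stochastic imputation; repeating with different random states would give
# the multiple imputations needed for Rubin's combining rules.
imputer = IterativeImputer(sample_posterior=True, random_state=0)
X_imputed = imputer.fit_transform(X_missing)
print(np.isnan(X_imputed).sum())                      # 0: all values filled in
```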

Journal ArticleDOI
TL;DR: Bonsai is described, a modular, high-performance, open-source visual programming framework for the acquisition and online processing of data streams and demonstrated how it allows for the rapid and flexible prototyping of integrated experimental designs in neuroscience.
Abstract: The design of modern scientific experiments requires the control and monitoring of many different data streams. However, the serial execution of programming instructions in a computer makes it a challenge to develop software that can deal with the asynchronous, parallel nature of scientific data. Here we present Bonsai, a modular, high-performance, open-source visual programming framework for the acquisition and online processing of data streams. We describe Bonsai's core principles and architecture and demonstrate how it allows for the rapid and flexible prototyping of integrated experimental designs in neuroscience. We specifically highlight some applications that require the combination of many different hardware and software components, including video tracking of behavior, electrophysiology and closed-loop control of stimulation.

Journal ArticleDOI
TL;DR: Practical issues that arise in the processing, management, translation, and analysis of textual data are discussed with a particular focus on how procedures differ across languages.
Abstract: Recent advances in research tools for the systematic analysis of textual data are enabling exciting new research throughout the social sciences. For comparative politics scholars, who are often interested in non-English and possibly multilingual textual datasets, these advances may be difficult to access. This article discusses practical issues that arise in the processing, management, translation, and analysis of textual data with a particular focus on how procedures differ across languages. These procedures are combined in two applied examples of automated text analysis using the recently introduced Structural Topic Model. We also show how the model can be used to analyze data that have been translated into a single language via machine translation tools. All the methods we describe here are implemented in open-source software packages available from the authors.

Book
25 Nov 2015
TL;DR: How to do Linguistics with R: Data exploration and statistical analysis is unique in its scope, as it covers a wide range of classical and cutting-edge statistical methods, including different flavours of regression analysis and ANOVA, random forests and conditional inference trees, as well as specific linguistic approaches.
Abstract: This book provides a linguist with a statistical toolkit for exploration and analysis of linguistic data. It employs R, a free software environment for statistical computing, which is increasingly popular among linguists. How to do Linguistics with R: Data exploration and statistical analysis is unique in its scope, as it covers a wide range of classical and cutting-edge statistical methods, including different flavours of regression analysis and ANOVA, random forests and conditional inference trees, as well as specific linguistic approaches, among which are Behavioural Profiles, Vector Space Models and various measures of association between words and constructions. The statistical topics are presented comprehensively, but without too much technical detail, and illustrated with linguistic case studies that answer non-trivial research questions. The book also demonstrates how to visualize linguistic data with the help of attractive informative graphs, including the popular ggplot2 system and Google visualization tools.

Journal ArticleDOI
TL;DR: Tackling software data issues, including redundancy, correlation, feature irrelevance and missing samples, with the proposed combined learning model resulted in remarkable classification performance paving the way for successful quality control.
Abstract: Context: Several issues hinder software defect data, including redundancy, correlation, feature irrelevance and missing samples. It is also hard to ensure balanced distribution between data pertaining to defective and non-defective software. In most experimental cases, data related to the latter software class is dominantly present in the dataset. Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on the performance of defect classification. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy. Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on the defect classification performance. Results: Forward selection showed that only a few features contribute to high area under the receiver-operating curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson’s correlation. This suggests that features are highly unstable. However, ensemble learners like random forests and the proposed algorithm, average probability ensemble (APE), are not as affected by poor features as in the case of weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 for the NASA datasets: PC2, PC4, and MC1. Conclusion: This paper shows that features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
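
An illustrative sketch of greedy forward feature selection driven by cross-validated AUC, in the spirit of the GFS step described above; the APE ensemble itself is not reproduced here, and a random forest on synthetic imbalanced data stands in as the learner.

```python
# Illustrative greedy forward selection by cross-validated AUC (not the
# paper's APE ensemble; a random forest stands in as the learner).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)   # imbalanced, like defect data

selected, remaining, best_auc = [], list(range(X.shape[1])), 0.0
while remaining:
    scores = {}
    for f in remaining:
        cols = selected + [f]
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        scores[f] = cross_val_score(clf, X[:, cols], y, cv=5, scoring="roc_auc").mean()
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_auc + 1e-3:     # stop when AUC no longer improves
        break
    best_auc = scores[f_best]
    selected.append(f_best)
    remaining.remove(f_best)

print("selected features:", selected, "AUC:", round(best_auc, 3))
```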

Journal ArticleDOI
TL;DR: In this paper, the authors surveyed 122 peer-reviewed journal articles and conference proceedings that used UGC as a data source and investigated the scope of the tourism and hospitality issues that are addressed using available UGC; the methods that have been applied to UGC data to achieve research objectives; and the software that has been used to collect UGC and extract information from large data sets.
Abstract: The rapid growth of information generated by consumers of tourism and hospitality services calls for a systematic review of how user-generated content (UGC) has been applied in tourism and hospitality research. This study surveyed 122 peer-reviewed journal articles and conference proceedings that used UGC as a data source. The study investigates (a) the scope of the tourism and hospitality issues that are addressed using available UGC; (b) the methods that have been applied to UGC data to achieve research objectives; and (c) the software that has been used to collect UGC and extract information from large UGC data sets. The study also presents the emerging topics and challenges in UGC research.