
Showing papers on "Software" published in 2016


Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
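To make the "hierarchy of concepts" idea concrete, here is a minimal sketch of a deep feedforward network in NumPy; the layer sizes, random weights, and ReLU choice are illustrative stand-ins, not taken from the book.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def feedforward(x, weights):
    """Each layer re-represents its input, building more abstract
    features out of simpler ones; stacking layers makes the graph deep."""
    for W in weights[:-1]:
        x = relu(W @ x)
    return weights[-1] @ x  # final linear readout

# illustrative 3-layer network with random weights
rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)),
           rng.normal(size=(16, 16)),
           rng.normal(size=(4, 16))]
print(feedforward(rng.normal(size=8), weights).shape)  # (4,)
```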

38,208 citations


Journal ArticleDOI
TL;DR: The latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine, has been optimized for use on 64-bit computing systems for analyzing larger datasets.
Abstract: We present the latest version of the Molecular Evolutionary Genetics Analysis (Mega) software, which contains many sophisticated methods and tools for phylogenomics and phylomedicine. In this major upgrade, Mega has been optimized for use on 64-bit computing systems for analyzing larger datasets. Researchers can now explore and analyze tens of thousands of sequences in Mega. The new version also provides an advanced wizard for building timetrees and includes a new functionality to automatically predict gene duplication events in gene family trees. The 64-bit Mega is made available in two interfaces: graphical and command line. The graphical user interface (GUI) is a native Microsoft Windows application that can also be used on Mac OS X. The command line Mega is available as native applications for Windows, Linux, and Mac OS X; these are intended for use in high-throughput and scripted analysis. Both versions are available from www.megasoftware.net free of charge.

33,048 citations


Journal ArticleDOI
TL;DR: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses that includes the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, and new output formats to facilitate interoperability with downstream software.
Abstract: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many new methods and features. These include the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, new output formats to facilitate interoperability with downstream software, and many new models of molecular evolution. PartitionFinder 2 is freely available under an open source license and works on Windows, OSX, and Linux operating systems. It can be downloaded from www.robertlanfear.com/partitionfinder. The source code is available at https://github.com/brettc/partitionfinder.
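As a rough illustration of the "best-fit" selection at the heart of such tools, the sketch below picks a substitution model by information criterion; PartitionFinder's actual search over partitioning schemes is far more elaborate, and the model names and likelihood scores here are hypothetical.

```python
import numpy as np

def pick_by_bic(models, log_likelihoods, n_params, n_sites):
    """Select the model minimizing BIC = k*ln(n) - 2*ln(L)."""
    bic = [k * np.log(n_sites) - 2.0 * ll
           for ll, k in zip(log_likelihoods, n_params)]
    return models[int(np.argmin(bic))]

# hypothetical scores for three substitution models on a 1200-site alignment
print(pick_by_bic(["JC", "HKY", "GTR+G"],
                  [-5400.0, -5250.0, -5240.0],
                  [0, 4, 9], 1200))  # -> "HKY" under these toy numbers
```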

3,445 citations


Journal ArticleDOI
TL;DR: The computer program LOBSTER (Local Orbital Basis Suite Towards Electronic‐Structure Reconstruction) enables chemical‐bonding analysis based on periodic plane‐wave density‐functional theory output and is applicable to a wide range of first‐principles simulations in solid‐state and materials chemistry.
Abstract: The computer program LOBSTER (Local Orbital Basis Suite Towards Electronic-Structure Reconstruction) enables chemical-bonding analysis based on periodic plane-wave (PAW) density-functional theory (DFT) output and is applicable to a wide range of first-principles simulations in solid-state and materials chemistry. LOBSTER incorporates analytic projection routines described previously in this very journal [J. Comput. Chem. 2013, 34, 2557] and offers improved functionality. It calculates, among others, atom-projected densities of states (pDOS), projected crystal orbital Hamilton population (pCOHP) curves, and the recently introduced bond-weighted distribution function (BWDF). The software is offered free-of-charge for non-commercial research. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

1,531 citations


Journal ArticleDOI
TL;DR: A comprehensive overview of a broad spectrum of fault localization techniques, each of which aims to streamline the fault localization process and make it more effective by attacking the problem in a unique way is provided.
Abstract: Software fault localization, the act of identifying the locations of faults in a program, is widely recognized to be one of the most tedious, time-consuming, and expensive – yet equally critical – activities in program debugging. Due to the increasing scale and complexity of software today, manually locating faults when failures occur is rapidly becoming infeasible, and consequently, there is a strong demand for techniques that can guide software developers to the locations of faults in a program with minimal human intervention. This demand in turn has fueled the proposal and development of a broad spectrum of fault localization techniques, each of which aims to streamline the fault localization process and make it more effective by attacking the problem in a unique way. In this article, we catalog and provide a comprehensive overview of such techniques and discuss key issues and concerns that are pertinent to software fault localization as a whole.
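One widely studied family covered by such surveys is spectrum-based fault localization, which ranks program entities by coverage statistics from passing and failing tests. Here is a minimal sketch using the well-known Ochiai formula; the coverage data is invented for illustration.

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Suspiciousness of a statement from test coverage counts.

    failed_cov / passed_cov: number of failing / passing tests covering it.
    """
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0

# hypothetical coverage: statement -> (#failing tests, #passing tests) covering it
coverage = {"s1": (3, 10), "s2": (3, 1), "s3": (0, 8)}
ranked = sorted(coverage, key=lambda s: -ochiai(*coverage[s], total_failed=3))
print(ranked)  # s2 first: executed by all failing but few passing tests
```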

822 citations


Journal ArticleDOI
TL;DR: The Assistat software has been used in many papers published in journals and is functional and efficient for analyzing experimental data from agricultural research.

Abstract: Statistical programs are essential tools for those who deal with scientific research and need to analyze experimental data. In agriculture, there are often uncontrolled factors, which make statistical analysis of the data necessary. The Assistat software, version 7.7, is one of these tools, and this study aimed to demonstrate its functionality and efficiency in the analysis of experimental data from agricultural research and to evaluate its acceptance. To exemplify its use, data from agricultural experiments were analyzed using analysis-of-variance models for randomized block and factorial experiments. In addition, regression analysis was used within the analysis of variance for quantitative treatments. It was concluded that the software has been used in many papers published in journals and that it is functional and efficient in the analysis of experimental data from agricultural research. Key words: Analysis of variance (ANOVA), statistical software, Tukey's test.
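For readers who want to reproduce this kind of analysis outside Assistat, a randomized-block ANOVA takes a few lines in Python with statsmodels; the yields below are made-up example data.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# made-up yields for 3 treatments in 4 randomized blocks
df = pd.DataFrame({
    "treatment": ["A", "B", "C"] * 4,
    "block": [b for b in "1234" for _ in range(3)],
    "yield_": [5.1, 6.0, 6.8, 4.9, 6.2, 7.1, 5.3, 5.9, 6.6, 5.0, 6.1, 7.0],
})
model = smf.ols("yield_ ~ C(treatment) + C(block)", data=df).fit()
print(anova_lm(model))  # F-tests for treatment and block effects
```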

753 citations


Posted Content
TL;DR: OpenAI Gym is a toolkit for reinforcement learning research that includes a growing collection of benchmark problems exposing a common interface, and a website where people can share their results and compare the performance of algorithms.
Abstract: OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
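The "common interface" is the familiar reset/step loop; a minimal episode with a random policy on one of the standard benchmark environments looks like this (API as of the 2016-era releases).

```python
import gym

env = gym.make("CartPole-v0")
obs = env.reset()                                  # initial observation
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()             # random policy
    obs, reward, done, info = env.step(action)     # advance one timestep
    total_reward += reward
print(total_reward)
```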

690 citations


Journal ArticleDOI
TL;DR: Review of selected capabilities of HYDRUS implemented since 2008. New standard and nonstandard specialized add-on modules significantly expanded the capabilities of the software.
Abstract: The HYDRUS-1D and HYDRUS (2D/3D) computer software packages are widely used finite-element models for simulating the one- and two- or three-dimensional movement of water, heat, and multiple solutes in variably saturated media, respectively. In 2008, Simůnek et al. (2008b) described the entire history of the development of the various HYDRUS programs and related models and tools such as STANMOD, RETC, ROSETTA, UNSODA, UNSATCHEM, HP1, and others. The objective of this manuscript is to review selected capabilities of HYDRUS that have been implemented since 2008. Our review is not limited to listing additional processes that were implemented in the standard computational modules, but also describes many new standard and nonstandard specialized add-on modules that significantly expanded the capabilities of the two software packages. We also review additional capabilities that have been incorporated into the graphical user interface (GUI) that supports the use of HYDRUS (2D/3D). Another objective of this manuscript is to review selected applications of the HYDRUS models such as evaluation of various irrigation schemes, evaluation of the effects of plant water uptake on groundwater recharge, assessing the transport of particle-like substances in the subsurface, and using the models in conjunction with various geophysical methods.
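For context, the variably saturated water flow that HYDRUS-1D simulates is governed by a form of the Richards equation, shown here in its one-dimensional mixed form with a root-water-uptake sink (notation follows common usage, not necessarily the manual's):

```latex
\frac{\partial \theta(h)}{\partial t} =
\frac{\partial}{\partial z}\!\left[ K(h)\!\left( \frac{\partial h}{\partial z} + 1 \right)\right] - S(h)
```

where θ is the volumetric water content, h the pressure head, K(h) the unsaturated hydraulic conductivity, z the vertical coordinate, and S(h) a sink term for plant water uptake.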

661 citations


Journal ArticleDOI
TL;DR: oTree is open-source, online software for implementing interactive experiments in the laboratory, online, in the field, or combinations thereof; it can run on any device that has a web browser, be that a desktop computer, a tablet, or a smartphone.
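As a flavor of how experiments are defined, below is a schematic models.py for an oTree app; the class skeleton follows oTree's documented API, but the app name and the `guess` field are invented for illustration.

```python
# schematic oTree models.py; app name and field are hypothetical
from otree.api import (
    models, BaseConstants, BaseSubsession, BaseGroup, BasePlayer,
)

class Constants(BaseConstants):
    name_in_url = 'guess_game'   # hypothetical app
    players_per_group = None
    num_rounds = 1

class Subsession(BaseSubsession):
    pass

class Group(BaseGroup):
    pass

class Player(BasePlayer):
    # a number each participant enters in the browser
    guess = models.IntegerField(min=0, max=100)
```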

627 citations



01 Jan 2016
TL;DR: Two-dimensional phase unwrapping: theory, algorithms, and software.

Journal ArticleDOI
TL;DR: RevBayes is a new open-source software package based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models; it outperforms competing software for several standard analyses while requiring users to explicitly specify each part of the model and analysis.
Abstract: Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; graphical models; MCMC; statistical phylogenetics]

Journal ArticleDOI
TL;DR: This work exemplarily applies destiny, an efficient R implementation of the diffusion map algorithm, to a recent time-resolved mass cytometry dataset of cellular reprogramming and presents an efficient nearest-neighbour approximation.
Abstract: Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell specific noise model allowing for missing and censored values. In contrast to previous implementations, we further present an efficient nearest-neighbour approximation that allows for the processing of hundreds of thousands of cells and a functionality for projecting new data on existing diffusion maps. We exemplarily apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming. Availability and implementation: destiny is an open-source R/Bioconductor package (bioconductor.org/packages/destiny), also available at www.helmholtz-muenchen.de/icb/destiny. A detailed vignette describing functions and workflows is provided with the package. Contact: carsten.marr@helmholtz-muenchen.de or f.buettner@helmholtz-muenchen.de. Supplementary information: Supplementary data are available at Bioinformatics online.
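The core of the diffusion map algorithm is short enough to sketch in NumPy: build a Gaussian kernel, row-normalize it into a Markov transition matrix, and take the leading nontrivial eigenvectors. This is a plain textbook version; destiny adds a single-cell noise model and nearest-neighbour approximations on top.

```python
import numpy as np

def diffusion_map(X, sigma=1.0, n_components=2):
    """Plain diffusion map: X is (cells x genes)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))       # Gaussian kernel
    P = K / K.sum(axis=1, keepdims=True)             # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    keep = order[1:n_components + 1]                 # skip the trivial eigenvector
    return vecs.real[:, keep] * vals.real[keep]      # diffusion components
```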

Proceedings Article
01 Jan 2016
TL;DR: Sanctum offers the same promise as Intel’s Software Guard Extensions (SGX), namely strong provable isolation of software modules running concurrently and sharing resources, but protects against an important class of additional software attacks that infer private information from a program's memory access patterns.
Abstract: Sanctum offers the same promise as Intel’s Software Guard Extensions (SGX), namely strong provable isolation of software modules running concurrently and sharing resources, but protects against an important class of additional software attacks that infer private information from a program’s memory access patterns. Sanctum shuns unnecessary complexity, leading to a simpler security analysis. We follow a principled approach to eliminating entire attack surfaces through isolation, rather than plugging attack-specific privacy leaks. Most of Sanctum’s logic is implemented in trusted software, which does not perform cryptographic operations using keys, and is easier to analyze than SGX’s opaque microcode, which does. Our prototype targets a Rocket RISC-V core, an open implementation that allows any researcher to reason about its security properties. Sanctum’s extensions can be adapted to other processor cores, because we do not change any major CPU building block. Instead, we add hardware at the interfaces between generic building blocks, without impacting cycle time. Sanctum demonstrates that strong software isolation is achievable with a surprisingly small set of minimally invasive hardware changes, and a very reasonable overhead.

Journal ArticleDOI
TL;DR: TSCAN is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis and quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods.
Abstract: When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package.
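A toy version of the cluster-then-MST ordering conveys the idea; TSCAN's actual pseudotime projects each cell onto the tree edges, which this sketch approximates with a cluster-rank sort, and the cluster count is an assumption.

```python
import numpy as np
from collections import deque
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def toy_pseudotime_order(X, n_clusters=5):
    """Order cells (rows of X) along an MST over cluster centers."""
    km = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
    centers = km.cluster_centers_
    mst = minimum_spanning_tree(cdist(centers, centers)).toarray()
    adj = (mst + mst.T) > 0                     # undirected MST adjacency
    # breadth-first walk over clusters starting from cluster 0
    order, seen, queue = [], {0}, deque([0])
    while queue:
        c = queue.popleft()
        order.append(c)
        for nb in np.flatnonzero(adj[c]):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    rank = np.array([order.index(c) for c in km.labels_])
    dist_to_center = np.linalg.norm(X - centers[km.labels_], axis=1)
    return np.lexsort((dist_to_center, rank))   # cell indices in pseudotime order
```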

Journal ArticleDOI
TL;DR: OpenMS 2.0 is presented, a robust, open-source, cross-platform software specifically designed for the flexible and reproducible analysis of high-throughput MS data.
Abstract: High-resolution mass spectrometry (MS) has become an important tool in the life sciences, contributing to the diagnosis and understanding of human diseases, elucidating biomolecular structural information and characterizing cellular signaling networks. However, the rapid growth in the volume and complexity of MS data makes transparent, accurate and reproducible analysis difficult. We present OpenMS 2.0 (http://www.openms.de), a robust, open-source, cross-platform software specifically designed for the flexible and reproducible analysis of high-throughput MS data. The extensible OpenMS software implements common mass spectrometric data processing tasks through a well-defined application programming interface in C++ and Python and through standardized open data formats. OpenMS additionally provides a set of 185 tools and ready-made workflows for common mass spectrometric data processing tasks, which enable users to perform complex quantitative mass spectrometric analyses with ease.
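Through the Python bindings, a typical first step looks like the following; the file name is a placeholder, and exact calls should be checked against the pyOpenMS documentation.

```python
import pyopenms

exp = pyopenms.MSExperiment()
pyopenms.MzMLFile().load("example.mzML", exp)   # hypothetical input file
print(exp.getNrSpectra())                       # number of spectra loaded
```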

Proceedings ArticleDOI
14 May 2016
TL;DR: Angelix is a novel semantics-based repair method that scales up to programs of similar size as are handled by search-based repair tools such as GenProg and SPR, and is more scalable than previously proposed semantics based repair methods such as SemFix and DirectFix.
Abstract: Since debugging is a time-consuming activity, automated program repair tools such as GenProg have garnered interest. A recent study revealed that the majority of GenProg repairs avoid bugs simply by deleting functionality. We found that SPR, a state-of-the-art repair tool proposed in 2015, still deletes functionality in their many "plausible" repairs. Unlike generate-and-validate systems such as GenProg and SPR, semantic analysis based repair techniques synthesize a repair based on semantic information of the program. While such semantics-based repair methods show promise in terms of quality of generated repairs, their scalability has been a concern so far. In this paper, we present Angelix, a novel semantics-based repair method that scales up to programs of similar size as are handled by search-based repair tools such as GenProg and SPR. This shows that Angelix is more scalable than previously proposed semantics based repair methods such as SemFix and DirectFix. Furthermore, our repair method can repair multiple buggy locations that are dependent on each other. Such repairs are hard to achieve using SPR and GenProg. In our experiments, Angelix generated repairs from large-scale real-world software such as wireshark and php, and these generated repairs include multi-location repairs. We also report our experience in automatically repairing the well-known Heartbleed vulnerability.
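To see what "generate-and-validate" means in the systems Angelix is contrasted with, here is a toy repair loop; the buggy predicate, candidate patches, and tests are all invented. Semantics-based methods like Angelix instead extract constraints from the program (e.g. via symbolic execution) and synthesize an expression satisfying them, rather than enumerating patches.

```python
def repair(candidates, tests):
    """Return the first candidate passing every test (a "plausible" repair)."""
    for candidate in candidates:                   # produced by mutation/search
        if all(test(candidate) for test in tests):
            return candidate
    return None

buggy = lambda x: x >= 0                           # bug: should be strictly positive
candidates = [lambda x: x > 0, lambda x: x >= 0, lambda x: x < 0]
tests = [lambda p: p(1) is True, lambda p: p(0) is False]
fixed = repair(candidates, tests)
print(fixed(0), fixed(1))                          # False True
```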

Journal ArticleDOI
TL;DR: CoSMoMVPA is a lightweight MVPA (MVP analysis) toolbox implemented in the intersection of the Matlab and GNU Octave languages, that treats both fMRI and M/EEG data as first-class citizens.
Abstract: Recent years have seen an increase in the popularity of multivariate pattern (MVP) analysis of functional magnetic resonance (fMRI) data, and, to a much lesser extent, magneto- and electro-encephalography (M/EEG) data. We present CoSMoMVPA, a lightweight MVPA (MVP analysis) toolbox implemented in the intersection of the Matlab and GNU Octave languages, that treats both fMRI and M/EEG data as first-class citizens. CoSMoMVPA supports all state-of-the-art MVP analysis techniques, including searchlight analyses, classification, correlations, representational similarity analysis, and the time generalization method. These can be used to address both data-driven and hypothesis-driven questions about neural organization and representations, both within and across: space, time, frequency bands, neuroimaging modalities, individuals, and species. It uses a uniform data representation of fMRI data in the volume or on the surface, and of M/EEG data at the sensor and source level. Through various external toolboxes, it directly supports reading and writing a variety of fMRI and M/EEG neuroimaging formats, and, where applicable, can convert between them. As a result, it can be integrated readily in existing pipelines and used with existing preprocessed datasets. CoSMoMVPA overloads the traditional volumetric searchlight concept to support neighborhoods for M/EEG and surface-based fMRI data, which supports localization of multivariate effects of interest across space, time, and frequency dimensions. CoSMoMVPA also provides a generalized approach to multiple comparison correction across these dimensions using Threshold-Free Cluster Enhancement with state-of-the-art clustering and permutation techniques. CoSMoMVPA is highly modular and uses abstractions to provide a uniform interface for a variety of MVP measures. Typical analyses require a few lines of code, making it accessible to beginner users. At the same time, expert programmers can easily extend its functionality. CoSMoMVPA comes with extensive documentation, including a variety of runnable demonstration scripts and analysis exercises (with example data and solutions). It uses best software engineering practices including version control, distributed development, an automated test suite, and continuous integration testing. It can be used with the proprietary Matlab and the free GNU Octave software, and it complies with open source distribution platforms such as NeuroDebian. CoSMoMVPA is Free/Open Source Software under the permissive MIT license. Website: cosmomvpa.org
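As a rough illustration of the searchlight idea central to such toolboxes, the Python sketch below scores a user-supplied classifier in a sphere around each feature; the scoring function, coordinates, and radius are assumptions, and CoSMoMVPA's own neighborhoods generalize this across time and frequency as well.

```python
import numpy as np

def searchlight(X, y, coords, radius, score_fn):
    """X: (samples x features); coords: (features x 3) feature positions.

    Assigns each feature the score of a classifier evaluated on the
    features inside a sphere centered on it (e.g. CV accuracy)."""
    scores = np.zeros(len(coords))
    for i, center in enumerate(coords):
        sphere = np.linalg.norm(coords - center, axis=1) <= radius
        scores[i] = score_fn(X[:, sphere], y)
    return scores
```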

Journal ArticleDOI
TL;DR: This paper presents Combenefit, a new free software tool that enables the visualization, analysis and quantification of drug combination effects in terms of synergy and/or antagonism, and provides laboratory scientists with an easy and systematic way to analyze their data.
Abstract: Motivation: Many drug combinations are routinely assessed to identify synergistic interactions in the attempt to develop novel treatment strategies. Appropriate software is required to analyze the results of these studies. Results: We present Combenefit, a new free software tool that enables the visualization, analysis and quantification of drug combination effects in terms of synergy and/or antagonism. Data from combination assays can be processed using classical synergy models (Loewe, Bliss, HSA), as single experiments or in batch for High Throughput Screens. This user-friendly tool provides laboratory scientists with an easy and systematic way to analyze their data. The companion package provides bioinformaticians with critical implementations of routines enabling the processing of combination data. Availability and Implementation: Combenefit is provided as a Matlab package but also as standalone software for Windows (http://sourceforge.net/projects/combenefit/). Contact: Giovanni.DiVeroli@cruk.cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
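Of the classical models named above, Bliss independence is the simplest to state: the expected fractional effect of a combination is E_A + E_B - E_A*E_B, and deviation from it indicates synergy or antagonism. A minimal sketch, with effects expressed as fractions in [0, 1]:

```python
def bliss_excess(e_a, e_b, e_ab_observed):
    """Positive excess over the Bliss expectation suggests synergy,
    negative suggests antagonism (all effects as fractions in [0, 1])."""
    expected = e_a + e_b - e_a * e_b
    return e_ab_observed - expected

print(bliss_excess(0.30, 0.40, 0.70))  # 0.12 excess -> synergy in this toy case
```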

Journal ArticleDOI
TL;DR: Augmented reality applications that support blended learning in medical training have gained public and scientific interest, but the literature to date offers little evidence that they validly support such training.

Abstract: Background Computer-based applications are increasingly used to support the training of medical professionals. Augmented reality applications (ARAs) render an interactive virtual layer on top of reality. The use of ARAs is of real interest to medical education because they blend digital elements with the physical learning environment. This will result in new educational opportunities. The aim of this systematic review is to investigate to what extent augmented reality applications are currently used to validly support medical professionals' training.

Journal ArticleDOI
TL;DR: This article reports on the evolution of the Toolkit over the last ten years, focusing on the expansion of the tool repertoire (e.g. CS-BLAST, HHblits) and on infrastructural work needed to remain operative in a changing web environment.
Abstract: The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers a wide array of interconnected, state-of-the-art bioinformatics tools to experts and non-experts alike, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released 10 years ago, the current production-level release has been available since 2008 and has serviced more than 1.6 million external user queries. The usage of the Toolkit has continued to increase linearly over the years, reaching more than 400 000 queries in 2015. In fact, through the breadth of its tools and their tight interconnection, the Toolkit has become an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatic inquiry to students in the life sciences. In this article, we report on the evolution of the Toolkit over the last ten years, focusing on the expansion of the tool repertoire (e.g. CS-BLAST, HHblits) and on infrastructural work needed to remain operative in a changing web environment.

Journal ArticleDOI
TL;DR: The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments.
Abstract: The majority of transcriptome sequencing (RNA-seq) expression studies in plants remain underutilized and inaccessible due to the use of disparate transcriptome references and the lack of skills and resources to analyze and visualize these data. We have developed expVIP, an expression visualization and integration platform, which allows easy analysis of RNA-seq data combined with an intuitive and interactive interface. Users can analyze public and user-specified data sets with minimal bioinformatics knowledge using the expVIP virtual machine. This generates a custom Web browser to visualize, sort, and filter the RNA-seq data and provides outputs for differential gene expression analysis. We demonstrate expVIP’s suitability for polyploid crops and evaluate its performance across a range of biologically relevant scenarios. To exemplify its use in crop research, we developed a flexible wheat (Triticum aestivum) expression browser (www.wheat-expression.com) that can be expanded with user-generated data in a local virtual machine environment. The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments.

Journal ArticleDOI
TL;DR: The all-in-one approach to processing point-disordered structures, a powerful command line interface, excellent performance, flexibility, and the GNU GPL license make the supercell program a versatile set of tools for manipulating disordered structures.
Abstract: Disordered compounds are crucially important for fundamental science and industrial applications. Yet most available methods to explore solid-state material properties require ideal periodicity, which, strictly speaking, does not exist in this type of material. The supercell approximation is a way to impose periodicity on disordered systems while preserving "disordered" properties at the local level. Although this approach is very common, most of the reported research still uses supercells that are constructed "by hand" and ad hoc. This paper describes a software package named supercell, which has been designed to facilitate the construction of structural models for the description of vacancy or substitution defects in otherwise periodically ordered (crystalline) materials. The presented software makes it possible to apply the supercell approximation systematically, with an all-in-one implementation of algorithms for structure manipulation, supercell generation, permutations of atoms and vacancies, charge balancing, detection of symmetry-equivalent structures, Coulomb energy calculations and sampling of output configurations. The mathematical and physical backgrounds of the program are presented, along with an explanation of the main algorithms and relevant technical details of their implementation. Practical applications of the program to different types of solid-state materials are given to illustrate some of its potential fields of application. Comparisons of the various algorithms implemented within supercell with similar solutions are presented where possible. The all-in-one approach to processing point-disordered structures, a powerful command line interface, excellent performance, flexibility and the GNU GPL license make the supercell program a versatile set of tools for manipulating disordered structures.
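The combinatorial heart of such tools, enumerating which equivalent sites carry a substituent or vacancy, can be sketched in a few lines; real supercell runs additionally prune symmetry-equivalent configurations using the crystal's space group, which this toy version omits, and the species labels are placeholders.

```python
from itertools import combinations

def site_configurations(n_sites, n_substituted, host="A", guest="B"):
    """Yield all placements of `guest` atoms (or vacancies) on `n_sites`."""
    for occupied in combinations(range(n_sites), n_substituted):
        config = [host] * n_sites
        for i in occupied:
            config[i] = guest
        yield tuple(config)

# 2 substitutions on 4 sites -> C(4,2) = 6 raw configurations
print(sum(1 for _ in site_configurations(4, 2)))
```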

Journal ArticleDOI
TL;DR: OpenSWATH, SWATH 2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from sequential window acquisition of all theoretical fragment-ion spectra, provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics.
Abstract: LFQbench, a software tool to assess the quality of label-free quantitative proteomics analyses, enables developers to benchmark and improve analytic methods.

Journal ArticleDOI
TL;DR: The new version, EDGAR 2.0, provides a quick and user-friendly survey of evolutionary relationships between microbial genomes and simplifies the process of obtaining new biological insights into their differential gene content.
Abstract: The rapidly increasing availability of microbial genome sequences has led to a growing demand for bioinformatics software tools that support the functional analysis based on the comparison of closely related genomes. By utilizing comparative approaches on gene level it is possible to gain insights into the core genes which represent the set of shared features for a set of organisms under study. Vice versa singleton genes can be identified to elucidate the specific properties of an individual genome. Since initial publication, the EDGAR platform has become one of the most established software tools in the field of comparative genomics. Over the last years, the software has been continuously improved and a large number of new analysis features have been added. For the new version, EDGAR 2.0, the gene orthology estimation approach was newly designed and completely re-implemented. Among other new features, EDGAR 2.0 provides extended phylogenetic analysis features like AAI (Average Amino Acid Identity) and ANI (Average Nucleotide Identity) matrices, genome set size statistics and modernized visualizations like interactive synteny plots or Venn diagrams. Thereby, the software supports a quick and user-friendly survey of evolutionary relationships between microbial genomes and simplifies the process of obtaining new biological insights into their differential gene content. All features are offered to the scientific community via a web-based and therefore platform-independent user interface, which allows easy browsing of precomputed datasets. The web server is accessible at http://edgar.computational.bio.
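The core-gene and singleton computations at the center of such comparisons reduce to set operations on ortholog families; below is a minimal sketch with hypothetical genome names and family identifiers (EDGAR's actual ortholog estimation is, of course, far more involved).

```python
def core_and_singletons(gene_sets):
    """gene_sets: dict mapping genome name -> set of ortholog-family ids."""
    core = set.intersection(*gene_sets.values())
    singletons = {
        genome: families - set.union(*(f for g, f in gene_sets.items() if g != genome))
        for genome, families in gene_sets.items()
    }
    return core, singletons

genomes = {"strainA": {1, 2, 3, 4}, "strainB": {1, 2, 5}, "strainC": {1, 3, 6}}
core, singles = core_and_singletons(genomes)
print(core)                # {1}
print(singles["strainB"])  # {5}
```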

Book
01 Apr 2016
TL;DR: This book gives an overview of modern data visualization methods, both in theory and practice, and details modern graphical tools such as mosaic plots, parallel coordinate plots, and linked views.
Abstract: Visualizing the data is an essential part of any data analysis. Modern computing developments have led to big improvements in graphic capabilities and there are many new possibilities for data displays. This book gives an overview of modern data visualization methods, both in theory and practice. It details modern graphical tools such as mosaic plots, parallel coordinate plots, and linked views. Coverage also examines graphical methodology for particular areas of statistics, for example Bayesian analysis, genomic data and cluster analysis, as well as software for graphics.
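A parallel coordinate plot, one of the tools the book details, takes only a few lines to draw by hand; the data here are random stand-ins, and per-variable scaling to [0, 1] is one of several reasonable conventions.

```python
import numpy as np
import matplotlib.pyplot as plt

def parallel_coordinates(data, labels):
    """One polyline per row across vertical axes, one axis per variable."""
    scaled = (data - data.min(0)) / (data.max(0) - data.min(0))  # per-variable [0, 1]
    xs = np.arange(data.shape[1])
    for row in scaled:
        plt.plot(xs, row, alpha=0.4)
    plt.xticks(xs, labels)
    plt.show()

rng = np.random.default_rng(1)
parallel_coordinates(rng.normal(size=(40, 4)), ["v1", "v2", "v3", "v4"])
```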

Journal ArticleDOI
TL;DR: An enhanced version of the genome association and prediction integrated tool (GAPIT) implements new statistical methods for genome-wide association studies and genomic prediction and adds functions for experimental design, phenotype simulation, power analysis, and cross-validation.
Abstract: Most human diseases and agriculturally important traits are complex. Dissecting their genetic architecture requires continued development of innovative and powerful statistical methods. Corresponding advances in computing tools are critical to efficiently use these statistical innovations and to enhance and accelerate biomedical and agricultural research and applications. The genome association and prediction integrated tool (GAPIT) was first released in 2012 and became widely used for genome-wide association studies (GWAS) and genomic prediction. The GAPIT implemented computationally efficient statistical methods, including the compressed mixed linear model (CMLM) and genomic prediction by using genomic best linear unbiased prediction (gBLUP). New state-of-the-art statistical methods have now been implemented in a new, enhanced version of GAPIT. These methods include factored spectrally transformed linear mixed models (FaST-LMM), enriched CMLM (ECMLM), FaST-LMM-Select, and settlement of mixed linear models under progressively exclusive relationship (SUPER). The genomic prediction methods implemented in this new release of the GAPIT include gBLUP based on CMLM, ECMLM, and SUPER. Additionally, the GAPIT was updated to improve its existing output display features and to add new data display and evaluation functions, including new graphing options and capabilities, phenotype simulation, power analysis, and cross-validation. These enhancements make the GAPIT a valuable resource for determining appropriate experimental designs and performing GWAS and genomic prediction. The enhanced R-based GAPIT software package uses state-of-the-art methods to conduct GWAS and genomic prediction. The GAPIT also provides new functions for developing experimental designs and creating publication-ready tabular summaries and graphs to improve the efficiency and application of genomic research.
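The compressed mixed linear model and gBLUP mentioned above are both built on the standard mixed model for association mapping; in common notation (not necessarily GAPIT's own),

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}\sigma_u^2),
\quad
\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}\sigma_e^2),
```

where y holds phenotypes, β the fixed effects (including the tested marker), u random polygenic effects structured by the kinship matrix K, and e the residuals; gBLUP predicts u for unphenotyped individuals from the same model.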

Journal ArticleDOI
TL;DR: The software tandem DCC and CircTest uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data; the benefits of this approach are demonstrated on previously reported age-dependent circRNAs in the fruit fly.
Abstract: Motivation: Circular RNAs (circRNAs) are a poorly characterized class of molecules that were identified decades ago. Emerging high-throughput sequencing methods as well as first reports on confirmed functions have sparked new interest in this RNA species. However, computational detection and quantification tools are still limited. Results: We developed the software tandem DCC and CircTest. DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates. We assessed the detection performance of DCC on a newly generated mouse brain data set and publicly available sequencing data. Our software achieves a much higher precision than state-of-the-art competitors at similar sensitivity levels. Moreover, DCC estimates circRNA versus host gene expression from counting junction and non-junction reads. These read counts are finally used to test for host gene-independence of circRNA expression across different experimental conditions by our R package CircTest. We demonstrate the benefits of this approach on previously reported age-dependent circRNAs in the fruit fly. Availability and implementation: The source code of DCC and CircTest is licensed under the GNU General Public Licence (GPL) version 3 and available from https://github.com/dieterich-lab/[DCC or CircTest]. Contact: christoph.dieterich@age.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.
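The "host gene-independence" test boils down to asking whether the ratio of circular junction reads to linear host reads shifts between conditions. CircTest models the counts more carefully, but a simpler chi-square stand-in, with invented counts, conveys the idea.

```python
from scipy.stats import chi2_contingency

# invented counts: [circular junction reads, linear host reads] per condition
table = [[30, 970],   # condition A
         [80, 920]]   # condition B
chi2, p, dof, expected = chi2_contingency(table)
print(p)  # small p: the circ/host ratio changes between conditions
```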

Proceedings ArticleDOI
Xin Ye, Hui Shen, Xiao Ma, Razvan Bunescu, Chang Liu
14 May 2016
TL;DR: This paper proposes bridging the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space and shows that the learned vector space embeddings lead to improvements in a previously explored bug localization task and a newly introduced task of linking API documents to computer programming questions.
Abstract: The application of information retrieval techniques to search tasks in software engineering is made difficult by the lexical gap between search queries, usually expressed in natural language (e.g. English), and retrieved documents, usually expressed in code (e.g. programming languages). This is often the case in bug and feature location, community question answering, or more generally the communication between technical personnel and non-technical stakeholders in a software project. In this paper, we propose bridging the lexical gap by projecting natural language statements and code snippets as meaning vectors in a shared representation space. In the proposed architecture, word embeddings are first trained on API documents, tutorials, and reference documents, and then aggregated in order to estimate semantic similarities between documents. Empirical evaluations show that the learned vector space embeddings lead to improvements in a previously explored bug localization task and a newly defined task of linking API documents to computer programming questions.
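The aggregation step can be sketched simply: average word embeddings into document vectors and rank by cosine similarity. The paper learns a more refined aggregation; the random toy embeddings and vocabulary below are placeholders.

```python
import numpy as np

def doc_vector(tokens, embeddings, dim=50):
    """Average the embeddings of known tokens into one document vector."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# placeholder embeddings shared by query words and code-document words
rng = np.random.default_rng(2)
emb = {w: rng.normal(size=50) for w in ["open", "file", "read", "socket"]}
query = doc_vector(["read", "file"], emb)
doc = doc_vector(["open", "file"], emb)
print(cosine(query, doc))   # similarity used to rank candidate documents
```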

Proceedings ArticleDOI
15 Oct 2016
TL;DR: Graphicionado augments the vertex programming paradigm, allowing different graph analytics applications to be mapped to the same accelerator framework, while maintaining flexibility through a small set of reconfigurable blocks, for high-performance, energy-efficient processing of graph analytics workloads.
Abstract: Graphs are one of the key data structures for many real-world computing applications and the importance of graph analytics is ever-growing. While existing software graph processing frameworks improve programmability of graph analytics, underlying general purpose processors still limit the performance and energy efficiency of graph analytics. We architect a domain-specific accelerator, Graphicionado, for high-performance, energy-efficient processing of graph analytics workloads. For efficient graph analytics processing, Graphicionado exploits not only data structure-centric datapath specialization, but also memory subsystem specialization, all the while taking advantage of the parallelism inherent in this domain. Graphicionado augments the vertex programming paradigm, allowing different graph analytics applications to be mapped to the same accelerator framework, while maintaining flexibility through a small set of reconfigurable blocks. This paper describes Graphicionado pipeline design choices in detail and gives insights on how Graphicionado combats application execution inefficiencies on general-purpose CPUs. Our results show that Graphicionado achieves a 1.76–6.54× speedup while consuming 50–100× less energy compared to a state-of-the-art software graph analytics processing framework executing 32 threads on a 16-core Haswell Xeon processor.
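The vertex programming paradigm that Graphicionado accelerates can be illustrated with one PageRank-style iteration: gather contributions along edges, then apply an update at every vertex. The graph, damping factor, and field names in this sketch are illustrative, not Graphicionado's own interface.

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    rank: float
    out_degree: int

def pagerank_iteration(vertices, edges, d=0.85):
    """One vertex-program step: edge-parallel gather, then vertex-parallel apply."""
    gathered = {v: 0.0 for v in vertices}
    for src, dst in edges:                            # gather/scatter phase
        gathered[dst] += vertices[src].rank / vertices[src].out_degree
    for v, acc in gathered.items():                   # apply phase
        vertices[v].rank = (1.0 - d) + d * acc

verts = {0: Vertex(1.0, 2), 1: Vertex(1.0, 1), 2: Vertex(1.0, 1)}
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
pagerank_iteration(verts, edges)
print({v: round(x.rank, 3) for v, x in verts.items()})
```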