Showing papers by "Helsinki Institute for Information Technology" published in 2017


Journal ArticleDOI
TL;DR: In this paper, leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are used to estimate pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values.
Abstract: Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.

1,533 citations
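
As a rough illustration of the quantity being estimated, the following NumPy sketch computes the plain importance-sampling LOO estimate from a matrix of pointwise log-likelihood values; the Pareto smoothing of the largest importance weights, which is the paper's actual contribution, and the packaged implementation in the loo R package are omitted here, and the array shapes and toy data are assumptions.

```python
import numpy as np

def is_loo_elpd(log_lik):
    """Plain importance-sampling LOO (no Pareto smoothing).

    log_lik: array of shape (S, N) with log p(y_i | theta_s) evaluated at
    S posterior draws for each of the N observations.
    Returns the pointwise elpd_loo estimates (length N).
    """
    # With the full posterior as proposal, the IS estimate of the LOO
    # predictive density is the harmonic mean of p(y_i | theta_s) over draws.
    S = log_lik.shape[0]
    neg = -log_lik
    m = neg.max(axis=0)
    # log( (1/S) * sum_s exp(-log_lik) ), computed stably via log-sum-exp
    log_mean_inv = m + np.log(np.exp(neg - m).sum(axis=0)) - np.log(S)
    return -log_mean_inv  # elpd_loo,i

# Toy usage: 4000 fake posterior draws, 50 observations
rng = np.random.default_rng(0)
log_lik = rng.normal(loc=-1.0, scale=0.3, size=(4000, 50))
print(is_loo_elpd(log_lik).sum())  # total elpd_loo estimate
```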


Journal ArticleDOI
TL;DR: This article ties gamification to service marketing theory, which conceptualizes the consumer as a co-producer of the service, and proposes a definition of gamification that emphasizes its experiential nature.
Abstract: “Gamification” has gained considerable scholarly and practitioner attention; however, the discussion in academia has been largely confined to the human–computer interaction and game studies domains. Since gamification is often used in service design, it is important that the concept be brought in line with the service literature. So far, though, there has been a dearth of such literature. This article is an attempt to tie in gamification with service marketing theory, which conceptualizes the consumer as a co-producer of the service. It presents games as service systems composed of operant and operand resources. It proposes a definition for gamification, one that emphasizes its experiential nature. The definition highlights four important aspects of gamification: affordances, psychological mediators, goals of gamification and the context of gamification. Using the definition the article identifies four possible gamifying actors and examines gamification as communicative staging of the service environment.

585 citations


Journal ArticleDOI
TL;DR: This work systematically studies the visual analytics and visualization literature to investigate how analysts interact with automatic DR techniques, and proposes a “human in the loop” process model that provides a general lens for the evaluation of visual interactive DR systems.
Abstract: Dimensionality Reduction (DR) is a core building block in visualizing multidimensional data. For DR techniques to be useful in exploratory data analysis, they need to be adapted to human needs and domain-specific problems, ideally, interactively, and on-the-fly. Many visual analytics systems have already demonstrated the benefits of tightly integrating DR with interactive visualizations. Nevertheless, a general, structured understanding of this integration is missing. To address this, we systematically studied the visual analytics and visualization literature to investigate how analysts interact with automatic DR techniques. The results reveal seven common interaction scenarios that are amenable to interactive control such as specifying algorithmic constraints, selecting relevant features, or choosing among several DR algorithms. We investigate specific implementations of visual analysis systems integrating DR, and analyze ways that other machine learning methods have been combined with DR. Summarizing the results in a “human in the loop” process model provides a general lens for the evaluation of visual interactive DR systems. We apply the proposed model to study and classify several systems previously described in the literature, and to derive future research opportunities.

228 citations


Journal ArticleDOI
TL;DR: The regularized horseshoe prior, introduced in this paper as a generalization of the horseshoe prior, can be viewed as the continuous counterpart of the spike-and-slab prior with a finite slab width and allows a minimum level of regularization to be specified for the largest coefficients.
Abstract: The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage hyperparameter based on the prior information about the degree of sparsity in the parameter vector. Second, the horseshoe prior has the undesired property that there is no possibility of specifying separately information about sparsity and the amount of regularization for the largest coefficients, which can be problematic with weakly identified parameters, such as the logistic regression coefficients in the case of data separation. This paper proposes solutions to both of these problems. We introduce a concept of effective number of nonzero parameters, show an intuitive way of formulating the prior for the global hyperparameter based on the sparsity assumptions, and argue that the previous default choices are dubious based on their tendency to favor solutions with more unshrunk parameters than we typically expect a priori. Moreover, we introduce a generalization to the horseshoe prior, called the regularized horseshoe, that allows us to specify a minimum level of regularization to the largest values. We show that the new prior can be considered as the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe resembles the spike-and-slab with an infinitely wide slab. Numerical experiments on synthetic and real world data illustrate the benefit of both of these theoretical advances.

227 citations
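
For reference, the hierarchical form of the regularized horseshoe described above can be written, up to notational details that should be checked against the paper, as

```latex
\beta_j \mid \lambda_j, \tau, c \;\sim\; \mathcal{N}\!\left(0,\, \tau^2 \tilde{\lambda}_j^2\right),
\qquad
\tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2},
\qquad
\lambda_j \sim \mathrm{C}^{+}(0, 1),
```

where c is the slab width: when τ²λ_j² is small relative to c² the prior behaves like the original horseshoe, and when τ²λ_j² is large the prior on β_j approaches N(0, c²), which is the minimum level of regularization applied to the largest coefficients.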


Journal ArticleDOI
TL;DR: The study demonstrates that model selection can greatly benefit from using cross-validation outside the search process, both for guiding the choice of model size and for assessing the predictive performance of the finally selected model.
Abstract: The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation (CV) score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. This can also lead to substantial selection induced bias and optimism in the performance evaluation for the selected model. From a predictive viewpoint, best results are obtained by accounting for model uncertainty by forming the full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information of the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on CV-score. Overall, the projection method appears to outperform also the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.

207 citations
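
A schematic Python sketch of the recommendation above, namely running cross-validation outside the search so that the held-out fold never influences variable selection; the search_subset, fit and score helpers are placeholders rather than code from the paper.

```python
import numpy as np

def outer_cv_selection(X, y, search_subset, fit, score, n_folds=10, seed=0):
    """Run the model-search procedure inside each outer CV fold.

    search_subset(X_tr, y_tr) -> selected feature indices (placeholder)
    fit(X_tr, y_tr)           -> fitted model object (placeholder)
    score(model, X_te, y_te)  -> predictive utility on held-out data (placeholder)
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    utilities, sizes = [], []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # The search sees only the training part of this fold ...
        selected = search_subset(X[train], y[train])
        model = fit(X[train][:, selected], y[train])
        # ... and the held-out fold is used only for assessment.
        utilities.append(score(model, X[test][:, selected], y[test]))
        sizes.append(len(selected))
    return np.mean(utilities), np.mean(sizes)
```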


Journal ArticleDOI
TL;DR: This paper showed that a reference panel of 1,000 individuals from the target population is adequate for a GWAS cohort of up to 10,000 individuals, whereas smaller panels, such as those from the 1000 Genomes Project, should be avoided.
Abstract: During the past few years, various novel statistical methods have been developed for fine-mapping with the use of summary statistics from genome-wide association studies (GWASs). Although these approaches require information about the linkage disequilibrium (LD) between variants, there has not been a comprehensive evaluation of how estimation of the LD structure from reference genotype panels performs in comparison with that from the original individual-level GWAS data. Using population genotype data from Finland and the UK Biobank, we show here that a reference panel of 1,000 individuals from the target population is adequate for a GWAS cohort of up to 10,000 individuals, whereas smaller panels, such as those from the 1000 Genomes Project, should be avoided. We also show, both theoretically and empirically, that the size of the reference panel needs to scale with the GWAS sample size; this has important consequences for the application of these methods in ongoing GWAS meta-analyses and large biobank studies. We conclude by providing software tools and by recommending practices for sharing LD information to more efficiently exploit summary statistics in genetics research.

174 citations


Journal ArticleDOI
TL;DR: A concept of effective number of nonzero parameters is introduced, an intuitive way of formulating the prior for the global hyperparameter based on the sparsity assumptions is shown, and the previous default choices are argued to be dubious based on their tendency to favor solutions with more unshrunk parameters than the authors typically expect a priori.
Abstract: The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage hyperparameter based on the prior information about the degree of sparsity in the parameter vector. Second, the horseshoe prior has the undesired property that there is no possibility of specifying separately information about sparsity and the amount of regularization for the largest coefficients, which can be problematic with weakly identified parameters, such as the logistic regression coefficients in the case of data separation. This paper proposes solutions to both of these problems. We introduce a concept of effective number of nonzero parameters, show an intuitive way of formulating the prior for the global hyperparameter based on the sparsity assumptions, and argue that the previous default choices are dubious based on their tendency to favor solutions with more unshrunk parameters than we typically expect a priori. Moreover, we introduce a generalization to the horseshoe prior, called the regularized horseshoe, that allows us to specify a minimum level of regularization to the largest values. We show that the new prior can be considered as the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe resembles the spike-and-slab with an infinitely wide slab. Numerical experiments on synthetic and real world data illustrate the benefit of both of these theoretical advances.

151 citations


Journal ArticleDOI
TL;DR: The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial, and the achieved high rates of correct candidates in the Top 1 and Top 10, despite large candidate numbers, open up great possibilities for high-throughput annotation of untargeted analysis for “known unknowns”.
Abstract: The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest ( www.casmi-contest.org ) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and small molecule annotation/identification. The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in “Category 2: Best Automatic Structural Identification—In Silico Fragmentation Only”, won by Team Brouard with 41% challenge wins. The winner of “Category 3: Best Automatic Structural Identification—Full Information” was Team Kind (MS-FINDER), with 76% challenge wins. The best methods were able to achieve over 30% Top 1 ranks in Category 2, with all methods ranking the correct candidate in the Top 10 in around 50% of challenges. This success rate rose to 70% Top 1 ranks in Category 3, with candidates in the Top 10 in over 80% of the challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways. The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. The achieved high rates of correct candidates in the Top 1 and Top 10, despite large candidate numbers, open up great possibilities for high-throughput annotation of untargeted analysis for “known unknowns”. As more high quality training data becomes available, the improvements in machine learning methods will likely continue, but the alternative approaches still provide valuable complementary information. Improved integration of experimental context will also improve identification success further for “real life” annotations. The true “unknown unknowns” remain to be evaluated in future CASMI contests.

131 citations


Proceedings Article
10 Apr 2017
TL;DR: It is proved that the method estimates the sources for general smooth mixing nonlinearities, assuming the sources have sufficiently strong temporal dependencies, and these dependencies are in a certain way different from dependencies found in Gaussian processes.
Abstract: We develop a nonlinear generalization of independent component analysis (ICA) or blind source separation, based on temporal dependencies (e.g. autocorrelations). We introduce a nonlinear generative model where the independent sources are assumed to be temporally dependent, non-Gaussian, and stationary, and we observe arbitrarily nonlinear mixtures of them. We develop a method for estimating the model (i.e. separating the sources) based on logistic regression in a neural network which learns to discriminate between a short temporal window of the data vs. a temporal window of temporally permuted data. We prove that the method estimates the sources for general smooth mixing nonlinearities, assuming the sources have sufficiently strong temporal dependencies, and these dependencies are in a certain way different from dependencies found in Gaussian processes. For Gaussian (and similar) sources, the method estimates the nonlinear part of the mixing. We thus provide the first rigorous and general proof of identifiability of nonlinear ICA for temporally dependent sources, together with a practical method for its estimation.

130 citations
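
A toy sketch of the discrimination task described above: windows of the observed time series are contrasted against windows whose time indices have been permuted, and a classifier is trained to tell them apart. The paper trains a neural network whose hidden representation recovers the sources; a plain logistic regression stands in for it here, so this only illustrates the data construction, and the toy sources and mixing are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_contrastive_pairs(x, rng):
    """x: observed mixture time series, shape (T, D).
    Positive examples: consecutive pairs (x_t, x_{t-1}).
    Negative examples: (x_t, x_{t*}) with t* drawn at random (time-permuted).
    """
    T = x.shape[0]
    pos = np.hstack([x[1:], x[:-1]])                       # real temporal pairs
    perm = rng.integers(0, T - 1, size=T - 1)
    neg = np.hstack([x[1:], x[perm]])                      # permuted pairs
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(T - 1), np.zeros(T - 1)])  # 1 = real, 0 = permuted
    return X, y

rng = np.random.default_rng(0)
T, D = 5000, 3
sources = np.cumsum(rng.normal(size=(T, D)), axis=0)       # temporally dependent toy sources
mix = np.tanh(sources @ rng.normal(size=(D, D)))           # arbitrary nonlinear mixture
X, y = make_contrastive_pairs(mix, rng)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("discrimination accuracy:", clf.score(X, y))
```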


Journal ArticleDOI
TL;DR: It is shown that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes, suggesting negative frequency-dependent selection drives post-vaccination population restructuring.
Abstract: Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions.

127 citations


Journal ArticleDOI
TL;DR: Patients with KIT exon 11 deletion mutations benefit most from the longer duration of adjuvant imatinib, whereas no significant benefit from the 3-year treatment was found in the other mutational subgroups examined.
Abstract: IMPORTANCE: Little is known about whether the duration of adjuvant imatinib influences the prognostic significance of KIT proto-oncogene receptor tyrosine kinase (KIT) and platelet-derived growth factor receptor α (PDGFRA) mutations. OBJECTIVE: To investigate the effect of KIT and PDGFRA mutations on recurrence-free survival (RFS) in patients with gastrointestinal stromal tumors (GISTs) treated with surgery and adjuvant imatinib. DESIGN, SETTING, AND PARTICIPANTS: This exploratory study is based on the Scandinavian Sarcoma Group VIII/Arbeitsgemeinschaft Internistische Onkologie (SSGXVIII/AIO) multicenter clinical trial. Between February 4, 2004, and September 29, 2008, 400 patients who had undergone surgery for GISTs with a high risk of recurrence were randomized to receive adjuvant imatinib for 1 or 3 years. Of the 397 patients who provided consent, 341 (85.9%) had centrally confirmed, localized GISTs with mutation analysis for KIT and PDGFRA performed centrally using conventional sequencing. During a median follow-up of 88 months (completed December 31, 2013), 142 patients had GIST recurrence. Data of the evaluable population were analyzed February 4, 2004, through December 31, 2013. MAIN OUTCOMES AND MEASURES: The main outcome was RFS. Mutations were grouped by the gene and exon. KIT exon 11 mutations were further grouped as deletion or insertion-deletion mutations, substitution mutations, insertion or duplication mutations, and mutations that involved codons 557 and/or 558. RESULTS: Of the 341 patients (175 men and 166 women; median age at study entry, 62 years in the 1-year group and 60 years in the 3-year group), 274 (80.4%) had GISTs with a KIT mutation, 43 (12.6%) had GISTs that harbored a PDGFRA mutation, and 24 (7.0%) had GISTs that were wild type for these genes. PDGFRA mutations and KIT exon 11 insertion or duplication mutations were associated with favorable RFS, whereas KIT exon 9 mutations were associated with unfavorable outcome. Patients with KIT exon 11 deletion or insertion-deletion mutation had better RFS when allocated to the 3-year group compared with the 1-year group (5-year RFS, 71.0% vs 41.3%; P < .001), whereas no significant benefit from the 3-year treatment was found in the other mutational subgroups examined. KIT exon 11 deletion mutations, deletions that involved codons 557 and/or 558, and deletions that led to pTrp557-Lys558del were associated with poor RFS in the 1-year group but not in the 3-year group. Similarly, in the subset with KIT exon 11 deletion mutations, higher-than-the-median mitotic counts were associated with unfavorable RFS in the 1-year group but not in the 3-year group. CONCLUSIONS AND RELEVANCE: Patients with KIT exon 11 deletion mutations benefit most from the longer duration of adjuvant imatinib. The duration of adjuvant imatinib modifies the risk of GIST recurrence associated with some KIT mutations, including deletions that affect exon 11 codons 557 and/or 558.

Journal ArticleDOI
TL;DR: A novel algorithm called fastGEAR is introduced which identifies lineages in diverse microbial alignments, and recombinations between them and from external origins, and provides insight into recombinations affecting deep branches of the phylogenetic tree.
Abstract: Prokaryotic evolution is affected by horizontal transfer of genetic material through recombination. Inference of an evolutionary tree of bacteria thus relies on accurate identification of the population genetic structure and recombination-derived mosaicism. Rapidly growing databases represent a challenge for computational methods to detect recombinations in bacterial genomes. We introduce a novel algorithm called fastGEAR which identifies lineages in diverse microbial alignments, and recombinations between them and from external origins. The algorithm detects both recent recombinations (affecting a few isolates) and ancestral recombinations between detected lineages (affecting entire lineages), thus providing insight into recombinations affecting deep branches of the phylogenetic tree. In simulations, fastGEAR had comparable power to detect recent recombinations and outstanding power to detect the ancestral ones, compared with state-of-the-art methods, often with a fraction of computational cost. We demonstrate the utility of the method by analyzing a collection of 616 whole-genomes of a recombinogenic pathogen Streptococcus pneumoniae, for which the method provided a high-resolution view of recombination across the genome. We examined in detail the penicillin-binding genes across the Streptococcus genus, demonstrating previously undetected genetic exchanges between different species at these three loci. Hence, fastGEAR can be readily applied to investigate mosaicism in bacterial genes across multiple species. Finally, fastGEAR correctly identified many known recombination hotspots and pointed to potential new ones. Matlab code and Linux/Windows executables are available at https://users.ics.aalto.fi/~pemartti/fastGEAR/ (last accessed February 6, 2017).

Journal ArticleDOI
TL;DR: In this paper, the Hessian matrix at the initial and final state minima is evaluated beforehand and used as input to the minimum energy path calculation, thereby improving stability and reducing the number of iterations needed for convergence.
Abstract: Minimum energy paths for transitions such as atomic and/or spin rearrangements in thermalized systems are the transition paths of largest statistical weight. Such paths are frequently calculated using the nudged elastic band method, where an initial path is iteratively shifted to the nearest minimum energy path. The computational effort can be large, especially when ab initio or electron density functional calculations are used to evaluate the energy and atomic forces. Here, we show how the number of such evaluations can be reduced by an order of magnitude using a Gaussian process regression approach where an approximate energy surface is generated and refined in each iteration. When the goal is to evaluate the transition rate within harmonic transition state theory, the evaluation of the Hessian matrix at the initial and final state minima can be carried out beforehand and used as input in the minimum energy path calculation, thereby improving stability and reducing the number of iterations needed for convergence. A Gaussian process model also provides an uncertainty estimate for the approximate energy surface, and this can be used to focus the calculations on the lesser-known part of the path, thereby reducing the number of needed energy and force evaluations to a half in the present calculations. The methodology is illustrated using the two-dimensional Muller-Brown potential surface and performance assessed on an established benchmark involving 13 rearrangement transitions of a heptamer island on a solid surface.
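
A bare-bones sketch of the active-learning idea described above: a Gaussian process surrogate of the energy surface is fit to the points evaluated so far, and its predictive uncertainty picks where the next expensive energy evaluation should go. The kernel, hyperparameters and toy energy function are placeholders, not the paper's nudged-elastic-band implementation.

```python
import numpy as np

def rbf(A, B, lengthscale=0.5, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.diag(rbf(X_test, X_test)) - (v ** 2).sum(0)
    return mean, var

def toy_energy(x):                       # stand-in for an expensive energy evaluation
    return np.sin(3 * x[..., 0]) + x[..., 1] ** 2

# Path images between two minima; evaluate a few, let the GP suggest the next one.
path = np.linspace([-1.0, 0.0], [1.0, 0.0], 20)
evaluated = [0, 9, 19]                   # endpoints plus one interior image
X_tr = path[evaluated]
mean, var = gp_posterior(X_tr, toy_energy(X_tr), path)
next_image = int(np.argmax(var))         # least-known point along the path
print("evaluate image", next_image, "next")
```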

Journal ArticleDOI
TL;DR: A systematic computational-experimental framework for the prediction and pre-clinical verification of drug-target interactions, using a well-established kernel-based regression algorithm as the prediction model, is introduced; the results demonstrate that the kernel-based modeling approach offers practical benefits for probing novel insights into the mode of action of investigational compounds and for identifying new target selectivities for drug repurposing applications.
Abstract: Due to relatively high costs and labor required for experimental profiling of the full target space of chemical compounds, various machine learning models have been proposed as cost-effective means to advance this process in terms of predicting the most potent compound-target interactions for subsequent verification. However, most of the model predictions lack direct experimental validation in the laboratory, making their practical benefits for drug discovery or repurposing applications largely unknown. Here, we therefore introduce and carefully test a systematic computational-experimental framework for the prediction and pre-clinical verification of drug-target interactions using a well-established kernel-based regression algorithm as the prediction model. To evaluate its performance, we first predicted unmeasured binding affinities in a large-scale kinase inhibitor profiling study, and then experimentally tested 100 compound-kinase pairs. The relatively high correlation of 0.77 (p < 0.0001) between the predicted and measured bioactivities supports the potential of the model for filling the experimental gaps in existing compound-target interaction maps. Further, we subjected the model to a more challenging task of predicting target interactions for such a new candidate drug compound that lacks prior binding profile information. As a specific case study, we used tivozanib, an investigational VEGF receptor inhibitor with currently unknown off-target profile. Among 7 kinases with high predicted affinity, we experimentally validated 4 new off-targets of tivozanib, namely the Src-family kinases FRK and FYN A, the non-receptor tyrosine kinase ABL1, and the serine/threonine kinase SLK. Our sub-sequent experimental validation protocol effectively avoids any possible information leakage between the training and validation data, and therefore enables rigorous model validation for practical applications. These results demonstrate that the kernel-based modeling approach offers practical benefits for probing novel insights into the mode of action of investigational compounds, and for the identification of new target selectivities for drug repurposing applications.
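
The abstract identifies the prediction model only as a well-established kernel-based regression algorithm; as a generic illustration of that family, the sketch below applies plain kernel ridge regression to placeholder compound descriptors and affinities (all data and parameter values are assumptions, not the study's).

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(K, y, lam=1.0):
    """Dual coefficients of kernel ridge regression: (K + lam*I)^-1 y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 64))     # placeholder compound-kinase descriptors
y_train = rng.normal(size=200)           # placeholder binding affinities
X_new = rng.normal(size=(5, 64))         # new compound-kinase pairs to predict

alpha = kernel_ridge_fit(rbf_kernel(X_train, X_train), y_train, lam=0.5)
predictions = rbf_kernel(X_new, X_train) @ alpha
print(predictions)
```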

Journal ArticleDOI
TL;DR: The evaluation of the Hessian matrix at the initial and final state minima can be carried out beforehand and used as input in the minimum energy path calculation, thereby improving stability and reducing the number of iterations needed for convergence.
Abstract: Minimum energy paths for transitions such as atomic and/or spin rearrangements in thermalized systems are the transition paths of largest statistical weight. Such paths are frequently calculated using the nudged elastic band method, where an initial path is iteratively shifted to the nearest minimum energy path. The computational effort can be large, especially when ab initio or electron density functional calculations are used to evaluate the energy and atomic forces. Here, we show how the number of such evaluations can be reduced by an order of magnitude using a Gaussian process regression approach where an approximate energy surface is generated and refined in each iteration. When the goal is to evaluate the transition rate within harmonic transition state theory, the evaluation of the Hessian matrix at the initial and final state minima can be carried out beforehand and used as input in the minimum energy path calculation, thereby improving stability and reducing the number of iterations needed for convergence. A Gaussian process model also provides an uncertainty estimate for the approximate energy surface, and this can be used to focus the calculations on the lesser-known part of the path, thereby reducing the number of needed energy and force evaluations to a half in the present calculations. The methodology is illustrated using the two-dimensional Muller-Brown potential surface and performance assessed on an established benchmark involving 13 rearrangement transitions of a heptamer island on a solid surface.

Journal ArticleDOI
TL;DR: A 'big data compacting and data fusion' concept is used to capture diverse adverse outcomes at the cellular and organismal levels and to predict unanticipated harmful effects of chemicals and drug molecules.
Abstract: Predicting unanticipated harmful effects of chemicals and drug molecules is a difficult and costly task. Here we utilize a 'big data compacting and data fusion'-concept to capture diverse adverse outcomes on cellular and organismal levels. The approach generates from transcriptomics data set a 'predictive toxicogenomics space' (PTGS) tool composed of 1,331 genes distributed over 14 overlapping cytotoxicity-related gene space components. Involving ∼2.5 × 10⁸ data points and 1,300 compounds to construct and validate the PTGS, the tool serves to: explain dose-dependent cytotoxicity effects, provide a virtual cytotoxicity probability estimate intrinsic to omics data, predict chemically-induced pathological states in liver resulting from repeated dosing of rats, and furthermore, predict human drug-induced liver injury (DILI) from hepatocyte experiments. Analysing 68 DILI-annotated drugs, the PTGS tool outperforms and complements existing tests, leading to a hereto-unseen level of DILI prediction accuracy.

Journal ArticleDOI
TL;DR: A highly geographically clustered genetic structure in Finland is revealed and its connections to the settlement history as well as to the current dialectal regions of the Finnish language are reported.
Abstract: Coupling dense genotype data with new computational methods offers unprecedented opportunities for individual-level ancestry estimation once geographically precisely defined reference data sets become available. We study such a reference data set for Finland containing 2376 such individuals from the FINRISK Study survey of 1997 both of whose parents were born close to each other. This sampling strategy focuses on the population structure present in Finland before the 1950s. By using the recent haplotype-based methods ChromoPainter (CP) and FineSTRUCTURE (FS) we reveal a highly geographically clustered genetic structure in Finland and report its connections to the settlement history as well as to the current dialectal regions of the Finnish language. The main genetic division within Finland shows striking concordance with the 1323 borderline of the treaty of Nöteborg. In general, we detect genetic substructure throughout the country, which reflects stronger regional genetic differences in Finland compared to, for example, the UK, which in a similar analysis was dominated by a single unstructured population. We expect that similar population genetic reference data sets will become available for many more populations in the near future with important applications, for example, in forensic genetics and in genetic association studies. With this in mind, we report those extensions of the CP + FS approach that we found most useful in our analyses of the Finnish data.

Journal ArticleDOI
01 Jun 2017
TL;DR: It is shown how the cost function can be used in an optimizer to search for the optimal visual design for a user’s dataset and task objectives, and case studies demonstrate that the approach can adapt a design to the data, to reveal patterns without user intervention.
Abstract: Designing a good scatterplot can be difficult for non-experts in visualization, because they need to decide on many parameters, such as marker size and opacity, aspect ratio, color, and rendering order. This paper contributes to research exploring the use of perceptual models and quality metrics to set such parameters automatically for enhanced visual quality of a scatterplot. A key consideration in this paper is the construction of a cost function to capture several relevant aspects of the human visual system, examining a scatterplot design for some data analysis task. We show how the cost function can be used in an optimizer to search for the optimal visual design for a user’s dataset and task objectives (e.g., “reliable linear correlation estimation is more important than class separation”). The approach is extensible to different analysis tasks. To test its performance in a realistic setting, we pre-calibrated it for correlation estimation, class separation, and outlier detection. The optimizer was able to produce designs that achieved a level of speed and success comparable to that of those using human-designed presets (e.g., in R or MATLAB). Case studies demonstrate that the approach can adapt a design to the data, to reveal patterns without user intervention.
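
A schematic of the optimization loop described above, assuming a calibrated cost model is available; perceptual_cost below is a made-up placeholder rather than the paper's perceptual model, and only the pattern of minimizing a cost over bounded design parameters is illustrated.

```python
import numpy as np
from scipy.optimize import minimize

def perceptual_cost(params, data, task_weights):
    """Placeholder cost model of scatterplot readability (not the paper's).

    params: (marker_size, opacity), the design parameters being tuned.
    task_weights: relative importance of, e.g., correlation estimation vs class separation.
    """
    marker_size, opacity = params
    overplotting = len(data) * marker_size * opacity / 1e4   # toy proxy terms
    faintness = (1.0 - opacity) + 1.0 / marker_size
    return task_weights[0] * overplotting + task_weights[1] * faintness

data = np.random.default_rng(2).normal(size=(5000, 2))
result = minimize(
    perceptual_cost,
    x0=[3.0, 0.5],                        # initial marker size (px) and opacity
    args=(data, (1.0, 0.5)),              # task objective: correlation > separation
    bounds=[(0.5, 20.0), (0.05, 1.0)],
    method="L-BFGS-B",
)
marker_size, opacity = result.x
print(f"suggested design: size={marker_size:.1f}px, opacity={opacity:.2f}")
```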

Posted ContentDOI
17 Nov 2017-bioRxiv
TL;DR: A method is developed within the Rosetta macromolecular modeling suite (flex ddG) that samples conformational diversity using “backrub” to generate an ensemble of models, then applies torsion minimization, side chain repacking and averaging across this ensemble to estimate interface ΔΔG values.
Abstract: Computationally modeling changes in binding free energies upon mutation (interface ΔΔG) allows large-scale prediction and perturbation of protein-protein interactions. Additionally, methods that consider and sample relevant conformational plasticity should be able to achieve higher prediction accuracy over methods that do not. To test this hypothesis, we developed a method within the Rosetta macromolecular modeling suite (flex ddG) that samples conformational diversity using “backrub” to generate an ensemble of models, then applying torsion minimization, side chain repacking and averaging across this ensemble to estimate interface ΔΔG values. We tested our method on a curated benchmark set of 1240 mutants, and found the method outperformed existing methods that sampled conformational space to a lesser degree. We observed considerable improvements with flex ddG over existing methods on the subset of small side chain to large side chain mutations, as well as for multiple simultaneous non-alanine mutations, stabilizing mutations, and mutations in antibody-antigen interfaces. Finally, we applied a generalized additive model (GAM) approach to the Rosetta energy function; the resulting non-linear reweighting model improved agreement with experimentally determined interface ΔΔG values, but also highlights the necessity of future energy function improvements.

Journal ArticleDOI
TL;DR: A novel approach is presented that leverages systematic integration of data sources to identify features predictive of the response to multiple drugs, and exploits the known human cancer kinome to identify biologically relevant feature combinations.
Abstract: Motivation: A prime challenge in precision cancer medicine is to identify genomic and molecular features that are predictive of drug treatment responses in cancer cells. Although there are several computational models for accurate drug response prediction, these often lack the ability to infer which feature combinations are the most predictive, particularly for high-dimensional molecular datasets. As increasing amounts of diverse genome-wide data sources are becoming available, there is a need to build new computational models that can effectively combine these data sources and identify maximally predictive feature combinations. Results: We present a novel approach that leverages on systematic integration of data sources to identify response predictive features of multiple drugs. To solve the modeling task we implement a Bayesian linear regression method. To further improve the usefulness of the proposed model, we exploit the known human cancer kinome for identifying biologically relevant feature combinations. In case studies with a synthetic dataset and two publicly available cancer cell line datasets, we demonstrate the improved accuracy of our method compared to the widely used approaches in drug response analysis. As key examples, our model identifies meaningful combinations of features for the well known EGFR, ALK, PLK and PDGFR inhibitors. Availability and implementation: The source code of the method is available at https://github.com/suleimank/mvlr . Contact: muhammad.ammad-ud-din@helsinki.fi or suleiman.khan@helsinki.fi. Supplementary information: Supplementary data are available at Bioinformatics online.

Proceedings Article
04 Dec 2017
TL;DR: Efficient inference is derived using model whitening and a marginalized posterior, and case studies show that these kernels are necessary when modelling even rather simple time series, image or geospatial data with non-stationary characteristics.
Abstract: We propose non-stationary spectral kernels for Gaussian process regression by modelling the spectral density of a non-stationary kernel function as a mixture of input-dependent Gaussian process frequency density surfaces. We solve the generalised Fourier transform with such a model, and present a family of non-stationary and non-monotonic kernels that can learn input-dependent and potentially long-range, non-monotonic covariances between inputs. We derive efficient inference using model whitening and marginalized posterior, and show with case studies that these kernels are necessary when modelling even rather simple time series, image or geospatial data with non-stationary characteristics.
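
For context, the stationary spectral mixture kernel that this work builds on can be written in one dimension as below; the paper's generalization lets the weights, frequencies and lengthscales depend on the input, so this stationary form is only the starting point.

```latex
k(\tau) \;=\; \sum_{q=1}^{Q} w_q \,
\exp\!\left(-2\pi^2 \tau^2 \sigma_q^2\right)
\cos\!\left(2\pi \tau \mu_q\right),
\qquad \tau = x - x',
```

where each component corresponds to a Gaussian in the spectral density centred at frequency μ_q with width σ_q.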

Journal ArticleDOI
01 Jan 2017
TL;DR: In this article, the authors introduce extensions to clingo with difference and linear constraints over integers and reals, respectively, and realize them in complementary ways, and empirically evaluate the resulting clingo derivatives on common language fragments and contrast them to related ASP systems.
Abstract: The recent series 5 of the Answer Set Programming (ASP) system clingo provides generic means to enhance basic ASP with theory reasoning capabilities. We instantiate this framework with different forms of linear constraints and elaborate upon its formal properties. Given this, we discuss the respective implementations, and present techniques for using these constraints in a reactive context. More precisely, we introduce extensions to clingo with difference and linear constraints over integers and reals, respectively, and realize them in complementary ways. Finally, we empirically evaluate the resulting clingo derivatives clingo[dl] and clingo[lp] on common language fragments and contrast them to related ASP systems.

Journal ArticleDOI
TL;DR: An efficient computation of LOO is introduced using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights, and it is demonstrated that PSIS-LOO is more robust in the finite case with weak priors or influential observations.
Abstract: Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.

Journal ArticleDOI
TL;DR: This paper introduces the first method that can complete kernel matrices with completely missing rows and columns, as opposed to individual missing kernel values, with the help of information from other incomplete kernel matrices, and proposes a new kernel approximation that generalizes and improves the Nyström approximation.
Abstract: In this paper, we introduce the first method that (1) can complete kernel matrices with completely missing rows and columns as opposed to individual missing kernel values, with help of information from other incomplete kernel matrices. Moreover, (2) the method does not require any of the kernels to be complete a priori, and (3) can tackle non-linear kernels. The kernel completion is done by finding, from the set of available incomplete kernels, an appropriate set of related kernels for each missing entry. These aspects are necessary in practical applications such as integrating legacy data sets, learning under sensor failures and learning when measurements are costly for some of the views. The proposed approach predicts missing rows by modelling both within-view and between-view relationships among kernel values. For within-view learning, we propose a new kernel approximation that generalizes and improves Nyström approximation. We show, both on simulated data and real case studies, that the proposed method outperforms existing techniques in the settings where they are available, and extends applicability to new settings.
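
As background for the within-view approximation mentioned above, the standard Nyström approximation that the paper generalizes reconstructs a full kernel matrix from a subset of its columns; the NumPy sketch below shows the standard variant only, with toy data, not the paper's generalized method.

```python
import numpy as np

def nystrom_approximation(K, landmark_idx, jitter=1e-8):
    """Standard Nystrom approximation K ~ C W^+ C.T from a column subset.

    K: full (n, n) kernel matrix (used here only to read the sampled entries).
    landmark_idx: indices of the m landmark points.
    """
    C = K[:, landmark_idx]                      # (n, m) sampled columns
    W = K[np.ix_(landmark_idx, landmark_idx)]   # (m, m) landmark block
    W_pinv = np.linalg.pinv(W + jitter * np.eye(len(landmark_idx)))
    return C @ W_pinv @ C.T

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * d2)                           # RBF kernel matrix on toy data
K_hat = nystrom_approximation(K, landmark_idx=rng.choice(300, 30, replace=False))
print("relative error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```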

13 Sep 2017
TL;DR: The Predicting Media Interestingness task, which is running for the second year as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation, is presented.
Abstract: In this paper, the Predicting Media Interestingness task, which is running for the second year as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation, is presented. For the task, participants are expected to create systems that automatically select images and video segments that are considered to be the most interesting for a common viewer. All task characteristics are described, namely the task use case and challenges, the released data set and ground truth, the required participant runs and the evaluation metrics.

21 Oct 2017
TL;DR: It is shown, both theoretically and empirically, that the size of the reference panel needs to scale with the GWAS sample size; this has important consequences for the application of these methods in ongoing GWAS meta-analyses and large biobank studies.
Abstract: During the past few years, various novel statistical methods have been developed for fine-mapping with the use of summary statistics from genome-wide association studies (GWASs). Although these approaches require information about the linkage disequilibrium (LD) between variants, there has not been a comprehensive evaluation of how estimation of the LD structure from reference genotype panels performs in comparison with that from the original individual-level GWAS data. Using population genotype data from Finland and the UK Biobank, we show here that a reference panel of 1,000 individuals from the target population is adequate for a GWAS cohort of up to 10,000 individuals, whereas smaller panels, such as those from the 1000 Genomes Project, should be avoided. We also show, both theoretically and empirically, that the size of the reference panel needs to scale with the GWAS sample size; this has important consequences for the application of these methods in ongoing GWAS meta-analyses and large biobank studies. We conclude by providing software tools and by recommending practices for sharing LD information to more efficiently exploit summary statistics in genetics research.

Posted Content
TL;DR: In this paper, non-stationary spectral kernels for Gaussian process regression were proposed to learn input-dependent and potentially long-range, non-monotonic covariances between inputs.
Abstract: We propose non-stationary spectral kernels for Gaussian process regression. We propose to model the spectral density of a non-stationary kernel function as a mixture of input-dependent Gaussian process frequency density surfaces. We solve the generalised Fourier transform with such a model, and present a family of non-stationary and non-monotonic kernels that can learn input-dependent and potentially long-range, non-monotonic covariances between inputs. We derive efficient inference using model whitening and marginalized posterior, and show with case studies that these kernels are necessary when modelling even rather simple time series, image or geospatial data with non-stationary characteristics.

Journal ArticleDOI
TL;DR: The findings show that not only does touch affect emotion, but emotional expressions also affect touch perception; the affective modulation of touch was observed as early as 25 ms after touch onset, suggesting that emotional context is integrated into the tactile sensation at a very early stage.
Abstract: Although the previous studies have shown that an emotional context may alter touch processing, it is not clear how visual contextual information modulates the sensory signals, and at what levels does this modulation take place. Therefore, we investigated how a toucher’s emotional expressions (anger, happiness, fear, and sadness) modulate touchee’s somatosensory-evoked potentials (SEPs) in different temporal ranges. Participants were presented with tactile stimulation appearing to originate from expressive characters in virtual reality. Touch processing was indexed using SEPs, and self-reports of touch experience were collected. Early potentials were found to be amplified after angry, happy and sad facial expressions, while late potentials were amplified after anger but attenuated after happiness. These effects were related to two stages of emotional modulation of tactile perception: anticipation and interpretation. The findings show that not only does touch affect emotion, but also emotional expressions affect touch perception. The affective modulation of touch was initially obtained as early as 25 ms after the touch onset suggesting that emotional context is integrated to the tactile sensation at a very early stage.

Journal ArticleDOI
TL;DR: It is suggested that multi-fragment recombination may occur in L. pneumophila, whereby multiple non-contiguous segments that originate from the same molecule of donor DNA are imported into a recipient genome during a single episode of recombination.
Abstract: Legionella pneumophila is an environmental bacterium and the causative agent of Legionnaires' disease. Previous genomic studies have shown that recombination accounts for a high proportion (>96%) of diversity within several major disease-associated sequence types (STs) of L. pneumophila. This suggests that recombination represents a potentially important force shaping adaptation and virulence. Despite this, little is known about the biological effects of recombination in L. pneumophila, particularly with regards to homologous recombination (whereby genes are replaced with alternative allelic variants). Using newly available population genomic data, we have disentangled events arising from homologous and non-homologous recombination in six major disease-associated STs of L. pneumophila (subsp. pneumophila), and subsequently performed a detailed characterisation of the dynamics and impact of homologous recombination. We identified genomic "hotspots" of homologous recombination that include regions containing outer membrane proteins, the lipopolysaccharide (LPS) region and Dot/Icm effectors, which provide interesting clues to the selection pressures faced by L. pneumophila. Inference of the origin of the recombined regions showed that isolates have most frequently imported DNA from isolates belonging to their own clade, but also occasionally from other major clades of the same subspecies. This supports the hypothesis that the possibility for horizontal exchange of new adaptations between major clades of the subspecies may have been a critical factor in the recent emergence of several clinically important STs from diverse genomic backgrounds. However, acquisition of recombined regions from another subspecies, L. pneumophila subsp. fraseri, was rarely observed, suggesting the existence of a recombination barrier and/or the possibility of ongoing speciation between the two subspecies. Finally, we suggest that multi-fragment recombination may occur in L. pneumophila, whereby multiple non-contiguous segments that originate from the same molecule of donor DNA are imported into a recipient genome during a single episode of recombination.

Journal ArticleDOI
TL;DR: In this article, a virtual reality experiment was conducted to investigate whether individual differences regarding the behavioral inhibition system (BIS) and gender contribute to affective touch perception; the results indicated that individual differences related to preferences regarding tactile communication also determine how touch is perceived.