
Showing papers by "Helsinki Institute for Information Technology" published in 2018


Proceedings Article
01 Jan 2018
TL;DR: In this article, the authors apply basic statistical reasoning to signal reconstruction by machine learning, learning to map corrupted observations to clean signals without explicit image priors or likelihood models of the corruption, and show that a single model learns photographic noise removal, denoising synthetic Monte Carlo images, and reconstruction of undersampled MRI scans.
Abstract: We apply basic statistical reasoning to signal reconstruction by machine learning -- learning to map corrupted observations to clean signals -- with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, at performance at and sometimes exceeding training using clean data, without explicit image priors or likelihood models of the corruption. In practice, we show that a single model learns photographic noise removal, denoising synthetic Monte Carlo images, and reconstruction of undersampled MRI scans -- all corrupted by different processes -- based on noisy data only.

610 citations
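The training setup described in the abstract above (regressing noisy inputs onto independently corrupted targets, with no clean images in the loss) can be illustrated with a minimal sketch. The tiny convolutional network, Gaussian noise model, and random data below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the "noisy targets" training idea: both input and target are
# corrupted copies of the same underlying image; the clean data never enters the loss.
import torch
import torch.nn as nn

# Toy convolutional denoiser (assumption: a small stand-in for the paper's network).
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(64, 1, 32, 32)  # stand-in "ground truth", never shown to the model
for step in range(100):
    # Two independent corruptions of the same underlying images.
    noisy_input = clean + 0.1 * torch.randn_like(clean)
    noisy_target = clean + 0.1 * torch.randn_like(clean)
    loss = nn.functional.mse_loss(model(noisy_input), noisy_target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```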


Posted Content
TL;DR: It is shown that under certain common circumstances, it is possible to learn to restore signals without ever observing clean ones, at performance close or equal to training using clean exemplars.
Abstract: We apply basic statistical reasoning to signal reconstruction by machine learning -- learning to map corrupted observations to clean signals -- with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, at performance at and sometimes exceeding training using clean data, without explicit image priors or likelihood models of the corruption. In practice, we show that a single model learns photographic noise removal, denoising synthetic Monte Carlo images, and reconstruction of undersampled MRI scans -- all corrupted by different processes -- based on noisy data only.

399 citations


Journal ArticleDOI
TL;DR: This entry comprises the invited and contributed discussions accompanying the main article published in Bayesian Analysis 13:3 (2018), pages 917-1007.
Abstract: The main article, together with the invited and contributed discussions, is available at https://dx.doi.org/10.1214/17-BA1091 (Bayesian Analysis 13:3 (2018), pages 917-1007).

264 citations


Journal ArticleDOI
TL;DR: Already available approaches to construct and use pan-genomes are examined, the potential benefits of future technologies and methodologies are discussed, and open challenges from the vantage point of the above-mentioned biological disciplines are reviewed.
Abstract: Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

220 citations


Journal ArticleDOI
TL;DR: A method within the Rosetta macromolecular modeling suite (flex ddG) that samples conformational diversity using "backrub" to generate an ensemble of models and then applies torsion minimization, side chain repacking, and averaging across this ensemble to estimate interface ΔΔG values is developed.
Abstract: Computationally modeling changes in binding free energies upon mutation (interface ΔΔG) allows large-scale prediction and perturbation of protein-protein interactions. Additionally, methods that consider and sample relevant conformational plasticity should be able to achieve higher prediction accuracy over methods that do not. To test this hypothesis, we developed a method within the Rosetta macromolecular modeling suite (flex ddG) that samples conformational diversity using "backrub" to generate an ensemble of models and then applies torsion minimization, side chain repacking, and averaging across this ensemble to estimate interface ΔΔG values. We tested our method on a curated benchmark set of 1240 mutants, and found the method outperformed existing methods that sampled conformational space to a lesser degree. We observed considerable improvements with flex ddG over existing methods on the subset of small side chain to large side chain mutations, as well as for multiple simultaneous non-alanine mutations, stabilizing mutations, and mutations in antibody-antigen interfaces. Finally, we applied a generalized additive model (GAM) approach to the Rosetta energy function; the resulting nonlinear reweighting model improved the agreement with experimentally determined interface ΔΔG values but also highlighted the necessity of future energy function improvements.

165 citations


Journal ArticleDOI
TL;DR: A population-genomic analysis of more than 800 isolates of Staphylococcus aureus reveals details of the pathogen’s evolutionary trajectory, including how this has been influenced by animal domestication and antibiotic use.
Abstract: The capacity for some pathogens to jump into different host-species populations is a major threat to public health and food security. Staphylococcus aureus is a multi-host bacterial pathogen responsible for important human and livestock diseases. Here, using a population-genomic approach, we identify humans as a major hub for ancient and recent S. aureus host-switching events linked to the emergence of endemic livestock strains, and cows as the main animal reservoir for the emergence of human epidemic clones. Such host-species transitions are associated with horizontal acquisition of genetic elements from host-specific gene pools conferring traits required for survival in the new host-niche. Importantly, genes associated with antimicrobial resistance are unevenly distributed among human and animal hosts, reflecting distinct antibiotic usage practices in medicine and agriculture. In addition to gene acquisition, genetic diversification has occurred in pathways associated with nutrient acquisition, implying metabolic remodelling after a host switch in response to distinct nutrient availability. For example, S. aureus from dairy cattle exhibit enhanced utilization of lactose, a major source of carbohydrate in bovine milk. Overall, our findings highlight the influence of human activities on the multi-host ecology of a major bacterial pathogen, underpinned by horizontal gene transfer and core genome diversification.

129 citations


Journal ArticleDOI
TL;DR: This work finds that classification accuracy can be used to assess the discrepancy between simulated and observed data and the complete arsenal of classification methods becomes thereby available for inference of intractable generative models.
Abstract: Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.

118 citations
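The core idea in the abstract above (using the accuracy of a classifier that separates simulated from observed data as the discrepancy in likelihood-free inference) can be sketched in a few lines. The Gaussian toy simulator, the logistic-regression classifier, and the grid search below are illustrative assumptions, not the paper's full framework.

```python
# Sketch: classification accuracy as a data discrepancy for likelihood-free inference.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def simulator(theta, n=200):
    # Hypothetical generative model: i.i.d. Gaussian draws with unknown mean theta.
    return rng.normal(theta, 1.0, size=(n, 1))

observed = simulator(1.5)  # pretend these are the real data

def discrepancy(theta):
    simulated = simulator(theta)
    X = np.vstack([observed, simulated])
    y = np.r_[np.zeros(len(observed)), np.ones(len(simulated))]
    # Cross-validated accuracy of telling simulated from observed data:
    # ~0.5 means the classifier cannot separate them, i.e. theta fits well.
    return cross_val_score(LogisticRegression(), X, y, cv=5).mean()

# Crude point estimate: the parameter whose simulations are hardest to classify.
grid = np.linspace(0.0, 3.0, 31)
theta_hat = grid[np.argmin([discrepancy(t) for t in grid])]
print(theta_hat)
```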


Journal ArticleDOI
TL;DR: It is demonstrated that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative.
Abstract: The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves the way for integrating machine learning approaches into diagnostic tools in the clinic.

113 citations
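The best-performing setup in the abstract above, gradient boosted decision trees on gene-content features, can be sketched minimally. The synthetic presence/absence matrix and toy resistance label below are placeholders, not the study's dataset.

```python
# Sketch: gradient boosted trees mapping gene-content features to a resistance phenotype.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n_strains, n_genes = 500, 200
X = rng.integers(0, 2, size=(n_strains, n_genes))   # presence/absence of accessory genes
y = (X[:, 0] | X[:, 3]).astype(int)                 # toy "resistance" driven by two genes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```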


Journal ArticleDOI
TL;DR: It is concluded that adenomas evolve across an undulating fitness landscape, whereas carcinomas occupy a sharper fitness peak, probably owing to stabilizing selection.
Abstract: The evolutionary events that cause colorectal adenomas (benign) to progress to carcinomas (malignant) remain largely undetermined. Using multi-region genome and exome sequencing of 24 benign and malignant colorectal tumours, we investigate the evolutionary fitness landscape occupied by these neoplasms. Unlike carcinomas, advanced adenomas frequently harbour sub-clonal driver mutations—considered to be functionally important in the carcinogenic process—that have not swept to fixation, and have relatively high genetic heterogeneity. Carcinomas are distinguished from adenomas by widespread aneusomies that are usually clonal and often accrue in a ‘punctuated’ fashion. We conclude that adenomas evolve across an undulating fitness landscape, whereas carcinomas occupy a sharper fitness peak, probably owing to stabilizing selection.

99 citations


Journal ArticleDOI
TL;DR: In this article, a comprehensive view of recent population history (≤100 generations), the timespan during which most rare-disease-causing alleles arose, was assembled by comparing pairwise haplotype sharing from 43,254 Finns to that of 16,060 Swedes, Estonians, Russians, and Hungarians from geographically and linguistically adjacent countries with different population histories.
Abstract: Finland provides unique opportunities to investigate population and medical genomics because of its adoption of unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assembled a comprehensive view of recent population history (≤100 generations), the timespan during which most rare-disease-causing alleles arose, by comparing pairwise haplotype sharing from 43,254 Finns to that of 16,060 Swedes, Estonians, Russians, and Hungarians from geographically and linguistically adjacent countries with different population histories. We find much more extensive sharing in Finns, with at least one ≥ 5 cM tract on average between pairs of unrelated individuals. By coupling haplotype sharing with fine-scale birth records from more than 25,000 individuals, we find that although haplotype sharing broadly decays with geographical distance, there are pockets of excess haplotype sharing; individuals from northeast Finland typically share several-fold more of their genome in identity-by-descent segments than individuals from southwest regions. We estimate recent effective population-size changes through time across regions of Finland, and we find that there was more continuous gene flow as Finns migrated from southwest to northeast between the early- and late-settlement regions than was dichotomously described previously. Lastly, we show that haplotype sharing is locally enriched by an order of magnitude among pairs of individuals sharing rare alleles and especially among pairs sharing rare disease-causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and gain insight into the evolutionary origins of rare variants contributing to disease.

65 citations


Journal ArticleDOI
01 Mar 2018-Genetics
TL;DR: Epistasis may play an important role in both the short- and long-term adaptive evolution of bacteria, and, unlike in eukaryotes, is not limited to strong effect sizes, closely linked loci, or other conditions that limit the impact of recombination.
Abstract: The impact of epistasis on the evolution of multi-locus traits depends on recombination. While sexually reproducing eukaryotes recombine so frequently that epistasis between polymorphisms is not considered to play a large role in short-term adaptation, many bacteria also recombine, some to the degree that their populations are described as "panmictic" or "freely recombining." However, whether this recombination is sufficient to limit the ability of selection to act on epistatic contributions to fitness is unknown. We quantify homologous recombination in five bacterial pathogens and use these parameter estimates in a multilocus model of bacterial evolution with additive and epistatic effects. We find that even for highly recombining species (e.g., Streptococcus pneumoniae or Helicobacter pylori), selection on weak interactions between distant mutations is nearly as efficient as for an asexual species, likely because homologous recombination typically transfers only short segments. However, for strong epistasis, bacterial recombination accelerates selection, with the dynamics dependent on the amount of recombination and the number of loci. Epistasis may thus play an important role in both the short- and long-term adaptive evolution of bacteria, and, unlike in eukaryotes, is not limited to strong effect sizes, closely linked loci, or other conditions that limit the impact of recombination.

Journal ArticleDOI
01 Jul 2018
TL;DR: pairwiseMKL is introduced, the first method for time- and memory-efficient learning with multiple pairwise kernels; it provides accurate predictions using sparse solutions in terms of selected kernels, and therefore also automatically identifies data sources relevant for the prediction problem.
Abstract: Motivation Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially multiple kernel learning (MKL) offers promising benefits as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem. Availability and implementation Code is available at https://github.com/aalto-ics-kepaco. Supplementary information Supplementary data are available at Bioinformatics online.
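To see why pairwise kernel spaces become so large, it helps to look at what a single pairwise (Kronecker) kernel is. The sketch below builds one explicitly for toy data and fits kernel ridge regression on it; this is precisely the naive construction that pairwiseMKL avoids, and the drug and cell-line features here are random placeholders.

```python
# Illustration (not pairwiseMKL itself) of a pairwise Kronecker kernel: the kernel
# between (drug i, cell j) and (drug k, cell l) factorises as K_drug[i,k] * K_cell[j,l].
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
drugs = rng.normal(size=(20, 16))   # hypothetical drug descriptors
cells = rng.normal(size=(15, 32))   # hypothetical cell-line features

K_drug = rbf_kernel(drugs)
K_cell = rbf_kernel(cells)
K_pair = np.kron(K_drug, K_cell)    # (20*15) x (20*15) pairwise kernel, grows very fast

y = rng.normal(size=K_pair.shape[0])                          # stand-in bioactivities per pair
alpha = np.linalg.solve(K_pair + 0.1 * np.eye(len(y)), y)     # kernel ridge regression
predictions = K_pair @ alpha
```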

Journal ArticleDOI
TL;DR: The first pan‐cancer, multi‐omics comparative analysis of the relative performance of two proteomic technologies, targeted reverse phase protein array (RPPA) and global mass spectrometry (MS), in terms of their accuracy for predicting the sensitivity of cancer cells to both cytotoxic chemotherapeutics and molecularly targeted anticancer compounds is carried out.
Abstract: Motivation Proteomics profiling is increasingly being used for molecular stratification of cancer patients and cell-line panels. However, systematic assessment of the predictive power of large-scale proteomic technologies across various drug classes and cancer types is currently lacking. To that end, we carried out the first pan-cancer, multi-omics comparative analysis of the relative performance of two proteomic technologies, targeted reverse phase protein array (RPPA) and global mass spectrometry (MS), in terms of their accuracy for predicting the sensitivity of cancer cells to both cytotoxic chemotherapeutics and molecularly targeted anticancer compounds. Results Our results in two cell-line panels demonstrate how MS profiling improves drug response predictions beyond that of the RPPA or the other omics profiles when used alone. However, frequent missing MS data values complicate its use in predictive modeling and required additional filtering, such as focusing on completely measured or known oncoproteins, to obtain maximal predictive performance. Rather strikingly, the two proteomics profiles provided complementary predictive signal both for the cytotoxic and targeted compounds. Further, information about the cellular-abundance of primary target proteins was found critical for predicting the response of targeted compounds, although the non-target features also contributed significantly to the predictive power. The clinical relevance of the selected protein markers was confirmed in cancer patient data. These results provide novel insights into the relative performance and optimal use of the widely applied proteomic technologies, MS and RPPA, which should prove useful in translational applications, such as defining the best combination of omics technologies and marker panels for understanding and predicting drug sensitivities in cancer patients. Availability and implementation Processed datasets, R as well as Matlab implementations of the methods are available at https://github.com/mehr-een/bemkl-rbps. Contact mehreen.ali@helsinki.fi or tero.aittokallio@fimm.fi. Supplementary information Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: In this paper, a two-stage approach is proposed to construct a possibly non-sparse model that predicts well, and then find a minimal subset of features that characterize the predictions.
Abstract: This paper discusses predictive inference and feature selection for generalized linear models with scarce but high-dimensional data. We argue that in many cases one can benefit from a decision theoretically justified two-stage approach: first, construct a possibly non-sparse model that predicts well, and then find a minimal subset of features that characterize the predictions. The model built in the first step is referred to as the reference model and the operation during the latter step as predictive projection. The key characteristic of this approach is that it finds an excellent tradeoff between sparsity and predictive accuracy, and the gain comes from utilizing all available information including prior and that coming from the left out features. We review several methods that follow this principle and provide novel methodological contributions. We present a new projection technique that unifies two existing techniques and is both accurate and fast to compute. We also propose a way of evaluating the feature selection process using fast leave-one-out cross-validation that allows for easy and intuitive model size selection. Furthermore, we prove a theorem that helps to understand the conditions under which the projective approach could be beneficial. The benefits are illustrated via several simulated and real world examples.
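The two-stage idea in the abstract above can be sketched for the Gaussian case: fit a rich reference model, then greedily search for the smallest feature subset whose predictions reproduce the reference predictions. The ridge reference model, squared-error projection criterion, and greedy forward search below are simplifying assumptions, not the paper's exact projection technique.

```python
# Sketch: reference model first, then projection of its predictions onto feature subsets.
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=n)   # only two truly relevant features

reference = Ridge(alpha=1.0).fit(X, y)
mu_ref = reference.predict(X)                          # reference predictions to be matched

selected, remaining = [], list(range(p))
while len(selected) < 5:
    errs = []
    for j in remaining:
        cols = selected + [j]
        proj = LinearRegression().fit(X[:, cols], mu_ref).predict(X[:, cols])
        errs.append(np.mean((proj - mu_ref) ** 2))     # how well the subset mimics mu_ref
    best = remaining[int(np.argmin(errs))]
    selected.append(best)
    remaining.remove(best)
print("selected features:", selected)
```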

Journal ArticleDOI
01 Sep 2018
TL;DR: This work presents a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column, and shows that retention order is much better conserved between instruments than retention time.
Abstract: Motivation Liquid Chromatography (LC) followed by tandem Mass Spectrometry (MS/MS) is one of the predominant methods for metabolite identification. In recent years, machine learning has started to transform the analysis of tandem mass spectra and the identification of small molecules. In contrast, LC data is rarely used to improve metabolite identification, despite numerous published methods for retention time prediction using machine learning. Results We present a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column. Our method has important advantages over previous approaches: We show that retention order is much better conserved between instruments than retention time. To this end, our method can be trained using retention time measurements from different LC systems and configurations without tedious pre-processing, significantly increasing the amount of available training data. Our experiments demonstrate that retention order prediction is an effective way to learn retention behaviour of molecules from heterogeneous retention time data. Finally, we demonstrate how retention order prediction and MS/MS-based scores can be combined for more accurate metabolite identifications when analyzing a complete LC-MS/MS run. Availability and implementation Implementation of the method is available at https://version.aalto.fi/gitlab/bache1/retention_order_prediction.git.
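Learning a retention *order* rather than a retention time is naturally cast as a pairwise ranking problem. The sketch below trains a linear scoring function on pairwise feature differences (a RankSVM-style surrogate); the random molecular descriptors and latent retention times are placeholders, not the authors' features or model.

```python
# Sketch: pairwise ranking so that sign(w.(x_i - x_j)) predicts which molecule elutes later.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_mols, n_feat = 100, 50
X = rng.normal(size=(n_mols, n_feat))                              # hypothetical descriptors
t = X @ rng.normal(size=n_feat) + 0.1 * rng.normal(size=n_mols)    # latent retention times

pairs, labels = [], []
for _ in range(2000):
    i, j = rng.choice(n_mols, size=2, replace=False)
    pairs.append(X[i] - X[j])
    labels.append(1 if t[i] > t[j] else -1)    # +1 if molecule i elutes later than j

ranker = LinearSVC().fit(np.array(pairs), labels)
# The learned weights induce a retention order via the score X @ w.
order = np.argsort(X @ ranker.coef_.ravel())
```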

Journal ArticleDOI
09 May 2018
TL;DR: This work proposes a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation – a pan-genomic reference – and provides a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows.
Abstract: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants is typically carried out on short-read data aligned to a single reference, disregarding the underlying variation. We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation – a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC. Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.

Journal ArticleDOI
TL;DR: Modelling the discrepancy between the simulated and observed data with a Gaussian process (GP) can reduce the number of model evaluations required by approximate Bayesian computation, and the choice of GP formulation is shown to significantly affect the accuracy of the estimated posterior.
Abstract: Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can be used to reduce the number of model evaluations required by ABC, but the sensitivity of this approach to a specific GP formulation has not yet been thoroughly investigated. We begin with a comprehensive empirical evaluation of using GPs in ABC, including various transformations of the discrepancies and two novel GP formulations. Our results indicate the choice of GP may significantly affect the accuracy of the estimated posterior distribution. Selection of an appropriate GP model is thus important. We formulate expected utility to measure the accuracy of classifying discrepancies below or above the ABC threshold, and show that it can be used to automate the GP model selection step. Finally, based on the understanding gained with toy examples, we fit a population genetic model for bacteria, providing insight into horizontal gene transfer events within the population and from external origins.
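The surrogate idea in the abstract above can be sketched as follows: fit a GP to a small set of (parameter, discrepancy) pairs, then use the GP to estimate, for any parameter value, the probability that the discrepancy falls below the ABC threshold. The toy simulator, kernel choice, threshold, and implicit uniform prior are illustrative assumptions.

```python
# Sketch: a GP surrogate over the discrepancy for ABC with a small simulation budget.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
observed_mean = 1.5

def discrepancy(theta):
    simulated = rng.normal(theta, 1.0, size=100)   # stand-in for an expensive simulator
    return abs(simulated.mean() - observed_mean)

thetas = rng.uniform(-2, 5, size=30).reshape(-1, 1)            # small budget of simulations
ds = np.array([discrepancy(t[0]) for t in thetas])

gp = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True).fit(thetas, ds)

grid = np.linspace(-2, 5, 200).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)
eps = 0.2
# GP-based probability that the discrepancy is below the ABC threshold, used as an
# unnormalised approximate posterior over theta (uniform prior assumed).
accept_prob = norm.cdf((eps - mean) / std)
```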

Journal ArticleDOI
TL;DR: This work demonstrates an integrated use of the rich bioactivity data from DTC and related drug databases using Drug Target Profiler (DTP), an open-source software and web tool for interactive exploration of drug-target interaction networks.
Abstract: Knowledge of the full target space of drugs (or drug-like compounds) provides important insights into the potential therapeutic use of the agents to modulate or avoid their various on- and off-targets in drug discovery and precision medicine. However, there is a lack of consolidated databases and associated data exploration tools that allow for systematic profiling of drug target-binding potencies of both approved and investigational agents using a network-centric approach. We recently initiated a community-driven platform, Drug Target Commons (DTC), which is an open-data crowdsourcing platform designed to improve the management, reproducibility and extended use of compound-target bioactivity data for drug discovery and repurposing, as well as target identification applications. In this work, we demonstrate an integrated use of the rich bioactivity data from DTC and related drug databases using Drug Target Profiler (DTP), an open-source software and web tool for interactive exploration of drug-target interaction networks. DTP was designed for network-centric modeling of mode-of-action of multi-targeting anticancer compounds, especially for precision oncology applications. DTP enables users to construct an interaction network based on integrated bioactivity data across selected chemical compounds and their protein targets, further customizable using various visualization and filtering options, as well as cross-links to several drug and protein databases to provide comprehensive information of the network nodes and interactions. We demonstrate here the operation of the DTP tool and its unique features by several use cases related to both drug discovery and drug repurposing applications, using examples of anticancer drugs with shared target profiles. DTP is freely accessible at http://drugtargetprofiler.fimm.fi/.

Proceedings Article
31 Mar 2018
TL;DR: An information-theoretic criterion for Bayesian network structure learning called quotient normalized maximum likelihood (qNML) is introduced; it satisfies the property of score equivalence, is decomposable, and is completely free of adjustable hyperparameters.
Abstract: We introduce an information theoretic criterion for Bayesian network structure learning which we call quotient normalized maximum likelihood (qNML). In contrast to the closely related factorized normalized maximum likelihood criterion, qNML satisfies the property of score equivalence. It is also decomposable and completely free of adjustable hyperparameters. For practical computations, we identify a remarkably accurate approximation proposed earlier by Szpankowski and Weinberger. Experiments on both simulated and real data demonstrate that the new criterion leads to parsimonious models with good predictive accuracy.

Book ChapterDOI
TL;DR: Recurrent neural networks are applied to classifying process instances in a supervised fashion using labeled process instances extracted from event log traces, the first reported use of GRU for this task; GRU outperforms LSTM remarkably in training time while giving almost identical accuracies to LSTM models.
Abstract: Process Mining consists of techniques where logs created by operative systems are transformed into process models. In process mining tools it is often desired to be able to classify ongoing process instances, e.g., to predict how long the process will still require to complete, or to classify process instances to different classes based only on the activities that have occurred in the process instance thus far. Recurrent neural networks and their subclasses, such as Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM), have been demonstrated to be able to learn relevant temporal features for subsequent classification tasks. In this paper we apply recurrent neural networks to classifying process instances. The proposed model is trained in a supervised fashion using labeled process instances extracted from event log traces. This is the first time we know of GRU having been used in classifying business process instances. Our main experimental results show that GRU outperforms LSTM remarkably in training time while giving almost identical accuracies to LSTM models. An additional contribution of our paper is improving the classification model training time by filtering infrequent activities, a technique commonly used, e.g., in Natural Language Processing (NLP).
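A GRU classifier over activity sequences, as described in the abstract above, can be sketched compactly. The vocabulary size, embedding and hidden dimensions, padding scheme, and toy batch below are illustrative assumptions, not the authors' configuration.

```python
# Sketch: classifying (prefixes of) process instances with a GRU over activity IDs.
import torch
import torch.nn as nn

class ProcessClassifier(nn.Module):
    def __init__(self, n_activities, n_classes, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_activities, emb, padding_idx=0)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, activity_ids):              # (batch, max_trace_len), 0 = padding
        _, h = self.gru(self.embed(activity_ids))
        return self.out(h[-1])                    # class logits from the final hidden state

model = ProcessClassifier(n_activities=50, n_classes=2)
traces = torch.randint(1, 50, (8, 20))            # a toy batch of activity sequences
labels = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(traces), labels)
loss.backward()
```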

Journal ArticleDOI
TL;DR: In this paper, a Gaussian process (GP) based method, mGPfusion, is proposed for predicting a protein's stability changes upon single and multiple mutations; because the accuracy of predictive models is ultimately constrained by the limited availability of experimental data, the method complements experimental measurements with large amounts of molecular simulation data.
Abstract: Motivation Proteins are commonly used by the biochemical industry for numerous processes. Refining these proteins' properties via mutations also causes stability effects. An accurate computational method to predict how mutations affect protein stability is necessary to facilitate efficient protein design. However, the accuracy of predictive models is ultimately constrained by the limited availability of experimental data. Results We have developed mGPfusion, a novel Gaussian process (GP) method for predicting a protein's stability changes upon single and multiple mutations. This method complements the limited experimental data with large amounts of molecular simulation data. We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data. Our protein-specific model requires experimental data only regarding the protein of interest and performs well even with few experimental measurements. mGPfusion models proteins by contact maps and infers the stability effects caused by mutations with a mixture of graph kernels. Our results show that mGPfusion outperforms state-of-the-art methods in predicting protein stability on a dataset of 15 different proteins and that incorporating molecular simulation data improves the model learning and prediction accuracy. Availability and implementation Software implementation and datasets are available at github.com/emmijokinen/mgpfusion. Supplementary information Supplementary data are available at Bioinformatics online.
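The data-fusion idea, trusting a few experimental measurements more than many simulated ones, can be sketched with a GP that assigns a larger noise level to the simulated points. This is not the mGPfusion model (no contact maps or graph kernels here); the features, noise levels, and data are assumptions.

```python
# Sketch: combining scarce experimental and abundant simulated data in one GP by
# giving each data source its own observation noise (sklearn's per-sample alpha).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_exp = rng.normal(size=(10, 5))                 # few experimental mutants (feature vectors)
y_exp = X_exp @ np.ones(5) + 0.05 * rng.normal(size=10)
X_sim = rng.normal(size=(200, 5))                # many simulated mutants
y_sim = X_sim @ np.ones(5) + 0.5 * rng.normal(size=200)   # noisier source

X = np.vstack([X_exp, X_sim])
y = np.concatenate([y_exp, y_sim])
alpha = np.concatenate([np.full(10, 0.05**2), np.full(200, 0.5**2)])  # per-source noise

gp = GaussianProcessRegressor(kernel=RBF(), alpha=alpha).fit(X, y)
pred_mean, pred_std = gp.predict(rng.normal(size=(3, 5)), return_std=True)
```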

Journal ArticleDOI
TL;DR: An interactive and user-friendly multi-platform-compatible software, BasePlayer, is introduced, which allows scientists, regardless of bioinformatics training, to carry out variant analysis in disease genetics settings.
Abstract: Next-generation sequencing (NGS) is routinely applied in life sciences and clinical practice, but interpretation of the massive quantities of genomic data produced has become a critical challenge. The genome-wide mutation analyses enabled by NGS have had a revolutionary impact in revealing the predisposing and driving DNA alterations behind a multitude of disorders. The workflow to identify causative mutations from NGS data, for example in cancer and rare diseases, commonly involves phases such as quality filtering, case–control comparison, genome annotation, and visual validation, which require multiple processing steps and usage of various tools and scripts. To this end, we have introduced an interactive and user-friendly multi-platform-compatible software, BasePlayer, which allows scientists, regardless of bioinformatics training, to carry out variant analysis in disease genetics settings. A genome-wide scan of regulatory regions for mutation clusters can be carried out with a desktop computer in ~10 min with a dataset of 3 million somatic variants in 200 whole-genome-sequenced (WGS) cancers. Here, the authors describe how to use BasePlayer, an interactive and user-friendly software that facilitates the identification of causative mutations from next-generation sequencing data.

Proceedings ArticleDOI
01 Nov 2018
TL;DR: This paper forms the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, the aim is to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class.
Abstract: Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
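In the spirit of the nearest-neighbour baseline mentioned in the abstract (not the random-shapelet-forest algorithms themselves), a tweaking procedure can be sketched as moving the input series toward its nearest training example of the desired class until the classifier's decision flips, keeping the change small. The data, classifier, and interpolation scheme are toy assumptions.

```python
# Sketch: nearest-neighbour-guided tweaking that flips a time series classifier's decision.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, length = 200, 50
X = rng.normal(size=(n, length))
y = (X[:, :10].mean(axis=1) > 0).astype(int)      # toy class depends on the first segment
clf = RandomForestClassifier(random_state=0).fit(X, y)

def tweak(series, target_class):
    candidates = X[y == target_class]
    nearest = candidates[np.argmin(np.linalg.norm(candidates - series, axis=1))]
    for lam in np.linspace(0.0, 1.0, 101):        # smallest interpolation that flips the label
        tweaked = (1 - lam) * series + lam * nearest
        if clf.predict(tweaked.reshape(1, -1))[0] == target_class:
            return tweaked, lam
    return nearest, 1.0

x0 = X[0]
x_tweaked, effort = tweak(x0, target_class=1 - clf.predict(x0.reshape(1, -1))[0])
```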

Journal ArticleDOI
24 Oct 2018-BMJ Open
TL;DR: Women aged below 30 and from the most deprived areas were at highest risk of depression and most likely to receive antidepressant treatment and more than one in eight women received antidepressant treatment in this period.
Abstract: Objectives To investigate how depression is recognised in the year after child birth and treatment given in clinical practice. Design Cohort study based on UK primary care electronic health records. Setting Primary care. Participants Women who have given live birth between 2000 and 2013. Outcomes Prevalence of postnatal depression, depression diagnoses, depressive symptoms, antidepressant and non-pharmacological treatment within a year after birth. Results Of 206 517 women, 23 623 (11%) had a record of depressive diagnosis or symptoms in the year after delivery and more than one in eight women received antidepressant treatment. Recording and treatment peaked 6–8 weeks after delivery. Initiation of selective serotonin reuptake inhibitors (SSRI) treatment has become earlier in the more recent years. Thus, the initiation rate of SSRI treatment per 100 pregnancies (95% CI) at 8 weeks were 2.6 (2.5 to 2.8) in 2000–2004, increasing to 3.0 (2.9 to 3.1) in 2005–2009 and 3.8 (3.6 to 3.9) in 2010–2013. The overall rate of initiation of SSRI within the year after delivery, however, has not changed noticeably. A third of the women had at least one record suggestive of depression at any time prior to delivery and of these one in four received SSRI treatment in the year after delivery. Younger women were most likely to have records of depression and depressive symptoms. (Relative risk for postnatal depression: age 15–19: 1.92 (1.76 to 2.10), age 20–24: 1.49 (1.39 to 1.59) versus age 30–34). The risk of depression, postnatal depression and depressive symptoms increased with increasing social deprivation. Conclusions More than 1 in 10 women had electronic health records indicating depression diagnoses or depressive symptoms within a year after delivery and more than one in eight women received antidepressant treatment in this period. Women aged below 30 and from the most deprived areas were at highest risk of depression and most likely to receive antidepressant treatment.

Journal ArticleDOI
TL;DR: It is shown that useful predictors can be learned under powerful differential privacy guarantees, and even from moderately-sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method.
Abstract: Users of a personalised recommendation system face a dilemma: recommendations can be improved by learning from data, but only if other users are willing to share their private information. Good personalised predictions are vitally important in precision medicine, but genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised. Differential privacy has emerged as a potentially promising solution: privacy is considered sufficient if presence of individual patients cannot be distinguished. However, differentially private learning with current methods does not improve predictions with feasible data sizes and dimensionalities. We show that useful predictors can be learned under powerful differential privacy guarantees, and even from moderately-sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method. Our method matches the predictive accuracy of the state-of-the-art non-private lasso regression using only 4x more samples under relatively strong differential privacy guarantees. Good performance with limited data is achieved by limiting the sharing of private information by decreasing the dimensionality and by projecting outliers to fit tighter bounds, therefore needing to add less noise for equal privacy. The proposed differentially private regression method combines theoretical appeal and asymptotic efficiency with good prediction accuracy even with moderate-sized data. As already the simple-to-implement method shows promise on the challenging genomic data, we anticipate rapid progress towards practical applications in many fields. This article was reviewed by Zoltan Gaspari and David Kreil.
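One standard way to make linear regression differentially private, perturbing clipped sufficient statistics with Gaussian noise, illustrates the abstract's point that tighter data bounds mean less noise for the same privacy level; it is not necessarily the paper's exact mechanism. The norm bounds, budget split, and add/remove-one neighbourhood assumption are spelled out in the comments.

```python
# Sketch: differentially private linear regression via noisy sufficient statistics.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

B, C = 1.0, 1.0                                   # assumed per-sample norm bounds
X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True) / B)   # clip row norms
y = np.clip(y, -C, C)

eps, delta = 1.0, 1e-5
def gauss_sigma(sensitivity, eps, delta):
    # Standard Gaussian-mechanism calibration; (eps/2, delta/2) spent on each statistic.
    return sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps

# Under add/remove-one neighbouring datasets, one clipped sample changes X^T X by at
# most B^2 (Frobenius norm) and X^T y by at most B*C (L2 norm).
XtX = X.T @ X + rng.normal(scale=gauss_sigma(B * B, eps / 2, delta / 2), size=(d, d))
XtX = (XtX + XtX.T) / 2                           # symmetrise (post-processing, privacy-free)
Xty = X.T @ y + rng.normal(scale=gauss_sigma(B * C, eps / 2, delta / 2), size=d)

lam = 1.0                                         # ridge term keeps the noisy solve stable
w_private = np.linalg.solve(XtX + lam * np.eye(d), Xty)
```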

Journal ArticleDOI
TL;DR: The evolution of game genres from 1979 till 2010 is analyzed, indicating that until 1990, there have been many genres competing for dominance, but thereafter sport-racing, strategy, and action have become the most prevalent genres.
Abstract: Establishing genres is the first step toward analyzing games and how the genre landscape evolves over the years. We use data-driven modeling that distils genres from textual descriptions of a large collection of games. We analyze the evolution of game genres from 1979 till 2010. Our results indicate that until 1990, there have been many genres competing for dominance, but thereafter sport-racing, strategy, and action have become the most prevalent genres. Moreover, we find that games vary to a great extent as to whether they belong mostly to one genre or to a combination of several genres. We also compare the results of our data-driven model with two product databases, Metacritic and Mobygames, and observe that the classifications of games to different genres are substantially different, even between product databases. We conclude with discussion on potential future applications and how they may further our understanding of video game genres.
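One plausible way to distil genres from textual game descriptions, as the abstract describes, is a topic model; the abstract does not name the exact model, so the LDA choice and the toy descriptions below are assumptions.

```python
# Sketch: latent "genres" as topics learned from game descriptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

descriptions = [
    "race cars on fast tracks against rival drivers",
    "build an empire, manage resources and command armies",
    "shoot enemies and dodge bullets in frantic action",
    "kick the ball and score goals in a football league",
]
counts = CountVectorizer(stop_words="english").fit_transform(descriptions)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
genre_mixture = lda.transform(counts)   # each game as a mixture over latent genres
```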

01 Jan 2018
TL;DR: According to the rule of different tracks at the SAT Competition 2017, multple versions of abcdSAT are developed, which are submitted to agile, main, no-limit, incremental library and parallel track.

Proceedings ArticleDOI
17 Nov 2018
TL;DR: This paper proposes a novel approach to maximizing the diversity of exposure in a social network, introducing an extension to the notion of random reverse-reachable sets, and demonstrates the efficiency and scalability of the resulting algorithm on several real-world datasets.
Abstract: Social-media platforms have created new ways for citizens to stay informed and participate in public debates. However, to enable a healthy environment for information sharing, social deliberation, and opinion formation, citizens need to be exposed to sufficiently diverse viewpoints that challenge their assumptions, instead of being trapped inside filter bubbles. In this paper, we take a step in this direction and propose a novel approach to maximize the diversity of exposure in a social network. We formulate the problem in the context of information propagation, as a task of recommending a small number of news articles to selected users. We propose a realistic setting where we take into account content and user leanings, and the probability of further sharing an article. This setting allows us to capture the balance between maximizing the spread of information and ensuring the exposure of users to diverse viewpoints. The resulting problem can be cast as maximizing a monotone and submodular function subject to a matroid constraint on the allocation of articles to users. It is a challenging generalization of the influence maximization problem. Yet, we are able to devise scalable approximation algorithms by introducing a novel extension to the notion of random reverse-reachable sets. We experimentally demonstrate the efficiency and scalability of our algorithm on several real-world datasets.
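The random reverse-reachable (RR) set machinery mentioned in the abstract can be sketched for the plain influence-maximization case: sample RR sets under the independent-cascade model and greedily pick seeds that cover the most sets. The diversity-of-exposure objective and matroid constraint of the paper are not reproduced here, and the random graph and probabilities are placeholders.

```python
# Sketch: RR-set sampling plus greedy maximum coverage for influence maximization.
import random
from collections import defaultdict

random.seed(0)
n_nodes, p_edge, p_influence = 200, 0.03, 0.1
graph_in = defaultdict(list)                       # incoming edges: node -> predecessors
for u in range(n_nodes):
    for v in range(n_nodes):
        if u != v and random.random() < p_edge:
            graph_in[v].append(u)

def random_rr_set():
    # Reverse BFS from a random root, keeping each incoming edge with probability p_influence.
    root = random.randrange(n_nodes)
    rr, frontier = {root}, [root]
    while frontier:
        node = frontier.pop()
        for pred in graph_in[node]:
            if pred not in rr and random.random() < p_influence:
                rr.add(pred)
                frontier.append(pred)
    return rr

rr_sets = [random_rr_set() for _ in range(5000)]

# Greedy coverage: repeatedly pick the node appearing in the most uncovered RR sets.
seeds, covered = [], set()
for _ in range(5):
    counts = defaultdict(int)
    for idx, rr in enumerate(rr_sets):
        if idx not in covered:
            for node in rr:
                counts[node] += 1
    best = max(counts, key=counts.get)
    seeds.append(best)
    covered |= {idx for idx, rr in enumerate(rr_sets) if best in rr}
print("seed nodes:", seeds)
```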

Journal ArticleDOI
TL;DR: Overall, the results show how brain activity in holistic vs analytical participants differs when viewing the same drama movie.
Abstract: People socialized in different cultures differ in their thinking styles. Eastern-culture people view objects more holistically by taking context into account, whereas Western-culture people view objects more analytically by focusing on them at the expense of context. Here we studied whether participants, who have different thinking styles but live within the same culture, exhibit differential brain activity when viewing a drama movie. A total of 26 Finnish participants, who were divided into holistic and analytical thinkers based on self-report questionnaire scores, watched a shortened drama movie during functional magnetic resonance imaging. We compared intersubject correlation (ISC) of brain hemodynamic activity of holistic vs analytical participants across the movie viewings. Holistic thinkers showed significant ISC in more extensive cortical areas than analytical thinkers, suggesting that they perceived the movie in a more similar fashion. Significantly higher ISC was observed in holistic thinkers in occipital, prefrontal and temporal cortices. In analytical thinkers, significant ISC was observed in right-hemisphere fusiform gyrus, temporoparietal junction and frontal cortex. Since these results were obtained in participants with similar cultural background, they are less prone to confounds by other possible cultural differences. Overall, our results show how brain activity in holistic vs analytical participants differs when viewing the same drama movie.
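The intersubject correlation (ISC) measure used in the abstract can be sketched directly: for each brain region, correlate every pair of subjects' time series and average the pairwise correlations. The data shapes below are toy assumptions rather than the study's preprocessed fMRI data.

```python
# Sketch: pairwise intersubject correlation averaged per region.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_regions = 26, 300, 100
data = rng.normal(size=(n_subjects, n_timepoints, n_regions))   # stand-in fMRI time series

isc = np.zeros(n_regions)
for region in range(n_regions):
    r_values = [np.corrcoef(data[i, :, region], data[j, :, region])[0, 1]
                for i, j in combinations(range(n_subjects), 2)]
    isc[region] = np.mean(r_values)
```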

Journal ArticleDOI
TL;DR: This work focuses on a setting where the user provides only the abstract of a new paper as input, and proposes a model to expand the semantic features of the given abstract using knowledge graphs and combine them with other features to fit a learning to rank model.
Abstract: Scholarly search engines, reference management tools, and academic social networks enable modern researchers to organize their scientific libraries. Moreover, they often provide recommendations for scientific publications that might be of interest to researchers. Because of the exponentially increasing volume of publications, effective citation recommendation is of great importance to researchers, as it reduces the time and effort spent on retrieving, understanding, and selecting research papers. In this context, we address the problem of citation recommendation, i.e., the task of recommending citations for a new paper. Current research investigates this task in different settings, including cases where rich user metadata is available (e.g., user profile, publications, citations). This work focuses on a setting where the user provides only the abstract of a new paper as input. Our proposed approach is to expand the semantic features of the given abstract using knowledge graphs and combine them with other features (e.g., indegree, recency) to fit a learning-to-rank model. This model is used to generate the citation recommendations. By evaluating on real data, we show that the expanded semantic features improve the quality of the recommendations as measured by nDCG@10.
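The evaluation metric named at the end of the abstract, nDCG@10, can be written out explicitly for binary relevance (1 = the candidate paper is actually cited). The example input is illustrative.

```python
# Sketch: nDCG@10 with binary relevance labels given in the ranked order of recommendations.
import numpy as np

def ndcg_at_10(relevances_in_ranked_order):
    rel = np.asarray(relevances_in_ranked_order, dtype=float)
    top = rel[:10]
    dcg = np.sum(top / np.log2(np.arange(2, len(top) + 2)))
    ideal = np.sort(rel)[::-1][:10]                      # best possible ordering, truncated at 10
    idcg = np.sum(ideal / np.log2(np.arange(2, len(ideal) + 2)))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_10([1, 0, 1, 0, 0, 1, 0, 0, 0, 0]))
```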