Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility organization based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1,962 publications receiving 63,426 citations.


Papers
Journal Article · DOI
TL;DR: Modelling the discrepancy between simulated and observed data with a Gaussian process (GP) can reduce the number of model evaluations required by approximate Bayesian computation (ABC); this article studies how strongly that approach depends on the chosen GP formulation.
Abstract: Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can be used to reduce the number of model evaluations required by ABC, but the sensitivity of this approach to a specific GP formulation has not yet been thoroughly investigated. We begin with a comprehensive empirical evaluation of using GPs in ABC, including various transformations of the discrepancies and two novel GP formulations. Our results indicate the choice of GP may significantly affect the accuracy of the estimated posterior distribution. Selection of an appropriate GP model is thus important. We formulate expected utility to measure the accuracy of classifying discrepancies below or above the ABC threshold, and show that it can be used to automate the GP model selection step. Finally, based on the understanding gained with toy examples, we fit a population genetic model for bacteria, providing insight into horizontal gene transfer events within the population and from external origins.

30 citations
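The baseline the abstract builds on is plain rejection ABC, where every candidate parameter costs one simulator run and is kept only if its discrepancy falls below the ABC threshold. A minimal sketch, assuming a toy Normal(θ, 1) simulator, a uniform prior on [-5, 5], and the absolute difference of sample means as the discrepancy (all hypothetical choices, not the paper's setup). The GP surrogate studied in the paper is not implemented here; it would replace most of these simulator calls.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical toy simulator: n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def discrepancy(sim, obs):
    """Hypothetical discrepancy: absolute difference of sample means."""
    return abs(statistics.fmean(sim) - statistics.fmean(obs))


def rejection_abc(obs, n_sims, eps, prior_low=-5.0, prior_high=5.0, seed=0):
    """Rejection ABC: keep prior draws whose simulated data lie within eps of obs.

    Each loop iteration is one full model simulation, which is the cost the
    GP-based approach aims to reduce when a single simulation takes hours.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)  # draw a candidate from the prior
        sim = simulate(theta, len(obs), rng)         # one (possibly expensive) model run
        if discrepancy(sim, obs) < eps:              # below the ABC threshold: accept
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    data_rng = random.Random(42)
    observed = [data_rng.gauss(2.0, 1.0) for _ in range(20)]
    samples = rejection_abc(observed, n_sims=20000, eps=0.1)
    print(len(samples), statistics.fmean(samples))
```

The accepted values approximate the posterior of θ; shrinking `eps` improves the approximation but lowers the acceptance rate, so more simulator calls are needed.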

Proceedings Article
18 Feb 2010
TL;DR: This paper presents a two-step method, constrained nonlinear ICA followed by statistical independence tests, to distinguish the cause from the effect in the two-variable case, and uses it to successfully identify causes from effects.
Abstract: Distinguishing causes from effects is an important problem in many areas. In this paper, we propose a very general but well defined nonlinear acyclic causal model, namely, post-nonlinear acyclic causal model with inner additive noise, to tackle this problem. In this model, each observed variable is generated by a nonlinear function of its parents, with additive noise, followed by a nonlinear distortion. The nonlinearity in the second stage takes into account the effect of sensor distortions, which are usually encountered in practice. In the two-variable case, if all the nonlinearities involved in the model are invertible, by relating the proposed model to the post-nonlinear independent component analysis (ICA) problem, we give the conditions under which the causal relation can be uniquely found. We present a two-step method, which is constrained nonlinear ICA followed by statistical independence tests, to distinguish the cause from the effect in the two-variable case. We apply this method to solve the problem "CauseEffectPairs" in the Pot-luck challenge, and successfully identify causes from effects.

30 citations
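The core idea of the second step, choosing the direction in which the recovered noise is independent of the hypothesised cause, can be illustrated on the simpler additive-noise special case, where the outer nonlinear distortion is the identity. A minimal sketch under that assumption: it swaps the paper's constrained nonlinear ICA for Nadaraya-Watson kernel regression and uses a biased HSIC estimate with Gaussian kernels as the independence measure. The bandwidths and all function names are hypothetical choices, not the authors' method.

```python
import math
import random


def standardize(xs):
    """Rescale to zero mean and unit variance (guarding against zero spread)."""
    m = sum(xs) / len(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - m) / s for x in xs]


def centered_gram(xs, sigma=1.0):
    """Gaussian-kernel Gram matrix, double-centered (H K H with H = I - 11^T/n)."""
    n = len(xs)
    k = [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]
    row = [sum(r) / n for r in k]  # K is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[k[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def hsic(xs, ys):
    """Biased HSIC estimate; values near 0 are consistent with independence."""
    n = len(xs)
    kc, lc = centered_gram(xs), centered_gram(ys)
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


def kernel_regression(xs, ys, bandwidth=0.3):
    """Nadaraya-Watson estimate of E[y | x], evaluated at each training point."""
    fitted = []
    for a in xs:
        w = [math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in xs]
        fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return fitted


def residual_dependence(cause, effect):
    """Fit effect = f(cause) + noise, then measure dependence of the noise on the cause."""
    fitted = kernel_regression(cause, effect)
    residuals = standardize([e - f for e, f in zip(effect, fitted)])
    return hsic(cause, residuals)


def infer_direction(x, y):
    """Prefer the direction whose regression residuals look more independent."""
    x, y = standardize(x), standardize(y)
    return "x->y" if residual_dependence(x, y) < residual_dependence(y, x) else "y->x"


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-2.0, 2.0) for _ in range(200)]
    y = [xi ** 3 + rng.uniform(-1.0, 1.0) for xi in x]  # toy data in which x causes y
    print(infer_direction(x, y))
```

Handling the full post-nonlinear model requires also inverting the outer distortion, which is what the constrained nonlinear ICA step in the paper is for; this sketch does not attempt that.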

Proceedings Article
24 May 2019
TL;DR: This paper presents gradKCCA, a large-scale sparse non-linear canonical correlation method that outperforms state-of-the-art CCA methods in terms of speed and robustness to noise both in simulated and real-world datasets.
Abstract: This paper presents gradKCCA, a large-scale sparse non-linear canonical correlation method. Like Kernel Canonical Correlation Analysis (KCCA), our method finds non-linear relations through kernel functions, but it does not rely on a kernel matrix, a known bottleneck for scaling up kernel methods. gradKCCA corresponds to solving KCCA with the additional constraint that the canonical projection directions in the kernel-induced feature space have preimages in the original data space. Firstly, this modification allows us to very efficiently maximize kernel canonical correlation through an alternating projected gradient algorithm working in the original data space. Secondly, we can control the sparsity of the projection directions by constraining the ℓ1 norm of the preimages of the projection directions, facilitating the interpretation of the discovered patterns, which is not available through KCCA. Our empirical experiments demonstrate that gradKCCA outperforms state-of-the-art CCA methods in terms of speed and robustness to noise both in simulated and real-world datasets.

30 citations
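The sparsity control mentioned in the abstract comes from keeping each projection direction inside an ℓ1 ball during projected gradient ascent. A minimal sketch of that projection step only, assuming the standard sort-based Euclidean projection onto the ℓ1 ball (Duchi et al., 2008); the abstract does not say which projection gradKCCA uses, and the KCCA objective and its gradient are not reproduced. `project_l1_ball` and `projected_gradient_step` are hypothetical names.

```python
import math


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : sum(|w_i|) <= radius}.

    Sort-based method: find the soft-threshold theta such that the
    thresholded vector has l1 norm exactly `radius`.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible: the projection is the identity
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj > t:  # holds for a prefix of indices; the last one fixes theta
            theta = t
    # Soft-threshold: small entries become exactly zero, which yields sparsity.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def projected_gradient_step(w, grad, step, radius):
    """One ascent step on a (hypothetical) objective, then project back onto the l1 ball."""
    return project_l1_ball([wi + step * gi for wi, gi in zip(w, grad)], radius)


if __name__ == "__main__":
    print(project_l1_ball([3.0, 1.0], radius=1.0))  # -> [1.0, 0.0]
```

Smaller radii zero out more coordinates, so the radius acts as the sparsity knob; the projection costs O(d log d) because of the sort.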

Proceedings Article · DOI
21 Jul 2015
TL;DR: An O(n^{1-2/ω})-round matrix multiplication algorithm is obtained, where ω < 2.3728639 is the exponent of matrix multiplication, which gives significant improvements over previous best upper bounds in the congested clique model.
Abstract: In this work, we use algebraic methods for studying distance computation and subgraph detection tasks in the congested clique model. Specifically, we adapt parallel matrix multiplication implementations to the congested clique, obtaining an O(n^{1-2/ω})-round matrix multiplication algorithm, where ω < 2.3728639 is the exponent of matrix multiplication…

30 citations
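Plugging the stated bound on ω into the round complexity makes the sublinear exponent concrete. This is only arithmetic on the figures given above; for comparison, substituting ω = 3 would give n^{1/3}.

```latex
\omega < 2.3728639
\;\Longrightarrow\;
1 - \frac{2}{\omega} \;<\; 1 - \frac{2}{2.3728639} \;\approx\; 0.15714,
\qquad\text{so}\qquad
O\!\left(n^{1-2/\omega}\right) \subseteq O\!\left(n^{0.1572}\right).
```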

Journal Article · DOI
28 Mar 2017 · PLOS ONE
TL;DR: In this article, the authors used cluster analysis to investigate clinical factors predicting the onset and severity of preeclampsia in a cohort of women with known clinical risk factors, including chronic hypertension, obesity, diabetes, lipid dysfunction, and inflammation.
Abstract: Objectives: Preeclampsia is divided into early-onset (delivery before 34 weeks of gestation) and late-onset (delivery at or after 34 weeks) subtypes, which may arise from different etiopathogenic backgrounds. Early-onset disease is associated with placental dysfunction. Late-onset disease develops predominantly due to metabolic disturbances, obesity, diabetes, lipid dysfunction, and inflammation, which affect endothelial function. Our aim was to use cluster analysis to investigate clinical factors predicting the onset and severity of preeclampsia in a cohort of women with known clinical risk factors.
Methods: We recruited 903 pregnant women with risk factors for preeclampsia at gestational weeks 12+0–13+6. Each individual outcome diagnosis was independently verified from medical records. We applied a Bayesian clustering algorithm to classify the study participants into clusters based on their particular combination of risk factors. For each cluster, we computed the risk ratio of each disease outcome relative to the risk in the general population.
Results: The risk of preeclampsia increased exponentially with the number of risk factors. Our analysis revealed 25 clusters. Preeclampsia in a previous pregnancy (n = 138) increased the risk of preeclampsia 8.1-fold (95% confidence interval (CI) 5.7–11.2) compared with a general population of pregnant women. Having a small-for-gestational-age infant (n = 57) in a previous pregnancy increased the risk of early-onset preeclampsia 17.5-fold (95% CI 2.1–60.5). The combination of these two risk factors (n = 21) increased the risk of severe preeclampsia 23.8-fold (95% CI 5.1–60.6), of intermediate-onset preeclampsia (delivery between 34+0 and 36+6 weeks of gestation) 25.1-fold (95% CI 3.1–79.9), and of preterm preeclampsia (delivery before 37+0 weeks of gestation) 16.4-fold (95% CI 2.0–52.4). Body mass index over 30 kg/m² (n = 228) as a sole risk factor increased the risk of preeclampsia 2.1-fold (95% CI 1.1–3.6); together with preeclampsia in an earlier pregnancy, the risk increased 11.4-fold (95% CI 4.5–20.9). Chronic hypertension (n = 60) increased the risk of preeclampsia 5.3-fold (95% CI 2.4–9.8), of severe preeclampsia 22.2-fold (95% CI 9.9–41.0), and of early-onset preeclampsia 16.7-fold (95% CI 2.0–57.6). If a woman had chronic hypertension combined with obesity, gestational diabetes, and earlier preeclampsia, the risk of term preeclampsia increased 4.8-fold (95% CI 0.1–21.7). Women with type 1 diabetes mellitus had a high risk of all subgroups of preeclampsia.
Conclusion: The risk of preeclampsia increases exponentially with the number of risk factors. Early-onset and severe preeclampsia have a different risk profile from term preeclampsia.

30 citations
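The results above are reported as risk ratios relative to the general population, each with a 95% CI. A minimal sketch of how such a ratio and interval can be computed from case counts, assuming the common log-scale (Katz) approximation for the interval; the abstract does not state which interval method the study used, and the counts in the usage example are hypothetical, not taken from the study.

```python
import math


def risk_ratio_ci(cases_exposed, n_exposed, cases_ref, n_ref, z=1.96):
    """Risk ratio of an exposed group vs. a reference group, with an approximate CI.

    Interval: exp(ln RR +/- z * SE), SE = sqrt(1/a - 1/n1 + 1/c - 1/n0)
    (Katz log method; an illustrative assumption, not necessarily the study's method).
    """
    if cases_exposed == 0 or cases_ref == 0:
        raise ValueError("log-scale interval is undefined when a group has zero cases")
    rr = (cases_exposed / n_exposed) / (cases_ref / n_ref)
    se_log = math.sqrt(1 / cases_exposed - 1 / n_exposed + 1 / cases_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 12 cases among 60 exposed vs. 300 among 10,000 in the reference.
    rr, lo, hi = risk_ratio_ci(12, 60, 300, 10000)
    print(f"RR = {rr:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the ratio itself, and it widens quickly when the exposed group is small.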


Authors

Showing all 632 results

Name | H-index | Papers | Citations
Dimitri P. Bertsekas | 94 | 332 | 85939
Olli Kallioniemi | 90 | 353 | 42021
Heikki Mannila | 72 | 295 | 26500
Jukka Corander | 66 | 411 | 17220
Jaakko Kangasjärvi | 62 | 146 | 17096
Aapo Hyvärinen | 61 | 301 | 44146
Samuel Kaski | 58 | 522 | 14180
Nadarajah Asokan | 58 | 327 | 11947
Aristides Gionis | 58 | 292 | 19300
Hannu Toivonen | 56 | 192 | 19316
Nicola Zamboni | 53 | 128 | 11397
Jorma Rissanen | 52 | 151 | 22720
Tero Aittokallio | 52 | 271 | 8689
Juha Veijola | 52 | 261 | 19588
Juho Hamari | 51 | 176 | 16631
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 1
2022 | 4
2021 | 85
2020 | 97
2019 | 140
2018 | 127