Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility organization based in Espoo, Finland. It is known for research contributions on the topics of population and Bayesian networks. The organization has 630 authors who have published 1,962 publications that have received 63,426 citations.


Papers
Journal Article · DOI
TL;DR: The experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested.
Abstract: High-dimensional phenotypes hold promise for richer findings in association studies, but testing several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for jointly testing all traits in a high-dimensional vector of phenotypes, with the prospect of increased power to detect small effects that would be missed if tested individually. However, the methods have rarely been compared to an extent that enables assessment of their relative merits and the setting up of guidelines on which method to use, and how to use it. We compare the methods on simulated data and on a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires dividing the problem into manageable parts that facilitate parallel processing, for example parts corresponding to individual genetic variants, pathways, or genes. Here we use a straightforward formulation in which the genome is divided into blocks of nearby correlated genetic markers that are tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods take advantage of combining information over several correlated variables not only on the phenotype side but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.

25 citations
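The block-wise joint test above rests on canonical correlation analysis between a block of nearby SNPs and a vector of phenotypes. As a minimal sketch, assuming stdlib-only Python and hypothetical helper names, the code below computes the first canonical correlation as the square root of the largest eigenvalue of S_xx^-1 S_xy S_yy^-1 S_yx, found by power iteration. It is not the paper's pipeline: the toy data are simulated here, and a real GWAS analysis would use a numerical library and a formal test (for example Wilks' lambda) rather than the bare statistic.

```python
import math
import random

def center(cols):
    """Subtract each variable's mean; data are stored as a list of columns."""
    out = []
    for c in cols:
        mu = sum(c) / len(c)
        out.append([v - mu for v in c])
    return out

def cross_cov(a_cols, b_cols):
    """Sample cross-covariance matrix between two lists of centered columns."""
    n = len(a_cols[0])
    return [[sum(x * y for x, y in zip(a, b)) / (n - 1) for b in b_cols] for a in a_cols]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (assumes m is non-singular)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def first_canonical_correlation(x_cols, y_cols, iters=500):
    """Largest canonical correlation between variable blocks X and Y."""
    x, y = center(x_cols), center(y_cols)
    sxx, syy, sxy = cross_cov(x, x), cross_cov(y, y), cross_cov(x, y)
    # The largest eigenvalue of Sxx^-1 Sxy Syy^-1 Syx equals rho_1 squared.
    m = matmul(matmul(inverse(sxx), sxy), matmul(inverse(syy), transpose(sxy)))
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iters):  # power iteration with max-norm scaling
        w = [sum(mi * vi for mi, vi in zip(row, v)) for row in m]
        lam = max(abs(c) for c in w)
        if lam == 0.0:
            return 0.0
        v = [c / lam for c in w]
    return math.sqrt(min(lam, 1.0))

# Hypothetical toy data: 200 samples, a block of 3 SNPs coded 0/1/2 and 4 phenotypes.
random.seed(1)
n = 200
snps = [[random.choice([0, 1, 2]) for _ in range(n)] for _ in range(3)]
null_pheno = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]
trait = [a + 0.5 * b + random.gauss(0, 0.1) for a, b in zip(snps[0], snps[1])]
assoc_pheno = [trait] + null_pheno[1:]

rho_assoc = first_canonical_correlation(snps, assoc_pheno)
rho_null = first_canonical_correlation(snps, null_pheno)
print(f"associated block: {rho_assoc:.3f}, null block: {rho_null:.3f}")
```

When one phenotype depends on a linear combination of the SNPs, the first canonical correlation approaches 1, whereas unrelated blocks give a small value. This illustrates how CCA pools information on both the genotype and the phenotype side of a block.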

Proceedings Article · DOI
12 Nov 2012
TL;DR: A modification to the generalized measure of association framework that reduces the effect of temporal structure in time series and can capture pairwise dependence between generated signals as well as their envelopes with good statistical power is proposed.
Abstract: The purpose of this paper is two-fold: first, to propose a modification to the generalized measure of association (GMA) framework that reduces the effect of temporal structure in time series; second, to assess the reliability of using association methods to capture dependence between pairs of EEG channels using their time series or envelopes. To achieve the first goal, the GMA algorithm was updated so as to minimize the effect of the correlation inherent in the time structure. The reliability of the modified scheme was then assessed on both synthetic and real data. Synthetic data was generated from a Clayton copula, for which null hypotheses of uncorrelatedness were constructed for the signal. The signal was processed such that the envelope emulated important characteristics of experimental EEG data. Results show that the modified GMA procedure can capture pairwise dependence between generated signals as well as their envelopes with good statistical power. Furthermore, applying GMA and Kendall's tau to quantify dependence using the extracted envelopes of processed EEG data concords with previous findings using the signal itself.

24 citations
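The abstract above validates dependence measures on synthetic data drawn from a Clayton copula and also uses Kendall's tau. The modified GMA procedure itself is not reproduced here. This sketch only shows two standard ingredients named in the abstract: sampling a Clayton copula by conditional inversion, and estimating Kendall's tau, whose population value for a Clayton copula with parameter θ is θ/(θ + 2). The sample size, θ, and the seed are arbitrary choices made for illustration.

```python
import random

def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula (theta > 0) by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids raising 0 to a negative power
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, O(n^2)."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if prod > 0:
                s += 1
            elif prod < 0:
                s -= 1
    return s / (n * (n - 1) / 2)

rng = random.Random(7)
theta = 2.0
pairs = sample_clayton(800, theta, rng)
us = [u for u, _ in pairs]
vs = [v for _, v in pairs]

tau_hat = kendall_tau(us, vs)
tau_theory = theta / (theta + 2.0)  # population Kendall's tau of a Clayton copula

# Naive null: shuffling one series destroys all dependence. For autocorrelated
# signals such as EEG, a plain shuffle ignores temporal structure, which is the
# issue the modified GMA procedure is meant to address.
shuffled = vs[:]
rng.shuffle(shuffled)
tau_null = kendall_tau(us, shuffled)
print(f"tau_hat={tau_hat:.3f}, theory={tau_theory:.3f}, shuffled={tau_null:.3f}")
```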

Proceedings Article · DOI
25 Oct 2008
TL;DR: This paper shows how to efficiently find a universal map whose expected cost is O(log mn) times the expected optimal cost, and shows how all these universal mappings give us stochastic online algorithms with the same competitive factors.
Abstract: Given a universe U of n elements and a weighted collection $\mathcal{S}$ of m subsets of U, the universal set cover problem is to map, a priori, each element $u \in U$ to a set $S(u) \in \mathcal{S}$ containing u, so that any $X \subseteq U$ is covered by $S(X) = \bigcup_{u \in X} S(u)$. The aim is to find a mapping such that the cost of S(X) is as close as possible to the optimal set-cover cost for X. (Such problems are also called oblivious or a-priori optimization problems.) Unfortunately, for every universal mapping, the cost of S(X) can be $\Omega(\sqrt{n})$ times larger than optimal if the set X is adversarially chosen. In this paper we study the performance on average, when X is a set of randomly chosen elements from the universe: we show how to efficiently find a universal map whose expected cost is $O(\log mn)$ times the expected optimal cost. In fact, we give a slightly improved analysis and show that this is the best possible. We generalize these ideas to weighted set cover and show similar guarantees for (non-metric) facility location, where we have to balance the facility opening cost with the cost of connecting clients to the facilities. We show applications of our results to universal multi-cut and disc-covering problems, and show how all these universal mappings give us stochastic online algorithms with the same competitive factors.

24 citations
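The abstract above judges a universal mapping by the expected cost of S(X) relative to the expected optimal cover cost. The sketch below does not implement the paper's algorithm for finding a good map; it only evaluates that objective exactly on a tiny hypothetical instance by enumerating every X. It assumes that each element joins X independently with probability p. That random model and all set names and weights are illustrative assumptions.

```python
import math
from itertools import combinations

def mapping_cost(mapping, weights, xs):
    """Cost of S(X): total weight of the distinct sets the mapping assigns to X."""
    return sum(weights[s] for s in {mapping[u] for u in xs})

def optimal_cover_cost(sets, weights, xs):
    """Minimum-weight cover of xs by brute force (exponential; tiny instances only)."""
    names = list(sets)
    best = math.inf
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            covered = set().union(*(sets[s] for s in combo))
            if xs <= covered:
                best = min(best, sum(weights[s] for s in combo))
    return best

def expected_costs(mapping, sets, weights, universe, p):
    """Exact expectations when each element joins X independently with probability p."""
    elems = sorted(universe)
    exp_map = exp_opt = 0.0
    for r in range(len(elems) + 1):
        for chosen in combinations(elems, r):
            prob = p ** r * (1 - p) ** (len(elems) - r)
            xs = set(chosen)
            exp_map += prob * mapping_cost(mapping, weights, xs)
            exp_opt += prob * optimal_cover_cost(sets, weights, xs)
    return exp_map, exp_opt

# Hypothetical instance: one big set A covering everything, plus cheap singletons.
sets = {"A": {1, 2, 3, 4}, "B": {1}, "C": {2}, "D": {3}, "E": {4}}
weights = {"A": 3, "B": 1, "C": 1, "D": 1, "E": 1}
to_singletons = {1: "B", 2: "C", 3: "D", 4: "E"}
to_big_set = {u: "A" for u in range(1, 5)}

print(expected_costs(to_singletons, sets, weights, {1, 2, 3, 4}, 0.5))  # (2.0, 1.9375)
print(expected_costs(to_big_set, sets, weights, {1, 2, 3, 4}, 0.5))     # (2.8125, 1.9375)
```

Mapping every element to its own singleton costs 2.0 in expectation against an optimum of 1.9375. Committing every element to the big set A costs 2.8125, because it pays for A even when X contains a single element.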

Journal Article · DOI
TL;DR: By exhaustive computer search, the list of five known biplanes with k = 11 is shown to be complete, which further implies that there exists no 3-(57, 12, 2) design.
Abstract: A biplane is a 2-(k(k − 1)/2 + 1,k,2) symmetric design. Only sixteen nontrivial biplanes are known: there are exactly nine biplanes with k < 11, at least five biplanes with k = 11, and at least two biplanes with k = 13. It is here shown by exhaustive computer search that the list of five known biplanes with k = 11 is complete. This result further implies that there exists no 3-(57, 12, 2) design, no 11211 symmetric configuration, and no (324, 57, 0, 12) strongly regular graph. The five biplanes have 16 residual designs, which by the Hall–Connor theorem constitute a complete classification of the 2-(45, 9, 2) designs. © 2007 Wiley Periodicals, Inc. J Combin Designs 16: 117–127, 2008

24 citations
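The parameter arithmetic in the abstract above can be checked directly. A biplane with block size k has v = k(k − 1)/2 + 1 points, so k = 11 gives v = 56. The residual of a symmetric 2-(v, k, λ) design with respect to a block is a 2-(v − k, k − λ, λ) design, which gives the 2-(45, 9, 2) parameters quoted. A 3-(57, 12, 2) design would have, at each point, a derived 2-(56, 11, 2) design, that is, a biplane with k = 11. The helper names below are illustrative. The small example uses the standard fact that complements of the Fano plane's lines form a 2-(7, 4, 2) biplane.

```python
from itertools import combinations

def biplane_points(k):
    """A biplane is a symmetric 2-(v, k, 2) design with v = k(k - 1)/2 + 1."""
    return k * (k - 1) // 2 + 1

def residual_params(v, k, lam):
    """Residual of a symmetric 2-(v, k, lam) design w.r.t. one block: 2-(v - k, k - lam, lam)."""
    return (v - k, k - lam, lam)

def derived_params(v, k, lam):
    """Derived design of a 3-(v, k, lam) design at a point: 2-(v - 1, k - 1, lam)."""
    return (v - 1, k - 1, lam)

def is_2_design(points, blocks, lam):
    """True if every pair of distinct points lies in exactly lam blocks."""
    return all(sum(1 for b in blocks if p in b and q in b) == lam
               for p, q in combinations(sorted(points), 2))

# Complements of the seven Fano-plane lines form a 2-(7, 4, 2) biplane.
fano_lines = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
points = set(range(7))
biplane_k4 = [points - line for line in fano_lines]

print(biplane_points(4), is_2_design(points, biplane_k4, 2))          # 7 True
print(biplane_points(11), residual_params(biplane_points(11), 11, 2))  # 56 (45, 9, 2)
print(derived_params(57, 12, 2))                                       # (56, 11, 2)
```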

Journal Article · DOI
TL;DR: The study implies that genomic selection can be used to capture the yield potential in G×E effects for future growing seasons, providing a possible means to achieve the yield improvements needed to feed the growing population.
Abstract: MOTIVATION Interaction between the genotype and the environment (G×E) has a strong impact on the yield of major crop plants. Although influential, taking G×E explicitly into account in plant breeding has remained difficult. Recently G×E has been predicted from environmental and genomic covariates, but existing work has not shown that generalization to new environments and years is possible without access to in-season data, and practical applicability remains unclear. Using data from a barley breeding programme in Finland, we construct an in silico experiment to study the viability of G×E prediction under practical constraints. RESULTS We show that the response to the environment of a new generation of untested barley cultivars can be predicted in new locations and years using genomic data, machine learning and historical weather observations for the new locations. Our results highlight the need for models of G×E: non-linear effects clearly dominate linear ones, and the interaction between the soil type and daily rain is identified as the main driver of G×E for barley in Finland. Our study implies that genomic selection can be used to capture the yield potential in G×E effects for future growing seasons, providing a possible means to achieve the yield improvements needed to feed the growing population. AVAILABILITY AND IMPLEMENTATION The data, accompanied by the method code (http://research.cs.aalto.fi/pml/software/gxe/bioinformatics_codes.zip), are available in the form of kernels to allow reproducing the results. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

24 citations
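The abstract above notes that the data are released in the form of kernels. A common kernel construction for G×E multiplies a genomic kernel elementwise with an environment kernel built from weather covariates, then fits kernel ridge regression. Predicting a cultivar in an unseen site-year then requires only that site-year's covariates. This is a hedged illustration, not the authors' model. The linear and Gaussian kernel choices, the regularization values, the covariate names, and all numbers below are hypothetical.

```python
import math

def linear_kernel(rows_a, rows_b):
    """Inner products of marker vectors (a GBLUP-style genomic similarity)."""
    return [[sum(x * y for x, y in zip(a, b)) for b in rows_b] for a in rows_a]

def rbf_kernel(rows_a, rows_b, gamma):
    """Gaussian kernel on environmental covariates (non-linear in the weather)."""
    return [[math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b))) for b in rows_b]
            for a in rows_a]

def gxe_kernel(obs_a, obs_b, gamma):
    """Elementwise product of genomic and environment kernels models G x E interaction."""
    kg = linear_kernel([g for g, _ in obs_a], [g for g, _ in obs_b])
    ke = rbf_kernel([e for _, e in obs_a], [e for _, e in obs_b], gamma)
    return [[a * b for a, b in zip(row_g, row_e)] for row_g, row_e in zip(kg, ke)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def fit(k_train, y, lam):
    """Kernel ridge regression: alpha = (K + lam I)^-1 y."""
    n = len(y)
    a = [[k_train[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(a, y)

# Hypothetical toy data: marker vectors per cultivar, [rain, soil] covariates per site-year.
markers = {"G1": [0, 1, 2], "G2": [2, 1, 0], "G3": [1, 2, 1]}
weather = {"E1": [0.2, 1.0], "E2": [0.8, 0.0], "E_new": [0.5, 0.5]}
train = [(markers[g], weather[e]) for e in ("E1", "E2") for g in ("G1", "G2", "G3")]
y_train = [4.1, 3.2, 3.8, 4.5, 4.0, 3.6]  # illustrative yields, not real data

lam, gamma = 0.1, 1.0
k_train = gxe_kernel(train, train, gamma)
alpha = fit(k_train, y_train, lam)

# Predict the same cultivars in an environment absent from training.
test = [(markers[g], weather["E_new"]) for g in ("G1", "G2", "G3")]
k_test = gxe_kernel(test, train, gamma)
preds = [sum(k * a for k, a in zip(row, alpha)) for row in k_test]
print([round(p, 3) for p in preds])
```

The elementwise (Hadamard) product of two positive semidefinite kernels is itself positive semidefinite. Adding λI with λ > 0 therefore keeps the ridge system solvable.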


Authors

Showing all 632 results

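| Name | H-index | Papers | Citations |
|---|---|---|---|
| Dimitri P. Bertsekas | 94 | 332 | 85939 |
| Olli Kallioniemi | 90 | 353 | 42021 |
| Heikki Mannila | 72 | 295 | 26500 |
| Jukka Corander | 66 | 411 | 17220 |
| Jaakko Kangasjärvi | 62 | 146 | 17096 |
| Aapo Hyvärinen | 61 | 301 | 44146 |
| Samuel Kaski | 58 | 522 | 14180 |
| Nadarajah Asokan | 58 | 327 | 11947 |
| Aristides Gionis | 58 | 292 | 19300 |
| Hannu Toivonen | 56 | 192 | 19316 |
| Nicola Zamboni | 53 | 128 | 11397 |
| Jorma Rissanen | 52 | 151 | 22720 |
| Tero Aittokallio | 52 | 271 | 8689 |
| Juha Veijola | 52 | 261 | 19588 |
| Juho Hamari | 51 | 176 | 16631 |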
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance Metrics
No. of papers from the institution in previous years:

| Year | Papers |
|---|---|
| 2023 | 1 |
| 2022 | 4 |
| 2021 | 85 |
| 2020 | 97 |
| 2019 | 140 |
| 2018 | 127 |