Institution

Helsinki Institute for Information Technology

Facility•Espoo, Finland•

About: Helsinki Institute for Information Technology is a facility organization based out in Espoo, Finland. It is known for research contribution in the topics: Population & Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.

...read moreread less

Topics: Population, Bayesian network, Mobile computing, The Internet, Approximation algorithm ...read more

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Approximate all-pairs suffix/prefix overlaps

[...]

Niko Välimäki¹, Susana Ladra², Veli Mäkinen¹•Institutions (2)

Helsinki Institute for Information Technology¹, University of A Coruña²

01 Apr 2012-Information & Computation

TL;DR: This work proposes a new solution for approximate overlaps based on backward backtracking (Lam, et al., 2008) and suffix filters (Karkkainen and Na, 2008), and uses nH"k+o([email protected])+rlogr bits of space, where H"k is the k-th order entropy and @s the alphabet size.

...read moreread less

Abstract: Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of strings of total length n and an error-rate @e, the goal is to find, for all-pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance [email protected][email protected]@[email protected]?, where @? is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam, et al., 2008) and suffix filters (Karkkainen and Na, 2008). Our technique uses nH"k+o([email protected])+rlogr bits of space, where H"k is the k-th order entropy and @s the alphabet size. In practice, it is more scalable in terms of space, and comparable in terms of time, than q-gram filters (Rasmussen, et al., 2006). Our method is also easy to parallelize and scales up to millions of DNA reads.

...read moreread less

18 citations

Journal Article•DOI•

Simple integrative preprocessing preserves what is shared in data sources.

[...]

Abhishek Tripathi¹, Abhishek Tripathi², Arto Klami², Arto Klami³, Samuel Kaski², Samuel Kaski³ - Show less +2 more•Institutions (3)

University of Helsinki¹, Helsinki Institute for Information Technology², Helsinki University of Technology³

21 Feb 2008-BMC Bioinformatics

TL;DR: A method for the task of data fusion for exploratory data analysis, when statistical dependencies between the sources and not within a source are interesting, and inherits its good properties of being simple, fast, and easily interpretable as a linear projection.

...read moreread less

Abstract: Bioinformatics data analysis toolbox needs general-purpose, fast and easily interpretable preprocessing tools that perform data integration during exploratory data analysis. Our focus is on vector-valued data sources, each consisting of measurements of the same entity but on different variables, and on tasks where source-specific variation is considered noisy or not interesting. Principal components analysis of all sources combined together is an obvious choice if it is not important to distinguish between data source-specific and shared variation. Canonical Correlation Analysis (CCA) focuses on mutual dependencies and discards source-specific "noise" but it produces a separate set of components for each source. It turns out that components given by CCA can be combined easily to produce a linear and hence fast and easily interpretable feature extraction method. The method fuses together several sources, such that the properties they share are preserved. Source-specific variation is discarded as uninteresting. We give the details and implement them in a software tool. The method is demonstrated on gene expression measurements in three case studies: classification of cell cycle regulated genes in yeast, identification of differentially expressed genes in leukemia, and defining stress response in yeast. The software package is available at http://www.cis.hut.fi/projects/mi/software/drCCA/ . We introduced a method for the task of data fusion for exploratory data analysis, when statistical dependencies between the sources and not within a source are interesting. The method uses canonical correlation analysis in a new way for dimensionality reduction, and inherits its good properties of being simple, fast, and easily interpretable as a linear projection.

...read moreread less

18 citations

Journal Article•DOI•

Neuroadaptive modelling for generating images matching perceptual categories.

[...]

Lauri Kangassalo¹, Michiel M. Spapé¹, Tuukka Ruotsalo², Tuukka Ruotsalo¹•Institutions (2)

University of Helsinki¹, Helsinki Institute for Information Technology²

07 Sep 2020-Scientific Reports

TL;DR: Brain–computer interfaces enable active communication and execution of a pre-defined set of commands, such as typing a letter or moving a cursor, but they have thus far not been able to infer more complex intentions or adapt more complex output based on brain signals.

...read moreread less

Abstract: Brain-computer interfaces enable active communication and execution of a pre-defined set of commands, such as typing a letter or moving a cursor. However, they have thus far not been able to infer more complex intentions or adapt more complex output based on brain signals. Here, we present neuroadaptive generative modelling, which uses a participant's brain signals as feedback to adapt a boundless generative model and generate new information matching the participant's intentions. We report an experiment validating the paradigm in generating images of human faces. In the experiment, participants were asked to specifically focus on perceptual categories, such as old or young people, while being presented with computer-generated, photorealistic faces with varying visual features. Their EEG signals associated with the images were then used as a feedback signal to update a model of the user's intentions, from which new images were generated using a generative adversarial network. A double-blind follow-up with the participant evaluating the output shows that neuroadaptive modelling can be utilised to produce images matching the perceptual category features. The approach demonstrates brain-based creative augmentation between computers and humans for producing new information matching the human operator's perceptual categories.

...read moreread less

18 citations

Book Chapter•DOI•

Haplotype inference via hierarchical genotype parsing

[...]

Pasi Rastas¹, Esko Ukkonen¹•Institutions (1)

Helsinki Institute for Information Technology¹

08 Sep 2007

TL;DR: A polynomial-time parsing algorithm is formulated that finds minimum cross-over parse in a simplified 'flat' parsing model that ignores the historical hierarchy of recombinations.

...read moreread less

Abstract: The within-species genetic variation due to recombinations leads to a mosaic-like structure of DNA. This structure can be modeled, e.g. by parsing sample sequences of current DNA with respect to a small number of founders. The founders represent the ancestral sequence material from which the sample was created in a sequence of recombination steps. This scenario has recently been successfully applied on developing probabilistic Hidden Markov Methods for haplotyping genotypic data. In this paper we introduce a combinatorial method for haplotyping that is based on a similar parsing idea. We formulate a polynomial-time parsing algorithm that finds minimum cross-over parse in a simplified 'flat' parsing model that ignores the historical hierarchy of recombinations. The problem of constructing optimal founders that would give minimum possible parse for given genotypic sequences is shown NP-hard. A heuristic locally-optimal algorithm is given for founder construction. Combined with flat parsing this already gives quite good haplotyping results. Improved haplotyping is obtained by using a hierarchical parsing that properly models the natural recombination process. For finding short hierarchical parses a greedy polynomial-time algorithm is given. Empirical haplotyping results on HapMap data are reported.

...read moreread less

18 citations

Journal Article•DOI•

An expanded analysis framework for multivariate GWAS connects inflammatory biomarkers to functional variants and disease

[...]

Sanni Ruotsalainen¹, Juulia Partanen¹, Anna Cichonska², Anna Cichonska³, Anna Cichonska¹, Jake Lin, Christian Benner¹, Ida Surakka⁴, Mary Pat Reeve¹, Priit Palta⁵, Priit Palta¹, Marko Salmi², Sirpa Jalkanen², Ari Ahola-Olli¹, Ari Ahola-Olli⁶, Ari Ahola-Olli⁷, Aarno Palotie⁷, Aarno Palotie⁶, Aarno Palotie¹, Veikko Salomaa⁸, Mark J. Daly⁷, Mark J. Daly¹, Mark J. Daly⁶, Matti Pirinen¹, Samuli Ripatti⁷, Samuli Ripatti¹, Jukka Koskela⁷, Jukka Koskela¹ - Show less +24 more•Institutions (8)

University of Helsinki¹, University of Turku², Helsinki Institute for Information Technology³, University of Michigan⁴, University of Tartu⁵, Harvard University⁶, Broad Institute⁷, National Institute for Health and Welfare⁸

01 Feb 2021-European Journal of Human Genetics

TL;DR: A novel computational workflow for multivariate GWAS follow-up analyses, including fine-mapping and identification of the subset of traits driving associations (driver traits) and promoting the advancement of powerful multivariate methods in genomics.

...read moreread less

Abstract: Multivariate methods are known to increase the statistical power to detect associations in the case of shared genetic basis between phenotypes. They have, however, lacked essential analytic tools to follow-up and understand the biology underlying these associations. We developed a novel computational workflow for multivariate GWAS follow-up analyses, including fine-mapping and identification of the subset of traits driving associations (driver traits). Many follow-up tools require univariate regression coefficients which are lacking from multivariate results. Our method overcomes this problem by using Canonical Correlation Analysis to turn each multivariate association into its optimal univariate Linear Combination Phenotype (LCP). This enables an LCP-GWAS, which in turn generates the statistics required for follow-up analyses. We implemented our method on 12 highly correlated inflammatory biomarkers in a Finnish population-based study. Altogether, we identified 11 associations, four of which (F5, ABO, C1orf140 and PDGFRB) were not detected by biomarker-specific analyses. Fine-mapping identified 19 signals within the 11 loci and driver trait analysis determined the traits contributing to the associations. A phenome-wide association study on the 19 representative variants from the signals in 176,899 individuals from the FinnGen study revealed 53 disease associations (p < 1 × 10-4). Several reported pQTLs in the 11 loci provided orthogonal evidence for the biologically relevant functions of the representative variants. Our novel multivariate analysis workflow provides a powerful addition to standard univariate GWAS analyses by enabling multivariate GWAS follow-up and thus promoting the advancement of powerful multivariate methods in genomics.

...read moreread less

18 citations

Collapse

Authors

Showing all 632 results

Name	H-index	Papers	Citations
Dimitri P. Bertsekas	94	332	85939
Olli Kallioniemi	90	353	42021
Heikki Mannila	72	295	26500
Jukka Corander	66	411	17220
Jaakko Kangasjärvi	62	146	17096
Aapo Hyvärinen	61	301	44146
Samuel Kaski	58	522	14180
Nadarajah Asokan	58	327	11947
Aristides Gionis	58	292	19300
Hannu Toivonen	56	192	19316
Nicola Zamboni	53	128	11397
Jorma Rissanen	52	151	22720
Tero Aittokallio	52	271	8689
Juha Veijola	52	261	19588
Juho Hamari	51	176	16631

Network Information

Related Institutions (5)

Google

39.8K papers, 2.1M citations

93% related

Microsoft

86.9K papers, 4.1M citations

38.6K papers, 1.3M citations

92% related

Carnegie Mellon University

104.3K papers, 5.9M citations

91% related

Facebook

10.9K papers, 570.1K citations

91% related

Performance

Metrics

1,967

Papers

76,126

Citations

No. of papers from the Institution in previous years
Year	Papers
2023	1
2022	4
2021	85
2020	97
2019	140
2018	127