Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a research facility based in Espoo, Finland. It is known for its research contributions in the topics: Population & Bayesian network. The organization has 630 authors who have published 1,962 publications receiving 63,426 citations.


Papers
Journal ArticleDOI
TL;DR: This work proposes a novel semi-supervised spectral ordering algorithm that modifies the Laplacian matrix such that domain knowledge is taken into account and demonstrates the effectiveness of the proposed framework on the seriation of Usenet newsgroup messages.
Abstract: Several studies have demonstrated the promise of spectral ordering for data mining. One successful application is the seriation of paleontological findings, i.e., ordering the sites of excavation using data on mammal co-occurrences only. However, spectral ordering ignores background knowledge that is naturally present in the domain: paleontologists can derive the ages of the sites within some accuracy. On the other hand, the age information is uncertain, so the best approach is to combine the background knowledge with the information on mammal co-occurrences. Motivated by this kind of partial supervision, we propose a novel semi-supervised spectral ordering algorithm that modifies the Laplacian matrix so that domain knowledge is taken into account. It also performs feature selection by discarding the features that contribute most to the unwanted variability of the data under bootstrap sampling. Moreover, we demonstrate the effectiveness of the proposed framework on the seriation of Usenet newsgroup messages, where the task is to uncover the underlying flow of discussion. The theoretical properties of our algorithm are thoroughly analyzed, and we show that the proposed framework enhances the stability of the spectral ordering output and yields computational gains.

12 citations
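The core computation behind spectral ordering is compact enough to sketch. Below is a minimal NumPy illustration, not the paper's exact algorithm: it orders items by the Fiedler vector of a graph Laplacian, and the `prior`/`alpha` blending shown here is a hypothetical stand-in for the paper's semi-supervised Laplacian modification (the bootstrap-based feature selection is omitted).

```python
import numpy as np

def spectral_order(W, prior=None, alpha=0.5):
    """Order items by the Fiedler vector of a graph Laplacian.

    W     : symmetric non-negative similarity matrix
            (e.g. site-by-site mammal co-occurrence counts)
    prior : optional rough ordering values (e.g. estimated site ages);
            the blending below is a hypothetical stand-in for the
            paper's semi-supervised Laplacian modification
    alpha : weight of the unsupervised term when a prior is given
    """
    L = np.diag(W.sum(axis=1)) - W                 # unnormalised Laplacian
    if prior is not None:
        prior = np.asarray(prior, dtype=float)
        # Build a second Laplacian from prior proximity and blend it in.
        P = np.exp(-np.abs(prior[:, None] - prior[None, :]))
        L = alpha * L + (1 - alpha) * (np.diag(P.sum(axis=1)) - P)
    vals, vecs = np.linalg.eigh(L)                 # eigenvalues ascending
    return np.argsort(vecs[:, 1])                  # sort by Fiedler vector

# Toy run: 5 items with random symmetric similarities and a weak age prior.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
print(spectral_order((A + A.T) / 2, prior=[1, 2, 3, 4, 5]))
```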

09 Jun 2017
TL;DR: This thesis proposes to replace the single-reference model with a pan-genomic model, motivated by variation calling and haplotyping, and incrementally builds a novel pipeline for variant calling.
Abstract: The advent of Next-Generation Sequencing brought new challenges for biological sequence analysis: larger volumes of sequencing data, a proliferation of research projects relying on genomic analysis, and the materialization of rich genomic databases, to name a few. A recent example of the latter, gnomAD, contains more than 15,000 whole human genomes from unrelated individuals. Today a pressing challenge is how to leverage the full potential of such pan-genomic collections. Among the many biological sequencing processes that rely on computational methods, this thesis is motivated by variation calling and haplotyping. Variation calling is the process of characterizing an individual’s genome by identifying how it differs from a reference genome. The standard approach is to first obtain a set of small DNA fragments – called reads – from a biological sample. Genetic variants in the individual’s genome are detected by analyzing the alignment of these reads to the reference. A related procedure is haplotype phasing. Sexual organisms have their genome organized in two sets of chromosomes with equivalent functions; one set is inherited from the mother and the other from the father, and their elements are called haplotypes. The haplotype phasing problem is, once genetic variants are discovered, to attribute them to either of the haplotypes. The first part of this thesis incrementally builds a novel pipeline for variant calling. We propose to replace the single-reference model with a pan-genomic …

12 citations
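For readers unfamiliar with the step the thesis builds on, single-reference variant calling can be illustrated with a toy example. The sketch below is deliberately naive: it assumes reads are already aligned and summarized as per-position base pileups, the `min_depth` and `min_frac` thresholds are illustrative, and none of this reflects the pan-genomic pipeline the thesis actually develops.

```python
from collections import Counter

def call_snvs(reference, pileups, min_depth=8, min_frac=0.2):
    """Toy single-nucleotide variant caller over a single reference.

    reference : reference sequence, e.g. "ACGTA..."
    pileups   : dict position -> list of bases seen in reads aligned there
    Thresholds are illustrative; real callers model base/mapping quality.
    """
    variants = []
    for pos, bases in sorted(pileups.items()):
        if len(bases) < min_depth:
            continue                              # too little coverage to call
        counts = Counter(bases)
        ref_base = reference[pos]
        alt, alt_count = max(
            ((b, c) for b, c in counts.items() if b != ref_base),
            key=lambda bc: bc[1], default=(None, 0))
        if alt is not None and alt_count / len(bases) >= min_frac:
            variants.append((pos, ref_base, alt))  # (position, ref, alt)
    return variants

# Example: position 3 is covered by 10 reads, 6 of which read 'G' over ref 'T'.
print(call_snvs("ACGTA", {3: list("GGGGGGTTTT")}))  # -> [(3, 'T', 'G')]
```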

Book ChapterDOI
TL;DR: In this paper, a unified theory for analysis of components in discrete data is presented, and the main families of algorithms discussed are a variational approximation, Gibbs sampling, and Rao-Blackwellised Gibbs sampling.
Abstract: This article presents a unified theory for analysis of components in discrete data, and compares the methods with techniques such as independent component analysis, non-negative matrix factorisation and latent Dirichlet allocation. The main families of algorithms discussed are a variational approximation, Gibbs sampling, and Rao-Blackwellised Gibbs sampling. Applications are presented for voting records from the United States Senate for 2003, and for the Reuters-21578 newswire collection.

12 citations
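Of the algorithm families the chapter compares, Gibbs sampling is the easiest to illustrate. Below is a generic collapsed Gibbs sampler for an LDA-style discrete component model; this is a textbook sketch rather than the chapter's exact formulation, and the Dirichlet hyperparameters `alpha` and `beta` shown are illustrative values.

```python
import numpy as np

def collapsed_gibbs(docs, n_topics, n_words, n_iter=200,
                    alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for an LDA-style discrete component model.

    docs : list of documents, each a list of word ids in [0, n_words).
    Returns document-component and component-word count matrices.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))          # doc-component counts
    nkw = np.zeros((n_topics, n_words))            # component-word counts
    nk = np.zeros(n_topics)                        # component totals
    z = []                                         # assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                        # unassign current token
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional p(z = k | everything else).
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + beta * n_words)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Tiny demo: two documents over a 5-word vocabulary, two components.
ndk, nkw = collapsed_gibbs([[0, 1, 2, 1], [3, 4, 3, 4, 4]],
                           n_topics=2, n_words=5)
print(ndk)
```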

Journal ArticleDOI
TL;DR: In the past few years, collaborative robots (i.e., cobots) have been widely adopted within industrial manufacturing, as discussed by the authors. Although robots can support companies and workers in carrying out complex activ...
Abstract: In the past few years, collaborative robots (i.e., cobots) have been widely adopted within industrial manufacturing. Although robots can support companies and workers in carrying out complex activ...

12 citations

Proceedings ArticleDOI
01 Nov 2018
TL;DR: This paper proposes a novel approach to the problem of breaking filter bubbles in social media by maximizing the diversity of the information exposed to connected social-media users, and resorts to polynomial-time non-exact algorithms inspired by solutions developed for the quadratic knapsack problem.
Abstract: Social media have great potential to improve information dissemination in our society, yet they have been held accountable for a number of undesirable effects, such as polarization and filter bubbles. It is thus important to understand these negative phenomena and develop methods to combat them. In this paper we propose a novel approach to address the problem of breaking filter bubbles in social media. We do so by aiming to maximize the diversity of the information exposed to connected social-media users. We formulate the problem of maximizing the diversity of exposure as a quadratic-knapsack problem. We show that the proposed diversity-maximization problem is inapproximable, and thus, we resort to polynomial-time non-exact algorithms, inspired by solutions developed for the quadratic knapsack problem, as well as scalable greedy heuristics. We complement our algorithms with instance-specific upper bounds, which are used to provide empirical approximation guarantees for the given problem instances. Our experimental evaluation shows that a proposed greedy algorithm followed by randomized local search is the algorithm of choice given its quality-vs-efficiency trade-off.

12 citations
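The algorithm of choice named in the abstract, greedy selection followed by randomized local search, can be sketched compactly. The following is a simplified illustration under assumed inputs (`cost`, `pair_div`, and `budget` are hypothetical interfaces), not the paper's exact method: it greedily adds the item with the best marginal-diversity-per-cost ratio, then tries random swaps that improve the quadratic objective within the budget.

```python
import random

def greedy_diversity(items, cost, pair_div, budget, n_swaps=500, seed=0):
    """Greedy selection plus randomized local search for a
    quadratic-knapsack-style diversity objective.

    items    : list of candidate ids (sortable, e.g. ints)
    cost     : dict id -> selection cost
    pair_div : function (i, j) -> diversity contributed by the pair
    budget   : total cost allowed
    """
    rng = random.Random(seed)

    def value(S):
        S = sorted(S)
        return sum(pair_div(S[a], S[b])
                   for a in range(len(S)) for b in range(a + 1, len(S)))

    # Greedy phase: repeatedly add the affordable item with the best
    # marginal-diversity-per-cost ratio.
    S, spent = set(), 0.0
    while True:
        best = max(((i, sum(pair_div(i, j) for j in S) / cost[i])
                    for i in items if i not in S and spent + cost[i] <= budget),
                   key=lambda t: t[1], default=None)
        if best is None:
            break
        S.add(best[0]); spent += cost[best[0]]

    # Local search phase: random swaps that improve the objective in budget.
    for _ in range(n_swaps):
        if not S:
            break
        out, inn = rng.choice(sorted(S)), rng.choice(items)
        if inn in S:
            continue
        T = (S - {out}) | {inn}
        if sum(cost[i] for i in T) <= budget and value(T) > value(S):
            S = T
    return S

# Toy run: unit costs, diversity = pairwise distance, room for 3 picks.
items = list(range(6))
print(sorted(greedy_diversity(items, {i: 1.0 for i in items},
                              lambda i, j: abs(i - j), budget=3)))
```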


Authors

Showing all 632 results

Name                          H-index   Papers   Citations
Dimitri P. Bertsekas          94        332      85939
Olli Kallioniemi              90        353      42021
Heikki Mannila                72        295      26500
Jukka Corander                66        411      17220
Jaakko Kangasjärvi            62        146      17096
Aapo Hyvärinen                61        301      44146
Samuel Kaski                  58        522      14180
Nadarajah Asokan              58        327      11947
Aristides Gionis              58        292      19300
Hannu Toivonen                56        192      19316
Nicola Zamboni                53        128      11397
Jorma Rissanen                52        151      22720
Tero Aittokallio              52        271      8689
Juha Veijola                  52        261      19588
Juho Hamari                   51        176      16631
Network Information

Related Institutions (5)

Google: 39.8K papers, 2.1M citations (93% related)
Microsoft: 86.9K papers, 4.1M citations (93% related)
Carnegie Mellon University: 104.3K papers, 5.9M citations (91% related)
Facebook: 10.9K papers, 570.1K citations (91% related)

Performance Metrics
No. of papers from the Institution in previous years
Year   Papers
2023   1
2022   4
2021   85
2020   97
2019   140
2018   127