Institution

Helsinki Institute for Information Technology

Facility
Espoo, Finland

About: Helsinki Institute for Information Technology is a facility organization based in Espoo, Finland. It is known for its research contributions in the topics: Population & Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.


Papers
01 Jan 2019
TL;DR: The MaxSAT solver MAXINO as mentioned in this paper is based on the k-ProcessCore algorithm, a parametric algorithm generalizing OLL, ONE and PMRES; parameter k is determined dynamically for each unsatisfiable core, and satisfiability checks are delegated to a pseudo-Boolean extension of the SAT solver Glucose 4.1.
Abstract: Maxino is based on the k-ProcessCore algorithm, a parametric algorithm generalizing OLL, ONE and PMRES. Parameter k is dynamically determined for each processed unsatisfiable core by a function taking into account the size of the core. Roughly, k is in O(log n), where n is the size of the core. Satisfiability of propositional theories is checked by means of a pseudo-Boolean solver extending Glucose 4.1 (single thread).

A VERY SHORT DESCRIPTION OF THE SOLVER

The solver MAXINO is built on top of the SAT solver GLUCOSE [7] (version 4.1). MaxSAT instances are normalized by replacing non-unary soft clauses with fresh variables, a process known as relaxation. Specifically, the relaxation of a soft clause φ is the clause φ ∨ ¬x, where x is a variable not occurring elsewhere; moreover, the weight associated with clause φ is associated with the soft literal x. Hence, the normalized input processed by MAXINO comprises hard clauses and soft literals, so that the computational problem amounts to maximizing a linear function, which is defined by the soft literals, subject to a set of constraints, which is the set of hard clauses. The algorithm implemented by MAXINO to address this computational problem is based on unsatisfiable core analysis, and in particular takes advantage of the following invariant: a model of the constraints that satisfies all soft literals is an optimum model. The algorithm therefore starts by searching for such a model. If an inconsistency arises instead, the unsatisfiable core returned by the SAT solver is analyzed. The analysis of an unsatisfiable core results in new constraints and new soft literals, which replace the soft literals involved in the unsatisfiable core. The new constraints are essentially such that models satisfying all new soft literals actually satisfy all but one of the replaced soft literals.
Since there is no model that satisfies all replaced soft literals, it turns out that the invariant is preserved, and the process can be iterated. Specifically, the algorithm implemented by MAXINO is K, based on the k-ProcessCore procedure introduced by Alviano et al. [2]. It is a parametric algorithm generalizing OLL [3], ONE [2] and PMRES [8]. Intuitively, for an unsatisfiable core {x0, x1, x2, x3}, ONE introduces the following constraints:

x0 + x1 + x2 + x3 + ¬y1 + ¬y2 + ¬y3 ≥ 3
y1 → y2
y2 → y3

where y1, y2, y3 are fresh variables (the new soft literals that replace x0, x1, x2, x3). OLL introduces the following constraints (the first immediately, the second if a core containing y1 is subsequently found, and the third if a core containing y2 is subsequently found):

x0 + x1 + x2 + x3 + ¬y1 ≥ 3
x0 + x1 + x2 + x3 + ¬y2 ≥ 2
x0 + x1 + x2 + x3 + ¬y3 ≥ 1

Concerning PMRES, it introduces the following constraints:

x0 ∨ x1 ∨ ¬y1
z1 ↔ x0 ∧ x1
z1 ∨ x2 ∨ ¬y2
z2 ↔ z1 ∧ x2
z2 ∨ x3 ∨ ¬y3

which are essentially equivalent to the following constraints:

x0 + x1 + ¬z1 + ¬y1 ≥ 2
z1 → y1
z1 + x2 + ¬z2 + ¬y2 ≥ 2
z2 → y2
z2 + x3 + ¬y3 ≥ 1

where y1, y2, y3 are fresh variables (the new soft literals that replace x0, x1, x2, x3), and z1, z2 are fresh auxiliary variables. Algorithm K, instead, introduces a set of constraints of bounded size, where the bound is given by the chosen parameter k, and is specifically 2 · (k+1). ONE, which is essentially a smart encoding of OLL, is the special case for k = ∞, and PMRES is the special case for k = 1. For the example unsatisfiable core, another possibility is k = 2, which would result in the following constraints:

x0 + x1 + x2 + ¬z1 + ¬y1 + ¬y2 ≥ 3
z1 → y1
y1 → y2
z1 + x3 + ¬y3 ≥ 1

In this version of MAXINO, the parameter k is dynamically determined based on the size of the analyzed unsatisfiable core: k ∈ O(log n), where n is the size of the core. The analysis of each unsatisfiable core is preceded by a shrink procedure [1].
Specifically, a reiterated progression search is performed on the unsatisfiable core returned by the SAT solver. This procedure significantly reduces the size of the unsatisfiable core, even if it does not necessarily return an unsatisfiable core of minimal size; minimality of the unsatisfiable cores is not a requirement for the algorithm. Additionally, satisfiability checks performed during the shrinking process are subject to a budget on the number of conflicts, so that the overhead due to hard checks is limited. Specifically, the budget is set to the number of conflicts that arose in the satisfiability check that led to detecting the unsatisfiable core; if this number is less than 1000 (one thousand), the budget is raised to 1000. The budget is divided by 2 every time the progression is reiterated.

MaxSAT Evaluation 2017: Solver and Benchmark Descriptions, volume B-2017-2 of Department of Computer Science Series of Publications B, University of Helsinki, 2017.
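The normalization step described above, replacing each non-unary soft clause φ with a hard clause φ ∨ ¬x and a weighted soft literal x, can be sketched in a few lines. This is an illustrative reconstruction, not MAXINO's actual implementation; the function and variable names are hypothetical:

```python
def relax(hard_clauses, soft_clauses, next_var):
    """Normalize a MaxSAT instance: every non-unary soft clause (phi, w)
    becomes a hard clause phi OR NOT x plus a soft literal x of weight w,
    where x is a fresh variable.  Clauses are lists of DIMACS-style
    integer literals; next_var is the first unused variable index."""
    hard = list(hard_clauses)
    soft_literals = []
    for clause, weight in soft_clauses:
        if len(clause) == 1:
            # unary soft clauses already are soft literals
            soft_literals.append((clause[0], weight))
        else:
            x = next_var
            next_var += 1
            hard.append(clause + [-x])         # phi OR NOT x becomes hard
            soft_literals.append((x, weight))  # x carries the weight of phi
    return hard, soft_literals, next_var

# soft clause [3, 4] of weight 5 becomes hard [3, 4, -10] plus soft literal 10
hard, softs, nv = relax([[1, 2]], [([3, 4], 5), ([2], 1)], next_var=10)
```

After this step the solver only ever deals with hard clauses and weighted soft literals, which is exactly the input shape the core-analysis loop described in the abstract expects.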

22 citations

Journal Article
TL;DR: This work proposes a new low-rank learning method that approximately decomposes a sparse input similarity in a normalized way and its objective can be used to learn both cluster assignments and the number of clusters.
Abstract: Cluster analysis by nonnegative low-rank approximations has experienced remarkable progress in the past decade. However, the majority of such approximation approaches are still restricted to nonnegative matrix factorization (NMF) and suffer from the following two drawbacks: 1) they are unable to produce balanced partitions for large-scale manifold data, which are common in real-world clustering tasks; 2) most existing NMF-type clustering methods cannot automatically determine the number of clusters. We propose a new low-rank learning method to address these two problems, which goes beyond matrix factorization. Our method approximately decomposes a sparse input similarity in a normalized way, and its objective can be used to learn both cluster assignments and the number of clusters. For efficient optimization, we use a relaxed formulation based on a Data-Cluster-Data random walk, which is also shown to be equivalent to low-rank factorization of the doubly-stochastically normalized cluster incidence matrix. The probabilistic cluster assignments can thus be learned with a multiplicative majorization-minimization algorithm. Experimental results show that the new method is more accurate both in terms of clustering large-scale manifold data sets and of selecting the number of clusters.
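The doubly-stochastic normalization mentioned above can be illustrated with a simple Sinkhorn-style iteration on a strictly positive similarity matrix. This sketch only demonstrates the normalization concept; it is not the paper's actual optimization algorithm:

```python
import numpy as np

def doubly_stochastic(S, n_iter=200):
    """Sinkhorn-style alternating normalization: repeatedly rescale rows
    and columns of a strictly positive similarity matrix until both row
    and column sums are (approximately) one."""
    A = S.astype(float).copy()
    for _ in range(n_iter):
        A /= A.sum(axis=1, keepdims=True)  # make rows sum to 1
        A /= A.sum(axis=0, keepdims=True)  # make columns sum to 1
    return A

S = np.array([[1.0, 0.5],
              [0.5, 2.0]])
A = doubly_stochastic(S)
# after convergence every row and column of A sums to ~1
```

Normalizing both rows and columns is what pushes the method toward balanced partitions, one of the two drawbacks of plain NMF that the abstract highlights.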

22 citations

Book Chapter (DOI)
14 Jul 2015
TL;DR: This work gives a complete characterisation of the weakest oracle that leaks at least as much information as the unique identifiers in the context of local decision problems, and classifies scalar oracles as large and small, depending on their asymptotic behaviour.
Abstract: The role of unique node identifiers in network computing is well understood as far as symmetry breaking is concerned. However, the unique identifiers also leak information about the computing environment; in particular, they provide some nodes with information related to the size of the network. It was recently proved that in the context of local decision, there are some decision problems such that (1) they cannot be solved without unique identifiers, and (2) unique node identifiers leak a sufficient amount of information such that the problem becomes solvable [PODC 2013]. In this work we study the minimal amount of information that we need to leak from the environment to the nodes in order to solve local decision problems. Our key results are related to scalar oracles f that, for any given n, provide a multiset f(n) of n labels; the adversary then assigns the labels to the n nodes in the network. This is a direct generalisation of the usual assumption of unique node identifiers. We give a complete characterisation of the weakest oracle that leaks at least as much information as the unique identifiers. Our main result is the following dichotomy: we classify scalar oracles as large and small, depending on their asymptotic behaviour, and show that (1) any large oracle is at least as powerful as the unique identifiers in the context of local decision problems, while (2) for any small oracle there are local decision problems that still benefit from unique identifiers.

22 citations

Posted Content
TL;DR: A novel approach to accurate privacy accounting of the subsampled Gaussian mechanism using the recently introduced Fast Fourier Transform based accounting technique, giving strict lower and upper bounds for the true $(\varepsilon,\delta)$-values.
Abstract: We propose a numerical accountant for evaluating the tight $(\varepsilon,\delta)$-privacy loss for algorithms with discrete one dimensional output. The method is based on the privacy loss distribution formalism and it uses the recently introduced fast Fourier transform based accounting technique. We carry out an error analysis of the method in terms of moment bounds of the privacy loss distribution which leads to rigorous lower and upper bounds for the true $(\varepsilon,\delta)$-values. As an application, we present a novel approach to accurate privacy accounting of the subsampled Gaussian mechanism. This completes the previously proposed analysis by giving strict lower and upper bounds for the privacy parameters. We demonstrate the performance of the accountant on the binomial mechanism and show that our approach allows decreasing noise variance up to 75 percent at equal privacy compared to existing bounds in the literature. We also illustrate how to compute tight bounds for the exponential mechanism applied to counting queries.
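The core of the FFT-based accounting idea is that composing a mechanism with itself corresponds to convolving its discretized privacy loss distribution, which becomes a pointwise power in the frequency domain. The toy sketch below shows only this convolution step on a uniform grid; it is not the actual accountant, and omits the error analysis and the tail mass the paper tracks:

```python
import numpy as np

def compose_pld(p, k):
    """k-fold composition of a discretized privacy loss distribution:
    convolve the probability vector p with itself k times via the FFT
    (pointwise k-th power in the frequency domain)."""
    n = k * (len(p) - 1) + 1          # support size of the k-fold convolution
    m = 1 << (n - 1).bit_length()     # FFT length: next power of two >= n
    fp = np.fft.rfft(p, m)            # zero-padded real FFT
    out = np.fft.irfft(fp ** k, m)[:n]
    return np.clip(out, 0.0, None)    # clip tiny negative FFT round-off

p = np.array([0.2, 0.5, 0.3])         # toy one-step loss distribution
q = compose_pld(p, 3)                 # distribution after three compositions
```

Zero-padding to at least the full convolution length avoids circular wrap-around, so the FFT result agrees with direct repeated convolution while costing O(m log m) instead of O(n^2) per composition.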

22 citations

Book Chapter (DOI)
05 Sep 2011
TL;DR: A way to index population genotype information together with the complete genome sequence, so that one can use the index to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account.
Abstract: We propose a way to index population genotype information together with the complete genome sequence, so that one can use the index to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account. This is achieved through converting a multiple alignment of individual genomes into a finite automaton recognizing all strings that can be read from the alignment by switching the sequence at any time. The finite automaton is indexed with an extension of Burrows-Wheeler transform to allow pattern search inside the plausible recombinant sequences. The size of the index stays limited, because of the high similarity of individual genomes. The index finds applications in variation calling and in primer design.
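The language recognized by the automaton, every string readable from the alignment when the reader may switch rows at any point, can be illustrated by brute-force enumeration over a tiny gapless alignment. The real method indexes this language with a Burrows-Wheeler transform extension rather than enumerating it, so this is only a conceptual sketch with hypothetical names:

```python
from itertools import product

def recombinants(alignment):
    """Enumerate every string readable from a gapless multiple alignment
    when the reader may switch rows at any column boundary: each output
    position independently picks any character occurring in that column."""
    columns = [set(col) for col in zip(*alignment)]
    return {''.join(chars) for chars in product(*columns)}

# two genomes differing at one site admit exactly those two strings;
# a third row adding a second variant site also produces recombinants
recombinants(["ACG", "ATG", "GCG"])
```

Enumeration is exponential in the number of variant columns, which is precisely why the paper builds a finite automaton and a BWT-based index over it instead.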

22 citations


Authors

Showing all 632 results

Name | H-index | Papers | Citations
Dimitri P. Bertsekas | 94 | 332 | 85939
Olli Kallioniemi | 90 | 353 | 42021
Heikki Mannila | 72 | 295 | 26500
Jukka Corander | 66 | 411 | 17220
Jaakko Kangasjärvi | 62 | 146 | 17096
Aapo Hyvärinen | 61 | 301 | 44146
Samuel Kaski | 58 | 522 | 14180
Nadarajah Asokan | 58 | 327 | 11947
Aristides Gionis | 58 | 292 | 19300
Hannu Toivonen | 56 | 192 | 19316
Nicola Zamboni | 53 | 128 | 11397
Jorma Rissanen | 52 | 151 | 22720
Tero Aittokallio | 52 | 271 | 8689
Juha Veijola | 52 | 261 | 19588
Juho Hamari | 51 | 176 | 16631
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance
Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 1
2022 | 4
2021 | 85
2020 | 97
2019 | 140
2018 | 127