Institution

Helsinki Institute for Information Technology

Facility
Espoo, Finland

About: Helsinki Institute for Information Technology is a facility organization based in Espoo, Finland. It is known for its research contributions in the topics: Population & Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.


Papers
01 Jan 2019
TL;DR: The MaxSAT solver MAXINO as mentioned in this paper is based on the k-ProcessCore algorithm, a parametric algorithm generalizing OLL, ONE and PMRES; parameter k is determined dynamically for each unsatisfiable core, and satisfiability checks are delegated to a pseudo-Boolean extension of the SAT solver Glucose 4.1.
Abstract: Maxino is based on the k-ProcessCore algorithm, a parametric algorithm generalizing OLL, ONE and PMRES. Parameter k is dynamically determined for each processed unsatisfiable core by a function taking into account the size of the core. Roughly, k is in O(log n), where n is the size of the core. Satisfiability of propositional theories is checked by means of a pseudo-Boolean solver extending Glucose 4.1 (single thread).

A VERY SHORT DESCRIPTION OF THE SOLVER

The solver MAXINO is built on top of the SAT solver GLUCOSE [7] (version 4.1). MaxSAT instances are normalized by replacing non-unary soft clauses with fresh variables, a process known as relaxation. Specifically, the relaxation of a soft clause φ is the clause φ ∨ ¬x, where x is a variable not occurring elsewhere; moreover, the weight associated with clause φ is associated with the soft literal x. Hence, the normalized input processed by MAXINO comprises hard clauses and soft literals, so that the computational problem amounts to maximizing a linear function, which is defined by the soft literals, subject to a set of constraints, which is the set of hard clauses. The algorithm implemented by MAXINO to address this computational problem is based on unsatisfiable core analysis, and in particular takes advantage of the following invariant: a model of the constraints that satisfies all soft literals is an optimum model. The algorithm therefore starts by searching for such a model. If an inconsistency arises instead, the unsatisfiable core returned by the SAT solver is analyzed. The analysis of an unsatisfiable core results in new constraints and new soft literals, which replace the soft literals involved in the unsatisfiable core. The new constraints are essentially such that models satisfying all new soft literals actually satisfy all but one of the replaced soft literals.
Since there is no model that satisfies all replaced soft literals, it turns out that the invariant is preserved, and the process can be iterated. Specifically, the algorithm implemented by MAXINO is K, based on the k-ProcessCore procedure introduced by Alviano et al. [2]. It is a parametric algorithm generalizing OLL [3], ONE [2] and PMRES [8]. Intuitively, for an unsatisfiable core {x0, x1, x2, x3}, ONE introduces the following constraints:

x0 + x1 + x2 + x3 + ¬y1 + ¬y2 + ¬y3 ≥ 3
y1 → y2
y2 → y3

where y1, y2, y3 are fresh variables (the new soft literals that replace x0, x1, x2, x3). OLL introduces the following constraints (the first immediately, the second if a core containing y1 is subsequently found, and the third if a core containing y2 is subsequently found):

x0 + x1 + x2 + x3 + ¬y1 ≥ 3
x0 + x1 + x2 + x3 + ¬y2 ≥ 2
x0 + x1 + x2 + x3 + ¬y3 ≥ 1

Concerning PMRES, it introduces the following constraints:

x0 ∨ x1 ∨ ¬y1
z1 ↔ x0 ∧ x1
z1 ∨ x2 ∨ ¬y2
z2 ↔ z1 ∧ x2
z2 ∨ x3 ∨ ¬y3

which are essentially equivalent to the following constraints:

x0 + x1 + ¬z1 + ¬y1 ≥ 2
z1 → y1
z1 + x2 + ¬z2 + ¬y2 ≥ 2
z2 → y2
z2 + x3 + ¬y3 ≥ 1

where y1, y2, y3 are fresh variables (the new soft literals that replace x0, x1, x2, x3), and z1, z2 are fresh auxiliary variables. Algorithm K, instead, introduces a set of constraints of bounded size, where the bound is given by the chosen parameter k, and is specifically 2 · (k+1). ONE, which is essentially a smart encoding of OLL, is the special case for k = ∞, and PMRES is the special case for k = 1. For the example unsatisfiable core, another possibility is k = 2, which would result in the following constraints:

x0 + x1 + x2 + ¬z1 + ¬y1 + ¬y2 ≥ 3
z1 → y1
y1 → y2
z1 + x3 + ¬y3 ≥ 1

In this version of MAXINO, the parameter k is dynamically determined based on the size of the analyzed unsatisfiable core: k ∈ O(log n), where n is the size of the core. The analysis of each unsatisfiable core is preceded by a shrink procedure [1].
Specifically, a reiterated progression search is performed on the unsatisfiable core returned by the SAT solver. This procedure significantly reduces the size of the unsatisfiable core, even if it does not necessarily return an unsatisfiable core of minimal size; minimality of the unsatisfiable cores is not a requirement for the algorithm. Additionally, satisfiability checks performed during the shrinking process are subject to a budget on the number of conflicts, so that the overhead due to hard checks is limited. Specifically, the budget is set to the number of conflicts that arose in the satisfiability check that led to detecting the unsatisfiable core; if this number is less than 1000 (one thousand), the budget is raised to 1000. The budget is divided by 2 every time the progression is reiterated.

MaxSAT Evaluation 2017: Solver and Benchmark Descriptions, volume B-2017-2 of Department of Computer Science Series of Publications B, University of Helsinki, 2017.
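The normalization step described above, replacing each non-unary soft clause φ with a hard clause φ ∨ ¬x and a weighted soft literal x, can be sketched in a few lines. This is an illustrative reconstruction, not MAXINO's actual implementation; the function and variable names are hypothetical:

```python
def relax(hard_clauses, soft_clauses, next_var):
    """Normalize a MaxSAT instance: every non-unary soft clause (phi, w)
    becomes a hard clause phi OR NOT x plus a soft literal x of weight w,
    where x is a fresh variable.  Clauses are lists of DIMACS-style
    integer literals; next_var is the first unused variable index."""
    hard = list(hard_clauses)
    soft_literals = []
    for clause, weight in soft_clauses:
        if len(clause) == 1:
            # unary soft clauses already are soft literals
            soft_literals.append((clause[0], weight))
        else:
            x = next_var
            next_var += 1
            hard.append(clause + [-x])         # phi OR NOT x becomes hard
            soft_literals.append((x, weight))  # x carries the weight of phi
    return hard, soft_literals, next_var

# soft clause [3, 4] of weight 5 becomes hard [3, 4, -10] plus soft literal 10
hard, softs, nv = relax([[1, 2]], [([3, 4], 5), ([2], 1)], next_var=10)
```

After this step the solver only ever deals with hard clauses and weighted soft literals, which is exactly the input shape the core-analysis loop described in the abstract expects.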

22 citations

Journal Article
TL;DR: This work proposes a new low-rank learning method that approximately decomposes a sparse input similarity in a normalized way and its objective can be used to learn both cluster assignments and the number of clusters.
Abstract: Cluster analysis by nonnegative low-rank approximations has experienced remarkable progress in the past decade. However, the majority of such approximation approaches are still restricted to nonnegative matrix factorization (NMF) and suffer from the following two drawbacks: 1) they are unable to produce balanced partitions for large-scale manifold data, which are common in real-world clustering tasks; 2) most existing NMF-type clustering methods cannot automatically determine the number of clusters. We propose a new low-rank learning method to address these two problems, which goes beyond matrix factorization. Our method approximately decomposes a sparse input similarity in a normalized way, and its objective can be used to learn both cluster assignments and the number of clusters. For efficient optimization, we use a relaxed formulation based on a Data-Cluster-Data random walk, which is also shown to be equivalent to low-rank factorization of the doubly-stochastically normalized cluster incidence matrix. The probabilistic cluster assignments can thus be learned with a multiplicative majorization-minimization algorithm. Experimental results show that the new method is more accurate both in terms of clustering large-scale manifold data sets and of selecting the number of clusters.
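The doubly-stochastic normalization mentioned above can be illustrated with a simple Sinkhorn-style iteration on a strictly positive similarity matrix. This sketch only demonstrates the normalization concept; it is not the paper's actual optimization algorithm:

```python
import numpy as np

def doubly_stochastic(S, n_iter=200):
    """Sinkhorn-style alternating normalization: repeatedly rescale rows
    and columns of a strictly positive similarity matrix until both row
    and column sums are (approximately) one."""
    A = S.astype(float).copy()
    for _ in range(n_iter):
        A /= A.sum(axis=1, keepdims=True)  # make rows sum to 1
        A /= A.sum(axis=0, keepdims=True)  # make columns sum to 1
    return A

S = np.array([[1.0, 0.5],
              [0.5, 2.0]])
A = doubly_stochastic(S)
# after convergence every row and column of A sums to ~1
```

Normalizing both rows and columns is what pushes the method toward balanced partitions, one of the two drawbacks of plain NMF that the abstract highlights.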

22 citations

Book Chapter (DOI)
14 Jul 2015
TL;DR: This work gives a complete characterisation of the weakest oracle that leaks at least as much information as the unique identifiers in the context of local decision problems, and classifies scalar oracles as large and small, depending on their asymptotic behaviour.
Abstract: The role of unique node identifiers in network computing is well understood as far as symmetry breaking is concerned. However, the unique identifiers also leak information about the computing environment; in particular, they provide some nodes with information related to the size of the network. It was recently proved that in the context of local decision, there are some decision problems such that (1) they cannot be solved without unique identifiers, and (2) unique node identifiers leak a sufficient amount of information such that the problem becomes solvable [PODC 2013]. In this work we study the minimal amount of information that we need to leak from the environment to the nodes in order to solve local decision problems. Our key results are related to scalar oracles f that, for any given n, provide a multiset f(n) of n labels; the adversary then assigns the labels to the n nodes in the network. This is a direct generalisation of the usual assumption of unique node identifiers. We give a complete characterisation of the weakest oracle that leaks at least as much information as the unique identifiers. Our main result is the following dichotomy: we classify scalar oracles as large and small, depending on their asymptotic behaviour, and show that (1) any large oracle is at least as powerful as the unique identifiers in the context of local decision problems, while (2) for any small oracle there are local decision problems that still benefit from unique identifiers.

22 citations

Posted Content
TL;DR: A novel approach to accurate privacy accounting of the subsampled Gaussian mechanism using the recently introduced Fast Fourier Transform based accounting technique, giving strict lower and upper bounds for the true $(\varepsilon,\delta)$-values.
Abstract: We propose a numerical accountant for evaluating the tight $(\varepsilon,\delta)$-privacy loss for algorithms with discrete one dimensional output. The method is based on the privacy loss distribution formalism and it uses the recently introduced fast Fourier transform based accounting technique. We carry out an error analysis of the method in terms of moment bounds of the privacy loss distribution which leads to rigorous lower and upper bounds for the true $(\varepsilon,\delta)$-values. As an application, we present a novel approach to accurate privacy accounting of the subsampled Gaussian mechanism. This completes the previously proposed analysis by giving strict lower and upper bounds for the privacy parameters. We demonstrate the performance of the accountant on the binomial mechanism and show that our approach allows decreasing noise variance up to 75 percent at equal privacy compared to existing bounds in the literature. We also illustrate how to compute tight bounds for the exponential mechanism applied to counting queries.
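The core of the FFT-based accounting idea is that composing a mechanism with itself corresponds to convolving its discretized privacy loss distribution, which becomes a pointwise power in the frequency domain. The toy sketch below shows only this convolution step on a uniform grid; it is not the actual accountant, and omits the error analysis and the tail mass the paper tracks:

```python
import numpy as np

def compose_pld(p, k):
    """k-fold composition of a discretized privacy loss distribution:
    convolve the probability vector p with itself k times via the FFT
    (pointwise k-th power in the frequency domain)."""
    n = k * (len(p) - 1) + 1          # support size of the k-fold convolution
    m = 1 << (n - 1).bit_length()     # FFT length: next power of two >= n
    fp = np.fft.rfft(p, m)            # zero-padded real FFT
    out = np.fft.irfft(fp ** k, m)[:n]
    return np.clip(out, 0.0, None)    # clip tiny negative FFT round-off

p = np.array([0.2, 0.5, 0.3])         # toy one-step loss distribution
q = compose_pld(p, 3)                 # distribution after three compositions
```

Zero-padding to at least the full convolution length avoids circular wrap-around, so the FFT result agrees with direct repeated convolution while costing O(m log m) instead of O(n^2) per composition.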

22 citations

Book Chapter (DOI)
05 Sep 2011
TL;DR: A way to index population genotype information together with the complete genome sequence, so that one can use the index to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account.
Abstract: We propose a way to index population genotype information together with the complete genome sequence, so that one can use the index to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account. This is achieved through converting a multiple alignment of individual genomes into a finite automaton recognizing all strings that can be read from the alignment by switching the sequence at any time. The finite automaton is indexed with an extension of Burrows-Wheeler transform to allow pattern search inside the plausible recombinant sequences. The size of the index stays limited, because of the high similarity of individual genomes. The index finds applications in variation calling and in primer design.
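The language recognized by the automaton, every string readable from the alignment when the reader may switch rows at any point, can be illustrated by brute-force enumeration over a tiny gapless alignment. The real method indexes this language with a Burrows-Wheeler transform extension rather than enumerating it, so this is only a conceptual sketch with hypothetical names:

```python
from itertools import product

def recombinants(alignment):
    """Enumerate every string readable from a gapless multiple alignment
    when the reader may switch rows at any column boundary: each output
    position independently picks any character occurring in that column."""
    columns = [set(col) for col in zip(*alignment)]
    return {''.join(chars) for chars in product(*columns)}

# two genomes differing at one site admit exactly those two strings;
# a third row adding a second variant site also produces recombinants
recombinants(["ACG", "ATG", "GCG"])
```

Enumeration is exponential in the number of variant columns, which is precisely why the paper builds a finite automaton and a BWT-based index over it instead.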

22 citations


Authors

Showing all 632 results

Name | H-index | Papers | Citations
Dimitri P. Bertsekas | 94 | 332 | 85939
Olli Kallioniemi | 90 | 353 | 42021
Heikki Mannila | 72 | 295 | 26500
Jukka Corander | 66 | 411 | 17220
Jaakko Kangasjärvi | 62 | 146 | 17096
Aapo Hyvärinen | 61 | 301 | 44146
Samuel Kaski | 58 | 522 | 14180
Nadarajah Asokan | 58 | 327 | 11947
Aristides Gionis | 58 | 292 | 19300
Hannu Toivonen | 56 | 192 | 19316
Nicola Zamboni | 53 | 128 | 11397
Jorma Rissanen | 52 | 151 | 22720
Tero Aittokallio | 52 | 271 | 8689
Juha Veijola | 52 | 261 | 19588
Juho Hamari | 51 | 176 | 16631
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance
Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 1
2022 | 4
2021 | 85
2020 | 97
2019 | 140
2018 | 127