Institution
Helsinki Institute for Information Technology
Facility • Espoo, Finland
About: Helsinki Institute for Information Technology is a facility organization based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.
Papers
07 Oct 2013
TL;DR: This work considers how to index strings, trees and graphs for jumbled pattern matching when the authors are asked to return a match if one exists, and shows how to build a quadratic-space index with which to find a match in time proportional to the size of the match.
Abstract: We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.
17 citations
27 Jun 2013
TL;DR: This work argues for considering information sharing to be transactions within a community, and describes how peer production of privacy is possible through PETs that are grounded in the notion of information as a common-pool resource subject to community governance.
Abstract: Privacy risks have been addressed through technical solutions such as Privacy-Enhancing Technologies (PETs) as well as regulatory measures including Do Not Track. These approaches are inherently limited as they are grounded in the paradigm of a rational end user who can determine, articulate, and manage consistent privacy preferences. This assumes that self-serving efforts to enact privacy preferences lead to socially optimal outcomes with regard to information sharing. We argue that this assumption typically does not hold true. Consequently, solutions to specific risks are developed - even mandated - without effective reduction in the overall harm of privacy breaches. We present a systematic framework to examine these limitations of current technical and policy solutions. To address the shortcomings of existing privacy solutions, we argue for considering information sharing to be transactions within a community. Outcomes of privacy management can be improved at a lower overall cost if peers, as a community, are empowered by appropriate technical and policy mechanisms. Designing for a community requires encouraging dialogue, enabling transparency, and supporting enforcement of community norms. We describe how peer production of privacy is possible through PETs that are grounded in the notion of information as a common-pool resource subject to community governance.
17 citations
21 Mar 2012
TL;DR: A simple statistical test for inferring whether an estimator of a causal effect is consistent when controlling for a subset of measured covariates is provided, and heuristics to search for such a set are presented.
Abstract: In many fields of science researchers are faced with the problem of estimating causal effects from non-experimental data. A key issue is to avoid inconsistent estimators due to confounding by measured or unmeasured covariates, a problem commonly solved by ‘adjusting for’ a subset of the observed variables. When the data generating process can be represented by a directed acyclic graph, and this graph structure is known, there exist simple graphical procedures for determining which subset of covariates should be adjusted for to obtain consistent estimators of the causal effects. However, when the graph is not known no general and complete procedures for this task are available. In this paper we introduce such a method for linear non-Gaussian models, requiring only partial knowledge about the temporal ordering of the variables: We provide a simple statistical test for inferring whether an estimator of a causal effect is consistent when controlling for a subset of measured covariates, and we present heuristics to search for such a set. We show empirically that this statistical test identifies consistent vs inconsistent estimates, and that the search ...
17 citations
TL;DR: Synthetic utility of DERA aldolase was improved by protein engineering approaches, and a novel machine learning model utilising Gaussian processes and feature learning was applied for the 3rd mutagenesis round to predict new beneficial mutant combinations.
Abstract: In this work, deoxyribose-5-phosphate aldolase (Ec DERA, EC 4.1.2.4) from Escherichia coli was chosen as the protein engineering target for improving the substrate preference towards smaller, non-phosphorylated aldehyde donor substrates, in particular towards acetaldehyde. The initial broad set of mutations was directed to 24 amino acid positions in the active site or in the close vicinity, based on the 3D complex structure of the E. coli DERA wild-type aldolase. The specific activity of the DERA variants containing one to three amino acid mutations was characterised using three different substrates. A novel machine learning (ML) model utilising Gaussian processes and feature learning was applied for the 3rd mutagenesis round to predict new beneficial mutant combinations. This led to the most clear-cut (two- to threefold) improvement in acetaldehyde (C2) addition capability with the concomitant abolishment of the activity towards the natural donor molecule glyceraldehyde-3-phosphate (C3P) as well as the non-phosphorylated equivalent (C3). The Ec DERA variants were also tested on aldol reaction utilising formaldehyde (C1) as the donor. Ec DERA wild-type was shown to be able to carry out this reaction, and furthermore, some of the improved variants on acetaldehyde addition reaction turned out to have also improved activity on formaldehyde. KEY POINTS: • DERA aldolases are promiscuous enzymes. • Synthetic utility of DERA aldolase was improved by protein engineering approaches. • Machine learning methods aid the protein engineering of DERA.
17 citations
TL;DR: A comparative crowdsourced user study established that EulerView and SetNet, both of which draw the sets first, yield significantly faster user responses than Bubble Sets, KelpFusion and LineSets, all of which draw the network first.
17 citations
Authors
Showing all 632 results
Name | H-index | Papers | Citations |
---|---|---|---|
Dimitri P. Bertsekas | 94 | 332 | 85939 |
Olli Kallioniemi | 90 | 353 | 42021 |
Heikki Mannila | 72 | 295 | 26500 |
Jukka Corander | 66 | 411 | 17220 |
Jaakko Kangasjärvi | 62 | 146 | 17096 |
Aapo Hyvärinen | 61 | 301 | 44146 |
Samuel Kaski | 58 | 522 | 14180 |
Nadarajah Asokan | 58 | 327 | 11947 |
Aristides Gionis | 58 | 292 | 19300 |
Hannu Toivonen | 56 | 192 | 19316 |
Nicola Zamboni | 53 | 128 | 11397 |
Jorma Rissanen | 52 | 151 | 22720 |
Tero Aittokallio | 52 | 271 | 8689 |
Juha Veijola | 52 | 261 | 19588 |
Juho Hamari | 51 | 176 | 16631 |