Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions on the topics of Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.


Papers
Book Chapter · DOI
07 Oct 2013
TL;DR: This work considers how to index strings, trees and graphs for jumbled pattern matching when the authors are asked to return a match if one exists, and shows how to build a quadratic-space index with which to find a match in time proportional to the size of the match.
Abstract: We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.

17 citations

Proceedings Article · DOI
27 Jun 2013
TL;DR: This work argues for considering information sharing to be transactions within a community, and describes how peer production of privacy is possible through PETs that are grounded in the notion of information as a common-pool resource subject to community governance.
Abstract: Privacy risks have been addressed through technical solutions such as Privacy-Enhancing Technologies (PETs) as well as regulatory measures including Do Not Track. These approaches are inherently limited as they are grounded in the paradigm of a rational end user who can determine, articulate, and manage consistent privacy preferences. This assumes that self-serving efforts to enact privacy preferences lead to socially optimal outcomes with regard to information sharing. We argue that this assumption typically does not hold true. Consequently, solutions to specific risks are developed - even mandated - without effective reduction in the overall harm of privacy breaches. We present a systematic framework to examine these limitations of current technical and policy solutions. To address the shortcomings of existing privacy solutions, we argue for considering information sharing to be transactions within a community. Outcomes of privacy management can be improved at a lower overall cost if peers, as a community, are empowered by appropriate technical and policy mechanisms. Designing for a community requires encouraging dialogue, enabling transparency, and supporting enforcement of community norms. We describe how peer production of privacy is possible through PETs that are grounded in the notion of information as a common-pool resource subject to community governance.

17 citations

Proceedings Article
21 Mar 2012
TL;DR: A simple statistical test for inferring whether an estimator of a causal effect is consistent when controlling for a subset of measured covariates is provided, and heuristics to search for such a set are presented.
Abstract: In many fields of science researchers are faced with the problem of estimating causal effects from non-experimental data. A key issue is to avoid inconsistent estimators due to confounding by measured or unmeasured covariates, a problem commonly solved by ‘adjusting for’ a subset of the observed variables. When the data generating process can be represented by a directed acyclic graph, and this graph structure is known, there exist simple graphical procedures for determining which subset of covariates should be adjusted for to obtain consistent estimators of the causal effects. However, when the graph is not known no general and complete procedures for this task are available. In this paper we introduce such a method for linear non-Gaussian models, requiring only partial knowledge about the temporal ordering of the variables: We provide a simple statistical test for inferring whether an estimator of a causal effect is consistent when controlling for a subset of measured covariates, and we present heuristics to search for such a set. We show empirically that this statistical test identifies consistent vs inconsistent estimates, and that the search ...

17 citations

Journal Article · DOI
TL;DR: Synthetic utility of DERA aldolase was improved by protein engineering approaches, and a novel machine learning model utilising Gaussian processes and feature learning was applied for the 3rd mutagenesis round to predict new beneficial mutant combinations.
Abstract: In this work, deoxyribose-5-phosphate aldolase (Ec DERA, EC 4.1.2.4) from Escherichia coli was chosen as the protein engineering target for improving the substrate preference towards smaller, non-phosphorylated aldehyde donor substrates, in particular towards acetaldehyde. The initial broad set of mutations was directed to 24 amino acid positions in the active site or in the close vicinity, based on the 3D complex structure of the E. coli DERA wild-type aldolase. The specific activity of the DERA variants containing one to three amino acid mutations was characterised using three different substrates. A novel machine learning (ML) model utilising Gaussian processes and feature learning was applied for the 3rd mutagenesis round to predict new beneficial mutant combinations. This led to the most clear-cut (two- to threefold) improvement in acetaldehyde (C2) addition capability with the concomitant abolishment of the activity towards the natural donor molecule glyceraldehyde-3-phosphate (C3P) as well as the non-phosphorylated equivalent (C3). The Ec DERA variants were also tested on aldol reaction utilising formaldehyde (C1) as the donor. Ec DERA wild-type was shown to be able to carry out this reaction, and furthermore, some of the improved variants on acetaldehyde addition reaction turned out to have also improved activity on formaldehyde. KEY POINTS: • DERA aldolases are promiscuous enzymes. • Synthetic utility of DERA aldolase was improved by protein engineering approaches. • Machine learning methods aid the protein engineering of DERA.

17 citations

Journal Article · DOI
TL;DR: A comparative crowdsourced user study established that EulerView and SetNet, both of which draw the sets first, yield significantly faster user responses than Bubble Sets, KelpFusion and LineSets, all of which draw the network first.

17 citations


Authors

Showing all 632 results

Name                   H-index  Papers  Citations
Dimitri P. Bertsekas   94       332     85939
Olli Kallioniemi       90       353     42021
Heikki Mannila         72       295     26500
Jukka Corander         66       411     17220
Jaakko Kangasjärvi     62       146     17096
Aapo Hyvärinen         61       301     44146
Samuel Kaski           58       522     14180
Nadarajah Asokan       58       327     11947
Aristides Gionis       58       292     19300
Hannu Toivonen         56       192     19316
Nicola Zamboni         53       128     11397
Jorma Rissanen         52       151     22720
Tero Aittokallio       52       271     8689
Juha Veijola           52       261     19588
Juho Hamari            51       176     16631
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance
Metrics
No. of papers from the Institution in previous years
Year  Papers
2023  1
2022  4
2021  85
2020  97
2019  140
2018  127