
Showing papers by "Helsinki Institute for Information Technology", published in 2012


Journal ArticleDOI
01 Jan 2012
TL;DR: It is found that checking habits occasionally spur users to do other things with the device and may increase usage overall, and that supporting habit formation is an opportunity for making smartphones more "personal" and "pervasive."
Abstract: Examining several sources of data on smartphone use, this paper presents evidence for the popular conjecture that mobile devices are "habit-forming." The form of habits we identified is called a checking habit: brief, repetitive inspection of dynamic content quickly accessible on the device. We describe findings on kinds and frequencies of checking behaviors in three studies. We found that checking habits occasionally spur users to do other things with the device and may increase usage overall. Data from a controlled field experiment show that checking behaviors emerge and are reinforced by informational "rewards" that are very quickly accessible. Qualitative data suggest that although repetitive habitual use is frequent, it is experienced more as an annoyance than an addiction. We conclude that supporting habit-formation is an opportunity for making smartphones more "personal" and "pervasive."

959 citations


Journal ArticleDOI
TL;DR: The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise and it is shown that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models.
Abstract: We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we consider the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation cannot then be used without resorting to numerical approximations, which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities.

695 citations
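The estimation idea above (noise-contrastive estimation) can be sketched in a few lines: fit an unnormalized one-dimensional Gaussian, treating the negative log-partition function as just another parameter, by logistic regression between data samples and noise samples. The model, noise distribution, sample sizes, and optimizer below are illustrative choices, not the paper's actual setup.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Data from a Gaussian with unknown spread; the model is an unnormalized
# Gaussian whose log-partition term c is itself a free parameter.
x = rng.normal(0.0, 2.0, size=5000)        # observed data
noise = rng.normal(0.0, 4.0, size=5000)    # noise with a known density

def log_model(u, theta):
    prec, c = theta                        # precision and c = -log Z
    return -0.5 * prec * u**2 + c

def nce_loss(theta):
    # Logistic regression: data labelled 1, noise labelled 0.
    g_x = log_model(x, theta) - norm.logpdf(x, scale=4.0)
    g_n = log_model(noise, theta) - norm.logpdf(noise, scale=4.0)
    return -(np.mean(-np.logaddexp(0, -g_x))    # log sigmoid(g_x)
             + np.mean(-np.logaddexp(0, g_n)))  # log(1 - sigmoid(g_n))

prec, c = minimize(nce_loss, x0=np.array([1.0, 0.0])).x
print(1 / np.sqrt(prec))   # ≈ 2.0, the true standard deviation
print(c)                   # ≈ -log(2·sqrt(2π)), the true -log Z
```

Note that the normalizer is recovered alongside the shape parameter, which is exactly what maximum likelihood cannot do for unnormalized models.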


Journal ArticleDOI
TL;DR: An overview of the basic and advanced probabilistic techniques is given, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.
Abstract: Many network solutions and overlay networks utilize probabilistic techniques to reduce information processing and networking costs. This survey article presents a number of frequently used and useful probabilistic techniques. Bloom filters and their variants are of prime importance, and they are heavily used in various distributed systems. This has been reflected in recent research and many new algorithms have been proposed for distributed systems that are either directly or indirectly based on Bloom filters. In this survey, we give an overview of the basic and advanced techniques, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.

480 citations
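A Bloom filter, the central structure in this survey, fits in a few lines. The sketch below derives its k hash positions from salted SHA-256 as a simple stand-in for k independent hash functions; real systems typically use faster non-cryptographic hashes, and the sizes chosen here are arbitrary.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.
    May return false positives, never false negatives."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # Salt the hash with the index i to simulate k independent hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
for node in ["peer-a", "peer-b", "peer-c"]:
    bf.add(node)
print("peer-a" in bf)   # True
print("peer-z" in bf)   # False (with high probability)
```

The space saving over storing the items themselves is what makes these filters attractive for the caching and routing applications the survey reviews.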


Journal ArticleDOI
TL;DR: This work considers four different drug-target interaction networks from humans involving enzymes, ion channels, G-protein-coupled receptors and nuclear receptors and proposes a novel Bayesian formulation that combines dimensionality reduction, matrix factorization and binary classification for predicting drug-target interaction networks.
Abstract: Motivation: Identifying interactions between drug compounds and target proteins has a great practical importance in the drug discovery process for known diseases. Existing databases contain very few experimentally validated drug–target interactions and formulating successful computational methods for predicting interactions remains challenging. Results: In this study, we consider four different drug–target interaction networks from humans involving enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. We then propose a novel Bayesian formulation that combines dimensionality reduction, matrix factorization and binary classification for predicting drug–target interaction networks using only chemical similarity between drug compounds and genomic similarity between target proteins. The novelty of our approach comes from the joint Bayesian formulation of projecting drug compounds and target proteins into a unified subspace using the similarities and estimating the interaction network in that subspace. We propose using a variational approximation in order to obtain an efficient inference scheme and give its detailed derivations. Finally, we demonstrate the performance of our proposed method in three different scenarios: (i) exploratory data analysis using low-dimensional projections, (ii) predicting interactions for the out-of-sample drug compounds and (iii) predicting unknown interactions of the given network. Availability: Software and Supplementary Material are available at http://users.ics.aalto.fi/gonen/kbmf2k. Contact: mehmet.gonen@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.

358 citations
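Stripped of the variational Bayesian machinery, the core idea is to project drugs and targets through their similarity (kernel) matrices into a shared low-dimensional subspace and score interactions by inner products there. The sketch below is a deliberately simplified, non-Bayesian point estimate on synthetic data; the dimensions, learning rate, and random "similarities" are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: similarity matrices for 20 drugs and 15 targets
# (in practice, chemical and genomic kernels), plus observed interactions.
Kd = np.corrcoef(rng.normal(size=(20, 50)))
Kt = np.corrcoef(rng.normal(size=(15, 50)))
Y = (rng.random((20, 15)) < 0.2).astype(float)

r, lr = 5, 1e-4                             # subspace dimension, step size
Ad = rng.normal(scale=0.1, size=(20, r))    # projection weights for drugs
At = rng.normal(scale=0.1, size=(15, r))    # projection weights for targets
for _ in range(2000):
    Gd, Gt = Kd @ Ad, Kt @ At               # entities mapped into subspace
    R = Y - Gd @ Gt.T                       # residual of interaction matrix
    Ad += lr * Kd.T @ R @ Gt                # gradient steps on squared error
    At += lr * Kt.T @ R.T @ Gd
scores = (Kd @ Ad) @ (Kt @ At).T            # higher score = more likely link
print(float(np.linalg.norm(Y - scores)))    # residual shrinks during training
```

Because predictions go through the kernel rows, an out-of-sample drug needs only its similarities to the training drugs, which is the property the paper exploits in scenario (ii).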


Journal ArticleDOI
TL;DR: This work introduces a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine, and demonstrates that several molecular properties can be predicted to high accuracy and used in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule.
Abstract: Motivation: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, the computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. Results: We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine. Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. Availability: A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.net/p/fingerid. Contact: markus.heinonen@cs.helsinki.fi

144 citations
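The second, matching step of the pipeline above reduces to ranking database molecules by fingerprint similarity. The sketch below assumes the per-property predictions (which FingerID makes with SVMs) have already been produced, and uses a tiny invented fingerprint database; the molecule names and bit patterns are purely illustrative, whereas the real tool matches against PubChem-scale databases.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprint vectors."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

# Hypothetical candidate fingerprints from a molecule database.
database = {
    "glucose":  [1, 1, 0, 1, 0, 0, 1, 0],
    "fructose": [1, 1, 0, 1, 0, 1, 0, 0],
    "citrate":  [0, 0, 1, 0, 1, 1, 0, 1],
}
# Fingerprint predicted from the spectrum (invented for illustration).
predicted = [1, 1, 0, 1, 0, 0, 1, 1]

# Rank candidates by similarity to the predicted fingerprint.
best = max(database, key=lambda name: tanimoto(predicted, database[name]))
print(best)   # glucose
```

Because the match is against predicted properties rather than reference spectra, candidates never measured on the same instrument can still be ranked, which is the de novo setting the abstract describes.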


Journal ArticleDOI
TL;DR: The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010), and adapts the procedure to the problem of cellular network inference by applying it to the biologically realistic data of the DREAM challenges.
Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010).

112 citations


Journal ArticleDOI
TL;DR: This article introduces a co-design method called Storytelling Group that has been developed and tested in three service design cases and is a quick start for actual design work but still includes users in the process.
Abstract: In this article, we will introduce a co-design method called Storytelling Group that has been developed and tested in three service design cases. Storytelling Group combines collaborative scenario ...

108 citations


Journal ArticleDOI
TL;DR: It is shown that the rapid global dissemination of a single pathogenic bacterial clone results in local variation in measured recombination rates, and possible explanatory variables include the size and time since emergence of each defined sub-population, variation in transmission dynamics due to host movement, and changes in the bacterial genome affecting the propensity for recombination.
Abstract: Background Next-generation sequencing (NGS) is a powerful tool for understanding both patterns of descent over time and space (phylogeography) and the molecular processes underpinning genome divergence in pathogenic bacteria. Here, we describe a synthesis between these perspectives by employing a recently developed Bayesian approach, BRATNextGen, for detecting recombination on an expanded NGS dataset of the globally disseminated methicillin-resistant Staphylococcus aureus (MRSA) clone ST239.

107 citations


Posted Content
TL;DR: In this paper, the authors show that the solution of the network structure optimization problem is highly sensitive to the chosen alpha parameter value; since no generally accepted rule for determining the alpha parameter has been suggested, they discuss ideas for solving this problem.
Abstract: The BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This non-informative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size alpha. Unfortunately, no generally accepted rule for determining the alpha parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen alpha parameter value. Based on these results, we are able to give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.

92 citations
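The sensitivity at issue is easy to see by computing the BDeu score directly: for the same data, the score gap between two competing structures changes with every choice of alpha. The counts below are invented for illustration; the formula is the standard BDeu local marginal likelihood.

```python
import numpy as np
from scipy.special import gammaln

def bdeu_local(counts, alpha):
    """BDeu local marginal likelihood for one variable.
    counts: (q parent configurations) x (r child states) data counts."""
    q, r = counts.shape
    a_j, a_jk = alpha / q, alpha / (q * r)   # Dirichlet pseudo-counts
    nj = counts.sum(axis=1)
    return float(np.sum(gammaln(a_j) - gammaln(a_j + nj))
                 + np.sum(gammaln(a_jk + counts) - gammaln(a_jk)))

counts = np.array([[12, 3],        # child counts given parent = 0
                   [2, 13]])       # child counts given parent = 1
no_parent = counts.sum(axis=0, keepdims=True)

diffs = []
for alpha in (0.1, 1.0, 10.0, 100.0):
    gap = bdeu_local(counts, alpha) - bdeu_local(no_parent, alpha)
    diffs.append(gap)
    print(alpha, round(gap, 3))    # gap > 0 favours keeping the edge
```

The printed gap is different for every alpha, so a structure search maximizing the total BDeu score can return different networks depending on this single prior parameter.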


Journal ArticleDOI
TL;DR: A major finding of this study suggests that the Minimum Description Length (MDL) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), Bayesian Dirichlet equivalence score (BDeu), and factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures.
Abstract: In this work, we empirically evaluate the capability of various scoring functions of Bayesian networks for recovering true underlying structures. Similar investigations have been carried out before, but they typically relied on approximate learning algorithms to learn the network structures. The suboptimal structures found by the approximation methods have unknown quality and may affect the reliability of their conclusions. Our study uses an optimal algorithm to learn Bayesian network structures from datasets generated from a set of gold standard Bayesian networks. Because all optimal algorithms always learn equivalent networks, this ensures that only the choice of scoring function affects the learned networks. Another shortcoming of the previous studies stems from their use of random synthetic networks as test cases. There is no guarantee that these networks reflect real-world data. We use real-world data to generate our gold-standard structures, so our experimental design more closely approximates real-world situations. A major finding of our study suggests that, in contrast to results reported by several prior works, the Minimum Description Length (MDL) (or equivalently, Bayesian information criterion (BIC)) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), Bayesian Dirichlet equivalence score (BDeu), and factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures. We believe this finding results from using datasets generated from real-world applications rather than from random processes, and from using learning algorithms that select high-scoring structures rather than random models. Other findings of our study support existing work, e.g., large sample sizes result in learning structures closer to the true underlying structure; the BDeu score is sensitive to the parameter settings; and fNML performs well on small datasets. We also tested a greedy hill-climbing algorithm and observed results similar to those of the optimal algorithm.

87 citations


Journal ArticleDOI
TL;DR: The authors found that 89.2% of participants reported experiencing repeated involuntary imagery of music at least once a week and that the amount of music practice and listening was positively related to the frequency of involuntary music.
Abstract: Involuntary semantic memories are a new topic in psychology. Initial research has suggested that musical memories are a dominant type of involuntary memory. Interestingly, no comprehensive information exists on the commonality of “earworms”, or repeated involuntary imagery of music (INMI), and its relationship to the engagement with musical activities. The present study investigated these using cross-sectional, retrospective reports from a questionnaire study that was conducted among Finnish internet users (N = 12,519). The analyses of the data revealed that 89.2% of participants reported experiencing this phenomenon at least once a week. The amount of music practice and listening was positively related to the frequency of involuntary music. Women reported elevated levels of involuntary imagery episodes in contrast to men, who reacted differently. In older age-groups the frequency of the incidents decreased among both sexes. People with extensive musical practice history seemed to experience longer musica...

Proceedings Article
21 Mar 2012
TL;DR: A factor analysis model is introduced that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does, and is applied to two data analysis tasks, in neuroimaging and chemical systems biology.
Abstract: We introduce a factor analysis model that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does. A group may correspond to one view of the same set of objects, one of many data sets tied by co-occurrence, or a set of alternative variables collected from statistics tables to measure one property of interest. We show that by assuming groupwise sparse factors, active in a subset of the sets, the variation can be decomposed into factors explaining relationships between the sets and factors explaining away set-specific variation. We formulate the assumptions in a Bayesian model providing the factors, and apply the model to two data analysis tasks, in neuroimaging and chemical systems biology.

Journal ArticleDOI
TL;DR: It is shown that the traveling salesman problem in bounded-degree graphs can be solved in time O((2-ε)^n), where ε > 0 depends only on the degree bound but not on the number of cities.
Abstract: We show that the traveling salesman problem in bounded-degree graphs can be solved in time O((2-ε)^n), where ε > 0 depends only on the degree bound but not on the number of cities, n. The algorithm is a variant of the classical dynamic programming solution due to Bellman, and, independently, Held and Karp. In the case of bounded integer weights on the edges, we also give a polynomial-space algorithm with running time O((2-ε)^n) on bounded-degree graphs. In addition, we present an analogous analysis of Ryser's algorithm for the permanent of matrices with a bounded number of nonzero entries in each column.
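For reference, the classical Bellman-Held-Karp dynamic programme that the paper improves upon looks as follows on an unrestricted (not degree-bounded) instance; the 4-city distance matrix is a made-up example.

```python
from itertools import combinations

def held_karp(dist):
    """Bellman-Held-Karp dynamic programme, O(2^n * n^2) time.
    dist[i][j] = cost of travelling from city i to city j."""
    n = len(dist)
    # C[(S, j)]: cheapest path from city 0 visiting bitmask S, ending at j.
    C = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = sum(1 << j for j in subset)
            for j in subset:
                C[(S, j)] = min(C[(S ^ (1 << j), k)] + dist[k][j]
                                for k in subset if k != j)
    full = (1 << n) - 2          # every city except the starting one
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))

dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(held_karp(dist))   # 21, via the tour 0 -> 2 -> 3 -> 1 -> 0
```

The paper's contribution is showing that, under a degree bound, not all 2^n subsets need to be expanded, shaving the base of the exponential below 2.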

Journal ArticleDOI
TL;DR: A user study compared navigation using the information typically provided by currently available handheld AR browsers with navigation using a digital map, and with a combined map and AR condition; it found no overall difference in task completion time, but evidence that AR browsers are less useful for navigation under some environmental conditions.

Journal ArticleDOI
TL;DR: In this paper, a physiological metric for investigating social experience within a shared gaming context is introduced: Physiological linkage is measured by gathering simultaneous psychophysiological measurements from several players, and the authors discuss various measures used to calculate linkage, the related social processes, and how to use physiological linkage in game experience research.
Abstract: Psychophysiological methodology has been successfully applied to investigate media responses, including the experience of playing digital games. The approach has many benefits for a player experience assessment-it can provide detailed, unbiased, and time-accurate data without interrupting the gameplay. However, gaming can be a highly social activity. This article extends the methodological focus from single player to include multiple simultaneous players. A physiological metric for investigating social experience within a shared gaming context is introduced: Physiological linkage is measured by gathering simultaneous psychophysiological measurements from several players. The authors review how physiological linkage may be associated with social presence among participants in various gaming situations or social contexts. These metrics provide such information about the interaction among participants that is not currently available by any other method. The authors discuss various measures used to calculate linkage, the related social processes, and how to use physiological linkage in game experience research.

Journal ArticleDOI
TL;DR: This paper extends redescription mining to categorical and real‐valued data with possibly missing values using a surprisingly simple and efficient approach and shows the statistical significance of the results using recent innovations on randomization methods.
Abstract: Redescription mining is a powerful data analysis tool that is used to find multiple descriptions of the same entities. Consider geographical regions as an example. They can be characterized by the fauna that inhabits them on one hand and by their meteorological conditions on the other hand. Finding such redescriptions, a task known as niche-finding, is of much importance in biology. Current redescription mining methods can handle only Boolean data. This restricts the range of possible applications or makes discretization a prerequisite, entailing a possibly harmful loss of information. In niche-finding, while the fauna can be naturally represented using Boolean presence/absence data, the weather cannot. In this paper, we extend redescription mining to categorical and real-valued data with possibly missing values using a surprisingly simple and efficient approach. We provide extensive experimental evaluation to study the behavior of the proposed algorithm. Furthermore, we show the statistical significance of our results using recent innovations in randomization methods. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012 (Part of this work was done when the author was with HIIT.)

Posted Content
TL;DR: The authors combine conditional independencies and independent component analysis to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible.
Abstract: An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.

Proceedings ArticleDOI
01 Dec 2012
TL;DR: This paper considers two kinds of attacks on a DHT, one already known attack and one new kind of an attack, and shows how they can be targeted against Mainline DHT and proposes simple countermeasures against them.
Abstract: Distributed hash tables (DHTs) are a key building block for modern P2P content-distribution systems, for example in implementing the distributed tracker of BitTorrent Mainline DHT. DHTs, due to their fully distributed nature, are known to be vulnerable to certain kinds of attacks, and different kinds of defenses have been proposed against these attacks. In this paper, we consider two kinds of attacks on a DHT, one already known and one new, and show how they can be targeted against Mainline DHT. We complement them with an extensive measurement study using honeypots, which shows that both attacks have been going on in the network for a long time and are still happening. We present numbers showing that the number of sybils in the Mainline DHT network is increasing and is currently around 300,000. We analyze the potential threats from these attacks and propose simple countermeasures against them.

Proceedings Article
26 Jun 2012
TL;DR: A fully conjugate Bayesian formulation is proposed and derived, which allows us to combine hundreds or thousands of kernels very efficiently and can be extended for multiclass learning and semi-supervised learning.
Abstract: Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on the computational efficiency issue. However, it is still not feasible to combine many kernels using existing Bayesian approaches due to their high time complexity. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation, which allows us to combine hundreds or thousands of kernels very efficiently. We briefly explain how the proposed method can be extended for multiclass learning and semi-supervised learning. Experiments with large numbers of kernels on benchmark data sets show that our inference method is quite fast, requiring less than a minute. On one bioinformatics and three image recognition data sets, our method outperforms previously reported results with better generalization performance.
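The full conjugate variational treatment is beyond a short example, but the underlying operation, combining several candidate kernels with learned nonnegative weights before fitting a kernel machine, can be sketched. Here the weights come from centred kernel-target alignment, a cheap heuristic stand-in for the paper's variational inference; the RBF widths, synthetic data, and ridge regularizer are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

def rbf(X, gamma):                        # one candidate kernel per width
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

kernels = [rbf(X, g) for g in (0.1, 1.0, 10.0)]

# Nonnegative kernel weights from centred alignment with the targets.
yc = y - y.mean()
yy = np.outer(yc, yc)
w = np.array([max(np.sum(K * yy), 0.0) / np.linalg.norm(K) for K in kernels])
w /= w.sum()
K = sum(wi * Ki for wi, Ki in zip(w, kernels))

# Kernel ridge regression on the combined kernel.
coef = np.linalg.solve(K + 1e-2 * np.eye(len(y)), y)
pred = K @ coef
print(np.round(w, 2))                     # relative kernel weights
print(float(np.mean((pred - y) ** 2)))    # small training error
```

The combination step costs only a weighted sum of precomputed Gram matrices, which hints at why scaling to hundreds or thousands of kernels is mainly a question of how cheaply the weights can be inferred.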

Posted Content
TL;DR: Cloud computing can be defined as the provision of computing resources on-demand over the Internet, which might bring a number of advantages to end-users in terms of accessibility and elasticity of costs, but problems arise concerning the collection of personal information in the Cloud and the legitimate exploitation thereof.
Abstract: Cloud computing can be defined as the provision of computing resources on-demand over the Internet. Although this might bring a number of advantages to end-users in terms of accessibility and elasticity of costs, problems arise concerning the collection of personal information in the Cloud and the legitimate exploitation thereof. To the extent that most content and software applications are only accessible online, users no longer have control over the manner in which they can access their data and the extent to which third parties can exploit it.

Book ChapterDOI
13 Jun 2012
TL;DR: The preliminary results from an experiment investigating the perceived intensity of modulated friction created by electrostatic force indicate significant correlations between intensity perception and signal amplitude, with the highest sensitivity found at a frequency of 80 Hz.
Abstract: We describe the preliminary results from an experiment investigating the perceived intensity of modulated friction created by electrostatic force, or electrovibration. A prototype experimental system was created to evaluate user perception of sinusoidal electrovibration stimuli on a flat surface emulating a touch screen interface. We introduce a fixed 6-point Effect Strength Subjective Index (ESSI) as a measure of generic sensation intensity, and compare it with an open magnitude scale. The results of the experiment indicate that there are significant correlations between intensity perception and signal amplitude, and the highest sensitivity was found at a frequency of 80 Hz. The subjective results show that the users perceived the electrovibration stimuli as pleasant and a useful means of feedback for touchscreens.

Journal ArticleDOI
TL;DR: A mobile crowdsourcing platform that is built on top of social media, called UbiAsk, designed for assisting foreign visitors by involving the local crowd to answer their image-based questions at hand in a timely fashion is presented.
Abstract: Recent years have witnessed the impact of the crowdsourcing model, social media, and pervasive computing. We believe that a more significant impact is latent in the convergence of these ideas on the mobile platform. In this paper, we introduce a mobile crowdsourcing platform that is built on top of social media. A mobile crowdsourcing application called UbiAsk is presented as one case study. UbiAsk is designed to assist foreign visitors by involving the local crowd in answering their image-based questions in a timely fashion. Existing social media platforms are used to rapidly allocate microtasks to a wide network of local residents. The resulting data are visualized using a mapping tool as well as augmented reality (AR) technology, resulting in a visual information pool for public use. We ran a controlled field experiment in Japan for 6 weeks with 55 participants. The results demonstrated reliable performance on response speed and response quantity: half of the requests were answered within 10 minutes, 75% of requests were answered within 30 minutes, and on average every request had 4.2 answers. In the afternoon, evening, and night especially, nearly 88% of requests were answered in approximately 10 minutes on average, with more than 4 answers per request. In terms of participation motivation, we found the most active crowdworkers were driven more by intrinsic motivations than by any of the extrinsic incentives (game-based incentives and social incentives) we designed.

Journal ArticleDOI
18 Jul 2012-PLOS ONE
TL;DR: This study shows that the temperament subscales do not distribute randomly but have an endogenous structure, and that these patterns have strong associations to health, life events, and well-being.
Abstract: Background The object of this study was to identify temperament patterns in the Finnish population, and to determine the relationship between these profiles and life habits, socioeconomic status, and health.

Journal ArticleDOI
TL;DR: It is shown that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single relatively short time series and in presence of unmodelled confounders and unreliable prior knowledge on training network connectivity.
Abstract: Complete transcriptional regulatory network inference is a huge challenge because of the complexity of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes.

Proceedings Article
14 Aug 2012
TL;DR: A score-based local learning algorithm called SLL is proposed for learning the network structure near target variables of special interest while ignoring the rest of the variables; the algorithm is conjectured to be theoretically sound in the sense that it is optimal in the limit of large sample size.
Abstract: Learning a Bayesian network structure from data is an NP-hard problem and thus exact algorithms are feasible only for small data sets. Therefore, network structures for larger networks are usually learned with various heuristics. Another approach to scaling up the structure learning is local learning. In local learning, the modeler has one or more target variables that are of special interest; he wants to learn the structure near the target variables and is not interested in the rest of the variables. In this paper, we present a score-based local learning algorithm called SLL. We conjecture that our algorithm is theoretically sound in the sense that it is optimal in the limit of large sample size. Empirical results suggest that SLL is competitive when compared to the constraint-based HITON algorithm. We also study the prospects of constructing the network structure for the whole node set based on local results by presenting two algorithms and comparing them to several heuristics.

Posted Content
TL;DR: It is shown that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases and offers a possibility for efficient exploration of the best networks consistent with different variable orderings.
Abstract: We study the problem of learning the best Bayesian network structure with respect to a decomposable score such as BDe, BIC or AIC. This problem is known to be NP-hard, which means that solving it becomes quickly infeasible as the number of variables increases. Nevertheless, in this paper we show that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases. Our algorithm is less complicated and more efficient than the techniques presented earlier. It can be easily parallelized, and offers a possibility for efficient exploration of the best networks consistent with different variable orderings. In the experimental part of the paper we compare the performance of the algorithm to the previous state-of-the-art algorithm. Free source-code and an online-demo can be found at this http URL.
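The dynamic programming idea behind this kind of exact structure learning is compact: every DAG has a sink, so the best network over a variable set S is the best choice of a sink v (with parents drawn from S minus v) plus the best network over the remaining variables. The sketch below runs it on five variables with random, made-up local scores; the real algorithm plugs in BDe/BIC/AIC local scores and uses careful memory layouts to reach 30+ variables.

```python
from itertools import combinations
from functools import lru_cache
import random

random.seed(0)
n = 5                          # tiny here; the paper scales this DP past 30
V = frozenset(range(n))

# Hypothetical local scores: score(v, parent_set) for every candidate set.
local = {(v, P): random.gauss(0, 1)
         for v in V
         for r in range(n)
         for P in map(frozenset, combinations(V - {v}, r))}

@lru_cache(maxsize=None)
def best_parents(v, S):
    # Best local score for v with parents chosen from the set S.
    return max(local[(v, frozenset(P))]
               for r in range(len(S) + 1)
               for P in combinations(S, r))

@lru_cache(maxsize=None)
def best_net(S):
    # Best total score of any DAG over the variable set S.
    if not S:
        return 0.0
    # Some variable must be a sink: its parents lie inside S - {v}.
    return max(best_parents(v, S - {v}) + best_net(S - {v}) for v in S)

print(best_net(V))
```

Both tables are indexed by subsets, giving the O(2^n)-flavoured time and memory behaviour that makes roughly 30 variables the practical frontier for exact search.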

Posted Content
TL;DR: In this paper, an improved admissible heuristic that tries to avoid directed cycles within small groups of variables is introduced to improve the efficiency and scalability of A* and BFBnB.
Abstract: Recently two search algorithms, A* and breadth-first branch and bound (BFBnB), were developed based on a simple admissible heuristic for learning Bayesian network structures that optimize a scoring function. The heuristic represents a relaxation of the learning problem such that each variable chooses optimal parents independently. As a result, the heuristic may contain many directed cycles and result in a loose bound. This paper introduces an improved admissible heuristic that tries to avoid directed cycles within small groups of variables. A sparse representation is also introduced to store only the unique optimal parent choices. Empirical results show that the new techniques significantly improved the efficiency and scalability of A* and BFBnB on most of the datasets tested in this paper.
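The "simple admissible heuristic" described above has a direct reading: every variable not yet placed in the ordering independently takes its best-scoring parent set, ignoring acyclicity, so the bound can never underestimate what the remaining variables could contribute. A sketch of that relaxed bound (a hypothetical helper assuming local scores stored per parent set, higher being better; the paper's improved heuristic tightens this by forbidding cycles within small variable groups, which is not shown here):

```python
def relaxed_heuristic(scores, placed):
    """Admissible heuristic for A*/BFBnB structure search.

    scores: {v: {frozenset(parents): local_score}}.
    placed: frozenset of variables already assigned in the partial order.
    Each unplaced variable independently picks its best parent set over
    all other variables; the resulting "graph" may contain directed
    cycles, which is exactly why the bound is admissible but loose.
    """
    V = frozenset(scores)
    return sum(max(scores[v].values()) for v in V - placed)
```

Because each term is an unconstrained maximum, the heuristic value is always at least the score achievable by any acyclic completion, which is the admissibility property A* and BFBnB rely on.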

Journal ArticleDOI
TL;DR: In this paper, a cued-recall method was used to induce involuntary musical imagery (INMI) and delayed self-reports in a large sample of people, and the prevalence of the phenomenon was considerable.
Abstract: It is still a mystery why we sometimes experience the repetition of memories in our minds. This phenomenon seems to be particularly prominent in music. We believe that the present lack of knowledge stems from the lack of methods available for studying this topic. To improve the understanding of involuntary musical imagery (INMI), this paper proposes a novel method to induce it in experimental settings. We report three experiments conducted to evaluate two research questions related to INMI: can it be experimentally induced, and if so, which factors influence its emergence? The investigation focused in particular on how recent activation of musical memory might predict INMI. The questions were tested in single-trial experiments conducted over the internet. The experiments utilized a cued-recall method to induce INMI and delayed self-reports. Among a large sample of people, the prevalence of the phenomenon was considerable. When familiarity with the stimuli was controlled for, inducing INMI experim...

Journal Article
TL;DR: In this article, the authors discuss the collection of personal information in the cloud and the legitimate exploitation of it by third parties, and propose a solution to the problem of privacy protection in the Cloud.
Abstract: Cloud computing can be defined as the provision of computing resources on demand over the Internet. Although this may bring a number of advantages to end users in terms of accessibility and elasticity of costs, problems arise concerning the collection of personal information in the Cloud and the legitimate exploitation thereof. To the extent that most content and software applications are only accessible online, users no longer have control over the manner in which they access their data or the extent to which third parties can exploit it.

Posted Content
TL;DR: In this paper, the authors combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics, which is shown to be especially useful for querying the contents of one domain given samples of the other.
Abstract: Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, multiple variants of topic models attempting to tie the modalities together have been presented. All of these, however, lack the ability to learn components private to one modality, and consequently will try to force dependencies even between minimally correlating modalities. In this work we combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics. The model is shown to be especially useful for querying the contents of one domain given samples of the other.
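For context on the CCA-based baseline the abstract contrasts with: classical canonical correlation analysis finds maximally correlated direction pairs across two views, so high canonical correlations flag shared variation while near-zero ones correspond to view-private variation. A minimal sketch of that baseline computation via principal angles (this illustrates the continuous-data approach the abstract mentions, not the proposed HDP topic model; the function name is ours):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two views of the same samples.

    X: (n, d1) and Y: (n, d2) arrays, rows paired across views.
    Computed as cosines of the principal angles between the centered
    column spaces of the two views.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Ux = np.linalg.svd(X, full_matrices=False)[0]   # orthonormal basis, view 1
    Uy = np.linalg.svd(Y, full_matrices=False)[0]   # orthonormal basis, view 2
    # singular values of the basis cross-product are the correlations
    return np.linalg.svd(Ux.T @ Uy, compute_uv=False)
```

A correlation near 1 indicates a component shared by both modalities; the paper's point is that count-data topic models lacked an analogous way to keep components private to one modality.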