
Showing papers by "Helsinki Institute for Information Technology", published in 2012


Journal ArticleDOI
01 Jan 2012
TL;DR: It is found that checking habits occasionally spur users to do other things with the device and may increase usage overall, and that supporting habit formation is an opportunity for making smartphones more "personal" and "pervasive."
Abstract: Examining several sources of data on smartphone use, this paper presents evidence for the popular conjecture that mobile devices are "habit-forming." The form of habits we identified is called a checking habit: brief, repetitive inspection of dynamic content quickly accessible on the device. We describe findings on kinds and frequencies of checking behaviors in three studies. We found that checking habits occasionally spur users to do other things with the device and may increase usage overall. Data from a controlled field experiment show that checking behaviors emerge and are reinforced by informational "rewards" that are very quickly accessible. Qualitative data suggest that although repetitive habitual use is frequent, it is experienced more as an annoyance than an addiction. We conclude that supporting habit-formation is an opportunity for making smartphones more "personal" and "pervasive."

959 citations


Journal ArticleDOI
TL;DR: The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise and it is shown that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models.
Abstract: We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we consider the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation cannot then be used without resorting to numerical approximations, which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities.

695 citations
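The estimation idea above (noise-contrastive estimation) can be sketched in a few lines: fit an unnormalized one-dimensional Gaussian, treating the negative log-partition function as just another parameter, by logistic regression between data samples and noise samples. The model, noise distribution, sample sizes, and optimizer below are illustrative choices, not the paper's actual setup.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Data from a Gaussian with unknown spread; the model is an unnormalized
# Gaussian whose log-partition term c is itself a free parameter.
x = rng.normal(0.0, 2.0, size=5000)        # observed data
noise = rng.normal(0.0, 4.0, size=5000)    # noise with a known density

def log_model(u, theta):
    prec, c = theta                        # precision and c = -log Z
    return -0.5 * prec * u**2 + c

def nce_loss(theta):
    # Logistic regression: data labelled 1, noise labelled 0.
    g_x = log_model(x, theta) - norm.logpdf(x, scale=4.0)
    g_n = log_model(noise, theta) - norm.logpdf(noise, scale=4.0)
    return -(np.mean(-np.logaddexp(0, -g_x))    # log sigmoid(g_x)
             + np.mean(-np.logaddexp(0, g_n)))  # log(1 - sigmoid(g_n))

prec, c = minimize(nce_loss, x0=np.array([1.0, 0.0])).x
print(1 / np.sqrt(prec))   # ≈ 2.0, the true standard deviation
print(c)                   # ≈ -log(2·sqrt(2π)), the true -log Z
```

Note that the normalizer is recovered alongside the shape parameter, which is exactly what maximum likelihood cannot do for unnormalized models.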


Journal ArticleDOI
TL;DR: An overview of the basic and advanced probabilistic techniques is given, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.
Abstract: Many network solutions and overlay networks utilize probabilistic techniques to reduce information processing and networking costs. This survey article presents a number of frequently used and useful probabilistic techniques. Bloom filters and their variants are of prime importance, and they are heavily used in various distributed systems. This has been reflected in recent research and many new algorithms have been proposed for distributed systems that are either directly or indirectly based on Bloom filters. In this survey, we give an overview of the basic and advanced techniques, reviewing over 20 variants and discussing their application in distributed systems, in particular for caching, peer-to-peer systems, routing and forwarding, and measurement data summarization.

480 citations
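A Bloom filter, the central structure in this survey, fits in a few lines. The sketch below derives its k hash positions from salted SHA-256 as a simple stand-in for k independent hash functions; real systems typically use faster non-cryptographic hashes, and the sizes chosen here are arbitrary.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.
    May return false positives, never false negatives."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # Salt the hash with the index i to simulate k independent hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
for node in ["peer-a", "peer-b", "peer-c"]:
    bf.add(node)
print("peer-a" in bf)   # True
print("peer-z" in bf)   # False (with high probability)
```

The space saving over storing the items themselves is what makes these filters attractive for the caching and routing applications the survey reviews.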


Journal ArticleDOI
TL;DR: This work considers four different drug-target interaction networks from humans involving enzymes, ion channels, G-protein-coupled receptors and nuclear receptors and proposes a novel Bayesian formulation that combines dimensionality reduction, matrix factorization and binary classification for predicting drug-target interaction networks.
Abstract: Motivation: Identifying interactions between drug compounds and target proteins has a great practical importance in the drug discovery process for known diseases. Existing databases contain very few experimentally validated drug–target interactions and formulating successful computational methods for predicting interactions remains challenging. Results: In this study, we consider four different drug–target interaction networks from humans involving enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. We then propose a novel Bayesian formulation that combines dimensionality reduction, matrix factorization and binary classification for predicting drug–target interaction networks using only chemical similarity between drug compounds and genomic similarity between target proteins. The novelty of our approach comes from the joint Bayesian formulation of projecting drug compounds and target proteins into a unified subspace using the similarities and estimating the interaction network in that subspace. We propose using a variational approximation in order to obtain an efficient inference scheme and give its detailed derivations. Finally, we demonstrate the performance of our proposed method in three different scenarios: (i) exploratory data analysis using low-dimensional projections, (ii) predicting interactions for the out-of-sample drug compounds and (iii) predicting unknown interactions of the given network. Availability: Software and Supplementary Material are available at http://users.ics.aalto.fi/gonen/kbmf2k. Contact: mehmet.gonen@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.

358 citations
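Stripped of the variational Bayesian machinery, the core idea is to project drugs and targets through their similarity (kernel) matrices into a shared low-dimensional subspace and score interactions by inner products there. The sketch below is a deliberately simplified, non-Bayesian point estimate on synthetic data; the dimensions, learning rate, and random "similarities" are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: similarity matrices for 20 drugs and 15 targets
# (in practice, chemical and genomic kernels), plus observed interactions.
Kd = np.corrcoef(rng.normal(size=(20, 50)))
Kt = np.corrcoef(rng.normal(size=(15, 50)))
Y = (rng.random((20, 15)) < 0.2).astype(float)

r, lr = 5, 1e-4                             # subspace dimension, step size
Ad = rng.normal(scale=0.1, size=(20, r))    # projection weights for drugs
At = rng.normal(scale=0.1, size=(15, r))    # projection weights for targets
for _ in range(2000):
    Gd, Gt = Kd @ Ad, Kt @ At               # entities mapped into subspace
    R = Y - Gd @ Gt.T                       # residual of interaction matrix
    Ad += lr * Kd.T @ R @ Gt                # gradient steps on squared error
    At += lr * Kt.T @ R.T @ Gd
scores = (Kd @ Ad) @ (Kt @ At).T            # higher score = more likely link
print(float(np.linalg.norm(Y - scores)))    # residual shrinks during training
```

Because predictions go through the kernel rows, an out-of-sample drug needs only its similarities to the training drugs, which is the property the paper exploits in scenario (ii).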


Journal ArticleDOI
TL;DR: This work introduces a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine, and demonstrates that several molecular properties can be predicted to high accuracy and used in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule.
Abstract: Motivation: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, the computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. Results: We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine. Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. Availability: A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.net/p/fingerid. Contact: markus.heinonen@cs.helsinki.fi

144 citations
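The second, matching step of the pipeline above reduces to ranking database molecules by fingerprint similarity. The sketch below assumes the per-property predictions (which FingerID makes with SVMs) have already been produced, and uses a tiny invented fingerprint database; the molecule names and bit patterns are purely illustrative, whereas the real tool matches against PubChem-scale databases.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprint vectors."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

# Hypothetical candidate fingerprints from a molecule database.
database = {
    "glucose":  [1, 1, 0, 1, 0, 0, 1, 0],
    "fructose": [1, 1, 0, 1, 0, 1, 0, 0],
    "citrate":  [0, 0, 1, 0, 1, 1, 0, 1],
}
# Fingerprint predicted from the spectrum (invented for illustration).
predicted = [1, 1, 0, 1, 0, 0, 1, 1]

# Rank candidates by similarity to the predicted fingerprint.
best = max(database, key=lambda name: tanimoto(predicted, database[name]))
print(best)   # glucose
```

Because the match is against predicted properties rather than reference spectra, candidates never measured on the same instrument can still be ranked, which is the de novo setting the abstract describes.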


Journal ArticleDOI
TL;DR: The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010), and adapts the procedure to the problem of cellular network inference by applying it to the biologically realistic data of the DREAM challenges.
Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010).

112 citations


Journal ArticleDOI
TL;DR: This article introduces a co-design method called Storytelling Group that has been developed and tested in three service design cases and is a quick start for actual design work but still includes users in the process.
Abstract: In this article, we will introduce a co-design method called Storytelling Group that has been developed and tested in three service design cases. Storytelling Group combines collaborative scenario ...

108 citations


Journal ArticleDOI
TL;DR: It is shown that the rapid global dissemination of a single pathogenic bacterial clone results in local variation in measured recombination rates, and possible explanatory variables include the size and time since emergence of each defined sub-population, variation in transmission dynamics due to host movement, and changes in the bacterial genome affecting the propensity for recombination.
Abstract: Background Next-generation sequencing (NGS) is a powerful tool for understanding both patterns of descent over time and space (phylogeography) and the molecular processes underpinning genome divergence in pathogenic bacteria. Here, we describe a synthesis between these perspectives by employing a recently developed Bayesian approach, BRATNextGen, for detecting recombination on an expanded NGS dataset of the globally disseminated methicillin-resistant Staphylococcus aureus (MRSA) clone ST239.

107 citations


Posted Content
TL;DR: In this paper, the authors show that the solution of the network structure optimization problem is highly sensitive to the chosen alpha parameter value; since no generally accepted rule for determining the alpha parameter has been suggested, they discuss ideas for solving this problem.
Abstract: The BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This non-informative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size alpha. Unfortunately, no generally accepted rule for determining the alpha parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen alpha parameter value. Based on these results, we are able to give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.

92 citations
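The sensitivity at issue is easy to see by computing the BDeu score directly: for the same data, the score gap between two competing structures changes with every choice of alpha. The counts below are invented for illustration; the formula is the standard BDeu local marginal likelihood.

```python
import numpy as np
from scipy.special import gammaln

def bdeu_local(counts, alpha):
    """BDeu local marginal likelihood for one variable.
    counts: (q parent configurations) x (r child states) data counts."""
    q, r = counts.shape
    a_j, a_jk = alpha / q, alpha / (q * r)   # Dirichlet pseudo-counts
    nj = counts.sum(axis=1)
    return float(np.sum(gammaln(a_j) - gammaln(a_j + nj))
                 + np.sum(gammaln(a_jk + counts) - gammaln(a_jk)))

counts = np.array([[12, 3],        # child counts given parent = 0
                   [2, 13]])       # child counts given parent = 1
no_parent = counts.sum(axis=0, keepdims=True)

diffs = []
for alpha in (0.1, 1.0, 10.0, 100.0):
    gap = bdeu_local(counts, alpha) - bdeu_local(no_parent, alpha)
    diffs.append(gap)
    print(alpha, round(gap, 3))    # gap > 0 favours keeping the edge
```

The printed gap is different for every alpha, so a structure search maximizing the total BDeu score can return different networks depending on this single prior parameter.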


Journal ArticleDOI
TL;DR: A major finding of this study suggests that the Minimum Description Length (MDL) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), Bayesian Dirichlet equivalence score (BDeu), and factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures.
Abstract: In this work, we empirically evaluate the capability of various scoring functions of Bayesian networks for recovering true underlying structures. Similar investigations have been carried out before, but they typically relied on approximate learning algorithms to learn the network structures. The suboptimal structures found by the approximation methods have unknown quality and may affect the reliability of their conclusions. Our study uses an optimal algorithm to learn Bayesian network structures from datasets generated from a set of gold standard Bayesian networks. Because all optimal algorithms always learn equivalent networks, this ensures that only the choice of scoring function affects the learned networks. Another shortcoming of the previous studies stems from their use of random synthetic networks as test cases. There is no guarantee that these networks reflect real-world data. We use real-world data to generate our gold-standard structures, so our experimental design more closely approximates real-world situations. A major finding of our study suggests that, in contrast to results reported by several prior works, the Minimum Description Length (MDL) (or equivalently, Bayesian information criterion (BIC)) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), Bayesian Dirichlet equivalence score (BDeu), and factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures. We believe this finding results from using datasets generated from real-world applications rather than from random processes, and from using learning algorithms that select high-scoring structures rather than random models. Other findings of our study support existing work, e.g., large sample sizes result in learning structures closer to the true underlying structure; the BDeu score is sensitive to the parameter settings; and fNML performs well on small datasets. We also tested a greedy hill-climbing algorithm and observed results similar to those of the optimal algorithm.

87 citations


Journal ArticleDOI
TL;DR: The authors found that 89.2% of participants reported experiencing repeated involuntary imagery of music at least once a week and that the amount of music practice and listening was positively related to the frequency of involuntary music.
Abstract: Involuntary semantic memories are a new topic in psychology. Initial research has suggested that musical memories are a dominant type of involuntary memory. Interestingly, no comprehensive information exists on the commonality of “earworms”, or repeated involuntary imagery of music (INMI), and its relationship to the engagement with musical activities. The present study investigated these using cross-sectional, retrospective reports from a questionnaire study that was conducted among Finnish internet users (N = 12,519). The analyses of the data revealed that 89.2% of participants reported experiencing this phenomenon at least once a week. The amount of music practice and listening was positively related to the frequency of involuntary music. Women reported elevated levels of involuntary imagery episodes in contrast to men, who reacted differently. In older age-groups the frequency of the incidents decreased among both sexes. People with extensive musical practice history seemed to experience longer musica...

Proceedings Article
21 Mar 2012
TL;DR: A factor analysis model is introduced that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does, and is applied to two data analysis tasks, in neuroimaging and chemical systems biology.
Abstract: We introduce a factor analysis model that summarizes the dependencies between observed variable groups, instead of dependencies between individual variables as standard factor analysis does. A group may correspond to one view of the same set of objects, one of many data sets tied by co-occurrence, or a set of alternative variables collected from statistics tables to measure one property of interest. We show that by assuming groupwise sparse factors, active in a subset of the sets, the variation can be decomposed into factors explaining relationships between the sets and factors explaining away set-specific variation. We formulate the assumptions in a Bayesian model providing the factors, and apply the model to two data analysis tasks, in neuroimaging and chemical systems biology.

Journal ArticleDOI
TL;DR: It is shown that the traveling salesman problem in bounded-degree graphs can be solved in time O((2-ε)^n), where ε > 0 depends only on the degree bound but not on the number of cities.
Abstract: We show that the traveling salesman problem in bounded-degree graphs can be solved in time O((2-ε)^n), where ε > 0 depends only on the degree bound but not on the number of cities, n. The algorithm is a variant of the classical dynamic programming solution due to Bellman, and, independently, Held and Karp. In the case of bounded integer weights on the edges, we also give a polynomial-space algorithm with running time O((2-ε)^n) on bounded-degree graphs. In addition, we present an analogous analysis of Ryser's algorithm for the permanent of matrices with a bounded number of nonzero entries in each column.
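For reference, the classical Bellman-Held-Karp dynamic programme that the paper improves upon looks as follows on an unrestricted (not degree-bounded) instance; the 4-city distance matrix is a made-up example.

```python
from itertools import combinations

def held_karp(dist):
    """Bellman-Held-Karp dynamic programme, O(2^n * n^2) time.
    dist[i][j] = cost of travelling from city i to city j."""
    n = len(dist)
    # C[(S, j)]: cheapest path from city 0 visiting bitmask S, ending at j.
    C = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = sum(1 << j for j in subset)
            for j in subset:
                C[(S, j)] = min(C[(S ^ (1 << j), k)] + dist[k][j]
                                for k in subset if k != j)
    full = (1 << n) - 2          # every city except the starting one
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))

dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(held_karp(dist))   # 21, via the tour 0 -> 2 -> 3 -> 1 -> 0
```

The paper's contribution is showing that, under a degree bound, not all 2^n subsets need to be expanded, shaving the base of the exponential below 2.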

Journal ArticleDOI
TL;DR: A user study compared navigation using the information typically provided by currently available handheld AR browsers with navigation using a digital map, and with a combined map and AR condition; it found no overall difference in task completion time, but evidence that AR browsers are less useful for navigation under some environmental conditions.

Journal ArticleDOI
TL;DR: In this paper, a physiological metric for investigating social experience within a shared gaming context is introduced: Physiological linkage is measured by gathering simultaneous psychophysiological measurements from several players, and the authors discuss various measures used to calculate linkage, the related social processes, and how to use physiological linkage in game experience research.
Abstract: Psychophysiological methodology has been successfully applied to investigate media responses, including the experience of playing digital games. The approach has many benefits for a player experience assessment-it can provide detailed, unbiased, and time-accurate data without interrupting the gameplay. However, gaming can be a highly social activity. This article extends the methodological focus from single player to include multiple simultaneous players. A physiological metric for investigating social experience within a shared gaming context is introduced: Physiological linkage is measured by gathering simultaneous psychophysiological measurements from several players. The authors review how physiological linkage may be associated with social presence among participants in various gaming situations or social contexts. These metrics provide such information about the interaction among participants that is not currently available by any other method. The authors discuss various measures used to calculate linkage, the related social processes, and how to use physiological linkage in game experience research.

Journal ArticleDOI
TL;DR: This paper extends redescription mining to categorical and real‐valued data with possibly missing values using a surprisingly simple and efficient approach and shows the statistical significance of the results using recent innovations on randomization methods.
Abstract: Redescription mining is a powerful data analysis tool that is used to find multiple descriptions of the same entities. Consider geographical regions as an example. They can be characterized by the fauna that inhabits them on one hand and by their meteorological conditions on the other hand. Finding such redescriptions, a task known as niche-finding, is of much importance in biology. Current redescription mining methods can handle only Boolean data. This restricts the range of possible applications or makes discretization a prerequisite, entailing a possibly harmful loss of information. In niche-finding, while the fauna can be naturally represented using Boolean presence/absence data, the weather cannot. In this paper, we extend redescription mining to categorical and real-valued data with possibly missing values using a surprisingly simple and efficient approach. We provide extensive experimental evaluation to study the behavior of the proposed algorithm. Furthermore, we show the statistical significance of our results using recent innovations in randomization methods. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012 (Part of this work was done when the author was with HIIT.)

Posted Content
TL;DR: The authors combine conditional independencies and independent component analysis to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible.
Abstract: An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.

Proceedings ArticleDOI
01 Dec 2012
TL;DR: This paper considers two kinds of attacks on a DHT, one already known attack and one new kind of an attack, and shows how they can be targeted against Mainline DHT and proposes simple countermeasures against them.
Abstract: Distributed hash tables (DHTs) are a key building block for modern P2P content-distribution systems, for example in implementing the distributed tracker of BitTorrent Mainline DHT. DHTs, due to their fully distributed nature, are known to be vulnerable to certain kinds of attacks, and different kinds of defenses have been proposed against these attacks. In this paper, we consider two kinds of attacks on a DHT, one already known and one new, and show how they can be targeted against Mainline DHT. We complement them with an extensive measurement study using honeypots, which shows that both attacks have been going on in the network for a long time and are still happening. We present numbers showing that the number of sybils in the Mainline DHT network is increasing and is currently around 300,000. We analyze the potential threats from these attacks and propose simple countermeasures against them.

Proceedings Article
26 Jun 2012
TL;DR: A fully conjugate Bayesian formulation is proposed and derived, which allows us to combine hundreds or thousands of kernels very efficiently and can be extended for multiclass learning and semi-supervised learning.
Abstract: Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is focused on the computational efficiency issue. However, it is still not feasible to combine many kernels using existing Bayesian approaches due to their high time complexity. We propose a fully conjugate Bayesian formulation and derive a deterministic variational approximation, which allows us to combine hundreds or thousands of kernels very efficiently. We briefly explain how the proposed method can be extended for multiclass learning and semi-supervised learning. Experiments with large numbers of kernels on benchmark data sets show that our inference method is quite fast, requiring less than a minute. On one bioinformatics and three image recognition data sets, our method outperforms previously reported results with better generalization performance.
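The full conjugate variational treatment is beyond a short example, but the underlying operation, combining several candidate kernels with learned nonnegative weights before fitting a kernel machine, can be sketched. Here the weights come from centred kernel-target alignment, a cheap heuristic stand-in for the paper's variational inference; the RBF widths, synthetic data, and ridge regularizer are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

def rbf(X, gamma):                        # one candidate kernel per width
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

kernels = [rbf(X, g) for g in (0.1, 1.0, 10.0)]

# Nonnegative kernel weights from centred alignment with the targets.
yc = y - y.mean()
yy = np.outer(yc, yc)
w = np.array([max(np.sum(K * yy), 0.0) / np.linalg.norm(K) for K in kernels])
w /= w.sum()
K = sum(wi * Ki for wi, Ki in zip(w, kernels))

# Kernel ridge regression on the combined kernel.
coef = np.linalg.solve(K + 1e-2 * np.eye(len(y)), y)
pred = K @ coef
print(np.round(w, 2))                     # relative kernel weights
print(float(np.mean((pred - y) ** 2)))    # small training error
```

The combination step costs only a weighted sum of precomputed Gram matrices, which hints at why scaling to hundreds or thousands of kernels is mainly a question of how cheaply the weights can be inferred.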

Posted Content
TL;DR: Cloud computing can be defined as the provision of computing resources on-demand over the Internet, which might bring a number of advantages to end-users in terms of accessibility and elasticity of costs, but problems arise concerning the collection of personal information in the Cloud and the legitimate exploitation thereof.
Abstract: Cloud computing can be defined as the provision of computing resources on-demand over the Internet. Although this might bring a number of advantages to end-users in terms of accessibility and elasticity of costs, problems arise concerning the collection of personal information in the Cloud and the legitimate exploitation thereof. To the extent that most content and software applications are only accessible online, users no longer have control over the manner in which they can access their data and the extent to which third parties can exploit it.

Book ChapterDOI
13 Jun 2012
TL;DR: The preliminary results from an experiment investigating the perceived intensity of modulated friction created by electrostatic force indicate significant correlations between intensity perception and signal amplitude, with the highest sensitivity found at a frequency of 80 Hz.
Abstract: We describe the preliminary results from an experiment investigating the perceived intensity of modulated friction created by electrostatic force, or electrovibration. A prototype experimental system was created to evaluate user perception of sinusoidal electrovibration stimuli on a flat surface emulating a touch screen interface. We introduce a fixed 6-point Effect Strength Subjective Index (ESSI) as a measure of generic sensation intensity, and compare it with an open magnitude scale. The results of the experiment indicate that there are significant correlations between intensity perception and signal amplitude, and the highest sensitivity was found at a frequency of 80 Hz. The subjective results show that the users perceived the electrovibration stimuli as pleasant and a useful means of feedback for touchscreens.

Journal ArticleDOI
TL;DR: A mobile crowdsourcing platform that is built on top of social media, called UbiAsk, designed for assisting foreign visitors by involving the local crowd to answer their image-based questions at hand in a timely fashion is presented.
Abstract: Recent years have witnessed the impact of the crowdsourcing model, social media, and pervasive computing. We believe that a more significant impact is latent in the convergence of these ideas on the mobile platform. In this paper, we introduce a mobile crowdsourcing platform that is built on top of social media. A mobile crowdsourcing application called UbiAsk is presented as one case study. UbiAsk is designed to assist foreign visitors by involving the local crowd in answering their image-based questions in a timely fashion. Existing social media platforms are used to rapidly allocate microtasks to a wide network of local residents. The resulting data are visualized using a mapping tool as well as augmented reality (AR) technology, resulting in a visual information pool for public use. We ran a controlled field experiment in Japan for 6 weeks with 55 participants. The results demonstrated reliable performance on response speed and response quantity: half of the requests were answered within 10 minutes, 75% of requests were answered within 30 minutes, and on average every request had 4.2 answers. In the afternoon, evening, and night especially, nearly 88% of requests were answered in approximately 10 minutes on average, with more than 4 answers per request. In terms of participation motivation, we found the most active crowdworkers were driven more by intrinsic motivations than by any of the extrinsic incentives (game-based incentives and social incentives) we designed.

Journal ArticleDOI
18 Jul 2012-PLOS ONE
TL;DR: This study shows that the temperament subscales do not distribute randomly but have an endogenous structure, and that these patterns have strong associations to health, life events, and well-being.
Abstract: Background The object of this study was to identify temperament patterns in the Finnish population, and to determine the relationship between these profiles and life habits, socioeconomic status, and health.

Journal ArticleDOI
TL;DR: It is shown that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single relatively short time series and in presence of unmodelled confounders and unreliable prior knowledge on training network connectivity.
Abstract: Complete transcriptional regulatory network inference is a huge challenge because of the complexity of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes.

Proceedings Article
14 Aug 2012
TL;DR: A score-based local learning algorithm called SLL is proposed for learning the network structure near target variables of special interest while ignoring the rest of the variables; the algorithm is conjectured to be theoretically sound in the sense that it is optimal in the limit of large sample size.
Abstract: Learning a Bayesian network structure from data is an NP-hard problem and thus exact algorithms are feasible only for small data sets. Therefore, network structures for larger networks are usually learned with various heuristics. Another approach to scaling up the structure learning is local learning. In local learning, the modeler has one or more target variables that are of special interest; he wants to learn the structure near the target variables and is not interested in the rest of the variables. In this paper, we present a score-based local learning algorithm called SLL. We conjecture that our algorithm is theoretically sound in the sense that it is optimal in the limit of large sample size. Empirical results suggest that SLL is competitive when compared to the constraint-based HITON algorithm. We also study the prospects of constructing the network structure for the whole node set based on local results by presenting two algorithms and comparing them to several heuristics.

Posted Content
TL;DR: It is shown that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases and offers a possibility for efficient exploration of the best networks consistent with different variable orderings.
Abstract: We study the problem of learning the best Bayesian network structure with respect to a decomposable score such as BDe, BIC or AIC. This problem is known to be NP-hard, which means that solving it becomes quickly infeasible as the number of variables increases. Nevertheless, in this paper we show that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases. Our algorithm is less complicated and more efficient than the techniques presented earlier. It can be easily parallelized, and offers a possibility for efficient exploration of the best networks consistent with different variable orderings. In the experimental part of the paper we compare the performance of the algorithm to the previous state-of-the-art algorithm. Free source-code and an online-demo can be found at this http URL.
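The dynamic programming idea behind this kind of exact structure learning is compact: every DAG has a sink, so the best network over a variable set S is the best choice of a sink v (with parents drawn from S minus v) plus the best network over the remaining variables. The sketch below runs it on five variables with random, made-up local scores; the real algorithm plugs in BDe/BIC/AIC local scores and uses careful memory layouts to reach 30+ variables.

```python
from itertools import combinations
from functools import lru_cache
import random

random.seed(0)
n = 5                          # tiny here; the paper scales this DP past 30
V = frozenset(range(n))

# Hypothetical local scores: score(v, parent_set) for every candidate set.
local = {(v, P): random.gauss(0, 1)
         for v in V
         for r in range(n)
         for P in map(frozenset, combinations(V - {v}, r))}

@lru_cache(maxsize=None)
def best_parents(v, S):
    # Best local score for v with parents chosen from the set S.
    return max(local[(v, frozenset(P))]
               for r in range(len(S) + 1)
               for P in combinations(S, r))

@lru_cache(maxsize=None)
def best_net(S):
    # Best total score of any DAG over the variable set S.
    if not S:
        return 0.0
    # Some variable must be a sink: its parents lie inside S - {v}.
    return max(best_parents(v, S - {v}) + best_net(S - {v}) for v in S)

print(best_net(V))
```

Both tables are indexed by subsets, giving the O(2^n)-flavoured time and memory behaviour that makes roughly 30 variables the practical frontier for exact search.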

Posted Content
TL;DR: In this paper, an improved admissible heuristic that tries to avoid directed cycles within small groups of variables is introduced to improve the efficiency and scalability of A* and BFBnB.
Abstract: Recently two search algorithms, A* and breadth-first branch and bound (BFBnB), were developed based on a simple admissible heuristic for learning Bayesian network structures that optimize a scoring function. The heuristic represents a relaxation of the learning problem such that each variable chooses optimal parents independently. As a result, the heuristic may contain many directed cycles and result in a loose bound. This paper introduces an improved admissible heuristic that tries to avoid directed cycles within small groups of variables. A sparse representation is also introduced to store only the unique optimal parent choices. Empirical results show that the new techniques significantly improved the efficiency and scalability of A* and BFBnB on most of the datasets tested in this paper.
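The "simple admissible heuristic" described above has a direct reading: every variable not yet placed in the ordering independently takes its best-scoring parent set, ignoring acyclicity, so the bound can never underestimate what the remaining variables could contribute. A sketch of that relaxed bound (a hypothetical helper assuming local scores stored per parent set, higher being better; the paper's improved heuristic tightens this by forbidding cycles within small variable groups, which is not shown here):

```python
def relaxed_heuristic(scores, placed):
    """Admissible heuristic for A*/BFBnB structure search.

    scores: {v: {frozenset(parents): local_score}}.
    placed: frozenset of variables already assigned in the partial order.
    Each unplaced variable independently picks its best parent set over
    all other variables; the resulting "graph" may contain directed
    cycles, which is exactly why the bound is admissible but loose.
    """
    V = frozenset(scores)
    return sum(max(scores[v].values()) for v in V - placed)
```

Because each term is an unconstrained maximum, the heuristic value is always at least the score achievable by any acyclic completion, which is the admissibility property A* and BFBnB rely on.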

Journal ArticleDOI
TL;DR: In this paper, a cued-recall method was used to induce involuntary musical imagery (INMI) and delayed self-reports in a large sample of people, and the prevalence of the phenomenon was considerable.
Abstract: It is still a mystery why we sometimes experience the repetition of memories in our minds. This phenomenon seems to be particularly prominent in music. We believe that the present lack of knowledge stems from the lack of methods available for studying this topic. To improve the understanding of involuntary musical imagery (INMI), this paper proposes a novel method to induce it in experimental settings. We report three experiments conducted to evaluate two research questions related to INMI: can it be experimentally induced, and if so, which factors influence its emergence? The investigation focused in particular on how recent activation of musical memory might predict INMI. The questions were tested in single-trial experiments conducted over the internet. The experiments utilized a cued-recall method to induce INMI and delayed self-reports. Among a large sample of people, the prevalence of the phenomenon was considerable. When familiarity with the stimuli was controlled for, inducing INMI experim...

Journal Article
TL;DR: In this article, the authors discuss the collection of personal information in the cloud and the legitimate exploitation of it by third parties, and propose a solution to the problem of privacy protection in the Cloud.
Abstract: Cloud computing can be defined as the provision of computing resources on demand over the Internet. Although this may bring a number of advantages to end users in terms of accessibility and elasticity of costs, problems arise concerning the collection of personal information in the Cloud and the legitimate exploitation thereof. To the extent that most content and software applications are only accessible online, users no longer have control over the manner in which they access their data or the extent to which third parties can exploit it.

Posted Content
TL;DR: In this paper, the authors combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics, which is shown to be especially useful for querying the contents of one domain given samples of the other.
Abstract: Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, multiple variants of topic models attempting to tie the modalities together have been presented. All of these, however, lack the ability to learn components private to one modality, and consequently will try to force dependencies even between minimally correlating modalities. In this work we combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics. The model is shown to be especially useful for querying the contents of one domain given samples of the other.
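For context on the CCA-based baseline the abstract contrasts with: classical canonical correlation analysis finds maximally correlated direction pairs across two views, so high canonical correlations flag shared variation while near-zero ones correspond to view-private variation. A minimal sketch of that baseline computation via principal angles (this illustrates the continuous-data approach the abstract mentions, not the proposed HDP topic model; the function name is ours):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two views of the same samples.

    X: (n, d1) and Y: (n, d2) arrays, rows paired across views.
    Computed as cosines of the principal angles between the centered
    column spaces of the two views.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Ux = np.linalg.svd(X, full_matrices=False)[0]   # orthonormal basis, view 1
    Uy = np.linalg.svd(Y, full_matrices=False)[0]   # orthonormal basis, view 2
    # singular values of the basis cross-product are the correlations
    return np.linalg.svd(Ux.T @ Uy, compute_uv=False)
```

A correlation near 1 indicates a component shared by both modalities; the paper's point is that count-data topic models lacked an analogous way to keep components private to one modality.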