Showing papers by "Helsinki Institute for Information Technology" published in 2007


Proceedings ArticleDOI
27 Aug 2007
TL;DR: The Data-Oriented Network Architecture (DONA) is proposed, which involves a clean-slate redesign of Internet naming and name resolution to adapt to changes in Internet usage.
Abstract: The Internet has evolved greatly from its original incarnation. For instance, the vast majority of current Internet usage is data retrieval and service access, whereas the architecture was designed around host-to-host applications such as telnet and ftp. Moreover, the original Internet was a purely transparent carrier of packets, but now the various network stakeholders use middleboxes to improve security and accelerate applications. To adapt to these changes, we propose the Data-Oriented Network Architecture (DONA), which involves a clean-slate redesign of Internet naming and name resolution.
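
The paper's key primitive is flat, self-certifying naming: a DONA name has the form P:L, where P is the cryptographic hash of a principal's public key and L is a label chosen by the principal. A minimal sketch of that binding (helper names are hypothetical; SHA-256 is an assumption of this sketch):

```python
# Minimal sketch of DONA-style self-certifying names (helper names are
# hypothetical; SHA-256 is an assumption of this sketch).
import hashlib

def make_name(public_key: bytes, label: str) -> str:
    """Build a flat name P:L, where P is the hash of the owner's key."""
    principal = hashlib.sha256(public_key).hexdigest()
    return f"{principal}:{label}"

def verify_name(name: str, public_key: bytes) -> bool:
    """Retrieved data can be checked against the claimed key without
    trusting the resolution infrastructure."""
    principal, _, _label = name.partition(":")
    return principal == hashlib.sha256(public_key).hexdigest()
```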

1,643 citations


Book
01 Jan 2007
TL;DR: Inspired by Kolmogorov's structure function for finite sets as models of data in the algorithmic theory of information, this work adapts the construct to families of probability models to avoid the noncomputability problem.
Abstract: Inspired by Kolmogorov's structure function for finite sets as models of data in the algorithmic theory of information, we adapt the construct to families of probability models to avoid the noncomputability problem. The picture of modeling then looks as follows: the models in the family have a double index, where the first specifies a structure, ranging over a finite or a countable set, and the second consists of parameter values, ranging over a continuum. An optimal structure index can be determined by the MDL (Minimum Description Length) principle in a two-part code, where the sum of the code lengths for the structure and the data is minimized. The latter is obtained from the universal NML (Normalized Maximum Likelihood) model for the subfamily of models having a specified structure. The determination of the optimal model in the optimized structure is more difficult. It requires a partition of the parameter space into equivalence classes, each associated with a model, in such a way that the Kullback-Leibler distance between any two adjacent models is equal and that the models are optimally distinguishable from the given amount of data. This notion of distinguishability is a modification of a related idea of Balasubramanian. The particular model, specified by the observed data, is the simplest one that incorporates all the properties in the data that can be extracted with the model class considered.
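
To make the two-part code concrete: each candidate structure gamma costs the code length of the structure index plus the NML code length of the data under gamma. A schematic sketch (function names and inputs are hypothetical; computing the NML normalizer log C(gamma) is the model-specific hard part):

```python
# Schematic sketch of two-part MDL structure selection (function names and
# inputs are hypothetical). For structure gamma we pay L(gamma) for the
# structure index plus the NML code length of the data under gamma:
# -log p(x | theta_hat(gamma)) + log C(gamma), with C(gamma) the
# parametric complexity (NML normalizer) of the subfamily.

def nml_code_length(neg_max_log_lik: float, log_C: float) -> float:
    return neg_max_log_lik + log_C

def best_structure(candidates):
    """candidates: iterable of (gamma, L_gamma, neg_max_log_lik, log_C)."""
    return min(candidates,
               key=lambda c: c[1] + nml_code_length(c[2], c[3]))[0]

# Example: best_structure([("k=1", 1.0, 120.3, 2.1), ("k=2", 2.0, 110.7, 5.8)])
```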

352 citations


Journal ArticleDOI
TL;DR: It is shown how to estimate non-normalized models defined on the non-negative real domain R_+^n, and that the score matching estimator can be obtained in closed form for some exponential families.
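
For the mechanics: score matching fits a non-normalized model by matching the model score psi_i = d log p / d x_i to the data, and on R_+^n a weighted variant avoids boundary terms. A hedged sketch assuming the standard generalized objective with weights h_i(x) = x_i^2 (the constants are my assumption and should be checked against the paper):

```python
# Hedged sketch of generalized score matching for data on R_+^n, with
# weights h_i(x) = x_i^2 (my assumption of the standard form). The model
# score psi never involves the normalizing constant of p.
import numpy as np

def sm_objective_nonneg(X, psi, dpsi):
    """X: (T, n) samples; psi(X), dpsi(X): model score and its diagonal
    derivative, each of shape (T, n). Lower is better."""
    P, D = psi(X), dpsi(X)
    return np.mean(np.sum(2.0 * X * P + X**2 * D + 0.5 * X**2 * P**2,
                          axis=1))

# For the exponential model p(x) proportional to exp(-theta * x):
# psi = -theta, dpsi = 0, and minimizing the objective gives
# theta_hat = 2 E[x] / E[x^2] in closed form, illustrating the closed-form
# estimators reported for some exponential families.
```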

233 citations


Proceedings ArticleDOI
29 Apr 2007
TL;DR: A study at a large IT company shows that mobile information workers frequently migrate work across devices, and workers' strategies of coping with these problems center on the physical handling of devices and cross-device synchronization.
Abstract: A study at a large IT company shows that mobile information workers frequently migrate work across devices (here: smartphones, desktop PCs, laptops). While having multiple devices provides new opportunities to work in the face of changing resource deprivations, the management of devices is often problematic. The most salient problems are posed by 1) the physical effort demanded by various management tasks, 2) anticipating what data or functionality will be needed, and 3) aligning these efforts with work, mobility, and social situations. Workers' strategies of coping with these problems center on two interwoven activities: the physical handling of devices and cross-device synchronization. These aim at balancing risk and effort in immediate and subsequent use. Workers also exhibit subtle ways to handle devices in situ, appropriating their physical and operational properties. The design implications are discussed.

133 citations


Journal ArticleDOI
TL;DR: This work proposes a novel model of service provisioning in ad hoc networks based on the concept of context-aware migratory services, and presents TJam, a proof-of-concept migratory service that predicts traffic jams in a given region of a highway using only car-to-car short-range wireless communication.
Abstract: Ad hoc networks can be used not only as data carriers for mobile devices but also as providers of a new class of services specific to ubiquitous computing environments. Building services in ad hoc networks, however, is challenging due to rapidly changing operating contexts, which often lead to situations where a node hosting a certain service becomes unsuitable for hosting the service execution any longer. We propose a novel model of service provisioning in ad hoc networks based on the concept of context-aware migratory services. Unlike a regular service that always executes on the same node, a migratory service can migrate to different nodes in the network in order to accomplish its task. The migration is triggered by changes in the operating context and occurs transparently to the client application. We designed and implemented a framework for developing migratory services. We built TJam, a proof-of-concept migratory service that predicts traffic jams in a given region of a highway by using only car-to-car short-range wireless communication. The experimental results obtained over an ad hoc network of personal digital assistants (PDAs) show the effectiveness of our approach in the presence of frequent disconnections. We also present simulation results that demonstrate the benefits of migratory services in large-scale networks compared to a statically centralized approach.

124 citations


Journal ArticleDOI
TL;DR: An elegant recursion formula is derived which allows efficient computation of the stochastic complexity in the case of n observations of a single multinomial random variable with K values, and the time complexity is O(n + K) as opposed to O(n log n log K) obtained with the previous results.
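
The recursion lets the NML normalizing sum C(K, n) be filled in linearly after an O(n) base case; the stochastic complexity is then the maximized negative log-likelihood plus log C(K, n). A sketch, where the recurrence C(k, n) = C(k-1, n) + n·C(k-2, n)/(k-2) is my reconstruction of the paper's recursion and should be verified before relying on it:

```python
import math

def multinomial_regret(K: int, n: int) -> float:
    """Normalizing sum C(K, n) of the multinomial NML model in O(n + K),
    for n >= 1 observations and K >= 1 values. The recurrence used below
    is a reconstruction; verify it against the paper."""
    if K == 1:
        return 1.0
    c_prev = 1.0                              # C(1, n)
    c = sum(math.comb(n, h)                   # C(2, n), direct O(n) sum
            * (h / n) ** h * ((n - h) / n) ** (n - h)
            for h in range(n + 1))            # 0 ** 0 evaluates to 1
    for k in range(3, K + 1):
        c_prev, c = c, c + n * c_prev / (k - 2)
    return c

# Stochastic complexity of data x: -log p(x | theta_hat) + log C(K, n).
```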

109 citations


Proceedings ArticleDOI
13 Jun 2007
TL;DR: This paper shows how the variety of material features expands communicative resources and provides border resources for action, in their peripheral, evocative, and referential function, and how materiality is part of performative action, looking at temporal frames of relevance and emergence in specific events.
Abstract: This paper seeks to develop a better understanding of the contribution of materiality to creativity in collaborative settings, exploring the ways in which it provides resources for persuasive, narrative and experiential interactions. Based on extensive field studies of architectural design workplaces and on examples from art works, we show: how the variety of material features expands communicative resources and provides border resources for action, in their peripheral, evocative, and referential function; how spatiality supports the public availability of artefacts as well as people's direct, bodily engagement with materiality; and finally how materiality is part of performative action, looking at temporal frames of relevance and emergence in specific events. We conclude with implications for the development of novel interface technologies.

107 citations


Proceedings Article
19 Jul 2007
TL;DR: The solution of the network structure optimization problem is highly sensitive to the chosen α parameter value; explanations of how and why this phenomenon happens are given, and ideas for solving the problem are discussed.
Abstract: The BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This non-informative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately, no generally accepted rule for determining the α parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α parameter value. Based on these results, we give explanations for how and why this phenomenon happens, and discuss ideas for solving the problem.
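
To see where α enters, here is the standard BDeu local score for a single variable (textbook Dirichlet-multinomial form; the counts and the sweep are illustrative). Re-running structure search on the same data with different α values is exactly the experiment that exposes the sensitivity:

```python
from math import lgamma

def bdeu_local(counts, alpha):
    """Standard BDeu local score for one variable (illustrative form).
    counts: list over the q parent configurations, each a list of the
    r per-state counts N_ijk; alpha: equivalent sample size."""
    q = len(counts)
    r = len(counts[0])
    score = 0.0
    for row in counts:
        N_ij = sum(row)
        score += lgamma(alpha / q) - lgamma(alpha / q + N_ij)
        for N_ijk in row:
            score += lgamma(alpha / (q * r) + N_ijk) - lgamma(alpha / (q * r))
    return score

# Sweeping alpha and re-scoring candidate parent sets is the experiment
# that exposes the sensitivity:
# for alpha in (0.1, 1.0, 10.0, 100.0): compare bdeu_local(...) per structure.
```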

93 citations


Journal ArticleDOI
TL;DR: With mobile devices becoming ubiquitous, the time is ripe to bring sensor data out of closed-loop networks into the center of daily urban life, moving beyond first-generation deployments of application-specific, static sensors that monitor the sensed environment in real time.
Abstract: With mobile devices becoming ubiquitous, the time is ripe to bring sensor data out of closed-loop networks into the center of daily urban life. The Internet has become a great success because its applications appeal to regular people. This isn't the case with sensor networks, which are generally perceived as "something" remote in the forest or on the battlefield. With few exceptions, first-generation sensor networks address application-specific, static-sensor deployments to accurately monitor the sensed environment in real time.

85 citations


Journal ArticleDOI
01 Apr 2007
TL;DR: This analysis of the organization of experience-related activities in the mass event focuses on the active role of technology-mediated memories in constructing experiences and advocates applications that not only store or capture human experience for sharing or later use but also actively participate in the very construction of experience.
Abstract: To fully appreciate the opportunities provided by interactive and ubiquitous multimedia to record and share experiences, we report on an ethnographic investigation of the settings and nature of human memory and experience at a large-scale event. We studied two groups of spectators at a FIA World Rally Championship in Finland, both equipped with multimedia mobile phones. Our analysis of the organization of experience-related activities in the mass event focuses on the active role of technology-mediated memories in constructing experiences. Continuity, reflexivity with regard to the Self and the group, maintaining and re-creating group identity, protagonism and active spectatorship were important social aspects of the experience and were directly reflected in how multimedia was used. In particular, we witnessed multimedia-mediated forms of expression, such as staging, competition, storytelling, joking, communicating presence, and portraying others; and the motivation for these stemmed from the engaging, processual, and shared nature of experience. Moreover, we observed how temporality and spatiality provided a platform for constructing experiences. The analysis advocates applications that not only store or capture human experience for sharing or later use but also actively participate in the very construction of experience. The approach conveys several valuable design implications.

84 citations


Book ChapterDOI
13 May 2007
TL;DR: The analysis shows how the game succeeds in fostering players' creativity by exploiting ambiguity and how the players were engaged in a fast-paced competition which resulted in 115 stories and 3142 photos in 1.5 hours.
Abstract: We present a large-scale pervasive game called Manhattan Story Mashup that combines the Web, camera phones, and a large public display. The game introduces a new form of interactive storytelling which lets an unlimited number of players author stories on the Web while a large number of players illustrate the stories with camera phones. This paper presents the first deployment of the game and a detailed analysis of its quantitative and qualitative results. We present details on the game implementation and game setup, including practical lessons learnt from this large-scale experiment involving over 300 players in total. The analysis shows how the game succeeds in fostering players' creativity by exploiting ambiguity and how the players were engaged in a fast-paced competition which resulted in 115 stories and 3142 photos in 1.5 hours.

Journal ArticleDOI
TL;DR: This work investigates how users interpret cues of other users' situations as a situation, action, or intention of a remote person and then act on them in everyday social interactions through smartphone-based mobile awareness systems.
Abstract: Mobile awareness systems provide user-controlled and automatic, sensor-derived cues of other users' situations and in that way attempt to facilitate group practices and provide opportunities for social interaction. We are interested in investigating how users interpret these cues as a situation, action, or intention of a remote person and then act on them in everyday social interactions. Three field trials utilizing A-B intervention research methodology were conducted with three types of teenager groups (N = 15, total days = 243). Each trial had a slightly different variation of Context Contacts, a smartphone-based multicue mobile awareness system. We report on several analyses on how the cues were accessed, viewed, monitored, inferred, and acted on.

Journal ArticleDOI
TL;DR: This study exposed the human cell lines A549, Beas-2B and Met5A to crocidolite asbestos, determined time-dependent gene expression profiles using Affymetrix arrays, and identified chromosomal regions enriched with genes potentially contributing to common responses to asbestos in these cell lines.
Abstract: Asbestos has been shown to cause chromosomal damage and DNA aberrations. Exposure to asbestos causes many lung diseases, e.g. asbestosis, malignant mesothelioma, and lung cancer, but the disease-related processes are still largely unknown. We exposed the human cell lines A549, Beas-2B and Met5A to crocidolite asbestos and determined time-dependent gene expression profiles using Affymetrix arrays. The hybridization data was analyzed using an algorithm specifically designed for clustering of short time series expression data. A canonical correlation analysis was applied to identify correlations between the cell lines, and a Gene Ontology analysis method was used to identify enriched, differentially expressed biological processes. We recognized a large number of previously known as well as new potential asbestos-associated genes and biological processes, and identified chromosomal regions enriched with genes potentially contributing to common responses to asbestos in these cell lines. These include genes such as the thioredoxin domain containing gene (TXNDC) and the potential tumor suppressor, BCL2/adenovirus E1B 19kD-interacting protein gene (BNIP3L), GO terms such as "positive regulation of I-kappaB kinase/NF-kappaB cascade" and "positive regulation of transcription, DNA-dependent", and chromosomal regions such as 2p22, 9p13, and 14q21. We present the complete data sets as Additional files. This study identifies several interesting targets for further investigation in relation to asbestos-associated diseases.

Journal ArticleDOI
TL;DR: This model is novel in that it is the first to analyze optimal subspace size and how this size is influenced by contrast normalization; it also shows that the optimal nonlinearity for the pooling is squaring.
Abstract: In previous work, we presented a statistical model of natural images that produced outputs similar to receptive fields of complex cells in primary visual cortex. However, a weakness of that model was that the structure of the pooling was assumed a priori and not learned from the statistical properties of natural images. Here, we present an extended model in which the pooling nonlinearity and the size of the subspaces are optimized rather than fixed, so we make far fewer assumptions about the pooling. Results on natural images indicate that the best probabilistic representation is formed when the size of the subspaces is relatively large, and that the likelihood is considerably higher than for a simple linear model with no pooling. Further, we show that the optimal nonlinearity for the pooling is squaring. We also highlight the importance of contrast gain control for the performance of the model. Our model is novel in that it is the first to analyze optimal subspace size and how this size is influenced by contrast normalization.
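
The pooling itself is simple to state: linear filter responses are squared and summed within each subspace, giving one "energy" per subspace. A sketch of that forward computation (in the actual model the filters W and the subspace size are optimized; here they are inputs to the sketch):

```python
# Forward computation of subspace energies with squaring pooling.
# W and subspace_size are assumptions of this sketch, not learned here.
import numpy as np

def subspace_energies(X, W, subspace_size):
    """X: (T, d) image patches; W: (k, d) linear filters, with k divisible
    by subspace_size. Returns (T, k // subspace_size) pooled energies."""
    S = X @ W.T                                  # linear filter responses
    T, k = S.shape
    S = (S ** 2).reshape(T, k // subspace_size, subspace_size)
    return S.sum(axis=2)                         # sum of squares per subspace
```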

Proceedings ArticleDOI
29 Apr 2007
TL;DR: The relationship of functionalities of the artifact and the development of resources is discussed by presenting how functionalities can be designed to support three ways to appropriate communication technologies: increasing technical mastery, re-channeling existing communication into the new medium and inventing new communicative acts between users.
Abstract: Technologies can be used - or appropriated - in different ways by different users, but how do the use patterns evolve, and how can design facilitate such evolution? This paper approaches these questions in light of a case study in which a group of 8 high school students used Comeks, a mobile comic strip creator that enables users to exchange rich, expressive multimedia messages. A qualitative analysis of the use processes shows how users turned the functionalities embodied in Comeks into particular resources for communication during the 9-week trial period. The paper discusses the relationship between the functionalities of the artifact and the development of resources by presenting how functionalities can be designed to support three ways to appropriate communication technologies: increasing technical mastery, re-channeling existing communication into the new medium and inventing new communicative acts between users.

Journal ArticleDOI
TL;DR: This work describes the construction of a federated database infrastructure for genotype and phenotype information collected in seven European countries and Australia, connected via a network called TwinNET to guarantee effortless data exchange and pooled analyses.
Abstract: Integration of complex data and data management represent major challenges in large-scale biobank-based post-genome era research projects like GenomEUtwin (an international collaboration between eight Twin Registries) with extensive amounts of genotype and phenotype data combined from different data sources located in different countries. The challenge lies not only in data harmonization and constant update of clinical details in various locations, but also in the heterogeneity of data storage and confidentiality of sensitive health-related and genetic data. Solid infrastructure must be built to provide secure, but easily accessible and standardized, data exchange also facilitating statistical analyses of the stored data. Data collection sites desire to have full control of the accumulation of data, and at the same time the integration should facilitate effortless slicing and dicing of the data for different types of data pooling and study designs. Here we describe how we constructed a federated database infrastructure for genotype and phenotype information collected in seven European countries and Australia and connected this database setting via a network called TwinNET to guarantee effortless data exchange and pooled analyses. This federated database system offers a powerful facility for combining different types of information from multiple data sources. The system is transparent to end users and application developers, since it makes the set of federated data sources look like a single system. The user need not be aware of the format or site where the data are stored, the language or programming interface of the data source, how the data are physically stored, whether they are partitioned and/or replicated or what networking protocols are used. The user sees a single standardized interface with the desired data elements for pooled analyses.

Journal ArticleDOI
TL;DR: This work studies the approximability and inapproximability of finding identifying codes and locating-dominating codes of minimum size, and shows that it is possible to approximate both problems within a logarithmic factor, but sublogarithmic approximation ratios are intractable.
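
The logarithmic-factor approximation follows the usual set-cover route: a code vertex must dominate every vertex and separate every pair of vertices whose closed neighborhoods would otherwise look identical inside the code. A greedy sketch of this reduction (not necessarily the paper's exact construction):

```python
# Greedy ln-factor approximation for minimum identifying codes via set cover.
from itertools import combinations

def greedy_identifying_code(adj):
    """adj: dict mapping each vertex to its set of neighbors (undirected)."""
    N = {v: adj[v] | {v} for v in adj}        # closed neighborhoods
    todo = {("dom", v) for v in adj}          # every vertex must be dominated
    todo |= {("sep", u, v)                    # every pair must be separated
             for u, v in combinations(sorted(adj), 2)}

    def covers(c):
        out = {("dom", v) for v in adj if c in N[v]}
        out |= {("sep", u, v) for u, v in combinations(sorted(adj), 2)
                if (c in N[u]) != (c in N[v])}
        return out

    code = set()
    while todo:
        best = max(set(adj) - code, key=lambda c: len(covers(c) & todo),
                   default=None)
        if best is None or not covers(best) & todo:
            raise ValueError("no identifying code exists (twin vertices)")
        code.add(best)
        todo -= covers(best)
    return code
```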

Proceedings ArticleDOI
22 Oct 2007
TL;DR: A universal conditional NML model is presented, which has minmax optimal properties similar to those of the regular NML model but, unlike NML, defines a random process that can be used for prediction and admits a recursive evaluation for data compression.
Abstract: The NML (normalized maximum likelihood) universal model has certain minmax optimal properties but it has two shortcomings: the normalizing coefficient can be evaluated in a closed form only for special model classes, and it does not define a random process so that it cannot be used for prediction. We present a universal conditional NML model, which has minmax optimal properties similar to those of the regular NML model. However, unlike NML, the conditional NML model defines a random process which can be used for prediction. It also admits a recursive evaluation for data compression. The conditional normalizing coefficient is much easier to evaluate, for instance, for tree machines than the integral of the square root of the Fisher information in the NML model. For Bernoulli distributions, the conditional NML model gives a predictive probability, which behaves like the Krichevsky-Trofimov predictive probability, actually slightly better for extremely skewed strings. For some model classes, it agrees with the predictive probability found earlier by Takimoto and Warmuth, as the solution to a different, more restrictive minmax problem. We also calculate the CNML models for the generalized Gaussian regression models, and in particular for the cases where the loss function is quadratic, and show that the CNML model achieves asymptotic optimality in terms of the mean ideal code length. Moreover, the quadratic loss, which represents fitting errors as noise rather than prediction errors, can be shown to be smaller than what can be achieved with the NML as well as with the so-called plug-in or the predictive MDL model.
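
The Bernoulli case mentioned in the abstract is easy to write down. With k ones observed in n trials, one standard formulation of the conditional NML predictor normalizes the maximized likelihoods of the two possible continuations, while the Krichevsky-Trofimov predictor adds 1/2 to the counts. A sketch (formulation details vary between CNML variants; this is the version assumed here):

```python
def f(k, n):
    """Maximized Bernoulli likelihood of a length-n string with k ones
    (0 ** 0 taken as 1; f(., 0) = 1)."""
    return (k / n) ** k * ((n - k) / n) ** (n - k) if n else 1.0

def cnml_prob_one(k, n):
    """P(next symbol = 1 | k ones in n observations), conditional NML."""
    return f(k + 1, n + 1) / (f(k + 1, n + 1) + f(k, n + 1))

def kt_prob_one(k, n):
    """Krichevsky-Trofimov predictive probability."""
    return (k + 0.5) / (n + 1)

# For an extremely skewed string (k = 0, n = 20):
# cnml_prob_one(0, 20) is about 0.018 < kt_prob_one(0, 20) which is
# about 0.024, i.e. CNML wastes less probability on the unseen symbol.
```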

Proceedings ArticleDOI
06 Jun 2007
TL;DR: It is shown that if h is not constant the problem is NP-hard and hard to approximate, and a pseudopolynomial-time algorithm is given for some rectilinear versions of the problem.
Abstract: We study the problem of finding shortest non-crossing thick paths in a polygonal domain, where a thick path is the Minkowski sum of a usual (zero-thickness, or thin) path and a disk. Given K pairs of terminals on the boundary of a simple n-gon, we compute in O(n + K) time a representation of the set of K shortest non-crossing thick paths joining the terminal pairs; using the representation, any particular path can be output in time proportional to its complexity. We compute K shortest thick non-crossing paths in a polygon with h holes in O((K + 1)^h h! poly(n, K)) time, using an efficient method to compute any one of the K thick paths if the "threadings" of all paths amidst the holes are specified. We show that if h is not constant, the problem is NP-hard; we also show hardness of approximation. We give a pseudopolynomial-time algorithm for some rectilinear versions of the problem. We apply our thick-path algorithms to obtain the first algorithmic results for the minimum-cost continuous flow problem, an extension of the standard discrete minimum-cost network flow problem to continuous domains. The results are based on showing a continuous analog of the network flow decomposition theorem.


Book ChapterDOI
09 Sep 2007
TL;DR: A statistical model is presented that learns a nonlinear representation from the data that reflects abstract, invariant properties of the signal without making assumptions about the kind of signal that can be processed.
Abstract: Capturing regularities in high-dimensional data is an important problem in machine learning and signal processing. Here we present a statistical model that learns a nonlinear representation from the data that reflects abstract, invariant properties of the signal without making assumptions about the kind of signal that can be processed. The model has a hierarchy of two layers, with the first layer broadly corresponding to Independent Component Analysis (ICA) and a second layer to represent higher-order structure. We estimate the model using the mathematical framework of Score Matching (SM), a novel method for the estimation of non-normalized statistical models. The model incorporates a squaring nonlinearity, which we propose to be suitable for forming a higher-order code of invariances. Additionally, the squaring can be viewed as modelling subspaces to capture residual dependencies, which linear models cannot capture.

Journal ArticleDOI
TL;DR: A generative mixture model is introduced, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases, and it is shown that 7% of the HERVs are active.
Abstract: Background: Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown.

Journal ArticleDOI
TL;DR: In the DYNAMOS project, a system platform and application prototype running on smartphones were designed and implemented to support a hybrid approach that enhances context-aware service provisioning with peer-to-peer social functionalities.

Proceedings Article
11 Mar 2007
TL;DR: A new search strategy is introduced in which the information retrieval (IR) query is inferred from eye movements measured while the user reads text during an IR task; relevance predictions for a large set of unseen documents are then ranked significantly better than by random guessing.
Abstract: We introduce a new search strategy, in which the information retrieval (IR) query is inferred from eye movements measured when the user is reading text during an IR task. In the training phase, we know the users' interest, that is, the relevance of training documents. We learn a predictor that produces a "query" given the eye movements; the target of learning is an "optimal" query that is computed based on the known relevance of the training documents. Assuming the predictor is universal with respect to the users' interests, it can also be applied to infer the implicit query when we have no prior knowledge of the users' interests. The result of an empirical study is that it is possible to learn the implicit query from a small set of read documents, such that relevance predictions for a large set of unseen documents are ranked significantly better than by random guessing.
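
Purely as an illustration of the training setup described (every name here is hypothetical, and the paper's actual predictor may differ): learn a linear map from eye-movement features of read documents to the "optimal" query computed from known relevance, then rank unseen documents by the inferred query.

```python
# Illustrative sketch only; the predictor, features, and targets are
# hypothetical stand-ins for the setup described in the abstract.
import numpy as np

def learn_query_predictor(E, Q, lam=1.0):
    """E: (m, f) eye-movement features per read document; Q: (m, t) target
    'optimal' query term weights derived from known relevance.
    Ridge regression gives W with E @ W approximating Q."""
    f = E.shape[1]
    return np.linalg.solve(E.T @ E + lam * np.eye(f), E.T @ Q)

def rank_documents(e_new, W, D):
    """Infer a query from new eye movements e_new: (f,), then rank the
    unseen documents D: (N, t) by cosine similarity."""
    q = e_new @ W
    scores = D @ q / (np.linalg.norm(D, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-scores)
```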

Proceedings ArticleDOI
02 Jul 2007
TL;DR: This work considers the solution of discounted optimal stopping problems using linear function approximation methods, proposes alternative algorithms based on projected value iteration ideas and least squares, and proves the convergence of some of these algorithms.
Abstract: We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
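
A sketch of the flavor of algorithm proposed: approximate the continuation value Q(s) by phi(s)^T r, and repeatedly apply the one-step Bellman operator for stopping followed by a least-squares projection onto the span of the features. This is a rendering of the general projected value iteration idea with least squares, not the authors' exact pseudocode:

```python
import numpy as np

def ls_projected_value_iteration(Phi, Phi_next, g_next, alpha, iters=100):
    """Phi: (T, d) features of visited states; Phi_next: (T, d) features of
    their successors; g_next: (T,) stopping rewards at the successors;
    alpha: discount factor in (0, 1). Returns weights r such that
    phi(s)^T r approximates the value of continuing."""
    A = np.linalg.pinv(Phi.T @ Phi) @ Phi.T    # least-squares projector
    r = np.zeros(Phi.shape[1])
    for _ in range(iters):
        # One-step Bellman operator for stopping: stop or continue at s'.
        targets = alpha * np.maximum(g_next, Phi_next @ r)
        r = A @ targets                        # project onto span of features
    return r
```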

Proceedings ArticleDOI
13 Jun 2007
TL;DR: In this article, the authors introduce the method of dramaturgical reading, which was originally a method of producing different crystallized and associative theatrical and graphical presentations of a role character in a drama context.
Abstract: In this paper we introduce the method of dramaturgical reading, which was originally a method of producing different crystallized and associative theatrical and graphical presentations of a role character in a drama context. We transfer dramaturgical reading into the field of user-centered design in order to understand, analyze and represent user-centered material. We compare a persona created with dramaturgical reading to a user profile and persona. We state that adapting a role character as an embodied and concrete user description in user-centered design improves the designers' ability to empathize and understand the users, thus improving the results of the design process. We believe personas must be enabled to "come to life" and allowed to develop in the minds of the designers using them. The dramaturgical method is one way of accomplishing this.

Proceedings ArticleDOI
27 Aug 2007
TL;DR: Performance measurements of HIP over WLAN on the Nokia 770 Internet Tablet are presented, a comprehensive analysis of the results is provided, and suggestions are made on the suitability of HIP for lightweight clients.
Abstract: The Host Identity Protocol (HIP) is being standardized by the IETF as a new solution for host mobility and multihoming in the Internet. HIP uses self-certifying public-private key pairs in combination with IPsec to authenticate hosts and protect user data. While there are three open-source HIP implementations, no experience is available with running HIP on lightweight hardware such as a PDA or a mobile phone. The limited computational power and battery lifetime of lightweight devices raise concerns about whether HIP can be used there at all. This paper presents performance measurements of HIP over WLAN on the Nokia 770 Internet Tablet. It also provides a comprehensive analysis of the results and makes suggestions on the suitability of HIP for lightweight clients.

Journal ArticleDOI
TL;DR: The aim is to create an understanding of categorization practices in design through a case study of the virtual community Habbo Hotel; a qualitative analysis highlighted not only the meaning of the "average user," but also the work that both the developer and the category contribute to this meaning.
Abstract: The "user" is an ambiguous concept in human-computer interaction and information systems. Analyses of users as social actors, participants, or configured users delineate approaches to studying design-use relationships. Here, a developer's reference to a figure of speech, termed the "average user," is contrasted with design guidelines. The aim is to create an understanding of categorization practices in design through a case study of the virtual community Habbo Hotel. A qualitative analysis highlighted not only the meaning of the "average user," but also the work that both the developer and the category contribute to this meaning. The average user a) represents the unknown, b) influences the boundaries of the target user groups, c) legitimizes the designer to disregard marginal user feedback, and d) keeps the design space open, thus allowing for creativity. The analysis shows how design and use are intertwined and highlights the developers' role in governing different users' interests.

Journal ArticleDOI
TL;DR: XML messaging can be improved by using an asynchronous programming style and a compact serialization format; the design and implementation of a messaging system that addresses these requirements is presented.

Book ChapterDOI
12 Mar 2007
TL;DR: In the experimental comparison of different algorithms, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm.
Abstract: Fast search algorithms for finding good instances of patterns given as position-specific scoring matrices are developed, and some empirical results on their performance on DNA sequences are reported. The algorithms basically generalize the Aho-Corasick, filtration, and superalphabet techniques of string matching to the scoring matrix search. As compared to the naive search, our algorithms can be faster by a factor which is proportional to the length of the pattern. In our experimental comparison of different algorithms, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho-Corasick technique is the fastest for short patterns and high significance thresholds of the search. For longer patterns the filtration method is better, while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.
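
As a reference point for the comparisons above, here is a sketch of the naive window scan augmented with the well-known lookahead bound: precomputed suffix maxima tell how much score the remaining columns can still contribute, so hopeless windows are abandoned early. The scores, threshold, and DNA alphabet are illustrative.

```python
# Naive PSSM scan with lookahead pruning (illustrative inputs).
def pssm_search(seq, pssm, threshold):
    """pssm: list of dicts, one per pattern column, mapping symbol -> score.
    Returns the start positions whose window scores reach the threshold."""
    m = len(pssm)
    # suffix_max[j] = best score still attainable from columns j..m-1
    suffix_max = [0.0] * (m + 1)
    for j in range(m - 1, -1, -1):
        suffix_max[j] = suffix_max[j + 1] + max(pssm[j].values())
    hits = []
    for i in range(len(seq) - m + 1):
        score = 0.0
        for j in range(m):
            score += pssm[j][seq[i + j]]
            if score + suffix_max[j + 1] < threshold:   # lookahead bound
                break
        else:
            if score >= threshold:
                hits.append(i)
    return hits
```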