Institution
Helsinki Institute for Information Technology
Facility • Espoo, Finland
About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.
Papers
23 Jun 2010
TL;DR: Surprisingly, when looking for the top-K best matches, all three approaches show similar behavior in terms of retrieval accuracy for small values of K, which suggests that for the specific application area, a computationally cheaper method, such as SPRING, is sufficient to retrieve the best top-K matches.
Abstract: We study the performance of three dynamic programming methods on music retrieval. The methods are designed for time series matching but can be directly applied to retrieval of music. Dynamic Time Warping (DTW) identifies an optimal alignment between two time series, and computes the matching cost corresponding to that alignment. Significant speed-ups can be achieved by constrained Dynamic Time Warping (cDTW), which narrows down the set of positions in one time series that can be matched with specific positions in the other time series. Both methods are designed for full sequence matching but can also be applied for subsequence matching, by using a sliding window over each database sequence to compute a matching score for each database subsequence. In addition, SPRING is a dynamic programming approach designed for subsequence matching, where the query is matched with a database subsequence without requiring the match length to be equal to the query length. SPRING has a lower computational cost than DTW and cDTW. Our database consists of a set of MIDI files taken from the web. Each MIDI file has been converted to a 2-dimensional time series, taking into account both note pitches and durations. We have used synthetic queries of fixed size and different noise levels. Surprisingly, when looking for the top-K best matches, all three approaches show similar behavior in terms of retrieval accuracy for small values of K. This suggests that for the specific application area, a computationally cheaper method, such as SPRING, is sufficient to retrieve the best top-K matches.
22 citations
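The DTW and cDTW recurrences described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `dtw_cost`, the Euclidean point distance, and the use of a Sakoe-Chiba band as the cDTW constraint are all assumptions made for the example. Each point is a tuple, so 2-dimensional (pitch, duration) series fit directly.

```python
import math

def dtw_cost(a, b, window=None):
    """Return the DTW matching cost between two sequences of equal-dimension tuples.

    window=None gives unconstrained DTW; an integer gives a band constraint
    (assumed here to be a Sakoe-Chiba band) of that half-width, as in cDTW.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide or no alignment reaches the end.
    window = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match
                                 cost[i][j - 1],      # b[j-1] repeats a match
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

query = [(0,), (0,), (0,), (5,)]
target = [(0,), (5,), (5,), (5,)]
print(dtw_cost(query, target))            # unconstrained warping
print(dtw_cost(query, target, window=0))  # band of width 0: point-by-point
```

Narrowing the band reduces the number of cells filled per row, which is where the speed-up of cDTW comes from, at the price of ruling out alignments that stray far from the diagonal.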
TL;DR: In this article, run-length encoded BWT (RLBWT) and compact directed acyclic word graph (CDAWG) data structures are proposed to provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern and the space taken by the structure.
Abstract: In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT enables also a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal.
22 citations
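To make the "number of BWT runs" measure from the abstract above concrete, here is a small sketch that builds a BWT by sorting rotations and then run-length encodes it. The helper names `bwt` and `run_length_encode` are hypothetical, the `$` sentinel is assumed not to occur in the input, and the quadratic rotation sort is for illustration only; practical indexes construct the BWT through suffix arrays or similar.

```python
from itertools import groupby

def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (illustrative, not efficient)."""
    s = text + sentinel  # the sentinel is assumed absent from text and sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def run_length_encode(s):
    """Collapse each maximal run of equal characters into a (char, length) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]

transformed = bwt("banana")
runs = run_length_encode(transformed)
print(transformed, runs, len(runs))
```

The space of an RLBWT is proportional to `len(runs)` rather than to the text length, which is why it suits highly repetitive collections.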
TL;DR: Underlying all results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P, which can be significantly less than 2^n.
Abstract: We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2^n, to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible for networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P, which can be significantly less than 2^n. Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders.
22 citations
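The claim that a partial order can have far fewer ideals than there are node subsets can be checked by brute force on small examples. The sketch below is only a counting illustration, not the dynamic programming algorithm from the paper; the names `count_ideals` and `bucket_order_pairs` are hypothetical, and a bucket order is assumed to mean that every node in one bucket precedes every node in the next.

```python
def count_ideals(n, precedes):
    """Count ideals (down-closed subsets) of a partial order on nodes 0..n-1.

    precedes is a set of (u, v) pairs meaning u must come before v. Any set
    of pairs that generates the order suffices; transitive closure is not needed.
    """
    count = 0
    for mask in range(1 << n):
        # A subset is an ideal if containing v always implies containing u.
        if all(not (mask >> v) & 1 or (mask >> u) & 1 for u, v in precedes):
            count += 1
    return count

def bucket_order_pairs(sizes):
    """Generating pairs for a bucket order: each bucket precedes the next one."""
    pairs, start = set(), 0
    for size, next_size in zip(sizes, sizes[1:]):
        for u in range(start, start + size):
            for v in range(start + size, start + size + next_size):
                pairs.add((u, v))
        start += size
    return pairs

print(count_ideals(6, set()))                          # no constraints
print(count_ideals(6, bucket_order_pairs([2, 2, 2])))  # three buckets of two
```

With no constraints every subset is an ideal, so the count is 2^n; with buckets of sizes b_1, ..., b_k the count drops to 1 + (2^b_1 - 1) + ... + (2^b_k - 1).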
TL;DR: The results of two experiments show that physiological user signals are associated with relevance and affect and help design personalized systems that can annotate digital content using human physiology without the need for any explicit user interaction.
Abstract: We present physiological text annotation, which refers to the practice of associating physiological responses to text content in order to infer characteristics of the user information needs and affective responses. Text annotation is a laborious task, and implicit feedback has been studied as a way to collect annotations without requiring any explicit action from the user. Previous work has explored behavioral signals, such as clicks or dwell time to automatically infer annotations, and physiological signals have mostly been explored for image or video content. We report on two experiments in which physiological text annotation is studied first to (1) indicate perceived relevance and then to (2) indicate affective responses of the users. The first experiment tackles the user's perception of relevance of an information item, which is fundamental towards revealing the user's information needs. The second experiment is then aimed at revealing the user's affective responses towards a -relevant- text document. Results show that physiological user signals are associated with relevance and affect. In particular, electrodermal activity was found to be different when users read relevant content than when they read irrelevant content and was found to be lower when reading texts with negative emotional content than when reading texts with neutral content. Together, the experiments show that physiological text annotation can provide valuable implicit inputs for personalized systems. We discuss how our findings help design personalized systems that can annotate digital content using human physiology without the need for any explicit user interaction.
22 citations
01 Jan 2005
TL;DR: The conclusion is that representing content and their linking in terms of semantic web ontologies and logic rules is flexible from the system construction viewpoint, can be used to provide the end-user with useful “semantic” services, and can reduce human effort in portal maintenance.
Abstract: This paper presents a case study on how semantic search and browsing techniques can be applied to solving the problems of content discovery, aggregation, and linking in e-government portals. At the same time, adaptability of a semantic portal tool, OntoViews, based on the multi-facet search paradigm, to different kinds of content was tested. Our conclusion is that representing content and their linking in terms of semantic web ontologies and logic rules is flexible from the system construction viewpoint, can be used to provide the end-user with useful “semantic” services, and can reduce human effort in portal maintenance.
22 citations
Authors
Name | H-index | Papers | Citations |
---|---|---|---|
Dimitri P. Bertsekas | 94 | 332 | 85939 |
Olli Kallioniemi | 90 | 353 | 42021 |
Heikki Mannila | 72 | 295 | 26500 |
Jukka Corander | 66 | 411 | 17220 |
Jaakko Kangasjärvi | 62 | 146 | 17096 |
Aapo Hyvärinen | 61 | 301 | 44146 |
Samuel Kaski | 58 | 522 | 14180 |
Nadarajah Asokan | 58 | 327 | 11947 |
Aristides Gionis | 58 | 292 | 19300 |
Hannu Toivonen | 56 | 192 | 19316 |
Nicola Zamboni | 53 | 128 | 11397 |
Jorma Rissanen | 52 | 151 | 22720 |
Tero Aittokallio | 52 | 271 | 8689 |
Juha Veijola | 52 | 261 | 19588 |
Juho Hamari | 51 | 176 | 16631 |