Institution

Helsinki Institute for Information Technology

Facility•Espoo, Finland

About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions on the topics Population and Bayesian network. The organization has 630 authors who have published 1,962 publications receiving 63,426 citations.

Topics: Population, Bayesian network, Mobile computing, The Internet, Approximation algorithm

Papers

Proceedings Article

Benchmarking dynamic time warping for music retrieval

Jefrey Lijffijt¹, Panagiotis Papapetrou¹, Jaakko Hollmén¹, Vassilis Athitsos²•Institutions (2)

Helsinki Institute for Information Technology¹, University of Texas at Arlington²

23 Jun 2010

TL;DR: Surprisingly, when looking for the top-K best matches, all three approaches show similar behavior in terms of retrieval accuracy for small values of K, which suggests that for the specific application area, a computationally cheaper method, such as SPRING, is sufficient to retrieve the best top-K matches.

Abstract: We study the performance of three dynamic programming methods on music retrieval. The methods are designed for time series matching but can be directly applied to retrieval of music. Dynamic Time Warping (DTW) identifies an optimal alignment between two time series, and computes the matching cost corresponding to that alignment. Significant speed-ups can be achieved by constrained Dynamic Time Warping (cDTW), which narrows down the set of positions in one time series that can be matched with specific positions in the other time series. Both methods are designed for full sequence matching but can also be applied for subsequence matching, by using a sliding window over each database sequence to compute a matching score for each database subsequence. In addition, SPRING is a dynamic programming approach designed for subsequence matching, where the query is matched with a database subsequence without requiring the match length to be equal to the query length. SPRING has a lower computational cost than DTW and cDTW. Our database consists of a set of MIDI files taken from the web. Each MIDI file has been converted to a 2-dimensional time series, taking into account both note pitches and durations. We have used synthetic queries of fixed size and different noise levels. Surprisingly, when looking for the top-K best matches, all three approaches show similar behavior in terms of retrieval accuracy for small values of K. This suggests that for the specific application area, a computationally cheaper method, such as SPRING, is sufficient to retrieve the best top-K matches.
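
The dynamic programming recurrence behind DTW, and the band constraint used by cDTW, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes 1-dimensional numeric sequences and an absolute-difference local cost (the paper uses 2-dimensional pitch and duration series), and the function name `dtw_cost` and the `window` parameter are hypothetical.

```python
import math


def dtw_cost(a, b, window=None):
    """Return the cost of the optimal DTW alignment between sequences a and b.

    window=None gives unconstrained DTW. An integer window only allows cell
    (i, j) when |i - j| <= window, a band constraint in the spirit of cDTW.
    Assumption: elements are numbers and the local cost is |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    if window is not None:
        # The band must be wide enough to reach the final cell (n, m).
        window = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = 1, m
        if window is not None:
            lo, hi = max(1, i - window), min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


query = [0, 1, 1, 1]
candidate = [0, 0, 0, 1]
unconstrained = dtw_cost(query, candidate)          # free warping
diagonal_only = dtw_cost(query, candidate, window=0)  # no warping allowed
```

Narrowing `window` reduces the number of cells that are filled in, which is where the speed-up of the constrained variant comes from; the price is that the constrained cost can only be equal to or higher than the unconstrained one.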

22 citations

Posted Content

Composite repetition-aware data structures

Djamal Belazzougui¹,², Fabio Cunial¹,², Travis Gagie¹,², Nicola Prezza³, Mathieu Raffinot⁴•Institutions (4)

Helsinki Institute for Information Technology¹, University of Helsinki², University of Udine³, Paris Diderot University⁴

20 Feb 2015-arXiv: Data Structures and Algorithms

TL;DR: This article describes two data structures that combine the run-length encoded BWT (RLBWT) with data structures from LZ77 indexes and with the compact directed acyclic word graph (CDAWG), providing competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern and the space taken by the structure.

Abstract: In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT enables also a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal.
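
To make the "number of BWT runs" measure concrete, here is a small sketch that builds a Burrows-Wheeler transform naively and run-length encodes it. It is only an illustration of the measure, not the index described in the paper: the function names are hypothetical, the sorted-rotations construction is far too slow for real genomes, and it assumes the sentinel `$` does not occur in the text and sorts before every other character.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse each maximal run of equal characters into (char, length)."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


runs_banana = run_length_encode(bwt("banana"))
runs_repetitive = run_length_encode(bwt("ab" * 50))
```

For the highly repetitive input `"ab" * 50` the transform has 101 characters but only three runs, which is the kind of gap between text length and number of runs that a run-length encoded BWT exploits.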

22 citations

Journal Article

Finding optimal Bayesian networks using precedence constraints

Pekka Parviainen¹, Mikko Koivisto²•Institutions (2)

Royal Institute of Technology¹, Helsinki Institute for Information Technology²

01 Jan 2013-Journal of Machine Learning Research

TL;DR: Underlying all results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P, which can be significantly less than 2^n.

Abstract: We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2^n, to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible for networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P, which can be significantly less than 2^n. Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders.
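
The gap between the number of ideals of a partial order and the number of all node subsets can be checked by brute force on a small example. The sketch below is only an illustration of that counting argument, not the algorithm from the paper: the helper names are hypothetical, nodes are assumed to be numbered 0 to n-1, and a pair `(u, v)` is assumed to mean that u precedes v.

```python
def count_ideals(n, precedes):
    """Count downward-closed subsets (ideals) of nodes 0..n-1.

    A subset is an ideal if, for every pair (u, v) with u preceding v,
    containing v implies containing u. Brute force over all 2^n subsets,
    so this is only meant for small n.
    """
    count = 0
    for mask in range(1 << n):
        if all((mask >> u) & 1 or not (mask >> v) & 1 for u, v in precedes):
            count += 1
    return count


def bucket_order_pairs(bucket_sizes):
    """Precedence pairs of a bucket order: each node precedes every node
    of the next bucket (pairs across later buckets follow by transitivity)."""
    pairs, start = [], 0
    for size, next_size in zip(bucket_sizes, bucket_sizes[1:]):
        current = range(start, start + size)
        following = range(start + size, start + size + next_size)
        pairs.extend((u, v) for u in current for v in following)
        start += size
    return pairs


no_constraints = count_ideals(6, [])                          # all subsets
two_buckets = count_ideals(6, bucket_order_pairs([3, 3]))     # two buckets of 3
total_order = count_ideals(6, bucket_order_pairs([1] * 6))    # a chain
```

With six nodes, the unconstrained case has 64 subsets, the two-bucket order has 15 ideals, and the total order has 7, so a dynamic program indexed by ideals touches far fewer states as the order becomes more restrictive.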

22 citations

Journal Article

Extracting relevance and affect information from physiological text annotation

Oswald Barral¹, Ilkka Kosunen¹, Tuukka Ruotsalo¹, Michiel M. Spapé¹, Manuel J. A. Eugster¹, Niklas Ravaja¹, Samuel Kaski¹, Giulio Jacucci¹•Institutions (1)

Helsinki Institute for Information Technology¹

01 Dec 2016-User Modeling and User-adapted Interaction

TL;DR: The results of two experiments show that physiological user signals are associated with relevance and affect; the findings can help design personalized systems that annotate digital content using human physiology without the need for any explicit user interaction.

Abstract: We present physiological text annotation, which refers to the practice of associating physiological responses to text content in order to infer characteristics of the user information needs and affective responses. Text annotation is a laborious task, and implicit feedback has been studied as a way to collect annotations without requiring any explicit action from the user. Previous work has explored behavioral signals, such as clicks or dwell time to automatically infer annotations, and physiological signals have mostly been explored for image or video content. We report on two experiments in which physiological text annotation is studied first to (1) indicate perceived relevance and then to (2) indicate affective responses of the users. The first experiment tackles the user's perception of relevance of an information item, which is fundamental towards revealing the user's information needs. The second experiment is then aimed at revealing the user's affective responses towards a -relevant- text document. Results show that physiological user signals are associated with relevance and affect. In particular, electrodermal activity was found to be different when users read relevant content than when they read irrelevant content and was found to be lower when reading texts with negative emotional content than when reading texts with neutral content. Together, the experiments show that physiological text annotation can provide valuable implicit inputs for personalized systems. We discuss how our findings help design personalized systems that can annotate digital content using human physiology without the need for any explicit user interaction.

22 citations

Semantic E-government Portals - A Case Study

Teemu Sidoroff¹, Eero Hyvönen•Institutions (1)

Helsinki Institute for Information Technology¹

01 Jan 2005

TL;DR: The conclusion is that representing content and their linking in terms of semantic web ontologies and logic rules is flexible from the system construction viewpoint, can be used to provide the end-user with useful “semantic” services, and can reduce human effort in portal maintenance.

Abstract: This paper presents a case study on how semantic search and browsing techniques can be applied to solving the problems of content discovery, aggregation, and linking in e-government portals. At the same time, adaptability of a semantic portal tool, OntoViews, based on the multi-facet search paradigm, to different kinds of content was tested. Our conclusion is that representing content and their linking in terms of semantic web ontologies and logic rules is flexible from the system construction viewpoint, can be used to provide the end-user with useful “semantic” services, and can reduce human effort in portal maintenance.

22 citations

Authors

Showing all 632 results

Name	H-index	Papers	Citations
Dimitri P. Bertsekas	94	332	85939
Olli Kallioniemi	90	353	42021
Heikki Mannila	72	295	26500
Jukka Corander	66	411	17220
Jaakko Kangasjärvi	62	146	17096
Aapo Hyvärinen	61	301	44146
Samuel Kaski	58	522	14180
Nadarajah Asokan	58	327	11947
Aristides Gionis	58	292	19300
Hannu Toivonen	56	192	19316
Nicola Zamboni	53	128	11397
Jorma Rissanen	52	151	22720
Tero Aittokallio	52	271	8689
Juha Veijola	52	261	19588
Juho Hamari	51	176	16631

Network Information

Related Institutions (5)

Google: 39.8K papers, 2.1M citations, 93% related

Microsoft: 86.9K papers, 4.1M citations

38.6K papers, 1.3M citations, 92% related

Carnegie Mellon University: 104.3K papers, 5.9M citations, 91% related

Facebook: 10.9K papers, 570.1K citations, 91% related

Performance

Metrics

Papers: 1,967

Citations: 76,126

No. of papers from the Institution in previous years
Year	Papers
2023	1
2022	4
2021	85
2020	97
2019	140
2018	127