Institution

Helsinki Institute for Information Technology

Facility • Espoo, Finland

About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.

Topics: Population, Bayesian network, Mobile computing, The Internet, Approximation algorithm

Papers

Posted Content•

Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs

Ferdinando Cicalese¹, Travis Gagie², Emanuele Giaquinta³, Eduardo Sany Laber⁴, Zsuzsanna Lipták⁵, Romeo Rizzi⁵, Alexandru I. Tomescu²•Institutions (5)

University of Salerno¹, Helsinki Institute for Information Technology², University of Helsinki³, Pontifical Catholic University of Rio de Janeiro⁴, University of Verona⁵

19 Apr 2013-arXiv: Data Structures and Algorithms

TL;DR: The authors consider how to index strings, trees and graphs for jumbled pattern matching when a match must be returned if one exists. They show that, given a tree containing two colours, one can build a quadratic-space index that finds a match in time proportional to the size of the match.

Abstract: We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.
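To make the problem concrete: in jumbled pattern matching on a string, a match is a substring that is a permutation of the pattern (the same characters with the same multiplicities, in any order). The sketch below is not the index from the paper. It is a plain sliding-window scan with no precomputed index, shown only to illustrate what "return a match if one exists" means for strings; the function name `find_jumbled_match` is hypothetical.

```python
from collections import Counter


def find_jumbled_match(text, pattern):
    """Return the start index of the first substring of `text` that is a
    permutation of `pattern`, or -1 if no such substring exists.

    Illustrative baseline only: a linear scan over `text`, not an index.
    """
    m = len(pattern)
    if m == 0:
        return 0  # the empty pattern matches trivially at position 0
    if m > len(text):
        return -1

    need = Counter(pattern)      # character counts the window must equal
    window = Counter(text[:m])   # counts for the current length-m window
    if window == need:
        return 0

    for i in range(m, len(text)):
        # Slide the window one step right: add text[i], drop text[i - m].
        window[text[i]] += 1
        old = text[i - m]
        window[old] -= 1
        if window[old] == 0:
            del window[old]      # keep counters comparable without zero entries
        if window == need:
            return i - m + 1
    return -1
```

For example, `find_jumbled_match("aabab", "bba")` finds the substring `"bab"` starting at index 2. An index of the kind the abstract describes would answer such queries without rescanning the text each time.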

20 citations

Journal Article•DOI•

Archetypal Analysis for Nominal Observations

Sohan Seth¹, Manuel J. A. Eugster¹•Institutions (1)

Helsinki Institute for Information Technology¹

01 May 2016-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This article views archetypal analysis in a generative framework, which allows explicit control over the number of archetypes through appropriate prior information and yields efficient update rules via variational Bayes.

Abstract: Archetypal analysis is a popular exploratory tool that explains a set of observations as compositions of a few ‘pure’ patterns. The standard formulation of archetypal analysis addresses this problem for real-valued observations by finding the approximate convex hull. Recently, a probabilistic formulation has been suggested which extends this framework to other observation types such as binary and count. In this article we further extend this framework to address the general case of nominal observations which includes, for example, multiple-option questionnaires. We view archetypal analysis in a generative framework: this allows explicit control over choosing a suitable number of archetypes by assigning appropriate prior information, and finding efficient update rules using variational Bayes. We demonstrate the efficacy of this approach extensively on simulated data and on three real-world examples: the Austrian guest survey dataset, the German credit dataset, and the SUN attribute image dataset.

20 citations

Journal Article•DOI•

To What Extent We Repeat Ourselves? Discovering Daily Activity Patterns Across Mobile App Usage

Tong Li¹, Yong Li², Mohammad A. Hoque³, Tong Xia², Sasu Tarkoma⁴, Pan Hui³•Institutions (4)

Hong Kong University of Science and Technology¹, Tsinghua University², University of Helsinki³, Helsinki Institute for Information Technology⁴

04 Sep 2020-IEEE Transactions on Mobile Computing

TL;DR: A framework to discover daily cyber activity patterns across people's mobile app usage is proposed, which shows that people usually follow yesterday's activity patterns, but the patterns tend to deviate as the time-lapse increases.

Abstract: With the prevalence of smartphones, people have left abundant behavior records in cyberspace. Discovering and understanding individuals' cyber activities can provide useful implications for policymakers, service providers, and app developers. In this paper, we propose a framework to discover daily cyber activity patterns across people's mobile app usage. We first segment app usage traces into small time windows and then design a probabilistic topic model to infer users' cyber activities of each window. By exploring the coherence of users' activity sequences, the daily patterns of individuals are identified. Next, we recognize the common patterns across diverse groups of individuals using a hierarchical clustering algorithm. We then apply our framework on a large-scale and real-world dataset, consisting of 653,092 users with 971,818,946 usage records of 2,000 popular mobile apps. Our analysis shows that people usually obey yesterday's activity patterns, but the patterns tend to deviate as the time-lapse increases. We also discover five common daily cyber activity patterns, including afternoon reading, nightly entertainment, pervasive socializing, commuting, and nightly socializing. Our findings have profound implications on identifying the demographics of users and their lifestyles, habits, service requirements, and further detecting other disrupting trends such as working overtime and addiction to the game and social media.

20 citations

Journal Article•DOI•

Error correcting optical mapping data.

Kingshuk Mukherjee¹, Darshan Washimkar², Martin D. Muggli², Leena Salmela³, Christina Boucher¹•Institutions (3)

University of Florida¹, Colorado State University², Helsinki Institute for Information Technology³

01 Jun 2018-GigaScience

TL;DR: The authors propose cOMet, a method for error-correcting raw optical mapping data (Rmaps). On Rmap data generated from the Escherichia coli K-12 reference genome, it achieves high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors.

Abstract: Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome. Recently it has been used for scaffolding contigs and for assembly validation for large-scale sequencing projects, including the maize, goat, and Amborella genomes. However, a major impediment in the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data are numerical and susceptible to inaccuracy. We develop cOMet to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMet has high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the Escherichia coli K-12 reference genome. Out of the deletion errors corrected, 98.26% are true errors. Similarly, out of the insertion errors corrected, 82.19% are true errors. It also successfully scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Last, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data. Error corrected Rmap data results in an assembly that is more contiguous and covers a larger fraction of the genome.

20 citations

Proceedings Article•DOI•

Solving Analogies on Words based on Minimal Complexity Transformation

Pierre-Alexandre Murena¹,², Marie Al-Ghossein¹, Jean-Louis Dessalles¹, Antoine Cornuéjols³•Institutions (3)

Télécom ParisTech¹, Helsinki Institute for Information Technology², Agro ParisTech³

09 Jul 2020

TL;DR: A rough estimation of complexity for word analogies and an algorithm to find the optimal transformations of minimal complexity are proposed, and the method is compared with state-of-the-art approaches to demonstrate the interest of using complexity to solve analogies on words.

Abstract: Analogies are 4-ary relations of the form “A is to B as C is to D”. When A, B and C are fixed, we call analogical equation the problem of finding the correct D. A direct applicative domain is Natural Language Processing, in which it has been shown successful on word inflections, such as conjugation or declension. If most approaches rely on the axioms of proportional analogy to solve these equations, these axioms are known to have limitations, in particular in the nature of the considered flections. In this paper, we propose an alternative approach, based on the assumption that optimal word inflections are transformations of minimal complexity. We propose a rough estimation of complexity for word analogies and an algorithm to find the optimal transformations. We illustrate our method on a large-scale benchmark dataset and compare with state-of-the-art approaches to demonstrate the interest of using complexity to solve analogies on words.

20 citations

Authors

Showing all 632 results

Name	H-index	Papers	Citations
Dimitri P. Bertsekas	94	332	85939
Olli Kallioniemi	90	353	42021
Heikki Mannila	72	295	26500
Jukka Corander	66	411	17220
Jaakko Kangasjärvi	62	146	17096
Aapo Hyvärinen	61	301	44146
Samuel Kaski	58	522	14180
Nadarajah Asokan	58	327	11947
Aristides Gionis	58	292	19300
Hannu Toivonen	56	192	19316
Nicola Zamboni	53	128	11397
Jorma Rissanen	52	151	22720
Tero Aittokallio	52	271	8689
Juha Veijola	52	261	19588
Juho Hamari	51	176	16631

Network Information

Related Institutions (5)

Google

39.8K papers, 2.1M citations

93% related

Microsoft

86.9K papers, 4.1M citations

38.6K papers, 1.3M citations

92% related

Carnegie Mellon University

104.3K papers, 5.9M citations

91% related

Facebook

10.9K papers, 570.1K citations

91% related

Performance

Metrics

Papers: 1,967

Citations: 76,126

No. of papers from the Institution in previous years
Year	Papers
2023	1
2022	4
2021	85
2020	97
2019	140
2018	127