
Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a research facility based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1,962 publications receiving 63,426 citations.


Papers
Journal ArticleDOI
TL;DR: This article answers the question of which strings can be safely reported from a genome graph G as contigs, using a model in which the genome is a circular covering walk, and gives a polynomial-time algorithm for finding such strings, which are called omnitigs.
Abstract: Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs—a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question remains: given a genome graph G (e.g., a de Bruijn, or a string graph), what are all the strings that can be safely reported from G as contigs? In this article, we answer this question using a model in which the genome is a circular covering walk. We also give a polynomial-time algorithm to find such strings, which we call omnitigs. Our experiments show that omnitigs are 66%–82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.
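As context for the comparison above, unitigs (the baseline that omnitigs lengthen) are maximal non-branching paths in the de Bruijn graph. Below is a minimal sketch of unitig extraction; the graph construction, k-mer size, and function names are illustrative assumptions, and this is not the paper's omnitig algorithm, which solves the more general safe-string problem.

```python
# Minimal sketch of unitig extraction, the baseline that omnitigs extend.
# NOT the paper's omnitig algorithm; representation and names are assumed.
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of its successor (k-1)-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

def unitigs(graph):
    """Report maximal non-branching paths, starting at junction nodes."""
    indeg = defaultdict(int)
    for u in list(graph):
        for v in graph[u]:
            indeg[v] += 1

    def junction(n):
        return len(graph[n]) != 1 or indeg[n] != 1

    result = []
    for u in list(graph):
        if not junction(u):
            continue
        for v in list(graph[u]):
            path = u + v[-1]
            while not junction(v):      # extend while the path is non-branching
                v = graph[v][0]
                path += v[-1]
            result.append(path)
    # dedupe parallel-edge repeats; isolated junction-free cycles are skipped
    return sorted(set(result))

print(unitigs(de_bruijn(["ACGTACGA", "GTACGAT"], 4)))
```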

27 citations

Journal ArticleDOI
TL;DR: This article introduces a framework for structured P2P overlay networks consisting of conceptual models of network hierarchy, multi-layer hierarchical DHT architectures, principles affecting the design choices, and cost models for system tradeoff analysis, performance evaluation, and scalability estimation.
Abstract: Distributed Hash Tables (DHTs) are presently used in several large-scale systems on the Internet and are envisaged as a key mechanism to provide identifier-locator separation for mobile hosts in the Future Internet. Such P2P-based systems have become increasingly complex, serving popular social networking and resource-sharing applications as well as Internet-scale infrastructures. Hierarchy is a standard mechanism for coping with heterogeneity and scalability in distributed systems. To address the shortcomings of flat DHT designs, many hierarchical P2P designs have been proposed in recent years. The latest generation is hierarchical DHTs (HDHTs), in which nodes are organized into layers and groups. This article discusses hierarchical architectures applied in structured P2P overlay networks, focusing on HDHT designs. We introduce a framework consisting of conceptual models of network hierarchy, multi-layer hierarchical DHT architectures, principles affecting the design choices, and cost models for system tradeoff analysis, performance evaluation, and scalability estimation. Based on the framework, we provide a taxonomy and survey more than 20 HDHT proposals.
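To make the layering idea concrete, here is a minimal sketch of the two-level lookup common to many HDHT designs (not any specific surveyed proposal): a top layer resolves a key to a group, and the group's own flat DHT resolves it to a node. The class names and the consistent-hashing choice are assumptions.

```python
# Illustrative two-layer DHT lookup; design details are assumptions.
import hashlib
from bisect import bisect

def h(s):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    """Flat consistent-hashing ring: key -> responsible node."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def lookup(self, key):
        i = bisect(self.ring, (h(key), "")) % len(self.ring)
        return self.ring[i][1]

class HierarchicalDHT:
    """Layer 1 maps a key to a group; layer 2 is the group's own ring."""
    def __init__(self, groups):  # groups: {group_name: [node, ...]}
        self.top = Ring(groups)
        self.inner = {g: Ring(members) for g, members in groups.items()}

    def lookup(self, key):
        group = self.top.lookup(key)          # inter-group routing step
        node = self.inner[group].lookup(key)  # intra-group routing step
        return group, node

dht = HierarchicalDHT({"eu": ["node-a", "node-b"], "us": ["node-c", "node-d"]})
print(dht.lookup("some-key"))
```

Cost models of the kind the framework describes then split lookup cost into an inter-group term and an intra-group term, each scaling with the size of its own layer rather than with the whole network.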

27 citations

Proceedings Article
31 Mar 2018
TL;DR: An information-theoretic criterion for Bayesian network structure learning called quotient normalized maximum likelihood (qNML), which satisfies the property of score equivalence, is decomposable, and is completely free of adjustable hyperparameters.
Abstract: We introduce an information theoretic criterion for Bayesian network structure learning which we call quotient normalized maximum likelihood (qNML). In contrast to the closely related factorized normalized maximum likelihood criterion, qNML satisfies the property of score equivalence. It is also decomposable and completely free of adjustable hyperparameters. For practical computations, we identify a remarkably accurate approximation proposed earlier by Szpankowski and Weinberger. Experiments on both simulated and real data demonstrate that the new criterion leads to parsimonious models with good predictive accuracy.
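As a reading of the abstract (symbols are assumed, not quoted from the paper), the criterion can be sketched as a quotient of two "single multinomial" NML distributions per family, obtained by treating the relevant data columns as one categorical variable:

```latex
% Hedged sketch of the qNML score; notation assumed, not quoted from the paper.
\[
  s^{\mathrm{qNML}}(G; D) \;=\; \sum_{i=1}^{n}
    \log \frac{P^{1}_{\mathrm{NML}}\bigl(D_{\{X_i\} \cup \mathrm{Pa}_i}\bigr)}
              {P^{1}_{\mathrm{NML}}\bigl(D_{\mathrm{Pa}_i}\bigr)},
  \qquad
  P^{1}_{\mathrm{NML}}(D) \;=\;
    \frac{P\bigl(D \mid \hat{\theta}(D)\bigr)}
         {\sum_{D'} P\bigl(D' \mid \hat{\theta}(D')\bigr)}.
\]
```

The normalizing sum in the denominator (the parametric complexity of a single multinomial) is the quantity the Szpankowski-Weinberger approximation mentioned in the abstract estimates.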

27 citations

Posted Content
TL;DR: In this paper, the authors extend the EPCA model toolbox by presenting the first exponential family multi-view learning methods for partial least squares and canonical correlation analysis, based on a unified representation of EPCA as a matrix factorization of the natural parameters of the exponential family.
Abstract: Exponential family extensions of principal component analysis (EPCA) have received a considerable amount of attention in recent years, demonstrating the growing need for basic modeling tools that do not assume the squared loss or Gaussian distribution. We extend the EPCA model toolbox by presenting the first exponential family multi-view learning methods for partial least squares and canonical correlation analysis, based on a unified representation of EPCA as a matrix factorization of the natural parameters of the exponential family. The models are based on a new family of priors that are generally usable for all such factorizations. We also introduce new inference strategies, and demonstrate how the methods outperform earlier ones when the Gaussianity assumption does not hold.
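The shared building block is standard EPCA (in the style of Collins et al.): each observation gets an exponential-family likelihood whose natural parameters form a low-rank matrix. A minimal sketch, with symbols assumed:

```latex
% Minimal sketch of EPCA as natural-parameter factorization; symbols assumed.
\[
  \log p(X \mid \Theta) \;=\; \sum_{i,j}
    \bigl( x_{ij}\,\theta_{ij} - F(\theta_{ij}) \bigr),
  \qquad
  \Theta \;=\; U V^{\top},
\]
```

where F is the log-partition function of the chosen family; the Gaussian case F(θ) = θ²/2 recovers the squared loss of ordinary PCA, and, as the abstract indicates, the multi-view extensions couple views through factorizations of their natural-parameter matrices.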

27 citations

Journal ArticleDOI
TL;DR: The 3LM algorithm did not show over-fitting, while consistently outperforming centroid-based classifiers, Naive Bayes, C4.5, AdaBoost, kNN, and SVM, whose accuracies had been reported on the same three corpora.
Abstract: We present a Life-Long Learning from Mistakes (3LM) algorithm for document classification, which can be used in various scenarios such as spam filtering, blog classification, and web resource categorization. We extend the ideas of online clustering and batch-mode centroid-based classification to online learning with negative feedback. 3LM is a competitive learning algorithm that avoids the over-smoothing characteristic of centroid-based classifiers by using a different class representative, which we call a clusterhead. The clusterheads competing for vector-space dominance are drawn toward misclassified documents, eventually bringing the model to a "balanced state" for a fixed distribution of documents. Subsequently, the clusterheads oscillate between the misclassified documents, heuristically minimizing the rate of misclassifications, an NP-complete problem. Further, the 3LM algorithm prevents over-fitting by "leashing" the clusterheads to their respective centroids. A clusterhead provably converges if its class can be separated from all other classes by a hyperplane. Lifelong learning with a fixed learning rate allows 3LM to adapt to a possibly changing distribution of the data and to continually learn and unlearn document classes. We report on our experiments, which demonstrate high accuracy of document classification on the Reuters-21578, OHSUMED, and TREC07p-spam datasets. The 3LM algorithm did not show over-fitting, while consistently outperforming centroid-based classifiers, Naive Bayes, C4.5, AdaBoost, kNN, and SVM, whose accuracies had been reported on the same three corpora.
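A hedged sketch of the update dynamics the abstract describes; the dot-product similarity, learning rate, and leash strength below are assumptions, not the paper's exact rule:

```python
# Sketch of a 3LM-style update on one document; parameters are assumed.
import numpy as np

def on_document(clusterheads, centroids, x, true_class, lr=0.1, leash=0.05):
    """clusterheads, centroids: dict class -> vector; x: document vector."""
    predicted = max(clusterheads, key=lambda c: clusterheads[c] @ x)
    if predicted != true_class:
        head = clusterheads[true_class]
        head += lr * (x - head)                         # drawn toward the mistake
        head += leash * (centroids[true_class] - head)  # leashed to its centroid
    return predicted

heads = {"spam": np.array([1.0, 0.2]), "ham": np.array([0.1, 1.0])}
cents = {k: v.copy() for k, v in heads.items()}
print(on_document(heads, cents, np.array([0.9, 0.8]), "ham"))
```

The leash term caps how far a clusterhead can drift from its class centroid, which is the abstract's stated mechanism against over-fitting.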

27 citations


Authors


Name | H-index | Papers | Citations
Dimitri P. Bertsekas | 94 | 332 | 85,939
Olli Kallioniemi | 90 | 353 | 42,021
Heikki Mannila | 72 | 295 | 26,500
Jukka Corander | 66 | 411 | 17,220
Jaakko Kangasjärvi | 62 | 146 | 17,096
Aapo Hyvärinen | 61 | 301 | 44,146
Samuel Kaski | 58 | 522 | 14,180
Nadarajah Asokan | 58 | 327 | 11,947
Aristides Gionis | 58 | 292 | 19,300
Hannu Toivonen | 56 | 192 | 19,316
Nicola Zamboni | 53 | 128 | 11,397
Jorma Rissanen | 52 | 151 | 22,720
Tero Aittokallio | 52 | 271 | 8,689
Juha Veijola | 52 | 261 | 19,588
Juho Hamari | 51 | 176 | 16,631
Network Information
Related Institutions (5)
Google | 39.8K papers, 2.1M citations | 93% related
Microsoft | 86.9K papers, 4.1M citations | 93% related
Carnegie Mellon University | 104.3K papers, 5.9M citations | 91% related
Facebook | 10.9K papers, 570.1K citations | 91% related

Performance Metrics

No. of papers from the Institution in previous years:

Year | Papers
2023 | 1
2022 | 4
2021 | 85
2020 | 97
2019 | 140
2018 | 127