Institution

Helsinki Institute for Information Technology

Facility • Espoo, Finland

About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.

Topics: Population, Bayesian network, The Internet, Mobile computing, Cluster analysis ...

Papers

Journal Article

Top-k overlapping densest subgraphs

Esther Galbrun¹, Aristides Gionis², Nikolaj Tatti²

French Institute for Research in Computer Science and Automation¹, Helsinki Institute for Information Technology²

01 Sep 2016 - Data Mining and Knowledge Discovery

TL;DR: This paper reformulates the problem of finding top-k overlapping densest subgraphs so that it admits an algorithm with a constant-factor approximation guarantee, and presents a new approach that improves over the existing techniques, both in theory and in practice.

Abstract: Finding dense subgraphs is an important problem in graph mining and has many practical applications. At the same time, while large real-world networks are known to have many communities that are not well-separated, the majority of the existing work focuses on the problem of finding a single densest subgraph. Hence, it is natural to consider the question of finding the top-k densest subgraphs. One major challenge in addressing this question is how to handle overlaps: eliminating overlaps completely is one option, but this may lead to extracting subgraphs not as dense as it would be possible by allowing a limited amount of overlap. Furthermore, overlaps are desirable as in most real-world graphs there are vertices that belong to more than one community, and thus, to more than one densest subgraph. In this paper we study the problem of finding top-k overlapping densest subgraphs, and we present a new approach that improves over the existing techniques, both in theory and practice. First, we reformulate the problem definition in a way that we are able to obtain an algorithm with constant-factor approximation guarantee. Our approach relies on using techniques for solving the max-sum diversification problem, which however, we need to extend in order to make them applicable to our setting. Second, we evaluate our algorithm on a collection of benchmark datasets and show that it convincingly outperforms the previous methods, both in terms of quality and efficiency.

56 citations

Proceedings Article

Causal discovery of linear acyclic models with arbitrary distributions

Patrik O. Hoyer¹, Aapo Hyvärinen¹, Richard Scheines², Peter Spirtes², Joseph D. Ramsey², Gustavo Lacerda², Shohei Shimizu³

Helsinki Institute for Information Technology¹, Carnegie Mellon University², Osaka University³

09 Jul 2008

TL;DR: This paper generalizes and combines the conditional-independence-based and Independent Component Analysis-based approaches to causal discovery, yielding a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or not as informative as possible.

Abstract: An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.

56 citations

Journal Article

Hidden Roles of the Train Driver: A Challenge for Metro Automation

Hannu Karvonen¹, Iina Aaltonen¹, Mikael Wahlström², Leena Salo¹, Paula Savioja¹, Leena Norros¹

VTT Technical Research Centre of Finland¹, Helsinki Institute for Information Technology²

01 Jul 2011 - Interacting with Computers

TL;DR: This research concludes that, if the identified critical roles of the drivers are not accounted for, a migration to a fully automated metro system can affect the quality of service and raise safety issues.

56 citations

Journal Article

From black and white to full color: extending redescription mining outside the Boolean world

Esther Galbrun¹, Pauli Miettinen²

Helsinki Institute for Information Technology¹, Max Planck Society²

01 Aug 2012 - Statistical Analysis and Data Mining

TL;DR: This paper extends redescription mining to categorical and real‐valued data with possibly missing values using a surprisingly simple and efficient approach and shows the statistical significance of the results using recent innovations on randomization methods.

Abstract: Redescription mining is a powerful data analysis tool that is used to find multiple descriptions of the same entities. Consider geographical regions as an example. They can be characterized by the fauna that inhabits them on one hand and by their meteorological conditions on the other hand. Finding such redescriptors, a task known as niche-finding, is of much importance in biology. Current redescription mining methods cannot handle other than Boolean data. This restricts the range of possible applications or makes discretization a pre-requisite, entailing a possibly harmful loss of information. In niche-finding, while the fauna can be naturally represented using a Boolean presence/absence data, the weather cannot. In this paper, we extend redescription mining to categorical and real-valued data with possibly missing values using a surprisingly simple and efficient approach. We provide extensive experimental evaluation to study the behavior of the proposed algorithm. Furthermore, we show the statistical significance of our results using recent innovations on randomization methods. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012 (Part of this work was done when the author was with HIIT.)

56 citations

Journal Article

Breeze: an integrated quality control and data analysis application for high-throughput drug screening

Swapnil Potdar¹, Aleksandr Ianevski¹,², John Patrick Mpindi¹, Dmitrii Bychkov¹, Clément Fiere¹, Philipp Ianevski¹, Bhagwan Yadav¹, Krister Wennerberg¹,³, Tero Aittokallio¹,²,⁴, Olli Kallioniemi¹,⁵, Jani Saarela¹, Päivi Östling¹,⁵

University of Helsinki¹, Helsinki Institute for Information Technology², University of Copenhagen³, University of Turku⁴, Science for Life Laboratory⁵

01 Jun 2020 - Bioinformatics

TL;DR: The Breeze application provides a complete solution for data quality assessment, dose–response curve fitting and quantification of the drug responses along with interactive visualization of the results.

Abstract: Summary: High-throughput screening (HTS) enables systematic testing of thousands of chemical compounds for potential use as investigational and therapeutic agents. HTS experiments are often conducted in multi-well plates that inherently bear technical and experimental sources of error. Thus, HTS data processing requires the use of robust quality control procedures before analysis and interpretation. Here, we have implemented an open-source analysis application, Breeze, an integrated quality control and data analysis application for HTS data. Furthermore, Breeze enables a reliable way to identify individual drug sensitivity and resistance patterns in cell lines or patient-derived samples for functional precision medicine applications. The Breeze application provides a complete solution for data quality assessment, dose-response curve fitting and quantification of the drug responses along with interactive visualization of the results. Availability and implementation: The Breeze application with video tutorial and technical documentation is accessible at https://breeze.fimm.fi; the R source code is publicly available at https://github.com/potdarswapnil/Breeze under GNU General Public License v3.0. Contact: swapnil.potdar@helsinki.fi. Supplementary information: Supplementary data are available at Bioinformatics online.

56 citations

Authors

Showing all 632 results

Name	H-index	Papers	Citations
Dimitri P. Bertsekas	94	332	85939
Olli Kallioniemi	90	353	42021
Heikki Mannila	72	295	26500
Jukka Corander	66	411	17220
Jaakko Kangasjärvi	62	146	17096
Aapo Hyvärinen	61	301	44146
Samuel Kaski	58	522	14180
Nadarajah Asokan	58	327	11947
Aristides Gionis	58	292	19300
Hannu Toivonen	56	192	19316
Nicola Zamboni	53	128	11397
Jorma Rissanen	52	151	22720
Tero Aittokallio	52	271	8689
Juha Veijola	52	261	19588
Juho Hamari	51	176	16631

Network Information

Related Institutions (5)

Google

39.8K papers, 2.1M citations

93% related

Microsoft

86.9K papers, 4.1M citations

38.6K papers, 1.3M citations

92% related

Carnegie Mellon University

104.3K papers, 5.9M citations

91% related

Facebook

10.9K papers, 570.1K citations

91% related

Performance

Metrics

Papers: 1,967

Citations: 76,126

No. of papers from the Institution in previous years
Year	Papers
2023	1
2022	4
2021	85
2020	97
2019	140
2018	127