Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility organization based in Espoo, Finland. It is known for research contributions on the topics Population and Bayesian network. The organization has 630 authors, who have published 1962 publications that have received 63426 citations.


Papers
Book Chapter
25 Apr 2010
TL;DR: It is shown that the model outperforms four other biclustering procedures on a large miRNA data set and adds interpretability and information retrieval capability in a case study that highlights the potential and novel role of miR-224 in the association between melanoma and non-Hodgkin lymphoma.
Abstract: Clustering methods are a useful and common first step in gene expression studies, but the results may be hard to interpret. We bring in explicitly an indicator of which genes tie each cluster, changing the setup to biclustering. Furthermore, we make the indicators hierarchical, resulting in a hierarchy of progressively more specific biclusters. A non-parametric Bayesian formulation makes the model rigorous and yet flexible, and computations feasible. The formulation additionally offers a natural information retrieval relevance measure that allows relating samples in a principled manner. We show that the model outperforms four other biclustering procedures on a large miRNA data set. We also demonstrate the model's added interpretability and information retrieval capability in a case study that highlights the potential and novel role of miR-224 in the association between melanoma and non-Hodgkin lymphoma. Software is publicly available.

19 citations

Proceedings Article
01 Jan 2020
TL;DR: The methods build on a recent Markov chain Monte Carlo scheme for learning Bayesian networks, which enables efficient approximate sampling from the graph posterior, provided that each node is assigned a small number K of candidate parents.
Abstract: We give methods for Bayesian inference of directed acyclic graphs, DAGs, and the induced causal effects from passively observed complete data. Our methods build on a recent Markov chain Monte Carlo scheme for learning Bayesian networks, which enables efficient approximate sampling from the graph posterior, provided that each node is assigned a small number $K$ of candidate parents. We present algorithmic techniques to significantly reduce the space and time requirements, which make the use of substantially larger values of $K$ feasible. Furthermore, we investigate the problem of selecting the candidate parents per node so as to maximize the covered posterior mass. Finally, we combine our sampling method with a novel Bayesian approach for estimating causal effects in linear Gaussian DAG models. Numerical experiments demonstrate the performance of our methods in detecting ancestor-descendant relations, and in causal effect estimation our Bayesian method is shown to outperform previous approaches.

19 citations

Journal Article
26 Nov 2014 - PLOS ONE
TL;DR: This work introduces the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field, and uses a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models.
Abstract: A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.

19 citations

Book Chapter
12 Mar 2007
TL;DR: In the experimental comparison of different algorithms, the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm.
Abstract: Fast search algorithms for finding good instances of patterns given as position specific scoring matrices are developed, and some empirical results on their performance on DNA sequences are reported. The algorithms basically generalize the Aho-Corasick, filtration, and superalphabet techniques of string matching to the scoring matrix search. As compared to the naive search, our algorithms can be faster by a factor which is proportional to the length of the pattern. In our experimental comparison of different algorithms the new algorithms were clearly faster than the naive method and also faster than the well-known lookahead scoring algorithm. The Aho-Corasick technique is the fastest for short patterns and high significance thresholds of the search. For longer patterns the filtration method is better while the superalphabet technique is the best for very long patterns and low significance levels. We also observed that the actual speed of all these algorithms is very sensitive to implementation details.

19 citations

Journal Article
TL;DR: It is concluded that a high quality of insight is afforded by the combination of subjective self-report and objective psychophysiology, satisfying two of three observable domains.
Abstract: We report an evaluation study for a novel learning platform, motivated by the growing need for methods to assess the efficacy of serious games. The study was a laboratory experiment combining evaluation methods from the fields of learning assessment and psychophysiology. Fifteen participants used the TARGET game platform for 25 minutes while three bio-signals (electrocardiography, electrodermal activity and facial electromyography) were recorded. Learning was scored using pre- and post-test question-based assessments. Repeated-measures analysis with Generalised Estimating Equations was used to predict scores from tonic psychophysiological data. Results indicate some learning effect, plus a relationship between mental workload (indexed by electrocardiography) and learning. Notably, the game format itself influences the nature of this relationship. We conclude that a high quality of insight is afforded by the combination of subjective self-report and objective psychophysiology, satisfying two of three observable domains.

19 citations


Authors

Showing all 632 results

Name | H-index | Papers | Citations
Dimitri P. Bertsekas | 94 | 332 | 85939
Olli Kallioniemi | 90 | 353 | 42021
Heikki Mannila | 72 | 295 | 26500
Jukka Corander | 66 | 411 | 17220
Jaakko Kangasjärvi | 62 | 146 | 17096
Aapo Hyvärinen | 61 | 301 | 44146
Samuel Kaski | 58 | 522 | 14180
Nadarajah Asokan | 58 | 327 | 11947
Aristides Gionis | 58 | 292 | 19300
Hannu Toivonen | 56 | 192 | 19316
Nicola Zamboni | 53 | 128 | 11397
Jorma Rissanen | 52 | 151 | 22720
Tero Aittokallio | 52 | 271 | 8689
Juha Veijola | 52 | 261 | 19588
Juho Hamari | 51 | 176 | 16631
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance
Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 1
2022 | 4
2021 | 85
2020 | 97
2019 | 140
2018 | 127