Institution

Helsinki Institute for Information Technology

Facility: Espoo, Finland

About: Helsinki Institute for Information Technology is a research facility based in Espoo, Finland. It is known for research contributions in the topics of Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.


Papers
Proceedings ArticleDOI
01 Dec 2009
TL;DR: This paper presents the interdomain routing layer and its interplay with the other components of the system, and introduces a new data-oriented congestion control scheme that takes into account the use of storage resources on-path and is fair to multicast flows.
Abstract: Data-oriented networking has attracted research recently, but the efficiency of the state-of-the-art solutions can still be improved. Our work towards this goal is set in a clean-slate architecture consisting of modular rendezvous, routing, and forwarding functions. In this paper we present the interdomain routing layer and its interplay with the other components of the system. The proposed system is built around two types of nodes: forwarding nodes and branching nodes. The forwarding nodes are optimized for throughput with no per-subscription state and no need to change passing packets, while branching nodes contain a large memory for caching and can make complex routing decisions. The amount of storage space and bandwidth can be independently scaled to suit the needs of each network. In the background, topology nodes perform load-balancing and configure routes in each domain using a two-dimensional addressing mechanism. The paths taken by packets adapt to the number of active subscribers to keep the amount of in-network state and latency low. A new data-oriented congestion control scheme is introduced, which takes into account the use of storage resources on-path and is fair to multicast flows.

43 citations

Journal ArticleDOI
TL;DR: In this paper, an approximate inference scheme based on Variational Bayes (VB) was proposed and applied to an existing model of transcript expression inference from RNA-seq data, demonstrating a significant increase in speed with only very small loss in accuracy of expression level estimation.
Abstract: Motivation: Assigning RNA-seq reads to their transcript of origin is a fundamental task in transcript expression estimation. Where ambiguities in assignments exist due to transcripts sharing sequence, e.g. alternative isoforms or alleles, the problem can be solved through probabilistic inference. Bayesian methods have been shown to provide accurate transcript abundance estimates compared with competing methods. However, exact Bayesian inference is intractable and approximate methods such as Markov chain Monte Carlo and Variational Bayes (VB) are typically used. While providing a high degree of accuracy and modelling flexibility, standard implementations can be prohibitively slow for large datasets and complex transcriptome annotations. Results: We propose a novel approximate inference scheme based on VB and apply it to an existing model of transcript expression inference from RNA-seq data. Recent advances in VB algorithmics are used to improve the convergence of the algorithm beyond the standard Variational Bayes Expectation Maximization algorithm. We apply our algorithm to simulated and biological datasets, demonstrating a significant increase in speed with only very small loss in accuracy of expression level estimation. We carry out a comparative study against seven popular alternative methods and demonstrate that our new algorithm provides excellent accuracy and inter-replicate consistency while remaining competitive in computation time.
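The read-assignment problem at the heart of this abstract can be sketched with a toy expectation-maximization loop (the paper itself uses Variational Bayes, not plain EM; the compatibility lists and iteration count below are illustrative assumptions, not the authors' model):

```python
# Toy EM for assigning ambiguous RNA-seq reads to transcripts.
# A simplified stand-in for the probabilistic inference described
# above; the actual method uses collapsed Variational Bayes.

def em_abundances(compat, n_iter=200):
    """compat[r] lists the transcript indices that read r maps to."""
    n_tx = max(t for row in compat for t in row) + 1
    theta = [1.0 / n_tx] * n_tx            # uniform initial abundances
    for _ in range(n_iter):
        counts = [0.0] * n_tx
        for row in compat:
            z = sum(theta[t] for t in row)
            for t in row:                  # fractional responsibility of t for this read
                counts[t] += theta[t] / z
        total = sum(counts)
        theta = [c / total for c in counts]
    return theta

# Two transcripts share sequence: reads 0-2 are ambiguous, read 3 is
# unique to transcript 1, so transcript 1 should absorb more mass.
abund = em_abundances([[0, 1], [0, 1], [0, 1], [1]])
```

The unique read breaks the symmetry: each iteration shifts ambiguous mass toward transcript 1, which is the same mechanism the VB scheme exploits, just with point estimates instead of posterior distributions.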

43 citations

Journal ArticleDOI
TL;DR: This paper considers two generalizations of the Minimum Path Cover Problem dealing with integrating constraints arising from long reads or paired-end reads, and shows that in the case of long reads (subpaths), the generalized problem can be solved in polynomial-time by a reduction to the classical MPC Problem.
Abstract: Multi-assembly problems have gathered much attention in the last years, as Next-Generation Sequencing technologies have started being applied to mixed settings, such as reads from the transcriptome (RNA-Seq), or from viral quasi-species. One classical model that has resurfaced in many multi-assembly methods (e.g. in Cufflinks, ShoRAH, BRANCH, CLASS) is the Minimum Path Cover (MPC) Problem, which asks for the minimum number of directed paths that cover all the nodes of a directed acyclic graph. The MPC Problem is highly popular because the acyclicity of the graph ensures its polynomial-time solvability. In this paper, we consider two generalizations of it dealing with integrating constraints arising from long reads or paired-end reads; these extensions have also been considered by two recent methods, but not fully solved. More specifically, we study the two problems where also a set of subpaths, or pairs of subpaths, of the graph have to be entirely covered by some path in the MPC. We show that in the case of long reads (subpaths), the generalized problem can be solved in polynomial-time by a reduction to the classical MPC Problem. We also consider the weighted case, and show that it can be solved in polynomial-time by a reduction to a min-cost circulation problem. As a side result, we also improve the time complexity of the classical minimum weight MPC Problem. In the case of paired-end reads (pairs of subpaths), the generalized problem becomes NP-hard, but we show that it is fixed-parameter tractable (FPT) in the total number of constraints. This computational dichotomy between long reads and paired-end reads is also a general insight into multi-assembly problems.
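The polynomial-time solvability mentioned above rests on a classical reduction. As a hedged illustration, the sketch below solves the *vertex-disjoint* variant of path cover (size = n minus a maximum bipartite matching between left and right copies of the nodes); the MPC in the paper allows overlapping paths, so this shows only the textbook special case:

```python
# Classical minimum vertex-disjoint path cover of a DAG via bipartite
# matching: build a bipartite graph with a left and a right copy of
# every node, one edge (u_left, v_right) per arc u -> v; the cover
# size is n minus the maximum matching.

def min_path_cover(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    match = [-1] * n                       # match[v] = left node matched to right copy of v

    def augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match[v] == -1 or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    m = sum(augment(u, [False] * n) for u in range(n))
    return n - m

# Path 0 -> 1 -> 2 plus an isolated node 3: one path covers 0,1,2 and
# a second trivial path covers node 3, so the cover has size 2.
cover = min_path_cover(4, [(0, 1), (1, 2)])
```

The subpath-constrained generalizations in the paper reduce to this kind of combinatorial machinery (matching and min-cost circulation), which is why the long-read case stays polynomial.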

43 citations

13 Sep 2017
TL;DR: The Predicting Media Interestingness task, running for the second year as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation, is presented.
Abstract: In this paper, the Predicting Media Interestingness task, which is running for the second year as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation, is presented. For the task, participants are expected to create systems that automatically select images and video segments that are considered to be the most interesting for a common viewer. All task characteristics are described, namely the task use case and challenges, the released data set and ground truth, the required participant runs and the evaluation metrics.

43 citations

Proceedings ArticleDOI
22 Oct 2007
TL;DR: A universal conditional NML model is presented, which has minmax optimal properties similar to those of the regular NML model, but which defines a random process which can be used for prediction and also admits a recursive evaluation for data compression.
Abstract: The NML (normalized maximum likelihood) universal model has certain minmax optimal properties but it has two shortcomings: the normalizing coefficient can be evaluated in a closed form only for special model classes, and it does not define a random process so that it cannot be used for prediction. We present a universal conditional NML model, which has minmax optimal properties similar to those of the regular NML model. However, unlike NML, the conditional NML model defines a random process which can be used for prediction. It also admits a recursive evaluation for data compression. The conditional normalizing coefficient is much easier to evaluate, for instance, for tree machines than the integral of the square root of the Fisher information in the NML model. For Bernoulli distributions, the conditional NML model gives a predictive probability, which behaves like the Krichevsky-Trofimov predictive probability, actually slightly better for extremely skewed strings. For some model classes, it agrees with the predictive probability found earlier by Takimoto and Warmuth, as the solution to a different more restrictive minmax problem. We also calculate the CNML models for the generalized Gaussian regression models, and in particular for the cases where the loss function is quadratic, and show that the CNML model achieves asymptotic optimality in terms of the mean ideal code length. Moreover, the quadratic loss, which represents fitting errors as noise rather than prediction errors, can be shown to be smaller than what can be achieved with the NML as well as with the so-called plug-in or the predictive MDL model.
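The Krichevsky-Trofimov predictor that the conditional NML model is compared against has a simple closed form: after seeing k ones in n Bernoulli trials, the next symbol is predicted to be 1 with probability (k + 1/2)/(n + 1). A minimal sketch (the function names are my own, not the paper's):

```python
# Krichevsky-Trofimov (add-1/2) predictive probability for Bernoulli
# sources -- the baseline the conditional NML predictor is said to
# track, and slightly improve on for extremely skewed strings.

from fractions import Fraction

def kt_predict_one(k, n):
    """KT probability that the next symbol is 1, given k ones in n trials."""
    return Fraction(2 * k + 1, 2 * (n + 1))

def kt_sequence_prob(bits):
    """Probability the KT process assigns to a whole binary string,
    obtained by chaining the one-step predictions."""
    p, k = Fraction(1), 0
    for i, b in enumerate(bits):
        p1 = kt_predict_one(k, i)
        p *= p1 if b else 1 - p1
        k += b
    return p

p = kt_sequence_prob([1, 0, 1])   # (1/2) * (1/4) * (1/2) = 1/16
```

Because the chained one-step predictions multiply into a valid process probability, KT (and likewise CNML) can be used both for sequential prediction and for arithmetic-coding-style data compression, which is exactly the property plain NML lacks.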

43 citations


Authors


Name                   | H-index | Papers | Citations
Dimitri P. Bertsekas   | 94      | 332    | 85939
Olli Kallioniemi       | 90      | 353    | 42021
Heikki Mannila         | 72      | 295    | 26500
Jukka Corander         | 66      | 411    | 17220
Jaakko Kangasjärvi     | 62      | 146    | 17096
Aapo Hyvärinen         | 61      | 301    | 44146
Samuel Kaski           | 58      | 522    | 14180
Nadarajah Asokan       | 58      | 327    | 11947
Aristides Gionis       | 58      | 292    | 19300
Hannu Toivonen         | 56      | 192    | 19316
Nicola Zamboni         | 53      | 128    | 11397
Jorma Rissanen         | 52      | 151    | 22720
Tero Aittokallio       | 52      | 271    | 8689
Juha Veijola           | 52      | 261    | 19588
Juho Hamari            | 51      | 176    | 16631
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 1
2022 | 4
2021 | 85
2020 | 97
2019 | 140
2018 | 127