Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions on the topics of population and Bayesian networks. The organization has 630 authors who have published 1962 publications, which have received 63426 citations.


Papers
Proceedings Article
01 Apr 2014
TL;DR: This work presents a novel CMF solution that allows each of the matrices to have a separate low-rank structure that is independent of the other matrices, as well as structures that are shared only by a subset of them.
Abstract: Collective matrix factorization (CMF) is a technique for simultaneously learning low-rank representations based on a collection of matrices with shared entities. A typical example is the joint modeling of user-item, item-property, and user-feature matrices in a recommender system. The key idea in CMF is that the embeddings are shared across the matrices, which enables transferring information between them. The existing solutions, however, break down when the individual matrices have low-rank structure not shared with others. In this work we present a novel CMF solution that allows each of the matrices to have a separate low-rank structure that is independent of the other matrices, as well as structures that are shared only by a subset of them. We compare MAP and variational Bayesian solutions based on alternating optimization algorithms and show that the model automatically infers the nature of each factor using group-wise sparsity. Our approach supports continuous, binary and count observations in a principled way and is efficient for sparse matrices involving missing data. We illustrate the solution on a number of examples, focusing in particular on an interesting use-case of augmented multi-view learning.

44 citations

Journal Article
TL;DR: Reduced representation bisulfite sequencing on red blood cell derived DNA showed genome-wide temporal changes in more than 40,000 out of the 522,643 CpG sites examined, and sites that showed a temporal and treatment-specific response in DNA methylation are candidate sites of interest for future studies trying to understand the link between DNA methylation patterns and timing of reproduction.
Abstract: In seasonal environments, timing of reproduction is a trait with important fitness consequences, but we know little about the molecular mechanisms that underlie the variation in this trait. Recentl ...

44 citations

Journal Article
TL;DR: EEL will predict the location and structure of conserved enhancers after being provided with two orthologous DNA sequences and binding specificity matrices for the transcription factors (TFs) that are expected to contribute to the function of the enhancers to be identified.
Abstract: This protocol describes the use of Enhancer Element Locator (EEL), a computer program that was designed to locate distal enhancer elements in long mammalian sequences. EEL will predict the location and structure of conserved enhancers after being provided with two orthologous DNA sequences and binding specificity matrices for the transcription factors (TFs) that are expected to contribute to the function of the enhancers to be identified. The freely available EEL software can analyze two 1-Mb sequences with 100 TF motifs in about 15 min on a modern Windows, Linux or Mac computer. The output provides several hypotheses about enhancer location and structure for further evaluation by an expert on enhancer function.

44 citations

Journal Article
01 Sep 2018
TL;DR: This work presents a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column, and shows that retention order is much better conserved between instruments than retention time.
Abstract: Motivation: Liquid Chromatography (LC) followed by tandem Mass Spectrometry (MS/MS) is one of the predominant methods for metabolite identification. In recent years, machine learning has started to transform the analysis of tandem mass spectra and the identification of small molecules. In contrast, LC data is rarely used to improve metabolite identification, despite numerous published methods for retention time prediction using machine learning. Results: We present a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column. Our method has important advantages over previous approaches: We show that retention order is much better conserved between instruments than retention time. To this end, our method can be trained using retention time measurements from different LC systems and configurations without tedious pre-processing, significantly increasing the amount of available training data. Our experiments demonstrate that retention order prediction is an effective way to learn retention behaviour of molecules from heterogeneous retention time data. Finally, we demonstrate how retention order prediction and MS/MS-based scores can be combined for more accurate metabolite identifications when analyzing a complete LC-MS/MS run. Availability and implementation: Implementation of the method is available at https://version.aalto.fi/gitlab/bache1/retention_order_prediction.git.

44 citations

Journal Article
TL;DR: This work shows that the denoising problem can be reformulated as a clustering problem, where the goal is to obtain separate clusters for informative and noninformative wavelet coefficients, respectively, and suggests two refinements, adding a code-length for the model index, and extending the model in order to account for subband-dependent coefficient distributions.
Abstract: We refine and extend an earlier minimum description length (MDL) denoising criterion for wavelet-based denoising. We start by showing that the denoising problem can be reformulated as a clustering problem, where the goal is to obtain separate clusters for informative and noninformative wavelet coefficients, respectively. This suggests two refinements, adding a code-length for the model index, and extending the model in order to account for subband-dependent coefficient distributions. A third refinement is the derivation of soft thresholding inspired by predictive universal coding with weighted mixtures. We propose a practical method incorporating all three refinements, which is shown to achieve good performance and robustness in denoising both artificial and natural signals.

44 citations


Authors

Showing all 632 results

Name                   H-index   Papers   Citations
Dimitri P. Bertsekas   94        332      85939
Olli Kallioniemi       90        353      42021
Heikki Mannila         72        295      26500
Jukka Corander         66        411      17220
Jaakko Kangasjärvi     62        146      17096
Aapo Hyvärinen         61        301      44146
Samuel Kaski           58        522      14180
Nadarajah Asokan       58        327      11947
Aristides Gionis       58        292      19300
Hannu Toivonen         56        192      19316
Nicola Zamboni         53        128      11397
Jorma Rissanen         52        151      22720
Tero Aittokallio       52        271      8689
Juha Veijola           52        261      19588
Juho Hamari            51        176      16631

Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance
Metrics
No. of papers from the Institution in previous years
Year   Papers
2023   1
2022   4
2021   85
2020   97
2019   140
2018   127