Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions on the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.


Papers
Journal Article
TL;DR: This paper reports on the fifth edition of the ASP Competition by covering all aspects of the event, ranging from the new design of the competition to an in-depth analysis of the results, including additional analyses that were conceived for measuring the progress of the state of the art, as well as for studying aspects orthogonal to solving technology, such as the effects of modeling.

148 citations

Journal Article
TL;DR: This work introduces a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine, and demonstrates that several molecular properties can be predicted to high accuracy and used in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule.
Abstract: Motivation: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, the computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. Results: We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine. Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. Availability: A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.net/p/fingerid. Contact: markus.heinonen@cs.helsinki.fi

144 citations

Proceedings Article
07 Jul 2004
TL;DR: Methods for analysis of principal components in discrete data can be interpreted as a discrete version of ICA, and a hierarchical version yielding components at different levels of detail is developed.
Abstract: Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory, and present some applications of these methods to some common statistical tasks. We show that these methods can be interpreted as a discrete version of ICA. We develop a hierarchical version yielding components at different levels of detail, and additional techniques for Gibbs sampling. We compare the algorithms on a text prediction task using support vector machines, and to information retrieval.

138 citations

Journal Article
TL;DR: The state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction are described, and a perspective on further opportunities to make better use of high-dimensional multi-omics profiles is given.
Abstract: In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities toward identification of tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to the omics profiles as they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in input “big data” require the application of dedicated data preprocessing pipelines that often lead to some loss of information and a compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles along with knowledge about cancer pathways targeted by anti-cancer compounds when predicting their phenotypic responses.

138 citations

Journal Article
TL;DR: In this article, the authors propose to explain the Levy walk behavior observed in human mobility patterns by decomposing them into different classes according to the different transportation modes, such as Walk/Run, Bike, Train/Subway or Car/Taxi/Bus.
Abstract: Human mobility has been empirically observed to exhibit Levy flight characteristics and behaviour with power-law distributed jump size. The fundamental mechanisms behind this behaviour have not yet been fully explained. In this paper, we propose to explain the Levy walk behaviour observed in human mobility patterns by decomposing them into different classes according to the different transportation modes, such as Walk/Run, Bike, Train/Subway or Car/Taxi/Bus. Our analysis is based on two real-life GPS datasets containing approximately 10 and 20 million GPS samples with transportation mode information. We show that human mobility can be modelled as a mixture of different transportation modes and that these single movement patterns can be approximated by a lognormal distribution rather than a power-law distribution. Then, we demonstrate that the mixture of the decomposed lognormal flight distributions associated with each modality is a power-law distribution, providing an explanation for the emergence of Levy Walk patterns that characterize human mobility patterns.

136 citations


Authors

Showing all 632 results

Name                    H-index  Papers  Citations
Dimitri P. Bertsekas    94       332     85939
Olli Kallioniemi        90       353     42021
Heikki Mannila          72       295     26500
Jukka Corander          66       411     17220
Jaakko Kangasjärvi      62       146     17096
Aapo Hyvärinen          61       301     44146
Samuel Kaski            58       522     14180
Nadarajah Asokan        58       327     11947
Aristides Gionis        58       292     19300
Hannu Toivonen          56       192     19316
Nicola Zamboni          53       128     11397
Jorma Rissanen          52       151     22720
Tero Aittokallio        52       271     8689
Juha Veijola            52       261     19588
Juho Hamari             51       176     16631

Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance
Metrics
No. of papers from the Institution in previous years
Year  Papers
2023  1
2022  4
2021  85
2020  97
2019  140
2018  127