Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications that have received 63426 citations.


Papers
Book Chapter
18 Sep 2006
TL;DR: A new pruning method that combines techniques for closed and non-derivable itemsets to reduce the number of itemsets further; experiments show that the reduction is significant in some datasets.
Abstract: Itemset mining typically results in large numbers of redundant itemsets. Several approaches such as closed itemsets, non-derivable itemsets and generators have been suggested for losslessly reducing the number of itemsets. We propose a new pruning method based on combining techniques for closed and non-derivable itemsets that allows further reductions of itemsets. This reduction is done without loss of information, that is, the complete collection of frequent itemsets can still be derived from the collection of closed non-derivable itemsets. The number of closed non-derivable itemsets is bounded by both the number of closed and the number of non-derivable itemsets, and never exceeds the smaller of these. Our experiments show that the reduction is significant in some datasets.

31 citations
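
The lossless-reduction idea in the abstract above can be illustrated on toy data. The following is a minimal sketch of closed itemsets only, not the paper's combined closed non-derivable method; the function names and the four-transaction dataset are hypothetical, and support is counted by brute force, which is only practical for very small inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force support counting over all candidate itemsets (toy data only)."""
    items = sorted(set().union(*transactions))
    supports = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                supports[itemset] = count
    return supports


def closed_itemsets(supports):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        x: s
        for x, s in supports.items()
        if not any(x < y and s == supports[y] for y in supports)
    }


def derive_support(itemset, closed):
    """Support of a frequent itemset is the largest support among its closed supersets."""
    candidates = [s for c, s in closed.items() if itemset <= c]
    return max(candidates) if candidates else None


# Hypothetical toy dataset.
transactions = [
    frozenset(t)
    for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"b", "d"})
]
supports = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(supports)
print(len(supports), "frequent itemsets,", len(closed), "closed itemsets")
```

In this toy case the closed collection is smaller than the full frequent collection, and `derive_support` recovers every frequent itemset's support from it, which is the sense in which the reduction loses no information.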

Proceedings Article
27 Aug 2013
TL;DR: Sandwich Keyboard is a prototype that folds any three-row keyboard layout and thus, by retaining the finger-to-letter assignment, supports transfer. Detecting key presses from finger release enhances the performance of touch typing on a multitouch sensor.
Abstract: This Note introduces a keyboard design that affords ten-finger touch typing by utilizing a touch sensor on the back side of a device. Previous work has used physical buttons. Using a touch sensor has the benefit that it retains the form factor and does not insist on a peripheral device. Moreover, any layout can be used. However, it is difficult to hit targets on a flat surface with no haptic feedback. Sandwich Keyboard is a prototype that folds any three-row keyboard layout and thus, by retaining the finger-to-letter assignment, supports transfer. Sandwich Keyboard includes an algorithm for constant adaptation of key targets in the back. We also learned that the detection of key presses from finger release enhances the performance of touch-typing on a multitouch sensor. After eight hours of training, experienced typists of the QWERTY and of the Dvorak Standard Keyboard (DSK) layout reached 26.1 and 46.2 wpm, respectively. We discuss improvements necessary for further increasing both speed and accuracy.

31 citations

Journal Article
12 Feb 2014, PLOS ONE
TL;DR: A statistical method which combines the desirable properties of independent component and canonical correlation analysis is proposed and is used to make a theory-driven testable prediction on how the neural response to colored patterns should change when the illumination changes.
Abstract: Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Canonical correlation analysis explains similarly chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method which combines the desirable properties of independent component and canonical correlation analysis: It finds independent components in each data set which, across the two data sets, are related to each other via linear or higher-order correlations. The new method is as widely applicable as canonical correlation analysis, and also to more than two data sets. We call it higher-order canonical correlation analysis. When applied to chromatic natural images, we found that it provides a single (unified) statistical framework which accounts for both spatio-chromatic processing and adaptation. Filters with spatio-chromatic tuning properties as in the primary visual cortex emerged and corresponding-colors psychophysics was reproduced reasonably well. We used the new method to make a theory-driven testable prediction on how the neural response to colored patterns should change when the illumination changes. We predict shifts in the responses which are comparable to the shifts reported for chromatic contrast habituation.

30 citations

Journal Article
TL;DR: A content-based exploration and retrieval method for whole-metagenome sequencing samples using a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples.
Abstract: Motivation: Over the recent years, the field of whole metagenome shotgun sequencing has witnessed significant growth due to the high-throughput sequencing technologies that allow sequencing genomic samples cheaper, faster, and with better coverage than before. This technical advancement has initiated the trend of sequencing multiple samples in different conditions or environments to explore the similarities and dissimilarities of the microbial communities. Examples include the human microbiome project and various studies of the human intestinal tract. With the availability of ever larger databases of such measurements, finding samples similar to a given query sample is becoming a central operation. Results: In this paper, we develop a content-based exploration and retrieval method for whole metagenome sequencing samples. We apply a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome data sets as well as human microbiome project metagenomic samples. We observe significant enrichment for diseased gut samples in results of queries with another diseased sample and very high accuracy in discriminating between different body sites even though the method is unsupervised. Availability: A software implementation of the DSM framework is available at https://github.com/HIITMetagenomics/dsm-framework. Contact: sohan.seth@hiit.fi, antti.honkela@hiit.fi

30 citations
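
To make the idea of a k-mer-based dissimilarity between samples concrete, here is a minimal single-machine sketch. It is not the DSM framework and does not select "informative" k-mers: it counts every k-mer and, as an assumption, uses the Bray-Curtis dissimilarity as a stand-in measure. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across all reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer count profiles (0 = identical)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for missing k-mers, so no key checks are needed.
    shared = sum(min(a[kmer], b[kmer]) for kmer in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


# Hypothetical toy samples, one short read each.
sample_a = kmer_counts(["ACGTAC"], k=3)
sample_b = kmer_counts(["ACGTTT"], k=3)
print(bray_curtis(sample_a, sample_b))
```

Retrieval then amounts to ranking stored samples by their dissimilarity to the query sample; no labels are involved, which matches the unsupervised setting described in the abstract.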

Book Chapter
05 Sep 2011
TL;DR: Two distance measures for comparing sequences of interval-based events are introduced, which can be used for data mining tasks such as classification and clustering; experiments show that Artemis is more robust to high levels of artificially introduced noise.
Abstract: In several application domains, such as sign language, medicine, and sensor networks, events are not necessarily instantaneous but they can have a time duration. Sequences of interval-based events may contain useful domain knowledge; thus, searching, indexing, and mining such sequences is crucial. We introduce two distance measures for comparing sequences of interval-based events which can be used for several data mining tasks such as classification and clustering. The first measure maps each sequence of interval-based events to a set of vectors that hold information about all concurrent events. These sets are then compared using an existing dynamic programming method. The second method, called Artemis, finds correspondence between intervals by mapping the two sequences into a bipartite graph. Similarity is inferred by employing the Hungarian algorithm. In addition, we present a linear-time lower bound for Artemis. The performance of both measures is tested on data from three domains: sign language, medicine, and sensor networks. Experiments show the superiority of Artemis in terms of robustness to high levels of artificially introduced noise.

30 citations
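
The bipartite-matching view of comparing two interval sequences can be sketched as a minimum-cost assignment problem. The code below is not Artemis: the pairwise cost (label mismatch or lack of temporal overlap) is a hypothetical choice, and the assignment is found by brute force over permutations rather than by the Hungarian algorithm, so it only suits very short sequences. Intervals are assumed to be `(label, start, end)` tuples.

```python
from itertools import permutations


def interval_cost(a, b):
    """Hypothetical cost: 1 for different labels, else 1 minus the temporal overlap ratio."""
    label_a, start_a, end_a = a
    label_b, start_b, end_b = b
    if label_a != label_b:
        return 1.0
    overlap = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    span = max(end_a, end_b) - min(start_a, start_b)
    return 1.0 - overlap / span if span > 0 else 0.0


def matching_distance(seq_a, seq_b):
    """Mean cost of the cheapest one-to-one matching; each unmatched interval costs 1."""
    short, long_ = sorted((seq_a, seq_b), key=len)
    if not long_:
        return 0.0
    # Brute force: try every way to assign the shorter sequence's intervals
    # to distinct intervals of the longer one.
    best = min(
        sum(interval_cost(s, long_[j]) for s, j in zip(short, perm))
        for perm in permutations(range(len(long_)), len(short))
    )
    return (best + (len(long_) - len(short))) / len(long_)


seq_a = [("A", 0, 4), ("B", 2, 6)]
seq_b = [("A", 0, 2), ("B", 2, 6)]
print(matching_distance(seq_a, seq_b))
```

For realistic sequence lengths the brute-force search would be replaced by a polynomial-time assignment solver such as the Hungarian algorithm named in the abstract.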


Authors

Showing all 632 results

Name                  H-index  Papers  Citations
Dimitri P. Bertsekas       94     332      85939
Olli Kallioniemi           90     353      42021
Heikki Mannila             72     295      26500
Jukka Corander             66     411      17220
Jaakko Kangasjärvi         62     146      17096
Aapo Hyvärinen             61     301      44146
Samuel Kaski               58     522      14180
Nadarajah Asokan           58     327      11947
Aristides Gionis           58     292      19300
Hannu Toivonen             56     192      19316
Nicola Zamboni             53     128      11397
Jorma Rissanen             52     151      22720
Tero Aittokallio           52     271       8689
Juha Veijola               52     261      19588
Juho Hamari                51     176      16631
Network Information
Related Institutions (5)
Institution                 Papers   Citations  Related
Google                      39.8K    2.1M       93%
Microsoft                   86.9K    4.1M       93%
Carnegie Mellon University  104.3K   5.9M       91%
Facebook                    10.9K    570.1K     91%

Performance
Metrics
No. of papers from the Institution in previous years
Year  Papers
2023       1
2022       4
2021      85
2020      97
2019     140
2018     127