Institution

Helsinki Institute for Information Technology

Facility•Espoo, Finland•

About: Helsinki Institute for Information Technology is a facility based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.

Topics: Population, Bayesian network, Mobile computing, The Internet, Approximation algorithm

Papers

Proceedings Article•

Majorization-Minimization for Manifold Embedding

Zhirong Yang¹, Jaakko Peltonen², Samuel Kaski³•Institutions (3)

Aalto University¹, University of Tampere², Helsinki Institute for Information Technology³

21 Feb 2015

TL;DR: A new MM procedure is proposed that yields fast MM algorithms for a wide variety of manifold embedding problems. It is efficient: in experiments, the newly developed MM algorithms outperformed five state-of-the-art optimization approaches in manifold embedding tasks.

Abstract: Nonlinear dimensionality reduction by manifold embedding has become a popular and powerful approach both for visualization and as preprocessing for predictive tasks, but more efficient optimization algorithms are still crucially needed. Majorization-Minimization (MM) is a promising approach that monotonically decreases the cost function, but it remains unknown how to tightly majorize the manifold embedding objective functions such that the resulting MM algorithms are efficient and robust. We propose a new MM procedure that yields fast MM algorithms for a wide variety of manifold embedding problems. In our majorization step, two parts of the cost function are respectively upper bounded by quadratic and Lipschitz surrogates, and the resulting upper bound can be minimized in closed form. For cost functions amenable to such QL-majorization, the MM yields monotonic improvement and is efficient: in experiments, the newly developed MM algorithms outperformed five state-of-the-art optimization approaches in manifold embedding tasks.

15 citations
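
The majorization step described in the abstract above (one part of the cost bounded by a quadratic surrogate, the other by a Lipschitz surrogate, with a closed-form minimizer) can be illustrated on a toy one-dimensional problem. The sketch below is not the paper's manifold embedding algorithm: the objective, the function names, and the parameters `a`, `b`, and `lipschitz` are all assumptions chosen only to show the mechanics. The quadratic part is kept exactly, and the log-loss part, whose gradient is Lipschitz with constant 1/4, is replaced by its standard quadratic upper bound.

```python
import math


def objective(x, a=1.0, b=2.0):
    """Toy cost: a quadratic part plus a smooth part with Lipschitz gradient."""
    return 0.5 * a * (x - b) ** 2 + math.log1p(math.exp(-x))


def mm_minimize(x0, a=1.0, b=2.0, lipschitz=0.25, iters=50):
    """Majorization-minimization on the toy cost (hypothetical example).

    At the current point x_k the smooth part g(x) = log(1 + exp(-x)) is
    upper bounded by g(x_k) + g'(x_k)(x - x_k) + (L/2)(x - x_k)^2, which is
    valid because |g''| <= 1/4. The quadratic part is kept as is, so the
    surrogate is a quadratic in x and has a closed-form minimizer.
    """
    x = x0
    history = [objective(x, a, b)]
    for _ in range(iters):
        grad_g = -1.0 / (1.0 + math.exp(x))  # derivative of log(1 + exp(-x))
        # Closed-form minimizer of the surrogate:
        # a(x - b) + grad_g + L(x - x_k) = 0
        x = (a * b + lipschitz * x - grad_g) / (a + lipschitz)
        history.append(objective(x, a, b))
    return x, history


if __name__ == "__main__":
    x_star, costs = mm_minimize(-3.0)
    print(x_star, costs[0], costs[-1])
```

Because each surrogate touches the true cost at the current iterate and lies above it everywhere else, the recorded cost values never increase, which is the monotonic improvement property the abstract refers to.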

Book Chapter•DOI•

Local approximation algorithms for scheduling problems in sensor networks

Patrik Floréen¹, Petteri Kaski¹, Topi Musto¹, Jukka Suomela¹•Institutions (1)

Helsinki Institute for Information Technology¹

14 Jul 2007

TL;DR: This work shows that there are practically relevant families of graphs where these problems admit a local distributed approximation algorithm; in a local algorithm each node utilises information from its constant-size neighbourhood only.

Abstract: We study fractional scheduling problems in sensor networks, in particular, sleep scheduling (generalisation of fractional domatic partition) and activity scheduling (generalisation of fractional graph colouring). The problems are hard to solve in general even in a centralised setting; however, we show that there are practically relevant families of graphs where these problems admit a local distributed approximation algorithm; in a local algorithm each node utilises information from its constant-size neighbourhood only. Our algorithm does not need the spatial coordinates of the nodes; it suffices that a subset of nodes is designated as markers during network deployment. Our algorithm can be applied in any marked graph satisfying certain bounds on the marker density; if the bounds are met, guaranteed near-optimal solutions can be found in constant time, space and communication per node. We also show that auxiliary information is necessary: no local algorithm can achieve a satisfactory approximation guarantee on unmarked graphs.

15 citations

Journal Article•DOI•

Confidence bands for time series data

Jussi Korpela¹, Kai Puolamäki¹, Aristides Gionis²•Institutions (2)

Finnish Institute of Occupational Health¹, Helsinki Institute for Information Technology²

01 Sep 2014-Data Mining and Knowledge Discovery

TL;DR: It is demonstrated that the method described can be used to construct confidence bands with guaranteed family-wise error rate control, also when there is too little data for the quantile-based methods to work.

Abstract: Simultaneous confidence intervals, or confidence bands, provide an intuitive description of the variability of a time series. Given a set of $N$ time series of length $M$, we consider the problem of finding a confidence band that contains a $(1-\alpha)$-fraction of the observations. We construct such confidence bands by finding the set of $N-K$ time series whose envelope is minimized. We refer to this problem as the minimum width envelope problem. We show that the minimum width envelope problem is $\mathbf{NP}$-hard, and we develop a greedy heuristic algorithm, which we compare to quantile- and distance-based confidence band methods. We also describe a method to find an effective confidence level $\alpha_{\mathrm{eff}}$ and an effective number of observations to remove $K_{\mathrm{eff}}$, such that the resulting confidence bands will keep the family-wise error rate below $\alpha$. We evaluate our methods on synthetic and real datasets. We demonstrate that our method can be used to construct confidence bands with guaranteed family-wise error rate control, also when there is too little data for the quantile-based methods to work.

15 citations
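
The minimum width envelope problem from the abstract above can be made concrete with a small sketch. The abstract does not spell out the greedy heuristic, so the version below is an assumption: it repeatedly drops the single series whose removal shrinks the envelope the most, and it measures envelope width as the sum over time points of (max minus min), which is also an assumed definition. The function names are hypothetical.

```python
def envelope_width(series):
    """Sum over time points of (max - min) across the given series (assumed width measure)."""
    return sum(max(column) - min(column) for column in zip(*series))


def greedy_band(series, k):
    """Remove k series one at a time, each time dropping the one whose
    removal leaves the narrowest envelope, then return the band.

    This is one natural greedy strategy, not necessarily the paper's.
    """
    if not 0 <= k < len(series):
        raise ValueError("k must satisfy 0 <= k < number of series")
    kept = list(series)
    for _ in range(k):
        # Index of the series whose removal minimizes the remaining width.
        drop = min(
            range(len(kept)),
            key=lambda i: envelope_width(kept[:i] + kept[i + 1:]),
        )
        kept.pop(drop)
    lower = [min(column) for column in zip(*kept)]
    upper = [max(column) for column in zip(*kept)]
    return lower, upper, kept


if __name__ == "__main__":
    data = [
        [0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0],
        [0.5, 0.5, 0.5],
        [10.0, 10.0, 10.0],  # an obvious outlier
    ]
    print(greedy_band(data, k=1))
```

Each greedy step re-evaluates the envelope for every candidate removal, so this sketch favours clarity over speed; it says nothing about the family-wise error rate calibration described in the abstract.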

Journal Article•DOI•

Vulnerability of DM watermarking of non-iid host signals to attacks utilising the statistics of independent components

Patrick Bas, Jarmo Hurri¹•Institutions (1)

Helsinki Institute for Information Technology¹

02 Oct 2006

TL;DR: It is shown that contrary to related works that deal with the security of spread spectrum and quantisation schemes, for non-iid host signals such as images, principal component analysis is not an appropriate technique to estimate the secret carrier.

Abstract: Security is one of the crucial requirements of a watermarking scheme, because hidden messages such as copyright information are likely to face hostile attacks. In this paper, we question the security of an important class of watermarking schemes based on dither modulation (DM). DM embedding schemes rely on the quantisation of a secret component according to an embedded message, and the strategies used to improve the security of these schemes are the use of a dither vector and the use of a secret carrier. In this paper we show that contrary to related works that deal with the security of spread spectrum and quantisation schemes, for non-iid host signals such as images, principal component analysis is not an appropriate technique to estimate the secret carrier. We propose the use of a blind source separation technique called independent component analysis (ICA) to estimate and remove the watermark. In the case of DM embedding, the watermark signal corresponds to a quantisation noise independent of the host signal. An attacking methodology using ICA is presented for digital images; this attack consists first in estimating the secret carrier by an examination of the high-order statistics of the independent components and second in removing the embedded message by erasing the component related to the watermark. The ICA-based attack scheme is compared with a classical attack that has been proposed for attacking DM schemes. The results reported in this paper demonstrate how changes in natural image statistics can be used to detect watermarks and devise attacks. Different implementations of DM watermarking schemes such as pixel, DCT and spread transform-DM embedding can be attacked successfully. Our attack provides an accurate estimate of the secret key and an average improvement of 2 dB in comparison with optimal additive attacks. 
Such natural image statistics-based attacks may pose a serious threat against watermarking schemes which are based on quantisation techniques.

15 citations

Journal Article•DOI•

Efficient algorithms for the discovery of gapped factors

Alberto Apostolico¹,², Cinzia Pizzi¹, Esko Ukkonen³,⁴•Institutions (4)

University of Padua¹, Georgia Institute of Technology², Helsinki Institute for Information Technology³, University of Helsinki⁴

23 Mar 2011-Algorithms for Molecular Biology

TL;DR: This paper presents efficient algorithms and tools for the extraction of all pairs of words up to an arbitrarily large length that co-occur surprisingly often in close proximity within a sequence and applies this approach to the discovery of spaced dyads in DNA sequences.

Abstract: The discovery of surprisingly frequent patterns is of paramount interest in bioinformatics and computational biology. Among the patterns considered, those consisting of pairs of solid words that co-occur within a prescribed maximum distance, or gapped factors, emerge in a variety of contexts of DNA and protein sequence analysis. A few algorithms and tools have been developed in connection with specific formulations of the problem; however, none can handle comprehensively each of the multiple ways in which the distance between the two terms in a pair may be defined. This paper presents efficient algorithms and tools for the extraction of all pairs of words up to an arbitrarily large length that co-occur surprisingly often in close proximity within a sequence. Whereas the number of such pairs in a sequence of n characters can be Θ(n⁴), it is shown that an exhaustive discovery process can be carried out in O(n²) or O(n³), depending on the way distance is measured. This is made possible by a prudent combination of properties of pattern maximality and monotonicity of scores, which reduces the number of word pairs to be weighed explicitly while still producing the scores attained by any of the pairs not explicitly considered. We applied our approach to the discovery of spaced dyads in DNA sequences. Experiments on biological datasets prove that the method is effective and much faster than exhaustive enumeration of candidate patterns. Software is freely available to academic users via the web interface at http://bcb.dei.unipd.it:8080/dyweb .

15 citations
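
To make the notion of a gapped factor in the abstract above concrete, the sketch below counts pairs of fixed-length words that co-occur within a maximum gap. It is the kind of exhaustive enumeration the paper says it improves on, not the paper's algorithm, and it does no scoring of "surprise". The function name, the fixed word lengths, and the choice to measure the gap from the end of the first word to the start of the second are all assumptions; the abstract notes that distance can be defined in several ways.

```python
from collections import Counter


def count_gapped_pairs(seq, len1, len2, max_gap):
    """Count (first word, second word) pairs where the second word starts
    between 0 and max_gap characters after the first word ends.

    Brute-force baseline for illustration only (hypothetical helper).
    """
    counts = Counter()
    n = len(seq)
    for i in range(n - len1 + 1):
        first = seq[i:i + len1]
        after_first = i + len1
        for gap in range(max_gap + 1):
            j = after_first + gap
            if j + len2 > n:
                break  # the second word would run past the end of the sequence
            counts[(first, seq[j:j + len2])] += 1
    return counts


if __name__ == "__main__":
    pairs = count_gapped_pairs("ACGTACGT", len1=2, len2=2, max_gap=1)
    for pair, count in pairs.most_common(3):
        print(pair, count)
```

Even with fixed word lengths, this loop visits every start position and every allowed gap; letting both word lengths vary freely is what drives the number of candidate pairs toward the Θ(n⁴) figure quoted in the abstract.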

Authors

Showing all 632 results

Name	H-index	Papers	Citations
Dimitri P. Bertsekas	94	332	85939
Olli Kallioniemi	90	353	42021
Heikki Mannila	72	295	26500
Jukka Corander	66	411	17220
Jaakko Kangasjärvi	62	146	17096
Aapo Hyvärinen	61	301	44146
Samuel Kaski	58	522	14180
Nadarajah Asokan	58	327	11947
Aristides Gionis	58	292	19300
Hannu Toivonen	56	192	19316
Nicola Zamboni	53	128	11397
Jorma Rissanen	52	151	22720
Tero Aittokallio	52	271	8689
Juha Veijola	52	261	19588
Juho Hamari	51	176	16631

Network Information

Related Institutions (5)

Google

39.8K papers, 2.1M citations

93% related

Microsoft

86.9K papers, 4.1M citations

38.6K papers, 1.3M citations

92% related

Carnegie Mellon University

104.3K papers, 5.9M citations

91% related

Facebook

10.9K papers, 570.1K citations

91% related

Performance

Metrics

1,967

Papers

76,126

Citations

No. of papers from the Institution in previous years
Year	Papers
2023	1
2022	4
2021	85
2020	97
2019	140
2018	127