Institution

Helsinki Institute for Information Technology

Facility · Espoo, Finland
About: Helsinki Institute for Information Technology is a facility organization based in Espoo, Finland. It is known for research contributions in the topics Population and Bayesian network. The organization has 630 authors who have published 1962 publications receiving 63426 citations.


Papers
Proceedings Article
21 Feb 2015
TL;DR: A new MM procedure is proposed that yields fast MM algorithms for a wide variety of manifold embedding problems and is efficient: in experiments, the newly developed MM algorithms outperformed five state-of-the-art optimization approaches in manifold embedding tasks.
Abstract: Nonlinear dimensionality reduction by manifold embedding has become a popular and powerful approach both for visualization and as preprocessing for predictive tasks, but more efficient optimization algorithms are still crucially needed. Majorization-Minimization (MM) is a promising approach that monotonically decreases the cost function, but it remains unknown how to tightly majorize the manifold embedding objective functions such that the resulting MM algorithms are efficient and robust. We propose a new MM procedure that yields fast MM algorithms for a wide variety of manifold embedding problems. In our majorization step, two parts of the cost function are respectively upper bounded by quadratic and Lipschitz surrogates, and the resulting upper bound can be minimized in closed form. For cost functions amenable to such QL-majorization, the MM yields monotonic improvement and is efficient: In experiments, the newly developed MM algorithms outperformed five state-of-the-art optimization approaches in manifold embedding tasks.

15 citations

Book Chapter · DOI
14 Jul 2007
TL;DR: This work shows that there are practically relevant families of graphs where these problems admit a local distributed approximation algorithm; in a local algorithm each node utilises information from its constant-size neighbourhood only.
Abstract: We study fractional scheduling problems in sensor networks, in particular, sleep scheduling (generalisation of fractional domatic partition) and activity scheduling (generalisation of fractional graph colouring). The problems are hard to solve in general even in a centralised setting; however, we show that there are practically relevant families of graphs where these problems admit a local distributed approximation algorithm; in a local algorithm each node utilises information from its constant-size neighbourhood only. Our algorithm does not need the spatial coordinates of the nodes; it suffices that a subset of nodes is designated as markers during network deployment. Our algorithm can be applied in any marked graph satisfying certain bounds on the marker density; if the bounds are met, guaranteed near-optimal solutions can be found in constant time, space and communication per node. We also show that auxiliary information is necessary: no local algorithm can achieve a satisfactory approximation guarantee on unmarked graphs.

15 citations

Journal Article · DOI
TL;DR: It is demonstrated that the method described can be used to construct confidence bands with guaranteed family-wise error rate control, also when there is too little data for the quantile-based methods to work.
Abstract: Simultaneous confidence intervals, or confidence bands, provide an intuitive description of the variability of a time series. Given a set of $N$ time series of length $M$, we consider the problem of finding a confidence band that contains a $(1-\alpha)$-fraction of the observations. We construct such confidence bands by finding the set of $N-K$ time series whose envelope is minimized. We refer to this problem as the minimum width envelope problem. We show that the minimum width envelope problem is $\mathbf{NP}$-hard, and we develop a greedy heuristic algorithm, which we compare to quantile- and distance-based confidence band methods. We also describe a method to find an effective confidence level $\alpha_{\mathrm{eff}}$ and an effective number of observations to remove $K_{\mathrm{eff}}$, such that the resulting confidence bands will keep the family-wise error rate below $\alpha$. We evaluate our methods on synthetic and real datasets. We demonstrate that our method can be used to construct confidence bands with guaranteed family-wise error rate control, also when there is too little data for the quantile-based methods to work.
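Illustration: the abstract above mentions a greedy heuristic for the minimum width envelope problem without spelling it out. The following is only a minimal sketch of one plausible greedy strategy, assuming the envelope width is the sum over time points of (max - min) and that each step drops the series whose removal narrows the envelope most. The function names `envelope_width` and `greedy_envelope` are hypothetical, and the paper's actual heuristic may differ.

```python
def envelope_width(series):
    """Total envelope width: sum over time points of (max - min)."""
    return sum(max(col) - min(col) for col in zip(*series))


def greedy_envelope(data, k):
    """Drop k of the N series so that the remaining envelope is narrow.

    Assumed heuristic (not taken from the paper): at each step, remove
    the single series whose removal yields the smallest envelope width.
    Returns the kept indices and the lower/upper band per time point.
    """
    if not 0 <= k < len(data):
        raise ValueError("k must satisfy 0 <= k < number of series")
    keep = list(range(len(data)))
    for _ in range(k):
        # Pick the candidate whose removal leaves the narrowest envelope.
        drop = min(
            keep,
            key=lambda i: envelope_width([data[j] for j in keep if j != i]),
        )
        keep.remove(drop)
    kept = [data[j] for j in keep]
    lower = [min(col) for col in zip(*kept)]
    upper = [max(col) for col in zip(*kept)]
    return keep, lower, upper
```

Calling `greedy_envelope(data, k)` on a list of equal-length series returns the indices of the retained series together with the pointwise lower and upper bounds of the band. This greedy choice is locally optimal only; since the exact problem is stated to be NP-hard, no optimality guarantee should be assumed.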

15 citations

Journal Article · DOI
02 Oct 2006
TL;DR: It is shown that contrary to related works that deal with the security of spread spectrum and quantisation schemes, for non-iid host signals such as images, principal component analysis is not an appropriate technique to estimate the secret carrier.
Abstract: Security is one of the crucial requirements of a watermarking scheme, because hidden messages such as copyright information are likely to face hostile attacks. In this paper, we question the security of an important class of watermarking schemes based on dither modulation (DM). DM embedding schemes rely on the quantisation of a secret component according to an embedded message, and the strategies used to improve the security of these schemes are the use of a dither vector and the use of a secret carrier. In this paper we show that contrary to related works that deal with the security of spread spectrum and quantisation schemes, for non-iid host signals such as images, principal component analysis is not an appropriate technique to estimate the secret carrier. We propose the use of a blind source separation technique called independent component analysis (ICA) to estimate and remove the watermark. In the case of DM embedding, the watermark signal corresponds to a quantisation noise independent of the host signal. An attacking methodology using ICA is presented for digital images; this attack consists first in estimating the secret carrier by an examination of the high-order statistics of the independent components and second in removing the embedded message by erasing the component related to the watermark. The ICA-based attack scheme is compared with a classical attack that has been proposed for attacking DM schemes. The results reported in this paper demonstrate how changes in natural image statistics can be used to detect watermarks and devise attacks. Different implementations of DM watermarking schemes such as pixel, DCT and spread transform-DM embedding can be attacked successfully. Our attack provides an accurate estimate of the secret key and an average improvement of 2 dB in comparison with optimal additive attacks. 
Such natural image statistics-based attacks may pose a serious threat against watermarking schemes which are based on quantisation techniques.

15 citations

Journal Article · DOI
TL;DR: This paper presents efficient algorithms and tools for the extraction of all pairs of words up to an arbitrarily large length that co-occur surprisingly often in close proximity within a sequence and applies this approach to the discovery of spaced dyads in DNA sequences.
Abstract: The discovery of surprisingly frequent patterns is of paramount interest in bioinformatics and computational biology. Among the patterns considered, those consisting of pairs of solid words that co-occur within a prescribed maximum distance (or gapped factors) emerge in a variety of contexts of DNA and protein sequence analysis. A few algorithms and tools have been developed in connection with specific formulations of the problem; however, none can handle comprehensively each of the multiple ways in which the distance between the two terms in a pair may be defined. This paper presents efficient algorithms and tools for the extraction of all pairs of words up to an arbitrarily large length that co-occur surprisingly often in close proximity within a sequence. Whereas the number of such pairs in a sequence of n characters can be Θ(n^4), it is shown that an exhaustive discovery process can be carried out in O(n^2) or O(n^3), depending on the way distance is measured. This is made possible by a prudent combination of properties of pattern maximality and monotonicity of scores, which reduces the number of word pairs to be weighed explicitly while still producing the scores attained by any of the pairs not explicitly considered. We applied our approach to the discovery of spaced dyads in DNA sequences. Experiments on biological datasets prove that the method is effective and much faster than exhaustive enumeration of candidate patterns. Software is freely available to academic users via the web interface at http://bcb.dei.unipd.it:8080/dyweb .

15 citations


Authors

Showing all 632 results

Name                    H-index   Papers   Citations
Dimitri P. Bertsekas    94        332      85939
Olli Kallioniemi        90        353      42021
Heikki Mannila          72        295      26500
Jukka Corander          66        411      17220
Jaakko Kangasjärvi      62        146      17096
Aapo Hyvärinen          61        301      44146
Samuel Kaski            58        522      14180
Nadarajah Asokan        58        327      11947
Aristides Gionis        58        292      19300
Hannu Toivonen          56        192      19316
Nicola Zamboni          53        128      11397
Jorma Rissanen          52        151      22720
Tero Aittokallio        52        271      8689
Juha Veijola            52        261      19588
Juho Hamari             51        176      16631
Network Information
Related Institutions (5)
Google
39.8K papers, 2.1M citations

93% related

Microsoft
86.9K papers, 4.1M citations

93% related

Carnegie Mellon University
104.3K papers, 5.9M citations

91% related

Facebook
10.9K papers, 570.1K citations

91% related

Performance
Metrics
No. of papers from the Institution in previous years
Year   Papers
2023   1
2022   4
2021   85
2020   97
2019   140
2018   127