
Showing papers by "Kai Puolamäki" published in 2014


Journal Article
TL;DR: An efficient iterative algorithm to find the attributes and dependencies used by any classifier when making predictions is proposed and the empirical investigation shows that the novel algorithm is indeed able to find groupings of interacting attributes exploited by the different classifiers.
Abstract: Classifiers are often opaque and cannot easily be inspected to gain understanding of which factors are of importance. We propose an efficient iterative algorithm to find the attributes and dependencies used by any classifier when making predictions. The performance and utility of the algorithm are demonstrated on two synthetic and 26 real-world datasets, using 15 commonly used learning algorithms to generate the classifiers. The empirical investigation shows that the novel algorithm is indeed able to find groupings of interacting attributes exploited by the different classifiers. These groupings allow for finding similarities among classifiers for a single dataset as well as for determining the extent to which different classifiers exploit such interactions in general.

108 citations


Journal Article
TL;DR: It is suggested that species with evolutionary novelties arise predominantly in “species factories” that develop under harsh environmental conditions, under dominant physical forcing, whereas exceptionally mild environments give rise to “oases in the desert,” characterized by strong competition and survival of relics.
Abstract: The relative weights of physical forcing and biotic interaction as drivers of evolutionary change have been debated in evolutionary theory. The recent finding that species, genera, clades, and chronofaunas all appear to exhibit a symmetrical pattern of waxing and waning lends support to the view that biotic interactions shape the history of life. Yet, there is similarly abundant evidence that these primary units of biological evolution arise and wane in coincidence with major climatic change. We review these patterns and the process-level explanations offered for them. We also propose a tentative synthesis, characterized by interdependence between physical forcing and biotic interactions. We suggest that species with evolutionary novelties arise predominantly in “species factories” that develop under harsh environmental conditions, under dominant physical forcing, whereas exceptionally mild environments give rise to “oases in the desert,” characterized by strong competition and survival of relics.

92 citations


Journal Article
TL;DR: The novel problem of finding the smallest set of patterns that explains most about the data in terms of a global p value is studied and it is found that a greedy algorithm gives good results on real data and that it can formulate and solve many known data-mining tasks.
Abstract: Hypothesis testing using constrained null models can be used to compute the significance of data mining results given what is already known about the data. We study the novel problem of finding the smallest set of patterns that explains most about the data in terms of a global p value. The resulting set of patterns, such as frequent patterns or clusterings, is the smallest set that statistically explains the data. We show that the newly formulated problem is, in its general form, NP-hard and there exists no efficient algorithm with finite approximation ratio. However, we show that in a special case a solution can be computed efficiently with a provable approximation ratio. We find that a greedy algorithm gives good results on real data and that, using our approach, we can formulate and solve many known data-mining tasks. We demonstrate our method on several data mining tasks. We conclude that our framework is able to identify in various settings a small set of patterns that statistically explains the data and to formulate data mining problems in terms of statistical significance.

41 citations


Journal Article
01 Jul 2014 - Sleep
TL;DR: The findings suggest that HRV spectral power reflects vigilant attention in subjects exposed to partial chronic sleep restriction. The study also examines the fraction of PVT variance explained by HRV and by a 3-component alertness model containing circadian and homeostatic processes coupled with sleep inertia.
Abstract: STUDY OBJECTIVES: Examine the use of spectral heart rate variability (HRV) metrics in measuring sleepiness under chronic partial sleep restriction, and identify underlying relationships between HRV, Karolinska Sleepiness Scale ratings (KSS), and performance on the Psychomotor Vigilance Task (PVT). DESIGN: Controlled laboratory study. SETTING: Experimental laboratory of the Brain Work Research Centre of the Finnish Institute of Occupational Health, Helsinki, Finland. PARTICIPANTS: Twenty-three healthy young males (mean age ± SD = 23.77 ± 2.29). INTERVENTIONS: A sleep restriction group (N = 15) was subjected to chronic partial sleep restriction with 4 h sleep for 5 nights. A control group (N = 8) had 8 h sleep on all nights. MEASUREMENTS AND RESULTS: Based on a search over all HRV frequency bands in the range [0.00, 0.40] Hz, the band [0.01, 0.08] Hz showed the highest correlation for HRV-PVT (0.60, 95% confidence interval [0.49, 0.69]) and HRV-KSS (0.33, 95% confidence interval [0.16, 0.46]) for the sleep restriction group; no correlation was found for the control group. We studied the fraction of variance in PVT explained by HRV and a 3-component alertness model, containing circadian and homeostatic processes coupled with sleep inertia, respectively. HRV alone explained 33% of PVT variance. CONCLUSIONS: The findings suggest that HRV spectral power reflects vigilant attention in subjects exposed to partial chronic sleep restriction. CITATION: Henelius A, Sallinen M, Huotilainen M, Muller K, Virkkala J, Puolamaki K. Heart rate variability for evaluating vigilant attention in partial chronic sleep restriction.
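The band-power and correlation quantities mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes an evenly resampled heart-rate signal at a known sampling rate `fs` and uses a plain periodogram; the abstract does not state which spectral estimator or preprocessing the study used, and the helper names `band_power` and `pearson` are hypothetical.

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Approximate power of `signal` in the band [low, high] Hz.

    Assumption: `signal` is evenly sampled at `fs` Hz. A plain
    periodogram is used here purely for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()              # remove the DC component
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * signal.size)
    mask = (freqs >= low) & (freqs <= high)
    df = freqs[1] - freqs[0]                     # frequency resolution
    return float(psd[mask].sum() * df)

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    return float(np.corrcoef(x, y)[0, 1])
```

With one band-power value per measurement session, `pearson` could then be applied to those values and the matching PVT or KSS scores; the band limits 0.01 and 0.08 Hz would be passed as `low` and `high`.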

29 citations


Journal Article
TL;DR: It is demonstrated that the method described can be used to construct confidence bands with guaranteed family-wise error rate control, also when there is too little data for the quantile-based methods to work.
Abstract: Simultaneous confidence intervals, or confidence bands, provide an intuitive description of the variability of a time series. Given a set of $N$ time series of length $M$, we consider the problem of finding a confidence band that contains a $(1-\alpha)$-fraction of the observations. We construct such confidence bands by finding the set of $N-K$ time series whose envelope is minimized. We refer to this problem as the minimum width envelope problem. We show that the minimum width envelope problem is $\mathbf{NP}$-hard, and we develop a greedy heuristic algorithm, which we compare to quantile- and distance-based confidence band methods. We also describe a method to find an effective confidence level $\alpha_{\mathrm{eff}}$ and an effective number of observations to remove $K_{\mathrm{eff}}$, such that the resulting confidence bands will keep the family-wise error rate below $\alpha$. We evaluate our methods on synthetic and real datasets. We demonstrate that our method can be used to construct confidence bands with guaranteed family-wise error rate control, also when there is too little data for the quantile-based methods to work.
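To make the minimum width envelope problem concrete, the following Python sketch drops K of the N series one at a time, each time removing the series whose removal shrinks the envelope most. This is only one plausible greedy strategy and not necessarily the heuristic developed in the paper. The function names `envelope_width` and `greedy_envelope`, and the choice to measure width as the summed pointwise max-min gap, are assumptions made for the example.

```python
import numpy as np

def envelope_width(series):
    """Summed pointwise gap between the max and min over the rows.

    Assumption: 'width' is the total max-min gap across all M time points.
    """
    return float(np.sum(series.max(axis=0) - series.min(axis=0)))

def greedy_envelope(series, k):
    """Greedily discard k of the N rows (requires k < N).

    Returns the kept row indices and the lower/upper band of the
    remaining N - k series.
    """
    series = np.asarray(series, dtype=float)
    kept = list(range(series.shape[0]))
    for _ in range(k):
        best_idx, best_width = None, np.inf
        for i in kept:
            rest = [j for j in kept if j != i]
            width = envelope_width(series[rest])
            if width < best_width:           # removal that narrows the band most
                best_idx, best_width = i, width
        kept.remove(best_idx)
    remaining = series[kept]
    return kept, remaining.min(axis=0), remaining.max(axis=0)

# Hypothetical toy input: three similar series and one outlier.
data = np.array([
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0],
])
kept, lower, upper = greedy_envelope(data, k=1)
```

Each round re-evaluates the envelope for every candidate removal, so the sketch favors readability over speed. It does not address the family-wise error rate control described in the abstract.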

15 citations