Kai Puolamäki

Other affiliations: Helsinki Institute of Physics, Helsinki Institute for Information Technology, Finnish Institute of Occupational Health, ...

Bio: Kai Puolamäki is an academic researcher from the University of Helsinki. The author has contributed to research in topics including supersymmetry and exploratory data analysis. The author has an h-index of 26 and has co-authored 122 publications receiving 2259 citations. Previous affiliations of Kai Puolamäki include Helsinki Institute of Physics and Helsinki Institute for Information Technology.

Papers published on a yearly basis (chart; years shown: 1997 to 2001 and 2004 to 2023)

Papers

Journal Article

Latent grouping models for user preference prediction

Eerika Savia¹, Kai Puolamäki¹, Samuel Kaski¹

Helsinki Institute for Information Technology¹

01 Jan 2009-Machine Learning

TL;DR: Introduces a probabilistic latent grouping model for predicting the relevance of a document to a user and compares it against a state-of-the-art method, the User Rating Profile model, in which only the users have a latent group structure.

Abstract: We tackle the problem of new users or documents in collaborative filtering. Generalization over users by grouping them into user groups is beneficial when a rating is to be predicted for a relatively new document having only a few observed ratings. Analogously, generalization over documents improves predictions in the case of new users. We show that if either users or documents or both are new, two-way generalization becomes necessary. We demonstrate the benefits of grouping of users, grouping of documents, and two-way grouping, with artificial data and in two case studies with real data. We introduce a probabilistic latent grouping model for predicting the relevance of a document to a user. The model assumes a latent group structure for both users and items. We compare the model against a state-of-the-art method, the User Rating Profile model, where only the users have a latent group structure. We compute the posterior of both models by Gibbs sampling. The Two-Way Model predicts relevance more accurately when the target consists of both new documents and new users. The reason is that generalization over documents becomes beneficial for new documents and at the same time generalization over users is needed for new users.
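
The two-way idea can be illustrated with the predictive step of a grouping model. This is a minimal sketch, assuming the membership probabilities over user groups and document groups, and the per-block relevance probabilities `theta[a][b]`, have already been estimated (the paper uses Gibbs sampling for this). It further simplifies by assuming a new document belongs to one latent group whose posterior is updated from the known groups of the users who rated it. The function names and numbers are hypothetical, not the authors' code.

```python
def predict_relevance(p_user_group, p_doc_group, theta):
    """P(relevant | u, d) = sum_a sum_b P(a | u) * P(b | d) * theta[a][b].

    p_user_group: membership probabilities of user u over user groups a.
    p_doc_group:  membership probabilities of document d over document groups b.
    theta[a][b]:  probability that a user in group a finds a document in group b relevant.
    """
    return sum(pu * pd * theta[a][b]
               for a, pu in enumerate(p_user_group)
               for b, pd in enumerate(p_doc_group))


def doc_group_posterior(prior, ratings, theta):
    """Posterior over the (single, latent) group of a new document.

    prior:   prior probabilities over document groups.
    ratings: the few observed ratings as (user_group, relevant) pairs; the
             raters' groups are assumed known, which is a simplification.
    """
    weights = []
    for b, pb in enumerate(prior):
        w = pb
        for a, relevant in ratings:
            w *= theta[a][b] if relevant else 1.0 - theta[a][b]
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


theta = [[0.9, 0.1],   # hypothetical: user group 0 likes document group 0
         [0.2, 0.7]]   # hypothetical: user group 1 prefers document group 1
prior = [0.5, 0.5]

# New user and new document: only the prior-weighted block average is available.
cold = predict_relevance(prior, prior, theta)

# Two users from group 0 rated the new document relevant; generalizing over
# users pins down the document's group, which then informs a group-1 user.
doc_post = doc_group_posterior(prior, [(0, True), (0, True)], theta)
warm = predict_relevance([0.0, 1.0], doc_post, theta)
```

With no ratings at all the prediction falls back to the block average (0.475 with these numbers); two ratings from group-0 users concentrate the document on group 0, and the prediction for a group-1 user drops to about 0.21.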

14 citations

Book Chapter

Interactive Visual Data Exploration with Subjective Feedback

Kai Puolamäki¹, Bo Kang², Jefrey Lijffijt², Tijl De Bie²

Finnish Institute of Occupational Health¹, Ghent University²

19 Sep 2016

TL;DR: A novel generic method for interactive visual exploration of high-dimensional data that employs data randomization with constraints to allow users to flexibly and intuitively express their interests or beliefs using visual interactions that correspond to exactly defined constraints.

Abstract: Data visualization and iterative/interactive data mining are growing rapidly in attention, both in research as well as in industry. However, integrated methods and tools that combine advanced visualization and data mining techniques are rare, and those that exist are often specialized to a single problem or domain. In this paper, we introduce a novel generic method for interactive visual exploration of high-dimensional data. In contrast to most visualization tools, it is not based on the traditional dogma of manually zooming and rotating data. Instead, the tool initially presents the user with an ‘interesting’ projection of the data and then employs data randomization with constraints to allow users to flexibly and intuitively express their interests or beliefs using visual interactions that correspond to exactly defined constraints. These constraints expressed by the user are then taken into account by a projection-finding algorithm to compute a new ‘interesting’ projection, a process that can be iterated until the user runs out of time or finds that constraints explain everything she needs to find from the data. We present the tool by means of two case studies, one controlled study on synthetic data and another on real census data. The data and software related to this paper are available at http://www.interesting-patterns.net/forsied/interactive-visual-data-exploration-with-subjective-feedback/.
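
One way to picture "data randomization with constraints" is a column-wise permutation that is restricted inside user-defined constraints. This is a minimal sketch, assuming constraints are non-overlapping "tiles" (a set of rows plus a set of columns whose joint relations the user has confirmed and wants preserved); the tile representation and the function name are illustrative assumptions, not the tool's released code.

```python
import random


def randomize_with_tiles(data, tiles, seed=None):
    """Randomize a data matrix while preserving structure inside tiles.

    data:  list of rows (lists of numbers).
    tiles: list of (row_indices, column_indices); assumed non-overlapping.
    Inside a tile, rows are shuffled with ONE permutation shared by all of the
    tile's columns, so relations between those columns survive. Every other
    cell is permuted independently within its column, which breaks
    correlations the user has not yet accounted for.
    """
    rng = random.Random(seed)
    n = len(data)
    m = len(data[0]) if n else 0
    out = [list(row) for row in data]
    in_tile = [[False] * m for _ in range(n)]

    for rows, cols in tiles:
        rows = list(rows)
        src = rows[:]
        rng.shuffle(src)  # shared permutation for all columns of this tile
        for c in cols:
            for dst, s in zip(rows, src):
                out[dst][c] = data[s][c]
                in_tile[dst][c] = True

    for c in range(m):
        free = [r for r in range(n) if not in_tile[r][c]]
        src = free[:]
        rng.shuffle(src)  # independent permutation per column
        for dst, s in zip(free, src):
            out[dst][c] = data[s][c]
    return out


data = [[float(i), 2.0 * i, float(-i)] for i in range(10)]
# Hypothetical feedback: rows 0-4 form a cluster whose column 0/1 relation is known.
tiles = [(range(0, 5), [0, 1])]
randomized = randomize_with_tiles(data, tiles, seed=1)
```

Each column keeps its values, and rows 0 to 4 keep the relation between columns 0 and 1, while everything else is scrambled. In a full tool, a projection-finding step would then search for views where the real data departs from such randomized copies; that step is not sketched here.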

13 citations

Journal Article

Correlations and Co-Occurrences of Taxa: the Role of Temporal, Geographic and Taxonomic Restrictions

Aleksi Kallio, Kai Puolamäki¹, Mikael Fortelius, Heikki Mannila²

Aalto University¹, Helsinki University of Technology²

01 Jan 2011-Palaeontologia Electronica

TL;DR: The authors argue that before computing correlations, one has to carefully select the underlying base set of locations over which the co-occurrence counts, similarity indices, and their significance are computed.

Abstract: Correlation between occurrences of taxa is a fundamental concept in the analysis of presence-absence data. Such correlations can result from ecologically relevant processes, such as the existence and evolution of species communities. Correlations are typically quantified by some sort of similarity index based on co-occurrence counts. We argue that the individual values of a similarity index are not useful as such: rather, we have to be able to estimate the statistical significance of the index value. Second, we argue that before computing the correlations one has to carefully select the underlying base set of locations for which the co-occurrence counts, similarity indices, and their significance are computed. We demonstrate base set selection with synthetic examples and conclude with an analysis of real data from a large database of fossil land mammals.
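
The base-set argument can be made concrete with a co-occurrence count and a permutation p-value computed over different sets of sites. This is a hypothetical sketch: the null model (shuffling one taxon's presences among the sites of the chosen base set) is an assumption for illustration and not necessarily the randomization used in the paper, and the data are invented.

```python
import random


def cooccurrence(a, b, sites):
    """Number of sites in the base set where both taxa are present."""
    return sum(1 for s in sites if a[s] and b[s])


def cooccurrence_pvalue(a, b, sites, n_perm=999, seed=0):
    """One-sided permutation p-value for an excess of co-occurrences.

    Null model (an assumption for this sketch): taxon b's presences are
    shuffled among the sites of the base set, keeping its occurrence count.
    """
    rng = random.Random(seed)
    sites = list(sites)
    observed = cooccurrence(a, b, sites)
    b_vals = [b[s] for s in sites]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(b_vals)
        count = sum(1 for s, present in zip(sites, b_vals) if a[s] and present)
        if count >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 20 sites, 0-9 from an older time interval and 10-19 from a
# younger one. Both taxa occur at every older site and nowhere else.
a = [1] * 10 + [0] * 10
b = [1] * 10 + [0] * 10

all_sites = cooccurrence_pvalue(a, b, range(20))
older_only = cooccurrence_pvalue(a, b, range(10))
```

With all 20 sites as the base set the pair looks strongly associated (a very small p-value), but restricted to the older interval, where both taxa are everywhere, the same count of 10 is exactly what chance predicts (p = 1.0). The association reflects the shared time range, not an ecological link.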

12 citations

Journal Article

Detecting virtual concept drift of regressors without ground truth values

Emilia Oikarinen¹, Henri Elias Tiittanen¹, Andreas Henelius¹, Kai Puolamäki¹

University of Helsinki¹

04 Feb 2021-Data Mining and Knowledge Discovery

TL;DR: This paper presents an efficient framework for estimating the generalization error of regression functions, applicable to any family of regression functions when the ground truth is unknown, and finds that it performs robustly and is useful for detecting concept drift in datasets in several real-world domains.

Abstract: Regression analysis is a standard supervised machine learning method used to model an outcome variable in terms of a set of predictor variables. In most real-world applications the true value of the outcome variable we want to predict is unknown outside the training data, i.e., the ground truth is unknown. Phenomena such as overfitting and concept drift make it difficult to observe directly when a model's estimate may be wrong. In this paper we present an efficient framework for estimating the generalization error of regression functions, applicable to any family of regression functions when the ground truth is unknown. We present a theoretical derivation of the framework and empirically evaluate its strengths and limitations. We find that it performs robustly and is useful for detecting concept drift in datasets in several real-world domains.
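
The abstract does not spell out the estimator, so the sketch below swaps in a generic, ground-truth-free signal instead: disagreement among regressors fitted on bootstrap resamples of the training data. Such disagreement tends to grow when new inputs move away from the training distribution (virtual drift, a change in the input distribution). All names and data are hypothetical; this is not the authors' framework.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def bootstrap_ensemble(xs, ys, n_models=20, seed=0):
    """Fit one regressor per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def disagreement(models, xs):
    """Mean, over inputs, of the variance of the ensemble's predictions.

    Uses only new inputs xs; no ground-truth outcome values are needed.
    """
    return statistics.fmean(
        statistics.pvariance([slope * x + icpt for slope, icpt in models])
        for x in xs)


rng = random.Random(42)
train_x = [rng.uniform(0.0, 1.0) for _ in range(200)]
train_y = [2.0 * x + rng.gauss(0.0, 0.1) for x in train_x]
models = bootstrap_ensemble(train_x, train_y)

same_dist = [rng.uniform(0.0, 1.0) for _ in range(100)]
shifted = [rng.uniform(5.0, 6.0) for _ in range(100)]  # inputs have drifted
```

Inputs from the training range give a small disagreement, while the shifted inputs give a far larger one, flagging the drift without any true outcome values. A deployed detector would still need a calibrated threshold, which this sketch does not attempt.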

12 citations

Journal Article

Inferring Intent and Action from Gaze in Naturalistic Behavior: A Review

Kristian Lukander¹, Miika Toivanen¹, Kai Puolamäki¹

Finnish Institute of Occupational Health¹

01 Oct 2017-International Journal of Mobile Human Computer Interaction

TL;DR: The ongoing miniaturization of gaze tracking technologies toward pervasive wearable solutions allows inference from gaze to be studied in everyday activities outside research laboratories.

Abstract: We constantly move our gaze to gather acute visual information from our environment. Conversely, as originally shown by Yarbus in his seminal work, the elicited gaze patterns hold information about our changing attentional focus while performing a task. Recently, the proliferation of machine learning algorithms has allowed the research community to test the idea of inferring, or even predicting, action and intent from gaze behaviour. The ongoing miniaturization of gaze tracking technologies toward pervasive wearable solutions allows inference to be studied also in everyday activities outside research laboratories. This paper scopes the emerging field and reviews studies focusing on the inference of intent and action in naturalistic behaviour. While the task-specific nature of gaze behavior and the variability in naturalistic setups present challenges, gaze-based inference holds a clear promise for machine-based understanding of human intent and future interactive solutions.
Keywords: Eye Movements, Gaze Tracking, Inference, Intent Modeling, Scoping Study, Task Modeling

11 citations

Cited by

Journal Article

The Theory of Island Biogeography

Jeff Swinebroad, Robert H. MacArthur, Edward O. Wilson

01 Oct 1969-Journal of Wildlife Management

Abstract: Preface to the Princeton Landmarks in Biology Edition; Preface; Symbols Used; 1. The Importance of Islands; 2. Area and Number of Species; 3. Further Explanations of the Area-Diversity Pattern; 4. The Strategy of Colonization; 5. Invasibility and the Variable Niche; 6. Stepping Stones and Biotic Exchange; 7. Evolutionary Changes Following Colonization; 8. Prospect; Glossary; References; Index

14,171 citations

Journal Article

Machine learning

Thomas G. Dietterich¹

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories.

First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules.

Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs.

Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules.

Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically.

Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
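
The mail-filtering example can be made concrete with a per-user classifier trained on the messages that user rejected or kept. This sketch uses multinomial naive Bayes with add-one smoothing, one choice among many picked here for brevity; the class name, method names, and training messages are hypothetical.

```python
import math
from collections import Counter


class PersonalMailFilter:
    """Per-user multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}  # keyed by "rejected"
        self.message_counts = {True: 0, False: 0}

    def learn(self, text, rejected):
        """Record one message and whether this user rejected it."""
        self.word_counts[rejected].update(text.lower().split())
        self.message_counts[rejected] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        # Smoothed class prior and smoothed per-word likelihoods.
        prior = (self.message_counts[label] + 1) / (sum(self.message_counts.values()) + 2)
        score = math.log(prior)
        for w in words:
            score += math.log((counts[w] + 1) / (total_words + vocab_size))
        return score

    def would_reject(self, text):
        """Predict whether this user would reject the message."""
        words = text.lower().split()
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
        return (self._log_score(words, True, vocab_size)
                > self._log_score(words, False, vocab_size))


f = PersonalMailFilter()
f.learn("cheap pills buy now", rejected=True)
f.learn("win money now", rejected=True)
f.learn("meeting agenda for monday", rejected=False)
f.learn("project report draft", rejected=False)
```

Each user gets their own instance, matching the point that different users need different filters, and the rules are maintained automatically simply by calling `learn` on every newly rejected or kept message. Here `f.would_reject("buy cheap pills")` returns `True` and `f.would_reject("monday project meeting")` returns `False`.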

13,246 citations

Pattern Recognition and Machine Learning

Christopher M. Bishop¹

Microsoft¹

01 Jan 2006

TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.

Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Book

Machine Learning: A Probabilistic Perspective

Kevin P. Murphy

24 Aug 2012

TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

Computer Vision: A Modern Approach (Chinese-language edition)

David Forsyth, Jean Ponce

01 Jan 2004

TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.

Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and sufficient detail to build useful applications. Users learn techniques that have proven useful through first-hand experience, along with a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image-based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. The book is appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.

3,627 citations
