Author

Martti Juhola

Bio: Martti Juhola is an academic researcher from Tampere University of Technology. The author has contributed to research in the topics of eye movement and saccadic masking. The author has an h-index of 25 and has co-authored 250 publications receiving 3,198 citations. Previous affiliations of Martti Juhola include University of Eastern Finland & University UCINF.


Papers
01 Jan 2000
TL;DR: The removal of outliers increased the descriptive classification accuracy of discriminant analysis functions and the nearest neighbour method, while the predictive ability of these methods decreased somewhat.
Abstract: Informal box plot identification of outliers in real-world medical data was studied. Box plots were used to detect univariate outliers directly, whereas box-plotted Mahalanobis distances identified multivariate outliers. Vertigo and female urinary incontinence data were used in the tests. The removal of outliers increased the descriptive classification accuracy of discriminant analysis functions and the nearest neighbour method, while the predictive ability of these methods decreased somewhat. Outliers were also evaluated subjectively by expert physicians, who found most of the multivariate outliers to truly be outliers in their area. The experts sometimes disagreed with the method on univariate outliers. This happened, for example, in heterogeneous diagnostic groups where even extreme values are natural. The informal method may be used for straightforward identification of suspicious data or as a tool to collect abnormal cases for in-depth analysis.
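
A minimal sketch of the informal method as described: the box-plot rule flags univariate outliers directly, and box-plotting the Mahalanobis distances flags multivariate ones. The toy data, thresholds, and NumPy implementation below are illustrative assumptions, not the paper's code.

```python
import numpy as np

def boxplot_outliers(x, k=1.5):
    # Box-plot rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def mahalanobis_outliers(X, k=1.5):
    # Multivariate case: box-plot the Mahalanobis distances themselves.
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    return boxplot_outliers(d, k)

# Toy data: 200 inliers in 4 dimensions plus 3 gross outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:3] += 8
print(np.where(mahalanobis_outliers(X))[0])  # -> [0 1 2]
```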

218 citations

Proceedings ArticleDOI
13 Nov 2004
TL;DR: It is concluded that lemmatization is a better word normalization method than stemming when Finnish text documents are clustered for information retrieval.
Abstract: Stemming and lemmatization were compared in the clustering of Finnish text documents. Since Finnish is a highly inflectional and agglutinative language, we hypothesized that lemmatization, involving the splitting of compound words, would be a more appropriate normalization approach than straightforward stemming. The relevance of the documents was evaluated with a four-point relevance assessment scale, which was collapsed into a binary one by considering, respectively, either all relevant documents or only the highly relevant ones as relevant. Experiments with four hierarchical clustering methods supported the hypothesis. The stringent relevance scale showed that lemmatization allowed the single and complete linkage methods to recover especially the highly relevant documents better than stemming. In comparison with stemming, lemmatization together with the average linkage and Ward's methods produced higher precision. We conclude that lemmatization is a better word normalization method than stemming when Finnish text documents are clustered for information retrieval.
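
A sketch of the comparison pipeline under stated assumptions: `stem` and `lemmatize` below are hypothetical placeholders for real Finnish analyzers (a stemmer and a compound-splitting lemmatizer), and the tf-idf vectorization is an illustrative choice; the four linkage methods match those named in the abstract.

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_docs(docs, normalize, method="ward", k=4):
    # Normalize each word, vectorize the documents, then cluster.
    texts = [" ".join(normalize(w) for w in doc.split()) for doc in docs]
    X = TfidfVectorizer().fit_transform(texts).toarray()
    Z = linkage(X, method=method)  # also: "single", "complete", "average"
    return fcluster(Z, t=k, criterion="maxclust")

# Hypothetical usage, with `stem` and `lemmatize` supplied elsewhere:
# labels_stem  = cluster_docs(docs, stem)
# labels_lemma = cluster_docs(docs, lemmatize)
# Compare cluster precision against the (binary) relevance assessments.
```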

182 citations

Journal ArticleDOI
TL;DR: Single linkage, complete linkage, and Ward clustering were applied to Finnish documents, utilizing their relevance assessments as a new feature, and a connection between the cosine measure and the Euclidean distance was used in association with PCA.
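
The connection referred to is presumably the standard identity that, for unit-length vectors, the squared Euclidean distance equals 2(1 - cosine similarity); this lets Euclidean procedures such as Ward's method and PCA operate on cosine-based document vectors after length normalization. A quick numerical check (illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=50), rng.normal(size=50)
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)  # unit length

cosine = x @ y
squared_euclidean = np.sum((x - y) ** 2)
print(np.isclose(squared_euclidean, 2 * (1 - cosine)))  # True
```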

147 citations

Journal ArticleDOI
TL;DR: A syntactic pattern recognition method for electrocardiograms (ECG) is described in which attributed automata are used to perform the analysis of ECG signals.
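
As a rough illustration of the idea (not the paper's grammar or attributes): an attributed automaton consumes a string of symbolic waveform primitives while evaluating numeric attributes such as durations, and accepts only if both the symbol sequence and the attribute constraints hold. The symbols and the threshold below are invented for the sketch.

```python
def recognize_qrs(tokens, max_ms=120):
    # tokens: (symbol, duration_ms) pairs from a preprocessed ECG signal.
    # Structural part: expect the symbol sequence Q, R, S.
    if [sym for sym, _ in tokens] != ["Q", "R", "S"]:
        return False
    # Attribute part: the total duration must satisfy a semantic constraint.
    total = sum(dur for _, dur in tokens)
    return total <= max_ms

print(recognize_qrs([("Q", 20), ("R", 40), ("S", 30)]))  # True
print(recognize_qrs([("Q", 20), ("R", 90), ("S", 30)]))  # False (too long)
```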

119 citations

Journal ArticleDOI
01 Jan 2000-Genetics
TL;DR: Diversification and shifts of heteroplasmy level are interpreted as resulting from a reorganization of nucleoids containing many copies of the genome, which can themselves be heteroplasmic, and which are faithfully replicated under nuclear genetic control.
Abstract: The mitochondrial genotype of heteroplasmic human cell lines containing the pathological np 3243 mtDNA mutation, plus or minus its suppressor at np 12300, has been followed over long periods in culture. Cell lines containing various different proportions of mutant mtDNA remained generally at a consistent average heteroplasmy value over at least 30 wk of culture in nonselective media and exhibited minimal mitotic segregation, with a segregation number comparable with mtDNA copy number (≥1000). Growth in selective medium of cells at 99% np 3243 mutant mtDNA did, however, allow the isolation of clones with lower levels of the mutation, against a background of massive cell death. As a rare event, cell lines exhibited a sudden and dramatic diversification of heteroplasmy levels, accompanied by a shift in the average heteroplasmy level over a short period (<8 wk), indicating selection. One such episode was associated with a gain of chromosome 9. Analysis of the respiratory phenotype and mitochondrial genotype of cell clones from such cultures revealed that stable heteroplasmy values were generally reestablished within a few weeks, in a reproducible but clone-specific fashion. This occurred independently of any straightforward phenotypic selection at the individual cell-clone level. Our findings are consistent with several alternative views of mtDNA organization in mammalian cells. One model that is supported by our data is that mtDNA is found in nucleoids containing many copies of the genome, which can themselves be heteroplasmic, and which are faithfully replicated. We interpret diversification and shifts of heteroplasmy level as resulting from a reorganization of such nucleoids, under nuclear genetic control. Abrupt remodeling of nucleoids in vivo would have major implications for understanding the developmental consequences of heteroplasmy, including mitochondrial disease phenotype and progression.
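
One way to see why a segregation number comparable with copy number implies minimal drift is a simple neutral-segregation simulation: treat heteroplasmy at each division as a binomial draw over N segregating units. The model and its parameters below are an illustrative assumption, not the authors' analysis.

```python
import numpy as np

def heteroplasmy_spread(p0=0.5, n_units=1000, divisions=30, lines=2000, seed=0):
    # Each division resamples every cell line's heteroplasmy as a binomial
    # draw over n_units segregating units (neutral drift, no selection).
    rng = np.random.default_rng(seed)
    p = np.full(lines, p0)
    for _ in range(divisions):
        p = rng.binomial(n_units, p) / n_units
    return p.std()

print(heteroplasmy_spread(n_units=1000))  # small spread: stable heteroplasmy
print(heteroplasmy_spread(n_units=10))    # large spread: rapid diversification
```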

70 citations


Cited by
Journal ArticleDOI
08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: an odd beast, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i, the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time, an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia, and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
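
As a concrete instance of the fourth category, a per-user mail filter can be learned from examples the user kept or rejected. The messages, labels, and the naive Bayes classifier below are illustrative choices, not prescribed by the article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: messages this particular user kept or rejected.
kept = ["project meeting moved to 3pm", "lunch on friday?"]
rejected = ["win a free prize now", "cheap loans act now"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(kept + rejected, ["keep"] * len(kept) + ["reject"] * len(rejected))

# The learned, user-specific rule generalizes to new mail.
print(clf.predict(["claim your free prize"]))  # ['reject']
```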

13,246 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are covered in this book, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
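
For instance, a basic technique in the survey's nearest-neighbour-based category scores each point by its distance to its k-th nearest neighbour and flags the highest scores. The data and parameters in this sketch are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)), [[6.0, 6.0]]])  # one planted anomaly

k = 5
dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
scores = dists[:, -1]          # distance to k-th neighbour (column 0 is the point itself)
print(np.argsort(scores)[-1])  # -> 100, the planted anomaly
```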

9,627 citations