Home
/
Authors
/
Michal Wozniak

Author

Michal Wozniak

Bio: Michal Wozniak is an academic researcher from Wrocław University of Technology. The author has contributed to research in topics: Random subspace method & Classifier (UML). The author has an hindex of 20, co-authored 121 publications receiving 2470 citations.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2000
1995

Papers

PDF

Open Access

More filters

Advances in Computing, Communications and Informatics (ICACCI)

[...]

Sabu M. Thampi, Michal Wozniak, Oge Marques, Dilip Krishnaswamy, Christian Callegari, Hideyuki Takagi, Zoran Bojkovic, Neeli R. Prasad, Jose M. Alcaraz Calero, Joel J. P. C. Rodrigues, Natarajan Meghanathan, Ravi Sandhu - Show less +8 more

01 Jan 2015

1,187 citations

Journal Article•DOI•

Computer Recognition Systems 3

[...]

Marek Kurzynski¹, Michal Wozniak¹•Institutions (1)

Wrocław University of Technology¹

20 May 2009-Expert Systems

TL;DR: This book is the most comprehensive study of this field and contains a collection of 69 carefully selected articles contributed by experts of pattern recognition with respect to both methodology and applications.

...read moreread less

Abstract: The computer recognition systems are nowadays one of the most promising directions in artificial intelligence. This book is the most comprehensive study of this field. It contains a collection of 69 carefully selected articles contributed by experts of pattern recognition. It reports on current research with respect to both methodology and applications. In particular, it includes the following sections: Features, learning and classifiers, Image processing and computer vision, Speech and word recognition, Medical applications, Miscellaneous applications. This book is a great reference tool for scientists who deal with the problems of designing computer pattern recognition systems. Its target readers can be the as well researchers as students of computer science, artificial intelligence or robotics.

...read moreread less

152 citations

Journal Article•DOI•

Radial-Based Oversampling for Multiclass Imbalanced Data Classification

[...]

Bartosz Krawczyk¹, Michał Koziarski², Michal Wozniak³•Institutions (3)

Virginia Commonwealth University¹, AGH University of Science and Technology², Wrocław University of Technology³

01 Aug 2020-IEEE Transactions on Neural Networks

TL;DR: Results show that by taking into account information coming from all of the classes and conducting a smart oversampling, the MC-RBO algorithm can significantly improve the process of learning from multiclass imbalanced data.

...read moreread less

Abstract: Learning from imbalanced data is among the most popular topics in the contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts are relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls for a better understanding of the relationship among classes. In this paper, we propose multiclass radial-based oversampling (MC-RBO), a novel data-sampling algorithm dedicated to multiclass problems. The main novelty of our method lies in using potential functions for generating artificial instances. We take into account information coming from all of the classes, contrary to existing multiclass oversampling approaches that use only minority class characteristics. The process of artificial instance generation is guided by exploring areas where the value of the mutual class distribution is very small. This way, we ensure a smart oversampling procedure that can cope with difficult data distributions and alleviate the shortcomings of existing methods. The usefulness of the MC-RBO algorithm is evaluated on the basis of extensive experimental study and backed-up with a thorough statistical analysis. Obtained results show that by taking into account information coming from all of the classes and conducting a smart oversampling, we can significantly improve the process of learning from multiclass imbalanced data.

...read moreread less

70 citations

Journal Article•DOI•

Nearest Neighbor Classification for High-Speed Big Data Streams Using Spark

[...]

Sergio Ramírez-Gallego¹, Bartosz Krawczyk², Salvador García¹, Michal Wozniak³, José Manuel Benítez¹, Francisco Herrera¹ - Show less +2 more•Institutions (3)

University of Granada¹, Virginia Commonwealth University², Wrocław University of Technology³

26 Jul 2017-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This paper proposes an efficient incremental instance selection method for massive data streams that continuously update and remove outdated examples from the case-base, which alleviates the high computational requirements of the original classifier, thus making it suitable for the considered problem.

...read moreread less

Abstract: Mining massive and high-speed data streams among the main contemporary challenges in machine learning. This calls for methods displaying a high computational efficacy, with ability to continuously update their structure and handle ever-arriving big number of instances. In this paper, we present a new incremental and distributed classifier based on the popular nearest neighbor algorithm, adapted to such a demanding scenario. This method, implemented in Apache Spark, includes a distributed metric-space ordering to perform faster searches. Additionally, we propose an efficient incremental instance selection method for massive data streams that continuously update and remove outdated examples from the case-base. This alleviates the high computational requirements of the original classifier, thus making it suitable for the considered problem. Experimental study conducted on a set of real-life massive data streams proves the usefulness of the proposed solution and shows that we are able to provide the first efficient nearest neighbor solution for high-speed big and streaming data.

...read moreread less

67 citations

Journal Article•DOI•

CCR: A combined cleaning and resampling algorithm for imbalanced data classification

[...]

Michał Koziarski¹, Michal Wozniak¹•Institutions (1)

Wrocław University of Technology¹

20 Dec 2017-International Journal of Applied Mathematics and Computer Science

TL;DR: A novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task is described and results indicate that the proposed algorithm usually outperforms the conventional oversampling approaches, especially when the detection of Minority examples is considered.

...read moreread less

Abstract: Abstract Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance may be observed in most real datasets, affecting the performance of classification algorithms. Particularly, high levels of imbalance make serious difficulties, often requiring the use of specially designed methods. In such cases the most important issue is often to properly detect minority examples, but at the same time the performance on the majority class cannot be neglected. In this paper we describe a novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task. The proposed method combines cleaning the decision border around minority objects with guided synthetic oversampling. Results of the conducted experimental study indicate that the proposed algorithm usually outperforms the conventional oversampling approaches, especially when the detection of minority examples is considered.

...read moreread less

66 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Pattern Recognition and Machine Learning

[...]

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

01 Jan 2006

TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.

...read moreread less

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

...read moreread less

10,141 citations

Data Mining - Concepts and Techniques.

[...]

Petra Perner

01 Jan 2002

9,314 citations

Journal Article•DOI•

Learning from imbalanced data: open challenges and future directions

[...]

Bartosz Krawczyk¹•Institutions (1)

Wrocław University of Technology¹

22 Apr 2016-Progress in Artificial Intelligence

TL;DR: Seven vital areas of research in this topic are identified, covering the full spectrum of learning from imbalanced data: classification, regression, clustering, data streams, big data analytics and applications, e.g., in social media and computer vision.

...read moreread less

Abstract: Despite more than two decades of continuous development learning from imbalanced data is still a focus of intense research. Starting as a problem of skewed distributions of binary tasks, this topic evolved way beyond this conception. With the expansion of machine learning and data mining, combined with the arrival of big data era, we have gained a deeper insight into the nature of imbalanced learning, while at the same time facing new emerging challenges. Data-level and algorithm-level methods are constantly being improved and hybrid approaches gain increasing popularity. Recent trends focus on analyzing not only the disproportion between classes, but also other difficulties embedded in the nature of data. New real-life problems motivate researchers to focus on computationally efficient, adaptive and real-time methods. This paper aims at discussing open issues and challenges that need to be addressed to further develop the field of imbalanced learning. Seven vital areas of research in this topic are identified, covering the full spectrum of learning from imbalanced data: classification, regression, clustering, data streams, big data analytics and applications, e.g., in social media and computer vision. This paper provides a discussion and suggestions concerning lines of future research for each of them.

...read moreread less

1,503 citations

Journal Article•DOI•

Simulation and the monte carlo method

[...]

J. M. Hammersley

01 Mar 1982-Bulletin of The London Mathematical Society

1,196 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse