Author

Tomasz Maciejewski

Bio: Tomasz Maciejewski is an academic researcher from Poznań University of Technology. The author has contributed to research on the topics of genetic programming and object detection. The author has an h-index of 2 and has co-authored 2 publications receiving 193 citations.

Papers

Proceedings Article•DOI•

Local neighbourhood extension of SMOTE for mining imbalanced data

Tomasz Maciejewski¹, Jerzy Stefanowski¹•Institutions (1)

Poznań University of Technology¹

11 Apr 2011

TL;DR: A new generalization of SMOTE, called LN-SMOTE, is introduced. It exploits information about the local neighbourhood of the considered examples more precisely and improves evaluation measures for the minority class.

Abstract: In this paper we discuss the problems of inducing classifiers from imbalanced data and improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples from the minority class between the closest neighbours from this class. However, SMOTE can also overgeneralize the minority class region, as it does not consider the distribution of other neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which exploits information about the local neighbourhood of the considered examples more precisely. In the experiments we compare this method with the original SMOTE and its two most closely related generalizations, Borderline SMOTE and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.
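
The interpolation step that the abstract attributes to the original SMOTE (new examples placed between a minority example and one of its closest minority neighbours) can be sketched as follows. This is a minimal illustration of basic SMOTE only, not of LN-SMOTE; the function name `smote_sketch`, its parameters, and the use of Euclidean distance are assumptions made for the example and do not come from the paper.

```python
import numpy as np


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; needs at least two minority examples.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbours than other examples

    # Pairwise Euclidean distances between minority examples (an assumption).
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                   # random minority example
        nb = neighbours[base, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                       # position along the segment, in [0, 1)
        synthetic[i] = minority[base] + gap * (minority[nb] - minority[base])
    return synthetic


if __name__ == "__main__":
    corners = [[0, 0], [1, 0], [0, 1], [1, 1]]
    print(smote_sketch(corners, n_new=3, k=2))
```

Because every synthetic point lies on a segment between two minority examples, the sketch never looks at majority examples, which is the limitation the abstract points to when it says SMOTE can overgeneralize the minority region.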

238 citations

Proceedings Article•DOI•

Evolving cascades of voting feature detectors for vehicle detection in satellite imagery

Krzysztof Krawiec¹, Bartosz Kukawka¹, Tomasz Maciejewski¹•Institutions (1)

Poznań University of Technology¹

18 Jul 2010

TL;DR: The evolved detection system exhibits competitive sensitivity and relatively low false positive rate for testing images, despite not making use of domain-specific knowledge.

Abstract: We propose an evolutionary method for detection of vehicles in satellite imagery which involves a large number of simple elementary features and multiple detectors trained by genetic programming. The complete detection system is composed of several detectors that are chained into a cascade and successively filter out the negative examples. Each detector is a committee of genetic programming trees that together vote over the decision concerning vehicle presence, and is trained only on the examples classified as positive by the previous cascade node. The individual trees use typical arithmetic transformations to aggregate features selected from a very large collection of Haar-like features derived from the input image. The paper presents a detailed description of the proposed algorithm and reports the results of an extensive computational experiment carried out on real-world satellite images. The evolved detection system exhibits competitive sensitivity and a relatively low false positive rate for testing images, despite not making use of domain-specific knowledge.
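
The cascade structure described in the abstract (committees that vote, chained so that each stage filters out negatives) can be sketched as below. The abstract does not state the voting rule, so strict majority voting is an assumption here, and the names `Voter`, `committee_accepts`, and `cascade_detect` are hypothetical. Plain Python predicates stand in for the evolved genetic programming trees.

```python
from typing import Callable, List, Sequence

# A voter maps a feature vector to a yes/no vote (stand-in for one GP tree).
Voter = Callable[[Sequence[float]], bool]


def committee_accepts(voters: List[Voter], x: Sequence[float]) -> bool:
    """Return True if a strict majority of voters accept x (assumed rule)."""
    votes = sum(1 for voter in voters if voter(x))
    return votes * 2 > len(voters)


def cascade_detect(cascade: List[List[Voter]], x: Sequence[float]) -> bool:
    """Pass x through the stages in order; the first rejecting stage ends it."""
    # all() short-circuits, so later stages only see examples accepted earlier.
    return all(committee_accepts(stage, x) for stage in cascade)


if __name__ == "__main__":
    stage_one = [lambda x: x[0] > 0, lambda x: x[0] > 1, lambda x: x[0] > 2]
    stage_two = [lambda x: x[1] > 0]
    cascade = [stage_one, stage_two]
    print(cascade_detect(cascade, (1.5, 1.0)))
    print(cascade_detect(cascade, (0.5, 1.0)))
```

The short-circuit evaluation mirrors the filtering role of the cascade: an example rejected by an early committee is never examined by later ones, which is also why each stage in the paper is trained only on examples the previous stage classified as positive.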

4 citations

Cited by

Journal Article•DOI•

SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary

Alberto Fernández¹, Salvador García¹, Francisco Herrera¹, Nitesh V. Chawla²•Institutions (2)

University of Granada¹, University of Notre Dame²

01 Jan 2018-Journal of Artificial Intelligence Research

TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data because of the simplicity of its design and its robustness when applied to different types of problems.

Abstract: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its design, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data. It is also featured in a number of different software packages, from open source to commercial. In this paper, marking the fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges to extend SMOTE for Big Data problems.

905 citations

Proceedings Article•DOI•

Learning Deep Representation for Imbalanced Classification

Chen Huang¹, Yining Li¹, Chen Change Loy¹, Xiaoou Tang•Institutions (1)

The Chinese University of Hong Kong¹

01 Jun 2016

TL;DR: The representation learned by this approach, when combined with a simple k-nearest neighbor (kNN) algorithm, shows significant improvements over existing methods on both high- and low-level vision classification tasks that exhibit imbalanced class distribution.

Abstract: Data in the vision domain often exhibit a highly skewed class distribution, i.e., most data belong to a few majority classes, while the minority classes contain only a scarce amount of instances. To mitigate this issue, contemporary classification methods based on deep convolutional neural networks (CNNs) typically follow classic strategies such as class re-sampling or cost-sensitive training. In this paper, we conduct extensive and systematic experiments to validate the effectiveness of these classic schemes for representation learning on class-imbalanced data. We further demonstrate that a more discriminative deep representation can be learned by enforcing a deep network to maintain both inter-cluster and inter-class margins. This tighter constraint effectively reduces the class imbalance inherent in the local data neighborhood. We show that the margins can be easily deployed in a standard deep learning framework through quintuplet instance sampling and the associated triple-header hinge loss. The representation learned by our approach, when combined with a simple k-nearest neighbor (kNN) algorithm, shows significant improvements over existing methods on both high- and low-level vision classification tasks that exhibit imbalanced class distribution.

873 citations

Journal Article•DOI•

A Survey of Predictive Modeling on Imbalanced Domains

Paula Branco¹, Luís Torgo¹, Rita P. Ribeiro¹•Institutions (1)

University of Porto¹

13 Aug 2016-ACM Computing Surveys

TL;DR: The main challenges raised by imbalanced domains are discussed, a definition of the problem is proposed, the main approaches to these tasks are described, and a taxonomy of the methods is proposed.

Abstract: Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which, when associated with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of some methods, and refer to some related problems within predictive modeling.

730 citations

Journal Article•DOI•

Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data

Salman H. Khan, Munawar Hayat¹, Mohammed Bennamoun², Ferdous Sohel³, Roberto Togneri²•Institutions (3)

University of Canberra¹, University of Western Australia², Murdoch University³

01 Aug 2018-IEEE Transactions on Neural Networks

TL;DR: In this article, a cost sensitive deep neural network (CoSen) is proposed to learn robust feature representations for both the majority and minority classes, which jointly optimizes the class-dependent costs and the neural network parameters.

Abstract: Class imbalance is a common problem in the case of real-world object detection and classification tasks. Data of some classes are abundant, making them an overrepresented majority, and data of other classes are scarce, making them an underrepresented minority. This imbalance makes it challenging for a classifier to appropriately learn the discriminating boundaries of the majority and minority classes. In this paper, we propose a cost-sensitive (CoSen) deep neural network, which can automatically learn robust feature representations for both the majority and minority classes. During training, our learning procedure jointly optimizes the class-dependent costs and the neural network parameters. The proposed approach is applicable to both binary and multiclass problems without any modification. Moreover, as opposed to data-level approaches, we do not alter the original data distribution, which results in a lower computational cost during the training process. We report the results of our experiments on six major image classification data sets and show that the proposed approach significantly outperforms the baseline algorithms. Comparisons with popular data sampling techniques and CoSen classifiers demonstrate the superior performance of our proposed method.

524 citations

Journal Article•DOI•

Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery

Xin Yang¹, Yifei Wang¹, Ryan Byrne², Gisbert Schneider², Shengyong Yang¹•Institutions (2)

Sichuan University¹, ETH Zurich²

11 Jul 2019-Chemical Reviews

TL;DR: The current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects.

Abstract: Artificial intelligence (AI), and in particular deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI that have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles of the various machine learning algorithms, alongside some application notes, the current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.

425 citations
