Author

Zdzisław Pawlak

Bio: Zdzisław Pawlak is an academic researcher from the Polish Academy of Sciences. The author has contributed to research in the topics Rough set & Dominance-based rough set approach. The author has an h-index of 49 and has co-authored 214 publications receiving 28,434 citations. Previous affiliations of Zdzisław Pawlak include Warsaw University of Technology & University of Warsaw.


Papers
Proceedings ArticleDOI
27 Jun 2004
TL;DR: A function approximation using the rough set approach is discussed, based on modification of the inclusion measure, which makes it possible to overcome some drawbacks of the previously used definitions.
Abstract: Approximation of functions specified using imperfect knowledge is one of the central issues of many areas such as machine learning, pattern recognition, data mining, and qualitative reasoning. However, we do not yet have satisfactory methods for approximating functions, nor well-developed calculi on function approximations. In this paper we discuss function approximation using the rough set approach. The main difference from existing approaches in rough set theory is a modification of the inclusion measure, which makes it possible to overcome some drawbacks of the previously used definitions. For applications it is important to develop rough measures on approximated objects, in particular on function approximations. The modified inclusion measure is also used to define an exemplary measure, namely the rough integral.
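To make the notion concrete, here is a minimal sketch (not taken from the paper; the function name and example sets are illustrative) of the classical rough inclusion measure that the paper's modified measure builds on: the degree to which a set X is included in a set Y.

    # Classical rough inclusion measure: |X ∩ Y| / |X|, taken to be 1 for empty X.
    # The modified measure discussed in the paper is not reproduced here.
    def inclusion(X, Y):
        X, Y = set(X), set(Y)
        if not X:
            return 1.0
        return len(X & Y) / len(X)

    # Example: how well a granule of sample points agrees with a target concept.
    granule = {(1, 2), (2, 4), (3, 5)}
    concept = {(1, 2), (2, 4), (3, 6)}
    print(inclusion(granule, concept))   # 0.666..., i.e. the granule is only partially included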

9 citations

Proceedings ArticleDOI
04 May 1981
TL;DR: The statistical analysis of scientific data can be viewed as consisting of three fundamental phases that interact in practice; for example, a preliminary analysis run may indicate that a more refined coding scheme is needed, or the coding process may reveal deficiencies in the data collection.
Abstract: The statistical analysis of scientific data is a process that can be viewed as consisting of three fundamental phases. First, the observations are recorded. Next, they are encoded into a numeric form suitable for statistical analysis. Finally, the calculations are performed for the particular type of analysis needed for the design of the study. This ordering is, however, only conceptual; in most real studies, the three phases interact and are overlapped. Thus, it may be the case that a preliminary analysis run indicates that a more refined coding scheme is needed or that the coding process reveals deficiencies in the data collection.

9 citations

01 Jan 1993
Abstract: In: Bulletin of the European Association for Theoretical Computer Science (EATCS), 50:234-247, 1993. (See also Institute of Computer Science Report 11/92, Warsaw University of Technology.)

9 citations

Book ChapterDOI
01 Jan 2002
TL;DR: The relationship between some ideas of Lukasiewicz's multi-valued logic, Bayes' Theorem and rough sets will be pointed out and the consequences of granularity of knowledge for reasoning about imprecise concepts will be discussed.
Abstract: Granularity of knowledge has recently attracted the attention of many researchers. This paper addresses the issue from the rough set perspective. Granularity is inherently connected with the foundations of rough set theory: the concept of the rough set hinges on the classification of objects of interest into similarity classes, which form the elementary building blocks (atoms, granules) of knowledge. These granules are employed to define the basic concepts of the theory. In the paper the basic concepts of rough set theory are defined and their granular structure is pointed out. Next, the consequences of granularity of knowledge for reasoning about imprecise concepts are discussed. In particular, the relationship between some ideas of Łukasiewicz's multi-valued logic, Bayes' theorem and rough sets is pointed out.
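As a concrete illustration of the granular structure described above, the following sketch (with an invented toy attribute table; none of it comes from the paper) groups objects into indiscernibility classes and uses those granules to approximate a concept from below and above.

    from collections import defaultdict

    # Toy decision table: object -> values of its condition attributes.
    table = {
        "o1": ("high", "yes"),
        "o2": ("high", "yes"),
        "o3": ("low",  "no"),
        "o4": ("low",  "yes"),
    }
    concept = {"o1", "o4"}               # the imprecise concept to be described

    # Granules: objects with identical attribute values are indiscernible.
    granules = defaultdict(set)
    for obj, values in table.items():
        granules[values].add(obj)

    lower = set().union(*(g for g in granules.values() if g <= concept))
    upper = set().union(*(g for g in granules.values() if g & concept))
    print(sorted(lower))   # ['o4']               objects certainly in the concept
    print(sorted(upper))   # ['o1', 'o2', 'o4']   objects possibly in the concept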

9 citations

Proceedings ArticleDOI
27 Feb 1995
TL;DR: This tutorial paper introduces the basic notions of rough set theory and illustrates them with simple examples, discusses methodologies for analysing data, surveys applications, and presents an introduction to logical, algebraic and topological aspects as well as major extensions to standard rough sets.
Abstract: A rapid growth of interest in rough set theory [290] and its applications can lately be seen in the number of international workshops, conferences and seminars that are either directly dedicated to rough sets, include the subject in their programs, or simply accept papers that use this approach to solve problems at hand. A large number of high-quality papers on various aspects of rough sets and their applications have been published in recent years as a result of this attention. The theory has been followed by the development of several software systems that implement rough set operations. In Section 12 we present a list of software systems based on rough sets. Some of the toolkits provide advanced graphical environments that support the process of developing and validating rough set classifiers. Rough sets are applied in many domains. Several applications have revealed the need to extend the traditional rough set approach. A special place among the various extensions is taken by the approach that replaces the indiscernibility relation based on equivalence with a tolerance relation. In view of the many generalizations, variants and extensions of rough sets, a uniform presentation of the theory and methodology is in order. This tutorial paper is intended to fulfill these needs. It introduces basic notions and illustrates them with simple examples. It discusses methodologies for analysing data and surveys applications. It also presents an introduction to logical, algebraic and topological aspects and major extensions to standard rough sets, and it finally glances at future research.
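The tolerance-based extension mentioned in the abstract can be sketched briefly: the equivalence-based indiscernibility relation is replaced by a relation that is reflexive and symmetric but not necessarily transitive, and the resulting tolerance classes play the role of granules. The objects, attribute values and similarity threshold below are invented for illustration.

    # Tolerance relation on a single numeric attribute: similar if values differ by <= eps.
    objects = {"a": 1.0, "b": 1.4, "c": 1.7, "d": 3.1}

    def tolerant(x, y, eps=0.5):
        return abs(objects[x] - objects[y]) <= eps

    def tolerance_class(x):
        # Replaces the equivalence class of x; tolerance classes may overlap.
        return {y for y in objects if tolerant(x, y)}

    concept = {"a", "b"}
    lower = {x for x in objects if tolerance_class(x) <= concept}
    upper = {x for x in objects if tolerance_class(x) & concept}
    print(sorted(lower), sorted(upper))   # ['a'] ['a', 'b', 'c']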

9 citations


Cited by
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining data streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain; the paper compares the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection.
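A hedged sketch of the wrapper idea summarized above: candidate feature subsets are scored by cross-validating the target learning algorithm itself, here with a greedy forward search (one of several possible search strategies). The dataset, learner and stopping rule are placeholders, not those used in the paper.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def wrapper_forward_select(X, y, learner, cv=5):
        remaining = list(range(X.shape[1]))
        selected, best_score = [], -np.inf
        while remaining:
            # Score each candidate subset with the learner's own cross-validated accuracy.
            scores = {f: cross_val_score(learner, X[:, selected + [f]], y, cv=cv).mean()
                      for f in remaining}
            f, score = max(scores.items(), key=lambda kv: kv[1])
            if score <= best_score:       # stop when no added feature improves the estimate
                break
            selected.append(f)
            remaining.remove(f)
            best_score = score
        return selected, best_score

    # Usage with any feature matrix X (n_samples x n_features) and label vector y:
    # subset, score = wrapper_forward_select(X, y, DecisionTreeClassifier(random_state=0))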

8,610 citations

Journal ArticleDOI
TL;DR: This approach seems to be of fundamental importance to artificial intelligence (AI) and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning, and pattern recognition.
Abstract: Rough set theory, introduced by Zdzislaw Pawlak in the early 1980s [11, 12], is a new mathematical tool to deal with vagueness and uncertainty. This approach seems to be of fundamental importance to artificial intelligence (AI) and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning, and pattern recognition.

7,185 citations

01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper, a well-known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.
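The evaluation heuristic at the heart of CFS can be illustrated with a short sketch: the merit of a feature subset grows with the mean feature-class correlation and shrinks with the mean feature-feature inter-correlation. The thesis uses measures such as symmetrical uncertainty, whereas this sketch substitutes plain Pearson correlation for brevity; the function and variable names are illustrative.

    import numpy as np

    def cfs_merit(X, y, subset):
        # Merit = k * mean|corr(f, y)| / sqrt(k + k*(k-1) * mean|corr(f_i, f_j)|)
        k = len(subset)
        r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
        if k == 1:
            return r_cf
        r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                        for a, i in enumerate(subset) for j in subset[a + 1:]])
        return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

    # A subset whose features correlate with the class but not with each other
    # scores higher than an equally sized subset of mutually redundant features.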

3,533 citations