
Showing papers on "Rough set published in 2012"


Journal ArticleDOI
TL;DR: A framework for the study of covering-based rough set approximations is proposed and three equivalent formulations of the classical rough sets are examined by using equivalence relations, partitions, and σ-algebras.
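For orientation, the classical (Pawlak) approximations that such frameworks generalize can be stated in a few lines. The sketch below is illustrative only; the universe, the colour attribute and the target set are toy assumptions, not taken from the paper.

```python
# Classical Pawlak rough set approximations computed from a partition (sketch).
from collections import defaultdict

def partition(universe, key):
    """Group objects into equivalence classes by the value of `key`."""
    blocks = defaultdict(set)
    for x in universe:
        blocks[key(x)].add(x)
    return list(blocks.values())

def approximations(blocks, target):
    """Pawlak lower/upper approximation of `target` w.r.t. the partition `blocks`."""
    lower = set().union(*[b for b in blocks if b <= target])   # blocks fully inside
    upper = set().union(*[b for b in blocks if b & target])    # blocks touching target
    return lower, upper

universe = {1, 2, 3, 4, 5, 6}
colour = {1: "red", 2: "red", 3: "blue", 4: "blue", 5: "green", 6: "green"}
blocks = partition(universe, key=lambda x: colour[x])
X = {1, 2, 3}                                   # the concept to approximate
print(approximations(blocks, X))                # ({1, 2}, {1, 2, 3, 4})
```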

440 citations


Journal ArticleDOI
TL;DR: This paper proposes a new hybrid method for preprocessing imbalanced data-sets through the construction of new samples, using the Synthetic Minority Oversampling Technique together with the application of an editing technique based on the Rough Set Theory and the lower approximation of a subset.
Abstract: Imbalanced data is a common problem in classification. This phenomenon is growing in importance since it appears in most real domains. It has special relevance to highly imbalanced data-sets (when the ratio between classes is high). Many techniques have been developed to tackle the problem of imbalanced training sets in supervised learning. Such techniques have been divided into two large groups: those at the algorithm level and those at the data level. At the data level, the most emphasized techniques are those that try to balance the training sets, either by reducing the larger class through the elimination of samples or by enlarging the smaller one through the construction of new samples, known as undersampling and oversampling, respectively. This paper proposes a new hybrid method for preprocessing imbalanced data-sets through the construction of new samples, using the Synthetic Minority Oversampling Technique together with the application of an editing technique based on the Rough Set Theory and the lower approximation of a subset. The proposed method has been validated by an experimental study showing good results using C4.5 as the learning algorithm.
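To make the idea concrete, the sketch below oversamples the minority class with SMOTE-style interpolation and then keeps only the synthetic samples whose similarity neighbourhood contains no majority objects, i.e. samples that fall in a crisp lower approximation of the minority concept. The toy data, distance threshold and helper names are assumptions; this is not the paper's exact editing procedure.

```python
import random
import math

def interpolate(a, b, alpha):
    return [ai + alpha * (bi - ai) for ai, bi in zip(a, b)]

def smote_like(minority, n_new, rng):
    """Generate n_new synthetic points between random pairs of minority samples."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        synthetic.append(interpolate(a, b, rng.random()))
    return synthetic

def in_lower_approximation(sample, data, labels, minority_label, eps):
    """Keep a sample only if no nearby training object belongs to another class."""
    for x, y in zip(data, labels):
        if math.dist(sample, x) <= eps and y != minority_label:
            return False
    return True

rng = random.Random(0)
data = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
labels = ["min", "min", "min", "maj", "maj", "maj"]
minority = [x for x, y in zip(data, labels) if y == "min"]

candidates = smote_like(minority, n_new=5, rng=rng)
kept = [s for s in candidates
        if in_lower_approximation(s, data, labels, "min", eps=0.3)]
print(len(candidates), "generated,", len(kept), "kept after rough-set editing")
```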

364 citations


Journal ArticleDOI
17 Aug 2012
TL;DR: This paper compares grey systems with other kinds of uncertainty models such as stochastic probability, rough set theory, and fuzzy mathematics.
Abstract: Purpose – The purpose of this paper is to introduce the elementary concepts and fundamental principles of grey systems and the main components of grey systems theory. Also to discuss the astonishing progress that grey systems theory has made in the world of learning and its wide‐ranging applications in the entire spectrum of science. Design/methodology/approach – The characteristics of unascertained systems including incomplete information and inaccuracies in data are analysed and four uncertain theories: probability statistics, fuzzy mathematics, grey system and rough set theory are compared. The scientific principle of simplicity and how precise models suffer from inaccuracies are also shown. Findings – The four uncertain theories, probability statistics, fuzzy mathematics, grey system and rough set theory are examined with different research objects, different basic sets, different methods and procedures, different data requirements, different emphasis, different objectives and different characteristics....

240 citations


Journal ArticleDOI
TL;DR: This study finds that the new rough sets degenerate to the original MGRS when the size of the neighborhood equals zero, and proposes a new definition of covering reduct to describe the smallest attribute subset that preserves the consistency of the neighborhood decision system.

186 citations


Journal ArticleDOI
TL;DR: An efficient rough feature selection algorithm for large-scale data sets, inspired by multi-granulation, is proposed; it yields a feature subset (an approximate reduct) in much less time.

168 citations


Journal ArticleDOI
TL;DR: Algorithms for finding reducts based on the minimal elements in the discernibility matrix are developed in the framework of fuzzy rough sets, and experimental comparison shows that the proposed algorithms are effective.
Abstract: Attribute reduction is one of the most meaningful research topics in the existing fuzzy rough sets, and the approach of discernibility matrix is the mathematical foundation of computing reducts. When computing reducts with discernibility matrix, we find that only the minimal elements in a discernibility matrix are sufficient and necessary. This fact motivates our idea in this paper to develop a novel algorithm to find reducts that are based on the minimal elements in the discernibility matrix. Relative discernibility relations of conditional attributes are defined and minimal elements in the fuzzy discernibility matrix are characterized by the relative discernibility relations. Then, the algorithms to compute minimal elements and reducts are developed in the framework of fuzzy rough sets. Experimental comparison shows that the proposed algorithms are effective.
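The claim that only minimal matrix entries matter can be illustrated with the classical (crisp) discernibility matrix: every entry contains some minimal entry, so an attribute subset that intersects every minimal entry also intersects every entry. The toy decision table below is an assumption, and this crisp sketch does not reproduce the paper's fuzzy algorithms.

```python
from itertools import combinations

table = {  # object -> (condition attribute values, decision)
    1: ((0, 0, 1), "yes"),
    2: ((0, 1, 1), "no"),
    3: ((1, 1, 0), "no"),
    4: ((0, 0, 0), "yes"),
}
attrs = (0, 1, 2)

def discernibility_entries(table):
    """For each pair with different decisions, record the attributes telling them apart."""
    entries = set()
    for i, j in combinations(table, 2):
        (vi, di), (vj, dj) = table[i], table[j]
        if di != dj:
            entries.add(frozenset(a for a in attrs if vi[a] != vj[a]))
    return entries

def minimal_elements(entries):
    return {e for e in entries if not any(f < e for f in entries)}

def hits_all(subset, entries):
    return all(subset & e for e in entries)

minimal = minimal_elements(discernibility_entries(table))
# The minimal hitting sets of the minimal entries coincide with the reducts;
# here we just report those of the smallest size.
for size in range(1, len(attrs) + 1):
    found = [set(c) for c in combinations(attrs, size) if hits_all(set(c), minimal)]
    if found:
        print("reducts of size", size, ":", found)
        break
```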

161 citations


Journal ArticleDOI
TL;DR: Incremental approaches for updating the relation matrix are proposed to update rough set approximations; experiments show that the proposed incremental approaches effectively reduce computation time in comparison with the non-incremental approach.

149 citations


Journal ArticleDOI
TL;DR: A new measure of feature quality, called rank mutual information (RMI), is introduced, which combines the advantage of robustness of Shannon's entropy with the ability of dominance rough sets in extracting ordinal structures from monotonic data sets and can get monotonically consistent decision trees.
Abstract: In many decision making tasks, values of features and decision are ordinal. Moreover, there is a monotonic constraint that the objects with better feature values should not be assigned to a worse decision class. Such problems are called ordinal classification with monotonicity constraint. Some learning algorithms have been developed to handle this kind of tasks in recent years. However, experiments show that these algorithms are sensitive to noisy samples and do not work well in real-world applications. In this work, we introduce a new measure of feature quality, called rank mutual information (RMI), which combines the advantage of robustness of Shannon's entropy with the ability of dominance rough sets in extracting ordinal structures from monotonic data sets. Then, we design a decision tree algorithm (REMT) based on rank mutual information. The theoretic and experimental analysis shows that the proposed algorithm can get monotonically consistent decision trees, if training samples are monotonically consistent. Its performance is still good when data are contaminated with noise.
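A hedged sketch of rank (dominance-based) entropy and the rank mutual information derived from it follows; the exact definitions used in the paper may differ in details such as the dominance direction, and the ordinal toy data are illustrative.

```python
import math

def dominance_set(values, i, attrs):
    """Indices of objects whose values on all `attrs` are >= those of object i."""
    return {j for j in range(len(values))
            if all(values[j][a] >= values[i][a] for a in attrs)}

def rank_entropy(values, attrs):
    """Entropy built from the sizes of dominance sets instead of equivalence classes."""
    n = len(values)
    return -sum(math.log(len(dominance_set(values, i, attrs)) / n)
                for i in range(n)) / n

def rank_mutual_information(values, a_attrs, d_attrs):
    return (rank_entropy(values, a_attrs) + rank_entropy(values, d_attrs)
            - rank_entropy(values, a_attrs + d_attrs))

# columns: feature 0, feature 1, decision (all ordinal)
data = [(1, 2, 1), (2, 2, 1), (2, 3, 2), (3, 3, 2), (3, 4, 3)]
print(rank_mutual_information(data, a_attrs=[0], d_attrs=[2]))
print(rank_mutual_information(data, a_attrs=[1], d_attrs=[2]))
```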

145 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the rough decision entropy measure and the interval approximation roughness measure are effective and valid for evaluating the uncertainty of interval-valued decision systems.
Abstract: Uncertainty measures can supply new points of view for analyzing data and help us to disclose the substantive characteristics of data sets. Some uncertainty measures for single-valued information systems or single-valued decision systems have been developed. However, there are few studies on the uncertainty measurement for interval-valued information systems or interval-valued decision systems. This paper addresses the uncertainty measurement problem in interval-valued decision systems. An extended conditional entropy is proposed in interval-valued decision systems based on the possible degree between interval values. Consequently, a concept called rough decision entropy is introduced to evaluate the uncertainty of an interval-valued decision system. In addition, the original approximation accuracy measure proposed by Pawlak is extended to deal with interval-valued decision systems and the concept of interval approximation roughness is presented. Experimental results demonstrate that the rough decision entropy measure and the interval approximation roughness measure are effective and valid for evaluating the uncertainty of interval-valued decision systems. Experimental results also indicate that the rough decision entropy measure outperforms the interval approximation roughness measure.
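One common way to formalize the "possible degree" that one interval is at least another, which interval-valued entropies of this kind build on, is sketched below; the paper's exact formula may differ.

```python
def possible_degree(a, b):
    """a, b are (low, high) pairs; returns P(a >= b) in [0, 1]."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    span = (a_hi - a_lo) + (b_hi - b_lo)
    if span == 0:                        # both intervals are degenerate points
        return 1.0 if a_lo >= b_lo else 0.0
    return min(max((a_hi - b_lo) / span, 0.0), 1.0)

print(possible_degree((1, 3), (2, 4)))   # partial overlap, a tends lower -> 0.25
print(possible_degree((5, 6), (1, 2)))   # a clearly greater -> 1.0
```

Note that possible_degree(a, b) + possible_degree(b, a) = 1, which is what makes it usable as a graded comparison inside a conditional entropy.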

142 citations


Journal ArticleDOI
TL;DR: A summarization and extension of the results obtained since 2003, when investigations on foundations of approximation of partially defined concepts began, are presented, including examples of rough set-based strategies for extending approximation spaces from samples of objects onto a whole universe of objects.

138 citations


Journal ArticleDOI
Qinghua Hu, Lei Zhang, Shuang An, David Zhang, Daren Yu
TL;DR: This paper reveals why the classical fuzzy rough set model is sensitive to noise and how noisy samples influence fuzzy rough computation, and introduces several new robust models.
Abstract: Rough sets, especially fuzzy rough sets, are supposedly a powerful mathematical tool to deal with uncertainty in data analysis. This theory has been applied to feature selection, dimensionality reduction, and rule learning. However, it is pointed out that the classical model of fuzzy rough sets is sensitive to noisy information, which is considered as a main source of uncertainty in applications. This disadvantage limits the applicability of fuzzy rough sets. In this paper, we reveal why the classical fuzzy rough set model is sensitive to noise and how noisy samples impose influence on fuzzy rough computation. Based on this discussion, we study the properties of some current fuzzy rough models in dealing with noisy data and introduce several new robust models. The properties of the proposed models are also discussed. Finally, a robust classification algorithm is designed based on fuzzy lower approximations. Some numerical experiments are given to illustrate the effectiveness of the models. The classifiers that are developed with the proposed models achieve good generalization performance.
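The noise sensitivity can be seen directly in the lower-approximation formula: it takes an infimum over all objects, so a single mislabeled close neighbour collapses the membership value. The sketch below contrasts the classical infimum with one simple robust variant (taking the k-th smallest value); it is illustrative only and is not necessarily one of the models proposed in the paper.

```python
def fuzzy_lower(x_idx, similarity, concept, k=1):
    """Kleene-Dienes style lower approximation; k=1 is the classical infimum."""
    values = sorted(max(1.0 - similarity[x_idx][y], concept[y])
                    for y in range(len(concept)))
    return values[k - 1]

# toy similarity matrix R(x, y) and fuzzy membership of the concept A(y)
R = [[1.0, 0.9, 0.2],
     [0.9, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
A = [1.0, 0.1, 1.0]        # object 1 looks like a noisy sample inside the concept

print(fuzzy_lower(0, R, A, k=1))   # classical: dragged down to 0.1 by object 1
print(fuzzy_lower(0, R, A, k=2))   # robust variant ignores the single outlier
```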

Journal ArticleDOI
TL;DR: By introducing a new notion of complementary neighborhood, some types of neighborhood-related covering rough sets are considered, two of which are defined here for the first time, and some basic properties of the complementary neighborhood are shown.

Journal ArticleDOI
TL;DR: A new definition of intuitionistic fuzzy rough sets is given, with an analysis of its basic properties, based on the notion of two universes, general binary relations, and a pair (T, I) of an intuitionistic fuzzy t-norm T and an intuitionistic fuzzy implicator I.

01 Jan 2012
TL;DR: The study introduces an idea of granular models – generalizations of numeric models that are formed as a result of an optimal allocation (distribution) of information granularity.
Abstract: The highly diversified conceptual and algorithmic landscape of Granular Computing calls for the formation of sound fundamentals of the discipline, which cut across the diversity of formal frameworks (fuzzy sets, sets, rough sets) in which information granules are formed and processed. The study addresses this quest by introducing an idea of granular models – generalizations of numeric models that are formed as a result of an optimal allocation (distribution) of information granularity. Information granularity is regarded as a crucial design asset, which helps establish a better rapport of the resulting granular model with the system under modeling. A suite of modeling situations is elaborated on; they offer convincing examples behind the emergence of granular models. Pertinent problems showing how information granularity is distributed throughout the parameters of numeric functions (and resulting in granular mappings) are formulated as optimization tasks. A set of associated information granularity distribution protocols is discussed. We also provide a number of illustrative examples.
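As a toy illustration of distributing a granularity budget over the parameters of a numeric model, the sketch below turns each parameter of a linear mapping into an interval and propagates the intervals through the model; the uniform allocation protocol, the model and the data are assumptions, not the paper's optimization.

```python
def interval_output(x, params, allocations):
    """Interval prediction of a linear model whose parameters carry granularity."""
    lo = hi = 0.0
    for xi, p, e in zip(x + [1.0], params, allocations):   # last term is the bias
        p_lo, p_hi = p - e * abs(p), p + e * abs(p)        # parameter becomes an interval
        lo += min(p_lo * xi, p_hi * xi)
        hi += max(p_lo * xi, p_hi * xi)
    return lo, hi

params = [2.0, -1.0, 0.5]                        # w1, w2, bias of a numeric model
budget = 0.3
uniform = [budget / len(params)] * len(params)   # one simple allocation protocol
print(interval_output([1.0, 2.0], params, uniform))
```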

Journal ArticleDOI
TL;DR: The backtrack algorithm is efficient on reasonably sized datasets, the weighting mechanism for the heuristic information is effective, and the competition approach can improve the quality of the result significantly.

Journal ArticleDOI
TL;DR: Three types of definitions of lower and upper approximations and the corresponding uncertainty measurement concepts, including accuracy, roughness and approximation accuracy, are investigated, and theoretical analysis indicates that two of the three types can be used to evaluate the uncertainty in incomplete information systems.

Journal ArticleDOI
TL;DR: Experimental results from applying the proposed method to fault diagnosis of a gearbox and of gasoline engine valve trains show that the method can extract faulty features with better classification ability while eliminating many redundant features without loss of classification accuracy, thereby improving classifier efficiency and achieving better classification performance.

Journal ArticleDOI
TL;DR: Some fundamental properties of the multigranulation rough set model are considered, and it is shown that both the collection of lower definable sets and that of upper definable sets can form a lattice, but such lattices are not distributive, not complemented and pseudo-complemented in the general case.
Abstract: The original rough set model, i.e., Pawlak's single-granulation rough set model, has been extended to a multigranulation rough set model, where two kinds of multigranulation approximations, i.e., the optimistic and pessimistic approximations, were introduced. In this paper, we consider some fundamental properties of the multigranulation rough set model, and show that (i) both the collection of lower definable sets and that of upper definable sets in the optimistic multigranulation rough set model can form a lattice; such lattices are not distributive, not complemented and pseudo-complemented in the general case, and the collection of definable sets in the optimistic multigranulation rough set model does not even form a lattice in general conditions; (ii) the collection of (lower, upper) definable sets in the optimistic multigranulation rough set model forms a topology on the universe if and only if the optimistic multigranulation rough set model is equivalent to Pawlak's single-granulation rough set model; (iii) in the context of the pessimistic multigranulation rough set model, the collections of three different kinds of definable sets coincide with each other, and they determine a clopen topology on the universe; furthermore, they form a Boolean algebra under the usual set-theoretic operations.
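The optimistic and pessimistic lower approximations discussed here differ only in the quantifier over granulations, as the following toy sketch shows (the partitions and the target set are illustrative assumptions).

```python
def eq_class(partition, x):
    """The block of the partition containing x."""
    return next(block for block in partition if x in block)

def mgrs_lower(partitions, universe, target, optimistic=True):
    """Optimistic: some granulation's class fits inside the target; pessimistic: every one does."""
    quantifier = any if optimistic else all
    return {x for x in universe
            if quantifier(eq_class(p, x) <= target for p in partitions)}

U = {1, 2, 3, 4}
P1 = [{1, 2}, {3, 4}]          # granulation induced by one attribute set
P2 = [{1}, {2, 3}, {4}]        # granulation induced by another
X = {1, 2, 3}
print(mgrs_lower([P1, P2], U, X, optimistic=True))    # {1, 2, 3}
print(mgrs_lower([P1, P2], U, X, optimistic=False))   # {1, 2}
```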

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method effectively updates approximations of a concept in practice, and a comparison of the proposed incremental method with a nonincremental method of dynamic maintenance of rough set approximation is conducted.
Abstract: Approximations of a concept in rough set theory induce rules and need to be updated for dynamic data mining and related tasks. Most existing incremental methods based on the classical rough set model can only be used to deal with categorical data. This paper presents a new dynamic method for incrementally updating approximations of a concept under neighborhood rough sets to deal with numerical data. A comparison of the proposed incremental method with a nonincremental method of dynamic maintenance of rough set approximations is conducted by an extensive experimental evaluation on different data sets from UCI. Experimental results show that the proposed method effectively updates approximations of a concept in practice.
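For orientation, a minimal non-incremental sketch of the neighborhood approximations being maintained is given below; the distance threshold delta and the toy data are assumptions, and the paper's incremental update rules are not reproduced.

```python
import math

def neighbourhood(i, data, delta):
    """All objects within distance delta of object i (including i itself)."""
    return {j for j in range(len(data)) if math.dist(data[i], data[j]) <= delta}

def lower_approx(data, labels, target_label, delta):
    """Objects whose whole delta-neighbourhood carries the target label."""
    return {i for i in range(len(data))
            if all(labels[j] == target_label for j in neighbourhood(i, data, delta))}

data = [[0.10], [0.12], [0.50], [0.52], [0.90]]
labels = ["a", "a", "b", "b", "a"]
print(lower_approx(data, labels, "a", delta=0.05))   # {0, 1, 4}
```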

Journal ArticleDOI
TL;DR: Compared with several representative reducts, the proposed reduction method in incomplete decision systems can provide a mathematical quantitative measure of knowledge uncertainty and is indeed efficient, and outperforms other available approaches for feature selection from incomplete and complete data sets.
Abstract: Feature selection in large, incomplete decision systems is a challenging problem. To avoid exponential computation in exhaustive feature selection methods, many heuristic feature selection algorithms have been presented in rough set theory. However, these algorithms are still time-consuming to compute. It is therefore necessary to investigate effective and efficient heuristic algorithms. In this paper, rough entropy-based uncertainty measures are introduced to evaluate the roughness and accuracy of knowledge. Moreover, some of their properties are derived and the relationships among these measures are established. Furthermore, compared with several representative reducts, the proposed reduction method in incomplete decision systems can provide a mathematical quantitative measure of knowledge uncertainty. Then, a heuristic algorithm with low computational complexity is constructed to improve computational efficiency of feature selection in incomplete decision systems. Experimental results show that the proposed method is indeed efficient, and outperforms other available approaches for feature selection from incomplete and complete data sets.
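Incomplete tables are usually handled with a tolerance relation in which a missing value matches anything; the sketch below shows the tolerance classes that rough entropy-based measures of this kind would be computed from (the paper's specific entropy formulas are not reproduced here, and the data are illustrative).

```python
def tolerant(u, v):
    """Two objects are tolerant when they agree wherever neither value is missing (None)."""
    return all(a is None or b is None or a == b for a, b in zip(u, v))

def tolerance_class(i, data):
    return {j for j in range(len(data)) if tolerant(data[i], data[j])}

data = [(1, None, 0), (1, 2, 0), (0, 2, None), (1, 2, 1)]
for i in range(len(data)):
    print(i, tolerance_class(i, data))
```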

Journal ArticleDOI
TL;DR: An extensive experimental evaluation shows that the proposed parallel method for computing rough set approximations based on the MapReduce technique is effective for data mining.

BookDOI
03 Jul 2012
TL;DR: This book constitutes the refereed proceedings of the 8th International Conference on Rough Sets and Current Trends in Computing, RSCTC, held in Chengdu, China, in August 2012, as one of the co-located conferences of the 2012 Joint Rough Set Symposium, JRS 2012.
Abstract: This book constitutes the refereed proceedings of the 8th International Conference on Rough Sets and Current Trends in Computing, RSCTC, held in Chengdu, China, in August 2012, as one of the co-located conferences of the 2012 Joint Rough Set Symposium, JRS 2012. The 55 revised full papers presented together with one keynote paper were carefully reviewed and selected from numerous submissions. The papers are organized in topical sections on rough sets and its applications; current trends in computing; decision-theoretic rough set model and applications; formal concept analysis and granular computing; mining complex data with granular computing; data mining competition.

Journal ArticleDOI
TL;DR: This paper investigates conditions on a covering under which some of the common properties of classical lower and upper approximation operations hold for the fourth type of covering-based lower and upper approximation operations.

Journal ArticleDOI
TL;DR: A hybrid evolutionary algorithm for data reduction, using both instance and feature selection, is presented; it obtains high reduction rates on training sets, which greatly enhances the behavior of the nearest neighbor classifier.

Journal ArticleDOI
TL;DR: The results show that market-based information does provide valuable information in credit rating predictions and the proposed approach provides better classification results and generates meaningful rules for credit ratings.
Abstract: In current credit rating models, various kinds of accounting-based information are usually selected as prediction variables, reflecting historical information rather than the market's assessment of the future. In this study, we propose a credit rating prediction model that uses market-based information as a predictive variable. In the proposed method, Moody's KMV (KMV) is employed as a tool to evaluate the market-based information of each corporation. To verify the proposed method, a hybrid model that combines random forests (RF) and rough set theory (RST) is used to extract useful information for credit rating. The results show that market-based information does provide valuable information in credit rating predictions. Moreover, the proposed approach provides better classification results and generates meaningful rules for credit ratings.

Journal ArticleDOI
TL;DR: In this article, sample pair selection with rough set is proposed in order to compress the discernibility function of a decision table so that only minimal elements in the discriminibility matrix are employed to find reducts.
Abstract: Attribute reduction is the strongest and most characteristic result in rough set theory, distinguishing it from other theories. In the framework of rough sets, the approach of discernibility matrix and function is the theoretical foundation of finding reducts. In this paper, sample pair selection with rough sets is proposed in order to compress the discernibility function of a decision table so that only minimal elements in the discernibility matrix are employed to find reducts. First, the relative discernibility relation of a condition attribute is defined; indispensable and dispensable condition attributes are characterized by their relative discernibility relations, and a key sample pair set is defined for every condition attribute. With the key sample pair sets, all the sample pair selections can be found. Algorithms for computing one sample pair selection and for finding reducts are also developed; comparisons with other methods of finding reducts are performed in several experiments, which imply that sample pair selection is effective as a preprocessing step for finding reducts.

Journal ArticleDOI
TL;DR: A generalisation of the classical property-oriented and object-oriented concept lattices to a fuzzy environment based on the philosophy of the multi-adjoint paradigm, in which different adjoint triples can be used on sets that need not be linearly ordered, together with the corresponding representation (fundamental) theorems.

Journal ArticleDOI
01 Apr 2012
TL;DR: By considering the levels of tolerance for errors and the cost of actions in real decision procedure, a new two-stage approach is proposed to solve the multiple-category classification problems with Decision-Theoretic Rough Sets (DTRS).
Abstract: By considering the levels of tolerance for errors and the cost of actions in real decision procedures, a new two-stage approach is proposed to solve multiple-category classification problems with Decision-Theoretic Rough Sets (DTRS). The first stage is to convert an m-category classification problem (m > 2) into m two-category classification problems, and to form three types of decision regions: the positive region, boundary region and negative region, with different states and actions, by using DTRS. The positive region makes a decision of acceptance, the negative region makes a decision of rejection, and the boundary region makes a decision of abstaining. The second stage is to choose the best candidate classification in the positive region by using the minimum probability error criterion with a Bayesian discriminant analysis approach. A case study of medical diagnosis demonstrates the proposed method.
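The three decision regions come from two thresholds derived from the six loss values of decision-theoretic rough sets; the sketch below uses the standard threshold formulas with illustrative costs, not the paper's case-study values.

```python
def dtrs_thresholds(l_pp, l_bp, l_np, l_nn, l_bn, l_pn):
    """Standard DTRS thresholds from the losses of accepting (P), abstaining (B)
    and rejecting (N) an object that is in (..P) or not in (..N) the concept."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta

def three_way_decision(prob, alpha, beta):
    if prob >= alpha:
        return "accept (positive region)"
    if prob <= beta:
        return "reject (negative region)"
    return "abstain (boundary region)"

alpha, beta = dtrs_thresholds(l_pp=0, l_bp=2, l_np=6, l_nn=0, l_bn=1, l_pn=4)
print(alpha, beta)                       # 0.6 and 0.2 with these illustrative costs
print(three_way_decision(0.7, alpha, beta))
print(three_way_decision(0.4, alpha, beta))
```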

Journal ArticleDOI
TL;DR: This paper uses rough set to determine the weight of each single prediction method and utilizes Dempster-Shafer evidence theory method as the combination method and finds that the performance of the proposed method is superior to those of single classifier and other multiple classifiers.
Abstract: It is critical to build an effective prediction model to improve the accuracy of financial distress prediction. Some existing literatures have demonstrated that single classifier has limitations and combination of multiple prediction methods has advantages in financial distress prediction. In this paper, we extend the research of multiple predictions to integrate with rough set and Dempster-Shafer evidence theory. We use rough set to determine the weight of each single prediction method and utilize Dempster-Shafer evidence theory method as the combination method. We discuss the research process for the financial distress prediction based on the proposed method. Finally, we provide an empirical experiment with Chinese listed companies' real data to demonstrate the accuracy of the proposed method. We find that the performance of the proposed method is superior to those of single classifier and other multiple classifiers.
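The combination step can be illustrated with Dempster's rule for two mass functions over the frame {distress, healthy}; the rough-set-derived weights described in the paper would enter through how each classifier's masses are constructed, and the mass values below are illustrative.

```python
def combine(m1, m2):
    """Dempster's rule: m1, m2 map frozenset hypotheses to masses summing to 1."""
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc              # mass assigned to contradictory pairs
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

distress, healthy = frozenset({"distress"}), frozenset({"healthy"})
either = distress | healthy
m_classifier1 = {distress: 0.6, healthy: 0.1, either: 0.3}
m_classifier2 = {distress: 0.5, healthy: 0.2, either: 0.3}
print(combine(m_classifier1, m_classifier2))
```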

Journal ArticleDOI
TL;DR: The uncertainty measures of knowledge granularity and rough entropy for probabilistic rough sets over two universes are discussed by means of the proposed concept, and the general Shannon entropy of a covering of the universe is defined.