
Showing papers on "Rough set published in 2015"


Journal ArticleDOI
TL;DR: A heart disease diagnosis system using rough sets based attribute reduction and an interval type-2 fuzzy logic system (IT2FLS) to handle the challenges of high-dimensional datasets and uncertainty; the proposed system is effective, yielding fewer features and higher accuracy.
Abstract: Rough sets and a firefly algorithm are combined to find optimal attribute reductions. An interval type-2 fuzzy logic system is used to predict heart disease. The proposed system is effective, yielding fewer features and higher accuracy. This paper proposes a heart disease diagnosis system using rough sets based attribute reduction and an interval type-2 fuzzy logic system (IT2FLS). The integration of rough sets based attribute reduction with IT2FLS aims to handle the challenges of high-dimensional datasets and uncertainty. IT2FLS utilizes a hybrid learning process comprising the fuzzy c-means clustering algorithm and parameter tuning by chaos firefly and genetic hybrid algorithms. This learning process is computationally expensive, especially when applied to high-dimensional datasets. Rough sets based attribute reduction using a chaos firefly algorithm is investigated to find an optimal reduct, which reduces the computational burden and enhances the performance of IT2FLS. Experimental results demonstrate a significant advantage of the proposed system over other machine learning methods, namely naive Bayes, support vector machines, and artificial neural networks. The proposed model is thus useful as a decision support system for heart disease diagnosis.

172 citations
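The rough-set reduction step in this system rests on the dependency degree γ(B, D): the fraction of objects whose decision class is uniquely determined by the condition attributes B, which is the usual fitness driving firefly- or swarm-based reduct search. Below is a minimal sketch of that computation; the attribute names and toy records are illustrative, not taken from the paper's dataset.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group objects into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(table, cond_attrs, dec_attr):
    """gamma(B, D): size of the positive region divided by the number of objects."""
    decisions = [row[dec_attr] for row in table]
    pos = 0
    for block in partition(table, cond_attrs):
        if len({decisions[i] for i in block}) == 1:   # block is decision-consistent
            pos += len(block)
    return pos / len(table)

# toy decision table: two condition attributes and a decision (illustrative only)
patients = [
    {"chest_pain": 1, "age_band": "old",   "disease": "yes"},
    {"chest_pain": 1, "age_band": "old",   "disease": "yes"},
    {"chest_pain": 0, "age_band": "young", "disease": "no"},
    {"chest_pain": 0, "age_band": "old",   "disease": "yes"},
]
print(dependency(patients, ["chest_pain"], "disease"))              # 0.5
print(dependency(patients, ["chest_pain", "age_band"], "disease"))  # 1.0
```

A subset B is a reduct candidate when γ(B, D) matches the dependency of the full attribute set; the chaos firefly search over such subsets is not reproduced here.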


Journal ArticleDOI
TL;DR: This work proposes a naive model of intuitionistic fuzzy decision-theoretic rough sets (IFDTRSs), elaborates its relevant properties, and designs an algorithm for deriving three-way decisions in multi-period decision making.

171 citations


Journal ArticleDOI
01 Apr 2015
TL;DR: An algorithm is designed to improve the consistency of multi-attribute group decision making under linguistic assessment, and the proposed model of three-way decisions with linguistic assessment is applied to the selection process of new product ideas.
Abstract: Highlights: We provide a method for determining the two types of parameters used in the DTRS. The application of DTRS is extended to scenarios of qualitative evaluation. An algorithm is designed to improve the consistency of multi-attribute group decision making under linguistic assessment. Based on the decision-theoretic rough set model of three-way decisions, we augment the existing model by introducing linguistic terms. Considering the two types of parameters used in three-way decisions with linguistic assessment, a novel type of three-way decisions based on the Bayesian decision procedure is constructed. In this way, three-way decisions with decision-theoretic rough sets are extended to the qualitative environment. With the aid of multi-attribute group decision making, the values of these parameters are determined. An adaptive algorithm supporting consistency improvement of multi-attribute group decision making is designed. Then, we optimize the scales of the linguistic terms using particle swarm optimization. The values of these parameters of three-way decisions are aggregated when proceeding with group decision making. Finally, the proposed model of three-way decisions with linguistic assessment is applied to the selection process of new product ideas.

154 citations
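Once the loss-derived thresholds α > β are fixed, the Bayesian decision procedure underlying three-way decisions reduces to comparing each alternative's conditional probability against them. Below is a minimal sketch of that trisection; the threshold values and product-idea probabilities are illustrative only, not the paper's linguistic-assessment machinery.

```python
def three_way_decision(prob, alpha=0.7, beta=0.3):
    """Assign an object to the positive, boundary, or negative region from its
    conditional probability Pr(X | [x]) and the thresholds alpha > beta."""
    if prob >= alpha:
        return "accept"   # positive region: adopt the alternative
    if prob <= beta:
        return "reject"   # negative region: discard it
    return "defer"        # boundary region: gather more information

ideas = {"idea_A": 0.82, "idea_B": 0.55, "idea_C": 0.10}
for name, p in ideas.items():
    print(name, three_way_decision(p))   # accept, defer, reject
```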


Journal ArticleDOI
TL;DR: This paper presents an approach for dynamic maintenance of approximations w.r.t. objects and attributes added simultaneously under the framework of decision-theoretic rough sets (DTRS), using equivalence feature vectors and matrices; extensive experimental results verify the effectiveness of the proposed methods.
Abstract: Uncertainty and fuzziness generally exist in real-life data. In rough set theory, approximations are employed to describe uncertain information approximately. Certain and uncertain rules are induced directly from the different regions partitioned by approximations. Approximations can further be applied to data-mining-related tasks, e.g., attribute reduction. Nowadays, different types of data collected from different applications evolve with time; in particular, new attributes may appear while new objects are added. This paper presents an approach for dynamic maintenance of approximations w.r.t. objects and attributes added simultaneously under the framework of decision-theoretic rough sets (DTRS). Equivalence feature vectors and matrices are defined first to update the approximations of DTRS at different levels of granularity. Then, the information system is decomposed into subspaces, and the equivalence feature matrix is updated in different subspaces incrementally. Finally, the approximations of DTRS are renewed during the process of updating the equivalence feature matrix. Extensive experimental results verify the effectiveness of the proposed methods.

147 citations


Journal ArticleDOI
TL;DR: Through the use of the accelerator, three representative heuristic fuzzy-rough feature selection algorithms have been enhanced and it is shown that these modified algorithms are much faster than their original counterparts.

125 citations


Journal ArticleDOI
Yiyu Yao
TL;DR: It is argued that an oversight of conceptual formulations makes an in-depth understanding of rough set theory very difficult, and it is essential to pay equal, if not more, attention to conceptual formulations.
Abstract: There exist two formulations of the theory of rough sets. A conceptual formulation emphasizes the meaning and interpretation of the concepts and notions of the theory, whereas a computational formulation focuses on procedures and algorithms for constructing these notions. Except for a few earlier studies, computational formulations dominate research in rough sets. In this paper, we argue that an oversight of conceptual formulations makes an in-depth understanding of rough set theory very difficult. The conceptual and computational formulations are two sides of the same coin; it is essential to pay equal, if not more, attention to conceptual formulations. As a demonstration, we examine and compare conceptual and computational formulations of two fundamental concepts of rough sets, namely, approximations and reducts.

124 citations
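For concreteness, the computational formulation of approximations that the paper contrasts with the conceptual one can be stated in a few lines: build the equivalence classes of the indiscernibility relation, then collect the classes contained in the target set (lower approximation) and those overlapping it (upper approximation). A minimal sketch with a toy universe:

```python
from collections import defaultdict

def approximations(universe, describe, target):
    """Lower/upper approximation of target under the indiscernibility relation
    induced by describe (a function mapping an object to its description)."""
    classes = defaultdict(set)
    for x in universe:
        classes[describe(x)].add(x)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # entirely inside the target set
            lower |= block
        if block & target:    # overlaps the target set
            upper |= block
    return lower, upper

U = {1, 2, 3, 4, 5, 6}
desc = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c", 6: "c"}
print(approximations(U, desc.__getitem__, {1, 2, 3}))   # lower {1, 2}, upper {1, 2, 3, 4}
```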


Journal ArticleDOI
TL;DR: A two-grade fusion approach involving evidence theory and multigranulation rough set theory is proposed; it is based on a well-defined distance function among granulation structures, will be useful for pooling uncertain data from different sources, and is significant for establishing a new direction of granular computing.

115 citations


Journal ArticleDOI
TL;DR: A supervised feature selection method based on Rough Set Quick Reduct hybridized with the Improved Harmony Search algorithm is presented to deal with issues of high dimensionality in medical datasets.
Abstract: Feature selection is the process of selecting optimal features that produce the most prognostic outcome. It is one of the essential steps in knowledge discovery. The difficulty is that not all features are important: many may be redundant, and the rest may be irrelevant and noisy. This paper presents a novel feature selection approach to deal with issues of high dimensionality in medical datasets. Medical datasets are typically characterized by a large number of measurements and a comparatively small number of patient records, and most of these measurements are irrelevant or noisy. This paper proposes a supervised feature selection method based on Rough Set Quick Reduct hybridized with the Improved Harmony Search algorithm. Rough set theory is one of the most successful methods used for feature selection. The Rough Set Improved Harmony Search Quick Reduct (RS-IHS-QR) algorithm builds on harmony search, a relatively new population-based meta-heuristic optimization algorithm that imitates the music improvisation process, where each musician improvises the pitch of their instrument in search of a perfect state of harmony. The quality of the reduced data is measured by classification performance. The proposed algorithm is experimentally compared with the existing algorithms Rough Set Quick Reduct (RS-QR) and Rough Set Particle Swarm Optimization Quick Reduct (RS-PSO-QR). The number of features selected by the proposed method is comparatively low. The proposed algorithm achieves more than 90 % classification accuracy in most cases, and the time taken to reduce the dataset is also lower than that of the existing methods. The experimental results demonstrate the efficiency and effectiveness of the proposed algorithm.

112 citations
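The Quick Reduct core that the harmony-search hybrid starts from is a greedy forward search: beginning with the empty set, repeatedly add the attribute giving the largest gain in rough-set dependency until the dependency of the full attribute set is matched. A minimal sketch under that reading; the toy table and attribute names are illustrative.

```python
from collections import defaultdict

def dependency(rows, cond, dec):
    """gamma(cond, dec): fraction of rows whose decision is determined by cond."""
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[a] for a in cond)].append(r[dec])
    return sum(len(v) for v in blocks.values() if len(set(v)) == 1) / len(rows)

def quick_reduct(rows, cond_attrs, dec):
    """Greedy forward selection: keep adding the attribute with the best gamma
    gain until the reduct reaches the dependency of the full attribute set."""
    full = dependency(rows, cond_attrs, dec)
    reduct, current = [], 0.0
    while current < full:
        best_attr, best_gain = None, current
        for a in cond_attrs:
            if a in reduct:
                continue
            g = dependency(rows, reduct + [a], dec)
            if g > best_gain:
                best_attr, best_gain = a, g
        if best_attr is None:   # no single attribute improves gamma; stop
            break
        reduct.append(best_attr)
        current = best_gain
    return reduct

rows = [
    {"f1": 0, "f2": 1, "f3": 0, "cls": "a"},
    {"f1": 0, "f2": 0, "f3": 1, "cls": "b"},
    {"f1": 1, "f2": 1, "f3": 0, "cls": "a"},
    {"f1": 1, "f2": 0, "f3": 1, "cls": "b"},
]
print(quick_reduct(rows, ["f1", "f2", "f3"], "cls"))   # ['f2']
```

The harmony-search and PSO hybrids replace this greedy loop with a population-based search over subsets scored by the same dependency measure; that search itself is not reproduced here.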


Journal ArticleDOI
TL;DR: The aim of this paper is to introduce the notion of a rough soft hemiring, which is an extended notion of both a rough hemiring and a soft hemiring, and to study roughness in soft hemirings with respect to Pawlak approximation spaces.
Abstract: The aim of this paper is to introduce the notion of a rough soft hemiring, which is an extended notion of both a rough hemiring and a soft hemiring. We study roughness in soft hemirings with respect to Pawlak approximation spaces. Some new rough soft operations are explored. In particular, lower and upper rough soft hemirings and k-idealistic, h-idealistic, and strong h-idealistic soft hemirings are investigated. Finally, an important result on an upper strong h-idealistic rough soft hemiring via Bourne's congruence relations is obtained.

110 citations


Journal ArticleDOI
TL;DR: This paper presents a systematic study of the rough set-based discretization techniques found in the literature and categorizes them into a taxonomy that provides a useful roadmap for new researchers in the area of RSBD.
Abstract: The extraction of knowledge from a huge volume of data using rough set methods requires the transformation of continuous value attributes to discrete intervals. This paper presents a systematic study of the rough set-based discretization (RSBD) techniques found in the literature and categorizes them into a taxonomy. In the literature, no review is solely based on RSBD. Only a few rough set discretizers have been studied, while many new developments have been overlooked and need to be highlighted. Therefore, this study presents a formal taxonomy that provides a useful roadmap for new researchers in the area of RSBD. The review also elaborates the process of RSBD with the help of a case study. The study of the existing literature focuses on the techniques adapted in each article, the comparison of these with other similar approaches, the number of discrete intervals they produce as output, their effects on classification and the application of these techniques in a domain. The techniques adopted in each article have been considered as the foundation for the taxonomy. Moreover, a detailed analysis of the existing discretization techniques has been conducted while keeping the concept of RSBD applications in mind. The findings are summarized and presented in this paper.

109 citations


Journal ArticleDOI
TL;DR: It is proved that the union and intersection operations of rough fuzzy approximation pairs are closed and a bounded distributive lattice can be constructed.

Journal ArticleDOI
TL;DR: The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes that has been extracted from the clinical dataset.
Abstract: The availability of clinical datasets and knowledge mining methodologies encourages researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from a minimal set of attributes extracted from the clinical dataset. In this work, a rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is the handling of missing values to obtain a smooth dataset and the selection of appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with the hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.

Journal ArticleDOI
TL;DR: This paper proposes a novel rough set based method for feature selection using the fish swarm algorithm, which can provide an efficient tool for finding a minimal subset of the features without information loss.
Abstract: Rough set theory is one of the effective methods for feature selection; it can preserve the characteristics of the original features by deleting redundant information. The main idea of the rough set approach to feature selection is to find a globally minimal reduct, the smallest set of features that keeps the important information of the original set of features. Rough set theory has been used as a dataset preprocessor with much success, but current approaches to feature selection are inadequate for finding a globally minimal reduct. In this paper, we propose a novel rough set based method for feature selection using the fish swarm algorithm. The fish swarm algorithm is a new intelligent swarm modeling approach that consists primarily of searching, swarming, and following behaviors. It is attractive for feature selection since fish swarms can discover the best combination of features as they swim within the subset space. In our proposed algorithm, a minimal subset can be located and verified. To show the efficiency of our algorithm, we carry out numerical experiments based on some standard UCI datasets. The results demonstrate that our algorithm provides an efficient tool for finding a minimal subset of the features without information loss.

Journal ArticleDOI
TL;DR: Three different parallel matrix-based methods are introduced to process large-scale, incomplete data; all are built on MapReduce and implemented on Twister, a lightweight MapReduce runtime system.
Abstract: As the volume of data grows at an unprecedented rate, large-scale data mining and knowledge discovery present a tremendous challenge. Rough set theory, which has been used successfully in solving problems in pattern recognition, machine learning, and data mining, centers around the idea that a set of distinct objects may be approximated via a lower and an upper bound. In order to obtain the benefits that rough sets can provide for data mining and related tasks, efficient computation of these approximations is vital. The recently introduced cloud computing model, MapReduce, has gained a lot of attention from the scientific community for its applicability to large-scale data analysis. In previous research, we proposed a MapReduce-based method for computing approximations in parallel, which can efficiently process complete data but fails in the case of missing (incomplete) data. To address this shortcoming, three different parallel matrix-based methods are introduced to process large-scale, incomplete data. All of them are built on MapReduce and implemented on Twister, a lightweight MapReduce runtime system. The proposed parallel methods are then experimentally shown to be efficient for processing large-scale data.
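The parallel idea can be sketched even without a cluster: a map phase emits (condition description, object id) pairs for each data split, and a reduce phase merges them into global equivalence classes from which the approximations follow. A minimal single-machine sketch of that flow; the Twister/MapReduce machinery and the paper's incomplete-data handling are omitted.

```python
from collections import defaultdict

def map_phase(split, cond_attrs):
    """Map: emit (condition description, object id) pairs for one data split."""
    return [(tuple(row[a] for a in cond_attrs), oid) for oid, row in split]

def reduce_phase(all_pairs):
    """Reduce: merge emitted pairs into global equivalence classes."""
    classes = defaultdict(set)
    for key, oid in all_pairs:
        classes[key].add(oid)
    return classes

def approximations(classes, target):
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# two "splits" standing in for distributed partitions of the data
split1 = [(0, {"a": 1, "b": 0}), (1, {"a": 1, "b": 0})]
split2 = [(2, {"a": 0, "b": 1}), (3, {"a": 1, "b": 0})]
pairs = map_phase(split1, ["a", "b"]) + map_phase(split2, ["a", "b"])
print(approximations(reduce_phase(pairs), target={0, 1, 2}))  # lower {2}, upper {0, 1, 2, 3}
```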

Journal ArticleDOI
TL;DR: This paper proposes three basic uncertainty measures and three expected granularity-based uncertainty measures; the monotonicity of these measures is proved to hold, and the relationship between these measures and the corresponding uncertainty measures in the classical rough set model is obtained.

Journal ArticleDOI
TL;DR: Three state-of-the-art methods used in the remote sensing literature are analyzed for comparison and the results point to the superiority of the proposed rough-set-based supervised technique, especially when a small number of bands are to be selected.
Abstract: Band selection is a well-known approach to reduce the dimensionality of hyperspectral imagery. Rough set theory is a paradigm to deal with uncertainty, vagueness, and incompleteness of data. Although it has been applied successfully to feature selection in different application domains, it is seldom used for the analysis of the hyperspectral imagery. In this paper, a rough-set-based supervised method is proposed to select informative bands from hyperspectral imagery. The proposed technique exploits rough set theory to compute the relevance and significance of each spectral band. Then, by defining a novel criterion, it selects the informative bands that have higher relevance and significance values. To assess the effectiveness of the proposed band selection technique, three state-of-the-art methods (one supervised and two unsupervised) used in the remote sensing literature are analyzed for comparison on three hyperspectral data sets. The results of this comparison point to the superiority of the proposed technique, especially when a small number of bands are to be selected.
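The relevance/significance idea can be approximated as follows: the relevance of a band is the dependency of the class label on that band alone, and its significance is the drop in dependency when the band is removed from the full band set; bands scoring high on both are retained. The combination rule and the toy pixel data below are illustrative, not the paper's exact criterion.

```python
from collections import defaultdict

def dependency(rows, cond, dec):
    """Rough-set dependency of dec on the attribute subset cond."""
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[b] for b in cond)].append(r[dec])
    return sum(len(v) for v in blocks.values() if len(set(v)) == 1) / len(rows)

def rank_bands(rows, bands, dec):
    """Relevance: gamma({band}); significance: gamma(all) - gamma(all minus band)."""
    full = dependency(rows, bands, dec)
    scores = {}
    for b in bands:
        relevance = dependency(rows, [b], dec)
        significance = full - dependency(rows, [x for x in bands if x != b], dec)
        scores[b] = relevance + significance   # illustrative combination rule
    return sorted(scores, key=scores.get, reverse=True)

# tiny example: three discretized "bands" per pixel and a land-cover label
pixels = [
    {"b1": 0, "b2": 0, "b3": 0, "label": "water"},
    {"b1": 0, "b2": 1, "b3": 1, "label": "water"},
    {"b1": 1, "b2": 0, "b3": 0, "label": "soil"},
    {"b1": 1, "b2": 1, "b3": 1, "label": "soil"},
]
print(rank_bands(pixels, ["b1", "b2", "b3"], "label"))   # b1 ranks first
```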

Book ChapterDOI
Yiyu Yao
20 Nov 2015
TL;DR: This paper presents a trisecting-and-acting framework of three-way decisions that divides a universal set into three regions and designs the most effective strategies for processing the three regions.
Abstract: The notion of three-way decisions was originally introduced by the need to explain the three regions of probabilistic rough sets. Recent studies show that rough set theory is only one of the possible ways to construct three regions. A more general theory of three-way decisions has been proposed, embracing ideas from rough sets, interval sets, shadowed sets, three-way approximations of fuzzy sets, orthopairs, squares of opposition, and others. This paper presents a trisecting-and-acting framework of three-way decisions. With respect to trisecting, we divide a universal set into three regions. With respect to acting, we design the most effective strategies for processing the three regions. The identification and explicit investigation of different strategies for different regions is a distinguishing feature of three-way decisions.

Journal ArticleDOI
TL;DR: A multivariate model is established that improves the accuracy of computation by combining traditional fuzzy time series models with the rough set method and using the fuzzy c-means algorithm to discretize the data.

Journal ArticleDOI
01 Jan 2015
TL;DR: A new method for constructing simpler discernibility matrix with covering based rough sets is provided, and some characterizations of attribute reduction provided by Tsang et al. are improved.
Abstract: Highlights: A simpler approach to attribute reduction based on the discernibility matrix is presented with covering based rough sets. Some important properties of attribute reduction with covering based rough sets are improved. The computational complexity of the improved reduction approach is relatively reduced. A new algorithm for attribute reduction in decision tables is presented, based on a different strategy of identifying objects. Attribute reduction is viewed as an important preprocessing step for pattern recognition and data mining. Most research has focused on attribute reduction using rough sets. Recently, Tsang et al. discussed attribute reduction with covering rough sets (Tsang et al., 2008), where an approach based on the discernibility matrix was presented to compute all attribute reducts. In this paper, we provide a new method for constructing a simpler discernibility matrix with covering based rough sets, and improve some characterizations of attribute reduction provided by Tsang et al. It is proved that the improved discernibility matrix is equivalent to the old one, but the computational complexity of the discernibility matrix is relatively reduced. Then we further study attribute reduction in decision tables based on a different strategy of identifying objects. Finally, the proposed reduction method is compared with some existing feature selection methods through numerical experiments, and the experimental results show that the proposed reduction method is efficient and effective.
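For orientation, the classical discernibility matrix that the covering-based construction simplifies records, for every pair of objects with different decisions, the set of condition attributes on which they differ; reducts are then the minimal attribute sets hitting every non-empty entry. A minimal sketch of the classical matrix (the covering-based refinement of Tsang et al. is not reproduced here):

```python
def discernibility_matrix(rows, cond_attrs, dec):
    """For each pair of objects with different decision values, record the
    condition attributes whose values distinguish the two objects."""
    matrix = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if rows[i][dec] != rows[j][dec]:
                matrix[(i, j)] = frozenset(
                    a for a in cond_attrs if rows[i][a] != rows[j][a])
    return matrix

rows = [
    {"f1": 0, "f2": 1, "cls": "a"},
    {"f1": 0, "f2": 0, "cls": "b"},
    {"f1": 1, "f2": 1, "cls": "a"},
]
for pair, attrs in discernibility_matrix(rows, ["f1", "f2"], "cls").items():
    print(pair, set(attrs))   # (0, 1) {'f2'} and (1, 2) {'f1', 'f2'}
```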

Journal ArticleDOI
TL;DR: An incremental approach for maintaining approximations of DRSA when attribute values vary over time is proposed. Experimental evaluations show that the incremental algorithm can effectively reduce computational time in comparison with the non-incremental one when the proportion of attribute values that vary is below a threshold.

Journal ArticleDOI
TL;DR: Two incremental algorithms, based respectively on adding and deleting attributes under probabilistic rough sets, are proposed, and experiments validate the feasibility of the proposed incremental approaches.
Abstract: The attribute set in an information system evolves in time as new information arrives. Both the lower and upper approximations of a concept change dynamically when attributes vary. Inspired by earlier incremental algorithms for Pawlak rough sets, this paper focuses on new strategies for dynamically updating approximations in probabilistic rough sets and investigates four propositions for updating approximations under probabilistic rough sets. Two incremental algorithms, based respectively on adding and deleting attributes under probabilistic rough sets, are proposed. Experiments on five data sets from UCI and a genome dataset with thousands of attributes validate the feasibility of the proposed incremental approaches.
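The intuition behind the attribute-addition case: a new attribute can only split existing equivalence classes, so the partition is refined rather than rebuilt from scratch, and only the blocks that actually split need their region membership re-evaluated. A minimal sketch of that refinement step; the probabilistic thresholds and the paper's four update propositions are not reproduced.

```python
from collections import defaultdict

def refine_partition(blocks, new_attr_value):
    """Split each existing equivalence class by the values of the newly added
    attribute; blocks that do not split keep their previously computed regions."""
    refined = []
    for block in blocks:
        sub = defaultdict(set)
        for x in block:
            sub[new_attr_value(x)].add(x)
        refined.extend(sub.values())
    return refined

old_blocks = [{1, 2, 3}, {4, 5}]
new_attr = {1: "a", 2: "a", 3: "b", 4: "c", 5: "c"}
print(refine_partition(old_blocks, new_attr.__getitem__))   # [{1, 2}, {3}, {4, 5}]
```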

Journal ArticleDOI
TL;DR: This paper critically evaluates the most relevant fuzzy rough set models proposed in the literature and establishes a formally correct and unified mathematical framework for them.

Journal ArticleDOI
TL;DR: A framework of double-quantitative decision-theoretic rough sets (Dq-DTRS) based on the Bayesian decision procedure and GRS is proposed, which essentially indicates relative and absolute quantification.

Journal ArticleDOI
TL;DR: This paper presents a parameterized dominance-based rough set approach to interval-valued information systems, proposes the concept of the α-dominance relation, and introduces lower and upper approximate reducts into the α-dominance-based rough set for simplifying decision rules.

Journal ArticleDOI
TL;DR: This paper presents the updating properties for dynamic maintenance of approximations when the criteria values in the set-valued decision system evolve with time, and proposes two incremental algorithms corresponding to the addition and removal of criteria values.

Journal ArticleDOI
TL;DR: Together with tolerance information granules in rough sets, a mutual information criterion is provided for evaluating candidate features in incomplete data; it not only seeks the largest mutual information with the target class but also takes into consideration the redundancy between selected features.

Journal ArticleDOI
TL;DR: A supervised and multivariate discretization algorithm in rough sets, SMDNS, is proposed; it is derived from the traditional naive scaler algorithm (called Naive) and is efficient in terms of classification accuracy and the number of generated cuts.
Abstract: Discretization of continuous attributes is an important task in rough sets, and many discretization algorithms have been proposed. However, most current discretization algorithms are univariate, which may reduce the classification ability of a given decision table. To solve this problem, we propose a supervised and multivariate discretization algorithm in rough sets, SMDNS, which is derived from the traditional naive scaler algorithm (called Naive). Given a decision table DT = (U, C, D, V, f), since SMDNS uses both class information and the interdependence among the condition attributes in C to determine the discretization scheme, the cuts obtained by SMDNS are far fewer than those obtained by Naive, while the classification ability of DT remains unchanged after discretization. Experimental results show that SMDNS is efficient in terms of classification accuracy and the number of generated cuts. In particular, our algorithm obtains a satisfactory compromise between the number of cuts and the classification accuracy.
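The univariate Naive baseline that SMDNS refines places a cut between every pair of adjacent attribute values whose objects belong to different decision classes. A minimal sketch of that baseline; SMDNS's use of the interdependence among condition attributes is not reproduced.

```python
def naive_cuts(values, labels):
    """Place a cut at the midpoint of every adjacent pair of distinct values
    whose class labels differ (the univariate naive scaler baseline)."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if v1 != v2 and c1 != c2:
            cuts.append((v1 + v2) / 2)
    return cuts

ages = [23, 31, 40, 47, 52, 60]
sick = ["no", "no", "yes", "yes", "no", "yes"]
print(naive_cuts(ages, sick))   # [35.5, 49.5, 56.0]
```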

Journal ArticleDOI
TL;DR: The entropy and granularity of the binary mapping between two different universes are defined, and an approach to uncertainty measurement based on the granularity of the binary mapping is given for the multigranulation rough set over two universes.
Abstract: Recently, the multigranulation rough set (MGRS) has become a new direction in rough set theory; it is based on multiple binary relations on the universe of discourse. The existing literature on multigranulation rough sets assumes a single universe. In reality, however, a good deal of practical decision making may involve two or more different universes. In this paper, we consider the rough approximation of a given concept over two different universes with respect to the multigranulation space formed by different mappings between the two universes, i.e., the multigranulation rough set model over two universes. We define the optimistic multigranulation rough set, the pessimistic multigranulation rough set, and the variable precision multigranulation rough set over two universes, each of which is suited to a different real-world decision-making problem in management science. Several important properties of these models are then discussed in detail. The relationship between the multigranulation rough set over two universes and the existing models in the literature is also investigated. Finally, the entropy and granularity of the binary mapping between two different universes are defined, and an approach to uncertainty measurement based on the granularity of the binary mapping is given for the multigranulation rough set over two universes. The multigranulation rough set model over two universes provides a new, effective approach for practical decision problems in management science.
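The optimistic/pessimistic distinction follows directly from the definitions: an object belongs to the optimistic lower approximation if its equivalence class under at least one granulation is contained in the target concept, and to the pessimistic one if that holds under every granulation. A minimal single-universe sketch; the paper's mappings between two universes and its variable precision variant are collapsed here for brevity.

```python
def mgrs_lower(universe, target, granulations, mode="optimistic"):
    """granulations: one function per binary relation, mapping an object to
    its equivalence class under that granulation."""
    quantifier = any if mode == "optimistic" else all
    return {x for x in universe
            if quantifier(g(x) <= target for g in granulations)}

U = {1, 2, 3, 4}
X = {1, 2, 3}
g1 = lambda x: {1, 2} if x in {1, 2} else {3, 4}   # granulation 1
g2 = lambda x: {1} if x == 1 else {2, 3, 4}        # granulation 2
print(mgrs_lower(U, X, [g1, g2], "optimistic"))    # {1, 2}
print(mgrs_lower(U, X, [g1, g2], "pessimistic"))   # {1}
```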

Journal ArticleDOI
TL;DR: Experimental results show that by using the relative decision entropy-based feature significance as heuristic information, FSMRDE is efficient for feature selection and is able to achieve good scalability for large data sets.

Journal ArticleDOI
TL;DR: This paper provides a comprehensive review of the most recent methods for feature selection that originated from nature inspired meta-heuristics, where the more classic approaches such as genetic algorithms and ant colony optimisation are also included for comparison.
Abstract: Many strategies have been exploited for the task of feature selection, in an effort to identify more compact and better quality feature subsets. A number of evaluation metrics have been developed recently that can judge the quality of a given feature subset as a whole, rather than assessing the qualities of individual features. Effective techniques of a stochastic nature have also emerged, allowing good quality solutions to be discovered without resorting to exhaustive search. This paper provides a comprehensive review of the most recent methods for feature selection that originated from nature-inspired meta-heuristics, where the more classic approaches such as genetic algorithms and ant colony optimisation are also included for comparison. A good number of the reviewed methodologies have been significantly modified in the present work, in order to systematically support generic subset-based evaluators and higher dimensional problems. Such modifications are carried out because the original studies either work exclusively with certain subset evaluators (e.g., rough set-based methods), or are limited to specific problem domains. A total of ten different algorithms are examined, and their mechanisms and workflows are summarised in a unified manner. The performance of the reviewed approaches is compared using high dimensional, real-valued benchmark data sets. The selected feature subsets are also used to build classification models, in an effort to further validate their efficacies.
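As a reminder of the common shape shared by these wrapper approaches, the sketch below shows a generic genetic-algorithm search over feature bit-vectors driven by an arbitrary subset-based evaluator, which is the interface the review's modified algorithms are organised around. The evaluator, parameter values, and toy fitness function are illustrative only.

```python
import random

def genetic_subset_search(n_features, evaluate, pop_size=20, generations=50,
                          mutation_rate=0.05, seed=0):
    """Generic GA over feature bit-vectors; `evaluate` scores a whole subset."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=evaluate)
    for _ in range(generations):
        parents = sorted(pop, key=evaluate, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]                                         # one-point crossover
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]   # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop + [best], key=evaluate)
    return best

# toy subset evaluator: reward subsets containing features 0 and 3, penalise size
useful = {0, 3}
def evaluate(bits):
    chosen = {i for i, b in enumerate(bits) if b}
    return len(chosen & useful) - 0.1 * len(chosen)

print(genetic_subset_search(8, evaluate))   # converges toward subsets containing features 0 and 3
```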