Author

S. K. Michael Wong

Bio: S. K. Michael Wong is an academic researcher from the University of Regina. The author has contributed to research on topics including Bayesian networks and relational databases, has an h-index of 7, and has co-authored 24 publications receiving 176 citations.

Papers
Book ChapterDOI
26 Apr 1999
TL;DR: Many information-theoretic measures have been applied to quantify the importance of an attribute in data mining; these measures are summarized and critically analyzed.
Abstract: An attribute is deemed important in data mining if it partitions the database such that previously unknown regularities are observable. Many information-theoretic measures have been applied to quantify the importance of an attribute. In this paper, we summarize and critically analyze these measures.
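As a concrete illustration of one such measure (our own example, not taken from the paper): the information gain of an attribute is the reduction in entropy of the class variable obtained by partitioning the database on that attribute.

```python
# Illustrative sketch: information gain as a measure of attribute importance.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after partitioning on `attribute`."""
    base = entropy([r[target] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder

# Toy data: does 'outlook' partition the database into regular subsets?
data = [
    {"outlook": "sunny", "play": "no"}, {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"}, {"outlook": "rain", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]
print(information_gain(data, "outlook", "play"))  # higher = more informative attribute
```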

61 citations

Book ChapterDOI
15 Oct 1997
TL;DR: A framework for reasoning with intervals is presented. Two interpretations of intervals are examined: one treats an interval as bounds on a truth evaluation function; the other treats its end points as two truth evaluation functions.
Abstract: This paper presents a framework for reasoning with intervals. Two interpretations of intervals are examined: one treats an interval as bounds on a truth evaluation function, and the other treats the end points of an interval as two truth evaluation functions. They lead to two different reasoning approaches, one based on interval computations and the other on interval structures. A number of interval-based reasoning methods are reviewed and compared within the proposed framework.
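A minimal sketch of the second interpretation, treating the end points as two truth evaluation functions combined point-wise (the min/max connectives below are one common choice among the interval-based methods such surveys cover, not necessarily the paper's):

```python
# Illustrative sketch: interval truth values [l, u], combined end-point-wise.
# Assumption: min as the conjunction t-norm and max as the disjunction t-conorm.

def interval_and(a, b):
    """Conjunction: apply the t-norm (min) to both end points."""
    return (min(a[0], b[0]), min(a[1], b[1]))

def interval_or(a, b):
    """Disjunction: apply the t-conorm (max) to both end points."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def interval_not(a):
    """Negation: complement and swap the end points."""
    return (1 - a[1], 1 - a[0])

p = (0.6, 0.9)   # truth of p known only to lie in [0.6, 0.9]
q = (0.3, 0.5)
print(interval_and(p, q))   # (0.3, 0.5)
print(interval_or(p, q))    # (0.6, 0.9)
print(interval_not(p))      # (0.1, 0.4)
```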

19 citations

Book ChapterDOI
15 Jun 1993
TL;DR: Compatible probability functions are used to define the upper and lower entropy of a belief function as generalizations of the Shannon entropy.
Abstract: This paper uses compatible probability functions to define the notions of upper entropy and lower entropy of a belief function as generalizations of the Shannon entropy. The upper entropy measures the amount of information conveyed by the evidence currently available. The lower entropy measures the maximum possible amount of information that can be obtained if further evidence becomes available. The paper also analyzes the characteristics of these entropies and their computational aspects. The study demonstrates the usefulness of compatible probability functions for applying notions from probability theory to the theory of belief functions.
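On the usual reading (an assumption here; the paper's notation is not reproduced above), the lower and upper entropy are the minimum and maximum Shannon entropy over the set of probability functions compatible with the belief function, i.e. those obtained by distributing each focal element's mass among that element's members. A toy sketch: the vertex enumeration is exact for the minimum because entropy is concave, while the maximum is estimated by sampling.

```python
# Illustrative sketch (not the paper's algorithm): entropy bounds over
# the probability functions compatible with a mass function.
import itertools, random
from math import log2

def shannon(p):
    """Shannon entropy of a probability assignment dict."""
    return -sum(v * log2(v) for v in p.values() if v > 0)

def vertices(mass, frame):
    """Extreme compatible probabilities: each focal mass goes wholly to one member."""
    focals = list(mass.items())
    for choice in itertools.product(*(list(f) for f, _ in focals)):
        p = dict.fromkeys(frame, 0.0)
        for (focal, m), x in zip(focals, choice):
            p[x] += m
        yield p

def sample_compatible(mass, frame, rng):
    """Randomly split each focal mass m(A) among the members of A."""
    p = dict.fromkeys(frame, 0.0)
    for focal, m in mass.items():
        w = [rng.random() for _ in focal]
        s = sum(w)
        for x, wi in zip(focal, w):
            p[x] += m * wi / s
    return p

frame = ("a", "b", "c")
mass = {("a",): 0.5, ("a", "b", "c"): 0.5}   # toy mass function

# Entropy is concave, so its minimum over the compatible set sits at a vertex:
lower = min(shannon(p) for p in vertices(mass, frame))
# The maximum lies in the interior; Monte Carlo gives a rough estimate:
rng = random.Random(0)
upper = max(shannon(sample_compatible(mass, frame, rng)) for _ in range(20000))
print(lower, upper)   # 0.0 and ~1.5 bits for this mass function
```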

18 citations

Book ChapterDOI
TL;DR: The problem of triangulating a Bayesian network is shown, from a relational database perspective, to be equivalent to the problem of identifying a maximal subset of conflict-free conditional independencies.
Abstract: In this paper, we study the problem of triangulating Bayesian networks from a relational database perspective. We show that the problem of triangulating a Bayesian network is equivalent to the problem of identifying a maximal subset of conflict-free conditional independencies. Several interesting theoretical results regarding the triangulation of Bayesian networks are obtained from this perspective.
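For orientation, a sketch of the textbook greedy min-fill heuristic against which the paper's database-theoretic characterization can be contrasted (this is the standard graph method, not the paper's construction):

```python
# Illustrative sketch: greedy min-fill triangulation of an undirected (moral) graph.
import itertools

def triangulate(adj):
    """adj: dict vertex -> set of neighbours. Returns the set of fill-in edges added."""
    adj = {v: set(ns) for v, ns in adj.items()}
    fill = set()
    remaining = set(adj)
    while remaining:
        # Pick the vertex whose elimination requires the fewest fill-in edges.
        def cost(v):
            nbrs = adj[v] & remaining
            return sum(1 for a, b in itertools.combinations(nbrs, 2) if b not in adj[a])
        v = min(remaining, key=cost)
        nbrs = adj[v] & remaining
        for a, b in itertools.combinations(nbrs, 2):
            if b not in adj[a]:
                adj[a].add(b); adj[b].add(a)
                fill.add(frozenset((a, b)))
        remaining.remove(v)
    return fill

# A 4-cycle a-b-c-d needs exactly one chord to become triangulated:
g = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(triangulate(g))  # one chord, e.g. {frozenset({'b', 'd'})}
```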

14 citations

28 May 1992
TL;DR: There exists a closed-form expression for the proposed conditional non-numeric beliefs, which is useful in qualitative, non-monotonic reasoning and conditional logic.
Abstract: The non-numeric belief, a counterpart of the belief function, is defined as the lower envelope of a family of incidence mappings (i.e., the non-numeric counterpart of probability functions). Likewise, the non-numeric conditional belief is defined as the lower envelope of a family of conditional incidence mappings. Such definitions are consistent with the corresponding definitions for belief functions. There exists a closed-form expression for the proposed conditional non-numeric beliefs, which is useful in qualitative, non-monotonic reasoning and conditional logic.
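One natural reading of "lower envelope" for set-valued mappings (an assumption on our part; the paper's formalism may differ) is the point-wise intersection over the family:

```python
# Illustrative sketch under an assumed reading: each incidence mapping assigns
# to a proposition the set of situations in which it holds, and the non-numeric
# belief is the point-wise intersection (lower envelope) over the family.

def lower_envelope(mappings, proposition):
    """Situations that every incidence mapping in the family assigns to the proposition."""
    return frozenset.intersection(*(m[proposition] for m in mappings))

m1 = {"rain": frozenset({"s1", "s2", "s3"})}
m2 = {"rain": frozenset({"s2", "s3"})}
m3 = {"rain": frozenset({"s2", "s4"})}
print(lower_envelope([m1, m2, m3], "rain"))  # frozenset({'s2'})
```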

11 citations


Cited by
Journal ArticleDOI
TL;DR: A survey of the literature on data mining using soft computing is provided, categorized by the soft computing tools and hybridizations used, the data mining function implemented, and the preference criterion selected by the model.
Abstract: The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling issues related to the understandability of patterns, incomplete or noisy data, mixed-media information, and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric and robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms for selecting a model from mixed-media data based on some preference criterion or objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and to the application of soft computing methodologies are indicated. An extensive bibliography is also included.

630 citations

Posted ContentDOI
30 Apr 2004-viXra
TL;DR: This book is devoted to an emerging branch of information fusion based on a new approach to modelling the fusion problem when the information provided by the sources is both uncertain and (highly) conflicting.
Abstract: This book is devoted to an emerging branch of information fusion based on a new approach to modelling the fusion problem when the information provided by the sources is both uncertain and (highly) conflicting. This approach, known in the literature as DSmT (Dezert-Smarandache Theory), proposes new, useful rules of combination.
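For context, a sketch of the classical Dempster rule of combination, whose 1/(1−K) normalization is what behaves poorly under the high conflict K that DSmT's alternative rules target (this is the classical rule, not one of DSmT's):

```python
# Illustrative sketch: classical Dempster combination of two mass functions
# whose focal elements are frozensets. Not a DSmT rule.
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions; also return the conflict mass K."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict

# Two highly conflicting sources:
m1 = {frozenset({"x"}): 0.9, frozenset({"y"}): 0.1}
m2 = {frozenset({"y"}): 0.9, frozenset({"x"}): 0.1}
combined, K = dempster(m1, m2)
print(K)         # 0.82: most of the joint mass falls on conflicting pairs
print(combined)  # {'x'} and {'y'} each get mass 0.5 after renormalization
```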

576 citations

Journal ArticleDOI
TL;DR: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art.
Abstract: The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art. The reasons for considering Web mining a field separate from data mining are explained. The limitations of some existing Web mining methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on "soft Web mining" is provided, along with the commercially available systems. The prospective areas of Web mining where the application of soft computing needs immediate attention are outlined with justification. Scope for future research in developing "soft Web mining" systems is explained. An extensive bibliography is also provided.

365 citations

Journal ArticleDOI
05 Oct 2001
TL;DR: An algorithm that uses rough set theory with greedy heuristics for feature selection, choosing features that do not damage the performance of induction, is proposed.
Abstract: Practical machine learning algorithms are known to degrade in performance (prediction accuracy) when faced with many features (sometimes "attribute" is used instead of "feature") that are not necessary for rule discovery. To cope with this problem, many methods for selecting a subset of features have been proposed. Two approaches are typical: the filter approach, which selects a feature subset in a preprocessing step, and the wrapper approach, which selects an optimal feature subset from the space of possible subsets using the induction algorithm itself as part of the evaluation function. Although the filter approach is faster, it is somewhat blind in that the performance of induction is not considered. The wrapper approach can find optimal feature subsets, but its time and space complexity make it hard to use. In this paper, we propose an algorithm that uses rough set theory with greedy heuristics for feature selection. Selecting features resembles the filter approach, but the evaluation criterion is related to the performance of induction: we select features that do not damage the performance of induction.
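A compact sketch in the spirit of the proposal (a generic greedy positive-region algorithm, not the paper's exact procedure): grow the feature subset by always adding the feature that most increases the rough-set dependency of the decision attribute on the subset.

```python
# Illustrative sketch: greedy feature selection via rough-set dependency.

def dependency(rows, features, decision):
    """Fraction of rows in the positive region: rows whose feature values
    determine the decision unambiguously."""
    blocks, counts = {}, {}
    for r in rows:
        key = tuple(r[f] for f in features)
        blocks.setdefault(key, set()).add(r[decision])
        counts[key] = counts.get(key, 0) + 1
    positive = sum(counts[k] for k, ds in blocks.items() if len(ds) == 1)
    return positive / len(rows)

def greedy_reduct(rows, features, decision):
    """Add features greedily until the full dependency is reached."""
    selected, best = [], 0.0
    target = dependency(rows, features, decision)
    while best < target:
        f = max((f for f in features if f not in selected),
                key=lambda f: dependency(rows, selected + [f], decision))
        selected.append(f)
        best = dependency(rows, selected, decision)
    return selected

rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
print(greedy_reduct(rows, ["a", "b", "c"], "d"))  # ['b'] suffices here
```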

295 citations

Journal ArticleDOI
TL;DR: It is shown that a variable selection approach based on DISR can be formulated as a quadratic optimization problem, the dispersion sum problem (DSP); the combination of BESR and the DISR criterion is compared, in theoretical and experimental terms, to recently proposed information-theoretic criteria.
Abstract: The paper presents an original filter approach for effective feature selection in microarray data characterized by a large number of input variables and few samples. The approach is based on a new information-theoretic selection criterion, the double input symmetrical relevance (DISR), which relies on a measure of variable complementarity. This measure evaluates the additional information that a set of variables provides about the output with respect to the sum of each single variable's contribution. We show that a variable selection approach based on DISR can be formulated as a quadratic optimization problem: the dispersion sum problem (DSP). To solve this problem, we use a strategy based on backward elimination and sequential replacement (BESR). The combination of BESR and the DISR criterion is compared in theoretical and experimental terms to recently proposed information-theoretic criteria. Experimental results on a synthetic dataset as well as on a set of eleven microarray classification tasks show that the proposed technique is competitive with existing filter selection methods.
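A rough rendering of the DISR criterion for discrete variables (our own plug-in estimator based on the description above; the paper's exact estimators and the BESR search are not reproduced): the symmetrical relevance of a variable pair is the mutual information it shares with the output, normalized by the joint entropy of all three, and a candidate is scored by summing this over the already selected variables.

```python
# Illustrative sketch: DISR-style scoring with plug-in entropy estimates.
from collections import Counter
from math import log2

def H(*columns):
    """Joint Shannon entropy of one or more discrete columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * log2(c / n) for c in joint.values())

def symmetrical_relevance(xi, xj, y):
    """I(Xi, Xj; Y) / H(Xi, Xj, Y)."""
    mi = H(xi, xj) + H(y) - H(xi, xj, y)
    return mi / H(xi, xj, y)

def disr(candidate, selected, y):
    """Score a candidate variable against an already selected set."""
    return sum(symmetrical_relevance(candidate, xj, y) for xj in selected)

# Toy usage: XOR-style complementarity that single-variable relevance misses.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y  = [0, 1, 1, 0]                         # y = x1 XOR x2
print(symmetrical_relevance(x1, x2, y))   # 0.5: I = 1 bit, joint entropy = 2 bits
print(disr(x2, [x1], y))                  # same pair, scored as a DISR term
```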

274 citations