Author

Cory J. Butz

Other affiliations: University of Ottawa
Bio: Cory J. Butz is an academic researcher from the University of Regina. The author has contributed to research in topics including Bayesian networks and inference, has an h-index of 19, and has co-authored 107 publications receiving 1,726 citations. Previous affiliations of Cory J. Butz include the University of Ottawa.


Papers
Proceedings Article
01 Jan 2004
TL;DR: This work assumes that the utilities of itemsets may differ, and identifies the high utility itemsets based on information in the transaction database and external information about utilities.
Abstract: Most approaches to mining association rules implicitly consider the utilities of the itemsets to be equal. We assume that the utilities of itemsets may differ, and identify the high utility itemsets based on information in the transaction database and external information about utilities. Our theoretical analysis of the resulting problem lays the foundation for future utility mining algorithms.
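The idea can be made concrete with a small sketch: given a toy transaction database with purchase quantities and a hypothetical external utility table (e.g., unit profit per item), the utility of an itemset is accumulated over the transactions that contain it, and high utility itemsets are those meeting a threshold. This is a brute-force illustration of the problem setting, not the authors' algorithm, and the data are invented for the example.

```python
from itertools import combinations

# Toy transaction database: item -> purchased quantity (hypothetical data).
transactions = [
    {"A": 2, "B": 1},
    {"A": 1, "C": 4},
    {"B": 3, "C": 1, "D": 2},
    {"A": 1, "B": 2, "C": 1},
]

# External utility information, e.g. unit profit per item (hypothetical).
external_utility = {"A": 5, "B": 1, "C": 3, "D": 10}

def transaction_utility(itemset, txn):
    """Utility of an itemset in one transaction: quantity * unit utility, summed over its items."""
    return sum(txn[i] * external_utility[i] for i in itemset)

def itemset_utility(itemset):
    """Total utility of an itemset over the transactions that contain all of its items."""
    return sum(transaction_utility(itemset, txn)
               for txn in transactions
               if all(i in txn for i in itemset))

def high_utility_itemsets(min_utility):
    """Brute-force enumeration of all itemsets whose utility meets the threshold."""
    items = sorted({i for txn in transactions for i in txn})
    return {itemset: itemset_utility(itemset)
            for size in range(1, len(items) + 1)
            for itemset in combinations(items, size)
            if itemset_utility(itemset) >= min_utility}

print(high_utility_itemsets(min_utility=15))
```

Unlike support, utility is not anti-monotone (a superset can have higher utility than its subsets), which is why the paper's theoretical analysis is needed before efficient pruning strategies can be designed.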

505 citations

Journal ArticleDOI
TL;DR: This paper describes how binary classification with SVMs can be interpreted using rough sets and suggests two new approaches, extensions of 1-v-r and 1-v-1, to SVM multi-classification that allow for an error rate.
Abstract: Support vector machines (SVMs) are essentially binary classifiers. To improve their applicability, several methods have been suggested for extending SVMs for multi-classification, including one-versus-one (1-v-1), one-versus-rest (1-v-r) and DAGSVM. In this paper, we first describe how binary classification with SVMs can be interpreted using rough sets. A rough set approach to SVM classification removes the necessity of exact classification and is especially useful when dealing with noisy data. Next, by utilizing the boundary region in rough sets, we suggest two new approaches, extensions of 1-v-r and 1-v-1, to SVM multi-classification that allow for an error rate. We explicitly demonstrate how our extended 1-v-r may shorten the training time of the conventional 1-v-r approach. In addition, we show that our 1-v-1 approach may have reduced storage requirements compared to the conventional 1-v-1 and DAGSVM techniques. Our techniques also provide better semantic interpretations of the classification process. The theoretical conclusions are supported by experimental findings involving a synthetic dataset.
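As a rough illustration of the boundary-region idea in the binary case, the sketch below trains an SVM with scikit-learn and treats points whose decision value falls inside a margin band as belonging to a boundary (deferred) region rather than forcing an exact classification. The threshold, data, and thresholding scheme are assumptions for illustration, not the authors' formulation of the 1-v-r or 1-v-1 extensions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two noisy 2-D classes (hypothetical data).
X_pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel="linear").fit(X, y)

def rough_classify(points, tau=0.5):
    """Assign points to the positive, negative, or boundary region.

    Points whose SVM decision value lies within +/- tau are placed in the
    boundary region (classification deferred), mimicking the rough-set idea
    of lower approximations plus a boundary of uncertain cases.
    """
    scores = clf.decision_function(points)
    return np.where(scores >= tau, "positive",
           np.where(scores <= -tau, "negative", "boundary"))

print(rough_classify(np.array([[2.5, 2.0], [0.1, -0.2], [-3.0, -2.5]])))
```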

125 citations

Journal ArticleDOI
01 Nov 2000
TL;DR: The present study suggests that there is no real difference between Bayesian networks and relational databases, in the sense that only solvable classes of independencies are useful in the design and implementation of these knowledge systems.
Abstract: The implication problem is to test whether a given set of independencies logically implies another independency. This problem is crucial in the design of a probabilistic reasoning system. We advocate that Bayesian networks are a generalization of standard relational databases. In contrast, it has been suggested that Bayesian networks are different from relational databases because the implication problems of the two systems do not coincide for some classes of probabilistic independencies. This remark, however, does not take into consideration one important issue, namely, the solvability of the implication problem. In this comprehensive study of the implication problem for probabilistic conditional independencies, it is emphasized that Bayesian networks and relational databases coincide on solvable classes of independencies. The present study suggests that the implication problem for these two closely related systems differs only in unsolvable classes of independencies. This means there is no real difference between Bayesian networks and relational databases, in the sense that only solvable classes of independencies are useful in the design and implementation of these knowledge systems. More importantly, perhaps, these results suggest that many current attempts to generalize Bayesian networks can take full advantage of the generalizations made to standard relational databases.
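To make the implication problem concrete, here is a small sketch that closes a set of conditional independence statements under the semi-graphoid axioms (symmetry, decomposition, weak union, contraction) and checks whether a target statement has been derived. These axioms are sound for probabilistic conditional independence but not complete in general, so the sketch only demonstrates one direction of the problem; the representation and the example statements are assumptions for illustration.

```python
from itertools import combinations

# A CI statement (X, Z, Y) means "X is independent of Y given Z";
# each component is a frozenset of variable names.
def stmt(x, z, y):
    return (frozenset(x), frozenset(z), frozenset(y))

def nonempty_proper_subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(1, len(s)) for c in combinations(s, r)]

def semigraphoid_closure(statements):
    """Close a set of CI statements under the semi-graphoid axioms."""
    closure = set(statements)
    changed = True
    while changed:
        changed = False
        new = set()
        for (x, z, y) in closure:
            new.add((y, z, x))                      # symmetry
            for w in nonempty_proper_subsets(y):
                new.add((x, z, y - w))              # decomposition
                new.add((x, z | w, y - w))          # weak union
        for (x1, z1, y1) in closure:                # contraction:
            for (x2, z2, w2) in closure:            # I(X,Z,Y) and I(X,Z+Y,W)
                if x1 == x2 and z2 == z1 | y1:      # imply I(X,Z,Y+W)
                    new.add((x1, z1, y1 | w2))
        if not new <= closure:
            closure |= new
            changed = True
    return closure

# Example: from I(A, {}, {B, C}) the axioms derive I(A, {B}, {C}).
given = {stmt({"A"}, set(), {"B", "C"})}
target = stmt({"A"}, {"B"}, {"C"})
print(target in semigraphoid_closure(given))   # True
```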

110 citations

Journal ArticleDOI
TL;DR: This paper presents a Web-based intelligent tutoring system, called BITS, which takes full advantage of Bayesian networks, which are a formal framework for uncertainty management in Artificial Intelligence based on probability theory.
Abstract: In this paper, we present a Web-based intelligent tutoring system, called BITS. The decision making process conducted in our intelligent system is guided by a Bayesian network approach to support students in learning computer programming. Our system takes full advantage of Bayesian networks, which are a formal framework for uncertainty management in Artificial Intelligence based on probability theory. We discuss how to employ Bayesian networks as an inference engine to guide the students' learning processes. In addition, we describe the architecture of BITS and the role of each module in the system. Whereas many tutoring systems are static HTML Web pages of a class textbook or lecture notes, our intelligent system can help a student navigate through the online course materials, recommend learning goals, and generate appropriate reading sequences.
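As a toy illustration of using probabilistic inference to track student knowledge, the sketch below hand-rolls Bayes' rule for a single "student knows the topic" variable with one quiz answer as evidence. The node names and probability values are hypothetical; BITS itself reasons over a larger Bayesian network of course concepts.

```python
def posterior_knows(prior_knows, p_correct_given_knows, p_correct_given_not,
                    answered_correctly):
    """Posterior probability that the student knows the topic, given one quiz outcome."""
    if answered_correctly:
        likelihood_knows, likelihood_not = p_correct_given_knows, p_correct_given_not
    else:
        likelihood_knows, likelihood_not = 1 - p_correct_given_knows, 1 - p_correct_given_not
    numerator = likelihood_knows * prior_knows
    denominator = numerator + likelihood_not * (1 - prior_knows)
    return numerator / denominator

# Hypothetical CPT values: 10% slip probability, 30% guess probability.
print(posterior_knows(prior_knows=0.5,
                      p_correct_given_knows=0.9,
                      p_correct_given_not=0.3,
                      answered_correctly=True))   # 0.75
```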

89 citations

Proceedings ArticleDOI
20 Sep 2004
TL;DR: This paper presents a Web-based intelligent tutoring system for computer programming that can help a student navigate through the online course materials, recommend learning goals, and generate appropriate reading sequences.
Abstract: Web Intelligence is a direction for scientific research that explores practical applications of Artificial Intelligence to the next generation of Web-empowered systems. In this paper, we present a Web-based intelligent tutoring system for computer programming. The decision making process conducted in our intelligent system is guided by Bayesian networks, which are a formal framework for uncertainty management in Artificial Intelligence based on probability theory. Whereas many tutoring systems are static HTML Web pages of a class textbook or lecture notes, our intelligent system can help a student navigate through the online course materials, recommend learning goals, and generate appropriate reading sequences.
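One simple way to picture the "generate appropriate reading sequences" step is a topological ordering of a prerequisite graph over course concepts, as in the sketch below using Python's standard graphlib. The curriculum graph is hypothetical and the sketch only illustrates the idea of ordering material so prerequisites come first; it is not the navigation logic used in the system.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite graph: concept -> concepts that must be read first.
prerequisites = {
    "variables": set(),
    "expressions": {"variables"},
    "conditionals": {"expressions"},
    "loops": {"conditionals"},
    "arrays": {"loops"},
    "functions": {"expressions"},
    "pointers": {"arrays", "functions"},
}

# static_order() yields each concept only after all of its prerequisites.
reading_sequence = list(TopologicalSorter(prerequisites).static_order())
print(reading_sequence)
```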

68 citations


Cited by
Journal ArticleDOI
TL;DR: A model that generates natural language descriptions of images and their regions based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding is presented.
Abstract: We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks (RNN) over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. We then describe a Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. We demonstrate that our alignment model produces state of the art results in retrieval experiments on Flickr8K, Flickr30K and MSCOCO datasets. We then show that the generated descriptions outperform retrieval baselines on both full images and on a new dataset of region-level annotations. Finally, we conduct large-scale analysis of our RNN language model on the Visual Genome dataset of 4.1 million captions and highlight the differences between image and region-level caption statistics.
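The core of the alignment objective can be sketched with toy vectors: score an image-sentence pair by summing, over word embeddings, the similarity to the best-matching region embedding, then apply a max-margin ranking loss so that matching pairs outscore mismatched ones. The dimensions and random features below are placeholders for the CNN region and bidirectional RNN word representations described in the abstract, so this is only a structural sketch of the objective.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def pair_score(regions, words):
    """Image-sentence score: each word is matched to its best region (dot-product similarity)."""
    sims = words @ regions.T          # (num_words, num_regions)
    return sims.max(axis=1).sum()

def ranking_loss(region_sets, word_sets, margin=1.0):
    """Max-margin loss encouraging each image to align best with its own sentence."""
    n = len(region_sets)
    scores = np.array([[pair_score(region_sets[k], word_sets[l]) for l in range(n)]
                       for k in range(n)])
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l != k:
                loss += max(0.0, scores[k, l] - scores[k, k] + margin)   # rank sentences per image
                loss += max(0.0, scores[l, k] - scores[k, k] + margin)   # rank images per sentence
    return loss

# Toy batch: 3 images with a few region vectors each, 3 sentences with a few word vectors each.
region_sets = [rng.normal(size=(4, dim)) for _ in range(3)]
word_sets = [rng.normal(size=(6, dim)) for _ in range(3)]
print(ranking_loss(region_sets, word_sets))
```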

1,953 citations

Journal ArticleDOI
TL;DR: This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
Abstract: Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
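Three of the most common rule interestingness measures covered by such surveys can be computed directly from a transaction database, as in this small sketch. The transactions are toy data; support, confidence, and lift follow their standard definitions, while the survey itself discusses many more measures and their formal properties.

```python
# Toy transaction database (hypothetical market-basket data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """How much more often the rule holds than if its two sides were independent."""
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"bread"}, {"milk"})
print(support(rule[0] | rule[1]), confidence(*rule), lift(*rule))
```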

1,198 citations

Journal ArticleDOI
01 Aug 2008
TL;DR: The WEBTABLES system develops new techniques for keyword search over a corpus of tables, and shows that they can achieve substantially higher relevance than solutions based on a traditional search engine.
Abstract: The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from Google's general-purpose web crawl, and used statistical classification techniques to find the estimated 154M that contain high-quality relational data. Because each relational table has its own "schema" of labeled and typed columns, each such table can be considered a small structured database. The resulting corpus of databases is larger than any other corpus we are aware of, by at least five orders of magnitude. We describe the WEBTABLES system to explore two fundamental questions about this collection of databases. First, what are effective techniques for searching for structured data at search-engine scales? Second, what additional power can be derived by analyzing such a huge corpus? First, we develop new techniques for keyword search over a corpus of tables, and show that they can achieve substantially higher relevance than solutions based on a traditional search engine. Second, we introduce a new object derived from the database corpus: the attribute correlation statistics database (AcsDB) that records corpus-wide statistics on co-occurrences of schema elements. In addition to improving search relevance, the AcsDB makes possible several novel applications: schema auto-complete, which helps a database designer to choose schema elements; attribute synonym finding, which automatically computes attribute synonym pairs for schema matching; and join-graph traversal, which allows a user to navigate between extracted schemas using automatically-generated join links.
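The attribute correlation statistics idea can be illustrated in a few lines: count how often schema attributes co-occur across extracted table schemas, then rank candidate attributes for auto-complete by their co-occurrence with the attributes a designer has already chosen. The schemas and scoring rule below are simplified assumptions for illustration, not the WEBTABLES implementation.

```python
from collections import Counter
from itertools import combinations

# Toy corpus of extracted table schemas (attribute name sets); hypothetical data.
schemas = [
    {"make", "model", "year", "price"},
    {"make", "model", "year"},
    {"name", "address", "city", "zip"},
    {"name", "city", "phone"},
    {"make", "model", "price"},
]

# AcsDB-style statistics: single-attribute and pairwise co-occurrence counts.
attr_counts = Counter(a for s in schemas for a in s)
pair_counts = Counter(frozenset(p) for s in schemas for p in combinations(sorted(s), 2))

def autocomplete(partial_schema, k=2):
    """Suggest attributes that co-occur most often with the partial schema entered so far."""
    candidates = Counter()
    for attr in attr_counts:
        if attr in partial_schema:
            continue
        candidates[attr] = sum(pair_counts[frozenset({attr, chosen})]
                               for chosen in partial_schema)
    return [a for a, _ in candidates.most_common(k)]

print(autocomplete({"make", "model"}))   # ['year', 'price']
```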

697 citations

Journal ArticleDOI
TL;DR: This survey of the available literature on data mining using soft computing categorizes work based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model.
Abstract: The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

630 citations