Author

Maria-Luiza Antonie

Bio: Maria-Luiza Antonie is an academic researcher from the University of Alberta. The author has contributed to research in topics: Association rule learning & Contextual image classification. The author has an h-index of 10 and has co-authored 15 publications receiving 1349 citations.

Papers
Proceedings Article
26 Aug 2001
TL;DR: This paper investigates the use of two data mining techniques, neural networks and association rule mining, for anomaly detection and classification, and shows that both approaches performed well, each reaching a classification accuracy of over 70%.
Abstract: Breast cancer represents the second leading cause of cancer deaths in women today and it is the most common type of cancer in women. This paper presents some experiments for tumour detection in digital mammography. We investigate the use of different data mining techniques, neural networks and association rule mining, for anomaly detection and classification. The results show that the two approaches performed well, obtaining a classification accuracy reaching over 70% for both techniques. Moreover, the experiments we conducted demonstrate the use and effectiveness of association rule mining in image categorization.

342 citations

Proceedings ArticleDOI
09 Dec 2002
TL;DR: This paper presents a novel approach for automatic text categorization that borrows from market basket analysis techniques using association rule mining in the data-mining field and focuses on finding the best term association rules in a textual database by generating and pruning.
Abstract: A good text classifier is a classifier that efficiently categorizes large sets of text documents in a reasonable time frame and with an acceptable accuracy, and that provides classification rules that are human readable for possible fine-tuning. If the training of the classifier is also quick, this could become in some application domains a good asset for the classifier. Many techniques and algorithms for automatic text categorization have been devised. According to published literature, some are more accurate than others, and some provide more interpretable classification models than others. However, none can combine all the beneficial properties enumerated above. In this paper we present a novel approach for automatic text categorization that borrows from market basket analysis techniques using association rule mining in the data-mining field. We focus on two major problems: (1) finding the best term association rules in a textual database by generating and pruning; and (2) using the rules to build a text classifier. Our text categorization method proves to be efficient and effective, and experiments on well-known collections show that the classifier performs well. In addition, training as well as classification are both fast and the generated rules are human readable.

247 citations

Book ChapterDOI
20 Sep 2004
TL;DR: An algorithm is proposed that extends the support-confidence framework with a sliding correlation coefficient threshold and discovers negative association rules with strong negative correlation between the antecedents and consequents.
Abstract: Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e. absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, classifiers that build their classification model based on association rules. Many other applications would benefit from negative association rules if it were not for the expensive process to discover them. Indeed, mining for such rules necessitates the examination of an exponentially large search space. Despite their usefulness, and while they were referred to in many publications, very few algorithms to mine them have been proposed to date. In this paper we propose an algorithm that extends the support-confidence framework with a sliding correlation coefficient threshold. In addition to finding confident positive rules that have a strong correlation, the algorithm discovers negative association rules with strong negative correlation between the antecedents and consequents.

209 citations
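
The correlation test mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the phi coefficient as the correlation measure, uses a fixed threshold instead of a sliding one, and the function names `phi_correlation` and `rule_kind` and the default `min_corr=0.5` are hypothetical choices made for illustration only.

```python
from math import sqrt


def phi_correlation(transactions, x, y):
    """Phi correlation between itemsets x and y over a list of transactions (sets)."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)    # transactions containing x
    n_y = sum(1 for t in transactions if y <= t)    # transactions containing y
    n_xy = sum(1 for t in transactions if x <= t and y <= t)  # containing both
    denom = sqrt(n_x * (n - n_x) * n_y * (n - n_y))
    if denom == 0:
        # Undefined when x or y occurs in all or in none of the transactions;
        # treated here as "no correlation" (an assumption of this sketch).
        return 0.0
    return (n * n_xy - n_x * n_y) / denom


def rule_kind(transactions, x, y, min_corr=0.5):
    """Label the pair (x, y) as a candidate for a positive or negative rule."""
    rho = phi_correlation(transactions, x, y)
    if rho >= min_corr:
        return "positive"   # candidate rule of the form x -> y
    if rho <= -min_corr:
        return "negative"   # candidate rule involving a negation, e.g. x -> not y
    return "none"


baskets = [{"tea"}, {"tea"}, {"coffee"}, {"coffee"}]
print(rule_kind(baskets, {"tea"}, {"coffee"}))
```

In this toy data tea and coffee never occur together, so the pair is flagged as a candidate for a negative rule. A fuller implementation would also check support and confidence before accepting a rule.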

Proceedings ArticleDOI
13 Jun 2004
TL;DR: A new algorithm to discover at the same time positive and negative association rules is proposed, and a new associative classifier is introduced that takes advantage of these two types of rules.
Abstract: Associative classifiers use association rules to associate attribute values with observed class labels. This model has been recently introduced in the literature and shows good promise. The proposals so far have only concentrated on, and differ only in the way rules are ranked and selected in the model. We propose a new framework that uses different types of association rules, positive and negative. Negative association rules of interest are rules that either associate negations of attribute values to classes or negatively associate attribute values to classes. In this paper we propose a new algorithm to discover at the same time positive and negative association rules. We introduce a new associative classifier that takes advantage of these two types of rules. Moreover, we present a new way to prune irrelevant classification rules using a correlation coefficient without jeopardizing the accuracy of our associative classifier model. Our preliminary results with UCI datasets are very encouraging.

131 citations

Proceedings Article
23 Jul 2002
TL;DR: This paper illustrates, by comparison to other published research, how important the data cleaning phase is in building an accurate data mining architecture for image classification.
Abstract: This paper proposes a new classification method based on association rule mining. This association rule-based classifier is evaluated on a real dataset: a database of medical images. The system we propose consists of a preprocessing phase, a phase for mining the resulting transactional database, and a final phase to organize the resulting association rules in a classification model. The experimental results show that the method performs well, reaching over 80% in accuracy. Moreover, this paper illustrates, by comparison to other published research, how important the data cleaning phase is in building an accurate data mining architecture for image classification.

124 citations


Cited by
01 Jan 2002

9,314 citations

Book
01 Dec 2006
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Abstract: 1. Introduction to text mining 2. Core text mining operations 3. Text mining preprocessing techniques 4. Categorization 5. Clustering 6. Information extraction 7. Probabilistic models for Information extraction 8. Preprocessing applications using probabilistic and hybrid approaches 9. Presentation-layer considerations for browsing and query refinement 10. Visualization approaches 11. Link analysis 12. Text mining applications Appendix Bibliography.

1,628 citations

Journal ArticleDOI
TL;DR: In this article, the authors study a space of coherent risk measures $M_\phi$ obtained as certain expansions of coherent elementary basis measures and give necessary and sufficient conditions on $\phi$ for $M_\phi$ to be a coherent measure.
Abstract: We study a space of coherent risk measures $M_\phi$ obtained as certain expansions of coherent elementary basis measures. In this space, the concept of "risk aversion function" $\phi$ naturally arises as the spectral representation of each risk measure in a space of functions of confidence level probabilities. We give necessary and sufficient conditions on $\phi$ for $M_\phi$ to be a coherent measure. We find in this way a simple interpretation of the concept of coherence and a way to map any rational investor's subjective risk aversion onto a coherent measure and vice-versa. We also provide for these measures their discrete versions $M_\phi^{(N)}$ acting on finite sets of $N$ independent realizations of a r.v. which are not only shown to be coherent measures for any fixed $N$, but also consistent estimators of $M_\phi$ for large $N$.

799 citations

Proceedings ArticleDOI
23 Jul 2002
TL;DR: Two algorithms for frequent term-based text clustering are presented, FTC which creates flat clusterings and HFTC for hierarchical clustering, which obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms.
Abstract: Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering, FTC which creates flat clusterings and HFTC for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.

570 citations