Showing papers on "Concept mining" published in 2006

Book

The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data

Ronen Feldman¹, James Sanger

01 Dec 2006

TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text also covers advanced preprocessing techniques, knowledge representation considerations, and visualization approaches.

Abstract:
1. Introduction to text mining
2. Core text mining operations
3. Text mining preprocessing techniques
4. Categorization
5. Clustering
6. Information extraction
7. Probabilistic models for Information extraction
8. Preprocessing applications using probabilistic and hybrid approaches
9. Presentation-layer considerations for browsing and query refinement
10. Visualization approaches
11. Link analysis
12. Text mining applications
Appendix
Bibliography

1,628 citations

Journal Article

Tapping the power of text mining

Weiguo Fan¹, Linda G. Wallace¹, Stephanie Rich¹, Zhongju Zhang²

Virginia Tech¹, University of Connecticut²

01 Sep 2006-Communications of The ACM

Abstract: Sifting through vast collections of unstructured or semistructured data beyond the reach of data mining tools, text mining tracks information sources, links isolated concepts in distant documents, maps relationships between activities, and helps answer questions.

408 citations

Journal Article

A survey of temporal data mining

Srivatsan Laxman¹, P. S. Sastry¹

Indian Institute of Science¹

01 Apr 2006-Sadhana: Academy Proceedings in Engineering Sciences

TL;DR: An overview of techniques of temporal data mining is presented, concentrating mainly on algorithms for pattern discovery in sequential data streams, and some recent results regarding statistical analysis of pattern discovery methods are described.

Abstract: Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.
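
To make pattern discovery in sequential data concrete, here is a minimal sketch (not code from the survey) that counts non-overlapped occurrences of a serial episode, i.e. an ordered list of event types such as A → B → C, in a time-ordered event sequence. It assumes events are plain symbols and ignores timestamps and inter-event time constraints; the function name `count_non_overlapped` is hypothetical.

```python
from typing import Sequence


def count_non_overlapped(events: Sequence[str], episode: Sequence[str]) -> int:
    """Count non-overlapped occurrences of a serial episode in an event sequence.

    One automaton waits for episode[0], then episode[1], and so on. When it
    completes the episode it counts one occurrence and restarts, so counted
    occurrences never share or interleave events. Taking each event type at its
    earliest possible position yields the maximum number of such occurrences.
    """
    if not episode:
        return 0
    count = 0
    state = 0  # index of the episode event the automaton is waiting for
    for event in events:
        if event == episode[state]:
            state += 1
            if state == len(episode):
                count += 1
                state = 0  # restart after a complete occurrence
    return count


print(count_non_overlapped(list("ABXCABCAC"), ["A", "B", "C"]))  # prints 2
```

A single automaton is enough here because occurrences are not allowed to overlap; counting overlapping or minimal occurrences would need several automata running at once.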

346 citations

Journal Article

Text mining and its potential applications in systems biology

Sophia Ananiadou¹, Douglas B. Kell¹, Jun'ichi Tsujii¹,²

University of Manchester¹, University of Tokyo²

01 Dec 2006-Trends in Biotechnology

TL;DR: By adding meaning to text, text mining techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.

314 citations

Book

Natural Language Processing and Text Mining

Anne Kao, Stephen R. Poteet

14 Nov 2006

TL;DR: This state-of-the-art survey is a must-have for advanced students, professionals, and researchers.

Abstract: Natural Language Processing and Text Mining not only discusses applications of Natural Language Processing techniques to certain Text Mining tasks, but also the converse, the use of Text Mining to assist NLP. It assembles diverse views from internationally recognized researchers and emphasizes caveats in the attempt to apply Natural Language Processing to text mining. This state-of-the-art survey is a must-have for advanced students, professionals, and researchers.

292 citations

Proceedings Article

MineBench: A Benchmark Suite for Data Mining Workloads

Ramanathan Narayanan¹, Berkin Ozisikyilmaz¹, Joseph Zambreno², Gokhan Memik¹, Alok Choudhary¹

Northwestern University¹, Iowa State University²

01 Oct 2006

TL;DR: MineBench is presented, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories such as clustering, classification, and association rule mining that will be of use to those looking to characterize and accelerate data mining workloads.

Abstract: Data mining constitutes an important class of scientific and commercial applications. Recent advances in data extraction techniques have created vast data sets, which require increasingly complex data mining algorithms to sift through them to generate meaningful information. The disproportionately slower rate of growth of computer systems has led to a sizeable performance gap between data mining systems and algorithms. The first step in closing this gap is to analyze these algorithms and understand their bottlenecks. With this knowledge, current computer architectures can be optimized for data mining applications. In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories such as clustering, classification, and association rule mining. We believe that MineBench will be of use to those looking to characterize and accelerate data mining workloads.

242 citations

Proceedings Article

TRIAS--An Algorithm for Mining Iceberg Tri-Lattices

Robert Jäschke¹, Andreas Hotho¹, Christoph Schmitz¹, Bernhard Ganter², Gerd Stumme¹

University of Kassel¹, Dresden University of Technology²

18 Dec 2006

TL;DR: The foundations for mining frequent tri-concepts, which extend the notion of closed itemsets to three-dimensional data to allow for mining folksonomies, are presented.

Abstract: In this paper, we present the foundations for mining frequent tri-concepts, which extend the notion of closed itemsets to three-dimensional data to allow for mining folksonomies. We provide a formal definition of the problem, and present an efficient algorithm for its solution as well as experimental results on a large real-world example.
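
For intuition about tri-concepts, the following brute-force sketch enumerates them for a toy folksonomy given as (user, tag, resource) triples. It is not the TRIAS algorithm: it simply tries every non-empty user set and tag set, derives the largest matching resource set, and keeps the triple only if each component equals the derivation of the other two. The minimum-size thresholds stand in for the iceberg (frequency) constraints; all names and data are hypothetical, and the exponential enumeration only suits tiny inputs.

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # (user, tag, resource)
TriConcept = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def nonempty_subsets(items: Set[str]):
    ordered = sorted(items)
    for r in range(1, len(ordered) + 1):
        for combo in combinations(ordered, r):
            yield frozenset(combo)


def tri_concepts(Y: Set[Triple], min_users: int = 1, min_tags: int = 1,
                 min_resources: int = 1) -> Set[TriConcept]:
    users = {u for u, _, _ in Y}
    tags = {t for _, t, _ in Y}
    resources = {r for _, _, r in Y}

    # Each derivation returns the largest set compatible with the other two.
    def derive_resources(A, B):
        return frozenset(r for r in resources
                         if all((u, t, r) in Y for u in A for t in B))

    def derive_users(B, C):
        return frozenset(u for u in users
                         if all((u, t, r) in Y for t in B for r in C))

    def derive_tags(A, C):
        return frozenset(t for t in tags
                         if all((u, t, r) in Y for u in A for r in C))

    found = set()
    for A in nonempty_subsets(users):
        for B in nonempty_subsets(tags):
            C = derive_resources(A, B)
            if not C:
                continue
            # (A, B, C) is a tri-concept iff every component is maximal.
            if derive_users(B, C) == A and derive_tags(A, C) == B:
                if len(A) >= min_users and len(B) >= min_tags and len(C) >= min_resources:
                    found.add((A, B, C))
    return found


Y = {("u1", "web", "r1"), ("u1", "web", "r2"),
     ("u2", "web", "r1"), ("u2", "python", "r1")}
print(len(tri_concepts(Y)))  # prints 3
```

Raising `min_users` to 2 on the same data keeps only the tri-concept shared by both users, which is the kind of pruning an iceberg tri-lattice relies on.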

145 citations

Book

Text Mining Application Programming

Manu Konchady

04 May 2006

TL;DR: Developers will be able to tap into the bevy of information available online in ways they never thought possible and students will have a thorough understanding of the theory and practical application of text mining.

Abstract: Text mining offers a way for individuals and corporations to exploit the vast amount of information available on the Internet. Text Mining Application Programming teaches developers about the problems of managing unstructured text, and describes how to build tools for text mining using standard statistical methods from Artificial Intelligence and Operations Research. These tools can be used in a variety of fields, including law, business, and medicine. Key topics covered include information extraction, clustering, text categorization, searching the Web, summarization, and natural language query systems. The book explains the theory behind each topic and algorithm, and then provides a practical solution implementation with which developers and students can experiment. A wide variety of code is also included for developers to build their own custom solutions. After reading through this book, developers will be able to tap into the bevy of information available online in ways they never thought possible, and students will have a thorough understanding of the theory and practical application of text mining.

118 citations

Mining Aspects from Version History.

Silvia Breu, Thomas Zimmermann

01 Jan 2006

TL;DR: In this paper, the authors apply formal concept analysis to a program's history to identify cross-cutting concerns in order to then help migrate a system to a better design, maybe even to an aspect-oriented design.

Abstract: As software evolves, new functionality sometimes no longer aligns with the original design, ending up scattered across a program. Aspect mining identifies such cross-cutting concerns in order to then help migrate a system to a better design, maybe even to an aspect-oriented design. We address this task by applying formal concept analysis to a program's history: method calls added across many locations are likely to be cross-cutting. By taking this historical perspective, we introduce a new dimension to aspect mining. As we only analyse changes from one version to the next, the technique is independent of a system's total size and scales up to industrial-sized projects such as Eclipse.
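
The formal concept analysis step can be illustrated with a small brute-force sketch. It assumes a hypothetical context in which each object is a method location and its attributes are the calls added there in one version change; the code closes every attribute subset to obtain all (extent, intent) concepts and then flags concepts whose extent spans at least `min_locations` locations as aspect candidates. This is a simplification for illustration, not the authors' tool, and the threshold is an assumed parameter.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all formal concepts of a small object -> attributes context."""
    objects = set(context)
    attributes = set().union(*context.values())

    def extent(attrs: Set[str]) -> FrozenSet[str]:
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs: FrozenSet[str]) -> FrozenSet[str]:
        # Attributes shared by every object in objs (all attributes if objs is empty).
        shared = set(attributes)
        for o in objs:
            shared &= context[o]
        return frozenset(shared)

    concepts = set()
    ordered = sorted(attributes)
    for r in range(len(ordered) + 1):
        for combo in combinations(ordered, r):
            ext = extent(set(combo))
            concepts.add((ext, intent(ext)))  # closing B yields the concept (B', B'')
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def aspect_candidates(context: Dict[str, Set[str]], min_locations: int) -> List[FrozenSet[str]]:
    """Call sets added together in at least min_locations locations (hypothetical rule)."""
    return [intent for ext, intent in formal_concepts(context)
            if intent and len(ext) >= min_locations]


# Hypothetical change: location -> method calls added there in one transaction.
added_calls = {
    "Account.deposit": {"lock", "log"},
    "Account.withdraw": {"lock", "log"},
    "Report.render": {"log"},
    "Cache.put": {"lock"},
}
print(aspect_candidates(added_calls, min_locations=3))
# prints [frozenset({'lock'}), frozenset({'log'})]
```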

102 citations

Book Chapter

Video mining with frequent itemset configurations

Till Quack¹, Vittorio Ferrari², Luc Van Gool¹

ETH Zurich¹, French Institute for Research in Computer Science and Automation²

13 Jul 2006

TL;DR: This article presents a method for mining frequently occurring objects and scenes from videos, based on the class of frequent itemset mining algorithms, which have proven their efficiency in other domains but had not been applied to video mining before.

Abstract: We present a method for mining frequently occurring objects and scenes from videos. Object candidates are detected by finding recurring spatial arrangements of affine covariant regions. Our mining method is based on the class of frequent itemset mining algorithms, which have proven their efficiency in other domains, but have not been applied to video mining before. In this work we show how to express vector-quantized features and their spatial relations as itemsets. Furthermore, a fast motion segmentation method is introduced as an attention filter for the mining algorithm. Results are shown on real world data consisting of music video clips.

68 citations

Proceedings Article

Enhancing Text Clustering Using Concept-based Mining Model

Shady Shehata¹, Fakhri Karray¹, Mohamed S. Kamel¹

University of Waterloo¹

18 Dec 2006

TL;DR: A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only; it substantially enhances the clustering quality of sets of documents.

Abstract: Most text mining techniques are based on word and/or phrase analysis of the text. The statistical analysis of a term (word or phrase) frequency captures the importance of the term within a document. However, to achieve a more accurate analysis, the underlying mining technique should indicate terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The proposed mining model consists of a concept-based analysis of terms and a concept-based similarity measure. The term which contributes to the sentence semantics is analyzed with respect to its importance at the sentence and document levels. The model can efficiently find significant matching terms, either words or phrases, of the documents according to the semantics of the text. The similarity between documents relies on a new concept-based similarity measure which is applied to the matching terms between documents. Experiments using the proposed concept-based term analysis and similarity measure in text clustering are conducted. Experimental results demonstrate that the newly developed concept-based mining model enhances the clustering quality of sets of documents substantially.
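
The abstract does not spell out the weighting formulas, so the sketch below is only a loose stand-in under stated assumptions: it weights each term by its frequency in the document multiplied by the fraction of sentences containing it (a hypothetical proxy for sentence-level importance, not the paper's concept-based weight) and compares two documents by cosine similarity over those weights.

```python
import math
import re
from collections import Counter
from typing import Dict


def term_weights(doc: str) -> Dict[str, float]:
    """Weight = term frequency in the document x fraction of sentences containing it.

    The sentence factor is an assumed proxy for sentence-level importance,
    not the paper's concept-based weight.
    """
    sentences = [s for s in re.split(r"[.!?]+", doc.lower()) if s.strip()]
    tf: Counter = Counter()
    sentence_freq: Counter = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence)
        tf.update(words)
        sentence_freq.update(set(words))
    n = len(sentences)
    return {w: tf[w] * sentence_freq[w] / n for w in tf}


def similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity; only terms shared by both documents contribute to the dot product."""
    wa, wb = term_weights(doc_a), term_weights(doc_b)
    dot = sum(wa[t] * wb[t] for t in wa.keys() & wb.keys())
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(round(similarity("Cats chase mice. Mice fear cats.",
                       "Dogs chase cats. Cats avoid dogs."), 3))  # prints 0.5
```

In the paper the matching terms are words or phrases judged to carry sentence semantics; swapping in such an analysis would only change `term_weights`, while the similarity over matching terms stays the same shape.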

Book

Multi-Relational Data Mining

Arno Knobbe

01 Jan 2006

TL;DR: In this paper, the authors present a framework and an architecture that provide a generalization from one-table mining to multiple table mining, i.e., to support mining on full relational databases.

Abstract: An important aspect of data mining algorithms and systems is that they should scale well to large databases. A consequence of this is that most data mining tools are based on machine learning algorithms that work on data in attribute-value format. Experience has proven that such "single-table" mining algorithms indeed scale well. The downside of this format is, however, that more complex patterns are simply not expressible in this format and, thus, cannot be discovered. One way to enlarge the expressiveness is to generalize, as in ILP, from one-table mining to multiple table mining, i.e., to support mining on full relational databases. The key step in such a generalization is to ensure that the search space does not explode and that efficiency and, thus, scalability are maintained. In this paper we present a framework and an architecture that provide such a generalization. In this framework, the semantic information in the database schema (e.g., foreign keys) is exploited to prune the search space, and in the architecture, database primitives are defined to ensure efficiency. Moreover, the framework induces a canonical generalization of algorithms, i.e., if the generalized algorithms are run on a single table database, they give the same results as their single-table counterparts. The framework is illustrated by the Warmr algorithm, which is a multi-relational generalization of the Apriori algorithm.
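
Since Warmr is described as a multi-relational generalization of Apriori, a sketch of the single-table base case may help. The code below is a plain, unoptimized Apriori over transactions represented as Python sets (an assumption; it does not implement Warmr's relational search or the database primitives): it grows frequent itemsets level by level, joining frequent (k-1)-itemsets and pruning any candidate that has an infrequent subset before counting support.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_support transactions."""

    def frequent_only(candidates: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        # One pass over the data counts the support of every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = frequent_only({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset (Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = frequent_only(candidates)
        result.update(frequent)
        k += 1
    return result


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, support in sorted(apriori(baskets, 2).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what keeps the search space from exploding; the abstract's point is that a multi-relational version must preserve that property while also exploiting schema information such as foreign keys.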

Book Chapter

Don't be afraid of simpler patterns

Björn Bringmann¹, Albrecht Zimmermann¹, Luc De Raedt¹, Siegfried Nijssen¹

University of Freiburg¹

18 Sep 2006

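TL;DR: This paper provides empirical answers to two questions: how does the expressive power of the pattern language affect the computational cost, and what is the trade-off between that expressiveness and the predictive accuracy of the learned model?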

Abstract: This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. This trade-off is investigated in the context of correlated pattern mining, which is concerned with finding the k-best patterns according to a convex criterion, for the pattern languages of itemsets, multi-itemsets, sequences, trees and graphs. The criteria used in our investigation are the typical ones in data mining, computational cost and predictive accuracy, and the domain is that of mining molecular graph databases. More specifically, we provide empirical answers to the following questions: how does the expressive power of the language affect the computational cost, and what is the trade-off between expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining in which the results of mining a simpler pattern language are employed as a starting point for mining in a more complex one. This stepwise approach typically leads to significant speed-ups (up to a factor of 1000) for mining graphs.
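
Correlated pattern mining scores each pattern by a convex criterion computed from its contingency table against the class label; chi-square is a common choice. The following brute-force sketch (assumed data layout: a list of (itemset, label) pairs; patterns limited to itemsets of up to `max_len` items) returns the k highest-scoring itemsets. It illustrates the scoring only and does not implement the paper's stepwise approach or its richer pattern languages.

```python
from itertools import combinations
from typing import List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of the 2x2 table [[a, b], [c, d]].

    Rows: pattern present / absent. Columns: positive / negative class.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_k_correlated(data: List[Tuple[Set[str], bool]], k: int,
                     max_len: int = 2) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every itemset of up to max_len items and keep the k best."""
    items = sorted({i for itemset, _ in data for i in itemset})
    scored = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            pattern = set(combo)
            covered = [label for itemset, label in data if pattern <= itemset]
            uncovered = [label for itemset, label in data if not pattern <= itemset]
            a, b = covered.count(True), covered.count(False)
            c, d = uncovered.count(True), uncovered.count(False)
            scored.append((chi_square(a, b, c, d), combo))
    # Highest score first; ties broken by the pattern itself for a stable order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]


examples = [({"x", "y"}, True), ({"x"}, True), ({"y"}, False), ({"z"}, False)]
print(top_k_correlated(examples, k=1))  # prints [(4.0, ('x',))]
```

Exhaustive scoring is exactly what becomes expensive for trees and graphs; convexity of the criterion is what lets real miners bound scores and prune, which this sketch does not attempt.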

Journal Article

Domain-Driven Data Mining: A Practical Methodology

Longbing Cao¹, Chengqi Zhang¹

University of Technology, Sydney¹

01 Oct 2006-International Journal of Data Warehousing and Mining

TL;DR: This article proposes a practical data mining methodology referred to as domain-driven data mining, which targets actionable knowledge discovery in a constrained environment to satisfy user preferences, and illustrates it with examples of mining actionable correlations in the Australian Stock Exchange.

Abstract: Extant data mining is based on data-driven methodologies. It either views data mining as an autonomous data-driven, trial-and-error process or only analyzes business issues in an isolated, case-by-case manner. As a result, the knowledge discovered is very often of little interest for real business needs. Therefore, this article proposes a practical data mining methodology referred to as domain-driven data mining, which targets actionable knowledge discovery in a constrained environment for satisfying user preferences. Domain-driven data mining consists of a DDID-PD framework that considers key components such as constraint-based context, integration of domain knowledge, human-machine cooperation, in-depth mining, actionability enhancement, and an iterative refinement process. We also illustrate some examples of mining actionable correlations in the Australian Stock Exchange, which show that domain-driven data mining has the potential to further improve the actionability of patterns for practical use by industry and business.

Proceedings Article

TRIPS and TIDES: new algorithms for tree mining

Shirish Tatikonda¹, Srinivasan Parthasarathy¹, Tahsin Kurc¹

Ohio State University¹

06 Nov 2006

TL;DR: This paper proposes novel algorithms to mine frequent subtrees from a database of rooted trees, achieving up to several orders of magnitude speedup on real datasets compared with state-of-the-art tree mining algorithms.

Abstract: Recent research in data mining has progressed from mining frequent itemsets to more general and structured patterns like trees and graphs. In this paper, we address the problem of frequent subtree mining that has proven to be viable in a wide range of applications such as bioinformatics, XML processing, computational linguistics, and web usage mining. We propose novel algorithms to mine frequent subtrees from a database of rooted trees. We evaluate the use of two popular sequential encodings of trees to systematically generate and evaluate the candidate patterns. The proposed approach is very generic and can be used to mine embedded or induced subtrees that can be labeled, unlabeled, ordered, unordered, or edge-labeled. Our algorithms are highly cache-conscious in nature because of the compact and simple array-based data structures we use. Typically, L1 and L2 hit rates above 99% are observed. Experimental evaluation showed that our algorithms can achieve up to several orders of magnitude speedup on real datasets when compared to state-of-the-art tree mining algorithms.
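
The abstract refers to sequential encodings of trees without defining them. As an illustration of the general idea, this sketch uses one widely used encoding, the depth-first label sequence with a backtrack marker, and shows that it decodes back into the same rooted, ordered, labeled tree. The abstract does not say whether TRIPS or TIDES uses exactly this form; the tuple tree representation and the `$` marker are assumptions.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # (label, ordered list of child trees)
BACKTRACK = "$"  # assumed marker; must not occur as a node label


def encode(tree: Tree) -> List[str]:
    """Pre-order labels, with BACKTRACK emitted each time we return to a parent."""
    label, children = tree
    sequence = [label]
    for child in children:
        sequence.extend(encode(child))
        sequence.append(BACKTRACK)
    return sequence


def decode(sequence: List[str]) -> Tree:
    """Rebuild the rooted ordered tree from its depth-first encoding."""
    root: Tree = (sequence[0], [])
    stack = [root]
    for symbol in sequence[1:]:
        if symbol == BACKTRACK:
            stack.pop()
        else:
            node: Tree = (symbol, [])
            stack[-1][1].append(node)  # attach as the next child of the current node
            stack.append(node)
    return root


tree = ("A", [("B", [("C", [])]), ("D", [])])
print(" ".join(encode(tree)))  # prints A B C $ $ D $
```

Because the encoding is a flat array of symbols, candidate subtrees can be generated and compared as sequences, which fits the abstract's emphasis on compact, cache-friendly array-based structures.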

Book Chapter

Towards a general framework for data mining

Sašo Džeroski¹

Jožef Stefan Institute¹

18 Sep 2006

TL;DR: This paper lays out some basic concepts, starting with (structured) data and generalizations and continuing with data mining tasks and basic components of data mining algorithms, and discusses how these components would fit in the overall framework and in particular into a language for data mining and knowledge discovery.

Abstract: In this paper, we address the ambitious task of formulating a general framework for data mining. We discuss the requirements that such a framework should fulfill: It should elegantly handle different types of data, different data mining tasks, and different types of patterns/models. We also discuss data mining languages and what they should support: this includes the design and implementation of data mining algorithms, as well as their composition into nontrivial multistep knowledge discovery scenarios relevant for practical application. We proceed by laying out some basic concepts, starting with (structured) data and generalizations (e.g., patterns and models) and continuing with data mining tasks and basic components of data mining algorithms (i.e., refinement operators, distances, features and kernels). We next discuss how to use these concepts to formulate constraint-based data mining tasks and design generic data mining algorithms. We finally discuss how these components would fit in the overall framework and in particular into a language for data mining and knowledge discovery.

Book

Foundations and novel approaches in data mining

Tsau Young Lin, Setsuo Ohsuga, Churn-Jung Liau, Xiaohua Hu

01 Jan 2006

TL;DR: The book is organized into Part I: Theoretical Foundations, Part II: Novel Approaches, and Part III: Novel Applications.

Abstract: From the contents:
Part I: Theoretical Foundations. Commonsense Causal Modeling in the Data Mining Context. Definability of Association Rules in Predicate Calculus. A Measurement-Theoretic Foundation of Rule Interestingness Evaluation. Statistical Independence as Linear Dependence in a Contingency Table. Foundations of Classification.
Part II: Novel Approaches. SVM-OD: SVM Method to Detect Outliers. Extracting Rules from Incomplete Decision Systems: System ERID. Mining for Patterns Based on Contingency Tables by KL-Miner - First Experience. Knowledge Discovery in Fuzzy Databases Using Attribute-Oriented Induction. Rough Set Strategies to Data with Missing Attribute Values. Privacy-Preserving Collaborative Data Mining.
Part III: Novel Applications. Research Issues in Web Structural Delta Mining. Workflow Reduction for Reachable-path Rediscovery in Workflow Mining. Principal Component-based Anomaly Detection Scheme. Making Better Sense of the Demographic Data Value in the Data Mining Procedure.

Chapter 3 Lexical, terminological and ontological resources for biological text mining

Olivier Bodenreider

01 Jan 2006

TL;DR: Although a significant part of current text mining efforts focuses on the analysis of documents related to molecular biology, the use of lexical, terminological and ontological resources is mentioned in research systems developed for the analysis of clinical narratives or the biological literature.

Abstract: Biomedical terminologies and ontologies are frequently described as enabling resources in text mining systems [e.g., 1, 2, 3]. These resources are used to support tasks such as entity recognition (i.e., the identification of biomedical entities in text) and relation extraction (i.e., the identification of relationships among biomedical entities). Although a significant part of current text mining efforts focuses on the analysis of documents related to molecular biology, the use of lexical, terminological and ontological resources is mentioned in research systems developed for the analysis of clinical narratives (e.g., MedSyndikate [4]) or the biological literature (e.g., BioRAT [5], GeneScene [6], EMPathIE [7] and PASTA [7]). Of note, some systems initially developed for extracting clinical information have later been adapted to extract relations among biological entities (e.g., MedLEE [8] / GENIES [9], SemRep / SemGen [10]). Commercial systems such as TeSSI also make use of such resources.
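
Dictionary-based entity recognition, one of the tasks these resources support, can be sketched as greedy longest-match lookup of token n-grams against a lexicon. The toy lexicon and its semantic types below are hypothetical placeholders for entries drawn from a real terminology; production systems add normalization, variant generation, and disambiguation that this sketch omits.

```python
import re
from typing import Dict, List, Tuple


def tag_entities(text: str, lexicon: Dict[str, str], max_len: int = 5) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token n-grams in a lowercase lexicon."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append((phrase, lexicon[phrase]))
                i += n
                break
        else:  # no lexicon entry starts at token i
            i += 1
    return found


# Hypothetical lexicon: term -> semantic type.
lexicon = {
    "tumor necrosis factor": "protein",
    "apoptosis": "biological process",
    "tumor": "disease",
}
print(tag_entities("Tumor necrosis factor induces apoptosis in tumor cells.", lexicon))
```

Preferring the longest match is why "tumor necrosis factor" is tagged as one protein rather than as the shorter entry "tumor"; the later, standalone "tumor" still matches on its own.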

Proceedings Article

Applying Data Mining to Pseudo-Relevance Feedback for High Performance Text Retrieval

Xiangji Huang¹, Yan Huang¹, Miao Wen¹, Aijun An¹, Yang Liu¹, Josiah Poon²

York University¹, University of Sydney²

18 Dec 2006

TL;DR: The data mining based feedback method is evaluated on the TREC HARD data set, and the results show that data mining can be successfully applied to improve text retrieval performance.

Abstract: In this paper, we investigate the use of data mining, in particular the text classification and co-training techniques, to identify more relevant passages based on a small set of labeled passages obtained from the blind feedback of a retrieval system. The data mining results are used to expand query terms and to re-estimate some of the parameters used in a probabilistic weighting function. We evaluate the data mining based feedback method on the TREC HARD data set. The results show that data mining can be successfully applied to improve the text retrieval performance. We report our experimental findings in detail.
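
The general pseudo-relevance feedback loop can be sketched as: retrieve, treat the top-k results as relevant, and add their most frequent new terms to the query. The paper uses text classification and co-training rather than raw term counts, so this is a simplified stand-in; the overlap-based ranking (used in place of a probabilistic weighting function such as BM25), the stopword list, and the parameters `k` and `m` are assumptions.

```python
import re
from collections import Counter
from typing import List

STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on"}  # assumed toy list


def tokenize(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def expand_query(query: str, passages: List[str], k: int = 2, m: int = 2) -> List[str]:
    """Append the m most frequent non-query terms from the top-k retrieved passages."""
    q_terms = tokenize(query)
    q_set = set(q_terms)
    # Initial retrieval: rank by count of query-term occurrences (stand-in for BM25).
    ranked = sorted(passages, key=lambda p: -sum(1 for w in tokenize(p) if w in q_set))
    feedback: Counter = Counter()
    for passage in ranked[:k]:  # blind feedback: assume the top k are relevant
        feedback.update(w for w in tokenize(passage) if w not in q_set)
    best = sorted(feedback.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
    return q_terms + [w for w, _ in best]


passages = [
    "Solar panels convert solar energy into electricity.",
    "Energy storage batteries store electricity from solar panels.",
    "The stock market fell on energy prices.",
]
print(expand_query("solar energy", passages))
# prints ['solar', 'energy', 'electricity', 'panels']
```

The risk this loop carries, and the reason the paper adds classification on labeled feedback passages, is that any non-relevant passage in the top k feeds its own vocabulary back into the query.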

Journal Article

Multimedia Data Mining and Knowledge Discovery

Valery A. Petrushin¹, Latifur Khan

Accenture¹

01 Dec 2006-Journal of Electronic Imaging

TL;DR: A Time-Constrained Sequential Pattern Mining for Extracting Semantic Events in Videos and a Data Mining Approach to Expressive Music Performance Modeling are introduced.

Abstract: into Multimedia Data Mining and Knowledge Discovery.- Multimedia Data Mining: An Overview.- Multimedia Data Exploration and Visualization.- A New Hierarchical Approach for Image Clustering.- Multiresolution Clustering of Time Series and Application to Images.- Mining Rare and Frequent Events in Multi-camera Surveillance Video.- Density-Based Data Analysis and Similarity Search.- Feature Selection for Classification of Variable Length Multiattribute Motions.- Multimedia Data Indexing and Retrieval.- FAST: Fast and Semantics-Tailored Image Retrieval.- New Image Retrieval Principle: Image Mining and Visual Ontology.- Visual Alphabets: Video Classification by End Users.- Multimedia Data Modeling and Evaluation.- Cognitively Motivated Novelty Detection in Video Data Streams.- Video Event Mining via Multimodal Content Analysis and Classification.- Exploiting Spatial Transformations for Identifying Mappings in Hierarchical Media Data.- A Novel Framework for Semantic Image Classification and Benchmark Via Salient Objects.- Extracting Semantics Through Dynamic Context.- Mining Image Content by Aligning Entropies with an Exemplar.- More Efficient Mining Over Heterogeneous Data Using Neural Expert Networks.- A Data Mining Approach to Expressive Music Performance Modeling.- Applications and Case Studies.- Supporting Virtual Workspace Design Through Media Mining and Reverse Engineering.- A Time-Constrained Sequential Pattern Mining for Extracting Semantic Events in Videos.- Multiple-Sensor People Localization in an Office Environment.- Multimedia Data Mining Framework for Banner Images.- Analyzing User's Behavior on a Video Database.- On SVD-Free Latent Semantic Indexing for Iris Recognition of Large Databases.- Mining Knowledge in Computer Tomography Image Databases.

Journal Article

Flexible online association rule mining based on multidimensional pattern relations

Ching-Yao Wang¹, Shian-Shyong Tseng¹, Tzung-Pei Hong²

National Chiao Tung University¹, National University of Kaohsiung²

01 Jun 2006-Information Sciences

TL;DR: This work proposes a relation called the multidimensional pattern relation to structurally and systematically store context and mining information for later analysis, and develops an online mining approach called three-phase online association rule mining (TOARM) based on this proposed multidimensional pattern relation.

Proceedings Article

Temporal Data Mining in Dynamic Feature Spaces

Brent Wenerstrom, Christophe Giraud-Carrier¹

Brigham Young University¹

18 Dec 2006

TL;DR: FAE, an incremental ensemble approach to mining data subject to concept drift, is presented, and empirical results on large data streams demonstrate promise.

Abstract: Many interesting real-world applications for temporal data mining are hindered by concept drift. One particular form of concept drift is characterized by changes to the underlying feature space. Seemingly little has been done in this area. This paper presents FAE, an incremental ensemble approach to mining data subject to such concept drift. Empirical results on large data streams demonstrate promise.

Book

Data Mining: Theory, Methodology, Techniques, and Applications

Graham J. Williams¹, Simeon J. Simoff²

Australian Taxation Office¹, University of Western Sydney²

01 Jan 2006

TL;DR: A Data Mining Approach to Analyze the Effect of Cognitive Style and Subjective Emotion on the Accuracy of Time-Series Forecasting and a Multi-level Framework for the Analysis of Sequential Data are presented.

Abstract: 1: State-of-the-Art in Research.- Generality Is Predictive of Prediction Accuracy.- Visualisation and Exploration of Scientific Data Using Graphs.- A Case-Based Data Mining Platform.- Consolidated Trees: An Analysis of Structural Convergence.- K Nearest Neighbor Edition to Guide Classification Tree Learning: Motivation and Experimental Results.- Efficiently Identifying Exploratory Rules' Significance.- Mining Value-Based Item Packages - An Integer Programming Approach.- Decision Theoretic Fusion Framework for Actionability Using Data Mining on an Embedded System.- Use of Data Mining in System Development Life Cycle.- Mining MOUCLAS Patterns and Jumping MOUCLAS Patterns to Construct Classifiers.- A Probabilistic Geocoding System Utilising a Parcel Based Address File.- Decision Models for Record Linkage.- Intelligent Document Filter for the Internet.- Informing the Curious Negotiator: Automatic News Extraction from the Internet.- Text Mining for Insurance Claim Cost Prediction.- An Application of Time-Changing Feature Selection.- A Data Mining Approach to Analyze the Effect of Cognitive Style and Subjective Emotion on the Accuracy of Time-Series Forecasting.- A Multi-level Framework for the Analysis of Sequential Data.- 2: State-of-the-Art in Applications.- Hierarchical Hidden Markov Models: An Application to Health Insurance Data.- Identifying Risk Groups Associated with Colorectal Cancer.- Mining Quantitative Association Rules in Protein Sequences.- Mining X-Ray Images of SARS Patients.- The Scamseek Project - Text Mining for Financial Scams on the Internet.- A Data Mining Approach for Branch and ATM Site Evaluation.- The Effectiveness of Positive Data Sharing in Controlling the Growth of Indebtedness in Hong Kong Credit Card Industry.

Journal Article

From data mining to behavior mining

Zhengxin Chen¹

University of Nebraska Omaha¹

01 Dec 2006-International Journal of Information Technology and Decision Making

TL;DR: The shift from mining simple data to mining complex data signals a new stage in the research and practice of knowledge discovery, termed behavior mining, which also has the potential to unify some other recent activities in data mining.

Abstract: Knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to deal with this challenge because recent developments in data mining have shown an increasing interest in mining complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships of the data along with the data itself (rather than focusing on the data alone), complex data injects semantics into the mining process, thus enhancing its potential to contribute to the knowledge economy. Since the relationships between the data reveal certain behavioral aspects underlying the plain data, this shift of mining from simple data to complex data signals a fundamental change to a new stage in the research and practice of knowledge discovery, which can be termed behavior mining. Behavior mining also has the potential to unify some other recent activities in data mining. We discuss important aspects of behavior mining and its implications for the future of data mining.

Journal Article•DOI•

Expressive power of an algebra for data mining

Toon Calders, Laks V. S. Lakshmanan¹, Raymond T. Ng¹, Jan Paredaens²•Institutions (2)

University of British Columbia¹, University of Antwerp²

01 Dec 2006-ACM Transactions on Database Systems

TL;DR: A database model and an algebra for data mining are presented, based on the 3W-model introduced by Johnson et al.

Abstract: The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what is an appropriate foundation for data mining that can accommodate disparate mining tasks? We address this problem by presenting a database model and an algebra for data mining. The database model is based on the 3W-model introduced by Johnson et al. [2000]. This model relied on black-box mining operators. A main contribution of this article is to open up these black boxes by using generic operators in a data mining algebra. Two key operators in this algebra are regionize, which creates regions (or models) from data tuples, and a restricted form of looping called mining loop. The resulting data mining algebra, MA, is then studied, and properties concerning its expressive power and complexity are established. We present results in three directions: (1) expressiveness of the mining algebra; (2) relations with alternative frameworks; and (3) interactions between regionize and mining loop.
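The abstract names two operators, regionize and a restricted mining loop. As a rough intuition only, the toy Python sketch that follows treats a region as a conjunction of attribute=value constraints and uses a bounded, level-wise loop to keep regions that cover at least a minimum number of tuples. The function names mirror the abstract, but their definitions here are hypothetical and are not the formal semantics of the algebra MA.

```python
from itertools import combinations

def regionize(data, attrs):
    """Hypothetical regionize: group tuples into regions, each described by a
    conjunction of attribute=value constraints over `attrs`."""
    regions = {}
    for t in data:
        key = tuple(sorted((a, t[a]) for a in attrs))
        regions.setdefault(key, []).append(t)
    return regions

def mining_loop(data, attributes, min_support):
    """Hypothetical bounded loop: consider constraint sets of growing size and
    keep regions that cover at least `min_support` tuples."""
    frequent = {}
    for size in range(1, len(attributes) + 1):
        found = False
        for attrs in combinations(attributes, size):
            for key, members in regionize(data, attrs).items():
                if len(members) >= min_support:
                    frequent[key] = len(members)
                    found = True
        # Support can only shrink as constraints are added, so if no region of
        # this size is frequent, no larger region can be frequent either.
        if not found:
            break
    return frequent

data = [
    {"city": "A", "product": "x"},
    {"city": "A", "product": "x"},
    {"city": "B", "product": "x"},
]
print(mining_loop(data, ["city", "product"], 2))
```

The loop is bounded by the number of attributes, which loosely echoes the idea of a restricted form of looping rather than unbounded iteration.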

Journal Article•

Mining Association Rules in Temporal Document Collections

Kjetil Nørvåg, Trond Oivind Eriksen, Kjell-Inge Skogstad

01 Jan 2006-Lecture Notes in Computer Science

TL;DR: In this article, the authors describe the various steps in the temporal text mining process, including data cleaning, text refinement, temporal association rule mining and rule post-processing, and describe the Temporal Text Mining Testbench.

Abstract: In this paper we describe how to mine association rules in temporal document collections. We describe how to perform the various steps in the temporal text mining process, including data cleaning, text refinement, temporal association rule mining and rule post-processing. We also describe the Temporal Text Mining Testbench, which is a user-friendly and versatile tool for performing temporal text mining, and some results from using this tool.
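The four steps listed in the abstract (data cleaning, text refinement, temporal association rule mining, rule post-processing) can be illustrated with a minimal Python sketch. It assumes documents arrive as (period, text) pairs, treats "temporal" simply as mining each time period separately, and mines only single-term rules X -> Y with support and confidence thresholds. The stopword list, thresholds and function names are hypothetical, and this is not the Temporal Text Mining Testbench.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

STOPWORDS = {"the", "a", "of", "and", "in", "to", "is"}  # hypothetical minimal list

def clean(text):
    """Data cleaning: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def refine(tokens):
    """Text refinement: drop stopwords and repeated terms within a document."""
    return {t for t in tokens if t not in STOPWORDS}

def mine_period(docs, min_support, min_confidence):
    """Mine single-term rules X -> Y from one period's documents."""
    n = len(docs)
    item_count, pair_count = Counter(), Counter()
    for terms in docs:
        item_count.update(terms)
        pair_count.update(combinations(sorted(terms), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

def temporal_rules(collection, min_support=0.5, min_confidence=0.8):
    """Group (period, text) pairs by period, mine each period, then
    post-process by sorting rules on confidence, then support."""
    by_period = defaultdict(list)
    for period, text in collection:
        by_period[period].append(refine(clean(text)))
    return {
        p: sorted(mine_period(docs, min_support, min_confidence),
                  key=lambda r: (-r[3], -r[2], r[0], r[1]))
        for p, docs in sorted(by_period.items())
    }
```

Comparing the rule sets of consecutive periods (for example, rules that appear in one period but not the next) is one simple way such per-period output could be post-processed further.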

Journal Article•DOI•

Incorporating domain knowledge into data mining process: An ontology based framework

Pan Ding¹, Shen Jun-yi¹, Zhou Mu-xin²•Institutions (2)

Xi'an Jiaotong University¹, Shanghai Jiao Tong University²

01 Jan 2006-Wuhan University Journal of Natural Sciences

TL;DR: A novel model for continuous data mining in an evolving environment is proposed, in which valid mining task schedules are generated, autonomous and local mining are executed periodically, and previous results are merged and refined.

Abstract: With the explosive growth of available data, there is an urgent need to develop continuous data mining that markedly reduces manual interaction. A novel model for data mining in an evolving environment is proposed. First, valid mining task schedules are generated; then autonomous and local mining are executed periodically; finally, previous results are merged and refined. The framework based on the model creates a communication mechanism that incorporates domain knowledge into the continuous process through an ontology service. Through the ontology, local and merge mining are transparent to the end user and to heterogeneous data sources. Experiments suggest that the framework should be useful in guiding the continuous mining process.
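The schedule, local mining and merge cycle can be sketched in Python under simple assumptions: each period delivers a batch of transactions, local mining counts all itemsets up to a small size, and merging sums the counts before a final support threshold is applied. Full local counts are kept on purpose: if each batch kept only its locally frequent itemsets, the counts from batches where an itemset fell below the threshold would be lost and the merged support would be underestimated. The ontology service is not modelled, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def local_mine(batch, max_size=2):
    """Local mining on one period's batch: count every itemset up to max_size.
    No local threshold is applied, so merging stays exact."""
    counts = Counter()
    for transaction in batch:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts, len(batch)

def merge(state, local):
    """Merge step: add a batch's itemset counts and size to the running state."""
    counts, n = state
    local_counts, local_n = local
    return counts + local_counts, n + local_n

def refine(state, min_support):
    """Refine step: keep itemsets whose merged relative support meets the threshold."""
    counts, n = state
    return {s: c / n for s, c in counts.items() if n and c / n >= min_support}

def run_periodically(batches, min_support, max_size=2):
    """Hypothetical schedule: mine each period's batch locally, then merge."""
    state = (Counter(), 0)
    for batch in batches:
        state = merge(state, local_mine(batch, max_size))
    return refine(state, min_support)
```

For example, `run_periodically([[["a", "b"], ["a"]], [["a", "b"], ["c"]]], 0.5)` reports `a` with support 0.75 and both `b` and `{a, b}` with support 0.5, while `c` (0.25) is filtered out.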

Proceedings Article•DOI•

Mining Substructures in Protein Data

Fedja Hadzic, Tharam S. Dillon, Amandeep S. Sidhu, Elizabeth Chang¹, Henry Tan²•Institutions (2)

Curtin University¹, University of Technology, Sydney²

18 Dec 2006

TL;DR: The MB3 tree mining algorithm, which, given a minimum support threshold, efficiently discovers all frequent embedded subtrees from a database of rooted ordered labeled subtrees, is applied to the Prions database and shown to be a viable technique for mining protein data.

Abstract: In this paper we consider the "Prions" database, which describes protein instances stored for Human Prion Proteins. The Prions database can be viewed as a database of rooted ordered labeled subtrees. Mining frequent substructures from tree databases is an important task, and it has gained considerable interest in areas such as XML mining, bioinformatics and Web mining. This has given rise to many tree mining algorithms that can aid in structural comparisons, association rule discovery and, more generally, the mining of tree-structured knowledge representations. We previously developed the MB3 tree mining algorithm, which, given a minimum support threshold, efficiently discovers all frequent embedded subtrees from a database of rooted ordered labeled subtrees. In this work we apply the algorithm to the Prions database to extract the frequently occurring patterns, which in this case are of the induced subtree type. Obtaining the set of frequent induced subtrees from the Prions database can potentially reveal useful knowledge. We demonstrate this by analyzing the extracted frequent subtrees with respect to discovering interesting protein information. Furthermore, the minimum support threshold can be used as the controlling factor for answering specific queries posed on the Prions dataset. This approach is shown to be a viable technique for mining protein data.
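The difference between the embedded subtrees that MB3 discovers and the induced subtrees reported for the Prions data can be illustrated on the smallest non-trivial patterns, two-node subtrees. In the minimal Python sketch that follows, an induced pattern must be a parent-child edge, while an embedded pattern may be any ancestor-descendant pair; support is counted at most once per tree (transaction-based support, one of several possible support definitions). This brute-force enumeration only illustrates the two subtree types and the minimum support threshold; it is not the MB3 algorithm, and the tree encoding and labels are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a tree is (label, [child, ...]). Sibling order is
# kept in the lists but does not matter for two-node patterns.

def two_node_patterns(tree, embedded):
    """Return the set of (upper_label, lower_label) pairs occurring in one tree.
    embedded=False: parent-child edges only (induced two-node subtrees).
    embedded=True: any ancestor-descendant pair (embedded two-node subtrees)."""
    found = set()

    def walk(node, ancestors):
        label, children = node
        if ancestors:
            if embedded:
                found.update((a, label) for a in ancestors)
            else:
                found.add((ancestors[-1], label))
        for child in children:
            walk(child, ancestors + [label])

    walk(tree, [])
    return found

def frequent_two_node(db, min_support, embedded=False):
    """Keep patterns that occur in at least `min_support` trees of `db`."""
    support = Counter()
    for tree in db:
        support.update(two_node_patterns(tree, embedded))
    return {p: c for p, c in support.items() if c >= min_support}
```

On the two trees `("a", [("b", [("c", [])])])` and `("a", [("b", []), ("c", [])])` with a minimum support of 2, only the induced pattern a-b is frequent, whereas the embedded view also finds a-c, because `c` is a grandchild of `a` in the first tree. Raising or lowering `min_support` changes which patterns are reported, which is the sense in which the threshold acts as a controlling factor.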

Proceedings Article•DOI•

Invited Paper: Intelligent Data Mining Assistance via CBR and Ontologies

M. Charest, Sylvain Delisle, O. Cervantes, Y. Shen

04 Sep 2006

TL;DR: The realization of a hybrid data mining assistant, based on the CBR paradigm and the use of an ontology, is proposed in order to empower the user during the various phases of the data mining process.

Abstract: Most commercial data mining products provide a large number of models and tools for performing various data mining tasks, but few provide intelligent assistance for addressing many important decisions that must be considered during the mining process. In this paper, we propose the realization of a hybrid data mining assistant, based on the CBR paradigm and the use of an ontology, in order to empower the user during the various phases of the data mining process.
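To make the CBR paradigm concrete, here is a minimal Python sketch of the retrieve and reuse steps: past mining tasks are stored as cases described by numeric meta-features, and the assistant suggests whatever worked for the most similar past case under a range-normalised Euclidean distance. The meta-features, case contents and function names are hypothetical, and the ontology component described in the paper is not modelled.

```python
import math
from dataclasses import dataclass

@dataclass
class Case:
    features: dict       # hypothetical numeric meta-features of a past mining task
    recommendation: str  # hypothetical record of what worked for that task

def distance(a, b, ranges):
    """Range-normalised Euclidean distance over the features in `ranges`."""
    return math.sqrt(sum(((a[k] - b[k]) / ranges[k]) ** 2 for k in ranges))

def retrieve(case_base, query, k=1):
    """Retrieve step of the CBR cycle: the k cases most similar to the query."""
    ranges = {}
    for key in query:
        values = [c.features[key] for c in case_base] + [query[key]]
        ranges[key] = (max(values) - min(values)) or 1.0  # avoid division by zero
    return sorted(case_base, key=lambda c: distance(c.features, query, ranges))[:k]

def suggest(case_base, query):
    """Reuse step: propose the recommendation of the closest stored case."""
    return retrieve(case_base, query)[0].recommendation
```

Normalising each feature by its observed range keeps a large-valued feature such as a row count from dominating the distance; a fuller assistant would also revise and retain cases after each mining session.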

Book•

Data Mining VII: Data, Text And Web Mining And Their Business Applications

Alessandro Zanasi, Carlos Alberto Brebbia, N. F. F. Ebecken

23 Jun 2006

TL;DR: An edited volume on data, text and web mining and their business applications; its papers include a data mining approach to support the development of long-term load forecasting and a study of corporate bankruptcy prediction using data mining techniques.

Abstract: Section 1: Data preparation.- Nonlinear dimensionality reduction of large datasets for data exploration.- Text preparation through extended tokenization.- Section 2: Clustering technologies.- A method for association rule quality evaluation based on information theory.- K-means algorithm and its application for clustering companies listed in Zhejiang province.- Fuzzy geo-processing for characterization of social groups: an application to a Brazilian mid-size city.- Dynamic classification: economic welfare growth in the EU during 1995-2004.- Cluster analysis of 3D seismic data for oil and gas exploration.- Clustering of time series using a similarity between segments and bands determined by patterns of technical analysis.- The CLUSTER3 system for goal-oriented conceptual clustering: method and preliminary results.- Section 3: Categorisation methods.- A neural-networks associative classification method for association rule mining.- An efficient Bayesian network approach for discovering interesting patterns.- Kernel Discriminant Analysis and information complexity: advanced models for micro-data mining and micro-marketing solutions.- Section 4: MS SQL Server data mining (Special session by C. L. Curotto and N. F. F. Ebecken).- Mining cross-predicting stochastic ARMA time series in SQL server 2005.- Stability analysis of time series forecasting with ART models.- Multi-relational data mining in Microsoft SQL Server 2005.- Electrical thunderstorm nowcasting using lightning data mining.- Section 5: Text mining.- Computational system for the textual processing of industrial patents.- Using text mining to understand the call center customers' claims.- A neural-based text summarization system.- Analysis and development of latent semantic indexing techniques for information retrieval.- Section 6: Web mining.- High performance environment for knowledge discovering in Portuguese language texts in the Web.- On the relationship between click rate and relevance for search engines.- Selecting clickstream data mining plans using a case-based reasoning application.- Web page recommendation using a stochastic process model.- A new algorithm to measure relevance among Web pages.- A Web Mining process for e-Knowledge services.- Section 7: Customer relationship management.- A study of customer relationship management (CRM) on apparel European Web sites.- Maximum resolution dichotomy for customer relations management.- Telco churn analysis classification using a wavelet and RBF approach.- Section 8: Applications in science and engineering.- Protein Ontology Project: 2006 updates.- Evidence-based medicine: data mining and pharmacoepidemiology research.- SEQUEST: mining frequent subsequences using DMA-Strips.- Section 9: Applications in business, industry and government.- Traceability in the food-sector: the state of the art in a North Eastern Italian region.- A data mining approach to support the development of long-term load forecasting.- Corporate bankruptcy prediction using data mining techniques.- A Text Mining based content gathering system as strategic support for SMEs.- Intelligent analysis tools for computer based assessments.- Information visualization for the taking of decisions.- Section 10: Information systems, strategies and methodologies.- Challenges in developing a cost-effective data warehouse for a tertiary institution in a developing country.- A proposal of information system architecture for public transport.- Local nulls in summarised mobile and distributed databases.- A Semantic Web Portal to construction knowledge exchange.- Ontological support to knowledge management in a hydrogeological information system.- Enterprise Intelligence Platform in the airline industry.- Improvement of generation change on SSE algorithm.