Showing papers on "Concept mining" published in 2006


Book
01 Dec 2006
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Abstract: 1. Introduction to text mining 2. Core text mining operations 3. Text mining preprocessing techniques 4. Categorization 5. Clustering 6. Information extraction 7. Probabilistic models for Information extraction 8. Preprocessing applications using probabilistic and hybrid approaches 9. Presentation-layer considerations for browsing and query refinement 10. Visualization approaches 11. Link analysis 12. Text mining applications Appendix Bibliography.

1,628 citations


Journal ArticleDOI
TL;DR: Sifting through vast collections of unstructured or semistructured data beyond the reach of data mining tools, text mining tracks information sources, links isolated concepts in distant documents, maps relationships between activities, and helps answer questions.
Abstract: Sifting through vast collections of unstructured or semistructured data beyond the reach of data mining tools, text mining tracks information sources, links isolated concepts in distant documents, maps relationships between activities, and helps answer questions.

408 citations


Journal ArticleDOI
TL;DR: An overview of temporal data mining techniques is presented, concentrating mainly on algorithms for pattern discovery in sequential data streams, and some recent results on the statistical analysis of pattern discovery methods are described.
Abstract: Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

346 citations


Journal ArticleDOI
TL;DR: By adding meaning to text, text mining techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.

314 citations


BookDOI
14 Nov 2006
TL;DR: This state-of-the-art survey is a must-have for advanced students, professionals, and researchers.
Abstract: Natural Language Processing and Text Mining not only discusses applications of Natural Language Processing techniques to certain Text Mining tasks, but also the converse, the use of Text Mining to assist NLP. It assembles diverse views from internationally recognized researchers and emphasizes caveats in the attempt to apply Natural Language Processing to text mining. This state-of-the-art survey is a must-have for advanced students, professionals, and researchers.

292 citations


Proceedings ArticleDOI
01 Oct 2006
TL;DR: MineBench is presented, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories such as clustering, classification, and association rule mining that will be of use to those looking to characterize and accelerate data mining workloads.
Abstract: Data mining constitutes an important class of scientific and commercial applications. Recent advances in data extraction techniques have created vast data sets, which require increasingly complex data mining algorithms to sift through them to generate meaningful information. The disproportionately slower rate of growth of computer systems has led to a sizeable performance gap between data mining systems and algorithms. The first step in closing this gap is to analyze these algorithms and understand their bottlenecks. With this knowledge, current computer architectures can be optimized for data mining applications. In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories such as clustering, classification, and association rule mining. We believe that MineBench will be of use to those looking to characterize and accelerate data mining workloads.

242 citations


Proceedings ArticleDOI
18 Dec 2006
TL;DR: The foundations for mining frequent tri-concepts, which extend the notion of closed itemsets to three-dimensional data to allow for mining folksonomies, are presented.
Abstract: In this paper, we present the foundations for mining frequent tri-concepts, which extend the notion of closed itemsets to three-dimensional data to allow for mining folksonomies. We provide a formal definition of the problem, and present an efficient algorithm for its solution as well as experimental results on a large real-world example.

145 citations


Book
04 May 2006
TL;DR: Developers will be able to tap into the bevy of information available online in ways they never thought possible and students will have a thorough understanding of the theory and practical application of text mining.
Abstract: Text mining offers a way for individuals and corporations to exploit the vast amount of information available on the Internet. Text Mining Application Programming teaches developers about the problems of managing unstructured text, and describes how to build tools for text mining using standard statistical methods from Artificial Intelligence and Operations Research. These tools can be used for a variety of fields, including law, business, and medicine. Key topics covered include information extraction, clustering, text categorization, searching the Web, summarization, and natural language query systems. The book explains the theory behind each topic and algorithm, and then provides a practical solution implementation with which developers and students can experiment. A wide variety of code is also included for developers to build their own custom solutions. After reading through this book, developers will be able to tap into the bevy of information available online in ways they never thought possible, and students will have a thorough understanding of the theory and practical application of text mining.

118 citations


01 Jan 2006
TL;DR: In this paper, the authors apply formal concept analysis to a program's history to identify cross-cutting concerns, which in turn helps migrate a system to a better design, possibly even an aspect-oriented design.
Abstract: As software evolves, new functionality sometimes no longer aligns with the original design, ending up scattered across a program. Aspect mining identifies such cross-cutting concerns in order to help migrate a system to a better design, maybe even to an aspect-oriented design. We address this task by applying formal concept analysis to a program's history: method calls added across many locations are likely to be cross-cutting. By taking this historical perspective, we introduce a new dimension to aspect mining. As we only analyse changes from one version to the next, the technique is independent of a system's total size and scales up to industrial-sized projects such as Eclipse.

102 citations
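
As a rough illustration of the formal concept analysis mentioned in the entry above, the following sketch enumerates the formal concepts of a tiny binary context, where a concept pairs a maximal set of objects (the extent) with the maximal set of attributes they all share (the intent). The context here (three method names and the calls they contain) is invented for illustration and does not come from the paper, and the brute-force enumeration is a naive textbook approach rather than the authors' technique.

```python
from itertools import combinations

# Hypothetical context: each object (a method) maps to its attributes
# (calls that appear in it). The names are illustrative only.
context = {
    "save": {"lock", "log"},
    "load": {"lock", "log"},
    "render": {"log"},
}


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return result


def objects_having(attributes, context):
    """Objects that carry every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

In this toy context the attribute `log` is shared by every object, which is the kind of widely scattered call that the abstract describes as likely to be cross-cutting. Enumerating all subsets is exponential in the number of objects, so this form is only suitable for very small contexts.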


Book ChapterDOI
13 Jul 2006
TL;DR: In this article, a method for mining frequently occurring objects and scenes from videos is presented, which is based on the class of frequent itemset mining algorithms, which have proven their efficiency in other domains, but have not been applied to video mining before.
Abstract: We present a method for mining frequently occurring objects and scenes from videos. Object candidates are detected by finding recurring spatial arrangements of affine covariant regions. Our mining method is based on the class of frequent itemset mining algorithms, which have proven their efficiency in other domains, but have not been applied to video mining before. In this work we show how to express vector-quantized features and their spatial relations as itemsets. Furthermore, a fast motion segmentation method is introduced as an attention filter for the mining algorithm. Results are shown on real world data consisting of music video clips.

68 citations


Proceedings ArticleDOI
18 Dec 2006
TL;DR: A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only, and it substantially enhances the clustering quality of sets of documents.
Abstract: Most text mining techniques are based on word and/or phrase analysis of the text. The statistical analysis of a term (word or phrase) frequency captures the importance of the term within a document. However, to achieve a more accurate analysis, the underlying mining technique should indicate terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The proposed mining model consists of a concept-based analysis of terms and a concept-based similarity measure. The term which contributes to the sentence semantics is analyzed with respect to its importance at the sentence and document levels. The model can efficiently find significant matching terms, either words or phrases, of the documents according to the semantics of the text. The similarity between documents relies on a new concept-based similarity measure which is applied to the matching terms between documents. Experiments using the proposed concept-based term analysis and similarity measure in text clustering are conducted. Experimental results demonstrate that the newly developed concept-based mining model enhances the clustering quality of sets of documents substantially.

Book
01 Jan 2006
TL;DR: In this paper, the authors present a framework and an architecture that provide a generalization from one-table mining to multiple table mining, i.e., to support mining on full relational databases.
Abstract: An important aspect of data mining algorithms and systems is that they should scale well to large databases. A consequence of this is that most data mining tools are based on machine learning algorithms that work on data in attribute-value format. Experience has proven that such "single-table" mining algorithms indeed scale well. The downside of this format is, however, that more complex patterns are simply not expressible in this format and, thus, cannot be discovered. One way to enlarge the expressiveness is to generalize, as in ILP, from one-table mining to multiple table mining, i.e., to support mining on full relational databases. The key step in such a generalization is to ensure that the search space does not explode and that efficiency and, thus, scalability are maintained. In this paper we present a framework and an architecture that provide such a generalization. In this framework the semantic information in the database schema, e.g., foreign keys, is exploited to prune the search space and, in the architecture, database primitives are defined to ensure efficiency. Moreover, the framework induces a canonical generalization of algorithms, i.e., if the generalized algorithms are run on a single table database, they give the same results as their single-table counterparts. The framework is illustrated by the Warmr algorithm, which is a multi-relational generalization of the Apriori algorithm.
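
For readers unfamiliar with the single-table baseline that the entry above generalizes, here is a minimal sketch of the classic Apriori level-wise search for frequent itemsets. It is not the Warmr algorithm or the paper's framework; the function name `apriori`, the toy transactions, and the support threshold are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its k-item subsets
        # were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps the search space from exploding: a candidate is counted only if every smaller subset was already frequent. The abstract describes a comparable concern in the multi-relational setting, where schema information such as foreign keys is used to prune the search.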

Book ChapterDOI
18 Sep 2006
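TL;DR: This paper provides empirical answers to two questions: how the expressive power of the pattern language affects the computational cost, and what the trade-off is between the expressiveness of the pattern language and the predictive accuracy of the learned model.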
Abstract: This paper investigates the trade-off between the expressiveness of the pattern language and the performance of the pattern miner in structured data mining. This trade-off is investigated in the context of correlated pattern mining, which is concerned with finding the k-best patterns according to a convex criterion, for the pattern languages of itemsets, multi-itemsets, sequences, trees and graphs. The criteria used in our investigation are the typical ones in data mining: computational cost and predictive accuracy and the domain is that of mining molecular graph databases. More specifically, we provide empirical answers to the following questions: how does the expressive power of the language affect the computational cost? and what is the trade-off between expressiveness of the pattern language and the predictive accuracy of the learned model? While answering the first question, we also introduce a novel stepwise approach to correlated pattern mining in which the results of mining a simpler pattern language are employed as a starting point for mining in a more complex one. This stepwise approach typically leads to significant speed-ups (up to a factor 1000) for mining graphs.

Journal ArticleDOI
TL;DR: This article proposes a practical data mining methodology referred to as domain-driven data mining, which targets actionable knowledge discovery in a constrained environment for satisfying user preference, and illustrates it with examples of mining actionable correlations in the Australian Stock Exchange.
Abstract: Extant data mining is based on data-driven methodologies. It either views data mining as an autonomous data-driven, trial-and-error process or only analyzes business issues in an isolated, case-by-case manner. As a result, the knowledge discovered is often of little interest to real business needs. Therefore, this article proposes a practical data mining methodology referred to as domain-driven data mining, which targets actionable knowledge discovery in a constrained environment for satisfying user preference. The domain-driven data mining consists of a DDID-PD framework that considers key components such as constraint-based context, integrating domain knowledge, human-machine cooperation, in-depth mining, actionability enhancement, and iterative refinement process. We also illustrate some examples in mining actionable correlations in the Australian Stock Exchange, which show that domain-driven data mining has the potential to further improve the actionability of patterns for practical use by industry and business.

Proceedings ArticleDOI
06 Nov 2006
TL;DR: This paper proposes novel algorithms to mine frequent subtrees from a database of rooted trees; on real datasets they can achieve up to several orders of magnitude speedup over state-of-the-art tree mining algorithms.
Abstract: Recent research in data mining has progressed from mining frequent itemsets to more general and structured patterns like trees and graphs. In this paper, we address the problem of frequent subtree mining that has proven to be viable in a wide range of applications such as bioinformatics, XML processing, computational linguistics, and web usage mining. We propose novel algorithms to mine frequent subtrees from a database of rooted trees. We evaluate the use of two popular sequential encodings of trees to systematically generate and evaluate the candidate patterns. The proposed approach is very generic and can be used to mine embedded or induced subtrees that can be labeled, unlabeled, ordered, unordered, or edge-labeled. Our algorithms are highly cache-conscious in nature because of the compact and simple array-based data structures we use. Typically, L1 and L2 hit rates above 99% are observed. Experimental evaluation showed that our algorithms can achieve up to several orders of magnitude speedup on real datasets when compared to state-of-the-art tree mining algorithms.

Book ChapterDOI
18 Sep 2006
TL;DR: This paper lays out some basic concepts, starting with (structured) data and generalizations and continuing with data mining tasks and basic components of data mining algorithms, and discusses how these components would fit in the overall framework and in particular into a language for data mining and knowledge discovery.
Abstract: In this paper, we address the ambitious task of formulating a general framework for data mining. We discuss the requirements that such a framework should fulfill: It should elegantly handle different types of data, different data mining tasks, and different types of patterns/models. We also discuss data mining languages and what they should support: this includes the design and implementation of data mining algorithms, as well as their composition into nontrivial multistep knowledge discovery scenarios relevant for practical application. We proceed by laying out some basic concepts, starting with (structured) data and generalizations (e.g., patterns and models) and continuing with data mining tasks and basic components of data mining algorithms (i.e., refinement operators, distances, features and kernels). We next discuss how to use these concepts to formulate constraint-based data mining tasks and design generic data mining algorithms. We finally discuss how these components would fit in the overall framework and in particular into a language for data mining and knowledge discovery.

BookDOI
01 Jan 2006
TL;DR: A collection organized into three parts: theoretical foundations, novel approaches, and novel applications of data mining.
Abstract: From the contents Part I: Theoretical Foundations. Commonsense Causal Modeling in the Data Mining Context. Definability of Association Rules in Predicate Calculus. A Measurement-Theoretic Foundation of Rule Interestingness Evaluation. Statistical Independence as Linear Dependence in a Contingency Table. Foundations of Classification.- Part II: Novel Approaches. SVM-OD: SVM Method to Detect Outliers. Extracting Rules from Incomplete Decision Systems: System ERID. Mining for Patterns Based on Contingency Tables by KL-Miner - First Experience. Knowledge Discovery in Fuzzy Databases Using Attribute-Oriented Induction. Rough Set Strategies to Data with Missing Attribute Values. Privacy-Preserving Collaborative Data Mining.- Part III: Novel Applications. Research Issues in Web Structural Delta Mining. Workflow Reduction for Reachable-path Rediscovery in Workflow Mining. Principal Component-based Anomaly Detection Scheme. Making Better Sense of the Demographic Data Value in the Data Mining Procedure.

01 Jan 2006
TL;DR: Although a significant part of current text mining efforts focuses on the analysis of documents related to molecular biology, the use of lexical, terminological and ontological resources is mentioned in research systems developed for the analysis of clinical narratives or the biological literature.
Abstract: Biomedical terminologies and ontologies are frequently described as enabling resources in text mining systems [e.g., 1, 2, 3]. These resources are used to support tasks such as entity recognition (i.e., the identification of biomedical entities in text) and relation extraction (i.e., the identification of relationships among biomedical entities). Although a significant part of current text mining efforts focuses on the analysis of documents related to molecular biology, the use of lexical, terminological and ontological resources is mentioned in research systems developed for the analysis of clinical narratives (e.g., MedSyndikate [4]) or the biological literature (e.g., BioRAT [5], GeneScene [6], EMPathIE [7] and PASTA [7]). Of note, some systems initially developed for extracting clinical information have later been adapted to extract relations among biological entities (e.g., MedLEE [8] / GENIES [9], SemRep / SemGen [10]). Commercial systems such as TeSSI also make use of such resources.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: A data mining based feedback method is evaluated on the TREC HARD data set, and the results show that data mining can be successfully applied to improve text retrieval performance.
Abstract: In this paper, we investigate the use of data mining, in particular the text classification and co-training techniques, to identify more relevant passages based on a small set of labeled passages obtained from the blind feedback of a retrieval system. The data mining results are used to expand query terms and to re-estimate some of the parameters used in a probabilistic weighting function. We evaluate the data mining based feedback method on the TREC HARD data set. The results show that data mining can be successfully applied to improve the text retrieval performance. We report our experimental findings in detail.

Journal ArticleDOI
TL;DR: A Time-Constrained Sequential Pattern Mining for Extracting Semantic Events in Videos and a Data Mining Approach to Expressive Music Performance Modeling are introduced.
Abstract: into Multimedia Data Mining and Knowledge Discovery.- Multimedia Data Mining: An Overview.- Multimedia Data Exploration and Visualization.- A New Hierarchical Approach for Image Clustering.- Multiresolution Clustering of Time Series and Application to Images.- Mining Rare and Frequent Events in Multi-camera Surveillance Video.- Density-Based Data Analysis and Similarity Search.- Feature Selection for Classification of Variable Length Multiattribute Motions.- Multimedia Data Indexing and Retrieval.- FAST: Fast and Semantics-Tailored Image Retrieval.- New Image Retrieval Principle: Image Mining and Visual Ontology.- Visual Alphabets: Video Classification by End Users.- Multimedia Data Modeling and Evaluation.- Cognitively Motivated Novelty Detection in Video Data Streams.- Video Event Mining via Multimodal Content Analysis and Classification.- Exploiting Spatial Transformations for Identifying Mappings in Hierarchical Media Data.- A Novel Framework for Semantic Image Classification and Benchmark Via Salient Objects.- Extracting Semantics Through Dynamic Context.- Mining Image Content by Aligning Entropies with an Exemplar.- More Efficient Mining Over Heterogeneous Data Using Neural Expert Networks.- A Data Mining Approach to Expressive Music Performance Modeling.- Applications and Case Studies.- Supporting Virtual Workspace Design Through Media Mining and Reverse Engineering.- A Time-Constrained Sequential Pattern Mining for Extracting Semantic Events in Videos.- Multiple-Sensor People Localization in an Office Environment.- Multimedia Data Mining Framework for Banner Images.- Analyzing User's Behavior on a Video Database.- On SVD-Free Latent Semantic Indexing for Iris Recognition of Large Databases.- Mining Knowledge in Computer Tomography Image Databases.

Journal ArticleDOI
TL;DR: This work proposes a relation called the multidimensional pattern relation to structurally and systematically store context and mining information for later analysis, and develops an online mining approach called three-phase online association rule mining (TOARM) based on this proposed multidimensional pattern relation.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: FAE, an incremental ensemble approach to mining data subject to concept drift, is presented, and empirical results on large data streams demonstrate promise.
Abstract: Many interesting real-world applications for temporal data mining are hindered by concept drift. One particular form of concept drift is characterized by changes to the underlying feature space. Seemingly little has been done in this area. This paper presents FAE, an incremental ensemble approach to mining data subject to such concept drift. Empirical results on large data streams demonstrate promise.

Book
01 Jan 2006
TL;DR: A Data Mining Approach to Analyze the Effect of Cognitive Style and Subjective Emotion on the Accuracy of Time-Series Forecasting and a Multi-level Framework for the Analysis of Sequential Data are presented.
Abstract: 1: State-of-the-Art in Research.- Generality Is Predictive of Prediction Accuracy.- Visualisation and Exploration of Scientific Data Using Graphs.- A Case-Based Data Mining Platform.- Consolidated Trees: An Analysis of Structural Convergence.- K Nearest Neighbor Edition to Guide Classification Tree Learning: Motivation and Experimental Results.- Efficiently Identifying Exploratory Rules' Significance.- Mining Value-Based Item Packages - An Integer Programming Approach.- Decision Theoretic Fusion Framework for Actionability Using Data Mining on an Embedded System.- Use of Data Mining in System Development Life Cycle.- Mining MOUCLAS Patterns and Jumping MOUCLAS Patterns to Construct Classifiers.- A Probabilistic Geocoding System Utilising a Parcel Based Address File.- Decision Models for Record Linkage.- Intelligent Document Filter for the Internet.- Informing the Curious Negotiator: Automatic News Extraction from the Internet.- Text Mining for Insurance Claim Cost Prediction.- An Application of Time-Changing Feature Selection.- A Data Mining Approach to Analyze the Effect of Cognitive Style and Subjective Emotion on the Accuracy of Time-Series Forecasting.- A Multi-level Framework for the Analysis of Sequential Data.- 2: State-of-the-Art in Applications.- Hierarchical Hidden Markov Models: An Application to Health Insurance Data.- Identifying Risk Groups Associated with Colorectal Cancer.- Mining Quantitative Association Rules in Protein Sequences.- Mining X-Ray Images of SARS Patients.- The Scamseek Project - Text Mining for Financial Scams on the Internet.- A Data Mining Approach for Branch and ATM Site Evaluation.- The Effectiveness of Positive Data Sharing in Controlling the Growth of Indebtedness in Hong Kong Credit Card Industry.

Journal ArticleDOI
TL;DR: The shift from mining simple data to mining complex data signals a new stage in the research and practice of knowledge discovery, termed behavior mining, which has the potential to unify some other recent activities in data mining.
Abstract: The knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to deal with this challenge because recent developments in data mining have shown an increasing interest in mining complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships of the data along with the data itself (rather than focusing on the data alone), complex data injects semantics into the mining process, thus enhancing the potential of making a better contribution to the knowledge economy. Since the relationships between the data reveal certain behavioral aspects underlying the plain data, this shift of mining from simple data to complex data signals a fundamental change to a new stage in the research and practice of knowledge discovery, which can be termed behavior mining. Behavior mining also has the potential of unifying some other recent activities in data mining. We discuss important aspects of behavior mining, and discuss its implications for the future of data mining.

Journal ArticleDOI
TL;DR: A database model and an algebra for data mining are presented, based on the 3W-model introduced by Johnson et al.
Abstract: The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what's an appropriate foundation for data mining, which can accommodate disparate mining tasks? We address this problem by presenting a database model and an algebra for data mining. The database model is based on the 3W-model introduced by Johnson et al. [2000]. This model relied on black box mining operators. A main contribution of this article is to open up these black boxes, by using generic operators in a data mining algebra. Two key operators in this algebra are regionize, which creates regions (or models) from data tuples, and a restricted form of looping called mining loop. Then the resulting data mining algebra MA is studied and properties concerning expressive power and complexity are established. We present results in three directions: (1) expressiveness of the mining algebra; (2) relations with alternative frameworks, and (3) interactions between regionize and mining loop.

Journal Article
TL;DR: In this article, the authors describe the various steps in the temporal text mining process, including data cleaning, text refinement, temporal association rule mining and rule post-processing, and describe the Temporal Text Mining Testbench.
Abstract: In this paper we describe how to mine association rules in temporal document collections. We describe how to perform the various steps in the temporal text mining process, including data cleaning, text refinement, temporal association rule mining and rule post-processing. We also describe the Temporal Text Mining Testbench, which is a user-friendly and versatile tool for performing temporal text mining, and some results from using this tool.

Journal ArticleDOI
TL;DR: A novel model for data mining in an evolving environment is proposed, in which valid mining task schedules are generated, autonomous and local mining are executed periodically, and previous results are merged and refined.
Abstract: With the explosive growth of available data, there is an urgent need to develop continuous data mining that markedly reduces manual interaction. A novel model for data mining in an evolving environment is proposed. First, some valid mining task schedules are generated; then autonomous and local mining are executed periodically; finally, previous results are merged and refined. The framework based on the model creates a communication mechanism to incorporate domain knowledge into the continuous process through an ontology service. The local and merge mining are transparent to the end user and to heterogeneous data sources through the ontology. Experiments suggest that the framework should be useful in guiding the continuous mining process.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: The MB3 tree mining algorithm, which given a minimum support threshold, efficiently discovers all frequent embedded subtrees from a database of rooted ordered labeled subtrees is applied to the Prions database and is shown to be a viable technique for mining protein data.
Abstract: In this paper we consider the "Prions" database that describes protein instances stored for Human Prion Proteins. The Prions database can be viewed as a database of rooted ordered labeled subtrees. Mining frequent substructures from tree databases is an important task and it has gained a considerable amount of interest in areas such as XML mining, Bioinformatics, Web mining etc. This has given rise to the development of many tree mining algorithms which can aid in structural comparisons, association rule discovery and in general mining of tree structured knowledge representations. Previously we have developed the MB3 tree mining algorithm, which given a minimum support threshold, efficiently discovers all frequent embedded subtrees from a database of rooted ordered labeled subtrees. In this work we apply the algorithm to the Prions database in order to extract the frequently occurring patterns, which in this case are of induced subtree type. Obtaining the set of frequent induced subtrees from the Prions database can potentially reveal some useful knowledge. This aspect will be demonstrated by providing an analysis of the extracted frequent subtrees with respect to discovering interesting protein information. Furthermore, the minimum support threshold can be used as the controlling factor for answering specific queries posed on the Prions dataset. This approach is shown to be a viable technique for mining protein data.

Proceedings ArticleDOI
04 Sep 2006
TL;DR: The realization of a hybrid data mining assistant, based on the CBR paradigm and the use of an ontology, is proposed in order to empower the user during the various phases of the data mining process.
Abstract: Most commercial data mining products provide a large number of models and tools for performing various data mining tasks, but few provide intelligent assistance for addressing many important decisions that must be considered during the mining process. In this paper, we propose the realization of a hybrid data mining assistant, based on the CBR paradigm and the use of an ontology, in order to empower the user during the various phases of the data mining process.

Book
23 Jun 2006
TL;DR: A collection of papers on data preparation, clustering, categorisation, text and Web mining, customer relationship management, and applications of data mining in science, engineering, business, industry and government.
Abstract:
Section 1: Data preparation. Nonlinear dimensionality reduction of large datasets for data exploration; Text preparation through extended tokenization.
Section 2: Clustering technologies. A method for association rule quality evaluation based on information theory; K-means algorithm and its application for clustering companies listed in Zhejiang province; Fuzzy geo-processing for characterization of social groups: an application to a Brazilian mid-size city; Dynamic classification: economic welfare growth in the EU during 1995-2004; Cluster analysis of 3D seismic data for oil and gas exploration; Clustering of time series using a similarity between segments and bands determined by patterns of technical analysis; The CLUSTER3 system for goal-oriented conceptual clustering: method and preliminary results.
Section 3: Categorisation methods. A neural-networks associative classification method for association rule mining; An efficient Bayesian network approach for discovering interesting patterns; Kernel Discriminant Analysis and information complexity: advanced models for micro-data mining and micro-marketing solutions.
Section 4: MS SQL Server data mining (Special session by C. L. Curotto and N. F. F. Ebecken). Mining cross-predicting stochastic ARMA time series in SQL server 2005; Stability analysis of time series forecasting with ART models; Multi-relational data mining in Microsoft SQL Server 2005; Electrical thunderstorm nowcasting using lightning data mining.
Section 5: Text mining. Computational system for the textual processing of industrial patents; Using text mining to understand the call center customers' claims; A neural-based text summarization system; Analysis and development of latent semantic indexing techniques for information retrieval.
Section 6: Web mining. High performance environment for knowledge discovering in Portuguese language texts in the Web; On the relationship between click rate and relevance for search engines; Selecting clickstream data mining plans using a case-based reasoning application; Web page recommendation using a stochastic process model; A new algorithm to measure relevance among Web pages; A Web Mining process for e-Knowledge services.
Section 7: Customer relationship management. A study of customer relationship management (CRM) on apparel European Web sites; Maximum resolution dichotomy for customer relations management; Telco churn analysis classification using a wavelet and RBF approach.
Section 8: Applications in science and engineering. Protein Ontology Project: 2006 updates; Evidence-based medicine: data mining and pharmacoepidemiology research; SEQUEST: mining frequent subsequences using DMA-Strips.
Section 9: Applications in business, industry and government. Traceability in the food-sector: the state of the art in a North Eastern Italian region; A data mining approach to support the development of long-term load forecasting; Corporate bankruptcy prediction using data mining techniques; A Text Mining based content gathering system as strategic support for SMEs; Intelligent analysis tools for computer based assessments; Information visualization for the taking of decisions.
Section 10: Information systems, strategies and methodologies. Challenges in developing a cost-effective data warehouse for a tertiary institution in a developing country; A proposal of information system architecture for public transport; Local nulls in summarised mobile and distributed databases; A Semantic Web Portal to construction knowledge exchange; Ontological support to knowledge management in a hydrogeological information system; Enterprise Intelligence Platform in the airline industry; Improvement of generation change on SSE algorithm.