
Showing papers on "Knowledge extraction published in 2005"


Book
01 Jan 2005
TL;DR: This book first surveys, then provides comprehensive yet concise algorithmic descriptions of methods, including classic methods plus the extensions and novel methods developed recently.
Abstract: This book organizes key concepts, theories, standards, methodologies, trends, challenges and applications of data mining and knowledge discovery in databases. It first surveys, then provides comprehensive yet concise algorithmic descriptions of methods, including classic methods plus the extensions and novel methods developed recently. It also gives in-depth descriptions of data mining applications in various interdisciplinary industries.

2,836 citations


Journal ArticleDOI
TL;DR: The major challenge of biomedical text mining over the next 5-10 years is to make these systems useful to biomedical researchers; this will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.
Abstract: The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5–10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

782 citations


Journal ArticleDOI
TL;DR: This paper presents an electricity consumer characterization framework based on a knowledge discovery in databases (KDD) procedure, supported by data mining techniques, applied on the different stages of the process.
Abstract: This paper presents an electricity consumer characterization framework based on a knowledge discovery in databases (KDD) procedure, supported by data mining (DM) techniques applied at the different stages of the process. The core of this framework is a data mining model based on a combination of unsupervised and supervised learning techniques. Two main modules compose this framework: the load profiling module and the classification module. The load profiling module creates a set of consumer classes using a clustering operation and the representative load profiles for each class. The classification module uses this knowledge to build a classification model able to assign different consumers to the existing classes. The quality of this framework is illustrated with a case study concerning a real database of low-voltage (LV) consumers from the Portuguese distribution company.
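The two-module idea can be sketched in a few lines. This is a hedged illustration, not the paper's method: plain k-means stands in for the clustering operation, a nearest-centroid rule stands in for the classification module, and the four-bin "load curves" are invented toy data.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return [sum(xs) / n for xs in zip(*points)]

def kmeans(profiles, k, iters=20):
    # deterministic init: spread the initial centroids across the data
    centroids = [profiles[i * (len(profiles) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in profiles:
            j = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[j].append(p)
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

def classify(profile, centroids):
    # classification-module stand-in: assign a consumer to the nearest class
    return min(range(len(centroids)), key=lambda c: dist(profile, centroids[c]))

# Toy 4-bin daily load curves: night-heavy vs day-heavy consumers.
night = [[5, 5, 1, 1], [6, 5, 1, 2], [5, 6, 2, 1]]
day = [[1, 1, 5, 5], [2, 1, 6, 5], [1, 2, 5, 6]]
centroids = kmeans(night + day, k=2)  # the "representative load profiles"
```

The centroids double as the representative load profiles of each consumer class, and any new consumer's curve is assigned to a class by `classify`.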

446 citations


Proceedings Article
01 Jan 2005
TL;DR: This article presents a description and case study of CiteSpace II, a Java application which supports visual exploration with knowledge discovery in bibliographic databases, and qualitatively evaluates two resulting document-term co-citation and MeSH term co-occurrence visualizations.
Abstract: This article presents a description and case study of CiteSpace II, a Java application which supports visual exploration with knowledge discovery in bibliographic databases. Highly cited and pivotal documents, areas of specialization within a knowledge domain, and the emergence of research topics are visually mapped through a progressive knowledge domain visualization approach to detecting and visualizing trends and patterns in scientific literature. The test case in this study is progressive knowledge domain visualization of the field of medical informatics. Datasets based on publications from twelve journals in the medical informatics field, covering the period from 1964 to 2004, were extracted from PubMed and Web of Science (WOS) and developed as testbeds for evaluation of the CiteSpace system. Two resulting document-term co-citation and MeSH term co-occurrence visualizations are qualitatively evaluated for identification of pivotal documents, areas of specialization, and research trends. Practical applications in biomedical research settings are discussed.

358 citations


Journal ArticleDOI
TL;DR: A review of the available literature on the various measures devised for evaluating and ranking the discovered patterns produced by the data mining process and their strengths and weaknesses with respect to the level of user integration within the discovery process is presented.
Abstract: It is a well-known fact that the data mining process can generate many hundreds and often thousands of patterns from data. The task for the data miner then becomes one of determining the most useful patterns from those that are trivial or are already well known to the organization. It is therefore necessary to filter out such patterns through the use of some measure of the patterns' actual worth. This article presents a review of the available literature on the various measures devised for evaluating and ranking the patterns discovered by the data mining process. These so-called interestingness measures are generally divided into two categories: objective measures based on the statistical strengths or properties of the discovered patterns, and subjective measures derived from the user's beliefs or expectations within their particular problem domain. We evaluate the strengths and weaknesses of the various interestingness measures with respect to the level of user integration within the discovery process.
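Three classic objective measures from the survey's first category can be computed directly from transaction counts. A minimal sketch, with invented baskets and an invented rule {bread} -> {milk}; the function names are mine, not the article's:

```python
def support(transactions, itemset):
    # fraction of transactions containing the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    # conditional frequency of rhs given lhs
    return support(transactions, lhs | rhs) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    # lift > 1 suggests lhs and rhs co-occur more often than chance predicts
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}, {"bread", "milk"}]
conf = confidence(baskets, {"bread"}, {"milk"})   # 3/5 divided by 4/5 = 0.75
rule_lift = lift(baskets, {"bread"}, {"milk"})    # 0.75 / 0.8 = 0.9375
```

A ranking by any such measure is exactly the kind of filter the article surveys: sort the discovered rules by lift (or confidence) and present only the top of the list.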

344 citations


Book ChapterDOI
31 Oct 2005
TL;DR: A new language based on Linear Temporal Logic (LTL) is developed and this is combined with a standard XML format to store event logs and the LTL Checker verifies whether the observed behavior matches the (un)expected/(un)desirable behavior.
Abstract: Information systems are facing conflicting requirements. On the one hand, systems need to be adaptive and self-managing to deal with rapidly changing circumstances. On the other hand, legislation such as the Sarbanes-Oxley Act, is putting increasing demands on monitoring activities and processes. As processes and systems become more flexible, both the need for, and the complexity of monitoring increases. Our earlier work on process mining has primarily focused on process discovery, i.e., automatically constructing models describing knowledge extracted from event logs. In this paper, we focus on a different problem complementing process discovery. Given an event log and some property, we want to verify whether the property holds. For this purpose we have developed a new language based on Linear Temporal Logic (LTL) and we combine this with a standard XML format to store event logs. Given an event log and an LTL property, our LTL Checker verifies whether the observed behavior matches the (un)expected/(un)desirable behavior.
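The flavor of such log verification can be shown in miniature. This is a hedged toy, not the paper's LTL language or its XML log format: one eventually-follows property is checked over plain Python traces with invented activity names.

```python
def eventually_follows(trace, a, b):
    """Holds if every occurrence of activity a is later followed by b."""
    return all(b in trace[i + 1:] for i, act in enumerate(trace) if act == a)

# A toy event log: each trace is one process instance.
log = [
    ["register", "check", "approve", "archive"],
    ["register", "check", "archive"],          # violates the property
]
violations = [t for t in log if not eventually_follows(t, "check", "approve")]
```

The checker's role is exactly this separation of a log into conforming and violating traces, but driven by arbitrary LTL formulas rather than one hard-coded property.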

332 citations


Journal ArticleDOI
TL;DR: Methods and implemented systems for information extraction, which distills concrete data from sets of documents, are discussed, and results on mining real text corpora of biomedical abstracts, job announcements, and product descriptions are summarized.
Abstract: An important approach to text mining involves the use of natural-language information extraction. Information extraction (IE) distills structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. IE systems can be used to directly extricate abstract knowledge from a text corpus, or to extract concrete data from a set of documents which can then be further analyzed with traditional data-mining techniques to discover more general patterns. We discuss methods and implemented systems for both of these approaches and summarize results on mining real text corpora of biomedical abstracts, job announcements, and product descriptions. We also discuss challenges that arise when employing current information extraction technology to discover knowledge in text.
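The structured-records idea can be shown in the smallest possible form. A single hand-written regular expression stands in for a full IE system here; the sentence and entity names are invented, and real IE systems use learned extractors rather than one pattern:

```python
import re

# Extract (entity, entity) relation tuples from free text.
pattern = re.compile(r"(\w+) (?:interacts with|binds to) (\w+)")
text = "BRCA1 interacts with RAD51. TP53 binds to MDM2."
pairs = pattern.findall(text)
```

The resulting list of tuples is the kind of concrete, table-shaped data that the abstract describes handing off to traditional data-mining techniques.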

256 citations


Journal ArticleDOI
TL;DR: A framework for automated network analysis and visualization was proposed that incorporates several advanced techniques: a concept space approach, hierarchical clustering, social network analysis methods, and multidimensional scaling, which demonstrated that the system could achieve higher clustering recall and precision than did untrained subjects when detecting subgroups from criminal networks.
Abstract: Knowledge about the structure and organization of criminal networks is important for both crime investigation and the development of effective strategies to prevent crimes. However, except for network visualization, criminal network analysis remains primarily a manual process. Existing tools do not provide advanced structural analysis techniques that allow extraction of network knowledge from large volumes of criminal-justice data. To help law enforcement and intelligence agencies discover criminal network knowledge efficiently and effectively, in this research we proposed a framework for automated network analysis and visualization. The framework included four stages: network creation, network partition, structural analysis, and network visualization. Based upon it, we have developed a system called CrimeNet Explorer that incorporates several advanced techniques: a concept space approach, hierarchical clustering, social network analysis methods, and multidimensional scaling. Results from controlled experiments involving student subjects demonstrated that our system could achieve higher clustering recall and precision than did untrained subjects when detecting subgroups from criminal networks. Moreover, subjects identified central members and interaction patterns between groups significantly faster with the help of structural analysis functionality than with only visualization functionality. No significant gain in effectiveness was present, however. Our domain experts also reported that they believed CrimeNet Explorer could be very useful in crime investigation.

248 citations


Book
18 Nov 2005
TL;DR: This book introduces data mining in business settings, covering data mining processes and database support, core methods such as clustering, regression, neural networks, and decision trees, and applications including market-basket analysis and text and Web mining.
Abstract: Part I: INTRODUCTION Chapter 1: Initial Description of Data Mining in Business Chapter 2: Data Mining Processes and Knowledge Discovery Chapter 3: Database Support to Data Mining Part II: DATA MINING METHODS AS TOOLS Chapter 4: Overview of Data Mining Techniques Chapter 4 Appendix: Enterprise Miner Demonstration on Expenditure Data Set Chapter 5: Cluster Analysis Chapter 5 Appendix: Clementine Chapter 6: Regression Algorithms in Data Mining Chapter 7: Neural Networks in Data Mining Chapter 8: Decision Tree Algorithms Chapter 8 Appendix: Demonstration of See5 Decision Tree Analysis Chapter 9: Linear Programming-Based Methods Chapter 9 Appendix: Data Mining Linear Programming Formulations Part III: BUSINESS APPLICATIONS Chapter 10: Business Data Mining Applications Chapter 11: Market-Basket Analysis Chapter 11 Appendix: Market-Basket Procedure Part IV: DEVELOPING ISSUES Chapter 12: Text and Web Mining Chapter 12 Appendix: Semantic Text Analysis Chapter 13: Ethical Aspects of Data Mining

245 citations


Book ChapterDOI
TL;DR: It is shown how concept map-based knowledge models can be used to organize repositories of information in a way that makes them easily browsable, and how concept maps can improve searching algorithms for the Web.
Abstract: Information visualization has been a research topic for many years, leading to a mature field where guidelines and practices are well established. Knowledge visualization, in contrast, is a relatively new area of research that has received more attention recently due to the interest from the business community in Knowledge Management. In this paper we present the CmapTools software as an example of how concept maps, a knowledge visualization tool, can be combined with recent technology to provide integration between knowledge and information visualizations. We show how concept map-based knowledge models can be used to organize repositories of information in a way that makes them easily browsable, and how concept maps can improve searching algorithms for the Web. We also report on how information can be used to complement knowledge models and, based on the searching algorithms, improve the process of constructing concept maps.

220 citations


Book ChapterDOI
TL;DR: A generalized privacy preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties is introduced and a complete proof of security that gives a tight bound on the information revealed is given.
Abstract: Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. In this paper, we tackle the problem of classification. We introduce a generalized privacy preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with the algorithm, we give a complete proof of security that gives a tight bound on the information revealed.
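At the heart of ID3 is the information-gain computation that the paper's protocol evaluates securely across parties. A plaintext sketch of just that computation, with toy data (the secure multi-party machinery, which is the paper's actual contribution, is omitted entirely):

```python
import math

def entropy(labels):
    # Shannon entropy of a class-label multiset
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, labels, attr):
    # entropy reduction from splitting on one attribute column
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["+", "+", "-", "-"]
g0 = info_gain(rows, labels, 0)  # attribute 0 predicts the class perfectly
g1 = info_gain(rows, labels, 1)  # attribute 1 carries no information
```

In the vertically partitioned setting, each party holds some of the attribute columns, so the gain for a candidate split must be computed without any party revealing its column values; that is what the paper's protocol achieves.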

Book ChapterDOI
TL;DR: This article gives an overview of the Rough Set Exploration System (RSES), a freely available software system toolset for data exploration, classification support and knowledge discovery.
Abstract: This article gives an overview of the Rough Set Exploration System (RSES). RSES is a freely available software toolset for data exploration, classification support and knowledge discovery. The main functionalities of this software system are presented along with a brief explanation of the algorithmic methods used by RSES. Many of the RSES methods have originated from rough set theory introduced by Zdzislaw Pawlak during the early 1980s.

Journal ArticleDOI
TL;DR: The Intelligent Discovery Assistant (IDA) as discussed by the authors provides users with systematic enumerations of valid data mining processes and effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute.
Abstract: A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems we present a prototype intelligent discovery assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition.
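The enumerate-then-rank idea translates naturally into code. A hedged sketch with an invented three-stage catalog; the validity constraints and the speed scores are illustrative assumptions, not the prototype's actual ontology:

```python
from itertools import product

# Hypothetical catalogs of operators for each stage of a DM process.
pre = ["none", "discretize", "normalize"]
algo = ["naive_bayes", "knn", "c4.5"]
post = ["none", "prune"]

def valid(p, a, q):
    if a == "knn" and p != "normalize":   # assumption: knn needs scaled inputs
        return False
    if q == "prune" and a != "c4.5":      # assumption: pruning applies to trees
        return False
    return True

# 1) systematic enumeration of the valid processes only
processes = [(p, a, q) for p, a, q in product(pre, algo, post) if valid(p, a, q)]

# 2) ranking of the valid processes by a chosen criterion (here: speed)
speed = {"naive_bayes": 3, "c4.5": 2, "knn": 1}
ranked = sorted(processes, key=lambda t: -speed[t[1]])
```

Swapping the key function (accuracy, cost, comprehensibility) re-ranks the same enumeration, which is how an IDA can serve different user criteria over one process space.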

Proceedings ArticleDOI
02 Oct 2005
TL;DR: An improved method for feature extraction, drawing on an existing unsupervised method, is introduced; it turns the task of feature extraction into one of term similarity by mapping crude (learned) features into a user-defined taxonomy of the entity's features.
Abstract: Capturing knowledge from free-form evaluative texts about an entity is a challenging task. New techniques of feature extraction, polarity determination and strength evaluation have been proposed. Feature extraction is particularly important to the task as it provides the underpinnings of the extracted knowledge. The work in this paper introduces an improved method for feature extraction that draws on an existing unsupervised method. By including user-specific prior knowledge of the evaluated entity, we turn the task of feature extraction into one of term similarity by mapping crude (learned) features into a user-defined taxonomy of the entity's features. Results show promise both in terms of the accuracy of the mapping as well as the reduction in the semantic redundancy of crude features.
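The mapping step can be illustrated with a deliberately crude similarity. This sketch uses token-level Jaccard overlap and an invented three-node taxonomy; the paper's actual term-similarity measure and taxonomy are not reproduced here:

```python
def jaccard(a, b):
    # word-overlap similarity between two feature phrases
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Hypothetical user-defined taxonomy of a product's features.
taxonomy = ["battery life", "screen quality", "camera"]

def map_feature(crude, taxonomy):
    # assign a crude learned feature to its most similar taxonomy node
    return max(taxonomy, key=lambda node: jaccard(crude, node))

mapped = {f: map_feature(f, taxonomy) for f in ["battery", "camera lens"]}
```

Collapsing many crude variants ("battery", "battery charge", "batteries") onto one taxonomy node is what reduces the semantic redundancy the abstract mentions.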

Proceedings ArticleDOI
03 Jan 2005
TL;DR: A new classification method called multi-class classification based on association rules (MCAR) is presented, which uses an efficient technique for discovering frequent items and employs a rule ranking method which ensures detailed rules with high confidence are part of the classifier.
Abstract: Summary form only given. Constructing fast, accurate classifiers for large data sets is an important task in data mining and knowledge discovery. In this research paper, a new classification method called multi-class classification based on association rules (MCAR) is presented. MCAR uses an efficient technique for discovering frequent items and employs a rule ranking method which ensures detailed rules with high confidence are part of the classifier. After experimentation with fifteen different data sets, the results indicated that the proposed method is an accurate and efficient classification technique. Furthermore, the classifiers produced are highly competitive with regards to error rate and efficiency, if compared with those generated by popular methods like decision trees, RIPPER and CBA.
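Classification by ranked association rules, the core of approaches like MCAR, is easy to sketch. The rules below are hand-written with invented confidence and support values; actual rule discovery from frequent items is omitted:

```python
rules = [
    # (antecedent itemset, class, confidence, support) -- invented examples
    ({"outlook=sunny", "humidity=high"}, "no", 0.95, 0.20),
    ({"outlook=overcast"}, "yes", 0.90, 0.25),
    ({"humidity=high"}, "no", 0.70, 0.35),
]

def predict(instance, rules, default="yes"):
    # rank rules by confidence, break ties by support; first match wins
    for antecedent, cls, _conf, _sup in sorted(rules,
                                               key=lambda r: (-r[2], -r[3])):
        if antecedent <= instance:
            return cls
    return default
```

Because ranking decides which rule fires first, the ranking method directly shapes the classifier's error rate, which is why MCAR's rule-ranking scheme matters.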

Journal ArticleDOI
TL;DR: A first evaluation framework for estimating and comparing different kinds of PPDM algorithms is presented; the criteria are applied to a specific set of algorithms and the resulting evaluations are discussed.
Abstract: Recently, a new class of data mining methods, known as privacy preserving data mining (PPDM) algorithms, has been developed by the research community working on security and knowledge discovery. The aim of these algorithms is the extraction of relevant knowledge from large amounts of data while at the same time protecting sensitive information. Several data mining techniques incorporating privacy protection mechanisms have been developed that allow one to hide sensitive itemsets or patterns before the data mining process is executed. Privacy preserving classification methods, instead, prevent a miner from building a classifier able to predict sensitive data. Additionally, privacy preserving clustering techniques have recently been proposed, which distort sensitive numerical attributes while preserving general features for clustering analysis. A crucial issue is to determine which of these privacy-preserving techniques better protect sensitive information. However, this is not the only criterion with respect to which these algorithms can be evaluated. It is also important to assess the quality of the data resulting from the modifications applied by each algorithm, as well as the performance of the algorithms. There is thus a need to identify a comprehensive set of criteria with respect to which to assess the existing PPDM algorithms and determine which algorithm meets specific requirements. In this paper, we present a first evaluation framework for estimating and comparing different kinds of PPDM algorithms. Then, we apply our criteria to a specific set of algorithms and discuss the evaluation results we obtain. Finally, some considerations about future work and promising directions in the context of privacy preservation in data mining are discussed.


BookDOI
01 Jan 2005
TL;DR: This volume collects long and short papers on machine learning and knowledge discovery, ranging from Weka4WS, a WSRF-enabled Weka toolkit for distributed data mining on grids, to the use of inductive logic programming for predicting protein-protein interactions from multiple genomic data.
Abstract: Invited Talks.- Data Analysis in the Life Sciences - Sparking Ideas -.- Machine Learning for Natural Language Processing (and Vice Versa?).- Statistical Relational Learning: An Inductive Logic Programming Perspective.- Recent Advances in Mining Time Series Data.- Focus the Mining Beacon: Lessons and Challenges from the World of E-Commerce.- Data Streams and Data Synopses for Massive Data Sets.- Long Papers.- k-Anonymous Patterns.- Interestingness is Not a Dichotomy: Introducing Softness in Constrained Pattern Mining.- Generating Dynamic Higher-Order Markov Models in Web Usage Mining.- Tree 2 - Decision Trees for Tree Structured Data.- Agglomerative Hierarchical Clustering with Constraints: Theoretical and Empirical Results.- Cluster Aggregate Inequality and Multi-level Hierarchical Clustering.- Ensembles of Balanced Nested Dichotomies for Multi-class Problems.- Protein Sequence Pattern Mining with Constraints.- An Adaptive Nearest Neighbor Classification Algorithm for Data Streams.- Support Vector Random Fields for Spatial Classification.- Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication.- A Correspondence Between Maximal Complete Bipartite Subgraphs and Closed Patterns.- Improving Generalization by Data Categorization.- Mining Model Trees from Spatial Data.- Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification.- Mining Paraphrases from Self-anchored Web Sentence Fragments.- M2SP: Mining Sequential Patterns Among Several Dimensions.- A Systematic Comparison of Feature-Rich Probabilistic Classifiers for NER Tasks.- Knowledge Discovery from User Preferences in Conversational Recommendation.- Unsupervised Discretization Using Tree-Based Density Estimation.- Weighted Average Pointwise Mutual Information for Feature Selection in Text Categorization.- Non-stationary Environment Compensation Using Sequential EM Algorithm for Robust Speech Recognition.- Hybrid Cost-Sensitive 
Decision Tree.- Characterization of Novel HIV Drug Resistance Mutations Using Clustering, Multidimensional Scaling and SVM-Based Feature Ranking.- Object Identification with Attribute-Mediated Dependences.- Weka4WS: A WSRF-Enabled Weka Toolkit for Distributed Data Mining on Grids.- Using Inductive Logic Programming for Predicting Protein-Protein Interactions from Multiple Genomic Data.- ISOLLE: Locally Linear Embedding with Geodesic Distance.- Active Sampling for Knowledge Discovery from Biomedical Data.- A Multi-metric Index for Euclidean and Periodic Matching.- Fast Burst Correlation of Financial Data.- A Propositional Approach to Textual Case Indexing.- A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston.- Efficient Classification from Multiple Heterogeneous Databases.- A Probabilistic Clustering-Projection Model for Discrete Data.- Short Papers.- Collaborative Filtering on Data Streams.- The Relation of Closed Itemset Mining, Complete Pruning Strategies and Item Ordering in Apriori-Based FIM Algorithms.- Community Mining from Multi-relational Networks.- Evaluating the Correlation Between Objective Rule Interestingness Measures and Real Human Interest.- A Kernel Based Method for Discovering Market Segments in Beef Meat.- Corpus-Based Neural Network Method for Explaining Unknown Words by WordNet Senses.- Segment and Combine Approach for Non-parametric Time-Series Classification.- Producing Accurate Interpretable Clusters from High-Dimensional Data.- Stress-Testing Hoeffding Trees.- Rank Measures for Ordering.- Dynamic Ensemble Re-Construction for Better Ranking.- Frequency-Based Separation of Climate Signals.- Efficient Processing of Ranked Queries with Sweeping Selection.- Feature Extraction from Mass Spectra for Classification of Pathological States.- Numbers in Multi-relational Data Mining.- Testing Theories in Particle Physics Using Maximum Likelihood and Adaptive Bin Allocation.- Improved Naive Bayes for Extremely Skewed 
Misclassification Costs.- Clustering and Prediction of Mobile User Routes from Cellular Data.- Elastic Partial Matching of Time Series.- An Entropy-Based Approach for Generating Multi-dimensional Sequential Patterns.- Visual Terrain Analysis of High-Dimensional Datasets.- An Auto-stopped Hierarchical Clustering Algorithm for Analyzing 3D Model Database.- A Comparison Between Block CEM and Two-Way CEM Algorithms to Cluster a Contingency Table.- An Imbalanced Data Rule Learner.- Improvements in the Data Partitioning Approach for Frequent Itemsets Mining.- On-Line Adaptive Filtering of Web Pages.- A Bi-clustering Framework for Categorical Data.- Privacy-Preserving Collaborative Filtering on Vertically Partitioned Data.- Indexed Bit Map (IBM) for Mining Frequent Sequences.- STochFS: A Framework for Combining Feature Selection Outcomes Through a Stochastic Process.- Speeding Up Logistic Model Tree Induction.- A Random Method for Quantifying Changing Distributions in Data Streams.- Deriving Class Association Rules Based on Levelwise Subspace Clustering.- An Incremental Algorithm for Mining Generators Representation.- Hybrid Technique for Artificial Neural Network Architecture and Weight Optimization.


Journal ArticleDOI
TL;DR: A specific data mining tool is presented that can help non-experts in data mining carry out the complete rule discovery process, and its utility is demonstrated by applying it to an adaptive Linux course that was developed.
Abstract: We introduce a methodology to improve Adaptive Systems for Web-Based Education. This methodology uses evolutionary algorithms as a data mining method for discovering interesting relationships in students' usage data. Such knowledge may be very useful for teachers and course authors to select the most appropriate modifications to improve the effectiveness of the course. We use Grammar-Based Genetic Programming (GBGP) with multi-objective optimization techniques to discover prediction rules. We present a specific data mining tool that can help non-experts in data mining carry out the complete rule discovery process, and demonstrate its utility by applying it to an adaptive Linux course that we developed.

Book ChapterDOI
03 Oct 2005
TL;DR: This paper systematically analyzes the problem of mining hidden communities on heterogeneous social networks and proposes a new method for learning an optimal linear combination of these relations which can best meet the user's expectation.
Abstract: Social network analysis has attracted much attention in recent years. Community mining is one of the major directions in social network analysis. Most of the existing methods on community mining assume that there is only one kind of relation in the network, and moreover, the mining results are independent of the users' needs or preferences. However, in reality, there exist multiple, heterogeneous social networks, each representing a particular kind of relationship, and each kind of relationship may play a distinct role in a particular task. In this paper, we systematically analyze the problem of mining hidden communities on heterogeneous social networks. Based on the observation that different relations have different importance with respect to a certain query, we propose a new method for learning an optimal linear combination of these relations which can best meet the user's expectation. With the obtained relation, better performance can be achieved for community mining.
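The core idea of learning a relation combination from a user's example query can be sketched as follows. This is a toy least-squares fit by gradient descent, not the paper's actual algorithm; the two relations, the labeled pairs, and all names are illustrative assumptions.

```python
# Two toy relations over four members, each an adjacency matrix.
REL_A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
REL_B = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]]
relations = [REL_A, REL_B]

# The user's example query: members 0 and 1 should be strongly related,
# members 0 and 3 should not.
labeled = [((0, 1), 1.0), ((0, 3), 0.0)]

def fit_weights(relations, labeled, lr=0.1, steps=500):
    """Least-squares fit of a linear combination of relations to the labels."""
    w = [0.0] * len(relations)
    for _ in range(steps):
        for (i, j), target in labeled:
            pred = sum(w[k] * relations[k][i][j] for k in range(len(w)))
            err = pred - target
            for k in range(len(w)):
                w[k] -= lr * err * relations[k][i][j]
    return w

w = fit_weights(relations, labeled)
# Combined relation: weighted sum of the individual relation matrices.
combined = [[sum(w[k] * relations[k][i][j] for k in range(len(w)))
             for j in range(4)] for i in range(4)]
```

On this toy data the fit drives the weight of REL_A (which agrees with the user's expectation) toward 1 and the weight of REL_B toward 0; community mining would then run on `combined`.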

Journal ArticleDOI
TL;DR: The accuracy and interpretability of fuzzy models derived by this approach are studied, and the results show that the proposed approach is effective and practical for knowledge extraction.

Journal ArticleDOI
TL;DR: A technique that uses EM mixture modeling to perform clustering on distributed data that controls data sharing, preventing disclosure of individual data items or any results that can be traced to an individual site.
Abstract: Privacy and security considerations can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery can alleviate this problem. We present a technique that uses EM mixture modeling to perform clustering on distributed data. This method controls data sharing, preventing disclosure of individual data items or any results that can be traced to an individual site.
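The privacy mechanism rests on a standard property of EM for mixture models: the M-step needs only sufficient statistics, which each site can compute locally and report as aggregates, so no individual record leaves its site. A minimal one-dimensional Gaussian-mixture sketch (fixed unit variance, toy data; not the paper's protocol) looks like this:

```python
import math

# Two sites hold disjoint 1-D data; only aggregate statistics leave a site.
site_data = [
    [0.9, 1.1, 1.0, 0.8],   # site 1, clustered near 1
    [4.9, 5.2, 5.0, 5.1],   # site 2, clustered near 5
]

def local_stats(data, means, var):
    """Per-site E-step: return only per-component sums, never raw items."""
    N = [0.0, 0.0]  # soft counts
    S = [0.0, 0.0]  # responsibility-weighted sums
    for x in data:
        p = [math.exp(-(x - m) ** 2 / (2 * var)) for m in means]
        z = sum(p)
        for k in range(2):
            r = p[k] / z
            N[k] += r
            S[k] += r * x
    return N, S

means, var = [0.0, 6.0], 1.0
for _ in range(20):                       # global EM iterations
    N = [0.0, 0.0]
    S = [0.0, 0.0]
    for data in site_data:                # each site reports aggregates only
        n, s = local_stats(data, means, var)
        for k in range(2):
            N[k] += n[k]
            S[k] += s[k]
    means = [S[k] / N[k] for k in range(2)]  # M-step at the coordinator
```

The coordinator sees only `(N, S)` per site; the paper's technique additionally controls how even these aggregates are shared so results cannot be traced to a single site.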

Book
01 Jan 2005
TL;DR: This book discusses information and knowledge visualization, including the development and use of a Management Information System (MIS) for DaimlerChrysler.
Abstract: Visualizing Knowledge and Information: An Introduction.- Visualizing Knowledge and Information: An Introduction.- Background.- Visual Queries: The Foundation of Visual Thinking.- Representational Correspondence as a Basic Principle of Diagram Design.- Knowledge Visualization.- Node-Link Mapping Principles for Visualizing Knowledge and Information.- Tools for Representing Problems and the Knowledge Required to Solve Them.- Collaborative Knowledge Visualization for Cross-Community Learning.- Information Visualization.- Modeling Interactive, 3-Dimensional Information Visualizations Supporting Information Seeking Behaviors.- Visualizing Information in Virtual Space: Prospects and Pitfalls.- The Impact of Dimensionality and Color Coding of Information Visualizations on Knowledge Acquisition.- Synergies Visualizing Knowledge and Information for Fostering Learning and Instruction.- Digital Concept Maps for Managing Knowledge and Information.- Concept Maps: Integrating Knowledge and Information Visualization.- Comprehensive Mapping of Knowledge and Information Resources: The Case of Webster.- Towards a Framework and a Model for Knowledge Visualization: Synergies Between Information and Knowledge Visualization.- ParIS - Visualizing Ideas and Information in a Resource-Based Learning Scenario.- Knowledge-Oriented Organization of Information for Fostering Information Use.- LEO: A Concept Map Based Course Visualization Tool for Instructors and Students.- Navigating Personal Information Repositories with Weblog Authoring and Concept Mapping.- Facilitating Web Search with Visualization and Data Mining Techniques.- The Role of Content Representations in Hypermedia Learning: Effects of Task and Learner Variables.- Supporting Self-regulated E-Learning with Visual Topic-Map-Navigation.- Information and Knowledge Visualization in Development and Use of a Management Information System (MIS) for DaimlerChrysler.

Journal ArticleDOI
TL;DR: The judgment theorems of consistent sets are examined, and the discernibility matrix of a formal context is introduced, by which an approach to attribute reduction in the concept lattice is presented.
Abstract: The theory of the concept lattice is an efficient tool for knowledge representation and knowledge discovery, and has been applied successfully to many fields. One focus of knowledge discovery is knowledge reduction. This paper proposes the theory of attribute reduction in the concept lattice, which extends the theory of the concept lattice. In this paper, the judgment theorems of consistent sets are examined, and the discernibility matrix of a formal context is introduced, by which we present an approach to attribute reduction in the concept lattice. The characteristics of three types of attributes are analyzed.
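A simplified sketch of the discernibility-matrix idea (using the rough-set-style discernibility between objects of a formal context, which is a simplification of the paper's concept-lattice definition): each matrix entry holds the attributes distinguishing a pair of objects, and a consistent attribute set must intersect every nonempty entry. The context and names below are illustrative assumptions.

```python
# Toy formal context: objects and the attributes they possess.
context = {
    'x1': {'a', 'b'},
    'x2': {'a', 'c'},
    'x3': {'b'},
}

# Discernibility matrix: for each object pair, the attributes that
# distinguish them (symmetric difference of their attribute sets).
objs = sorted(context)
disc = {}
for i, u in enumerate(objs):
    for v in objs[i + 1:]:
        disc[(u, v)] = context[u] ^ context[v]

def consistent(subset):
    """A consistent set must hit every nonempty discernibility entry."""
    return all(subset & d for d in disc.values() if d)
```

A reduct is then a minimal consistent subset; here `{'a', 'b'}` is consistent while `{'b', 'c'}` is not, because it cannot discern `x1` from `x3`.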

Book ChapterDOI
01 Jan 2005
TL;DR: The goal of the chapter is to present a knowledge discovery paradigm for multi-attribute and multicriteria decision making, which is based upon the concept of rough sets, in order to find concise classification patterns that agree with situations that are described by the data.
Abstract: In this chapter, we are concerned with discovering knowledge from data. The aim is to find concise classification patterns that agree with situations that are described by the data. Such patterns are useful for explanation of the data and for the prediction of future situations. They are particularly useful in such decision problems as technical diagnostics, performance evaluation and risk assessment. The situations are described by a set of attributes, which we might also call properties, features, characteristics, etc. Such attributes may be concerned with either the input or output of a situation. These situations may refer to states, examples, etc. Within this chapter, we will refer to them as objects. The goal of the chapter is to present a knowledge discovery paradigm for multi-attribute and multicriteria decision making, which is based upon the concept of rough sets. Rough set theory was introduced by Pawlak (Pawlak 1982, Pawlak 1991). Since then, it has often proved to be an excellent mathematical tool for the analysis of a vague description of objects. The adjective vague (referring to the quality of information) is concerned with inconsistency or ambiguity. The rough set philosophy is based on the assumption that with every object of the universe U there is associated a certain amount of information (data, knowledge). This information can be expressed by means of a number of attributes. The attributes describe the object. Objects which have the same description are said to be indiscernible (similar) with respect to the available information.
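The indiscernibility relation described above is the computational core of rough sets; a minimal sketch (toy information table, illustrative names) partitions objects into indiscernibility classes and builds the lower and upper approximations of a target set from them:

```python
# Toy information table: objects described by two attributes.
table = {
    'o1': {'color': 'red',  'size': 'big'},
    'o2': {'color': 'red',  'size': 'big'},
    'o3': {'color': 'blue', 'size': 'big'},
    'o4': {'color': 'blue', 'size': 'small'},
}

def ind_classes(table, attrs):
    """Partition objects into indiscernibility classes w.r.t. attrs."""
    classes = {}
    for obj, desc in table.items():
        key = tuple(desc[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

def approximations(table, attrs, target):
    """Lower approximation: classes fully inside target.
    Upper approximation: classes overlapping target."""
    lower, upper = set(), set()
    for cls in ind_classes(table, attrs):
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper

lower, upper = approximations(table, ['color', 'size'], {'o1', 'o3'})
```

Here `o1` and `o2` are indiscernible (same description), so the target `{'o1', 'o3'}` is vague: its lower approximation is `{'o3'}` and its upper approximation is `{'o1', 'o2', 'o3'}`, and the gap between the two is exactly the ambiguity the chapter discusses.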

Book ChapterDOI
01 Jan 2005
TL;DR: This chapter describes a six-stepDMKD process model and its component technologies, which help to design flexible, semiautomated, and easy-to-use DMKD models to enable building knowledge repositories and allowing for communication between several data mining tools, databases, and knowledge repositories.
Abstract: Data mining and knowledge discovery (DMKD) is a fast-growing field of research. Its popularity is caused by an ever increasing demand for tools that help in revealing and comprehending information hidden in huge amounts of data. Such data are generated on a daily basis by federal agencies, banks, insurance companies, retail stores, and on the WWW. This explosion came about through the increasing use of computers, scanners, digital cameras, bar codes, etc. We are in a situation where rich sources of data, stored in databases, warehouses, and other data repositories, are readily available but not easily analyzable. This causes pressure from the federal, business, and industry communities for improvements in the DMKD technology. What is needed is a clear and simple methodology for extracting the knowledge hidden in the data. In this chapter, an integrated DMKD process model based on technologies like XML, PMML, SOAP, UDDI, and OLE DB for DM is introduced. These technologies help to design flexible, semiautomated, and easy-to-use DMKD models to enable building knowledge repositories and allowing for communication between several data mining tools, databases, and knowledge repositories. They also enable integration and automation of the DMKD tasks. This chapter describes a six-step DMKD process model and its component technologies.

Book ChapterDOI
18 May 2005
TL;DR: This work introduces a new technique based on a bit level approximation of the data that allows raw data to be directly compared to the reduced representation, while still guaranteeing lower bounds to Euclidean distance.
Abstract: Because time series are a ubiquitous and increasingly prevalent type of data, there has been much research effort devoted to time series data mining recently. As with all data mining problems, the key to effective and scalable algorithms is choosing the right representation of the data. Many high level representations of time series have been proposed for data mining. In this work, we introduce a new technique based on a bit level approximation of the data. The representation has several important advantages over existing techniques. One unique advantage is that it allows raw data to be directly compared to the reduced representation, while still guaranteeing lower bounds to Euclidean distance. This fact can be exploited to produce faster exact algorithms for similarity search. In addition, we demonstrate that our new representation allows time series clustering to scale to much larger datasets.
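The raw-to-bits comparison with a guaranteed lower bound can be sketched with a simple clipped representation: each value is reduced to one bit saying whether it lies above the series mean, and a raw query can be compared against those bits. This is a toy illustration of the bit-level idea, not the paper's exact representation; data and names are assumptions.

```python
import math

def clip_bits(series):
    """Bit-level representation: 1 if the value is above the series mean."""
    mu = sum(series) / len(series)
    return [1 if x > mu else 0 for x in series], mu

def lb_dist(query, bits, mu):
    """Lower bound on Euclidean distance between a raw query and any
    series consistent with the bits (value > mu iff the bit is 1)."""
    total = 0.0
    for q, b in zip(query, bits):
        if b == 1 and q < mu:      # true value is above mu, query below
            total += (mu - q) ** 2
        elif b == 0 and q > mu:    # true value is at most mu, query above
            total += (q - mu) ** 2
    return math.sqrt(total)

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

raw = [1.0, 3.0, 2.0, 8.0]
bits, mu = clip_bits(raw)          # mu = 3.5 -> bits [0, 0, 0, 1]
query = [4.0, 3.0, 2.0, 1.0]

lb = lb_dist(query, bits, mu)      # computed from bits alone
true_d = euclid(query, raw)        # computed from raw data
```

Because `lb` never exceeds the true distance, a similarity search can prune candidates from the tiny bit representation alone and fall back to raw data only when the bound is not decisive, keeping the algorithm exact.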

Journal ArticleDOI
TL;DR: In this paper, the authors developed a method to index design knowledge that is intuitive to an engineering designer and therefore encourages the reuse of information. The method has been evaluated in two stages: evaluation of the individual taxonomies within the method, and indexing of 92 reports using the method.

Book
01 Jan 2005
TL;DR: The goal of this workshop was to encourage KDD researchers to take on the numerous challenges that Bioinformatics offers; it solicited papers proposing novel data mining techniques for tasks such as gene expression analysis, drug design, and other emerging problems in genomics and proteomics.
Abstract: Written especially for computer scientists, all necessary biology is explained. Presents new techniques on gene expression data mining, gene mapping for disease detection, and phylogenetic knowledge discovery.