
Showing papers in "Knowledge Based Systems in 2006"


Journal ArticleDOI
TL;DR: It is suggested that Boden's descriptive framework, once elaborated in detail, is more uniform and more powerful than it first appears.
Abstract: I summarise and attempt to clarify some concepts presented in and arising from Margaret Boden's (1990) descriptive hierarchy of creativity, by beginning to formalise the ideas she proposes. The aim is to move towards a model which allows detailed comparison, and hence better understanding, of systems which exhibit behaviour which would be called "creative" in humans. The work paves the way for the description of naturalistic, multi-agent creative AI systems, which create in a societal context. I demonstrate some simple reasoning about creative behaviour based on the new framework, to show how it might be useful for the analysis and study of creative systems. In particular, I identify some crucial properties of creative systems, in terms of the framework components, some of which may usefully be proven a priori of a given system. I suggest that Boden's descriptive framework, once elaborated in detail, is more uniform and more powerful than it first appears.

295 citations


Journal ArticleDOI
TL;DR: In this paper, an ensemble comprises multiple clusterers, each of which is trained by k-means algorithm with different initial points, and the clusters discovered by different clusterers are aligned, i.e. similar clusters are assigned with the same label by counting their overlapped data items.
Abstract: Ensemble methods that train multiple learners and then combine their predictions have been shown to be very effective in supervised learning. This paper explores ensemble methods for unsupervised learning. Here, an ensemble comprises multiple clusterers, each of which is trained by k-means algorithm with different initial points. The clusters discovered by different clusterers are aligned, i.e. similar clusters are assigned with the same label, by counting their overlapped data items. Then, four methods are developed to combine the aligned clusterers. Experiments show that clustering performance could be significantly improved by ensemble methods, where utilizing mutual information to select a subset of clusterers for weighted voting is a nice choice. Since the proposed methods work by analyzing the clustering results instead of the internal mechanisms of the component clusterers, they are applicable to diverse kinds of clustering algorithms.
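A minimal sketch of the alignment-and-voting idea described above, using scikit-learn's KMeans, a Hungarian assignment over the overlap matrix for cluster alignment, and normalized mutual information for the member weights; the helper names and the exact weighting scheme are illustrative assumptions, not the paper's own implementation.

```python
# Sketch of a k-means clustering ensemble: run k-means with several random
# initialisations, align cluster labels by counting overlapped data items,
# then combine the aligned members by weighted voting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score
from scipy.optimize import linear_sum_assignment

def align_labels(reference, labels, k):
    """Relabel `labels` so clusters with maximal overlap share the reference label."""
    overlap = np.zeros((k, k), dtype=int)
    for r, l in zip(reference, labels):
        overlap[r, l] += 1
    row, col = linear_sum_assignment(-overlap)   # maximise total overlap
    mapping = {c: r for r, c in zip(row, col)}
    return np.array([mapping[l] for l in labels])

def kmeans_ensemble(X, k=3, n_members=10, seed=0):
    rng = np.random.RandomState(seed)
    members = [KMeans(n_clusters=k, n_init=1,
                      random_state=rng.randint(1 << 30)).fit_predict(X)
               for _ in range(n_members)]
    reference = members[0]
    aligned = [reference] + [align_labels(reference, m, k) for m in members[1:]]
    # Weight each member by its average mutual information with the other members.
    weights = [np.mean([normalized_mutual_info_score(a, b)
                        for b in aligned if b is not a]) for a in aligned]
    votes = np.zeros((X.shape[0], k))
    for w, labels in zip(weights, aligned):
        for i, l in enumerate(labels):
            votes[i, l] += w
    return votes.argmax(axis=1)
```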

180 citations


Journal ArticleDOI
TL;DR: The experiments show that the use of NIPALS (Non-Linear Iterative Partial Least Squares) PCA method can improve the performance of machine learning in the classification of high-dimensional data.
Abstract: This paper presents the results of an investigation into the use of machine learning methods for the identification of narcotics from Raman spectra. The classification of spectral data and other high-dimensional data, such as images and gene-expression data, poses an interesting challenge to machine learning, as the presence of high numbers of redundant or highly correlated attributes can seriously degrade classification accuracy. This paper investigates the use of principal component analysis (PCA) to reduce high-dimensional spectral data and to improve the predictive performance of some well-known machine learning methods. Experiments are carried out on a high-dimensional spectral dataset. These experiments employ the NIPALS (Non-Linear Iterative Partial Least Squares) PCA method, a method that has been used in the field of chemometrics for spectral classification and is a more efficient alternative to the widely used eigenvector decomposition approach. The experiments show that the use of this PCA method can improve the performance of machine learning in the classification of high-dimensional data.
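As a rough illustration of the NIPALS idea, the sketch below extracts principal components one at a time by alternating projections and deflation; variable names, the starting vector, and convergence details are illustrative and not taken from the paper's experiments.

```python
# Sketch of the NIPALS iteration: each principal component is extracted by
# alternating projections, then the data matrix is deflated before the next one.
import numpy as np

def nipals_pca(X, n_components, tol=1e-8, max_iter=500):
    X = X - X.mean(axis=0)                           # centre the data
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy()    # start from the highest-variance column
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)                    # loading vector
            p /= np.linalg.norm(p)
            t_new = X @ p                            # score vector
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)                       # deflate: remove the extracted component
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)

# reduced, _ = nipals_pca(spectra, n_components=10) would feed a downstream classifier.
```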

156 citations


Journal ArticleDOI
TL;DR: This paper takes this research forward by using linguistic variables and triangular fuzzy numbers to model the decision maker's risk and confidence attitudes in order to define a more complete MCDM solution.
Abstract: Recent research has recognised that multicriteria decision making (MCDM) should take account of uncertainty, risk and confidence. This paper takes this research forward by using linguistic variables and triangular fuzzy numbers to model the decision maker's (DM) risk and confidence attitudes in order to define a more complete MCDM solution. To illustrate the computation process and demonstrate the feasibility of the results we use a travel problem that has been used previously to assess MCDM techniques. The results show that the method is useful for tackling imprecision and subjectivity in complex, ill-defined and human-oriented decision problems.
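To make the mechanics concrete, the sketch below maps linguistic ratings to triangular fuzzy numbers, aggregates them with a fuzzy weighted average, and defuzzifies by the centroid. The linguistic scale, crisp criterion weights, and alternatives are invented for illustration and are not the paper's travel problem or its exact aggregation scheme.

```python
# Minimal sketch: linguistic ratings are mapped to triangular fuzzy numbers (a, b, c),
# aggregated by a weighted average, and defuzzified by the centroid.
SCALE = {                       # illustrative (a, b, c) triangular fuzzy numbers on [0, 10]
    "very low":  (0, 0, 2.5),
    "low":       (0, 2.5, 5),
    "medium":    (2.5, 5, 7.5),
    "high":      (5, 7.5, 10),
    "very high": (7.5, 10, 10),
}

def fuzzy_weighted_average(ratings, weights):
    """Combine triangular fuzzy ratings with crisp weights that sum to 1."""
    return tuple(sum(w * r[i] for w, r in zip(weights, ratings)) for i in range(3))

def centroid(tfn):
    a, b, c = tfn
    return (a + b + c) / 3.0    # centroid defuzzification of a triangular fuzzy number

weights = [0.5, 0.3, 0.2]       # hypothetical criterion weights
alternatives = {
    "A": ["high", "medium", "very high"],
    "B": ["very high", "low", "medium"],
}
for name, labels in alternatives.items():
    score = centroid(fuzzy_weighted_average([SCALE[l] for l in labels], weights))
    print(name, round(score, 2))
```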

137 citations


Journal ArticleDOI
TL;DR: This work shows how Formal Concept Analysis (FCA) can be applied to Collaborative Recommenders and presents two new algorithms for finding neighbours in a collaborative recommender.
Abstract: We show how Formal Concept Analysis (FCA) can be applied to Collaborative Recommenders. FCA is a mathematical method for analysing binary relations. Here we apply it to the relation between users and items in a collaborative recommender system. FCA groups the users and items into concepts, ordered by a concept lattice. We present two new algorithms for finding neighbours in a collaborative recommender. Both use the concept lattice as an index to the recommender's ratings matrix. Our experimental results show a major decrease in the amount of work needed to find neighbours, while guaranteeing no loss of accuracy or coverage.
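The core of FCA is a pair of derivation ("prime") operators over the binary user-item relation; the sketch below shows them on a toy ratings table. Building the full concept lattice and using it as an index to the ratings matrix, as the paper does, is beyond this fragment, and the data are invented.

```python
# Sketch of the FCA derivation operators over a binary user-item relation.
# A formal concept is a pair (users, items) closed under both operators;
# users sharing a concept are natural candidate neighbours.
ratings = {                      # user -> set of items the user has rated
    "alice": {"i1", "i2", "i3"},
    "bob":   {"i2", "i3"},
    "carol": {"i1", "i4"},
}

def items_of(users):
    """Items rated by every user in the set (extent -> intent)."""
    sets = [ratings[u] for u in users]
    return set.intersection(*sets) if sets else set()

def users_of(items):
    """Users who rated every item in the set (intent -> extent)."""
    return {u for u, rated in ratings.items() if items <= rated}

# The closure of {"bob"} gives the extent of the concept bob belongs to:
# everyone who rated everything bob rated.
print(sorted(users_of(items_of({"bob"}))))   # ['alice', 'bob']
```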

129 citations


Journal ArticleDOI
TL;DR: Evaluating the effectiveness of using external indicators, such as commodity prices and currency exchange rates, in predicting movements in the Dow Jones Industrial Average index found basing trading decisions on a neural network trained on a range of external indicators resulted in a return on investment of 23.5% per annum.
Abstract: The aim of this study was to evaluate the effectiveness of using external indicators, such as commodity prices and currency exchange rates, in predicting movements in the Dow Jones Industrial Average index. The performance of each technique is evaluated using different domain-specific metrics. A comprehensive evaluation procedure is described, involving the use of trading simulations to assess the practical value of predictive models, and comparison with simple benchmarks that respond to underlying market growth. In the experiments presented here, basing trading decisions on a neural network trained on a range of external indicators resulted in a return on investment of 23.5% per annum, during a period when the DJIA index grew by 13.03% per annum. A substantial dataset has been compiled and is available to other researchers interested in analysing financial time series.
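The sketch below shows the general shape of such a trading simulation: go long when the model predicts an up-move, stay in cash otherwise, and compare with a buy-and-hold benchmark. The prices and predictions are placeholders, not the paper's dataset or its exact simulation rules.

```python
# Sketch of a simple trading simulation against a buy-and-hold benchmark.
# `prices` stands in for the index series and `predicted_up` for the model's
# daily directional forecasts.
import numpy as np

def simulate(prices, predicted_up, initial_capital=10_000.0):
    capital, benchmark = initial_capital, initial_capital
    for today, tomorrow, go_long in zip(prices[:-1], prices[1:], predicted_up[:-1]):
        ret = tomorrow / today - 1.0
        if go_long:                  # invested only when the model predicts an up-move
            capital *= 1.0 + ret
        benchmark *= 1.0 + ret       # buy-and-hold always holds the index
    return capital, benchmark

prices = np.array([100.0, 101.0, 100.5, 102.0, 103.5])
predicted_up = np.array([True, False, True, True, True])
print(simulate(prices, predicted_up))
```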

105 citations


Journal ArticleDOI
TL;DR: This paper looks at the performance of an expert constructed BN compared with other machine learning techniques for predicting the outcome (win, lose, or draw) of matches played by Tottenham Hotspur Football Club.
Abstract: Bayesian networks (BNs) provide a means for representing, displaying, and making available in a usable form the knowledge of experts in a given field. In this paper, we look at the performance of an expert constructed BN compared with other machine learning (ML) techniques for predicting the outcome (win, lose, or draw) of matches played by Tottenham Hotspur Football Club. The period under study was 1995-1997 - the expert BN was constructed at the start of that period, based almost exclusively on subjective judgement. Our objective was to determine retrospectively the comparative accuracy of the expert BN compared to some alternative ML models that were built using data from the two-year period. The additional ML techniques considered were: MC4, a decision tree learner; Naive Bayesian learner; Data Driven Bayesian (a BN whose structure and node probability tables are learnt entirely from data); and a K-nearest neighbour learner. The results show that the expert BN is generally superior to the other techniques for this domain in predictive accuracy. The results are even more impressive for BNs given that, in a number of key respects, the study assumptions place them at a disadvantage. For example, we have assumed that the BN prediction is 'incorrect' if a BN predicts more than one outcome as equally most likely (whereas, in fact, such a prediction would prove valuable to somebody who could place an 'each way' bet on the outcome). Although the expert BN has now long been irrelevant (since it contains variables relating to key players who have retired or left the club) the results here tend to confirm the excellent potential of BNs when they are built by a reliable domain expert. The ability to provide accurate predictions without requiring much learning data is an obvious bonus in any domain where data are scarce. Moreover, the BN was relatively simple for the expert to build and its structure could be used again in this and similar types of problems.

102 citations


Journal ArticleDOI
TL;DR: This publication presents a system that uses ontologies and Natural Language Processing techniques to index texts, and thus supports word sense disambiguation and the retrieval of texts that contain equivalent words, by indexing them to concepts of ontologies.
Abstract: This publication shows how the gap between the HTML based internet and the RDF based vision of the semantic web might be bridged, by linking words in texts to concepts of ontologies. Most current search engines use indexes that are built at the syntactical level and return hits based on simple string comparisons. However, the indexes do not contain synonyms, cannot differentiate between homonyms ('mouse' as a pointing device vs. 'mouse' as an animal) and users receive different search results when they use different conjugation forms of the same word. In this publication, we present a system that uses ontologies and Natural Language Processing techniques to index texts, and thus supports word sense disambiguation and the retrieval of texts that contain equivalent words, by indexing them to concepts of ontologies. For this purpose, we developed fully automated methods for mapping equivalent concepts of imported RDF ontologies (for this prototype WordNet, SUMO and OpenCyc). These methods will thus allow the seamless integration of domain specific ontologies for concept based information retrieval in different domains. To demonstrate the practical workability of this approach, a set of web pages that contain synonyms and homonyms were indexed and can be queried via a search-engine-like query frontend. However, the ontology based indexing approach can also be used for other data mining applications such as text clustering, relation mining, and searching free text fields in biological databases. The ontology alignment methods and some of the text mining principles described in this publication are now incorporated into the ONDEX system http://ondex.sourceforge.net/.
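The sketch below illustrates concept-based indexing with NLTK's WordNet interface: words are mapped to synset identifiers so that synonyms share posting lists. It assumes the nltk package with the WordNet corpus installed, and it does not reproduce the paper's alignment of WordNet with SUMO and OpenCyc or its disambiguation step.

```python
# Sketch of concept-based indexing: documents are indexed by WordNet synset
# identifiers instead of raw strings, so "car" and "automobile" land in the
# same posting list. Requires nltk and the wordnet corpus (nltk.download("wordnet")).
from collections import defaultdict
from nltk.corpus import wordnet as wn

def concept_index(documents):
    index = defaultdict(set)                    # synset name -> set of doc ids
    for doc_id, text in documents.items():
        for word in text.lower().split():
            for synset in wn.synsets(word):
                index[synset.name()].add(doc_id)
    return index

docs = {1: "the mouse chased the cat", 2: "move the mouse pointer"}
index = concept_index(docs)
# Both senses of "mouse" appear; a disambiguation step would pick one per document.
print([name for name in index if name.startswith("mouse")])
```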

101 citations


Journal ArticleDOI
TL;DR: This paper gives a necessarily incomplete account of the approaches that the research community has used for representing knowledge, and underlines the importance of a layered approach and the use of standards.
Abstract: This paper gives a necessarily incomplete account of the approaches that the research community has used for representing knowledge. After underlining the importance of a layered approach and the use of standards, it starts with the early efforts of artificial intelligence researchers. Recent approaches, aimed mainly at the semantic web, are then described. Coding examples from the literature are presented in both sections. Finally, the semantic web ontology creation process, as we envision it, is introduced.

92 citations


Journal ArticleDOI
TL;DR: A knowledge discovery model that integrates the modification of the fuzzy transaction data-mining algorithm (MFTDA) and the Adaptive-Network-Based Fuzzy Inference Systems (ANFIS) for discovering implicit knowledge in the fuzzy database more efficiently and presenting it more concisely is proposed.
Abstract: This study proposes a knowledge discovery model that integrates the modification of the fuzzy transaction data-mining algorithm (MFTDA) and the Adaptive-Network-Based Fuzzy Inference Systems (ANFIS) for discovering implicit knowledge in a fuzzy database more efficiently and presenting it more concisely. A prototype was built to test the feasibility of the model. The testing data are from a company's human resource management department. The results indicated that the generated rules (knowledge) are useful in helping the company predict its employees' future performance and then assign suitable people to appropriate positions and projects. Furthermore, the convergence of ANFIS in the model was proven to be more efficient than that of a generic fuzzy artificial neural network.

86 citations


Journal ArticleDOI
TL;DR: A knowledge-intensive support paradigm for platform-based product family design and development is presented in this paper, where a module-based integrated design scheme is proposed with knowledge support for product family architecture modeling, product platform establishment, product family generation, and product variant assessment.
Abstract: This paper presents a knowledge-intensive support paradigm for platform-based product family design and development. The fundamental issues underlying product family design and development, including product platform and product family modeling, product family generation and evolution, and product family evaluation for customization, are discussed. A module-based integrated design scheme is proposed with knowledge support for product family architecture modeling, product platform establishment, product family generation, and product variant assessment. A systematic methodology and the relevant technologies are investigated and developed for the knowledge-supported product family design process. The developed information and knowledge-modeling framework and prototype system can be used for platform product design knowledge capture, representation and management, and offer on-line support for designers in the design process. The issues and requirements related to developing a knowledge-intensive support system for modular platform-based product family design are also addressed.

Journal ArticleDOI
TL;DR: A multi-objective evolutionary algorithm called improved niched Pareto genetic algorithm (INPGA) is proposed for mining highly predictive and comprehensible classification rules from large databases and has a clear edge over SGA and NPGA.
Abstract: We present a multi-objective genetic algorithm for mining highly predictive and comprehensible classification rules from large databases. We emphasize predictive accuracy and comprehensibility of the rules. However, accuracy and comprehensibility of the rules often conflict with each other. This makes it an optimization problem that is very difficult to solve efficiently. We have proposed a multi-objective evolutionary algorithm called improved niched Pareto genetic algorithm (INPGA) for this purpose. We have compared the rule generation by INPGA with that by simple genetic algorithm (SGA) and basic niched Pareto genetic algorithm (NPGA). The experimental result confirms that our rule generation has a clear edge over SGA and NPGA.
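At the heart of any niched Pareto GA is a dominance test over the conflicting objectives; the sketch below shows it for rules scored by predictive accuracy (to maximise) and rule length as a comprehensibility proxy (to minimise). INPGA's tournament selection and niching mechanism are not reproduced here, and the scores are invented.

```python
# Sketch of the Pareto-dominance comparison used when rules are scored on two
# conflicting objectives: accuracy (maximise) and rule length (minimise).

def dominates(rule_a, rule_b):
    """rule = (accuracy, length). A dominates B if it is no worse on both
    objectives and strictly better on at least one."""
    acc_a, len_a = rule_a
    acc_b, len_b = rule_b
    no_worse = acc_a >= acc_b and len_a <= len_b
    better = acc_a > acc_b or len_a < len_b
    return no_worse and better

def pareto_front(rules):
    """Rules not dominated by any other rule."""
    return [r for r in rules if not any(dominates(o, r) for o in rules if o != r)]

rules = [(0.92, 5), (0.90, 2), (0.85, 7), (0.92, 4)]
print(pareto_front(rules))     # [(0.90, 2), (0.92, 4)]
```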

Journal ArticleDOI
TL;DR: This work advocates the use of a recently proposed software engineering paradigm, particularly suited to the construction of complex and distributed software-testing systems, which is known as Agent-Oriented Software Engineering.
Abstract: Software testing is the technical kernel of software quality engineering. Developing critical and complex software systems requires not only a complete, consistent and unambiguous design and implementation methods, but also a suitable testing environment that meets certain requirements, particularly in the face of complexity. Traditional methods, such as analyzing each requirement and developing test cases to verify correct implementation, are not effective in understanding the software's overall complex behavior. In that respect, existing approaches to software testing are viewed as time-consuming and insufficient for the dynamism of the modern business environment. This dynamism requires new tools and techniques, which can be employed in tandem with innovative approaches to using and combining existing software engineering methods. This work advocates the use of a recently proposed software engineering paradigm that is particularly suited to the construction of complex and distributed software-testing systems: Agent-Oriented Software Engineering. This methodology provides the basic approach for building agent-based frameworks for testing.

Journal ArticleDOI
TL;DR: This paper makes an important contribution to strategic planning of knowledge management systems in law enforcement by identifying stages of growth inknowledge management systems and by identifying examples of applications from police investigations.
Abstract: The amount of information that police officers come into contact with in the course of their work is astounding. By identifying stages of growth in knowledge management systems and by identifying examples of applications from police investigations, this paper makes an important contribution to strategic planning of knowledge management systems in law enforcement. The stages are labeled officer-to-technology systems, officer-to-officer systems, officer-to-information systems, and officer-to-application systems.

Journal ArticleDOI
TL;DR: A novel algorithm is proposed, Self-adaptive NBTree, which induces a hybrid of decision tree and Naive Bayes, which has clear advantages with respect to the generalization ability.
Abstract: Decision trees are useful for obtaining a proper set of rules from a large number of instances. However, they have difficulty in capturing the relationship between continuous-valued data points. We propose in this paper a novel algorithm, Self-adaptive NBTree, which induces a hybrid of decision tree and Naive Bayes. The Bayes measure, which is used to construct the decision tree, can directly handle continuous attributes and automatically find the most appropriate boundaries for discretization and the number of intervals. The Naive Bayes node helps to solve the overgeneralization and overspecialization problems that are often seen in decision trees. Experimental results on a variety of natural domains indicate that Self-adaptive NBTree has clear advantages with respect to generalization ability.

Book ChapterDOI
TL;DR: This paper presents usage data from one laboratory where, over a 29 month period, over 16,000 rules were added and 6,000,000 cases interpreted.
Abstract: Ripple-Down Rules (RDR) is an approach to building knowledge-based systems (KBS) incrementally, while the KBS is in routine use. Domain experts build rules as a minor extension to their normal duties, and are able to keep refining rules as KBS requirements evolve. Commercial RDR systems are now used routinely in some Chemical Pathology laboratories to provide interpretative comments to assist clinicians make the best use of laboratory reports. This paper presents usage data from one laboratory where, over a 29 month period, over 16,000 rules were added and 6,000,000 cases interpreted. The clearest evidence that this facility is highly valuable to the laboratory is the on-going addition of new knowledge bases and refinement of existing knowledge bases by the chemical pathologists.
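A minimal sketch of a single-classification ripple-down rule tree: each node carries a condition and a conclusion plus an exception branch (consulted when the rule fires) and an alternative branch (consulted when it does not), so a refinement added after a misclassified case cannot disturb earlier behaviour. The attribute and conclusions are invented, not taken from the pathology systems described.

```python
# Minimal sketch of single-classification Ripple-Down Rules: refinements are
# added exactly where the last case was misclassified, preserving all earlier
# conclusions.

class RDRNode:
    def __init__(self, condition, conclusion):
        self.condition, self.conclusion = condition, conclusion
        self.except_branch = None     # consulted when this rule fires
        self.else_branch = None       # consulted when this rule does not fire

    def classify(self, case):
        if self.condition(case):
            refined = self.except_branch.classify(case) if self.except_branch else None
            return refined or self.conclusion
        return self.else_branch.classify(case) if self.else_branch else None

# Default rule: everything is "normal" unless a refinement says otherwise.
root = RDRNode(lambda case: True, "normal")
# The expert adds an exception after seeing a misclassified case.
root.except_branch = RDRNode(lambda case: case["tsh"] > 10, "possible hypothyroidism")

print(root.classify({"tsh": 15}))   # possible hypothyroidism
print(root.classify({"tsh": 2}))    # normal
```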

Journal ArticleDOI
TL;DR: An empirical study on the importance of the analogy retrieval strategy in the domain of software design argues that both types of selection are important, but they play different roles in the process.
Abstract: Analogy is an important reasoning process in creative design. It enables the generation of new design artifacts using ideas from semantically distant domains. Candidate selection is a crucial process in the generation of creative analogies. Without a good set of candidate sources, the success of subsequent phases can be compromised. Two main types of selection have been identified: semantics-based retrieval and structure-based retrieval. This paper presents an empirical study on the importance of the analogy retrieval strategy in the domain of software design. We argue that both types of selection are important, but they play different roles in the process.

Journal ArticleDOI
TL;DR: A rule-based expert system for diagnosing the most common diseases of Indian mango, developed using the Expert System Shell for Text Animation (ESTA), has been found to give sound and consistent results.
Abstract: This paper emphasizes the application of expert systems in Indian fruiticulture and describes the development of a rule-based expert system, using the Expert System Shell for Text Animation (ESTA), for the diagnosis of the most common diseases occurring in Indian mango. The objective is to provide computer-based support for agricultural specialists or farmers. The proposed expert system makes a diagnosis on the basis of the user's responses to queries related to particular disease symptoms. The knowledge base of the system contains knowledge about the symptoms and remedies of 14 diseases of the Indian mango tree appearing during the fruiting season and non-fruiting season. The picture base of the system contains pictures related to disease symptoms, which are displayed along with the queries of the system. The result given by the system has been found to be sound and consistent.
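The query-and-match loop of such a shell can be sketched as below; the diseases, symptoms, and function names are placeholders for illustration, not the system's actual knowledge base of 14 mango diseases.

```python
# Toy sketch of the query-and-match loop of a rule-based diagnosis shell.
RULES = {                                     # hypothetical disease -> required symptoms
    "anthracnose":    {"black spots on fruit", "leaf spots"},
    "powdery mildew": {"white powdery patches", "flower drop"},
}

def diagnose(ask):
    """`ask` answers yes/no for a symptom query; returns diseases whose symptoms all hold."""
    observed = set()
    for symptom in sorted({s for symptoms in RULES.values() for s in symptoms}):
        if ask(f"Is the following symptom present: {symptom}?"):
            observed.add(symptom)
    return [disease for disease, symptoms in RULES.items() if symptoms <= observed]

answers = {"black spots on fruit": True, "leaf spots": True,
           "white powdery patches": False, "flower drop": False}
print(diagnose(lambda q: answers[q.split(": ")[1].rstrip("?")]))   # ['anthracnose']
```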

Journal ArticleDOI
TL;DR: The results indicate that the neural network based and the GA based approaches perform satisfactorily, with MSEs of 2.375 and 2.875, respectively, but the GA approach is much better understood and more transparent.
Abstract: This paper describes three approaches to the prediction of dwelling fire occurrences in Derbyshire, a region in the United Kingdom. The system has been designed to calculate the number of fire occurrences for each of the 189 wards in Derbyshire. Based on the results of statistical analysis, eight factors are initially selected as the inputs of the neural network. Principal Component Analysis (PCA) is employed to pre-process the input data set and reduce the number of inputs. The first three principal components of the available data set are chosen as the inputs, and the number of fires as the output. The first approach is a logistic regression model, which has been widely used in forest fire prediction. The prediction results of the logistic regression model are not acceptable. The second approach uses a feed-forward neural network to model the relationship between the number of fires and the factors that influence fire occurrence. The neural network model gives predictions with acceptable accuracy for fires in dwelling areas. Genetic algorithms (GAs) are the third approach discussed in this study. The first three principal components of the available data set are classified into different groups according to their number of fires. An iterative GA is proposed and applied to extract features for each data group. Once the features for all the groups have been identified, the test data set can easily be clustered into one of the groups based on the group features. The number of fires for the group to which the test data belongs is the prediction of the fire occurrence for the test data. The three approaches have been compared. Our results indicate that the neural network based and the GA based approaches perform satisfactorily, with MSEs of 2.375 and 2.875, respectively, but the GA approach is much better understood and more transparent.

Journal ArticleDOI
TL;DR: It is argued that, because the documents of the semantic web are created by human beings, they are actually much more like natural language documents than theory would have us believe.
Abstract: This paper argues that, because the documents of the semantic web are created by human beings, they are actually much more like natural language documents than theory would have us believe. We present evidence that natural language words are used extensively and in complex ways in current ontologies. This leads to a number of dangers for the semantic web, but also opens up interesting new challenges for natural language processing. This is illustrated by our own work using natural language generation to present parts of ontologies.

Journal ArticleDOI
TL;DR: A Petri net tool, P3, is implemented, which can be used as a knowledge acquisition tool based on the PetriNet ontology, which is represented on the Semantic Web using XML-based ontology languages, RDF and OWL.
Abstract: The paper presents the Petri net ontology that enables sharing Petri nets on the Semantic Web. Previous work on formal methods for representing Petri nets mainly defines tool-specific descriptions or formats for model interchange. However, such efforts do not provide a suitable description for using Petri nets on the Semantic Web. This paper uses the Petri net UML model as a starting point for implementing the ontology. Resulting Petri net models are represented on the Semantic Web using XML-based ontology languages, RDF and OWL. We implemented a Petri net tool, P3, which can be used as a knowledge acquisition tool based on the Petri net ontology.
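As a rough illustration of representing a Petri net in RDF, the sketch below describes a single place, transition, and arc as triples with rdflib; the vocabulary URI is made up for illustration and is not the paper's actual Petri net ontology.

```python
# Sketch of describing a tiny Petri net as RDF triples with rdflib.
from rdflib import Graph, Namespace, RDF, Literal

PN = Namespace("http://example.org/petrinet#")   # hypothetical vocabulary, not the paper's
g = Graph()
g.bind("pn", PN)

g.add((PN.p1, RDF.type, PN.Place))               # one place holding a single token
g.add((PN.p1, PN.tokens, Literal(1)))
g.add((PN.t1, RDF.type, PN.Transition))          # one transition
g.add((PN.a1, RDF.type, PN.Arc))                 # an arc from the place to the transition
g.add((PN.a1, PN.source, PN.p1))
g.add((PN.a1, PN.target, PN.t1))

print(g.serialize(format="turtle"))
```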

Journal ArticleDOI
TL;DR: An evolutionary approach with modularized evaluation functions to forecast financial distress is introduced, which allows using any evolutionary algorithm to extract the set of critical financial ratios and integrates more evaluation function modules to achieve a better forecasting accuracy by assigning distinct weights.
Abstract: Due to the radical changes in the global economy, more precise forecasting of corporate financial distress helps provide important judgment principles to decision-makers. Although financial statements reflect a firm's business activities, it is very challenging to discover critical information from these statements. Applying machine learning algorithms has been demonstrated to improve forecasting accuracy in predicting corporate bankruptcy. In this paper, we introduce an evolutionary approach with modularized evaluation functions to forecast financial distress, which allows any evolutionary algorithm to be used to extract the set of critical financial ratios, and integrates multiple evaluation function modules, assigned distinct weights, to achieve better forecasting accuracy. To achieve more precise predictions, the undesirable forecasting results from some modules are weeded out if their predicting accuracies fall outside the allowable tolerance range learned by our mechanism.

Journal ArticleDOI
TL;DR: The algorithm performs automatic raster-to-vector conversion based on skeleton extraction and graph theory, using a GIS database if one is available, to provide a numerically structured file that includes the geometric definition of all roads as well as the topological relations between them.
Abstract: In this paper, a new method for achieving the geometrical and topological definition of extracted road networks is presented. Starting from a raster binary image in which a road network is depicted, the algorithm performs automatic raster-to-vector conversion based on skeleton extraction and graph theory, using a GIS database if one is available. The final goal of the method is to provide a numerically structured file that includes the geometric definition of all roads as well as the topological relations between them. The applied technique comprises six steps. In the first step, the quality of the binary image is improved through a noise cleaning process. In the second step, the parallel edges of the road network are smoothed by means of a generalization process. In the third step, the skeleton is extracted by applying a known and efficient method published some years ago. The fourth step consists of constructing the graph and generating the different cartographic objects that compose the road network; in this phase, GIS information can be used to improve the result. In the fifth step, objects are numerically adjusted by means of polynomial adjustment in the case of open objects, and using an iterative polygonal adjustment in the case of sharp objects. In the last step, mathematical morphology is applied to validate the geometrical adjustment topologically; the junction nodes are analysed and their coordinates changed automatically in order to achieve a topologically correct road network vectorization. Finally, objects are structured according to cartographic criteria and a numerical file with the vectorized road network is provided. Experimental results show the validity of this approach.
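The skeleton-to-graph step can be sketched with scikit-image and NetworkX as below; the noise cleaning, generalization, polynomial adjustment, and topological validation steps of the six-step pipeline are not reproduced, and the image is synthetic.

```python
# Sketch of the skeleton-to-graph step: thin the binary road raster to a
# one-pixel-wide skeleton, then connect neighbouring skeleton pixels into a
# graph whose junction nodes have degree >= 3.
import numpy as np
import networkx as nx
from skimage.morphology import skeletonize

def skeleton_graph(binary_image):
    skeleton = skeletonize(binary_image.astype(bool))
    graph = nx.Graph()
    rows, cols = np.nonzero(skeleton)
    pixels = set(zip(rows.tolist(), cols.tolist()))
    for r, c in pixels:                                  # link 8-connected skeleton pixels
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0) and (r + dr, c + dc) in pixels:
                    graph.add_edge((r, c), (r + dr, c + dc))
    junctions = [n for n in graph if graph.degree(n) >= 3]
    return graph, junctions

# A small synthetic "road": a horizontal and a vertical stroke crossing each other.
image = np.zeros((21, 21), dtype=np.uint8)
image[9:12, :] = 1
image[:, 9:12] = 1
graph, junctions = skeleton_graph(image)
print(len(graph), junctions)
```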

Journal ArticleDOI
Chris Reed
TL;DR: This paper compares two approaches to dialogue that have grown from two different disciplines; the descriptive-normative approach of applied philosophy, and the formal, implemented approach of computer science.
Abstract: Dialogic argumentation is a crucial component in many computational domains, and forms a core component of argumentation theory. This paper compares two approaches to dialogue that have grown from two different disciplines; the descriptive-normative approach of applied philosophy, and the formal, implemented approach of computer science. The commonalities between the approaches are explored in developing a means for representing dialogic argumentation in a common format. This common format uses an XML-based language that views locutions as state-changing operations, drawing on an analogy with classical artificial intelligence planning. This representation is then shown to hold a number of important advantages in areas of artificial intelligence and philosophy.

Journal ArticleDOI
TL;DR: Regarding classification generalization ability, simulation results on the iris data and the appendicitis data demonstrate that the proposed method performs well in comparison with other classification methods.
Abstract: This paper proposes a new method that employs a genetic algorithm to find fuzzy association rules for classification problems, based on an effective method for discovering fuzzy association rules, namely the fuzzy grids based rules mining algorithm (FGBRMA). Some important parameters, including the number and shapes of the membership functions for each quantitative attribute and the minimum fuzzy support, are not easily user-specified. These parameters are therefore determined automatically by a binary chromosome composed of two substrings: one for each quantitative attribute, using the coding method proposed by Ishibuchi and Murata, and the other for the minimum fuzzy support. In each generation, the fitness value of each chromosome, which maximizes the classification accuracy rate and minimizes the number of fuzzy rules, can be obtained. When the termination condition is reached, the chromosome with the maximum fitness value is used to test its performance. Regarding classification generalization ability, simulation results on the iris data and the appendicitis data demonstrate that the proposed method performs well in comparison with other classification methods.

Journal ArticleDOI
TL;DR: Without the need of implementing very specific adaptation rules, the proposed approach resolves the problem of acquiring adaptation knowledge by combining the search power of a genetic algorithm with the guidance provided by domain-specific knowledge.
Abstract: In case-based reasoning systems the adaptation phase is a notoriously difficult and complex step. The design and implementation of an effective case adaptation algorithm is generally determined by the type of application, which decides the nature and the structure of the knowledge to be implemented within the adaptation module, and the level of user involvement during this phase. A new adaptation approach is presented in this paper which uses a modified genetic algorithm incorporating specific domain knowledge and information provided by the retrieved cases. The approach has been developed for a CBR system (CBEM) supporting the use and design of numerical models for estuaries. The adaptation module finds the values of hundreds of parameters for a selected numerical model retrieved from the case-base that is to be used in a new problem context. Without the need to implement very specific adaptation rules, the proposed approach resolves the problem of acquiring adaptation knowledge by combining the search power of a genetic algorithm with the guidance provided by domain-specific knowledge. The genetic algorithm consists of modified versions of the classical genetic operations of initialisation, selection, crossover and mutation, designed to incorporate practical but general principles of model calibration without reference to any specific problems. The genetic algorithm focuses the search within the parameter space on those zones that most likely contain the required solutions, thus reducing computational time. In addition, the design of the genetic algorithm-based adaptation routine ensures that the parameter values found are suitable for the model approximation and hypotheses, and comply with the problem domain features, providing correct and realistic model outputs. This adaptation method is suitable for case-based reasoning systems dealing with numerical modelling applications that require the substitution of a large number of parameter values.

Journal ArticleDOI
TL;DR: This work (ALMG—Automatic Log Mining via Genetic) mines web log files via a genetic algorithm; the resulting application finds sequentially accessed page groups automatically.
Abstract: This paper is concerned with finding sequential accesses in web log files using a 'Genetic Algorithm' (GA). Web log files are server-independent, ASCII-format files. Every transaction, whether completed or not, is recorded in them, and the files are unstructured from the point of view of knowledge discovery in databases techniques. The data stored in web logs has become important for discovering user behaviour, since internet use has increased rapidly. Analysing these log files is one of the important research areas of web mining. In particular, with the advent of CRM (Customer Resource Management) issues in business circles, most modern firms operating web sites for several purposes are now adopting web mining as a strategic way of capturing knowledge about the potential needs of target customers, future trends in the market and other management factors. Our work (ALMG—Automatic Log Mining via Genetic) mines web log files via a genetic algorithm. A survey of the web mining literature shows that GAs are generally used in web content and web structure mining, whereas ALMG is a study of web usage mining; this is the difference between ALMG and similar works in the literature. In another work we encountered, a GA is used to process the data between HTML tags held on the client PC, whereas ALMG extracts information from data held on the server. We consider the use of log files an advantage for our purpose, because we characterise the requests made to the server rather than detecting a single person's behaviour. We developed an application for this purpose: it first analyses the web log files and then finds sequentially accessed page groups automatically.
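The pre-processing that any such miner needs, parsing Common Log Format lines and grouping requests into per-visitor click sequences, can be sketched as below; the GA search itself is not reproduced, and the log lines are invented.

```python
# Sketch of the pre-processing step: parse Common Log Format lines and group
# requests into per-visitor click sequences, which a GA (or any sequence miner)
# can then search for frequently traversed page orders.
import re
from collections import defaultdict

CLF = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*"')

def click_sequences(log_lines):
    sequences = defaultdict(list)            # host -> ordered list of requested pages
    for line in log_lines:
        match = CLF.match(line)
        if match and match.group("method") == "GET":
            sequences[match.group("host")].append(match.group("path"))
    return dict(sequences)

log = [
    '10.0.0.1 - - [01/Jan/2006:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [01/Jan/2006:10:00:05 +0000] "GET /products.html HTTP/1.0" 200 2301',
    '10.0.0.2 - - [01/Jan/2006:10:00:07 +0000] "GET /index.html HTTP/1.0" 200 1043',
]
print(click_sequences(log))
```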

Journal ArticleDOI
TL;DR: This paper contrasts the goals of knowledge-based merging with other technologies such as semantic web technologies, information mediators, and database integration systems, and uses fusion rules to manage the semi-structured information that is input for merging.
Abstract: There is an increasing need for technology for merging semi-structured information (such as structured reports) from heterogeneous sources. For this, we advocate a knowledge-based approach when the information to be merged incorporates diverse, and potentially complex, conflicts (inconsistencies). In this paper, we contrast the goals of knowledge-based merging with other technologies such as semantic web technologies, information mediators, and database integration systems. We then explain how a system for knowledge-based merging can be constructed for a given application. To support the use of a knowledgebase, we use fusion rules to manage the semi-structured information that is input for merging. Fusion rules are a form of scripting language that defines how structured reports should be merged. The antecedent of a fusion rule is a call to investigate the information in the structured reports and the background knowledge, and the consequent of a fusion rule is a formula specifying an action to be undertaken to form a merged report. Fusion rules are not necessarily a definitive specification of how the input can be merged. They can be used by the user to explore different ways that the input can be merged. However, if the user has sufficient confidence in the output from a set of fusion rules, they can be regarded as a definitive specification for merging, and furthermore, they can then be treated as a form of meta-knowledge that gives the provenance of the merged reports. The integrated usage of fusion rules with a knowledgebase offers a practical and valuable technology for merging conflicting information.

Journal ArticleDOI
TL;DR: This paper introduces a semantic converter which converts knowledge representation in Conceptual Graphs into representation in Resource Description Framework, as an experiment to overcome the barriers in knowledge sharing and reuse due to lack of consensus of knowledge representation formats between these two knowledge representation models.
Abstract: One problem in knowledge based systems is the problem of knowledge sharing. Many systems use proprietary frameworks for storing knowledge, and even those systems that use standard knowledge representation formats have the problem that more than one such format exists. Conceptual Graphs and Resource Description Framework are two general-purpose knowledge representation models. Since they are structurally similar in syntax, concepts, and semantics, converting Conceptual Graph models to Resource Description Framework models without loss of semantic meaning is feasible. In this paper, we introduce a semantic converter which converts knowledge representation in Conceptual Graphs into representation in Resource Description Framework, as an experiment to overcome the barriers in knowledge sharing and reuse due to lack of consensus of knowledge representation formats between these two knowledge representation models.

Journal ArticleDOI
TL;DR: A Navigational Pattern mining (NP-miner) algorithm for discovering frequent sequential patterns on the proposed Navigational Pattern Tree for providing real-time recommendations for online users.
Abstract: Web usage mining is widely applied in various areas, and dynamic recommendation is one web usage mining application. However, most current recommendation mechanisms need to generate all association rules before making recommendations. This requires substantial offline computation time and cannot provide real-time recommendations for online users. This study proposes a Navigational Pattern Tree structure for storing web access information. In addition, the Navigational Pattern Tree supports incremental growth for immediately modeling web usage behavior. To provide real-time recommendations efficiently, we develop a Navigational Pattern mining (NP-miner) algorithm for discovering frequent sequential patterns on the proposed Navigational Pattern Tree. According to historical patterns, the NP-miner scans relevant sub-trees of the Navigational Pattern Tree repeatedly to generate candidate recommendations. The experiments study the performance of the NP-miner algorithm on synthetic datasets derived from real applications. The results show that the NP-miner algorithm can efficiently perform online dynamic recommendation in a stable manner.
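The sketch below illustrates the general idea of an incrementally grown, counted tree over page sequences, where recommendations for a visitor's current prefix are the most frequently followed next pages. It illustrates the data structure only, not the NP-miner algorithm itself, and the sessions are invented.

```python
# Sketch of an incrementally grown, counted trie over page sequences: each
# session updates counts along its path, and recommendations for a visitor's
# current prefix are the most frequently followed next pages.
from collections import defaultdict

class PatternTree:
    def __init__(self):
        self.count = 0
        self.children = defaultdict(PatternTree)

    def add_session(self, pages):
        node = self
        for page in pages:              # incremental update: one pass per session
            node = node.children[page]
            node.count += 1

    def recommend(self, prefix, top_n=2):
        node = self
        for page in prefix:             # walk down to the node matching the current path
            if page not in node.children:
                return []
            node = node.children[page]
        ranked = sorted(node.children.items(), key=lambda kv: kv[1].count, reverse=True)
        return [page for page, _ in ranked[:top_n]]

tree = PatternTree()
for session in [["home", "products", "cart"], ["home", "products", "specials"],
                ["home", "products", "cart"], ["home", "about"]]:
    tree.add_session(session)
print(tree.recommend(["home", "products"]))   # ['cart', 'specials']
```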