
Showing papers in "International Journal on Artificial Intelligence Tools in 2008"


Journal ArticleDOI
TL;DR: An empirical analysis of four different real-time software defect data sets using different predictor models shows that a combination of 1R and Instance-based learning with Consistency-based subset evaluation provides more consistently accurate predictions than the other models.
Abstract: Automated reliability assessment is essential for systems that entail dynamic adaptation based on runtime mission-specific requirements. One approach along this direction is to monitor and assess the system using machine learning-based software defect prediction techniques. Due to the dynamic nature of software data collected, Instance-based learning algorithms are proposed for the above purposes. To evaluate the accuracy of these methods, the paper presents an empirical analysis of four different real-time software defect data sets using different predictor models. The results show that a combination of 1R and Instance-based learning along with Consistency-based subset evaluation technique provides a relatively better consistency in achieving accurate predictions as compared with other models. No direct relationship is observed between the skewness present in the data sets and the prediction accuracy of these models. Principal Component Analysis (PCA) does not show a consistent advantage in improving the...
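As a rough illustration of the kind of pipeline described above, the sketch below couples a generic univariate feature-selection step (standing in for the paper's 1R and consistency-based subset evaluation) with a k-nearest-neighbor classifier as the instance-based learner; the data set, feature counts, and scores are synthetic placeholders, not the paper's setup.

```python
# Hypothetical sketch: a generic feature-selection step feeding a k-NN
# (instance-based) classifier; everything here is a stand-in, not the
# paper's 1R/consistency-based procedure.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a software-metrics data set: rows are modules,
# columns are static code metrics, the label marks defect-prone modules.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

model = make_pipeline(
    SelectKBest(mutual_info_classif, k=5),   # keep a small attribute subset
    KNeighborsClassifier(n_neighbors=3),     # instance-based learner (IBk-style)
)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```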

104 citations


Journal ArticleDOI
TL;DR: A general approach is presented for representing the knowledge of a potential expert as a mixture of language models from associated documents, modeling the expert indirectly through the set of associated documents and exploiting their underlying structure and complex language features.
Abstract: Enterprise corpora contain evidence of what employees work on and therefore can be used to automatically find experts on a given topic. We present a general approach for representing the knowledge of a potential expert as a mixture of language models from associated documents. First we retrieve documents given the expert's name using a generative probabilistic technique and weight the retrieved documents according to an expert-specific posterior distribution. Then we model the expert indirectly through the set of associated documents, which allows us to exploit their underlying structure and complex language features. Experiments show that our method has excellent performance on the expert search task of the TREC Enterprise track and that it effectively collects and combines evidence for expertise in a heterogeneous collection.
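The following toy sketch illustrates the document-centric scoring idea in general terms: each candidate is ranked by the smoothed query likelihood of the documents associated with them, weighted by an association strength. The corpus, candidate names, association weights, and smoothing parameter are all invented for illustration and do not reproduce the paper's exact model.

```python
# Minimal sketch of document-based expert scoring on a toy corpus.
from collections import Counter

docs = {
    "d1": "language models for enterprise expert search",
    "d2": "gradient descent for neural network training",
    "d3": "mixture of language models and document retrieval",
}
assoc = {"alice": {"d1": 1.0, "d3": 0.7}, "bob": {"d2": 1.0}}  # expert -> doc weights

tok = {d: text.split() for d, text in docs.items()}
coll = Counter(w for words in tok.values() for w in words)
coll_len = sum(coll.values())

def query_likelihood(words, query, lam=0.5):
    """Jelinek-Mercer smoothed P(query | document)."""
    tf, dl = Counter(words), len(words)
    p = 1.0
    for q in query:
        p *= lam * tf[q] / dl + (1 - lam) * coll[q] / coll_len
    return p

def expert_score(expert, query):
    return sum(w * query_likelihood(tok[d], query) for d, w in assoc[expert].items())

query = ["language", "models"]
for e in assoc:
    print(e, expert_score(e, query))
```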

74 citations


Journal ArticleDOI
TL;DR: This paper introduces a stochastic graph-based algorithm, called OutRank, for detecting outliers in data, which is more robust than the existing outlier detection schemes and can effectively address the inherent problems of such schemes.
Abstract: This paper introduces a stochastic graph-based algorithm, called OutRank, for detecting outliers in data. We consider two approaches for constructing a graph representation of the data, based on the object similarity and number of shared neighbors between objects. The heart of this approach is the Markov chain model that is built upon this graph, which assigns an outlier score to each object. Using this framework, we show that our algorithm is more robust than the existing outlier detection schemes and can effectively address the inherent problems of such schemes. Empirical studies conducted on both real and synthetic data sets show that significant improvements in detection rate and false alarm rate are achieved using the proposed framework.
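A compact way to picture the Markov-chain scoring is sketched below: a similarity graph is row-normalized into transition probabilities and the stationary distribution of the resulting random walk serves as an (inverse) outlier score. The kernel, data, and iteration count are toy choices, not the paper's exact construction.

```python
# Rough sketch of the Markov-chain idea behind graph-based outlier ranking.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),   # a dense cluster
               np.array([[8.0, 8.0]])])          # one obvious outlier

d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
S = np.exp(-d)                                   # exponential similarity kernel
np.fill_diagonal(S, 0.0)

P = S / S.sum(axis=1, keepdims=True)             # Markov transition matrix

pi = np.full(len(X), 1.0 / len(X))               # power iteration to stationarity
for _ in range(200):
    pi = pi @ P

# Objects the random walk rarely visits get low stationary probability.
print("most outlying index:", int(np.argmin(pi)))   # expected: 50
```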

73 citations


Journal ArticleDOI
TL;DR: In this paper, the authors apply knowledge transfer to deep convolutional neural nets, which they argue are particularly well suited for knowledge transfer, and demonstrate that components of a trained deep CNN can constructively transfer information to another such CNN.
Abstract: Knowledge transfer is widely held to be a primary mechanism that enables humans to quickly learn new complex concepts when given only small training sets. In this paper, we apply knowledge transfer to deep convolutional neural nets, which we argue are particularly well suited for knowledge transfer. Our initial results demonstrate that components of a trained deep convolutional neural net can constructively transfer information to another such net. Furthermore, this transfer is completed in such a way that one can envision creating a net that could learn new concepts throughout its lifetime. The experiments we performed involved training a Deep Convolutional Neural Net (DCNN) on a large training set containing 20 different classes of handwritten characters from the NIST Special Database 19. This net was then used as a foundation for training a new net on a set of 20 different character classes from the NIST Special Database 19. The new net would keep the bottom layers of the old net (i.e. those nearest to the input) and only allow the top layers to train on the new character classes. We purposely used small training sets for the new net to force it to rely as much as possible upon transferred knowledge, as opposed to a large and varied training set, to learn the new set of handwritten characters. Our results show a clear advantage in relying upon transferred knowledge to learn new tasks when given small training sets, if the new tasks are sufficiently similar to the previously mastered ones. However, this advantage decreases as training sets increase in size.
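The transfer setup can be pictured with a small PyTorch sketch in which the lower convolutional layers of a previously trained net are copied and frozen while only the top layer trains on the new classes; the architecture, shapes, and random tensors below are placeholders rather than the nets used in the experiments.

```python
# Hypothetical PyTorch sketch of freezing transferred "bottom" layers.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes=20):
        super().__init__()
        self.features = nn.Sequential(            # "bottom" layers to transfer
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 7 * 7, n_classes)  # "top" layer

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

base = SmallCNN(n_classes=20)                 # pretend this was trained on the first 20 classes

new = SmallCNN(n_classes=20)                  # 20 *new* character classes
new.features.load_state_dict(base.features.state_dict())
for p in new.features.parameters():           # freeze the transferred layers
    p.requires_grad = False

opt = torch.optim.SGD(new.classifier.parameters(), lr=0.01)
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 20, (32,))  # dummy batch
loss = nn.functional.cross_entropy(new(x), y)
loss.backward()
opt.step()
print("one fine-tuning step done, loss =", float(loss))
```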

54 citations


Journal ArticleDOI
TL;DR: An overview of the rules, competition format, benchmarks, participants and results of SMT-COMP 2007 is given.
Abstract: The Satisfiability Modulo Theories Competition (SMT-COMP) is an annual competition aimed at stimulating the advancement of state-of-the-art techniques and tools developed by the Satisfiability Modulo Theories (SMT) community. As with the first two editions, SMT-COMP 2007 was held as a satellite event of CAV 2007 (July 3-7, 2007). This paper gives an overview of the rules, competition format, benchmarks, participants and results of SMT-COMP 2007.

46 citations


Journal ArticleDOI
TL;DR: A scheduler is presented that finds provably optimal schedules for basic blocks using techniques from constraint programming and scales to the largest basic blocks, including basic blocks with up to 2600 instructions.
Abstract: Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. A fundamental problem that arises in instruction scheduling is to find a minimum length schedule for a basic block — a straight-line sequence of code with a single entry point and a single exit point — subject to precedence, latency, and resource constraints. Solving the problem exactly is NP-complete, and heuristic approaches are currently used in most compilers. In contrast, we present a scheduler that finds provably optimal schedules for basic blocks using techniques from constraint programming. In developing our optimal scheduler, the key to scaling up to large, real problems was in the development of preprocessing techniques for improving the constraint model. We experimentally evaluated our optimal scheduler on the SPEC 2000 integer and floating point benchmarks. On this benchmark suite, the optimal scheduler was very robust — all but a handful of the hundreds of thousands of basic blocks in our benchmark suite were solved optimally within a reasonable time limit — and scaled to the largest basic blocks, including basic blocks with up to 2600 instructions. This compares favorably to the best previous exact approaches.
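To make the exact-scheduling problem concrete, the toy solver below finds a minimum-length schedule for a tiny latency-annotated DAG on a single-issue machine by plain depth-first branch and bound; it illustrates the problem being solved, not the constraint-programming model or the preprocessing techniques of the paper.

```python
# Toy exact scheduler for a single-issue machine, solved by depth-first
# branch and bound. succ[i] = [(j, latency)] means j may issue `latency`
# cycles after i issues.
succ = {0: [(2, 2)], 1: [(2, 1), (3, 3)], 2: [(4, 1)], 3: [(4, 1)], 4: []}
preds = {j: [(i, l) for i in succ for (k, l) in succ[i] if k == j] for j in succ}
N = len(succ)
best = [float("inf")]

def ready(i, issued, cycle):
    """Instruction i is ready once each predecessor's latency has elapsed."""
    return all(p in issued and issued[p] + lat <= cycle for p, lat in preds[i])

def dfs(issued, cycle):
    if len(issued) == N:                      # complete schedule found
        best[0] = min(best[0], cycle)
        return
    if cycle >= best[0]:                      # bound: cannot beat the incumbent
        return
    for i in succ:                            # branch: issue a ready instruction
        if i not in issued and ready(i, issued, cycle):
            issued[i] = cycle
            dfs(issued, cycle + 1)
            del issued[i]
    dfs(issued, cycle + 1)                    # or leave this cycle as a stall

dfs({}, 0)
print("minimum schedule length (cycles):", best[0])   # 6 for this toy DAG
```

A constraint-programming formulation, as used in the paper, would instead post the precedence, latency, and resource constraints declaratively and let propagation prune the search; the brute-force search above is only meant to show what "provably optimal" means here.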

42 citations


Journal ArticleDOI
TL;DR: It is shown that the proximity of two spatial features can be captured by summarizing their spatial objects embedded in a continuous space via various techniques and that clustering techniques can be applied to reveal the rich structure formed by co-located spatial features.
Abstract: The goal of spatial co-location pattern mining is to find subsets of spatial features frequently located together in spatial proximity. Example co-location patterns include services requested frequently and located together from mobile devices (e.g., PDAs and cellular phones) and symbiotic species in ecology (e.g., Nile crocodile and Egyptian plover). Spatial clustering groups similar spatial objects together. Reusing research results in clustering, e.g. algorithms and visualization techniques, by mapping the co-location mining problem into a clustering problem would be very useful. However, directly clustering spatial objects from various spatial features may not yield well-defined co-location patterns. Clustering spatial objects in each layer followed by overlaying the layers of clusters may not be applicable to many application domains where the spatial objects in some layers are not clustered. In this paper, we propose a new approach to the problem of mining co-location patterns using clustering techniques. First, we propose a novel framework for co-location mining using clustering techniques. We show that the proximity of two spatial features can be captured by summarizing their spatial objects embedded in a continuous space via various techniques. We define the desired properties of proximity functions compared to similarity functions in clustering. Furthermore, we summarize the properties of a list of popular spatial statistical measures as the proximity functions. Finally, we show that clustering techniques can be applied to reveal the rich structure formed by co-located spatial features. A case study on real datasets shows that our method is effective for mining co-locations from large spatial datasets.
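One way to picture the proximity idea is sketched below: the proximity of two feature layers is summarized by the symmetric mean nearest-neighbor distance between their point sets, and the features are then clustered hierarchically on those proximities. The measure, data, and clustering choices are illustrative stand-ins, not the spatial statistical measures studied in the paper.

```python
# Illustrative sketch: summarize layer-to-layer proximity, then cluster features.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial import cKDTree
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
layers = {                                   # feature name -> point locations
    "cafe":     rng.normal([0, 0], 0.5, (30, 2)),
    "bookshop": rng.normal([0, 0], 0.5, (30, 2)),   # co-located with cafes
    "harbour":  rng.normal([10, 10], 0.5, (30, 2)),
}

def proximity(a, b):
    """Symmetric mean nearest-neighbor distance between two point sets."""
    d_ab = cKDTree(b).query(a)[0].mean()
    d_ba = cKDTree(a).query(b)[0].mean()
    return 0.5 * (d_ab + d_ba)

names = list(layers)
D = np.zeros((len(names), len(names)))
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        D[i, j] = D[j, i] = proximity(layers[names[i]], layers[names[j]])

labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
print(dict(zip(names, labels)))              # cafe and bookshop should share a cluster
```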

41 citations


Journal ArticleDOI
TL;DR: The results show that using the KRR the authors are better able to cope with ambiguities in the anchoring module through exploitation of human robot interaction.
Abstract: In this work we introduce symbolic knowledge representation and reasoning capabilities to enrich perceptual anchoring. The idea that encompasses perceptual anchoring is the creation and maintenance of a connection between the symbolic and perceptual descriptions that refer to the same object in the environment. In this work we further extend the symbolic layer by combining a knowledge representation and reasoning (KRR) system with the anchoring module to exploit knowledge inference mechanisms. We implemented a prototype of this novel approach to explore through initial experimentation the advantages of integrating a symbolic knowledge system into the anchoring framework in the context of an intelligent home. Our results show that using the KRR we are better able to cope with ambiguities in the anchoring module through exploitation of human-robot interaction.

33 citations


Journal ArticleDOI
TL;DR: A graph-based approach to the task of recognizing textual entailment between a text and a hypothesis is studied; it relies heavily on the concept of subsumption and can be customized so that high-confidence results are obtained.
Abstract: In this paper we study a graph-based approach to the task of Recognizing Textual Entailment between a Text and a Hypothesis. The approach takes into account the full lexico-syntactic context of both the Text and Hypothesis and is based on the concept of subsumption. It starts with mapping the Text and Hypothesis on to graph structures that have nodes representing concepts and edges representing lexico-syntactic relations among concepts. An entailment decision is then made on the basis of a subsumption score between the Text-graph and Hypothesis-graph. The results obtained from a standard entailment test data set were promising. The impact of synonymy on entailment is quantified and discussed. An important advantage to a solution like ours is its ability to be customized to obtain high-confidence results.
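A bare-bones version of the subsumption scoring is sketched below, assuming the Text and Hypothesis have already been mapped to node sets and labeled dependency edges; the graphs, weights, and threshold are invented for illustration.

```python
# Toy subsumption score over (nodes, edges) graph pairs.
def subsumption_score(text_graph, hyp_graph):
    t_nodes, t_edges = text_graph
    h_nodes, h_edges = hyp_graph
    node_cov = len(h_nodes & t_nodes) / len(h_nodes)
    edge_cov = len(h_edges & t_edges) / len(h_edges) if h_edges else 1.0
    return 0.5 * node_cov + 0.5 * edge_cov     # equal weights, for simplicity

text = ({"company", "acquire", "rival"},
        {("acquire", "subj", "company"), ("acquire", "obj", "rival")})
hyp = ({"company", "acquire", "rival"},
       {("acquire", "subj", "company")})

score = subsumption_score(text, hyp)
print(score, "ENTAILED" if score >= 0.8 else "NOT ENTAILED")
```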

29 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a pattern-based reasoning approach, which offers 9 patterns of constraint contradictions that lead to unsatisfiability in Object-Role (ORM) models.
Abstract: Reasoning with ontologies is a challenging task, especially for non-logic experts. When checking whether an ontology contains rules that contradict each other, current description logic reasoners can only provide a list of the unsatisfiable concepts. Figuring out why these concepts are unsatisfiable, which rules cause conflicts, and how to resolve these conflicts is all left to the ontology modeler himself. The problem becomes even more challenging in the case of large or medium-sized ontologies, because an unsatisfiable concept may cause many of its neighboring concepts to be unsatisfiable. The goal of this article is to empower ontology engineering with a user-friendly reasoning mechanism. We propose a pattern-based reasoning approach, which offers 9 patterns of constraint contradictions that lead to unsatisfiability in Object-Role (ORM) models. The novelty of this approach is not merely that constraint contradictions are detected, but mainly that it provides the causes of contradictions and suggestions to resolve them. The approach is implemented in the DogmaModeler ontology engineering tool and tested in building the CCFORM ontology. We discuss that, although this pattern-based reasoning covers most contradictions in practice, it is not complete compared with description logic-based reasoning. We illustrate both approaches, pattern-based and description logic-based, and their implementation in DogmaModeler, and conclude that they complement each other from a methodological perspective.

26 citations


Journal ArticleDOI
TL;DR: A prototype implementation in a tabled-logic programming environment illustrates the key features of the specification-driven approach to Web service composition; when a goal service cannot be realized, the approach identifies the cause(s) of the failure, which the developer can use to reformulate the goal specification.
Abstract: We propose a specification-driven approach to Web service composition. Our framework allows the users (or service developers) to start with a high-level, possibly incomplete specification of a desired (goal) service that is to be realized using a subset of the available component services. These services are represented using labeled transition systems augmented with guards over variables with infinite domains and are used to determine a strategy for their composition that would realize the goal service functionality. However, in the event the goal service cannot be realized using the available services, our approach identifies the cause(s) for such failure which can then be used by the developer to reformulate the goal specification. Thus, the technique supports Web service composition through iterative reformulation of the functional specification. We present a prototype implementation in a tabled-logic programming environment that illustrates the key features of the proposed approach.

Journal ArticleDOI
TL;DR: Results confirm that SHIPs is currently the most effective approach for the HIPP problem, being several orders of magnitude faster than existing integer linear programming and branch and bound solutions.
Abstract: Mutation in DNA is the principal cause for differences among human beings, and Single Nucleotide Polymorphisms (SNPs) are the most common mutations. Hence, a fundamental task is to complete a map of haplotypes (which identify SNPs) in the human population. Associated with this effort, a key computational problem is the inference of haplotype data from genotype data, since in practice genotype data rather than haplotype data is usually obtained. Different haplotype inference approaches have been proposed, including the utilization of statistical methods and the utilization of the pure parsimony criterion. The problem of haplotype inference by pure parsimony (HIPP) is interesting not only because of its application to haplotype inference, but also because it is a challenging NP-hard problem, being APX-hard. Recent work has shown that a SAT-based approach is the most efficient approach for the problem of haplotype inference by pure parsimony (HIPP), being several orders of magnitude faster than existing integer linear programming and branch and bound solutions. This paper provides a detailed description of SHIPs, a SAT-based approach for the HIPP problem, and presents comprehensive experimental results comparing SHIPs with all other exact approaches for the HIPP problem. These results confirm that SHIPs is currently the most effective approach for the HIPP problem.

Journal ArticleDOI
TL;DR: This work applies advanced BNs models to CF tasks instead of simple ones, and works on real-world multi-class CF data instead of synthetic binary-class data, finding that the ELR-optimized BNs CF models are robust in terms of the ability to make predictions, while the robustness of the Pearson correlation-based CF algorithm degrades as the sparseness of the data increases.
Abstract: As one of the most successful recommender systems, collaborative filtering (CF) algorithms are required to deal with high sparsity and high requirement of scalability amongst other challenges. Bayesian networks (BNs), one of the most frequently used classifiers, can be used for CF tasks. Previous works on applying BNs to CF tasks were mainly focused on binary-class data, and used simple or basic Bayesian classifiers [1,2]. In this work, we apply advanced BNs models to CF tasks instead of simple ones, and work on real-world multi-class CF data instead of synthetic binary-class data. Empirical results show that with their ability to deal with incomplete data, the extended logistic regression on tree augmented naive Bayes (TAN-ELR) [3] CF model consistently performs better than the traditional Pearson correlation-based CF algorithm for the rating data that have few items or high missing rates. In addition, the ELR-optimized BNs CF models are robust in terms of the ability to make predictions, while the robustness of the Pearson correlation-based CF algorithm degrades as the sparseness of the data increases.
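For reference, the sketch below shows a user-based Pearson-correlation CF prediction of the kind the BN models are compared against (the TAN-ELR model itself is not reproduced here); the rating matrix and neighborhood rules are toy choices.

```python
# Sketch of the Pearson-correlation CF baseline; 0 marks "not rated".
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def predict(R, user, item):
    mask_u = R[user] > 0
    means = np.array([R[v][R[v] > 0].mean() for v in range(len(R))])
    num = den = 0.0
    for v in range(len(R)):
        if v == user or R[v, item] == 0:
            continue
        common = mask_u & (R[v] > 0)                  # items rated by both users
        if common.sum() < 2:
            continue
        a, b = R[user, common] - means[user], R[v, common] - means[v]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        w = float(a @ b / denom) if denom > 0 else 0.0   # correlation weight
        num += w * (R[v, item] - means[v])
        den += abs(w)
    return means[user] + num / den if den > 0 else means[user]

print(round(predict(R, user=1, item=2), 2))           # predicted rating for a missing cell
```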

Journal ArticleDOI
TL;DR: This work uses a qualitative abstraction mechanism to create a representation of space consisting of the circular order of detected landmarks and the position of walls relative to the agent's moving direction, which enables the agent to learn a goal-directed navigation strategy faster than with metrical representations.
Abstract: The representation of the surrounding world plays an important role in robot navigation, especially when reinforcement learning is applied. This work uses a qualitative abstraction mechanism to create a representation of space consisting of the circular order of detected landmarks and the position of walls relative to the agent's moving direction. The use of this representation not only enables the agent to learn a goal-directed navigation strategy faster than with metrical representations, but also facilitates reusing structural knowledge of the world at different locations within the same environment. Acquired policies are also applicable in scenarios with different metrics and corridor angles. Furthermore, the gained structural knowledge can be separated out, leading to a generally sensible navigation behavior that can be transferred to environments lacking landmark information and/or totally unknown environments.

Journal ArticleDOI
TL;DR: The principal result reported here is that randomly chosen subsets of heuristics can improve the identification of an appropriate mixture of heuristics for a specific class of problems.
Abstract: Problem solvers, both human and machine, have at their disposal many heuristics that may support effective search. The efficacy of these heuristics, however, varies with the problem class, and their mutual interactions may not be well understood. The long-term goal of our work is to learn how to select appropriately from among a large body of heuristics, and how to combine them into a mixture that works well on a specific class of problems. The principal result reported here is that randomly chosen subsets of heuristics can improve the identification of an appropriate mixture of heuristics. A self-supervised learner uses this method here to learn to solve constraint satisfaction problems quickly and effectively.

Journal ArticleDOI
TL;DR: In a method of generating logical rules, or axioms, from empirical data, numerical inequalities can be treated as logical predicates that are extracted from the data itself rather than postulated a priori.
Abstract: We review a method of generating logical rules, or axioms, from empirical data. This method, using closed set properties of formal concept analysis, has been previously described and tested on rather large sets of deterministic data. Although formal concept techniques have been used to prune frequent set data mining results, frequency and/or statistical significance are totally irrelevant to this method; it is strictly logical and deterministic. The contribution of this paper is a completely new extension of this method to create implications involving numeric inequalities. That is, numerical inequalities such as "age > 39" can be treated as logical predicates that have been extracted from the data itself and not postulated a priori.

Journal ArticleDOI
TL;DR: A novel, non-statistical technique is proposed to generate a background model and use it for background subtraction and foreground region detection in the presence of such challenges as waving flags, fluctuating monitors, and water surfaces.
Abstract: Video segmentation is one of the most important tasks in high-level video processing applications. Stationary cameras are usually used in applications such as video surveillance and human activity recognition. However, possible changes in the background of the video such as waving flags, fluctuating monitors, water surfaces, etc. make the detection of objects of interest particularly challenging. These types of backgrounds are called quasi-stationary backgrounds. In this paper we propose a novel, non-statistical technique to generate a background model and use this model for background subtraction and foreground region detection in the presence of such challenges. The main advantage of the proposed method over the state of the art is that unlike statistical techniques the accuracy of foreground regions is not limited to the estimate of the probability density. Also, the memory requirements of our method are independent of the number of training samples. This makes the proposed method useful in various scenarios including the presence of slow changes in the background. A comprehensive study is presented on the efficiency of the proposed method. Its performance is compared with various existing techniques quantitatively and qualitatively to show its superiority in various applications.
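As a point of comparison only, the snippet below implements a generic running-median background-subtraction baseline on synthetic frames; it is not the paper's non-statistical model, just a compact illustration of the background subtraction task under a quasi-stationary background.

```python
# Generic running-median background subtraction on synthetic frames.
import numpy as np

rng = np.random.default_rng(0)
H, W, T = 48, 64, 30
frames = rng.normal(100, 5, size=(T, H, W))        # noisy, quasi-stationary background
frames[-1, 10:20, 10:20] += 60                     # a foreground object in the last frame

background = np.median(frames[:-1], axis=0)        # per-pixel median of the history
diff = np.abs(frames[-1] - background)
foreground = diff > 4 * frames[:-1].std(axis=0)    # simple adaptive threshold

print("foreground pixels detected:", int(foreground.sum()))  # roughly the 10x10 patch
```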

Journal ArticleDOI
TL;DR: This paper proposes an advanced nonmonotone Conjugate Gradient training algorithm for recurrent neural networks, which is equipped with an adaptive tuning strategy for both the nonmonotone learning horizon and the stepsize.
Abstract: Recurrent networks constitute an elegant way of increasing the capacity of feedforward networks to deal with complex data in the form of sequences of vectors. They are well known for their power to model temporal dependencies and process sequences for classification, recognition, and transduction. In this paper we propose an advanced nonmonotone Conjugate Gradient training algorithm for recurrent neural networks, which is equipped with an adaptive tuning strategy for both the nonmonotone learning horizon and the stepsize. Simulation results in sequence processing using three different recurrent architectures demonstrate that this modification of the Conjugate Gradient method is more effective than previous attempts.

Journal ArticleDOI
TL;DR: The proposed model is robust (for a variety of languages) and computationally efficient, and may be useful as a pre-processing tool for various language engineering and text mining applications such as spell-checkers, electronic dictionaries, morphological analyzers, etc.
Abstract: This paper addresses the problem of automatic induction of the normalized form (lemma) of regular and mildly irregular words with no direct supervision, using language-independent algorithms. More specifically, two string distance metric models (i.e. the Levenshtein Edit Distance algorithm and the Dice Coefficient similarity measure) were employed in order to deal with the automatic word lemmatization task by combining two alignment models based on string similarity and the most frequent inflectional suffixes. The performance of the proposed model has been evaluated quantitatively and qualitatively. Experiments were performed for the Modern Greek and English languages and the results, which are within the state of the art, have shown that the proposed model is robust (for a variety of languages) and computationally efficient. The proposed model may be useful as a pre-processing tool for various language engineering and text mining applications such as spell-checkers, electronic dictionaries, morphological analyzers, etc.
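The string-distance component can be illustrated with a minimal Levenshtein-based lemma lookup, shown below; the lexicon and test words are made up, and the Dice-coefficient model and suffix alignment of the paper are omitted.

```python
# Toy illustration: pick the lexicon lemma closest in edit distance.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

lexicon = ["walk", "run", "study", "carry"]

def lemmatize(word):
    return min(lexicon, key=lambda lemma: levenshtein(word, lemma))

for w in ["walked", "studies", "running", "carried"]:
    print(w, "->", lemmatize(w))
```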

Journal ArticleDOI
TL;DR: Mining the Maximal Frequent Item-set (MFI) is an alternative to mining the complete frequent item-set in dense datasets, addressing the inherent computational complexity of that task.
Abstract: Because of the inherent computational complexity, mining the complete frequent item-set in dense datasets remains a challenging task. Mining Maximal Frequent Item-set (MFI) is an alternative ...

Journal ArticleDOI
TL;DR: A recent heuristic-based approach to compute infeasible minimal subparts of discrete CSPs, also called Minimally Unsatisfiable Cores (MUCs), is improved, based on the heuristic exploitation of the number of times each constraint has been falsified during previous failed search steps.
Abstract: When a Constraint Satisfaction Problem (CSP) admits no solution, it can be useful to pinpoint which constraints are actually contradicting one another and make the problem infeasible. In this paper, a recent heuristic-based approach for computing infeasible minimal subparts of discrete CSPs, also called Minimally Unsatisfiable Cores (MUCs), is improved. The approach is based on the heuristic exploitation of the number of times each constraint has been falsified during previous failed search steps. It appears to enhance the performance of the initial technique, which was previously the most efficient one.

Journal ArticleDOI
TL;DR: This paper designs an incremental move, maintaining maximal holes, for the strip packing problem, a variant of the famous 2D bin-packing, and implements a metaheuristic, with no user-defined parameter, using this move and standard greedy heuristics.
Abstract: When handling 2D packing problems, numerous incomplete and complete algorithms maintain a so-called bottom-left (BL) property: no rectangle placed in a container can be moved further left or further down. While it is easy to make a rectangle BL when it is added to a container, it is more expensive to keep all the placed pieces BL when a rectangle is removed. This prevents researchers from designing incremental moves for metaheuristics or efficient complete optimization algorithms. This paper investigates the possibility of violating the BL property. Instead, we propose to maintain the set of maximal holes, which allows incremental additions and removals of rectangles. To validate our alternative approach, we have designed an incremental move, maintaining maximal holes, for the strip packing problem, a variant of the famous 2D bin-packing problem. We have also implemented a metaheuristic, with no user-defined parameter, using this move and standard greedy heuristics. We have finally designed two variants of this incomplete method. In the first variant, a better first layout is provided by a hyperheuristic proposed by some of the authors. In the second variant, a fast repacking procedure recovering the BL property is occasionally called during the local search. Experimental results show that the approach is competitive with the best known incomplete algorithms.
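For intuition about bottom-left-style placement, the sketch below packs rectangles onto a unit-width skyline, always choosing the lowest (then leftmost) feasible position; it does not implement the maximal-holes data structure or the incremental move of the paper.

```python
# Simplified skyline placement in a bottom-left spirit for strip packing.
def pack(rects, strip_width):
    """rects: list of (w, h). Returns placements [(x, y, w, h)] and strip height."""
    skyline = [0] * strip_width                  # top height of each unit column
    placed = []
    for w, h in rects:
        # choose the leftmost x giving the lowest placement y
        candidates = [(max(skyline[x:x + w]), x) for x in range(strip_width - w + 1)]
        y, x = min(candidates)
        for c in range(x, x + w):                # raise the skyline over the rectangle
            skyline[c] = y + h
        placed.append((x, y, w, h))
    return placed, max(skyline)

rects = [(3, 2), (2, 4), (4, 1), (2, 2), (5, 1)]
layout, height = pack(sorted(rects, key=lambda r: -r[1]), strip_width=6)
for x, y, w, h in layout:
    print(f"{w}x{h} at ({x}, {y})")
print("strip height:", height)
```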

Journal ArticleDOI
TL;DR: A new approach for curve clustering designed for analysis of spatiotemporal data based on regression and Gaussian mixture modeling with incorporation of spatial smoothness constraints in the form of a prior for the data labels is presented.
Abstract: We present a new approach for curve clustering designed for the analysis of spatiotemporal data. Such data contain both spatial and temporal patterns that we desire to capture. The proposed methodology is based on regression and Gaussian mixture modeling. The novelty of this work is the incorporation of spatial smoothness constraints in the form of a prior for the data labels. This allows us to take into account the property of spatiotemporal data according to which spatially adjacent data points have a higher probability of belonging to the same cluster. The proposed model can be formulated as a Maximum a Posteriori (MAP) problem, where the Expectation Maximization (EM) algorithm is used to estimate the model parameters. Several numerical experiments with both simulated data and real cardiac perfusion MRI data are used to evaluate the methodology. The results are promising and demonstrate the value of the proposed approach.

Journal ArticleDOI
TL;DR: Ant Colony Optimization is used for the construction of a hybrid algorithmic scheme which effectively handles the Pap Smear Cell classification problem and is properly combined with a number of nearest-neighbor-based approaches for performing the requested classification task.
Abstract: During the last years, nature-inspired intelligent techniques have become attractive for analyzing large data sets and solving complex optimization problems. In this paper, one of the most interesting of them, Ant Colony Optimization (ACO), is used for the construction of a hybrid algorithmic scheme which effectively handles the Pap Smear Cell classification problem. This algorithmic approach is properly combined with a number of nearest-neighbor-based approaches for performing the requested classification task, through the solution of the so-called optimal feature subset selection problem. The proposed complete algorithmic scheme is tested on two data sets. The first one consists of 917 images of pap smear cells and the second set consists of 500 images, classified carefully by expert cyto-technicians and doctors. Each cell is described by 20 numerical features, and the cells fall into seven (7) classes, four (4) representing normal cells and three (3) abnormal cases. Nevertheless, from the medical diagnosis viewpoint, a minimum requirement corresponds to the general two-class problem of correctly separating normal from abnormal cells.

Journal ArticleDOI
TL;DR: In this paper, the authors describe an algorithm and experiments for inference of edge replacement graph grammars, which generates candidate recursive graph grammar productions based on isomorphic subgraphs which overlap by two nodes.
Abstract: We describe an algorithm and experiments for inference of edge replacement graph grammars. This method generates candidate recursive graph grammar productions based on isomorphic subgraphs which overlap by two nodes. If there is no edge between the two overlapping nodes, the method generates a recursive graph grammar production with a virtual edge. We guide the search for the graph grammar based on the size of the grammar and the portion of the graph described by the grammar. We show experiments where we generate graphs from known graph grammars, use our method to infer the grammar from the generated graphs, and then measure the error between the original and inferred grammars. Experiments show that the method performs well on several types of grammars, and specifically that error decreases with increased numbers of unique labels in the graph.

Journal ArticleDOI
TL;DR: The idea is developed that variable ordering heuristics for CSPs can be characterised in terms of a small number of distinguishable actions or strategies, and that while specific heuristics may be classified differently depending on the problem type, the basic actions that determine their classification are the same.
Abstract: This paper develops the idea that variable ordering heuristics for CSPs can be characterised in terms of a small number of distinguishable actions or strategies, and that while specific heuristics may be classified differently depending on the problem type, the basic actions that determine their classification are the same. These strategies can be described as building up contention and propagating effects to future, uninstantiated variables. The propagation-of-effects type of action is related to the "simplification hypothesis" of Hooker and Vinay, but since this is only one of two independent actions, this work gives a more complete account of the basis of heuristic performance. The basic technique uses factor analysis to simplify the set of correlations between performance scores for a particular set of problems. The present work shows that the results of this analysis are robust, and that amenability of a problem to one or the other kind of heuristic action is also a robust effect. This approach also elucidates the effectiveness of modern adaptive heuristics by showing that they balance the two kinds of heuristic action, which is known to enhance performance. The two heuristic actions are distinguished by descriptive measures such as depth of failure and depth at which a problem becomes tractable, that reflect differences in rapidity of effects with respect to search depth. Other experiments indicate that the effectiveness of propagation-of-effects depends on the degree to which nonadjacent assignments can interact in the future part of the problem. Extensions to structured problems suggest that the theoretical analysis can be generalised, although the factor analysis of performance becomes more complicated due to the interaction of the basic actions with specific parts of the problem. This work contributes to the goals of explaining heuristic performance and putting heuristic selection on a rational basis.

Journal ArticleDOI
TL;DR: This paper presents an efficient distributed algorithm for mining association rules that reduces the time complexity to an extent that renders it suitable for scaling up to very large data sets.
Abstract: One of the most important data mining problems is learning association rules of the form "90% of the customers that purchase product x also purchase product y". Discovering association rules from huge volumes of data requires substantial processing power. In this paper we present an efficient distributed algorithm for mining association rules that reduces the time complexity to an extent that renders it suitable for scaling up to very large data sets. The proposed algorithm is based on partitioning the initial data set into subsets and processing each subset in parallel. By reducing the support threshold while processing the subsets, the proposed algorithm maintains the set of association rules that would be extracted when applying an association rule mining algorithm to all the data. The above is confirmed by the empirical tests that we present, which also demonstrate the utility of the method.
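The partitioning idea can be sketched as follows: each subset is mined locally with a lowered support threshold, the locally frequent itemsets are pooled as candidates, and a final pass over the full data keeps only the globally frequent ones. The transactions, thresholds, and brute-force itemset enumeration below are toy simplifications, not the paper's algorithm.

```python
# Sketch of partition-based frequent-itemset mining with a lowered local threshold.
from itertools import combinations

transactions = [
    {"bread", "milk"}, {"bread", "milk", "beer"}, {"milk", "beer"},
    {"bread", "milk"}, {"bread", "beer"}, {"milk", "beer"},
]
min_support = 0.5                       # global threshold (fraction of transactions)

def frequent_itemsets(data, threshold, max_size=2):
    items = sorted({i for t in data for i in t})
    result = set()
    for k in range(1, max_size + 1):
        for cand in combinations(items, k):
            support = sum(set(cand) <= t for t in data) / len(data)
            if support >= threshold:
                result.add(cand)
    return result

# "Distributed" phase: mine each partition with a reduced local threshold.
partitions = [transactions[:3], transactions[3:]]
candidates = set()
for part in partitions:
    candidates |= frequent_itemsets(part, threshold=min_support * 0.8)

# Global verification pass keeps only itemsets frequent in the whole data set.
final = {c for c in candidates
         if sum(set(c) <= t for t in transactions) / len(transactions) >= min_support}
print(sorted(final))
```

In a real distributed setting the per-partition mining would run in parallel and only the candidate itemsets and counts would be exchanged; the sequential loop above is just for readability.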

Journal ArticleDOI
TL;DR: In this article, the problem of function tag assignment is modeled as a classification problem, where each function tag is regarded as a class and the task is to find what class/tag a given node in a parse tree belongs to from a set of predefined classes/tags.
Abstract: This paper describes the use of two machine learning techniques, naive Bayes and decision trees, to address the task of assigning function tags to nodes in a syntactic parse tree. Function tags are extra functional information, such as logical subject or predicate, that can be added to certain nodes in syntactic parse trees. We model the function tag assignment problem as a classification problem. Each function tag is regarded as a class and the task is to find what class/tag a given node in a parse tree belongs to from a set of predefined classes/tags. The paper offers the first systematic comparison of the two techniques, naive Bayes and decision trees, for the task of function tag assignment. The comparison is based on a standardized data set, the Penn Treebank, a collection of sentences annotated with syntactic information including function tags. We found that decision trees generally outperform naive Bayes for the task of function tagging. Furthermore, this is the first large-scale evaluation of decision tree-based solutions to the task of functional tagging.
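A minimal scikit-learn comparison in the same spirit is sketched below: identical integer-encoded node features are fed to a naive Bayes classifier and to a decision tree, and held-out accuracy is compared. The features and labels are synthetic; real experiments would use attributes extracted from Penn Treebank parse trees.

```python
# Hypothetical comparison of naive Bayes vs. decision tree on categorical codes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# columns: node label id, parent label id, head POS id (all categorical codes 0-5)
X = rng.integers(0, 6, size=(500, 3))
y = (X[:, 0] + X[:, 1]) % 4                      # synthetic "function tag" classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for clf in (CategoricalNB(min_categories=6),     # min_categories guards unseen codes
            DecisionTreeClassifier(random_state=0)):
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(clf).__name__, round(acc, 3))
```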

Journal ArticleDOI
TL;DR: A multiagent Reinforcement Learning approach that uses coordinated actions, called strategies, and a fusing process to guide the agents is proposed, and the results demonstrate the efficiency of the proposed approach.
Abstract: Reinforcement Learning comprises an attractive solution to the problem of coordinating a group of agents in a Multiagent System, due to its robustness for learning in uncertain and unknown environments. This paper proposes a multiagent Reinforcement Learning approach that uses coordinated actions, which we call strategies, and a fusing process to guide the agents. To evaluate the proposed approach, we conduct experiments in the Predator-Prey domain and compare it with other learning techniques. The results demonstrate the efficiency of the proposed approach.

Journal ArticleDOI
TL;DR: New incremental filtering rules integrating propagation through both precedence and dependency constraints are proposed, and the efficiency of the proposed rules is demonstrated on log-based reconciliation problems and min-cutset problems.
Abstract: Precedence constraints specify that an activity must finish before another activity starts, and hence such constraints play a crucial role in planning and scheduling problems. Many real-life problems also include dependency constraints expressing logical relations between the activities – for example, an activity requires the presence of another activity in the plan. For such problems a typical objective is the maximization of the number of activities satisfying the precedence and dependency constraints. In this paper we propose new incremental filtering rules integrating propagation through both precedence and dependency constraints. We also propose a new filtering rule using information about the requested number of activities in the plan. We demonstrate the efficiency of the proposed rules on log-based reconciliation problems and min-cutset problems.
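A tiny sketch of precedence propagation as time-bound filtering is given below: earliest start times are pushed forward along the precedence edges until a fixed point. The activities, durations, and precedences are invented, and the paper's dependency-constraint and cardinality-based rules are not reproduced.

```python
# Propagate precedence constraints i -> j ("i must finish before j starts")
# as earliest-start-time bounds, iterating to a fixed point (acyclic case).
from collections import defaultdict

duration = {"A": 3, "B": 2, "C": 4, "D": 1}
precedes = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

est = defaultdict(int)                       # earliest start time of each activity
changed = True
while changed:
    changed = False
    for i, j in precedes:
        bound = est[i] + duration[i]
        if bound > est[j]:
            est[j] = bound
            changed = True

print(dict(est))                             # D cannot start before time 7
```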