
Showing papers presented at "Intelligent Systems Design and Applications in 2009"


Proceedings ArticleDOI
30 Nov 2009
TL;DR: This work proposes a simple way to turn standard measures for OR into ones robust to imbalance, shows that, when used on balanced datasets, the two versions of each measure coincide, and argues that these measures should become the standard choice for OR.
Abstract: Ordinal regression (OR -- also known as ordinal classification) has received increasing attention in recent times, due to its importance in IR applications such as learning to rank and product review rating. However, research has not paid attention to the fact that typical applications of OR often involve datasets that are highly imbalanced. An imbalanced dataset has the consequence that, when testing a system with an evaluation measure conceived for balanced datasets, a trivial system assigning all items to a single class (typically, the majority class) may even outperform genuinely engineered systems. Moreover, if this evaluation measure is used for parameter optimization, a parameter choice may result that makes the system behave very much like a trivial system. In order to avoid this, evaluation measures that can handle imbalance must be used. We propose a simple way to turn standard measures for OR into ones robust to imbalance. We also show that, when used on balanced datasets, the two versions of each measure coincide, and therefore argue that our measures should become the standard choice for OR.
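The fix the abstract describes can be illustrated with macroaveraging: average a per-class error instead of a per-item one, so the majority class cannot dominate. The sketch below is illustrative, not necessarily the authors' exact formulation; it contrasts standard and macroaveraged mean absolute error on an imbalanced ordinal test set.

```python
def mae(true, pred):
    """Standard (microaveraged) mean absolute error over all items."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def macro_mae(true, pred, classes):
    """Macroaveraged MAE: average the per-class MAE, so every class counts
    equally no matter how many test items it contains."""
    per_class = []
    for c in classes:
        errs = [abs(t - p) for t, p in zip(true, pred) if t == c]
        if errs:                      # skip classes absent from the test set
            per_class.append(sum(errs) / len(errs))
    return sum(per_class) / len(per_class)

# Imbalanced ordinal test set: 8 items of class 1, 2 items of class 3.
true = [1] * 8 + [3] * 2
trivial = [1] * 10        # trivial system: always predict the majority class
print(mae(true, trivial))                   # 0.4  (looks deceptively good)
print(macro_mae(true, trivial, [1, 2, 3]))  # 1.0  (penalises the trivial system)
```

On a perfectly balanced test set the two functions return the same value, which matches the coincidence property claimed in the abstract.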

198 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: This paper describes the algorithm design and implementation of GAs on Hadoop, an open source implementation of MapReduce, and demonstrates the convergence and scalability up to 10^5 variable problems.
Abstract: Genetic algorithms (GAs) are increasingly being applied to large-scale problems. Traditional MPI-based parallel GAs require detailed knowledge of the machine architecture. On the other hand, MapReduce is a powerful abstraction proposed by Google for building scalable and fault-tolerant applications. In this paper, we show how genetic algorithms can be modeled in the MapReduce model. We describe the algorithm design and implementation of GAs on Hadoop, an open source implementation of MapReduce. Our experiments demonstrate convergence and scalability up to 10^5-variable problems. Adding more resources would enable us to solve even larger problems without any changes in the algorithms and implementation, since we do not introduce any performance bottlenecks.
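The mapping of a GA onto MapReduce can be sketched as one job per generation: the map phase evaluates fitness independently for each individual (the trivially parallel part), and the reduce phase gathers the scored population for selection and variation. The toy below simulates this in plain Python on a OneMax problem; the function names and operators are illustrative, not Hadoop code or the paper's exact design.

```python
import random

random.seed(0)

def map_phase(population):
    """Map: evaluate each individual independently (trivially parallel).
    Fitness here is OneMax: the number of 1-bits."""
    return [(ind, sum(ind)) for ind in population]

def reduce_phase(scored, pop_size):
    """Reduce: gather scored individuals, then select, cross over and
    mutate to produce the next generation."""
    scored.sort(key=lambda x: x[1], reverse=True)
    parents = [ind for ind, _ in scored[: pop_size // 2]]  # truncation selection
    children = []
    while len(children) < pop_size:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))       # one-point crossover
        child = a[:cut] + b[cut:]
        i = random.randrange(len(child))
        child[i] ^= 1                           # bit-flip mutation
        children.append(child)
    return children

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(30):                             # one MapReduce job per generation
    pop = reduce_phase(map_phase(pop), len(pop))
best = max(sum(ind) for ind in pop)
print(best)  # typically close to the optimum of 20
```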

175 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: This paper proposes a new algorithm for discovering infrequent patterns in large databases and compares it to other solutions, noting that less frequent patterns can still offer interesting insights.
Abstract: Mining patterns in large databases is a challenging task that involves NP-hard problems. Research has focused on the most frequent patterns, although less frequent patterns still offer interesting insights. In this paper we propose a new algorithm for discovering infrequent patterns and compare it to other solutions.

75 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: An algorithm for the automatic labeling of topics according to a hierarchy is presented; its labeling rules are specifically designed to find the most agreed-upon labels between the given topic and the hierarchy.
Abstract: An algorithm for the automatic labeling of topics according to a hierarchy is presented. Its main ingredients are a set of similarity measures and a set of topic labeling rules. The labeling rules are specifically designed to find the most agreed-upon labels between the given topic and the hierarchy. The hierarchy is obtained from the Google Directory service, extracted via an ad hoc software procedure and expanded through the use of the OpenOffice English Thesaurus. The performance of the proposed algorithm is investigated using a document corpus consisting of 33,801 documents and a dictionary consisting of 111,795 words. The results are encouraging, and particularly interesting and significant labeling cases emerged.

73 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: The purpose of this paper is to tackle class imbalance, and thereby improve prediction/classification results, with over-sampling techniques as well as cost-sensitive learning (CSL).
Abstract: This paper introduces and compares techniques for predicting student performance at university. Recently, researchers have focused on applying machine learning in higher education to support both students and instructors in improving their performance. Some previous papers have introduced this problem, but the prediction results were unsatisfactory because of the class imbalance problem, which degrades classifiers. The purpose of this paper is to tackle class imbalance, and thereby improve the prediction/classification results, with over-sampling techniques as well as cost-sensitive learning (CSL). The paper shows that the results improve compared with applying only baseline classifiers such as Decision Trees (DT), Bayesian Networks (BN), and Support Vector Machines (SVM) to the original datasets.
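One of the two remedies the abstract names, over-sampling, can be sketched as random duplication of minority-class examples until every class is as frequent as the majority class (SMOTE-style methods additionally synthesize interpolated examples). The labels and data below are made up for illustration.

```python
import random

def oversample(X, y, seed=0):
    """Randomly duplicate minority-class examples until every class
    matches the majority-class count (a simple alternative to SMOTE)."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(v) for v in by_class.values())
    Xb, yb = [], []
    for c, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for xi in items + extra:
            Xb.append(xi)
            yb.append(c)
    return Xb, yb

X = [[i] for i in range(10)]
y = ["pass"] * 8 + ["fail"] * 2      # 8:2 imbalance, as in a failing-student class
Xb, yb = oversample(X, y)
print(yb.count("pass"), yb.count("fail"))  # 8 8
```

Cost-sensitive learning, the other remedy, instead leaves the data alone and charges the classifier a higher misclassification cost on the minority class.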

62 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: In all the experiments performed on the vibration signals represented in the frequency domain, the method achieved a classification accuracy higher than 99%, thus proving the high sensitivity of the method to different types of defects and to different degrees of fault severity.
Abstract: This paper presents a method, based on classification techniques, for automatic detection and diagnosis of defects in rolling element bearings. We used vibration signals recorded by four accelerometers on a mechanical device including rolling element bearings: the signals were collected both with all bearings faultless and after substituting one faultless bearing with an artificially damaged one. We considered four defects and, for one of them, three severity levels. In all the experiments performed on the vibration signals represented in the frequency domain we achieved a classification accuracy higher than 99%, proving the high sensitivity of our method to different types of defects and to different degrees of fault severity. We also assessed the robustness of our method to noise by analyzing how the classification performance varies as the signal-to-noise ratio varies, using statistical classifiers and neural networks, and achieved very good levels of robustness.

59 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: A multiobjective optimization algorithm based on Particle Swarm Optimization (MOPSO-CDR) that uses a diversity mechanism called crowding distance to select the social leaders and the cognitive leader and also uses the same mechanism to delete solutions of the external archive.
Abstract: This paper presents a multiobjective optimization algorithm based on Particle Swarm Optimization (MOPSO-CDR) that uses a diversity mechanism called crowding distance to select the social leaders and the cognitive leader. We also use the same mechanism to delete solutions from the external archive. The performance of our proposal was evaluated on five well-known benchmark functions using four metrics previously presented in the literature. Our proposal was compared to four other multi-objective optimization algorithms based on Particle Swarm Optimization: m-DNPSO, CSS-MOPSO, MOPSO and MOPSO-CDLS. The results showed that the proposed approach is competitive with the other approaches and outperforms them in many cases.
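The crowding-distance mechanism the abstract relies on is, in the usual NSGA-II formulation, the sum over objectives of the normalised gap between a solution's two neighbours; boundary solutions get infinite distance so the extremes of the front are always kept. A minimal sketch, assuming that standard formulation:

```python
def crowding_distance(front):
    """Crowding distance (NSGA-II style): for each solution in a front,
    sum the normalised gaps between its neighbours along every objective.
    Boundary solutions get infinite distance to preserve diversity."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for obj in range(m):
        order = sorted(range(n), key=lambda i: front[i][obj])
        lo, hi = front[order[0]][obj], front[order[-1]][obj]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][obj] -
                               front[order[k - 1]][obj]) / (hi - lo)
    return dist

# Four solutions on a 2-objective front (both objectives minimised).
front = [(0.0, 1.0), (0.2, 0.8), (0.5, 0.5), (0.9, 0.1)]
print(crowding_distance(front))  # boundary points get inf, interior points finite
```

In a MOPSO, leaders and archive survivors would then be chosen preferring larger crowding distance, which is the diversity pressure the abstract describes.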

55 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: A new domain-driven framework for EPM is proposed which assumes that a set of pattern templates can be predefined to focus the mining in a desired way and make it more effective and efficient.
Abstract: Educational process mining (EPM) aims at (i) constructing complete and compact educational process models that are able to reproduce all observed behavior (process model discovery), (ii) checking whether the modeled behavior (either pre-authored or discovered from data) matches the observed behavior (conformance checking), and (iii) projecting information extracted from the logs onto the model, to make the tacit knowledge explicit and facilitate better understanding of the process (process model extension). In this paper we propose a new domain-driven framework for EPM which assumes that a set of pattern templates can be predefined to focus the mining in a desired way and make it more effective and efficient. We illustrate the ideas behind our approach with examples of academic curricular modeling, mining, and conformance checking, using the student database of our department.

52 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: This work presents a method to tackle the problem of synonyms and related terms, as well as terms that add no information at all to a set of text documents, and achieves performance better than or equal to other techniques from the literature.
Abstract: The vector space model is the usual representation of text databases for computational treatment. However, in this representation synonyms and/or related terms are treated as independent. Furthermore, some terms add no information at all to the set of text documents; on the contrary, they may even harm the performance of information retrieval techniques. Some techniques have been proposed in the literature to reduce this problem. In this work we present a method to tackle it. To validate our approach, we carried out a series of experiments on four databases and compared the achieved results with other well-known techniques. Our method obtained in all cases performance better than or equal to the other techniques from the literature.

48 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: A second-order co-occurrence measure, and a related distance measure for tag similarities that is robust against the variation in tags, are introduced, from which methods to analyze user interest and compute recommendations can be derived.
Abstract: Tagging with free-form tags is becoming an increasingly important indexing mechanism. However, free-form tags have characteristics that require special treatment when used for searching or recommendation, because they show much more variation than controlled keywords. In this paper we present a method that puts this large variation to good use. We introduce second-order co-occurrence and a related distance measure for tag similarities that is robust against the variation in tags. From this distance measure it is straightforward to derive methods to analyze user interest and compute recommendations. We evaluate tag-based recommendation on the MovieLens dataset and a dataset of tagged books.
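The second-order idea can be sketched as follows: build first-order co-occurrence counts, then compare two tags by the similarity of their co-occurrence profiles rather than by direct co-occurrence. The distance below (1 minus the cosine of the profiles; an illustration, not necessarily the paper's exact measure) makes near-synonym tags close even though they never appear together on the same item.

```python
from math import sqrt

def cooccurrence(items):
    """First-order co-occurrence counts: how often two tags appear on the
    same item. `items` is a list of tag sets."""
    tags = sorted({t for item in items for t in item})
    co = {t: {u: 0 for u in tags} for t in tags}
    for item in items:
        for t in item:
            for u in item:
                if t != u:
                    co[t][u] += 1
    return tags, co

def second_order_distance(a, b, tags, co):
    """Second-order distance: 1 - cosine between the co-occurrence profiles
    of two tags. Tags never used together can still be close if they
    co-occur with the same *other* tags."""
    va = [co[a][t] for t in tags]
    vb = [co[b][t] for t in tags]
    dot = sum(x * y for x, y in zip(va, vb))
    na, nb = sqrt(sum(x * x for x in va)), sqrt(sum(x * x for x in vb))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)

# "scifi" and "sci-fi" never co-occur, but both co-occur with "space".
items = [{"scifi", "space"}, {"sci-fi", "space"}, {"cooking", "recipes"}]
tags, co = cooccurrence(items)
print(second_order_distance("scifi", "sci-fi", tags, co))   # 0.0: same profile
print(second_order_distance("scifi", "cooking", tags, co))  # 1.0: disjoint profiles
```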

48 citations


Proceedings ArticleDOI
30 Nov 2009
TL;DR: A novel approach based on both sign shape and color which uses Particle Swarm Optimization (PSO) for detection is presented which can be used both to detect a sign belonging to a certain category and, at the same time, to estimate its actual position with respect to the camera reference frame.
Abstract: Road Sign Detection is a major goal of Advanced Driving Assistance Systems (ADAS). Since the dawn of this discipline, much work based on different techniques has been published which shows that traffic signs can be first detected and then classified in video sequences in real time. While detection is usually performed using classical computer vision techniques based on color and/or shape matching, most often classification is performed by neural networks. In this work we present a novel approach based on both sign shape and color which uses Particle Swarm Optimization (PSO) for detection. Remarkably, a single fitness function can be used both to detect a sign belonging to a certain category and, at the same time, to estimate its actual position with respect to the camera reference frame. To speed up execution times, the algorithm exploits the parallelism offered by modern graphics cards and, in particular, the CUDA™ architecture by nVIDIA. The effectiveness of the approach has been assessed on a synthetic video sequence, which has been successfully processed in real time at full frame rate.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This contribution explores the use of a hybrid memetic algorithm based on the differential evolution algorithm, named MDE-DC, that combines the explorative/exploitative strength of two heuristic search methods, that separately obtain very competitive results in either low or high dimensional problems.
Abstract: Continuous optimization is one of the most active research lines in evolutionary and metaheuristic algorithms. Since the CEC 2005 and CEC 2008 competitions, many different algorithms have been proposed to solve continuous problems. Although very good algorithms exist that report high-quality results for a given dimension, the scalability of the search methods is still an open issue. Finding an algorithm with competitive results in the range of 50 to 500 dimensions is a difficult achievement. This contribution explores the use of a hybrid memetic algorithm based on the differential evolution algorithm, named MDE-DC. The proposed algorithm combines the explorative/exploitative strengths of two heuristic search methods that separately obtain very competitive results in either low- or high-dimensional problems. This paper uses the benchmark problems and conditions required for the workshop on “evolutionary algorithms and other metaheuristics for Continuous Optimization Problems – A Scalability Test” chaired by Francisco Herrera and Manuel Lozano.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: A method is described to automatically generate linguistic summaries of real-world time series data provided by a utility company, by partitioning the time series into fuzzy intervals, generating summarising sentences, and determining the truthfulness of these sentences.
Abstract: In this paper a method is described to automatically generate linguistic summaries of real-world time series data provided by a utility company. The methodology involves the following main steps: partitioning of the time series into fuzzy intervals, calculation of statistical indicators for the partitions, generation of summarising sentences and determination of the truthfulness of these sentences, and finally selection of relevant sentences from the generated set.
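The truthfulness step is commonly computed in the Yager/Zadeh style: the truth of a sentence like "most values in this interval are high" is a fuzzy quantifier applied to the average membership of the values in "high". A sketch with made-up membership parameters; the paper's actual definitions may differ.

```python
def mu_high(v, lo=15.0, hi=25.0):
    """Membership of a value in the fuzzy set 'high': 0 below lo,
    1 above hi, linear in between (illustrative parameters)."""
    return min(1.0, max(0.0, (v - lo) / (hi - lo)))

def mu_most(p):
    """Fuzzy quantifier 'most': fully true above 80% of items,
    fully false below 30%, linear in between (illustrative)."""
    return min(1.0, max(0.0, (p - 0.3) / 0.5))

def truth_of_summary(series):
    """Truth degree of 'most values in this interval are high'."""
    prop = sum(mu_high(v) for v in series) / len(series)
    return mu_most(prop)

# Demand values in one fuzzy interval (e.g. 'winter evenings').
winter_evenings = [24.0, 26.0, 25.0, 22.0, 20.0]
print(round(truth_of_summary(winter_evenings), 2))  # 1.0
```

Sentence selection then keeps only summaries whose truth degree exceeds a threshold.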

Proceedings ArticleDOI
30 Nov 2009
TL;DR: The main concept behind this approach is far more general and does not depend on the particular adopted model: it can be used for a wide category of systems, also non-neural, and with a variety of performance indicators.
Abstract: The paper presents an application of genetic algorithms to the problem of input variable selection for the design of neural systems. The basic idea of the proposed method lies in the use of genetic algorithms to select the set of variables to be fed to the neural networks. However, the main concept behind this approach is far more general and does not depend on the particular model adopted: it can be used for a wide category of systems, including non-neural ones, and with a variety of performance indicators. The proposed method has been tested on a simple case study in order to demonstrate its effectiveness. The results obtained in the processing of experimental data are presented and discussed.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: A scalability test over eleven scalable benchmark functions, provided by the current workshop (Evolutionary Algorithms and other Metaheuristics for Continuous Optimization Problems - A Scalability Test), is conducted for accelerated DE using generalized opposition-based learning (GODE).
Abstract: In this paper a scalability test over eleven scalable benchmark functions, provided by the current workshop (Evolutionary Algorithms and other Metaheuristics for Continuous Optimization Problems - A Scalability Test), is conducted for accelerated DE using generalized opposition-based learning (GODE). The average error of the best individual in the population is reported for dimensions 50, 100, 200, and 500 in order to allow comparison with the results of the other algorithms participating in this workshop. The current work builds on opposition-based differential evolution (ODE) and our previous work on accelerating PSO by generalized OBL.
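Generalized opposition-based learning, as usually defined, evaluates for each candidate x its generalized opposite x* = k(a + b) - x within bounds [a, b] (k = 1 recovers classic OBL; GODE-style schemes draw k at random) and keeps the better of the two. A sketch under that standard definition, on the sphere function; the details are illustrative, not the paper's exact algorithm.

```python
import random

def generalized_opposite(x, a, b, k):
    """Generalized opposite point: x* = k*(a+b) - x per dimension,
    clipped back into the box [a, b]. k = 1 gives classic OBL."""
    xs = [k * (ai + bi) - xi for xi, ai, bi in zip(x, a, b)]
    return [min(bi, max(ai, v)) for v, ai, bi in zip(xs, a, b)]

def gobl_step(pop, a, b, fitness, rng):
    """One opposition step: evaluate each point and its generalized
    opposite, keep whichever is better (lower fitness)."""
    k = rng.random()                      # fresh random k, GODE-style
    out = []
    for x in pop:
        ox = generalized_opposite(x, a, b, k)
        out.append(min(x, ox, key=fitness))
    return out

rng = random.Random(1)
sphere = lambda x: sum(v * v for v in x)  # minimisation benchmark
a, b = [-5.0] * 3, [5.0] * 3
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(4)]
pop = gobl_step(pop, a, b, sphere, rng)
print(min(sphere(x) for x in pop))
```

In GODE this step is interleaved with ordinary DE mutation, crossover and selection.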

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This paper describes two fall detectors, based on bio-inspired algorithms, which can either operate independently or be part of a modular and easily extensible architecture, able to manage different areas of an intelligent environment.
Abstract: A new trend in modern Assistive Technologies implies making extensive use of ICT to develop efficient and reliable "Ambient Intelligence" applications dedicated to disabled, elderly or frail people. In this paper we describe two fall detectors, based on bio-inspired algorithms. Such devices can either operate independently or be part of a modular and easily extensible architecture, able to manage different areas of an intelligent environment. In this case, effective data fusion can be achieved, thanks to the complementary nature of the sensors on which the detectors are based. One device is based on vision and can be implemented on a standard FPGA programmable logic. It relies on a simplified version of the Particle Swarm Optimization algorithm. The other device under consideration is a wearable accelerometer-based fall detector, which relies on a recent soft-computing paradigm called Hierarchical Temporal Memories (HTMs).

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This paper presents a hardware implementation of Particle Swarm Optimization algorithms using an efficient floating-point arithmetic which performs the computations with high precision.
Abstract: The high computational cost of solving large engineering optimization problems motivates the design of parallel optimization algorithms. Population-based optimization algorithms provide parallel capabilities that can be exploited by implementing them directly in hardware. This paper presents a hardware implementation of Particle Swarm Optimization algorithms using an efficient floating-point arithmetic which performs the computations with high precision. All the architectures are parameterizable by bit-width, allowing the designer to choose the suitable format according to the requirements of the optimization problem. Synthesis and simulation results demonstrate that the proposed architecture achieves satisfactory results, obtaining better performance in terms of elapsed time than conventional software implementations.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: A situation-aware service recommender that helps locate services proactively, manages the vagueness of some contextual conditions of its rules, and outputs an uncertainty degree for each situation.
Abstract: Today's mobile Internet service portals offer thousands of services, and mobile devices can host plenty of applications, documents and web URLs. Hence, for average mobile users there is an increasing cognitive burden in finding the most appropriate service among the many available. On the other hand, methodologies such as bookmarks and resource tagging require a great arranging effort to handle increasing resources. To help mobile users manage and use this personal information space, new levels of granularity should be introduced in the organization of services, together with some degree of self-awareness. This paper proposes a situation-aware service recommender that helps locate services proactively. In the recommender, a semantic layer determines one or more current user situations by using domain knowledge expressed in terms of an ontology and semantic rules. A fuzzy inference layer manages the vagueness of some contextual conditions of these rules and outputs an uncertainty degree for each situation. Based on this degree, the recommender proposes a set of specific resources.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: A performance study of two versions of a unidimensional search algorithm aimed at solving high-dimensional optimization problems is presented, observing how metaheuristics for continuous optimization problems respond as dimension increases.
Abstract: This paper presents a performance study of two versions of a unidimensional search algorithm aimed at solving high-dimensional optimization problems. The algorithms were tested on 11 scalable benchmark problems. The aim is to observe how metaheuristics for continuous optimization problems respond with increasing dimension. To this end, we report the algorithms’ performance on the 50, 100, 200 and 500-dimension versions of each function. Computational results are given along with convergence graphs to provide comparisons with other algorithms during the conference and afterwards.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: The Dynamic Clan PSO topology is proposed, adding to the clan topology a novel ability, named the migration process, to improve the degree of convergence of PSO by focusing on the distribution of the particles in the search space.
Abstract: Particle Swarm Optimization (PSO) has been widely used to solve many different real-world optimization problems, and many novel PSO approaches have been proposed to improve PSO performance. Recently, a communication topology based on clans was proposed. In this paper, we propose the Dynamic Clan PSO topology. In this approach, a novel ability, named the migration process, is added to the clan topology. The goal is to improve the degree of convergence of PSO by focusing on the distribution of the particles in the search space. A comparison with the original clan topology and other well-known topologies was performed, and our results on five benchmark functions show that the changes can provide better results, except for the Rastrigin function.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This paper proposes a fully unsupervised anomaly detection strategy in hyperspectral imagery based on mixture learning through a variation of the well-known Expectation Maximization (EM) algorithm that was developed within a Bayesian framework.
Abstract: This paper proposes a fully unsupervised anomaly detection strategy in hyperspectral imagery based on mixture learning. Anomaly detection is conducted by adopting a Gaussian Mixture Model (GMM) to describe the statistics of the background in hyperspectral data. One of the key tasks in the application of mixture models is the specification in advance of the number of GMM components, the determination of which is essential and strongly affects detection performance. In this work, GMM parameters estimation was performed through a variation of the well-known Expectation Maximization (EM) algorithm that was developed within a Bayesian framework. Specifically, the adopted mixture learning technique incorporates a built-in mechanism for automatically assessing the number of components during the parameter estimation procedure. Then, Generalized Likelihood Ratio Test (GLRT) is considered for detecting anomalies. Real hyperspectral imagery acquired by an airborne sensor is used for experimental evaluation of the proposed anomaly detection strategy.
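The overall pipeline, a background GMM fitted by EM plus a likelihood-ratio-style test that flags pixels the background model explains poorly, can be sketched in one dimension. The sketch uses plain EM with a fixed number of components, unlike the paper's Bayesian variant that selects the number automatically, and all parameters and thresholds are illustrative.

```python
import math
import random

random.seed(7)

def gmm_pdf(x, comps):
    """Density of a 1-D Gaussian mixture; comps = [(weight, mean, std), ...]."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in comps)

def em_fit(data, iters=50):
    """Plain EM for a 2-component 1-D GMM (the paper's Bayesian EM variant
    also assesses the number of components; here it is fixed)."""
    k = 2
    means = [min(data), max(data)]            # simple deterministic seeding
    stds, weights = [1.0] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in data:
            ps = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                  for w, m, s in zip(weights, means, stds)]
            tot = sum(ps)
            resp.append([p / tot for p in ps])
        # M-step: re-estimate weights, means and standard deviations
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)
            weights[j] = nj / len(data)
    return list(zip(weights, means, stds))

# Background intensities from two materials, clustered around 0 and 10.
background = ([random.gauss(0, 1) for _ in range(300)] +
              [random.gauss(10, 1) for _ in range(300)])
comps = em_fit(background)

# GLRT-flavoured rule: flag values whose background log-likelihood is low.
threshold = -8.0                # in practice derived from background statistics
is_anomaly = lambda x: math.log(gmm_pdf(x, comps)) < threshold
print(is_anomaly(5.0), is_anomaly(0.3))  # True False
```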

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This paper develops a system based on a Field Programmable Gate Array for recognising the most significant cardiac arrhythmias by means of a Kohonen Self-Organizing Map; the whole digital implementation is validated for integration into wearable cardiac monitoring systems.
Abstract: The aim of this paper is the feasibility study and development of a system, based on a Field Programmable Gate Array, for recognising the most significant cardiac arrhythmias by means of a Kohonen Self-Organizing Map. A feasibility study of an implementation on the Xilinx Virtex®-4 FX12 FPGA is presented, in which the QRS complexes are extracted and classified in real time as normal or pathological. The whole digital implementation is validated for integration into wearable cardiac monitoring systems.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This work presents a preliminary study of a memetic algorithm that assigns to each individual a local search intensity that depends on its features, by chaining different local search applications, and studies whether this algorithm is scalable enough to perform well on medium- and high-dimensional problems.
Abstract: Memetic algorithms are very effective at obtaining reliable and highly accurate solutions to complex continuous optimization problems. Higher-dimensional optimization problems are currently an interesting field of research that introduces new difficulties for the optimization process, making it advisable to test the scalability of optimization algorithms. In particular, in memetic algorithms, higher dimensionality enlarges the domain space around each solution, requiring the local search method to be applied with high intensity. In this work, we present a preliminary study of a memetic algorithm that assigns to each individual a local search intensity that depends on its features, by chaining different local search applications. This algorithm has obtained good results in continuous optimization problems, and we study whether, using this intensity-adaptation mechanism with the scalable local search method MTS-LS2, the algorithm is scalable enough to perform well on medium- and high-dimensional problems. Experiments are carried out to test its scalability, and the results show that the proposal is scalable on many of the functions, both scalable and non-scalable, of the benchmark used.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: The findings showed that adjacency matrices, graph theory and network analysis techniques provide a more meaningful analysis of students' interaction, in terms of both communication transcripts and communication structures, in online asynchronous discussion.
Abstract: An asynchronous discussion forum can provide a platform for online learners to communicate with one another easily, without the constraints of place and time. This study explores the analysis process of online asynchronous discussion. We focus upon content analysis and social network analysis, the techniques most often used to measure online discussion in formal educational settings. In addition, Soller's model for content analysis was developed and employed to qualitatively analyze the online discussion. We also discuss the use of network indicators from social network analysis to assess the level of participation and the communication structure throughout the online discussion. Adjacency matrices, graph theory and network analysis techniques were applied to quantitatively define the interaction networks among students. The findings showed that these methods provide a more meaningful analysis of students' interaction, in terms of both communication transcripts and communication structures, in online asynchronous discussion.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This method selects variables using a feature clustering strategy, using a combination of supervised and unsupervised feature distance measure, which is based on Conditional Mutual Information and Conditional Entropy.
Abstract: In this contribution a feature selection method in semi-supervised problems is proposed. This method selects variables using a feature clustering strategy, using a combination of supervised and unsupervised feature distance measure, which is based on Conditional Mutual Information and Conditional Entropy. Real databases were analyzed with different ratios between labelled and unlabelled samples in the training set, showing the satisfactory behaviour of the proposed approach.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: A new aggregation operator that uses the probability and the weighted average in the same formulation, called the fuzzy probabilistic weighted average (FPWA) operator, is presented and applied to a business decision-making problem about the selection of monetary policies.
Abstract: We present a new aggregation operator that uses the probability and the weighted average in the same formulation. Moreover, we consider a situation where the information is uncertain and can be represented with fuzzy numbers. We call this new aggregation operator the fuzzy probabilistic weighted average (FPWA) operator. We study some of its main properties. We also study its applicability and we focus on a business decision making problem about the selection of monetary policies.
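The FPWA idea, unifying a probability vector and a weight vector in one aggregation over fuzzy arguments, can be sketched with triangular fuzzy numbers and a convex combination lam*p_i + (1 - lam)*w_i of the two importance vectors. This is a simplification for illustration; the operator defined in the paper may combine or order the arguments differently.

```python
def fpwa(fuzzy_vals, probs, weights, lam=0.5):
    """Sketch of a fuzzy probabilistic weighted average: each argument is a
    triangular fuzzy number (low, mode, high); its overall importance blends
    an objective probability p_i and a subjective weight w_i."""
    coeffs = [lam * p + (1 - lam) * w for p, w in zip(probs, weights)]
    return tuple(sum(c * v[i] for c, v in zip(coeffs, fuzzy_vals))
                 for i in range(3))

# Payoff of a monetary policy under three scenarios, each payoff a
# triangular fuzzy number (pessimistic, likely, optimistic) - all made up.
payoffs = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (0.0, 1.0, 2.0)]
probs   = [0.5, 0.3, 0.2]    # scenario probabilities
weights = [0.2, 0.6, 0.2]    # decision maker's subjective importances
print(fpwa(payoffs, probs, weights, lam=0.5))  # approximately (2.15, 3.15, 4.15)
```

With lam = 1 the operator reduces to a probabilistic average, and with lam = 0 to a plain weighted average, which is the unification the abstract highlights.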

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This new methodology is tested over a real problem of wind speed forecasting, in which it is shown that the method is able to improve the performance of previous MLPs, obtaining an interpretable model of final regression for each turbine in the wind park.
Abstract: This paper proposes a neural network model for wind speed prediction, a very important task in wind parks management. Currently, several physical-statistical and artificial intelligence (AI) wind speed prediction models are used to this end. A recently proposed hybrid model is based on hybridizations of global and mesoscale forecasting systems, with a final downscaling step using a multilayer perceptron (MLP). In this paper, we test an alternative neural model for this final step of downscaling, in which projection hyperbolic tangent units (HTUs) are used within feed forward neural networks. The architecture, weights and node typology of the HTU-based network are learnt using a hybrid evolutionary programming algorithm. This new methodology is tested over a real problem of wind speed forecasting, in which we show that our method is able to improve the performance of previous MLPs, obtaining an interpretable model of final regression for each turbine in the wind park.

Proceedings ArticleDOI
Hisashi Handa
30 Nov 2009
TL;DR: EDA-RL is extended to multi-objective reinforcement learning problems, where reward is given by several criteria; the proposed method is able to acquire various strategies in a single run.
Abstract: EDA-RL, Estimation of Distribution Algorithms for Reinforcement Learning problems, was recently proposed by us. EDA-RL improves policies by the EDA scheme: first, select better episodes; secondly, estimate probabilistic models, i.e., policies; and finally, interact with the environment to generate new episodes. In this paper, EDA-RL is extended to multi-objective reinforcement learning problems, where reward is given by several criteria. By incorporating notions from evolutionary multi-objective optimization, the proposed method is able to acquire various strategies in a single run.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: In this article, the authors compare systematic, evolutionary and order heuristics used to suppress the drawbacks of k-means: the need to choose the number of clusters, k, and the sensitivity to the initial prototypes' position.
Abstract: One of the most influential algorithms in data mining, k-means, is broadly used in practical tasks for its simplicity, computational efficiency and effectiveness in high-dimensional problems. However, k-means has two major drawbacks: the need to choose the number of clusters, k, and the sensitivity to the initial prototypes' position. In this work, systematic, evolutionary and order heuristics used to suppress these drawbacks are compared. 27 variants of 4 algorithmic approaches are used to partition 324 synthetic data sets, and the obtained results are compared.
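The second drawback, sensitivity to the initial prototypes' position, is commonly suppressed by seeding heuristics; k-means++ is a standard example (shown here for illustration, not necessarily among the 27 variants the paper compares). It picks the first centre at random, then each further centre with probability proportional to the squared distance from the nearest centre already chosen.

```python
import random

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding for 2-D points: spread the initial prototypes out
    by sampling each new centre proportionally to its squared distance
    from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centres)
              for p in points]
        r = rng.random() * sum(d2)          # roulette-wheel over squared distances
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centres.append(p)
                break
    return centres

rng = random.Random(3)
# Two well-separated blobs: seeding almost surely picks one centre per blob.
blob_a = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
centres = kmeans_pp_init(blob_a + blob_b, 2, rng)
print(centres)  # one centre near each blob
```

Uniform random seeding, by contrast, places both centres in the same blob roughly half the time, which is exactly the initialisation sensitivity the abstract describes.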

Proceedings ArticleDOI
30 Nov 2009
TL;DR: Some new fuzzy-rough set-based approaches to unsupervised feature selection are proposed, which result in a significant reduction in dimensionality whilst retaining the semantics of the data.
Abstract: For supervised learning, feature selection algorithms attempt to maximise a given function of predictive accuracy. This function usually considers the ability of feature vectors to reflect decision class labels. It is therefore intuitive to retain only those features that are related to or lead to these decision classes. However, in unsupervised learning, decision class labels are not provided, which poses questions such as: which features should be retained? And why not use all of the information? The problem is that not all features are important: some may be redundant, and others may be irrelevant and noisy. In this paper, some new fuzzy-rough set-based approaches to unsupervised feature selection are proposed. These approaches require no thresholding or domain information, and result in a significant reduction in dimensionality whilst retaining the semantics of the data.