
Showing papers by "Daniel Rodriguez published in 2018"


Journal ArticleDOI
TL;DR: DiReliefF, as discussed by the authors, is a completely redesigned distributed version of the ReliefF feature selection algorithm, built on the Spark cluster computing model; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy.
Abstract: Feature selection (FS) is a key research area in the machine learning and data mining fields; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy. However, traditional algorithms designed for executing on a single machine lack scalability to deal with the increasing amount of data that has become available in the current Big Data era. ReliefF is one of the most important algorithms successfully implemented in many FS applications. In this paper, we present a completely redesigned distributed version of the popular ReliefF algorithm based on the novel Spark cluster computing model, which we have called DiReliefF. The effectiveness of our proposal is tested on four publicly available datasets, all of them with a large number of instances and two of them also with a large number of features. Subsets of these datasets were also used to compare the results to a non-distributed implementation of the algorithm. The results show that the non-distributed implementation is unable to handle such large volumes of data without specialized hardware, while our design can process them in a scalable way with much better processing times and memory usage.
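For context on what the underlying algorithm computes, below is a minimal single-machine sketch of the ReliefF weight update for numeric features scaled to [0, 1] (Manhattan distance); the function name and parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def relieff(X, y, m=100, k=10, seed=0):
    """Minimal ReliefF sketch for numeric features scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    m = min(m, n)
    w = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        r, cls = X[i], y[i]
        dist = np.abs(X - r).sum(axis=1)   # Manhattan distance to every instance
        dist[i] = np.inf                   # never select the instance itself
        for c in classes:
            idx = np.where(y == c)[0]
            nearest = idx[np.argsort(dist[idx])[:k]]
            diff = np.abs(X[nearest] - r).mean(axis=0)
            if c == cls:                   # near hits pull the weight down
                w -= diff / m
            else:                          # near misses push it up, prior-weighted
                w += prior[c] / (1.0 - prior[cls]) * diff / m
    return w                               # higher weight = more relevant feature
```

The nested loop over sampled instances and classes is exactly the part that stops scaling on a single machine as the number of instances grows, which motivates the distributed redesign.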

49 citations


Journal ArticleDOI
TL;DR: Four classification algorithms widely used in remote sensing, namely Random Forest, Support Vector Machines (SVM), Neural Networks and a well-known decision tree algorithm, are explored for classifying burned areas at a global scale through a data mining methodology using 2008 MODIS data.
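As an illustration of this kind of comparison, a hedged scikit-learn sketch follows; the synthetic data stands in for per-pixel MODIS-derived features, and all names and settings are assumptions rather than the paper's actual experimental setup.

```python
# Sketch of comparing the four classifier families; synthetic data stands in
# for labelled MODIS pixels, so results are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for labelled pixels: 7 features (e.g., reflectance bands),
# binary burned/unburned label.
X, y = make_classification(n_samples=1000, n_features=7, random_state=0)

classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000,
                                    random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```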

40 citations


Journal ArticleDOI
TL;DR: This paper presents DiReliefF, a completely redesigned distributed version of the popular ReliefF algorithm built on the Spark cluster computing model, which can process large volumes of data in a scalable way with much better processing times and memory usage.
Abstract: Feature selection (FS) is a key research area in the machine learning and data mining fields; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy. However, traditional algorithms designed for executing on a single machine lack scalability to deal with the increasing amount of data that has become available in the current Big Data era. ReliefF is one of the most important algorithms successfully implemented in many FS applications. In this paper, we present a completely redesigned distributed version of the popular ReliefF algorithm based on the novel Spark cluster computing model, which we have called DiReliefF. Spark is growing in popularity due to its much faster processing times compared with Hadoop's MapReduce implementation. The effectiveness of our proposal is tested on four publicly available datasets, all of them with a large number of instances and two of them also with a large number of features. Subsets of these datasets were also used to compare the results to a non-distributed implementation of the algorithm. The results show that the non-distributed implementation is unable to handle such large volumes of data without specialized hardware, while our design can process them in a scalable way with much better processing times and memory usage.
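To make the distribution pattern concrete, here is a hedged PySpark sketch of one plausible way to parallelize the neighbour search: the sampled instances are broadcast, each partition returns its local k-nearest candidates per sample and class, and the driver merges them. This is an assumption-laden illustration, not the paper's actual DiReliefF design.

```python
# Hedged PySpark sketch: broadcast the sampled instances, compute per-partition
# k-nearest candidates, then merge them globally. Not the paper's exact design.
import heapq
import numpy as np

def local_candidates(partition, samples, k):
    """Yield, per (sample, class), this partition's k nearest candidates."""
    rows = list(partition)                      # [(feature_vector, label), ...]
    for s_id, (r, _) in enumerate(samples):
        by_class = {}
        for x, y in rows:
            d = float(np.abs(x - r).sum())      # Manhattan distance
            by_class.setdefault(y, []).append((d, x.tolist()))
        for y, cand in by_class.items():
            yield ((s_id, y), heapq.nsmallest(k, cand))

def global_neighbours(sc, data, samples, k=10):
    """data: RDD of (np.ndarray, label); samples: list of (np.ndarray, label)."""
    b = sc.broadcast(samples)
    return (data
            .mapPartitions(lambda p: local_candidates(p, b.value, k))
            .reduceByKey(lambda a, c: heapq.nsmallest(k, a + c))
            .collectAsMap())    # driver: global k nearest per (sample, class)
```

The driver can then apply the standard ReliefF weight update to the merged neighbours; avoiding a full pairwise distance matrix on one machine is where the memory and time savings over a single-machine run would come from.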

37 citations


Journal ArticleDOI
TL;DR: Bayesian network classifiers, learnt with two state-of-the-art methodologies from data labeled by a crowd of annotators, are used to predict the category (impact) of reported software defects.
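For orientation, the sketch below shows the simplest possible baseline for this setting: aggregating the crowd's labels by majority vote and fitting a naive Bayes classifier (the simplest Bayesian network classifier). The toy data and the two impact labels are invented for illustration; the paper's actual learning methodologies model annotator reliability rather than using a plain vote.

```python
# Baseline sketch: majority-vote label aggregation plus a naive Bayes
# classifier; toy data and labels are invented for illustration.
from collections import Counter

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def majority_vote(annotations):
    """annotations: one list of crowd labels per defect report."""
    return [Counter(labels).most_common(1)[0][0] for labels in annotations]

# Toy term-count features for four defect reports.
X = np.array([[3, 0, 1], [0, 2, 4], [1, 1, 0], [0, 3, 2]])
annotations = [
    ["capability", "capability", "security"],
    ["security", "security", "capability"],
    ["capability", "capability", "capability"],
    ["security", "capability", "security"],
]
y = majority_vote(annotations)             # one consensus label per report
clf = MultinomialNB().fit(X, y)
print(clf.predict(np.array([[2, 0, 1]])))  # predicted impact for a new report
```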

26 citations


Journal ArticleDOI
01 Aug 2018
TL;DR: Two well-known Multi-Objective Evolutionary Algorithms, namely NSGA-II and SPEA2, are applied to obtain a set of optimal solutions for the KPIs associated with delivering process efficiency as a CSF.
Abstract: Today's IT systems and IT processes must be ready to handle change in an efficient and responsive manner to allow businesses to both evolve and adapt to a changing world. In this paper we describe an approach that uses simulation-based multi-objective optimization to select optimal ITIL change management process strategies that help IT managers achieve process efficiency as a Critical Success Factor (CSF). A multi-method simulation model, based on the agent-based and discrete-event simulation paradigms, has been built to simulate the whole process lifecycle, from change initiation until closure. As with most engineering problems, assuring efficient delivery of the change management process requires simultaneously optimizing the corresponding Key Performance Indicators (KPIs) into which the process-efficiency CSF can be broken down. In this paper, we show the results of applying two well-known Multi-Objective Evolutionary Algorithms, namely NSGA-II and SPEA2, to obtain a set of optimal solutions for the KPIs associated with delivering process efficiency as a CSF. We also compare the results obtained with the output of the single-objective optimization algorithm provided by the simulation tool. The experimental work shows how the approach can provide IT managers with a wide range of high-quality solutions to support their decision-making towards CSF achievement.
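At the heart of both NSGA-II and SPEA2 is a Pareto-dominance test over the simulated KPI values; a minimal sketch follows, in which the two KPI columns (mean time to close a change, fraction of failed changes) are illustrative assumptions rather than the paper's actual indicators.

```python
# Minimal Pareto-dominance sketch; KPI columns are illustrative assumptions
# (mean time to close a change, fraction of failed changes), both minimized.
import numpy as np

def dominates(a, b):
    """True if a is no worse than b on every KPI and strictly better on one."""
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_front(scores):
    """Indices of the non-dominated candidate strategies."""
    return [i for i, a in enumerate(scores)
            if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)]

# Each row: KPI scores of one candidate change-management strategy,
# as returned by the simulation model.
scores = np.array([[12.0, 0.08],
                   [10.5, 0.12],
                   [15.0, 0.05],
                   [12.5, 0.09]])   # dominated by the first candidate
print(pareto_front(scores))        # -> [0, 1, 2]
```

An evolutionary algorithm such as NSGA-II then evolves the candidate strategies, re-running the simulation to score offspring and keeping the non-dominated set as its approximation of the Pareto front.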

19 citations


Journal ArticleDOI
01 May 2018
TL;DR: The aim of this work is to design and build, methodically and through ontological engineering, the ON-SMMILE model, to be used as support for future work closely linked to the supervision of students' learning, such as a competence-based recommender system.
Abstract: Currently, many educational researchers focus on extracting information about learning progress in order to properly assist students. We present ON-SMMILE, a student-centered and flexible student model represented as an ontology network that combines information related to (i) students and their knowledge state, (ii) assessments that rely on rubrics and different types of objectives, (iii) units of learning, and (iv) information resources previously employed as support for the student model in intelligent virtual environments for training/instruction, and extended here. The aim of this work is to design and build, methodically and through ontological engineering, the ON-SMMILE model, to be used as support for future work closely linked to the supervision of students' learning, such as a competence-based recommender system. For this purpose, our model is designed as a set of ontological resources that have been extended, standardized, interrelated and adapted for use in multiple learning environments. In this paper, we also analyze the available instructional design approaches that can be added to the ontology network to build the proposed model. As a case study, a chemical experiment in a virtual environment and its instantiation are described in terms of ON-SMMILE.
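To give a flavour of what instantiating such an ontology network looks like in practice, here is a small rdflib sketch; the namespace and all class and property names are invented for illustration and are not the actual ON-SMMILE vocabulary.

```python
# Hypothetical fragment of an ontology network instantiation with rdflib;
# the namespace and class/property names are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF

ONS = Namespace("http://example.org/on-smmile#")
g = Graph()
g.bind("ons", ONS)

# A student, a unit of learning, and a rubric-based assessment linking them.
g.add((ONS.student1, RDF.type, ONS.Student))
g.add((ONS.uol1, RDF.type, ONS.UnitOfLearning))
g.add((ONS.assessment1, RDF.type, ONS.Assessment))
g.add((ONS.assessment1, ONS.assesses, ONS.student1))
g.add((ONS.assessment1, ONS.coversUnit, ONS.uol1))
g.add((ONS.assessment1, ONS.rubricScore, Literal(0.75)))

print(g.serialize(format="turtle"))   # inspect the resulting triples
```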

17 citations


Journal ArticleDOI
TL;DR: Debug awareness relaxes the traditional assumptions of SRGMs, in particular the very unrealistic assumption of immediate repair of detected faults, and incorporates the bug assignment activity; robustness provides solutions that remain valid in spite of a degree of uncertainty in the input parameters.
Abstract: Testing resource allocation is the problem of planning the assignment of resources to the testing activities of software components so as to achieve a target goal under given constraints. Existing methods build on software reliability growth models (SRGMs), aiming at maximizing reliability given time/cost constraints, or at minimizing cost given quality/time constraints. We formulate it as a multiobjective, debug-aware and robust optimization problem under data uncertainty, advancing the state of the art in the following ways. Multiobjective optimization produces a set of solutions, allowing alternative tradeoffs among reliability, cost, and release time to be evaluated. Debug awareness relaxes the traditional assumptions of SRGMs, in particular the very unrealistic assumption of immediate repair of detected faults, and incorporates the bug assignment activity. Robustness provides solutions that remain valid in spite of a degree of uncertainty in the input parameters. We show results with a real-world case study.
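As a worked illustration of the reliability/cost/time tradeoff that such formulations optimize, the sketch below evaluates the classic Goel-Okumoto SRGM, m(t) = a(1 - e^(-bt)); the parameter values and the linear cost model are assumptions for illustration, and a debug-aware model would additionally delay repairs behind detections.

```python
# Worked sketch of the reliability/cost/time tradeoff behind SRGM-based
# allocation, using the Goel-Okumoto mean value function m(t) = a(1 - e^-bt);
# parameters and the linear cost model are illustrative assumptions.
import numpy as np

def expected_faults_detected(t, a=100.0, b=0.05):
    """Expected number of faults found by test time t."""
    return a * (1.0 - np.exp(-b * t))

def residual_faults(t, a=100.0, b=0.05):
    """Expected faults still latent at time t (a reliability proxy)."""
    return a - expected_faults_detected(t, a, b)

for t in (10, 30, 60, 120):
    cost = 2.0 * t                      # assumed cost proportional to test time
    print(f"t={t:4d}  detected={expected_faults_detected(t):6.1f}  "
          f"residual={residual_faults(t):5.1f}  cost={cost:6.1f}")
```

Each testing budget t is one candidate solution; a multiobjective optimizer searches such candidates (per component) for non-dominated combinations of residual faults, cost, and release time.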

16 citations


Journal ArticleDOI
TL;DR: A new Systematic Literature Review (SLR) on competence-based recommender systems is conducted, analysing them in relation to the nature and assessment of competences and other key factors that enable more flexible and exhaustive recommendations.
Abstract: Competence-based learning is increasingly widespread in many institutions since it provides flexibility, facilitates self-learning and brings the academic and professional worlds closer together

15 citations


Journal ArticleDOI
TL;DR: To address the defect classification problem, two sets of defect reports were collected from public issue tracking systems in two different real domains and categorized, by a set of annotators of unknown reliability, according to their impact using IBM's Orthogonal Defect Classification taxonomy.

4 citations