
Showing papers by "Daniel Rodriguez published in 2018"


Journal ArticleDOI
TL;DR: DiReliefF, as discussed by the authors, is a completely redesigned distributed version of the ReliefF feature selection algorithm, built on the Spark cluster computing model; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy.
Abstract: Feature selection (FS) is a key research area in the machine learning and data mining fields; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy. However, traditional algorithms designed for executing on a single machine lack scalability to deal with the increasing amount of data that has become available in the current Big Data era. ReliefF is one of the most important algorithms successfully implemented in many FS applications. In this paper, we present a completely redesigned distributed version of the popular ReliefF algorithm based on the novel Spark cluster computing model, which we have called DiReliefF. The effectiveness of our proposal is tested on four publicly available datasets, all of them with a large number of instances and two of them also with a large number of features. Subsets of these datasets were also used to compare the results to a non-distributed implementation of the algorithm. The results show that the non-distributed implementation is unable to handle such large volumes of data without specialized hardware, while our design can process them in a scalable way with much better processing times and memory usage.
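For context on what the underlying algorithm computes, below is a minimal single-machine sketch of the ReliefF weight update for numeric features scaled to [0, 1] (Manhattan distance); the function name and parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def relieff(X, y, m=100, k=10, seed=0):
    """Minimal ReliefF sketch for numeric features scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    m = min(m, n)
    w = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        r, cls = X[i], y[i]
        dist = np.abs(X - r).sum(axis=1)   # Manhattan distance to every instance
        dist[i] = np.inf                   # never select the instance itself
        for c in classes:
            idx = np.where(y == c)[0]
            nearest = idx[np.argsort(dist[idx])[:k]]
            diff = np.abs(X[nearest] - r).mean(axis=0)
            if c == cls:                   # near hits pull the weight down
                w -= diff / m
            else:                          # near misses push it up, prior-weighted
                w += prior[c] / (1.0 - prior[cls]) * diff / m
    return w                               # higher weight = more relevant feature
```

The nested loop over sampled instances and classes is exactly the part that stops scaling on a single machine as the number of instances grows, which motivates the distributed redesign.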

49 citations


Journal ArticleDOI
TL;DR: Four classification algorithms widely used in remote sensing, namely Random Forest, Support Vector Machines (SVM), Neural Networks and a well-known decision tree algorithm, are explored for classifying burned areas at a global scale through a data mining methodology using 2008 MODIS data.
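As an illustration of this kind of comparison, a hedged scikit-learn sketch follows; the synthetic data stands in for per-pixel MODIS-derived features, and all names and settings are assumptions rather than the paper's actual experimental setup.

```python
# Sketch of comparing the four classifier families; synthetic data stands in
# for labelled MODIS pixels, so results are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for labelled pixels: 7 features (e.g., reflectance bands),
# binary burned/unburned label.
X, y = make_classification(n_samples=1000, n_features=7, random_state=0)

classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000,
                                    random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```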

40 citations


Journal ArticleDOI
TL;DR: This paper presents DiReliefF, a completely redesigned distributed version of the popular ReliefF algorithm built on the Spark cluster computing model, which can process large volumes of data in a scalable way with much better processing times and memory usage.
Abstract: Feature selection (FS) is a key research area in the machine learning and data mining fields; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy. However, traditional algorithms designed for executing on a single machine lack scalability to deal with the increasing amount of data that has become available in the current Big Data era. ReliefF is one of the most important algorithms successfully implemented in many FS applications. In this paper, we present a completely redesigned distributed version of the popular ReliefF algorithm based on the novel Spark cluster computing model, which we have called DiReliefF. Spark is growing in popularity due to its much faster processing times compared with Hadoop's MapReduce implementation. The effectiveness of our proposal is tested on four publicly available datasets, all of them with a large number of instances and two of them also with a large number of features. Subsets of these datasets were also used to compare the results to a non-distributed implementation of the algorithm. The results show that the non-distributed implementation is unable to handle such large volumes of data without specialized hardware, while our design can process them in a scalable way with much better processing times and memory usage.
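To make the distribution pattern concrete, here is a hedged PySpark sketch of one plausible way to parallelize the neighbour search: the sampled instances are broadcast, each partition returns its local k-nearest candidates per sample and class, and the driver merges them. This is an assumption-laden illustration, not the paper's actual DiReliefF design.

```python
# Hedged PySpark sketch: broadcast the sampled instances, compute per-partition
# k-nearest candidates, then merge them globally. Not the paper's exact design.
import heapq
import numpy as np

def local_candidates(partition, samples, k):
    """Yield, per (sample, class), this partition's k nearest candidates."""
    rows = list(partition)                      # [(feature_vector, label), ...]
    for s_id, (r, _) in enumerate(samples):
        by_class = {}
        for x, y in rows:
            d = float(np.abs(x - r).sum())      # Manhattan distance
            by_class.setdefault(y, []).append((d, x.tolist()))
        for y, cand in by_class.items():
            yield ((s_id, y), heapq.nsmallest(k, cand))

def global_neighbours(sc, data, samples, k=10):
    """data: RDD of (np.ndarray, label); samples: list of (np.ndarray, label)."""
    b = sc.broadcast(samples)
    return (data
            .mapPartitions(lambda p: local_candidates(p, b.value, k))
            .reduceByKey(lambda a, c: heapq.nsmallest(k, a + c))
            .collectAsMap())    # driver: global k nearest per (sample, class)
```

The driver can then apply the standard ReliefF weight update to the merged neighbours; avoiding a full pairwise distance matrix on one machine is where the memory and time savings over a single-machine run would come from.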

37 citations


Journal ArticleDOI
TL;DR: Bayesian network classifiers, learnt with two state-of-the-art methodologies from data labeled by a crowd of annotators, are used to predict the category (impact) of reported software defects.
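For orientation, the sketch below shows the simplest possible baseline for this setting: aggregating the crowd's labels by majority vote and fitting a naive Bayes classifier (the simplest Bayesian network classifier). The toy data and the two impact labels are invented for illustration; the paper's actual learning methodologies model annotator reliability rather than using a plain vote.

```python
# Baseline sketch: majority-vote label aggregation plus a naive Bayes
# classifier; toy data and labels are invented for illustration.
from collections import Counter

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def majority_vote(annotations):
    """annotations: one list of crowd labels per defect report."""
    return [Counter(labels).most_common(1)[0][0] for labels in annotations]

# Toy term-count features for four defect reports.
X = np.array([[3, 0, 1], [0, 2, 4], [1, 1, 0], [0, 3, 2]])
annotations = [
    ["capability", "capability", "security"],
    ["security", "security", "capability"],
    ["capability", "capability", "capability"],
    ["security", "capability", "security"],
]
y = majority_vote(annotations)             # one consensus label per report
clf = MultinomialNB().fit(X, y)
print(clf.predict(np.array([[2, 0, 1]])))  # predicted impact for a new report
```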

26 citations


Journal ArticleDOI
01 Aug 2018
TL;DR: Two well-known Multi-Objective Evolutionary Algorithms, namely NSGA-II and SPEA2, are applied to obtain a set of optimal solutions for the KPIs associated with delivering process efficiency as a CSF.
Abstract: Today's IT systems and IT processes must be ready to handle change in an efficient and responsive manner to allow businesses to both evolve and adapt to a changing world. In this paper we describe an approach that uses simulation-based multi-objective optimization to select optimal ITIL change management process strategies that help IT managers achieve process efficiency as a Critical Success Factor (CSF). A multi-method simulation model, based on the agent-based and discrete-event simulation paradigms, has been built to simulate the whole process lifecycle, from change initiation until closure. As with most engineering problems, assuring efficient delivery of the change management process requires simultaneously optimizing the corresponding Key Performance Indicators (KPIs) into which the process-efficiency CSF can be broken down. In this paper, we show the results of applying two well-known Multi-Objective Evolutionary Algorithms, namely NSGA-II and SPEA2, to obtain a set of optimal solutions for the KPIs associated with delivering process efficiency as a CSF. We also compare the results obtained with the output of the single-objective optimization algorithm provided by the simulation tool. The experimental work shows how the approach can provide IT managers with a wide range of high-quality solutions to support their decision-making towards CSF achievement.
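At the heart of both NSGA-II and SPEA2 is a Pareto-dominance test over the simulated KPI values; a minimal sketch follows, in which the two KPI columns (mean time to close a change, fraction of failed changes) are illustrative assumptions rather than the paper's actual indicators.

```python
# Minimal Pareto-dominance sketch; KPI columns are illustrative assumptions
# (mean time to close a change, fraction of failed changes), both minimized.
import numpy as np

def dominates(a, b):
    """True if a is no worse than b on every KPI and strictly better on one."""
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_front(scores):
    """Indices of the non-dominated candidate strategies."""
    return [i for i, a in enumerate(scores)
            if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)]

# Each row: KPI scores of one candidate change-management strategy,
# as returned by the simulation model.
scores = np.array([[12.0, 0.08],
                   [10.5, 0.12],
                   [15.0, 0.05],
                   [12.5, 0.09]])   # dominated by the first candidate
print(pareto_front(scores))        # -> [0, 1, 2]
```

An evolutionary algorithm such as NSGA-II then evolves the candidate strategies, re-running the simulation to score offspring and keeping the non-dominated set as its approximation of the Pareto front.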

19 citations


Journal ArticleDOI
01 May 2018
TL;DR: The aim of this work is to design and build, methodically and through ontological engineering, the ON-SMMILE model, to be used as support for future work closely linked to the supervision of students' learning, such as a competence-based recommender system.
Abstract: Currently, many educational researchers focus on extracting information about learning progress in order to properly assist students. We present ON-SMMILE, a student-centered and flexible student model represented as an ontology network that combines information related to (i) students and their knowledge state, (ii) assessments that rely on rubrics and different types of objectives, (iii) units of learning, and (iv) information resources previously employed as support for the student model in intelligent virtual environments for training/instruction, and extended here. The aim of this work is to design and build, methodically and through ontological engineering, the ON-SMMILE model, to be used as support for future work closely linked to the supervision of students' learning, such as a competence-based recommender system. For this purpose, our model is designed as a set of ontological resources that have been extended, standardized, interrelated and adapted for use in multiple learning environments. In this paper, we also analyze the available instructional design approaches that can be added to the ontology network to build the proposed model. As a case study, a chemical experiment in a virtual environment and its instantiation are described in terms of ON-SMMILE.
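To give a flavour of what instantiating such an ontology network looks like in practice, here is a small rdflib sketch; the namespace and all class and property names are invented for illustration and are not the actual ON-SMMILE vocabulary.

```python
# Hypothetical fragment of an ontology network instantiation with rdflib;
# the namespace and class/property names are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF

ONS = Namespace("http://example.org/on-smmile#")
g = Graph()
g.bind("ons", ONS)

# A student, a unit of learning, and a rubric-based assessment linking them.
g.add((ONS.student1, RDF.type, ONS.Student))
g.add((ONS.uol1, RDF.type, ONS.UnitOfLearning))
g.add((ONS.assessment1, RDF.type, ONS.Assessment))
g.add((ONS.assessment1, ONS.assesses, ONS.student1))
g.add((ONS.assessment1, ONS.coversUnit, ONS.uol1))
g.add((ONS.assessment1, ONS.rubricScore, Literal(0.75)))

print(g.serialize(format="turtle"))   # inspect the resulting triples
```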

17 citations


Journal ArticleDOI
TL;DR: Debug awareness relaxes the traditional assumptions of SRGMs, in particular the very unrealistic assumption of immediate repair of detected faults, and incorporates the bug assignment activity; robustness provides solutions that remain valid in spite of a degree of uncertainty in the input parameters.
Abstract: Testing resource allocation is the problem of planning the assignment of resources to the testing activities of software components so as to achieve a target goal under given constraints. Existing methods build on software reliability growth models (SRGMs), aiming at maximizing reliability given time/cost constraints, or at minimizing cost given quality/time constraints. We formulate it as a multiobjective, debug-aware and robust optimization problem under data uncertainty, advancing the state of the art in the following ways. Multiobjective optimization produces a set of solutions, allowing alternative tradeoffs among reliability, cost, and release time to be evaluated. Debug awareness relaxes the traditional assumptions of SRGMs, in particular the very unrealistic assumption of immediate repair of detected faults, and incorporates the bug assignment activity. Robustness provides solutions that remain valid in spite of a degree of uncertainty in the input parameters. We show results with a real-world case study.
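As a worked illustration of the reliability/cost/time tradeoff that such formulations optimize, the sketch below evaluates the classic Goel-Okumoto SRGM, m(t) = a(1 - e^(-bt)); the parameter values and the linear cost model are assumptions for illustration, and a debug-aware model would additionally delay repairs behind detections.

```python
# Worked sketch of the reliability/cost/time tradeoff behind SRGM-based
# allocation, using the Goel-Okumoto mean value function m(t) = a(1 - e^-bt);
# parameters and the linear cost model are illustrative assumptions.
import numpy as np

def expected_faults_detected(t, a=100.0, b=0.05):
    """Expected number of faults found by test time t."""
    return a * (1.0 - np.exp(-b * t))

def residual_faults(t, a=100.0, b=0.05):
    """Expected faults still latent at time t (a reliability proxy)."""
    return a - expected_faults_detected(t, a, b)

for t in (10, 30, 60, 120):
    cost = 2.0 * t                      # assumed cost proportional to test time
    print(f"t={t:4d}  detected={expected_faults_detected(t):6.1f}  "
          f"residual={residual_faults(t):5.1f}  cost={cost:6.1f}")
```

Each testing budget t is one candidate solution; a multiobjective optimizer searches such candidates (per component) for non-dominated combinations of residual faults, cost, and release time.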

16 citations


Journal ArticleDOI
TL;DR: A new Systematic Literature Review (SLR) on competence-based recommender systems is conducted, analysing them in relation to the nature and assessment of competences and other key factors that enable more flexible and exhaustive recommendations.
Abstract: Competence-based learning is increasingly widespread in many institutions since it provides flexibility, facilitates self-learning and brings the academic and professional worlds closer together

15 citations


Journal ArticleDOI
TL;DR: To address the defect classification problem, two sets of defect reports were collected from public issue tracking systems in two different real domains and categorized, by a set of annotators of unknown reliability, according to their impact using IBM's Orthogonal Defect Classification taxonomy.

4 citations