Author

Bibiano Rivas

Bio: Bibiano Rivas is an academic researcher from the University of Castilla–La Mancha. The author has contributed to research in the topics of Data quality and Data warehouse. The author has an h-index of 4 and has co-authored 5 publications receiving 161 citations.

Papers
Journal ArticleDOI
TL;DR: The main conclusion is that the model can be used as an appropriate way to obtain the quality-in-use levels of the input data of a Big Data analysis, and that those levels can be understood as indicators of the trustworthiness and soundness of the results of that analysis.

165 citations
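The entry above describes a model that scores the quality in use of input data before analysis. As a minimal sketch of that general idea, the following Python computes two hypothetical dimension scores (completeness and a consistency rule) and aggregates them into a coarse quality-in-use level; the dimension names, weights, and thresholds are assumptions made for this example, not the paper's actual model.

```python
# Illustrative sketch only: hypothetical dimension scores aggregated into a
# coarse quality-in-use level that can be read as a trust indicator for the
# results of a Big Data analysis.

def completeness(records, required_fields):
    """Fraction of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    ok = sum(all(r.get(f) not in (None, "") for f in required_fields) for r in records)
    return ok / len(records)

def consistency(records, rule):
    """Fraction of records satisfying a domain rule (a predicate over one record)."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)

def quality_in_use_level(records, required_fields, rule, weights=(0.5, 0.5)):
    """Weighted aggregate of the two dimension scores, mapped to a coarse level."""
    score = (weights[0] * completeness(records, required_fields)
             + weights[1] * consistency(records, rule))
    return ("high" if score >= 0.9 else "medium" if score >= 0.7 else "low"), score

data = [
    {"id": 1, "age": 34, "country": "ES"},
    {"id": 2, "age": None, "country": "ES"},    # incomplete record
    {"id": 3, "age": 150, "country": "PT"},     # violates the age rule
]
level, score = quality_in_use_level(
    data,
    required_fields=("id", "age", "country"),
    rule=lambda r: r.get("age") is not None and 0 <= r["age"] <= 120,
)
print(level, round(score, 2))   # -> low 0.5
```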

Journal ArticleDOI
TL;DR: A service architecture for Master Data Exchange that supports the requirements stated by the different parts of the standard, including a data dictionary of master data terms, a communication protocol, an API to manage master data messages, and MapReduce algorithms to measure data quality.

15 citations
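The entry above mentions a data dictionary of master data terms and an API for master data messages. The following sketch illustrates, under assumptions, what validating an incoming message against such a dictionary could look like; the dictionary layout, term names, and type-based check are invented for this example and are not the paper's actual interface.

```python
# Illustrative sketch only: validating a master data message against a small
# data dictionary of terms. The dictionary and the type-based check are
# assumptions made for this example.

# Hypothetical data dictionary: term -> expected value type
DATA_DICTIONARY = {"product_id": str, "gtin": str, "net_weight_kg": float}

def validate_message(message):
    """Return (is_valid, issues): every field must be a known term with the right type."""
    issues = []
    for term, value in message.items():
        expected = DATA_DICTIONARY.get(term)
        if expected is None:
            issues.append(f"unknown term: {term}")
        elif not isinstance(value, expected):
            issues.append(f"{term}: expected {expected.__name__}, got {type(value).__name__}")
    return (not issues), issues

print(validate_message({"product_id": "P-1", "gtin": "0123", "net_weight_kg": "1.2"}))
# -> (False, ['net_weight_kg: expected float, got str'])
```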

Journal ArticleDOI
TL;DR: This work was supported by the PERTEST (TIN2013-46928-C3-1-R), TESTEAMOS (TIN2016-76956-C3-1-R), and SEQUOIA (TIN2015-63502-C3-1-R) projects.
Abstract: This work was supported in part by PERTEST (TIN2013-46928-C3-1-R), a project funded by the Spanish Ministry of Science and Technology; TESTEAMOS (TIN2016-76956-C3-1-R) and SEQUOIA (TIN2015-63502-C3-1-R), projects funded by the Spanish Ministry of Economy and Competitiveness; GRUPIN14-007, funded by the Principality of Asturias (Spain); the CIEN LPS-BIGGER project; and ERDF funds.

6 citations

Book ChapterDOI
19 Oct 2015
TL;DR: I8K, an academic reference implementation of the aforementioned standard parts (ISO/TS 8000:100-140), may be used for this objective; unfortunately, I8K was not designed to support the assessment of large Master Data volumes and does not reach the required efficiency in Big Data environments.
Abstract: During the execution of business processes involving various organizations, Master Data is usually shared and exchanged. It is necessary to keep appropriate levels of quality in these Master Data in order to prevent defects and failures in the business processes. One way to support decisions about the usage of data in business processes is to include information about the level of quality alongside the Master Data. ISO/TS 8000 parts 100 to 140 may support the provision of this kind of information in a usable manner. Specifically, I8K, a reference implementation from academic sources of the aforementioned standard parts (ISO/TS 8000:100-140), may be used for this objective. Regrettably, I8K was not designed to support the assessment of large Master Data volumes and does not reach the required efficiency in Big Data environments. This paper describes an extension of I8K that resolves these efficiency problems in Big Data projects.

5 citations
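The entry above extends I8K with MapReduce so that quality measurement scales to large Master Data volumes. As a hedged illustration of that style, the sketch below expresses a completeness metric as map and reduce steps over records; the field names and the metric itself are assumptions for this example, and the actual I8K extension is not shown.

```python
# Illustrative sketch only: a completeness metric expressed as map and reduce
# steps, showing how a quality measurement can be distributed over large
# Master Data volumes in the MapReduce style. Field names are assumptions.

from functools import reduce

REQUIRED = ("id", "name", "price")

def map_record(record):
    """Map step: emit a (checked, complete) pair for one record."""
    complete = all(record.get(f) not in (None, "") for f in REQUIRED)
    return (1, 1 if complete else 0)

def reduce_counts(a, b):
    """Reduce step: sum partial (checked, complete) counts."""
    return (a[0] + b[0], a[1] + b[1])

records = [
    {"id": "A", "name": "bolt", "price": 0.10},
    {"id": "B", "name": "", "price": 0.20},     # incomplete record
]
checked, complete = reduce(reduce_counts, map(map_record, records))
print(f"completeness = {complete}/{checked} = {complete / checked:.2f}")   # -> 1/2 = 0.50
```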

Proceedings ArticleDOI
01 Aug 2016
TL;DR: A testing technique is proposed that generates different infrastructure configurations for given test input data; the program is then executed under these configurations in order to reveal functional faults.
Abstract: Programs that process large volumes of data generally run on a distributed, parallel architecture, such as programs implemented in the MapReduce processing model. In these programs, developers can abstract away the infrastructure where the program will run and focus on functional issues. However, the infrastructure configuration and its state lead to different parallel executions of the program, some of which can produce functional faults that are hard to reveal. In general, the infrastructure that executes the program is not considered during testing, because tests usually contain little input data, so parallelization is not necessary. In this paper a testing technique is proposed that generates different infrastructure configurations for given test input data; the program is then executed under these configurations in order to reveal functional faults. The technique is automated by a test engine and applied to a case study. As a result, several infrastructure configurations are automatically generated and executed for a test case, revealing a functional fault that is then fixed by the developer.

4 citations
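The entry above proposes executing the same program under different infrastructure configurations to reveal configuration-dependent functional faults. The sketch below imitates that idea for a toy word-count job: the same input is partitioned among a varying number of hypothetical mappers and the outputs are compared, so any disagreement flags a fault. The configurations and the job are invented for this example and do not reproduce the paper's test engine.

```python
# Illustrative sketch only: the same toy word-count job is run under several
# hypothetical "infrastructure configurations" (different partitionings of the
# input among mappers) and the outputs are compared; any disagreement would
# signal a configuration-dependent functional fault.

from collections import Counter
from itertools import combinations

def run_job(partitions):
    """Map each partition to word counts, then reduce by summing the counters."""
    partial = [Counter(word for line in part for word in line.split())
               for part in partitions]
    return sum(partial, Counter())

lines = ["a b a", "b c", "a c c"]

# Hypothetical configurations: 1, 2, and 3 mappers over the same input.
configs = {
    "1 mapper":  [lines],
    "2 mappers": [lines[:1], lines[1:]],
    "3 mappers": [[line] for line in lines],
}

results = {name: run_job(parts) for name, parts in configs.items()}
for (n1, r1), (n2, r2) in combinations(results.items(), 2):
    assert r1 == r2, f"functional fault: {n1} and {n2} disagree"
print("all configurations agree:", dict(results["1 mapper"]))
```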


Cited by
Journal ArticleDOI
TL;DR: The authors present a state-of-the-art review offering a holistic view of Big Data challenges and the Big Data Analytics methods theorized, proposed, or employed by organizations, to help others understand this landscape with the objective of making robust investment decisions.

1,267 citations

Journal ArticleDOI
TL;DR: This article presents a comprehensive, well-informed examination and realistic analysis of deploying Big Data Analytics successfully in companies, together with a methodical analysis of its usage in applications such as agriculture, healthcare, cyber security, and smart cities.
Abstract: Big Data Analytics (BDA) is increasingly becoming a trending practice that generates an enormous amount of data and provides new opportunities that are helpful in relevant decision-making. Developments in Big Data Analytics provide a new paradigm and solutions for big data sources, storage, and advanced analytics. BDA provides a nuanced view of big data development and insight into how it can truly create value for firms and customers. This article presents a comprehensive, well-informed examination and realistic analysis of deploying big data analytics successfully in companies. It provides an overview of the architecture of BDA, including six components, namely: (i) data generation, (ii) data acquisition, (iii) data storage, (iv) advanced data analytics, (v) data visualization, and (vi) decision-making for value creation. The seven V characteristics of BDA, namely Volume, Velocity, Variety, Valence, Veracity, Variability, and Value, are explored, and the various big data analytics tools, techniques, and technologies are described. Furthermore, the article presents a methodical analysis of the usage of Big Data Analytics in various applications such as agriculture, healthcare, cyber security, and smart cities. It also highlights previous research, challenges, the current status, and future directions of big data analytics for various application platforms. This overview highlights three issues, namely (i) the concepts, characteristics, and processing paradigms of Big Data Analytics; (ii) the state-of-the-art framework for decision-making in BDA that helps companies gain insight into value creation; and (iii) the current challenges of Big Data Analytics as well as possible future directions.

274 citations
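The abstract above enumerates six BDA architecture components. Purely as an illustrative sketch, the snippet below models them as a linear pipeline of placeholder stages; the stage functions and data are invented for this example, since the survey defines no code.

```python
# Illustrative sketch only: the six BDA components enumerated above, modeled
# as a linear pipeline of placeholder stages. The stage functions and data are
# invented for this example; the survey defines no code.

def generate():   return [{"sensor": "s1", "value": 21.5}]              # (i) data generation
def acquire(d):   return [r for r in d if "value" in r]                 # (ii) data acquisition
def store(d):     return list(d)                                        # (iii) data storage
def analyze(d):   return {"mean": sum(r["value"] for r in d) / len(d)}  # (iv) advanced analytics
def visualize(m): print("report:", m); return m                         # (v) data visualization
def decide(m):    return "act" if m["mean"] > 20 else "wait"            # (vi) decision-making

print(decide(visualize(analyze(store(acquire(generate()))))))           # -> report, then "act"
```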

Journal ArticleDOI
TL;DR: Using imaging, genetic, and healthcare data, the authors provide examples of processing heterogeneous datasets with distributed cloud services, automated and semi-automated classification techniques, and open-science protocols.
Abstract: Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic, and healthcare data, we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realizing the huge potential of big data, reaping the expected information benefits and building lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy, and its hallmark will be 'team science'.

126 citations

Proceedings ArticleDOI
18 Nov 2015
TL;DR: The state of the art in the classification of poor-quality data is surveyed, including the definition of dimensions and specific data problems; frequently used dimensions are identified, and data quality problems are mapped to those dimensions.
Abstract: Data is part of our everyday life and an essential asset in numerous businesses and organizations. The quality of the data, i.e., the degree to which the data characteristics fulfill requirements, can have a tremendous impact on the businesses themselves, the companies, or even human lives. In fact, research and industry reports show that huge amounts of capital are spent to improve the quality of the data being used in many systems, sometimes even just to understand the quality of the information in use. Considering the variety of dimensions, characteristics, business views, or simply the specificities of the systems being evaluated, understanding how to measure data quality can be an extremely difficult task. In this paper we survey the state of the art in the classification of poor-quality data, including the definition of dimensions and specific data problems; we identify frequently used dimensions and map data quality problems to them. The huge variety of terms and definitions found suggests that further standardization efforts are required. Also, data quality research on Big Data appears to be in its initial steps, leaving room for further research.

107 citations
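The survey above maps data quality problems to quality dimensions. The sketch below shows one possible shape for such a mapping as a plain lookup table; the entries are common textbook examples chosen for illustration and are not the survey's actual classification.

```python
# Illustrative sketch only: one possible shape for a problem-to-dimension
# mapping. The entries are common textbook examples, not the survey's
# actual classification.

PROBLEM_TO_DIMENSIONS = {
    "missing value":           ["completeness"],
    "outdated record":         ["timeliness", "accuracy"],
    "duplicate record":        ["uniqueness", "consistency"],
    "violated integrity rule": ["consistency"],
    "wrong value":             ["accuracy"],
}

def dimensions_for(problems):
    """Collect the distinct dimensions affected by a set of observed problems."""
    return sorted({d for p in problems for d in PROBLEM_TO_DIMENSIONS.get(p, [])})

print(dimensions_for(["missing value", "duplicate record"]))
# -> ['completeness', 'consistency', 'uniqueness']
```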

Journal ArticleDOI
TL;DR: This paper casts a wide net to understand and consolidate from the literature the potential factors that can influence the effective use of big data, so that they may be further studied.

90 citations