
Showing papers on "Data management" published in 2021


Journal ArticleDOI
TL;DR: This survey performs a comprehensive study of data collection from a data management point of view, providing a research landscape of these operations and guidelines on which technique to use when, and identifying interesting research challenges.
Abstract: Data collection is a major bottleneck in machine learning and an active research topic in multiple communities. There are largely two reasons data collection has recently become a critical issue. First, as machine learning is becoming more widely-used, we are seeing new applications that do not necessarily have enough labeled data. Second, unlike traditional machine learning, deep learning techniques automatically generate features, which saves feature engineering costs, but in return may require larger amounts of labeled data. Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data. In this survey, we perform a comprehensive study of data collection from a data management point of view. Data collection largely consists of data acquisition, data labeling, and improvement of existing data or models. We provide a research landscape of these operations, provide guidelines on which technique to use when, and identify interesting research challenges. The integration of machine learning and data management for data collection is part of a larger trend of Big data and Artificial Intelligence (AI) integration and opens many opportunities for new research.

471 citations


Journal ArticleDOI
TL;DR: It is argued that if the project is goal-directed and process-driven, the process model view still largely holds; when data science projects become more exploratory, the paths a project can take become more varied, and a more flexible model is called for.
Abstract: CRISP-DM (CRoss-Industry Standard Process for Data Mining) has its origins in the second half of the nineties and is thus about two decades old. According to many surveys and user polls it is still the de facto standard for developing data mining and knowledge discovery projects. However, undoubtedly the field has moved on considerably in twenty years, with data science now the leading term being favoured over data mining. In this paper we investigate whether, and in what contexts, CRISP-DM is still fit for purpose for data science projects. We argue that if the project is goal-directed and process-driven the process model view still largely holds. On the other hand, when data science projects become more exploratory the paths that the project can take become more varied, and a more flexible model is called for. We suggest what the outlines of such a trajectory-based model might look like and how it can be used to categorise data science projects (goal-directed, exploratory or data management). We examine seven real-life exemplars where exploratory activities play an important role and compare them against 51 use cases extracted from the NIST Big Data Public Working Group. We anticipate this categorisation can help project planning in terms of time and cost characteristics.

120 citations


Journal ArticleDOI
TL;DR: In this article, the authors identify the technologies for controlling COVID-19 and future pandemics with massive data collection from users' mobile devices and discuss the important theoretical and practical implications of preserving user privacy while curbing COVID-19 infections in a global public health emergency.

115 citations


Journal ArticleDOI
TL;DR: A deep learning-embedded social Internet of Things (IoT) architecture is developed for social computing scenarios to guarantee reliable data management and overcome the preference ambiguity problem in social recommendation (SR).
Abstract: With the increasing demand of users for personalized social services, social recommendation (SR) has been an important concern in academia. However, current research on SR universally faces two main challenges. On the one hand, SR lacks the considerable ability of robust online data management. On the other hand, SR fails to take the ambiguity of preference feedback into consideration. To bridge these gaps, a deep learning-embedded social Internet of Things (IoT) is proposed for ambiguity-aware SR (SIoT-SR). Specifically, a social IoT architecture is developed for social computing scenarios to guarantee reliable data management. A deep learning-based graph neural network model that can be embedded into the SIoT architecture is proposed as the core algorithm to perform ambiguity-aware SR. This design not only provides proper online data sensing and management but also overcomes the preference ambiguity problem in SR. To evaluate the performance of the proposed SIoT-SR, two real-world datasets are selected to establish experimental scenarios. The method is assessed using three different metrics, selecting five typical methods as benchmarks. The experimental results show that the proposed SIoT-SR performs better than the benchmark methods by at least 10% and has good robustness.
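For intuition, the core building block of such a graph neural network can be sketched as a single graph-convolution step. This is a generic textbook layer, not the authors' SIoT-SR model; the normalization and activation choices are assumptions.

```python
import numpy as np

def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # One symmetrically normalized graph-convolution step:
    # H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)
    a_hat = adj + np.eye(adj.shape[0])                      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weight, 0.0)

# Toy social graph: 4 users, 3-dimensional preference features, 2 output dimensions.
adj = np.array([[0., 1., 1., 0.],
                [1., 0., 0., 1.],
                [1., 0., 0., 1.],
                [0., 1., 1., 0.]])
embeddings = gcn_layer(adj, np.random.rand(4, 3), np.random.rand(3, 2))
print(embeddings.shape)  # (4, 2)
```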

90 citations


Journal ArticleDOI
TL;DR: A data construction framework is designed based on functional requirements, which are analyzed according to the characteristics of manufacturing data, and cutting-tool wear prediction is taken as a case study to show the feasibility and effectiveness of the proposed data construction method.

71 citations


Journal ArticleDOI
TL;DR: This paper addresses the problems of data management and analytics for decision aid by proposing a new vision of the Digital Shadow (DS), which would be considered the core component of a future Digital Twin.

70 citations


Journal ArticleDOI
TL;DR: The Genome Warehouse (GWH) as discussed by the authors is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and sharing.

69 citations


Journal ArticleDOI
TL;DR: This paper provides a comprehensive state of the art of the different approaches to data lake design, focusing particularly on data lake architectures and metadata management, which are key issues in successful data lakes.
Abstract: Over the past two decades, we have witnessed an exponential increase of data production in the world. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are mainly characterized by volume, velocity, variety and veracity issues. Big data-related issues strongly challenge traditional data management and analysis systems. The concept of data lake was introduced to address them. A data lake is a large, raw data repository that stores and manages all company data bearing any format. However, the data lake concept remains ambiguous or fuzzy for many researchers and practitioners, who often confuse it with the Hadoop technology. Thus, we provide in this paper a comprehensive state of the art of the different approaches to data lake design. We particularly focus on data lake architectures and metadata management, which are key issues in successful data lakes. We also discuss the pros and cons of data lakes and their design alternatives.

68 citations


Journal ArticleDOI
TL;DR: A fuzzy-optimized data management (FDM) technique is introduced for classifying and improving the coalition of accumulated information based on semantics and constraints, to process complex data in a controlled time.
Abstract: Big data analytics and processing require complex architectures and sophisticated techniques for extracting useful information from the accumulated information. Visualizing the extracted data for real-time solutions is demanding in accordance with the semantics and the classification employed by the processing models. This article introduces a fuzzy-optimized data management (FDM) technique for classifying and improving the coalition of accumulated information based on semantics and constraints. The dependency of the information is classified on the basis of the relationships modeled between the data based on the attributes. This technique segregates the considered attributes based on similarity index boundaries to process complex data in a controlled time. The performance of the proposed FDM is analyzed using a real-time weather forecast dataset consisting of sensor data (observed) and image data (captured). With this dataset, the functions of FDM, such as input semantics analytics and classification based on similarity, are performed. The metrics of classification and processing time and similarity index are analyzed for varying data sizes, classification instances, and dataset records. The proposed FDM is found to achieve 36.28% less processing time for varying classification instances and a 12.57% higher similarity index.
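The article's internals are not reproduced here, but a minimal sketch of the similarity-boundary segregation step, under an assumed cosine similarity index and invented class boundaries (the full method adds fuzzy optimization on top), might look like this:

```python
import numpy as np

def similarity_index(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity as a stand-in similarity index.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def segregate(records: list, reference: np.ndarray,
              boundaries: tuple = (0.33, 0.66)) -> dict:
    # Bucket each record by its similarity to a reference profile so that
    # each bucket can be processed within a controlled time budget.
    buckets = {"low": [], "medium": [], "high": []}
    for rec in records:
        s = similarity_index(rec, reference)
        key = "low" if s < boundaries[0] else "medium" if s < boundaries[1] else "high"
        buckets[key].append(rec)
    return buckets

obs = [np.random.rand(5) for _ in range(100)]   # e.g. sensor readings
print({k: len(v) for k, v in segregate(obs, np.ones(5)).items()})
```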

67 citations


Journal ArticleDOI
TL;DR: A blockchain-based framework for the Industrial Internet of Things (IIoT) is envisioned to address the issues of data management and security, leveraging Digital Twins that can draw intelligent conclusions from data by identifying faults and recommending precautionary measures ahead of critical events.
Abstract: Industrial processes rely on sensory data for critical decision-making processes. Extracting actionable insights from the collected data calls for an infrastructure that can ensure the trustworthiness of data. To this end, we envision a blockchain-based framework for the Industrial Internet of Things (IIoT) to address the issues of data management and security. Once the data collected from trustworthy sources are recorded in the blockchain, product lifecycle events can be fed into data-driven systems for process monitoring, diagnostics, and optimized control. In this regard, we leverage Digital Twins (DTs) that can draw intelligent conclusions from data by identifying the faults and recommending precautionary measures ahead of critical events. Furthermore, we discuss the integration of DTs and blockchain to target key challenges of disparate data repositories, untrustworthy data dissemination, and fault diagnosis. Finally, we identify outstanding challenges faced by the IIoT and future research directions while leveraging blockchain and DTs.

67 citations


Journal ArticleDOI
TL;DR: In this article, the authors evaluated data availability in research articles across nine disciplines in Nature and Science magazines and recorded corresponding authors' concerns, requests and reasons for declining data sharing, and recommended that data management costs should be covered by funding agencies; publicly available research data ought to be included in the evaluation of applications; and surveillance of data sharing should be enforced by both academic publishers and funders.
Abstract: Data sharing is one of the cornerstones of modern science that enables large-scale analyses and reproducibility. We evaluated data availability in research articles across nine disciplines in Nature and Science magazines and recorded corresponding authors' concerns, requests and reasons for declining data sharing. Although data sharing has improved in the last decade and particularly in recent years, data availability and willingness to share data still differ greatly among disciplines. We observed that statements of data availability upon (reasonable) request are inefficient and should not be allowed by journals. To improve data sharing at the time of manuscript acceptance, researchers should be better motivated to release their data with real benefits such as recognition, or bonus points in grant and job applications. We recommend that data management costs should be covered by funding agencies; publicly available research data ought to be included in the evaluation of applications; and surveillance of data sharing should be enforced by both academic publishers and funders. These cross-discipline survey data are available from the PlutoF repository.

Journal ArticleDOI
Guanjie Wang, Liyu Peng, Kaiqi Li, Linggang Zhu, Jian Zhou, Naihua Miao, Zhimei Sun
TL;DR: An open-source computational platform named ALKEMIE (an acronym for Artificial Learning and Knowledge Enhanced Materials Informatics Engineering) makes data-driven techniques easily accessible to broad communities; its elaborately designed, user-friendly graphical user interface makes the workflow and dataflow more maneuverable and transparent, making it easy to use for scientists with broad backgrounds.

Journal ArticleDOI
TL;DR: A digital twin-based assembly data management and process traceability approach for complex products is proposed, and the Digital Twin-based Assembly Process Management and Control System (DT-APMCS) is designed to verify the efficiency of the proposed approach.

Journal ArticleDOI
TL;DR: Clinica is an open-source software platform designed to make clinical neuroscience studies easier and more reproducible, letting researchers spend less time on data management and processing and perform reproducible evaluations of their methods.
Abstract: We present Clinica (www.clinica.run), an open-source software platform designed to make clinical neuroscience studies easier and more reproducible. Clinica aims for researchers to i) spend less time on data management and processing, ii) perform reproducible evaluations of their methods, and iii) easily share data and results within their institution and with external collaborators. The core of Clinica is a set of automatic pipelines for processing and analysis of multimodal neuroimaging data (currently, T1-weighted MRI, diffusion MRI and PET data), as well as tools for statistics, machine learning and deep learning. It relies on the brain imaging data structure (BIDS) for the organization of raw neuroimaging datasets and on established tools written by the community to build its pipelines. It also provides converters of public neuroimaging datasets to BIDS (currently ADNI, AIBL, OASIS and NIFD). Processed data include image-valued scalar fields (e.g. tissue probability maps), meshes, surface-based scalar fields (e.g. cortical thickness maps) or scalar outputs (e.g. regional averages). These data follow the ClinicA Processed Structure (CAPS) format which shares the same philosophy as BIDS. Consistent organization of raw and processed neuroimaging files facilitates the execution of single pipelines and of sequences of pipelines, as well as the integration of processed data into statistics or machine learning frameworks. The target audience of Clinica is neuroscientists or clinicians conducting clinical neuroscience studies involving multimodal imaging, and researchers developing advanced machine learning algorithms applied to neuroimaging data.
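As a small illustration of the BIDS convention Clinica builds on, the helper below constructs the standard location of a raw T1-weighted image. The helper and its labels are invented for illustration; this is not part of Clinica's own API.

```python
from pathlib import Path

def t1w_path(bids_root: str, sub: str, ses: str) -> Path:
    # Standard BIDS location of a raw T1-weighted MRI:
    # <root>/sub-<label>/ses-<label>/anat/sub-<label>_ses-<label>_T1w.nii.gz
    return (Path(bids_root) / f"sub-{sub}" / f"ses-{ses}" / "anat"
            / f"sub-{sub}_ses-{ses}_T1w.nii.gz")

print(t1w_path("/data/my_study", "01", "M00"))
# /data/my_study/sub-01/ses-M00/anat/sub-01_ses-M00_T1w.nii.gz
```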

Journal ArticleDOI
TL;DR: In this paper, the authors comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing and storage to common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering.
Abstract: Recent advances in sensor and mobile devices have enabled an unprecedented increase in the availability and collection of urban trajectory data, thus increasing the demand for more efficient ways to manage and analyze the data being produced. In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing and storage to common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering. We also explore four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing. Deep trajectory learning is also reviewed for the first time. Finally, we outline the essential qualities that a trajectory data management system should possess to maximize flexibility.
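As one concrete example of the pre-processing stage such systems cover, the classical Douglas-Peucker algorithm simplifies a trajectory by dropping points that stay within a tolerance of the chord. This is a textbook sketch, not tied to any system reviewed in the survey.

```python
import math

def _perp_dist(p, a, b):
    # Perpendicular distance of point p from the segment through a and b.
    if a == b:
        return math.dist(p, a)
    (ax, ay), (bx, by), (px, py) = a, b, p
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.dist(a, b)

def douglas_peucker(traj, eps):
    # Keep the point farthest from the chord; recurse only if it exceeds eps.
    if len(traj) < 3:
        return list(traj)
    d, idx = max((_perp_dist(traj[i], traj[0], traj[-1]), i)
                 for i in range(1, len(traj) - 1))
    if d <= eps:
        return [traj[0], traj[-1]]
    left = douglas_peucker(traj[:idx + 1], eps)
    return left[:-1] + douglas_peucker(traj[idx:], eps)

route = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(route, eps=0.5))  # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```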

Journal ArticleDOI
TL;DR: This paper focuses on implementing the elliptic curve cryptography (ECC) technique, a lightweight authentication approach to share the data effectively, and discusses two important data security issues: data authentication and data confidentiality.
Abstract: Hospital data management is one of the functional parts of operations to store and access healthcare data. Nowadays, protecting these data from hacking is one of the most difficult tasks in the healthcare system. As the user's data collected in the field of healthcare are very sensitive, adequate security measures have to be taken in this field to protect the networks. To maintain security, an effective encryption technology must be utilised. This paper focuses on implementing the elliptic curve cryptography (ECC) technique, a lightweight authentication approach to share the data effectively. Much research addresses sharing data wirelessly; among these approaches, this work uses an Electronic Medical Card (EMC) to store the healthcare data. The work discusses two important data security issues: data authentication and data confidentiality. To ensure data authentication, the proposed system employs a secure mechanism to encrypt and decrypt the data with a 512-bit key. Data confidentiality is ensured by using the Blockchain ledger technique, which allows ethical users to access the data. Finally, the encrypted data is stored on the edge device. The edge computing technology is used to store the medical reports within the edge network to access the data in a very fast manner. An authenticated user can decrypt the data and process the data at optimum speed. After processing, the updated data is stored in the Blockchain and in the cloud server. This proposed method ensures secure maintenance and efficient retrieval of medical data and reports.
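The paper's exact construction is not reproduced above, but an ECIES-style sketch with the Python cryptography library shows the usual shape of elliptic-curve data protection. The curve (SECP521R1, in the 512-bit class the paper mentions), the KDF, and the message format are assumptions, not the authors' specification.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def _derive_key(shared: bytes) -> bytes:
    # Turn the raw ECDH shared secret into a 256-bit symmetric key.
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"emc-record").derive(shared)

def encrypt_record(receiver_public, plaintext: bytes):
    # Ephemeral ECDH: a fresh key pair per message, then AES-GCM for the payload.
    eph = ec.generate_private_key(ec.SECP521R1())
    key = _derive_key(eph.exchange(ec.ECDH(), receiver_public))
    nonce = os.urandom(12)
    return eph.public_key(), nonce, AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt_record(receiver_private, eph_public, nonce, ciphertext) -> bytes:
    key = _derive_key(receiver_private.exchange(ec.ECDH(), eph_public))
    return AESGCM(key).decrypt(nonce, ciphertext, None)

receiver = ec.generate_private_key(ec.SECP521R1())
pub, nonce, ct = encrypt_record(receiver.public_key(), b"blood pressure: 120/80")
assert decrypt_record(receiver, pub, nonce, ct) == b"blood pressure: 120/80"
```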

Journal ArticleDOI
TL;DR: An algorithm for data transfer from a 3D optical sensor, based on the principle of dynamic triangulation, uses distributed scalable big data storage and artificial intelligence in automated 3D metrology to optimize the fused database for better path planning.
Abstract: Optimized communication within a robotic swarm, or group (RG), in a tightly obstacled environment is a crucial point in optimizing group navigation for efficient sector traversal and monitoring. In the present work, the main set of problems for multi-objective optimization in a non-stationary environment is described. An algorithm for data transfer from a 3D optical sensor, based on the principle of dynamic triangulation, is presented; it uses distributed scalable big data storage and artificial intelligence in automated 3D metrology. Two different simulations are presented that optimize the fused database for better path planning, aiming to improve the group navigation of electric wheeled mobile robots in unknown cluttered terrain. The optical laser scanning sensor combined with intelligent data management permits more efficient dead reckoning of the RG.
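For reference, dynamic triangulation ultimately rests on the classical two-angle relation. The 2D sketch below locates a laser spot from two bearing angles and a known baseline; the paper's full 3D formulation is not reproduced, and the station geometry is assumed.

```python
import math

def triangulate(baseline: float, angle_a: float, angle_b: float) -> tuple:
    # Two stations a known baseline apart each measure the interior angle
    # (radians) between the baseline and their line of sight to the spot;
    # the law of sines then gives the range from station A.
    angle_c = math.pi - angle_a - angle_b          # angle at the target
    range_a = baseline * math.sin(angle_b) / math.sin(angle_c)
    # Station A at the origin, baseline along the x-axis.
    return range_a * math.cos(angle_a), range_a * math.sin(angle_a)

x, y = triangulate(1.0, math.radians(60), math.radians(70))
print(f"spot at ({x:.3f}, {y:.3f}) m")
```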

Journal ArticleDOI
TL;DR: A set of key challenges concerning the data analytics process, specifically feature construction and spatial and temporal aggregation, is discussed, along with how these challenges could be resolved through multidisciplinary collaboration, which is pivotal in unlocking the potential of position tracking data in sports analytics.
Abstract: In professional soccer, increasing amounts of data are collected that harness great potential when it comes to analysing tactical behaviour. Unlocking this potential is difficult as big data challenges the data management and analytics methods commonly employed in sports. By joining forces with computer science, solutions to these challenges could be achieved, helping sports science to find new insights, as is happening in other scientific domains. We aim to bring multiple domains together in the context of analysing tactical behaviour in soccer using position tracking data. A systematic literature search for studies employing position tracking data to study tactical behaviour in soccer was conducted in seven electronic databases, resulting in 2338 identified studies and finally the inclusion of 73 papers. Each domain clearly contributes to the analysis of tactical behaviour, albeit in - sometimes radically - different ways. Accordingly, we present a multidisciplinary framework where each domain's contributions to feature construction, modelling and interpretation can be situated. We discuss a set of key challenges concerning the data analytics process, specifically feature construction, spatial and temporal aggregation. Moreover, we discuss how these challenges could be resolved through multidisciplinary collaboration, which is pivotal in unlocking the potential of position tracking data in sports analytics.
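To make the temporal-aggregation challenge concrete, the toy pandas sketch below collapses a synthetic 10 Hz positional feed into one-second features. The column names and sampling rate are invented; real feeds carry all players and the ball.

```python
import numpy as np
import pandas as pd

# Synthetic 10 Hz feed for one player over one minute.
rng = np.random.default_rng(0)
t = pd.date_range("2021-05-01 15:00", periods=600, freq="100ms")
feed = pd.DataFrame({"x": np.cumsum(rng.normal(size=600)) * 0.1,
                     "y": np.cumsum(rng.normal(size=600)) * 0.1}, index=t)

# Feature construction per frame, then temporal aggregation to 1 s windows:
# mean position and distance covered.
feed["step"] = np.hypot(feed["x"].diff(), feed["y"].diff())
summary = feed.resample("1s").agg({"x": "mean", "y": "mean", "step": "sum"})
summary = summary.rename(columns={"step": "distance_m"})
print(summary.head())
```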

Journal ArticleDOI
TL;DR: In this article, the authors investigate the legal basis for using social media data while ensuring data subjects' rights through a case study based on the European Union's General Data Protection Regulation, and recommend that conservation scientists carefully consider their research objectives so as to facilitate responsible use of social media datasets in conservation science research, for example, in conservation culturomics and investigations of illegal wildlife trade online.
Abstract: Social media data are being increasingly used in conservation science to study human-nature interactions. User-generated content, such as images, video, text, and audio, and the associated metadata can be used to assess such interactions. A number of social media platforms provide free access to user-generated social media content. However, similar to any research involving people, scientific investigations based on social media data require compliance with highest standards of data privacy and data protection, even when data are publicly available. Should social media data be misused, the risks to individual users' privacy and well-being can be substantial. We investigated the legal basis for using social media data while ensuring data subjects' rights through a case study based on the European Union's General Data Protection Regulation. The risks associated with using social media data in research include accidental and purposeful misidentification that has the potential to cause psychological or physical harm to an identified person. To collect, store, protect, share, and manage social media data in a way that prevents potential risks to users involved, one should minimize data, anonymize data, and follow strict data management procedure. Risk-based approaches, such as a data privacy impact assessment, can be used to identify and minimize privacy risks to social media users, to demonstrate accountability and to comply with data protection legislation. We recommend that conservation scientists carefully consider our recommendations in devising their research objectives so as to facilitate responsible use of social media data in conservation science research, for example, in conservation culturomics and investigations of illegal wildlife trade online.
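As one concrete instance of the recommended data minimization and anonymization steps: the record fields below are invented, a keyed hash is one common pseudonymization choice, and coarsening timestamps is one form of minimization.

```python
import hashlib
import hmac

def minimize(record: dict, secret: bytes) -> dict:
    # Keep only the fields the study needs; replace the user ID with a keyed-hash
    # pseudonym (stable across records, irreversible without the key) and coarsen
    # the timestamp to year-month.
    pseudonym = hmac.new(secret, record["user_id"].encode(),
                         hashlib.sha256).hexdigest()[:16]
    return {"user": pseudonym,
            "text": record["text"],
            "month": record["timestamp"][:7]}

post = {"user_id": "alice", "text": "saw a pangolin at the market",
        "timestamp": "2021-03-14T09:30:00Z", "gps": (1.29, 103.85)}  # gps dropped
print(minimize(post, b"project-secret"))
```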

Journal ArticleDOI
TL;DR: A machine learning aided information management scheme is proposed for handling data to ensure uninterrupted user request service; it ensures less replication and minimum service response time irrespective of request and device density.
Abstract: Internet of Things (IoT) has gained significant importance due to its flexibility in integrating communication technologies and smart devices for the ease of service provisioning. IoT services rely on a heterogeneous cloud network for serving user demands ubiquitously. The service data management is a complex task in this heterogeneous environment due to random access and service compositions. In this article, a machine learning aided information management scheme is proposed for handling data to ensure uninterrupted user request service. The neural learning process gains control over service attributes and data response to abruptly assign resources to the incoming requests in the data plane. The learning process operates in the data plane, where requests and responses for service are instantaneous. This facilitates the smoothing of the learning process to decide upon the possible resources and more precise service delivery without duplication. The proposed data management scheme ensures less replication and minimum service response time irrespective of the request and device density.

Journal ArticleDOI
TL;DR: In this article, the authors study the process flow and existing problems in the logistics link of a factory, combining the management concept of the Internet of Things and integrating IC card identification, RFID radio-frequency identification, barrier and ground-sensing technology, and OPC/PLC.
Abstract: With the in-depth application of the Internet of Things, many emerging technologies are changing the global industry landscape on an unprecedented scale. At the same time, they also provide an opportunity for the development of intelligent management to break through its bottleneck. The proposal and evolution of the concept of intelligent management make management technology more advanced. Therefore, introducing Internet of Things technology into intelligent management has very important research significance and value. First, the origin and current situation of Internet of Things technology at home and abroad are surveyed, and the relevant theories and cutting-edge technologies are reviewed. Second, taking an enterprise as an example, the process flow and existing problems in the factory's logistics link are studied; the management concept of the Internet of Things is applied; and IC card identification, RFID radio-frequency identification, barrier and ground-sensing technology, and OPC/PLC are integrated. Third, combined with PLC/OPC technology, the design and integration of the software and hardware systems are realized. The experimental results show that the tested system modules provide company information management, employee multifactor predictive analysis, and efficient batch efficiency evaluation, which has value for company data management, data analysis and mining, and improves company efficiency by more than 30%.

Journal ArticleDOI
TL;DR: Cpds is proposed, a compressed and private data sharing framework that provides efficient and private data management for product data stored on the blockchain and devises two new mechanisms to store compressed and policy-enforced product data on the blockchain.
Abstract: Internet of Things (IoT) is a promising technology to provide product traceability for industrial systems. By using sensing and networking techniques, an IoT-enabled industrial system enables its participants to efficiently track products and record their status during the production process. Current industrial IoT systems lack a unified product data sharing service, which prevents the participants from acquiring trusted traceability of products. Using emerging blockchain technology to build such a service is a promising direction. However, directly storing product data on the blockchain incurs efficiency and privacy issues in data management due to its distributed infrastructure. In response, we propose Cpds, a compressed and private data sharing framework that provides efficient and private data management for product data stored on the blockchain. Cpds devises two new mechanisms to store compressed and policy-enforced product data on the blockchain. As a result, multiple industrial participants can efficiently share product data with fine-grained access control in a distributed environment without relying on a trusted intermediary. We conduct extensive empirical studies and demonstrate the feasibility of Cpds in improving the efficiency and security protection of product data storage on the blockchain.
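Cpds's two mechanisms are specified in the paper itself; the sketch below only illustrates the general pattern of compressing a record off-chain and committing a small policy-tagged fingerprint, with the entry layout and policy format as assumptions.

```python
import hashlib
import json
import zlib

def prepare_entry(record: dict, policy: dict) -> tuple:
    # Compress the product record off-chain; only a small digest plus an
    # access-control tag would be committed to the ledger.
    blob = zlib.compress(json.dumps(record, sort_keys=True).encode())
    entry = {"digest": hashlib.sha256(blob).hexdigest(), "policy": policy}
    return entry, blob  # entry -> blockchain, blob -> off-chain store

entry, blob = prepare_entry({"lot": "A17", "temp_c": 4.2, "station": "pack-3"},
                            {"read": ["supplier", "auditor"]})
print(entry["digest"][:16], f"{len(blob)} bytes off-chain")
```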

Journal ArticleDOI
TL;DR: This work conducts an empirical investigation of Last.fm, an online music discovery platform, and finds that data and data management techniques increasingly permeate organizations and the contexts in which they are embedded.
Abstract: Data and data management techniques increasingly permeate organizations and the contexts in which they are embedded. We conduct an empirical investigation of Last.fm, an online music discovery plat...

Journal ArticleDOI
TL;DR: This paper presents a systematic literature review of big data analytic approaches in weather forecasting (published between 2014 and August 2020) and compares the reviewed categories of approaches regarding accuracy, scalability, execution time, and other Quality of Service factors.
Abstract: Weather forecasting, as an important and indispensable procedure in people’s daily lives, evaluates the alteration happening in the current condition of the atmosphere. Big data analytics is the process of analyzing big data to extract the concealed patterns and applicable information that can yield better results. Nowadays, several parts of society are interested in big data, and the meteorological institute is not excluded. Therefore, big data analytics will give better results in weather forecasting and will help forecasters to forecast weather more accurately. In order to achieve this goal and to recommend favorable solutions, several big data techniques and technologies have been suggested to manage and analyze the huge volume of weather data from different resources. By employing big data analytics in weather forecasting, the challenges related to traditional data management techniques and technology can be solved. This paper tenders a systematic literature review method for big data analytic approaches in weather forecasting (published between 2014 and August 2020). A feasible taxonomy of the current reviewed papers is proposed as technique-based, technology-based, and hybrid approaches. Moreover, this paper presents a comparison of the aforementioned categories regarding accuracy, scalability, execution time, and other Quality of Service factors. The types of algorithms, measurement environments, modeling tools, and the advantages and disadvantages per paper are extracted. In addition, open issues and future trends are debated.

Journal ArticleDOI
TL;DR: A complete medical information system model based on blockchain technology is proposed to realize safe storage and sharing of medical data, providing means for remote diagnosis and treatment, data mining, and other practical applications based on the medical data on the blockchain.

Journal ArticleDOI
TL;DR: A return to roots is proposed by defining a Model-Driven Engineering (MDE) methodology that supports automation of BDA based on model specification: customers declare requirements to be achieved by an abstract Big Data platform, and smart engines deploy the Big Data pipeline carrying out the analytics on a specific instance of that platform.
Abstract: The Big Data revolution promises to build a data-driven ecosystem where better decisions are supported by enhanced analytics and data management. However, major hurdles still need to be overcome on the road that leads to commoditization and wide adoption of Big Data Analytics (BDA). Big Data complexity is the first factor hampering the full potential of BDA. The opacity and variety of Big Data technologies and computations, in fact, make BDA a failure prone and resource-intensive process, which requires a trial-and-error approach. This problem is even exacerbated by the fact that current solutions to Big Data application development take a bottom-up approach, where the last technology release drives application development. Selection of the best Big Data platform, as well as of the best pipeline to execute analytics, represents then a deal breaker. In this paper, we propose a return to roots by defining a Model-Driven Engineering (MDE) methodology that supports automation of BDA based on model specification. Our approach lets customers declare requirements to be achieved by an abstract Big Data platform and smart engines deploy the Big Data pipeline carrying out the analytics on a specific instance of such platform. Driven by customers’ requirements, our methodology is based on an OWL-S ontology of Big Data services and on a compiler transforming OWL-S service compositions in workflows that can be directly executed on the selected platform. The proposal is experimentally evaluated in a real-world scenario focusing on the threat detection system of SAP.

Journal ArticleDOI
TL;DR: In HBDIAIM, a differential evolutionary algorithm is incorporated to build adequate security for the confidential data management interface in smart city applications; a Big Data analytics assisted decision privacy scheme is used within the differential evolutionary algorithm, improving the scalability and accessibility of information in a data management interface based on its corresponding storage location.

Journal ArticleDOI
TL;DR: In this article, the authors propose an integrated low-powered IoT blockchain platform for a healthcare application to store and review EHRs, which includes a web and mobile application allowing the patient as well as the medical and paramedical staff to have secure access to health information.
Abstract: Because e-health applications involve more than one actor and a wireless component, providing more security and safety is expected. Moreover, ensuring data confidentiality within different services becomes a key requirement. In this paper, we propose to collect data from health and fitness smart devices deployed in connection with the proposed IoT blockchain platform. The use of these devices helps us extract highly valuable health data that are filtered, analyzed, and stored in electronic health records (EHRs). Different actors of the platform, coaches, patients, and doctors, collaborate to provide an on-time diagnosis and treatment for various diseases in an easy and cost-effective way. Our main purpose is to provide distributed, secure, and authorized access to these sensitive data using the Ethereum blockchain technology. We have designed an integrated low-powered IoT blockchain platform for a healthcare application to store and review EHRs. This architecture, based on the blockchain Ethereum, includes a web and mobile application allowing the patient as well as the medical and paramedical staff to have secure access to health information. The Ethereum node is implemented on an embedded platform, which should provide an efficient, flexible, and secure system despite the limited resources and low power consumption of the multiprocessor platform.
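A hypothetical web3.py sketch of the on-chain anchoring step such a platform performs: the node URL, registry address, ABI, and storeRecord function are all invented for illustration and are not the paper's actual interface. Only the record's hash goes on-chain; the encrypted EHR stays off-chain.

```python
import hashlib
from web3 import Web3

# Placeholder deployment details -- every value here is hypothetical.
NODE_URL = "http://localhost:8545"
REGISTRY_ADDRESS = "0x0000000000000000000000000000000000000001"
REGISTRY_ABI = [{"name": "storeRecord", "type": "function",
                 "stateMutability": "nonpayable",
                 "inputs": [{"name": "patientId", "type": "string"},
                            {"name": "digest", "type": "bytes32"}],
                 "outputs": []}]

w3 = Web3(Web3.HTTPProvider(NODE_URL))
registry = w3.eth.contract(address=REGISTRY_ADDRESS, abi=REGISTRY_ABI)

def anchor_record(patient_id: str, ehr_bytes: bytes, sender: str):
    # Fingerprint the (already encrypted) EHR and anchor the digest on-chain.
    digest = hashlib.sha256(ehr_bytes).digest()
    return registry.functions.storeRecord(patient_id, digest).transact({"from": sender})
```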

Journal ArticleDOI
TL;DR: This paper comprehensively reviews the progress of several solar PV-based monitoring technologies focusing on various data processing modules and data transmission protocols and offers selective proposals for future research works.
Abstract: Solar photovoltaic (PV) is one of the prominent sustainable energy sources which shares a greater percentage of the energy generated from renewable resources. As the need for solar energy has risen tremendously in the last few decades, monitoring technologies have received considerable attention in relation to performance enhancement. Recently, the solar PV monitoring system has been integrated with a wireless platform that comprises data acquisition from various sensors and nodes through wireless data transmission. However, several issues could affect the performance of solar PV monitoring, such as large data management, signal interference, long-range data transmission, and security. Therefore, this paper comprehensively reviews the progress of several solar PV-based monitoring technologies focusing on various data processing modules and data transmission protocols. Each module and transmission protocol-based monitoring technology is investigated with regard to type, design, implementations, specifications, and limitations. The critical discussion and analysis are carried out with respect to configurations, parameters monitored, software, platform, achievements, and suggestions. Moreover, various key issues and challenges are explored to identify the existing research gaps. Finally, this review delivers selective proposals for future research works. All the highlighted insights of this review will hopefully lead to increased efforts toward the enhancement of the monitoring technologies in future sustainable solar PV applications.