
Showing papers in "Information Processing and Management in 2021"


Journal ArticleDOI
TL;DR: It is highlighted that blockchain's structure and modern cloud- and edge-computing paradigms are crucial in enabling the widespread adoption and development of blockchain technologies for new players in today's unprecedentedly vibrant global market.
Abstract: Blockchain technologies have grown in prominence in recent years, with many experts citing the potential applications of the technology with regard to different aspects of any industry, market, agency, or governmental organization. In the brief history of blockchain, an incredible number of achievements have been made regarding how blockchain can be utilized and the impacts it might have on several industries. The sheer number and complexity of these aspects can make it difficult to address blockchain's potential and complexities, especially when trying to address its purpose and fitness for a specific task. In this survey, we provide a comprehensive review of applying blockchain as a service for applications within today's information systems. The survey gives the reader a deeper perspective on how blockchain helps to secure and manage today's information systems. The survey contains comprehensive reporting on different instances of blockchain studies and applications proposed by the research community and their respective impacts on blockchain and its use across other applications or scenarios. Some of the most important findings this survey highlights include the fact that blockchain's structure and modern cloud- and edge-computing paradigms are crucial in enabling the widespread adoption and development of blockchain technologies for new players in today's unprecedentedly vibrant global market. Ensuring that blockchain is widely available through public and open-source code libraries and tools will help to ensure that the full potential of the technology is reached and that further developments can be made concerning the long-term goals of blockchain enthusiasts.

291 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a solution for distributed management of identity and authorization policies by leveraging blockchain technology to hold a global view of the security policies within the system, and integrating it in the FIWARE platform.
Abstract: The platforms supporting smart city applications are rarely implemented from scratch by a municipality and/or totally owned by a single company; they are more typically realized by integrating existing ICT infrastructures thanks to a supporting platform, such as the well-known FIWARE platform. Such a multi-tenant deployment model is required to lower the initial investment costs of implementing large-scale solutions for smart cities, but it also imposes some key security obstacles. In fact, smart cities support critical applications that demand the protection of data and functionalities from malicious and unauthorized uses. Equipping the supporting platforms with proper means for access control is demanding, but these means are typically implemented according to a centralized approach, where a single server stores and makes available a set of identity attributes and authorization policies. Having a single root of trust is not suitable in the distributed and cooperating scenario of large-scale smart cities, due to their multi-tenant deployment. In fact, each of the integrated systems has its own set of security policies, and the other systems need to be aware of these policies in order to allow a seamless use of the same credentials across the overall infrastructure (realizing what is known as single sign-on). This imposes the problem of consistent and secure data replicas within a distributed system, which can be properly approached by using blockchain technology. Therefore, this work proposes a novel solution for distributed management of identity and authorization policies that leverages blockchain technology to hold a global view of the security policies within the system, and integrates it in the FIWARE platform. A detailed assessment is provided to evaluate the quality of the proposed approach and to compare it with existing solutions.

228 citations


Journal ArticleDOI
TL;DR: A model to understand the effect of information seeking, information sources, and information overload (Stimuli) on information anxiety (psychological organism), and consequent behavioral response, information avoidance during the global health crisis (COVID-19) is proposed.
Abstract: Individuals seek information for informed decision-making, and they nowadays consult a variety of information sources. However, studies show that information from multiple sources can lead to information overload, which then creates negative psychological and behavioral responses. Drawing on the Stimulus-Organism-Response (S-O-R) framework, we propose a model to understand the effect of information seeking, information sources, and information overload (stimuli) on information anxiety (psychological organism), and the consequent behavioral response, information avoidance, during the global health crisis (COVID-19). The proposed model was tested using partial least squares structural equation modeling (PLS-SEM), for which data were collected from 321 Finnish adults using an online survey. People were found to seek information from traditional sources such as mass media and print media, and from online sources such as official websites, newspaper websites, and forums. Social media and personal networks were not the preferred sources. On the other hand, among the different information sources, social media exposure has a significant relationship with information overload as well as information anxiety. Besides, information overload also predicted information anxiety, which further resulted in information avoidance.

204 citations


Journal ArticleDOI
TL;DR: The BDR-CNN-GCN showed improved performance compared to five proposed neural network models and 15 state-of-the-art breast cancer detection approaches, proving to be an effective method for data augmentation and improved detection of malignant breast masses.
Abstract: Aim: In a pilot study to improve detection of malignant lesions in breast mammograms, we aimed to develop a new method called BDR-CNN-GCN, combining two advanced neural networks: (i) graph convolutional network (GCN); and (ii) convolutional neural network (CNN). Method: We utilised a standard 8-layer CNN, then integrated two improvement techniques: (i) batch normalization (BN) and (ii) dropout (DO). Finally, we utilized rank-based stochastic pooling (RSP) to substitute the traditional max pooling. This resulted in BDR-CNN, which is a combination of CNN, BN, DO, and RSP. This BDR-CNN was hybridized with a two-layer GCN, yielding our BDR-CNN-GCN model, which was then combined with 14-way data augmentation and utilized for analysis of breast mammograms. Results: As proof of concept, we ran our BDR-CNN-GCN algorithm 10 times on the breast mini-MIAS dataset (containing 322 mammographic images), achieving a sensitivity of 96.20±2.90%, a specificity of 96.00±2.31% and an accuracy of 96.10±1.60%. Conclusion: Our BDR-CNN-GCN showed improved performance compared to five proposed neural network models and 15 state-of-the-art breast cancer detection approaches, proving to be an effective method for data augmentation and improved detection of malignant breast masses.
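
A minimal PyTorch sketch of the general idea described above, with hypothetical layer sizes and ordinary max pooling standing in for the paper's rank-based stochastic pooling (RSP); it is an illustrative approximation, not the authors' implementation.

# Illustrative CNN-with-BN/dropout backbone plus a simple two-layer graph convolution.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, c_in, c_out, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),          # BN
            nn.ReLU(),
            nn.Dropout2d(p_drop),           # DO
            nn.MaxPool2d(2),                # placeholder for RSP
        )

    def forward(self, x):
        return self.net(x)

class SimpleGCNLayer(nn.Module):
    """One dense graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, h, a_hat):
        return torch.relu(a_hat @ self.lin(h))

class ToyBDRCNNGCN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(ConvBlock(1, 16), ConvBlock(16, 32), ConvBlock(32, 64))
        self.gcn1 = SimpleGCNLayer(64, 64)
        self.gcn2 = SimpleGCNLayer(64, 32)
        self.head = nn.Linear(32, n_classes)

    def forward(self, x, a_hat):
        # x: (batch, 1, H, W) mammogram patches; a_hat: (batch, batch) normalized adjacency
        h = self.cnn(x).mean(dim=(2, 3))    # global average pool -> (batch, 64)
        h = self.gcn2(self.gcn1(h, a_hat), a_hat)
        return self.head(h)

model = ToyBDRCNNGCN()
imgs = torch.randn(8, 1, 64, 64)
a_hat = torch.eye(8)                        # trivial graph, just for the sketch
print(model(imgs, a_hat).shape)             # torch.Size([8, 2])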

189 citations


Journal ArticleDOI
TL;DR: An exhaustive review of stance detection techniques on social media, including the task definition, the different types of targets in stance detection, the feature sets used, and the various machine learning approaches applied.
Abstract: Stance detection on social media is an emerging opinion mining paradigm for various social and political applications in which sentiment analysis may be sub-optimal. There has been growing research interest in developing effective stance detection methods across multiple communities, including natural language processing, web science, and social computing, each of which has modeled stance detection in different ways. In this paper, we survey the work on stance detection across those communities and present an exhaustive review of stance detection techniques on social media, including the task definition, the different types of targets in stance detection, the feature sets used, and the various machine learning approaches applied. Our survey reports state-of-the-art results on the existing benchmark datasets for stance detection and discusses the most effective approaches. In addition, we explore the emerging trends and different applications of stance detection on social media, including opinion mining and prediction and, more recently, fake news detection. The study concludes by discussing the gaps in the current existing research and highlighting possible future directions for stance detection on social media.

121 citations


Journal ArticleDOI
TL;DR: A hybrid approach combining two deep learning architectures, a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM) network (an RNN with memory), is suggested for sentiment classification of consumer reviews posted on social media across diverse domains.
Abstract: Analysis of consumer reviews posted on social media is found to be essential for several business applications. Consumer reviews posted on social media are increasing at an exponential rate in terms of both number and relevance, which leads to big data. In this paper, a hybrid approach combining two deep learning architectures, namely a Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) (an RNN with memory), is suggested for sentiment classification of reviews posted in diverse domains. Deep convolutional networks have been highly effective in local feature selection, while recurrent networks (LSTM) often yield good results in the sequential analysis of long text. The proposed Co-LSTM model is mainly aimed at two objectives in sentiment analysis. First, it is highly adaptable for examining big social data, keeping scalability in mind; secondly, unlike conventional machine learning approaches, it is not tied to any particular domain. The experiment has been carried out on four review datasets from diverse domains to train a model that can handle all kinds of dependencies that usually arise in a post. The experimental results show that the proposed ensemble model outperforms other machine learning approaches in terms of accuracy and other parameters.
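
A minimal PyTorch sketch of a convolution-then-LSTM text classifier of the kind described: the convolution extracts local n-gram features and the LSTM models their sequence. The vocabulary size, embedding dimension, and other hyperparameters are hypothetical placeholders, not the authors' Co-LSTM configuration.

import torch
import torch.nn as nn

class CNNLSTMSentiment(nn.Module):
    """Convolution extracts local n-gram features; the LSTM models their sequence."""
    def __init__(self, vocab_size=20000, emb_dim=128, n_filters=64, hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(n_filters, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)   # (batch, seq_len, n_filters)
        _, (h_n, _) = self.lstm(x)                     # final hidden state
        return self.fc(h_n[-1])                        # (batch, n_classes)

model = CNNLSTMSentiment()
print(model(torch.randint(1, 20000, (4, 50))).shape)   # torch.Size([4, 2])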

120 citations


Journal ArticleDOI
TL;DR: A novel theoretical model is provided to calculate the transaction latency under various network configurations such as block size, block interval, etc., to help assess the effectiveness of the Fabric blockchain.
Abstract: Blockchain has been one of the most attractive technologies for many modern and even future applications. Fabric, an open-source framework for implementing permissioned enterprise-grade blockchains, is receiving increasing attention from innovators. Latency performance is crucial to the Fabric blockchain in assessing its effectiveness. Many empirical studies have been conducted to analyze this performance on different hardware platforms. These experimental results are not comparable, as they are highly dependent on the underlying networks. Moreover, theoretical analysis of the latency of the Fabric blockchain still receives much less attention. This paper provides a novel theoretical model to calculate the transaction latency under various network configurations such as block size, block interval, etc. Subsequently, we validate the proposed latency model with experiments, and the results show that the difference between analytical and experimental results is as low as 6.1%. We also identify some performance bottlenecks and give insights from the developer's perspective.
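
The paper's actual analytical model is not reproduced in the abstract; the snippet below is only a toy illustration of how block size and block interval interact to bound transaction latency, under the simplifying assumption that a transaction waits for its block to be cut (whichever of the block-size or block-timeout condition fires first) plus fixed validation and commit delays.

def approx_tx_latency(arrival_rate_tps, block_size_txs, block_interval_s,
                      validation_s=0.05, commit_s=0.1):
    """Toy estimate of average ordering-to-commit latency in a Fabric-like pipeline.

    A block is cut either when block_size_txs transactions have arrived or when
    block_interval_s elapses, whichever comes first; an average transaction then
    waits roughly half of that cutting time before validation and commit.
    """
    time_to_fill = block_size_txs / arrival_rate_tps       # time to gather a full block
    block_cut_time = min(time_to_fill, block_interval_s)   # earliest cut condition
    avg_wait_in_block = block_cut_time / 2.0                # average position in the block
    return avg_wait_in_block + validation_s + commit_s

# Example: 200 tx/s, blocks of 100 transactions or a 2 s timeout
print(round(approx_tx_latency(200, 100, 2.0), 3))  # ~0.4 s under these toy assumptions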

119 citations


Journal ArticleDOI
TL;DR: In this paper, a model for exploring the effects of external stimuli (perceived threat and perceived information overload) related to COVID-19 on consumers' internal states (sadness, anxiety, and cognitive dissonance) and their subsequent behavioral intentions to avoid health information and engage in preventive behaviors was proposed.
Abstract: This study investigated consumers’ information-avoidance behavior in the context of a public health emergency—the COVID-19 pandemic in China. Guided by the stimulus-organism-response paradigm, it proposes a model for exploring the effects of external stimuli (perceived threat and perceived information overload) related to COVID-19 on consumers’ internal states (sadness, anxiety, and cognitive dissonance) and their subsequent behavioral intentions to avoid health information and engage in preventive behaviors. With a survey sample (N = 721), we empirically examined the proposed model and tested the hypotheses. The results indicate that sadness, anxiety, and cognitive dissonance, which were a result of perceived threat and perceived information overload, had heterogeneous effects on information avoidance. Anxiety and cognitive dissonance increased information avoidance intention, while sadness decreased information avoidance intention. Moreover, information avoidance predicted a reluctance on the part of consumers to engage in preventive behaviors during the COVID-19 pandemic. These findings not only contribute to the information behavior literature and extend the concept of information avoidance to a public health emergency context, but also yield practical insights for global pandemic control.

113 citations


Journal ArticleDOI
TL;DR: This work proposes an owner-centric decentralized sharing model for Digital Twin data, and shows how to overcome the numerous implementation challenges associated with fully decentralized data sharing, enabling management of Digital Twin components and their associated information.
Abstract: Digital Twins are complex digital representations of assets that are used by a variety of organizations across the Industry 4.0 value chain. As the digitization of industrial processes advances, Digital Twins will become widespread. As a result, there is a need to develop new secure data sharing models for a complex ecosystem of interacting Digital Twins and lifecycle parties. Decentralized Applications are uniquely suited to address these sharing challenges while ensuring availability, integrity and confidentiality. They rely on distributed ledgers and decentralized databases for data storage and processing, avoiding single points of trust. To tackle the need for decentralized sharing of Digital Twin data, this work proposes an owner-centric decentralized sharing model. A formal access control model addresses integrity and confidentiality aspects based on Digital Twin components and lifecycle requirements. With our prototypical implementation EtherTwin we show how to overcome the numerous implementation challenges associated with fully decentralized data sharing, enabling management of Digital Twin components and their associated information. For validation, the prototype is evaluated based on an industry use case and semi-structured expert interviews.

97 citations


Journal ArticleDOI
TL;DR: This paper collected over 10,000 smart contracts from Ethereum, focused on the data behavior generated by smart contracts and users, and proposed a transaction-based classification and detection approach for Ethereum smart contracts to address these issues.
Abstract: Blockchain technology brings innovation to various industries. Ethereum is currently the second-largest blockchain platform by market capitalization and the largest smart contract blockchain platform. Smart contracts can simplify and accelerate the development of various applications, but they also bring some problems. For example, smart contracts are used to commit fraud, vulnerable contracts are deliberately developed to undermine fairness, and there are numerous duplicative contracts that waste performance while serving no actual purpose. In this paper, we propose a transaction-based classification and detection approach for Ethereum smart contracts to address these issues. We collected over 10,000 smart contracts from Ethereum and focused on the data behavior generated by smart contracts and users. We identified four behavior patterns from the transactions by manual analysis, which can be used to distinguish between different types of contracts. From these, 14 basic features of a smart contract are constructed. To construct the experimental dataset, we propose a data slicing algorithm for slicing the collected smart contracts. After that, we use an LSTM network to train and test our datasets. The extensive experimental results show that our approach can distinguish different types of contracts and can be applied to anomaly detection and malicious contract identification with satisfactory precision, recall, and F1-score.
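
A minimal PyTorch sketch of classifying a contract from a sliced sequence of transaction feature vectors with an LSTM; the 14-dimensional feature size mirrors the abstract, while the slicing, class count, and hyperparameters are hypothetical stand-ins for the paper's setup.

import torch
import torch.nn as nn

class ContractLSTMClassifier(nn.Module):
    """Classifies a contract from a sequence of per-slice transaction feature vectors."""
    def __init__(self, n_features=14, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):             # x: (batch, n_slices, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])

model = ContractLSTMClassifier()
batch = torch.randn(8, 30, 14)        # 8 contracts, 30 transaction slices each
print(model(batch).shape)             # torch.Size([8, 4])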

95 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a blockchain based framework for secure vehicular networks (B-FERL), which uses permissioned blockchain technology to tailor information access to restricted entities in the connected vehicle ecosystem, and uses a challenge-response data exchange between the vehicles and roadside units to monitor the internal state of the vehicle to identify cases of in-vehicle network compromise.
Abstract: The ubiquity of connecting technologies in smart vehicles and the incremental automation of their functionalities promise significant benefits, including a significant decline in congestion and road fatalities. However, increasing automation and connectedness broaden the attack surface and heighten the likelihood of a malicious entity successfully executing an attack. In this paper, we propose a Blockchain based Framework for sEcuring smaRt vehicLes (B-FERL). B-FERL uses permissioned blockchain technology to tailor information access to restricted entities in the connected vehicle ecosystem. It also uses a challenge–response data exchange between the vehicles and roadside units to monitor the internal state of the vehicle and identify cases of in-vehicle network compromise. In order to enable authentic and valid communication in the vehicular network, only vehicles with a verifiable record in the blockchain can exchange messages. Through qualitative arguments, we show that B-FERL is resilient to identified attacks. Also, quantitative evaluations in an emulated scenario show that B-FERL ensures a suitable response time and a required storage size compatible with realistic scenarios. Finally, we demonstrate how B-FERL achieves various functions important to the automotive ecosystem, such as trust management, vehicular forensics and secure vehicular networks.
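
A minimal sketch of a generic HMAC-based challenge–response exchange between a roadside unit and a vehicle, assuming a pre-shared key; it illustrates the general pattern of binding a fresh challenge to the vehicle's internal state, and is not B-FERL's actual protocol or message format.

import hmac, hashlib, secrets

def issue_challenge() -> bytes:
    """Roadside unit: generate a fresh random nonce."""
    return secrets.token_bytes(32)

def vehicle_response(shared_key: bytes, challenge: bytes, firmware_state: bytes) -> bytes:
    """Vehicle: bind its current internal state to the challenge with an HMAC."""
    return hmac.new(shared_key, challenge + firmware_state, hashlib.sha256).digest()

def verify(shared_key: bytes, challenge: bytes, expected_state: bytes, response: bytes) -> bool:
    """Roadside unit: recompute the tag from the expected (known-good) state and compare."""
    expected = hmac.new(shared_key, challenge + expected_state, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

key = secrets.token_bytes(32)
good_state = b"ecu-firmware-hash-v1"
nonce = issue_challenge()
resp = vehicle_response(key, nonce, good_state)
print(verify(key, nonce, good_state, resp))            # True
print(verify(key, nonce, b"tampered-firmware", resp))  # False -> possible in-vehicle compromise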

Journal ArticleDOI
TL;DR: This paper presents a new authentication and encryption protocol based on quantum-inspired quantum walks (QIQW) that can defend against message and impersonation attacks, thus ensuring secure transmission of data among IoT devices.
Abstract: Blockchain plays a vital role in cybersecurity. With the ongoing efforts to realise large-scale quantum computers, most current cryptographic mechanisms may be hacked. Accordingly, we need a quantum tool for designing blockchain frameworks that can be executed on digital computers and can resist probable attacks from both digital and quantum computers. Quantum walks may be utilised as a quantum-inspired model for designing new cryptographic algorithms. In this paper, we present a new authentication and encryption protocol based on quantum-inspired quantum walks (QIQW). The proposed protocol is utilized to build a blockchain framework for secure data transmission among IoT devices. Instead of using classical cryptographic hash functions, quantum hash functions based on QIQW are employed for linking the blocks of the chain. The main advantages of the presented framework are helping IoT nodes to effectively share their data with other nodes and to retain full control of their records. Security analysis demonstrates that our proposed protocol can defend against message and impersonation attacks, thus ensuring secure transmission of data among IoT devices.

Journal ArticleDOI
TL;DR: A novel health misinformation detection model was proposed which incorporated the central-level features and the peripheral-level features (including linguistic features, sentiment features, and user behavioral features) and correctly detected about 85% of the health misinformation.
Abstract: Curbing the diffusion of health misinformation on social media has long been a public concern since the spread of such misinformation can have adverse effects on public health. Previous studies mainly relied on linguistic features and textual features to detect online health-related misinformation. Based on the Elaboration Likelihood Model (ELM), this study proposed that the features of online health misinformation can be classified into two levels: central-level and peripheral-level. In this study, a novel health misinformation detection model was proposed which incorporated the central-level features (including topic features) and the peripheral-level features (including linguistic features, sentiment features, and user behavioral features). In addition, the following behavioral features were introduced to reflect the interaction characteristics of users: Discussion initiation, Interaction engagement, Influential scope, Relational mediation, and Informational independence. Due to the lack of a labeled dataset, we collected the dataset from a real online health community in order to provide a real scenario for data analysis. Four types of misinformation were identified through the coding analysis. The proposed model and its individual features were validated on the real-world dataset. The model correctly detected about 85% of the health misinformation. The results also suggested that behavioral features were more informative than linguistic features in detecting misinformation. The findings not only demonstrated the efficacy of behavioral features in health misinformation detection but also offered both methodological and theoretical contributions to misinformation detection from the perspective of integrating the features of messages as well as the features of message creators.
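
A minimal scikit-learn sketch of the feature-combination idea: central-level (topic) and peripheral-level (linguistic, sentiment) features are concatenated with user behavioral features and fed to one classifier. The feature groups here are random placeholders and the random-forest choice is illustrative, not the authors' model.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_posts = 500

# Hypothetical feature groups per post (stand-ins for the real extracted features)
topic_feats = rng.random((n_posts, 10))       # central-level: topic distribution
linguistic_feats = rng.random((n_posts, 8))   # peripheral-level: e.g. word counts, readability
sentiment_feats = rng.random((n_posts, 3))    # peripheral-level: sentiment scores
behavioral_feats = rng.random((n_posts, 5))   # e.g. discussion initiation, interaction engagement
labels = rng.integers(0, 2, n_posts)          # 1 = misinformation, 0 = not

X = np.hstack([topic_feats, linguistic_feats, sentiment_feats, behavioral_feats])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5, scoring="f1").mean())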

Journal ArticleDOI
TL;DR: A multimodal fake news detection framework based on Crossmodal Attention Residual and Multichannel convolutional neural Networks (CARMN) is proposed and it is demonstrated that the proposed model outperforms the state-of-the-art methods and learns more discriminable feature representations.
Abstract: In recent years, social media has increasingly become one of the popular ways for people to consume news. As the proliferation of fake news on social media has negative impacts on individuals and society, automatic fake news detection has been explored by different research communities to combat fake news. With the development of multimedia technology, a phenomenon that cannot be ignored is that more and more social media news contains information in different modalities, e.g., text, pictures and videos. The multiple information modalities show more evidence of the happening of news events and present new opportunities to detect features in fake news. First, for the multimodal fake news detection task, it is a challenge to keep the unique properties of each modality while fusing the relevant information between different modalities. Second, for some news, the information fusion between different modalities may produce noise that affects the model's performance. Unfortunately, existing methods fail to handle these challenges. To address these problems, we propose a multimodal fake news detection framework based on Crossmodal Attention Residual and Multichannel convolutional neural Networks (CARMN). The Crossmodal Attention Residual Network (CARN) can selectively extract the information related to a target modality from another source modality while maintaining the unique information of the target modality. The Multichannel Convolutional neural Network (MCN) can mitigate the influence of noise that may be generated by the crossmodal fusion component by extracting textual feature representations from the original and fused textual information simultaneously. We conduct extensive experiments on four real-world datasets and demonstrate that the proposed model outperforms the state-of-the-art methods and learns more discriminable feature representations.
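
A minimal PyTorch sketch of a crossmodal attention block with a residual connection, in which textual features attend over image features while the skip connection preserves the text-only signal; dimensions are hypothetical and this is not the CARMN implementation.

import torch
import torch.nn as nn

class CrossmodalAttentionResidual(nn.Module):
    """Text queries attend over image keys/values; a residual keeps the text-only signal."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, n_tokens, d_model); image_feats: (batch, n_regions, d_model)
        fused, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        return self.norm(text_feats + fused)    # residual preserves unique text information

block = CrossmodalAttentionResidual()
text = torch.randn(2, 30, 256)
image = torch.randn(2, 49, 256)
print(block(text, image).shape)                 # torch.Size([2, 30, 256])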

Journal ArticleDOI
TL;DR: An explainable natural language processing model based on DistilBERT and SHAP (SHapley Additive exPlanations) is proposed to combat misinformation about COVID-19, chosen for their efficiency and effectiveness, and to boost public trust in model prediction.
Abstract: Misinformation about COVID-19 is prevalent on social media as the pandemic unfolds, and the associated risks are extremely high. Thus, it is critical to detect and combat such misinformation. Recently, deep learning models using natural language processing techniques, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved great success in detecting misinformation. In this paper, we propose an explainable natural language processing model based on DistilBERT and SHAP (SHapley Additive exPlanations), chosen for their efficiency and effectiveness, to combat misinformation about COVID-19. First, we collected a dataset of 984 claims about COVID-19 with fact-checking. By augmenting the data using back-translation, we doubled the sample size of the dataset, and the DistilBERT model was able to obtain good performance (accuracy: 0.972; area under the curve: 0.993) in detecting misinformation about COVID-19. Our model was also tested on a larger dataset from the AAAI2021 COVID-19 Fake News Detection Shared Task and obtained good performance (accuracy: 0.938; area under the curve: 0.985). The performance on both datasets was better than traditional machine learning models. Second, in order to boost public trust in model prediction, we employed SHAP to improve model explainability, which was further evaluated using a between-subjects experiment with three conditions, i.e., text (T), text+SHAP explanation (TSE), and text+SHAP explanation+source and evidence (TSESE). The participants were significantly more likely to trust and share information related to COVID-19 in the TSE and TSESE conditions than in the T condition. Our results provide good implications for detecting misinformation about COVID-19 and improving public trust.
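
A minimal Hugging Face Transformers sketch of scoring a claim with a DistilBERT sequence classifier; the checkpoint shown is the generic base model with an untrained classification head, so it illustrates the plumbing only, not the authors' fine-tuned COVID-19 model, and the SHAP explanation step is omitted.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Generic base checkpoint; the classification head is randomly initialized until fine-tuned.
name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

claim = "Drinking hot water cures COVID-19."
inputs = tokenizer(claim, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)   # [p(real), p(misinformation)] once the head has been fine-tuned on labeled claims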

Journal ArticleDOI
TL;DR: This paper formalizes two novel metrics that quantify how much a recommender system equally treats items along the popularity tail, and proposes an in-processing approach aimed at minimizing the biased correlation between user-item relevance and item popularity.
Abstract: Recommender systems learn from historical users’ feedback that is often non-uniformly distributed across items. As a consequence, these systems may end up suggesting popular items more than niche items progressively, even when the latter would be of interest for users. This can hamper several core qualities of the recommended lists (e.g., novelty, coverage, diversity), impacting on the future success of the underlying platform itself. In this paper, we formalize two novel metrics that quantify how much a recommender system equally treats items along the popularity tail. The first one encourages equal probability of being recommended across items, while the second one encourages true positive rates for items to be equal. We characterize the recommendations of representative algorithms by means of the proposed metrics, and we show that the item probability of being recommended and the item true positive rate are biased against the item popularity. To promote a more equal treatment of items along the popularity tail, we propose an in-processing approach aimed at minimizing the biased correlation between user-item relevance and item popularity. Extensive experiments show that, with small losses in accuracy, our popularity-mitigation approach leads to important gains in beyond-accuracy recommendation quality.
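
A small NumPy sketch of the two intuitions described in the abstract, computed here as the variability of (i) each item's probability of being recommended and (ii) each item's true-positive rate; the exact metric definitions in the paper may differ, so treat this as an illustrative approximation.

import numpy as np

def recommendation_probability(rec_lists, n_items):
    """Fraction of users to whom each item is recommended."""
    counts = np.zeros(n_items)
    for items in rec_lists:
        counts[items] += 1
    return counts / len(rec_lists)

def item_true_positive_rate(rec_lists, relevant_sets, n_items):
    """Per item: recommended-and-relevant occurrences over relevant occurrences."""
    hits, rel = np.zeros(n_items), np.zeros(n_items)
    for items, relevant in zip(rec_lists, relevant_sets):
        for i in relevant:
            rel[i] += 1
            if i in items:
                hits[i] += 1
    return np.divide(hits, rel, out=np.zeros(n_items), where=rel > 0)

# Toy example: 3 users, 5 items, top-2 recommendations each
recs = [np.array([0, 1]), np.array([0, 2]), np.array([0, 1])]
relevant = [{0, 3}, {2, 4}, {1, 3}]
p_rec = recommendation_probability(recs, 5)
tpr = item_true_positive_rate(recs, relevant, 5)
print(p_rec, p_rec.std())   # unequal exposure across items signals popularity bias
print(tpr, tpr.std())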

Journal ArticleDOI
TL;DR: A deep learning framework for a binary classification task that classifies chest X-ray images into normal and pneumonia based on the proposed CGNet, which achieved an accuracy of 0.9872, a sensitivity of 1, and a specificity of 0.9795 on a public pneumonia dataset.
Abstract: Pneumonia is a global disease that causes high child mortality. The situation has been worsened by the outbreak of the new coronavirus named COVID-19, which has killed more than 983,907 people so far. People infected by the virus show symptoms such as fever and coughing, as well as pneumonia as the infection progresses. There is public consensus that timely detection would benefit treatment and therefore help contain the spread of COVID-19. X-ray, an expedient imaging technique, has been widely used for the detection of pneumonia caused by COVID-19 and other viruses. To facilitate the diagnosis of pneumonia, we developed a deep learning framework for a binary classification task that classifies chest X-ray images into normal and pneumonia based on our proposed CGNet. CGNet has three components: feature extraction, graph-based feature reconstruction, and classification. We first use the transfer learning technique to train state-of-the-art convolutional neural networks (CNNs) for binary classification, and the trained CNNs are used to produce features for the following two components. Then, graph-based feature reconstruction is deployed to combine features through the graph and reconstruct them. Finally, a shallow neural network named GNet, a one-layer graph neural network, takes the combined features as input and classifies chest X-ray images into normal and pneumonia. Our model achieved the best accuracy of 0.9872, a sensitivity of 1, and a specificity of 0.9795 on a public pneumonia dataset that includes 5,856 chest X-ray images. To evaluate the performance of our proposed method on the detection of pneumonia caused by COVID-19, we also tested it on a public COVID-19 CT dataset, where we achieved the highest performance with an accuracy of 0.99, a specificity of 1, and a sensitivity of 0.98.
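
A minimal PyTorch/torchvision sketch of the transfer-learning feature-extraction step: penultimate-layer features are pulled from a pretrained CNN and passed to a shallow head. The backbone, feature size, and head are hypothetical stand-ins for CGNet's components, and the graph-based feature reconstruction is omitted.

import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone used purely as a feature extractor (transfer learning step)
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()           # expose 512-dim penultimate features
backbone.eval()

shallow_head = nn.Linear(512, 2)      # stand-in for the paper's shallow graph network (GNet)

xray_batch = torch.randn(4, 3, 224, 224)    # 4 chest X-rays resized to 224x224, 3 channels
with torch.no_grad():
    feats = backbone(xray_batch)             # (4, 512)
logits = shallow_head(feats)                  # normal vs. pneumonia scores (untrained here)
print(feats.shape, logits.shape)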

Journal ArticleDOI
TL;DR: The empirical results show that health caution and advice misinformation, help-seeking misinformation, and emotional support significantly increase the dissemination of misinformation, indicating both dark and bright sides of misinformation ambiguity and richness.
Abstract: The dissemination of misinformation in health emergencies poses serious threats to public health and increases health anxiety. To understand the underlying mechanism of the dissemination of misinformation regarding health emergencies, this study creatively draws on social support theory and text mining. It explores the roles of different types of misinformation, including health caution and advice misinformation and health help-seeking misinformation, and of emotional support, in affecting individuals' misinformation dissemination behavior on social media, and whether such relationships are contingent on misinformation ambiguity and richness. The theoretical model is tested using 12,101 textual posts about COVID-19 collected from Sina Weibo, a leading social media platform in China. The empirical results show that health caution and advice misinformation, health help-seeking misinformation, and emotional support significantly increase the dissemination of misinformation. Furthermore, when the level of ambiguity and richness regarding misinformation is high, the effect of health caution and advice misinformation is strengthened, whereas the effects of health help-seeking misinformation and emotional support are weakened, indicating both dark and bright sides of misinformation ambiguity and richness. This study contributes to the literature on misinformation dissemination behavior on social media during health emergencies and to social support theory, and provides implications for practice.

Journal ArticleDOI
TL;DR: An Ant Colony Optimization (ACO) algorithm in a Fog-enabled Blockchain-assisted scheduling model, namely PF-BTS is proposed, which allows the fog to process, manage, and perform the tasks to enhance latency measures and shows high privacy awareness and noticeable enhancement in execution time and network load.
Abstract: In recent years, the deployment of Cloud Computing (CC) has become more popular in both research and industry applications, arising from various fields including e-health, manufacturing, logistics and social networking. This is due to the easiness of service deployment and data management, and the unlimited provision of virtual resources (VR). In simple scenarios, users/applications send computational or storage tasks to be executed in the cloud, by manually assigning those tasks to the available computational resources. In complex scenarios, such as smart city applications, where there is a large number of tasks, VRs, or both, task scheduling is exposed as an NP-hard problem. Consequently, it is preferable and more efficient, in terms of time and effort, to use a task scheduling automation technique. As there are many automated scheduling solutions proposed, new possibilities arise with the advent of Fog Computing (FC) and Blockchain (BC) technologies. Accordingly, such automation techniques may help the quick, secure and efficient assignment of tasks to the available VRs. In this paper, we propose an Ant Colony Optimization (ACO) algorithm in a Fog-enabled Blockchain-assisted scheduling model, namely PF-BTS. The protocol and algorithms of PF-BTS exploit BC miners for generating efficient assignments of tasks to be performed in the cloud's VRs using ACO, and reward miner nodes for their contribution in generating the best schedule. In our proposal, PF-BTS further allows the fog to process, manage, and perform tasks to enhance latency measures. While this processing and managing takes place, the fog is required to respect the privacy of system components and assure that data, location, identity, and usage information are not exposed. We evaluate and compare the performance of PF-BTS with a recently proposed blockchain-based task scheduling protocol in a simulated environment. Our evaluation and experiments show the high privacy awareness of PF-BTS, along with a noticeable enhancement in execution time and network load.
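
A compact, self-contained sketch of ant colony optimization for assigning tasks to virtual resources while minimizing makespan; the pheromone update rule, parameters, and cost model are generic textbook choices, not the PF-BTS algorithms, and the blockchain/fog layers are omitted.

import random

def aco_schedule(exec_time, n_ants=20, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """exec_time[t][v] = runtime of task t on virtual resource v; returns (assignment, makespan)."""
    rng = random.Random(seed)
    n_tasks, n_vrs = len(exec_time), len(exec_time[0])
    tau = [[1.0] * n_vrs for _ in range(n_tasks)]            # pheromone trails
    best_assign, best_makespan = None, float("inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            loads, assign = [0.0] * n_vrs, []
            for t in range(n_tasks):
                # desirability = pheromone^alpha * (1 / expected finish time)^beta
                weights = [tau[t][v] ** alpha * (1.0 / (loads[v] + exec_time[t][v])) ** beta
                           for v in range(n_vrs)]
                v = rng.choices(range(n_vrs), weights=weights)[0]
                assign.append(v)
                loads[v] += exec_time[t][v]
            makespan = max(loads)
            if makespan < best_makespan:
                best_assign, best_makespan = assign, makespan
        # evaporate all trails, then deposit pheromone along the best schedule found so far
        tau = [[(1.0 - rho) * p for p in row] for row in tau]
        for t, v in enumerate(best_assign):
            tau[t][v] += 1.0 / best_makespan

    return best_assign, best_makespan

gen = random.Random(42)
exec_times = [[gen.uniform(1.0, 10.0) for _ in range(3)] for _ in range(8)]   # 8 tasks, 3 VRs
print(aco_schedule(exec_times))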

Journal ArticleDOI
TL;DR: In this article, the authors used a super-efficiency SBM model to construct the relative effective frontier, and then machine learning algorithms to construct a regression model and establish the absolute effective frontier.
Abstract: The traditional data envelopment analysis (DEA) method used for performance evaluation has inherent problems, such as being easily affected by statistical noise in the data. Furthermore, when new evaluation units are added, the performance of all the original units must be re-measured, which restricts evaluation efficiency. In this study, machine learning algorithms were applied to make up for the shortcomings of the data envelopment analysis method. First, a super-efficiency SBM model was used to construct the relative effective frontier, and then machine learning algorithms were used to construct a regression model and establish the absolute effective frontier. After 15 machine learning algorithms were compared, BPNN demonstrated the best performance, and a SuperSBM-DEA-BPNN model was eventually established. The new model has the following advantages: first, compared with the traditional data envelopment analysis method, the absolute effective frontier provides better evaluation; second, compared with the data envelopment analysis and neural network fusion outlined in the previous literature, the new model better overcomes the problems associated with data envelopment analysis, thereby improving the fusion efficiency. Taking the innovation efficiency evaluation of China's regional rural commercial banks as an example, the new model is shown to be more applicable and to offer more effective management tools for improving efficiency. On the whole, the new model not only provides a stable performance evaluation tool but also facilitates comparison, which is of practical significance for organizations.
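
A minimal scikit-learn sketch of the second stage described: fitting a back-propagation neural network (an MLP regressor) that maps input/output indicators to efficiency scores previously produced by a super-efficiency SBM-DEA model; the DEA scores here are random placeholders because the DEA step itself is not shown.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_units = 200
indicators = rng.random((n_units, 6))     # inputs/outputs of each decision-making unit
dea_scores = rng.random(n_units)          # placeholder for super-efficiency SBM scores

X_tr, X_te, y_tr, y_te = train_test_split(indicators, dea_scores, test_size=0.25, random_state=0)
bpnn = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
bpnn.fit(X_tr, y_tr)
print(r2_score(y_te, bpnn.predict(X_te)))

# New units can now be scored directly, without re-running DEA on the whole sample
new_unit = rng.random((1, 6))
print(bpnn.predict(new_unit))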

Journal ArticleDOI
TL;DR: A notion of fairness based on the performance gap of a RS between the users with different demographics is defined, and a variety of collaborative filtering algorithms are evaluated in terms of accuracy and beyond-accuracy metrics to explore the fairness in the RS results toward a specific gender group.
Abstract: Although recommender systems (RSs) play a crucial role in our society, previous studies have revealed that the performance of RSs may considerably differ between groups of individuals with different characteristics or from different demographics. In this case, a RS is considered to be unfair when it does not perform equally well for different groups of users. Considering the importance of RSs in the distribution and consumption of musical content worldwide, a careful evaluation of fairness in the context of music RSs is crucial. To this end, we first introduce LFM-2b, a novel large-scale real-world dataset of music listening records, comprising a subset to investigate bias of RSs regarding users' demographics. We then define a notion of fairness based on the performance gap of a RS between the users with different demographics, and evaluate a variety of collaborative filtering algorithms in terms of accuracy and beyond-accuracy metrics to explore the fairness in the RS results toward a specific gender group. We observe the existence of significant discrepancies (unfairness) between the performance of algorithms across male and female user groups. Based on these discrepancies, we explore to what extent recommender algorithms lead to intensifying the underlying population bias in the final results. We also study the effect of a resampling strategy, commonly used as debiasing method, which yields slight improvements in the fairness measures of various algorithms while maintaining their accuracy and beyond-accuracy performance.
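
A small NumPy sketch of the fairness notion described, the gap in a recommendation quality metric (NDCG@k here) between user groups; the group labels and relevance data are synthetic, and the choice of NDCG is illustrative rather than the paper's exact metric set.

import numpy as np

def ndcg_at_k(relevances, k=10):
    """relevances: binary relevance of the top-k recommended items, in rank order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    gains = rel / np.log2(np.arange(2, rel.size + 2))
    ideal = np.sort(rel)[::-1] / np.log2(np.arange(2, rel.size + 2))
    return gains.sum() / ideal.sum() if ideal.sum() > 0 else 0.0

def group_performance_gap(per_user_ndcg, groups):
    per_user_ndcg, groups = np.asarray(per_user_ndcg), np.asarray(groups)
    means = {g: per_user_ndcg[groups == g].mean() for g in np.unique(groups)}
    gap = max(means.values()) - min(means.values())
    return means, gap

# Synthetic example: 6 users, binary relevance of their top-5 lists
lists = [[1, 0, 1, 0, 0], [1, 1, 0, 0, 0], [0, 1, 0, 0, 1],
         [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [1, 0, 0, 0, 0]]
genders = ["m", "m", "m", "f", "f", "f"]
scores = [ndcg_at_k(l, k=5) for l in lists]
print(group_performance_gap(scores, genders))   # a large gap indicates unfairness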

Journal ArticleDOI
TL;DR: A comprehensive comparative study of the most effective approaches used for Arabic sentiment analysis, which re-implements most of the existing approaches and tests their effectiveness on three of the most popular benchmark datasets for Arabic SA.
Abstract: Sentiment analysis (SA) is a natural language processing (NLP) application that aims to analyse and identify sentiment within a piece of text. Arabic SA started to receive more attention in the last decade, with many approaches showing some effectiveness for detecting sentiment on multiple datasets. While there have been some surveys summarising some of the approaches for Arabic SA in the literature, most of these approaches are reported on different datasets, which makes it difficult to identify the most effective among them. In addition, those approaches do not cover the recent advances in NLP that use transformers. This paper presents a comprehensive comparative study of the most effective approaches used for Arabic sentiment analysis. We re-implement most of the existing approaches for Arabic SA and test their effectiveness on three of the most popular benchmark datasets for Arabic SA. Further, we examine the use of transformer-based language models for Arabic SA and show their superior performance compared to the existing approaches, where the best model achieves F-scores of 0.69, 0.76, and 0.92 on the SemEval, ASTD, and ArSAS benchmark datasets. We also provide an extensive analysis of the possible reasons for failures, which shows the limitations of the existing annotated Arabic SA datasets and the challenge of sarcasm, which is prominent in Arabic dialects. Finally, we highlight the main gaps in Arabic sentiment analysis research and suggest the most in-need future research directions in this area.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new optimized Machine Learning (ML) algorithm called the Local Search Improvised Bat Algorithm based Elman Neural Network (LSIBA-ENN) for the sentiment analysis of online product reviews.
Abstract: Recently, online shopping has become a mainstream way for users to purchase and consume with the rapid development of Internet technology. User satisfaction can be improved effectively by performing Sentiment Analysis (SA) on the large quantity of user reviews on e-commerce platforms. It is still challenging to predict the accurate sentiment polarities of user reviews because of variations in sequence length, textual order, and complicated logic. This paper proposes a new optimized Machine Learning (ML) algorithm called the Local Search Improvised Bat Algorithm based Elman Neural Network (LSIBA-ENN) for the SA of online product reviews. The proposed SA work encompasses four major steps: i) data collection, ii) preprocessing, iii) feature extraction or term weighting together with feature selection, and iv) polarity or sentiment classification. Initially, a web scraping tool is utilized to extract customer reviews of products from e-commerce websites. Next, preprocessing is carried out on the scraped data. The preprocessed data then undergo term weighting and feature selection by means of Log Term Frequency-based Modified Inverse Class Frequency (LTF-MICF) and the Hybrid Mutation based Earth Worm Algorithm (HMEWA). Lastly, the HMEWA output is fed to the LSIBA-ENN, which classifies the sentiment of customer reviews as positive, negative, or neutral. For the performance analysis of the proposed and prevailing classifiers, two benchmark datasets are used. The outcomes show that the LSIBA-ENN attains the best performance in sentiment classification when weighed against existing top-notch algorithms. The observations of the reviewer are exact. The existing ENN achieves a recall of 87.79 when utilizing the proposed LTF-MICF scheme, whereas it achieves recall of only 83.55, 84.03, 85.48, and 86.04 when utilizing the W2V, TF, TF-IDF, and TF-DFS schemes, respectively.

Journal ArticleDOI
TL;DR: In this article, a modular hybrid privacy-preserving framework leveraging off-chain and on-chain blockchain system design is applied to three different reference models that illustrate how blockchain can enhance healthcare information management.
Abstract: In the context of blockchain technology, "off-chain" refers to computation or data that is structurally external to the blockchain network. Off-Chain Blockchain Systems (OCBS) enable this information processing and management through distributed software architecture where the blockchain network interacts with off-chain resources. Hence, OCBS are a critical data governance component in the design of enterprise blockchain solutions, resulting in extensive research and development exploring the interplay between on-chain and off-chain storage and computation and efforts to evaluate their performance relative to other information management systems. Key features of OCBS are their ability to improve scalability, reduce data storage requirements, and enhance data privacy, all extremely critical issues for enabling broader blockchain adoption. These OCBS features map well to the needs of the healthcare industry, particularly due to the need to manage various types of medical, consumer, and other health-related data. However, different types of health data are also subject to stringent regulatory, security and legal requirements, a key factor limiting blockchain adoption in the sector. In response, there is a critical need to better align OCBS design features to different types of healthcare data management and their respective governance and privacy regimes. This article first reviews the characteristics of different constructs of OCBS. It then proposes a modular hybrid privacy-preserving framework leveraging off-chain and on-chain blockchain system design, applied to three different reference models that illustrate how blockchain can enhance healthcare information management. Through this privacy-preserving framework we hope to liberate healthcare data by enabling sharing, sovereignty and enhanced trust.

Journal ArticleDOI
TL;DR: Propagation2Vec is proposed, a novel fake news early detection technique which assigns varying levels of importance to the nodes and cascades in propagation networks and reconstructs the knowledge of complete propagation networks based on their partial propagation networks at an early detection deadline.
Abstract: Many recent studies have demonstrated that the propagation patterns of news on social media can facilitate the detection of fake news. Most of these studies rely on complete propagation networks to build their models, which are not fully available in the early stages and may take a long time to complete. Hence, relying on the complete propagation network is not ideal for fake news early detection. However, detecting fake news as early as possible is important due to its fast-spreading nature and the significant harm it can cause. In addition, most existing propagation network-based fake news detection techniques are not explicitly designed to jointly emphasise informative cascades and nodes in the propagation networks to detect fake news. To bridge these research gaps, this work proposes Propagation2Vec, a novel fake news early detection technique which assigns varying levels of importance to the nodes and cascades in propagation networks and reconstructs the knowledge of complete propagation networks based on their partial propagation networks at an early detection deadline. Our experiments show that our model can achieve state-of-the-art performance while only having access to the early-stage propagation networks. Furthermore, we devise general explanations for the underlying logic of Propagation2Vec based on the attention weights it assigns to different nodes and cascades, which improves the applicability of our approach and facilitates future research on propagation network-based fake news detection.

Journal ArticleDOI
TL;DR: The LGBMRegressor has the best goodness-of-fit among the compared regression models, and decision making suggestions are proposed for rumor refutation platforms on how to organize rumor refutation microblogs under different situations such as rumor category, author's influence and heat of topics.
Abstract: Motivated by the practical need to enhance social media rumor refutation effectiveness, this paper is dedicated to developing a proper rumor refutation effectiveness index (REI), identifying key factors influencing REI, and proposing decision making suggestions for rumor refutation platforms. 298,118 comments and 185,209 records of reposters' verification status from 248 rumor refutation microblogs on Sina Weibo (the Chinese equivalent of Twitter) were collected during a 1-year period using a web crawler. To extract the text characteristics and analyze the sentiment of the rumor refutation microblogs, Natural Language Processing (NLP) approaches are applied. To explore the relationship between REI and the content and contextual factors of the rumor refutation microblogs, four regression models based on the collected data are established, namely a linear regression model, a Support Vector Regression model (SVR), an Extreme Gradient Boosting regression model (XGBoostRegressor) and a Light Gradient Boosting Machine regression model (LGBMRegressor). The LGBMRegressor has the best goodness-of-fit among the compared regression models. Then, SHapley Additive exPlanations (SHAP) is employed to visualize and explain the LGBMRegressor results. Decision making suggestions are proposed for rumor refutation platforms on how to organize rumor refutation microblogs under different situations such as rumor category, author's influence and heat of topics.
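
A minimal sketch of the modeling step described, regressing a refutation-effectiveness score on content and contextual features with LightGBM and explaining it with SHAP's tree explainer; the features and data here are synthetic placeholders, not the Weibo dataset or the paper's feature set.

import numpy as np
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
n_posts = 400
feature_names = ["text_length", "sentiment", "author_followers", "topic_heat"]
X = rng.random((n_posts, len(feature_names)))
rei = 0.6 * X[:, 2] + 0.3 * X[:, 3] + 0.1 * rng.random(n_posts)   # synthetic effectiveness index

model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05, random_state=0)
model.fit(X, rei, feature_name=feature_names)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(dict(zip(feature_names, np.abs(shap_values).mean(axis=0))))  # global feature importance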

Journal ArticleDOI
TL;DR: In this article, a systematic literature search following the PRISMA guideline covering 12 scholarly databases was conducted to retrieve various types of peer-reviewed articles that reported causes, impacts, or countermeasures of the COVID-19 infodemic.
Abstract: An unprecedented infodemic has caused massive damage to human society. However, it has not been thoroughly investigated. This systematic review aims to (1) synthesize the existing literature on the causes and impacts of the COVID-19 infodemic; (2) summarize the proposed strategies for fighting the COVID-19 infodemic; and (3) identify directions for future research. A systematic literature search following the PRISMA guideline, covering 12 scholarly databases, was conducted to retrieve various types of peer-reviewed articles that reported causes, impacts, or countermeasures of the infodemic. Empirical studies were assessed for risk of bias using the Mixed Methods Appraisal Tool. A coding theme was iteratively developed to categorize the causes, impacts, and countermeasures found in the included studies. Social media usage, low levels of health/eHealth literacy, and fast publication processes and preprint services are identified as major causes of the infodemic. Besides, a vicious circle of human rumor-spreading behavior and psychological issues among the public (e.g., anxiety, distress, fear) emerges as a characteristic of the infodemic. Comprehensive lists of countermeasures are summarized from different perspectives, among which risk communication and consumer health information needs/seeking are of particular importance. Theoretical and practical implications are discussed and future research directions are suggested.

Journal ArticleDOI
TL;DR: Convolutional Neural Networks (CNN) with margin loss and different embedding models are proposed for detecting fake news, and the proposed architectures are evaluated on two recent well-known datasets in the field, namely ISOT and LIAR.
Abstract: The advent of online news platforms such as social media, news blogs, and online newspapers in recent years, and features such as swift information flow, easy access, and low cost, encourage people to seek and acquire information by consuming the news these platforms provide. Furthermore, these platforms increase the opportunities for deceptive parties to influence public opinion and awareness by producing fake news, i.e., news that consists of false and deceptive information and is published to achieve specific political and economic gains. Since it is very difficult for individuals to discern fake news from its content alone, an automatic fake news detection approach is essential for preventing the spread of such false information. In this paper, Convolutional Neural Networks (CNN) with margin loss and different embedding models are proposed for detecting fake news. We compare static word embeddings with non-static embeddings, which offer the possibility of incrementally up-training and updating the word embeddings during the training phase. Our proposed architectures are evaluated on two recent well-known datasets in the field, namely ISOT and LIAR. Our results on the best architecture show encouraging performance, outperforming the state-of-the-art methods by 7.9% on ISOT and 2.1% on the test set of the LIAR dataset.
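
A minimal PyTorch sketch of a text CNN trained with a multi-class margin (hinge) loss rather than cross-entropy; the architecture and hyperparameters are hypothetical, and the embedding here is trained from scratch, standing in for the static or non-static embeddings compared in the paper.

import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=100, n_filters=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in (3, 4, 5)])
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)           # (batch, emb_dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

model = TextCNN()
margin_loss = nn.MultiMarginLoss(margin=1.0)              # hinge-style margin loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(1, 20000, (8, 60))                 # 8 articles, 60 tokens each
labels = torch.randint(0, 2, (8,))                         # 1 = fake, 0 = real
loss = margin_loss(model(tokens), labels)
loss.backward()
optimizer.step()
print(float(loss))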

Journal ArticleDOI
TL;DR: A model that learns joint convolutional representations from a nearest neighbor and a furthest neighbor graph is developed to establish a novel accuracy-diversity trade-off for recommender systems, showing diversity gains of up to seven times by trading as little as 1% in accuracy.
Abstract: Graph convolutions, in both their linear and neural network forms, have reached state-of-the-art accuracy on recommender system (RecSys) benchmarks. However, recommendation accuracy is tied with diversity in a delicate trade-off and the potential of graph convolutions to improve the latter is unexplored. Here, we develop a model that learns joint convolutional representations from a nearest neighbor and a furthest neighbor graph to establish a novel accuracy-diversity trade-off for recommender systems. The nearest neighbor graph connects entities (users or items) based on their similarities and is responsible for improving accuracy, while the furthest neighbor graph connects entities based on their dissimilarities and is responsible for diversifying recommendations. The information between the two convolutional modules is balanced already in the training phase through a regularizer inspired by multi-kernel learning. We evaluate the joint convolutional model on three benchmark datasets with different degrees of sparsity. The proposed method can either trade accuracy to improve substantially the catalog coverage or the diversity within the list; or improve both by a lesser amount. Compared with accuracy-oriented graph convolutional approaches, the proposed model shows diversity gains up to seven times by trading as little as 1% in accuracy. Compared with alternative accuracy-diversity trade-off solutions, the joint graph convolutional model retains the highest accuracy while offering a handle to increase diversity. To our knowledge, this is the first work proposing an accuracy-diversity trade-off with graph convolutions and opens the doors to learning over graphs approaches for improving such trade-off.
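
A small NumPy sketch of the core idea: propagating user-item scores over a nearest-neighbor item graph (accuracy-oriented) and a furthest-neighbor item graph (diversity-oriented) and blending the two. The graph construction, normalization, and blending weight are simplified placeholders, not the paper's trained model.

import numpy as np

def knn_graphs(item_sim, k=2):
    """Return row-normalized nearest- and furthest-neighbor adjacency matrices."""
    n = item_sim.shape[0]
    near, far = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        order = np.argsort(item_sim[i])
        far_idx = [j for j in order if j != i][:k]          # least similar items
        near_idx = [j for j in order[::-1] if j != i][:k]   # most similar items
        near[i, near_idx] = 1.0
        far[i, far_idx] = 1.0
    return near / near.sum(1, keepdims=True), far / far.sum(1, keepdims=True)

rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(5, 8)).astype(float)     # 5 users x 8 items
norms = np.linalg.norm(ratings, axis=0, keepdims=True) + 1e-9
item_sim = (ratings.T @ ratings) / (norms.T @ norms)         # simple cosine item similarity
A_near, A_far = knn_graphs(item_sim, k=2)

lam = 0.7                                                     # accuracy vs. diversity balance
scores = lam * ratings @ A_near.T + (1 - lam) * ratings @ A_far.T
print(np.argsort(-scores, axis=1)[:, :3])                     # top-3 blended recommendations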

Journal ArticleDOI
TL;DR: A novel ensemble embedding method is developed to generate semantic and contextual representations of the words in review sentences; the resulting representations are then used in a long short-term memory (LSTM) model to identify sentences that contain innovation ideas in online reviews.
Abstract: The importance of online customer reviews to product innovation has been well recognized in prior literature. Mining online reviews has received extensive attention and effort. Most existing research on mining online reviews focuses on issues such as the impact of reviews on sales, the helpfulness of reviews, and customers' participation in reviews. Few research studies, however, seek to identify and extract innovation ideas for products from online reviews. This type of information is particularly important for product functionality improvement and new feature development from a manufacturer's perspective. Mining product innovation ideas allows a manufacturer to proactively review customer opinion and unlock insights about new functionality and features that the market expects, in order to gain a competitive advantage. In this paper, we propose a deep learning-based approach to identify sentences that contain innovation ideas in online reviews. Specifically, we develop a novel ensemble embedding method to generate semantic and contextual representations of the words in review sentences. The resultant representations in each sentence are then used in a long short-term memory (LSTM) model for innovation-sentence identification. Moreover, we adopt a focal loss function in our model to address the class imbalance problem. We validate our approach with a dataset of 10,000 customer reviews from Amazon. Our model achieves an AUC score of 0.91 and an F1 score of 0.89, outperforming a set of state-of-the-art baseline models in the comparison. Our approach can be extended and applied to many other information extraction tasks.