Showing papers in "Information Processing and Management in 2020"


Journal ArticleDOI
TL;DR: A comprehensive overview of the findings to date relating to fake news is presented, characterizing the negative impact of online fake news and the state of the art in detection methods.
Abstract: Over the recent years, the growth of online social media has greatly facilitated the way people communicate with each other. Users of online social media share information, connect with other people and stay informed about trending events. However, much recent information appearing on social media is dubious and, in some cases, intended to mislead. Such content is often called fake news. Large amounts of online fake news have the potential to cause serious problems in society. Many point to the 2016 U.S. presidential election campaign as having been influenced by fake news. Subsequent to this election, the term has entered the mainstream vernacular. Moreover, it has drawn the attention of industry and academia, seeking to understand its origins, distribution and effects. Of critical interest is the ability to detect when online content is untrue and intended to mislead. This is technically challenging for several reasons. Using social media tools, content is easily generated and quickly spread, leading to a large volume of content to analyse. Online information is very diverse, covering a large number of subjects, which contributes complexity to this task. The truth and intent of any statement often cannot be assessed by computers alone, so efforts must depend on collaboration between humans and technology. For instance, some content that has been deemed by experts to be false and intended to mislead is available. While these sources are in limited supply, they can form a basis for such a shared effort. In this survey, we present a comprehensive overview of the findings to date relating to fake news. We characterize the negative impact of online fake news and the state of the art in detection methods. Many of these rely on identifying features of the users, content, and context that indicate misinformation. We also study existing datasets that have been used for classifying fake news. Finally, we propose promising research directions for online fake news analysis.

449 citations


Journal ArticleDOI
TL;DR: A deep look is taken into neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies, and to obtain a comprehensive empirical understanding of existing techniques.
Abstract: Ranking models lie at the heart of research on information retrieval (IR). During the past decades, different techniques have been proposed for constructing ranking models, from traditional heuristic and probabilistic methods to modern machine learning methods. Recently, with the advance of deep learning technology, we have witnessed a growing body of work in applying shallow or deep neural networks to the ranking problem in IR, referred to as neural ranking models in this paper. The power of neural ranking models lies in the ability to learn from the raw text inputs for the ranking problem to avoid many limitations of hand-crafted features. Neural networks have sufficient capacity to model complicated tasks, which is needed to handle the complexity of relevance estimation in ranking. Since a large variety of neural ranking models have been proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey, we will take a deep look into the neural ranking models from different dimensions to analyze their underlying assumptions, major design principles, and learning strategies. We compare these models through benchmark tasks to obtain a comprehensive empirical understanding of the existing techniques. We will also discuss what is missing in the current literature and what the promising and desired future directions are.

239 citations
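To make the notion of a neural ranking model concrete, here is a minimal, illustrative PyTorch sketch (not from the survey; all names and sizes are hypothetical) of a representation-based ranker: query and document are embedded and mean-pooled, relevance is scored by cosine similarity, and training uses the classic pairwise hinge loss over relevant/irrelevant document pairs.

```python
# Minimal representation-based neural ranker (illustrative sketch, not the survey's code).
# Embeds query and document token IDs, mean-pools them, and scores relevance by cosine.
import torch
import torch.nn as nn


class SiameseRanker(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings, ignoring padding (id 0).
        emb = self.embed(token_ids)                    # (batch, seq, dim)
        mask = (token_ids != 0).float().unsqueeze(-1)  # (batch, seq, 1)
        return (emb * mask).sum(1) / mask.sum(1).clamp(min=1.0)

    def forward(self, query_ids, doc_ids):
        q, d = self.encode(query_ids), self.encode(doc_ids)
        return nn.functional.cosine_similarity(q, d)   # relevance score per pair


# Pairwise hinge loss, a classic learning-to-rank objective:
# push the relevant document above the irrelevant one by a margin.
model = SiameseRanker(vocab_size=10_000)
query = torch.randint(1, 10_000, (4, 8))
pos_doc = torch.randint(1, 10_000, (4, 50))
neg_doc = torch.randint(1, 10_000, (4, 50))
loss = torch.clamp(1.0 - model(query, pos_doc) + model(query, neg_doc), min=0).mean()
loss.backward()
```

Interaction-based rankers covered by such surveys instead model term-level query-document interactions, but the pairwise training objective is typically the same.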


Journal ArticleDOI
TL;DR: This work introduces new rich and unbiased datasets for both the single-label (SANAD) as well as the multi-label (NADiA) Arabic text categorization tasks and presents an extensive comparison of several deep learning models for Arabic text classification.
Abstract: Text classification or categorization is the process of automatically tagging a textual document with the most relevant labels or categories. When the number of labels is restricted to one, the task becomes single-label text categorization. However, the multi-label version is challenging. For the Arabic language, both tasks (especially the latter one) become more challenging in the absence of large, free, and rich Arabic datasets. Therefore, we introduce new rich and unbiased datasets for both the single-label (SANAD) and the multi-label (NADiA) Arabic text categorization tasks. Both corpora are made freely available to the research community on Arabic computational linguistics. Further, we present an extensive comparison of several deep learning (DL) models for Arabic text categorization in order to evaluate the effectiveness of such models on SANAD and NADiA. A unique characteristic of our proposed work, compared to existing ones, is that it does not require a pre-processing phase and is fully based on deep learning models. Besides, we studied the impact of utilizing word2vec embedding models to improve the performance of the classification tasks. Our experimental results showed solid performance of all models on the SANAD corpus, with a minimum accuracy of 91.18%, achieved by convolutional-GRU, and top performance of 96.94%, achieved by attention-GRU. As for NADiA, attention-GRU achieved the highest overall accuracy of 88.68% for a maximum subset of 10 categories on “Masrawy” dataset.

152 citations
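As an illustration of the attention-GRU architecture family that the abstract reports as top-performing, the following is a minimal PyTorch sketch (not the authors' implementation; vocabulary size, dimensions, and class count are placeholders): a bidirectional GRU encodes the token sequence and an additive attention layer pools the hidden states before classification.

```python
# Illustrative attention-GRU text classifier, sketching the kind of model the
# paper benchmarks (not the authors' implementation).
import torch
import torch.nn as nn


class AttentionGRU(nn.Module):
    def __init__(self, vocab_size, num_classes, dim=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):
        states, _ = self.gru(self.embed(token_ids))        # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)  # (batch, seq, 1)
        context = (weights * states).sum(dim=1)            # attention-pooled summary
        return self.out(context)                           # class logits


model = AttentionGRU(vocab_size=50_000, num_classes=7)
logits = model(torch.randint(1, 50_000, (2, 40)))
print(logits.shape)  # torch.Size([2, 7])
```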


Journal ArticleDOI
TL;DR: This study evaluates several techniques for document clustering and topic modelling on three datasets from Twitter and Reddit, and shows that clustering techniques applied to neural embedding feature representations delivered the best performance across all datasets using appropriate extrinsic evaluation measures.
Abstract: Methods for document clustering and topic modelling in online social networks (OSNs) offer a means of categorising, annotating and making sense of large volumes of user generated content. Many techniques have been developed over the years, ranging from text mining and clustering methods to latent topic models and neural embedding approaches. However, many of these methods deliver poor results when applied to OSN data as such text is notoriously short and noisy, and often results are not comparable across studies. In this study we evaluate several techniques for document clustering and topic modelling on three datasets from Twitter and Reddit. We benchmark four different feature representations derived from term-frequency inverse-document-frequency (tf-idf) matrices and word embedding models combined with four clustering methods, and we include a Latent Dirichlet Allocation topic model for comparison. Several different evaluation measures are used in the literature, so we provide a discussion and recommendation for the most appropriate extrinsic measures for this task. We also demonstrate the performance of the methods over data sets with different document lengths. Our results show that clustering techniques applied to neural embedding feature representations delivered the best performance over all data sets using appropriate extrinsic evaluation measures. We also demonstrate a method for interpreting the clusters with a top-words based approach using tf-idf weights combined with embedding distance measures.

149 citations
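The evaluation pattern described above, clustering dense document representations and scoring the result against ground-truth labels with an extrinsic measure, can be sketched in a few lines of scikit-learn. The random vectors below stand in for real neural embeddings (e.g., averaged word vectors), so the setup is illustrative only.

```python
# Sketch: cluster dense document vectors with k-means and score against
# ground-truth labels using an extrinsic measure (NMI). The vectors and
# labels here are random stand-ins for real embeddings and gold categories.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(300, 100))   # 300 docs, 100-dim embeddings
true_labels = rng.integers(0, 5, size=300)  # hypothetical gold categories

pred = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(doc_vectors)
print("NMI:", normalized_mutual_info_score(true_labels, pred))
```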


Journal ArticleDOI
TL;DR: This paper builds up a stock prediction system and proposes an approach that represents numerical price data by technical indicators via technical analysis, and represents textual news articles by sentiment vectors via sentiment analysis, which outperforms the baselines in both validation and test sets using two different evaluation metrics.
Abstract: Stock prediction via market data analysis is an attractive research topic. Both stock prices and news articles have been employed in the prediction processes. However, how to combine technical indicators from stock prices and news sentiments from textual news articles, and enable the prediction model to learn sequential information within time series in an intelligent way, is still an unsolved problem. In this paper, we build up a stock prediction system and propose an approach that 1) represents numerical price data by technical indicators via technical analysis, and represents textual news articles by sentiment vectors via sentiment analysis, 2) sets up a layered deep learning model to learn the sequential information within market snapshot series, which is constructed by the technical indicators and news sentiments, and 3) sets up a fully connected neural network to make stock predictions. Experiments have been conducted on more than five years of Hong Kong Stock Exchange data using four different sentiment dictionaries, and results show that 1) the proposed approach outperforms the baselines in both validation and test sets using two different evaluation metrics, 2) models incorporating prices and news sentiments outperform models that only use either technical indicators or news sentiments, at both the individual stock level and the sector level, and 3) among the four sentiment dictionaries, the finance domain-specific sentiment dictionary (Loughran–McDonald Financial Dictionary) models the news sentiments better, bringing greater prediction performance improvements than the other three dictionaries.

146 citations
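As a hedged illustration of what "representing price data by technical indicators" can look like in practice, the pandas sketch below computes two common indicators, a simple moving average and the Relative Strength Index (RSI). The paper's actual indicator set and window lengths are not specified here, so these choices are assumptions.

```python
# Two common technical indicators of the kind such a pipeline could feed into
# the model: a simple moving average and the Relative Strength Index (RSI).
# Window lengths are illustrative, not the paper's settings.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(0, 1, 250).cumsum())  # synthetic closing prices

sma_20 = close.rolling(window=20).mean()                 # 20-day moving average

delta = close.diff()
gain = delta.clip(lower=0).rolling(window=14).mean()
loss = (-delta.clip(upper=0)).rolling(window=14).mean()
rsi_14 = 100 - 100 / (1 + gain / loss)                   # 14-day RSI

features = pd.DataFrame({"close": close, "sma_20": sma_20, "rsi_14": rsi_14})
print(features.tail())
```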


Journal ArticleDOI
TL;DR: A new approach is proposed that jointly learns word embeddings and trains a recurrent neural network with two different objectives to automatically identify rumors, outperforming state-of-the-art methods in terms of precision, recall, and F1.
Abstract: Users of social media websites tend to rapidly spread breaking news and trending stories without considering their truthfulness. This facilitates the spread of rumors through social networks. A rumor is a story or statement whose truthfulness has not been verified. Efficiently detecting and acting upon rumors throughout social networks is of high importance to minimizing their harmful effects. However, detecting them is not a trivial task, as rumors often belong to unseen topics or events that are not covered in the training dataset. In this paper, we study the problem of detecting breaking news rumors, instead of long-lasting rumors, that spread in social media. We propose a new approach that jointly learns word embeddings and trains a recurrent neural network with two different objectives to automatically identify rumors. The proposed strategy is simple but effective in mitigating the topic shift issues. Emerging rumors do not have to be false at the time of detection; they can be deemed later to be true or false. However, most previous studies on rumor detection focus on long-standing rumors and assume that rumors are always false. In contrast, our experiment simulates a cross-topic emerging rumor detection scenario with a real-life rumor dataset. Experimental results suggest that our proposed model outperforms state-of-the-art methods in terms of precision, recall, and F1.

136 citations


Journal ArticleDOI
TL;DR: Results of structural equation modeling applied to 300 potential customers indicate that privacy and financial risks negatively affect the intention to use the sharing economy; however, physical and performance risks are positively related to behavioral intention or desire.
Abstract: Smart tourism and the sharing economy within it are transforming human lives and are considered a huge innovation in the industry. This change inevitably creates huge resistance, which has not received much attention. Thus, this study focuses on the sharing economy's risk aspects, which have become a social issue. It investigates how risks affect the development and diffusion of the sharing economy, especially in Airbnb. This study adopts an extended model of goal-directed behavior and depicts the decision-making process of potential Airbnb users to analyze the effect of risk. Results of structural equation modeling applied to 300 potential customers indicate that privacy and financial risks negatively affect the intention to use the sharing economy. However, physical and performance risks are positively related to behavioral intention or desire. This risk paradox can be explained by the disruptive innovation of the sharing economy and the characteristics of risk engagement in tourism. Implications for research and practice are discussed along with the findings of the study.

127 citations


Journal ArticleDOI
TL;DR: This study employed the stressor–strain–outcome (SSO) framework to explain social media discontinuance behaviors from an overload perspective and indicated that the three types of overload are interconnected through system feature overload.
Abstract: While users’ discontinuance of use has posed a challenge for social media in recent years, there is a paucity of knowledge on the relationships between different dimensions of overload and how overload adversely affects users’ social media discontinuance behaviors. To address this knowledge gap, this study employed the stressor–strain–outcome (SSO) framework to explain social media discontinuance behaviors from an overload perspective. It also conceptualized social media overload as a multidimensional construct consisting of system feature overload, information overload, and social overload. The proposed research model was empirically validated via 412 valid questionnaire responses collected from Facebook users. Our results indicated that the three types of overload are interconnected through system feature overload. System feature overload, information overload, and social overload engender user exhaustion, which in turn leads to users’ discontinued usage of social media. This study extends current technostress research by demonstrating the value of the SSO perspective in explaining users’ social media discontinuance.

126 citations


Journal ArticleDOI
TL;DR: This paper proposes to build a heterogeneous graph to explicitly model the interactions among users, news and latent topics and shows that the proposed model significantly outperforms state-of-the-art methods on news recommendation.
Abstract: With the information explosion of news articles, personalized news recommendation has become important for users to quickly find news that they are interested in. Existing methods on news recommendation mainly include collaborative filtering methods, which rely on direct user-item interactions, and content-based methods, which characterize the content of user reading history. Although these methods have achieved good performance, they still suffer from the data sparsity problem, since most of them fail to extensively exploit high-order structure information (similar users tend to read similar news articles) in news recommendation systems. In this paper, we propose to build a heterogeneous graph to explicitly model the interactions among users, news and latent topics. The incorporated topic information would help indicate a user’s interest and alleviate the sparsity of user-item interactions. Then we take advantage of graph neural networks to learn user and news representations that encode high-order structure information by propagating embeddings over the graph. The learned user embeddings with complete historic user clicks capture the users’ long-term interests. We also consider a user’s short-term interest using the recent reading history with an attention based LSTM model. Experimental results on real-world datasets show that our proposed model significantly outperforms state-of-the-art methods on news recommendation.

124 citations
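The core embedding-propagation idea, where each node enriches its representation by aggregating its neighbors over the user-news graph, can be sketched with plain NumPy. This toy version does one round of mean aggregation on a bipartite click matrix and omits the paper's topic nodes and attention-based short-term interest module.

```python
# Minimal sketch of embedding propagation over a user-item bipartite graph:
# one round of mean aggregation, so each user embedding mixes in the items it
# clicked (and vice versa). Purely illustrative.
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, dim = 4, 6, 8
adj = rng.integers(0, 2, size=(n_users, n_items)).astype(float)  # click matrix

user_emb = rng.normal(size=(n_users, dim))
item_emb = rng.normal(size=(n_items, dim))

# Degrees for mean aggregation (clipped to avoid division by zero).
deg_u = adj.sum(axis=1, keepdims=True).clip(min=1)
deg_i = adj.sum(axis=0, keepdims=True).T.clip(min=1)

user_next = (adj @ item_emb) / deg_u    # users aggregate clicked items
item_next = (adj.T @ user_emb) / deg_i  # items aggregate their readers

# Recommendation scores after propagation: inner product of representations.
scores = user_next @ item_next.T
print(scores.shape)  # (4, 6)
```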


Journal ArticleDOI
TL;DR: The blockchain technique is utilized to develop a novel public auditing scheme for verifying data integrity in cloud storage that, unlike existing works involving three participatory entities, requires only two; security analysis shows that the proposed scheme can defend against malicious entities and the 51% attack.
Abstract: Cloud storage enables applications to efficiently manage their remote data but faces the risk of that data being tampered with. This paper utilizes the blockchain technique to develop a novel public auditing scheme for verifying data integrity in cloud storage. In the proposed scheme, unlike existing works that involve three participatory entities, only two predefined entities (i.e. the data owner and the cloud service provider), who may not trust each other, are involved, and the third-party auditor for data auditing is removed. Specifically, data owners store lightweight verification tags on the blockchain and generate a proof by constructing a Merkle Hash Tree from the hashtags to reduce the overhead of computation and communication for integrity verification. Besides, this work is able to achieve 100% auditing confidence theoretically, as the hashtag of each data block is utilized to build the Merkle Hash Tree for data integrity verification. Security analysis shows that the proposed scheme can defend against malicious entities and the 51% attack. Experimental results demonstrate significant improvements in computation and communication.

123 citations
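The Merkle Hash Tree construction at the heart of the scheme is simple to sketch: hash each data block to get its hashtag, then repeatedly hash sibling pairs until a single root remains, so that tampering with any block changes the root. The Python below is an illustrative toy, not the paper's implementation (which additionally stores verification tags on-chain).

```python
# Minimal Merkle Hash Tree sketch of the kind the auditing scheme builds over
# data-block hashtags. Illustrative only.
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    level = [sha256(b) for b in blocks]  # leaf hashtags
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate last node if the level is odd
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(blocks)
print(root.hex())
# Tampering with any block changes the root, which is what the verifier checks.
assert merkle_root([b"block-0", b"block-X", b"block-2", b"block-3"]) != root
```

Because only sibling hashes along one path are needed to recompute the root, proofs stay logarithmic in the number of blocks, which is where the reported computation and communication savings come from.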


Journal ArticleDOI
TL;DR: This work investigates different types of sources on tweets related to eyewitnesses and classifies them into three types, observing that words related to perceptual senses tend to be present in direct eyewitness messages, whereas emotions, thoughts, and prayers are more common in indirect witnesses.
Abstract: Social media platforms such as Twitter provide convenient ways to share and consume important information during disasters and emergencies. Information from bystanders and eyewitnesses can be useful for law enforcement agencies and humanitarian organizations to get firsthand and credible information about an ongoing situation to gain situational awareness, among other potential uses. However, the identification of eyewitness reports on Twitter is a challenging task. This work investigates different types of sources on tweets related to eyewitnesses and classifies them into three types: (i) direct eyewitnesses, (ii) indirect eyewitnesses, and (iii) vulnerable eyewitnesses. Moreover, we investigate various characteristics associated with each kind of eyewitness type. We observe that words related to perceptual senses (feeling, seeing, hearing) tend to be present in direct eyewitness messages, whereas emotions, thoughts, and prayers are more common in indirect witness messages. We use these characteristics and labeled data to train several machine learning classifiers. Our results on several real-world Twitter datasets reveal that textual features (bag-of-words), when combined with domain-expert features, achieve better classification performance. Our approach contributes a successful example of combining crowdsourced and machine learning analysis, and increases our understanding of and capability for identifying valuable eyewitness reports during disasters.
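A hedged sketch of the winning feature combination, bag-of-words stacked with a few hand-crafted domain-style features, is shown below with scikit-learn; the tweets, labels, and the perceptual-sense word list are toy placeholders, not the paper's data or feature set.

```python
# Sketch of the feature-combination idea: bag-of-words counts stacked with
# simple hand-crafted features (perceptual-sense word counts as a stand-in),
# fed to a linear classifier. Tweets and labels are toy placeholders.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = [
    "I can hear the sirens and see smoke from my window",
    "praying for everyone affected by the earthquake",
    "my cousin says the bridge near them just collapsed",
    "felt the whole building shake a minute ago",
]
labels = [0, 1, 2, 0]  # 0=direct, 1=indirect, 2=vulnerable (toy labels)

SENSE_WORDS = {"see", "hear", "feel", "felt", "smell", "saw", "heard"}

def expert_features(text: str) -> list[float]:
    tokens = text.lower().split()
    return [sum(t in SENSE_WORDS for t in tokens),               # sense words
            float("praying" in tokens or "prayers" in tokens)]   # prayer cue

bow = CountVectorizer().fit_transform(tweets)
extra = csr_matrix(np.array([expert_features(t) for t in tweets]))
X = hstack([bow, extra])  # textual + domain-expert features side by side

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```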

Journal ArticleDOI
TL;DR: The blockchain is utilized to construct a novel privacy-preserving remote data integrity checking scheme for Internet of Things (IoT) information management systems without involving trusted third parties.
Abstract: Remote data integrity checking is of great importance to the security of cloud-based information systems. Previous works generally assume a trusted third party to oversee the integrity of the outsourced data, which may be invalid in practice. In this paper, we utilize the blockchain to construct a novel privacy-preserving remote data integrity checking scheme for Internet of Things (IoT) information management systems without involving trusted third parties. Our scheme leverages the Lifted EC-ElGamal cryptosystem, bilinear pairing, and blockchain to support efficient public batch signature verifications and protect the security and data privacy of the IoT systems. The results of the experiment demonstrate the efficiency of our scheme.

Journal ArticleDOI
TL;DR: This article proposes a preventative approach using a novel blockchain-based solution suited for IoFMT incorporated with a gamification component, and uses concepts of a customized Proof-of-Authority consensus algorithm, along with a weighted-ranking algorithm, serving as an incentive mechanism in the gamifying component to determine the integrity of fake news.
Abstract: The concept of Fake Media or Internet of Fake Media Things (IoFMT) has emerged in different domains of digital society such as politics, news, and social media. Due to the integrity of the media being compromised quite frequently, revolutionary changes must be made to avoid further and more widespread IoFMT. With today’s advancements in Artificial Intelligence (AI) and Deep Learning (DL), such compromises may be profoundly limited. Providing proof of authenticity to outline the authorship and integrity of digital content has been a pressing need. Blockchain, a promising new decentralized secure platform, has been advocated to help combat the authenticity aspect of fake media in a context where resistance to the modification of data is important. Although some methods around blockchain have been proposed to take on authentication problems, most current studies are built on unrealistic assumptions with after-the-incident types of mechanisms. In this article, we propose a preventative approach using a novel blockchain-based solution suited for IoFMT, incorporating a gamification component. More specifically, the proposed approach uses concepts of a customized Proof-of-Authority consensus algorithm, along with a weighted-ranking algorithm serving as an incentive mechanism in the gamification component, to determine the integrity of fake news. Although our approach focuses on fake news, the framework could very well be extended to other types of digital content as well. A proof-of-concept implementation is developed to outline the advantages of the proposed solution.

Journal ArticleDOI
TL;DR: A multimodal attention-based BLSTM network framework for efficient emotion recognition is proposed, utilizing attention-based bidirectional long short-term memory recurrent neural networks and a decision-level fusion strategy to predict the final emotion.
Abstract: Emotion recognition contributes to automatically perceiving the user’s emotional response to multimedia content through implicit annotation, which further benefits establishing effective user-centric services. Physiology-based approaches have increasingly attracted researchers’ attention because of their objectiveness in emotion representation. Conventional approaches to emotion recognition have mostly focused on the extraction of different kinds of hand-crafted features. However, hand-crafted features always require domain knowledge for the specific task, and designing proper features may be time-consuming. Therefore, exploring the most effective physiology-based temporal feature representation for emotion recognition becomes the core problem of most works. In this paper, we propose a multimodal attention-based BLSTM network framework for efficient emotion recognition. Firstly, raw physiological signals from each channel are transformed into spectrogram images to capture their time and frequency information. Secondly, attention-based bidirectional long short-term memory recurrent neural networks (BLSTM-RNNs) are utilized to automatically learn the best temporal features. The learned deep features are then fed into a deep neural network (DNN) to predict the probability of emotional output for each channel. Finally, a decision-level fusion strategy is utilized to predict the final emotion. The experimental results on the AMIGOS dataset show that our method outperforms other state-of-the-art methods.

Journal ArticleDOI
TL;DR: This paper is the first systematic review, based on a predefined search strategy, of literature concerned with social media bot detection methods published between 2010 and 2019; it includes a refined taxonomy of detection methods.
Abstract: Attacks by social media bots (automated accounts) are organized crimes that pose potential threats to public opinion, democracy, public health, the stock market and other disciplines. While researchers are building many models to detect social media bot accounts, attackers, on the other hand, evolve their bots to evade detection. This everlasting cat and mouse game makes this field vibrant and demands continuous development. To guide and enhance future solutions, this work provides an overview of social media bot attacks, current detection methods and challenges in this area. To the best of our knowledge, this paper is the first systematic review based on a predefined search strategy, which includes literature concerned with social media bot detection methods published between 2010 and 2019. The results of this review include a refined taxonomy of detection methods, a highlight of the techniques used to detect bots in social media and a comparison between current detection methods. Some of the gaps identified by this work are: the literature mostly focuses on the Twitter platform only and rarely uses methods other than supervised machine learning; most of the public datasets are not accurate or large enough; integrated systems and real-time detection are required; and efforts to spread awareness are needed to arm legitimate users with knowledge.

Journal ArticleDOI
TL;DR: It is concluded that misogyny is quite a specific kind of abusive language that is experimentally found to differ from sexism, a distinction worth exploring in further investigation.
Abstract: The freedom of expression given by social media has a dark side: the growing proliferation of abusive contents on these platforms. Misogynistic speech is a kind of abusive language, which can be simplified as hate speech targeting women, and it is becoming a more and more relevant issue in recent years. AMI IberEval 2018 and AMI EVALITA 2018 were two shared tasks which mainly focused on tackling the problem of misogyny in Twitter, in three different languages, namely English, Italian, and Spanish. In this paper, we present an in-depth study on the phenomena of misogyny in those three languages, by focusing on three main objectives. Firstly, we investigate the most important features to detect misogyny and the issues which contribute to the difficulty of misogyny detection, by proposing a novel system and conducting a broad evaluation on this task. Secondly, we study the relationship between misogyny and other abusive language phenomena, by conducting a series of cross-domain classification experiments. Finally, we explore the feasibility of detecting misogyny in a multilingual environment, by carrying out cross-lingual classification experiments. Our system succeeded in outperforming all state-of-the-art systems on all benchmark AMI datasets for both subtask A and subtask B. Moreover, intriguing insights emerged from error analysis, in particular about the interaction between different but related abusive phenomena. Based on our cross-domain experiment, we conclude that misogyny is quite a specific kind of abusive language, and we experimentally found that it differs from sexism. Lastly, our cross-lingual experiments show promising results. Our proposed joint-learning architecture obtained robust performance across languages, which is worth exploring in further investigation.

Journal ArticleDOI
TL;DR: Experimental results on the Amazon product and movie review sentiment datasets show that the proposed continuous naive Bayes learning framework can use the knowledge learned from past domains to guide learning in new domains, and has a better capacity for dealing with reviews that are continuously updated and come from different domains.
Abstract: Although statistical learning methods have achieved success in e-commerce platform product review sentiment classification, two problems have limited their practical application: 1) the computational efficiency to process large-scale reviews; 2) the ability to continuously learn from increasing reviews and multiple domains. This paper presents a continuous naive Bayes learning framework for large-scale and multi-domain e-commerce platform product review sentiment classification. While keeping the high computational efficiency of the traditional naive Bayes model, we extend the parameter estimation mechanism in naive Bayes to a continuous learning style. We furthermore propose ways to fine-tune the learned distribution based on three kinds of assumptions to better adapt to different domains. Experimental results on the Amazon product and movie review sentiment datasets show that our model can use the knowledge learned from past domains to guide learning in new domains, and has a better capacity for dealing with reviews that are continuously updated and come from different domains.
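The incremental-update idea behind continuous naive Bayes learning can be illustrated with scikit-learn's partial_fit, which keeps accumulating word-count statistics as new review batches arrive. This sketch shows only that mechanism, not the paper's distribution fine-tuning, and the reviews are toy examples.

```python
# Sketch of continuous naive Bayes learning: absorb review batches
# incrementally with partial_fit, so counts from past domains carry over
# when a new domain arrives.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# HashingVectorizer keeps the feature space fixed across batches, and
# alternate_sign=False keeps counts non-negative for MultinomialNB.
vec = HashingVectorizer(n_features=2**16, alternate_sign=False)
clf = MultinomialNB()

batch_books = (["great story, loved it", "boring and predictable"], [1, 0])
batch_movies = (["stunning visuals", "terrible pacing, fell asleep"], [1, 0])

for texts, y in (batch_books, batch_movies):  # reviews arrive over time
    clf.partial_fit(vec.transform(texts), y, classes=[0, 1])

print(clf.predict(vec.transform(["loved the pacing"])))
```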

Journal ArticleDOI
TL;DR: A hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data that reinforces the strengths of deep learning nets in combination with machine learning to deal with two specific semiotic systems, namely the textual and visual systems.
Abstract: Detecting sentiments in natural language is tricky even for humans, making its automated detection more complicated. This research proffers a hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data. It reinforces the strengths of deep learning nets in combination with machine learning to deal with two specific semiotic systems, namely the textual (written text) and visual (still images), and their combination within online content using decision-level multimodal fusion. The proposed contextual ConvNet-SVMBoVW model has four modules, namely, the discretization, text analytics, image analytics, and decision modules. The input to the model is multimodal text, m ∈ {text, image, info-graphic}. The discretization module uses Google Lens to separate the text from the image, which is then processed as discrete entities and sent to the respective text analytics and image analytics modules. The text analytics module determines the sentiment using a hybrid of a convolutional neural network (ConvNet) enriched with the contextual semantics of SentiCircle. An aggregation scheme is introduced to compute the hybrid polarity. A support vector machine (SVM) classifier trained using bag-of-visual-words (BoVW) predicts the visual content sentiment. A Boolean decision module with a logical OR operation is augmented to the architecture, which validates and categorizes the output on the basis of five fine-grained sentiment categories (truth values), namely ‘highly positive,’ ‘positive,’ ‘neutral,’ ‘negative’ and ‘highly negative.’ The accuracy achieved by the proposed model is nearly 91%, which is an improvement over the accuracy obtained by the text and image modules individually.

Journal ArticleDOI
TL;DR: A new Multimodal Graph Attention Network, short for MGAT, is proposed, which disentangles personal interests at the granularity of modality and is able to capture more complex interaction patterns hidden in user behaviors and provide a more accurate recommendation.
Abstract: Graph neural networks (GNNs) have shown great potential for personalized recommendation. At the core is to reorganize interaction data as a user-item bipartite graph and exploit high-order connectivity among user and item nodes to enrich their representations. While achieving great success, most existing works consider interaction graph based only on ID information, foregoing item contents from multiple modalities (e.g., visual, acoustic, and textual features of micro-video items). Distinguishing personal interests on different modalities at a granular level was not explored until recently proposed MMGCN (Wei et al., 2019). However, it simply employs GNNs on parallel interaction graphs and treats information propagated from all neighbors equally, failing to capture user preference adaptively. Hence, the obtained representations might preserve redundant, even noisy information, leading to non-robustness and suboptimal performance. In this work, we aim to investigate how to adopt GNNs on multimodal interaction graphs, to adaptively capture user preference on different modalities and offer in-depth analysis on why an item is suitable to a user. Towards this end, we propose a new Multimodal Graph Attention Network, short for MGAT, which disentangles personal interests at the granularity of modality. In particular, built upon multimodal interaction graphs, MGAT conducts information propagation within individual graphs, while leveraging the gated attention mechanism to identify varying importance scores of different modalities to user preference. As such, it is able to capture more complex interaction patterns hidden in user behaviors and provide a more accurate recommendation. Empirical results on two micro-video recommendation datasets, Tiktok and MovieLens, show that MGAT exhibits substantial improvements over the state-of-the-art baselines like NGCF (Wang, He, et al., 2019) and MMGCN (Wei et al., 2019). Further analysis on a case study illustrates how MGAT generates attentive information flow over multimodal interaction graphs.

Journal ArticleDOI
TL;DR: The results indicate that the patient's activeness has a positive effect on a doctor's informational and emotional support, and the effect of emotional support on patient satisfaction is more significant than that of informational support.
Abstract: In the online health community, the doctor-patient interaction is one of the most important functional modules. A large volume of unstructured text data has been generated in the doctor-patient interaction process. This development is worth exploring. In this paper, we mainly explore the influences of online doctor-patient interaction content on patient satisfaction. We collected the online doctor-patient interaction text data from a big online health community ( http://haodf.com ) in China from January 2015 to December 2016, followed by the use of text mining and econometrics to test our hypotheses. The results indicate that the patient's activeness has a positive effect on a doctor's informational and emotional support. Furthermore, both a doctor's informational and emotional support are known to have a positive effect on patient satisfaction. Notably, the effect of emotional support on patient satisfaction is more significant than that of informational support. In addition, the patient's disease severity strengthens the link between a doctor's informational and emotional support and that of patient satisfaction. This study has far-reaching significance for a better understanding of the doctor-patient interaction mechanism.

Journal ArticleDOI
TL;DR: A study of the evolution of travel recommender systems, their features, and their current limitations is described, and the key algorithms used for classification and recommendation, along with metrics for evaluating their performance, are discussed.
Abstract: Ever since the beginning of civilization, travel for various causes has existed as an essential part of human life, and so have travel recommendations, though the early form of recommendations was the accrued experiences shared by the community. Modern recommender systems evolved along with the growth of Information Technology and are contributing to all industry and service segments, inclusive of travel and tourism. The journey started with generic recommender engines, which gave way to personalized recommender systems and further advanced to contextualized personalization with the advent of artificial intelligence. The current era is also witnessing a boom in social media usage, and social media big data is acting as a critical input for various analytics, with no exception for recommender systems. This paper details the study conducted on the evolution of travel recommender systems, their features and current set of limitations. We also discuss the key algorithms being used for classification and recommendation processes and metrics that can be used to evaluate the performance of the algorithms and thereby the recommenders.

Journal ArticleDOI
TL;DR: This paper proposes a hate speech detection approach to identify hatred against vulnerable minority groups on social media; the approach successfully identifies the Tigre ethnic group as the most vulnerable community in terms of hatred compared with the Amhara and Oromo.
Abstract: With the rapid development of mobile computing and Web technologies, online hate speech has been increasingly spread on social network platforms, since it is easy to post any opinion. Previous studies confirm that exposure to online hate speech has serious offline consequences for historically deprived communities. Thus, research on automated hate speech detection has attracted much attention. However, the role of social networks in identifying hate-related vulnerable communities is not well investigated. Hate speech can affect all population groups, but some are more vulnerable to its impact than others. For example, for ethnic groups whose languages have few computational resources, it is a challenge to automatically collect and process online texts, not to mention automatic hate speech detection on social media. In this paper, we propose a hate speech detection approach to identify hatred against vulnerable minority groups on social media. Firstly, in the Spark distributed processing framework, posts are automatically collected and pre-processed, and features are extracted using word n-grams and word embedding techniques such as Word2Vec. Secondly, deep learning algorithms for classification, such as Gated Recurrent Unit (GRU), a variant of Recurrent Neural Networks (RNNs), are used for hate speech detection. Finally, hate words are clustered with methods such as Word2Vec to predict the potential target ethnic group for hatred. In our experiments, we use the Amharic language in Ethiopia as an example. Since there was no publicly available dataset for Amharic texts, we crawled Facebook pages to prepare the corpus. Since data annotation could be biased by culture, we recruited annotators from different cultural backgrounds and achieved better inter-annotator agreement. In our experimental results, feature extraction using word embedding techniques such as Word2Vec performs better in both classical and deep learning-based classification algorithms for hate speech detection, among which GRU achieves the best result. Our proposed approach successfully identifies the Tigre ethnic group as the most vulnerable community in terms of hatred compared with the Amhara and Oromo. As a result, identifying groups vulnerable to hatred is vital to protecting them by applying automatic hate speech detection models to remove content that aggravates psychological harm and physical conflict. This can also encourage the development of policies, strategies, and tools to empower and protect vulnerable communities.

Journal ArticleDOI
TL;DR: The design challenges that will need to be met in order to ensure that interactive VR technology can be used by residents living in aged care, and the potential for VR to be used as a tool to improve the quality of life of some older residents, particularly those for whom traditional social activities do not appeal.
Abstract: Background and objectives As technologies gain traction within the aged care community, better understanding their impact becomes vital. This paper reports on a study that explored the deployment of virtual reality (VR) as a tool to engage older adults in Residential Aged Care Facilities (RACF). The paper has two aims: 1) to identify the benefits and challenges associated with using VR with residents in aged care settings, and 2) to gather the views of older adult residents in RACF about the potential uses of VR in aged care. Research design and methods Five RACF residents and five RACF staff members took part in an intensive two-week evaluation of a VR system. Qualitative data was collected from multiple interviews and via researcher notes and video recordings made during the VR sessions. Results Results highlight the usability issues that impacted on the aged care residents' ability to use interactive VR technology and the potential negative impact head mounted displays can have on those living with dementia; the role that VR can play in engaging residents who might otherwise self-isolate, and how this can extend to increased engagement with family and friends. Discussion and implications We discuss the design challenges that will need to be met in order to ensure that interactive VR technology can be used by residents living in aged care, and the potential for VR to be used as a tool to improve the quality of life of some older residents, particularly those for whom traditional social activities do not appeal.

Journal ArticleDOI
TL;DR: The results show that a BERT-based model achieves new state-of-the-art results on both the ADE detection and extraction tasks and can be applied to multiple other healthcare and information extraction tasks, including medical entity extraction and entity recognition.
Abstract: Drug prescription is a task that doctors face daily with each patient. However, when prescribing drugs, doctors must be conscious of all potential drug side effects. In fact, according to the U.S. Department of Health and Human Services, adverse drug events (ADEs), or harmful side effects, account for 1/3 of total hospital admissions each year. The goal of this research is to utilize novel deep learning methods for accurate detection and identification of professionally unreported drug side effects using widely available public data (open data). Utilizing a manually-labelled dataset of 10,000 reviews gathered from WebMD and Drugs.com, this research proposes a deep learning-based approach utilizing Bidirectional Encoder Representations from Transformers (BERT) based models for ADE detection and extraction, and compares results to standard deep learning models and current state-of-the-art extraction models. By utilizing a hybrid of transfer learning from pre-trained BERT representations and sentence embeddings, the proposed model achieves an AUC score of 0.94 for ADE detection and an F1 score of 0.97 for ADE extraction. The previous state-of-the-art deep learning approach achieves an AUC of 0.85 in ADE detection and an F1 of 0.82 in ADE extraction on our dataset of review texts. The results show that a BERT-based model achieves new state-of-the-art results on both the ADE detection and extraction tasks. This approach can be applied to multiple healthcare and information extraction tasks and used to help solve the problem that doctors face when prescribing drugs. Overall, this research introduces a novel dataset utilizing social media health forum data and shows the viability and capability of using deep learning techniques in ADE detection and extraction, as well as information extraction as a whole. The model proposed in this paper achieves state-of-the-art results and can be applied to multiple other healthcare and information extraction tasks, including medical entity extraction and entity recognition.
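A hedged sketch of the BERT-based detection setup, framed as binary sequence classification with the Hugging Face transformers API, is shown below. The checkpoint, label scheme, and example review are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of BERT-based ADE detection as binary sequence classification
# (not the authors' code; the checkpoint and labels are illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = no ADE mentioned, 1 = ADE mentioned
)

review = "Started this medication last week and now I get severe headaches."
inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits
prob_ade = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(ADE) = {prob_ade:.3f}")  # untrained head: fine-tune on labels first
```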

Journal ArticleDOI
TL;DR: Various applications and opportunities of SM multimodal data, latest advancements, current challenges, and future directions for the crisis informatics and other related research fields are highlighted.
Abstract: People increasingly use Social Media (SM) platforms such as Twitter and Facebook during disasters and emergencies to post situational updates including reports of injured or dead people, infrastructure damage, requests of urgent needs, and the like. Information on SM comes in many forms, such as textual messages, images, and videos. Several studies have shown the utility of SM information for disaster response and management, which encouraged humanitarian organizations to start incorporating SM data sources into their workflows. However, several challenges prevent these organizations from using SM data for response efforts. These challenges include near-real-time information processing, information overload, information extraction, summarization, and verification of both textual and visual content. We highlight various applications and opportunities of SM multimodal data, latest advancements, current challenges, and future directions for the crisis informatics and other related research fields.

Journal ArticleDOI
TL;DR: A well-performing classifier based on the European floods dataset is achieved while requiring only a quarter of the labeled data needed by the traditional batch learning approach, and a substantial improvement could also be determined on the BASF SE incident dataset.
Abstract: The research field of crisis informatics examines, amongst others, the potentials and barriers of social media use during disasters and emergencies. Social media allow emergency services to receive valuable information (e.g., eyewitness reports, pictures, or videos). However, the vast amount of data generated during large-scale incidents can lead to the issue of information overload. Research indicates that supervised machine learning techniques are suitable for identifying relevant messages and filtering out irrelevant messages, thus mitigating information overload. Still, they require a considerable amount of labeled data, clear criteria for relevance classification, a usable interface to facilitate the labeling process and a mechanism to rapidly deploy retrained classifiers. To overcome these issues, we present (1) a system for social media monitoring, analysis and relevance classification, (2) abstract and precise criteria for relevance classification in social media during disasters and emergencies, (3) the evaluation of a well-performing Random Forest algorithm for relevance classification incorporating metadata from social media into a batch learning approach (e.g., 91.28%/89.19% accuracy, 98.3%/89.6% precision and 80.4%/87.5% recall with a fast training time with feature subset selection on the European floods/BASF SE incident datasets), as well as (4) an approach and preliminary evaluation for relevance classification including active, incremental and online learning to reduce the amount of required labeled data and to correct misclassifications of the algorithm by feedback classification. Using the latter approach, we achieved a well-performing classifier based on the European floods dataset while requiring only a quarter of the labeled data needed by the traditional batch learning approach. Despite a lesser effect on the BASF SE incident dataset, a substantial improvement could still be determined.
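The labeled-data savings reported above come from an active learning loop. A minimal illustrative version with scikit-learn is sketched below on synthetic data: train a Random Forest on a small labeled pool, query labels for the messages with the smallest prediction margin, and repeat.

```python
# Sketch of an uncertainty-sampling active learning loop with a Random
# Forest: label the messages the model is least sure about first. Synthetic
# data stands in for social media messages and relevance labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(range(50))          # small initial labeled pool
unlabeled = list(range(50, 2000))

for round_ in range(5):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[unlabeled])
    margin = np.abs(proba[:, 0] - proba[:, 1])   # small margin = uncertain
    query = [unlabeled[i] for i in np.argsort(margin)[:25]]
    labeled += query                             # oracle labels the queries
    unlabeled = [i for i in unlabeled if i not in set(query)]
    print(f"round {round_}: {len(labeled)} labels, "
          f"accuracy {clf.score(X[unlabeled], y[unlabeled]):.3f}")
```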

Journal ArticleDOI
TL;DR: Interactions between three groups in time and space using a classic SIR (susceptible, infected, and recovered) epidemic model are analyzed to show that government control of public opinion is both cheaper and more effective when it occurs at the initial stages of an incident.
Abstract: The transmission of online emergency information has become an active means of expressing public opinion and has vitally affected societal emergency response techniques. This paper analyzes interactions between three groups in time and space using a classic SIR (susceptible, infected, and recovered) epidemic model. Through social network theory and analog simulation analysis, we utilize data from China's Sina Weibo (a popular social media platform) to conduct empirical research on 101 major incidents in China that occurred between 2010 and 2017. We divide these emergencies into four types (natural disasters, accidents, public health events, and social security events) and conduct a simulation using three examples from each group. The results show that government control of public opinion is both cheaper and more effective when it occurs at the initial stages of an incident. By cooperating with the government, the media can facilitate emergency management. Finally, if netizens trust the government and the media, they are more likely to make cooperative decisions, maintain interest, and improve the management of online public sentiment.
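For readers unfamiliar with the SIR model used here, the sketch below integrates its standard differential equations with SciPy, reading S, I, and R as users who have not yet engaged, users actively spreading the topic, and users who have lost interest. The rate parameters are illustrative, not values fitted to the Weibo data.

```python
# Classic SIR dynamics integrated numerically; beta is the transmission rate
# and gamma the recovery (loss-of-interest) rate. Parameters are illustrative.
import numpy as np
from scipy.integrate import odeint

def sir(state, t, beta, gamma):
    s, i, r = state
    ds = -beta * s * i              # susceptible users get exposed
    di = beta * s * i - gamma * i   # spreading users eventually recover
    dr = gamma * i                  # recovered users stop spreading
    return [ds, di, dr]

t = np.linspace(0, 60, 600)  # days
solution = odeint(sir, y0=[0.99, 0.01, 0.0], t=t, args=(0.4, 0.1))
peak_day = t[solution[:, 1].argmax()]
print(f"spreading peaks around day {peak_day:.1f}")
```

Shrinking beta early (e.g., via early government intervention) flattens the infected curve far more cheaply than acting after the peak, which is the intuition behind the paper's finding.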

Journal ArticleDOI
TL;DR: A hybrid solution for sentence-level aspect-based sentiment analysis using A Lexicalized Domain Ontology and a Regularized Neural Attention model (ALDONAr), where the bidirectional context attention mechanism is introduced to measure the influence of each word in a given sentence on an aspect’s sentiment value.
Abstract: Aspect-based sentiment analysis allows one to compute the sentiment for an aspect in a certain context. One problem in this analysis is that words possibly carry different sentiments for different aspects. Moreover, an aspect’s sentiment might be highly influenced by the domain-specific knowledge. In order to tackle these issues, in this paper, we propose a hybrid solution for sentence-level aspect-based sentiment analysis using A Lexicalized Domain Ontology and a Regularized Neural Attention model (ALDONAr). The bidirectional context attention mechanism is introduced to measure the influence of each word in a given sentence on an aspect’s sentiment value. The classification module is designed to handle the complex structure of a sentence. The manually created lexicalized domain ontology is integrated to utilize the field-specific knowledge. Compared to the existing ALDONA model, ALDONAr uses BERT word embeddings, regularization, the Adam optimizer, and different model initialization. Moreover, its classification module is enhanced with two 1D CNN layers providing superior results on standard datasets.

Journal ArticleDOI
TL;DR: The findings are expected to inform existing researchers in the domain about future research directions, enable newcomers to understand the overall process of analyzing social media data, and provide practitioners with social media analysis approaches suitable for their environment.
Abstract: Evidently, online voice of customers (VoC) expressed in social media has emerged as quality data for researchers who are willing to conduct customer-driven business intelligence (BI) research. Nevertheless, to the best of the authors’ knowledge, there is still a dearth of studies that deal with such a remarkable research stream and address various open data (e.g., social media, intellectual property) from a BI research perspective. Therefore, this study has attempted to evaluate the applicability of social media data in BI research and provide a systematic review of the primary research articles in the domain. This study compared social media data with the other open data (e.g., gray literature, public government data) in terms of data content, collection, updatability and structure, which were determined through a thorough discussion with experts. Next, this study selected 57 social media-based BI research articles from the Web of Science (WoS) database and analyzed them with three research questions about the data, methodologies, and results to understand this research domain. Our findings are expected to inform existing researchers in the research domain about future research directions, enable newcomers to understand the overall process of analyzing social media data, and provide practitioners with social media analysis approaches suitable for their environment.

Journal ArticleDOI
TL;DR: A novel genetic-based recommender system (BLIGA) that depends on semantic information and historical rating data is presented, together with its capability to achieve more accurate predictions than alternative methods regardless of the number of K-neighbors.
Abstract: This paper presents a novel genetic-based recommender system (BLIGA) that depends on semantic information and historical rating data. The main contribution of this research lies in evaluating the possible recommendation lists instead of evaluating items and then forming the recommendation list. BLIGA utilizes the genetic algorithm to find the best list of items for the active user. Thus, each individual represents a candidate recommendation list. BLIGA hierarchically evaluates the individuals using three fitness functions. The first function uses semantic information about items to estimate the strength of the semantic similarity between items. The second function estimates the similarity in satisfaction level between users. The third function depends on the predicted ratings to select the best recommendation list. BLIGA results have been compared against recommendation results from alternative collaborative filtering methods. The results demonstrate the superiority of BLIGA and its capability to achieve more accurate predictions than the alternative methods regardless of the number of K-neighbors.
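BLIGA's distinctive move, evaluating whole recommendation lists as genetic-algorithm individuals rather than scoring items one by one, can be sketched in plain Python. The fitness below is a toy stand-in (sum of per-item predicted ratings), whereas BLIGA hierarchically applies semantic, satisfaction-based, and rating-based fitness functions.

```python
# Toy genetic algorithm where each individual is a whole recommendation list
# of item IDs, echoing BLIGA's list-level evaluation. Fitness is a stand-in.
import random

random.seed(0)
N_ITEMS, LIST_LEN, POP, GENERATIONS = 100, 10, 30, 50
predicted_rating = [random.random() for _ in range(N_ITEMS)]  # per-item scores

def fitness(individual):  # evaluate the list as a whole
    return sum(predicted_rating[item] for item in individual)

def crossover(a, b):      # mix two parent lists, keeping items unique
    return list(dict.fromkeys(a[: LIST_LEN // 2] + b))[:LIST_LEN]

def mutate(individual):   # swap one item, then repair to a unique full list
    individual = individual[:]
    individual[random.randrange(LIST_LEN)] = random.randrange(N_ITEMS)
    filler = random.sample(range(N_ITEMS), LIST_LEN)
    return list(dict.fromkeys(individual + filler))[:LIST_LEN]

population = [random.sample(range(N_ITEMS), LIST_LEN) for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP // 2]            # elitist selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best list:", sorted(best), "fitness:", round(fitness(best), 3))
```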