scispace - formally typeset
Search or ask a question

Showing papers in "ACM Transactions on Intelligent Systems and Technology in 2019"


Journal ArticleDOI
TL;DR: This work introduces a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federatedLearning, and federated transfer learning, and provides a comprehensive survey of existing works on this subject.
Abstract: Today’s artificial intelligence still faces two major challenges. One is that, in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated-learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federated learning, and federated transfer learning. We provide definitions, architectures, and applications for the federated-learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allowing knowledge to be shared without compromising user privacy.

2,593 citations


Journal ArticleDOI
TL;DR: This paper categorizes existing zero-shot learning methods and introduces representative methods under each category, and highlights promising future research directions of zero- shot learning.
Abstract: Most machine-learning methods focus on classifying instances whose classes have already been seen in training. In practice, many applications require classifying instances whose classes have not been seen previously. Zero-shot learning is a powerful and promising learning paradigm, in which the classes covered by training instances and the classes we aim to classify are disjoint. In this paper, we provide a comprehensive survey of zero-shot learning. First of all, we provide an overview of zero-shot learning. According to the data utilized in model optimization, we classify zero-shot learning into three learning settings. Second, we describe different semantic spaces adopted in existing zero-shot learning works. Third, we categorize existing zero-shot learning methods and introduce representative methods under each category. Fourth, we discuss different applications of zero-shot learning. Finally, we highlight promising future research directions of zero-shot learning.

403 citations


Journal ArticleDOI
TL;DR: This survey describes the modern-day problem of fake news and, in particular, highlights the technical challenges associated with it and comprehensively compile and summarize characteristic features of available datasets.
Abstract: The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users’ engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions.

280 citations


Journal ArticleDOI
TL;DR: This article proposed a CLMB model for analysing microblogging user behaviour and their activity across different countries in the CPSS applications, and evaluated CLBM model under the collected microblog dataset from 16 countries with the largest number of representative and active users in the world.
Abstract: As the rapid growth of social media technologies continues, Cyber-Physical-Social System (CPSS) has been a hot topic in many industrial applications. The use of “microblogging” services, such as Twitter, has rapidly become an influential way to share information. While recent studies have revealed that understanding and modelling microblog user behaviour with massive users’ data in social media are keen to success of many practical applications in CPSS, a key challenge in literatures is that diversity of geography and cultures in social media technologies strongly affect user behaviour and activity. The motivation of this article is to understand differences and similarities between microblogging users from different countries using social media technologies, and to attempt to design a Country-Level Micro-Blog User (CLMB) behaviour and activity model for supporting CPSS applications. We proposed a CLMB model for analysing microblogging user behaviour and their activity across different countries in the CPSS applications. The model has considered three important characteristics of user behaviour in microblogging data, including content of microblogging messages, user emotion index, and user relationship network. We evaluated CLBM model under the collected microblog dataset from 16 countries with the largest number of representative and active users in the world. Experimental results show that (1) for some countries with small population and strong cohesiveness, users pay more attention to social functionalities of microblogging service; (2) for some countries containing mostly large loose social groups, users use microblogging services as a news dissemination platform; (3) users in countries whose social network structure exhibits reciprocity rather than hierarchy will use more linguistic elements to express happiness in microblogging services.

207 citations


Journal ArticleDOI
TL;DR: This article constructs an intelligent offloading system for vehicular edge computing by leveraging deep reinforcement learning and develops a two-sided matching scheme and a deep reinforcementLearning approach to schedule offloading requests and allocate network resources.
Abstract: The development of smart vehicles brings drivers and passengers a comfortable and safe environment. Various emerging applications are promising to enrich users’ traveling experiences and daily life. However, how to execute computing-intensive applications on resource-constrained vehicles still faces huge challenges. In this article, we construct an intelligent offloading system for vehicular edge computing by leveraging deep reinforcement learning. First, both the communication and computation states are modelled by finite Markov chains. Moreover, the task scheduling and resource allocation strategy is formulated as a joint optimization problem to maximize users’ Quality of Experience (QoE). Due to its complexity, the original problem is further divided into two sub-optimization problems. A two-sided matching scheme and a deep reinforcement learning approach are developed to schedule offloading requests and allocate network resources, respectively. Performance evaluations illustrate the effectiveness and superiority of our constructed system.

194 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a systematic literature review and taxonomy of methods for remaining cycle time prediction in the context of business processes, as well as a cross-benchmark comparison of 16 such methods based on 17 real-life datasets originating from different industry domains.
Abstract: Predictive business process monitoring methods exploit historical process execution logs to generate predictions about running instances (called cases) of a business process, such as the prediction of the outcome, next activity, or remaining cycle time of a given process case. These insights could be used to support operational managers in taking remedial actions as business processes unfold, e.g., shifting resources from one case onto another to ensure the latter is completed on time. A number of methods to tackle the remaining cycle time prediction problem have been proposed in the literature. However, due to differences in their experimental setup, choice of datasets, evaluation measures, and baselines, the relative merits of each method remain unclear. This article presents a systematic literature review and taxonomy of methods for remaining time prediction in the context of business processes, as well as a cross-benchmark comparison of 16 such methods based on 17 real-life datasets originating from different industry domains.

116 citations


Journal ArticleDOI
TL;DR: Results corroborate that the proposed mechanisms can efficiently stimulate mobile edge users to perform evaluation task and improve the accuracy of trust evaluation, and validate the validity of Quality-Aware Trustworthy Incentive Mechanism.
Abstract: Both academia and industry have directed tremendous interest toward the combination of Cyber Physical Systems and Cloud Computing, which enables a new breed of applications and services. However, due to the relative long distance between remote cloud and end nodes, Cloud Computing cannot provide effective and direct management for end nodes, which leads to security vulnerabilities. In this article, we first propose a novel trust evaluation mechanism using crowdsourcing and Intelligent Mobile Edge Computing. The mobile edge users with relatively strong computation and storage ability are exploited to provide direct management for end nodes. Through close access to end nodes, mobile edge users can obtain various information of the end nodes and determine whether the node is trustworthy. Then, two incentive mechanisms, i.e., Trustworthy Incentive and Quality-Aware Trustworthy Incentive Mechanisms, are proposed for motivating mobile edge users to conduct trust evaluation. The first one aims to motivate edge users to upload their real information about their capability and costs. The purpose of the second one is to motivate edge users to make trustworthy effort to conduct tasks and report results. Detailed theoretical analysis demonstrates the validity of Quality-Aware Trustworthy Incentive Mechanism from data trustfulness, effort trustfulness, and quality trustfulness, respectively. Extensive experiments are carried out to validate the proposed trust evaluation and incentive mechanisms. The results corroborate that the proposed mechanisms can efficiently stimulate mobile edge users to perform evaluation task and improve the accuracy of trust evaluation.

107 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a pipeline that uses a deep neural network model to automatically extract visual features from images to estimate house prices in London, UK and found that learning to characterize the urban quality of a neighborhood improves house price prediction, even when generalizing to previously unseen London boroughs.
Abstract: When an individual purchases a home, they simultaneously purchase its structural features, its accessibility to work, and the neighborhood amenities. Some amenities, such as air quality, are measurable while others, such as the prestige or the visual impression of a neighborhood, are difficult to quantify. Despite the well-known impacts intangible housing features have on house prices, limited attention has been given to systematically quantifying these difficult to measure amenities. Two issues have led to this neglect. Not only do few quantitative methods exist that can measure the urban environment, but that the collection of such data is both costly and subjective. We show that street image and satellite image data can capture these urban qualities and improve the estimation of house prices. We propose a pipeline that uses a deep neural network model to automatically extract visual features from images to estimate house prices in London, UK. We make use of traditional housing features such as age, size, and accessibility as well as visual features from Google Street View images and Bing aerial images in estimating the house price model. We find encouraging results where learning to characterize the urban quality of a neighborhood improves house price prediction, even when generalizing to previously unseen London boroughs. We explore the use of non-linear vs. linear methods to fuse these cues with conventional models of house pricing, and show how the interpretability of linear models allows us to directly extract proxy variables for visual desirability of neighborhoods that are both of interest in their own right, and could be used as inputs to other econometric methods. This is particularly valuable as once the network has been trained with the training data, it can be applied elsewhere, allowing us to generate vivid dense maps of the visual appeal of London streets.

89 citations


Journal ArticleDOI
TL;DR: A novel multi-view fusion clustering framework based on an ELM, called MVEC, which exposes the underlying clustering structures embedded within multi-View data with a high degree of accuracy and a simple yet efficient solution to solve the optimization problem within MVEC.
Abstract: Unlabeled, multi-view data presents a considerable challenge in many real-world data analysis tasks. These data are worth exploring because they often contain complementary information that improves the quality of the analysis results. Clustering with multi-view data is a particularly challenging problem as revealing the complex data structures between many feature spaces demands discriminative features that are specific to the task and, when too few of these features are present, performance suffers. Extreme learning machines (ELMs) are an emerging form of learning model that have shown an outstanding representation ability and superior performance in a range of different learning tasks. Motivated by the promise of this advancement, we have developed a novel multi-view fusion clustering framework based on an ELM, called MVEC. MVEC learns the embeddings from each view of the data via the ELM network, then constructs a single unified embedding according to the correlations and dependencies between each embedding and automatically weighting the contribution of each. This process exposes the underlying clustering structures embedded within multi-view data with a high degree of accuracy. A simple yet efficient solution is also provided to solve the optimization problem within MVEC. Experiments and comparisons on eight different benchmarks from different domains confirm MVEC’s clustering accuracy.

73 citations


Journal ArticleDOI
TL;DR: This article proposes to simply use a large amount of taxi trips without using the intermediate trajectory points to estimate the travel time between source and destination, and significantly outperforms both state-of-the-art route-based method and online map services.
Abstract: The increased availability of large-scale trajectory data provides rich information for the study of urban dynamics. For example, New York City Taxi 8 Limousine Commission regularly releases source/destination information of taxi trips, where 173 million taxi trips released for Year 2013 [29]. Such a big dataset provides us potential new perspectives to address the traditional traffic problems. In this article, we study the travel time estimation problem. Instead of following the traditional route-based travel time estimation, we propose to simply use a large amount of taxi trips without using the intermediate trajectory points to estimate the travel time between source and destination. Our experiments show very promising results. The proposed big-data-driven approach significantly outperforms both state-of-the-art route-based method and online map services. Our study indicates that novel simple approaches could be empowered by big data and these approaches could serve as new baselines for some traditional computational problems.

66 citations


Journal ArticleDOI
TL;DR: In this paper, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players is proposed, based on a dataset of players' evaluations made by professional soccer scouts.
Abstract: The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this article, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs and consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players’ evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. At the end, we explore some applications of PlayeRank—i.e. searching players and player versatility—showing its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.

Journal ArticleDOI
TL;DR: A local mean representation-based k-nearest neighbor classifier (LMRKNN) is proposed, which outperforms the related competitive KNN-based methods with more robustness and effectiveness.
Abstract: K-nearest neighbor classification method (KNN), as one of the top 10 algorithms in data mining, is a very simple and yet effective nonparametric technique for pattern recognition. However, due to the selective sensitiveness of the neighborhood size k, the simple majority vote, and the conventional metric measure, the KNN-based classification performance can be easily degraded, especially in the small training sample size cases. In this article, to further improve the classification performance and overcome the main issues in the KNN-based classification, we propose a local mean representation-based k-nearest neighbor classifier (LMRKNN). In the LMRKNN, the categorical k-nearest neighbors of a query sample are first chosen to calculate the corresponding categorical k-local mean vectors, and then the query sample is represented by the linear combination of the categorical k-local mean vectors; finally, the class-specific representation-based distances between the query sample and the categorical k-local mean vectors are adopted to determine the class of the query sample. Extensive experiments on many UCI and KEEL datasets and three popular face databases are carried out by comparing LMRKNN to the state-of-art KNN-based methods. The experimental results demonstrate that the proposed LMRKNN outperforms the related competitive KNN-based methods with more robustness and effectiveness.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper developed a distributed version of deep forest based on the parameter server system, including MART (Multiple Additive Regression Tree) as base learners for efficiency and effectiveness consideration, the cost-based method for handling prevalent class-imbalanced data, MART based feature selection for high dimension data, and different evaluation metrics for automatically determining the cascade level.
Abstract: Internet companies are facing the need for handling large-scale machine learning applications on a daily basis and distributed implementation of machine learning algorithms which can handle extra-large-scale tasks with great performance is widely needed. Deep forest is a recently proposed deep learning framework which uses tree ensembles as its building blocks and it has achieved highly competitive results on various domains of tasks. However, it has not been tested on extremely large-scale tasks. In this work, based on our parameter server system, we developed the distributed version of deep forest. To meet the need for real-world tasks, many improvements are introduced to the original deep forest model, including MART (Multiple Additive Regression Tree) as base learners for efficiency and effectiveness consideration, the cost-based method for handling prevalent class-imbalanced data, MART based feature selection for high dimension data, and different evaluation metrics for automatically determining the cascade level. We tested the deep forest model on an extra-large-scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results showed that the deep forest model has the best performance according to the evaluation metrics from different perspectives even with very little effort for parameter tuning. This model can block fraud transactions in a large amount of money each day. Even compared with the best-deployed model, the deep forest model can additionally bring a significant decrease in economic loss each day.

Journal ArticleDOI
TL;DR: An efficient and privacy-preserving fog-assisted health data sharing (PFHDS) scheme for e-healthcare systems is proposed that integrates the fog node to classify the shared data into different categories according to disease risks for efficient health data analysis.
Abstract: Pervasive data collected from e-healthcare devices possess significant medical value through data sharing with professional healthcare service providers. However, health data sharing poses several security issues, such as access control and privacy leakage, as well as faces critical challenges to obtain efficient data analysis and services. In this article, we propose an efficient and privacy-preserving fog-assisted health data sharing (PFHDS) scheme for e-healthcare systems. Specifically, we integrate the fog node to classify the shared data into different categories according to disease risks for efficient health data analysis. Meanwhile, we design an enhanced attribute-based encryption method through combination of a personal access policy on patients and a professional access policy on the fog node for effective medical service provision. Furthermore, we achieve significant encryption consumption reduction for patients by offloading a portion of the computation and storage burden from patients to the fog node. Security discussions show that PFHDS realizes data confidentiality and fine-grained access control with collusion resistance. Performance evaluations demonstrate cost-efficient encryption computation, storage and energy consumption.

Journal ArticleDOI
TL;DR: This article examined the impact of time on state-of-the-art news veracity classifiers and showed that, as time progresses, classification performance for both unreliable and hyper-partisan news classification slowly degrade.
Abstract: In this study, we examine the impact of time on state-of-the-art news veracity classifiers. We show that, as time progresses, classification performance for both unreliable and hyper-partisan news classification slowly degrade. While this degradation does happen, it happens slower than expected, illustrating that hand-crafted, content-based features, such as style of writing, are fairly robust to changes in the news cycle. We show that this small degradation can be mitigated using online learning. Last, we examine the impact of adversarial content manipulation by malicious news producers. Specifically, we test three types of attack based on changes in the input space and data availability. We show that static models are susceptible to content manipulation attacks, but online models can recover from such attacks.

Journal ArticleDOI
TL;DR: A lightweight deep learning model consisting of factorization convolutional layers alternating with compression layers (namely, lightweight CNN-FC) is put forth, which can greatly decrease the number of unnecessary parameters and maintain the high accuracy while preserving the portable model size.
Abstract: Cloud computing extends Transportation Cyber-Physical Systems (T-CPS) with provision of enhanced computing and storage capability via offloading computing tasks to remote cloud servers. However, cloud computing cannot fulfill the requirements such as low latency and context awareness in T-CPS. The appearance of Mobile Edge Computing (MEC) can overcome the limitations of cloud computing via offloading the computing tasks at edge servers in approximation to users, consequently reducing the latency and improving the context awareness. Although MEC has the potential in improving T-CPS, it is incapable of processing computational-intensive tasks such as deep learning algorithms due to the intrinsic storage and computing-capability constraints. Therefore, we design and develop a lightweight deep learning model to support MEC applications in T-CPS. In particular, we put forth a stacked convolutional neural network (CNN) consisting of factorization convolutional layers alternating with compression layers (namely, lightweight CNN-FC). Extensive experimental results show that our proposed lightweight CNN-FC can greatly decrease the number of unnecessary parameters, thereby reducing the model size while maintaining the high accuracy in contrast to conventional CNN models. In addition, we also evaluate the performance of our proposed model via conducting experiments at a realistic MEC platform. Specifically, experimental results at this MEC platform show that our model can maintain the high accuracy while preserving the portable model size.

Journal ArticleDOI
TL;DR: This article proposes a novel MB discovery algorithm for balancing efficiency and accuracy, called BAlanced Markov Blanket (BAMB) discovery, which is close to the IAMB algorithm while it is much faster than the remaining seven MB discovery algorithms.
Abstract: The discovery of Markov blanket (MB) for feature selection has attracted much attention in recent years, since the MB of the class attribute is the optimal feature subset for feature selection. However, almost all existing MB discovery algorithms focus on either improving computational efficiency or boosting learning accuracy, instead of both. In this article, we propose a novel MB discovery algorithm for balancing efficiency and accuracy, called BAlanced Markov Blanket (BAMB) discovery. To achieve this goal, given a class attribute of interest, BAMB finds candidate PC (parents and children) and spouses and removes false positives from the candidate MB set in one go. Specifically, once a feature is successfully added to the current PC set, BAMB finds the spouses with regard to this feature, then uses the updated PC and the spouse set to remove false positives from the current MB set. This makes the PC and spouses of the target as small as possible and thus achieves a trade-off between computational efficiency and learning accuracy. In the experiments, we first compare BAMB with 8 state-of-the-art MB discovery algorithms on 7 benchmark Bayesian networks, then we use 10 real-world datasets and compare BAMB with 12 feature selection algorithms, including 8 state-of-the-art MB discovery algorithms and 4 other well-established feature selection methods. On prediction accuracy, BAMB outperforms 12 feature selection algorithms compared. On computational efficiency, BAMB is close to the IAMB algorithm while it is much faster than the remaining seven MB discovery algorithms.

Journal ArticleDOI
TL;DR: A multi-task predictive framework based on a learning-to-rank algorithm for academic performance prediction that captures inter-semester correlation, inter-major correlation, and integrates student similarity to predict students’ academic performance is built.
Abstract: Detecting abnormal behaviors of students in time and providing personalized intervention and guidance at the early stage is important in educational management. Academic performance prediction is an important building block to enabling this pre-intervention and guidance. Most of the previous studies are based on questionnaire surveys and self-reports, which suffer from small sample size and social desirability bias. In this article, we collect longitudinal behavioral data from the smart cards of 6,597 students and propose three major types of discriminative behavioral factors, diligence, orderliness, and sleep patterns. Empirical analysis demonstrates these behavioral factors are strongly correlated with academic performance. Furthermore, motivated by the social influence theory, we analyze the correlation between each student’s academic performance with his/her behaviorally similar students’. Statistical tests indicate this correlation is significant. Based on these factors, we further build a multi-task predictive framework based on a learning-to-rank algorithm for academic performance prediction. This framework captures inter-semester correlation, inter-major correlation, and integrates student similarity to predict students’ academic performance. The experiments on a large-scale real-world dataset show the effectiveness of our methods for predicting academic performance and the effectiveness of proposed behavioral factors.

Journal ArticleDOI
TL;DR: Personalized recommendation has received a lot of attention as a highly practical research topic, but existing recommender systems provide the recommendations with a generic statement such as “This book is good for you” or “Here’s why you should read this book”.
Abstract: Personalized recommendation has received a lot of attention as a highly practical research topic. However, existing recommender systems provide the recommendations with a generic statement such as “Customers who bought this item also bought…”. Explainable recommendation, which makes a user aware of why such items are recommended, is in demand. The goal of our research is to make the users feel as if they are receiving recommendations from their friends. To this end, we formulate a new challenging problem called personalized reason generation for explainable recommendation for songs in conversation applications and propose a solution that generates a natural language explanation of the reason for recommending a song to that particular user. For example, if the user is a student, our method can generate an output such as “Campus radio plays this song at noon every day, and I think it sounds wonderful,” which the student may find easy to relate to. In the offline experiments, through manual assessments, the gain of our method is statistically significant on the relevance to songs and personalization to users comparing with baselines. Large-scale online experiments show that our method outperforms manually selected reasons by 8.2% in terms of click-through rate. Evaluation results indicate that our generated reasons are relevant to songs and personalized to users, and they attract users to click the recommendations.

Journal ArticleDOI
TL;DR: The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation, to recommend up to 500 tracks that fit the target characteristics of the original playlist.
Abstract: The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the RecSys Challenge, Spotify released a dataset of one million user-generated playlists. Participants could compete in two tracks, i.e., main and creative tracks. Participants in the main track were only allowed to use the provided training set, however, in the creative track, the use of external public sources was permitted. In total, 113 teams submitted 1,228 runs to the main track; 33 teams submitted 239 runs to the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average number of recommended songs clicks of 1.784. In the creative track, an R-precision of 0.2233, an NDCG of 0.3939, and a click rate of 1.785 was obtained by the best team. This article provides an overview of the challenge, including motivation, task definition, dataset description, and evaluation. We further report and analyze the results obtained by the top-performing teams in each track and explore the approaches taken by the winners. We finally summarize our key findings, discuss generalizability of approaches and results to domains other than music, and list the open avenues and possible future directions in the area of automatic playlist continuation.

Journal ArticleDOI
TL;DR: Performance evaluation through simulation is carried out for success of routing ratio, compromised node detection ratio, and detection routing overhead and shows that the performance can be improved in the TDSR scheme compared to previous schemes.
Abstract: Security is a pivotal issue for the development of Cyber Physical Systems (CPS). The trusted computing of CPS includes the complete protection mechanisms, such as hardware, firmware, and software, the combination of which is responsible for enforcing a system security policy. A Trust Detection-based Secured Routing (TDSR) scheme is proposed to establish security routes from source nodes to the data center under malicious environment to ensure network security. In the TDSR scheme, sensor nodes in the routing path send detection routing to identify relay nodes’ trust. And then, data packets are routed through trustworthy nodes to sink securely. In the TDSR scheme, the detection routing is executed in those nodes that have abundant energy; thus, the network lifetime cannot be affected. Performance evaluation through simulation is carried out for success of routing ratio, compromised node detection ratio, and detection routing overhead. The experiment results show that the performance can be improved in the TDSR scheme compared to previous schemes.

Journal ArticleDOI
TL;DR: This article devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy into local processes and presents an efficient special-purpose algorithm to improve the local mining performance.
Abstract: Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks for the sequence mining research field. Existing studies about FEM suffer from unsatisfied scalability when faced with complex sequences as it is an NP-complete problem for testing whether an episode occurs in a sequence. In this article, we propose a scalable, distributed framework to support FEM on “big” event sequences. As a rule of thumb, “big” illustrates an event sequence is either very long or with masses of simultaneous events. Meanwhile, the events in this article are arranged in a predefined hierarchy. It derives some abstractive events that can form episodes that may not directly appear in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy into local processes. We then present an efficient special-purpose algorithm to improve the local mining performance. We also extend our framework to support maximal and closed episode mining in the context of event hierarchy, and to the best of our knowledge, we are the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that we can find practical patterns when taking event hierarchies into account.

Journal ArticleDOI
TL;DR: CAPrice is proposed, a novel adaptive pricing scheme for urban MOD networks that uses a new spatio-temporal deep capsule network (STCapsNet) that accurately predicts ride demands and driver supplies with vectorized neuron capsules while accounting for comprehensive spatio/temporal and external factors.
Abstract: Pricing in mobility-on-demand (MOD) networks, such as Uber, Lyft, and connected taxicabs, is done adaptively by leveraging the price responsiveness of drivers (supplies) and passengers (demands) to achieve such goals as maximizing drivers’ incomes, improving riders’ experience, and sustaining platform operation. Existing pricing policies only respond to short-term demand fluctuations without accurate trip forecast and spatial demand-supply balancing, thus mismatching drivers to riders and resulting in loss of profit. We propose CAPrice, a novel adaptive pricing scheme for urban MOD networks. It uses a new spatio-temporal deep capsule network (STCapsNet) that accurately predicts ride demands and driver supplies with vectorized neuron capsules while accounting for comprehensive spatio-temporal and external factors. Given accurate perception of zone-to-zone traffic flows in a city, CAPrice formulates a joint optimization problem by considering spatial equilibrium to balance the platform, providing drivers and riders/passengers with proactive pricing “signals.” We have conducted an extensive experimental evaluation upon over 4.0× 108 MOD trips (Uber, Didi Chuxing, and connected taxicabs) in New York City, Beijing, and Chengdu, validating the accuracy, effectiveness, and profitability (often 20% ride prediction accuracy and 30% profit improvements over the state-of-the-arts) of CAPrice in managing urban MOD networks.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a robust feature representation learning method based on denoising auto-encoders (DDAEs), which uses corruption and reconstruction on both the input and the hidden representation.
Abstract: This article aims to develop a new and robust approach to feature representation. Motivated by the success of Auto-Encoders, we first theoretically analyze and summarize the general properties of all algorithms that are based on traditional Auto-Encoders: (1) The reconstruction error of the input cannot be lower than a lower bound, which can be viewed as a guiding principle for reconstructing the input. Additionally, when the input is corrupted with noises, the reconstruction error of the corrupted input also cannot be lower than a lower bound. (2) The reconstruction of a hidden representation achieving its ideal situation is the necessary condition for the reconstruction of the input to reach the ideal state. (3) Minimizing the Frobenius norm of the Jacobian matrix of the hidden representation has a deficiency and may result in a much worse local optimum value. We believe that minimizing the reconstruction error of the hidden representation is more robust than minimizing the Frobenius norm of the Jacobian matrix of the hidden representation. Based on the above analysis, we propose a new model termed Double Denoising Auto-Encoders (DDAEs), which uses corruption and reconstruction on both the input and the hidden representation. We demonstrate that the proposed model is highly flexible and extensible and has a potentially better capability to learn invariant and robust feature representations. We also show that our model is more robust than Denoising Auto-Encoders (DAEs) for dealing with noises or inessential features. Furthermore, we detail how to train DDAEs with two different pretraining methods by optimizing the objective function in a combined and separate manner, respectively. Comparative experiments illustrate that the proposed model is significantly better for representation learning than the state-of-the-art models.

Journal ArticleDOI
TL;DR: A dictionary learning algorithm based on a non-negative constraint is developed, and a sparse representation anomaly node detection method for sensor networks is proposed based on the dictionary learning and verified the robustness of the proposed method in detecting abnormal nodes against four state of the art approaches.
Abstract: In recent years, wireless sensor networks (WSNs) have become an active area of research for monitoring physical and environmental conditions. Due to the interdependence of sensors, a functional anomaly in one sensor can cause a functional anomaly in another sensor, which can further lead to the malfunctioning of the entire sensor network. Existing research work has analysed faulty sensor anomalies but fails to show the effectiveness throughout the entire interdependent network system. In this article, a dictionary learning algorithm based on a non-negative constraint is developed, and a sparse representation anomaly node detection method for sensor networks is proposed based on the dictionary learning. Through experiment on a specific thermal power plant in China, we verify the robustness of our proposed method in detecting abnormal nodes against four state of the art approaches and proved our method is more robust. Furthermore, the experiments are conducted on the obtained abnormal nodes to prove the interdependence of multi-layer sensor networks and reveal the conditions and causes of a system crash.

Journal ArticleDOI
TL;DR: This article comprehensively surveys the development of trajectory data classification and provides a holistic understanding and deep insight into three types of trajectoryData classification methods and present some promising future directions.
Abstract: This article comprehensively surveys the development of trajectory data classification. Considering the critical role of trajectory data classification in modern intelligent systems for surveillance security, abnormal behavior detection, crowd behavior analysis, and traffic control, trajectory data classification has attracted growing attention. According to the availability of manual labels, which is critical to the classification performances, the methods can be classified into three categories, i.e., unsupervised, semi-supervised, and supervised. Furthermore, classification methods are divided into some sub-categories according to what extracted features are used. We provide a holistic understanding and deep insight into three types of trajectory data classification methods and present some promising future directions.

Journal ArticleDOI
TL;DR: This article investigated energy-efficient and contention-aware static scheduling for tasks with precedence and deadline constraints on intelligent edge devices deploying heterogeneous VFI-based NoC-MPSoCs (VFI-NoC-HMPSoC) with DVFS-enabled processors and proposed a novel population-based algorithm called ARSH-FATI.
Abstract: The interlinked processing units in modern Cyber-Physical Systems (CPS) creates a large network of connected computing embedded systems. Network-on-Chip (NoC)-based Multiprocessor System-on-Chip (MPSoC) architecture is becoming a de facto computing platform for real-time applications due to its higher performance and Quality-of-Service (QoS). The number of processors has increased significantly on the multiprocessor systems in CPS; therefore, Voltage Frequency Island (VFI) has been recently adopted for effective energy management mechanism in the large-scale multiprocessor chip designs. In this article, we investigated energy-efficient and contention-aware static scheduling for tasks with precedence and deadline constraints on intelligent edge devices deploying heterogeneous VFI-based NoC-MPSoCs (VFI-NoC-HMPSoC) with DVFS-enabled processors. Unlike the existing population-based optimization algorithms, we proposed a novel population-based algorithm called ARSH-FATI that can dynamically switch between explorative and exploitative search modes at run-time. Our static scheduler ARHS-FATI collectively performs task mapping, scheduling, and voltage scaling. Consequently, its performance is superior to the existing state-of-the-art approach proposed for homogeneous VFI-based NoC-MPSoCs. We also developed a communication contention-aware Earliest Edge Consistent Deadline First (EECDF) scheduling algorithm and gradient descent--inspired voltage scaling algorithm called Energy Gradient Decent (EGD). We introduced a notion of Energy Gradient (EG) that guides EGD in its search for island voltage settings and minimize the total energy consumption. We conducted the experiments on eight real benchmarks adopted from Embedded Systems Synthesis Benchmarks (E3S). Our static scheduling approach ARSH-FATI outperformed state-of-the-art technique and achieved an average energy-efficiency of ∼24% and ∼30% over CA-TMES-Search and CA-TMES-Quick, respectively.

Journal ArticleDOI
TL;DR: A migration-adapted clustering algorithm that utilizes a segmentation strategy and a group of matching-updating operations to achieve an efficient and accurate clustering analysis of the data for starting mode identification and abnormal test detection and the results demonstrate the effectiveness of the approach and its possible inspiration for the durability test data analysis of other similar industrial products.
Abstract: People face data-rich manufacturing environments in Industry 4.0. As an important technology for explaining and understanding complex data, visual analytics has been increasingly introduced into industrial data analysis scenarios. With the durability test of automotive starters as background, this study proposes a visual analysis approach for understanding large-scale and long-term durability test data. Guided by detailed scenario and requirement analyses, we first propose a migration-adapted clustering algorithm that utilizes a segmentation strategy and a group of matching-updating operations to achieve an efficient and accurate clustering analysis of the data for starting mode identification and abnormal test detection. We then design and implement a visual analysis system that provides a set of user-friendly visual designs and lightweight interactions to help people gain data insights into the test process overview, test data patterns, and durability performance dynamics. Finally, we conduct a quantitative algorithm evaluation, case study, and user interview by using real-world starter durability test datasets. The results demonstrate the effectiveness of the approach and its possible inspiration for the durability test data analysis of other similar industrial products.

Journal ArticleDOI
TL;DR: This article proposes a novel online heterogeneous transfer learning algorithm called Online Heterogeneous Knowledge Transition (OHKT), which first seeks to generate pseudo labels for the co-occurrence data based on the labeled source data, and develops an online learning algorithm to classify the target sequence by leveraging theCo-occurring data with pseudo labels.
Abstract: In this article, we study the problem of online heterogeneous transfer learning, where the objective is to make predictions for a target data sequence arriving in an online fashion, and some offline labeled instances from a heterogeneous source domain are provided as auxiliary data. The feature spaces of the source and target domains are completely different, thus the source data cannot be used directly to assist the learning task in the target domain. To address this issue, we take advantage of unlabeled co-occurrence instances as intermediate supplementary data to connect the source and target domains, and perform knowledge transition from the source domain into the target domain. We propose a novel online heterogeneous transfer learning algorithm called Online Heterogeneous Knowledge Transition (OHKT) for this purpose. In OHKT, we first seek to generate pseudo labels for the co-occurrence data based on the labeled source data, and then develop an online learning algorithm to classify the target sequence by leveraging the co-occurrence data with pseudo labels. Experimental results on real-world data sets demonstrate the effectiveness and efficiency of the proposed algorithm.

Journal ArticleDOI
TL;DR: RecRules is presented, a hybrid and semantic recommendation system that suggests rules based on their final purposes, thus overcoming details like manufacturers and brands and open the way for a new class of recommender systems for EUD that take into account the actual functionality needed by end users.
Abstract: Nowadays, end users can personalize their smart devices and web applications by defining or reusing IF-THEN rules through dedicated End-User Development (EUD) tools. Despite apparent simplicity, such tools present their own set of issues. The emerging and increasing complexity of the Internet of Things, for example, is barely taken into account, and the number of possible combinations between triggers and actions of different smart devices and web applications is continuously growing. Such a large design space makes end-user personalization a complex task for non-programmers, and motivates the need of assisting users in easily discovering and managing rules and functionality, e.g., through recommendation techniques. In this article, we tackle the emerging problem of recommending IF-THEN rules to end users by presenting RecRules, a hybrid and semantic recommendation system. Through a mixed content and collaborative approach, the goal of RecRules is to recommend by functionality: it suggests rules based on their final purposes, thus overcoming details like manufacturers and brands. The algorithm uses a semantic reasoning process to enrich rules with semantic information, with the aim of uncovering hidden connections between rules in terms of shared functionality. Then, it builds a collaborative semantic graph, and it exploits different types of path-based features to train a learning to rank algorithm and compute top-N recommendations. We evaluate RecRules through different experiments on real user data extracted from IFTTT, one of the most popular EUD tools. Results are promising: they show the effectiveness of our approach with respect to other state-of-the-art algorithms and open the way for a new class of recommender systems for EUD that take into account the actual functionality needed by end users.