
Showing papers in "ACM Transactions on Intelligent Systems and Technology in 2020"


Journal ArticleDOI
TL;DR: A survey of single-source, typically homogeneous unsupervised deep domain adaptation approaches, which combine the powerful, hierarchical representations from deep learning with domain adaptation to reduce reliance on potentially costly target data labels.
Abstract: Deep learning has produced state-of-the-art results for a variety of tasks. While such approaches for supervised learning have performed well, they assume that training and testing data are drawn from the same distribution, which may not always be the case. As a complement to this challenge, single-source unsupervised domain adaptation can handle situations where a network is trained on labeled data from a source domain and unlabeled data from a related but different target domain with the goal of performing well at test-time on the target domain. Many single-source and typically homogeneous unsupervised deep domain adaptation approaches have thus been developed, combining the powerful, hierarchical representations from deep learning with domain adaptation to reduce reliance on potentially costly target data labels. This survey will compare these approaches by examining alternative methods, the unique and common elements, results, and theoretical insights. We follow this with a look at application areas and open research directions.
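
Many of the surveyed approaches are discrepancy-based: they train on labeled source data while penalizing a statistical distance between source and target feature distributions. The sketch below is an illustrative example of that family, not a method from the survey itself; the RBF bandwidth gamma and the trade_off weight are placeholder values.

```python
import torch

def gaussian_kernel(x, y, gamma=1.0):
    # Pairwise RBF kernel values between the rows of x and the rows of y.
    dists = torch.cdist(x, y) ** 2
    return torch.exp(-gamma * dists)

def mmd_loss(source_feats, target_feats, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy (MMD).
    k_ss = gaussian_kernel(source_feats, source_feats, gamma).mean()
    k_tt = gaussian_kernel(target_feats, target_feats, gamma).mean()
    k_st = gaussian_kernel(source_feats, target_feats, gamma).mean()
    return k_ss + k_tt - 2 * k_st

def adaptation_objective(classification_loss, source_feats, target_feats, trade_off=0.5):
    # Supervised loss on labeled source batches plus a domain-alignment
    # penalty computed on (labeled source, unlabeled target) features.
    return classification_loss + trade_off * mmd_loss(source_feats, target_feats)
```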

496 citations


Journal ArticleDOI
TL;DR: A systematic survey of adversarial examples against deep neural networks (DNNs) for NLP applications, selecting and analyzing 40 representative works on generating adversarial samples against DNNs.
Abstract: With the development of high computational devices, deep neural networks (DNNs), in recent years, have gained significant popularity in many Artificial Intelligence (AI) applications. However, previous efforts have shown that DNNs are vulnerable to strategically modified samples, named adversarial examples. These samples are generated with some imperceptible perturbations, but can fool the DNNs to give false predictions. Inspired by the popularity of generating adversarial examples against DNNs in Computer Vision (CV), research efforts on attacking DNNs for Natural Language Processing (NLP) applications have emerged in recent years. However, the intrinsic difference between image (CV) and text (NLP) renders challenges to directly apply attacking methods in CV to NLP. Various methods are proposed addressing this difference and attack a wide range of NLP applications. In this article, we present a systematic survey on these works. We collect all related academic works since the first appearance in 2017. We then select, summarize, discuss, and analyze 40 representative works in a comprehensive way. To make the article self-contained, we cover preliminary knowledge of NLP and discuss related seminal works in computer vision. We conclude our survey with a discussion on open issues to bridge the gap between the existing progress and more robust adversarial attacks on NLP DNNs.
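
Most text attacks of the kind surveyed here perturb words under semantic constraints until the victim model's prediction degrades. The following is a schematic greedy word-substitution loop under assumed interfaces: predict_proba (returns class probabilities for a token list) and get_synonyms are hypothetical placeholders, not functions from any specific paper.

```python
def greedy_substitution_attack(tokens, true_label, predict_proba, get_synonyms,
                               max_changes=3):
    """Try to lower the classifier's confidence in the true label by
    replacing at most `max_changes` words with synonyms."""
    adversarial = list(tokens)
    changes = 0
    for i, word in enumerate(tokens):
        if changes >= max_changes:
            break
        best_word = word
        best_score = predict_proba(adversarial)[true_label]
        for candidate in get_synonyms(word):
            trial = adversarial[:i] + [candidate] + adversarial[i + 1:]
            score = predict_proba(trial)[true_label]  # confidence in the true label
            if score < best_score:                    # lower is better for the attacker
                best_word, best_score = candidate, score
        if best_word != word:
            adversarial[i] = best_word
            changes += 1
    return adversarial
```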

226 citations


Journal ArticleDOI
TL;DR: This survey aims to provide a comprehensive review of the state-of-the-art VOST methods, classify these methods into different categories, and identify new trends.
Abstract: Object segmentation and object tracking are fundamental research areas in the computer vision community. Both topics must handle common challenges such as occlusion, deformation, motion blur, and scale variation. The former additionally faces heterogeneous objects, interacting objects, edge ambiguity, and shape complexity; the latter suffers from difficulties in handling fast motion, out-of-view targets, and real-time processing. Combining the two problems into Video Object Segmentation and Tracking (VOST) can overcome their respective difficulties and improve performance. VOST can be widely applied to many practical applications such as video summarization, high-definition video compression, human-computer interaction, and autonomous vehicles. This survey aims to provide a comprehensive review of the state-of-the-art VOST methods, classify these methods into different categories, and identify new trends. First, we broadly categorize VOST methods into Video Object Segmentation (VOS) and Segmentation-based Object Tracking (SOT). Each category is further classified into various types based on the segmentation and tracking mechanism. Moreover, we present representative VOS and SOT methods at each point in time. Second, we provide a detailed discussion and overview of the technical characteristics of the different methods. Third, we summarize the characteristics of the related video datasets and provide a variety of evaluation metrics. Finally, we point out a set of interesting future works and draw our own conclusions.

85 citations


Journal ArticleDOI
TL;DR: This article proposes a novel concept called Dynamic Distribution Adaptation (DDA), which quantitatively evaluates the relative importance of the marginal and conditional distributions and can be easily incorporated into the structural risk minimization framework to solve transfer learning problems.
Abstract: Transfer learning aims to learn robust classifiers for the target domain by leveraging knowledge from a source domain. Since the source and the target domains are usually from different distributions, existing methods mainly focus on adapting the cross-domain marginal or conditional distributions. However, in real applications, the marginal and conditional distributions usually have different contributions to the domain discrepancy. Existing methods fail to quantitatively evaluate the different importance of these two distributions, which will result in unsatisfactory transfer performance. In this article, we propose a novel concept called Dynamic Distribution Adaptation (DDA), which is capable of quantitatively evaluating the relative importance of each distribution. DDA can be easily incorporated into the framework of structural risk minimization to solve transfer learning problems. On the basis of DDA, we propose two novel learning algorithms: (1) Manifold Dynamic Distribution Adaptation (MDDA) for traditional transfer learning, and (2) Dynamic Distribution Adaptation Network (DDAN) for deep transfer learning. Extensive experiments demonstrate that MDDA and DDAN significantly improve the transfer learning performance and set up a strong baseline over the latest deep and adversarial methods on digits recognition, sentiment analysis, and image classification. More importantly, it is shown that marginal and conditional distributions have different contributions to the domain divergence, and our DDA is able to provide good quantitative evaluation of their relative importance, which leads to better performance. We believe this observation can be helpful for future research in transfer learning.
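
The core idea of DDA is to weight the marginal and class-conditional discrepancies by a single factor. The sketch below illustrates that weighting with a linear-kernel MMD as the per-distribution distance; both the distance and the fixed mu are simplifications for illustration, not the paper's exact estimator.

```python
import numpy as np

def linear_mmd(xs, xt):
    # Linear-kernel MMD: squared distance between the feature means.
    return float(np.sum((xs.mean(axis=0) - xt.mean(axis=0)) ** 2))

def dynamic_distribution_distance(xs, ys, xt, yt_pseudo, mu):
    """(1 - mu) * marginal discrepancy + mu * mean conditional discrepancy.

    yt_pseudo are pseudo-labels for the unlabeled target data; mu in [0, 1]
    controls the relative importance of the two distributions.
    """
    marginal = linear_mmd(xs, xt)
    per_class = [
        linear_mmd(xs[ys == c], xt[yt_pseudo == c])
        for c in np.unique(ys)
        if np.any(yt_pseudo == c)
    ]
    conditional = float(np.mean(per_class)) if per_class else 0.0
    return (1 - mu) * marginal + mu * conditional
```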

80 citations


Journal ArticleDOI
TL;DR: The objective of this survey is to synthesize and present two decades of research on web tables into six main categories of information access tasks: table extraction, table interpretation, table search, question answering, knowledge base augmentation, and table augmentation.
Abstract: Tables are powerful and popular tools for organizing and manipulating data. A vast number of tables can be found on the Web, which represent a valuable knowledge resource. The objective of this survey is to synthesize and present two decades of research on web tables. In particular, we organize existing literature into six main categories of information access tasks: table extraction, table interpretation, table search, question answering, knowledge base augmentation, and table augmentation. For each of these tasks, we identify and describe seminal approaches, present relevant resources, and point out interdependencies among the different tasks.

66 citations


Journal ArticleDOI
TL;DR: In this article, a self-weighted robust linear discriminant analysis (LDA) with an l2,1-norm-based pairwise between-class distance criterion, called SWRLDA, is proposed for multi-class classification, especially in the presence of edge classes.
Abstract: Linear discriminant analysis (LDA) is a popular technique to learn the most discriminative features for multi-class classification. A vast majority of existing LDA algorithms are prone to be dominated by the class with very large deviation from the others, i.e., edge class, which occurs frequently in multi-class classification. First, the existence of edge classes often makes the total mean biased in the calculation of between-class scatter matrix. Second, the exploitation of l2-norm based between-class distance criterion magnifies the extremely large distance corresponding to edge class. In this regard, a novel self-weighted robust LDA with l2,1-norm based pairwise between-class distance criterion, called SWRLDA, is proposed for multi-class classification especially with edge classes. SWRLDA can automatically avoid the optimal mean calculation and simultaneously learn adaptive weights for each class pair without setting any additional parameter. An efficient re-weighted algorithm is exploited to derive the global optimum of the challenging l2,1-norm maximization problem. The proposed SWRLDA is easy to implement and converges fast in practice. Extensive experiments demonstrate that SWRLDA performs favorably against other compared methods on both synthetic and real-world datasets while presenting superior computational efficiency in comparison with other techniques.

49 citations


Journal ArticleDOI
TL;DR: A novel Flexible Multi-modal Hashing (FMH) method that learns multiple modality-specific hash codes and multi-modal collaborative hash codes simultaneously within a single model, addressing the problem of binarizing queries when only one or a subset of modalities is provided.
Abstract: Multi-modal hashing methods could support efficient multimedia retrieval by combining multi-modal features for binary hash learning at both the offline training and online query stages. However, existing multi-modal methods cannot binarize queries when only one or a subset of modalities is provided. In this article, we propose a novel Flexible Multi-modal Hashing (FMH) method to address this problem. FMH learns multiple modality-specific hash codes and multi-modal collaborative hash codes simultaneously within a single model. The hash codes are flexibly generated according to newly arriving queries, which may provide any single modality feature or combination of modality features. Besides, the hashing learning procedure is efficiently supervised by the pair-wise semantic matrix to enhance the discriminative capability. It successfully avoids the challenging symmetric semantic matrix factorization and the O(n^2) storage cost of the semantic matrix. Finally, we design a fast discrete optimization to learn hash codes directly with simple operations. Experiments validate the superiority of the proposed approach.

47 citations


Journal ArticleDOI
TL;DR: A new generalizable face authentication CNN (GFA-CNN) model with three novelties, including a simple yet effective total pairwise confusion loss for CNN training that properly balances the contributions of all spoofing patterns when recognizing spoofing faces.
Abstract: Face anti-spoofing aims to detect presentation attacks on face recognition-based authentication systems. It has drawn growing attention due to the high security demand. The widely adopted CNN-based methods usually recognize spoofing faces well when training and testing spoofing samples display similar patterns, but their performance drops drastically on testing spoofing faces of novel patterns or unseen scenes, leading to poor generalization performance. Furthermore, almost all current methods treat face anti-spoofing as a prior step to face recognition, which prolongs the response time and makes face authentication inefficient. In this article, we try to boost the generalizability and applicability of face anti-spoofing methods by designing a new generalizable face authentication CNN (GFA-CNN) model with three novelties. First, GFA-CNN introduces a simple yet effective total pairwise confusion loss for CNN training that properly balances the contributions of all spoofing patterns for recognizing spoofing faces. Second, it incorporates a fast domain adaptation component to alleviate negative effects brought by domain variation. Third, it deploys filter diversification learning to make the learned representations more adaptable to new scenes. In addition, the proposed GFA-CNN works in a multi-task manner: it performs face anti-spoofing and face recognition simultaneously. Experimental results on five popular face anti-spoofing and face recognition benchmarks show that GFA-CNN significantly outperforms previous face anti-spoofing methods on cross-test protocols and also well preserves the identity information of input face images.

42 citations


Journal ArticleDOI
TL;DR: This article proposes a novel Privacy-preserving POI Recommendation (PriRec) framework that keeps users' private raw data and models in users' own hands and thus protects user privacy to a large extent; comprehensive experiments on real-world datasets demonstrate that PriRec achieves comparable or even better recommendation accuracy.
Abstract: Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users’ data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this article, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users’ private data (features and actions) are kept on their own side, e.g., cellphone or pad. Meanwhile, the public data that need to be accessed by all the users are kept by the recommender to reduce the storage costs of users’ devices. Those public data include: (1) static data only related to the status of POI, such as POI categories, and (2) dynamic data dependent on user-POI actions such as visited counts. The dynamic data could be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representations of Factorization Machine (FM) that consists of a linear model and the feature interaction model. To protect the model privacy, the linear models are saved on the users’ side, and we propose a secure decentralized gradient descent protocol for users to learn it collaboratively. The feature interaction model is kept by the recommender since there is no privacy risk, and we adopt a secure aggregation strategy in a federated learning paradigm to learn it. To this end, PriRec keeps users’ private raw data and models in users’ own hands, and protects user privacy to a large extent. We apply PriRec in real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy.
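
As a concrete illustration of the local differential privacy component, the sketch below applies randomized response to a single per-user bit (e.g., "visited this POI") and debiases the aggregate on the recommender side. This is a standard LDP primitive given as an assumed example, not necessarily PriRec's exact mechanism.

```python
import math
import random

def randomized_response(visited: bool, epsilon: float) -> bool:
    # Each user perturbs their own bit locally before sending it out.
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return visited if random.random() < p_truth else not visited

def estimate_visit_count(reports, epsilon):
    # Recommender-side unbiased estimate of the true count from noisy bits.
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    n = len(reports)
    observed = sum(reports)
    return (observed - n * (1 - p)) / (2 * p - 1)

# Example: 1,000 users, 300 of whom actually visited the POI.
epsilon = 1.0
truth = [i < 300 for i in range(1000)]
noisy = [randomized_response(v, epsilon) for v in truth]
print(round(estimate_visit_count(noisy, epsilon)))  # close to 300 in expectation
```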

36 citations


Journal ArticleDOI
TL;DR: A novel representation learning framework, TRajectory EMBedding via Road networks (Trembr), learns trajectory embeddings for use in a variety of trajectory applications and soundly outperforms the state-of-the-art trajectory representation learning models trajectory2vec and t2vec.
Abstract: In this article, we propose a novel representation learning framework, namely TRajectory EMBedding via Road networks (Trembr), to learn trajectory embeddings (low-dimensional feature vectors) for use in a variety of trajectory applications. The novelty of Trembr lies in (1) the design of a recurrent neural network (RNN)-based encoder-decoder model, namely Traj2Vec, that encodes spatial and temporal properties inherent in trajectories into trajectory embeddings by exploiting the underlying road networks to constrain the learning process in accordance with the matched road segments obtained using road network matching techniques (e.g., Barefoot [24, 27]), and (2) the design of a neural network-based model, namely Road2Vec, to learn road segment embeddings in road networks, capturing various relationships amongst road segments in preparation for trajectory representation learning. In addition to model design, several unique technical issues arising in Trembr, including data preparation in Road2Vec, the road segment relevance-aware loss, and the network topology constraint in Traj2Vec, are examined. To validate our ideas, we learn trajectory embeddings using multiple large-scale real-world trajectory datasets and use them in three tasks, including trajectory similarity measure, travel time prediction, and destination prediction. Empirical results show that Trembr soundly outperforms the state-of-the-art trajectory representation learning models, trajectory2vec and t2vec, by at least one order of magnitude in terms of mean rank in trajectory similarity measure, 23.3% to 41.7% in terms of mean absolute error (MAE) in travel time prediction, and 39.6% to 52.4% in terms of MAE in destination prediction.
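
To make the road-segment-embedding step concrete, the sketch below trains skip-gram embeddings over map-matched trajectories, treating each sequence of road segment IDs as a sentence. Gensim's Word2Vec is used here only as a stand-in for Road2Vec, which is the paper's own neural model with a relevance-aware loss; the segment IDs are made up.

```python
from gensim.models import Word2Vec

# Each trajectory is assumed to be already map-matched (e.g., with Barefoot)
# to a sequence of road segment IDs.
matched_trajectories = [
    ["seg_12", "seg_13", "seg_47", "seg_48"],
    ["seg_47", "seg_48", "seg_90", "seg_91"],
]

model = Word2Vec(
    sentences=matched_trajectories,
    vector_size=64,   # embedding dimensionality
    window=3,         # co-occurrence window along the trajectory
    min_count=1,
    sg=1,             # skip-gram
)
print(model.wv["seg_47"].shape)   # 64-d embedding for one road segment
```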

29 citations


Journal ArticleDOI
TL;DR: In this article, a novel multi-attribute tensor correlation neural network (MTCN) is used to predict face attributes and achieves the best performance compared with the latest multi-attribute recognition algorithms under the same settings.
Abstract: Multi-task learning plays an important role in face multi-attribute prediction. At present, most studies mine the shared information between attributes by sharing all convolutional layers. However, it is not appropriate to treat the low-level and high-level features of face attributes equally, because the high-level features are more biased toward the specific content of the category. In this article, a novel multi-attribute tensor correlation neural network (MTCN) is used to predict face attributes. MTCN shares all attribute features at the low-level layers, and then distinguishes each attribute feature at the high-level layers. To better excavate the correlations among high-level attribute features, each sub-network explores useful information from other networks to enhance its original information. Then a tensor canonical correlation analysis method is used to seek the correlations among the highest-level attributes, which enhances the original information of each attribute. After that, these features are mapped into a highly correlated space through the correlation matrix. Finally, extensive experiments on the CelebA and LFWA datasets verify the performance of MTCN, which achieves the best performance compared with the latest multi-attribute recognition algorithms under the same settings.

Journal ArticleDOI
TL;DR: This work proposes to learn discriminative features from microblog posts by following their non-sequential propagation structure and generate more powerful representations for identifying rumors, and reveals that effective rumor detection is highly related to finding evidential posts.
Abstract: Rumor spread in social media severely jeopardizes the credibility of online content. Thus, automatic debunking of rumors is of great importance to keep social media a healthy environment. While facing a dubious claim, people often dispute its truthfulness sporadically in their posts containing various cues, which can form useful evidence with long-distance dependencies. In this work, we propose to learn discriminative features from microblog posts by following their non-sequential propagation structure and generate more powerful representations for identifying rumors. For modeling non-sequential structure, we first represent the diffusion of microblog posts with propagation trees, which provide valuable clues on how a claim in the original post is transmitted and developed over time. We then present a bottom-up and a top-down tree-structured models based on Recursive Neural Networks (RvNN) for rumor representation learning and classification, which naturally conform to the message propagation process in microblogs. To enhance the rumor representation learning, we reveal that effective rumor detection is highly related to finding evidential posts, e.g., the posts expressing specific attitude towards the veracity of a claim, as an extension of the previous RvNN-based detection models that treat every post equally. For this reason, we design discriminative attention mechanisms for the RvNN-based models to selectively attend on the subset of evidential posts during the bottom-up/top-down recursive composition. Experimental results on four datasets collected from real-world microblog platforms confirm that (1) our RvNN-based models achieve much better rumor detection and classification performance than state-of-the-art approaches; (2) the attention mechanisms for focusing on evidential posts can further improve the performance of our RvNN-based method; and (3) our approach possesses superior capacity on detecting rumors at a very early stage.
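
A hedged, toy rendering of the bottom-up variant is sketched below: each post's state combines its own text features with its children's states, and attention then pools all node states so that evidential posts can dominate the claim representation. The tree encoding, GRU cell, and dimensions are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BottomUpTreeEncoder(nn.Module):
    """Toy bottom-up recursive composition over a propagation tree with
    attention pooling over all post states."""
    def __init__(self, feat_dim=32, hidden_dim=64, num_classes=4):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden_dim)
        self.attention = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)
        self.hidden_dim = hidden_dim

    def encode(self, node, features, states):
        post_id, children = node                      # node = (id, [child nodes])
        child_states = [self.encode(c, features, states) for c in children]
        context = torch.stack(child_states).sum(0) if child_states \
            else torch.zeros(self.hidden_dim)
        state = self.cell(features[post_id].unsqueeze(0), context.unsqueeze(0))[0]
        states.append(state)
        return state

    def forward(self, root, features):
        states = []
        self.encode(root, features, states)
        stacked = torch.stack(states)                     # one state per post
        weights = torch.softmax(self.attention(stacked), dim=0)
        return self.classifier((weights * stacked).sum(0))  # rumor class scores

# Toy tree: source post 0 with replies 1 and 2; reply 2 has reply 3.
tree = (0, [(1, []), (2, [(3, [])])])
features = torch.randn(4, 32)
print(BottomUpTreeEncoder()(tree, features))
```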

Journal ArticleDOI
TL;DR: This article proposes a thermal image translation method, called IR2VI, which translates thermal/infrared (IR) images into color visible (VI) images. IR2VI consists of two cascaded steps: translation from nighttime thermal IR images to gray-scale visible images (GVI), called IR-GVI, and translation from GVI to color visible images (CVI), called GVI-CVI.
Abstract: Context enhancement is critical for the environmental perception in night vision applications, especially for the dark night situation without sufficient illumination. In this article, we propose a thermal image translation method, which can translate thermal/infrared (IR) images into color visible (VI) images, called IR2VI. The IR2VI consists of two cascaded steps: translation from nighttime thermal IR images to gray-scale visible images (GVI), which is called IR-GVI; and the translation from GVI to color visible images (CVI), which is known as GVI-CVI in this article. For the first step, we develop the Texture-Net, a novel unsupervised image translation neural network based on generative adversarial networks. Texture-Net can learn the intrinsic characteristics from the GVI and integrate them into the IR image. In comparison with the state-of-the-art unsupervised image translation methods, the proposed Texture-Net is able to address some common challenges, e.g., incorrect mapping and lack of fine details, with a structure connection module and a region-of-interest focal loss. For the second step, we investigated the state-of-the-art gray-scale image colorization methods and integrate the deep convolutional neural network into the IR2VI framework. The results of the comprehensive evaluation experiments demonstrate the effectiveness of the proposed IR2VI image translation method. This solution will contribute to the environmental perception and understanding in varied night vision applications.

Journal ArticleDOI
TL;DR: This work proposes a novel pair-based active learning for Re-ID that selects pairs instead of instances from the entire dataset for annotation, and takes into account the uncertainty and the diversity in terms of pairwise relations.
Abstract: The effective training of supervised Person Re-identification (Re-ID) models requires sufficient pairwise labeled data. However, when there is limited annotation resource, it is difficult to collect pairwise labeled data. We consider a challenging and practical problem called Early Active Learning, which is applied to the early stage of experiments when there is no pre-labeled sample available as references for human annotating. Previous early active learning methods suffer from two limitations for Re-ID. First, these instance-based algorithms select instances rather than pairs, which can result in missing optimal pairs for Re-ID. Second, most of these methods only consider the representativeness of instances, which can result in selecting less diverse and less informative pairs. To overcome these limitations, we propose a novel pair-based active learning for Re-ID. Our algorithm selects pairs instead of instances from the entire dataset for annotation. Besides representativeness, we further take into account the uncertainty and the diversity in terms of pairwise relations. Therefore, our algorithm can produce the most representative, informative, and diverse pairs for Re-ID data annotation. Extensive experimental results on five benchmark Re-ID datasets have demonstrated the superiority of the proposed pair-based early active learning algorithm.

Journal ArticleDOI
TL;DR: WiSign is proposed to recognize continuous sentences of American Sign Language (ASL) with existing WiFi infrastructure; it also integrates an N-gram language model that uses the grammar rules of ASL to calibrate the recognized sign words.
Abstract: In this article, we propose WiSign that recognizes the continuous sentences of American Sign Language (ASL) with existing WiFi infrastructure. Instead of identifying the individual ASL words from the manually segmented ASL sentence in existing works, WiSign can automatically segment the original channel state information (CSI) based on the power spectral density (PSD) segmentation method. WiSign constructs a five-layer Deep Belief Network (DBN) to automatically extract the features of isolated fragments, and then uses the Hidden Markov Model (HMM) with Gaussian mixture and Forward-Backward algorithm to recognize sign words. In order to further improve the accuracy, WiSign also integrates the language model N-gram, which uses the grammar rules of ASL to calibrate the recognized results of sign words. We implement a prototype of WiSign with commercial WiFi devices and evaluate its performance in real indoor environments. The results show that WiSign achieves satisfactory accuracy when recognizing ASL sentences that involve the movements of the head, arms, hands, and fingers.
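
As a minimal sketch of PSD-based segmentation, the snippet below computes Welch power spectral density over sliding windows of a 1-D CSI amplitude stream and marks windows whose motion-band energy exceeds a simple adaptive threshold. The band limits and thresholding rule are illustrative assumptions rather than WiSign's exact procedure.

```python
import numpy as np
from scipy.signal import welch

def segment_by_psd(csi_amplitude, fs=100, win=128, step=64, threshold=None):
    """Return a boolean flag per sliding window: True = likely sign activity."""
    energies = []
    for start in range(0, len(csi_amplitude) - win, step):
        freqs, pxx = welch(csi_amplitude[start:start + win], fs=fs, nperseg=win)
        band = (freqs >= 1) & (freqs <= 20)       # rough band for hand/arm motion
        energies.append(pxx[band].sum())
    energies = np.array(energies)
    if threshold is None:
        threshold = energies.mean() + energies.std()   # simple adaptive rule
    return energies > threshold

# Synthetic stream: noise with a burst of 5 Hz "activity" in the middle.
t = np.arange(0, 20, 1 / 100)
stream = 0.1 * np.random.randn(t.size)
stream[800:1200] += np.sin(2 * np.pi * 5 * t[800:1200])
print(segment_by_psd(stream).astype(int))
```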

Journal ArticleDOI
TL;DR: A Probabilistic Matrix Factorization with Multi-Auxiliary Information (PMF-MAI) model is proposed for travel-product recommendation, fusing probabilistic matrix factorization on the user-item interaction matrix with linear regression on features constructed from multiple sources of auxiliary information.
Abstract: As an e-commerce feature, the personalized recommendation is invariably highly-valued by both consumers and merchants. The e-tourism has become one of the hottest industries with the adoption of recommendation systems. Several lines of evidence have confirmed the travel-product recommendation is quite different from traditional recommendations. Travel products are usually browsed and purchased relatively infrequently compared with other traditional products (e.g., books and food), which gives rise to the extreme sparsity of travel data. Meanwhile, the choice of a suitable travel product is affected by an army of factors such as departure, destination, and financial and time budgets. To address these challenging problems, in this article, we propose a Probabilistic Matrix Factorization with Multi-Auxiliary Information (PMF-MAI) model in the context of the travel-product recommendation. In particular, PMF-MAI is able to fuse the probabilistic matrix factorization on the user-item interaction matrix with the linear regression on a suite of features constructed by the multiple auxiliary information. In order to fit the sparse data, PMF-MAI is built by a whole-data based learning approach that utilizes unobserved data to increase the coupling between probabilistic matrix factorization and linear regression. Extensive experiments are conducted on a real-world dataset provided by a large tourism e-commerce company. PMF-MAI shows an overwhelming superiority over all competitive baselines on the recommendation performance. Also, the importance of features is examined to reveal the crucial auxiliary information having a great impact on the adoption of travel products.
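
The fused prediction described above can be written as a latent-factor inner product plus a linear term over the auxiliary features. The sketch below shows that scoring rule with made-up dimensions and random values standing in for learned parameters.

```python
import numpy as np

def pmf_mai_score(user_vec, item_vec, aux_features, weights, bias=0.0):
    """Predicted preference = latent inner product (PMF part)
    + linear regression on auxiliary features (MAI part)."""
    return float(user_vec @ item_vec + weights @ aux_features + bias)

rng = np.random.default_rng(0)
u, v = rng.normal(size=16), rng.normal(size=16)   # user / travel-product factors
x = rng.normal(size=5)                            # departure, destination, budgets, ...
w = rng.normal(size=5)                            # regression weights
print(pmf_mai_score(u, v, x, w))
```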

Journal ArticleDOI
TL;DR: The preliminary results demonstrate that DeepKey is feasible, shows consistently superior performance compared to a set of comparison methods, and has the potential to be applied to authentication deployments in real-world settings.
Abstract: Biometric authentication involves various technologies to identify individuals by exploiting their unique, measurable physiological and behavioral characteristics. However, traditional biometric authentication systems (e.g., face recognition, iris, retina, voice, and fingerprint) are at increasing risks of being tricked by biometric tools such as anti-surveillance masks, contact lenses, vocoder, or fingerprint films. In this article, we design a multimodal biometric authentication system named DeepKey, which uses both Electroencephalography (EEG) and gait signals to better protect against such risk. DeepKey consists of two key components: an Invalid ID Filter Model to block unauthorized subjects, and an identification model based on attention-based Recurrent Neural Network (RNN) to identify a subject’s EEG IDs and gait IDs in parallel. The subject can only be granted access while all the components produce consistent affirmations to match the user’s proclaimed identity. We implement DeepKey with a live deployment in our university and conduct extensive empirical experiments to study its technical feasibility in practice. DeepKey achieves the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) of 0 and 1.0%, respectively. The preliminary results demonstrate that DeepKey is feasible, shows consistent superior performance compared to a set of methods, and has the potential to be applied to the authentication deployment in real-world settings.

Journal ArticleDOI
TL;DR: Multi-source domain adaptation has received considerable attention due to its effectiveness in leveraging knowledge from multiple related sources with different distributions to enhance learning performance on the target domain.
Abstract: Multi-source domain adaptation has received considerable attention due to its effectiveness of leveraging the knowledge from multiple related sources with different distributions to enhance the learning performance. One of the fundamental challenges in multi-source domain adaptation is how to determine the amount of knowledge transferred from each source domain to the target domain. To address this issue, we propose a new algorithm, called Domain-attention Conditional Wasserstein Distance (DCWD), to learn transferred weights for evaluating the relatedness across the source and target domains. In DCWD, we design a new conditional Wasserstein distance objective function by taking the label information into consideration to measure the distance between a given source domain and the target domain. We also develop an attention scheme to compute the transferred weights of different source domains based on their conditional Wasserstein distances to the target domain. After that, the transferred weights can be used to reweight the source data to determine their importance in knowledge transfer. We conduct comprehensive experiments on several real-world data sets, and the results demonstrate the effectiveness and efficiency of the proposed method.
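
The attention scheme boils down to turning per-source distances into normalized transfer weights and using them to reweight source samples. The sketch below shows one simple instantiation, a softmax over negative distances; the computation of the conditional Wasserstein distances themselves is omitted, and the temperature is an assumed parameter.

```python
import numpy as np

def source_transfer_weights(distances, temperature=1.0):
    """Softmax over negative source-to-target distances: sources with a
    smaller conditional Wasserstein distance receive larger weights."""
    scores = -np.asarray(distances, dtype=float) / temperature
    scores -= scores.max()            # numerical stability
    w = np.exp(scores)
    return w / w.sum()

# Example: three source domains with hypothetical distances to the target.
print(source_transfer_weights([0.8, 0.3, 1.5]))   # the second source dominates
```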

Journal ArticleDOI
TL;DR: This article designs busCharging, a pricing-aware real-time charging scheduling system based on Markov Decision Process to reduce the overall charging and operating costs for city-scale electric bus fleets, taking the time-variant electricity pricing into account.
Abstract: We are witnessing a rapid growth of electrified vehicles due to the ever-increasing concerns on urban air quality and energy security. Compared to other types of electric vehicles, electric buses have not yet been prevailingly adopted worldwide due to their high owning and operating costs, long charging time, and the uneven spatial distribution of charging facilities. Moreover, the highly dynamic environment factors such as unpredictable traffic congestion, different passenger demands, and even the changing weather can significantly affect electric bus charging efficiency and potentially hinder the further promotion of large-scale electric bus fleets. To address these issues, in this article, we first analyze a real-world dataset including massive data from 16,359 electric buses, 1,400 bus lines, and 5,562 bus stops. Then, we investigate the electric bus network to understand its operating and charging patterns, and further verify the necessity and feasibility of a real-time charging scheduling. With such understanding, we design busCharging, a pricing-aware real-time charging scheduling system based on Markov Decision Process to reduce the overall charging and operating costs for city-scale electric bus fleets, taking the time-variant electricity pricing into account. To show the effectiveness of busCharging, we implement it with the real-world data from Shenzhen, which includes GPS data of electric buses, the metadata of all bus lines and bus stops, combined with data of 376 charging stations for electric buses. The evaluation results show that busCharging dramatically reduces the charging cost by 23.7% and 12.8% of electricity usage simultaneously. Finally, we design a scheduling-based charging station expansion strategy to verify our busCharging is also effective during the charging station expansion process.

Journal ArticleDOI
TL;DR: This work inversely learns taxi drivers' preferences from data and characterizes their dynamics over time, extracting two types of features to model the drivers' decision space and learning the drivers' preferences with respect to these features via inverse reinforcement learning.
Abstract: Many real-world human behaviors can be modeled and characterized as sequential decision-making processes, such as a taxi driver’s choices of working regions and times. Each driver possesses unique preferences over these sequential choices, which evolve over time and shape the driver’s working efficiency. Understanding the dynamics of such preferences helps accelerate the learning process of taxi drivers. Prior works on taxi operation management mostly focus on finding optimal driving strategies or routes, lacking in-depth analysis of what drivers learn during the process and how this affects their performance. In this work, we make the first attempt to establish Dynamic Human Preference Analytics. We inversely learn the taxi drivers’ preferences from data and characterize the dynamics of such preferences over time. We extract two types of features (i.e., profile features and habit features) to model the decision space of drivers. Then, through inverse reinforcement learning, we learn the preferences of drivers with respect to these features. The results illustrate that self-improving drivers tend to keep adjusting their preferences for habit features to increase their earning efficiency, while keeping their preferences for profile features invariant. In contrast, experienced drivers have stable preferences over time, and exploring drivers tend to adjust their preferences randomly over time.

Journal ArticleDOI
TL;DR: In this paper, the authors propose FMC_TA, a novel incomplete task allocation algorithm that allows tasks to be easily sequenced to yield high-quality solutions in realistic multi-agent team applications.
Abstract: Realistic multi-agent team applications often feature dynamic environments with soft deadlines that penalize late execution of tasks. This puts a premium on quickly allocating tasks to agents. However, when such problems include temporal and spatial constraints that require tasks to be executed sequentially by agents, they are NP-hard, and thus are commonly solved using general and specifically designed incomplete heuristic algorithms. We propose FMC_TA, a novel such incomplete task allocation algorithm that allows tasks to be easily sequenced to yield high-quality solutions. FMC_TA first finds allocations that are fair (envy-free), balancing the load and sharing important tasks among agents, and efficient (Pareto optimal) in a simplified version of the problem. It computes such allocations in polynomial or pseudo-polynomial time (centrally or distributedly, respectively) using a Fisher market with agents as buyers and tasks as goods. It then heuristically schedules the allocations, taking into account inter-agent constraints on shared tasks. We empirically compare our algorithm to state-of-the-art incomplete methods, both centralized and distributed, on law enforcement problems inspired by real police logs. We present a novel formalization of the law enforcement problem, which we use to perform our empirical study. The results show a clear advantage for FMC_TA in total utility and in measures in which law enforcement authorities measure their own performance. Besides problems with realistic properties, the algorithms were compared on synthetic problems in which we increased the size of different elements of the problem to investigate the algorithm’s behavior when the problem scales. The domination of the proposed algorithm was found to be consistent.

Journal ArticleDOI
TL;DR: Two approaches, DUP and RNNPlanner, are proposed to discover target plans based on vector representations of actions learned from plan corpora; both are capable of discovering underlying plans that are not from plan libraries, without requiring action models to be provided.
Abstract: Plan recognition aims to discover target plans (i.e., sequences of actions) behind observed actions, with history plan libraries or action models in hand. Previous approaches either discover plans by maximally “matching” observed actions to plan libraries, assuming target plans are from plan libraries, or infer plans by executing action models to best explain the observed actions, assuming that complete action models are available. In real-world applications, however, target plans are often not from plan libraries, and complete action models are often not available, since building complete sets of plans and complete action models are often difficult or expensive. In this article, we view plan libraries as corpora and learn vector representations of actions using the corpora; we then discover target plans based on the vector representations. Specifically, we propose two approaches, DUP and RNNPlanner, to discover target plans based on vector representations of actions. DUP explores the EM-style (Expectation Maximization) framework to capture local contexts of actions and discover target plans by optimizing the probability of target plans, while RNNPlanner aims to leverage long-short term contexts of actions based on RNNs (Recurrent Neural Networks) framework to help recognize target plans. In the experiments, we empirically show that our approaches are capable of discovering underlying plans that are not from plan libraries without requiring action models provided. We demonstrate the effectiveness of our approaches by comparing its performance to traditional plan recognition approaches in three planning domains. We also compare DUP and RNNPlanner to see their advantages and disadvantages.
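
To illustrate the "plan libraries as corpora" idea, the sketch below learns action embeddings with a skip-gram model over plan traces and ranks candidate actions for an unobserved step by similarity to the observed context. Gensim's Word2Vec is a convenient stand-in for DUP's EM-style model, and the toy plans are invented.

```python
from gensim.models import Word2Vec

# Plan libraries viewed as corpora: each plan is a sequence of action names.
plans = [
    ["pick-up", "stack", "pick-up", "stack"],
    ["unstack", "put-down", "pick-up", "stack"],
    ["unstack", "stack", "unstack", "put-down"],
]
model = Word2Vec(plans, vector_size=32, window=2, min_count=1, sg=1)

def suggest_missing_action(context_actions, model, topn=3):
    """Rank candidates for an unobserved step by similarity to the
    averaged embeddings of the observed context actions."""
    return model.wv.most_similar(positive=context_actions, topn=topn)

print(suggest_missing_action(["pick-up", "stack"], model))
```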

Journal ArticleDOI
TL;DR: This article investigates blackmarket customers engaged in collusive retweeting activities and adopts Weighted Generalized Canonical Correlation Analysis (WGCCA) to combine multiple user representations into embeddings that effectively classify users as genuine users, bots, promotional customers, or normal customers.
Abstract: With the rise in popularity of social media platforms like Twitter, having higher influence on these platforms has a greater value attached to it, since it has the power to influence many decisions in the form of brand promotions and shaping opinions. However, blackmarket services that allow users to inorganically gain influence are a threat to the credibility of these social networking platforms. Twitter users can gain inorganic appraisals in the form of likes, retweets, and follows through these blackmarket services either by paying for them or by joining syndicates wherein they gain such appraisals by providing similar appraisals to other users. These customers tend to exhibit a mix of organic and inorganic retweeting behavior, making it tougher to detect them. In this article, we investigate these blackmarket customers engaged in collusive retweeting activities. We collect and annotate a novel dataset containing various types of information about blackmarket customers and use these sources of information to construct multiple user representations. We adopt Weighted Generalized Canonical Correlation Analysis (WGCCA) to combine these individual representations to derive user embeddings that allow us to effectively classify users as: genuine users, bots, promotional customers, and normal customers. Our method significantly outperforms state-of-the-art approaches (32.95% better macro F1-score than the best baseline).

Journal ArticleDOI
TL;DR: Three social science theories, Emotional Information, Diffusion of Innovations, and Individual Personality, are used to guide signed link analysis and to address the problem of data sparsity, i.e., that only a small percentage of signed links is given.
Abstract: Many real-world relations can be represented by signed networks with positive links (e.g., friendships and trust) and negative links (e.g., foes and distrust). Link prediction helps advance tasks in social network analysis such as recommendation systems. Most existing work on link analysis focuses on unsigned social networks. The existence of negative links piques research interests in investigating whether properties and principles of signed networks differ from those of unsigned networks and mandates dedicated efforts on link analysis for signed social networks. Recent findings suggest that properties of signed networks substantially differ from those of unsigned networks and negative links can be of significant help in signed link analysis in complementary ways. In this article, we center our discussion on a challenging problem of signed link analysis. Signed link analysis faces the problem of data sparsity, i.e., only a small percentage of signed links are given. This problem can even get worse when negative links are much sparser than positive ones as users are inclined more toward positive disposition rather than negative. We investigate how we can take advantage of other sources of information for signed link analysis. This research is mainly guided by three social science theories, Emotional Information, Diffusion of Innovations, and Individual Personality. Guided by these, we extract three categories of related features and leverage them for signed link analysis. Experiments show the significance of the features gleaned from social theories for signed link prediction and addressing the data sparsity challenge.

Journal ArticleDOI
TL;DR: Results show that supervised rank aggregation methods improve the recommended rankings in six out of seven datasets and remain robust even in the presence of a large set of weak recommendation rankings.
Abstract: Recommender Systems are tools designed to help users find relevant information from the myriad of content available online. They work by actively suggesting items that are relevant to users according to their historical preferences or observed actions. Among recommender systems, top-N recommenders work by suggesting a ranking of N items that can be of interest to a user. Although a significant number of top-N recommenders have been proposed in the literature, they often disagree in their returned rankings, offering an opportunity for improving the final recommendation ranking by aggregating the outputs of different algorithms. Rank aggregation was successfully used in a significant number of areas, but only a few rank aggregation methods have been proposed in the recommender systems literature. Furthermore, there is a lack of studies regarding rankings’ characteristics and their possible impacts on the improvements achieved through rank aggregation. This work presents an extensive two-phase experimental analysis of rank aggregation in recommender systems. In the first phase, we investigate the characteristics of rankings recommended by 15 different top-N recommender algorithms regarding agreement and diversity. In the second phase, we look at the results of 19 rank aggregation methods and identify different scenarios where they perform best or worst according to the input rankings’ characteristics. Our results show that supervised rank aggregation methods provide improvements in the results of the recommended rankings in six out of seven datasets. These methods provide robustness even in the presence of a big set of weak recommendation rankings. However, in cases where there was a set of non-diverse high-quality input rankings, supervised and unsupervised algorithms produced similar results. In these cases, we can avoid the cost of the former in favor of the latter.
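
For readers unfamiliar with rank aggregation, a Borda-style count is one of the simplest unsupervised schemes of the kind compared in such studies: each input ranking votes for its items by position and the votes are summed. The sketch below is a generic illustration, not one of the 19 methods evaluated in the article.

```python
from collections import defaultdict

def borda_aggregate(rankings, top_n=None):
    """Aggregate several top-N lists: an item at position p in a list of
    length L earns L - p points; items are re-ranked by total points."""
    scores = defaultdict(float)
    for ranking in rankings:
        length = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += length - pos
    merged = sorted(scores, key=scores.get, reverse=True)
    return merged[:top_n] if top_n else merged

# Three recommenders disagree; aggregation promotes broadly supported items.
lists = [["a", "b", "c"], ["b", "a", "d"], ["c", "b", "a"]]
print(borda_aggregate(lists, top_n=3))   # ['b', 'a', 'c']
```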

Journal ArticleDOI
TL;DR: This article proposes a novel end-to-end approach for visual similarity modeling, called deep neighborhood component analysis, which discriminatively trains deep neural networks to jointly learn visual features and similarities.
Abstract: Learning effective visual similarity is an essential problem in multimedia research. Despite the promising progress made in recent years, most existing approaches learn visual features and similarities in two separate stages, which inevitably limits their performance. Once useful information has been lost in the feature extraction stage, it can hardly be recovered later. This article proposes a novel end-to-end approach for visual similarity modeling, called deep neighborhood component analysis, which discriminatively trains deep neural networks to jointly learn visual features and similarities. Specifically, we first formulate a metric learning objective that maximizes the intra-class correlations and minimizes the inter-class correlations under the neighborhood component analysis criterion, and then train deep convolutional neural networks to learn a nonlinear mapping that projects visual instances from original feature space to a discriminative and neighborhood-structure-preserving embedding space, thus resulting in better performance. We conducted extensive evaluations on several widely used and challenging datasets, and the impressive results demonstrate the effectiveness of our proposed approach.
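
For reference, the classical neighborhood component analysis criterion on a batch of embeddings can be sketched as below: each point should, with high probability, select a same-class point as its stochastic nearest neighbor under a softmax over negative squared distances. This is a simplified, assumed rendering of the criterion, not the paper's exact correlation-based formulation.

```python
import torch

def nca_loss(embeddings, labels):
    """Negative log-probability of picking a same-class neighbor."""
    n = embeddings.size(0)
    dists = torch.cdist(embeddings, embeddings) ** 2
    eye = torch.eye(n, dtype=torch.bool)
    dists = dists.masked_fill(eye, float("inf"))      # a point never picks itself
    log_p = torch.log_softmax(-dists, dim=1)          # neighbor-selection log-probs
    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    p_correct = (log_p.exp() * same_class).sum(dim=1) # mass on same-class neighbors
    return -torch.log(p_correct + 1e-12).mean()

# Toy usage with random vectors standing in for CNN features.
z = torch.randn(8, 16, requires_grad=True)
y = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = nca_loss(z, y)
loss.backward()
```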

Journal ArticleDOI
TL;DR: HERA integrates the strengths of both the pairwise ranking loss and the pointwise reconstruction loss to provide informative label ranking and reconstruction information for label identification, whereas the embedded sparse and low-rank scheme constrains the sparsity of the ground-truth label matrix and the low rank of the noise label matrix to explore the global label relevance among the whole training data.
Abstract: Partial label learning (PLL) aims to learn from the data where each training instance is associated with a set of candidate labels, among which only one is correct. Most existing methods deal with this type of problem by either treating each candidate label equally or identifying the ground-truth label iteratively. In this article, we propose a novel PLL approach named HERA, which simultaneously incorporates the HeterogEneous Loss and the SpaRse and Low-rAnk procedure to estimate the labeling confidence for each instance while training the desired model. Specifically, the heterogeneous loss integrates the strengths of both the pairwise ranking loss and the pointwise reconstruction loss to provide informative label ranking and reconstruction information for label identification, whereas the embedded sparse and low-rank scheme constrains the sparsity of ground-truth label matrix and the low rank of noise label matrix to explore the global label relevance among the whole training data, for improving the learning model. Comprehensive ablation study demonstrates the effectiveness of our employed heterogeneous loss, and extensive experiments on both artificial and real-world datasets demonstrate that our method achieves superior or comparable performance against state-of-the-art methods.

Journal ArticleDOI
TL;DR: A multi-task learning framework with deep neural networks (DNNs) to jointly learn and optimize two companion tasks in Web search engines: entity recommendation and document ranking, which can be easily trained in an end-to-end manner.
Abstract: Entity recommendation, providing users with an improved search experience by proactively recommending related entities to a given query, has become an indispensable feature of today’s Web search engine. Existing studies typically only consider the query issued at the current timestep while ignoring the in-session user search behavior (short-term search history) or historical user search behavior across all sessions (long-term search history) when generating entity recommendations. As a consequence, they may fail to recommend entities of interest relevant to a user’s actual information need. In this work, we believe that both short-term and long-term search history convey valuable evidence that could help understand the user’s search intent behind a query, and take both of them into consideration for entity recommendation. Furthermore, there has been little work on exploring whether the use of other companion tasks in Web search such as document ranking as auxiliary tasks could improve the performance of entity recommendation. To this end, we propose a multi-task learning framework with deep neural networks (DNNs) to jointly learn and optimize two companion tasks in Web search engines: entity recommendation and document ranking, which can be easily trained in an end-to-end manner. Specifically, we regard document ranking as an auxiliary task to improve the main task of entity recommendation, where the representations of queries, sessions, and users are shared across all tasks and optimized by the multi-task objective during training. We evaluate our approach using large-scale, real-world search logs of a widely-used commercial Web search engine. We also performed extensive ablation experiments over a number of facets of the proposed multi-task DNN model to figure out their relative importance. The experimental results show that both short-term and long-term search history can bring significant improvements in recommendation effectiveness, and the combination of both outperforms using either of them individually. In addition, the experiments show that the performance of both entity recommendation and document ranking can be significantly improved, which demonstrates the effectiveness of using multi-task learning to jointly optimize the two companion tasks in Web search.
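
A schematic version of the hard-parameter-sharing setup described above is sketched below: a shared encoder over query/session/user features feeds an entity-recommendation head and an auxiliary document-ranking head, and the two losses are combined. Layer sizes, the auxiliary loss, and the 0.5 weight are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiTaskSearchModel(nn.Module):
    """Shared encoder with two heads: entity recommendation (main task)
    and document ranking (auxiliary task)."""
    def __init__(self, input_dim=256, hidden_dim=128, num_entities=1000):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.entity_head = nn.Linear(hidden_dim, num_entities)   # scores over entities
        self.ranking_head = nn.Linear(hidden_dim, 1)              # document relevance

    def forward(self, features):
        shared = self.shared(features)
        return self.entity_head(shared), self.ranking_head(shared).squeeze(-1)

model = MultiTaskSearchModel()
x = torch.randn(4, 256)                        # stand-in query/session/user features
entity_scores, doc_scores = model(x)
main_loss = nn.functional.cross_entropy(entity_scores, torch.tensor([1, 2, 3, 4]))
aux_loss = nn.functional.mse_loss(doc_scores, torch.rand(4))   # toy relevance targets
(main_loss + 0.5 * aux_loss).backward()
```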

Journal ArticleDOI
TL;DR: This article proposes a new learning framework, called Representation Learning for Road Networks (RLRN), which explores various intrinsic properties of road networks to learn embeddings of intersections and road segments in road networks.
Abstract: Informative representation of road networks is essential to a wide variety of applications on intelligent transportation systems. In this article, we design a new learning framework, called Representation Learning for Road Networks (RLRN), which explores various intrinsic properties of road networks to learn embeddings of intersections and road segments in road networks. To implement the RLRN framework, we propose a new neural network model, namely Road Network to Vector (RN2Vec), to learn embeddings of intersections and road segments jointly by exploring geo-locality and homogeneity of them, topological structure of the road networks, and moving behaviors of road users. In addition to model design, issues involving data preparation for model training are examined. We evaluate the learned embeddings via extensive experiments on several real-world datasets using different downstream test cases, including node/edge classification and travel time estimation. Experimental results show that the proposed RN2Vec robustly outperforms existing methods, including (i) Feature-based methods: raw features and principal components analysis (PCA); (ii) Network embedding methods: DeepWalk, LINE, and Node2vec; and (iii) Features + Network structure-based methods: network embeddings and PCA, graph convolutional networks, and graph attention networks. RN2Vec significantly outperforms all of them in terms of F1-score in classifying traffic signals (11.96% to 16.86%) and crossings (11.36% to 16.67%) on intersections and in classifying avenue (10.56% to 15.43%) and street (11.54% to 16.07%) on road segments, as well as in terms of Mean Absolute Error in travel time estimation (17.01% to 23.58%).

Journal ArticleDOI
TL;DR: A novel single-image snow removal framework composed of a sparse image approximation module and an adaptive tolerance optimization module, which achieves better snow removal efficacy than other state-of-the-art techniques in both objective and subjective evaluations.
Abstract: Images are often corrupted by natural obscuration (e.g., snow, rain, and haze) during acquisition in bad weather conditions. The removal of snowflakes from only a single image is a challenging task due to situational variety and has been investigated only rarely. In this article, we propose a novel snow removal framework for a single image, which can be separated into a sparse image approximation module and an adaptive tolerance optimization module. The first proposed module takes the advantage of sparsity-based regularization to reconstruct a potential snow-free image. An auto-tuning mechanism for this framework is then proposed to seek a better reconstruction of a snow-free image via the time-varying inertia weight particle swarm optimizers in the second proposed module. Through collaboration of these two modules iteratively, the number of snowflakes in the reconstructed image is reduced as generations progress. By the experimental results, the proposed method achieves a better efficacy of snow removal than do other state-of-the-art techniques via both objective and subjective evaluations. As a result, the proposed method is able to remove snowflakes successfully from only a single image while preserving most original object structure information.