
Showing papers by "Ivor W. Tsang published in 2020"


Journal ArticleDOI
TL;DR: The four Vs of multi-output learning, i.e., volume, velocity, variety, and veracity, are characterized, and the ways in which the four Vs both benefit and bring challenges to multi-output learning are examined, taking inspiration from big data.
Abstract: The aim of multi-output learning is to simultaneously predict multiple outputs given an input. It is an important learning problem for decision-making since making decisions in the real world often involves multiple complex factors and criteria. In recent times, an increasing number of research studies have focused on ways to predict multiple outputs at once. Such efforts have transpired in different forms according to the particular multi-output learning problem under study. Classic cases of multi-output learning include multi-label learning, multi-dimensional learning, multi-target regression, and others. From our survey of the topic, we were struck by a lack in studies that generalize the different forms of multi-output learning into a common framework. This article fills that gap with a comprehensive review and analysis of the multi-output learning paradigm. In particular, we characterize the four Vs of multi-output learning, i.e., volume, velocity, variety, and veracity, and the ways in which the four Vs both benefit and bring challenges to multi-output learning by taking inspiration from big data. We analyze the life cycle of output labeling, present the main mathematical definitions of multi-output learning, and examine the field’s key challenges and corresponding solutions as found in the literature. Several model evaluation metrics and popular data repositories are also discussed. Last but not least, we highlight some emerging challenges with multi-output learning from the perspective of the four Vs as potential research directions worthy of further studies.

124 citations


Proceedings Article
30 Apr 2020
TL;DR: This paper studies the 0-1 loss, which has a monotonic relationship with an empirical adversary (reweighted) risk, and proposes a very simple and efficient loss, i.e. the curriculum loss (CL), which bridges curriculum learning and robust learning.
Abstract: Deep neural networks (DNNs) have great expressive power, which can even memorize samples with wrong labels. It is therefore vitally important to ensure robustness and generalization in DNNs against label corruption. To this end, this paper studies the 0-1 loss, which has a monotonic relationship with the empirical adversary (reweighted) risk (Hu et al. 2018). Although the 0-1 loss is robust to outliers, it is also difficult to optimize. To efficiently optimize the 0-1 loss while keeping its robust properties, we propose a very simple and efficient loss, i.e. the curriculum loss (CL). Our CL is a tighter upper bound of the 0-1 loss compared with conventional summation-based surrogate losses. Moreover, CL can adaptively select samples for stagewise training. As a result, our loss can be viewed as a novel curriculum sample selection strategy that bridges curriculum learning and robust learning. Experimental results on noisy MNIST, CIFAR-10 and CIFAR-100 datasets validate the robustness of the proposed loss.
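
To make the selection mechanism concrete, here is a minimal, hypothetical sketch (Python/NumPy, not the paper's exact CL formulation) of curriculum-style sample selection: per-sample surrogate losses below a threshold are kept for the current training stage, while the remaining samples contribute only their 0-1 loss, so the total stays an upper bound of the batch 0-1 loss whenever the surrogate upper-bounds the 0-1 loss per sample (e.g., a hinge loss).

```python
import numpy as np

def curriculum_selected_loss(per_sample_losses, threshold=1.0):
    """Toy curriculum-style selection (illustrative, not the paper's exact CL).

    Assumes each surrogate loss upper-bounds the per-sample 0-1 loss (e.g., hinge).
    Samples with loss below `threshold` are kept for the current training stage;
    the rest contribute only their 0-1 loss, so the returned value is a tighter
    upper bound than summing all surrogate losses.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    selected = losses < threshold          # samples used at this training stage
    bound = losses[selected].sum() + float((~selected).sum())
    return selected, bound

# Usage: two large-loss (likely mislabeled) samples are excluded from the stage.
sel, val = curriculum_selected_loss([0.1, 0.4, 2.3, 0.2, 5.0])
print(sel, val)   # [ True  True False  True False] 2.7
```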

87 citations


Posted Content
TL;DR: A formal definition of Label-Noise Representation Learning is clarified from the perspective of machine learning and the reason why noisy labels affect deep models' performance is figured out.
Abstract: Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios. However, statistical-learning-based methods may not train deep learning models robustly with these noisy labels. Therefore, it is urgent to design Label-Noise Representation Learning (LNRL) methods for robustly training deep models with noisy labels. To fully understand LNRL, we conduct a survey study. We first clarify a formal definition for LNRL from the perspective of machine learning. Then, via the lens of learning theory and empirical study, we figure out why noisy labels affect deep models' performance. Based on the theoretical guidance, we categorize different LNRL methods into three directions. Under this unified taxonomy, we provide a thorough discussion of the pros and cons of different categories. More importantly, we summarize the essential components of robust LNRL, which can spark new directions. Lastly, we propose possible research directions within LNRL, such as new datasets, instance-dependent LNRL, and adversarial LNRL. We also envision potential directions beyond LNRL, such as learning with feature-noise, preference-noise, domain-noise, similarity-noise, graph-noise and demonstration-noise.

80 citations


Proceedings Article
21 Nov 2020
TL;DR: Stochastic integrated gradient underweighted ascent (SIGUA) is proposed: within each mini-batch, it performs gradient descent on good data and learning-rate-reduced gradient ascent on bad data, where goodness or badness is defined w.r.t. desired or undesired memorization given a base learning method.
Abstract: Given data with noisy labels, over-parameterized deep networks can gradually memorize the data, and fit everything in the end. Although equipped with corrections for noisy labels, many learning methods in this area still suffer from overfitting due to undesired memorization. In this paper, to relieve this issue, we propose stochastic integrated gradient underweighted ascent (SIGUA): in a mini-batch, we adopt gradient descent on good data as usual, and learning-rate-reduced gradient ascent on bad data; the proposal is a versatile approach in which data goodness or badness is defined w.r.t. desired or undesired memorization given a base learning method. Technically, SIGUA pulls optimization back for generalization when their goals conflict with each other; philosophically, SIGUA shows that forgetting undesired memorization can reinforce desired memorization. Experiments demonstrate that SIGUA successfully robustifies two typical base learning methods, so that their performance is often significantly improved.
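
The descent/ascent split described above can be sketched in a few lines; the following PyTorch-style snippet is only an illustration (the `good_mask` criterion, the scaling factor `gamma`, and the per-sample loss setup are assumptions, not the authors' code).

```python
import torch

def sigua_style_step(model, optimizer, criterion, x, y, good_mask, gamma=0.01):
    """One mini-batch update in the spirit of SIGUA (illustrative sketch).

    criterion must return per-sample losses (e.g., built with reduction='none').
    good_mask flags samples treated as desired memorization; how it is obtained
    (e.g., small-loss selection) depends on the base learning method.
    """
    optimizer.zero_grad()
    losses = criterion(model(x), y)            # per-sample losses
    good = losses[good_mask].sum()             # gradient descent on good data
    bad = losses[~good_mask].sum()             # underweighted ascent on bad data
    (good - gamma * bad).backward()
    optimizer.step()
    return losses.detach()
```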

79 citations


Posted Content
TL;DR: There has been a lack of systematic studies that focus explicitly on analyzing the emerging trends and new challenges of multi-label learning in the era of big data, and it is imperative to call for a comprehensive survey to fulfill this mission and delineate future research directions and new applications.
Abstract: Exabytes of data are generated daily by humans, leading to the growing need for new efforts in dealing with the grand challenges for multi-label learning brought by big data. For example, extreme multi-label classification is an active and rapidly growing research area that deals with classification tasks with an extremely large number of classes or labels; utilizing massive data with limited supervision to build a multi-label classification model becomes valuable for practical applications, etc. Besides these, there are tremendous efforts on how to harvest the strong learning capability of deep learning to better capture the label dependencies in multi-label learning, which is the key for deep learning to address real-world classification tasks. However, it is noted that there has been a lack of systematic studies that focus explicitly on analyzing the emerging trends and new challenges of multi-label learning in the era of big data. It is imperative to call for a comprehensive survey to fulfill this mission and delineate future research directions and new applications.

72 citations


Journal ArticleDOI
TL;DR: In this paper, a novel multi-view co-clustering method based on bipartite graphs is proposed, together with an efficient algorithm to optimize this model with theoretically guaranteed convergence.

62 citations


Journal ArticleDOI
TL;DR: In this paper, a variance reduced stochastic gradient descent (VR-SGD) algorithm is proposed to solve non-smooth and non-strongly convex problems directly without any reduction techniques.
Abstract: In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors of VR-SGD are set to the average and last iterate of the previous epoch, respectively. The settings allow us to use much larger learning rates, and also make our convergence analysis more challenging. We also design two different update rules for smooth and non-smooth objective functions, respectively, which means that VR-SGD can tackle non-smooth and/or non-strongly convex problems directly without any reduction techniques. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems, which show that VR-SGD attains linear convergence. Different from most algorithms that have no convergence guarantees for non-strongly convex problems, we also provide the convergence guarantees of VR-SGD for this case, and empirically verify that VR-SGD with varying learning rates achieves similar performance to its momentum accelerated variant that has the optimal convergence rate $\mathcal{O}(1/T^2)$. Finally, we apply VR-SGD to solve various machine learning problems, such as convex and non-convex empirical risk minimization, and leading eigenvalue computation. Experimental results show that VR-SGD converges significantly faster than SVRG and Prox-SVRG, and usually outperforms state-of-the-art accelerated methods, e.g., Katyusha.
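
As a rough illustration of the snapshot/starting-point choice described above (a sketch under simplifying assumptions, not the authors' implementation), the following NumPy loop uses the SVRG-style variance-reduced gradient while setting the snapshot to the average iterate of the previous epoch and starting each epoch from the last iterate.

```python
import numpy as np

def vr_sgd(grad_i, x0, n, epochs=30, m=None, eta=0.1, seed=0):
    """Illustrative VR-SGD-style loop.

    grad_i(x, i): stochastic gradient of the i-th component function at x.
    Snapshot = average iterate of the previous epoch; the next epoch starts
    from the last iterate, as described in the abstract above.
    """
    rng = np.random.default_rng(seed)
    m = m or n
    x, snapshot = x0.copy(), x0.copy()
    for _ in range(epochs):
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        iterates = []
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(snapshot, i) + full_grad  # variance-reduced gradient
            x = x - eta * v
            iterates.append(x.copy())
        snapshot = np.mean(iterates, axis=0)   # average iterate becomes the snapshot
    return x                                   # last iterate starts the next epoch

# Usage: least squares f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
A, b = np.random.randn(200, 5), np.random.randn(200)
x_hat = vr_sgd(lambda x, i: (A[i] @ x - b[i]) * A[i], np.zeros(5), n=200)
print(np.linalg.norm(A.T @ (A @ x_hat - b)) / 200)   # should be close to zero
```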

50 citations


Journal ArticleDOI
TL;DR: This research investigates the use of a convolutional neural network (CNN) for feature extraction and classification in land cover mapping using high-resolution orthophotos, and demonstrates that the proposed ZSL approach is a promising tool for land cover mapping based on high-resolution photos.
Abstract: Zero-shot learning (ZSL) is an approach to classify objects unseen during the training phase and has been shown to be useful for real-world applications, especially when there is a lack of sufficient training data. Only a limited number of works have been carried out on ZSL, especially in the field of remote sensing. This research investigates the use of a convolutional neural network (CNN) as a feature extraction and classification method for land cover mapping using high-resolution orthophotos. In the feature extraction phase, we used a CNN model with a single convolutional layer to extract discriminative features. In the second phase, we used class attributes learned from the Word2Vec model (pre-trained by Google News) to train a second CNN model that performed class signature prediction by using both the features extracted by the first CNN and class attributes during training and only the features during prediction. We trained and tested our models on datasets collected over two subareas in the Cameron Highlands (training dataset, first test dataset) and Ipoh (second test dataset) in Malaysia. Several experiments have been conducted on the feature extraction and classification models regarding the main parameters, such as the network's layers and depth, number of filters, and the impact of Gaussian noise. As a result, the best models were selected using various accuracy metrics such as top-k categorical accuracy for k = [1,2,3], Recall, Precision, and F1-score. The best model for feature extraction achieved 0.953 F1-score, 0.941 precision, 0.882 recall for the training dataset and 0.904 F1-score, 0.869 precision, 0.949 recall for the first test dataset, and 0.898 F1-score, 0.870 precision, 0.838 recall for the second test dataset. The best model for classification achieved an average of 0.778 top-one, 0.890 top-two and 0.942 top-three accuracy, 0.798 F1-score, 0.766 recall and 0.838 precision for the first test dataset and 0.737 top-one, 0.906 top-two, 0.924 top-three, 0.729 F1-score, 0.676 recall and 0.790 precision for the second test dataset. The results demonstrated that the proposed ZSL is a promising tool for land cover mapping based on high-resolution photos.

45 citations


Journal ArticleDOI
TL;DR: Andre, a new ANDroid Hybrid REpresentation learning approach, clusters weakly-labeled Android malware by preserving heterogeneous information from multiple sources to jointly learn a hybrid representation for accurate clustering.
Abstract: Labeling malware or malware clustering is important for identifying new security threats, triaging and building reference datasets. The state-of-the-art Android malware clustering approaches rely heavily on the raw labels from commercial AntiVirus (AV) vendors, which causes misclustering for a substantial number of weakly-labeled malware due to the inconsistent, incomplete and overly generic labels reported by these closed-source AV engines, whose capabilities vary greatly and whose internal mechanisms are opaque (i.e., intermediate detection results are unavailable for clustering). The raw labels are thus often used as the only important source of information for clustering. To address the limitations of the existing approaches, this paper presents Andre, a new ANDroid Hybrid REpresentation Learning approach to clustering weakly-labeled Android malware by preserving heterogeneous information from multiple sources (including the results of static code analysis, the meta-information of an app, and the raw-labels of the AV vendors) to jointly learn a hybrid representation for accurate clustering. The learned representation is then fed into our outlier-aware clustering to partition the weakly-labeled malware into known and unknown families. The malware whose malicious behaviours are close to those of the existing families on the network are further classified using a three-layer Deep Neural Network (DNN). The unknown malware are clustered using a standard density-based clustering algorithm. We have evaluated our approach using 5,416 ground-truth malware from Drebin and 9,000 malware from VirusShare (uploaded between Mar. 2017 and Feb. 2018), including 3,324 weakly-labeled malware. The evaluation shows that Andre effectively clusters weakly-labeled malware which cannot be clustered by the state-of-the-art approaches, while achieving comparable accuracy with those approaches for clustering ground-truth samples.

42 citations


Journal ArticleDOI
TL;DR: This article makes a shared-latent space assumption on graphs and develops a novel distribution matching-based GNN called structure-attribute transformer (SAT) for attribute-missing graphs that shows better performance than other methods on both link prediction and node attribute completion tasks.
Abstract: Graphs with complete node attributes have been widely explored recently. In practice, however, there are graphs where attributes of only some nodes are available while those of the others are entirely missing. Such attribute-missing graphs are related to numerous real-world applications, and there are limited studies investigating the corresponding learning problems. Existing graph learning methods, including the popular GNN, cannot provide satisfactory learning performance since they are not designed for attribute-missing graphs. Designing a new GNN for these graphs is therefore a pressing issue for the graph learning community. In this paper, we make a shared-latent space assumption on graphs and develop a novel distribution-matching-based GNN called structure-attribute transformer (SAT) for attribute-missing graphs. SAT leverages structures and attributes in a decoupled scheme and achieves joint distribution modeling of structures and attributes by distribution matching techniques. It can perform not only the link prediction task but also the newly introduced node attribute completion task. Furthermore, practical measures are introduced to quantify the performance of node attribute completion. Extensive experiments on seven real-world datasets indicate that SAT shows better performance than other methods on both link prediction and node attribute completion tasks.

41 citations


Proceedings Article
01 Jan 2020
TL;DR: A novel graph cross network (GXN) is proposed to achieve comprehensive feature learning from multiple scales of a graph; it includes a novel vertex infomax pooling (VIPool) that creates multiscale graphs in a trainable manner and a novel feature-crossing layer that enables feature interchange across scales.
Abstract: We propose a novel graph cross network (GXN) to achieve comprehensive feature learning from multiple scales of a graph. Based on trainable hierarchical representations of a graph, GXN enables the interchange of intermediate features across scales to promote information flow. Two key ingredients of GXN include a novel vertex infomax pooling (VIPool), which creates multiscale graphs in a trainable manner, and a novel feature-crossing layer, enabling feature interchange across scales. The proposed VIPool selects the most informative subset of vertices based on the neural estimation of mutual information between vertex features and neighborhood features. The intuition behind this is that a vertex is informative when it can maximally reflect its neighboring information. The proposed feature-crossing layer fuses intermediate features between two scales for mutual enhancement by improving information flow and enriching multiscale features at hidden layers. The cross shape of the feature-crossing layer distinguishes GXN from many other multiscale architectures. Experimental results show that the proposed GXN improves the classification accuracy by 2.12% and 1.15% on average for graph classification and vertex classification, respectively. Based on the same network, the proposed VIPool consistently outperforms other graph-pooling methods.
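
A rough sketch of the vertex-selection idea behind VIPool follows (assumptions: the neural mutual-information estimator is replaced by a simple cosine-agreement proxy between a vertex and its mean neighbourhood feature; `ratio`, the adjacency format, and the scoring are illustrative only).

```python
import numpy as np

def vipool_like_select(X, adj, ratio=0.5):
    """Toy vertex selection in the spirit of VIPool (illustrative only).

    X:   (n, d) vertex feature matrix.
    adj: (n, n) binary adjacency matrix.
    A vertex scores highly when its own feature agrees with the mean feature of
    its neighbourhood, i.e., it "maximally reflects its neighboring information";
    the top-ratio fraction of vertices is kept.
    """
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    neigh = adj @ X / deg                                # mean neighbour feature
    score = (X * neigh).sum(axis=1) / (
        np.linalg.norm(X, axis=1) * np.linalg.norm(neigh, axis=1) + 1e-12)
    k = max(1, int(ratio * X.shape[0]))
    return np.argsort(-score)[:k]                        # indices of kept vertices

# Usage on a tiny random graph.
rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
print(vipool_like_select(rng.standard_normal((6, 3)), A, ratio=0.5))
```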

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A Copy and Paste Generative Adversarial Network (CPGAN) is proposed to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination.
Abstract: Existing face hallucination methods based on convolutional neural networks (CNN) have achieved impressive performance on low-resolution (LR) faces in a normal illumination condition. However, their performance degrades dramatically when LR faces are captured in low or non-uniform illumination conditions. This paper proposes a Copy and Paste Generative Adversarial Network (CPGAN) to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination. To this end, we develop two key components in our CPGAN: internal and external Copy and Paste nets (CPnets). Specifically, our internal CPnet exploits facial information residing in the input image to enhance facial details; while our external CPnet leverages an external HR face for illumination compensation. A new illumination compensation loss is thus developed to capture illumination from the external guided face image effectively. Furthermore, our method offsets illumination and upsamples facial details alternately in a coarse-to-fine fashion, thus alleviating the correspondence ambiguity between LR inputs and external HR inputs. Extensive experiments demonstrate that our method manifests authentic HR face images in a uniform illumination condition and outperforms state-of-the-art methods qualitatively and quantitatively.

Journal ArticleDOI
TL;DR: An LM Partial LAbel machiNE (LM-PLANE) is proposed by extending multi-class support vector machines (SVM) to partial label learning (PLL), addressing the main challenge of PLL, where each training instance is associated with a set of candidate labels but only one label is the ground truth.
Abstract: Partial label learning (PLL) is a multi-class weakly supervised learning problem where each training instance is associated with a set of candidate labels but only one label is the ground truth. The main challenge of PLL is how to deal with the label ambiguities. Among various disambiguation techniques, large margin (LM)-based algorithms attract much attention due to their powerful discriminative performance. However, existing LM-based algorithms either neglect some potential candidate labels in constructing the margin or introduce auxiliary estimation of class capacities which is generally inaccurate. As a result, their generalization performances are deteriorated. To address the above-mentioned drawbacks, motivated by the optimistic superset loss, we propose an LM Partial LAbel machiNE (LM-PLANE) by extending multi-class support vector machines (SVM) to PLL. Compared with existing LM-based disambiguation algorithms, LM-PLANE considers the margin of all potential candidate labels without auxiliary estimation of class capacities. Furthermore, an efficient cutting plane (CP) method is developed to train LM-PLANE in the dual space. Theoretical insights into the effectiveness and convergence of our CP method are also presented. Extensive experiments on various PLL tasks demonstrate the superiority of LM-PLANE over existing LM based and other representative PLL algorithms in terms of classification accuracy.
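
To illustrate the large-margin treatment of a candidate-label set (a hypothetical sketch of the optimistic-superset idea, not LM-PLANE's actual SVM formulation or its cutting-plane solver), the following helper scores one instance by comparing its best candidate label against its best non-candidate label with a hinge.

```python
import numpy as np

def optimistic_partial_label_hinge(scores, candidates):
    """Toy large-margin partial-label loss for one instance (illustrative).

    scores:     (n_classes,) classifier scores.
    candidates: indices of the candidate label set (one of them is the truth).
    The optimistic choice inside the candidate set is compared against the
    strongest non-candidate competitor via a unit-margin hinge.
    """
    scores = np.asarray(scores, dtype=float)
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[list(candidates)] = True
    margin = scores[mask].max() - scores[~mask].max()
    return max(0.0, 1.0 - margin)

# Usage: candidate set {0, 2}; label 0 gives a margin of 0.5, so a hinge loss of 0.5 remains.
print(optimistic_partial_label_hinge([1.2, 0.7, 0.3, -0.1], candidates={0, 2}))  # 0.5
```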

Journal ArticleDOI
TL;DR: A novel Vivid Face Hallucination Generative Adversarial Network (VividGAN) for simultaneously super-resolving and frontalizing tiny non-frontal face images and achieves photo-realistic frontal HR faces, reaching superior performance in downstream tasks, compared with other state-of-the-art methods.
Abstract: Obtaining a high-quality frontal face image from a low-resolution (LR) non-frontal face image is of primary importance for many facial analysis applications. However, mainstream methods either focus on super-resolving near-frontal LR faces or frontalizing non-frontal high-resolution (HR) faces. It is desirable to perform both tasks seamlessly for daily-life unconstrained face images. In this paper, we present a novel Vivid Face Hallucination Generative Adversarial Network (VividGAN) for simultaneously super-resolving and frontalizing tiny non-frontal face images. VividGAN consists of coarse-level and fine-level Face Hallucination Networks (FHnet) and two discriminators, i.e., Coarse-D and Fine-D. The coarse-level FHnet generates a frontal coarse HR face and then the fine-level FHnet makes use of the facial component appearance prior, i.e., fine-grained facial components, to attain a frontal HR face image with authentic details. In the fine-level FHnet, we also design a facial component-aware module that adopts the facial geometry guidance as clues to accurately align and merge the frontal coarse HR face and prior information. Meanwhile, two-level discriminators are designed to capture both the global outline of a face image as well as detailed facial characteristics. The Coarse-D enforces the coarsely hallucinated faces to be upright and complete while the Fine-D focuses on the fine hallucinated ones for sharper details. Extensive experiments demonstrate that our VividGAN achieves photo-realistic frontal HR faces, reaching superior performance in downstream tasks, i.e., face recognition and expression classification, compared with other state-of-the-art methods.

Posted Content
TL;DR: This work proposes a novel reward learning module to generate intrinsic reward signals via a generative model that can perform better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment.
Abstract: Imitation learning in a high-dimensional environment is challenging. Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in such a high-dimensional environment, e.g., Atari domain. To address this challenge, we propose a novel reward learning module to generate intrinsic reward signals via a generative model. Our generative method can perform better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment. Thus, our module provides the imitation agent both the intrinsic intention of the demonstrator and a better exploration ability, which is critical for the agent to outperform the demonstrator. Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with one-life demonstration. Remarkably, our method achieves performance that is up to 5 times the performance of the demonstration.

Posted Content
TL;DR: A Copy and Paste Generative Adversarial Network (CPGAN) to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination and alleviating the correspondence ambiguity between LR inputs and external HR inputs is proposed.
Abstract: Existing face hallucination methods based on convolutional neural networks (CNN) have achieved impressive performance on low-resolution (LR) faces in a normal illumination condition. However, their performance degrades dramatically when LR faces are captured in low or non-uniform illumination conditions. This paper proposes a Copy and Paste Generative Adversarial Network (CPGAN) to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination. To this end, we develop two key components in our CPGAN: internal and external Copy and Paste nets (CPnets). Specifically, our internal CPnet exploits facial information residing in the input image to enhance facial details; while our external CPnet leverages an external HR face for illumination compensation. A new illumination compensation loss is thus developed to capture illumination from the external guided face image effectively. Furthermore, our method offsets illumination and upsamples facial details alternately in a coarse-to-fine fashion, thus alleviating the correspondence ambiguity between LR inputs and external HR inputs. Extensive experiments demonstrate that our method manifests authentic HR face images in a uniform illumination condition and outperforms state-of-the-art methods qualitatively and quantitatively.

Proceedings ArticleDOI
20 Apr 2020
TL;DR: Comprehensive experiments show that the proposed methods with learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.
Abstract: Approximate nearest neighbour search (ANNS) in high dimensional space is a fundamental problem in many applications, such as multimedia database, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective for this problem, yet few of them are designed for external storage scenarios and hence are not optimized for I/O efficiency during query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing the I/O efficiency, especially, the sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by learned hashing (i.e., mapping) functions on each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.
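
The ordered-list idea can be sketched as follows (a toy stand-in: a single random linear projection replaces the learned hashing function, and an in-memory array replaces the external-memory lists; the class and parameter names are hypothetical).

```python
import numpy as np

class OrderedProjectionIndex:
    """Toy index in the spirit of the learned ordered lists described above.

    Points are stored sorted by a scalar mapping of each point; a query reads
    one contiguous window around its own mapped value (mimicking sequential
    I/O) and then re-ranks the candidates by exact distance.
    """
    def __init__(self, data, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal(data.shape[1])   # stand-in for a learned mapping
        keys = data @ self.w
        self.order = np.argsort(keys)
        self.keys = keys[self.order]
        self.data = data

    def query(self, q, k=5, window=64):
        pos = int(np.searchsorted(self.keys, q @ self.w))
        lo, hi = max(0, pos - window), min(len(self.keys), pos + window)
        cand = self.order[lo:hi]                       # one sequential block of IDs
        d = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(d)[:k]]

# Usage on random vectors; the query point itself should come back first.
X = np.random.randn(10000, 16)
print(OrderedProjectionIndex(X).query(X[0], k=3))
```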

Journal ArticleDOI
TL;DR: This article proposes to capture the useful traits buried in previous optimized routing solutions by learning a new customer representation, which can be transferred across VRPs, serving as the prior knowledge, to bias the optimization in the target VRP.
Abstract: The Vehicle Routing Problem (VRP) is a well-known NP-hard combinatorial optimization problem with widespread real-world applications, such as logistics, bus route planning, and urban path planning. To solve a VRP, traditional optimization methods usually start the search from scratch and ignore the VRPs solved in the past, which can lead to repeated exploration of the search space of related problems and thus results in a slow optimization process with unnecessary computational cost. Keeping this in mind, to speed up the optimization for vehicle routing, this article presents a new study towards faster vehicle routing by transferring knowledge from customer representations learned from past solved VRPs. In particular, we propose to capture the useful traits buried in previously optimized routing solutions by learning a new customer representation, which can be transferred across VRPs and serves as prior knowledge to bias the optimization in the target VRP. In contrast to existing approaches, the proposed knowledge transfer consists of learning a new customer representation from the optimized routing solutions, which generalizes to VRPs possessing different structural properties, and a weighted l₁ norm-regularized formulation for building a sparse mapping across VRPs that is easy to solve. Further, the proposed knowledge transfer across VRPs occurs along the whole optimization search process and is thus able to guide the routing optimization process consistently. To verify the efficacy of the proposed method, comprehensive empirical studies on both commonly used VRP benchmarks and a real-world vehicle routing application are presented, using a population-based optimization method as the VRP solver.

Journal ArticleDOI
TL;DR: A unified deep architecture (DANA) is proposed to obtain a domain-invariant representation for network alignment via an adversarial domain classifier to achieve state-of-the-art alignment results.
Abstract: Network alignment is a critical task in a wide variety of fields. Many existing works leverage representation learning to accomplish this task without eliminating the domain representation bias induced by domain-dependent features, which yields inferior alignment performance. This paper proposes a unified deep architecture (DANA) to obtain a domain-invariant representation for network alignment via an adversarial domain classifier. Specifically, we employ graph convolutional networks to perform network embedding under the domain adversarial principle, given a small set of observed anchors. Then, the semi-supervised learning framework is optimized by maximizing a posterior probability distribution of observed anchors and the loss of a domain classifier simultaneously. We also develop a few variants of our model, such as direction-aware network alignment, weight sharing for directed networks, and simplification of the parameter space. Experiments on three real-world social network datasets demonstrate that our proposed approaches achieve state-of-the-art alignment results.

Proceedings Article
12 Jul 2020
TL;DR: In this article, a novel reward learning module is proposed to generate intrinsic reward signals via a generative model, which can perform better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment.
Abstract: Imitation learning in a high-dimensional environment is challenging. Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in such a high-dimensional environment, e.g., Atari domain. To address this challenge, we propose a novel reward learning module to generate intrinsic reward signals via a generative model. Our generative method can perform better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment. Thus, our module provides the imitation agent both the intrinsic intention of the demonstrator and a better exploration ability, which is critical for the agent to outperform the demonstrator. Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with one-life demonstration. Remarkably, our method achieves performance that is up to 5 times the performance of the demonstration.

Posted Content
TL;DR: This paper proposes an equivalent transformation learner (ETL) which models the joint distribution of user behaviors across domains and assumes that each user’s preferences in one domain can be expressed by the other one, and these preferences can be mutually converted to each other with the so-called equivalent transformation.
Abstract: Cross domain recommendation (CDR) has been proposed to tackle the data sparsity problem in recommender systems. This paper focuses on a common scenario for CDR where different domains share the same set of users but no overlapping items. The majority of recent methods have explored shared-user representation to transfer knowledge across different domains. However, the idea of shared-user representation resorts to learning the overlapped properties of user preferences across different domains and suppresses the domain-specific properties of user preferences. In this paper, we attempt to learn both properties of user preferences for CDR, i.e. capturing both the overlapped and domain-specific properties. In particular, we assume that each user's preferences in one domain can be expressed by the other one, and these preferences can be mutually converted to each other with the so-called equivalent transformations. Based on this assumption, we propose an equivalent transformation learner (ETL) which models the joint distribution of user behaviors across different domains. The equivalent transformations in ETL relax the idea of shared-user representation and allow the learned preferences in different domains to have the capacity of preserving the domain-specific properties as well as the overlapped properties. Extensive experiments on three public benchmarks demonstrate the effectiveness of ETL compared with recent state-of-the-art methods.

Journal ArticleDOI
TL;DR: This article presents a distribution-shattering strategy without an estimation of hypotheses by shattering the number density of the input distribution, and shows that sampling in a shattered distribution reduces label complexity and error disagreement.
Abstract: Active learning (AL) aims to maximize the learning performance of the current hypothesis by drawing as few labels as possible from an input distribution. Generally, most existing AL algorithms prune the hypothesis set via querying labels of unlabeled samples and could be deemed as a hypothesis-pruning strategy. However, this process critically depends on the initial hypothesis and its subsequent updates. This article presents a distribution-shattering strategy without an estimation of hypotheses by shattering the number density of the input distribution. For any hypothesis class, we halve the number density of an input distribution to obtain a shattered distribution, which characterizes any hypothesis with a lower bound on VC dimension. Our analysis shows that sampling in a shattered distribution reduces label complexity and error disagreement. With this paradigm guarantee, in an input distribution, a Shattered Distribution-based AL (SDAL) algorithm is derived to continuously split the shattered distribution into a number of representative samples. An empirical evaluation of benchmark data sets further verifies the effectiveness of the halving and querying abilities of SDAL in real-world AL tasks with limited labels. Experiments on active querying with adversarial examples and noisy labels further verify our theoretical insights on the performance disagreement of the hypothesis-pruning and distribution-shattering strategies. Our code is available at https://github.com/XiaofengCao-MachineLearning/Shattering-Distribution-for-Active-Learning.

Journal ArticleDOI
TL;DR: A novel channel-reliability-aware ranking (CArank) model for the multichannel ranking problem that learns from BDPs using EEG data robustly and aims at preserving the ordering corresponding to RTs, and introduces a transition matrix to characterize the reliability of each channel used in the EEG data.
Abstract: A driver's cognitive state of mental fatigue significantly affects his or her driving performance and more important, public safety. Previous studies have leveraged reaction time (RT) as the metric...

Posted Content
TL;DR: A Deep Pairwise Hashing (DPH) is proposed to map users and items to binary vectors in Hamming space, where a user's preference for an item can be efficiently calculated by Hamming distance, which significantly improves the efficiency of online recommendation.
Abstract: Recommendation efficiency and data sparsity problems have been regarded as two challenges of improving performance for online recommendation. Most previous related works focus on improving recommendation accuracy instead of efficiency. In this paper, we propose a Deep Pairwise Hashing (DPH) to map users and items to binary vectors in Hamming space, where a user's preference for an item can be efficiently calculated by Hamming distance, which significantly improves the efficiency of online recommendation. To alleviate data sparsity and cold-start problems, the user-item interactive information and item content information are unified to learn effective representations of items and users. Specifically, we first pre-train robust item representation from item content data by a Denoising Auto-encoder instead of other deterministic deep learning frameworks; then we fine-tune the entire framework by adding a pairwise loss objective with discrete constraints; moreover, DPH aims to minimize a pairwise ranking loss that is consistent with the ultimate goal of recommendation. Finally, we adopt the alternating optimization method to optimize the proposed model with discrete constraints. Extensive experiments on three different datasets show that DPH can significantly advance the state-of-the-art frameworks regarding data sparsity and item cold-start recommendation.
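
The online-scoring step the abstract refers to, ranking items for a user by Hamming distance between binary codes, can be sketched as follows (the codes here are random placeholders; in DPH they would come from the trained hashing model).

```python
import numpy as np

def rank_items_by_hamming(user_code, item_codes):
    """Rank items for one user by Hamming distance between binary codes.

    user_code:  (b,) array of {0, 1} bits for the user.
    item_codes: (m, b) array of {0, 1} bits for m items.
    A smaller Hamming distance means a stronger predicted preference; in
    practice the codes are packed into machine words and compared with
    XOR + popcount for speed.
    """
    distances = (item_codes != user_code).sum(axis=1)
    return np.argsort(distances)          # item indices, closest first

# Usage with random 32-bit codes for 5 items (placeholder codes only).
rng = np.random.default_rng(0)
u = rng.integers(0, 2, 32)
items = rng.integers(0, 2, (5, 32))
print(rank_items_by_hamming(u, items))
```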

Posted Content
TL;DR: The stability of node representations is introduced in addition to the smoothness and identifiability, and a novel method called contrastive graph neural networks (CGNN) is developed that learns robust node representations in an unsupervised manner.
Abstract: Graph neural networks (GNN), as a popular methodology for node representation learning on graphs, currently mainly focus on preserving the smoothness and identifiability of node representations. A robust node representation on graphs should further hold the stability property which means a node representation is resistant to slight perturbations on the input. In this paper, we introduce the stability of node representations in addition to the smoothness and identifiability, and develop a novel method called contrastive graph neural networks (CGNN) that learns robust node representations in an unsupervised manner. Specifically, CGNN maintains the stability and identifiability by a contrastive learning objective, while preserving the smoothness with existing GNN models. Furthermore, the proposed method is a generic framework that can be equipped with many other backbone models (e.g. GCN, GraphSage and GAT). Extensive experiments on four benchmarks under both transductive and inductive learning setups demonstrate the effectiveness of our method in comparison with recent supervised and unsupervised models.
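
A minimal sketch of a node-level contrastive objective of the kind described above (assumptions: two perturbed views of the same graph have already been encoded into `z1` and `z2`; the temperature and the InfoNCE form are illustrative, not taken from the paper's code).

```python
import torch
import torch.nn.functional as F

def node_contrastive_loss(z1, z2, tau=0.5):
    """Toy InfoNCE-style loss between two views of the same n nodes.

    z1, z2: (n, d) node embeddings from two slightly perturbed versions of the
    input graph. Matching rows are positives, all other rows negatives,
    encouraging representations that are stable under perturbation (rows stay
    close to their counterparts) yet identifiable (distinct across nodes).
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                        # (n, n) cosine similarities
    targets = torch.arange(z1.size(0))                # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```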

Posted Content
TL;DR: A novel graph cross network (GXN) is proposed to achieve comprehensive feature learning from multiple scales of a graph, enabling the interchange of intermediate features across scales to promote information flow.
Abstract: We propose a novel graph cross network (GXN) to achieve comprehensive feature learning from multiple scales of a graph. Based on trainable hierarchical representations of a graph, GXN enables the interchange of intermediate features across scales to promote information flow. Two key ingredients of GXN include a novel vertex infomax pooling (VIPool), which creates multiscale graphs in a trainable manner, and a novel feature-crossing layer, enabling feature interchange across scales. The proposed VIPool selects the most informative subset of vertices based on the neural estimation of mutual information between vertex features and neighborhood features. The intuition behind this is that a vertex is informative when it can maximally reflect its neighboring information. The proposed feature-crossing layer fuses intermediate features between two scales for mutual enhancement by improving information flow and enriching multiscale features at hidden layers. The cross shape of the feature-crossing layer distinguishes GXN from many other multiscale architectures. Experimental results show that the proposed GXN improves the classification accuracy by 2.12% and 1.15% on average for graph classification and vertex classification, respectively. Based on the same network, the proposed VIPool consistently outperforms other graph-pooling methods.

Journal ArticleDOI
TL;DR: This paper proposes a stage-wise matrix factorization algorithm by exploiting manifold optimization techniques and studies two representative cases of low-rank matrix recovery, i.e., collaborative filtering for recommendation and high dynamic range imaging.
Abstract: Matrix factorization has been widely applied to various applications. With the fast development of storage and internet technologies, we have been witnessing a rapid increase of data. In this paper, we propose new algorithms for matrix factorization with the emphasis on efficiency. In addition, most existing methods of matrix factorization only consider a general smooth least square loss. Differently, many real-world applications have distinctive characteristics. As a result, different losses should be used accordingly. Therefore, it is beneficial to design new matrix factorization algorithms that are able to deal with both smooth and non-smooth losses. To this end, one needs to analyze the characteristics of target data and use the most appropriate loss based on the analysis. We particularly study two representative cases of low-rank matrix recovery, i.e., collaborative filtering for recommendation and high dynamic range imaging. To solve these two problems, we respectively propose a stage-wise matrix factorization algorithm by exploiting manifold optimization techniques. From our theoretical analysis, both are provably guaranteed to converge to a stationary point. Extensive experiments on recommender systems and high dynamic range imaging demonstrate the satisfactory performance and efficiency of our proposed method on large-scale real data.

Journal ArticleDOI
TL;DR: A Deep Pairwise Hashing (DPH) framework is proposed to map users and items to binary vectors in the Hamming space, where a user's preference for an item can be efficiently calculated by the Hamming distance, which significantly improves the efficiency of online recommendation.
Abstract: Recommendation efficiency and data sparsity problems have been regarded as two main challenges of real-world recommendation systems. Most existing works focus on improving recommendation accuracy instead of efficiency. In this paper, we propose a Deep Pairwise Hashing (DPH) to map users and items to binary vectors in the Hamming space, where a user's preference for an item can be efficiently calculated by the Hamming distance, which significantly improves the efficiency of online recommendation. To alleviate data sparsity and cold-start problems, the item content information is exploited and integrated to learn effective representations of items. Specifically, we first pre-train robust item representation from item content data by a robust Denoising Auto-encoder instead of other deterministic deep learning frameworks. Then we fine-tune the entire recommender framework by adding a pairwise loss function with discrete constraints, which is more consistent with the ultimate goal of producing a ranked list of items. Finally, we adopt the alternating optimization method to optimize the proposed model with discrete constraints. Extensive experiments conducted on three different datasets show that DPH can significantly advance the state-of-the-art frameworks regarding data sparsity and cold-start item recommendation.

Proceedings Article
01 Jan 2020
TL;DR: In this article, a simple closed-form rank-1 lattice construction method based on group theory is proposed to reduce the number of distinct pairwise distance values to generate a more regular lattice.
Abstract: Quasi-Monte Carlo (QMC) is an essential tool for integral approximation, Bayesian inference, and sampling for simulation in science, etc. In the QMC area, the rank-1 lattice is important due to its simple operation, and nice properties for point set construction. However, the construction of the generating vector of the rank-1 lattice is usually time-consuming because of an exhaustive computer search. To address this issue, we propose a simple closed-form rank-1 lattice construction method based on group theory. Our method reduces the number of distinct pairwise distance values to generate a more regular lattice. We theoretically prove a lower and an upper bound of the minimum pairwise distance of any non-degenerate rank-1 lattice. Empirically, our methods can generate a near-optimal rank-1 lattice compared with the Korobov exhaustive search regarding the $l_1$-norm and $l_2$-norm minimum distance. Moreover, experimental results show that our method achieves superior approximation performance on benchmark integration test problems and kernel approximation problems.
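
For reference, the standard rank-1 lattice construction that the abstract builds on is simple to write down (a sketch: the generating vector `z` below is an arbitrary Korobov-style example, not the closed-form, group-theory-based vector proposed in the paper).

```python
import numpy as np

def rank1_lattice(n, z):
    """n points of a rank-1 lattice: x_i = frac(i * z / n), i = 0, ..., n-1."""
    i = np.arange(n)[:, None]
    return (i * np.asarray(z)[None, :] % n) / n

def min_pairwise_distance(points):
    """Minimum pairwise toroidal l2 distance, the regularity criterion above."""
    diff = np.abs(points[:, None, :] - points[None, :, :])
    diff = np.minimum(diff, 1.0 - diff)               # wrap-around (torus) distance
    dists = np.sqrt((diff ** 2).sum(-1))
    off_diag = ~np.eye(len(points), dtype=bool)
    return dists[off_diag].min()

# Usage: a small 3-dimensional Korobov-style lattice with 101 points.
P = rank1_lattice(101, [1, 17, 17 ** 2 % 101])
print(P.shape, min_pairwise_distance(P))
```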

Journal ArticleDOI
TL;DR: This work proposes a collaborative generated hashing (CGH) framework to improve the efficiency by denoting users and items as binary codes, then fast hashing search techniques can be used to speed up the online recommendation.
Abstract: Cold-start has been a critical issue in recommender systems with the explosion of data in e-commerce. Most existing methods proposed to alleviate the cold-start problem are known as hybrid recommender systems, which learn representations of users and items by combining user-item interactive and user/item content information. However, previous hybrid methods regularly suffer from poor efficiency bottlenecks in online recommendation with large-scale items, because they were designed to project users and items into a continuous latent space where online recommendation is expensive. To this end, we propose a collaborative generated hashing (CGH) framework to improve efficiency by denoting users and items as binary codes, so that fast hashing search techniques can be used to speed up the online recommendation. In addition, the proposed CGH can generate potential users or items for marketing applications, where the generative network is designed with the principle of minimum description length and is used to learn compact and informative binary codes. Extensive experiments on two public datasets show the advantages of CGH for recommendation in various settings over competing baselines and analyze its feasibility in marketing applications.