Showing papers on "Matching (statistics)" published in 2019


Journal ArticleDOI
TL;DR: It is shown that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal—thus increasing imbalance, inefficiency, model dependence, and bias.
Abstract: We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal—thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.
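
For concreteness, here is a minimal sketch (not from the paper) of the 1:1 nearest-neighbor PSM procedure being critiqued; the variable names are illustrative and the greedy matching is the simplest possible variant:

```python
# Minimal 1:1 greedy propensity score matching sketch (illustrative only).
# Assumes at least as many control units as treated units.
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_match(X, treated):
    """Match each treated unit to the control with the closest propensity score."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.where(treated == 1)[0]
    c_idx = np.where(treated == 0)[0]
    pairs, available = [], set(c_idx)
    for i in t_idx:
        c = min(available, key=lambda c: abs(ps[i] - ps[c]))  # nearest control on the score
        pairs.append((i, c))
        available.remove(c)                                   # match without replacement
    return pairs, ps
```

The paper's argument is that pruning on this one-dimensional score discards covariate information that direct matching methods, which approximate full blocking, retain.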

991 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: A new deep learning approach, Moment Matching for Multi-Source Domain Adaptation (M3SDA), which aims to transfer knowledge learned from multiple labeled source domains to an unlabeled target domain by dynamically aligning moments of their feature distributions.
Abstract: Conventional unsupervised domain adaptation (UDA) assumes that training data are sampled from a single domain. This neglects the more practical scenario where training data are collected from multiple sources, requiring multi-source domain adaptation. We make three major contributions towards addressing this problem. First, we collect and annotate by far the largest UDA dataset, called DomainNet, which contains six domains and about 0.6 million images distributed among 345 categories, addressing the gap in data availability for multi-source UDA research. Second, we propose a new deep learning approach, Moment Matching for Multi-Source Domain Adaptation (M3SDA), which aims to transfer knowledge learned from multiple labeled source domains to an unlabeled target domain by dynamically aligning moments of their feature distributions. Third, we provide new theoretical insights specifically for moment matching approaches in both single and multiple source domain adaptation. Extensive experiments are conducted to demonstrate the power of our new dataset in benchmarking state-of-the-art multi-source domain adaptation methods, as well as the advantage of our proposed model. Dataset and Code are available at http://ai.bu.edu/M3SDA/
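
As a hedged illustration of the moment-alignment idea (not the authors' exact objective), one can penalize differences between the first k feature moments across every pair of domains:

```python
# Sketch of a moment-matching alignment loss in the spirit of M3SDA.
# feats: list of [n_i, d] feature tensors, one per domain (sources + target).
import torch

def moment_distance(feats, k_moments=2):
    loss = feats[0].new_zeros(())
    for k in range(1, k_moments + 1):
        moments = [(f ** k).mean(dim=0) for f in feats]   # k-th moment per domain
        for i in range(len(moments)):
            for j in range(i + 1, len(moments)):
                loss = loss + (moments[i] - moments[j]).norm(p=2)
    return loss
```

Minimizing such a term alongside the source classification losses pulls the domains' feature distributions together during training.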

597 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a semi-global aggregation layer and a local guided aggregation layer are proposed to capture whole-image and local cost dependencies, respectively; they can replace the widely used 3D convolutional layer, which is computationally costly and memory-consuming.
Abstract: In the stereo matching task, matching cost aggregation is crucial in both traditional methods and deep neural network models in order to accurately estimate disparities. We propose two novel neural net layers, aimed at capturing whole-image and local cost dependencies, respectively. The first is a semi-global aggregation layer, which is a differentiable approximation of semi-global matching; the second is a local guided aggregation layer, which follows a traditional cost filtering strategy to refine thin structures. These two layers can be used to replace the widely used 3D convolutional layer, which is computationally costly and memory-consuming as it has cubic computational/memory complexity. In the experiments, we show that nets with a two-layer guided aggregation block easily outperform the state-of-the-art GC-Net, which has nineteen 3D convolutional layers. We also train a deep guided aggregation network (GA-Net) which achieves better accuracies than state-of-the-art methods on both the Scene Flow dataset and the KITTI benchmarks.
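
For background, the SGA layer is a differentiable approximation of the classical semi-global matching recurrence; a numpy sketch of that recurrence along one scan direction (with the usual small/large disparity-change penalties P1/P2) looks like this:

```python
# Classical one-direction SGM cost aggregation (background, not the SGA layer itself).
import numpy as np

def sgm_aggregate_left_to_right(cost, P1=1.0, P2=8.0):
    """cost: [H, W, D] matching cost volume; returns aggregated costs."""
    L = cost.astype(float).copy()
    H, W, D = L.shape
    for x in range(1, W):
        prev = L[:, x - 1, :]                        # already-aggregated column
        min_prev = prev.min(axis=1, keepdims=True)
        plus = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :D] + P1
        minus = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:] + P1
        L[:, x, :] += np.minimum(np.minimum(prev, min_prev + P2),
                                 np.minimum(plus, minus)) - min_prev
    return L
```

Roughly speaking, the SGA layer makes this hard-minimum recurrence differentiable so the whole network can be trained end to end.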

503 citations


Journal ArticleDOI
TL;DR: The authors' method can accomplish mismatch removal from thousands of putative correspondences in only a few milliseconds, and achieves better or favorably competitive accuracy while cutting time cost by more than two orders of magnitude.

Abstract: Seeking reliable correspondences between two feature sets is a fundamental and important task in computer vision. This paper attempts to remove mismatches from given putative image feature correspondences. To achieve this goal, an efficient approach, termed locality preserving matching (LPM), is designed; its principle is to maintain the local neighborhood structures of the potential true matches. We formulate the problem as a mathematical model and derive a closed-form solution with linearithmic time and linear space complexity. Our method can accomplish mismatch removal from thousands of putative correspondences in only a few milliseconds. To demonstrate the generality of our strategy for handling image matching problems, extensive experiments on various real image pairs are conducted for general feature matching, as well as for point set registration, visual homing, and near-duplicate image retrieval. Compared with other state-of-the-art alternatives, our LPM achieves better or favorably competitive accuracy while cutting time cost by more than two orders of magnitude.
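
A much-simplified neighborhood-consistency filter conveys the principle (true matches preserve their local neighborhoods); the paper itself derives a closed-form solution rather than this greedy check:

```python
# Simplified locality-preserving mismatch filter (for intuition only).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def filter_matches(X, Y, k=6, tau=0.5):
    """X, Y: [N, 2] coordinates of putatively matched keypoints; returns inlier mask."""
    _, ix = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    _, iy = NearestNeighbors(n_neighbors=k + 1).fit(Y).kneighbors(Y)
    keep = np.empty(len(X), dtype=bool)
    for i in range(len(X)):
        # Fraction of i's neighbors in image 1 that remain neighbors in image 2.
        shared = np.intersect1d(ix[i, 1:], iy[i, 1:])
        keep[i] = len(shared) / k >= tau
    return keep
```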

416 citations


Journal ArticleDOI
TL;DR: In this article, two-branch neural networks are proposed to learn the similarity between image and text modalities for image-sentence matching and visual grounding, achieving high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on the Flickr30K and MSCOCO datasets.
Abstract: Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.
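
The embedding branch is trained with a bidirectional max-margin ranking loss; a minimal sketch (omitting the paper's neighborhood constraints and sampling scheme) is:

```python
# Bidirectional max-margin ranking loss over in-batch negatives (sketch).
import torch

def ranking_loss(img, txt, margin=0.2):
    """img, txt: [B, d] L2-normalized embeddings of matched image-text pairs."""
    sim = img @ txt.t()                         # [B, B] similarity matrix
    pos = sim.diag().unsqueeze(1)               # similarities of true pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)      # image -> wrong text
    cost_i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)  # text -> wrong image
    return cost_t.mean() + cost_i.mean()
```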

391 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, a Reinforced Cross-Modal Matching (RCM) approach is proposed that enforces cross-modal grounding both locally and globally via reinforcement learning (RL), where a matching critic provides an intrinsic reward to encourage global matching between instructions and trajectories.
Abstract: Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: the cross-modal grounding, the ill-posed feedback, and the generalization problems. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). Particularly, a matching critic is used to provide an intrinsic reward to encourage global matching between instructions and trajectories, and a reasoning navigator is employed to perform cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms previous methods by 10% on SPL and achieves the new state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating its own past good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which substantially reduces the success rate gap between seen and unseen environments (from 30.7% to 11.7%).

331 citations


Journal ArticleDOI
TL;DR: This paper proposes an efficient incentive mechanism based on contract theoretical modeling to minimize the network delay from a contract-matching integration perspective and demonstrates that significant performance improvement can be achieved by the proposed scheme.
Abstract: Vehicular fog computing (VFC) has emerged as a promising solution to relieve the overload on the base station and reduce the processing delay during the peak time. The computation tasks can be offloaded from the base station to vehicular fog nodes by leveraging the under-utilized computation resources of nearby vehicles. However, the wide-area deployment of VFC still confronts several critical challenges, such as the lack of efficient incentive and task assignment mechanisms. In this paper, we address the above challenges and provide a solution to minimize the network delay from a contract-matching integration perspective. First, we propose an efficient incentive mechanism based on contract theoretical modeling. The contract is tailored for the unique characteristic of each vehicle type to maximize the expected utility of the base station. Next, we transform the task assignment problem into a two-sided matching problem between vehicles and user equipment. The formulated problem is solved by a pricing-based stable matching algorithm, which iteratively carries out the “propose” and “price-rising” procedures to derive a stable matching based on the dynamically updated preference lists. Finally, numerical results demonstrate that significant performance improvement can be achieved by the proposed scheme.
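
For background, the classical deferred-acceptance routine below produces a stable two-sided matching from fixed preference lists; the paper's algorithm augments this idea with a price-rising step over dynamically updated preferences:

```python
# Gale-Shapley deferred acceptance (background sketch): user equipment proposes,
# vehicles tentatively accept. Assumes complete preference lists of equal size.
def gale_shapley(ue_prefs, veh_prefs):
    rank = {v: {u: r for r, u in enumerate(p)} for v, p in veh_prefs.items()}
    engaged = {}                          # vehicle -> user equipment
    nxt = {u: 0 for u in ue_prefs}        # next vehicle each UE will propose to
    free = list(ue_prefs)
    while free:
        u = free.pop()
        v = ue_prefs[u][nxt[u]]
        nxt[u] += 1
        if v not in engaged:
            engaged[v] = u
        elif rank[v][u] < rank[v][engaged[v]]:
            free.append(engaged[v])       # vehicle upgrades; old partner is free again
            engaged[v] = u
        else:
            free.append(u)                # rejected; u proposes further down its list
    return {u: v for v, u in engaged.items()}
```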

263 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: Hierarchical Discrete Distribution Decomposition (HD^3), a framework suitable for learning probabilistic pixel correspondences in both optical flow and stereo matching, is proposed and achieves state-of-the-art results.
Abstract: Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications. However, the computation of the match density for each pixel may be prohibitively expensive due to the large number of candidates. In this paper, we propose Hierarchical Discrete Distribution Decomposition (HD^3), a framework suitable for learning probabilistic pixel correspondences in both optical flow and stereo matching. We decompose the full match density into multiple scales hierarchically, and estimate the local matching distributions at each scale conditioned on the matching and warping at coarser scales. The local distributions can then be composed together to form the global match density. Despite its simplicity, our probabilistic method achieves state-of-the-art results for both optical flow and stereo matching on established benchmarks. We also find the estimated uncertainty is a good indication of the reliability of the predicted correspondences.

243 citations


Journal ArticleDOI
TL;DR: An overview of data-driven methods for fault detection, variable prediction, and advanced control of wastewater treatment plants (WWTPs) is provided, and practical guidance is given for matching a desired goal with a particular methodology, along with considerations regarding the assumed data structure.

200 citations


Journal ArticleDOI
TL;DR: This work proposes an efficient and privacy-preserving carpooling scheme using blockchain-assisted vehicular fog computing to support conditional privacy, one-to-many matching, destination matching, and data auditability, and authenticates users in a conditionally anonymous way.
Abstract: Carpooling enables passengers to share a vehicle to reduce traveling time, vehicle carbon emissions, and traffic congestion. However, the majority of passengers prefer to find local drivers, yet querying a remote cloud server leads to unnecessary communication overhead and increased response delay. Recently, fog computing has been introduced to provide local data processing with low latency, but it also raises new security and privacy concerns, because users’ private information (e.g., identity and location) could be disclosed when this information is shared during carpooling. While such information can be encrypted before transmission, encryption makes user matching a challenging task, and malicious users can upload false locations. Moreover, carpooling records should be kept in a distributed manner to guarantee reliable data auditability. To address these problems, we propose an efficient and privacy-preserving carpooling scheme using blockchain-assisted vehicular fog computing to support conditional privacy, one-to-many matching, destination matching, and data auditability. Specifically, we authenticate users in a conditionally anonymous way. Also, we adopt a private proximity test to achieve one-to-many proximity matching and extend it to efficiently establish a secret communication key between a passenger and a driver. We store all location grids in a tree and achieve get-off location matching using a range query technique. A private blockchain is built to store carpooling records. Finally, we analyze the security and privacy properties of the proposed scheme and evaluate its performance in terms of computational costs and communication overhead.

181 citations


Journal ArticleDOI
TL;DR: A new nonparametric matching framework is introduced that elucidates how various unit fixed effects models implicitly compare treated and control observations to draw causal inference and enables a diverse set of identification strategies to adjust for unobservables in the absence of dynamic causal relationships between treatment and outcome variables.
Abstract: Many researchers use unit fixed effects regression models as their default methods for causal inference with longitudinal data. We show that the ability of these models to adjust for unobserved time‐invariant confounders comes at the expense of dynamic causal relationships, which are permitted under an alternative selection‐on‐observables approach. Using the nonparametric directed acyclic graph, we highlight two key causal identification assumptions of unit fixed effects models: Past treatments do not directly influence current outcome, and past outcomes do not affect current treatment. Furthermore, we introduce a new nonparametric matching framework that elucidates how various unit fixed effects models implicitly compare treated and control observations to draw causal inference. By establishing the equivalence between matching and weighted unit fixed effects estimators, this framework enables a diverse set of identification strategies to adjust for unobservables in the absence of dynamic causal relationships between treatment and outcome variables. We illustrate the proposed methodology through its application to the estimation of GATT membership effects on dyadic trade volume.
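
As a reference point for what the paper analyzes, the standard one-way unit fixed effects estimator reduces to OLS on within-unit demeaned data; a small pandas sketch with illustrative column names:

```python
# Within-unit demeaning form of the unit fixed effects estimator (sketch).
import pandas as pd

def unit_fe_estimate(df, unit="unit", treat="D", outcome="Y"):
    """Slope from regressing demeaned outcome on demeaned treatment."""
    g = df.groupby(unit)
    d = df[treat] - g[treat].transform("mean")      # within-unit treatment variation
    y = df[outcome] - g[outcome].transform("mean")  # within-unit outcome variation
    return (d * y).sum() / (d * d).sum()            # no-intercept OLS slope
```

The paper's contribution is to show which treated-control comparisons this estimator implicitly makes, by expressing it as a weighted matching estimator.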

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A recurrent neural network-based reinforcement learning model is proposed which selectively observes a sequence of frames and associates the given sentence with video content in a matching-based manner; the method is extended to a semantic matching reinforcement learning (SM-RL) model by extracting semantic concepts of videos and then fusing them with global context features.
Abstract: Current studies on action detection in untrimmed videos are mostly designed for action classes, where an action is described at the word level, such as jumping, tumbling, swing, etc. This paper focuses on the rarely investigated problem of localizing an activity via a sentence query, which is more challenging and practical. Considering that current methods are generally time-consuming due to their dense frame-processing manner, we propose a recurrent neural network-based reinforcement learning model which selectively observes a sequence of frames and associates the given sentence with video content in a matching-based manner. However, directly matching sentences with video content performs poorly due to the large visual-semantic discrepancy. Thus, we extend the method to a semantic matching reinforcement learning (SM-RL) model by extracting semantic concepts of videos and then fusing them with global context features. Extensive experiments on three benchmark datasets, TACoS, Charades-STA and DiDeMo, show that our method achieves state-of-the-art performance with a high detection speed, demonstrating both the effectiveness and efficiency of our method.

Proceedings ArticleDOI
16 Aug 2019
TL;DR: The PubLayNet dataset for document layout analysis is developed by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central; experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles.
Abstract: Recognizing the layout of unstructured digital documents is an important step when parsing the documents into structured machine-readable format for downstream applications. Deep neural networks that are developed for computer vision have been proven to be an effective method to analyze layout of document images. However, document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models have to be trained by transfer learning from a base model that is pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images, where typical document layout elements are annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also a more effective base model for transfer learning on a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support development and evaluation of more advanced models for document layout analysis.

Posted Content
TL;DR: In this paper, sliced score matching is used to learn deep score estimators for implicit distributions, which can be used for variational inference with implicit distributions and training Wasserstein Auto-Encoders.
Abstract: Score matching is a popular method for estimating unnormalized statistical models. However, it has been so far limited to simple, shallow models or low-dimensional data, due to the difficulty of computing the Hessian of log-density functions. We show this difficulty can be mitigated by projecting the scores onto random vectors before comparing them. This objective, called sliced score matching, only involves Hessian-vector products, which can be easily implemented using reverse-mode automatic differentiation. Therefore, sliced score matching is amenable to more complex models and higher dimensional data compared to score matching. Theoretically, we prove the consistency and asymptotic normality of sliced score matching estimators. Moreover, we demonstrate that sliced score matching can be used to learn deep score estimators for implicit distributions. In our experiments, we show sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders.
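
The computational core, sketched in PyTorch under assumed shapes: the Hessian trace in the score matching objective is replaced by Hessian-vector products along random projection directions, which reverse-mode autodiff provides cheaply:

```python
# Sliced score matching loss (sketch). score_net maps [B, d] -> [B, d].
import torch

def ssm_loss(score_net, x, n_slices=1):
    x = x.detach().requires_grad_(True)
    s = score_net(x)                                 # model score estimate
    loss = x.new_zeros(())
    for _ in range(n_slices):
        v = torch.randn_like(x)                      # random slicing directions
        hv = torch.autograd.grad((s * v).sum(), x, create_graph=True)[0]
        loss = loss + ((v * hv).sum(dim=1)           # v^T (ds/dx) v via one HVP
                       + 0.5 * (s * v).sum(dim=1) ** 2).mean()
    return loss / n_slices
```

Minimizing this over the parameters of score_net recovers the score matching objective in expectation over the random projections.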

Proceedings ArticleDOI
Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, Rui Yan
01 Jul 2019
TL;DR: Evaluation results on three benchmark data sets indicate that IoI can significantly outperform state-of-the-art methods in terms of various matching metrics and unveil how the depth of interaction affects the performance of IoI.
Abstract: Currently, researchers have paid great attention to open-domain retrieval-based dialogues. In particular, people study the problem by investigating context-response matching for multi-turn response selection based on publicly recognized benchmark data sets. State-of-the-art methods require a response to interact with each utterance in a context from the beginning, but the interaction is performed in a shallow way. In this work, we let utterance-response interaction go deep by proposing an interaction-over-interaction network (IoI). The model performs matching by stacking multiple interaction blocks in which residual information from one round of interaction initiates another round of interaction. Thus, matching information within an utterance-response pair is extracted from the interaction of the pair in an iterative fashion, and the information flows along the chain of the blocks via representations. Evaluation results on three benchmark data sets indicate that IoI can significantly outperform state-of-the-art methods in terms of various matching metrics. Through further analysis, we also unveil how the depth of interaction affects the performance of IoI.

Journal ArticleDOI
TL;DR: The proposed matching framework has been evaluated using many different types of multimodal images, and the results demonstrate its superior matching performance with respect to the state-of-the-art methods.
Abstract: While image matching has been studied in the remote sensing community for decades, matching multimodal data [e.g., optical, light detection and ranging (LiDAR), synthetic aperture radar (SAR), and map] remains a challenging problem because of significant nonlinear intensity differences between such data. To address this problem, we present a novel fast and robust template matching framework integrating local descriptors for multimodal images. First, a local descriptor [such as the histogram of oriented gradients (HOG), local self-similarity (LSS), or speeded-up robust features (SURF)] is extracted at each pixel to form a pixelwise feature representation of an image. Then, we define a fast similarity measure based on the feature representation using the fast Fourier transform (FFT) in the frequency domain. A template matching strategy is employed to detect correspondences between images. In this procedure, we also propose a novel pixelwise feature representation using orientated gradients of images, which is named channel features of orientated gradients (CFOG). This novel feature is an extension of the pixelwise HOG descriptor with superior performance in image matching and computational efficiency. The major advantages of the proposed matching framework include: 1) structural similarity representation using the pixelwise feature description and 2) high computational efficiency due to the use of FFT. The proposed matching framework has been evaluated using many different types of multimodal images, and the results demonstrate its superior matching performance with respect to the state-of-the-art methods.
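
The speed of such a framework comes from evaluating template similarity in the frequency domain; here is a single-channel numpy sketch of the FFT correlation trick (a CFOG-style similarity would sum such terms over feature channels):

```python
# Correlation of a template against all placements in a search image via FFT.
import numpy as np

def fft_correlation(search, template):
    """search: [H, W] float array; template: [h, w]; returns [H-h+1, W-w+1] scores."""
    H, W = search.shape
    h, w = template.shape
    F_s = np.fft.rfft2(search)
    # Correlation = convolution with the flipped template, zero-padded to [H, W].
    F_t = np.fft.rfft2(template[::-1, ::-1].copy(), s=(H, W))
    corr = np.fft.irfft2(F_s * F_t, s=(H, W))
    return corr[h - 1:, w - 1:]                      # keep valid placements only
```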

Proceedings ArticleDOI
25 Jul 2019
TL;DR: Sherlock is introduced, a multi-input deep neural network for detecting semantic types that achieves a support-weighted F1 score of 0.89, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.

Abstract: Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on 686,765 data columns retrieved from the VizNet corpus by matching 78 semantic types from DBpedia to column headers. We characterize each matched column with 1,588 features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F1 score of 0.89, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.

Journal ArticleDOI
TL;DR: The detailed analysis of different V-reID methods in terms of mean average precision (mAP) and the cumulative matching curve (CMC) provides objective insight into the strengths and weaknesses of these methods.

Proceedings ArticleDOI
Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, Rui Yan
30 Jan 2019
TL;DR: This work proposes a multi-representation fusion network where the representations can be fused into matching at an early stage, at an intermediate stage, or at the last stage, and demonstrates the effect of each representation on matching, which sheds light on how to select them in practical systems.
Abstract: We consider context-response matching with multiple types of representations for multi-turn response selection in retrieval-based chatbots. The representations encode semantics of contexts and responses on words, n-grams, and sub-sequences of utterances, and capture both short-term and long-term dependencies among words. With such a number of representations in hand, we study how to fuse them in a deep neural architecture for matching and how each of them contributes to matching. To this end, we propose a multi-representation fusion network where the representations can be fused into matching at an early stage, at an intermediate stage, or at the last stage. We empirically compare different representations and fusing strategies on two benchmark data sets. Evaluation results indicate that late fusion is always better than early fusion, and by fusing the representations at the last stage, our model significantly outperforms the existing methods, and achieves new state-of-the-art performance on both data sets. Through a thorough ablation study, we demonstrate the effect of each representation on matching, which sheds light on how to select them in practical systems.

Proceedings ArticleDOI
14 Jan 2019
TL;DR: This paper addresses the problem of 3D pose estimation for multiple people in a few calibrated camera views by using a multi-way matching algorithm to cluster the detected 2D poses in all views and proposes to combine geometric and appearance cues for cross-view matching.
Abstract: This paper addresses the problem of 3D pose estimation for multiple people in a few calibrated camera views. The main challenge of this problem is to find the cross-view correspondences among noisy and incomplete 2D pose predictions. Most previous methods address this challenge by directly reasoning in 3D using a pictorial structure model, which is inefficient due to the huge state space. We propose a fast and robust approach to solve this problem. Our key idea is to use a multi-way matching algorithm to cluster the detected 2D poses in all views. Each resulting cluster encodes 2D poses of the same person across different views and consistent correspondences across the keypoints, from which the 3D pose of each person can be effectively inferred. The proposed convex optimization based multi-way matching algorithm is efficient and robust against missing and false detections, without knowing the number of people in the scene. Moreover, we propose to combine geometric and appearance cues for cross-view matching. The proposed approach achieves significant performance gains from the state-of-the-art (96.3% vs. 90.6% and 96.9% vs. 88% on the Campus and Shelf datasets, respectively), while being efficient for real-time applications.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: Experimental results show that the MCUA scheme for cost volume calculation outperforms state-of-the-art methods by a notable margin and effectively improves the accuracy of stereo matching.
Abstract: Exploiting multi-level context information in the cost volume can improve the performance of learning-based stereo matching methods. In recent years, 3-D Convolutional Neural Networks (3-D CNNs) have shown advantages in regularizing the cost volume but are limited by unary feature learning in matching cost computation. However, existing methods only use features from plain convolution layers or a simple aggregation of multi-level features to calculate the cost volume, which is insufficient because stereo matching requires discriminative features to identify corresponding pixels in rectified stereo image pairs. In this paper, we propose a unary features descriptor using multi-level context ultra-aggregation (MCUA), which encapsulates all convolutional features into a more discriminative representation by intra- and inter-level feature combination. Specifically, a child module that takes low-resolution images as input captures larger context information; the larger context information from each layer is densely connected to the main branch of the network. MCUA makes good use of multi-level features with richer context and performs the image-to-image prediction holistically. We introduce our MCUA scheme for cost volume calculation and test it on PSM-Net. We also evaluate our method on the Scene Flow and KITTI 2012/2015 stereo datasets. Experimental results show that our method outperforms state-of-the-art methods by a notable margin and effectively improves the accuracy of stereo matching.

Journal ArticleDOI
TL;DR: A classification of methods for tracking community evolution in dynamic social networks into four main approaches is proposed, using the functioning principle as the criterion; the first approach is based on independent successive static detection and matching.

Abstract: This paper presents a survey of previous studies on the problem of tracking community evolution over time in dynamic social networks. This problem is of crucial importance in the field of social network analysis. The goal of our paper is to classify existing methods dealing with the issue. We propose a classification of methods for tracking community evolution in dynamic social networks into four main approaches, using the functioning principle as the criterion: the first is based on independent successive static detection and matching; the second is based on dependent successive static detection; the third is based on simultaneous study of all stages of community evolution; finally, the fourth concerns methods working directly on temporal networks. Our paper starts by giving basic concepts about social networks, community structure, and strategies for evaluating community detection methods. Then, it describes the different approaches and exposes the strengths as well as the weaknesses of each.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A coarse-to-fine consistency learning scheme is proposed to learn consistency globally and locally in two steps; to avoid learning ineffective knowledge, the prior common knowledge of intra-camera matching in the pretrained model is preserved as reliable guiding information, since it does not suffer from cross-camera scene variation the way cross-camera matching does.
Abstract: For matching pedestrians across disjoint camera views in surveillance, person re-identification (Re-ID) has made great progress in supervised learning. However, it is infeasible to label data in a number of new scenes when extending a Re-ID system. Thus, studying unsupervised learning for Re-ID is important for saving labelling cost. Yet, cross-camera scene variation is a key challenge for unsupervised Re-ID, such as illumination, background and viewpoint variations, which cause domain shift in the feature space and result in inconsistent pairwise similarity distributions that degrade matching performance. To alleviate the effect of cross-camera scene variation, we propose a Camera-Aware Similarity Consistency Loss to learn consistent pairwise similarity distributions for intra-camera matching and cross-camera matching. To avoid learning ineffective knowledge in consistency learning, we preserve the prior common knowledge of intra-camera matching in the pretrained model as reliable guiding information, which does not suffer from cross-camera scene variation as cross-camera matching. To learn similarity consistency more effectively, we further develop a coarse-to-fine consistency learning scheme to learn consistency globally and locally in two steps. Experiments show that our method outperformed the state-of-the-art unsupervised Re-ID methods.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A multi-level matching and aggregation network (MLMAN) for few-shot relation classification that encodes the query instance and each support set in an interactive way by considering their matching information at both local and instance levels.
Abstract: This paper presents a multi-level matching and aggregation network (MLMAN) for few-shot relation classification. Previous studies on this topic adopt prototypical networks, which calculate the embedding vector of a query instance and the prototype vector of the support set for each relation candidate independently. On the contrary, our proposed MLMAN model encodes the query instance and each support set in an interactive way by considering their matching information at both local and instance levels. The final class prototype for each support set is obtained by attentive aggregation over the representations of support instances, where the weights are calculated using the query instance. Experimental results demonstrate the effectiveness of our proposed methods, which achieve a new state-of-the-art performance on the FewRel dataset.

Journal ArticleDOI
TL;DR: This work describes how both relative and absolute measures of treatment effect can be obtained when using propensity-score matching with competing risks data, and recommends that a marginal subdistribution hazard model that accounts for the within-pair clustering of outcomes be used to test the equality of CIFs and to estimate subdistribution hazard ratios.
Abstract: Propensity-score matching is a popular analytic method to remove the effects of confounding due to measured baseline covariates when using observational data to estimate the effects of treatment. Time-to-event outcomes are common in medical research. Competing risks are outcomes whose occurrence precludes the occurrence of the primary time-to-event outcome of interest. All non-fatal outcomes and all cause-specific mortality outcomes are potentially subject to competing risks. There is a paucity of guidance on the conduct of propensity-score matching in the presence of competing risks. We describe how both relative and absolute measures of treatment effect can be obtained when using propensity-score matching with competing risks data. Estimates of the relative effect of treatment can be obtained by using cause-specific hazard models in the matched sample. Estimates of absolute treatment effects can be obtained by comparing cumulative incidence functions (CIFs) between matched treated and matched control subjects. We conducted a series of Monte Carlo simulations to compare the empirical type I error rate of different statistical methods for testing the equality of CIFs estimated in the matched sample. We also examined the performance of different methods to estimate the marginal subdistribution hazard ratio. We recommend that a marginal subdistribution hazard model that accounts for the within-pair clustering of outcomes be used to test the equality of CIFs and to estimate subdistribution hazard ratios. We illustrate the described methods by using data on patients discharged from hospital with acute myocardial infarction to estimate the effect of discharge prescribing of statins on cardiovascular death.
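
For intuition, the cumulative incidence function being compared between matched groups has the standard nonparametric (Aalen-Johansen) form: at each event time, add the all-cause survival just before that time multiplied by the cause-specific event fraction. A plain numpy sketch, which ignores the within-pair clustering the authors emphasize:

```python
# Nonparametric cumulative incidence under competing risks (sketch).
import numpy as np

def cumulative_incidence(time, event, cause=1):
    """time: follow-up times; event: 0 = censored, 1, 2, ... = cause of event."""
    time, event = np.asarray(time, float), np.asarray(event)
    surv, cif, out = 1.0, 0.0, []
    for t in np.unique(time[event > 0]):              # ordered event times
        n_t = np.sum(time >= t)                       # number at risk just before t
        d_all = np.sum((time == t) & (event > 0))     # events of any cause at t
        d_k = np.sum((time == t) & (event == cause))  # events of the cause of interest
        cif += surv * d_k / n_t
        surv *= 1.0 - d_all / n_t                     # all-cause Kaplan-Meier update
        out.append((t, cif))
    return out
```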

Proceedings ArticleDOI
01 Oct 2019
TL;DR: TIMAM, a Text-Image Modality Adversarial Matching approach, learns modality-invariant feature representations using adversarial and cross-modal matching objectives, achieving state-of-the-art performance on four widely used publicly available datasets.
Abstract: For many computer vision applications such as image captioning, visual question answering, and person search, learning discriminative feature representations at both image and text level is an essential yet challenging problem. Its challenges originate from the large word variance in the text domain as well as the difficulty of accurately measuring the distance between the features of the two modalities. Most prior work focuses on the latter challenge, by introducing loss functions that help the network learn better feature representations but fail to account for the complexity of the textual input. With that in mind, we introduce TIMAM: a Text-Image Modality Adversarial Matching approach that learns modality-invariant feature representations using adversarial and cross-modal matching objectives. In addition, we demonstrate that BERT, a publicly-available language model that extracts word embeddings, can successfully be applied in the text-to-image matching domain. The proposed approach achieves state-of-the-art cross-modal matching performance on four widely-used publicly-available datasets resulting in absolute improvements ranging from 2% to 5% in terms of rank-1 accuracy.

Journal ArticleDOI
TL;DR: This paper proposes enhancement and matching for latent fingerprints using the Scale Invariant Feature Transform (SIFT); the matching results are satisfactory compared with minutiae-point matching.
Abstract: Latent fingerprint identification is a difficult task for law enforcement agencies and border security in identifying suspects. It is complicated by poor-quality images with non-linear distortion and complex background noise. Hence, image enhancement is required before such latent fingerprints can be matched. Most current research is based on minutiae points for fingerprint matching because their accuracy is acceptable. In an effort to extend the technology for fingerprint matching, our model proposes enhancement and matching for latent fingerprints using the Scale Invariant Feature Transform (SIFT). It involves two phases: (i) latent fingerprint contrast enhancement using an intuitionistic type-2 fuzzy set and (ii) extraction of SIFT feature points from the latent fingerprints. The matching algorithm is then performed with n images, and scores are calculated by Euclidean distance. We tested our algorithm for matching using public-domain fingerprint databases such as FVC-2004 and the IIIT latent fingerprint database. The experimental results indicate that the matching results are satisfactory compared with minutiae points.
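
A sketch of the SIFT matching and Euclidean-distance scoring stage (assuming OpenCV >= 4.4 for cv2.SIFT_create; the fuzzy-set enhancement phase is omitted):

```python
# SIFT keypoint matching with Lowe's ratio test (sketch).
import cv2

def sift_match_score(img1, img2, ratio=0.75):
    """img1, img2: grayscale uint8 images; returns the number of good matches."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)             # Euclidean distance on descriptors
    pairs = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in pairs if m.distance < ratio * n.distance]
    return len(good)
```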

Journal ArticleDOI
17 Jul 2019
TL;DR: A general framework named DeepCF, short for Deep Collaborative Filtering, is proposed to combine the strengths of the two types of CF methods and overcome their respective flaws: the limited expressiveness of the dot product and the weakness in capturing low-rank relations.
Abstract: In general, recommendation can be viewed as a matching problem, i.e., match proper items for proper users. However, due to the huge semantic gap between users and items, it’s almost impossible to directly match users and items in their initial representation spaces. To solve this problem, many methods have been studied, which can be generally categorized into two types, i.e., representation learning-based CF methods and matching function learning-based CF methods. Representation learning-based CF methods try to map users and items into a common representation space. In this case, the higher similarity between a user and an item in that space implies they match better. Matching function learning-based CF methods try to directly learn the complex matching function that maps user-item pairs to matching scores. Although both methods are well developed, they suffer from two fundamental flaws, i.e., the limited expressiveness of dot product and the weakness in capturing low-rank relations respectively. To this end, we propose a general framework named DeepCF, short for Deep Collaborative Filtering, to combine the strengths of the two types of methods and overcome such flaws. Extensive experiments on four publicly available datasets demonstrate the effectiveness of the proposed DeepCF framework.
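
A hedged sketch of the two styles DeepCF unifies, with illustrative layer sizes (not the authors' exact architecture): a representation-learning branch (element-wise product of embeddings, dot-product style) and a matching-function branch (an MLP over concatenated embeddings), fused into one score:

```python
# Dual-branch collaborative filtering sketch.
import torch
import torch.nn as nn

class DualBranchCF(nn.Module):
    def __init__(self, n_users, n_items, d=64):
        super().__init__()
        self.u_rep, self.i_rep = nn.Embedding(n_users, d), nn.Embedding(n_items, d)
        self.u_mat, self.i_mat = nn.Embedding(n_users, d), nn.Embedding(n_items, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))
        self.out = nn.Linear(2 * d, 1)

    def forward(self, users, items):
        rep = self.u_rep(users) * self.i_rep(items)   # representation-learning branch
        mat = self.mlp(torch.cat([self.u_mat(users), self.i_mat(items)], dim=-1))
        return torch.sigmoid(self.out(torch.cat([rep, mat], dim=-1))).squeeze(-1)
```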

Proceedings ArticleDOI
15 Oct 2019
TL;DR: This work proposes a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image- text instance.
Abstract: A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or classification, the first one mapping image and text instances into a common embedding space for distance measuring, and the second one regarding image-text matching as a binary classification problem. Neither of these approaches can, however, balance the matching accuracy and model complexity well. We propose a novel framework that achieves remarkable matching performance with acceptable model complexity. Specifically, in the training stage, we propose a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance. Then, during testing, we deploy a generic Cross-modal Re-ranking (RR) scheme for refinement without requiring additional training procedure. Extensive experiments on two datasets demonstrate that our MTFN-RR consistently achieves the state-of-the-art matching performance with much less time complexity.