
Showing papers on "Matching (statistics)" published in 2021


Journal ArticleDOI
TL;DR: This survey introduces feature detection, description, and matching techniques from handcrafted methods to trainable ones and provides an analysis of the development of these methods in theory and practice, and briefly introduces several typical image matching-based applications.
Abstract: As a fundamental and critical task in various visual applications, image matching can identify and then correspond the same or similar structure/content from two or more images. Over the past decades, a growing number and diversity of methods have been proposed for image matching, particularly with the development of deep learning techniques in recent years. However, several questions remain open: which method is a suitable choice for a specific application, given different scenarios and task requirements, and how can better image matching methods be designed with superior performance in accuracy, robustness, and efficiency? This encourages us to conduct a comprehensive and systematic review and analysis of these classical and latest techniques. Following the feature-based image matching pipeline, we first introduce feature detection, description, and matching techniques, from handcrafted methods to trainable ones, and provide an analysis of the development of these methods in theory and practice. Secondly, we briefly introduce several typical image matching-based applications for a comprehensive understanding of the significance of image matching. In addition, we provide a comprehensive and objective comparison of these classical and latest techniques through extensive experiments on representative datasets. Finally, we conclude with the current status of image matching technologies and deliver insightful discussions and prospects for future works. This survey can serve as a reference for (but is not limited to) researchers and engineers in image matching and related fields.
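To make the classical feature-based pipeline above concrete, here is a minimal sketch of handcrafted descriptor matching with Lowe's ratio test (a toy illustration with made-up 2-D descriptors, not code from the survey):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping a match only when the nearest distance is clearly smaller than
    the second-nearest (Lowe's ratio test)."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((euclidean(da, db), j) for j, db in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:  # unambiguous nearest neighbour
            matches.append((i, j1))
    return matches

# Toy 2-D descriptors: a0 clearly matches b0; a1 is ambiguous between b1 and b2.
A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
print(ratio_test_match(A, B))  # → [(0, 0)]
```

The ambiguous descriptor is rejected because its two nearest candidates are nearly equidistant, which is exactly the repeatability failure in low-texture regions that later papers in this listing attack.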

474 citations


Proceedings ArticleDOI
Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, Xiaowei Zhou
01 Apr 2021
TL;DR: LoFTR as discussed by the authors uses self and cross attention layers in Transformer to obtain feature descriptors that are conditioned on both images, which enables the method to produce dense matches in low-texture areas.
Abstract: We present a novel method for local image feature matching. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the good matches at a fine level. In contrast to dense methods that use a cost volume to search correspondences, we use self and cross attention layers in Transformer to obtain feature descriptors that are conditioned on both images. The global receptive field provided by Transformer enables our method to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points. The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin. LoFTR also ranks first on two public benchmarks of visual localization among the published methods. Code is available at our project page: https://zju3dv.github.io/loftr/.
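LoFTR's coarse stage selects matches from a Transformer-conditioned similarity matrix; the sketch below illustrates the general dual-softmax, mutual-nearest-neighbour selection idea on a hand-made similarity matrix (an editorial toy for intuition, not the authors' implementation, whose details may differ):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dual_softmax_matches(sim, threshold=0.5):
    """sim[i][j] is the similarity of coarse feature i in image A with
    feature j in image B. Softmax over rows and over columns gives a joint
    matching probability; keep mutual-argmax pairs above a threshold."""
    rows = [softmax(r) for r in sim]
    cols = [softmax(c) for c in zip(*sim)]  # softmax over each column
    prob = [[rows[i][j] * cols[j][i] for j in range(len(sim[0]))]
            for i in range(len(sim))]
    matches = []
    for i, row in enumerate(prob):
        j = row.index(max(row))
        col = [prob[k][j] for k in range(len(prob))]
        if col.index(max(col)) == i and prob[i][j] > threshold:
            matches.append((i, j))
    return matches

# Features 0 and 2 have confident counterparts; feature 1 is ambiguous.
sim = [[8.0, 0.0, 0.0],
       [0.0, 0.1, 0.2],
       [0.0, 0.0, 9.0]]
print(dual_softmax_matches(sim))  # → [(0, 0), (2, 2)]
```

The ambiguous feature survives the mutual-argmax check but fails the confidence threshold, so only the two confident pairs are emitted as coarse matches.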

459 citations


Proceedings Article
03 May 2021
TL;DR: WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality.
Abstract: This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality. We find that it can generate high fidelity audio samples using as few as six iterations. Experiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations. Audio samples are available at https://wavegrad-iclr2021.github.io/.

351 citations


Journal ArticleDOI
TL;DR: The probit regression results give a different outcome: rescon, FDI, CO2, the Human Development Index (HDI), and private-sector investment in the energy sector are the factors likely to have an impact on green financing and climate change mitigation in the study countries.
Abstract: Green finance is inextricably linked to investment risk, particularly in emerging and developing economies (EMDE). This study uses the difference-in-differences (DID) method to evaluate the mean causal effects of a treatment on an outcome of the determinants of scaling up green financing and climate change mitigation in the N-11 countries from 2005 to 2019. After analyzing with a dummy for the treated countries, it was confirmed that the outcome covariates rescon (renewable energy sources consumption), population, FDI, CO2, inflation, technical cooperation grants, domestic credit to the private sector, and research and development are very significant in promoting green financing and climate change mitigation in the study countries. The probit regression results give a different outcome: rescon, FDI, CO2, the Human Development Index (HDI), and private-sector investment in the energy sector are likely to have an impact on green financing and climate change mitigation in the study countries. Furthermore, matching the analysis through nearest neighbor matching, kernel matching, and radius matching produced mixed results for both the treated and the untreated countries: either group experienced an improvement in green financing and climate change mitigation, or a decrease. Overall, the DID showed no significant difference among the countries.
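The canonical 2x2 difference-in-differences estimator underlying the study's design can be sketched in a few lines (toy numbers invented for illustration, not the study's data):

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences: the treatment effect is the
    change in the treated group's mean outcome minus the change in the
    control group's mean outcome."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - \
           (mean(control_post) - mean(control_pre))

# Hypothetical green-financing index values before/after a policy treatment.
effect = did_estimate([10, 12], [18, 20], [11, 13], [13, 15])
print(effect)  # → 6.0
```

The control group's change (+2) nets out the common time trend, so only the excess change in the treated group (+8) is attributed to the treatment, under the usual parallel-trends assumption.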

156 citations


Journal ArticleDOI
TL;DR: This survey provides a comprehensive review of multimodal image matching methods from handcrafted to deep methods for each research field according to their imaging nature, including medical, remote sensing and computer vision.

155 citations


Journal ArticleDOI
TL;DR: A privacy-preserving medical record searching scheme based on the ElGamal blind signature that achieves bilateral security: whether the abstracts match or not, both the privacy of the case-database and the private information of the current patient are well protected.
Abstract: In the medical field, previous patients' cases are extremely private as well as intensely valuable to current disease diagnosis. Therefore, how to make full use of precious cases without leaking patients' privacy is a leading and promising line of work, especially in the coming era of privacy-preserving intelligent medicine. In this paper, we investigate how to securely invoke patients' records from a past case-database while protecting the privacy of both the currently diagnosed patient and the case-database, and we construct a privacy-preserving medical record searching scheme based on the ElGamal blind signature. In our scheme, by blinding the health data of the patient and the database of the intelligent doctor respectively, the patient can securely make a self-helped medical diagnosis by invoking the past case-database and securely comparing the blinded abstracts of current data and previous records. Moreover, the patient obtains the target searching information at the same time he learns whether the abstracts match, instead of obtaining it only after matching. This greatly increases the timeliness of information acquisition and meets high-speed information sharing requirements, especially in the 5G era. Furthermore, our proposed scheme achieves bilateral security: whether the abstracts match or not, both the privacy of the case-database and the private information of the current patient are well protected. Besides, it resists different levels of brute-force traversal attacks by adjusting the number of zeros in a bit string according to different security requirements.

118 citations


Journal ArticleDOI
TL;DR: It is found that supply is highly elastic: in periods when demand doubles, sellers perform almost twice as many tasks, prices hardly increase, and the probability of requested tasks being matched falls only slightly, implying that in markets where supply can accommodate demand fluctuations, growth relies on attracting buyers at a faster rate than sellers.
Abstract: We study the growth of online peer-to-peer markets. Using data from TaskRabbit, an expanding marketplace for domestic tasks at the time of our study, we show that growth varies considerably across ...

94 citations


Journal ArticleDOI
TL;DR: An approach to TSMDM with multi-granular HFLTSs is developed that allows matching objects to provide linguistic assessments flexibly and can deal with situations where incomplete criteria weight information is provided.
Abstract: Two-sided matching decision making (TSMDM) problems exist widely in human being’s daily life. For practical TSMDM problems, matching objects with different culture and knowledge backgrounds usually tend to provide linguistic assessments using different linguistic term sets (i.e., multi-granular linguistic information). Moreover, for TSMDM problems with high uncertainty, it is possible that matching objects may have some hesitancy and thus provide hesitant fuzzy linguistic term sets (HFLTSs). To model these situations, an approach to TSMDM with multi-granular HFLTSs is developed in the paper. In the proposed approach, some optimization models are first constructed to determine criteria weights for matching objects who do not provide clear criteria weight vectors. Afterwards, each matching object’s hesitant fuzzy linguistic decision matrix is aggregated to obtain his/her collective assessments over matching objects on the other side, which are denoted by multi-granular linguistic distribution assessments. These multi-granular linguistic distribution assessments are unified to obtain matching objects’ satisfaction degrees. Furthermore, an optimization model which aims to maximize the overall satisfaction degree of matching objects by considering the stable matching condition is then established and solved to determine the matching between matching objects. Eventually, an example for the matching of green building technology supply and demand is provided to demonstrate the characteristics of the proposed approach. Compared with previous studies, the proposed approach allows matching objects to provide linguistic assessments flexibly and can deal with the situations when incomplete criteria weight information is provided.

89 citations


Proceedings ArticleDOI
11 Jul 2021
TL;DR: In this paper, a deconfounded cross-modal matching (DCM) method is proposed to remove the confounding effects of moment location in the context of video moment retrieval, which can achieve significant improvement in terms of both accuracy and generalization.
Abstract: We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query. Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions. Despite their effectiveness, current models mostly exploit dataset biases while ignoring the video content, thus leading to poor generalizability. We argue that the issue is caused by the hidden confounder in VMR, i.e., temporal location of moments, that spuriously correlates the model input and prediction. How to design robust matching models against the temporal location biases is crucial but, as far as we know, has not been studied yet for VMR. To fill the research gap, we propose a causality-inspired VMR framework that builds structural causal model to capture the true effect of query and video content on the prediction. Specifically, we develop a Deconfounded Cross-modal Matching (DCM) method to remove the confounding effects of moment location. It first disentangles moment representation to infer the core feature of visual content, and then applies causal intervention on the disentangled multimodal input based on backdoor adjustment, which forces the model to fairly incorporate each possible location of the target into consideration. Extensive experiments clearly show that our approach can achieve significant improvement over the state-of-the-art methods in terms of both accuracy and generalization.

89 citations


Journal ArticleDOI
TL;DR: In this article, a review of the literature on personalized matching in persuasion can be found, where the authors describe different types of persuasive matches, the primary characteristics of people who are targeted, and the key psychological mechanisms underlying the impact of matching.

80 citations


Proceedings ArticleDOI
26 Oct 2021
TL;DR: Wang et al. as discussed by the authors proposed Adaptive RNNs (AdaRNN) to tackle the temporal covariate shift problem by building an adaptive model that generalizes well on the unseen test data.
Abstract: Time series have wide applications in the real world and are known to be difficult to forecast. Since their statistical properties change over time, their distribution also changes temporally, which causes a severe distribution shift problem for existing methods. However, modeling time series from the distribution perspective remains unexplored. In this paper, we term this Temporal Covariate Shift (TCS). This paper proposes Adaptive RNNs (AdaRNN) to tackle the TCS problem by building an adaptive model that generalizes well on unseen test data. AdaRNN is sequentially composed of two novel algorithms. First, we propose Temporal Distribution Characterization to better characterize the distribution information in the time series. Second, we propose Temporal Distribution Matching to reduce the distribution mismatch and learn an adaptive time-series model. AdaRNN is a general framework with flexible distribution distances integrated. Experiments on human activity recognition, air quality prediction, and financial analysis show that AdaRNN outperforms the latest methods by 2.6% in classification accuracy and significantly reduces RMSE by 9.0%. We also show that the temporal distribution matching algorithm can be extended to the Transformer structure to boost its performance.
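Temporal Distribution Matching needs a distance between the distributions of time-series segments; a common choice in this family of methods is maximum mean discrepancy (MMD), sketched here on toy 1-D segments (an editorial illustration of the distance, not AdaRNN's code, which plugs such distances into training):

```python
import math

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * (x - y) ** 2)

def mmd(seg_a, seg_b, gamma=0.5):
    """Biased empirical MMD^2 between two 1-D segments with an RBF kernel:
    near zero for identically distributed segments, large under a shift."""
    def avg_kernel(u, v):
        return sum(rbf(x, y, gamma) for x in u for y in v) / (len(u) * len(v))
    return avg_kernel(seg_a, seg_a) + avg_kernel(seg_b, seg_b) \
        - 2 * avg_kernel(seg_a, seg_b)

same = mmd([0.0, 0.1, 0.2], [0.0, 0.1, 0.2])
shifted = mmd([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
print(same < shifted)  # → True
```

Two segments drawn from the same regime give an MMD near zero, while a level shift between regimes (the covariate shift the paper targets) produces a large value that a training objective can then minimize.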

Journal ArticleDOI
TL;DR: This paper presents an end-to-end trainable convolution neural network to fully use cost volumes for stereo matching, and investigates the problem of developing a robust model to perform well across multiple datasets with different characteristics.
Abstract: For CNN-based stereo matching methods, cost volumes play an important role in achieving good matching accuracy. In this paper, we present an end-to-end trainable convolutional neural network to fully use cost volumes for stereo matching. Our network consists of three sub-modules, i.e., shared feature extraction, initial disparity estimation, and disparity refinement. Cost volumes are calculated at multiple levels using the shared features, and are used in both the initial disparity estimation and disparity refinement sub-modules. To improve the efficiency of disparity refinement, multi-scale feature constancy is introduced to measure the correctness of the initial disparity in feature space. These sub-modules of our network are tightly coupled, making it compact and easy to train. Moreover, we investigate the problem of developing a robust model that performs well across multiple datasets with different characteristics. We achieve this by introducing a two-stage finetuning scheme to gently transfer the model to target datasets. Specifically, in the first stage, the model is finetuned using both a large synthetic dataset and the target datasets with a relatively large learning rate, while in the second stage the model is trained using only the target datasets with a small learning rate. The proposed method is tested on several benchmarks including the Middlebury 2014, KITTI 2015, ETH3D 2017, and SceneFlow datasets. Experimental results show that our method achieves state-of-the-art performance on all the datasets. The proposed method also won the 1st prize on the Stereo task of the Robust Vision Challenge 2018.
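A cost volume stacks one matching cost per pixel per candidate disparity; the following toy sketch uses sum-of-absolute-differences on 1-D scanlines with winner-takes-all selection (a hand-rolled illustration of the data structure, far simpler than the paper's learned, multi-level volumes):

```python
def sad_cost_volume(left, right, max_disp):
    """Per-pixel matching cost for each candidate disparity d:
    cost[d][x] = |left[x] - right[x - d]| on 1-D intensity rows."""
    width = len(left)
    volume = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(abs(left[x] - right[x - d]))
            else:
                row.append(float("inf"))  # disparity shifts outside the image
        volume.append(row)
    return volume

def winner_takes_all(volume):
    """Pick, for each pixel, the disparity with the lowest cost."""
    width = len(volume[0])
    return [min(range(len(volume)), key=lambda d: volume[d][x])
            for x in range(width)]

# Toy scanlines: the right view equals the left view shifted by 2 pixels.
left  = [0, 0, 10, 20, 30, 0]
right = [10, 20, 30, 0, 0, 0]
print(winner_takes_all(sad_cost_volume(left, right, 2))[2:5])  # → [2, 2, 2]
```

The textured interior pixels recover the true disparity of 2, while the flat border pixels are ambiguous, which is precisely why learned cost aggregation and refinement stages such as the paper's are needed.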

Journal ArticleDOI
TL;DR: It is shown that ignoring the matching step results in asymptotically valid standard errors if matching is done without replacement and the regression model is correctly specified relative to the population regression function of the outcome variable on the treatment variable and all the covariates used for matching.
Abstract: Nearest-neighbor matching is a popular nonparametric tool to create balance between treatment and control groups in observational studies. As a preprocessing step before regression, matching reduce...
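The matching-as-preprocessing step discussed here can be sketched as 1-nearest-neighbour matching without replacement on a single scalar covariate (toy data; real analyses match on many covariates or a propensity score, and the paper's point concerns the standard errors of the subsequent regression):

```python
def att_nearest_neighbor(treated, controls):
    """1-nearest-neighbour matching without replacement: each treated unit
    (covariate, outcome) is matched to the unused control with the closest
    covariate, and the ATT is the mean outcome difference over pairs."""
    pool = list(controls)
    diffs = []
    for x_t, y_t in treated:
        match = min(pool, key=lambda c: abs(c[0] - x_t))
        pool.remove(match)  # without replacement: each control used once
        diffs.append(y_t - match[1])
    return sum(diffs) / len(diffs)

treated  = [(1.0, 5.0), (2.0, 7.0)]            # (covariate, outcome)
controls = [(1.1, 3.0), (2.2, 4.0), (9.0, 0.0)]
print(att_nearest_neighbor(treated, controls))  # → 2.5
```

The distant control (covariate 9.0) is never selected, showing how matching trims the control sample to units comparable with the treated group before any regression is run.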

Journal ArticleDOI
TL;DR: A novel approach to two-sided matching decision making with FPRs-SC is developed based on the LLSM, and consistency-improving algorithms are devised to improve the multiplicative consistency of an unacceptably consistent FPR-SC.
Abstract: The fuzzy preference relation with self-confidence (FPR-SC), whose elements are composed of the degree to which an alternative is preferred to another and the self-confidence level about the prefer...

Book ChapterDOI
23 Jun 2021
TL;DR: In this paper, the authors explore the application of the language-image model, CLIP, to obtain video representations without the need for said annotations and obtain state-of-the-art results on the MSR-VTT and MSVD benchmarks.
Abstract: Video retrieval is a challenging task that aims at matching a text query to a video or vice versa. Most existing approaches for addressing this problem rely on annotations made by users. Although simple, this approach is not always feasible in practice. In this work, we explore the application of the language-image model CLIP to obtain video representations without the need for such annotations. This model was explicitly trained to learn a common space where images and text can be compared. Using various techniques described in this document, we extend its application to videos, obtaining state-of-the-art results on the MSR-VTT and MSVD benchmarks.

Journal ArticleDOI
TL;DR: A central-concepts-based ontology partitioning algorithm, which borrows ideas from social networks and the Firefly Algorithm, is used to divide the ontology into several disjoint segments; results show that the alignments obtained by the method significantly outperform state-of-the-art biomedical ontology matching techniques.

Journal ArticleDOI
TL;DR: This tutorial offers researchers a broad survey of PSM, ranging from data preprocessing to estimation of propensity scores, and from matching to analyses, and discusses the advantages and disadvantages of propensity score methods.
Abstract: It is increasingly important to accurately and comprehensively estimate the effects of particular clinical treatments. Although randomization is the current gold standard, randomized controlled trials (RCTs) are often limited in practice due to ethical and cost issues. Observational studies have also attracted a great deal of attention as, quite often, large historical datasets are available for these kinds of studies. However, observational studies also have their drawbacks, mainly including systematic differences in baseline covariates, which relate to outcomes between treatment and control groups and can potentially bias results. Propensity score methods, which are a series of balancing methods in these studies, have become increasingly popular by virtue of the two major advantages of dimension reduction and design separation. Within this approach, propensity score matching (PSM) has been empirically proven, with outstanding performances across observational datasets. While PSM tutorials are available in the literature, there is still room for improvement. Some PSM tutorials provide step-by-step guidance, but only one or two packages have been covered, thereby limiting their scope and practicality. Several articles and books have expounded upon propensity scores in detail, exploring statistical principles and theories; however, the lack of explanations of function usage in programming languages has made it difficult for researchers to understand and follow these materials. To this end, this tutorial was developed with a six-step PSM framework, in which we summarize the recent updates and provide step-by-step guidance to the R programming language. This tutorial offers researchers a broad survey of PSM, ranging from data preprocessing to estimation of propensity scores, and from matching to analyses. We also explain generalized propensity scoring for multiple or continuous treatments, as well as time-dependent PSM. Lastly, we discuss the advantages and disadvantages of propensity score methods.

Journal ArticleDOI
TL;DR: An algorithm for large-scale matching with incomplete preference lists is proposed, addressing the problem that it is almost impossible to know the details of every individual on the other side, so that a complete preference list (CPL) cannot be built in reality.
Abstract: Nowadays, there is ever-increasing interest in federated learning, which allows end devices to collaboratively train a global machine learning model in a decentralized paradigm without sharing individual data. Despite the advantages of low communication cost and preserved data privacy, federated learning also faces new challenges. Practically, end devices will consider the resource cost of, and their willingness toward, machine learning model training when they are invited to participate in a federated learning task. So, how to assign the preferable tasks to the devices with high willingness has to be considered. Besides, end devices have the property of high mobility, which means the time a device remains within the network is limited. Therefore, reducing the task execution time is necessary. To address these problems, we first analyze and formulate the latency minimization problem for multitask federated learning in a multiaccess edge computing (MEC) network scenario. Then, we model the corresponding problem as a matching game to find the optimal task assignment solutions. Moreover, considering the large-scale Internet-of-Things (IoT) scenario, it is almost impossible for the two sides to know the details of every individual on the other side, so the complete preference list (CPL) cannot be built in reality. Therefore, we propose an algorithm for large-scale matching with incomplete preference lists to address this problem. Finally, we conduct numerical simulations in various cases to demonstrate the effectiveness of our proposed method. The results show that our approach can achieve performance similar to the CPL case.
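Two-sided matching games of this kind are classically solved by Gale-Shapley deferred acceptance; the sketch below handles incomplete preference lists by letting either side reject unlisted partners (an illustrative textbook baseline with invented task/device names, not the paper's large-scale algorithm):

```python
def deferred_acceptance(task_prefs, device_prefs):
    """Gale-Shapley with possibly incomplete preference lists: a task only
    proposes to devices it lists, and a device rejects any task it does not
    list, so some participants may stay unmatched."""
    rank = {d: {t: i for i, t in enumerate(p)} for d, p in device_prefs.items()}
    free = list(task_prefs)                # tasks still proposing
    next_idx = {t: 0 for t in task_prefs}  # next device each task proposes to
    engaged = {}                           # device -> task
    while free:
        t = free.pop()
        prefs = task_prefs[t]
        if next_idx[t] >= len(prefs):
            continue                       # exhausted its list: stays unmatched
        d = prefs[next_idx[t]]
        next_idx[t] += 1
        if t not in rank.get(d, {}):
            free.append(t)                 # d finds t unacceptable
        elif d not in engaged:
            engaged[d] = t
        elif rank[d][t] < rank[d][engaged[d]]:
            free.append(engaged[d])        # d trades up to a preferred task
            engaged[d] = t
        else:
            free.append(t)                 # d rejects the proposal
    return engaged

tasks   = {"t1": ["d1", "d2"], "t2": ["d1"]}
devices = {"d1": ["t2", "t1"], "d2": ["t1"]}
print(deferred_acceptance(tasks, devices))  # → {'d1': 't2', 'd2': 't1'}
```

The result is stable with respect to the listed preferences: no task-device pair that both list would prefer each other over their assigned partners.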

Journal ArticleDOI
TL;DR: In this paper, a Representation Invariance Loss (RIL) is proposed to optimize bounding box regression for rotating objects; it treats multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima.
Abstract: Arbitrary-oriented objects exist widely in natural scenes, and thus oriented object detection has received extensive attention in recent years. Mainstream rotation detectors use oriented bounding boxes (OBB) or quadrilateral bounding boxes (QBB) to represent rotating objects. However, these methods suffer from representation ambiguity in the oriented object definition, which leads to suboptimal regression optimization and inconsistency between the loss metric and the localization accuracy of the predictions. In this paper, we propose a Representation Invariance Loss (RIL) to optimize bounding box regression for rotating objects. Specifically, RIL treats multiple representations of an oriented object as multiple equivalent local minima, and hence transforms bounding box regression into an adaptive matching process with these local minima. Then, the Hungarian matching algorithm is adopted to obtain the optimal regression strategy. We also propose a normalized rotation loss to alleviate the weak correlation between different variables and their unbalanced loss contributions in the OBB representation. Extensive experiments on remote sensing datasets and scene text datasets show that our method achieves consistent and substantial improvement. The source code and trained models are available at this https URL.
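The Hungarian matching step assigns predictions to equivalent target representations at minimum total cost; for tiny problems an exhaustive search over permutations returns the same optimum, as this sketch shows (toy cost matrix invented for illustration, not the paper's regression losses):

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Optimal one-to-one assignment between predictions and targets.
    The Hungarian algorithm solves this in O(n^3); for tiny n an exhaustive
    search over permutations yields the identical optimum."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best, best_perm

# cost[i][j]: hypothetical regression loss of prediction i against target j
cost = [[4.0, 1.0, 3.0],
        [2.0, 0.0, 5.0],
        [3.0, 2.0, 2.0]]
print(min_cost_assignment(cost))  # → (5.0, (1, 0, 2))
```

Note that the greedy per-row choice (each prediction taking its own cheapest target) would collide on target 1; the global assignment avoids that, which is the point of using Hungarian matching in the loss.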

Journal ArticleDOI
TL;DR: In this paper, the authors used rough-fuzzy number and structural entropy weighting method to perform a coupling analysis on all service activities in the generalized growth scheme set, and to merge redundant service activities.
Abstract: Maximizing the residual value of retired products and reducing process consumption and resource waste are vital for Generalized Growth-oriented Remanufacturing Services (GGRMS). Under GGRMS, the traditional product-oriented remanufacturing methods need to change: products in GGRMS should be divided into multiple parts to maximize the residual value of different parts. However, this increases the difficulty of resource matching for service activities. To improve the efficiency of resource matching, we first used rough-fuzzy numbers and the structural entropy weighting method to perform a coupling analysis on all service activities in the generalized growth scheme set and to merge redundant service activities. We then considered the interests of both the service providers and integrators, added flexible impact factors to establish a service resource optimization configuration model, and solved it with the Non-Dominated Sorting Genetic Algorithm (NSGA-II). Finally, using a retired manual gearbox as an experiment, we optimized the service resource allocation for its generalized growth scheme set. The experimental results showed that the overall matching efficiency increased by 74.56% after merging redundant service activities, indicating that the proposed method is suitable for the resource allocation of generalized growth for complex single mechanical products and can offer guidelines for the development of RMS.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a learning-to-match (LTM) method to break the anchor intersection-over-union (IoU) restriction, allowing objects to match anchors in a flexible manner.
Abstract: Modern CNN-based object detectors assign anchors to ground-truth objects under the restriction of object-anchor Intersection-over-Union (IoU). In this study, we propose a learning-to-match (LTM) method to break the IoU restriction, allowing objects to match anchors in a flexible manner. LTM updates hand-crafted anchor assignment to "free" anchor matching by formulating detector training in the Maximum Likelihood Estimation (MLE) framework. During the training phase, LTM is implemented by converting the detection likelihood to anchor matching loss functions which are plug-and-play. Minimizing the matching loss functions drives learning and selecting the features which best explain a class of objects with respect to both classification and localization. LTM is extended from anchor-based detectors to anchor-free detectors, validating the general applicability of the learnable object-feature matching mechanism for visual object detection. Experiments on the MS COCO dataset demonstrate that LTM detectors consistently outperform their counterparts by significant margins. Last but not least, LTM requires negligible computational cost in both the training and inference phases, as it does not involve any additional architecture or parameters. Code has been made publicly available.
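The hand-crafted IoU assignment that LTM replaces can be sketched directly (toy axis-aligned boxes; real detectors also add per-anchor ignore ranges and max-IoU fallbacks for otherwise unmatched objects):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def assign_anchors(anchors, gt, thresh=0.5):
    """Classic rule: every anchor whose IoU with the ground-truth box
    clears a fixed threshold becomes a positive sample."""
    return [i for i, anc in enumerate(anchors) if iou(anc, gt) >= thresh]

anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
gt = (0, 0, 10, 10)
print(assign_anchors(anchors, gt))  # → [0]
```

The rigid threshold is exactly what makes this rule brittle (a slanted or slender object may have no anchor above 0.5), motivating LTM's learned, likelihood-based alternative.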

Journal ArticleDOI
TL;DR: It is shown that a P2P market driven by electrical distance leads to reduced losses and line congestion in both mechanisms and is able to capture the high market efficiency of the CDA.

Journal ArticleDOI
TL;DR: An intermediary’s problem of dynamically matching demand and supply of heterogeneous types in a periodic-review fashion is considered, which involves two disjoint sets of types.
Abstract: Problem definition: We consider an intermediary’s problem of dynamically matching demand and supply of heterogeneous types in a periodic-review fashion. Specifically, there are two disjoint sets of...

Journal ArticleDOI
TL;DR: In this paper, a matching decision method for manufacturing service resources is proposed based on multidimensional information fusion, where the information entropy and rough set theory are applied to classify the importance of manufacturing service tasks, while the matching capability are analyzed by using a hybrid collaborative filtering algorithm.
Abstract: With the development of specialization, coordination, and intelligence in the manufacturing service process, how to quickly extract potential resources or capabilities for distributed manufacturing service requirements, and how to carry out resource matching for manufacturing service requirements with correlated mapping characteristics, have become critical issues to be addressed in the cloud manufacturing environment. Combining the characteristics of relevance, synergy, and diversity of manufacturing service tasks on an intelligent cloud platform, a matching decision method for manufacturing service resources based on multidimensional information fusion is proposed in this paper. On the basis of integrating multidimensional information on cloud manufacturing resources, information entropy and rough set theory are applied to classify the importance of manufacturing service tasks, while matching capability is analyzed using a hybrid collaborative filtering (HCF) algorithm. Then, information on function attributes, reliability, and preference is employed to actively match and push manufacturing service resources or capabilities, so as to realize matching decisions for manufacturing service resources with precise quality, stable service, and maximum efficiency. Finally, a case study of resource matching decisions for body & chassis manufacturing services in a new energy automobile enterprise is presented, in which the experimental results show that the proposed approach is more accurate and effective than other recommendation algorithms.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the pricing problem faced by a revenue-maximizing platform matching price-sensitive customers to flexible supply units within a geographic area, which can be interpreted as the problem f...
Abstract: We consider the pricing problem faced by a revenue-maximizing platform matching price-sensitive customers to flexible supply units within a geographic area. This can be interpreted as the problem f...

Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, the authors introduce a Hough transform perspective on convolutional matching and propose an effective geometric matching algorithm, dubbed Convolutional Hough Matching (CHM).
Abstract: Despite advances in feature representation, leveraging geometric relations is crucial for establishing reliable visual correspondences under large variations of images. In this work we introduce a Hough transform perspective on convolutional matching and propose an effective geometric matching algorithm, dubbed Convolutional Hough Matching (CHM). The method distributes similarities of candidate matches over a geometric transformation space and evaluates them in a convolutional manner. We cast it into a trainable neural layer with a semi-isotropic high-dimensional kernel, which learns non-rigid matching with a small number of interpretable parameters. To validate the effect, we develop a neural network with CHM layers that performs convolutional matching in the space of translation and scaling. Our method sets a new state of the art on standard benchmarks for semantic visual correspondence, proving its strong robustness to challenging intra-class variations.

Journal ArticleDOI
TL;DR: In this paper, a survey of neural networks for entity matching is presented, identifying which steps of the entity matching process existing work have targeted using neural networks, and providing an overview of the different techniques used at each step.
Abstract: Entity matching is the problem of identifying which records refer to the same real-world entity. It has been actively researched for decades, and a variety of different approaches have been developed. Even today, it remains a challenging problem, and there is still ample room for improvement. In recent years, we have seen new methods emerge based upon deep learning techniques for natural language processing. In this survey, we present how neural networks have been used for entity matching. Specifically, we identify which steps of the entity matching process existing work has targeted using neural networks, and provide an overview of the different techniques used at each step. We also discuss the contributions of deep learning to entity matching compared to traditional methods, and propose a taxonomy of deep neural networks for entity matching.
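The pairwise comparison step at the heart of entity matching can be sketched in a few lines. The survey covers learned neural representations; the sketch below substitutes a character-trigram bag with cosine similarity as a crude stand-in for an embedding model, so the function names, the trigram trick, and the threshold are all illustrative assumptions rather than any surveyed method:

```python
import math
from collections import Counter

def trigrams(s):
    """Bag of padded character trigrams (a stand-in for a learned
    record embedding)."""
    s = f"  {s.lower()}  "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match(rec1, rec2, threshold=0.5):
    """Pairwise entity-matching step: score a record pair by similarity
    and classify it as match / non-match against a threshold. Neural
    approaches replace the similarity (and often the decision) with
    learned components."""
    score = cosine(trigrams(rec1), trigrams(rec2))
    return score >= threshold, score
```

In a full pipeline this comparison runs only on candidate pairs produced by a blocking step, which is one of the pipeline stages the survey examines for neural replacements.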

Proceedings Article
01 Jan 2021
TL;DR: In this article, the authors introduce six novel matching operators from the perspective of feature fusion instead of explicit similarity learning, namely Concatenation, Pointwise-Addition, Pairwise-Relation, FiLM, Simple-Transformer and Transductive-Guidance, to explore more feasibility on matching operator selection.
Abstract: Siamese tracking has achieved groundbreaking performance in recent years, where the essence is the efficient matching operator cross-correlation and its variants. Despite this remarkable success, it is important to note that the heuristic matching network design relies heavily on expert experience. Moreover, we experimentally find that a single matching operator cannot guarantee stable tracking in all challenging environments. Thus, in this work, we introduce six novel matching operators from the perspective of feature fusion instead of explicit similarity learning, namely Concatenation, Pointwise-Addition, Pairwise-Relation, FiLM, Simple-Transformer and Transductive-Guidance, to explore more possibilities for matching operator selection. The analyses reveal these operators' selective adaptability to different environment degradation types, which inspires us to combine them to explore complementary features. To this end, we propose binary channel manipulation (BCM) to search for the optimal combination of these operators. BCM decides whether to retain or discard an operator by learning its contribution to other tracking steps. By inserting the learned matching networks into the strong baseline tracker Ocean, our model achieves favorable gains of $67.2 \rightarrow 71.4$, $52.6 \rightarrow 58.3$, and $70.3 \rightarrow 76.0$ success on OTB100, LaSOT, and TrackingNet, respectively. Notably, our tracker, dubbed AutoMatch, uses less than half the training data/time of the baseline tracker, and runs at 50 FPS using PyTorch. Code and model will be released at this https URL.
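Two of the named fusion-style operators, Concatenation and Pointwise-Addition, are simple enough to sketch directly. This is only a shape-level illustration of the fusion idea (template features broadcast over the search region), not the paper's implementation; in the actual tracker a learned head consumes the fused tensor:

```python
import numpy as np

def pointwise_addition(template, search):
    """Fuse a pooled template feature (C, 1, 1) with search-region
    features (C, H, W) by broadcast elementwise addition."""
    return search + template

def concatenation(template, search):
    """Fuse by tiling the template over all search positions and
    stacking along channels, giving (2C, H, W); a learned head
    (omitted here) would map the doubled channels to match scores."""
    tiled = np.broadcast_to(template, search.shape)
    return np.concatenate([tiled, search], axis=0)
```

Both operators produce a dense fused map rather than an explicit similarity score, which is the "feature fusion instead of explicit similarity learning" perspective the abstract describes.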

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Patch2Pix as discussed by the authors proposes a new perspective to estimate correspondences in a detect-to-refine manner, where they first predict patch-level match proposals and then refine them.
Abstract: The classical matching pipeline used for visual localization typically involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection. Recently emerged correspondence networks propose to perform those steps inside a single network but suffer from low matching resolution due to the memory bottleneck. In this work, we propose a new perspective to estimate correspondences in a detect-to-refine manner, where we first predict patch-level match proposals and then refine them. We present Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals and jointly rejecting outlier matches with confidence scores. Patch2Pix is weakly supervised to learn correspondences that are consistent with the epipolar geometry of an input image pair. We show that our refinement network significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks. In addition, we show that our learned refinement generalizes to fully-supervised methods without retraining, which leads to state-of-the-art localization performance. The code is available at https://github.com/GrumpyZhou/patch2pix.
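The detect-to-refine idea (a coarse patch-level proposal narrowed to a pixel-level match) can be illustrated with a classical local search. Patch2Pix regresses the refinement with a network and jointly predicts confidence; the sketch below instead exhaustively minimizes SSD in a small window, purely to show the coarse-to-fine structure, and all names are illustrative:

```python
import numpy as np

def refine_match(img_a, img_b, pa, pb, patch=3, search=4):
    """Detect-to-refine sketch: given a coarse patch-level match
    (pa in img_a, pb in img_b), scan a small window around pb and
    return the pixel-level position minimizing sum-of-squared
    differences against the patch at pa. Assumes both points lie far
    enough from the image borders for the windows to fit."""
    r = patch
    ya, xa = pa
    ref = img_a[ya - r:ya + r + 1, xa - r:xa + r + 1]
    best, best_pt = np.inf, pb
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yb, xb = pb[0] + dy, pb[1] + dx
            cand = img_b[yb - r:yb + r + 1, xb - r:xb + r + 1]
            ssd = float(((ref - cand) ** 2).sum())
            if ssd < best:
                best, best_pt = ssd, (yb, xb)
    return best_pt, best
```

On a synthetically shifted image, a coarse proposal that is off by a pixel or two is snapped back to the exact pixel-level correspondence, which is the resolution gain the refinement stage is after.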