Showing papers on "Matching (statistics)" published in 2020


Book ChapterDOI
Yi Li1, Gu Wang1, Xiangyang Ji1, Yu Xiang2, Dieter Fox2 
01 Mar 2020
TL;DR: A novel deep neural network for 6D pose matching named DeepIM is proposed that is able to iteratively refine the pose by matching the rendered image against the observed image.
Abstract: Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the input image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.

340 citations


Proceedings ArticleDOI
Ming Zhong1, Pengfei Liu1, Yiran Chen1, Danqing Wang1, Xipeng Qiu1, Xuanjing Huang1 
01 Jul 2020
TL;DR: This paper formulates the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries are matched in a semantic space.
Abstract: This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences, we formulate the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries (extracted from the original text) will be matched in a semantic space. Notably, this paradigm shift to a semantic matching framework is well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors based on the property of the dataset. Besides, even instantiating the framework with a simple form of a matching model, we have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1). Experiments on the other five datasets also show the effectiveness of the matching framework. We believe the power of this matching-based summarization framework has not been fully exploited. To encourage more instantiations in the future, we have released our codes, processed dataset, as well as generated summaries at https://github.com/maszhongming/MatchSum.

317 citations
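
The summary-level matching idea in the entry above can be sketched in a few lines. This is only an illustration under strong assumptions: a bag-of-words vector stands in for the learned semantic space, candidates are all k-sentence combinations, and the names `embed`, `cosine`, and `best_summary` are hypothetical rather than taken from the MatchSum code.

```python
import math
from collections import Counter
from itertools import combinations


def embed(text):
    """Toy stand-in for a semantic encoder: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_summary(sentences, k=2):
    """Score whole candidate summaries against the document, not single sentences."""
    doc = embed(" ".join(sentences))
    candidates = [" ".join(combo) for combo in combinations(sentences, k)]
    return max(candidates, key=lambda cand: cosine(embed(cand), doc))


if __name__ == "__main__":
    doc_sentences = [
        "the cat sat on the mat",
        "stocks fell sharply today",
        "the cat chased the mouse",
    ]
    print(best_summary(doc_sentences, k=2))
```

The point of the sketch is the unit of comparison: each candidate is scored as a whole against the document, which is what distinguishes summary-level matching from scoring sentences one at a time.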


Journal ArticleDOI
TL;DR: In this paper, a discussion of the sustainability and travel behavior impacts of ride-hailing is provided, based on an extensive literature review of studies from both developed and developing countries.
Abstract: A discussion of the sustainability and travel behaviour impacts of ride-hailing is provided, based on an extensive literature review of studies from both developed and developing countries. The effects of ride-hailing on vehicle-kilometres travelled (VKT) and traffic externalities such as congestion, pollution and crashes are analysed. Modal substitution, user characterisation and induced travel outputs are also examined. A summary of findings follows. On the one hand, ride-hailing improves the comfort and security of riders for several types of trips and increases mobility for car-free households and for people with physical and cognitive limitations. Ride-hailing has the potential to be more efficient for rider-driver matching than street-hailing. Ride-hailing is expected to reduce parking requirements, shifting attention towards curb management. On the other hand, results on the degree of complementarity and substitution between ride-hailing and public transport and on the impact of ride-hailing on VKT are mixed; however, there is a tendency from studies with updated data to show that the ride-hailing substitution effect of public transport is stronger than the complementarity effect in several cities and that ride-hailing has incremented motorised traffic and congestion. Early evidence on the impact of ride-hailing on the environment and energy consumption is also concerning. A longer-term assessment must estimate the ride-hailing effect on car ownership. A social welfare analysis that accounts for both the benefits and costs of ride-hailing remains unexplored. The relevance of shared rides in a scenario with mobility-as-a-service subscription packages and automated vehicles is also highlighted.

181 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work designs a novel cross-attention mechanism, which is able to exploit not only the intra-modality relationship within each modality, but also the inter-modality relationship between image regions and sentence words to complement and enhance each other for image and sentence matching.
Abstract: The key to image and sentence matching is to accurately measure the visual-semantic similarity between an image and a sentence. However, most existing methods make use of only the intra-modality relationship within each modality or the inter-modality relationship between image regions and sentence words for the cross-modal matching task. Different from them, in this work, we propose a novel MultiModality Cross Attention (MMCA) Network for image and sentence matching by jointly modeling the intra-modality and inter-modality relationships of image regions and sentence words in a unified deep model. In the proposed MMCA, we design a novel cross-attention mechanism, which is able to exploit not only the intra-modality relationship within each modality, but also the inter-modality relationship between image regions and sentence words to complement and enhance each other for image and sentence matching. Extensive experimental results on two standard benchmarks including Flickr30K and MS-COCO demonstrate that the proposed model performs favorably against state-of-the-art image and sentence matching methods.

180 citations


Journal ArticleDOI
01 Sep 2020
TL;DR: This paper proposes Ditto, a novel entity matching (EM) system based on pre-trained Transformer-based language models, which are fine-tuned by casting EM as a sequence-pair classification problem so that such models can be leveraged with a simple architecture.
Abstract: We present Ditto, a novel entity matching (EM) system based on pre-trained Transformer-based language models. We fine-tune and cast EM as a sequence-pair classification problem to leverage such models with a simple architecture. Our experiments show that a straightforward application of language models such as BERT, DistilBERT, or RoBERTa pre-trained on large text corpora already significantly improves the matching quality and outperforms previous state-of-the-art (SOTA), by up to 29% of F1 score on benchmark datasets. We also developed three optimization techniques to further improve Ditto's matching capability. Ditto allows domain knowledge to be injected by highlighting important pieces of input information that may be of interest when making matching decisions. Ditto also summarizes strings that are too long so that only the essential information is retained and used for EM. Finally, Ditto adapts a SOTA technique on data augmentation for text to EM to augment the training data with (difficult) examples. This way, Ditto is forced to learn "harder" to improve the model's matching capability. The optimizations we developed further boost the performance of Ditto by up to 9.8%. Perhaps more surprisingly, we establish that Ditto can achieve the previous SOTA results with at most half the amount of labeled data. Finally, we demonstrate Ditto's effectiveness on a real-world large-scale EM task. On matching two company datasets consisting of 789K and 412K records, Ditto achieves a high F1 score of 96.5%.

175 citations


Proceedings Article
26 Oct 2020
TL;DR: This paper proposes the first end-to-end hierarchical NAS framework for deep stereo matching by incorporating task-specific human knowledge into the neural architecture search framework and optimizing the architectures of the entire pipeline jointly.
Abstract: To reduce the human effort in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation. The underlying idea for the NAS algorithm is straightforward: by enabling the network to choose among a set of operations (e.g., convolution with different filter sizes), one is able to find an optimal architecture that is better adapted to the problem at hand. However, so far the success of NAS has not been enjoyed by low-level geometric vision tasks such as stereo matching. This is partly due to the fact that state-of-the-art deep stereo matching networks, designed by humans, are already very large. Directly applying NAS to such massive structures is computationally prohibitive based on the currently available mainstream computing resources. In this paper, we propose the first end-to-end hierarchical NAS framework for deep stereo matching by incorporating task-specific human knowledge into the neural architecture search framework. Specifically, following the gold standard pipeline for deep stereo matching (i.e., feature extraction, feature volume construction, and dense matching), we optimize the architectures of the entire pipeline jointly. Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures and ranks first in accuracy on the KITTI stereo 2012, 2015 and Middlebury benchmarks, as well as first on the SceneFlow dataset, with a substantial improvement in the size of the network and the speed of inference. The code is available at this https URL.

166 citations


Journal ArticleDOI
TL;DR: The proposed CASC is a joint framework that performs cross-modal attention for local alignment and multilabel prediction for global semantic consistence and directly extracts semantic labels from available sentence corpus without additional labor cost, which provides a global similarity constraint for the aggregated region-word similarity obtained by the local alignment.
Abstract: The task of image–text matching refers to measuring the visual-semantic similarity between an image and a sentence. Recently, the fine-grained matching methods that explore the local alignment between the image regions and the sentence words have shown advances in inferring the image–text correspondence by aggregating pairwise region-word similarity. However, the local alignment is hard to achieve as some important image regions may be inaccurately detected or even missing. Meanwhile, some words with high-level semantics cannot strictly correspond to a single image region. To tackle these problems, we address the importance of exploiting the global semantic consistence between image regions and sentence words as complementary for the local alignment. In this article, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistency (CASC) for image–text matching. The proposed CASC is a joint framework that performs cross-modal attention for local alignment and multilabel prediction for global semantic consistence. It directly extracts semantic labels from available sentence corpus without additional labor cost, which further provides a global similarity constraint for the aggregated region-word similarity obtained by the local alignment. Extensive experiments on Flickr30k and Microsoft COCO (MSCOCO) data sets demonstrate the effectiveness of the proposed CASC on preserving global semantic consistence along with the local alignment and further show its superior image–text matching performance compared with more than 15 state-of-the-art methods.

152 citations


Journal ArticleDOI
25 Feb 2020
TL;DR: In a ride-sharing system, arriving customers must be matched with available drivers, and these decisions affect the overall number of customers matched because they impact whether future available drivers are matched.
Abstract: In a ride-sharing system, arriving customers must be matched with available drivers. These decisions affect the overall number of customers matched, because they impact whether future available dri...

142 citations


Journal ArticleDOI
TL;DR: The main idea is to adaptively cluster the putative matches into several motion-consistent clusters together with an outlier/mismatch cluster, using a DBSCAN variant customized for feature matching that achieves quasi-linear time complexity.
Abstract: This paper focuses on removing mismatches from given putative feature matches created typically based on descriptor similarity. To achieve this goal, existing attempts usually involve estimating the image transformation under a geometrical constraint, where a pre-defined transformation model is demanded. This severely limits the applicability, as the transformation could vary with different data and is complex and hard to model in many real-world tasks. From a novel perspective, this paper casts feature matching as a spatial clustering problem with outliers. The main idea is to adaptively cluster the putative matches into several motion-consistent clusters together with an outlier/mismatch cluster. To implement the spatial clustering, we customize the classic density-based spatial clustering of applications with noise (DBSCAN) method in the context of feature matching, which enables our approach to achieve quasi-linear time complexity. We also design an iterative clustering strategy to promote the matching performance in case of severely degraded data. Extensive experiments on several datasets involving different types of image transformations demonstrate the superiority of our approach over state-of-the-art alternatives. Our approach is also applied to near-duplicate image retrieval and co-segmentation and achieves promising performance.

125 citations
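
A minimal sketch of the clustering idea in the entry above: each putative match is reduced to its motion vector, and a plain DBSCAN groups motion-consistent matches while labelling the rest as mismatches. This is textbook DBSCAN with a brute-force neighbour search (quadratic time), not the customized quasi-linear variant the paper describes. The function name `dbscan_matches`, the parameters `eps` and `min_pts`, and the toy data are assumptions for illustration.

```python
import math

NOISE = -1  # label for matches treated as outliers/mismatches


def dbscan_matches(matches, eps=2.0, min_pts=3):
    """Cluster putative matches by motion vector; return one label per match.

    matches: list of ((x1, y1), (x2, y2)) point pairs.
    """
    # Clustering feature: displacement from the source point to the target point.
    motions = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in matches]
    n = len(motions)

    def neighbours(i):
        # Brute-force region query; the result includes i itself.
        return [j for j in range(n) if math.dist(motions[i], motions[j]) <= eps]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


if __name__ == "__main__":
    putative = [
        ((0, 0), (10, 0)),
        ((1, 1), (11, 1)),
        ((2, 0), (12, 0.5)),
        ((5, 5), (15, 5)),
        ((3, 3), (-20, 40)),  # inconsistent motion
    ]
    print(dbscan_matches(putative))  # [0, 0, 0, 0, -1]
```

No transformation model is specified anywhere in the sketch; consistency is judged only by density in motion space, which is the property the abstract emphasizes.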


Journal ArticleDOI
Youmin Zhang1, Yimin Chen1, Xiao Bai1, Suihanjin Yu1, Kun Yu, Zhiwei Li, Kuiyuan Yang 
03 Apr 2020
TL;DR: This paper proposes to directly add constraints to the cost volume by filtering it with a unimodal distribution peaked at the true disparities, and achieves state-of-the-art performance on Scene Flow and two KITTI stereo benchmarks.
Abstract: State-of-the-art deep learning based stereo matching approaches treat disparity estimation as a regression problem, where the loss function is directly defined on true disparities and their estimated ones. However, disparity is just a byproduct of a matching process modeled by the cost volume, while indirectly learning the cost volume driven by disparity regression is prone to overfitting since the cost volume is under-constrained. In this paper, we propose to directly add constraints to the cost volume by filtering the cost volume with a unimodal distribution peaked at the true disparities. In addition, variances of the unimodal distributions for each pixel are estimated to explicitly model matching uncertainty under different contexts. The proposed architecture achieves state-of-the-art performance on Scene Flow and two KITTI stereo benchmarks. In particular, our method ranked 1st in the KITTI 2012 evaluation and 4th in the KITTI 2015 evaluation (recorded on 2019.8.20). The code of AcfNet is available at: https://github.com/youmi-zym/AcfNet.

118 citations
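
To make "a unimodal distribution peaked at the true disparities" concrete, the sketch below builds one such target for a single pixel and compares a predicted distribution against it. The softmax over negative absolute distance, the per-pixel `sigma`, and the cross-entropy comparison are assumptions chosen for illustration; the abstract does not give the exact functional form, so this should not be read as the AcfNet implementation.

```python
import math


def unimodal_target(d_true, max_disp, sigma=1.0):
    """Distribution over disparities 0..max_disp-1 with a single peak at d_true.

    sigma controls sharpness: a smaller sigma gives a more confident, narrower peak.
    """
    logits = [-abs(d - d_true) / sigma for d in range(max_disp)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target and a predicted disparity distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


if __name__ == "__main__":
    target = unimodal_target(d_true=3.0, max_disp=8, sigma=1.0)
    uniform = [1.0 / 8] * 8
    print(cross_entropy(target, target), cross_entropy(target, uniform))
```

A loss of this kind constrains the whole per-pixel distribution, whereas a regression loss on the expected disparity alone can be satisfied by many differently shaped distributions.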


Proceedings Article
01 Jan 2020
TL;DR: In crowd counting, each training image contains multiple people, where each person is annotated by a dot. DM-Count uses Optimal Transport (OT) to measure the similarity between the normalized predicted density map and the normalized ground truth density map, and includes a Total Variation loss to stabilize OT computation.
Abstract: In crowd counting, each training image contains multiple people, where each person is annotated by a dot. Existing crowd counting methods need to use a Gaussian to smooth each annotated dot or to estimate the likelihood of every pixel given the annotated point. In this paper, we show that imposing Gaussians to annotations hurts generalization performance. Instead, we propose to use Distribution Matching for crowd COUNTing (DM-Count). In DM-Count, we use Optimal Transport (OT) to measure the similarity between the normalized predicted density map and the normalized ground truth density map. To stabilize OT computation, we include a Total Variation loss in our model. We show that the generalization error bound of DM-Count is tighter than that of the Gaussian smoothed methods. In terms of Mean Absolute Error, DM-Count outperforms the previous state-of-the-art methods by a large margin on two large-scale counting datasets, UCF-QNRF and NWPU, and achieves the state-of-the-art results on the ShanghaiTech and UCF-CC50 datasets. Notably, DM-Count ranked first on the leaderboard for the NWPU benchmark, reducing the error of the state-of-the-art published result by approximately 16%. Code is available at this https URL.

Posted ContentDOI
22 Jan 2020
TL;DR: DeepEnroll is a cross-modal inference learning model to jointly encode enrollment criteria and patients records into a shared latent space for matching inference, which outperformed the best baseline by up to 12.4% in average F1.
Abstract: Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment. The core problem of patient-trial matching is to find qualified patients for a trial, where patient information is stored in electronic health records (EHR) while trial eligibility criteria (EC) are described in text documents available on the web. How to represent longitudinal patient EHR? How to extract complex logical rules from EC? Most existing works rely on manual rule-based extraction, which is time consuming and inflexible for complex inference. To address these challenges, we proposed DeepEnroll, a cross-modal inference learning model to jointly encode enrollment criteria (text) and patient records (tabular data) into a shared latent space for matching inference. DeepEnroll applies a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to encode clinical trial information into sentence embeddings, and uses a hierarchical embedding model to represent patient longitudinal EHR. In addition, DeepEnroll is augmented by a numerical information embedding and entailment module to reason over numerical information in both EC and EHR. These encoders are trained jointly to optimize the patient-trial matching score. We evaluated DeepEnroll on the trial-patient matching task on real-world datasets. DeepEnroll outperformed the best baseline by up to 12.4% in average F1.

Journal ArticleDOI
TL;DR: This study proposes a model that delineates the online matching process in ride-sourcing markets and uses it to examine the impact of the matching time interval and matching radius on system performance and to jointly optimize the two variables under different levels of supply and demand.
Abstract: With the availability of the location information of drivers and passengers, ride-sourcing platforms can now provide increasingly efficient online matching compared with physical searching and meeting performed in the traditional taxi market. The matching time interval (the time interval over which waiting passengers and idle drivers are accumulated and then subjected to peer-to-peer matching) and matching radius (or maximum allowable pick-up distance, within which waiting passengers and idle drivers can be matched or paired) are two key control variables that a platform can employ to optimize system performance in an online matching system. By appropriately extending the matching time interval, the platform can accumulate large numbers of waiting (or unserved) passengers and idle drivers and thus match the two pools with a reduced expected pick-up distance. However, if the matching time interval is excessively long, certain passengers may become impatient and even abandon their requests. Meanwhile, a short matching radius can reduce the expected pick-up distance but may decrease the matching rate as well. Therefore, the matching time interval and matching radius should be optimized to enhance system efficiency in terms of passenger waiting time, vehicle utilization, and matching rate. This study proposes a model that delineates the online matching process in ride-sourcing markets. The model is then used to examine the impact of the matching time interval and matching radius on system performance and to jointly optimize the two variables under different levels of supply and demand. Numerical experiments are conducted to demonstrate how the proposed modeling and optimization approaches can improve the real-time matching of ride-sourcing platforms.
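
The two control variables in the entry above can be illustrated with a toy batch matcher: everyone accumulated during one matching time interval is passed in together, and only pairs within the matching radius are eligible. The greedy closest-pair rule, the function name `match_batch`, and the coordinates are assumptions for illustration; they are not the model proposed in the study.

```python
import math


def match_batch(passengers, drivers, radius):
    """Greedily pair waiting passengers and idle drivers within `radius`.

    passengers, drivers: dicts mapping an id to an (x, y) location, i.e. the
    pools accumulated over one matching time interval.
    Returns a list of (passenger_id, driver_id, pickup_distance).
    """
    # All feasible pairs, shortest pick-up distance first.
    candidates = sorted(
        (math.dist(p_loc, d_loc), p_id, d_id)
        for p_id, p_loc in passengers.items()
        for d_id, d_loc in drivers.items()
        if math.dist(p_loc, d_loc) <= radius
    )
    used_passengers, used_drivers, pairs = set(), set(), []
    for distance, p_id, d_id in candidates:
        if p_id not in used_passengers and d_id not in used_drivers:
            used_passengers.add(p_id)
            used_drivers.add(d_id)
            pairs.append((p_id, d_id, distance))
    return pairs


if __name__ == "__main__":
    waiting = {"p1": (0, 0), "p2": (5, 0)}
    idle = {"d1": (1, 0), "d2": (9, 0)}
    print(match_batch(waiting, idle, radius=2))  # only p1 is matched
    print(match_batch(waiting, idle, radius=5))  # both matched, longer pick-ups
```

With `radius=2` the mean pick-up distance is 1.0 but only one of two passengers is served; with `radius=5` both are served at a mean pick-up distance of 2.5, which is the trade-off between pick-up distance and matching rate that the abstract describes.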

Book ChapterDOI
Haoran Wang1, Ying Zhang2, Zhong Ji1, Yanwei Pang1, Lin Ma2 
17 Jul 2020
TL;DR: A Consensus-aware Visual-Semantic Embedding model is proposed to incorporate the consensus information, namely the commonsense knowledge shared between both modalities, into image-text matching and learns the associations and alignments between image and text based on the exploited consensus.
Abstract: Image-text matching plays a central role in bridging vision and language. Most existing approaches only rely on the image-text instance pair to learn their representations, thereby exploiting their matching relationships and making the corresponding alignments. Such approaches only exploit the superficial associations contained in the instance pairwise data, with no consideration of any external commonsense knowledge, which may hinder their capabilities to reason the higher-level relationships between image and text. In this paper, we propose a Consensus-aware Visual-Semantic Embedding (CVSE) model to incorporate the consensus information, namely the commonsense knowledge shared between both modalities, into image-text matching. Specifically, the consensus information is exploited by computing the statistical co-occurrence correlations between the semantic concepts from the image captioning corpus and deploying the constructed concept correlation graph to yield the consensus-aware concept (CAC) representations. Afterwards, CVSE learns the associations and alignments between image and text based on the exploited consensus as well as the instance-level representations for both modalities. Extensive experiments conducted on two public datasets verify that the exploited consensus makes significant contributions to constructing more meaningful visual-semantic embeddings, with the superior performances over the state-of-the-art approaches on the bidirectional image and text retrieval task. Our code of this paper is available at: https://github.com/BruceW91/CVSE.

Journal ArticleDOI
TL;DR: Propensity score matching (PSM) is a commonly used statistical method in orthopedic surgery research that accomplishes the removal of confounding bias from observational cohorts where the benefit of randomization is not possible.
Abstract: Propensity score matching (PSM) is a commonly used statistical method in orthopedic surgery research that accomplishes the removal of confounding bias from observational cohorts where the benefit of randomization is not possible. An alternative to multiple regression analysis, PSM attempts to reduce the effects of confounders by matching already treated subjects with control subjects who exhibit a similar propensity for treatment based on preexisting covariates that influence treatment selection. It, therefore, establishes a new control group by discarding outlier control subjects. This new control group reduces the unwanted influences of covariates, allowing for proper measurement of the intended variable. An example from orthopedic spine literature is discussed to illustrate how PSM may be applied in practice. PSM is uniquely valuable in its utility and simplicity, but it is limited in that it requires the removal of data and works primarily on binary treatments. In addition to matching, the propensity score can be used for stratification, covariate adjustments, and inverse probability of treatment weighting, but these topics are outside the scope of this paper. Personnel in the orthopedic field would benefit from learning about the function and application of this method given its common use in the orthopedic literature.
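
The matching step described above can be sketched as follows, assuming propensity scores have already been estimated for every subject. Greedy 1:1 nearest-neighbour matching without replacement and the `caliper` threshold are common choices assumed here for illustration; the abstract does not prescribe a specific matching rule, and the function name and scores are hypothetical.

```python
def match_on_propensity(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a subject id to its propensity score.
    Returns (pairs, discarded_controls); a treated subject with no control
    within `caliper` of its score is left unmatched.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        # Closest remaining control in terms of propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs, sorted(available)


if __name__ == "__main__":
    treated_scores = {"t1": 0.30, "t2": 0.62}
    control_scores = {"c1": 0.28, "c2": 0.60, "c3": 0.05, "c4": 0.95}
    print(match_on_propensity(treated_scores, control_scores))
```

In this toy run `c3` and `c4` are discarded because no treated subject has a similar propensity, which illustrates both the new control group and the data-removal limitation mentioned in the abstract.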

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work solves the problem of establishing dense correspondences across semantically similar images by converting the maximization problem to the optimal transport formulation and incorporating the staircase weights into optimal transport algorithm to act as empirical distributions.
Abstract: Establishing dense correspondences across semantically similar images is a challenging task. Due to the large intra-class variation and background clutter, two common issues occur in current approaches. First, many pixels in a source image are assigned to one target pixel, i.e., many-to-one matching. Second, some object pixels are assigned to the background pixels, i.e., background matching. We solve the first issue by global feature matching, which maximizes the total matching correlations between images to obtain a global optimal matching matrix. The row sum and column sum constraints are enforced on the matching matrix to induce a balanced solution, thus suppressing the many-to-one matching. We solve the second issue by applying a staircase function on the class activation maps to re-weight the importance of pixels into four levels from foreground to background. The whole procedure is combined into a unified optimal transport algorithm by converting the maximization problem to the optimal transport formulation and incorporating the staircase weights into the optimal transport algorithm to act as empirical distributions. The proposed algorithm achieves state-of-the-art performance on four benchmark datasets. Notably, a 26% relative improvement is achieved on the large-scale SPair-71k dataset.
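
The effect of the row-sum and column-sum constraints can be seen in a small sketch. Entropy-regularised Sinkhorn iterations are used here as one common way to solve such a transport problem; whether the paper uses this solver is not stated in the abstract, so treat the solver, the `cost = 1 - correlation` conversion, the `reg` value, and the uniform marginals (standing in for the staircase weights) as assumptions.

```python
import math


def sinkhorn(corr, mu, nu, reg=0.1, iters=200):
    """Approximate transport plan for a correlation matrix `corr`.

    corr[i][j]: matching correlation between source pixel i and target pixel j.
    mu, nu: source and target marginals (each sums to 1).
    Maximising total correlation is converted to minimising cost = 1 - corr.
    """
    n, m = len(mu), len(nu)
    kernel = [[math.exp(-(1.0 - corr[i][j]) / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to meet the marginal constraints.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    # Both source pixels correlate best with target pixel 0 (many-to-one).
    correlation = [[0.9, 0.1], [0.8, 0.2]]
    plan = sinkhorn(correlation, mu=[0.5, 0.5], nu=[0.5, 0.5])
    for row in plan:
        print([round(value, 3) for value in row])
```

A row-wise argmax on `correlation` sends both source pixels to target 0, while a row-wise argmax on the balanced plan sends source 1 to target 1, so the marginal constraints suppress the many-to-one assignment.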

Proceedings ArticleDOI
Dehong Gao1, Linbo Jin1, Ben Chen1, Minghui Qiu1, Peng Li1, Yi Wei1, Yi Hu1, Hao Wang1 
25 Jul 2020
TL;DR: Fashion matching requires much more attention to the fine-grained information in fashion images and texts, so this paper proposes FashionBERT, which leverages patches as image features and learns high-level representations of texts and images with a pre-trained BERT backbone.
Abstract: In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Different from matching in the general domain, fashion matching is required to pay much more attention to the fine-grained information in the fashion images and texts. Pioneer approaches detect regions of interest (RoIs) from images and use the RoI embeddings as image representations. In general, RoIs tend to represent the "object-level" information in the fashion images, while fashion texts are prone to describe more detailed information, e.g. styles, attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high-level representations of texts and images. Meanwhile, we propose an adaptive loss to trade off multitask learning in the FashionBERT modeling. Two tasks (i.e., text and image matching and cross-modal retrieval) are incorporated to evaluate FashionBERT. On the public dataset, experiments demonstrate that FashionBERT achieves significant performance improvements over the baseline and state-of-the-art approaches. In practice, FashionBERT is applied in a concrete cross-modal retrieval application. We provide the detailed matching performance and inference efficiency analysis.

Journal ArticleDOI
01 Jan 2020
TL;DR: The strategy of combining the generic models with online training is easily accepted and achieves higher levels of user satisfaction (as measured by subjective reports), which provides a valuable new strategy for improving the performance of P300-based BCI.
Abstract: P300-based brain-computer interfaces (BCIs) provide an additional communication channel for individuals with communication disabilities. In general, P300-based BCIs need to be trained, offline, for a considerable period of time, which causes users to become fatigued. This reduces the efficiency and performance of the system. In order to shorten calibration time and improve system performance, we introduce the concept of a generic model set. We used ERP data from 116 participants to train the generic model set. The resulting set consists of ten models, which are trained by weighted linear discriminant analysis (WLDA). Twelve new participants were then invited to test the validity of the generic model set. The results demonstrated that all new participants matched the best generic model. The resulting mean classification accuracy equaled 80% after online training, an accuracy that was broadly equivalent to the typical training model method. Moreover, the calibration time was shortened by 70.7% of the calibration time of the typical model method. In other words, the best matching model method only took 81s to calibrate, while the typical model method took 276s. There were also significant differences in both accuracy and raw bit rate between the best and the worst matching model methods. We conclude that the strategy of combining the generic models with online training is easily accepted and achieves higher levels of user satisfaction (as measured by subjective reports). Thus, we provide a valuable new strategy for improving the performance of P300-based BCI.

Book ChapterDOI
23 Aug 2020
TL;DR: Liao et al. formulate person image matching as finding local correspondences in feature maps, and construct query-adaptive convolution kernels on the fly to achieve local matching.
Abstract: For person re-identification, existing deep networks often focus on representation learning. However, without transfer learning, the learned model is fixed as is, which is not adaptable for handling various unseen scenarios. In this paper, beyond representation learning, we consider how to formulate person image matching directly in deep feature maps. We treat image matching as finding local correspondences in feature maps, and construct query-adaptive convolution kernels on the fly to achieve local matching. In this way, the matching process and results are interpretable, and this explicit matching is more generalizable than representation features to unseen scenarios, such as unknown misalignments, pose or viewpoint changes. To facilitate end-to-end training of this architecture, we further build a class memory module to cache feature maps of the most recent samples of each class, so as to compute image matching losses for metric learning. Through direct cross-dataset evaluation, the proposed Query-Adaptive Convolution (QAConv) method gains large improvements over popular learning methods (about 10%+ mAP), and achieves comparable results to many transfer learning methods. Besides, a model-free temporal cooccurrence based score weighting method called TLift is proposed, which improves the performance to a further extent, achieving state-of-the-art results in cross-dataset person re-identification. Code is available at https://github.com/ShengcaiLiao/QAConv.

Journal ArticleDOI
TL;DR: A method of text matching, topical inverse regression matching, that allows the analyst to match both on the topical content of confounding documents and the probability that each of these documents is treated is proposed.
Abstract: We identify situations in which conditioning on text can address confounding in observational studies. We argue that a matching approach is particularly well‐suited to this task, but existing matching methods are ill‐equipped to handle high‐dimensional text data. Our proposed solution is to estimate a low‐dimensional summary of the text and condition on this summary via matching. We propose a method of text matching, topical inverse regression matching, that allows the analyst to match both on the topical content of confounding documents and the probability that each of these documents is treated. We validate our approach and illustrate the importance of conditioning on text to address confounding with two applications: the effect of perceptions of author gender on citation counts in the international relations literature and the effects of censorship on Chinese social media users.

Journal ArticleDOI
TL;DR: In this article, a matching model of energy supply and demand is constructed from the matching relationship between energy supply and demand, so as to complete the matching of energy supply and demand of the integrated energy system in coastal areas.
Abstract: Zhao, X.; Gu, B.; Gao, F., and Chen, S., 2020. Matching model of energy supply and demand of the integrated energy system in coastal areas. In: Yang, Y.; Mi, C.; Zhao, L., and Lam, S. (eds.), Global Topics and New Trends in Coastal Research: Port, Coastal and Ocean Engineering. Journal of Coastal Research, Special Issue No. 103, pp. 983–989. Coconut Creek (Florida), ISSN 0749-0208. Due to the uncertainty of the selection range of the main equipment capacity of the distributed energy system in coastal areas, the matching ability of energy supply and demand is relatively low. From the two directions of “power by heat” and “heat by electricity”, the operation and output modes of energy in the system are studied; the selection range of the main equipment capacity of the distributed energy system is determined by calculating the load of energy supply and demand; according to the selection range, the necessary mapping conditions of the matching relationship between energy supply and demand are analyzed, and the matching model of energy supply and demand is constructed by using the matching relationship between energy supply and demand, so as to complete the matching of energy supply and demand of the integrated energy system in coastal areas. The experimental results show that the total energy output of the integrated energy system in coastal areas reaches 1867 kJ in unit time, but the proportion occupancy rate between the output nodes is the lowest, which has a good matching ability of energy supply and demand.

Journal ArticleDOI
TL;DR: A Multi-granularity Image-text Alignments (MIA) model is proposed to alleviate the cross-modal fine-grained problem for better similarity evaluation in description-based person Re-id and obtains the state-of-the-art performance on the CUHK-PEDES dataset.
Abstract: Description-based person re-identification (Re-id) is an important task in video surveillance that requires discriminative cross-modal representations to distinguish different people. It is difficult to directly measure the similarity between images and descriptions due to the modality heterogeneity (the cross-modal problem). Moreover, the fact that all samples belong to a single category (the fine-grained problem) makes this task even harder than the conventional image-description matching task. In this paper, we propose a Multi-granularity Image-text Alignments (MIA) model to alleviate the cross-modal fine-grained problem for better similarity evaluation in description-based person Re-id. Specifically, three different granularities, i.e., global-global, global-local and local-local alignments, are carried out hierarchically. Firstly, the global-global alignment in the Global Contrast (GC) module is for matching the global contexts of images and descriptions. Secondly, the global-local alignment employs the potential relations between local components and global contexts to highlight the distinguishable components while eliminating the uninvolved ones adaptively in the Relation-guided Global-local Alignment (RGA) module. Thirdly, as for the local-local alignment, we match visual human parts with noun phrases in the Bi-directional Fine-grained Matching (BFM) module. The whole network combining multiple granularities can be end-to-end trained without complex pre-processing. To address the difficulties in training the combination of multiple granularities, an effective step training strategy is proposed to train these granularities step-by-step. Extensive experiments and analysis have shown that our method obtains the state-of-the-art performance on the CUHK-PEDES dataset and outperforms the previous methods by a significant margin.

Journal ArticleDOI
TL;DR: Using a small example as an illustration, this article reviews multivariate matching from the perspective of a working scientist who wishes to make effective use of available methods.
Abstract: Using a small example as an illustration, this article reviews multivariate matching from the perspective of a working scientist who wishes to make effective use of available methods. The several g...

Journal ArticleDOI
TL;DR: Market thickness is a key parameter that can make or break a platform’s business model; thicker markets can offer more opportunities for participants to meet.
Abstract: Market thickness is a key parameter that can make or break a platform’s business model. Thicker markets can offer more opportunities for participants to meet and higher chances that a potential mat...

Journal ArticleDOI
TL;DR: This work constructs a new multi-modality Re-ID dataset, called SYSU-MM01, and designs a modality-gated node as a universal representation of both modality-specific and shared structures for constructing a structure-learnable feature extractor called Modality-Gated Extractor.
Abstract: Person re-identification (Re-ID) is an important problem in video surveillance for matching pedestrian images across non-overlapping camera views. Currently, most works focus on RGB-based Re-ID. However, RGB images are not well suited to a dark environment; consequently, infrared (IR) imaging becomes necessary for indoor scenes with low lighting and 24-h outdoor scene surveillance systems. In such scenarios, matching needs to be performed between RGB images and IR images, which exhibit different visual characteristics; this cross-modality matching problem is more challenging than RGB-based Re-ID due to the lack of visible colour information in IR images. To address this challenge, we study the RGB-IR cross-modality Re-ID (RGB-IR Re-ID) problem. Rather than applying existing cross-modality matching models that operate under the assumption of identical data distributions between training and testing sets to handle the discrepancy between RGB and IR modalities for Re-ID, we cast learning shared knowledge for cross-modality matching as the problem of cross-modality similarity preservation. We exploit same-modality similarity as the constraint to guide the learning of cross-modality similarity along with the alleviation of modality-specific information, and finally propose a Focal Modality-Aware Similarity-Preserving Loss. To further assist the feature extractor in extracting shared knowledge, we design a modality-gated node as a universal representation of both modality-specific and shared structures for constructing a structure-learnable feature extractor called Modality-Gated Extractor. For validation, we construct a new multi-modality Re-ID dataset, called SYSU-MM01, to enable wider study of this problem. Extensive experiments on this SYSU-MM01 dataset show the effectiveness of our method. Download link of dataset: https://github.com/wuancong/SYSU-MM01.

Journal ArticleDOI
TL;DR: In this paper, the authors show how to interpret two or more coefficients in a regression model as causal parameters under selection on observables and provide a framework for clearly delineating which effects are presumed to be identified and thus merit a causal interpretation.
Abstract: A common causal identification strategy in political science is selection on observables. This strategy assumes one observes a set of covariates that is, after statistical adjustment, sufficient to make treatment status as-if random. Under adjustment methods such as matching or inverse probability weighting, coefficients for control variables are treated as nuisance parameters and are not directly estimated. This is in direct contrast to regression approaches where estimated parameters are obtained for all covariates. Analysts often find it tempting to give a causal interpretation to all the parameters in such regression models—indeed, such interpretations are often central to the proposed research design. In this paper, we ask when we can justify interpreting two or more coefficients in a regression model as causal parameters. We demonstrate that analysts must appeal to causal identification assumptions to give estimates causal interpretations. Under selection on observables, this task is complicated by the fact that more than one causal effect might be identified. We show how causal graphs provide a framework for clearly delineating which effects are presumed to be identified and thus merit a causal interpretation, and which are not. We conclude with a set of recommendations for how researchers should interpret estimates from regression models when causal inference is the goal.
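
To make the contrast with regression concrete, the Python sketch below shows a normalized inverse probability weighting estimate of an average treatment effect. It is an illustration under stated assumptions, not the paper's method: the propensity scores are taken as given, the data are toy values, and `ipw_ate` is a hypothetical name. Note that no coefficient is produced for any control variable; the covariates enter only through the propensity score.

```python
def ipw_ate(treatment, outcome, propensity):
    """Normalized (Hajek-style) IPW estimate of the average treatment effect.

    treatment: 0/1 indicators; outcome: observed outcomes;
    propensity: assumed-known probabilities of treatment given covariates.
    """
    treated_num = treated_den = control_num = control_den = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        if t == 1:
            treated_num += y / e
            treated_den += 1.0 / e
        else:
            control_num += y / (1.0 - e)
            control_den += 1.0 / (1.0 - e)
    return treated_num / treated_den - control_num / control_den


# Toy data (made up for illustration): equal propensities, so the estimate
# reduces to a plain difference in group means.
treatment = [1, 1, 0, 0]
outcome = [5.0, 7.0, 3.0, 5.0]
propensity = [0.5, 0.5, 0.5, 0.5]
ate = ipw_ate(treatment, outcome, propensity)
```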

Journal ArticleDOI
TL;DR: This work combines the benefits of both approaches and, by breaking down the Motion Matching algorithm into its individual steps, shows how learned, scalable alternatives can be used to replace each operation in turn.
Abstract: In this paper we present a learned alternative to the Motion Matching algorithm which retains the positive properties of Motion Matching but additionally achieves the scalability of neural-network-based generative models. Although neural-network-based generative models for character animation are capable of learning expressive, compact controllers from vast amounts of animation data, methods such as Motion Matching still remain a popular choice in the games industry due to their flexibility, predictability, low preprocessing time, and visual quality - all properties which can sometimes be difficult to achieve with neural-network-based methods. Yet, unlike neural networks, the memory usage of such methods generally scales linearly with the amount of data used, resulting in a constant trade-off between the diversity of animation which can be produced and real world production budgets. In this work we combine the benefits of both approaches and, by breaking down the Motion Matching algorithm into its individual steps, show how learned, scalable alternatives can be used to replace each operation in turn. Our final model has no need to store animation data or additional matching meta-data in memory, meaning it scales as well as existing generative models. At the same time, we preserve the behavior of Motion Matching, retaining the quality, control, and quick iteration time which are so important in the industry.
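
For context, the search step of classic Motion Matching can be sketched as a nearest-neighbour lookup over per-frame feature vectors, which is why its memory use grows with the amount of animation data. The Python sketch below is a brute-force illustration only: the feature layout, the optional weights, and the name `motion_matching_search` are assumptions, and it does not reproduce the learned replacements the paper proposes.

```python
def motion_matching_search(database, query, weights=None):
    """Return (index, cost) of the database frame closest to the query.

    database: list of per-frame feature vectors (hypothetical layout).
    query: feature vector describing the desired pose and trajectory.
    weights: optional per-feature weights for the squared distance.
    """
    if weights is None:
        weights = [1.0] * len(query)
    best_index, best_cost = -1, float("inf")
    for index, features in enumerate(database):
        cost = sum(w * (f - q) ** 2 for w, f, q in zip(weights, features, query))
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index, best_cost


# Toy database of three frames with two features each (made-up values).
frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
best_frame, best_cost = motion_matching_search(frames, query=[0.9, 1.2])
```

Every frame's features must stay in memory for this search, which is the linear scaling the abstract contrasts with neural-network-based models.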

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A new training methodology is proposed which embeds the feature detector in a complete vision pipeline, and where the learnable parameters are trained in an end-to-end fashion, and demonstrates that the accuracy of a state-of-the-art learning-based feature detector can be increased when trained for the task it is supposed to solve at test time.
Abstract: We address a core problem of computer vision: Detection and description of 2D feature points for image matching. For a long time, hand-crafted designs, like the seminal SIFT algorithm, were unsurpassed in accuracy and efficiency. Recently, learned feature detectors emerged that implement detection and description using neural networks. Training these networks usually resorts to optimizing low-level matching scores, often pre-defining sets of image patches which should or should not match, or which should or should not contain key points. Unfortunately, increased accuracy for these low-level matching scores does not necessarily translate to better performance in high-level vision tasks. We propose a new training methodology which embeds the feature detector in a complete vision pipeline, and where the learnable parameters are trained in an end-to-end fashion. We overcome the discrete nature of key point selection and descriptor matching using principles from reinforcement learning. As an example, we address the task of relative pose estimation between a pair of images. We demonstrate that the accuracy of a state-of-the-art learning-based feature detector can be increased when trained for the task it is supposed to solve at test time. Our training methodology poses few restrictions on the task to learn, and works for any architecture which predicts key point heat maps, and descriptors for key point locations.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a simple model of dynamic matching in networked markets, where agents arrive and depart stochastically and the composition of the trade network depends endogenously on the matching algorithm.
Abstract: We introduce a simple model of dynamic matching in networked markets, where agents arrive and depart stochastically and the composition of the trade network depends endogenously on the matching alg...

Journal ArticleDOI
TL;DR: Future conservation impact evaluations could be improved by increased planning of evaluations alongside the intervention, better integration of qualitative methods, considering spillover effects at larger spatial scales, and more publication of preanalysis plans.
Abstract: The awareness of the need for robust impact evaluations in conservation is growing and statistical matching techniques are increasingly being used to assess the impacts of conservation interventions. Used appropriately, matching approaches are powerful tools, but they also pose potential pitfalls. We outlined important considerations and best practice when using matching in conservation science. We identified 3 steps in a matching analysis. First, develop a clear theory of change to inform selection of treatment and controls and that accounts for real-world complexities and potential spillover effects. Second, select the appropriate covariates and matching approach. Third, assess the quality of the matching by carrying out a series of checks. The second and third steps can be repeated and should be finalized before outcomes are explored. Future conservation impact evaluations could be improved by increased planning of evaluations alongside the intervention, better integration of qualitative methods, considering spillover effects at larger spatial scales, and more publication of preanalysis plans. Implementing these improvements will require more serious engagement of conservation scientists, practitioners, and funders to mainstream robust impact evaluations into conservation. We hope this article will improve the quality of evaluations and help direct future research to continue to improve the approaches on offer.
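
The second and third steps above (matching, then checking its quality) can be sketched in Python with a single covariate. This is a toy illustration under stated assumptions: nearest-neighbour matching with replacement and the standardized mean difference are common choices but are not named in the abstract, the covariate values are made up, and both function names are hypothetical. No outcome data appear, in line with finalizing the matching before outcomes are explored.

```python
import math
from statistics import mean, variance


def nearest_neighbor_match(treated, control):
    """For each treated unit, pick the closest control value (with replacement)."""
    return [min(control, key=lambda c: abs(c - t)) for t in treated]


def standardized_mean_difference(treated, control):
    """Difference in covariate means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Made-up covariate values (for example, distance to the nearest road in km).
treated_units = [5.0, 6.0, 7.0]
control_pool = [1.0, 2.0, 3.0, 5.2, 6.1, 7.3]

matched_controls = nearest_neighbor_match(treated_units, control_pool)
smd_before = standardized_mean_difference(treated_units, control_pool)
smd_after = standardized_mean_difference(treated_units, matched_controls)
```

If the imbalance after matching were still large, the covariates or the matching approach would be revised and the check repeated.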