scispace - formally typeset

Showing papers on "Matching (statistics)" published in 2004


Journal Article
TL;DR: In this article, the author reviews the state of the art in estimating average treatment effects under various sets of assumptions, including exogeneity, unconfoundedness, or selection on observables.
Abstract: Recently there has been a surge in econometric work focusing on estimating average treatment effects under various sets of assumptions. One strand of this literature has developed methods for estimating average treatment effects for a binary treatment under assumptions variously described as exogeneity, unconfoundedness, or selection on observables. The implication of these assumptions is that systematic (for example, average or distributional) differences in outcomes between treated and control units with the same values for the covariates are attributable to the treatment. Recent analysis has considered estimation and inference for average treatment effects under weaker assumptions than typical of the earlier literature by avoiding distributional and functional-form assumptions. Various methods of semiparametric estimation have been proposed, including estimating the unknown regression functions, matching, methods using the propensity score such as weighting and blocking, and combinations of these approaches. In this paper I review the state of this literature and discuss some of its unanswered questions, focusing in particular on the practical implementation of these methods, the plausibility of this exogeneity assumption in economic applications, the relative performance of the various semiparametric estimators when the key assumptions (unconfoundedness and overlap) are satisfied, alternative estimands such as quantile treatment effects, and alternate methods such as Bayesian inference.

2,370 citations
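A minimal Python sketch of one estimator family the review covers, weighting by the propensity score. It assumes the propensity scores e(x) = P(D = 1 | X = x) are already known or estimated and lie strictly between 0 and 1 (the overlap assumption); the function name `ipw_ate` is hypothetical, not from the paper.

```python
def ipw_ate(y, d, e):
    """Inverse-probability-weighted estimate of the average treatment effect.

    y: outcomes; d: 0/1 treatment indicators; e: propensity scores in (0, 1).
    """
    n = len(y)
    # treated units are up-weighted by 1/e, control units by 1/(1 - e)
    treated_mean = sum(di * yi / ei for yi, di, ei in zip(y, d, e)) / n
    control_mean = sum((1 - di) * yi / (1 - ei) for yi, di, ei in zip(y, d, e)) / n
    return treated_mean - control_mean


print(ipw_ate([3, 1, 4, 2], [1, 0, 1, 0], [0.5] * 4))
```

With constant scores of 0.5 the estimate reduces to a plain difference in means (here 2.0); with fitted scores, normalizing the weights within each group is a common refinement for stability.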


Journal Article
TL;DR: In this article, the authors present an implementation of matching estimators for average treatment effects in Stata, which allows users to estimate the average effect for all units or only for the treated or control units; to choose the number of matches; to specify the distance metric; to select a bias adjustment; and to use heteroskedastic-robust variance estimators.
Abstract: This paper presents an implementation of matching estimators for average treatment effects in Stata. The nnmatch command allows you to estimate the average effect for all units or only for the treated or control units; to choose the number of matches; to specify the distance metric; to select a bias adjustment; and to use heteroskedastic-robust variance estimators.

1,371 citations
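The basic estimator behind nnmatch can be illustrated, in simplified form, by the Python sketch below. It is not nnmatch's code and omits the bias adjustment and variance estimation; the distance (Euclidean after scaling each covariate by its inverse sample variance), the tie handling (ties resolved by position rather than averaged), and the name `nn_match_ate` are assumptions for illustration.

```python
import statistics


def nn_match_ate(X, d, y, m=1):
    """Nearest-neighbour matching estimate of the ATE (no bias adjustment).

    X: list of covariate vectors; d: 0/1 treatment; y: outcomes;
    m: number of matches per unit, drawn from the opposite treatment group.
    """
    k = len(X[0])
    # diagonal inverse-variance weights (fails if a covariate is constant)
    inv_var = [1.0 / statistics.pvariance([row[j] for row in X]) for j in range(k)]

    def dist(a, b):
        return sum(w * (ai - bi) ** 2 for w, ai, bi in zip(inv_var, a, b))

    effects = []
    for xi, di, yi in zip(X, d, y):
        pool = [j for j in range(len(X)) if d[j] != di]
        # sorted() is stable, so equally distant units are taken in index order
        nearest = sorted(pool, key=lambda j: dist(xi, X[j]))[:m]
        imputed = sum(y[j] for j in nearest) / len(nearest)
        # unit-level effect is always (treated outcome) - (control outcome)
        effects.append(yi - imputed if di == 1 else imputed - yi)
    return sum(effects) / len(effects)
```

Each unit's missing potential outcome is imputed from its m closest units in the other group, so both treated and control units contribute to the average effect for all units.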


Journal Article
TL;DR: In this paper, the effects of college quality are estimated with data from the National Longitudinal Survey of Youth 1979 cohort, using both regression-based and propensity score matching methods.

750 citations


Journal Article
TL;DR: This paper discusses the Rosenbaum bounds approach, which allows the analyst to determine how strongly an unmeasured confounding variable must affect selection into treatment in order to undermine the conclusions about causal effects from a matching analysis.
Abstract: Propensity score matching provides an estimate of the effect of a “treatment” variable on an outcome variable that is largely free of bias arising from an association between treatment status and observable variables. However, matching methods are not robust against “hidden bias” arising from unobserved variables that simultaneously affect assignment to treatment and the outcome variable. One strategy for addressing this problem is the Rosenbaum bounds approach, which allows the analyst to determine how strongly an unmeasured confounding variable must affect selection into treatment in order to undermine the conclusions about causal effects from a matching analysis. Instrumental variables (IV) estimation provides an alternative strategy for the estimation of causal effects, but the method typically reduces the precision of the estimate and has an additional source of uncertainty that derives from the untestable nature of the assumptions of the IV approach. A method of assessing this additional uncertainty...

685 citations
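The simplest instance of the Rosenbaum bounds approach is the sign test on matched pairs: if hidden bias can change the odds of treatment within a pair by at most a factor Γ, the number of pairs in which the treated unit has the larger outcome is stochastically bounded by a Binomial(n, Γ/(1 + Γ)) count, which gives an upper bound on the one-sided p-value. The sketch below computes that bound; the paper may use other test statistics (for example, the signed-rank statistic), and the function name is hypothetical.

```python
from math import comb


def rosenbaum_sign_test_upper_p(n_pairs, t_plus, gamma):
    """Upper bound on the one-sided sign-test p-value under hidden bias gamma.

    n_pairs: matched pairs with a non-tied outcome difference.
    t_plus: pairs in which the treated unit has the larger outcome.
    gamma: bound (>= 1) on the within-pair odds ratio of treatment assignment.
    """
    p_plus = gamma / (1.0 + gamma)  # worst-case chance the treated unit "wins"
    # upper tail P(T >= t_plus) for T ~ Binomial(n_pairs, p_plus)
    return sum(comb(n_pairs, k) * p_plus ** k * (1 - p_plus) ** (n_pairs - k)
               for k in range(t_plus, n_pairs + 1))


for g in (1.0, 1.5, 2.0):
    print(g, rosenbaum_sign_test_upper_p(10, 9, g))
```

At Γ = 1 this is the ordinary sign test; raising Γ until the bound exceeds the chosen significance level shows how much hidden bias the conclusion can tolerate.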


Journal Article
TL;DR: In this article, the authors examine the roles played by the propensity score (the probability of selection into treatment) in matching, instrumental variable, and control function methods and contrast the roles of exclusion restrictions in matching and selection models.
Abstract: This paper investigates four topics. (1) It examines the different roles played by the propensity score (the probability of selection into treatment) in matching, instrumental variable, and control function methods. (2) It contrasts the roles of exclusion restrictions in matching and selection models. (3) It characterizes the sensitivity of matching to the choice of conditioning variables and demonstrates the greater robustness of control function methods to misspecification of the conditioning variables. (4) It demonstrates the problem of choosing the conditioning variables in matching and the failure of conventional model selection criteria when candidate conditioning variables are not exogenous in a sense defined in this paper.

631 citations


Journal Article
TL;DR: In this paper, the authors evaluate the performance of full matching for the first time, modifying it in order to minimize variance as well as bias and then using it to compare coached and uncoached takers of the SAT.
Abstract: Among matching techniques for observational studies, full matching is in principle the best, in the sense that its alignment of comparable treated and control subjects is as good as that of any alternate method, and potentially much better. This article evaluates the practical performance of full matching for the first time, modifying it in order to minimize variance as well as bias and then using it to compare coached and uncoached takers of the SAT. In this new version, with restrictions on the ratio of treated subjects to controls within matched sets, full matching makes use of many more observations than does pair matching, but achieves far closer matches than does matching with k ≥ 2 controls. Prior to matching, the coached and uncoached groups are separated on the propensity score by 1.1 SDs. Full matching reduces this separation to 1% or 2% of an SD. In older literature comparing matching and regression, Cochran expressed doubts that any method of adjustment could substantially reduce observed bias ...

537 citations
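Separations such as "1.1 SDs" on the propensity score are usually reported as a standardized difference: the difference in group means divided by a standard deviation. The sketch below pools the two group variances; that convention and the function name are assumptions, since the abstract does not say which SD is used, and the within-matched-set weighting needed to assess balance after full matching is not shown.

```python
import statistics


def standardized_difference(treated, control):
    """Difference in group means, in units of a pooled standard deviation.

    Pooling = square root of the average of the two sample variances
    (an assumed convention; some authors use the treated-group SD alone).
    """
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd
```

Applied to estimated propensity scores of the two groups, a value near 1.1 before matching and near 0.01 to 0.02 after matching would correspond to the reduction the abstract reports.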


Proceedings Article
13 Jun 2004
TL;DR: The iMAP system is described, which semi-automatically discovers both 1-1 and complex matches, and introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly.
Abstract: Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate). We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or infinite match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.

420 citations
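The idea of searching for a complex match such as address = concat(city, state) can be illustrated with a toy searcher that scores every ordered pair of source columns by how many of their row-wise concatenations appear among the target column's values. This is a hypothetical sketch, not one of iMAP's searchers, which also exploit domain knowledge and cope with far larger match spaces.

```python
def best_concat_match(target_values, source_columns, sep=" "):
    """Score candidate matches target = concat(a, b) by value overlap.

    target_values: strings observed in the target column (any row order).
    source_columns: dict of column name -> list of strings; the source columns
    are assumed to come from one table and to be aligned row by row.
    Returns (score, (a, b)) for the best ordered pair, score in [0, 1].
    """
    target = set(target_values)
    best = (0.0, None)
    for a, col_a in source_columns.items():
        for b, col_b in source_columns.items():
            if a == b:
                continue
            candidates = {f"{x}{sep}{y}" for x, y in zip(col_a, col_b)}
            # fraction of target values reproduced by this concatenation
            score = len(candidates & target) / len(target)
            if score > best[0]:
                best = (score, (a, b))
    return best


print(best_concat_match(["Seattle WA", "Portland OR"],
                        {"city": ["Seattle", "Portland"], "state": ["WA", "OR"]}))
```

Because the candidate space grows with every column pair (and with longer formulas), a real system has to prune it, which is why iMAP frames matching as a search.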


Journal Article
TL;DR: In this paper, the authors compare propensity-score matching methods with covariate matching estimators and propose two new matching metrics incorporating the treatment outcome information and participation indicator information, and discuss the motivations of different metrics.
Abstract: We compare propensity-score matching methods with covariate matching estimators. We first discuss the data requirements of propensity-score matching estimators and covariate matching estimators. Then we propose two new matching metrics incorporating the treatment outcome information and participation indicator information, and discuss the motivations of different metrics. Next we study the small-sample properties of propensity-score matching versus covariate matching estimators, and of different matching metrics, through Monte Carlo experiments. Through a series of simulations, we provide some guidance to practitioners on how to choose among different matching estimators and matching metrics.

406 citations


Book Chapter
01 Jan 2004
TL;DR: In this article, the authors focus on criteria and methods for ranking subsets of a set of objects, and identify contexts in which subset rankings are important and discuss a number of ways in which such rankings might be obtained.
Abstract: This chapter focuses on criteria and methods for ranking subsets of a set of objects. There are many situations in which rankings of individual objects suffice for classification or decision making purposes, but many other situations call for rankings that involve subsets of two or more objects. The chapter identifies contexts in which subset rankings are important and discusses a number of ways in which such rankings might be obtained.

388 citations


Journal Article
TL;DR: In this paper, the finite-sample properties of matching and weighting estimators, often used for estimating average treatment effects, are analyzed and potential and feasible precision gains relative to pair-matching are examined.
Abstract: In this paper the finite-sample properties of matching and weighting estimators, often used for estimating average treatment effects, are analyzed. Potential and feasible precision gains relative to pair-matching are examined. Local linear matching (with and without trimming), k-nearest neighbour matching, and particularly the weighting estimators performed worst. Ridge matching, on the other hand, yields an MSE about 25% smaller than that of pair-matching. In addition, ridge matching is least sensitive to the design choice.

353 citations


Patent
13 Feb 2004
TL;DR: In this article, the functions and operations of a matching service are disclosed, including approximating the satisfaction that a user has in the relationships that the user forms with others and identifying candidates for a relationship with the user based on the approximated satisfaction.
Abstract: The functions and operations of a matching service are disclosed. This includes approximating the satisfaction that a user of the matching service has in the relationships that the user forms with others and identifying candidates for a relationship with the user based on the approximated satisfaction. This also includes approximating the satisfaction that the user will have in a relationship with a particular candidate. The matching service also identifies two parties for a relationship. The matching service makes available a plurality of communication levels at which the parties can communicate. Each communication level allows the parties to exchange information in a different format. The parties are permitted to exchange information at one of the communication levels.

Patent
14 Aug 2004
TL;DR: In this paper, a system and method is described for instantly connecting and matching people and business entities with reciprocal interests in the location of their presence in real time, using portable wireless communication devices.
Abstract: A system and method is described for instantly connecting and matching people and business entities with reciprocal interests in the location of their presence in real time. Portable communication devices using wireless communication are used for transmitting data between users with reciprocal interests connected through a peer-to-peer network or in a client-server environment. Telephone users utilize an Interactive Voice Response system to communicate with other users of reciprocal interest. A matching algorithm running on a remote computer connected to the devices through a network makes an initial assessment about the likelihood of a match, and then, with the permission of the requester and the respondent, sets up communication sessions.

01 Jan 2004
TL;DR: 7 different types of block matching algorithms used for motion estimation in video compression are implemented and compared, ranging from the very basic Exhaustive Search to the recent fast adaptive algorithms like Adaptive Rood Pattern Search.
Abstract: This paper is a review of the block matching algorithms used for motion estimation in video compression. It implements and compares 7 different types of block matching algorithms that range from the very basic Exhaustive Search to the recent fast adaptive algorithms like Adaptive Rood Pattern Search. The algorithms that are evaluated in this paper are widely accepted by the video compression community and have been used in implementing various standards, ranging from MPEG1 / H.261 to MPEG4 / H.263. The paper also presents a very brief introduction to the entire flow of video compression.
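The Exhaustive Search baseline among the seven algorithms can be sketched as a full search over every displacement in a ±p window, keeping the one with the lowest cost. Using the sum of absolute differences (SAD) as the cost is an assumption here; the mean absolute difference divides SAD by the block area and therefore selects the same vector. Frames are plain lists of pixel rows, and the motion vector (dx, dy) points from the current block to its best match in the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def exhaustive_search(cur, ref, bx, by, n, p):
    """Full-search block matching within +/- p pixels; returns (vector, cost)."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # skip candidate blocks that would fall outside the reference frame
            if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Full search evaluates (2p + 1)² candidates per block, which is the cost that fast methods such as Adaptive Rood Pattern Search try to avoid by checking only a few promising displacements.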

Proceedings Article
20 Apr 2004
TL;DR: An FPGA-based sub-system for NIDS (Snort) pattern matching is presented; it reduces the area cost of character matching by pre-decoding characters before they are compared in the CAM line and by implementing shift registers efficiently with the Xilinx SRL16 cell.
Abstract: In this paper we advocate the use of pre-decoding for CAM-based pattern matching. We implement an FPGA-based sub-system for NIDS (Snort) pattern matching using a combination of techniques. First, we reduce the area cost of character matching using (i) pre-decoding of characters before they are compared in the CAM line, and (ii) efficient shift register implementation using the SRL16 Xilinx cell. Then we achieve high operating frequencies by (iii) using fine-grain pipelining for faster circuits and (iv) decoupling the data distribution network from the processing components. Our results show that for matching more than 18,000 characters (the entire SNORT rule set) our implementation requires an area cost of less than 1.1 logic cells per matched character, achieving an operating frequency of about 375 MHz (3 Gbps) on a Virtex2 device. When using quad parallelism to increase the matching throughput, the area cost of a single matched character is reduced to less than one logic cell for a throughput of almost 10 Gbps.
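A small software model of idea (i), assuming a character stream and a set of literal patterns: each input character is decoded once into shared one-hot lines, and each pattern only needs delayed copies of those lines (the job the shift registers do in hardware) combined with an AND. This is a Python illustration, not the paper's FPGA design; it does not model SRL16 cells, pipelining, or parallelism.

```python
def predecoded_match(stream, patterns):
    """Return (end_position, pattern) for every occurrence of a pattern.

    One shared decoder compares each input character against each distinct
    pattern character once; patterns then read delayed decoder outputs.
    """
    alphabet = {c for p in patterns for c in p}
    max_len = max(len(p) for p in patterns)
    history = []  # history[-1 - k] holds the decoded lines from k cycles ago
    matches = []
    for pos, ch in enumerate(stream):
        decoded = {c: (c == ch) for c in alphabet}  # one-hot character lines
        history.append(decoded)
        if len(history) > max_len:
            history.pop(0)  # delay depth never exceeds the longest pattern
        for p in patterns:
            if len(history) < len(p):
                continue
            # p[i] must have been decoded len(p) - 1 - i cycles ago
            if all(history[-len(p) + i][c] for i, c in enumerate(p)):
                matches.append((pos, p))
    return matches


print(predecoded_match("xabcab", ["abc", "ca"]))
```

The saving comes from sharing: the comparators live in the decoder, so adding a pattern costs only delay taps and an AND gate rather than a full set of 8-bit comparators.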

Journal Article
TL;DR: In this paper, the authors consider bilateral matching problems where each person views those on the other side of the market as either acceptable or unacceptable: an acceptable mate is preferred to remaining single, and the latter to an unacceptable mate; all acceptable mates are welfare-wise identical.
Abstract: We consider bilateral matching problems where each person views those on the other side of the market as either acceptable or unacceptable: an acceptable mate is preferred to remaining single, and the latter to an unacceptable mate; all acceptable mates are welfare-wise identical. Using randomization, many efficient and fair matching methods define strategyproof revelation mechanisms. Randomly selecting a priority ordering of the participants is a simple example. Equalizing as much as possible the probability of getting an acceptable mate across all participants stands out for its normative and incentives properties: the profile of probabilities is Lorenz dominant, and the revelation mechanism is group-strategyproof for each side of the market. Our results apply to the random assignment problem as well.

Journal Article
TL;DR: In this paper, the authors used parametric and semi-nonparametric matching techniques to estimate how one human capital investment, school enrollment, is affected by a parent's recent death.
Abstract: Loss of a parent is one of the most traumatic events a child can face. If loss of a parent reduces investments in children, it can also have long-lasting implications. This study uses parametric and semi-nonparametric matching techniques to estimate how one human capital investment, school enrollment, is affected by a parent's recent death. We analyze data from 600,000 households from Indonesia’s National Socioeconomic Survey (SUSENAS) during 1994-96. We find a parent's recent death has a large effect on a child's enrollment. We also use this shock to test several theories of intra-household allocation and find little differential treatment based on the gender of the child or the deceased parent.

Journal Article
TL;DR: The use of optimal multivariate matching prior to randomization to improve covariate balance for many variables at the same time is discussed, presenting an algorithm and a case-study of its performance.
Abstract: SUMMARY Although blocking or pairing before randomization is a basic principle of experimental design, the principle is almost invariably applied to at most one or two blocking variables. Here, we discuss the use of optimal multivariate matching prior to randomization to improve covariate balance for many variables at the same time, presenting an algorithm and a case-study of its performance. The method is useful when all subjects, or large groups of subjects, are randomized at the same time. Optimal matching divides a single group of 2n subjects into n pairs to minimize covariate differences within pairs—the so-called nonbipartite matching problem—then one subject in each pair is picked at random for treatment, the other being assigned to control. Using the baseline covariate data for 132 patients from an actual, unmatched, randomized experiment, we construct 66 pairs matching for 14 covariates. We then create 10 000 unmatched and 10 000 matched randomized experiments by repeatedly randomizing the 132 patients, and compare the covariate balance with and without matching. By every measure, every one of the 14 covariates was substantially better balanced when randomization was performed within matched pairs. Even after covariance adjustment for chance imbalances in the 14 covariates, matched randomizations provided more accurate estimates than unmatched randomizations, the increase in accuracy being equivalent to, on average, a 7% increase in sample size. In randomization tests of no treatment effect, matched randomizations using the signed rank test had substantially higher power than unmatched randomizations using the rank sum test, even when only 2 of 14 covariates were relevant to a simulated response. Unmatched randomizations experienced rare disasters which were consistently avoided by matched randomizations.
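The pair-then-randomize procedure can be sketched for a handful of subjects. The paper uses an optimal nonbipartite matching algorithm; the exhaustive bitmask dynamic program below is a swapped-in exact method that is only practical for very small n, and the squared Euclidean distance on raw covariates and both function names are assumptions.

```python
import random
from functools import lru_cache


def optimal_pairs(x):
    """Exact minimum-total-distance pairing of an even number of subjects."""
    n = len(x)
    assert n % 2 == 0, "need an even number of subjects"

    def cost(i, j):
        return sum((a - b) ** 2 for a, b in zip(x[i], x[j]))

    @lru_cache(maxsize=None)
    def solve(mask):
        # mask: bit k is set once subject k has been paired
        if mask == (1 << n) - 1:
            return 0.0, ()
        i = next(k for k in range(n) if not mask >> k & 1)  # lowest unpaired
        best = None
        for j in range(i + 1, n):
            if not mask >> j & 1:
                sub_cost, sub_pairs = solve(mask | 1 << i | 1 << j)
                total = cost(i, j) + sub_cost
                if best is None or total < best[0]:
                    best = (total, ((i, j),) + sub_pairs)
        return best

    return list(solve(0)[1])


def randomize_within_pairs(pairs, rng=random):
    """Treat one member of each pair, chosen at random; the other is a control."""
    assignment = {}
    for i, j in pairs:
        t = rng.choice((i, j))
        assignment[t] = 1
        assignment[j if t == i else i] = 0
    return assignment


pairs = optimal_pairs([[0], [10], [1], [11]])
print(pairs, randomize_within_pairs(pairs, random.Random(0)))
```

Because every pair contributes exactly one treated and one control subject, chance imbalance can only arise within pairs, which is the mechanism behind the improved covariate balance the abstract reports.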

Journal Article
01 Feb 2004
TL;DR: A new match rule is introduced, called r-chunks, and the generalizations induced by different partial matching rules are characterized in terms of the crossover closure, which affects the tradeoff between positive and negative detection.
Abstract: In anomaly detection, the normal behavior of a process is characterized by a model, and deviations from the model are called anomalies. In behavior-based approaches to anomaly detection, the model of normal behavior is constructed from an observed sample of normally occurring patterns. Models of normal behavior can represent either the set of allowed patterns (positive detection) or the set of anomalous patterns (negative detection). A formal framework is given for analyzing the tradeoffs between positive and negative detection schemes in terms of the number of detectors needed to maximize coverage. For realistically sized problems, the universe of possible patterns is too large to represent exactly (in either the positive or negative scheme). Partial matching rules generalize the set of allowable (or unallowable) patterns, and the choice of matching rule affects the tradeoff between positive and negative detection. A new match rule is introduced, called r-chunks, and the generalizations induced by different partial matching rules are characterized in terms of the crossover closure. Permutations of the representation can be used to achieve more precise discrimination between normal and anomalous patterns. Quantitative results are given for the recognition ability of contiguous-bits matching together with permutations.
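The two partial matching rules named in the abstract can be sketched over binary strings, following their usual definitions in the artificial immune systems literature (an assumption; the paper's formalization may differ in detail): r-contiguous bits matches when detector and string agree in at least r consecutive positions, and an r-chunks detector is a position together with a length-r substring that must appear there.

```python
def r_contiguous_match(detector, s, r):
    """True if detector and s (same length) agree in >= r contiguous positions."""
    run = 0
    for a, b in zip(detector, s):
        run = run + 1 if a == b else 0  # length of the current agreeing run
        if run >= r:
            return True
    return False


def r_chunks_match(detector, s):
    """detector = (position, chunk); matches if s holds chunk at that position."""
    pos, chunk = detector
    return s[pos:pos + len(chunk)] == chunk


print(r_contiguous_match("10110", "10011", 2), r_chunks_match((1, "011"), "00110"))
```

An r-chunks detector constrains a single window, whereas an r-contiguous detector fixes the whole string and fires on any window, which is one reason the two rules generalize differently.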

Posted Content
TL;DR: In this article, the Mortensen-Pissarides search and matching model is modified to make the present value of wages unresponsive to current labor market conditions, which amplifies fluctuations in unemployment and vacancies by an order of magnitude.
Abstract: The standard theory of equilibrium unemployment, the Mortensen-Pissarides search and matching model, cannot explain the magnitude of the business cycle fluctuations in two of its central elements, unemployment and vacancies. Modifying the model to make the present value of wages unresponsive to current labor market conditions amplifies fluctuations in unemployment and vacancies by an order of magnitude, significantly improving the performance of the model. Despite this, the welfare consequences of such rigid wages are negligible.

Journal Article
TL;DR: In most studies regression-like techniques are routinely used for adjustment for confounding, although alternative methods are available.

Proceedings Article
27 Jun 2004
TL;DR: A new algorithm is proposed to compare two arbitrary unlabelled sets of points; it is shown to behave properly in the limit of continuous distributions on sub-manifolds and may apply to various matching problems, such as curve or surface matching, or mixtures of landmark and curve data.
Abstract: In this paper, we study the problem of optimal matching of two generalized functions (distributions) via a diffeomorphic transformation of the ambient space. In the particular case of discrete distributions (weighted sums of Dirac measures), we provide a new algorithm to compare two arbitrary unlabelled sets of points, and show that it behaves properly in the limit of continuous distributions on sub-manifolds. As a consequence, the algorithm may apply to various matching problems, such as curve or surface matching (via a sub-sampling), or mixtures of landmark and curve data. As the solution forbids high-energy solutions, it is also robust to the addition of noise, and the technique can be used for nonlinear projection of datasets. We present 2D and 3D experiments.
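For discrete distributions, a kernel-based discrepancy of the kind used to compare unlabelled point sets can be sketched as the squared RKHS distance between two weighted sums of Dirac measures; it needs no point-to-point correspondence. The Gaussian kernel, its width `sigma`, and the function names are assumptions, and the diffeomorphic deformation that the paper optimizes is not shown.

```python
import math


def gaussian_kernel(p, q, sigma):
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / sigma ** 2)


def measure_discrepancy(xs, a, ys, b, sigma=1.0):
    """Squared kernel distance between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}.

    Every pair of points interacts through the kernel, so the result does not
    depend on how the points in either set are ordered or labelled.
    """
    def cross(us, wu, vs, wv):
        return sum(wi * wj * gaussian_kernel(u, v, sigma)
                   for u, wi in zip(us, wu) for v, wj in zip(vs, wv))

    return cross(xs, a, xs, a) - 2 * cross(xs, a, ys, b) + cross(ys, b, ys, b)


xs, w = [(0.0, 0.0), (1.0, 0.0)], [0.5, 0.5]
print(measure_discrepancy(xs, w, [(1.0, 0.0), (0.0, 0.0)], w))  # same set, reordered
print(measure_discrepancy(xs, w, [(5.0, 0.0), (6.0, 0.0)], w))  # shifted set
```

In a matching algorithm a term like this would be evaluated between the deformed source measure and the target, alongside a regularity penalty on the deformation.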

Book
01 Jan 2004
TL;DR: In this book, the shrinkage argument is introduced and matching priors are developed for posterior quantiles, distribution functions, highest posterior density regions, other credible regions, and prediction.
Abstract: Introduction and the Shrinkage Argument; Matching Priors for Posterior Quantiles; Matching Priors for Distribution Functions; Matching Priors for Highest Posterior Density Regions; Matching Priors for Other Credible Regions; Matching Priors for Prediction.
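For orientation, the basic first-order criterion behind matching priors for posterior quantiles can be stated as follows (standard notation, assumed here rather than copied from the book): a prior π is probability matching if the (1 - α) posterior quantile of θ also has frequentist coverage 1 - α up to the stated order. For a scalar parameter with Fisher information I(θ), the Jeffreys prior is the classical first-order solution.

```latex
% \theta^{(1-\alpha)}(\pi, X): the (1-\alpha) posterior quantile of \theta under prior \pi
P_{\theta}\left\{ \theta \le \theta^{(1-\alpha)}(\pi, X) \right\} = 1 - \alpha + o\left(n^{-1/2}\right)

% scalar \theta with Fisher information I(\theta): first-order solution
\pi(\theta) \propto I(\theta)^{1/2}
```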

Journal Article
TL;DR: The purpose of this article is to describe the use of propensity scores to adjust for bias when estimating treatment effects in observational research and to compare use of this technique with conventional multivariable regression.
Abstract: Observational studies assessing the effect of a particular treatment or exposure may be subject to bias, which can be difficult to eliminate using standard analytic techniques. Multivariable models are commonly used in observational research to assess the relationship between a certain exposure or treatment and an outcome, while adjusting for important variables necessary to ensure comparability between the groups. Large differences in the observed covariates between two study groups may exist in observational studies in which the investigator has no control over who was allocated to each treatment group, and these differences may lead to biased estimates of treatment effect. When there are large differences in important prognostic characteristics between the treatment groups, adjusting for these differences with conventional multivariable techniques may not adequately balance the groups, and the remaining bias may limit valid causal inference. Use of a propensity score, described as a conditional probability that a subject will be "treated" based on an observed group of covariates, may better adjust covariates between the groups and reduce bias. The purpose of this article is to describe the use of propensity scores to adjust for bias when estimating treatment effects in observational research and to compare use of this technique with conventional multivariable regression. The authors present three methods for integrating propensity scores into observational analyses using a database collected on head-injured trauma patients. The article details the methods for creating a propensity score, analyzing data with the score, and explores differences between propensity score methods and conventional multivariable methods, including potential benefits and limitations. Graphical representations of the analyses are provided as well.
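The abstract does not name its three methods; as one common way of integrating propensity scores into an analysis, the sketch below estimates an average treatment effect by subclassification on the already estimated score (quintiles by default). The function name and the stratum-size weighting are illustrative assumptions.

```python
def subclassification_ate(e, d, y, n_strata=5):
    """Propensity-score subclassification estimate of the ATE.

    e: estimated propensity scores; d: 0/1 treatment; y: outcomes.
    Units are sorted by e and cut into n_strata groups of roughly equal size;
    within-stratum mean differences are averaged, weighted by stratum size.
    """
    order = sorted(range(len(e)), key=lambda i: e[i])
    size = len(order) / n_strata
    total = 0.0
    for s in range(n_strata):
        idx = order[round(s * size):round((s + 1) * size)]
        treated = [y[i] for i in idx if d[i] == 1]
        control = [y[i] for i in idx if d[i] == 0]
        if not treated or not control:
            raise ValueError(f"stratum {s} lacks treated or control units")
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(idx)
    return total / len(order)
```

Comparing units only within strata of similar scores is what lets the score balance the observed covariates; a stratum with no treated or no control units signals a lack of overlap rather than something to average over.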

Journal Article
TL;DR: In this article, the authors compare experimental and propensity score impact estimates of dropout prevention programs and find no consistent evidence that propensity score methods replicate the experimental impacts in this setting.
Abstract: By comparing experimental and propensity-score impact estimates of dropout prevention programs, we examine whether propensity-score methods produce unbiased estimates of program impacts. We find no consistent evidence that such methods replicate experimental impacts in our setting. This finding holds even when the data available for matching are extensive. Our findings suggest that evaluators who plan to use nonexperimental methods, such as propensity-score matching, need to consider carefully how programs recruit individuals and why individuals enter programs, as unobserved factors may exert powerful influences on outcomes that are not easily captured using nonexperimental methods.

Proceedings Article
22 Aug 2004
TL;DR: This paper develops the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection, and introduces a new correlation measure, the H-measure, distinct from those proposed in previous work.
Abstract: To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. While complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., [author] corresponds to [first name, last name] in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., [first name, last name]) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection. Unlike previous correlation mining algorithms, which mainly focus on finding strong positive correlations, our algorithm considers both positive and negative correlations, paying particular attention to the subtleties of negative correlations because of their special importance in schema matching. This leads to the introduction of a new correlation measure, the H-measure, distinct from those proposed in previous work. We evaluate our approach extensively and the results show good accuracy for discovering complex matchings.
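The co-occurrence observation can be made concrete by counting, for a pair of attributes, how many query interfaces contain both, only one, or neither. Correlation measures such as the paper's H-measure are defined over co-occurrence statistics of this kind, but the measure itself is not reproduced in this sketch, and the interface data below are hypothetical toy values.

```python
def contingency(interfaces, a, b):
    """2x2 co-occurrence counts for attributes a and b across query interfaces.

    interfaces: list of sets of attribute names, one set per interface.
    Returns (f11, f10, f01, f00): both present, only a, only b, neither.
    """
    f11 = sum(1 for s in interfaces if a in s and b in s)
    f10 = sum(1 for s in interfaces if a in s and b not in s)
    f01 = sum(1 for s in interfaces if a not in s and b in s)
    f00 = len(interfaces) - f11 - f10 - f01
    return f11, f10, f01, f00


interfaces = [{"author", "title"}, {"first name", "last name", "title"},
              {"author", "isbn"}, {"first name", "last name"}]
print(contingency(interfaces, "first name", "last name"))  # grouping attributes
print(contingency(interfaces, "author", "first name"))     # synonym-like attributes
```

Grouping attributes concentrate their counts in f11, while synonyms concentrate theirs in f10 and f01 with f11 near zero; a dual mining step looks for both patterns.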

Patent
27 May 2004
TL;DR: In this article, a method and system for efficiently determining grating profiles using dynamic learning in a library generation process is presented; the invention also relates to a method for searching and matching trial grating profiles to determine shape, profile, and spectrum data information associated with an actual grating profile.
Abstract: The present invention relates to a method and system for efficiently determining grating profiles using dynamic learning in a library generation process. The present invention also relates to a method and system for searching and matching trial grating profiles to determine shape, profile, and spectrum data information associated with an actual grating profile.

Journal Article
TL;DR: A graph theoretic approach is used to model the geographic context and to determine the matching features from multiple sources in a Geographic Information System populated with disparate data sources.
Abstract: A Geographic Information System (GIS) populated with disparate data sources has multiple and different representations of the same real-world object. Often, the type of information in these sources is different, and combining them to generate one composite representation has many benefits. The first step in this conflation process is to identify the features in different sources that represent the same real-world entity. The matching process is not simple, since the identified features from different sources do not always match in their location, extent, and description. We present a new approach to matching GIS features from disparate sources. A graph theoretic approach is used to model the geographic context and to determine the matching features from multiple sources. Experiments on implementation of this approach demonstrate its viability.

Journal Article
TL;DR: In this article, the authors consider the implications of firms posting contracts, in a random matching model with on-the-job search, and show that the effect on the labour market is to reduce turnover, below the level required for efficient matching.
Abstract: A common assumption in equilibrium search and matching models of the labour market is that each firm posts a wage, to be paid to any worker hired. This paper considers the implications of firms posting contracts, in a random matching model with on-the-job search. More complex contracts enable firms to address both recruitment and retention problems by, for example, increasing the wage with tenure. The effect on the labour market is to reduce turnover, below the level required for efficient matching of workers to firms. Copyright 2004, Wiley-Blackwell.

Proceedings Article
27 Jun 2004
TL;DR: A novel approach to point matching under large viewpoint and illumination changes is proposed; it is suitable for accurate object pose estimation at a much lower computational cost than state-of-the-art methods, and it is both reliable and suitable for initializing real-time applications.
Abstract: We propose a novel approach to point matching under large viewpoint and illumination changes that is suitable for accurate object pose estimation at a much lower computational cost than state-of-the-art methods. Most of these methods rely either on using ad hoc local descriptors or on estimating local affine deformations. By contrast, we treat wide baseline matching of key points as a classification problem, in which each class corresponds to the set of all possible views of such a point. Given one or more images of a target object, we train the system by synthesizing a large number of views of individual key points and by using statistical classification tools to produce a compact description of this view set. At run-time, we rely on this description to decide to which class, if any, an observed feature belongs. This formulation allows us to use a classification method to reduce matching error rates, and to move some of the computational burden from matching to training, which can be performed beforehand. In the context of pose estimation, we present experimental results for both planar and non-planar objects in the presence of occlusions, illumination changes, and cluttered backgrounds. We show that the method is both reliable and suitable for initializing real-time applications.

Proceedings Article
21 Jun 2004
TL;DR: A weight-based map matching method is introduced, and it is experimentally shown that, for the offline situation, this algorithm can get up to 94% correctness depending on the GPS sampling interval.
Abstract: In location management, the trajectory represents the motion of a moving object in 3D space-time, i.e., a sequence (x, y, t). Unfortunately, location technologies cannot guarantee error-free positions. Thus, map matching (a.k.a. snapping), matching a trajectory to the roads on the map, is necessary. We introduce a weight-based map matching method, and experimentally show that, for the offline situation, on average, our algorithm can get up to 94% correctness depending on the GPS sampling interval.
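A per-point sketch of weight-based map matching, assuming a score that adds the distance from a GPS fix to a road segment and the difference between the travel heading and the segment direction. The weights `w_dist` and `w_angle`, the scoring formula, and both function names are hypothetical; the paper's weighting scheme and its offline use of the whole trajectory are not reproduced.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    # projection parameter clamped to [0, 1] so the foot stays on the segment
    t = 0.0 if length2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_point(p, heading, segments, w_dist=1.0, w_angle=1.0):
    """Return the segment with the lowest weighted score for one GPS fix.

    heading: direction of travel in radians; segment direction is taken
    from its first to its second endpoint (one-way roads assumed).
    """
    def score(seg):
        a, b = seg
        seg_heading = math.atan2(b[1] - a[1], b[0] - a[0])
        # smallest absolute angle between the two directions, in [0, pi]
        diff = abs((heading - seg_heading + math.pi) % (2 * math.pi) - math.pi)
        return w_dist * point_segment_distance(p, a, b) + w_angle * diff

    return min(segments, key=score)


roads = [((0, 0), (10, 0)), ((0, 2), (0, 12))]
print(match_point((5, 0.5), 0.0, roads))
```

The heading term matters near intersections, where a fix may be about equally close to two roads but is moving along only one of them.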