
Showing papers on "Matching (statistics) published in 2010"


Journal ArticleDOI
TL;DR: A structure for thinking about matching methods and guidance on their use is provided, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
Abstract: When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how to best choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine, and political science. However, until now the literature and related advice have been scattered across disciplines. Researchers who are interested in using matching methods, or in developing methods related to matching, do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
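
As a concrete illustration of the basic idea, and not code from the paper, here is a minimal Python sketch of greedy 1:1 nearest-neighbor matching on a single covariate, using synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observational data: treated units tend to have larger x,
# so raw treated/control comparisons are confounded.
x_treated = rng.normal(1.0, 1.0, 50)
x_control = rng.normal(0.0, 1.0, 200)

# Greedy 1:1 nearest-neighbor matching without replacement on the covariate.
available = np.ones(len(x_control), dtype=bool)
matched = []
for xt in x_treated:
    dist = np.abs(x_control - xt)
    dist[~available] = np.inf
    j = int(np.argmin(dist))
    available[j] = False
    matched.append(j)

# Matching shrinks the covariate imbalance between the groups.
print("mean diff before:", round(x_treated.mean() - x_control.mean(), 3))
print("mean diff after: ", round(x_treated.mean() - x_control[matched].mean(), 3))
```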

3,952 citations


Book
29 Apr 2010
TL;DR: This book develops two simple models for observational studies, covers practical tools for multivariate matching (including fine balance and risk-set matching), and examines design sensitivity and the power of a sensitivity analysis and its limit.
Abstract: Beginnings.- Dilemmas and Craftsmanship.- Causal Inference in Randomized Experiments.- Two Simple Models for Observational Studies.- Competing Theories Structure Design.- Opportunities, Devices, and Instruments.- Transparency.- Matching.- A Matched Observational Study.- Basic Tools of Multivariate Matching.- Various Practical Issues in Matching.- Fine Balance.- Matching Without Groups.- Risk-Set Matching.- Matching in R.- Design Sensitivity.- The Power of a Sensitivity Analysis and Its Limit.- Heterogeneity and Causality.- Uncommon but Dramatic Responses to Treatment.- Anticipated Patterns of Response.- Planning Analysis.- After Matching, Before Analysis.- Planning the Analysis.

887 citations


Book ChapterDOI
08 Nov 2010
TL;DR: A novel approach to binocular stereo for fast matching of high-resolution images that builds a prior on the disparities by forming a triangulation on a set of support points which can be robustly matched, reducing the matching ambiguities of the remaining points.
Abstract: In this paper we propose a novel approach to binocular stereo for fast matching of high-resolution images. Our approach builds a prior on the disparities by forming a triangulation on a set of support points which can be robustly matched, reducing the matching ambiguities of the remaining points. This allows for efficient exploitation of the disparity search space, yielding accurate dense reconstruction without the need for global optimization. Moreover, our method automatically determines the disparity range and can be easily parallelized. We demonstrate the effectiveness of our approach on the large-scale Middlebury benchmark, and show that state-of-the-art performance can be achieved with significant speedups. Computing the left and right disparity maps for a one Megapixel image pair takes about one second on a single CPU core.
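
The triangulation prior is the paper's contribution and is not reproduced here; as background, a minimal Python sketch of the underlying primitive, winner-takes-all block matching over a disparity search range, on synthetic images:

```python
import numpy as np

def block_match(left, right, max_disp=16, radius=2):
    """Winner-takes-all block matching: for each left pixel, pick the
    disparity whose right-image patch has the lowest SAD cost."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int64)
    for y in range(radius, h - radius):
        for x in range(radius + max_disp, w - radius):
            patch = left[y - radius:y + radius + 1, x - radius:x + radius + 1]
            costs = [np.abs(patch - right[y - radius:y + radius + 1,
                                          x - d - radius:x - d + radius + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

rng = np.random.default_rng(0)
left = rng.random((40, 60))
right = np.roll(left, -3, axis=1)          # synthetic pair, true disparity 3
d = block_match(left, right)
print("dominant disparity:", np.bincount(d.ravel()).argmax())   # -> 3
```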

818 citations


Proceedings ArticleDOI
01 Jan 2010
TL;DR: This work converts the person re-identification problem from an absolute scoring problem to a relative ranking problem and develops a novel Ensemble RankSVM to overcome the scalability limitations of existing SVM-based ranking methods.
Abstract: Solving the person re-identification problem involves matching observations of individuals across disjoint camera views. The problem becomes particularly hard in a busy public scene as the number of possible matches is very high. This is further compounded by significant appearance changes due to varying lighting conditions, viewing angles and body poses across camera views. To address this problem, existing approaches focus on extracting or learning discriminative features followed by template matching using a distance measure. The novelty of this work is that we reformulate the person re-identification problem as a ranking problem and learn a subspace where the potential true match is given the highest ranking rather than using any direct distance measure. By doing so, we convert the person re-identification problem from an absolute scoring problem to a relative ranking problem. We further develop a novel Ensemble RankSVM to overcome the scalability limitation suffered by existing SVM-based ranking methods. This new model significantly reduces memory usage and is therefore much more scalable, whilst maintaining high-level performance. We present extensive experiments to demonstrate the performance gain of the proposed ranking approach over existing template matching and classification models.
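
A minimal Python sketch of the pairwise reduction behind RankSVM (not the authors' Ensemble RankSVM), using synthetic probe/gallery features and scikit-learn's LinearSVC; all names and data are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy re-identification data: each probe has one true match in the gallery.
n, dim = 60, 8
probes = rng.normal(size=(n, dim))
gallery = probes + rng.normal(scale=0.3, size=(n, dim))

def pair_feature(a, b):
    return np.abs(a - b)          # feature vector describing a candidate pair

# Pairwise reduction: the true match should outscore an impostor, so train
# a linear classifier on differences of pair-feature vectors.
X, y = [], []
for i in range(n):
    pos = pair_feature(probes[i], gallery[i])
    neg = pair_feature(probes[i], gallery[(i + 1) % n])   # an impostor
    X.extend([pos - neg, neg - pos])
    y.extend([1, -1])
w = LinearSVC(C=1.0).fit(np.array(X), np.array(y)).coef_.ravel()

# Rank the gallery for probe 0 by the learned scoring function w.
scores = [w @ pair_feature(probes[0], g) for g in gallery]
order = np.argsort(scores)[::-1]
print("rank of true match:", int(np.where(order == 0)[0][0]))   # ideally 0
```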

736 citations


Journal ArticleDOI
TL;DR: In this article, a quantile-based mapping method is developed for the bias correction of monthly global circulation model outputs, which explicitly accounts for distribution changes for a given model between the projection and baseline periods.
Abstract: A new quantile-based mapping method is developed for the bias correction of monthly global circulation model outputs. Compared to the widely used quantile-based matching method that assumes stationarity and only uses the cumulative distribution functions (CDFs) of the model and observations for the baseline period, the proposed method incorporates and adjusts the model CDF for the projection period on the basis of the difference between the model and observation CDFs for the training (baseline) period. Thus, the method explicitly accounts for distribution changes for a given model between the projection and baseline periods. We demonstrate the use of the new method over northern Eurasia. We fit a four-parameter beta distribution to monthly temperature fields and discuss the sensitivity of the results to the choice of distribution range parameters. For monthly precipitation data, a mixed gamma distribution is used that accounts for the intermittent nature of rainfall. To test the fidelity of the proposed method, we choose 1970-1999 as the baseline training period and then randomly select 30 years from 1901-1999 as the projection test period. The bootstrapping is repeated 30 times to mimic different climate conditions that may occur, and the results suggest that both methods are comparable when applied to the 20th century for both temperature and precipitation for the examined quartiles. We also discuss the dependence of the bias correction results on the choice of time period for training. This indicates that the remaining biases in the bias-corrected time series are directly tied to the model's performance during the training period, and therefore care should be taken when using a particular training time period. When applied to the Intergovernmental Panel on Climate Change fourth assessment report (AR4) A2 climate scenario projection, the data time series after bias correction from both methods exhibit similar spatial patterns. However, over regions where the climate model shows large changes in projected variability, there are discernable differences between the methods. The proposed method is more sensitive to a reduction in variability, exemplified by wintertime temperature. Further synthetic experiments using the lower 33% and upper 33% of the full data set as the validation data suggest that the proposed equidistance quantile-matching method is more efficient in reducing biases than the traditional CDF mapping method for changing climates, especially for the tails of the distribution. This has important consequences for the occurrence and intensity of future projected extreme events such as heat waves, floods, and droughts. As the new method is simple to implement and does not require substantial computational time, it can be used to produce auxiliary ensemble scenarios for various climate impact-oriented applications.
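
A minimal Python sketch of the equidistant CDF matching idea on synthetic data; the paper's distributional fits (four-parameter beta, mixed gamma) are replaced here by empirical quantiles:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly values for one grid cell.
obs      = rng.gamma(4.0, 2.0, 1000)    # observations, baseline period
mod_base = rng.gamma(4.0, 2.5, 1000)    # biased model, baseline period
mod_proj = rng.gamma(5.0, 2.5, 1000)    # model, projection period (shifted)

q = np.linspace(0.01, 0.99, 99)
obs_q, base_q = np.quantile(obs, q), np.quantile(mod_base, q)

def edcdfm(x):
    """Equidistant CDF matching: at each projected value, apply the
    observed-minus-model baseline quantile difference evaluated at that
    value's non-exceedance probability under the projection-period CDF."""
    p = np.interp(x, np.sort(mod_proj), np.linspace(0.0, 1.0, mod_proj.size))
    p = np.clip(p, q[0], q[-1])
    return x + np.interp(p, q, obs_q) - np.interp(p, q, base_q)

corrected = edcdfm(mod_proj)
print("means (obs / raw proj / corrected):",
      obs.mean().round(2), mod_proj.mean().round(2), corrected.mean().round(2))
```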

705 citations


Book
15 Sep 2010
TL;DR: In this book, the authors discuss the importance of theory in education and the key challenge of causal research in educational research, as well as the challenges of designing, implementing, and learning from randomized experiments.
Abstract:
1 The Challenge for Educational Research 1.1 The Long Quest 1.2 The Quest is World-Wide 1.3 What this Book is About 1.4 What to Read Next
2 The Importance of Theory 2.1 What is Theory? 2.2 Theory in Education 2.3 Voucher Theory 2.4 What Kind of Theories? 2.5 What to Read Next
3 Designing Research to Address Causal Questions 3.1 Conditions to Strive for in All Research 3.2 Making Causal Inferences 3.3 Past Approaches To Answering Causal Questions in Education 3.4 The Key Challenge of Causal Research 3.5 What to Read Next
4 Investigator-Designed Randomized Experiments 4.1 Conducting Randomized Experiments 4.1.1 An Example of a "Two-Group" Experiment 4.2 Analyzing Data from Randomized Experiments 4.2.1 The Better Your Research Design, the Simpler Your Data-Analysis 4.2.2 Bias and Precision in the Estimation of Experimental Effects 4.3 What to Read Next
5 Challenges in Designing, Implementing, and Learning from Randomized Experiments 5.1 Critical Decisions in the Design of Experiments 5.1.1 Defining the Treatment Being Evaluated 5.1.2 Defining the Population from Which Participants Will Be Sampled 5.1.3 Deciding Which Outcomes to Measure 5.1.4 Deciding How Long To Track Participants 5.2 Threats to Validity of Randomized Experiments 5.2.1 Contamination of the Treatment-Control Contrast 5.2.2 Cross-Overs 5.2.3 Attrition from the Sample 5.2.4 Participation in an Experiment Itself Affects Participants' Behavior 5.3 Gaining Support for Conducting Randomized Experiments: Examples from India 5.3.1 Evaluating an Innovative Input Approach 5.3.2 Evaluating an Innovative Incentive Policy 5.4 What to Read Next
6 Statistical Power and Sample Size 6.1 Statistical Power 6.1.1 Reviewing the Process of Statistical Inference 6.1.2 Defining Statistical Power 6.2 Factors Affecting Statistical Power 6.2.1 The Strengths and Limitations of Parametric Tests 6.2.2 The Benefits of Covariates 6.2.3 The Reliability of the Outcome Measure Matters 6.2.4 The Choice between One-Tailed and Two-Tailed Tests 6.3 What to Read Next
7 Experimental Research When Participants Are Clustered within Intact Groups 7.1 Using the Random-Intercepts Multilevel Model to Estimate Effect Size When Intact Groups of Participants Were Randomized to Experimental Conditions 7.2 Statistical Power When Intact Groups of Participants Were Randomized to Experimental Conditions 7.2.1 Statistical Power of the Cluster-Randomized Design and Intraclass Correlation 7.3 Using Fixed-Effects Multilevel Models to Estimate Effect Size When Intact Groups of Participants Are Randomized to Experimental Conditions 7.3.1 Specifying a "Fixed-Effects" Multilevel Model 7.3.2 Choosing Between Random- and Fixed-Effects Specifications 7.4 What to Read Next
8 Using Natural Experiments To Provide "Arguably Exogenous" Treatment Variability 8.1 Natural- and Investigator-Designed Experiments: Similarities and Differences 8.2 Two Examples of Natural Experiments 8.2.1 The Vietnam Era Draft Lottery 8.2.2 The Impact of an Offer of Financial Aid for College 8.3 Sources of Natural Experiments 8.4 Choosing the Width of the Analytic Window 8.5 Threats to Validity in Natural Experiments with a Discontinuity Design 8.5.1 Accounting for the Relationship between the Forcing Variable and the Outcome in a Discontinuity Design 8.5.2 Actions by Participants Can Undermine Exogenous Assignment to Experimental Conditions in a Natural Experiment with a Discontinuity Design 8.6 What to Read Next
9 Estimating Causal Effects Using a Regression-Discontinuity Approach 9.1 Maimonides' Rule and the Impact of Class Size on Student Achievement 9.1.1 A Simple "First Difference" Analysis 9.1.2 A "Difference-in-Differences" Analysis 9.1.3 A Basic "Regression-Discontinuity" Analysis 9.1.4 Choosing an Appropriate "Window" or "Bandwidth" 9.2 Generalizing the Relationship between Outcome and Forcing Variable 9.2.1 Specification Checks Using Pseudo-Outcomes and Pseudo-Cutoffs 9.2.2 RD Designs and Statistical Power 9.3 Additional Threats to Validity in an RD Design 9.4 What to Read Next
10 Introducing Instrumental Variables Estimation 10.1 Introducing Instrumental Variables Estimation 10.1.1 Investigating the Relationship Between an Outcome and a Potentially-Endogenous Question Predictor Using OLS Regression Analysis 10.1.2 Instrumental Variables Estimation 10.2 Two Critical Assumptions That Underpin Instrumental Variables Estimation 10.3 Alternative Ways of Obtaining the IV Estimate 10.3.1 Obtaining an IV Estimate by the Method of Two-Stage Least-Squares 10.3.2 Obtaining an IVE by Simultaneous Equations Estimation 10.4 Extensions of the Basic IVE Approach 10.4.1 Incorporating Exogenous Covariates into IV Estimation 10.4.2 Incorporating Multiple Instruments into the First-Stage Model 10.4.3 Examining the Impact of Interactions between the Endogenous Question Predictor and Exogenous Covariates in the Second-Stage Model 10.4.4 Choosing Appropriate Functional Forms for the Outcome/Predictor Relationships in the First- and Second-Stage Models 10.5 Finding and Defending Instruments 10.5.1 Proximity of Educational Institutions 10.5.2 Institutional Rules and Personal Characteristics 10.5.3 Deviations from Cohort Trends 10.5.4 The Search Continues 10.6 What To Read Next
11 Using IVE to Recover the Treatment Effect in a Quasi-Experiment 11.1 The Notion of a "Quasi-Experiment" 11.2 Using IVE to Estimate the Causal Impact of a Treatment in a Quasi-Experiment 11.3 Further Insight into the IVE (LATE) Estimate, in the Context of Quasi-Experimental Data 11.4 Using IVE to Resolve "Fuzziness" in a Regression-Discontinuity Design 11.5 What To Read Next
12 Dealing with Bias in Treatment Effects Estimated from Non-Experimental Data 12.1 Reducing Observed Bias by the Method of Stratification 12.1.1 Stratifying on a Single Covariate 12.1.2 Stratifying on Covariates 12.2 Reducing Observed Bias by Direct Control for Covariates Using Regression Analysis 12.3 Reducing Observed Bias Using a "Propensity Score" Approach 12.3.1 Estimation of the Treatment Effect by Stratifying on Propensity Scores 12.3.2 Estimation of the Treatment Effect by Matching on Propensity Scores 12.3.3 Estimation of the Treatment Effect by Weighting by the Inverse of the Propensity Scores 12.4 A Return to the Substantive Question 12.5 What to Read Next
13 Substantive Lessons from High-Quality Evaluations of Educational Interventions 13.1 Increasing School Enrollments 13.1.1 Reduce Commuting Time 13.1.2 Reduce Out-of-Pocket Educational Costs 13.1.3 Reduce Opportunity Costs 13.1.4 Does Increasing School Enrollment Necessarily Lead To Improved Long-Term Outcomes? 13.2 Improving School Quality 13.2.1 Provide More or Better Educational Inputs 13.2.1.1 Provide More Books 13.2.1.2 Teach Children in Smaller Classes 13.2.1.3 Recruit Skilled Teachers or Provide Training to Enhance Teachers' Effectiveness 13.2.2 Improve Incentives for Teachers 13.2.3 Improving Incentives for Students 13.2.4 Increase Families' Schooling Choices 13.3 Summing Up
14 Methodological Lessons from the Long Quest 14.1 Be Clear About Your Theory of Action 14.2 Learn about Culture, Rules, and Institutions in the Research Setting 14.3 Understand the Counterfactual 14.4 Worry about Selection Bias 14.5 Measure All Possible Important Outcomes 14.6 Be On the Lookout for Longer-Term Effects 14.7 Develop a Plan for Examining Impacts on Subgroups 14.8 Interpret Your Research Results Correctly 14.9 Pay Attention to Anomalous Results 14.10 Recognize That Good Research Always Raises New Questions 14.11 Final Words

515 citations


Journal ArticleDOI
Peter C. Austin1
TL;DR: The authors recommend that, in most settings, researchers using many-to-one matching on the propensity score match either 1 or 2 untreated subjects to each treated subject.
Abstract: Propensity-score matching is increasingly being used to estimate the effects of treatments using observational data. In many-to-one (M:1) matching on the propensity score, M untreated subjects are matched to each treated subject using the propensity score. The authors used Monte Carlo simulations to examine the effect of the choice of M on the statistical performance of matched estimators. They considered matching 1-5 untreated subjects to each treated subject using both nearest-neighbor matching and caliper matching in 96 different scenarios. Increasing the number of untreated subjects matched to each treated subject tended to increase the bias in the estimated treatment effect; conversely, increasing the number of untreated subjects matched to each treated subject decreased the sampling variability of the estimated treatment effect. Using nearest-neighbor matching, the mean squared error of the estimated treatment effect was minimized in 67.7% of the scenarios when 1:1 matching was used. Using nearest-neighbor matching or caliper matching, the mean squared error was minimized in approximately 84% of the scenarios when, at most, 2 untreated subjects were matched to each treated subject. The authors recommend that, in most settings, researchers match either 1 or 2 untreated subjects to each treated subject when using propensity-score matching.
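
A minimal Python sketch of greedy M:1 nearest-neighbor matching with a caliper on synthetic propensity scores (illustrative only; the paper's simulations are far more extensive):

```python
import numpy as np

rng = np.random.default_rng(1)
ps_treated = rng.beta(3, 2, 100)      # synthetic propensity scores
ps_control = rng.beta(2, 3, 1000)

def match_m_to_1(ps_t, ps_c, m, caliper):
    """Greedy M:1 nearest-neighbor matching on the propensity score,
    without replacement; candidates farther than the caliper are refused."""
    available = np.ones(ps_c.size, dtype=bool)
    matches = {}
    for i in np.argsort(ps_t):
        chosen = []
        for _ in range(m):
            dist = np.abs(ps_c - ps_t[i])
            dist[~available] = np.inf
            j = int(np.argmin(dist))
            if dist[j] > caliper:
                break                  # no acceptable control left
            available[j] = False
            chosen.append(j)
        if chosen:
            matches[int(i)] = chosen
    return matches

for m in (1, 2, 5):
    found = match_m_to_1(ps_treated, ps_control, m=m, caliper=0.05)
    print(f"M={m}: {len(found)} treated matched, "
          f"{sum(len(v) for v in found.values())} controls used")
```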

498 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: It is found that some challenging resolution tasks such as matching product entities from online shops are not sufficiently solved with conventional approaches based on the similarity of attribute values.
Abstract: Despite the huge amount of recent research efforts on entity resolution (matching) there has not yet been a comparative evaluation on the relative effectiveness and efficiency of alternate approaches. We therefore present such an evaluation of existing implementations on challenging real-world match tasks. We consider approaches both with and without using machine learning to find suitable parameterization and combination of similarity functions. In addition to approaches from the research community we also consider a state-of-the-art commercial entity resolution implementation. Our results indicate significant quality and efficiency differences between different approaches. We also find that some challenging resolution tasks such as matching product entities from online shops are not sufficiently solved with conventional approaches based on the similarity of attribute values.
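
A toy Python sketch of the conventional attribute-value-similarity matcher the evaluation refers to; the record fields and threshold are made up:

```python
from difflib import SequenceMatcher

# Two toy product records from different shops.
a = {"title": "Canon EOS 550D DSLR Camera", "brand": "Canon", "price": "799"}
b = {"title": "EOS 550 D Digital SLR by Canon", "brand": "canon", "price": "779"}

def sim(x, y):
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

# Average per-attribute string similarity: the conventional attribute-value
# approach that the evaluation finds insufficient for product matching.
score = sum(sim(a[k], b[k]) for k in a) / len(a)
print(f"match score: {score:.2f}")    # compare against a threshold, e.g. 0.7
```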

436 citations


Journal ArticleDOI
TL;DR: In this article, the Gale-Shapley algorithm is used to predict stable matches on an online dating site, and the predicted matches are similar to the actual matches achieved by the dating site.
Abstract: Using data on user attributes and interactions from an online dating site, we estimate mate preferences, and use the Gale-Shapley algorithm to predict stable matches. The predicted matches are similar to the actual matches achieved by the dating site, and the actual matches are approximately efficient. Out-of-sample predictions of offline matches, i.e., marriages, exhibit assortative mating patterns similar to those observed in actual marriages. Thus, mate preferences, without resort to search frictions, can generate sorting in marriages. However, we underpredict some of the correlation patterns; search frictions may play a role in explaining the discrepancy.
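
A minimal Python implementation of the Gale-Shapley deferred-acceptance algorithm on a two-sided toy market; the preference lists are made up:

```python
def gale_shapley(proposer_prefs, reviewer_prefs):
    """Deferred acceptance: proposers propose in preference order; reviewers
    hold their best offer so far. Returns a stable matching."""
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                      # reviewer -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])   # current partner becomes free
            engaged[r] = p
        else:
            free.append(p)            # rejected; will propose to next choice
    return {p: r for r, p in engaged.items()}

men   = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
print(gale_shapley(men, women))       # {'m2': 'w1', 'm1': 'w2'} -- stable
```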

436 citations


Journal ArticleDOI
TL;DR: The Glasgow Face Matching Test (GFMT), a new test for unfamiliar face matching, correlates moderately with face memory but more strongly with object matching, a result consistent with previous research highlighting a link between object and face matching.
Abstract: We describe a new test for unfamiliar face matching, the Glasgow Face Matching Test (GFMT). Viewers are shown pairs of faces, photographed in full-face view but with different cameras, and are asked to make same/different judgments. The full version of the test comprises 168 face pairs, and we also describe a shortened version with 40 pairs. We provide normative data for these tests derived from large subject samples. We also describe associations between the GFMT and other tests of matching and memory. The new test correlates moderately with face memory but more strongly with object matching, a result that is consistent with previous research highlighting a link between object and face matching, specific to unfamiliar faces. The test is available free for scientific use.

429 citations


Journal ArticleDOI
01 Feb 2010
TL;DR: This paper comparatively analyzes 11 proposed frameworks for entity matching, considering both frameworks that do and frameworks that do not utilize training data to semi-automatically find an entity matching strategy for a given match task.
Abstract: Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods and their combination to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed frameworks for entity matching. Our study considers both frameworks which do or do not utilize training data to semi-automatically find an entity matching strategy to solve a given match task. Moreover, we consider support for blocking and the combination of different match algorithms. We further study how the different frameworks have been evaluated. The study aims at exploring the current state of the art in research prototypes of entity matching frameworks and their evaluations. The proposed criteria should be helpful to identify promising framework approaches and enable categorizing and comparatively assessing additional entity matching frameworks and their evaluations.

Posted Content
TL;DR: Simulation results indicate that the use of sophisticated optimization methods instead of simple greedy matching rules may substantially improve the performance of ride-sharing systems, and that sustainable populations of dynamic ride-sharing participants may be possible even in relatively sprawling urban areas with many employment centers.
Abstract: Smartphone technology enables dynamic ride-sharing systems that bring together people with similar itineraries and time schedules to share rides on short notice. This paper considers the problem of matching drivers and riders in this dynamic setting. We develop optimization-based approaches that aim at minimizing the total system-wide vehicle miles and individual travel costs. To assess the merits of our methods we present a simulation study based on 2008 travel demand data from metropolitan Atlanta. The simulation results indicate that the use of sophisticated optimization methods instead of simple greedy matching rules may substantially improve the performance of ride-sharing systems. Furthermore, even with relatively low participation rates, it appears that sustainable populations of dynamic ride-sharing participants may be possible even in relatively sprawling urban areas with many employment centers.
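
A toy Python comparison of an optimal assignment against a greedy rule, in the spirit of the paper's finding; the costs here are plain Euclidean distances, not the paper's system-wide vehicle-miles model:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
drivers = rng.uniform(0, 10, (6, 2))       # random pickup locations
riders  = rng.uniform(0, 10, (6, 2))
cost = np.linalg.norm(drivers[:, None, :] - riders[None, :, :], axis=2)

# Optimal one-to-one assignment via the Hungarian algorithm...
rows, cols = linear_sum_assignment(cost)
optimal_total = cost[rows, cols].sum()

# ...versus a greedy rule: each driver in turn grabs the nearest free rider.
c = cost.copy()
greedy_total = 0.0
for i in range(len(drivers)):
    j = int(np.argmin(c[i]))
    greedy_total += c[i, j]
    c[:, j] = np.inf                        # rider j is no longer available

print(f"greedy: {greedy_total:.2f}  optimal: {optimal_total:.2f}")
```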

Journal ArticleDOI
TL;DR: In this paper, the authors provide a selective review of recent developments in DSGE models, and describe and implement Bayesian moment matching and impulse response matching procedures for monetary DSGE.
Abstract: Monetary DSGE models are widely used because they fit the data well and can be used to address important monetary policy questions. We provide a selective review of these developments. Policy analysis with DSGE models requires using data to assign numerical values to model parameters. The paper describes and implements Bayesian moment matching and impulse response matching procedures for this purpose.
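
A minimal frequentist Python sketch of moment matching on a stand-in AR(1) model rather than a DSGE model (the paper's procedures are Bayesian and far richer); shocks are held fixed across evaluations so the objective is smooth:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
eps = rng.normal(size=5000)                 # common random numbers

def simulate_ar1(rho):
    """Stand-in for a solved model: an AR(1) series driven by fixed shocks."""
    x = np.zeros_like(eps)
    for t in range(1, len(eps)):
        x[t] = rho * x[t - 1] + eps[t]
    return x

def moments(x):
    """Target moments: variance and first-order autocorrelation."""
    return np.array([x.var(), np.corrcoef(x[:-1], x[1:])[0, 1]])

data_moments = moments(simulate_ar1(0.9))   # pretend these came from data

# Moment matching: pick the parameter minimizing the distance between
# model-implied and data moments.
def loss(rho):
    return np.sum((moments(simulate_ar1(rho)) - data_moments) ** 2)

rho_hat = minimize_scalar(loss, bounds=(0.0, 0.99), method="bounded").x
print(f"estimated persistence: {rho_hat:.3f}")   # recovers ~0.9
```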

Proceedings ArticleDOI
23 May 2010
TL;DR: This work proposes an Interactive Voting-based Map Matching (IVMM) algorithm that not only considers the spatial and temporal information of a GPS trajectory but also uses a voting-based strategy to model the weighted mutual influences between GPS points.
Abstract: Matching a raw GPS trajectory to roads on a digital map is often referred to as the Map Matching problem. However, the occurrence of low-sampling-rate trajectories (e.g. one point per 2 minutes) has brought lots of challenges to existing map matching algorithms. To address this problem, we propose an Interactive Voting-based Map Matching (IVMM) algorithm based on the following three insights: 1) the position context of a GPS point as well as the topological information of road networks, 2) the mutual influence between GPS points (i.e., the matching result of a point references the positions of its neighbors; in turn, when matching its neighbors, the position of this point will also be referenced), and 3) the strength of the mutual influence weighted by the distance between GPS points (i.e., the farther the distance, the weaker the influence). In this approach, we not only consider the spatial and temporal information of a GPS trajectory but also devise a voting-based strategy to model the weighted mutual influences between GPS points. We evaluate our IVMM algorithm based on a user-labeled real trajectory dataset. As a result, the IVMM algorithm outperforms the related method (the ST-Matching algorithm).
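
A minimal Python sketch of the "position context" ingredient of map matching, scoring candidate roads for one GPS fix by projection distance; the interactive voting over mutual influences that defines IVMM is not reproduced here:

```python
import numpy as np

def project(p, a, b):
    """Distance from point p to segment ab."""
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

# Tiny road network: each road is one segment (start, end).
roads = {
    "main_st":  (np.array([0.0, 0.0]),  np.array([10.0, 0.0])),
    "cross_st": (np.array([5.0, -5.0]), np.array([5.0, 5.0])),
}

def candidate_scores(p, sigma=1.0):
    """Score candidate roads for one GPS fix with a Gaussian of the
    projection distance (the position-context term only)."""
    return {name: float(np.exp(-project(p, a, b) ** 2 / (2 * sigma ** 2)))
            for name, (a, b) in roads.items()}

print(candidate_scores(np.array([4.0, 0.6])))   # main_st scores highest
```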

Journal ArticleDOI
TL;DR: This paper develops tools and techniques to analyze the determinants of factor allocation and factor prices in economies with a large number of goods and factors, and characterizes sufficient conditions for robust monotone comparative statics predictions in a Roy-like assignment model.
Abstract: This paper develops tools and techniques to analyze the determinants of factor allocation and factor prices in economies with a large number of goods and factors. The main results of our paper characterize sufficient conditions for robust monotone comparative statics predictions in a Roy-like assignment model. These general results are then used to generate new insights about the consequences of globalization.

Journal ArticleDOI
TL;DR: This paper defines interviewer effects, argues for the importance of measuring and controlling for interviewer effects in health surveys, provides advice about how to interpret research on interviewer effects and summarizes research to date on race, ethnicity and gender effects.
Abstract: Interviewer effects can have a substantial impact on survey data and may be particularly operant in public health surveys, where respondents are likely to be queried about racial attitudes, sensitive behaviors and other topics prone to socially desirable responding. This paper defines interviewer effects, argues for the importance of measuring and controlling for interviewer effects in health surveys, provides advice about how to interpret research on interviewer effects and summarizes research to date on race, ethnicity and gender effects. Interviewer effects appear to be most likely to occur when survey items query attitudes about sociodemographic characteristics or respondents' engagement in sensitive behaviors such as substance use. However, there is surprisingly little evidence to indicate whether sociodemographic interviewer-respondent matching improves survey response rates or data validity, and the use of a matched design introduces possible measurement bias across studies. Additional research is needed to elucidate many issues, including the influence of interviewers' sociodemographic characteristics on health-related topics, the role of within-group interviewer variability on survey data and the simultaneous impact of multiple interviewer characteristics. The findings of such research would provide much-needed guidance to public health professionals on whether or not to match interviewers and respondents on key sociodemographic characteristics.

Journal ArticleDOI
TL;DR: In this article, the role of sex in judging is explored by addressing two questions of long-standing interest to political scientists: whether and in what ways male and female judges decide cases distinctly, and whether serving with a female judge causes males to behave differently.
Abstract: We explore the role of sex in judging by addressing two questions of long-standing interest to political scientists: whether and in what ways male and female judges decide cases distinctly ("individual effects"), and whether and in what ways serving with a female judge causes males to behave differently ("panel effects"). While we attend to the dominant theoretical accounts of why we might expect to observe either or both effects, we do not use the predominant statistical tools to assess them. Instead, we deploy a more appropriate methodology: semiparametric matching, which follows from a formal framework for causal inference. Applying matching methods to 13 areas of law, we observe consistent gender effects in only one: sex discrimination. For these disputes, the probability of a judge deciding in favor of the party alleging discrimination decreases by about 10 percentage points when the judge is a male. Likewise, when a woman serves on a panel with men, the men are significantly more likely to rule in favor of the rights litigant. These results are consistent with an informational account of gendered judging and are inconsistent with several others.

Journal ArticleDOI
TL;DR: This report describes the use of information emerging from genetic discovery to motivate risk-reducing health behaviors, and considers using genetic information to identify risk shared within kinship networks and to expand the influence of behavior change beyond the individual.
Abstract: This report describes the use of information emerging from genetic discovery to motivate risk-reducing health behaviors. Most research to date has evaluated the effects of information related to rare genetic variants on screening behaviors, in which genetic risk feedback has been associated consistently with improved screening adherence. The limited research with common genetic variants suggests that genetic information, when based on single-gene variants with low-risk probabilities, has little impact on behavior. The effect on behavioral outcomes of more realistic testing scenarios in which genetic risk is based on numerous genetic variants is largely unexplored. Little attention has been directed to matching genetic information to the literacy levels of target audiences. Another promising area for research is consideration of using genetic information to identify risk shared within kinship networks and to expand the influence of behavior change beyond the individual.

Posted Content
TL;DR: In this article, the authors provide a guide to the key aspects of implementing Propensity-Score Matching (PSM) methodology, summarizing the basic conditions under which PSM can be used to estimate the impact of a program and the data required, and presenting examples of PSM applications.
Abstract: This document provides a guide to the key aspects of implementing Propensity-Score Matching (PSM) methodology. It summarizes the basic conditions under which PSM can be used to estimate the impact of a program and the data required, presenting examples of PSM applications. It explains how the Conditional Independence Assumption, combined with the Overlap Condition, reduces selection bias when participation in a program is determined by observable characteristics. It also describes different matching algorithms and some tests to assess the quality of the matching.
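
Among the matching-quality tests the guide mentions, covariate balance is the most common; a minimal Python sketch of the standardized mean difference diagnostic on synthetic data:

```python
import numpy as np

def standardized_difference(x_t, x_c):
    """Standardized mean difference; absolute values below roughly 0.1
    are commonly read as adequate balance on that covariate."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2.0)
    return (x_t.mean() - x_c.mean()) / pooled_sd

rng = np.random.default_rng(4)
age_t = rng.normal(45, 10, 200)     # treated group, synthetic covariate
age_c = rng.normal(40, 10, 800)     # controls: imbalanced before matching
print(f"SMD: {standardized_difference(age_t, age_c):.2f}")
```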

Patent
30 Sep 2010
TL;DR: In this article, the authors present an architecture that selects a classification engine based on the expertise of the engine to process a given entity (e.g., a file). Selection of an engine is based on a probability that the engine will detect an unknown entity classification using properties of the entity.
Abstract: Architecture that selects a classification engine based on the expertise of the engine to process a given entity (e.g., a file). Selection of an engine is based on a probability that the engine will detect an unknown entity classification using properties of the entity. One or more of the highest ranked engines are activated in order to achieve the desired performance. A statistical, performance-light module is employed to skip or select several performance-demanding processes. Methods and algorithms are utilized for learning based on matching the best classification engine(s) to detect the entity class based on the entity properties. A user selection option is provided for specifying a maximum number of ranked, classification engines to consider for each state of the machine. A user can also select the minimum probability of detection for a specific entity (e.g., unknown file). The best classifications are re-evaluated over time as the classification engines are updated.

Journal ArticleDOI
TL;DR: A Census-based stereo matching algorithm is presented that handles areas that are difficult for stereo matching, such as areas with low texture, very well in comparison to state-of-the-art real-time methods, and can successfully eliminate false positives to provide reliable 3D data.
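
A minimal Python sketch of the census transform and the Hamming matching cost that such algorithms build on (toroidal borders via np.roll, for brevity):

```python
import numpy as np

def census_transform(img, radius=1):
    """Census transform: encode each pixel by which neighbors are darker
    than it, giving a bit string robust to monotonic lighting changes;
    windows are then compared with the Hamming distance."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint32)
    bit = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out |= (shifted < img).astype(np.uint32) << bit
            bit += 1
    return out

def hamming(a, b):
    """Matching cost between two census codes."""
    return bin(int(a) ^ int(b)).count("1")

img = np.random.default_rng(5).random((6, 6))
c = census_transform(img)
print("cost between neighboring pixels:", hamming(c[3, 3], c[3, 4]))
```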

Journal ArticleDOI
TL;DR: An automated fingerprint recognition system is described; fingerprint matching has been successfully used by law enforcement and is now finding many other applications such as identity management and access control.
Abstract: Fingerprint matching has been successfully used by law enforcement for more than a century. The technology is now finding many other applications such as identity management and access control. The authors describe an automated fingerprint recognition system and identify key challenges and research opportunities in the field.

Journal ArticleDOI
TL;DR: It is proved that, when the bounded sets form a nested set system, a stable matching can be found by generalising, in non-trivial ways, both the applicant-oriented and college-oriented versions of the classical Gale-Shapley algorithm.

Proceedings Article
31 Mar 2010
TL;DR: This paper studies learning methods for binary restricted Boltzmann machines based on ratio matching and generalized score matching and compares them to a range of existing learning methods including stochastic maximum likelihood, contrastive divergence, and pseudo-likelihood.
Abstract: Recent research has seen the proposal of several new inductive principles designed specifically to avoid the problems associated with maximum likelihood learning in models with intractable partition functions. In this paper, we study learning methods for binary restricted Boltzmann machines (RBMs) based on ratio matching and generalized score matching. We compare these new RBM learning methods to a range of existing learning methods including stochastic maximum likelihood, contrastive divergence, and pseudo-likelihood. We perform an extensive empirical evaluation across multiple tasks and data sets.
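
The score matching and ratio matching objectives are more involved; as a point of reference, a minimal Python sketch of one baseline the paper compares against, CD-1 contrastive divergence for a binary RBM (biases omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(6)
n_vis, n_hid, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_vis, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    """One step of contrastive divergence (CD-1) for a binary RBM:
    sample hidden units, reconstruct visibles, and take the difference
    of positive- and negative-phase statistics as the gradient estimate."""
    ph0 = sigmoid(v0 @ W)                        # P(h=1 | v0)
    h0 = (rng.random(n_hid) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T)                      # reconstruction
    v1 = (rng.random(n_vis) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    return np.outer(v0, ph0) - np.outer(v1, ph1)

data = (rng.random((100, n_vis)) < 0.5).astype(float)
for epoch in range(5):
    for v in data:
        W += lr * cd1_update(v)
print("trained weight norm:", np.linalg.norm(W).round(3))
```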

Proceedings ArticleDOI
14 Sep 2010
TL;DR: This work addresses the problem of matching user profiles in its entirety by providing a suitable matching framework able to consider all of a profile's attributes; the framework allows users to give more importance to some attributes and to assign each attribute a different similarity measure.
Abstract: Inter-social-network operations and functionalities are required in several scenarios (data integration, data enrichment, information retrieval, etc.). To achieve this, matching user profiles is required. Current methods are too restrictive and do not consider all the related problems. In particular, they assume that two profiles describe the same physical person only if the values of their Inverse Functional Property, or IFP (e.g., the email address, homepage, etc.), are the same. However, the observed trend in social networks is not fully compatible with this assumption, since users tend to create more than one social network account (for personal use, for work, etc.) while using the same or different email addresses. In this work, we address the problem of matching user profiles in its entirety by providing a suitable matching framework able to consider all of a profile's attributes. Our framework allows users to give more importance to some attributes and to assign each attribute a different similarity measure. The set of experiments conducted with our default/recommended attribute and similarity measures shows the superiority of our proposal in comparison with current ones.
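
A minimal Python sketch of the weighted multi-attribute idea; the per-attribute weights and similarity measures below are placeholders, since the framework lets users choose both:

```python
from difflib import SequenceMatcher

def string_sim(x, y):
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def exact_sim(x, y):
    return 1.0 if x.lower() == y.lower() else 0.0

# Per-attribute (weight, similarity function) -- illustrative choices.
SCHEME = {
    "name":  (0.5, string_sim),
    "email": (0.3, exact_sim),
    "city":  (0.2, string_sim),
}

def profile_similarity(p1, p2):
    """Weighted aggregation of per-attribute similarities, so a match does
    not hinge on a single inverse functional property such as the email."""
    total = sum(w for w, _ in SCHEME.values())
    return sum(w * f(p1[k], p2[k]) for k, (w, f) in SCHEME.items()) / total

a = {"name": "Jane A. Doe", "email": "jane@work.example", "city": "Lyon"}
b = {"name": "Jane Doe",    "email": "jane@home.example", "city": "Lyon"}
print(f"{profile_similarity(a, b):.2f}")   # substantial despite different emails
```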

Proceedings ArticleDOI
22 Mar 2010
TL;DR: Experiments conducted on the real-world Census-income dataset show that, although the proposed methods provide strong privacy, their effectiveness in reducing matching cost is not far from that of k-anonymity based counterparts.
Abstract: Private matching between datasets owned by distinct parties is a challenging problem with several applications. Private matching allows two parties to identify the records that are close to each other according to some distance functions, such that no additional information other than the join result is disclosed to any party. Private matching can be solved securely and accurately using secure multi-party computation (SMC) techniques, but such an approach is prohibitively expensive in practice. Previous work proposed the release of sanitized versions of the sensitive datasets which allows blocking, i.e., filtering out sub-sets of records that cannot be part of the join result. This way, SMC is applied only to a small fraction of record pairs, reducing the matching cost to acceptable levels. The blocking step is essential for the privacy, accuracy and efficiency of matching. However, the state-of-the-art focuses on sanitization based on k-anonymity, which does not provide sufficient privacy. We propose an alternative design centered on differential privacy, a novel paradigm that provides strong privacy guarantees. The realization of the new model presents difficult challenges, such as the evaluation of distance-based matching conditions with the help of only a statistical queries interface. Specialized versions of data indexing structures (e.g., kd-trees) also need to be devised, in order to comply with differential privacy. Experiments conducted on the real-world Census-income dataset show that, although our methods provide strong privacy, their effectiveness in reducing matching cost is not far from that of k-anonymity based counterparts.
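
A minimal Python sketch of one differentially private building block such a design needs, releasing per-block record counts via the Laplace mechanism; the blocks and counts are made up:

```python
import numpy as np

rng = np.random.default_rng(7)

def laplace_counts(true_counts, epsilon):
    """Release per-block record counts under epsilon-differential privacy.
    Each record affects one block count by at most 1 (sensitivity 1), so
    Laplace noise with scale 1/epsilon suffices."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(true_counts))
    return np.maximum(np.rint(true_counts + noise), 0).astype(int)

# Records partitioned into blocks (e.g., leaves of a kd-tree on income).
true_counts = np.array([120, 4, 310, 57])
for eps in (0.1, 1.0):
    print(f"epsilon={eps}: {laplace_counts(true_counts, eps)}")
```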

Journal ArticleDOI
TL;DR: In this article, the authors analyze the contributions of before-fee performance and fees to SRI funds' performance, and investigate the role played by fund management companies in the determination of those variables.
Abstract: In this article, we shed light on the debate about the financial performance of socially responsible investment (SRI) mutual funds by separately analyzing the contributions of before-fee performance and fees to SRI funds’ performance, and by investigating the role played by fund management companies in the determination of those variables. We apply the matching estimator methodology to obtain our results and find that in the period 1997–2005, US SRI funds had better before- and after-fee performance than conventional funds with similar characteristics. The differences, however, are driven exclusively by SRI funds run by management companies specialized in SRI. While these funds significantly outperform similar conventional funds, funds run by companies not specialized in SRI underperform their matched conventional funds. We find no significant differences in fees between SRI and conventional funds except in one case: SRI funds are cheaper than conventional funds run by the same management company.

Journal ArticleDOI
TL;DR: Generally, the Yeshasvini community-based health insurance programme is found to have increased utilisation of health-care services, reduced out-of-pocket spending, and ensured better health and economic outcomes, but these effects vary across socio-economic groups and medical episodes.
Abstract: Using propensity score matching techniques, the study evaluates the impact of India's Yeshasvini community-based health insurance programme on health-care utilisation, financial protection, treatment outcomes and economic well-being. The programme offers free out-patient diagnosis and lab tests at discounted rates when ill, but, more importantly, it covers highly catastrophic and less discretionary in-patient surgical procedures. For its impact evaluation, 4109 randomly selected households in villages in rural Karnataka, an Indian state, were interviewed using a structured questionnaire. A comprehensive set of indicators was developed and the quality of matching was tested. Generally, the programme is found to have increased utilisation of health-care services, reduced out-of-pocket spending, and ensured better health and economic outcomes. More specifically, however, these effects vary across socio-economic groups and medical episodes. The programme operates by bringing the direct price of health-care down but the extent to which this effectively occurs across medical episodes is an empirical issue. Further, the effects are more pronounced for the better-off households. The article demonstrates that community insurance presents a workable model for providing high-end services in resource-poor settings through an emphasis on accountability and local management.

Posted Content
TL;DR: In this article, the authors present evidence from a natural field experiment designed to shed light on the efficacy of fundraising schemes in which donations are matched by a lead donor, and find that straight linear matching schemes raise the total donations received including the match value, but partially crowd out the actual donations given excluding the match.
Abstract: We present evidence from a natural field experiment designed to shed light on the efficacy of fundraising schemes in which donations are matched by a lead donor. In conjunction with the Bavarian State Opera House, we mailed 14,000 regular opera attendees a letter describing a charitable fundraising project organized by the opera house. Recipients were randomly assigned to treatments designed to explore behavioral responses to linear matching schemes, as well as the mere existence of a substantial lead donor. We use the exogenous variation in match rates across treatments to estimate the price elasticities of charitable giving. We find that straight linear matching schemes raise the total donations received including the match value, but partially crowd out the actual donations given excluding the match. If charitable organizations can use lead gifts as they wish, our results show they maximize donations given by simply announcing the presence of a lead gift. We contrast our price elasticity estimates with those based on changes in rules regarding tax deductions for charitable giving, as well as from the nascent literature using large-scale natural field experiments on giving.

Journal ArticleDOI
TL;DR: A methodology for ranking the relevant services for a given request is proposed, introducing objective measures based on dominance relationships defined among the services; methods for clustering the relevant services in a way that reveals and reflects the different trade-offs between the matched parameters are also investigated.
Abstract: As the web is increasingly used not only to find answers to specific information needs but also to carry out various tasks, enhancing the capabilities of current web search engines with effective and efficient techniques for web service retrieval and selection becomes an important issue. Existing service matchmakers typically determine the relevance between a web service advertisement and a service request by computing an overall score that aggregates individual matching scores among the various parameters in their descriptions. Two main drawbacks characterize such approaches. First, there is no single matching criterion that is optimal for determining the similarity between parameters. Instead, there are numerous approaches ranging from Information Retrieval similarity measures up to semantic logic-based inference rules. Second, the reduction of individual scores to an overall similarity leads to significant information loss. Determining appropriate weights for these intermediate scores requires knowledge of user preferences, which is often not possible or easy to acquire. Instead, using a typical aggregation function, such as the average or the minimum of the degrees of match across the service parameters, introduces undesired bias, which often reduces the accuracy of the retrieval process. Consequently, several services, e.g., those having a single unmatched parameter, may be excluded from the result set, while being potentially good candidates. In this work, we present two complementary approaches that overcome the aforementioned deficiencies. First, we propose a methodology for ranking the relevant services for a given request, introducing objective measures based on dominance relationships defined among the services. Second, we investigate methods for clustering the relevant services in a way that reveals and reflects the different trade-offs between the matched parameters. We demonstrate the effectiveness and the efficiency of our proposed techniques and algorithms through extensive experimental evaluation on both real requests and relevance sets, as well as on synthetic scenarios.
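
A minimal Python sketch of ranking by dominance relationships over per-parameter matching scores, avoiding any weighted aggregation; the services and scores are made up:

```python
# Each service has one matching score per requested parameter (higher is
# better); the values below are purely illustrative.
services = {
    "s1": (0.9, 0.4, 0.8),
    "s2": (0.7, 0.7, 0.7),
    "s3": (0.9, 0.5, 0.8),
    "s4": (0.3, 0.9, 0.2),
}

def dominates(a, b):
    """a dominates b if it is at least as good on every parameter and
    strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def dominance_score(name):
    """Rank services by how many others they dominate, minus how many
    dominate them -- no weights over parameters are needed."""
    mine = services[name]
    others = [v for k, v in services.items() if k != name]
    return (sum(dominates(mine, o) for o in others)
            - sum(dominates(o, mine) for o in others))

for name in sorted(services, key=dominance_score, reverse=True):
    print(name, dominance_score(name))
```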