Showing papers on "Matching (statistics)" published in 2005


Journal ArticleDOI
TL;DR: In this article, the authors examine the specification and power of tests based on performance-matched discretionary accruals, and make comparisons with tests using traditional discretionary accrual measures (e.g., Jones and modified-Jones models).

4,247 citations


Journal ArticleDOI
TL;DR: The authors applied cross-sectional and longitudinal propensity score matching estimators to data from the National Supported Work (NSW) Demonstration that have been previously analyzed by LaLonde (1986) and Dehejia and Wahba (1999, 2002).

2,380 citations


Journal ArticleDOI
TL;DR: This article found evidence consistent with small banks being better able to collect and act on soft information than large banks, and that large banks are less willing to lend to informationally "difficult" credits, such as firms with no financial records.

1,407 citations


Journal ArticleDOI
TL;DR: In this article, a review of non-experimental methods for the evaluation of social programmes is presented, where matching and selection methods are compared for cross-section, repeated cross-section and longitudinal data.
Abstract: This paper presents a review of non-experimental methods for the evaluation of social programmes. We consider matching and selection methods and analyse each for cross-section, repeated cross-section and longitudinal data. The methods are assessed drawing on evidence from labour market programmes in the UK and in the US.

1,057 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a model of matching with contracts which incorporates, as special cases, the college admissions problem, the Kelso-Crawford labor market matching model, and ascending package auctions.
Abstract: We develop a model of matching with contracts which incorporates, as special cases, the college admissions problem, the Kelso-Crawford labor market matching model, and ascending package auctions. We introduce a new "law of aggregate demand" for the case of discrete heterogeneous workers and show that, when workers are substitutes, this law is satisfied by profit-maximizing firms. When workers are substitutes and the law is satisfied, truthful reporting is a dominant strategy for workers in a worker-offering auction/matching algorithm. We also parameterize a large class of preferences satisfying the two conditions.

792 citations


Proceedings Article
30 Aug 2005
TL;DR: This work presents three algorithms that consider especially the trajectory nature of the data rather than simply the current position as in the typical map-matching case, and proposes an incremental algorithm that matches consecutive portions of the trajectory to the road network.
Abstract: Vehicle tracking data is an essential "raw" material for a broad range of applications such as traffic management and control, routing, and navigation. An important issue with this data is its accuracy. The method of sampling vehicular movement using GPS is affected by two error sources and consequently produces inaccurate trajectory data. To become useful, the data has to be related to the underlying road network by means of map matching algorithms. We present three such algorithms that consider especially the trajectory nature of the data rather than simply the current position as in the typical map-matching case. An incremental algorithm is proposed that matches consecutive portions of the trajectory to the road network, effectively trading accuracy for speed of computation. In contrast, the two global algorithms compare the entire trajectory to candidate paths in the road network. The algorithms are evaluated in terms of (i) their running time and (ii) the quality of their matching result. Two novel quality measures utilizing the Fréchet distance are introduced and subsequently used in an experimental evaluation to assess the quality of matching real tracking data to a road network.

633 citations
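The Fréchet-based quality measures mentioned above build on distances between a trajectory and a matched path. A common computable variant is the discrete Fréchet distance between two polylines; the dynamic-programming sketch below is a standard formulation, not the authors' exact measure:

```python
from functools import lru_cache

def discrete_frechet(p, q):
    """Discrete Frechet distance between polylines p and q.

    Each polyline is a list of (x, y) points; point-to-point distance
    is Euclidean. Intuitively: the shortest leash that lets two walkers
    traverse p and q in order, stepping vertex by vertex.
    """
    def d(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 and j == 0:
            return d(p[0], q[0])
        if i == 0:
            return max(c(0, j - 1), d(p[0], q[j]))
        if j == 0:
            return max(c(i - 1, 0), d(p[i], q[0]))
        # leash length: best predecessor coupling vs. current pair
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)),
                   d(p[i], q[j]))

    return c(len(p) - 1, len(q) - 1)
```

For two parallel horizontal segments one unit apart, the distance is exactly 1, regardless of how many vertices each has.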


Journal ArticleDOI
TL;DR: Benefits of the model include its potential to facilitate improved understanding of similarities and differences among treatments, to guide treatment selection and matching to clients, to address gaps in the literature, and to point to possibilities for new interventions based on the current research base.
Abstract: A model is proposed whereby the intervention literature can be empirically factored or distilled to derive profiles from evidence-based approaches. The profiles can then be matched to individual clients based on consideration of their target problems, as well as demographic and contextual factors. Application of the model is illustrated by an analysis of the youth treatment literature. Benefits of the model include its potential to facilitate improved understanding of similarities and differences among treatments, to guide treatment selection and matching to clients, to address gaps in the literature, and to point to possibilities for new interventions based on the current research base.

589 citations


Journal ArticleDOI
TL;DR: This surface matching technique is a generalization of the least squares image matching concept and offers high flexibility for any kind of 3D surface correspondence problem, as well as statistical tools for the analysis of the quality of final matching results.
Abstract: The automatic co-registration of point clouds, representing 3D surfaces, is a relevant problem in 3D modeling. This multiple registration problem can be defined as a surface matching task. We treat it as least squares matching of overlapping surfaces. The surface may have been digitized/sampled point by point using a laser scanner device, a photogrammetric method or other surface measurement techniques. Our proposed method estimates the transformation parameters of one or more 3D search surfaces with respect to a 3D template surface, using the Generalized Gauss–Markoff model, minimizing the sum of squares of the Euclidean distances between the surfaces. This formulation gives the opportunity of matching arbitrarily oriented 3D surface patches. It fully considers 3D geometry. Besides the mathematical model and execution aspects we address the further extensions of the basic model. We also show how this method can be used for curve matching in 3D space and matching of curves to surfaces. Some practical examples based on the registration of close-range laser scanner and photogrammetric point clouds are presented for the demonstration of the method. This surface matching technique is a generalization of the least squares image matching concept and offers high flexibility for any kind of 3D surface correspondence problem, as well as statistical tools for the analysis of the quality of final matching results.

569 citations


Journal ArticleDOI
TL;DR: This paper explores the impact of random device mismatch on the performance of general analog circuits and results in a fixed bandwidth-accuracy-power tradeoff which is independent of bias point for bipolar circuits whereas for MOS circuits some bias point optimizations are possible.
Abstract: Random device mismatch plays an important role in the design of accurate analog circuits. Models for the matching of MOS and bipolar devices from open literature show that matching improves with increasing device area. As a result, accuracy requirements impose a minimal device area and this paper explores the impact of this constraint on the performance of general analog circuits. It results in a fixed bandwidth-accuracy-power tradeoff which is set by technology constants. This tradeoff is independent of bias point for bipolar circuits whereas for MOS circuits some bias point optimizations are possible. The performance limitations imposed by matching are compared to the limits imposed by thermal noise. For MOS circuits the power constraints due to matching are several orders of magnitude higher than for thermal noise. For the bipolar case the constraints due to noise and matching are of comparable order of magnitude. The impact of technology scaling on the conclusions of this work is briefly explored.

473 citations


Proceedings ArticleDOI
05 Apr 2005
TL;DR: In this article, a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so they can be matched better, and they show experimental results that demonstrate corpus-based matching outperforms direct matching in multiple domains.
Abstract: Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have proposed a principled combination of multiple algorithms. However, these solutions sometimes perform rather poorly due to the lack of sufficient evidence in the schemas being matched. In this paper we show how a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so they can be matched better. Such a corpus typically contains multiple schemas that model similar concepts and hence enables us to learn variations in the elements and their properties. We exploit such a corpus in two ways. First, we increase the evidence about each element being matched by including evidence from similar elements in the corpus. Second, we learn statistics about elements and their relationships and use them to infer constraints that we use to prune candidate mappings. We also describe how to use known mappings to learn the importance of domain and generic constraints. We present experimental results that demonstrate corpus-based matching outperforms direct matching (without the benefit of a corpus) in multiple domains.

400 citations


Journal ArticleDOI
TL;DR: In this article, the effect of education on individual earnings is reviewed for single treatments and sequential multiple treatments with and without heterogeneous returns, and the sensitivity of the estimates once applied to a common data set is explored.
Abstract: Regression, matching, control function and instrumental variables methods for recovering the effect of education on individual earnings are reviewed for single treatments and sequential multiple treatments with and without heterogeneous returns. The sensitivity of the estimates once applied to a common data set is then explored. We show the importance of correcting for detailed test score and family background differences and of allowing for (observable) heterogeneity in returns. We find an average return of 27% for those completing higher education versus anything less. Compared with stopping at 16 years of age without qualifications, we find an average return to O-levels of 18%, to A-levels of 24% and to higher education of 48%.

Journal ArticleDOI
TL;DR: In this article, the authors discuss propensity score matching in the context of Smith and Todd's "Does matching overcome LaLonde's critique of nonexperimental estimators?" (J. Am. Statist., 2002, forthcoming).

Book
23 Jun 2005
TL;DR: In this book, the author presents the basics of treatment effect analysis, covering control for covariates, matching, design and instruments for hidden bias, other approaches to hidden bias, and multiple and dynamic treatments, opening with a tour of the book.
Abstract: 1. Tour of the book; 2. Basics of treatment effect analysis; 3. Controlling for covariates; 4. Matching; 5. Design and instrument for hidden bias; 6. Other approaches to hidden bias; 7. Multiple and dynamic treatments; Appendix; References.

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the ability of formal rules to establish U.S. business cycle turning point dates in real time using a new dataset of coincident monthly variables.
Abstract: We evaluate the ability of formal rules to establish U.S. business cycle turning point dates in real time. We consider two approaches, a nonparametric algorithm and a parametric Markov-switching dynamic-factor model. Using a new “real-time” dataset of coincident monthly variables, we find that both approaches would have accurately identified the NBER business cycle chronology had they been in use over the past 30 years, with the Markov-switching model most closely matching the NBER dates. Further, both approaches, and particularly the Markov-switching model, yielded significant improvement over the NBER in the speed with which business cycle troughs were identified.
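The nonparametric side of the comparison above dates turning points with mechanical rules on the data. A toy version of such a rule (in the spirit of Bry-Boschan style algorithms, not the authors' actual procedure) flags a month as a peak or trough when it is the extremum of a local window:

```python
def turning_points(series, window=2):
    """Flag local peaks and troughs in a monthly series.

    Month t is called a peak (trough) if it is the strict maximum
    (minimum) within +/- `window` observations. Real dating rules add
    censoring rules for phase length and alternation; this sketch
    only shows the local-extremum core.
    """
    peaks, troughs = [], []
    n = len(series)
    for t in range(window, n - window):
        neighborhood = series[t - window:t + window + 1]
        center = series[t]
        rest = neighborhood[:window] + neighborhood[window + 1:]
        if all(center > v for v in rest):
            peaks.append(t)
        elif all(center < v for v in rest):
            troughs.append(t)
    return peaks, troughs
```

On a rise-fall-rise series the rule recovers the single peak and the single trough between the phases.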

Book ChapterDOI
TL;DR: Two new quality indices for fingerprint images are developed, and it is shown that by applying a quality-based weighting scheme in the matching algorithm, the overall matching performance can be improved; a decrease of 1.94% in EER is observed on the FVC2002 DB3 database.
Abstract: The performance of an automatic fingerprint authentication system relies heavily on the quality of the captured fingerprint images. In this paper, two new quality indices for fingerprint images are developed. The first index measures the energy concentration in the frequency domain as a global feature. The second index measures the spatial coherence in local regions. We present a novel framework for evaluating and comparing quality indices in terms of their capability of predicting the system performance at three different stages, namely, image enhancement, feature extraction and matching. Experimental results on the IBM-HURSLEY and FVC2002 DB3 databases demonstrate that the global index is better than the local index in the enhancement stage (correlation of 0.70 vs. 0.50) and comparative in the feature extraction stage (correlation of 0.70 vs. 0.71). Both quality indices are effective in predicting the matching performance, and by applying a quality-based weighting scheme in the matching algorithm, the overall matching performance can be improved; a decrease of 1.94% in EER is observed on the FVC2002 DB3 database.
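The global index above scores how concentrated the image's energy is in the frequency domain. As a hypothetical stand-in for such an index (not the paper's exact definition), one can take one minus the normalized entropy of a power spectrum, so that a flat spectrum scores 0 and a single dominant band scores 1:

```python
import math

def energy_concentration(power_spectrum):
    """Score in [0, 1]: 1 = all energy in one band, 0 = flat spectrum.

    `power_spectrum` is a list of nonnegative band energies (at least
    two bands). Illustrative only; the paper's global index is defined
    differently but captures the same intuition.
    """
    total = sum(power_spectrum)
    probs = [p / total for p in power_spectrum if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(power_spectrum))
    return 1.0 - entropy / max_entropy
```

A good-quality fingerprint with strong ridge periodicity would concentrate energy in a narrow frequency band and score high; noise spreads energy and pushes the score toward zero.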

Journal ArticleDOI
TL;DR: The authors found that matching messages to recipients' self-schemata leads to increased or decreased persuasion depending on the advertisement's argument quality; this interaction, together with participants' pattern of cognitive responses, suggests an elaboration-based account.
Abstract: Research indicates that messages or products matching individuals' self-schemata are viewed more favorably, but little is known about how or when such effects occur. Experiment 1 indicates that messages matched to participants' level of extroversion lead to larger argument quality effects on attitudes than do mismatched messages. In experiment 2, these effects are replicated with the self-schema of need for cognition. Across studies, matching messages to recipients' self-schemata leads to increased or decreased persuasion, depending on the advertisement's argument quality. The interaction of self-schema matching with argument quality along with participants' pattern of cognitive responses suggests an elaboration-based account.

Proceedings ArticleDOI
Hang Cui1, Renxu Sun1, Keya Li1, Min-Yen Kan1, Tat-Seng Chua1 
15 Aug 2005
TL;DR: This work presents two methods for learning relation mapping scores from past QA pairs: one based on mutual information and the other on expectation maximization, which significantly outperforms state-of-the-art density-based passage retrieval methods.
Abstract: State-of-the-art question answering (QA) systems employ term-density ranking to retrieve answer passages. Such methods often retrieve incorrect passages as relationships among question terms are not considered. Previous studies attempted to address this problem by matching dependency relations between questions and answers. They used strict matching, which fails when semantically equivalent relationships are phrased differently. We propose fuzzy relation matching based on statistical models. We present two methods for learning relation mapping scores from past QA pairs: one based on mutual information and the other on expectation maximization. Experimental results show that our method significantly outperforms state-of-the-art density-based passage retrieval methods by up to 78% in mean reciprocal rank. Relation matching also brings about a 50% improvement in a system enhanced by query expansion.
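The mutual-information variant above learns, from past QA pairs, how strongly a question-side dependency relation maps to an answer-side one. A minimal sketch (pointwise mutual information from co-occurrence counts; the paper's full model is richer):

```python
import math
from collections import Counter

def relation_mapping_scores(aligned_pairs):
    """Pointwise mutual information between question-side and
    answer-side dependency relations.

    `aligned_pairs` is a list of (question_relation, answer_relation)
    tuples harvested from known question/answer matches. Higher score
    means the pairing occurs more often than chance.
    """
    n = len(aligned_pairs)
    joint = Counter(aligned_pairs)
    q_counts = Counter(q for q, _ in aligned_pairs)
    a_counts = Counter(a for _, a in aligned_pairs)
    return {
        (q, a): math.log(c * n / (q_counts[q] * a_counts[a]))
        for (q, a), c in joint.items()
    }
```

A relation pair seen exclusively together gets a higher score than one that also co-occurs with other relations, which is exactly the "fuzzy" preference the retrieval step needs.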

Journal ArticleDOI
TL;DR: This work presents an approach that uses localized secondary features derived from relative minutiae information; the method is directly applicable to existing databases and balances the tradeoff between maximizing the number of matches and minimizing the total feature distance between query and reference fingerprints.

01 Mar 2005
TL;DR: A joint study committee of the Transportation Research Board and the Institute of Medicine has recommended research strategies to gain practical guidance on cost-beneficial investments and changes in the built environment that would encourage increased levels of physical activity as discussed by the authors.
Abstract: Empirical evidence links the built environment and physical activity, but few studies have proved capable of demonstrating a causal relationship. A joint study committee of the Transportation Research Board and the Institute of Medicine has recommended research strategies to gain practical guidance on cost-beneficial investments and changes in the built environment that would encourage increased levels of physical activity. This article summarizes the major findings of the study committee, as well as the committee's recommendations, which are published in TRB Special Report 282. The committee urges a continuing and well-supported research effort. Priorities for research include interdisciplinary approaches and international collaboration; more complete conceptual models; better research designs; and more detailed examination and matching of specific characteristics of the built environment with different types of physical activity. Other recommendations call for expanding national public health and travel surveys to provide more detailed information about the location of physical activity and travel; evaluating changes to the built environment as natural experiments to be studied for their impacts on physical activity; and emphasizing interdisciplinary education programs at universities.

Journal ArticleDOI
TL;DR: This article is a simple introduction to regression-based methods for dealing with confounding in epidemiology, with the emphasis on showing how they work, their assumptions, and how they compare with other methods.
Abstract: Confounding is a major concern in causal studies because it results in biased estimation of exposure effects. In the extreme, this can mean that a causal effect is suggested where none exists, or that a true effect is hidden. Typically, confounding occurs when there are differences between the exposed and unexposed groups in respect of independent risk factors for the disease of interest, for example, age or smoking habit; these independent factors are called confounders. Confounding can be reduced by matching in the study design but this can be difficult and/or wasteful of resources. Another possible approach—assuming data on the confounder(s) have been gathered—is to apply a statistical “correction” method during analysis. Such methods produce “adjusted” or “corrected” estimates of the effect of exposure; in theory, these estimates are no longer biased by the erstwhile confounders. Given the importance of confounding in epidemiology, statistical methods said to remove it deserve scrutiny. Many such methods involve strong assumptions about data relationships and their validity may depend on whether these assumptions are justified. Historically, the most common statistical approach for dealing with confounding in epidemiology was based on stratification; the standardised mortality ratio is a well known statistic using this method to remove confounding by age. Increasingly, this approach is being replaced by methods based on regression models. This article is a simple introduction to the latter methods with the emphasis on showing how they work, their assumptions, and how they compare with other methods. Before applying a statistical correction method, one has to decide which factors are confounders. This sometimes complex issue is not discussed in detail and for the most part the examples will assume that age is a confounder. However, the use of automated statistical procedures for choosing variables to include in a regression model …
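The standardised mortality ratio mentioned above is the classic stratification-based correction: observed deaths divided by the deaths expected if reference age-specific rates applied to the study group. A minimal illustration (the numbers in the test are invented):

```python
def standardized_mortality_ratio(observed_deaths, person_years_by_age,
                                 reference_rates_by_age):
    """Indirect standardization, removing confounding by age.

    `person_years_by_age` maps each age stratum to the study group's
    person-years; `reference_rates_by_age` maps the same strata to
    reference (e.g. national) death rates per person-year. An SMR
    above 1 indicates more deaths than expected given the group's
    age structure.
    """
    expected = sum(person_years_by_age[age] * reference_rates_by_age[age]
                   for age in person_years_by_age)
    return observed_deaths / expected
```

Because the expected count is built stratum by stratum, an older study population is automatically credited with a higher expected death count, so age no longer masquerades as an exposure effect.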

Journal ArticleDOI
TL;DR: A novel sequence matching technique to detect copies of a video clip that is robust to the many digitization and encoding processes that give rise to several distortions, including changes in brightness, color, frame format, as well as different blocky artifacts.
Abstract: This paper proposes a novel sequence matching technique to detect copies of a video clip. If a video copy detection technique is to be effective, it needs to be robust to the many digitization and encoding processes that give rise to several distortions, including changes in brightness, color, frame format, as well as different blocky artifacts. Most of the video copy detection algorithms proposed so far focus mostly on coping with signal distortions introduced by different encoding parameters; however, these algorithms do not cope well with display format conversions. We propose a copy-detection scheme that is robust to the above-mentioned distortions and is also robust to display format conversions. To this end, each image frame is partitioned into 2 × 2 by intensity averaging, and the partitioned values are stored for indexing and matching. Our spatiotemporal approach combines spatial matching of ordinal signatures obtained from the partitions of each frame and temporal matching of temporal signatures from the temporal trails of the partitions. The proposed method has been extensively tested and the results show the proposed scheme is effective in detecting copies which have been subjected to a wide range of modifications.
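The ordinal signature idea above can be sketched briefly: average each 2×2 quadrant of a frame, then keep only the rank order of the four averages. Ranks survive brightness and contrast shifts, which is what makes the signature robust. A simplified sketch (not the paper's full spatiotemporal scheme):

```python
def ordinal_signature(frame):
    """Rank order of the four quadrant mean intensities of a frame.

    `frame` is a 2D list of grayscale values with even dimensions.
    """
    h, w = len(frame), len(frame[0])

    def block_mean(r0, r1, c0, c1):
        vals = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        return sum(vals) / len(vals)

    means = [block_mean(0, h // 2, 0, w // 2),   # top-left
             block_mean(0, h // 2, w // 2, w),   # top-right
             block_mean(h // 2, h, 0, w // 2),   # bottom-left
             block_mean(h // 2, h, w // 2, w)]   # bottom-right
    order = sorted(range(4), key=lambda i: means[i])
    ranks = [0] * 4
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

def signature_distance(sig_a, sig_b):
    """L1 distance between two ordinal signatures; summed over frames
    this gives a simple spatial matching score for two clips."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))
```

A uniformly brightened copy of a frame yields the identical signature, so its distance to the original is zero.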

Journal ArticleDOI
TL;DR: In this paper, the authors combine the microeconomic-labor and macroeconomic-equilibrium views of matching in labor markets, and obtain two new equilibrium implications of job matching and search frictions for wage inequality.
Abstract: This paper brings together the microeconomic-labor and the macroeconomic-equilibrium views of matching in labor markets. We nest a job matching model à la Jovanovic (1984) into a Mortensen and Pissarides (1994)-type equilibrium search environment. The resulting framework preserves the implications of job matching theory for worker turnover and wage dynamics, and it also allows for aggregation and general equilibrium analysis. We obtain two new equilibrium implications of job matching and search frictions for wage inequality. First, learning about match quality and worker turnover map Gaussian output noise into an ergodic wage distribution of empirically accurate shape: unimodal, skewed, with a Paretian right tail. Second, high idiosyncratic productivity risk hinders learning and sorting, and reduces wage inequality. The equilibrium solutions for the wage distribution and for the aggregate worker flows – quits to unemployment and to other jobs, displacements, hires – provide the likelihood function of the model in closed form.

Proceedings ArticleDOI
05 Apr 2005
TL;DR: An algorithm is described that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names, identifying corresponding attributes by comparing data values within those duplicate records.
Abstract: Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names. Discovering duplicates among data sets with unaligned schemas is more difficult than in the usual setting, because it is not clear which fields in one object should be compared with which fields in the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Now, our schema matching algorithm is able to identify corresponding attributes by comparing data values within those duplicate records. An experimental study on real-world data shows the effectiveness of this approach.
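The core idea above — use duplicate records to vote for column correspondences even when column names are opaque — can be sketched crudely: treat a row pair as a duplicate if it shares at least two values regardless of column, then pair columns by how often their values agree within those duplicates. This is an illustrative toy, not the authors' algorithm:

```python
from collections import Counter

def match_columns(table_a, table_b):
    """Match opaque column names across two tables via duplicates.

    Tables are lists of dicts (column name -> value). Returns a
    one-to-one mapping from columns of `table_a` to columns of
    `table_b`, chosen greedily by agreement votes.
    """
    votes = Counter()
    for row_a in table_a:
        for row_b in table_b:
            # crude duplicate test: at least two shared values
            if len(set(row_a.values()) & set(row_b.values())) < 2:
                continue
            for col_a, val_a in row_a.items():
                for col_b, val_b in row_b.items():
                    if val_a == val_b:
                        votes[(col_a, col_b)] += 1
    matches = {}
    for (col_a, col_b), _ in votes.most_common():
        if col_a not in matches and col_b not in matches.values():
            matches[col_a] = col_b
    return matches
```

With one duplicate person appearing in both tables, the columns holding the name and the city line up even though neither column name reveals its meaning.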

Journal ArticleDOI
Bo Lu1
TL;DR: A time-dependent propensity score based on the Cox proportional hazards model is proposed and used in risk set matching; matching on this propensity score is shown to achieve a balanced distribution of the covariates in both treated and control groups.
Abstract: In observational studies with a time-dependent treatment and time-dependent covariates, it is desirable to balance the distribution of the covariates at every time point. A time-dependent propensity score based on the Cox proportional hazards model is proposed and used in risk set matching. Matching on this propensity score is shown to achieve a balanced distribution of the covariates in both treated and control groups. Optimal matching with various designs is conducted and compared in a study of a surgical treatment, cystoscopy and hydrodistention, given in response to a chronic bladder disease, interstitial cystitis. Simulation studies also suggest that the statistical analysis after matching outperforms the analysis without matching in terms of both point and interval estimations.

Proceedings Article
01 Jan 2005
TL;DR: An efficient method for audio matching which performs effectively for a wide range of classical music is described, and a new type of chroma-based audio feature is introduced that strongly correlates to the harmonic progression of the audio signal.
Abstract: In this paper, we describe an efficient method for audio matching which performs effectively for a wide range of classical music. The basic goal of audio matching can be described as follows: consider an audio database containing several CD recordings for one and the same piece of music interpreted by various musicians. Then, given a short query audio clip of one interpretation, the goal is to automatically retrieve the corresponding excerpts from the other interpretations. To solve this problem, we introduce a new type of chroma-based audio feature that strongly correlates to the harmonic progression of the audio signal. Our feature shows a high degree of robustness to variations in parameters such as dynamics, timbre, articulation, and local tempo deviations. As another contribution, we describe a robust matching procedure which can handle global tempo variations. Finally, we give a detailed account of our experiments, which have been carried out on a database of more than 110 hours of audio comprising a wide range of classical music.
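A chroma feature of the kind described above folds spectral energy onto the 12 pitch classes, discarding octave information. A minimal sketch from a list of spectral peaks (the paper's feature adds further smoothing and normalization steps):

```python
import math

def chroma_from_peaks(peaks):
    """12-dimensional chroma vector from (frequency_hz, energy) pairs.

    Each peak is folded onto one of the 12 pitch classes, with
    A = 440 Hz as the reference for class 0, and energies are
    accumulated; the vector is then normalized to sum to 1.
    """
    chroma = [0.0] * 12
    for freq, energy in peaks:
        # 12 semitone steps per octave, modulo 12 folds octaves together
        pitch_class = round(12 * math.log2(freq / 440.0)) % 12
        chroma[pitch_class] += energy
    total = sum(chroma)
    return [c / total for c in chroma]
```

Two recordings of the same passage played an octave apart produce the same chroma vector, which is why the feature tracks harmonic progression rather than timbre or register.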

Proceedings ArticleDOI
06 Oct 2005
TL;DR: This work presents a discriminative, large-margin approach to feature-based matching for word alignment, which achieves AER performance close to IBM Model 4, in much less time.
Abstract: We present a discriminative, large-margin approach to feature-based matching for word alignment. In this framework, pairs of word tokens receive a matching score, which is based on features of that pair, including measures of association between the words, distortion between their positions, similarity of the orthographic form, and so on. Even with only 100 labeled training examples and simple features which incorporate counts from a large unlabeled corpus, we achieve AER performance close to IBM Model 4, in much less time. Including Model 4 predictions as features, we achieve a relative AER reduction of 22% over intersected Model 4 alignments.

PatentDOI
TL;DR: In this paper, a method and system used to determine the similarity between input speech data and sample speech data is provided, where the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames.
Abstract: A method and system used to determine the similarity between an input speech data and a sample speech data is provided. First, the input speech data is segmented into a plurality of input speech frames and the sample speech data is segmented into a plurality of sample speech frames. Then, the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames. Next, the distance values are used to calculate a matching score. Finally, the similarity between the input speech data and the sample speech data is determined according to this matching score.
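The matrix-then-score scheme outlined above is generic: build pairwise frame distances, then reduce the matrix to one number. A common reduction is a dynamic-time-warping pass over the matrix; the sketch below is one plausible reading of such a scheme, not the patent's specific formula:

```python
def matching_score(input_frames, sample_frames):
    """Similarity between two utterances from their frame sequences.

    Each frame is a feature vector (list of floats). A matching matrix
    of pairwise Euclidean distances is built, then a dynamic-time-
    warping pass turns it into a single alignment cost; smaller
    means more similar.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(input_frames), len(sample_frames)
    matrix = [[dist(f, g) for g in sample_frames] for f in input_frames]

    INF = float("inf")
    cost = [[INF] * m for _ in range(n)]
    cost[0][0] = matrix[0][0]
    for i in range(n):
        for j in range(m):
            if i == j == 0:
                continue
            # cheapest way to reach (i, j): step down, right, or diagonal
            best = min(cost[i - 1][j] if i else INF,
                       cost[i][j - 1] if j else INF,
                       cost[i - 1][j - 1] if i and j else INF)
            cost[i][j] = matrix[i][j] + best
    return cost[n - 1][m - 1]
```

Identical frame sequences score exactly 0; the warping path lets the same score stay low when one utterance is locally slower than the other.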

Proceedings ArticleDOI
20 Jun 2005
TL;DR: By employing a two-pass dynamic programming technique that performs optimization both along and across the scanlines, the typical inter-scanline inconsistency problem is solved and the stability and efficiency of the optimization are improved significantly.
Abstract: A method for solving the dense stereo matching problem is presented in this paper. First, a new generalized ground control points (GGCPs) scheme is introduced, where one or more disparity candidates for the true disparity of each pixel are assigned by local matching using the oriented spatial filters. By allowing "all" pixels to have multiple candidates for their true disparities, GGCPs not only guarantee to provide a sufficient number of starting pixels needed for guiding the subsequent matching process, but also remarkably reduce the risk of false match, improving the previous GCP-based approaches where the number of the selected control points tends to be inversely proportional to the reliability. Second, by employing a two-pass dynamic programming technique that performs optimization both along and across the scanlines, we solve the typical inter-scanline inconsistency problem. Moreover, combined with the GGCPs, the stability and efficiency of the optimization are improved significantly. Experimental results for the standard data sets show that the proposed algorithm achieves results comparable to the state of the art with much less computational cost.

Proceedings ArticleDOI
05 Dec 2005
TL;DR: This paper presents a novel method for 2D laser scan matching called polar scan matching (PSM), which avoids searching for point associations by simply matching points with the same bearing, and enables the construction of an algorithm faster than the iterative closest point (ICP).
Abstract: This paper presents a novel method for 2D laser scan matching called polar scan matching (PSM). The method belongs to the family of point to point matching approaches. Our method avoids searching for point associations by simply matching points with the same bearing. This association rule enables the construction of an algorithm faster than the iterative closest point (ICP). Firstly the PSM approach is tested with simulated laser scans. Then the accuracy of our matching algorithm is evaluated from real laser scans from known relative positions to establish a ground truth. Furthermore, to demonstrate the practical usability of the new PSM approach, experimental results from a Kalman filter implementation of simultaneous localization and mapping (SLAM) are provided.
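PSM's association rule — match points that share a bearing instead of searching for nearest neighbors — reduces the orientation part of scan matching to shifting one scan's bearing index. A toy sketch of just that step (the full PSM method also estimates translation and iterates):

```python
def best_rotation(scan_ref, scan_cur, resolution_deg=1.0):
    """Estimate the rotation between two 2D laser scans.

    Scans are lists of ranges taken at equally spaced bearings
    (`resolution_deg` apart, covering a full revolution). Points with
    the same bearing are associated directly, so we simply return the
    circular shift minimizing the summed absolute range difference.
    """
    n = len(scan_ref)
    best_shift, best_err = 0, float("inf")
    for shift in range(n):
        err = sum(abs(scan_ref[i] - scan_cur[(i + shift) % n])
                  for i in range(n))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift * resolution_deg
```

Because no point-to-point search is needed, each candidate orientation costs only one linear pass over the scan, which is the source of PSM's speed advantage over ICP-style association.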

Journal ArticleDOI
TL;DR: This work has shown that in the context of a rich data source, applied to the study of an outcome that occurs infrequently, there will typically be many more variables available for control as potential confounders than traditional epidemiologic techniques will allow.
Abstract: Background: Confounding by indication is a common problem in pharmacoepidemiology, where predictors of treatment also have prognostic value for the outcome of interest. The tools available to the epidemiologist that can be used to mitigate the effects of confounding by indication often have limits with respect to the number of variables that can be simultaneously incorporated as components of the confounding. This constraint becomes particularly apparent in the context of a rich data source (such as administrative claims data), applied to the study of an outcome that occurs infrequently. In such settings, there will typically be many more variables available for control as potential confounders than traditional epidemiologic techniques will allow. Methods: One tool that can indirectly permit control of a large number of variables is the propensity score approach. This paper illustrates the application of the propensity score to a study conducted in an administrative database, and raises critical issues to be addressed in such an analysis. In this example, the effect of statin therapy on the occurrence of myocardial infarction was examined, and numerous potential confounders of this association were adjusted simultaneously using a propensity score to form matched cohorts of statin initiators and non-initiators. Results: The incidence of myocardial infarction observed in the statin treated cohort was lower than the incidence in the untreated cohort, and the magnitude of this effect was consistent with results from randomized placebo controlled clinical trials of statin therapy. Conclusions: This example illustrates how confounding by indication can be mitigated by the propensity score matching technique.
Concerns remain over the generalizability of estimates obtained from such a study, and how to know when propensity scores are removing bias, since apparent balance between compared groups on measured variables could leave variables not included in the propensity score unbalanced and lead to confounded effect estimates. Copyright © 2005 John Wiley & Sons, Ltd.
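The matched-cohort construction described above pairs each treated unit with a control whose estimated propensity score is close. A textbook sketch of greedy 1:1 nearest-neighbor matching with a caliper (not the matching routine used in the study; scores would come from, e.g., a logistic regression of treatment on the covariates):

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    `treated` and `controls` map unit ids to estimated propensity
    scores. Each treated unit gets the closest unused control within
    `caliper`; units with no acceptable partner are left unmatched.
    """
    available = dict(controls)
    pairs = {}
    # a common heuristic: match highest-score treated units first,
    # since they tend to be the hardest to find partners for
    for unit, score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        closest = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[closest] - score) <= caliper:
            pairs[unit] = closest
            del available[closest]
    return pairs
```

The caliper is what protects against the concern raised in the abstract's conclusion: a treated unit with no comparable control is dropped rather than forced into a bad match that would leave covariates unbalanced.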