Showing papers by "Zhaoxia Yu" published in 2007


Journal Article
TL;DR: This paper considers eight methods to infer missing genotypes: two haplotype reconstruction methods (local expectation maximization, EM, and fastPHASE), two k-nearest neighbor methods (the original k-nearest neighbor, KNN, and a weighted k-nearest neighbor, wtKNN), three linear regression methods (backward variable selection, LM.back; least angle regression, LM.lars; and singular value decomposition, LM.svd), and a regression tree, Rtree.
Abstract: For large-scale genotyping studies, it is common for most subjects to have some missing genetic markers, even if the missing rate per marker is low. This compromises association analyses, because the number of subjects contributing to an analysis varies between single-marker and multi-marker analyses. In this paper, we consider eight methods to infer missing genotypes: two haplotype reconstruction methods (local expectation maximization, EM, and fastPHASE), two k-nearest neighbor methods (the original k-nearest neighbor, KNN, and a weighted k-nearest neighbor, wtKNN), three linear regression methods (backward variable selection, LM.back; least angle regression, LM.lars; and singular value decomposition, LM.svd), and a regression tree, Rtree. We evaluate their accuracy using single nucleotide polymorphism (SNP) data from the HapMap project, under a variety of conditions and parameters. We find that fastPHASE has the lowest error rates across different analysis panels and marker densities. LM.lars gives slightly less accurate estimates of missing genotypes than fastPHASE, but performs better than the other methods.
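
As a rough illustration of the k-nearest neighbor idea named in the abstract, the sketch below fills a missing genotype by majority vote among the k most similar subjects. The genotype coding (0/1/2 with `None` for missing), the mismatch-based distance, and the function name `knn_impute` are assumptions made for this example; the paper's actual KNN and wtKNN definitions may differ.

```python
from collections import Counter


def knn_impute(genotypes, k=3):
    """Fill missing genotypes (None) by majority vote among the k nearest subjects.

    genotypes: list of per-subject lists, coded 0/1/2 (assumed coding), None = missing.
    Returns a new matrix; the input is left unchanged.
    """

    def distance(a, b):
        # Fraction of mismatching genotypes over markers observed in both subjects.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sum(x != y for x, y in shared) / len(shared)

    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is not None:
                continue
            # Candidate donors: other subjects with marker j observed.
            donors = sorted(
                (distance(row, other), idx)
                for idx, other in enumerate(genotypes)
                if idx != i and other[j] is not None
            )
            nearest = donors[:k]
            if nearest:
                votes = Counter(genotypes[idx][j] for _, idx in nearest)
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


# Toy data (invented for illustration): subject 1 is missing the last marker.
data = [
    [0, 0, 1, 2],
    [0, 0, 1, None],
    [0, 0, 1, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
filled = knn_impute(data, k=2)
```

A weighted variant would presumably weight each donor's vote by its similarity rather than counting votes equally; that detail is not given in the abstract.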

61 citations


Journal Article
TL;DR: The results indicate that the sequential scan procedure can identify a set of adjacent markers whose haplotypes might have strong genetic effects or be in linkage disequilibrium with disease predisposing variants.
Abstract: Multi-locus association analyses, including haplotype-based analyses, can sometimes provide greater power than single-locus analyses for detecting disease susceptibility loci. This potential gain, however, can be compromised by the large number of degrees of freedom caused by irrelevant markers. Exhaustive search for the optimal set of markers might be possible for a small number of markers, yet it is computationally inefficient. In this paper, we present a sequential haplotype scan method to search for combinations of adjacent markers that are jointly associated with disease status. When evaluating each marker, we add markers close to it in a sequential manner: a marker is added if its contribution to the haplotype association with disease is warranted, conditional on current haplotypes. This conditional evaluation is based on the well-known Mantel-Haenszel statistic. We propose two permutation-based methods to evaluate the growing haplotypes: a haplotype method for the combined markers, and a summary method that sums conditional statistics. We compared our proposed methods, the single-locus method, and a sliding window method using simulated data. We also applied our sequential haplotype scan algorithm to experimental data for CYP2D6. The results indicate that the sequential scan procedure can identify a set of adjacent markers whose haplotypes might have strong genetic effects or be in linkage disequilibrium with disease predisposing variants. As a result, our methods can achieve greater power than the single-locus method, yet are much more computationally efficient than sliding window methods. Genet. Epidemiol. 31:553–564, 2007. © 2007 Wiley-Liss, Inc.
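
The conditional evaluation mentioned in the abstract rests on the Mantel-Haenszel statistic. The sketch below computes the standard Mantel-Haenszel chi-square (without continuity correction) over a set of 2x2 tables. Treating each current haplotype as one stratum, with a table of candidate-marker allele by case/control status, is one plausible reading of the abstract and is an assumption here; the function name and the toy counts are also illustrative.

```python
def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (no continuity correction) over 2x2 strata.

    tables: list of (a, b, c, d) counts, one 2x2 table per stratum, laid out as
            [[a, b], [c, d]]. Strata with fewer than two observations are skipped.
    """
    diff = 0.0  # sum over strata of observed minus expected count in cell a
    var = 0.0   # sum over strata of the hypergeometric variance of cell a
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0:
        return 0.0
    return diff * diff / var


# Toy strata (invented counts): one table per hypothetical current haplotype.
no_effect = mantel_haenszel_chi2([(10, 10, 10, 10)])
one_stratum = mantel_haenszel_chi2([(8, 2, 2, 8)])
```

In a permutation-based procedure such as the one the abstract describes, a statistic like this would be recomputed on data with shuffled case/control labels to obtain a reference distribution.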

27 citations


Journal Article
TL;DR: The authors' recently developed sequential haplotype scan method is applied to case-control data for rheumatoid arthritis, including the PTPN22 candidate gene on chromosome 1p and the association mapping data on chromosome 18q; the results show that the new approach is at least as powerful as the traditional single-locus analysis and can sometimes be more powerful.
Abstract: Haplotype association analysis based on arbitrarily chosen markers might lower statistical power because of the larger number of degrees of freedom caused by irrelevant markers.

2 citations