
Showing papers in "Genetic Epidemiology in 2003"


Journal ArticleDOI
TL;DR: A permutation procedure for the null hypothesis of interest is described, which controls for violation of the assumptions, and a likelihood‐ratio test is proposed, using the expectation‐maximization (E‐M) algorithm to account for haplotype ambiguities.
Abstract: Association tests of multilocus haplotypes are of interest both in linkage disequilibrium mapping and in candidate gene studies. For case-parent trios, I discuss the extension of existing multilocus methods to include ambiguous haplotypes in tests of models which distinguish between the cis and trans phase. A likelihood-ratio test is proposed, using the expectation-maximization (E-M) algorithm to account for haplotype ambiguities. Assumptions about the population structure are required, but realistic situations, including population stratification, which violate the assumptions lead to conservative tests. I describe a permutation procedure for the null hypothesis of interest, which controls for violation of the assumptions. For general pedigrees, I describe extensions of the pedigree disequilibrium test to include uncertain haplotypes. The summary statistics are replaced by their expected values over prior distributions of haplotype frequencies. If prior distributions are not available, a valid test is possible by using the E-M algorithm to estimate the null distribution of haplotype frequencies. Similar methods are available for quantitative traits. Exact permutation tests are difficult to construct in small samples, but an approximate procedure is appropriate in large samples, and can be used to account for dependencies between tests of multiple haplotypes and loci.
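As a rough illustration of the E-M idea for ambiguous haplotypes (not the author's implementation), the sketch below estimates two-SNP haplotype frequencies from unphased genotypes; only the double heterozygote is phase-ambiguous, and its counts are split between the cis and trans resolutions in the E-step. The function name and data layout are hypothetical.

```python
def em_haplotype_freqs(genotypes, n_iter=100):
    """EM estimation of two-SNP haplotype frequencies.

    genotypes: list of (g1, g2) minor-allele counts (0/1/2) at two loci.
    Returns a dict of frequencies for haplotypes '00', '01', '10', '11'.
    """
    freqs = {h: 0.25 for h in ("00", "01", "10", "11")}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in freqs}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # Phase-ambiguous double heterozygote: split between the
                # cis (00/11) and trans (01/10) resolutions according to
                # the current frequency estimates (E-step).
                p_cis = freqs["00"] * freqs["11"]
                p_trans = freqs["01"] * freqs["10"]
                w = p_cis / (p_cis + p_trans)
                counts["00"] += w
                counts["11"] += w
                counts["01"] += 1 - w
                counts["10"] += 1 - w
            else:
                # Phase is determined for every other genotype.
                a1 = [0] * (2 - g1) + [1] * g1
                a2 = [0] * (2 - g2) + [1] * g2
                counts[f"{a1[0]}{a2[0]}"] += 1
                counts[f"{a1[1]}{a2[1]}"] += 1
        # M-step: re-estimate frequencies from the expected counts.
        total = sum(counts.values())
        freqs = {h: c / total for h, c in counts.items()}
    return freqs
```

In the full methods the estimated frequencies feed a likelihood-ratio or permutation test; this sketch covers only the frequency-estimation step.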

1,182 citations


Journal ArticleDOI
TL;DR: Using simulated data, multifactor dimensionality reduction (MDR) has high power to identify gene-gene interactions in the presence of 5% genotyping error, 5% missing data, or a combination of both; MDR has reduced power for some models in the presence of 50% phenocopy, and very limited power in the presence of 50% genetic heterogeneity.
Abstract: The identification and characterization of genes that influence the risk of common, complex multifactorial diseases, primarily through interactions with other genes and other environmental factors, remains a statistical and computational challenge in genetic epidemiology. This challenge is partly due to the limitations of parametric statistical methods for detecting genetic effects that are dependent solely or partially on interactions with other genes and environmental exposures. We previously introduced multifactor dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus genotype information to improve the identification of polymorphism combinations associated with disease risk. The MDR approach is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., assumes no particular inheritance model), and is directly applicable to case-control and discordant sib-pair study designs. Both empirical and theoretical studies suggest that MDR has excellent power for identifying high-order gene-gene interactions. However, the power of MDR for identifying gene-gene interactions in the presence of common sources of noise is not currently known. The goal of this study was to evaluate the power of MDR for identifying gene-gene interactions in the presence of noise due to genotyping error, missing data, phenocopy, and genetic or locus heterogeneity. Using simulated data, we show that MDR has high power to identify gene-gene interactions in the presence of 5% genotyping error, 5% missing data, or a combination of both. However, MDR has reduced power for some models in the presence of 50% phenocopy, and very limited power in the presence of 50% genetic heterogeneity. Extending MDR to address genetic heterogeneity should be a priority for the continued methodological development of this new approach.
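A minimal sketch of MDR's core dimensionality-reduction step for one pair of loci, assuming a case-control layout: each multilocus genotype cell is labeled high- or low-risk by comparing its case:control ratio to the overall sample ratio, collapsing the two-locus genotype space to a single binary variable. Names and data layout are hypothetical, not the authors' software.

```python
from collections import defaultdict

def mdr_classify(cases, controls):
    """Label each two-locus genotype cell as high- or low-risk.

    cases, controls: lists of (g1, g2) genotype tuples.
    A cell is 'high' when its case:control ratio exceeds the
    overall sample case:control ratio (the usual MDR threshold).
    """
    cell = defaultdict(lambda: [0, 0])  # genotype -> [n_cases, n_controls]
    for g in cases:
        cell[g][0] += 1
    for g in controls:
        cell[g][1] += 1
    threshold = len(cases) / len(controls)
    labels = {}
    for g, (n_case, n_control) in cell.items():
        ratio = n_case / n_control if n_control else float("inf")
        labels[g] = "high" if ratio > threshold else "low"
    return labels
```

The full method wraps this step in cross-validation over all locus subsets and picks the model with the best prediction error; only the reduction step is shown here.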

533 citations


Journal ArticleDOI
TL;DR: The results indicate that association studies based on cases with a strong family history, identified for example through cancer genetics clinics, may be substantially more efficient than population‐based studies.
Abstract: Susceptibility to breast cancer is likely to be the result of susceptibility alleles in many different genes. In particular, one segregation analysis of breast cancer suggested that disease susceptibility in noncarriers of BRCA1/2 mutations may be explicable in terms of a polygenic model, with large numbers of susceptibility polymorphisms acting multiplicatively on risk. We considered the implications for such a model on the design of association studies to detect susceptibility polymorphisms, in particular the efficacy of utilizing cases with a family history of the disease, together with unrelated controls. Relative to a standard case-control association study with cases unselected for family history, the sample size required to detect a common disease susceptibility allele was typically reduced by more than twofold if cases with an affected first-degree relative were selected, and by more than fourfold if cases with two affected first-degree relatives were utilized. The relative efficiency obtained by using familial cases was greater for rarer alleles. Analysis of extended families indicated that the power was most dependent on the immediate (first-degree) family history. Bilateral cases may offer a similar gain in power to cases with two affected first-degree relatives. In contrast to the strong effect of family history, varying the ages at diagnosis of the cases across the range of 35-65 years did not strongly affect the power to detect association. These results indicate that association studies based on cases with a strong family history, identified for example through cancer genetics clinics, may be substantially more efficient than population-based studies.

250 citations


Journal ArticleDOI
TL;DR: The goal is to determine the two‐stage parameters (α1, β1, α2) that minimize the cost of the study such that the desired overall significance is α and the desired power is close to 1−β, the power of the one‐stage approach.
Abstract: We propose a cost-effective two-stage approach to investigate gene-disease associations when testing a large number of candidate markers using a case-control design. Under this approach, all the markers are genotyped and tested at stage 1 using a subset of affected cases and unaffected controls, and the most promising markers are genotyped on the remaining individuals and tested using all the individuals at stage 2. The sample size at stage 1 is chosen such that the power to detect the true markers of association is 1−β1 at significance level α1. The most promising markers are tested at significance level α2 at stage 2. In contrast, a one-stage approach would evaluate and test all the markers on all the cases and controls to identify the markers significantly associated with the disease. The goal is to determine the two-stage parameters (α1, β1, α2) that minimize the cost of the study such that the desired overall significance is α and the desired power is close to 1−β, the power of the one-stage approach. We provide analytic formulae to estimate the two-stage parameters. The properties of the two-stage approach are evaluated under various parametric configurations and compared with those of the corresponding one-stage approach. The optimal two-stage procedure does not depend on the signal of the markers associated with the disease. Further, when there is a large number of markers, the optimal procedure is not substantially influenced by the total number of markers associated with the disease. The results show that, compared to a one-stage approach, a two-stage procedure typically halves the cost of the study. Genet Epidemiol 25:149–157, 2003. © 2003 Wiley-Liss, Inc.
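The cost trade-off can be sketched with simple arithmetic: with M mostly-null markers, roughly a fraction α1 pass stage 1, so the expected genotyping burden is about M·n1 + α1·M·(n − n1), versus M·n for the one-stage design. The function below is an illustrative assumption (a per-genotype cost model), not the paper's analytic formulae.

```python
def expected_cost(n_total, n_stage1, n_markers, alpha1, cost_per_genotype=1.0):
    """Approximate genotyping cost of the two-stage design.

    Stage 1 types all markers on n_stage1 subjects; roughly a fraction
    alpha1 of the (mostly null) markers pass to stage 2, where they are
    typed on the remaining n_total - n_stage1 subjects.
    """
    stage1 = n_markers * n_stage1
    stage2 = alpha1 * n_markers * (n_total - n_stage1)
    return cost_per_genotype * (stage1 + stage2)

# One-stage comparison: every marker on every subject.
def one_stage_cost(n_total, n_markers, cost_per_genotype=1.0):
    return cost_per_genotype * n_markers * n_total
```

For example, with 2,000 subjects, 1,000 markers, 600 subjects at stage 1, and α1 = 0.1, the two-stage cost is 740,000 genotypes versus 2,000,000 for one stage, in line with the paper's finding that two stages typically at least halve the cost.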

162 citations


Journal ArticleDOI
TL;DR: It is demonstrated that even a modest degree of ethnic confounding can lead to unacceptably high type I error rates for tests of genetic effects.
Abstract: With the increasing availability of genetic data, many studies of quantitative traits focus on hypotheses related to candidate genes, and also gene-environment (G×E) and gene-gene (G×G) interactions. In a population-based sample, estimates and tests of candidate gene effects can be biased by ethnic confounding, also known as population stratification bias. This paper demonstrates that even a modest degree of ethnic confounding can lead to unacceptably high type I error rates for tests of genetic effects. The parent-offspring trio design is reviewed, and several forms of the quantitative transmission disequilibrium test (QTDT) are summarized. A variation of the QTDT (QTDTM) is described that is based on a linear regression model with multiple intercepts, one per parental mating type. This and other models are expanded to allow testing of G×E and G×G interactions. A method for computing required sample sizes using direct computations is described. Sample size requirements for tests of genetic main effects and G×E and G×G interactions are compared across various QTDT approaches to infer their efficiencies relative to one another. The QTDTM is found to meet or exceed the efficiency of other QTDT approaches. For example, the QTDTM is approximately 3% more efficient than the QTDT of Rabinowitz ([1997] Hum. Hered. 47:342–350) for testing a genetic main effect, but can be as much as twice as efficient for testing G×E interaction, and three times more efficient for testing G×G interaction. Genet Epidemiol 25:327–338, 2003. © 2003 Wiley-Liss, Inc.

137 citations


Journal ArticleDOI
TL;DR: It is shown that, under realistic scenarios, forming the product of the K most significant P‐values provides increased power to detect genomewide association, while identifying a candidate set of good quality and fixed size for follow‐up studies.
Abstract: Large exploratory studies are often characterized by a preponderance of true null hypotheses, with a small though multiple number of false hypotheses. Traditional multiple-test adjustments consider either each hypothesis separately, or all hypotheses simultaneously, but it may be more desirable to consider the combined evidence for subsets of hypotheses, in order to reduce the number of hypotheses to a manageable size. Previously, Zaykin et al. ([2002] Genet. Epidemiol. 22:170-185) proposed forming the product of all P-values at less than a preset threshold, in order to combine evidence from all significant tests. Here we consider a complementary strategy: form the product of the K most significant P-values. This has certain advantages for genomewide association scans: K can be chosen on the basis of a hypothesised disease model, and is independent of sample size. Furthermore, the alternative hypothesis corresponds more closely to the experimental situation where all loci have fixed effects. We give the distribution of the rank truncated product and suggest some methods to account for correlated tests in genomewide scans. We show that, under realistic scenarios, it provides increased power to detect genomewide association, while identifying a candidate set of good quality and fixed size for follow-up studies.
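A minimal Monte Carlo sketch of the rank truncated product: the statistic is the product of the K smallest P-values (equivalently, minus the sum of their logs). The paper gives the distribution analytically and handles correlated tests; this sketch simply simulates the null under independent Uniform(0,1) P-values, and all names are hypothetical.

```python
import math
import random

def rank_truncated_product(pvals, k):
    """Minus the log of the product of the K smallest p-values."""
    return -sum(math.log(p) for p in sorted(pvals)[:k])

def rtp_pvalue(pvals, k, n_sim=20000, rng=None):
    """Monte Carlo p-value assuming independent Uniform(0,1) nulls."""
    rng = rng or random.Random(0)
    obs = rank_truncated_product(pvals, k)
    m = len(pvals)
    hits = 0
    for _ in range(n_sim):
        sim = rank_truncated_product([rng.random() for _ in range(m)], k)
        if sim >= obs:
            hits += 1
    return hits / n_sim
```

Note that K is fixed in advance from the hypothesized disease model, so the candidate set handed to follow-up always has size K regardless of sample size.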

128 citations


Journal ArticleDOI
TL;DR: The geno‐PDT approach for testing genotypes in general family data provides a useful tool for identifying genes in complex disease, and partitioning individual genotype contributions will help to dissect the influence of genotype on risk.
Abstract: Many family-based tests of linkage disequilibrium (LD) are based on counts of alleles rather than genotypes. However, allele-based tests may not detect interactions among alleles at a single locus that are apparent when examining associations with genotypes. Family-based tests of LD based on genotypes have been developed, but they are typically valid as tests of association only in families with a single affected individual. To take advantage of families with multiple affected individuals, we propose the genotype-pedigree disequilibrium test (geno-PDT) to test for LD between marker locus genotypes and disease. Unlike previous tests for genotypic association, the geno-PDT is valid in general pedigrees. Simulations to compare the power of the allele-based PDT and geno-PDT reveal that under an additive model, the allele-based PDT is more powerful, but that the geno-PDT can have greater power when the genetic model is recessive or dominant. Perhaps the most important property of the geno-PDT is the ability to test for association with particular genotypes, which can reveal underlying patterns of association at the genotypic level. These genotype-specific tests can be used to suggest possible underlying genetic models that are consistent with the pattern of genotypic association. This is illustrated through an application to a candidate gene analysis of the MLLT3 gene in families with Alzheimer disease. The geno-PDT approach for testing genotypes in general family data provides a useful tool for identifying genes in complex disease, and partitioning individual genotype contributions will help to dissect the influence of genotype on risk.

120 citations


Journal ArticleDOI
TL;DR: These models fully account for haplotype phase ambiguity and allow for covariates and are encoded into a software package (the Evolutionary‐Based Haplotype Analysis Package, EHAP), which also provides for various kinds of exploratory data analysis.
Abstract: Association studies, both family-based and population-based, can be powerful means of detecting disease-liability alleles. To increase the information of the test, various researchers have proposed targeting haplotypes. The larger number of haplotypes, however, relative to alleles at individual loci, could decrease power because of the additional degrees of freedom required for the test. An optimal strategy would focus the test on particular haplotypes or groups of haplotypes, much as is done with cladistic-based association analysis. First suggested by Templeton et al. ([1987] Genetics 117:343-351), such analyses use the evolutionary relationships among haplotypes to produce a limited set of hypothesis tests and to increase the interpretability of these tests. To more fully utilize the information contained in the evolutionary relationships among haplotypes and in the sample, we propose generalized linear models (GLM) for the analysis of data from family-based and population-based studies. These models fully account for haplotype phase ambiguity and allow for covariates. The models are encoded into a software package (the Evolutionary-Based Haplotype Analysis Package, EHAP), which also provides for various kinds of exploratory data analysis. The exploratory analyses, such as error checking, estimation of haplotype frequencies, and tools for building cladograms, should facilitate the implementation of cladistic-based association analysis with haplotypes.

115 citations


Journal ArticleDOI
TL;DR: A new test, the maternal-fetal genotype incompatibility (MFG) test, can be used with case-parent triad data (affected individuals and their parents) to identify loci at which a maternal-fetal genotype incompatibility increases the risk for disease; simulations show that the type-I error rate of the MFG test is appropriate, that the estimated parameters are accurate, and that the test is powerful enough to detect a maternal-fetal genotype incompatibility of moderate effect size.
Abstract: Biological mechanisms that involve gene-by-environment interactions have been hypothesized to explain susceptibility to complex familial disorders. Current research provides compelling evidence that one environmental factor, which acts prenatally to increase susceptibility, arises from a maternal-fetal genotype incompatibility. Because it is genetic in origin, a maternal-fetal incompatibility is one possible source of an adverse environment that can be detected in genetic analyses and precisely studied, even years after the adverse environment was present. Existing statistical models and tests for gene detection are not optimal or even appropriate for identifying maternal-fetal genotype incompatibility loci that may increase the risk for complex disorders. We describe a new test, the maternal-fetal genotype incompatibility (MFG) test, that can be used with case-parent triad data (affected individuals and their parents) to identify loci for which a maternal-fetal genotype incompatibility increases the risk for disease. The MFG test adapts a log-linear approach for case-parent triads in order to detect maternal-fetal genotype incompatibility at a candidate locus, and allows the incompatibility effects to be estimated separately from direct effects of either the maternal or the child's genotype. Through simulations of two biologically plausible maternal-fetal genotype incompatibility scenarios, we show that the type-I error rate of the MFG test is appropriate, that the estimated parameters are accurate, and that the test is powerful enough to detect a maternal-fetal genotype incompatibility of moderate effect size.

104 citations


Journal ArticleDOI
TL;DR: SPTA controls for population stratification through a set of genomic markers by first deriving a genetic background variable for each sampled individual through his/her genotypes at a series of independent markers, and then modeling the relationship between trait values, genotypic scores at the candidate marker, and genetic background variables through a semiparametric model.
Abstract: Although genetic association studies using unrelated individuals may be subject to bias caused by population stratification, alternative methods that are robust to population stratification such as family-based association designs may be less powerful. Recently, various statistical methods robust to population stratification were proposed for association studies, using unrelated individuals to identify associations between candidate markers and traits of interest (both qualitative and quantitative). Here, we propose a semiparametric test for association (SPTA). SPTA controls for population stratification through a set of genomic markers by first deriving a genetic background variable for each sampled individual through his/her genotypes at a series of independent markers, and then modeling the relationship between trait values, genotypic scores at the candidate marker, and genetic background variables through a semiparametric model. We assume that the exact form of relationship between the trait value and the genetic background variable is unknown and estimated through smoothing techniques. We evaluate the performance of SPTA through simulations both with discrete subpopulation models and with continuous admixture population models. The simulation results suggest that our procedure has a correct type I error rate in the presence of population stratification and is more powerful than statistical association tests for family-based association designs in all the cases considered. Moreover, SPTA is more powerful than the Quantitative Similarity-Based Association Test (QSAT) developed by us under continuous admixture populations, and the number of independent markers needed by SPTA to control for population stratification is substantially fewer than that required by QSAT.

102 citations


Journal ArticleDOI
TL;DR: No strong associations were found between CL±P and variants at these three genes, but there was a possible recessive effect of the TGFA TaqI variant on the risk of CPO, with a 3‐fold risk among children homozygous for the variant.
Abstract: We selected 262 case-parent triads from a population-based study of orofacial clefts in Norway, and examined variants of developmental genes TGFA, TGFB3, and MSX1 in the etiology of orofacial clefts. One hundred seventy-four triads of cleft lip cases (CL±P) and 88 triads of cleft palate only cases (CPO) were analyzed. There was little evidence for an association of any of these genes with CL±P. The strongest association was a 1.7-fold risk with two copies of the TGFB3-CA variant (95% CI=0.9–3.0). Among CPO cases, there was a 3-fold risk with two copies of the TGFA TaqI A2 allele, and no increase with one copy. Assuming this to be a recessive effect, we estimated a 3.2-fold risk among babies homozygous for the variant (95% CI=1.1–9.2). Furthermore, there was strong evidence of gene-gene interaction. While there was only a weak association of the MSX1-CA variant with CPO, the risk was 9.7-fold (95% CI=2.9–32) among children homozygous for both the MSX1-CA A4 allele and the TGFA A2 allele. No association of CPO with the TGFA variant was seen among the other MSX1-CA genotypes. In conclusion, no strong associations were found between CL±P and variants at these three genes. There was a possible recessive effect of the TGFA TaqI variant on the risk of CPO, with a 3-fold risk among children homozygous for the variant. The effect of this TGFA genotype was even stronger among children homozygous for the MSX1-CA A4 allele, raising the possibility of interaction between these two genes. Genet Epidemiol 24:230–239, 2003. © 2003 Wiley-Liss, Inc.

Journal ArticleDOI
TL;DR: It is shown that there is extensive variation in LD even for closely linked loci, implying that several markers may be needed to detect a disease locus, and in general, best results will be obtained if the frequencies of marker alleles are at least as large as the frequency of the causative mutation.
Abstract: Association studies depend on linkage disequilibrium (LD) between a causative mutation and linked marker loci. Selecting markers that give the best chance of showing useful levels of LD with the causative mutation will increase the chances of successfully detecting an association. This report examines the variation in the extent of LD between a disease locus and one or two diallelic marker loci (termed single nucleotide polymorphisms or SNPs). We use a simulation method based on the neutral coalescent in a population of variable size to find the distribution of LD as a function of allele frequencies, the recombination rate, and the population history. Given that LD exists, the allele frequencies determine if a site will be useful for detecting an association with the disease mutation. We show that there is extensive variation in LD even for closely linked loci, implying that several markers may be needed to detect a disease locus. The distribution of LD between common variants is strongly influenced by ancestral population size. We show that in general, best results will be obtained if the frequencies of marker alleles are at least as large as the frequency of the causative mutation. Haplotypes of two or more SNPs generally have a higher probability than individual SNPs of showing useful LD with a disease mutation, although exceptions are described.

Journal Article
TL;DR: A meta-analysis to identify genetic regions that show evidence for susceptibility genes across studies found the strongest evidence for linkage on chromosomes 3q26–27 and 2q34–37, which might contain susceptibility genes for CHD.

Journal ArticleDOI
TL;DR: It is proved that statistical inference can be based on controlling the false discovery rate (FDR), defined as the expected number of false rejections divided by the number of rejections, and a computationally efficient form of forward stepwise regression is compared against the FDR methods.
Abstract: It is increasingly recognized that multiple genetic variants, within the same or different genes, combine to affect liability for many common diseases. Indeed, the variants may interact among themselves and with environmental factors. Thus realistic genetic/statistical models can include an extremely large number of parameters, and it is by no means obvious how to find the variants contributing to liability. For models of multiple candidate genes and their interactions, we prove that statistical inference can be based on controlling the false discovery rate (FDR), which is defined as the expected number of false rejections divided by the number of rejections. Controlling the FDR automatically controls the overall error rate in the special case that all the null hypotheses are true. So do more standard methods such as Bonferroni correction. However, when some null hypotheses are false, the goals of Bonferroni and FDR differ, and FDR will have better power. Model selection procedures, such as forward stepwise regression, are often used to choose important predictors for complex models. By analysis of simulations of such models, we compare a computationally efficient form of forward stepwise regression against the FDR methods. We show that model selection includes numerous genetic variants having no impact on the trait, whereas FDR maintains a false-positive rate very close to the nominal rate. With good control over false positives and better power than Bonferroni, the FDR-based methods we introduce present a viable means of evaluating complex, multivariate genetic models. Naturally, as for any method seeking to explore complex genetic models, the power of the methods is limited by sample size and model complexity.
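For illustration, the generic Benjamini-Hochberg step-up procedure that controls the FDR can be sketched as follows; this is the standard procedure, not the authors' specific model-selection machinery, and the function name is hypothetical.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return sorted indices of hypotheses rejected at FDR level q.

    Step-up rule: sort p-values, find the largest rank k with
    p_(k) <= q * k / m, and reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])
```

On the example below, Bonferroni at overall level 0.05 would reject only the single p-value below 0.005, while BH rejects two, illustrating the power advantage the abstract describes when some null hypotheses are false.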

Journal ArticleDOI
TL;DR: It is shown that sum statistics can often be successfully applied when marker‐by‐marker approaches fail to detect association, and a method is presented that takes the correlation structure among marker loci into account when marker statistics are combined.
Abstract: In complex traits, multiple disease loci presumably interact to produce the disease. For this reason, even with high-resolution single nucleotide polymorphism (SNP) marker maps, it has been difficult to map susceptibility loci by conventional locus-by-locus methods. Fine mapping strategies are needed that allow for the simultaneous detection of interacting disease loci while handling large numbers of densely spaced markers. For this purpose, sum statistics were recently proposed as a first-stage analysis method for case-control association studies with SNPs. Via sums of single-marker statistics, information over multiple disease-associated markers is combined and, with a global significance value α, a small set of "interesting" markers is selected for further analysis. Here, the statistical properties of such approaches are examined by computer simulation. It is shown that sum statistics can often be successfully applied when marker-by-marker approaches fail to detect association. Compared with Bonferroni or False Discovery Rate (FDR) procedures, sum statistics have greater power, and more disease loci can be detected. However, in studies with tightly linked markers, simple sum statistics can be suboptimal, since the intermarker correlation is ignored. A method is presented that takes the correlation structure among marker loci into account when marker statistics are combined. Genet Epidemiol 25:350–359, 2003. © 2003 Wiley-Liss, Inc.
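A hedged sketch of a sum-statistic analysis with a case/control label-permutation null: permuting phenotype labels preserves the intermarker correlation structure, which is one simple way to account for tightly linked markers. The per-marker statistic here (squared difference in mean genotype) and all names are simplified assumptions, not the authors' exact procedure.

```python
import random

def marker_stats(geno, status):
    """Squared case-control difference in mean genotype, per marker.

    geno: list of samples, each a list of per-marker allele counts.
    status: 1 for case, 0 for control.
    """
    cases = [g for g, s in zip(geno, status) if s]
    controls = [g for g, s in zip(geno, status) if not s]
    out = []
    for j in range(len(geno[0])):
        d = (sum(g[j] for g in cases) / len(cases)
             - sum(g[j] for g in controls) / len(controls))
        out.append(d * d)
    return out

def sum_stat_pvalue(geno, status, k, n_perm=1000, rng=None):
    """Permutation p-value for the sum of the K largest marker statistics."""
    rng = rng or random.Random(1)
    obs = sum(sorted(marker_stats(geno, status), reverse=True)[:k])
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps case/control counts, breaks association
        perm = sum(sorted(marker_stats(geno, labels), reverse=True)[:k])
        if perm >= obs:
            hits += 1
    return hits / n_perm
```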

Journal ArticleDOI
TL;DR: This article reviews the various technical approaches currently available for proteomics and explores ways in which the emerging field of proteomics, the study of proteins in a cell, may inform the approach to gene mapping.
Abstract: Mapping of the human genome has the potential to transform the traditional methods of genetic epidemiology. The complete draft sequence of the 3.3 billion nucleotides comprising the genome is now available over the Internet, including the location and nearly complete sequence of the 26,000 to 31,000 protein-encoding genes. However, aside from water, almost everything in the human body is either made of, or by, proteins. Although the DNA code provides the instructions for their amino acid sequence, there are an estimated 1.5 million proteins. Thus, the correlation between DNA sequence and protein is low, reflecting alternate splicing as well as post-translational modification. The purpose of this article is to explore ways in which the emerging field of proteomics, the study of proteins in a cell, may inform our approach to gene mapping. This article reviews the various technical approaches currently available for proteomics. Technologies are available to quantify protein expression (and compare normal versus disease states), identify proteins through comparison with sequence information in databases or direct sequencing (which can then be mapped to chromosomal locations to ensure appropriate markers), elucidate protein-protein interactions (which may underlie disease), determine localization of proteins within the cell (abnormal trafficking of proteins could have an inherited basis), and characterize modifications of proteins (which is relevant to modifier gene candidates). Several examples are presented to illustrate the potential application of proteomics to the field of genetic epidemiology, and we conclude with various considerations regarding design and analysis.

Journal ArticleDOI
TL;DR: It is found that the DNA pooling of two individuals can be more cost‐effective than individual genotypings, especially when a large number of haplotype systems are studied.
Abstract: Genome-wide association studies may be necessary to identify genes underlying certain complex diseases. Because such studies can be extremely expensive, DNA pooling has been introduced, as it may greatly reduce the genotyping burden. Parallel to DNA pooling developments, the importance of haplotypes in genetic studies has been amply demonstrated in the literature. However, DNA pooling of a large number of samples may lose haplotype information among tightly linked genetic markers. Here, we examine the cost-effectiveness of DNA pooling in the estimation of haplotype frequencies from population data. When the maximum likelihood estimates of haplotype frequencies are obtained from pooled samples, we compare the overall cost of the study, including both DNA collection and marker genotyping, between the individual genotyping strategy and the DNA pooling strategy. We find that the DNA pooling of two individuals can be more cost-effective than individual genotypings, especially when a large number of haplotype systems are studied.

Journal ArticleDOI
TL;DR: A method to analyze haplotype effects using ideas derived from Bayesian spatial statistics, and defines a distance metric to specify the appropriate level of closeness between the two haplotypes.
Abstract: We propose a method to analyze haplotype effects using ideas derived from Bayesian spatial statistics. We assume that two haplotypes that are similar to one another in structure are likely to have similar risks, and define a distance metric to specify the appropriate level of closeness between the two haplotypes. Through the choice of distance metric, varying levels of population genetics theory can be incorporated into the modeling process, including some that allow estimation of the location of the disease causing mutation(s). This location can be estimated, along with the other parameters of the model, using Markov chain Monte Carlo (MCMC) estimation methods. We demonstrate the effectiveness of the model on two real datasets, a well-known dataset used to fine-map the gene for cystic fibrosis, and one used to localize the gene for Friedreich's ataxia.

Journal ArticleDOI
TL;DR: Little evidence of interaction is found between the child's genotypes at TGFA TaqI and various exposures for cleft palate, with the possible exception of folic acid intake.
Abstract: We have previously reported a threefold risk of cleft palate only (CPO) among children homozygous for the less common allele A2 at the TaqI marker site of the transforming growth factor alpha gene (TGFA) (Jugessur et al. [2003a] Genet. Epidemiol. 24:230-239). Here we assess possible interaction between the child's TGFA TaqI A2A2 genotype and maternal cigarette smoking, alcohol consumption, use of multivitamins and folic acid. This was done by comparing the strength of genetic associations between strata of exposed and unexposed case-parent triads. We also looked for possible gene-gene interaction with the polymorphic variant C677T of the folic acid-metabolizing gene MTHFR. We analyzed a total of 88 complete CPO triads selected from a population-based study of orofacial clefts in Norway (May 1996-1998). No evidence of interaction was observed with either smoking or alcohol use. The risk associated with two copies of the A2 allele at TGFA TaqI was strong among children whose mothers did not use folic acid (relative risk=4.5, 95% confidence interval=1.3-15.7), and was only marginal among children whose mothers reported using folic acid (RR=1.4, 95% CI=0.2-12.7). Although the interaction between the child's genotypes at TGFA TaqI and MTHFR-C677T was not statistically significant, the effect of the TGFA TaqI A2A2 genotype appeared to be stronger among children with either one or two copies of the T-allele at C677T (RR=4.0, 95% CI=1.1-13.9) compared to children homozygous for the C-allele (RR=1.7, 95% CI=0.2-15.7). In conclusion, we find little evidence of interaction between the child's genotypes at TGFA TaqI and various exposures for cleft palate, with the possible exception of folic acid intake.

Journal ArticleDOI
TL;DR: The large sample‐size requirements represent a formidable challenge to studies of this type, including attributable risk for the disease allele, inheritance mechanism, disease prevalence, and for sibling case‐control designs, extragenetic familial aggregation of disease and recombination.
Abstract: Most previous sample size calculations for case-control studies to detect genetic associations with disease assumed that the disease gene locus is known, whereas, in fact, markers are used. We calculated sample sizes for unmatched case-control and sibling case-control studies to detect an association between a biallelic marker and a disease governed by a putative biallelic disease locus. Required sample sizes increase with increasing discrepancy between the marker and disease allele frequencies, and with less-than-maximal linkage disequilibrium between the marker and disease alleles. Qualitatively similar results were found for studies of parent-offspring triads based on the transmission disequilibrium test (Abel and Muller-Myhsok, 1998, Am. J. Hum. Genet. 63:664-667; Tu and Whittemore, 1999, Am. J. Hum. Genet. 64:641-649). We also studied other factors affecting required sample size, including attributable risk for the disease allele, inheritance mechanism, disease prevalence, and for sibling case-control designs, extragenetic familial aggregation of disease and recombination. The large sample-size requirements represent a formidable challenge to studies of this type.
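The qualitative effect of incomplete linkage disequilibrium can be illustrated with the standard two-proportion sample size formula: if D' < 1, the case-control difference in marker allele frequency is attenuated, and required n grows roughly as 1/D'^2. A hedged sketch (frequencies and D' are invented, and scaling the difference by D' is a simplification that ignores marker/disease allele-frequency mismatch):

```python
# Two-proportion sample size per group (cases vs. controls) for a
# difference in marker allele frequency; illustrative numbers only.
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p0, alpha=0.05, power=0.8):
    za = NormalDist().inv_cdf(1 - alpha / 2)
    zb = NormalDist().inv_cdf(power)
    pbar = (p1 + p0) / 2
    return ceil(((za * sqrt(2 * pbar * (1 - pbar))
                  + zb * sqrt(p1 * (1 - p1) + p0 * (1 - p0)))
                 / (p1 - p0)) ** 2)

# Frequency difference at the disease locus itself...
n_direct = n_per_group(p1=0.30, p0=0.20)
# ...attenuated at a marker in incomplete LD (difference scaled by D').
d_prime = 0.5
n_marker = n_per_group(p1=0.20 + 0.10 * d_prime, p0=0.20)
print(n_direct, n_marker)
```

Halving D' roughly quadruples the required sample size, consistent with the abstract's conclusion that requirements become formidable.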

Journal ArticleDOI
TL;DR: It is shown analytically and by computer simulation that unequal amplification should be taken into account when testing for differences in allele frequencies between pools, and a simple modification of the standard χ2 test is suggested to control the type I error rate in the presence of experimental error variation.
Abstract: Association studies using DNA pools are in principle powerful and efficient to detect association between a marker allele and disease status, e.g., in a case-control design. A common observation with the use of DNA pools is that the two alleles at a polymorphic SNP locus are not amplified in equal amounts in heterozygous individuals. In addition, there are pool-specific experimental errors, so that estimates of allele frequencies vary between different pools constructed from the same individuals. As a result of these additional sources of variation, the outcome of an experiment is an estimated count of alleles rather than the usual outcome in terms of observed counts. In this study, we show analytically and by computer simulation that unequal amplification should be taken into account when testing for differences in allele frequencies between pools, and suggest a simple modification of the standard chi-square test to control the type I error rate in the presence of experimental error variation. The impact of experimental errors on the power of association studies is shown.
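The core of such a correction can be sketched as follows: the variance of an allele frequency estimated from a pool has a binomial sampling component plus a pool-specific experimental component, so the naive chi-square statistic must be deflated. This is a hedged illustration of the idea, not the authors' exact statistic; in practice the experimental variance `var_e` would be estimated from replicate pools, and the value here is invented:

```python
# One-df chi-square-type statistic for a difference in estimated allele
# frequencies between a case pool and a control pool. Each pool
# contributes its own experimental error variance var_e, hence 2*var_e
# in the denominator.

def pooled_test_stat(p_case, p_ctrl, n_case, n_ctrl, var_e):
    pbar = (n_case * p_case + n_ctrl * p_ctrl) / (n_case + n_ctrl)
    sampling_var = pbar * (1 - pbar) * (1 / (2 * n_case) + 1 / (2 * n_ctrl))
    return (p_case - p_ctrl) ** 2 / (sampling_var + 2 * var_e)

naive = pooled_test_stat(0.35, 0.28, 200, 200, var_e=0.0)
corrected = pooled_test_stat(0.35, 0.28, 200, 200, var_e=0.0004)
print(naive, corrected)
```

Ignoring `var_e` overstates the evidence (inflating type I error); including it shrinks the statistic, which is the power cost the abstract refers to.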

Journal ArticleDOI
TL;DR: This study identifies eight single‐nucleotide polymorphisms (SNPs) in the MSX1 gene, and presents genotype results for these SNPs in a set of 206 oral cleft cases and their parents, and suggests an interaction between variation in theMSx1 gene and exposure to maternal smoking.
Abstract: Oral clefts, one of the most common forms of birth defects, are considered to be of complex etiology, including both genetic and environmental causes. To date, however, no particular genetic cause has been confirmed for isolated, nonsyndromic oral clefts. Previous case-control and family-based association studies reported an association between an intronic CA repeat of the MSX1 gene and risk for oral clefts. In this study, we identify eight single-nucleotide polymorphisms (SNPs) in the MSX1 gene, and present genotype results for these SNPs in a set of 206 oral cleft cases and their parents. We performed single-marker and haplotype-based transmission disequilibrium tests (TDTs), and tested for evidence of interaction between MSX1 haplotypes and exposure to maternal smoking in the first trimester, using a case-only approach. The haplotype TDT analyses further implicate this gene, or region, in controlling the risk for oral clefts, particularly for cleft palate. In addition, case-only haplotype analyses suggest an interaction between variation in the MSX1 gene and exposure to maternal smoking. This study encourages further focus on the MSX1 gene region to ultimately determine specific variants predisposing to oral clefts.
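The single-marker TDT used here is, at its core, a McNemar test on transmissions from heterozygous parents. A minimal sketch (counts are invented, not the study's data):

```python
# TDT statistic: among parents heterozygous for the candidate allele,
# b transmit it to the affected child and c transmit the other allele.
# Under no association/linkage, (b - c)^2 / (b + c) is chi-square, 1 df.

def tdt_chi2(b, c):
    return (b - c) ** 2 / (b + c)

chi2 = tdt_chi2(60, 40)
print(chi2)
```

A value above 3.84 would be nominally significant at the 5% level; the haplotype-based extension in the paper aggregates such evidence over multilocus haplotypes.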

Journal ArticleDOI
TL;DR: SEGPATH models have been extended to cover “model‐free” robust, variance‐components linkage analysis, based on identity‐by‐descent (IBD) sharing, to give added power to detect linkage as well as to protect against spuriously inferring linkage.
Abstract: A general-purpose modeling framework for performing path and segregation analysis jointly, called SEGPATH (Province and Rao [1995] Stat. Med. 7:185–198), has been extended to cover "model-free" robust, variance-components linkage analysis, based on identity-by-descent (IBD) sharing. These extended models can be used to analyze linkage to a single marker or to perform multipoint linkage analysis, with a single phenotype or multivariate vector of phenotypes, in pedigrees. Within a single, consistent approach, SEGPATH models can perform segregation analysis, path analysis, linkage analysis, or combinations thereof. SEGPATH models can incorporate environmental or other measured covariate fixed effects (including measured genotypes), genotype-specific covariate effects, population heterogeneity models, repeated-measures models, longitudinal models, autoregressive models, developmental models, gene-by-environment interaction models, etc., with or without linkage components. The data analyzed can have any missing value structure (assumed missing at random), with entire individuals missing, or missing on one or more measurements. Corrections for ascertainment can be made on a vector of phenotypes and/or other measures. Because of the flexibility of the class of models, the SEGPATH approach can also be used in nongenetic applications where there is a hierarchical structure, such as longitudinal, repeated-measures, time series, or nested models. A variety of specific models are provided, as well as some comparisons with other linkage analysis models. Particular applications demonstrate the importance of correctly accounting for the extraneous sources of familial resemblance, as can be done easily with these SEGPATH models, so as to give added power to detect linkage as well as to protect against spuriously inferring linkage. Genet. Epidemiol. 24:128–138, 2003.
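The IBD-based variance-components idea can be sketched for a sib pair: the off-diagonal phenotypic covariance combines a QTL term weighted by the estimated proportion of alleles shared IBD at the marker (pi-hat) with a polygenic term weighted by twice the kinship coefficient (0.5 for full sibs). Parameter values below are illustrative, not SEGPATH output:

```python
# Covariance structure for a full-sib pair under a variance-components
# linkage model: diagonal = total variance; off-diagonal =
# pi_hat * s2_qtl + 0.5 * s2_poly. All variances are invented.

def sibpair_cov(pi_hat, s2_qtl, s2_poly, s2_env):
    total = s2_qtl + s2_poly + s2_env
    off = pi_hat * s2_qtl + 0.5 * s2_poly
    return [[total, off], [off, total]]

linked = sibpair_cov(pi_hat=1.0, s2_qtl=0.3, s2_poly=0.2, s2_env=0.5)
unlinked = sibpair_cov(pi_hat=0.5, s2_qtl=0.3, s2_poly=0.2, s2_env=0.5)
```

Linkage evidence comes from sib pairs sharing more alleles IBD being more phenotypically correlated than expected (pi-hat = 0.5 on average); misattributing the polygenic or environmental terms is exactly the source of spurious linkage the abstract warns about.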

Journal ArticleDOI
TL;DR: A fine-mapping simulation in which the region containing a significant linkage at a 10‐centiMorgan (cM) resolution was fine‐mapped at 2, 1, and 0.5 cM found that if the QTL accounts for a small proportion of the variation, as is the case for realistic traits,fine‐mapping has little value.
Abstract: Once a significant linkage is found, an important goal is reducing the error in the estimated location of the linked locus. A common approach to reducing location error, called fine-mapping, is the genotyping of additional markers in the linked region to increase the genetic information. The utility of fine-mapping for quantitative trait linkage analysis is largely unknown. To explore this issue, we performed a fine-mapping simulation in which the region containing a significant linkage at a 10-centiMorgan (cM) resolution was fine-mapped at 2, 1, and 0.5 cM. We simulated six quantitative trait models in which the proportion of variation due to the quantitative trait locus (QTL) ranged from 0.20 to 0.90. We used four sampling designs that were all combinations of 100 and 200 families of sizes 5 and 7. Variance components linkage analysis (Genehunter) was performed until 1,000 replicates were found with a maximum LOD score greater than 3.0. For each of these 1,000 replications, we repeated the linkage analysis three times: once for each of the fine-map resolutions. For the most realistic model, reduction in the average location error ranged from 3% to 15% for 2-cM fine-mapping and from 3% to 18% for 1-cM fine-mapping, depending on the number of families and family size. Fine-mapping at 0.5 cM did not differ from the 1-cM results. Thus, if the QTL accounts for a small proportion of the variation, as is the case for realistic traits, fine-mapping has little value.

Journal ArticleDOI
TL;DR: This paper summarizes 13 contributions to Genetic Analysis Workshop 13, which include a wide range of methods for genetic analysis of longitudinal data in families, and indicates how one might form a more powerful test for finding a slope‐affecting gene.
Abstract: Longitudinal family studies provide a valuable resource for investigating genetic and environmental factors that influence long-term averages and changes over time in a complex trait. This paper summarizes 13 contributions to Genetic Analysis Workshop 13, which include a wide range of methods for genetic analysis of longitudinal data in families. The methods can be grouped into two basic approaches: 1) two-step modeling, in which repeated observations are first reduced to one statistic per subject (e.g., a mean or slope), after which this statistic is used in a standard genetic analysis, or 2) joint modeling, in which genetic and longitudinal model parameters are estimated simultaneously in a single analysis. In applications to Framingham Heart Study data, contributors collectively reported evidence for genes that affected trait mean on chromosomes 1, 2, 3, 5, 8, 9, 10, 13, and 17, but most did not find genes affecting slope. Applications to simulated data suggested that even for a gene that only affected slope, use of a mean-type statistic could provide greater power than a slope-type statistic for detecting that gene. We report on the results of a small experiment that sheds some light on this apparently paradoxical finding, and indicate how one might form a more powerful test for finding a slope-affecting gene. Several areas for future research are discussed.
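The first step of the "two-step" approach reduces each subject's longitudinal record to summary phenotypes, typically a per-subject mean and an OLS slope, which then enter a standard genetic analysis as derived traits. A minimal sketch with invented blood-pressure measurements:

```python
# Reduce one subject's repeated measures (value vs. age) to a mean and
# an ordinary least-squares slope. Data are made up for illustration.

def mean_and_slope(ages, values):
    n = len(ages)
    xbar = sum(ages) / n
    ybar = sum(values) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(ages, values))
    sxx = sum((x - xbar) ** 2 for x in ages)
    return ybar, sxy / sxx

mean_bp, slope_bp = mean_and_slope([40, 45, 50, 55], [120, 124, 128, 132])
print(mean_bp, slope_bp)
```

The workshop's paradoxical finding is that linkage tests on `mean_bp` can outperform tests on `slope_bp` even for a gene that only affects slope, because per-subject slopes are estimated with much more noise than means.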

Journal ArticleDOI
TL;DR: A Bayesian model‐based method for multilocus association analysis of quantitative and qualitative traits is presented, which selects a trait‐associated subset of markers among candidates, and is equally applicable for analyzing wide chromosomal segments and small candidate regions.
Abstract: A Bayesian model-based method for multilocus association analysis of quantitative and qualitative (binary) traits is presented. The method selects a trait-associated subset of markers among candidates, and is equally applicable for analyzing wide chromosomal segments (genome scans) and small candidate regions. The method can be applied in situations involving missing genotype data. The number of trait loci, their marker positions, and the magnitudes of their gene effects (strengths of association) are all estimated simultaneously. The inference of parameters is based on their posterior distributions, which are obtained through Markov chain Monte Carlo simulations. The strengths of the approach are: 1) flexible use of oligogenic models with unknown number of loci, 2) performing the estimation of association jointly with model selection, and 3) avoidance of the multiple testing problem, which typically complicates the approaches based on association testing. The performance of the method was tested and compared to the multilocus conditional search procedure by analyzing two simulated data sets. We also applied the method to cystic fibrosis haplotype data (two-locus haplotypes), where gene position has already been identified. The method is implemented as a software package, which is freely available for research purposes under the name BAMA.

Journal ArticleDOI
TL;DR: The results of simulation studies, and a worked data example using a family data set ascertained through probands with schizophrenia, suggest that utilizing covariate information can yield substantial efficiency gains in localizing susceptibility genes.
Abstract: Recently, Liang et al. ([2001] Hum. Hered. 51:64-78) proposed a general multipoint linkage method for estimating the chromosomal position of a putative susceptibility locus. Their technique is computationally simple and does not require specification of penetrance or a mode of inheritance. In complex genetic diseases, covariate data may be available which reflect etiologic or locus heterogeneity. We developed approaches to incorporating covariates into the method of Liang et al. ([2001] Hum. Hered. 51:64-78) with particular attention to exploiting age-at-onset information. The results of simulation studies, and a worked data example using a family data set ascertained through probands with schizophrenia, suggest that utilizing covariate information can yield substantial efficiency gains in localizing susceptibility genes.

Journal ArticleDOI
TL;DR: It appears that the S447X variant of LPL may be another rare example (like APOE4, factor V‐Leiden, and PPARγ Pro12Ala) of a common variant predisposing to a common disorder.
Abstract: S447X, a serine substitution by a stop codon on base 99 of exon 9 of the lipoprotein lipase (LPL) gene, has beneficial effects on blood lipids. Other LPL alleles are associated with lipid levels, but whether one of these variants predominates remains elusive. We performed a systematic survey to identify single-nucleotide polymorphisms (SNPs) in all 10 LPL exons and flanking regions by resequencing the gene in 95 subjects. Of 24 variants, 14 were common (> or = 3%). We assayed the common SNPs in 186 cases with atherogenic lipid profiles (low HDL, high LDL) and 185 nonatherogenic controls (high HDL, low LDL). Only S447X and exons 6 (base +73) and 10 (base -11) were individually associated with case-control status (P<0.05, adjusted for major nongenetic covariates with known lipid effects). There were no significant SNP x gender interactions. In adjusted multi-SNP and haplotypic analyses, S447X was interpretable as the sole predictor, with a 2-3-fold reduction in the odds of being atherogenic vs. nonatherogenic (adjusted OR, 0.39; 95% CI, 0.21-0.73). S447X and base -11 of exon 10 were statistically interchangeable because they are strongly associated (r=0.92, P<0.0001), but we posit that the LPL association with lipid profile is more likely attributable to the functional S447X rather than the nonfunctional exon 10 SNP. It appears that the S447X variant of LPL may be another rare example (like APOE4, factor V-Leiden, and PPAR gamma Pro12Ala) of a common variant predisposing to a common disorder.
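The kind of adjusted odds ratio reported here (OR 0.39, 95% CI 0.21-0.73) is, in its unadjusted form, computed from a 2x2 table of carrier status by case-control status with a Woolf logit interval. A sketch with invented counts chosen to give a similarly protective OR, not the paper's data:

```python
# Odds ratio and Woolf 95% CI from a 2x2 table:
#   a = carrier cases, b = non-carrier cases,
#   c = carrier controls, d = non-carrier controls.
from math import log, exp, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

or_, lo, hi = odds_ratio_ci(30, 156, 60, 125)
print(or_, lo, hi)
```

An upper confidence limit below 1 is what supports the claim that S447X carriage reduces the odds of an atherogenic lipid profile.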

Journal ArticleDOI
TL;DR: This work describes the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure, and describes two such covariate models: the random effects model and the marginal model.
Abstract: We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate of beta(RE) and a consistent estimate of its covariance matrix. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate of beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology.
We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates.
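For the random effects/CLR approach with matched pairs (such as the mother/daughter data), each family contributes a conditional likelihood term comparing the case's covariates to the control's. A one-covariate sketch of that contribution (illustrative only, not the authors' C program):

```python
# Conditional log-likelihood contribution of one matched case/control
# pair under conditional logistic regression with a single covariate x:
# the probability that, of the pair, the observed case is the affected one.
from math import log, exp

def clr_pair_loglik(beta, x_case, x_ctrl):
    return x_case * beta - log(exp(x_case * beta) + exp(x_ctrl * beta))

# With beta = 0 the case and control are exchangeable (probability 1/2).
baseline = clr_pair_loglik(0.0, 1.0, 0.0)
fitted = clr_pair_loglik(1.0, 1.0, 0.0)
```

Maximizing the sum of such terms over pairs yields the CLR estimate of beta(RE); note that concordant pairs (x_case == x_ctrl) contribute a constant, which is why CLR uses only families with both affected and unaffected members.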

Journal ArticleDOI
TL;DR: The results indicate that the effect of age is modified by MTHFR genotype, which explained 6% of the phenotypic variation in tHcy.
Abstract: Elevation in plasma total homocysteine (tHcy) is believed to be causally related to cardiovascular disease. Like age and sex, the thermolabile variant of methylenetetrahydrofolate reductase (MTHFR(C677T)) is an important nonmodifiable determinant of tHcy, which may be considered when describing normal ranges of tHcy in the general population. We investigated the simultaneous effect of sex, age, and MTHFR(C677T) genotype on the distribution of tHcy in a cross-sectional study design. tHcy concentrations and MTHFR(C677T) genotype were determined in a population-based sample of 2,788 Danish men and women aged 30–60 years participating in the Inter99 Study. The prevalences of MTHFR(C677T) genotypes were 48.8% (CC), 42.4% (CT), and 8.8% (TT). The overall median tHcy was 8.1 µmol/l, and the 2.5–97.5 percentiles were 4.8–17.8 µmol/l. The estimated proportionally higher level of tHcy in men compared to women was 14.3% (P<0.001). A significant interaction term was found between age and MTHFR(C677T) genotype (P<0.001). The estimated changes in tHcy per 5 years of age were 1.5% in CC individuals (P<0.01), 2.1% in CT individuals (P<0.001), and −4.1% in TT individuals (P<0.01). The T allele was associated with elevated tHcy. However, the proportionally higher level of tHcy in TT individuals compared to CT and CC individuals decreased with increasing age. The MTHFR(C677T) polymorphism explained 6% of the phenotypic variation in tHcy. In conclusion, we found that tHcy is associated with sex, age, and MTHFR genotype. Our results indicate that the effect of age is modified by MTHFR genotype. Genet Epidemiol 24:322–330, 2003. © 2003 Wiley-Liss, Inc.
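The reported interaction can be made concrete by compounding the estimated per-5-year changes: +1.5% for CC and −4.1% for TT imply that an elevated TT level at age 30 narrows relative to CC with age, and can eventually reverse. The baselines below are invented for illustration (only the percent changes come from the abstract):

```python
# Project tHcy forward by compounding the reported percent change per
# 5 years of age. Baseline medians at age 30 are hypothetical.

def project(thcy, pct_per_5y, years):
    return thcy * (1 + pct_per_5y / 100) ** (years / 5)

cc_30, tt_30 = 8.1, 9.5            # illustrative medians at age 30
cc_60 = project(cc_30, 1.5, 30)    # CC: +1.5% per 5 years
tt_60 = project(tt_30, -4.1, 30)   # TT: -4.1% per 5 years
```

The TT/CC ratio shrinks between ages 30 and 60, which is the age-by-genotype modification the abstract describes.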