Showing papers in "Genetics in 2003"


Journal Article
01 Aug 2003-Genetics
TL;DR: Extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data are described: methods that allow for linkage between loci are developed, and a new prior model for allele frequencies allows identification of subtle population subdivisions that were not detectable using the existing method.
Abstract: We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations (“admixture linkage disequilibrium”). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

7,615 citations


Journal Article
01 Mar 2003-Genetics
TL;DR: In this article, a new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented, and the method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest.
Abstract: A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of early methods for detecting recent immigrants, using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.

1,704 citations


Journal Article
01 Aug 2003-Genetics
TL;DR: In this article, a Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times.
Abstract: The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be ∼20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.

1,016 citations


Journal Article
01 Dec 2003-Genetics
TL;DR: The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by relating patterns of LD directly to the underlying recombination process and is competitive with the very best of current available methods for recombination rate estimates.
Abstract: We introduce a new statistical model for patterns of linkage disequilibrium (LD) among multiple SNPs in a population sample. The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by (i) relating patterns of LD directly to the underlying recombination process; (ii) considering all loci simultaneously, rather than pairwise; (iii) avoiding the assumption that LD necessarily has a "block-like" structure; and (iv) being computationally tractable for huge genomic regions (up to complete chromosomes). We examine in detail one natural application of the model: estimation of underlying recombination rates from population data. Using simulation, we show that in the case where recombination is assumed constant across the region of interest, recombination rate estimates based on our model are competitive with the very best of current available methods. More importantly, we demonstrate, on real and simulated data, the potential of the model to help identify and quantify fine-scale variation in recombination rate from population data. We also outline how the model could be useful in other contexts, such as in the development of more efficient haplotype-based methods for LD mapping.

992 citations


Journal Article
01 Jan 2003-Genetics
TL;DR: A Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design is introduced; analyses suggest that the method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist.
Abstract: We introduce a Bayesian method for estimating hidden population substructure using multilocus molecular markers and geographical information provided by the sampling design. The joint posterior distribution of the substructure and allele frequencies of the respective populations is available in an analytical form when the number of populations is small, whereas an approximation based on a Markov chain Monte Carlo simulation approach can be obtained for a moderate or large number of populations. Using the joint posterior distribution, posteriors can also be derived for any evolutionary population parameters, such as the traditional fixation indices. A major advantage compared to most earlier methods is that the number of populations is treated here as an unknown parameter. What is traditionally considered as two genetically distinct populations, either recently founded or connected by considerable gene flow, is here considered as one panmictic population with a certain probability based on marker data and prior information. Analyses of previously published data on the Moroccan argan tree (Argania spinosa) and of simulated data sets suggest that our method is capable of estimating a population substructure, while not artificially enforcing a substructure when it does not exist. The software (BAPS) used for the computations is freely available from http://www.rni.helsinki.fi/~mjs.

855 citations


Journal Article
01 Jun 2003-Genetics
TL;DR: A comprehensive analysis of approximately 1900 ethyl methanesulfonate (EMS)-induced mutations in 192 Arabidopsis thaliana target genes from a large-scale TILLING reverse-genetic project, about two orders of magnitude larger than previous such efforts.
Abstract: Chemical mutagenesis has been the workhorse of traditional genetics, but it has not been possible to determine underlying rates or distributions of mutations from phenotypic screens. However, reverse-genetic screens can be used to provide an unbiased ascertainment of mutation statistics. Here we report a comprehensive analysis of approximately 1900 ethyl methanesulfonate (EMS)-induced mutations in 192 Arabidopsis thaliana target genes from a large-scale TILLING reverse-genetic project, about two orders of magnitude larger than previous such efforts. From this large data set, we are able to draw strong inferences about the occurrence and randomness of chemically induced mutations. We provide evidence that we have detected the large majority of mutations in the regions screened and confirm the robustness of the high-throughput TILLING method; therefore, any deviations from randomness can be attributed to selectional or mutational biases. Overall, we detect twice as many heterozygotes as homozygotes, as expected; however, for mutations that are predicted to truncate an encoded protein, we detect a ratio of 3.6:1, indicating selection against homozygous deleterious mutations. As expected for alkylation of guanine by EMS, >99% of mutations are G/C-to-A/T transitions. A nearest-neighbor bias around the mutated base pair suggests that mismatch repair counteracts alkylation damage.

609 citations


Journal Article
01 Aug 2003-Genetics
TL;DR: The consequences of variable rates of clonal reproduction on the population genetics of neutral markers are explored in diploid organisms within a subdivided population (island model) using both analytical and stochastic simulation approaches.
Abstract: The consequences of variable rates of clonal reproduction on the population genetics of neutral markers are explored in diploid organisms within a subdivided population (island model). We use both analytical and stochastic simulation approaches. High rates of clonal reproduction will positively affect heterozygosity. As a consequence, nearly twice as many alleles per locus can be maintained and population differentiation estimated as F(ST) value is strongly decreased in purely clonal populations as compared to purely sexual ones. With increasing clonal reproduction, effective population size first slowly increases and then points toward extreme values when the reproductive system tends toward strict clonality. This reflects the fact that polymorphism is protected within individuals due to fixed heterozygosity. Contrarily, genotypic diversity smoothly decreases with increasing rates of clonal reproduction. Asexual populations thus maintain higher genetic diversity at each single locus but a lower number of different genotypes. Mixed clonal/sexual reproduction is nearly indistinguishable from strict sexual reproduction as long as the proportion of clonal reproduction is not strongly predominant for all quantities investigated, except for genotypic diversities (both at individual loci and over multiple loci).

538 citations


Journal Article
01 Jul 2003-Genetics
TL;DR: Computer simulation is used to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination; the LRT is robust to low levels of recombination, but at higher levels the type I error rate can be as high as 90%, and the test often mistakes recombination as evidence for positive selection.
Abstract: Maximum-likelihood methods based on models of codon substitution accounting for heterogeneous selective pressures across sites have proved to be powerful in detecting positive selection in protein-coding DNA sequences. Those methods are phylogeny based and do not account for the effects of recombination. When recombination occurs, such as in population data, no unique tree topology can describe the evolutionary history of the whole sequence. This violation of assumptions raises serious concerns about the likelihood method for detecting positive selection. Here we use computer simulation to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination. We examine three tests based on different models of variable selective pressures among sites. Sequences are simulated using a coalescent model with recombination and analyzed using codon-based likelihood models ignoring recombination. We find that the LRT is robust to low levels of recombination (with fewer than three recombination events in the history of a sample of 10 sequences). However, at higher levels of recombination, the type I error rate can be as high as 90%, especially when the null model in the LRT is unrealistic, and the test often mistakes recombination as evidence for positive selection. The test that compares the more realistic models M7 (beta) against M8 (beta and omega) is more robust to recombination, where the null model M7 allows the positive selection pressure to vary between 0 and 1 (and so does not account for positive selection), and the alternative model M8 allows an additional discrete class with omega = d(N)/d(S) that could be estimated to be >1 (and thus accounts for positive selection). Identification of sites under positive selection by the empirical Bayes method appears to be less affected than the LRT by recombination.

538 citations
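
As a rough illustration of the likelihood-ratio test discussed in the abstract above, the sketch below computes the LRT statistic and a chi-square p-value for an M7 vs. M8 comparison. The log-likelihood values are hypothetical, and the use of 2 degrees of freedom for this comparison is a conventional assumption that the abstract does not state. The function names are illustrative only.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood-ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 d.f.

    For 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    return math.exp(-x / 2.0) if x > 0 else 1.0

# Hypothetical log-likelihoods (not taken from the paper).
lnl_m7 = -2345.67  # null model M7 (beta)
lnl_m8 = -2340.12  # alternative model M8 (beta and omega)

stat = lrt_statistic(lnl_m7, lnl_m8)
p_value = chi2_sf_df2(stat)  # assumes 2 d.f. for M7 vs. M8
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

Note that the abstract's central point is that this chi-square calibration can fail when sequences have recombined, so a small p-value from such a calculation is not by itself reliable evidence of positive selection.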


Journal Article
01 Dec 2003-Genetics
TL;DR: Comparison of diversity in equivalent samples of inbreds and open-pollinated landraces revealed that maize inbreds capture <80% of the alleles in the landraces, suggesting that landraces can provide additional genetic diversity for maize breeding.
Abstract: Two hundred and sixty maize inbred lines, representative of the genetic diversity among essentially all public lines of importance to temperate breeding and many important tropical and subtropical lines, were assayed for polymorphism at 94 microsatellite loci. The 2039 alleles identified served as raw data for estimating genetic structure and diversity. A model-based clustering analysis placed the inbred lines in five clusters that correspond to major breeding groups plus a set of lines showing evidence of mixed origins. A "phylogenetic" tree was constructed to further assess the genetic structure of maize inbreds, showing good agreement with the pedigree information and the cluster analysis. Tropical and subtropical inbreds possess a greater number of alleles and greater gene diversity than their temperate counterparts. The temperate Stiff Stalk lines are on average the most divergent from all other inbred groups. Comparison of diversity in equivalent samples of inbreds and open-pollinated landraces revealed that maize inbreds capture <80% of the alleles in the landraces, suggesting that landraces can provide additional genetic diversity for maize breeding. The contributions of four different segments of the landrace gene pool to each inbred group's gene pool were estimated using a novel likelihood-based model. The estimates are largely consistent with known histories of the inbreds and indicate that tropical highland germplasm is poorly represented in maize inbreds. Core sets of inbreds that capture maximal allelic richness were defined. These or similar core sets can be used for a variety of genetic applications in maize.

535 citations


Journal Article
01 Mar 2003-Genetics
TL;DR: Two missense mutations in MET1 led to a global reduction of cytosine methylation throughout the genome; the distribution of late-flowering phenotypes in a mapping population segregating met1-1 indicates that the flowering-time phenotype is caused by the accumulation of inherited defects at loci unlinked to the met1 mutation.
Abstract: We describe the isolation and characterization of two missense mutations in the cytosine-DNA-methyltransferase gene, MET1, from the flowering plant Arabidopsis thaliana. Both missense mutations, which affect the catalytic domain of the protein, led to a global reduction of cytosine methylation throughout the genome. Surprisingly, the met1-2 allele, with the weaker DNA hypomethylation phenotype, alters a well-conserved residue in methyltransferase signature motif I. The stronger met1-1 allele caused late flowering and a heterochronic delay in the juvenile-to-adult rosette leaf transition. The distribution of late-flowering phenotypes in a mapping population segregating met1-1 indicates that the flowering-time phenotype is caused by the accumulation of inherited defects at loci unlinked to the met1 mutation. The delay in flowering time is due in part to the formation and inheritance of hypomethylated fwa epialleles, but inherited defects at other loci are likely to contribute as well. Centromeric repeat arrays hypomethylated in met1-1 mutants are partially remethylated when introduced into a wild-type background, in contrast to genomic sequences hypomethylated in ddm1 mutants. ddm1 met1 double mutants were constructed to further our understanding of the mechanism of DDM1 action and the interaction between two major genetic loci affecting global cytosine methylation levels in Arabidopsis.

529 citations


Journal Article
01 Apr 2003-Genetics
TL;DR: This paper presents a simple test based on a randomization procedure of allele sizes to determine whether stepwise-like mutations contributed to genetic differentiation, a nonsignificant test meaning that allele identity-based statistics perform better than allele size-based ones.
Abstract: The mutation process at microsatellite loci typically occurs at high rates and with stepwise changes in allele sizes, features that may introduce bias when using classical measures of population differentiation based on allele identity (e.g., F(ST), Nei's Ds genetic distance). Allele size-based measures of differentiation, assuming a stepwise mutation process [e.g., Slatkin's R(ST), Goldstein et al.'s (δμ)²], may better reflect differentiation at microsatellite loci, but they suffer high sampling variance. The relative efficiency of allele size- vs. allele identity-based statistics depends on the relative contributions of mutations vs. drift to population differentiation. We present a simple test based on a randomization procedure of allele sizes to determine whether stepwise-like mutations contributed to genetic differentiation. This test can be applied to any microsatellite data set designed to assess population differentiation and can be interpreted as testing whether F(ST) = R(ST). Computer simulations show that the test efficiently identifies which of F(ST) or R(ST) estimates has the lowest mean square error. A significant test, implying that R(ST) performs better than F(ST), is obtained when the mutation rate, μ, for a stepwise mutation process is (a) ≥ m in an island model (m being the migration rate among populations) or (b) ≥ 1/t in the case of isolated populations (t being the number of generations since population divergence). The test also informs on the efficiency of other statistics used in phylogenetical reconstruction [e.g., Ds and (δμ)²], a nonsignificant test meaning that allele identity-based statistics perform better than allele size-based ones. This test can also provide insights into the evolutionary history of populations, revealing, for example, phylogeographic patterns, as illustrated by applying it on three published data sets.

Journal Article
01 Jul 2003-Genetics
TL;DR: A new general method is introduced that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC); test simulations indicate that changes in number of alleles, number of loci, and sample size have an approximately equivalent effect on the accuracy of the method.
Abstract: This article introduces a new general method for genealogical inference that samples independent genealogical histories using importance sampling (IS) and then samples other parameters with Markov chain Monte Carlo (MCMC). It is then possible to more easily utilize the advantages of importance sampling in a fully Bayesian framework. The method is applied to the problem of estimating recent changes in effective population size from temporally spaced gene frequency data. The method gives the posterior distribution of effective population size at the time of the oldest sample and at the time of the most recent sample, assuming a model of exponential growth or decline during the interval. The effect of changes in number of alleles, number of loci, and sample size on the accuracy of the method is described using test simulations, and it is concluded that these have an approximately equivalent effect. The method is used on three example data sets and problems in interpreting the posterior densities are highlighted and discussed.

Journal Article
01 Jun 2003-Genetics
TL;DR: The map-based cloning of the leaf rust resistance gene Lr21, previously mapped to a gene-rich region at the distal end of chromosome arm 1DS of bread wheat, opens the door for cloning of many crop-specific agronomic traits located in the gene-rich regions of bread wheat.
Abstract: We report the map-based cloning of the leaf rust resistance gene Lr21, previously mapped to a gene-rich region at the distal end of chromosome arm 1DS of bread wheat (Triticum aestivum L.). Molecular cloning of Lr21 was facilitated by a diploid/polyploid shuttle mapping strategy. Cloning of Lr21 was confirmed by genetic transformation and by a stably inherited resistance phenotype in transgenic plants. Lr21 spans 4318 bp and encodes a 1080-amino-acid protein containing a conserved nucleotide-binding site (NBS) domain, 13 imperfect leucine-rich repeats (LRRs), and a unique 151-amino-acid sequence missing from known NBS-LRR proteins at the N terminus. Fine-structure genetic analysis at the Lr21 locus detected a noncrossover (recombination without exchange of flanking markers) within a 1415-bp region resulting from either a gene conversion tract of at least 191 bp or a double crossover. The successful map-based cloning approach as demonstrated here now opens the door for cloning of many crop-specific agronomic traits located in the gene-rich regions of bread wheat.

Journal Article
01 Mar 2003-Genetics
TL;DR: The low level of LD and the limited haplotype diversity suggested that the genome of any given soybean accession is a mosaic of three or four haplotypes, thereby supporting the suggestion of relatively limited genetic variation in cultivated soybean.
Abstract: Single-nucleotide polymorphisms (SNPs) provide an abundant source of DNA polymorphisms in a number of eukaryotic species. Information on the frequency, nature, and distribution of SNPs in plant genomes is limited. Thus, our objectives were (1) to determine SNP frequency in coding and noncoding soybean (Glycine max L. Merr.) DNA sequence amplified from genomic DNA using PCR primers designed to complete genes, cDNAs, and random genomic sequence; (2) to characterize haplotype variation in these sequences; and (3) to provide initial estimates of linkage disequilibrium (LD) in soybean. Approximately 28.7 kbp of coding sequence, 37.9 kbp of noncoding perigenic DNA, and 9.7 kbp of random noncoding genomic DNA were sequenced in each of 25 diverse soybean genotypes. Over the >76 kbp, mean nucleotide diversity expressed as Watterson's theta was 0.00097. Nucleotide diversity was 0.00053 and 0.00111 in coding and in noncoding perigenic DNA, respectively, lower than estimates in the autogamous model species Arabidopsis thaliana. Haplotype analysis of SNP-containing fragments revealed a deficiency of haplotypes vs. the number that would be anticipated at linkage equilibrium. In 49 fragments with three or more SNPs, five haplotypes were present in one fragment while four or fewer were present in the remaining 48, thereby supporting the suggestion of relatively limited genetic variation in cultivated soybean. Squared allele-frequency correlations (r²) among haplotypes at 54 loci with two or more SNPs indicated low genome-wide LD. The low level of LD and the limited haplotype diversity suggested that the genome of any given soybean accession is a mosaic of three or four haplotypes. To facilitate SNP discovery and the development of a transcript map, subsets of four to six diverse genotypes, whose sequence analysis would permit the discovery of at least 75% of all SNPs present in the 25 genotypes as well as 90% of the common (frequency >0.10) SNPs, were identified.
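
The abstract above reports nucleotide diversity as Watterson's theta. The sketch below shows the standard per-site form of that estimator, S / (a_n · L), where a_n is the sum of 1/i for i from 1 to n − 1. The number of segregating sites and the sequence length used here are hypothetical placeholders, not values from the paper, and treating the 25 genotypes as 25 sampled sequences is an assumption.

```python
def harmonic_number(k):
    """Sum of 1/i for i = 1..k (the a_n term when k = n - 1)."""
    return sum(1.0 / i for i in range(1, k + 1))

def watterson_theta_per_site(segregating_sites, n_sequences, n_sites):
    """Watterson's estimator per site: S / (a_n * L)."""
    a_n = harmonic_number(n_sequences - 1)
    return segregating_sites / (a_n * n_sites)

# Hypothetical inputs for illustration only.
theta = watterson_theta_per_site(
    segregating_sites=100,  # S: made-up count of segregating sites
    n_sequences=25,         # n: assumes one sequence per genotype
    n_sites=50_000,         # L: made-up number of aligned sites
)
print(f"theta per site = {theta:.5f}")
```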

Journal Article
01 Mar 2003-Genetics
TL;DR: Although some nonlikelihood estimators exhibit better performance with respect to specific metrics under some conditions, none approach the high level of performance exhibited by the likelihood estimator across all conditions and all metrics of performance.
Abstract: Relatedness between individuals is central to many studies in genetics and population biology. A variety of estimators have been developed to enable molecular marker data to quantify relatedness. Despite this, no effort has been given to characterize the traditional maximum-likelihood estimator in relation to the remainder. This article quantifies its statistical performance under a range of biologically relevant sampling conditions. Under the same range of conditions, the statistical performance of five other commonly used estimators of relatedness is quantified. Comparison among these estimators indicates that the traditional maximum-likelihood estimator exhibits a lower standard error under essentially all conditions. Only for very large amounts of genetic information do most of the other estimators approach the likelihood estimator. However, the likelihood estimator is more biased than any of the others, especially when the amount of genetic information is low or the actual relationship being estimated is near the boundary of the parameter space. Even under these conditions, the amount of bias can be greatly reduced, potentially to biologically irrelevant levels, with suitable genetic sampling. Additionally, the likelihood estimator generally exhibits the lowest root mean-square error, an indication that the bias in fact is quite small. Alternative estimators restricted to yield only biologically interpretable estimates exhibit lower standard errors and greater bias than do unrestricted ones, but generally do not improve over the maximum-likelihood estimator and in some cases exhibit even greater bias. Although some nonlikelihood estimators exhibit better performance with respect to specific metrics under some conditions, none approach the high level of performance exhibited by the likelihood estimator across all conditions and all metrics of performance.

Journal Article
01 Dec 2003-Genetics
TL;DR: This study provides a statistical explanation for the Beavis effect, shows that the theoretical prediction agrees well with the observations in Beavis's original simulation study, and discusses application of the theory to meta-analysis of QTL mapping.
Abstract: The core of statistical inference is based on both hypothesis testing and estimation. The use of inferential statistics for QTL identification thus includes estimation of genetic effects and statistical tests. Typically, QTL are reported only when the test statistics reach a predetermined critical value. Therefore, the estimated effects of detected QTL are actually sampled from a truncated distribution. As a result, the expectations of detected QTL effects are biased upward. In a simulation study, William D. Beavis showed that the average estimates of phenotypic variances associated with correctly identified QTL were greatly overestimated if only 100 progeny were evaluated, slightly overestimated if 500 progeny were evaluated, and fairly close to the actual magnitude when 1000 progeny were evaluated. This phenomenon has subsequently been called the Beavis effect. Understanding the theoretical basis of the Beavis effect will help interpret QTL mapping results and improve success of marker-assisted selection. This study provides a statistical explanation for the Beavis effect. The theoretical prediction agrees well with the observations reported in Beavis's original simulation study. Application of the theory to meta-analysis of QTL mapping is discussed.
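
The truncation argument in the abstract above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the design of Beavis's study or the paper's derivation: each replicate draws a normally distributed effect estimate around a fixed true effect, and only estimates whose test statistic exceeds a threshold are kept. The true effect, standard errors, and threshold are hypothetical, with smaller standard errors standing in for larger progeny sizes.

```python
import random
import statistics

def simulate_detected_effects(true_effect, std_error, threshold_z, n_reps, seed=0):
    """Return the estimates from replicates that pass the significance threshold."""
    rng = random.Random(seed)
    detected = []
    for _ in range(n_reps):
        estimate = rng.gauss(true_effect, std_error)
        # A QTL is "reported" only when |estimate / SE| reaches the critical value.
        if abs(estimate) / std_error >= threshold_z:
            detected.append(estimate)
    return detected

true_effect = 0.2  # hypothetical true QTL effect
means = []
for std_error in (0.20, 0.10, 0.05):  # smaller SE mimics more progeny
    detected = simulate_detected_effects(
        true_effect, std_error, threshold_z=3.0, n_reps=20000
    )
    mean_detected = statistics.mean(detected)
    means.append(mean_detected)
    print(f"SE={std_error:.2f}: {len(detected)} detected, "
          f"mean detected estimate={mean_detected:.3f} (true={true_effect})")
```

Because only estimates above the threshold are retained, the mean of the detected estimates exceeds the true effect, and the upward bias shrinks as the standard error decreases.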

Journal Article
01 Jan 2003-Genetics
TL;DR: Previous moment and maximum-likelihood methods are extended to allow the joint estimation of N(e) and migration rate (m) using genetic samples over space and time, and it is shown that, compared to genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of N(e), respectively, if it is ignored.
Abstract: In the past, moment and likelihood methods have been developed to estimate the effective population size (N(e)) on the basis of the observed changes of marker allele frequencies over time, and these have been applied to a large variety of species and populations. Such methods invariably make the critical assumption of a single isolated population receiving no immigrants over the study interval. For most populations in the real world, however, migration is not negligible and can substantially bias estimates of N(e) if it is not accounted for. Here we extend previous moment and maximum-likelihood methods to allow the joint estimation of N(e) and migration rate (m) using genetic samples over space and time. It is shown that, compared to genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of N(e), respectively, if it is ignored. Extensive simulations are run to evaluate the newly developed moment and likelihood methods, which yield generally satisfactory estimates of both N(e) and m for populations with widely different effective sizes and migration rates and patterns, given a reasonably large sample size and number of markers.

Journal Article
01 Feb 2003-Genetics
TL;DR: A Bayesian regression method is presented to simultaneously estimate genetic effects associated with markers of the entire genome and it is shown that the Bayesian method serves as an alternative or even better QTL mapping method because it produces clearer signals for QTL.
Abstract: Molecular markers have been used to map quantitative trait loci. However, they are rarely used to evaluate effects of chromosome segments of the entire genome. The original interval-mapping approach and various modified versions of it may have limited use in evaluating the genetic effects of the entire genome because they require evaluation of multiple models and model selection. Here we present a Bayesian regression method to simultaneously estimate genetic effects associated with markers of the entire genome. With the Bayesian method, we were able to handle situations in which the number of effects is even larger than the number of observations. The key to the success is that we allow each marker effect to have its own variance parameter, which in turn has its own prior distribution so that the variance can be estimated from the data. Under this hierarchical model, we were able to handle a large number of markers and most of the markers may have negligible effects. As a result, it is possible to evaluate the distribution of the marker effects. Using data from the North American Barley Genome Mapping Project in double-haploid barley, we found that the distribution of gene effects follows closely an L-shaped Gamma distribution, which is in contrast to the bell-shaped Gamma distribution when the gene effects were estimated from interval mapping. In addition, we show that the Bayesian method serves as an alternative or even better QTL mapping method because it produces clearer signals for QTL. Similar results were found from simulated data sets of F2 and backcross (BC) families.

Journal ArticleDOI
01 May 2003-Genetics
TL;DR: This work provides three lines of evidence that Mus81/Mms4 is not the major meiotic HJ resolvase in S. cerevisiae and reveals the existence of two distinct classes of crossovers in budding yeast.
Abstract: Current models for meiotic recombination require that crossovers derive from the resolution of a double-Holliday junction (dHJ) intermediate. In prokaryotes, enzymes responsible for HJ resolution are well characterized but the identification of a eukaryotic nuclear HJ resolvase has been elusive. Indirect evidence suggests that MUS81 from humans and fission yeast encodes a HJ resolvase. We provide three lines of evidence that Mus81/Mms4 is not the major meiotic HJ resolvase in S. cerevisiae: (1) MUS81/MMS4 is required to form only a distinct subset of crossovers; (2) rather than accumulating, dHJ intermediates are reduced in an mms4 mutant; and (3) expression of a bacterial HJ resolvase has no suppressive effect on mus81 meiotic phenotypes. Our analysis also reveals the existence of two distinct classes of crossovers in budding yeast. Class I is dependent upon MSH4/MSH5 and exhibits crossover interference, while class II is dependent upon MUS81/MMS4 and exhibits no interference. mms4 specifically reduces crossing over on small chromosomes, which are known to undergo less interference. The correlation between recombination rate and degree of interference to chromosome size may therefore be achieved by modulating the balance between class I/class II crossovers.

Journal ArticleDOI
01 Jan 2003-Genetics
TL;DR: In this article, the authors used a denser chromosome 20 marker map and exploited linkage disequilibrium using two distinct approaches to provide strong evidence that a chromosome segment including the gene coding for the growth hormone receptor accounts for at least part of the chromosome 20 QTL effect.
Abstract: We herein report on our efforts to improve the mapping resolution of a QTL with major effect on milk yield and composition that was previously mapped to bovine chromosome 20. By using a denser chromosome 20 marker map and by exploiting linkage disequilibrium using two distinct approaches, we provide strong evidence that a chromosome segment including the gene coding for the growth hormone receptor accounts for at least part of the chromosome 20 QTL effect. By sequencing individuals with known QTL genotype, we identify an F to Y substitution in the transmembrane domain of the growth hormone receptor gene that is associated with a strong effect on milk yield and composition in the general population.

Journal ArticleDOI
01 Oct 2003-Genetics
TL;DR: The authors' phylogenetic analyses show two gene clades within the core eudicots, euAP1 (including Arabidopsis APETALA1 and Antirrhinum SQUAMOSA) and euFUL (including Arabidopsis FRUITFULL); the euAP1 clade includes key regulators of floral development that have been implicated in the specification of perianth identity.
Abstract: Phylogenetic analyses of angiosperm MADS-box genes suggest that this gene family has undergone multiple duplication events followed by sequence divergence. To determine when such events have taken place and to understand the relationships of particular MADS-box gene lineages, we have identified APETALA1/FRUITFULL-like MADS-box genes from a variety of angiosperm species. Our phylogenetic analyses show two gene clades within the core eudicots, euAP1 (including Arabidopsis APETALA1 and Antirrhinum SQUAMOSA) and euFUL (including Arabidopsis FRUITFULL). Non-core eudicot species have only sequences similar to euFUL genes (FUL-like). The predicted protein products of euFUL and FUL-like genes share a conserved C-terminal motif. In contrast, predicted products of members of the euAP1 gene clade contain a different C terminus that includes an acidic transcription activation domain and a farnesylation signal. Sequence analyses indicate that the euAP1 amino acid motifs may have arisen via a translational frameshift from the euFUL/FUL-like motif. The euAP1 gene clade includes key regulators of floral development that have been implicated in the specification of perianth identity. However, the presence of euAP1 genes only in core eudicots suggests that there may have been changes in mechanisms of floral development that are correlated with the fixation of floral structure seen in this clade.

Journal ArticleDOI
01 Jun 2003-Genetics
TL;DR: The role of the Ler/Cvi allelic variation in affecting dormancy is discussed in the context of current knowledge of Arabidopsis germination.
Abstract: Arabidopsis accessions differ largely in their seed dormancy behavior. To understand the genetic basis of this intraspecific variation we analyzed two accessions: the laboratory strain Landsberg erecta (Ler) with low dormancy and the strong-dormancy accession Cape Verde Islands (Cvi). We used a quantitative trait loci (QTL) mapping approach to identify loci affecting the after-ripening requirement measured as the number of days of seed dry storage required to reach 50% germination. Thus, seven QTL were identified and named delay of germination (DOG) 1-7. To confirm and characterize these loci, we developed 12 near-isogenic lines carrying single and double Cvi introgression fragments in a Ler genetic background. The analysis of these lines for germination in water confirmed four QTL (DOG1, DOG2, DOG3, and DOG6) as showing large additive effects in Ler background. In addition, it was found that DOG1 and DOG3 genetically interact, the strong dormancy determined by DOG1-Cvi alleles depending on DOG3-Ler alleles. These genotypes were further characterized for seed dormancy/germination behavior in five other test conditions, including seed coat removal, gibberellins, and an abscisic acid biosynthesis inhibitor. The role of the Ler/Cvi allelic variation in affecting dormancy is discussed in the context of current knowledge of Arabidopsis germination.

Journal ArticleDOI
01 Jan 2003-Genetics
TL;DR: Findings link mutations that extend chronological life span in S. cerevisiae to superoxide dismutases and suggest that the induction of other stress-resistance genes regulated by Msn2/4 and Rim15 is required for maximum longevity extension.
Abstract: Signal transduction pathways inactivated during periods of starvation are implicated in the regulation of longevity in organisms ranging from yeast to mammals, but the mechanisms responsible for life-span extension are poorly understood. Chronological life-span extension in S. cerevisiae cyr1 and sch9 mutants is mediated by the stress-resistance proteins Msn2/Msn4 and Rim15. Here we show that mitochondrial superoxide dismutase (Sod2) is required for survival extension in yeast. Deletion of SOD2 abolishes life-span extension in sch9Δ mutants and decreases survival in cyr1:mTn mutants. The overexpression of Sods, mitochondrial Sod2 and cytosolic CuZnSod (Sod1), delays the age-dependent reversible inactivation of mitochondrial aconitase, a superoxide-sensitive enzyme, and extends survival by 30%. Deletion of the RAS2 gene, which functions upstream of CYR1, also doubles the mean life span by a mechanism that requires Msn2/4 and Sod2. These findings link mutations that extend chronological life span in S. cerevisiae to superoxide dismutases and suggest that the induction of other stress-resistance genes regulated by Msn2/4 and Rim15 is required for maximum longevity extension.

Journal ArticleDOI
01 Apr 2003-Genetics
TL;DR: Using extreme value theory, this distribution of fitness effects among new beneficial mutations is derived and it is shown that it has two unexpected properties: first, the distribution of beneficial fitness effects at a gene is exponential, and second, it has the same mean regardless of the fitness of the present wild-type allele.
Abstract: We know little about the distribution of fitness effects among new beneficial mutations, a problem that partly reflects the rarity of these changes. Surprisingly, though, population genetic theory allows us to predict what this distribution should look like under fairly general assumptions. Using extreme value theory, I derive this distribution and show that it has two unexpected properties. First, the distribution of beneficial fitness effects at a gene is exponential. Second, the distribution of beneficial effects at a gene has the same mean regardless of the fitness of the present wild-type allele. Adaptation from new mutations is thus characterized by a kind of invariance: natural selection chooses from the same spectrum of beneficial effects at a locus independent of the fitness rank of the present wild type. I show that these findings are reasonably robust to deviations from several assumptions. I further show that one can back calculate the mean size of new beneficial mutations from the observed mean size of fixed beneficial mutations.
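
The invariance of the mean can be checked with a small exact calculation. The sketch below is not the paper's derivation: it assumes an exponential right tail of fitness, for which the expected gap between the k-th and (k+1)-th fittest alleles is scale/k, and averages the fitness gain over the i-1 alleles fitter than a wild type of rank i. The function name and the `scale` parameter are illustrative.

```python
from fractions import Fraction


def mean_beneficial_effect(i, scale=Fraction(1)):
    """Expected fitness gain, averaged over the i-1 alleles fitter than a wild type of rank i.

    Assumes an exponential fitness tail, so the expected gap between
    the k-th and (k+1)-th fittest alleles is scale / k.
    """
    if i < 2:
        raise ValueError("a wild type of rank 1 has no beneficial mutations")
    gaps = [scale / k for k in range(1, i)]  # gaps[k-1] = E[X_(k) - X_(k+1)]
    total = Fraction(0)
    for j in range(1, i):  # allele of rank j is fitter than the wild type
        total += sum(gaps[j - 1:])  # E[X_(j) - X_(i)] = gaps j, j+1, ..., i-1
    return total / (i - 1)


# The mean does not depend on the fitness rank of the wild type.
for rank in (2, 5, 20):
    print(rank, mean_beneficial_effect(rank))
```

Each gap of size scale/k is counted k times in the double sum, so the total is (i-1)·scale and the mean is always `scale`, whatever the rank of the wild type.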

Journal ArticleDOI
01 Jan 2003-Genetics
TL;DR: Four crosses were made between inbred Cannabis sativa plants with pure cannabidiol (CBD) and pure Δ-9-tetrahydrocannabinol (THC) chemotypes, and it is suggested that codominance is due to the codification by the two alleles for different isoforms of the same synthase, having different specificity for the conversion of the common precursor cannabigerol into CBD or THC.
Abstract: Four crosses were made between inbred Cannabis sativa plants with pure cannabidiol (CBD) and pure Δ-9-tetrahydrocannabinol (THC) chemotypes. All the plants belonging to the F1's were analyzed by gas chromatography for cannabinoid composition and constantly found to have a mixed CBD-THC chemotype. Ten individual F1 plants were self-fertilized, and 10 inbred F2 offspring were collected and analyzed. In all cases, a segregation of the three chemotypes (pure CBD, mixed CBD-THC, and pure THC) fitting a 1:2:1 proportion was observed. The CBD/THC ratio was found to be significantly progeny specific and transmitted from each F1 to the F2's derived from it. A model involving one locus, B, with two alleles, BD and BT, is proposed, with the two alleles being codominant. The mixed chemotypes are interpreted as due to the genotype BD/BT at the B locus, while the pure-chemotype plants are due to homozygosity at the B locus (either BD/BD or BT/BT). It is suggested that such codominance is due to the codification by the two alleles for different isoforms of the same synthase, having different specificity for the conversion of the common precursor cannabigerol into CBD or THC, respectively. The F2 segregating groups were used in a bulk segregant analysis of the pooled DNAs for screening RAPD primers; three chemotype-associated markers are described, one of which has been transformed in a sequence-characterized amplified region (SCAR) marker and shows tight linkage to the chemotype and codominance.
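
A fit to the 1:2:1 proportion expected under the one-locus codominant model is typically assessed with a chi-square goodness-of-fit test. The sketch below shows that calculation. The abstract does not report the counts or name the test used, so the function name and the example counts are hypothetical.

```python
# 5% critical value of the chi-square distribution with 2 degrees of freedom
CRITICAL_5PCT_DF2 = 5.991


def chi_square_1_2_1(n_cbd, n_mixed, n_thc):
    """Chi-square statistic for observed chemotype counts against a 1:2:1 ratio."""
    observed = [n_cbd, n_mixed, n_thc]
    total = sum(observed)
    expected = [total * 0.25, total * 0.5, total * 0.25]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical F2 progeny: 20 pure CBD, 60 mixed, 20 pure THC.
stat = chi_square_1_2_1(20, 60, 20)
print(stat, "consistent with 1:2:1" if stat < CRITICAL_5PCT_DF2 else "deviates from 1:2:1")
```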

Journal ArticleDOI
01 Dec 2003-Genetics
TL;DR: The results suggest that once a conjugative bacterial plasmid has invaded a bacterial population it will remain even if the original selection is discontinued, because the cost of carrying it in the absence of antibiotic selection is reduced through genetic changes in both the plasmid and the host.
Abstract: Although plasmids can provide beneficial functions to their host bacteria, they might confer a physiological or energetic cost. This study examines how natural selection may reduce the cost of carrying conjugative plasmids with drug-resistance markers in the absence of antibiotic selection. We studied two plasmids, R1 and RP4, both of which carry multiple drug resistance genes and were shown to impose an initial fitness cost on Escherichia coli. To determine if and how the cost could be reduced, we subjected plasmid-containing bacteria to 1100 generations of evolution in batch cultures. Analysis of the evolved populations revealed that plasmid loss never occurred, but that the cost was reduced through genetic changes in both the plasmids and the bacteria. Changes in the plasmids were inferred by the demonstration that evolved plasmids no longer imposed a cost on their hosts when transferred to a plasmid-free clone of the ancestral E. coli. Changes in the bacteria were shown by the lowered cost when the ancestral plasmids were introduced into evolved bacteria that had been cured of their (evolved) plasmids. Additionally, changes in the bacteria were inferred because conjugative transfer rates of evolved R1 plasmids were lower in the evolved host than in the ancestral host. Our results suggest that once a conjugative bacterial plasmid has invaded a bacterial population it will remain even if the original selection is discontinued.

Journal ArticleDOI
01 Jan 2003-Genetics
TL;DR: This work demonstrates that it will be feasible to combine genetic and functional genomic approaches in the Drosophila hematopoietic system to systematically identify oncogene-specific downstream targets.
Abstract: We use the Drosophila melanogaster larval hematopoietic system as an in vivo model for the genetic and functional genomic analysis of oncogenic cell overproliferation. Ras regulates cell proliferation and differentiation in multicellular eukaryotes. To further elucidate the role of activated Ras in cell overproliferation, we generated a collagen promoter-Gal4 strain to overexpress Ras V12 in Drosophila hemocytes. Activated Ras causes a dramatic increase in the number of circulating larval hemocytes (blood cells), which is caused by cellular overproliferation. This phenotype is mediated by the Raf/MAPK pathway. The mutant hemocytes retain the ability to phagocytose bacteria as well as to differentiate into lamellocytes. Microarray analysis of hemocytes overexpressing Ras V12 vs. Ras + identified 279 transcripts that are differentially expressed threefold or more in hemocytes expressing activated Ras. This work demonstrates that it will be feasible to combine genetic and functional genomic approaches in the Drosophila hematopoietic system to systematically identify oncogene-specific downstream targets.

Journal ArticleDOI
01 Jun 2003-Genetics
TL;DR: This article relaxes earlier assumptions, allowing for an arbitrary distribution among demes of reproductive success, both beneficial and deleterious effects, and arbitrary dominance; the results are verified by simulation for a broad range of population structures, including the island model, the stepping-stone model, and a model with extinction and recolonization.
Abstract: New alleles arising in a population by mutation ultimately are either fixed or lost. Either is possible, for both beneficial and deleterious alleles, because of stochastic changes in allele frequency due to genetic drift. Spatially structured populations differ from unstructured populations in the probability of fixation and the time that this fixation takes. Previous results have generally made many assumptions: that all demes contribute to the next generation in exact proportion to their current sizes, that new mutations are beneficial, and that new alleles have additive effects. In this article these assumptions are relaxed, allowing for an arbitrary distribution among demes of reproductive success, both beneficial and deleterious effects, and arbitrary dominance. The effects of population structure can be expressed with two summary statistics: the effective population size and a variant of Wright's $F_{ST}$. In general, the probability of fixation is strongly affected by population structure, as is the expected time to fixation or loss. Population structure changes the effective size of the species, often strongly downward; smaller effective size increases the probability of fixing deleterious alleles and decreases the probability of fixing beneficial alleles. On the other hand, population structure causes an increase in the homozygosity of alleles, which increases the probability of fixing beneficial alleles but somewhat decreases the probability of fixing deleterious alleles. The probability of fixing new beneficial alleles can be simply described by $2hs(1 - F_{ST})N_e/N_{tot}$, where $hs$ is the change in fitness of heterozygotes relative to the ancestral homozygote, $F_{ST}$ is a weighted version of Wright's measure of population subdivision, and $N_e$ and $N_{tot}$ are the effective and census sizes, respectively. These results are verified by simulation for a broad range of population structures, including the island model, the stepping-stone model, and a model with extinction and recolonization.
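
The approximation for new beneficial alleles quoted in the abstract, 2hs(1 - F_ST)N_e/N_tot, is simple enough to evaluate directly. The sketch below does only that. The function name and the parameter values are hypothetical, and the abstract does not state the range of hs over which the approximation holds.

```python
def fixation_prob_beneficial(h, s, f_st, n_e, n_tot):
    """Approximate fixation probability of a new beneficial allele in a structured population.

    h * s : fitness change of heterozygotes relative to the ancestral homozygote
    f_st  : weighted version of Wright's F_ST
    n_e   : effective population size
    n_tot : census population size
    """
    return 2 * h * s * (1 - f_st) * n_e / n_tot


# Unstructured case (F_ST = 0, N_e = N_tot): the expression reduces to 2hs.
print(fixation_prob_beneficial(h=0.5, s=0.02, f_st=0.0, n_e=1000, n_tot=1000))

# Hypothetical structured case: subdivision and a reduced effective size lower the value.
print(fixation_prob_beneficial(h=0.5, s=0.02, f_st=0.2, n_e=500, n_tot=1000))
```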

Journal ArticleDOI
01 Feb 2003-Genetics
TL;DR: These findings must contribute to the comprehensive understanding of visual capabilities of zebrafish and the evolution of the fish visual system and should become a basis of further studies on expression and developmental regulation of the opsin genes.
Abstract: Zebrafish is becoming a powerful animal model for the study of vision but the genomic organization and variation of its visual opsins have not been fully characterized. We show here that zebrafish has two red (LWS-1 and LWS-2), four green (RH2-1, RH2-2, RH2-3, and RH2-4), and single blue (SWS2) and ultraviolet (SWS1) opsin genes in the genome, among which LWS-2, RH2-2, and RH2-3 are novel. SWS2, LWS-1, and LWS-2 are located in tandem and RH2-1, RH2-2, RH2-3, and RH2-4 form another tandem gene cluster. The peak absorption spectra (λmax) of the reconstituted photopigments from the opsin cDNAs differed markedly among them: 558 nm (LWS-1), 548 nm (LWS-2), 467 nm (RH2-1), 476 nm (RH2-2), 488 nm (RH2-3), 505 nm (RH2-4), 355 nm (SWS1), 416 nm (SWS2), and 501 nm (RH1, rod opsin). The quantitative RT-PCR revealed a considerable difference among the opsin genes in the expression level in the retina. The expression of the two red opsin genes and of three green opsin genes, RH2-1, RH2-3, and RH2-4, is significantly lower than that of RH2-2, SWS1, and SWS2. These findings must contribute to our comprehensive understanding of visual capabilities of zebrafish and the evolution of the fish visual system and should become a basis of further studies on expression and developmental regulation of the opsin genes.

Journal ArticleDOI
01 Aug 2003-Genetics
TL;DR: The Saccharomyces cerevisiae genome sequence, DNA microarray expression data, tRNA gene numbers, and functional categorizations of proteins are employed to determine whether the amino acid composition of peptides reflects natural selection to optimize the speed and accuracy of translation.
Abstract: The primary structures of peptides may be adapted for efficient synthesis as well as proper function. Here, the Saccharomyces cerevisiae genome sequence, DNA microarray expression data, tRNA gene numbers, and functional categorizations of proteins are employed to determine whether the amino acid composition of peptides reflects natural selection to optimize the speed and accuracy of translation. Strong relationships between synonymous codon usage bias and estimates of transcript abundance suggest that DNA array data serve as adequate predictors of translation rates. Amino acid usage also shows striking relationships with expression levels. Stronger correlations between tRNA concentrations and amino acid abundances among highly expressed proteins than among less abundant proteins support adaptation of both tRNA abundances and amino acid usage to enhance the speed and accuracy of protein synthesis. Natural selection for efficient synthesis appears to also favor shorter proteins as a function of their expression levels. Comparisons restricted to proteins within functional classes are employed to control for differences in amino acid composition and protein size that reflect differences in the functional requirements of proteins expressed at different levels.