
Showing papers in "Genetics in 2007"


Journal ArticleDOI
01 Dec 2007-Genetics
TL;DR: This study shows that markers can capture genetic relationships among genotyped animals, thereby affecting accuracies of GEBVs, and the method of choice was Bayes-B; FR–LS should be investigated further, whereas RR–BLUP cannot be recommended.
Abstract: The success of genomic selection depends on the potential to predict genome-assisted breeding values (GEBVs) with high accuracy over several generations without additional phenotyping after estimating marker effects. Results from both simulations and practical applications have to be evaluated for this potential, which requires linkage disequilibrium (LD) between markers and QTL. This study shows that markers can capture genetic relationships among genotyped animals, thereby affecting accuracies of GEBVs. Strategies to validate the accuracy of GEBVs due to LD are given. Simulations were used to show that accuracies of GEBVs obtained by fixed regression–least squares (FR–LS), random regression–best linear unbiased prediction (RR–BLUP), and Bayes-B are nonzero even without LD. When LD was present, accuracies decrease rapidly in generations after estimation due to the decay of genetic relationships. However, there is a persistent accuracy due to LD, which can be estimated by modeling the decay of genetic relationships and the decay of LD. The impact of genetic relationships was greatest for RR–BLUP. The accuracy of GEBVs can result entirely from genetic relationships captured by markers, and to validate the potential of genomic selection, several generations have to be analyzed to estimate the accuracy due to LD. The method of choice was Bayes-B; FR–LS should be investigated further, whereas RR–BLUP cannot be recommended.

1,147 citations
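The RR–BLUP estimation step named in the abstract above can be illustrated with a minimal ridge-regression sketch. It is not the authors' simulation design: the population size, marker count, variances, and every function name here are hypothetical, genotypes are drawn independently (so there is no LD or pedigree structure), and accuracy is measured only within the training animals rather than over later generations.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_marker_effects(z, y, lam):
    """Return (Z'Z + lam*I)^-1 Z'y for centered genotypes z and phenotypes y."""
    n, p = len(z), len(z[0])
    ztz = [[sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    zty = [sum(z[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(ztz, zty)

def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Hypothetical toy data: 200 animals, 20 unlinked markers coded 0/1/2.
random.seed(1)
n_animals, n_markers, sigma_e = 200, 20, 3.0
true_effects = [random.gauss(0.0, 1.0) for _ in range(n_markers)]
raw = [[random.randint(0, 1) + random.randint(0, 1) for _ in range(n_markers)]
       for _ in range(n_animals)]
col_means = [sum(row[j] for row in raw) / n_animals for j in range(n_markers)]
z = [[row[j] - col_means[j] for j in range(n_markers)] for row in raw]

# True breeding values and noisy phenotypes.
tbv = [sum(zij * b for zij, b in zip(row, true_effects)) for row in z]
y = [g + random.gauss(0.0, sigma_e) for g in tbv]
y_mean = sum(y) / n_animals
y_centered = [v - y_mean for v in y]

# Shrinkage parameter: residual variance / marker-effect variance (both assumed known).
lam = sigma_e ** 2 / 1.0
effects = ridge_marker_effects(z, y_centered, lam)
gebv = [sum(zij * b for zij, b in zip(row, effects)) for row in z]
accuracy = correlation(gebv, tbv)
print(f"accuracy in the training animals: {accuracy:.2f}")
```

Separating the contribution of LD from that of genetic relationships, which is the point of the study, would additionally require simulating a pedigree and evaluating accuracy in several later generations.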


Journal ArticleDOI
01 Jun 2007-Genetics
TL;DR: This work presents a method for rapidly calculating the distribution of Δm,n,b and demonstrates that it has comparable power to and a much improved running time over previous methods, especially in detecting recombination in large data sets.
Abstract: Statistical tests for detecting mosaic structure or recombination among nucleotide sequences usually rely on identifying a pattern or a signal that would be unlikely to appear under clonal reproduction. Dozens of such tests have been described, but many are hampered by long running times, confounding of selection and recombination, and/or inability to isolate the mosaic-producing event. We introduce a test that is exact, nonparametric, rapidly computable, free of the infinite-sites assumption, able to distinguish between recombination and variation in mutation/fixation rates, and able to identify the breakpoints and sequences involved in the mosaic-producing event. Our test considers three sequences at a time: two parent sequences that may have recombined, with one or two breakpoints, to form the third sequence (the child sequence). Excess similarity of the child sequence to a candidate recombinant of the parents is a sign of recombination; we take the maximum value of this excess similarity as our test statistic Δm,n,b. We present a method for rapidly calculating the distribution of Δm,n,b and demonstrate that it has comparable power to and a much improved running time over previous methods, especially in detecting recombination in large data sets.

699 citations


Journal ArticleDOI
01 Jan 2007-Genetics
TL;DR: A modified algorithm called inclusive composite interval mapping (ICIM) is proposed in this article, which retains all advantages of CIM over IM and avoids the possible increase of sampling variance and the complicated background marker selection process in CIM.
Abstract: Composite interval mapping (CIM) is the most commonly used method for mapping quantitative trait loci (QTL) with populations derived from biparental crosses. However, the algorithm implemented in the popular QTL Cartographer software may not completely ensure all its advantageous properties. In addition, different background marker selection methods may give very different mapping results, and the nature of the preferred method is not clear. A modified algorithm called inclusive composite interval mapping (ICIM) is proposed in this article. In ICIM, marker selection is conducted only once through stepwise regression by considering all marker information simultaneously, and the phenotypic values are then adjusted by all markers retained in the regression equation except the two markers flanking the current mapping interval. The adjusted phenotypic values are finally used in interval mapping (IM). The modified algorithm has a simpler form than that used in CIM, but a faster convergence speed. ICIM retains all advantages of CIM over IM and avoids the possible increase of sampling variance and the complicated background marker selection process in CIM. Extensive simulations using two genomes and various genetic models indicated that ICIM has increased detection power, a reduced false detection rate, and less biased estimates of QTL effects.

685 citations


Journal ArticleDOI
01 Mar 2007-Genetics
TL;DR: A model-based method for using multilocus sequence data to infer the clonal relationships of bacteria and the chromosomal position of homologous recombination events that disrupt a clonal pattern of inheritance is described.
Abstract: We describe a model-based method for using multilocus sequence data to infer the clonal relationships of bacteria and the chromosomal position of homologous recombination events that disrupt a clonal pattern of inheritance. The key assumption of our model is that recombination events introduce a constant rate of substitutions to a contiguous region of sequence. The method is applicable both to multilocus sequence typing (MLST) data from a few loci and to alignments of multiple bacterial genomes. It can be used to decide whether a subset of isolates share common ancestry, to estimate the age of the common ancestor, and hence to address a variety of epidemiological and ecological questions that hinge on the pattern of bacterial spread. It should also be useful in associating particular genetic events with the changes in phenotype that they cause. We show that the model outperforms existing methods of subdividing recombinogenic bacteria using MLST data and provide examples from Salmonella and Bacillus. The software used in this article, ClonalFrame, is available from http://bacteria.stats.ox.ac.uk/.

685 citations


Journal ArticleDOI
01 Mar 2007-Genetics
TL;DR: Using transposon mutagenesis in Drosophila, a library of 7404 protein trap and enhancer trap lines, the Carnegie collection, is constructed to facilitate gene expression mapping at single-cell resolution and finds that 600–900 different genes are trapped in the collection.
Abstract: Metazoan physiology depends on intricate patterns of gene expression that remain poorly known. Using transposon mutagenesis in Drosophila, we constructed a library of 7404 protein trap and enhancer trap lines, the Carnegie collection, to facilitate gene expression mapping at single-cell resolution. By sequencing the genomic insertion sites, determining splicing patterns downstream of the enhanced green fluorescent protein (EGFP) exon, and analyzing expression patterns in the ovary and salivary gland, we found that 600–900 different genes are trapped in our collection. A core set of 244 lines trapped different identifiable protein isoforms, while insertions likely to act as GFP-enhancer traps were found in 256 additional genes. At least 8 novel genes were also identified. Our results demonstrate that the Carnegie collection will be useful as a discovery tool in diverse areas of cell and developmental biology and suggest new strategies for greatly increasing the coverage of the Drosophila proteome with protein trap insertions.

567 citations


Journal ArticleDOI
01 Jul 2007-Genetics
TL;DR: In this article, the authors analyze the dynamics of multiple mutations and the interplay between multiple mutations and interference between clones in a single asexual population, and show that the amount of variation is determined by a balance between selection, which destroys variation, and beneficial mutations, which create more.
Abstract: When beneficial mutations are rare, they accumulate by a series of selective sweeps. But when they are common, many beneficial mutations will occur before any can fix, so there will be many different mutant lineages in the population concurrently. In an asexual population, these different mutant lineages interfere and not all can fix simultaneously. In addition, further beneficial mutations can accumulate in mutant lineages while these are still a minority of the population. In this article, we analyze the dynamics of such multiple mutations and the interplay between multiple mutations and interference between clones. These result in substantial variation in fitness accumulating within a single asexual population. The amount of variation is determined by a balance between selection, which destroys variation, and beneficial mutations, which create more. The behavior depends in a subtle way on the population parameters: the population size, the beneficial mutation rate, and the distribution of the fitness increments of the potential beneficial mutations. The mutation–selection balance leads to a continually evolving population with a steady-state fitness variation. This variation increases logarithmically with both population size and mutation rate and sets the rate at which the population accumulates beneficial mutations, which thus also grows only logarithmically with population size and mutation rate. These results imply that mutator phenotypes are less effective in larger asexual populations. They also have consequences for the advantages (or disadvantages) of sex via the Fisher–Muller effect; these are discussed briefly.

512 citations


Journal ArticleDOI
01 Feb 2007-Genetics
TL;DR: Early studies of recombination in yeast and more recent studies in Drosophila and mammalian systems suggest that unequal crossover is the major driving force in the evolution of the rRNA genes with sister chromatid exchange occurring more often than exchange between homologs.
Abstract: Evolution of the tandemly repeated ribosomal RNA (rRNA) genes is intriguing because in each species all units within the array are highly uniform in sequence but that sequence differs between species. In this review we summarize the origins of the current models to explain this process of concerted evolution, emphasizing early studies of recombination in yeast and more recent studies in Drosophila and mammalian systems. These studies suggest that unequal crossover is the major driving force in the evolution of the rRNA genes with sister chromatid exchange occurring more often than exchange between homologs. Gene conversion is also believed to play a role; however, direct evidence for its involvement has not been obtained. Remarkably, concerted evolution is so well orchestrated that even transposable elements that insert into a large fraction of the rRNA genes appear to have little effect on the process. Finally, we summarize data that suggest that recombination in the rDNA locus of higher eukaryotes is sufficiently frequent to monitor changes within a few generations.

489 citations


Journal ArticleDOI
01 Jul 2007-Genetics
TL;DR: The popular Bayesian clustering approach STRUCTURE is extended for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers, and it is shown that selfing leads to spurious signals of population substructure using the standard STRUCTURE algorithm, with a bias toward spurious signals of admixture.
Abstract: Nonrandom mating induces correlations in allelic states within and among loci that can be exploited to understand the genetic structure of natural populations (Wright 1965). For many species, it is of considerable interest to quantify the contribution of two forms of nonrandom mating to patterns of standing genetic variation: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes). Here, we extend the popular Bayesian clustering approach STRUCTURE (Pritchard et al. 2000) for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers. This is accomplished by eliminating the assumption of Hardy–Weinberg equilibrium within clusters and, instead, calculating expected genotype frequencies on the basis of inbreeding or selfing rates. We demonstrate the need for such an extension by showing that selfing leads to spurious signals of population substructure using the standard STRUCTURE algorithm with a bias toward spurious signals of admixture. We gauge the performance of our method using extensive coalescent simulations and demonstrate that our approach can correct for this bias. We also apply our approach to understanding the population structure of the wild relative of domesticated rice, Oryza rufipogon, an important partially selfing grass species. Using a sample of n = 16 individuals sequenced at 111 random loci, we find strong evidence for existence of two subpopulations, which correlates well with geographic location of sampling, and estimate selfing rates for both groups that are consistent with estimates from experimental data (s ≈ 0.48–0.70).

457 citations


Journal ArticleDOI
01 Sep 2007-Genetics
TL;DR: A novel approach for genomewide pedigree-based quantitative trait loci (QTL) association analysis is suggested: genomewide rapid association using mixed model and regression (GRAMMAR), which first obtains residuals adjusted for family effects and subsequently analyzes the association between residuals and genetic polymorphisms using rapid least-squares methods.
Abstract: For pedigree-based quantitative trait loci (QTL) association analysis, a range of methods utilizing within-family variation such as transmission-disequilibrium test (TDT)-based methods have been developed. In scenarios where stratification is not a concern, methods exploiting between-family variation in addition to within-family variation, such as the measured genotype (MG) approach, have greater power. Application of MG methods can be computationally demanding (especially for large pedigrees), making genomewide scans practically infeasible. Here we suggest a novel approach for genomewide pedigree-based quantitative trait loci (QTL) association analysis: genomewide rapid association using mixed model and regression (GRAMMAR). The method first obtains residuals adjusted for family effects and subsequently analyzes the association between these residuals and genetic polymorphisms using rapid least-squares methods. At the final step, the selected polymorphisms may be followed up with the full measured genotype (MG) analysis. In a simulation study, we compared type 1 error, power, and operational characteristics of the proposed method with those of MG and TDT-based approaches. For moderately heritable (30%) traits in human pedigrees the power of the GRAMMAR and the MG approaches is similar and is much higher than that of TDT-based approaches. When using tabulated thresholds, the proposed method is less powerful than MG for very high heritabilities and pedigrees including large sibships like those observed in livestock pedigrees. However, there is little or no difference in empirical power of MG and the proposed method. In any scenario, GRAMMAR is much faster than MG and enables rapid analysis of hundreds of thousands of markers.

449 citations
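The two-stage idea in the GRAMMAR abstract above (adjust phenotypes for family effects, then scan markers by fast least squares on the residuals) can be sketched as follows. This is a simplified stand-in, not the published method: stage 1 here subtracts family means instead of fitting a polygenic mixed model, genotypes are drawn independently rather than transmitted within pedigrees, and all names, sample sizes, and effect sizes are hypothetical.

```python
import random

def family_residuals(y, fam):
    """Stage 1 stand-in: subtract each family's mean phenotype."""
    sums, counts = {}, {}
    for v, f in zip(y, fam):
        sums[f] = sums.get(f, 0.0) + v
        counts[f] = counts.get(f, 0) + 1
    return [v - sums[f] / counts[f] for v, f in zip(y, fam)]

def slope_t(x, r):
    """Stage 2: least-squares slope of residuals on genotype and its t statistic."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - mr) for a, b in zip(x, r))
    beta = sxy / sxx
    rss = sum((b - mr - beta * (a - mx)) ** 2 for a, b in zip(x, r))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, beta / se

# Hypothetical toy data: 50 families of 8, 30 markers, marker 0 is causal.
random.seed(2)
n_fam, fam_size, n_markers, causal = 50, 8, 30, 0
fam, geno, y = [], [], []
for f in range(n_fam):
    fam_effect = random.gauss(0.0, 1.0)  # shared family effect
    for _ in range(fam_size):
        g = [random.randint(0, 1) + random.randint(0, 1) for _ in range(n_markers)]
        fam.append(f)
        geno.append(g)
        y.append(fam_effect + 0.8 * g[causal] + random.gauss(0.0, 1.0))

# Residuals are computed once, then reused for every marker.
resid = family_residuals(y, fam)
stats = [slope_t([g[j] for g in geno], resid) for j in range(n_markers)]
best = max(range(n_markers), key=lambda j: abs(stats[j][1]))
print("top marker:", best, "t =", round(stats[best][1], 2))
```

The speed advantage described in the abstract comes from this structure: the expensive family adjustment runs once, and each marker then costs only a simple regression.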


Journal ArticleDOI
01 Nov 2007-Genetics
TL;DR: Mapped diversity array technology markers were used to find associations with resistance to stem rust, leaf rust, yellow rust, and powdery mildew, plus grain yield in five historical wheat international multienvironment trials from the International Maize and Wheat Improvement Center.
Abstract: Linkage disequilibrium can be used for identifying associations between traits of interest and genetic markers. This study used mapped diversity array technology (DArT) markers to find associations with resistance to stem rust, leaf rust, yellow rust, and powdery mildew, plus grain yield in five historical wheat international multienvironment trials from the International Maize and Wheat Improvement Center (CIMMYT). Two linear mixed models were used to assess marker–trait associations incorporating information on population structure and covariance between relatives. An integrated map containing 813 DArT markers and 831 other markers was constructed. Several linkage disequilibrium clusters bearing multiple host plant resistance genes were found. Most of the associated markers were found in genomic regions where previous reports had found genes or quantitative trait loci (QTL) influencing the same traits, providing an independent validation of this approach. In addition, many new chromosome regions for disease resistance and grain yield were identified in the wheat genome. Phenotyping across up to 60 environments and years allowed modeling of genotype × environment interaction, thereby making possible the identification of markers contributing to both additive and additive × additive interaction effects of traits.

429 citations


Journal ArticleDOI
01 Nov 2007-Genetics
TL;DR: The key idea is to directly simulate the quantity of interest, e.g., response to selection, rather than trying to approximate it using some ad hoc measure of heritability.
Abstract: Heritability is often used by plant breeders and geneticists as a measure of precision of a trial or a series of trials. Its main use is for computing the response to selection. Most formulas proposed for calculating heritability implicitly assume balanced data and independent genotypic effects. Both of these assumptions are often violated in plant breeding trials. This article proposes a simulation-based approach to tackle the problem. The key idea is to directly simulate the quantity of interest, e.g., response to selection, rather than trying to approximate it using some ad hoc measure of heritability. The approach is illustrated by three examples.
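A minimal sketch of the "simulate the quantity of interest" idea, assuming a very simple unbalanced trial: true genotypic values and plot errors are drawn from normal distributions, each genotype gets between one and four plots, and selection uses plain genotype means. The variance components, the selected fraction, and the function names are hypothetical; the article itself works from fitted models rather than assumed values.

```python
import random

def mean(values):
    return sum(values) / len(values)

def one_trial(rng, n_geno, sigma_g, sigma_e, frac):
    """Simulate one unbalanced trial; return (realized, best possible) response."""
    g = [rng.gauss(0.0, sigma_g) for _ in range(n_geno)]
    est = []
    for gi in g:
        n_plots = rng.randint(1, 4)  # unbalanced: 1 to 4 plots per genotype
        est.append(mean([gi + rng.gauss(0.0, sigma_e) for _ in range(n_plots)]))
    k = max(1, int(frac * n_geno))
    # Select the top fraction on estimated values, then score on true values.
    chosen = sorted(range(n_geno), key=lambda i: est[i], reverse=True)[:k]
    ideal = sorted(g, reverse=True)[:k]
    return mean([g[i] for i in chosen]) - mean(g), mean(ideal) - mean(g)

rng = random.Random(3)
runs = [one_trial(rng, n_geno=100, sigma_g=1.0, sigma_e=1.5, frac=0.1)
        for _ in range(500)]
response = mean([r[0] for r in runs])
ideal_response = mean([r[1] for r in runs])
print(f"simulated response: {response:.2f} (perfect selection: {ideal_response:.2f})")
```

The response is read directly off the simulated true values of the selected genotypes, so no heritability formula that presumes balanced data is needed.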

Journal ArticleDOI
01 Mar 2007-Genetics
TL;DR: The first large-scale global eQTL study in a relatively large plant mapping population reveals that the genetic control of transcript level is highly variable and multifaceted and that this complexity may be a general characteristic of eukaryotes.
Abstract: The genetic architecture of transcript-level variation is largely unknown. The genetic determinants of transcript-level variation were characterized in a recombinant inbred line (RIL) population (n = 211) of Arabidopsis thaliana using whole-genome microarray analysis and expression quantitative trait loci (eQTL) mapping of transcript levels as expression traits (e-traits). Genetic control of transcription was highly complex: one-third of the quantitatively controlled transcripts/e-traits were regulated by cis-eQTL, and many trans-eQTL mapped to hotspots that regulated hundreds to thousands of e-traits. Several thousand eQTL of large phenotypic effect were detected, but almost all (93%) of the 36,871 eQTL were associated with small phenotypic effects (R² < 0.3). Many transcripts/e-traits were controlled by multiple eQTL with opposite allelic effects and exhibited higher heritability in the RILs than their parents, suggesting nonadditive genetic variation. To our knowledge, this is the first large-scale global eQTL study in a relatively large plant mapping population. It reveals that the genetic control of transcript level is highly variable and multifaceted and that this complexity may be a general characteristic of eukaryotes.

Journal ArticleDOI
01 Dec 2007-Genetics
TL;DR: A maximum-likelihood approach is developed, based on the expected allele-frequency distribution generated by transition matrix methods, to estimate parameters of the DFE while simultaneously estimating parameters of a demographic model that allows a population size change at some time in the past.
Abstract: The distribution of fitness effects of new mutations (DFE) is important for addressing several questions in genetics, including the nature of quantitative variation and the evolutionary fate of small populations. Properties of the DFE can be inferred by comparing the distributions of the frequencies of segregating nucleotide polymorphisms at selected and neutral sites in a population sample, but demographic changes alter the spectrum of allele frequencies at both neutral and selected sites, so can bias estimates of the DFE if not accounted for. We have developed a maximum-likelihood approach, based on the expected allele-frequency distribution generated by transition matrix methods, to estimate parameters of the DFE while simultaneously estimating parameters of a demographic model that allows a population size change at some time in the past. We tested the method using simulations and found that it accurately recovers simulated parameter values, even if the simulated demography differs substantially from that assumed in our analysis. We use our method to estimate parameters of the DFE for amino acid-changing mutations in humans and Drosophila melanogaster. For a model of unconditionally deleterious mutations, with effects sampled from a gamma distribution, the mean estimate for the distribution shape parameter is ∼0.2 for human populations, which implies that the DFE is strongly leptokurtic. For Drosophila populations, we estimate that the shape parameter is ∼0.35. Differences in the shape of the distribution and the mean selection coefficient between humans and Drosophila result in significantly more strongly deleterious mutations in Drosophila than in humans, and, conversely, nearly neutral mutations are significantly less frequent.
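The link between a small gamma shape parameter and a strongly leptokurtic distribution can be made explicit with a standard property of the gamma distribution: its excess kurtosis depends only on the shape parameter. The symbols k (shape) and γ₂ (excess kurtosis) are notation introduced here, not taken from the abstract.

```latex
\gamma_2 = \frac{6}{k}, \qquad
k \approx 0.2 \;\Rightarrow\; \gamma_2 \approx 30, \qquad
k \approx 0.35 \;\Rightarrow\; \gamma_2 \approx 17.1
```

For comparison, a normal distribution has excess kurtosis 0, so both estimates describe distributions with most mass near zero and a long tail of large effects.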

Journal ArticleDOI
01 Nov 2007-Genetics
TL;DR: A relatively high proportion of SRR genes have experienced accelerated divergence throughout the genus Drosophila, and several testis-specific genes, male seminal fluid proteins (SFPs), and spermatogenesis genes show lineage-specific bursts of accelerated evolution and positive selection.
Abstract: A large portion of the annotated genes in Drosophila melanogaster show sex-biased expression, indicating that sex and reproduction-related genes (SRR genes) represent an appreciable component of the genome. Previous studies, in which subsets of genes were compared among few Drosophila species, have found that SRR genes exhibit unusual evolutionary patterns. Here, we have used the newly released genome sequences from 12 Drosophila species, coupled to a larger set of SRR genes, to comprehensively test the generality of these patterns. Among 2505 SRR genes examined, including ESTs with biased expression in reproductive tissues and genes characterized as involved in gametogenesis, we find that a relatively high proportion of SRR genes have experienced accelerated divergence throughout the genus Drosophila. Several testis-specific genes, male seminal fluid proteins (SFPs), and spermatogenesis genes show lineage-specific bursts of accelerated evolution and positive selection. SFP genes also show evidence of lineage-specific gene loss and/or gain. These results bring us closer to understanding the details of the evolutionary dynamics of SRR genes with respect to species divergence.

Journal ArticleDOI
01 Jun 2007-Genetics
TL;DR: A stochastic analysis of DMI accumulation is presented to predict probable levels of asymmetry as divergence time increases and indicates that unidirectional DMIs, specifically involving sex chromosomes, cytoplasmic elements, and maternal effects, are likely to play an important role in postmating isolation.
Abstract: Asymmetric postmating isolation, where reciprocal interspecific crosses produce different levels of fertilization success or hybrid sterility/inviability, is very common. Darwin emphasized its pervasiveness in plants, but it occurs in all taxa assayed. This asymmetry often results from Dobzhansky–Muller incompatibilities (DMIs) involving uniparentally inherited genetic factors (e.g., gametophyte–sporophyte interactions in plants or cytoplasmic–nuclear interactions). Typically, unidirectional (U) DMIs act simultaneously with bidirectional (B) DMIs between autosomal loci that affect reciprocal crosses equally. We model both classes of two-locus DMIs to make quantitative and qualitative predictions concerning patterns of isolation asymmetry in parental species crosses and in the hybrid F1 generation. First, we find conditions that produce expected differences. Second, we present a stochastic analysis of DMI accumulation to predict probable levels of asymmetry as divergence time increases. We find that systematic interspecific differences in relative rates of evolution for autosomal vs. nonautosomal loci can lead to different expected F1 fitnesses from reciprocal crosses, but asymmetries are more simply explained by stochastic differences in the accumulation of U DMIs. The magnitude of asymmetry depends primarily on the cumulative effects of U vs. B DMIs (which depend on heterozygous effects of DMIs), the average number of DMIs required to produce complete reproductive isolation (more asymmetry occurs when fewer DMIs are required), and the shape of the function describing how fitness declines as DMIs accumulate. Comparing our predictions to data from diverse taxa indicates that unidirectional DMIs, specifically involving sex chromosomes, cytoplasmic elements, and maternal effects, are likely to play an important role in postmating isolation.

Journal ArticleDOI
01 May 2007-Genetics
TL;DR: The first genetic transcript map of the soybean genome was created by mapping one SNP in each of 1141 genes in one or more of three recombinant inbred line mapping populations, thus providing a picture of the distribution of genic sequences across the mapped portion of the genome.
Abstract: The first genetic transcript map of the soybean genome was created by mapping one SNP in each of 1141 genes in one or more of three recombinant inbred line mapping populations, thus providing a picture of the distribution of genic sequences across the mapped portion of the genome. Single-nucleotide polymorphisms (SNPs) were discovered via the resequencing of sequence-tagged sites (STSs) developed from expressed sequence tag (EST) sequence. From an initial set of 9459 polymerase chain reaction primer sets designed to a diverse set of genes, 4240 STSs were amplified and sequenced in each of six diverse soybean genotypes. In the resulting 2.44 Mbp of aligned sequence, a total of 5551 SNPs were discovered, including 4712 single-base changes and 839 indels, for an average nucleotide diversity of θ = 0.000997. The analysis of the observed genetic distances between adjacent genes vs. the theoretical distribution based upon the assumption of a random distribution of genes across the 20 soybean linkage groups clearly indicated that genes were clustered. Of the 1141 genes, 291 mapped to 72 of the 112 gaps of 5–10 cM in the preexisting simple sequence repeat (SSR)-based map, while 111 genes mapped in 19 of the 26 gaps >10 cM. The addition of 1141 sequence-based genic markers to the soybean genome map will provide an important resource to soybean geneticists for quantitative trait locus discovery and map-based cloning, as well as to soybean breeders who increasingly depend upon marker-assisted selection in cultivar improvement.
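As a rough check on the nucleotide diversity figure, the sketch below applies Watterson's estimator, θ = S / (aₙ L), to the counts in the abstract. It rests on assumptions: the aligned length is taken to be 2.44 Mbp, the six genotypes are treated as six sampled sequences, and all 5551 polymorphisms (including indels) are counted as segregating sites. The abstract does not state which estimator the authors used, and the function name is hypothetical.

```python
def watterson_theta(segregating_sites, n_sequences, length_bp):
    """Watterson's estimator: S / (a_n * L), with a_n = sum of 1/i for i < n."""
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * length_bp)

# Counts from the abstract; the sequence count and aligned length are assumptions.
theta = watterson_theta(segregating_sites=5551, n_sequences=6, length_bp=2.44e6)
print(f"theta per site: {theta:.6f}")
```

Under these assumptions the estimate lands very close to the reported θ of about 0.001, which suggests the published figure is a per-site value of this kind.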

Journal ArticleDOI
01 Jun 2007-Genetics
TL;DR: The results indicate that TEs provide a natural mechanism for the origination of miRNAs that can contribute to regulatory divergence between species as well as a rich source for the discovery of as yet unknown miRNA genes.
Abstract: We sought to evaluate the extent of the contribution of transposable elements (TEs) to human microRNA (miRNA) genes along with the evolutionary dynamics of TE-derived human miRNAs. We found 55 experimentally characterized human miRNA genes that are derived from TEs, and these TE-derived miRNAs have the potential to regulate thousands of human genes. Sequence comparisons revealed that TE-derived human miRNAs are less conserved, on average, than non-TE-derived miRNAs. However, there are 18 TE-derived miRNAs that are relatively conserved, and 14 of these are related to the ancient L2 and MIR families. Comparison of miRNA vs. mRNA expression patterns for TE-derived miRNAs and their putative target genes showed numerous cases of anti-correlated expression that are consistent with regulation via mRNA degradation. In addition to the known human miRNAs that we show to be derived from TE sequences, we predict an additional 85 novel TE-derived miRNA genes. TE sequences are typically disregarded in genomic surveys for miRNA genes and target sites; this is a mistake. Our results indicate that TEs provide a natural mechanism for the origination of miRNAs that can contribute to regulatory divergence between species as well as a rich source for the discovery of as yet unknown miRNA genes.

Journal ArticleDOI
01 Jan 2007-Genetics
TL;DR: This first multigene association genetic study in forest trees has shown the feasibility of candidate gene strategies for dissecting complex adaptive traits, provided that genes belonging to key pathways and appropriate statistical tools are used.
Abstract: Genetic association is a powerful method for dissecting complex adaptive traits due to (i) fine-scale mapping resulting from historical recombination, (ii) wide coverage of phenotypic and genotypic variation within a single experiment, and (iii) the simultaneous discovery of loci and alleles. In this article, genetic association among single nucleotide polymorphisms (58 SNPs) from 20 wood- and drought-related candidate genes and an array of wood property traits with evolutionary and commercial importance, namely, earlywood and latewood specific gravity, percentage of latewood, earlywood microfibril angle, and wood chemistry (lignin and cellulose content), was tested using mixed linear models (MLMs) that account for relatedness among individuals by using a pairwise kinship matrix. Population structure, a common systematic bias in association studies, was assessed using 22 nuclear microsatellites. Different phenotype:genotype associations were found, some of them confirming previous evidence from collocation of QTL and genes in linkage maps (for example, 4cl and percentage of latewood) and two that involve nonsynonymous polymorphisms (cad SNP M28 with earlywood specific gravity and 4cl SNP M7 with percentage of latewood). The strongest genetic association found in this study was between allelic variation in α-tubulin, a gene involved in the formation of cortical microtubules, and earlywood microfibril angle. Intragenic LD decays rapidly in conifers; thus SNPs showing genetic association are likely to be located in close proximity to the causative polymorphisms. This first multigene association genetic study in forest trees has shown the feasibility of candidate gene strategies for dissecting complex adaptive traits, provided that genes belonging to key pathways and appropriate statistical tools are used. This approach is of particular utility in species such as conifers, where genomewide strategies are limited by their large genomes.

Journal ArticleDOI
01 Dec 2007-Genetics
TL;DR: The feasibility of genomewide association mapping in cultivated Asian rice using a modest number of SNPs is demonstrated and variation in LD patterns among genomic regions is suggested.
Abstract: Despite its status as one of the world's major crops, linkage disequilibrium (LD) patterns have not been systematically characterized across the genome of Asian rice (Oryza sativa). Such information is critical to fully exploit the genome sequence for mapping complex traits using association techniques. Here we characterize LD in five 500-kb regions of the rice genome in three major cultivated rice varieties (indica, tropical japonica, and temperate japonica) and in the wild ancestor of Asian rice, Oryza rufipogon. Using unlinked SNPs to determine the amount of background linkage disequilibrium in each population, we find that the extent of LD is greatest in temperate japonica (probably >500 kb), followed by tropical japonica (∼150 kb) and indica (∼75 kb). LD extends over a shorter distance in O. rufipogon (≪40 kb) than in any of the O. sativa groups assayed here. The differences in the extent of LD among these groups are consistent with differences in outcrossing and recombination rate estimates. As well as heterogeneity between groups, our results suggest variation in LD patterns among genomic regions. We demonstrate the feasibility of genomewide association mapping in cultivated Asian rice using a modest number of SNPs.

Journal ArticleDOI
01 Jan 2007-Genetics
TL;DR: Results show that interaction among individuals may create substantial heritable variation, which is hidden to classical analyses, and provides testable predictions of response to multilevel selection and reduces to classical theory in the absence of interaction.
Abstract: Interaction among individuals is universal, both in animals and in plants, and substantially affects evolution of natural populations and responses to artificial selection in agriculture. Although quantitative genetics has successfully been applied to many traits, it does not provide a general theory accounting for interaction among individuals and selection acting on multiple levels. Consequently, current quantitative genetic theory fails to explain why some traits do not respond to selection among individuals, but respond greatly to selection among groups. Understanding the full impacts of heritable interactions on the outcomes of selection requires a quantitative genetic framework including all levels of selection and relatedness. Here we present such a framework and provide expressions for the response to selection. Results show that interaction among individuals may create substantial heritable variation, which is hidden to classical analyses. Selection acting on higher levels of organization captures this hidden variation and therefore always yields positive response, whereas individual selection may yield response in the opposite direction. Our work provides testable predictions of response to multilevel selection and reduces to classical theory in the absence of interaction. Statistical methodology provided elsewhere enables empirical application of our work to both natural and domestic populations.

Journal ArticleDOI
01 Jun 2007-Genetics
TL;DR: The creation of a set of transgenic fly lines that allow spatially and temporally regulated expression of Drosophila Rab proteins make these transgenic lines a useful tool kit for investigating Rab functions in vivo.
Abstract: Rab proteins are small GTPases that play important roles in transport of vesicle cargo and recruitment, association of motor and other proteins with vesicles, and docking and fusion of vesicles at defined locations. In vertebrates, >75 Rab genes have been identified, some of which have been intensively studied for their roles in endosome and synaptic vesicle trafficking. Recent studies of the functions of certain Rab proteins have revealed specific roles in mediating developmental signal transduction. We have begun a systematic genetic study of the 33 Rab genes in Drosophila. Most of the fly proteins are clearly related to specific vertebrate proteins. We report here the creation of a set of transgenic fly lines that allow spatially and temporally regulated expression of Drosophila Rab proteins. We generated fluorescent protein-tagged wild-type, dominant-negative, and constitutively active forms of 31 Drosophila Rab proteins. We describe Drosophila Rab expression patterns during embryogenesis, the subcellular localization of some Rab proteins, and comparisons of the localization of wild-type, dominant-negative, and constitutively active forms of selected Rab proteins. The high evolutionary conservation and low redundancy of Drosophila Rab proteins make these transgenic lines a useful tool kit for investigating Rab functions in vivo.

Journal ArticleDOI
01 May 2007-Genetics
TL;DR: HvFT1 is identified as the main barley FT-like gene involved in the switch to flowering, and mapping of HvFT genes suggests that they provide important sources of flowering-time variation in barley.
Abstract: The FLOWERING LOCUS T (FT) gene plays a central role in integrating flowering signals in Arabidopsis because its expression is regulated antagonistically by the photoperiod and vernalization pathways. FT belongs to a family of six genes characterized by a phosphatidylethanolamine-binding protein (PEBP) domain. In rice (Oryza sativa), 19 PEBP genes were previously described, 13 of which are FT-like genes. Five FT-like genes were found in barley (Hordeum vulgare). HvFT1, HvFT2, HvFT3, and HvFT4 were highly homologous to OsFTL2 (the Hd3a QTL), OsFTL1, OsFTL10, and OsFTL12, respectively, and this relationship was supported by comparative mapping. No rice equivalent was found for HvFT5. HvFT1 was highly expressed under long-day (inductive) conditions at the time of the morphological switch of the shoot apex from vegetative to reproductive growth. HvFT2 and HvFT4 were expressed later in development. HvFT1 was therefore identified as the main barley FT-like gene involved in the switch to flowering. Mapping of HvFT genes suggests that they provide important sources of flowering-time variation in barley. HvFT1 was a candidate for VRN-H3, a dominant mutation giving precocious flowering, while HvFT3 was a candidate for Ppd-H2, a major QTL affecting flowering time in short days.

Journal ArticleDOI
01 Apr 2007-Genetics
TL;DR: The notion of the mean population partition is developed, which is the partition of individuals to populations that minimizes the squared partition distance to the partitions sampled by the MCMC algorithm.
Abstract: Inferring population structure from genetic data sampled from some number of individuals is a formidable statistical problem. One widely used approach considers the number of populations to be fixed and calculates the posterior probability of assigning individuals to each population. More recently, the assignment of individuals to populations and the number of populations have both been considered random variables that follow a Dirichlet process prior. We examined the statistical behavior of assignment of individuals to populations under a Dirichlet process prior. First, we examined a best-case scenario, in which all of the assumptions of the Dirichlet process prior were satisfied, by generating data under a Dirichlet process prior. Second, we examined the performance of the method when the genetic data were generated under a population genetics model with symmetric migration between populations. We examined the accuracy of population assignment using a distance on partitions. The method can be quite accurate with a moderate number of loci. As expected, inferences on the number of populations are more accurate when θ = 4Neu is large and when the migration rate (4Nem) is low. We also examined the sensitivity of inferences of population structure to choice of the parameter of the Dirichlet process model. Although inferences could be sensitive to the choice of the prior on the number of populations, this sensitivity occurred when the number of loci sampled was small; inferences are more robust to the prior on the number of populations when the number of sampled loci is large. Finally, we discuss several methods for summarizing the results of a Bayesian Markov chain Monte Carlo (MCMC) analysis of population structure. We develop the notion of the mean population partition, which is the partition of individuals to populations that minimizes the squared partition distance to the partitions sampled by the MCMC algorithm.
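The mean population partition described above can be illustrated with a small sketch. The abstract does not define the partition distance or the search procedure, so two assumptions are made here: the distance is taken to be the minimum number of individuals that must be removed for two partitions to become identical, and the search is restricted to the sampled partitions themselves (a medoid), which is a simplification. The function names `clusters`, `partition_distance`, and `mean_partition` are hypothetical, and the brute-force matching is only practical for a handful of populations.

```python
from itertools import permutations


def clusters(labels):
    """Group individual indices by population label."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return list(groups.values())


def partition_distance(a, b):
    """Assumed distance: fewest individuals to remove so that a and b agree.

    Labels are arbitrary, so clusters of a are matched to clusters of b
    by brute force over all one-to-one assignments.
    """
    ca, cb = clusters(a), clusters(b)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # ensure ca has the fewer clusters
    best_overlap = 0
    for perm in permutations(range(len(cb)), len(ca)):
        overlap = sum(len(ca[i] & cb[j]) for i, j in enumerate(perm))
        best_overlap = max(best_overlap, overlap)
    return len(a) - best_overlap


def mean_partition(samples):
    """Sampled partition minimizing the summed squared distance to all samples.

    Simplification: only sampled partitions are considered as candidates.
    """
    return min(
        samples,
        key=lambda p: sum(partition_distance(p, q) ** 2 for q in samples),
    )


# Three hypothetical MCMC samples for four individuals
samples = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
summary = mean_partition(samples)
```

In this toy case the first two samples describe the same partition under different labels, so their distance is zero and one of them is returned as the summary.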

Journal ArticleDOI
01 May 2007-Genetics
TL;DR: No relationship was observed between the level of polymorphism, motif type, and tissue origin, but the polymorphism appeared to be correlated with repeat type.
Abstract: The mapping of functional genes plays an important role in studies of genome structure, function, and evolution, as well as allowing gene cloning and marker-assisted selection to improve agriculturally important traits. Simple sequence repeats (SSRs) developed from expressed sequence tags (ESTs), EST-SSR (eSSR), can be employed as putative functional marker loci to easily tag corresponding functional genes. In this paper, 2218 eSSRs, 1554 from G. raimondii-derived and 754 from G. hirsutum-derived ESTs, were developed and used to screen polymorphisms to enhance our backbone genetic map in allotetraploid cotton. Of the 1554 G. raimondii-derived eSSRs, 744 eSSRs were able to successfully amplify polymorphisms between our two mapping parents, TM-1 and Hai7124, presenting a polymorphic rate of 47.9%. However, only a 23.9% (159/754) polymorphic rate was produced from G. hirsutum-derived eSSRs. No relationship was observed between the level of polymorphism, motif type, and tissue origin, but the polymorphism appeared to be correlated with repeat type. After integrating these new eSSRs, our enhanced genetic map consists of 1790 loci in 26 linkage groups and covers 3425.8 cM with an average intermarker distance of 1.91 cM. This microsatellite-based, gene-rich linkage map contains 71.96% functional marker loci, of which 87.11% are eSSR loci. There were 132 duplicated loci bridging 13 homeologous At/Dt chromosome pairs. Two reciprocal translocations after polyploidization between A2 and A3, and between A4 and A5, chromosomes were further confirmed. A functional analysis of 975 ESTs producing 1122 eSSR loci tagged in the map revealed that 60% had clear BLASTX hits (<1e-10) to the Uniprot database and that 475 were associated mainly with genes belonging to the three major gene ontology categories of biological process, cellular component, and molecular function; many of the ESTs were associated with two or more category functions. The results presented here will provide new insights for future investigations of functional and evolutionary genomics, especially those associated with cotton fiber improvement.

Journal ArticleDOI
01 Oct 2007-Genetics
TL;DR: Molecular phylogenetic analyses suggest that both active chitinases (chitotriosidase and AMCase) result from an early gene duplication event, and substantial gene specialization has occurred in time, allowing for tissue-specific expression of pH optimized chitinases and chi-lectins.
Abstract: Family 18 of glycosyl hydrolases encompasses chitinases and so-called chi-lectins lacking enzymatic activity due to amino acid substitutions in their active site. Both types of proteins widely occur in mammals although these organisms lack endogenous chitin. Their physiological function(s) as well as evolutionary relationships are still largely enigmatic. An overview of all family members is presented and their relationships are described. Molecular phylogenetic analyses suggest that both active chitinases (chitotriosidase and AMCase) result from an early gene duplication event. Further duplication events, followed by mutations leading to loss of chitinase activity, allowed evolution of the chi-lectins. The homologous genes encoding chitinase(-like) proteins are clustered in two distinct loci that display a high degree of synteny among mammals. Despite the shared chromosomal location and high homology, individual genes have evolved independently. Orthologs are more closely related than paralogues, and calculated substitution rate ratios indicate that protein-coding sequences underwent purifying selection. Substantial gene specialization has occurred in time, allowing for tissue-specific expression of pH optimized chitinases and chi-lectins. Finally, several family 18 chitinase-like proteins are present only in certain lineages of mammals, exemplifying recent evolutionary events in the chitinase protein family.

Journal ArticleDOI
01 Oct 2007-Genetics
TL;DR: Numerical evaluations of exact probability distributions and computer simulations verify that this new estimator yields unbiased estimates also when based on a modest number of alleles and loci, and eliminates the bias associated with earlier estimators.
Abstract: Amounts of genetic drift and the effective size of populations can be estimated from observed temporal shifts in sample allele frequencies. Bias in this so-called temporal method has been noted in cases of small sample sizes and when allele frequencies are highly skewed. We characterize bias in commonly applied estimators under different sampling plans and propose an alternative estimator for genetic drift and effective size that weights alleles differently. Numerical evaluations of exact probability distributions and computer simulations verify that this new estimator yields unbiased estimates also when based on a modest number of alleles and loci. At the cost of a larger standard deviation, it thus eliminates the bias associated with earlier estimators. The new estimator should be particularly useful for microsatellite loci and panels of SNPs, representing a large number of alleles, many of which will occur at low frequencies.
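The abstract gives no formulas, so the following is background only: a minimal sketch of the temporal method in one commonly cited textbook form, not the new estimator proposed in the paper. It assumes the standardized variance often attributed to Nei and Tajima (an unweighted mean of per-allele terms) and one widely used sampling correction; the function names and the example frequencies are hypothetical.

```python
def fc_standardized_variance(x, y):
    """Unweighted mean over alleles of (x - y)^2 / ((x + y)/2 - x*y).

    x, y: allele frequencies in the first and second sample.
    Every allele contributes equally, whatever its frequency.
    """
    terms = [
        (xi - yi) ** 2 / ((xi + yi) / 2 - xi * yi)
        for xi, yi in zip(x, y)
    ]
    return sum(terms) / len(terms)


def ne_temporal(f, t, s0, st):
    """Effective size from F after subtracting expected sampling noise.

    Assumed form: Ne = t / (2 * (F - 1/(2*S0) - 1/(2*St))), with t the
    number of generations between samples and S0, St the numbers of
    individuals sampled. Returns infinity when sampling noise alone
    can account for the observed shift.
    """
    drift = f - 1 / (2 * s0) - 1 / (2 * st)
    if drift <= 0:
        return float("inf")
    return t / (2 * drift)


# Hypothetical biallelic locus sampled 10 generations apart, 50 individuals each
f_hat = fc_standardized_variance([0.5, 0.5], [0.6, 0.4])
ne_hat = ne_temporal(f_hat, t=10, s0=50, st=50)
```

Because each allele enters the mean with equal weight, rare alleles with small denominators can dominate the estimate, which is the kind of behavior with skewed frequencies that the abstract refers to.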

Journal ArticleDOI
01 Aug 2007-Genetics
TL;DR: Meta-analysis linked to synteny-based and expression-based information provides clues about specific genes and families involved in QTL networks, suggesting that fiber variation involves a complex network of interacting genes.
Abstract: QTL mapping experiments yield heterogeneous results due to the use of different genotypes, environments, and sampling variation. Compilation of QTL mapping results yields a more complete picture of the genetic control of a trait and reveals patterns in organization of trait variation. A total of 432 QTL mapped in one diploid and 10 tetraploid interspecific cotton populations were aligned using a reference map and depicted in a CMap resource. Early demonstrations that genes from the non-fiber-producing diploid ancestor contribute to tetraploid lint fiber genetics gain further support from multiple populations and environments and advanced-generation studies detecting QTL of small phenotypic effect. Both tetraploid subgenomes contribute QTL at largely non-homeologous locations, suggesting divergent selection acting on many corresponding genes before and/or after polyploid formation. QTL correspondence across studies was only modest, suggesting that additional QTL for the target traits remain to be discovered. Crosses between closely-related genotypes differing by single-gene mutants yield profoundly different QTL landscapes, suggesting that fiber variation involves a complex network of interacting genes. Members of the lint fiber development network appear clustered, with cluster members showing heterogeneous phenotypic effects. Meta-analysis linked to synteny-based and expression-based information provides clues about specific genes and families involved in QTL networks.

Journal ArticleDOI
01 Mar 2007-Genetics
TL;DR: This study provides a significant example of how changes in tissue-specific gene expression caused by transposable-element insertions can contribute to adaptation.
Abstract: Transposable elements are a major mutation source and powerful agents of adaptive change. Some transposable element insertions in genomes increase to a high frequency because of the selective advantage the mutant phenotype provides. Cyp6g1-mediated insecticide resistance in Drosophila melanogaster is due to the upregulation of the cytochrome P450 gene Cyp6g1, leading to the resistance to a variety of insecticide classes. The upregulation of Cyp6g1 is correlated with the presence of the long terminal repeat (LTR) of an Accord retrotransposon inserted 291 bp upstream of the Cyp6g1 transcription start site. This resistant allele (DDT-R) is currently at a high frequency in D. melanogaster populations around the world. Here, we characterize the spatial expression of Cyp6g1 in insecticide-resistant and -susceptible strains. We show that the Accord LTR insertion is indeed the resistance-associated mutation and demonstrate that the Accord LTR carries regulatory sequences that increase the expression of Cyp6g1 in tissues important for detoxification, the midgut, Malpighian tubules, and the fat body. This study provides a significant example of how changes in tissue-specific gene expression caused by transposable-element insertions can contribute to adaptation.

Journal ArticleDOI
01 Feb 2007-Genetics
TL;DR: This work proposes an association mapping approach based on mixed models with attention to the incorporation of the relationships between genotypes, whether induced by pedigree, population substructure, or otherwise, and emphasizes the need to pay attention to the environmental features of the data as well.
Abstract: Association or linkage disequilibrium (LD)-based mapping strategies are receiving increased attention for the identification of quantitative trait loci (QTL) in plants as an alternative to more traditional, purely linkage-based approaches. An attractive property of association approaches is that they do not require specially designed crosses between inbred parents, but can be applied to collections of genotypes with arbitrary and often unknown relationships between the genotypes. A less obvious additional attractive property is that association approaches offer possibilities for QTL identification in crops with hard to model segregation patterns. The availability of candidate genes and targeted marker systems facilitates association approaches, as will appropriate methods of analysis. We propose an association mapping approach based on mixed models with attention to the incorporation of the relationships between genotypes, whether induced by pedigree, population substructure, or otherwise. Furthermore, we emphasize the need to pay attention to the environmental features of the data as well, i.e., adequate representation of the relations among multiple observations on the same genotypes. We illustrate our modeling approach using 25 years of Dutch national variety list data on late blight resistance in the genetically complex crop of potato. As markers, we used nucleotide binding-site markers, a specific type of marker that targets resistance or resistance-analog genes. To assess the consistency of QTL identified by our mixed-model approach, a second independent data set was analyzed. Two markers were identified that are potentially useful in selection for late blight resistance in potato.

Journal ArticleDOI
01 Feb 2007-Genetics
TL;DR: In this paper, a genomewide coverage near-isogenic line (NIL) population of Arabidopsis thaliana was introduced by introgressing genomic regions from the Cape Verde Islands (Cvi) accession into the Landsberg erecta (Ler) genetic background.
Abstract: In Arabidopsis, recombinant inbred line (RIL) populations are widely used for quantitative trait locus (QTL) analyses. However, mapping analyses with this type of population can be limited because of the masking effects of major QTL and epistatic interactions of multiple QTL. An alternative type of immortal experimental population commonly used in plant species consists of sets of introgression lines. Here we introduce the development of a genomewide coverage near-isogenic line (NIL) population of Arabidopsis thaliana, by introgressing genomic regions from the Cape Verde Islands (Cvi) accession into the Landsberg erecta (Ler) genetic background. We have empirically compared the QTL mapping power of this new population with an already existing RIL population derived from the same parents. For that, we analyzed and mapped QTL affecting six developmental traits with different heritability. Overall, in the NIL population smaller-effect QTL than in the RIL population could be detected although the localization resolution was lower. Furthermore, we estimated the effect of population size and of the number of replicates on the detection power of QTL affecting the developmental traits. In general, population size is more important than the number of replicates to increase the mapping power of RILs, whereas for NILs several replicates are absolutely required. These analyses are expected to facilitate experimental design for QTL mapping using these two common types of segregating populations.