Showing papers by "Detlef Weigel" published in 2011


Journal ArticleDOI
TL;DR: The majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome are described, along with their effects on gene function and the patterns of local and global linkage among these variants.
Abstract: The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout the species' native range. We describe the majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome, their effects on gene function, and the patterns of local and global linkage among these variants. The action of processes other than spontaneous mutation is identified by comparing the spectrum of mutations that have accumulated since A. thaliana diverged from its closest relative 10 million years ago with the spectrum observed in the laboratory. Recent species-wide selective sweeps are rare, and potentially deleterious mutations are more common in marginal populations.

965 citations


Journal ArticleDOI
TL;DR: The 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47, based on 8.3× dideoxy sequence coverage, is reported, indicating pervasive selection for a smaller genome in this outcrossing species.
Abstract: We present the 207 Mb genome sequence of the outcrosser Arabidopsis lyrata, which diverged from the self-fertilizing species A. thaliana about 10 million years ago. It is generally assumed that the much smaller A. thaliana genome, which is only 125 Mb, constitutes the derived state for the family. Apparent genome reduction in this genus can be partially attributed to the loss of DNA from large-scale rearrangements, but the main cause lies in the hundreds of thousands of small deletions found throughout the genome. These occurred primarily in non-coding DNA and transposons, but protein-coding multi-gene families are smaller in A. thaliana as well. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome.

845 citations


Journal ArticleDOI
TL;DR: The results reveal a direct link between the transition to flowering and secondary metabolism and provide a potential target for manipulation of anthocyanin and flavonol content in plants.
Abstract: Flavonoids are synthesized through an important metabolic pathway that leads to the production of diverse secondary metabolites, including anthocyanins, flavonols, flavones, and proanthocyanidins. Anthocyanins and flavonols are derived from Phe and share common precursors, dihydroflavonols, which are substrates for both flavonol synthase and dihydroflavonol 4-reductase. In the stems of Arabidopsis thaliana, anthocyanins accumulate in an acropetal manner, with the highest level at the junction between rosette and stem. We show here that this accumulation pattern is under the regulation of miR156-targeted SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) genes, which are deeply conserved and known to have important roles in regulating phase change and flowering. Increased miR156 activity promotes accumulation of anthocyanins, whereas reduced miR156 activity results in high levels of flavonols. We further provide evidence that at least one of the miR156 targets, SPL9, negatively regulates anthocyanin accumulation by directly preventing expression of anthocyanin biosynthetic genes through destabilization of a MYB-bHLH-WD40 transcriptional activation complex. Our results reveal a direct link between the transition to flowering and secondary metabolism and provide a potential target for manipulation of anthocyanin and flavonol content in plants.

730 citations


Journal ArticleDOI
01 Dec 2011-Nature
TL;DR: A comparison of genome-wide DNA methylation among 10 A. thaliana lines showed that differentially methylated sites were farther from transposable elements and showed less association with short interfering RNA expression than invariant positions, which has important implications for the potential contribution of sequence-independent epialleles to plant evolution.
Abstract: Heritable epigenetic polymorphisms, such as differential cytosine methylation, can underlie phenotypic variation. Moreover, wild strains of the plant Arabidopsis thaliana differ in many epialleles, and these can influence the expression of nearby genes. However, to understand their role in evolution, it is imperative to ascertain the emergence rate and stability of epialleles, including those that are not due to structural variation. We have compared genome-wide DNA methylation among 10 A. thaliana lines, derived 30 generations ago from a common ancestor. Epimutations at individual positions were easily detected, and close to 30,000 cytosines in each strain were differentially methylated. In contrast, larger regions of contiguous methylation were much more stable, and the frequency of changes was in the same low range as that of DNA mutations. Like individual positions, the same regions were often affected by differential methylation in independent lines, with evidence for recurrent cycles of forward and reverse mutations. Transposable elements and short interfering RNAs have been causally linked to DNA methylation. In agreement, differentially methylated sites were farther from transposable elements and showed less association with short interfering RNA expression than invariant positions. The biased distribution and frequent reversion of epimutations have important implications for the potential contribution of sequence-independent epialleles to plant evolution.

690 citations


Journal ArticleDOI
TL;DR: The results indicate that miR156 is an evolutionarily conserved regulator of vegetative phase change in both annual herbaceous plants and perennial trees.
Abstract: After germination, plants enter juvenile vegetative phase and then transition to an adult vegetative phase before producing reproductive structures. The character and timing of the juvenile-to-adult transition vary widely between species. In annual plants, this transition occurs soon after germination and usually involves relatively minor morphological changes, whereas in trees and other perennial woody plants it occurs after months or years and can involve major changes in shoot architecture. Whether this transition is controlled by the same mechanism in annual and perennial plants is unknown. In the annual forb Arabidopsis thaliana and in maize (Zea mays), vegetative phase change is controlled by the sequential activity of microRNAs miR156 and miR172. miR156 is highly abundant in seedlings and decreases during the juvenile-to-adult transition, while miR172 has an opposite expression pattern. We observed similar changes in the expression of these genes in woody species with highly differentiated, well-characterized juvenile and adult phases (Acacia confusa, Acacia colei, Eucalyptus globulus, Hedera helix, Quercus acutissima), as well as in the tree Populus x canadensis, where vegetative phase change is marked by relatively minor changes in leaf morphology and internode length. Overexpression of miR156 in transgenic P. x canadensis reduced the expression of miR156-targeted SPL genes and miR172, and it drastically prolonged the juvenile phase. Our results indicate that miR156 is an evolutionarily conserved regulator of vegetative phase change in both annual herbaceous plants and perennial trees.

364 citations


Journal ArticleDOI
TL;DR: It is shown that transposable elements—particularly siRNA-targeted TEs—are associated with reduced gene expression within both species and also with gene expression differences between orthologs, and that A. lyrata TEs are targeted by a lower fraction of uniquely matching siRNAs, which are associated with more effective silencing of TE expression.
Abstract: Transposable elements (TEs) are often the primary determinant of genome size differences among eukaryotes. In plants, the proliferation of TEs is countered through epigenetic silencing mechanisms that prevent mobility. Recent studies using the model plant Arabidopsis thaliana have revealed that methylated TE insertions are often associated with reduced expression of nearby genes, and these insertions may be subject to purifying selection due to this effect. Less is known about the genome-wide patterns of epigenetic silencing of TEs in other plant species. Here, we compare the 24-nt siRNA complement from A. thaliana and a closely related congener with a two- to threefold higher TE copy number, Arabidopsis lyrata. We show that TEs—particularly siRNA-targeted TEs—are associated with reduced gene expression within both species and also with gene expression differences between orthologs. In addition, A. lyrata TEs are targeted by a lower fraction of uniquely matching siRNAs, which are associated with more effective silencing of TE expression. Our results suggest that the efficacy of RNA-directed DNA methylation silencing is lower in A. lyrata, a finding that may shed light on the causes of differential TE proliferation among species.

325 citations


Journal ArticleDOI
TL;DR: Some of the developmental traits they control along with possible interactions between miRNA and their targets are discussed, and they present important targets for biotechnology applications.

308 citations


Journal ArticleDOI
TL;DR: Reference-guided whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago were presented in this paper.
Abstract: We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.

259 citations


Journal ArticleDOI
TL;DR: In the near future, it can be expected that mapping by sequencing will become a centerpiece in efforts to discover the genes responsible for quantitative trait loci, and the largest impact might come from the use of these strategies to extract genes from non-model, non-crop plants that exhibit heritable variation in important traits.

215 citations


Journal ArticleDOI
TL;DR: Evidence of double-strand break (DSB) repair, including NHEJ, sequence deletions and mitochondrial asymmetric recombination activity in Arabidopsis wild-type and msh1 mutants is obtained on the basis of data generated by Illumina deep sequencing and confirmed by DNA gel blot analysis.
Abstract: The mitochondrial genome of higher plants is unusually dynamic, with recombination and nonhomologous end-joining (NHEJ) activities producing variability in size and organization. Plant mitochondrial DNA also generally displays much lower nucleotide substitution rates than mammalian or yeast systems. Arabidopsis displays these features and expedites characterization of the mitochondrial recombination surveillance gene MSH1 (MutS 1 homolog), lending itself to detailed study of de novo mitochondrial genome activity. In the present study, we investigated the underlying basis for unusual plant features as they contribute to rapid mitochondrial genome evolution. We obtained evidence of double-strand break (DSB) repair, including NHEJ, sequence deletions and mitochondrial asymmetric recombination activity in Arabidopsis wild-type and msh1 mutants on the basis of data generated by Illumina deep sequencing and confirmed by DNA gel blot analysis. On a larger scale, with mitochondrial comparisons across 72 Arabidopsis ecotypes, similar evidence of DSB repair activity differentiated ecotypes. Forty-seven repeat pairs were active in DNA exchange in the msh1 mutant. Recombination sites showed asymmetrical DNA exchange within lengths of 50- to 556-bp sharing sequence identity as low as 85%. De novo asymmetrical recombination involved heteroduplex formation, gene conversion and mismatch repair activities. Substoichiometric shifting by asymmetrical exchange created the appearance of rapid sequence gain and loss in association with particular repeat classes. Extensive mitochondrial genomic variation within a single plant species derives largely from DSB activity and its repair. Observed gene conversion and mismatch repair activity contribute to the low nucleotide substitution rates seen in these genomes. On a phenotypic level, these patterns of rearrangement likely contribute to the reproductive versatility of higher plants.

209 citations


Journal ArticleDOI
TL;DR: This study compared the NB-LRR gene complement of the selfer Arabidopsis thaliana and its outcrossing close relative Arabidopsis lyrata to show a clearly positive relationship between interspecific divergence and intraspecific polymorphisms.
Abstract: Plants, like animals, use several lines of defense against pathogen attack. Prominent among genes that confer disease resistance are those encoding nucleotide-binding site-leucine-rich repeat (NB-LRR) proteins. Likely due to selection pressures caused by pathogens, NB-LRR genes are the most variable gene family in plants, but there appear to be species-specific limits to the number of NB-LRR genes in a genome. Allelic diversity within an individual is also increased by obligatory outcrossing, which leads to genome-wide heterozygosity. In this study, we compared the NB-LRR gene complement of the selfer Arabidopsis thaliana and its outcrossing close relative Arabidopsis lyrata. We then complemented and contrasted the interspecific patterns with studies of NB-LRR diversity within A. thaliana. Three important insights are as follows: (1) that both species have similar numbers of NB-LRR genes; (2) that loci with single NB-LRR genes are less variable than tandem arrays; and (3) that presence-absence polymorphisms within A. thaliana are not strongly correlated with the presence or absence of orthologs in A. lyrata. Although A. thaliana individuals are mostly homozygous and thus potentially less likely to suffer from aberrant interaction of NB-LRR proteins with newly introduced alleles, the number of NB-LRR genes is similar to that in A. lyrata. In intraspecific and interspecific comparisons, NB-LRR genes are also more variable than receptor-like protein genes. Finally, in contrast to Drosophila, there is a clearly positive relationship between interspecific divergence and intraspecific polymorphisms.

Journal ArticleDOI
01 Jun 2011-Genetics
TL;DR: This work mapped quantitative trait loci (QTL) for flowering time in 17 F2 populations derived from 18 distinct A. thaliana accessions and, by comparing effects across shared parents, concludes that in several cases there might be an allelic series caused by rare alleles.
Abstract: The onset of flowering is an important adaptive trait in plants. The small ephemeral species Arabidopsis thaliana grows under a wide range of temperature and day-length conditions across much of the Northern hemisphere, and a number of flowering-time loci that vary between different accessions have been identified before. However, only few studies have addressed the species-wide genetic architecture of flowering-time control. We have taken advantage of a set of 18 distinct accessions that present much of the common genetic diversity of A. thaliana and mapped quantitative trait loci (QTL) for flowering time in 17 F2 populations derived from these parents. We found that the majority of flowering-time QTL cluster in as few as five genomic regions, which include the locations of the entire FLC/MAF clade of transcription factor genes. By comparing effects across shared parents, we conclude that in several cases there might be an allelic series caused by rare alleles. While this finding parallels results obtained for maize, in contrast to maize much of the variation in flowering time in A. thaliana appears to be due to large-effect alleles.

Journal ArticleDOI
TL;DR: The generation of a predictive model describing the DNA recognition specificity of the LEAFY floral transcription factor is presented, which succeeds in detecting the connection between LFY and AG homologs despite extensive variation in binding sites and opens new avenues to deduce the structure of regulatory networks from mere inspection of genomic sequences.
Abstract: Despite great advances in sequencing technologies, generating functional information for nonmodel organisms remains a challenge. One solution lies in an improved ability to predict genetic circuits based on primary DNA sequence in combination with detailed knowledge of regulatory proteins that have been characterized in model species. Here, we focus on the LEAFY (LFY) transcription factor, a conserved master regulator of floral development. Starting with biochemical and structural information, we built a biophysical model describing LFY DNA binding specificity in vitro that accurately predicts in vivo LFY binding sites in the Arabidopsis thaliana genome. Applying the model to other plant species, we could follow the evolution of the regulatory relationship between LFY and the AGAMOUS (AG) subfamily of MADS box genes and show that this link predates the divergence between monocots and eudicots. Remarkably, our model succeeds in detecting the connection between LFY and AG homologs despite extensive variation in binding sites. This demonstrates that the cis-element fluidity recently observed in animals also exists in plants, but the challenges it poses can be overcome with predictions grounded in a biophysical model. Therefore, our work opens new avenues to deduce the structure of regulatory networks from mere inspection of genomic sequences.

Journal ArticleDOI
TL;DR: A method for de novo assembly of paired-end RAD-seq data that produces extended contigs flanking a restriction site is presented; it was used to reconstruct one-tenth of the guppy genome, represented by 200–500 bp contigs associated with EcoRI recognition sites.
Abstract: Motivation: Next-generation sequencing technologies have facilitated the study of organisms on a genome-wide scale. A recent method called restriction site associated DNA sequencing (RAD-seq) allows sampling of sequence information at reduced complexity across a target genome using the Illumina platform. Single-end RAD-seq has proven to provide a large number of informative genetic markers in reference as well as non-reference organisms. Results: Here, we present a method for de novo assembly of paired-end RAD-seq data in order to produce extended contigs flanking a restriction site. We were able to reconstruct one-tenth of the guppy genome represented by 200–500 bp contigs associated with EcoRI recognition sites. In addition, these contigs were used as a reference, allowing the detection of thousands of new polymorphic markers that are informative for mapping and population genetic studies in the guppy. Availability: A Perl and C++ implementation of the method demonstrated in this article is available under http://guppy.weigelworld.org/weigeldatabases/radMarkers/ as package RApiD. Contact: christine.dreyer@tuebingen.mpg.de Supplementary Information: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: Using a system based on the silencing of the CH42 gene, the mobility of silencing signals initiated in phloem companion cells by artificial microRNAs (miRNA) and trans-acting siRNA (tasiRNA) that have the same primary sequence is tracked, indicating that biogenesis can determine the non-autonomous effects of sRNAs.
Abstract: In plants, small interfering RNAs (siRNAs) can trigger a silencing signal that may spread within a tissue to adjacent cells or even systemically to other organs. Movement of the signal is initially limited to a few cells, but in some cases the signal can be amplified and travel over larger distances. How far silencing initiated by other classes of plant small RNAs (sRNAs) than siRNAs can extend has been less clear. Using a system based on the silencing of the CH42 gene, we have tracked the mobility of silencing signals initiated in phloem companion cells by artificial microRNAs (miRNA) and trans-acting siRNA (tasiRNA) that have the same primary sequence. In this system, both the ta-siRNA and the miRNA act at a distance. Non-autonomous effects of the miRNA can be triggered by several different miRNA precursors deployed as backbones. While the tasiRNA also acts non-autonomously, it has a much greater range than the miRNA or hairpin-derived siRNAs directed against CH42, indicating that biogenesis can determine the non-autonomous effects of sRNAs. In agreement with this hypothesis, the silencing signals initiated by different sRNAs differ in their genetic requirements.

Journal ArticleDOI
TL;DR: The data suggest that FCA and FPA play important roles in the A. thaliana genome in RNA 3′ processing and transcription termination, thus limiting intergenic transcription.
Abstract: The RNA-binding proteins FCA and FPA were identified based on their repression of the flowering time regulator FLC but have since been shown to have widespread roles in the Arabidopsis thaliana genome. Here, we use whole-genome tiling arrays to show that a wide spectrum of genes and transposable elements are misexpressed in the fca-9 fpa-7 (fcafpa) double mutant at two stages of seedling development. There was a significant bias for misregulated genomic segments mapping to the 3′ region of genes. In addition, the double mutant misexpressed a large number of previously unannotated genomic segments corresponding to intergenic regions. We characterized a subset of these misexpressed unannotated segments and established that they resulted from extensive transcriptional read-through, use of downstream polyadenylation sites, and alternative splicing. In some cases, the transcriptional read-through significantly reduced expression of the associated genes. FCA/FPA-dependent changes in DNA methylation were found at several loci, supporting previous associations of FCA/FPA function with chromatin modifications. Our data suggest that FCA and FPA play important roles in the A. thaliana genome in RNA 3′ processing and transcription termination, thus limiting intergenic transcription.

Journal ArticleDOI
TL;DR: It is found that duplication, gene conversion, and positive selection have been important factors in the evolution of these two genes and appear to contribute to the generation of new recognition specificities.
Abstract: The S locus, a single polymorphic locus, is responsible for self-incompatibility (SI) in the Brassicaceae family and many related plant families. Despite its importance, our knowledge of S-locus evolution is largely restricted to the causal genes encoding the S-locus receptor kinase (SRK) receptor and S-locus cysteine-rich protein (SCR) ligand of the SI system. Here, we present high-quality sequences of the genomic region of six S-locus haplotypes: Arabidopsis (Arabidopsis thaliana; one haplotype), Arabidopsis lyrata (four haplotypes), and Capsella rubella (one haplotype). We compared these with reference S-locus haplotypes of the self-compatible Arabidopsis and its SI congener A. lyrata. We subsequently reconstructed the likely genomic organization of the S locus in the most recent common ancestor of Arabidopsis and Capsella. As previously reported, the two SI-determining genes, SCR and SRK, showed a pattern of coevolution. In addition, consistent with previous studies, we found that duplication, gene conversion, and positive selection have been important factors in the evolution of these two genes and appear to contribute to the generation of new recognition specificities. Intriguingly, the inactive pseudo-S-locus haplotype in the self-compatible species C. rubella is likely to be an old S-locus haplotype that only very recently became fixed when C. rubella split off from its SI ancestor, Capsella grandiflora.

Journal ArticleDOI
TL;DR: A non-additive interaction between alleles at the Arabidopsis thaliana OAK (OUTGROWTH-ASSOCIATED PROTEIN KINASE) gene provides insights into how tandem arrays, which are particularly prone to frequent, complex rearrangements, can produce genetic novelty.
Abstract: Non-additive interactions between genomes have important implications, not only for practical applications such as breeding, but also for understanding evolution. In extreme cases, genes from different genomic backgrounds may be incompatible and compromise normal development or physiology. Of particular interest are non-additive interactions of alleles at the same locus. For example, overdominant behavior of alleles, with respect to plant fitness, has been proposed as an important component of hybrid vigor, while underdominance may lead to reproductive isolation. Despite their importance, only a few cases of genetic over- or underdominance affecting plant growth or fitness are understood at the level of individual genes. Moreover, the relationship between biochemical and fitness effects may be complex: genetic overdominance, that is, increased or novel activity of a gene may lead to evolutionary underdominance expressed as hybrid weakness. Here, we describe a non-additive interaction between alleles at the Arabidopsis thaliana OAK (OUTGROWTH-ASSOCIATED PROTEIN KINASE) gene. OAK alleles from two different accessions interact in F1 hybrids to cause a variety of aberrant growth phenotypes that depend on a recently acquired promoter with a novel expression pattern. The OAK gene, which is located in a highly variable tandem array encoding closely related receptor-like kinases, is found in one third of A. thaliana accessions, but not in the reference accession Col-0. Besides recruitment of exons from nearby genes as promoter sequences, key events in OAK evolution include gene duplication and divergence of a potential ligand-binding domain. OAK kinase activity is required for the aberrant phenotypes, indicating it is not recognition of an aberrant protein, but rather a true gain of function, or overdominance for gene activity, that leads to this underdominance for fitness. Our work provides insights into how tandem arrays, which are particularly prone to frequent, complex rearrangements, can produce genetic novelty.

Journal ArticleDOI
23 Dec 2011-Cell
TL;DR: The authors state that these mRNAs, dubbed “competing endogenous RNAs” (ceRNAs), define a new layer of regulation of miRNA activity that has only recently been discovered.

Journal ArticleDOI
TL;DR: This study characterized the genomic organization of LWS opsin genes by BAC clone sequencing, and described the full range of cone cell types in the retina of the colorful Cumaná guppy, Poecilia reticulata.
Abstract: Female preference for male orange coloration in the genus Poecilia suggests a role for duplicated long wavelength-sensitive (LWS) opsin genes in facilitating behaviors related to mate choice in these species. Previous work has shown that LWS gene duplication in this genus has resulted in expansion of long wavelength visual capacity as determined by microspectrophotometry (MSP). However, the relationship between LWS genomic repertoires and expression of LWS retinal cone classes within a given species is unclear. Our previous study in the related species, Xiphophorus helleri, was the first characterization of the complete LWS opsin genomic repertoire in conjunction with MSP expression data in the family Poeciliidae, and revealed the presence of four LWS loci and two distinct LWS cone classes. In this study we characterized the genomic organization of LWS opsin genes by BAC clone sequencing, and described the full range of cone cell types in the retina of the colorful Cumana guppy, Poecilia reticulata. In contrast to X. helleri, MSP data from the Cumana guppy revealed three LWS cone classes. Comparisons of LWS genomic organization described here for Cumana to that of X. helleri indicate that gene divergence and not duplication was responsible for the evolution of a novel LWS haplotype in the Cumana guppy. This lineage-specific divergence is likely responsible for a third additional retinal cone class not present in X. helleri, and may have facilitated the strong sexual selection driven by female preference for orange color patterns associated with the genus Poecilia.

Journal ArticleDOI
15 Aug 2011-PLOS ONE
TL;DR: LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.
Abstract: Motivation: Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations such as novel insertions or highly diverged sequence from low coverage NGS data are still lacking. Results: We present LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes using a mismatch sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner while it performs de novo assemblies of insertions and highly polymorphic target regions subsequently to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same amount of long insertions as state-of-the-art NGS assemblers, LOCAS showed best results regarding contig size, error rate and runtime. Conclusion: LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.

Journal ArticleDOI
TL;DR: Recent work with relatives of these species is reviewed, motivated by a desire to understand the evolutionary and ecological context for morphological innovation.

Journal ArticleDOI
TL;DR: A major effect QTL that acts in a recessive manner and accounts for curve susceptibility was detected in an initial mapping cross on LG 14 and this locus contains over 100 genes, including MTNR1B, a candidate gene for human idiopathic scoliosis.
Abstract: Understanding the genetic basis of heritable spinal curvature would benefit medicine and aquaculture. Heritable spinal curvature among otherwise healthy children (i.e. Idiopathic Scoliosis and Scheuermann kyphosis) accounts for more than 80% of all spinal curvatures and imposes a substantial healthcare cost through bracing, hospitalizations, surgery, and chronic back pain. In aquaculture, the prevalence of heritable spinal curvature can reach as high as 80% of a stock, and thus imposes a substantial cost through production losses. The genetic basis of heritable spinal curvature is unknown and so the objective of this work is to identify quantitative trait loci (QTL) affecting heritable spinal curvature in the curveback guppy. Prior work with curveback has demonstrated phenotypic parallels to human idiopathic-type scoliosis, suggesting shared biological pathways for the deformity. A major effect QTL that acts in a recessive manner and accounts for curve susceptibility was detected in an initial mapping cross on LG 14. In a second cross, we confirmed this susceptibility locus and fine mapped it to a 5 cM region that explains 82.6% of the total phenotypic variance. We identify a major QTL that controls susceptibility to curvature. This locus contains over 100 genes, including MTNR1B, a candidate gene for human idiopathic scoliosis. The identification of genes associated with heritable spinal curvature in the curveback guppy has the potential to elucidate the biological basis of spinal curvature among humans and economically important teleosts.

Journal ArticleDOI
15 Apr 2011-Cell
TL;DR: It is revealed that plants also exploit miRNA binding by Argonautes as a sequestering mechanism that prevents miRNAs from fulfilling their normal roles.

01 Jan 2011
TL;DR: In this paper, the NB-LRR gene complement of the selfer Arabidopsis thaliana and its outcrossing close relative A. lyrata was compared, and the interspecific patterns were contrasted with NB-LRR diversity within A. thaliana.
Abstract: Plants, like animals, use several lines of defense against pathogen attack. Prominent among genes that confer disease resistance are those encoding nucleotide-binding site-leucine-rich repeat (NB-LRR) proteins. Likely due to selection pressures caused by pathogens, NB-LRR genes are the most variable gene family in plants, but there appear to be species-specific limits to the number of NB-LRR genes in a genome. Allelic diversity within an individual is also increased by obligatory outcrossing, which leads to genome-wide heterozygosity. In this study, we compared the NB-LRR gene complement of the selfer Arabidopsis thaliana and its outcrossing close relative Arabidopsis lyrata. We then complemented and contrasted the interspecific patterns with studies of NB-LRR diversity within A. thaliana. Three important insights are as follows: (1) that both species have similar numbers of NB-LRR genes; (2) that loci with single NB-LRR genes are less variable than tandem arrays; and (3) that presence-absence polymorphisms within A. thaliana are not strongly correlated with the presence or absence of orthologs in A. lyrata. Although A. thaliana individuals are mostly homozygous and thus potentially less likely to suffer from aberrant interaction of NB-LRR proteins with newly introduced alleles, the number of NB-LRR genes is similar to that in A. lyrata. In intraspecific and interspecific comparisons, NB-LRR genes are also more variable than receptor-like protein genes. Finally, in contrast to Drosophila, there is a clearly positive relationship between interspecific divergence and intraspecific polymorphisms. Resistance (R) gene-dependent recognition of avirulence (Avr) determinants is a cornerstone of plant defense against pathogens. There are at least five diverse classes of R proteins, the largest of which encodes proteins that have a nucleotide-binding (NB) site along with leucine-rich repeats (LRRs; Dangl and Jones, 2001).
NB-LRR proteins can be subdivided into two types based on structural features of the N terminus: TIR-NB-LRR (TNL) proteins have a domain that resembles the intracellular signaling domains of Drosophila Toll and mammalian IL-1 receptors, while CC-NB-LRR (CNL) proteins contain a putative coiled-coil domain. The predominant function of NB-LRR genes is in disease resistance (Kunkel et al., 1993; Grant et al., 1995; Botella

Journal ArticleDOI
TL;DR: Two distinct approaches are proposed to efficiently determine the t top-scoring pairs of SNPs that are strongly associated with the phenotype, avoiding an exhaustive search while still guaranteeing that the causal pair is found.
Abstract: Material and methods: For a large-scale dataset of over 200,000 SNPs from about 200 individuals together with several phenotypes, published by Atwell et al. [1], we develop efficient methods to find pairs of SNPs which are strongly associated with the phenotype. As an exhaustive search of all possible combinations of interacting SNPs is often infeasible, even when only considering pairs of interacting SNPs, the challenge is to find methods which avoid an exhaustive search but are still guaranteed to find the causal pair. We propose two distinct approaches to efficiently determine the t top-scoring pairs of SNPs.
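
For orientation, the exhaustive baseline that such methods aim to avoid can be sketched in a few lines. This is not either of the two approaches proposed in the paper, which the abstract does not describe. The scoring function (absolute Pearson correlation between the product of two binary genotype vectors and the phenotype) and the name `top_pairs` are assumptions chosen for illustration. With n SNPs the loop visits n(n-1)/2 pairs, roughly 2 × 10^10 for 200,000 SNPs, which is why an exhaustive scan is costly at that scale.

```python
import heapq
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_pairs(genotypes, phenotype, t):
    """Brute-force search for the t best-scoring SNP pairs.

    genotypes: list of SNPs, each a list of 0/1 alleles per individual.
    The score is an illustrative assumption, not the paper's statistic.
    Returns a list of (score, (i, j)) sorted by decreasing score.
    """
    def scored_pairs():
        for i, j in combinations(range(len(genotypes)), 2):
            interaction = [a * b for a, b in zip(genotypes[i], genotypes[j])]
            yield abs(pearson(interaction, phenotype)), (i, j)

    # nlargest keeps only t candidates in memory while streaming all pairs.
    return heapq.nlargest(t, scored_pairs())


if __name__ == "__main__":
    snps = [
        [1, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 1, 0],
        [0, 1, 1, 0, 0, 1],
        [1, 1, 1, 1, 0, 0],
    ]
    # Toy phenotype driven by the interaction of SNP 0 and SNP 1.
    trait = [a * b for a, b in zip(snps[0], snps[1])]
    print(top_pairs(snps, trait, t=2))
```

The heap bounds memory to t entries, but the running time remains quadratic in the number of SNPs.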