Showing papers in "DNA Research in 2015"


Journal Article
TL;DR: The region harbouring Saltol, a major quantitative trait locus on chromosome 1 in rice, which is known to control salinity tolerance at seedling stage, was detected as a major association with Na+/K+ ratio measured at reproductive stage in this study.
Abstract: Salinity tolerance in rice is highly desirable to sustain production in areas rendered saline due to various reasons. It is a complex quantitative trait having different components, which can be dissected effectively by genome-wide association study (GWAS). Here, we implemented GWAS to identify loci controlling salinity tolerance in rice. A custom-designed array based on 6,000 single nucleotide polymorphisms (SNPs) in as many stress-responsive genes, distributed at an average physical interval of <100 kb on 12 rice chromosomes, was used to genotype 220 rice accessions using the Infinium high-throughput assay. Genetic association was analysed with 12 different traits recorded on these accessions under field conditions at reproductive stage. We identified 20 SNPs (loci) significantly associated with Na+/K+ ratio, and 44 SNPs with other traits observed under stress condition. The loci identified for various salinity indices through GWAS explained 5–18% of the phenotypic variance. The region harbouring Saltol, a major quantitative trait locus (QTL) on chromosome 1 in rice, which is known to control salinity tolerance at seedling stage, was detected as a major association with Na+/K+ ratio measured at reproductive stage in our study. In addition to Saltol, we also found GWAS peaks representing new QTLs on chromosomes 4, 6 and 7. The current association mapping panel contained mostly indica accessions that can serve as a source of novel salt tolerance genes and alleles. The gene-based SNP array used in this study was found to be cost-effective and efficient in unveiling genomic regions/candidate genes regulating salinity stress tolerance in rice.

257 citations


Journal Article
TL;DR: A complete sequence set for the O-antigen biosynthesis gene clusters (O-AGCs) from all 184 recognized Escherichia coli O serogroups is presented, showing a stronger association between host phylogenetic lineage and O-serogroup diversification than previously recognized.
Abstract: The O antigen constitutes the outermost part of the lipopolysaccharide layer in Gram-negative bacteria. The chemical composition and structure of the O antigen show high levels of variation even within a single species, revealing itself as serological diversity. Here, we present a complete sequence set for the O-antigen biosynthesis gene clusters (O-AGCs) from all 184 recognized Escherichia coli O serogroups. By comparing these sequences, we identified 161 well-defined O-AGCs. Based on the wzx/wzy or wzm/wzt gene sequences, in addition to 145 singletons, 37 serogroups were placed into 16 groups. Furthermore, phylogenetic analysis of all the E. coli O-serogroup reference strains revealed that nearly one-quarter of the 184 serogroups were found in the ST10 lineage, which may have a unique genetic background allowing a more successful exchange of O-AGCs. Our data provide a complete view of the genetic diversity of O-AGCs in E. coli, showing a stronger association between host phylogenetic lineage and O-serogroup diversification than previously recognized. These data will be a valuable basis for developing a systematic molecular O-typing scheme that will allow traditional typing approaches to be linked to genomic exploration of E. coli diversity.

145 citations


Journal Article
TL;DR: The proposed QTL-seq-driven integrated genome-wide strategy has potential to delineate major candidate gene(s) harbouring a robust trait regulatory QTL rapidly with optimal use of resources to extrapolate the molecular mechanism underlying complex quantitative traits at a genome-wide scale leading to fast-paced marker-assisted genetic improvement in diverse crop plants, including chickpea.
Abstract: A rapid high-resolution genome-wide strategy for molecular mapping of major QTL(s)/gene(s) regulating important agronomic traits is vital for in-depth dissection of complex quantitative traits and genetic enhancement in chickpea. The present study for the first time employed an NGS-based whole-genome QTL-seq strategy to identify one major genomic region harbouring a robust 100-seed weight QTL using an intra-specific 221 chickpea mapping population (desi cv. ICC 7184 × desi cv. ICC 15061). The QTL-seq-derived major SW QTL (CaqSW1.1) was further validated by single-nucleotide polymorphism (SNP) and simple sequence repeat (SSR) marker-based traditional QTL mapping (47.6% R² at higher LOD >19). This reflects the reliability and efficacy of QTL-seq as a strategy for rapid genome-wide scanning and fine mapping of major trait regulatory QTLs in chickpea. The use of QTL-seq and classical QTL mapping in combination narrowed down the 1.37 Mb (comprising 177 genes) major SW QTL (CaqSW1.1) region into a 35 kb genomic interval on desi chickpea chromosome 1 containing six genes. One coding SNP (G/A)-carrying constitutive photomorphogenic9 (COP9) signalosome complex subunit 8 (CSN8) gene of these exhibited seed-specific expression, including pronounced differential up-/down-regulation in low and high seed weight mapping parents and homozygous individuals during seed development. The coding SNP mined in this potential seed weight-governing candidate CSN8 gene was found to be present exclusively in all cultivated species/genotypes, but not in any wild species/genotypes of primary, secondary and tertiary gene pools. This indicates the effect of strong artificial and/or natural selection pressure on target SW locus during chickpea domestication. The proposed QTL-seq-driven integrated genome-wide strategy has potential to delineate major candidate gene(s) harbouring a robust trait regulatory QTL rapidly with optimal use of resources. This will further assist us to extrapolate the molecular mechanism underlying complex quantitative traits at a genome-wide scale leading to fast-paced marker-assisted genetic improvement in diverse crop plants, including chickpea.

142 citations


Journal Article
TL;DR: Results indicate that GWAS is an effective method by which to reveal natural variations of complex traits in B. napus and provide new insights into the genetic control of flowering time in the allopolyploid species.
Abstract: Flowering time adaptation is a major breeding goal in the allopolyploid species Brassica napus. To investigate the genetic architecture of flowering time, a genome-wide association study (GWAS) of flowering time was conducted with a diversity panel comprising 523 B. napus cultivars and inbred lines grown in eight different environments. Genotyping was performed with a Brassica 60K Illumina Infinium SNP array. A total of 41 single-nucleotide polymorphisms (SNPs) distributed on 14 chromosomes were found to be associated with flowering time, and 12 SNPs were located in the confidence intervals of quantitative trait loci (QTL) identified in previous research based on linkage analyses. Twenty-five candidate genes were orthologous to Arabidopsis thaliana flowering genes. To further our understanding of the genetic factors influencing flowering time in different environments, GWAS was performed on two derived traits, environment sensitivity and temperature sensitivity. The most significant SNPs were found near Bn-scaff_16362_1-p380982, just 13 kb away from BnaC09g41990D, which is orthologous to A. thaliana CONSTANS (CO), an important gene in the photoperiod flowering pathway. These results provide new insights into the genetic control of flowering time in B. napus and indicate that GWAS is an effective method by which to reveal natural variations of complex traits in B. napus.

127 citations


Journal Article
TL;DR: The successful localization of the weeping trait strongly indicates that the high-density map constructed using SLAF markers is a worthy reference for mapping important traits in woody plants.
Abstract: A high-density genetic map is a valuable tool for fine mapping of loci controlling a specific trait, especially for perennial woody plants. In this study, we first constructed a high-density genetic map of mei (Prunus mume) using SLAF markers, developed by specific locus amplified fragment sequencing (SLAF-seq). The linkage map contains 8,007 markers, with a mean marker distance of 0.195 cM, making it the densest genetic map for the genus Prunus. Though weeping trees are used worldwide as landscape plants, little is known about the gene(s) controlling weeping (Pl). To test the utility of the high-density genetic map, we did fine-scale mapping of this important ornamental trait. In total, three statistical methods were applied progressively based on the result of inheritance analysis. Quantitative trait loci (QTL) analysis initially revealed that a locus on linkage group 7 was strongly responsible for the weeping trait. A Mutmap-like strategy and extreme linkage analysis were then applied to fine map this locus within 1.14 cM. Bioinformatics analysis of the locus identified some candidate genes. The successful localization of the weeping trait strongly indicates that the high-density map constructed using SLAF markers is a worthy reference for mapping important traits in woody plants.

121 citations


Journal Article
TL;DR: The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species.
Abstract: High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1–8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species.

112 citations


Journal Article
TL;DR: De novo whole-genome sequencing was performed with two lines of I. trifida, namely the selfed line Mx23Hm and the highly heterozygous line 0431-1 using the Illumina HiSeq platform, to assist in analysis of the sweet potato genome.
Abstract: Ipomoea trifida (H. B. K.) G. Don. is the most likely diploid ancestor of the hexaploid sweet potato, I. batatas (L.) Lam. To assist in analysis of the sweet potato genome, de novo whole-genome sequencing was performed with two lines of I. trifida, namely the selfed line Mx23Hm and the highly heterozygous line 0431-1, using the Illumina HiSeq platform. We classified the sequences thus obtained as either ‘core candidates’ (common to the two lines) or ‘line specific’. The total length of the assembled sequences of Mx23Hm (ITR_r1.0) was 513 Mb, while that of 0431-1 (ITRk_r1.0) was 712 Mb. Of the assembled sequences, 240 Mb (Mx23Hm) and 353 Mb (0431-1) were classified as core candidate sequences. A total of 62,407 (62.4 Mb) and 109,449 (87.2 Mb) putative genes were identified, respectively, in the genomes of Mx23Hm and 0431-1, of which 11,823 were derived from core sequences of Mx23Hm, while 28,831 were from the core candidate sequence of 0431-1. There were a total of 1,464,173 single-nucleotide polymorphisms and 16,682 copy number variations (CNVs) in the two assembled genomic sequences (under the condition of log2 ratio of >1 and CNV size >1,000 bases). The results presented here are expected to contribute to the progress of genomic and genetic studies of I. trifida, as well as studies of the sweet potato and the genus Ipomoea in general.

105 citations


Journal Article
TL;DR: MetaVelvet-SL outperformed the original MetaVelvet and other state-of-the-art metagenomic assemblers, IDBA-UD, Ray Meta and Omega, to reconstruct accurate longer assemblies with higher N50 scores for both simulated data sets and real data sets of human gut microbial sequences.
Abstract: The assembly of multiple genomes from mixed sequence reads is a bottleneck in metagenomic analysis. A single-genome assembly program (assembler) is not capable of resolving metagenome sequences, so assemblers designed specifically for metagenomics have been developed. MetaVelvet is an extension of the single-genome assembler Velvet. It has been proved to generate assemblies with higher N50 scores and higher quality than single-genome assemblers such as Velvet and SOAPdenovo when applied to metagenomic sequence reads and is frequently used in this research community. One important open problem for MetaVelvet is its low accuracy and sensitivity in detecting chimeric nodes in the assembly (de Bruijn) graph, which prevents the generation of longer contigs and scaffolds. We have tackled this problem of classifying chimeric nodes using supervised machine learning to significantly improve the performance of MetaVelvet and developed a new tool, called MetaVelvet-SL. A Support Vector Machine is used for learning the classification model based on 94 features extracted from candidate nodes. In extensive experiments, MetaVelvet-SL outperformed the original MetaVelvet and other state-of-the-art metagenomic assemblers, IDBA-UD, Ray Meta and Omega, to reconstruct accurate longer assemblies with higher N50 scores for both simulated data sets and real data sets of human gut microbial sequences.
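
The N50 score used in this abstract to compare assemblers is a standard contiguity statistic: the largest length L such that contigs of length at least L together contain at least half of all assembled bases. The following is a minimal sketch of that calculation. The function name and the contig lengths are illustrative assumptions and do not come from MetaVelvet-SL or from the study's data.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, for illustration only.
example = [80, 70, 50, 40, 30, 20, 10]
print(n50(example))  # prints 70: 80 + 70 = 150, half of the 300 bp total
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is reported alongside accuracy when assemblers are compared.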

98 citations


Journal Article
TL;DR: A comprehensive analysis of pennycress gene homologues involved in glucosinolate biosynthesis, metabolism, and transport pathways revealed high sequence conservation compared with other Brassicaceae species, and helps validate the assembly of the pennycress gene space in this draft genome.
Abstract: Field pennycress (Thlaspi arvense L.) is being domesticated as a new winter cover crop and biofuel species for the Midwestern United States that can be double-cropped between corn and soybeans. A genome sequence will enable the use of new technologies to make improvements in pennycress. To generate a draft genome, a hybrid sequencing approach was used to generate 47 Gb of DNA sequencing reads from both the Illumina and PacBio platforms. These reads were used to assemble 6,768 genomic scaffolds. The draft genome was annotated using the MAKER pipeline, which identified 27,390 predicted protein-coding genes, with almost all of these predicted peptides having significant sequence similarity to Arabidopsis proteins. A comprehensive analysis of pennycress gene homologues involved in glucosinolate biosynthesis, metabolism, and transport pathways revealed high sequence conservation compared with other Brassicaceae species, and helps validate the assembly of the pennycress gene space in this draft genome. Additional comparative genomic analyses indicate that the knowledge gained from years of basic Brassicaceae research will serve as a powerful tool for identifying gene targets whose manipulation can be predicted to result in improvements for pennycress.

89 citations


Journal Article
TL;DR: The integrated map provides a valuable tool for validating and improving the catfish whole-genome assembly and facilitates fine-scale QTL mapping and positional cloning of genes responsible for economically important traits.
Abstract: Construction of a genetic linkage map is essential for genetic and genomic studies. Recent advances in sequencing and genotyping technologies have made it possible to generate high-density and high-resolution genetic linkage maps, especially for organisms lacking extensive genomic resources. In the present work, we constructed a high-density and high-resolution genetic map for channel catfish with three large resource families genotyped using the catfish 250K single-nucleotide polymorphism (SNP) array. A total of 54,342 SNPs were placed on the linkage map, which to our knowledge had the highest marker density among aquaculture species. The estimated genetic size was 3,505.4 cM with a resolution of 0.22 cM for the sex-averaged genetic map. The sex-specific linkage maps spanned a total of 4,495.1 cM in females and 2,593.7 cM in males, presenting a ratio of 1.7 : 1 between female and male in recombination fraction. After integration with the previously established physical map, over 87% of physical map contigs were anchored to the linkage groups that covered a physical length of 867 Mb, accounting for ∼90% of the catfish genome. The integrated map provides a valuable tool for validating and improving the catfish whole-genome assembly and facilitates fine-scale QTL mapping and positional cloning of genes responsible for economically important traits.
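
The female-to-male ratio quoted in this abstract follows directly from the two sex-specific map lengths. As a small worked check, the sketch below recomputes it. The function name is a hypothetical helper, and only the two map lengths are taken from the abstract.

```python
def map_length_ratio(female_cm, male_cm):
    """Return the female-to-male ratio of genetic map lengths (both in cM)."""
    if male_cm <= 0:
        raise ValueError("male map length must be positive")
    return female_cm / male_cm


# Sex-specific map lengths reported in the abstract, in centimorgans.
ratio = map_length_ratio(4495.1, 2593.7)
print(f"{ratio:.2f} : 1")  # prints 1.73 : 1, which rounds to the quoted 1.7 : 1
```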

82 citations


Journal Article
TL;DR: The genotyping-by-sequencing (GBS) technique is applied to construct a high-resolution genetic map and identify clubroot resistance (CR) genes and reposition 37 and 2 misanchored scaffolds in the 02-12 and TO1000DH genome sequences, respectively.
Abstract: Clubroot is a devastating disease caused by Plasmodiophora brassicae and results in severe losses of yield and quality in Brassica crops. Many clubroot resistance genes and markers are available in Brassica rapa but less is known in Brassica oleracea. Here, we applied the genotyping-by-sequencing (GBS) technique to construct a high-resolution genetic map and identify clubroot resistance (CR) genes. A total of 43,821 SNPs were identified using GBS data for two parental lines, one resistant and one susceptible to clubroot, and 18,187 of them showed >5× coverage in the GBS data. Among those, 4,103 were credibly genotyped for all 78 F2 individual plants. These markers were clustered into nine linkage groups spanning 879.9 cM with an average interval of 1.15 cM. A quantitative trait loci (QTL) survey based on three rounds of clubroot resistance tests using F2:3 progenies revealed two major QTLs for Race 2 and a single major QTL for Race 9 of P. brassicae. The QTLs show similar locations to the previously reported CR loci for Race 4 in B. oleracea but are in different positions from any of the CR loci found in B. rapa. We utilized two reference genome sequences in this study. The high-resolution genetic map developed herein allowed us to reposition 37 and 2 misanchored scaffolds in the 02-12 and TO1000DH genome sequences, respectively. Our data also support additional positioning of two unanchored 3.3 Mb scaffolds into the 02-12 genome sequence.

Journal Article
TL;DR: It is demonstrated that the system achieved near complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA.
Abstract: Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. In the case of massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population due to the global amplification step during the template preparation. We established a high-fidelity target sequencing system of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process involves the identification of individual molecules that have been sequenced and for which the number of mutations has been quantitated in absolute terms. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA.
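
The general idea of barcode-based molecule counting described in this abstract can be sketched in a few lines: reads that share a barcode are treated as copies of one original molecule, a consensus base is called per barcode, and mutant molecules are counted rather than mutant reads. This is only an illustration under stated assumptions, not the authors' pipeline. The function name, the input layout, and the `min_reads` and `min_agreement` thresholds are all hypothetical, and dropping sparsely supported barcodes is a simple stand-in for the paper's removal of erroneous barcode tags.

```python
from collections import Counter, defaultdict


def count_mutant_molecules(reads, ref_base, min_reads=2, min_agreement=0.9):
    """Count original molecules and mutant molecules at one target site.

    reads: iterable of (barcode, base) pairs, one pair per sequenced read.
    Reads sharing a barcode are treated as copies of one original molecule.
    min_reads and min_agreement are illustrative thresholds, not values
    taken from the paper.
    """
    by_barcode = defaultdict(list)
    for barcode, base in reads:
        by_barcode[barcode].append(base)

    molecules = 0
    mutants = 0
    for bases in by_barcode.values():
        # Assumption: barcodes seen in very few reads are likely erroneous tags.
        if len(bases) < min_reads:
            continue
        base, support = Counter(bases).most_common(1)[0]
        # Require near-unanimous agreement so isolated read errors are ignored.
        if support / len(bases) < min_agreement:
            continue
        molecules += 1
        if base != ref_base:
            mutants += 1
    return molecules, mutants


# Hypothetical reads at a site whose reference base is G.
example_reads = (
    [("AACG", "G")] * 3                                  # wild-type molecule
    + [("TTGA", "A")] * 4                                # mutant molecule
    + [("CCAT", "G"), ("CCAT", "G"), ("CCAT", "A")]      # discordant, skipped
    + [("GGTA", "A")]                                    # single read, skipped
)
print(count_mutant_molecules(example_reads, ref_base="G"))  # prints (2, 1)
```

Counting per barcode rather than per read is what makes the result an absolute number of molecules: amplification changes how many reads each molecule yields, but not how many distinct barcodes are present.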

Journal Article
Xuemei Zhou, Peng Zhao, Wei Wang, Jie Zou, Tianhe Cheng, Xiongbo Peng, Meng-Xiang Sun
TL;DR: A detailed survey of all ATGs in tobacco is offered and this work suggests manifold functions of autophagy in both normal plant growth and plant response to environmental stresses.
Abstract: Autophagy is an evolutionarily conserved mechanism in both animals and plants, which has been shown to be involved in various essential developmental processes in plants. Nicotiana tabacum is considered to be an ideal model plant and has been widely used for the study of the roles of autophagy in the processes of plant development and in the response to various stresses. However, only a few autophagy-related genes (ATGs) have been identified in tobacco up to now. Here, we identified 30 ATGs belonging to 16 different groups in tobacco through a genome-wide survey. Comprehensive expression profile analysis reveals a broad expression pattern of these ATGs, which could be detected in all tissues tested under normal growth conditions. Our series of tests further reveals that the majority of ATGs are sensitive and responsive to different stresses including nutrient starvation, plant hormones, heavy metal and other abiotic stresses, suggesting a central role of autophagy, likely as an effector, in plant response to various environmental cues. This work offers a detailed survey of all ATGs in tobacco and also suggests manifold functions of autophagy in both normal plant growth and plant response to environmental stresses.

Journal Article
TL;DR: A genome-wide analysis was performed in the model plant foxtail millet, finding several TEs are highly polymorphic for insert location in the genome and this facilitates development of TE-based markers for various genotyping purposes.
Abstract: Transposable elements (TEs) are major components of plant genomes and are reported to play significant roles in functional genome diversity and phenotypic variations. Several TEs are highly polymorphic for insert location in the genome and this facilitates development of TE-based markers for various genotyping purposes. Considering this, a genome-wide analysis was performed in the model plant foxtail millet. A total of 30,706 TEs were identified and classified as DNA transposons (24,386), full-length Copia type (1,038), partial or solo Copia type (10,118), full-length Gypsy type (1,570), partial or solo Gypsy type (23,293) and Long- and Short-Interspersed Nuclear Elements (3,659 and 53, respectively). Further, 20,278 TE-based markers were developed, namely Retrotransposon-Based Insertion Polymorphisms (4,801, ∼24%), Inter-Retrotransposon Amplified Polymorphisms (3,239, ∼16%), Repeat Junction Markers (4,451, ∼22%), Repeat Junction-Junction Markers (329, ∼2%), Insertion-Site-Based Polymorphisms (7,401, ∼36%) and Retrotransposon-Microsatellite Amplified Polymorphisms (57, 0.2%). A total of 134 Repeat Junction Markers were screened in 96 accessions of Setaria italica and 3 wild Setaria accessions, of which 30 showed polymorphism. Moreover, an open access database for these developed resources was constructed (Foxtail millet Transposable Elements-based Marker Database; http://59.163.192.83/ltrdb/index.html). Taken together, this study would serve as a valuable resource for large-scale genotyping applications in foxtail millet and related grass species.
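
The six marker classes listed in this abstract add up to the stated total of 20,278, and the approximate percentages can be recomputed from the counts. The sketch below does that arithmetic. The variable and function names are illustrative assumptions, and only the counts are taken from the abstract.

```python
# Marker counts per class, as listed in the abstract.
marker_counts = {
    "Retrotransposon-Based Insertion Polymorphisms": 4801,
    "Inter-Retrotransposon Amplified Polymorphisms": 3239,
    "Repeat Junction Markers": 4451,
    "Repeat Junction-Junction Markers": 329,
    "Insertion-Site-Based Polymorphisms": 7401,
    "Retrotransposon-Microsatellite Amplified Polymorphisms": 57,
}


def marker_shares(counts):
    """Return the total count and each class's share of it in percent."""
    total = sum(counts.values())
    return total, {name: 100 * n / total for name, n in counts.items()}


total, shares = marker_shares(marker_counts)
print(total)  # prints 20278, matching the total given in the abstract
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Computed this way, the five larger classes round to the quoted 24%, 16%, 22%, 2% and 36%. The smallest class comes out at about 0.28%, slightly above the 0.2% given in the abstract.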

Journal Article
TL;DR: This work shows that infrequent bursts of Zscan4 expression (Z4 events) involve unexpected transcriptional derepression in heterochromatin regions that usually remain silent, and suggests that mESCs may maintain their extraordinary genome stability at least in part by transiently resetting their heterochromatin.
Abstract: Mouse embryonic stem cells (mESCs) have a remarkable capacity to maintain normal genome stability and karyotype in culture. We previously showed that infrequent bursts of Zscan4 expression (Z4 events) are important for the maintenance of telomere length and genome stability in mESCs. However, the molecular details of Z4 events remain unclear. Here we show that Z4 events involve unexpected transcriptional derepression in heterochromatin regions that usually remain silent. During a Z4 event, we see rapid derepression and rerepression of heterochromatin leading to a burst of transcription that coincides with transient histone hyperacetylation and DNA demethylation, clustering of pericentromeric heterochromatin around the nucleolus, and accumulation of activating and repressive chromatin remodelling complexes. This heterochromatin-based transcriptional activity suggests that mESCs may maintain their extraordinary genome stability at least in part by transiently resetting their heterochromatin.

Journal Article
TL;DR: Three sex-specific simple sequence repeat (SSR) markers can be used to accurately sex type male and female kiwifruit in breeding programmes and localizing the SDR will expedite the discovery of genes controlling carpel abortion in males and pollen sterility in females.
Abstract: Kiwifruit (Actinidia chinensis Planchon) is an important specialty fruit crop that suffers from narrow genetic diversity stemming from recent global commercialization and limited cultivar improvement. Here, we present high-density RAD-seq-based genetic maps using an interspecific F1 cross between Actinidia rufa 'MT570001' and A. chinensis 'Guihai No4'. The A. rufa (maternal) map consists of 2,426 single-nucleotide polymorphism (SNP) markers with a total length of 2,651 cM in 29 linkage groups (LGs) corresponding to the 29 chromosomes. The A. chinensis (paternal) map consists of 4,214 SNP markers over 3,142 cM in 29 LGs. Using these maps, we were able to anchor an additional 440 scaffolds from the kiwifruit draft genome assembly. Kiwifruit is functionally dioecious, which presents unique challenges for breeding and production. Three sex-specific simple sequence repeat (SSR) markers can be used to accurately sex type male and female kiwifruit in breeding programmes. The sex-determination region (SDR) in kiwifruit was narrowed to a 1-Mb subtelomeric region on chromosome 25. Localizing the SDR will expedite the discovery of genes controlling carpel abortion in males and pollen sterility in females.

Journal Article
TL;DR: After long-term exposure to cold, a large proportion of gene down-regulation was found, including photosynthesis and plant growth genes, and up-regulated genes after long-term cold exposure were related to organelle fusion, nucleus organization, and DNA integration, including retrotransposons.
Abstract: Low temperature severely affects plant growth and development. To overcome this constraint, several plant species from regions having a cool season have evolved an adaptive response, called cold acclimation. We have studied this response in the olive tree (Olea europaea L.) cv. Picual. Biochemical stress markers and cold-stress symptoms were detected after the first 24 h as sagging leaves. After 5 days, the plants were found to have completely recovered. Control and cold-stressed plants were sequenced using the Illumina HiSeq 1000 paired-end technique. We also assembled a new olive transcriptome comprising 157,799 unigenes and found 6,309 unigenes differentially expressed in response to cold. Three types of response that led to cold acclimation were found: short-term transient response, early long-term response, and late long-term response. These subsets of unigenes were related to different biological processes. Early responses involved many cold-stress-responsive genes coding for, among many other things, C-repeat binding factor transcription factors, fatty acid desaturases, wax synthesis, and oligosaccharide metabolism. After long-term exposure to cold, a large proportion of gene down-regulation was found, including photosynthesis and plant growth genes. Up-regulated genes after long-term cold exposure were related to organelle fusion, nucleus organization, and DNA integration, including retrotransposons.

Journal Article
TL;DR: This work reports a method to overcome this limitation by using post-bisulfite adaptor tagging (PBAT), in which adaptor tagging is conducted after bisulfite treatment to circumvent bisulfite-induced loss of intact sequencing templates, thereby enabling TMS of a 100-fold smaller amount of input DNA with far fewer cycles of polymerase chain reaction than in the current protocol.
Abstract: The current gold standard method for methylome analysis is whole-genome bisulfite sequencing (WGBS), but its cost is substantial, especially for the purpose of multi-sample comparison of large methylomes. Shotgun bisulfite sequencing of target-enriched DNA, or targeted methylome sequencing (TMS), can be a flexible, cost-effective alternative to WGBS. However, the current TMS protocol requires a considerable amount of input DNA and hence is hardly applicable to samples of limited quantity. Here we report a method to overcome this limitation by using post-bisulfite adaptor tagging (PBAT), in which adaptor tagging is conducted after bisulfite treatment to circumvent bisulfite-induced loss of intact sequencing templates, thereby enabling TMS of a 100-fold smaller amount of input DNA with far fewer cycles of polymerase chain reaction than in the current protocol. We thus expect that the PBAT-mediated TMS will serve as an invaluable method in epigenomics.

Journal Article
TL;DR: Bioinformatics analyses demonstrated that CNGCs of Group IVa were distinct from those of other groups in gene structure and amino acid sequence of the cyclic nucleotide-binding domain, and silencing analyses revealed that a set of CNGC genes might be involved in disease resistance and abiotic stress responses in tomato.
Abstract: Cyclic nucleotide-gated ion channels (CNGCs) are calcium-permeable channels that are involved in various biological functions. Nevertheless, the phylogeny and function of plant CNGCs are not well understood. In this study, 333 CNGC genes from 15 plant species were identified using comprehensive bioinformatics approaches. Extensive bioinformatics analyses demonstrated that CNGCs of Group IVa were distinct from those of other groups in gene structure and amino acid sequence of the cyclic nucleotide-binding domain. A CNGC-specific motif that recognizes all identified plant CNGCs was generated. Phylogenetic analysis indicated that CNGC proteins of flowering plant species formed five groups. However, CNGCs of the non-vascular plant Physcomitrella patens clustered only in two groups (IVa and IVb), while those of the vascular non-flowering plant Selaginella moellendorffii gathered in four (IVa, IVb, I and II). These data suggest that Group IV CNGCs are the most ancient and Group III CNGCs are the most recently evolved in flowering plants. Furthermore, silencing analyses revealed that a set of CNGC genes might be involved in disease resistance and abiotic stress responses in tomato, and that the function of SlCNGCs does not correlate with the group to which they belong. Our results indicate that Group IVa CNGCs are structurally but not functionally unique among plant CNGCs.

Journal ArticleDOI
TL;DR: The practical utility of developing and high-throughput genotyping of such beneficial InDel markers at a genome-wide scale to expedite genomics-assisted breeding applications in chickpea is demonstrated.
Abstract: We developed 21,499 genome-wide insertion–deletion (InDel) markers (2- to 54-bp in silico fragment length polymorphism) by comparing the genomic sequences of four (desi, kabuli and wild C. reticulatum) chickpea [Cicer arietinum (L.)] accessions. InDel markers showing 2- to 6-bp fragment length polymorphism among accessions were abundant (76.8%) in the chickpea genome. The physically mapped 7,643 and 13,856 markers on eight chromosomes and unanchored scaffolds, respectively, were structurally and functionally annotated. The 4,506 coding (23% large-effect frameshift mutations) and regulatory InDel markers were identified from 3,228 genes (representing 11.7% of total 27,571 desi genes), suggesting their functional relevance for trait association/genetic mapping. High amplification (97%) and intra-specific polymorphic (60–83%) potential and wider genetic diversity (15–89%) were detected by genome-wide 6,254 InDel markers among desi, kabuli and wild accessions using even a simpler cost-effective agarose gel-based assay. This signifies added advantages of this user-friendly genetic marker system for manifold large-scale genotyping applications in laboratories with limited infrastructure and resources. Utilizing 6,254 InDel markers-based high-density (inter-marker distance: 0.212 cM) inter-specific genetic linkage map (ICC 4958 × ICC 17160) of chickpea as a reference, three major genomic regions harboring six flowering and maturity time robust QTLs (16.4–27.5% phenotypic variation explained, 8.1–11.5 logarithm of odds) were identified. Integration of genetic and physical maps at these target QTL intervals mapped on three chromosomes delineated five InDel markers-containing candidate genes tightly linked to the QTLs governing flowering and maturity time in chickpea. Taken together, our study demonstrated the practical utility of developing and high-throughput genotyping of such beneficial InDel markers at a genome-wide scale to expedite genomics-assisted breeding applications in chickpea.

Journal ArticleDOI
TL;DR: The observed global CpG methylation patterns of pigs indicated high similarity to other mammals including humans, and provides essential information for future studies of the porcine epigenome.
Abstract: DNA methylation plays a major role in the epigenetic regulation of gene expression. Although a few DNA methylation profiling studies of the porcine genome, an important biomedical model for human diseases, have been reported, the available data are still limited. As part of the International Swine Methylome Consortium, we studied methylation patterns of diverse pig tissues to generate a swine reference methylome map and to evaluate the methylation profile of the pig genome at single-base resolution. We generated and analysed the DNA methylome profiles of five different tissues and a pig-derived cell line. On average, 39.85 and 62.1% of cytosine and guanine dinucleotides (CpGs) of CpG islands and 2 kb upstream of transcription start sites were covered, respectively. We detected a low rate (an average of 1.67%) of non-CpG methylation in the six samples except for the neocortex (2.3%). The observed global CpG methylation patterns of pigs indicated high similarity to other mammals including humans. The percentage of CpG methylation associated with gene features was similar among the tissues but not in the 3D4/2 cell line. Our results provide essential information for future studies of the porcine epigenome.

Journal ArticleDOI
TL;DR: The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data and represents essential tools for molecular breeding and gene cloning in Allium spp.
Abstract: The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

Journal ArticleDOI
TL;DR: In this paper, the authors presented whole-genome resequencing analyses of 55 pigs of five breeds representing Korean native pigs, wild boar and three European origin breeds, covering ∼99.2% of the reference genome, at an average of ∼11.7-fold coverage.
Abstract: Pigs have been one of the most important sources of meat for humans, and their productivity has been substantially improved by recent strong selection. Here, we present whole-genome resequencing analyses of 55 pigs of five breeds representing Korean native pigs, wild boar and three European-origin breeds. A total of 1,673.1 Gb of sequence reads were mapped to the Swine reference assembly, covering ∼99.2% of the reference genome, at an average of ∼11.7-fold coverage. We detected 20,123,573 single-nucleotide polymorphisms (SNPs), of which 25.5% were novel. We extracted 35,458 non-synonymous SNPs in 9,904 genes, which may contribute to traits of interest. The whole SNP sets were further used to assess the population structures of the breeds, using multiple methodologies, including phylogenetic, similarity matrix, and population structure analysis. They showed clear population clusters with respect to each breed. Furthermore, we scanned the whole genomes to identify signatures of selection throughout the genome. The result revealed several promising loci that might underlie economically important traits in pigs, such as the CLDN1 and TWIST1 genes. These discoveries provide useful genomic information for further study of the discrete genetic mechanisms associated with economically important traits in pigs.
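
The sequencing figures in this abstract can be related by simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that average fold coverage is total mapped bases divided by (number of samples × genome size), and the function names are hypothetical.

```python
def mean_depth(total_mapped_gb, n_samples, genome_size_gb):
    """Average fold coverage per sample.

    Assumes depth = mapped bases / (samples * genome size); this ignores
    unmapped regions and uneven sampling, so it is only a rough estimate.
    """
    if n_samples <= 0 or genome_size_gb <= 0:
        raise ValueError("n_samples and genome_size_gb must be positive")
    return total_mapped_gb / (n_samples * genome_size_gb)


def implied_genome_size_gb(total_mapped_gb, n_samples, depth):
    """Genome size (Gb) implied by a reported total yield and mean depth."""
    if n_samples <= 0 or depth <= 0:
        raise ValueError("n_samples and depth must be positive")
    return total_mapped_gb / (n_samples * depth)
```

Under that assumption, `implied_genome_size_gb(1673.1, 55, 11.7)` gives roughly 2.6 Gb, so the reported yield, sample count and depth are mutually consistent for a genome of about that size.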

Journal ArticleDOI
TL;DR: Bioinformatics analysis of assembled Tribolium castaneum genome disclosed significant contribution of TRs in euchromatic chromosomal arms and clear predominance of satellite DNA-typical 170 bp monomers in arrays of ≥5 repeats.
Abstract: Although satellite DNAs are well-explored components of heterochromatin and centromeres, little is known about the emergence, dispersal and possible impact of comparably structured tandem repeats (TRs) on the genome-wide scale. Our bioinformatics analysis of the assembled Tribolium castaneum genome disclosed a significant contribution of TRs in euchromatic chromosomal arms and a clear predominance of satellite DNA-typical 170 bp monomers in arrays of ≥5 repeats. By applying different experimental approaches, we revealed that the nine most prominent TR families Cast1-Cast9 extracted from the assembly comprise ∼4.3% of the entire genome and reside almost exclusively in euchromatic regions. Among them, seven families that build ∼3.9% of the genome are based on ∼170 and ∼340 bp long monomers. Results of phylogenetic analyses of 2500 monomers originating from these families show high sequence dynamics, evident by extensive exchanges between arrays on non-homologous chromosomes. In addition, our analysis shows that concerted evolution acts more efficiently on longer than on shorter arrays. Efficient genome-wide distribution of nine TR families implies the role of transposition only in expansion of the most dispersed family, and involvement of other mechanisms is anticipated. Despite similarities in sequence features, FISH experiments indicate high-level compartmentalization of centromeric and euchromatic tandem repeats.

Journal ArticleDOI
TL;DR: The genome mapping and validation of five neutral sites in the chromosome of Synechocystis sp. PCC 6803 are described.
Abstract: The use of microorganisms as cell factories frequently requires extensive molecular manipulation. Therefore, the identification of genomic neutral sites for the stable integration of ectopic DNA is required to ensure a successful outcome. Here we describe the genome mapping and validation of five neutral sites in the chromosome of Synechocystis sp. PCC 6803, foreseeing the use of this cyanobacterium as a photoautotrophic chassis. To evaluate the neutrality of these loci, insertion/deletion mutants were produced, and to assess their functionality, a synthetic green fluorescent reporter module was introduced. The constructed integrative vectors include a BioBrick-compatible multiple cloning site insulated by transcription terminators, constituting robust cloning interfaces for synthetic biology approaches. Moreover, Synechocystis mutants (chassis) ready to receive purpose-built synthetic modules/circuits are also available. This work presents a systematic approach to map and validate chromosomal neutral sites in cyanobacteria, and that can be extended to other organisms.

Journal ArticleDOI
Hantao Wang, Xin Jin, Beibei Zhang, Chao Shen, Zhongxu Lin
TL;DR: Here, the development of markers using parental RAD sequencing was effective, and a high-density intraspecific genetic map was constructed that can be used for molecular marker-assisted selection in cotton.
Abstract: RAD sequencing was performed using DH962 and Jimian5 as upland cotton mapping parents. Sequencing data for DH962 and Jimian5 were assembled into genome sequences of ≈55.27 and ≈57.06 Mb, respectively. Analysing genome sequences of the two parents, 1,323 SSR, 3,838 insertion/deletion (InDel), and 9,366 single-nucleotide polymorphism (SNP) primer pairs were developed. All of the SSRs, 121 InDels, 441 SNPs, and 6,747 other primer pairs were screened in the two parents, and a total of 535 new polymorphic loci were identified. A genetic map including 1,013 loci was constructed using these results and 506 loci previously published for this population. Twenty-seven new QTLs for yield and fibre quality were identified, indicating that the efficiency of QTL detection was greatly improved by the increase in map density. Comparative genomics showed there to be considerable homology and collinearity between the AT and A2 genomes and between the DT and D5 genomes, although there were a few exchanges and introgressions among the chromosomes of the A2 genome. Here, the development of markers using parental RAD sequencing was effective, and a high-density intraspecific genetic map was constructed. This map can be used for molecular marker-assisted selection in cotton.

Journal ArticleDOI
TL;DR: Natural antisense transcripts (NATs) are endogenous transcripts that can form double-stranded RNA structures; NAT-siRNAs are significantly derived from PC/PC pairs of trans-NATs and NPC/NPC pairs of cis-NATs, and NAT pair genes typically have similar patterns of epigenetic status.
Abstract: Natural antisense transcripts (NATs) are endogenous transcripts that can form double-stranded RNA structures. Many protein-coding genes (PCs) and non-protein-coding genes (NPCs) tend to form cis-NATs and trans-NATs, respectively. In this work, we identified 4,080 cis-NATs and 2,491 trans-NATs genome-wide in Arabidopsis. Of these, 5,385 NAT-siRNAs were detected from the small RNA sequencing data. NAT-siRNAs are typically 21 nt, and are processed by Dicer-like 1 (DCL1)/DCL2 and RDR6 and function in epigenetically activated situations, or 24 nt, suggesting these are processed by DCL3 and RDR2 and function in environmental stress. NAT-siRNAs are significantly derived from PC/PC pairs of trans-NATs and NPC/NPC pairs of cis-NATs. Furthermore, NAT pair genes typically have similar patterns of epigenetic status. Cis-NATs tend to be marked by euchromatic modifications, whereas trans-NATs tend to be marked by heterochromatic modifications.

Journal ArticleDOI
TL;DR: GeneBase, a full parser of the National Center for Biotechnology Information (NCBI) Gene database, is developed, which generates a fully structured local database with an intuitive user-friendly graphic interface for personal computers.
Abstract: We have developed GeneBase, a full parser of the National Center for Biotechnology Information (NCBI) Gene database, which generates a fully structured local database with an intuitive user-friendly graphic interface for personal computers. Features of all the annotated eukaryotic genes are accessible through three main software tables, including for each entry details such as the gene summary, the gene exon/intron structure and the specific Gene Ontology attributions. The structuring of the data, the creation of additional calculation fields and the integration with nucleotide sequences allow users to make many types of comparisons and calculations that are useful for data retrieval and analysis. We provide an original example analysis of the existing introns across all the available species, through which the classic biological problem of the ‘minimal intron’ may find a solution using available data. Based on all currently available data, we can define the shortest known eukaryotic GT-AG intron length, setting the physical limit at the 30-base-pair intron belonging to the human MST1L gene. This ‘model intron’ will shed light on the minimal recognition elements required for conventional splicing. Remarkably, this size is indeed consistent with the sum of the splicing consensus sequence lengths.

Journal ArticleDOI
TL;DR: The results reported in this study show that RE activity (both retrotransposition and DNA loss) has impacted the olive genome structure in more ancient times than in other angiosperms.
Abstract: Improved knowledge of genome composition, especially of its repetitive component, generates important information for both theoretical and applied research. The olive repetitive component is made up of two main classes of sequences: tandem repeats and retrotransposons (REs). In this study, we provide characterization of a sample of 254 unique full-length long terminal repeat (LTR) REs. In the sample, Ty1-Copia elements were more numerous than Ty3-Gypsy elements. Mapping a large set of Illumina whole-genome shotgun reads onto the identified retroelement set revealed that Gypsy elements are more redundant than Copia elements. The insertion time of intact retroelements was estimated based on sister LTR divergence. Although some elements inserted relatively recently, the mean insertion age of the isolated retroelements is around 18 million years. Gypsy and Copia retroelements showed different waves of transposition, with Gypsy elements especially active between 10 and 25 million years ago and nearly inactive in the last 7 million years. The occurrence of numerous solo-LTRs related to isolated full-length retroelements was ascertained for two Gypsy elements and one Copia element. Overall, the results reported in this study show that RE activity (both retrotransposition and DNA loss) has impacted the olive genome structure in more ancient times than in other angiosperms.
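
Dating an insertion from sister LTR divergence can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the two LTRs are already aligned, uses the Kimura two-parameter distance as one common choice of divergence measure, and applies the widely used relation T = K / (2r). The function names and the substitution rate passed in are hypothetical; the abstract does not state which distance model or rate was used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_years(distance, rate_per_site_per_year):
    """T = K / (2r): both LTRs accumulate substitutions independently.

    rate_per_site_per_year is an assumed input, not a value from the paper.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate_per_site_per_year)
```

The factor of two reflects that the two LTRs of one element are identical at insertion and then diverge along two independent lineages, so the estimate scales directly with whatever substitution rate is assumed.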

Journal ArticleDOI
TL;DR: In this article, the authors describe the detection and characterization of 15 million SNPs from the chicken genome with the goal of predicting variants with potential functional implications (pfVars) from both coding and non-coding regions.
Abstract: Next-generation sequencing has prompted a surge of discovery of millions of genetic variants from vertebrate genomes. Besides applications in genetic association and linkage studies, a fraction of these variants will have functional consequences. This study describes the detection and characterization of 15 million SNPs from the chicken genome with the goal of predicting variants with potential functional implications (pfVars) from both coding and non-coding regions. The study reports: 183K amino acid-altering SNPs, of which 48% were predicted as evolutionary intolerant, 13K splicing variants, 51K likely to alter RNA secondary structures, 500K within most conserved elements and 3K from non-coding RNAs. Regions of local fixation within commercial broiler and layer lines were investigated as potential selective sweeps using genome-wide SNP data. Relationships with phenotypes, if any, of the pfVars were explored by overlaying the sweep regions with known QTLs. Based on this, the candidate genes and/or causal mutations for a number of important traits are discussed. Although the fixed variants within sweep regions were enriched with non-coding SNPs, some non-synonymous-intolerant mutations reached fixation, suggesting their possible adaptive advantage. The results presented in this study are expected to have important implications for future genomic research to identify candidate causal mutations and in poultry breeding.