Showing papers in "BMC Genomics in 2018"


Journal ArticleDOI
TL;DR: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories and infers more accurate pseudotimes than other leading methods.
Abstract: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.

1,241 citations


Journal ArticleDOI
TL;DR: Findings show that evolutionary events based on horizontal gene transfer occur within an ongoing CDI and contribute to the adaptation of the species by the introduction of new genes into the genomes.
Abstract: Clostridioides difficile infections (CDI) have emerged over the past decade causing symptoms that range from mild, antibiotic-associated diarrhea (AAD) to life-threatening toxic megacolon. In this study, we describe a multiple and isochronal (mixed) CDI caused by the isolates DSM 27638, DSM 27639 and DSM 27640 that already initially showed different morphotypes on solid media. The three isolates belonging to the ribotypes (RT) 012 (DSM 27639) and 027 (DSM 27638 and DSM 27640) were phenotypically characterized and high quality closed genome sequences were generated. The genomes were compared with seven reference strains including three strains of the RT 027, two of the RT 017, and one of the RT 078 as well as a multi-resistant RT 012 strain. The analysis of horizontal gene transfer events revealed gene acquisition incidents that sort the strains within the time line of the spread of their RTs within Germany. We could show as well that horizontal gene transfer between the members of different RTs occurred within this multiple infection. In addition, acquisition and exchange of virulence-related features including antibiotic resistance genes were observed. Analysis of the two genomes assigned to RT 027 revealed three single nucleotide polymorphisms (SNPs) and apparently a regional genome modification within the flagellar switch that regulates the fli operon. Our findings show that (i) evolutionary events based on horizontal gene transfer occur within an ongoing CDI and contribute to the adaptation of the species by the introduction of new genes into the genomes, (ii) within a multiple infection of a single patient the exchange of genetic material was responsible for a much higher genome variation than the observed SNPs.

373 citations


Journal ArticleDOI
TL;DR: Fifty-four WRKY genes were identified in pineapple and the structure of their encoded proteins, their evolutionary characteristics and expression patterns were examined in this study, providing a foundation for further functional characterization of WRKY genes with an aim of pineapple crop improvement.
Abstract: WRKY proteins comprise a large family of transcription factors that play important roles in many aspects of physiological processes and adaptation to the environment. However, little information was available about the WRKY genes in pineapple (Ananas comosus), an important tropical fruit. The recent release of the whole-genome sequence of pineapple allowed us to perform a genome-wide investigation into the organization and expression profiling of pineapple WRKY genes. In the present study, 54 pineapple WRKY (AcWRKY) genes were identified and renamed on the basis of their respective chromosome distribution. According to their structural and phylogenetic features, the 54 AcWRKYs were further classified into three main groups with several subgroups. The segmental duplication events played a major role in the expansion of pineapple WRKY gene family. Synteny analysis and phylogenetic comparison of group III WRKY genes provided deep insight into the evolutionary characteristics of pineapple WRKY genes. Expression profiles derived from transcriptome data and real-time quantitative PCR analysis exhibited distinct expression patterns of AcWRKY genes in various tissues and in response to different abiotic stress and hormonal treatments. Fifty-four WRKY genes were identified in pineapple and the structure of their encoded proteins, their evolutionary characteristics and expression patterns were examined in this study. This systematic analysis provided a foundation for further functional characterization of WRKY genes with an aim of pineapple crop improvement.

199 citations


Journal ArticleDOI
TL;DR: TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled.
Abstract: Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .

195 citations
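
The k-shrink problem described in the TreeShrink abstract above can be illustrated with a small brute-force sketch. This is not the authors' polynomial-time algorithm; it simply tries every set of k leaves on a toy tree and reports the set whose removal gives the smallest diameter. The function names and the edge-list input format are assumptions made for illustration, and the diameter is taken as the longest path between two remaining leaves of the original tree.

```python
from itertools import combinations

def leaf_distances(edges):
    """Return the leaves of a weighted tree and all leaf-to-leaf path lengths."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    leaf_set = set(leaves)
    dist = {}
    for src in leaves:
        # Depth-first walk from each leaf; a tree has exactly one path per pair.
        stack = [(src, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node in leaf_set:
                dist[(src, node)] = d
            for nxt, w in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, d + w))
    return leaves, dist

def k_shrink_bruteforce(edges, k):
    """Try every set of k leaves; return (smallest diameter, removed leaves).

    Exponential in k, for illustration only. Assumes at least two leaves remain.
    """
    leaves, dist = leaf_distances(edges)
    best = None
    for removed in combinations(leaves, k):
        kept = [x for x in leaves if x not in removed]
        diameter = max(dist[(a, b)] for a, b in combinations(kept, 2))
        if best is None or diameter < best[0]:
            best = (diameter, removed)
    return best

# Toy star tree: leaf D sits on a suspiciously long branch.
toy_edges = [("r", "A", 1), ("r", "B", 1), ("r", "C", 1), ("r", "D", 10)]
best_diameter, removed_leaves = k_shrink_bruteforce(toy_edges, k=1)
```

On this toy tree the long-branch leaf D is the one selected for removal. The statistical tests that decide how many leaves to remove are not part of this sketch.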


Journal ArticleDOI
TL;DR: This study shows that the iDeepS method identifies the sequence and structure motifs to accurately predict RBP binding sites, and outperforms the state-of-the-art methods.
Abstract: RNA regulation is significantly dependent on its binding protein partners, known as RNA-binding proteins (RBPs). Unfortunately, the binding preferences for most RBPs are still not well characterized. Interdependencies between sequence and secondary structure specificities make it challenging both to predict RBP binding sites and to detect accurate sequence and structure motifs. In this study, we propose a deep learning-based method, iDeepS, to simultaneously identify the binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short term memory network (BLSTM). We first perform one-hot encoding for both the sequence and predicted secondary structure, to enable subsequent convolution operations. To reveal the hidden binding knowledge from the observed sequences, the CNNs are applied to learn the abstract features. Considering the close relationship between sequence and predicted structures, we use the BLSTM to capture possible long range dependencies between binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict the RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets. The results demonstrate that iDeepS can reliably predict the RBP binding sites on RNAs, and outperforms the state-of-the-art methods. An important advantage compared to other methods is that iDeepS can automatically extract both binding sequence and structure motifs, which will improve our understanding of the mechanisms of binding specificities of RBPs. Our study shows that the iDeepS method identifies the sequence and structure motifs to accurately predict RBP binding sites. iDeepS is available at https://github.com/xypan1232/iDeepS .

190 citations
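
The one-hot encoding step mentioned in the iDeepS abstract above can be sketched in a few lines. This is a generic illustration, not code from the iDeepS repository: the function name, the ACGU alphabet, and the choice to encode unknown characters as all-zero rows are assumptions. The same function could be applied to a predicted-structure string by passing a different alphabet.

```python
def one_hot(sequence, alphabet="ACGU"):
    """Encode a string as a list of 0/1 rows, one row per character.

    Characters outside the alphabet (for example N) become all-zero rows;
    that convention is an assumption of this sketch.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in sequence.upper():
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows

encoded = one_hot("ACGUN")
```

Each row has as many columns as the alphabet has symbols, which is the matrix shape a 1D convolution layer typically expects as input.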


Journal ArticleDOI
TL;DR: An in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned flow cells with Exclusion Amplification chemistry is presented and methods for eliminating sample data cross contamination are provided.
Abstract: Here we present an in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned flow cells with Exclusion Amplification (ExAmp) chemistry (HiSeqX, HiSeq4000, and NovaSeq). We also present a remediation method that minimizes the impact of such swaps. Leveraging data collected over a two-year period, we demonstrate the widespread prevalence of index swapping in patterned flow cell data. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different library methods can have vastly different swapping rates and that even non-ExAmp chemistry instruments display trace levels of index swapping. We provide methods for eliminating sample data cross contamination by utilizing non-redundant dual indexing for complete filtering of index swapped reads, and share the sequences for 96 non-combinatorial dual indexes we have validated across various library preparation methods and sequencer models. Finally, using computational methods we provide a greater insight into the mechanism of index swapping. Index swapping in pooled libraries is a prevalent phenomenon that we observe at a rate of 0.2 to 6% in all sequencing runs on HiSeqX, HiSeq 4000/3000, and NovaSeq. Utilizing non-redundant dual indexing allows for the removal (flagging/filtering) of these swapped reads and eliminates swapping induced sample contamination, which is critical for sensitive applications such as RNA-seq, single cell, blood biopsy using circulating tumor DNA, or clinical sequencing.

187 citations
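
The filtering idea in the index-swapping abstract above can be sketched as follows, assuming every sample has its own i7 and i5 index and no index is shared between samples. A read whose two indexes are both known but belong to different samples is then flagged as swapped rather than assigned. The index sequences, sample names, and function name are hypothetical.

```python
from collections import Counter

def classify_reads(read_index_pairs, sample_sheet):
    """Count reads per sample under non-redundant dual indexing.

    sample_sheet maps sample name -> (i7, i5). Each i7 and each i5 is
    assumed to be used by exactly one sample.
    """
    pair_to_sample = {pair: name for name, pair in sample_sheet.items()}
    known_i7 = {i7 for i7, _ in sample_sheet.values()}
    known_i5 = {i5 for _, i5 in sample_sheet.values()}

    counts = Counter()
    for i7, i5 in read_index_pairs:
        if (i7, i5) in pair_to_sample:
            counts[pair_to_sample[(i7, i5)]] += 1   # expected pairing
        elif i7 in known_i7 and i5 in known_i5:
            counts["swapped"] += 1                  # valid indexes, wrong pairing
        else:
            counts["unknown"] += 1                  # at least one unrecognized index
    return counts

sample_sheet = {"A": ("ACGT", "TTAA"), "B": ("GGCC", "CATG")}
reads = [
    ("ACGT", "TTAA"),  # sample A
    ("GGCC", "CATG"),  # sample B
    ("ACGT", "CATG"),  # i7 of A with i5 of B: index swap
    ("NNNN", "TTAA"),  # unreadable i7
    ("ACGT", "TTAA"),  # sample A
]
result = classify_reads(reads, sample_sheet)
```

With combinatorial indexing, where indexes are reused across samples, the swapped pair in this example could coincide with another sample's expected pair and would be misassigned instead of filtered.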


Journal ArticleDOI
TL;DR: This study tests the capacity of single molecule, real-time (SMRT) sequencing implemented on the SEQUEL platform to overcome the limitations of high-throughput sequencers for amplicon sequencing, and finds that it generates high-fidelity sequences from amplicons with varying GC content and is resilient to homopolymer tracts.
Abstract: Although high-throughput sequencers (HTS) have largely displaced their Sanger counterparts, the short read lengths and high error rates of most platforms constrain their utility for amplicon sequencing. The present study tests the capacity of single molecule, real-time (SMRT) sequencing implemented on the SEQUEL platform to overcome these limitations, employing 658 bp amplicons of the mitochondrial cytochrome c oxidase I gene as a model system. By examining templates from more than 5000 species and 20,000 specimens, the performance of SMRT sequencing was tested with amplicons showing wide variation in GC composition and varied sequence attributes. SMRT and Sanger sequences were very similar, but SMRT sequencing provided more complete coverage, especially for amplicons with homopolymer tracts. Because it can characterize amplicon pools from 10,000 DNA extracts in a single run, the SEQUEL can greatly reduce sequencing costs in comparison to first (Sanger) and second generation platforms (Illumina, Ion). SMRT analysis generates high-fidelity sequences from amplicons with varying GC content and is resilient to homopolymer tracts. Analytical costs are low, substantially less than those for first or second generation sequencers. When implemented on the SEQUEL platform, SMRT analysis enables massive amplicon characterization because each instrument can recover sequences from more than 5 million DNA extracts a year.

182 citations


Journal ArticleDOI
TL;DR: The results suggest that use of such adapters is critical to reduce false positive rates in assays that aim to identify low allele frequency events, and strongly indicate that dual-matched adapters be implemented for all sensitive MPS applications.
Abstract: Sample index cross-talk can result in false positive calls when massively parallel sequencing (MPS) is used for sensitive applications such as low-frequency somatic variant discovery, ancient DNA investigations, microbial detection in human samples, or circulating cell-free tumor DNA (ctDNA) variant detection. Therefore, the limit-of-detection of an MPS assay is directly related to the degree of index cross-talk. Cross-talk rates up to 0.29% were observed when using standard, combinatorial adapters, resulting in 110,180 (0.1% cross-talk rate) or 1,121,074 (0.29% cross-talk rate) misassigned reads per lane in non-patterned and patterned Illumina flow cells, respectively. Here, we demonstrate that using unique, dual-matched indexed adapters dramatically reduces index cross-talk to ≤1 misassigned reads per flow cell lane. While the current study was performed using dual-matched indices, using unique, dual-unrelated indices would also be an effective alternative. For sensitive downstream analyses, the use of combinatorial indices for multiplexed hybrid capture and sequencing is inappropriate, as it results in an unacceptable number of misassigned reads. Cross-talk can be virtually eliminated using dual-matched indexed adapters. These results suggest that use of such adapters is critical to reduce false positive rates in assays that aim to identify low allele frequency events, and strongly indicate that dual-matched adapters be implemented for all sensitive MPS applications.

157 citations


Journal ArticleDOI
TL;DR: Findings suggest that transcriptome signatures may distinguish end-stage heart failure, shedding light on underlying biological differences between ICM and DCM, and demonstrate the commonality of mitochondrial dysfunction in end-stage HF.
Abstract: Current heart failure (HF) treatment is based on targeting symptoms and left ventricle dysfunction severity, relying on a common HF pathway paradigm to justify common treatments for HF patients. This common strategy may belie an incomplete understanding of heterogeneous underlying mechanisms and could be a barrier to more precise treatments. We hypothesized we could use RNA-sequencing (RNA-seq) in human heart tissue to delineate HF etiology-specific gene expression signatures. RNA-seq was performed on 64 human left ventricular samples: 37 dilated (DCM), 13 ischemic (ICM), and 14 non-failing (NF). Using a multi-analytic approach including covariate adjustment for age and sex, differentially expressed genes (DEGs) were identified characterizing HF and disease-specific expression. Pathway analysis investigated enrichment for biologically relevant pathways and functions. DCM vs NF and ICM vs NF had shared HF-DEGs that were enriched for the fetal gene program and mitochondrial dysfunction. DCM-specific DEGs were enriched for cell-cell and cell-matrix adhesion pathways. ICM-specific DEGs were enriched for cytoskeletal and immune pathway activation. Using the ICM and DCM DEG signatures from our data we were able to correctly classify the phenotypes of 24/31 ICM and 32/36 DCM samples from publicly available replication datasets. Our results demonstrate the commonality of mitochondrial dysfunction in end-stage HF but more importantly reveal key etiology-specific signatures. Dysfunctional cell-cell and cell-matrix adhesion signatures typified DCM whereas signals related to immune and fibrotic responses were seen in ICM. These findings suggest that transcriptome signatures may distinguish end-stage heart failure, shedding light on underlying biological differences between ICM and DCM.

143 citations


Journal ArticleDOI
TL;DR: Analysis of the potato Hsp20 gene family demonstrated that the genes responded to multiple abiotic stresses, such as heat, salt or drought stress, and provided valuable information for clarifying the evolutionary relationship of the StHsp20 family and in aiding functional characterization of StHSP20 genes in further research.
Abstract: Heat shock proteins (Hsps) are essential components in plant tolerance mechanisms under various abiotic stresses. Hsp20 is the major family of heat shock proteins, but little is known about the Hsp20 family in potato (Solanum tuberosum), which is an important vegetable crop that is thermosensitive. To reveal the mechanisms of potato Hsp20s coping with abiotic stresses, analyses of the potato Hsp20 gene family were conducted using bioinformatics-based methods. In total, 48 putative potato Hsp20 genes (StHsp20s) were identified and named according to their chromosomal locations. A sequence analysis revealed that most StHsp20 genes (89.6%) possessed no, or only one, intron. A phylogenetic analysis indicated that all of the StHsp20 genes, except 10, were grouped into 12 subfamilies. The 48 StHsp20 genes were randomly distributed on 12 chromosomes. Nineteen tandem duplicated StHsp20s and one pair of segmental duplicated genes (StHsp20-15 and StHsp20-48) were identified. A cis-element analysis inferred that StHsp20s, except for StHsp20-41, possessed at least one stress response cis-element. A heatmap of the StHsp20 gene family showed that the genes, except for StHsp20-2 and StHsp20-45, were expressed in various tissues and organs. Real-time quantitative PCR was used to detect the expression level of StHsp20 genes and demonstrated that the genes responded to multiple abiotic stresses, such as heat, salt or drought stress. The relative expression levels of 14 StHsp20 genes (StHsp20-4, 6, 7, 9, 20, 21, 33, 34, 35, 37, 41, 43, 44 and 46) were significantly up-regulated (more than 100-fold) under heat stress. These results provide valuable information for clarifying the evolutionary relationship of the StHsp20 family and in aiding functional characterization of StHsp20 genes in further research.

140 citations


Journal ArticleDOI
TL;DR: A Bioconductor package, ATACseqQC, is presented for easily generating various diagnostic plots to help researchers quickly assess the quality of their ATAC-seq data; it has been used successfully for preprocessing and assessing several in-house and public ATAC-seq datasets.
Abstract: ATAC-seq (Assays for Transposase-Accessible Chromatin using sequencing) is a recently developed technique for genome-wide analysis of chromatin accessibility. Compared to earlier methods for assaying chromatin accessibility, ATAC-seq is faster and easier to perform, does not require cross-linking, has higher signal to noise ratio, and can be performed on small cell numbers. However, to ensure a successful ATAC-seq experiment, step-by-step quality assurance processes, including both wet lab quality control and in silico quality assessment, are essential. While several tools have been developed or adopted for assessing read quality, identifying nucleosome occupancy and accessible regions from ATAC-seq data, none of the tools provide a comprehensive set of functionalities for preprocessing and quality assessment of aligned ATAC-seq datasets. We have developed a Bioconductor package, ATACseqQC, for easily generating various diagnostic plots to help researchers quickly assess the quality of their ATAC-seq data. In addition, this package contains functions to preprocess aligned ATAC-seq data for subsequent peak calling. Here we demonstrate the utilities of our package using 25 publicly available ATAC-seq datasets from four studies. We also provide guidelines on what the diagnostic plots should look like for an ideal ATAC-seq dataset. This software package has been used successfully for preprocessing and assessing several in-house and public ATAC-seq datasets. Diagnostic plots generated by this package will facilitate the quality assessment of ATAC-seq data, and help researchers to evaluate their own ATAC-seq experiments as well as select high-quality ATAC-seq datasets from public repositories such as GEO to avoid generating hypotheses or drawing conclusions from low-quality ATAC-seq experiments. The software, source code, and documentation are freely available as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/ATACseqQC.html .

Journal ArticleDOI
TL;DR: A new R package named EnrichedHeatmap is presented that efficiently visualizes genomic signal enrichment and provides advanced solutions for normalizing genomic signals within target regions as well as offering highly customizable visualizations.
Abstract: High-throughput sequencing data are dramatically increasing in volume. Thus, there is urgent need for efficient tools to perform fast and integrative analysis of multiple data types. Enriched heatmap is a specific form of heatmap that visualizes how genomic signals are enriched over specific target regions. It is commonly used and efficient at revealing enrichment patterns especially for high dimensional genomic and epigenomic datasets. We present a new R package named EnrichedHeatmap that efficiently visualizes genomic signal enrichment. It provides advanced solutions for normalizing genomic signals within target regions as well as offering highly customizable visualizations. The major advantage of EnrichedHeatmap is the ability to conveniently generate parallel heatmaps as well as complex annotations, which makes it easy to integrate and visualize comprehensive overviews of the patterns and associations within and between complex datasets. EnrichedHeatmap facilitates comprehensive understanding of high dimensional genomic and epigenomic data. The power of EnrichedHeatmap is demonstrated by visualization of the complex associations between DNA methylation, gene expression and various histone modifications.

Journal ArticleDOI
Sarah M. Pilkington1, Ross N. Crowhurst1, Elena Hilario1, Simona Nardozza1, Lena G. Fraser1, Yongyan Peng1, Yongyan Peng2, Kularajathevan Gunaseelan1, Robert M. Simpson, Jibran Tahir, Simon C. Deroles, Kerry Robert Templeton1, Zhiwei Luo1, Marcus Davy, Canhong Cheng1, Mark A McNeilage1, Davide Scaglione, Yifei Liu3, Qiong Zhang, P. M. Datson1, Nihal De Silva1, Susan E. Gardiner, H. Bassett, David Chagné, John McCallum, Helge Dzierzon, Cecilia H. Deng1, Yen-Yi Wang1, Lorna Barron1, Kelvina I. Manako1, Judith H. Bowen1, Toshi Foster, Zoe A. Erridge, Heather R. Tiffin, Chethi N. Waite, Kevin M. Davies, Ella R. P. Grierson, William A. Laing, Rebecca Kirk1, Xiuyin Chen1, Marion Wood1, Mirco Montefiori1, David A. Brummell, Kathy E. Schwinn, Andrew Catanach, Christina G. Fullerton1, Dawei Li, Sathiyamoorthy Meiyalaghan, Niels J. Nieuwenhuizen1, Nicola C. Read2, Roneel Prakash1, Donald A. Hunter, Huaibi Zhang, Marian J. McKenzie, Mareike Knäbel, Alastair Harris2, Andrew C. Allan2, Andrew C. Allan1, Andrew P. Gleave1, Angela Chen2, Bart J. Janssen1, Blue Plunkett1, Charles Ampomah-Dwamena1, Charlotte Voogd1, Davin Leif1, Davin Leif2, Declan J. Lafferty2, Edwige J. F. Souleyre1, Erika Varkonyi-Gasic1, Francesco Gambi1, Jenny Hanley2, Jia-Long Yao1, Joey Cheung2, Karine M. David2, Ben Warren1, K.B. Marsh1, Kimberley C. Snowden1, Kui Lin-Wang1, Lara Brian1, Marcela Martínez-Sánchez1, Mindy Y. Wang1, Nadeesha R. Ileperuma1, Nikolai Macnee1, Robert Campin1, Peter A. McAtee1, Revel S.M. Drummond1, Richard V. Espley1, Hilary S. Ireland1, Rongmei Wu1, Ross G. Atkinson1, Sakuntala Karunairetnam1, Sean Bulley, Shayhan Chunkath2, Zac Hanley1, Roy Storey, Amali H. Thrimawithana1, Susan Thomson, Charles David, Raffaele Testolin4, Hongwen Huang3, Roger P. Hellens5, Robert J. Schaffer1, Robert J. Schaffer2 
TL;DR: The use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models enabling improvement of computational prediction, especially relevant for certain types of gene families such as the EXPANSIN like genes.
Abstract: Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) ‘Hongyang’ draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed a considerable number of translocation events have occurred following a whole genome duplication (WGD) event some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these 3114 (9.4%) were identical to a protein within ‘Hongyang’ The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned ‘Hort16A’ cDNAs and comparing with the predicted protein models for Red5 and both the original ‘Hongyang’ assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised ‘Hongyang’ annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and correction of gene models enabling improvement of computational prediction. This utility was especially relevant for certain types of gene families such as the EXPANSIN like genes. Finally, this high quality gene set will supply the kiwifruit and general plant community with a new tool for genomics and other comparative analysis.

Journal ArticleDOI
TL;DR: The global scale of circRNA age-accumulation is attributed to the high composition of post-mitotic cells in adult C. elegans, coupled with the high resistance of circRNAs to decay, suggesting that the exceptional stability of circRNAs might explain age-accumulation trends observed from neural tissues of other organisms, which also have a high composition of post-mitotic cells.
Abstract: Circular RNAs (circRNAs) are a newly appreciated class of RNAs that lack free 5′ and 3′ ends, are expressed by the thousands in diverse forms of life, and are mostly of enigmatic function. Ostensibly due to their resistance to exonucleases, circRNAs are known to be exceptionally stable. Previous work in Drosophila and mice has shown that circRNAs increase during aging in neural tissues. Here, we examined the global profile of circRNAs in C. elegans during aging by performing ribo-depleted total RNA-seq from the fourth larval stage (L4) through 10-day old adults. Using stringent bioinformatic criteria and experimental validation, we annotated a high-confidence set of 1166 circRNAs, including 575 newly discovered circRNAs. These circRNAs were derived from 797 genes with diverse functions, including genes involved in the determination of lifespan. A massive accumulation of circRNAs during aging was uncovered. Many hundreds of circRNAs were significantly increased among the aging time-points and increases of select circRNAs by over 40-fold during aging were quantified by RT-qPCR. The expression of 459 circRNAs was determined to be distinct from the expression of linear RNAs from the same host genes, demonstrating host gene independence of circRNA age-accumulation. We attribute the global scale of circRNA age-accumulation to the high composition of post-mitotic cells in adult C. elegans, coupled with the high resistance of circRNAs to decay. These findings suggest that the exceptional stability of circRNAs might explain age-accumulation trends observed from neural tissues of other organisms, which also have a high composition of post-mitotic cells. Given the suitability of C. elegans for aging research, it is now poised as an excellent model system to determine whether there are functional consequences of circRNA accumulation during aging.

Journal ArticleDOI
TL;DR: It is found that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency, and this approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries.
Abstract: RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to expand the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates derived from a single molecule and identical but biologically meaningful reads from different molecules. We have incorporated UMIs into RNA-seq and small RNA-seq protocols and developed tools to analyze the resulting data. Our UMIs contain stretches of random nucleotides whose lengths sufficiently capture diverse molecule species in both RNA-seq and small RNA-seq libraries generated from mouse testis. Our approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries. Using simulated and real datasets, we demonstrate that our methods increase the reproducibility of RNA-seq and small RNA-seq data. Notably, we find that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency. Finally, we show that computational removal of PCR duplicates based only on their mapping coordinates introduces substantial bias into data analysis.
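
The contrast drawn in the abstract above, between removing duplicates by mapping coordinates alone and using UMIs, can be illustrated with a minimal sketch. It is not the authors' tool: the read tuple layout and function name are assumptions, and UMIs are compared by exact match, ignoring sequencing errors within the UMI.

```python
def count_molecules(reads, use_umi=True):
    """Count distinct molecules among reads given as (chrom, pos, strand, umi).

    With use_umi=False, reads sharing mapping coordinates are collapsed
    regardless of UMI, which is the coordinate-only approach.
    """
    seen = set()
    for chrom, pos, strand, umi in reads:
        key = (chrom, pos, strand, umi) if use_umi else (chrom, pos, strand)
        seen.add(key)
    return len(seen)

reads = [
    ("chr1", 100, "+", "AACG"),  # molecule 1
    ("chr1", 100, "+", "AACG"),  # PCR duplicate of molecule 1
    ("chr1", 100, "+", "TTGC"),  # a different molecule at the same coordinates
    ("chr2", 500, "-", "AACG"),  # same UMI, different locus
]
with_umi = count_molecules(reads, use_umi=True)
coordinate_only = count_molecules(reads, use_umi=False)
```

In this toy input the coordinate-only count is lower than the UMI-aware count because two distinct molecules at chr1:100 are collapsed into one, the kind of undercounting that biases abundant, short transcripts such as small RNAs.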

Journal ArticleDOI
TL;DR: The characteristics of the Bgh genome are consistent with a “one-speed” genome that differs in its architecture and (co-)evolutionary pattern from the “two-speed” genomes reported for several other filamentous phytopathogens.
Abstract: Powdery mildews are biotrophic pathogenic fungi infecting a number of economically important plants. The grass powdery mildew, Blumeria graminis, has become a model organism to study host specialization of obligate biotrophic fungal pathogens. We resolved the large-scale genomic architecture of B. graminis forma specialis hordei (Bgh) to explore the potential influence of its genome organization on the co-evolutionary process with its host plant, barley (Hordeum vulgare). The near-chromosome level assemblies of the Bgh reference isolate DH14 and one of the most diversified isolates, RACE1, enabled a comparative analysis of these haploid genomes, which are highly enriched with transposable elements (TEs). We found largely retained genome synteny and gene repertoires, yet detected copy number variation (CNV) of secretion signal peptide-containing protein-coding genes (SPs) and locally disrupted synteny blocks. Genes coding for sequence-related SPs are often locally clustered, but neither the SPs nor the TEs reside preferentially in genomic regions with unique features. Extended comparative analysis with different host-specific B. graminis formae speciales revealed the existence of a core suite of SPs, but also isolate-specific SP sets as well as congruence of SP CNV and phylogenetic relationship. We further detected evidence for a recent, lineage-specific expansion of TEs in the Bgh genome. The characteristics of the Bgh genome (largely retained synteny, CNV of SP genes, recently proliferated TEs and a lack of significant compartmentalization) are consistent with a “one-speed” genome that differs in its architecture and (co-)evolutionary pattern from the “two-speed” genomes reported for several other filamentous phytopathogens.

Journal ArticleDOI
TL;DR: This study shows that existing software based on the measurement of ROH can accurately identify autozygosity across the genome, provided appropriate threshold parameters are used.
Abstract: While autozygosity as a consequence of selection is well understood, there is limited information on the ability of different methods to measure true inbreeding. In the present study, a gene dropping simulation was performed and inbreeding estimates based on runs of homozygosity (ROH), pedigree, and the genomic relationship matrix were compared to true inbreeding. Inbreeding based on ROH was estimated using SNP1101, PLINK, and BCFtools software with different threshold parameters. The effects of different selection methods on ROH patterns were also compared. Furthermore, inbreeding coefficients were estimated in a sample of genotyped North American Holstein animals born from 1990 to 2016 using 50 k chip data and ROH patterns were assessed before and after genomic selection. Using ROH with a minimum window size of 20 to 50 using SNP1101 provided the closest estimates to true inbreeding in the simulation study. Pedigree inbreeding tended to underestimate true inbreeding, and results for genomic inbreeding varied depending on assumptions about base allele frequencies. Using an ROH approach also made it possible to assess the effect of population structure and selection on the distribution of runs of autozygosity across the genome. In the simulation, the longest individual ROH and the largest average length of ROH were observed when selection was based on best linear unbiased prediction (BLUP), whereas genomic selection showed the largest number of small ROH compared to BLUP estimated breeding values (BLUP-EBV). In North American Holsteins, the average number of ROH segments of 1 Mb or more per individual increased from 57 in 1990 to 82 in 2016. The rate of increase in the last 5 years was almost double that of previous 5-year periods. Genomic selection results in less autozygosity per generation, but more per year given the reduced generation interval.
This study shows that existing software based on the measurement of ROH can accurately identify autozygosity across the genome, provided appropriate threshold parameters are used. Our results show how different selection strategies affect the distribution of ROH, and how the distribution of ROH has changed in the North American dairy cattle population over the last 25 years.
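To make the idea of an ROH threshold concrete, the following is a minimal sketch of run detection on a single chromosome. It assumes genotypes coded as allele counts (0 or 2 for homozygous, 1 for heterozygous) and uses a deliberately simple rule: a run is any stretch of consecutive homozygous SNPs at least `min_snps` long. The function name and the coding scheme are illustrative assumptions; the programs named in the abstract apply their own, more elaborate criteria, which this sketch does not reproduce.

```python
def find_homozygous_runs(genotypes, min_snps=20):
    """Return inclusive (start, end) index pairs of homozygous runs.

    genotypes: per-SNP allele counts ordered along one chromosome
               (0 or 2 = homozygous, 1 = heterozygous). Hypothetical coding.
    min_snps:  minimum number of consecutive homozygous SNPs for a run
               to be reported (an illustrative threshold parameter).
    """
    runs = []
    run_start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            # Homozygous SNP: open a run if none is open.
            if run_start is None:
                run_start = i
        else:
            # Heterozygous SNP: close the current run, keep it if long enough.
            if run_start is not None and i - run_start >= min_snps:
                runs.append((run_start, i - 1))
            run_start = None
    # Close a run that reaches the end of the chromosome.
    if run_start is not None and len(genotypes) - run_start >= min_snps:
        runs.append((run_start, len(genotypes) - 1))
    return runs


if __name__ == "__main__":
    example = [0] * 25 + [1] + [2] * 10 + [1] + [0, 2] * 15
    print(find_homozygous_runs(example, min_snps=20))
```

Raising `min_snps` discards short runs, which is one way a threshold parameter can change the resulting inbreeding estimate.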

Journal ArticleDOI
TL;DR: Apart from the ST-driven spread, plasmid-mediated spread, especially via IncI1 and IncK plasmids, likely plays an important role for emergence and transmission of blaCMY-2 between animals and humans.
Abstract: Resistance to 3rd-generation cephalosporins in Escherichia coli is mostly mediated by extended-spectrum beta-lactamases (ESBLs) or AmpC beta-lactamases. Besides overexpression of the species-specific chromosomal ampC gene, acquisition of plasmid-encoded ampC genes, e.g. blaCMY-2, has been described worldwide in E. coli from humans and animals. To investigate a possible transmission of blaCMY-2 along the food production chain, we conducted a next-generation sequencing (NGS)-based analysis of 164 CMY-2-producing E. coli isolates from humans, livestock animals and foodstuff from Germany. The data of the 164 sequenced isolates revealed 59 different sequence types (STs); the most prevalent ones were ST38 (n = 19), ST131 (n = 16) and ST117 (n = 13). Two STs were present in all reservoirs: ST131 (human n = 8; food n = 2; animal n = 6) and ST38 (human n = 3; animal n = 9; food n = 7). All but one CMY-2-producing ST131 isolates belonged to the clade B (fimH22) that differed substantially from the worldwide dominant CTX-M-15-producing clonal lineage ST131-O25b clade C (fimH30). Plasmid replicon types IncI1 (n = 61) and IncK (n = 72) were identified for the majority of blaCMY-2-carrying plasmids. Plasmid sequence comparisons showed a remarkable sequence identity, especially for IncK plasmids. Associations of replicon types and distinct STs were shown for IncK and ST57, ST429 and ST38 as well as for IncI1 and ST58. Additional β-lactamase genes (blaTEM, blaCTX-M, blaOXA, blaSHV) were detected in 50% of the isolates, and twelve E. coli from chicken and retail chicken meat carried the colistin resistance gene mcr-1. We found isolates of distinct E. coli clonal lineages (ST131 and ST38) in all three reservoirs. However, a direct clonal relationship of isolates from food animals and humans was only noticeable for a few cases. The CMY-2-producing E. coli-ST131 represents a clonal lineage different from the CTX-M-15-producing ST131-O25b cluster. 
Apart from the ST-driven spread, plasmid-mediated spread, especially via IncI1 and IncK plasmids, likely plays an important role for emergence and transmission of blaCMY-2 between animals and humans.

Journal ArticleDOI
TL;DR: This study emphasizes the importance of selecting a suitable normalization method in the analysis of data from shotgun metagenomics and demonstrates that improper methods may result in unacceptably high levels of false positives, which in turn may lead to incorrect or obfuscated biological interpretation.
Abstract: In shotgun metagenomics, microbial communities are studied through direct sequencing of DNA without any prior cultivation. By comparing gene abundances estimated from the generated sequencing reads, functional differences between the communities can be identified. However, gene abundance data is affected by high levels of systematic variability, which can greatly reduce the statistical power and introduce false positives. Normalization, which is the process where systematic variability is identified and removed, is therefore a vital part of the data analysis. A wide range of normalization methods for high-dimensional count data has been proposed but their performance on the analysis of shotgun metagenomic data has not been evaluated. Here, we present a systematic evaluation of nine normalization methods for gene abundance data. The methods were evaluated through resampling of three comprehensive datasets, creating a realistic setting that preserved the unique characteristics of metagenomic data. Performance was measured in terms of the methods' ability to identify differentially abundant genes (DAGs), correctly calculate unbiased p-values and control the false discovery rate (FDR). Our results showed that the choice of normalization method has a large impact on the end results. When the DAGs were asymmetrically present between the experimental conditions, many normalization methods had a reduced true positive rate (TPR) and a high false positive rate (FPR). The methods trimmed mean of M-values (TMM) and relative log expression (RLE) had the overall highest performance and are therefore recommended for the analysis of gene abundance data. For larger sample sizes, CSS also showed satisfactory performance. This study emphasizes the importance of selecting a suitable normalization method in the analysis of data from shotgun metagenomics.
Our results also demonstrate that improper methods may result in unacceptably high levels of false positives, which in turn may lead to incorrect or obfuscated biological interpretation.
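As an illustration of what one of the recommended methods does, here is a minimal sketch of relative log expression (RLE) size factors in their usual median-of-ratios form. It is written in plain Python for readability and is not taken from the study; the function name and the input layout (one row per gene, one column per sample) are assumptions. Genes with a zero count in any sample are skipped because their geometric mean is undefined on the log scale.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios (RLE) size factors, one per sample.

    counts: list of rows, one row per gene, each row holding the counts
            for that gene in every sample (hypothetical layout).
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Skip genes with a zero count in any sample.
        if any(c <= 0 for c in row):
            continue
        # Log of the gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # The size factor is the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    # Second sample sequenced twice as deeply as the first.
    counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
    factors = rle_size_factors(counts)
    normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
    print(factors)
    print(normalized)
```

Dividing each sample's counts by its size factor puts the samples on a common scale. Because the factor is a median over genes, a minority of strongly differentially abundant genes has limited influence on it.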

Journal ArticleDOI
TL;DR: Genes inside ROH islands suggest a strong selection for dairy traits and enrichment for Gyr cattle environmental adaptation and the existence of a moderate correlation between larger ROH indicates that FROH can be used as an alternative to inbreeding estimates in the absence of pedigree records.
Abstract: Runs of homozygosity (ROH) are continuous homozygous segments of the DNA sequence. They have been applied to quantify individual autozygosity and used as a potential inbreeding measure in livestock species. The aim of the present study was (i) to investigate genome-wide autozygosity to identify and characterize ROH patterns in Gyr dairy cattle genome; (ii) identify ROH islands for gene content and enrichment in segments shared by more than 50% of the samples, and (iii) compare estimates of molecular inbreeding calculated from ROH (FROH), genomic relationship matrix approach (FGRM) and based on the observed versus expected number of homozygous genotypes (FHOM), and from pedigree-based coefficient (FPED). ROH were identified in all animals, with an average number of 55.12 ± 10.37 segments and a mean length of 3.17 Mb. Short segments (ROH1–2 Mb) were abundant through the genomes, which accounted for 60% of all segments identified, even though the proportion of the genome covered by them was relatively small. The findings obtained in this study suggest that on average 7.01% (175.28 Mb) of the genome of this population is autozygous. Overlapping ROH were evident across the genomes and 14 regions were identified with ROH frequencies exceeding 50% of the whole population. Genes associated with lactation (TRAPPC9), milk yield and composition (IRS2 and ANG), and heat adaptation (HSF1, HSPB1, and HSPE1), were identified. Inbreeding coefficients were estimated through the application of FROH, FGRM, FHOM, and FPED approaches. FPED estimates ranged from 0.00 to 0.327 and FROH from 0.001 to 0.201. Low to moderate correlations were observed between FPED-FROH and FGRM-FROH, with values ranging from −0.11 to 0.51. Low to high correlations were observed between FROH-FHOM and moderate between FPED-FHOM and FGRM-FHOM. Correlations between FROH from different lengths and FPED gradually increased with ROH length. 
Genes inside ROH islands suggest a strong selection for dairy traits and enrichment for Gyr cattle environmental adaptation. Furthermore, low FPED-FROH correlations for small segments indicate that FPED estimates are not the most suitable method to capture ancient inbreeding. The existence of a moderate correlation between larger ROH indicates that FROH can be used as an alternative to inbreeding estimates in the absence of pedigree records.
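The ROH-based inbreeding coefficient is conventionally the fraction of the genome covered by ROH. A minimal sketch follows; the function name and the optional length filter are illustrative assumptions. The genome length used in the example, about 2,500 Mb, is not stated in the abstract; it is only what the reported 175.28 Mb and 7.01% jointly imply.

```python
def froh(roh_lengths_bp, genome_length_bp, min_length_bp=0):
    """Fraction of the genome covered by ROH of at least min_length_bp.

    roh_lengths_bp:   lengths of one animal's ROH segments, in base pairs.
    genome_length_bp: length of the genome covered by markers, in base pairs.
    min_length_bp:    optional lower bound, to compute FROH per length class.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    covered = sum(length for length in roh_lengths_bp if length >= min_length_bp)
    return covered / genome_length_bp


if __name__ == "__main__":
    # 175.28 Mb of ROH over an assumed genome of about 2,500 Mb.
    print(round(froh([175.28e6], 2_500e6), 4))
```

Setting `min_length_bp` restricts the estimate to longer segments, which is how FROH for different ROH length classes can be compared against a pedigree-based coefficient.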

Journal ArticleDOI
TL;DR: This analysis demonstrates that SymC appears to have evolved by losing gene families, such as the MAA biosynthesis gene cluster, and implies that Symbiodinium ecology drives acquisition and loss of gene families.
Abstract: The marine dinoflagellate, Symbiodinium, is a well-known photosynthetic partner for coral and other diverse, non-photosynthetic hosts in subtropical and tropical shallows, where it comprises an essential component of marine ecosystems. Using molecular phylogenetics, the genus Symbiodinium has been classified into nine major clades, A-I, and one of the reported differences among phenotypes is their capacity to synthesize mycosporine-like amino acids (MAAs), which absorb UV radiation. However, the genetic basis for this difference in synthetic capacity is unknown. To understand genetics underlying Symbiodinium diversity, we report two draft genomes, one from clade A, presumed to have been the earliest branching clade, and the other from clade C, in the terminal branch. The nuclear genome of Symbiodinium clade A (SymA) has more gene families than that of clade C, with larger numbers of organelle-related genes, including mitochondrial transcription terminal factor (mTERF) and Rubisco. While clade C (SymC) has fewer gene families, it displays specific expansions of repeat domain-containing genes, such as leucine-rich repeats (LRRs) and retrovirus-related dUTPases. Interestingly, the SymA genome encodes a gene cluster for MAA biosynthesis, potentially transferred from an endosymbiotic red alga (probably of bacterial origin), while SymC has completely lost these genes. Our analysis demonstrates that SymC appears to have evolved by losing gene families, such as the MAA biosynthesis gene cluster. In contrast to the conservation of genes related to photosynthetic ability, the terminal clade has suffered more gene family losses than other clades, suggesting a possible adaptation to symbiosis. Overall, this study implies that Symbiodinium ecology drives acquisition and loss of gene families.

Journal ArticleDOI
TL;DR: Differences in expression profiles among the resistant genotypes indicate genotype-specific defense mechanisms, and the study shows a greater resemblance in transcriptomics of HC374 to Nyubai, consistent with their sharing of two FHB resistance QTLs, compared to Wuhan 1, which carries one QTL on 2DL in common with HC374.
Abstract: Fusarium head blight (FHB) of wheat in North America is caused mostly by the fungal pathogen Fusarium graminearum (Fg). Upon exposure to Fg, wheat initiates a series of cellular responses involving massive transcriptional reprogramming. In this study, we analyzed transcriptomics data of four wheat genotypes (Nyubai, Wuhan 1, HC374, and Shaw), at 2 and 4 days post inoculation (dpi) with Fg, using RNA-seq technology. A total of 37,772 differentially expressed genes (DEGs) were identified, 28,961 from wheat and 8811 from the pathogen. The susceptible genotype Shaw exhibited the highest number of host and pathogen DEGs, including 2270 DEGs associating with FHB susceptibility. Protein serine/threonine kinases and LRR-RK were associated with susceptibility at 2 dpi, while several ethylene-responsive, WRKY, Myb, bZIP and NAC-domain containing transcription factors were associated with susceptibility at 4 dpi. In the three resistant genotypes, 220 DEGs were associated with resistance. Glutathione S-transferase (GST), membrane proteins and distinct LRR-RKs were associated with FHB resistance across the three genotypes. Genes with unique, high up-regulation by Fg in Wuhan 1 were mostly transiently expressed at 2 dpi, while many defense-associated genes were up-regulated at both 2 and 4 dpi in Nyubai; the majority of unique genes up-regulated in HC374 were detected at 4 dpi only. In the pathogen, most genes showed increased expression between 2 and 4 dpi in all genotypes, with stronger levels in the susceptible host; however, two pectate lyases and a hydrolase were expressed higher at 2 dpi, and acetyltransferase activity was highly enriched at 4 dpi. There was an early up-regulation of LRR-RKs, different between susceptible and resistant genotypes; subsequently, distinct sets of genes associated with defense response were up-regulated. Differences in expression profiles among the resistant genotypes indicate genotype-specific defense mechanisms. This study also shows a greater resemblance in transcriptomics of HC374 to Nyubai, consistent with their sharing of two FHB resistance QTLs on 3BS and 5AS, compared to Wuhan 1, which carries one QTL on 2DL in common with HC374.

Journal ArticleDOI
TL;DR: A novel, scalable pipeline for real-time analysis of MinION sequence data is demonstrated and used to show initial proof of concept that metagenomic MinION sequencing can provide rapid, accurate diagnosis for prosthetic joint infections.
Abstract: Prosthetic joint infections are clinically difficult to diagnose and treat. Previously, we demonstrated metagenomic sequencing on an Illumina MiSeq replicates the findings of current gold standard microbiological diagnostic techniques. Nanopore sequencing offers advantages in speed of detection over MiSeq. Here, we report a real-time analytical pathway for Nanopore sequence data, designed for detecting bacterial composition of prosthetic joint infections but potentially useful for any microbial sequencing, and compare detection by direct-from-clinical-sample metagenomic nanopore sequencing with Illumina sequencing and standard microbiological diagnostic techniques. DNA was extracted from the sonication fluids of seven explanted orthopaedic devices, and additionally from two culture negative controls, and was sequenced on the Oxford Nanopore Technologies MinION platform. A specific analysis pipeline was assembled to overcome the challenges of identifying the true infecting pathogen, given high levels of host contamination and unavoidable background lab and kit contamination. The majority of DNA classified (> 90%) was host contamination and discarded. Using negative control filtering thresholds, the species identified corresponded with both routine microbiological diagnosis and MiSeq results. By analysing sequences in real time, causes of infection were robustly detected within minutes from initiation of sequencing. We demonstrate a novel, scalable pipeline for real-time analysis of MinION sequence data and use of this pipeline to show initial proof of concept that metagenomic MinION sequencing can provide rapid, accurate diagnosis for prosthetic joint infections. The high proportion of human DNA in prosthetic joint infection extracts prevents full genome analysis from complete coverage, and methods to reduce this could increase genome depth and allow antimicrobial resistance profiling. 
The nine samples sequenced in this pilot study have shown a proof of concept for sequencing and analysis that will enable us to investigate further sequencing to improve specificity and sensitivity.

Journal ArticleDOI
TL;DR: The first molecular signature of oEVs across the bovine estrous cycle is provided, revealing marked differences between post- and pre-ovulatory stages and contributing to a better understanding of the potential role of oEVs as modulators of gamete/embryo-maternal interactions.
Abstract: The success of early reproductive events depends on an appropriate communication between gametes/embryos and the oviduct. Extracellular vesicles (EVs) contained in oviductal secretions have been suggested as new players in mediating this crucial cross-talk by transferring their cargo (proteins, mRNA and small ncRNA) from cell to cell. However, little is known about the composition of oviductal EVs (oEVs) and their implications for reproductive success. The aim of the study was to determine the oEVs content at protein, mRNA and small RNA level and to examine whether the oEVs content is under the hormonal influence of the estrous cycle. We identified the presence of oEVs, exosomes and microvesicles, in the bovine oviductal fluid at different stages of the estrous cycle (postovulatory-stage, early luteal phase, late luteal phase and pre-ovulatory stage) and demonstrated that their composition is under hormonal regulation. RNA-sequencing identified 903 differentially expressed transcripts (FDR 2). Our data revealed proteins related to early embryo development and gamete-oviduct interactions as well as numerous ribosomal proteins. Our study provides the first molecular signature of oEVs across the bovine estrous cycle, revealing marked differences between post- and pre-ovulatory stages. Our findings contribute to a better understanding of the potential role of oEVs as modulators of gamete/embryo-maternal interactions and their implications for reproductive success.

Journal ArticleDOI
TL;DR: Using genome skimming, the complete chloroplast genomes of two Oresitrophe rupifraga and one Mukdenia rossii individuals were reconstructed and comparative analyses were conducted to examine the evolutionary pattern of chloroplast genomes in Saxifragaceae.
Abstract: The sister genera Oresitrophe and Mukdenia (Saxifragaceae) have an epilithic habitat (rocky slopes) and a parapatric distribution in East Asia, which makes them an ideal model for a more comprehensive understanding of the demographic and divergence history and the influence of climate changes in East Asia. However, the genetic background and resources for these two genera are scarce. The complete chloroplast (cp) genomes of two Oresitrophe rupifraga and one Mukdenia rossii individuals were reconstructed and comparative analyses were conducted to examine the evolutionary pattern of chloroplast genomes in Saxifragaceae. The cp genomes ranged from 156,738 bp to 156,960 bp in length and had a typical quadripartite structure with a conserved genome arrangement. Comparative analysis revealed that the intron of rpl2 has been lost in Heuchera parviflora, Tiarella polyphylla, M. rossii and O. rupifraga but is present in the reference genome of Penthorum chinense. Seven cp hotspot regions (trnH-psbA, trnR-atpA, atpI-rps2, rps2-rpoC2, petN-psbM, rps4-trnT and rpl33-rps18) were identified between Oresitrophe and Mukdenia, while four hotspots (trnQ-psbK, trnR-atpA, trnS-psbZ and rpl33-rps18) were identified within Oresitrophe. In addition, 24 polymorphic cpSSR loci were found between Oresitrophe and Mukdenia. Most importantly, we successfully developed 126 intergeneric polymorphic gSSR markers between Oresitrophe and Mukdenia, as well as 452 intrageneric ones within Oresitrophe. Twelve randomly selected intergeneric gSSRs have shown that these two genera exhibit a significant genetic structure. In this study, we conducted genome skimming for Oresitrophe rupifraga and Mukdenia rossii. Using these data, we were able to not only assemble their complete chloroplast genomes, but also develop abundant genetic resources (cp hotspots, cpSSRs, polymorphic gSSRs).
The genomic patterns and genetic resources presented here will contribute to further studies on population genetics, phylogeny and conservation biology in Saxifragaceae.

Journal ArticleDOI
TL;DR: The analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites and casts light on the interconnections between secondary metabolite gene clusters.
Abstract: Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of genomes sequenced, we are confronted with an overwhelming number of predicted clusters. In order to guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary. Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis showed that within this highly diverse genus, four major lineages could be distinguished which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study on the locations of biosynthetic gene clusters in the genomes of Amycolatopsis’ strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes. Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites. Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites. 
Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.

Journal ArticleDOI
TL;DR: The gene structure and motif distribution analysis of the GRAS members in G. hirsutum revealed that many genes of the SHR subfamily have more than one intron, and higher expression in root, stem, leaf and pistil indicated that these genes may affect the development and breeding of cotton.
Abstract: Cotton is a major fiber and oil crop worldwide. Cotton production, however, is often threatened by abiotic environmental stresses. GRAS family proteins are among the most abundant transcription factors in plants and play important roles in regulating root and shoot development, which can improve plant resistance to abiotic stresses. However, few studies on the GRAS family have been conducted in cotton. Recently, the G. hirsutum genome sequences have been released, which provides an opportunity to analyze the GRAS family in G. hirsutum. In total, 150 GRAS proteins from G. hirsutum were identified. Phylogenetic analysis showed that these GRAS proteins could be classified into 14 subfamilies including SCR, DLT, OS19, LAS, SCL4/7, OS4, OS43, DELLA, PAT1, SHR, HAM, SCL3, LISCL and G_GRAS. The gene structure and motif distribution analysis of the GRAS members in G. hirsutum revealed that many genes of the SHR subfamily have more than one intron, which may reflect the gain or loss of introns during plant evolution. Chromosomal location and duplication analysis revealed that segmental and tandem duplication may be the reasons for the expansion of the GRAS family in cotton. Gene expression analysis confirmed that the expression levels of GRAS members were up-regulated under different abiotic stresses, suggesting possible roles in the response to stress. Moreover, higher expression levels in root, stem, leaf and pistil indicated that these genes may affect the development and breeding of cotton. This study provides the first comprehensive analysis of GRAS members in G. hirsutum. Our results provide important information about the GRAS family and a framework for stress-resistant breeding in G. hirsutum.

Journal ArticleDOI
TL;DR: UDiTaS is a robust and streamlined sequencing method useful for measuring small indels as well as structural rearrangements, like translocations, in a single reaction, and is especially useful for pre-clinical and clinical application of gene editing to measure on- and off-target editing, large and small.
Abstract: Understanding the diversity of repair outcomes after introducing a genomic cut is essential for realizing the therapeutic potential of genomic editing technologies. Targeted PCR amplification combined with Next Generation Sequencing (NGS) or enzymatic digestion, while broadly used in the genome editing field, has critical limitations for detecting and quantifying structural variants such as large deletions (greater than approximately 100 base pairs), inversions, and translocations. To overcome these limitations, we have developed a Uni-Directional Targeted Sequencing methodology, UDiTaS, that is quantitative, removes biases associated with variable-length PCR amplification, and can measure structural changes in addition to small insertion and deletion events (indels), all in a single reaction. We have applied UDiTaS to a variety of samples, including those treated with a clinically relevant pair of S. aureus Cas9 single guide RNAs (sgRNAs) targeting CEP290, and a pair of S. pyogenes Cas9 sgRNAs at T-cell relevant loci. In both cases, we have simultaneously measured small and large edits, including inversions and translocations, exemplifying UDiTaS as a valuable tool for the analysis of genome editing outcomes. UDiTaS is a robust and streamlined sequencing method useful for measuring small indels as well as structural rearrangements, like translocations, in a single reaction. UDiTaS is especially useful for pre-clinical and clinical application of gene editing to measure on- and off-target editing, large and small.

Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of the TruSeq and NEXTflex protocols with the use of a novel type of randomised adapters called MidRand-Like (MRL) adapters and polyethylene glycol.
Abstract: Next-generation sequencing technologies have revolutionized the study of small RNAs (sRNAs) on a genome-wide scale. However, classical sRNA library preparation methods introduce serious bias, mainly during adapter ligation steps. Several types of sRNA including plant microRNAs (miRNA), piwi-interacting RNAs (piRNA) in insects, nematodes and mammals, and small interfering RNAs (siRNA) in insects and plants contain a 2’-O-methyl (2’-OMe) modification at their 3′ terminal nucleotide. This inhibits 3′ adapter ligation and makes library preparation particularly challenging. To reduce bias, the NEBNext kit (New England Biolabs) uses polyethylene glycol (PEG), the NEXTflex V2 kit (BIOO Scientific) uses both randomised adapters and PEG, and the novel SMARTer (Clontech) and CATS (Diagenode) kits avoid ligation altogether. Here we compared these methods with Illumina’s classical TruSeq protocol regarding the detection of normal and 2’ OMe RNAs. In addition, we modified the TruSeq and NEXTflex protocols to identify conditions that improve performance. Among the five kits tested with their respective standard protocols, the SMARTer and CATS kits had the lowest levels of bias but also had a strong formation of side products, and as a result performed relatively poorly with biological samples; NEXTflex detected the largest numbers of different miRNAs. The use of a novel type of randomised adapters called MidRand-Like (MRL) adapters and PEG improved the detection of 2’ OMe RNAs both in the TruSeq as well as in the NEXTflex protocol. While it is commonly accepted that biases in sRNA library preparation protocols are mainly due to adapter ligation steps, the ligation-free protocols were not the best performing methods. Our modified versions of the TruSeq and NEXTflex protocols provide an improved tool for the study of 2’ OMe RNAs.

Journal ArticleDOI
TL;DR: Major gut microbiota members utilise different strategies for gut colonisation and high oxygen sensitivity of Firmicutes may explain their commonly reported decrease after oxidative burst during gut inflammation.
Abstract: In order to start to understand the function of individual members of gut microbiota, we cultured, sequenced and analysed bacterial anaerobes from chicken caecum. Altogether 204 isolates from chicken caecum were obtained in pure cultures using Wilkins-Chalgren anaerobe agar and anaerobic growth conditions. Genomes of all the isolates were determined using the NextSeq platform and subjected to bioinformatic analysis. Among 204 sequenced isolates we identified 133 different strains belonging to seven different phyla - Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, Verrucomicrobia, Elusimicrobia and Synergistetes. Genome sizes ranged from 1.51 Mb in Elusimicrobium minutum to 6.70 Mb in Bacteroides ovatus. Clustering based on the presence of protein coding genes showed that isolates from phyla Proteobacteria, Verrucomicrobia, Elusimicrobia and Synergistetes did not cluster with the remaining isolates. Firmicutes split into families Lactobacillaceae, Enterococcaceae, Veillonellaceae and order Clostridiales from which the Clostridium perfringens isolates formed a distinct sub-cluster. All Bacteroidetes isolates formed a separate cluster showing similar genetic composition in all isolates but distinct from the rest of the gut anaerobes. The majority of Actinobacteria clustered closely together except for the representatives of genus Gordonibacter showing that the genome of this genus differs from the rest of Actinobacteria sequenced in this study. Representatives of Bacteroidetes commonly encoded proteins (collagenase, hemagglutinin, hemolysin, hyaluronidase, heparinases, chondroitinase, mucin-desulfating sulfatase or glutamate decarboxylase) that may enable them to interact with their host. Aerotolerance was recorded in Akkermansia and Cloacibacillus and was also common among representatives of Bacteroidetes. 
On the other hand, Elusimicrobium and the majority of Clostridiales were highly sensitive to air exposure despite their potential for spore formation. Major gut microbiota members utilise different strategies for gut colonisation. High oxygen sensitivity of Firmicutes may explain their commonly reported decrease after oxidative burst during gut inflammation.