
Showing papers by Thomas Abeel published in 2023


Posted Content
05 Jun 2023, bioRxiv
TL;DR: In this paper, the authors conducted a global study of the genus Enterococcus, identifying 18 new species that expand the diversity of this genus by over 25%, along with genes associated with toxins, detoxification, and resource acquisition.
Abstract: Bacteria of the genus Enterococcus colonize the guts of diverse animals. Some species have acquired multiple antibiotic resistances on top of a high level of intrinsic resistance and have emerged as leading causes of hospital-associated infection. Although clinical isolates of the enterococcal species E. faecalis and E. faecium have been studied with respect to their antibiotic resistances and infection pathogenesis, comparatively little is known about the biology of enterococci in their natural context: the guts of humans and other land animals, including arthropods and other invertebrates. Importantly, little is also known about the global pool of genes already optimized for expression in an enterococcal background that could readily be acquired by hospital-adapted strains of E. faecalis and E. faecium, which are known to exchange mobile genetic elements easily. We therefore undertook a global study designed to reach into maximally diverse habitats and establish a first approximation of the genetic diversity of enterococci on Earth. Presumptive enterococci from over 900 diverse specimens were initially screened by PCR using a specific reporter gene that we found to accurately reflect genomic diversity. The genomes of isolates exceeding an operationally set diversity threshold were then sequenced in their entirety and analyzed. This provided data on the global occurrence of many known enterococcal species and their association with various hosts and ecologies, and identified 18 novel species, expanding the diversity of the genus Enterococcus by over 25%. The 18 novel enterococcal species harbor a diverse array of genes associated with toxins, detoxification, and resource acquisition, highlighting the capacity of enterococci to acquire and adapt novel functions from diverse gut environments.
In addition to the discovery and characterization of new species, this expanded diversity permitted a higher resolution analysis of the phylogenetic structure of the Enterococcus genus, including identification of distinguishing features of its 4 deeply rooted clades and genes associated with range expansion such as B-vitamin biosynthesis and flagellar motility. Collectively, this work provides an unprecedentedly broad and deep view of the genus Enterococcus, along with new insights into their potential threat to human health.

1 citation
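The Enterococcus screen above keeps only isolates whose reporter-gene sequence passes an operationally set diversity threshold. A minimal sketch of such a greedy selection follows, assuming pre-aligned reporter sequences and a simple identity cutoff; the identity measure, the threshold values and the isolate names are hypothetical illustrations, not the study's actual criteria.

```python
from __future__ import annotations

# Hypothetical sketch: greedily keep isolates whose reporter-gene sequence is
# sufficiently different from every isolate already kept. The identity measure
# and thresholds are illustrative assumptions, not the study's criteria.


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def select_diverse(isolates: dict[str, str], max_identity: float = 0.97) -> list[str]:
    """Return isolate IDs whose sequence is below max_identity to all earlier picks."""
    selected: list[str] = []
    for name, seq in isolates.items():  # dicts preserve insertion order
        if all(percent_identity(seq, isolates[s]) < max_identity for s in selected):
            selected.append(name)
    return selected


if __name__ == "__main__":
    reporter = {
        "iso1": "ACGTACGTACGTACGTACGT",
        "iso2": "ACGTACGTACGTACGTACGA",  # 1 mismatch: 95% identical to iso1
        "iso3": "TTGTACGAACGTTCGTACCT",  # 5 mismatches: 75% identical to iso1
    }
    print(select_diverse(reporter, max_identity=0.90))  # ['iso1', 'iso3']
```

Because the selection is greedy, the result depends on input order; a stricter cutoff (lower `max_identity`) sends fewer isolates on to whole-genome sequencing.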


Journal Article
TL;DR: In this article, the authors combined mixed-culture biotechnology and Hi-C sequencing to elucidate the transformation of wastewater microorganisms with a synthetic plasmid encoding GFP and kanamycin resistance genes.
Abstract: The transformation of environmental microorganisms by extracellular DNA is an overlooked mechanism of horizontal gene transfer and evolution. It initiates the acquisition of exogenous genes and propagates antimicrobial resistance alongside vertical and conjugative transfer. We combined mixed-culture biotechnology and Hi-C sequencing to elucidate the transformation of wastewater microorganisms with a synthetic plasmid encoding GFP and kanamycin resistance genes, in mixed-culture chemostats exposed to kanamycin at concentrations representing wastewater, gut and polluted environments (0.01, 2.5, 50 and 100 mg L⁻¹). We found that the phylogenetically distant Gram-negative Runella (102 Hi-C links), Bosea (35), Gemmobacter (33) and Zoogloea (24) spp., and the Gram-positive Microbacterium sp. (90) were transformed by the foreign plasmid under high antibiotic exposure (50 mg L⁻¹). In addition, antibiotic pressure shifted the origin of aminoglycoside resistance genes from genomic DNA to mobile genetic elements on plasmids accumulating in the microorganisms. These results reveal the power of Hi-C sequencing to capture and monitor the transfer of xenogenetic elements within microbiomes.

1 citation
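The per-genus numbers in the abstract above are Hi-C link counts between the plasmid and each host. A minimal sketch of how such links could be tallied follows, assuming read pairs have already been mapped to contigs and contigs already binned into genomes; the contig IDs (`pGFP`, `c1`, ...) and bin names are hypothetical, and real pipelines typically also filter by mapping quality and normalize counts, which is omitted here.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def count_plasmid_links(
    read_pairs: Iterable[Tuple[str, str]],
    contig_to_bin: Dict[str, str],
    plasmid_contigs: Set[str],
) -> Counter:
    """Count Hi-C read pairs that join a plasmid contig to a contig in each genome bin."""
    links: Counter = Counter()
    for contig_a, contig_b in read_pairs:
        # Keep only pairs with exactly one end on the plasmid.
        if (contig_a in plasmid_contigs) == (contig_b in plasmid_contigs):
            continue
        other = contig_b if contig_a in plasmid_contigs else contig_a
        genome_bin = contig_to_bin.get(other)
        if genome_bin is not None:  # skip contigs that were not binned
            links[genome_bin] += 1
    return links


if __name__ == "__main__":
    pairs = [("pGFP", "c1"), ("c2", "pGFP"), ("pGFP", "c3"), ("c1", "c2"), ("pGFP", "pGFP")]
    bins = {"c1": "bin_1", "c2": "bin_1", "c3": "bin_2"}
    print(count_plasmid_links(pairs, bins, {"pGFP"}))  # Counter({'bin_1': 2, 'bin_2': 1})
```

Pairs within the chromosome (`c1`–`c2`) or within the plasmid itself carry no evidence of transformation, so they are skipped.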


Journal Article
TL;DR: In this paper, the authors investigated the taxonomic classification, pathogenicity, and production of unique secondary metabolites of Streptomycetes inhabiting potato fields in Colombia.
Abstract: Genomes of four Streptomyces isolates, two putative new species (Streptomyces sp. JH14 and Streptomyces sp. JH34) and two non-thaxtomin-producing pathogens (Streptomyces sp. JH002 and Streptomyces sp. JH010), all isolated from potato fields in Colombia, were selected to investigate their taxonomic classification, their pathogenicity, and the production of unique secondary metabolites by Streptomycetes inhabiting potato crops in this region. The average nucleotide identity (ANI) value calculated between Streptomyces sp. JH34 and its closest relatives (92.23%) classified this isolate as a new species. However, Streptomyces sp. JH14 could not be classified as a new species due to the lack of genomic data for closely related strains. Phylogenetic analysis based on 231 single-copy core genes confirmed that the two pathogenic isolates (Streptomyces sp. JH010 and JH002) belong to Streptomyces pratensis and Streptomyces xiamenensis, respectively, are distant from the most well-known pathogenic species, and belong to two different lineages. We did not find orthogroups of protein-coding genes characteristic of scab-causing Streptomycetes shared by all known pathogenic species. Most genes involved in the biosynthesis of known virulence factors are absent from the scab-causing isolates (Streptomyces sp. JH002 and Streptomyces sp. JH010). However, Tat-system substrates likely involved in pathogenicity were identified in both isolates. Lastly, the presence of a putative mono-ADP-ribosyl transferase, homologous to the virulence factor scabin, was confirmed in Streptomyces sp. JH002. The described pathogenic isolates likely produce virulence factors uncommon in Streptomyces species, including a histidine phosphatase and a metalloprotease potentially produced by Streptomyces sp. JH002, and a pectinesterase potentially produced by Streptomyces sp. JH010.
Analysis of biosynthetic gene clusters (BGCs) revealed clusters associated with the synthesis of medicinal compounds, as well as BGCs potentially linked to pathogenicity in Streptomyces sp. JH010 and JH002. Interestingly, previously unreported BGCs were also found. Our findings suggest that the four isolates produce novel secondary metabolites and metabolites with medicinal properties.
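The Streptomyces abstract above classifies JH34 as a new species from its ANI to the closest relatives (92.23%), while JH14 stays unresolved for lack of close reference genomes. The decision rule can be sketched as below. The 95% cutoff is an assumption based on the commonly cited ~95-96% prokaryotic species boundary, not a value stated in the abstract, and modelling "no close reference genomes" as a missing value is a simplification.

```python
from typing import Optional

# Assumed cutoff: the commonly cited prokaryotic species boundary is ~95-96% ANI.
# The exact value is illustrative, not taken from the study.
SPECIES_ANI_CUTOFF = 95.0


def classify_by_ani(best_ani_percent: Optional[float], cutoff: float = SPECIES_ANI_CUTOFF) -> str:
    """Classify an isolate from its highest ANI (in %) to any reference genome."""
    if best_ani_percent is None:
        # No sufficiently close reference genome: species status cannot be decided.
        return "undetermined (no close reference genomes)"
    if not 0.0 <= best_ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if best_ani_percent >= cutoff:
        return "same species as closest relative"
    return "putative novel species"


if __name__ == "__main__":
    print(classify_by_ani(92.23))  # putative novel species
    print(classify_by_ani(None))   # undetermined (no close reference genomes)
```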

Posted Content
02 Feb 2023, bioRxiv
TL;DR: In this article, state-of-the-art long-read de novo assemblers are evaluated to help readers make a balanced choice for the assembly of eukaryotic organisms.
Abstract: Background: Assembly algorithm choice should be a deliberate, well-justified decision when researchers create genome assemblies for eukaryotic organisms from third-generation sequencing technologies. While third-generation sequencing by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) has overcome the short read lengths characteristic of next-generation sequencing (NGS), third-generation sequencers are known to produce more error-prone reads, creating a new set of challenges for assembly algorithms and pipelines. Since the introduction of third-generation sequencing technologies, many tools have been developed to take advantage of the longer reads, and researchers need to choose the right assembler for their projects. Results: We benchmarked state-of-the-art long-read de novo assemblers to help readers make a balanced choice for the assembly of eukaryotes. To this end, we evaluated the assemblers on 13 real and 72 simulated datasets from different eukaryotic genomes, with different read length distributions, imitating PacBio CLR, PacBio HiFi, and ONT sequencing. We included five commonly used long-read assemblers in our benchmark: Canu, Flye, Miniasm, Raven and Redbean. The evaluation covers the following metrics: reference-based metrics, assembly statistics, misassembly count, BUSCO completeness, runtime, and RAM usage. Additionally, we investigated the effect of increased read length on assembly quality and report that read length can, but does not always, positively impact assembly quality. Conclusions: Our benchmark shows that no assembler performs best in all evaluation categories. However, our results show that overall Flye is the best-performing assembler, on both real and simulated data.
Furthermore, benchmarking with longer reads shows that increased read length improves assembly quality, but the extent of the improvement depends on the size and complexity of the reference genome.
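Among the "assembly statistics" used in benchmarks like the one above, N50 is a standard example: the contig length at which contigs of that length or longer cover at least half of the total assembly size. Whether N50 is among the statistics reported in this particular study is an assumption here; the sketch simply shows how the value is computed from a list of contig lengths.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Smallest length L such that contigs of length >= L cover >= 50% of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # integer comparison avoids rounding half the total
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    print(n50([100, 80, 50, 30, 20]))  # 80
```

In the example, the total is 280 bp; the two longest contigs (100 + 80 = 180 bp) are the first to reach half of it, so the N50 is 80.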

Posted Content
02 May 2023
TL;DR: The authors developed SAP, a novel synteny-aware gene function prediction tool based on protein embeddings, to annotate bacterial species; it incorporates conserved synteny across the entire bacterial kingdom using a novel operon-based approach.
Abstract: Today, we know the function of only a small fraction of all identified protein sequences. This problem is even more salient in bacteria, as human-centric studies are prioritized in the field and much of the bacterial genetic repertoire remains to be uncovered. Conventional approaches to bacterial gene annotation are especially inadequate for annotating previously unseen proteins in novel species, since existing databases contain no proteins with similar sequences. Thus, we need alternative representations of proteins. Recently, there has been growing interest in adopting natural language processing methods to solve challenging bioinformatics tasks; in particular, using transformer-based language models to represent proteins has proven successful in tackling various challenges. However, applications of such representations in bacteria are still limited. We developed SAP, a novel synteny-aware gene function prediction tool based on protein embeddings, to annotate bacterial species. SAP distinguishes itself from existing methods for annotating bacteria in two ways: (i) it uses embedding vectors extracted from state-of-the-art protein language models, and (ii) it incorporates conserved synteny across the entire bacterial kingdom using a novel operon-based approach proposed in our work. SAP outperformed conventional annotation methods on a range of representative bacteria for various gene prediction tasks, including distant homolog detection, where the sequence similarity between training and test proteins was as low as 40%. SAP also achieved annotation coverage on par with conventional structure-based predictors in a real-life application to Enterococcus genes of unknown function. Availability: https://github.com/AbeelLab/sap. Contact: t.abeel@tudelft.nl. Supplementary data are available at Bioinformatics online.
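The core idea behind embedding-based annotation in the SAP abstract above is that proteins with similar language-model embeddings tend to share function, even when their sequences are only distantly related. The sketch below shows plain nearest-neighbour label transfer by cosine similarity. It is not SAP's classifier and ignores its operon-based synteny features; the 3-dimensional vectors and function labels are hypothetical stand-ins for real protein-language-model embeddings, which have far more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def transfer_label(
    query: Sequence[float], reference: List[Tuple[Sequence[float], str]]
) -> Tuple[str, float]:
    """Return (label, similarity) of the annotated reference embedding closest to the query."""
    if not reference:
        raise ValueError("reference set is empty")
    best_vec, best_label = max(reference, key=lambda item: cosine_similarity(query, item[0]))
    return best_label, cosine_similarity(query, best_vec)


if __name__ == "__main__":
    # Hypothetical annotated embeddings (toy 3-D vectors, illustrative labels).
    reference = [
        ([1.0, 0.0, 0.0], "ABC transporter"),
        ([0.0, 1.0, 0.0], "transcriptional regulator"),
    ]
    print(transfer_label([0.9, 0.1, 0.0], reference)[0])  # ABC transporter
```

Returning the similarity alongside the label lets a caller abstain when even the nearest reference is too dissimilar, which matters for proteins from novel species.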

Journal Article
TL;DR: In this article, the authors benchmarked strawberry Brix prediction using convolutional neural networks (CNN), variational autoencoders (VAE), principal component analysis (PCA), kernelized ridge regression (KRR), support vector regression (SVR), and multilayer perceptrons (MLP), based on fusions of image data, environmental records, and plant load information.
Abstract: Global soft fruit supply chains rely on trustworthy descriptions of product quality. However, crucial criteria such as sweetness and firmness cannot be accurately established without destroying the fruit. Since the traditional alternative is subjective assessment by human experts, it is desirable to obtain quality estimates in a consistent and non-destructive manner. Most research on fruit quality measurement has analyzed fruits in the lab with uniform data collection; however, this is laborious and expensive to scale up to the whole yield. The “harvest-first, analysis-second” approach also comes too late to inform adjustments to harvesting schedules. In this research, we validated our hypothesis that in-field data acquirable via commodity hardware can yield acceptable accuracies. The primary case the research addresses is the sugariness of strawberries, described by the total soluble solids (TSS) content of the juice (unit: °Brix, or Brix). We benchmarked the accuracy of strawberry Brix prediction using convolutional neural networks (CNN), variational autoencoders (VAE), principal component analysis (PCA), kernelized ridge regression (KRR), support vector regression (SVR), and multilayer perceptrons (MLP), based on fusions of image data, environmental records, plant load information, and other data. Our results suggest that: (i) models trained on environment and plant load data can reliably predict aggregated Brix values, with the lowest RMSE at 0.59; (ii) image data can further supplement the Brix predictions of individual fruits from (i), reducing the error from 1.27 to as low as 1.10, but images by themselves are not sufficiently reliable.
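The strawberry results above are reported as root-mean-square error (RMSE) in °Brix, i.e. the square root of the mean squared difference between measured and predicted values. A minimal sketch of the metric follows; the measured and predicted values are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values (e.g. °Brix)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    measured = [8.0, 9.0, 10.0]   # hypothetical refractometer readings, °Brix
    predicted = [8.5, 9.0, 9.0]   # hypothetical model outputs, °Brix
    print(round(rmse(measured, predicted), 3))  # 0.645
```

Because errors are squared before averaging, a single badly predicted fruit (here the 1 °Brix miss) dominates the score, which is one reason per-fruit RMSE tends to be higher than RMSE on aggregated values.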

Posted Content
18 Apr 2023, bioRxiv
TL;DR: In this paper, the effect of repeat-induced overlaps on de novo eukaryote genome assembly was analyzed, and several methods for detecting and removing these overlaps were proposed.
Abstract: Determining accurate genotypes is important for associating phenotypes with genotypes. De novo genome assembly is a critical step in determining the complete genotype of species for which no reference exists yet. The main challenge of de novo eukaryote genome assembly, particularly of plant genomes, is the repetitive DNA sequences within these genomes. The introduction of third-generation sequencing and the corresponding long reads promised to resolve repeat-related problems. While there have been notable improvements, reads originating from these repeats still cause errors because they introduce false overlaps in the assembly graph. This study analyzes the effect of repeats on de novo assembly and improves the performance of existing de novo assembly algorithms by removing repeat-induced overlaps. First, we show the possible improvements in de novo assembly from removing repeat-induced overlaps. Then, we propose several methods for detecting and removing repeat-induced overlaps and evaluate their performance on several simulated datasets.
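To make the idea of removing repeat-induced overlaps above concrete, the sketch below uses a simple k-mer frequency heuristic: an overlap whose sequence consists mostly of k-mers that are unusually frequent across all reads is treated as repeat-induced and dropped before graph construction. This is a swapped-in illustration, not one of the methods proposed in the study; the reads, read IDs, `k`, `max_count` and `max_fraction` values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Overlap = Tuple[str, str, str]  # (read_a, read_b, overlapping sequence)


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def is_repeat_induced(
    seq: str, counts: Counter, k: int, max_count: int, max_fraction: float = 0.5
) -> bool:
    """Flag an overlap whose sequence is dominated by high-frequency (repetitive) k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return False
    repetitive = sum(1 for kmer in kmers if counts[kmer] > max_count)
    return repetitive / len(kmers) > max_fraction


def filter_overlaps(overlaps: List[Overlap], counts: Counter, k: int, max_count: int) -> List[Overlap]:
    """Drop overlaps that look repeat-induced; keep the rest for the assembly graph."""
    return [ov for ov in overlaps if not is_repeat_induced(ov[2], counts, k, max_count)]


if __name__ == "__main__":
    reads = ["ATATATAT", "ATATATAT", "ATATATAT", "GCCGTTAC"]
    counts = kmer_counts(reads, k=3)  # ATA and TAT occur 9 times each
    overlaps = [("r1", "r2", "ATATAT"), ("r3", "r4", "CGTTAC")]
    print(filter_overlaps(overlaps, counts, k=3, max_count=4))  # [('r3', 'r4', 'CGTTAC')]
```

A frequency cutoff like this trades sensitivity for specificity: set too low, it also discards genuine overlaps in moderately repetitive but unique regions.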