Author

Jiří Hon

Bio: Jiří Hon is an academic researcher from Brno University of Technology. The author has contributed to research on the topics Bioconductor and protein engineering. The author has an h-index of 3 and has co-authored 3 publications receiving 81 citations.

Papers
Journal ArticleDOI
TL;DR: A newly developed Bioconductor package for identifying potential quadruplex‐forming sequences (PQS), which allows for sequence searches that accommodate possible divergences from the optimal G4 base composition and demonstrates that the algorithm behind the searches has a 96% accuracy.
Abstract: Motivation: G-quadruplexes (G4s) are one of the non-B DNA structures easily observed in vitro and assumed to form in vivo. The latest experiments with G4-specific antibodies and G4-unwinding helicase mutants confirm this conjecture. These four-stranded structures have also been shown to influence a range of molecular processes in cells. As G4s are intensively studied, it is often desirable to screen DNA sequences and pinpoint the precise locations where they might form. Results: We describe and have tested a newly developed Bioconductor package for identifying potential quadruplex-forming sequences (PQS). The package is easy to use, flexible and customizable. It allows for sequence searches that accommodate possible divergences from the optimal G4 base composition. A novel aspect of our research was the creation and training (parametrization) of an advanced scoring model which resulted in increased precision compared to similar tools. We demonstrate that the algorithm behind the searches has a 96% accuracy on 392 currently known and experimentally observed G4 structures. We also carried out searches against the recent G4-seq data to verify how well we can identify the structures detected by that technology. The correlation with pqsfinder predictions was 0.622, higher than the correlation of 0.491 obtained with the second-best tool, G4Hunter. Availability: http://bioconductor.org/packages/pqsfinder/ This paper is based on pqsfinder-1.4.1.

97 citations

Journal ArticleDOI
TL;DR: A previously published implementation of a triplex DNA search algorithm with visualization is combined to create a versatile R/Bioconductor package 'triplex', which provides functions that can be used to search Bioconductor genomes and other DNA sequence data for occurrence of nucleotide patterns capable of forming intramolecular triplexes (H-DNA).
Abstract: Upgrade and integration of triplex software into the R/Bioconductor framework. We combined a previously published implementation of a triplex DNA search algorithm with visualization to create a versatile R/Bioconductor package triplex. The new package provides functions that can be used to search Bioconductor genomes and other DNA sequence data for occurrence of nucleotide patterns capable of forming intramolecular triplexes (H-DNA). Functions producing 2-D and 3-D diagrams of the identified triplexes allow instant visualization of the search results. Leveraging the power of Biostrings and GRanges classes, the results get fully integrated into the existing Bioconductor framework, allowing their passage to other Genome visualization and annotation packages, such as GenomeGraphs, rtracklayer or Gviz. R package triplex is available from Bioconductor (bioconductor.org).

26 citations

Journal ArticleDOI
TL;DR: This work modified the internals of the recursive algorithm to avoid matching and scoring many sub-optimal PQS conformations that are later discarded, and created a website for submission of sequence analysis jobs that does not require knowledge of R to use pqsfinder.
Abstract: Motivation: G-quadruplex is a DNA or RNA form in which four guanine-rich regions are held together by base pairing between guanine nucleotides in coordination with potassium ions. G-quadruplexes are increasingly seen as a biologically important component of genomes. Their detection in vivo is problematic; however, sequencing and spectrometric techniques exist for their in vitro detection. We previously devised the pqsfinder algorithm for PQS identification, implemented it in C++ and published it as an R/Bioconductor package. We looked for ways to optimize pqsfinder for faster and more user-friendly sequence analysis. Results: We identified two weak points where pqsfinder could be optimized. We modified the internals of the recursive algorithm to avoid matching and scoring many sub-optimal PQS conformations that are later discarded. To accommodate the needs of a broader range of users, we created a website for submission of sequence analysis jobs that does not require knowledge of R to use pqsfinder. Availability and implementation: https://pqsfinder.fi.muni.cz, https://bioconductor.org/packages/pqsfinder. Supplementary information: Supplementary data are available at Bioinformatics online.

13 citations

Journal ArticleDOI
TL;DR: This paper presented a pipeline integrating sequence and structural bioinformatics with microfluidic enzymology for bioprospecting of efficient and robust haloalkane dehalogenases.
Abstract: Next-generation sequencing doubles genomic databases every 2.5 years. The accumulation of sequence data provides a unique opportunity to identify interesting biocatalysts directly in the databases without tedious and time-consuming engineering. Herein, we present a pipeline integrating sequence and structural bioinformatics with microfluidic enzymology for bioprospecting of efficient and robust haloalkane dehalogenases. The bioinformatic part identified 2,905 putative dehalogenases and prioritized a “small-but-smart” set of 45 genes, yielding 40 active enzymes, 24 of which were biochemically characterized by microfluidic enzymology techniques. Combining microfluidics with modern global data analysis provided precious mechanistic insights related to the high catalytic efficiency of selected enzymes. Overall, we have doubled the dehalogenation “toolbox” characterized over three decades, yielding biocatalysts that surpass the efficiency of currently available wild-type and engineered enzymes. This pipeline is generally applicable to other enzyme families and can accelerate the identification of efficient biocatalysts for industrial use.
• A pipeline integrating bioinformatics and microfluidic enzymology is introduced
• A small-but-smart set of selected genes yielded a 90% success rate of active enzymes
• Microfluidics and global data analysis provided mechanistic insight into biocatalysis
• The obtained dehalogenases outperform previously discovered or engineered variants
For decades, scientists have asked themselves how to obtain better enzymes: should they discover new enzymes from nature or improve known enzymes by protein engineering? The success of many protein engineering studies might lead to underestimating the potential of natural diversity represented by genomic databases. We present a pipeline integrating sequence and structural bioinformatics with microfluidic enzymology to discover efficient and robust biocatalysts. Bioinformatic analysis prioritizes promising candidates, while microfluidic enzymology facilitates efficient characterization of these enzymes, leading to mechanistic insights. The obtained enzymes catalytically outperformed previously known variants, independently of whether these had been newly discovered or engineered. This study represents an interesting conceptual view of current approaches used in biocatalyst development, which should explore the great potential of structural and functional diversity found in nature.
We present a pipeline integrating sequence and structural bioinformatics with microfluidic enzymology to discover efficient and robust haloalkane dehalogenases. Our smart bioinformatic identification of promising candidates in genomic databases is followed by efficient microfluidic characterization, in terms of activity, specificity, stability, and mechanistic insights. The obtained biocatalysts outperform the previously known wild-type and engineered dehalogenases. This strategy is applicable to other enzyme families, paving the way toward accelerating the identification of novel biocatalysts for industrial applications.

5 citations

DOI
TL;DR: A pipeline integrating sequence and structural bioinformatics with microfluidic enzymology for bioprospecting of efficient and robust haloalkane dehalogenases is presented, yielding biocatalysts that surpass the efficiency of currently available wild-type and engineered enzymes.
Abstract: Next-generation sequencing doubles genomic databases every 2.5 years. The accumulation of sequence data provides a unique opportunity to identify interesting biocatalysts directly in the databases without tedious and time-consuming engineering. Herein, we present a pipeline integrating sequence and structural bioinformatics with microfluidic enzymology for bioprospecting of efficient and robust haloalkane dehalogenases. The bioinformatic part identified 2,905 putative dehalogenases and prioritized a “small-but-smart” set of 45 genes, yielding 40 active enzymes, 24 of which were biochemically characterized by microfluidic enzymology techniques. Combining microfluidics with modern global data analysis provided precious mechanistic insights related to the high catalytic efficiency of selected enzymes. Overall, we have doubled the dehalogenation “toolbox” characterized over three decades, yielding biocatalysts that surpass the efficiency of currently available wild-type and engineered enzymes. This pipeline is generally applicable to other enzyme families and can accelerate the identification of efficient biocatalysts for industrial use.

Cited by
Journal ArticleDOI
TL;DR: An improved version of a G-quadruplex sequencing method is employed to generate whole genome G4 maps for 12 species that include widely studied model organisms and also pathogens of clinical relevance, and reveals that the enrichment of OQs in gene promoters is particular to mammals such as mouse and human, among the species studied.
Abstract: The S.B. research group is supported by programme grant funding from Cancer Research UK (C9681/A18618), European Research Council Advanced Grant No. 339778, a Wellcome Trust Senior Investigator Award (grant 209441/z/17/z) and by core funding from Cancer Research UK (C14303/A17197). We are grateful to the Biotechnology and Biological Sciences Research Council (BBSRC) and Illumina for the CASE studentship supporting V.S.C. (BB/I015477/1).

245 citations

Journal ArticleDOI
TL;DR: The present review aims at providing an updated overview of the current open-source G-quadruplex prediction algorithms and straightforward examples of their implementation, and proposing other estimates which consider non-canonical sequences and/or structure propensity and stability.
Abstract: Guanine-rich nucleic acids can fold into the non-B DNA or RNA structures called G-quadruplexes (G4). Recent methodological developments have allowed the characterization of specific G-quadruplex structures in vitro as well as in vivo, and at a much higher throughput, in silico, which has greatly expanded our understanding of G4-associated functions. Typically, the consensus motif $G_{3+}N_{1-7}G_{3+}N_{1-7}G_{3+}N_{1-7}G_{3+}$ has been used to identify potential G-quadruplexes from primary sequence. Since then, various algorithms have been developed to predict the potential formation of quadruplexes directly from DNA or RNA sequences, and the number of studies reporting genome-wide G4 exploration across species has rapidly increased. More recently, new methodologies have also appeared, proposing other estimates which consider non-canonical sequences and/or structure propensity and stability. The present review aims at providing an updated overview of the current open-source G-quadruplex prediction algorithms and straightforward examples of their implementation.
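
The consensus motif mentioned in the abstract above (four runs of three or more guanines separated by loops of one to seven nucleotides) can be written as a regular expression. The following is a minimal sketch, not the implementation used by any of the tools listed on this page: the names `G4_MOTIF` and `find_g4_motifs` are hypothetical, only the given strand is scanned, and matches are reported without overlap.

```python
import re

# Consensus motif: G-run (3+), loop (1-7 nt), repeated until four G-runs.
# Loops may contain any DNA/RNA base. Illustrative pattern only.
G4_MOTIF = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)

def find_g4_motifs(sequence):
    """Return (start, matched_text) for each non-overlapping motif match.

    Only the strand passed in is scanned; a C-rich match on the
    complementary strand would require scanning the reverse complement.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(0)) for m in G4_MOTIF.finditer(seq)]

if __name__ == "__main__":
    for start, hit in find_g4_motifs("TTGGGAGGGAGGGAGGGTT"):
        print(start, hit)
```

A plain pattern match like this only flags candidates with the canonical composition; tools such as pqsfinder, as described above, additionally tolerate divergences from it and assign scores.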

132 citations

Journal ArticleDOI
TL;DR: A web version of the G4Hunter application is developed that allows retrieval of gene/nucleotide sequence entries from NCBI databases and provides complete characterization of localization and quadruplex propensity of quadruplex-forming sequences.
Abstract: Motivation: Expanding research highlights the importance of guanine quadruplex structures. Therefore, easily accessible tools for quadruplex analyses in DNA and RNA molecules are important for the scientific community. Results: We developed a web version of the G4Hunter application. This new web-based server is a platform-independent and user-friendly application for quadruplex analyses. It allows retrieval of gene/nucleotide sequence entries from NCBI databases and provides complete characterization of localization and quadruplex propensity of quadruplex-forming sequences. The G4Hunter web application includes an interactive graphical data representation with many useful options including visualization, sorting, data storage and export. Availability and implementation: G4Hunter web application can be accessed at: http://bioinformatics.ibp.cz. Supplementary information: Supplementary data are available at Bioinformatics online.

118 citations

Journal ArticleDOI
TL;DR: A newly developed Bioconductor package for identifying potential quadruplex‐forming sequences (PQS), which allows for sequence searches that accommodate possible divergences from the optimal G4 base composition and demonstrates that the algorithm behind the searches has a 96% accuracy.
Abstract: Motivation: G-quadruplexes (G4s) are one of the non-B DNA structures easily observed in vitro and assumed to form in vivo. The latest experiments with G4-specific antibodies and G4-unwinding helicase mutants confirm this conjecture. These four-stranded structures have also been shown to influence a range of molecular processes in cells. As G4s are intensively studied, it is often desirable to screen DNA sequences and pinpoint the precise locations where they might form. Results: We describe and have tested a newly developed Bioconductor package for identifying potential quadruplex-forming sequences (PQS). The package is easy to use, flexible and customizable. It allows for sequence searches that accommodate possible divergences from the optimal G4 base composition. A novel aspect of our research was the creation and training (parametrization) of an advanced scoring model which resulted in increased precision compared to similar tools. We demonstrate that the algorithm behind the searches has a 96% accuracy on 392 currently known and experimentally observed G4 structures. We also carried out searches against the recent G4-seq data to verify how well we can identify the structures detected by that technology. The correlation with pqsfinder predictions was 0.622, higher than the correlation of 0.491 obtained with the second-best tool, G4Hunter. Availability: http://bioconductor.org/packages/pqsfinder/ This paper is based on pqsfinder-1.4.1.

97 citations

Journal ArticleDOI
TL;DR: Methodologies including predictive algorithms and structure-based sequencing have enabled the detection and mapping of rG4 structures on a transcriptome-wide scale at high sensitivity and resolution and the associated findings in relation to rG4-related biological mechanisms are discussed.
Abstract: RNA G-quadruplex (rG4) secondary structures are proposed to play key roles in fundamental biological processes that include the modulation of transcriptional, co-transcriptional, and posttranscriptional events. Recent methodological developments that include predictive algorithms and structure-based sequencing have enabled the detection and mapping of rG4 structures on a transcriptome-wide scale at high sensitivity and resolution. The data generated by these studies provide valuable insights into the potentially diverse roles of rG4s in biology and open up a number of mechanistic hypotheses. Herein we highlight these methodologies and discuss the associated findings in relation to rG4-related biological mechanisms.

86 citations