Jaroslav Zendulka

Bio: Jaroslav Zendulka is an academic researcher at Brno University of Technology. The author has contributed to research on the topics of cluster analysis and objects (computer science). The author has an h-index of 9 and has co-authored 37 publications receiving 816 citations.

Papers

Journal Article•DOI•

PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations.

Jaroslav Bendl¹, Jan Stourac², Ondrej Salanda¹, Antonín Pavelka², Eric D. Wieben³, Jaroslav Zendulka¹, Jan Brezovsky², Jiri Damborsky²

Brno University of Technology¹, Masaryk University², Mayo Clinic³

16 Jan 2014-PLOS Computational Biology

TL;DR: This study constructed three independent datasets by removing all duplicates, inconsistencies and mutations previously used in the training of the evaluated prediction tools. The resulting consensus classifier, PredictSNP, returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools.

Abstract: Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for the prediction of the effects of mutations on protein function are very important for analysis of single nucleotide variants and their prioritization for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement is hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicates, inconsistencies and mutations previously used in the training of evaluated tools. The benchmark dataset containing over 43,000 mutations was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier PredictSNP, resulting in significantly improved prediction performance, and at the same time returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp.

571 citations

Journal Article•DOI•

PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

Jaroslav Bendl¹,², Milos Musil¹,², Jan Stourac¹, Jaroslav Zendulka², Jiří Damborský¹, Jan Brezovský¹

Masaryk University¹, Brno University of Technology²

25 May 2016-PLOS Computational Biology

TL;DR: A user-friendly web interface was developed that provides easy access to the five tools’ predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations.

Abstract: An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. 
A user-friendly web interface was developed that provides easy access to the five tools’ predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.

134 citations

Journal Article•DOI•

pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R.

Jiří Hon¹, Tomáš Martínek¹, Jaroslav Zendulka¹, Matej Lexa²

Brno University of Technology¹, Masaryk University²

01 Nov 2017-Bioinformatics

TL;DR: This paper describes a newly developed Bioconductor package for identifying potential quadruplex-forming sequences (PQS). The package allows sequence searches that accommodate possible divergences from the optimal G4 base composition, and the algorithm behind the searches is shown to have 96% accuracy on 392 known, experimentally observed G4 structures.

Abstract: Motivation: G-quadruplexes (G4s) are one of the non-B DNA structures easily observed in vitro and assumed to form in vivo. The latest experiments with G4-specific antibodies and G4-unwinding helicase mutants confirm this conjecture. These four-stranded structures have also been shown to influence a range of molecular processes in cells. As G4s are intensively studied, it is often desirable to screen DNA sequences and pinpoint the precise locations where they might form. Results: We describe and have tested a newly developed Bioconductor package for identifying potential quadruplex-forming sequences (PQS). The package is easy to use, flexible and customizable. It allows for sequence searches that accommodate possible divergences from the optimal G4 base composition. A novel aspect of our research was the creation and training (parametrization) of an advanced scoring model which resulted in increased precision compared to similar tools. We demonstrate that the algorithm behind the searches has a 96% accuracy on 392 currently known and experimentally observed G4 structures. We also carried out searches against the recent G4-seq data to verify how well we can identify the structures detected by that technology. The correlation with pqsfinder predictions was 0.622, higher than the correlation of 0.491 obtained with the second-best tool, G4Hunter. Availability: http://bioconductor.org/packages/pqsfinder/ This paper is based on pqsfinder-1.4.1.

97 citations

Journal Article•DOI•

FireProt: web server for automated design of thermostable proteins

Milos Musil¹,², Jan Stourac², Jaroslav Bendl¹,², Jan Brezovsky², Zbynek Prokop², Jaroslav Zendulka¹, Tomáš Martínek¹,², David Bednar², Jiri Damborsky²

Brno University of Technology¹, Masaryk University²

03 Jul 2017-Nucleic Acids Research

TL;DR: FireProt is a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calculation core. It is complemented with an interactive, easy-to-use interface that allows users to directly analyze and optionally modify the designed thermostable mutants.

Abstract: There is a continuous interest in increasing protein stability to enhance the usability of proteins in numerous biomedical and biotechnological applications. A number of in silico tools for the prediction of the effect of mutations on protein stability have been developed recently. However, only single-point mutations with a small effect on protein stability are typically predicted with the existing tools and have to be followed by laborious protein expression, purification, and characterization. Here, we present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calculation core. FireProt utilizes sixteen tools and three protein engineering strategies for making reliable protein designs. The server is complemented with an interactive, easy-to-use interface that allows users to directly analyze and optionally modify designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.

88 citations

Proceedings Article•DOI•

Ouroboros: early identification of at-risk students without models based on legacy data

Martin Hlosta¹, Zdenek Zdrahal², Jaroslav Zendulka¹

Brno University of Technology¹, Czech Technical University in Prague²

13 Mar 2017

TL;DR: This paper presents the concept of a "self-learner" that builds machine learning models from the data generated during the current course. The approach utilises information about already submitted assessments, which introduces the problem of imbalanced data for training and testing the classification models.

Abstract: This paper focuses on the problem of identifying students who are at risk of failing their course. The presented method proposes a solution in the absence of data from previous courses, which are usually used for training machine learning models. This situation typically occurs in new courses. We present the concept of a "self-learner" that builds the machine learning models from the data generated during the current course. The approach utilises information about already submitted assessments, which introduces the problem of imbalanced data for training and testing the classification models. There are three main contributions of this paper: (1) the concept of training the models for identifying at-risk students using data from the current course, (2) specifying the problem as a classification task, and (3) tackling the challenge of imbalanced data, which appears both in training and testing data. The results show the comparison with the traditional approach of learning the models from the legacy course data, validating the proposed concept.

85 citations

Cited by

Pattern Recognition and Machine Learning

Christopher M. Bishop¹

Microsoft¹

01 Jan 2006

TL;DR: This book covers probability distributions and linear models for regression and classification, along with a discussion of combining models in the context of machine learning.

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Journal Article•

Model Driven Engineering

Stuart Kent

01 Jan 2002-Lecture Notes in Computer Science

TL;DR: A framework for model driven engineering is set out, which proposes an organisation of the modelling 'space' and how to locate models in that space, and identifies the need for defining families of languages and transformations, and for developing techniques for generating/configuring tools from such definitions.

Abstract: The Object Management Group's (OMG) Model Driven Architecture (MDA) strategy envisages a world where models play a more direct role in software production, being amenable to manipulation and transformation by machine. Model Driven Engineering (MDE) is wider in scope than MDA. MDE combines process and analysis with architecture. This article sets out a framework for model driven engineering, which can be used as a point of reference for activity in this area. It proposes an organisation of the modelling 'space' and how to locate models in that space. It discusses different kinds of mappings between models. It explains why process and architecture are tightly connected. It discusses the importance and nature of tools. It identifies the need for defining families of languages and transformations, and for developing techniques for generating/configuring tools from such definitions. It concludes with a call to align metamodelling with formal language engineering techniques.

1,476 citations

Journal Article•DOI•

REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants

Nilah M. Ioannidis¹, Joseph H. Rothstein¹,², Vikas Pejaver³, Sumit Middha⁴, Shannon K. McDonnell⁵, Saurabh Baheti⁵, Anthony M. Musolf⁶, Qing Li⁶, Emily R. Holzinger⁶, Danielle M. Karyadi⁶, Lisa A. Cannon-Albright⁷, Craig C. Teerlink⁷, Janet L. Stanford⁸, William B. Isaacs⁹, Jianfeng Xu¹⁰, Kathleen A. Cooney⁷,¹¹, Ethan M. Lange¹², Johanna Schleutker¹³, John D. Carpten¹⁴, Isaac J. Powell¹⁵, Olivier Cussenot¹⁶, Geraldine Cancel-Tassin¹⁶, Graham G. Giles¹⁷,¹⁸, Robert J. MacInnis¹⁷,¹⁸, Christiane Maier¹⁹, Chih-Lin Hsieh²⁰, Fredrik Wiklund²¹, William J. Catalona²², William D. Foulkes²³, Diptasri Mandal²⁴, Rosalind A. Eeles, Zsofia Kote-Jarai, Carlos Bustamante¹, Daniel J. Schaid⁵, Trevor Hastie¹, Elaine A. Ostrander⁶, Joan E. Bailey-Wilson⁶, Predrag Radivojac³, Stephen N. Thibodeau⁵, Alice S. Whittemore¹, Weiva Sieh¹,²

Stanford University¹, Icahn School of Medicine at Mount Sinai², Indiana University³, Memorial Sloan Kettering Cancer Center⁴, Mayo Clinic⁵, National Institutes of Health⁶, University of Utah⁷, Fred Hutchinson Cancer Research Center⁸, Johns Hopkins University⁹, NorthShore University HealthSystem¹⁰, University of Michigan¹¹, University of North Carolina at Chapel Hill¹², University of Turku¹³, Translational Genomics Research Institute¹⁴, Wayne State University¹⁵, University of Paris¹⁶, University of Melbourne¹⁷, Cancer Council Victoria¹⁸, University of Ulm¹⁹, University of Southern California²⁰, Karolinska Institutet²¹, Northwestern University²², McGill University²³, LSU Health Sciences Center New Orleans²⁴

06 Oct 2016-American Journal of Human Genetics

TL;DR: This work developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons.

Abstract: The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies

1,295 citations

Journal Article•DOI•

Better prediction of functional effects for sequence variants

Maximilian Hecht¹, Yana Bromberg², Burkhard Rost¹

Technische Universität München¹, Rutgers University²

18 Jun 2015-BMC Genomics

TL;DR: This work introduces SNAP2, a novel neural network based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. SNAP2 significantly outperformed other methods and was optimized to perform surprisingly well even without alignments.

Abstract: Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural network based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and renders our new method as the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web

Abbreviations: Delta, input feature that results from computing the difference between feature scores for native amino acid and feature scores for variant amino acid; nsSNP, non-synonymous SNP; PMD, Protein Mutant Database; SNAP, Screening for non-acceptable polymorphisms; SNP, single nucleotide polymorphism; variant, any amino acid changing sequence variant.

461 citations
