Home
/
Authors
/
Konrad Scheffler

Author

Konrad Scheffler

Other affiliations: University of Cape Town, University of California, San Diego, University of Cambridge ...read more

Bio: Konrad Scheffler is an academic researcher from Illumina. The author has contributed to research in topics: Selection (genetic algorithm) & Phylogenetic tree. The author has an hindex of 27, co-authored 53 publications receiving 5192 citations. Previous affiliations of Konrad Scheffler include University of Cape Town & University of California, San Diego.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2002
2000
1999
1998
1997

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Detecting individual sites subject to episodic diversifying selection.

[...]

Ben Murrell¹, Ben Murrell², Joel O. Wertheim³, Sasha Moola², Thomas Weighill², Konrad Scheffler², Konrad Scheffler³, Sergei L. Kosakovsky Pond³ - Show less +4 more•Institutions (3)

Medical Research Council¹, Stellenbosch University², University of California, San Diego³

12 Jul 2012-PLOS Genetics

TL;DR: It is found that episodic selection is widespread and it is concluded that the number of sites experiencing positive selection may have been vastly underestimated.

...read moreread less

Abstract: The imprint of natural selection on protein coding genes is often difficult to identify because selection is frequently transient or episodic, i.e. it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic: a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.

...read moreread less

1,327 citations

Journal Article•DOI•

FUBAR : A Fast, Unconstrained Bayesian AppRoximation for inferring selection

[...]

Ben Murrell¹, Sasha Moola¹, Sasha Moola², Amandla Mabona¹, Amandla Mabona², Thomas Weighill¹, Daniel J. Sheward², Sergei L. Kosakovsky Pond³, Konrad Scheffler³, Konrad Scheffler¹ - Show less +6 more•Institutions (3)

Stellenbosch University¹, University of Cape Town², University of California, San Diego³

01 May 2013-Molecular Biology and Evolution

TL;DR: This work presents an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes, and leaves the distribution of selection parameters essentially unconstrained.

...read moreread less

Abstract: Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection—an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: We illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).

...read moreread less

939 citations

Journal Article•DOI•

Strelka2: fast and accurate calling of germline and somatic variants.

[...]

Sangtae Kim¹, Konrad Scheffler¹, Aaron L. Halpern¹, Mitchell A. Bekritsky¹, Eunho Noh¹, Morten Källberg¹, Xiaoyu Chen¹, Yeonbin Kim¹, Doruk Beyter², Doruk Beyter³, Peter Krusche¹, Christopher T. Saunders¹ - Show less +8 more•Institutions (3)

Illumina¹, Amgen², University of California, San Diego³

16 Jul 2018-Nature Methods

TL;DR: Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis.

...read moreread less

Abstract: We describe Strelka2 ( https://github.com/Illumina/strelka ), an open-source small-variant-calling method for research and clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in terms of both variant-calling accuracy and computing cost.

...read moreread less

798 citations

Journal Article•DOI•

Less is More: an Adaptive Branch-Site Random Effects Model for Efficient Detection of Episodic Diversifying Selection

[...]

M. D. Smith¹, Joel O. Wertheim¹, Steven Weaver¹, Ben Murrell¹, Konrad Scheffler¹, Sergei L. Kosakovsky Pond¹ - Show less +2 more•Institutions (1)

University of California, San Diego¹

19 Feb 2015-Molecular Biology and Evolution

TL;DR: Adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion, delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster.

...read moreread less

Abstract: Over the past two decades, comparative sequence analysis using codon-substitution models has been honed into a powerful and popular approach for detecting signatures of natural selection from molecular data. A substantial body of work has focused on developing a class of “branch-site” models which permit selective pressures on sequences, quantified by the ω ratio, to vary among both codon sites and individual branches in the phylogeny. We develop and present a method in this class, adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion. By applying models of different complexity to different branches in the phylogeny, aBSREL delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster. Based on simulated data analysis, we offer guidelines for what extent and strength of diversifying positive selection can be detected reliably and suggest that there is a natural limit on the optimal parametric complexity for “branch-site” models. An aBSREL analysis of 8,893 Euteleostomes gene alignments demonstrates that over 80% of branches in typical gene phylogenies can be adequately modeled with a single ω ratio model, that is, current models are unnecessarily complicated. However, there are a relatively small number of key branches, whose identities are derived from the data using a model selection procedure, for which it is essential to accurately model evolutionary complexity.

...read moreread less

501 citations

Journal Article•DOI•

RELAX: Detecting Relaxed Selection in a Phylogenetic Framework

[...]

Joel O. Wertheim¹, Ben Murrell¹, M. D. Smith¹, Sergei L. Kosakovsky Pond¹, Konrad Scheffler², Konrad Scheffler¹ - Show less +2 more•Institutions (2)

University of California, San Diego¹, Stellenbosch University²

01 Mar 2015-Molecular Biology and Evolution

TL;DR: This work presents a general hypothesis testing framework (RELAX) for detecting relaxed selection in a codon-based phylogenetic framework and demonstrates the power of RELAX in a variety of biological scenarios where relaxation of selection has been hypothesized or demonstrated previously.

...read moreread less

Abstract: Relaxation of selective strength, manifested as a reduction in the efficiency or intensity of natural selection, can drive evolutionary innovation and presage lineage extinction or loss of function. Mechanisms through which selection can be relaxed range from the removal of an existing selective constraint to a reduction in effective population size. Standard methods for estimating the strength and extent of purifying or positive selection from molecular sequence data are not suitable for detecting relaxed selection, because they lack power and can mistake an increase in the intensity of positive selection for relaxation of both purifying and positive selection. Here, we present a general hypothesis testing framework (RELAX) for detecting relaxed selection in a codon-based phylogenetic framework. Given two subsets of branches in a phylogeny, RELAX can determine whether selective strength was relaxed or intensified in one of these subsets relative to the other. We establish the validity of our test via simulations and show that it can distinguish between increased positive selection and a relaxation of selective strength. We also demonstrate the power of RELAX in a variety of biological scenarios where relaxation of selection has been hypothesized or demonstrated previously. We find that obligate and facultative γ-proteobacteria endosymbionts of insects are under relaxed selection compared with their free-living relatives and obligate endosymbionts are under relaxed selection compared with facultative endosymbionts. Selective strength is also relaxed in asexual Daphnia pulex lineages, compared with sexual lineages. Endogenous, nonfunctional, bornavirus-like elements are found to be under relaxed selection compared with exogenous Borna viruses. Finally, selection on the short-wavelength sensitive, SWS1, opsin genes in echolocating and nonecholocating bats is relaxed only in lineages in which this gene underwent pseudogenization; however, selection on the functional medium/long-wavelength sensitive opsin, M/LWS1, is found to be relaxed in all echolocating bats compared with nonecholocating bats.

...read moreread less

444 citations

1
2
3
4
…
5
6
7
8
9
10
11

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Machine learning

[...]

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

...read moreread less

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

...read moreread less

13,246 citations

Journal Article•

Statistical method for testing the neutral mutation hypothesis by DNA polymorphism.

[...]

Fumio Tajima¹•Institutions (1)

Kyushu University¹

30 Oct 1989-Genomics

TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

...read moreread less

11,521 citations

Journal Article•DOI•

Evolution of Protein Molecules

[...]

S. Jeffery

01 Apr 1979-Biochemical Society Transactions

3,734 citations

Book Chapter•DOI•

I. the origin of species by means of natural selection

[...]

AsaHG Gray, A. Hunter Dupree

31 Jan 1963

2,885 citations

Journal Article•DOI•

RDP4: Detection and analysis of recombination patterns in virus genomes.

[...]

Darren P. Martin¹, Ben Murrell², Michael Golden³, Arjun Khoosal¹, Brejnev M. Muhire¹ - Show less +1 more•Institutions (3)

University of Cape Town¹, University of California, San Diego², University of Oxford³

01 Mar 2015-Virus Evolution

TL;DR: The key feature of RDP4 that differentiates it from other recombination detection tools is its flexibility, which can be run either in fully automated mode from the command line interface or with a graphically rich user interface that enables detailed exploration of both individual recombination events and overall recombination patterns.

...read moreread less

Abstract: RDP4 is the latest version of recombination detection program (RDP), a Windows computer program that implements an extensive array of methods for detecting and visualising recombination in, and stripping evidence of recombination from, virus genome sequence alignments. RDP4 is capable of analysing twice as many sequences (up to 2,500) that are up to three times longer (up to 10 Mb) than those that could be analysed by older versions of the program. RDP4 is therefore also applicable to the analysis of bacterial full-genome sequence datasets. Other novelties in RDP4 include (1) the capacity to differentiate between recombination and genome segment reassortment, (2) the estimation of recombination breakpoint confidence intervals, (3) a variety of ‘recombination aware’ phylogenetic tree construction and comparison tools, (4) new matrix-based visualisation tools for examining both individual recombination events and the overall phylogenetic impacts of multiple recombination events and (5) new tests to detect the influences of gene arrangements, encoded protein structure, nucleic acid secondary structure, nucleotide composition, and nucleotide diversity on recombination breakpoint patterns. The key feature of RDP4 that differentiates it from other recombination detection tools is its flexibility. It can be run either in fully automated mode from the command line interface or with a graphically rich user interface that enables detailed exploration of both individual recombination events and overall recombination patterns.

...read moreread less

2,386 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse