Fitness landscape analysis reveals that the wild type allele is sub-optimal and mutationally robust

doi:10.1101/2021.09.27.461914

Tzahi Gabzi (1), Yitzhak Pilpel (1) and Tamar Friedlander (2)

(1) Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel

(2) The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture

Faculty of Agriculture, Hebrew University of Jerusalem,

P.O. Box 12 Rehovot 7610001, Israel

Correspondence: tamar.friedlander@mail.huji.ac.il, pilpel@weizmann.ac.il.

September 27, 2021

Abstract

Fitness landscape mapping and the prediction of evolutionary trajectories on these landscapes are major tasks in evolutionary biology research. Evolutionary dynamics is tightly linked to the landscape topography, but this relation is not straightforward. Models predict different evolutionary outcomes depending on mutation rates: high-fitness genotypes should dominate the population under low mutation rates, whereas lower-fitness, mutationally robust (also called 'flat') genotypes should dominate at higher mutation rates. Yet, so far, flat genotypes have been demonstrated in very few cases, particularly in viruses. The quantitative conditions for their emergence were studied only in simplified single-locus, two-peak landscapes. In particular, it is unclear whether within the same genome some genes can be flat while the remaining ones are fit. Here, we analyze a previously measured fitness landscape of a yeast tRNA gene. We found that the wild type allele is sub-optimal, but is mutationally robust ('flat'). Using computer simulations, we estimated the critical mutation rate at which the transition from a fit to a flat allele should occur for a gene with such characteristics. We then used a scaling argument to extrapolate this critical mutation rate to a full genome, assuming the same mutation rate for all genes. Finally, we propose that while the majority of genes are still selected to be fittest, there are a few mutation hot-spots like the tRNA, for which the mutationally robust flat allele is favored by selection.

Introduction

Fitness landscape mapping and prediction of evolutionary trajectories on these landscapes are major tasks in evolutionary biology [1]. While evolutionary theory predicts that population mean fitness should increase over time, it offers only a few quantitative predictions for the dynamics of evolution and the possible evolutionary trajectories. The main hurdle for generally computing evolutionary trajectories is their dependence on the underlying fitness landscape. Currently available fitness landscapes include between 16 and 100,000 different genotypes (for review see [2, 3]). Yet, even the largest datasets [4, 5, 6, 7] encompass only small fractions of the entire fitness landscape of even a single gene. As detailed fitness measurements have been unavailable until recently, most of the associated theory was developed in isolation from data [8, 9, 10, 11, 12, 13, 14, 15, 16]. Additionally, the development of a general theory is difficult, because fitness landscapes are diverse and differ in details.

bioRxiv preprint doi: https://doi.org/10.1101/2021.09.27.461914; this version posted September 27, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Evolutionary dynamics on empirical fitness landscapes was studied in cases in which genotype-phenotype mapping was available, such as folded RNA molecules [17, 18, 19] and transcription-factor binding sites [20, 21, 22] or in computational fitness landscape models closely inspired by particular experimental systems, such as maturation of the immune response [13, 23] and molecular interactions [24, 25]. Evolutionary dynamics on phenotypic fitness landscapes was studied for bacterial metabolic networks [26] and antibiotic resistance [27]. Exploration of empirical fitness landscapes and extraction of their statistical features such as local correlation, epistasis, ruggedness and density of local maxima [8, 28, 2, 6, 29, 30, 31], were pursued in the belief that these statistical hallmarks will aid in translating evolutionary trajectories to more general landscapes [32, 33].

The focus of the studies surveyed above was genotype fitness. Genes, however, are thought to evolve not only to maximize fitness, but also to reduce crosstalk [34, 35], increase network modularity [36] and allow for desired signaling properties [37, 24]. Mutational robustness, the extent to which fitness changes due to mutations, has been demonstrated to be an additional driver of evolution [38, 39, 19, 40, 41, 42, 43, 44]. The quasi-species framework developed by Manfred Eigen and Peter Schuster [45, 46, 47] is a theoretical framework that describes mutation-selection evolutionary dynamics of a large number of distinct genotypes. This framework is suitable for studying evolution of genetic sequences with a large variety of alleles, such as those captured by fitness landscapes. Quasi-species theory is an extension of the simple single-locus systems studied in population genetics [48]. While the above-mentioned models mostly assumed the strong-selection-weak-mutation (SSWM) regime, in which the population is nearly monomorphic, the quasi-species framework allows for high mutation rates such that the population is polymorphic. This theory predicts a failure to adapt (so-called "error catastrophe") if the mutation rate exceeds a threshold value. At intermediate mutation rates, it predicts that populations could (depending on the landscape) favor sub-optimal but mutationally robust genotypes over the fittest ones. This "survival of the flattest" result has been shown theoretically for the simple two-peak landscape case [49, 50]. It was demonstrated in simulations of digital organisms [51] and experimentally in plant viral pathogens [52] and RNA viruses [53].
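The two-peak argument can be illustrated with a minimal sketch. It is not the simulation used in this work. It assumes a caricature in which each peak is summarized by a fitness w and a hypothetical neutrality ν (the fraction of mutations that leave fitness unchanged), offspring that mutate off a peak are non-viable, and back mutations are ignored. Under these assumptions, with a Poisson-distributed number of mutations of mean U per offspring, a peak's effective growth rate is w·exp(-U(1-ν)), so the flat peak outgrows the fit one once U exceeds ln(w_fit/w_flat)/(ν_flat - ν_fit). All names and parameter values below are illustrative.

```python
import math


def effective_growth(w, neutrality, U):
    """Growth rate of a peak after discounting offspring lost to mutation.

    Assumes a Poisson number of mutations per offspring (mean U), each of
    which is neutral with probability `neutrality` and lethal otherwise.
    """
    return w * math.exp(-U * (1.0 - neutrality))


def critical_mutation_rate(w_fit, nu_fit, w_flat, nu_flat):
    """Mutation rate above which the flat peak outgrows the fit peak."""
    return math.log(w_fit / w_flat) / (nu_flat - nu_fit)


def flat_share(w_fit, nu_fit, w_flat, nu_flat, U, generations=500):
    """Iterate deterministic selection between the two peaks and return the
    final frequency of the flat peak, starting from equal frequencies."""
    g_fit = effective_growth(w_fit, nu_fit, U)
    g_flat = effective_growth(w_flat, nu_flat, U)
    x_fit, x_flat = 0.5, 0.5
    for _ in range(generations):
        x_fit, x_flat = x_fit * g_fit, x_flat * g_flat
        total = x_fit + x_flat  # renormalize so frequencies sum to 1
        x_fit, x_flat = x_fit / total, x_flat / total
    return x_flat


# Hypothetical parameters: the fit peak is 10% fitter but tolerates
# fewer mutations than the flat peak.
PARAMS = dict(w_fit=1.10, nu_fit=0.2, w_flat=1.00, nu_flat=0.6)

U_c = critical_mutation_rate(**PARAMS)
for U in (0.05, 0.5):
    print(f"U = {U}: flat-peak frequency = {flat_share(U=U, **PARAMS):.3f}")
print(f"critical mutation rate U_c = {U_c:.3f}")
```

With these illustrative values the fit peak dominates at U = 0.05 and the flat peak at U = 0.5, with the switch near U_c ≈ 0.24.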

Advances in sequencing technologies now enable the measurement of increasingly large fitness landscape datasets [6, 7]. It is then desirable to predict evolutionary trajectories on these empirical fitness landscapes, using the previously developed theory in this field.

A recent set of experiments characterized the fitness landscape of the tRNA-Arg(CCU) gene of S. cerevisiae. As this gene is relatively short (72 nucleotides), its landscape is significantly smaller than that of a typical protein-coding gene (average of 1.4 kb in S. cerevisiae). It is a single-copy, non-essential gene, such that many of its mutants are viable. Li et al. measured the growth rates of 23,284 different mutants of this gene in four different growth conditions (23°C, 30°C, 37°C and oxidative stress) [54, 55]. The richness of this dataset renders it a highly valuable case study for analyzing topographic properties and evolutionary trajectories of an empirical landscape and for comparing them with theoretical predictions. Here, we comprehensively analyze this tRNA fitness landscape, in an effort to identify the properties that dictate whether a particular genotype can be the "wild type", namely the extant outcome of the evolutionary dynamics. We found that the wild type was not the fittest genotype in any of the four conditions measured, nor was it the fittest on average over all conditions, nor a local fitness maximum. We then defined a measure of genotype local flatness with respect to its single-point mutants and found that the wild type was one of the flattest genotypes in the dataset. Stochastic evolutionary simulations over this empirical fitness landscape showed a phase transition at a threshold mutation rate, from a population dominated by a high-fitness (non wild type) genotype at low mutation rates to a collection of many intermediate-fitness genotypes, composed of the wild type and other genotypes of similar fitness, at higher mutation rates. To estimate the full-genome mutation rate at which this transition is expected, we used the threshold mutation rate for the tRNA alone, as obtained in the simulations, and applied a scaling argument, assuming equal properties for all loci. Variation in either local mutation rate or gene susceptibility to mutation could, however, cause hybrid constructs with a mixture of fit and flat genes in the same genome.

Results

The wild type is not the fittest genotype.

Our dataset consists of experimental fitness measurements of approximately 65,000 mutants of the S. cerevisiae tRNA-Arg(CCU) gene collected by Li et al. [54, 55]. Growth rates of 23,284 of these mutants were measured under four different environmental conditions: 23°C, 30°C, 37°C and oxidative stress. In the following, we refer only to the genotypes that were measured under all four conditions. The fitness of each genotype was defined as the base-2 exponent of its relative growth rate with respect to the wild type (see Methods). Hence, by definition the wild type fitness was set to 1, for each condition.

We began by closely examining the dataset of fitness values. Our first remarkable finding was that, contrary to what one might expect from population-genetic models of single-locus selection for a population at steady state, the wild type was not the genotype with the highest fitness under any of the four conditions. Under each of the conditions, between 2,008 and 2,441 mutants (out of the 23,284) exhibited higher fitness than the wild type (Fig. 1b-e; see footnote 1). We then analyzed possible sources of measurement error, including read-count variability as a source of inaccuracy in growth rate assessment, and the possibility that the fitness effect was due to independent mutations that fortuitously occurred elsewhere in the genome (SI - Figs. S1-S2). While such error sources did exist, they could not fully account for the wild type's fitness sub-optimality.

Footnote 1: 2441 genotypes at 23°C, 2075 genotypes at 30°C, 2008 genotypes at 37°C and 2236 genotypes under oxidative stress.
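As a quick arithmetic check, the per-condition counts listed above can be converted to fractions of the 23,284 genotypes. The sketch below uses only numbers quoted in the text; the variable and function names are illustrative.

```python
TOTAL_GENOTYPES = 23284  # genotypes measured under all four conditions

# Number of genotypes fitter than the wild type, per condition (from the text).
fitter_than_wt = {
    "23C": 2441,
    "30C": 2075,
    "37C": 2008,
    "oxidative stress": 2236,
}


def percent_fitter(counts, total):
    """Convert per-condition counts into percentages of the dataset."""
    return {condition: 100.0 * n / total for condition, n in counts.items()}


percentages = percent_fitter(fitter_than_wt, TOTAL_GENOTYPES)
for condition, pct in percentages.items():
    print(f"{condition}: {pct:.1f}% of genotypes fitter than the wild type")
```

This gives roughly 10.5%, 8.9%, 8.6% and 9.6% for 23°C, 30°C, 37°C and oxidative stress, respectively.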

Figure 1: Empirical fitness landscape of a tRNA gene. (a) A schematic visualization of the experimentally measured tRNA fitness landscape. Each circle represents a genotype. Filled circles represent genotypes whose fitness values (here encoded by different colors) were measured. Empty circles represent genotypes whose fitness values were not measured. We use here a concentric representation of the fitness landscape, centered around the wild type (WT), where the minimal number of steps on the graph between any two genotypes is the number of point mutations separating them. The wild type is then surrounded by expanding circles of its single mutants (denoted by $N_1$), double mutants ($N_2$), etc. The experiment probed all the wild type's single-point mutants, but only progressively smaller proportions of the following mutational neighborhoods, $N_i$. (b-e) The distribution of all fitness values measured under four different conditions (30°C, 23°C, DMSO and 37°C), at semi-log scale. The wild type fitness value is shown in each panel by the red dotted line. Fitness was defined relative to the wild type's fitness, such that the wild type fitness was set to 1 for each condition. Under each of the conditions tested, 8%-10% of the genotypes in this dataset were fitter than the wild type. The relative weights of different fitness values were biased by the non-uniform sampling of the landscape, with dense sampling close to the wild type, and sparser sampling farther away.
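The concentric layout of panel (a) can be made concrete with a short sketch. It assumes that only nucleotide substitutions are counted (no insertions or deletions) and uses the 72-nucleotide gene length stated in the Introduction. The function names are illustrative and are not taken from the original analysis.

```python
from math import comb

GENE_LENGTH = 72   # nucleotides, as stated in the text
ALPHABET_SIZE = 4  # A, C, G, U


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_size(k, length=GENE_LENGTH, alphabet=ALPHABET_SIZE):
    """Number of sequences exactly k substitutions away from a reference:
    choose k positions, then one of (alphabet - 1) alternatives at each."""
    return comb(length, k) * (alphabet - 1) ** k


for k in (1, 2, 3):
    print(f"N_{k} contains {neighborhood_size(k):,} genotypes")

# Summing over all distances recovers the full sequence space, 4**72.
full_space = sum(neighborhood_size(k) for k in range(GENE_LENGTH + 1))
print(f"full landscape: {full_space:.3e} genotypes")
```

For a 72-nucleotide gene this gives 216 single mutants, 23,004 double mutants and 1,610,280 triple mutants. The triple-mutant neighborhood alone is roughly 70 times larger than the 23,284 measured genotypes, so sampling necessarily thins out with distance from the wild type.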

The wild type is not the fittest on average across conditions

A possible explanation for the apparent sub-optimality of the wild type could be that while some mutants are fitter than the wild type under a specific condition, they are much less fit under other conditions, such that, on average, the wild type is fittest. To test the applicability of this explanation for our case, we computed, across genotypes, the correlation between fitness values under the various growth conditions. For high-fitness genotypes (fitness > 1.05 at 30°C), a high correlation was found between the fitness values measured under various conditions, r = 0.75-0.91 between fitness values at 30°C and fitness values under the other conditions. Namely, most genotypes which are fit under one condition are also fit under others (see Fig. 2a). In contrast, genotypes with low fitness, in the range 0.6-0.8 at 30°C, showed a much lower correlation between their fitness values across conditions, r = 0.28-0.49 (see Fig. 2b). These results argue against the possibility that the wild type is the fittest on average, which would imply that genotypes having high fitness under one condition should have low fitness under another.
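The stratified comparison described above can be sketched as follows. This is a minimal illustration, assuming fitness values are stored per genotype and per condition and that the Pearson coefficient is the correlation measure; the toy values, the dictionary layout and the function names are hypothetical and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def select_stratum(fitness, reference, low, high):
    """Genotypes whose fitness in the reference condition lies in [low, high]."""
    return [g for g, f in fitness.items() if low <= f[reference] <= high]


def stratum_correlation(fitness, reference, other, low, high):
    """Correlation between two conditions, restricted to one fitness stratum."""
    genotypes = select_stratum(fitness, reference, low, high)
    xs = [fitness[g][reference] for g in genotypes]
    ys = [fitness[g][other] for g in genotypes]
    return pearson_r(xs, ys)


# Hypothetical toy data: genotype -> condition -> relative fitness.
toy = {
    "g1": {"30C": 1.06, "37C": 1.05},
    "g2": {"30C": 1.08, "37C": 1.09},
    "g3": {"30C": 1.12, "37C": 1.10},
    "g4": {"30C": 0.62, "37C": 0.90},
    "g5": {"30C": 0.70, "37C": 0.55},
    "g6": {"30C": 0.78, "37C": 0.81},
}

r_high = stratum_correlation(toy, "30C", "37C", 1.05, float("inf"))
r_low = stratum_correlation(toy, "30C", "37C", 0.6, 0.8)
print(f"high-fitness stratum: r = {r_high:.2f}")
print(f"low-fitness stratum:  r = {r_low:.2f}")
```

Stratifying on the reference condition (here 30°C) before correlating mirrors the comparison between Fig. 2a and Fig. 2b; the thresholds are passed as arguments so that other strata can be examined in the same way.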

To formally compare fitness values averaged over multiple conditions, we considered the geometric mean fitness [56], $\langle f_i \rangle = \left( \prod_{m} f_i^{m} \right)^{1/M}$, where $f_i^{m}$ is the fitness value of the $i$-th genotype in the $m$-th condition (out of $M$). The fitness values we have are relative to the wild type's, whose fitness was defined to be 1 under each of the conditions. Since growth rates differ between conditions, we must first transform the fitness values under the different conditions to a common baseline before we can calculate the geometric mean. To do so, we used the wild type growth rates reported by Li et al. for each of the conditions (see Methods section for details). Fig. 2c shows a histogram of the geometric mean fitness values $\langle f_i \rangle$ of all the genotypes in our dataset after transforming the original values. A possible caveat to this calculation is the underlying assumption that all four conditions are equally probable in the organism's natural habitat. Empirical fitness values might be inaccurate due to various reasons as discussed in the SI (Section 1). To reduce dependence on fitness value inaccuracies, we may also look at the fitness ranks: under each condition separately, all the genotypes are ranked according to their fitness values in ascending order (lowest fitness has

References

Selforganization of matter and the evolution of biological macromolecules

The hypercycle. A principle of natural self-organization. Part A: Emergence of the hypercycle.

Towards a general theory of adaptive walks on rugged landscapes.

Evolution of digital organisms at high mutation rates leads to survival of the flattest

Population Genetics: A Concise Guide
