A novel phylogenetic analysis combined with a machine learning approach predicts human mitochondrial variant pathogenicity
Summary (3 min read)
- Because of the critical roles that mitochondria play in metabolism and bioenergetics, mutation of mitochondria-localized proteins and ribonucleic acids can adversely affect human health (Alston et al, 2017; Suomalainen & Battersby, 2018; Khan et al, 2020; Russell et al, 2020).
- Heteroplasmy among the hundreds of mitochondrial DNA molecules found within a cell (Stewart & Chinnery, 2015; Hahn & Zuryn, 2019; Wei & Chinnery, 2020), differential distribution of disease-causing mtDNA among tissues (Boulet et al, 1992), and modifier alleles within the mitochondrial genome (Wei et al, 2017; Elliott et al, 2008) magnify the difficulty of interpreting different alterations.
- Simple tabulation of mtDNA variants found among healthy or sick individuals (Whiffin et al, 2017) may be of limited utility in predicting how harmful a variant may be.
- First, while knowledge of amino acid physico-chemical properties is widely considered to be informative regarding whether an amino acid substitution may or may not have a damaging effect on protein function (Dayhoff et al, 1978), the site-specific acceptability of a given substitution is ultimately decided within the context of its local protein environment (Zuckerkandl & Pauling, 1965).
- Third, alignment (Kawrykow et al, 2012; Iantorno et al, 2014) and sequencing errors (Chen et al, 2017; Smith, 2019) may falsely indicate the acceptability of a particular mtDNA substitution.
Mapping apparent substitutions to a phylogenetic tree allows calculation of relative positional conservation in mtDNA-encoded proteins and RNAs
- Using the sequences of extant species and the predicted ancestral node values, the authors subsequently analyzed each edge of the tree for the presence or absence of substitutions at each aligned human position.
- When calculated for protein and RNA sites encoded by mammalian mtDNA, it is clear that the TSS (and the ISS, not shown) provides an excellent readout of relative conservation at, and consequent functional importance of, each alignment position.
Substitution scores and inferred direct substitutions can be linked to human mtDNA variant pathogenicity
- Since summation of detected substitutions across a phylogenetic tree provides a robust measure of relative conservation at different macromolecular positions, the authors were confident that a phylogenetic analysis that includes TSSs would also provide information about the pathogenicity of human mtDNA variants.
- Even so, the distribution of variant frequencies among full-length sequences in GenBank was strikingly different for those mutations for which an IIDS could be identified in their mammalian trees of proteins, and even tRNAs, when compared to those for which an IIDS could not be identified.
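The edge-by-edge tabulation described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the toy tree, the node names, and the functions `substitution_counts` and `has_direct_substitution` are all hypothetical, and the sketch assumes that aligned sequences (extant or reconstructed) are already available for every node. It counts, for each alignment column, the edges on which parent and child differ, which is the kind of sum the substitution scores are described as, and it checks whether a given change occurs directly along any single edge, which is how an inferred direct substitution is interpreted here (the parent-to-child direction is an assumption).

```python
from collections import Counter

# Hypothetical toy tree: each edge is (parent, child); all sequences are aligned.
EDGES = [("root", "anc1"), ("root", "sp_C"), ("anc1", "sp_A"), ("anc1", "sp_B")]
SEQS = {
    "root": "MLTA",  # reconstructed ancestor (assumed given)
    "anc1": "MLSA",  # reconstructed ancestor (assumed given)
    "sp_A": "MLSA",
    "sp_B": "MITA",
    "sp_C": "MLTA",
}
GAP = "-"


def substitution_counts(edges, seqs):
    """Count, per alignment column, the edges where parent and child differ."""
    counts = Counter()
    for parent, child in edges:
        for pos, (a, b) in enumerate(zip(seqs[parent], seqs[child])):
            # Gaps are skipped so that indels are not counted as substitutions.
            if a != b and GAP not in (a, b):
                counts[pos] += 1
    length = len(next(iter(seqs.values())))
    return [counts[pos] for pos in range(length)]


def has_direct_substitution(edges, seqs, pos, ref, alt):
    """True if some edge carries `ref` in the parent and `alt` in the child at `pos`."""
    return any(
        seqs[parent][pos] == ref and seqs[child][pos] == alt
        for parent, child in edges
    )


if __name__ == "__main__":
    print(substitution_counts(EDGES, SEQS))
    print(has_direct_substitution(EDGES, SEQS, 1, "L", "I"))
    print(has_direct_substitution(EDGES, SEQS, 1, "L", "V"))
```

In this toy tree the third column changes on two edges and the first column on none, so a per-column count of this kind separates variable from conserved positions even when the sequences are nearly identical.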
A support vector machine predicts harmful mtDNA variants
- Given the clear presence of deleterious substitutions among so far uncharacterized variants, the authors sought a high-throughput method that could, with confidence, identify these potentially deleterious substitutions.
- MitoCAP also scored best against their training set when considering most auxiliary measures of prediction proficiency.
- To further investigate this possibility, the authors first plotted the level of agreement between MitoCAP and other methods when assessing all classified variants, and they noted a pronounced lack of overlap between their MitoCAP predictions and the predictions of other methods.
- When heteroplasmy data for unannotated variants in HelixMTdb are analyzed for other prediction methods, as performed above for MitoCAP, MitoCAP best separated variants into classes with different heteroplasmy propensities and achieved the highest Kolmogorov-Smirnov D score.
- Taken together, their analyses indicate that MitoCAP appears to be the most proficient among the compared methods in predicting pathogenicity of variants in mtDNA-encoded proteins, while alternative methods may outperform MitoCAP during classification of tRNA variants.
- The authors describe here a methodology that allows improved quantification of the relative conservation of sites within and between genes, RNAs, and proteins.
- Even nearly identical sequences can be utilized by their approach, allowing for an ever-increasing input dataset that can be deployed toward calculation of site-specific conservation.
- The authors note that focusing upon IIDSs, rather than the simple presence or absence of a character at a site, can indirectly integrate information about potential epistatic interactions that permit or block a substitution from being successfully established within a lineage.
- The MitoCAP predictions that the authors provide allow for improved comprehension of which mtDNA variants identified within a patient may be linked to mitochondrial disease.
- Concordantly, their data suggest a strong propensity for heteroplasmy in the set of substitutions that the authors predict to be pathogenic but that are not yet clinically annotated as disease-associated.
Mitochondrial DNA sequence acquisition and conservation analysis
- Mammalian mtDNA sequences were retrieved from the National Center for Biotechnology Information database of organelle genomes (https://www.ncbi.nlm.nih.gov/genome/browse#!/organelles/ on September 26, 2019).
- The PAGAN output was then analyzed using “binary-table-by-edges-v2.2” and "addconvention-to-binarytable-v1.1.py" (https://github.com/corydunnlab/hummingbird).
- For proteins, the negative training sets consisted of 50 mtDNA substitutions (encoding 51 protein variants) from the reference sequence.
- Predictions for the ROC curve were collected using the ‘mining’ function of the rminer package (Cortez, 2015), with the optimized parameters, during 10 runs of 5-fold cross-validation [model="ksvm", task = "prob", method = c("kfold", 5), Runs = 10].
Comparison of selected, alternative prediction methods with MitoCAP
- Pathogenicity predictions for their training and test set variants were compared to predictions made by PolyPhen-2 (Adzhubei et al, 2013), PROVEAN (Choi et al, 2012), Panther-PSEP (Tang & Thomas, 2016b), Mitoclass (Martín-Navarro et al, 2017) and MitImpact (Castellana et al, 2015).
- C.D.D. is managing director, and B.A.A. and P.O.C. are members, of Primal Predictions LLC, a firm developing approaches to variant pathogenicity prediction.
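Two of the evaluation statistics referred to in this summary, the Kolmogorov-Smirnov D score and the ROC curve, both describe how well two sets of values separate. The sketch below shows one way to compute each from two plain lists of numbers. It is an illustration under stated assumptions, not the authors' code: the function names `ks_d` and `roc_auc` and the example values are hypothetical, and summarizing a ROC curve by its area is a choice made here for brevity rather than something the summary states.

```python
from bisect import bisect_right


def ks_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:  # the gap can only reach its maximum at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve: chance that a positive outscores a negative."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical values for two predicted classes of variants.
    class_1 = [0.1, 0.3, 0.5, 0.9]
    class_2 = [0.4, 0.7, 0.8, 1.0]
    print(ks_d(class_1, class_2))
    # Hypothetical classifier scores for known pathogenic and benign variants.
    print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

A D of 0 means the two empirical distributions coincide and a D of 1 means they do not overlap at all, so a higher D for one prediction method indicates that its two predicted classes differ more sharply in the measured quantity.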