
Showing papers by "Cathal Seoighe" published in 2022


Journal ArticleDOI
TL;DR: In this article, the authors investigated the relationship between inferred TMB and sequencing depth and found that TMB, estimated by counting the number of somatic mutations above a threshold frequency (typically 0.05), is not robust to sequencing depth.
Abstract: Background: Tumour mutation burden (TMB), defined as the number of somatic mutations per megabase within the sequenced region in the tumour sample, has been used as a biomarker for predicting response to immune therapy. Several studies have been conducted to assess the utility of TMB for various cancer types; however, methods to measure TMB have not been adequately evaluated. In this study, we identified two sources of bias in current methods to calculate TMB. Methods: We used simulated data to quantify the two sources of bias and their effect on TMB calculation, and we down-sampled sequencing reads from exome sequencing datasets from TCGA to evaluate the consistency of TMB estimation across different sequencing depths. We analyzed data from ten cancer cohorts to investigate the relationship between inferred TMB and sequencing depth. Results: We found that TMB, estimated by counting the number of somatic mutations above a threshold frequency (typically 0.05), is not robust to sequencing depth. Furthermore, we show that, because only mutations with an observed frequency greater than the threshold are considered, the observed mutant allele frequency provides a biased estimate of the true frequency. This can result in substantial over-estimation of the TMB when the cancer sample includes a large number of somatic mutations at low frequencies, and it exacerbates the lack of robustness of TMB to variation in sequencing depth and tumour purity. Conclusion: Our results demonstrate that care needs to be taken in the estimation of TMB to ensure that results are unbiased and consistent across studies, and we suggest that accurate and robust estimation of TMB could be achieved using statistical models that estimate the full mutant allele frequency spectrum.

3 citations
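
The thresholding effect described in the abstract above can be illustrated with a toy simulation. The sketch below is not the authors' simulation: it assumes read counts for each mutation follow a simple binomial model at a fixed depth, ignores variant-caller behaviour and tumour purity, and uses hypothetical names (`call_mutations`, `true_vafs`) chosen only for illustration. It shows that, under these assumptions, mutations whose true frequency lies below the 0.05 threshold can still be counted, that how many are counted depends on depth, and that the observed frequency of the counted mutations is higher than their true frequency.

```python
import random


def call_mutations(true_vafs, depth, threshold=0.05, seed=0):
    """Return (true_vaf, observed_vaf) pairs for mutations passing the threshold.

    Assumption: the number of mutant reads is Binomial(depth, true_vaf).
    A mutation is counted only if its observed frequency exceeds `threshold`.
    """
    rng = random.Random(seed)
    called = []
    for vaf in true_vafs:
        # Draw the number of reads carrying the mutant allele.
        alt_reads = sum(rng.random() < vaf for _ in range(depth))
        observed = alt_reads / depth
        if observed > threshold:
            called.append((vaf, observed))
    return called


if __name__ == "__main__":
    # Hypothetical tumour: 1000 mutations with a true frequency of 0.03,
    # all of which lie below the 0.05 threshold.
    low_frequency = [0.03] * 1000
    for depth in (50, 100, 400):
        called = call_mutations(low_frequency, depth, seed=1)
        mean_obs = sum(o for _, o in called) / len(called) if called else float("nan")
        print(f"depth={depth}: counted={len(called)}, mean observed VAF={mean_obs:.3f}")
```

In this toy model every counted mutation has an observed frequency above 0.05 even though its true frequency is 0.03, which is the selection effect that biases the observed frequency upwards.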


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the relationship between inferred TMB and sequencing depth and found that TMB, estimated by counting the number of somatic mutations above a threshold frequency (typically 0.05), is not robust to sequencing depth.
Abstract: Background: Tumour mutation burden (TMB), defined as the number of somatic mutations per megabase within the sequenced region in the tumour sample, has been used as a biomarker for predicting response to immune therapy. Several studies have been conducted to assess the utility of TMB for various cancer types; however, methods to measure TMB have not been adequately evaluated. In this study, we identified two sources of bias in current methods to calculate TMB. Methods: We used simulated data to quantify the two sources of bias and their effect on TMB calculation, and we down-sampled sequencing reads from exome sequencing datasets from TCGA to evaluate the consistency of TMB estimation across different sequencing depths. We analyzed data from ten cancer cohorts to investigate the relationship between inferred TMB and sequencing depth. Results: We found that TMB, estimated by counting the number of somatic mutations above a threshold frequency (typically 0.05), is not robust to sequencing depth. Furthermore, we show that, because only mutations with an observed frequency greater than the threshold are considered, the observed mutant allele frequency provides a biased estimate of the true frequency. This can result in substantial over-estimation of the TMB when the cancer sample includes a large number of somatic mutations at low frequencies, and it exacerbates the lack of robustness of TMB to variation in sequencing depth and tumour purity. Conclusion: Our results demonstrate that care needs to be taken in the estimation of TMB to ensure that results are unbiased and consistent across studies, and we suggest that accurate and robust estimation of TMB could be achieved using statistical models that estimate the full mutant allele frequency spectrum.

2 citations



Journal ArticleDOI
TL;DR: In this paper, a unifying framework for GSA is proposed that first fits effect size distributions and then tests for differences in these distributions between gene sets.
Abstract: Gene set analysis (GSA) remains a common step in genome-scale studies because it can reveal insights that are not apparent from results obtained for individual genes. Many different computational tools are applied for GSA, which may be sensitive to different types of signals; however, most methods implicitly test whether there are differences in the distribution of the effect of some experimental condition between genes in gene sets of interest. We have developed a unifying framework for GSA that first fits effect size distributions, and then tests for differences in these distributions between gene sets. These differences can be in the proportions of genes that are perturbed or in the sign or size of the effects. Inspired by statistical meta-analysis, we take into account the uncertainty in effect size estimates by reducing the influence of genes with greater uncertainty on the estimation of distribution parameters. We demonstrate, using simulation and by application to real data, that this approach provides significant gains in performance over existing methods. Furthermore, the statistical tests carried out are defined in terms of effect sizes, rather than the results of prior statistical tests measuring these changes, which leads to improved interpretability and greater robustness to variation in sample sizes.

Posted ContentDOI
06 Jun 2022
TL;DR: In this article, a unifying framework for GSA is proposed that first fits effect size distributions and then tests for differences in these distributions between gene sets, reducing the influence of genes with more uncertain effect size estimates on the estimated distribution parameters.
Abstract: Gene set analysis (GSA) remains a common step in genome-scale studies because it can reveal insights that are not apparent from results obtained for individual genes. Many different computational tools are applied for GSA, which may be sensitive to different types of signals; however, most methods test whether there are differences in the distribution of the effect of some experimental condition between genes in gene sets of interest. We have developed a unifying framework for GSA that first fits effect size distributions, and then tests for differences in these distributions between gene sets. These differences can be in the proportions of genes that are perturbed or in the sign or size of the effects. Inspired by statistical meta-analysis, we take into account the uncertainty in effect size estimates, reducing the influence of genes with more uncertain estimates on the estimation of distribution parameters. We demonstrate, using simulation and by application to real data, that this approach provides significant gains in performance over existing methods. Furthermore, the statistical tests carried out are defined in terms of effect sizes, rather than the results of prior statistical tests measuring these changes, which leads to improved interpretability and greater robustness to variation in sample sizes. We also show that the approach naturally suggests alternative test types that are not usually considered in GSA; it can, for example, be applied to identify differences in effect size distributions between sample subgroups in a gene set of interest. Applying this approach to an analysis of gene expression changes between matched colon tumour and normal samples, we found several gene sets that showed distinct behaviour in patient subgroups with different prognoses. These may help to explain the clinical differences that have been reported between these patient groups.
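
The meta-analysis idea mentioned in the abstract above, down-weighting genes whose effect sizes are estimated with greater uncertainty, can be sketched with standard inverse-variance weighting. This is a deliberately simplified stand-in and not the framework described in the abstract: it compares only the weighted mean effect of a gene set with that of a background set under a fixed-effect assumption, rather than fitting full effect size distributions. The function names and inputs are hypothetical.

```python
import math


def weighted_mean_effect(effects, std_errors):
    """Inverse-variance weighted mean of effect sizes and its standard error.

    Each gene gets weight 1 / SE^2, so imprecise estimates contribute less.
    Assumption: a fixed-effect model (no between-gene heterogeneity term).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


def compare_gene_sets(set_effects, set_ses, bg_effects, bg_ses):
    """Z-test for a difference in weighted mean effect between two gene sets.

    Returns (difference, z statistic, two-sided normal p-value).
    """
    m_set, se_set = weighted_mean_effect(set_effects, set_ses)
    m_bg, se_bg = weighted_mean_effect(bg_effects, bg_ses)
    diff = m_set - m_bg
    z = diff / math.sqrt(se_set ** 2 + se_bg ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p_value


if __name__ == "__main__":
    # Hypothetical log fold changes and standard errors for illustration only.
    gene_set = ([1.2, 0.9, 1.5, 4.0], [0.2, 0.3, 0.25, 3.0])
    background = ([0.1, -0.2, 0.05, 0.3, -0.1], [0.2, 0.25, 0.3, 0.2, 0.35])
    print(compare_gene_sets(*gene_set, *background))
```

In the illustrative input, the gene with effect 4.0 has a large standard error (3.0) and therefore has very little influence on the weighted mean, which is the behaviour the weighting is meant to achieve.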