
Showing papers in "Genome Medicine in 2017"


Journal Article
TL;DR: Measurements of TMB from comprehensive genomic profiling closely reflect measurements from whole exome sequencing, and modeling shows that the variance in measurement increases significantly below 0.5 Mb; many disease types have a substantial portion of patients with high TMB who might benefit from immunotherapy.
Abstract: High tumor mutational burden (TMB) is an emerging biomarker of sensitivity to immune checkpoint inhibitors and has been shown to be more significantly associated with response to PD-1 and PD-L1 blockade immunotherapy than PD-1 or PD-L1 expression, as measured by immunohistochemistry (IHC). The distribution of TMB and the subset of patients with high TMB have not been well characterized in the majority of cancer types. In this study, we compare TMB measured by a targeted comprehensive genomic profiling (CGP) assay to TMB measured by exome sequencing and simulate the expected variance in TMB when sequencing less than the whole exome. We then describe the distribution of TMB across a diverse cohort of 100,000 cancer cases and test for association between somatic alterations and TMB in over 100 tumor types. We demonstrate that measurements of TMB from comprehensive genomic profiling are strongly reflective of measurements from whole exome sequencing, and we show through modeling that the variance in measurement increases significantly below 0.5 Mb. We find that a subset of patients exhibits high TMB across almost all types of cancer, including many rare tumor types, and characterize the relationship between high TMB and microsatellite instability status. We find that TMB increases significantly with age, showing a 2.4-fold difference between age 10 and age 90 years. Finally, we investigate the molecular basis of TMB and identify genes and mutations associated with TMB level. We identify a cluster of somatic mutations in the promoter of the gene PMS2, which occur in 10% of skin cancers and are highly associated with increased TMB. These results show that a CGP assay targeting ~1.1 Mb of coding genome can accurately assess TMB compared with sequencing the whole exome. Using this method, we find that many disease types have a substantial portion of patients with high TMB who might benefit from immunotherapy.
Finally, we identify novel, recurrent promoter mutations in PMS2, which may be another example of regulatory mutations contributing to tumorigenesis.

2,304 citations
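
A minimal sketch of how TMB is expressed and why smaller sequenced territories give noisier estimates. It assumes, as a simplification that is not necessarily the authors' simulation, that the number of mutations observed in the territory is Poisson-distributed with mean TMB × territory size. The 10 mutations/Mb tumor and the territory sizes (including ~30 Mb as a rough stand-in for an exome) are illustrative assumptions.

```python
import math


def tmb_per_mb(n_mutations: int, territory_mb: float) -> float:
    """Tumor mutational burden: mutations counted per megabase sequenced."""
    if territory_mb <= 0:
        raise ValueError("territory_mb must be positive")
    return n_mutations / territory_mb


def poisson_cv(true_tmb: float, territory_mb: float) -> float:
    """Coefficient of variation of the TMB estimate, ASSUMING the mutation
    count in the territory is Poisson with mean true_tmb * territory_mb
    (an illustrative model; real data may deviate from it)."""
    expected = true_tmb * territory_mb
    if expected <= 0:
        raise ValueError("expected mutation count must be positive")
    # count ~ Poisson(expected) -> SD(count) = sqrt(expected);
    # dividing count and SD by territory_mb leaves CV = 1 / sqrt(expected)
    return 1.0 / math.sqrt(expected)


if __name__ == "__main__":
    print(tmb_per_mb(22, 1.1))  # 22 mutations in a ~1.1 Mb panel
    # Hypothetical tumor at 10 mutations/Mb; ~30 Mb is a rough exome size.
    for mb in (30.0, 1.1, 0.5, 0.2):
        print(f"{mb:>5} Mb: CV = {poisson_cv(10.0, mb):.2f}")
```

Under this model the coefficient of variation is 1/√(TMB × Mb): for a tumor at 10 mutations/Mb it rises from about 0.06 at 30 Mb to about 0.30 at 1.1 Mb and about 0.45 at 0.5 Mb, which matches the direction of the reported loss of precision on small territories.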


Journal Article
TL;DR: A practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation is presented.
Abstract: RNA sequencing (RNA-seq) is a genomic approach for the detection and quantitative analysis of messenger RNA molecules in a biological sample and is useful for studying cellular responses. RNA-seq has fueled much discovery and innovation in medicine over recent years. For practical reasons, the technique is usually conducted on samples comprising thousands to millions of cells. However, this has hindered direct assessment of the fundamental unit of biology—the cell. Since the first single-cell RNA-sequencing (scRNA-seq) study was published in 2009, many more have been conducted, mostly by specialist laboratories with unique skills in wet-lab single-cell genomics, bioinformatics, and computation. However, with the increasing commercial availability of scRNA-seq platforms, and the rapid ongoing maturation of bioinformatics approaches, a point has been reached where any biomedical researcher or clinician can use scRNA-seq to make exciting discoveries. In this review, we present a practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation.

611 citations


Journal Article
TL;DR: The challenges for clinical translation of RNA-based therapeutics are discussed, with an emphasis on recent advances in delivery technologies, and an overview is presented of the applications of RNAs for modulation of gene/protein expression and genome editing that are currently being investigated both in the laboratory and in the clinic.
Abstract: The rapid expansion of the available genomic data continues to greatly impact biomedical science and medicine. Fulfilling the clinical potential of genetic discoveries requires the development of therapeutics that can specifically modulate the expression of disease-relevant genes. RNA-based drugs, including short interfering RNAs and antisense oligonucleotides, are particularly promising examples of this newer class of biologics. For over two decades, researchers have been trying to overcome major challenges for utilizing such RNAs in a therapeutic context, including intracellular delivery, stability, and immune response activation. This research is finally beginning to bear fruit as the first RNA drugs gain FDA approval and more advance to the final phases of clinical trials. Furthermore, the recent advent of CRISPR, an RNA-guided gene-editing technology, as well as new strides in the delivery of messenger RNA transcribed in vitro, have triggered a major expansion of the RNA-therapeutics field. In this review, we discuss the challenges for clinical translation of RNA-based therapeutics, with an emphasis on recent advances in delivery technologies, and present an overview of the applications of RNA-based drugs for modulation of gene/protein expression and genome editing that are currently being investigated both in the laboratory as well as in the clinic.

461 citations


Journal Article
TL;DR: Differences in colonic microbiota and in microbiota metabolism between PD patients and controls are revealed in unprecedented detail, not achievable through 16S sequencing; these differences point to a yet unappreciated aspect of PD, possibly involving intestinal barrier function and immune function in PD patients.
Abstract: Parkinson’s disease (PD) is presently conceptualized as a protein aggregation disease in which pathology involves both the enteric and the central nervous system, possibly spreading from one to the other via the vagus nerves. As gastrointestinal dysfunction often precedes or parallels motor symptoms, the enteric system with its vast diversity of microorganisms may be involved in PD pathogenesis. Alterations in the taxonomic composition of the enteric microbiota of L-DOPA-naive PD patients might also serve as a biomarker. We performed metagenomic shotgun analyses and compared the fecal microbiomes of 31 early stage, L-DOPA-naive PD patients to those of 28 age-matched controls. We found increased Verrucomicrobiaceae (Akkermansia muciniphila) and unclassified Firmicutes, whereas Prevotellaceae (Prevotella copri) and Erysipelotrichaceae (Eubacterium biforme) were markedly lowered in PD samples. The observed differences could reliably separate PD from control with a ROC-AUC of 0.84. Functional analyses of the metagenomes revealed differences in microbiota metabolism in PD involving β-glucuronate and tryptophan metabolism. While the abundances of prophages and plasmids did not differ between PD and controls, total virus abundance was decreased in PD participants. Based on our analyses, the intake of either a MAO inhibitor, amantadine, or a dopamine agonist (together accounting for 90% of PD patients) had no overall influence on taxa abundance or microbial functions. Our data revealed differences in colonic microbiota and in microbiota metabolism between PD patients and controls in unprecedented detail, not achievable through 16S sequencing. The findings point to a yet unappreciated aspect of PD, possibly involving intestinal barrier function and immune function in PD patients. The influence of parkinsonian medication should be investigated further in larger cohorts.

407 citations
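
The ROC-AUC of 0.84 reported above can be read as the probability that a randomly chosen PD sample receives a higher classifier score than a randomly chosen control. A minimal sketch of that pairwise (Mann–Whitney) formulation follows; the scores are hypothetical and the study's actual classifier is not reproduced here.

```python
from itertools import product


def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs in which the case scores
    higher than the control; ties count as half a win."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores (higher = more PD-like).
pd_scores = [0.9, 0.8, 0.4]
control_scores = [0.3, 0.5, 0.1]
print(roc_auc(pd_scores, control_scores))  # 8 of 9 pairs ranked correctly
```

This pairwise loop is O(n × m), which is fine for cohorts of a few dozen samples like this one; larger datasets would usually use a rank-based computation instead.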


Journal Article
TL;DR: In this article, the authors performed metagenomic sequencing of monthly stool samples from 20 IBD patients and 12 controls (266 total samples) and identified strain-specific functional correlates with IBD outcomes.
Abstract: Inflammatory bowel disease (IBD) is characterized by chronic inflammation of the gastrointestinal tract that is associated with changes in the gut microbiome. Here, we sought to identify strain-specific functional correlates with IBD outcomes. We performed metagenomic sequencing of monthly stool samples from 20 IBD patients and 12 controls (266 total samples). These were taxonomically profiled with MetaPhlAn2 and functionally profiled using HUMAnN2. Differentially abundant species were identified using MaAsLin, and strain-specific pangenome haplotypes were analyzed using PanPhlAn. Facultative anaerobes that can tolerate the increased oxidative stress of the IBD gut were significantly more abundant in patients. We also detected dramatic, yet transient, blooms of Ruminococcus gnavus in IBD patients, often co-occurring with increased disease activity. We identified two distinct clades of R. gnavus strains, one of which is enriched in IBD patients. To study functional differences between these two clades, we augmented the R. gnavus pangenome by sequencing nine isolates from IBD patients. We identified 199 IBD-specific, strain-specific genes involved in oxidative stress responses, adhesion, iron acquisition, and mucus utilization, potentially conferring an adaptive advantage for this R. gnavus clade in the IBD gut. This study adds further evidence to the hypothesis that increased oxidative stress may be a major factor shaping the dysbiosis of the microbiome observed in IBD and suggests that R. gnavus may be an important member of the altered gut community in IBD.

386 citations


Journal Article
TL;DR: The observations made in this study suggest that, with certain caveats, a very low allele frequency threshold can be adopted to more accurately interpret sequence variants.
Abstract: The frequency of a variant in the general population is a key criterion used in the clinical interpretation of sequence variants. With certain exceptions, such as founder mutations, the rarity of a variant is a prerequisite for pathogenicity. However, defining the threshold at which a variant should be considered “too common” is challenging, and therefore diagnostic laboratories have typically set conservative allele frequency thresholds. Recent publications of large population sequencing data, such as the Exome Aggregation Consortium (ExAC) database, provide an opportunity to characterize with accuracy and precision the frequency distributions of very rare disease-causing alleles. Allele frequencies of pathogenic variants in ClinVar, as well as variants expected to be pathogenic through the nonsense-mediated decay (NMD) pathway, were analyzed to study the burden of pathogenic variants in 79 genes of clinical importance. Of 1364 BRCA1 and BRCA2 variants that are well characterized as pathogenic or that are expected to lead to NMD, 1350 variants had an allele frequency of less than 0.0025%. The remaining 14 variants were previously published founder mutations. Importantly, we observed no difference in the distributions of pathogenic variants expected to lead to NMD compared to those that are not. Therefore, we expanded the analysis to examine the distributions of NMD expected variants in 77 additional genes. These 77 genes were selected to represent a broad set of clinical areas, modes of inheritance, and penetrance. Among these variants, most (97.3%) had an allele frequency of less than 0.01%. Furthermore, pathogenic variants with allele frequencies greater than 0.01% were well characterized in publications and included many founder mutations. The observations made in this study suggest that, with certain caveats, a very low allele frequency threshold can be adopted to more accurately interpret sequence variants.

181 citations
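
A minimal sketch of applying a population allele-frequency ceiling during variant triage. The 0.01% default mirrors the level below which most NMD-expected variants fell in this study, but the function, the variant names, and the founder list are hypothetical. In keeping with the caveats above, a flagged variant is only a prompt for expert review, not an automatic benign call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str                # hypothetical identifier
    allele_frequency: float  # population AF as a fraction: 1e-4 == 0.01%


def flag_too_common(variants, max_af=1e-4, known_founders=frozenset()):
    """Names of variants whose population AF exceeds max_af, excluding
    names on an explicit (hypothetical) list of known founder mutations."""
    return [
        v.name
        for v in variants
        if v.allele_frequency > max_af and v.name not in known_founders
    ]


variants = [
    Variant("varA", 0.00002),  # 0.002%
    Variant("varB", 0.0005),   # 0.05%
    Variant("varC", 0.0003),   # 0.03%, assumed to be a known founder mutation
]
print(flag_too_common(variants, known_founders={"varC"}))  # ['varB']
```

Passing `max_af=2.5e-5` would apply the stricter 0.0025% level that almost all of the BRCA1/BRCA2 pathogenic variants fell below.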


Journal Article
TL;DR: Analyzing 10,000 tumor exomes, this work identifies more than 3000 rarely mutated residues in proteins as potentially functional and experimentally validates several in RAC1 and MAP2K1.
Abstract: Many mutations in cancer are of unknown functional significance. Standard methods use statistically significant recurrence of mutations in tumor samples as an indicator of functional impact. We extend such analyses into the long tail of rare mutations by considering recurrence of mutations in clusters of spatially close residues in protein structures. Analyzing 10,000 tumor exomes, we identify more than 3000 rarely mutated residues in proteins as potentially functional and experimentally validate several in RAC1 and MAP2K1. These potential driver mutations (web resources: 3dhotspots.org and cBioPortal.org) can extend the scope of genomically informed clinical trials and of personalized choice of therapy.

173 citations
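
A minimal sketch of the spatial-clustering idea: rather than counting mutations residue by residue, pool the counts of residues that lie close together in the folded protein. The one-coordinate-per-residue representation (e.g. a Cα atom, in ångströms), the 5 Å cutoff, and the toy data are assumptions. The sketch performs no significance testing, which a real hotspot analysis would need in order to separate true clusters from chance co-location.

```python
import math


def spatial_cluster_counts(coords, mutation_counts, cutoff=5.0):
    """For each residue, sum the mutation counts of all residues whose
    coordinates lie within `cutoff` of it (including itself).

    coords: residue number -> (x, y, z), one point per residue (assumed)
    mutation_counts: residue number -> observed mutations (missing = 0)
    """
    totals = {}
    for res, xyz in coords.items():
        totals[res] = sum(
            mutation_counts.get(other, 0)
            for other, other_xyz in coords.items()
            if math.dist(xyz, other_xyz) <= cutoff
        )
    return totals


# Toy structure: residues 1 and 2 are 3 Å apart, residue 3 is far away.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
counts = {1: 2, 2: 1, 3: 1}
print(spatial_cluster_counts(coords, counts))  # {1: 3, 2: 3, 3: 1}
```

Individually rare mutations at residues 1 and 2 reinforce each other once pooled, which is how spatially clustered rare mutations can stand out from the long tail.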


Journal Article
TL;DR: In this article, the authors designed and implemented protocols for the study of cases for which a plausible molecular diagnosis was not achieved in a clinical genomics diagnostic laboratory (i.e., unsolved clinical exomes).
Abstract: Given the rarity of most single-gene Mendelian disorders, concerted efforts of data exchange between clinical and scientific communities are critical to optimize molecular diagnosis and novel disease gene discovery. We designed and implemented protocols for the study of cases for which a plausible molecular diagnosis was not achieved in a clinical genomics diagnostic laboratory (i.e. unsolved clinical exomes). Such cases were recruited to a research laboratory for further analyses, in order to potentially: (1) accelerate novel disease gene discovery; (2) increase the molecular diagnostic yield of whole exome sequencing (WES); and (3) gain insight into the genetic mechanisms of disease. Pilot project data included 74 families, consisting mostly of parent–offspring trios. Analyses performed on a research basis employed both WES from additional family members and complementary bioinformatics approaches and protocols. Analysis of all possible modes of Mendelian inheritance, focusing on both single nucleotide variants (SNV) and copy number variant (CNV) alleles, yielded a likely contributory variant in 36% (27/74) of cases. If one includes candidate genes with variants identified within a single family, a potential contributory variant was identified in a total of ~51% (38/74) of cases enrolled in this pilot study. The molecular diagnosis was achieved in 30/63 trios (47.6%). In addition, the analysis workflow yielded evidence for pathogenic variants in disease-associated genes in 4/6 singleton cases (66.7%), 1/1 multiplex family involving three affected siblings, and 3/4 (75%) quartet families. Both the analytical pipeline and the collaborative efforts between the diagnostic and research laboratories provided insights that allowed recent disease gene discoveries (PURA, TANGO2, EMC1, GNB5, ATAD3A, and MIPEP) and increased the number of novel genes, defined in this study as genes identified in more than one family (DHX30 and EBF3).
An efficient genomics pipeline in which clinical sequencing in a diagnostic laboratory is followed by the detailed reanalysis of unsolved cases in a research environment, supplemented with WES data from additional family members, and subject to adjuvant bioinformatics analyses including relaxed variant filtering parameters in informatics pipelines, can enhance the molecular diagnostic yield and provide mechanistic insights into Mendelian disorders. Implementing these approaches requires collaborative clinical molecular diagnostic and research efforts.

169 citations


Journal Article
TL;DR: Polygenic risk scores that combine thousands of variants show some predictive ability across a range of complex traits and diseases, including neuropsychiatric disorders, and their potential for translation to clinical use is considered.
Abstract: Genome-wide association studies have made strides in identifying common variation associated with disease. The modest effect sizes preclude risk prediction based on single genetic variants, but polygenic risk scores that combine thousands of variants show some predictive ability across a range of complex traits and diseases, including neuropsychiatric disorders. Here, we consider the potential for translation to clinical use.

161 citations
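
A polygenic risk score is commonly computed as a weighted sum of an individual's effect-allele dosages, with weights taken from GWAS effect sizes such as log odds ratios. The sketch below assumes that form; the variant IDs and weights are hypothetical, and practical pipelines add steps (variant selection, handling of linkage disequilibrium, score standardization) that are omitted here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: variant ID -> effect-allele count (0, 1 or 2; imputed dosages
             may be fractional)
    weights: variant ID -> per-allele weight, e.g. a GWAS log odds ratio
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no dosage for: {sorted(missing)}")
    return sum(w * dosages[v] for v, w in weights.items())


# Hypothetical variants and weights, for illustration only.
weights = {"rs_a": 0.1, "rs_b": -0.05, "rs_c": 0.2}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
score = polygenic_risk_score(dosages, weights)
print(round(score, 4))  # 0.15
```

Because each variant contributes only a small weight, a score of this kind is informative mainly when thousands of variants are summed, which is consistent with the modest single-variant effect sizes noted above.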


Journal Article
TL;DR: The data strongly support the value of large-scale sequencing, especially WGS within proband-parent trios, as both an effective first-choice diagnostic tool and means to advance clinical and research progress related to pediatric neurological disease.
Abstract: Developmental disabilities have diverse genetic causes that must be identified to facilitate precise diagnoses. We describe genomic data from 371 affected individuals, 309 of whom were sequenced as proband-parent trios. Whole-exome sequences (WES) were generated for 365 individuals (127 affected) and whole-genome sequences (WGS) were generated for 612 individuals (244 affected). Pathogenic or likely pathogenic variants were found in 100 individuals (27%), with variants of uncertain significance (VUS) in an additional 42 (11.3%). We found that a family history of neurological disease, especially the presence of an affected first-degree relative, reduces the pathogenic/likely pathogenic variant identification rate, reflecting both the disease relevance and ease of interpretation of de novo variants. We also found that improvements to genetic knowledge facilitated interpretation changes in many cases. Through systematic reanalyses, we have thus far reclassified 15 variants, with 11.3% of families who initially were found to harbor a VUS and 4.7% of families with a negative result eventually found to harbor a pathogenic or likely pathogenic variant. To further such progress, the data described here are being shared through ClinVar, GeneMatcher, and dbGaP. Our data strongly support the value of large-scale sequencing, especially WGS within proband-parent trios, as both an effective first-choice diagnostic tool and means to advance clinical and research progress related to pediatric neurological disease.

155 citations


Journal Article
TL;DR: A high-confidence set of independently validated genes differentially expressed between schizophrenia patients and controls in the anterior cingulate cortex is highlighted, and transcriptional changes are integrated with untargeted metabolite profiling.
Abstract: Psychiatric disorders are multigenic diseases with complex etiology that contribute significantly to human morbidity and mortality. Although clinically distinct, several disorders share many symptoms, suggesting that common underlying molecular changes exist that may implicate important regulators of pathogenesis and provide new therapeutic targets. We performed RNA sequencing on tissue from the anterior cingulate cortex, dorsolateral prefrontal cortex, and nucleus accumbens from three groups of 24 patients each diagnosed with schizophrenia, bipolar disorder, or major depressive disorder, and from 24 control subjects. We identified differentially expressed genes and validated the results in an independent cohort. Anterior cingulate cortex samples were also subjected to metabolomic analysis. ChIP-seq data were used to characterize binding of the transcription factor EGR1. We compared molecular signatures across the three brain regions and disorders in the transcriptomes of post-mortem human brain samples. The most significant disease-related differences were in the anterior cingulate cortex of schizophrenia samples compared to controls. Transcriptional changes were assessed in an independent cohort, revealing the transcription factor EGR1 as significantly down-regulated in both cohorts and as a potential regulator of broader transcription changes observed in schizophrenia patients. Additionally, broad down-regulation of genes specific to neurons and concordant up-regulation of genes specific to astrocytes were observed in schizophrenia and bipolar disorder patients relative to controls. Metabolomic profiling identified disruption of GABA levels in schizophrenia patients. We provide a comprehensive post-mortem transcriptome profile of three psychiatric disorders across three brain regions.
We highlight a high-confidence set of independently validated genes differentially expressed between schizophrenia patients and controls in the anterior cingulate cortex and integrate transcriptional changes with untargeted metabolite profiling.

Journal Article
TL;DR: It is shown that the combination of deeper genotype imputation and extended phenotype availability makes GS:SFHS an attractive resource for carrying out association studies to gain insight into the genetic architecture of complex traits.
Abstract: The Generation Scotland: Scottish Family Health Study (GS:SFHS) is a family-based population cohort with DNA, biological samples, socio-demographic, psychological and clinical data from approximately 24,000 adult volunteers across Scotland. Although data collection was cross-sectional, GS:SFHS became a prospective cohort owing to the ability to link to routine Electronic Health Record (EHR) data. Over 20,000 participants were selected for genotyping using a large genome-wide array. GS:SFHS was analysed using genome-wide association studies (GWAS) to test the effects of a large spectrum of variants, imputed using the Haplotype Reference Consortium (HRC) dataset, on medically relevant traits measured directly or obtained from EHRs. The HRC dataset is the largest available haplotype reference panel for imputation of variants in populations of European ancestry and allows investigation of variants with low minor allele frequencies within the entire GS:SFHS genotyped cohort. Genome-wide associations were run on 20,032 individuals using both genotyped and HRC-imputed data. We present results for a range of well-studied quantitative traits obtained from clinic visits and for serum urate measures obtained from data linkage to EHRs collected by the Scottish National Health Service. Results replicated known associations and additionally revealed novel findings, mainly with rare variants, validating the use of the HRC imputation panel. For example, we identified two new associations with fasting glucose at variants near Y_RNA and WDR4 and four new associations with heart rate at SNPs within CSMD1 and ASPH, upstream of HTR1F and between PROKR2 and GPCPD1. All were driven by rare variants (minor allele frequencies in the range of 0.08–1%). Proof of principle for the use of EHRs was provided by verification of the highly significant association of urate levels with the well-established urate transporter SLC2A9.
GS:SFHS provides genetic data on over 20,000 participants alongside a range of phenotypes as well as linkage to National Health Service laboratory and clinical records. We have shown that the combination of deeper genotype imputation and extended phenotype availability makes GS:SFHS an attractive resource for carrying out association studies to gain insight into the genetic architecture of complex traits.

Journal Article
TL;DR: This study demonstrates that PD status has a profound association with DNA methylation levels in blood and saliva; and the most significant PD-related changes reflect changes in blood cell composition.
Abstract: Several articles suggest that DNA methylation levels in blood relate to Parkinson’s disease (PD), but there is a need for a large-scale study that involves suitable population-based controls. The purposes of the study were: (1) to study whether PD status is associated with DNA methylation levels in blood/saliva; (2) to study whether observed associations relate to blood cell types; and (3) to characterize genome-wide significant markers (“CpGs”) and clusters of CpGs (co-methylation modules) in terms of biological pathways. In a population-based case control study of PD, we studied blood samples from 335 PD cases and 237 controls and saliva samples from another 128 cases and 131 controls. DNA methylation data were generated from over 486,000 CpGs using the Illumina Infinium array. We identified modules of CpGs (clusters) using weighted correlation network analysis (WGCNA). Our cross-sectional analysis of blood identified 82 genome-wide significant CpGs (including cg02489202 in LARS2, p = 8.3 × 10⁻¹¹, and cg04772575 in ABCB9, p = 4.3 × 10⁻¹⁰). Three out of six PD-related co-methylation modules in blood were significantly enriched with immune system related genes. Our analysis of saliva identified five significant CpGs. PD-related CpGs are located near genes that relate to mitochondrial function, neuronal projection, cytoskeleton organization, systemic immune response, and iron handling. This study demonstrates that: (1) PD status has a profound association with DNA methylation levels in blood and saliva; and (2) the most significant PD-related changes reflect changes in blood cell composition. Overall, this study highlights the role of the immune system in PD etiology, but future research will need to address the causal structure of these relationships.

Journal Article
TL;DR: It is shown that IMQ does not uniquely model psoriasis but in fact triggers a core set of pathways active in diverse skin diseases, and the findings suggest that B6 mice provide a better background than other strains for modeling psoriasis disease mechanisms.
Abstract: Imiquimod (IMQ) produces a cutaneous phenotype in mice frequently studied as an acute model of human psoriasis. Whether this phenotype depends on strain or sex has never been systematically investigated on a large scale. Such effects, however, could lead to conflicts among studies, while further impacting study outcomes and efforts to translate research findings. RNA-seq was used to evaluate the psoriasiform phenotype elicited by 6 days of Aldara (5% IMQ) treatment in both sexes of seven mouse strains (C57BL/6J (B6), BALB/cJ, CD1, DBA/1J, FVB/NJ, 129X1/SvJ, and MOLF/EiJ). In most strains, IMQ altered gene expression in a manner consistent with human psoriasis, partly due to innate immune activation and decreased homeostatic gene expression. The response of MOLF males was aberrant, however, with decreased expression of differentiation-associated genes (elevated in other strains). Key aspects of the IMQ response differed between the two most commonly studied strains (BALB/c and B6). Compared with BALB/c, the B6 phenotype showed increased expression of genes associated with DNA replication, IL-17A stimulation, and activated CD8+ T cells, but decreased expression of genes associated with interferon signaling and CD4+ T cells. Although IMQ-induced expression shifts mirrored psoriasis, responses in BALB/c, 129/SvJ, DBA, and MOLF mice were more consistent with other human skin conditions (e.g., wounds or infections). IMQ responses in B6 mice were most consistent with human psoriasis and best replicated expression patterns specific to psoriasis lesions. These findings demonstrate strain-dependent aspects of IMQ dermatitis in mice. We have shown that IMQ does not uniquely model psoriasis but in fact triggers a core set of pathways active in diverse skin diseases. Nonetheless, our findings suggest that B6 mice provide a better background than other strains for modeling psoriasis disease mechanisms.

Journal Article
TL;DR: The data suggest that DNA methylation patterns in cell-free DNA have the potential to detect a proportion of OCs up to two years in advance of diagnosis and may potentially guide personalized treatment.
Abstract: Despite a myriad of attempts in the last three decades to diagnose ovarian cancer (OC) earlier, this clinical aim still remains a significant challenge. Aberrant methylation patterns of linked CpGs analyzed in DNA fragments shed by cancers into the bloodstream (i.e. cell-free DNA) can provide highly specific signals indicating cancer presence. We analyzed 699 cancerous and non-cancerous tissues using a methylation array or reduced representation bisulfite sequencing to discover the most specific OC methylation patterns. A three-DNA-methylation-serum-marker panel was developed using targeted ultra-high coverage bisulfite sequencing in 151 women and validated in 250 women with various conditions, particularly in those associated with high CA125 levels (endometriosis and other benign pelvic masses), serial samples from 25 patients undergoing neoadjuvant chemotherapy, and a nested case control study of 172 UKCTOCS control arm participants which included serum samples up to two years before OC diagnosis. The cell-free DNA amount and average fragment size in the serum samples were up to ten times higher than average published values (based on samples that were immediately processed) due to leakage of DNA from white blood cells owing to delayed time to serum separation. Despite this, the marker panel discriminated high grade serous OC patients from healthy women or patients with a benign pelvic mass with specificity/sensitivity of 90.7% (95% confidence interval [CI] = 84.3–94.8%) and 41.4% (95% CI = 24.1–60.9%), respectively. Levels of all three markers plummeted after exposure to chemotherapy and correctly identified 78% of responders and 86% of non-responders (Fisher’s exact test, p = 0.04), which was superior to a CA125 cut-off of 35 IU/mL (20% and 75%). Of the women who developed OC within two years of sample collection, 57.9% (95% CI = 34.0–78.9%) were identified, with a specificity of 88.1% (95% CI = 77.3–94.3%). Sensitivity and specificity improved further when specifically analyzing CA125 negative samples only (63.6% and 87.5%, respectively).
Our data suggest that DNA methylation patterns in cell-free DNA have the potential to detect a proportion of OCs up to two years in advance of diagnosis and may potentially guide personalized treatment. The prospective use of novel collection vials, which stabilize blood cells and reduce background DNA contamination in serum/plasma samples, will facilitate clinical implementation of liquid biopsy analyses.
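
Sensitivity and specificity are simple proportions over a confusion matrix, and the confidence intervals quoted above can be computed in several ways. The sketch below uses the Wilson score interval as one common choice; the abstract does not state which method the authors used, and the counts in the example are hypothetical, not the study's data.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: 40 of 100 cancers detected, 90 of 100 controls negative.
sens, spec = sensitivity_specificity(tp=40, fn=60, tn=90, fp=10)
lo, hi = wilson_interval(40, 100)
print(f"sensitivity {sens:.1%} (95% CI {lo:.1%} to {hi:.1%})")
lo, hi = wilson_interval(90, 100)
print(f"specificity {spec:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves reasonably for small groups such as the few dozen women who later developed OC.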

Journal Article
TL;DR: The benefit of whole-genome sequencing (WGS) over whole-exome sequencing (WES) for understanding more complex, multifactorial cases of NDD is discussed, along with how this improved understanding aids diagnosis and management of these disorders.
Abstract: Next-generation sequencing (NGS) is now more accessible to clinicians and researchers. As a result, our understanding of the genetics of neurodevelopmental disorders (NDDs) has rapidly advanced over the past few years. NGS has led to the discovery of new NDD genes with an excess of recurrent de novo mutations (DNMs) when compared to controls. Development of large-scale databases of normal and disease variation has given rise to metrics exploring the relative tolerance of individual genes to human mutation. Knowledge of genetic etiology and diagnosis rates have improved, leading to the discovery of new pathways and tissue types relevant to NDDs. In this review, we highlight several key findings based on the discovery of recurrent DNMs ranging from copy number variants to point mutations. We explore biases and patterns of DNM enrichment and the role of mosaicism and secondary mutations in variable expressivity. We discuss the benefit of whole-genome sequencing (WGS) over whole-exome sequencing (WES) to understand more complex, multifactorial cases of NDD and explain how this improved understanding aids diagnosis and management of these disorders. Comprehensive assessment of the DNM landscape across the genome using WGS and other technologies will lead to the development of novel functional and bioinformatics approaches to interpret DNMs and drive new insights into NDD biology.

Journal Article
TL;DR: The progress and challenges in the detection of rare and common variants in DCM and systolic heart failure are reviewed, and the particular challenges in accurate and informed variant interpretation, and in understanding the effects of these variants are discussed.
Abstract: Heart failure is a major health burden, affecting 40 million people globally. One of the main causes of systolic heart failure is dilated cardiomyopathy (DCM), the leading global indication for heart transplantation. Our understanding of the genetic basis of both DCM and systolic heart failure has improved in recent years with the application of next-generation sequencing and genome-wide association studies (GWAS). This has enabled rapid sequencing at scale, leading to the discovery of many novel rare variants in DCM and of common variants in both systolic heart failure and DCM. Identifying rare and common genetic variants contributing to systolic heart failure has been challenging given its diverse and multiple etiologies. DCM, however, although rarer, is a reasonably specific and well-defined condition, leading to the identification of many rare genetic variants. Truncating variants in titin represent the single largest genetic cause of DCM. Here, we review the progress and challenges in the detection of rare and common variants in DCM and systolic heart failure, and the particular challenges in accurate and informed variant interpretation, and in understanding the effects of these variants. We also discuss how our increasing genetic knowledge is changing clinical management. Harnessing genetic data and translating it to improve risk stratification and the development of novel therapeutics represents a major challenge and unmet critical need for patients with heart failure and their families.

Journal ArticleDOI
TL;DR: N6-methyladenosine (m6A) in mRNA has emerged as a crucial epitranscriptomic modification that controls cellular differentiation and pluripotency and is a new and promising therapeutic avenue for investigation.
Abstract: N6-methyladenosine (m6A) in mRNA has emerged as a crucial epitranscriptomic modification that controls cellular differentiation and pluripotency. Recent studies are pointing to a role for the RNA methylation program in cancer self-renewal and cell fate, making this a new and promising therapeutic avenue for investigation.

Journal ArticleDOI
TL;DR: It is demonstrated that ethnicity and infant feeding practices independently influence the infant gut microbiome at 1 year, and that ethnic differences can be mapped to alpha diversity as well as a higher abundance of lactic acid bacteria in South Asians and a higher abundance of genera within the order Clostridiales in white Caucasians.
Abstract: The infant gut is rapidly colonized by microorganisms soon after birth, and the composition of the microbiota is dynamic in the first year of life. Although a stable microbiome may not be established until 1 to 3 years after birth, the infant gut microbiota appears to be an important predictor of health outcomes in later life. We obtained stool at one year of age from 173 white Caucasian and 182 South Asian infants from two Canadian birth cohorts to gain insight into how maternal and early infancy exposures influence the development of the gut microbiota. We investigated whether the infant gut microbiota differed by ethnicity (referring to groups of people who have certain racial, cultural, religious, or other traits in common) and by breastfeeding status, while accounting for variations in maternal and infant exposures (such as maternal antibiotic use, gestational diabetes, vegetarianism, infant milk diet, time of introduction of solid food, infant birth weight, and weight gain in the first year). We demonstrate that ethnicity and infant feeding practices independently influence the infant gut microbiome at 1 year, and that ethnic differences can be mapped to alpha diversity as well as a higher abundance of lactic acid bacteria in South Asians and a higher abundance of genera within the order Clostridiales in white Caucasians. The infant gut microbiome is influenced by ethnicity and breastfeeding in the first year of life. Ethnic differences in the gut microbiome may reflect maternal/infant dietary differences and whether these differences are associated with future cardiometabolic outcomes can only be determined after prospective follow-up.
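Alpha diversity summarizes how many taxa a single sample contains and how evenly reads are spread across them. The abstract does not say which alpha-diversity metric was used. As one common choice, the Shannon index H = -Σ p_i ln p_i can be computed from per-taxon read counts, as in this minimal Python sketch with made-up counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical genus-level read counts for two infant stool samples.
even_sample = [250, 250, 250, 250]   # four equally abundant genera
skewed_sample = [940, 20, 20, 20]    # dominated by a single genus
print(shannon_index(even_sample), shannon_index(skewed_sample))
```

The even sample scores higher because its reads are spread evenly across genera. Because H also depends on sequencing depth, samples are often rarefied to a common depth before comparison.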

Journal ArticleDOI
TL;DR: In this paper, the authors performed an in-depth computational analysis of the prevalence of functional variants in 806 drug-related genes, including 628 known drug targets, and found that even though many variants are very rare and thus likely not observed in clinical trials, four in five patients are likely to carry a variant with possibly functional effects in a target for commonly prescribed drugs.
Abstract: Variability in drug efficacy and adverse effects is observed in clinical practice. While the extent of genetic variability in classic pharmacokinetic genes is rather well understood, the role of genetic variation in drug targets is typically less studied. Based on 60,706 human exomes from the ExAC dataset, we performed an in-depth computational analysis of the prevalence of functional variants in 806 drug-related genes, including 628 known drug targets. We further computed the likelihood of 1236 FDA-approved drugs to be affected by functional variants in their targets in the whole ExAC population as well as different geographic sub-populations. We find that most genetic variants in drug-related genes are very rare (f < 0.1%) and thus will likely not be observed in clinical trials. Furthermore, we show that patient risk varies for many drugs and with respect to geographic ancestry. A focused analysis of oncological drug targets indicates that the probability of a patient carrying germline variants in oncological drug targets is, at 44%, high enough to suggest that not only somatic alterations but also germline variants carried over into the tumor genome could affect the response to antineoplastic agents. This study indicates that even though many variants are very rare and thus likely not observed in clinical trials, four in five patients are likely to carry a variant with possibly functional effects in a target for commonly prescribed drugs. Such variants could potentially alter drug efficacy.
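Two findings can hold at once: most variants are very rare, yet four in five patients carry one. This works because many individually rare variants add up across targets. The sketch below is a simplified model, not the study's computation. It assumes Hardy-Weinberg equilibrium and independence between variants, and the allele frequencies are hypothetical.

```python
import math


def prob_carries_any(allele_freqs):
    """Probability that a person carries at least one alternate allele.

    Assumes Hardy-Weinberg equilibrium (P(no copy of a variant) = (1 - f)^2)
    and independence between variants (no linkage disequilibrium).
    """
    log_p_none = 0.0
    for f in allele_freqs:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"invalid allele frequency: {f}")
        if f == 1.0:
            return 1.0
        log_p_none += 2.0 * math.log1p(-f)  # log of (1 - f)^2, stable for small f
    return 1.0 - math.exp(log_p_none)


RARE = 0.001  # f < 0.1%, the "very rare" threshold quoted in the abstract

# Hypothetical: 1,000 functional variants across a drug's targets,
# each with allele frequency 0.05%, i.e. individually very rare.
freqs = [0.0005] * 1000
assert all(f < RARE for f in freqs)
print(round(prob_carries_any(freqs), 3))
```

Each variant alone is carried by roughly 0.1% of people, but together they are carried by a clear majority. Relaxing the independence assumption would require haplotype-level data.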

Journal ArticleDOI
TL;DR: The ability of NGM technology to detect pathogenic structural variants otherwise missed by PCR-based techniques or chromosomal microarrays is shown, indicating that NGM is poised to become a new tool in the clinical genetic diagnostic strategy and research due to its ability to sensitively identify large genomic variations.
Abstract: Massively parallel DNA sequencing, such as exome sequencing, has become a routine clinical procedure to identify pathogenic variants responsible for a patient’s phenotype. Exome sequencing has the capability of reliably identifying inherited and de novo single-nucleotide variants, small insertions, and deletions. However, due to the use of 100–300-bp fragment reads, this platform is not well powered to sensitively identify moderate to large structural variants (SV), such as insertions, deletions, inversions, and translocations. To overcome these limitations, we used next-generation mapping (NGM) to image high molecular weight double-stranded DNA molecules (megabase size) with fluorescent tags in nanochannel arrays for de novo genome assembly. We investigated the capacity of this NGM platform to identify pathogenic SV in a series of patients diagnosed with Duchenne muscular dystrophy (DMD) caused by large deletions, an insertion, and an inversion involving the DMD gene. We identified deletion, duplication, and inversion breakpoints within DMD. The sizes of deletions were in the range of 45–250 kbp, whereas the one identified insertion was approximately 13 kbp in size. This method refined the location of the breakpoints within introns for cases with deletions compared to current polymerase chain reaction (PCR)-based clinical techniques. Heterozygous SV were detected in the known carrier mothers of the DMD patients, demonstrating the ability of the method to ascertain carrier status for large SV. The method was also able to identify a 5.1-Mbp inversion involving the DMD gene, previously identified by RNA sequencing. We showed the ability of NGM technology to detect pathogenic structural variants otherwise missed by PCR-based techniques or chromosomal microarrays. NGM is poised to become a new tool in the clinical genetic diagnostic strategy and research due to its ability to sensitively identify large genomic variations.
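Optical mapping detects large SVs by comparing the spacing of fluorescent labels on sample molecules with the spacing expected from the reference. An interval shorter than expected suggests a deletion, and a longer one suggests an insertion. The Python sketch below illustrates only that core comparison. It assumes labels have already been aligned one-to-one, whereas real NGM pipelines must also handle missing and extra labels. The positions and the 1 kbp threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntervalCall:
    start_label: int   # index of the left flanking label
    ref_size: int      # bp between the flanking labels in the reference
    sample_size: int   # bp between the same labels on the sample map
    kind: str          # "deletion" or "insertion"


def call_interval_svs(ref_pos, sample_pos, min_diff=1_000):
    """Flag label intervals whose size differs from the reference by >= min_diff bp.

    Assumes ref_pos[i] and sample_pos[i] are the same label (one-to-one alignment).
    """
    if len(ref_pos) != len(sample_pos):
        raise ValueError("label lists must be aligned one-to-one")
    calls = []
    for i in range(len(ref_pos) - 1):
        ref_size = ref_pos[i + 1] - ref_pos[i]
        sample_size = sample_pos[i + 1] - sample_pos[i]
        diff = sample_size - ref_size
        if diff <= -min_diff:
            calls.append(IntervalCall(i, ref_size, sample_size, "deletion"))
        elif diff >= min_diff:
            calls.append(IntervalCall(i, ref_size, sample_size, "insertion"))
    return calls


# Hypothetical label positions (bp): the second interval is 50 kbp shorter
# on the sample map, consistent with a 50-kbp deletion between labels 1 and 2.
reference = [0, 50_000, 120_000, 200_000]
sample = [0, 50_000, 70_000, 150_000]
for call in call_interval_svs(reference, sample):
    print(call)
```

Because the call is bounded by the flanking labels, breakpoint resolution depends on label density. This is why the method refines, rather than exactly pinpoints, intronic breakpoints.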

Journal ArticleDOI
TL;DR: Matched pairs analysis of individual tumor-normal pairs revealed significant differences in relative abundance of specific taxa, namely in the genus Actinomyces, more pronounced among patients with higher T-stage.
Abstract: While the role of the gut microbiome in inflammation and colorectal cancers has received much recent attention, there are few data to support an association between the oral microbiome and head and neck squamous cell carcinomas. Prior investigations have been limited to comparisons of microbiota obtained from surface swabs of the oral cavity. This study aims to identify microbiomic differences in paired tumor and non-tumor tissue samples in a large group of 121 patients with head and neck squamous cell carcinomas and correlate these differences with clinical-pathologic features. Total DNA was extracted from paired normal and tumor resection specimens from 169 patients; 242 samples from 121 patients were included in the final analysis. Microbiomic content of each sample was determined using 16S rDNA amplicon sequencing. Bioinformatic analysis was performed using QIIME algorithms. F-testing on cluster strength, Wilcoxon signed-rank testing on differential relative abundances of paired tumor-normal samples, and Wilcoxon rank-sum testing on the association of T-stage with relative abundances were conducted in R. We observed no significant difference in measures of alpha diversity between tumor and normal tissue (Shannon index: p = 0.13, phylogenetic diversity: p = 0.42). Similarly, although we observed statistically significant differences in both weighted (p = 0.01) and unweighted (p = 0.04) UniFrac distances between tissue types, the tumor/normal grouping explained only a small proportion of the overall variation in the samples (weighted R² = 0.01, unweighted R² < 0.01). Notably, however, when comparing the relative abundances of individual taxa between matched pairs of tumor and normal tissue, we observed that Actinomyces and its parent taxa up to the phylum level were significantly depleted in tumor relative to normal tissue (q < 0.01), while Parvimonas was increased in tumor relative to normal tissue (q = 0.01).
These differences were more pronounced among patients with more extensive disease as measured by higher T-stage. Matched-pairs analysis of individual tumor-normal pairs thus revealed significant differences in the relative abundance of specific taxa, most notably the genus Actinomyces. Our observations suggest further experiments to interrogate potential novel mechanisms relevant to carcinogenesis associated with alterations of the oral microbiome that may have consequences for the human host.
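The paired tumor-versus-normal comparison above relies on the Wilcoxon signed-rank test, which the authors ran in R. To show what the test computes, here is a minimal Python sketch using the large-sample normal approximation, without a tie correction to the variance. In practice, R's wilcox.test with paired = TRUE or scipy.stats.wilcoxon would be used. The abundances are made up.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation, two-sided).

    Returns (W_plus, z, p). Zero differences are dropped; tied absolute
    differences receive average ranks, but the variance is not tie-corrected.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return w_plus, z, p


# Hypothetical relative abundance of one genus in five normal/tumor pairs.
normal = [0.10, 0.20, 0.30, 0.40, 0.50]
tumor = [0.05, 0.10, 0.15, 0.20, 0.25]
print(wilcoxon_signed_rank(normal, tumor))
```

With only five pairs, an exact test is preferable to the normal approximation. The sketch is meant to show the ranking logic, not to replace a library implementation.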

Journal ArticleDOI
TL;DR: These data quantify the longitudinal variability of the oral and gut microbiota in AML patients, show that increased variability was correlated with adverse clinical outcomes, and offer the possibility of using stabilizing taxa as a method of focused microbiome repletion.
Abstract: Understanding longitudinal variability of the microbiome in ill patients is critical to moving microbiome-based measurements and therapeutics into clinical practice. However, the vast majority of data regarding microbiome stability are derived from healthy subjects. Herein, we sought to determine intra-patient temporal microbiota variability, the factors driving such variability, and its clinical impact in an extensive longitudinal cohort of hospitalized cancer patients during chemotherapy. Stool (n = 365) and oral (n = 483) samples were collected from 59 patients with acute myeloid leukemia (AML) undergoing induction chemotherapy (IC), from initiation of chemotherapy until neutrophil recovery. Microbiome characterization was performed by 16S rRNA gene sequencing. Temporal variability was determined using coefficients of variation (CV) of the Shannon diversity index (SDI) and unweighted and weighted UniFrac distances per patient, per site. Measurements of intra-patient temporal variability and patient stability categories were analyzed for their correlations with genera abundances. Groups of patients were analyzed to determine if patients with adverse outcomes had significantly different levels of microbiome temporal variability. Potential clinical drivers of microbiome temporal instability were determined using multivariable regression analyses. Our cohort evidenced a high degree of intra-patient temporal instability of stool and oral microbial diversity based on SDI CV. We identified statistically significant differences in the relative abundance of multiple taxa amongst individuals with different levels of microbiota temporal stability. Increased intra-patient temporal variability of the oral SDI was correlated with increased risk of infection during IC (P = 0.02), and higher stool SDI CVs were correlated with increased risk of infection 90 days post-IC (P = 0.04).
Total days on antibiotics was significantly associated with increased temporal variability of both oral microbial diversity (P = 0.03) and community structure (P = 0.002). These data quantify the longitudinal variability of the oral and gut microbiota in AML patients, show that increased variability was correlated with adverse clinical outcomes, and offer the possibility of using stabilizing taxa as a method of focused microbiome repletion. Furthermore, these results support the importance of longitudinal microbiome sampling and analyses, rather than one-time measurements, in research and future clinical practice.
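The temporal-variability measure described above is the per-patient coefficient of variation (CV) of the Shannon diversity index. It is simple to compute once each sample's taxon counts are available. The Python sketch below assumes a sample standard deviation and natural-log Shannon diversity, neither of which the abstract specifies, and it uses made-up counts.

```python
import math
import statistics


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over nonzero taxa."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def shannon_cv(timepoints):
    """Coefficient of variation of Shannon diversity across one patient's samples.

    timepoints: per-sample taxon count vectors from one body site, ordered in
    time. Uses the sample standard deviation (n - 1 denominator).
    """
    values = [shannon_index(c) for c in timepoints]
    if len(values) < 2:
        raise ValueError("need at least two timepoints")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean diversity is zero; CV undefined")
    return statistics.stdev(values) / mean


# Hypothetical stool samples from one patient on three days of chemotherapy:
# diversity collapses as one genus comes to dominate.
stool_series = [[300, 250, 200, 250], [600, 200, 100, 100], [950, 30, 10, 10]]
print(round(shannon_cv(stool_series), 3))
```

A patient whose diversity stays flat has a CV near zero. A collapse like the one above yields a large CV, which is the kind of instability the study linked to infection risk.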

Journal ArticleDOI
TL;DR: The approach provides a framework for examining molecular signatures of disease in fibrosis and autoimmune diseases and for leveraging publicly available data to understand common and tissue-specific disease processes in complex human diseases.
Abstract: Systemic sclerosis (SSc) is a multi-organ autoimmune disease characterized by skin fibrosis. Internal organ involvement is heterogeneous. It is unknown whether disease mechanisms are common across all affected tissues or if each manifestation has a distinct underlying pathology. We used consensus clustering to compare gene expression profiles of biopsies from four SSc-affected tissues (skin, lung, esophagus, and peripheral blood) from patients with SSc, and the related conditions pulmonary fibrosis (PF) and pulmonary arterial hypertension, and derived a consensus disease-associated signature across all tissues. We used this signature to query tissue-specific functional genomic networks. We performed novel network analyses to contrast the skin and lung microenvironments and to assess the functional role of the inflammatory and fibrotic genes in each organ. Lastly, we tested the expression of macrophage activation state-associated gene sets for enrichment in skin and lung using a Wilcoxon rank sum test. We identified a common pathogenic gene expression signature—an immune–fibrotic axis—indicative of pro-fibrotic macrophages (MOs) in multiple tissues (skin, lung, esophagus, and peripheral blood mononuclear cells) affected by SSc. While the co-expression of these genes is common to all tissues, the functional consequences of this upregulation differ by organ. We used this disease-associated signature to query tissue-specific functional genomic networks to identify common and tissue-specific pathologies of SSc and related conditions. In contrast to skin, in the lung-specific functional network we identify a distinct lung-resident MO signature associated with lipid stimulation and alternative activation. In keeping with our network results, we find distinct MO alternative activation transcriptional programs in SSc-associated PF lung and in the skin of patients with an “inflammatory” SSc gene expression signature.
Our results suggest that the innate immune system is central to SSc disease processes but that subtle distinctions exist between tissues. Our approach provides a framework for examining molecular signatures of disease in fibrosis and autoimmune diseases and for leveraging publicly available data to understand common and tissue-specific disease processes in complex human diseases.
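The gene-set enrichment step above uses a Wilcoxon rank-sum test. One simple form of such a test asks whether genes in a set have higher scores, for example expression or network connectivity, than background genes. The Python sketch below illustrates that form under stated assumptions: it uses the normal approximation without a tie correction, and the scores are hypothetical.

```python
import math


def rank_sum_test(in_set, background):
    """Wilcoxon rank-sum (Mann-Whitney U) test that in_set scores exceed background.

    Returns (U, z, one-sided p) using the normal approximation; tied values get
    average ranks, but the variance is not tie-corrected.
    """
    n1, n2 = len(in_set), len(background)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, True) for v in in_set] + [(v, False) for v in background],
                    key=lambda t: t[0])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank for the tie block
        i = j + 1
    r1 = sum(r for r, (_, in_gene_set) in zip(ranks, pooled) if in_gene_set)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p_greater = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return u1, z, p_greater


# Hypothetical scores: genes in a macrophage activation set vs. other genes.
gene_set_scores = [5.1, 6.3, 7.0]
background_scores = [1.2, 2.4, 3.3, 4.8]
print(rank_sum_test(gene_set_scores, background_scores))
```

Rank-based tests are a natural fit here because expression and network scores are rarely normally distributed, and only the ordering of genes matters.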

Journal ArticleDOI
TL;DR: Using the work of the Australian Pancreatic Cancer Genome Initiative, barriers and opportunities associated with a comprehensive process of RoR in large-scale genomic research are discussed that may be useful for others developing their own policies.
Abstract: The return of research results (RoR) remains a complex and well-debated issue. Despite the debate, actual data related to the experience of giving individual results back, and the impact these results may have on clinical care and health outcomes, are sorely lacking. Through the work of the Australian Pancreatic Cancer Genome Initiative (APGI) we: (1) delineate the pathway back to the patient where actionable research data were identified; and (2) report the clinical utilisation of individual results returned. Using this experience, we discuss barriers and opportunities associated with a comprehensive process of RoR in large-scale genomic research that may be useful for others developing their own policies. We performed whole-genome (n = 184) and exome (n = 208) sequencing of matched tumour-normal DNA pairs from 392 patients with sporadic pancreatic cancer (PC) as part of the APGI. We identified pathogenic germline mutations in candidate genes (n = 130) with established predisposition to PC or medium–high penetrance genes with well-defined cancer associated syndromes or phenotypes. Variants from candidate genes were annotated and classified according to international guidelines. Variants were considered actionable if clinical utility was established, with regard to prevention, diagnosis, prognostication and/or therapy. A total of 48,904 germline variants were identified, with 2356 unique variants undergoing annotation and in silico classification. Twenty cases were deemed actionable and were returned via a previously described RoR framework, representing an actionable finding rate of 5.1%. Overall, 1.78% of our cohort experienced clinical benefit from RoR. Returning research results within the context of large-scale genomics research is a labour-intensive, highly variable, complex operation.
Results that warrant action are not infrequent, but the prevalence of those who experience a clinical difference as a result of returning individual results is currently low.
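The actionable finding rate quoted above follows directly from the cohort size, since the whole-genome and exome arms together cover all 392 patients:

```latex
\[
n = 184 + 208 = 392, \qquad
\frac{20}{392} \approx 0.0510 = 5.1\%.
\]
```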

Journal ArticleDOI
TL;DR: A pipeline used in ASD studies is extended and applied to infer rare genetic parameters for SCZ and four NDDs and finds many new DD risk genes, supported by gene set enrichment and PPI network connectivity analyses.
Abstract: Integrating rare variation from trio family and case–control studies has successfully implicated specific genes contributing to risk of neurodevelopmental disorders (NDDs) including autism spectrum disorders (ASD), intellectual disability (ID), developmental disorders (DDs), and epilepsy (EPI). For schizophrenia (SCZ), however, while sets of genes have been implicated through the study of rare variation, only two risk genes have been identified. We used hierarchical Bayesian modeling of rare-variant genetic architecture to estimate mean effect sizes and risk-gene proportions, analyzing the largest available collection of whole exome sequence data for SCZ (1,077 trios, 6,699 cases, and 13,028 controls), and data for four NDDs (ASD, ID, DD, and EPI; total 10,792 trios, and 4,058 cases and controls). For SCZ, we estimate there are 1,551 risk genes. Compared with the NDDs, SCZ has more risk genes, and they have weaker effects. We provide power analyses to predict the number of risk-gene discoveries as more data become available. We confirm and augment prior risk gene and gene set enrichment results for SCZ and NDDs. In particular, we detected 98 new DD risk genes at FDR < 0.05. Correlations of risk-gene posterior probabilities are high across the four NDDs (ρ > 0.55), but low between SCZ and the NDDs (ρ < 0.3). An in-depth analysis of 288 NDD genes shows there is highly significant protein–protein interaction (PPI) network connectivity, and functionally distinct PPI subnetworks based on pathway enrichment, single-cell RNA-seq cell types, and multi-region developmental brain RNA-seq. We have extended a pipeline used in ASD studies and applied it to infer rare genetic parameters for SCZ and four NDDs (https://github.com/hoangtn/extTADA). We find many new DD risk genes, supported by gene set enrichment and PPI network connectivity analyses. We find greater similarity among NDDs than between NDDs and SCZ.
NDD gene subnetworks are implicated in postnatally expressed presynaptic and postsynaptic genes, and for transcriptional and post-transcriptional gene regulation in prenatal neural progenitor and stem cells.
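extTADA builds on the TADA idea of scoring each gene with a Bayes factor that compares a risk-gene model with a null model. The Bayes factors are then converted into posterior probabilities and a Bayesian FDR. The Python sketch below shows that chain for de novo counts only, in a deliberately simplified form. Here the relative risk gamma and the risk-gene proportion pi are fixed hypothetical values, whereas extTADA estimates such parameters from the data by hierarchical Bayesian modeling and also uses case-control counts.

```python
import math


def de_novo_bayes_factor(x, mu, n_trios, gamma):
    """BF for x de novo mutations: Poisson(2*N*mu*gamma) vs Poisson(2*N*mu).

    Simplification: gamma (relative risk) is fixed rather than given a prior.
    """
    lam0 = 2 * n_trios * mu
    # Ratio of the two Poisson likelihoods; the x! terms cancel.
    return math.exp(-lam0 * (gamma - 1.0)) * gamma ** x


def posterior_prob(bf, pi):
    """Posterior probability of being a risk gene given prior proportion pi."""
    return pi * bf / (pi * bf + 1.0 - pi)


def bayesian_fdr_discoveries(pps, alpha=0.05):
    """Largest top-ranked gene set whose average (1 - PP) stays at or below alpha."""
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    selected, cum_false = [], 0.0
    for rank, i in enumerate(order, start=1):
        cum_false += 1.0 - pps[i]
        if cum_false / rank > alpha:
            break  # PPs are sorted, so the running mean can only grow from here
        selected.append(i)
    return selected


# Hypothetical inputs: 1,000 trios, per-gene mutation rate 1e-5,
# relative risk 20, and 5% of genes assumed to be risk genes.
counts = {"GENE_A": 3, "GENE_B": 0, "GENE_C": 1}
pps = [posterior_prob(de_novo_bayes_factor(x, 1e-5, 1000, 20.0), 0.05)
       for x in counts.values()]
names = list(counts)
print([names[i] for i in bayesian_fdr_discoveries(pps)])
```

Fixing gamma and pi makes the logic easy to follow. Estimating them jointly across genes is what lets extTADA infer quantities such as the number of risk genes.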

Journal ArticleDOI
TL;DR: Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis.
Abstract: Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app
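Two of the modules listed above, gene-expression normalization and gene filtering, can be illustrated in a few lines. The sketch below uses log2 counts-per-million normalization and a minimum-cells filter as generic examples. These are common choices but not necessarily the methods Granatum implements, and the matrix, gene names, and threshold are hypothetical.

```python
import math


def log_cpm(counts):
    """log2(CPM + 1) for a genes x cells matrix of raw counts (lists of lists)."""
    n_genes = len(counts)
    n_cells = len(counts[0]) if n_genes else 0
    totals = [sum(counts[g][c] for g in range(n_genes)) for c in range(n_cells)]
    if any(t == 0 for t in totals):
        raise ValueError("a cell has zero total counts")
    return [
        [math.log2(counts[g][c] / totals[c] * 1e6 + 1) for c in range(n_cells)]
        for g in range(n_genes)
    ]


def filter_genes(matrix, gene_names, min_cells=3):
    """Keep genes with a nonzero value in at least min_cells cells."""
    keep = [g for g, row in enumerate(matrix) if sum(v > 0 for v in row) >= min_cells]
    return [matrix[g] for g in keep], [gene_names[g] for g in keep]


raw = [[1, 0, 0], [1, 2, 5], [0, 3, 1]]  # hypothetical: 3 genes x 3 cells
normalized = log_cpm(raw)                # library sizes use all genes
kept, names = filter_genes(normalized, ["GeneA", "GeneB", "GeneC"], min_cells=2)
print(names)
```

Normalizing before filtering keeps each cell's library size based on all of its reads. The log transform with a pseudocount of 1 maps zero counts to exactly zero.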

Journal ArticleDOI
TL;DR: This review focuses on how recent technological advances in 3D genomics are leading to an enhanced understanding of disease mechanisms and the use of genome-wide chromatin conformation capture coupled with oligonucleotide capture technology to map interactions between gene promoters and distal regulatory elements.
Abstract: Genome compaction is a universal feature of cells and has emerged as a global regulator of gene expression. Compaction is maintained by a multitude of architectural proteins, long non-coding RNAs (lncRNAs), and regulatory DNA. Each component comprises interlinked regulatory circuits that organize the genome in three-dimensional (3D) space to manage gene expression. In this review, we update the current state of 3D genome catalogues and focus on how recent technological advances in 3D genomics are leading to an enhanced understanding of disease mechanisms. We highlight the use of genome-wide chromatin conformation capture (Hi-C) coupled with oligonucleotide capture technology (capture Hi-C) to map interactions between gene promoters and distal regulatory elements such as enhancers that are enriched for disease variants from genome-wide association studies (GWASs). We discuss how aberrations in architectural units are associated with various pathological outcomes, and explore how recent advances in genome and epigenome editing show great promise for a systematic understanding of complex genetic disorders. Our growing understanding of 3D genome architecture—coupled with the ability to engineer changes in it—may create novel therapeutic opportunities.
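In Hi-C data, contact frequency falls off steeply with genomic distance. For that reason, a promoter–enhancer contact is usually judged against the contact level expected at that separation. A simple observed/expected normalization captures the idea, as in the Python sketch below on a toy matrix of binned counts. Dedicated capture Hi-C callers use more elaborate background models, and the matrix here is hypothetical.

```python
def observed_over_expected(contacts):
    """Distance-normalize a symmetric intrachromosomal contact matrix.

    The expected count at separation d (in bins) is the mean of the d-th diagonal.
    """
    n = len(contacts)
    expected = []
    for d in range(n):
        diagonal = [contacts[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [
        [contacts[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3-bin matrix: bins 0 and 1 contact each other more often than
# typical one-bin-apart pairs, e.g. a promoter and a nearby enhancer.
toy = [[10, 4, 1],
       [4, 10, 2],
       [1, 2, 10]]
for row in observed_over_expected(toy):
    print([round(v, 2) for v in row])
```

Values above 1 mark contacts that are more frequent than distance alone would predict. Such contacts are the starting point for nominating promoter-interacting regions that overlap GWAS variants.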

Journal ArticleDOI
TL;DR: It is concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the “unsequenceable” genomic trinucleotide repeat disorders.
Abstract: Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulation data, real amplicon sequencing data on two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the "unsequenceable" genomic trinucleotide repeat disorders.
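A naive way to count trinucleotide repeats in a read is to find the longest uninterrupted run of the motif, as in the Python sketch below. The motif and reads are hypothetical. The sketch also exposes the weakness that motivates a probabilistic model such as RepeatHMM. Long reads carry frequent insertion and deletion errors, and a single error splits an exact run, so the naive count can be badly underestimated.

```python
import re


def longest_repeat_run(read, motif="CAG"):
    """Length, in motif units, of the longest exact tandem run of motif in read."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    best = 0
    for match in pattern.finditer(read.upper()):
        best = max(best, len(match.group(0)) // len(motif))
    return best


clean_read = "TT" + "CAG" * 5 + "AA"   # hypothetical: 5 uninterrupted CAG units
noisy_read = "TTCAGCAGCACAGCAGAA"      # same read with one G deleted in the third unit
print(longest_repeat_run(clean_read), longest_repeat_run(noisy_read))
```

One deleted base cuts the estimate from 5 units to 2. A hidden Markov model instead scores each read against repeat states that tolerate such errors.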

Journal ArticleDOI
TL;DR: The results suggest that RAB10 could be a promising therapeutic target for AD prevention and can be expanded and adapted to other phenotypes, thus serving as a model for future efforts to identify rare variants for AD and other complex human diseases.
Abstract: While age and the APOE e4 allele are major risk factors for Alzheimer’s disease (AD), a small percentage of individuals with these risk factors exhibit AD resilience by living well beyond 75 years of age without any clinical symptoms of cognitive decline. We used over 200 “AD resilient” individuals and an innovative, pedigree-based approach to identify genetic variants that segregate with AD resilience. First, we performed linkage analyses in pedigrees with resilient individuals and a statistical excess of AD deaths. Second, we used whole genome sequences to identify candidate SNPs in significant linkage regions. Third, we replicated SNPs from the linkage peaks that reduced risk for AD in an independent dataset and in a gene-based test. Finally, we experimentally characterized replicated SNPs. The variant rs142787485 in RAB10 confers significant protection against AD (p value = 0.0184, odds ratio = 0.5853). Moreover, we replicated this association in an independent series of unrelated individuals (p value = 0.028, odds ratio = 0.69) and used a gene-based test to confirm a role for RAB10 variants in modifying AD risk (p value = 0.002). Experimentally, we demonstrated that knockdown of RAB10 resulted in a significant decrease in Aβ42 (p value = 0.0003) and in the Aβ42/Aβ40 ratio (p value = 0.0001) in neuroblastoma cells. We also found that RAB10 expression is significantly elevated in human AD brains (p value = 0.04). Our results suggest that RAB10 could be a promising therapeutic target for AD prevention. In addition, our gene discovery approach can be expanded and adapted to other phenotypes, thus serving as a model for future efforts to identify rare variants for AD and other complex human diseases.
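Odds ratios such as those reported above summarize a 2×2 comparison of allele carriers in cases versus controls. The Python sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval from hypothetical counts. The study itself used pedigree-based linkage, an independent replication series, and a gene-based test, so this sketch illustrates only the statistic, not the study's analysis.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    2x2 layout (hypothetical): a = variant carriers among cases,
    b = non-carriers among cases, c = carriers among controls,
    d = non-carriers among controls. z=1.96 gives an approximate 95% CI.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers are rarer among AD cases than among controls,
# so the odds ratio is below 1 (a protective association).
print(odds_ratio_ci(20, 980, 40, 960))
```

An odds ratio below 1 whose confidence interval also excludes 1 is what "protective" means in this setting.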