
Showing papers by "Carlos Bustamante" published in 2018


Journal Article
Sebastian M. Waszak, Paul A. Northcott1, Paul A. Northcott2, Ivo Buchhalter2, Giles W. Robinson1, Christian Sutter3, Susanne N. Groebner2, Kerstin Grund3, Laurence Brugières4, David T.W. Jones2, Kristian W. Pajtler2, Kristian W. Pajtler5, A. Sorana Morrissy6, Marcel Kool2, Dominik Sturm5, Dominik Sturm2, Lukas Chavez2, Aurélie Ernst2, Sebastian Brabetz5, Sebastian Brabetz2, Michael Hain2, Thomas Zichner, Maia Segura-Wang, Joachim Weischenfeldt7, Tobias Rausch, Balca R. Mardin, Xin Zhou1, Cristina Baciu8, Christian Lawerenz2, Jennifer A. Chan6, Pascale Varlet, Léa Guerrini-Rousseau4, Daniel W. Fults9, Wiesława Grajkowska, Peter Hauser10, Nada Jabado11, Young Shin Ra12, Karel Zitterbart13, Suyash Shringarpure14, Francisco M. De La Vega14, Carlos Bustamante14, Ho Keung Ng15, Arie Perry16, Tobey J. MacDonald17, Pablo Hernáiz Driever18, Anne Bendel19, Daniel C. Bowers20, Geoffrey McCowage21, Murali Chintagumpala22, Richard J. Cohn22, Tim Hassall22, Gudrun Fleischhack23, Tone Eggen, Finn Wesenberg24, Finn Wesenberg25, Maria Feychting26, Birgitta Lannering27, Joachim Schüz28, Christoffer Johansen7, Tina Veje Andersen, Martin Röösli29, Claudia E. Kuehni30, Michael A. Grotzer31, Kristina Kjaerheim, Camelia M. Monoranu32, Tenley C. Archer22, Tenley C. Archer33, Elizabeth S. Duke22, Scott L. Pomeroy22, Scott L. Pomeroy33, Redmond Shelagh30, Stephan Frank34, David Sumerauer35, Wolfram Scheurlen, Marina Ryzhova, Till Milde5, Till Milde2, Christian P. Kratz36, David Samuel22, Jinghui Zhang1, David A. Solomon16, Marco A. Marra37, Roland Eils2, Claus R. Bartram3, Katja von Hoff38, Katja von Hoff18, Stefan Rutkowski38, Vijay Ramaswamy39, Richard J. Gilbertson40, Andrey Korshunov5, Andrey Korshunov2, Michael D. Taylor, Peter Lichter2, David Malkin39, Amar Gajjar1, Jan O. Korbel, Stefan M. Pfister5, Stefan M. Pfister2 
TL;DR: The prevalence of genetic predispositions differed between molecular subgroups in the retrospective cohort and was highest for patients in the MBSHH subgroup, and survival estimates differed significantly across patients with germline mutations in different medulloblastoma predisposition genes.
Abstract: Background: Medulloblastoma is associated with rare hereditary cancer predisposition syndromes; however, consensus medulloblastoma predisposition genes have not been defined and screening guidelines for genetic counselling and testing for paediatric patients are not available. We aimed to assess and define these genes to provide evidence for future screening guidelines. Methods: In this international, multicentre study, we analysed patients with medulloblastoma from retrospective cohorts (International Cancer Genome Consortium [ICGC] PedBrain, Medulloblastoma Advanced Genomics International Consortium [MAGIC], and the CEFALO series) and from prospective cohorts from four clinical studies (SJMB03, SJMB12, SJYC07, and I-HIT-MED). Whole-genome sequences and exome sequences from blood and tumour samples were analysed for rare damaging germline mutations in cancer predisposition genes. DNA methylation profiling was done to determine consensus molecular subgroups: WNT (MBWNT), SHH (MBSHH), group 3 (MBGroup3), and group 4 (MBGroup4). Medulloblastoma predisposition genes were predicted on the basis of rare variant burden tests against controls without a cancer diagnosis from the Exome Aggregation Consortium (ExAC). Previously defined somatic mutational signatures were used to further classify medulloblastoma genomes into two groups, a clock-like group (signatures 1 and 5) and a homologous recombination repair deficiency-like group (signatures 3 and 8), and chromothripsis was investigated using previously established criteria. Progression-free survival and overall survival were modelled for patients with a genetic predisposition to medulloblastoma. Findings: We included a total of 1022 patients with medulloblastoma from the retrospective cohorts (n=673) and the four prospective studies (n=349), from whom blood samples (n=1022) and tumour samples (n=800) were analysed for germline mutations in 110 cancer predisposition genes. In our rare variant burden analysis, we compared these against 53,105 sequenced controls from ExAC and identified APC, BRCA2, PALB2, PTCH1, SUFU, and TP53 as consensus medulloblastoma predisposition genes, and estimated that germline mutations accounted for 6% of medulloblastoma diagnoses in the retrospective cohort. The prevalence of genetic predispositions differed between molecular subgroups in the retrospective cohort and was highest for patients in the MBSHH subgroup (20% in the retrospective cohort). These estimates were replicated in the prospective clinical cohort (germline mutations accounted for 5% of medulloblastoma diagnoses, with the highest prevalence [14%] in the MBSHH subgroup). Patients with germline APC mutations developed MBWNT and accounted for most (five [71%] of seven) cases of MBWNT that had no somatic CTNNB1 exon 3 mutations. Patients with germline mutations in SUFU and PTCH1 mostly developed infant MBSHH. Germline TP53 mutations presented only in childhood patients in the MBSHH subgroup and explained more than half (eight [57%] of 14) of all chromothripsis events in this subgroup. Germline mutations in PALB2 and BRCA2 were observed across the MBSHH, MBGroup3, and MBGroup4 molecular subgroups and were associated with mutational signatures typical of homologous recombination repair deficiency. In patients with a genetic predisposition to medulloblastoma, 5-year progression-free survival was 52% (95% CI 40–69) and 5-year overall survival was 65% (95% CI 52–81); these survival estimates differed significantly across patients with germline mutations in different medulloblastoma predisposition genes. Interpretation: Genetic counselling and testing should be used as a standard-of-care procedure in patients with MBWNT and MBSHH because these patients have the highest prevalence of damaging germline mutations in known cancer predisposition genes. We propose criteria for routine genetic screening for patients with medulloblastoma based on clinical and molecular tumour characteristics. Funding: German Cancer Aid; German Federal Ministry of Education and Research; German Childhood Cancer Foundation (Deutsche Kinderkrebsstiftung); European Research Council; National Institutes of Health; Canadian Institutes for Health Research; German Cancer Research Center; St Jude Comprehensive Cancer Center; American Lebanese Syrian Associated Charities; Swiss National Science Foundation; European Molecular Biology Organization; Cancer Research UK; Hertie Foundation; Alexander and Margaret Stewart Trust; V Foundation for Cancer Research; Sontag Foundation; Musicians Against Childhood Cancer; BC Cancer Foundation; Swedish Council for Health, Working Life and Welfare; Swedish Research Council; Swedish Cancer Society; the Swedish Radiation Protection Authority; Danish Strategic Research Council; Swiss Federal Office of Public Health; Swiss Research Foundation on Mobile Communication; Masaryk University; Ministry of Health of the Czech Republic; Research Council of Norway; Genome Canada; Genome BC; Terry Fox Research Institute; Ontario Institute for Cancer Research; Pediatric Oncology Group of Ontario; The Family of Kathleen Lorette and the Clark H Smith Brain Tumour Centre; Montreal Children's Hospital Foundation; The Hospital for Sick Children: Sonia and Arthur Labatt Brain Tumour Research Centre, Chief of Research Fund, Cancer Genetics Program, Garron Family Cancer Centre, MDT's Garron Family Endowment; BC Childhood Cancer Parents Association; Cure Search Foundation; Pediatric Brain Tumor Foundation; Brainchild; and the Government of Ontario.
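The rare variant burden comparison in the abstract above can be illustrated with a short sketch. The abstract does not say which test statistic was used, so the one-sided Fisher's exact test here is an assumption, and the carrier counts are hypothetical; only the cohort sizes (1022 patients and 53,105 ExAC controls) come from the text. The function name `fisher_exact_greater` is also illustrative.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    Returns P(X >= a) under the hypergeometric null of no enrichment.
    """
    n_cases = a + b
    n_controls = c + d
    n_carriers = a + c
    total = comb(n_cases + n_controls, n_carriers)
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, x) * comb(n_controls, n_carriers - x)
        for x in range(a, upper + 1)
    )
    return tail / total


# Hypothetical carrier counts for one gene (NOT taken from the paper).
carriers_cases, carriers_controls = 20, 100
p_value = fisher_exact_greater(
    carriers_cases, 1022 - carriers_cases,
    carriers_controls, 53105 - carriers_controls,
)
print(f"one-sided p = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point overflow for cohort sizes of this order; a real analysis would also need multiple-testing correction across the genes tested.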

229 citations


Journal Article
21 Dec 2018-Science
TL;DR: Reconstituted ESCRT-III and Vps4 can harness ATP-dependent force production for membrane scission and this approach provides a window into the molecular mechanisms involved in the activities of ESCRTs.
Abstract: The endosomal sorting complexes required for transport (ESCRTs) catalyze reverse-topology scission from the inner face of membrane necks in HIV budding, multivesicular endosome biogenesis, cytokinesis, and other pathways. We encapsulated ESCRT-III subunits Snf7, Vps24, and Vps2 and the AAA+ ATPase (adenosine triphosphatase) Vps4 in giant vesicles from which membrane nanotubes reflecting the correct topology of scission could be pulled. Upon ATP release by photo-uncaging, this system generated forces within the nanotubes that led to membrane scission in a manner dependent upon Vps4 catalytic activity and Vps4 coupling to the ESCRT-III proteins. Imaging of scission revealed Snf7 and Vps4 puncta within nanotubes whose presence followed ATP release, correlated with force generation and nanotube constriction, and preceded scission. These observations directly verify long-standing predictions that ATP-hydrolyzing assemblies of ESCRT-III and Vps4 sever membranes.

133 citations


Journal Article
TL;DR: The gut microbiomes of Raute and Raji reveal an intermediate state between the Chepang and Tharu, indicating that divergence from a stereotypical foraging microbiome can occur within a single generation, and environmental factors such as drinking water source and solid cooking fuel are significantly associated with the gut microbiome.
Abstract: The composition of the gut microbiome in industrialized populations differs from those living traditional lifestyles. However, it has been difficult to separate the contributions of human genetic and geographic factors from lifestyle. Whether shifts away from the foraging lifestyle that characterize much of humanity’s past influence the gut microbiome, and to what degree, remains unclear. Here, we characterize the stool bacterial composition of four Himalayan populations to investigate how the gut community changes in response to shifts in traditional human lifestyles. These groups led seminomadic hunting–gathering lifestyles until transitioning to varying levels of agricultural dependence upon farming. The Tharu began farming 250–300 years ago, the Raute and Raji transitioned 30–40 years ago, and the Chepang retain many aspects of a foraging lifestyle. We assess the contributions of dietary and environmental factors on their gut-associated microbes and find that differences in the lifestyles of Himalayan foragers and farmers are strongly correlated with microbial community variation. Furthermore, the gut microbiomes of all four traditional Himalayan populations are distinct from that of the Americans, indicating that industrialization may further exacerbate differences in the gut community. The Chepang foragers harbor an elevated abundance of taxa associated with foragers around the world. Conversely, the gut microbiomes of the populations that have transitioned to farming are more similar to those of Americans, with agricultural dependence and several associated lifestyle and environmental factors correlating with the extent of microbiome divergence from the foraging population. The gut microbiomes of Raute and Raji reveal an intermediate state between the Chepang and Tharu, indicating that divergence from a stereotypical foraging microbiome can occur within a single generation. 
Our results also show that environmental factors such as drinking water source and solid cooking fuel are significantly associated with the gut microbiome. Despite the pronounced differences in gut bacterial composition across populations, we found little difference in alpha diversity across lifestyles. These findings in genetically similar populations living in the same geographical region establish the key role of lifestyle in determining human gut microbiome composition and point to the next challenging steps of determining how large-scale gut microbiome reconfiguration impacts human biology.

118 citations


Journal Article
TL;DR: It is shown that Early Neolithic Moroccans are similar to Later Stone Age individuals from the same region and possess an endemic element retained in present-day Maghrebi populations, confirming a long-term genetic continuity in the region.
Abstract: The extent to which prehistoric migrations of farmers influenced the genetic pool of western North Africans remains unclear. Archaeological evidence suggests that the Neolithization process may have happened through the adoption of innovations by local Epipaleolithic communities or by demic diffusion from the Eastern Mediterranean shores or Iberia. Here, we present an analysis of individuals’ genome sequences from Early and Late Neolithic sites in Morocco and from Early Neolithic individuals from southern Iberia. We show that Early Neolithic Moroccans (∼5,000 BCE) are similar to Later Stone Age individuals from the same region and possess an endemic element retained in present-day Maghrebi populations, confirming a long-term genetic continuity in the region. This scenario is consistent with Early Neolithic traditions in North Africa deriving from Epipaleolithic communities that adopted certain agricultural techniques from neighboring populations. Among Eurasian ancient populations, Early Neolithic Moroccans are distantly related to Levantine Natufian hunter-gatherers (∼9,000 BCE) and Pre-Pottery Neolithic farmers (∼6,500 BCE). Late Neolithic (∼3,000 BCE) Moroccans, in contrast, share an Iberian component, supporting theories of trans-Gibraltar gene flow and indicating that Neolithization of North Africa involved both the movement of ideas and people. Lastly, the southern Iberian Early Neolithic samples share the same genetic composition as the Cardial Mediterranean Neolithic culture that reached Iberia ∼5,500 BCE. The cultural and genetic similarities between Iberian and North African Neolithic traditions further reinforce the model of an Iberian migration into the Maghreb.

106 citations


Journal Article
TL;DR: The effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank is characterized and 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex are found.
Abstract: Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. Here, we characterize the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and find 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We perform phenome-wide analyses and directly measure the effect in homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated as being protective against disease or associated with at least one phenotype in our study. We find several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.

97 citations


Journal Article
TL;DR: Experiments show that Network Enhancement (NE) improves gene–function prediction by denoising tissue-specific interaction networks, alleviates interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species.
Abstract: Networks are ubiquitous in biology where they encode connectivity patterns at all scales of organization, from molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene–function prediction by denoising tissue-specific interaction networks, alleviates interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species. Our results indicate that NE is widely applicable for denoising biological networks. Technical noise in experiments is unavoidable, but it introduces inaccuracies into the biological networks we infer from the data. Here, the authors introduce a diffusion-based method for denoising undirected, weighted networks, and show that it improves the performances of downstream analyses.
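To make the term "doubly stochastic" in the abstract above concrete, here is a minimal sketch that rescales a small symmetric weighted network until every row and every column sums to 1. It uses Sinkhorn-style alternating normalization, which is a swapped-in textbook technique and not the NE operator or its closed-form solution; the function name `sinkhorn` and the example weights are hypothetical.

```python
def sinkhorn(weights, iterations=200):
    """Alternately rescale rows and columns of a strictly positive square matrix.

    For strictly positive input this converges to a doubly stochastic
    matrix (all row sums and all column sums equal to 1).
    """
    m = [row[:] for row in weights]
    n = len(m)
    for _ in range(iterations):
        # Scale each row to sum to 1.
        for i in range(n):
            s = sum(m[i])
            m[i] = [v / s for v in m[i]]
        # Scale each column to sum to 1.
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


# Hypothetical noisy, undirected, weighted network on three nodes.
noisy = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
ds = sinkhorn(noisy)
row_sums = [sum(row) for row in ds]
col_sums = [sum(ds[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

The sketch only shows the normalization property; the sparsity induction and eigengap increase described in the abstract are not reproduced here.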

92 citations


Journal Article
TL;DR: To avoid further inequities in health outcomes, the inclusion of diverse populations in research, unbiased genotyping, and methods of bias reduction in PRS are critical.
Abstract: A new study highlights the biases and inaccuracies of polygenic risk scores (PRS) when predicting disease risk in individuals from populations other than those used in their derivation. The design bias of workhorse tools used for research, particularly genotyping arrays, contributes to these distortions. To avoid further inequities in health outcomes, the inclusion of diverse populations in research, unbiased genotyping, and methods of bias reduction in PRS are critical.

91 citations


Journal Article
TL;DR: The requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories, and the need for a standardized REA data collection framework to be developed through partnerships and collaborations and adopted across clinical genomics.
Abstract: The Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group highlights the need to develop guidance on race, ethnicity, and ancestry (REA) data collection and use in clinical genomics. We present quantitative and qualitative evidence to characterize: (1) acquisition of REA data via clinical laboratory requisition forms, and (2) information disparity across populations in the Genome Aggregation Database (gnomAD) at clinically relevant sites ascertained from annotations in ClinVar. Our requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories. There was also striking disparity across REA populations in the amount of information available about clinically relevant variants in gnomAD. European ancestral populations constituted the majority of observations (55.8%), allele counts (59.7%), and private alleles (56.1%) in gnomAD at 550 loci with "pathogenic" and "likely pathogenic" expert-reviewed variants in ClinVar. Our findings highlight the importance of implementing and supporting programs to increase diversity in genome sequencing and clinical genomics, as well as measuring uncertainty around population-level datasets that are used in variant interpretation. Finally, we suggest the need for a standardized REA data collection framework to be developed through partnerships and collaborations and adopted across clinical genomics.

90 citations


Journal Article
TL;DR: It is demonstrated that the ancestors of the so-called “Taino” who inhabited large parts of the Caribbean in pre-Columbian times originated in northern South America, and there is evidence that they had a comparatively large effective population size.
Abstract: The Caribbean was one of the last parts of the Americas to be settled by humans, but how and when the islands were first occupied remains a matter of debate. Ancient DNA can help answer these questions, but the work has been hampered by poor DNA preservation. We report the genome sequence of a 1,000-year-old Lucayan Taino individual recovered from the site of Preacher’s Cave in the Bahamas. We sequenced her genome to 12.4-fold coverage and show that she is genetically most closely related to present-day Arawakan speakers from northern South America, suggesting that the ancestors of the Lucayans originated there. Further, we find no evidence for recent inbreeding or isolation in the ancient genome, suggesting that the Lucayans had a relatively large effective population size. Finally, we show that the Native American components in some present-day Caribbean genomes are closely related to the ancient Taino, demonstrating an element of continuity between precontact populations and present-day Latino populations in the Caribbean.

70 citations


Journal Article
16 Oct 2018
TL;DR: A genomic analysis of 200 cacao plants representing more than 10 genetically distinct populations identifies metabolic and disease resistance genes as contributing to the domestication of cacao and shows that domesticated populations maintain a high proportion of deleterious mutations.
Abstract: Domestication has had a strong impact on the development of modern societies. We sequenced 200 genomes of the chocolate plant Theobroma cacao L. to show for the first time to our knowledge that a single population, the Criollo population, underwent strong domestication ~3600 years ago (95% CI: 2481–13,806 years ago). We also show that during the process of domestication, there was strong selection for genes involved in the metabolism of the colored protectants anthocyanins and the stimulant theobromine, as well as disease resistance genes. Our analyses show that domesticated populations of T. cacao (Criollo) maintain a higher proportion of high-frequency deleterious mutations. We also show for the first time the negative consequences of the increased accumulation of deleterious mutations during domestication on the fitness of individuals (significant reduction in kilograms of beans per hectare per year as Criollo ancestry increases, as estimated from a GLM, P = 0.000425).

59 citations


Journal Article
TL;DR: Network Enhancement (NE) uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases the spectral eigengap of the input network.
Abstract: Networks are ubiquitous in biology where they encode connectivity patterns at all scales of organization, from molecular to the biome. However, biological networks are noisy due to the limitations of measurement technology and inherent natural variation, which can hamper discovery of network patterns and dynamics. We propose Network Enhancement (NE), a method for improving the signal-to-noise ratio of undirected, weighted networks. NE uses a doubly stochastic matrix operator that induces sparsity and provides a closed-form solution that increases spectral eigengap of the input network. As a result, NE removes weak edges, enhances real connections, and leads to better downstream performance. Experiments show that NE improves gene function prediction by denoising tissue-specific interaction networks, alleviates interpretation of noisy Hi-C contact maps from the human genome, and boosts fine-grained identification accuracy of species. Our results indicate that NE is widely applicable for denoising biological networks.

Journal Article
TL;DR: This study aimed to compare indications and short‐term outcomes of TaTME, open, laparoscopic, and robotic TME internationally.
Abstract: Introduction: Transanal total mesorectal excision (TaTME) has rapidly emerged as a novel approach for rectal cancer surgery. Safety profiles are still emerging and more comparative data is urgently needed. This study aimed to compare indications and short-term outcomes of TaTME, open, laparoscopic, and robotic TME internationally. Methods: A pre-planned analysis of the European Society of Coloproctology (ESCP) 2017 audit was performed. Patients undergoing elective total mesorectal excision (TME) for malignancy between 1 January 2017 and 15 March 2017 by any operative approach were included. The primary outcome measure was anastomotic leak. Results: Of 2579 included patients, 76.2% (1966/2579) underwent TME with restorative anastomosis of which 19.9% (312/1966) had a minimally invasive approach (laparoscopic or robotic) which included a transanal component (TaTME). Overall, 9.0% (175/1951, 15 missing outcome data) of patients suffered an anastomotic leak. On univariate analysis both laparoscopic TaTME (OR 1.61, 1.02–2.48, P = 0.04) and robotic TaTME (OR 3.05, 1.10–7.34, P = 0.02) were associated with a higher risk of anastomotic leak than non-transanal laparoscopic TME. However this association was lost in the mixed-effects model controlling for patient and disease factors (OR 1.23, 0.77–1.97, P = 0.39 and OR 2.11, 0.79–5.62, P = 0.14 respectively), whilst low rectal anastomosis (OR 2.72, 1.55–4.77, P < 0.001) and male gender (OR 2.29, 1.52–3.44, P < 0.001) remained strongly associated. The overall positive circumferential margin resection rate was 4.0%, which varied between operative approaches: laparoscopic 3.2%, transanal 3.8%, open 4.7%, robotic 1%. Conclusion: This contemporaneous international snapshot shows that uptake of the TaTME approach is widespread and is associated with surgically and pathologically acceptable results.

Posted Content
07 Feb 2018-bioRxiv
TL;DR: ESCRT-III and Vps4 were reconstituted from within the interior of nanotubes pulled from giant vesicles, revealing that this machinery couples ATP-dependent force production for membrane scission.
Abstract: The ESCRTs catalyze reverse-topology scission from the inner face of membrane necks in HIV budding, multivesicular endosome biogenesis, cytokinesis, and other pathways. We encapsulated a minimal ESCRT module consisting of ESCRT-III subunits Snf7, Vps24, and Vps2, and the AAA+ ATPase Vps4 such that membrane nanotubes reflecting the correct topology of scission could be pulled from giant vesicles. Upon ATP release by photo-uncaging, this system was capable of generating forces within the nanotubes in a manner dependent upon Vps4 catalytic activity, Vps4 coupling to the ESCRT-III proteins, and membrane insertion by Snf7. At physiological concentrations, single scission events were observed that correlated with forces of ~6 pN, verifying predictions that ESCRTs are capable of exerting forces on membranes. Imaging of scission with subsecond resolution revealed Snf7 puncta at the sites of membrane cutting, directly verifying longstanding predictions for the ESCRT scission mechanism.

Journal Article
TL;DR: A strong affinity between modern and ancient individuals from the region is found, providing evidence of continuity in the region for the last ∼1,000 years and regional genetic structure within Southern South America.
Abstract: Patagonia was the last region of the Americas reached by humans who entered the continent from Siberia ∼15,000-20,000 y ago. Despite recent genomic approaches to reconstruct the continental evolutionary history, regional characterization of ancient and modern genomes remains understudied. Exploring the genomic diversity within Patagonia is not just a valuable strategy to gain a better understanding of the history and diversification of human populations in the southernmost tip of the Americas, but it would also improve the representation of Native American diversity in global databases of human variation. Here, we present genome data from four modern populations from Central Southern Chile and Patagonia (n = 61) and four ancient maritime individuals from Patagonia (∼1,000 y old). Both the modern and ancient individuals studied in this work have a greater genetic affinity with other modern Native Americans than to any non-American population, showing within South America a clear structure between major geographical regions. Native Patagonian Kaweskar and Yamana showed the highest genetic affinity with the ancient individuals, indicating genetic continuity in the region during the past 1,000 y before present, together with an important agreement between the ethnic affiliation and historical distribution of both groups. Lastly, the ancient maritime individuals were genetically equidistant to a ∼200-y-old terrestrial hunter-gatherer from Tierra del Fuego, which supports a model with an initial separation of a common ancestral group to both maritime populations from a terrestrial population, with a later diversification of the maritime groups.

Journal Article
TL;DR: A high-resolution optical tweezers assay is used and it is found that pause sites modify the dynamics of nearly all RNAP molecules, rather than just affecting the subset of molecules that enter long-lived pauses.
Abstract: Transcription by RNA polymerase (RNAP) is interspersed with sequence-dependent pausing. The processes through which paused states are accessed and stabilized occur at spatiotemporal scales beyond the resolution of previous methods, and are poorly understood. Here, we combine high-resolution optical trapping with improved data analysis methods to investigate the formation of paused states at enhanced temporal resolution. We find that pause sites reduce the forward transcription rate of nearly all RNAP molecules, rather than just affecting the subset of molecules that enter long-lived pauses. We propose that the reduced rates at pause sites allow time for the elongation complex to undergo conformational changes required to enter long-lived pauses. We also find that backtracking occurs stepwise, with states backtracked by at most one base pair forming quickly, and further backtracking occurring slowly. Finally, we find that nascent RNA structures act as modulators that either enhance or attenuate pausing, depending on the sequence context.

Journal Article
01 May 2018-PLOS ONE
TL;DR: There is a correlation between distance to Buenos Aires and proportion of Native American ancestry, where the highest proportion corresponds to the northernmost populations, which are also the furthest from the Argentinian capital.
Abstract: We analyzed 391 samples from 12 Argentinian populations from the Center-West, East and North-West regions with the Illumina Human Exome Beadchip v1.0 (HumanExome-12v1-A). We performed principal component analysis to infer patterns of population divergence and migration. We identified proportions and patterns of European, African and Native American ancestry and found a correlation between distance to Buenos Aires and proportion of Native American ancestry, where the highest proportion corresponds to the northernmost populations, which are also the furthest from the Argentinian capital. Most of the European sources are of South European origin, matching historical records, and we see two different Native American components, one that spreads all over Argentina and another that is specifically Andean. The highest percentages of African ancestry were in the Center West of Argentina, where the old trade routes took the slaves from Buenos Aires to Chile and Peru. Subcontinentally, sources of this African component are represented by both West Africa and groups influenced by the Bantu expansion, the latter slightly higher than the former, unlike North America and the Caribbean, where the main source is West Africa. This is reasonable, considering that a large proportion of the ships arriving at the Southern Hemisphere came from Mozambique, Loango and Angola.

Journal Article
TL;DR: The fetal genetic contribution to PTB is unlikely to be due to a single common genetic variant, but could be explained by interactions of multiple common variants, or of rare variants affected by environmental influences, none of which are detectable using a GWAS alone.
Abstract: Preterm birth (PTB), or delivery prior to 37 weeks of gestation, is a significant cause of infant morbidity and mortality. Although twin studies estimate that maternal genetic contributions account for approximately 30% of the incidence of PTB, and other studies have reported fetal gene polymorphism associations, to date no consistent associations have been identified. In this study, we performed the largest reported genome-wide association study analysis on 1,349 cases of PTB and 12,595 ancestry-matched controls from the focusing on genomic fetal signals. We tested over 2 million single nucleotide polymorphisms (SNPs) for associations with PTB across five subpopulations: African (AFR), the Americas (AMR), European, South Asian, and East Asian. We identified only two intergenic loci associated with PTB at a genome-wide level of significance: rs17591250 (P = 4.55E-09) on chromosome 1 in the AFR population and rs1979081 (P = 3.72E-08) on chromosome 8 in the AMR group. We queried several existing replication cohorts and found no support for these associations. We conclude that the fetal genetic contribution to PTB is unlikely to be due to a single common genetic variant, but could be explained by interactions of multiple common variants, or of rare variants affected by environmental influences, none of which are detectable using a GWAS alone.

Journal ArticleDOI
TL;DR: This work eliminated the main source of noise of most high-resolution dual-trap optical tweezers and developed both a single-molecule assay and a self-learning algorithm to uncover the full trajectories of such a motor: RNA polymerase.
Abstract: In recent years, highly stable optical tweezers systems have enabled the characterization of the dynamics of molecular motors at very high resolution. However, the motion of many motors with angstrom-scale dynamics cannot be consistently resolved due to poor signal-to-noise ratio. Using an acousto-optic deflector to generate a "time-shared" dual-optical trap, we decreased low-frequency noise by more than one order of magnitude compared with conventional dual-trap optical tweezers. Using this instrument, we implemented a protocol that synthesizes single base-pair trajectories, which are used to test a Large State Space Hidden Markov Model algorithm to recover their individual steps. We then used this algorithm on real transcription data obtained in the same instrument to fully uncover the molecular trajectories of Escherichia coli RNA polymerase. We applied this procedure to reveal the effect of pyrophosphate on the distribution of dwell times between consecutive polymerase steps.

Journal ArticleDOI
TL;DR: A novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project, which demonstrates increased imputation accuracy for rare variants and examines array design strategies that contrast multi-ethnic cohorts vs. single populations.
Abstract: The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.

Journal ArticleDOI
TL;DR: WT and arginine finger mutants of the pentameric bacteriophage φ29 DNA packaging motor are studied to reveal the molecular interactions necessary for the coordination of ADP–ATP exchange and ATP hydrolysis of the motor’s biphasic mechanochemical cycle.
Abstract: Subunits in multimeric ring-shaped motors must coordinate their activities to ensure correct and efficient performance of their mechanical tasks. Here, we study WT and arginine finger mutants of the pentameric bacteriophage φ29 DNA packaging motor. Our results reveal the molecular interactions necessary for the coordination of ADP–ATP exchange and ATP hydrolysis of the motor’s biphasic mechanochemical cycle. We show that two distinct regulatory mechanisms determine this coordination. In the first mechanism, the DNA up-regulates a single subunit’s catalytic activity, transforming it into a global regulator that initiates the nucleotide exchange phase and the hydrolysis phase. In the second, an arginine finger in each subunit promotes ADP–ATP exchange and ATP hydrolysis of its neighbor. Accordingly, we suggest that the subunits perform the roles described for GDP exchange factors and GTPase-activating proteins observed in small GTPases. We propose that these mechanisms are fundamental to intersubunit coordination and are likely present in other ring ATPases.

Posted ContentDOI
09 May 2018-bioRxiv
TL;DR: The requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories, and the need for a standardized REA data collection framework to be developed and adopted across clinical genomics is suggested.
Abstract: The Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group highlights the need to develop guidance on race, ethnicity, and ancestry (REA) data collection and use in clinical genomics. We present quantitative and qualitative evidence to characterize: 1) acquisition of REA data via clinical laboratory requisition forms, and 2) information disparity across populations in the Genome Aggregation Database (gnomAD) at clinically relevant sites as determined by variants in ClinVar. Our requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories. There was also striking disparity across REA populations in the amount of information available about variants at clinically relevant sites in gnomAD. European ancestral populations constituted the majority of observations (55.8%), allele counts (59.7%), and private alleles (56.1%) in gnomAD at 550 loci with "pathogenic" and "likely pathogenic" expert-reviewed variants in ClinVar. Our findings highlight the importance of implementing and supporting programs to increase diversity in genome sequencing and clinical genomics, as well as measuring uncertainty around population-level datasets that are used in variant interpretation. Finally, we suggest the need for a standardized REA data collection framework to be developed and adopted across clinical genomics.

Journal ArticleDOI
TL;DR: The aim of this study was to compare the major postoperative complication rate in patients undergoing end stoma vs primary anastomosis following emergency left sided colorectal resection.
Abstract: Some evidence suggests that primary anastomosis following left sided colorectal resection in the emergency setting may be safe in selected patients, and confer favourable outcomes to permanent enterostomy. The aim of this study was to compare the major postoperative complication rate in patients undergoing end stoma vs primary anastomosis following emergency left sided colorectal resection.

Posted ContentDOI
03 Dec 2018-bioRxiv
TL;DR: In spite of observing interinsular differences in the survival of indigenous lineages, modern populations, with the sole exception of La Gomera, are homogenous across the islands, supporting the theory of extensive human mobility after the European conquest.
Abstract: The Canary Islands’ indigenous people have been the subject of substantial archaeological, anthropological, linguistic and genetic research pointing to a most probable North African Berber source. However, neither agreement about the exact point of origin nor a model for the indigenous colonization of the islands has been established. To shed light on these questions, we analyzed 48 ancient mitogenomes from 25 archaeological sites from the seven main islands. Most lineages observed in the ancient samples have a Mediterranean distribution, and belong to lineages associated with the Neolithic expansion in the Near East and Europe (T2c, J2a, X3a…). This phylogeographic analysis of Canarian indigenous mitogenomes, the first of its kind, shows that some lineages are restricted to Central North Africa (H1cf, J2a2d and T2c1d3), while others have a wider distribution, including both West and Central North Africa, and, in some cases, Europe and the Near East (U6a1a1, U6a7a1, U6b, X3a, U6c1). In addition, we identify four new Canarian-specific lineages (H1e1a9, H4a1e, J2a2d1a and L3b1a12) whose coalescence dates correlate with the estimated time for the colonization of the islands (1st millennium CE). We also observe an asymmetrical distribution of mtDNA haplogroups in the ancient population, with certain haplogroups appearing more frequently in the islands closer to the continent. This reinforces results based on modern mtDNA and Y-chromosome data, and archaeological evidence suggesting the existence of two distinct migrations. Comparisons between insular populations show that some populations had high genetic diversity, while others were probably affected by genetic drift and/or bottlenecks. In spite of observing interinsular differences in the survival of indigenous lineages, modern populations, with the sole exception of La Gomera, are homogenous across the islands, supporting the theory of extensive human mobility after the European conquest.

Journal ArticleDOI
TL;DR: The first in-solution capture-enrichment method targeting the human Y-chromosome in aDNA sequencing libraries is presented, leading to an increase in the amount of Y-DNA sequences, as compared to libraries not enriched for the Y-chromosome.
Abstract: As most ancient biological samples have low levels of endogenous DNA, it is advantageous to enrich for specific genomic regions prior to sequencing. One approach—in-solution capture-enrichment—retrieves sequences of interest and reduces the fraction of microbial DNA. In this work, we implement a capture-enrichment approach targeting informative regions of the Y chromosome in six human archaeological remains excavated in the Caribbean and dated between 200 and 3000 years BP. We compare the recovery rate of Y-chromosome capture (YCC) alone, whole-genome capture followed by YCC (WGC + YCC) versus non-enriched (pre-capture) libraries. The six samples show different levels of initial endogenous content, with very low (< 0.05%, 4 samples) or low (0.1–1.54%, 2 samples) percentages of sequenced reads mapping to the human genome. We recover 12–9549 times more targeted unique Y-chromosome sequences after capture, where 0.0–6.2% (WGC + YCC) and 0.0–23.5% (YCC) of the sequence reads were on-target, compared to 0.0–0.00003% pre-capture. In samples with endogenous DNA content greater than 0.1%, we found that WGC followed by YCC (WGC + YCC) yields lower enrichment due to the loss of complexity in consecutive capture experiments, whereas in samples with lower endogenous content, the libraries’ initial low complexity leads to minor proportions of Y-chromosome reads. Finally, increasing recovery of informative sites enabled us to assign Y-chromosome haplogroups to some of the archeological remains and gain insights about their paternal lineages and origins. We present to our knowledge the first in-solution capture-enrichment method targeting the human Y-chromosome in aDNA sequencing libraries. YCC and WGC + YCC enrichments lead to an increase in the amount of Y-DNA sequences, as compared to libraries not enriched for the Y-chromosome. 
Our probe design effectively recovers regions of the Y-chromosome bearing phylogenetically informative sites, allowing us to identify paternal lineages with less sequencing than needed for pre-capture libraries. Finally, we recommend considering the endogenous content in the experimental design and avoiding consecutive rounds of capture, as clonality increases considerably with each round.

Journal ArticleDOI
TL;DR: In this paper, the authors used imputed gene expression levels in 6891 cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe.
Abstract: Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing 33 genes whose imputed expression levels are associated with cSCC at false discovery rate < 10% in the GERA cohort and validate 15 of these candidate genes at Bonferroni significance in the 23andMe dataset, including eight genes in five novel susceptibility loci and seven genes in four previously associated loci. These results suggest genetic mechanisms contributing to cSCC risk and illustrate advantages and disadvantages of TWAS as a supplement to traditional GWAS analyses.
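The discovery-validation design described in this abstract can be sketched in a few lines. The following is a minimal illustration only, assuming a standard Benjamini-Hochberg step for the false discovery rate threshold in the discovery cohort and a plain Bonferroni correction over the resulting candidates in the validation cohort. The function names, the p-values, and the gene indices are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[int]:
    """Return sorted indices of tests rejected at FDR level q (BH step-up)."""
    m = len(p_values)
    # Rank tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def bonferroni_validate(
    candidates: Sequence[int],
    validation_p: Dict[int, float],
    alpha: float = 0.05,
) -> List[int]:
    """Keep candidates whose validation p-value passes alpha / n_candidates."""
    if not candidates:
        return []
    threshold = alpha / len(candidates)
    return [g for g in candidates if validation_p[g] < threshold]


# Hypothetical discovery p-values for five genes (indices 0..4).
discovery_p = [0.001, 0.2, 0.03, 0.5, 0.004]
candidates = benjamini_hochberg(discovery_p, q=0.10)

# Hypothetical validation p-values for the candidate genes only.
validation_p = {0: 0.01, 2: 0.02, 4: 0.0001}
validated = bonferroni_validate(candidates, validation_p, alpha=0.05)
```

Correcting only over the candidates carried forward, rather than over every gene tested, is what lets a two-stage design apply a stricter criterion in the second stage without an excessive multiple-testing burden.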

Journal ArticleDOI
24 Oct 2018
TL;DR: A deep learning algorithm, DeepTag, which automatically infers diagnostic codes from veterinary free-text notes and enables automated disease annotation across a broad range of clinical diagnoses with minimal preprocessing.
Abstract: Large-scale veterinary clinical records can become a powerful resource for patient care and research. However, clinicians lack the time and resources to annotate patient records with standard medical diagnostic codes, and most veterinary visits are captured in free-text notes. The lack of standard coding makes it challenging to use the clinical data to improve patient care. It is also a major impediment to cross-species translational research, which relies on the ability to accurately identify patient cohorts with specific diagnostic criteria in humans and animals. In order to reduce the coding burden for veterinary clinical practice and aid translational research, we have developed a deep learning algorithm, DeepTag, which automatically infers diagnostic codes from veterinary free-text notes. DeepTag is trained on a newly curated dataset of 112,558 veterinary notes manually annotated by experts. DeepTag extends multitask LSTM with an improved hierarchical objective that captures the semantic structures between diseases. To foster human-machine collaboration, DeepTag also learns to abstain on examples where it is uncertain and defers them to human experts, resulting in improved performance. DeepTag accurately infers disease codes from free text even in challenging cross-hospital settings where the text comes from different clinical settings than the ones used for training. It enables automated disease annotation across a broad range of clinical diagnoses with minimal preprocessing. The technical framework in this work can be applied in other medical domains that currently lack medical coding resources.

Journal ArticleDOI
TL;DR: It is demonstrated that a canonical Eurasian skin pigmentation gene, SLC24A5, was introduced to southern Africa via recent migration and experienced strong adaptive evolution in the KhoeSan, both a rare example of intense, ongoing adaptation in very recent human history and an adaptive gene flow at a pigmentation locus in humans.
Abstract: Skin pigmentation is under strong directional selection in northern European and Asian populations. The indigenous KhoeSan populations of far southern Africa have lighter skin than other sub-Saharan African populations, potentially reflecting local adaptation to a region of Africa with reduced UV radiation. Here, we demonstrate that a canonical Eurasian skin pigmentation gene, SLC24A5, was introduced to southern Africa via recent migration and experienced strong adaptive evolution in the KhoeSan. To reconstruct the evolution of skin pigmentation, we collected phenotypes from over 400 ǂKhomani San and Nama individuals and high-throughput sequenced candidate pigmentation genes. The derived causal allele in SLC24A5, p.Ala111Thr, significantly lightens basal skin pigmentation in the KhoeSan and explains 8 to 15% of phenotypic variance in these populations. The frequency of this allele (33 to 53%) is far greater than expected from colonial period European gene flow; however, the most common derived haplotype is identical among European, eastern African, and KhoeSan individuals. Using four-population demographic simulations with selection, we show that the allele was introduced into the KhoeSan only 2,000 y ago via a back-to-Africa migration and then experienced a selective sweep (s = 0.04 to 0.05 in ǂKhomani and Nama). The SLC24A5 locus is both a rare example of intense, ongoing adaptation in very recent human history and an example of adaptive gene flow at a pigmentation locus in humans.
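To give a feel for what a selection coefficient of this size implies, the sketch below iterates a textbook deterministic model of genic selection, in which the odds of carrying the favoured allele grow by a factor of (1 + s) each generation. This is a simplification for illustration only and is not the four-population simulation framework used in the study. The starting frequency and the number of generations are hypothetical choices, and the function name is introduced here.

```python
def sweep_frequency(p0: float, s: float, generations: int) -> float:
    """Allele frequency after deterministic genic selection.

    Each generation applies p' = p(1 + s) / (1 + s p), which is equivalent
    to multiplying the odds p / (1 - p) by (1 + s). Drift, migration and
    dominance are ignored.
    """
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
    return p


# Hypothetical inputs: a 2% starting frequency and 70 generations,
# roughly 2,000 years at an assumed generation time near 29 years.
freq_low_s = sweep_frequency(p0=0.02, s=0.04, generations=70)
freq_high_s = sweep_frequency(p0=0.02, s=0.05, generations=70)
```

Because the recursion is deterministic, it shows only the expected trajectory; a simulation with drift and demography would be needed to judge how likely any particular final frequency is.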

Journal ArticleDOI
Nick J. Battersby1, James C. Glasbey, Peter Neary, Ionut Negoi +1362 more
TL;DR: The overall complete pathological response (pCR) rate and the reliability of detecting a cCR by conventional pre‐operative imaging are reported.
Abstract: This is the peer reviewed version of the following article: , which has been published in final form at https://doi.org/10.1111/codi.14361. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

Journal ArticleDOI
TL;DR: It is argued that the regulatory mechanisms employed by motor proteins display features similar to those described in small GTPases, which require external regulatory elements, such as dissociation inhibitors, exchange factors and activating proteins, to switch the protein’s function ‘on’ and ‘off’.
Abstract: Motor proteins are powered by nucleotide hydrolysis and exert mechanical work to carry out many fundamental biological tasks. To ensure their correct and efficient performance, the motors’ activities are allosterically regulated by additional factors that enhance or suppress their NTPase activity. Here, we review two highly conserved mechanisms of ATP hydrolysis activation and repression operating in motor proteins (the glutamate switch and the arginine finger) and their associated regulatory factors. We examine the implications of these regulatory mechanisms in proteins that are formed by multiple ATPase subunits. We argue that the regulatory mechanisms employed by motor proteins display features similar to those described in small GTPases, which require external regulatory elements, such as dissociation inhibitors, exchange factors and activating proteins, to switch the protein’s function ‘on’ and ‘off’. Likewise, similar regulatory roles are taken on by the motor’s substrate, additional binding factors, and even adjacent subunits in multimeric complexes. However, in motor proteins, more than one regulatory factor and the two mechanisms described here often underlie the machine’s operation. Furthermore, ATPase regulation takes place throughout the motor’s cycle, which enables a more complex function than the binary ‘active’ and ‘inactive’ states. This article is part of a discussion meeting issue ‘Allostery and molecular machines’.

Journal ArticleDOI
01 Apr 2018-Genome
TL;DR: This study used whole-genome data to assess the levels of heterozygosity in different lineages of the mangrove rivulus and infer the phylogenetic relationships among those lineages, and sequenced whole genomes from 15 lineages that were completely homozygous at microsatellite loci.
Abstract: The mangrove rivulus, Kryptolebias marmoratus, is one of only two self-fertilizing hermaphroditic fish species and inhabits mangrove forests. While selfing can be advantageous, it reduces heterozygosity and decreases genetic diversity. Studies using microsatellites found that there are variable levels of selfing among populations of K. marmoratus, but overall, there is a low rate of outcrossing and, therefore, low heterozygosity. In this study, we used whole-genome data to assess the levels of heterozygosity in different lineages of the mangrove rivulus and infer the phylogenetic relationships among those lineages. We sequenced whole genomes from 15 lineages that were completely homozygous at microsatellite loci and used single nucleotide polymorphisms (SNPs) to determine heterozygosity levels. More variation was uncovered than in studies using microsatellite data because of the resolution of full genome sequencing data. Moreover, missense polymorphisms were found most often in genes associated with immune function and reproduction. Inferred phylogenetic relationships suggest that lineages largely group by their geographic distribution. The use of whole-genome data provided further insight into genetic diversity in this unique species. Although this study was limited by the number of lineages that were available, these data suggest that there is previously undescribed variation within lineages of K. marmoratus that could have functional consequences and (or) inform us about the limits to selfing (e.g., genetic load, accumulation of deleterious mutations) and selection that might favor the maintenance of heterozygosity. These results highlight the need to sequence additional individuals within and among lineages.