Complete genomic and epigenetic maps of human centromeres
TLDR
A complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled comprehensive characterization of pericentromeric and centromeric repeats, which constitute 6.2% of the genome.
Abstract:
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
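As a quick consistency check on the two figures quoted for the repeat content (6.2% and 189.9 megabases), the percentage can be recomputed from the assembly size. This is a minimal sketch that assumes the T2T-CHM13 assembly totals about 3.055 billion base pairs (22 autosomes plus chromosome X; the CHM13 cell line carries no Y). The constant and function names are illustrative only.

```python
# Sketch: recompute the share of T2T-CHM13 made up of (peri)centromeric repeats.
# Assumption: total assembly size of ~3,055 Mb (autosomes + chrX; CHM13 has no chrY).
REPEAT_MB = 189.9      # pericentromeric + centromeric repeats, from the abstract
ASSEMBLY_MB = 3055.0   # assumed T2T-CHM13 assembly length in megabases


def fraction_of_genome(region_mb: float, genome_mb: float) -> float:
    """Return the size of a region as a percentage of the genome size."""
    return 100.0 * region_mb / genome_mb


pct = fraction_of_genome(REPEAT_MB, ASSEMBLY_MB)
print(f"{pct:.1f}%")  # prints 6.2%
```

The unrounded value is about 6.22%, which agrees with the 6.2% reported in the abstract.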