Showing papers on "Personal genomics published in 2022"


Journal ArticleDOI
TL;DR: This paper proposes an improved machine-learning-based method to analyze sequencing and tumor sequencing patterns of the human gene; the method analyzes the circulatory problems of patients with different tumor types for analysis in the public domain.
Abstract: In general, the various medical systems currently available provide insights into changes in the tumor genome of patients with tumor sequencing. Most tumor DNA sequencing can also be referred to as genetic specification or genetic testing. The sequence results help clinical decision-making to develop a personalized cancer treatment plan based on the molecular characteristics of the tumor rather than a one-size-fits-all treatment approach. Tumor sequencing also plays a major role in cancer research. In this paper, an improved method based on machine learning is proposed to analyze the sequencing and tumor sequencing patterns of the human gene. The proposed method analyzes the circulatory problems of patients with different tumor types for analysis in the public domain. It also constantly monitors large data sets of cancer or tumor genetic sequences to calculate tumor size and location. This allows the doctor to get an accurate report on the type of tumor and the problems it can cause to the patient. Analysis of these datasets of cancer tumor gene sequences reveals that the genetic makeup of each patient is different and that no two cancers are the same.

164 citations


Journal ArticleDOI
TL;DR: In this article, the potential and challenges of clinical whole-genome sequencing in solid tumors and hematological malignancies are reviewed and critically appraised in a series of three studies.

20 citations


Journal ArticleDOI
TL;DR: A brief history of the bacterial genome sequencing revolution and its application in public health and molecular epidemiology is described, and a chronology that encompasses the various technological developments: whole-genome shotgun sequencing, high-throughput sequencing, long-read sequencing is presented.
Abstract: Over the past 25 years, the powerful combination of genome sequencing and bioinformatics analysis has played a crucial role in interpreting information encoded in bacterial genomes. High-throughput sequencing technologies have paved the way towards understanding an increasingly wide range of biological questions. This revolution has enabled advances in areas ranging from genome composition to how proteins interact with nucleic acids. This has created unprecedented opportunities through the integration of genomic data into clinics for the diagnosis of genetic traits associated with disease. Since then, these technologies have continued to evolve, and recently, long-read sequencing has overcome previous limitations in terms of accuracy, thus expanding its applications in genomics, transcriptomics and metagenomics. In this review, we describe a brief history of the bacterial genome sequencing revolution and its application in public health and molecular epidemiology. We present a chronology that encompasses the various technological developments: whole-genome shotgun sequencing, high-throughput sequencing, long-read sequencing. We mainly discuss the application of next-generation sequencing to decipher bacterial genomes. Secondly, we highlight how long-read sequencing technologies go beyond the limitations of traditional short-read sequencing. We intend to provide a description of the guiding principles of the 3rd generation sequencing applications and ongoing improvements in the field of microbial medical research.

16 citations


Book ChapterDOI
01 Jan 2022
TL;DR: This chapter discusses the main genetic testing strategies relevant to ophthalmic practice.
Abstract: Over the past two decades, technological advances have transformed molecular genetic testing for ophthalmic disorders. There has been a shift from targeted testing of known mutations (using arrayed primer extension technology) and serial analysis of a small number of genes (using Sanger sequencing) to testing comprehensive gene panels (using massively parallel sequencing, also known as high-throughput sequencing or next-generation sequencing). Cost-effective interrogation of all protein-coding genes at once is now possible (using exome sequencing) and, soon, routine genetic testing is likely to involve in-depth analysis of both the protein and the non-protein-coding parts of the genome (using genome sequencing). In this chapter, we discuss the main genetic testing strategies relevant to ophthalmic practice.

7 citations


Journal ArticleDOI
TL;DR: The model of network genomics pioneered in the late 1980s and adopted in the European Commission-led Yeast Genome Sequencing Project (YGSP) contrasted with the burgeoning large-scale center model being developed in the United States to sequence the yeast genome, chiefly as a pilot for tackling the human genome.
Abstract: This paper examines the model of network genomics pioneered in the late 1980s and adopted in the European Commission-led Yeast Genome Sequencing Project (YGSP). It contrasted with the burgeoning large-scale center model being developed in the United States to sequence the yeast genome, chiefly as a pilot for tackling the human genome. We investigate the operation and connections of the two models by exploring a co-authorship network that captures different types of sequencing practices. In our network analysis, we focus on institutions that bridge both the European and American yeast whole-genome sequencing projects, and such concerted projects with non-concerted sequencing of yeast DNA. The institutions include two German biotechnology companies and Biozentrum, a research institute at Universität Basel that adopted yeast as a model to investigate cell biochemistry and molecular biology. Through assessing these bridging institutions, we formulate two analytical distinctions: between proximate and distal, and directed and undirected sequencing. Proximate and distal refer to the extent that intended users of DNA sequence data are connected to the generators of that data. Directed and undirected capture the extent to which sequencing was part of a specific research program. The networked European model, as mobilized in the YGSP, enabled the coexistence and cooperation of institutions exhibiting different combinations of these characteristics in contrast with the more uniformly distal and undirected large-scale centers. This contributes to broadening the historical boundaries of genomics and presenting a thicker historiography, one that inextricably meshes genomics with the trajectories of biotechnology and cell biology. This essay is part of a special issue entitled The Sequences and the Sequencers: A New Approach to Investigating the Emergence of Yeast, Human, and Pig Genomics, edited by Michael García-Sancho and James Lowe.

7 citations


Journal ArticleDOI
TL;DR: The protocol for the provider training intervention utilized in the SouthSeq study and the associated impact on NGP knowledge and confidence in reviewing, interpreting, and using genome sequencing results are described.
Abstract: To meet current and expected future demand for genome sequencing in the neonatal intensive care unit (NICU), adjustments to traditional service delivery models are necessary. Effective programs for the training of non-genetics providers (NGPs) may address the known barriers to providing genetic services including limited genetics knowledge and lack of confidence. The SouthSeq project aims to use genome sequencing to make genomic diagnoses in the neonatal period and evaluate a scalable approach to delivering genome sequencing results to populations with limited access to genetics professionals. Thirty-three SouthSeq NGPs participated in a live, interactive training intervention and completed surveys before and after participation. Here, we describe the protocol for the provider training intervention utilized in the SouthSeq study and the associated impact on NGP knowledge and confidence in reviewing, interpreting, and using genome sequencing results. Participation in the live training intervention led to an increased level of confidence in critical skills needed for real-world implementation of genome sequencing. Providers reported a significant increase in confidence level in their ability to review, understand, and use genome sequencing result reports to guide patient care. Reported barriers to implementation of genome sequencing in a NICU setting included test cost, lack of insurance coverage, and turn around time. As implementation of genome sequencing in this setting progresses, effective education of NGPs is critical to provide access to high-quality and timely genomic medicine care.

7 citations


Journal ArticleDOI
TL;DR: In this article, an in silico decoy chromosome, along with corresponding synthetic DNA reference controls, is used to evaluate the accuracy of next-generation sequencing in difficult regions of the human genome and to highlight the challenges that remain in resolving these regions.
Abstract: Next-generation sequencing (NGS) can identify mutations in the human genome that cause disease and has been widely adopted in clinical diagnosis. However, the human genome contains many polymorphic, low-complexity, and repetitive regions that are difficult to sequence and analyze. Despite their difficulty, these regions include many clinically important sequences that can inform the treatment of human diseases and improve the diagnostic yield of NGS. To evaluate the accuracy with which these difficult regions are analyzed with NGS, we built an in silico decoy chromosome, along with corresponding synthetic DNA reference controls, that encode difficult and clinically important human genome regions, including repeats, microsatellites, HLA genes, and immune receptors. These controls provide a known ground-truth reference against which to measure the performance of diverse sequencing technologies, reagents, and bioinformatic tools. Using this approach, we provide a comprehensive evaluation of short- and long-read sequencing instruments, library preparation methods, and software tools and identify the errors and systematic bias that confound our resolution of these remaining difficult regions. This study provides an analytical validation of diagnosis using NGS in difficult regions of the human genome and highlights the challenges that remain to resolve these difficult regions.

5 citations


Journal ArticleDOI
TL;DR: Aldy 4 is shown to be the most accurate star-allele caller with near-perfect accuracy in all evaluated contexts and it is hoped that Aldy remains an invaluable tool in the clinical toolbox even with the advent of long-read sequencing technologies.
Abstract: High-throughput sequencing provides sufficient means for determining genotypes of clinically important pharmacogenes that can be used to tailor medical decisions to individual patients. However, pharmacogene genotyping, also known as star-allele calling, is a challenging problem that requires accurate copy number calling, structural variation discovery, variant calling and phasing within each pharmacogene copy present in the sample. Here we introduce Aldy 4, a fast and efficient tool for genotyping pharmacogenes that utilizes combinatorial optimization for accurate star-allele calling across different sequencing technologies. Aldy 4 adds support for long reads and ships with a novel phasing model and improved copy number and variant calling models. We compare Aldy 4 against the current state-of-the-art star-allele callers on a large and diverse set of samples and genes sequenced by various sequencing technologies, such as whole-genome and targeted Illumina sequencing, barcoded 10X Genomics and PacBio HiFi. We show that Aldy 4 is the most accurate star-allele caller with near-perfect accuracy in all evaluated contexts. We hope that Aldy remains an invaluable tool in the clinical toolbox even with the advent of long-read sequencing technologies. Availability: Aldy 4 is available at https://github.com/0xTCG/aldy.

5 citations


Journal ArticleDOI
TL;DR: In this article, an in silico decoy chromosome, along with corresponding synthetic DNA reference controls, is used to evaluate the accuracy of next-generation sequencing in difficult regions of the human genome and to highlight the challenges that remain in resolving these regions.
Abstract: Next-generation sequencing (NGS) can identify mutations in the human genome that cause disease and has been widely adopted in clinical diagnosis. However, the human genome contains many polymorphic, low-complexity, and repetitive regions that are difficult to sequence and analyze. Despite their difficulty, these regions include many clinically important sequences that can inform the treatment of human diseases and improve the diagnostic yield of NGS. To evaluate the accuracy with which these difficult regions are analyzed with NGS, we built an in silico decoy chromosome, along with corresponding synthetic DNA reference controls, that encode difficult and clinically important human genome regions, including repeats, microsatellites, HLA genes, and immune receptors. These controls provide a known ground-truth reference against which to measure the performance of diverse sequencing technologies, reagents, and bioinformatic tools. Using this approach, we provide a comprehensive evaluation of short- and long-read sequencing instruments, library preparation methods, and software tools and identify the errors and systematic bias that confound our resolution of these remaining difficult regions. This study provides an analytical validation of diagnosis using NGS in difficult regions of the human genome and highlights the challenges that remain to resolve these difficult regions.

4 citations



Journal ArticleDOI
Joel Rozowsky, Jorg Drenkow, Yucheng T. Yang, Gamze Gursoy, Timur R. Galeev, Beatrice Borsari, Charles B. Epstein, Kun Xiong, Jinrui Xu, Jiahao Gao, Kai Yu, Ana Berthel, Zhanlin Chen, Fabio C. P. Navarro, Jason Liu, Maxwell S Sun, James C. Wright, Justin Chang, Christopher J. F. Cameron, Noam Shoresh, Elizabeth Gaskell, Jessika Adrian, Sergey Aganezov, François Aguet, Gabriela Balderrama-Gutierrez, Samridhi Banskota, G. Corona, Sora Chee, Surya B. Chhetri, Gabriel Conte Cortez Martins, Cassidy Danyko, Carrie A. Davis, Daniel Farid, Nina Farrell, Idan Gabdank, Yoel Gofin, David U. Gorkin, Mengting Gu, Vivian C. Hecht, Benjamin C. Hitz, Robbyn Issner, Melanie Kirsche, Xiangmeng Kong, Bonita R Lam, Shantao Li, Bian Li, Tianxiao Li, Xiqi Li, Khine Lin, Ruibang Luo, Mark Mackiewicz, Jill Moore, Jonathan M. Mudge, Nicholas C Nelson, Chad Nusbaum, Ioann O. Popov, Henry Pratt, Yunjiang Qiu, Srividya Ramakrishnan, Joe Raymond, Leonidas Salichos, Alexandra Scavelli, Jacob Schreiber, Fritz J. Sedlazeck, Lei-Hoon See, Rachel M. Sherman, Xu Shi, Minyi Shi, Cricket A. Sloan, J. Seth Strattan, Zhen Tan, Forrest Y. Tanaka, Anna Vlasova, Jun Wang, Jonathan D. Werner, Brian A. Williams, Min Xu, Chengfei Yan, Lu Yu, Chris Zaleski, Jing Zhang, Kristin G. Ardlie, J. M. Cherry, Eric M. Mendenhall, William Noble, Zhiping Weng, Morgan E. Levine, Alexander Dobin, Barbara J. Wold, Ali Mortazavi, Bing Ren, Jesse Gillis, Richard M. Myers, Michael Snyder, Jyoti S. Choudhary, Aleksandar Milosavljević, Michael C. Schatz, Roderic Guigó, Bradley E. Bernstein, Thomas R. Gingeras, Mark Gerstein 
22 Nov 2022-Cell
TL;DR: The EN-TEx resource contains 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays) mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci.

Posted ContentDOI
20 Jul 2022-bioRxiv
TL;DR: FixItFelix, together with a modified GRCh38 reference, improves variant calling and expression analysis across erroneous GRCh38 loci within minutes for an existing BAM/CRAM file; the authors also identify likely artifacts in GTEx, gnomAD, the 1000 Genomes Project, and other genomic resources that lead to wrong interpretations for the affected genes.
Abstract: The GRCh38 reference is the current standard in human genomics research and clinical applications, but includes errors across 33 protein-coding genes, including 12 with medical relevance. Current studies rely on the correctness of this reference genome and require an accurate and cost-effective way to improve variant calling and expression analysis across these erroneous loci. We identified likely artifacts in GTEx, gnomAD, 1000 Genomes Project, and other important genomic resources leading to wrong interpretations for these genes. Here, we present FixItFelix together with a modified GRCh38 version that improves the subsequent analysis across these genes within minutes for an existing BAM/CRAM file. We showcase these improvements over multi-ethnic control samples across short- and long-read DNA and RNA sequencing. Furthermore, applying our approach across thousands of genomes demonstrates improvements for population variant calling as well as eQTL studies. Still, some genes (e.g., DUSP22) show mixed results due to their complexity.

Journal ArticleDOI
TL;DR: PhenGenVar is a user-friendly interface for monitoring the variations in a gene of interest for molecular diagnosis; in a test on whole-genome sequencing data, it identified several genomic variations, including single-nucleotide polymorphisms, insertions, and deletions in specific gene regions.
Abstract: Precision medicine has been revolutionized by the advent of high-throughput next-generation sequencing (NGS) technology and development of various bioinformatic analysis tools for large-scale NGS big data. At the population level, biomedical studies have identified human diseases and phenotype-associated genetic variations using NGS technology, such as whole-genome sequencing, exome sequencing, and gene panel sequencing. Furthermore, patients’ genetic variations related to a specific phenotype can also be identified by analyzing their genomic information. These breakthroughs paved the way for the clinical diagnosis and precise treatment of patients’ diseases. Although many bioinformatics tools have been developed to analyze the genetic variations from the individual patient’s NGS data, it is still challenging to develop user-friendly programs for clinical physicians who do not have bioinformatics programming skills to diagnose a patient’s disease using the genomic data. In response to this demand, we developed a Phenotype to Genotype Variation program (PhenGenVar), which is a user-friendly interface for monitoring the variations in a gene of interest for molecular diagnosis. This allows for flexible filtering and browsing of variants of the disease and phenotype-associated genes. To test this program, we analyzed the whole-genome sequencing data of an anonymous person from the 1000 Genomes Project data. As a result, we were able to identify several genomic variations, including single-nucleotide polymorphisms, insertions, and deletions in specific gene regions. Therefore, PhenGenVar can be used to diagnose a patient’s disease. PhenGenVar is freely accessible and is available at our website.

Journal ArticleDOI
01 Sep 2022-Genes
TL;DR: Six variant calling tools for Oxford Nanopore long-read data were evaluated on NA12878 and NA24385 against Genome in a Bottle high-confidence calls; Clair3 and Human-SNP-wf (which incorporates Clair3) achieved the highest performance.
Abstract: The goal of biomarker testing, in the field of personalized medicine, is to guide treatments to achieve the best possible results for each patient. The accurate and reliable identification of each individual’s genome variants is essential for the success of clinical genomics employing third-generation sequencing. Different variant calling techniques have been used and recommended by both Oxford Nanopore Technologies (ONT) and Nanopore communities. A thorough examination of the variant callers might give critical guidance for third-generation sequencing-based clinical genomics. In this study, two reference genome sample datasets (NA12878 and NA24385) and the set of high-confidence variant calls provided by the Genome in a Bottle (GIAB) were used to evaluate the performance of six variant calling tools, including Human-SNP-wf, Clair3, Clair, NanoCaller, Longshot, and Medaka, as an integral step in the in-house variant detection workflow. Of the six variant callers under study, Clair3 and Human-SNP-wf, which incorporates Clair3, achieved the highest performance rates. Results were expressed in terms of precision, recall, and F1-score, using hap.py for the comparison. In conclusion, our findings give important insights for identifying accurate variants from third-generation sequencing of personal genomes using different variant detection tools available for long-read sequencing.
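The precision, recall, and F1-score named in the abstract above can be illustrated with a minimal Python sketch. It assumes variants are reduced to exact (chromosome, position, ref, alt) tuples and matched by set intersection; this is a simplification for illustration, not the haplotype-aware comparison that a dedicated benchmarking tool performs, and the names `Variant`, `Metrics`, and `evaluate_calls` are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float


def evaluate_calls(truth: Set[Variant], calls: Set[Variant]) -> Metrics:
    """Score a call set against a truth set by exact variant matching."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(precision, recall, f1)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
    calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr3", 10, "A", "T")}
    print(evaluate_calls(truth, calls))
```

Exact tuple matching undercounts true positives when the same variant is represented differently (for example, left- versus right-aligned indels), which is one reason benchmarking studies rely on specialised comparison tools.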

Journal ArticleDOI
TL;DR: This article discusses the potential for accurate integration of genome variation into health and disease-related care, and examines the opportunities and potential challenges for applying genome-scale variation detection in disease.
Abstract: Human genome variation has increasingly posed challenges and opportunities for patients, medical providers, and an increasing group of stakeholders including advocacy groups, disadvantaged communities, public health experts, and scientists. Here, advances in genomic sequencing and mapping technologies are discussed with particular attention to the increasing ability to detect personal and population genome variation and the potential for accurate integration of variation into health and disease-related care. Genome mapping, one technique used to create genome map scaffolds, has now been combined with long-read sequencing. New technologies have led to improved variation detection, including cryptic structural variation and diverse variants with different degrees of disease association. Combined with advances in automated and medical interpretations, variation detection is increasingly being applied in healthcare. These advances promise to make disease diagnostics more rapid, and potentially more accessible, to those with medical needs. Consequently, the need for medical genetics and genomics experts is increasing. Here, the opportunities and potential challenges for application of genome-scale variation detection in disease are examined.

Journal ArticleDOI
TL;DR: This paper presents an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build.
Abstract: Genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the noncoding genome and the clinical need for methods that prioritize potentially disease causal noncoding variants. Some tools for assessment of variant pathogenicity as well as annotations are not available for the current human genome build (GRCh38), for which the adoption in databases, software, and pipelines was slow. Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, retrained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup. Scores of the GRCh38 genome build are highly correlated to the prior release with a performance increase due to the better coverage of features. For prioritization of noncoding mutations in imbalanced datasets, the ReMM score performed much better than other variation scores. Prescored whole-genome files of GRCh37 and GRCh38 genome builds are cited in the article and the website; UCSC genome browser tracks, and an API are available at https://remm.bihealth.org.

Journal ArticleDOI
TL;DR: This article investigates the diagnostic utility of the Genomics England 100,000 Genomes Project to evaluate genome sequencing for rare, inherited conditions, and finds that diagnostic yield varied considerably between phenotype categories and was minimal in cases with prior exome testing.
Abstract: NHS genetics centres in Scotland sought to investigate the diagnostic utility of the Genomics England 100,000 Genomes Project to evaluate genome sequencing for rare, inherited conditions. Four regional services recruited 999 individuals from 394 families in 200 rare phenotype categories, with negative historic genetic testing. Genome sequencing was performed at Edinburgh Genomics, and phenotype and sequence data were transferred to Genomics England for variant calling, gene-based filtering and variant prioritisation. NHS Scotland genetics laboratories performed interpretation, validation and reporting. New diagnoses were made in 23% of cases: 19% in genes implicated in disease at the time of variant prioritisation, and 4% from later review of additional genes. Diagnostic yield varied considerably between phenotype categories and was minimal in cases with prior exome testing. Genome sequencing with gene panel filtering and reporting achieved improved diagnostic yield over previous historic testing but not over now routine trio-exome sequence tests. Re-interpretation of genomic data with updated gene panels modestly improved diagnostic yield at minimal cost. However, to justify the additional costs of genome vs exome sequencing, efficient methods for analysis of structural variation will be required and/or the cost of genome analysis and storage will need to decrease.

Journal ArticleDOI
TL;DR: It is demonstrated that ETS not only precludes error-prone manual checks but also has no effect on the genomic landscape of preimplantation embryos and increases efficacy and throughput of the state-of-the-art PGT methods.
Abstract:
STUDY QUESTION: Can the embryo tracking system (ETS) increase safety, efficacy and scalability of massively parallel sequencing-based preimplantation genetic testing (PGT)?
SUMMARY ANSWER: Applying ETS-PGT, the chance of sample switching is decreased, while scalability and efficacy could easily be increased substantially.
WHAT IS KNOWN ALREADY: Although state-of-the-art sequencing-based PGT methods made a paradigm shift in PGT, they still require labor-intensive library preparation steps that make PGT cost prohibitive and pose risks of human errors. To increase the quality assurance, efficiency, robustness and throughput of the sequencing-based assays, barcoded DNA fragments have been used in several aspects of the next-generation sequencing (NGS) approach.
STUDY DESIGN, SIZE, DURATION: We developed an ETS that substantially alleviates the complexity of the current sequencing-based PGT. With (n = 693) and without (n = 192) ETS, the downstream PGT procedure was performed on both bulk DNA samples (n = 563) and whole-genome amplified (WGAed) few-cell DNA samples (n = 322). Subsequently, we compared full genome haplotype landscapes of both WGAed and bulk DNA samples containing ETS or no ETS.
PARTICIPANTS/MATERIALS, SETTING, METHODS: We have devised an ETS to track embryos right after whole-genome amplification (WGA) to full genome haplotype profiles. In this study, we recruited 322 WGAed DNA samples derived from IVF embryos as well as 563 bulk DNA samples isolated from peripheral blood of prospective parents. To determine possible interference of the ETS in the NGS-based PGT workflow, barcoded DNA fragments were added to DNA samples prior to library preparation and compared to samples without ETS. Coverages and variants were determined.
MAIN RESULTS AND THE ROLE OF CHANCE: Current PGT protocols are quality sensitive and prone to sample switching. To avoid sample switching and increase throughput of PGT by sequencing-based haplotyping, six control steps should be carried out manually and checked by a second person in a clinical setting. Here, we developed an ETS approach in which only one step in the entire PGT procedure needs the four-eyes principle. We demonstrate that ETS not only precludes error-prone manual checks but also has no effect on the genomic landscape of preimplantation embryos. Importantly, our approach increases efficacy and throughput of the state-of-the-art PGT methods.
LIMITATIONS, REASONS FOR CAUTION: Even though the ETS simplified sequencing-based PGT by avoiding potential errors in six steps in the protocol, if the initial assignment is not performed correctly, it could lead to cross-contamination. However, this can be detected in silico following downstream ETS analysis. Although we demonstrated an approach to evaluate purity of the ETS fragment, it is recommended to perform a pre-PGT quality control assay of the ETS amplicons with non-human DNA, such that the purity of each ETS molecule can be determined prior to ETS-PGT.
WIDER IMPLICATIONS OF THE FINDINGS: The ETS-PGT approach notably increases efficacy and scalability of PGT. ETS-PGT has broad applicative value, as it can be tailored to any single- and few-cell sequencing approach where the starting specimen is scarce, as opposed to other methods that require a large number of cells as the input. Moreover, ETS-PGT could easily be adapted to any sequencing-based diagnostic method, including PGT for structural rearrangements and aneuploidies by low-pass sequencing as well as non-invasive prenatal testing.
STUDY FUNDING/COMPETING INTEREST(S): M.Z.E. is supported by the EVA (Erfelijkheid Voortplanting & Aanleg) specialty program (grant no. KP111513) of Maastricht University Medical Centre (MUMC+), and the Horizon 2020 innovation (ERIN) (grant no. EU952516) of the European Commission.
TRIAL REGISTRATION NUMBER: N/A.

Journal ArticleDOI
07 Dec 2022
TL;DR: FINDEL is deep-learning-based software that efficiently removes sequencing artifacts from cancer samples by querying the variant call format file, which is much more compact than BAM files.
Abstract: Next-generation sequencing technologies have increased sequencing throughput 100- to 1,000-fold and subsequently reduced the cost of sequencing a human genome to approximately US$1,000. However, the existence of sequencing artifacts can cause erroneous identification of variants and adversely impact the downstream analyses. Currently, the manual inspection of variants for additional refinement is still necessary for high-quality variant calls. The inspection is usually done on large binary alignment map (BAM) files, which consumes a huge amount of labor and time. It also suffers from a lack of standardization and reproducibility. Here we show that the use of mutational signatures coupled with deep learning can replace the current standards in the bioinformatics workflow. This software, called FINDEL, can efficiently remove sequencing artifacts from cancer samples. It queries the variant call format file, which is much more compact than BAM files. The software automates the variant refinement process and produces high-quality variant calls.
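To make the idea of deriving mutational-signature input from a variant call format (VCF) file more concrete, the following Python sketch counts single-nucleotide substitutions in the six pyrimidine-normalised classes (C>A, C>G, C>T, T>A, T>C, T>G) that are conventionally used as a starting point for signature analysis. It is an illustrative sketch only and is not FINDEL's implementation: the function names are hypothetical, the input is assumed to be tab-separated VCF text lines, and full signature analysis would additionally need the flanking bases from a reference genome.

```python
from collections import Counter
from typing import Iterable

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Return the pyrimidine-normalised class, e.g. G>A becomes C>T."""
    if ref in ("A", "G"):  # purine reference: report the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_substitutions(vcf_lines: Iterable[str]) -> Counter:
    """Count SNV substitution classes from VCF data lines.

    Header lines (starting with '#'), indels, and non-ACGT alleles are skipped.
    Standard VCF column order is assumed: CHROM, POS, ID, REF, ALT, ...
    """
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):  # multi-allelic sites
            is_snv = (len(ref) == 1 and len(alt) == 1
                      and ref in COMPLEMENT and alt in COMPLEMENT
                      and ref != alt)
            if is_snv:
                counts[substitution_class(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tG\tA\t50\tPASS\t.",
        "chr1\t200\t.\tC\tT\t50\tPASS\t.",
        "chr2\t5\t.\tA\tG,C\t50\tPASS\t.",
        "chr2\t9\t.\tAT\tA\t50\tPASS\t.",
    ]
    print(count_substitutions(example))
```

Working from VCF records rather than alignments keeps the input small, which matches the abstract's point that the VCF file is much more compact than BAM files.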

Posted ContentDOI
08 Nov 2022-bioRxiv
TL;DR: The authors used multiple sequencing approaches to sequence the genome of a volunteer from Saudi Arabia, generated a de novo assembly of the genome from the resulting data, and refined the assembly with different computational approaches.
Abstract: We have used multiple sequencing approaches to sequence the genome of a volunteer from Saudi Arabia. We use the resulting data to generate a de novo assembly of the genome, and use different computational approaches to refine the assembly. As a consequence, we provide a contiguous assembly of the complete genome of an individual from Saudi Arabia for all chromosomes except chromosome Y, and label this assembly KSA001. We transferred genome annotations from reference genomes and predicted genome features using methods from Artificial Intelligence to fully annotate KSA001, and we make all primary sequencing data, the assembly, and the genome annotations freely available in public databases using the FAIR data principles.

Posted ContentDOI
24 Aug 2022
TL;DR: In this article, a brief history of next-generation sequencing technology (NGS) is presented, followed by the benefits of employing this revolutionary technology and the general approach of NGS, beginning with fragmentation and ending with data analysis, with a focus on the Illumina and Ion Torrent platforms.
Abstract: Background: Next-generation sequencing is a type of deep sequencing. In comparison to the previously used Sanger method, next-generation sequencing allows an entire genome to be sequenced in a single day. Next-generation sequencing (NGS) has revolutionized genomics and molecular biology. NGS has a wide range of medical applications, including the diagnosis of tumors and inherited diseases. It is also used to find genetic variants across the genome. Several NGS platforms are available; Illumina and Ion Torrent are the most prevalent. These NGS platforms have reduced the cost and time required to sequence a full genome. Main body: The review covers a brief history of NGS, followed by the benefits of employing this technology. The general approach of NGS, beginning with fragmentation and ending with data analysis, is explained, with a focus on the Illumina and Ion Torrent platforms. Finally, the data analysis step is covered thoroughly, beginning with data quality control and ending with data visualization. Conclusion: According to the review, NGS is a promising technology that has revolutionized genome sequencing. NGS platforms have given rise to software that can perform the vast majority of NGS steps, such as sequencing, variant annotation, and quality checks. The Ion Torrent and Illumina platforms have grown in popularity and are frequently used. NGS has gained traction in clinical applications.

Journal ArticleDOI
TL;DR: The American College of Medical Genetics and Genomics (ACMG) recommends genome sequencing (GS) testing as an effective strategy in comparison with traditional single or multi-gene testing in patients with a strong family history of a likely unknown genetic disorder or otherwise unspecified phenotype, known but heterogeneous disorder, and affected patients with previously non-conclusive genetic results as discussed by the authors.

Journal ArticleDOI
TL;DR: In this article, a mixed-methods study aimed to assess the acceptability of a genetic counselor (GC) phone call in communicating polygenic risk information in the Melanoma Genomics Managing Your Risk randomized controlled trial.
Abstract: Personalized polygenic risk information may be used to guide risk-based melanoma prevention and early detection at a population scale, but research on communicating this information is limited. This mixed-methods study aimed to assess the acceptability of a genetic counselor (GC) phone call in communicating polygenic risk information in the Melanoma Genomics Managing Your Risk randomized controlled trial. Participants (n = 509) received personalized melanoma polygenic risk information, an educational booklet on melanoma prevention, and a GC phone call, which was audio-recorded. Participants completed the Genetic Counseling Satisfaction Survey 1 month after receiving their risk information (n = 346). A subgroup took part in a qualitative interview post-study completion (n = 20). Survey data were analyzed descriptively using SPSS, and thematic analysis of the qualitative data was conducted using NVivo 12.0 software. The survey showed a high level of acceptability for the GC phone call (mean satisfaction score overall: 4.3 out of 5, standard deviation (SD): 0.6) with differences according to gender (mean score for women: 4.4, SD: 0.6 vs. men: 4.2, SD: 0.7; p = 0.005), health literacy (lower literacy: 4.1, SD: 0.8; average: 4.3, SD: 0.6; higher: 4.4, SD: 0.6; p = 0.02) and polygenic risk group (low risk: 4.5, SD: 0.5; average: 4.3, SD: 0.7; high: 4.3, SD: 0.7; p = 0.03). During the GC phone calls, the discussion predominantly related to the impact of past sun exposure on personal melanoma risk. Together our findings point to the importance of further exploring educational and support needs and preferences for communicating personalized melanoma risk among population subgroups, including diverse literacy levels.

Posted ContentDOI
12 Apr 2022-medRxiv
TL;DR: Large proteomic datasets (> 1,000 proteins) can be accurately linked to a specific genome through pQTL knowledge and should not be considered deidentified, suggesting that large scale proteomic data be given privacy protections of genomic data, or that bioinformatic transformations should be applied to obfuscate identity.
Abstract: Introduction: Privacy protection is a core principle of genomic research but needs further refinement for high-throughput proteomic platforms. Methods: We identified independent single nucleotide polymorphism (SNP) protein quantitative trait loci (pQTL) from COPDGene and Jackson Heart Study (JHS) and then calculated genotype probabilities by protein level for each protein-genotype combination (training). Using the most significant 100 proteins, we applied a naive Bayesian approach to match proteomes to genomes for 2,812 independent subjects from COPDGene, JHS, SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS) and Multi-Ethnic Study of Atherosclerosis (MESA) with SomaScan 1.3K proteomes and also 2,646 COPDGene subjects with SomaScan 5K proteomes (testing). We tested whether subtracting mean genotype effect for each pQTL SNP would obscure genetic identity. Results: In the four testing cohorts, we were able to match 90%-95% of proteomes to their correct genome, and for 95%-99% we could match the proteome to the 1% most likely genomes. With larger profiling (SomaScan 5K), correct identification was > 99%. The accuracy of matching in subjects with African ancestry was lower (~60%) unless training included diverse subjects. Mean genotype effect adjustment reduced identification accuracy nearly to random guess. Conclusion: Large proteomic datasets (> 1,000 proteins) can be accurately linked to a specific genome through pQTL knowledge and should not be considered deidentified. These findings suggest that large scale proteomic data be given privacy protections of genomic data, or that bioinformatic transformations (such as adjustment for genotype effect) should be applied to obfuscate identity.
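The matching step described in this abstract can be illustrated with a small sketch. It scores every candidate genome against one proteome by summing per-protein log-likelihoods and returns the best-scoring genome. The Gaussian model of protein level given genotype, the uniform prior over candidates, the names `match_proteome` and `pqtl_model`, and all numbers are assumptions made for illustration; the abstract does not specify the authors' implementation.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def match_proteome(protein_levels, genomes, pqtl_model):
    """Return (best genome id, all scores) under a naive Bayes model.

    protein_levels: {protein: measured level} for one proteome
    genomes: {genome_id: {protein: genotype (0, 1 or 2) at that protein's pQTL SNP}}
    pqtl_model: {protein: {genotype: (mean, sd)}}, a hypothetical training output
    """
    scores = {}
    for genome_id, genotypes in genomes.items():
        total = 0.0
        # Naive Bayes: proteins are treated as independent, so log-likelihoods add.
        for protein, level in protein_levels.items():
            mean, sd = pqtl_model[protein][genotypes[protein]]
            total += gaussian_logpdf(level, mean, sd)
        scores[genome_id] = total
    best = max(scores, key=scores.get)
    return best, scores

# Toy example with two proteins and two candidate genomes (made-up values).
toy_model = {
    "P1": {0: (0.0, 1.0), 1: (1.0, 1.0), 2: (2.0, 1.0)},
    "P2": {0: (0.0, 1.0), 1: (-1.0, 1.0), 2: (-2.0, 1.0)},
}
toy_genomes = {"A": {"P1": 0, "P2": 2}, "B": {"P1": 2, "P2": 0}}
toy_proteome = {"P1": 1.9, "P2": 0.1}
best_id, all_scores = match_proteome(toy_proteome, toy_genomes, toy_model)
```

With only two proteins the separation between candidates is small; the abstract's point is that it becomes decisive once on the order of 100 or more pQTL-linked proteins contribute to the sum.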

Posted ContentDOI
27 May 2022-bioRxiv
TL;DR: An updated version of the Regulatory Mendelian Mutation (ReMM) score is presented, re-trained on features and variants derived from the GRCh38 genome build, and achieves good performance on its highly imbalanced data.
Abstract: Motivation: Various genome sequencing efforts for individuals with rare Mendelian disease have increased the research focus on the non-coding genome and the clinical need for methods that prioritize potentially disease-causal non-coding variants. Some methods and annotations are not available for the current human genome build (GRCh38), for which the adoption in databases, software and pipelines was slow. Results: Here, we present an updated version of the Regulatory Mendelian Mutation (ReMM) score, re-trained on features and variants derived from the GRCh38 genome build. Like its GRCh37 version, it achieves good performance on its highly imbalanced data. To improve accessibility and provide users with a toolbox to score their variant files and look up scores in the genome, we developed a website and API for easy score lookup. Availability and Implementation: Pre-scored whole genome files of GRCh37 and GRCh38 genome builds are available on Zenodo https://doi.org/10.5281/zenodo.6576087. The website and API are available at https://remm.bihealth.org.

Journal ArticleDOI
TL;DR: In this paper, a review of the use of sequencing technologies in personalized medicine is presented, along with its own practical examples, and the advantages and limitations of the above methods for diagnosing monogenic and oncological diseases, as well as for identifying risk factors and predicting the course of socially significant multifactorial diseases.
Abstract: The review highlights various methods for deciphering the nucleotide sequence (sequencing) of nucleic acids and their importance for the implementation of the three main principles of personalized medicine: prevention, predictability and personalization. The review, along with its own practical examples, considers three generations of sequencing technologies: 1) sequencing of cloned or amplified DNA fragments according to Sanger and its analogues; 2) massive parallel sequencing of DNA libraries with short reads (NGS); and 3) sequencing of single molecules of DNA and RNA with long reads. The methods of whole genome, whole exome, targeted, RNA sequencing and sequencing based on chromatin immunoprecipitation are also discussed. The advantages and limitations of the above methods for diagnosing monogenic and oncological diseases, as well as for identifying risk factors and predicting the course of socially significant multifactorial diseases are discussed. Using examples from clinical practice, algorithms for the application and selection of sequencing technologies are demonstrated. As a result of the use of sequencing technologies, it has now become possible to determine the molecular mechanism of the development of monogenic, orphan and multifactorial diseases, the knowledge of which is necessary for personalized patient therapy. In science, these technologies paved the way for international genome projects — the Human Genome Project, the HapMap, 1000 Genomes Project, the Personalized Genome Project, etc.

Journal ArticleDOI
TL;DR: In this article, semi-structured pre- and post-sequencing interviews were conducted with each participant to identify key themes raised after being sequenced; to evaluate how their experience of the procedure evolved over time, a questionnaire also gathered their views 3 years after they received their genomic data.
Abstract: Whole-genome sequencing (WGS) can provide valuable health insight for research participants or patients. Opportunities to be sequenced are increasing as direct-to-consumer (DTC) testing becomes more prevalent, but it is still fairly unusual to have been sequenced. We offered WGS to fourteen professionals with pre-existing familiarity with and interest in human genetics, working in healthcare, science, policy and art. Participants received a hard drive containing their personal sequence data files (.BAM, .gvcf), without further explanation or obligation, to consider how experiencing WGS firsthand might influence their professional attitudes. We performed semi-structured pre- and post-sequencing interviews with each participant to identify key themes that they raised after being sequenced. To evaluate how their experience of the procedure evolved over time, we also conducted a questionnaire to gather their views 3 years after receiving their genomic data. Participants were generally satisfied with the experience (all 14 participants would choose to participate again). They mostly decided to participate out of curiosity (personal) and to learn from the experience (professional). Whereas most participants slightly developed their original perspective on genetic data, a small selection of them radically changed their views over the course of the project. We conclude that personal experience of sequencing provides an interesting alternative perspective for experts involved in leading, planning, implementing or researching genome sequencing services. Moreover, the personal experience may provide professionals with a better understanding of the challenges visitors of the Genetics Clinic of the Future may face.

Journal ArticleDOI
01 Dec 2022-Plants
TL;DR: Wang et al. as discussed by the authors used k-mer and flow cytometric analysis to estimate the genome size of I. chinensis to be around 618-655 Mb, with the GC content, heterozygous rate, and repeat sequence rate of 37.52%, 1.1%, and 38%, respectively.
Abstract: Ilex chinensis Sims. is an evergreen arbor species with high ornamental and medicinal value that is widely distributed in China. However, there is a lack of molecular and genomic data for this plant, which severely restricts the development of its relevant research. To obtain the whole reference genome, we first conducted a genome survey of I. chinensis by next-generation sequencing (NGS) to perform de novo whole-genome sequencing. As a result, our estimates using k-mer and flow cytometric analysis suggested the genome size of I. chinensis to be around 618–655 Mb, with the GC content, heterozygous rate, and repeat sequence rate of 37.52%, 1.1%, and 38%, respectively. A total of 334,649 microsatellite motifs were detected from the I. chinensis genome data, which will provide basic molecular markers for germplasm characterization, genetic diversity, and QTL mapping studies for I. chinensis. In summary, the I. chinensis genome is complex with high heterozygosity and few repeated sequences. Overall, this is the first report on the genome features of I. chinensis, and the information may lay a strong groundwork for future whole-genome sequencing and molecular breeding studies of this species.
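The k-mer estimate mentioned in this abstract is commonly computed as the total number of k-mers divided by the depth of the main peak in the k-mer depth histogram. The sketch below shows that arithmetic only. The function name, the `min_depth` error cutoff, and the toy histogram are assumptions for illustration; the abstract does not say which tool or exact formula the authors used.

```python
def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size from a k-mer depth histogram.

    histogram: {depth: number of distinct k-mers observed at that depth}
    min_depth: depths below this are dropped as likely sequencing errors
               (an assumed, tunable cutoff)
    Returns (estimated size in k-mer positions, peak depth).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth

# Toy histogram (made-up counts): an error spike at depth 1, a main peak at 20.
toy_histogram = {1: 5000, 2: 100, 19: 300, 20: 1000, 21: 300}
size, peak = estimate_genome_size(toy_histogram)
```

For a highly heterozygous genome such as the one reported here, the histogram typically shows a second peak near half the main depth, and the tallest peak is not guaranteed to be the homozygous one, so the peak should be checked by eye before trusting the estimate.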

Posted ContentDOI
14 Nov 2022-bioRxiv
TL;DR: FINDEL as discussed by the authors is a deep learning-based software that can efficiently remove sequencing artifacts from cancer samples by querying the variant call format file which is much more compact than BAM files.
Abstract: Next-generation sequencing technologies have increased sequencing throughput 100- to 1,000-fold and subsequently reduced the cost of sequencing a human genome to approximately US$1,000. However, sequencing artifacts can cause erroneous identification of variants and adversely impact downstream analyses. Currently, manual inspection of variants for additional refinement is still necessary for high-quality variant calls. The inspection is usually done on large binary alignment map (BAM) files, which consumes a huge amount of labor and time. It also suffers from a lack of standardization and reproducibility. Here we show that the use of mutational signatures coupled with deep learning can replace the current standards in the bioinformatics workflow. This software, called FINDEL, can efficiently remove sequencing artifacts from cancer samples. It queries the variant call format file, which is much more compact than BAM files. The software automates the variant refinement process and produces high-quality variant calls.
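To give a sense of what can be read directly from a variant call format (VCF) file, the sketch below tallies single-base substitutions into the six pyrimidine-normalized classes (C>A, C>G, C>T, T>A, T>C, T>G) conventionally used as the basis of mutational signatures. It is not FINDEL's code: the function names and the toy records are hypothetical, and the abstract does not describe how the software builds its features.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def substitution_class(ref, alt):
    """Return a pyrimidine-normalized class such as 'C>T', or None if not an SNV."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        return None  # indels, multi-base records, or no change
    if ref not in COMPLEMENT or alt not in COMPLEMENT:
        return None  # symbolic alleles, N, '*', etc.
    if ref in ("A", "G"):
        # Report purine references on the opposite strand so REF is C or T.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def count_classes(vcf_lines):
    """Count substitution classes over the data lines of a VCF file."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns of the VCF format
        for alt in alts.split(","):       # ALT may list several alleles
            cls = substitution_class(ref, alt)
            if cls is not None:
                counts[cls] = counts.get(cls, 0) + 1
    return counts

# Toy records (made up): two SNVs and one deletion.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tG\tA\t.\tPASS\t.",
    "1\t200\t.\tC\tT\t.\tPASS\t.",
    "1\t300\t.\tAT\tA\t.\tPASS\t.",
]
toy_counts = count_classes(toy_vcf)
```

Full signature analysis usually also needs the flanking bases from the reference genome, which a VCF record alone does not carry; this sketch stops at the substitution class for that reason.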

Journal ArticleDOI
23 Jun 2022-Genes
TL;DR: This mini review discusses such developments from the viewpoint of the Stickler’s higher specialist service, detailing the considerations and improvements to diagnostic sequencing implemented since 2003.
Abstract: Diagnostic genetics within the United Kingdom National Health Service (NHS) has undergone many stepwise improvements in technology since the completion of the human genome project in 2003. Although Sanger sequencing has remained a cornerstone of the diagnostic sequencing arena, the human genome reference sequence has enabled next-generation sequencing (more accurately named ‘second-generation sequencing’), to rapidly surpass it in scale and potential. This mini review discusses such developments from the viewpoint of the Stickler’s higher specialist service, detailing the considerations and improvements to diagnostic sequencing implemented since 2003.