Topic

Variant Call Format

About: Variant Call Format is a research topic. Over its lifetime, 179 publications on this topic have been published, receiving 93,623 citations. The topic is also known as: VCF.


Papers
Journal ArticleDOI
TL;DR: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, including a prioritization analysis of detected variants, and provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants.
Abstract: Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/.

162 citations

Journal ArticleDOI
TL;DR: The advent of next-generation sequencing (NGS) in 2010 has transformed medicine, particularly the growing field of inborn errors of immunity, and whole-exome sequencing (WES) is presently the most cost-effective approach for research and diagnostics.
Abstract: The advent of next-generation sequencing (NGS) in 2010 has transformed medicine, particularly the growing field of inborn errors of immunity. NGS has facilitated the discovery of novel disease-causing genes and the genetic diagnosis of patients with monogenic inborn errors of immunity. Whole-exome sequencing (WES) is presently the most cost-effective approach for research and diagnostics, although whole-genome sequencing offers several advantages. The scientific or diagnostic challenge consists in selecting 1 or 2 candidate variants among thousands of NGS calls. Variant- and gene-level computational methods, as well as immunologic hypotheses, can help narrow down this genome-wide search. The key to success is a well-informed genetic hypothesis on 3 key aspects: mode of inheritance, clinical penetrance, and genetic heterogeneity of the condition. This determines the search strategy and selection criteria for candidate alleles. Subsequent functional validation of the disease-causing effect of the candidate variant is critical. Even the most up-to-date dry lab cannot clinch this validation without a seasoned wet lab. The multifariousness of variations entails an experimental rigor even greater than traditional Sanger sequencing-based approaches in order not to assign a condition to an irrelevant variant. Finding the needle in the haystack takes patience, prudence, and discernment.

140 citations

Patent
18 Jul 2002
TL;DR: This patent describes a system for mapping elements from a first XML format to a second XML format using an interface that allows a user to associate elements from the first format with elements of the second format.
Abstract: Methods, apparatuses and systems facilitating mapping of elements from a first XML format to a second XML format using an interface that allows a user to associate elements from the first format to the second format. In some embodiments, a mapping can cause a direct transfer of a value in an input document to an output document. Maps can also be augmented with textual strings and scripts, for example, that can save a value from a first file format to a variable that can be accessed by another script, or save a value in association with a result tag in the second file format. A single tag from the first format can be mapped to multiple tags from the second format, multiple tags from the first format can be mapped to a single tag in the second format, and a single tag in the first format can be mapped multiple times to a single tag in the second format.

131 citations

Journal ArticleDOI
TL;DR: LDBlockShow, an open-source tool for visualizing LD and haplotype blocks from variant call format files, is developed; it saves time and memory, can compress SVG files with a large number of SNPs, and supports subgroup analysis.
Abstract: The triangular correlation heatmap, which visualizes the linkage disequilibrium (LD) pattern and haplotype block structure of SNPs, is a ubiquitous component of population-based genetic studies. However, current tools are time- and memory-consuming. Here, we developed LDBlockShow, an open-source software for visualizing LD and haplotype blocks from variant call format files. It saves both time and memory. In a test dataset with 100 SNPs from 60 000 subjects, it was at least 10.60 times faster and used only 0.03-13.33% of physical memory as compared with other tools. In addition, it can generate figures that simultaneously display additional statistical context (e.g. association P-values) and genomic region annotations. It can also compress the SVG files with a large number of SNPs and supports subgroup analysis. This fast and convenient tool will facilitate the visualization of LD and haplotype blocks for geneticists.

110 citations

Journal ArticleDOI
TL;DR: A new WGS variant data format, implemented in the R/Bioconductor package 'SeqArray', is introduced for storing variant calls in an array-oriented manner; it provides the same capabilities as VCF, but with multiple high-compression options and data access using high-performance parallel computing.
Abstract: Motivation: Whole-genome sequencing (WGS) data are being generated at an unprecedented rate. Analysis of WGS data requires a flexible data format to store the different types of DNA variation. Variant call format (VCF) is a general text-based format developed to store variant genotypes and their annotations. However, VCF files are large and data retrieval is relatively slow. Here we introduce a new WGS variant data format implemented in the R/Bioconductor package 'SeqArray' for storing variant calls in an array-oriented manner which provides the same capabilities as VCF, but with multiple high compression options and data access using high-performance parallel computing. Results: Benchmarks using 1000 Genomes Phase 3 data show file sizes are 14.0 Gb (VCF), 12.3 Gb (BCF, binary VCF), 3.5 Gb (BGT) and 2.6 Gb (SeqArray) respectively. Reading genotypes in the SeqArray package is two to three times faster compared with the htslib C library using BCF files. For the allele frequency calculation, the implementation in the SeqArray package is over 5 times faster than PLINK v1.9 with VCF and BCF files, and over 16 times faster than vcftools. When used in conjunction with R/Bioconductor packages, the SeqArray package provides users a flexible, feature-rich, high-performance programming environment for analysis of WGS variant data. Availability and implementation: http://www.bioconductor.org/packages/SeqArray. Contact: zhengx@u.washington.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

108 citations
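
For readers unfamiliar with the format, the allele frequency calculation mentioned in the abstract above can be illustrated with a minimal Python sketch that reads VCF text directly. This is not how SeqArray, PLINK or vcftools implement it. The function name `alt_allele_frequencies` and the two example records are hypothetical, and the sketch assumes tab-separated records with a `GT` entry in the FORMAT column. For simplicity, every non-reference allele is counted as ALT, and missing calls (`.`) are ignored.

```python
from typing import Dict, Iterable, Tuple


def alt_allele_frequencies(vcf_lines: Iterable[str]) -> Dict[Tuple[str, int], float]:
    """Return {(CHROM, POS): non-reference allele frequency} from VCF text lines."""
    freqs: Dict[Tuple[str, int], float] = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta-information (##...) and the header (#CHROM...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
        if len(fields) < 10:
            continue  # sites-only record: no genotypes to count
        chrom, pos = fields[0], int(fields[1])
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")

        alt_count = 0
        called = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Alleles are separated by "/" (unphased) or "|" (phased).
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call
                called += 1
                if allele != "0":
                    alt_count += 1  # any non-reference allele (simplification)
        if called:
            freqs[(chrom, pos)] = alt_count / called
    return freqs


# Hypothetical two-variant, three-sample example (not real data).
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS", ".", "GT",
               "0/1", "1/1", "0/0"]),
    "\t".join(["1", "200", ".", "C", "T", ".", "PASS", ".", "GT:DP",
               "0|0:12", "./.:.", "0|1:9"]),
])

if __name__ == "__main__":
    for site, freq in alt_allele_frequencies(EXAMPLE_VCF.splitlines()).items():
        print(site, freq)
```

Parsing text line by line like this is simple but slow on large cohorts, which is the limitation that binary and array-oriented formats such as BCF and SeqArray aim to address.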


Network Information
Related Topics (5)
Genome
74.2K papers, 3.8M citations
77% related
Gene expression profiling
26.9K papers, 1.7M citations
74% related
Exon
38.3K papers, 1.7M citations
72% related
Intron
23.8K papers, 1.3M citations
72% related
DNA methylation
49.8K papers, 2.5M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    20
2020    17
2019    22
2018    17
2017    16