Variant Call Format

About: Variant Call Format (VCF) is a research topic. Over its lifetime, 179 publications have appeared on this topic, receiving 93,623 citations.


Papers
Journal Article
TL;DR: The Genome Variation Format (GVF), an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data.
Abstract: Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment.

92 citations
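
The abstract above describes GVF as a tab-delimited extension of GFF3. As a rough illustration, the sketch below parses one such line, assuming the usual nine GFF3 columns with `key=value` attributes in the ninth. The record class, the function name, and the example line (including its attribute values) are hypothetical and are not taken from the 10Gen dataset.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class GvfRecord:
    """One feature line of a GFF3-style file (field names follow GFF3)."""
    seqid: str
    source: str
    type: str        # a Sequence Ontology term such as "SNV"
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: dict  # attribute key -> list of values


def parse_gvf_line(line: str) -> GvfRecord:
    """Split a tab-delimited feature line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    attributes = {}
    for pair in fields[8].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Multiple values are comma-separated; percent-escapes are decoded
        # only after splitting, as GFF3 prescribes.
        attributes[key] = [unquote(v) for v in value.split(",")]
    return GvfRecord(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),
        fields[5], fields[6], fields[7],
        attributes,
    )


# Hypothetical example line, for illustration only.
example = (
    "chr16\tsamtools\tSNV\t49291141\t49291141\t.\t+\t.\t"
    "ID=ID_1;Variant_seq=A,G;Reference_seq=G"
)
record = parse_gvf_line(example)
print(record.type, record.start, record.attributes["Variant_seq"])
```

Header and pragma lines (those starting with `#`) are not handled here; a fuller reader would skip or interpret them before calling the parser.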

Journal Article
TL;DR: VCF‐kit adds essential utilities to process and analyze VCF files, including primer generation for variant validation, dendrogram production, genotype imputation from sequence data in linkage studies, and additional tools.
Abstract: Summary: The variant call format (VCF) is a popular standard for storing genetic variation data. As a result, a large collection of tools has been developed that perform diverse analyses using VCF files. However, tools for some tasks common to statistical and population geneticists have not yet been created. To streamline these types of analyses, we created novel tools that analyze or annotate VCF files and organized these tools into a command-line based utility named VCF-kit. VCF-kit adds essential utilities to process and analyze VCF files, including primer generation for variant validation, dendrogram production, genotype imputation from sequence data in linkage studies, and additional tools. Availability and Implementation: https://github.com/AndersenLab/VCF-kit. Contact: erik.andersen@northwestern.edu.

86 citations

Journal Article
TL;DR: The GATK joint genotyping method for calling variants on RNA-seq data was validated by comparing this approach to a so-called “per-sample” method, indicating that both approaches are very close in their capacity to detect reference variants and that the joint genotyping method is more sensitive than the per-sample method.
Abstract: The Genome Analysis Toolkit (GATK) is a popular set of programs for discovering and genotyping variants from next-generation sequencing data. The current GATK recommendation for RNA sequencing (RNA-seq) is to perform variant calling from individual samples, with the drawback that only variable positions are reported. Versions 3.0 and above of GATK offer the possibility of calling DNA variants on cohorts of samples using the HaplotypeCaller algorithm in Genomic Variant Call Format (GVCF) mode. Using this approach, variants are called individually on each sample, generating one GVCF file per sample that lists genotype likelihoods and their genome annotations. In a second step, variants are called from the GVCF files through a joint genotyping analysis. This strategy is more flexible and reduces computational challenges in comparison to the traditional joint discovery workflow. Using a GVCF workflow for mining SNP in RNA-seq data provides substantial advantages, including reporting homozygous genotypes for the reference allele as well as missing data. Taking advantage of RNA-seq data derived from primary macrophages isolated from 50 cows, the GATK joint genotyping method for calling variants on RNA-seq data was validated by comparing this approach to a so-called “per-sample” method. In addition, pair-wise comparisons of the two methods were performed to evaluate their respective sensitivity, precision and accuracy using DNA genotypes from a companion study including the same 50 cows genotyped using either genotyping-by-sequencing or with the Bovine SNP50 Beadchip (imputed to the Bovine high density). Results indicate that both approaches are very close in their capacity of detecting reference variants and that the joint genotyping method is more sensitive than the per-sample method. Given that the joint genotyping method is more flexible and technically easier, we recommend this approach for variant calling in RNA-seq experiments.

72 citations
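
The abstract above compares two calling methods by sensitivity, precision and accuracy against reference DNA genotypes. The sketch below shows one common way to compute these three measures from sets of variant sites. The exact definitions used in the paper are not given in the abstract, so the formulas here are the standard confusion-matrix ones and are an assumption; the function name and the toy sites are hypothetical.

```python
def compare_calls(called, truth, evaluated):
    """Score a call set against reference genotypes.

    called    -- sites reported as variant by the caller
    truth     -- sites that are variant according to the reference genotypes
    evaluated -- every site at which both sources can be compared
    """
    evaluated = set(evaluated)
    called = set(called) & evaluated
    truth = set(truth) & evaluated

    tp = len(called & truth)              # variant in both
    fp = len(called - truth)              # called, but not in the reference set
    fn = len(truth - called)              # missed by the caller
    tn = len(evaluated - called - truth)  # non-variant in both

    def ratio(numerator, denominator):
        # Return None rather than dividing by zero on empty categories.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(evaluated)),
    }


# Toy data: ten evaluated sites on a single hypothetical chromosome.
sites = [("chr1", pos) for pos in range(100, 1100, 100)]
truth_sites = sites[:4]                 # four true variant sites
called_sites = sites[1:4] + [sites[9]]  # three correct calls, one false call
metrics = compare_calls(called_sites, truth_sites, sites)
print(metrics)
```

Running the same function once per calling method, against the same reference genotypes, gives the kind of pair-wise comparison the abstract describes.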

Journal Article
TL;DR: Jannovar, a stand‐alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis, uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society‐compliant annotations.
Abstract: Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar.

63 citations
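
The abstract above says Jannovar uses an interval tree to find all transcripts affected by a variant. Jannovar itself is written in Java, and its implementation is not shown here; the following is only a minimal Python sketch of the general technique (a centered interval tree answering point queries). The class names and transcript coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: int  # inclusive
    end: int    # inclusive
    name: str   # e.g. a transcript identifier


class IntervalTreeNode:
    """Centered interval tree over a non-empty list of intervals."""

    def __init__(self, intervals: List[Interval]):
        # The center is a median endpoint, so at least one interval spans it
        # and both subtrees are strictly smaller than the input.
        points = sorted(p for iv in intervals for p in (iv.start, iv.end))
        self.center = points[len(points) // 2]
        here = [iv for iv in intervals if iv.start <= self.center <= iv.end]
        left = [iv for iv in intervals if iv.end < self.center]
        right = [iv for iv in intervals if iv.start > self.center]
        self.by_start = sorted(here, key=lambda iv: iv.start)
        self.by_end = sorted(here, key=lambda iv: iv.end, reverse=True)
        self.left = IntervalTreeNode(left) if left else None
        self.right = IntervalTreeNode(right) if right else None

    def query_point(self, pos: int) -> List[Interval]:
        """Return every stored interval that contains pos."""
        hits: List[Interval] = []
        if pos < self.center:
            # Intervals here all end at or after center; check starts only.
            for iv in self.by_start:
                if iv.start > pos:
                    break
                hits.append(iv)
            if self.left:
                hits += self.left.query_point(pos)
        elif pos > self.center:
            # Intervals here all start at or before center; check ends only.
            for iv in self.by_end:
                if iv.end < pos:
                    break
                hits.append(iv)
            if self.right:
                hits += self.right.query_point(pos)
        else:
            hits.extend(self.by_start)
        return hits


# Hypothetical transcripts on one chromosome.
tree = IntervalTreeNode([
    Interval(100, 500, "TX_A"),
    Interval(300, 800, "TX_B"),
    Interval(900, 1200, "TX_C"),
])
affected = sorted(iv.name for iv in tree.query_point(350))
print(affected)
```

A variant that spans several bases would need a range query instead of a point query; the same tree supports that with a slightly longer traversal, which is omitted here.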

Journal Article
TL;DR: A machine independent format for storing data derived from automatic sequencing machines is described, which can store the derived sequence, the traces and a set of confidence measures for each base.
Abstract: There are now a number of machines for determining DNA sequences. These devices are currently of two types: those such as the Applied Biosystems 373A and the Pharmacia A.L.F. which interpret the sequences of samples as they run on gels within the machine, and those, such as the Bio-Rad and Amersham readers that scan and analyse conventional autoradiographs. Both types of machine can produce their data in the form of traces which represent the band intensity of each of the four base types at each position in the sequence. At present all the machines write files in different formats. We describe a machine independent format for storing data derived from automatic sequencing machines. Files in this format can store the derived sequence, the traces and a set of confidence measures for each base. We have adopted the format as the standard for our sequence handling software.

60 citations


Performance Metrics
Number of papers in the topic in previous years

Year    Papers
2022    1
2021    20
2020    17
2019    22
2018    17
2017    16