Showing papers on "Personal genomics" published in 2013


Journal ArticleDOI
TL;DR: The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
Abstract: Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.

2,958 citations
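
The smoothed sliding window mentioned in the Stacks abstract above can be illustrated with a small sketch. This is not the Stacks implementation: the Gaussian kernel, the `sigma` default, the 3-sigma cutoff and the function name `kernel_smooth` are all assumptions chosen for illustration. It shows one plausible way to average a per-SNP statistic over nearby SNPs along a reference chromosome.

```python
import math


def kernel_smooth(positions, values, sigma=150_000.0):
    """Gaussian-weighted average of a per-SNP statistic, centred on each SNP.

    positions: base-pair coordinates of SNPs on one chromosome.
    values:    one statistic per SNP (for example Fst or nucleotide diversity).
    sigma:     kernel standard deviation in bp (hypothetical default);
               SNPs farther than 3 * sigma from the centre are ignored.
    """
    smoothed = []
    for centre in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            distance = abs(pos - centre)
            if distance > 3 * sigma:
                continue  # outside the window
            weight = math.exp(-(distance ** 2) / (2 * sigma ** 2))
            weighted_sum += weight * val
            weight_total += weight
        # The centre SNP always contributes weight 1, so weight_total > 0.
        smoothed.append(weighted_sum / weight_total)
    return smoothed
```

Isolated SNPs keep their own value, while SNPs in dense regions are pulled toward their neighbours, which is the usual motivation for smoothing noisy per-site statistics before plotting them along a genome.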


Journal ArticleDOI
TL;DR: It is recommended that laboratories performing clinical sequencing seek and report mutations of the specified classes or types in the genes listed here, and the creation of an ongoing process for updating these recommendations at least annually, as further data are collected, is encouraged.

2,215 citations


Journal ArticleDOI
26 Sep 2013-Cell
TL;DR: The current state of genomics in the massively parallel sequencing era is explored, with a focus on clinical diagnostics and other aspects of medical care.

871 citations


Journal ArticleDOI
TL;DR: The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes, and indicate that combining data from two technologies can reduce coverage bias.
Abstract: DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias. We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we therefore catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage. The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.

806 citations
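
The coverage figure quoted in the abstract above (the share of the genome covered at less than 10% of the genome-wide average) corresponds to a simple calculation. The sketch below is a minimal illustration, not the authors' assay: the function name `low_coverage_fraction`, the flat list of per-position depths and the toy input are assumptions.

```python
def low_coverage_fraction(depths, threshold=0.10):
    """Fraction of positions whose depth is below threshold * mean depth.

    depths:    per-position read depths (hypothetical input format).
    threshold: fraction of the mean depth used as the cutoff (0.10 = 10%).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    cutoff = threshold * mean_depth
    poorly_covered = sum(1 for depth in depths if depth < cutoff)
    return poorly_covered / len(depths)


# Toy example: nine well-covered positions and one with no reads.
toy_depths = [100] * 9 + [0]
fraction = low_coverage_fraction(toy_depths)
```

In practice the depths would come from an alignment file rather than a Python list, and a real analysis would work per chromosome or in streaming fashion to keep memory use manageable.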


Journal ArticleDOI
TL;DR: The UCSC Genome Browser is a graphical viewer for genomic data that presents visualization of annotations mapped to genomic coordinates, and the ability to juxtapose annotations of many types facilitates inquiry-driven data mining.
Abstract: The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting.

672 citations


Journal ArticleDOI
19 Mar 2013-PLOS ONE
TL;DR: This study reports the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data, and provides a high-resolution strategy for large-scale genotyping that can be generally applicable to various species and populations.
Abstract: Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping. We call it specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: i) deep sequencing to ensure genotyping accuracy; ii) reduced representation strategy to reduce sequencing costs; iii) pre-designed reduced representation scheme to optimize marker efficiency; and iv) double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between common carp genetic map and zebrafish genome sequence map showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and can be generally applicable to various species and populations.

621 citations


Journal ArticleDOI
TL;DR: This special issue on ‘Genotyping-by-Sequencing in Ecological and Conservation Genomics’ represents a diverse set of empirical and theoretical studies that demonstrate both the utility and some of the challenges of GBS in ecological and conservation genomics.
Abstract: The fields of ecological and conservation genetics have developed greatly in recent decades through the use of molecular markers to investigate organisms in their natural habitat and to evaluate the effect of anthropogenic disturbances. However, many of these studies have been limited to narrow regions of the genome, allowing for limited inferences but making it difficult to generalize about the organisms and their evolutionary history. Tremendous advances in sequencing technology over the last decade (i.e. next-generation sequencing; NGS) have led to the ability to sample the genome much more densely and to observe the patterns of genetic variation that result from the full range of evolutionary processes acting across the genome (Allendorf et al. 2010; Stapley et al. 2010; Li et al. 2012). These studies are transforming molecular ecology by making many long-standing questions much more easily accessible in almost any organism. When studying the genetics of wild populations, it is desirable to sample tens, hundreds or even thousands of individuals. While it is now possible to sequence whole genomes for tens of individuals with small genome sizes, the sequencing of hundreds of individuals with large genomes remains prohibitively expensive, particularly where the genome sequence is unknown. Further, for the purpose of many studies, complete genomic sequence data for all individuals would be unnecessary and simply inflate the computational and bioinformatic costs. A major recent advance has been the development of genotyping-by-sequencing (GBS) approaches that allow a targeted fraction of the genome (a reduced representation library) to be sequenced with next-generation technology rather than the entire genome, even in species with little or no previous genomic information and large genomes.
The subset of the genome to be sequenced in these GBS approaches may be targeted using restriction enzymes or capture probes or by sequencing the transcriptome (reviewed in Davey et al. 2011). In the future, as sequencing technology and computational and bioinformatic methods develop further, whole-genome resequencing may become the predominant method for ecological and conservation genomics. Currently, reduced representation approaches offer the ability to not only discover genetic variants such as SNPs but also genotype individuals at these newly discovered loci in the same data. This special issue on ‘Genotyping-by-Sequencing in Ecological and Conservation Genomics’ represents a diverse set of empirical and theoretical studies that demonstrate both the utility and some of the challenges of GBS in ecological and conservation genomics. The empirical studies include demonstrations of the utility of GBS for population genomics and association mapping, as well as the development of genomic resources (i.e. large SNP data sets) for target species. The studies also illustrate some of the differences between GBS methods, in particular, aligning paired-end reads to achieve longer consensus sequences in contrast to single-end reads with shorter alignments, and double-digest versus sonication methods to fragment DNA. In addition, several papers describe advanced data pipelines for handling GBS-related sequence data and critically evaluate best practices for GBS methods and potential biases and novel features associated with GBS data. Overall, this compilation of papers emphasizes that GBS has been quickly adopted by the scientific community and is expected to become a common tool for studies in molecular ecology.

505 citations


Journal ArticleDOI
TL;DR: Two of the most commonly used platforms in research and clinical labs today are highlighted: the Life Technologies Ion Torrent Personal Genome Machine (PGM) and the Illumina MiSeq.

300 citations


Journal ArticleDOI
TL;DR: This review discusses innovative new approaches to genome sequencing, including ever finer analyses of transcriptome dynamics, genome structure and genomic variation, and provides an overview of the new insights into complex biological systems catalyzed by these technologies.
Abstract: Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that were not possible before. In this review, we discuss these innovative new approaches—including ever finer analyses of transcriptome dynamics, genome structure and genomic variation—and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.

297 citations


Journal ArticleDOI
TL;DR: An empirical Bayesian method is introduced to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.
Abstract: Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.

240 citations


Journal ArticleDOI
TL;DR: Full realization of the potential of personal genomics will benefit from network biology approaches that aim to uncover the mechanisms underlying disease pathogenesis, identify new biomarkers, and guide personalized therapeutic interventions.

Journal ArticleDOI
TL;DR: Empirical Bayesian mutation Calling enables accurate calling of mutations with low allele frequencies harboured within a minor tumour subpopulation, thus allowing for the deciphering of fine substructures within a tumour specimen.
Abstract: Recent advances in high-throughput sequencing technologies have enabled a comprehensive dissection of the cancer genome clarifying a large number of somatic mutations in a wide variety of cancer types. A number of methods have been proposed for mutation calling based on a large amount of sequencing data, which is accomplished in most cases by statistically evaluating the difference in the observed allele frequencies of possible single nucleotide variants between tumours and paired normal samples. However, an accurate detection of mutations remains a challenge under low sequencing depths or tumour contents. To overcome this problem, we propose a novel method, Empirical Bayesian mutation Calling (https://github.com/friend1ws/EBCall), for detecting somatic mutations. Unlike previous methods, the proposed method discriminates somatic mutations from sequencing errors based on an empirical Bayesian framework, where the model parameters are estimated using sequencing data from multiple non-paired normal samples. Using 13 whole-exome sequencing data with 87.5-206.3 mean sequencing depths, we demonstrate that our method not only outperforms several existing methods in the calling of mutations with moderate allele frequencies but also enables accurate calling of mutations with low allele frequencies (≤ 10%) harboured within a minor tumour subpopulation, thus allowing for the deciphering of fine substructures within a tumour specimen.

Journal ArticleDOI
TL;DR: A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with a larger number of individuals for common SNPs, and sequencing at a minimum of 30× coverage gave higher sensitivity at detecting low-frequency and rare variants.
Abstract: Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing.
Abstract: Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics.

01 Jan 2013
TL;DR: The performance of mtGenome sequencing using the Personal Genome Machine was evaluated and the resulting haplotypes were compared directly with conventional Sanger-type sequencing; the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes.

Journal ArticleDOI
TL;DR: Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals.
Abstract: The debate regarding the relative merits of whole genome sequencing (WGS) versus exome sequencing (ES) centers around comparative cost, average depth of coverage for each interrogated base, and their relative efficiency in the identification of medically actionable variants from the myriad of variants identified by each approach. Nevertheless, few genomes have been subjected to both WGS and ES, using multiple next generation sequencing platforms. In addition, no personal genome has been so extensively analyzed using DNA derived from peripheral blood as opposed to DNA from transformed cell lines that may either accumulate mutations during propagation or clonally expand mosaic variants during cell transformation and propagation. We investigated a genome that was studied previously by SOLiD chemistry using both ES and WGS, and now perform six independent ES assays (Illumina GAII (x2), Illumina HiSeq (x2), Life Technologies' Personal Genome Machine (PGM) and Proton), and one additional WGS (Illumina HiSeq). We compared the variants identified by the different methods and provide insights into the differences among variants identified between ES runs in the same technology platform and among different sequencing technologies. We resolved the true genotypes of medically actionable variants identified in the proband through orthogonal experimental approaches. Furthermore, ES identified an additional SH3TC2 variant (p.M1?) that likely contributes to the phenotype in the proband. ES identified additional medically actionable variant calls and helped resolve ambiguous single nucleotide variants (SNV) documenting the power of increased depth of coverage of the captured targeted regions. Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals.

Journal ArticleDOI
TL;DR: Findings suggest that neither the health benefits envisioned by DTC-GT proponents nor the worst fears expressed by its critics have materialized to date.
Abstract: Direct-to-consumer genetic testing (DTC-GT) has sparked much controversy and undergone dramatic changes in its brief history. Debates over appropriate health policies regarding DTC-GT would benefit from empirical research on its benefits, harms, and limitations. We review the recent literature (2011-present) and summarize findings across (1) content analyses of DTC-GT websites, (2) studies of consumer perspectives and experiences, and (3) surveys of relevant health care providers. Findings suggest that neither the health benefits envisioned by DTC-GT proponents (e.g., significant improvements in positive health behaviors) nor the worst fears expressed by its critics (e.g., catastrophic psychological distress and misunderstanding of test results, undue burden on the health care system) have materialized to date. However, research in this area is in its early stages and possesses numerous key limitations. We note needs for future studies to illuminate the impact of DTC-GT and thereby guide practice and policy regarding this rapidly evolving approach to personal genomics.

Journal ArticleDOI
Yi Wang, James T. Lu, Jin Yu, Richard A. Gibbs, Fuli Yu
TL;DR: This work describes methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage sequencing of populations, implemented in a pipeline called SNPTools, and develops an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes.
Abstract: Next-generation sequencing is a powerful approach for discovering genetic variation. Sensitive variant calling and haplotype inference from population sequencing data remain challenging. We describe methods for high-quality discovery, genotyping, and phasing of SNPs for low-coverage (approximately 5×) sequencing of populations, implemented in a pipeline called SNPTools. Our pipeline contains several innovations that specifically address challenges caused by low-coverage population sequencing: (1) effective base depth (EBD), a nonparametric statistic that enables more accurate statistical modeling of sequencing data; (2) variance ratio scoring, a variance-based statistic that discovers polymorphic loci with high sensitivity and specificity; and (3) BAM-specific binomial mixture modeling (BBMM), a clustering algorithm that generates robust genotype likelihoods from heterogeneous sequencing data. Last, we develop an imputation engine that refines raw genotype likelihoods to produce high-quality phased genotypes/haplotypes. Designed for large population studies, SNPTools' input/output (I/O) and storage aware design leads to improved computing performance on large sequencing data sets. We apply SNPTools to the International 1000 Genomes Project (1000G) Phase 1 low-coverage data set and obtain genotyping accuracy comparable to that of SNP microarray.
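
To make the notion of "genotype likelihoods" in the abstract above concrete, the following sketch computes plain binomial likelihoods for the three diploid genotypes at one biallelic site. It is deliberately not the BBMM algorithm from SNPTools, which the abstract describes as a clustering method fitted per BAM file; here the alternate-allele read probabilities are fixed by an assumed sequencing error rate, and the function name and genotype labels are illustrative choices.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Binomial likelihood of the observed read counts under each genotype.

    ref_reads, alt_reads: counts of reads supporting each allele at one site.
    error_rate: assumed probability that a read shows the wrong allele.
    Returns a dict keyed by genotype: "0/0", "0/1", "1/1".
    """
    total = ref_reads + alt_reads
    # Probability that a single read carries the alternate allele,
    # given each genotype (fixed here, not estimated from data).
    alt_probability = {
        "0/0": error_rate,
        "0/1": 0.5,
        "1/1": 1.0 - error_rate,
    }
    return {
        genotype: comb(total, alt_reads)
        * p ** alt_reads
        * (1.0 - p) ** ref_reads
        for genotype, p in alt_probability.items()
    }


# At roughly 5x coverage a site might show 3 reference and 2 alternate reads.
likelihoods = genotype_likelihoods(ref_reads=3, alt_reads=2)
best_genotype = max(likelihoods, key=likelihoods.get)
```

With so few reads the likelihoods for neighbouring genotypes are often close, which is why low-coverage pipelines typically refine them using information shared across many samples, as the imputation step in the abstract does.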

Journal ArticleDOI
TL;DR: This work states that in cancer with extreme genomic heterogeneity, applying HTS in cancer tissue samples from individual patients provides unprecedented power to reach personalized, highly effective approaches overcoming current therapeutic resistance.
Abstract: High-throughput sequencing (HTS) technologies have revolutionized biomedical research. These breakthrough platforms have rapidly evolved from next-generation sequencing (NGS) or second-generation (2G) platforms to third-generation (3G) and fourth-generation (4G) sequencing machines. Sequencing, mapping and comparing the genomes of cells in healthy and disease states cheaply, rapidly and accurately can alter the way clinicians think about how to treat patients, shifting from traditional medicine to a genome-based era of preventive and therapeutic decisions. Particularly in cancer with extreme genomic heterogeneity, applying HTS in cancer tissue samples from individual patients provides unprecedented power to reach personalized, highly effective approaches overcoming current therapeutic resistance.

Journal ArticleDOI
TL;DR: Some key aspects of machine learning that make it useful for genome annotation are explained, with illustrative examples from ENCODE.
Abstract: By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE.

Journal ArticleDOI
23 Jul 2013-PLOS ONE
TL;DR: Undergoing personal genome testing and using personal genotype data in the classroom enhanced students' self-reported and assessed knowledge of genomics, and did not appear to cause significant anxiety.
Abstract: An emerging debate in academic medical centers is not about the need for providing trainees with fundamental education on genomics, but rather the most effective educational models that should be deployed. At Stanford School of Medicine, a novel hands-on genomics course was developed in 2010 that provided students the option to undergo personal genome testing as part of the course curriculum. We hypothesized that use of personal genome testing in the classroom would enhance the learning experience of students. No data currently exist on how such methods impact student learning; thus, we surveyed students before and after the course to determine its impact. We analyzed responses using paired statistics from the 31 medical and graduate students who completed both pre-course and post-course surveys. Participants were stratified by those who did (N = 23) or did not (N = 8) undergo personal genome testing. In reflecting on the experience, 83% of students who underwent testing stated that they were pleased with their decision compared to 12.5% of students who decided against testing (P = 0.00058). Seventy percent of those who underwent personal genome testing self-reported a better understanding of human genetics on the basis of having undergone testing. Further, students who underwent personal genome testing demonstrated an average 31% increase in pre- to post-course scores on knowledge questions (P = 3.5×10−6); this was significantly higher (P = 0.003) than students who did not undergo testing, who showed a non-significant improvement. Undergoing personal genome testing and using personal genotype data in the classroom enhanced students' self-reported and assessed knowledge of genomics, and did not appear to cause significant anxiety. At least for self-selected students, the incorporation of personal genome testing can be an effective educational tool to teach important concepts of clinical genomic testing.

Journal ArticleDOI
11 Jun 2013-PLOS ONE
TL;DR: A detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions is presented.
Abstract: The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole-genome studies. Here we present a detailed comparison of the performance of all currently available whole-genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and rule out others, which helps to identify the proper sequencing platform for whole-genome studies with different application scopes.
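The sensitivity and false-positive comparison described in this abstract, and the idea of integrating calls from several platforms, can be illustrated with a small sketch. Everything here is hypothetical: the variant tuples, the platform names, and the toy truth set are invented for illustration, and the "false-positive fraction" is assumed to mean the share of a platform's calls that are absent from the truth set, which may differ from the definition used in the paper.

```python
from typing import Set, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def sensitivity(calls: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of true variants that the platform called."""
    return len(calls & truth) / len(truth)


def false_positive_fraction(calls: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of the platform's calls that are not in the truth set (assumed definition)."""
    return len(calls - truth) / len(calls)


# Toy data, invented for illustration only.
A = ("chr1", 100, "A", "G")
B = ("chr1", 250, "C", "T")
C = ("chr2", 40, "G", "A")
D = ("chr3", 900, "T", "C")
X = ("chr4", 5, "A", "T")  # a call that is not in the truth set

truth = {A, B, C, D}
platform_a = {A, B, C, X}  # hypothetical platform: sensitive, one false call
platform_b = {A, D}        # hypothetical platform: conservative, no false calls

# Two simple ways of integrating platforms:
union_calls = platform_a | platform_b      # favours sensitivity
consensus_calls = platform_a & platform_b  # favours a low false-positive fraction

for name, calls in [
    ("platform_a", platform_a),
    ("platform_b", platform_b),
    ("union", union_calls),
    ("consensus", consensus_calls),
]:
    print(
        f"{name:11s} sensitivity={sensitivity(calls, truth):.2f} "
        f"false_positive_fraction={false_positive_fraction(calls, truth):.2f}"
    )
```

On this toy data the union recovers every true variant at the cost of keeping the false call, while the consensus keeps no false calls but misses most true variants. That is the kind of trade-off a combined analysis has to weigh.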

Journal ArticleDOI
TL;DR: The ‘thousand-dollar genome’ will also become increasingly important to healthcare, and the applications that come within reach raise a number of ethical questions.
Abstract: Summary. Analysing the entire human genome: Whole-genome sequencing (WGS) can lead to whole-genome analysis (WGA), in which the meaning of the raw data obtained during sequencing is fleshed out. This is done using software integrating the latest scientific insights into the relationship between genes and health. Filters may be used to selectively examine certain parts of the genome (targeted analysis), for example when diagnosing diseases with a known genetic substrate. Use of filters helps limit the amount of non-relevant information. Using this approach, WGS-based diagnostic testing yields results that are not different from diagnostic testing with existing methods, such as DNA chips. If WGS becomes cheap enough in future, it will likely become the standard approach. Diagnostic testing: WGA (complete sequence analysis) is also expected to play a role in healthcare, specifically in the diagnosis of diseases for which the genetic background is not yet (or insufficiently) clear. Searching the entire genome will often allow a diagnosis to be made. This approach has already been implemented, albeit with less powerful techniques. Genome-wide diagnostic testing inevitably means that far more genetic information about the patient is revealed than is necessary for answering the clinical question. Among other things, this raises questions about the feasibility of informed consent, the possibility to shape the ‘right not to know’ and the limits to the requirement to inform patients. What should happen with the (raw) sequencing data afterwards? Should it be stored? May it be destroyed? What about the analysis findings (genetic information): should all unsought findings also be saved? What about genetic information not desired by the patient, and therefore not supplied to him or her? Finally: can a doctor be expected to recontact the patient if new scientific knowledge means that data obtained from past WGA may be viewed in a new light?
Screening: Some commentators expect that WGS and WGA for every individual will be worthwhile in a few years. This would be performed without a concrete medical indication, meaning it is screening rather than diagnostic testing. Whole-genome screening creates a personal genomic database (personal genome) that can subsequently be used to deliver ‘personalised medicine’ to individual patients. While the first steps in this direction have already been taken (particularly in the field of pharmacogenetics: ‘personalised medication’), this largely remains something for the future. According to some, analysing the personal genome would ideally be done when people reach the legal age. The individual can then decide for himself whether or not to take part in this form of screening, and it is still early enough in life to benefit sufficiently. The usefulness may consist of lifestyle advice, treatment and prevention tailored to the personal health profile, but also of risk information that could affect reproductive decisions. In addition to the (currently largely hypothetical) advantages of analysing the personal genome, there are also all too real disadvantages to obtaining information that could be burdensome or even harmful. Disadvantages include worry caused by (still) unclear findings and the resulting, often unnecessary, contacts with healthcare. As long as there is no clear positive balance of advantages and disadvantages, there can be no responsible implementation of whole-genome population screening within public healthcare. However, as soon as WGS/WGA becomes cheap enough, commercial parties will likely see a market. Whole-genome tests are already commercially available, albeit currently implementing methods that only examine small, common variations in the genome. The existing commercial availability of preconception testing for recessive genetic conditions to individuals and couples who wish to have children is also a potential area for expansion. Application of WGS in this context can easily lead to the question of why analysis should be limited to finding out about carrier risk status. If removing filters is enough to obtain a WGA, the question is no longer what we do, but what we do not want to know about ourselves.

Journal ArticleDOI
TL;DR: This review highlights some of the major results and discoveries stemming from high-throughput DNA sequencing research in the understanding of Mendelian genetic disorders, hematologic cancer biology, infectious diseases, the immune system, transplant biology, and prenatal diagnostics.
Abstract: Advances in DNA sequencing technology have allowed comprehensive investigation of the genetics of human beings and human diseases. Insights from sequencing the genomes, exomes, or transcriptomes of healthy and diseased cells in patients are already enabling improved diagnostic classification, prognostication, and therapy selection for many diseases. Understanding the data obtained using new high-throughput DNA sequencing methods, choices made in sequencing strategies, and common challenges in data analysis and genotype-phenotype correlation is essential if pathologists, geneticists, and clinicians are to interpret the growing scientific literature in this area. This review highlights some of the major results and discoveries stemming from high-throughput DNA sequencing research in our understanding of Mendelian genetic disorders, hematologic cancer biology, infectious diseases, the immune system, transplant biology, and prenatal diagnostics. Transition of new DNA sequencing methodologies to the clinical laboratory is under way and is likely to have a major impact on all areas of medicine.

Journal ArticleDOI
TL;DR: The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
Abstract: Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. We generated a unique resource of novel SNP markers for the European eel using the RAD sequencing approach that was simultaneously identified and scored in a genome-wide scan of 30 individuals. Whereas genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome-wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long-term effective population size was estimated to range between 132 000 and 1 320 000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource consisting of 82 425 loci and 376 918 associated SNPs provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
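The effective population size range quoted in this abstract is consistent with the standard neutral-equilibrium relation for a diploid population, π = 4·Ne·μ, rearranged to Ne = π / (4μ). The sketch below reproduces the arithmetic. The two per-site mutation rates (1e-8 and 1e-9 per generation) are an assumption chosen because they recover the reported range; the abstract itself does not state which rates were used.

```python
def effective_population_size(pi: float, mu: float) -> float:
    """Long-term effective population size for a diploid under pi = 4 * Ne * mu."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4.0 * mu)


# Average nucleotide diversity reported in the abstract.
PI_MEAN = 0.00529

# ASSUMPTION: per-site, per-generation mutation rates; not given in the abstract.
ASSUMED_MUTATION_RATES = (1e-8, 1e-9)

estimates = [effective_population_size(PI_MEAN, mu) for mu in ASSUMED_MUTATION_RATES]

for mu, ne in zip(ASSUMED_MUTATION_RATES, estimates):
    print(f"mu = {mu:.0e}  ->  Ne ~ {ne:,.0f}")
```

Under these assumed rates the estimates come out near 132 000 and 1 320 000, matching the range in the abstract. Because Ne scales inversely with μ, the tenfold spread in the estimate comes entirely from the tenfold uncertainty in the mutation rate.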

Journal ArticleDOI
TL;DR: This work first describes sequence-based improvements to existing study designs, followed by prioritization of both samples and genomic regions to be sequenced, and then addresses the ultimate goal of analyzing thousands of whole-genome sequences.

Journal ArticleDOI
TL;DR: A discourse analysis of ways in which genetic counseling is represented on DTC GT websites, blogs and other online material demonstrates shifting professional roles and forms of expertise in genetic counseling.
Abstract: Individuals now have access to an increasing number of internet resources offering personal genomics services. As the direct-to-consumer genetic testing (DTC GT) industry expands, critics have called for pre- and post-test genetic counseling to be included with the product. Several genetic testing companies offer genetic counseling. To date, there has been no examination of this service provision, of whether it meets critics’ concerns, or of the implications it may have for the genetic counseling profession. Considering the increasing relevance of genetics in healthcare, the complexity of genetic information provided by DTC GT, the mediating role of the internet in counseling, and potential conflicts of interest, this is a topic which deserves further attention. In this paper we offer a discourse analysis of ways in which genetic counseling is represented on DTC GT websites, blogs and other online material. This analysis identified four types of genetic counseling represented on the websites: the integrated counseling product; discretionary counseling; independent counseling; and product advice. Genetic counselors are represented as having the following roles: genetics educator; mediator; lifestyle advisor; risk interpreter; and entrepreneur. We conclude that genetic counseling as represented on DTC GT websites demonstrates shifting professional roles and forms of expertise in genetic counseling. Genetic counselors are also playing an important part in how the genetic testing market is taking shape. Our analysis offers important and timely insights into recent developments in the genetic counseling profession, which have relevance for practitioners, researchers and policy makers concerned with the evolving field of personal genomics.

Journal ArticleDOI
TL;DR: It is evident that pharmacogenomics and individualized drug therapy are the building blocks of personalized medicine.
Abstract: Technology continues to lead the field of personalized medicine as the interpretation of the human genome progresses. The cost and duration of genomic sequencing continue to decrease sharply, and there is intensive research aimed at understanding how changes within the genome can alter its function and which genomic variations underlie individual susceptibility to disease and response to therapy. Overlaying a personal genome on a patient's medical record has the potential to improve prediction and prevention and to allow a more proactive therapeutic strategy. It is evident that pharmacogenomics and individualized drug therapy are the building blocks of personalized medicine. A growing number of drugs are now used for the treatment of cancer in subjects selected by a companion genetic test. Personalized medicine, while based upon genomic knowledge of the individual, requires equally essential personalized environmental information as well as an understanding of every subject's capacity for health-promoting behaviour.

Journal ArticleDOI
TL;DR: There was considerable willingness to participate in and desire for personal results from genomics research in this sample of predominantly low-income, Hispanic and African American patients.
Abstract: Patients from traditionally underrepresented communities need to be involved in discussions around genomics research including attitudes towards participation and receiving personal results. Structured interviews, including open-ended and closed-ended questions, were conducted with 205 patients in an inner-city hospital outpatient clinic: 48 % of participants self-identified as Black or African American, 29 % Hispanic, 10 % White; 49 % had an annual household income of <$20,000. When the potential for personal results to be returned was not mentioned, 82 % of participants were willing to participate in genomics research. Reasons for willingness fell into four themes: altruism; benefit to family members; personal health benefit; personal curiosity and improving understanding. Reasons for being unwilling fell into five themes: negative perception of research; not personally relevant; negative feelings about procedures (e.g., blood draws); practical barriers; and fear of results. Participants were more likely to report that they would participate in genomics research if personal results were offered than if they were not offered (89 vs. 62 % respectively, p < 0.001). Participants were more interested in receiving personal genomic risk results for cancer, heart disease and type 2 diabetes than obesity (89, 89, 91, 80 % respectively, all p < 0.001). The only characteristic consistently associated with interest in receiving personal results was disease-specific worry. There was considerable willingness to participate in and desire for personal results from genomics research in this sample of predominantly low-income, Hispanic and African American patients. When returning results is not practical, or even when it is, alternatively or additionally providing generic information about genomics and health may also be a valuable commodity to underrepresented minority and other populations considering participating in genomics research.

Journal ArticleDOI
TL;DR: Personal genotyping may improve students' self-reported motivation and engagement with course material, however, consultative support that is different from traditional genetic counseling will be necessary to support students.
Abstract: Direct-to-consumer (DTC) personal genotyping services are beginning to be adopted by educational institutions as pedagogical tools for learning about human genetics. However, there is little known about student reactions to such testing. This study investigated student experiences and attitudes towards DTC personal genome testing. Individual interviews were conducted with students who chose to undergo personal genotyping in the context of an elective genetics course. Ten medical and graduate students were interviewed before genotyping occurred, and at 2 weeks and 6 months after receiving their genotype results. Qualitative analysis of interview transcripts assessed the expectations and experiences of students who underwent personal genotyping, how they interpreted and applied their results, how the testing affected the quality of their learning during the course, and what their perceived needs for support were. Students stated that personal genotyping enhanced their engagement with the course content. Although students expressed skepticism over the clinical utility of some test results, they expressed significant enthusiasm immediately after receiving their personal genetic analysis, and were particularly interested in results such as drug response and carrier testing. However, few reported making behavioral changes or following up on specific results through a healthcare provider. Students did not report utilizing genetic counseling, despite feeling strongly that the 'general public' would need these services. In follow-up interviews, students exhibited poor recall on details of the consent and biobanking agreements, but expressed little regret over their decision to undergo genotyping. Students reported mining their raw genetic data, and conveyed a need for further consultation support in their exploration of genetic variants. Personal genotyping may improve students' self-reported motivation and engagement with course material. However, consultative support that is different from traditional genetic counseling will be necessary to support students. Before incorporating personal genotyping into coursework, institutions should lead multi-disciplinary discussion to anticipate issues and incorporate teaching mechanisms that engage the ethical, legal, and social implications of personal genotyping, including addressing those found in this study, to go beyond what is offered by commercial providers.