
Showing papers on "Personal genomics published in 2012"


Journal ArticleDOI
TL;DR: Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number of complete plant genomes.
Abstract: The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

3,728 citations


Journal ArticleDOI
TL;DR: A novel approach and database, RegulomeDB, guides interpretation of regulatory variants in the human genome; it includes high-throughput experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations, to assess putative regulatory potential and identify functional variants.
Abstract: As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
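The scoring idea lends itself to a compact illustration. Below is a minimal Python sketch of evidence-based category assignment in the spirit of RegulomeDB; the evidence labels and the coarse category cutoffs are simplified assumptions, not the database's actual 1a-6 scheme.

```python
# Toy evidence-to-category scoring in the spirit of RegulomeDB.
# The labels and cutoffs are illustrative, not the real scheme.

def score_variant(evidence):
    """Map a set of evidence labels to a coarse category (lower = stronger)."""
    if "eQTL" in evidence and {"TF_binding", "DNase_peak"} <= evidence:
        return 1  # binding evidence plus a link to gene expression
    if {"TF_binding", "matched_motif", "DNase_peak"} <= evidence:
        return 2  # several direct lines of binding evidence
    if "TF_binding" in evidence or "DNase_peak" in evidence:
        return 4  # minimal binding evidence
    if "motif_hit" in evidence:
        return 5  # motif hit only
    return 6      # no regulatory annotation

variants = {
    "rs0001": {"eQTL", "TF_binding", "DNase_peak"},
    "rs0002": {"motif_hit"},
    "rs0003": set(),
}
for rsid, evidence in variants.items():
    print(rsid, "-> category", score_variant(evidence))
```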

2,355 citations


Journal ArticleDOI
TL;DR: This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies.
Abstract: Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

424 citations


Journal ArticleDOI
TL;DR: wANNOVAR is a web server for functional annotation of genetic variants from personal genomes, providing a simple and intuitive interface to help users determine the functional significance of variants.
Abstract: Background High-throughput DNA sequencing platforms have become widely available. As a result, personal genomes are increasingly being sequenced in research and clinical settings. However, the resulting massive amounts of variant data pose significant challenges to biologists and clinicians without bioinformatics skills. Methods and results We developed a web server called wANNOVAR to address the critical need for functional annotation of genetic variants from personal genomes. The server provides a simple and intuitive interface to help users determine the functional significance of variants. These functions include annotating single nucleotide variants and insertions/deletions for their effects on genes, reporting their conservation levels (such as PhyloP and GERP++ scores), calculating their predicted functional importance scores (such as SIFT and PolyPhen scores), retrieving allele frequencies in public databases (such as the 1000 Genomes Project and NHLBI-ESP 5400 exomes), and implementing a ‘variants reduction’ protocol to identify a subset of potentially deleterious variants/genes. We illustrate how wANNOVAR can help draw biological insights from sequencing data by analysing genetic variants from two Mendelian diseases. Conclusions We conclude that wANNOVAR will help biologists and clinicians take advantage of personal genome information to expedite scientific discoveries. The wANNOVAR server is available online and will be continuously updated to reflect the latest annotation information.
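The 'variants reduction' protocol is essentially a sequence of filters, which can be sketched in a few lines. A minimal Python illustration follows; the field names, thresholds and filter order are assumptions made for the example, not wANNOVAR's actual defaults.

```python
# Sketch of a 'variants reduction' filter chain: keep protein-altering,
# rare, predicted-deleterious variants. Thresholds are illustrative.

DAMAGING = {"nonsynonymous", "stopgain", "frameshift"}

def reduce_variants(variants, max_af=0.01, sift_cutoff=0.05):
    kept = []
    for v in variants:
        if v["effect"] not in DAMAGING:
            continue                      # step 1: keep protein-altering calls
        if v.get("af_1000g", 0.0) > max_af:
            continue                      # step 2: drop common alleles
        if v.get("sift", 1.0) >= sift_cutoff:
            continue                      # step 3: keep predicted deleterious
        kept.append(v)
    return kept

demo = [
    {"id": "var1", "effect": "nonsynonymous", "af_1000g": 0.30, "sift": 0.01},
    {"id": "var2", "effect": "stopgain", "af_1000g": 0.001, "sift": 0.0},
]
print([v["id"] for v in reduce_variants(demo)])  # -> ['var2']
```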

365 citations


Journal ArticleDOI
12 Jul 2012-Nature
TL;DR: A low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes, is described.
Abstract: Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ∼100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10–20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications. A new DNA analysis method termed long fragment read technology is described, and the approach is used to determine parental haplotypes and to sequence human genomes cost-effectively and accurately from only 10 to 20 cells. Many of the hoped-for advances in the field of personalized medicine are dependent on the development of low-cost genome-sequencing technology that combines clinical accuracy with the ability to describe the context (the genetic haplotype) in which variants occur on an individual chromosome. The technique described here, termed long-fragment read technology, is similar to that used to sequence long single DNA molecules, but without DNA cloning or chromosome separation. The authors demonstrate the potential of this approach by generating seven accurate human genome sequences, as well as haplotype data, from samples containing just 10–20 cells. This advance shows that it should be possible to achieve clinical quality and scale in personal genome sequencing of microbiopsies and circulating cancer cells.
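The core of LFR phasing is that alleles observed together in the same aliquot of a dilute fragment pool almost certainly derive from one long parental DNA fragment, so co-observed alleles can be chained into haplotype contigs. Here is a toy Python sketch of that linking step using union-find; the well data and format are invented for illustration, and the real pipeline involves far more statistics and error handling.

```python
# Toy LFR-style phasing: link alleles co-observed in one aliquot (well)
# into haplotype contigs via union-find. Input data are invented.

from collections import defaultdict

# each well reports (variant position, allele) pairs from one fragment pool
wells = [
    [(100, "A"), (350, "G")],
    [(350, "G"), (900, "T")],
    [(100, "C"), (900, "A")],  # the other parental haplotype
]

parent = {}
def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for obs in wells:
    for a, b in zip(obs, obs[1:]):      # link alleles co-observed in one well
        union(a, b)

contigs = defaultdict(list)
for allele in parent:
    contigs[find(allele)].append(allele)
for members in contigs.values():
    print(sorted(members))
# -> [(100, 'A'), (350, 'G'), (900, 'T')] and [(100, 'C'), (900, 'A')]
```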

320 citations


Journal ArticleDOI
TL;DR: This work defines a custom data processing pipeline for Pacific Biosciences human data, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs.
Abstract: Pacific Biosciences technology provides a fundamentally new data type with the potential to overcome some limitations of current next generation sequencing platforms by providing significantly longer reads, single molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects. We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom processing pipeline for Pacific Biosciences human data. Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, but can be extended to other organisms with a reference genome.
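Error-profile tallies like the indel-versus-miscall breakdown above can be computed directly from alignment CIGAR strings. The Python sketch below assumes alignments that use the '='/'X' match/mismatch operators; the example reads are fabricated.

```python
# Tally an error profile (indels vs miscalls) from CIGAR strings that
# distinguish '=' (match) from 'X' (mismatch). Example reads are toy data.

import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([=XIDSMN])")

def tally(cigars):
    counts = Counter()
    for cig in cigars:
        for length, op in CIGAR_OP.findall(cig):
            counts[op] += int(length)
    return counts

reads = ["50=2I30=1D18=", "40=1X59="]
c = tally(reads)
errors = c["I"] + c["D"] + c["X"]
print(f"indel fraction of errors: {(c['I'] + c['D']) / errors:.2f}")
print(f"miscall fraction of errors: {c['X'] / errors:.2f}")
```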

248 citations


Journal ArticleDOI
TL;DR: To conclude, the most valuable contribution of GWAS-identified loci lies in elucidating new physiological pathways that underlie obesity susceptibility.

222 citations


Journal ArticleDOI
TL;DR: This article discusses the growth of this resource and its use by affiliated software to create personal genome reports, and describes how associations are assigned to single genotypes as well as sets of genotypes (genosets).
Abstract: SNPedia (http://www.SNPedia.com) is a wiki resource of the functional consequences of human genetic variation as published in peer-reviewed studies. Online since 2006 and freely available for personal use, SNPedia has focused on the medical, phenotypic and genealogical associations of single nucleotide polymorphisms. Entries are formatted to allow associations to be assigned to single genotypes as well as sets of genotypes (genosets). In this article, we discuss the growth of this resource and its use by affiliated software to create personal genome reports.
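Because SNPedia is a standard MediaWiki installation, its content can be retrieved programmatically. The sketch below is a hedged example of pulling a page's wikitext via the generic MediaWiki API; the endpoint URL and page-naming convention are assumptions to verify against the live site, and any automated use should respect SNPedia's terms of use.

```python
# Hedged sketch: fetch an SNP page's wikitext from SNPedia's MediaWiki API.
# Endpoint and page naming are assumptions; verify against the live site.

import requests

def fetch_snp_page(rsid: str) -> str:
    params = {
        "action": "parse",
        "page": rsid,       # SNPedia pages are titled by rs number, e.g. 'Rs7412'
        "prop": "wikitext",
        "format": "json",
    }
    r = requests.get("https://bots.snpedia.com/api.php", params=params, timeout=30)
    r.raise_for_status()
    return r.json()["parse"]["wikitext"]["*"]

print(fetch_snp_page("Rs7412")[:200])
```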

213 citations


Journal ArticleDOI
TL;DR: An unbiased assessment is presented of the capacity of whole-genome sequencing to provide clinically relevant information, assuming that future research will allow us to understand the significance of every genetic variant.
Abstract: New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a “genometype”. A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that: (i) for 23 of the 24 diseases, the majority of individuals will receive negative test results, (ii) these negative test results will, in general, not be very informative, as the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50-80% of that in the general population, and (iii) on the positive side, in the best-case scenario more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy makers and consumers.
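Point (ii) becomes concrete with a little arithmetic. The sketch below plugs a hypothetical 10% baseline disease risk into the reported 50-80% residual-risk range to show how modestly a negative result moves the needle; the baseline figure is invented for illustration.

```python
# Worked numbers for point (ii): a negative test that leaves 50-80% of the
# population risk in place shifts absolute risk only modestly.

population_risk = 0.10          # hypothetical 10% lifetime risk of a disease
for residual in (0.5, 0.8):     # fraction of risk retained after a negative test
    print(f"negative-test risk: {population_risk * residual:.0%} "
          f"(vs {population_risk:.0%} baseline)")
# -> 5% and 8%: informative, but far from ruling the disease out
```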

196 citations


Journal ArticleDOI
TL;DR: Simulation with GemSIM provides insight into the error profiles of individual sequencing runs, allowing researchers to assess the effects of these errors on downstream data analysis; this is valuable because error profiles can vary noticeably even between different runs of the same technology.
Abstract: GemSIM, or General Error-Model based SIMulator, is a next-generation sequencing simulator capable of generating single or paired-end reads for any sequencing technology compatible with the generic formats SAM and FASTQ (including Illumina and Roche/454). GemSIM creates and uses empirically derived, sequence-context based error models to realistically emulate individual sequencing runs and/or technologies. Empirical fragment length and quality score distributions are also used. Reads may be drawn from one or more genomes or haplotype sets, facilitating simulation of deep sequencing, metagenomic, and resequencing projects. We demonstrate GemSIM's value by deriving error models from two different Illumina sequencing runs and one Roche/454 run, and comparing and contrasting the resulting error profiles of each run. Overall error rates varied dramatically between individual Illumina runs, between the first and second reads in each pair, and between datasets from Illumina and Roche/454 technologies. Indels were markedly more frequent in Roche/454 than Illumina, and both technologies suffered from an increase in error rates near the end of each read. The effects of these different profiles on low-frequency SNP-calling accuracy were investigated by analysing simulated sequencing data for a mixture of bacterial haplotypes. In general, SNP-calling using VarScan was only accurate for SNPs with frequency > 3%, independent of which error model was used to simulate the data. Variation between error profiles interacted strongly with VarScan's 'minimum average quality' parameter, resulting in different optimal settings for different sequencing runs. Next-generation sequencing has unprecedented potential for assessing genetic diversity; however, analysis is complicated because error profiles can vary noticeably even between different runs of the same technology. Simulation with GemSIM can help overcome this problem, by providing insights into the error profiles of individual sequencing runs and allowing researchers to assess the effects of these errors on downstream data analysis.
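The idea of a position-dependent empirical error model can be illustrated in a few lines. The Python sketch below simulates miscalls whose probability rises toward the read's 3' end, mimicking the trend reported above; the error-rate curve is invented, whereas GemSIM fits its models from real runs and also handles quality scores, indels and sequence context.

```python
# Minimal read simulation with a position-dependent error rate, in the
# spirit of GemSIM. The error profile here is a made-up toy curve.

import random
random.seed(1)

BASES = "ACGT"

def positional_error_rate(i, read_len):
    return 0.001 + 0.02 * (i / read_len) ** 2   # rises toward the 3' end

def simulate_read(template, read_len=100):
    start = random.randrange(len(template) - read_len)
    read = []
    for i, base in enumerate(template[start:start + read_len]):
        if random.random() < positional_error_rate(i, read_len):
            base = random.choice([b for b in BASES if b != base])  # miscall
        read.append(base)
    return "".join(read)

genome = "".join(random.choice(BASES) for _ in range(10_000))
print(simulate_read(genome)[:60])
```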

169 citations


Journal ArticleDOI
TL;DR: Early adopters of personal genomics are prospectively enthusiastic about using genomic profiling information to improve their health, in close consultation with their physicians, suggesting that early users (whether through direct-to-consumer companies or research) may follow up with the health care system.
Abstract: Background/Aims: To predict the potential public health impact of personal genomics, empirical research on public perceptions of these services is needed. In this study, ‘early adopters’ of personal genomics were surveyed to assess their motivations, perceptions and intentions. Methods: Participants were recruited from everyone who registered to attend an enrollment event for the Coriell Personalized Medicine Collaborative, a United States-based (Camden, N.J.) research study of the utility of personalized medicine, between March 31, 2009 and April 1, 2010 (n = 369). Participants completed an Internet-based survey about their motivations, awareness of personalized medicine, perceptions of study risks and benefits, and intentions to share results with health care providers. Results: Respondents were motivated to participate for their own curiosity and to find out their disease risk to improve their health. Fewer than 10% expressed deterministic perspectives about genetic risk, but 32% had misperceptions about the research study or personal genomic testing. Most respondents perceived the study to have health-related benefits. Nearly all (92%) intended to share their results with physicians, primarily to request specific medical recommendations. Conclusion: Early adopters of personal genomics are prospectively enthusiastic about using genomic profiling information to improve their health, in close consultation with their physicians. This suggests that early users (i.e. through direct-to-consumer companies or research) may follow up with the health care system. Further research should address whether intentions to seek care match actual behaviors.


Journal ArticleDOI
TL;DR: The field of genomics is driven by technological improvements in sequencing platforms; however, as discussed by the authors, software and algorithm development has lagged behind the reductions in sequencing cost and the gains in throughput and data quality.
Abstract: The study of plant biology in the 21st century is, and will continue to be, vastly different from that in the 20th century. One driver for this has been the use of genomics methods to reveal the genetic blueprints for not one but dozens of plant species, as well as resolving genome differences in thousands of individuals at the population level. Genomics technology has advanced substantially since publication of the first plant genome sequence, that of Arabidopsis thaliana, in 2000. Plant genomics researchers have readily embraced new algorithms, technologies and approaches to generate genome, transcriptome and epigenome datasets for model and crop species that have permitted deep inferences into plant biology. Challenges in sequencing any genome include ploidy, heterozygosity and paralogy, all of which are amplified in plant genomes compared to animal genomes due to their large genome sizes, high repetitive sequence content, and rampant whole-genome or segmental duplication. The ability to generate de novo transcriptome assemblies provides an alternative approach that bypasses these complex genomes and accesses the gene space of recalcitrant species. The field of genomics is driven by technological improvements in sequencing platforms; however, software and algorithm development has lagged behind the reductions in sequencing cost and the gains in throughput and data quality. It is anticipated that sequencing platforms will continue to improve the length and quality of output, and that the complementary algorithms and bioinformatic software needed to handle large, repetitive genomes will improve as well. The future is bright for an exponential improvement in our understanding of plant biology.

Journal ArticleDOI
TL;DR: A new generation of sequencers based on 'next-next' or third-generation sequencing (TGS) technologies, such as the Single-Molecule Real-Time (SMRT™) Sequencer, the HeliScope™ Single Molecule Sequencer and the Ion Personal Genome Machine™, is becoming available, capable of generating longer sequence reads in a shorter time and at even lower cost per instrument run.
Abstract: A number of next-generation sequencing (NGS) technologies such as Roche/454, Illumina and AB SOLiD have recently become available. These technologies are capable of generating hundreds of thousands or tens of millions of short DNA sequence reads at a relatively low cost. These NGS technologies, now referred to as second-generation sequencing (SGS) technologies, are being utilized for de novo sequencing, genome re-sequencing, and whole genome and transcriptome analysis. Now, a new generation of sequencers based on 'next-next' or third-generation sequencing (TGS) technologies, like the Single-Molecule Real-Time (SMRT™) Sequencer, the HeliScope™ Single Molecule Sequencer, and the Ion Personal Genome Machine™, is becoming available, capable of generating longer sequence reads in a shorter time and at even lower costs per instrument run. Ever declining sequencing costs and increased data output and sample throughput for SGS and TGS technologies enable the plant genomics and breeding community to undertake genotyping-by-sequencing (GBS). Data analysis, storage and management for large-scale SGS or TGS projects, however, remain essential challenges. This article provides an overview of different sequencing technologies, with an emphasis on forthcoming TGS technologies and the bioinformatics tools required for the latest evolution of DNA sequencing platforms.

Journal ArticleDOI
TL;DR: The next-generation sequencing (NGS) revolution has drastically reduced the time and cost requirements for sequencing large genomes and has qualitatively changed the assembly problem; this review surveys the state of the art, paying particular attention to mammalian-sized genomes.
Abstract: The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether such data must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.
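Most of the short-read assemblers compared in such reviews are built on the de Bruijn graph. As a reminder of the underlying data structure, here is a toy Python construction; real assemblers add error correction, bubble removal and coverage-based cleaning, all omitted here.

```python
# Toy de Bruijn graph: nodes are (k-1)-mers, edges come from observed k-mers.
# Reads are fabricated examples.

from collections import defaultdict

def de_bruijn(reads, k=4):
    graph = defaultdict(set)   # (k-1)-mer -> set of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
for node, succs in sorted(de_bruijn(reads).items()):
    print(node, "->", ", ".join(sorted(succs)))
```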

Journal ArticleDOI
17 Jan 2012
TL;DR: SoyKB addresses the increasing need of the soybean research community to have a one-stop-shop functional and translational omics web resource for information retrieval and analysis in a user-friendly way.
Abstract: Background Soybean Knowledge Base (SoyKB) is a comprehensive all-inclusive web resource for soybean translational genomics. SoyKB is designed to handle the management and integration of soybean genomics, transcriptomics, proteomics and metabolomics data along with annotation of gene function and biological pathway. It contains information on four entities, namely genes, microRNAs, metabolites and single nucleotide polymorphisms (SNPs).

Journal ArticleDOI
Nian Wang, Linchuan Fang, Haiping Xin, Lijun Wang, Shaohua Li
TL;DR: The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using designed SNP markers.
Abstract: Background: Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results: An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions: The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison.
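The in-silico digestion-site prediction step amounts to counting enzyme recognition sites in a reference sequence to estimate RAD tag yield. A minimal Python sketch follows; the enzyme list, toy sequence and the two-tags-per-site rule of thumb are illustrative assumptions, not the study's actual choices.

```python
# Count restriction recognition sites to estimate RAD tag yield per enzyme.
# Enzymes and the reference string are toy values.

ENZYMES = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "SbfI": "CCTGCAGG"}

def count_sites(sequence, motif):
    count, start = 0, 0
    while (start := sequence.find(motif, start)) != -1:
        count += 1
        start += 1
    return count

reference = "AAGAATTCTTCTGCAGGAATTCGGCCTGCAGGTT"
for name, motif in ENZYMES.items():
    n = count_sites(reference, motif)
    print(f"{name}: {n} sites -> ~{2 * n} RAD tags")  # one tag per cut side
```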

Journal ArticleDOI
TL;DR: This review describes major methodologies currently used for gene capture and detection of genetic variations by next-generation sequencing (NGS) and highlights applications of this technology in studies of genetic disorders.

Journal ArticleDOI
Qi Liu, Yan Guo, Jiang Li, Jirong Long, Bing Zhang, Yu Shyr
TL;DR: In this paper, the authors made a systematic assessment of the relative contribution of each step to the accuracy of variant calling from Illumina DNA sequencing data and found that trimming off low-quality tails helped align more reads but introduced many false positives.
Abstract: Accurate calling of SNPs and genotypes from next-generation sequencing data is an essential prerequisite for most human genetics studies. A number of computational steps are required or recommended when translating the raw sequencing data into the final calls. However, whether each step contributes to the performance of variant calling, and how it affects accuracy, remains unclear, making it difficult to select and arrange appropriate steps to derive high quality variants from different sequencing data. In this study, we made a systematic assessment of the relative contribution of each step to the accuracy of variant calling from Illumina DNA sequencing data. We found that the read preprocessing step did not improve the accuracy of variant calling, contrary to the general expectation. Although trimming off low-quality tails helped align more reads, it introduced many false positives. The ability of duplicate marking, local realignment and recalibration to help eliminate false positive variants depended on the sequencing depth. Rearranging these steps did not affect the results. The relative performance of three popular multi-sample SNP callers, SAMtools, GATK, and GlfMultiples, also varied with the sequencing depth. Our findings clarify the necessity and effectiveness of computational steps for improving the accuracy of SNP and genotype calls from Illumina sequencing data and can serve as a general guideline for choosing SNP calling strategies for data with different coverage.
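The low-quality-tail trimming evaluated here is typically implemented with a running-sum heuristic over the 3' end, similar in spirit to BWA's -q option. Below is a small Python sketch of that heuristic; the quality threshold and the example phred values are illustrative, and this is not the exact code of any tool assessed in the study.

```python
# Running-sum 3'-tail trimming: keep the prefix that maximizes the
# accumulated (threshold - quality) score from the read's end.

def trim_tail(quals, threshold=20):
    """Return the read length to keep after trimming the low-quality 3' tail."""
    best_len, best_score, score = len(quals), 0, 0
    for i in range(len(quals) - 1, -1, -1):
        score += threshold - quals[i]   # positive when quality < threshold
        if score < 0:
            break
        if score > best_score:
            best_score, best_len = score, i
    return best_len

phred = [38, 37, 36, 35, 30, 24, 12, 8, 5, 2]
print(trim_tail(phred))   # keeps the first 6 bases, drops the noisy tail
```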

Journal ArticleDOI
TL;DR: The utility and limitations of personal genomic data in three domains are discussed: pharmacogenomics, assessment of genetic predispositions for common diseases, and identification of rare disease-causing genetic variants.
Abstract: Medicine has always been personalized. For years, physicians have incorporated environmental, behavioural, and genetic factors that affect disease and drug response into patient management decisions. However, until recently, the ‘genetic’ data took the form of family history and self-reported race/ethnicity. As genome sequencing declines in cost, the availability of specific genomic information will no longer be limiting. Rather, our ability to parse these data and our decision whether to use it will become primary. As our understanding of genetic association with drug responses and diseases continues to improve, clinically useful genetic tests may emerge to improve upon our previous methods of assessing genetic risks. Indeed, genetic tests for monogenic disorders have already proven useful. Such changes may usher in a new era of personalized medicine. In this review, we will discuss the utility and limitations of personal genomic data in three domains: pharmacogenomics, assessment of genetic predispositions for common diseases, and identification of rare disease-causing genetic variants.

Journal ArticleDOI
TL;DR: The need for clinical-grade sample collection, high-quality sequencing data acquisition, digitalized phenotyping, rigorous generation of variant calls, and comprehensive functional annotation of variants is urged and a 'networking of science' model that encourages much more collaboration and online sharing of medical history, genomic data and biological knowledge is suggested.
Abstract: The pace of exome and genome sequencing is accelerating, with the identification of many new disease-causing mutations in research settings, and it is likely that whole exome or genome sequencing could have a major impact in the clinical arena in the relatively near future. However, the human genomics community is currently facing several challenges, including phenotyping, sample collection, sequencing strategies, bioinformatics analysis, biological validation of variant function, clinical interpretation and validity of variant data, and delivery of genomic information to various constituents. Here we review these challenges and summarize the bottlenecks for the clinical application of exome and genome sequencing, and we discuss ways for moving the field forward. In particular, we urge the need for clinical-grade sample collection, high-quality sequencing data acquisition, digitalized phenotyping, rigorous generation of variant calls, and comprehensive functional annotation of variants. Additionally, we suggest that a 'networking of science' model that encourages much more collaboration and online sharing of medical history, genomic data and biological knowledge, including among research participants and consumers/patients, will help establish causation and penetrance for disease causal variants and genes. As we enter this new era of genomic medicine, we envision that consumer-driven and consumer-oriented efforts will take center stage, thus allowing insights from the human genome project to translate directly back into individualized medicine.

Journal ArticleDOI
11 Jul 2012-PLOS ONE
TL;DR: Observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population and significant efforts are needed to create resources that can accurately assess personal genomics for health, disease, and predict treatment outcomes.
Abstract: Data from the 1000 genomes project (1KGP) and Complete Genomics (CG) have dramatically increased the numbers of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS studies for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the numbers of markers used, and the mean LD block size decreases from 16 kb to 7 kb, when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset; likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population and significant efforts are needed to create resources that can accurately assess personal genomics for health, disease, and predict treatment outcomes.
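The probe audit behind the 34% figure is, conceptually, a simple interval-overlap check between array probe footprints and newly catalogued variants. Here is a toy Python version; the coordinates and probe IDs are invented.

```python
# Flag array probes whose genomic footprint overlaps later-discovered
# variants. Probes and variant positions are fabricated examples.

probes = {           # probe id -> (chrom, start, end), half-open coordinates
    "p1": ("chr1", 1000, 1050),
    "p2": ("chr1", 5000, 5050),
}
variants = [("chr1", 1010), ("chr1", 1049), ("chr2", 5010)]

def overlapping_probes(probes, variants):
    flagged = set()
    for chrom, pos in variants:
        for pid, (pchrom, start, end) in probes.items():
            if chrom == pchrom and start <= pos < end:
                flagged.add(pid)
    return flagged

hit = overlapping_probes(probes, variants)
print(hit, f"-> {len(hit) / len(probes):.0%} of probes affected")  # {'p1'} -> 50%
```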

Journal ArticleDOI
TL;DR: A historical perspective on human genome sequencing is provided, current and future sequencing technologies are summarized, issues related to data management and interpretation are highlighted, and research and clinical applications of high-throughput sequencing are considered, with specific emphasis on cardiovascular disease.
Abstract: We are in the midst of a time of great change in genetics that may dramatically impact human biology and medicine. The completion of the human genome project,1,2 the development of low cost, high-throughput parallel sequencing technology, and large-scale studies of genetic variation3 have provided a rich set of techniques and data for the study of genetic disease risk, treatment response, population diversity, and human evolution. Newly-developed sequencing instruments now generate hundreds of millions to billions of short sequences per run, allowing for rapid complete sequencing of human genomes. These technological advances have facilitated a precipitous drop (Figure 1) in the cost per base pair of DNA sequenced. To capitalize on the potential of these technologies for research and clinical applications, translational scientists and clinicians must become familiar with a continuously evolving field. In this review we will provide a historical perspective on human genome sequencing, summarize current and future sequencing technologies, highlight issues related to data management and interpretation, and finally consider research and clinical applications of high-throughput sequencing, with specific emphasis on cardiovascular disease. [Figure 1: Sequencing milestones, costs, and output since completion of the human genome project; logarithmic scale for sequencing costs and bases produced per sequencing run.]

Journal ArticleDOI
TL;DR: The application of NGS to breast cancer has brought tremendous advances and promises to further increase understanding of the disease; however, many questions remain unanswered, such as the role of structural changes of tumor genomes in cancer progression and treatment response/resistance.
Abstract: We are currently on the threshold of a revolution in breast cancer research, thanks to the emergence of novel technologies based on next-generation sequencing (NGS). In this review, we will describe the different sequencing technologies and platforms, and summarize the main findings from the latest sequencing articles in breast cancer.

Journal ArticleDOI
TL;DR: The most important developments in the field are reviewed: the databases and bioinformatics tools that will be of utmost importance in the concerted effort to interpret the human variome.
Abstract: An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field: the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.

Journal ArticleDOI
TL;DR: Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways.
Abstract: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
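The reported coverage can be sanity-checked with one line of arithmetic: dividing the 59.6 Gb sequence yield by the approximate size of the horse reference genome (assumed here to be about 2.41 Gb) recovers the stated 24.7X.

```python
# Back-of-envelope coverage check: total yield / genome size.
# The 2.41 Gb genome size is an assumed approximation, not from the abstract.

yield_bp = 59.6e9
genome_bp = 2.41e9
print(f"mean coverage ~ {yield_bp / genome_bp:.1f}X")   # ~24.7X
```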

Journal ArticleDOI
TL;DR: The feasibility and accuracy of Ion Torrent Personal Genome Machine (PGM) sequencing for genomic typing of meningococci were explored; the combined use of genome sequencing and BIGSdb substantially increased the amount of typing information compared to conventional methods.
Abstract: Neisseria meningitidis causes invasive meningococcal disease in infants, toddlers and adolescents worldwide. DNA sequence based typing has become the standard for molecular epidemiology of the organism, including multilocus sequence typing, analysis of genetic determinants of antibiotic resistance, and sequence typing of vaccine antigens. However, PCR of multiple targets and consecutive Sanger sequencing impose logistical constraints on reference laboratories. Taking advantage of the recent development of benchtop next generation sequencers (NGS) and of BIGSdb, a database accommodating and analyzing genome sequence data, we therefore explored the feasibility and accuracy of Ion Torrent Personal Genome Machine™ (PGM™) sequencing for genomic typing of meningococci. Three strains from a previous meningococcus B community outbreak were selected to compare conventional typing results with data generated by semiconductor chip based sequencing. In addition, sequencing of the meningococcal type strain MC58 provided information about the general performance of the technology. The PGM™ technology generated sequence information for almost all target genes addressed. The results were 100% concordant with conventional typing results, with no further editing necessary. In addition, the amount of typing information, i.e. nucleotides and target genes analyzed, could be substantially increased by the combined use of genome sequencing and BIGSdb compared to conventional methods. In the near future, affordable and fast benchtop NGS machines like the PGM™ might enable reference laboratories to switch to genomic typing on a routine basis. This will reduce workload and rapidly provide information for laboratory surveillance, outbreak investigation, assessment of vaccine preventability and antibiotic resistance gene monitoring.
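Sequence-based typing of the MLST kind that BIGSdb automates reduces to looking up each locus sequence in an allele table and mapping the combined allele numbers to a sequence type. The Python miniature below illustrates this; the loci, allele sequences and profile table are invented stand-ins for the real Neisseria scheme.

```python
# Toy MLST-style typing: locus sequence -> allele number -> profile -> ST.
# All sequences and the profile table are invented miniatures.

ALLELES = {
    "abcZ": {"ACGT": 1, "ACGA": 2},
    "adk":  {"GGTT": 1, "GGTA": 2},
}
PROFILES = {(1, 1): "ST-11", (2, 2): "ST-32"}

def sequence_type(assembly_loci):
    profile = tuple(ALLELES[locus][seq] for locus, seq in assembly_loci.items())
    return PROFILES.get(profile, f"novel profile {profile}")

genome = {"abcZ": "ACGT", "adk": "GGTT"}
print(sequence_type(genome))   # -> ST-11
```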

Journal ArticleDOI
TL;DR: This review surveys the molecular genetics of prostate cancer, highlighting an emerging appreciation of its genetic complexity.
Abstract: Barbieri C E, Demichelis F & Rubin M A (2012) Histopathology 60, 187–198. Molecular genetics of prostate cancer: emerging appreciation of genetic complexity. The emergence of Next Generation Sequencing is providing novel insights into cancer genomes as part of large-scale efforts by the International Cancer Genome Consortium (ICGC), as well as individual Genome Centers. Studies performing whole genome or whole exome DNA sequencing are remarkable both for the alterations discovered and, equally important, for the infrequent nature of recurrent mutations. Current understanding of the prostate cancer (PCa) genome is based on extensive RNA-sequencing for novel gene fusions and the first whole genome sequencing effort. The emerging data suggest that there are few recurrent genetic mutations. Surprisingly, the PCa genome undergoes frequent large-scale genomic rearrangements that could not have been predicted using previous DNA sequencing approaches, or even whole exome sequencing approaches. These large-scale rearrangements appear not to occur randomly, but demonstrate patterns leading to the ‘chained’ juxtaposition of known oncogenes. Future efforts in DNA sequencing will help to determine the recurrent nature of these genomic rearrangements, their association with other alterations and their effect on PCa disease progression. These discoveries raise the possibility that PCa might soon transition from a poorly understood, clinically heterogeneous disease to a collection of homogeneous subtypes identifiable by molecular criteria, and perhaps vulnerable to targeted therapies.

Journal ArticleDOI
TL;DR: It is argued that there is potential for prenatal whole genome sequencing to alter clinical practice in undesirable ways, especially in the short term, and is concerned that the technology could change the norms and expectations of pregnancy in ways that complicate parental autonomy and informed decision-making.
Abstract: Whole genome sequencing is quickly becoming more affordable and accessible, with the prospect of personal genome sequencing for under $1,000 now widely said to be in sight. The ethical issues raised by the use of this technology in the research context have received some significant attention, but little has been written on its use in the clinical context, and most of this analysis has been futuristic forecasting. This is problematic, given the speed with which whole genome sequencing technology is likely to be incorporated into clinical care. This paper explores one particular subset of these issues: the implications of adopting this technology in the prenatal context without a good understanding of when and how it is useful. Prenatal whole genome sequencing differs from current prenatal genetic testing practice in a number of ethically relevant ways. Most notably, whole genome sequencing would radically increase the volume and scope of available prenatal genetic data. The wealth of new data could enhance reproductive decision-making, promoting parents' freedom to make well-informed reproductive decisions. We argue, however, that there is potential for prenatal whole genome sequencing to alter clinical practice in undesirable ways, especially in the short term. We are concerned that the technology could (1) change the norms and expectations of pregnancy in ways that complicate parental autonomy and informed decision-making, (2) exacerbate the deleterious role that genetic determinism plays in child rearing, and (3) undermine children's future autonomy by removing the option of not knowing their genetic information without appropriate justification.

Journal ArticleDOI
TL;DR: It is found that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way tools are developed, in standard operating procedures, and in funding mechanisms.
Abstract: Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
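Cost projections of the kind the paper tabulates for whole exome runs on Amazon decompose into storage, compute and I/O terms. The Python toy below shows the shape of such a projection; every price and per-exome figure is a placeholder, not AWS's actual 2012 (or current) rates nor the paper's measured numbers.

```python
# Toy cloud cost projection: storage scales with data volume and retention,
# compute with CPU hours. All rates below are placeholders.

def project_cost(n_exomes, gb_per_exome=10, cpu_hours_per_exome=6,
                 price_gb_month=0.10, price_cpu_hour=0.50, months=1):
    storage = n_exomes * gb_per_exome * price_gb_month * months
    compute = n_exomes * cpu_hours_per_exome * price_cpu_hour
    return storage, compute

storage, compute = project_cost(n_exomes=100)
print(f"storage ${storage:.2f}/month, compute ${compute:.2f} one-off")
# -> storage $100.00/month, compute $300.00 one-off
```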