Showing papers in "Genome Medicine in 2014"


Journal ArticleDOI
TL;DR: This work presents SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data, which is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment.
Abstract: Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. We include validation of SRST2 within a public health laboratory, and demonstrate its use for microbial genome surveillance in the hospital setting. In the face of rising threats of antimicrobial resistance and emerging virulence among bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/.

820 citations


Journal ArticleDOI
TL;DR: The findings demonstrate the ability to uncover novel associations from paired genome-microbiome data, and they suggest a complex link between host genetics and microbial dysbiosis in subjects with IBD across independent cohorts.
Abstract: Background: Human genetics and host-associated microbial communities have been associated independently with a wide range of chronic diseases. One of the strongest associations in each case is inflammatory bowel disease (IBD), but disease risk cannot be explained fully by either factor individually. Recent findings point to interactions between host genetics and microbial exposures as important contributors to disease risk in IBD. These include evidence of the partial heritability of the gut microbiota and the conferral of gut mucosal inflammation by microbiome transplant even when the dysbiosis was initially genetically derived. Although there have been several tests for association of individual genetic loci with bacterial taxa, there has been no direct comparison of complex genome-microbiome associations in large cohorts of patients with an immunity-related disease. Methods: We obtained 16S ribosomal RNA (rRNA) gene sequences from intestinal biopsies as well as host genotype via Immunochip in three independent cohorts totaling 474 individuals. We tested for correlation between relative abundance of bacterial taxa and number of minor alleles at known IBD risk loci, including fine mapping of multiple risk alleles in the Nucleotide-binding oligomerization domain-containing protein 2 (NOD2) gene exon. We identified host polymorphisms whose associations with bacterial taxa were conserved across two or more cohorts, and we tested related genes for enrichment of host functional pathways. Results: We identified and confirmed in two cohorts a significant association between NOD2 risk allele count and increased relative abundance of Enterobacteriaceae, with directionality of the effect conserved in the third cohort. Forty-eight additional IBD-related SNPs have directionality of their associations with bacterial taxa significantly conserved across two or three cohorts, implicating genes enriched for regulation of innate immune response, the JAK-STAT cascade, and other immunity-related pathways. Conclusions: These results suggest complex interactions between genetically altered host functional pathways and the structure of the microbiome. Our findings demonstrate the ability to uncover novel associations from paired genome-microbiome data, and they suggest a complex link between host genetics and microbial dysbiosis in subjects with IBD across independent cohorts.

332 citations


Journal ArticleDOI
TL;DR: A new algorithm called DawnRank is developed to directly prioritize altered genes on a single patient level and will help to discover personalized causal mutations that would otherwise be obscured by tumor heterogeneity.
Abstract: Large-scale cancer genomic studies have revealed that the genetic heterogeneity of the same type of cancer is greater than previously thought. A key question in cancer genomics is the identification of driver genes. Although existing methods have identified many common drivers, it remains challenging to predict personalized drivers to assess rare and even patient-specific mutations. We developed a new algorithm called DawnRank to directly prioritize altered genes on a single patient level. Applications to TCGA datasets demonstrated the effectiveness of our method. We believe DawnRank complements existing driver identification methods and will help us discover personalized causal mutations that would otherwise be obscured by tumor heterogeneity. Source code can be accessed at http://bioen-compbio.bioen.illinois.edu/DawnRank/.

203 citations


Journal ArticleDOI
TL;DR: This review focuses on the identification and interpretation of disease-susceptibility variants that influence enhancer function, and discusses strategies for prioritizing the study of functional enhancer SNPs over those likely to be benign.
Abstract: Gene enhancer elements are noncoding segments of DNA that play a central role in regulating transcriptional programs that control development, cell identity, and evolutionary processes. Recent studies have shown that noncoding single nucleotide polymorphisms (SNPs) that have been associated with risk for numerous common diseases through genome-wide association studies frequently lie in cell-type-specific enhancer elements. These enhancer variants probably influence transcriptional output, thereby offering a mechanistic basis to explain their association with risk for many common diseases. This review focuses on the identification and interpretation of disease-susceptibility variants that influence enhancer function. We discuss strategies for prioritizing the study of functional enhancer SNPs over those likely to be benign, review experimental and computational approaches to identifying the gene targets of enhancer variants, and highlight efforts to quantify the impact of enhancer variants on target transcript levels and cellular phenotypes. These studies are beginning to provide insights into the mechanistic basis of many common diseases, as well as into how we might translate this knowledge for improved disease diagnosis, prevention and treatments. Finally, we highlight five major challenges often associated with interpreting enhancer variants, and discuss recent technical advances that may help to surmount these challenges.

196 citations


Journal ArticleDOI
TL;DR: Approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells, and techniques to identify recurrent combinations of somatic mutations are described.
Abstract: High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer.

193 citations


Journal ArticleDOI
TL;DR: Evidence from copy number variant, exome sequencing and genome-wide association studies supports a gradient of neurodevelopmental psychopathology indexed by mutational load or mutational severity, and cognitive impairment.
Abstract: Psychiatric disorders such as schizophrenia, bipolar disorder, major depressive disorder, attention-deficit/hyperactivity disorder and autism spectrum disorder are common and result in significant morbidity and mortality. Although currently classified into distinct disorder categories, they show clinical overlap and familial co-aggregation, and share genetic risk factors. Recent advances in psychiatric genomics have provided insight into the potential mechanisms underlying the overlap between these disorders, implicating genes involved in neurodevelopment, synaptic plasticity, learning and memory. Furthermore, evidence from copy number variant, exome sequencing and genome-wide association studies supports a gradient of neurodevelopmental psychopathology indexed by mutational load or mutational severity, and cognitive impairment. These findings have important implications for psychiatric research, highlighting the need for new approaches to stratifying patients for research. They also point the way for work aiming to advance our understanding of the pathways from genotype to clinical phenotype, which will be required in order to inform new classification systems and to develop novel therapeutic strategies.

193 citations


Journal ArticleDOI
TL;DR: Transethnic GWASs enable prioritization of candidate genes, fine-mapping of functional variants, and potentially identification of SNPs associated with disease risk in admixed populations, by taking advantage of natural differences in genomic linkage disequilibrium across ethnically diverse populations.
Abstract: Genome-wide association studies (GWASs) are the method most often used by geneticists to interrogate the human genome, and they provide a cost-effective way to identify the genetic variants underpinning complex traits and diseases. Most initial GWASs have focused on genetically homogeneous cohorts from European populations given the limited availability of ethnic minority samples and so as to limit population stratification effects. Transethnic studies have been invaluable in explaining the heritability of common quantitative traits, such as height, and in examining the genetic architecture of complex diseases, such as type 2 diabetes. They provide an opportunity for large-scale signal replication in independent populations and for cross-population meta-analyses to boost statistical power. In addition, transethnic GWASs enable prioritization of candidate genes, fine-mapping of functional variants, and potentially identification of SNPs associated with disease risk in admixed populations, by taking advantage of natural differences in genomic linkage disequilibrium across ethnically diverse populations. Recent efforts to assess the biological function of variants identified by GWAS have highlighted the need for large-scale replication, meta-analyses and fine-mapping across worldwide populations of ethnically diverse genetic ancestries. Here, we review recent advances and new approaches that are important to consider when performing, designing or interpreting transethnic GWASs, and we highlight existing challenges, such as the limited ability to handle heterogeneity in linkage disequilibrium across populations and limitations in dissecting complex architectures, such as those found in recently admixed populations.

182 citations


Journal ArticleDOI
TL;DR: Hypomethylated blocks are a universal feature of common solid human cancer, and they occur at the earliest stage of premalignant tumors and progress through clinical stages of thyroid and colon cancer development.
Abstract: One of the most provocative recent observations in cancer epigenetics is the discovery of large hypomethylated blocks, including single copy genes, in colorectal cancer, that correspond in location to heterochromatic LOCKs (large organized chromatin lysine-modifications) and LADs (lamin-associated domains). Here we performed a comprehensive genome-scale analysis of 10 breast, 28 colon, nine lung, 38 thyroid, 18 pancreas cancers, and five pancreas neuroendocrine tumors as well as matched normal tissue from most of these cases, as well as 51 premalignant lesions. We used a new statistical approach that allows the identification of large hypomethylated blocks on the Illumina HumanMethylation450 BeadChip platform. We find that hypomethylated blocks are a universal feature of common solid human cancer, and that they occur at the earliest stage of premalignant tumors and progress through clinical stages of thyroid and colon cancer development. We also find that the disrupted CpG islands widely reported previously, including hypermethylated island bodies and hypomethylated shores, are enriched in hypomethylated blocks, with flattening of the methylation signal within and flanking the islands. Finally, we found that genes showing higher between-individual gene expression variability are enriched within these hypomethylated blocks. Thus hypomethylated blocks appear to be a universal defining epigenetic alteration in human cancer, at least for common solid tumors.

174 citations


Journal ArticleDOI
TL;DR: The extent of differences in annotation of 80 million variants from a whole-genome sequencing study is quantified, the types of apparent errors made by Annovar and VEP are characterised, and their impact on the analysis of DNA variants in genome sequencing studies is discussed.
Abstract: Variant annotation is a crucial step in the analysis of genome sequencing data. Functional annotation results can have a strong influence on the ultimate conclusions of disease studies. Incorrect or incomplete annotations can cause researchers both to overlook potentially disease-relevant DNA variants and to dilute interesting variants in a pool of false positives. Researchers are aware of these issues in general, but the extent of the dependency of final results on the choice of transcripts and software used for annotation has not been quantified in detail. This paper quantifies the extent of differences in annotation of 80 million variants from a whole-genome sequencing study. We compare results using the RefSeq and Ensembl transcript sets as the basis for variant annotation with the software Annovar, and also compare the results from two annotation software packages, Annovar and VEP (Ensembl’s Variant Effect Predictor), when using Ensembl transcripts. We found only 44% agreement in annotations for putative loss-of-function variants when using the RefSeq and Ensembl transcript sets as the basis for annotation with Annovar. The rate of matching annotations for loss-of-function and nonsynonymous variants combined was 79% and for all exonic variants it was 83%. When comparing results from Annovar and VEP using Ensembl transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy. Using these comparisons, we characterised the types of apparent errors made by Annovar and VEP and discuss their impact on the analysis of DNA variants in genome sequencing studies. Variant annotation is not yet a solved problem. Choice of transcript set can have a large effect on the ultimate variant annotations obtained in a whole-genome sequencing study. Choice of annotation software can also have a substantial effect. 
The annotation step in the analysis of a genome sequencing study must therefore be considered carefully, and a conscious choice made as to which transcript set and software are used for annotation.
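
The agreement percentages quoted in this abstract can be read as simple match rates between two sets of annotations. The following is only a minimal sketch of that idea in Python, not the authors' pipeline: the function name `agreement_rate`, the variant identifiers, and the annotation labels are all hypothetical, and the abstract does not state the exact matching criteria the paper used.

```python
def agreement_rate(annotations_a, annotations_b):
    """Return the fraction of shared variants whose two annotations match.

    Both arguments map a variant identifier to an annotation label.
    Variants present in only one mapping are ignored (an assumption of
    this sketch, not something stated in the abstract).
    """
    shared = annotations_a.keys() & annotations_b.keys()
    if not shared:
        raise ValueError("the two annotation sets share no variants")
    matches = sum(1 for variant in shared
                  if annotations_a[variant] == annotations_b[variant])
    return matches / len(shared)


# Toy, made-up data: the same four variants annotated against two
# hypothetical transcript sets.
refseq_calls = {
    "var1": "stop_gained",
    "var2": "missense",
    "var3": "splicing",
    "var4": "synonymous",
}
ensembl_calls = {
    "var1": "stop_gained",
    "var2": "missense",
    "var3": "intronic",
    "var4": "synonymous",
}

rate = agreement_rate(refseq_calls, ensembl_calls)
print(f"Matching annotations: {rate:.0%}")
```

In this toy input three of the four variants agree, so the rate is 0.75. A real comparison would also need a rule for variants that receive several annotations across transcripts, which this sketch leaves out.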

172 citations


Journal ArticleDOI
TL;DR: This review will explain and provide examples of how network-based analyses of omics data, in combination with functional and clinical studies, are aiding the understanding of disease, as well as helping to prioritize diagnostic markers or therapeutic candidate genes.
Abstract: Many common diseases, such as asthma, diabetes or obesity, involve altered interactions between thousands of genes. High-throughput techniques (omics) allow identification of such genes and their products, but functional understanding is a formidable challenge. Network-based analyses of omics data have identified modules of disease-associated genes that have been used to obtain both a systems level and a molecular understanding of disease mechanisms. For example, in allergy a module was used to find a novel candidate gene that was validated by functional and clinical studies. Such analyses play important roles in systems medicine. This is an emerging discipline that aims to gain a translational understanding of the complex mechanisms underlying common diseases. In this review, we will explain and provide examples of how network-based analyses of omics data, in combination with functional and clinical studies, are aiding our understanding of disease, as well as helping to prioritize diagnostic markers or therapeutic candidate genes. Such analyses involve significant problems and limitations, which will be discussed. We also highlight the steps needed for clinical implementation.

171 citations


Journal ArticleDOI
TL;DR: This work characterized whole genome sequencing, whole exome sequencing, and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors and developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls.
Abstract: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%). Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. 
While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.

Journal ArticleDOI
TL;DR: Advances in epigenetic studies of discordant MZ twins, focusing on disease, help to identify epigenetic markers of environmental risk and molecular mechanisms involved in disease and disease progression, which have implications both for understanding disease and for future medical research.
Abstract: Monozygotic (MZ) twins share nearly all of their genetic variants and many similar environments before and after birth. However, they can also show phenotypic discordance for a wide range of traits. Differences at the epigenetic level may account for such discordances. It is well established that epigenetic states can contribute to phenotypic variation, including disease. Epigenetic states are dynamic and potentially reversible marks involved in gene regulation, which can be influenced by genetics, environment, and stochastic events. Here, we review advances in epigenetic studies of discordant MZ twins, focusing on disease. The study of epigenetics and disease using discordant MZ twins offers the opportunity to control for many potential confounders encountered in general population studies, such as differences in genetic background, early-life environmental exposure, age, gender, and cohort effects. Recently, analysis of disease-discordant MZ twins has been successfully used to study epigenetic mechanisms in aging, cancer, autoimmune disease, psychiatric, neurological, and multiple other traits. Epigenetic aberrations have been found in a range of phenotypes, and challenges have been identified, including sampling time, tissue specificity, validation, and replication. The results have relevance for personalized medicine approaches, including the identification of prognostic, diagnostic, and therapeutic targets. The findings also help to identify epigenetic markers of environmental risk and molecular mechanisms involved in disease and disease progression, which have implications both for understanding disease and for future medical research.

Journal ArticleDOI
TL;DR: In more typical late-onset PD, chronic dysfunction in synaptic transmission, early endosomal trafficking and receptor recycling, as well as chaperone-mediated autophagy, provide a unifying synthesis of the molecular pathways involved.
Abstract: Parkinson’s disease (PD) is a progressively debilitating neurodegenerative syndrome. Although best described as a movement disorder, the condition has prominent autonomic, cognitive, psychiatric, sensory and sleep components. Striatal dopaminergic innervation and nigral neurons are progressively lost, with associated Lewy pathology readily apparent on autopsy. Nevertheless, knowledge of the molecular events leading to this pathophysiology is limited. Current therapies offer symptomatic benefit but they fail to slow progression and patients continue to deteriorate. Recent discoveries in sporadic, Mendelian and more complex forms of parkinsonism provide novel insight into disease etiology; 28 genes, including those encoding alpha-synuclein (SNCA), leucine-rich repeat kinase 2 (LRRK2) and microtubule-associated protein tau (MAPT), have been linked and/or associated with PD. A consensus regarding the affected biological pathways and molecular processes has also started to emerge. In early-onset and more atypical PD, deficits in mitophagy pathways and lysosomal function appear to be prominent. By contrast, in more typical late-onset PD, chronic, albeit subtle, dysfunction in synaptic transmission, early endosomal trafficking and receptor recycling, as well as chaperone-mediated autophagy, provide a unifying synthesis of the molecular pathways involved. Disease-modification (neuroprotection) is no longer such an elusive goal given the unparalleled opportunity for diagnosis, translational neuroscience and therapeutic development provided by genetic discovery.

Journal ArticleDOI
TL;DR: Recent advances in the global profiling of tumor genomes for aberrant DNA methylation and the integration of these data with cancer genome profiling data are discussed, potential mechanisms leading to different methylation subgroups are highlighted, and the use of this information in basic research and for translational applications is shown.
Abstract: The comparison of DNA methylation patterns across cancer types (pan-cancer methylome analyses) has revealed distinct subgroups of tumors that share similar methylation patterns. Integration of these data with the wealth of information derived from cancer genome profiling studies performed by large international consortia has provided novel insights into the cellular aberrations that contribute to cancer development. There is evidence that genetic mutations in epigenetic regulators (such as DNMT3, IDH1/2 or H3.3) mediate or contribute to these patterns, although a unifying molecular mechanism underlying the global alterations of DNA methylation has largely been elusive. Knowledge gained from pan-cancer methylome analyses will aid the development of diagnostic and prognostic biomarkers, improve patient stratification and the discovery of novel druggable targets for therapy, and will generate hypotheses for innovative clinical trial designs based on methylation subgroups rather than on cancer subtypes. In this review, we discuss recent advances in the global profiling of tumor genomes for aberrant DNA methylation and the integration of these data with cancer genome profiling data, highlight potential mechanisms leading to different methylation subgroups, and show how this information can be used in basic research and for translational applications. A remaining challenge is to experimentally prove the functional link between observed pan-cancer methylation patterns, the associated genetic aberrations, and their relevance for the development of cancer.

Journal ArticleDOI
TL;DR: CNVs are an important cause of NSHL and their detection must be included in comprehensive genetic testing for hearing loss, as well as incorporated as part of the standard analysis pipeline.
Abstract: Copy number variants (CNVs) are a well-recognized cause of genetic disease; however, methods for their identification are often gene-specific, excluded as ‘routine’ in screens of genetically heterogeneous disorders, and not implemented in most next-generation sequencing pipelines. For this reason, the contribution of CNVs to non-syndromic hearing loss (NSHL) is most likely under-recognized. We aimed to incorporate a method for CNV identification as part of our standard analysis pipeline and to determine the contribution of CNVs to genetic hearing loss. We used targeted genomic enrichment and massively parallel sequencing to isolate and sequence all exons of all genes known to cause NSHL. We completed testing on 686 patients with hearing loss with no exclusions based on type of hearing loss or any other clinical features. For analysis we used an integrated method for detection of single nucleotide changes, indels and CNVs. CNVs were identified using a previously published method that utilizes median read-depth ratios and a sliding-window approach. Of 686 patients tested, 15.2% (104) carried at least one CNV within a known deafness gene. Of the 38.9% (267) of individuals for whom we were able to determine a genetic cause of hearing loss, a CNV was implicated in 18.7% (50). We identified CNVs in 16 different genes including 7 genes for which no CNVs have been previously reported. CNVs of STRC were most common (73% of CNVs identified) followed by CNVs of OTOA (13% of CNVs identified). CNVs are an important cause of NSHL and their detection must be included in comprehensive genetic testing for hearing loss.
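
The percentages reported in this abstract follow directly from the patient counts it gives. As a quick arithmetic check, here is a short Python sketch that recomputes them; the counts are taken from the abstract, while the variable names and the `percent` helper are illustrative only.

```python
# Counts as reported in the abstract above.
total_patients = 686        # all patients tested
with_cnv = 104              # carried at least one CNV in a known deafness gene
with_genetic_cause = 267    # a genetic cause of hearing loss was determined
cause_involving_cnv = 50    # diagnosed cases in which a CNV was implicated


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


cnv_carrier_rate = percent(with_cnv, total_patients)
diagnostic_rate = percent(with_genetic_cause, total_patients)
cnv_share_of_diagnoses = percent(cause_involving_cnv, with_genetic_cause)

print(cnv_carrier_rate, diagnostic_rate, cnv_share_of_diagnoses)
```

Note that the 18.7% figure uses the 267 diagnosed individuals as its denominator, not the full cohort of 686.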

Journal ArticleDOI
TL;DR: This review suggests patient engagement should be quantified as part of a comprehensive health risk appraisal given its apparent value in helping individuals to effectively self-manage chronic disease.
Abstract: The role of patient engagement as an important risk factor for healthcare outcomes has not been well established. The objective of this article was to systematically review the relationship between patient engagement and health outcomes in chronic disease to determine whether patient engagement should be quantified as an important risk factor in health risk appraisals to enhance the practice of personalized medicine. A systematic review of prospective clinical trials conducted between January 1993 and December 2012 was performed. Articles were identified through a medical librarian-conducted multi-term search of Medline, Embase, and Cochrane databases. Additional studies were obtained from the references of meta-analyses and systematic reviews on hypertension, diabetes, and chronic care. Search terms included variations of the following: self-care, self-management, self-monitoring, (shared) decision-making, patient education, patient motivation, patient engagement, chronic disease, chronically ill, and randomized controlled trial. Studies were included only if they: (1) compared patient engagement interventions to an appropriate control among adults with chronic disease aged 18 years and older; (2) had minimum 3 months between pre- and post-intervention measurements; and (3) defined patient engagement as: (a) understanding the importance of taking an active role in one’s health and health care; (b) having the knowledge, skills, and confidence to manage health; and (c) using knowledge, skills and confidence to perform health-promoting behaviors. Three authors and two research assistants independently extracted data using predefined fields including quality metrics. We reviewed 543 abstracts to identify 10 trials that met full inclusion criteria, four of which had ‘high’ methodological quality (Jadad score ≥ 3). Diverse measurement of patient engagement prevented robust statistical analyses, so data were qualitatively described. 
Nine studies documented improvements in patient engagement. Five studies reported reduction in clinical markers of disease (for example HbA1C). All studies reported improvements in self-reported health status. This review suggests patient engagement should be quantified as part of a comprehensive health risk appraisal given its apparent value in helping individuals to effectively self-manage chronic disease. Patient engagement measures should include assessment of the knowledge, confidence and skills to prevent and manage chronic disease, plus the behaviors to do so.

Journal ArticleDOI
TL;DR: It is argued that the view that epistasis has little role in the genetic architecture of complex human disease is a misconception, and it is explained why exploring epistasis is likely to be crucial to understanding and predicting complex disease.
Abstract: Epistasis has been dismissed by some as having little role in the genetic architecture of complex human disease. The authors argue that this view is the result of a misconception and explain why exploring epistasis is likely to be crucial to understanding and predicting complex disease.

Journal ArticleDOI
TL;DR: The results indicate that changes in the conjunctival microbiome occur in trachomatous disease; whether these are a cause or a consequence is yet unknown.
Abstract: Background: Trachoma, caused by Chlamydia trachomatis, remains the world’s leading infectious cause of blindness. Repeated ocular infection during childhood leads to scarring of the conjunctiva, in-turning of the eyelashes (trichiasis) and corneal opacity in later life. There is a growing body of evidence to suggest non-chlamydial bacteria are associated with clinical signs of trachoma, independent of C. trachomatis infection. Methods: We used deep sequencing of the V1-V3 region of the bacterial 16S rRNA gene to characterize the microbiome of the conjunctiva of 220 residents of The Gambia, 105 with healthy conjunctivae and 115 with clinical signs of trachoma in the absence of detectable C. trachomatis infection. Deep sequencing was carried out using the Roche-454 platform. Sequence data were processed and analyzed through a pipeline developed by the Human Microbiome Project. Results: The microbiome of healthy participants was influenced by age and season of sample collection with increased richness and diversity seen in younger participants and in samples collected during the dry season. Decreased diversity and an increased abundance of Corynebacterium and Streptococcus were seen in participants with conjunctival scarring compared to normal controls. Abundance of Corynebacterium was higher still in adults with scarring and trichiasis compared to adults with scarring only. Conclusions: Our results indicate that changes in the conjunctival microbiome occur in trachomatous disease; whether these are a cause or a consequence is yet unknown. Background: Trachoma, caused by the bacterium Chlamydia trachomatis, is characterized by recurrent episodes of chronic follicular conjunctivitis. Repeated infection during childhood can lead to scarring of the conjunctiva and the blinding complications of trachomatous trichiasis (TT) and corneal opacification in later life. Persistent, severe inflammation is a contributing factor to progressive scarring yet ocular C. trachomatis infection is rarely detected

Journal ArticleDOI
TL;DR: It is found that people with asthma who experience acute respiratory illness-induced exacerbations are characterized by a reduced but prolonged inflammatory immune response, inadequate activation of mucosal repair, and the expression of a newly described exacerbation-specific transcriptional signature.
Abstract: Background: Acute respiratory illness is the leading cause of asthma exacerbations yet the mechanisms underlying this association remain unclear. To address the deficiencies in our understanding of the molecular events characterizing acute respiratory illness-induced asthma exacerbations, we undertook a transcriptional profiling study of the nasal mucosa over the course of acute respiratory illness amongst individuals with a history of asthma, allergic rhinitis and no underlying respiratory disease. Methods: Transcriptional profiling experiments were performed using the Agilent Whole Human Genome 4X44K array platform. Time point-based microarray and principal component analyses were conducted to identify and distinguish acute respiratory illness-associated transcriptional profiles over the course of our study. Gene enrichment analysis was conducted to identify biological processes over-represented within each acute respiratory illness-associated profile, and gene expression was subsequently confirmed by quantitative polymerase chain reaction. Results: We found that acute respiratory illness is characterized by dynamic, time-specific transcriptional profiles whose magnitudes of expression are influenced by underlying respiratory disease and the mucosal repair signature evoked during acute respiratory illness. Most strikingly, we report that people with asthma who experience acute respiratory illness-induced exacerbations are characterized by a reduced but prolonged inflammatory immune response, inadequate activation of mucosal repair, and the expression of a newly described exacerbation-specific transcriptional signature. Conclusion: Findings from our study represent a significant contribution towards clarifying the complex molecular interactions that typify acute respiratory illness-induced asthma exacerbations.

Journal ArticleDOI
TL;DR: The current achievements of genomics in the development of improved diagnostic tools are reviewed, including those that are now available in the clinic, such as the design of PCR assays for the detection of microbial pathogens, virulence factors or antibiotic-resistance determinants, or the design of optimized culture media for ‘unculturable’ pathogens.
Abstract: The availability of genome sequences obtained using next-generation sequencing (NGS) has revolutionized the field of infectious diseases. Indeed, more than 38,000 bacterial and 5,000 viral genomes have been sequenced to date, including representatives of all significant human pathogens. These tremendous amounts of data have not only enabled advances in fundamental biology, helping to understand the pathogenesis of microorganisms and their genomic evolution, but have also had implications for clinical microbiology. Here, we first review the current achievements of genomics in the development of improved diagnostic tools, including those that are now available in the clinic, such as the design of PCR assays for the detection of microbial pathogens, virulence factors or antibiotic-resistance determinants, or the design of optimized culture media for ‘unculturable’ pathogens. We then review the applications of genomics to the investigation of outbreaks, either through the design of genotyping assays or the direct sequencing of the causative strains. Finally, we discuss how genomics might change clinical microbiology in the future.

Journal ArticleDOI
TL;DR: The importance of the IL-1 signaling pathway and a prominent signature of innate immunity and cell migration in the acute phase of the illness was revealed and a potential therapeutic target was identified.
Abstract: Background: Global gene expression profiling can provide insight into the underlying pathophysiology of disease processes. Kawasaki disease (KD) is an acute, self-limited vasculitis whose etiology remains unknown. Although the clinical illness shares certain features with other pediatric infectious diseases, the occurrence of coronary artery aneurysms in 25% of untreated patients is unique to KD. Methods: To gain further insight into the molecular mechanisms underlying KD, we investigated the acute and convalescent whole blood transcriptional profiles of 146 KD subjects and compared them with the transcriptional profiles of pediatric patients with confirmed bacterial or viral infection, and with healthy control children. We also investigated the transcript abundance in patients with different intravenous immunoglobulin treatment responses and different coronary artery outcomes. Results: The overwhelming signature for acute KD involved signaling pathways of the innate immune system. Comparison with other acute pediatric infections highlighted the importance of pathways involved in cell motility including paxillin, relaxin, actin, integrins, and matrix metalloproteinases. Most importantly, the IL1β pathway was identified as a potential therapeutic target. Conclusion: Our study revealed the importance of the IL-1 signaling pathway and a prominent signature of innate immunity and cell migration in the acute phase of the illness.

Journal ArticleDOI
TL;DR: Bacterial GWASs are about to come of age thanks to the availability of massive datasets, and because of the potential to bridge genomics and traditional genetic approaches that is provided by improving validation strategies.
Abstract: Genome-wide association studies (GWASs) have become an increasingly important approach for eukaryotic geneticists, facilitating the identification of hundreds of genetic polymorphisms that are responsible for inherited diseases. Despite the relative simplicity of bacterial genomes, the application of GWASs to identify polymorphisms responsible for important bacterial phenotypes has only recently been made possible through advances in genome sequencing technologies. Bacterial GWASs are now about to come of age thanks to the availability of massive datasets, and because of the potential to bridge genomics and traditional genetic approaches that is provided by improving validation strategies. A small number of pioneering GWASs in bacteria have been published in the past 2 years, examining from 75 to more than 3,000 strains. The experimental designs have been diverse, taking advantage of different processes in bacteria for generating variation. Analysis of data from bacterial GWASs can, to some extent, be performed using software developed for eukaryotic systems, but there are important differences in genome evolution that must be considered. The greatest experimental advantage of bacterial GWASs is the potential to perform downstream validation of causality and dissection of mechanism. We review the recent advances and remaining challenges in this field and propose strategies to improve the validation of bacterial GWASs.

Journal ArticleDOI
TL;DR: This work built a classifier that integrates a variety of genomic and systematic datasets to prioritize drug targets specific for breast, pancreatic and ovarian cancer and selected a set of targets that are amenable to inhibition by small molecules, antibodies and synthetic peptides.
Abstract: We present an integrated approach that predicts and validates novel anti-cancer drug targets. We first built a classifier that integrates a variety of genomic and systematic datasets to prioritize drug targets specific for breast, pancreatic and ovarian cancer. We then devised strategies to inhibit these anti-cancer drug targets and selected a set of targets that are amenable to inhibition by small molecules, antibodies and synthetic peptides. We validated the predicted drug targets by showing strong anti-proliferative effects of both synthetic peptide and small molecule inhibitors against our predicted targets.

Journal ArticleDOI
TL;DR: It is shown that co-expression analyses of lncRNAs and protein-coding genes can predict the signaling pathways in which these AID-associated lncRNAs are involved, and it is suggested that lncRNA genes should be studied in more detail to interpret GWAS findings correctly.
Abstract: Although genome-wide association studies (GWAS) have identified hundreds of variants associated with a risk for autoimmune and immune-related disorders (AID), our understanding of the disease mechanisms is still limited. In particular, more than 90% of the risk variants lie in non-coding regions, and almost 10% of these map to long non-coding RNA transcripts (lncRNAs). lncRNAs are known to show more cell-type specificity than protein-coding genes. We aimed to characterize lncRNAs and protein-coding genes located in loci associated with nine AIDs which have been well-defined by Immunochip analysis and by transcriptome analysis across seven populations of peripheral blood leukocytes (granulocytes, monocytes, natural killer (NK) cells, B cells, memory T cells, naive CD4+ and naive CD8+ T cells) and four populations of cord blood-derived T-helper cells (precursor, primary, and polarized (Th1, Th2) T-helper cells). We show that lncRNAs mapping to loci shared between AID are significantly enriched in immune cell types compared to lncRNAs from the whole genome (α <0.005). We were not able to prioritize single cell types relevant for specific diseases, but we observed five different cell types enriched (α <0.005) in five AID (NK cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, and psoriasis; memory T and CD8+ T cells in juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis; Th0 and Th2 cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis). Furthermore, we show that co-expression analyses of lncRNAs and protein-coding genes can predict the signaling pathways in which these AID-associated lncRNAs are involved. 
The observed enrichment of lncRNA transcripts in AID loci implies lncRNAs play an important role in AID etiology and suggests that lncRNA genes should be studied in more detail to interpret GWAS findings correctly. The co-expression results strongly support a model in which the lncRNA and protein-coding genes function together in the same pathways.

Journal ArticleDOI
TL;DR: MNA genome-wide mutational analysis reveals genetic alterations distinct from colorectal cancer, in support of its unique pathophysiology and suggests new targeted therapeutic opportunities.
Abstract: Mucinous neoplasms of the appendix (MNA) are rare tumors which may progress from benign to malignant disease with an aggressive biological behavior. MNA is often diagnosed after metastasis to the peritoneal surfaces resulting in mucinous carcinomatosis peritonei (MCP). Genetic alterations in MNA are poorly characterized due to its low incidence, the hypo-cellularity of MCPs, and a lack of relevant pre-clinical models. As such, application of targeted therapies to this disease is limited to those developed for colorectal cancer and not based on molecular rationale. We sequenced the whole exomes of 10 MCPs of appendiceal origin to identify genome-wide somatic mutations and copy number aberrations and validated significant findings in 19 additional cases. Our study demonstrates that MNA has a different molecular makeup than colorectal cancer. Most tumors have co-existing oncogenic mutations in KRAS (26/29) and GNAS (20/29) and are characterized by downstream PKA activation. High-grade tumors are GNAS wild-type (5/6), suggesting they do not progress from low-grade tumors. MNAs do share some genetic alterations with colorectal cancer including gain of 1q (5/10), Wnt, and TGFβ pathway alterations. In contrast, mutations in TP53 (1/10) and APC (0/10), common in colorectal cancer, are rare in MNA. Concurrent activation of the KRAS and GNAS mediated signaling pathways appears to be shared with pancreatic intraductal papillary mucinous neoplasm. MNA genome-wide mutational analysis reveals genetic alterations distinct from colorectal cancer, in support of its unique pathophysiology and suggests new targeted therapeutic opportunities.

Journal ArticleDOI
TL;DR: A large retrospective meta-analysis on 466 PDAC patients revealed a 36-gene classifier able to prognosticate PDAC independent of patient cohort and microarray platforms and is likely to reveal true molecular candidates for PDAC therapeutics.
Abstract: Background: Improved usage of the repertoires of pancreatic ductal adenocarcinoma (PDAC) profiles is crucially needed to guide the development of predictive and prognostic tools that could inform the selection of treatment options.

Journal ArticleDOI
TL;DR: Three CMAP-based methods are evaluated on their prediction performance against a curated dataset of 890 true drug-indication pairs to demonstrate the power of connectivity map in classifying known drug-disease relationships.
Abstract: Connectivity map data and associated methodologies have become a valuable tool in understanding drug mechanism of action (MOA) and discovering new indications for drugs. One of the key ideas of connectivity map (CMAP) is to measure the connectivity between disease gene expression signatures and compound-induced gene expression profiles. Despite multiple impressive anecdotal validations, only a few systematic evaluations have assessed the accuracy of this aspect of CMAP, and most of these utilize drug-to-drug matching to transfer indications across the two drugs. To assess CMAP methodologies in a more direct setting, namely the power of classifying known drug-disease relationships, we evaluated three CMAP-based methods on their prediction performance against a curated dataset of 890 true drug-indication pairs. The disease signatures were generated using Gene Logic BioExpress™ system and the compound profiles were derived from the Connectivity Map database (CMAP, build 02, http://www.broadinstitute.org/CMAP/ ). The similarity scoring algorithm called eXtreme Sum (XSum) performs better than the standard Kolmogorov-Smirnov (KS) statistic in terms of the area under curve and can achieve a four-fold enrichment at 0.01 false positive rate level, with AUC = 2.2E-4, P value = 0.0035. Connectivity map can significantly enrich true positive drug-indication pairs given an effective matching algorithm.
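The "four-fold enrichment at 0.01 false positive rate" figure can be unpacked with a small calculation. The sketch below is illustrative only and is not the authors' code: it assumes that enrichment means the true positive rate divided by the false positive rate at a given cutoff, so that four-fold enrichment at a false positive rate of 0.01 corresponds to recovering about 4% of the true drug-indication pairs. The function names and the toy data are hypothetical, and tied scores are not handled specially.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[bool], max_fpr: float) -> float:
    """True positive rate at the loosest cutoff whose FPR is still <= max_fpr.

    Higher scores are assumed to mean a stronger predicted drug-indication match.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, is_true in ranked if is_true)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false pair")

    tp = fp = 0
    best_tpr = 0.0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        if fp / n_neg > max_fpr:
            break  # admitting this pair would exceed the allowed FPR
        best_tpr = tp / n_pos
    return best_tpr


def fold_enrichment(scores: Sequence[float], labels: Sequence[bool], max_fpr: float) -> float:
    """Assumed definition: TPR divided by FPR at the chosen FPR level."""
    return tpr_at_fpr(scores, labels, max_fpr) / max_fpr


if __name__ == "__main__":
    # Toy data (hypothetical): 25 true pairs and 100 false pairs, with one
    # true pair ranked first and a false pair ranked second.
    toy_labels = [True, False, False] + [True] * 24 + [False] * 98
    toy_scores = list(range(len(toy_labels), 0, -1))  # strictly descending scores
    print(f"TPR at FPR 0.01: {tpr_at_fpr(toy_scores, toy_labels, 0.01):.2f}")
    print(f"Fold enrichment: {fold_enrichment(toy_scores, toy_labels, 0.01):.1f}")
```

Under this reading, a random ranking would give an enrichment near 1, which is why a value of 4 at a strict false positive rate indicates that the top-ranked predictions are enriched for known indications.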

Journal ArticleDOI
TL;DR: A pipeline for analyzing diverse genome-scale datasets from microarray, deep sequencing, and restriction site associated DNA sequence experiments for clinical and laboratory strains of Candida albicans, the most prevalent human fungal pathogen.
Abstract: The design of effective antimicrobial therapies for serious eukaryotic pathogens requires a clear understanding of their highly variable genomes. To facilitate analysis of copy number variations, single nucleotide polymorphisms and loss of heterozygosity events in these pathogens, we developed a pipeline for analyzing diverse genome-scale datasets from microarray, deep sequencing, and restriction site associated DNA sequence experiments for clinical and laboratory strains of Candida albicans, the most prevalent human fungal pathogen. The YMAP pipeline (http://lovelace.cs.umn.edu/Ymap/) automatically illustrates genome-wide information in a single intuitive figure and is readily modified for the analysis of other pathogens with small genomes.

Journal ArticleDOI
TL;DR: It is found that public non-anonymous data is valuable and leads to a participatory research model, the implementation of which is greatly facilitated by web-based tools and methods and participant education.
Abstract: Background: Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an ‘open consent’ framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment. Discussion: Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project. Summary: We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and participant education. Project results are long-term proactive participant involvement and the growth of a community that benefits both researchers and participants.

Journal ArticleDOI
TL;DR: Advances in interactomic approaches for viral infections are reviewed, focusing on high-throughput screening technologies and on the generation of high-quality datasets, showing how these are already beginning to offer intriguing perspectives in terms of virus-host cell biology and the control of cellular functions.
Abstract: The current therapeutic arsenal against viral infections remains limited, with often poor efficacy and incomplete coverage, and appears inadequate to face the emergence of drug resistance. Our understanding of viral biology and pathophysiology and our ability to develop a more effective antiviral arsenal would greatly benefit from a more comprehensive picture of the events that lead to viral replication and associated symptoms. Towards this goal, the construction of virus-host interactomes is instrumental, mainly relying on the assumption that a viral infection at the cellular level can be viewed as a number of perturbations introduced into the host protein network when viral proteins make new connections and disrupt existing ones. Here, we review advances in interactomic approaches for viral infections, focusing on high-throughput screening (HTS) technologies and on the generation of high-quality datasets. We show how these are already beginning to offer intriguing perspectives in terms of virus-host cell biology and the control of cellular functions, and we conclude by offering a summary of the current situation regarding the potential development of host-oriented antiviral therapeutics.