Author

Richard K. Wilson

Bio: Richard K. Wilson is an academic researcher at Nationwide Children's Hospital. The author has contributed to research on the topics Genome and Gene, has an h-index of 173, and has co-authored 463 publications receiving 260,000 citations. Previous affiliations of Richard K. Wilson include University of Washington and St. Jude Children's Research Hospital.
Topics: Genome, Gene, Exome sequencing, Genomics, Human genome


Papers
Journal Article
23 Feb 2017-Cell
TL;DR: Findings suggest that hypomethylation is an initiating phenotype in AMLs with DNMT3A R882, while DNMT3A-dependent CpG island hypermethylation is a consequence of AML progression.

165 citations

Journal Article
TL;DR: Phylogenetic analysis of the high-copy repeats CR1, Galluhop, and Birddawg provided insight into two distinct genome dispersion strategies and exemplifies the power of the CBCS method to create representative databases for the repetitive fractions of genomes for which only limited sequence data is available.
Abstract: Cot-based cloning and sequencing (CBCS) is a powerful tool for isolating and characterizing the various repetitive components of any genome, combining the established principles of DNA reassociation kinetics with high-throughput sequencing. CBCS was used to generate sequence libraries representing the high, middle, and low-copy fractions of the chicken genome. Sequencing high-copy DNA of chicken to about 2.7 x coverage of its estimated sequence complexity led to the initial identification of several new repeat families, which were then used for a survey of the newly released first draft of the complete chicken genome. The analysis provided insight into the diversity and biology of known repeat structures such as CR1 and CNM, for which only limited sequence data had previously been available. Cot sequence data also resulted in the identification of four novel repeats (Birddawg, Hitchcock, Kronos, and Soprano), two new subfamilies of CR1 repeats, and many elements absent from the chicken genome assembly. Multiple autonomous elements were found for a novel Mariner-like transposon, Galluhop, in addition to nonautonomous deletion derivatives. Phylogenetic analysis of the high-copy repeats CR1, Galluhop, and Birddawg provided insight into two distinct genome dispersion strategies. This study also exemplifies the power of the CBCS method to create representative databases for the repetitive fractions of genomes for which only limited sequence data is available.

165 citations

Journal Article
20 Apr 2011-JAMA
TL;DR: Whole-genome sequencing can identify novel, cryptic variants in cancer susceptibility genes in addition to providing unbiased information on the spectrum of mutations in a cancer genome.
Abstract: Context: The identification of patients with inherited cancer susceptibility syndromes facilitates early diagnosis, prevention, and treatment. However, in many cases of suspected cancer susceptibility, the family history is unclear and genetic testing of common cancer susceptibility genes is unrevealing. Objective: To apply whole-genome sequencing to a patient without any significant family history of cancer but with suspected increased cancer susceptibility because of multiple primary tumors to identify rare or novel germline variants in cancer susceptibility genes. Design, Setting, and Participant: Skin (normal) and bone marrow (leukemia) DNA were obtained from a patient with early-onset breast and ovarian cancer (negative for BRCA1 and BRCA2 mutations) and therapy-related acute myeloid leukemia (t-AML) and analyzed with the following: whole-genome sequencing using paired-end reads, single-nucleotide polymorphism (SNP) genotyping, RNA expression profiling, and spectral karyotyping. Main Outcome Measures: Structural variants, copy number alterations, single-nucleotide variants, and small insertions and deletions (indels) were detected and validated using the described platforms. Results: Whole-genome sequencing revealed a novel, heterozygous 3-kilobase deletion removing exons 7-9 of TP53 in the patient's normal skin DNA, which was homozygous in the leukemia DNA as a result of uniparental disomy. In addition, a total of 28 validated somatic single-nucleotide variations or indels in coding genes, 8 somatic structural variants, and 12 somatic copy number alterations were detected in the patient's leukemia genome. Conclusion: Whole-genome sequencing can identify novel, cryptic variants in cancer susceptibility genes in addition to providing unbiased information on the spectrum of mutations in a cancer genome.

163 citations

Journal Article
TL;DR: The genome sequence provides important information regarding the ability of Cyanothece 51142 to accomplish metabolic compartmentalization and energy storage, as well as how a unicellular bacterium balances multiple, often incompatible, processes in a single cell.
Abstract: Unicellular cyanobacteria have recently been recognized for their contributions to nitrogen fixation in marine environments, a function previously thought to be filled mainly by filamentous cyanobacteria such as Trichodesmium. To begin a systems level analysis of the physiology of the unicellular N2-fixing microbes, we have sequenced to completion the genome of Cyanothece sp. ATCC 51142, the first such organism. Cyanothece 51142 performs oxygenic photosynthesis and nitrogen fixation, separating these two incompatible processes temporally within the same cell, while concomitantly accumulating metabolic products in inclusion bodies that are later mobilized as part of a robust diurnal cycle. The 5,460,377-bp Cyanothece 51142 genome has a unique arrangement of one large circular chromosome, four small plasmids, and one linear chromosome, the first report of a linear element in the genome of a photosynthetic bacterium. On the 429,701-bp linear chromosome is a cluster of genes for enzymes involved in pyruvate metabolism, suggesting an important role for the linear chromosome in fermentative processes. The annotation of the genome was significantly aided by simultaneous global proteomic studies of this organism. Compared with other nitrogen-fixing cyanobacteria, Cyanothece 51142 contains the largest intact contiguous cluster of nitrogen fixation-related genes. We discuss the implications of such an organization on the regulation of nitrogen fixation. The genome sequence provides important information regarding the ability of Cyanothece 51142 to accomplish metabolic compartmentalization and energy storage, as well as how a unicellular bacterium balances multiple, often incompatible, processes in a single cell.

159 citations

Journal Article
TL;DR: The overall protein structure has been clarified, and the structure derived from this comparative analysis will form the basis for the functional study of polycystin and its individual domains.
Abstract: PKD1 is the major locus of the common genetic disorder autosomal dominant polycystic kidney disease (ADPKD). Analysis of the predicted protein sequence of the human PKD1 gene, polycystin, shows a large molecule with a unique arrangement of extracellular domains and multiple putative transmembrane regions. The precise function of polycystin remains unclear, with a paucity of mutations to define key structural and functional domains. To refine the structure of this protein, we have cloned the genomic region encoding the Fugu PKD1 gene. Fugu PKD1 spans 36 kb of genomic DNA and has greater complexity, with 54 exons compared with 46 in man. Comparative analysis of the predicted protein sequences shows a lower level of homology than in similar studies, with 40% identity and 59% similarity. However, key structural motifs including leucine-rich repeats (LRR), a C-type lectin and LDL-A-like domains, and 16 PKD repeats are maintained. A region of homology with the sea urchin REJ protein was also confirmed in Fugu but found to extend over 1000 amino acids. Several highly conserved intra- and extracellular regions, with no known sequence homologies, that are likely to be of functional importance were detected. The likely structure of the membrane-associated region has been refined, with similarity to the PKD2 protein and voltage-gated Ca2+ and Na+ channels highlighted over part of this area. The overall protein structure has therefore been clarified, and the structure derived from this comparative analysis will form the basis for the functional study of polycystin and its individual domains.

157 citations


Cited by
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

Journal Article
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).

20,335 citations

28 Jul 2005
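TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.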
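Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.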

18,940 citations