Author

Richard K. Wilson

Bio: Richard K. Wilson is an academic researcher from Nationwide Children's Hospital. The author has contributed to research in topics: Genome & Gene. The author has an h-index of 173 and has co-authored 463 publications receiving 260,000 citations. Previous affiliations of Richard K. Wilson include University of Washington & St. Jude Children's Research Hospital.
Topics: Genome, Gene, Exome sequencing, Genomics, Human genome


Papers
Journal Article
TL;DR: The results of these initial experiments suggested that low-pressure shearing offered a useful alternative to sonic and enzymatic DNA fragmentation methods, and additional DNA sequencing experiments using subclones produced by the low pressure-shearing method are in progress.
Abstract: Several methods have been described for random fragmentation of DNA. These methods, often used for library preparation and subcloning prior to DNA sequence analysis, include sonic treatment (1, 2), partial digestion by restriction endonucleases (3) and treatment with DNase I in the presence of manganese ions (4). While all of these methods have been used successfully to prepare random DNA fragments for further manipulation and analysis, each has difficulties and limitations. In an effort to minimize template DNA preparation tasks and simplify primer- and PCR-directed DNA closure methods after an initial shotgun sequencing approach, we wished to prepare random subclones containing inserts with an average size of 4 to 6 kilobase pairs (kb). As an alternative approach, several different DNA samples were passed through a small French pressure cell at a variety of low to intermediate pressures (Figure 1b). A lever device was constructed to allow controlled application of low to intermediate pressures to the cell (Figure 1a). The results of these initial experiments suggested that low-pressure shearing offered a useful alternative to sonic and enzymatic DNA fragmentation methods. Subsequently, regions of the Caenorhabditis elegans genome cloned in cosmid vectors were sheared using an application of 250 psi. Shearing experiments with three different C. elegans cosmid clones (insert sizes ca. 35-42 kb) all produced essentially the same results (data not shown). The sheared cosmid DNA fragments were made flush with T4 DNA polymerase in the presence of 100 µM dNTPs (2), and DNA fragments of the desired size range were purified by preparative agarose gel electrophoresis and subcloned in the HincII (SalI) site of the phagemid vector pUC118. To check the efficiency of this subcloning method, 109 of these subclones were examined by standard plasmid mini-prep and agarose gel electrophoresis procedures. 101 (92%) subclones contained an insert of the expected 4 to 6 kb size range. 72 subclone DNAs were sequenced using a linear amplification method with fluorescent dye-labeled primers. Identical subclones were not observed in this analysis, and no sequence-specific shearing hot spots were detected. Additional DNA sequencing experiments using subclones produced by the low pressure-shearing method are in progress in order to determine the complete nucleotide sequence of a 100 kb region in the large cluster of C. elegans chromosome III.

64 citations

Journal Article
TL;DR: A software package, BreakFusion, is presented that combines the strength of reference alignment followed by read-pair analysis and de novo assembly to achieve a good balance in sensitivity, specificity and computational efficiency.
Abstract: Summary: Despite recent progress, computational tools that identify gene fusions from next-generation whole transcriptome sequencing data are often limited in accuracy and scalability. Here, we present a software package, BreakFusion, that combines the strength of reference alignment followed by read-pair analysis and de novo assembly to achieve a good balance in sensitivity, specificity and computational efficiency. Availability: http://bioinformatics.mdanderson.org/main/BreakFusion Contact: kchen3@mdanderson.org; lding@genome.wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online.

64 citations

Journal Article
TL;DR: 83 novel putative platypus venom genes from 13 toxin families, which are homologous to known toxins from a wide range of vertebrates and invertebrates, are identified, providing insight into the evolution of mammalian venom.
Abstract: To date, few peptides in the complex mixture of platypus venom have been identified and sequenced, in part due to the limited amounts of platypus venom available to study. We have constructed and sequenced a cDNA library from an active platypus venom gland to identify the remaining components. We identified 83 novel putative platypus venom genes from 13 toxin families, which are homologous to known toxins from a wide range of vertebrates (fish, reptiles, insectivores) and invertebrates (spiders, sea anemones, starfish). A number of these are expressed in tissues other than the venom gland, and at least three of these families (those with homology to toxins from distant invertebrates) may play non-toxin roles. Thus, further functional testing is required to confirm venom activity. However, the presence of similar putative toxins in such widely divergent species provides further evidence for the hypothesis that there are certain protein families that are selected preferentially during evolution to become venom peptides. We have also used homology with known proteins to speculate on the contributions of each venom component to the symptoms of platypus envenomation. This study represents a step towards fully characterizing the first mammal venom transcriptome. We have found similarities between putative platypus toxins and those of a number of unrelated species, providing insight into the evolution of mammalian venom.

63 citations

Journal Article
TL;DR: CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes and is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies.
Abstract: Motivation: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implication in tumorigenesis. Current commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may result in a heavy computational burden, as well as a loss of the overall statistical power due to segmentation and discretization of individual sample's data. We propose a population-based approach for RCNA detection with no need of single-sample analysis, which is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies. Results: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. Directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of single-sample CNA calling based two-step approaches. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes. Availability: The R and C programs implementing our method are available at https://dsgweb.wustl.edu/qunyuan/software/cmds. Contact: qunyuan@wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online.

62 citations

Journal Article
TL;DR: Comparative genomic analyses of six Glossina genomes validate established evolutionary relationships and sub-genera, provide insight into the evolutionary biology underlying novel adaptations, and are relevant to applied aspects of vector control such as trap design and the discovery of novel pest and disease control strategies.
Abstract: Tsetse flies (Glossina sp.) are the vectors of human and animal trypanosomiasis throughout sub-Saharan Africa. Tsetse flies are distinguished from other Diptera by unique adaptations, including lactation and the birthing of live young (obligate viviparity), a vertebrate blood-specific diet by both sexes, and obligate bacterial symbiosis. This work describes the comparative analysis of six Glossina genomes representing three sub-genera: Morsitans (G. morsitans morsitans, G. pallidipes, G. austeni), Palpalis (G. palpalis, G. fuscipes), and Fusca (G. brevipalpis) which represent different habitats, host preferences, and vectorial capacity. Genomic analyses validate established evolutionary relationships and sub-genera. Syntenic analysis of Glossina relative to Drosophila melanogaster shows reduced structural conservation across the sex-linked X chromosome. Sex-linked scaffolds show increased rates of female-specific gene expression and lower evolutionary rates relative to autosome associated genes. Tsetse-specific genes are enriched in protease, odorant-binding, and helicase activities. Lactation-associated genes are conserved across all Glossina species while male seminal proteins are rapidly evolving. Olfactory and gustatory genes are reduced across the genus relative to other insects. Vision-associated Rhodopsin genes show conservation of motion detection/tracking functions and variance in the Rhodopsin detecting colors in the blue wavelength ranges. Expanded genomic discoveries reveal the genetics underlying Glossina biology and provide a rich body of knowledge for basic science and disease control. They also provide insight into the evolutionary biology underlying novel adaptations and are relevant to applied aspects of vector control such as trap design and discovery of novel pest and disease control strategies.

60 citations


Cited by
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, and 245 more authors (29 institutions)
15 Feb 2001 - Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations
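
As a rough illustration of the MapReduce style that the GATK abstract above describes for tools such as coverage calculators, the following Python sketch computes per-position read depth. It is not GATK code and does not use the GATK API; the read representation (a 0-based start and a length) and the function names `map_read`, `reduce_counts` and `coverage` are hypothetical, chosen only for this example.

```python
from collections import Counter
from functools import reduce

def map_read(read):
    """Map step: turn one aligned read into per-position counts.

    `read` is a hypothetical (start, length) pair with a 0-based start;
    real alignments carry far more information than this.
    """
    start, length = read
    return Counter(range(start, start + length))

def reduce_counts(acc, partial):
    """Reduce step: add one read's per-position counts to the running total."""
    acc.update(partial)  # Counter.update sums counts rather than replacing them
    return acc

def coverage(reads):
    """Per-position depth across all reads, written as map followed by reduce."""
    return reduce(reduce_counts, map(map_read, reads), Counter())

if __name__ == "__main__":
    depth = coverage([(0, 3), (2, 3)])
    for position in sorted(depth):
        print(position, depth[position])
```

Because the map step looks at one read at a time and the reduce step only sums counts, the reads could in principle be split across workers and the partial results combined afterwards, which is the separation of analysis logic from data management that the abstract emphasizes.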

Journal Article
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).

20,335 citations
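
The Burrows-Wheeler indexing mentioned in the Bowtie abstract above can be illustrated with a toy exact-match search. This is a minimal sketch of the general backward-search technique on a Burrows-Wheeler index, not Bowtie's implementation: it omits the quality-aware backtracking for mismatches, builds the suffix array naively, and stores uncompressed tables that would be far too large for a real genome. The function names `build_index` and `exact_match` are assumptions made for this example.

```python
from collections import Counter

def build_index(text):
    """Build a toy Burrows-Wheeler index for `text` (uppercase letters assumed).

    Returns the suffix array, the C table (count of characters smaller than c)
    and the Occ table (occurrences of c in the first i characters of the BWT).
    """
    text += "$"  # sentinel that sorts before every uppercase letter
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)

    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return suffix_array, c_table, occ

def exact_match(pattern, suffix_array, c_table, occ):
    """Backward search: return sorted 0-based positions where `pattern` occurs."""
    lo, hi = 0, len(suffix_array)  # half-open range of matching suffixes
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[i] for i in range(lo, hi))

if __name__ == "__main__":
    index = build_index("ACGTACGTTAGC")
    print(exact_match("ACG", *index))
```

The search touches the index once per pattern character regardless of genome length, which is the property that makes this family of indexes attractive for aligning short reads.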

28 Jul 2005
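TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade host immune responses. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion. The var gene family in each haploid genome encodes about 60 members, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.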

18,940 citations