scispace - formally typeset
Author

Richard K. Wilson

Bio: Richard K. Wilson is an academic researcher from Nationwide Children's Hospital. The author has contributed to research on the topics Genome and Gene. The author has an h-index of 173 and has co-authored 463 publications receiving 260,000 citations. Previous affiliations of Richard K. Wilson include the University of Washington and St. Jude Children's Research Hospital.
Topics: Genome, Gene, Exome sequencing, Genomics, Human genome


Papers
Journal Article
TL;DR: The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system.
Abstract: In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.

84 citations

Journal Article
TL;DR: The analysis characterizes P. polycephalum as a prototypical eukaryote with features attributed to the last common ancestor of Amorphea, that is, the Amoebozoa and Opisthokonts, and argues against the later emergence of tyrosine kinase signaling in the opisthokont lineage.
Abstract: Physarum polycephalum is a well-studied microbial eukaryote with unique experimental attributes relative to other experimental model organisms. It has a sophisticated life cycle with several distinct stages including amoebal, flagellated, and plasmodial cells. It is unusual in switching between open and closed mitosis according to specific life-cycle stages. Here we present the analysis of the genome of this enigmatic and important model organism and compare it with closely related species. The genome is littered with simple and complex repeats and the coding regions are frequently interrupted by introns with a mean size of 100 bases. Complemented with extensive transcriptome data, we define approximately 31,000 gene loci, providing unexpected insights into early eukaryote evolution. We describe extensive use of histidine kinase-based two-component systems and tyrosine kinase signaling, the presence of bacterial and plant type photoreceptors (phytochromes, cryptochrome, and phototropin) and of plant-type pentatricopeptide repeat proteins, as well as metabolic pathways, and a cell cycle control system typically found in more complex eukaryotes. Our analysis characterizes P. polycephalum as a prototypical eukaryote with features attributed to the last common ancestor of Amorphea, that is, the Amoebozoa and Opisthokonts. Specifically, the presence of tyrosine kinases in Acanthamoeba and Physarum as representatives of two distantly related subdivisions of Amoebozoa argues against the later emergence of tyrosine kinase signaling in the opisthokont lineage and also against the acquisition by horizontal gene transfer.

83 citations

Journal Article
TL;DR: The generated sequence reveals the precise architecture of genes residing near CFTR/Cftr, including one known gene (WNT2/Wnt2) and two previously unknown genes that immediately flank CFTR or Cftr.
Abstract: The identification of the cystic fibrosis transmembrane conductance regulator gene (CFTR) in 1989 represents a landmark accomplishment in human genetics. Since that time, there have been numerous advances in elucidating the function of the encoded protein and the physiological basis of cystic fibrosis. However, numerous areas of cystic fibrosis biology require additional investigation, some of which would be facilitated by information about the long-range sequence context of the CFTR gene. For example, the latter might provide clues about the sequence elements responsible for the temporal and spatial regulation of CFTR expression. We thus sought to establish the sequence of the chromosomal segments encompassing the human CFTR and mouse Cftr genes, with the hope of identifying conserved regions of biologic interest by sequence comparison. Bacterial clone-based physical maps of the relevant human and mouse genomic regions were constructed, and minimally overlapping sets of clones were selected and sequenced, eventually yielding ≈1.6 Mb and ≈358 kb of contiguous human and mouse sequence, respectively. These efforts have produced the complete sequence of the ≈189-kb and ≈152-kb segments containing the human CFTR and mouse Cftr genes, respectively, as well as significant amounts of flanking DNA. Analyses of the resulting data provide insights about the organization of the CFTR/Cftr genes and potential sequence elements regulating their expression. Furthermore, the generated sequence reveals the precise architecture of genes residing near CFTR/Cftr, including one known gene (WNT2/Wnt2) and two previously unknown genes that immediately flank CFTR/Cftr.

81 citations
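
The abstract describes finding conserved regions of biologic interest by comparing human and mouse sequence. One common way to do this is to slide a fixed-width window along a pairwise alignment and flag windows whose percent identity passes a cutoff. The sketch below shows that idea under stated assumptions: the window width, the 75% identity cutoff, and the function names are hypothetical, and the abstract does not describe the paper's actual comparison method.

```python
def conserved_windows(human: str, mouse: str, window: int = 100, min_identity: float = 0.75):
    """Return (start, end, identity) for alignment windows at or above min_identity.

    Both inputs are rows of the same gapped pairwise alignment; a column
    containing a gap ("-") counts as a mismatch.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    matches = [1 if a == b and a != "-" else 0 for a, b in zip(human.upper(), mouse.upper())]
    hits = []
    running = sum(matches[:window])  # matches in the first window
    for start in range(0, len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the entering column, drop the leaving one.
            running += matches[start + window - 1] - matches[start - 1]
        identity = running / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits


def merge_windows(hits):
    """Merge overlapping or touching windows into (start, end) regions."""
    regions = []
    for start, end, _ in hits:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]
```

Merging the passing windows turns a run of overlapping hits into a single candidate conserved element, which is usually the unit one would inspect for regulatory sequence.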

Journal Article
TL;DR: Synteny between rice and other cereals using an integrated maize physical map and wheat genetic map was strikingly high, further supporting the use of rice and, in particular, chromosome 3, as a model for comparative studies among the cereals.
Abstract: Rice (Oryza sativa L.) chromosome 3 is evolutionarily conserved across the cultivated cereals and shares large blocks of synteny with maize and sorghum, which diverged from rice more than 50 million years ago. To begin to completely understand this chromosome, we sequenced, finished, and annotated 36.1 Mb (approximately 97%) from O. sativa subsp. japonica cv Nipponbare. Annotation features of the chromosome include 5915 genes, of which 913 are related to transposable elements. A putative function could be assigned to 3064 genes, with another 757 genes annotated as expressed, leaving 2094 that encode hypothetical proteins. Similarity searches against the proteome of Arabidopsis thaliana revealed putative homologs for 67% of the chromosome 3 proteins. Further searches of a nonredundant amino acid database, the Pfam domain database, plant Expressed Sequence Tags, and genomic assemblies from sorghum and maize revealed only 853 nontransposable element related proteins from chromosome 3 that lacked similarity to other known sequences. Interestingly, 426 of these have a paralog within the rice genome. A comparative physical map of the wild progenitor species, Oryza nivara, with japonica chromosome 3 revealed a high degree of sequence identity and synteny between these two species, which diverged approximately 10,000 years ago. Although no major rearrangements were detected, the deduced size of the O. nivara chromosome 3 was 21% smaller than that of japonica. Synteny between rice and other cereals using an integrated maize physical map and wheat genetic map was strikingly high, further supporting the use of rice and, in particular, chromosome 3, as a model for comparative studies among the cereals.

81 citations
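
The annotation counts in the rice abstract can be cross-checked with simple arithmetic. The figures below are derived from the numbers stated in the abstract, not reported by the paper: the three annotation classes partition the full gene set, and treating the 36.1 Mb finished sequence as roughly 97% of the chromosome implies a total size of about 37 Mb.

```latex
\begin{aligned}
3064 + 757 + 2094 &= 5915 && \text{(putative function + expressed + hypothetical = all annotated genes)}\\
\frac{36.1\ \text{Mb}}{0.97} &\approx 37.2\ \text{Mb} && \text{(implied size of chromosome 3, approximate)}
\end{aligned}
```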

Journal Article
TL;DR: PolyScan, an algorithm and software implementation designed to provide de novo heterozygous indel detection and improved SNP identification in high-throughput medical resequencing, is presented; the results suggest that PolyScan may play a useful role in the post-Human Genome Project research era.
Abstract: Small insertions and deletions (indels) and single nucleotide polymorphisms (SNPs) are common genetic variants that are thought to be associated with a wide variety of human diseases. Owing to the genome's size and complexity, manually characterizing each one of these variations in an individual is not practical. While significant progress has been made in automated single-base mutation discovery from the sequences of diploid PCR products, automated and reliable detection of indels continues to pose difficult challenges. In this paper, we present PolyScan, an algorithm and software implementation designed to provide de novo heterozygous indel detection and improved SNP identification in the context of high-throughput medical resequencing. Tests on a human diploid PCR-based sequence data set, consisting of 90,270 traces from 13 genes, indicate that PolyScan identified approximately 90% of the 151 consensus indel sites and approximately 84% of the 1546 heterozygous indels previously identified by manual inspection. Tests on tumor-derived data show that PolyScan better identifies high-quality, low-level mutations as compared with other mutation detection software. Moreover, SNP identification improves when reprocessing the results of other programs. These results suggest that PolyScan may play a useful role in the post-Human Genome Project research era.

80 citations


Cited by
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations
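
The central new step in PSI-BLAST is turning significant alignments into a position-specific score matrix. The sketch below illustrates the basic idea only: it converts equal-length aligned protein segments into per-column log2-odds scores and scores a candidate segment. It assumes a uniform background frequency over the 20 standard amino acids and a flat pseudocount; PSI-BLAST's actual procedure also applies sequence weighting and more elaborate pseudocount and background models, so this is not its implementation.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps ("-") and other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log2-odds position-specific score matrix from aligned segments.

    Assumes a uniform background frequency of 1/20 per residue and adds the
    same pseudocount to every residue in every column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned segments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # Observed (smoothed) frequency divided by background, in log2 units.
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the column scores for a segment the same length as the matrix."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the matrix length")
    return sum(column[aa] for column, aa in zip(pssm, segment))
```

Residues that are common in a column get positive scores and absent ones get negative scores, so a matrix built from related sequences rewards segments that match the family's column preferences. Per the abstract, PSI-BLAST searches the database with such a matrix and iterates.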

Journal Article
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum +245 more; Institutions (29)
15 Feb 2001, Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations
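
The GATK abstract describes separating the traversal over sequencing data from the per-locus calculation, in the MapReduce style, and names coverage calculators as an example. The hypothetical Python sketch below shows the shape of such a tool: a map step turns each read into per-locus counts and a reduce step sums them. It is not the GATK API (GATK is a Java framework with its own walker interfaces); the `Read` type and all function names are illustrative assumptions.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read: contig name plus a half-open [start, end) span."""
    contig: str
    start: int
    end: int


def map_read(read: Read) -> Counter:
    """Map step: emit a count of 1 for every (contig, position) the read covers."""
    return Counter((read.contig, pos) for pos in range(read.start, read.end))


def reduce_coverage(total: Counter, partial: Counter) -> Counter:
    """Reduce step: add one read's per-locus counts into the running total."""
    total.update(partial)
    return total


def coverage(reads: Iterable[Read]) -> Counter:
    """Per-locus depth of coverage computed as map followed by reduce."""
    return reduce(reduce_coverage, map(map_read, reads), Counter())
```

For example, `coverage([Read("chr1", 0, 3), Read("chr1", 2, 5)])` gives a depth of 2 at `("chr1", 2)`, where the two reads overlap. Because the reduce step is an associative sum, reads can be partitioned (for instance by contig or interval), reduced independently, and merged afterwards, which is what makes this style amenable to the parallelization the abstract mentions.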

Journal Article
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).

20,335 citations
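
Bowtie's speed rests on a Burrows-Wheeler (FM) index, which lets a read be matched by backward search, narrowing a range of sorted suffixes one character at a time. The sketch below builds a toy index and performs exact-match backward search. It is a teaching illustration under simplifying assumptions (naive suffix sorting, full unsampled occurrence tables, exact matches only), so it omits Bowtie's compressed index layout and the quality-aware backtracking that permits mismatches.

```python
def bwt_index(text):
    """Build suffix array, C table and occurrence table for text + '$'."""
    text += "$"  # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the '$' sentinel
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, ch in enumerate(bwt):
        for a in alphabet:
            occ[a][i + 1] = occ[a][i] + (1 if a == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

With `sa, c_table, occ = bwt_index("ACGTACGT")`, `backward_search("ACG", sa, c_table, occ)` returns `[0, 4]`. Each pattern character costs two table lookups regardless of genome size, which is why this family of indexes scales to large references.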

28 Jul 2005
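TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation enables many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.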

18,940 citations