scispace - formally typeset
Search or ask a question
Author

Hanlee P. Ji

Bio: Hanlee P. Ji is an academic researcher from Stanford University. The author has contributed to research in topics: Genome & Cancer. The author has an hindex of 38, co-authored 155 publications receiving 12020 citations. Previous affiliations of Hanlee P. Ji include Arizona State University & University of Washington.
Topics: Genome, Cancer, Genomics, Biology, Exome sequencing


Papers
More filters
Journal ArticleDOI
TL;DR: Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.
Abstract: DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data-analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.

4,458 citations

Journal ArticleDOI
Leming Shi1, Laura H. Reid, Wendell D. Jones, Richard Shippy2, Janet A. Warrington3, Shawn C. Baker4, Patrick J. Collins5, Francoise de Longueville, Ernest S. Kawasaki6, Kathleen Y. Lee7, Yuling Luo, Yongming Andrew Sun7, James C. Willey8, Robert Setterquist7, Gavin M. Fischer9, Weida Tong1, Yvonne P. Dragan1, David J. Dix10, Felix W. Frueh1, Federico Goodsaid1, Damir Herman6, Roderick V. Jensen11, Charles D. Johnson, Edward K. Lobenhofer12, Raj K. Puri1, Uwe Scherf1, Jean Thierry-Mieg6, Charles Wang13, Michael A Wilson7, Paul K. Wolber5, Lu Zhang7, William Slikker1, Shashi Amur1, Wenjun Bao14, Catalin Barbacioru7, Anne Bergstrom Lucas5, Vincent Bertholet, Cecilie Boysen, Bud Bromley, Donna Brown, Alan Brunner2, Roger D. Canales7, Xiaoxi Megan Cao, Thomas A. Cebula1, James J. Chen1, Jing Cheng, Tzu Ming Chu14, Eugene Chudin4, John F. Corson5, J. Christopher Corton10, Lisa J. Croner15, Christopher Davies3, Timothy Davison, Glenda C. Delenstarr5, Xutao Deng13, David Dorris7, Aron Charles Eklund11, Xiaohui Fan1, Hong Fang, Stephanie Fulmer-Smentek5, James C. Fuscoe1, Kathryn Gallagher10, Weigong Ge1, Lei Guo1, Xu Guo3, Janet Hager16, Paul K. Haje, Jing Han1, Tao Han1, Heather Harbottle1, Stephen C. Harris1, Eli Hatchwell17, Craig A. Hauser18, Susan D. Hester10, Huixiao Hong, Patrick Hurban12, Scott A. Jackson1, Hanlee P. Ji19, Charles R. Knight, Winston Patrick Kuo20, J. Eugene LeClerc1, Shawn Levy21, Quan Zhen Li, Chunmei Liu3, Ying Liu22, Michael Lombardi11, Yunqing Ma, Scott R. Magnuson, Botoul Maqsodi, Timothy K. McDaniel3, Nan Mei1, Ola Myklebost23, Baitang Ning1, Natalia Novoradovskaya9, Michael S. Orr1, Terry Osborn, Adam Papallo11, Tucker A. Patterson1, Roger Perkins, Elizabeth Herness Peters, Ron L. Peterson24, Kenneth L. Philips12, P. Scott Pine1, Lajos Pusztai25, Feng Qian, Hongzu Ren10, Mitch Rosen10, Barry A. Rosenzweig1, Raymond R. Samaha7, Mark Schena, Gary P. Schroth, Svetlana Shchegrova5, Dave D. Smith26, Frank Staedtler24, Zhenqiang Su1, Hongmei Sun, Zoltan Szallasi20, Zivana Tezak1, Danielle Thierry-Mieg6, Karol L. Thompson1, Irina Tikhonova16, Yaron Turpaz3, Beena Vallanat10, Christophe Van, Stephen J. Walker27, Sue Jane Wang1, Yonghong Wang6, Russell D. Wolfinger14, Alexander Wong5, Jie Wu, Chunlin Xiao7, Qian Xie, Jun Xu13, Wen Yang, Liang Zhang, Sheng Zhong28, Yaping Zong 
TL;DR: This study describes the experimental design and probe mapping efforts behind the MicroArray Quality Control project and shows intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed.
Abstract: Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.

1,987 citations

Journal ArticleDOI
TL;DR: A microfluidics-based, linked-read sequencing technology that can phase and haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma and cancer genomes using nanograms of input DNA is presented.
Abstract: Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.

686 citations

Journal ArticleDOI
TL;DR: Intratumor heterogeneity and genomic instability have the potential to be useful measures that can universally be applied to all cancers.
Abstract: Intratumor heterogeneity (ITH) drives neoplastic progression and therapeutic resistance. We used the bioinformatics tools 'expanding ploidy and allele frequency on nested subpopulations' (EXPANDS) and PyClone to detect clones that are present at a ≥10% frequency in 1,165 exome sequences from tumors in The Cancer Genome Atlas. 86% of tumors across 12 cancer types had at least two clones. ITH in the morphology of nuclei was associated with genetic ITH (Spearman's correlation coefficient, ρ = 0.24-0.41; P 2 clones coexisted in the same tumor sample (HR = 1.49, 95% CI: 1.20-1.87). In two independent data sets, copy-number alterations affecting either 75% of a tumor's genome predicted reduced risk (HR = 0.15, 95% CI: 0.08-0.29). Mortality risk also declined when >4 clones coexisted in the sample, suggesting a trade-off between the costs and benefits of genomic instability. ITH and genomic instability thus have the potential to be useful measures that can universally be applied to all cancers.

630 citations

Journal ArticleDOI
TL;DR: The general utility of a highly tractable primary organoid system for cancer modeling and driver oncogene validation in diverse gastrointestinal tissues is demonstrated.
Abstract: The application of primary organoid cultures containing epithelial and mesenchymal elements to cancer modeling holds promise for combining the accurate multilineage differentiation and physiology of in vivo systems with the facile in vitro manipulation of transformed cell lines. Here we used a single air-liquid interface culture method without modification to engineer oncogenic mutations into primary epithelial and mesenchymal organoids from mouse colon, stomach and pancreas. Pancreatic and gastric organoids exhibited dysplasia as a result of expression of Kras carrying the G12D mutation (Kras(G12D)), p53 loss or both and readily generated adenocarcinoma after in vivo transplantation. In contrast, primary colon organoids required combinatorial Apc, p53, Kras(G12D) and Smad4 mutations for progressive transformation to invasive adenocarcinoma-like histology in vitro and tumorigenicity in vivo, recapitulating multi-hit models of colorectal cancer (CRC), as compared to the more promiscuous transformation of small intestinal organoids. Colon organoid culture functionally validated the microRNA miR-483 as a dominant driver oncogene at the IGF2 (insulin-like growth factor-2) 11p15.5 CRC amplicon, inducing dysplasia in vitro and tumorigenicity in vivo. These studies demonstrate the general utility of a highly tractable primary organoid system for cancer modeling and driver oncogene validation in diverse gastrointestinal tissues.

334 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Abstract: Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS—the 1000 Genome pilot alone includes nearly five terabases—make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.

20,557 citations

28 Jul 2005
TL;DR: PfPMP1)与感染红细胞、树突状组胞以及胎盘的单个或多个受体作用,在黏附及免疫逃避中起关键的作�ly.
Abstract: 抗原变异可使得多种致病微生物易于逃避宿主免疫应答。表达在感染红细胞表面的恶性疟原虫红细胞表面蛋白1(PfPMP1)与感染红细胞、内皮细胞、树突状细胞以及胎盘的单个或多个受体作用,在黏附及免疫逃避中起关键的作用。每个单倍体基因组var基因家族编码约60种成员,通过启动转录不同的var基因变异体为抗原变异提供了分子基础。

18,940 citations

Journal ArticleDOI
TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired- end reads, depending on the number of possible splice forms for each gene.
Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

14,524 citations

Journal ArticleDOI
TL;DR: FeatureCounts as discussed by the authors is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments, which implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

14,103 citations

Journal ArticleDOI
TL;DR: Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets, including visualizing sliding window results integrated with available genome annotations in the UCSC browser.
Abstract: Motivation: DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser. Availability: Freely available to academic users from: http://www.ub.edu/dnasp Contact: [email protected]

13,511 citations