Author
Charles D. Johnson
Other affiliations: Texas AgriLife Research, University of Louisville, University of Minnesota
Bio: Charles D. Johnson is an academic researcher from Texas A&M University. The author has contributed to research in topics: Deep sequencing & Transcriptome. The author has an h-index of 19 and has co-authored 26 publications receiving 5483 citations. Previous affiliations of Charles D. Johnson include Texas AgriLife Research & University of Louisville.
Papers
Affiliations: Food and Drug Administration; GE Healthcare; Thermo Fisher Scientific; Illumina; Agilent Technologies; National Institutes of Health; Applied Biosystems; University of Toledo; Stratagene; United States Environmental Protection Agency; University of Massachusetts Boston; Clinical Data, Inc; University of California, Los Angeles; SAS Institute; Biogen Idec; Yale University; Cold Spring Harbor Laboratory; Discovery Institute; Stanford University; Harvard University; Vanderbilt University; University of Texas at Dallas; University of Oslo; Novartis; University of Texas MD Anderson Cancer Center; Luminex Corporation; Wake Forest University; University of Illinois at Urbana–Champaign
TL;DR: This study describes the experimental design and probe mapping efforts behind the MicroArray Quality Control project and shows intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed.
Abstract: Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.
1,987 citations
TL;DR: This work reveals the let-7 microRNA to be a master regulator of cell proliferation pathways and shows that multiple genes involved in cell cycle and cell division functions are also directly or indirectly repressed by let-7.
Abstract: MicroRNAs play important roles in animal development, cell differentiation, and metabolism and have been implicated in human cancer. The let-7 microRNA controls the timing of cell cycle exit and terminal differentiation in Caenorhabditis elegans and is poorly expressed or deleted in human lung tumors. Here, we show that let-7 is highly expressed in normal lung tissue, and that inhibiting let-7 function leads to increased cell division in A549 lung cancer cells. Overexpression of let-7 in cancer cell lines alters cell cycle progression and reduces cell division, providing evidence that let-7 functions as a tumor suppressor in lung cells. let-7 was previously shown to regulate the expression of the RAS lung cancer oncogenes, and our work now shows that multiple genes involved in cell cycle and cell division functions are also directly or indirectly repressed by let-7. This work reveals the let-7 microRNA to be a master regulator of cell proliferation pathways.
1,220 citations
TL;DR: The complete SEQC data sets, comprising >100 billion reads, provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling.
Abstract: We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
853 citations
TL;DR: In the MAQC-II project, 36 independent teams generated predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans.
Abstract: Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
753 citations
TL;DR: Libraries of synthetic small interfering RNA (siRNA) and microRNAs (miRNA) were used to probe the TRAIL pathway and revealed novel genes, including CDK4, the only member of the cyclin-dependent kinase gene family that bore a unique function in apoptotic signal transduction.
Abstract: Tumor necrosis factor–related apoptosis-inducing ligand (TRAIL) binds to death receptors 4/5 and selectively induces caspase-dependent apoptosis. The RNA interference screening approach has led to the discovery and characterization of several TRAIL pathway components in human cells. Here, libraries of synthetic small interfering RNA (siRNA) and microRNAs (miRNA) were used to probe the TRAIL pathway. In addition to known genes, siRNAs targeting CDK4, PTGS1, ALG2, CLCN3, IRAK4, and MAP3K8 altered TRAIL-induced caspase-3 activation responses. Introduction of the miRNAs let-7c, mir-10a, mir-144, mir-150, mir-155, and mir-193 also affected the activation of the caspase cascade. Putative targets of these endogenous miRNAs included genes encoding death receptors, caspases, and other apoptosis-related genes. Among the novel genes revealed in the screen, CDK4 was selected for further characterization. CDK4 was the only member of the cyclin-dependent kinase gene family that bore a unique function in apoptotic signal transduction. [Cancer Res 2007;67(22):10782–8]
217 citations
Cited by
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
22,147 citations
TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
14,524 citations
TL;DR: Preface to the Princeton Landmarks in Biology Edition; Preface; Symbols Used; 1.
Abstract: Preface to the Princeton Landmarks in Biology Edition; Preface; Symbols Used; 1. The Importance of Islands; 2. Area and Number of Species; 3. Further Explanations of the Area-Diversity Pattern; 4. The Strategy of Colonization; 5. Invasibility and the Variable Niche; 6. Stepping Stones and Biotic Exchange; 7. Evolutionary Changes Following Colonization; 8. Prospect; Glossary; References; Index
14,171 citations
TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
14,103 citations
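The read summarization step described in the abstract above (counting how many aligned reads fall on each genomic feature) can be illustrated with a minimal Python sketch. This is a toy, brute-force overlap scan and not the chromosome hashing or feature blocking that featureCounts implements; the names `Feature`, `Read` and `summarize_reads`, the 1-based inclusive coordinates, the rule of leaving ambiguous reads unassigned, and all the example data are assumptions made for illustration only.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A hypothetical annotated feature (e.g. a gene), 1-based inclusive coordinates."""
    gene_id: str
    chrom: str
    start: int
    end: int


class Read(NamedTuple):
    """A hypothetical aligned read, 1-based inclusive coordinates."""
    chrom: str
    start: int
    end: int


def summarize_reads(reads, features):
    """Count reads per gene with a brute-force overlap scan.

    A read is assigned only if it overlaps exactly one gene; reads that
    overlap no gene or several genes are tallied as unassigned (an
    assumption of this sketch, not a statement about any real tool).
    """
    counts = {f.gene_id: 0 for f in features}
    unassigned = 0
    for read in reads:
        hits = {
            f.gene_id
            for f in features
            if f.chrom == read.chrom and f.start <= read.end and read.start <= f.end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            unassigned += 1
    return counts, unassigned


# Made-up example data: two overlapping genes on chr1 and one gene on chr2.
features = [
    Feature("geneA", "chr1", 100, 200),
    Feature("geneB", "chr1", 180, 300),
    Feature("geneC", "chr2", 50, 150),
]
reads = [
    Read("chr1", 90, 120),   # overlaps geneA only
    Read("chr1", 150, 170),  # overlaps geneA only
    Read("chr1", 185, 195),  # overlaps geneA and geneB, so ambiguous
    Read("chr1", 250, 280),  # overlaps geneB only
    Read("chr2", 10, 40),    # overlaps nothing
    Read("chr2", 60, 90),    # overlaps geneC only
]
counts, unassigned = summarize_reads(reads, features)
```

The scan compares every read against every feature, which is fine for a handful of records but scales poorly; avoiding that cost is the kind of problem the indexing techniques named in the abstract are meant to address.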
TL;DR: It is suggested that natural selection against large insertions/deletions is so weak that a large amount of variation is maintained in a population.
11,521 citations