Author

Jean Thierry-Mieg

Bio: Jean Thierry-Mieg is an academic researcher at the National Institutes of Health. The author has contributed to research on the topics Gene and Genome. The author has an h-index of 40 and has co-authored 72 publications receiving 32,739 citations. Previous affiliations of Jean Thierry-Mieg include Harvard University and the National Institute of Genetics.


Papers
Journal ArticleDOI
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1  +245 moreInstitutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
Leming Shi1, Laura H. Reid, Wendell D. Jones, Richard Shippy2, Janet A. Warrington3, Shawn C. Baker4, Patrick J. Collins5, Francoise de Longueville, Ernest S. Kawasaki6, Kathleen Y. Lee7, Yuling Luo, Yongming Andrew Sun7, James C. Willey8, Robert Setterquist7, Gavin M. Fischer9, Weida Tong1, Yvonne P. Dragan1, David J. Dix10, Felix W. Frueh1, Federico Goodsaid1, Damir Herman6, Roderick V. Jensen11, Charles D. Johnson, Edward K. Lobenhofer12, Raj K. Puri1, Uwe Scherf1, Jean Thierry-Mieg6, Charles Wang13, Michael A Wilson7, Paul K. Wolber5, Lu Zhang7, William Slikker1, Shashi Amur1, Wenjun Bao14, Catalin Barbacioru7, Anne Bergstrom Lucas5, Vincent Bertholet, Cecilie Boysen, Bud Bromley, Donna Brown, Alan Brunner2, Roger D. Canales7, Xiaoxi Megan Cao, Thomas A. Cebula1, James J. Chen1, Jing Cheng, Tzu Ming Chu14, Eugene Chudin4, John F. Corson5, J. Christopher Corton10, Lisa J. Croner15, Christopher Davies3, Timothy Davison, Glenda C. Delenstarr5, Xutao Deng13, David Dorris7, Aron Charles Eklund11, Xiaohui Fan1, Hong Fang, Stephanie Fulmer-Smentek5, James C. Fuscoe1, Kathryn Gallagher10, Weigong Ge1, Lei Guo1, Xu Guo3, Janet Hager16, Paul K. Haje, Jing Han1, Tao Han1, Heather Harbottle1, Stephen C. Harris1, Eli Hatchwell17, Craig A. Hauser18, Susan D. Hester10, Huixiao Hong, Patrick Hurban12, Scott A. Jackson1, Hanlee P. Ji19, Charles R. Knight, Winston Patrick Kuo20, J. Eugene LeClerc1, Shawn Levy21, Quan Zhen Li, Chunmei Liu3, Ying Liu22, Michael Lombardi11, Yunqing Ma, Scott R. Magnuson, Botoul Maqsodi, Timothy K. McDaniel3, Nan Mei1, Ola Myklebost23, Baitang Ning1, Natalia Novoradovskaya9, Michael S. Orr1, Terry Osborn, Adam Papallo11, Tucker A. Patterson1, Roger Perkins, Elizabeth Herness Peters, Ron L. Peterson24, Kenneth L. Philips12, P. Scott Pine1, Lajos Pusztai25, Feng Qian, Hongzu Ren10, Mitch Rosen10, Barry A. Rosenzweig1, Raymond R. Samaha7, Mark Schena, Gary P. Schroth, Svetlana Shchegrova5, Dave D. Smith26, Frank Staedtler24, Zhenqiang Su1, Hongmei Sun, Zoltan Szallasi20, Zivana Tezak1, Danielle Thierry-Mieg6, Karol L. Thompson1, Irina Tikhonova16, Yaron Turpaz3, Beena Vallanat10, Christophe Van, Stephen J. Walker27, Sue Jane Wang1, Yonghong Wang6, Russell D. Wolfinger14, Alexander Wong5, Jie Wu, Chunlin Xiao7, Qian Xie, Jun Xu13, Wen Yang, Liang Zhang, Sheng Zhong28, Yaping Zong
TL;DR: This study describes the experimental design and probe mapping efforts behind the MicroArray Quality Control project and shows intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed.
Abstract: Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.

1,987 citations

Journal ArticleDOI
03 Mar 1994-Nature
TL;DR: The nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III is completed, and comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three.
Abstract: As part of our effort to sequence the 100-megabase (Mb) genome of the nematode Caenorhabditis elegans, we have completed the nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III. Analysis of the finished sequence has indicated an average density of about one gene per five kilobases; comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three. In addition, the genomic sequence contains several intriguing features, including putative gene duplications and a variety of other repeats with potential evolutionary implications.

1,612 citations

Journal ArticleDOI
Zhenqiang Su, Paweł P. Łabaj1, Sheng Li2, Jean Thierry-Mieg3  +161 moreInstitutions (54)
TL;DR: The complete SEQC data sets, comprising >100 billion reads, provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings, and measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling.
Abstract: We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.

853 citations

Journal ArticleDOI
Leming Shi1, Gregory Campbell1, Wendell D. Jones, Fabien Campagne2  +198 moreInstitutions (55)
TL;DR: In the MAQC-II project, predictive models were generated for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans.
Abstract: Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.

753 citations


Cited by
Journal ArticleDOI
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal ArticleDOI
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1  +245 moreInstitutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Journal ArticleDOI
19 Feb 1998-Nature
TL;DR: Double-stranded RNA was found to be substantially more effective at producing interference than either strand individually, arguing against stoichiometric interference with endogenous mRNA and suggesting that there could be a catalytic or amplification component in the interference process.
Abstract: Experimental introduction of RNA into cells can be used in certain biological systems to interfere with the function of an endogenous gene. Such effects have been proposed to result from a simple antisense mechanism that depends on hybridization between the injected RNA and endogenous messenger RNA transcripts. RNA interference has been used in the nematode Caenorhabditis elegans to manipulate gene expression. Here we investigate the requirements for structure and delivery of the interfering RNA. To our surprise, we found that double-stranded RNA was substantially more effective at producing interference than was either strand individually. After injection into adult animals, purified single strands had at most a modest effect, whereas double-stranded mixtures caused potent and specific interference. The effects of this interference were evident in both the injected animals and their progeny. Only a few molecules of injected double-stranded RNA were required per affected cell, arguing against stoichiometric interference with endogenous mRNA and suggesting that there could be a catalytic or amplification component in the interference process.

15,374 citations

Journal ArticleDOI
TL;DR: It is shown that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads, and estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
Abstract: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

14,524 citations