Journal ArticleDOI

Large scale comparison of global gene expression patterns in human and mouse

23 Dec 2010-Genome Biology (BioMed Central)-Vol. 11, Iss: 12, pp 1-11
TL;DR: The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse.
Abstract: It is widely accepted that orthologous genes between species are conserved at the sequence level and perform similar functions in different organisms. However, the level of conservation of gene expression patterns of the orthologous genes in different species has been unclear. To address the issue, we compared gene expression of orthologous genes based on 2,557 human and 1,267 mouse samples with high quality gene expression data, selected from experiments stored in the public microarray repository ArrayExpress. In a principal component analysis (PCA) of combined data from human and mouse samples merged on orthologous probesets, samples largely form distinctive clusters based on their tissue sources when projected onto the top principal components. The most prominent groups are the nervous system, muscle/heart tissues, liver and cell lines. Despite the great differences in sample characteristics and experiment conditions, the overall patterns of these prominent clusters are strikingly similar for human and mouse. We further analyzed data for each tissue separately and found that the most variable genes in each tissue are highly enriched with human-mouse tissue-specific orthologs and the least variable genes in each tissue are enriched with human-mouse housekeeping orthologs. The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse. The expression of groups of orthologous genes co-varies in the two species, both for the most variable genes and the most ubiquitously expressed genes.
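The merge-and-project step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `merge_on_orthologs` and `pca_scores`, the input layout (genes in rows, samples in columns), and the ortholog pair list are all assumptions made for illustration, and any per-species normalization applied before merging is omitted.

```python
import numpy as np

def merge_on_orthologs(human, mouse, human_genes, mouse_genes, ortholog_pairs):
    """Stack human and mouse samples on shared ortholog rows.

    human, mouse: arrays of shape (genes, samples).
    ortholog_pairs: list of (human_gene, mouse_gene) tuples (hypothetical input).
    """
    h_index = {g: i for i, g in enumerate(human_genes)}
    m_index = {g: i for i, g in enumerate(mouse_genes)}
    # Keep only pairs measured in both species.
    pairs = [(h, m) for h, m in ortholog_pairs if h in h_index and m in m_index]
    h_rows = [h_index[h] for h, _ in pairs]
    m_rows = [m_index[m] for _, m in pairs]
    # Columns: all human samples followed by all mouse samples.
    merged = np.hstack([human[h_rows, :], mouse[m_rows, :]])
    return merged, pairs

def pca_scores(matrix, n_components=2):
    """Project samples (columns of `matrix`) onto the top principal components."""
    samples = matrix.T                          # shape (samples, genes)
    centered = samples - samples.mean(axis=0)   # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)       # fraction of variance per component
    return scores, explained[:n_components]
```

Plotting the first two columns of `scores`, colored by tissue and marked by species, would give the kind of view in which the abstract reports tissue-based clusters.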


Citations
Posted Content
TL;DR: The project aimed to implement and develop new gene-based methods that derive gene-level statistics, so that GWAS can be used in well-established systems biology tools, and to explore whether these methods improve the analysis of GWAS on disease sub-phenotypes, which usually suffer from very small sample sizes.
Abstract: Genome-wide association studies (GWAS) have identified hundreds of loci at very stringent levels of statistical significance across many different human traits. However, it is now clear that very large samples (n~10^4-10^5) are needed to find the majority of genetic variants underlying risk for most human diseases. Therefore, the field has engaged itself in a race to increase study sample sizes, with some studies yielding very successful results but others providing little or no new insight. This project started early on in this new wave of studies and I decided to use an alternative approach that uses prior biological knowledge to improve both interpretation and power of GWAS. The project aimed to a) implement and develop new gene-based methods to derive gene-level statistics to use GWAS in well-established systems biology tools; b) use these gene-level statistics in network and gene-set analyses of GWAS data; c) mine GWAS of neuropsychiatric disorders using gene, gene-set and integrative biology analyses with gene-expression studies; and d) explore the ability of these methods to improve the analysis of GWAS on disease sub-phenotypes, which usually suffer from very small sample sizes.

5 citations

Journal Article
TL;DR: The procedures to merge in vitro and in vivo matured oocyte expression profiling datasets of rhesus monkey and mouse were illustrated, and the results strongly suggested the feasibility of the prepared data and of the methodology used to prepare it.
Abstract: Concurrent gene expression profiling meta-analysis of in vitro and in vivo matured oocytes among mammals can provide crucial knowledge to assist reproductive technologies. Due to the lack of methodology to prepare oocyte datasets for such analysis, we illustrated the procedures to merge in vitro and in vivo matured oocyte expression profiling datasets of rhesus monkey (Macaca mulatta) and mouse (Mus musculus). Datasets acquired from both species were pooled together based on types of their orthologous genes. To determine the feasibility of constructed pooled data, top orthologous genes differentially expressed between in vitro and in vivo oocytes were identified by Linear models and empirical Bayes methods with 500 generated learning datasets (FDR<0.01). Several clustering algorithms were then applied for oocyte sample clustering using the acquired differentially expressed genes. Gene enrichment analysis to determine biological processes associated with the differentially expressed genes was performed using DAVID Bioinformatics Resources 6.7. The results revealed successful construction of pooled oocyte expression profiles of monkey and mouse, and the pooled datasets used for subsequent analyses consisted of 10,214 one-to-one orthologous genes. With total selected 100 differentially expressed genes, oocyte clustering results revealed the correct clustering of in vivo and in vitro oocyte samples. Interestingly, enrichment analysis revealed association of several differentially expressed genes with maturation and developmental process of oocytes. Of note, the acquired results strongly suggested the feasibility of the prepared data, and its preparation’s methodology. Hopefully, this approach would be beneficial for cross-species gene expression profiling analyses of several mammalian oocytes in the future.

4 citations

Journal ArticleDOI
TL;DR: In this paper, a cross-species comparison of transcriptomes between humans and cattle was conducted to elucidate the evolutionary molecular mechanisms underpinning phenotypic variation between and within species, which might help decipher the genetic and evolutionary basis of complex traits in both species.
Abstract: Background: Cross-species comparison of transcriptomes is important for elucidating evolutionary molecular mechanisms underpinning phenotypic variation between and within species, yet to date it has been essentially limited to model organisms with relatively small sample sizes. Results: Here, we systematically analyze and compare 10,830 and 4,866 publicly available RNA-seq samples in humans and cattle, respectively, representing 20 common tissues. Focusing on 17,315 orthologous genes, we demonstrate that mean/median gene expression, inter-individual variation of expression, expression quantitative trait loci, and gene co-expression networks are generally conserved between humans and cattle. By examining large-scale genome-wide association studies for 46 human traits (average n = 327,973) and 45 cattle traits (average n = 24,635), we reveal that the heritability of complex traits in both species is significantly more enriched in transcriptionally conserved than diverged genes across tissues. Conclusions: In summary, our study provides a comprehensive comparison of transcriptomes between humans and cattle, which might help decipher the genetic and evolutionary basis of complex traits in both species.

4 citations

Journal ArticleDOI
10 Jan 2022-eLife
TL;DR: In this article, comparative transcriptomics of blood was used to evaluate the systemic host response and its concordance between humans with different clinical manifestations of malaria and five commonly used mouse models.
Abstract: Recent initiatives to improve translation of findings from animal models to human disease have focussed on reproducibility but quantifying the relevance of animal models remains a challenge. Here, we use comparative transcriptomics of blood to evaluate the systemic host response and its concordance between humans with different clinical manifestations of malaria and five commonly used mouse models. Plasmodium yoelii 17XL infection of mice most closely reproduces the profile of gene expression changes seen in the major human severe malaria syndromes, accompanied by high parasite biomass, severe anemia, hyperlactatemia, and cerebral microvascular pathology. However, there is also considerable discordance of changes in gene expression between the different host species and across all models, indicating that the relevance of biological mechanisms of interest in each model should be assessed before conducting experiments. These data will aid the selection of appropriate models for translational malaria research, and the approach is generalizable to other disease models.

4 citations

Posted ContentDOI
06 Mar 2018-bioRxiv
TL;DR: A novel method, ExTraMapper, leverages sequence conservation between exons of a pair of organisms to identify a fine-scale orthology mapping at the exon and then transcript level, and finds better transcript-level mappings than Ensembl orthology for the human proto-oncogene BRAF and its mouse ortholog.
Abstract: Access to large-scale genomics and transcriptomics data from various tissues and cell lines allowed the discovery of wide-spread alternative splicing events and alternative promoter usage in mammalians. However, evolutionary studies that aim at identifying orthology relationships mostly focus on gene-level orthology which hinders the importance of transcript-level diversity. Between human and mouse, gene-level orthology is currently present for nearly 16k protein coding genes spanning a diverse repertoire of over 200k total transcript isoforms. Here we describe a novel method, ExTraMapper, which leverages sequence conservation between exons of a pair of organisms and identifies a fine-scale orthology mapping at the exon and then transcript level. ExTraMapper identifies more than 250k exon, as well as 30k transcript mappings between human and mouse. We demonstrate that ExTraMapper identifies a larger number of exon and transcript mappings compared to previous methods. Furthermore, it identifies exon fusions, splits and losses due to splice site mutations, and finds better transcript-level mappings compared to Ensembl orthology for the human proto-oncogene BRAF and its ortholog Braf in mouse. ExTraMapper is applicable to any pair of organisms that have orthologous gene pairs and allows translation of alternative splicing events and transcript expression values from one organism to another.

4 citations


Cites background from "Large scale comparison of global ge..."

  • ...Mouse models are the bed stone of mechanistic studies of human genes and their role in different diseases due to conserved sequence, function and expression profiles of their orthologous genes (23,32-35)....

    [...]

References
Journal ArticleDOI
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830 citations
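The gene-set idea in the GSEA entry above can be made concrete with a small sketch of a running-sum enrichment score. This is an illustrative simplification rather than the released GSEA software: the function name `enrichment_score`, its arguments, and the toy inputs are assumptions, and the permutation-based significance testing that the full method relies on is left out. With `p=0` the statistic reduces to a Kolmogorov-Smirnov-style unweighted score.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene names sorted by decreasing ranking score.
    scores: ranking scores in the same order as ranked_genes.
    gene_set: set of gene names to test (hypothetical input).
    p: weight exponent applied to the scores of genes in the set.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total = sum(hit_weights)
    if total == 0:
        raise ValueError("all genes in the set have zero weight")

    running, best = 0.0, 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up at genes in the set, step down at all other genes.
        running += weight / total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running   # keep the maximum deviation from zero
    return best
```

A score near +1 means the set is concentrated at the top of the ranking, and a score near -1 means it is concentrated at the bottom.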

Journal ArticleDOI
TL;DR: There is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities, and the exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values.
Abstract: In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip® system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip® arrays, part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth’s Genetics Institute involving 95 HG-U95A human GeneChip® arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip® arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance–mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix’s (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multiarray average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.

10,711 citations


"Large scale comparison of global ge..." refers methods in this paper

  • ...The resulting 1,323 CEL files were pre-processed using Bioconductor’s RMA package [32] to create an integrated, normalized data matrix....
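The RMA pre-processing mentioned in this citation context normalizes arrays to one another before log transformation. RMA is generally described as using quantile normalization for that step, which the sketch below illustrates on a probes-by-arrays matrix. It is not Bioconductor's implementation: the function name `quantile_normalize` is hypothetical, tied values are handled naively, and background adjustment and probe-set summarization are omitted.

```python
import numpy as np

def quantile_normalize(x):
    """Give every array (column) the same empirical intensity distribution.

    x: array of shape (probes, arrays) with background-adjusted intensities.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                    # per-array rank order
    sorted_x = np.take_along_axis(x, order, axis=0)  # each column sorted ascending
    reference = sorted_x.mean(axis=1)                # mean intensity at each rank
    out = np.empty_like(x)
    # Write the reference value for rank i back to the probe holding rank i.
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out

def normalize_and_log(x):
    """Quantile-normalize, then log2-transform (intensities assumed positive)."""
    return np.log2(quantile_normalize(x))
```

After normalization every column has identical sorted values, so remaining differences between arrays reflect only the rank of each probe within its array.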

    [...]

Journal ArticleDOI
TL;DR: The authors designed custom high-density oligonucleotide arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and used them to profile a panel of 79 human and 61 mouse tissues.
Abstract: The tissue-specific pattern of mRNA expression can indicate important clues about gene function. High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale. Toward this end, we have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues. The resulting data set provides the expression patterns for thousands of predicted genes, as well as known and poorly characterized genes, from mice and humans. We have explored this data set for global trends in gene expression, evaluated commonly used lines of evidence in gene prediction methodologies, and investigated patterns indicative of chromosomal organization of transcription. We describe hundreds of regions of correlated transcription and show that some are subject to both tissue and parental allele-specific expression, suggesting a link between spatial expression and imprinting.

3,513 citations


"Large scale comparison of global ge..." refers background or result in this paper

  • ...While studies suggested that orthologous genes do not share similar expression patterns [1-5], other groups reported the opposite observations [6-9]....

    [...]

  • ...Alternatively, many other studies made use of species-specific arrays to identify coexpressed groups of orthologous genes [4-6,16,17]....

    [...]

Journal ArticleDOI
TL;DR: Trained ANN models correctly classified SRBCTs, including additional blinded samples, demonstrating the potential of these methods for tumor diagnosis and the identification of candidate targets for therapy.
Abstract: The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification. Expression of several of these genes has been reported in SRBCTs, but most have not been associated with these cancers. To test the ability of the trained ANN models to recognize SRBCTs, we analyzed additional blinded samples that were not previously used for the training procedure, and correctly classified them in all cases. This study demonstrates the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.

2,683 citations


"Large scale comparison of global ge..." refers methods in this paper

  • ...PCA has been often used to study high-dimensional data generated by genome-wide gene expression studies [22-25]....

    [...]

Book
27 Jan 2006
TL;DR: In this article, the authors present a detailed case study of R algorithms with publicly available data, and a major section of the book is devoted to fully worked case studies, with a companion website where readers can reproduce every number, figure and table on their own computers.
Abstract: Full four-color book. Some of the editors created the Bioconductor project and Robert Gentleman is one of the two originators of R. All methods are illustrated with publicly available data, and a major section of the book is devoted to fully worked case studies. Code underlying all of the computations that are shown is made available on a companion website, and readers can reproduce every number, figure, and table on their own computers.

2,625 citations
