Journal Article

Large scale comparison of global gene expression patterns in human and mouse

23 Dec 2010-Genome Biology (BioMed Central)-Vol. 11, Iss: 12, pp 1-11
TL;DR: The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse.
Abstract: It is widely accepted that orthologous genes between species are conserved at the sequence level and perform similar functions in different organisms. However, the level of conservation of gene expression patterns of the orthologous genes in different species has been unclear. To address the issue, we compared gene expression of orthologous genes based on 2,557 human and 1,267 mouse samples with high quality gene expression data, selected from experiments stored in the public microarray repository ArrayExpress. In a principal component analysis (PCA) of combined data from human and mouse samples merged on orthologous probesets, samples largely form distinctive clusters based on their tissue sources when projected onto the top principal components. The most prominent groups are the nervous system, muscle/heart tissues, liver and cell lines. Despite the great differences in sample characteristics and experiment conditions, the overall patterns of these prominent clusters are strikingly similar for human and mouse. We further analyzed data for each tissue separately and found that the most variable genes in each tissue are highly enriched with human-mouse tissue-specific orthologs and the least variable genes in each tissue are enriched with human-mouse housekeeping orthologs. The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse. The expression of groups of orthologous genes co-varies in the two species, both for the most variable genes and the most ubiquitously expressed genes.
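The merged-matrix PCA described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the sample names and expression values are invented, centering each orthologous gene within its own species is an assumption made here to remove platform offsets, and power iteration stands in for a full PCA routine.

```python
import math
import random

# Hypothetical toy data: rows are samples, columns are three orthologous
# gene pairs (for example a neuronal, a liver, and a housekeeping gene).
human = {
    "h_brain_1": [9.0, 2.0, 5.0],
    "h_brain_2": [8.5, 2.2, 5.1],
    "h_liver_1": [1.0, 9.0, 5.0],
    "h_liver_2": [1.2, 8.7, 4.9],
}
mouse = {
    "m_brain_1": [10.1, 3.0, 6.2],
    "m_brain_2": [9.8, 3.1, 6.0],
    "m_liver_1": [2.1, 10.2, 6.1],
    "m_liver_2": [2.0, 9.9, 6.0],
}


def center_columns(rows):
    """Subtract the per-gene (column) mean from every sample (row)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc(rows, iters=200, seed=0):
    """Leading principal axis of centered row-wise samples (power iteration)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.random() + 0.1 for _ in range(dim)]
    for _ in range(iters):
        # w = X^T (X v), which is proportional to the covariance matrix times v.
        proj = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Assumption: center within each species, then stack samples on shared columns.
names = list(human) + list(mouse)
merged = center_columns(list(human.values())) + center_columns(list(mouse.values()))
axis = first_pc(merged)
scores = {
    name: sum(a * b for a, b in zip(row, axis))
    for name, row in zip(names, merged)
}
for name, score in scores.items():
    print(f"{name}: PC1 = {score:+.2f}")
```

In this toy example the brain samples of both species fall on one side of the first component and the liver samples on the other; the sign of the axis itself is arbitrary.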

Citations
Journal Article
TL;DR: Regulatory interactions between transcription factors and potential targets, predicted using a conditional regression algorithm, overlap with regulatory interactions inferred from transcriptional changes during immunocyte differentiation, and it is speculated that this “conservation of variation” reflects a differential constraint on intraspecies variation in expression levels of different genes.
Abstract: To determine the breadth and underpinning of changes in immunocyte gene expression due to genetic variation in mice, we performed, as part of the Immunological Genome Project, gene expression profiling for CD4(+) T cells and neutrophils purified from 39 inbred strains of the Mouse Phenome Database. Considering both cell types, a large number of transcripts showed significant variation across the inbred strains, with 22% of the transcriptome varying by 2-fold or more. These included 119 loci with apparent complete loss of function, where the corresponding transcript was not expressed in some of the strains, representing a useful resource of "natural knockouts." We identified 1222 cis-expression quantitative trait loci (cis-eQTL) that control some of this variation. Most (60%) cis-eQTLs were shared between T cells and neutrophils, but a significant portion uniquely impacted one of the cell types, suggesting cell type-specific regulatory mechanisms. Using a conditional regression algorithm, we predicted regulatory interactions between transcription factors and potential targets, and we demonstrated that these predictions overlap with regulatory interactions inferred from transcriptional changes during immunocyte differentiation. Finally, comparison of these and parallel data from CD4(+) T cells of healthy humans demonstrated intriguing similarities in variability of a gene's expression: the most variable genes tended to be the same in both species, and there was an overlap in genes subject to strong cis-acting genetic variants. We speculate that this "conservation of variation" reflects a differential constraint on intraspecies variation in expression levels of different genes, either through lower pressure for some genes, or by favoring variability for others.

36 citations


Cites background or methods from "Large scale comparison of global ge..."

  • ...coefficients) for the top n = [10, 20, 30, 50] targets of each TF in the second species (see Materials and Methods)....

    [...]

  • ...The top 10 targets for each TF in humans (mice) were first identified, and then the overlap with the top n = [10, 20, 30, 40, 50] mouse (human) targets for the same TF was computed....

    [...]

  • ...Second, the evidence for conservation of the top n = [10, 20, 30, 50] targets of each TF in mice (humans) was assessed in humans by using the Wilcoxon rank sum test to compare the distribution of the coexpression values for the top n targets compared with the distribution of coexpression values between that TF and all genes....

    [...]

  • ...First, for each TF in mice (humans), the top 10 targets were defined based on coexpression values, and the overlap of these targets was assessed in the top n = [10, 20, 30, 50] targets for the same TF in human (mouse)....

    [...]
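The two conservation checks quoted in the excerpts above (overlap of top-ranked targets, and a Wilcoxon rank sum comparison of coexpression values) can be sketched as follows. This is an illustration under stated assumptions, not the cited study's code: the gene names and coexpression values are invented, the function names are hypothetical, and the rank sum statistic is turned into a z-score with the plain normal approximation (average ranks for ties, no tie correction of the variance).

```python
import math


def top_targets(coexpr, n):
    """Names of the n genes with the highest coexpression value for one TF."""
    ranked = sorted(coexpr.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


def overlap(coexpr_a, coexpr_b, n_a=10, n_b=10):
    """Number of top-n_a targets in species A found among the top n_b in species B."""
    return len(set(top_targets(coexpr_a, n_a)) & set(top_targets(coexpr_b, n_b)))


def rank_sum_z(sample, background):
    """z-score of the Wilcoxon rank sum of `sample` against `background`."""
    pooled = [(x, True) for x in sample] + [(x, False) for x in background]
    pooled.sort(key=lambda p: p[0])
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the tied ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pooled[k][1])
        i = j
    n1, n2 = len(sample), len(background)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (rank_sum - mean) / sd


# Hypothetical coexpression values between one TF and six genes per species.
human = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.0, "g5": -0.2, "g6": 0.3}
mouse = {"g1": 0.7, "g2": 0.85, "g3": 0.2, "g4": -0.1, "g5": 0.05, "g6": 0.1}

n = 2
shared = overlap(human, mouse, n_a=n, n_b=3)
# Mouse coexpression values of the human top-n targets versus all mouse genes.
z = rank_sum_z([mouse[g] for g in top_targets(human, n)], list(mouse.values()))
print(f"overlap = {shared}, rank sum z = {z:.3f}")
```

A positive z-score here means that the targets ranked highest in one species also tend to have high coexpression values in the other; a real analysis would use an exact or tie-corrected test from a statistics library.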

Journal Article
Yunpeng Cao1, Yahui Han1, Dahui Li1, Yi Lin1, Yongping Cai1 
19 Oct 2016-Genes
TL;DR: This study identified 4CL-related genes in the apple, peach, yangmei, and pear genomes using DNATOOLS software and inferred their evolutionary relationships using phylogenetic analysis, collinearity analysis, conserved motif analysis, and structure analysis.
Abstract: In plants, 4-coumarate:coenzyme A ligases (4CLs), comprising some of the adenylate-forming enzymes, are key enzymes involved in regulating lignin metabolism and the biosynthesis of flavonoids and other secondary metabolites. Although several 4CL-related proteins were shown to play roles in secondary metabolism, no comprehensive study on 4CL-related genes in the pear and other Rosaceae species has been reported. In this study, we identified 4CL-related genes in the apple, peach, yangmei, and pear genomes using DNATOOLS software and inferred their evolutionary relationships using phylogenetic analysis, collinearity analysis, conserved motif analysis, and structure analysis. A total of 149 4CL-related genes in four Rosaceous species (pear, apple, peach, and yangmei) were identified, with 30 members in the pear. We explored the functions of several 4CL and acyl-coenzyme A synthetase (ACS) genes during the development of pear fruit by quantitative real-time PCR (qRT-PCR). We found that duplication events had occurred in the 30 4CL-related genes in the pear. These duplicated 4CL-related genes are distributed unevenly across all pear chromosomes except chromosomes 4, 8, 11, and 12. The results of this study provide a basis for further investigation of both the functions and evolutionary history of 4CL-related genes.

36 citations


Cites background from "Large scale comparison of global ge..."

  • ...In addition, previous studies showed that orthologous genes were more likely to share correlated expression patterns compared with non-orthologous genes [66,67]....

    [...]

Journal Article
TL;DR: Results show that mitochondria are among the first responders to environmental and nutritional stress stimuli in gilthead sea bream, and functional phenotyping of this cellular organelle is highly promising to obtain reliable markers of growth performance and well-being in this fish species.
Abstract: The effects of nutrient availability on the transcriptome of cardiac and skeletal muscle tissues were assessed in juvenile gilthead sea bream fed with a standard diet at two feeding levels: (1) full ration size and (2) 70% satiation followed by a finishing phase at the maintenance ration. Microarray analysis evidenced a characteristic transcriptomic profile for each muscle tissue following changes in oxidative capacity (heart > red skeletal muscle > white skeletal muscle). The transcriptome of heart and secondly that of red skeletal muscle were highly responsive to nutritional changes, whereas that of glycolytic white skeletal muscle showed less ability to respond. The highly expressed and nutritionally regulated genes of heart were mainly related to signal transduction and transcriptional regulation. In contrast, those of white muscle were enriched in gene ontology (GO) terms related to proteolysis and protein ubiquitination. Microarray meta-analysis using the bioinformatic tool Fish and Chips (http://fishandchips.genouest.org/index.php) showed the close association of a representative cluster of white skeletal muscle with some of cardiac and red skeletal muscle, and many GO terms related to mitochondrial function appeared to be common links between them. A second round of cluster comparisons revealed that mitochondria-related GOs also linked differentially expressed genes of heart with those of liver from cortisol-treated gilthead sea bream. These results show that mitochondria are among the first responders to environmental and nutritional stress stimuli in gilthead sea bream, and functional phenotyping of this cellular organelle is highly promising to obtain reliable markers of growth performance and well-being in this fish species.

35 citations


Cites background from "Large scale comparison of global ge..."

  • ...Likewise, the tissue-specific transcriptome of human and rodents is usually clustered based on tissue function and developmental origin (Son et al., 2005; Zheng-Bradley et al., 2010)....

    [...]

Journal Article
TL;DR: The existing EuroPhenome and WTSI phenotype informatics systems and the IKMC portal are reviewed, and plans are presented for extending these systems and applying the lessons learned to the development of a robust IMPC informatics infrastructure.
Abstract: The International Mouse Phenotyping Consortium (IMPC) (http://www.mousephenotype.org) will reveal the pleiotropic functions of every gene in the mouse genome and uncover the wider role of genetic loci within diverse biological systems. Comprehensive informatics solutions are vital to ensuring that this vast array of data is captured in a standardised manner and made accessible to the scientific community for interrogation and analysis. Here we review the existing EuroPhenome and WTSI phenotype informatics systems and the IKMC portal, and present plans for extending these systems and lessons learned to the development of a robust IMPC informatics infrastructure.

35 citations


Cites background or methods from "Large scale comparison of global ge..."

  • ...…data from array-based and sequencing technologies are stored in the European Bioinformatics Institute’s (EBI) gene expression atlas (Kapushesky et al. 2012) and are available as a series of meta-analysed experiments coanalysed with human orthologues of mouse genes (Zheng-Bradley et al. 2010)....

    [...]

Journal Article
TL;DR: This report provides the first global genomic evidence that CNS pathways affected by toluene are strongly associated with neurological processes participating in synaptic transmission and plasticity.

34 citations

References
Journal Article
TL;DR: The Gene Set Enrichment Analysis (GSEA) method focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
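The gene-set idea summarized above can be made concrete with a simplified running-sum enrichment score. This sketch is an unweighted, Kolmogorov-Smirnov-style variant written for illustration only: the published method additionally weights hits by each gene's correlation with the phenotype and assesses significance by permutation, neither of which is reproduced here, and the gene names are invented.

```python
def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of an unweighted running sum.

    Walk down the ranked list, stepping up for genes in the set and down
    for genes outside it; steps are scaled so the walk ends at zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most up-regulated gene first.
ranked = ["a", "b", "c", "d", "e", "f"]
print(enrichment_score(ranked, {"a", "b"}))  # set concentrated at the top
print(enrichment_score(ranked, {"e", "f"}))  # set concentrated at the bottom
```

A score near +1 indicates that the set's members cluster at the top of the ranking and a score near -1 that they cluster at the bottom, which is how a set-level signal can emerge even when no single gene stands out.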

34,830 citations

Journal Article
TL;DR: There is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities, and the exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values.
Abstract: In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip® system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip® arrays; part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth’s Genetics Institute involving 95 HG-U95A human GeneChip® arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip® arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance–mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix’s (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multiarray average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
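The phrase "normalized, and log-transformed PM values" combined into a robust average can be illustrated with a rough sketch. It assumes the steps usually associated with RMA, namely quantile normalization across arrays followed by a median-polish fit of log2 intensities; the background adjustment is omitted, the function names and numbers are hypothetical, and this is not the Bioconductor implementation.

```python
from statistics import median


def quantile_normalize(arrays):
    """Give every array (list of intensities) the same empirical distribution."""
    rank_means = [sum(v) / len(v) for v in zip(*(sorted(a) for a in arrays))]
    out = []
    for a in arrays:
        order = sorted(range(len(a)), key=lambda i: a[i])
        normed = [0.0] * len(a)
        for rank, i in enumerate(order):
            normed[i] = rank_means[rank]
        out.append(normed)
    return out


def median_polish_summary(log_pm, iters=10):
    """log_pm[probe][array] of log2 values -> one expression value per array.

    Fits log_pm = overall + probe effect + array effect + residual using
    medians, so that a few outlying probes have little influence.
    """
    resid = [row[:] for row in log_pm]
    n_probes, n_arrays = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_probes
    col_eff = [0.0] * n_arrays
    for _ in range(iters):
        for i in range(n_probes):  # sweep probe (row) medians
            m = median(resid[i])
            resid[i] = [x - m for x in resid[i]]
            row_eff[i] += m
        d = median(col_eff)
        col_eff = [c - d for c in col_eff]
        overall += d
        for j in range(n_arrays):  # sweep array (column) medians
            m = median(resid[i][j] for i in range(n_probes))
            for i in range(n_probes):
                resid[i][j] -= m
            col_eff[j] += m
        d = median(row_eff)
        row_eff = [r - d for r in row_eff]
        overall += d
    return [overall + c for c in col_eff]


# Hypothetical log2 PM values: three probes of one probeset on two arrays.
log_pm = [[5.0, 6.0], [6.0, 7.0], [7.0, 8.0]]
print(median_polish_summary(log_pm))
```

In this toy matrix the probe affinities differ by one log2 unit each, yet the summary recovers the one-unit difference between the two arrays, which is the sense in which the linear model "removes probe-specific affinities". In practice quantile normalization would be applied over all probes on each array before this per-probeset fit.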

10,711 citations


"Large scale comparison of global ge..." refers methods in this paper

  • ...The resulting 1,323 CEL files were pre-processed using Bioconductor’s RMA package [32] to create an integrated, normalized data matrix....

    [...]

Journal Article
TL;DR: High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale; the authors designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and used them to profile a panel of 79 human and 61 mouse tissues.
Abstract: The tissue-specific pattern of mRNA expression can indicate important clues about gene function. High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale. Toward this end, we have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues. The resulting data set provides the expression patterns for thousands of predicted genes, as well as known and poorly characterized genes, from mice and humans. We have explored this data set for global trends in gene expression, evaluated commonly used lines of evidence in gene prediction methodologies, and investigated patterns indicative of chromosomal organization of transcription. We describe hundreds of regions of correlated transcription and show that some are subject to both tissue and parental allele-specific expression, suggesting a link between spatial expression and imprinting.

3,513 citations


"Large scale comparison of global ge..." refers background or result in this paper

  • ...While studies suggested that orthologous genes do not share similar expression patterns [1-5], other groups reported the opposite observations [6-9]....

    [...]

  • ...Alternatively, many other studies made use of species-specific arrays to identify coexpressed groups of orthologous genes [4-6,16,17]....

    [...]

Journal Article
TL;DR: Trained ANN models correctly classified SRBCTs, including blinded samples not used for training, demonstrating the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.
Abstract: The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification. Expression of several of these genes has been reported in SRBCTs, but most have not been associated with these cancers. To test the ability of the trained ANN models to recognize SRBCTs, we analyzed additional blinded samples that were not previously used for the training procedure, and correctly classified them in all cases. This study demonstrates the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.

2,683 citations


"Large scale comparison of global ge..." refers methods in this paper

  • ...PCA has been often used to study high-dimensional data generated by genome-wide gene expression studies [22-25]....

    [...]

Book
27 Jan 2006
TL;DR: All methods are illustrated with publicly available data, a major section of the book is devoted to fully worked case studies, and code on a companion website lets readers reproduce every number, figure, and table on their own computers.
Abstract: Full four-color book. Some of the editors created the Bioconductor project and Robert Gentleman is one of the two originators of R. All methods are illustrated with publicly available data, and a major section of the book is devoted to fully worked case studies. Code underlying all of the computations that are shown is made available on a companion website, and readers can reproduce every number, figure, and table on their own computers.

2,625 citations
