Journal Article

Large scale comparison of global gene expression patterns in human and mouse

23 Dec 2010-Genome Biology (BioMed Central)-Vol. 11, Iss: 12, pp 1-11
TL;DR: The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse.
Abstract: It is widely accepted that orthologous genes between species are conserved at the sequence level and perform similar functions in different organisms. However, the level of conservation of gene expression patterns of the orthologous genes in different species has been unclear. To address the issue, we compared gene expression of orthologous genes based on 2,557 human and 1,267 mouse samples with high quality gene expression data, selected from experiments stored in the public microarray repository ArrayExpress. In a principal component analysis (PCA) of combined data from human and mouse samples merged on orthologous probesets, samples largely form distinctive clusters based on their tissue sources when projected onto the top principal components. The most prominent groups are the nervous system, muscle/heart tissues, liver and cell lines. Despite the great differences in sample characteristics and experiment conditions, the overall patterns of these prominent clusters are strikingly similar for human and mouse. We further analyzed data for each tissue separately and found that the most variable genes in each tissue are highly enriched with human-mouse tissue-specific orthologs and the least variable genes in each tissue are enriched with human-mouse housekeeping orthologs. The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse. The expression of groups of orthologous genes co-varies in the two species, both for the most variable genes and the most ubiquitously expressed genes.
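The merge-then-project step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it merges on gene-level ortholog pairs rather than probesets, applies no per-species normalization, finds only the first principal component (by power iteration), and uses made-up expression values. The function names `merge_on_orthologs`, `center_columns`, `first_pc`, and `project` are hypothetical.

```python
import math

def merge_on_orthologs(human, mouse, orthologs):
    """Stack human and mouse samples into one samples x genes matrix.

    human, mouse: dict mapping gene id -> list of expression values,
                  one value per sample of that species.
    orthologs:    list of (human_gene, mouse_gene) pairs (toy mapping).
    Only pairs measured in both species are kept.
    """
    pairs = [(h, m) for h, m in orthologs if h in human and m in mouse]
    n_human = len(human[pairs[0][0]])
    n_mouse = len(mouse[pairs[0][1]])
    rows = [[human[h][i] for h, _ in pairs] for i in range(n_human)]
    rows += [[mouse[m][j] for _, m in pairs] for j in range(n_mouse)]
    return rows

def center_columns(matrix):
    """Subtract each gene's mean across all samples."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - mu for x, mu in zip(row, means)] for row in matrix]

def first_pc(centered, iterations=100):
    """Leading principal axis via power iteration on X^T X."""
    n_genes = len(centered[0])
    # Non-uniform start so it is not orthogonal to a contrast-like axis.
    v = [1.0 / (k + 1) for k in range(n_genes)]
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[k] for s, row in zip(scores, centered))
             for k in range(n_genes)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v

def project(centered, axis):
    """Score of every sample along one principal axis."""
    return [sum(x * w for x, w in zip(row, axis)) for row in centered]

# Toy log-scale expression values (invented for illustration only).
human = {"ALB": [12.0, 11.5, 3.0], "SNAP25": [2.5, 3.0, 11.0],
         "ACTB": [10.0, 10.2, 9.9]}
mouse = {"Alb": [11.8, 2.8], "Snap25": [2.9, 11.4], "Actb": [10.1, 10.0]}
orthologs = [("ALB", "Alb"), ("SNAP25", "Snap25"), ("ACTB", "Actb")]
labels = ["human liver", "human liver", "human brain",
          "mouse liver", "mouse brain"]

centered = center_columns(merge_on_orthologs(human, mouse, orthologs))
pc1 = first_pc(centered)
scores = project(centered, pc1)
for label, score in zip(labels, scores):
    print(f"{label:12s} PC1 = {score:+.2f}")
```

In this toy input the liver samples of both species fall on one side of the first component and the brain samples on the other, which is the kind of tissue-driven grouping the abstract reports at much larger scale.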


Citations
Dissertation
28 Aug 2013
TL;DR: ERVs provide a potential link between the intestinal microbiota and a range of pathologies, including cancer, and a new computational tool, REquest, was developed for use in the above studies.
Abstract: Initial sequencing of the human and mouse genomes revealed that substantial fractions were composed of retroelements (REs) and endogenous retroviruses (ERVs), the latter being relics of ancestral retroviral infection. Further study revealed ERVs constitute up to 10% of many mammalian genomes. Despite this abundance, comparatively little is known about their interactions, beneficial or detrimental, with the host. This thesis details two distinct sets of interactions with the immune system. Firstly, the presentation of ERV-derived peptides to developing lymphocytes was shown to exert a control on the immune response to infection with Friend Virus (FV). A self peptide encoded by an ERV negatively selected a significant fraction of polyclonal FV-specific CD4+ T cells and resulted in an impaired immune response. However, CD4+ T cell-mediated antiviral activity was fully preserved and repertoire analysis revealed a deletional bias according to peptide affinity, resulting in an effective enrichment of high-affinity CD4+ T cells. Thus, ERVs exerted a significant influence on the immune response, a mechanism that may partially contribute to the heterogeneity seen in human immune responses to retroviral infections. Secondly, a requirement for specific antibodies was shown in the control of ERVs. In a range of mice displaying distinct deficiencies in antibody production, products from the intestinal microbiota potentially induce ERV expression. Subsequent recombinational correction of a defective murine leukaemia virus (MLV) results in the emergence of infectious virus. In the long term, this leads to retrovirus-induced lymphomas and morbidity. ERVs, therefore, provide a potential link between the intestinal microbiota and a range of pathologies, including cancer. Finally, a new computational tool, REquest, was developed for use in the above studies. 
REquest allows the mining of retroelement (RE) and ERV expression data from the majority of commercially available human and murine microarray platforms and allows rapid hypothesis testing with publicly available data.

4 citations

Dissertation
02 Jul 2016
TL;DR: The aim of this book is to provide a chronology of key events and events in the development of EMMARM, as well as provide a discussion of key players and their roles in the process.

3 citations


Cites background from "Large scale comparison of global ge..."

  • ...Large-scale gene expression analyses are widely used in biological and medical studies (Zhong et al., 2008; Nie et al., 2010; Zheng-Bradley et al., 2010)....


Dissertation
07 Dec 2016
TL;DR: A transcriptome-wide analysis of healthy and diseased lines comparing immortalized lines with their parent primary populations in both differentiated and undifferentiated states found that immortalization has no measurable effect on the myogenic cascade or on any other cellular processes, and that it was protective against senescence.
Abstract: The aim of this project was to systematically identify new interaction partners of the dystrophin protein within differentiated human skeletal muscle cells in order to uncover new roles in which dystrophin is involved, and to better understand how the global interactome is affected by the absence of dystrophin. hTERT/cdk4 immortalized myogenic human cell lines represent an important tool for skeletal muscle research; however, disruption of the cell cycle has the potential to affect many other cellular processes to which it is also linked. A transcriptome-wide analysis of healthy and diseased lines, comparing immortalized lines with their parent primary populations in both differentiated and undifferentiated states and testing their myogenic character by comparison with non-myogenic cells, found that immortalization has no measurable effect on the myogenic cascade or on any other cellular processes, and that it was protective against senescence. In this context, the human muscle cell lines are a good in vitro model to study the dystrophin interactome. We investigated dystrophin’s interactors using the high-sensitivity proteomics ‘QUICK’ approach. We identified 18 new physical interactors of dystrophin which displayed a high proportion of vesicle transport related proteins and adhesion proteins, strengthening the link between dystrophin and these roles. The proteins determined through previously published data together with the newly identified interactors were incorporated into a web-based data exploration tool: sys-myo.rhcloud.com/dystrophin-interactome, intended to provide an easily accessible and informative view of dystrophin’s interactions in skeletal muscle.

3 citations

Dissertation
01 Jan 2019
TL;DR: A draft genome assembly of European grayling is presented and used in a comparative framework to study evolution of gene regulation following WGD and highlights cases of regulatory divergence of Ss4R duplicates, possibly related to a niche shift in early salmonid evolution.
Abstract: Whole-genome duplication (WGD) has been a major evolutionary driver of increased genomic complexity in vertebrates. One such event occurred in the salmonid family ~80 Ma (Ss4R) giving rise to a plethora of structural and regulatory duplicate-driven divergence, making salmonids an exemplary system to investigate the evolutionary consequences of WGD. Here, we present a draft genome assembly of European grayling (Thymallus thymallus) and use this in a comparative framework to study evolution of gene regulation following WGD. Among the Ss4R duplicates identified in European grayling and Atlantic salmon (Salmo salar), one-third reflect nonneutral tissue expression evolution, with strong purifying selection, maintained over ~50 Myr. Of these, the majority reflect conserved tissue regulation under strong selective constraints related to brain and neural-related functions, as well as higher-order protein–protein interactions. A small subset of the duplicates have evolved tissue regulatory expression divergence in a common ancestor, which have been subsequently conserved in both lineages, suggestive of adaptive divergence following WGD. These candidates for adaptive tissue expression divergence have elevated rates of protein coding- and promoter-sequence evolution and are enriched for immune- and lipid metabolism ontology terms. Lastly, lineage-specific duplicate divergence points toward underlying differences in adaptive pressures on expression regulation in the nonanadromous grayling versus the anadromous Atlantic salmon. Our findings enhance our understanding of the role of WGD in genome evolution and highlight cases of regulatory divergence of Ss4R duplicates, possibly related to a niche shift in early salmonid evolution.

3 citations

Posted Content
14 Aug 2018-bioRxiv
TL;DR: The tissue-based proteome maps using 34 major normal pig tissues provided valuable insights and a rich resource for enhancing studies of pig genomics and biology as well as biomedical model application to human medicine.
Abstract: A lack of the complete pig proteome has left a gap in our knowledge of the pig genome and has restricted the feasibility of using pigs as a biomedical model. We developed the tissue-based proteome maps using 34 major normal pig tissues. A total of 7,319 unknown protein isoforms were identified and systematically characterized, including 3,703 novel protein isoforms, 669 protein isoforms from 460 genes symbolized beginning with LOC, and 2,947 protein isoforms without clear NCBI annotation in the current pig reference genome. These newly identified protein isoforms were functionally annotated through profiling the pig transcriptome with high-throughput RNA sequencing (RNA-seq) of the same pig tissues, further improving the genome annotation of the corresponding protein-coding genes. Combining the well-annotated genes that have parallel expression patterns and subcellular witness, we predicted the tissue-related subcellular components and potential functions for these unknown proteins. Finally, we mined 3,656 orthologous genes for 49.95% of unknown protein isoforms across multiple species, referring to 65 KEGG pathways and 25 disease signaling pathways. These findings provided valuable insights and a rich resource for enhancing studies of pig genomics and biology as well as biomedical model application to human medicine.

3 citations


Cites background from "Large scale comparison of global ge..."

  • ...functions are more likely to exhibit similar expression patterns (Zheng-Bradley et al. 2010)....


References
Journal Article
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
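The idea of scoring a whole gene set against a ranked gene list can be sketched with a running-sum statistic. This is a simplified, unweighted (Kolmogorov-Smirnov-like) version under stated assumptions: the published GSEA method additionally weights hits by their correlation with the phenotype and assesses significance by permutation, neither of which is reproduced here. The function name `enrichment_score` and the gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up by 1/n_hits at each gene in
    the set and down by 1/n_misses otherwise; return the running-sum
    value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some but not all ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranking: g1 is most up-regulated, g8 most down-regulated.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})
es_bottom = enrichment_score(ranked, {"g6", "g7", "g8"})
print(f"set at top of ranking:    ES = {es_top:+.2f}")
print(f"set at bottom of ranking: ES = {es_bottom:+.2f}")
```

A set concentrated at the top of the ranking scores near +1 and one concentrated at the bottom scores near -1; a set scattered evenly through the list stays close to zero.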

34,830 citations

Journal Article
TL;DR: There is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities, and the exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values.
Abstract: In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip® system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip® arrays; part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth’s Genetics Institute involving 95 HG-U95A human GeneChip® arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip® arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance–mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix’s (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities.
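The "normalized, log-transformed, robustly averaged" part of the summary measure can be sketched as follows. This is an illustration under stated assumptions, not the Bioconductor implementation: background adjustment is omitted, quantile normalization is assumed as the normalization step, Tukey's median polish is assumed as the robust fit of probe and array effects, and the intensities are invented. The function names are hypothetical.

```python
import math
from statistics import median

def quantile_normalize(arrays):
    """Force every array onto the same empirical distribution.

    arrays: list of equal-length lists of PM intensities, one per array.
    Ties are broken by probe position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays)
                 for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

def median_polish(table, sweeps=10):
    """Fit table[p][a] ~ overall + probe[p] + array[a] with medians.

    table: rows are probes of one probeset, columns are arrays.
    Returns one expression estimate per array (overall + array effect).
    """
    resid = [row[:] for row in table]
    n_p, n_a = len(resid), len(resid[0])
    probe_eff, array_eff, overall = [0.0] * n_p, [0.0] * n_a, 0.0
    for _ in range(sweeps):
        for p in range(n_p):                      # sweep out probe medians
            m = median(resid[p])
            probe_eff[p] += m
            resid[p] = [x - m for x in resid[p]]
        m = median(array_eff)
        overall += m
        array_eff = [e - m for e in array_eff]
        for a in range(n_a):                      # sweep out array medians
            m = median(resid[p][a] for p in range(n_p))
            array_eff[a] += m
            for p in range(n_p):
                resid[p][a] -= m
        m = median(probe_eff)
        overall += m
        probe_eff = [e - m for e in probe_eff]
    return [overall + e for e in array_eff]

# Toy PM intensities: 3 arrays x 4 probes of a single probeset.
raw = [[120.0, 340.0, 90.0, 800.0],
       [150.0, 400.0, 110.0, 950.0],
       [60.0, 170.0, 45.0, 400.0]]
normalized = quantile_normalize(raw)
log_pm = [[math.log2(x) for x in a] for a in normalized]
probes_by_arrays = [list(col) for col in zip(*log_pm)]
expression = median_polish(probes_by_arrays)
print([round(e, 3) for e in expression])
```

Because the toy arrays differ only in overall brightness, normalization makes them identical and the three expression estimates coincide; on real chips the normalization runs over all probes on the array, so genuine per-gene differences survive.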

10,711 citations


"Large scale comparison of global ge..." refers methods in this paper

  • ...The resulting 1,323 CEL files were pre-processed using Bioconductor’s RMA package [32] to create an integrated, normalized data matrix....


Journal Article
TL;DR: In this paper, high-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale, and the authors have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues.
Abstract: The tissue-specific pattern of mRNA expression can indicate important clues about gene function. High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale. Toward this end, we have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues. The resulting data set provides the expression patterns for thousands of predicted genes, as well as known and poorly characterized genes, from mice and humans. We have explored this data set for global trends in gene expression, evaluated commonly used lines of evidence in gene prediction methodologies, and investigated patterns indicative of chromosomal organization of transcription. We describe hundreds of regions of correlated transcription and show that some are subject to both tissue and parental allele-specific expression, suggesting a link between spatial expression and imprinting.

3,513 citations


"Large scale comparison of global ge..." refers background or result in this paper

  • ...While studies suggested that orthologous genes do not share similar expression patterns [1-5], other groups reported the opposite observations [6-9]....


  • ...Alternatively, many other studies made use of species-specific arrays to identify coexpressed groups of orthologous genes [4-6,16,17]....


Journal Article
TL;DR: The ability of the trained ANN models to recognize SRBCTs is demonstrated, and the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy are demonstrated.
Abstract: The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification. Expression of several of these genes has been reported in SRBCTs, but most have not been associated with these cancers. To test the ability of the trained ANN models to recognize SRBCTs, we analyzed additional blinded samples that were not previously used for the training procedure, and correctly classified them in all cases. This study demonstrates the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.

2,683 citations


"Large scale comparison of global ge..." refers methods in this paper

  • ...PCA has been often used to study high-dimensional data generated by genome-wide gene expression studies [22-25]....


Book
27 Jan 2006
TL;DR: In this article, the authors present a detailed case study of R algorithms with publicly available data, and a major section of the book is devoted to fully worked case studies, with a companion website where readers can reproduce every number, figure and table on their own computers.
Abstract: Full four-color book. Some of the editors created the Bioconductor project and Robert Gentleman is one of the two originators of R. All methods are illustrated with publicly available data, and a major section of the book is devoted to fully worked case studies. Code underlying all of the computations that are shown is made available on a companion website, and readers can reproduce every number, figure, and table on their own computers.

2,625 citations
