Large scale comparison of global gene expression patterns in human and mouse

doi:10.1186/GB-2010-11-12-R124

Citations

Journal Article

Integrated view and comparative analysis of baseline protein expression in mouse and rat tissues

[...]

Ananth Prakash¹, Éder Guedes Freitas²

Open Targets¹, European Bioinformatics Institute²

17 Jun 2022-PLOS Computational Biology

TL;DR: In this paper, a comparative analysis of protein expression between mouse, rat and human tissues was carried out, showing a high level of correlation among orthologs between all three species in brain, kidney, heart and liver samples.

Abstract: The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analysis of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively) to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, comprising 9 mouse and 3 rat strains, respectively. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) to 9,715 (liver) in mouse, and from 101 (tendon) to 6,130 (kidney) in rat. Then, we studied how protein abundances compared across different datasets and organs for both species. As a key point, we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high level of correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression was generally slightly lower between organs within the same species. Protein expression results have been integrated into the resource Expression Atlas for widespread dissemination.
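
The cross-species comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure or preprocessing was used, so the Pearson coefficient, the function names, and the toy identifiers below are all assumptions made for illustration, not the authors' pipeline.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ortholog_correlation(abundance_a, abundance_b, ortholog_pairs):
    """Correlate abundances of orthologous proteins in one organ.

    abundance_a, abundance_b: dicts mapping protein ID -> abundance
    ortholog_pairs: iterable of (id_in_a, id_in_b) tuples (hypothetical mapping)
    Pairs missing from either dict are skipped.
    """
    matched = [
        (abundance_a[a], abundance_b[b])
        for a, b in ortholog_pairs
        if a in abundance_a and b in abundance_b
    ]
    if len(matched) < 2:
        raise ValueError("fewer than two ortholog pairs with data in both species")
    xs, ys = zip(*matched)
    return pearson(xs, ys)


# Toy, made-up values purely to show the call shape.
mouse_liver = {"m1": 1.0, "m2": 2.0, "m3": 3.0}
human_liver = {"h1": 2.0, "h2": 4.0, "h3": 6.0}
pairs = [("m1", "h1"), ("m2", "h2"), ("m3", "h3"), ("m4", "h4")]
r = ortholog_correlation(mouse_liver, human_liver, pairs)
```

Running the same function once per organ and species pair would give the kind of between-species versus within-species comparison the abstract summarises.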

5 citations

Journal Article

Comparative transcriptome in large-scale human and cattle populations

[...]

22 Aug 2022-Genome Biology

TL;DR: In this paper, a cross-species comparison of transcriptomes between humans and cattle was conducted to elucidate the evolutionary molecular mechanisms underpinning phenotypic variation between and within species, which may help decipher the genetic and evolutionary basis of complex traits in both species.

Abstract: Background: Cross-species comparison of transcriptomes is important for elucidating evolutionary molecular mechanisms underpinning phenotypic variation between and within species, yet to date it has been essentially limited to model organisms with relatively small sample sizes. Results: Here, we systematically analyze and compare 10,830 and 4866 publicly available RNA-seq samples in humans and cattle, respectively, representing 20 common tissues. Focusing on 17,315 orthologous genes, we demonstrate that mean/median gene expression, inter-individual variation of expression, expression quantitative trait loci, and gene co-expression networks are generally conserved between humans and cattle. By examining large-scale genome-wide association studies for 46 human traits (average n = 327,973) and 45 cattle traits (average n = 24,635), we reveal that the heritability of complex traits in both species is significantly more enriched in transcriptionally conserved than diverged genes across tissues. Conclusions: In summary, our study provides a comprehensive comparison of transcriptomes between humans and cattle, which might help decipher the genetic and evolutionary basis of complex traits in both species.

5 citations

Dissertation

Conservation and synteny of long non-coding RNAs in vertebrate genomes and their identification in novel transcriptomes

[...]

Swaraj Basu

01 Jan 2013

TL;DR: An annotation pipeline is developed that can effectively identify lncRNAs in entire transcriptomes, and it is demonstrated that positional conservation of lncRNAs with a flanking coding gene is generally independent of the conservation of lncRNA expression with respect to the coding gene.

Abstract: Long non-coding RNAs (lncRNAs) are a biological entity defined by what they are not, rather than by what they are. This indicates that our knowledge about them is sensibly limited. The aim of my PhD is to gain insights into the evolution and the functions of lncRNAs through computational approaches and the use of large-scale functional genomics datasets. I developed an annotation pipeline, which can effectively identify lncRNAs in entire transcriptomes. The pipeline is able to accurately annotate the coding genes while predicting a conservative estimate of the lncRNA population. It allowed me to show, for the first time, the presence of lncRNA transcription in a diverse range of organisms. Further, I analysed sequence and positional conservation of lncRNAs, demonstrating the presence of short segments of conserved sequence in lncRNAs and the existence of several syntenically conserved non-coding transcripts over large evolutionary distances. However, I also demonstrate that positional conservation of lncRNAs with a flanking coding gene is generally independent of the conservation of lncRNA expression with respect to the coding gene. Finally, I have characterised the diversity of lncRNA transcription in specific cells and developmental stages of two teleost fishes. In summary, the work presented in the thesis provides novel findings and contributions in the field of lncRNAomics.

5 citations

Posted Content

SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories

[...]

Fritz Lekschas¹, Nils Gehlenborg¹

Harvard University¹

01 Jun 2017-bioRxiv

TL;DR: SATORI enables researchers to search, browse, and semantically query data repositories via two visualizations that are tightly linked to a powerful search interface; its requirements were derived from a series of semi-structured interviews.

Abstract: The number of data sets in biomedical repositories has grown rapidly over the past decade, providing scientists in fields like genomics and other areas of high-throughput biology with tremendous opportunities to re-use data. Scientists are able to test hypotheses computationally instead of generating their own data, to complement their own data sets with data generated by others, and to conduct meta analyses across many data sets. In order to effectively exploit existing data, it is crucial to understand the content of repositories and to discover data relevant to a question of interest. These are challenging tasks, as most repositories currently only support finding data sets through text-based search of metadata and in some cases also through metadata-based browsing. In order to address these challenges, we have developed SATORI - an ontology-guided visual exploration system - that combines a powerful metadata search with a tree map and a node-link diagram that visualize the repository structure, provide context to retrieved data sets, and serve as an interface to drive semantic querying and browsing of the repository. The requirements for SATORI were derived in semi-structured interviews with biomedical data scientists. We demonstrate its utility by describing several usage scenarios using a stem cell data repository, discoveries we made in the process of developing them, and an evaluation of SATORI with domain experts. We have integrated an open-source, web-based implementation of SATORI in the data repository of the Refinery Platform for biomedical data analysis and visualization (http://refinery-platform.org).

5 citations

Posted Content

XGSEA: CROSS-species Gene Set Enrichment Analysis via domain adaptation

[...]

Menglan Cai¹, Canh Hao Nguyen², Hiroshi Mamitsuka²,³, Limin Li¹

Xi'an Jiaotong University¹, Kyoto University², Aalto University³

21 Jul 2020-bioRxiv

TL;DR: XGSEA (Cross-species Gene Set Enrichment Analysis) is proposed, with three steps: 1) GSEA; 2) domain adaptation; and 3) regression. Results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and support its reliability.

Abstract: Gene set enrichment analysis (GSEA) has been widely used to identify gene sets with statistically significant differences between cases and controls against a large gene set. GSEA needs both phenotype labels and gene expression. However, gene expression is assessed more often for model organisms than for minor species. More importantly, gene expression cannot be measured under specific conditions in humans, owing to the high health risk of direct experiments such as non-approved treatment or gene knockout, and is then often substituted by mouse data. Thus, predicting the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) by using gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), with three steps: 1) running GSEA for a source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets, based on the representation in 2). We extensively validated XGSEA by using four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study of identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available through https://github.com/LiminLi-xjtu/XGSEA

Author summary: Gene set enrichment analysis (GSEA) is a powerful tool for differential analysis of gene sets given a ranked gene list. GSEA requires complete data: gene expression with phenotype labels. However, gene expression cannot be measured under specific conditions in humans, owing to the high risk of direct experiments such as non-approved treatment or gene knockout, and is then often substituted by mouse data. The unavailability of gene expression thus leads to a more challenging problem, the CROSS-species Gene Set Enrichment Problem (XGSEP), in which the enrichment significance (on a phenotype) of a given gene set of one species (the target, say human) is predicted by using gene expression measured under the same phenotype in another species (the source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, with three steps: 1) GSEA; 2) domain adaptation; and 3) regression. The results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirmed the reliability of XGSEA.
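
To make step 1 of the pipeline above concrete, here is a minimal sketch of an enrichment score computed from a ranked gene list. It uses the unweighted, Kolmogorov-Smirnov-style running sum as a simplifying assumption; it is not taken from the XGSEA source code, and the function name and toy gene identifiers are hypothetical. The domain adaptation and regression steps are not shown.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: list of gene IDs ordered from most to least associated
                  with the phenotype
    gene_set:     set of gene IDs whose enrichment is being tested
    Returns the running-sum value with the largest absolute deviation
    from zero (positive: enriched near the top of the ranking).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    step_up = 1.0 / n_hits     # increment when the gene is in the set
    step_down = 1.0 / n_miss   # decrement when it is not

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += step_up if is_hit else -step_down
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking with made-up gene IDs.
ranking = ["g1", "g2", "g3", "g4"]
top_heavy = enrichment_score(ranking, {"g1", "g2"})
bottom_heavy = enrichment_score(ranking, {"g3", "g4"})
```

A p-value for such a score is usually obtained by permutation; that part is omitted here to keep the sketch short.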

5 citations