scispace - formally typeset

Showing papers by "Roderic Guigó" published in 2017


Journal Article
TL;DR: Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level.
Abstract: Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level. Insights have been gained into the degree of conservation between human and mouse at the level of not only gene expression but also epigenetics and inter-individual variation. However, a number of limitations exist, including incomplete transcriptome characterization and difficulties in identifying orthologous phenotypes and cell types, which are beginning to be addressed by emerging technologies. Ultimately, these comparisons will help to identify the conditions under which the mouse is a suitable model of human physiology and disease, and optimize the use of animal models.

158 citations


Journal Article
Ashis Saha, Yungil Kim, Ariel D. H. Gewirtz, Brian Jo and 256 more authors (49 institutions)
TL;DR: These networks, which additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues, provide an improved understanding of the complex relationships of the human transcriptome across tissues.
Abstract: Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.

146 citations


Journal Article
TL;DR: An unpredicted speciation event in the tropical Andes that gave rise to a sibling species, formerly considered the “wild ancestor” of P. vulgaris, is uncovered and haplotypes strongly associated with genes underlying the emergence of domestication traits are identified.
Abstract: Modern civilization depends on only a few plant species for its nourishment. These crops were derived via several thousands of years of human selection that transformed wild ancestors into high-yielding domesticated descendants. Among cultivated plants, common bean (Phaseolus vulgaris L.) is the most important grain legume. Yet, our understanding of the origins and concurrent shaping of the genome of this crop plant is limited. We sequenced the genomes of 29 accessions representing 12 Phaseolus species. Single nucleotide polymorphism-based phylogenomic analyses, using both the nuclear and chloroplast genomes, allowed us to detect a speciation event, a finding further supported by metabolite profiling. In addition, we identified ~1200 protein coding genes (PCGs) and ~100 long non-coding RNAs with domestication-associated haplotypes. Finally, we describe asymmetric introgression events occurring among common bean subpopulations in Mesoamerica and across hemispheres. We uncover an unpredicted speciation event in the tropical Andes that gave rise to a sibling species, formerly considered the “wild ancestor” of P. vulgaris, which diverged before the split of the Mesoamerican and Andean P. vulgaris gene pools. Further, we identify haplotypes strongly associated with genes underlying the emergence of domestication traits. Our findings also reveal the capacity of a predominantly autogamous plant to outcross and fix loci from different populations, even from distant species, which led to the acquisition by domesticated beans of adaptive traits from wild relatives. The occurrence of such adaptive introgressions should be exploited to accelerate breeding programs in the near future.

112 citations


Journal Article
TL;DR: A first catalogue of mutated lncRNA genes driving cancer, which will grow and improve with the application of ExInAtor to future tumour genome projects, is presented.
Abstract: Long noncoding RNAs (lncRNAs) represent a vast unexplored genetic space that may hold missing drivers of tumourigenesis, but few such “driver lncRNAs” are known. Until now, they have been discovered through changes in expression, leading to problems in distinguishing between causative roles and passenger effects. Here we present a different approach for driver lncRNA discovery using mutational patterns in tumour DNA. Our pipeline, ExInAtor, identifies genes with an excess load of somatic single nucleotide variants (SNVs) across panels of tumour genomes. Heterogeneity in mutational signatures between cancer types and individuals is accounted for using a simple local trinucleotide background model, which yields high precision and low computational demands. We use ExInAtor to predict drivers from the GENCODE annotation across 1112 entire genomes from 23 cancer types. Using a stratified approach, we identify 15 high-confidence candidates: 9 novel and 6 known cancer-related genes, including MALAT1, NEAT1 and SAMMSON. Both known and novel driver lncRNAs are distinguished by elevated gene length, evolutionary conservation and expression. We have presented a first catalogue of mutated lncRNA genes driving cancer, which will grow and improve with the application of ExInAtor to future tumour genome projects.

74 citations
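
The ExInAtor abstract above describes scoring genes for an excess load of somatic SNVs against a local trinucleotide background. The Python sketch below is a minimal, hypothetical illustration of that idea and is not ExInAtor's actual implementation or statistical test. It makes three assumptions. First, mutation rates are estimated per trinucleotide context from a local background region. Second, they are applied to the exonic sites of each context to obtain an expected count. Third, the excess is scored with a one-sided Poisson tail. The function names and the Poisson test are choices made here for illustration.

```python
import math
from typing import Dict


def expected_exonic_mutations(bg_mutations: Dict[str, int],
                              bg_sites: Dict[str, int],
                              exon_sites: Dict[str, int]) -> float:
    """Hypothetical helper: expected exonic SNVs if exons mutated at the
    local background rate, matched per trinucleotide context."""
    expected = 0.0
    for context, n_exon_sites in exon_sites.items():
        n_bg_sites = bg_sites.get(context, 0)
        if n_bg_sites == 0:
            continue  # no background information for this context; skip it
        rate = bg_mutations.get(context, 0) / n_bg_sites  # per-site rate
        expected += rate * n_exon_sites
    return expected


def poisson_upper_tail(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam); assumed test, for illustration."""
    if observed <= 0:
        return 1.0
    if lam == 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= lam / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    bg_mut = {"ACG": 4, "TCA": 2}
    bg_sites = {"ACG": 2000, "TCA": 4000}
    exon_sites = {"ACG": 500, "TCA": 1000}
    lam = expected_exonic_mutations(bg_mut, bg_sites, exon_sites)
    p = poisson_upper_tail(6, lam)  # 6 observed exonic SNVs
    print(f"expected={lam:.2f} p={p:.4f}")  # prints "expected=1.50 p=0.0045"
```

Matching rates per context means a gene rich in highly mutable trinucleotides is not flagged merely for its sequence composition. A real pipeline would also need to correct for testing many genes at once.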


Journal Article
TL;DR: Secmarker greatly improves on the accuracy of previously existing methods, constituting a valuable tool to identify tRNASec genes and to efficiently determine whether a genome contains selenoproteins, and is used to analyze a large set of fully sequenced genomes.
Abstract: Selenocysteine (Sec) is known as the 21st amino acid, a cysteine analogue with selenium replacing sulphur. Sec is inserted co-translationally into a small fraction of proteins called selenoproteins. In selenoprotein genes, the Sec-specific tRNA (tRNASec) drives the recoding of highly specific UGA codons from stop signals to Sec. Although found in organisms from all three domains of life, Sec is not universal. Many species are completely devoid of selenoprotein genes and lack the ability to synthesize Sec. Since tRNASec is a key component in selenoprotein biosynthesis, its efficient identification in genomes is instrumental to characterizing the utilization of Sec across lineages. Available tRNA prediction methods fail to accurately predict tRNASec due to its unusual structural fold. Here, we present Secmarker, a method based on manually curated covariance models capturing the specific tRNASec structure in archaea, bacteria and eukaryotes. We exploited the non-universality of Sec to build a proper benchmark set for tRNASec predictions, which is not possible for the predictions of other tRNAs. We show that Secmarker greatly improves on the accuracy of previously existing methods, constituting a valuable tool to identify tRNASec genes and to efficiently determine whether a genome contains selenoproteins. We used Secmarker to analyze a large set of fully sequenced genomes, and the results revealed new insights into the biology of tRNASec, led to the discovery of a novel bacterial selenoprotein family, and shed additional light on the phylogenetic distribution of selenoprotein-containing genomes. Secmarker is freely available for download, or for online analysis through a web server at http://secmarker.crg.cat.

45 citations


Journal Article
TL;DR: ChimPipe combines discordant paired-end reads and split reads to detect any kind of chimera, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision.
Abstract: Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome, and can be explained by various biological mechanisms such as genomic rearrangement, read-through or trans-splicing, but also by technical or biological artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However, outputs of different programs on the same dataset can be widely inconsistent and tend to include many false positives. Other issues relate to simulated datasets restricted to fusion genes, real datasets with limited numbers of validated cases, result inconsistencies between simulated and real datasets, and gene-level rather than junction-level assessment. Here we present ChimPipe, a modular and easy-to-use method to reliably identify fusion genes and transcription-induced chimeras from paired-end Illumina RNA-seq data. We have also produced realistic simulated datasets for three different read lengths, and enhanced two gold-standard cancer datasets by associating exact junction points with validated gene fusions. Benchmarking ChimPipe together with four other state-of-the-art tools on these data showed ChimPipe to be the top program at identifying exact junction coordinates for both kinds of datasets, and the one showing the best trade-off between sensitivity and precision. Applied to 106 ENCODE human RNA-seq datasets, ChimPipe identified 137 high-confidence chimeras connecting the protein-coding sequences of their parent genes. In subsequent experiments, three out of four predicted chimeras, two of which are recurrently expressed in a large majority of the samples, could be validated. Cloning and sequencing of the three cases revealed several new chimeric transcript structures, three of which have the potential to encode a chimeric protein for which we hypothesized a new role.
Applying ChimPipe to human and mouse ENCODE RNA-seq data led to the identification of 131 recurrent chimeras common to both species, and therefore potentially conserved. ChimPipe combines discordant paired-end reads and split reads to detect any kind of chimera, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision. The chimeras found by ChimPipe can be validated in vitro with high accuracy.

26 citations


Journal Article
TL;DR: The Octodon degus is a model that naturally integrates multiple AD pathological hallmarks, such as tau fibrillary tangles and β-amyloid deposits, and shows a correlation in expression with human AD-related genes, making this model a powerful tool to characterize the effects of novel treatments for AD and identify new therapeutic targets.
Abstract: Alzheimer's disease (AD) is a slowly progressive disease characterized by impairment of memory and eventually by disturbances in reasoning, planning, language, and perception. Ageing is the greatest risk factor for its development, but mutations in amyloid precursor protein (APP), apolipoprotein E (APOE) and microtubule-associated protein tau (MAPT), among others, are also a major factor (Blasko et al., 2004). The symptoms of AD result from neurofibrillary tangles composed of aggregates of hyperphosphorylated tau protein and from an increase in the production of amyloid-beta (Aβ) protein in the brain that leads to deposits of senile plaques. As such, there is a worldwide effort to find an effective disease-modifying treatment that can reverse symptoms and/or delay onset of the disease. Transgenic mouse models exist that mimic a range of AD-related pathologies, although none of the models fully replicates all pathological features of the human disease (Birch et al., 2014). Drugs developed using these mouse models have failed in phase III clinical trials (Mangialasche et al., 2010; Braidy et al., 2012; Saraceno et al., 2013). These failures call into question not only our understanding of the disease (Castellani and Perry, 2012) but also the validity of the animal models in which drug discovery efforts are rooted (Windisch, 2014; Nazem et al., 2015). Animal models have contributed significantly to our understanding of the underlying mechanisms of AD. To date, however, these findings have not resulted in target validation in humans or in successful translation to disease-modifying therapies. The Octodon degus (O. degus) is a model that naturally integrates multiple AD pathological hallmarks, such as tau fibrillary tangles and β-amyloid deposits (Inestrosa et al., 2005, 2015; Deacon et al., 2015). The Aβ peptide sequence in O. degus is 97.5% homologous to the human Aβ peptide sequence (Inestrosa et al., 2005).
The species presents acetylcholinesterase (AChE)-rich pyramidal neurons in its forebrain, which decline in number during the progression to an AD-like behavioral state, similar to what is seen in AD patients (Ardiles et al., 2012). Affected O. degus also present characteristic signs and symptoms associated with AD, such as macular degeneration, diabetes and circadian rhythm dysfunction (Laurijssens et al., 2013). Behavioral experiments have shown that O. degus can also present behavioral deficits, aggression and neural alterations in the frontal cortex similar to those seen in patients with AD (Tarragon et al., 2013). Most importantly, O. degus shows a correlation in expression with human AD-related genes, making this model a powerful tool to characterize the effects of novel treatments for AD and identify new therapeutic targets. Our findings advance the use of O. degus as an effective tool for AD research.

12 citations


Journal Article
TL;DR: These resources are summarized; the data types they offer and the insights they provide into human functional variation are reviewed; and the challenges and developments needed to integrate both existing and new resources into a detailed map of how genetic differences impact molecular phenotypes, and ultimately human health, are discussed.

8 citations


Posted Content
25 Aug 2017-bioRxiv
TL;DR: The “Cancer LncRNA Census” (CLC), a manually curated and strictly defined compilation of lncRNAs with causative roles in cancer, is created, and it is shown that mouse orthologues of CLC genes also tend to be cancer genes.
Abstract: Long non-coding RNAs (lncRNAs) that drive tumorigenesis are a growing focus of cancer genomics studies. To facilitate further discovery, we have created the “Cancer LncRNA Census” (CLC), a manually curated and strictly defined compilation of lncRNAs with causative roles in cancer. CLC has two principal applications: first, as a resource for training and benchmarking de novo identification methods; and second, as a dataset for studying the fundamental properties of these genes. CLC Version 1 comprises 122 lncRNAs implicated in 31 distinct cancers. LncRNAs are included based on functional or genetic evidence for different causative roles in cancer progression. All belong to the GENCODE reference annotation, to facilitate integration across projects and datasets. For each entry, the evidence type, biological activity (oncogene or tumour suppressor), source reference and cancer type are recorded. CLC genes are significantly enriched amongst de novo predicted driver genes from PCAWG. CLC genes are distinguished from other lncRNAs by a series of features consistent with biological function, including gene length, expression and sequence conservation of both exons and promoters. We identify a trend for CLC genes to be co-localised with known protein-coding cancer genes along the human genome. Finally, by integrating data from transposon-mutagenesis functional screens, we show that mouse orthologues of CLC genes also tend to be cancer driver genes. Thus, CLC represents a valuable resource for research into long non-coding RNAs in cancer. The evolutionary and genomic properties of CLC genes have implications for understanding disease mechanisms and point to conserved functions across ~80 million years of evolution.

7 citations


Posted Content
22 Dec 2017-bioRxiv
TL;DR: This study provides a comprehensive epigenetic chart of chromatin states in primary human neutrophils and monocytes, thus providing a valuable resource for studying the regulation of the human innate immune system.
Abstract: Neutrophils and monocytes provide a first line of defense against infections as part of the innate immune system. Here we report the integrated analysis of transcriptomic and epigenetic landscapes for circulating monocytes and neutrophils, with the aim of enabling downstream interpretation and functional validation of key regulatory elements in health and disease. We collected RNA-seq data, ChIP-seq data for six histone modifications, and base-pair-resolution DNA methylation data from bisulfite sequencing, from up to 6 individuals per cell type. Chromatin segmentation analyses suggested that monocytes have a higher number (4-fold) of cell-specific enhancer regions than neutrophils. This highly plastic epigenome is likely indicative of the greater differentiation potential of monocytes into macrophages, dendritic cells and osteoclasts. In contrast, most neutrophil-specific features tend to be characterized by repressed chromatin, reflecting their status as terminally differentiated cells. Enhancers were the regions where most of the differences in DNA methylation between cell types were observed, with monocyte-specific enhancers being generally hypomethylated. Monocytes show substantially higher gene expression levels than neutrophils, in line with the epigenomic analysis revealing more active elements in monocytes. Our analyses suggest that the overexpression of c-Myc in monocytes and its binding to monocyte-specific enhancers could be an important contributor to these differences. Altogether, our study provides a comprehensive epigenetic chart of chromatin states in primary human neutrophils and monocytes, thus providing a valuable resource for studying the regulation of the human innate immune system.

7 citations


Journal Article
TL;DR: There is little empirical evidence that scientific retreats lead to better science (whatever this exactly means); the authors have been unable to find any work that would correlate frequency or length of science retreats with any of the metrics usually employed to measure the quality of science.
Abstract: Scientific retreats are an intrinsic part of the life of many institutes, departments, and groups. They differ from traditional, virtual [1], and unconventional conferences [2], workshops [3], and other types of scientific meetings [4] in that participants generally all know each other prior to the retreat and, often, have a good grasp of each other's scientific interests and accomplishments; they may even be working closely together. Participants thus do not attend the retreat necessarily expecting to hear about breakthroughs in their fields of interest or to present their latest results to an expert audience, but rather to gain a deeper knowledge of the work of their closest colleagues, learn from developments in related areas, and explore potential collaborations. Since retreats usually take place away from the home institute and may extend over two or more days, they are expensive to organize, consuming significant institutional funds as well as employees' working and personal time. They also disrupt the daily scientific routine: experiments may need to be stopped or planned ahead, and regular activities such as seminars and group meetings need to be cancelled. Thus, to many, the benefits of moving a group of people who already share the same working space to a remote location for two or more days are not obvious. After all, retreat participants already have the opportunity to meet, almost on a daily basis, at the home institute. There is little empirical evidence that scientific retreats lead to better science (whatever this exactly means); we have been unable to find any work that correlates the frequency or length of scientific retreats with any of the metrics usually employed to measure the quality of science.
Yet, anecdotal evidence of a positive correlation between scientific breakthroughs and scientists being outside the lab is abundant. It includes the discovery of penicillin, attributed to a long summer vacation taken by Fleming in 1928 [5], and the invention of Velcro by Georges de Mestral after a hunting trip with his dog in 1941 [6]. More recently, the invention of a new cipher for using DNA as high-capacity data storage by Ewan Birney and Nick Goldman of the European Bioinformatics Institute (EBI) apparently happened over “many beers” [7, 8]. If properly planned, retreats offer an informal environment, which is becoming increasingly rare with the “laborization” of science, in which scientists tend to follow preestablished working schedules and interact with each other only during regular working hours, following well-structured formats of group meetings, conference calls, seminars, and other meetings. There is also an increasing divide between work at the lab and personal life. These tendencies are new to science, which in the past was often seen as a way of life rather than a means of earning a living. Retreats offer the possibility to break these tendencies, even if only for a short period of time, by bringing together work and personal life. At the retreat, a student may have lunch with a professor he