scispace - formally typeset
Search or ask a question

Showing papers by "Cathal Seoighe published in 2015"


Journal ArticleDOI
TL;DR: Overall, there is little to indicate a dependence for RNA:DNA hybrids forming co-transcriptionally, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures intrans.
Abstract: RNA:DNA hybrids represent a non-canonical nucleic acid structure that has been associated with a range of human diseases and potential transcriptional regulatory functions. Mapping of RNA:DNA hybrids in human cells reveals them to have a number of characteristics that give insights into their functions. We find RNA:DNA hybrids to occupy millions of base pairs in the human genome. A directional sequencing approach shows the RNA component of the RNA:DNA hybrid to be purine-rich, indicating a thermodynamic contribution to their in vivo stability. The RNA:DNA hybrids are enriched at loci with decreased DNA methylation and increased DNase hypersensitivity, and within larger domains with characteristics of heterochromatin formation, indicating potential transcriptional regulatory properties. Mass spectrometry studies of chromatin at RNA:DNA hybrids shows the presence of the ILF2 and ILF3 transcription factors, supporting a model of certain transcription factors binding preferentially to the RNA:DNA conformation. Overall, there is little to indicate a dependence for RNA:DNA hybrids forming co-transcriptionally, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures in trans. The results of the study indicate heterogeneous functions of these genomic elements and new insights into their formation and stability in vivo.

124 citations


Journal ArticleDOI
TL;DR: The genetic contribution to southern and eastern African populations, which involved admixture between indigenous San, Niger-Congo-speaking and populations of Eurasian ancestry, is demonstrated, demonstrating the need to account for stratification in genome-wide association studies.
Abstract: We report a study of genome-wide, dense SNP (∼ 900K) and copy number polymorphism data of indigenous southern Africans. We demonstrate the genetic contribution to southern and eastern African populations, which involved admixture between indigenous San, Niger-Congo-speaking and populations of Eurasian ancestry. This finding illustrates the need to account for stratification in genome-wide association studies, and that admixture mapping would likely be a successful approach in these populations. We developed a strategy to detect the signature of selection prior to and following putative admixture events. Several genomic regions show an unusual excess of Niger-Kordofanian, and unusual deficiency of both San and Eurasian ancestry, which were considered the footprints of selection after population admixture. Several SNPs with strong allele frequency differences were observed predominantly between the admixed indigenous southern African populations, and their ancestral Eurasian populations. Interestingly, many candidate genes, which were identified within the genomic regions showing signals for selection, were associated with southern African-specific high-risk, mostly communicable diseases, such as malaria, influenza, tuberculosis, and human immunodeficiency virus/AIDs. This observation suggests a potentially important role that these genes might have played in adapting to the environment. Additionally, our analyses of haplotype structure, linkage disequilibrium, recombination, copy number variation and genome-wide admixture highlight, and support the unique position of San relative to both African and non-African populations. This study contributes to a better understanding of population ancestry and selection in south-eastern African populations; and the data and results obtained will support research into the genetic contributions to infectious as well as non-communicable diseases in the region.

47 citations


Posted ContentDOI
02 Nov 2015-bioRxiv
TL;DR: Overall, there is little to indicate a dependence for RNA:DNA hybrids forming co-transcriptionally, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures in trans.
Abstract: Background: RNA:DNA hybrids represent a non-canonical nucleic acid structure that has been associated with a range of human diseases and potential transcriptional regulatory functions. Mapping of RNA:DNA hybrids in human cells reveals them to have a number of characteristics that give insights into their functions. Results: We find RNA:DNA hybrids to occupy millions of base pairs in the human genome. A directional sequencing approach shows the RNA component of the RNA:DNA hybrid to be purine-rich, indicating a thermodynamic contribution to their in vivo stability. The RNA:DNA hybrids are enriched at loci with decreased DNA methylation and increased DNase hypersensitivity, and within larger domains with characteristics of heterochromatin formation, indicating potential transcriptional regulatory properties. Mass spectrometry studies of chromatin at RNA:DNA hybrids shows the presence of the ILF2 and ILF3 transcription factors, supporting a model of certain transcription factors binding preferentially to the RNA:DNA conformation. Conclusions: Overall, there is little to indicate a dependence for RNA:DNA hybrids forming co-transcriptionally, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures in trans. The results of the study indicate heterogeneous functions of these genomic elements and new insights into their formation and stability in vivo.

41 citations


Journal ArticleDOI
TL;DR: The results suggest that developing T lymphocytes are exposed to diverse tissue-restricted splice isoforms in the thymus and that AIRE has a direct or indirect role in this process, representing a novel aspect of its role in the maintenance of immune self-tolerance.
Abstract: Motivation: The expression of tissue-restricted antigens (TRAs) in the thymus is required to ensure efficient negative selection of potentially auto-reactive T lymphocytes and avoid autoimmune disease. This promiscuous expression is under the control of the autoimmune regulator (AIRE), a transcription factor expressed in medullary thymic epithelial cells (mTECs). Tissue-specific alternative splicing may also produce TRAs but the extent to which splice isoforms that are restricted to specific tissues are expressed in mTECs had yet to be investigated. Results: We reanalyzed microarray and RNA-Seq datasets from mouse mTECs and other epithelial and non-epithelial cell types and found that the diversity of splice isoforms in mTECs was greater than in any of the other cell types or tissues studied. We identified tissue-specific isoforms from a panel of mouse tissues and found several examples of such isoforms that are expressed in mTECs. The number of isoforms with restricted expression found in mTECs was significantly higher than for comparable cell types. Furthermore, we found evidence that AIRE influences the increased splicing diversity observed in mTECs as the genes for which tissue restricted isoforms are produced in mTECs were significantly more likely than other genes to be differentially spliced between AIRE knock-out and wildtype samples. Our results suggest that developing T lymphocytes are exposed to diverse tissue-restricted splice isoforms in the thymus and that AIRE has a direct or indirect role in this process, representing a novel aspect of its role in the maintenance of immune self-tolerance.

25 citations



Journal ArticleDOI
TL;DR: A novel approach to microarray analysis that attains many of the advantages of RNA-Seq, and can accurately estimate absolute expression levels from microarray data, at both gene and transcript level, which has not previously been possible.
Abstract: Quantifying gene expression by RNA-Seq has several advantages over microarrays, including greater dynamic range and gene expression estimates on an absolute, rather than a relative scale. Nevertheless, microarrays remain in widespread use, demonstrated by the ever-growing numbers of samples deposited in public repositories. We propose a novel approach to microarray analysis that attains many of the advantages of RNA-Seq. This method, called Machine Learning of Transcript Expression (MaLTE), leverages samples for which both microarray and RNA-Seq data are available, using a Random Forest to learn the relationship between the fluorescence intensity of sets of microarray probes and RNA-Seq transcript expression estimates. We trained MaLTE on data from the Genotype-Tissue Expression (GTEx) project, consisting of Affymetrix gene arrays and RNA-Seq from over 700 samples across a broad range of human tissues. This approach can be used to accurately estimate absolute expression levels from microarray data, at both gene and transcript level, which has not previously been possible. This methodology will facilitate re-analysis of archived microarray data and broaden the utility of the vast quantities of data still being generated.

5 citations


01 Jan 2015
TL;DR: The Machine Learning of Transcript Expression (MaLTE) method as mentioned in this paper leverages samples for which both microarray and RNA-Seq data are available, using a Random Forest to learn the relationship between the fluorescence intensity of sets of microarray probes and transcript expression estimates.
Abstract: Background: Quantifying gene expression by RNA-Seq has several advantages over microarrays, including greater dynamic range and gene expression estimates on an absolute, rather than a relative scale. Nevertheless, microarrays remain in widespread use, demonstrated by the ever-growing numbers of samples deposited in public repositories. Results: We propose a novel approach to microarray analysis that attains many of the advantages of RNA-Seq. This method, called Machine Learning of Transcript Expression (MaLTE), leverages samples for which both microarray and RNA-Seq data are available, using a Random Forest to learn the relationship between the fluorescence intensity of sets of microarray probes and RNA-Seq transcript expression estimates. We trained MaLTE on data from the Genotype-Tissue Expression (GTEx) project, consisting of Affymetrix gene arrays and RNA-Seq from over 700 samples across a broad range of human tissues. Conclusion: This approach can be used to accurately estimate absolute expression levels from microarray data, at both gene and transcript level, which has not previously been possible. This methodology will facilitate re-analysis of archived microarray data and broaden the utility of the vast quantities of data still being generated.

4 citations