
Showing papers on "Pseudogene published in 2022"


Journal Article
TL;DR: Updates to the HUGO Gene Nomenclature Committee resource are described, including improvements to the search facility and new download features.
Abstract: The HUGO Gene Nomenclature Committee (HGNC) assigns unique symbols and names to human genes. The HGNC database (www.genenames.org) currently contains over 43 000 approved gene symbols, over 19 200 of which are assigned to protein-coding genes, 14 000 to pseudogenes and nearly 9000 to non-coding RNA genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC nomenclature advisors and links to related genomic, clinical, and proteomic information. Here, we describe updates to our resource, including improvements to our search facility and new download features.

30 citations


Journal Article
TL;DR: All members of the Borrelia genus that have been examined harbour a linear chromosome that is about 900 kbp in length, as well as a plethora of both linear and circular plasmids in the 5-220 kbp size range.
Abstract: All members of the Borrelia genus that have been examined harbour a linear chromosome that is about 900 kbp in length, as well as a plethora of both linear and circular plasmids in the 5-220 kbp size range. Genome sequences for 27 Lyme disease Borrelia isolates have been determined since the elucidation of the B. burgdorferi B31 genome sequence in 1997. The chromosomes, which carry the vast majority of the housekeeping genes, appear to be very constant in gene content and organization across all Lyme disease Borrelia species. The content of the plasmids, which carry most of the genes that encode the differentially expressed surface proteins that interact with the spirochete's arthropod and vertebrate hosts, is much more variable. Lyme disease Borrelia isolates carry between 7 and 21 different plasmids, ranging in size from 5 to 84 kbp. All strains analyzed to date harbor three plasmids: cp26, lp54 and lp17. Compared with most bacterial plasmids, these plasmids are unusual in that they contain many paralogous sequences, a large number of pseudogenes, and, in some cases, essential genes. In addition, a number of the plasmids have features indicating that they are prophages. Numerous methods have been developed for Lyme disease Borrelia strain typing. These have proven valuable for clinical and epidemiological studies, as well as for phylogenomic and population genetic analyses. Increasingly, these approaches have been displaced by whole genome sequencing techniques. Some correlations between genome content and pathogenicity have been deduced, and comparative whole genome analyses promise future progress in this arena.

25 citations
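
The plasmid names above follow the usual Borrelia convention: "cp" marks a circular plasmid and "lp" a linear one, followed by the approximate size in kbp (cp26, lp54, lp17). A minimal sketch that decodes such names; the regular expression and the optional numbered suffix for family members (e.g. "lp28-1") are assumptions made for illustration, not a formal standard stated in the review.

```python
import re
from typing import NamedTuple


class PlasmidName(NamedTuple):
    topology: str    # "circular" (cp) or "linear" (lp)
    approx_kbp: int  # approximate size in kbp encoded in the name


# 'cp' or 'lp', a size in kbp, and an optional '-N' suffix for numbered
# family members (assumed naming pattern, e.g. 'lp28-1').
_NAME_RE = re.compile(r"^(cp|lp)(\d+)(?:-\d+)?$")


def parse_plasmid_name(name: str) -> PlasmidName:
    """Decode a Borrelia plasmid name such as 'cp26' or 'lp54'."""
    match = _NAME_RE.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized plasmid name: {name!r}")
    prefix, size = match.groups()
    return PlasmidName("circular" if prefix == "cp" else "linear", int(size))


# The three plasmids reported as present in all strains analyzed to date.
for plasmid in ("cp26", "lp54", "lp17"):
    print(plasmid, parse_plasmid_name(plasmid))
```

The name only encodes an approximate size; actual plasmid lengths come from the assembled sequence.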


Journal Article
TL;DR: FerrDb V2, the successor to the first ferroptosis database (FerrDb V1), is presented; it integrates external gene-related data to support gene-oriented analysis and should help researchers gain deeper insights into ferroptosis.
Abstract: Ferroptosis is a mode of regulated cell death characterized by iron-dependent accumulation of lipid peroxides. It is closely linked to the pathophysiological processes of many diseases. Since we published the first ferroptosis database (FerrDb V1) in 2020, many new findings have been reported. To keep pace with the rapid progress in ferroptosis research and to provide timely, high-quality data, we present its successor, FerrDb V2. It contains 1001 ferroptosis regulators and 143 ferroptosis-disease associations manually curated from 3288 articles. Specifically, there are 621 gene regulators, of which 264 are drivers, 238 are suppressors, 9 are markers, and 110 are unclassified genes; and there are 380 substance regulators, comprising 201 inducers and 179 inhibitors. Compared with FerrDb V1, the number of curated articles has increased by >300%, ferroptosis regulators by 175%, and ferroptosis-disease associations by 50.5%. Circular RNAs and pseudogenes are new categories of regulators in FerrDb V2, and the proportion of non-coding RNAs has increased from 7.3% to 13.6%. External gene-related data have been integrated, enabling thought-provoking, gene-oriented analyses in FerrDb V2. In conclusion, FerrDb V2 will help researchers gain deeper insights into ferroptosis. FerrDb V2 is freely accessible at http://www.zhounan.org/ferrdb/.

17 citations


Journal Article
TL;DR: Pseudofinder is open-source software dedicated to pseudogene identification and analysis in bacterial and archaeal genomes; it can detect a wide variety of pseudogenes, including highly degraded ones that are typically missed by gene-calling pipelines, as well as newly formed pseudogenes containing only one or a few inactivating mutations.
Abstract: Prokaryotic genomes are usually densely packed with intact and functional genes. However, in certain contexts, such as after recent ecological shifts or extreme population bottlenecks, broken and nonfunctional gene fragments can quickly accumulate and form a substantial fraction of the genome. Identification of these broken genes, called pseudogenes, is a critical step for understanding the evolutionary forces acting upon, and the functional potential encoded within, prokaryotic genomes. Here, we present Pseudofinder, open-source software dedicated to pseudogene identification and analysis in bacterial and archaeal genomes. We demonstrate that Pseudofinder's multi-pronged, reference-based approach can detect a wide variety of pseudogenes, including those that are highly degraded and typically missed by gene-calling pipelines, as well as newly formed pseudogenes containing only one or a few inactivating mutations. Additionally, Pseudofinder can detect genes that lack inactivating substitutions but are experiencing relaxed selection. Implementation of Pseudofinder in annotation pipelines will allow more precise estimations of the functional potential of sequenced microbes, while also generating new hypotheses related to the evolutionary dynamics of bacterial and archaeal genomes.

14 citations
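
A newly formed pseudogene may differ from its functional ancestor by a single inactivating mutation, such as a premature stop codon or a frameshift. The sketch below is a hypothetical, deliberately minimal check for those two signals in a coding sequence read in frame +1; it is not Pseudofinder's implementation, which the abstract describes as a multi-pronged, reference-based approach.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Return 0-based offsets of in-frame stop codons that occur before the
    final codon of a coding sequence (frame +1, standard stop codons)."""
    seq = cds.upper().replace("U", "T")
    last_codon_start = (len(seq) // 3 - 1) * 3
    return [i for i in range(0, last_codon_start, 3) if seq[i:i + 3] in STOP_CODONS]


def looks_pseudogenized(cds: str) -> bool:
    """Flag a length that is not a multiple of 3 (frameshift-like) or any
    premature in-frame stop codon."""
    return len(cds) % 3 != 0 or bool(premature_stops(cds))


print(premature_stops("ATGTAAAAATAA"))   # stop codon at offset 3
print(looks_pseudogenized("ATGAAATAA"))  # intact toy ORF
```

Real pseudogene callers also compare against homologs, because a truncated or fragmented gene may show neither signal on its own.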


Journal Article
TL;DR: This article showed that the response to low, appetitive salt concentrations in Drosophila depends on Ir56b, an atypical member of the ionotropic receptor (Ir) family.

12 citations


Journal Article
TL;DR: The reported roles of exosomal ceRNAs in the diagnosis and treatment of malignant tumors, and the mechanisms underlying them, are reviewed.
Abstract: Malignant tumors are a threat to human health, so it is critical to better understand the mechanisms of tumor occurrence and development and to find key therapeutic targets. Competitive endogenous RNAs (ceRNAs) are a class of RNA molecules that includes protein-coding mRNAs, pseudogenes, long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), among others. ceRNAs regulate the post-transcriptional expression of genes by competitively binding shared microRNAs (miRNAs), which inhibit mRNA translation. Studies have shown that ceRNAs are involved in tumor cell proliferation, invasion and migration, drug resistance, angiogenesis, and tumor immunity, thereby affecting tumor progression. This article reviews the reported roles of exosomal ceRNAs in the diagnosis and treatment of malignant tumors and the underlying mechanisms.

11 citations


Journal Article
TL;DR: The roles of the complex regulatory interactions among multiple types of ncRNAs in the growth of thyroid cancer (TC) remain unknown; however, some ncRNAs, such as long noncoding RNAs (lncRNAs), pseudogenes and circular RNAs (circRNAs), have been suggested to play an important role in TC progression.

10 citations


Journal Article
07 Sep 2022-eLife
TL;DR: A central role for Long Interspersed Nuclear Element-1 (LINE-1) retrotransposition in horizontal gene transfer (HGT) to virus genomes is described, revealing a previously unrecognized conduit of genetic traffic with fundamental implications for the evolution of many virus classes and their hosts.
Abstract: Horizontal gene transfer (HGT) provides a major source of genetic variation. Many viruses, including poxviruses, encode genes with crucial functions directly gained by gene transfer from hosts. The mechanism of transfer to poxvirus genomes is unknown. Using genome analysis and experimental screens of infected cells, we discovered a central role for Long Interspersed Nuclear Element-1 retrotransposition in HGT to virus genomes. The process recapitulates processed pseudogene generation, but with host messenger RNA directed into virus genomes. Intriguingly, hallmark features of retrotransposition appear to favor virus adaptation through rapid duplication of captured host genes on arrival. Our study reveals a previously unrecognized conduit of genetic traffic with fundamental implications for the evolution of many virus classes and their hosts.

10 citations


Journal Article
TL;DR: The authors hypothesized that transposable elements, and possibly positive selection, are involved in the highly dynamic evolution of gustatory receptor genes in Spodoptera spp.
Abstract: Bitter taste, triggered via gustatory receptors, serves as an important natural defense against the ingestion of poisonous foods in animals, and increased host breadth is usually linked to an increase in the number of gustatory receptor genes. This has been observed especially in polyphagous insect species, such as noctuid species of the genus Spodoptera. However, the dynamic and physical mechanisms leading to these gene expansions, and the evolutionary pressures behind them, remain elusive. Transposable elements are among the major drivers of genome dynamics but, surprisingly, their potential role in insect gustatory receptor expansion has not yet been considered. In this work, we hypothesized that transposable elements, and possibly positive selection, are involved in the highly dynamic evolution of gustatory receptor genes in Spodoptera spp. We first sequenced de novo the full 465 Mb genome of S. littoralis and manually annotated the main chemosensory genes, including a large repertoire of 373 gustatory receptor genes (including 19 pseudogenes). We also improved the completeness of the S. frugiperda and S. litura gustatory receptor gene repertoires. We then annotated transposable elements and found that a particular category of class I retrotransposons, the SINEs, was significantly enriched in the vicinity of gustatory receptor gene clusters, suggesting a transposon-mediated mechanism for the formation of these clusters. Selection pressure analyses indicated that positive selection within the gustatory receptor gene family is cryptic, with only 7 receptors identified as positively selected. Altogether, our data provide a new, good-quality Spodoptera genome, pinpoint interesting gustatory receptor candidates for further functional studies, and bring valuable genomic information on the mechanisms of gustatory receptor expansion in polyphagous insect species.

9 citations
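
The abstract reports a significant enrichment of SINEs near gustatory receptor gene clusters but does not state the statistical test. One generic way to frame such an enrichment, offered here only as a hypothetical sketch, is a one-sided hypergeometric test over genomic windows: of N windows, K contain a SINE; of the n windows near clusters, k do. The window-based framing and all parameter names are assumptions.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total genomic windows, K: windows containing the feature (e.g. a SINE),
    n: windows near the gene clusters, k: of those, windows containing the feature.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than available.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Feature rate near the clusters divided by the genome-wide feature rate."""
    return (k / n) / (K / N)
```

With real data, windows overlapping the same cluster are not independent, so a permutation test that shuffles cluster positions along the genome is often preferred.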


Journal Article
TL;DR: Although the chloroplast genome structure of C. subtilis was found to be generally conserved and stable, 26 SSRs and 13 highly variable loci were detected; these regions could be developed into important molecular markers for the subfamily Pooideae.
Abstract: Coleanthus subtilis (Tratt.) Seidl (Poaceae) is an ephemeral grass from the monotypic genus Coleanthus Seidl that grows in wet, muddy areas such as fishponds and reservoirs. As a rare species with strict habitat requirements, it is protected at international and national levels. In this study, we sequenced its whole chloroplast genome for the first time using next-generation sequencing (NGS) on the Illumina platform and performed comparative and phylogenetic analyses with related species in Poaceae. The complete chloroplast genome of C. subtilis is 135,915 bp in length and has a quadripartite structure, with two 21,529 bp inverted repeat regions (IRs) dividing the circular genome into a large single-copy region (LSC) of 80,100 bp and a small single-copy region (SSC) of 12,757 bp. The overall GC content is 38.3%, while the GC contents of the LSC, SSC, and IR regions are 36.3%, 32.4%, and 43.9%, respectively. A total of 129 genes were annotated in the chloroplast genome, including 83 protein-coding genes, 38 tRNA genes, and 8 rRNA genes. The accD gene and the introns of the clpP and rpoC1 genes were missing. In addition, ycf1, ycf2, ycf15, and ycf68 were identified as pseudogenes. Although the chloroplast genome structure of C. subtilis was found to be generally conserved and stable, 26 SSRs and 13 highly variable loci were detected; these regions could be developed into important molecular markers for the subfamily Pooideae. Phylogenetic analysis with species in Poaceae indicated that Coleanthus and Phippsia are sister groups and provided new insights into the relationships among Coleanthus, Zingeria, and Colpodium. This study presents the first chloroplast genome report for C. subtilis, which provides an essential data reference for further research on its origin.

9 citations
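
The reported figures for C. subtilis are internally consistent: a quadripartite plastome has length LSC + SSC + 2 × IR, and its overall GC content is the length-weighted mean of the regional GC contents. A short check using only numbers from the abstract (the regional GC values are rounded as reported, so only the first decimal of the result is meaningful):

```python
# Values copied from the C. subtilis abstract: (length in bp, GC fraction, copies).
REGIONS = {
    "LSC": (80_100, 0.363, 1),
    "SSC": (12_757, 0.324, 1),
    "IR": (21_529, 0.439, 2),  # the inverted repeat is present twice
}
GENE_COUNTS = {"protein-coding": 83, "tRNA": 38, "rRNA": 8}


def plastome_length(regions: dict[str, tuple[int, float, int]]) -> int:
    """Total length = LSC + SSC + 2 x IR for a quadripartite plastome."""
    return sum(length * copies for length, _, copies in regions.values())


def overall_gc_percent(regions: dict[str, tuple[int, float, int]]) -> float:
    """Length-weighted mean of the regional GC fractions, as a percentage."""
    gc_bases = sum(length * gc * copies for length, gc, copies in regions.values())
    return 100 * gc_bases / plastome_length(regions)


if __name__ == "__main__":
    print(plastome_length(REGIONS))               # 135915
    print(round(overall_gc_percent(REGIONS), 1))  # 38.3
    print(sum(GENE_COUNTS.values()))              # 129
```

The same identity holds for the capirona plastome in the next entry: 84,813 + 18,101 + 2 × 25,783 = 154,480 bp.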


Journal Article
TL;DR: Gauchian is a tool for analyzing short-read whole-genome sequencing data; it outperformed the GATK Best Practices pipeline for detecting GBAP1-related mutations.
Abstract: Carriers of GBA variants are at increased risk of Parkinson’s disease (PD) and Lewy body dementia (LBD). The presence of the pseudogene GBAP1 predisposes to structural variants, complicating genetic analysis. We present two methods to resolve recombinant alleles and other variants in GBA: Gauchian, a tool for short-read, whole-genome sequencing data analysis, and Oxford Nanopore sequencing after PCR enrichment. Both methods were concordant for 42 samples carrying a range of recombinants and GBAP1-related mutations, and Gauchian outperformed the GATK Best Practices pipeline. Applying Gauchian to sequencing data from over 10,000 individuals shows that copy number variants (CNVs) spanning GBAP1 are relatively common in Africans. CNV frequencies in PD and LBD are similar to those in controls. Gains may coexist with other mutations in patients, and a modifying effect cannot be excluded. Gauchian detects more GBA variants in LBD than in PD, especially severe ones. These findings highlight the importance of accurate GBA analysis in these patients.
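
Detecting CNVs from short-read WGS data generally rests on read depth: depth over a region of interest is normalized against depth in regions assumed to be copy-neutral, then scaled by the ploidy. A minimal generic sketch under those assumptions; it is not Gauchian's algorithm, which the abstract does not detail, and the input names are hypothetical.

```python
from statistics import median


def estimate_copy_number(region_depths: list[float],
                         baseline_depths: list[float],
                         ploidy: int = 2) -> int:
    """Estimate an integer copy number from per-position read depths.

    The ratio of the median depth in the region of interest to the median
    depth in assumed copy-neutral baseline regions is scaled by the ploidy
    and rounded to the nearest integer (Python's round uses banker's
    rounding on exact .5 values).
    """
    baseline = median(baseline_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    ratio = median(region_depths) / baseline
    return round(ploidy * ratio)


# A region at 1.5x the diploid baseline depth suggests a copy-number gain.
print(estimate_copy_number([45, 46, 44], [30, 30, 31]))
```

Highly similar gene-pseudogene pairs such as GBA and GBAP1 complicate this, because reads may not map uniquely to one paralog.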

Journal Article
01 Jan 2022-Genes
TL;DR: The genome organization, gene content, and structural features of the chloroplast genome of C. spruceanum are reported for the first time, providing valuable information for genetic and evolutionary studies in the genus Calycophyllum and beyond.
Abstract: Capirona (Calycophyllum spruceanum Benth.) belongs to subfamily Ixoroideae, one of the major lineages in the Rubiaceae family, and is an important timber tree. It originated in the Amazon Basin and is widely distributed in Bolivia, Peru, Colombia, and Brazil. In this study, we obtained the first complete chloroplast (cp) genome of capirona from the department of Madre de Dios in the Peruvian Amazon. High-quality genomic DNA was used to construct libraries. Paired-end clean reads were obtained from a PE150 library sequenced on the Illumina HiSeq 2500 platform. The complete cp genome of C. spruceanum is 154,480 bp in length with a typical quadripartite structure, containing a large single-copy (LSC) region (84,813 bp) and a small single-copy (SSC) region (18,101 bp) separated by two inverted repeat (IR) regions (25,783 bp each). Annotation of the C. spruceanum cp genome predicted 87 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, 37 transfer RNA (tRNA) genes, and one pseudogene. The 41 simple sequence repeats (SSRs) in this cp genome comprised mononucleotide (29), dinucleotide (5), trinucleotide (3), and tetranucleotide (4) repeats. Most of these repeats were located in noncoding regions. Whole chloroplast genome comparison with six other Ixoroideae species revealed that the small and large single-copy regions were more divergent than the inverted repeat regions. Finally, phylogenetic analyses showed that C. spruceanum is sister to Emmenopterys henryi, confirming its position within the subfamily Ixoroideae. This study reports for the first time the genome organization, gene content, and structural features of the chloroplast genome of C. spruceanum, providing valuable information for genetic and evolutionary studies in the genus Calycophyllum and beyond.
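
The SSR counts above are broken down by motif length (mononucleotide to tetranucleotide). A minimal regex-based sketch of locating perfect SSRs with motifs of 1 to 4 bp; the minimum repeat numbers (10, 5, 4, 3) are illustrative assumptions, not the thresholds used in the study, and compound or imperfect repeats are ignored.

```python
import re

# Hypothetical minimum repeat counts per motif length (1 = mono ... 4 = tetra).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3}


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    Motif lengths are scanned from shortest to longest; positions already
    covered by an SSR are skipped, so an 'AT' repeat is not reported again
    as an 'ATAT' repeat, and homopolymer runs are only reported as mononucleotides.
    """
    seq = seq.upper()
    covered = [False] * len(seq)
    hits = []
    for k in sorted(min_repeats):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if k > 1 and len(set(motif)) == 1:
                continue  # homopolymer: handled as a mononucleotide repeat
            if any(covered[m.start():m.end()]):
                continue  # overlaps an SSR already reported
            hits.append((m.start(), motif, (m.end() - m.start()) // k))
            for i in range(m.start(), m.end()):
                covered[i] = True
    return sorted(hits)


print(find_ssrs("GC" + "A" * 12 + "GC" + "ATATATATATAT"))
```

Tallying the motif lengths of the returned tuples gives a per-class breakdown analogous to the one in the abstract.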

Journal Article
TL;DR: Divergence time analysis based on complete chloroplast genome sequences showed that Ficus species diverged rapidly during the early to middle Miocene; the study provides basic resources for further evolutionary studies of Ficus.
Abstract: As the largest genus in Moraceae, Ficus is widely distributed across tropical and subtropical regions and exhibits a high degree of adaptability to different environments. At present, however, the phylogenetic relationships of this genus are not well resolved, and chloroplast evolution in Ficus remains poorly understood. Here, we sequenced, assembled, and annotated the chloroplast genomes of 10 Ficus species, assembled those of 13 additional species from downloaded next-generation sequencing data, and compared them with 46 previously published chloroplast genomes. We found a highly conserved genomic structure across the genus, with plastid genome sizes ranging from 159,929 bp (Ficus langkokensis) to 160,657 bp (Ficus religiosa). Most plastomes encoded 113 unique genes, including 78 protein-coding genes, 30 transfer RNA (tRNA) genes, four ribosomal RNA (rRNA) genes, and one pseudogene (infA). The number of simple sequence repeats (SSRs) ranged from 67 (Ficus sagittata) to 89 (Ficus microdictya) and generally increased linearly with plastid size. Comparative analysis of the plastomes revealed eight intergenic spacers that were hotspots of divergence. Additionally, the clpP, rbcL, and ccsA genes showed evidence of positive selection. Phylogenetic analysis indicated that none of the six traditionally recognized subgenera of Ficus was monophyletic. Divergence time analysis based on the complete chloroplast genome sequences showed that Ficus species diverged rapidly during the early to middle Miocene. This research provides basic resources for further evolutionary studies of Ficus.

Journal Article
TL;DR: The authors systematically analyzed gene losses (unitary pseudogenes) in naked mole-rats (NMRs) and blind mole-rats (BMRs), aiming to elucidate the potential roles of pseudogenes in their adaptation to a subterranean lifestyle.
Abstract: Naked mole-rats (Heterocephalus glaber, NMRs) and blind mole-rats (Spalax galili, BMRs) are representative subterranean rodents that have evolved many extraordinary traits, including hypoxia tolerance, longevity, and cancer resistance. Although multiple candidate loci responsible for these traits have been uncovered by genomic studies, many of them are limited to functional changes in amino acid sequences, and little is known about the contributions of other genetic events. To address this issue, we focused on gene losses (unitary pseudogenes) and systematically analyzed them in NMRs and BMRs, aiming to elucidate the potential roles of pseudogenes in their adaptation to a subterranean lifestyle.
We obtained the pseudogene repertoires of NMRs and BMRs, as well as of their respective aboveground relatives, guinea pigs and rats, on a genome-wide scale. In total, 167, 139, 341, and 112 pseudogenes were identified in NMRs, BMRs, guinea pigs, and rats, respectively. Functional enrichment analysis identified 4 shared and 2 species-specific enriched functional groups (EFGs) in subterranean lineages. Notably, the pseudogenes in these EFGs might be associated with either regressive (e.g., visual system) or adaptive (e.g., altered DNA damage response) traits. In addition, several pseudogenes, including TNNI3K and PDE5A, might be associated with specific cardiac features observed in subterranean lineages. Interestingly, we observed 20 convergent gene losses in NMRs and BMRs. Because functional investigations of these genes are generally scarce, we provided functional evidence that the independent loss of TRIM17 in NMRs and BMRs might be beneficial for neuronal survival under hypoxia, supporting a positive role for eliminating TRIM17 function in hypoxia adaptation. Our results also suggest that pseudogenes, together with positively selected genes, cooperatively reinforced subterranean adaptations.
Our study provides new insights into the molecular underpinnings of subterranean adaptations and highlights the importance of gene losses in mammalian evolution.

Journal Article
TL;DR: PLASTER is a robust data processing pipeline for accurate allele typing of SMRT-sequenced amplicons; the resulting pharmacogenetic method could improve drug safety and efficacy through screening prior to drug administration.
Abstract: The CYP2D6 enzyme is estimated to metabolize 25% of commonly used pharmaceuticals and is of intense pharmacogenetic interest due to the polymorphic nature of the CYP2D6 gene. Accurate allele typing of CYP2D6 has proved challenging due to frequent copy number variants (CNVs) and paralogous pseudogenes. SNP arrays, qPCR and short-read sequencing have been employed to interrogate CYP2D6; however, these technologies are unable to capture longer-range information. Long-read sequencing using the PacBio Single Molecule Real Time (SMRT) sequencing platform has yielded promising results for CYP2D6 allele typing. However, previous studies have been limited in scale and have employed nascent data processing pipelines. We present a robust data processing pipeline, “PLASTER”, for accurate allele typing of SMRT-sequenced amplicons. We demonstrate the pipeline by typing CYP2D6 alleles in a large cohort of 377 Solomon Islanders. This pharmacogenetic method will improve drug safety and efficacy through screening prior to drug administration.
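
Allele typing ultimately compares the variants observed on a sequenced haplotype with a set of allele definitions. A toy sketch, assuming each allele is defined by a set of variant identifiers and choosing the definition with the highest Jaccard similarity; the allele names and variant IDs are hypothetical placeholders rather than real CYP2D6 star-allele definitions, and this is not presented as PLASTER's method.

```python
def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two variant sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_allele(observed: set[str],
                definitions: dict[str, frozenset[str]]) -> tuple[str, float]:
    """Return (allele name, score) for the definition best matching the observed variants."""
    obs = frozenset(observed)
    return max(((name, jaccard(obs, defn)) for name, defn in definitions.items()),
               key=lambda item: item[1])


# Hypothetical allele definitions and variant IDs, for illustration only.
DEFINITIONS = {
    "alleleA": frozenset(),                  # reference-like haplotype
    "alleleB": frozenset({"v1", "v2"}),
    "alleleC": frozenset({"v1", "v3", "v4"}),
}

print(best_allele({"v1", "v3"}, DEFINITIONS))
```

This sketch ignores the complications named in the abstract, such as CNVs and paralogous pseudogene sequence, which long reads help resolve by keeping variants phased on a single molecule.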

Journal Article
TL;DR: A combination of transcriptomics and exon capture was used to investigate the role of neutral evolutionary processes in the loss of vision over evolutionary time in a water beetle (Dytiscidae).

Journal Article
TL;DR: It is shown that the analysis of barcode sequences fails to reconstruct accurate species trees and differentiate species when the organisms have chimeric genomes composed of admixed mosaics of different origins.
Abstract: DNA barcoding is based on the premise that the barcode sequences can distinguish individuals (strains) of different species because their sequence variation between species exceeds that within species. The primary barcodes used in fungal and yeast taxonomy are the ITS segments and the LSU (large subunit) D1/D2 domain of the homogenized multicopy rDNA repeats. The secondary barcodes are conserved segments of protein‐encoding genes, which usually have single copies in haploid genomes. This study shows that the analysis of barcode sequences fails to reconstruct accurate species trees and differentiate species when the organisms have chimeric genomes composed of admixed mosaics of different origins. It is shown that the type strains of 10 species of the pulcherrima clade of the ascomycetous yeast genus Metschnikowia cannot be differentiated with standard barcodes because their intragenomic diversity is comparable to or even higher than the interstrain diversity. The analysis of a large group of genes of the sequenced genomes of the clade and the viability and segregation of the hybrids of ex‐type strains indicate that the high intragenomic barcode differences can be attributed to admixed genome structures. Because of the mosaic structures of the genomes, the rDNA repeats do not form continuous arrays and thus cannot be homogenized. Since the highly diverse ITS and D1/D2 sequences of the type strains form a continuous pool including pseudogenes, the evolution of their rDNA appears to involve reticulation. The secondary barcode sequences and the nonbarcode genes included in the analysis show incongruent phylogenetic relationships among the type strains, which can also be attributed to differences in the phylogenetic histories of the genes.
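
The barcoding premise stated above can be phrased as a "barcode gap": the largest within-species distance should be smaller than the smallest between-species distance. A minimal sketch using uncorrected p-distances on pre-aligned sequences of equal length; the function names are hypothetical, and real analyses use proper alignments and often model-corrected distances.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(samples: list[tuple[str, str]]) -> tuple[float, float, bool]:
    """samples: (species, aligned_sequence) pairs.

    Returns (max intraspecific distance, min interspecific distance, gap_exists).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    return max_intra, min_inter, max_intra < min_inter
```

In the Metschnikowia case described above, high intragenomic (and thus within-strain) barcode diversity pushes the intraspecific maximum up to or beyond the interspecific minimum, so no gap exists.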

Journal Article
23 May 2022-PLOS ONE
TL;DR: PltRNAdb (https://bioinformatics.um6p.ma/PltRNAdb/index.php) is a data source for tRNA genes from 256 plant species.
Abstract: Transfer RNAs (tRNAs) are intermediate-sized non-coding RNAs, found in all organisms, that help translate messenger RNA into protein. The number of sequenced plant genomes has recently increased dramatically, and the availability of these extensive data greatly accelerates the large-scale study of tRNAs. Here, 8,768,261 scaffolds/chromosomes containing 229,093 giga-base pairs, representing whole-genome sequences of 256 plant species, were analyzed to identify tRNA genes. As a result, 331,242 nuclear, 3,216 chloroplast, and 1,467 mitochondrial tRNA genes were identified. The nuclear tRNA genes include 275,134 tRNAs decoding the 20 standard amino acids, 1,325 suppressor tRNAs, 6,273 tRNAs with unknown isotypes, 48,475 predicted pseudogenes, and 37,873 tRNAs with introns. We also created PltRNAdb (https://bioinformatics.um6p.ma/PltRNAdb/index.php), a data source for tRNA genes from 256 plant species. The PltRNAdb website allows researchers to search, browse, visualize, BLAST, and download predicted tRNA genes. PltRNAdb will help improve our understanding of plant tRNAs and open the door to discovering unknown regulatory roles of tRNAs in plant genomes.

Journal Article
TL;DR: A pipeline consisting of de novo transcript assembly, a custom six-frame-translated database, and a combination of search engines was used to identify novel peptides.
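
A six-frame-translated database is built by translating each assembled transcript in all three reading frames of both strands, so that peptides from unannotated or non-canonical reading frames can be matched by the search engines. A minimal sketch using the standard genetic code (NCBI translation table 1); stop codons appear as "*", codons containing ambiguous bases as "X", and splitting translations into searchable ORFs or peptides is left out. The frame labels are an assumed convention.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict[str, str]:
    """Translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse complement)."""
    fwd = seq.upper().replace("U", "T")
    rev = fwd.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(fwd[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


print(six_frame("ATGGCC"))
```

Six-frame databases are much larger than annotated proteomes, which generally calls for stricter false-discovery control during the peptide search.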

Journal Article
01 Jan 2022-Genomics
TL;DR: This paper investigated transcriptome-wide scatter across 23 cell types and conditions at different levels of biological complexity, focusing on genes that act like toggle switches between pairwise replicates of the same cell type, referred to as ON/OFF genes.
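
ON/OFF genes toggle between pairwise replicates of the same cell type. As a hypothetical operationalization (the expression threshold and the treatment of missing genes are assumptions, not the study's definition), a gene can be called ON/OFF for a replicate pair when it is detected at or above a threshold in one replicate and absent in the other:

```python
def on_off_genes(rep1: dict[str, float], rep2: dict[str, float],
                 on_threshold: float = 1.0) -> set[str]:
    """Genes expressed at >= on_threshold in one replicate and at 0 in the other.

    Expression values are assumed to be normalized (e.g. TPM); genes missing
    from a replicate are treated as 0.
    """
    result = set()
    for gene in rep1.keys() | rep2.keys():
        a, b = rep1.get(gene, 0.0), rep2.get(gene, 0.0)
        if (a >= on_threshold and b == 0) or (b >= on_threshold and a == 0):
            result.add(gene)
    return result


print(sorted(on_off_genes({"g1": 5, "g2": 3, "g3": 0.5}, {"g1": 0, "g2": 4, "g4": 2})))
```

Low-abundance genes (here g3) fall below the threshold and are not called, which keeps sampling noise near zero counts from inflating the ON/OFF set.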

Journal Article
01 Apr 2022-Biology
TL;DR: The proposed model, Gene Attention Ensemble NETwork (GAENET), demonstrated superior performance compared to conventional models, and HILS1 was discovered as the most significant prognostic gene for LGG by taking advantage of the gene attention mechanism.
Simple Summary: This paper proposes a deep learning model for prognosis estimation and an attention mechanism for gene expression data. In the prognosis estimation of low-grade glioma (LGG), the proposed model, Gene Attention Ensemble NETwork (GAENET), outperformed conventional models, improving on the second-best model by 7.2%. Using the proposed gene attention, HILS1 was identified as the most significant prognostic gene for LGG. Although HILS1 is classified as a pseudogene, it functions as a biomarker for predicting the prognosis of LGG and has been shown to be able to regulate the expression of other prognostic genes.
Abstract: Although estimating the prognosis of low-grade glioma (LGG) is a crucial problem, recent advances in deep learning have not been extensively applied to it. The attention mechanism is one such significant advance; however, it remains unclear how attention mechanisms should be applied to gene expression data to estimate prognosis, because they were designed for convolutional layers and word embeddings. This paper proposes an attention mechanism called gene attention for gene expression data, together with a deep learning model for LGG prognosis estimation that uses it. The proposed Gene Attention Ensemble NETwork (GAENET) outperformed conventional methods, including the survival support vector machine and random survival forest. When evaluated by C-index, GAENET improved on the second-best model by 7.2%. In addition, the gene attention mechanism identified HILS1 as the most significant prognostic gene in deep learning training. Although HILS1 is known as a pseudogene, it is a biomarker for estimating the prognosis of LGG and has demonstrated the potential to regulate the expression of other prognostic genes.
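
GAENET was evaluated by the concordance index (C-index). For reference, a minimal sketch of Harrell's C-index for right-censored survival data: a pair is comparable when the subject with the shorter observed time had an event, and it is concordant when the model assigns that subject the higher risk; ties in predicted risk count as 0.5. This is the textbook definition, not code from the paper, and pairs with tied observed times are simply skipped here.

```python
def harrell_c_index(times: list[float], events: list[bool], risks: list[float]) -> float:
    """Harrell's concordance index for right-censored data.

    times  : observed follow-up times
    events : True if the event (e.g. death) was observed, False if censored
    risks  : predicted risk scores (higher = expected to fail sooner)
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c_index([1, 2, 3], [True, True, True], [3, 2, 1]))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times.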

Journal Article
TL;DR: The eukaryotic translation elongation factors (EEFs) are protein-coding genes that play a central role in the elongation step of translation but are often altered in cancer.
Abstract: The eukaryotic translation elongation factors (EEFs), i.e. EEF1A1, EEF1A2, EEF1B2, EEF1D, EEF1G, EEF1E1 and EEF2, are protein-coding genes that play a central role in the elongation step of translation but are often altered in cancer. Their pseudogenes have been less investigated. Pseudogenes have recently been shown to have a key regulatory role in the cell, especially via non-coding RNAs, and the aberrant expression of ncRNAs has an important role in cancer development and progression. This review collects, for the first time, all published findings on EEF pseudogenes to create a base for future investigations. For most of them, studies are in their infancy, while for others, studies suggest involvement in normal cell physiology as well as in various human diseases. However, more investigations are needed to understand their functions in both normal and cancer cells and to define which of them could be useful biomarkers or therapeutic targets.

Journal Article
TL;DR: PCR amplification and sequencing with primers designed on the transgene and on the promoter and/or polyadenylation signal used for its expression are useful for gene-doping detection as an additional confirmatory test to prevent false positives.
Abstract: Processed pseudogenes, also known as retrocopy genes, are copies of messenger RNAs that have been reverse transcribed into DNA and inserted into the genome. In this study, we identified 62 processed pseudogene candidates as intron-less genes from whole-genome sequencing (WGS) data of Thoroughbred horses using the Delly structural variant caller. The 62 candidates were confirmed by PCR amplification of intron-less products. A total of 11 processed pseudogenes were confirmed in the genomes of all 23 analysed horses, whereas three processed pseudogenes, corresponding to ATP11B, DPH3 and RPL17, were detected in only one of 115 horses by PCR amplification of intron-less products. Currently, most gene-doping tests proposed in human and horse sports are PCR-based methods that use hydrolysis probes to detect exon/exon junctions in transgenes, because the procedure is simple and economical. However, when a corresponding processed pseudogene is present in the host genome, these PCR-based methods carry a potential risk of false positives. Because processed pseudogenes that occur infrequently in horse genomes may affect PCR-based transgene detection in gene-doping tests, we propose and demonstrate that PCR amplification and sequencing using primers designed on the transgene and on the promoter and/or polyadenylation signal used for its expression are useful for gene-doping detection as an additional confirmatory test to prevent false positives.
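
The false-positive risk arises because an exon/exon junction assay cannot tell a transgene (cDNA) from a processed pseudogene: both lack the intron, so both contain the junction sequence, whereas the endogenous gene does not. A toy sketch with hypothetical exon and intron sequences (invented for illustration, not real horse sequences) shows this:

```python
def junction_probe(exon_a: str, exon_b: str, flank: int = 10) -> str:
    """Sequence spanning the exon_a|exon_b junction, `flank` bases from each side."""
    return (exon_a[-flank:] + exon_b[:flank]).upper()


def has_junction(template: str, exon_a: str, exon_b: str, flank: int = 10) -> bool:
    """True if the template contains the exon/exon junction, i.e. the intron is absent."""
    return junction_probe(exon_a, exon_b, flank) in template.upper()


# Hypothetical sequences for illustration only.
EXON1 = "ATGGCTAGCTTACGGATCCA"
INTRON = "GTAAGTCTTTTTTTTTCAG"
EXON2 = "GTTCAAGGCTTTGACCTGAA"

templates = {
    "endogenous gene": EXON1 + INTRON + EXON2,
    "transgene cDNA": EXON1 + EXON2,
    "processed pseudogene": "CCGT" + EXON1 + EXON2 + "A" * 15,  # retrocopy with poly(A)
}
for label, template in templates.items():
    print(label, has_junction(template, EXON1, EXON2))
```

Only the endogenous gene, which keeps its intron, escapes detection. This is why the confirmatory test targets sequences flanking the transgene (promoter, polyadenylation signal) that a processed pseudogene would not carry.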

Journal ArticleDOI
TL;DR: In this paper, the role of Plasminogen like A (PLGLA) in hepatocellular carcinoma (HCC) was explored using The Cancer Genome Atlas (TCGA) datasets.

Journal ArticleDOI
TL;DR: In this paper, the authors present evidence-based recommendations for the technical implementation and interpretation of biochemical and genetic testing for the diagnosis of Gaucher disease (GD), to ensure a timely and accurate diagnosis for patients with GD worldwide.
Abstract: Gaucher disease (GD) is an autosomal recessive lysosomal storage disorder caused by deficient activity of the acid beta-glucosidase (GCase) enzyme, resulting in progressive lysosomal accumulation of glucosylceramide (GlcCer) and its deacylated derivative, glucosylsphingosine (GlcSph). GCase is encoded by the GBA1 gene, located on chromosome 1q21, 16 kb upstream of a highly homologous pseudogene. To date, more than 400 GBA1 pathogenic variants have been reported, many of them derived from recombination events between the gene and the pseudogene. In recent years, increased access to new technologies has led to exponential growth in the number of diagnostic laboratories offering GD testing. However, both biochemical and genetic diagnosis of GD are challenging, and to date no specific evidence-based guidelines for the laboratory diagnosis of GD have been published. The objective of the guidelines presented here is to provide evidence-based recommendations for the technical implementation and interpretation of biochemical and genetic testing for the diagnosis of GD, to ensure a timely and accurate diagnosis for patients with GD worldwide. The guidelines have been developed by members of the Diagnostic Working Group of the International Working Group of Gaucher Disease (IWGGD), a non-profit network established to promote clinical and basic research into GD with the ultimate aim of improving the lives of patients with this disease. One of the goals of the IWGGD is to support equitable access to diagnosis of GD and to standardize procedures to ensure an accurate diagnosis. Therefore, a guideline development group consisting of biochemists and geneticists working in the field of GD diagnosis was established, and a list of topics to be discussed was selected.
In these guidelines, twenty recommendations are provided based on information gathered through a systematic review of the literature, and two diagnostic algorithms are presented that take into account geographical differences in access to diagnostic services. In addition, several gaps in the current diagnostic workflow were identified, and actions to address them were taken within the IWGGD. We believe that implementing the recommendations provided in these guidelines will promote an equitable, timely and accurate diagnosis for patients with GD worldwide.
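Because many GBA1 pathogenic variants arise from recombination with the pseudogene, one way to picture the problem is to inspect a read at the positions where gene and pseudogene differ (paralogous sequence variants, PSVs). The Python sketch below is a toy illustration under stated assumptions: the offsets and bases in `PSVS` are hypothetical placeholders, not real GBA1 or pseudogene positions, reads are assumed to be already placed on the correct locus, and it does not reproduce any method recommended in the guidelines.

```python
# Hypothetical PSVs: read offset -> (gene base, pseudogene base).
# These offsets and bases are placeholders, NOT real GBA1/pseudogene positions.
PSVS: dict[int, tuple[str, str]] = {
    10: ("C", "T"),
    25: ("G", "A"),
    40: ("T", "C"),
    55: ("A", "G"),
}


def psv_pattern(read: str, psvs: dict[int, tuple[str, str]] = PSVS) -> str:
    """One character per PSV, in positional order: 'G' gene-like, 'P' pseudogene-like, '?' other."""
    calls = []
    for offset in sorted(psvs):
        gene_base, pseudo_base = psvs[offset]
        base = read[offset] if offset < len(read) else "?"
        calls.append("G" if base == gene_base else "P" if base == pseudo_base else "?")
    return "".join(calls)


def looks_recombinant(pattern: str) -> bool:
    """A read carrying both gene-like and pseudogene-like PSVs suggests a gene/pseudogene hybrid."""
    informative = pattern.replace("?", "")
    return "G" in informative and "P" in informative


def toy_read(bases: dict[int, str], length: int = 60) -> str:
    """Build a placeholder read: 'N' everywhere except at the given offsets."""
    seq = ["N"] * length
    for offset, base in bases.items():
        seq[offset] = base
    return "".join(seq)


gene_read = toy_read({pos: g for pos, (g, _) in PSVS.items()})
pseudo_read = toy_read({pos: p for pos, (_, p) in PSVS.items()})
# Upstream half gene-like, downstream half pseudogene-like.
hybrid_read = toy_read({10: "C", 25: "G", 40: "C", 55: "G"})

for label, read in [("gene", gene_read), ("pseudogene", pseudo_read), ("hybrid", hybrid_read)]:
    pattern = psv_pattern(read)
    print(label, pattern, looks_recombinant(pattern))
```

The hybrid read yields the pattern `GGPP`, which is the kind of switch a recombinant allele would produce; in real data, ambiguous mapping of short reads between the two highly homologous loci is a further complication this toy ignores.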

Journal ArticleDOI
TL;DR: In this article, the authors date the origin of human and mouse pseudogenes across vertebrates, observing a burst of pseudogene gain in these two lineages, and use a hybrid sequencing dataset combining full-length PacBio sequencing, sample-matched Illumina sequencing, and public time-course transcriptome data to systematically characterize pseudogenes.
Abstract: Pseudogenes are excellent markers of genome evolution and are emerging as crucial regulators of development and disease, especially cancer. However, the systematic functional characterization and evolution of pseudogenes remain largely unexplored. To systematically characterize pseudogenes, we date the origin of human and mouse pseudogenes across vertebrates and observe a burst of pseudogene gain in these two lineages. Based on a hybrid sequencing dataset combining full-length PacBio sequencing, sample-matched Illumina sequencing, and public time-course transcriptome data, we observe that many mammalian pseudogenes can be transcribed and contribute to the establishment of organ identity. Our analyses reveal that developmentally dynamic pseudogenes are evolutionarily conserved and show an increasing weight during development. They are also involved in complex transcriptional and post-transcriptional regulation and exhibit signatures of functional enrichment. Coding-potential evaluation suggests that 19% of human pseudogenes could be translated, offering a possible route to protein innovation. Moreover, pseudogenes carry disease-associated SNPs and contribute to perturbation of the cancer transcriptome. Our discovery reveals an unexpectedly high abundance of mammalian pseudogenes that can be transcribed and translated, and these pseudogenes represent a novel regulatory layer. Our study also prioritizes developmentally dynamic pseudogenes with signatures of functional enrichment and provides a hybrid sequencing dataset for further unraveling their biological mechanisms in organ development and carcinogenesis.
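The abstract does not say how pseudogene origins were dated. A common generic approach is phylostratigraphy-style dating: walk the species tree from the youngest clade to the oldest and assign each pseudogene to the oldest clade in which an orthologous copy is found. The Python sketch below assumes that approach, a hypothetical set of clades and representative species, and made-up presence data; it is not the authors' pipeline. Tallying origins per stratum is one way a burst of gain in a lineage could become visible.

```python
from collections import Counter

# Hypothetical strata, youngest first, for dating HUMAN pseudogenes.
# Species names are illustrative representatives, not the study's species set.
STRATA = [
    ("Homo sapiens", {"human"}),
    ("Primates", {"chimpanzee", "macaque"}),
    ("Eutheria", {"mouse", "dog", "cow"}),
    ("Mammalia", {"opossum", "platypus"}),
    ("Amniota", {"chicken", "lizard"}),
    ("Vertebrata", {"frog", "zebrafish"}),
]


def date_origin(species_with_copy: set[str]) -> str:
    """Return the oldest stratum containing any species that carries the orthologous copy.

    Assumes the copy arose once, on the branch leading to that stratum;
    absence in species from younger strata is treated as loss.
    """
    origin = None
    for stratum, members in STRATA:          # youngest -> oldest
        if members & species_with_copy:
            origin = stratum                 # keep overwriting: the last hit is the oldest
    if origin is None:
        raise ValueError("no species in STRATA carries this copy")
    return origin


# Made-up presence/absence calls for four hypothetical pseudogenes.
toy_presence = {
    "psiA": {"human"},
    "psiB": {"human", "chimpanzee"},
    "psiC": {"human", "chimpanzee", "macaque"},
    "psiD": {"human", "mouse", "chicken"},
}

origins = {name: date_origin(species) for name, species in toy_presence.items()}
per_stratum = Counter(origins.values())
print(origins)
print(per_stratum.most_common())
```

In this toy set two of the four copies date to Primates; on a real dataset, an excess of young origins relative to older strata is the kind of signal that would be read as a lineage-specific burst, keeping in mind that incomplete assemblies in outgroup species can make copies look younger than they are.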

Posted ContentDOI
23 May 2022-bioRxiv
TL;DR: Long-read sequencing is used to determine the full mutation spectrum in mutation accumulation (MA) lines derived from two strains of the green alga Chlamydomonas reinhardtii, demonstrating that diverse types of structural mutations (SMs) occur at substantial rates and supporting prominent roles for SMs and transposable elements (TEs) in evolution.
Abstract: Genetic variation originates from several types of spontaneous mutation, including single nucleotide substitutions, short insertions and deletions (INDELs), and larger structural changes. Structural mutations (SMs) drive genome evolution and are thought to play major roles in evolutionary adaptation, speciation and genetic disease, including cancers. Sequencing of mutation accumulation (MA) lines has provided estimates of rates and spectra of single nucleotide and INDEL mutations in many species, yet the rate of new SMs is largely unknown. Here, we use long-read sequencing to determine the full mutation spectrum in MA lines derived from two strains (CC-1952 and CC-2931) of the green alga Chlamydomonas reinhardtii. The SM rate is highly variable between strains and MA lines, and SMs represent a substantial proportion of all mutations in both strains (CC-1952 6%; CC-2931 12%). The SM spectrum also differs considerably between the two strains, with almost all inversions and translocations occurring in CC-2931 MA lines. This variation is associated with heterogeneity in the number and type of active transposable elements (TEs), which comprise major proportions of SMs in both strains (CC-1952 22% and CC-2931 38% of SMs). In CC-2931, a Crypton and a previously undescribed type of DNA element caused 71% of chromosomal rearrangements, while in CC-1952 a Dualen LINE was associated with 87% of duplications. Other SMs, notably many large duplications in CC-2931, were likely products of various double-strand break repair pathways. Our results demonstrate that diverse types of SMs occur at substantial rates and support prominent roles for SMs and TEs in evolution.
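In MA studies, a per-site rate is usually estimated by dividing the number of observed mutations by the total opportunity for mutation: the number of MA lines times generations per line times callable sites, i.e. $\mu = m / (n\,T\,L)$; for a haploid organism such as C. reinhardtii no ploidy factor is needed. A minimal Python sketch of that arithmetic, plus the per-genome rate and the share of all mutations that are SMs, follows. All inputs are hypothetical and are not the values from this study, which the abstract does not report.

```python
def per_site_rate(n_mutations: int, n_lines: int, generations: float, callable_sites: float) -> float:
    """Mutations per site per generation for haploid MA lines: m / (n * T * L)."""
    return n_mutations / (n_lines * generations * callable_sites)


def per_genome_rate(n_mutations: int, n_lines: int, generations: float) -> float:
    """Mutations per haploid genome per generation: m / (n * T)."""
    return n_mutations / (n_lines * generations)


def share(part: int, total: int) -> float:
    """Fraction of all mutations contributed by one class (e.g. SMs)."""
    if total == 0:
        raise ValueError("total must be positive")
    return part / total


# Hypothetical inputs (NOT the published Chlamydomonas values).
sm_count = 45          # structural mutations observed across all lines
all_count = 600        # all mutations (SNMs + INDELs + SMs) observed
lines = 10             # number of MA lines
gens = 1000            # generations per line
sites = 1.0e8          # callable sites per genome

print(f"SM rate per site per generation:   {per_site_rate(sm_count, lines, gens, sites):.2e}")
print(f"SM rate per genome per generation: {per_genome_rate(sm_count, lines, gens):.4f}")
print(f"SM share of all mutations:         {share(sm_count, all_count):.1%}")
```

Because SMs vary in size and are often called against only part of the genome, reporting them per genome per generation alongside the per-site figure can be easier to interpret; the choice of denominator is a design decision rather than a fixed rule.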

Journal ArticleDOI
TL;DR: The emerging roles of various types of noncoding RNAs in colorectal cancer (CRC) and their future implications for CRC management and research are discussed.

Journal ArticleDOI
TL;DR: In this paper, the authors review the machinery of antigenic variation and discuss whether trypanosome adaptations, or even trypanosome-specific machinery, might offer opportunities to impair this crucial parasite-survival process.