Historically, why did comparative primate genomics focus on protein-coding sequences?
Comparative primate genomics historically focused on protein-coding sequences because accurate gene prediction and alignment of these sequences underpin inferences about evolutionary relationships. Protein-coding sequences are central to studying positive selection, gene evolution, and adaptive radiation across primate lineages. Multiple Alignment of Coding Sequences (MACSE) was developed to align protein-coding sequences even in the presence of frameshifts and stop codons, which facilitates downstream analyses of selection based on substitutions. A genome-wide analysis revealed an excess of coincident single nucleotide polymorphisms (coSNPs) in coding regions, a potential signature of primate protein evolution and of purifying selection acting on these sequences. To address errors in protein-coding sequence alignments, COATi was introduced as a codon-aware aligner that reduces the amount of data discarded and improves alignment accuracy.
Answers from top 5 papers
| Papers (5) | Insight |
|---|---|
| | Comparative primate genomics focused on protein-coding sequences historically due to their functional significance and evolutionary conservation, essential for phylogenetic inference and gene annotation. |
| 6 Citations | Comparative primate genomics focused on protein-coding sequences historically due to their roles in primate innovations and adaptations, including in the nervous, skeletal, and digestive systems, aiding in understanding primate evolution. |
| 7 Citations | Comparative primate genomics focused on protein-coding sequences historically due to their potential to reveal evolutionary signatures, impacts on diseases, and insights into purifying selection's role in shaping genetic variation. |
| 9 Citations | Comparative primate genomics focused on protein-coding sequences due to the codon structure challenge, addressed by MACSE for accurate multiple sequence alignments essential for evolutionary analyses. |
| | Comparative primate genomics focused on protein-coding sequences due to the complexity of exon-intron structures, leading to errors in gene prediction algorithms affecting downstream analyses. |