Alternative polyadenylation of mRNA precursors
Bin Tian¹ and James L. Manley²
¹Department of Microbiology, Biochemistry and Molecular Genetics, New Jersey Medical School, Rutgers University, Newark, New Jersey 07103, USA
²Department of Biological Sciences, Columbia University, New York, New York 10027, USA
Abstract
Alternative polyadenylation (APA) is an RNA-processing mechanism that generates distinct 3′
termini on mRNAs and other RNA polymerase II transcripts. It is widespread across all eukaryotic
species and is recognized as a major mechanism of gene regulation. APA exhibits tissue specificity
and is important for cell proliferation and differentiation. In this Review, we discuss the roles of
APA in diverse cellular processes, including mRNA metabolism, protein diversification and
protein localization, and more generally in gene regulation. We also discuss the molecular
mechanisms underlying APA, such as variation in the concentration of core processing factors and
RNA-binding proteins, as well as transcription-based regulation.
The transcriptome of eukaryotic cells is produced by three RNA polymerases, each with its own mechanisms for the maturation of the 3′ ends of nascent transcripts (reviewed in REF. 1). Protein-coding transcripts, or mRNAs, are transcribed by RNA polymerase II (Pol II). With the exception of the canonical, replication-dependent transcripts encoding histones in metazoans², the maturation of mRNA 3′ ends involves endonucleolytic cleavage of the nascent RNA followed by synthesis of a poly(A) tail on the 3′ terminus of the cleaved product by a poly(A) polymerase (PAP). These two coupled reactions, collectively referred to as cleavage and polyadenylation or, simply, polyadenylation, are intimately linked to transcription termination¹. Polyadenylation also occurs for some other Pol II products, especially long non-coding RNAs (lncRNAs; non-coding transcripts of ∼200 nt or longer).
The sequences in the mRNA precursor and the proteins required for polyadenylation are now well understood. The polyadenylation site, also known as the poly(A) site (PAS), is defined by surrounding RNA sequence elements (BOX 1), which are generally conserved across metazoans with some minor variations (BOX 1 and Supplementary information S1 (box)). However, major distinctions can be found in yeast and plant PASs³ (Supplementary information S1 (box)). Notably, the key protein factors responsible for polyadenylation are conserved throughout eukaryotes, although the machinery in mammals, which comprises more than 20 core proteins (BOX 1), has differences in protein composition and subcomplex organization compared with the machinery in yeast⁴⁻⁷.
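As a toy illustration of the two coupled reactions described above, the sketch below scans a pre-mRNA string for the canonical AAUAAA hexamer, cuts a fixed distance downstream of it and appends a poly(A) tail. It is a deliberate simplification under stated assumptions: real PASs are defined by several surrounding elements (BOX 1), the cleavage position varies from site to site, and the default offset (20 nt) and tail length used here are arbitrary illustrative values. The function name is hypothetical.

```python
from typing import Optional

# Canonical poly(A) signal hexamer. Real PASs also depend on other upstream
# and downstream elements, which this toy model ignores.
CANONICAL_HEXAMER = "AAUAAA"


def cleave_and_polyadenylate(
    pre_mrna: str, offset: int = 20, tail_length: int = 200
) -> Optional[str]:
    """Hypothetical toy model of cleavage and polyadenylation.

    Cuts `offset` nt downstream of the first AAUAAA and appends
    `tail_length` adenosines. Returns None if no hexamer is found or the
    cleavage position would lie beyond the end of the sequence.
    """
    hexamer_start = pre_mrna.find(CANONICAL_HEXAMER)
    if hexamer_start == -1:
        return None
    cleavage_pos = hexamer_start + len(CANONICAL_HEXAMER) + offset
    if cleavage_pos > len(pre_mrna):
        return None
    # Endonucleolytic cleavage: keep only the upstream fragment ...
    upstream_fragment = pre_mrna[:cleavage_pos]
    # ... to whose new 3' end the poly(A) polymerase adds adenosines.
    return upstream_fragment + "A" * tail_length


if __name__ == "__main__":
    pre = "GCUAGC" + "AAUAAA" + "U" * 25 + "GUGUGU"
    print(cleave_and_polyadenylate(pre, offset=20, tail_length=10))
```

With several hexamers in one transcript, applying the same logic at each of them would yield distinct 3′ ends, which is the essence of APA; the sketch only handles the first site.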
Competing interests statement: The authors declare no competing interests.
HHS Public Access
Author manuscript
Nat Rev Mol Cell Biol. Author manuscript; available in PMC 2017 June 26.
Published in final edited form as:
Nat Rev Mol Cell Biol. 2017 January; 18(1): 18–30. doi:10.1038/nrm.2016.116.
It was first reported more than three decades ago that a gene can have multiple PASs and that differential usage of these sites can lead to the formation of distinct mRNA isoforms, a phenomenon termed alternative polyadenylation (APA; early studies were reviewed in REFS 8,9). Early studies using expressed sequence tags¹⁰,¹¹ and more recent analyses using high-throughput sequencing have shown that APA is widespread in essentially all eukaryotes, from yeast to humans, and that it occurs most frequently in the 3′ untranslated region (3′ UTR) of mRNAs. For example, at least 70% of mammalian mRNA-encoding genes express APA isoforms¹²,¹³. Substantial, albeit slightly lower, APA frequencies have been reported in simpler species (Supplementary information S1 (box)). In this Review, we discuss our current understanding of APA from genomic as well as molecular and cellular perspectives, focusing mostly on the mechanisms and consequences of APA in metazoans. Readers are referred to other reviews for discussions of some early studies and of work in other species⁶,¹⁴⁻¹⁹.
APA in 3′ UTRs
Most APA sites are located in 3′ UTRs. In line with the nomenclature used for alternative splicing, here we refer to the 3′ UTR portion upstream of the first, or proximal, PAS as the constitutive UTR (cUTR) and the portion downstream as the alternative UTR (aUTR) (FIG. 1a). APA occurring in the 3′ UTR, referred to hereafter as 3′ UTR-APA, gives rise to mRNA isoforms with substantially different 3′ UTR lengths. For example, for mouse transcripts, the median 3′ UTR lengths of the shortest and longest APA isoforms differ by about sevenfold (249 nt and 1,773 nt, respectively)¹³. As 3′ UTRs contain cis elements that are involved in various aspects of mRNA metabolism, 3′ UTR-APA can considerably affect post-transcriptional gene regulation, including through the modulation of mRNA stability, translation, nuclear export and cellular localization, and even through effects on the localization of the encoded protein (FIG. 1b–d). One remarkable feature of 3′ UTR-APA is that it can be regulated globally, simultaneously involving numerous transcripts in a cell. This was first shown for different human tissues, which display biased preferences for certain APA isoform types (BOX 2), and was later demonstrated in studies of proliferation- and differentiation-associated changes in APA profiles (BOX 3).
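The cUTR/aUTR nomenclature can be made concrete with a small sketch that takes PAS positions, measured in nucleotides downstream of the stop codon (a coordinate convention assumed here), and reports each isoform's 3′ UTR length split into the shared cUTR and the isoform-specific aUTR. The class and function names are hypothetical. The final line only checks the arithmetic behind the "about sevenfold" figure quoted above (1,773 nt / 249 nt).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UtrIsoform:
    """One 3' UTR-APA isoform; lengths in nt measured from the stop codon."""
    utr_length: int   # full 3' UTR length produced by this PAS
    cutr_length: int  # constitutive part, shared by all isoforms
    autr_length: int  # part beyond the proximal PAS (0 for the shortest isoform)


def split_utr(pas_positions: List[int]) -> List[UtrIsoform]:
    """Hypothetical helper: one isoform per PAS, split into cUTR and aUTR."""
    if not pas_positions:
        raise ValueError("at least one PAS is required")
    ordered = sorted(pas_positions)
    proximal = ordered[0]  # the proximal PAS sets the cUTR/aUTR boundary
    return [UtrIsoform(p, proximal, p - proximal) for p in ordered]


def fold_difference(short_length: float, long_length: float) -> float:
    """Ratio of a long to a short 3' UTR length."""
    return long_length / short_length


if __name__ == "__main__":
    for iso in split_utr([1773, 249]):
        print(iso)
    print(f"{fold_difference(249, 1773):.1f}-fold")
```

Feeding the two medians into `split_utr` is purely illustrative: a ratio of medians taken across many genes is not the same as a per-gene fold change.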
mRNA stability and translation
Perhaps the best-studied consequence of 3′ UTR-APA is its effect on microRNA (miRNA) function. miRNAs are small RNAs (∼22 nt) that modulate the stability and/or translation of complementary target mRNAs²⁰. miRNA target sites are generally located in 3′ UTRs, and in mammals more than half of the conserved miRNA target sites are located in aUTRs²¹,²².
Differential targeting of 3′ UTR-APA isoforms was first demonstrated in activated T cells and cancer cells, both of which display global 3′ UTR shortening compared with non-activated T cells and non-transformed cells, respectively²¹,²³. A recent study showed that APA isoform expression influences about 10% of miRNA targeting between any two cell types analysed and, importantly, that the accuracy of target prediction can be improved if the cellular APA profile is considered²⁴. Targeting by miRNAs is often influenced by target site location in the mRNA and by the surrounding sequences²⁰. For example, target sites located near either end of a 3′ UTR tend to be more efficient than sites in the middle. Consistent with this, target sites for certain pro-proliferation miRNAs are enriched in the region immediately upstream of the proximal PASs of pro-differentiation or anti-proliferation mRNAs; the shortening of 3′ UTRs during cell proliferation places these sites near the 3′ end, which improves the targeting context for these miRNAs and can thus enhance their targeting efficiency and their promotion of cell proliferation²⁵.
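The point about site location can be sketched by computing, for a single target site, which APA isoforms retain it and how far it lies from each isoform's 3′ end. The coordinate convention (nt downstream of the stop codon), the 8-nt site length and the function name are assumptions for illustration. The code does not score targeting efficiency; it only shows how a site just upstream of a proximal PAS moves close to the 3′ end in the short isoform, and how a site in the aUTR is lost from it altogether.

```python
from typing import Dict, List

SITE_LENGTH = 8  # assumed length of a miRNA target site (e.g. an 8-mer), in nt


def site_context(site_start: int, pas_positions: List[int],
                 site_length: int = SITE_LENGTH) -> Dict[int, int]:
    """Hypothetical helper.

    For each isoform (keyed by its 3' UTR length, i.e. its PAS position)
    that fully contains the site, return the distance in nt from the site's
    3' edge to the isoform's 3' end. Isoforms ending before the site's 3'
    edge lack the site and are omitted.
    """
    site_end = site_start + site_length
    return {pas: pas - site_end
            for pas in sorted(pas_positions) if site_end <= pas}


if __name__ == "__main__":
    # Site just upstream of a proximal PAS at 300 nt; distal PAS at 2,000 nt.
    print(site_context(250, [300, 2000]))
    # Site in the aUTR: present only in the long isoform.
    print(site_context(1000, [300, 2000]))
```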
3′ UTRs are also hotbeds for mRNA-destabilizing elements, which often function through RNA-binding proteins (RBPs). Well-characterized motifs include AU-rich elements (AREs), GU-rich elements (GREs) and PUF protein-binding elements²⁶. As with miRNA target sites, inclusion or exclusion of these elements by 3′ UTR-APA can affect mRNA stability. For example, a genetic polymorphism leading to differential expression of two APA isoforms of human IFN-regulatory factor 5 (IRF5) is linked to the risk of developing systemic lupus erythematosus²⁷ (FIG. 1b). Because of the presence of an ARE in the aUTR, the two isoforms have different decay rates²⁷. In addition, RNA–RNA interactions, such as base pairing between 3′ UTR-encoded Alu elements (which are the most abundant transposable elements in the human genome) and lncRNAs, can trigger STAU1-mediated mRNA decay²⁸. Moreover, a long 3′ UTR is itself considered to be a feature that promotes mRNA degradation through nonsense-mediated mRNA decay²⁹. It is therefore generally believed that, owing to their tendency to harbour destabilizing elements and their sheer size, isoforms with long 3′ UTRs are less stable than short isoforms. However, this view has been challenged by a genome-wide study of the role of APA in mRNA decay in mouse cells. In that study, in which mRNA stability was measured after blocking transcription with actinomycin D (ActD), long isoforms were found to be only slightly less stable than short isoforms³⁰. Possible ActD-related artefacts notwithstanding, this suggests that the fate of 3′ UTR-APA isoforms is more complex than was previously thought. For example, additional sequences, such as stabilizing elements in aUTRs, can also substantially affect mRNA decay³⁰⁻³⁴.
Although our understanding is therefore far from complete, it is nonetheless now clear that
many genes produce multiple mRNA isoforms with different decay rates, highlighting the
importance of 3′ UTR-APA in modulating mRNA stability.
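Decay rates from transcription-shutoff experiments such as the ActD study are usually summarized as half-lives. Assuming first-order (exponential) decay once transcription is blocked, the sketch below fits a straight line to log-transformed abundances by least squares and converts the decay constant k into t½ = ln 2 / k. The function name is hypothetical, and the time points and abundances in the example are invented toy values, not data from the studies cited.

```python
import math
from typing import Sequence


def half_life(times: Sequence[float], levels: Sequence[float]) -> float:
    """Hypothetical helper: fit ln(level) = a - k*t and return ln(2)/k.

    Assumes first-order decay after transcription shutoff; all levels
    must be positive.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(x) for x in levels]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    var = sum((t - t_mean) ** 2 for t in times)
    if var == 0:
        raise ValueError("time points must not all be identical")
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    k = -cov / var  # least-squares slope of ln(level) versus time is -k
    if k <= 0:
        raise ValueError("no decay detected")
    return math.log(2) / k


if __name__ == "__main__":
    hours = [0.0, 1.0, 2.0, 4.0]
    # Toy isoforms: short decays with t1/2 = 2 h, long with t1/2 = 1.5 h.
    short_iso = [2 ** (-t / 2.0) for t in hours]
    long_iso = [2 ** (-t / 1.5) for t in hours]
    print(half_life(hours, short_iso), half_life(hours, long_iso))
```

Comparing the two fitted half-lives per gene is one simple way to express an isoform stability difference; it inherits any bias introduced by the shutoff method itself.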
A related question is whether 3′ UTR-APA affects mRNA translation. The above-mentioned study of APA in mouse cells reported that long isoforms were associated with slightly more ribosomes than were short isoforms³⁰. As with the modest destabilizing effect of long 3′ UTRs, this may reflect the combined action of translation-enhancing and translation-suppressing elements in aUTRs. However, another study using human cells reported a role for 3′ UTR length in suppressing translation and also detailed the variable effects of different 3′ UTR sequences on translation³⁵. Hence, further work is required to delineate how various cis elements and 3′ UTR size per se affect the stability and translation of APA isoforms in different cell types and under different conditions, such as cell stress and differentiation.
mRNA nuclear export and localization
Isoforms with a long 3′ UTR tend to be more abundant in the nucleus than in the cytoplasm³⁶,³⁷. This was observed initially in a global analysis of all transcribed sequences in human cells³⁷, and a more recent study found that ∼10% of all detected 3′ UTR-APA isoforms differed significantly in abundance between nuclear and cytoplasmic fractions³⁶.
Although nuclear retention has been reported for long isoforms containing certain cis elements in the aUTR, such as inverted Alu repeats³⁸, it is still uncertain how much of the differential localization of long isoforms is due to differences in mRNA stability rather than differences in nuclear export. In addition, if regulation of nuclear export is involved, exactly how cis elements in aUTRs and 3′ UTR size per se affect export, and what the functional significance of this regulation might be, remain unclear.
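Differences in isoform abundance between two fractions, whether nucleus versus cytoplasm or neurites versus cell body, can be summarized as a log2 ratio of long-to-short isoform abundance in one compartment relative to the other. This is one common way to express such a comparison and is not necessarily the statistic used in the cited studies; the function name and the counts in the example are hypothetical, and a real analysis would also need replicates and a significance test. As noted above, a positive value on its own cannot distinguish nuclear retention from differential stability.

```python
import math


def log2_compartment_enrichment(long_a: float, short_a: float,
                                long_b: float, short_b: float) -> float:
    """Hypothetical helper: log2 of (long/short in A) / (long/short in B).

    Positive values mean the long isoform is relatively enriched in
    compartment A (e.g. A = nucleus, B = cytoplasm). All abundances must
    be positive.
    """
    if min(long_a, short_a, long_b, short_b) <= 0:
        raise ValueError("all abundances must be positive")
    return math.log2((long_a / short_a) / (long_b / short_b))


if __name__ == "__main__":
    # Toy counts: nucleus 100 long / 100 short; cytoplasm 50 long / 200 short.
    print(log2_compartment_enrichment(100, 100, 50, 200))
```

Using a ratio of ratios, rather than raw long-isoform counts, cancels out differences in total RNA yield between the two fractions.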
A better-understood role of aUTRs in mRNA localization is the control of subcellular localization within the cytoplasm. Such regulated mRNA localization can in turn facilitate localized translation, which is an efficient way to enrich proteins at a specific cellular location³⁹. The relevance of APA for mRNA localization has been demonstrated for several transcripts in neuronal cells, in which localized translation in dendrites and axons is common. For example, a short isoform of the mRNA encoding brain-derived neurotrophic factor (BDNF) is restricted to the cell body, whereas the long isoform localizes to the dendrites, where it is translated⁴⁰ (FIG. 1c). Similarly, long and short isoforms of mRNAs encoding inositol monophosphatase 1 (REF. 41) and RAN⁴² are localized to the axon and cell body, respectively. These reports suggest that long isoforms are more likely than short isoforms to be located in dendrites or axons. By contrast, a recent study that compared mRNA localization in neurites (dendrites and axons) versus the cell body, in neuronal cell lines and in primary cortical neurons, found that short and long isoforms are similarly distributed between neurites and the cell body⁴³. Future investigations are required to delineate the underlying mechanisms and to address whether, as for mRNA stability, cis elements can function both in enhancing and in suppressing the subcellular localization of mRNAs.
Protein localization
Sequences in 3′ UTRs have been implicated in mRNA localization to the ER to facilitate the expression of membrane proteins⁴⁴,⁴⁵. A surprising recent study showed that the 3′ UTR can also regulate protein localization independently of mRNA localization⁴⁶ (FIG. 1d). Specifically, the aUTR of the mRNA encoding the transmembrane protein CD47 was found to act as a scaffold for a protein complex containing the RBP Hu antigen R (HUR; also known as ELAVL1) and the phosphatase 2A inhibitor SET; this complex is thereby recruited to the site of translation, resulting in the interaction of SET with the newly translated cytoplasmic domains of CD47 and the subsequent translocation of CD47 to the plasma membrane. The short mRNA isoform, which lacks the sequences necessary for assembly of the HUR–SET complex, gives rise to CD47 that is primarily localized at the ER. Thus, CD47 has a different localization, and hence a different function, depending on whether it is translated from the short or the long mRNA isoform. This mechanism has also been observed for transcripts encoding several other proteins, including CD44, α1 integrin (ITGA1) and TNF receptor superfamily member 13C (TNFRSF13C)⁴⁶.
APA upstream of the last exon
A sizable fraction of APA sites are located upstream of the last exon, mostly in introns. For simplicity, we refer to this as upstream region APA (UR-APA). In the mouse genome, for example, more than 40% of genes have PASs of this type¹³. UR-APA leads to the expression of alternative terminal exons and can result in changes to both the coding sequence and the 3′ UTR of an mRNA. Depending on how splicing is configured relative to the PAS, the resulting alternative terminal exons can be divided into two subtypes (FIG. 2a): skipped terminal exons, which are alternative upstream exons selected through splicing to be the terminal exons, and composite terminal exons, which are formed by the extension of an internal exon into the adjacent intron through inhibition of the 5′ splice site. In addition, a small fraction of PASs can be identified in internal exons, leading to transcripts without an in-frame stop codon, which are likely to be degraded rapidly through the non-stop decay pathway⁴⁷. However, in some rare cases, truncated proteins can be produced when adenosine residues from the poly(A) tail are used to form a stop codon⁴⁸. UR-APA is generally upregulated in proliferating cells and suppressed during cell differentiation¹³,⁴³,⁴⁹, mirroring the use of proximal PASs in 3′ UTRs and suggesting that UR-APA and 3′ UTR-APA are mechanistically related under these conditions. Like 3′ UTR-APA, UR-APA can affect gene expression in various ways, as discussed below.
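The location-based distinctions in the paragraph above can be summarized with two small helpers: one places a PAS relative to a simplified gene model (last exon, other exon or intron), and the other checks whether, for a PAS inside a coding exon, the adenosines of the poly(A) tail would complete an in-frame stop codon (a trailing U or UA becomes UAA, and a trailing UG becomes UGA). The gene model, coordinate convention and function names are hypothetical; for simplicity every exon other than the last is treated as internal, and the sketch ignores the splicing decisions that determine whether an intronic PAS yields a skipped or a composite terminal exon.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive, ordered 5' to 3'
STOP_CODONS = {"UAA", "UAG", "UGA"}


def classify_pas(pas: int, exons: List[Exon]) -> str:
    """Hypothetical helper: locate a PAS within a simplified gene model."""
    if not exons:
        raise ValueError("gene model needs at least one exon")
    last_start, last_end = exons[-1]
    if last_start <= pas <= last_end:
        return "last exon"  # candidate 3' UTR-APA if it lies in the 3' UTR
    for start, end in exons[:-1]:
        if start <= pas <= end:
            return "internal exon"  # no in-frame stop unless completed by poly(A)
    if exons[0][0] < pas < last_start:
        return "intron"  # UR-APA via an alternative terminal exon
    return "outside gene model"


def poly_a_completes_stop(orf: str) -> bool:
    """True if appending adenosines to an in-frame ORF fragment (read from
    its start codon, assumed free of internal stops) creates a stop codon."""
    remainder = len(orf) % 3
    if remainder == 0:
        return False  # next codon would be AAA (Lys): translation enters the tail
    codon = orf[-remainder:] + "A" * (3 - remainder)
    return codon in STOP_CODONS


if __name__ == "__main__":
    gene = [(0, 99), (200, 299), (400, 999)]
    print([classify_pas(p, gene) for p in (500, 250, 150, 1200)])
    print([poly_a_completes_stop(s) for s in ("AUGGCUU", "AUGGCUUG", "AUGGCUC")])
```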
Protein diversification
Two classic APA events reported in the early 1980s, involving transcripts from the calcitonin-related polypeptide-α gene (CALCA) and the gene encoding the immunoglobulin M (IgM) heavy chain, are well-known examples of UR-APA. In the case of CALCA, alternative splicing and the use of a proximal PAS generate a transcript containing a skipped terminal exon, and this mRNA isoform encodes the protein calcitonin, whereas the use of a distal PAS in the 3′-most exon generates an mRNA encoding calcitonin gene-related peptide 1 (CGRP)⁵⁰. The regulation of APA is tissue specific in this case: the calcitonin-encoding isoform is the more highly expressed of the two in the thyroid, whereas the CGRP-encoding isoform predominates in the hypothalamus. In the case of IgM heavy chain mRNA, PAS usage switches during B cell activation from a distal PAS in the 3′-most exon to a proximal PAS in a composite terminal exon, which shifts protein production from a membrane-bound form of the antibody to a secreted form⁵¹. Notably, bioinformatic analysis has identified at least 376 mouse genes that potentially use such a mechanism to regulate membrane anchoring⁵². Manipulation of UR-APA-based protein isoform switching has also been shown to be a promising therapeutic approach. For example, the addition of an antisense RNA that attenuates splicing activates an intronic PAS in the mRNA encoding vascular endothelial growth factor receptor 2 (VEGFR2) and thus enforces the expression of a soluble version of VEGFR2, which functions antagonistically to the membrane-bound form and inhibits angiogenesis⁵³.
In addition to generating proteins with distinct functions, UR-APA can lead to the expression of truncated proteins with dominant-negative functions. For example, retinoblastoma-binding protein 6 (RBBP6) is a recently characterized polyadenylation factor