This is a question I think about a lot, and one I spent some time on in a recent minireview. The answer is, in a nutshell, very.
One of the things I had to do for this review was try to make sense of the different approaches that have been described recently for studying alternative poly(A) site choice in plants. One of these – the use of high-throughput sequencing to sequence cDNA tags that query the exact mRNA-poly(A) junction – has been discussed previously, in a general sense and in terms of a study of poly(A) site choice in plants. In the latter study, it was determined that about 70% of plant genes possess at least two poly(A) sites.
My collaborator, Quinn Li, along with several other colleagues, took a somewhat different approach to this question. Quinn was working with Blake Meyers from the University of Delaware; Blake was a pioneer in the development of high-throughput sequencing tools and has generated a lot of so-called MPSS data over the years. The cDNA tags in this approach are made by digesting cDNAs with a restriction enzyme and then sequencing outward from the cut site. In other words, the sequence “marks” the restriction site that is closest to the actual poly(A) site. This may be nearby, or not, depending on the gene. This is illustrated in the following:
Their analysis of these tags showed a widespread distribution of alternative poly(A) site choice in Arabidopsis and rice. They also showed that many sites in these plants were located within introns and protein-coding regions, much as was reported in the previous high-throughput study.
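As a rough illustration of the tag logic described above, here is a minimal Python sketch. It is not the pipeline Quinn and Blake used. It assumes the recognition site is GATC (the DpnII site classically used for MPSS; the enzyme is not named above), an illustrative tag length of 17 bases, and a cDNA sequence written 5' to 3' that ends at the poly(A) site with the tail itself removed. The function names are made up for this example.

```python
def three_prime_tag(cdna, site="GATC", tag_len=17):
    """Return (position, tag) for the restriction site nearest the 3' end.

    `cdna` is assumed to end at the poly(A) site (tail removed).
    Returns None if the transcript has no such site and so yields no tag.
    """
    pos = cdna.rfind(site)          # 3'-most occurrence of the site
    if pos == -1:
        return None
    # The tag starts at the site and reads toward the poly(A) end;
    # it is truncated if the site lies closer than tag_len to the end.
    return pos, cdna[pos:pos + tag_len]


def distance_to_polya(cdna, site="GATC"):
    """Distance in bases from the 3'-most site to the poly(A) site."""
    pos = cdna.rfind(site)
    return None if pos == -1 else len(cdna) - pos


# Two hypothetical isoforms of one gene that differ only in poly(A) site.
proximal = "ATGGCC" + "GATC" + "A" * 20
distal = proximal + "TTGATCTTTTT"   # the 3' extension carries a new site

for name, seq in (("proximal", proximal), ("distal", distal)):
    print(name, three_prime_tag(seq), distance_to_polya(seq))
```

In this toy case the two isoforms give different tags only because the extension happens to contain another GATC. Had it not, both poly(A) sites would map to the same tag, which is why the tag reports the nearest restriction site rather than the poly(A) junction itself.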
These high-throughput approaches confirm what has been suggested (or asserted) for a long time (at least as far back as 1986), namely that plant genes have a multiplicity of poly(A) sites. The gene-by-gene instances are too numerous to list here; besides the well-known example of alternative polyadenylation of transcripts encoded by the flowering regulatory gene FCA, one can list less well-known examples such as alternative polyadenylation of mRNAs that encode a polyadenylation factor subunit or a bifunctional gene necessary for lysine catabolism. Quinn and Blake present data that suggest that alternative polyadenylation can be affected by developmental cues, and Iida et al. saw similar indications for stress-responsive alternative polyadenylation. With the new technology available, it is feasible to start to look at these possibilities on a large scale. Which is, in a nutshell, the point with which my review ends.
[So, can we assign a number to “very”? I’m coy about stating one here since the different technologies give different values. The recent high-throughput sequencing studies indicate that “>70%” is a safe value. This is probably an underestimate, but homing in on a more precise number becomes difficult: deeper sequencing finds more sites, but it also raises the likelihood that artifacts and sequencing errors will be scored as authentic poly(A) sites.]
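The trade-off in that last point can be seen in a toy calculation. The sketch below is only an illustration: the gene names, positions, and read counts are invented, and the read-support cutoff `min_reads` is a hypothetical filter, not one taken from any of the studies mentioned above.

```python
def fraction_multi_site(site_counts, min_reads=1):
    """Fraction of genes with at least two supported poly(A) sites.

    site_counts: dict mapping gene -> {site position: read count}.
    A site counts as supported if it has at least `min_reads` reads.
    """
    if not site_counts:
        return 0.0
    multi = 0
    for sites in site_counts.values():
        supported = [p for p, n in sites.items() if n >= min_reads]
        if len(supported) >= 2:
            multi += 1
    return multi / len(site_counts)


# Invented data: four genes, some with singleton (possibly spurious) sites.
toy = {
    "g1": {100: 50, 180: 1},
    "g2": {200: 30, 260: 12},
    "g3": {300: 40},
    "g4": {90: 8, 95: 1, 400: 7},
}

for cutoff in (1, 5):
    print(cutoff, fraction_multi_site(toy, min_reads=cutoff))
```

Accepting every site with a single read counts more genes as having multiple sites, but some of those singletons may be errors. A stricter cutoff removes them at the cost of also discarding genuine rare sites, so the estimate moves with the cutoff.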