Following up

A short time ago, I mentioned this article.  This study was the product of a collaboration between five laboratories – two plant poly(A) labs, a seed biology lab, and two bioinformatics groups.  As the abstract indicated, this paper describes the results of a characterization of polyadenylation in plants using so-called Next Generation DNA sequencing technology; as such it is an addition to other recent studies, albeit the first (to my knowledge) that deals with plants.

I’m more than happy to answer questions about the paper in the comments.  What I will do in this essay is describe one of the more perplexing findings, and “amend” the PNAS paper with a few illustrations that we couldn’t include in the paper (even the online Supplemental Files – we maxed out the print and SI page limits).

First, the curious finding.  The main point of the paper was an accounting of all of the poly(A) sites “encoded” by the Arabidopsis genome (at least all of the sites seen in mRNAs present in the leaf and seed).  As Fig. 1 of the paper shows, a bit more than 10% of all “sense”* sites fell within annotated protein-coding regions.  (The study by Shepard et al. reported a similar finding.)  The existence of so many of these sites was quite unexpected.  This is because processing and polyadenylation within a protein-coding region will, except in rare cases, produce an RNA that lacks a translation termination codon.  These so-called non-stop mRNAs are expected to be relatively unstable, and polypeptides produced by translating such RNAs should be degraded; these activities are necessary to promote proper recycling of ribosomes, which otherwise would pile up along such mRNAs owing to the absence of translation termination codons.  I haven’t got a good explanation for these sites, other than that they exist (and they are not rare, judging from the numbers of tags that correspond to these sites).  This goes to show, though, that there is probably much to be learned about non-stop mRNAs in plants, and about the interplay between mRNA surveillance and RNA processing.
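For readers who want to try this kind of accounting on their own data, here is a minimal sketch of how one might count the share of poly(A) sites that land inside a coding region on the same strand.  This is not the pipeline used for the paper; the function name, the tuple layouts, and the toy coordinates are all made up for illustration.

```python
def fraction_in_cds(sites, cds_intervals):
    """Return the fraction of poly(A) sites that fall inside a same-strand CDS.

    sites:         list of (chrom, position, strand) tuples (hypothetical layout)
    cds_intervals: list of (chrom, start, end, strand) tuples, 1-based inclusive
    """
    if not sites:
        return 0.0

    def in_cds(site):
        chrom, pos, strand = site
        # A site counts only if chromosome and strand match and the
        # position lies within the interval.
        return any(
            c == chrom and s == strand and start <= pos <= end
            for c, start, end, s in cds_intervals
        )

    return sum(1 for site in sites if in_cds(site)) / len(sites)


# Toy annotation and toy sites (invented numbers, not real Arabidopsis data).
cds = [("Chr1", 100, 200, "+"), ("Chr1", 500, 650, "-")]
sites = [
    ("Chr1", 150, "+"),  # inside the first CDS, same strand
    ("Chr1", 150, "-"),  # same place, opposite strand: not counted
    ("Chr1", 600, "-"),  # inside the second CDS, same strand
    ("Chr1", 900, "+"),  # outside any CDS
]
cds_fraction = fraction_in_cds(sites, cds)
print(cds_fraction)
```

The linear scan over intervals is fine for a toy example; for a whole genome one would want an interval index instead.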

A related observation we made was that these coding sequence-associated poly(A) sites seemed to possess different polyadenylation signals.  This was determined by using a sort of genome-scale assay.  The approach has been described before (for example, here and here); basically, what one does is line up all of the sequences adjacent to poly(A) sites and tabulate the relative frequencies of the four bases on a position-by-position basis.  When this is done for “normal” plant poly(A) sites, one gets this (from our recent PNAS paper):

As I alluded to before, one can easily see the A-rich Near Upstream Element and U-rich Cleavage Element (that surrounds the actual poly(A) site at -1).  Here is what the coding sequence-associated sites look like:

What is striking about these sites is the relative paucity of U’s around the poly(A) site and the relative abundance of G’s.  Indeed, these sites are defined by their A+G richness.  These sites thus would seem to be a class of non-canonical poly(A) site, the first described for plant nuclear genes.  (This isn’t really surprising – in animals, non-canonical sites can be recognized by their deviation from the canonical AAUAAA hexanucleotide motif.  Plants have no highly conserved signal, so it hasn’t really been possible to use this as a basis for poly(A) signal classification.)
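The position-by-position tabulation described above is simple enough to sketch in a few lines.  What follows is only an illustration under some assumptions: the sequences are already cut into equal-length windows aligned on the poly(A) site, the function names are my own, and the four toy windows are invented rather than taken from the paper.

```python
from collections import Counter


def base_profile(windows):
    """Relative frequency of A, C, G, U at each position of aligned windows."""
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")

    profile = []
    for i in range(length):
        # Report in RNA terms: treat T in genomic sequence as U.
        counts = Counter(w[i].upper().replace("T", "U") for w in windows)
        total = sum(counts[b] for b in "ACGU")
        profile.append(
            {b: (counts[b] / total if total else 0.0) for b in "ACGU"}
        )
    return profile


def purine_fraction(profile):
    """A+G fraction at each position, a rough measure of A+G richness."""
    return [freqs["A"] + freqs["G"] for freqs in profile]


# Four invented windows, each four bases long.
windows = ["AATA", "AAGA", "TATA", "AAGT"]
profile = base_profile(windows)
ag = purine_fraction(profile)
for position, freqs in enumerate(profile):
    print(position, freqs)
```

Plotting the four frequencies against position (with the poly(A) site at -1) gives the kind of curve discussed above, and the A+G list makes it easy to compare classes of sites.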

Second, what is neat (and addicting, and eventually exhausting and annoying) is the genome-wide picture of polyadenylation that can be put together using computational tools.  We originally intended to provide several examples in the Supplemental Files, but the powers-that-be told us that each illustration would be considered a separate figure, and thus count against the 10-item limit.  So we had to remove them.  Here, for your viewing pleasure, is a small sampling to illustrate some of the interesting things that can be seen:

An overview, showing the distribution of tags superimposed on the genome annotation.

A close-up showing the typical occurrence of multiple poly(A) sites.

An example (rare) of a site that lies within an annotated 5'-UTR.

Some genes have it all - sites in introns, CDS, and 3'-UTRs!

In case anyone is interested, these illustrations were made using CLC Genomics Workbench (to map tags to the genome and make the .sam and .bam files), SAMtools (to do the indexing of the .bam files), and Integrative Genomics Viewer (to display the tags using the .bam files).  I’m far from a computer geek, but even I can manage to use these tools with little frustration.

Enjoy the paper, and, as I said above, feel free to ask questions in the comments.

* – so-called “sense” sites are those sites that are oriented in the same direction as an annotated Arabidopsis gene or feature.


3 Responses to Following up

  1. Clem says:

    Art:
    Have only just begun to look at this and already have a couple questions –

    1. In the abstract to the paper “CDS protein-coding…” what does CDS stand for?

    2. APA (especially antisense APA) reminds me of the “junk DNA” of 30 years ago. More and more levels of (and mechanisms for) gene regulation. Like the “cost” of carrying extra non-coding sequence, one might consider APA an additional biochemical cost if the transcripts generated have no biological value. Or is the value yet to be appreciated?

  2. Arthur Hunt says:

    Hi Clem,

    Regarding your questions:

    1. CDS stands for coding sequence.

    2. I think that there is no one explanation for APA. There are several examples where APA contributes to gene function, and probably others where it is hard to see any “use”.

    I am not much inclined to worry about the cost of extra or unnecessary transcripts. In the bigger scheme of things, DNA and RNA metabolism doesn’t use much of the cell’s energy budget, which means that there is probably a negligible cost for making some (or even lots) of RNA that is not translated (or indeed, is thrown away).

  3. [...] has become all the rage.  Lots of people are using variations on the themes I describe here and here to study alternative polyadenylation.  (I hope to be able to discuss additional plant studies in [...]
