1
in the yeast genome reveals an RN A thermosensor that
mediates alternative splicing
Markus Meyer1,4, Mireya Plass2,&, Jorge Pérez-Valle4,&, Eduardo Eyras2,3, and Josep
Vilardell1,3,4,*
Running title: An RNA thermosensor controls APE2 3'ss selection.
1- Centre de Regulació Genòmica, Dr. Aiguader 88, 08003 Barcelona, Spain
2- Computational Genomics Group, Universitat Pompeu Fabra, Dr. Aiguader 88,
08003 Barcelona, Spain
3- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain (ICREA).
4- (present address) Molecular Biology Institute of Barcelona (IBMB), Baldiri
Reixach 10-12, 08028 Barcelona, Spain. Tel: +34.93.4020549, fax: +34.93.4034979.
&- Both authors contributed equally to this work
* corresponding author
e-mail: [email protected]
2
Highlights
Yeast introns have a diverse range of branch-‐
RNA structure in the BS-‐ The APE2 choice
SUMMARY
Poor understanding of the spliceosomal mechanisms to select intronic 3' ends (3'ss) is
a major obstacle to deciphering eukaryotic genomes. Here we discern the rules for
intronic branch site (BS), and that
mediated by pre-mRNA structures. Our results reveal that one of these RNA folds
acts as an RNA thermosensor, modulating alternative splicing in response to heat
lity. Thus, our data point to a deeper role
for the pre-mRNA in the control of its own fate, and to a simple mechanism for some
alternative splicing.
INTRODUCTION
Precise identification of intron boundaries (splice sites [ss]) in eukaryotic pre-mRNAs
is critical. Errors result in incoherent and possibly deleterious transcripts and are
conductive to diseases (Cooper et al., 2009)
interactions with U1 snRNP (Wahl et al., 2009). In contrast, the identification of the
3
of its multiple occurrence and the lack of a clear selector comparable to U1. Several
intron branch site (BS) or from a downstream pyrimidine-rich track (Patterson and
Guthrie, 1991; Smith et al., 1993) (Goguel and Rosbash,
1993), or with the downstream exon (Crotti and Horowitz, 2009); distance from the
BS (Cellini et al., 1986; Luukkonen and Seraphin, 1997); and interactions with
splicing factors (Smith et al., 2008; Umen and Guthrie, 1995). But despite these
required to fully decode eukaryotic genomes.
The relative simplicity of the Saccharomyces cerevisiae genome, with 5% genes
containing an intron (Spingola et al., 1999) and readily identifiable BS, provides a
splicing is underrepresented (Yassour et al., 2009), indicating a predominantly
additional 3'ss choices (sequence HAG, H : [UCA]) between the BS and the annotated
Here we undertook bioinformatics and molecular approaches to document a set of
model genome. Remarkably, a
balance between pre-mRNA folding and spliceosome flexibility is required,
suggesting important evolutionary implications and alternative ways of splicing
regulation. We verified this possibility by showing that in one intron the pre-mRNA
structure acts modulating splicing in response to temperature changes.
RESULTS AND DISCUSSION
4
Intronic features downstream from the branch site
Yeast introns show a variable distance between the BS adenosine (BS-A) and the
5 nucleotides (Figure S1). This is surprising as previous reports
indicate that such a large distance drastically hinders splicing efficiency (Cellini et al.,
1986), but this must be overcome as most yeast transcripts are efficiently spliced
(Clark et al., 2002). Interestingly, the number of potentially cryptic 3'ss between the
BS and the annotated 3'ss is significantly lower than expected by chance ( 2 p = 1.71
x 10-17). This suggests a selective pressure against HAGs upstream of the annotated
reduces VMA10 splicing (Figure S1).
have a negative impact. Interestingly, this mutation destabilizes a putative secondary
structure that would occlude this HAG, suggesting the possibility that RNA folding
would have a dual role by preventing spliceosomal targeting of occluded HAGs and
promoting the use of one located downstream. To test the generality of this hypothesis
we used RNAfold (Hofacker, 2009) to predict an optimal secondary structure (i.e. the
most likely one) between the BS and the annotated 3'ss using a dataset of 282 S.
cerevisiae introns (SGD 2009) (see Methods and Supplementary Material). The
results of these predictions, summarized in Figures 1 and S1, are available at
http://regulatorygenomics.upf.edu/Yeast_Introns. Their analysis reveals several
First, 40% (113) of S. cerevisiae introns can potentially form a structure in the BS-
-
nucleotides [nt] (Figure 1A, black line). Second, the effective BS-3'ss distance
distribution of those introns, calculated by subtracting the number of nt contained in
5
the structure (Figure 1B), is not significantly distinct from that of introns without
structures (Figure 1A, light grey line). Notably, the effective BS-
introns is predicted to be never larger than ~45 nt. This may represent the maximum
distance for efficient splicing in S. cerevisiae, or indeed in yeasts, since a similar
result is also observed for other species (Figure S1).
It has been shown that suboptimal RNA structure predictions can be relevant as well
to determine the biological effects of RNA folds (Rogic et al., 2008). Thus, to assess
the consistency of our results, we evaluated 1000 suboptimal structure predictions for
each BS-
they introduce. The results by this approach (Figure S1) are consistent with our
previous analysis, supporting our conclusions.
In addition, to test whether the properties of the effective distance are specific to real
introns, we examined sets of randomized intron sequences (Supplemental Material)
with the same BS-
observed that in randomized sequences RNA structures also shorten the BS-
effective distance to values similar to those found in sequences without a predicted
secondary structure. However, the maximum effective distance observed is
significantly longer (~60 nt) than in real introns (Figure S1).
(accessibility) in the BS- the downstream exon
(exonic HAG). We found that accessibility values are significantly higher for real
accessible than intronic HAGs, which is not expected in randomized sequences as
longer segments are more likely to fold, increasing the probability of occluding
6
HAGs. These results, replicated in other yeast species analyzed (Figure S1), are
ted by RN A folding
mutations expected to disrupt the BS-
restoring the original structure by additional mutations would re-establ
selection. This strategy was tested in the RPS23B
an RNA structure, and that disruption of this structure should activate it. Thus, the
RPS23B intron was inserted into a CUP1 reporter (Lesser and Guthrie, 1993) and
splicing was monitored by primer extension analyses (Figure 2). The results show that
in the wt RPS23B only the annotated splice site is used (RPS23B-1; Figure 2B, lane
1). This pattern switches when the structure is disrupted and the alternative AG
becomes the sole target of the spliceosome (RPS23B-2; Figure 2B, lane 2).
Importantly, the wt pattern is restored when the structure is reinstated by
complementary mutations (RPS23B-3; Figure 2B, lane 3). In this case the splicing
pattern is undistinguishable from that of the wt, in agreement with our predictions.
The loss of splicing to the annotated 3'ss when the structure is disrupted could be
attributed to a preference by the spliceosome for the first AG from the BS (Smith et
al., 1993), or to an excessive (>45 nt) distance of the second AG from the BS, as
suggested by our bioinformatics analyses. To distinguish between these two
RPS23B-2. This mutation
blocks the second step of splicing as shown by the lack of accumulation of pre-
mRNA and of mRNA, and by the build-up of first step lariat intermediate in a dbr1
7
strain (RPS23B-4; Figure 2B, lane 6) (dbr1 cells lack debranching enzyme, required
for the proper turn-over of first step lariat intermediates). These observations are
consistent with the possibility that the annota
spliceosome without a structure that brings it closer to the BS. Thus, we conclude that
the predicted RPS23B structure forms in vivo and allows for the proper selection of
VMA10 intron (Figure S2). Hence we used
closer to the BS. We found that distances greater than ~45 nt diminish splicing
efficiency (Figure 2C-D), in agreement with our in silico findings.
While the existence of an optimal range for 3'ss selection has been previously shown
((Chua and Reed, 2001) and references therein), the suggested scope (20-30 nt from
the BS) does not explain many naturally occurring introns. Our conclusions are
consistent with an alternate spliceosomal range for the second step of catalysis and
help to explain S. cerevisiae global 3'ss usage. We envision two compatible
explanations for the spliceosomal range; it could reflect a balance between a search
for a splice site and the activation of a proofreading mechanism, or it could represent
the physical distance that can be reached by the spliceosome active center. While
further research is necessary to evaluate these two models, we speculate that a pure
kinetic model could allow for changes in scope, for example by stabilizing the
conformation of the spliceosome required for the second step (Konarska et al., 2006).
According to our predictions, 81 out of the 134 HAGs located between the BS and the
(http://regulatorygenomics.upf.edu/Yeast_Introns). Out of the remaining 53 HAGs, 32
8
are located at a distance of 9 nt or less from the BS, whereas the shortest natural
occurrence is of 10 nt. As for the rest (21), they are all AAGs. Therefore, we made
two additional predictions. First, that AGs located closer than 10 nt to the BS cannot
be used by the spliceosome, as was indicated for the ACT1 intron (Patterson and
Guthrie, 1991); second, that the yeast spliceosome selects C/UAG over AAG, as
reported in other systems (Smith et al., 1993). These predictions were tested in vivo
with constructs based on the DMC1 intron with an AAG and a UAG at 16 and 28 nt
from the BS, respectively. In the wt construct the spliceosome mostly selects the
downstream 3'ss (DMC1-1; Figure 3B, lane 1). This preference is not due to the
are equally used when they are both UAG (Figure 3B, lanes 2-3). However, to be
selected an AG must be at more than 9 nt from the BS (Figure 3B, lane 4). To
determine whether the presence of an RNA fold between two available HAGs
interferes with their selection we introduced the RPS23B stem into DMC1-2, with two
active UAG. Remarkably, there is no change in the ratio of usage of either 3'ss
(DMC1-5; Figure 3B, lane 5; and Figure S3), indicating that the spliceosome is able
to probe for potential 3'ss upstream as well as downstream from the stem (but not
inside). Consistent with our model, disrupting the stem (construct DMC1-6) renders
the downstream AG inaccessible, now too far from the BS (Figure 3B, lane 6). Taken
together these results enable us to describe how the spliceosome selects a 3'ss. To be
linearly or with the help of an RNA fold. Moreover, if there are two splice sites within
this window, but outside a structure, both will be equally used, unless one is stronger
than the other (PyAG > AAG).
9
The spliceosome will open a stem located too close to the branch site
We next asked whether a structure can occur anywhere between the BS and the
decreased the distance between the BS and the stem loop of
the VMA10 intron, from 20 to 5 nt. Our data show that a stem is not fully formed
when predicted to be close to the BS. In fact, positions closer than 10 nt from the BS
will remain unpaired. Thus, an HAG that would otherwise be occluded becomes
available if a stem in moved closer to the BS (Figure 3D, lane 3). Moreover, moving
the stem even closer can in fact put out of spliceosomal reach a downstream HAG
previously available, since the lack of folding leads to an increased effective distance
to the BS (Figure 3D, lane 4). This, taken into account into our in silico predictions,
indicates that while the spliceosome shows a remarkable tolerance to a diversity of
RNA folds, a region of ~9 nt downstream the BS will not harbor any structure, and
likewise AGs located here will be ignored. Thus it appears that the spliceosome needs
to have access to this region before catalyzing exon ligation. This is consistent with
both substrate requirements for in vitro spliceosome assembly (Fabrizio et al., 2009),
and second-step blocking by a strong hairpin downstream of the BS (Liu et al., 1997).
and we show that the spliceosome use
Consistent with this, scanning does not explain splicing patterns of some human
introns (Chen et al., 2000). Interestingly, the minor U12-dependent spliceosome
removes introns with a small BS-3'ss region, similar to that of yeast (Will and
Luhrmann, 2005).
Our data indicate that RNA folds are more than a particularity of some transcripts.
They are critical for splicing and widespread in several fungi species (this work and
10
(Deshler and Rossi, 1991)
conserved in homologous genes (http://regulatorygenomics.upf.edu/Yeast_Introns),
arguing for a selective pressure to keep them. Notably, our defined set of rules for
re S3) explains the splice site choice of all but one
(REC102, Figure S3) yeast introns.
Comparison of sequence elements inside and outside secondary structures shows an
enrichment of U-rich motifs outside structures, which reflects the higher propensity of
sequences containing purines to be in a fold (Tables S2 and S3). Thus, if a factor were
to be required for the function of these diverse structures, it would be unlikely that
this is mediated by a particular sequence motif. Likewise, there is no apparent
correlation between the presence of an RNA structure and the function of factors
prp18 (Umen and Guthrie, 1995) and slu7 (Frank
et al., 1992) (Figure S3). Taken together these results suggest that the RNA structures
in the BS-
factors that could modulate this role
The APE2 BS-
Given that 40% of S. cerevisiae introns potentially harbor a structure between the BS
selection. This could be mediated by binding regulatory factors, by working as a
riboswitch (Wachter, 2010), or by acting directly as a thermosensor (Kaempfer, 2003;
Neupert and Bock, 2009). In prokaryotes it has been shown that RNA structures can
block ribosome access in response to temperature changes (Neupert and Bock, 2009).
11
hus, in a search for yeast
by temperature change (heat-shock), we identified APE2, encoding an amino
peptidase. Splicing of APE2 pre- k (Yassour et
al., 2009), and our predictions show that their the BS-
stem loop occluding two AAG and bringing the annotated CAG into spliceosomal
range (Figure 4A and S4). Thus, according to our model, formation of this stem will
trigger CAG selection. Our data are consistent with this and the previous report
(Yassour et al., 2009), although the dual AAG/CAG usage suggests that the stem only
forms completely in a fraction of transcripts. Upon heat shock the stem becomes less
stable and selection of the distant CAG is diminished (Figure 4B, lane 2).
Accordingly, splicing to the upstream AAG is promoted, leading to the addition of 6
residues to the Ape2 N-terminus region. We predicted that mutations weakening the
APE2 stem will mimic the heat-shock response at low temperatures, possibly up to
e tested this by
sequentially opening the stem. Notably, a single-nucleotide mutation at the bottom of
leaving a residual heat sensitivity (Figure 4B, lanes 3-4). Further disruption of the
stem leads to an APE2 splicing pattern that favours mostly AAG selection, and is
insensitive to heat (Figure 4C, lanes 5-6). Importantly, reinstating the stem by
compensatory mutations restores the splicing switch induced by heat (Figure 4C,
lanes 7-8). We argue that the increased AAG/CAG ratio in this construct is caused by
the weaker reconstituted stem (AU vs GC base pair). Altogether, our data are
consistent with the APE2 intron folding into a small stem loop that acts as an RNA
thermosenso
12
accordingly. While in prokaryotes thermosensors and riboswitches are relatively
common controlling transcription and translation, in eukaryotes only one class of
riboswitches has been described. They modulate pre-mRNA splicing by sophisticated
conformational changes induced upon thiamine pyrophosphate (TPP) binding
(Neupert and Bock, 2009; Wachter, 2010). Our findings for the APE2 thermosensor
indicate that simpler switching strategies, analogous to the prokaryotic thermosensors
that control ribosome binding, are also present in eukaryotes to control spliceosome
access. This presents additional possibilities of splicing control, and it will need to be
taken into account when deciphering eukaryotic genomes.
EXPERIMENTAL PROCEDURES
Bioinformatics analyses
A total of 282 introns, defined as unambiguous sequences with canonical splice sites
in chromosomal genes, were extracted from the S. cerevisiae genome (SGD July
2009) . To predict branch sites (BS), introns were scanned for NNNTRACNN motifs
TACTRACNN motif were predicted as BS. When several motifs with identical
Hamming distances were found, an additional selection based on the potential base
pairing to U2 snRNA was applied using RNAcofold (Hofacker, 2009), forcing the
BS-A to be unpaired. When several motifs had the same potential, the closer to the
s selected.
For secondary structure predictions we used RNAfold (Hofacker, 2009) with default
parameters, using the sequence region between 8 nt downstream from the BS-A and
in introns we
13
compared the observed and expected frequencies, calculated assuming that nucleotide
positions are independent. Individual nucleotide frequencies were calculate in the
-A. The
expected probability of each triplet was then calculated as the product of the
individual observed frequencies. Accessibility of a nucleotide was defined as one
minus the probability of it being paired. Pairing probabilities were calculated using
RNAfold (Hofacker, 2009). Given the same sequence and absence of bias, a larger
window is expected to give lower accessibility values, as the folding probability
increases.
The predictions for all yeast species analyzed (Table S1), pair probabilities for the
structures and accessibility values can be found at
http://regulatorygenomics.upf.edu/Yeast_Introns/. Further details on the
bioinformatics analyses can be found in the Supplemental Material.
Strains and reporter plasmids
S.cerevisiae strains are derived from BY4741. The reporter plasmids are based on
pCC71 (Collins and Guthrie, 1999) where the ACT1 intron has been replaced by the
various versions of the different introns studied. As a result, the introns are flanked by
50 nt of ACT1 leader, and the CUP1 gene. Cloning strategies and oligonucleotides are
available in Supplemental Material.
Primer extension
Performed as described in (Siatecka et al., 1999), on RNA from strain BY4741
carrying the reporter plasmid unless stated otherwise. Primers used are
14
complementary to CUP1 and to U6, used as loading control. Quantifications are of
three independent experiments +/- SEM.
ACKNOWLEDGMENTS
We thank C. Query, J. Valcárcel, and J. Warner for helpful discussions and comments
on the manuscript. This research has been supported by the Spanish Ministry of
Science (BIO2008-01091 and 363), MC #510183, EURASNET (LSHG-CT-2005-
518238), CSIC (200920I195), and Agaur. M. M. is supported in part by a Training of
Researchers (FI) fellowship (Agaur) and M. P. by a Carlos III PhD fellowship.
REFERENCES
Cellini, A., Felder, E., and Rossi, J.J. (1986). Yeast pre-messenger RNA splicing efficiency depends on critical spacing requirements between the branch point and 3' splice site. Embo J 5, 1023-1030. Chen, S., Anderson, K., and Moore, M.J. (2000). Evidence for a linear search in bimolecular 3' splice site AG selection. Proc Natl Acad Sci U S A 97, 593-598. Chua, K., and Reed, R. (2001). An upstream AG determines whether a downstream AG is selected during catalytic step II of splicing. Mol Cell Biol 21, 1509-1514. Clark, T.A., Sugnet, C.W., and Ares, M., Jr. (2002). Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science 296, 907-910. Collins, C.A., and Guthrie, C. (1999). Allele-specific genetic interactions between Prp8 and RNA active site residues suggest a function for Prp8 at the catalytic core of the spliceosome. Genes Dev 13, 1970-1982. Cooper, T.A., Wan, L., and Dreyfuss, G. (2009). RNA and disease. Cell 136, 777-793. Crotti, L.B., and Horowitz, D.S. (2009). Exon sequences at the splice junctions affect splicing fidelity and alternative splicing. Proc Natl Acad Sci U S A 106, 18954-18959. Deshler, J.O., and Rossi, J.J. (1991). Unexpected point mutations activate cryptic 3' splice sites by perturbing a natural secondary structure within a yeast intron. Genes Dev 5, 1252-1263.
15
Fabrizio, P., Dannenberg, J., Dube, P., Kastner, B., Stark, H., Urlaub, H., and Luhrmann, R. (2009). The evolutionarily conserved core design of the catalytic activation step of the yeast spliceosome. Mol Cell 36, 593-608. Frank, D., Patterson, B., and Guthrie, C. (1992). Synthetic lethal mutations suggest interactions between U5 small nuclear RNA and four proteins required for the second step of splicing. Molecular and cellular biology 12, 5197-5205. Goguel, V., and Rosbash, M. (1993). Splice site choice and splicing efficiency are positively influenced by pre-mRNA intramolecular base pairing in yeast. Cell 72, 893-901. Hofacker, I.L. (2009). RNA secondary structure analysis using the Vienna RNA package. Curr Protoc Bioinformatics Chapter 12, Unit12 12. Kaempfer, R. (2003). RNA sensors: novel regulators of gene expression. EMBO Rep 4, 1043-1047. Konarska, M.M., Vilardell, J., and Query, C.C. (2006). Repositioning of the reaction intermediate within the catalytic center of the spliceosome. Mol Cell 21, 543-553. Lesser, C.F., and Guthrie, C. (1993). Mutational analysis of pre-mRNA splicing in Saccharomyces cerevisiae using a sensitive new reporter gene, CUP1. Genetics 133, 851-863. Liu, Z.R., Laggerbauer, B., Luhrmann, R., and Smith, C.W. (1997). Crosslinking of the U5 snRNP-specific 116-kDa protein to RNA hairpins that block step 2 of splicing. RNA 3, 1207-1219. Luukkonen, B.G., and Seraphin, B. (1997). The role of branchpoint-3' splice site spacing and interaction between intron terminal nucleotides in 3' splice site selection in Saccharomyces cerevisiae. Embo J 16, 779-792. Neupert, J., and Bock, R. (2009). Designing and using synthetic RNA thermometers for temperature-controlled gene expression in bacteria. Nat Protoc 4, 1262-1273. Patterson, B., and Guthrie, C. (1991). A U-rich tract enhances usage of an alternative 3' splice site in yeast. Cell 64, 181-187. Rogic, S., Montpetit, B., Hoos, H. H., Mackworth, A. K., Ouellette, B. F., and Hieter, P. (2008). Correlation between the secondary structure of pre-mRNA introns and the efficiency of splicing in Saccharomyces cerevisiae. BMC Genomics 9, 355-373. Siatecka, M., Reyes, J.L., and Konarska, M.M. (1999). Functional interactions of Prp8 with both splice sites at the spliceosomal catalytic center. Genes Dev 13, 1983-1993. Smith, C.W., Chu, T.T., and Nadal-Ginard, B. (1993). Scanning and competition between AGs are involved in 3' splice site selection in mammalian introns. Mol Cell Biol 13, 4939-4952. Smith, D.J., Query, C.C., and Konarska, M.M. (2008). "Nought may endure but mutability": spliceosome dynamics and the regulation of splicing. Mol Cell 30, 657-666. Spingola, M., Grate, L., Haussler, D., and Ares, M., Jr. (1999). Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. Rna 5, 221-234. Umen, J.G., and Guthrie, C. (1995). The second catalytic step of pre-mRNA splicing. RNA 1, 869-885. Wachter, A. (2010). Riboswitch-mediated control of gene expression in eukaryotes. RNA Biol 7, 67-76. Wahl, M.C., Will, C.L., and Luhrmann, R. (2009). The spliceosome: design principles of a dynamic RNP machine. Cell 136, 701-718.
16
Will, C.L., and Luhrmann, R. (2005). Splicing of a rare class of introns by the U12-dependent spliceosome. Biol Chem 386, 713-724. Yassour, M., Kaplan, T., Fraser, H.B., Levin, J.Z., Pfiffner, J., Adiconis, X., Schroth, G., Luo, S., Khrebtukova, I., Gnirke, A., et al. (2009). Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc Natl Acad Sci U S A 106, 3264-3269.
FIGURE LEGENDS
Figure 1. BS- trons. (A) Introns are separated in two
categories, those that have potential for secondary structure in this region (black); and
those that do not (light gray). When the number of nucleotides in the secondary
structures of the former is removed from the actual distance (schematized in (B)), the
distribution of these effective distances (dark gray) does not exceed 45 nt, as with the
introns without structure (Wilcoxon signed-rank p = 3.4 x10-1) Density expresses the
proportion of cases at a given distance f
for details on calculations). (C) Boxplot diagram showing accessibility values for
from bottom to top, Q1 (25% of the datapoints), Q2 (the median indicated by a thick
line) and Q3 (75% of the datapoints). The dashed lines, which are limited by the thin
lines, indicate the distribution of values outside the Q1 and Q3 values. Outliers, which
are shown as open circles, correspond to those values that lie outside the interval [Q1
1.5(Q3-Q1), Q3 + 1.5(Q3-
HAGs (Wilcoxon signed-rank p [ann vs intronic] = 5.5 x 10-5, p [ann vs exonic] = 2.2
x 10-16).
Figure 2. mediated by RNA structures. (A, B) Validation of the RPS23B
fold (Figure S2) (A) Diagrams of the tested RPS23B
17
numbered, and the used one is shown in black. Paired and mutated regions are
indicated by hyphens and small marks, respectively. (B), Primer extension analyses of
RNA from cells containing the constructs shown in (A), as indicated. Extension
5-6 indicate samples from a dbr1 strain, lacking debranching enzyme (C , D) The
VMA10 intron (Figure S2) was used to assess the BS-
efficiency, measured by primer extension. (C) Diagrams of the VMA10 constructs
used, formatted as in (A). The distance (in nt) between the BS-
shown. (D) Primer extension analyses of RNA from cells containing the constructs
shown in (C), as indicated, formatted as in (B). Lane 1, VMA10-1 (wt); lane 2,
VMA10-2 with mutated stem; lanes 3-4, VMA10-5 and 6 (respectively) constructs,
based on VMA10-2 but with AG-1 increasingly closer to the BS, as indicated. The *
indicates an uncharacterized band.
Figure 3. A)(B) Splice site strength and
minimal distance between BS (A) Diagrams of the different DMC1
constructs (see also Figure S3) monitored by primer extension analyses shown in (B).
Constructs DMC1-5 and 6 include the RPS23B stem loop (wt and open, respectively)
between the UAGs in DMC1-2. -A. (C)(D) Requirement for a
minimal distance between the BS and the beginning of a structure. Splicing patterns
of constructs depicted in (C), based on VMA10 but with a decreasing distance
between the BS-A and the base of the stem, are shown in (D), as analyzed by primer
extension. Top row numbers indicate the VMA10 construct (1 is wt), showing below
the distance to the stem from the BS-A. Primer extension products are indicated on
the right of panels (B) and (D), with numbers indicating the selected AG.
18
Figure 4. The APE2 RNA thermosensor. (A) RNA structure that controls selection of
-A, and the effective distance is shown in
italics. Two possible alternate thermosensor states are depicted in balance, indicating
factors promoting either state. Mutations (in circles) introduced to disrupt (grey) and
C23, as the AAG30 only gets activated by mutating the stem up to A22 or to C23
(Figure S4). (B) Primer extension analyses monitoring splicing of APE2 constructs at
independent experiments is shown +/- SEM. Bars indicate the AAG (white) and CAG
(black) fraction of transcripts.
Figure-1 (Meyer)
A
B
5’SS 3’SS BS
structure
5’SS 3’SS BS
effectivedistance
distance
intron
no structure
0 50 100 1500.00
0.02
0.04
0.06
0.08
distance BS to 3’ss (nt)
Density
structureeffective dist.no structure
C
3’ssintron exon0.00.20.40.60.81.0
accessibility
crypticann.
Figure 1
A
Figure-2 (Meyer)
(wt)
________A
RPS23B-1
AG
A
RPS23B-2
AG
xxx
________A
RPS23B-3
AG
xx
xx
x
xxx
RPS23B-4
xx
A AG
xxxxx
xx
x
1 2
1 2
1 2
2
(wt)
B
C D
RPS23B
VMA10
VMA10-
U6
1
3
d (nt)
*
1
45515757
432
1 5 62(wt)
VMA10-2
d=57 ntA AG
1
2 3
xxxxx
VMA10-5
d=51 ntA AG
xxx
xx
1
2 3
VMA10-6
d=45 ntA AG
xxx
xx
1
2 3
__________A AG
1
2 3A
AVMA10-1
d=57 nt
RPS23B-dbr1
U6
2 41431
2 65431
21
Figure 2
A B
Figure-3 (Meyer)
A AAG UAG
A UAG UAG
A UAG AAG
DMC1-1
DMC1-2
DMC1-3
16 nt
A UAG AAGDMC1-49 nt
A UAG UAGDMC1-5
A UAG UAGDMC1-6xxxxx
1 2
U6
DMC1-654321
654321
1
1
2
12 nt C VMA10-1 VMA10-7 to 9__________A AG
1
2 3
__________A AG
d
1
2 3
VMA10-
U6
23
d (nt)
1
8 1 5 8 20 14
4 3 2
7 9 D
Figure 3
mut AA A
UGUAC
CCA A
AA
GUACGA A A A
AAUAC
CCA A
AA
GUACGA A
mut BA
AAUAC
CCA A
AA
GUAUUA A
mut C
Figure-4 (Meyer)
A
5'
20
50
A UUGUAC
CCA A
AA
GUACGA A C
30
40
G G A A A C U A A A A U G A C A G 3'Cwt
U
5'
2050
A UUGU
AC
CCA A
AA
GU
A
CGA A C
30
40
G G A A A C U A A A A U G A C A G 3'C17 29 47
+T -T
17 21 39
B
CAG
AAG
23 37wt
23 37mut A
23 37mut B
23 37mut C
U6Isoform fraction
0.0 0.1 0.2
0.3 0.4 0.5 0.6 0.7
1 2 3 4 5 6 7 8
AAG
CAG
*
(T ºC)
Figure 4
SUPPL E M E N T A L M A T E RI A L
in the yeast genome reveals an RN A thermosensor that
mediates alternative splicing
Markus Meyer, Mireya Plass, Jorge Pérez-Valle, Eduardo Eyras, and Josep Vilardell*
*To whom correspondence should be addressed: [email protected]
File contents:
Supplemental Figures S1-S4
Supplemental Tables S1-S3
Supplemental Experimental Procedures
Supplemental References
Inventory of Supplemental Information
Meyer et al. Supplemental Material - Page 1 of 26
SUPPL E M E N T A L M A T E RI A L
in the yeast genome reveals an RN A thermosensor that
mediates alternative splicing
Markus Meyer, Mireya Plass1, Jorge Pérez-Valle1, Eduardo Eyras, and Josep
Vilardell*
1Equal contributors.
*To whom correspondence should be addressed: [email protected].
Supplemental Text and Figures
Meyer et al. Supplemental Material - Page 2 of 26
Meyer et al. Supplemental Material - Page 3 of 26
Meyer et al. Supplemental Material - Page 4 of 26
Supplemental F igure S1 (related to main F igure 1)
Saccharomyces cerevisiae introns. Density expresses the proportion of cases with a given BS-
(b, c) A mutation 48 nt upstream of the VMA10
likely weakens a predicted stem- s (boxes). Numbers refer to the first nucleotide downstream of the branch-site A. The effect
extension analysis, shown in (c). An impairment of the second step of splicing is evidenced by the 1st step lariat accumulation in a dbr1debranching enzyme). Primer extension intermediates are depicted on the right. U6 is the internal control.
Introns are
separated in two categories, those predicted to have a secondary structure between
corresponds to the effective distance distribution of introns with a predicted secondary structure. For each species, the number of introns in each category is
Meyer et al. Supplemental Material - Page 5 of 26
given. The p-value of the comparison of their effective length distributions is given, indicating that they do not statistically differ. Only yeast species with at least 30 sequenced introns in each group were considered. Density expresses the proportion of cases with a given BS-
(e - h) Comparison of BS-
minimum free energy (MFE) structures or by calculating 1000 suboptimal structures for each BS-(e) shows the effective BS-predictions, with a secondary structure (dark grey line), without a secondary structure (light grey line), and the whole set (black line). Each of these MFE-based distributions is compared to the corresponding distribution obtained by calculating 1000 suboptimal structures for the same region. Thus, panel (f) shows the comparison of distributions for the whole set of introns; panel (g) shows those for introns with MFE structure; and panel (h) for introns without MFE structure. There is no statistically significant difference present in any case. Density expresses the proport
(i, j) Summary of the effective distance distributions of S. cerevisiae introns, grouped
by having a predicted MFE secondary structure (i) or without it (j), and plotted according to the number of nucleotideeach intron 1000 suboptimal structures, with their corresponding effective distances, have been calculated (see panels e-h). The mode (most common value) is shown by a grey dot. Vertical bars show the range of effective distance values considering 95% of cases (we eliminate the top and the bottom 2.5% extreme values). For comparison, the effective distance based on MFE predictions (see Figure 1 of the main manuscript) is shown by a red dot. There are no significant differences (Wilcoxon signed-ranked test introns with a predicted optimal secondary structure p-value= 0.036; without a predicted optimal secondary structure p-value = 0.343). For clarity purposes each panel is also shown below with the introns sorted by increasing BS-effective distances calculated according to MFE predictions.
(k) Distribution of maximum effective distances BS-3'ss for randomized datasets with
st intron set. The randomized datasets were constructed using either the same intronic nucleotide content (grey) or that of the whole yeast genome (black). The analysis shown in Figure 1 was repeated on 1000 these sets. The distribution of the resulting 1000 maximum effective distances is shown. The maximum effective BS-real introns (~45 nt, indicated by a red dot) is significantly shorter than the prevalent in randomized datasets with the same nucleotide content (p-value= 0.002) or with a nucleotide content based on that of the whole yeast genome (p-value=0.008). Density expresses the proportion of sets with a given maximum effective distance.
(l) Boxplot diagrams showing accessibility values for annotated (green) and cryptic
(intcerevisiae is shown also in Figure 1c). Dashed lines indicate the value distribution between the maximum and minimum (thin horizontal lines). Boxes include 50% of the values. Thick lines indicate median values and outliers are shown as open
Meyer et al. Supplemental Material - Page 6 of 26
be less accessible even though the analyzed sequence is shorter, and therefore it would be less likely to adopt many structures in absence of bias.
(m, n) Histogram showing the distribution of PhastCons conservation scores of the
nucleotides in the BS- both signals). The BS-was divided into three groups as shown in (m). (n) Nucleotides inside stems have higher PhastCons conservation scores than nucleotides outside the secondary structure or in loop/bulges (Wilcoxon single-rank test p-value STEM vs OUT = 2.823E-07; STEM vs LOOP/BULGE = 0.01329). There is not any significant difference between PhastCons scores of nucleotides outside the secondary structure and those in loops or bulges (Wilcoxon single-rank test p-value= 0.3136). Density expresses the proportion of cases with a given score. See also Table S3 and S4 for a list of conserved motifs.
Meyer et al. Supplemental Material - Page 7 of 26
Meyer et al. Supplemental Material - Page 8 of 26
Supplemental F igure S2 (related to main F igure 2) (a, b) RPS23B BS- -
site a RPS23B, indicating mutations introduced to disrupt (left arrows) and to restore (right arrows) the structure (see Figure 2 of the main
relative to the first position after the BS-A. (b) Multiple alignment of the BS-region of RPS23B from several Saccharomyces species, as indicated. The bracket notation of the secondary structure predicted in S. cerevisiae region is shown. S. kluyveri and castellii have a shorter BS-
Meyer et al. Supplemental Material - Page 9 of 26
(c - g) VMA10 VMA10 is depicte
Nucleotide numbers are relative to the first position after the BS-A. Mutations to disrupt (left arrows) and to restore (right arrows) the stem loop are shown. Nucleotides deleted in VMA10 5-9 constructs (Figures 2 (delta) followed by the construct number. The splicing pattern of constructs (d), as indicated, was analyzed by primer extension (e). Mutations predicted to disrupt the structure of VMA10 (VMA10-2 construct) lead to splicing switching from the annotated splice site (AG-3 in VMA10-1) to AG-1, which is a weak second-step substrate (VMA10-2). The wt splicing pattern is restored when complementary mutations are introduced (VMA10-3). When the 63 nucleotides including the structure are deleted from the construct (VMA10-4), there is no apparent difference between splicing of this construct and the wt. Primer extension products are depicted on the right, including lariat intermediates from the VMA10 constructs 1-3 (marked with *) and 4 (marked with **). (f) Constructs used in Figure 2 are included to help identifying the mutations that they include. (g) Multiple alignment of the VMA10 BS- Saccharomyces species, as indicated. The bracket notation of the predicted secondary structure of the cerevisiae region is shown.
Alignments in panels (b) and (g) were edited with Jalview (Waterhouse et al., 2009).
Meyer et al. Supplemental Material - Page 10 of 26
Meyer et al. Supplemental Material - Page 11 of 26
Supplemental F igure S3 (related to main F igure 2 and 3) (a) Sequence of the BS- DMC1 constructs (in bold) in Figure 3. The
DMC1-4, and the black triangle the insertion point of the RPS23B stem loops (wt and mutant, with the indicated substitutions).
situated inside a window of 10 to 45 nucleotides downstream of the BS. This distance can be the actual number of nucleotides between the BS and the HAG, or the effective distance that results from the formation of a secondary structure. HAGs contained in such a structure are occluded from the spliceosome. If two suitable HAGs are present, both are used equally, unless one is stronger (CAG=UAG>AAG). A indicates the BS adenosine, unsuitable HAGs are shown in grey and those recognized by the spliceosome are depicted in black.
(c - e) The possibility of a genetic interaction between RNA structures and splicing
of RNA extracted from wt, , and slu7 strains. Cells were transformed with the constructs VMA10-1 (wt VMA10 intron), VMA10-2 (VMA10 intron with the stem loop disrupted), VMA10-4 (VMA10 intron with the stem loop removed, see Figure S2), DMC1-2 (DMC1 intron with two UAG), and DMC1-5 (DMC1 intron with the RPS23B stem loop in between two UAG, see Figure 3 of the main manuscript and panel (a) here). Prior to RNA extraction slu7 cells were subjected
Meyer et al. Supplemental Material - Page 12 of 26
to temperature shift as originally described in Frank et al. (1992). (c) Shows the efficiencies, relative to wt (lane 1), of first and second step of splicing in the VMA10 constructs. Both mutants display a lower efficiency for the second step, as expected. The effect of the stem loop is minor (compare lanes 2-3 and 8-9). Interestingly slu7 and yield an improved 1st step in the VMA10-2 construct (which has the stem loop disrupted and is a poor second step substrate). For each reaction, the first step efficiency was calculated as (Mature + Lariat) / (Precursor + Mature + Lariat). The second step efficiency was calculated as Mature / (Mature + Lariat). Panels (d) and (e) show the relative selection of two UAGs and how that affected by the introduction of a stem loop in between. Interestingly in this intron slu7 and have opposite effects on relative UAG selection. This is not changed by the introduction of the RNA structure. Relative UAG selection was calculated as the ratio of the upstream UAG (AG-1) over the downstream (AG-2), and normalized to the value in wt cells.
(f - i) Th REC102/
YLR329W has a predicted stem that we have verified. However, one of the two accessible HAGs is preferentially used. The secondary structure predicted between
REC102
that is not contained in a structure (AG3) are also indicated (nucleotide numbers are relative to the first position after the BS-A). (g) shows the constructs analyzed by primer extension in (h). The stem-loop region of each construct is shown in (i), indicating the mutations (red).
In the wt construct REC102-1, splicing is predicted to go to AG3 and AG4; but instead AG3 is not used. When the secondary structure is disrupted by different mutations in constructs REC102-2 and 4, the splicing pattern is altered and the strong splice sites, no longer sequestered in a structure, are used; unless they have been mutated to open the secondary structure (REC102-4). When the secondary structure is restored by introduction of complementary mutations, the wt splicing pattern is restored (REC102-3 and 5). The alternative splice site AG3 is never used, revealing an exception to the rules of binding of an additional factor, or an alternative folding, blocks splicing to the proximal AG3.
Meyer et al. Supplemental Material - Page 13 of 26
Meyer et al. Supplemental Material - Page 14 of 26
Supplemental F igure S4 (related to main F igure 4) (a - c) Validation of the APE2 selection. (a)
Predicted secondary structure in the BS - APE2 intron. Only the region with structure is shown. The small stem is included although it is not consistent across predictions. Numbers refer to the first position after BS-A. The two intronic AAGs are indicated (numbers in circles). The effect on splicing of mutations in the two stems was tested by primer extension analysis (mutations in red, including the predicted fold of the mutant), shown in (b). The wt splicing pattern (lane 1), showing splicing to both AAG-1 and CAG, suggests that the large stem fails to form completely in all transcripts, allowing the weak AAG to compete with a CAG that is impaired by a large distance to the BS. This pattern also suggests that the small stem is not formed in most molecules, since it would block recognition of both AAG-1 and CAG. When both stems are disrupted (mut I) splicing goes to both AAG-1 and AAG-2, whereas the annotated CAG is out of spliceosomal range (50 nt from BS; lane 2). This indicates that in the wt AAG-2 is mostly occluded by the stem-loop. In mut II only the large stem is disrupted, and formation of the small stem is expected to block splicing to CAG and AAG-1. However, the result (lane 3) is more consistent with its absence, since AAG-1 is used and CAG is not (indicating that it is out of range). The inability to use CAG leads to lariat accumulation (lanes 2-the second step of splicing. We conclude that the large stem is the main modulator of APE2 splicing, and that the small stem is not formed under the conditions tested. (c) Multiple alignment of the APE2 BS-Saccharomyces species, as indicated. The bracket notation of the predicted secondary structure of the cerevisiae region is shown. In agreement with our data, the long stem sequence is more conserved. Jalview (Waterhouse et al., 2009) was used for editing.
(d, e) Effect of increasingly disrupting the APE2
selection. (d) Depicts the thermosensor in its two states, with the mutants tested by primer extension (see Figure 4 of the main text) shown in (e). The two AAGs included in the stem loop are numbered. Primer extension was performed with samples from cells before (23C) and after the heat shock (37C), as indicated. AAG- -pair is required to keep the loop (opening the bottom four positions in the stem is indistinguishable from mut B [data not shown]). The upper band (*) apparent at
-PCR and sequencing mut B, while
in mut D it is the CAU in the stem.
Meyer et al. Supplemental Material - Page 15 of 26
Table S1. Homologous introns identified in other yeast species
Species name Nº of introns S. cerevisiae 282 S. paradoxus 259 S. mikatae 262 S kudriavzevii 255 S. bayanus 258 S. castellii 150
Meyer et al. Supplemental Material - Page 16 of 26
Table S2. Enrichment, in/out of secondary structures, of k-mers in the region between the BS and the 3'ss
1PhastCons score (Figure S1)
Meyer et al. Supplemental Material - Page 17 of 26
Table S3. Top ten k-mers present in the BS - 3' ss region
1Calculated as the average of PhastCons scores (Figure S1)
Meyer et al. Supplemental Material - Page 18 of 26
SUPPL E M E N T A L E XPE RI M E N T A L PR O C E DUR ES
Strains
Deletion strains are from the YKO MATa Strain Collection (Open Biosystems).
Plasmids and primer extension analyses
All constructs used for this study were made by a BamHI/PacI digestion of a
GPD::ACT1-CUP1 2-micron reporter plasmid (Lesser and Guthrie, 1993) and a PCR
product that contains a BamHI site followed by 50 nt of the actin exon 1, ATG, an
intron followed by 14 to 21 nt of exon 2 and a PacI site. The oligonucleotides used for
every construct are detailed bellow. All PCRs were made with W303a genomic DNA
as template unless stated otherwise, and upon cloning all constructs were sequenced.
Primer extension analyses were performed as described in (Siatecka et al., 1999), on
RNA from strain BY4741 upf1
otherwise. Primers used are complementary to CUP1 and to U6 snRNA, used as
loading control. Heat shocks were performed by resuspending cells in warm media.
Cells were flash-frozen before RNA extraction (Kos and Tollervey, 2010).
Quantifications, as in Fig. 4, are represented as mean from three independent
experiments +/- SEM.
Primers used in this work
CUP1 -GGCACTCATGACCTTC-
U6 snRNA -GAACTGCTGATCATCTCTG-
RPS23B-1 (pMM42): -
ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgACAAGTATGTACAATT
ATAGAAGATTG- - -
tacgttaattaaCTCTTCTTATAGTTGTTTTCGGCCCAACGGCTGTTT-
RPS23B-2 (pMM43): JV1358- -
tacgttaattaaCTCTTCTTATAGTTGTTTTCGGCCCAACGGCTGTTTTAAACGAaT
AtCTTTacACAAtACATAGTTTCATCAAAAAGATT-
RPS23B-3 (pMM44): JV1358- -
tacgttaattaaCTCTTCTTATAGTTGTTTTCGGCCCAACGGCTGTTTTAAACGAaT
AtCTTTacACAAtACATAGTaTCATgtAAAAGAaTAtTAAATTTGTTAGTAAAT
G-
Meyer et al. Supplemental Material - Page 19 of 26
RPS23B-4 (pMM52): JV1358- -
tacgttaattaaCTCTTCTTATAGTTGTTTTCGGCCCAACGGCTGTTTTAAACGAaT
AtaTTTacACAAtACATAGTTTCATCAAAAAGATT-
VMA10-1 (pMM18): -
ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgATGGTATGTGCCATTA
CATTAC- - -tacgttaattaaTCCGTTTTTTTGGGACTAAAG-
VMA10-2 (pMM29): JV1128- -
tacgttaattaaTCCGTTTTTTTGGGACTAAAGAGAATATTACATAGTTCTGAGCA
ACAATGAAAAAACCAATACCTTTGTCACGAGcaACCAcTCCAAgAAGAGAG
gACAGTTTTAAAA-
VMA10-3 (pMM30): JV1128- -
tacgttaattaaTCCGTTTTTTTGGGACTAAAGAGAATATTACATAGTTCTGAcCA
ACAATcAAAAAAgCAATtgCTTTGTCACGAGcaACCAcTCCAAgAAGAGAGgA
CAGTTTTAAAA-
VMA10-4 (pMM31): JV1128- -
tacgttaattaatccgtttttttgggactaaagagaatattacatagttTTTTAAAAAAGTTTTTCATGTTA
G-
VMA10-5 (pMM47): JV1128- -
tacgttaattaaTCCGTTTTTTTGGGACTAAAGAGAATATTACATAGTTCTGAGCA
ACAATGAAAAAACCAATACCTTTGTCACGAGcaACCAcTCCAAgAAGAGAG
gACAG- - -
caaccactccaagaagagaggacagAAAAGTTTTTCATGTTAGTAAGAAC-
VMA10-6 (pMM48): JV1128-JV1380 using as template PCR JV1128- -
caaccactccaagaagagaggacagTTTTCATGTTAGTAAGAACGC-
VMA10-7 (pMM60): JV1128- -
tacgttaattaaTCCGTTTTTTTGGGACTAAAGAGAATATTACATAGTTCTGAGCA
ACAATGAAAAAACCAATACCTTTGTCACGAGGTACCAGTCCAACAAGAGA
GCACAG- - -
gtaccagtccaacaagagagcacagAAAAGTTTTTCATGTTAGTAAGAAC-
VMA10-8 (pMM59): JV1128-JV1382 using as template PCR JV1128- -
gtaccagtccaacaagagagcacagTTTTCATGTTAGTAAGAACGC-
VMA10-9 (pMM58): JV1128-JV1382 using as template PCR JV1128- -
gtaccagtccaacaagagagcacagtttTGTTAGTAAGAACGCTTGTTAG-
Meyer et al. Supplemental Material - Page 20 of 26
VMA10-10 (pMM21): JV1128- -
tacgttaattaaTCCGTTTTTTTGGGACTAAAGAGAATATTACATAGTTCTGAGCA
ACAATGAAAAAACCAATACaTTTGTCACGAGGTAC-
DMC1-1 (pMM63): JV1406 -
ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgGTATGTTATAATAACA
TTTTAAAACC- - -tacgttaattaaTCTTGTTGTTGACAAAACGGT-
DMC1-2 (pMM65): JV1406- -
tacgttaattaaTCTTGTTGTTGACAAAACGGTCTATAAAAGTTCCTaTCCAAATTA
TTAGTTAGTAAAAG- )
DMC1-3 (pMM66): JV1406- -
tacgttaattaaTCTTGTTGTTGACAAAACGGTCTtTAAAAGTTCCTaTCCAAATTA
TTAGTTAGTAAAAG-
DMC1-4 (pMM76): JV1406- -
tacgttaattaatcttgttgttgacaaaacggtctttaaaagttcctaTATTAGTTAGTAAAAGAAAGGGG
-
DMC1-5 (pMM71): JV1406- -
tacgttaattaatcttgttgttgacaaaacggtctataaaaattaacttttgacaaaacatagtttcatcaaaaagatta
atGTTCCTaTCCAAATTATTAGTTAGTAAAAG-
DMC1-6 (pMM72): JV1406- -
tacgttaattaatcttgttgttgacaaaacggtctataaaaaatatctttacacaatacatagtttcatcaaaaagatta
atGTTCCTaTCCAAATTATTAGTTAGTAAAAG-
REC102-1 (pMM102): JV1177 (5'-
ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaaTGAAAGTATGTTTCTCC
TAGCA-3') - JV1178 (5'-tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCT)
REC102-2 (pMM103): JV1177 - JV1179 (5'-
tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCTGGTACGAGTGACAAAG
CTAggtGTTATTATAGTTAGTA-3')
REC102-3 (pMM104): JV1177 - JV1180 (5'-
tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCaccTACGAGTGACAAAGC
TAggtGTTATTATAGTTAGTA-3')
REC102-4 (pMM105): JV1177 - JV1181 (5'-
tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCTGGTACGAGTGACAAAG
CatgCAGTTA-3')
Meyer et al. Supplemental Material - Page 21 of 26
REC102-5 (pMM106): JV1177-JV1182 (5'-
tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCTGcatCGAGTGACAAAGCa
tgCAGTTA-3')
APE2 WT (pJPV8): -
ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgTGCCAAAAAACGTCA
CAATTAC- -J -ggaagttaattaaAGGAACGACATTATCAGGAAG-
APE2 mutA (pJPV12): JV1517- -
ggaagttaattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTTTT
ACTGGTCATTTTAGTTGTCCTTCGTACTTTTGGGTACAtTTAATAGAGAATG
TAAG-
APE2 mutB (pJPV14): JV1517- -
ggaagttaattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTTTT
ACTGGTCATTTTAGTTGTCCTTCGTACTTTTGGGTAtttTTAATAGAGAATGT
AAG-
APE2 mutC (pJPV18): JV1517- -
ggaagttaattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTTTT
ACTGGTCATTTTAGTTGTCCTTaaTACTTTTGGGTAttATTAATAGAGAATGT
AAG- )
APE2 mutD (pJPV16): JV1517-JV1213 ( -
ggaagttaattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTTTT
ACTGGTCATTTTAGTTGTCCTTacatCTTTTGGGatgtaTTAATAGAGAATGTAA
G-
APE2 mut I (pJPV9): JV1517- -
GAAGTTAattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTT
TTACTGATTTTAGTTGTCCTTCGTACTTTTGGtTtgAtTTAATAGAGAATGTA
AG-
APE2 mut II (pJPV10): JV1517- -
GAAGTTAattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTT
TTACTGGTCATTTTAGTTGTCCTTCGTACTTTTGGtTtgAATTAATAGAGAAT
GTAAG-
Meyer et al. Supplemental Material - Page 22 of 26
S. cerevisiae intron dataset
We downloaded the annotation and genomic sequence of S. cerevisiae from the
Saccharomyces Genome Database (SGD July 2009) (Hong et al., 2008). We then
extracted all introns from chromosomal genes (327) and kept only those that had
length > 0 nt, canonical splice sites (
not have any ambiguous nucleotide (N) in the sequence, obtaining a final set of 282
introns.
B ranch Site Prediction
To predict branch sites introns were scanned for NNNTRACNN motifs up to 200 nt
upstream f
TACTRACNN sequence were predicted as BS. When several motifs with identical
Hamming distances were found, an additional selection based on potential base
pairing to U2 snRNA was applied (RNAcofold (Hofacker, 2009)), forcing the
unpairing of the branching A. If several motifs had the same potential, the closer to
Secondary structure prediction
For each intron, we recovered the sequence between the BS and
both signals. From this region we further removed the first eight nucleotides after the
BS A, as previous experiments show that these nucleotides cannot belong to a
secondary structure. In the selected region we did the secondary structure prediction
using the program RNAfold from the Vienna package (Hofacker, 2009) with default
parameters.
Homologous introns dataset
We used Galaxy (Giardine et al., 2005) to extract the homologous regions to the S.
cerevisae introns in 5 Saccharomyces species (S. paradoxus, S. mikatae, S.
kudriavzevii, S. bayanus, S. castellii). We extracted the genomic alignments for the
yeast species provided by UCSC (Kuhn et al., 2009) and kept only those containing
canonical splice sites and no ambiguous nucleotides in the sequence (Supplementary
Table S1). For each of the homologous introns obtained we did independent BS
predictions applying the same method used for S. cerevisiae. Subsequently, for each
S. cerevisiae intron we built a multiple sequence with its putative homologs using
Meyer et al. Supplemental Material - Page 23 of 26
PRANK (Loytynoja and Goldman, 2008). In the webpage we show only those introns
with homologous BS (those that contained the BS aligned with the S. cerevisiae BS).
E ffective distance measure
between the A of the BS and
TACTAACACNNNN|TAG would be a distance of 10 nt. We defined the effective
BS-
after removing the secondary structure. More specifically, we removed all the bases
that were part of a structured region, and 2 bases corresponding to the beginning and
the end of the structured region were added substituting each structured region.
E ffective distance for random sets
To assess the significance of our findings regarding the effective distance we
generated two sets of random sequences to compare to. First, for each of the
8 nt after the Branch Site A. From each of these sequences we then built 1000
randomized sequences (i.e. maintaining the sequence content and length distribution).
To generate the second set, for each of the real-sequence length we extracted 1000
random sequences of the same length from the genome (i.e. maintaining only the
same length distribution). Then, for each of these sequences we did an RNA structure
prediction using the program RNAfold from the Vienna package (Hofacker, 2009)
with default parameters and measured the effective distance as explained before. For
each of the 2000 random datasets (1000 sets composed of 282 randomized sequences
and 1000 sets composed of 282 random sequences) we measured the maximum
effective length obtained in each of them. Both distributions are significantly different
from the real dataset (Figure S1), with the calculated empirical p-values for the
comparisons to randomized introns and random introns are 0.002 and 0.008,
respectively.
Accessibility measurement
Accessibility is defined as 1 minus de probability of being paired (pair probability).
We calculated pair probabilities using the program RNAfold (Hofacker, 2009). For
Meyer et al. Supplemental Material - Page 24 of 26
nucleotides (HAG) using 4 different windows of increasing length size with lengths n,
n+5, n+10, and n+15 respectively, where n is the BS-HAG distance. The final value is
the average of the values of the three nucleotides averaged over the four windows.
Conservation analysis
We downloaded the phastCons conservation scores from the 7 yeast multiple species
alignment from UCSC (Fujita et al., 2010). PhastCons scores give the probability that
a given nucleotide is part of a conserved element (in a multiple alignment)
considering the information of the alignment column and the flanking alignment
columns, and taking into account the phylogeny of the species used (Siepel et al.,
2004)
both signals) and obtained the PhastCons conservation scores for each of them. Next,
we divided the data in three groups according to the localization of the nucleotides in
the secondary structure (Figure S1). OUT if the nucleotides were outside the
secondary structure; STEM if the nucleotides were base-paired in the predicted
optimal structure; LOOP/BULGE if the nucleotides were in unpaired regions from the
optimal structure predicted. The distribution of the values is shown in Figure S1. We
found that there is a gradient in the conservation, being the nucleotides in stems the
most conserved and nucleotides outside the secondary structure the least conserved.
Motif search
We looked for motifs overrepresented or underrepresented inside the predicted
optimal secondary structures. For each BS-
collected all 4-mers and 5-mers that were fully inside (including loops and bulges) or
outside of the secondary structure. For each of the motifs inside secondary structures,
we checked if it is overrepresented or underrepresented compared to the same motif
outside the secondary structure by using a zscore
zscore
XaNa
XbNb
Na1 Nb
1 Xa XbNa Nb
1Xa XbNa Nb
were Xa is the number of counts of a given k-mer inside the secondary structure, Xb
is the number of counts of the same k-mer outside the secondary structure, Na is the
Meyer et al. Supplemental Material - Page 25 of 26
total amount of k-mer counts inside the secondary structure and Nb is the total
amount of k-mer counts outside the secondary structure. For each zscore, we
calculated a two-tailed p-value assuming a normal distribution. We did multiple
testing correction to all the p-values applying the Benjamini and Hochberg False
Discovery Rate (Benjamini et al., 1995). Then, for each of the significant k-mers, we
computed the average phastCons scores inside and outside of the secondary structure.
The complete list of all significant k-mers is shown in the Table S2.
For comparison, we collected all k-mers that were fully in stems, in loops or bulges,
or outside secondary structures (Table S3).
Suboptimal structure prediction
For each
energy varies at the most 5% relative to the optimal secondary structure using the
program RNAsubopt (Wuchty et al., 1999), similarly to what has been done
previously (Rogic et al., 2008). Using these structures, we can predict a distribution of
effective distances for each of the introns analyzed, which is summarized Figures S1.
Furthermore, if we compare the distribution of effective distances obtained using the
optimal secondary structure and the 1000 suboptimal structures per intron for all
introns, for introns with a predicted optimal secondary structure, or for introns
without a secondary structure; with the effective length distributions obtained for the
same introns using only the optimal secondary structure, we find that there are no
significant differences (Wilcoxon signed-ranked test all introns p-value= 0.019;
introns with a predicted optimal secondary structure p-value= 0.036; without a
predicted optimal secondary structure p-value = 0.343).
Meyer et al. Supplemental Material - Page 1 of 26
SUPPL E M E N T A L R E F ER E N C ES Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological). 57, 289-300 Fujita, P.A, Rhead, B., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Cline, M.S., Goldman, M., Barber, G.P., Clawson, H., Coelho, A., Diekhans, M., Dreszer, T.R., Giardine, B.M., Harte, R.A., Hillman-Jackson, J., Hsu, F., Kirkup, V., Kuhn, R.M., Learned, K., Li, C.H., Meyer, L.R., Pohl, A., Raney, B.J., Rosenbloom, K.R., Smith, K.E., Haussler, D., Kent, W.J., (2011) The UCSC Genome Browser database: update 2011. Nucleic Acids Res. (Database issue): D876-82. Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15, 1451-1455. Hofacker, I.L. (2009). RNA secondary structure analysis using the Vienna RNA package. Curr Protoc Bioinformatics Chapter 12, Unit12 12. Hong, E.L., Balakrishnan, R., Dong, Q., Christie, K.R., Park, J., Binkley, G., Costanzo, M.C., Dwight, S.S., Engel, S.R., Fisk, D.G., Hirschman, J.E., Hitz, B.C., Krieger, C.J., Livstone, M.S., Miyasato, S.R., Nash, R.S., Oughtred, R., Skrzypek, M.S., Weng, S., Wong, E.D., Zhu, K.K., Dolinski, K., Botstein, D., Cherry, J.M. (2008) Gene Ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res. 36 (Database issue):D577-581. Kos, M., and Tollervey, D. (2010). Yeast pre-rRNA processing and modification occur cotranscriptionally. Mol Cell 37, 809-820. Kuhn, R.M., Karolchik, D., Zweig, A.S., Wang, T., Smith, K.E., Rosenbloom, K.R., Rhead, B., Raney, B.J., Pohl, A., Pheasant, M., et al. (2009). The UCSC Genome Browser Database: update 2009. Nucleic Acids Res 37, D755-761. Lesser, C.F., and Guthrie, C. (1993). Mutational analysis of pre-mRNA splicing in Saccharomyces cerevisiae using a sensitive new reporter gene, CUP1. Genetics 133, 851-863. Rogic, S., Montpetit, B., Hoos, H. H., Mackworth, A. K., Ouellette, B. F., and Hieter, P. (2008). Correlation between the secondary structure of pre-mRNA introns and the efficiency of splicing in Saccharomyces cerevisiae. BMC Genomics 9, 355-373 Loytynoja, A., and Goldman, N. (2008). Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320, 1632-1635. Siatecka, M., Reyes, J.L., and Konarska, M.M. (1999). Functional interactions of Prp8 with both splice sites at the spliceosomal catalytic center. Genes Dev 13, 1983-1993. Siepel, A., Haussler, D. (2004) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol. 21, 468-488. Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M., Barton, G.J., (1999). Jalview Version 2 -- a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009 25, 1189-1191. Wuchty, S., Fontana, W., Hofacker, I.L., Schuster, P. (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers. 49, 145-165.