Authors: Changnian Song, Xiaoying Li, Nicholas Kibet Korir, Emrul Kayesh, Zhengqiang Ma & Jinggui Fang
Computational prediction of microRNAs (miRNAs) is one of the most important approaches in miRNA studies. However, validation of the precise sequences of predicted miRNAs is essential for further studies on miRNA biogenesis, evolution and function. Here, we report a highly efficient method for determining the precise sequences of computationally predicted miRNAs. The method combines miRNA-enriched library preparation, two specific 3’ and 5’ miRNA RACE (miR-RACE) reactions, and sequence-directed cloning. miR-RACE has the potential to overcome a major disadvantage of computational miRNA prediction methods, namely that they cannot predict the precise sequences of miRNAs, and could thereby make bioinformatic prediction of miRNAs more powerful and accurate. The efficiency of the method is reflected in the precise sequence validation of miRNAs computationally predicted in citrus, apple, and some other fruit crops. Our ongoing research indicates that miR-RACE can also be very effective in verifying the sequences of doubtful miRNAs obtained by deep sequencing of small RNA libraries. The miR-RACE protocol is rapid and can be completed in 2-3 days.
One of the most important developments in molecular biology over the past two decades is the emerging picture of a new level of gene regulation under the control of small yet versatile RNAs (1). Small RNA (sRNA) molecules are widely recognized as common and effective modulators of gene expression in many eukaryotic organisms. According to current knowledge, sRNAs are generally divided into several categories, including microRNAs (miRNAs), short interfering RNAs (siRNAs), trans-acting siRNAs (ta-siRNAs), natural antisense transcript siRNAs (nat-siRNAs), and, in metazoans, piwi-interacting RNAs (piRNAs) (2). In plants, miRNAs are produced from partially complementary dsRNA precursor molecules (3, 4). Plant miRNAs are the best-characterized sRNAs, and the pathways by which they are generated and by which they regulate gene expression have been well documented (2, 3, 5). Several hundred plant miRNA genes have been experimentally identified by traditional Sanger sequencing, and an increasing number are being predicted by computational methods, which are widely used, especially in non-model but economically important plants. These methods mainly use secondary structural information to search expressed sequence tags (ESTs) and to mine the repository of available genomic sequences (6-9). They have obvious advantages, including the quick prediction of a large number of miRNAs, low cost, and the prediction of novel and non-abundant miRNAs that are usually difficult to clone directly. However, miRNA prediction algorithms often cannot locate the mature miRNA within a precursor with nucleotide-level precision. Even though false-positive predictions have been minimized using various scores and rank cutoffs, the precise sequences usually cannot be determined, and several candidate miRNA orthologs or paralogs might be predicted for a specific miRNA.
Unlike protein-coding genes, which are defined by start and stop codons, miRNA molecules have no special features at their ends that can be used to define the mature miRNA exactly. Determining the precise sequence of a computationally predicted candidate mature miRNA is essential for downstream research applications, such as miRNA target prediction and further studies on miRNA evolution, the regulatory role of miRNAs, and the mechanism of miRNA biogenesis. A report that mutations in the seed region of human miR-96 had a strong impact on miR-96 biogenesis and resulted in a significant reduction in miRNA targeting suggests that the precise sequences of mature miRNAs need to be determined before further studies are carried out on them (10).
Experimental methods used to identify computationally identified miRNAs
In previous studies, the combination of computational prediction and experimental verification was the popular strategy used to identify miRNAs, with experimental validation mainly focused on determining the expression of the miRNAs by the robust techniques of Affymetrix Gene Chips (11, 12), RNA blotting and/or RT-PCR (13, 14). However, these techniques can only confirm the existence and size, but not the full precise sequence, of a computationally predicted miRNA. Computational prediction of miRNAs has become popular in many organisms, and a great number of new potential miRNAs have been predicted using bioinformatic approaches and deposited in the miRBase Sequence Database (http://microrna.sanger.ac.uk/sequences/). The precise sequences of these miRNAs, which are orthologs of miRNAs cloned from model organisms, need to be determined before further studies on their functions and biogenesis are initiated. To the best of our knowledge, no reports have employed a comprehensive strategy to determine the precise sequences of these miRNAs.
To validate the precise sequences of candidate miRNAs obtained by computational prediction (referred to here as pre-miRNAs), we developed an integrated approach, termed microRNA RACE (miR-RACE), which combines miRNA-enriched library preparation, 5’ RACE and 3’ RACE reactions, and sequence-directed cloning. This makes it possible to determine the sequences of pre-miRNAs, including non-abundant miRNAs that are typically difficult to clone directly (15, 16). This is the first report on the validation of the terminal nucleotides of pre-miRNAs. Ongoing research in our laboratory indicates that miR-RACE can also be used as a powerful method to verify the sequences of miRNAs with low read counts, or of miRNAs that were generated by deep sequencing of small RNA libraries but whose integrity is questionable (data not shown).
miR-RACE comprises the following main steps: (i) miRNA-enriched library preparation; (ii) 5’ miR-RACE and 3’ miR-RACE for accurate amplification of the 5’ and 3’ ends of a miRNA; (iii) PCR product cloning and sequencing; (iv) cloning and sequencing of the RT-PCR products for validation of the PCR products. A schematic flowchart of this strategy for precise miRNA sequence determination is shown in Fig. 1. The innovative core steps in miR-RACE are the two PCR reactions amplifying the 5’ and 3’ ends of the miRNA, in which two specific primers each cover part of the candidate miRNA and part of the adaptor. These two PCR reactions are denoted 5’- and 3’-miR-RACE based on their similarity to the rapid amplification of cDNA ends (RACE) technique.
In this method, we utilized the same procedure to generate the miRNA-enriched sRNA library that has been popularly used to clone miRNAs and to measure the expression of miRNAs via RT-PCR (11-13), in which 5’- and 3’-end adaptors (Table 1) were linked to the sRNA molecules. This technique for the sRNA library construction, including the low molecular weight RNA extraction, ligation of 5’- and 3’-end adaptors, first-strand cDNA synthesis, and reverse transcription (see procedures) is routinely used in many laboratories and a flow chart illustrating the procedure is presented in Fig. 1a. After the preparation of the sRNA libraries from various organs and tissues, we pooled similar quantities of these library samples for further PCR amplification reactions.
The difference between our method and traditional RACE lies in the design of the gene-specific primers. For the primers used in this method (Table 1), we designed one forward primer for the 5’ amplification and one reverse primer for the 3’ amplification, complementary to the 5’ and 3’ adaptors, respectively. The other two primers, the gene-specific primers (GSP1, GSP2), were designed with two additional parameters in mind (Fig. 1b). The first parameter was that each primer covered 17 nucleotides of the candidate mature miRNA, with these 17 nucleotides being specific to the corresponding miRNA and meeting the minimum length of a regular PCR primer. The second parameter was that mismatches between the sequences of the specific primers and the end sequences of the real miRNAs were allowed, because such mismatches do not prevent PCR amplification. This follows the principle employed in site-directed mutagenesis (17, 18) and in the addition of restriction sites to the termini of amplified DNA in recombinant DNA technology (19). We elected to use 17 nucleotides rather than the whole miRNA for primer design based on the hypothesis that four or fewer terminal nucleotides at either end of the predicted miRNA would vary, thus maintaining at least 75% identity between the primer and the miRNA orthologs, consistent with the conservation reported for cloned miRNA orthologs. This design would tolerate at most three or four mismatched nucleotides, which should be validated if found.
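The identity figure above can be checked with a short calculation. The following is a minimal sketch under one reading of the text, namely that the worst case is four mismatches within the 17-nucleotide region covered by the primer. The function name and the example sequences are hypothetical and are not taken from Table 1.

```python
def percent_identity(primer_core: str, target: str) -> float:
    """Percent of positions at which two equal-length sequences agree."""
    if len(primer_core) != len(target):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(primer_core, target))
    return 100.0 * matches / len(primer_core)


CORE_LEN = 17        # miRNA nucleotides covered by the primer (from the text)
MAX_MISMATCHES = 4   # assumed worst case: four terminal positions differ

# Worst-case identity between the primer core and the true miRNA sequence.
worst_case = 100.0 * (CORE_LEN - MAX_MISMATCHES) / CORE_LEN
print(f"{worst_case:.1f}")  # 76.5

# Hypothetical sequences: the last four positions of the target differ.
primer_core = "TGACAGAAGAGAGTGAG"
true_target = "TGACAGAAGAGAGACTC"
print(f"{percent_identity(primer_core, true_target):.1f}")  # 76.5
```

Under this reading, 13 matching positions out of 17 give about 76.5% identity, which stays above the 75% threshold stated in the text.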
The 3’ miR-RACE and 5’ miR-RACE-specific primers also included ten nucleotides of the adaptor sequence and ten nucleotides of poly(T), respectively, giving primers longer than 21 nucleotides. These modifications could result in high specificity and a better match between the annealing temperatures of the specific primer and the opposite adaptor primer, both of which were the most technically challenging aspects of miR-RACE. By using one specific primer and one adaptor primer during PCR, the precise sequence of the miRNA end opposite to the specific primer could be correctly amplified and validated. The 17 nucleotides complementary to the miRNA were sufficient for accurate and efficient PCR amplification of the opposite end, and mismatches within the gene-specific primer (GSP) did not interfere with the primary aim of amplifying the two ends of the miRNA. The anticipated sizes of the PCR products were estimated during the prediction of the miRNAs (Fig. 2), and the identity of these PCR products was validated by subsequent cloning and sequencing. The precursor sequence of the miRNA to be validated, from the organism being studied, was the definitive reference against which the miRNA sequence was verified after cloning, sequencing and splicing the 3’ and 5’ miR-RACE PCR products.
In summary, the efficient and powerful approach developed here can be used to validate the sequences of miRNAs, especially the terminal nucleotides that define the complete miRNA sequences within their precursors. The method complements all of the computational prediction methods developed for miRNA identification, making it possible to obtain the correct sequences of computationally predicted miRNAs of interest and to study their conservation and biogenesis in the plant kingdom. The results from the studies on Citrus trifoliata (ctr-miRNA) and Malus domestica (mdo-miRNA) suggest that validating the precise sequences of computationally predicted miRNAs is an essential step before initiating further experimental studies on them (15, 16). The findings in our laboratory on the determination of the precise sequences of miRNAs predicted from Citrus reticulata, Citrus sinensis, peach, and strawberry (data not shown in this study) also confirm the efficiency of this approach. miR-RACE can also be very effective in verifying the sequences of questionable miRNAs derived from deep sequencing of sRNA libraries.
Primer design for 5’ miR-RACE and 3’ miR-RACE
5’ miR-RACE reactions are performed with the mirRacer 5’ primer and miRNA-gene-specific forward primers (GSP1), and 3’ miR-RACE reactions are carried out with the mirRacer 3’ primer and miRNA-gene-specific reverse primers (GSP2). GSP1 and GSP2 should each cover a 17-nucleotide sequence of the potential miRNA plus part of the poly(T) tract (GSP1) or of the 5’ adaptor (GSP2) (Fig. 1b).
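The primer layout described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the adaptor sequence and the example miRNA are hypothetical placeholders, and the strand orientation (GSP1 as ten T residues followed by the reverse complement of the last 17 miRNA nucleotides, GSP2 as the last ten adaptor nucleotides followed by the first 17 miRNA nucleotides) is an assumption inferred from the Table 1 legend. The primers actually used should be taken from Table 1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def design_gsps(mirna: str, adaptor5: str, core_len: int = 17, tail_len: int = 10):
    """Return (gsp1, gsp2) for a predicted miRNA (assumed layout, see lead-in)."""
    dna = mirna.upper().replace("U", "T")  # work in DNA letters
    if len(dna) < core_len:
        raise ValueError("predicted miRNA is shorter than the primer core")
    if len(adaptor5) < tail_len:
        raise ValueError("adaptor is shorter than the requested tail")
    # GSP1 (5' miR-RACE): poly(T) tail pairing with the poly(A) tract, followed
    # by the reverse complement of the last core_len nucleotides of the miRNA.
    gsp1 = "T" * tail_len + reverse_complement(dna[-core_len:])
    # GSP2 (3' miR-RACE): last tail_len nucleotides of the 5' adaptor,
    # followed by the first core_len nucleotides of the miRNA.
    gsp2 = adaptor5[-tail_len:] + dna[:core_len]
    return gsp1, gsp2


# Hypothetical inputs, for illustration only.
example_mirna = "ACGUACGUACGUACGUACGUA"    # 21-nt placeholder sequence
example_adaptor = "AAGGCCTTAAGGCCTTGACC"   # placeholder 5' adaptor (DNA)
gsp1, gsp2 = design_gsps(example_mirna, example_adaptor)
print(gsp1)
print(gsp2)
```

With the default lengths, each primer is 27 nucleotides long (10 + 17), which matches the statement in the text that the specific primers are longer than 21 nucleotides.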
Small RNA extraction ● TIMING 1 h
Polyadenylation of small RNAs ● TIMING 2 h
Ligation of 5’ adaptor to poly(A)-tailed small RNA ● TIMING 1.5 h
Reverse transcription ● TIMING 1.5 h
PCR amplification of small RNA-derived cDNAs by 5’ miR-RACE and 3’ miR-RACE ● TIMING 4.5 h
First-round amplification ● TIMING 2.5 h
Second-round amplification ● TIMING 2.5 h
Smeared product from the bottom of the gel to the loading well
Too much starting first-strand cDNA, or too many cycles. Repeat the amplification steps with less first-strand cDNA or with fewer cycles, and dilute a portion of the first-round amplification products 1:50 in TE buffer.
If no products are observed after 30 cycles in the second set of amplifications, add fresh Taq polymerase and carry out an additional 15 cycles of amplification. If amplification is efficient, a clear product will be observed after a total of 45 cycles. If there is still no product, test the integrity of the reagents by carrying out a control PCR using known templates and primers.
Problems with nonspecific amplification products
Optimization of the annealing temperature is essential to avoid nonspecific PCR amplification. This can be done by gradually increasing the annealing temperature (about 2 °C at a time) at each stage of the procedure until the background products are lost. Alternatively, use a ‘Touchdown PCR’ procedure to optimize the annealing temperature of the reaction without trial and error.
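As a rough illustration of the touchdown idea, the sketch below generates a per-cycle annealing temperature schedule that starts high and steps down to a final temperature, where the remaining cycles are run. The function name and all temperatures, step sizes and cycle numbers are hypothetical example values. The protocol does not specify them, and they would need to be chosen for the primers in use.

```python
def touchdown_schedule(start_c, final_c, step_c=1.0, cycles_at_final=20):
    """Per-cycle annealing temperatures for a touchdown PCR programme.

    The temperature drops by step_c each cycle until final_c is reached,
    then stays at final_c for cycles_at_final cycles.
    """
    if step_c <= 0:
        raise ValueError("step_c must be positive")
    if start_c < final_c:
        raise ValueError("start_c must not be below final_c")
    temps = []
    current = start_c
    while current > final_c:
        temps.append(round(current, 1))
        current -= step_c
    temps.extend([final_c] * cycles_at_final)
    return temps


# Hypothetical example: 65 °C down to 55 °C in 1 °C steps, then 20 cycles at 55 °C.
schedule = touchdown_schedule(65, 55, step_c=1, cycles_at_final=20)
print(len(schedule), schedule[:12])
```

The early high-temperature cycles favour the most specific primer-template pairing, so the intended product is enriched before the less stringent cycles begin.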
5’ miR-RACE and 3’ miR-RACE PCR usually generate products that are primarily derived from the cDNA of interest and can be directly sequenced using the GSP1 and GSP2 primers (Fig. 2). After cloning the 5’ miR-RACE and 3’ miR-RACE products, splicing their sequences into the full miRNA sequence, and aligning this sequence with the precursor sequence for confirmation, the accurate sequences of potential miRNAs predicted by computational methods can be identified. In our laboratory’s work on plants, miR-RACE has proved workable and reliable for determining the three or four terminal nucleotides at either end of the predicted miRNAs.
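The splicing and confirmation step can be sketched as follows. This is a minimal illustration, not the authors' analysis procedure. It assumes that adaptor and poly(A) sequences have already been trimmed from both reads, that the two reads agree exactly where they overlap, and that an exact match to the precursor is required. The function names and all sequences are hypothetical.

```python
def splice_fragments(five_prime: str, three_prime: str, min_overlap: int = 8):
    """Merge two overlapping reads into one sequence, or return None.

    five_prime covers the 5' part of the miRNA and three_prime the 3' part.
    The longest exact suffix/prefix overlap of at least min_overlap is used.
    """
    max_k = min(len(five_prime), len(three_prime))
    for k in range(max_k, min_overlap - 1, -1):
        if five_prime[-k:] == three_prime[:k]:
            return five_prime + three_prime[k:]
    return None


def locate_in_precursor(mirna: str, precursor: str) -> int:
    """0-based start of the miRNA in the precursor, or -1 if absent."""
    def to_dna(seq: str) -> str:
        return seq.upper().replace("U", "T")
    return to_dna(precursor).find(to_dna(mirna))


# Hypothetical example sequences (DNA letters).
full_mirna = "TGACAGAAGAGAGTGAGCACA"         # 21-nt placeholder
read_5p = full_mirna[:17]                    # 5' part of the miRNA
read_3p = full_mirna[-17:]                   # 3' part of the miRNA
precursor = "GGCC" + full_mirna + "TTAAGGCC"  # placeholder precursor

spliced = splice_fragments(read_5p, read_3p)
print(spliced)
print(locate_in_precursor(spliced, precursor))
```

A result of -1 from `locate_in_precursor` would indicate that the spliced sequence does not occur in the precursor exactly, in which case the reads and the prediction would need to be re-examined.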
This research was supported by grants of the Science & Technology Key Project of Ministry of Education of China (No. 109084), the Program of NCET (No. NCET96), and the Fundamental Research Funds for the Central Universities (No. KYJ200909).
Table_1: Example of the primers used for miR-5’ RACE and miR-3’ RACE
GSP1 is the specific primer for 5’ miR-RACE, and the underlined region base pairs with the 3’ poly (A)n; GSP2 is the specific primer used for 3’ miR-RACE, and the underlined region base pairs with the 5’ adaptor.
Figure_1: Schematic flowchart of 5’ and 3’ miR-RACE
(a) miRNA cDNA library construction. (b) Analysis of 5’ miR-RACE and 3’ miR-RACE. GSP1 and GSP2 are as listed in Table 1.
Figure_2: Examples of 3’ and 5’ miR-RACE products of miRNAs amplified by PCR, shown in an ethidium bromide-stained agarose gel.
Lanes 1-9 are the 3’ miR-RACE products of nine Poncirus trifoliata miRNAs (ptr-miR156, ptr-miR164, ptr-miR167, ptr-miR171, ptr-miR319, ptr-miR482a, ptr-miR482b, ptr-miR435, and ptr-miR1446, respectively), while lanes 10-18 are their 5’ miR-RACE products. The bottom and second-from-bottom bands of the molecular weight marker are 50 bp and 100 bp, respectively.
Table 2: PROCEDURE step 3 PCR reaction system
Table 3: PROCEDURE step 11 PCR reaction system
Table 4: PROCEDURE step 19 PCR reaction system
Table 5: PROCEDURE step 21 PCR reaction system
Table 6: PROCEDURE step 26 PCR reaction system
Table 7: PROCEDURE step 29 PCR reaction system
Table 8: PROCEDURE step 31 PCR reaction system
Table 9: PROCEDURE step 34 PCR reaction system
Changnian Song, Xiaoying Li, Nicholas Kibet Korir, Emrul Kayesh, Zhengqiang Ma & Jinggui Fang, Jinggui Fang Lab
Correspondence to: Jinggui Fang ([email protected])
Source: Protocol Exchange (2010) doi:10.1038/protex.2010.203. Originally published online 3 December 2010.