Illumina RNA Sequencing

RNA, the intermediate stage in the flow of genetic information from DNA to protein, contains many systems of expression and regulation that can be studied through RNA sequencing. While whole genome sequencing reveals information at the DNA level, next generation RNA sequencing (RNA-seq) characterizes the transcriptome and provides additional context to the genome with qualitative and quantitative expression profiles.

In contrast to the fixed state of the genome, the transcriptome’s variability in content and quantity provides a prime opportunity to study response mechanisms and control systems. For this reason, many RNA-seq projects compare transcript expression of an organism at different developmental stages, under various experimental conditions, and across multiple time points. Understanding the transcriptomic level changes between samples and conditions allows for the interpretation of functional regions of the genome and further extrapolation of how the resulting proteins function to support life.

The first step of any RNA-seq project is the isolation of total RNA from the sample of interest. Total RNA is composed of (among other things) both ribosomal RNA (rRNA) and messenger RNA (mRNA). While rRNA constitutes as much as 90% of total RNA, it is ubiquitous and often provides little in the way of helpful signal. Conversely, mRNA typically makes up only 1-2% of the sample but contains all of the expression information. Successful and cost-effective transcriptome profiling and multiplexing rely on focused sequencing of transcripts of interest, made possible through the removal of unwanted ribosomal material. Thus, a defining factor in any RNA-seq study is the choice between two approaches to narrow in on the target content: rRNA depletion (negative selection) and polyA enrichment (positive selection). For additional information about the two strategies and pricing, please visit the individual service pages below.

  • mRNA Enrichment

    PolyA enrichment positively selects for transcripts with polyadenylated tails and is often referred to as mRNA sequencing. This method provides a clean way of sequencing only on-target fragments but is only applicable for organisms that create polyA-tailed transcripts. Because of this, polyA enrichment sequencing is generally the choice for eukaryote-only studies.

  • rRNA Depletion

    rRNA depletion enzymatically degrades the ribosomal RNA fragments from a sample by relying on probes targeted at rRNA sequences specific to the organism. It is often (and confusingly) referred to as total RNA sequencing. rRNA depletion works well for prokaryotes and will capture all RNA molecules over 150 bp that are not degraded.

While these guidelines broadly apply, there is nuance in selecting the most effective sequencing strategy for each project: each method has its own advantages, disadvantages, and biases that impact downstream analyses. We recommend visiting both of our specialty pages for more detailed descriptions of the options. If you are still unsure which method is best for your work, please do not hesitate to contact us.

RNA Workflow Overview

For both methods, SeqCenter uses Illumina library prep kits, which include DNase treatment as the first step. This removes residual genomic DNA, but a primary DNase treatment immediately after harvesting is recommended for maximum removal. To minimize degradation, samples continue straight through the targeting method of choice, cDNA synthesis, and library preparation. Final libraries are run on high-throughput, accurate Illumina NovaSeq platforms. The resulting 2x150 bp sequencing reads are distributed as compressed FASTQ files. Additionally, each order receives a project report with all methods specific to your samples for ease of publication at a later date. We offer Basic and Intermediate Data Analysis packages for those seeking additional bioinformatics assistance.
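As a quick sanity check on delivered data, the number of reads in a gzip-compressed FASTQ file can be counted directly. The sketch below assumes the standard four-lines-per-record FASTQ layout. The function name is hypothetical and is not part of any SeqCenter deliverable.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzip-compressed FASTQ file.

    Assumes the standard layout of exactly four lines per record
    (header, sequence, separator, qualities).
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For paired-end data, the R1 and R2 files of a sample should return the same count. That count is the number of read pairs, so the total number of reads is twice that value.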

Maximizing the Power of Your RNA Experiment

In order to obtain high-quality and biologically meaningful data from an RNA-seq experiment, careful experimental design is critical. In particular, attention must be paid to details such as sample preparation and storage, the level of biological replication, sequencing read depth, and batch effects.

RNA is much less stable than DNA, and degradation happens quickly at room temperature. This degradation leads not only to lowered quality and loss of data but also decreases the efficiency of downstream procedures, such as depletion. Quality RNA-seq data relies on converting the RNA to the more stable state of cDNA in a smooth and timely manner. While every experiment and sample type are different, a few general rules are considered to be best practices when working with RNA.

  • Ensure the sample remains as cold as possible from the moment of collection. This will maximize integrity and preserve expression profiles.
    • Standard freezer storage (-20°C) is sufficient for short-term storage of less than two weeks. However, for longer-term storage, freeze at -80°C to preserve stability for at least a year.
    • When actively working with RNA, defrosted samples should be kept in cold blocks or on wet ice.
    • Extractions and treatments requiring heating should be fine-tuned to minimize the amount of time samples spend at elevated temperatures.
  • Always avoid freeze-thaw cycles as each further degrades RNA (and DNA), with losses of 25% not unusual per cycle.
    • Plan any treatments or measurements to be done together.
    • Alternatively, create multiple frozen aliquots of each sample for specific uses to prevent unnecessary thawing. (For example, after DNase treatment, each sample is divided into three tubes: one for sequencing, one for archiving, and one for quality assessment.)
  • Take precautions to shield samples from RNases, which are ever-present.
    • Many consumables are available in RNase-free varieties.
    • RNase-removing cleaners are made for decontaminating workspaces and implements.
  • RNA protectants can be used to preserve expression profiles if harvesting cannot happen immediately.
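
To illustrate why freeze-thaw cycles matter, the sketch below compounds a per-cycle loss over several cycles. It assumes the loss is multiplicative and constant at the 25% figure mentioned above. Real samples will vary, and the function name is hypothetical.

```python
def fraction_intact(cycles, loss_per_cycle=0.25):
    """Estimate the fraction of RNA remaining after repeated freeze-thaw cycles.

    Assumption: every cycle removes the same fraction of whatever remains.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= loss_per_cycle <= 1.0:
        raise ValueError("loss_per_cycle must be between 0 and 1")
    return (1.0 - loss_per_cycle) ** cycles


# Under this assumption, three cycles leave roughly 42% of the starting material.
for n in range(4):
    print(n, round(fraction_intact(n), 3))
```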

One of the large hurdles to any RNA-seq experiment is the inherent high variability in any biological system. Because of this variation, it is essential that there is sufficient biological replication to both power the study and identify any outlier samples.

Replication is arguably more important than read depth or read length in many power analyses. When determining the proper number of replicates, some factors to consider are effect size, acceptable false-positive and false-negative rates, and maximum sample size. If helpful, example power calculation tools and a study of RNA data variability can be found here:

  1. Wu, H., Wang, C. & Wu, Z. PROPER: comprehensive power evaluation for differential expression using RNA-seq. Bioinformatics 31, 233–241 (2015).
  2. Gaye, A. Extending the R Library PROPER to enable power calculations for isoform-level analysis with EBSeq. Front. Genet. 7, 225 (2017).
  3. Hansen, K. D., Wu, Z., Irizarry, R. A. & Leek, J. T. Sequencing technology does not eliminate biological variability. Nat. Biotechnol. 29, 572–573 (2011). Required reading for anyone considering RNA-seq or other -omics technologies. A well-written reminder of why quantitative RNA experiments will always need replicates, even if RNA assay technologies were perfect. The authors caution users against being overenthusiastic about new technologies and discarding lessons learned about experimental design.
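
For a rough feel of how effect size, error rates, and variability interact, the sketch below uses a simple normal approximation for a single gene with negative-binomial-like counts. This is not the simulation-based approach used by PROPER. It ignores multiple-testing correction, and the function name, parameters, and example values are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def replicates_per_group(fold_change, mean_count, cv, alpha=0.05, power=0.8):
    """Approximate replicates per group needed to detect `fold_change` in one gene.

    Normal approximation (an assumption, not a full power analysis):
        n = 2 * (z_(1-alpha/2) + z_power)^2 * (1/mean_count + cv^2) / ln(fold_change)^2
    where `mean_count` is the expected read count for the gene and `cv` is the
    biological coefficient of variation.
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided false-positive rate
    z_power = NormalDist().inv_cdf(power)          # 1 - false-negative rate
    variance_term = 1.0 / mean_count + cv ** 2
    n = 2 * (z_alpha + z_power) ** 2 * variance_term / math.log(fold_change) ** 2
    return math.ceil(n)


# Example: 2-fold change, ~20 reads per gene, CV of 0.4
print(replicates_per_group(2.0, 20, 0.4))
```

In this approximation the biological variability term (`cv ** 2`) does not shrink as depth increases, which is one way to see why adding replicates often helps more than adding reads.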

Unfortunately, there is no simple equation to determine the number of reads required for a given RNA-seq project. We recommend finding a highly cited peer-reviewed article in the relevant field and examining its methods and results for commonly accepted practices. When using previous studies, it is also important to determine whether the numbers reported are (1) reads or read pairs and (2) total output or reads that were filtered in some way (e.g., reads successfully mapped or reads passing specific requirements). If the reported value is for filtered reads, total reads will need to be extrapolated.
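
The conversion from a published number to a total read target can be sketched as follows. The function and parameter names are hypothetical, and the pass fraction (for example, a mapping rate) has to come from the study being used as a reference.

```python
import math


def total_reads_to_order(reported, pass_fraction=1.0, reported_as_pairs=False):
    """Convert a published read count into a total (unfiltered) read target.

    reported          -- number quoted in the paper
    pass_fraction     -- fraction of raw reads that survived filtering or mapping
                         (1.0 if the paper reports total output)
    reported_as_pairs -- True if the paper counts read pairs rather than reads
    """
    if not 0.0 < pass_fraction <= 1.0:
        raise ValueError("pass_fraction must be in (0, 1]")
    reads = reported * 2 if reported_as_pairs else reported
    return math.ceil(reads / pass_fraction)


# Example: a paper reports 10M mapped reads with a 50% mapping rate,
# so about 20M total reads were sequenced.
print(total_reads_to_order(10_000_000, pass_fraction=0.5))
```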

To provide a starting point, we present the following table based on the articles cited below. All of these recommendations are based on experiments with three biological replicates per treatment group. For projects capturing data for more than one organism, we recommend additive sizing. If you are considering testing sample sizing, we recommend aiming higher, since post-sequencing downsampling is possible, and sending representatives from all test groups. Both of these allow you to later use the initial data in the final analysis after implementing batch effect controls.

Example RNA-seq Projects and Sizes

| Organism | Genome estimate | Protein-coding gene estimate | Estimated DE genes captured | Minimum package size recommended |
|---|---|---|---|---|
| Escherichia coli (Bacteria) | 5 Mbp | > 4,200 | 3,800 | 12M reads (6M read pairs) |
| Saccharomyces cerevisiae (Yeast) | 12 Mbp | > 7,200 | 2,000 | 25M reads (12.5M read pairs) |
| Solanum lycopersicum (Tomato Plant) | 828 Mbp | > 25,000 | 1,000 | 50M reads (25M read pairs) |
| Homo sapiens (Humans) | 3.2 Gbp | > 20,000 | 4,000 | 100M reads (50M read pairs) |
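
Additive sizing for a multi-organism sample can be sketched with the minimum package sizes from the table above. The dictionary and function names are hypothetical. Organisms outside the table would need their own estimates.

```python
# Minimum recommended reads per sample, taken from the table above.
MIN_READS = {
    "Escherichia coli": 12_000_000,
    "Saccharomyces cerevisiae": 25_000_000,
    "Solanum lycopersicum": 50_000_000,
    "Homo sapiens": 100_000_000,
}


def additive_minimum_reads(organisms):
    """Sum the per-organism minimums for a sample containing several organisms."""
    return sum(MIN_READS[name] for name in organisms)


# Example: a human sample that also carries E. coli transcripts of interest.
reads = additive_minimum_reads(["Homo sapiens", "Escherichia coli"])
print(reads, "reads =", reads // 2, "read pairs")
```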

Sequencing data is sensitive to batch effects due to the nature of amplification-based library preparation and flowcell clustering. In turn, differential expression analyses will also skew with batch effect, which can lead to noisy data and lost signal or even false explanatory variables. To minimize this and other batch effects, analyses will include quantification normalization, but additional care should be taken when planning sequencing batches to set up the downstream analysis for success.

Batch effects can be controlled for during analysis but only if the batch effect is symmetrical. In sequencing, this requires that equal numbers of replicates from each treatment or group be sequenced together. For example, if a researcher with one untreated group and two treatment groups wanted to add more replicates, equal numbers of samples for each of the three groups should be sent together. New analysis groups should not be sequenced by themselves and compared to old, because the batch effect will most likely be a confounding variable and produce false statistical signal.
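
A planned submission can be checked for this kind of symmetry before samples are sent. The sketch below treats a design as balanced when every batch contains the same non-zero number of replicates from every group. The function name and the group and batch labels are hypothetical.

```python
from collections import Counter


def is_balanced(samples):
    """Check that every batch holds equal, non-zero replicate counts of every group.

    samples -- iterable of (group, batch) pairs, one per sample
    """
    counts = Counter(samples)
    groups = {group for group, _ in counts}
    batches = {batch for _, batch in counts}
    for batch in batches:
        # Counter returns 0 for group/batch combinations that never occur.
        per_group = {counts[(group, batch)] for group in groups}
        if len(per_group) != 1 or 0 in per_group:
            return False
    return True


# Three groups, three replicates each in batch 1, two more of each in batch 2.
good = [(g, 1) for g in ("untreated", "A", "B") for _ in range(3)]
good += [(g, 2) for g in ("untreated", "A", "B") for _ in range(2)]

# A new group sequenced alone in batch 2 is confounded with the batch.
bad = [(g, 1) for g in ("untreated", "A", "B") for _ in range(3)]
bad += [("C", 2)] * 3

print(is_balanced(good), is_balanced(bad))
```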

Additional Material Considerations for All RNA Submissions

  • Material requirements for each service ensure library preparation success and high numbers of unique reads. Because the target material makes up a very small fraction of the total RNA, samples under 1 µg total have very limited chances of successful library preparation and could result in low read diversity. Successful preps for low-concentration samples tend to indicate unsuccessful depletion and result in mostly ribosomal sequences.
  • Library preparation must continue to move from total RNA to cDNA without pausing due to the fragile nature of RNA. Because of this, there are not any holding points during library prep, and we are not able to assess sample quality without seeing the entire prep through.
    • Samples that are below the minimum requirements for success after the first step of DNase treatment will be removed from prep and will not be processed. All other samples that pass this threshold from the order will move forward. The customer will be notified of any failures at the time of distribution.
    • If you know in advance that any sample failures affect whether or not other samples should be sequenced (because of batch effect or loss of analytical power), please contact us to discuss these arrangements before submission.
    • If you know in advance that you would like SeqCenter to attempt library preparation for any samples below the 1 µg total threshold, please make these arrangements before submission. Samples below the threshold that are processed will be billed for the full price of the sequencing package, regardless of sequencing output.
  • RNA integrity directly correlates to depletion efficiency, due to the targeted nature of both approaches. SeqCenter does not perform fragment analysis as part of its standard pipelines and strongly recommends that customers assess fragment length before submission. RIN calculations can give some estimate of degradation but can vary greatly. Because of this, SeqCenter gives only an optional recommendation of RIN > 6.
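
Before submission, the total mass of a sample can be checked against the 1 µg threshold, along with a rough estimate of how much of it is mRNA. The sketch below uses the 1-2% mRNA range quoted earlier on this page. The function names and the example concentration and volume are hypothetical.

```python
def total_rna_ng(concentration_ng_per_ul, volume_ul):
    """Total RNA mass in nanograms from a measured concentration and volume."""
    return concentration_ng_per_ul * volume_ul


def meets_minimum(total_ng, minimum_ng=1000.0):
    """True if the sample reaches the 1 µg (1000 ng) total RNA threshold."""
    return total_ng >= minimum_ng


def expected_mrna_ng(total_ng, low=0.01, high=0.02):
    """Rough mRNA mass range, assuming mRNA is 1-2% of total RNA."""
    return total_ng * low, total_ng * high


# Example: 50 ng/µL in 25 µL gives 1250 ng total, of which roughly 12-25 ng is mRNA.
mass = total_rna_ng(50, 25)
print(mass, meets_minimum(mass), expected_mrna_ng(mass))
```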

Stranded Library Preparation

All RNA libraries that SeqCenter generates capture the original transcripts’ strand information. This data is critical in the identification of overlapping genes, appropriate splicing, and for aligning reads to poorly annotated reference genomes. In addition, many bioinformatic analyses require this information to ease the alignment process, and studies show that stranded information provides a more accurate estimate of transcript abundance.

SeqCenter relies on Illumina’s stranded technology to preserve strand specificity through library preparation and sequencing. Briefly, stranded information is captured through the use of dUTPs (instead of dTTPs) in the second strand synthesis step of cDNA synthesis. After adaptor ligation, second strand amplification is suppressed in the final library amplification due to polymerase stalling at the location of the incorporated dUTPs. Due to the directionality of the sequencing adaptors, Read 1 (R1) will always map to the antisense strand and Read 2 (R2) will always map to the sense strand. More detail on how this process works can be found at Illumina’s knowledge link.
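
The read orientation described above can be expressed as a small rule for inferring which genomic strand a transcript came from. This is a minimal sketch: the function name is hypothetical, and `alignment_strand` is assumed to be the reference strand ("+" or "-") reported by an aligner for that read.

```python
def transcript_strand(read_number, alignment_strand):
    """Infer the transcript's strand from one read of a dUTP-stranded pair.

    R1 maps antisense to the transcript, so the transcript lies on the
    opposite strand. R2 maps sense, so the transcript lies on the same strand.
    """
    if read_number not in (1, 2):
        raise ValueError("read_number must be 1 or 2")
    if alignment_strand not in ("+", "-"):
        raise ValueError("alignment_strand must be '+' or '-'")
    if read_number == 2:
        return alignment_strand
    return "-" if alignment_strand == "+" else "+"


# An R1 read aligned to the '+' strand implies a transcript on the '-' strand.
print(transcript_strand(1, "+"), transcript_strand(2, "+"))
```

Many quantification tools describe this layout as "reverse" stranded. The exact option name differs between tools, so check the documentation of the tool in use.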

If you prefer a non-stranded library preparation, please contact us with a description of your project to discuss what services we can provide.

Note for Small RNA Projects

We do not offer a standard library preparation service to capture microRNAs (miRNAs) or other small RNAs less than 150 bp in length. If you are looking to sequence microRNAs or RNAs smaller than 150 bp, please contact us to discuss custom services we can provide for your project.

Additional Resources:

  • Hansen, K. D., Wu, Z., Irizarry, R. A. & Leek, J. T. Sequencing technology does not eliminate biological variability. Nat. Biotechnol. 29, 572–573 (2011). Required reading for anyone considering RNA-seq or other -omics technologies. A well-written reminder of why quantitative RNA experiments will always need replicates, even if RNA assay technologies were perfect. The authors caution users against being overenthusiastic about new technologies and discarding lessons learned about experimental design.
  • Wu, H., Wang, C. & Wu, Z. PROPER: comprehensive power evaluation for differential expression using RNA-seq. Bioinformatics 31, 233–241 (2015).
  • Gaye, A. Extending the R Library PROPER to enable power calculations for isoform-level analysis with EBSeq. Front. Genet. 7, 225 (2017).
  • The following are model-based examinations of minimum reads required for some model organisms. It is important to note that these are for minimal statistical power and bare minimum numbers to achieve it. Buffering room should be allowed for less than 100% depletion efficiency and noisy data.
    • Giannoukos G, Ciulla DM, Huang K, Haas BJ, Izard J, Levin JZ, Livny J, Earl AM, Gevers D, Ward DV, Nusbaum C, Birren BW, Gnirke A. Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes. Genome Biol. 2012;13(3):R23. doi: 10.1186/gb-2012-13-3-r23. PMID: 22455878; PMCID: PMC3439974.
    • Haas, B.J., Chin, M., Nusbaum, C. et al. How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC Genomics 13, 734 (2012).
    • Schurch N. J., Schofield P., Gierlinski M., Cole C., Sherstnev A., Singh V., et al. (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22, 839–851. 10.1261/rna.053959.115
    • Lamarre S, Frasse P, Zouine M, Labourdette D, Sainderichin E, Hu G, Le Berre-Anton V, Bouzayen M, Maza E. (2018). Optimization of an RNA-Seq Differential Gene Expression Analysis Depending on Biological Replicate Number and Library Size. Front Plant Sci. 2018 Feb 14;9:108. doi: 10.3389/fpls.2018.00108
    • Liu Y., Zhou J., White K. P. (2014). RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30, 301–304. 10.1093/bioinformatics/btt688
    • Ching T., Huang S., Garmire L. X. (2014). Power analysis and sample size estimation for RNA-Seq differential expression. RNA 20, 1684–1696. 10.1261/rna.046011.114
  • Palazzo, Alex & Lee, Eliza. (2015). Non-coding RNA: What is functional and what is junk? Frontiers in Genetics 6, 2. 10.3389/fgene.2015.00002.
  • Levin, J. Z. et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat. Methods 7, 709–715 (2010).
