RNA Analysis

The goal of many RNA sequencing projects is to determine how genes are differentially expressed between experimental groups. With the ever-decreasing cost of sequencing and increasing flowcell capacities, processing large amounts of RNA-seq data can require computational resources that are not readily available to the average researcher, which can keep researchers from starting their analyses.

We offer two levels of RNA analysis packages, Basic and Intermediate, to help our customers start analyzing their differential expression studies. These standard packages are a good starting point for new bioinformaticians, but they do not quantify non-coding RNA and are not intended for metatranscriptomes.

Basic RNA Analysis

The first level of analysis we offer, Basic Analysis, maps RNA sequencing reads to a provided annotated reference genome. Read mapping and counting can be computationally expensive, so this package is ideal for customers who wish to perform their own comparisons and more in-depth analysis, but may lack the computational resources necessary to produce the raw counts in a reasonable amount of time.

Raw counts of CDS-annotated genes are returned in table format, with multi-mapping reads discarded. Examples of methods can be found here for prokaryotic samples and here for eukaryotic samples. For experiments with multiple organisms, we generally recommend mapping each organism independently.
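To illustrate what discarding multi-mapping reads means for the counts table, here is a minimal Python sketch. It is not our production pipeline: the function name `count_uniquely_mapped` and the simplified `(read_id, gene_id)` input are hypothetical stand-ins for real aligner output, and a read is treated as multi-mapping whenever it has more than one alignment record.

```python
from collections import Counter


def count_uniquely_mapped(alignments):
    """Count reads per gene, dropping any read with more than one alignment.

    `alignments` is a list of (read_id, gene_id) pairs. This is a toy
    representation chosen for illustration, not a real alignment format.
    """
    # Number of alignment records seen for each read.
    hits_per_read = Counter(read_id for read_id, _ in alignments)

    gene_counts = Counter()
    for read_id, gene_id in alignments:
        if hits_per_read[read_id] == 1:  # uniquely mapped read: keep it
            gene_counts[gene_id] += 1
        # reads with several alignment records are multi-mapping: skipped
    return dict(gene_counts)


toy_alignments = [
    ("read1", "geneA"),
    ("read2", "geneA"),
    ("read3", "geneB"),
    ("read4", "geneA"),  # read4 aligns twice, so it is discarded
    ("read4", "geneB"),
]

counts = count_uniquely_mapped(toy_alignments)
for gene, n in sorted(counts.items()):
    print(f"{gene}\t{n}")
```

In this toy input, read4 contributes to neither gene, so the raw count for a gene reflects only reads that could be placed unambiguously.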

Requirements:

  • Data must be sequenced through an Illumina RNA pipeline.
  • An annotated assembly that closely matches the sample is required, in GenBank format (.gb, .gbk, .gbff). Other formats are not accepted.
    • For assemblies uploaded to NCBI, accession numbers are also accepted. Please verify that the correct database version is specified (GenBank vs. RefSeq) and that the accession number carries the matching prefix (GCA for GenBank, GCF for RefSeq).
  • Multi-organism analyses may incur additional charges and require additional considerations that should be discussed prior to sample submission.
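The GCA/GCF check in the requirements above can be done before submission with a few lines of Python. This is a sketch under stated assumptions: the helper name `describe_accession` is hypothetical, the pattern assumes the usual NCBI assembly accession layout (prefix, underscore, nine digits, optional ".version" suffix), and the accessions in the usage lines only illustrate the format.

```python
import re

# Assumed layout of an NCBI assembly accession:
#   GCA_#########.v  (GenBank)   or   GCF_#########.v  (RefSeq)
ACCESSION_RE = re.compile(r"^(GCA|GCF)_(\d{9})(?:\.(\d+))?$")


def describe_accession(accession):
    """Return (database, version) for an assembly accession string.

    `version` is None when no ".N" suffix was supplied.
    Raises ValueError if the string does not look like an assembly accession.
    """
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, _digits, version = match.groups()
    database = "GenBank" if prefix == "GCA" else "RefSeq"
    return database, int(version) if version is not None else None


# Usage: the same assembly number under each database prefix.
for acc in ("GCA_000005845.2", "GCF_000005845.2"):
    database, version = describe_accession(acc)
    print(f"{acc}: {database}, version {version}")
```

A GenBank (GCA) record and its RefSeq (GCF) counterpart can differ in annotation, so it is worth confirming which one your analysis should use.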

Intermediate RNA Analysis

The second level we offer, Intermediate RNA Analysis, expands on the Basic package. After mapping and counting, the pipeline normalizes the raw CDS counts using edgeR’s trimmed mean of M-values (TMM) method and then makes pairwise comparisons between experimental group means. If the reference organism is in the KEGG Pathway Catalog, pathway information is included in the output. The output also includes an overall PCA plot for general assessment of the samples, high-level heatmaps, and full and filtered lists of the differentially expressed genes for each comparison.
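For readers curious about what TMM normalization does, the following Python sketch computes a TMM-style scaling factor for one sample against a reference sample. It is a simplified illustration and not edgeR itself: the function name `tmm_factor` is hypothetical, the reference sample is chosen by the caller, trimming is done on sorted positions (30% of log-ratios and 5% of average abundances from each end, which we assume here as typical defaults), and the final rescaling of factors across all samples is omitted.

```python
import math


def tmm_factor(sample, reference, m_trim=0.30, a_trim=0.05):
    """TMM-style scaling factor of `sample` relative to `reference`.

    Both arguments are lists of raw counts for the same genes, in the
    same order. Simplified sketch; see the caveats in the text above.
    """
    n_s, n_r = sum(sample), sum(reference)
    rows = []  # (M, A, weight) for each gene expressed in both samples
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # log-ratios are undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m = math.log2(p_s / p_r)         # log fold change (M-value)
        a = 0.5 * math.log2(p_s * p_r)   # average log abundance (A-value)
        # Inverse of the approximate variance of M, used as a weight.
        w = 1.0 / ((n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r))
        rows.append((m, a, w))

    n = len(rows)
    cut_m, cut_a = int(n * m_trim), int(n * a_trim)
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    # Keep genes that survive both the M trim and the A trim.
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])

    total_w = sum(rows[i][2] for i in keep)
    if total_w == 0:
        return 1.0  # nothing left after trimming: apply no scaling
    mean_m = sum(rows[i][2] * rows[i][0] for i in keep) / total_w
    return 2.0 ** mean_m


# Toy data: ten genes, one of which is strongly up in the sample.
reference = [100] * 10
sample = [100] * 9 + [1000]
factor = tmm_factor(sample, reference)
print("scaling factor:", factor)
print("effective library size:", sum(sample) * factor)
```

In the toy data, one highly expressed gene inflates the sample's total read count. Because that gene is trimmed away, the factor is driven by the nine unchanged genes, and the effective library size comes out close to that of the reference.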

This package is most effective with multiple experimental replicates for each group. Multi-dimensional analyses are not supported.

Requirements:

  • Data must be sequenced through an Illumina RNA pipeline.
  • An annotated assembly that closely matches the sample is required, in GenBank format (.gb, .gbk, .gbff). Other formats are not accepted.
    • For assemblies uploaded to NCBI, accession numbers are also accepted. Please verify that the correct database version is specified (GenBank vs. RefSeq) and that the accession number carries the matching prefix (GCA for GenBank, GCF for RefSeq).
  • Multi-organism analyses may incur additional charges. Please contact us to discuss.
  • Every pairwise comparison to be made between groups must be listed (e.g., Treatment 1 vs. Untreated).
  • For pathway analysis to be included, the requested reference must be included in the KEGG Pathway Catalog.

Controlling for Batch Effect

Sequencing data, and by extension RNA analysis, are very sensitive to batch effects. If additional sequencing runs or replicates will be added to an analysis, we therefore strongly recommend adding samples symmetrically across treatment groups: each round of library preparation and sequencing should contain an equal number of representatives from each group. This type of symmetric batch effect can be controlled for during Intermediate analysis, which minimizes skew resulting from differences between batches.
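A planned design can be checked for this kind of symmetry before any samples are prepared. The Python sketch below is one way to do it, under stated assumptions: the function name `find_batch_imbalances` and the `(batch, group)` sample-sheet layout are hypothetical and should be adapted to your own sample sheet.

```python
from collections import Counter


def find_batch_imbalances(samples):
    """Return a list of messages describing unbalanced batches.

    `samples` is a list of (batch, group) pairs, one per sample.
    An empty list means every batch contains every group in equal numbers.
    """
    all_groups = {group for _, group in samples}

    # Count samples per group within each batch.
    per_batch = {}
    for batch, group in samples:
        per_batch.setdefault(batch, Counter())[group] += 1

    problems = []
    for batch, counts in sorted(per_batch.items()):
        missing = sorted(all_groups - set(counts))
        if missing:
            problems.append(f"{batch}: missing group(s) {', '.join(missing)}")
        elif len(set(counts.values())) > 1:
            problems.append(f"{batch}: unequal group sizes {dict(counts)}")
    return problems


balanced = [
    ("batch1", "Treated"), ("batch1", "Untreated"),
    ("batch2", "Treated"), ("batch2", "Untreated"),
]
unbalanced = [
    ("batch1", "Treated"), ("batch1", "Untreated"),
    ("batch2", "Treated"), ("batch2", "Treated"),  # no Untreated in batch2
]

print(find_batch_imbalances(balanced))
print(find_batch_imbalances(unbalanced))
```

When a whole group is missing from a batch, as in the second design, the batch difference cannot be separated from the treatment difference, which is why we recommend the symmetric layout.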
