Assembly and Annotation

Producing an assembly is the end goal of many sequencing projects, and there is a complex landscape of assembly options that we are happy to help you navigate. Advances in long-read technology on the Oxford Nanopore and PacBio platforms have made many traditional assembly methods outdated or obsolete. Meanwhile, Illumina’s per-base value has continued to improve, and its affordability has made high-throughput, high-quality data widely available.

SeqCenter specializes in leveraging both read length and read accuracy to generate high-quality collapsed de novo draft assemblies with peer-reviewed, open-source software. Below are the four types of automated draft assembly services we offer, with specific requirements and example methods detailing the software used for each pipeline. As with all bioinformatic analyses, SeqCenter recommends that customers ensure that a given workflow is appropriate for their organism and sample type and that they validate all results. All draft genomes should be reviewed and adjusted according to organism-specific best practices before submission or use.


PacBio Long Read Assemblies

The quality and read length of PacBio data are unmatched in the current genomic climate. We observe an average read length of 15 to 18 kb in our PacBio datasets, with about 98% of the dataset exceeding Q30 quality scores. These long, highly accurate reads generate the best assemblies that we observe at SeqCenter, as they can easily assemble through complex and repeat-rich genomic regions.
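For readers who want to check comparable figures on their own data, the following is a minimal Python sketch, not SeqCenter's pipeline. It assumes the reads have been exported to an uncompressed four-line FASTQ file with Phred+33 quality encoding, and it counts the Q30 fraction per base; the text above does not say whether the 98% figure is counted per base or per read, so that choice is an assumption. The function names `parse_fastq` and `summarize_reads` are hypothetical.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (sequence, quality_string) pairs from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield seq, qual


def summarize_reads(handle, threshold=30, offset=33):
    """Report read count, mean read length, and fraction of bases at or above
    the Phred threshold (assumption: Phred+33 encoding, per-base counting)."""
    n_reads = 0
    total_bases = 0
    bases_at_threshold = 0
    for seq, qual in parse_fastq(handle):
        n_reads += 1
        total_bases += len(seq)
        bases_at_threshold += sum(1 for c in qual if ord(c) - offset >= threshold)
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads if n_reads else 0.0,
        "fraction_q30": bases_at_threshold / total_bases if total_bases else 0.0,
    }


if __name__ == "__main__":
    # Tiny in-memory example; 'I' encodes Q40 and '5' encodes Q20.
    demo = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nII5555\n")
    print(summarize_reads(demo))
```

For a real dataset, pass an open file handle instead of the in-memory example. Reads delivered as BAM would first need to be converted to FASTQ with a tool of your choice.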

The on-instrument base detection can also capture raw methylation kinetics, which are retained in the generated BAM files. Upon assembly, we can map the raw kinetic data to the generated assembly to report genomic locations that are likely 4mC or 6mA methylation sites, relative to the final assembly.

We routinely use this pipeline for bacterial and fungal samples to create beautiful de novo genome assemblies and annotations. For additional information on PacBio collapsed assemblies for higher-order eukaryotes, please contact us. Examples of methods documents can be found here for prokaryotic samples and here for eukaryotic samples, detailing the software that will be used.

Requirements:

  • Data must be sequenced on the PacBio instrument.
  • Intent to retain kinetics data must be indicated prior to sequencing. This data is not retained by default.
  • Methylation calling of the 4mC and 6mA sites is limited to bacteria/fungi/yeast.
  • Genome annotation is limited to bacteria/fungi/yeast. (Custom analysis may be available for other sample types.)

Hybrid Assemblies with Long and Short Reads

While the data generated on the PacBio instrument can create beautiful assemblies, the cost of entry is quite high, which makes it less accessible to small-scale projects. To remedy this, SeqCenter offers “hybrid” or “combo” de novo assemblies, which use the long reads generated on the Oxford Nanopore Technologies (ONT) platform and the highly accurate short reads generated on the Illumina platform. Despite their comparatively lower quality, ONT reads have historically helped to bridge gaps in complex assemblies by virtue of their length.

Recent advances in ONT data allow for “super-accurate” basecalling on ONT instruments using the v10 chemistry from Oxford Nanopore. This allows us to generate reads with moderate accuracy that can be assembled to provide a backbone. The initial assembly is then polished with Illumina reads, which are aligned to the generated assembly to make base-pair corrections in the consensus and to bridge any gaps left from the initial ONT assembly. Similar to PacBio, kinetics data is captured during ONT sequencing and contains information that can be used to call 5mC sites. This data is retained within all generated BAM files from the ONT instrument. The combination of Illumina and Oxford Nanopore technologies allows SeqCenter to provide you with a high-quality draft genome assembly at an accessible cost. Examples of methods documents can be found here for prokaryotic samples and here for eukaryotic samples, detailing the software that will be used.

Requirements:

  • Data must be sequenced on both the Illumina and ONT platforms.
  • Genome annotation is limited to bacteria/fungi/yeast. (Custom analysis may be available for other sample types.)

Oxford Nanopore Long Read Assemblies

For certain use cases, you may be more interested in the overall arrangement of a genome assembly and less concerned about the per-base accuracy of the assembly. This is most applicable to labs with ongoing research projects where an ancestral strain was sequenced and well documented, and the research group is particularly focused on a specific insertion, deletion, or other rearrangement. This application is also useful for preliminary hypothesis generation. A SeqCenter de novo long read draft assembly can always be complemented later with Illumina data to improve the per-base accuracy of the assembly.

Due to lower base-pair accuracy, the generated annotation data may not reflect the true functions of an organism. SeqCenter does not recommend this draft assembly option for amateur bioinformatic uses. Examples of methods documents can be found here for prokaryotic samples and here for eukaryotic samples, detailing the software that will be used.

Requirements:

  • Data must be sequenced on the ONT platform.
  • Genome annotation is limited to bacteria/fungi/yeast. (Custom analysis may be available for other sample types.)

Short Read Assemblies

Economical and highly abundant, Illumina data has utility in genome assemblies as well. However, since the reads are significantly shorter than either PacBio or Oxford Nanopore reads, the chance of generating a fragmented assembly is substantially higher, because short reads cannot be assembled through complex or repeat-rich regions.

Typically, a de novo short read draft assembly will generate more than 100 contigs for an organism with a single expected chromosome. This is often sufficient for preliminary analyses and the identification of SNPs and some deletions, but it cannot provide information on large indels, mobile elements, or larger structural rearrangements due to the high prevalence of gaps.
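To gauge how fragmented a delivered draft assembly is, contig count and N50 are common first checks. The following is a minimal Python sketch under stated assumptions, not part of SeqCenter's pipeline: it assumes the assembly is an uncompressed FASTA file with one record per contig, and the function names `read_contig_lengths`, `n50`, and `summarize_assembly` are hypothetical.

```python
from io import StringIO


def read_contig_lengths(handle):
    """Return the length of every sequence in a FASTA-formatted text stream."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize_assembly(handle):
    """Collect basic contiguity statistics for one assembly."""
    lengths = read_contig_lengths(handle)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Tiny in-memory example with four contigs of 6, 5, 4, and 3 bp.
    demo = StringIO(">c1\nACGTAC\n>c2\nACG\nTA\n>c3\nACGT\n>c4\nACG\n")
    print(summarize_assembly(demo))
```

A high contig count paired with a low N50 relative to the expected genome size points to the fragmentation described above. These numbers describe contiguity only and say nothing about per-base accuracy.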

Due to these limitations, SeqCenter does not recommend this draft assembly option for amateur bioinformatic uses. Examples of methods documents can be found here for prokaryotic samples and here for eukaryotic samples, detailing the software that will be used.

Requirements:

  • Data must be sequenced through the Illumina WGS pipeline.
  • Genome annotation is limited to bacteria/fungi/yeast. (Custom analysis may be available for other sample types.)
