Tips for Successful Transcriptome Sequence Assembly

October 4, 2018 Best Practices, Next-Gen Sequencing

By Matthew Keyser

De novo assembly of whole transcriptome sequencing data from non-model organisms can be very challenging. Computing limitations and hidden problems in the sequence data can quickly derail your results and your timeline. For example, the presence of untrimmed Illumina adapter sequences, ribosomal RNA, and genomic repetitive sequences often leads to failed or poor-quality assemblies.

Furthermore, even a successful assembly will typically yield thousands of unannotated contigs. In order to produce a complete and annotated mRNA set from the contig assembly, you may need to master multiple assembly and annotation pipelines.

Fortunately, by using DNASTAR’s novel assembly and annotation algorithms, and following a few tips before and after sequencing, you can greatly increase your chance for success.

Below, we’ve compiled our top recommendations for de novo transcriptome sequencing and assembly. These recommendations are based on extensive testing using a variety of transcriptome data sets from the NCBI Short Read Archive, as well as years of experience working with DNASTAR customers on transcriptome assembly and annotation projects.

 

Check your transcriptome sequence data prior to assembly

Transcriptome sequencing data commonly contains untrimmed adapters that severely impair de novo assembly. Prior to assembly, I always use FastQC (a free download from the Babraham Institute) to scan the FASTQ input files for the presence of adapters. I then use the adapter scan in the Assembly Options page of SeqMan NGen to remove them.

In my experience, more than 50% of the transcriptome data sets downloaded from the NCBI’s Short Read Archive contain unacceptably high levels of the “Illumina Universal Adapter.” Once trimmed, these data sets assemble in a fraction of the time, with longer and more completely assembled mRNAs.

To scan for adapter sequences, load a FASTA or text file containing the adapter sequences in the Assembly Options step in SeqMan NGen.
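As a rough pre-check before opening SeqMan NGen, the idea of an adapter scan can be sketched in a few lines of Python. This is only an illustration, not how FastQC or SeqMan NGen work internally. It assumes an uncompressed FASTQ file with exactly four lines per record, and it uses `AGATCGGAAGAGC`, a commonly used prefix of the Illumina universal adapter, as the default search string; replace it with the adapters for your own library kit. The function names are hypothetical.

```python
from typing import Iterable, Iterator

# Assumption: a commonly used prefix of the Illumina universal adapter.
# Substitute the adapter sequences that match your library preparation.
ILLUMINA_UNIVERSAL_ADAPTER = "AGATCGGAAGAGC"


def read_sequences(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:
            yield line.strip().upper()


def adapter_fraction(
    fastq_lines: Iterable[str],
    adapter: str = ILLUMINA_UNIVERSAL_ADAPTER,
) -> float:
    """Return the fraction of reads that contain the adapter anywhere."""
    total = 0
    hits = 0
    for sequence in read_sequences(fastq_lines):
        total += 1
        if adapter in sequence:
            hits += 1
    return hits / total if total else 0.0


# Two toy records: the first read carries the adapter, the second does not.
example_records = [
    "@read1", "ACGTAGATCGGAAGAGCACAC", "+", "I" * 21,
    "@read2", "ACGTACGTACGT", "+", "I" * 12,
]
print(f"{adapter_fraction(example_records):.0%} of reads contain the adapter")
```

To run it on real data, pass an open file handle instead of the toy list, for example `adapter_fraction(open("reads.fastq"))`, where `reads.fastq` is a placeholder path. A high fraction suggests the data set should be trimmed before assembly.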

 

Read length matters

The algorithms used for de novo transcriptome assembly and RNA-seq analysis are quite different. While it might be tempting to de novo assemble your 500 million read, 2x76 bp Illumina RNA-seq data set, most assembled transcripts will be truncated compared to an assembly run with longer reads. Here are results from de novo assembly of transcriptomic data from 20 different organisms and 7 different Illumina paired read lengths:

 

While 50-100 bp paired Illumina reads are perfectly suitable for RNA-seq analysis, de novo transcriptome assembly is greatly improved when using paired Illumina reads of 150 bp or longer.
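If you are unsure which category a downloaded data set falls into, a quick length check can help. The sketch below is a minimal example under stated assumptions: it reads four-line FASTQ records, takes the median read length, and compares it with the 150 bp guideline above. Using the median (rather than the mean or the minimum) is an arbitrary choice made for this illustration, and the function names are hypothetical.

```python
from statistics import median
from typing import Iterable, List

# Guideline from the text: paired reads of 150 bp or longer assemble better.
MIN_DE_NOVO_READ_LENGTH = 150


def read_lengths(fastq_lines: Iterable[str]) -> List[int]:
    """Collect the length of each read (2nd of every 4 FASTQ lines)."""
    return [
        len(line.strip())
        for index, line in enumerate(fastq_lines)
        if index % 4 == 1
    ]


def suitable_for_de_novo(
    fastq_lines: Iterable[str],
    min_length: int = MIN_DE_NOVO_READ_LENGTH,
) -> bool:
    """Return True when the median read length meets the guideline."""
    lengths = read_lengths(fastq_lines)
    return bool(lengths) and median(lengths) >= min_length


# Toy data: one 76 bp read and one 150 bp read, as separate data sets.
short_reads = ["@r1", "A" * 76, "+", "I" * 76]
long_reads = ["@r1", "A" * 150, "+", "I" * 150]
print(suitable_for_de_novo(short_reads), suitable_for_de_novo(long_reads))
```

Reads that have already been quality-trimmed will vary in length, which is why the sketch summarizes the whole file instead of inspecting only the first record.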

 

 

Number of cores matters

The DNASTAR de novo assembly algorithms are optimized to use the maximum number of available cores in your computer. While a standard 6-core Intel i7 desktop computer will perform well, assembly times on the largest data sets are significantly reduced using a computer such as the 16-core AMD Ryzen “Threadripper”. If powerful desktop computers are not your thing, you can achieve similarly fast results using DNASTAR Cloud Assemblies to run your assembly on an Amazon 16-core computer. The graph below shows assembly times, in hours, on three different hardware configurations, including the DNASTAR Cloud. For data sets with over 50 million reads, there are significant time savings when using a computer with 16 cores.
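The rule of thumb above (more than 50 million reads benefits from 16 cores) can be written as a small check. This is a sketch for planning purposes only: the thresholds are taken from the text, the function name and the advice strings are hypothetical, and `os.cpu_count()` reports logical cores, which may be higher than the number of physical cores.

```python
import os
from typing import Optional

# Figures quoted in the text above.
LARGE_DATASET_READS = 50_000_000
RECOMMENDED_CORES = 16


def hardware_advice(read_count: int, core_count: Optional[int] = None) -> str:
    """Suggest whether a larger machine is worth considering for a data set."""
    if core_count is None:
        # Logical core count of the current machine; fall back to 1 if unknown.
        core_count = os.cpu_count() or 1
    if read_count > LARGE_DATASET_READS and core_count < RECOMMENDED_CORES:
        return "consider a 16-core machine or a cloud assembly"
    return "local hardware is likely adequate"


print(hardware_advice(read_count=60_000_000, core_count=6))
print(hardware_advice(read_count=10_000_000, core_count=6))
```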

 

Use transcript databases for auto-annotation

During assembly setup, DNASTAR provides users with mRNA RefSeq database options that our assembly algorithms use for both assembly and match-based auto-annotation. The process is seamless and automated: all you need to do is select an mRNA database from related taxa, or select the entire RefSeq database. The output includes an interactive table of annotations and an individual assembly file for each transcript, allowing you to closely evaluate the quality of the output.

By comparison, BLAST-based annotation approaches for data sets containing thousands of query sequences can be extremely time consuming (taking multiple days) and difficult to manage.

For many non-model organisms, DNASTAR’s auto-annotation approach results in many identified transcripts:

Organism                Identified transcripts    Novel transcripts
Hogfish                 31,402                    37,142
Brassica napus          63,808                    16,434
Orchid                  31,059                    19,758
Bent Grass              5,052                     11,136
Gecko                   4,843                     3,345
Atlantic salmon         26,285                    7,331
Giardia intestinalis    9,338                     4,644
Holstein cow            40,856                    4,687
Neurospora crassa       20,799                    2,810
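To compare organisms more directly, the counts in the table can be turned into an identified fraction. The sketch below assumes that the "identified" and "novel" columns are disjoint and together make up all assembled transcripts; the source does not state this explicitly, so treat the percentages as an illustration. The dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# (identified, novel) transcript counts copied from the table above.
TRANSCRIPT_COUNTS: Dict[str, Tuple[int, int]] = {
    "Hogfish": (31_402, 37_142),
    "Brassica napus": (63_808, 16_434),
    "Orchid": (31_059, 19_758),
    "Bent Grass": (5_052, 11_136),
    "Gecko": (4_843, 3_345),
    "Atlantic salmon": (26_285, 7_331),
    "Giardia intestinalis": (9_338, 4_644),
    "Holstein cow": (40_856, 4_687),
    "Neurospora crassa": (20_799, 2_810),
}


def identified_fraction(identified: int, novel: int) -> float:
    """Fraction of transcripts annotated, assuming the two counts are disjoint."""
    total = identified + novel
    return identified / total if total else 0.0


# Print organisms from the highest to the lowest identified fraction.
for organism, counts in sorted(
    TRANSCRIPT_COUNTS.items(),
    key=lambda item: identified_fraction(*item[1]),
    reverse=True,
):
    print(f"{organism:<22}{identified_fraction(*counts):.1%}")
```

Under that assumption, the fraction ranges from under a third (Bent Grass) to close to 90% (Holstein cow), which is consistent with well-studied taxa having better RefSeq coverage.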

Novel transcripts can sometimes be identified using additional annotation options, such as DNA-to-protein database matching. Please contact DNASTAR for more information on this topic.

 

While de novo transcriptome assemblies can be challenging, and often fail due to untrimmed adapters and computing shortcomings (too many short Illumina reads for too few processor cores), these issues are easily overcome by making some preparations before and after sequencing, and by using the SeqMan NGen assembler in the DNASTAR Genomics Suite.

 

Want to learn more about transcriptome analysis using DNASTAR Genomics Suite?

  • Watch our de novo transcriptome video to see the assembly and analysis workflow
  • Try our next generation sequence analysis tools for yourself by requesting a fully functional trial of Lasergene