Pairwise Alignment with MegAlign Pro: Choosing the best alignment for your project

November 20, 2018 | Best Practices, Sequence Analysis

Written by Eric Cabot, PhD

With the latest version of Lasergene, MegAlign Pro (part of the Molecular Biology Suite) now supports pairwise sequence alignment. In addition to the five multiple alignment engines that were already available, users can now perform local, global and semi-global pairwise alignments using the industry-standard Smith-Waterman and Needleman-Wunsch algorithms. This article highlights some of these new capabilities and explores situations where aligning sequences in pairs is more appropriate than aligning many sequences at once. At the end of the article, you’ll also find example projects for experimenting with the different alignment methods.

When should I use a pairwise alignment?

The answer to this question may seem obvious: use pairwise alignment when you are interested in only two sequences. But pairwise alignment is sometimes simply more suitable than multiple alignment, as we’ll see in some examples later on. There are also situations where a multiple sequence alignment (MSA) can help identify pairs of sequences or subsequences that merit a more detailed pairwise comparison.

Beyond workflow considerations, there are fundamental differences between the two categories of alignment that can make pairwise alignment the better option for some sequence comparisons. Because of the way progressive multiple aligners (including Clustal, MUSCLE and MAFFT) work, the final alignment can contain inappropriately placed gaps that distort the interpretation of the results. To understand why, we need to take a closer look at how progressive multiple alignment works. The process always begins with a single pairwise alignment, adding gaps as necessary to minimize the number of mismatches. As the aligner proceeds, single sequences, as well as groups of sequences aligned at an earlier stage, are added to the growing multiple sequence alignment, and more gaps are inserted as needed. During this phase, gaps may be added but are never removed.

This “once a gap, always a gap” approach is a potential drawback shared by all progressive multiple alignment algorithms. The heart of the problem is that gap placement (and therefore the alignment) can depend on the order in which sequences are aligned to each other, so sequences added later in the process may be incorrectly aligned. All of the multiple alignment engines in MegAlign Pro use a “guide tree,” based on pairwise similarities between sequences, to determine the order in which sequences are aligned. The first pair chosen consists of the two sequences that are closest on the guide tree. If the nearest neighbor of this pair is more distant than the two members of some other pair are from each other, that other pair is aligned next. If not, the neighbor is aligned to the first pair, with gaps added as necessary. In later rounds there may be no single sequences left, only clusters of two or more previously aligned sequences. Now imagine a case where one member of an early-aligned group should have been added later, or where a close relative was added too late. It’s hard to know when this has happened unless you have some a priori information, such as knowledge of the evolutionary relationships within your group of sequences.
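
The ordering rule can be illustrated with guide-tree construction by average-linkage (UPGMA-style) clustering, which is one common way of building a guide tree. The following sketch rests on stated assumptions and is not the exact procedure used by MegAlign Pro or by Clustal, MUSCLE or MAFFT. Each of those tools has its own distance measures and tree-building details. The function name `guide_tree_merge_order` and the four-sequence distance table are hypothetical.

```python
from itertools import combinations


def guide_tree_merge_order(names, dist):
    """Return the order in which clusters are joined (UPGMA-style sketch).

    names: sequence labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance (hypothetical values)
    """
    # Each cluster is a tuple of member labels; start with singletons.
    clusters = [(n,) for n in names]

    def cluster_distance(c1, c2):
        # Average linkage: mean of all member-to-member distances.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    order = []
    while len(clusters) > 1:
        # Join the two closest clusters, whether singletons or earlier groups.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        order.append((c1, c2))
    return order


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    d = {frozenset(p): v for p, v in [
        (("A", "B"), 0.10), (("C", "D"), 0.20), (("A", "C"), 0.50),
        (("A", "D"), 0.60), (("B", "C"), 0.55), (("B", "D"), 0.65)]}
    for left, right in guide_tree_merge_order(names, d):
        print(left, "+", right)
```

With these made-up distances, A and B are joined first. C is the nearest neighbor of the A+B pair, at an average distance of 0.525. C and D are only 0.2 from each other, so C and D are paired next instead of C joining A+B. The two groups are merged last.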

The bottom line is that when you examine just one pair from a multiple sequence alignment, you may not see the same result as a pairwise alignment of those two sequences alone. In such cases, aligning the pair directly may give a better picture of how closely the two are related.
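
A pair viewed inside a multiple alignment still carries columns and gaps that were created for other sequences. The sketch below uses a hypothetical helper, `project_pair`, and a made-up three-sequence alignment. It extracts two rows and drops the columns where both rows are gapped, which is roughly what you see when you look at one pair within an MSA.

```python
def project_pair(msa, i, j):
    """Extract rows i and j from a multiple alignment ('-' = gap) and drop
    columns where both rows are gapped; those columns exist only because
    of other sequences in the MSA."""
    a, b = msa[i], msa[j]
    kept = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


if __name__ == "__main__":
    # Hypothetical toy MSA; the third sequence forces extra gap columns.
    msa = ["AC--T",
           "A-G-T",
           "ACGAT"]
    row_a, row_b = project_pair(msa, 0, 1)
    print(row_a)  # AC-T
    print(row_b)  # A-GT
```

In the projected pair, the sequences ACT and AGT keep two offsetting gaps. A direct pairwise alignment of ACT and AGT might instead place C opposite G as a single mismatch, depending on the scoring parameters.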

Which is better, multiple or pairwise alignment?

This question is difficult to answer because it very much depends on how the alignment is going to be used.

Mechanistically, the best sequence alignment is the one that produces the fewest mismatches. That metric can be misleading, however, especially if minimizing mismatches requires extreme amounts of gapping. Consider an example where the goal is to identify particular conserved domains or a large insertion. Here the placement of gaps outside the regions of interest may well be of limited concern. Note that with MegAlign Pro, you can select interesting regions identified by a multiple alignment and copy them as subsequences to a new document for further analysis.
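
Here is a toy illustration of why the mismatch count alone can mislead. The sketch uses a hypothetical helper, `summarize`, and made-up sequences. It counts mismatches, gapped positions and gap openings in two alignments of the same pair of sequences.

```python
def summarize(row_a, row_b):
    """Count mismatches, gapped positions and gap openings in a
    two-row pairwise alignment ('-' marks a gap)."""
    mismatches = gap_positions = gap_opens = 0
    prev_gap_in = None  # which row was gapped in the previous column, if any
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gap_positions += 1
            gap_in = "a" if x == "-" else "b"
            if gap_in != prev_gap_in:
                gap_opens += 1  # a new gap starts in this column
            prev_gap_in = gap_in
        else:
            prev_gap_in = None
            if x != y:
                mismatches += 1
    return {"mismatches": mismatches,
            "gap_positions": gap_positions,
            "gap_opens": gap_opens}


if __name__ == "__main__":
    # The same two sequences, GATTACA and GACTACA, aligned two ways.
    print(summarize("GATTACA", "GACTACA"))
    # {'mismatches': 1, 'gap_positions': 0, 'gap_opens': 0}
    print(summarize("GAT-TACA", "GA-CTACA"))
    # {'mismatches': 0, 'gap_positions': 2, 'gap_opens': 2}
```

The ungapped alignment has one mismatch. The gapped alignment has none, but only because it opens two gaps. Whether that trade is worthwhile depends on how gaps are penalized.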

Now consider a situation where a multiple sequence alignment is used to represent the actual relatedness of a group of sequences. Here the alignment is essentially a model, typically of an evolutionary process. In this case the “best” alignment is the one that is most plausible in light of some biological theory or model. One way to visualize this, of course, is to use the alignment to build an evolutionary tree.

Now we are ready to take a more detailed look at some examples of alignment workflows.

Three types of pairwise alignment: Local, Global and Semi-Global

The three types of alignment are actually quite similar, although they can produce very different results. All use a method called dynamic programming to find the best-scoring alignment between two sequences. Alignment scores are computed by adding up per-position match scores, then subtracting a penalty for each gap opened (regardless of its length) and a further penalty, often called the gap extension penalty, for each gapped position. The match scores come from a scoring matrix such as NUC42 or BLOSUM62.
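
The scoring scheme described above can be written as a short function. The sketch makes three assumptions:

- A simple match/mismatch function stands in for a real matrix such as NUC42 or BLOSUM62.
- The per-position gap penalty applies to every gapped position, including the first, as the description above implies. Some tools charge it only after the first position.
- The names `alignment_score` and `simple_score` and the penalty values are hypothetical.

```python
def simple_score(x, y):
    # Hypothetical stand-in for a scoring matrix: +2 match, -1 mismatch.
    return 2 if x == y else -1


def alignment_score(row_a, row_b, score, gap_open, gap_extend):
    """Score a two-row alignment ('-' = gap) with an affine gap model.

    score(x, y): substitution score for aligned residues x and y
    gap_open:    penalty charged once per gap, regardless of its length
    gap_extend:  penalty charged for every gapped position
    """
    total = 0
    prev_gap_in = None
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gap_in = "a" if x == "-" else "b"
            if gap_in != prev_gap_in:
                total -= gap_open  # a new gap starts in this column
            total -= gap_extend    # every gapped position costs extra
            prev_gap_in = gap_in
        else:
            total += score(x, y)
            prev_gap_in = None
    return total


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GACTACA", simple_score, 5, 1))    # 11
    print(alignment_score("GAT-TACA", "GA-CTACA", simple_score, 5, 1))  # 0
```

With a gap opening penalty of 5 and a per-position penalty of 1, the ungapped alignment with one mismatch scores 11 and the gap-filled alignment with no mismatches scores 0. Changing these parameters can change which alignment is preferred.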

Tip: It’s always a good idea to explore the effect of different settings for these three parameters (scoring matrix, gap opening penalty and gap extension penalty) to see if you can get a more desirable outcome.

Depending on your two sequences, the three methods can potentially yield widely different results, so it’s important to understand how they differ.

Local Pairwise Alignment

MegAlign Pro’s local alignment algorithm, a modernized variant of the one described by Smith and Waterman (1981), is designed specifically to find the highest-scoring aligned segments of two sequences, even if the full extent of the two is not included in the final alignment. (Note: in MegAlign Pro, the “Show Context” check box in the Style Panel lets you display any unaligned parts of the sequences flanking the aligned segments.)
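
For reference, here is a minimal textbook Smith-Waterman sketch. It is not MegAlign Pro’s implementation. It assumes simple match/mismatch scores and a linear gap penalty, and its default values are hypothetical. A scoring matrix with separate gap opening and extension penalties would need the extra bookkeeping of Gotoh’s affine-gap variant.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment (Smith-Waterman) with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b) for the highest-scoring
    pair of local segments.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            # The 0 lets a new local alignment start at any cell.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTTGATTACATTTT", "CCGATTACACC"))
    # (14, 'GATTACA', 'GATTACA')
```

Two things separate this from a global aligner. The 0 in the recurrence lets an alignment start afresh anywhere. The traceback also begins at the best cell in the whole matrix rather than the bottom-right corner. In this example, only the shared GATTACA segment is reported and the flanking T and C runs are left out.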

Global Pairwise Alignment

The alternative to local alignment is global alignment. For this, MegAlign Pro uses two variants of the Needleman and Wunsch (1970) algorithm. Global aligners don’t look for the best-scoring segment; instead, they require that the full extent of both sequences be included in the result. There is no requirement or guarantee that the best-scoring pair of segments from a local alignment will also be aligned in a global alignment.
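
A textbook Needleman-Wunsch sketch shows what forces every residue into a global alignment. The assumptions are hypothetical match/mismatch scores and a linear gap penalty, and the code is not either of MegAlign Pro’s variants. The first row and column accumulate gap penalties, and the traceback always runs from the bottom-right corner to the top-left.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment (Needleman-Wunsch) with a linear gap penalty.

    Every residue of both sequences appears in the result.
    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalized: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right corner all the way to the top-left.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GATACA"))
    # (10, 'GATTACA', 'GA-TACA')
```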

Semi-Global Pairwise Alignment

Semi-global alignment is a relatively new approach that is particularly suitable when the two sequences differ greatly in length. In that case, the longer sequence overhangs one or both ends of the alignment. Because overhangs are represented with gaps, a global aligner will try to increase the match score and reduce the accumulated gap penalties by aligning parts of the shorter sequence to the overhanging region(s). This can produce a number of unrealistic, usually small aligned segments separated by gaps near the ends of the alignment. Semi-global alignment addresses this problem by not penalizing gaps in overhangs (also known as “end gaps”).
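
The semi-global idea can be sketched by changing only the boundary conditions of a global aligner. Leading gaps become free when the first row and column are left at zero. Trailing gaps become free when the alignment ends at the best cell anywhere in the last row or column. This is a minimal sketch with hypothetical scores and a linear gap penalty, not MegAlign Pro’s implementation. The approach is also known as end-gap-free alignment.

```python
def semi_global(a, b, match=2, mismatch=-1, gap=-2):
    """Semi-global (end-gap-free) alignment with a linear gap penalty.

    Leading and trailing gaps in either sequence cost nothing; interior
    gaps are penalized as in a global alignment.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # First row and column stay 0, so leading overhangs are free.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trailing overhangs are free too: end at the best cell in the last row or column.
    ends = [(F[n][j], n, j) for j in range(m + 1)] + [(F[i][m], i, m) for i in range(n + 1)]
    best, i, j = max(ends)
    tail_a = a[i:] + "-" * (m - j)
    tail_b = "-" * (n - i) + b[j:]

    out_a, out_b = [], []
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    # Whatever remains at the start is a leading overhang, padded with free end gaps.
    head_a = a[:i] + "-" * j
    head_b = "-" * i + b[:j]
    aligned_a = head_a + "".join(reversed(out_a)) + tail_a
    aligned_b = head_b + "".join(reversed(out_b)) + tail_b
    return best, aligned_a, aligned_b


if __name__ == "__main__":
    score, x, y = semi_global("CCCCCGATTACACCCCC", "GATTACA")
    print(score)  # 14
    print(x)      # CCCCCGATTACACCCCC
    print(y)      # -----GATTACA-----
```

The short sequence lines up with the matching segment of the long one, and the ten overhanging positions cost nothing. Under the same toy scores, any global alignment of these two sequences would include at least ten gapped positions, costing at least 20 points.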


The choice among these three pairwise approaches can make a real difference to the resulting alignment, and which one to use depends on your task. For basic cases, such as aligning two genes or proteins, local alignment is a good starting point, but when things get more complicated, global or semi-global alignment may be the way to go. Let’s look at some examples to better understand the differences between these methods.

We have created three illustrated, step-by-step tutorials to demonstrate how these different alignment methods compare, and when to use each one.

  • Tutorial 1: Follow a multiple alignment with Global pairwise alignments
  • Tutorial 2: Align transcripts to genes using Local and Global pairwise alignments
  • Tutorial 3: Use Local pairwise alignment to find a gene within a genome

Want to try them yourself? Start by installing our free trial of Lasergene. Then follow along with the projects using free data from the DNASTAR website.

Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.