Aligner

This module implements the aligner.

class Data(score: float)

Private data class for the Needleman-Wunsch+Gotoh sequence aligner.

__init__(score: float)

score: float: The cumulative score up to this square.

m: float: The score for this alignment as match.

p: float: \(P_{m,n}\) in [Gotoh1982].

q: float: \(Q_{m,n}\) in [Gotoh1982].

backtrack: int | None: A copy of the backtracking matrix entry for debugging.

prefilled: bool: True if this cell was prefilled.

backtracked: bool: True if we actually backtracked through this entry.

arrow() → str: Return the appropriate arrow form.

__str__(): Return the data as string useful for debugging.

html() → str: Return the data as HTML string useful for debugging.

static str_size(): Return the size of the string returned by __str__.

class Aligner(start_score: float = -1.0, open_score: float = -1.0, extend_score: float = -0.5)

A generic Needleman-Wunsch+Gotoh sequence aligner.

This implementation uses Gotoh’s improvements to get \(\mathcal{O}(mn)\) running time and reduce memory requirements to essentially the backtracking matrix only. In Gotoh’s technique the gap weight formula must be of the special form \(w_k = uk + v\) (affine gap). \(k\) is the gap size, \(v\) is the gap opening score and \(u\) the gap extension score.

The aligner is type-agnostic. When the aligner wants to compare two objects, it calls the method similarity() with both objects as arguments. This method should return the score of the alignment. The score should increase with the desirability of the alignment, but otherwise there are no fixed rules.

The score must harmonize with the penalties for inserting gaps. If the score for opening a gap is -1.0 (the default) then a satisfactory match should return a score > 1.0.

The similarity() function may consult a PAM or BLOSUM matrix, or compute a hamming distance between the arguments. It may also use auxiliary data like Part-of-Speech tags. In this case the data type aligned could be a dict containing the word and the POS-tag.