Aligner
This module implements the aligner.
- class Data(score: float)
Private data class for the Needleman-Wunsch+Gotoh sequence aligner.
- __init__(score: float)
- score: float
The cumulative score up to this square.
- m: float
The score for this alignment as match.
- p: float
\(P_{m,n}\) in [Gotoh1982].
- q: float
\(Q_{m,n}\) in [Gotoh1982].
- backtrack: int | None
A copy of the backtracking matrix entry for debugging.
- prefilled: bool
True if this cell was prefilled.
- backtracked: bool
True if we actually backtracked through this entry.
- arrow() str
Return the appropriate arrow form.
- __str__()
Return the data as string useful for debugging.
- html() str
Return the data as HTML string useful for debugging.
- static str_size()
Return the size of the string returned by __str__.
- class Aligner(start_score: float = -1.0, open_score: float = -1.0, extend_score: float = -0.5)
A generic Needleman-Wunsch+Gotoh sequence aligner.
This implementation uses Gotoh’s improvements to get \(\mathcal{O}(mn)\) running time and reduce memory requirements to essentially the backtracking matrix only. In Gotoh’s technique the gap weight formula must be of the special form \(w_k = uk + v\) (affine gap). \(k\) is the gap size, \(v\) is the gap opening score and \(u\) the gap extension score.
The aligner is type-agnostic. When the aligner wants to compare two objects, it calls the method
similarity()
with both objects as arguments. This method should return the score of the alignment. The score should increase with the desirability of the alignment, but otherwise there are no fixed rules.The score must harmonize with the penalties for inserting gaps. If the score for opening a gap is -1.0 (the default) then a satisfactory match should return a score > 1.0.
The
similarity()
function may consult a PAM or BLOSUM matrix, or compute a hamming distance between the arguments. It may also use auxiliary data like Part-of-Speech tags. In this case the data type aligned could be a dict containing the word and the POS-tag.- __init__(start_score: float = -1.0, open_score: float = -1.0, extend_score: float = -0.5)
- start_score: float
The gap opening score at the start of the string. Set this to 0 to find local alignments.
- open_score: float
The gap opening score \(v\).
- extend_score: float
The gap extension score \(u\).
- align(seq_a: Sequence[object], seq_b: Sequence[object], similarity: Callable[[object, object], float], gap_a: Callable[[], object] | None = None, gap_b: Callable[[], object] | None = None) Tuple[Sequence[object], Sequence[object], float]
Align two sequences.
- Parameters:
seq_a – the first sequence to align
seq_b – the second sequence to align
similarity – a callable that returns the similarity of two members of seq_a and seq_b
gap_a – insert gap_a() for a gap in sequence a. None inserts None.
gap_b – insert gap_b() for a gap in sequence b. None inserts gap_a().
- Returns:
the aligned sequences and the score
- align_debug(seq_a: Sequence[object], seq_b: Sequence[object], similarity: Callable[[object, object], float], gap_a: Callable[[], object] | None = None, gap_b: Callable[[], object] | None = None, debug=False) Tuple[Sequence[object], Sequence[object], float, List[List[Data]]]
Align two sequences and return debug information.
- Parameters:
seq_a – the first sequence to align
seq_b – the second sequence to align
similarity – a callable that returns the similarity of two members of seq_a and seq_b
gap_a – insert gap_a() for a gap in sequence a. None inserts None.
gap_b – insert gap_b() for a gap in sequence b. None inserts gap_a().
debug – calculate debug information if True, debug info wastes memory
- Returns:
the aligned sequences, the score, and the debug matrix