LCA Mixin
A mixin that implements Constant-Time Lowest Common Ancestor Retrieval.
“With the ability to solve lowest common ancestor queries in constant time, suffix trees can be used to solve many additional string problems.” — [Gusfield1997] §9, 196
“Definition In a rooted tree \(\mathcal{T}\), a node \(u\) is an ancestor of a node \(v\) if \(u\) is on the unique path from the root to \(v\). With this definition a node is an ancestor of itself. A proper ancestor of \(v\) refers to an ancestor that is not \(v\).
Definition In a rooted tree \(\mathcal{T}\), the lowest common ancestor (lca) of two nodes \(x\) and \(y\) is the deepest node in \(\mathcal{T}\) that is an ancestor to both \(x\) and \(y\).” — [Gusfield1997] Chapter 8, 181ff
- uint(x)
Convert a number into unsigned 32-bit representation.
Used only for debugging.
- nlz(x)
Get the number of leadings zeros in a 32 bit word.
>>> nlz(0) 32 >>> nlz(0x1) 31 >>> nlz(0xFF) 24 >>> nlz(0xFFFFFFFF) 0
See: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt
See: [Warren2013] pages 99ff
- msb(x)
Get the position of the most dignificat bit.
Get the position of the most significant set bit counting from the right and starting from 0.
>>> msb(0xF) 3 >>> msb(0xFF) 7 >>> msb(0) -1
- h(k)
Return the h value.
“Definition For any number \(k\), \(h(k)\) denotes the position (counting from the right) of the least-significant 1-bit in the binary representation of \(k\).” — [Gusfield1997] §8.5, 184ff
“Lemma 8.5.1. For any node \(k\) (node with path number k) in \(\mathcal{B}\), \(h(k)\) equals the height of node \(k\) in \(\mathcal{B}\).
For example, node 8 (binary 1000) is at height 4, and the path from it to a leaf has four nodes (three edges).” — [Gusfield1997] §8.5, 184ff
N.B. in this implementation we start counting with 0, so you get:
>>> h(5) 0 >>> h(8) 3
- class Node
Mixin for Node to allow LCA retireval.
- __init__()
This is inlined on all subclasses to save one __init__ call per symbol.
- lca_id
Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182
- I
For a node \(v\) of \(\mathcal{T}\), let \(I(v)\) be a node \(w\) in \(\mathcal{T}\) such that \(h(w)\) is maximum over all nodes in the subtree of \(v\) (including \(v\) itself). — [Gusfield1997] §8.5, 184ff
For any node \(v\), node \(I(v)\) is the deepest node in the run containing node \(v\). — [Gusfield1997] Lemma 8.6.1., 187
N.B. This is the id of the node \(I(v)\).
- A
Bit \(A_v(i)\) is set to 1 if and only if node \(v\) has some ancestor in \(\mathcal{T}\) that maps to height \(i\) in \(\mathcal{B}\), i.e. if and only if \(v\) has an ancestor \(u\) such that \(h(I(u))=i\). — [Gusfield1997] §8.7, 188f
N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181
- compute_A(A)
Compute A.
- compute_I_and_L(L)
Compute I and L.
- prepare_lca(counter)
Prepare the node for LCA retrievals.
- class Leaf
A mixin for leaf nodes to allow for LCA retrievals.
- __str__()
Return str(self).
- compute_A(A)
Compute A.
- compute_I_and_L(L)
Compute I and L.
- prepare_lca(counter)
Prepare the leaf node for LCA retrieval.
- lca_id
Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182
- I
For a node \(v\) of \(\mathcal{T}\), let \(I(v)\) be a node \(w\) in \(\mathcal{T}\) such that \(h(w)\) is maximum over all nodes in the subtree of \(v\) (including \(v\) itself). — [Gusfield1997] §8.5, 184ff
For any node \(v\), node \(I(v)\) is the deepest node in the run containing node \(v\). — [Gusfield1997] Lemma 8.6.1., 187
N.B. This is the id of the node \(I(v)\).
- A
Bit \(A_v(i)\) is set to 1 if and only if node \(v\) has some ancestor in \(\mathcal{T}\) that maps to height \(i\) in \(\mathcal{B}\), i.e. if and only if \(v\) has an ancestor \(u\) such that \(h(I(u))=i\). — [Gusfield1997] §8.7, 188f
N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181
- class Internal
A mixin for internal nodes to allow for LCA retrievals.
- __str__()
Return str(self).
- compute_A(A)
Compute A.
- compute_I_and_L(L)
Compute I and L.
- prepare_lca(counter)
Prepare the internal node for LCA retrieval.
- lca_id
Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182
- I
For a node \(v\) of \(\mathcal{T}\), let \(I(v)\) be a node \(w\) in \(\mathcal{T}\) such that \(h(w)\) is maximum over all nodes in the subtree of \(v\) (including \(v\) itself). — [Gusfield1997] §8.5, 184ff
For any node \(v\), node \(I(v)\) is the deepest node in the run containing node \(v\). — [Gusfield1997] Lemma 8.6.1., 187
N.B. This is the id of the node \(I(v)\).
- A
Bit \(A_v(i)\) is set to 1 if and only if node \(v\) has some ancestor in \(\mathcal{T}\) that maps to height \(i\) in \(\mathcal{B}\), i.e. if and only if \(v\) has an ancestor \(u\) such that \(h(I(u))=i\). — [Gusfield1997] §8.7, 188f
N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181
- class Tree(_d)
A mixin for suffix trees to allow for LCA retrievals.
- __init__(_d)
- nodemap
map from string position to leaf node
- prepare_lca()
Preprocess the tree for Lowest Common Ancestor retrieval.
[Gusfield1997] §8.7
- lca(x, y)
Return the lowest common ancestor node of nodes x and y.
>>> from suffix_tree import Tree >>> tree = Tree({"A": "xabxac", "B": "awyawxawxz"}) >>> tree.prepare_lca() >>> tree.lca(tree.nodemap["A"][1], tree.nodemap["B"][3]).lca_id 8