LCA Mixin

A mixin that implements Constant-Time Lowest Common Ancestor Retrieval.

“With the ability to solve lowest common ancestor queries in constant time, suffix trees can be used to solve many additional string problems.” — [Gusfield1997] §9, 196

Definition In a rooted tree \(\mathcal{T}\), a node \(u\) is an ancestor of a node \(v\) if \(u\) is on the unique path from the root to \(v\). With this definition a node is an ancestor of itself. A proper ancestor of \(v\) refers to an ancestor that is not \(v\).

Definition In a rooted tree \(\mathcal{T}\), the lowest common ancestor (lca) of two nodes \(x\) and \(y\) is the deepest node in \(\mathcal{T}\) that is an ancestor to both \(x\) and \(y\).” — [Gusfield1997] Chapter 8, 181ff

uint(x)

Convert a number into unsigned 32-bit representation.

Used only for debugging.

nlz(x)

Get the number of leadings zeros in a 32 bit word.

>>> nlz(0)
32
>>> nlz(0x1)
31
>>> nlz(0xFF)
24
>>> nlz(0xFFFFFFFF)
0

See: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt

See: [Warren2013] pages 99ff

msb(x)

Get the position of the most dignificat bit.

Get the position of the most significant set bit counting from the right and starting from 0.

>>> msb(0xF)
3
>>> msb(0xFF)
7
>>> msb(0)
-1
h(k)

Return the h value.

Definition For any number \(k\), \(h(k)\) denotes the position (counting from the right) of the least-significant 1-bit in the binary representation of \(k\).” — [Gusfield1997] §8.5, 184ff

Lemma 8.5.1. For any node \(k\) (node with path number k) in \(\mathcal{B}\), \(h(k)\) equals the height of node \(k\) in \(\mathcal{B}\).

For example, node 8 (binary 1000) is at height 4, and the path from it to a leaf has four nodes (three edges).” — [Gusfield1997] §8.5, 184ff

N.B. in this implementation we start counting with 0, so you get:

>>> h(5)
0
>>> h(8)
3
class Node

Mixin for Node to allow LCA retireval.

__init__()

This is inlined on all subclasses to save one __init__ call per symbol.

lca_id

Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182

I

For a node \(v\) of \(\mathcal{T}\), let \(I(v)\) be a node \(w\) in \(\mathcal{T}\) such that \(h(w)\) is maximum over all nodes in the subtree of \(v\) (including \(v\) itself). — [Gusfield1997] §8.5, 184ff

For any node \(v\), node \(I(v)\) is the deepest node in the run containing node \(v\). — [Gusfield1997] Lemma 8.6.1., 187

N.B. This is the id of the node \(I(v)\).

A

Bit \(A_v(i)\) is set to 1 if and only if node \(v\) has some ancestor in \(\mathcal{T}\) that maps to height \(i\) in \(\mathcal{B}\), i.e. if and only if \(v\) has an ancestor \(u\) such that \(h(I(u))=i\). — [Gusfield1997] §8.7, 188f

N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181

compute_A(A)

Compute A.

compute_I_and_L(L)

Compute I and L.

prepare_lca(counter)

Prepare the node for LCA retrievals.

class Leaf

A mixin for leaf nodes to allow for LCA retrievals.

__str__()

Return str(self).

compute_A(A)

Compute A.

compute_I_and_L(L)

Compute I and L.

prepare_lca(counter)

Prepare the leaf node for LCA retrieval.

lca_id

Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182

I

For a node \(v\) of \(\mathcal{T}\), let \(I(v)\) be a node \(w\) in \(\mathcal{T}\) such that \(h(w)\) is maximum over all nodes in the subtree of \(v\) (including \(v\) itself). — [Gusfield1997] §8.5, 184ff

For any node \(v\), node \(I(v)\) is the deepest node in the run containing node \(v\). — [Gusfield1997] Lemma 8.6.1., 187

N.B. This is the id of the node \(I(v)\).

A

Bit \(A_v(i)\) is set to 1 if and only if node \(v\) has some ancestor in \(\mathcal{T}\) that maps to height \(i\) in \(\mathcal{B}\), i.e. if and only if \(v\) has an ancestor \(u\) such that \(h(I(u))=i\). — [Gusfield1997] §8.7, 188f

N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181

class Internal

A mixin for internal nodes to allow for LCA retrievals.

__str__()

Return str(self).

compute_A(A)

Compute A.

compute_I_and_L(L)

Compute I and L.

prepare_lca(counter)

Prepare the internal node for LCA retrieval.

lca_id

Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182

I

For a node \(v\) of \(\mathcal{T}\), let \(I(v)\) be a node \(w\) in \(\mathcal{T}\) such that \(h(w)\) is maximum over all nodes in the subtree of \(v\) (including \(v\) itself). — [Gusfield1997] §8.5, 184ff

For any node \(v\), node \(I(v)\) is the deepest node in the run containing node \(v\). — [Gusfield1997] Lemma 8.6.1., 187

N.B. This is the id of the node \(I(v)\).

A

Bit \(A_v(i)\) is set to 1 if and only if node \(v\) has some ancestor in \(\mathcal{T}\) that maps to height \(i\) in \(\mathcal{B}\), i.e. if and only if \(v\) has an ancestor \(u\) such that \(h(I(u))=i\). — [Gusfield1997] §8.7, 188f

N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181

class Tree(_d)

A mixin for suffix trees to allow for LCA retrievals.

__init__(_d)
nodemap

map from string position to leaf node

prepare_lca()

Preprocess the tree for Lowest Common Ancestor retrieval.

[Gusfield1997] §8.7

lca(x, y)

Return the lowest common ancestor node of nodes x and y.

>>> from suffix_tree import Tree
>>> tree = Tree({"A": "xabxac", "B": "awyawxawxz"})
>>> tree.prepare_lca()
>>> tree.lca(tree.nodemap["A"][1], tree.nodemap["B"][3]).lca_id
8