LCA Mixin

A mixin that implements Constant-Time Lowest Common Ancestor Retrieval.

“With the ability to solve lowest common ancestor queries in constant time, suffix trees can be used to solve many additional string problems.” — [Gusfield1997] §9, 196

Definition In a rooted tree T, a node u is an ancestor of a node v if u is on the unique path from the root to v. With this definition a node is an ancestor of itself. A proper ancestor of v refers to an ancestor that is not v.

Definition In a rooted tree T, the lowest common ancestor (lca) of two nodes x and y is the deepest node in T that is an ancestor to both x and y.” — [Gusfield1997] Chapter 8, 181ff

uint(x)

Convert a number into unsigned 32-bit representation.

Used only for debugging.

nlz(x)

Get the number of leadings zeros in a 32 bit word.

>>> nlz(0)
32
>>> nlz(0x1)
31
>>> nlz(0xFF)
24
>>> nlz(0xFFFFFFFF)
0

See: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt

See: [Warren2013] pages 99ff

msb(x)

Get the position of the most dignificat bit.

Get the position of the most significant set bit counting from the right and starting from 0.

>>> msb(0xF)
3
>>> msb(0xFF)
7
>>> msb(0)
-1
h(k)

Return the h value.

Definition For any number k, h(k) denotes the position (counting from the right) of the least-significant 1-bit in the binary representation of k.” — [Gusfield1997] §8.5, 184ff

Lemma 8.5.1. For any node k (node with path number k) in B, h(k) equals the height of node k in B.

For example, node 8 (binary 1000) is at height 4, and the path from it to a leaf has four nodes (three edges).” — [Gusfield1997] §8.5, 184ff

N.B. in this implementation we start counting with 0, so you get:

>>> h(5)
0
>>> h(8)
3
class Node

Mixin for Node to allow LCA retireval.

__init__()

This is inlined on all subclasses to save one __init__ call per symbol.

lca_id

Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182

I

For a node v of T, let I(v) be a node w in T such that h(w) is maximum over all nodes in the subtree of v (including v itself). — [Gusfield1997] §8.5, 184ff

For any node v, node I(v) is the deepest node in the run containing node v. — [Gusfield1997] Lemma 8.6.1., 187

N.B. This is the id of the node I(v).

A

Bit Av(i) is set to 1 if and only if node v has some ancestor in T that maps to height i in B, i.e. if and only if v has an ancestor u such that h(I(u))=i. — [Gusfield1997] §8.7, 188f

N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181

compute_A(A)

Compute A.

compute_I_and_L(L)

Compute I and L.

prepare_lca(counter)

Prepare the node for LCA retrievals.

class Leaf

A mixin for leaf nodes to allow for LCA retrievals.

__str__()

Return str(self).

compute_A(A)

Compute A.

compute_I_and_L(L)

Compute I and L.

prepare_lca(counter)

Prepare the leaf node for LCA retrieval.

lca_id

Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182

I

For a node v of T, let I(v) be a node w in T such that h(w) is maximum over all nodes in the subtree of v (including v itself). — [Gusfield1997] §8.5, 184ff

For any node v, node I(v) is the deepest node in the run containing node v. — [Gusfield1997] Lemma 8.6.1., 187

N.B. This is the id of the node I(v).

A

Bit Av(i) is set to 1 if and only if node v has some ancestor in T that maps to height i in B, i.e. if and only if v has an ancestor u such that h(I(u))=i. — [Gusfield1997] §8.7, 188f

N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181

class Internal

A mixin for internal nodes to allow for LCA retrievals.

__str__()

Return str(self).

compute_A(A)

Compute A.

compute_I_and_L(L)

Compute I and L.

prepare_lca(counter)

Prepare the internal node for LCA retrieval.

lca_id

Number of the node given in a depth-first traversal of the tree, starting with 1. See [Gusfield1997] Figure 8.1, 182

I

For a node v of T, let I(v) be a node w in T such that h(w) is maximum over all nodes in the subtree of v (including v itself). — [Gusfield1997] §8.5, 184ff

For any node v, node I(v) is the deepest node in the run containing node v. — [Gusfield1997] Lemma 8.6.1., 187

N.B. This is the id of the node I(v).

A

Bit Av(i) is set to 1 if and only if node v has some ancestor in T that maps to height i in B, i.e. if and only if v has an ancestor u such that h(I(u))=i. — [Gusfield1997] §8.7, 188f

N.B. A node is an ancestor of itself. — [Gusfield1997] §8.1, 181

class Tree(_d)

A mixin for suffix trees to allow for LCA retrievals.

__init__(_d)
nodemap

map from string position to leaf node

prepare_lca()

Preprocess the tree for Lowest Common Ancestor retrieval.

[Gusfield1997] §8.7

lca(x, y)

Return the lowest common ancestor node of nodes x and y.

>>> from suffix_tree import Tree
>>> tree = Tree({"A": "xabxac", "B": "awyawxawxz"})
>>> tree.prepare_lca()
>>> tree.lca(tree.nodemap["A"][1], tree.nodemap["B"][3]).lca_id
8