Node

Node classes for a Generalized Suffix Tree.

class Node(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)

The abstract base class for internal and leaf nodes.

__init__(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)

This is inlined on all subclasses to save one __init__ call per symbol.

parent

The parent of this node. Used by Ukkonen and LCA.

S

The sequence of symbols. In a generalized tree the first sequence that created this node. S[start:end] is the path-label of this node.

start

The start position in S of the path-label that goes into this node.

end

The end position in S of the path-label that goes into this node.

name: str

A name that can be given to the node. The name is used only for debugging. It is reported in the debug dot graph.

__str__() str

Return a string representaion of this node for debugging.

__getitem__(i: int) Hashable

Return the symol at position i.

Note: this should implement slices but we don’t need them. Also we don’t need negative indices.

compute_C() Set[object]

Calculate the \(C(v)\) number for this node and all its children.

compute_left_diverse()

Calculate the left-diversity of this node and all its children.

pre_order(f) None

Walk the tree visiting each node before its children.

Parameters:

f (Callable) – the visitor function

post_order(f) None

Walk the tree visiting each node after its children.

Parameters:

f (Callable) – the visitor function

add_position(l: List[Tuple[object, Path]]) None

If this node is a Leaf, then add the path to the list.

get_positions() List[Tuple[object, Path]]

Get all paths that traverse this node.

common_substrings(_V: Dict[int, Tuple[int, object, Path]])

Common substrings recursive function.

maximal_repeats(a: list)

Maximal repeats recursive function.

to_dot() str

Output the tree in GraphViz .dot format.

Returns:

the tree in GraphViz dot format.

class Internal(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)

An internal node.

Internal nodes are created for the root and auxiliary node and when splitting edges. Internal nodes have at least 2 children and a fixed end.

__init__(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)

This is inlined on all subclasses to save one __init__ call per symbol.

Used by the McCreight and Ukkonen algorithms to speed up the construction of the tree.

Corollary 6.1.2 In any implicit suffix tree \(\mathcal{T}_i', if internal node `v\) has path-label \(x\alpha\), then there is a node \(s(v)\) of \(\mathcal{T}_i\) with path-label \(\alpha\).” — [Gusfield1997] page 98

children: dict[Hashable, Node]

A dictionary of item => node

is_left_diverse: bool | None

Definition A node \(v\) of \(\mathcal{T}\) is called left diverse if at least two leaves in \(v\)’s subtree have different left characters. By definition a leaf cannot be left diverse.

For each position \(i\) in string \(S\), character \(S(i-1)\) is called the left character of \(i\). The left character of a leaf of \(\mathcal{T}\) is the left character of the suffix position represented by that leaf. — [Gusfield1997] §7.12.1, page 144ff

N.B. This suffix tree operates on any python hashable object, not just characters, so left characters usually are objects.

C: int

Definition For any internal node \(v\) of \(\mathcal{T}\), define \(C(v)\) to be the number of distinct string identifiers that appear at the leaves in the subtree of \(v\). — [Gusfield1997] §7.6, page 127ff

__str__() str

Return a string representaion of this node for debugging.

add_position(l: List[Tuple[object, Path]]) None

If this node is a Leaf, then add the path to the list.

pre_order(f) None

Walk the tree visiting each node before its children.

Parameters:

f (Callable) – the visitor function

post_order(f) None

Walk the tree visiting each node after its children.

Parameters:

f (Callable) – the visitor function

find_path(S: Sequence[Hashable], start: int, end: int) Tuple[Node, int, Node | None]

Find a path starting from this node.

The path is absolute.

Returns the deepest node on the path, the matched length of the path, and also the next deeper node if the matched length is longer than the string-depth of the deepest node on the path.

split_edge(new_len: int, child: Node) Internal

Split the edge from self to child.

Split self –> child into self –> new_node –> child and return new_node.

Parameters:
  • new_len (int) – the length of the path into the new node

  • child (Node) – indicates the edge to split

Returns:

the new node

compute_C() Set[object]

Calculate the \(C(v)\) number for this node and all its children.

compute_left_diverse() Set | None

Calculate the left-diversity of this node and all its children.

common_substrings(V: Dict[int, Tuple[int, object, Path]])

Common substrings recursive function.

maximal_repeats(a: List[Tuple[int, Path]])

Maximal repeats recursive function.

class Leaf(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: int)

A leaf node.

Leaf nodes are created when a new suffix is inserted into the tree.

A suffix tree contains exactly \(len(S)\) leaf nodes. A generalized suffix tree contains less than \(len (concat (S_1..S_N))\) leaf nodes.

__init__(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: int)

This is inlined on all subclasses to save one __init__ call per symbol.

str_id: object

The id of the sequence that created this node.

__str__() str

Return a string representaion of this node for debugging.

pre_order(f) None

Walk the tree visiting each node before its children.

Parameters:

f (Callable) – the visitor function

post_order(f) None

Walk the tree visiting each node after its children.

Parameters:

f (Callable) – the visitor function

add_position(l: List[Tuple[object, Path]]) None

If this node is a Leaf, then add the path to the list.

compute_C() Set[object]

Calculate the \(C(v)\) number for this node and all its children.

compute_left_diverse()

Calculate the left-diversity of this node and all its children.

maximal_repeats(a: list)

Maximal repeats recursive function.

class UkkonenLeaf(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: List[int])

A leaf node.

UkkonenLeaf nodes have mutable ends. See: end.

A suffix tree contains exactly \(len(S)\) leaf nodes. A generalized suffix tree contains less than \(len (concat (S_1..S_N))\) leaf nodes.

__init__(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: List[int])

This is inlined on all subclasses to save one __init__ call per symbol.

str_id: object

The id of the sequence that created this node.

property end: int

Return the end of the path.

This is a calculated property because in Ukkonen’s algorithm the end of a leaf node is implicitly assumed to be the end of the current phase.

“Any transition of \(STree(T_{i-1})\) leading to a leaf is called an open transition. Such a transition is of the form \(g'(s,(k,i-1)) = r\) where, as stated above, the right pointer has to point to the last position \(i-1\) of \(T_{i-1}\). Therefore it is not necessary to represent the actual value of the right pointer. Instead, open transitions are represented as \(g'(s,(k,\infty)) = r\) where \(\infty\) indicates that this transition is ‘open to grow’.”

[Ukkonen1995]

This will return the current length since at the start of each phase we pop a symbol off the iterable and append it to self.S.

property depth: int

Return the string-depth of this node.

That is the length of the label on the edge going into this node.

Definition For any node \(v\) in a suffix-tree, the string-depth of \(v\) is the number of characters in \(v\)’s label. — [Gusfield1997] §5.2, page 90f