Node
Node classes for a Generalized Suffix Tree.
- class Node(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)
The abstract base class for internal and leaf nodes.
- __init__(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)
This is inlined on all subclasses to save one __init__ call per symbol.
- parent
The parent of this node. Used by Ukkonen and LCA.
- S
The sequence of symbols. In a generalized tree the first sequence that created this node. S[start:end] is the path-label of this node.
- start
The start position in S of the path-label that goes into this node.
- end
The end position in S of the path-label that goes into this node.
- name: str
A name that can be given to the node. The name is used only for debugging. It is reported in the debug dot graph.
- __str__() str
Return a string representaion of this node for debugging.
- __getitem__(i: int) Hashable
Return the symol at position i.
Note: this should implement slices but we don’t need them. Also we don’t need negative indices.
- compute_C() Set[object]
Calculate the \(C(v)\) number for this node and all its children.
- compute_left_diverse()
Calculate the left-diversity of this node and all its children.
- pre_order(f) None
Walk the tree visiting each node before its children.
- Parameters:
f (Callable) – the visitor function
- post_order(f) None
Walk the tree visiting each node after its children.
- Parameters:
f (Callable) – the visitor function
- add_position(l: List[Tuple[object, Path]]) None
If this node is a Leaf, then add the path to the list.
- maximal_repeats(a: list)
Maximal repeats recursive function.
- to_dot() str
Output the tree in GraphViz .dot format.
- Returns:
the tree in GraphViz dot format.
- class Internal(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)
An internal node.
Internal nodes are created for the root and auxiliary node and when splitting edges. Internal nodes have at least 2 children and a fixed end.
- __init__(parent: Internal | None, S: Sequence[Hashable], start: int, end: int)
This is inlined on all subclasses to save one __init__ call per symbol.
- suffix_link: Internal | None
Used by the McCreight and Ukkonen algorithms to speed up the construction of the tree.
“Corollary 6.1.2 In any implicit suffix tree \(\mathcal{T}_i', if internal node `v\) has path-label \(x\alpha\), then there is a node \(s(v)\) of \(\mathcal{T}_i\) with path-label \(\alpha\).” — [Gusfield1997] page 98
- is_left_diverse: bool | None
Definition A node \(v\) of \(\mathcal{T}\) is called left diverse if at least two leaves in \(v\)’s subtree have different left characters. By definition a leaf cannot be left diverse.
For each position \(i\) in string \(S\), character \(S(i-1)\) is called the left character of \(i\). The left character of a leaf of \(\mathcal{T}\) is the left character of the suffix position represented by that leaf. — [Gusfield1997] §7.12.1, page 144ff
N.B. This suffix tree operates on any python hashable object, not just characters, so left characters usually are objects.
- C: int
Definition For any internal node \(v\) of \(\mathcal{T}\), define \(C(v)\) to be the number of distinct string identifiers that appear at the leaves in the subtree of \(v\). — [Gusfield1997] §7.6, page 127ff
- __str__() str
Return a string representaion of this node for debugging.
- add_position(l: List[Tuple[object, Path]]) None
If this node is a Leaf, then add the path to the list.
- pre_order(f) None
Walk the tree visiting each node before its children.
- Parameters:
f (Callable) – the visitor function
- post_order(f) None
Walk the tree visiting each node after its children.
- Parameters:
f (Callable) – the visitor function
- find_path(S: Sequence[Hashable], start: int, end: int) Tuple[Node, int, Node | None]
Find a path starting from this node.
The path is absolute.
Returns the deepest node on the path, the matched length of the path, and also the next deeper node if the matched length is longer than the string-depth of the deepest node on the path.
- split_edge(new_len: int, child: Node) Internal
Split the edge from self to child.
Split self –> child into self –> new_node –> child and return new_node.
- Parameters:
new_len (int) – the length of the path into the new node
child (Node) – indicates the edge to split
- Returns:
the new node
- compute_C() Set[object]
Calculate the \(C(v)\) number for this node and all its children.
- compute_left_diverse() Set | None
Calculate the left-diversity of this node and all its children.
- class Leaf(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: int)
A leaf node.
Leaf nodes are created when a new suffix is inserted into the tree.
A suffix tree contains exactly \(len(S)\) leaf nodes. A generalized suffix tree contains less than \(len (concat (S_1..S_N))\) leaf nodes.
- __init__(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: int)
This is inlined on all subclasses to save one __init__ call per symbol.
- str_id: object
The id of the sequence that created this node.
- __str__() str
Return a string representaion of this node for debugging.
- pre_order(f) None
Walk the tree visiting each node before its children.
- Parameters:
f (Callable) – the visitor function
- post_order(f) None
Walk the tree visiting each node after its children.
- Parameters:
f (Callable) – the visitor function
- add_position(l: List[Tuple[object, Path]]) None
If this node is a Leaf, then add the path to the list.
- compute_C() Set[object]
Calculate the \(C(v)\) number for this node and all its children.
- compute_left_diverse()
Calculate the left-diversity of this node and all its children.
- maximal_repeats(a: list)
Maximal repeats recursive function.
- class UkkonenLeaf(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: List[int])
A leaf node.
UkkonenLeaf nodes have mutable ends. See:
end
.A suffix tree contains exactly \(len(S)\) leaf nodes. A generalized suffix tree contains less than \(len (concat (S_1..S_N))\) leaf nodes.
- __init__(parent: Internal, str_id: object, S: Sequence[Hashable], start: int, end: List[int])
This is inlined on all subclasses to save one __init__ call per symbol.
- str_id: object
The id of the sequence that created this node.
- property end: int
Return the end of the path.
This is a calculated property because in Ukkonen’s algorithm the end of a leaf node is implicitly assumed to be the end of the current phase.
“Any transition of \(STree(T_{i-1})\) leading to a leaf is called an open transition. Such a transition is of the form \(g'(s,(k,i-1)) = r\) where, as stated above, the right pointer has to point to the last position \(i-1\) of \(T_{i-1}\). Therefore it is not necessary to represent the actual value of the right pointer. Instead, open transitions are represented as \(g'(s,(k,\infty)) = r\) where \(\infty\) indicates that this transition is ‘open to grow’.”
This will return the current length since at the start of each phase we pop a symbol off the iterable and append it to self.S.
- property depth: int
Return the string-depth of this node.
That is the length of the label on the edge going into this node.
Definition For any node \(v\) in a suffix-tree, the string-depth of \(v\) is the number of characters in \(v\)’s label. — [Gusfield1997] §5.2, page 90f