I would like to understand what would be the optimal method of finding minimum tree coverage of tree nodes. Let me explain.
I have a self-referencing structure that represents a tree, with a limited depth of X.
Nodes in the tree can be logically “selected”. From the application perspective, it means that user would like to have some aggregate information about the selection.
Let’s say user picks nodes A, D, I, J, L and M.
What I would like to do is to be able to restructure the users selection in order to pick the minimum set of nodes that cover the entire selection.
For this example:
- nodes I and J can be covered by their common parent node F, so I pick F and remove I and J
- node M can be covered by it’s parent node H, so I pick H and remove M
- node D is already covered by node A, so I remove D
After that, nothing can be restructured further – algorithm stops and I should get the following selection.
First, I don’t know if “minimum coverage” is the good name.
Second, I can’t find any better phrase, so Google is not my best friend here.
Another thing to note is that the tree itself can be stored:
- in-memory
- in transactional database
- in OLAP database (not exactly self-referencing any more)
I’m stuck
EDIT
I have been thinking a lot about this problem, and there might be a solution that I have to analyse.
Following can be introduced:
- let’s call tree itself the “template tree”
- let’s call selection the “selection tree”
- each node of the template tree has a precalculated redundant attribute – number of children nodes
- the selection tree can be “dirty” or “clean”
- after a node is added, if nothing is done yet, it’s in the dirty state
- after the restructuring is done, it’s in the clean state
- each node of the selection tree has redundant attributes – number of children nodes selected and state: selected or ghost
- selection tree is a variant of in-memory doubly linked tree (child knows its parent, parent knows its children)
After that, adding a node in the selection tree (selection-node) would work like this.
Let’s call the new parent of the new node in selection tree a selection-parent, and parent of the new node in template tree a template-parent.
Let’s call the children of the selection tree node selection-children, and children of the template tree node template-children.
- Set dirty state of selection tree
- Remove all selection-children of selection-node
- If unexistent in selection tree, fetch template-parent of selection-node, along with its number of children attribute; add selection-parent as a ghost node, with selected number of children = 1
- If existent in selection tree, access selection-parent of selection-node and increment its selected number of children
- If selection-parent has selected number of children equal to number of template-children, promote its state from ghost to selected; treat that selection node as a new node and start at 1
- Set clean state of selection tree
I’m curious whether this algorithm already exists as an implementation somewhere. It’s behavior wouldn’t obviously be worse than O(log(n)), right? Of course, if it doesn’t lack something in the logic.