I have a tree that consists of 5 types of nodes – V
, W
, X
, Y
, and Z
. V
represents the root node, and a given tree has exactly one element of type V
. A V
contains one or more W
s, a W
contains one or more X
s, an X
contains one of more Y
s, and a Y
contains one or more Z
s:
Looking at this, I decided to model this with the following class structure:
Not shown in the class structure are the data elements that are unique to each type of node along with the methods to access those data elements. However, each node does have a unique set of attributes and operations.
To give a perspective on size, there could be up to thousands of nodes at the Z
level. It’s going to be much smaller at the W
level, as there are almost no cases of 1:1:1:1 mappings from W
to Z
.
My problem comes when I think about how to build this tree.
The data comes in what can be thought of as a stream. There is a unit that I’ll refer to as DataChunk
. A DataChunk
contains the data for one or more levels of the tree. If the DataChunk
contains data for a Z
, then it also contains data for a Y
, X
, and W
. However, a DataChunk
that contains data for X
will contain data for W
, but not necessarily Y
.
In a stream, the start of a DataChunk
is marked off by a sync word. In a set of datagrams, each DataChunk
is a single datagram. In a set of files, the DataChunk
could be presented as a set of files where 1 file is one DataChunk
or 1 file contains multiple DataChunk
s, each marked by the sync word. The order is not specified.
In order to build the tree, I need to accept a DataChunk
and parse information out of it.
One approach would be to have each node accept a DataChunk
object, parse the information out, and set its own attributes and then see if it has a matching child to pass the node to. If the corresponding child exists, pass the data chunk to it. If the corresponding child doesn’t exist, create it and then pass the data chunk to it. However, this leaves me with a mutable object – after construction, I can pass an arbitrary DataChunk
into any node and change the tree. This also appears to be a violation of SRP, as each data node is responsible for not only the data associated with it, but to help build the tree.
Another approach may be to have some kind of Builder approach, but this would balloon my classes to 10 – I would need the node data class and the builder class. It also seems like this would be rather inefficient, since the Builder would essentially be a second tree with a mutable interface that would return the tree structure described above. It doesn’t seem performant in terms of time or memory consumption.
I’m also somewhat concerned with flexibility going forward. The tree data model isn’t likely to change, but the format of the DataChunk
may change or be supplemented with additional formats for the data to use to build the tree. That may be another point to the builder-style approach.
Is there a data-consuming / tree-building approach that I’ve overlooked? Would it be preferable to do what appears to be a violation of SRP and present a client with a mutable data structure (assuming they have or construct instances of DataChunk
) or go with a more complex builder to present the client with a cleaner interface, but have a more complex code base.