Quantcast
Channel: Question and Answer » trees
Viewing all articles
Browse latest Browse all 36

What method should I use to populate a tree as I'm consuming data that is contained within the tree?

$
0
0

I have a tree that consists of 5 types of nodes – V, W, X, Y, and Z. V represents the root node, and a given tree has exactly one element of type V. A V contains one or more Ws, a W contains one or more Xs, an X contains one of more Ys, and a Y contains one or more Zs:

tree structure

Looking at this, I decided to model this with the following class structure:

class structure

Not shown in the class structure are the data elements that are unique to each type of node along with the methods to access those data elements. However, each node does have a unique set of attributes and operations.

To give a perspective on size, there could be up to thousands of nodes at the Z level. It’s going to be much smaller at the W level, as there are almost no cases of 1:1:1:1 mappings from W to Z.

My problem comes when I think about how to build this tree.

The data comes in what can be thought of as a stream. There is a unit that I’ll refer to as DataChunk. A DataChunk contains the data for one or more levels of the tree. If the DataChunk contains data for a Z, then it also contains data for a Y, X, and W. However, a DataChunk that contains data for X will contain data for W, but not necessarily Y.

In a stream, the start of a DataChunk is marked off by a sync word. In a set of datagrams, each DataChunk is a single datagram. In a set of files, the DataChunk could be presented as a set of files where 1 file is one DataChunk or 1 file contains multiple DataChunks, each marked by the sync word. The order is not specified.

In order to build the tree, I need to accept a DataChunk and parse information out of it.

One approach would be to have each node accept a DataChunk object, parse the information out, and set its own attributes and then see if it has a matching child to pass the node to. If the corresponding child exists, pass the data chunk to it. If the corresponding child doesn’t exist, create it and then pass the data chunk to it. However, this leaves me with a mutable object – after construction, I can pass an arbitrary DataChunk into any node and change the tree. This also appears to be a violation of SRP, as each data node is responsible for not only the data associated with it, but to help build the tree.

Another approach may be to have some kind of Builder approach, but this would balloon my classes to 10 – I would need the node data class and the builder class. It also seems like this would be rather inefficient, since the Builder would essentially be a second tree with a mutable interface that would return the tree structure described above. It doesn’t seem performant in terms of time or memory consumption.

I’m also somewhat concerned with flexibility going forward. The tree data model isn’t likely to change, but the format of the DataChunk may change or be supplemented with additional formats for the data to use to build the tree. That may be another point to the builder-style approach.

Is there a data-consuming / tree-building approach that I’ve overlooked? Would it be preferable to do what appears to be a violation of SRP and present a client with a mutable data structure (assuming they have or construct instances of DataChunk) or go with a more complex builder to present the client with a cleaner interface, but have a more complex code base.


Viewing all articles
Browse latest Browse all 36

Trending Articles