Let’s say I want to count the number of reachable nodes for each node in the graph. I can do that using the following code:
void DFS(Node start, HashSet<Node> visited)
{
    if (start == null || visited.Contains(start)) return;
    visited.Add(start);
    foreach (var neighbor in start.Neighbors)
    {
        DFS(neighbor, visited);
    }
}
void MarkReachableCount(Graph g)
{
    foreach (var node in g.Nodes)
    {
        HashSet<Node> visited = [];
        DFS(node, visited);
        node.ReachableGraph = visited.Count;
    }
}
A major performance cost for this sort of operation is the allocation cost. We allocate a separate hash set for each node in the graph, and then allocate whatever backing store is needed for it. If you have a big graph with many connections, that is expensive.
A simple fix for that would be to use:
void MarkReachableCount(Graph g)
{
    HashSet<Node> visited = [];
    foreach (var node in g.Nodes)
    {
        visited.Clear();
        DFS(node, visited);
        node.ReachableGraph = visited.Count;
    }
}
This means that we have almost no allocations for the entire operation, yay!
This function also performs significantly worse than the previous one, even though it barely allocates. The reason for that? The call to Clear() is expensive. Take a look at the implementation: this method needs to zero out two internal arrays, and because the set is reused, those arrays grow as large as needed for the node with the most reachable nodes and never shrink. Let’s say we have a node that can reach 10,000 nodes. That means that for each node, we’ll have to clear a buckets array of about 14,000 items, as well as an entries array as big as the number of nodes we just visited.
No surprise that the allocating version was actually cheaper. We use the visited set for a short while, then discard it and get a new one. That means no expensive Clear() calls.
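To see where the time goes, here is a rough sketch of the shape of HashSet<T>.Clear(), simplified from the .NET reference source; the field names are illustrative, not exact:
// Roughly what HashSet<T>.Clear() has to do (simplified sketch; field names
// are illustrative). Both arrays scale with how big the set has ever been.
public void Clear()
{
    if (_count > 0)
    {
        Array.Clear(_buckets);            // bucket indexes, sized to capacity
        Array.Clear(_entries, 0, _count); // the entries we actually used
        _count = 0;
        _freeList = -1;
        _freeCount = 0;
    }
}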
The question is, can we do better? Before I answer that, let’s go a bit deeper into this analysis. Some of the main costs in HashSet<Node> are the calls to GetHashCode() and Equals(). For that matter, let’s look at the cost of the Neighbors array on the Node.
Take a look at the following options:
public record Node1(List<Node> Neighbors);
public record Node2(List<int> NeighborIndexes);
Let’s assume each node has about 10 to 20 neighbors. What is the cost in memory for each option? Node1 uses references (pointers), and will take 256 bytes just for the Neighbors backing array (a 32-capacity array x 8 bytes per reference). The Node2 version stores 4-byte indexes instead, so the same backing array is only 128 bytes, half of that memory.
This is an example of data-oriented design, and saving 50% of our memory costs is quite nice. HashSet<int> also benefits from JIT optimizations (no need to call GetHashCode(), etc., everything is inlined).
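To make that concrete, here is a hypothetical sketch of the traversal rewritten against Node2-style index lists, with HashSet<int> as the visited set (the nodes list parameter is an assumption for illustration):
// Sketch: the same DFS, expressed over Node2 indexes. Assumes `nodes` is the
// flat list of all nodes and NeighborIndexes points into it.
void DFS(List<Node2> nodes, int nodeIndex, HashSet<int> visited)
{
    if (!visited.Add(nodeIndex)) return; // Add() returns false if already visited
    foreach (var neighborIndex in nodes[nodeIndex].NeighborIndexes)
    {
        DFS(nodes, neighborIndex, visited);
    }
}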
We still have the problem of allocations vs. Clear(), though. Can we win?
Now that we have re-framed the problem using int indexes, there is a very obvious optimization opportunity: use a bit map (such as BitArray). We know upfront how many items we have, right? So we can allocate a single array and set the corresponding bit to mark that a node (by its index) is visited.
That dramatically reduces the costs of tracking whether we visited a node or not, but it does not address the costs of clearing the bitmap.
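For illustration, a minimal sketch of that straightforward bitmap version might look like this, using the BCL’s BitArray (from System.Collections) and the hypothetical index-based DFS from above, adapted to return a count:
// Sketch: one bit per node index. Visiting is cheap, but SetAll(false) still
// has to touch the whole backing array on every outer iteration.
void MarkReachableCount(List<Node2> nodes)
{
    var visited = new BitArray(nodes.Count);
    for (int i = 0; i < nodes.Count; i++)
    {
        visited.SetAll(false); // still an O(n) clear per node
        int count = DFS(nodes, i, visited);
        // store `count` on the node, as in the earlier versions
    }
}

int DFS(List<Node2> nodes, int nodeIndex, BitArray visited)
{
    if (visited[nodeIndex]) return 0;
    visited[nodeIndex] = true;
    int count = 1;
    foreach (var neighborIndex in nodes[nodeIndex].NeighborIndexes)
    {
        count += DFS(nodes, neighborIndex, visited);
    }
    return count;
}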
Here is how you can handle this scenario cheaply:
public class Bitmap
{
    private readonly ulong[] _data;
    private readonly ushort[] _versions;
    private int _version;

    public Bitmap(int size)
    {
        _data = new ulong[(size + 63) / 64];
        _versions = new ushort[_data.Length];
    }

    public void Clear()
    {
        // Clearing is usually just a version bump. Only when the counter
        // exceeds the ushort range do we pay for zeroing the backing arrays.
        if (_version++ < ushort.MaxValue)
            return;
        Array.Clear(_data);
        Array.Clear(_versions);
        _version = 0;
    }

    public bool Add(int index)
    {
        int arrayIndex = index >> 6; // which 64-bit word holds this bit
        if (_versions[arrayIndex] != _version)
        {
            // This word was last touched in an older generation, reset it lazily
            _versions[arrayIndex] = (ushort)_version;
            _data[arrayIndex] = 0;
        }
        int bitIndex = index & 63;
        ulong mask = 1UL << bitIndex;
        ulong old = _data[arrayIndex];
        _data[arrayIndex] |= mask;
        return (old & mask) == 0; // true if this is the first visit
    }
}
The idea is pretty simple: in addition to the bitmap, we also have another array that marks the version of each 64-bit word. To clear the bitmap, we just increment the version. When adding to the bitmap, we lazily reset the underlying word if its version doesn’t match the current one. Once every 64K calls to Clear(), we’ll need to pay the cost of actually resetting the backing stores, but that ends up being very cheap overall (and well worth the extra space of the versions array to handle the overflow).
The code is tight, requires no allocations, and performs very quickly.
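To close the loop, here is a hedged sketch of how the Bitmap could slot into the index-based traversal (the DFS signature and the count bookkeeping are assumptions carried over from the earlier sketches):
// Sketch: one Bitmap reused across all traversals. Clear() is just a version
// bump, so the per-node overhead is essentially the DFS itself.
void MarkReachableCount(List<Node2> nodes)
{
    var visited = new Bitmap(nodes.Count);
    for (int i = 0; i < nodes.Count; i++)
    {
        visited.Clear(); // cheap: increments the version
        int count = DFS(nodes, i, visited);
        // store `count` on the node, as in the earlier versions
    }
}

int DFS(List<Node2> nodes, int nodeIndex, Bitmap visited)
{
    if (!visited.Add(nodeIndex)) return 0; // Add() reports first-time visits
    int count = 1;
    foreach (var neighborIndex in nodes[nodeIndex].NeighborIndexes)
    {
        count += DFS(nodes, neighborIndex, visited);
    }
    return count;
}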