Nodes and the Visitor pattern

Dealing with non-trees

Episode 7
3 months ago
6 min read

In the very first episode of this series, we defined different types of node clusters and saw that some are more tricky to handle than others.

Now that we’ve built enough knowledge around the Visitor pattern, let’s see how we can leverage them to tackle the following three problematic types of node clusters.

  • Circle. When there is a circular dependency in the cluster.
    Problem: we can end up in an infinite loop when visiting our cluster.
  • Undirected circle. When a node can end up with multiple parents.
    Problem: nodes with multiple parents and their children end up being visited more than once.
  • Multiple root. When a cluster has many starting points.
    Problem: if we don’t want to choose a starting point, we have to evaluate all root nodes which will cause some of the nodes to be visited multiple times.

3 diagrams. From left to right: nodes forming a circular dependency; one node having multiple parents in a single-root cluster; a cluster of nodes with multiple roots.

Visiting once

One way to tackle all of these issues is to add a layer of caching in our Visitors that ensures every node of the cluster is only visited once.

This would fix the “Undirected circle” and “Multiple root” problems since we could confidently run visitors on our cluster without worrying about the consequences of a node being visited more than one — performance issues, unpredictable results, etc.

Additionally, this would fix the “Circle” problem by preventing us from entering an infinite loop.

Okay, let’s see how we can implement this. The idea here is to create a new abstract class that implements the required visitX methods and delegate the main logic to some new visitXOnce methods.

abstract class VisitOnlyOnceVisitor extends Visitor
{
    abstract public function visitNodeOnce(Node $node);

    public function visitNode(Node $node)
    {
        // Do something that ensures the node is visited only once...

        // Delegate to the concrete visitor.
        $this->visitNodeOnce($node);
    }
}

Now, any concrete visitor that wants to ensure nodes are visited only once needs to extend the VisitOnceVisitor class and implement the visitXOnce methods instead of the visitX methods.

Okay, now how do we ensure nodes are visited only once? For that, we’ll use a $visited array that keeps track of all nodes visited so far. Thanks to this array, we can now add a check before calling the visitXOnce method that ensures the current $node is not already part of the visited nodes.

abstract class VisitOnlyOnceVisitor extends Visitor
{
    protected array $visited = [];

    abstract public function visitNodeOnce(Node $node);

    public function visitNode(Node $node)
    {
        // Abort if the node was already visited.
        if ($this->wasVisited($node)) {
            return;
        }

        // Otherwise, mark the current node as visited.
        $this->markAsVisited($node);

        // And delegate to the concrete visitor.
        $this->visitNodeOnce($node);
    }

    protected function markAsVisited(Node $node)
    {
        $this->visited[] = $node;
    }

    protected function wasVisited(Node $node)
    {
        return in_array($node, $this->visited);
    }
}

Notice how we are calling markAsVisited before we delegate to the visitNodeOnce method. That is because, otherwise, this visitor would not protect us against infinite loops caused by circles.

Now, this implementation works well but could be improved slightly.

If you look at the if statement that aborts when the current node was already visited, you’ll notice we return nothing. This would be fine for visitors with no return value but it will yield strange results for all other visitors.

A better approach here would be to capture the return value of the visitXOnce method and store it with the visited node. That way, when we identify a node that was already visited before, we can return the same value it returned when it was first visited.

In order to do that though, we need to call markAsVisited after delegating to the visitNodeOnce which we said would remove our protection against circles. Thus, we'll call markAsVisited twice. Once before with the null value — since we don't yet know the return value to store — and once after with the real return value.

Here, we will store the node and its result using an associative array. We will assume each node has a unique key and use it to store the node and its result as a key/value pair. Note that, since PHP 8, we could also use WeakMaps to use the $node directly as a key instead of requiring a unique identifier.

abstract class VisitOnlyOnceVisitor extends Visitor
{
    protected array $visited = [];

    abstract public function visitNodeOnce(Node $node);

    public function visitNode(Node $node)
    {
        // Return the saved result if the node was already visited.
        if ($this->wasVisited($node)) {
            return $this->getVisitedResult($node);
        }

        // Otherwise, mark the node as visited with a 
        // null return value to protect against circles.
        $this->markAsVisited($node, null);

        // Then delegate to the concrete visitor and 
        // update its visited return value.
        $result = $this->visitNodeOnce($node);
        $this->markAsVisited($node, $result);
    }

    protected function markAsVisited(Node $node, $result)
    {
        $this->visited[$node->key] = $result;
    }

    protected function wasVisited(Node $node)
    {
        return isset($this->visited[$node->key]);
    }

    protected function getVisitedResult(Node $node)
    {
        return $this->visited[$node->key];
    }
}

And that’s all there is to it! Now, we can be confident that every node will be visited once and that the result of their first visit will be reused every time it is encountered again.

Protecting against circles

Even though our previous VisitOnceVisitor solves all of our problems including the infinite loop caused by circles, it will still likely cause unpredictable results.

If there is a circular dependency in our node cluster, it likely means that the result of each node in the circle is also expected to be infinite.

Imagine the following cluster composed of one type of node called PlusOne that can have one or zero children. That node simply takes the value of its child and add one to it unless it has no child and simply returns 1. Now imagine there is a circle in that cluster.

A circle of 8 "PlusOne" node all linked to each other with arrows pointing counter-clockwise.

What should the values of these nodes be? Where should we start the evaluation from?

If we used the VisitOnceVisitor here, we wouldn’t enter an infinite loop but we will end up with a result that is irrelevant for the cluster.

So what can we do about it? Well, in most cases, circles are simply not desired. The fact that they can happen — e.g. users creating their own workflows — doesn’t mean that they should. Thus, we will want to protect ourselves against circles by throwing an exception when we encounter one.

So let’s implement this! We’ll start by creating a new exception class that accept an array of nodes that have been detected as a circle.

class CircleDetectedException extends RuntimeException
{
    protected array $circle;

    public function __construct(array $circle)
    {
        parent::__construct('A circular dependency was detected in the cluster.');

        $this->circle = $circle;
    }

    public function getCircle(): array
    {
        return $this->circle;
    }
}

Next, we’ll create a ProtectAgainstCirclesVisitor that goes through the nodes whilst maintaining a $stack variable. The visitor pushes the current node to the stack before visiting its children and pop it out of the stack afterwards. As a result, the $stack array represents a vertical path starting from the root node and finishing at the parent of the current node.

A tree of nodes where each node is labelled from A to H. Node G is highlighted. G's parent is C and C's parent is the root node A. On the right side of the tree, there is an arrow pointing right. Next to that arrow is a subset of the tree of the left made of only A and C. The tree on the right is labelled "Stack".

With our $stack variable, we can now detect circles in our cluster. If, when visiting a node, we see that the node is already present in the stack, then we know we have a cyclic path in our cluster. When that’s the case we simply return the exception we created above and use the current $stack variable as an argument since we know the stack contains the cyclic path we’ve discovered.

Therefore, we end up with the following implementation.

class ProtectAgainstCirclesVisitor extends Visitor
{
    protected array $stack = [];

    public function visitNode(Node $node)
    {
        if ($this->stackContains($node)) {
            throw new CircleDetectedException($this->stack);
        }

        $this->push($node);

        foreach ($node->children as $child) {
            $child->accept($this);
        }

        $this->pop();
    }

    protected function push(Node $node)
    {
        $this->stack[] = $node;
    }

    protected function pop()
    {
        return array_pop($this->stack);
    }

    protected function stackContains(Node $node)
    {
        return in_array($node, $this->stack);
    }
}

Conclusion

That’s it, we now have some handy visitors we can reuse and tweak to handle clusters of nodes that are more complex than trees.

This episode concludes our "Nodes and the Visitor pattern" series. I hope you enjoyed it and learned a few new things out of it.

If you enjoyed this series, you might be interested in a follow-up series where we will create our own language using lexers, parsers, ASTs and visitors. It’s not out yet but you can follow me on Twitter to get notified when it is.

Cheers! 🌿

Discussions

Would you like to chime in?

You must be a member to start a new discussion.

Fortunately, it only takes two click to become one. See you on the other side! 🌸

Become a Member