Nodes and the Visitor pattern

Traversing our nodes

Episode 3
4 months ago
12 min read

Now that we're familiar with nodes and visitors, let's study some useful techniques for making the most of our visitors.

Up and down the tree

Bottom-up

So far, with our calculator example, we've only seen examples of bottom-up visitors. These are visitors that require the result of descendants before being able to compute some logic.

For example, when our EvaluateVisitor comes across an Add node, it first needs to know the value of its $left and $right children before being able to add them together.

Our running example with arrows going from the leaves to the root with numbers on them to show the order

The triangle means we are starting the visitor on this node.

More generically, this means we compute our logic after running the accept method on our children.

public function visitSomeNode(SomeNode $node)
{
    foreach ($node->children as $child) {
        $child->accept($this);
    }
  
    // Our logic here...
}

Top-down

Top-down visitors are the exact opposite and do not require the result of descendants to operate. Consequently, they allow us to traverse our tree in a more predictable manner starting from the root and finishing on the leaves.

Their predictability can be particularly useful when handling state variables that updates from one node to the other as we will see in this article.

Our running example with arrows going from the root to the leaves with numbers on them to show the order

The create a top-down visitor, we simply need to compute our logic before running the accept method on our children.

public function visitSomeNode(SomeNode $node)
{
    // Our logic here...

    foreach ($node->children as $child) {
        $child->accept($this);
    }
}

Mix and match

Whilst it's good to stick to either a top-down or bottom-up visitor for simple operations, it's not uncommon to need visitors that operate both ways.

Additionally, when a node has more than one child, you may also need to:

  • Alternate the order in which children are calling the accept method.
  • Add logic in-between accept calls.

This all adds to the possibilities offered to you when traversing nodes via a visitor.

Our running example with arrows going down from root to leaves and then back up from leaves to root with numbers at each step to show the order of execution

On the image above, blue is for top-down, green for bottom-up and orange for logic between each branch.

public function visitSomeNode(SomeNode $node)
{
    // [Blue] Top-down logic here...
  
    // Control execution order...
    $children = $this->sortChildren($node->children);

    foreach ($children as $index => $child) {
        if ($index > 0) {
            // [Orange] In-between logic here...
        }

        $child->accept($this);
    }

    // [Green] Bottom-up logic here...
}

An analogy with middleware

If you're familiar with the Laravel framework and its Pipeline component, it can be compared with a tree of nodes where each node has exactly one child. Each node receives a $next callback and is expected to call it to continue the pipeline execution.

This is what Laravel uses to execute HTTP middleware before a request reaches our controllers. Each middleware calls the $next callback until the request is routed to the right controller. After that, the $next callback returns the HTTP response so that all middleware have a chance to intercept it on the way up.

6 chained middleware with arrows going down on the left side and going up on the right side with numbers to show the execution order. At the very bottom of the chain there is a dispatchToRouter leaf.

If you examine the handle method of a middleware you will see that the way it intercept the request and/or response is analogous to the top-down and bottom-up logic of visitors we've just seen.

public function handle(Request $request, Closure $next)
{
    // [Blue] Top-down logic here...
  
    $reponse = $next($request);

    // [Green] Bottom-up logic here...
}

Transforming the nodes

We've seen how we can visit our nodes up and down but, so far, we've only talked about visitors that output some sort of aggregates from our nodes — i.e. their evaluation and their printed value.

Let's now have a look at how we can alter the nodes themselves to create powerful visitors that can be chained together.

Some visitor examples

Let's start by defining a few visitors that we will implement and chain together in the following sections. We'll create two new visitors:

  • DoubleNumbersVisitor. A visitor that doubles the value of all Number nodes.
  • RemoveSubtractNodesVisitor. A visitor that replaces Subtract(A, B) with Add(A, Minus(B)).

We will then run these visitors in order and print the result using our PrintVisitor.

$calculator->accept(new DoubleNumbersVisitor());
$calculator->accept(new RemoveSubtractNodesVisitor());
echo $calculator->accept(new PrintVisitor());

Mutating visitors

When transforming nodes, we can either mutate the data directly on the nodes, or we can return a brand new node cluster containing the transformations. Let's see both techniques starting with the one mutating nodes.

First, we'll create a new super-class called NullVisitor that visits the children of each node (if any) but returns nothing.

class NullVisitor extends Visitor
{
    public function visitNumber(Number $node)
    {
        //
    }

    public function visitAdd(Add $node)
    {
        $node->left->accept($this);
        $node->right->accept($this);
    }

    public function visitSubtract(Subtract $node)
    {
        $node->left->accept($this);
        $node->right->accept($this);
    }
    
    public function visitMinus(Minus $node)
    {
        $node->node->accept($this);
    }

    public function visitAssign(Assign $node)
    {
        $node->node->accept($this);
    }

    public function visitGet(Get $node)
    {
        //
    }

    public function visitCalculator(Calculator $node)
    {
      	foreach ($node->statements as $statement) {
            $statement->accept($this);
        }
    }
}

This class is useful for our mutating visitors since they can now extend this class and only override the methods that mutate nodes.

Thus, our DoubleNumbersVisitor can be implemented like so:

class DoubleNumbersVisitor extends NullVisitor
{
    public function visitNumber(Number $node)
    {
        $node->value *= 2;
    }
}

Regarding the RemoveSubtractNodesVisitor, you might be tempted to implement it like this:

class RemoveSubtractNodesVisitor extends NullVisitor
{
    public function visitSubtract(Subtract $node)
    {
        $node->left->accept($this);
        $node->right->accept($this);
      
        return new Add($node->left, new Minus($node->right));
    }
}

However, the NullVisitor only allows us to change the variables inside our nodes, not to swap one node with another. Even if our NullVisitor returned the $node itself at the end of each visit method, it still wouldn't work because any parent node that returns itself will keep the reference of the original Subtract node.

And so we've reached the limit of mutating visitors. They may be good for some use-cases so it's good to know they exist. However, more often then not, you will reach for an immutable visitor instead.

Immutable visitors

Not only immutable visitors are safer to handle — since updating one cluster has no side effects on the rest of the code — they also allow us to alter the structure of the cluster itself.

Instead of using the NullVisitor above, we'll create a new super-class called IdentityVisitor that returns a deep clone of the visited cluster. Namely, it returns a new instance of each node using the result of the visited children (if any).

class IdentityVisitor extends Visitor
{
    public function visitNumber(Number $node)
    {
        return new Number($node->value);
    }

    public function visitAdd(Add $node)
    {
        $left = $node->left->accept($this);
        $right = $node->right->accept($this);

        return new Add($left, $right);
    }

    public function visitSubtract(Subtract $node)
    {
        $left = $node->left->accept($this);
        $right = $node->right->accept($this);

        return new Subtract($left, $right);
    }
    
    public function visitMinus(Minus $node)
    {
        $child = $node->node->accept($this);

        return new Minus($child);
    }

    public function visitAssign(Assign $node)
    {
        $child = $node->node->accept($this);

        return new Assign($node->variable, $child);
    }

    public function visitGet(Get $node)
    {
        return new Get($node->variable);
    }

    public function visitCalculator(Calculator $node)
    {
        $newStatements = [];

      	foreach ($node->statements as $statement) {
            $newStatements = $statement->accept($this);
        }

        return new Calculator($newStatements);
    }
}

Now we can implement our immutable DoubleNumbersVisitor like this:

class DoubleNumbersVisitor extends IdentityVisitor
{
    public function visitNumber(Number $node)
    {
        return new Number($node->value * 2);
    }
}

More importantly, we can now implement the RemoveSubtractNodesVisitor visitor by swapping the Subtract node with an Add node.

class RemoveSubtractNodesVisitor extends IdentityVisitor
{
    public function visitSubtract(Subtract $node)
    {
        $left = $node->left->accept($this);
        $right = $node->right->accept($this);
      
        return new Add($left, new Minus($right));
    }
}

Finally, we need to update the way we run these visitors things they now return a brand new cluster after each call.

$calculator = $calculator->accept(new DoubleNumbersVisitor());
$calculator = $calculator->accept(new RemoveSubtractNodesVisitor());
echo $calculator->accept(new PrintVisitor());

And that's all there is to it!

The headache of variables

In the previous episode, we've seen visitors that require variables. For example, our EvaluateVisitor needed an $assignments array to keep track of the values of all assigned variables. There is only so much a visitor can achieve without variables so it is important to know how to deal with them.

We can differentiate two types of variables inside visitors:

  • The variables that are simply shared across all nodes. They act as a pool of information accessible throughout the journey of a visitor — e.g. the $assignments array. That information can grow as we visit nodes but it does not directly shape how we visit nodes. Let's call these: Shared variables.
  • The variables that define a state that evolves as we progress through our nodes. The value of these variables directly affect how we visit nodes and so it is important to ensure their values are updated properly as we go down and back up of the tree. Let's call these: State variables.

Shared variables

Let's start with the easy one. Any shared memory we need to compute as we visit our nodes enters that category.

  • The $assignment array of our EvaluateVisitor.
  • A $histogram variable that would keep track of how many nodes of each type we've visited.
  • A $validator variable that would accumulate rules and data as we visit our nodes.
  • And so on.

Shared variables also include variables that act as options. For example, our PrintEvaluator could accept some variables like $indentation and $format that would affect how we print our calculator for better reusability.

Yes, they technically affect the way we visit our nodes and potentially the return value of our visitors but they are shared across the entire journey and do not define a current state that needs to be maintained.

State variables

These are the tricky ones. They are often a pain to get your head around but they allow us to take visitors to the next level by making them like powerful state machines.

To illustrate this we will have a look at a few examples. For that, let's simplify our calculator slightly for a moment. We will only keep the following nodes: Add, Minus and Number.

We'll create a visitor that immutably removes all Minus nodes by changing the sign of the Number nodes. Here are a few before/after examples:

3 before/after tree examples. One that transforms Minus Minus Number 5 to Number 5. One that transforms Minus Number 5 to Number -5. Finally, one that transforms Minus Add (Number 1, Number 5) to Add (Number -1, Number -5).

Okay, let's give it a shot. We create a RemoveMinusNodesVisitor class that:

  • Uses a $negative boolean to swap between a "Positive" and a "Negative" state.
  • Reverses the $negative state every time we reach a Minus node.
  • Removes all Minus nodes by returning their child node directly.
  • Revers the sign of the Number node's value if and only if we are in a "Negative" state.
  • Does nothing special when visiting the Add node since the $negative state will simply flow down its children.
class RemoveMinusNodesVisitor extends Visitor
{
    // We create a state variable that defines if
    // we are in a "Negative" or "Positive" state.
    protected bool $negative = false;

    public function visitNumber(Number $node)
    {
        // If we are in a "Negative" state,
        // we inverse the sign of the number.
        $multiplier = $this->negative ? -1 : 1;

        return new Number($node->value * $multiplier);
    }
  
    public function visitMinus(Minus $node)
    {
        // When we encounter a Minus node, we first 
        // reverse the "Negative" or "Positive" state.
        $this->negative = ! $this->negative;
  
        // Then we run our visitor down the child branch.
        $child = $node->node->accept($this);
      
        // Finally, we return the child directly
        // without wrapping it in a Minus node.
        return $child;
    }
  
    public function visitAdd(Add $node)
    {
        $left = $node->left->accept($this);
        $right = $node->right->accept($this);

        return new Add($left, $right);
    }
}

Our implementation uses a top-down state variable. That means, we set up the new state before visiting the child nodes.

Thus, let's have a look at some simple examples and we'll write the "Negative" or "Positive" state between each step.

2 examples of tree transformations where the state "Negative" and "Positive" is written at each step to see the progression of the state variable.

As you can see, the Minus nodes inverse the state variable properly and therefore the sign of the Number leaves are updated accordingly. Since the Minus nodes directly return their visited child, they are automatically removed from the tree.

At first glance, this seems to do the trick. However, there's a bug in that visitor. Have a look at this example.

A transformation of the Minus Add(Minus Number 1, Number 5) tree with a question mark on the arrow.

Since - (A + B) is equivalent to (- A) + (- B), we expect the state of the Add node in this example to be "Negative" for both the left and the right branch.

Since the "Negative" state is set by the first Minus node and since the Add node does not alter the state variable, this feels correct at first.

However, our state variable travels up-and-down the tree with our visitor and therefore any state changes applies to a child branch will eventually bubble back up to its parents.

In this particular case, the second Minus node on the left branch cause the state to become "Positive" and bubbles back up to the Add node which immediately visits the right branch without making any alterations. This means our right branch start with a "Positive" state instead of the expected "Negative" state making the final result wrong.

The same transformation showing that the state comes back "Positive" before visiting the right branch of the Add node causing the transformation to be wrong.

In this example, we can fix this by recording the state variable on the Add node before visiting the left branch and restoring it before visiting the right branch.

public function visitAdd(Add $node)
{
    // Record the state before visiting the left branch.
    $originalNegative = $this->negative;
    $left = $node->left->accept($this);
  
    // Restore it before visiting the right branch.
  	$this->negative = $originalNegative;
    $right = $node->right->accept($this);

    return new Add($left, $right);
}

And now our visitor works as expected!

The same transformation as before but with a "Negative" state before visiting the right branch of the Add node causing the transformation to be correct.

More generally, when dealing with state variables, it is a good idea to always restore the original state right before returning. You only need to do this for nodes that alter the state.

This removes the need to worry about unpredictable state bubbling up from one node to another. This was a simple example but imagine having a cluster of 20 types of node and 10 state variables and having to worry about all possible combinations.

For our RemoveMinusNodesVisitor, it would look like this:

class RemoveMinusNodesVisitor extends Visitor
{
    protected bool $negative = false;

    public function visitNumber(Number $node)
    {
        // Nothing to do here since we don't alter the state.
        $multiplier = $this->negative ? -1 : 1;

        return new Number($node->value * $multiplier);
    }
  
    public function visitMinus(Minus $node)
    {
        // Record state.
        $originalNegative = $this->negative;
      
        // Alter state.
        $this->negative = ! $this->negative;
      
        // Visit branches.
        $child = $node->node->accept($this);
      
        // Restore state.
  	    $this->negative = $originalNegative;
      
        return $child;
    }
  
    public function visitAdd(Add $node)
    {
        // Nothing to do here anymore since we don't alter the state.
        $left = $node->left->accept($this);
        $right = $node->right->accept($this);

        return new Add($left, $right);
    }
}

Now, this was a tough subject but I think it's important to be comfortable with state variables as, without them, there is only so much we can do with visitors.

Note that we've only talked about top-down state variables — meaning the state is set before visiting the children — but you could also have bottom-up state variables. These are useful when you actually want information from the children to bubble up to the parents. And of course, you can have state variables that carry useful information both down and up the tree!

Our running examples with a letter representing a state at each step of the visitor path. Letters go from A to R.

We won't talk about these in this series because I feel like I've used enough of your brain energy already but it's good to know that they exist and that they should be avoided at all costs if you don't like migraines.

Conclusion

Phew, congratulations on making it to the end of this article! 🎉

We now have so many new tools to manage our clusters of nodes.

The next two articles will be exclusive to sponsors. In these articles, we will learn yet another way to visit our nodes using what we call "breadth-first search" and we will then provide a thorough dictionary of visitor operations we can refer back to whenever we are handling nodes.

If you enjoy this series and you are not already a sponsor, it would mean the world to me if you could consider sponsoring me on GitHub. Not only you will unlock all "sponsor-only" articles on my blog but you will enable me to spend more time creating content and less time doing jobs that pay the bills.

After these sponsor-only episodes, another two public episodes will be available to read starting with how to persist our nodes in a database.

Discussions

Would you like to chime in?

You must be a member to start a new discussion.

Fortunately, it only takes two click to become one. See you on the other side! 🌸

Become a Member