Traversing our nodes
Now that we're familiar with nodes and visitors, let's study some useful techniques for making the most of our visitors.
Up and down the tree
Bottom-up
So far, with our calculator example, we've only seen examples of bottom-up visitors. These are visitors that require the result of descendants before being able to compute some logic.
For example, when our EvaluateVisitor
comes across an Add
node, it first needs to know the value of its $left
and $right
children before being able to add them together.
The triangle means we are starting the visitor on this node.
More generically, this means we compute our logic after running the accept
method on our children.
public function visitSomeNode(SomeNode $node)
{
foreach ($node->children as $child) {
$child->accept($this);
}
// Our logic here...
}
Top-down
Top-down visitors are the exact opposite and do not require the result of descendants to operate. Consequently, they allow us to traverse our tree in a more predictable manner starting from the root and finishing on the leaves.
Their predictability can be particularly useful when handling state variables that updates from one node to the other as we will see in this article.
The create a top-down visitor, we simply need to compute our logic before running the accept
method on our children.
public function visitSomeNode(SomeNode $node)
{
// Our logic here...
foreach ($node->children as $child) {
$child->accept($this);
}
}
Mix and match
Whilst it's good to stick to either a top-down or bottom-up visitor for simple operations, it's not uncommon to need visitors that operate both ways.
Additionally, when a node has more than one child, you may also need to:
- Alternate the order in which children are calling the
accept
method. - Add logic in-between
accept
calls.
This all adds to the possibilities offered to you when traversing nodes via a visitor.
On the image above, blue is for top-down, green for bottom-up and orange for logic between each branch.
public function visitSomeNode(SomeNode $node)
{
// [Blue] Top-down logic here...
// Control execution order...
$children = $this->sortChildren($node->children);
foreach ($children as $index => $child) {
if ($index > 0) {
// [Orange] In-between logic here...
}
$child->accept($this);
}
// [Green] Bottom-up logic here...
}
An analogy with middleware
If you're familiar with the Laravel framework and its Pipeline
component, it can be compared with a tree of nodes where each node has exactly one child. Each node receives a $next
callback and is expected to call it to continue the pipeline execution.
This is what Laravel uses to execute HTTP middleware before a request reaches our controllers. Each middleware calls the $next
callback until the request is routed to the right controller. After that, the $next
callback returns the HTTP response so that all middleware have a chance to intercept it on the way up.
If you examine the handle
method of a middleware you will see that the way it intercept the request and/or response is analogous to the top-down and bottom-up logic of visitors we've just seen.
public function handle(Request $request, Closure $next)
{
// [Blue] Top-down logic here...
$reponse = $next($request);
// [Green] Bottom-up logic here...
}
Transforming the nodes
We've seen how we can visit our nodes up and down but, so far, we've only talked about visitors that output some sort of aggregates from our nodes — i.e. their evaluation and their printed value.
Let's now have a look at how we can alter the nodes themselves to create powerful visitors that can be chained together.
Some visitor examples
Let's start by defining a few visitors that we will implement and chain together in the following sections. We'll create two new visitors:
-
DoubleNumbersVisitor
. A visitor that doubles the value of allNumber
nodes. -
RemoveSubtractNodesVisitor
. A visitor that replacesSubtract(A, B)
withAdd(A, Minus(B))
.
We will then run these visitors in order and print the result using our PrintVisitor
.
$calculator->accept(new DoubleNumbersVisitor());
$calculator->accept(new RemoveSubtractNodesVisitor());
echo $calculator->accept(new PrintVisitor());
Mutating visitors
When transforming nodes, we can either mutate the data directly on the nodes, or we can return a brand new node cluster containing the transformations. Let's see both techniques starting with the one mutating nodes.
First, we'll create a new super-class called NullVisitor
that visits the children of each node (if any) but returns nothing.
class NullVisitor extends Visitor
{
public function visitNumber(Number $node)
{
//
}
public function visitAdd(Add $node)
{
$node->left->accept($this);
$node->right->accept($this);
}
public function visitSubtract(Subtract $node)
{
$node->left->accept($this);
$node->right->accept($this);
}
public function visitMinus(Minus $node)
{
$node->node->accept($this);
}
public function visitAssign(Assign $node)
{
$node->node->accept($this);
}
public function visitGet(Get $node)
{
//
}
public function visitCalculator(Calculator $node)
{
foreach ($node->statements as $statement) {
$statement->accept($this);
}
}
}
This class is useful for our mutating visitors since they can now extend this class and only override the methods that mutate nodes.
Thus, our DoubleNumbersVisitor
can be implemented like so:
class DoubleNumbersVisitor extends NullVisitor
{
public function visitNumber(Number $node)
{
$node->value *= 2;
}
}
Regarding the RemoveSubtractNodesVisitor
, you might be tempted to implement it like this:
class RemoveSubtractNodesVisitor extends NullVisitor
{
public function visitSubtract(Subtract $node)
{
$node->left->accept($this);
$node->right->accept($this);
return new Add($node->left, new Minus($node->right));
}
}
However, the NullVisitor
only allows us to change the variables inside our nodes, not to swap one node with another. Even if our NullVisitor
returned the $node
itself at the end of each visit method, it still wouldn't work because any parent node that returns itself will keep the reference of the original Subtract
node.
And so we've reached the limit of mutating visitors. They may be good for some use-cases so it's good to know they exist. However, more often then not, you will reach for an immutable visitor instead.
Immutable visitors
Not only immutable visitors are safer to handle — since updating one cluster has no side effects on the rest of the code — they also allow us to alter the structure of the cluster itself.
Instead of using the NullVisitor
above, we'll create a new super-class called IdentityVisitor
that returns a deep clone of the visited cluster. Namely, it returns a new instance of each node using the result of the visited children (if any).
class IdentityVisitor extends Visitor
{
public function visitNumber(Number $node)
{
return new Number($node->value);
}
public function visitAdd(Add $node)
{
$left = $node->left->accept($this);
$right = $node->right->accept($this);
return new Add($left, $right);
}
public function visitSubtract(Subtract $node)
{
$left = $node->left->accept($this);
$right = $node->right->accept($this);
return new Subtract($left, $right);
}
public function visitMinus(Minus $node)
{
$child = $node->node->accept($this);
return new Minus($child);
}
public function visitAssign(Assign $node)
{
$child = $node->node->accept($this);
return new Assign($node->variable, $child);
}
public function visitGet(Get $node)
{
return new Get($node->variable);
}
public function visitCalculator(Calculator $node)
{
$newStatements = [];
foreach ($node->statements as $statement) {
$newStatements = $statement->accept($this);
}
return new Calculator($newStatements);
}
}
Now we can implement our immutable DoubleNumbersVisitor
like this:
class DoubleNumbersVisitor extends IdentityVisitor
{
public function visitNumber(Number $node)
{
return new Number($node->value * 2);
}
}
More importantly, we can now implement the RemoveSubtractNodesVisitor
visitor by swapping the Subtract
node with an Add
node.
class RemoveSubtractNodesVisitor extends IdentityVisitor
{
public function visitSubtract(Subtract $node)
{
$left = $node->left->accept($this);
$right = $node->right->accept($this);
return new Add($left, new Minus($right));
}
}
Finally, we need to update the way we run these visitors things they now return a brand new cluster after each call.
$calculator = $calculator->accept(new DoubleNumbersVisitor());
$calculator = $calculator->accept(new RemoveSubtractNodesVisitor());
echo $calculator->accept(new PrintVisitor());
And that's all there is to it!
The headache of variables
In the previous episode, we've seen visitors that require variables. For example, our EvaluateVisitor
needed an $assignments
array to keep track of the values of all assigned variables. There is only so much a visitor can achieve without variables so it is important to know how to deal with them.
We can differentiate two types of variables inside visitors:
- The variables that are simply shared across all nodes. They act as a pool of information accessible throughout the journey of a visitor — e.g. the
$assignments
array. That information can grow as we visit nodes but it does not directly shape how we visit nodes. Let's call these: Shared variables. - The variables that define a state that evolves as we progress through our nodes. The value of these variables directly affect how we visit nodes and so it is important to ensure their values are updated properly as we go down and back up of the tree. Let's call these: State variables.
Shared variables
Let's start with the easy one. Any shared memory we need to compute as we visit our nodes enters that category.
- The
$assignment
array of ourEvaluateVisitor
. - A
$histogram
variable that would keep track of how many nodes of each type we've visited. - A
$validator
variable that would accumulate rules and data as we visit our nodes. - And so on.
Shared variables also include variables that act as options. For example, our PrintEvaluator
could accept some variables like $indentation
and $format
that would affect how we print our calculator for better reusability.
Yes, they technically affect the way we visit our nodes and potentially the return value of our visitors but they are shared across the entire journey and do not define a current state that needs to be maintained.
State variables
These are the tricky ones. They are often a pain to get your head around but they allow us to take visitors to the next level by making them like powerful state machines.
To illustrate this we will have a look at a few examples. For that, let's simplify our calculator slightly for a moment. We will only keep the following nodes: Add
, Minus
and Number
.
We'll create a visitor that immutably removes all Minus
nodes by changing the sign of the Number
nodes. Here are a few before/after examples:
Okay, let's give it a shot. We create a RemoveMinusNodesVisitor
class that:
- Uses a
$negative
boolean to swap between a "Positive" and a "Negative" state. - Reverses the
$negative
state every time we reach aMinus
node. - Removes all
Minus
nodes by returning their child node directly. - Revers the sign of the
Number
node's value if and only if we are in a "Negative" state. - Does nothing special when visiting the
Add
node since the$negative
state will simply flow down its children.
class RemoveMinusNodesVisitor extends Visitor
{
// We create a state variable that defines if
// we are in a "Negative" or "Positive" state.
protected bool $negative = false;
public function visitNumber(Number $node)
{
// If we are in a "Negative" state,
// we inverse the sign of the number.
$multiplier = $this->negative ? -1 : 1;
return new Number($node->value * $multiplier);
}
public function visitMinus(Minus $node)
{
// When we encounter a Minus node, we first
// reverse the "Negative" or "Positive" state.
$this->negative = ! $this->negative;
// Then we run our visitor down the child branch.
$child = $node->node->accept($this);
// Finally, we return the child directly
// without wrapping it in a Minus node.
return $child;
}
public function visitAdd(Add $node)
{
$left = $node->left->accept($this);
$right = $node->right->accept($this);
return new Add($left, $right);
}
}
Our implementation uses a top-down state variable. That means, we set up the new state before visiting the child nodes.
Thus, let's have a look at some simple examples and we'll write the "Negative" or "Positive" state between each step.
As you can see, the Minus
nodes inverse the state variable properly and therefore the sign of the Number
leaves are updated accordingly. Since the Minus
nodes directly return their visited child, they are automatically removed from the tree.
At first glance, this seems to do the trick. However, there's a bug in that visitor. Have a look at this example.
Since - (A + B)
is equivalent to (- A) + (- B)
, we expect the state of the Add
node in this example to be "Negative" for both the left and the right branch.
Since the "Negative" state is set by the first Minus
node and since the Add
node does not alter the state variable, this feels correct at first.
However, our state variable travels up-and-down the tree with our visitor and therefore any state changes applies to a child branch will eventually bubble back up to its parents.
In this particular case, the second Minus
node on the left branch cause the state to become "Positive" and bubbles back up to the Add
node which immediately visits the right branch without making any alterations. This means our right branch start with a "Positive" state instead of the expected "Negative" state making the final result wrong.
In this example, we can fix this by recording the state variable on the Add
node before visiting the left branch and restoring it before visiting the right branch.
public function visitAdd(Add $node)
{
// Record the state before visiting the left branch.
$originalNegative = $this->negative;
$left = $node->left->accept($this);
// Restore it before visiting the right branch.
$this->negative = $originalNegative;
$right = $node->right->accept($this);
return new Add($left, $right);
}
And now our visitor works as expected!
More generally, when dealing with state variables, it is a good idea to always restore the original state right before returning. You only need to do this for nodes that alter the state.
This removes the need to worry about unpredictable state bubbling up from one node to another. This was a simple example but imagine having a cluster of 20 types of node and 10 state variables and having to worry about all possible combinations.
For our RemoveMinusNodesVisitor
, it would look like this:
class RemoveMinusNodesVisitor extends Visitor
{
protected bool $negative = false;
public function visitNumber(Number $node)
{
// Nothing to do here since we don't alter the state.
$multiplier = $this->negative ? -1 : 1;
return new Number($node->value * $multiplier);
}
public function visitMinus(Minus $node)
{
// Record state.
$originalNegative = $this->negative;
// Alter state.
$this->negative = ! $this->negative;
// Visit branches.
$child = $node->node->accept($this);
// Restore state.
$this->negative = $originalNegative;
return $child;
}
public function visitAdd(Add $node)
{
// Nothing to do here anymore since we don't alter the state.
$left = $node->left->accept($this);
$right = $node->right->accept($this);
return new Add($left, $right);
}
}
Now, this was a tough subject but I think it's important to be comfortable with state variables as, without them, there is only so much we can do with visitors.
Note that we've only talked about top-down state variables — meaning the state is set before visiting the children — but you could also have bottom-up state variables. These are useful when you actually want information from the children to bubble up to the parents. And of course, you can have state variables that carry useful information both down and up the tree!
We won't talk about these in this series because I feel like I've used enough of your brain energy already but it's good to know that they exist and that they should be avoided at all costs if you don't like migraines.
Conclusion
Phew, congratulations on making it to the end of this article! 🎉
We now have so many new tools to manage our clusters of nodes.
The next two articles will be exclusive to sponsors. In these articles, we will learn yet another way to visit our nodes using what we call "breadth-first search" and we will then provide a thorough dictionary of visitor operations we can refer back to whenever we are handling nodes.
If you enjoy this series and you are not already a sponsor, it would mean the world to me if you could consider sponsoring me on GitHub. Not only you will unlock all "sponsor-only" articles on my blog but you will enable me to spend more time creating content and less time doing jobs that pay the bills.
After these sponsor-only episodes, another two public episodes will be available to read starting with how to persist our nodes in a database.