The Visitor pattern
In the previous episode, we got ourselves a little quirky calculator made of nodes. We'll now make various operations on these nodes starting with the most exciting one: evaluating the result of the calculator.
A world without visitors
Since we've already got one class per type of node — i.e. Number
, Add
, Subtract
, etc. — you might be tempted to add an evaluate
method to all of these classes.
For example, the trivial Number
class could implement it like so:
class Number extends Node
{
public function __construct(public float $value){}
public function evaluate(): float
{
return $this->value;
}
}
Then, the Add
class could follow a similar approach by first calling evaluate
on its left and right operands before adding them together.
class Add extends Node
{
public function __construct(public Node $left, public Node $right){}
public function evaluate(): float
{
$leftValue = $this->left->evaluate();
$rightValue = $this->right->evaluate();
return $leftValue + $rightValue;
}
}
And we could carry on like this for every single type of nodes, adding an abstract method on the Node
class to ensure all of the nodes implement this evaluate
method using the same signature.
abstract class Node {
abstract public function evaluate(): float;
}
I know you can feel a "but" coming but let me just say first that this is a valid solution that works and is a pretty clean one considering our simple example.
Now, the problem with this approach is that it makes operations on the cluster coupled with its data representation. Let me explain what I mean by that.
Imagine you suddenly need to print that cluster like 1 + 5 - (- 100)
. That's fine right? Let's add another print
method on all these nodes and we're sorted.
Now imagine you want to optimise your cluster by removing any double negative before printing it — i.e. the example above would become: 1 + 5 + 100
. However, you want to keep the original print
function for debugging purposes. That's fine, let's add an additional optimisedPrint
function.
All of our nodes now need to implement the following methods.
abstract class Node {
abstract public function evaluate(): float;
abstract public function print(): string;
abstract public function optimisedPrint(): string;
}
And these are fairly simple operations. Imagine having new requirements like:
- Go through each node and update statistics on which nodes are the most often used.
- Validate the calculator by returning a Laravel validator that should fail when, for example, a variable is accessed before being assigned.
- Raise a warning when two
Assign
nodes are using the same variable.
These operations will likely need some variables like $variableAssignedSoFar
that will now be mixed with the data representation of the nodes like $left
and $right
.
Furthermore, any additional feature on the cluster will require us to edit every single type of nodes it supports.
Basically, this makes our nodes less and less maintainable as they have to carry some complexity proportional to our feature set. That's what I mean by operations and data representation being coupled. To fix this we need to treat our nodes like simple value objects and outsource their operations elsewhere.
Outsourcing to visitors
The Visitor pattern allows us to create external classes to manage our node clusters. These classes handle one specific operation independently to each other therefore providing great reusability and separation of concerns.
The way this work is by creating one class per operation that contains one method per type of node. The naming convention for these methods is usually: visitNameOfTheNodeClass
. So let's start by creating an abstract Visitor
class that ensures all visitors contain the required methods.
abstract class Visitor {
abstract public function visitNumber(Number $node);
abstract public function visitAdd(Add $node);
abstract public function visitSubtract(Subtract $node);
abstract public function visitMinus(Minus $node);
abstract public function visitAssign(Assign $node);
abstract public function visitGet(Get $node);
abstract public function visitCalculator(Calculator $node);
}
Note two things:
- Every method expect a node type matching its name.
- We did not add any type expectations on the return value since this will depend on the visitor itself. For example, a visitor evaluating the tree will return a
float
whereas a visitor printing the tree will return astring
.
Next, we need to tell our nodes that they can be visited. For that, I like to create a Visitable
interface containing an accept
method. This method accepts a Visitor
as an argument and can again return anything depending on the visitor itself.
interface Visitable {
public function accept(Visitor $visitor);
}
Note that the Visitable
interface and the Visitor
abstract class are two pieces of the same puzzle. If you suddenly need to manage a different node cluster within the same app, you might want to use less generic names like CalculatorVisitor
and CalculatorVisitable
versus WorkflowVisitor
and WorkflowVisitable
.
Now, all that accept
method needs to do is immediately delegate to the right visitor method. For example, the Number
class will simply delegate to $visitor->visitNumber($this)
. We do this for every single type of node like so:
abstract class Node implements Visitable {}
class Number extends Node {
public function __construct(public float $value){}
public function accept(Visitor $visitor) {
return $visitor->visitNumber($this);
}
}
class Add extends Node {
public function __construct(public Node $left, public Node $right){}
public function accept(Visitor $visitor) {
return $visitor->visitAdd($this);
}
}
class Subtract extends Node {
public function __construct(public Node $left, public Node $right){}
public function accept(Visitor $visitor) {
return $visitor->visitSubtract($this);
}
}
class Minus extends Node {
public function __construct(public Node $node){}
public function accept(Visitor $visitor) {
return $visitor->visitMinus($this);
}
}
class Assign extends Node {
public function __construct(public string $variable, public Node $node){}
public function accept(Visitor $visitor) {
return $visitor->visitAssign($this);
}
}
class Get extends Node {
public function __construct(public string $variable){}
public function accept(Visitor $visitor) {
return $visitor->visitGet($this);
}
}
class Calculator extends Node {
public function __construct(public array $statements){}
public function accept(Visitor $visitor) {
return $visitor->visitCalculator($this);
}
}
If you want to get all fancy, you may also implement the __call
method on the abstract Node
class to dynamically call the right method based on the class name, like 'visit' . class_basename(static::class)
.
And that's it! We've wired visitors inside our node cluster. Now, we can create and execute visitors like this.
// Our previous calculator example.
$firstStatement = new Assign('foo', new Add(new Number(1), new Number(5)));
$secondStatement = new Subtract(new Number(100), new Minus(new Get('foo')));
$calculator = new Calculator([$firstStatement, $secondStatement]);
// Create and execute our first visitor.
$visitor = new MyFirstVisitor();
$resultOfMyFirstVisitor = $calculator->accept($visitor);
Note that — even though they often do — visitor don't have to start at the root of the cluster. They can start anywhere they want. For example, you could run that visitor only on the first statement like so:
// ...
$visitor = new MyFirstVisitor();
$resultForTheFirstStatementOnly = $firstStatement->accept($visitor);
Evaluating through a visitor
Now that we have our visitor system in place, let's implement one that evaluates our calculator.
When naming visitor classes, I like them to start with a verb and finish with Visitor
. If I wanted to create a visitor that lists all variables in the cluster, I would call it ListVariablesVisitor
. In this case, we simply want to evaluate the cluster so let's call it EvaluateVisitor
.
class EvaluateVisitor extends Visitor {
public function visitNumber(Number $node) { /* TODO */ }
public function visitAdd(Add $node) { /* TODO */ }
public function visitSubtract(Subtract $node) { /* TODO */ }
public function visitMinus(Minus $node) { /* TODO */ }
public function visitAssign(Assign $node) { /* TODO */ }
public function visitGet(Get $node) { /* TODO */ }
public function visitCalculator(Calculator $node) { /* TODO */ }
}
Let's implement these one by one. When evaluating a Number
node, we simply need to return its value.
public function visitNumber(Number $node)
{
return $node->value;
}
Similarly to what we did in the evaluate
method earlier, we need the evaluation of the left and right nodes in order to evaluate the Add
node. Whilst we no longer have an evaluate
method on these child nodes, we do have a way to pass a visitor via the accept
method. If we pass the current EvaluateVisitor
to the children, then we know we will end up with their correct evaluations. So all we need to do is replace $child->evaluate()
with $child->accept($this)
from our previous implementation.
public function visitAdd(Add $node)
{
$leftValue = $node->left->accept($this);
$rightValue = $node->right->accept($this);
return $leftValue + $rightValue;
}
Both the Subtract
and the Minus
nodes follow a similar logic.
public function visitSubtract(Subtract $node)
{
$leftValue = $node->left->accept($this);
$rightValue = $node->right->accept($this);
return $leftValue - $rightValue;
}
public function visitMinus(Minus $node)
{
$childValue = $node->node->accept($this);
return - $childValue;
}
It starts to become more interesting with the Assign
node since our visitor now requires some memory. We will use an associative array where the key is the name of the variable and the value is the evaluation of its child node.
But what should we return after storing that value? Well, it's up to us. Some languages decide that assignments should evaluate to nothing, others decide they should evaluate to the value of that assignment. For a calculator, the latter feel like a more natural choice to me so let's do that.
class EvaluateVisitor extends Visitor
{
protected array $assignments = [];
public function visitAssign(Assign $node)
{
// Evaluate the child node.
$childValue = $node->node->accept($this);
// Store its value.
$this->assignments[$node->variable] = $childValue;
// Use the child value as the "Assign" value too.
return $childValue;
}
// ...
}
Next, the Get
node will be evaluated by fetching its variable inside that new assignments
array. What if it does not exist though? What if someone tries to access a variable that has not been declared? Well, again that's up to us. We could be strict about it and raise an exception or fail silently by providing a default value like zero. Let's go for the strict approach here.
public function visitGet(Get $node)
{
if (! isset($this->assignments[$node->variable])) {
throw new \RuntimeException("Variable [{$node->variable}] was not assigned.");
}
return $this->assignments[$node->variable];
}
Last but not least, the Calculator
node. The first step is to loop through each of its statements and evaluate them. But then what should we use as the evaluation of the Calculator
itself? In this case, I like the idea of using the evaluation of the last statement as the "final value".
public function visitCalculator(Calculator $node)
{
if (empty($node->statements)) {
throw new \RuntimeException('The Calculator node requires at least one statement.');
}
$lastStatementValue = null;
foreach ($node->statements as $statement) {
$lastStatementValue = $statement->accept($this);
}
return $lastStatementValue;
}
I've added a little guard to ensure we never end up returning null
as a result of an empty calculator. Note that, we could also outsource all of these guards to one or more visitors whose jobs are to validate the nodes before we run operations on them. This is particularly useful when you have many ways of interpreting your nodes and you end up copy/pasting the same guards all over your visitors. In our example though, we only need two simple guards in one EvaluateVisitor
so it's okay to leave them as-is.
Okay, our EvaluateVisitor
is ready to be used! 🎉
// Our previous calculator example.
$firstStatement = new Assign('foo', new Add(new Number(1), new Number(5)));
$secondStatement = new Subtract(new Number(100), new Minus(new Get('foo')));
$calculator = new Calculator([$firstStatement, $secondStatement]);
// Instantiate and execute the EvaluateVisitor.
$evaluateVisitor = new EvaluateVisitor();
echo $firstStatement->accept($evaluateVisitor) . "\n"; // Returns: 6
echo $secondStatement->accept($evaluateVisitor) . "\n"; // Throws: Variable [foo] was not assigned
echo $calculator->accept($evaluateVisitor) . "\n"; // Returns: 106
So far, we haven't needed to provide constructor arguments to our visitors but this can be a great way to make them more flexible and reusable.
Say we want to evaluate a cluster of nodes with some pre-defined assignments like answerToLifeTheUniverseAndEverything = 42
. We could achieve this by passing an initial value to the $assignments
array of the EvaluateVisitor
class.
class EvaluateVisitor extends Visitor
{
protected array $assignments;
public function __construct(array $preDefinedAssignments = [])
{
$this->assignments = $preDefinedAssignments;
}
}
Now, we can evaluate the following statement without any errors.
$thirdStatement = new Add(new Get('answerToLifeTheUniverseAndEverything'), new Number(8));
$evaluateVisitorWithPredefinedAssignments = new EvaluateVisitor([
'answerToLifeTheUniverseAndEverything' => 42,
]);
echo $thirdStatement->accept($evaluateVisitorWithPredefinedAssignments); // Returns: 50
Printing through a visitor
As a quick exercise, let's now have a go at implementing the print
method we mentioned earlier using a visitor.
We'll create a PrintVisitor
class that display nodes in a PHP-like format. Here it goes:
class PrintVisitor extends Visitor
{
public function visitNumber(Number $node)
{
return $node->value;
}
public function visitAdd(Add $node)
{
$leftValue = $node->left->accept($this);
$rightValue = $node->right->accept($this);
return "$leftValue + $rightValue";
}
public function visitSubtract(Subtract $node)
{
$leftValue = $node->left->accept($this);
$rightValue = $node->right->accept($this);
return "$leftValue - $rightValue";
}
public function visitMinus(Minus $node)
{
$childValue = $node->node->accept($this);
return "- $childValue";
}
public function visitAssign(Assign $node)
{
$childValue = $node->node->accept($this);
return "\${$node->variable} = $childValue";
}
public function visitGet(Get $node)
{
return "\${$node->variable}";
}
public function visitCalculator(Calculator $node)
{
$print = '';
foreach ($node->statements as $statement) {
$print .= $statement->accept($this) . ";\n";
}
return $print;
}
}
And that's it. You can now use it to display your nodes like so:
$printVisitor = new PrintVisitor();
echo $calculator->accept($printVisitor);
// Returns:
// $foo = 1 + 5;
// 100 - - $foo;
Within one single class and without adding any extra complexity to our nodes, we've added a new feature to our cluster.
Chaining visitors
Before finishing this article, let's talk about the optimisedPrint
that removes the double negation before printing.
You might be tempted to create a new printing visitor like PrintWithoutDoubleNegationVisitor
. However, we could achieve the same result by creating a simpler visitor called RemoveDoubleNegationVisitor
whose job is just to remove double negation by making some changes in the cluster.
The cool thing about this approach is that we can reuse the PrintVisitor
without any modifications. All we need to do is call the RemoveDoubleNegationVisitor
before the PrintVisitor
and everything will work as expected.
$calculatorWithoutDoubleNegation = $calculator->accept(new RemoveDoubleNegationVisitor());
echo $calculatorWithoutDoubleNegation->accept(new PrintVisitor());
// Returns:
// $foo = 1 + 5;
// 100 + $foo;
By creating smaller and more specific visitors, we increase their reusability whilst reducing their complexity.
I won't detail the implementation of the RemoveDoubleNegationVisitor
in this article since it requires us to be a little more comfortable with visitor flows which we will tackle in the next episode.
However, if you're interested, you can find my implementation of that visitor in the gist below. If you decide to use this opportunity as an exercise, bear in mind that there are many ways to implement this visitor so don't be discourage if our implementations don't match perfectly.
Conclusion
In this episode, we've seen how visitors can help us decouple our nodes' data from the operations we need to perform on these nodes by turning this:
into this:
Now that we are comfortable with the concepts of nodes and visitors, we will go one step further and identify different ways our visitors can visit our nodes. This will provide us with a greater range of operations we can implement using visitors.