Nodes and the Visitor pattern

The Visitor pattern

Episode 2
3 weeks ago
11 min read

In the previous episode, we got ourselves a little quirky calculator made of nodes. We'll now make various operations on these nodes starting with the most exciting one: evaluating the result of the calculator.

The calculator example from the previous episode

A world without visitors

Since we've already got one class per type of node — i.e. Number, Add, Subtract, etc. — you might be tempted to add an evaluate method to all of these classes.

For example, the trivial Number class could implement it like so:

class Number extends Node
{
    public function __construct(public float $value){}
  
    public function evaluate(): float
    {
        return $this->value;
    }
}

Then, the Add class could follow a similar approach by first calling evaluate on its left and right operands before adding them together.

class Add extends Node
{
    public function __construct(public Node $left, public Node $right){}
  
    public function evaluate(): float
    {
        $leftValue = $this->left->evaluate();
        $rightValue = $this->right->evaluate();
      
        return $leftValue + $rightValue;
    }
}

And we could carry on like this for every single type of nodes, adding an abstract method on the Node class to ensure all of the nodes implement this evaluate method using the same signature.

abstract class Node {
    abstract public function evaluate(): float;
}

I know you can feel a "but" coming but let me just say first that this is a valid solution that works and is a pretty clean one considering our simple example.

Now, the problem with this approach is that it makes operations on the cluster coupled with its data representation. Let me explain what I mean by that.

Imagine you suddenly need to print that cluster like 1 + 5 - (- 100). That's fine right? Let's add another print method on all these nodes and we're sorted.

Now imagine you want to optimise your cluster by removing any double negative before printing it — i.e. the example above would become: 1 + 5 + 100. However, you want to keep the original print function for debugging purposes. That's fine, let's add an additional optimisedPrint function.

All of our nodes now need to implement the following methods.

abstract class Node {
    abstract public function evaluate(): float;
    abstract public function print(): string;
    abstract public function optimisedPrint(): string;
}

Our running example with a block on each node that lists the following methods "evaluate, print and optimised print"

And these are fairly simple operations. Imagine having new requirements like:

  • Go through each node and update statistics on which nodes are the most often used.
  • Validate the calculator by returning a Laravel validator that should fail when, for example, a variable is accessed before being assigned.
  • Raise a warning when two Assign nodes are using the same variable.

These operations will likely need some variables like $variableAssignedSoFar that will now be mixed with the data representation of the nodes like $left and $right.

Furthermore, any additional feature on the cluster will require us to edit every single type of nodes it supports.

Basically, this makes our nodes less and less maintainable as they have to carry some complexity proportional to our feature set. That's what I mean by operations and data representation being coupled. To fix this we need to treat our nodes like simple value objects and outsource their operations elsewhere.

Outsourcing to visitors

The Visitor pattern allows us to create external classes to manage our node clusters. These classes handle one specific operation independently to each other therefore providing great reusability and separation of concerns.

The way this work is by creating one class per operation that contains one method per type of node. The naming convention for these methods is usually: visitNameOfTheNodeClass. So let's start by creating an abstract Visitor class that ensures all visitors contain the required methods.

abstract class Visitor {
    abstract public function visitNumber(Number $node);
    abstract public function visitAdd(Add $node);
    abstract public function visitSubtract(Subtract $node);
    abstract public function visitMinus(Minus $node);
    abstract public function visitAssign(Assign $node);
    abstract public function visitGet(Get $node);
    abstract public function visitCalculator(Calculator $node);
}

Note two things:

  • Every method expect a node type matching its name.
  • We did not add any type expectations on the return value since this will depend on the visitor itself. For example, a visitor evaluating the tree will return a float whereas a visitor printing the tree will return a string.

Next, we need to tell our nodes that they can be visited. For that, I like to create a Visitable interface containing an accept method. This method accepts a Visitor as an argument and can again return anything depending on the visitor itself.

interface Visitable {
    public function accept(Visitor $visitor);
}

Note that the Visitable interface and the Visitor abstract class are two pieces of the same puzzle. If you suddenly need to manage a different node cluster within the same app, you might want to use less generic names like CalculatorVisitor and CalculatorVisitable versus WorkflowVisitor and WorkflowVisitable.

Now, all that accept method needs to do is immediately delegate to the right visitor method. For example, the Number class will simply delegate to $visitor->visitNumber($this). We do this for every single type of node like so:

abstract class Node implements Visitable {}

class Number extends Node {
    public function __construct(public float $value){}
    public function accept(Visitor $visitor) {
        return $visitor->visitNumber($this);
    }
}

class Add extends Node {
    public function __construct(public Node $left, public Node $right){}
    public function accept(Visitor $visitor) {
        return $visitor->visitAdd($this);
    }
}

class Subtract extends Node {
    public function __construct(public Node $left, public Node $right){}
    public function accept(Visitor $visitor) {
        return $visitor->visitSubtract($this);
    }
}

class Minus extends Node {
    public function __construct(public Node $node){}
    public function accept(Visitor $visitor) {
        return $visitor->visitMinus($this);
    }
}

class Assign extends Node {
    public function __construct(public string $variable, public Node $node){}
    public function accept(Visitor $visitor) {
        return $visitor->visitAssign($this);
    }
}

class Get extends Node {
    public function __construct(public string $variable){}
    public function accept(Visitor $visitor) {
        return $visitor->visitGet($this);
    }
}

class Calculator extends Node {
    public function __construct(public array $statements){}
    public function accept(Visitor $visitor) {
        return $visitor->visitCalculator($this);
    }
}

If you want to get all fancy, you may also implement the __call method on the abstract Node class to dynamically call the right method based on the class name, like 'visit' . class_basename(static::class).

And that's it! We've wired visitors inside our node cluster. Now, we can create and execute visitors like this.

// Our previous calculator example.
$firstStatement = new Assign('foo', new Add(new Number(1), new Number(5)));
$secondStatement = new Subtract(new Number(100), new Minus(new Get('foo')));
$calculator = new Calculator([$firstStatement, $secondStatement]);

// Create and execute our first visitor.
$visitor = new MyFirstVisitor();
$resultOfMyFirstVisitor = $calculator->accept($visitor);

Note that — even though they often do — visitor don't have to start at the root of the cluster. They can start anywhere they want. For example, you could run that visitor only on the first statement like so:

// ...

$visitor = new MyFirstVisitor();
$resultForTheFirstStatementOnly = $firstStatement->accept($visitor);

Evaluating through a visitor

Now that we have our visitor system in place, let's implement one that evaluates our calculator.

When naming visitor classes, I like them to start with a verb and finish with Visitor. If I wanted to create a visitor that lists all variables in the cluster, I would call it ListVariablesVisitor. In this case, we simply want to evaluate the cluster so let's call it EvaluateVisitor.

class EvaluateVisitor extends Visitor {
    public function visitNumber(Number $node) { /* TODO */ }
    public function visitAdd(Add $node) { /* TODO */ }
    public function visitSubtract(Subtract $node) { /* TODO */ }
    public function visitMinus(Minus $node) { /* TODO */ }
    public function visitAssign(Assign $node) { /* TODO */ }
    public function visitGet(Get $node) { /* TODO */ }
    public function visitCalculator(Calculator $node) { /* TODO */ }
}

Let's implement these one by one. When evaluating a Number node, we simply need to return its value.

public function visitNumber(Number $node)
{
	return $node->value;
}

Similarly to what we did in the evaluate method earlier, we need the evaluation of the left and right nodes in order to evaluate the Add node. Whilst we no longer have an evaluate method on these child nodes, we do have a way to pass a visitor via the accept method. If we pass the current EvaluateVisitor to the children, then we know we will end up with their correct evaluations. So all we need to do is replace $child->evaluate() with $child->accept($this) from our previous implementation.

public function visitAdd(Add $node)
{
    $leftValue = $node->left->accept($this);
    $rightValue = $node->right->accept($this);

	return $leftValue + $rightValue;
}

Both the Subtract and the Minus nodes follow a similar logic.

public function visitSubtract(Subtract $node)
{
    $leftValue = $node->left->accept($this);
    $rightValue = $node->right->accept($this);

	return $leftValue - $rightValue;
}

public function visitMinus(Minus $node)
{
    $childValue = $node->node->accept($this);

	return - $childValue;
}

It starts to become more interesting with the Assign node since our visitor now requires some memory. We will use an associative array where the key is the name of the variable and the value is the evaluation of its child node.

But what should we return after storing that value? Well, it's up to us. Some languages decide that assignments should evaluate to nothing, others decide they should evaluate to the value of that assignment. For a calculator, the latter feel like a more natural choice to me so let's do that.

class EvaluateVisitor extends Visitor
{
  protected array $assignments = [];

  public function visitAssign(Assign $node)
  {
      // Evaluate the child node.
      $childValue = $node->node->accept($this);
    
      // Store its value.
      $this->assignments[$node->variable] = $childValue;

      // Use the child value as the "Assign" value too.
      return $childValue;
  }
  
  // ...
}

Next, the Get node will be evaluated by fetching its variable inside that new assignments array. What if it does not exist though? What if someone tries to access a variable that has not been declared? Well, again that's up to us. We could be strict about it and raise an exception or fail silently by providing a default value like zero. Let's go for the strict approach here.

public function visitGet(Get $node)
{
    if (! isset($this->assignments[$node->variable])) {
        throw new \RuntimeException("Variable [{$node->variable}] was not assigned.");
    }

	return $this->assignments[$node->variable];
}

Last but not least, the Calculator node. The first step is to loop through each of its statements and evaluate them. But then what should we use as the evaluation of the Calculator itself? In this case, I like the idea of using the evaluation of the last statement as the "final value".

public function visitCalculator(Calculator $node)
{  
    if (empty($node->statements)) {
        throw new \RuntimeException('The Calculator node requires at least one statement.');
    }

    $lastStatementValue = null;

  	foreach ($node->statements as $statement) {
        $lastStatementValue = $statement->accept($this);
    }

	return $lastStatementValue;
}

I've added a little guard to ensure we never end up returning null as a result of an empty calculator. Note that, we could also outsource all of these guards to one or more visitors whose jobs are to validate the nodes before we run operations on them. This is particularly useful when you have many ways of interpreting your nodes and you end up copy/pasting the same guards all over your visitors. In our example though, we only need two simple guards in one EvaluateVisitor so it's okay to leave them as-is.

Okay, our EvaluateVisitor is ready to be used! 🎉

// Our previous calculator example.
$firstStatement = new Assign('foo', new Add(new Number(1), new Number(5)));
$secondStatement = new Subtract(new Number(100), new Minus(new Get('foo')));
$calculator = new Calculator([$firstStatement, $secondStatement]);

// Instantiate and execute the EvaluateVisitor.
$evaluateVisitor = new EvaluateVisitor();
echo $firstStatement->accept($evaluateVisitor) . "\n"; // Returns: 6
echo $secondStatement->accept($evaluateVisitor) . "\n"; // Throws: Variable [foo] was not assigned
echo $calculator->accept($evaluateVisitor) . "\n"; // Returns: 106

So far, we haven't needed to provide constructor arguments to our visitors but this can be a great way to make them more flexible and reusable.

Say we want to evaluate a cluster of nodes with some pre-defined assignments like answerToLifeTheUniverseAndEverything = 42. We could achieve this by passing an initial value to the $assignments array of the EvaluateVisitor class.

class EvaluateVisitor extends Visitor
{
    protected array $assignments;

    public function __construct(array $preDefinedAssignments = [])
    {
        $this->assignments = $preDefinedAssignments;
    }
}

Now, we can evaluate the following statement without any errors.

$thirdStatement = new Add(new Get('answerToLifeTheUniverseAndEverything'), new Number(8));
$evaluateVisitorWithPredefinedAssignments = new EvaluateVisitor([
    'answerToLifeTheUniverseAndEverything' => 42,
]);

echo $thirdStatement->accept($evaluateVisitorWithPredefinedAssignments); // Returns: 50

Printing through a visitor

As a quick exercise, let's now have a go at implementing the print method we mentioned earlier using a visitor.

We'll create a PrintVisitor class that display nodes in a PHP-like format. Here it goes:

class PrintVisitor extends Visitor
{
    public function visitNumber(Number $node)
    {
        return $node->value;
    }

    public function visitAdd(Add $node)
    {
        $leftValue = $node->left->accept($this);
        $rightValue = $node->right->accept($this);
    
        return "$leftValue + $rightValue";
    }

    public function visitSubtract(Subtract $node)
    {
        $leftValue = $node->left->accept($this);
        $rightValue = $node->right->accept($this);
    
        return "$leftValue - $rightValue";
    }
    
    public function visitMinus(Minus $node)
    {
        $childValue = $node->node->accept($this);
    
        return "- $childValue";
    }

    public function visitAssign(Assign $node)
    {
        $childValue = $node->node->accept($this);
      
        return "\${$node->variable} = $childValue";
    }

    public function visitGet(Get $node)
    {
        return "\${$node->variable}";
    }

    public function visitCalculator(Calculator $node)
    {
        $print = '';
    
      	foreach ($node->statements as $statement) {
            $print .= $statement->accept($this) . ";\n";
        }
    
        return $print;
    }
}

And that's it. You can now use it to display your nodes like so:

$printVisitor = new PrintVisitor();
echo $calculator->accept($printVisitor);
// Returns:
// $foo = 1 + 5;
// 100 - - $foo;

Within one single class and without adding any extra complexity to our nodes, we've added a new feature to our cluster.

Chaining visitors

Before finishing this article, let's talk about the optimisedPrint that removes the double negation before printing.

You might be tempted to create a new printing visitor like PrintWithoutDoubleNegationVisitor. However, we could achieve the same result by creating a simpler visitor called RemoveDoubleNegationVisitor whose job is just to remove double negation by making some changes in the cluster.

The cool thing about this approach is that we can reuse the PrintVisitor without any modifications. All we need to do is call the RemoveDoubleNegationVisitor before the PrintVisitor and everything will work as expected.

$calculatorWithoutDoubleNegation = $calculator->accept(new RemoveDoubleNegationVisitor());
echo $calculatorWithoutDoubleNegation->accept(new PrintVisitor());
// Returns:
// $foo = 1 + 5;
// 100 + $foo;

By creating smaller and more specific visitors, we increase their reusability whilst reducing their complexity.

I won't detail the implementation of the RemoveDoubleNegationVisitor in this article since it requires us to be a little more comfortable with visitor flows which we will tackle in the next episode.

However, if you're interested, you can find my implementation of that visitor in the gist below. If you decide to use this opportunity as an exercise, bear in mind that there are many ways to implement this visitor so don't be discourage if our implementations don't match perfectly.

See gist

Conclusion

In this episode, we've seen how visitors can help us decouple our nodes' data from the operations we need to perform on these nodes by turning this:

Our running example with a block on each node that lists the following methods "evaluate, print and optimised print"

into this:

Our running example with only 3 classes next to it: EvaluateVisitor, PrintVisitor and RemoveDoubleNegationVisitor

Now that we are comfortable with the concepts of nodes and visitors, we will go one step further and identify different ways our visitors can visit our nodes. This will provide us with a greater range of operations we can implement using visitors.

See full code as a gist

Discussions

Author avatar
Patrick Curl
2 weeks ago

What's a more...usable use case for this, for real world scenarios?

As a full-stack dev is this something you'd use? It seems a bit overkill, but to me, so does the repository pattern.

If you can do it in less code, you should imho.

I do like the actions pattern like your laravel actions package which I've used a lot, but this feels more complicated and having a hard time grokking why I'd want to use the Visitor/Node pattern.

💖 0

Discussion

The Visitor pattern
Author avatar
Patrick Curl
2 weeks ago

What's a more...usable use case for this, for real world scenarios?

As a full-stack dev is this something you'd use? It seems a bit overkill, but to me, so does the repository pattern.

If you can do it in less code, you should imho.

I do like the actions pattern like your laravel actions package which I've used a lot, but this feels more complicated and having a hard time grokking why I'd want to use the Visitor/Node pattern.

💖 0
Author avatar
Loris Leiva
2 weeks ago

Hi Patrick 👋 You're absolutely right, when designing applications, you're not going to reach for this pattern very often and that's okay. Episode 6 of this series focuses on providing more real-world examples that applies to web development.

This series is a bit of a love letter from me to that pattern because once you're comfortable with it you can create very cool things like workflows and languages. 😊

💖 0

Would you like to chime in?

You must be a member to add a reply to a discussion.

Fortunately, it only takes two click to become one. See you on the other side! 🌸

Become a Member

Would you like to chime in?

You must be a member to start a new discussion.

Fortunately, it only takes two click to become one. See you on the other side! 🌸

Become a Member