Advent of Prism: Part 19 - Blocks

Dec 19, 2023

This blog series is about how the prism Ruby parser works. If you’re new to the series, I recommend starting from the beginning. This post is about blocks and lambdas.

At long last, we have reached the point of talking about blocks and lambdas. These are major pieces of Ruby functionality that we have been deftly avoiding until now. Today, we’ll take a look.

`BlockNode`

Blocks in Ruby code are represented by braces or the do and end keywords. They can also optionally declare parameters. They then accept a set of statements that are saved and then executed later when the block is called (either through the yield keyword or by transforming it into a Proc and then calling #call). Here’s an example:

foo do
  1
end

This code is represented by the following AST:

As you can see from the diagram, blocks hold a pointer to their body as well as their local table. The body field can either be a StatementsNode (as we see in this example) or a BeginNode (like we saw with methods, classes, modules, and singleton classes). That would look like:

foo do
  1
rescue
end

which is represented by the following AST:

rescue and its corresponding else and ensure clauses can only be used when the keywords are being used as the bounds of the block, and not braces.

It’s also worth noting that semantically, there is no difference between the bounds of the block. Once they are parsed, they are exactly the same. However, in the parser they have different precedence. Braces are bound much more tightly than do and end. For example:

foo bar {} # send the block to `bar`
foo bar do end # send the block to `foo`

It’s not necessarily important for you to remember the specifics of how these are bound as much as it is to remember that they cannot be immediately substituted.

`BlockParametersNode`

When blocks (or lambdas) declare parameters they are wrapped in a BlockParametersNode. These nodes are effectively a wrapper around a list of parameters. For example:

foo { |bar| }

This is represented by the following AST:

There are two differences from regular parameters nodes. The first is that they hold an inner location to their bounds (|| for blocks, () for lambdas). The second is that they hold a list of block locals. We’ll talk about these next.

`BlockLocalVariableNode`

In both blocks and lambdas, you can declare local variables that are only visible within the scope of the block or lambda. These declarations go right next to the declaration of the parameters themselves. For example:

foo { |; bar| }

The bar variable is then only visible within the block. This is semantically similar to:

foo do
  bar = nil
end

The main difference is that if bar is declared in an outer scope the block local will not overwrite it, while assigning nil to it will. These locals are represented by BlockLocalVariableNode nodes and go into the locals field on BlockParametersNode. The first example is represented by the following AST:

The actual syntax for these is that they are a semicolon-separated list of identifiers that follow a semicolon within the parameter list.

`LambdaNode`

Lambda literals are represented by the LambdaNode node. They look similar to blocks and function in much the same way — both function as closures around a set of parameters and a body. Here is an example:

-> (foo) { foo * 2 }

The syntax for a lambda literal begins with the -> token. It is then optionally followed by a parameter list. The parameter list can be optionally wrapped in parentheses. The parentheses are required if certain types of parameter types are used. This is followed by a body that is either wrapped in braces or the do and end keywords.

The example above is represented by the following AST:

Believe it or not, we’ve seen every node in this AST before except for the LambdaNode itself. On that node we have lots of internal locations, a pointer to a local table, a set of parameters, and a body. Much like blocks the body can be either a StatementsNode or a BeginNode.

Like blocks, lambdas can also declare block locals. These are represented by the same BlockLocalVariableNode nodes that we saw above. This looks like:

-> (; foo) {}

It’s important to note that these are lambda literals only and not calls to the Kernel#lambda method. Those are represented by CallNode nodes like all other method calls because they can be overridden depending on context.

`NumberedParametersNode`

The last piece of syntax we’re going to talk about today is numbered parameters. This is a special syntax that allows referencing positional parameters without explicitly declaring them. For example:

-> { _1 * 2 }

The syntax for numbered parameters is an underscore followed by a digit. The digit is the position of the parameter that you want to reference (1-indexed).

Numbered parameters are mutually exclusive with regular parameters. If you declare both in the same context, you’ll get a syntax error. You also cannot use them in nested contexts without a syntax error (e.g., -> { -> { _1 } }). Because of this mutual exclusivity we can be assured that the parameters field on BlockNode and LambdaNode will be nil when numbered parameters are used. We take advantage of that fact to provide some extra information for prism consumers. Here’s the AST for the above example:

As you can see, when numbered parameters are in use we use a NumberedParametersNode node to represent them. This node holds an integer that represents the number of parameters that are being referenced. Compilers can use this to set up the correct number of parameters for the block or lambda.

As a brief aside, Matz recently accepted a proposal for it to be another reference to _1. It’s controversial to say the least.

Wrapping up

Blocks and lambdas play a foundational role in Ruby. They are used to execute a set of statements over a closure at a prescribed time. Knowing their syntax and semantics will allow you to take full advantage of them. Here are a couple of things to remember from today:

Blocks and lambdas can have local variables declared that are only visible within the block or lambda.
Numbered parameters are a special syntax that allows referencing positional parameters without explicitly declaring them.

That’s all for today. Tomorrow we’ll be looking at two interesting keywords: alias and undef.

← Back to home