Advent of Prism: Part 21 - Throws and jumps
This blog series is about how the prism Ruby parser works. If you’re new to the series, I recommend starting from the beginning. This post is about throws and jumps.
The terms “throw” and “jump” have more to do with the actual execution of Ruby than the parse tree, but they neatly categorize the nodes that we’re going to look at today.
Throws
“Throw” refers to throwing an exception. CRuby implements many of these using setjmp
/longjmp
, which are context-saving functions that allow you to break the execution flow of your C program much like you would with exceptions in Ruby. Ruby provides a couple of syntactic structures for handling these kinds of non-local control flow.
BeginNode
The parent node of any kind of exception handling is the BeginNode
node. This node houses an optional set of statements as well as any number of rescue
clauses, an optional ensure
clause, and an optional else
clause. Here is an example:
begin
1
rescue
2
end
This is represented by the following AST:
You can see the node has a statements
field that is the optional StatementsNode
holding the statements that should be executed. It also has a pointer to a rescue
node that is the first rescue
clause. If there are more rescue
clauses, they are linked together in a linked list. The ensure
and else
clauses are not present in this example so you don’t see their fields.
Remember from our previous posts that this node is also used to represent rescue
/else
/ensure
clauses being used in other contexts: class and module definitions, singleton class definitions, method definitions, and blocks and lambdas that use do
/end
.
RescueNode
When the rescue
keyword is used as another clause in a begin
statement, we represent it with the RescueNode
node. This node has a list of exceptions to rescue, an optional variable to assign the exception to, an optional set of statements, and an optional consequent rescue
clause. Here is an example that showcases all of that:
begin
foo
rescue Exception1 => error
warn error.message
rescue Exception2, Exception3 => @error
rescue *exception_list
rescue
warn "unknown error"
end
The actual flow of this program works like this:
foo
is called.- If
foo
raises an error, Ruby walks through therescue
clauses in order. - In the first
rescue
clause, theException1
constant is looked up. If it does not contain a class or module, aTypeError
is raised. If it does, then it checks if it is in the ancestor chain of the exception that was raised. If it is, then the exception is assigned to theerror
local variable and the statements in the clause are executed. If it is not, then the error is reraised to trigger checking the subsequent clause. - In the second
rescue
clause, both theException2
andException3
variables are checked in the same manner. If either of them are in the ancestor chain of the exception that was raised, then the exception is assigned to the@error
instance variable. Because there are no statements in this clause, nothing else happens. If neither of them are in the ancestor chain, then the error is reraised to trigger checking the subsequent clause. - In the third
rescue
clause,exception_list
has#to_a
called on it and then Ruby iterates over each element in the resulting array to check for classes or modules in the same as the other exceptions. If any of them are in the ancestor chain, the code jumps out of thebegin
node. Otherwise the error is reraised to trigger checking the subsequent clause. - In the last
rescue
clause the error is implicitly checked againstStandardError
. If it is in the ancestor chain, then the body of the clause is executed. Otherwise the error is reraised.
A couple of important things to note here in terms of syntax:
- The optional error handle is any target that we have seen so far, including call targets. This means you can have the error handle actually be a method call if you want.
- The list of errors is a comma-separated list of (optionally splatted) expressions, not just constants. This is very powerful, but also a source of confusion. Remember that constant lookup itself can trigger method calls (through
const_missing
) so this can get quite dynamic. - If you omit any classes or modules to check against, Ruby implicitly checks against
StandardError
.
Let’s look at a slightly simpler example to see how this is represented in the AST:
begin
rescue Error1 => error
rescue Error2
warn("error")
end
This is represented by the following AST:
Notice that the RescueNode
nodes form a linked list, much like the if statements that we covered back in Part 7 - Control-flow. As we discussed back then, the two options we have for representing these kinds of nodes is a linked list or a flat list. We went with a linked list in this case because it’s not that common that you have more than a couple of rescue
clauses, and it’s simpler to implement this way.
RescueModifierNode
When the rescue
keyword is used as a modifier to an expression, we represent it with the RescueModifierNode
node. Here’s an example:
foo rescue "error!"
This is semantically equivalent to:
begin
foo
rescue StandardError
"error!"
end
The example is represented by the following AST:
This relatively simple node is deceptively complex to parse, but easy to understand and compile. The rescue
keyword actually breaks operator precedence rules and is allowed to be used as the modifier to any assignment expression. This means that you can do things like:
foo = bar rescue baz
and instead of being parsed as (foo = bar) rescue baz
, it is parsed as foo = (bar rescue baz)
. This special path through the parser makes things complex, but tends to better match programmers intuition.
EnsureNode
The ensure
keyword is an optional clause on the begin
statement that is always executed, even if an exception is raised. We represent it with the EnsureNode
. Here is an example:
begin
foo
ensure
bar
end
This is represented by the following AST:
Effectively this node is just a wrapper around a set of statements. It is far more complicated to implement than to parse.
ReturnNode
The last throw is the return
keyword. In normal execution, the return
keyword can be implemented using a leave
instruction, however you can also return from within blocks. In this case the virtual machine must jump all of the way out to the method, which is why this is a throw. First, here is an example:
def foo
[1, 2, 3].each do |i|
return i if i == 2
end
end
This is a little contrived, but it demonstrates the point. This code will call the #each
method on the array literal, and when the iteration variable i
is equal to 2
, it will return i
from the method. This whole example is represented by the following AST:
You can see the ReturnNode
in the bottom right of the diagram there. It has an optional set of arguments, which are the values to return from the method. If there are multiple values, they are grouped together into an array.
Jumps
“Jump” refers to jumping around the instructions in a program. You can think of them effectively as goto
statements. Ruby provides many keywords for jumping around, and they all have their own nodes in the parse tree. Let’s look at them one by one.
BreakNode
The break
keyword jumps out of the current block. It can optionally accept a value to return from the block as here. Here is an example:
while true
break 1
end
This is represented by the following AST:
The code above says to immediately break out of the loop and return 1
. Any number of arguments can be passed to break
— they end up being grouped together into an array if there are multiple. A common misconception is that break
accepts parentheses; in reality if you use parentheses you’re actually just grouping together the first argument.
NextNode
The next
keyword jumps to the end of the current block, but not out of it. Like break
, it can optionally accept any number of values to return from the block. Here is an example:
while true
next 1
end
This is represented by the following AST:
The code above says to immediately jump to the end of the loop and return 1
. This will actually loop indefinitely because the next
keyword just keeps getting executed. Like break
, next
accepts any number of arguments, which are grouped together into an array if there are multiple.
RedoNode
The redo
keyword is effectively the opposite of the next
keyword: it jumps back to the start of the current block. It does not accept any arguments. Here is an example:
while true
redo
end
This will, of course, loop indefinitely. Parsing this is very simple; you only parse the keyword. The node itself is therefore relatively simple as well. Here is the AST for the above snippet:
RetryNode
The retry
keyword is used to jump out of a rescue
clause and back to the begin
block. It does not accept any arguments. Here is an example:
begin
foo
rescue
retry
end
This retry
will get triggered if foo
raises an exception. It will then jump back to the begin
block and try again. This is represented by the following AST:
YieldNode
Using the yield
keyword, you can trigger the execution of a block that was passed to the current method. It can optionally accept any number of arguments to pass to the block. Here is an example:
def foo
yield 1
end
This is represented by the following AST:
Parsing the yield
construct is much the same as the other keywords we’ve looked at so far. It also accepts a list of arguments that are comma-delimited.
Wrapping up
Throws and jumps allow you to issue non-local control flow within your program. They are very powerful constructs, and understanding their semantics will help you get a better picture of what Ruby is doing under the hood. Here are a couple of things to remember from today:
- There are many ways to represent non-local control flow in Ruby
- There is a lot of syntax that allows you to jump around statements in your program
break
,next
,yield
, andreturn
all accept arguments but none of them use parentheses
We’re almost at the end here! Tomorrow we’ll be looking at the first of two posts on pattern matching. See you then!
← Back to home