Advent of YARV: Part 14 - Branching

Dec 14, 2022

This blog series is about how the CRuby virtual machine works. If you’re new to the series, I recommend starting from the beginning. This post is about branching.

Although this blog series isn’t about how CRuby compiles into YARV instructions, we’ve touched on it briefly in a couple of posts. From our first post where we covered how literals are compiling into putobject instructions to our most recent post where we discussed the alias and undef methods, we’ve covered a lot of Ruby syntax without even knowing it. There is still one major piece of Ruby syntax that we haven’t covered: branching.

Braching refers to the ability to execute different code paths based on a condition. You can think of it like railroad tracks diverging. The logic that controls the branching can be conditional or unconditional. In this post, we’ll see how that maps back to both Ruby syntax and YARV instructions.

Consider for example, the if statement:

if foo
  bar
end

Using the instructions we’ve seen so far, we know that foo will get compiled into a method call using the send instruction. We also know that bar will be compiled similarly. However, we need to skip executing the send instruction for bar if the result of the foo method call is falsy.¹ This means our logic needs to branch between two different instruction paths.

The first path will be if foo returns a truthy value. In this case, we want to execute the send instruction for bar. The disassembly for this would look something like:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,19)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
putself
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave

Note that this effectively boils down to foo; bar. The second path will be if foo returns a falsy value. In this case, we want to skip the send instruction for bar.

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,19)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
pop
putnil
leave

Note that this effectively boils down to foo; return.

In order to marry these two paths, let’s look at the commonalities. The first two instructions are the same in both paths. The last instruction is also the same. Therefore, we need something that pops a value instead of executing the send instruction for bar if the result of the foo method call is falsy. This will look like something to the effect of:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,19)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
**************************************
putself
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave
putnil
leave

The asterix in the previous instructions is where we need to insert our special instruction. To branch ahead based on the top value of the stack. But before we can get there, we first need to understand a little more about how instructions are compiled.

Compiling instructions

To compile instructions the VM walks through the syntax tree generated by the parser and builds up a linked list. This linked list includes both instructions and operands. This is why when you look at the disassembly, you can see that the numbers on the left-hand side are not sequential: they are counting operands as well.

This linked list allows the VM to perform various optimizations while it is being compiled. For example, we looked at the opt_str_uminus instruction when we were looking at send specializations. This changes a putstring + send sequence of instructions into a single opt_str_uminus instruction. This optimization changes the number of instructions in the list. A linked list structure allows this to happen more easily than in an array because nothing needs to be copied to perform the removal.

There is one other kind of object that can be an entry in the linked list: a label. Labels are locations in an instruction sequence that other instructions are allowed to branch to. You don’t see them in the disassembly because when the VM compiles the instructions into an array, they are no longer necessary.

Let’s look again at our previous example. We left off by looking at an instruction sequence like the following:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,19)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
**************************************
putself
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave
putnil
leave

In this case the VM will insert a label into the linked list between the first leave and the putnil instruction. This is the location where we want to branch to if the result of the foo method call is falsy. The VM will then insert a branchunless instruction that branches to this label. The final instruction sequence will look like:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,19)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
branchunless                           9
putself
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave
putnil
leave

Notice that the branchunless instruction has a single operand which is a 9. That 9 is the offset from the start of the instruction sequence to the putnil instruction. This is the label that the branchunless instruction is conditionally branching to. With this new knowledge in mind, let’s look at today’s instructions.

branchunless
branchif
branchnil
jump
opt_case_dispatch

`branchunless`

As we just saw branchunless, accepts a single operand which is the offset to branch to. It pops a single value off the top of the stack and branches to the offset if the value is falsy. Otherwise it continues to the next instruction.

In Ruby:

class BranchUnless
  attr_reader :label

  def initialize(label)
    @label = label
  end

  def call(vm)
    vm.jump(label) unless vm.pop
  end
end

In if foo then bar end disassembly:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,19)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
branchunless                           9
putself
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave
putnil
leave

`branchif`

branchif does the opposite of branchunless. It branches to its offset operand if the top value of the stack is truthy. Otherwise it continues to the next instruction.

In Ruby:

class BranchIf
  attr_reader :label

  def initialize(label)
    @label = label
  end

  def call(vm)
    vm.jump(label) if vm.pop
  end
end

In while foo do bar end disassembly:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,20)> (catch: false)
jump                                   10                        (   1)[Li]
putnil
pop
jump                                   10
putself
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
pop
putself
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
branchif                               6
putnil
leave

`branchnil`

Like the previous two instructions, branchnil accepts a label operand and pops a single value off the stack. This time it branches to the label if the value is nil. Otherwise it continues to the next instruction. This is how Ruby implements the &. operator.

In Ruby:

class BranchNil
  attr_reader :label

  def initialize(label)
    @label = label
  end

  def call(vm)
    vm.jump(label) if vm.pop.nil?
  end
end

In foo&.bar disassembly:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,8)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
dup
branchnil                              8
opt_send_without_block                 <calldata!mid:bar, argc:0, ARGS_SIMPLE>
leave

`jump`

There are occasions when we want to unconditionally branch to a label. For example in a while loop we want to jump back to the top of the loop. The jump instruction does just that. It accepts a single operand which is the offset to jump to. It does not pop any values off the stack, and instead directs the VM to continue execution at the offset. It never continues to the next instruction.

In Ruby:

class Jump
  attr_reader :label

  def initialize(label)
    @label = label
  end

  def call(vm)
    vm.jump(label)
  end
end

In while foo do bar end disassembly:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,20)> (catch: false)
jump                                   10                        (   1)[Li]
putnil
pop
jump                                   10
putself
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
pop
putself
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
branchif                               6
putnil
leave

You can see in the above disassembly that it first jumps directly to the predicate for the while loop. Then it conditionally branches back to the top of the loop if the predicate is truthy.

`opt_case_dispatch`

The final instruction to look at today is the opt_case_dispatch instruction. First, let’s look at which instructions a case statement compiles into. For example, in:

case foo
when :bar then bar
when :baz then baz
when :qux then qux
end

This code is going to conditionally execute one of the branches based on the value of the foo method call. It will do this by first compiling a series of branchif instructions that check the result of calling #=== on each of the conditions. It will then compile each of the bodies of the conditions into a series of instructions that unconditionally jump to the end when they are done. For example, the above code compiles into:

== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(5,3)> (catch: false)
putself                                                          (   1)[Li]
send                                   <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, nil
putobject                              :bar                      (   2)
topn                                   1
send                                   <calldata!mid:===, argc:1, FCALL|ARGS_SIMPLE>, nil
branchif                               35
putobject                              :baz                      (   3)
topn                                   1
send                                   <calldata!mid:===, argc:1, FCALL|ARGS_SIMPLE>, nil
branchif                               42
putobject                              :qux                      (   4)
topn                                   1
send                                   <calldata!mid:===, argc:1, FCALL|ARGS_SIMPLE>, nil
branchif                               49
pop                                                              (   1)
putnil
jump                                   56
pop                                                              (   2)
putself                                [Li]
send                                   <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>, nil
jump                                   56
pop                                                              (   3)
putself                                [Li]
send                                   <calldata!mid:baz, argc:0, FCALL|VCALL|ARGS_SIMPLE>, nil
jump                                   56
pop                                                              (   4)
putself                                [Li]
send                                   <calldata!mid:qux, argc:0, FCALL|VCALL|ARGS_SIMPLE>, nil
jump                                   56
leave

Notice that the first half is the “dispatch” part. It checks each of the conditions in turn and dispatches to the correct branch. The second half is the “body” part. This part contains each of the bodies of each of the conditions of the case statement.

One of the optimizations that can be done on this kind of code is to use the opt_case_dispatch instruction. Presuming the #=== operator never gets redefined on the Symbol class, we can store a hash where the keys are the symbols to check and the values are the labels to jump to. Then, instead of performing a bunch of instructions to dispatch to the correct branch, we can just look up the label in the hash and jump to it.

The opt_case_dispatch instruction accepts a single operand, which is the compiled hash of conditions to labels. It first pops the value off the top of the stack, then looks up the label to jump to. If it is found, it jumps there, otherwise it jumps past all of the instructions. This optimization can be performed for symbols, integers, floats, nil, true, false, and strings. Applying this optimization to the previous example results in:

== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(5,3)> (catch: false)
putself                                                          (   1)[Li]
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
dup
opt_case_dispatch                      <cdhash>, 31
putobject                              :bar                      (   2)
topn                                   1
opt_send_without_block                 <calldata!mid:===, argc:1, FCALL|ARGS_SIMPLE>
branchif                               34
putobject                              :baz                      (   3)
topn                                   1
opt_send_without_block                 <calldata!mid:===, argc:1, FCALL|ARGS_SIMPLE>
branchif                               39
putobject                              :qux                      (   4)
topn                                   1
opt_send_without_block                 <calldata!mid:===, argc:1, FCALL|ARGS_SIMPLE>
branchif                               44
pop                                                              (   1)
putnil
leave                                                            (   4)
pop                                                              (   2)
putself                                [Li]
opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave                                                            (   4)
pop                                                              (   3)
putself                                [Li]
opt_send_without_block                 <calldata!mid:baz, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave                                                            (   4)
pop
putself                                [Li]
opt_send_without_block                 <calldata!mid:qux, argc:0, FCALL|VCALL|ARGS_SIMPLE>
leave

Note that the other instructions are still in the instruction sequence. They have to be there as a fallback in case one of the basic operators was redefined. For example, if you were to write:

# please don't do this
class Symbol
  def ===(other)
    true
  end
end

then the opt_case_dispatch instruction would immediately exit.

Wrapping up

In this post we looked at all of the instructions that can branch within a single instruction sequence. We also looked at how the compiler uses these instructions to implement branching syntax like if and case. A couple of things to remember from this post:

When compiling instructions, the compiler keeps instructions in a linked list. This allows it to easily insert new instructions and remove instructions from the sequence.
Labels are used to mark the location of instructions that can be jumped to. They are not present in the final disassembly because they are not needed in the final instruction sequence.

In the next post we’ll talk about just a single instruction: defineclass.

The term “falsy” refers nil or false. Everything else is refered to as “truthy”. ↩

← Back to home