Advent of YARV: Part 22 - Pattern matching

Dec 22, 2022

This blog series is about how the CRuby virtual machine works. If you’re new to the series, I recommend starting from the beginning. This post is about pattern matching.

Pattern matching was introduced as a syntax in Ruby 2.7. Since then it has grown to include a couple more variants like single-line matching with the => or in operators and the find pattern for arrays. Originally, pattern matching was implemented as a library. When it was decided to be merged in to the language, it was rewritten to run as YARV instruction sequences.

Because of this, it’s seems very different from the rest of the language. In other features, the compiler generally generates fewer, more powerful instructions. In pattern matching, the compiler generates a lot more instructions. As a result, there weren’t many instructions added for pattern matching. The instructions it does use though are pretty extensive.

Let’s look at an example. In the following Ruby code we’re asserting that the value of foo matches the string "bar". If it does, it will return nil. If it doesn’t, it will raise an error.

foo => "bar"

Below is the YARV instruction sequence for this code.

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,12)> (catch: false)
putnil                                                           (   1)[Li]
putnil
putobject                              false
putnil
putnil
putself
opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
dup
putstring                              "bar"
dupn                                   2
checkmatch                             2
dup
branchif                               37
putspecialobject                       1
putobject                              "%p === %p does not return true"
topn                                   3
topn                                   5
opt_send_without_block                 <calldata!mid:core#sprintf, argc:3, ARGS_SIMPLE>
setn                                   6
putobject                              false
setn                                   8
pop
pop
setn                                   2
pop
pop
branchif                               89
putspecialobject                       1
topn                                   4
branchif                               65
putobject                              NoMatchingPatternError
putspecialobject                       1
putobject                              "%p: %s"
topn                                   4
topn                                   7
opt_send_without_block                 <calldata!mid:core#sprintf, argc:3, ARGS_SIMPLE>
opt_send_without_block                 <calldata!mid:core#raise, argc:2, ARGS_SIMPLE>
jump                                   85
putobject                              NoMatchingPatternKeyError
putspecialobject                       1
putobject                              "%p: %s"
topn                                   4
topn                                   7
opt_send_without_block                 <calldata!mid:core#sprintf, argc:3, ARGS_SIMPLE>
topn                                   7
topn                                   9
opt_send_without_block                 <calldata!mid:new, argc:3, kw:[matchee,key], KWARG>
opt_send_without_block                 <calldata!mid:core#raise, argc:1, ARGS_SIMPLE>
adjuststack                            7
putnil
leave
adjuststack                            6
putnil
leave

You can see why we left this to the end of the blog series. Believe it or not, there is actually only one instruction in that entire list that we haven’t covered yet, which is checkmatch. We’ll get to it in a moment. In the meantime, you can see that there is quite a lot happening here.

Let’s break it up into different sections. Below is the same set of instructions, but broken out into sections whenever a branching instruction is encountered or wherever a branching instruction could branch to. This is called a control-flow graph, and it can help us better understand what’s going on. Each section is called a basic block, and we’ll refer to them that way for the rest of this post.

I’ve linked the image above so you can view it full-size if that helps. Now that we have things a little better laid out, let’s take a look at what’s going on. There are 8 basic blocks in total in this diagram. I’ll refer to them by their starting offset from the start of the instruction sequence.

block 0: The starting block sets up the stack, pushes the object to be matched onto the stack, pushes the pattern to match onto the stack, then calls checkmatch. It branches on the success of running checkmatch.
block 19: This block is a failure case. For some reason the #=== method that was called on the pattern returned an unexpected value. This block sets up the error message that will be used when the NoMatchingPatternError error is raised.
block 49: This block sets up the NoMatchingPatternError’s message and then actually raises it by calling core#raise on the frozen core object. It then jumps to a final exit block.
block 85: This block cleans up the cached values on the stack, pushes nil, and then exits.
block 37: This block checks the result of the match operation. If it was successful, it jumps to block 89. If not, it jumps to block 43.
block 43: This block determines the reason the match operation failed. It branches to either block 49 or 65 to raise the appropriate error.
block 89: This is the successful match case. It cleans out the stack, pushes nil, and then exits.
block 65: This block raises a NoMatchingPatternKeyError and then jumps to block 85.

Believe it or not, this is one of the simplest kinds of pattern matching that exists. It only gets more complicated from here. We’re not going to dive into every kind today, but we will explain a couple of instructions that get used by pattern matching in general.

`checkmatch`

We saw earlier that checkmatch showed up to perform the pattern matching. checkmatch is responsible for popping two values off the stack, the object to be matched and the pattern to match against, matching them using some strategy, and then pushing the result of the match onto the stack.

The strategy used to match the object against the pattern is determined by the flag operand to checkmatch. There are three strategies that can be used:

VM_CHECKMATCH_TYPE_WHEN: This strategy ignores the object to be matched and checks that the pattern is truthy.
VM_CHECKMATCH_TYPE_CASE: This strategy checks that the pattern matches the object to be matched using the #=== method.
VM_CHECKMATCH_TYPE_RESCUE: This performs the same check as VM_CHECKMATCH_TYPE_CASE with the additional check that the pattern is a class or module.

`checktype`

There are times in YARV where the type of the object returned by a method call needs to checked to conform to an expected type. For example, with pattern matching you can define custom #deconstruct and #deconstruct_keys methods. Those methods should return arrays and hashes respectively. If they don’t, then an error should be raised. This is where checktype comes in.

checktype accepts a single operand which is the type that the object on the top of the stack should conform to. If it does, then true is pushed onto the stack. If it doesn’t, then false is pushed onto the stack. The type check uses the TYPE macro, which returns a value from the ruby_value_type enumerable. This mostly involves checking the object’s tag or flags.

Wrapping up

There is a lot of depth that you can go into when looking at how YARV implements pattern matching. We only barely scratched the surface in this post. However, we did go through two of the instructions that make it possible: checkmatch and checktype. A couple of things to remember from this post:

Pattern matching is implemented in many YARV instructions, as opposed to fewer instructions with more logic.
Control-flow graphs with basic blocks are a great way to visualize the flow of instructions in a method.

In the next post we’ll look at a couple of instructions that you will only see in YARV if you’re working on CRuby itself.

← Back to home