Advent of YARV: Part 10 - Local variables (3)

This blog series is about how the CRuby virtual machine works. If you’re new to the series, I recommend starting from the beginning. This post is the last of three posts about local variables.

In the previous two posts, we’ve discussed two ways of introducing local variables into your code. You can assign to a local variable through regular assignment, like foo = 1. Or you can declare parameters on a method declaration and then access them in the method, as in def foo(bar) = bar. In this post, we’ll talk about a couple more ways to introduce local variables and how to access them.

Environment data

First, before we go on, I need to make a confession. I omitted a detail when we were discussing local variables and the environment pointer to make the concept simpler in the beginning. Now that we have a better understanding of locals and the environment pointer though, we need that detail in place.

Here it is: when a frame is pushed onto the frame stack, 3 other values are pushed onto the value stack as well. Those values depend on the frame type, but effectively form a scratch area that the frame can use to read and write values. Internally to CRuby, this area is called VM_ENV_DATA. The 3 slots within this data area are:

VM_ENV_DATA_INDEX_ME_CREF
This slot can contain a couple of different things: method entries, class references, special variables, or false.
VM_ENV_DATA_INDEX_SPECVAL
This is used to hold either the environment pointer for the parent frame or the block handler for a method.
VM_ENV_DATA_INDEX_FLAGS
This is used to hold flags for the frame. We ran into this in the previous post when we were discussing the VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM flag.

Today, we’re interested in VM_ENV_DATA_INDEX_ME_CREF. This slot can hold a vm_svar struct, which contains our additional local variables.

Special variables

Let’s take a look at the struct definition for vm_svar:

/*! SVAR (Special VARiable) */
struct vm_svar {
  VALUE flags;
  const VALUE cref_or_me; /*!< class reference or rb_method_entry_t */
  const VALUE lastline;
  const VALUE backref;
  const VALUE others;
};

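This is a common kind of structure in CRuby called an imemo (internal memo). Imemos share a fixed-size layout, so they can be passed around the Ruby VM without too much hassle. The last three fields in the structure are what we care about right now, so let’s define them:

lastline
The last line read by methods such as gets and readline, exposed in Ruby as $_.
backref
The match data from the last regular expression match, exposed in Ruby as $~. Two kinds of variables are derived from it: numbered capture groups such as $1 and $2, and fields such as $&, $`, $', and $+.
others
An array that holds the state of each flip-flop.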

flip-flop

We need to take a quick minute to discuss the flip-flop operator, because it is not a commonly used or known feature of Ruby. The flip-flop operator is a way to create a gating condition that will evaluate to true every time after its initial condition is met until its final condition is met. It is used like this:

(1..100).each do |value|
  if (value == 5) .. (value == 10)
    print "#{value} "
  end
end

puts "done!"

The example above will print 5 6 7 8 9 10 done!. You can see that there’s an extra state that needs to be stored somewhere which is whether or not the flip-flop has been triggered yet. This is the boolean that is stored in the others array.
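To make that hidden state visible, here is a minimal sketch that gets the same result without the flip-flop operator by tracking the boolean by hand. The flip_flop_select helper and its parameter names are made up for this illustration; they are not part of Ruby or CRuby.

```ruby
# Hypothetical helper: emulates `(value == first) .. (value == last)`
# with an explicit boolean instead of the flip-flop operator.
def flip_flop_select(values, first, last)
  active = false # the extra state a flip-flop has to remember
  values.select do |value|
    active = true if !active && value == first # initial condition met
    selected = active
    active = false if active && value == last  # final condition met
    selected
  end
end

p flip_flop_select(1..100, 5, 10) # prints [5, 6, 7, 8, 9, 10]
```

The active variable plays the role of the boolean that CRuby keeps for each flip-flop.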

getspecial

Finally, we get to the instructions that we’re here to see today. These instructions are getspecial and setspecial. These instructions are used to access information held within the special variables we just discussed (but not the variables themselves!).

The getspecial instruction has two operands. Its function is to access the special variable given by the operands and push the value onto the stack.

The first operand is the index of the special variable to access. If the index is 0, then it’s referring to VM_SVAR_LASTLINE, which will access the lastline field on the svar struct. If the index is 1, then it’s referring to VM_SVAR_BACKREF, which will access the backref field on the svar struct. If the index is 2 or above, then it’s referring to a flip-flop: subtracting 2 from it gives an index into the others field, which is an array.

The second value is used only if the match data is being accessed and is used to indicate which field within the match data to return. It is a tagged value, with its tag being the least significant bit.
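As a rough model of how the first operand picks a field, consider the sketch below. The svar_get helper and the hash standing in for the vm_svar struct are assumptions for illustration only; CRuby does this lookup in C.

```ruby
# Hypothetical lookup: `svar` is a plain hash standing in for vm_svar,
# and `key` is the first operand of getspecial.
def svar_get(svar, key)
  case key
  when 0 then svar[:lastline]      # VM_SVAR_LASTLINE
  when 1 then svar[:backref]       # VM_SVAR_BACKREF
  else svar[:others][key - 2]      # flip-flop state, offset by 2
  end
end

svar = { lastline: "last line", backref: nil, others: [true, false] }
p svar_get(svar, 0) # prints "last line"
p svar_get(svar, 3) # prints false (the second flip-flop)
```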

Let’s go through an example of each type of special variable.

lastline

The only way to access the lastline special variable directly through the getspecial instruction is when you use a regular expression as the predicate of a conditional statement. As in:

if /pattern/
  puts "matched pattern against last line!"
end

That results in a disassembly of:

== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,3)> (catch: false)
0000 putobject                              /pattern/                 (   1)[Li]
0002 getspecial                             0, 0
0005 opt_regexpmatch2                       <calldata!mid:=~, argc:1, ARGS_SIMPLE>[CcCr]
0007 branchunless                           15
0009 putself                                                          (   2)[Li]
0010 putstring                              "matched pattern against last line!"
0012 opt_send_without_block                 <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0014 leave
0015 putnil
0016 leave

You can see the getspecial instruction here. Its first operand, 0, means the last line.
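To see this behavior from the Ruby side, here is a small sketch that sets the last line by assigning to $_ and then uses a bare regular expression as a condition. The pattern_in_last_line? method name is made up for this example, and Ruby may print a "regex literal in condition" warning when warnings are enabled.

```ruby
# Hypothetical helper: sets the last line for this method's frame,
# then lets a bare regexp literal match against it.
def pattern_in_last_line?(line)
  $_ = line
  if /pattern/ # implicitly matched against $_
    true
  else
    false
  end
end

p pattern_in_last_line?("a pattern here") # prints true
p pattern_in_last_line?("nothing here")   # prints false
```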

backref

As we discussed, there are two kinds of backref special variables: capture groups and fields. Let’s put them into the same example:

[$1, $2, $3, $4, $&, $`, $', $+]

This results in a disassembly of:

== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(1,32)> (catch: false)
0000 getspecial                             1, 2                      (   1)[Li]
0003 getspecial                             1, 4
0006 getspecial                             1, 6
0009 getspecial                             1, 8
0012 getspecial                             1, 77
0015 getspecial                             1, 193
0018 getspecial                             1, 79
0021 getspecial                             1, 87
0024 newarray                               8
0026 leave

You can see the first operand is always 1, to indicate that we’re accessing the last match. The second operand is the tagged value that indicates which field to access. The first four entries all correspond to the capture groups, and the last four correspond to the fields.
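The second operands in this disassembly can be decoded with a few lines of Ruby. This is a sketch based on the pattern visible above: an even value is a capture group number shifted left by one bit, and an odd value is a character code shifted left by one bit with the tag bit set. The decode_backref_operand name is hypothetical, and the sketch assumes a nonzero operand.

```ruby
# Hypothetical decoder for the second operand of `getspecial 1, type`.
def decode_backref_operand(type)
  if type.odd?
    "$" + (type >> 1).chr # tag bit set: a field such as $& or $+
  else
    "$#{type >> 1}"       # tag bit clear: a numbered capture group
  end
end

p [2, 4, 6, 8, 77, 193, 79, 87].map { |type| decode_backref_operand(type) }
# prints ["$1", "$2", "$3", "$4", "$&", "$`", "$'", "$+"]
```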

others

Finally, we have the flip-flop states.

if (value == 1) .. (value == 3)
  puts "value is between 1 and 3"
elsif (value == 6) .. (value == 8)
  puts "value is between 6 and 8"
end

This results in a disassembly of:

== disasm: #<ISeq:<main>@test.rb:1 (1,0)-(5,3)> (catch: false)
0000 getspecial                             2, 0                      (   1)[Li]
...
0036 getspecial                             3, 0                      (   3)[Li]
...
0074 leave

You can see the first operand increments for each flip-flop encountered. (It would continue if there were more.) It’s always a 0 for the second operand because that doesn’t apply.
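If you want to check this on your own machine, the sketch below compiles a snippet and collects the first operand of every getspecial instruction. It relies on RubyVM::InstructionSequence, which is specific to CRuby. The getspecial_keys helper is a made-up name, and the regular expression assumes the disassembly format shown in this post, which can vary between Ruby versions.

```ruby
# Hypothetical helper: compiles `source` without running it and returns
# the first operand of each getspecial instruction in the disassembly.
def getspecial_keys(source)
  disasm = RubyVM::InstructionSequence.compile(source).disasm
  disasm.scan(/getspecial\s+(\d+),\s*\d+/).flatten.map(&:to_i)
end

source = <<~RUBY
  if (value == 1) .. (value == 3)
    puts "value is between 1 and 3"
  elsif (value == 6) .. (value == 8)
    puts "value is between 6 and 8"
  end
RUBY

p getspecial_keys(source)
```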

Diagram

If we take our example code puts "matched!" if /pattern/ and disassemble it, we get:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,28)> (catch: false)
0000 putobject                              /pattern/                 (   1)[Li]
0002 getspecial                             0, 0
0005 opt_regexpmatch2                       <calldata!mid:=~, argc:1, ARGS_SIMPLE>[CcCr]
0007 branchunless                           15
0009 putself
0010 putstring                              "matched!"
0012 opt_send_without_block                 <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0014 leave
0015 putnil
0016 leave

When we get to the getspecial instruction, it looks a bit like:

[Diagram: getspecial, before execution]

Once we execute it, it looks a bit like:

[Diagram: getspecial, after execution]

setspecial

Unlike getspecial, which can access any of the special variables, setspecial can only set the boolean associated with a flip-flop. It has a single operand, which is two more than the index of the flip-flop to set. The value to set it to is popped off the stack.

For example, foo if (bar == 1) .. (bar == 2) disassembles to:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,31)> (catch: false)
0000 getspecial                             2, 0                      (   1)[Li]
0003 branchif                               17
0005 putself
0006 opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0008 putobject_INT2FIX_1_
0009 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0011 branchunless                           34
0013 putobject                              true
0015 setspecial                             2
0017 putself
0018 opt_send_without_block                 <calldata!mid:bar, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0020 putobject                              2
0022 opt_eq                                 <calldata!mid:==, argc:1, ARGS_SIMPLE>[CcCr]
0024 branchunless                           30
0026 putobject                              false
0028 setspecial                             2
0030 putself
0031 opt_send_without_block                 <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>
0033 leave
0034 putnil
0035 leave
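The control flow in that disassembly can be hard to follow, so here is a hedged Ruby sketch that mirrors it step by step. The flip_flop_step method is hypothetical, the others array stands in for the others field of the svar, and the symbol :foo stands in for the call to foo.

```ruby
# Hypothetical model of one pass through the instructions above.
# `others[0]` corresponds to special variable key 2.
def flip_flop_step(others, bar)
  unless others[0]              # 0000 getspecial 2, 0 / 0003 branchif 17
    return nil unless bar == 1  # 0005-0011: start condition fails, jump to putnil
    others[0] = true            # 0013-0015: setspecial 2 with true
  end
  others[0] = false if bar == 2 # 0017-0028: setspecial 2 with false
  :foo                          # 0030-0031: the body runs
end

others = []
p [0, 1, 5, 2, 3].map { |bar| flip_flop_step(others, bar) }
# prints [nil, :foo, :foo, :foo, nil]
```

Note that the body still runs on the pass where the final condition is met. The flag is only cleared for the next pass.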

Wrapping up

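In this post, we looked at the getspecial and setspecial instructions. We saw that they are used to access special variables held in a vm_svar struct, which a frame reaches through its environment data. A couple of things to remember from this post:

Frames get 3 environment data slots on the value stack: ME_CREF, SPECVAL, and FLAGS.
getspecial can read the last line, the last match, and flip-flop states, while setspecial can only write flip-flop states.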

With this post, we’re done with our tour of local variables. In the next post, we’ll talk about two other kinds of variables.
