Advent of YARV: Part 13 - Constants

This blog series is about how the CRuby virtual machine works. If you’re new to the series, I recommend starting from the beginning. This post is about constants.

Constants in Ruby exist in their own tree. Accessing them involves looking them up by walking up the tree according to your current constant nesting. The details of that specific algorithm are outside the scope of this post, but you can read more about it in the Ruby documentation. You can access constants from one of three starting points: an absolute path (as in ::Foo), a path relative to a variable (as in foo::Bar), or a relative path (as in Foo).

In the first two cases, you know exactly where to start looking. For an absolute path it’s the top level, which means starting from the Object constant and working your way down the tree. For a path relative to a variable, you start at the constant that the variable points to (it will raise a TypeError if the variable is not a class or module).

In the third case you need to walk up the tree from the current nesting. The nesting is stored as a part of the current frame, so walking up the tree involves walking up the frame stack. The starting point in this case is called the constant base. This needs to be pushed onto the stack to maintain the same stack order as the other two cases.
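To make the three starting points concrete, here is a small sketch in plain Ruby. All of the constant and module names (LOOKUP_DEMO, LookupOuter, LookupInner) are made up for this illustration.

```ruby
# All names below are hypothetical and exist only for this sketch.
LOOKUP_DEMO = :top_level

module LookupOuter
  LOOKUP_DEMO = :outer

  module LookupInner
    # Relative path: walks up the lexical nesting and finds
    # LookupOuter::LOOKUP_DEMO before reaching the top level.
    def self.relative
      LOOKUP_DEMO
    end

    # Absolute path: starts from Object, so it finds the top-level constant.
    def self.absolute
      ::LOOKUP_DEMO
    end

    # Path relative to a variable: starts at whatever base points to.
    def self.from_variable(base)
      base::LOOKUP_DEMO
    end
  end
end

relative_value = LookupOuter::LookupInner.relative                   # :outer
absolute_value = LookupOuter::LookupInner.absolute                   # :top_level
variable_value = LookupOuter::LookupInner.from_variable(LookupOuter) # :outer

# A base that is not a class or module raises TypeError.
type_error =
  begin
    LookupOuter::LookupInner.from_variable(42)
  rescue TypeError => error
    error
  end
```

The same constant name resolves to different values depending only on where the lookup starts.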

We’ll see today how the virtual machine handles all three cases, and how it can be optimized to avoid the constant lookup when it can be cached.

getconstant

The instruction that performs the constant lookup is called getconstant. It has a single operand, which is the name of the constant to find. getconstant pops two values off the stack. The first is the constant base, which is the starting point for the lookup. It expects that this object will point to a class or module.

Let’s start with the first two cases, an absolute path or a path relative to a variable. In these cases you either start from the top level (which means the Object class will be pushed onto the stack) or you start at a given constant (which means the class or module will already be on the stack as a result of a different instruction). In either case, the constant base is on the stack in the correct place.

In the last case, there is no class or module to start from, so nil is pushed onto the stack with putnil in place of the constant base. The second value, which sits on top of the constant base, is a boolean that indicates whether or not the constant base is allowed to be nil. If it is, then it will search the current lexical scope. If it is not, then it will instead call the #const_missing method. [1]

If the value is successfully found, it is pushed onto the stack, otherwise nil is pushed. For example, with getconstant :Foo:

[Figure: stack before and after getconstant :Foo]
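The stack effect can be approximated with a toy model in plain Ruby. This is only a sketch under simplifying assumptions: toy_getconstant and its nesting parameter are invented here, the lexical search is reduced to "first enclosing module that owns the constant, else Object", and Module#const_get stands in for the VM's real lookup.

```ruby
# Toy model of getconstant (hypothetical helper, not the VM's implementation).
# stack:   an Array used as the value stack
# name:    the instruction's single operand
# nesting: enclosing modules, innermost first (stand-in for the frame's nesting)
def toy_getconstant(stack, name, nesting)
  allow_nil = stack.pop # boolean pushed by putobject true/false
  cbase = stack.pop     # constant base, or nil for a relative lookup

  value =
    if cbase.nil? && allow_nil
      # Relative lookup: try each lexical scope, then fall back to Object.
      scope = nesting.find { |mod| mod.const_defined?(name, false) } || Object
      scope.const_get(name)
    else
      unless cbase.is_a?(Module)
        raise TypeError, "#{cbase.inspect} is not a class/module"
      end
      cbase.const_get(name)
    end

  stack.push(value)
end

# Mirrors the shape of ::Math::PI: Object, true, getconstant, false, getconstant.
stack = [Object, true]
toy_getconstant(stack, :Math, [])
stack.push(false)
toy_getconstant(stack, :PI, [])
absolute_result = stack.dup

# Mirrors a relative PI evaluated inside Math: nil, true, getconstant.
stack = [nil, true]
toy_getconstant(stack, :PI, [Math])
relative_result = stack.dup
```

In both runs two values come off the stack and one goes back on, which is why lookups can be chained.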

We’ll look at two disassembly examples. The first will be for an absolute path, as in ::Foo::Bar::Baz [2]:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,15)> (catch: false)
0000 putnil                                                           (   1)[Li]
0001 pop
0002 putobject                              Object
0004 putobject                              true
0006 getconstant                            :Foo
0008 putobject                              false
0010 getconstant                            :Bar
0012 putobject                              false
0014 getconstant                            :Baz
0016 leave

The second will be for a path relative to a variable, as in foo::Bar::Baz:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: false)
0000 putself                                                          (   1)[Li]
0001 send                                   <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, nil
0004 putobject                              false
0006 getconstant                            :Bar
0008 putobject                              false
0010 getconstant                            :Baz
0012 leave

Notice that in both examples the instructions form a chain of a class/module, then a putobject with a boolean, then a getconstant.
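If you want to reproduce listings like these yourself, something along the following lines should work. Note the assumption: the post only says the code was compiled without optimizations, and this sketch guesses that turning off the inline_const_cache compile option is enough to keep the plain getconstant chain visible. The file name example.rb is a placeholder.

```ruby
# Assumption: disabling inline_const_cache keeps the unoptimized
# getconstant chain in the output.
options = { inline_const_cache: false }

# compile only builds the instruction sequence; nothing is evaluated,
# so Foo and foo do not need to exist.
absolute = RubyVM::InstructionSequence.compile(
  "::Foo::Bar::Baz", "example.rb", "example.rb", 1, options
)
relative = RubyVM::InstructionSequence.compile(
  "foo::Bar::Baz", "example.rb", "example.rb", 1, options
)

absolute_listing = absolute.disasm
relative_listing = relative.disasm

# Count only lines whose instruction is exactly getconstant.
absolute_lookups = absolute_listing.scan(/^\d+ getconstant\s/).size
relative_lookups = relative_listing.scan(/^\d+ getconstant\s/).size

puts absolute_listing
puts relative_listing
```

Offsets and neighboring instructions can vary between Ruby versions, so treat the exact output as version-dependent.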

setconstant

Setting a constant is somewhat different from looking one up. You cannot set a chain of constants in Ruby, only a single constant. This means that ::Foo::Bar::Baz = 1 really breaks down into looking up ::Foo::Bar, then setting the :Baz constant on that class or module. This is what the setconstant instruction does.

setconstant accepts a single operand, which is the name of the constant to set. It pops two values off the stack. The top value on the stack is expected to be the constant base. The next value down is expected to be the value to set. For example, with setconstant :Foo:

[Figure: stack before and after setconstant :Foo]
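The "look up the base, then set one name on it" behavior is visible from plain Ruby. In this sketch, holder is a hypothetical anonymous module standing in for the result of looking up ::Foo::Bar.

```ruby
# holder is a hypothetical stand-in for the class or module found by the
# lookup part of the path.
holder = Module.new

# The left-hand side evaluates holder first, then sets :Answer on it.
holder::Answer = 42

stored_value = holder.const_get(:Answer)
owned_constants = holder.constants

# Compiling (without running) the same statement shows the instruction.
listing = RubyVM::InstructionSequence.compile("holder::Answer = 42").disasm
uses_setconstant = listing.match?(/setconstant\s+:Answer/)
```

Only the final segment is ever written; everything before it is an ordinary lookup.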

The disassembly for ::Foo::Bar::Baz = 1:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,19)> (catch: false)
0000 putnil                                                           (   1)[Li]
0001 pop
0002 putobject                              Object
0004 putobject                              true
0006 getconstant                            :Foo
0008 putobject                              false
0010 getconstant                            :Bar
0012 putobject                              1
0014 swap
0015 topn                                   1
0017 swap
0018 setconstant                            :Baz
0020 leave

The disassembly for foo::Bar::Baz = 1:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,17)> (catch: false)
0000 putself                                                          (   1)[Li]
0001 send                                   <calldata!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, nil
0004 putobject                              false
0006 getconstant                            :Bar
0008 putobject                              1
0010 swap
0011 topn                                   1
0013 swap
0014 setconstant                            :Baz
0016 leave

putspecialobject

Because the constant base is expected to be on the stack, there are occasions where it needs to be pushed on from a value relative to the current context. This is done with the putspecialobject instruction.

putspecialobject has a single operand, which is an entry in the vm_special_object_type enum. Each value in that enum corresponds to a special object that can be pushed onto the stack to satisfy the expectations of other instructions. As we saw with getconstant, some instructions expect the constant base to be on the stack. As we saw in the previous post on calling methods, send expects the receiver to be on the stack. This instruction fills those expectations.

There are three entries in total, and we’ll look at them in turn.

VM_SPECIAL_OBJECT_VMCORE

The first type of special object is VM_SPECIAL_OBJECT_VMCORE. This has the value of 1 in the enumeration, so you will see putspecialobject 1 in the disassembly. This value corresponds to pushing the special RubyVM::FrozenCore object onto the stack. (Note that you won’t be able to actually find that constant in Ruby because it’s purposefully hidden.)

RubyVM::FrozenCore has a few methods on it that execute functions internal to CRuby. It allows YARV to send methods to it as if it were any other Ruby object using the send instruction (or any of its specializations). The methods that it has defined on it include:
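One place this shows up is the alias keyword. As a hedged sketch: compiling an alias statement should show putspecialobject 1 followed by a send of core#set_method_alias. The method names new_name and old_name are placeholders, and nothing is executed.

```ruby
# new_name and old_name are placeholder method names; compile does not run
# the code, so they do not need to exist.
listing = RubyVM::InstructionSequence.compile("alias new_name old_name").disasm

pushes_vmcore = listing.match?(/putspecialobject\s+1\b/)
sends_to_core = listing.include?("core#set_method_alias")

puts listing
```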

VM_SPECIAL_OBJECT_CBASE

The second type of special object is VM_SPECIAL_OBJECT_CBASE. This has the value of 2 in the enumeration, so you will see putspecialobject 2 in the disassembly. This value corresponds to pushing the constant base corresponding to the current frame onto the stack. It does this by looking at the constant reference for the current frame and finding the value of self, then finding its class.
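A quick way to see this value in a listing is the undef keyword, which (as far as I can tell from the output) pushes the CBASE object as an argument. The method name some_method is a placeholder and the code is only compiled, not run.

```ruby
# some_method is a placeholder; compile does not execute the undef.
listing = RubyVM::InstructionSequence.compile("undef some_method").disasm

pushes_cbase = listing.match?(/putspecialobject\s+2\b/)

puts listing
```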

VM_SPECIAL_OBJECT_CONST_BASE

The third and last type of special object is VM_SPECIAL_OBJECT_CONST_BASE. This has a value of 3 in the enumeration, so you will see putspecialobject 3 in the disassembly. This is almost always the exact same value as CBASE, except that it skips eval frames.

This is the value that we care about in this blog post. Because the setconstant instruction expects the constant base to be on the stack, and we don’t have it already from a previous instruction because we’re looking up a relative path, the putspecialobject instruction is used to look up the context from the frame and push the constant base onto the stack.

For example, the disassembly for Foo = 1:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,7)> (catch: false)
0000 putobject_INT2FIX_1_                                             (   1)[Li]
0001 dup
0002 putspecialobject                       3
0004 setconstant                            :Foo
0006 leave

Here setconstant expects the constant base to be on top of the stack, so putspecialobject 3 pushes the constant base for the current context immediately before it.
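The same listing can be produced from Ruby itself. This sketch only compiles the statement, so no Foo constant is actually defined.

```ruby
# Compile without evaluating, then inspect the instructions.
listing = RubyVM::InstructionSequence.compile("Foo = 1").disasm

pushes_const_base = listing.match?(/putspecialobject\s+3\b/)
sets_foo = listing.match?(/setconstant\s+:Foo/)

puts listing
```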

opt_getconstant_path

The last instruction that we will be looking at today is an optimization that John Hawthorn made to simplify the whole chain of instructions that were previously in place to look up a constant. Where before you would see the chain that we already looked at, now you see a single opt_getconstant_path instruction.

This instruction has a single operand which does a lot of heavy lifting. It is an iseq_inline_constant_cache struct, which contains two values. The first is the value of the constant, which is cached after first lookup. [3] The second is an array of symbols corresponding to the constant chain. For example, in Foo::Bar::Baz, the array would contain [:Foo, :Bar, :Baz]. [4]

When the cache is first compiled, it is registered with the virtual machine for every symbol in the array. The VM contains a cache-busting mechanism where any time something changes in the VM that corresponds to a constant, it invalidates any cache that contains a segment corresponding to that name. For example, if you were to run Foo = 1, then any inline cache used by an opt_getconstant_path instruction that had :Foo in its list would be invalidated.
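The registry itself is internal to the VM, but its observable effect is easy to check: a method that has already cached a constant still sees a new value after the constant is replaced. The names CachedExample and read_cached_example are made up for this sketch.

```ruby
# CachedExample and read_cached_example are hypothetical names.
Object.const_set(:CachedExample, 1)

def read_cached_example
  CachedExample
end

# First call performs the lookup and gives the inline cache a chance to fill.
first_read = read_cached_example

# Replacing the constant changes something named :CachedExample, so any
# cache with that segment must not be trusted afterwards.
Object.send(:remove_const, :CachedExample)
Object.const_set(:CachedExample, 2)

second_read = read_cached_example
```

If the cache were not invalidated, the second call would still return the old value.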

The instruction performs the same lookup (actually using the same path through the code) as getconstant. Once the final value has been found, it is pushed onto the stack. For example, the disassembly for Foo::Bar::Baz:

== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: false)
0000 opt_getconstant_path                   <ic:0 Foo::Bar::Baz>      (   1)[Li]
0002 leave
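To check this on your own installation, a sketch like the following compiles the same expression with default options. It assumes the instruction is present from Ruby 3.2 onward, which is my understanding of when it shipped; on older versions the listing will show a different sequence.

```ruby
# Default compile options, so the optimized form is used where available.
listing = RubyVM::InstructionSequence.compile("Foo::Bar::Baz").disasm

uses_path_instruction = listing.include?("opt_getconstant_path")

# Assumption: opt_getconstant_path exists on Ruby 3.2 and newer.
version_parts = RUBY_VERSION.split(".").map(&:to_i)
supports_path_instruction = (version_parts <=> [3, 2]) >= 0

puts listing
```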

Wrapping up

Today we looked at all of the instructions that have to do with constants in Ruby. We saw how to get and set them, as well as looking at some optimizations that are in place to cache constant lookup. Some things to remember from this post:

This concludes our look at variables in Ruby. Tomorrow we’ll look at instructions that allow the VM to skip other instructions to allow branching logic.


  1. You may be asking yourself why a value is pushed onto the stack to indicate that the constant base is allowed to be nil when that boolean flag is known at compile time, as opposed to making it an operand of the getconstant instruction. It turns out that’s a good question.

  2. I’m purposefully compiling this without optimizations turned on in order to demonstrate the getconstant instruction. Under regular circumstances, the compiler will optimize this to use opt_getconstant_path instead.

  3. Technically, it also contains the class reference in order to be sure that doesn’t change as well, but don’t worry about that for now. You can think of it as just the value of the constant. 

  4. In order to avoid having to hold the size of the array as well, the array is null-terminated. 
