Ruby type conversion
Let’s talk about type conversion in Ruby.
First, we’re going to need some definitions:
type
the kind of object the program is currently dealing with. In Ruby this usually breaks down to the class of the object (e.g., String or Integer).
type conversion
the process of converting an object from one type to another type. Type conversion is an umbrella term, encapsulating both implicit and explicit type conversion.
type coercion
this is the term used for implicit type conversion. The definition of this term in the context of Ruby is relatively up for debate, but I’m going to define it as either (a) a method converting an argument or (b) the syntax of the Ruby language converting an object.
type casting
this is the term used for explicit type conversion. Ruby doesn’t necessarily have “casting” in the traditional sense (like C) but instead has methods used to represent explicit type conversions like #to_s or #to_i.
interface
a set of methods that an object understands. These methods can be defined pretty much anywhere as long as the object knows where to find them (e.g., the class of the object, a module that is included in the object, the singleton class of the object, any active refinements, etc.).
With these definitions in mind, we can define the term “dynamically-typed”. Ruby is considered a dynamically-typed language because the types of the objects that are live during the execution of a Ruby program are not known at compile time; they are only known at run time. It follows that any method can receive any type of object for any argument. When a method receives an argument of a type that it doesn’t expect, it can do one of three things:
- raise an error implicitly - effectively analogous to praying you get the right types
- raise an error explicitly - something like raise ArgumentError unless value.is_a?(String)
- convert the type of the object - by calling a type conversion method like #to_str
By far the worst option is the first one, as it’s abdicating responsibility to the next developer. The second option is somewhat better in that it enforces the expected type. The issue with the second one is that it’s enforcing the nominal type (the class) as opposed to the structural type (the interface). It precludes the possibility that the object could be converted into the desired type, and instead halts execution.
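To make the three options concrete, here is a minimal sketch. The shout_* method names and the Greeting class are hypothetical and exist only for illustration. The third version assumes String.try_convert is an acceptable way to trigger the conversion; it calls #to_str on its argument when that method is defined and returns nil otherwise.

```ruby
# Option 1: no check at all; a wrong type fails somewhere inside the method.
def shout_implicit(value)
  value.upcase + "!"
end

# Option 2: enforce the nominal type (the class).
def shout_explicit(value)
  raise ArgumentError, "expected a String" unless value.is_a?(String)
  value.upcase + "!"
end

# Option 3: enforce the structural type by converting through #to_str.
def shout_converting(value)
  string = String.try_convert(value)
  raise TypeError, "can't convert #{value.class} into String" if string.nil?
  string.upcase + "!"
end

# Hypothetical class that is not a String but knows how to become one.
class Greeting
  def to_str
    "hello"
  end
end

shout_converting(Greeting.new)
# => "HELLO!"
```

With the same Greeting object, the first version raises a NoMethodError from inside the method body and the second raises the ArgumentError, even though the object could have been converted.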
The third option is the subject of this blog post. Ruby is rife with examples of this kind of type conversion, especially within the standard library. This pattern has propagated into other popular Ruby projects, including Ruby on Rails. To start out, let’s look at some examples.
Type coercion in a method
Let’s look at Array#flatten. For those unfamiliar, #flatten is a method on Array that concatenates nested arrays into a single array. For example:
[1, [2, 3], [4], [5, [6, [7]]]].flatten
# => [1, 2, 3, 4, 5, 6, 7]
It also accepts as an argument the “level” of flattening (i.e., the number of times it should recursively flatten). That looks like:
[1, [2, 3], [4], [5, [6, [7]]]].flatten(2)
# => [1, 2, 3, 4, 5, 6, [7]]
(Notice that the final array containing the integer 7 is not flattened because it was three levels deep.)
This is the base case, and works just fine in the examples provided above. However, there are times when the values inside the array are not so primitive. Let’s consider an example where we have a custom class that functions as a list of elements (and maintains its own internal array). In that case, if we call #flatten on an array of those objects, we’ll get back the same array, as in:
class List
  def initialize(elems)
    @elems = elems
  end
end
[List.new([1, 2]), List.new([3, 4]), List.new([5, 6])].flatten
# => [#<List @elems=[1, 2]>, #<List @elems=[3, 4]>, #<List @elems=[5, 6]>]
However, if we define the #to_ary type coercion method (remember that’s the implicit option), then Ruby will happily call this method for us, as in:
class List
  def to_ary
    @elems
  end
end
[List.new([1, 2]), List.new([3, 4]), List.new([5, 6])].flatten
# => [1, 2, 3, 4, 5, 6]
You can see how it’s implemented in various Ruby implementations like CRuby, TruffleRuby, and Rubinius. You can also see how it’s implemented in the Sorbet type system, which handles these kinds of special conversions in what it calls “intrinsic” methods.
This is an example of type coercion that happens because of a standard library method call. But as mentioned above in the definition of type coercion, it can also happen as a result of syntax.
Type coercion from syntax
Let’s say we have an array of integers and we want to convert them into an array of strings. We can accomplish that by passing the #map method a block that performs a type cast (the explicit option), as in:
[1, 2, 3].map { |value| value.to_s }
# => ["1", "2", "3"]
We can also create a proc that will be used for the mapping and then pass it as the block for the method by using the & unary operator, as in:
mapping = proc { |value| value.to_s }
[1, 2, 3].map(&mapping)
From the perspective of the #map method these two snippets are equivalent, since in both cases it receives a block. There’s also the terser version of this type conversion that most Rubyists prefer, which is to instead pass a symbol that represents the mapping, as in:
mapping = :to_s
[1, 2, 3].map(&mapping)
# => ["1", "2", "3"]
In the above snippet, we’re taking advantage of the fact that Symbol has the #to_proc implicit type conversion method defined on it, which Ruby will take advantage of when using the & operator. In fact, it will do this on any object that has that conversion method defined, as in:
class Double
  def to_proc
    proc { |value| value * 2 }
  end
end
[1, 2, 3].map(&Double.new)
# => [2, 4, 6]
This is an example of syntax implicitly calling methods on objects in order to achieve a type conversion.
Type conversion interfaces
Some languages have explicit interface constructs that allow the developer to define a set of methods that an object should respond to in order to say they “implement” that interface. (These are also sometimes called traits.) In Ruby they are implicit, though this doesn’t make them any less common. Here is a list of the most common ones that I could find in use within the standard library:
- to_a / to_ary - converting to Array
- to_h / to_hash - converting to Hash
- to_s / to_str - converting to String
- to_sym - converting to Symbol
- to_proc - converting to Proc
- to_io - converting to IO
- to_i / to_int / to_f / to_c / to_r - converting to various Numeric subtypes
- to_regexp - converting to Regexp
- to_path - converting to a String to be used to represent a filepath
- to_enum - converting to Enumerator
- to_open - exclusively used by Kernel#open to convert the object it’s attempting to open into a URL or path
There are also the relatively new pattern matching interfaces:
- deconstruct - for converting an object into an array or find pattern for matching
- deconstruct_keys - for converting an object into a hash pattern for matching
As well as some very esoteric ones, like sleep accepting anything that responds to divmod.
All of these methods can be used for type coercion or type casting, depending on the context. Sometimes Ruby will trigger these method calls implicitly but any developer can also call these methods explicitly. Below I’ll go into some more details about where you can find these conversions and how they’re used.
to_a
Used by the Kernel.Array and Queue#initialize methods, but much more commonly used by the splat operator. For example, if you have an object that responds to #to_a, it can be splatted into an array, as in:
class List
  def initialize(elems)
    @elems = elems
  end
  def to_a
    @elems
  end
end
[1, *List.new([2, 3]), 4]
# => [1, 2, 3, 4]
to_ary
List of methods
Array#&
Array#+
Array#-
Array#<=>
Array#==
Array#[]=
Array#concat
Array#difference
Array#flatten
Array#flatten!
Array#intersection
Array#join
Array#product
Array#replace
Array#to_h
Array#transpose
Array#union
Array#zip
Array.initialize
Array.new
Array.try_convert
Enumerable#flat_map
Enumerable#collect_concat
Enumerable#to_h
Enumerable#zip
Hash#to_h
Hash.[]
IO#puts
Kernel.Array
Proc#===
Proc#call
Proc#yield
Process.exec
Process.spawn
String#%
Struct#to_h
#to_ary is used in syntax for destructuring and multiple assignment. For example, let’s say you have a Point class that represents a point in 2D space, and you want to print out just the x coordinate from a list of points. You could define the class such as:
class Point
  attr_reader :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  end
end
Then when you loop through to print the x coordinate, you would write:
[Point.new(1, 2), Point.new(3, 4)].each { |point| puts point.x }
This works, but you can take advantage of the fact that block arguments can destructure values by defining #to_ary, as in:
class Point
  def to_ary
    [x, y]
  end
end
[Point.new(1, 2), Point.new(3, 4)].each { |(x, y)| puts x }
Similarly, you can destructure within multiple assignment as in:
x, y = Point.new(1, 2)
x
# => 1
to_hash
List of methods
Enumerable#tally
Hash#merge
Hash#replace
Hash#update
Hash.[]
Hash.try_convert
Kernel.Hash
Process.spawn
#to_hash is used with the double splat (**) operator. For example, if you have some kind of object that represents parameters being sent to an HTTP endpoint, you can:
class Parameters
  def initialize(params)
    @params = params
  end
  def to_hash
    @params
  end
end
class Job
  def initialize(foo:, bar:); end
end
parameters = Parameters.new(foo: "foo", bar: "bar")
Job.new(**parameters)
In the above example, we’re taking advantage of the implicit type conversion performed by the double splat operator in order to call #to_hash on our Parameters object.
to_s
List of methods
Array#inspect
Array#pack
Array#to_s
Exception#to_s
File#printf
Hash#to_s
IO#binwrite
IO#print
IO#puts
IO#syswrite
IO#write
IO#write_nonblock
Kernel#String
Kernel#warn
Kernel.sprintf
String#%
String#gsub
String#sub
#to_s is called implicitly whenever an object is used within string interpolation. (It’s slightly inconsistent in that most of the time in Ruby #to_str is used to implicitly convert to a String.) So, for example, if you have:
"#{123}"
this is equivalent to calling:
123.to_s
This is why most Ruby linters will flag the code "#{123.to_s}" as a violation: the explicit call is redundant.
to_str
List of methods
Array#*
Array#join
Array#pack
Binding#local_variable_defined?
Binding#local_variable_set
Dir.chdir
ENV.[]
ENV.[]=
ENV.assoc
ENV.rassoc
ENV.store
Encoding::Converter.asciicompat_encoding
Encoding::Converter.new
Encoding.default_external=
Encoding.default_internal=
File#delete
File#path
File#to_path
File#unlink
File.chmod
File.join
File.new
File.split
IO#each
IO#each_line
IO#gets
IO#read
IO#readlines
IO#set_encoding
IO#sysread
IO#ungetbyte
IO#ungetc
IO.for_fd
IO.foreach
IO.new
IO.open
IO.pipe
IO.popen
IO.printf
IO.readlines
Kernel#gsub
Kernel#instance_variable_get
Kernel#open
Kernel#remove_instance_variable
Kernel#require
Kernel#require_relative
Kernel.`
Module#alias_method
Module#attr
Module#attr_accessor
Module#attr_reader
Module#attr_writer
Module#class_eval
Module#class_variable_defined?
Module#class_variable_get
Module#class_variable_set
Module#const_defined?
Module#const_get
Module#const_set
Module#const_source_location
Module#method_defined?
Module#module_eval
Module#module_function
Module#protected_method_defined?
Module#remove_const
Process.getrlimit
Process.setrlimit
Process.spawn
Regexp.union
String#%
String#+
String#<<
String#<=>
String#==
String#===
String#[]=
String#casecmp
String#center
String#chomp
String#concat
String#count
String#crypt
String#delete
String#delete_prefix
String#delete_suffix
String#each_line
String#encode
String#encode!
String#force_encoding
String#include?
String#index
String#initialize
String#insert
String#lines
String#ljust
String#partition
String#prepend
String#replace
String#rjust
String#rpartition
String#scan
String#split
String#squeeze
String#sub
String#tr
String#tr_s
String#unpack
String.try_convert
Thread#name=
Time#getlocal
Time#localtime
Time.gm
Time.local
Time.mktime
Time.new
This is the main conversion method for objects into strings. It’s used all over the place in the standard library. This interface is very popular and can be seen even in recent pull requests to Ruby on Rails.
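To see the difference between the two string conversions, here is a small sketch. The Name class is hypothetical, and it deliberately returns different text from each method so you can tell which one Ruby called.

```ruby
# Hypothetical class that defines both conversions so we can tell them apart.
class Name
  def initialize(value)
    @value = value
  end

  # Explicit conversion: used by string interpolation.
  def to_s
    "to_s: #{@value}"
  end

  # Implicit conversion: used by methods such as String#+.
  def to_str
    "to_str: #{@value}"
  end
end

name = Name.new("Ruby")

"Hello, #{name}"
# => "Hello, to_s: Ruby"

"Hello, " + name
# => "Hello, to_str: Ruby"
```

In real code the two methods would normally return the same string; an object should only define #to_str when it genuinely behaves like a string.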
to_sym
This is a relatively common way to convert strings into symbols, but doesn’t get a ton of usage internally. The only method I could find in the standard library that converted an argument into a symbol using #to_sym was TracePoint.new.
to_proc
Another one that doesn’t get used a ton internally. The only place I could find that used this in a method call was Hash#default_proc= (which will convert its only argument into a callable proc if it isn’t already one). #to_proc does get triggered through syntax, however, when passing a block argument (see the example above).
to_io
List of methods
File.directory?
File.size
File.size?
FileTest.directory?
IO#reopen
IO.select
IO.try_convert
This one is less well-known, but gets used a lot within the IO and File classes to convert method arguments into objects that can be used as IO-like objects.
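As a rough illustration, the sketch below wraps one end of a pipe in a hypothetical Connection class that only defines #to_io. It then hands the wrapper to IO.try_convert and IO.select, both of which appear in the list above.

```ruby
# Hypothetical wrapper that is not an IO itself but can hand one over.
class Connection
  def initialize(io)
    @io = io
  end

  def to_io
    @io
  end
end

reader, writer = IO.pipe
connection = Connection.new(reader)

# IO.try_convert calls #to_io and returns the underlying IO object.
IO.try_convert(connection).equal?(reader)
# => true

# IO.select accepts the wrapper and reports it back once data is readable.
writer.write("ping")
ready, = IO.select([connection], nil, nil, 1)
ready.first.equal?(connection)
# => true

reader.close
writer.close
```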
to_i
List of methods
Complex#to_i
File#printf
Kernel#Integer
Kernel.Integer
Kernel.sprintf
Numeric#to_int
String#%
This begins a series of conversion methods for Numeric subtypes, which all convert into each other. #to_i converts into an Integer object. It’s not used implicitly as often as #to_int, but it still gets some usage in the methods listed above. Much more commonly, this is the method called on a string to convert it into an integer (it accepts a radix as an argument for this purpose).
to_int
List of methods
Array#*
Array#[]
Array#[]=
Array#at
Array#cycle
Array#delete_at
Array#drop
Array#fetch
Array#fill
Array#first
Array#flatten
Array#hash
Array#initialize
Array#insert
Array#last
Array#pack
Array#pop
Array#rotate
Array#sample
Array#shift
Array#shuffle
Array#slice
Array.new
Encoding::Converter#primitive_convert
Enumerable#cycle
Enumerable#drop
Enumerable#each_cons
Enumerable#each_slice
Enumerable#first
Enumerable#take
Enumerable#with_index
File#chmod
File#printf
File.fnmatch
File.fnmatch?
File.umask
IO#gets
IO#initialize
IO#lineno=
IO#pos
IO#putc
IO#tell
IO.for_fd
IO.foreach
IO.new
IO.open
IO.readlines
Integer#*
Integer#+
Integer#-
Integer#<<
Integer#>>
Integer#[]
Integer#allbits?
Integer#anybits?
Integer#nobits?
Integer#round
Kernel#Integer
Kernel#exit!
Kernel#exit
Kernel#putc
Kernel.Integer
Kernel.exit!
Kernel.exit
Kernel.putc
Kernel.rand
Kernel.sprintf
Kernel.srand
MatchData#begin
MatchData#end
Process.getrlimit
Process.setrlimit
Random#seed
Random.rand
Range#first
Range#last
Range#step
Regexp.last_match
String#%
String#*
String#[]
String#[]=
String#byteslice
String#center
String#index
String#insert
String#ljust
String#rindex
String#rjust
String#setbyte
String#slice
String#split
String#sum
String#to_i
Time#getlocal
Time#localtime
Time.at
Time.gm
Time.local
Time.mktime
Time.new
Time.utc
This is a very commonly used conversion method for converting to Integer. A ton of methods will call this on arguments passed in to allow any kind of object to be used. It can also be triggered implicitly by assigning to the $. global variable (the line number last read by the interpreter). I have no idea why that’s in there, but it is.
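A quick sketch of what this looks like in practice, using a hypothetical Position class that only defines #to_int and a few of the methods from the list above:

```ruby
# Hypothetical class that wraps an integer position.
class Position
  def initialize(value)
    @value = value
  end

  def to_int
    @value
  end
end

letters = ["a", "b", "c", "d"]

letters[Position.new(2)]
# => "c"
letters.first(Position.new(2))
# => ["a", "b"]
"ab" * Position.new(3)
# => "ababab"
```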
to_f
List of methods
Complex#to_f
File#printf
Integer#coerce
Kernel#Float
Kernel.Float
Kernel.sprintf
Math.cos
Numeric#ceil
Numeric#coerce
Numeric#fdiv
Numeric#floor
Numeric#round
Numeric#truncate
String#%
A method of converting an object into a float. Not all that commonly used except for when a developer wants to avoid integer division.
to_c
One of, if not the most, esoteric one I could find in the standard library. This is a way of converting an object into a Complex number type, which is only used by the Kernel.Complex method.
to_r
List of methods
Complex#to_r
Numeric#denominator
Numeric#numerator
Numeric#quo
Time#+
Time#-
Time#getlocal
Time#localtime
Time.new
Time.at
The final numeric conversion interface. #to_r is used to convert an object into a rational number. It gets most of its usage in the Time class.
to_regexp
A less-used interface for converting an object into a regular expression. It only gets used internally within the Regexp class, in the Regexp.try_convert and Regexp.union methods.
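For instance, a minimal sketch with a hypothetical WordPattern class that only defines #to_regexp:

```ruby
# Hypothetical class describing a pattern without being a Regexp.
class WordPattern
  def initialize(word)
    @word = word
  end

  def to_regexp
    /\b#{Regexp.escape(@word)}\b/
  end
end

pattern = WordPattern.new("ruby")

# Regexp.try_convert calls #to_regexp when the argument defines it.
Regexp.try_convert(pattern)
# => /\bruby\b/

# Objects without #to_regexp (a plain String here) are not converted.
Regexp.try_convert("ruby")
# => nil
```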
to_path
List of methods
Dir#initialize
Dir.[]
Dir.chdir
Dir.children
Dir.chroot
Dir.each_child
Dir.entries
Dir.foreach
Dir.glob
Dir.mkdir
File#path
File#to_path
File.ftype
File.join
File.mkfifo
File.new
File.realpath
File::Stat#initialize
IO#reopen
IO#sysopen
IO.copy_stream
IO.foreach
IO.read
IO.readlines
Kernel#autoload
Kernel#open
Kernel#require
Kernel#require_relative
Kernel#test
Module#autoload
Process.spawn
I like this one a lot because it’s one of the few on this list that is named after the role that the converted object will fulfill as opposed to the type of object that is expected. That is to say, #to_path converts an object into a String that will function as the representation of a filepath. It’s used mostly within the Dir, File, and IO classes.
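Here is a small sketch of the idea. The Workspace class is hypothetical, and the example assumes the current working directory exists and is readable.

```ruby
# Hypothetical class representing a location on disk.
class Workspace
  def initialize(path)
    @path = path
  end

  def to_path
    @path
  end
end

workspace = Workspace.new(Dir.pwd)

# Both methods call #to_path to find out which directory we mean.
File.directory?(workspace)
# => true
Dir.entries(workspace).include?(".")
# => true
```

This is the same interface that lets a Pathname be passed to most file-related methods.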
to_enum
Enumerable#zip interestingly allows you to pass any object that responds to #to_enum. This was the only mention of this method that I could find in the standard library.
to_open
Similarly to #to_enum, #to_open is also only used in one place: Kernel#open. Anything that you pass to that method that responds to #to_open will be converted implicitly.
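As a sketch, the hypothetical InMemoryFile class below defines #to_open and returns a StringIO. Kernel#open calls that method and then yields (or returns) whatever it gets back.

```ruby
require "stringio"

# Hypothetical resource that knows how to open itself.
class InMemoryFile
  def initialize(contents)
    @contents = contents
  end

  # Kernel#open calls this and returns (or yields) whatever comes back.
  def to_open
    StringIO.new(@contents)
  end
end

open(InMemoryFile.new("hello")) { |io| io.read }
# => "hello"
```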
Pattern matching
When pattern matching was introduced into Ruby, we got two additional methods for implicit type conversion: deconstruct and deconstruct_keys.
deconstruct
If you’re matching against an object as if it were an array, then deconstruct will be called implicitly. For example:
class List
  def initialize(elems)
    @elems = elems
  end
  def deconstruct
    @elems
  end
end
case List.new([1, 2, 3])
in 1, 2, 3
  # we've matched here successfully!
in *, 2, *
  # we would match here successfully too!
end
deconstruct_keys
If you’re matching against an object as if it were a hash, then deconstruct_keys will be called implicitly. For example:
class Parameters
  def initialize(params)
    @params = params
  end
  def deconstruct_keys(keys)
    # keys is an array of the requested keys, or nil when every key is needed
    keys ? @params.slice(*keys) : @params
  end
end
case Parameters.new(foo: 1, bar: 2)
in foo: Integer
  # we've matched here successfully!
in foo: 1, bar: 2
  # we would match here successfully too!
end
coerce
No discussion of type conversions in Ruby would be complete without mentioning the coerce method. coerce is an interesting little method that is used for converting between different numeric types. It allows you to effectively hook into methods like Integer#* without having to monkey-patch it. Say, for example, you were defining your own special class that you wanted to support numeric computations:
class Value
  attr_reader :number
  def initialize(number)
    @number = number
  end
  def *(other)
    Value.new(number * other)
  end
end
value = Value.new(2)
value * 3
# => #<Value @number=6>
This works well. However, if you reverse the operands for the * operator, Ruby breaks it down to 3.*(value), which results in TypeError (Value can't be coerced into Integer). If you define the coerce method, however, the reversed form works as well, as in:
class Value
  def *(other)
    case other
    when Numeric
      Value.new(number * other)
    when Value
      Value.new(number * other.number)
    else
      if other.respond_to?(:coerce)
        self_equiv, other_equiv = other.coerce(self)
        self_equiv * other_equiv
      else
        raise TypeError, "#{other.class} can't be coerced into #{self.class}"
      end
    end
  end
  def coerce(other)
    # coerce must always return a two-element array: [converted other, self]
    case other
    when Numeric
      [Value.new(other), self]
    when Value
      [other, self]
    else
      raise TypeError, "#{self.class} can't be coerced into #{other.class}"
    end
  end
end
For good measure I’ve changed * to attempt to coerce its argument as well. Now all of the numeric types play nicely and you can run 3 * value without it breaking. You can see another example of this kind of coerce implementation in the standard library in the Matrix class as well as in Ruby on Rails in the Duration class.
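If you want to try the protocol end to end, here is a compact, self-contained sketch. It uses a hypothetical Meters class rather than the Value class above, so it can be pasted into a fresh session on its own.

```ruby
# Hypothetical numeric wrapper, kept separate from the Value class above.
class Meters
  attr_reader :amount

  def initialize(amount)
    @amount = amount
  end

  def *(other)
    factor = other.is_a?(Meters) ? other.amount : other
    Meters.new(amount * factor)
  end

  # Called by Integer#* and Float#* when the argument is not a built-in number.
  # Returns [converted_other, self]; Ruby then evaluates converted_other * self.
  def coerce(other)
    [Meters.new(other), self]
  end
end

(Meters.new(2) * 3).amount
# => 6
(3 * Meters.new(2)).amount
# => 6
(1.5 * Meters.new(2)).amount
# => 3.0
```

The important detail is the return value of coerce: a two-element array whose first element stands in for the original receiver and whose second element is the object being coerced against.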
Conclusion
Type conversion is a subtle art baked into the very syntax of the Ruby programming language. As with most everything programming related, use it with caution and with the context of the team that will be maintaining the software you’re writing. Especially with type coercion, it’s very easy to write code that is very difficult to reason about.
That said, using the existing interfaces from the standard library and defining interfaces within your own applications can lead to very beautiful code that never needs to perform type checks.