Linting Ruby

For the last couple of weeks I’ve been thinking about the process of linting Ruby code. For the most part, the Ruby community has standardized on rubocop, and for good reason - it’s an impressive project with massive breadth in how far it is willing to go to guide you toward better code. Other tools exist within the community as well, including my personal favorite, reek.

Looking at the state of these linters, it’s interesting to see that every one of them that I could find is built on a third-party parser. They generally use either whitequark/parser or seattlerb/ruby_parser. Both of these parsers are kept up-to-date with the latest Ruby features, but it raises the question: can we write a linter for Ruby code using just the standard library?

Ripper

Ruby ships with its own parser that you can use without installing other gems: ripper. To try it out, run the following code on your command line:

$ echo 'foo + bar' | ruby -rripper -e 'pp Ripper.sexp_raw(ARGF)'

(Briefly, this prints a string of code and pipes it into the standard input of a new Ruby process. That process first requires ripper, then executes the code to parse its input and print a concrete syntax tree to standard out. You can also pipe entire Ruby files into that same process using cat.)

Using this, we can see the concrete syntax tree (CST) that Ripper will generate for us:

[:program,
 [:stmts_add,
  [:stmts_new],
  [:binary,
   [:vcall, [:@ident, "foo", [1, 0]]],
   :+,
   [:vcall, [:@ident, "bar", [1, 6]]]]]]

This CST is a tree representation of our code. It includes location information for what ripper calls scanner events (basic leaf tokens in the tree) and array bodies for what ripper calls parser events (non-leaf nodes in the tree).
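To make the distinction between the two kinds of events concrete, here is a minimal sketch that walks the tree above and collects only the scanner events. The helper names scanner_event? and tokens are my own for illustration and are not part of ripper; the sketch relies only on the convention visible in the output above, where scanner event names start with @ and end with a [line, column] pair.

```ruby
require 'ripper'

# Scanner events are the nodes whose name starts with "@" in the sexp output.
def scanner_event?(node)
  node.is_a?(Array) && node[0].is_a?(Symbol) && node[0].to_s.start_with?('@')
end

# Recursively collect every scanner event (leaf token) in the tree.
def tokens(node, found = [])
  return found unless node.is_a?(Array)

  if scanner_event?(node)
    found << node
  else
    node.each { |child| tokens(child, found) }
  end

  found
end

tree = Ripper.sexp_raw('foo + bar')
leaves = tokens(tree)

# Each leaf has the shape [type, value, [line, column]].
leaves.each do |(type, value, (line, column))|
  puts "#{type} #{value.inspect} at line #{line}, column #{column}"
end
```

For this input, the two leaves are the foo and bar identifiers at columns 0 and 6 of line 1. Everything else in the tree (program, stmts_add, binary, vcall) is a parser event.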

Fortunately, Ripper comes with ways to get access to these nodes as it’s parsing, which means we can detect certain patterns with remarkable efficiency. For example, if you wanted to detect when these kinds of binary nodes occurred, you could write your own ripper parser, like so:

class Parser < Ripper
  def on_binary(left, oper, right)
    pp [left, oper, right]
  end
end

Parser.new(ARGF).parse

Put that code into a parser.rb file and run:

$ echo 'foo + bar' | ruby -rripper parser.rb

You’ll see that it hit the binary node and printed out the arguments it received. Now that you know how to get information about what kinds of nodes exist within the syntax tree of any Ruby file, you can start to match against certain patterns.
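Printing from inside the handler is handy for exploring, but a handler can just as well accumulate state on the parser object. The following is a small sketch under that assumption; BinaryCollector and its operators reader are hypothetical names of mine, not part of ripper.

```ruby
require 'ripper'

# Hypothetical subclass: records the operator of every binary expression
# the parser encounters.
class BinaryCollector < Ripper
  attr_reader :operators

  def initialize(*)
    super
    @operators = []
  end

  def on_binary(left, oper, right)
    @operators << oper
    # The default Ripper handlers return their first argument, so do the same
    # here to keep the values passed to enclosing events unchanged.
    left
  end
end

collector = BinaryCollector.new("a + b * c\nd - e\n")
collector.parse

p collector.operators
```

Each operator arrives as a symbol, and it is recorded at the moment its expression is fully parsed, so nested expressions are seen before the expressions that contain them.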

Assignment in condition

Let’s say for example that we want to detect any time someone puts an assignment into a condition. This would look something like:

if foo = bar
  return 'Equals'
end

You can see from this somewhat contrived example that this may have been a mistake. The author of this code likely was attempting to do a comparison (with ==) and instead accidentally used a single equals assignment operator. Just for safety’s sake, we want to warn the developer and disallow this kind of assignment.

First, we need to determine the pattern that we’re going to match against. To do that, we need to be able to see the tree that ripper is generating. Reusing our script from earlier, we can put this test code into test.rb and then run:

$ cat test.rb | ruby -rripper -e 'pp Ripper.sexp_raw(ARGF)'

We end up with a somewhat bigger tree this time:

[:program,
 [:stmts_add,
  [:stmts_new],
  [:if,
   [:assign,
    [:var_field, [:@ident, "foo", [1, 3]]],
    [:vcall, [:@ident, "bar", [1, 9]]]],
   [:stmts_add,
    [:stmts_new],
    [:return,
     [:args_add_block,
      [:args_add,
       [:args_new],
       [:string_literal,
        [:string_add,
         [:string_content],
         [:@tstring_content, "Equals", [2, 10]]]]],
      false]]],
   nil]]]

Here’s the important thing to notice within this tree: the first child of the if node (which represents the branch predicate) is an assign node. This is relatively easy to find, as we can extend a base ripper parser with a small module that inspects these kinds of if nodes:

class Parser < Ripper::SexpBuilderPP; end

module AssignmentInCondition
  def on_if(predicate, *others)
    raise 'got an assignment in a condition' if predicate[0] == :assign
    super(predicate, *others)
  end
end

parser = Parser.new(ARGF)
parser.singleton_class.prepend(AssignmentInCondition)
parser.parse

puts 'Lint success.'

(You may be wondering why I would use singleton_class and prepend here - I’ll come back to that.) If we put this into a linter.rb file and then run it with our previous test.rb file, we get:

$ cat test.rb | ruby -rripper linter.rb
Traceback (most recent call last):
	2: from linter.rb:12:in `<main>'
	1: from linter.rb:12:in `parse'
linter.rb:5:in `on_if': got an assignment in a condition (RuntimeError)
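Raising stops at the first problem, which is fine for a demonstration. As a sketch of one alternative, the rule could record each violation along with the location that ripper already stores in the tree. The module name AssignmentInConditionCollector and the violations method are assumptions of mine, and this version only reports assignments to plain variables (a var_field target).

```ruby
require 'ripper'

# Hypothetical variant of the rule above: instead of raising, remember where
# each offending assignment starts so that every violation can be reported.
module AssignmentInConditionCollector
  def violations
    @violations ||= []
  end

  def on_if(predicate, *others)
    case predicate
    in [:assign, [:var_field, [_, name, [line, column]]], *]
      violations << "#{line}:#{column}: assignment to #{name} in condition"
    else
      # Not an assignment to a plain variable, so there is nothing to record.
    end

    super(predicate, *others)
  end
end

source = <<~RUBY
  if foo = bar
    return 'Equals'
  end

  if foo == bar
    return 'Really equals'
  end
RUBY

parser = Ripper::SexpBuilderPP.new(source)
parser.singleton_class.prepend(AssignmentInConditionCollector)
parser.parse

puts parser.violations
```

Only the first condition is reported, at line 1, column 3. The second one is a binary node, so it falls through to the else branch.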

Literal in condition

Now let’s try a more complex example. Let’s say we wanted to find any time a developer used a literal value (in this case a literal number, true, or false) inside a condition.

Effectively we have the same code as before, but with a new check to validate that the condition is a literal node. The module containing the check will look something like this:

module LiteralAsCondition
  def on_if(predicate, *others)
    raise 'literal found in condition' if literal?(predicate)
    super(predicate, *others)
  end
end

Now we need to write the literal? method. Fortunately, Ruby 2.7 has shipped with some new pattern matching syntax that is going to feel right at home in this context since we’re matching against well-known array patterns. The following code should accomplish what we want:

def literal?(node)
  case node
  in [:@int, *] | [:var_ref, [:@kw, 'true' | 'false', *]]
    true
  else
    false
  end
end

Here we’re expecting the node variable to be an array. If its first element is :@int, we match. If instead it is a var_ref node whose @kw child holds the string true or false, we also match. (The trailing * in the @kw pattern is needed because scanner event nodes end with their location, and an array pattern without a splat only matches arrays of exactly that length.)

We can throw some extra spice on this by checking binary nodes to make sure we don’t have a literal on either side of an || expression (and use a little recursion for good measure):

def literal?(node)
  case node
  in [:@int, *] | [:var_ref, [:@kw, 'true' | 'false', *]]
    true
  in [:binary, left, :"||", right]
    literal?(left) || literal?(right)
  else
    false
  end
end
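A quick way to sanity-check a predicate like this is to run it against predicates that ripper has actually parsed. The sketch below restates literal? so that it is self-contained, with a trailing splat in the @kw pattern because scanner event nodes carry their location as a third element. The predicate_of helper is hypothetical and assumes the source is a single if statement.

```ruby
require 'ripper'

# Same shape as the predicate above. The trailing splat in the @kw pattern
# accounts for the [line, column] pair at the end of scanner event nodes.
def literal?(node)
  case node
  in [:@int, *] | [:var_ref, [:@kw, 'true' | 'false', *]]
    true
  in [:binary, left, :"||", right]
    literal?(left) || literal?(right)
  else
    false
  end
end

# Hypothetical helper: parse a single `if` statement and return its predicate.
# Ripper.sexp returns [:program, [[:if, predicate, ...]]] for such input.
def predicate_of(source)
  Ripper.sexp(source)[1][0][1]
end

['if 1; end', 'if true; end', 'if foo || false; end', 'if foo; end'].each do |source|
  puts "#{source.ljust(22)} #{literal?(predicate_of(source))}"
end
```

The first three conditions are flagged and the last one is not, since a bare method call is neither an integer, a boolean keyword, nor an || expression containing one.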

Building on our previous parser, we can prepend this new module as well, and both checks will run:

parser.singleton_class.prepend(LiteralAsCondition)
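The reason several rules can hook the same event is that each handler calls super, so prepended modules form a chain that ends at the parser’s own on_if. Here is a minimal sketch of that behavior; FirstRule, SecondRule, and the seen method are hypothetical names used only to trace the call order.

```ruby
require 'ripper'

# Two hypothetical rules that both hook `on_if`. Each records its name and
# then calls super so that the next handler in the chain still runs.
module FirstRule
  def on_if(*args)
    seen << :first
    super
  end
end

module SecondRule
  def on_if(*args)
    seen << :second
    super
  end
end

class Parser < Ripper::SexpBuilderPP
  def seen
    @seen ||= []
  end
end

parser = Parser.new("if foo\n  bar\nend\n")
parser.singleton_class.prepend(FirstRule)
parser.singleton_class.prepend(SecondRule)
tree = parser.parse

p parser.seen     # the most recently prepended module runs first
p tree[1][0][0]   # the if node is still built by the underlying parser
```

Because the modules are prepended to the singleton class, only this one parser instance is affected. The Parser class itself, and any other instance of it, keeps its original ancestors.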

Linting

Now that we have the ability to match the patterns that we want to find, it’s a small step to a full-fledged linter. We can add a reporter with some nice ANSI color codes to get our addictive green dots:

class Reporter
  def report_error
    print "\e[0;31;49mE\e[0m"
  end

  def report_failure
    print "\e[0;31;49mF\e[0m"
  end

  def report_success
    print "\e[0;32;49m.\e[0m"
  end
end

We can add a runner that will use Dir.glob(pattern) to get the correct files to lint (and maybe extend it later with some ignores). And we can take advantage of the way we structured our violation checks into modules to selectively turn them on and off:

def rules_from(config)
  config.default = { 'Enabled' => true }

  Module.new do
    Rules.constants.each do |constant|
      include(Rules.const_get(constant)) if config[constant.to_s]['Enabled']
    end
  end
end

parser = Parser.new(File.read(path))
parser.singleton_class.prepend(rules_from(config))
parser.parse

In the above we build a module at runtime that includes only the rules we want enabled. Disabled rules are never mixed into the parser at all, so they add no work while parsing. (The alternative would be to keep every rule in place and have each one check, every time it is hit, whether it is disabled and return early.)
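To see the effect end to end, here is a self-contained sketch with two simplified rules and a config that turns one of them off. The Rules namespace contents, the violations method, and the config hash are assumptions for illustration. This version of rules_from also looks up settings with fetch rather than setting a default on the caller’s hash, which is a small deviation from the code above.

```ruby
require 'ripper'

# Hypothetical namespace holding one module per rule (simplified checks).
module Rules
  module AssignmentInCondition
    def on_if(predicate, *others)
      violations << 'AssignmentInCondition' if predicate[0] == :assign
      super(predicate, *others)
    end
  end

  module LiteralAsCondition
    def on_if(predicate, *others)
      violations << 'LiteralAsCondition' if predicate[0] == :@int
      super(predicate, *others)
    end
  end
end

class Parser < Ripper::SexpBuilderPP
  def violations
    @violations ||= []
  end
end

# Build an anonymous module that includes only the enabled rules.
def rules_from(config)
  Module.new do
    Rules.constants.each do |constant|
      settings = config.fetch(constant.to_s, { 'Enabled' => true })
      include(Rules.const_get(constant)) if settings['Enabled']
    end
  end
end

config = { 'LiteralAsCondition' => { 'Enabled' => false } }
source = "if foo = bar\nend\nif 2\nend\n"

parser = Parser.new(source)
parser.singleton_class.prepend(rules_from(config))
parser.parse

p parser.violations
```

With this config only the assignment is reported. The literal condition on the third line goes unnoticed because LiteralAsCondition was never included in the prepended module.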

Wrapping up

I’ve bundled the code that this post references into its own project on GitHub that you can feel free to peruse. It only has three rules in it, and it’s definitely pretty nascent, but it’s fun either way - especially because you can run it like this:

$ ruby --disable-gems bin/rblint 'path/to/files/**/*.rb'

That --disable-gems option skips loading RubyGems when the interpreter boots, which can have a big effect on startup time depending on your system and what tool you’ve used to manage your Ruby versions.
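If you want to check the difference on your own machine, a rough sketch like the following times how long a fresh interpreter takes to boot with and without the flag. The boot_time helper is hypothetical, and a single run is noisy, so treat the numbers as an indication rather than a benchmark.

```ruby
require 'rbconfig'

# Hypothetical helper: time how long a fresh interpreter takes to boot and
# run an empty script with the given command-line flags.
def boot_time(*flags)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  system(RbConfig.ruby, *flags, '-e', '')
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

with_gems = boot_time
without_gems = boot_time('--disable-gems')

puts format('default:        %.3fs', with_gems)
puts format('--disable-gems: %.3fs', without_gems)
```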

tl;dr

You can write your own linter in Ruby using just the standard library, and it’s not too much code. Matching against syntax tree expressions is really nice in Ruby 2.7 with the new pattern matching syntax. Metaprogramming is fun. Ruby!
