Introducing: RSpec Smells and Where to Find Them

Intro

A few of you who know me will know I've been threatening to write a book for something like a year and a half now. Life happens, things get in the way, I changed jobs. However! Ground has been broken. This is a quick post representing an extraction of maybe a third of a chapter of the book. My working title is "RSpec Smells and Where to Find Them". Whatever the final title, the subtitle will likely remain "helping you to love your RSpec suite a little more".

On the advice of the ever wonderful Sandi Metz, I've decided to start by just writing pieces on this blog. Eventually, I'll assemble them all and make them into a more coherent arc. Please let me know if you like what you find here. Please let me know if you disagree with any of the content, or have feedback. You can always reach me on twitter: @samphippen.

A quick but very important set of acknowledgements:

  • Chandra Carney served as the primary editor for this post. It would not make as much sense without her wonderful efforts.
  • Sarah Mei provided excellent feedback on the content of this post with a particular view to assumed knowledge. This post is definitely more beginner friendly for the help Sarah provided.

What is book?

As to the content: the book is a compendium of "testing smells" as they apply to RSpec. I'll explain what I mean by "testing smell" a little further down. The text will contain some frontmatter to help beginners get more familiar with the features of RSpec that are discussed. I haven't written that frontmatter here, so if you're new and you're not yet totally comfortable with ideas like stubbing, mocking, doubles, and spies, you may want to watch the first part of this talk (one of mine). It contains a quick explanation of the differences between mocks and stubs, and should serve as a useful primer for the material discussed within.
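
If you want a very rough feel for the distinction before watching, here's a hand-rolled sketch in plain Ruby (the gateway classes and checkout method are invented for illustration, and this is not RSpec's actual machinery): a stub just answers questions with canned values, while a mock also records calls so a test can verify they happened.

```ruby
# A stub: returns canned answers, records nothing.
class StubGateway
  def charge(_amount)
    :ok
  end
end

# A mock: returns canned answers AND records calls for later verification.
class MockGateway
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charge(amount)
    @charges << amount
    :ok
  end
end

def checkout(gateway, amount)
  gateway.charge(amount)
end

checkout(StubGateway.new, 100)  # => :ok - we only care about the return value

mock = MockGateway.new
checkout(mock, 100)
mock.charges                    # => [100] - we can verify the call was made
```

In RSpec terms, allow(...) gets you the stub behaviour, and expect(...).to receive(...) gets you the mock behaviour.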

One other piece of frontmatter: many of the ideas contained within run parallel to the ideas of others. At minimum I'd recommend also checking out Justin Searls' repo on test smells. Justin and I have discussed ideas herein at length. I've also had conversations around this stuff with Sarah Mei, Noel Rappin, Adam Cuppy and others. My ideas definitely don't stand on their own. I think that's what makes stuff like this great. These ideas represent a mix of others' thoughts and my own and I really hope that you enjoy what I've written.

The good bit

Why do we test?

Before I get in too far, I'd like to discuss motivations for testing - what our goals are when we write tests with RSpec. I think this is important because I'll be referring to core testing concepts during the discussion of a number of the smells. As such, you can use this as a chapter to understand the basis of my philosophy of why I think testing is extremely important.

The first and perhaps most evident reason we write test code: verifying the behaviour of our applications. Were I to survey developers at large, I suspect this is the primary reason they would say they test their code.

Ruby is more or less as dynamic as it is possible for a programming language to be. The very nature of dynamic programming languages means that they are difficult to reason about in a static context[1]. That is, without actually running the programs, it is difficult to determine any of their properties. So, in Ruby, we test. We test more than maybe any other programming community out there. The reason for this is that we need dynamic feedback on whether or not our programs behave as we intend. Tests allow us to create dynamic contexts we control. By running tests we can observe the behaviour of small parts of our programs, dynamically, the only way we can in Ruby.
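
As a tiny, made-up illustration of that dynamism (the Invoice class is invented for this post): Ruby methods can be manufactured while the program runs, so reading the source alone can't even tell you which messages an object responds to.

```ruby
class Invoice
  # These predicate methods never appear as "def ..." in the source;
  # they are defined when the class body executes.
  %w[draft sent paid].each do |state|
    define_method("#{state}?") { @state == state }
  end

  def initialize(state)
    @state = state
  end
end

invoice = Invoice.new("sent")
invoice.sent?                 # => true
invoice.paid?                 # => false
invoice.respond_to?(:draft?)  # => true, though "def draft?" exists nowhere
```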

Behaviour verification is perhaps the first reason anyone will tell you that they write tests. However, there is another reason that testing is extremely useful, and a lot of experts might even list it as the most important thing testing gives us. The tests that you write are a secondary artefact of your system. Tests are a lens through which you can observe the design of your code.

When you see "good tests" you gain a high degree of confidence that the system under test is well designed. The inverse is also true: if you see a test that is complex, or otherwise confusing, you can be almost certain that the code under test has problems with its design.

The reasons for this are subtle, yet very important. When writing tests against our Ruby/Rails applications, we build worlds in which our tests must run. Often those worlds are partially fake. They do not represent the reality of production in which our application runs. The patterns of creation in the fake test worlds are a comment on how easy or hard it is to work with a small piece of code. In legacy systems we often see that individual setup for tests is very complicated. That's fine - here our tests tell us exactly in what ways our system might be problematic. They also indicate ways we can solve those problems.

This "design feedback" reason is also the exact reason we teach a lot of beginner and intermediate level developers to practise "TDD". The process of writing a small, focused test ahead of every line of code puts that design feedback inline, in the moment-to-moment of writing code. When newer developers start doing TDD for the first time they often find it difficult. One reason for this is that when TDD gets hard, if you're not used to it, it's hard to differentiate whether you've made a mistake or whether your system has a problematic design.

If you're new: generally, it's the latter. Try breaking apart the piece of code on which you're working. Doing TDD on smaller pieces is generally easier than doing TDD on larger pieces. #protip.

When it comes to TDD, or design feedback in general, I think RSpec is actually one of the harder-to-use testing frameworks. RSpec is a very flexible testing tool: it offers many different constructs for whipping up fake testing worlds. The use of these different constructs has wildly varying implications for the architecture of the system under test. However, all of these constructs are presented through similar, easy-to-use interfaces.

Through the power and ease of these constructs, RSpec dampens much of the design feedback that one typically gets from tests. The flexibility that RSpec offers is definitely a tradeoff. It means that legacy systems can be tested with relatively little friction. It means that new systems can be developed with a flexible pattern of system architecture.

However, one huge downside is that it means it's quite easy to write a test that actively hurts the health of your test suite. It's easy to do this without realising that the thing that you've done might be considered a mistake in the hands of an expert.

This is my strongest motivation for writing this book. I think RSpec is a very powerful tool, one that makes it easy to hurt yourself without realising you've done so. I want to help readers develop an "RSpec Sense": to understand what makes a good test, and what makes a problematic one. I want you to understand that good tests can lead to vastly improved system architecture. I want you to love your test suite a little more.

Code smells

The term "code smell" comes from a chapter of Martin Fowler's refactoring book, co-authored by Fowler and Kent Beck. A "code smell", as they introduce it, is a name given to a bigger idea: a pattern of common potential technical debt found in many applications. Fowler and Beck also provide the reader with a curative refactoring for each code smell they introduce. In this way, code smells assembled together and understood become a vocabulary to help us make our code better.

Many developers use the term "code smell" casually to refer to problems they see in code. More often than not, we use "x is a code smell" to describe our opinion about code that we don't like. This use of the term does not fit with the original meaning. In the Code Smells chapter of the refactoring book, Beck and Fowler lay out 22 code smells, each with a name and an exact meaning. Every code smell has:

  • A precise means of identification
  • An explanation of the underlying cause
  • Most importantly: a refactoring recipe. A literal, mechanical set of instructions for refactoring the code smell away. An understandable mechanism to help you get out of the very specific problem in which you find yourself.

Like anything else in programming, code smells are not absolute. Beck and Fowler themselves advise developers not to treat them as an objective measure of bad code. Instead, a code smell is a place where there might be a problem. The refactoring book makes it crystal clear that opinion and nuance still matter. They recommend that you use your best judgement as a developer. Simply going in and identifying every code smell in your code base is a bad idea. Removing every single one: even worse.

Testing smells

The term "testing smell" is not a new one. With a quick Google you'll find uses of the term going all the way back to 2006. Testing smells, just like code smells, give a common name to potential problems in our tests. If we agree that the problem is real, they give us an exact way to get out of the problem. There is value in common vocabulary when writing software. My aim here is to help you build a vocabulary around testing - to strengthen your ideas so that you and your team have a shared way to talk about your testing methodology.

For the book, I'll be treating the testing smells I catalogue in a very specific way. Each one will have a number of cleanly laid out points:

  1. The name of the smell
  2. A brief description
  3. A precise method for identification
  4. A description of one (or more) likely underlying problems indicated by the smell
  5. A recipe for refactoring

Along the way there will be plenty of code samples. We'll discuss the problems, and I'll make suggestions for adjustments. I'll try to make my code samples as representative of the real world as I can, but that won't always be possible. With each example I'll do my best to explain the surrounding context, and I hope that helps you understand why a certain smell might be a real issue.

All of the testing smells discussed within will certainly apply to RSpec suites. Some of the points are generic to other testing frameworks, but covering a global view of writing good tests is not my goal here. If your team works actively on a large Rails app, and you want to level up your tests, this is definitely for you.

The point is this: in the book, each testing smell will show a common usage of RSpec. Each one is something encountered in the wild, commonly within Rails and Ruby applications. Each one might indicate a problem with either your tests or your production code. I see tests as a secondary lens to understand how an application is structured. Understanding why a problem in test is present can often be a useful way of identifying a problem in your real code.

As a note: if you work on a legacy Rails app, this isn't a judgement on you or your team. Instead, this is meant to spark a discussion with you and your peers about how you might be able to improve, and love your test suite a little more.

Smell 1: Stubject

In this section I'm very fast and loose with the terms "mock" and "stub", but they can be used essentially interchangeably for the discussion that follows. Testing zealots: I acknowledge they mean different things, but both cause the issue described below.

Brief description

The first testing smell I'd like to discuss is what I'm going to call the "stubject" smell[2]. It refers to any test in which the test subject (system under test) is stubbed or mocked. This smell is also known as Contaminated Test Subject in td/unrealistic/cts[3], and "homeopathic testing" more generally.

Method of identification

In RSpec it is idiomatic (though by no means universal) to call out the subject of any individual test with the subject method. The subject method can also take a name (e.g.: subject(:shopping_cart) { ... }). Any calls to expect(subject).to receive(...) or allow(subject).to receive(...) are a surefire indication this smell is present. If you use a named subject, look for calls to expect(...).to receive(...) and allow(...).to receive(...) against the subject's name, checking that the name hasn't been redefined elsewhere in your test.

Let's look at an example (some setup omitted for brevity):

class Cart  
  def initialize(order, coupon)
    @order = order
    @coupon = coupon
  end

  def discount(item)
    line_items.index(item)
  end

  private

  attr_reader :order, :coupon

  def line_items
    order
    .line_items
    .includes(:products)
    .select { |p| coupon.products.include?(p) }
  end
end

RSpec.describe Cart do  
  subject { Cart.new(Order.new, Product.new) }
  let(:line_item_1) { double(:line_item_1) }
  let(:line_item_2) { double(:line_item_2) }

  describe "#discount" do
    before do
      allow(subject).to receive(:line_items).and_return([line_item_1, line_item_2])
    end

    it "doesn't discount anything from the first line item" do
      expect(subject.discount(line_item_1)).to eq(0)
    end

    it "discounts 1 from the second line item" do
      expect(subject.discount(line_item_2)).to eq(1)
    end
  end
end  

In this example, we see a clear indication that the stubject smell is present. Looking in the before block we have a literal allow(subject).to receive(...), a direct indicator of this problem (and something for which you can grep as a fast means of identification across a project).

Underlying problem

Let's look at our above example. What's a likely reason that this test came into being in the first place?

As I mentioned before: our tests serve as a reflection of our design. If writing a test that focuses on an individual object like this is difficult, it typically indicates that the pattern of dependencies in that object is problematic.

How can we set up a fake testing world for this object? The primary method that touches an outside dependency is the #line_items method. This means it's this method we'll have to satisfy in our fake testing world. In order to test this object we're presented with three options:

  1. Do the necessary setup in the database to satisfy this scope
  2. Stub the scope inside #line_items
  3. Stub the #line_items method itself.

The first of these solutions is likely complicated. We'd have to touch test factories, work out what properties our ActiveRecord models need and so on. You've also probably heard the idea that one should avoid hitting the database in test whenever possible. I don't always agree with that, but for now we discount this first possibility.

The second of these options, stub the scope inside #line_items, is also problematic. To explore why, let's bring the code sample in again:

def line_items
  order
  .line_items
  .includes(:products)
  .select { |p| coupon.products.include?(p) }
end

In order to stub everything this method does, we'd need to nest stubs inside order, line_items, includes, and possibly select (we could have includes return an array and expect select on array to work correctly). That sounds like a lot of work. Most developers would see that as "design feedback" that doing that nested stubbing is problematic - they'd be right. Nesting stubs inside other stubs is indeed a testing smell. I'll be covering it in another post.
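
To make the pain concrete, here is roughly what satisfying that chain by hand looks like, with plain Ruby singleton methods standing in for RSpec doubles so that every link in the chain is explicit (all the fake_* names are invented for this sketch):

```ruby
class Cart
  def initialize(order, coupon)
    @order = order
    @coupon = coupon
  end

  def discount(item)
    line_items.index(item)
  end

  private

  attr_reader :order, :coupon

  def line_items
    order.line_items.includes(:products).select { |p| coupon.products.include?(p) }
  end
end

item_1 = :line_item_1
item_2 = :line_item_2

# One fake per link in the chain: order -> line_items -> includes -> array.
fake_relation = Object.new
fake_relation.define_singleton_method(:includes) { |*_| [item_1, item_2] }

fake_order = Object.new
fake_order.define_singleton_method(:line_items) { fake_relation }

fake_coupon = Object.new
fake_coupon.define_singleton_method(:products) { [item_1, item_2] }

cart = Cart.new(fake_order, fake_coupon)
cart.discount(item_1)  # => 0
cart.discount(item_2)  # => 1
```

Three separate fakes just to reach one method - most developers would rightly read that as design feedback.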

So: the easiest option, the option that RSpec pushes us towards, is stubbing the object under test - the very smell we're discussing. The stubject smell can occur in codebases both new and old. It can occur even when tests are leading the design. The underlying reason is that it's just so easy to stub the object under test in RSpec. RSpec doesn't differentiate between stubbing objects that are our test subject and those that aren't.

Let's look at why this is problematic in the context of our example. Consider this alternate implementation of the Cart object:

class Cart  
  def initialize(order, coupon)
    @order = order
    @coupon = coupon
  end

  def discount(item)
    line_items.index(item)
  end

  private

  attr_reader :order, :coupon

  def line_items
    []
  end
end  

What'd we change? We changed the implementation of #line_items to always return an empty array. Does this pass the tests we have? Yes. On the system from which this example is lifted, all the tests passed. Why is that? Well, our test stubs the #line_items method. It takes the implementation that is provided in our production code and throws it away.

What stubbing is, at the most fundamental level, is modifying the existing implementation of an object to be easier to reason about. That modification means that we aren't testing the actual class of that object as it will go into production. Instead, when we stub an object, we're creating a new class (literally) that only exists in the testing environment. The stub throws away the original method implementation, replacing it with an RSpec one. This is fine for objects that we aren't directly testing. Those objects are often much more complicated than we need to test our code.
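
You can see this "new class" mechanic with plain Ruby and no RSpec at all. Defining a method on one object's singleton class (which is broadly what a partial double does under the hood - the precise mechanics are RSpec internals) shadows the production implementation for that object only:

```ruby
class Cart
  def line_items
    [:real_item]
  end
end

cart = Cart.new

# Shadow the real implementation on this one object, as stubbing does.
def cart.line_items
  [:fake_item]
end

cart.line_items                 # => [:fake_item] - production code never runs
cart.method(:line_items).owner  # => cart's singleton class, not Cart
Cart.new.line_items             # => [:real_item] - other instances unaffected
```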

For objects we are directly testing, however, it's a different story. If you stub the object under test, you destroy the object that is the focus of your test. You're testing a phantom other object, one that is similar to, but not the same as, the object you think you're testing. When the stubject smell is present you have false confidence in the system that you're testing. You don't get behaviour verification, which is one of the primary goals of testing.

Most commonly, the stubject smell occurs where we have a pattern of complex dependency in an object we're trying to test directly. We don't want to stub that complex dependency because doing so would be difficult. Instead, we stub the object under test. When we talk about tests giving design feedback, this is exactly what we mean. If stubbing that dependency is hard, you should think about why that is the case, and probably apply the refactoring steps provided below.

While stubbing the object under test is tempting, I strongly advise you to almost never do it. Doing so gives weak tests, and makes refactoring the object under test much harder. I rank stubject as one of the worst testing smells that we'll discuss anywhere in the book. As such, while not all smells are definitely actual problems, this one almost always is.

Refactoring steps

The primary cause of the stubject smell is typically incorrect boundaries between objects, otherwise known as mixed levels of abstraction. To fix the problem, we can apply one of the following refactorings:

  1. If the method being stubbed is a single line of code such as a series of complex calls on a collaborator object (e.g. ActiveRecord scope)
    1. Move it on to the collaborator object using the Hide Delegate refactoring
    2. Stub the call to the new method on the collaborator object
  2. If the method being stubbed is made of multiple lines of code
    1. Perform an Extract Method Object refactoring on the method
    2. Stub the method object

Note: when performing the Extract Method Object refactoring it may be the case that you need to look at the rest of the code in the object under test. It may be the case that instead of Extract Method Object you need to split the object along some abstraction boundary of your choosing.

Let's look at an example of each of our two refactoring paths. Re-introducing our Cart class:

class Order < ActiveRecord::Base
end

class Cart  
  def initialize(order, coupon)
    @order = order
    @coupon = coupon
  end

  def discount(item)
    line_items.index(item)
  end

  private

  attr_reader :order, :coupon

  def line_items
    order
    .line_items
    .includes(:products)
    .select { |p| coupon.products.include?(p) }
  end
end  

We move the scope on to the Order class:

class Order < ActiveRecord::Base  
  def line_items_for_coupon(coupon)
    line_items
      .includes(:products)
      .select { |p| coupon.products.include?(p) }
  end
end

class Cart
  def initialize(order, coupon)
    @order = order
    @coupon = coupon
  end

  def discount(item)
    line_items.index(item)
  end

  private

  attr_reader :order, :coupon

  def line_items
    order.line_items_for_coupon(coupon)
  end
end

and the implementation of Cart#line_items becomes order.line_items_for_coupon(coupon).

In our test, we now stub the order collaborator, instead of stubbing the subject directly:

RSpec.describe Cart do
  subject { Cart.new(order, Product.new) }
  let(:order) { Order.new }
  let(:line_item_1) { double(:line_item_1) }
  let(:line_item_2) { double(:line_item_2) }

  describe "#discount" do
    before do
      allow(order).to receive(:line_items_for_coupon).and_return([line_item_1, line_item_2])
    end

    it "doesn't discount anything from the first line item" do
      expect(subject.discount(line_item_1)).to eq(0)
    end

    it "discounts 1 from the second line item" do
      expect(subject.discount(line_item_2)).to eq(1)
    end
  end
end

The difference between this and the original test is subtle. The important thing to note is that instead of calling allow(subject), we're calling allow(order). This fundamentally changes the boundaries of the test. We now know that the cart object is being tested in its entirety, and that its methods invoke each other correctly. We still don't have tests that show the whole system works, but after this refactoring we have evidence that the cart object in itself functions, at least with stubbed dependencies.

We have alleviated the stubject smell here. It's also worth noting: we had to change our production code to do so. That's very common with making test improvements. Your tests are a lens on the architecture of your system. Understanding pain in your tests often helps you improve your production code.

As a general tip: seeing ActiveRecord scope calls outside of ActiveRecord objects/classes indicates a problem with system design. Stubject is one of the ways you can sniff out that problem and fix it.

Above I advised choosing between one of two different refactorings based on the situation. The cart example is good for the first one, but we need a new example to discuss the second. The following example is somewhat synthetic, but is adapted from an application on which I have actually worked.

class ShipmentDispatcher  
  def initialize(user)
    @user = user
  end

  def dispatch
    user.new_shipments.each do |s|
      ThirdPartyShipmentNotifier.new(s).perform_async
    end

    guard
  end

  private

  attr_reader :user

  def guard
    while true do
      yet_to_dispatch = Shipments.recent.where(:shipped => false)
      break if yet_to_dispatch.empty?
      user.update(:waiting_shipments => yet_to_dispatch)
      sleep 5
    end

    user.update(:waiting_shipments => [])

    user.reload
  end
end  

Let's quickly talk about what this object does: when a user has new shipments, they get marked as active. We also need to dispatch these shipments to a third party API, and immediately confirm back to the user that the shipments have gone. The ThirdPartyShipmentNotifier object is a background worker - it sends calls to the API in the background. The job of the #guard method is to wait until all the shipments have been confirmed; it then returns the model to the caller of #dispatch.

In our test for this object, #guard gets stubbed. The messiness of having to wait 5 seconds to confirm that state has changed was problematic for the team working on this code. Calls like allow(subject).to receive(:guard).and_return(true) and allow(subject).to receive(:guard).and_return(false) are prevalent throughout the test file.
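
Here's a stripped-down sketch of why that stub is hollow (the class is cut to the bare interaction and the collaborators are invented stand-ins). Once #guard is shadowed on the instance - the plain Ruby equivalent of allow(subject).to receive(:guard) - the test will pass no matter how broken the real #guard is:

```ruby
class ShipmentDispatcher
  def initialize(user)
    @user = user
  end

  def dispatch
    user.new_shipments.each do |_shipment|
      # stand-in for ThirdPartyShipmentNotifier.new(s).perform_async
    end
    guard
  end

  private

  attr_reader :user

  def guard
    raise "imagine the slow polling loop here, now hopelessly broken"
  end
end

fake_user = Object.new
fake_user.define_singleton_method(:new_shipments) { [] }

dispatcher = ShipmentDispatcher.new(fake_user)

# The stubject move: shadow #guard on the instance under test.
def dispatcher.guard
  true
end

dispatcher.dispatch  # => true - the broken #guard is never exercised
```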

This is another example of the stubject smell: we can completely break the #guard method and still pass all of our tests. This method is also complex, calling several different methods on outside dependencies. It seems like a prime candidate for our second refactoring step, but let's first look at the #guard method a little more closely:

  def guard
    while true do
      yet_to_dispatch = Shipments.recent.where(:shipped => false)
      break if yet_to_dispatch.empty?
      user.update(:waiting_shipments => yet_to_dispatch)
      sleep 5
    end

    user.update(:waiting_shipments => [])

    user.reload
  end

We could just directly follow the refactoring recipe of extracting a method object, but I don't think that's a great idea. The reason? This method is doing two separate things:

  1. Waiting for all the shipments to be shipped
  2. Ensuring the model record is up to date

The concern of waiting for shipments is the one that contains all of the complexity. Ensuring the record is up to date is done via a simple #reload call. Instead of directly extracting the whole method to a method object, I'm going to refactor all but the last line. This does mean we'll have to stub multiple calls, but I think this will end up telling a better story.

This is an example of deviating from the recipe. That's fine. With all of the smells contained within, the suggested steps are just that, suggested. Understanding the cause of the problems, and discussing fixes with your team is going to yield better results than blindly following what I've written here.

So, refactoring:

class DispatchAwaiter  
  def initialize(user)
    @user = user
  end

  def call
    loop do
      pending = Shipments.recent.where(:shipped => false)
      break if pending.empty?
      user.update(:waiting_shipments => pending)
      sleep 5
    end

    user.update(:waiting_shipments => [])
  end

  private

  attr_reader :user
end  

And the original implementation of guard:

def guard  
  DispatchAwaiter.new(user).call
  user.reload
end  

Now, this is the point where our suggested refactoring terminates. We've fixed the stubject problem, and we could just stub on DispatchAwaiter and user. However, both of these lines of code operate directly on the user object. As such, I'd make one more move here: I'd move both lines of code on to the user. We'd end up with something like this:

class User < ActiveRecord::Base  
  def guard
    DispatchAwaiter.new(self).call
    reload
  end
end  

And the original guard in ShipmentDispatcher becomes:

def guard  
  user.guard
end  

Now, we can replace our allow(subject).to receive(:guard) calls with allow(user).to receive(:guard).and_return(...) calls. Again, we've moved the code that was hard to test somewhere else. This allows us to work with the object under test easily. We can worry about the more complex code elsewhere. Usually, we struggle when building an isolated test around a piece of code with a complex dependency. This is exactly where we find occurrences of the stubject smell. Fortunately, it's usually easy to cover such code with integration tests and then refactor the stubject smell away.

Summary

So, let's quickly summarise. The stubject smell refers to a test which stubs or mocks a method on the object under test. This pattern is problematic because it means that the implementation being tested is one that will never see production. A test with the stubject smell indicates a design problem and typically gives us little to no coverage value. Typically the smell occurs because the object under test has a complex method touching multiple dependencies which are individually difficult to stub or mock.

There are two solutions provided above:

  1. Extract a delegate method and stub that
  2. Extract a method object and stub that

It's worth noting that neither of these solutions is perfect. You end up in a place where you still have other testing smells, ones I have yet to write up for you. That's fine. One of the themes of the book will be moving from the most problematic testing smells to less problematic ones. Perfect is the enemy of good, and knowing where the 80/20 of testing value lies for you is really important.

Now, as a final warning, please don't go away and remove all the stubjects from your application. Some of them will be there for genuinely meaningful reasons. Take a look at each one, have a think, and talk to your team. If you feel good about it, go ahead and do a refactor, and please do let me know how you get on.

Thanks for reading ;)

Footnotes

  1. As opposed to statically typed languages, where many properties of the program can be reasoned about before the program is running. There's a whole book in this idea, but indeed that is for another time. ^
  2. The best part of writing this book is getting to come up with lots of silly names. ^
  3. I will be using this syntax to refer to other people's resources where I find them instructive. For this post, mostly it'll be examples from Justin (I'm not stealing your ideas Justin, even though you've basically also come up with everything in here, I love you 💚). ^