The Inversion of Control pattern in the test of time

I ran into a really interesting discussion on Twitter. I suggest you go over the whole thread; it is fascinating reading.

I have written DI / IoC business applications for a decade and I was heavily involved in a popular IoC container for about five years, including implementing some core features (open generic binding, which was a PITA to do). Given the scope of the topic, I didn’t want to try to squeeze my thoughts on the subject into a Twitter soundbite; hence, this post.

A couple of weeks ago I posted about how I would start a new project today. With just enough architecture to get things started, and not much more. Almost implicit in my design is the fact that the system is composable. You add functionality to the system not by modifying existing code but by adding code. That isn’t new by any means. A quick search of my blog shows a series of posts from 2012 and a system architecture from 2008. No new ground trodden here, then. So why bother writing this post?

RavenDB doesn’t use a container. This is a pretty big and non-trivial project that has no container involved. In fact, I don’t usually pull in containers any longer. For a long while, I tried to push as much complexity as possible into the container. It helped that I was part of the team building the container, so I could actually go ahead and add features to it. That allowed me to create a system that was driven by convention. As long as you followed the convention, things magically worked and everyone was productive. If you didn’t follow the convention, well, I would need to debug that. Other people on the team could figure things out, but it generally fell on me (not that I minded).

The backend for RavenDB Cloud is the first time in a while that I took part in what you can consider a business application rather than an infrastructure component. And that backend uses a container, IoC, interfaces, multiple dispatch, etc. It makes for a codebase that can adapt quickly, but it also adds complexity. In the case of the cloud backend, just to name a few core features, we have: storage, machine allocations, recovery from failure, billing and monitoring. Each one of those may have multiple implementations (each cloud does storage and deployment differently, different accounts have different plans, etc). Much of this is handled by implementing the relevant interfaces and dispatching to the right implementation based on the context of the operation.
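To make the "implement the interface, dispatch on context" idea concrete, here is a minimal sketch in Python. It is not the actual RavenDB Cloud code: `StorageProvider`, `OperationContext`, the two provider classes and the `cloud-a` / `cloud-b` keys are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class StorageProvider(Protocol):
    """Hypothetical interface; one implementation per cloud."""

    def allocate_disk(self, size_gb: int) -> str: ...


class CloudAStorage:
    def allocate_disk(self, size_gb: int) -> str:
        return f"cloud-a-volume-{size_gb}gb"


class CloudBStorage:
    def allocate_disk(self, size_gb: int) -> str:
        return f"cloud-b-disk-{size_gb}gb"


@dataclass(frozen=True)
class OperationContext:
    """Hypothetical description of the operation being performed."""

    cloud: str
    account_plan: str


# Registry keyed by the part of the context that selects an implementation.
STORAGE_BY_CLOUD: Dict[str, StorageProvider] = {
    "cloud-a": CloudAStorage(),
    "cloud-b": CloudBStorage(),
}


def resolve_storage(ctx: OperationContext) -> StorageProvider:
    """Pick the implementation that matches the operation's context."""
    try:
        return STORAGE_BY_CLOUD[ctx.cloud]
    except KeyError:
        raise LookupError(f"no storage provider for {ctx.cloud!r}") from None


def provision_disk(ctx: OperationContext, size_gb: int) -> str:
    # The caller never names a concrete provider; the context decides.
    return resolve_storage(ctx).allocate_disk(size_gb)
```

Supporting another cloud in this sketch means adding one class and one registry entry; `provision_disk` and its callers stay untouched. That is the adaptability, and it is also where the indirection comes from.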

In many ways, it works like magic. And it allows us to iterate quickly and deploy to three separate cloud providers in a short amount of time. It is also magic. Much of our team is actually infrastructure developers. That is a totally different mindset from business app development. When I saw how these developers, with their infrastructure background, worked with the cloud backend, it was very instructive. To them, it was magic, and impenetrable at first. Interestingly enough, they didn’t need to understand all that was going on to get things done. We made sure that they did, after a while, but the IoC allowed us to ignore such concerns until later (gimme a cluster, don’t worry about how it is wired to the rest of the system).

The auto-wiring is one part of what you’ll typically get from a container. There are other, equally important parts that don’t generally get as much attention. Using IoC usually means a decomposed system, which is easier to test independently. And in addition to satisfying dependencies, the container is also in charge of managing the lifetime (or scope) of instances.

Let’s talk about the decomposed system and isolated testing first, because this tends to be a high priority for many people. I’m against such systems. Not because they make testing easier (although keep that in mind, I’ll have something to say about it shortly) but because they are generally a very short slippery slope toward interface explosion. You end up with a lot of interfaces that have a single implementation. You now have a composition issue: it is hard to figure out the flow of the code because everything is dynamically composed. That leads to a bunch of problems when you read the code (you have to jump around to understand what is going on) as well as performance issues (you can’t inline methods, you have to do interface calls, etc). Out of those, the first issue is far more important, mind.

Surprisingly, given that we have decomposed the system into small pieces to be able to work with each item independently, we are now in a much worse position if we want to change something. Because the code is scattered in many different locations and is composed on the fly, if I want to make a significant change, I have to make it in many places. To give a concrete example, let’s say that I need to pass a correlation token through my system, to do distributed tracing. I have to modify pretty much all the interfaces involved to pass this token through. And that leads us to the issue I promised with the tests.

A system that is composed of independent interfaces / implementations is easy to test in isolation, because each implementation is independent and isolated from other areas of the system. The issue with such a system is that each individual component isn’t really doing much on its own. The benefit of the system comes from multiple such components being assembled and working together. So the critical functionality that you have is the composed bundle, as well as the container configuration. But to test that, you need a system test. So you might as well structure your system so that system tests are easy, fast and obvious. Here is another way to do just that.

Finally, we get to the issue of lifetime management. It is easy to ignore just how important this feature is. Usually, you have three lifetimes in your application:

  • Singleton – for the entire application.
  • Transient – get a new instance each time.
  • Scoped – get the same instance in the same scope (typically a single request).

Being able to rely on the container to manage lifetime is huge, because it is easy to mess things up. A good container will also match dependencies by their lifetimes. So if you have a singleton component, it cannot accept a transient component, since the lifetimes don’t match (but the other way around is obviously fine). There is an issue here as well. If you are injecting the dependencies, it is easy to lose track of the lifetime of your dependencies. It is easy to get into a situation where you (inadvertently, even) use a dependency to manage state between invocations and not realize that you are now relying on the lifetime of a dependency (or a dependency of a dependency).
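The three lifetimes and the lifetime-matching rule can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular container works: `Container`, `Lifetime`, `register` and `resolve` are hypothetical names, a scope is just a dict owned by the caller, and dependencies must be registered before the components that use them.

```python
from enum import IntEnum


class Lifetime(IntEnum):
    # Ordered from shortest-lived to longest-lived.
    TRANSIENT = 1
    SCOPED = 2
    SINGLETON = 3


class Container:
    def __init__(self):
        # name -> (lifetime, factory, dependency names)
        self._registrations = {}
        self._singletons = {}

    def register(self, name, lifetime, factory, depends_on=()):
        # Lifetime matching: a component may not depend on something
        # that is shorter-lived than itself.
        for dep in depends_on:
            dep_lifetime = self._registrations[dep][0]
            if dep_lifetime < lifetime:
                raise ValueError(
                    f"{name} ({lifetime.name}) cannot depend on "
                    f"{dep} ({dep_lifetime.name})"
                )
        self._registrations[name] = (lifetime, factory, tuple(depends_on))

    def resolve(self, name, scope=None):
        lifetime, factory, deps = self._registrations[name]
        if lifetime is Lifetime.SINGLETON:
            cache = self._singletons      # one instance per container
        elif lifetime is Lifetime.SCOPED:
            if scope is None:
                raise ValueError(f"{name} is scoped; pass a scope")
            cache = scope                 # one instance per scope
        else:
            cache = None                  # transient: never cached
        if cache is not None and name in cache:
            return cache[name]
        instance = factory(*(self.resolve(dep, scope) for dep in deps))
        if cache is not None:
            cache[name] = instance
        return instance
```

A request handler would create one empty dict per request and pass it as `scope`. Note that the check in `register` only catches mismatches the container is told about; a component that quietly stashes state on an injected dependency still ends up relying on that dependency’s lifetime.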

You might have noticed a theme in this post. I’m outlining a lot of problems, but no solutions. I’ll get to that in a bit, but I wanted to explain something important. Writing non-trivial software is complex. This is the nature of the beast. We can re-arrange the complexity or we can sweep it under the rug. There are good use cases for either option, but I would rather that people make this choice explicitly. What you can’t do is eliminate the complexity entirely; at best, you can tame it.

Earlier, I said that RavenDB doesn’t use a container, which is true (somewhat). But it is using inversion of control. A lot of the core classes are using constructor injection, for example. Let’s take what is probably the most important class we have, DocumentDatabase. That is the class that represents a database inside a RavenDB process. It accepts its dependencies (the configuration, the server it is running on, etc) and then is constructed. We don’t use a container here because the setup process of a database in RavenDB is complex. We first create the DocumentDatabase instance, then we have to initialize it. Initializing a database may mean running recovery, loading a lot of data from disk, etc. So we do that in an async manner. When a request comes in for a particular database, we get it, or wait until it is loaded. We will also dispose of the database if it has been idle for long enough. So in this case, we have complex (async) initialization, in which we have to deal with a lot of failure modes. We also have a lifetime scope that is based on idle time, which doesn’t fit the usual modes for a container.
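The shape of that lifecycle (construct, initialize asynchronously, get-or-wait, dispose when idle) can be sketched as follows. RavenDB itself is not written in Python and this is not its code: the `DocumentDatabase` below is a stand-in that only models the lifecycle, and `DatabaseRegistry`, `get` and `dispose_idle` are hypothetical names.

```python
import asyncio
import time


class DocumentDatabase:
    """Stand-in for the real class; only the lifecycle shape matters."""

    def __init__(self, name, configuration):
        # Constructor injection: dependencies are passed in explicitly.
        self.name = name
        self.configuration = configuration
        self.initialized = False
        self.disposed = False
        self.last_used = time.monotonic()

    async def initialize(self):
        await asyncio.sleep(0)  # placeholder for recovery / loading data
        self.initialized = True

    def dispose(self):
        self.disposed = True


class DatabaseRegistry:
    def __init__(self, configuration, max_idle_seconds):
        self._configuration = configuration
        self._max_idle = max_idle_seconds
        self._loading = {}  # name -> task that yields the loaded database

    async def _create(self, name):
        db = DocumentDatabase(name, self._configuration)
        await db.initialize()
        return db

    async def get(self, name):
        """Return the database, waiting if it is still being loaded."""
        task = self._loading.get(name)
        if task is None:
            task = asyncio.ensure_future(self._create(name))
            self._loading[name] = task
        try:
            db = await task
        except Exception:
            # Forget a failed load so that a later request can retry.
            if self._loading.get(name) is task:
                del self._loading[name]
            raise
        db.last_used = time.monotonic()
        return db

    def dispose_idle(self, now=None):
        """Dispose loaded databases that have been idle for too long."""
        now = time.monotonic() if now is None else now
        disposed = []
        for name, task in list(self._loading.items()):
            if not task.done() or task.cancelled() or task.exception():
                continue  # still loading, or the load did not succeed
            db = task.result()
            if now - db.last_used >= self._max_idle:
                del self._loading[name]
                db.dispose()
                disposed.append(name)
        return disposed
```

Concurrent callers of `get` for the same name share a single load, and a database that was disposed for idleness is simply created again on the next request. Neither behavior maps onto the singleton / transient / scoped lifetimes, which is the point of doing this by hand.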

Because we manually control how we create the database instance, it is explicit what its dependencies, lifetime and behavior are. We have quite a few examples of such classes. For example, the database instance holds DocumentStorage, AttachmentStorage, etc. It is important to note that the number is finite and relatively small. It allows us to reason about the interactions in the database in a static and predictable manner.

Remember when I said that we don’t use a container? That is almost true. There is one location where I wrote our own mini container. One thing that RavenDB has a lot of is Endpoints. An endpoint is the method that handles a particular HTTP request. At last count we had over 300 of them. I don’t have the time / willingness to wire all of these manually. That would put an undue burden on developing a new endpoint. And that is the key observation. For stuff that doesn’t change very often (the structure of the database), we do things manually. For the things that we add a lot of (endpoints), we make it as smooth as possible. Adding a new endpoint is adding a class that inherits from a known base class, and that is pretty much it.

Our routing infrastructure will gather all of the implementations, wire up the routing and, when a request comes in, create an instance of the class in question, inject the relevant context into it (what database it is running on, the current request, etc) and then execute it. Just like a container would, in fact, because for all intents and purposes, it is one. What we have done is optimize one aspect, which we deal with often, while manually dealing with the stuff that rarely changes. That means that if I do need to make a change there, the level of magic involved is greatly reduced. And in RavenDB in particular, we can and have measured the difference in performance between running things through any abstraction layer and doing things directly. To the point where, in certain parts of our codebase, an interface method invocation is forbidden because the cost would be too high.
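Here is a minimal sketch of such a mini container, in Python rather than RavenDB’s own language. It is an illustration of the idea only, not RavenDB’s routing code: `Endpoint`, `RequestContext`, `dispatch`, the `route` attribute and the `/stats` endpoint are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestContext:
    """Hypothetical context injected into every endpoint instance."""

    database: str
    request: dict


class Endpoint:
    """Known base class; defining a subclass is all the wiring needed."""

    route = None   # subclasses set this to (method, path)
    registry = {}  # (method, path) -> endpoint class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.route is None:
            raise TypeError(f"{cls.__name__} must declare a route")
        method, path = cls.route
        key = (method.upper(), path)
        if key in Endpoint.registry:
            raise ValueError(f"duplicate route {key}")
        Endpoint.registry[key] = cls  # gathered at class definition time

    def __init__(self, context):
        self.context = context

    def execute(self):
        raise NotImplementedError


def dispatch(method, path, context):
    """Create a fresh endpoint instance per request and run it."""
    endpoint_cls = Endpoint.registry.get((method.upper(), path))
    if endpoint_cls is None:
        return 404, "no such endpoint"
    return 200, endpoint_cls(context).execute()


# Adding an endpoint: one class, no other wiring.
class GetStats(Endpoint):
    route = ("GET", "/stats")

    def execute(self):
        return f"stats for {self.context.database}"
```

Calling `dispatch("GET", "/stats", RequestContext("orders", {}))` finds `GetStats`, builds an instance with the context and executes it. The only convention a new endpoint has to follow is "inherit from `Endpoint` and declare a route"; everything else in the system stays explicit.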

There is another aspect to this architecture: the easiest thing to do in our code is to add a new endpoint. That being the easiest thing, it is usually what will happen. This means that we’re far more likely to follow the open/closed principle. It also leads to most of our code looking fairly similar in shape. That makes maintenance, code reviews and the act of writing new code a lot simpler. I don’t have to make decisions about structure, I just have to let the code flow.