Oren Eini, CEO of RavenDB, a NoSQL Open Source Document Database

time to read 3 min | 452 words

Fowler defines Active Record as:

An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.

Note that I am talking about the Active Record pattern and not a particular Active Record implementation.

A typical Active Record class will look like this:

[Image: a typical Active Record class, with both persistence methods and domain logic on the same entity]
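A minimal sketch of what such a class tends to look like (the Person class and its members are illustrative, not taken from any particular Active Record framework):

using System;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime Birthday { get; set; }

    // Domain logic lives on the entity...
    public bool IsAdult()
    {
        return Birthday <= DateTime.Today.AddYears(-18);
    }

    // ...and so does persistence, which is where the SRP violation comes from.
    public static Person FindBy(string column, object value)
    {
        /* SELECT from the People table and map the row */
        return null;
    }

    public void Save()
    {
        /* INSERT or UPDATE this row */
    }
}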

By design, AR entities suffer from one major problem: they violate SRP, since the class is now responsible both for its own persistence and for the business operations that relate to it. But that is an acceptable decision in many cases, because it leads to code that is much simpler to understand.

In most cases, that is.

The problem with AR is that it also treats each individual entity independently. In other words, its usage encourages code like this:

var person = Person.FindBy("Id", 1);
person.Name = "Oren";
person.Save();

This is a very procedural way of going about things, but more importantly, it is a way that focuses on work done on a single entity.

When I compare the abilities of the Active Record style to the abilities of a modern ORM, there is really no contest. Things like automatic change tracking or batching writes to the database are part of the package with an ORM.
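As a rough illustration of what that buys you, here is a minimal sketch using NHibernate; it assumes a configured sessionFactory (ISessionFactory) and a mapped Person entity:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var person = session.Get<Person>(1);
    person.Name = "Oren"; // no explicit Save() call needed

    var other = session.Get<Person>(2);
    other.Name = "Rahien";

    // The session tracked both modifications and can batch the resulting
    // UPDATE statements when the transaction commits.
    tx.Commit();
}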

The sort of applications we write today aren’t touching a single row in the database; we tend to work with entities (especially root aggregates) and with operations that can touch many rows. Writing that sort of application using Active Record leads to complications, precisely because the pattern is meant to simplify data access. The problem is that our data access needs aren’t so simple anymore.

Active Record can still be useful for very simple scenarios, but I think that a major driver behind it was that it was so easy to implement compared to a more full-featured solution. Since we don’t need to implement a full-featured solution ourselves (it comes for free with your ORM of choice), that isn’t really a factor anymore. Using a full-blown ORM is likely to be easier and simpler.

time to read 4 min | 669 words

The most maintainable codebase that I worked with grew at a rate of about ~10 KLoC per month, every month. There wasn’t a large team there; it ranged from 3 to 6 people. This is the project that I think about whenever I have to talk about good codebases.

It is a WebForms project (under protest, but it is).

What makes it maintainable? Not the WebForms part, which shouldn’t come as a surprise. What makes it maintainable is that it is the first project where I applied the notion that large swaths of simple code are better than a smaller codebase with higher complexity. You can see examples of this sort of architecture in the Alexandria and Effectus applications.

Note: I am measuring KLoC here mostly because it is a number that I can measure. As I am saying in this post, KLoC has very little to do with maintainability.

The problem can be expressed well using graphs. This is a representation of the complexity of the application as it grows; lower maintainability cost is better.

[Chart: maintainability cost as the project grows, for the simpler-but-larger codebase versus the smaller-but-more-complex one]

The problem with the smaller and more complex code base is that the complexity tends to explode very quickly. With the larger code base, you end up more or less on the same plane.

But the above chart is somewhat misleading, because it makes the hidden assumption that in both code styles you’ll have the same amount of code, which is obviously false. Here is what is probably a more accurate representation of how much code is written in each style per number of features.

 

[Chart: amount of code written per number of features, for each style]

This doesn’t look right, not when we compare it to the chart above. Off the top of my head, I would expect the second chart to be the mirror image of the first one. But it isn’t. And the reason it isn’t is that each feature you write still costs you some amount of code, and the rate of growth per feature added is pretty constant either way.

Putting the two charts together, you can see that even code styles that favor less code through more complex solutions still grow, and that as they grow, they become less maintainable.

So far I have been very vague when I was talking about “complex” and “simple”. In part, this is because those are somewhat hard to define. There would be people who would claim that ORM leads to complex codebases, but I would obviously disagree.

For my part, I also strongly advocate having a strong core of infrastructure that provides services for the rest of the application, and that tends to be a complex piece of coding. But it is also something that is fixed: once the infrastructure is written, it tends to be static, and that saves you from the fate outlined above.

When I look at a piece of code, I do the usual evaluations (nesting, conditionals, cyclomatic complexity, etc.), but I also look at how often the developers reached for a hammer and beat everything around them into submission. There is a lot of gut feeling here. But I do have one objective metric to tell you whether a piece of code fits either style.

In the fight between Don’t Repeat Yourself and the Single Responsibility Principle, SRP wins, every single time.

Any time that I have seen code that did double (or triple, or dozen-fold) duty, it led to the sort of “we have to write less code” style of coding that ended up in a quagmire.
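A contrived sketch of what I mean (the class names are illustrative): one “reusable” class doing double duty, versus several simple, single-purpose classes.

public class Order { }
public class Invoice { }
public class Shipment { }

// The DRY-at-all-costs version: one handler, lots of flags, and every caller
// pays for every other caller's requirements.
public class EntityProcessor
{
    public void Process(object entity, bool validate, bool audit, bool notify)
    {
        if (validate) { /* validation that only orders need */ }
        if (audit) { /* auditing that only invoices need */ }
        if (notify) { /* notifications that only shipments need */ }
    }
}

// The SRP version: more lines of code overall, but each class does one obvious
// thing and can change without dragging the others along.
public class OrderValidator { public void Validate(Order order) { /* ... */ } }
public class InvoiceAuditor { public void Audit(Invoice invoice) { /* ... */ } }
public class ShipmentNotifier { public void Notify(Shipment shipment) { /* ... */ } }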

time to read 2 min | 262 words

Just to note, you’ll probably read this post about a month after the change was actually committed.

I spent the day working on a very simple task: reducing the number of writes that RavenDB makes when we perform a PUT operation. I managed to remove one write operation from the process, but it took a lot of work.

I thought that I might show you what removing a single write operation means, so I built a simple test harness to give me consistent numbers (in the source, look for Raven.Performance).

Please note that the perf numbers are for vanilla RavenDB, with the default configuration, running in debug mode. We can do better than that, but what I am interested in is not absolute numbers, but the change in those numbers.
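The actual harness is in the source (Raven.Performance); what follows is only a rough sketch of that kind of measurement, with the URL, namespaces, and document shape being assumptions rather than the harness’ actual values:

using System;
using System.Diagnostics;
using Raven.Client.Document;

public class TestDoc
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Inside the harness:
var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize();
var watch = Stopwatch.StartNew();

for (var i = 0; i < 5163; i++)
{
    using (var session = store.OpenSession())
    {
        session.Store(new TestDoc { Id = "docs/" + i, Name = "doc #" + i });
        session.SaveChanges(); // one PUT operation per document
    }
}

watch.Stop();
Console.WriteLine("Wrote 5,163 documents in {0:N0}ms", watch.ElapsedMilliseconds);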

Here are the results for build 124, before the change:

Wrote 5,163 documents in 5,134ms: 1.01 docs/ms
Finished indexing in 8,032ms after last document write

And here are the numbers for build 126, after the change:

Wrote 5,163 documents in 2,559ms: 2.02 docs/ms
Finished indexing in 2,697ms after last document write

So we get double the speed at write time, but we also get much better indexing speed. This is sort of an accidental by-product, because now we index documents based on a range, rather than on specific keys. But it is a very pleasant accident.

time to read 1 min | 133 words

A while ago I came to the realization that it is impossible for me to do everything alone, so I got other people to help me build my projects. In some cases, that was done using OSS, by soliciting contributions from the community. In others, it is a simple commercial transaction, where I give someone money for code.

I think that I have gotten too used to the OSS model, because I got the following reply to one of my requests:

[Image: the reply, a list of pointed clarifying questions about my loosely worded request]

That was somewhat of a rude shock; I was using the same loose language that I always hated when I got specs to implement.

Those are all really good questions.

time to read 4 min | 723 words

The title of this post is a translation of an Arabic saying that my father quoted me throughout my childhood.

I have been teaching my NHibernate course these past few days, and I have come to realize that my approach to designing RDBMS based applications has undergone a drastic change recently. I think that the difference in my view was brought home when I started getting angry about this model:

[Diagram: the classic normalized order model, with orders, order lines, products, and customers in separate tables]

I mean, it is pretty much a classic, isn’t it? But what really annoyed me was that all I had to do was look at this and know just how badly this is going to end up when someone tries to show an order with its details. We are going to have, at least initially, 3 + N(order lines) queries. And even though this is a classic model, loading it efficiently is actually not that trivial. I actually used this model to show several different ways of eager loading. And remember, this model is actually a highly simplified representation of what you’ll need in real projects.
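For example, one of the eager loading options with NHibernate’s LINQ provider looks roughly like this (a sketch; Order, Customer, and Lines are assumed mappings, and session is an open ISession):

using System.Linq;
using NHibernate.Linq;

// Eagerly load the many-to-one association in the same roundtrip:
var orders = session.Query<Order>()
    .Fetch(o => o.Customer)
    .ToList();

// Collections can be fetched too, with FetchMany(o => o.Lines).ThenFetch(l => l.Product),
// but joined collection fetches blow up the result set and, depending on the version,
// may require de-duplicating the root entities. That is part of why loading this model
// efficiently is not as trivial as it looks.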

I then came up with a model that I felt was much more palatable to me:

[Diagram: the revised model, where the order holds its own copy of the customer and product details it needs]
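In code, the shape I have in mind looks roughly like this (the names are illustrative); the order carries copies of the data it needs to display itself:

using System;
using System.Collections.Generic;

public class Order
{
    public string Id { get; set; }
    public DateTime OrderedAt { get; set; }
    public string CustomerName { get; set; }   // copied at the time of the order
    public List<OrderLine> Lines { get; set; }
}

public class OrderLine
{
    public string ProductName { get; set; }    // copied, so a later rename doesn't rewrite history
    public decimal Price { get; set; }         // the price actually paid
    public int Quantity { get; set; }
}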

And looking at it, I had an interesting thought. My problem with the model started because I got annoyed by how many tables were involved in dealing with “Show Order”, but the end result also reminded me of something: Root Aggregates in DDD. Now, since my newfound sensitivity about this is based on my experiences with RavenDB, I found it amusing that I explicitly modeled documents in RavenDB after Root Aggregates in DDD, then went the other way (reducing queries -> Root Aggregates) when modeling in an RDBMS.

The interesting part is that once you start thinking like this, you end up with a lot of additional reasons why you actually want that. (If the product price changed, it doesn’t affect the order, for example).

If you think about it, normalization in RDBMS had such a major role because storage was expensive, so it made sense to optimize for it with normalization. In essence, normalization is compressing the data, by taking the repeated patterns and substituting them with a marker. There is also another issue: when normalization came out, the applications being built were far different from the type of applications we build today, in terms of number of users, time that you had to process a single request, concurrent requests, amount of data that you had to deal with, etc.

Under those circumstances, it actually made sense to trade off read speed for storage. In today’s world? I don’t think that it holds as much.

The other major benefit of normalization, which took on extra emphasis as growing disk sizes made the storage reduction less important, is that when you state a fact only once, you only have to modify it once.

Except… there is a large set of scenarios where you don’t want that. Take invoices as a good example. In the case of the order model above, if you changed the product name from “Thingamajig” to “Foomiester”, that is going to be mighty confusing for me when I look at that order and have no idea what that thing was. What about the name of the customer? Think about the scenarios in which someone changes their name (marriage being the most common one, probably). If a woman orders a book under her maiden name, then changes her name after she marries, what is supposed to show on the order when it is displayed? If it is the new name, that person didn’t exist at the time of the order.

Obviously, there are counter examples, which I am sure the comments will be quick to point out.

But it does bear thinking about, and my default instinct to apply 3rd normal form has been muted since I realized this. I now have a whole set of additional questions that I ask about every piece of information that I deal with.

time to read 3 min | 441 words

I am well aware that I am… outside the curve for bloggers. For a long while I handled that by simply publishing posts as soon as I wrote them, but that turned out to be quite a burden for some readers, and pieces that I think deserve more attention were skipped, because they were simply drowning in the noise of so many blog posts.

I am much happier with the future posting concept. It makes things more predictable, both for me and for the readers. The problem happens when you push this to its logical conclusion. At the time of this writing, I have a month of scheduled posts ahead of me, and this is the third or fourth blog post that I wrote in the last 24 hours.

In essence, I created a convoy for my own blog. At some point, if this trend continues, it will be a problem. But I kind of like the fact that I can relax for a month and the blog will function on autopilot. There is also the nice benefit that by the time a post is published, I have forgotten what it said (I use the write & forget method), so I need to read the post again, which helps, a lot.

But there are some disadvantages to this as well. My current system will simply schedule a post on the day after the last scheduled one. That works great if I have posts that are not time sensitive. But what actually happens is that there are a lot of scenarios in which I want to set the date of a post to the near future. I still try to keep it to one post a day, though, so that means that I need to shuffle the rest of the items in the queue. This is especially troubling when you consider that I usually write a series of posts that interconnect into a full story.

So I can’t just take one of them and bump it to the end; I might have to rearrange the entire timeline. And there is no support for that, so I have to go and manually update the timing for everything else.

It is pretty clear why this feature is missing: it is an outlier one. But it probably means that I am going to fork SubText and add those things. And the real problem is that I would really like to avoid doing any UI work there. So I need to think about a system that would let me do that without any UI work on my part.
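The logic itself is simple enough; something along these lines (this is just the shuffling I have in mind, not SubText code):

using System;
using System.Collections.Generic;

public class ScheduledPost
{
    public string Title { get; set; }
    public DateTime PublishDate { get; set; }
}

public static class PostQueue
{
    // Insert a post at a specific date and push every later post forward by a
    // day, so the queue stays at one post per day.
    public static void InsertAt(List<ScheduledPost> queue, ScheduledPost post, DateTime date)
    {
        post.PublishDate = date;
        var index = queue.FindIndex(p => p.PublishDate >= date);
        if (index == -1)
        {
            queue.Add(post);
            return;
        }

        queue.Insert(index, post);
        for (var i = index + 1; i < queue.Count; i++)
            queue[i].PublishDate = queue[i].PublishDate.AddDays(1);
    }
}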

time to read 5 min | 843 words

In a security system, On Behalf Of is a vastly underutilized concept. I built it for the first time in 2007, in a project that spurred the creation of Rhino Security. On Behalf Of allows one user to assume the mantle of another user. From the authorization system’s perspective, once you activate On Behalf Of, you are that other user. That means that you have all the access rights (and limitations) of that user.

Why is this useful?

  • On Behalf Of gives a help desk operator a very quick way to reproduce a bug that a user ran into. Usually those bugs are things like “why can’t I see product Foo”, or “I ran into a bug in Order #823838”. Those bugs can only be reproduced when the help desk operator is running within the security context of the user.
  • On Behalf Of represents how the real world works. Imagine a Team Leader who takes a vacation. For the duration of the vacation, someone else assumes the Team Leader role on a temporary basis. On Behalf Of allows that someone to do the required work in the context of the Team Leader, which allows her to perform operations that she otherwise might not be able to do.

Auditing

On Behalf Of has important implications for auditing. In most systems where auditing plays a role, the user who performed the action is just as important as the action that was taken. On Behalf Of integrates with the auditing system, to indicate not only who actually did the operation (the end user), but also which user that operation was On Behalf Of. This is important, because it avoids “impossible” audit entries later on, where either “I was on vacation at the time, I couldn’t have done it” or “I never had permissions to do this, there is a bug in the system” might crop up.

It is important to note that most systems where On Behalf Of is used have sophisticated security rules. At that point, system administrators are more akin to Windows’ Administrators than Unix’s root. In Unix, if you are root, you can do whatever you like. In Windows, if you are Administrator, you can do almost everything that you like. A typical example is that as an administrator on Windows, you can’t read another user’s files without leaving a mark that you can’t remove (changing ownership), but there are others.

You can think about On Behalf Of as an extension to this: we want to act as another user, but we have to know that we did. That is why you want to be able to easily pull the operations made by users acting On Behalf Of other users from the auditing system. In fact, when running On Behalf Of, the audit level is much increased, because we need to track what sort of operations you made, even operations that are normally below the audit level of the system (viewing an entity you otherwise had no way of accessing, for example).
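As an illustration of what that implies for the data you record, an audit entry might look something like this (the field names are mine, not from any particular system):

using System;

public class AuditEntry
{
    public string Action { get; set; }            // e.g. "Viewed Order #823838"
    public string ActualUser { get; set; }        // who physically performed the action
    public string OnBehalfOfUser { get; set; }    // whose security context was assumed
    public DateTime Timestamp { get; set; }
    public bool Elevated { get; set; }            // On Behalf Of sessions record more operations
}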

Authorization

From an authorization perspective, the actual mechanics are pretty simple: instead of passing the actual user to the security system, you pass the user the operations are executed On Behalf Of.

However, the security system does need to be aware of On Behalf Of, because there are some operations that you cannot perform On Behalf Of someone else. For example, while I may be authorized to act On Behalf Of another Team Leader, it is not possible for me to fire a team member while operating On Behalf Of that Team Leader. Firing someone is a decision that can only be made by that person’s direct manager, not by someone acting On Behalf Of them. This is a business decision, mind you: defining the set of operations that may be performed On Behalf Of (usually most of them) and the few critical ones that mustn’t.
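In code, the check might look roughly like this (hypothetical types and interfaces, not Rhino Security’s actual API):

using System.Collections.Generic;

public class User { public string Name { get; set; } }

public interface IAuthorizationService
{
    bool IsAllowed(User user, string operation);
}

public class OnBehalfOfAuthorizer
{
    private readonly IAuthorizationService authorization;
    private readonly HashSet<string> forbiddenOnBehalfOf =
        new HashSet<string> { "/Team/FireMember" };   // the business-defined exceptions

    public OnBehalfOfAuthorizer(IAuthorizationService authorization)
    {
        this.authorization = authorization;
    }

    public bool IsAllowed(User actualUser, User onBehalfOf, string operation)
    {
        // The effective user for the security check is the one we act on behalf of.
        var effectiveUser = onBehalfOf ?? actualUser;

        // But some operations must be performed by the real user, never On Behalf Of.
        if (onBehalfOf != null && forbiddenOnBehalfOf.Contains(operation))
            return false;

        return authorization.IsAllowed(effectiveUser, operation);
    }
}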

Authentication

As you can imagine, there is a great deal of potential for abuse with this feature, so for the most part, it is strictly limited. Usually it is active for system administrators only, but very often there are both temporary and permanent sets of conditions under which On Behalf Of is enabled.

For example, an exec is always able to act On Behalf Of his captain. And we already discussed covering for another Team Leader when they are sick / vacationing / etc.

Summary

On Behalf Of is a powerful feature, but it requires understanding from the users; it has the potential to be looked at as a security hole, even though it is just a reflection of how we work in real life. And the implications, if the users expect some level of privacy in the system, are huge. A large part of implementing On Behalf Of correctly isn’t in the technical details, it is in the way you build / document / sell it to the users.

time to read 5 min | 933 words

Every so often I see a group of people or a company come up with a new Thing. That new Thing is supposed to solve a set of problems. The common problems that people keep trying to solve are:

  • Data access with relational databases
  • Building applications without needing developers

And every single time that I see this, I know that there is going to be a catch involved. For the most part, I can usually even tell you what the catches involved are going to be.

It isn’t because I am smart, and it is certain that I am not omniscient. It is a matter of knowing the problem space that the Thing sets out to solve.

If we take data access as a good example, there aren’t that many ways that you can approach it, when all is said and done. There is a set of competing tradeoffs that you have to make. Simplicity vs. usability would probably be the best way to describe it. For example, you can create a very simple data access layer, but you’ll give up on doing automatic change tracking. If you want change tracking, then you need to have an Identity Map (even DataSets had that, in the sense that every database row was represented by a single DataRow :-) ).
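A minimal sketch of what an Identity Map does (illustrative, not any particular framework’s implementation):

using System;
using System.Collections.Generic;

public class IdentityMap<T> where T : class
{
    private readonly Dictionary<object, T> loaded = new Dictionary<object, T>();

    public T GetOrLoad(object id, Func<object, T> loadFromDatabase)
    {
        T entity;
        if (loaded.TryGetValue(id, out entity))
            return entity;                 // already loaded: hand back the tracked instance

        entity = loadFromDatabase(id);
        loaded[id] = entity;               // track it, so change tracking has one instance per row
        return entity;
    }
}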

When I need to evaluate a new data access tool, I don’t really need to go too deeply into how it does things; all I need to do is look at the set of tradeoffs that the tool made. Because you have to make those tradeoffs, and because I know the playing field, it is very easy for me to tell what is actually going on.

It is pretty much the same thing when we start talking about the options for building applications without developers (a dream that the industry has chased for the last 30 – 40 years or so, unsuccessfully). The problem isn’t a lack of trying; the amount of resources invested in the matter is staggering. But again, you come into the realm of tradeoffs.

The best that a system for non-developers can give you is CRUD. Which is important, certainly, but for developers, CRUD is mostly a solved problem. If we want plain CRUD screens, we can utilize a whole host of tools and approaches to build them, but beyond the simplest departmental apps, the parts of the application that really matter aren’t really CRUD. For one application, the major point was being able to assign people to their proper slot, a task with significant algorithmic complexity. In another, it was fine tuning the user experience so they would have a seamless journey into the annals of the organization’s decision making processes.

And here we get to the same tradeoffs that you have to make. Developer friendly CRUD systems exist in abundance; ASP.NET MVC’s support for Html.EditorFor(model) is one such example. And they are developer friendly because they give you the bare bones of functionality you need, allow you to define broad swaths of the application in general terms, but allow you to fine tune the system easily where you need it. They are also totally incomprehensible if you aren’t a developer.

A system that is aimed at paradevelopers focuses a lot more on visual tooling to help the paradeveloper achieve their goal. The problem is that in order to do that, we give up the ability to do things in broad strokes, and have to pretty much do everything from scratch, for everything that we do. That is acceptable for a paradeveloper, who doesn’t work with the concepts of reuse and DRY, but those same features that make it so good for a paradeveloper would be a thorn in a developer’s side, because they would mean having to do the same thing over & over & over again.

Tradeoffs, remember?

And you can’t really create a system that satisfies both. Oh, you can try, but you are going to fail. And you are going to fail because the requirement set of a developer and the requirement set of a paradeveloper are so different as to be totally opposed to one another. For example, one of the things that developers absolutely require is good version control support. And by good version control support I mean that you can diff between two versions of the application and get a meaningful result from the diff.

A system for paradevelopers, however, is going to be so chock full of metadata describing what is going on that even if the metadata is in a format that can be diffed (and all too often it is located in some database, in a format that makes it utterly impossible to work with using source control tools), the diff isn’t going to tell you anything meaningful.

Paradeveloper systems encourage you to write what amounts to Button1_Click handlers, if they give you even that, because the paradevelopers they are meant for have no notion of things like architecture. The problem when developers take that approach is that the result is obviously unmaintainable.

And so on, and so on.

Whenever I see a new system cropping up in a field that I am familiar with, I evaluate it based on the tradeoffs that it must have made. And that is why I tend to be suspicious of the claims made about the new tool on the block, whatever that tool is in any given week.

time to read 2 min | 350 words

I was recently at a private company event (not my company; I was invited, along with others, because we have a close association with that company). The event itself wasn’t notable, but there was one thing that really bothered me. Before the event actually started, there was the usual phase when everyone is munching on the snacks and mingling. The food was some sort of green cupcakes with inspirational messages on them: “think positive”, “fitting the world to you”, etc.

All in all, I found that somewhat strange, but I didn’t really care. Then, while I was talking with a few friends, a woman walked up to us and started handing out coupons for some free demo courses using a whole new technique, etc. I was quite taken aback. I am used to stuff like that on conference floors, where you have booth babes doing stuff like that, but this was a private meeting of fewer than fifty people, and I couldn’t understand what was going on.

It helped that the woman kept dropping the same phrases that appeared on the cupcakes. That was later confirmed at the beginning of the meeting, where the presenter stood up and started by thanking the sponsors for bringing the food, etc.

Looking back at this, I am appalled, amazed, and utterly unsurprised, all at the same time (you can be all of those at once, it seems). That company actually sold sponsorship for an internal, private meeting. I don’t really know what the point was, whether they were trying to save money on the food or actually making money out of this, but that behavior really bothers me.

I am absolutely for commercialization, if only because the bank would otherwise object, but I was utterly stunned by how crass it was.

What is next? Hiring employees for the express purpose of watching commercials while the company is getting paid for that?

More to the point, there is a certain expectation about how such functions are run, and stunts like that leave a very bad impression.
