Combating the Select N + 1 Problem In NHibernate

Using an O/RM can greatly simplify your life, but is has its on set of Gotcha that you need to be aware of. One of the more serious ones is the Select N + 1 issue. To describe the issue, let's look at a typical example. Users א M:M א Blogs א 1:M א Posts א 1:M א Comments. Here is the diagram:

I hope that both my notation above and diagram makes it clear what the relations between the objects are (isn't class designer cool)? As an aside, the above model is from NHibernate.Generics tests. All collections are lazy.

I want to show the user all the comments from all the posts, so they can delete all the nasty comments. The naןve implementation would be something like:

foreach (Post post in blog.Posts)

{

    foreach (Comment comment in post.Comments)

    {

        //print comment...

    }

}

The posts collection has 80 posts in it. When we access it, we load all the posts (1 select to grab all the posts), now, when we access each post's Comments' collection, we need to issue a new select to get the data (1 select to grab all the comments for this post). In my testing, I got 81 selects from this seemingly innocent piece of code. I'm not sure about you, but I'm usually motivated to reduce the number of external queries. To put it another way, for every iteration of the external loop above, we have a database query. And that is in order to print a fairly simple UI, and this is a fairly common task.

There are several ways to solve the issue. The easiest is to just ask NHibernate to give you it all in a single query, like this:

IList posts = session.CreateQuery(

        @"from Post post left join fetch

        post.Comments where post.Blog = :blog")

    .SetEntity("blog",blog)

    .List();

foreach (Post post in posts)

{

    foreach (Comment comment in post.Comments)

    {

        //print comment...

    }

}

This code is a little more explicit in what it does; it tells NHibernate "give me all the posts for this blog, and make sure to fetch all their comments as well, since I'm going to use this". The result of this query is a single SQL Statement to load all the data in one go.

 

Fetch Join vs. Join

One important thing that you should be aware of is the difference between join fetch and just join. NHibernate can use multiply joins in a single statement, but can fetch only a single collection – it can fetch several properties {many to one or one to one} – in a single query. Fetching means that NHibernate will load the collection and put it in its proper place (for instance, load all the comments for this post and put them in the collection).

Normal joining means that NHibernate will return the correct objects, but it will not be able to associate them to their proper place in the objects without going to the database again.

To clarify, the result of this query:

IList  blogs = session.CreateQuery(

        @"from Blog blog left join fetch

        blog.Posts left join blog.Users

        where blog.id = 1")

    .List();

It is not a Blog object with all its collections already filled. It is a list of tuples { { blog1, user1} , { blog1, user2}, { blog1, user3},{ blog1, user1} }, etc. The list in my case has a length of 567, and contains many duplicate items. The reason it contains duplicate items is that NHibernate doesn't attempt to uniqueify (is that a word?) the result from the database, and because the database returned  567 (for all permutations of a post, blog and user), this is what you get. NHibernate is smart enough to return the same instances, so it is not a problem to do the filtering yourself.

However, if we will try to access the Users' collection of the blog that was returned, we will still get a query to the database to get the collection. NHibernate is smart enough to figure out that it already has the data itself in memory, so it will just ask for the identifiers and nothing more.

 

There are other ways to get the same result, we can use the Criteria API (my favorite), to do the same:

session.CreateCriteria(typeof(Post))

    .SetFetchMode("Comments", FetchMode.Eager)

    .Add(Expression.Eq("Blog", blog))

    .List();

This is one of the more powerful features on NHibernate, since it allows you a fine grained control of the loading strategy, based on your scenario. A fairly simple change turned an 80+1 queries (for a small example. If I was using my own blog for example, I would need ~1300+1 queries to get the data!!) to a single query.

And I can do this on a case by case basis! That is truly powerful.  

The other ways that I mentioned are no as fined grained, and involve changing the mapping for the specified association(s). This is good if you need to change the behavior for all the application in one stroke. Personally, I can't think of hand on many cases where I would need that, but it is good to have the option for when I would need it.

Print | posted on Tuesday, May 02, 2006 9:51 PM

Feedback


Gravatar

#  5/3/2006 2:26 PM Chris Bilson

This is awesome stuff. I am doing a presentation on ActiveRecord and Monorail at ssdotnet (in Olympia, WA) next week and will be sure to incorporate above information (credit yours). I like the little type-safe beams of light you shine into some of the areas of NHibernate and ActiveRecord I didn't understand. Thanks!


#  5/3/2006 6:06 PM Philip Nelson

I agree, great post! You not only solve the problem you present at the start of the post, but you explain the hidden inner workings of NHibernate at the same time.


Gravatar

#  5/3/2006 7:08 PM Ayende Rahien

@Chirs, @Philip,

Thanks for the compliments. It means a lot to me.


Gravatar

  5/3/2006 9:47 PM Fabrizio

Good article, but I do not agree regarding two points:

1. Making directly the Hibernate query (by who? the GUI package?) brokes the OO design and introduces coupling between objects
2. The SELECT N+1 problem arises due to the use of the lazy load principle. I guess that if you need to presents in a GUI all the posts and their comments the usage of lazy load makes no sense at all. It has more sense in my opinion to directly load posts and comments together.

Bye,
fabrizio.


Gravatar

#  5/3/2006 9:53 PM Ayende Rahien

1. Yes, doing HQL queries from the UI will _break_ encapsulation.
I would generally shove this to something like LoadPostsWithComments() methods, or something like this.

2. The issue is that I have different needs based of different scenarios. For instance, I usually don't want to display the comments (think of a blog's front page, with many posts, but no comments). When I do want to display the posts and the comments, I want to do it efficiently.
In general, I would like to choose my lazy load strategy based on the common scenarios, and override it using HQL on the less common scenarios where this is causing a performance problem.


Gravatar

  7/17/2006 11:14 AM julio

Please, excuse me for my English, I'm spanish.
Can you put the mappin for this join example?
Thanks you.


Gravatar

#  7/17/2006 9:37 PM Ayende Rahien

Here are the mapping:
http://svn.berlios.de/wsvn/nqa/NHibernate.Generics/NHibernate.Generics.Tests/Mapping.hbm.xml?op=file&rev=0&sc=0


Gravatar

  8/26/2006 1:51 AM Antonio

session.CreateCriteria(typeof(Post))
.SetFetchMode("Comments", FetchMode.Eager)
.Add(Expression.Eq("Blog", blog))
.List();

Hi. What happens if, Post and Comments coontains a column named "Blog" ?

Antonio.


Gravatar

#  8/26/2006 11:24 AM Ayende Rahien

It will search by the Blog column on Post


Gravatar

  8/26/2006 2:27 PM Antonio

Anf if i would search column Blog on Comments Table?
How do i set the crieria ?
Thanks


Gravatar

#  8/26/2006 2:54 PM Ayende Rahien

You want all the posts that has a comment in this blog?
That is something that you need HQL for.


Gravatar

  8/26/2006 11:36 PM Antonio

Ok. Thank you so much!


Gravatar

  9/19/2006 10:18 PM Christopher MG

Hey Ayende! Thanks again for all the magnificent work - especially the stored procs this morning :)

Quick question... You say above that fetch joins can still create duplicate objects - I'm experiencing that, even with 1.2 alpha. I return a list of objects from my query, then I have to write my own manual filtering code to narrow that list down to the actual list of unique objects (now nicely filled via fetch join). Is this still an issue, or has this been fixed since this original blog post? Thanks!!


Gravatar

#  9/19/2006 10:44 PM Ayende Rahien

This is not an issue, it is a feature, serisously.
This is because you often _want_ to join against several objects, and get related objects back.
The "solution" is simply to pipe them through a set, let this:

IList withDuplicates = session.... ;

ISet withoutCollections = new HashedSet(withDuplicates);


# nkxyoowp 3/16/2007 11:45 PM nkxyoowp

nkxyoowp

Comments have been closed on this topic.