The use of caches in OR/M

I recently talked quite a bit about caches in NHibernate, and I am a great believer in using them carefully to give an application much better performance. Frans Bouma, however, does not agree. Just to note, Frans is the author of LLBLGen Pro.

First, let me point to an issue that I have with the terminology that he uses. When Frans talks about cache and uniquing, he is referring to what is generally (at least by N/Hibernate & Fowler) called an Identity Map.

Frans:

 A cache is an object store which manages objects so you don't have to re-instantiate objects over and over again, you can just re-use the instance you need from the cache.

Fowler:

Ensures that each object gets loaded only once by keeping every loaded object in a map.

When speaking about the advantages of an Identity Map, performance is almost never the first reason to use it. It is a side benefit, which can have a certain effect, but it is not the main reason to use one. If we consider Frans' arguments as they apply to Identity Map, I agree. If nothing else, an Identity Map tends to be fairly short lived and limited in scope in most cases, so it doesn't have the chance to be very effective as a cache.
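To make the distinction concrete, here is a minimal sketch of an Identity Map in Python. It is not how NHibernate or LLBLGen Pro implement it; `Session`, `Customer`, and the `rows` dictionary standing in for the database are all hypothetical names made up for the illustration.

```python
class Customer:
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class Session:
    """Hypothetical session that keeps an identity map for its own lifetime."""

    def __init__(self, rows):
        self._rows = rows          # stand-in for the database: {id: name}
        self._identity_map = {}    # (type, id) -> the single loaded instance
        self.loads = 0             # counts simulated database hits

    def get(self, customer_id):
        key = (Customer, customer_id)
        if key not in self._identity_map:
            # First request for this id in this session: load it and remember it.
            self.loads += 1
            self._identity_map[key] = Customer(customer_id, self._rows[customer_id])
        # Every later request in the same session gets the same instance back.
        return self._identity_map[key]


rows = {1: "Northwind"}
session = Session(rows)
first = session.get(1)
second = session.get(1)          # same object as `first`, no second load
other_session = Session(rows)    # a new session starts with an empty map
```

The point of the map is that `first` and `second` are the same object, so a change made through one is visible through the other. The saved load is a side effect, and it disappears as soon as the session does.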

But an OR/M has an opportunity to cache much more than just at the session / context level. A word of warning, though, as was mentioned in the post: caching by its very nature means that you are not seeing the very latest data. You can use cache invalidation policies (including the new data driven cache invalidation policies in .Net 2.0) to help, but you should be aware of this issue.

However, when we consider the common scenarios, it is not often that we need to have real time information. The case that Frans is presenting is a CRM application with a query on all the customers that have more than 5 orders in the last month.

Do we really need this data in real time? Or can we be satisfied with data from several minutes ago? The answer depends on the business scenario, but fairly often we can be reasonably satisfied with data that is a few minutes or hours behind the real events.

Even if we would like to get real time data, the data can be changed between the time that we queried it and the time that we displayed it, so we would need to query again as soon as we finished displaying (or maybe at the same time as), ad infinitum.

Assuming that the business requirements allow us to use caching, the performance benefit is tremendous. Let us assume that we have cached the query and its results (again, I'm using NHibernate as the model here, and its caches do not hold live entities, but rather their values). We can then satisfy the query entirely from the cache (which usually means in-proc memory).

The only real cost of the query is several hash table lookups, which are (by their nature) very fast, and constructing the objects, which I have already shown to be highly efficient. The end result is that we can serve the results immediately. In many cases, even a cache that is valid for only a few minutes can significantly reduce the number of queries that the DB has to process.
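As a rough illustration of that flow, here is a small Python sketch of a query cache that stores row values rather than live objects and expires entries after a fixed time. This is a simplification under my own assumptions, not NHibernate's implementation: `QueryCache`, `find_customers`, `run_query`, and the five minute lifetime are all hypothetical.

```python
import time


class Customer:
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class QueryCache:
    """Hypothetical cache keyed by (query, params), holding values, not entities."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (query, params) -> (expires_at, tuple of row tuples)

    def get(self, query, params):
        entry = self._entries.get((query, params))  # one hash table lookup
        if entry is None:
            return None
        expires_at, rows = entry
        if self._clock() >= expires_at:
            # The entry is too old to trust: drop it and report a miss.
            del self._entries[(query, params)]
            return None
        return rows

    def put(self, query, params, rows):
        self._entries[(query, params)] = (self._clock() + self._ttl, tuple(rows))


def find_customers(cache, run_query, query, params):
    """Serve from the cache when possible; otherwise hit the database once."""
    rows = cache.get(query, params)
    if rows is None:
        rows = run_query(query, params)  # the only place the database is touched
        cache.put(query, params, rows)
    # Fresh objects are built from the cached values on every call.
    return [Customer(*row) for row in rows]


# Usage with a fake clock and a fake database, both made up for the example.
now = [0.0]
calls = []


def run_query(query, params):
    calls.append((query, params))
    return [(1, "Northwind"), (2, "Contoso")]


cache = QueryCache(ttl_seconds=300, clock=lambda: now[0])
query = "customers with more than 5 orders in the last month"
first = find_customers(cache, run_query, query, (5,))
second = find_customers(cache, run_query, query, (5,))  # served from the cache
```

Within the lifetime of the entry, the second call costs a dictionary lookup plus object construction, and the database is never contacted. The price is that the results can be up to five minutes stale.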

The concerns that Frans is raising are valid in the context* that he is talking about, but I disagree that caches are not extremely important to performance. That said, they should not be overused, and the DB is still the one and only authoritative source for the data. I have seen some places where the requirement is to run the application entirely from cache, without touching the database at all.

This is taking this way too far...

* Do you get the joke here?

Posted on Sunday, September 03, 2006 3:24 PM

Feedback



#  9/4/2006 7:41 AM Frans Bouma

The term is 'uniquing'. That Fowler calls it 'identity map' is nice for mr. Fowler, but frankly he has re-defined a lot of things which already had a perfectly fine definition.

Caching query results is something which falls in the same category as caching entity sets: the RDBMS already caches query results. The fun thing is: it also has a very optimized system to determine if the query coming in is indeed servable from the cache.

If you have a more complex query (you know, the ones you probably want to pull from a cache if you could), comparing what's fed to you by the code as a query with what's in the cache can be a tricky problem, which might end up more time consuming than simply re-querying the RDBMS, which is specifically set up to serve the query from a cache IF that's possible.

People should look into caching the end-result of processing of data, not the data fed to a process; by caching the end result of a process you not only cache the data, but also save the processing time.

Pointing at a cache in an O/R mapper to show how sophisticated the O/R mapper is is just marketing, real efficiency is gained from prefetch paths and the like.



#  9/4/2006 2:43 PM Ayende Rahien

@Frans,

To be frank, the first time that I heard the term 'uniquing' was in your post. I ran a search just now, and it seems that it is a known term for the concept you describe.
As long as we agree that they are the same thing, I don't really care.

Caching query results and caching entity sets saves you two _major_ things even if we assume that the cost of processing the query is zero. It saves network traffic and latency, which is a significant overhead. Since the cost of processing the query is _not_ zero, that is some saving.

Why would I want to compare the result of the query to the cache? My main goal is not to hit the database at _all_. I handle the freshness of the cache by utilizing cache dependencies on either time or table. I am not going to even try to go to the DB in most cases.

"Pointing at a cache in an O/R mapper to show how sophisticated the O/R mapper is is just marketing" - I mostly agree with you here. I think that a good cache implementation can do quite a bit to help the efficiency of the application.

"real efficiency is gained from prefetch paths and the like" - Most certainly. _Bad_ prefetching causes the worst performance (SELECT N+1).



#  9/4/2006 4:15 PM hammett

"The term is 'uniquing'. That Fowler calls it 'identity map' is nice for mr. Fowler, but frankly he has re-defined a lot of things which already had a perfectly fine definition."

I'm happy I'm not the only one with that impression.
