If you throttle me any more, I am going to throttle you back!
It is interesting to note that for a long while, what we were trying to do with RavenDB was make it use fewer and fewer resources. One of the reasons for that is that using fewer resources is obviously better, because we aren’t wasting anything.
The other reason is that we have users running us on 512 MB / 650 MHz Celeron 32-bit machines. So we really need to be able to fit into a small box (and also leave enough processing power for the user to actually do something with the machine).
We have gotten really good at doing that, actually.
The problem is that we also have users running RavenDB on standard server hardware (32 GB / 16 cores, RAID and whatnot), in which case they (rightly) complain that RavenDB isn’t actually using all of their hardware.
Now, being conservative about resource usage is generally good, and we do have the configuration in place which can tell RavenDB to use more memory. It is just that this isn’t polite behavior.
RavenDB in most cases shouldn’t require anything special for you to run; we want it to be truly a zero admin database. The solution? Take into account the system state and increase the amount of work that we do to get things done. And yes, I am aware of the pitfalls.
As long as there is enough free RAM available, we will increase the number of documents that we are going to index in a single batch. That is subject to some limits (for example, if we just created a new index on a big database, we need to make sure we aren’t trying to load it entirely into memory), and it knows how to reserve some room for other things, and how to throttle down as well as up.
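To make the idea concrete, here is a minimal sketch of that kind of batch sizing. It is not RavenDB's actual code: the `BatchSizer` class, its field names, the default numbers and the doubling/halving policy are all assumptions made for illustration, and the amount of free memory is passed in by the caller rather than read from the operating system.

```python
from dataclasses import dataclass


@dataclass
class BatchSizer:
    """Hypothetical adaptive batch sizer (illustrative, not RavenDB's code)."""
    min_batch: int = 128                      # assumed floor for a batch
    max_batch: int = 131072                   # assumed hard cap on a batch
    reserve_bytes: int = 512 * 1024 * 1024    # room left for everything else
    current: int = 128                        # batch size used right now

    def next_batch_size(self, free_bytes: int, last_batch_was_full: bool) -> int:
        if free_bytes < self.reserve_bytes:
            # Free memory dipped into the reserve: throttle down.
            self.current = max(self.min_batch, self.current // 2)
        elif last_batch_was_full:
            # There is spare memory and more work waiting: throttle up,
            # but never past the hard cap.
            self.current = min(self.max_batch, self.current * 2)
        # Otherwise there is nothing to gain from growing, so stay put.
        return self.current
```

The cap is what keeps a brand new index on a big database from trying to pull everything into memory at once, and the reserve is what lets the batch size come back down when the machine gets busy.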
This post was written before I had the chance to actually test this on a production-size dataset, but I am looking forward to seeing how it works.
Update: Okay, that is encouraging. It looks like what we did just made things over 7 times faster. And this isn’t a micro benchmark; this is what happens when you throw this at a multi-GB database with full text search indexing.
Next, we need to investigate what we are going to do about multiple running indexes and how this optimization affects them. Fun.
Comments
Be careful of this strategy of using the free memory to make decisions on memory allocation - http://blogs.msdn.com/b/oldnewthing/archive/2012/01/18/10257834.aspx
Paul, I am well aware of this, and yes, we took that into account.
Paul, did you notice that I linked to that exact article in the post?
How did you take it into account?
Martin, we aren't making blind guesses. We are going to use as much memory as we can, but we will stop if we reach the predefined limit. That means that if there are other apps running and using memory, it doesn't affect us. We aren't trying to use more memory, we are simply trying not to use too much.
Someone is probably going to mention this soon anyway, so I thought I'd go ahead and do it myself.
Oftentimes, using fewer resources is considered "being green". For example, this web CMS is written entirely in C++ to minimize the use of resources: http://cppcms.com/wikipp/en/page/rationale
Question is how far you are willing to go with it. I'm pretty sure moving from your RDBMS of choice to any NoSQL that better fits your needs will save 10 times the energy you had been using so far...
@Itamar: In that case I would like to refer to the quote "Premature optimization..."
You won't know whether it costs more or less energy to cache data in memory instead of spinning up those platters every time until you have actually measured it.
Usually database engines try to allocate as much memory as they can, unless some configuration sets an upper limit.
This is how SQL Server or Oracle Database works, and it is perfectly good, because usually they run on a dedicated server. If you need to limit resource usage, just set up a limit.
Having unused memory is not useful :).