Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,590
|
Comments: 51,218
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 355 words

RavenDB’s query optimizer is pretty smart, it knows how to find the appropriate index for your queries, and even create a new index to match your query if it didn’t exist. But that was the limits of its abilities. A human could still go into the database and say, look at those:

image

Those all operate on Posts, and you should be able to merge them all into a single index. Reducing the number of indexes is a good thing, as it reduces the amount of IO on the system, which is typically our limiting factor.

Now, there was no real reason why we couldn’t actually tell the query optimizer that it should be smart enough that when it creates a new index, it will use all of the properties that have been previously indexed.

However, doing so would actually make no difference to us. Because until now, we didn’t have a way to stop an index. With the new index idling feature, we can now have the query optimizer create a new merged index, and then the database will just mark the extra index as idle after a while.

Almost, there is still another issue that we have to resolve. What happens when we have a big database, and we introduce a new (and wider) index? By default, all matching queries would actually hit that index, and not the previously existing index. That is great, except… the new index is stale, and might remain stale for a few minutes. During that time, we have a perfectly servicable index that is just sitting there.

The query optimizer can now take into account the staleness level of an index as well when selecting it, meaning that there should be no interruption from the point of view of other queries. The new index will be introduced, go through all the documents, and then take over as the serving index for all queries. The existing index will wither away and die.

time to read 1 min | 174 words

RavenDB previously had a really nice feature for temporary indexes. Since we expected most of them to be temporary, we indexed them directly into memory, greatly saving in IO costs. With the removal of temporary indexes, that left us the option of just removing the entire code path and moving on to other things.

But we sat down and thought about this for a while. Typically, the busiest part in the index’s life is its creation, because the database needs to go through all the documents in the db and index them. We have changed things so during this creation period, we will actually index to memory, without hitting the disk. Only if we reached a configurable size or finished indexing everything will we spill everything to disk.

This, in turn, gives us the best of both worlds. We get a really nice optimization for new indexes, and we don’t have to hit the disk for indexes that would soon go away. And, of course, we get the perf boost for all indexes now.

time to read 2 min | 357 words

RavenDB’s ability to analyze your queries and generate the required indexes on the fly has always been a great boon. Rob Ashton was involved in the original implementation and during his visits to Hibernating Rhinos’ secret lair, he got to whack that thing on the head a few more times.

We need to separate two important things:

  • Automatically generating the indexes based on your queries.
  • The temporary indexes model itself.

The first part is a really important feature. The second is just an implementation detail. In particular, temporary indexes had a few problems.

Most importantly, they were temporary, and there was an explicit step for promoting those indexes from one stage to the other. That caused some confusion, and there was a period of time, exactly when we decided that the index was important enough to keep, that caused the index to effectively reset itself. The other problem was that the moment that an index was upgraded to an auto index, it was there forever.

What Rob has done was to remove the concept of temporary indexes all together, which got rid of a whole bunch of code. Instead, we have just standard auto indexes. And now we had a drastically simplified story. We didn’t have the drastic jump from temp to auto, with irrecoverable implications.

Of course, this leads to a lot of interesting questions. Temporary indexes had the benefit of being indexed directly to memory, and they would go away after a database restart, as well as a whole lot of stuff. Not having special code for that made things a lot simpler for us, actually.

Automatic indexes have their age, and that is tracked internally by RavenDB. If an automatic indexed isn’t being used, it will become idle an eventually abandoned. If it is a very young index, we will decide it was a temporary index after all, and remove it from the system completely.

This feature, along with idling indexes, opened up the door for the next important feature, index merging. But before that, we need to upgrade the smarts for the query optimizer… which happens to be our next topic.

time to read 1 min | 162 words

I have been investigating the LevelDB project for the purpose of adding another storage engine to RavenDB. The good news is that there is a very strong likelihood that we can actually use that as a basis for what we want.

The bad news is that it is insanely easy to get LevelDB to compile and work on Linux, and appears to be an insurmountable barrier to do the same on Windows.

Yes, I know that I can get it working by just using a precompiled binary, but that won’t work. I actually want to make some changes there (mostly in the C API, right now).

This instructions appears to be no longer current. And this thread was promising, but didn’t lead anywhere.

I am going to go over the codebase with a fine tooth comb, but I am no longer a C++ programmer, and the intricacies of the build system is putting a very high roadblock of frustration.

FUTURE POSTS

  1. RavenDB & Distributed Debugging - 3 days from now
  2. RavenDB & Ansible - 6 days from now

There are posts all the way to Jul 21, 2025

RECENT SERIES

  1. RavenDB 7.1 (7):
    11 Jul 2025 - The Gen AI release
  2. Production postmorterm (2):
    11 Jun 2025 - The rookie server's untimely promotion
  3. Webinar (7):
    05 Jun 2025 - Think inside the database
  4. Recording (16):
    29 May 2025 - RavenDB's Upcoming Optimizations Deep Dive
  5. RavenDB News (2):
    02 May 2025 - May 2025
View all series

Syndication

Main feed ... ...
Comments feed   ... ...
}