Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,589
|
Comments: 51,218
Privacy Policy · Terms
filter by tags archive
time to read 1 min | 84 words

I run into the following code during code review and had an immediate and visceral reaction.

This is a (bad) attempt to add thread safety, because you are getting a value through a read only interface, but there is still the mutable instance to work with at the source, and now you have someone that observes the instance while it is being mutated, outside the lock.

The proper way to handle this is to copy the list (under the lock) and return a distinct copy.

time to read 1 min | 144 words

I asked about a slow piece of code and why it is so slow. The answer is pretty simple, I messed up, take a look at this piece of code:

image

When the Value is an int, I’m creating a span from the address of a long variable (initialized to zero). That means that I have a lot of hash collisions, which means that adding to the dictionary turned into a sequential operation, which means that the actual cost here is O(N**2).

Interestingly enough, you can’t write this code without the use of unsafe. I actually didn’t know that the scope of the variable was even valid here to have its address taken. That was very hard to debug, because the problem was hidden away somewhere very far from where we looked at.

time to read 1 min | 81 words

The following code takes a long time to run. In fact, I’m writing this blog post while this is running, and I’m not sure how long that will take.

Update: This took over a minute to complete on my machine (which is pretty nice).

The killer is that this code is correct, it does the right thing, but it can be slow. I stripped a much larger scenario to ~50 lines of code, can you see what is going on? And why?

time to read 3 min | 422 words

Following my previous posts, about the use after free bug that was found in a pull request, I put a lot of effort into testing the validity of the fix. As it turned out, my code was right and the fix worked properly. However, I uncovered a race condition in the .NET MemoryCache implementation. Here is what I got when I put the system under load:

Unhandled exception. System.ArgumentException: Unable to sort because the IComparer.Compare() method returns inconsistent results. Either a value does not compare equal to itself, or one value repeatedly compared to another value yields different results. IComparer: 'System.Comparison`1[Microsoft.Extensions.Caching.Memory.CacheEntry]'.
   at System.Collections.Generic.ArraySortHelper`1.Sort(Span`1 keys, Comparison`1 comparer) in System.Private.CoreLib.dll:token 0x60066cd+0x1d
   at System.Collections.Generic.List`1.Sort(Comparison`1 comparison) in System.Private.CoreLib.dll:token 0x600688b+0x3
   at Microsoft.Extensions.Caching.Memory.MemoryCache.g__ExpirePriorityBucket|27_0(Int64& removedSize, Int64 removalSizeTarget, Func`2 computeEntrySize, List`1 entriesToRemove, List`1 priorityEntries) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000061+0x21
   at Microsoft.Extensions.Caching.Memory.MemoryCache.Compact(Int64 removalSizeTarget, Func`2 computeEntrySize) in Microsoft.Extensions.Caching.Memory.dll:token 0x600005b+0xff
   at Microsoft.Extensions.Caching.Memory.MemoryCache.OvercapacityCompaction(MemoryCache cache) in Microsoft.Extensions.Caching.Memory.dll:token 0x6000059+0xad
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in System.Private.CoreLib.dll:token 0x6002b7c+0x110
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in System.Private.CoreLib.dll:token 0x6002c66+0x190

There are a few interesting things here. First, we can see that this killed the process, this isn’t just an error, this is an error from a background thread that ended up unhandled and killed everything. That is a nope, for server applications. The second issue is that the error is strange. What exactly is going on here?

Here is the relevant piece of code that throw the error, inside the MemoryCache:

priorityEntries.Sort((e1, e2) => e1.LastAccessed.CompareTo(e2.LastAccessed));

This is a really interesting line, because of what it does. The priorityEntries is a local list of cache entries, which we need to sort by the last access, to figure out what we can evict. What can go wrong here?

Well, the MemoryCache is a naturally concurrent instance, of course. What happens when we have an access to the entry in question? We’ll update the LastAccessed value. And if we do this just right, we may give the sort function different values for the same cache entry, leading to this problem.

The bug in question was in place for as long as I can track the MemoryCache codebase. In the best case scenario, it will cause evictions of data that shouldn’t be evicted. Not a major issue, and unlikely to be noticed. But if it fail in this manner, it will kill the process, so very likely to be noticed.

My test scenario had a lot of concurrency and a very small cache (leading to a lot of evictions / compactions), I’m guessing that is why I’m able to trigger this so easily.

time to read 2 min | 239 words

After presenting the issue of how to return items to the array pool without creating a use after free bug, I asked you how you would fix that. There are several ways to try to do that, you can use reference counting scheme, or try to use locking, etc. All of those options are expensive, what is worse, they are expensive on a routine basis, not just for the free the buffer code path.

Instead, I changed the way we are handling returning the buffer. Take a look at the following code:

This may require some explanation. I’m using a ConditionaWeakTable here, that was added to the runtime to enable dynamic properties on objects. Basically, it creates a table that you can lookup by an object to get a key. The most important feature is that the runtime ensures that the associated reference lifetime match the key object lifetime. In other words, when we add the buffer in the eviction callback, we ensure that the ReturnBuffer we register will live at least as long as the buffer.

That means that we can let the GC do the verification job. We’ll now return the buffer back to the pool only after the GC has ensured that there are no outstanding references to it. Not a lot of code, and an elegant solution. This also ensures that we are only paying the code on eviction (likely rare), and not all the time.

time to read 2 min | 301 words

In my previous post, I discussed a bug that brought up in code review, that bug made me go into near panic mode. Here is the issue:

In order to understand this bug, you have to take into account multiple concurrent threads at the same time. Look at the ComputeHashAndPutInCache() method, where we register an eviction callback for the item in the cache. When we evict the item, we return the buffer to the buffer pool.

We want to avoid allocating memory, so that is certainly something desirable, no? However, consider what happens if I have a thread in ComputeHash(), getting a value from the cache. Before I have a chance to look at the value, however, the cache will decide to evict it. At which point the eviction callback will run.

We’ll return the buffer back to the buffer pool, where it may be used again by something else. I am also using this buffer to do other things from the caller of the ComputeHash() call. This is a somewhat convoluted use after free issue, basically.

And I find is super scary bug because of its affects:

  • Randomly, and rarely, some buffer will contain the wrong data, leading to wrong results (but hard to track it down).
  • Trying to find such a bug after the fact, however, is nearly impossible.
  • Most of the debugging techniques (repeating the operation for a particular value) will make it go away (the cache will keep the value and not evict it).

In short, this is a recipe for a really nasty debug session and an impossible to resolve bug. All from code that looks very much like an innocent bystander.

Now, I can obviously fix it by not using the array pool, but that may cause me to allocate more memory than I should. How would you approach fixing this issue?

time to read 1 min | 148 words

I was writing a fairly low level piece of code recently, this is deep in the guts of how RavenDB handles query. I’m adding a cache for some processing inside of RavenDB and the performance benefits are amazing (three orders of magnitude better). As you can imagine, this is the sort of things that I would really like to get into the main codebase. So I did all the usual taxes, created a pull request and moved to other things. Part of our process is to review all pull requests by another pair of eyes. And while there was some minor comments about the code, there was one comment asking about a particular line that had be break out in cold sweat.

I created a simpler reproduction for discussion purposes, here is the code:

Take a look, and let’s me know if you see the bug and its consequences.

time to read 4 min | 664 words

Recently we had to tackle a seriously strange bug. A customer reported that under a specific set of circumstances, when loading the database with many concurrent requests, they would get an optimistic concurrency violation from RavenDB.

That is the sort of errors that we look at and go: “Well, that is what you expect it to do, no?”

The customer code looked something like this:

As you can imagine, this code is expected to deal with concurrency errors, since it is explicitly meant to run from multiple threads (and processes) all at once.

However, in the scenario that the user was using, it had two separate types of workers running, one type was accessing the same set of locks, and there you might reasonably expect to get a concurrency error. However, the second type was run in the main thread only, and there should be no contention on that at all. However, the user was getting a concurrency error on that particular lock, which shouldn’t happen.

Looking at the logs, it was even more confusing. Leaving aside that we could only reproduce this issue when we had a high contention rate, we saw some really strange details. Somehow, we had a write to the document coming out of nowhere?

It took a while to figure out what was going on, but we finally figured out what the root cause was. RavenDB internally does not execute transactions independently. That would be far too costly. Instead, we use a process call transaction merging.

Here is what this looks like:

image

The idea is that we need to write to the disk to commit the transaction. That is an expensive operation. By merging multiple concurrent transactions in this manner, we are able to significantly reduce the number of disk writes, increasing the overall speed of the system.

That works great, until you have an error. The most common error, of course, is a concurrency error. At this point, the entire merged transaction is rolled back and we will execute each transaction independently. In this manner, we suffer a (small) performance hit when running into such an error, but the correctness of the system is preserved.

The problem in this case would only happen when we had enough load to start doing transaction merging. Furthermore, it would only happen if the actual failure would happen on the second transaction in the merged sequence, not the first one.

Internally, each transaction does something like this:

Remember that aside from merging transactions, a single transaction in RavenDB can also have multiple operations. Do you see the bug here?

The problem was that we successfully executed the transaction batch, but the next one in the same transaction failed. RavenDB did the right thing and execute the transaction batch again independently (this is safe to do, since we rolled back the previous transaction and are operating on a clean slate).

The problem is that the transaction batch itself, on the other hand, had its own state. In this case, the problem was that when we replied back to the caller, we used the old state, the one that came from the old transaction that was rolled back.

The only state we return to the user is the change vector, which is used for concurrency checks. What happened was that we got the right result, for the old change vector. Everything worked, and the actual state of the database was fine. The only way to discover this issue is if you are continue to make modifications to the same document on the same session. That is a rare scenario, since you typically discard the session after you save its changes.

In any other scenario, you’ll re-load the document from the server, which will give you the right change vector and make everything work.

Like all such bugs, when we look back at it, this is pretty obvious, but to get to this point was quite a long road.

time to read 3 min | 478 words

In distributed systems, the term Byzantine fault tolerance refers to working in an environment where the other nodes in the system are going to violate the invariants held by the system. Sometimes, that is because of a bug, sometimes because of a hardware issue (Figure 11) and sometimes that is a malicious action.

A user called us to let us know about a serious issue, they have a 100 documents in their database, but the index reports that there are 105 documents indexed. That was… puzzling. None of the avenues of investigation we tried help us. There wasn’t any fanout on the index, for example.

That was… strange.

We looked at the index in more details and we noticed something really strange. There were documents ids that were in the index that weren’t in the database, and they were all at the end. So we had something like:

  • users/1 – users/100 – in the index
  • users/101 – users/105 – not in the index

What was even stranger was that the values for the documents that were already in the index didn’t match the values from the documents. For that matter, the last indexed etag, which is how RavenDB knows what documents to index.

Overall, this is a really strange thing, and none of that is expected to happen. We asked the user for more details, and it turns out that they don’t have much. The error was reported in the field, and as they described their deployment scenario, we were able to figure out what was going on.

On certain cases, their end users will want to “reset” the system. The manner in which they do that is to shut down their application and then delete the RavenDB folder. Since all their state is in RavenDB, that brings them back up at a clean state. Everything works, and this is a documented manner in which they are working.

However, the manner in which the end user will delete is done as: “Delete the database files”, the actual task is done by the end user.

A RavenDB database internally is composed of a few directories, for data and indexes. What would happen in the case where you deleted just the database file, but kept the index files? In that case, RavenDB will just accept that this is the case (soft delete is a a thing, so we have to accept that) and open the index file. That index file, however, came from a different database, so a lot of invariants are broken. I’m actually surprised that it took this long to find out that this is problematic, to be honest.

We explained the issue, and then we spent some time ensuring that this will never be an invisible error. Instead, we will validate that the index is coming from the right source, or error explicitly. This is one bug that we won’t have to hunt ever again.

time to read 3 min | 451 words

A user called us with a strange bug report. He said that the SQL ETL process inside of RavenDB was behaving badly. It would write the data from the RavenDB server to the MySQL database, but then it would immediately delete it.

From the MySQL logs, the user showed:

2021-10-12 13:04:18 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: execute <unnamed>: INSERT INTO orders ("id") VALUES ('Tab/57708-A')
2021-10-12 13:04:19 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: execute <unnamed>: DELETE FROM orders WHERE "id" IN ('Tab/57708-A')

As you can imagine, that isn’t an ideal scenario for ETL processes. It is also something that should absolutely not happen. RavenDB issues the delete before the insert, obviously. In fact, for your viewing pleasure, here is the relevant piece of code:

image

It doesn’t get any clearer than that, right? We issue any deletes we have, then we send the inserts. But the user insisted that they are seeing this behavior, and I tend to trust them. But we couldn’t reproduce the issue, and the code in question dates to 2017, so I was pretty certain it is correct.

Then I noticed the user’s configuration. To define a SQL ETL process in RavenDB, you need to define what tables we’ll be writing to as well as deciding what data to send to those tables. Here is what this looked like:

image

And here is the script:

image

Do you see the problem? It might be better to use an overload, which make it clearer:

image

In the table listing, we had an “orders” table, but the script sent the data to the “Orders” table.

RavenDB is a case insensitive database, but in this case, what happened behind the scenes is that the code used a case sensitive dictionary to keep track of the tables that we are working with. That meant that what we did was roughly:

And basically, the changes_by_table dictionary had two tables in it. One from the table definitions and one from the script. When we validate that the tables from the script are fine, we do that using case insensitive comparison, so that passed properly.

To make it worst, the order of items in a dictionary is not predictable. If this was the iteration order other way around, everything would appear to be working just fine.

We fixed the bug, but I found it really interesting that three separate people (very experienced with the codebase) had a look and couldn’t figure out how this bug can happen. It wasn’t in the code, it was in what wasn’t there.

FUTURE POSTS

  1. RavenDB and Gen AI Security - 3 days from now
  2. RavenDB & Distributed Debugging - 6 days from now
  3. RavenDB & Ansible - 11 days from now

There are posts all the way to Jul 22, 2025

RECENT SERIES

  1. RavenDB 7.1 (7):
    11 Jul 2025 - The Gen AI release
  2. Production postmorterm (2):
    11 Jun 2025 - The rookie server's untimely promotion
  3. Webinar (7):
    05 Jun 2025 - Think inside the database
  4. Recording (16):
    29 May 2025 - RavenDB's Upcoming Optimizations Deep Dive
  5. RavenDB News (2):
    02 May 2025 - May 2025
View all series

Syndication

Main feed ... ...
Comments feed   ... ...
}