Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database


I asked why this code is broken, and now is the time to dig into this. The issue is in this block of code. Take a look at that for a moment, if you please:

The code is trying to gather the async uploads of all the files, and then it awaits them. This code compiles and runs successfully, but it will not do what you expect it to do. Let’s break it down a bit to understand what is going on:
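The original snippet isn’t reproduced here, but a minimal sketch of the pattern under discussion looks like the following. Note that the file names, the `UploadFileAsync` helper, and its body are my own stand-ins for illustration, not the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static int _uploadsFinished;

    // Stand-in for the real upload work (the name and body are assumptions)
    static async Task UploadFileAsync(string file)
    {
        await Task.Delay(500); // simulate the actual network work
        Interlocked.Increment(ref _uploadsFinished);
    }

    static async Task Main()
    {
        var files = new[] { "a.txt", "b.txt", "c.txt" };
        var tasks = new List<Task>();
        foreach (var file in files)
        {
            // BUG: with an async lambda, StartNew returns Task<Task>
            var task = Task.Factory.StartNew(async () => await UploadFileAsync(file));
            tasks.Add(task); // compiles, because Task<Task> is still a Task
        }

        await Task.WhenAll(tasks); // waits only for the lambdas to *start*
        Console.WriteLine($"uploads finished after WhenAll: {_uploadsFinished}");
    }
}
```

Running this prints that zero uploads have finished, even though we just awaited “all” the tasks.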

We are passing the variable task to our list of tasks. We just extracted a variable; nothing much is going on here. Let’s explore further: what is the type of task? We know it must be a subtype of Task, since that is what the tasks collection accepts. It turns out that it isn’t that simple:

What is that, Task<Task> thingie? Well, let’s look at the signature of Task.Factory.StartNew(), shall we?

public Task<TResult> StartNew<TResult>(Func<TResult> function);

Just from the signature, we can tell what is going on. StartNew() accepts a function that returns a value and will then return a task for the eventual result of that function. However, the function we are actually passing to StartNew() doesn’t produce a value. Except that it does…

Let’s explore that thing for a bit:

var func = async () => { };

What is the type of func in this case?

Func<Task> func = async () => { };

The idea is that when the compiler sees the async keyword, it transforms the function into one that returns a Task. Combining these two features means that our original code merely registers the start of the asynchronous process and returns as soon as it has started. Basically, we’ll only wait for the actual opening of the file, not for the network work that has to happen here.

The right way to express what we want here is:

Task.Run(async () => {});

The signature for this is:

public static Task Run(Func<Task> function);

You can see here that we get a function that returns a task, but we aren’t actually wrapping that in another Task instance. The task that will be returned will be completed once the full work has been completed.
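The difference can be demonstrated directly. This sketch (mine, not from the original post) contrasts the wrapped task from StartNew, the Unwrap() workaround, and Task.Run:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // StartNew with an async lambda returns Task<Task>: the outer task
        // completes when the lambda yields, not when the inner work finishes.
        Task<Task> wrapped = Task.Factory.StartNew(async () => await Task.Delay(500));
        await wrapped; // returns almost immediately
        Console.WriteLine($"inner done after awaiting outer: {wrapped.Result.IsCompleted}");

        // Unwrap() gives a task that tracks the inner work to completion...
        await wrapped.Unwrap();
        Console.WriteLine($"inner done after Unwrap: {wrapped.Result.IsCompleted}");

        // ...but Task.Run does the unwrapping for you:
        Task ran = Task.Run(async () => await Task.Delay(500));
        await ran; // completes only once the delay has actually elapsed
        Console.WriteLine($"Task.Run completed: {ran.IsCompleted}");
    }
}
```

Awaiting the outer task returns while the inner work is still in flight; only Unwrap() or Task.Run give you a task that represents the whole operation.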

It is an interesting pitfall in the API, and can be quite hard to figure out exactly what is going on. Mostly because there are several different things happening all at once.


We ran into a strange situation deep in the guts of RavenDB. A cluster command (the backbone of how RavenDB coordinates action in a distributed cluster) failed because of an allocation failure. That is something we are ready for, since RavenDB is a robust system that handles such memory allocation failures. The problem was that this was a persistent allocation failure. Looking at the actual error explained what was going on: we allocate memory in units that are powers of two, and we had an allocation request that would overflow a 32-bit integer.

Let me reiterate that: we had a single cluster command that would need more memory than can fit in a 32-bit integer. A cluster command isn’t strictly limited, but a 1MB cluster command is huge, as far as we are concerned. Seeing something that exceeds the GB mark was horrifying. The actual issue was somewhere completely different: a bug caused quadratic growth in the size of a database record. This post isn’t about that problem, though; it is about the fix.

We believe in defense in depth for such issues. So aside from fixing the actual cause of the problem, we asked how we can prevent similar issues in the future. We decided to place a reasonable size limit on cluster commands, and we chose 128MB as the limit (far higher than any expected value, mind). That value is big enough to be outside anyone’s actual usage, but at the same time small enough that we can increase it if we need to. That means it needs to be a configuration value, so the user can modify it in place if needed. The idea is that we’ll stop the generation of a command of this size before it hits the actual cluster and poisons it.

Which brings me to this piece of code, which was the reason for this blog post:

This is where we actually throw the error if we find a command that is too big (the check is done by the caller, which is not important here).

Looking at the code, it does what is needed, but it is missing a couple of really important features:

  • We mention the size of the command, but not the actual size limit.
  • We don’t mention that this isn’t a hard-coded limit.

The fix here would be to include both those details in the message. The idea is that the user will not only be informed about what the problem is, but also be made aware of how they can fix it themselves. No need to contact support (and if support is called, we can tell right away what is going on).
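A sketch of what such an error could look like after the fix. The class, method, and configuration key names here are my own assumptions for illustration, not RavenDB’s actual internals:

```csharp
using System;

class ClusterCommandValidator
{
    // Hypothetical configuration key, named for illustration only
    public const string ConfigKey = "Cluster.MaxCommandSizeInMb";

    private readonly long _maxSizeInBytes;

    public ClusterCommandValidator(long maxSizeInBytes) => _maxSizeInBytes = maxSizeInBytes;

    public void AssertCommandSize(long commandSizeInBytes)
    {
        if (commandSizeInBytes <= _maxSizeInBytes)
            return;

        // State the actual size, the configured limit, AND how to change it
        throw new InvalidOperationException(
            $"Cannot execute cluster command of size {commandSizeInBytes:N0} bytes, " +
            $"which exceeds the configured limit of {_maxSizeInBytes:N0} bytes. " +
            $"If this is intentional, increase the '{ConfigKey}' configuration value.");
    }
}

class Program
{
    static void Main()
    {
        var validator = new ClusterCommandValidator(maxSizeInBytes: 128L * 1024 * 1024);
        try
        {
            validator.AssertCommandSize(200L * 1024 * 1024); // a 200MB command
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

The point is the shape of the message: the offending size, the limit it violated, and the knob that changes the limit, all in one place.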

This idea, the notion that we should be quite explicit not only about what the problem is but also about how to fix it, is very important to the overall design of RavenDB. It allows us to produce software that is self-supporting: instead of “ErrorCode: 413”, you get not only the full details but also how you can fix it yourself.

Admittedly, I fully expect to never ever hear about this issue again in my lifetime. But in case I’m wrong, we’ll be in a much better position to respond to it.


RavenDB has the ability to analyze your queries and generate the appropriate indexes for you automatically. This isn’t a feature you need to enable or a toggle you need to switch; it is just the way it works by default. For more advanced scenarios, you have the ability to write your own indexes to process your data in all sorts of interesting ways. Indexes in RavenDB are used for aggregation (map-reduce), full text search, spatial queries, background computation and much more. This post isn’t going to talk about what you can do with RavenDB’s indexes, however. I’m going to discuss how you’ll manage them.

There are several ways to create indexes in RavenDB; the one we usually recommend is to create a class that inherits from AbstractIndexCreationTask. If you are using C# or TypeScript, you can create strongly typed indexes that will be checked by the compiler for you. If you are using other clients (or JS indexes), you will have the index definitions as constant strings inside a dedicated class. Once you have the indexes defined as part of your codebase, you can then create them using a single command: IndexCreation.CreateIndexes();
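As a sketch of what this looks like in practice (the Order entity and the index are hypothetical examples of mine, and this requires the RavenDB client package, so it is illustrative rather than runnable standalone):

```csharp
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Indexes;

// Hypothetical entity, for illustration
public class Order
{
    public string Id { get; set; }
    public string Company { get; set; }
}

// The index definition lives in your codebase, next to the code that queries it
public class Orders_ByCompany : AbstractIndexCreationTask<Order>
{
    public Orders_ByCompany()
    {
        Map = orders => from order in orders
                        select new { order.Company };
    }
}

// At startup or deployment time, create all indexes defined in the assembly:
// IndexCreation.CreateIndexes(typeof(Orders_ByCompany).Assembly, store);
```

Because the index is an ordinary class, it is versioned, reviewed, and compiled along with the rest of your code.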

What I described so far is the mechanics of working with indexes. You can read all about them in the documentation. I want to talk about the implications of this design approach:

  • Your indexes live in the same repository as your code. Whenever you check out a branch, the index definitions you’ll use will always match the code that queries them.
  • Your indexes are strongly typed and are checked by the compiler. I mentioned this earlier, but this is a huge advantage, worth mentioning twice.
  • You can track changes on your indexes using traditional source control tools. That makes reviewing index changes just a standard part of the job, instead of something you need to do in addition.

RavenDB has a lot of features around index management: side-by-side index deployment, rolling indexes, etc. The question now is: when do you deploy those indexes?

During development, it’s standard to deploy your indexes whenever the application starts. This way, you can change your indexes, hit F5, and immediately be working with the latest index definition without having to take any other action.

For production, however, we don’t recommend this approach. For example, two versions of the application using different index definitions would “fight” to apply the “right” version of the index, causing the index to bounce between versions. RavenDB has features such as index locking, but those are there to save you from a fall, not for day-to-day use.

You should have a dedicated endpoint / tool that you can invoke that would deploy your indexes from your code to your RavenDB instances. The question is, what should that look like? Before I answer this question, I want to discuss another aspect of indexing in RavenDB: automatic indexing.

So far, we discussed static indexes, ones that you define in your code manually. But RavenDB also allows you to run queries without specifying which index they will use. At this point, the query optimizer will generate the right indexes for your needs. This is an excellent feature, but how does that play in production?

If you deploy a new version of your application, it will likely have new ways of querying the database. If you just push that to production blindly, RavenDB will adjust quickly enough, but it will still need to learn all the new ways you query your data. That can take some time, and will likely cause a higher load on the system. Instead of doing all the learning and adjusting in production, there are better ways to do so.

Run the new version of your system on a QA / UAT instance and put it through its paces. The QA instance will have the newest static indexes, and RavenDB will learn what sort of queries you are issuing and what indexes it needs to run. Once you have completed this work, you can export the indexes from the QA instance and import them into production. Let the new indexes run and process all their data, and then push the new version of your application out. The production database is already aware of the new behavior and has adjusted to it.

As a final note, RavenDB index deployment is idempotent. That means that you can deploy the same set of indexes twice, but it will not cause us to re-index. That reduces the operational overhead that you have to worry about.
