Ayende @ Rahien

Mar 09 2022

Optimizing code by understanding usage

time to read 1 min | 175 words

Tags:

During code review, I ran into the following code (simplified):

We have some wrapper object for IEnumerable and allow to access it by index.

I don’t like this code, and suggested this version, instead:

The problem is that if you’ll look at the code of ElementAt(), it is already doing that, so why the duplicate work? It is specialized to make things fast for similar scenarios, why do I write the same code again?

Because I’m familiar with the usage scenario. A really common usage pattern for the wrapper object is something like this:

The difference between what ElementAt() does and what I do in the second version of the code is that my version will materialize the values. If you are calling that in a for loop, you’ll pay the cost of iterating over all the items once.

On the other hand, since the instance we pass to ElementAt() isn’t one that has an indexer, we’ll have to scan through the enumerator multiple times. A for loop with this implementation is a quadratic accident waiting to happen.

Mar 08 2022

Negative Space Scheduling

time to read 5 min | 828 words

Tweet Share Share 14 comments

Tags:

My first project as a professional software developer was to build a scheduling system for a dental clinics chain. That was a huge project (multiple years) and was really quite interesting. Looking back, I have done a lot of really cool technical things there. I also learned quite a lot from that project. The danger of complexity being one of the chief issues.

Consider a dental clinic, where we have the following schedule for a dentist:

Monday – 10:00 – 15:00
Wednesday – 09:00 – 13:00
Thursday – 11:30 – 16:30

The task is to be able to schedule an appointment for the dentist given those rules.

In addition to the schedule of the dentist, we also have actual Appointments, those looks like this:

Assume that you have quite a few of those, and you want to schedule a new appointment for a patient. How would you do that? I’m a database guy, let’s see if I can specify the task as a query?

We need a dentist that has availability of a particular length (different tasks have different schedules) and particular qualifications. However, there is no such thing as availability in our model. We have just:

Dentist
Schedule
Appointment

The complexity here is that we need to search for something that isn’t there.

I actually found some of my posts on this topic, from 2006. That isn’t a simple problem. And the solution is usually to generate the missing data and query on that. My old posts on the topic actually generate an in memory table and operate on that, which is great for small datasets, but will fail in interesting ways for real world datasets.

For what it’s worth, RavenDB allows you to generate the missing data during the indexing process, so at least the queries are fast, but the indexing process is now compute-intensive and a change in the dentist schedule can result in a lot of work.

All of that is because of two issues:

We are trying to query for the data that isn’t there.
The information is never used as queried.

These two points are strongly related to one another. Consider how you would schedule a dentist appointment. You first need to find the rough time frame that you need (“come back in six months”) and then you need to match it to your schedule (“I can’t on Monday, I got the kids”, etc).

There is a better way to handle that, by filling in the missing pieces. Instead of trying to compute the schedule of a dentist from the specification that we have, go the other way around. Generate the schedule based on the template you have. The result should be something like this:

In other words, based on the schedule provided, we’ll generate an entry per day for the dentist. That entry will contain the appointments for the day as well as the maximum duration for an available appointment. That means that on query time, we can do something as simple as:

from Schedules
where Dentist = $dentistId
and At between $start and $end
and MaximumDuration >= $reqDuration

And that gives us the relevant times that we can schedule the appointment. This is cheap to do, easy to work and it actually matches the common ways that users will use the system.

This has a bunch of other advantages, that are not immediately apparent but end up being quite important. Working with time sucks. The schedule above is a nice guideline, but it isn’t a really useful one when you need to run actual computations. Why is that? Well, it doesn’t account for vacations days. If there is a public holiday on Wednesday, the dentist isn’t working, but that is an implied assumption in the schedule.

For that matter, you now need to figure out which calendar to use. A Christian and a Jewish dentist are going to have different holiday calendars. Trying to push that into a query is going to be quite annoying, if not impossibly so. Putting that on the generator simplifies things, because you can “unroll” the schedule, apply the holiday calendar you want and then not think about it.

Other factors, such as vacation days, reserved time for emergencies and similar issues make it a lot easier to manage in a concrete form. Another important aspect is that the schedule changes, for any decent size clinic, the schedule changes all the time. You may have the dentist requesting to close a couple of hours early on Monday because of a dance recital and add more hours on Thursday. If the schedule is generated, this is a simple matter to do (manual adjusting). If we have just the schedule template, on the other hand… that becomes a lot more complex.

In short, the best way to handle this is to take the template schedule, generate it to a concrete schedule and operate from that point on.

Mar 07 2022

WebinarDatabase Security in a Hostile World

time to read 1 min | 33 words

Tweet Share Share 1 comments

Tags:

On March 9, I’m going to be talking about Database Security in a Hostile World. Please join me, I promise it will be interesting.

Mar 02 2022

RecordingRavenDB–a really boring database

time to read 1 min | 20 words

Tweet Share Share 2 comments

Tags:

Dejan is talking about RavenDB here:

Mar 01 2022

Using the .NET JIT to reduce abstraction overhead

time to read 2 min | 349 words

Tweet Share Share 9 comments

Tags:

I ran into this recently and I thought that this technique would make a great post. We are using that extensively inside of RavenDB to reduce the overhead of abstractions while not limiting our capabilities. It is probably best that I’ll start with an example. We have a need to perform some action, which needs to be specialized by the caller.

For example, let’s imagine that we want to aggregate the result of calling multiple services for a certain task. Consider the following code:

As you can see, the code above sends a single request to multiple locations and aggregates the results. The point is that we can separate the creation of the request (and all that this entails) from the actual logic for aggregating the results.

Here is a typical usage for this sort of code:

You can notice that the code is fairly simple, and uses lambdas for injecting the specialized behavior into the process.

That leads to a bunch of problems:

Delegate / lambda invocation is more expensive.
Lambdas need to be allocated.
They capture state (and may capture more and for a lot longer than you would expect).

In short, when I look at this, I see performance issues down the road. But it turns out that I can write very similar code, without any of those issues, like this:

Here, instead of passing lambdas, we pass an interface. That has the same exact cost as lambda, in fact. However, in this case we also specify that this interface must be implemented by a struct (value type). That leads to really interesting behavior, since at JIT time, the system knows that there is no abstraction here, it can do optimizations such as inlining or calling the method directly (with no abstraction overhead). It also means that any state that we capture is done so explicitly (and we won’t be tainted by other lambdas in the method).

We still have good a separation between the process we run and the way we specialize that, but without any runtime overhead on this. The code itself is a bit more verbose, but not too onerous.

Oren Eini

Oren Eini

CEO of RavenDB

Optimizing code by understanding usage

Negative Space Scheduling

WebinarDatabase Security in a Hostile World

RecordingRavenDB–a really boring database

Using the .NET JIT to reduce abstraction overhead

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed