Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 1 min | 175 words

During code review, I ran into the following code (simplified):

We have a wrapper object around an IEnumerable that allows access to its items by index.

I don’t like this code, and suggested this version, instead:

The problem is that if you look at the code of ElementAt(), it is already doing that, so why duplicate the work? It is specialized to make things fast for similar scenarios, so why would I write the same code again?

Because I’m familiar with the usage scenario. A really common usage pattern for the wrapper object is something like this:

The difference between what ElementAt() does and what I do in the second version of the code is that my version will materialize the values. If you are calling that in a for loop, you’ll pay the cost of iterating over all the items once.

On the other hand, since the instance we pass to ElementAt() isn’t one that has an indexer, we’ll have to scan through the enumerator multiple times. A for loop with this implementation is a quadratic accident waiting to happen.
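The listings from the review are not reproduced in this copy, so here is a minimal sketch of the two shapes being compared. The names (`LazyWrapper`, `MaterializingWrapper`, `Demo.Source`) are hypothetical and not taken from the reviewed code. The source counts how many items are pulled from it, to make the difference visible.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

const int Count = 100;

Demo.Pulled = 0;
var lazy = new LazyWrapper<int>(Demo.Source(Count));
for (int i = 0; i < Count; i++) _ = lazy[i];
Console.WriteLine($"ElementAt() version pulled {Demo.Pulled} items");

Demo.Pulled = 0;
var materialized = new MaterializingWrapper<int>(Demo.Source(Count));
for (int i = 0; i < Count; i++) _ = materialized[i];
Console.WriteLine($"Materializing version pulled {Demo.Pulled} items");

// Hypothetical lazy source that counts every item it hands out.
static class Demo
{
    public static int Pulled;

    public static IEnumerable<int> Source(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }
}

// First shape: defer to ElementAt() on every access.
// The source has no indexer, so each call restarts the enumeration.
sealed class LazyWrapper<T>
{
    private readonly IEnumerable<T> _items;
    public LazyWrapper(IEnumerable<T> items) => _items = items;
    public T this[int index] => _items.ElementAt(index);
}

// Second shape: materialize on first access, then index directly.
sealed class MaterializingWrapper<T>
{
    private readonly IEnumerable<T> _items;
    private IList<T>? _materialized;
    public MaterializingWrapper(IEnumerable<T> items) => _items = items;

    public T this[int index]
    {
        get
        {
            // Reuse the source if it is already a list, otherwise copy it once.
            _materialized ??= (_items as IList<T>) ?? _items.ToList();
            return _materialized[index];
        }
    }
}
```

With 100 items, the first loop pulls 1 + 2 + … + 100 = 5,050 items from the source, while the second pulls 100. The gap grows quadratically with the number of items.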

time to read 5 min | 828 words

My first project as a professional software developer was to build a scheduling system for a chain of dental clinics. That was a huge project (multiple years) and was really quite interesting. Looking back, I did a lot of really cool technical things there. I also learned quite a lot from that project, with the danger of complexity being one of the chief lessons.

Consider a dental clinic, where we have the following schedule for a dentist:

  • Monday – 10:00 – 15:00
  • Wednesday – 09:00 – 13:00
  • Thursday – 11:30 – 16:30

The task is to be able to schedule an appointment for the dentist given those rules.

In addition to the schedule of the dentist, we also have actual Appointments, which look like this:

Assume that you have quite a few of those, and you want to schedule a new appointment for a patient. How would you do that? I’m a database guy, so let’s see if I can specify the task as a query.

We need a dentist that has availability of a particular length (different tasks have different schedules) and particular qualifications. However, there is no such thing as availability in our model. We have just:

  • Dentist
  • Schedule
  • Appointment

The complexity here is that we need to search for something that isn’t there.

I actually found some of my posts on this topic, from 2006. That isn’t a simple problem, and the solution is usually to generate the missing data and query on that. My old posts on the topic actually generate an in-memory table and operate on that, which is great for small datasets, but will fail in interesting ways for real-world datasets.

For what it’s worth, RavenDB allows you to generate the missing data during the indexing process, so at least the queries are fast, but the indexing process is now compute-intensive and a change in the dentist schedule can result in a lot of work.

All of that is because of two issues:

  • We are trying to query for the data that isn’t there.
  • The information is never used as queried.

These two points are strongly related to one another. Consider how you would schedule a dentist appointment. You first need to find the rough time frame that you need (“come back in six months”) and then you need to match it to your schedule (“I can’t on Monday, I got the kids”, etc).

There is a better way to handle that, by filling in the missing pieces. Instead of trying to compute the schedule of a dentist from the specification that we have, go the other way around. Generate the schedule based on the template you have. The result should be something like this:

In other words, based on the schedule provided, we’ll generate an entry per day for the dentist. That entry will contain the appointments for the day as well as the maximum duration for an available appointment. That means that on query time, we can do something as simple as:

from Schedules
where Dentist = $dentistId
and     At between $start and $end
and     MaximumDuration >= $reqDuration

And that gives us the relevant times at which we can schedule the appointment. This is cheap to do, easy to work with, and it actually matches the common ways that users will use the system.
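As a rough sketch of the generation step, the code below unrolls the weekly template from the start of the post into one entry per working day and computes the longest free slot for each. The type and member names (`DaySchedule`, `ScheduleGenerator`, `LargestGap`), the dentist id, and the dates are assumptions made for illustration. The generated document in the original post may have a different shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The weekly template from the post.
var template = new Dictionary<DayOfWeek, (TimeSpan Start, TimeSpan End)>
{
    [DayOfWeek.Monday] = (new TimeSpan(10, 0, 0), new TimeSpan(15, 0, 0)),
    [DayOfWeek.Wednesday] = (new TimeSpan(9, 0, 0), new TimeSpan(13, 0, 0)),
    [DayOfWeek.Thursday] = (new TimeSpan(11, 30, 0), new TimeSpan(16, 30, 0)),
};

// Hypothetical existing booking: Monday 7 Jul 2025, 11:00 - 12:00.
var appointments = new[]
{
    new Appointment(new DateTime(2025, 7, 7, 11, 0, 0), new DateTime(2025, 7, 7, 12, 0, 0)),
};

var schedules = ScheduleGenerator.Generate(
    "dentists/1", template,
    new DateTime(2025, 7, 7), new DateTime(2025, 7, 13), appointments).ToList();

// In-memory equivalent of the query above: days with a free slot of 4 hours or more.
foreach (var s in schedules.Where(s => s.MaximumDuration >= TimeSpan.FromHours(4)))
    Console.WriteLine($"{s.At:yyyy-MM-dd} longest free slot: {s.MaximumDuration}");

record Appointment(DateTime Start, DateTime End);

record DaySchedule(string Dentist, DateTime At, DateTime Start, DateTime End,
    List<Appointment> Appointments, TimeSpan MaximumDuration);

static class ScheduleGenerator
{
    public static IEnumerable<DaySchedule> Generate(
        string dentist,
        IReadOnlyDictionary<DayOfWeek, (TimeSpan Start, TimeSpan End)> template,
        DateTime from, DateTime to,
        IEnumerable<Appointment> appointments)
    {
        var byDay = appointments.ToLookup(a => a.Start.Date);
        for (var day = from.Date; day <= to.Date; day = day.AddDays(1))
        {
            // Days missing from the template are not working days.
            if (!template.TryGetValue(day.DayOfWeek, out var hours)) continue;

            var start = day + hours.Start;
            var end = day + hours.End;
            var booked = byDay[day].OrderBy(a => a.Start).ToList();
            yield return new DaySchedule(dentist, day, start, end, booked,
                LargestGap(start, end, booked));
        }
    }

    // Longest uninterrupted free interval between start and end.
    // Expects the bookings to be sorted by start time.
    public static TimeSpan LargestGap(DateTime start, DateTime end, List<Appointment> booked)
    {
        var largest = TimeSpan.Zero;
        var cursor = start;
        foreach (var a in booked)
        {
            if (a.Start - cursor > largest) largest = a.Start - cursor;
            if (a.End > cursor) cursor = a.End;
        }
        if (end - cursor > largest) largest = end - cursor;
        return largest;
    }
}
```

For the sample week, Monday ends up with a three-hour maximum (12:00 to 15:00), so only Wednesday and Thursday match a four-hour request. Booking an appointment then means updating that one day's entry and recomputing its `MaximumDuration`.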

This has a bunch of other advantages that are not immediately apparent but end up being quite important. Working with time sucks. The schedule above is a nice guideline, but it isn’t a really useful one when you need to run actual computations. Why is that? Well, it doesn’t account for vacation days. If there is a public holiday on Wednesday, the dentist isn’t working, but that is an implied assumption in the schedule.

For that matter, you now need to figure out which calendar to use. A Christian and a Jewish dentist are going to have different holiday calendars. Trying to push that into a query is going to be quite annoying, if not impossibly so. Putting that on the generator simplifies things, because you can “unroll” the schedule, apply the holiday calendar you want and then not think about it.

Other factors, such as vacation days, reserved time for emergencies and similar issues, are a lot easier to manage in a concrete form. Another important aspect is that the schedule changes. For any decent-sized clinic, the schedule changes all the time. You may have the dentist requesting to close a couple of hours early on Monday because of a dance recital and add more hours on Thursday. If the schedule is generated, this is a simple matter to do (a manual adjustment). If we have just the schedule template, on the other hand… that becomes a lot more complex.
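To illustrate why the concrete form is easier to work with, here is a small sketch that unrolls the template while skipping a holiday calendar, then applies the manual adjustments as plain edits to individual days. The names (`WorkDay`, `Unroller`) are hypothetical, and the holiday date is invented for the example.

```csharp
using System;
using System.Collections.Generic;

var template = new Dictionary<DayOfWeek, (TimeSpan Start, TimeSpan End)>
{
    [DayOfWeek.Monday] = (new TimeSpan(10, 0, 0), new TimeSpan(15, 0, 0)),
    [DayOfWeek.Wednesday] = (new TimeSpan(9, 0, 0), new TimeSpan(13, 0, 0)),
    [DayOfWeek.Thursday] = (new TimeSpan(11, 30, 0), new TimeSpan(16, 30, 0)),
};

// Invented calendar: pretend Wednesday, 9 Jul 2025 is a public holiday.
var holidays = new HashSet<DateTime> { new DateTime(2025, 7, 9) };

var days = Unroller.Unroll(template,
    new DateTime(2025, 7, 7), new DateTime(2025, 7, 13), holidays);

// Manual adjustments touch only the affected days, not the template.
days[0] = days[0] with { End = new TimeSpan(13, 0, 0) };  // Monday: close two hours early
days[1] = days[1] with { End = new TimeSpan(17, 30, 0) }; // Thursday: one extra hour

foreach (var d in days)
    Console.WriteLine($"{d.Date:yyyy-MM-dd} {d.Start} - {d.End}");

record WorkDay(DateTime Date, TimeSpan Start, TimeSpan End);

static class Unroller
{
    public static List<WorkDay> Unroll(
        IReadOnlyDictionary<DayOfWeek, (TimeSpan Start, TimeSpan End)> template,
        DateTime from, DateTime to, ISet<DateTime> holidays)
    {
        var result = new List<WorkDay>();
        for (var day = from.Date; day <= to.Date; day = day.AddDays(1))
        {
            // The calendar is applied once, at generation time.
            if (holidays.Contains(day)) continue;
            if (template.TryGetValue(day.DayOfWeek, out var hours))
                result.Add(new WorkDay(day, hours.Start, hours.End));
        }
        return result;
    }
}
```

Swapping in a different holiday calendar is a matter of passing a different set to the generator. Nothing on the query side needs to know about it.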

In short, the best way to handle this is to take the template schedule, generate it to a concrete schedule and operate from that point on.

time to read 2 min | 349 words

I ran into this recently and I thought that this technique would make a great post. We are using it extensively inside of RavenDB to reduce the overhead of abstractions while not limiting our capabilities. It is probably best to start with an example. We have a need to perform some action, which needs to be specialized by the caller.

For example, let’s imagine that we want to aggregate the result of calling multiple services for a certain task. Consider the following code:

As you can see, the code above sends a single request to multiple locations and aggregates the results. The point is that we can separate the creation of the request (and all that this entails) from the actual logic for aggregating the results.

Here is a typical usage for this sort of code:

You can notice that the code is fairly simple, and uses lambdas for injecting the specialized behavior into the process.

That leads to a bunch of problems:

  • Delegate / lambda invocation is more expensive.
  • Lambdas need to be allocated.
  • They capture state (and may capture more and for a lot longer than you would expect).
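The listings for this post are not included in this copy. As a rough sketch of the lambda-based shape being described, under assumed names (`Aggregator.Aggregate`, the server names, and a string length standing in for a remote call), it might look like this:

```csharp
using System;
using System.Collections.Generic;

var servers = new[] { "node-a", "node-b", "node-c" };
string prefix = "count:";

int total = Aggregator.Aggregate<string, int, int>(
    servers,
    server => prefix + server,          // captures 'prefix', so a closure object is allocated
    request => request.Length,          // stand-in for actually calling the service
    (sum, response) => sum + response,  // aggregation logic
    0);

Console.WriteLine(total);

static class Aggregator
{
    // Generic process: create a request per server, send it, fold the responses.
    public static TResult Aggregate<TRequest, TResponse, TResult>(
        IReadOnlyList<string> servers,
        Func<string, TRequest> createRequest,
        Func<TRequest, TResponse> send,
        Func<TResult, TResponse, TResult> merge,
        TResult seed)
    {
        var result = seed;
        foreach (var server in servers)
        {
            var request = createRequest(server);   // delegate invocation
            var response = send(request);          // delegate invocation
            result = merge(result, response);      // delegate invocation
        }
        return result;
    }
}
```

Every call in the loop body goes through a delegate, and the first lambda captures `prefix`, which is the kind of implicit state the list above warns about.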

In short, when I look at this, I see performance issues down the road. But it turns out that I can write very similar code, without any of those issues, like this:

Here, instead of passing lambdas, we pass an interface. By itself, that has the same exact cost as a lambda, in fact. However, in this case we also specify that this interface must be implemented by a struct (value type). That leads to really interesting behavior: at JIT time, the system knows that there is no abstraction here, so it can do optimizations such as inlining or calling the method directly (with no abstraction overhead). It also means that any state that we capture is captured explicitly (and we won’t be tainted by other lambdas in the method).
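A minimal sketch of that shape, again with hypothetical names (`IAggregation`, `CountLengths`, `Aggregator.Aggregate`) and a string length standing in for a remote call:

```csharp
using System;
using System.Collections.Generic;

var servers = new[] { "node-a", "node-b", "node-c" };

// The type arguments are spelled out because C# cannot infer them from the constraint.
int total = Aggregator.Aggregate<CountLengths, string, int, int>(
    servers, new CountLengths("count:"), 0);

Console.WriteLine(total);

interface IAggregation<TRequest, TResponse, TResult>
{
    TRequest CreateRequest(string server);
    TResponse Send(TRequest request);
    TResult Merge(TResult current, TResponse response);
}

static class Aggregator
{
    // 'struct' constraint: the JIT compiles a separate body for each TBehavior,
    // so the calls below are direct calls that are eligible for inlining.
    public static TResult Aggregate<TBehavior, TRequest, TResponse, TResult>(
        IReadOnlyList<string> servers, TBehavior behavior, TResult seed)
        where TBehavior : struct, IAggregation<TRequest, TResponse, TResult>
    {
        var result = seed;
        foreach (var server in servers)
        {
            var request = behavior.CreateRequest(server);
            result = behavior.Merge(result, behavior.Send(request));
        }
        return result;
    }
}

// All captured state is an explicit field; nothing else is kept alive.
readonly struct CountLengths : IAggregation<string, int, int>
{
    private readonly string _prefix;
    public CountLengths(string prefix) => _prefix = prefix;

    public string CreateRequest(string server) => _prefix + server;
    public int Send(string request) => request.Length; // stand-in for the remote call
    public int Merge(int current, int response) => current + response;
}
```

The struct is passed by value and never boxed, so no delegate or closure allocation is involved in the call. The price is the extra type and the explicit generic arguments at the call site.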

We still have a good separation between the process we run and the way we specialize it, but without any runtime overhead. The code itself is a bit more verbose, but not too onerous.
