Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database


time to read 14 min | 2727 words

Fungible is a funny word, mostly because you probably know the term from NFTs (non-fungible tokens) and other similar scams. At its core, it is the idea that for certain things, the specific instance doesn’t matter, just the amount.

The classic example is that if I lend you a $50 bill and you give me back two $20 bills and a $10 bill, you’ve still given me back my money, even though you very clearly didn’t. I didn’t get the same physical $50 bill back; I got bills for the same amount. On the other hand, if I give you my dog for the weekend, I would be quite upset if I got back three different dogs, even if the total weight is the same.

This is actually a lot more than I want to know about fungibility, to be honest. But it turns out that if you are running a cloud business or just use the cloud in general, you have to be well-versed in the matter. Because in the cloud, money isn’t fungible. In fact, it doesn’t behave a lot like money at all.

Let’s assume that we are a cloud company called cloud.example.com, offering VPS instances to our users. You are in charge of writing the billing code, and it is pretty simple, right? Here is some code that can compute the charges:


function compute_charges(custId, start, end) {
  let total = 0;
  // instances owned by this customer that overlap the billing period
  let predicate = instance =>
    (instance.custId === custId  && instance.started < end) &&
    (instance.ended > start      || instance.ended == null);

  for (let instance of query_instances(predicate)) {
    total += instance.hours_running(start, end) *
             instance.price_per_hour;
  }

  return total;
}

As you can see, there isn’t much there. We find all the instances that were running in the billing period and then calculate the total hours they ran during that period. Please note, this is a simplified model as we aren’t dealing with stopping & starting instances, etc.

The output of the compute_charges() function is a number, which will presumably be handed over to be charged to a credit card. There are other things that we need to do as well (generate an invoice, produce a usage report, etc.), but I want to focus on the money issue here.

The simplest model is that at the end of the billing period, we charge the customer (using a credit card, for example) and receive our payment. Everyone is happy and we can go home, hopefully richer.

The challenge arises when we want to offer additional options to the customer. For example, we may be willing to give the customer a discount if they are going to commit to a minimum amount of money they’ll spend each month. We may want to offer them upfront payment options or give monetary incentives to a particular aspect of the business (run on ARM instances instead of X64, for example).

Each time that we make such an offer, we are going to be turning around and (significantly) complicating the way we bill the customer. Let’s talk about something as simple as committing to run an instance for a whole year. No upfront payment, just a commitment to pay for a particular server for a year. In AWS or Azure, that would be Reserved Instances, so you are likely very familiar with the idea.

How is that going to be expressed in code? Probably something like this:


function compute_charges(custId, start, end) {
    let total = 0;
    let predicate = instance => /*..redacted.*/;

    // track how many hours were actually used per instance type
    let hrsPerIns = {};
    for (let i of this.instances(predicate)) {
        let hours = i.hours_running(start, end);
        hrsPerIns[i.type] = hours + (hrsPerIns[i.type] || 0);
        total += hours * i.price_per_hour;
    }

    // charge for any committed hours that went unused
    for (let c of this.commitmentsFor(custId, start, end)) {
        let hours = c.committed_time(start, end);
        let hoursUsed = hrsPerIns[c.type] || 0;
        let unusedCommittedHours = Math.max(0, hours - hoursUsed);
        total += unusedCommittedHours *
                 this.instance(c.type).price_per_hour;
    }

    return total;
}

To be clear, the code above is not a good way to handle such a task, but it does show in a pretty succinct way the hidden complexities. In this case, if you didn’t meet your commitment, we’ll charge you for the unused commitment as well.

A more complex system would have to account for discounted rates while consuming the committed values, for example, and in that case, decide the priority in which such rates are applied across different matching commitments.

Other aspects may be giving the user a discount for a particular level of usage. So the first 100GB are priced differently from the rest, applying a free tier and… you get the point, I think. It gets complex.
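To make that concrete, here is a minimal sketch of what even a “simple” tiered storage price can look like. This is not the actual billing code; the tier boundaries and rates are invented for illustration:

using System;

public static class TieredStoragePricing
{
    // Hypothetical tiers: first 10 GB free, the next 90 GB at a discounted rate,
    // and everything above 100 GB at the full pay-as-you-go rate.
    public static decimal Charge(decimal usedGB)
    {
        decimal total = 0m;

        decimal remaining = Math.Max(0m, usedGB - 10m);   // free tier

        decimal discounted = Math.Min(remaining, 90m);    // discounted tier
        total += discounted * 0.08m;
        remaining -= discounted;

        total += remaining * 0.10m;                       // full rate for the rest
        return total;
    }
}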

Note that at this point, we aren’t even talking about money yet, we are discussing computing the charges. The situation is more interesting when we move to the next stage. On the face of it, this seems pretty simple, all you need to do is charge the credit card, no?

Okay, maybe you need to send an invoice, but that is about it, right?

Well… what happens if the customer made an upfront payment for one of those commitments? Or just accidentally paid twice last month and now has credit on your system.

I’m going to leave aside the whole complexity around payments bouncing (which is a whole other interesting topic) and how to deal with the actual charging. Right now I want to focus on the nature of money itself.

Imagine you have a commitment with a customer for an 8-core / 64 GB VPS server for a whole year. And they paid upfront, getting a nice discount along the way. How would you record that in your system?

The easiest is to create the notion of credit for the user, which you deduct whenever you need to charge them. So we’ll first compute the charges, then deduct the existing credits, and debit the customer if anything remains. This is simple, easy to work with, and wrong.

Remember that discount the user received? They paid for that particular VPS type, and if you now need to charge them for anything else (such as storage charges), that money cannot come from the funds paid for the VPS.

In other words, the money the customer paid is not fungible. It isn’t applicable for any charge, it is colored. It is dedicated to a particular purpose. And managing that turns out to be pretty complex. Mostly because we are trying to fit everything into the debits and credits on the account.

A better model is to avoid using money, in the same way that if you mix inches and centimeters you’ll eventually end up in a bad place on Mars. The solution is to treat each individual charge as its own “currency”.

In other words, when computing the charges, we aren’t trying to find the cost of running a particular instance for the billing period. We are trying to find how many “cost units” we have for that time period.

Instead of getting a single number that we’ll charge the customer, we’ll obtain a detailed set of the charges in question. Not as money, but as cost units. Think about those in a similar way to currency. Note that all the units below are multiples of 730 hours (the average number of hours per month).


compute_charges(custId, start, end) => {
    custId: 'customers/3291-B',
    start: '2024-01-01', end: '2024-01-31',
    costs: [
      {type: '8Cores-64GB-hours',  qty: 2190},
      {type: '4Cores-32GB-hours',  qty:  730},
      {type: 'disk-5000-iops',     qty: 2920},
    ],
}

The next step after that is to get your allocated budget for the same billing period, which will look something like this:


compute_budget(custId, start, end) => {
    custId: 'customers/3291-B',
    start: '2024-01-01', end: '2024-01-31',
    commitments: [
      {type: '8Cores-64GB-hours',  qty: 2190},
      {type: '4Cores-32GB-hours',  qty: 1460},
      {type: 'disk-5000-iops',     qty: 730},
    ],
}

In other words, just as we compute the charges based on the actual usage for that billing period, we apply the same approach on the commitments we have. The next stage is to just add all of those together. In this case, we’ll end up with the following:

  • 8Cores-64GB-hours ⇒ 0 (we used as much as we committed to)
  • 4Cores-32GB-hours ⇒ -730 (we committed to more than we used)
  • Disk-5000-iops ⇒ 2190 (remaining use after applying commitment, priced as you go)
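Mechanically, that netting step is simple. Here is a minimal sketch (the types and helpers are assumptions, not the actual billing code); running it on the numbers above yields exactly the three results listed:

using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a single cost or commitment line
record CostLine(string Type, long Qty);

static class Netting
{
    // Positive result: usage beyond the commitment (pay as you go).
    // Negative result: committed hours that went unused (still billed, at the commitment rate).
    public static Dictionary<string, long> Net(
        IEnumerable<CostLine> costs, IEnumerable<CostLine> commitments)
    {
        var net = costs.ToDictionary(c => c.Type, c => c.Qty);
        foreach (var c in commitments)
            net[c.Type] = net.GetValueOrDefault(c.Type) - c.Qty;
        return net;
    }
}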

We aren’t done yet; after commitments, there are other plans that we may need to apply. For example, we may provide some global discounts for VM rental (which don’t apply to disks, however). Working at the level of cost units (or colors, or currencies, whatever term you like) allows us to apply those things in a very fine-grained manner. More importantly, the end result and all its intermediate steps are very clear. That is quite important when you look at a six-figure bill with hundreds of line items and want to see whether the billing matches your contract or not.

As you can imagine, given the inherent complexity of the system, being able to clearly “show your work” is quite important. Especially when there is a misunderstanding or questions are being raised (and there will be).

What we have done so far is compute the actual charges by type, but we still need to convert that to real money. There are several steps in this process:

  1. We need to charge all the active commitments. Those may have been pre-paid (in which case there is no current charge), but they may have a (fixed) monthly cost that we need to add to the current invoice.
  2. We need to perform a “currency conversion” between the units we have and actual money. In the example above, we have a negative number of units (for 4Cores-32GB-hours), as we committed to more hours than we actually used. We are still being charged for this by applying the rate from the commitment.
  3. On the other hand, when we examine the disk costs, we used more than we committed to. Here we need to make a decision about what price we’ll charge the user. It can be the commitment price or the pay-as-you-go price. So even for the same currency we may have different rules.
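A rough sketch of that conversion step (the method shape and the rate-selection rules are hypothetical and deliberately simplified):

using System;
using System.Collections.Generic;

static class UnitConversion
{
    // Hypothetical conversion from net cost units to money; which rate applies
    // to overages is a business decision, simplified here to pay-as-you-go.
    public static decimal ToMoney(IReadOnlyDictionary<string, long> netUnits,
                                  Func<string, decimal> commitmentRate,
                                  Func<string, decimal> payAsYouGoRate)
    {
        decimal total = 0m;
        foreach (var (type, qty) in netUnits)
        {
            if (qty < 0)
                total += -qty * commitmentRate(type);  // unused commitment, still billed
            else
                total += qty * payAsYouGoRate(type);   // overage, priced pay-as-you-go here
        }
        return total;
    }
}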

After all of this is done, we are now left with a final number. The actual amount of money that we need to charge the customer. This is the point at which we check if the customer has any credit already paid in the system or if we need to make an actual charge. That aspect is complicated by whether you are charging a credit card (same for any other automatic billing option) or issuing an invoice to be paid manually.

For a manual invoice, you now have a whole other process. For example, you may offer discounts for the customer if they pay within 14 days versus the usual 30, or charge a fee for paying within 60 days, etc.

I’m not touching on collections or what to do when you fail to charge the customer. It is shockingly common to encounter payment failures. To the point where we never had a single payment run that didn’t include at least several such cases. The reasons range from deal size too big to (temporary) lack of funds to suspicious-seeming activity. You need to be able to handle that as well. But those are topics for another post.

In this post, my aim was to discuss just the issue of the complexity of money in the cloud business. I find the model of treating the charges as separate “currencies” to be a nice one overall, but I would love to hear about other people’s experiences in this matter.

time to read 4 min | 672 words

A not insignificant part of my job is to go over code. Today I want to discuss how we approach code reviews at RavenDB, not from a process perspective but from an operational one. I have been a developer for nearly 25 years now, and I’ve come to realize that when I’m doing a code review I’m actually looking at the code from three separate perspectives.

The first, and most obvious one, is when I’m actually looking for problems in the code - ensuring that I can understand what is going on, confirming the flow makes sense, etc. This involves looking at the code as it is right now.

I’m going to be showing snippets of code reviews here. You are not actually expected to follow the code, only the concepts that we talk about here.

Here is a classic code review comment:

There is some duplicated code that we need to manage. Another comment that I liked is this one, pointing out a potential optimization in the code:

If we define the code using the static keyword, we’ll avoid delegate allocation and save some memory, yay!
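For context, this refers to C#’s static lambdas. A small illustrative example (not the code under review) of the difference:

using System.Collections.Generic;
using System.Linq;

static class LambdaAllocation
{
    // This lambda captures the local 'min', so a closure (and a fresh delegate)
    // is allocated on every call.
    public static IEnumerable<int> AboveMin(IEnumerable<int> items, int min)
    {
        return items.Where(x => x > min);
    }

    // A 'static' lambda is guaranteed not to capture anything, which lets the
    // compiler cache a single delegate instance and reuse it across calls.
    public static IEnumerable<int> Positive(IEnumerable<int> items)
    {
        return items.Where(static x => x > 0);
    }
}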

It gets more interesting when the code is correct and proper, but may do something weird in some cases, such as in this one:

I really love it when I run into those because they allow me to actually explore the problem thoroughly. Here is an even better example, this isn’t about a problem in the code, but a discussion on its impact.

RavenDB has been around for over 15 years, and being able to go back and look at those conversations in a decade or so is invaluable to understanding what is going on. It also ensures that we can share current knowledge a lot more easily.

Speaking of long-running projects, take a look at the following comment:

Here we need to provide some context. The _caseInsensitive variable is a concurrent dictionary, and the suggested change is a pretty simple optimization to avoid the annoying KeyValuePair overload. Except… this code is there intentionally: we use it to ensure that the removal operation will only succeed if both the key and the value match. There was an old bug where we removed blindly, and the end result was that an updated value was removed.

In this case, we look at the code change from a historical perspective and realize that a modification would reintroduce old (bad) behavior. We added a comment to explain that in detail in the code (and there already was a test to catch it if this happens again).
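To show the difference in a simplified form (this is not the actual RavenDB code, and it assumes the KeyValuePair-based TryRemove overload available on modern .NET):

using System.Collections.Concurrent;
using System.Collections.Generic;

class Cache
{
    private readonly ConcurrentDictionary<string, string> _caseInsensitive = new();

    // The "obvious" version: removes the entry no matter what its current value is.
    // If another thread just updated the value, that fresh value is lost - the old bug.
    public void RemoveBlindly(string key) =>
        _caseInsensitive.TryRemove(key, out _);

    // The intentional version: removes the entry only if both the key *and* the
    // value still match - which is exactly what the KeyValuePair overload gives us.
    public bool RemoveIfUnchanged(string key, string expectedValue) =>
        _caseInsensitive.TryRemove(new KeyValuePair<string, string>(key, expectedValue));
}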

By far, the most important and critical part of doing code reviews, in my opinion, is not focusing on what is or what was, but on what will be. In other words, when I’m looking at a piece of code, I’m considering not only what it is doing right now, but also what we’ll be doing with it in the future.

Here is a simple example of what I mean, showing a change to a perfectly fine piece of code:

The problem is that the if statement will call InitializeCmd(), but we previously called it using a different condition. We are essentially testing for the same thing in two different ways, and while both checks currently yield the same result, we need to be aware that this may change in the future.
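A made-up example of the shape of that problem; only InitializeCmd() comes from the review, the fields and conditions are invented:

class CommandRunner
{
    private object _cmd;
    private bool _cmdReady;

    public void Execute()
    {
        if (_cmd == null)      // one way of asking "do we need to initialize?"
            InitializeCmd();

        // ... later, or after a future refactor ...

        if (!_cmdReady)        // a different check for what is meant to be the same state
            InitializeCmd();

        // Today the two conditions happen to agree; the moment initialization
        // changes, they can silently diverge.
    }

    private void InitializeCmd()
    {
        _cmd = new object();
        _cmdReady = true;
    }
}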

I believe one of the major shifts in my thinking about code reviews came about because I mostly work on RavenDB, and we have kept the project running over a long period of time. Focusing on making sure that we have a sustainable and maintainable code base over the long haul is important. Especially because you need to experience those benefits over time to really appreciate looking at codebase changes from a historical perspective.

time to read 20 min | 3954 words

I've been writing this blog since 2004. That means I have been doing this for twenty years, which is frankly unbelievable to me. The actual date is sometime in April, so I’ll probably do a summary post then about that.

What I want to talk about today is a different aspect. The mechanism and processes I use to write blog posts. A large part of the reason I write blog posts is that it helps me understand and organize my own thoughts. And in order to do that effectively, I have found that I need very little friction in the blogging process.

About a decade ago, Google Reader was shut down, and I’m still very bitter about that. It effectively killed a significant portion of the blogging audience and made the ergonomics of reading blogs a lot harder. That also led people to use walled gardens to communicate with others, instead of the decentralized network and feed aggregators. A side effect of that decision is that blogging tools have stopped being a viable thing people spend time or money on.

At the time, I was using Windows Live Writer, which was a high-quality editor and had a rich plugin system. Microsoft discontinued it at some point, it became an open-source project, and even that died. The website is no longer functional and even in terms of the GitHub project, the last commit was 5 years ago.

I’m still using Open Live Writer to write the majority of my blog posts, but given there are no longer any plugins, even something as simple as embedding code in my posts has become an… annoyance. That kills the ergonomics of blogging for me.

Not a problem, this is Open Source, and I can do that myself. Except… I really don’t have the time to spend on something ancillary like that. I would happily pay (a reasonable amount) for a blogging client, but I’m going to assume that I’m not part of a large enough group that there is a market for this.

Taking the code snippets example, I can go into the code, figure out what is going on there, and add a “code snippet” feature. I estimate that would take several days. Alternatively, I can place the code as a GitHub gist and embed it in the page. It is annoying, but far quicker than going to the trouble of figuring that out.

Another issue that bugs me (pun intended) is a problem with copy/paste of images, where taking screenshots using the Snipping Tool doesn’t paste into Writer. I need to first paste them into Paint, then into Writer. In this case, I assume that Writer doesn’t recognize the clipboard format or something similar.  

Finally, it turns out that I’m not writing blog posts in the same manner as I used to. It got to the point where I asked people to review my posts before making them public. It turns out that no matter how many times it is corrected, my brain seems unable to discern when to write “whether” or “whatever”, for example. At this point I gave up updating that piece of software 🙂. Even the use of emojis doesn’t work properly (Open Live Writer mostly predates a lot of them and breaks the HTML in a weird fashion 🤷).

In other words, there are several problems in my current workflow, and it has finally reached the point where I need to do something about it. The last requirement, by the way, is the most onerous. Consider the workflow of getting the following fixes to a blog post:

  • and we run => and we ran
  • we spend => we spent

Where is my collaborative editing and the ability to suggest changes with good UX? Improving the ergonomics of the blog has just expanded in scope massively. Now it is a full-fledged publishing platform with modern sensibilities. It’s 2024; features like proper spelling and grammar corrections should absolutely be there, no? And what about AI integration? It turns out that predicting text makes the writing process more efficient. Here is what this may look like:

At this stage, this isn’t just a few minor fixes. I should mention that for the past decade and a half or so, I stopped considering myself as someone who can do UI in any meaningful manner. I find that the <table/> tag, which used to be my old reliable method, is not recommended now, for some reason.

This… kind of sucks. I want to upgrade my process by a couple of decades, but I don’t want to pay the price for that. If only there was an easier way to do that.

I started using Google Docs to edit my blog posts, then pasting them into Live Writer or directly into the blog (using a rich text box with an editor from… a decade ago). I had to check the source code for this, by the way. The entire experience is decidedly Developer UX. Then I had a thought: I already have a pretty good process for writing blog posts in Google Docs, right? It handles rich text editing and management much better than the editor in the blog. There are also options for things like proper workflows. For example, someone can go over my drafts and make comments or suggestions.

The only thing that I need is to put both of those together. I have to admit that I spent quite some time just trying to figure out how to get the document from Google Docs using code. The authentication hurdles are… significant to someone who isn’t aware of how it all plugs together. Once I got that done, I got my publishing platform with modern features. Here is what the end result looks like:


public class PublishingPlatform
{
  private readonly DocsService GoogleDocs;
  private readonly DriveService GoogleDrive;
  private readonly Client _blogClient;


  public PublishingPlatform(string googleConfigPath, string blogUser, string blogPassword)
  {
    var blogInfo = new MetaWeblogClient.BlogConnectionInfo(
     "https://ayende.com/blog",
     "https://ayende.com/blog/Services/MetaWeblogAPI.ashx",
     "ayende.com", blogUser, blogPassword);
    _blogClient = new MetaWeblogClient.Client(blogInfo);


    var initializer = new BaseClientService.Initializer
    {
      HttpClientInitializer = GoogleWebAuthorizationBroker.AuthorizeAsync(
          GoogleClientSecrets.FromFile(googleConfigPath).Secrets,
          new[] { DocsService.Scope.Documents, DriveService.Scope.DriveReadonly },
          "user", CancellationToken.None,
          new FileDataStore("blog.ayende.com")
      ).Result
    };


    GoogleDocs = new DocsService(initializer);
    GoogleDrive = new DriveService(initializer);
  }


  public void Publish(string documentId)
  {
    using var file = GoogleDrive.Files.Export(documentId, "application/zip").ExecuteAsStream();
    var zip = new ZipArchive(file, ZipArchiveMode.Read);


    var doc = GoogleDocs.Documents.Get(documentId).Execute();
    var title = doc.Title;


    var htmlFile = zip.Entries.First(e => Path.GetExtension(e.Name).ToLower() == ".html");
    using var stream = htmlFile.Open();
    var htmlDoc = new HtmlDocument();
    htmlDoc.Load(stream);
    var body = htmlDoc.DocumentNode.SelectSingleNode("//body");


    var (postId, tags) = ReadPostIdAndTags(body);


    UpdateLinks(body);
    StripCodeHeader(body);
    UploadImages(zip, body, GenerateSlug(title));


    string post = GetPostContents(htmlDoc, body);


    if (postId != null)
    {
      _blogClient.EditPost(postId, title, post, tags, true);
      return;
    }


    postId = _blogClient.NewPost(title, post, tags, true, null);


    var update = new BatchUpdateDocumentRequest();
    update.Requests = [new Request
    {
      InsertText = new InsertTextRequest
      {
        Text = $"PostId: {postId}\r\n",
        Location = new Location
        {
          Index = 1,
        }
      },
    }];


    GoogleDocs.Documents.BatchUpdate(update, documentId).Execute();
  }


  private void StripCodeHeader(HtmlNode body)
  {
    foreach(var remove in body.SelectNodes("//span[text()='&#60419;']").ToArray())
    {
      remove.Remove();
    }
    foreach (var remove in body.SelectNodes("//span[text()='&#60418;']").ToArray())
    {
      remove.Remove();
    }
  }


  private static string GetPostContents(HtmlDocument htmlDoc, HtmlNode body)
  {
    // we use the @scope element to ensure that the document style doesn't "leak" outside
    var style = htmlDoc.DocumentNode.SelectSingleNode("//head/style[@type='text/css']").InnerText;
    var post = "<style>@scope {" + style + "}</style> " + body.InnerHtml;
    return post;
  }


  private static void UpdateLinks(HtmlNode body)
  {
    // Google Docs put a redirect like: https://www.google.com/url?q=ACTUAL_URL
    foreach (var link in body.SelectNodes("//a[@href]").ToArray())
    {
      var href = new Uri(link.Attributes["href"].Value);
      var url = HttpUtility.ParseQueryString(href.Query)["q"];
      if (url != null)
      {
        link.Attributes["href"].Value = url;
      }
    }
  }


  private static (string? postId, List<string> tags) ReadPostIdAndTags(HtmlNode body)
  {
    string? postId = null;
    var tags = new List<string>();
    foreach (var span in body.SelectNodes("//span"))
    {
      var text = span.InnerText.Trim();
      const string TagsPrefix = "Tags:";
      const string PostIdPrefix = "PostId:";
      if (text.StartsWith(TagsPrefix, StringComparison.OrdinalIgnoreCase))
      {
        tags.AddRange(text.Substring(TagsPrefix.Length).Split(","));
        RemoveElement(span);
      }
      else if (text.StartsWith(PostIdPrefix, StringComparison.OrdinalIgnoreCase))
      {
        postId = text.Substring(PostIdPrefix.Length).Trim();
        RemoveElement(span);
      }
    }
    // after we removed post id & tags, trim the empty lines
    while (body.FirstChild.InnerText.Trim() is "&nbsp;" or "")
    {
      body.RemoveChild(body.FirstChild);
    }
    return (postId, tags);
  }


  private static void RemoveElement(HtmlNode element)
  {
    do
    {
      var parent = element.ParentNode;
      parent.RemoveChild(element);
      element = parent;
    } while (element?.ChildNodes?.Count == 0);
  }


  private void UploadImages(ZipArchive zip, HtmlNode body, string slug)
  {
    var mapping = new Dictionary<string, string>();
    foreach (var image in zip.Entries.Where(x => Path.GetDirectoryName(x.FullName) == "images"))
    {
      var type = Path.GetExtension(image.Name).ToLower() switch
      {
        ".png" => "image/png",
        ".jpg" or ".jpeg" => "image/jpeg",
        _ => "application/octet-stream"
      };
      using var contents = image.Open();
      var ms = new MemoryStream();
      contents.CopyTo(ms);
      var bytes = ms.ToArray();
      var result = _blogClient.NewMediaObject(slug + "/" + Path.GetFileName(image.Name), type, bytes);
      mapping[image.FullName] = new UriBuilder { Path = result.URL }.Uri.AbsolutePath;
    }
    foreach (var img in body.SelectNodes("//img[@src]").ToArray())
    {
      if (mapping.TryGetValue(img.Attributes["src"].Value, out var path))
      {
        img.Attributes["src"].Value = path;
      }
    }
  }


  private static string GenerateSlug(string title)
  {
    var slug = title.Replace(" ", "");
    foreach (var ch in Path.GetInvalidFileNameChars())
    {
      slug = slug.Replace(ch, '-');
    }


    return slug;
  }
}

You’ll probably not appreciate this, but the fact that I can just push code like that into the document and get it with proper formatting easily is a major lifestyle improvement from my point of view.

The code works with the document in two ways. First, in the Document DOM (which is quite complex), it extracts the title of the blog post and afterward updates the document with the post ID. But the core of this code is to extract the document as a zip file, grab everything from there, and push that to the blog. I do some editing of the HTML to get everything set up properly, mostly fixing the links and uploading the images. There is also some stuff happening with CSS scopes that I frankly don’t understand. I think I got it right, which is fine for now.

This cost me a couple of evenings, and it was fun. Nothing earth-shattering, I’ll admit. But it’s the first time in a while that I actually wrote a piece of code that was immediately useful. My blogging queue is rather full, and I hope that with this new process it will be easier to push the ideas out of my head and to the blog.

And with that, it is now 01:26 AM, and I’m going to call it a night 🙂.

And as a final thought, I had just made several changes to the post after publication, and it went smoothly. I think that I like it.

time to read 1 min | 103 words

Jaime and I had a really good discussion about RavenDB: why I took the time to create my own NoSQL database engine, the fact that I built it using .NET Core before it was released (back in the pre-1.0 days, when it was known as dnx), and some of the optimisation stories I worked on when creating RavenDB. Along the way, we cover what the GC (or garbage collector) is, performance issues to look out for when dealing with large JSON objects, and some tips for those who want to optimise their applications.

You can listen to it here.

Would love your feedback.

time to read 2 min | 235 words


If you are reading this blog, I assume that you are a like-minded person. My idea of relaxation is to sit and write code. Hopefully on something that I’m not familiar with. I have many such blog post series covering topics I care about. It’s my idea of meditation.

For the end of 2023, I thought that we could do something similar but on a broader scale. A while ago Alex Klaus wrote a walkthrough on how to build a complete application from scratch using modern best practices (and RavenDB). We refreshed the code and made it widely available, offering you something fun, educational, and productive to engage with.

The system is a bug tracker (allowing us to focus on the architecture rather than domain concerns), and you can play with a deployed version live. The code is available under the MIT license, and we’ll be very happy to receive any suggested improvements.

Topics that are covered:

  1. Building an enterprise application with .NET and RavenDB

  2. Non-Relational Data Modeling Through Domain Driven Design Prism

  3. Hidden side of document IDs in RavenDB

  4. Dynamic Fields for Indexing

  5. Entity Relationships in non-relational database (one-to-many, many-to-many)

  6. Multi-tenant database in NoSQL

  7. Database Integration Testing – The Secret Recipe

As usual, I would love any feedback you have to offer.

time to read 4 min | 622 words

A customer contacted us to complain about a highly unstable cluster in their production system. The metrics didn’t support the situation, however. There was no excess load on the cluster in terms of CPU and memory, but there were a lot of network issues. The cluster got to the point where it would just flat-out be unable to connect from one node to another.

It was obviously some sort of a network issue, but our ping and network tests worked just fine. Something else was going on. Somehow, the server would get to a point where it would be simply inaccessible for a period of time, then accessible, then not, etc. What was weird was that the usual metrics didn’t give us anything. The logs were fine, as were memory and CPU. The network was stable throughout.

If the first level of metrics isn’t telling a story, we need to dig deeper. So we did, and we found something really interesting. Here is the total number of TCP connections on the server over time.

So there are a lot of connections on the system, which is choking it? But the CPU is fine, so what is going on? Are we being attacked? We looked at the connections, but they all came from authorized machines, and the firewall was locked down tight.

(Graph: total number of TCP connections on the server over time.)

If you look closely at the graph, you can see that it hits 32K connections at its peak. That is a really interesting number, because 32K is also the size of the ephemeral port range on Linux. In other words, we basically hit the OS limit for how many connections can be sustained between a client and a server.

The question is what could be generating all of those connections? Remember, they are coming from a trusted source and are valid operations.  Indeed, digging deeper we could see that there are a lot of connections in the TIME_WAIT state.

We asked to look at the client code to figure out what was going on. Here is what we found:

There is… not much here, as you can see. And certainly nothing that should cause us to generate a stupendous amount of connections to the server. In fact, this is a very short process. It is going to run, read a single line from the input, write a document to RavenDB, and then exit.

To understand what is actually going on, we need to zoom out and understand the system at a higher level. Let’s assume that the script above is called using the following manner:

What will happen now? All of this code is pretty innocent, I’m sure you can tell. But together, we are going to get the following interesting behavior:

For each line in the input, we’ll invoke the script, which will spawn a separate process to connect to RavenDB, write a single document to the server, and exit. Immediately afterward, we'll have another such process, etc.

Each of those processes is going to have a separate connection, identified by a quartet of (src ip, src port, dst ip, dst port). And there are only so many such ports available on the OS. Once you close a connection, it is moved to the TIME_WAIT state, and any packets that arrive for that connection quartet are assumed to be from the old connection and dropped. Generate enough new connections fast enough, and you literally lock yourself out of the network.

The solution to this problem is to avoid using a separate process for each interaction. Aside from alleviating the connection issue (each connection also carries a non-trivial cost on the server), it allows RavenDB to far better optimize network usage and traffic patterns.
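For example, a single long-lived process that reuses one DocumentStore (and therefore a small, stable set of pooled connections) avoids the problem entirely. Here is a rough sketch of that shape, assuming the RavenDB .NET client, with placeholder URLs and names:

using System;
using Raven.Client.Documents;

// Hypothetical document type, just to have something to store
class Item
{
    public string Value { get; set; }
}

class Program
{
    static void Main()
    {
        // One DocumentStore for the lifetime of the process: connections are
        // pooled and reused instead of being opened per document.
        using var store = new DocumentStore
        {
            Urls = new[] { "https://db.example.com" },  // placeholder URL
            Database = "Items"                          // placeholder database name
        };
        store.Initialize();

        string line;
        while ((line = Console.ReadLine()) != null)
        {
            using var session = store.OpenSession();
            session.Store(new Item { Value = line });   // one document per input line
            session.SaveChanges();
        }
    }
}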

time to read 2 min | 290 words

In the previous post, I showed a very simple request router that would always select the fastest node. That worked for a long while, until it didn’t, and the challenge is figuring out why.

As it turns out, the issue is a simple one of spooky action at a distance. Here is what happens. Assume that we have three servers and 10 clients. Each server is sized to handle 4 clients. So far, so good, the system has the capacity to spare.

The problem is in the manner in which clients detect the fastest node in the cluster. The only thing that is considered is the state of the nodes at the time of selection. At that moment, we may end up with all the clients selecting one particular node as the fastest.

In other words, we have three servers, two of them have no clients talking to them and one of the servers has all the clients talking to it. That results in that node going down, obviously. The clients would then react appropriately, and select a new node to talk to. All of them would do that, find the fastest node, and… bring it down as well. Rinse & repeat.

The issue can be stated as Time Of Check vs Time Of Use, but also as a race condition, where all individual nodes end up doing a synchronized “wave” operation that kills the system.

How do you prevent this?

You introduce randomness into the system. You don’t test the status once, but re-check on a regular basis so you can respond to shifting load. And you add jitter to that process, so the clients won’t all do this at exactly the same time and end up in the same position.
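A sketch of what that can look like on the client side (the latency threshold and timings here are arbitrary, not an actual RavenDB implementation):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class NodeSelector
{
    private readonly List<string> _nodes;
    private readonly Random _random = new();
    private volatile string _current;

    public NodeSelector(List<string> nodes)
    {
        _nodes = nodes;
        _current = nodes[0];
    }

    public string Current => _current;

    // measureLatency is a stand-in for however the client probes a node
    public async Task RunAsync(Func<string, Task<TimeSpan>> measureLatency, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Re-check on a regular basis so we respond to shifting load...
            var timings = new List<(string Node, TimeSpan Latency)>();
            foreach (var node in _nodes)
                timings.Add((node, await measureLatency(node)));

            // ...and pick randomly among the "good enough" nodes instead of
            // having every client pile onto the single fastest one.
            var best = timings.Min(t => t.Latency);
            var candidates = timings
                .Where(t => t.Latency <= best + TimeSpan.FromMilliseconds(20))
                .ToList();
            _current = candidates[_random.Next(candidates.Count)].Node;

            // Jitter the re-check interval so clients don't move in lockstep.
            await Task.Delay(TimeSpan.FromSeconds(30 + _random.Next(0, 30)), token);
        }
    }
}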

time to read 1 min | 186 words

Side note: Current state in Israel right now is bad. I’m writing this blog post as a form of escapism so I can talk about something that makes sense and follow logic and reason. I’ll not comment on the current status otherwise in this area.

Consider the following scenario. We have a bunch of servers and clients. The clients want to send requests for processing to the fastest node they have available. But the algorithm that was implemented has an issue; can you see what it is?

To simplify things, we are going to assume that the work that is being done for each request is the same, so we don’t need to worry about different request workloads.

The idea is that each client node will find the fastest node (usually meaning the nearest one) and if there is enough load on the server to have it start throwing errors, it will try to find another one. This system has successfully spread the load across all servers, until one day, the entire system went down. And then it stayed down.

Can you figure out what the issue is?

time to read 3 min | 479 words

At some point in any performance optimization sprint, you are going to run into a super annoying problem: The dictionary.

The reasoning is quite simple. One of the most powerful optimization techniques is to use a cache, which is usually implemented as a dictionary. Today’s tale is about a dictionary, but surprisingly enough, not about a cache.

Let’s set up the background, I’m looking at optimizing a big indexing batch deep inside RavenDB, and here is my current focus:

(Profiler screenshot: RecordTermsForEntries at 4% of the overall indexing time.)

You can see that RecordTermsForEntries takes 4% of the overall indexing time. That is… a lot, as you can imagine.

What is more interesting here is why. The simplified version of the code looks like this:

Basically, we are registering, for each entry, all the terms that belong to it. This is complicated by the fact that we are doing the process in stages:

  1. Create the entries
  2. Process the terms for the entries
  3. Write the terms to persistent storage (giving them the recorded term id)
  4. Update the entries to record the term ids that they belong to

The part of the code that we are looking at now is the last one, where we already wrote the terms to persistent storage and we need to update the entries. This is needed so when we read them, we’ll be able to find the relevant terms.

At any rate, you can see that this method cost is absolutely dominated by the dictionary call. In fact, we are actually using an optimized method here to avoid doing a TryGetValue() and then Add() in case the value is not already in the dictionary.

If we look at the metrics, this is actually kind of awesome. We are calling the dictionary almost 400 million times, and it is able to do the work in under 200 nanoseconds per call.

That is pretty awesome, but that still means that we have over 2% of our total indexing time spent doing lookups. Can we do better?

In this case, absolutely. Here is how this works, instead of doing a dictionary lookup, we are going to store a list. And the entry will record the index of the item in the list. Here is what this looks like:

There isn’t much to this process, I admit. I was lucky that in this case, we were able to reorder things in such a way that skipping the dictionary lookup is a viable method.

In other cases, we would need to record the index at the creation of the entry (effectively reserving the position) and then use that later.
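Conceptually, the change looks something like this; it is a simplified illustration of the idea, not the actual RavenDB code:

using System.Collections.Generic;

class TermRecorder
{
    // Before: every (entry, term) pair pays for a dictionary lookup.
    private readonly Dictionary<long, List<long>> _termsPerEntryById = new();

    public void RecordWithLookup(long entryId, long termId)
    {
        if (_termsPerEntryById.TryGetValue(entryId, out var terms) == false)
        {
            terms = new List<long>();
            _termsPerEntryById[entryId] = terms;
        }
        terms.Add(termId);
    }

    // After: each entry remembers its index into a plain list, reserved when the
    // entry is created, so recording a term is just a list access - no hashing.
    private readonly List<List<long>> _termsPerEntry = new();

    public int ReserveEntry()
    {
        _termsPerEntry.Add(new List<long>());
        return _termsPerEntry.Count - 1;   // the entry stores this index
    }

    public void RecordByIndex(int entryIndex, long termId)
    {
        _termsPerEntry[entryIndex].Add(termId);
    }
}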

And the result is…

(Profiler screenshot: the same call after the change.)

That is pretty good, even if I say so myself. The cost went down from 3.6 microseconds per call to 1.3 microseconds. That is almost a threefold improvement.
