Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 5 min | 867 words

It has been two months since the first release candidate of RavenDB 4.0, and the team has been hard at work. Looking at the issues resolved in that time frame, there are over 500 of them, and I couldn't be happier about the result.

RavenDB 4.0 RC2 is out: Get it here (Windows, Linux, OSX, Raspberry PI, Docker).

When we were going through the list of issues for this release, I noticed something really encouraging. The vast majority of them were things that would never make it into the release highlights. These are the kind of issues that are all about spit and polish. Anything from giving better error messages to improving the first few minutes of your setup and installation to just getting things done. This is a really good thing at this stage in the release cycle. We are done with features and big ticket stuff. Now it is time to finish grinding through all the myriad details and small fixes that make a product really shine.

That said, there is still a bunch of really cool stuff that has been cooking for a long time and that we can only now really call complete. This list includes:

  • Authentication and authorization – the foundation for that was laid a long time ago, with X509 client certificates used for authenticating clients against RavenDB 4.0 servers. The past few months had us building the user interface to manage these certificates and define permissions and access across the cluster.
  • Facet and MoreLikeThis queries – this is a feature that was available in RavenDB for quite some time and is now available as an integral part of the RavenDB Query Language. I'm going to have separate posts to discuss these, but they are pretty cool, albeit specialized, ways to look at your data.
  • RQL improvements – we made RQL a lot smarter, allowing more complex queries and projections. Spatial support has been improved and is now much easier to work with and reason about using just raw RQL queries.
  • Server dashboard – allows you to see exactly what your servers are doing and is meant to be something that the ops team can just hang on the wall and stare at in amazement, realizing how much the database can do.
  • Operations – the operations team generally has a lot of new things to look at in this release. SNMP monitoring is back, and a significant amount of work was spent on errors. That is, making sure that an admin will have clear and easy to understand errors and a path to fix them. Traffic monitoring and live tracing of logs are also available directly in the studio now. CSV import / export is also available in the studio, as well as Excel integration for the business people. Automatic backup processes are now available for scheduled backups to both local and cloud targets, and an admin has more options to control the database. This includes compaction of databases after large deletes to restore space to the system.
  • Patching, querying and expiring UI – this was mostly exposing existing functionality and improving the amount of detail that we provide by default. Users can now define an auto expiration policy for documents with a time to live. On the querying side, we are showing a lot more information. My favorite feature there is that the studio can now show the result of including documents, which makes it easy to see how this feature can save you network roundtrips. Queries & patching now have a much nicer UI and also support some really cool intellisense.
  • Performance – most of the performance work was already done, but we were able to identify some bottlenecks on the client side and significantly reduce the amount of work it takes to save data to the database. This especially affects bulk insert operations, but the effect is actually widespread enough to impact most of the client operations.
  • Advanced Linq support – a lot of work has been put into the Linq provider (again) to enable more advanced scenarios and more complex queries.
  • ETL processes – are now exposed and allow you to define both RavenDB and SQL databases as targets for automatic ETL from a RavenDB instance.
  • Cluster wide atomic operations – dubbed cmpxchg after the similar assembly instruction, this basic building block allows you to build very complex coordinated behaviors in a distributed environment without any hassle, relying on RavenDB's consensus to verify that such operations are truly atomic. There is a small sketch of the idea right after this list.
  • Identity support – identities are now fully supported in the client and operate as a cluster wide operation. This means that you can rely on them being unique cluster wide.

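To make the cmpxchg item a bit more concrete, here is the single-node analogue of the primitive, a minimal sketch using Interlocked.CompareExchange (which compiles down to the very cmpxchg instruction the feature is named after). The cluster-wide version exposes the same compare-and-swap semantics, with the cluster's consensus standing in for the CPU's guarantees:

```csharp
using System.Threading;

// Single-node analogue of the cluster-wide primitive: "if the current value
// is still what I expect, replace it; either way, tell me what was there".
// RavenDB's cmpxchg gives the same semantics across the whole cluster.
public static class UniqueFlag
{
    private static int _taken; // 0 = free, 1 = taken

    public static bool TryTake()
    {
        // Atomically compare _taken to 0 and, if it matches, set it to 1.
        // The return value is what _taken held before the operation, so a
        // result of 0 means we are the one caller that won the race.
        return Interlocked.CompareExchange(ref _taken, 1, 0) == 0;
    }
}
```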
Users provided really valuable feedback, finding a lot of pitfalls and stuff that didn’t make sense or flow properly. And that was a lot of help in reducing friction and getting things flowing smoothly.

There is another major feature that we worked on during this time, the setup process. And it may sound silly, but this is probably the one that I’m most excited about in this release. Excited enough that I’ll have a whole separate post for it, coming soon.

time to read 4 min | 755 words

RavenDB uses HTTP for most of its communication. It can be used in unsecured mode, using HTTP or in secured mode, using HTTPS. So far, this is pretty standard. Let us look at a couple of URLs:

  • http://github.com
  • https://github.com

If you try to go to github using HTTP, it will redirect you to the HTTPS site. It is very easy to do, because the URLs above are actually:

  • http://github.com:80
  • https://github.com:443

In other words, by default when you are using HTTP, you'll use port 80, while HTTPS will default to port 443. This means that the server on port 80 can just read the request and redirect you immediately to the HTTPS endpoint.

RavenDB, however, is usually used in environments where you will explicitly specify a port. So the URL would look something like this:

  • http://a.orders.raven.local:8080
  • https://a.orders.raven.local:8080

It is very common for our users to start running with port 8080 in an unsecured mode, then later move to a secure mode with HTTPS but retain the same port. That can lead to some complications. For example, here is what happens in a similar situation if I’m trying to connect to an HTTPS endpoint using HTTP or vice versa.

[screenshots: the errors shown when connecting to an HTTPS endpoint using HTTP, and to an HTTP endpoint using HTTPS]

This means that a common scenario (running on a non-default port and using the wrong protocol) will lead to a nasty error. We call this a nasty error because the user has no real way to figure out what the issue is from the error. In many cases, this will trigger an escalation to the network admin or a support ticket. This is the kind of issue that I hate: it is plainly obvious in hindsight, but it is so hard to figure out, and then you feel stupid for not realizing it upfront.

Let us see how we can resolve such an issue. I already gave some hints on how to do it earlier, but the technique in that post wasn't suitable for production use in our codebase. In particular, it introduced another Stream wrapping instance and another allocation that would affect all input / output calls over the network. We would really want to avoid that.

So we cheat (but we do that a lot, so this is fine). Kestrel allows us to define connection adapters, which give us a hook very early in the process into how the TCP connection is managed. However, that leads to another problem. We want to sniff the first byte of the raw TCP request, but Stream doesn't provide a way to Peek at a byte; any such attempt will consume it, which would bring back the additional indirection that we wanted to avoid.

Therefore, we decided to take advantage of the way Kestrel is handling things. It is buffering data in memory, and if you dig a bit you can access that in some very useful ways. Here is how we are able to sniff HTTP vs. HTTPS:
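The interesting part is tiny. Here is a sketch of just the decision (the Kestrel plumbing around it is elided, and firstByte is assumed to be the byte we borrowed from the already read buffer):

```csharp
// HTTP requests start with an ASCII method name (GET, PUT, POST, DELETE...),
// so the first byte is an uppercase letter. A TLS connection starts with a
// handshake record whose first byte is 22, and legacy SSLv2 hellos start with
// a byte greater than 127, so the two ranges never overlap.
private static bool IsPlainHttp(byte firstByte)
{
    return firstByte >= (byte)'A' && firstByte <= (byte)'Z';
}
```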

The key here is that we use a bit of reflection emit magic to get the inner IPipeReader instance from Kestrel. We have to do it this way because that value isn't exposed externally. Once we do have the pipe reader instance, we borrow the already read buffer and inspect it. If the first character is a capital letter (G from GET, P from PUT, etc.), this is an HTTP connection (an SSL connection's first byte is either 22 or greater than 127, so there is no overlap). We then return the buffer to the stream and carry on; Kestrel will parse the request normally, but another portion of the pipeline will get the wrong protocol message and throw that to the user. And obviously we'll skip doing the SSL negotiation.

This is important, because the client is speaking HTTP, and we can't magically upgrade it to HTTPS without causing errors such as the one above. We need to speak the same protocol as the client expects.

With this code, trying to use the wrong protocol gives us this error:

[screenshot: the explicit wrong-protocol error message that is now returned]

Now, if you are not reading the error message that might still mean a support call, but it should be resolved as soon as someone actually reads the error message.

time to read 3 min | 586 words

I mentioned in a previous post that an SSL connection will typically use a Server Name Indication in the initial (unencrypted) packet to let the server know which address it is interested in. This allows the server to do things such as select the appropriate certificate to answer this initial challenge.

A more interesting scenario is when you want to force your users to always use HTTPS. That is pretty trivial: you set up a website to listen on port 80 and port 443, and redirect all HTTP traffic from port 80 to port 443 as HTTPS. Pretty much any web server under the sun already has some sort of easy to use configuration for that. Let us see how this would look if we were writing it using bare bones Kestrel.

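Something along these lines. This is a rough sketch of the behavior only, using plain middleware for the redirect decision rather than the connection adapter the real snippet hooks into, and the certificate path and password are placeholders:

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;

public class Program
{
    public static void Main(string[] args)
    {
        new WebHostBuilder()
            .UseKestrel(options =>
            {
                // Plain HTTP listener, only there to tell clients where to go.
                options.Listen(IPAddress.Any, 80);
                // The real, TLS protected endpoint (placeholder certificate).
                options.Listen(IPAddress.Any, 443, listenOptions =>
                    listenOptions.UseHttps("server.pfx", "s3cr3t"));
            })
            .Configure(app => app.Run(context =>
            {
                if (context.Request.IsHttps == false)
                {
                    // We only get here via the port 80 listener; punt the
                    // client over to the HTTPS endpoint.
                    context.Response.Redirect(
                        "https://" + context.Request.Host.Host + context.Request.Path,
                        permanent: true);
                    return Task.CompletedTask;
                }
                return context.Response.WriteAsync("Hello over HTTPS");
            }))
            .Build()
            .Run();
    }
}
```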
This is pretty easy, right? We set up a connection adapter on port 80, so we can detect that this is using the wrong port and then just redirect it. Notice that there is some magic that we need to apply here. At the connection adapter level, we deal with the raw TCP socket, but we don't want to mess around with that, so we just pass the decision up the chain until we get to the part that deals with HTTP and let it send the redirect.

Pretty easy, right? But what about when a user does something like this?

http://my-awesome-service:443

Note that in this case, we are using the HTTP protocol and not the HTTPS protocol. At that point, things are a mess. A client will make a request and send a TCP packet containing HTTP request data, but the server is trying to parse that as an SSL client hello message. What will usually happen is that the server will look at the incoming packet, decide that this is garbage, and just close the connection. That leads to some really hard to figure out errors and much forehead slapping when you figure out what the issue is.

Now, I'm sure that you'll agree that anyone seeing a URL as listed above will be a bit suspicious. But what about these ones?

  • http://my-awesome-service:8080
  • https://my-awesome-service:8080

Unlike before, where we would probably notice that :443 is the HTTPS port and we are using HTTP, here there is no additional indication about what the problem is. So we need to try both. And if a user is getting a connection dropped error when trying the connection, there is very little chance that they'll consider switching to HTTPS. It is far more likely that they will start looking at the firewall rules.

So now, we need to do protocol sniffing and figure out what to do from there. Let us see how this looks in code:

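Here is a simplified sketch of the idea (hypothetical names): we wrap the raw connection stream so the first byte can be peeked and then replayed to whatever parses the connection next. The extra wrapper allocation is fine for a proof of concept, though not something you'd want on every connection in production:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Sketch: wrap the raw connection stream so that the first byte can be
// "peeked" and then replayed to the rest of the pipeline.
public sealed class SniffedStream : Stream
{
    private readonly Stream _inner;
    private readonly byte _first;
    private bool _firstPending = true;

    // A TLS connection opens with a handshake record: first byte 22. Legacy
    // SSLv2 hellos start with a byte above 127. HTTP starts with an ASCII
    // method letter, so there is no overlap between the two.
    public bool IsTls => _first == 22 || _first > 127;

    private SniffedStream(Stream inner, byte first)
    {
        _inner = inner;
        _first = first;
    }

    public static async Task<SniffedStream> CreateAsync(Stream inner)
    {
        var buf = new byte[1];
        if (await inner.ReadAsync(buf, 0, 1) == 0)
            throw new EndOfStreamException("Connection closed before any data arrived");
        return new SniffedStream(inner, buf[0]);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_firstPending && count > 0)
        {
            // Replay the byte we consumed while sniffing.
            _firstPending = false;
            buffer[offset] = _first;
            return 1;
        }
        return _inner.Read(buffer, offset, count);
    }

    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
    public override void Flush() => _inner.Flush();
    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

With IsTls in hand, the port 443 listener either proceeds with the normal TLS handshake or parses the replayed bytes as plain HTTP and answers with the redirect.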
We read the first few bytes of the request and see if this is the start of an SSL TCP connection. If it is, we forward the call to the usual Kestrel HTTPS behavior. If it isn't, we mark the connection as requiring a redirect and pass it along, as is, to be parsed as a regular HTTP request; once the request is parsed and ready for action, we send the redirect back.

In this way, any request on port 80 will be sent to port 443 and an HTTP request on a port that listens to HTTPS will be told that it needs to switch.

One note about the code in this post. This was written at 1:30 AM as a proof of concept only. I’m pretty sure that I’m heavily abusing the connection adapter system, especially with regards to the reflection bits there.

time to read 4 min | 667 words

I introduced the notion of frictionless software in the previous post, but I wanted to dedicate some time to talk about the deeper meaning of this kind of thinking. RavenDB is an open source product. There are a lot of business models around OSS projects, and the most common ones include charging for support and services.

Hibernating Rhinos was founded because I wanted to write code. And the way we structured the company is primarily around writing software and the tooling around it. We provide support and consulting services, certainly, but we aren't looking at them as the money makers. From my perspective, we want to sell people RavenDB licenses, not to have them pay us to help them do things with RavenDB.

That means that from the company perspective, support is a cost center, not a revenue center. In other words, the more support calls I have, the sadder I become.

This meshes well with my professional pride. I want to create stuff that is useful, awesome and friction free. I want our users to take what we do and blast off, not to have them double check that their support contracts are up to date and that the support lines are open. I did a lot of study around that early on, and similar to Conway's law, the structure of the company and its culture have a deep impact on the software that it produces.

With support seen as a cost center, this leads to a ripple effect on the structure of the software. It means that error messages are clearer, because if you give the user a good error message, maybe with some indication of how to fix the issue, they can resolve things on their own, without having to call support. It means that configuration and tuning should be minimal and mostly self service, instead of having to open a support ticket with "what should be my configuration settings for this or that scenario".

It also means that we want to reduce as much as possible anything that might trip users up as they set up and use our software. You can see that with the RavenDB Studio, how we spend a tremendous amount of time and effort to make information accessible and actionable for the user. Be it the overall dashboard, the deep insight into the internals, the various graphs and metrics we expose, etc. The whole idea is to make sure that the users and admins have all the information and tooling they need in order to make things work without having to call support.

Now, to be clear, we have a support hotline with 24/7 availability, because at our scale and with the kind of software that we provide, you need that. But we are able to reduce the support load by an order of magnitude with such techniques. And it means that by and large, our support, when you need it, is going to be excellent (because we don't need to deal with a lot of low level support issues). That means that we don't need a many tiered support system, and it takes very little time to actually get to an engineer who has deep familiarity with the system and how to troubleshoot it.

There are a bunch of reasons why we went this route, treating support as a necessary overhead that needs to be reduced as much as possible. Building new features is much more interesting than fielding support calls, so we do our best to develop things so we'll not have to spend much time on support. But mostly, it is about creating a product that is well rounded and complete. It's about taking pride in not only having all the bells and whistles but also taking care to ensure that things work and that the level of friction you'll run into using our products is as low as possible.

time to read 3 min | 428 words

We are currently at the stage of the RavenDB release cycle where most of what we do is friction removal. Analyzing what is going on and removing friction along the way. This isn't about performance; we are pretty much done with that for this release cycle.

Removing friction is figuring out all the myriad ways in which users are going to use RavenDB and run into small annoyances. Things that work exactly as they should, but that can often add a tiny bump in the road toward success. In other words, not only do I want to drop you into the pit of success, I want to make sure that you'll get a cushioned landing.

I take pride in my work, and I think that the sand & polish stage, removing splinters and ensuring a truly frictionless experience, is one of the most important stages in creating awesome products. It is also quite an arduous one, and it has very little visible impact on the product itself. If you are successful, no one will ever even know that you did any work at all.

I was explaining this to my wife the other day and I think that I came up with a good metaphor to explain it. Think about wearing a pair of comfortable shoes. If they are truly comfortable you'll not notice them. In fact, them being comfortable will not be anything to remark upon; it is just there. Now, turn it around and imagine a pair of shoes that is uncomfortable.

You do notice them, and they can be quite painful. But what would you do if you were used to all shoes being painful? Take high heels as a good example. It is standard practice, I understand, to just assume that they will be painful. So if a shoe looks great but is painful to wear, many would wear it, accepting that it is painful. It is only when you wear comfortable shoes after wearing uncomfortable ones that you can really notice.

You feel the lack of pain, where there used to be one.

Coming back to software from high fashion, these kinds of features are hard, and they are often unnoticed, but they jell together to create an awesome experience and a smooth, professional feeling for the product. Even if you need to look at what is going on on the other side of the fence to realize how much is being done for you.

time to read 1 min | 77 words

I’m writing a chapter about indexing in RavenDB and I wanted to have the reader grasp the notion of indexing more easily. I came up with the following code that should explain what is going on:

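Something along these lines, a minimal sketch with illustrative names:

```csharp
using System.Collections.Generic;
using System.Linq;

// The essence of an index: a dictionary built ahead of time from a field's
// value to the ids of the documents holding that value. Answering a query
// becomes a single lookup instead of a scan over every document.
public class NameIndex
{
    private readonly Dictionary<string, List<string>> _entries =
        new Dictionary<string, List<string>>();

    public void Add(string documentId, string name)
    {
        if (_entries.TryGetValue(name, out var ids) == false)
            _entries[name] = ids = new List<string>();
        ids.Add(documentId);
    }

    public IEnumerable<string> Query(string name)
    {
        return _entries.TryGetValue(name, out var ids)
            ? ids
            : Enumerable.Empty<string>();
    }
}
```

The point is that Add() pays the cost once, up front, so that Query() is a single dictionary lookup rather than a scan.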
I think that this does a good job of explaining how an index is actually used, at least to the point where even if you don’t understand indexing, you can sort of grasp what is going on behind the scenes.

Thoughts?

time to read 1 min | 194 words

Somehow it looks like we have a really busy social calendar. The good thing about it is that we are probably somewhere near you.

We are now on the cusp of releasing RavenDB 4.0, so it is a good time to come and hear us talk about it. We are left with some chores to do, but we can show you some really cool stuff. We are also doing a lot of talking about how we built RavenDB, in particular about the performance improvements.

If you are in Malmo, Sweden, RavenDB has a booth at Oredev and Michael is going to talk about how you can Stay Friendly with the GC this Friday.

Next week, in Moscow, Russia, Federico is talking about Patterns for high-performance C#.

Also next week, in Vilnius, Lithuania I’ll be speaking about Extreme Performance Architecture as well as giving a full day workshop on RavenDB 4.0.

At the end of the month, in Tel Aviv, Israel we’ll have a booth and show off RavenDB at the Microsoft Tech Summit.

time to read 2 min | 247 words

The following set of issues all fall into code that is used within the scope of a single assembly, and that is important. I'm writing this blog post before I've had the chance to talk to the dev in question, so I'm guessing about intent.

[screenshot of the first code change]
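To give a sense of the shape of it, a hypothetical reconstruction (the names here are mine, not from the actual diff):

```csharp
using System.Collections.Generic;

internal class IndexStats
{
    private readonly Dictionary<string, long> _countsPerTag = new Dictionary<string, long>();

    // Hypothetical reconstruction of the change: the return type was narrowed
    // from the concrete Dictionary<,> to the IReadOnlyDictionary<,> interface.
    // Before: internal Dictionary<string, long> GetCountsPerTag() => _countsPerTag;
    internal IReadOnlyDictionary<string, long> GetCountsPerTag() => _countsPerTag;
}
```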

This change is likely motivated by the fact that callers are not expected to make a modification to the resulting dictionary.

That said, this is used between different components in the same assembly, and is never exposed outside. That means that we have a much higher level of trust between the components, and reading IReadOnlyDictionary means that we need to spend more cycles trying to figure out whom you are trying to protect against.

Equally important, in this case, the Dictionary methods can be called without any virtual call overhead, while the IReadOnlyDictionary needs interface dispatch to work.

[screenshot of the second code change]
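Again, a hypothetical reconstruction of the shape of the change:

```csharp
using System.Collections.Generic;

internal class DataMerger
{
    // Hypothetical reconstruction: a defensive guard was added on a parameter
    // that no caller inside this assembly ever passes as null.
    public void Merge(Dictionary<string, long> existingData)
    {
        // The added guard looked something like:
        //     if (existingData == null)
        //         return; // silently limp along with bad data
        // Without it, a null fails fast with a NullReferenceException right
        // at the point of the bug, which is the behavior I actually want here.
        foreach (var kvp in existingData)
        {
            // ... merge kvp into our own state ...
        }
    }
}
```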

This is a case that is a bit more subtle. The existingData variable is a parameter passed to a method. The problem is that in this case, no one is ever going to send null, and sending a null is actually an error.

In this case, if we did get a null, I would rather the code immediately crash with "what just happened?" than limp along with bad data.

time to read 2 min | 232 words

Something that I have noticed is that there is a strong inverse correlation between how long it takes to resolve a problem and the size of the change. In other words, the more time you spend investigating an issue, the less code will be required to fix it.

Case in point, we just closed an issue that took one of the best guys on the team almost a month of investigation to fix. The size of the change? 3 lines of code. My personal best is 15 man-weeks: a 3 week period with 5 people heads down trying to resolve a problem that ended up being a missing ToList() call.

This is usually when there are race conditions, hardware or very long test cycles involved. In this case, this was a problem that could only be reproduced on ARM devices with slow I/O and a particular race condition after we created a very big database.

Thinking about this, it makes sense. The more time the investigation takes, the more things you rule out, so eventually it ends up being something subtle that doesn't work. It makes sense, but it can be frustrating for the developer: "I spent all this time, and that is the result?".
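If you have never been bitten by the ToList() one: LINQ queries are lazy, so without materializing the results the query re-executes every time it is enumerated. A distilled, hypothetical version of that class of bug:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var source = new List<int> { 1, 2, 3 };

        // Lazy query: nothing runs here, and every enumeration below
        // re-reads the (possibly changed) source.
        IEnumerable<int> evens = source.Where(x => x % 2 == 0);
        // The fix: materialize once.
        // var evens = source.Where(x => x % 2 == 0).ToList();

        Console.WriteLine(evens.Count()); // 1
        source.Add(4);                    // some mutation far away in the codebase
        Console.WriteLine(evens.Count()); // 2 - the "same" query, a different answer
    }
}
```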

What is your best bug story?

time to read 4 min | 737 words

DNS is used to resolve a hostname to an IP. This is something that most developers already know. What is not widely known, or at least not talked about so much, is the structure of the DNS network. To the right you can find the map of root servers, at least from a historical point of view, but I'll get to it.

If we have root servers, then we also have non root servers, and so on down the line. In fact, the whole DNS system is based on 13 well known root servers that delegate authority to servers that own the relevant portion of the namespace, as you can see in the diagram below. It goes down like that for pretty much forever.

[diagram: the DNS server hierarchy]

Things become a lot more interesting when you start to consider that traversing the full DNS path is fast, but it is done trillions of times per day. Because of that, there are always caching DNS servers in the middle. This is where the TTL (time to live) aspect of DNS records comes into play.

DNS is basically just a distributed database with very slow updates. The root servers allow you to reach the owner of a piece of the namespace, and from that you can extract the relevant records for that namespace. All of that is backed with the premise that DNS values change rarely and that you can cache them for long durations, typically minutes at the low end and usually for days.

This means that a DNS query will most often hit a cache along the way and not have to traverse the entire path. For that matter, portions of the path are also cached. For example, the DNS route for the “.com” domain is usually cached for 48 hours. So even if you are using a new hostname, you’ll typically be able to skip the whole “let’s go to the root server” and stop at somewhere along the way.

For developers, the most common usage of DNS is when you edit "/etc/hosts" to enable some scenario (such as local development with the real URLs). But most organizations have their own DNS servers (if only so you'll be able to find other machines on the organization network). This includes the ability to modify the results of the public DNS, although that is mostly done at coffee shops.
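For example, a single line in the hosts file overrides what DNS would say, for this machine only (the hostname here is just the one from the earlier post):

```
# /etc/hosts (on Windows: %SystemRoot%\System32\drivers\etc\hosts)
# Lookups of this name now bypass DNS and resolve to the loopback address.
127.0.0.1    a.orders.raven.local
```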

I also mentioned earlier that the map above is a historical view of how things used to be. This is where things get really confusing. Remember when I said that DNS maps a hostname to an IP? Well, the common view of an IP being a pointer to a single server is actually false. Welcome to the wonderful world of IP Anycast. Using anycast, you can basically specify multiple servers with the same IP. You'll typically route to the nearest node, and you'll usually only do that for connectionless protocols (such as DNS). This is one of the ways that the 13 root servers are actually implemented. The IPs are routed to multiple locations.

This misdirection is done by effectively laying down multiple paths to the same IP address using the low level routing protocols (a developer will rarely need to concern themselves with that; this is the realm of infrastructure and network engineers). This is how the internet usually works: you have multiple paths along which you can send a packet, and you'll choose the best one. In this case, instead of all the paths terminating in a single location, they'll each terminate in a different one, but they will behave in the same manner. This is typically only useful for UDP, since each packet in such a case may reach a totally different server, so you cannot use TCP or any connection oriented protocols.

Another really interesting aspect of DNS is that there really isn't any limitation on the kind of answers it returns. In other words, querying "localtest.me" will give you 127.0.0.1 back, even though this is an entry that resides on the global internet, not in your own local network. There are all sorts of fun games that one can play with this approach, by making a global address point to a local IP address. One of them is the possibility of issuing an SSL certificate for a local server which isn't exposed to the internet. But that is a hack for another time.
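You can verify that claim from any machine; a quick sketch:

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// "localtest.me" is a record in the public DNS, yet it resolves to the
// loopback address - the answer doesn't have to point anywhere "real".
public static class LocalTestMe
{
    public static async Task Main()
    {
        IPAddress[] addresses = await Dns.GetHostAddressesAsync("localtest.me");
        foreach (var address in addresses)
            Console.WriteLine(address); // expected: 127.0.0.1
    }
}
```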
