Ayende @ Rahien

Jan 02 2024

Recording: .NET Rocks Data Sharding with Oren Eini

time to read 1 min | 139 words

This was actually released a while ago, but I was occupied with other matters and missed it.

I had a blast talking with Carl & Richard about data sharding and how we implemented that in RavenDB.

What is data sharding, and why do you need it? Carl and Richard talk to Oren Eini about his latest work on RavenDB, including the new data sharding feature. Oren talks about the power of sharding a database across multiple servers to improve performance on massive data sets. While a sharded database is typically in a single data center, it is possible to distribute the shards across multiple locations. The conversation explores the advantages and disadvantages of the different approaches, including that you might not need it today, but it's great to know it's there when you do!

You can listen to the podcast here.

Jan 01 2024

Recording: .NET Core podcast on RavenDB, performance and .NET

time to read 1 min | 103 words

Jaime and I had a really good discussion about RavenDB, why I took the time to create my own NoSql database engine, and the fact that I built it using .NET Core before it was released (back in the pre-1.0 days, when it was known as dnx), and some of the optimisation stories that I worked on when creating RavenDB. Along the way, we cover what the GC (or garbage collector) is, performance issues to look out for when dealing with large JSON objects, and some tips for those who want to optimise their applications.

You can listen to it here.

Would love your feedback.

Oct 19 2023

.NET Rocks: Data Sharding with Oren Eini

time to read 1 min | 122 words

You can listen to me talk to Carl & Richard on RavenDB Sharding here.

What is data sharding, and why do you need it? Carl and Richard talk to Oren Eini about his latest work on RavenDB, including the new data sharding feature. Oren talks about the power of sharding a database across multiple servers to improve performance on massive data sets. While a sharded database is typically in a single data center, it is possible to distribute the shards across multiple locations. The conversation explores the advantages and disadvantages of the different approaches, including that you might not need it today, but it's great to know it's there when you do!

This episode was recorded a while ago, and just went live.

Aug 28 2023

Recording: RavenDB and High Performance with Oren Eini

time to read 1 min | 34 words

I spoke for over an hour about how you can build high-performance systems and how we utilize these techniques inside of RavenDB.

Aug 15 2023

Unhandled Exception Episode 55: RavenDB and Database Internals - with Oren Eini

time to read 1 min | 30 words

You can listen to me talk with Dan in the Unhandled Exception podcast, where we dug deep into the internals of database engines.

As usual, I would love your feedback.

Aug 04 2023

Technology & Friends: Oren Eini on Building Projects that Endure

time to read 1 min | 22 words

Jul 21 2023

Podcast: Hanselminutes - All the Performance with RavenDB's Oren Eini

time to read 1 min | 110 words

I had a great time talking with Scott Hanselman about how we achieve great performance for RavenDB with .NET.

You can listen to the podcast here. As usual, I would love your feedback.

In this episode, we talk to Oren Eini from RavenDB. RavenDB is a NoSQL document database that offers high performance, scalability, and security. Oren shares his insights on why performance is not just a feature, but a service that developers and customers expect and demand. He also explains how RavenDB achieves fast and reliable data access, how it handles complex queries and distributed transactions, and how it leverages the cloud to optimize resource utilization and cost efficiency!

Jul 04 2023

Café debug - Interview with Oren Eini CEO of RavenDB

time to read 1 min | 14 words

Apr 26 2023

Fight for every byte it takes: Nibbling at the costs

time to read 4 min | 750 words

In my last post, we implemented variable-sized encoding to be able to pack even more data into the page. We were able to achieve 40% better density because of that. This is pretty awesome, but we would still like to do better. There are two disadvantages to variable-size integers:

They may take more space than the actual raw numbers.
The number of branches is high, and non-predictable.

Given that we need to encode the key and value together, let’s see if we can do better. We know that both the key and the value are 8 bytes long. On little-endian systems, we can treat each number as a byte array.

Consider this number: 139,713,513,353, which is composed of the following bytes: [137, 7, 147, 135, 32, 0, 0, 0]. This is how it looks in memory. This means that we only need the first 5 bytes, not the last 3 zero ones.

It turns out that there is a very simple way to compute the number of used bytes, like so:

This translates into the following assembly:

Which is about as tight as you can want it to be.
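As a rough illustration of the byte-count computation described above, here is a minimal sketch in Python rather than the post's own .NET code. The function name `used_bytes` is hypothetical, and the range check exists only to keep the sketch honest about 64-bit inputs. On a real CPU, the same result would typically come from a leading-zero count on the 64-bit value.

```python
def used_bytes(value: int) -> int:
    """Return how many low-order bytes are needed to hold an unsigned 64-bit value.

    Hypothetical helper for illustration; zero needs no bytes at all.
    """
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in an unsigned 64-bit integer")
    # bit_length() is the position of the highest set bit (0 for zero);
    # rounding up to whole bytes gives the number of bytes actually used.
    return (value.bit_length() + 7) // 8


# The example number from the text, laid out in little-endian order.
example = 139_713_513_353
print(list(example.to_bytes(8, "little")))
print(used_bytes(example))
```

For the example number, only the first five little-endian bytes are non-zero, so five bytes are enough to reconstruct it.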

Of course, there is a problem. In order to read the value back, we need to store the number of bytes we used somewhere. Variable-sized integers use the top bit of each byte as a continuation flag until they run out. But we cannot do that here.

Remember, however, that we encode two numbers here, and each number is at most 8 bytes long. A length in the range of 0 to 8 fits in 4 bits, so if we take one additional byte, we can fit the lengths of both numbers into it.

The length of the key and the value would each fit on a nibble inside that byte. Here is what the encoding step looks like now:

And in assembly:

Note that there are no branches at all here, which I’m really stoked about. As for decoding, we just have to go the other way around:

No branches, and really predictable code.
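The encoding and decoding steps described above could be sketched as follows. This is an illustration in Python under stated assumptions, not the post's actual code: the names `encode_entry` and `decode_entry` are hypothetical, and placing the key length in the high nibble and the value length in the low nibble is an assumed layout.

```python
from typing import Tuple


def used_bytes(value: int) -> int:
    """Bytes needed for an unsigned 64-bit value (hypothetical helper)."""
    return (value.bit_length() + 7) // 8


def encode_entry(key: int, value: int) -> bytes:
    """Pack key and value as: one header byte, key bytes, then value bytes."""
    key_len = used_bytes(key)
    val_len = used_bytes(value)
    # Assumed layout: key length in the high nibble, value length in the low one.
    header = (key_len << 4) | val_len
    key_bytes = key.to_bytes(8, "little")[:key_len]      # drop trailing zero bytes
    val_bytes = value.to_bytes(8, "little")[:val_len]
    return bytes([header]) + key_bytes + val_bytes


def decode_entry(buf: bytes, offset: int = 0) -> Tuple[int, int, int]:
    """Return (key, value, offset just past the entry)."""
    header = buf[offset]
    key_len = header >> 4
    val_len = header & 0x0F
    start = offset + 1
    key = int.from_bytes(buf[start:start + key_len], "little")
    start += key_len
    value = int.from_bytes(buf[start:start + val_len], "little")
    return key, value, start + val_len


encoded = encode_entry(139_713_513_353, 1)
print(len(encoded))
print(decode_entry(encoded))
```

Because both lengths come from the header byte, the decoder never has to inspect the payload bytes to find out where a number ends.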

That is all great, but what about the sizes? We are always taking 4 additional bits per number, so it is actually a single additional byte for each entry we encode. Compared to varint, the moment we encode numbers that are beyond the 2GB range, we’re already winning. Encoding 3,221,241,856 as a varint, for example, will cost us 5 bytes (since we limit the range of each byte to 7 bits). The key point in our case is that whenever either the key or the value needs an additional byte under varint, the two methods are at parity. If both of them need that, the nibble method wins, since it uses a single additional byte while the variable-size integers take two (one for each number).
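To make the size comparison concrete, here is a small sketch in Python that counts bytes under both schemes. It assumes a standard 7-bits-per-byte varint, and all function names are hypothetical.

```python
def used_bytes(value: int) -> int:
    """Raw bytes needed for an unsigned 64-bit value (hypothetical helper)."""
    return (value.bit_length() + 7) // 8


def varint_size(value: int) -> int:
    """Bytes a 7-bits-per-byte varint needs; zero still takes one byte."""
    return max(1, (value.bit_length() + 6) // 7)


def nibble_entry_size(key: int, value: int) -> int:
    """One shared header byte plus the raw bytes of both numbers."""
    return 1 + used_bytes(key) + used_bytes(value)


def varint_entry_size(key: int, value: int) -> int:
    """Two independent varints, one per number."""
    return varint_size(key) + varint_size(value)


n = 3_221_241_856  # the example from the text: 4 raw bytes, 5 as a varint
for key, value in [(n, 1), (n, n), (1, 1)]:
    print(key, value, nibble_entry_size(key, value), varint_entry_size(key, value))
```

With one large number the two schemes tie, with two large numbers the nibble layout is one byte smaller, and for a pair of very small numbers the varint is still the more compact of the two.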

Now that we understand encoding and decoding, the rest of the code is basically the same. We just changed the internal format of the entry, nothing about the rest of the code changes.

And the results?

For the realistic dataset, we can fit 759 entries versus 712 for the variable integer model.

For the full dataset, we can fit 752 entries versus 710 for the variable integer model.

That is a 7% improvement in size, but it also comes with a really important benefit: fewer branches.

This is the sort of code that runs billions of times a second. Reducing its latency has a profound impact on overall performance. One of the things that we pay attention to in high-performance code is the number of branches. Because we are using superscalar CPUs, multiple instructions may execute in parallel at the chip level. A branch may cause us to stall (we have to wait until the result is known before we can execute the next instruction), so the processor will try to predict what the result of the branch would be. If this is a highly predictable branch (an error path that is almost never taken, for example), there is very little cost to that.

The variable integer code, on the other hand, is nothing but branches, and as far as the CPU is concerned, there is no way to actually predict what the result will be, so it has to wait. Branchless or well-predicted code is a key aspect of high-performance code. And this approach can have a big impact.
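For contrast, here is what a typical 7-bits-per-byte varint codec looks like, sketched in Python with hypothetical names. Python does not expose CPU branch behavior, so this only shows the shape of the code: every byte written or read involves a test whose outcome depends on the data itself.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer, 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while value >= 0x80:                     # data-dependent branch per byte
        out.append((value & 0x7F) | 0x80)    # top bit set: more bytes follow
        value >>= 7
    out.append(value)                        # final byte has the top bit clear
    return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, offset just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:                  # data-dependent branch per byte
            return result, offset
        shift += 7


print(encode_varint(300).hex())
print(decode_varint(encode_varint(3_221_241_856)))
```

The number of loop iterations is only known once the continuation bit of each byte has been examined, which is exactly the kind of branch the nibble layout avoids by storing both lengths up front.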

As a reminder, we started at 511 items, and we are now at 759. The question is, can we do more?

I’ll talk about it in the next post…

Mar 10 2023

Next week: Kobo's Journey Into High Performance and Reliable Document Databases at InfoQ

time to read 2 min | 262 words

Trevor Hunter from Kobo Rakuten is going to be speaking about Kobo’s usage of RavenDB in a webinar next Wednesday.

When I started at Kobo, we needed to look beyond the relational and into document databases. Our initial technology choice didn't work out for us in terms of reliability, performance, or flexibility, so we looked for something new and set on a journey of discovery, exploratory testing, and having fun pushing contender technologies to their limits (and breaking them!). In this talk, you'll hear about our challenges, how we evaluated the options, and our experience since widely adopting RavenDB. You'll learn about how we use it, areas we're still a bit wary of, and features we're eager to make more use of. I'll also dive into the key aspects of development - from how it affects our unit testing to the need for a "modern DBA" role on a development team.

About the speaker: Trevor Hunter: "I am a leader and coach with a knack for technology. I’m a Chief Technology Officer, a mountain biker, a husband, and a Dad. My curiosity to understand how things work and my desire to use that understanding to help others are the things I hope my kids inherit from me. I am currently the Chief Technology Officer of Rakuten Kobo. Here I lead the Research & Development organization where our mission is to deliver the best devices and the best services for our readers. We innovate, create partnerships, and deliver software, hardware, and services to millions of users worldwide."

You can register for the webinar here.

Oren Eini

CEO of RavenDB
