Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

RavenDB, Victory

time to read 2 min | 299 words

Jeremy Miller’s post “Would I use RavenDB again” has been making the rounds. It is a good post, and multiple people asked me to comment on it.

I wanted to comment very briefly on some of the issues that were brought up:

  • Memory consumption – this is probably mostly related to long term session usage; we expect sessions to be much more short lived.
    • The 2nd level cache is mostly there to speed things up when you have relatively small documents. If you have very large documents, or routinely have requests that return many documents, that can be a memory hog. That said, the 2nd level cache is limited to 2,048 items by default, so that shouldn’t really be a big issue. And you can change that (or even turn it off) with ease.
  • Don’t abstract RavenDB too much – yeah, that has pretty much been our recommendation for a while.
    • I don’t see this as a problem. You have just the same issue if you are using any OR/M against an RDBMS.
  • Bulk Insert – the issue has already been fixed. In fact, IIRC, it was fixed within a day or two of the issue being brought up.
  • Eventual Consistency – Yes, you need to decide how to handle that. As Jeremy said, there are several ways of handling that, from using natural keys with no query latency associated with them to calling WaitForNonStaleResultsAsOfNow();
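The staleness trade-off in that last bullet can be illustrated with a toy model. This is a minimal Python sketch, not RavenDB code: `StaleIndexStore`, its delay, and the polling loop are all hypothetical, but the two strategies it demonstrates are the ones mentioned above (loading by a known key is never stale, while a query must either tolerate staleness or wait it out, like `WaitForNonStaleResultsAsOfNow()` does).

```python
import time

class StaleIndexStore:
    """Toy store: writes are visible by id immediately, but the query
    index only catches up after a delay (eventual consistency)."""

    def __init__(self, index_lag=0.05):
        self.docs = {}
        self.index = {}
        self.pending = []          # (ready_at, doc_id) not yet indexed
        self.index_lag = index_lag

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc    # the write itself is durable at once
        self.pending.append((time.time() + self.index_lag, doc_id))

    def load(self, doc_id):
        return self.docs.get(doc_id)   # by-id reads never go stale

    def _refresh(self):
        now = time.time()
        still_pending = []
        for ready_at, doc_id in self.pending:
            if ready_at <= now:
                self.index[doc_id] = self.docs[doc_id]
            else:
                still_pending.append((ready_at, doc_id))
        self.pending = still_pending

    def query(self, predicate):
        self._refresh()
        results = [d for d in self.index.values() if predicate(d)]
        return results, bool(self.pending)   # second value: is the index stale?

    def query_non_stale(self, predicate, timeout=1.0):
        """Poll until the index catches up -- the moral equivalent of
        WaitForNonStaleResultsAsOfNow()."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            results, stale = self.query(predicate)
            if not stale:
                return results
            time.sleep(0.01)
        raise TimeoutError("index still stale")

store = StaleIndexStore()
store.put("users/1", {"name": "jeremy"})
assert store.load("users/1")["name"] == "jeremy"   # natural key: no query latency
users = store.query_non_stale(lambda d: d["name"] == "jeremy")
assert len(users) == 1
```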

Truthfully, the thing that really caught my eye wasn’t Jeremy’s post, but one of the comments:

[image: a reader’s comment on Jeremy’s post]

Thank you, we spent a lot of time on that!

time to read 2 min | 325 words

There is a big problem in the RavenDB build process. To be rather more exact, there is a… long problem in the RavenDB build process.

[image: the build duration]

As you can imagine, when the build process runs for that long, it doesn’t get run too often. We already had several rounds of “let us optimize the build”. But… the actual reason for the tests taking this long is a bit sneaky.

[image: test run statistics]

To save you from having to do the math, this means an average of 1.15 seconds per test.

In most tests, we actually have to create a RavenDB instance. That doesn’t take too long, but it does take some time. And we have a lot of tests that use the network, because we need to test how RavenDB works on the wire.

From that perspective, it means that we don’t seem to have any real options. Even if we cut the average cost of running the tests by half, it would still be a 30 minute build process.
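The arithmetic behind that conclusion can be checked back-of-the-envelope. The test count here is an assumption inferred from the stated numbers (1.15 seconds per test on average, and a total that halving would bring down to about 30 minutes, i.e. roughly an hour); the screenshot with the exact figures is not reproduced.

```python
avg_seconds_per_test = 1.15          # stated average cost per test
total_minutes = 60                   # halving the average "would still be 30 minutes"

# Inferred, not stated: how many tests that total implies.
test_count = round(total_minutes * 60 / avg_seconds_per_test)
print(test_count)                    # roughly 3,130 tests

# Sanity check: the inferred count reproduces the stated total...
assert abs(test_count * avg_seconds_per_test / 60 - total_minutes) < 1
# ...and halving the per-test cost only halves the wall-clock time.
assert abs(test_count * (avg_seconds_per_test / 2) / 60 - total_minutes / 2) < 1
```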

Instead, we are going to create a layered approach. We are going to freeze all of our existing tests, move them to an Integration Tests project. We will create a small suite of tests that cover just core stuff with RavenDB, and use that. Over time, we will be adding tests to the new test project. When that becomes too slow, we will have another migration.

What about the integration tests? Well, those will be run solely by our build server, and we will set things up so we can automatically test when running from our own forks, not the main code line.

time to read 2 min | 320 words

This seems to be a pretty common issue, with people getting relevancy and ordering confused. As an example, let us take the users in Stack Overflow:

[image: Stack Overflow users ordered by reputation]

Here, we want to get the users in order: all the users, in descending order of reputation.

But what happens when we want to do an actual search, for example, we want to get users by tag. Perhaps we want to get someone that knows some ravendb.

Here is the data that we have to work with:

[image: the users and their tags]

Now, when searching, we want to be able to do the following: find users that match the tags we specified, rank them by relevance, and have them show up in reputation order.

And that is where it kills us. Relevancy & order are pretty much exclusive. Before we can explain that, we need to understand that order is absolute, but relevancy is not. If I have 10,000 tags, there is very little meaning to me having a tag or not. But if I have 10 tags, me having a tag or not is a lot more important. You want to talk with an expert in a specific field, not just someone who is a jack of all trades.

Now, it might be that you want to apply some boost factor to users with high reputation, because there are people who are jack of all trades and master of most. That is the difference between boosting and ordering.

Ordering is absolute, while boosting is a factor applied against the relative relevancy of the current query.
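The distinction can be made concrete with a small Python sketch. The users, the relevance formula, and the logarithmic boost factor here are all illustrative assumptions, not how any particular search engine scores; the point is only the shape of the difference: ordering by reputation discards relevancy entirely, while boosting multiplies the relative relevancy score by a reputation-derived factor.

```python
import math

def relevance(user_tags, query_tags):
    """A tag match is worth more for a focused user (10 tags) than for a
    jack of all trades (10,000 tags) -- relevancy is relative."""
    matched = len(set(user_tags) & set(query_tags))
    return matched / len(user_tags) if user_tags else 0.0

specialist = {"name": "ann", "rep": 500,   "tags": ["ravendb", "nosql"]}
generalist = {"name": "bob", "rep": 90000,
              "tags": ["ravendb"] + [f"tag{i}" for i in range(99)]}
users = [specialist, generalist]
query = ["ravendb"]

# Ordering is absolute: reputation alone decides, relevancy is ignored.
by_reputation = sorted(users, key=lambda u: -u["rep"])
assert by_reputation[0]["name"] == "bob"

# Boosting scales the relative relevancy score by a reputation factor,
# so the focused expert can still beat the high-reputation generalist.
def boosted_score(u):
    return relevance(u["tags"], query) * (1 + math.log10(u["rep"]))

by_boost = sorted(users, key=boosted_score, reverse=True)
assert by_boost[0]["name"] == "ann"
```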

time to read 3 min | 480 words

Because RavenDB replication is async in nature, there is a period of time between when a write is committed on the master and when it is visible to clients reading from the secondaries.

A user requested that we provide a low latency solution to that. The idea was that the master server would report to the secondaries that a write happened, and the secondaries would then mark all reads for those documents as dirty, until replication caught up.

Implementation wise, all the pieces are already in place. We have the Changes API, which is an easy way to get change notifications from a database, and we have the ability to return a 204 Non Authoritative response, so it looks easy.

In theory, it sounds reasonable, but this idea just doesn’t hold water. Let us talk about normal operations. Even with the “low latency” notifications (and replication is about as low latency as it already gets), we have to deal with a window of time between the write completing on the master and the notification arriving on the secondaries. In fact, it is the exact same window as with replication. Sure, if you have a high replication load, that might be different, but those tend to be rare (high write load, very big documents, etc).

But let us assume that this is really the case. What about failures?

Let us assume servers A & B and client C. Client C makes a write to A, A notifies B, and when C reads from B, it gets a 204 response until A replicates to B. All nice & dandy. But what happens when A can’t talk to B? Remember, a server being down is the easiest scenario; the hard part is when both A & B are operational, but can’t talk to one another. RavenDB is designed to gracefully handle network splits and merges, so what would happen in this case?

Client C writes to A, but A can’t notify B or replicate to it. Client C reads from B, but since B got no notification about a change, it returns a 200 OK response, which claims that this is the latest version. Problem.
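The failure mode can be walked through with a toy simulation. This is a Python sketch, not RavenDB internals: `Node`, its `dirty` set, and the string status codes are hypothetical, but the scenario is exactly the one above: notifications travel the same network as replication, so when the link is down, the secondary has no idea it is stale.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.docs = {}       # doc_id -> version number
        self.dirty = set()   # doc_ids flagged stale by change notifications

    def write(self, doc_id, version):
        self.docs[doc_id] = version

    def notify(self, peer, doc_id, link_up):
        # The notification rides the same network as replication:
        # if the link is down, it simply never arrives.
        if link_up:
            peer.dirty.add(doc_id)

    def read(self, doc_id):
        # 204-style "non authoritative" only when we *know* of a newer version.
        if doc_id in self.dirty:
            return ("204 Non-Authoritative", self.docs.get(doc_id))
        return ("200 OK", self.docs.get(doc_id))

a, b = Node("A"), Node("B")
b.docs["users/1"] = 1        # both nodes start with version 1

# Healthy link: C writes to A, and B correctly reports its copy as stale.
a.write("users/1", 2)
a.notify(b, "users/1", link_up=True)
assert b.read("users/1")[0] == "204 Non-Authoritative"

# Network split: A can neither replicate nor notify, so B confidently lies.
b.dirty.clear()
a.write("users/1", 3)
a.notify(b, "users/1", link_up=False)
status, version = b.read("users/1")
assert status == "200 OK" and version == 1   # claims authoritative, but is stale
```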

In this case, this is actually a bigger problem than you might consider. If we support the notifications under the standard scenario, users will make assumptions about this. They will have separate code paths for non authoritative responses, for example. But as we have seen, we have a window of time where the reply would say it is authoritative even though it isn’t (a very short one, sure, but still), and under failure scenarios we will outright lie.

It is better not to have this “feature” at all, and let the user handle that on his own (and there are ways to handle that, reading from the master for important stuff, for example).

time to read 3 min | 563 words

RavenDB handles replication in an async manner. Let us say that you have 5 nodes in your cluster, set to use master/master replication.

That means that you call SaveChanges(), the value is saved to a node, and then replicated to the other nodes. But what happens when you have safety requirements? What happens if a node goes down after the call to SaveChanges() was completed, but before it replicates the information out?

In other systems, you have the ability to specify a W factor: to how many nodes this value must be written before it is considered “safe”. In RavenDB, we decided to go a similar route. Here is the code:

await session.StoreAsync(user);
await session.SaveChangesAsync(); // save to one of the nodes

var userEtag = session.Advanced.GetEtagFor(user);

var replicas = await store.Replication.WaitAsync(etag: userEtag, replicas: 1);

As you can see, we now have a way to actually wait until replication is completed. We will ping all of the replicas, waiting to see that replication has matched or exceeded the etag that we just wrote.  You can specify the number of replicas that are required for this to complete.

Practically speaking, you can specify a timeout, and if the nodes aren’t reachable, you will get an error about that.

This gives you the ability to handle write assurances very easily. And you can choose how to handle this, on a case by case basis (you care to wait for users to be created, but not for new comments, for example) or globally.
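The shape of that wait can be sketched in a few lines of Python. This is not the RavenDB client implementation; `wait_for_replication`, the callables standing in for pinging each replica, and the polling interval are all illustrative assumptions. It shows the contract described above: block until at least W replicas report an etag matching or exceeding the one we wrote, or raise on timeout if the nodes aren’t reachable or caught up in time.

```python
import time

def wait_for_replication(replicas, etag, min_replicas=1, timeout=2.0, poll=0.01):
    """Block until at least `min_replicas` report an etag >= the one we wrote.

    `replicas` is a list of zero-argument callables returning each node's
    last-replicated etag (a stand-in for pinging the actual servers)."""
    deadline = time.time() + timeout
    while True:
        caught_up = sum(1 for get_etag in replicas if get_etag() >= etag)
        if caught_up >= min_replicas:
            return caught_up
        if time.time() > deadline:
            raise TimeoutError(
                f"only {caught_up} of {len(replicas)} replicas reached etag {etag}")
        time.sleep(poll)

# Simulated cluster state: replica B has caught up past etag 7, C has not.
node_etags = {"B": 10, "C": 4}
replicas = [lambda: node_etags["B"], lambda: node_etags["C"]]

assert wait_for_replication(replicas, etag=7, min_replicas=1) == 1

node_etags["C"] = 9          # C catches up, so W=2 is now satisfiable
assert wait_for_replication(replicas, etag=7, min_replicas=2) == 2

try:                          # an unreachable etag times out with an error
    wait_for_replication(replicas, etag=99, min_replicas=1, timeout=0.05)
    assert False, "should have timed out"
except TimeoutError:
    pass
```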

time to read 2 min | 338 words

One of the things that we keep thinking about with RavenDB is how to make it easier for you to run in production.

To that end, we introduce a new feature in 2.5, Index Locking. This looks like this:

[image: the index locking options in the studio]

But what does this mean, to lock an index?

Well, let us consider a production system, in which you have the following index:

from u in docs.Users
select new
{
   Query = new[] { u.Name, u.Email, u.Email.Split('@') }
}

After you go to production, you realize that you actually needed to also include the FullName in the search queries. You can, obviously, do a full deployment from scratch, but it is generally so much easier to just fix the index definition on the production server, update the index definition in the codebase, and wait for the next deploy for them to match.

This works, except that in many cases, RavenDB applications call IndexCreation.CreateIndexes() on startup. Which means that on the next startup of your application, the change you just made will be reverted. This feature allows you to lock an index against changes, either by silently ignoring changes to the index, or by raising an error when someone tries to modify it.

It is important to note that this is not a security feature; you can unlock the index at any time. This is there to help operations, that is all.
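The two lock modes can be sketched with a toy model. This is a Python sketch under assumptions, not RavenDB code: `IndexStore`, `put_index`, and the mode names are hypothetical stand-ins for the behavior described above, where startup index creation either silently skips a locked index or fails loudly.

```python
UNLOCK, LOCKED_IGNORE, LOCKED_ERROR = "Unlock", "LockedIgnore", "LockedError"

class IndexStore:
    def __init__(self):
        self.indexes = {}    # name -> (definition, lock_mode)

    def put_index(self, name, definition):
        """Roughly what a startup-time CreateIndexes() call would do."""
        if name in self.indexes:
            current, mode = self.indexes[name]
            if mode == LOCKED_IGNORE:
                return current           # silently keep the production fix
            if mode == LOCKED_ERROR:
                raise RuntimeError(f"index '{name}' is locked")
        self.indexes[name] = (definition, UNLOCK)
        return definition

    def set_lock(self, name, mode):
        # Not a security feature: anyone can unlock at any time.
        definition, _ = self.indexes[name]
        self.indexes[name] = (definition, mode)

store = IndexStore()
store.put_index("Users/Search", "from u in docs.Users select new { u.Name }")

# Ops fixes the index directly in production, then locks it:
store.indexes["Users/Search"] = ("... also FullName ...", UNLOCK)
store.set_lock("Users/Search", LOCKED_IGNORE)

# The next deploy re-runs index creation with the stale definition -- ignored:
assert store.put_index("Users/Search", "old definition") == "... also FullName ..."

# In the stricter mode, the same attempt raises instead:
store.set_lock("Users/Search", LOCKED_ERROR)
try:
    store.put_index("Users/Search", "old definition")
    assert False, "should have raised"
except RuntimeError:
    pass
```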

time to read 2 min | 329 words

A while ago we introduced the ability to send js scripts to RavenDB for server side execution. And we have just recently completed a nice improvement on that feature, the ability to create new documents from existing ones.

Here is how it works:

store.DatabaseCommands.UpdateByIndex("TestIndex",
                                     new IndexQuery {Query = "Exported:false"},
                                     new ScriptedPatchRequest { Script = script }
  ).WaitForCompletion();

Where the script looks like this:

for(var i = 0; i < this.Comments.length; i++ ) {
   PutDocument('comments/', {
    Title: this.Comments[i].Title,
    User: this.Comments[i].User.Name,
    By: this.Comments[i].User.Id
  });
}

this.Exported = true;

This will create a new document for each of the embedded comments, and then flag the parent document as exported so the query doesn’t pick it up again.
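For clarity, here is the same transformation as a minimal Python sketch. It is not the server-side patching machinery: `extract_comments` and the `created` list are hypothetical, the `"comments/"` prefix stands in for PutDocument’s server-assigned key suffix, and the `Exported` flag name is assumed to match the field the query above filters on.

```python
def extract_comments(post, created):
    """Mirror of the patch script above: emit one new document per
    embedded comment, then flag the source document as exported."""
    for comment in post.get("Comments", []):
        created.append({
            "Id": "comments/",               # server would assign the suffix
            "Title": comment["Title"],
            "User": comment["User"]["Name"],
            "By": comment["User"]["Id"],
        })
    post["Exported"] = True
    return post

created = []
post = {"Comments": [
    {"Title": "Nice", "User": {"Name": "ann", "Id": "users/1"}},
    {"Title": "+1",   "User": {"Name": "bob", "Id": "users/2"}},
]}
extract_comments(post, created)

assert len(created) == 2 and created[0]["User"] == "ann"
assert post["Exported"] is True      # won't match Exported:false again
```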

time to read 2 min | 216 words

So I was diagnosing a customer problem, which required me to write the following:

[image: the map/reduce index used for the diagnosis]

This is working on a data set of about half a million records.

I took a peek at the stats and I saw this:

[image: the indexing statistics]

You can ignore everything before 03:23; that is the previous index run. I reset it to make sure that I have a clean test.

What you can see is that we start out with mapping & reducing values, and initially this is quite expensive. But very quickly we recognize that we are reducing a single value, and we switch strategies to a more efficient method, and suddenly we have very little cost involved here. In fact, you can see that the entire process took about 3 minutes from start to finish, and very quickly we got to the point where our bottleneck was actually the maps pushing data our way.

That is pretty cool.

time to read 1 min | 127 words

I was asked to comment on the current state of Rhino Mocks. The current codebase is located here: https://github.com/hibernating-rhinos/rhino-mocks

The last commit was 2 years ago. And I am no longer actively / passively monitoring the mailing list.

From my perspective, Rhino Mocks is done. Done in the sense that I don’t have any interest in extending it, done in the sense that I don’t really use mocking any longer.

If there is anyone in the community that wants to step in and take charge as the Rhino Mocks project leader, I would love that. Failing that, the code is there, it works quite nicely, but that is all I am going to be doing with this for the time being and the foreseeable future.
