RavenDB Sharding–Map/Reduce in a cluster

time to read 6 min | 1047 words

In my previous post, I introduced RavenDB Sharding and discussed how we can use sharding in RavenDB. We discussed both blind sharding and data driven sharding. Today I want to introduce another aspect of RavenDB Sharding. The usage of Map/Reduce to  gather information from multiple shards.

We start by defining a map/reduce index. In this case, we want to look at the invoice totals per date. We define the index like this:

public class InvoicesAmountByDate : AbstractIndexCreationTask<Invoice, InvoicesAmountByDate.ReduceResult>
{
    public class ReduceResult
    {
        public decimal Amount { get; set; }
        public DateTime IssuedAt { get; set; }
    }

    public InvoicesAmountByDate()
    {
        Map = invoices =>
              from invoice in invoices
              select new
              {
                  invoice.Amount,
                invoice.IssuedAt
              };

        Reduce = results =>
                 from result in results
                 group result by result.IssuedAt
                 into g
                 select new
                 {
                     Amount = g.Sum(x => x.Amount),
                    IssuedAt = g.Key
                 };
    }
}

And then we execute the following code:

using (var session = documentStore.OpenSession())
{
    var asian = new Company { Name = "Company 1", Region = "Asia" };
    session.Store(asian);
    var middleEastern = new Company { Name = "Company 2", Region = "Middle-East" };
    session.Store(middleEastern);
    var american = new Company { Name = "Company 3", Region = "America" };
    session.Store(american);

    session.Store(new Invoice { CompanyId = american.Id, Amount = 3, IssuedAt = DateTime.Today.AddDays(-1)});
    session.Store(new Invoice { CompanyId = asian.Id, Amount = 5, IssuedAt = DateTime.Today.AddDays(-1) });
    session.Store(new Invoice { CompanyId = middleEastern.Id, Amount = 12, IssuedAt = DateTime.Today });
    session.SaveChanges();
}

We use a three way sharding, based on the region of the company, so we actually have the following document sin three different servers:

First server, Asia:

image

Second server, Middle East:

image

Third server, America:

image

Now, let us see what happen when we use the map/reduce query:

using (var session = documentStore.OpenSession())
{
    var reduceResults = session.Query<InvoicesAmountByDate.ReduceResult, InvoicesAmountByDate>()
        .ToList();

    foreach (var reduceResult in reduceResults)
    {
        string dateStr = reduceResult.IssuedAt.ToString("MMM dd, yyyy", CultureInfo.InvariantCulture);
        Console.WriteLine("{0}: {1}", dateStr, reduceResult.Amount);
    }
    Console.WriteLine();
}

As you can see, again, we make no distinction in our code about using sharding, we just query it normally. The results, however, are quite interesting:

image

As you can see, we got the correct results, cluster wide.

RavenDB was able to query all the servers in the cluster for their results, reduce them again, and get us the total across all three servers.

And that, my friends, it truly awesome.