RavenDB is really great in aggregations. Even if you have a stupendous amount of information, it will do really well in crunching through the data and summarizing it for you. This is due to the way RavenDB implements map/reduce operations, it allows us to instantly give you aggregation results, regardless of data size. However, this approach requires that you’ll tell RavenDB up front how you want to do the aggregation. This allows RavenDB to do the work ahead of time, as you modify the data, instead of each time you query for it.
A question was raised in the mailing list. How can you use RavenDB to do arbitrary ranges during aggregation. For example, let’s say that we want to be able to look at the total charges for a customer, but slice it so we’ll have:
- Active – 7 days back
- Recent – 7 – 21 days back
- History – 21 – 60 days back
And each user can define they own time period for Active / Recent / History ranges. That complicate things somewhat, but luckily, RavenDB has multiple ways to solve this issue. First, let’s create the appropriate index for this.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
// map from order in docs.Orders select new { order.Company, order.OrderedAt.Date, Count = 1, Total = order.Lines.Sum(l => (l.Quantity * l.PricePerUnit) * (1 - l.Discount)) } // reduce from result in results group result by new{ result.Company , result.Date } into g select new { g.Key.Company, g.Key.Date, Count = g.Sum(x => x.Count), Total = g.Sum(x => x.Total) }
This isn’t really anything special. It will simply aggregate the data by company and by date. That isn’t enough for what we want to query. For that, we need to reach for another tool in our belt, facets.
Here is the relevant query:
And here is what the output looks like:
The idea is that we have two stages for the process. First, we use the map/reduce index to pre-aggregate the data at a daily level. Then we use facets to roll up the information to the level we desire. Instead of having to go through the raw data, we can operate on the partially aggregated data and dramatically reduce the overall cost.