It is the little things that trip you up: Reducing RavenDB’s Map/Reduce cost
After finally managing to get to Inbox Zero, I had the chance to tackle some problems that were on the back burner. One of the more acute ones was RavenDB performance for map/reduce indexes.
Standard indexes are actually very fast, especially since I just gave them an additional boost, but map/reduce indexes had two problems:
- The reduce operation is currently single threaded (for all indexes).
- The reduce operation blocks standard indexing.
In order to avoid that, we split the work so reduce would run in a separate thread from standard indexing operations. That done, I started to take a look at the actual cost of map/reduce operations.
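The actual scheduling code in RavenDB is more involved than this, but the shape of that split is easy to sketch: a dedicated background thread drains a queue of reduce work, so standard (map) indexing never waits on a reduce. The ReduceWorker type and its members below are hypothetical names, used only to illustrate the idea:

// Minimal sketch only; hypothetical types, not the actual RavenDB scheduler.
using System;
using System.Collections.Concurrent;
using System.Threading;

public class ReduceWorker : IDisposable
{
    private readonly BlockingCollection<Action> reduceTasks = new BlockingCollection<Action>();
    private readonly Thread thread;

    public ReduceWorker()
    {
        // Reduce work runs on its own thread, so enqueuing a reduce task
        // never blocks the thread doing standard (map) indexing.
        thread = new Thread(Work) { IsBackground = true, Name = "Reduce" };
        thread.Start();
    }

    public void Enqueue(Action reduceTask)
    {
        reduceTasks.Add(reduceTask);
    }

    private void Work()
    {
        // Drain the queue until CompleteAdding is called.
        foreach (var task in reduceTasks.GetConsumingEnumerable())
            task();
    }

    public void Dispose()
    {
        reduceTasks.CompleteAdding();
        thread.Join();
    }
}

The indexing side just calls Enqueue and moves on; the reduce thread picks the work up as it arrives.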
It quickly became apparent that while the map part was pretty fast, it was the reduce operation that was killing us.
After some narrowing down, I was able to figure out that this was the code at fault:
public override void Execute(WorkContext context)
{
    if (ReduceKeys.Length == 0)
        return;

    var viewGenerator = context.IndexDefinitionStorage.GetViewGenerator(Index);
    if (viewGenerator == null)
        return; // deleted view?

    context.TransactionaStorage.Batch(actions =>
    {
        IEnumerable<object> mappedResults = null;
        foreach (var reduceKey in ReduceKeys)
        {
            IEnumerable<object> enumerable = actions.MappedResults
                .GetMappedResults(Index, reduceKey, MapReduceIndex.ComputeHash(Index, reduceKey))
                .Select(JsonToExpando.Convert);

            if (mappedResults == null)
                mappedResults = enumerable;
            else
                mappedResults = mappedResults.Concat(enumerable);
        }

        context.IndexStorage.Reduce(Index, viewGenerator, mappedResults, context, actions, ReduceKeys);
    });
}
Can you see the problem?
My first thought was that we had a problem with the code inside the foreach, since it effectively generates something like:
select * from MappedResults where Index = "MapReduceTestIndex" and ReduceKey = "Oren"
select * from MappedResults where Index = "MapReduceTestIndex" and ReduceKey = "Ayende"
select * from MappedResults where Index = "MapReduceTestIndex" and ReduceKey = "Arava"
And usually you’ll have about 2,500 of those.
Indeed, I modified the code to look like this:
public override void Execute(WorkContext context)
{
    if (ReduceKeys.Length == 0)
        return;

    var viewGenerator = context.IndexDefinitionStorage.GetViewGenerator(Index);
    if (viewGenerator == null)
        return; // deleted view?

    context.TransactionaStorage.Batch(actions =>
    {
        IEnumerable<object> mappedResults = new object[0];
        foreach (var reduceKey in ReduceKeys)
        {
            IEnumerable<object> enumerable = actions.MappedResults
                .GetMappedResults(Index, reduceKey, MapReduceIndex.ComputeHash(Index, reduceKey))
                .Select(JsonToExpando.Convert);

            mappedResults = mappedResults.Concat(enumerable);
        }

        var sp = Stopwatch.StartNew();
        Console.WriteLine("Starting to read {0} reduce keys", ReduceKeys.Length);
        var results = mappedResults.ToArray();
        Console.WriteLine("Read {0} reduce keys in {1} with {2} results", ReduceKeys.Length, sp.Elapsed, results.Length);

        context.IndexStorage.Reduce(Index, viewGenerator, results, context, actions, ReduceKeys);
    });
}
And got the following:
Starting to read 2470 reduce keys
Read 2470 reduce keys in 00:57:57.5292856 with 2499 results
Yes, reading the results for 2,470 reduce keys took almost an hour!
I started planning how to fix this by moving to what is effectively an “IN” approach, when I realized that I had skipped a very important step: I hadn’t run this through the profiler. And as we know, when we are talking about performance work, if it hasn’t been run through the profiler, it isn’t real.
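For reference, the “IN” approach I had in mind would have collapsed the per-key queries shown earlier into a single batched lookup, conceptually something like this (again pseudo-SQL, not an actual RavenDB query):

select * from MappedResults where Index = "MapReduceTestIndex" and ReduceKey in ("Oren", "Ayende", "Arava")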
And the profiler led me to this method:
public IEnumerable<JObject> GetMappedResults(string view, string reduceKey, byte[] viewAndReduceKeyHashed)
{
    Api.JetSetCurrentIndex(session, MappedResults, "by_reduce_key_and_view_hashed");
    Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
    if (Api.TrySeek(session, MappedResults, SeekGrbit.SeekEQ) == false)
        yield break;

    Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
    Api.JetSetIndexRange(session, MappedResults, SetIndexRangeGrbit.RangeUpperLimit | SetIndexRangeGrbit.RangeInclusive);

    if (Api.TryMoveFirst(session, MappedResults) == false)
        yield break;

    do
    {
        // we need to check that we don't have hash collisions
        var currentReduceKey = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["reduce_key"]);
        if (currentReduceKey != reduceKey)
            continue;

        var currentView = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["view"]);
        if (currentView != view)
            continue;

        yield return Api.RetrieveColumn(session, MappedResults, tableColumnsCache.MappedResultsColumns["data"]).ToJObject();
    } while (Api.TryMoveNext(session, MappedResults));
}
You might have noticed the part that stands out: yes, we actually had an O(n) algorithm here (effectively a table scan) that was absolutely meaningless and didn’t need to be there. I am pretty sure that it was test code that I wrote (yes, I ran git blame on it and ordered some crow for tomorrow’s lunch).
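In case the offending line isn’t jumping out at you: after the seek and the index range setup, the TryMoveFirst call moves the cursor back to the first entry of the index, so the do/while loop ends up walking essentially the whole table instead of just the few entries that match the hash. Below is a sketch of the method with that call removed; this is my reconstruction rather than the exact code that shipped, but it shows the intended shape: seek, bound the range, and iterate only over the matching entries.

public IEnumerable<JObject> GetMappedResults(string view, string reduceKey, byte[] viewAndReduceKeyHashed)
{
    Api.JetSetCurrentIndex(session, MappedResults, "by_reduce_key_and_view_hashed");
    Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
    if (Api.TrySeek(session, MappedResults, SeekGrbit.SeekEQ) == false)
        yield break;

    // the seek already positioned the cursor on the first matching entry,
    // so we only constrain the upper end of the range and iterate forward
    Api.MakeKey(session, MappedResults, viewAndReduceKeyHashed, MakeKeyGrbit.NewKey);
    Api.JetSetIndexRange(session, MappedResults, SetIndexRangeGrbit.RangeUpperLimit | SetIndexRangeGrbit.RangeInclusive);

    do
    {
        // we still need to check that we don't have hash collisions
        var currentReduceKey = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["reduce_key"]);
        if (currentReduceKey != reduceKey)
            continue;

        var currentView = Api.RetrieveColumnAsString(session, MappedResults, tableColumnsCache.MappedResultsColumns["view"]);
        if (currentView != view)
            continue;

        yield return Api.RetrieveColumn(session, MappedResults, tableColumnsCache.MappedResultsColumns["data"]).ToJObject();
    } while (Api.TryMoveNext(session, MappedResults));
}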
Once I removed that, things changed for the better. To give you an idea of how much, take a look at the new results:
Starting to read 2470 reduce keys
Read 2470 reduce keys in 00:00:00.4907501 with 2499 results
And just for fun, I also timed how long it takes to reduce each batch of results:
Indexed 2470 reduce keys in 00:00:01.0533390 with 2499 results
And that is without doing any actual optimizations; that is just from removing the brain-dead code that had no business being there.
Comments
"After some narrowing down..."
That is what confuses me... how did you narrow it down without a profiler?
When the app is under full load, pressing pause in the debugger a few times has proven to be a very effective hot spot detector.
Frank,
I ran that part in isolation in a console app and took measurements using Stopwatch.
I had to do that before I could run the profiler, because otherwise the profiler output would be too distracting to be of real help.