Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 449 words

This is a bit of a side track. One of the things that is quite clear to me when I am reading the leveldb code is that I was never really any good at C++. I was a C/C++ developer. And that is a pretty derogatory term. C & C++ share a lot of the same syntax and underlying assumptions, but the moment you want to start writing non-trivial stuff, they are quite different. And no, I am not talking about OO or templates.

I am talking about things that came out of that. In particular, throughout the leveldb codebase, it very rarely, if at all, allocates memory directly. Pretty much the whole codebase relies on std::string to handle buffer allocations and management. This makes sense, since RAII is still the watchword for good C++ code. Being able to utilize std::string for memory management also means that the memory will be properly released without having to deal with it explicitly.

More interestingly, the leveldb codebase also uses std::string as a general purpose buffer. I wonder why it is std::string vs. std::vector<char>, which would be more reasonable, but I guess that this is because most of the time, users will want to pass strings as keys, and this is likely easier to manage, given the type of operations available on std::string (such as append).

It is actually quite fun to go over the codebase and discover those sorts of things. Especially if I can figure them out on my own.

This is quite interesting, because from my point of view, buffers are a whole different set of problems. We don’t have to worry about the memory just going away in .NET (although we do have to worry about someone changing the buffer behind our backs), but we have to worry a lot about buffer size. This is because at some point (85KB), buffers graduate to the large object heap, and stay there. Which means, in turn, that every time you want to deal with buffers, you have to take that into account, usually with a buffer pool.
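To make that concrete, here is a minimal sketch of the kind of buffer pool I have in mind (my own illustration, not code from RavenDB):

using System.Collections.Concurrent;

public class BufferPool
{
    // Deliberately above the ~85KB threshold: the buffers land on the LOH
    // once, and we then reuse them instead of churning the large object heap.
    private const int BufferSize = 128 * 1024;
    private readonly ConcurrentBag<byte[]> _free = new ConcurrentBag<byte[]>();

    public byte[] Rent()
    {
        byte[] buffer;
        return _free.TryTake(out buffer) ? buffer : new byte[BufferSize];
    }

    public void Return(byte[] buffer)
    {
        if (buffer.Length == BufferSize)
            _free.Add(buffer);
    }
}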

Another aspect that is interesting with regard to memory usage is the explicit handling of copying. There are various places in the code where the copy constructor was made private, to avoid accidental copies. Or a comment is left noting that a type was intentionally made copyable. I get the reason why, because this is a common failure point in C++, but I forgot (although I am pretty sure that I used to know) the actual semantics of when/how you want to do that in all cases.

time to read 26 min | 5198 words

One of the key external components of leveldb is the idea of WriteBatch. It allows you to batch multiple operations into a single atomic write.

It looks like this, from an API point of view:

leveldb::WriteBatch batch;
batch.Delete(key1);
batch.Put(key2, value);
s = db->Write(leveldb::WriteOptions(), &batch);

As we have learned in the previous post, WriteBatch is how leveldb handles all writes. Internally, any call to Put or Delete is translated into a single WriteBatch, then there is some batching involved across multiple batches, but that is beside the point right now.

I dove into the code for WriteBatch, and I immediately realized that this isn’t really what I bargained for. In my mind, WriteBatch was supposed to be something like this:

public class WriteBatch
{
   List<Operation> Operations;
}

Which would hold the in memory operations until they get written down to disk, or something.

Instead, it appears that leveldb took quite a different route. The entire data is stored in the following format:

// WriteBatch::rep_ :=
//    sequence: fixed64
//    count: fixed32
//    data: record[count]
// record :=
//    kTypeValue varstring varstring         |
//    kTypeDeletion varstring
// varstring :=
//    len: varint32
//    data: uint8[len]

This is the in-memory value, mind. So we are already storing all of this in a single flat buffer. I am not really sure why this is the case, to be honest.
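To make the format above concrete, here is a rough C# sketch of how Put and Delete append records to that single buffer (my paraphrase of the idea, not leveldb’s actual code; the tag values mirror leveldb’s kTypeDeletion = 0 and kTypeValue = 1):

using System.IO;

public class WriteBatchSketch
{
    private const byte TypeDeletion = 0, TypeValue = 1;
    private readonly MemoryStream _rep = new MemoryStream();

    public WriteBatchSketch()
    {
        // Header: sequence (fixed64) + count (fixed32) = 12 bytes, filled in later.
        _rep.Write(new byte[12], 0, 12);
    }

    public void Put(byte[] key, byte[] value)
    {
        _rep.WriteByte(TypeValue);
        WriteVarString(key);
        WriteVarString(value);
    }

    public void Delete(byte[] key)
    {
        _rep.WriteByte(TypeDeletion);
        WriteVarString(key);
    }

    private void WriteVarString(byte[] data)
    {
        // varstring := len as varint32, followed by the raw bytes.
        uint len = (uint)data.Length;
        while (len >= 0x80)
        {
            _rep.WriteByte((byte)(len | 0x80));
            len >>= 7;
        }
        _rep.WriteByte((byte)len);
        _rep.Write(data, 0, data.Length);
    }
}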

WriteBatch is pretty much a write-only data structure, with one major exception:

// Support for iterating over the contents of a batch.
class Handler {
 public:
  virtual ~Handler();
  virtual void Put(const Slice& key, const Slice& value) = 0;
  virtual void Delete(const Slice& key) = 0;
};
Status Iterate(Handler* handler) const;

You can iterate over the batch. The problem is that we now have this implementation for Iterate:

Status WriteBatch::Iterate(Handler* handler) const {
  Slice input(rep_);
  if (input.size() < kHeader) {
    return Status::Corruption("malformed WriteBatch (too small)");
  }

  input.remove_prefix(kHeader);
  Slice key, value;
  int found = 0;
  while (!input.empty()) {
    found++;
    char tag = input[0];
    input.remove_prefix(1);
    switch (tag) {
      case kTypeValue:
        if (GetLengthPrefixedSlice(&input, &key) &&
            GetLengthPrefixedSlice(&input, &value)) {
          handler->Put(key, value);
        } else {
          return Status::Corruption("bad WriteBatch Put");
        }
        break;
      case kTypeDeletion:
        if (GetLengthPrefixedSlice(&input, &key)) {
          handler->Delete(key);
        } else {
          return Status::Corruption("bad WriteBatch Delete");
        }
        break;
      default:
        return Status::Corruption("unknown WriteBatch tag");
    }
  }
  if (found != WriteBatchInternal::Count(this)) {
    return Status::Corruption("WriteBatch has wrong count");
  } else {
    return Status::OK();
  }
}

So we write it directly to a buffer, then read from that buffer. The interesting bit is that the actual writing to leveldb itself is done in a similar way, see:

class MemTableInserter : public WriteBatch::Handler {
 public:
  SequenceNumber sequence_;
  MemTable* mem_;

  virtual void Put(const Slice& key, const Slice& value) {
    mem_->Add(sequence_, kTypeValue, key, value);
    sequence_++;
  }
  virtual void Delete(const Slice& key) {
    mem_->Add(sequence_, kTypeDeletion, key, Slice());
    sequence_++;
  }
};

Status WriteBatchInternal::InsertInto(const WriteBatch* b,
                                      MemTable* memtable) {
  MemTableInserter inserter;
  inserter.sequence_ = WriteBatchInternal::Sequence(b);
  inserter.mem_ = memtable;
  return b->Iterate(&inserter);
}

As far as I can figure it so far, we have the following steps:

  • WriteBatch.Put / WriteBatch.Delete gets called, and the values we were sent are copied into our buffer.
  • We actually save the WriteBatch, at which point we unpack the values out of the buffer and into the memtable.

It took me a while to figure it out, but I think that I finally got it. The reason this is the case is that leveldb is a C++ application. As such, memory management is something that it needs to worry about explicitly.

In particular, you can’t just rely on the memory you were passed staying around; the user may release that memory right after calling Put. This means, in turn, that you must copy the values into memory that leveldb allocated, so leveldb can manage its own lifetimes. This is a foreign concept to me, because it is such a strange thing to do in .NET land, where memory cannot just disappear underneath you.
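In C# terms, the equivalent defensive move would look something like this (my illustration, not actual leveldb or RavenDB code):

using System.Collections.Generic;

public class DefensiveStore
{
    private readonly Dictionary<byte[], byte[]> _store = new Dictionary<byte[], byte[]>();

    public void Put(byte[] key, byte[] value)
    {
        // The caller may reuse or free its buffers the moment we return,
        // so we copy them into memory that we own and manage ourselves.
        _store[(byte[])key.Clone()] = (byte[])value.Clone();
    }
}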

In my next post, I’ll deal a bit more with this aspect: buffer management and memory handling in general.

time to read 27 min | 5257 words

I think that the very first thing that we want to do is to discover how exactly leveldb saves the information to disk. In order to do that, we are going to trace the calls (with commentary) for the Put method.

We start from the client code:

leveldb::DB* db;
leveldb::DB::Open(options, "play/testdb", &db);
status = db->Put(leveldb::WriteOptions(), "Key", "Hello World");

This calls the following method:

// Default implementations of convenience methods that subclasses of DB
// can call if they wish
Status DB::Put(const WriteOptions& opt, const Slice& key, const Slice& value) {
  WriteBatch batch;
  batch.Put(key, value);
  return Write(opt, &batch);
}

Status DB::Delete(const WriteOptions& opt, const Slice& key) {
  WriteBatch batch;
  batch.Delete(key);
  return Write(opt, &batch);
}

I included the Delete method as well, because this code teaches us something important: all the modification calls go through the same Write call, using a WriteBatch. Let us look at that now.

   1: Status DBImpl::Write(const WriteOptions& options, WriteBatch* my_batch) {
   2:   Writer w(&mutex_);
   3:   w.batch = my_batch;
   4:   w.sync = options.sync;
   5:   w.done = false;
   6:  
   7:   MutexLock l(&mutex_);
   8:   writers_.push_back(&w);
   9:   while (!w.done && &w != writers_.front()) {
  10:     w.cv.Wait();
  11:   }
  12:   if (w.done) {
  13:     return w.status;
  14:   }
  15:  
  16:   // May temporarily unlock and wait.
  17:   Status status = MakeRoomForWrite(my_batch == NULL);
  18:   uint64_t last_sequence = versions_->LastSequence();
  19:   Writer* last_writer = &w;
  20:   if (status.ok() && my_batch != NULL) {  // NULL batch is for compactions
  21:     WriteBatch* updates = BuildBatchGroup(&last_writer);
  22:     WriteBatchInternal::SetSequence(updates, last_sequence + 1);
  23:     last_sequence += WriteBatchInternal::Count(updates);
  24:  
  25:     // Add to log and apply to memtable.  We can release the lock
  26:     // during this phase since &w is currently responsible for logging
  27:     // and protects against concurrent loggers and concurrent writes
  28:     // into mem_.
  29:     {
  30:       mutex_.Unlock();
  31:       status = log_->AddRecord(WriteBatchInternal::Contents(updates));
  32:       if (status.ok() && options.sync) {
  33:         status = logfile_->Sync();
  34:       }
  35:       if (status.ok()) {
  36:         status = WriteBatchInternal::InsertInto(updates, mem_);
  37:       }
  38:       mutex_.Lock();
  39:     }
  40:     if (updates == tmp_batch_) tmp_batch_->Clear();
  41:  
  42:     versions_->SetLastSequence(last_sequence);
  43:   }
  44:  
  45:   while (true) {
  46:     Writer* ready = writers_.front();
  47:     writers_.pop_front();
  48:     if (ready != &w) {
  49:       ready->status = status;
  50:       ready->done = true;
  51:       ready->cv.Signal();
  52:     }
  53:     if (ready == last_writer) break;
  54:   }
  55:  
  56:   // Notify new head of write queue
  57:   if (!writers_.empty()) {
  58:     writers_.front()->cv.Signal();
  59:   }
  60:  
  61:   return status;
  62: }

Now we have a lot of code to go through. Let us see what conclusions we can draw from this.

The first 15 lines or so seem to create a new Writer, not sure what that is yet, and register it in a member variable. Maybe the write is actually being done on a separate thread?

I am going to switch over and follow that line of thinking. The first thing to do is to look at the Writer implementation, which looks like this:

struct DBImpl::Writer {
  Status status;
  WriteBatch* batch;
  bool sync;
  bool done;
  port::CondVar cv;

  explicit Writer(port::Mutex* mu) : cv(mu) { }
};

So this is just a data structure with no behavior. Note the CondVar member, whatever that is, which accepts a mutex. Following the code, we see that this is a pthread condition variable. I haven’t dug too deep into this, but it appears to be similar to .NET’s Monitor, except that you can associate multiple condition variables with a single mutex. That could be a useful way to signal on specific conditions: the basic idea is that you can wait for a specific condition, not just on the lock itself.

Now that I get that, let us see what we can figure out about the writers_ usage. This is just a standard (non thread safe) std::deque (a data structure merging the properties of a list & a queue). Thread safety is achieved via the MutexLock call on line 7. I am going to continue ignoring the rest of the function and look at where else this value is being used. Back now, and it appears that the only places where writers_ is used are this method and methods that it calls.

What this means, in turn, is that unlike what I thought, there isn’t a dedicated background thread for this operation. Rather, this is a way for leveldb to serialize access, as I understand it. Calls to the Write() method block on the mutex, then each waits until its write is the current one (that is what the &w != writers_.front() check means). The code also seems to suggest that another thread may pick up on this and batch multiple writes to disk at the same time. We will discuss this later on.
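In .NET terms, the closest analogue I can think of is Monitor.Wait / Monitor.Pulse. Here is a sketch of the same queue-the-writers idea (my approximation, not a port of the actual code):

using System.Collections.Generic;
using System.Threading;

public class SerializedWriter
{
    private readonly object _mutex = new object();
    private readonly LinkedList<object> _writers = new LinkedList<object>();

    public void Write(object writer)
    {
        lock (_mutex)
        {
            _writers.AddLast(writer);
            // Mirrors the `while (!w.done && &w != writers_.front())` loop:
            // wait (releasing the lock) until we reach the front of the queue.
            while (_writers.First.Value != writer)
                Monitor.Wait(_mutex);

            // ... perform the actual write here ...

            _writers.RemoveFirst();
            Monitor.PulseAll(_mutex); // wake the next writer in line
        }
    }
}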

Right now, let us move to line 17, and MakeRoomForWrite. This appears to try to make sure that we have enough room for the next write. I don’t really follow the code there yet, so I’ll ignore it for now and move on to the rest of the Write() method.

On line 18, we get the current sequence number, although I am not sure why that is yet; I think it is possible this is for the log. The next interesting bit is BuildBatchGroup: this method merges the existing pending writes into one big write (but not too big a write). This is a really nice way to merge a lot of IO into a single disk access, without introducing latency in the common case.
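The gist of BuildBatchGroup, sketched in C# (the names and the size cap are invented for illustration):

using System.Collections.Generic;
using System.Linq;

public static class BatchGrouping
{
    // Fold the queued writers' batches into one group, but stop growing it
    // once it gets big, so one huge write does not delay everyone behind it.
    public static List<byte[]> BuildBatchGroup(Queue<List<byte[]>> pending, int maxSize = 1 << 20)
    {
        var group = new List<byte[]>();
        var size = 0;
        while (pending.Count > 0)
        {
            var batch = pending.Peek();
            var batchSize = batch.Sum(record => record.Length);
            if (group.Count > 0 && size + batchSize > maxSize)
                break; // always take at least the first batch
            group.AddRange(batch);
            size += batchSize;
            pending.Dequeue();
        }
        return group;
    }
}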

The rest of the code deals with the actual write to the log / memtable (lines 20 – 45), then updates the status of the other writers whose batches we might have folded in, as well as waking up any remaining writers that did not get into the current batch.

And I think that this is enough for now. We haven’t gotten to disk yet, I admit, but we did get a lot of stuff done. In my next post, I’ll dig even deeper, and try to see how the data is actually structured. I think that this would be interesting…

time to read 2 min | 285 words

LevelDB is…

a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.

That is the project’s own definition. Basically, it is a way for users to store data in an efficient manner. It isn’t a SQL database. It isn’t even a real database in any sense of the word. What it is is a building block for building databases. It handles writing and reading to disk, and it supports atomicity. But anything else is on you (from transaction management to more complex items).

As such, it appears perfect for the kind of things that we need to do. I decided that I wanted to get to know the codebase, especially since at this time, I can’t even get it to compile. The fact that this is a C++ codebase, written by people who eat & breathe C++ for a living, is another reason why. I expect that this would be a good codebase, so I might as well sharpen my C++ foo at the same time that I grok what this is doing.

The first thing to do is to look at the interface that the database provides us with:

[Image: the leveldb DB public interface]
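Paraphrasing that interface from memory, in C# terms (simplified, and not the actual C++ signatures), it boils down to roughly this:

using System.Collections.Generic;

public interface ILevelDbSketch
{
    bool Put(byte[] key, byte[] value);                  // Status Put(WriteOptions, Slice, Slice)
    bool Delete(byte[] key);                             // Status Delete(WriteOptions, Slice)
    bool Write(object writeBatch);                       // Status Write(WriteOptions, WriteBatch*)
    bool TryGet(byte[] key, out byte[] value);           // Status Get(ReadOptions, Slice, std::string*)
    IEnumerable<KeyValuePair<byte[], byte[]>> Iterate(); // Iterator* NewIterator(ReadOptions)
}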

That is a very small surface area, and as you can imagine, this is something that I highly approve of. It makes it much easier to understand and reason about. And there is some pretty complex behavior behind it, which I’ll be exploring soon.

time to read 15 min | 2876 words

There are two things that I would change in the RavenBurgerCo sample app.

The first would be session management; I dislike code like this:

[Image: controller code opening a RavenDB session inline]

I would much rather do that in a base controller and avoid manual session management. But that is mostly a design choice, and it ain’t really that important.
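What I have in mind is the usual base controller pattern, something like this sketch (my code, not from the sample; it assumes the store is exposed as MvcApplication.DocumentStore, as elsewhere in the app):

using System.Web.Mvc;
using Raven.Client;

public abstract class RavenController : Controller
{
    protected IDocumentSession Session { get; private set; }

    protected override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        // One session per request, opened before the action runs...
        Session = MvcApplication.DocumentStore.OpenSession();
        base.OnActionExecuting(filterContext);
    }

    protected override void OnActionExecuted(ActionExecutedContext filterContext)
    {
        // ...and saved & disposed after the action completes successfully.
        using (Session)
        {
            if (filterContext.Exception == null)
                Session.SaveChanges();
        }
        base.OnActionExecuted(filterContext);
    }
}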

But what is important is the number of indexes that the application uses. We have:

  • LocationIndex
  • DeliveryIndex
  • DriveThruIndex

And I am not really sure that we need all three. In fact, I am pretty sure that we don’t. What we can do is merge them all into a single index. I am pretty sure that the reason there were three of them is that there was a bug in RavenDB that made it error if you gave it a null WKT (vs. just recognizing this as a valid opt out). I fixed that bug, but even with that issue in place, we can get things working:

public class SpatialIndex : AbstractIndexCreationTask<Restaurant>
{
    public SpatialIndex()
    {
        Map = restaurants =>
              from restaurant in restaurants
              select new
                  {
                      _ = SpatialGenerate(restaurant.Latitude, restaurant.Longitude),
                      __ = restaurant.DriveThruArea == null ?
                                    new object[0] :
                                    SpatialGenerate("drivethru", restaurant.DriveThruArea),
                      ___ = restaurant.DeliveryArea == null ?
                                    new object[0] :
                                    SpatialGenerate("delivery", restaurant.DeliveryArea)
                  };
    }
}

And from there, it is just a matter of updating the queries, which now look like the following:

Getting the restaurants near my location (for Eat In page):

return session.Query<Restaurant, SpatialIndex>()
    .Customize(x =>
                   {
                       x.WithinRadiusOf(25, latitude, longitude);
                       x.SortByDistance();
                   })
    .Take(250)
    .Select( ... );

Getting the restaurants that deliver to my location (Delivery page):

return session.Query<Restaurant, SpatialIndex>()
    .Customize(x => x.RelatesToShape("delivery", point, SpatialRelation.Intersects))
    // SpatialRelation.Contains is not supported
    // SpatialRelation.Intersects is OK because we are using a point as the query parameter
    .Take(250)
    .Select( ... );

Getting the restaurants inside a particular rectangle (Map page):

return session.Query<Restaurant, SpatialIndex>()
    .Customize(x => x.RelatesToShape(Constants.DefaultSpatialFieldName, rectangle, SpatialRelation.Within))
    .Take(512)
    .Select( ... );

Note that we use DefaultSpatialFieldName, instead of indexing the location twice.

And finally, getting the restaurants that are applicable for drive through for my route (Drive Thru page):

return session.Query<Restaurant, SpatialIndex>()
    .Customize(x => x.RelatesToShape("drivethru", lineString, SpatialRelation.Intersects))
    .Take(512)
    .Select( ... );

And that is that.

Really great project, and quite amazing, both client & server code. It is simple, it is elegant and it is effective. Well done Simon!

time to read 82 min | 16287 words

This is a review of RavenBurgerCo, created by Simon Bartlett as a sample app for RavenDB’s spatial support. This is by no means an unbiased review, if only because I laughed out loud (and crazily) when I saw the first page:

[Image: the RavenBurgerCo home page]

What is this about?

Raven Burger Co is a chain of fast food restaurants, based in the United Kingdom. Their speciality is burgers made with raven meat. All their restaurants offer eat-in/take-out service, while some offer home delivery, and others offer a drive thru service.

This sample application is their online restaurant locator.

Good things about this project? Here is how you get started:

  1. Clone this repository
  2. Open the solution in Visual Studio 2012
  3. Press F5
  4. Play!

And it actually works! It uses embeddable RavenDB to make it stupid-easy to run, right out of the box.

We will start this review by looking at the infrastructure for this project, starting, as usual, from Global.asax:

[Image: the Global.asax startup code]

Let us see how RavenDB is setup:

public static void ConfigureRaven(MvcApplication application)
{
    var store = new EmbeddableDocumentStore
                        {
                            DataDirectory = "~/App_Data/Database",
                            UseEmbeddedHttpServer = true
                        };

    store.Initialize();
    MvcApplication.DocumentStore = store;

    IndexCreation.CreateIndexes(typeof(MvcApplication).Assembly, store);

    var statistics = store.DatabaseCommands.GetStatistics();

    if (statistics.CountOfDocuments < 5)
        using (var bulkInsert = store.BulkInsert())
            LoadRestaurants(application.Server.MapPath("~/App_Data/Restaurants.csv"), bulkInsert);
}

So use embedded RavenDB, and if there isn’t enough data in the db, load the default data set using RavenDB’s new Bulk Insert feature.

Note that we set the MvcApplication.DocumentStore property; let us see how this is used.

Simon did a really nice thing here. Note that UseEmbeddedHttpServer is set to true, which means that RavenDB will find an open port and use it; this is then exposed in the UI:

[Image: the UI link to the embedded RavenDB studio]

So you can click on the link and land right in the studio for your embedded database, which gives you the ability to view, debug & modify how things are actually going. This is a really nice way to expose it.

Now, let us move to the actual project code itself. Exploring the options in this project, we have map browsing:

[Image: the map browsing page]

And here I have to admit ignorance. I have no idea how to use maps, so this is quite nice for me, something new to learn. The core of this page is this script:

$(function () {

    var gmapLayer = new L.Google('ROADMAP');
    var resultsLayer = L.layerGroup();

    var map = L.map('map', {
        layers: [gmapLayer, resultsLayer],
        center: [51.4775, -0.461389],
        zoom: 12,
        maxBounds: L.latLngBounds([49, 15], [60, -25])
    });

    var loadMarkers = function() {
        if (map.getZoom() > 9) {
            var bounds = map.getBounds();
            $.get('/api/restaurants', {
                north: bounds.getNorthWest().lat,
                east: bounds.getSouthEast().lng,
                south: bounds.getSouthEast().lat,
                west: bounds.getNorthWest().lng,
            }).done(function(restaurants) {
                resultsLayer.clearLayers();
                $.each(restaurants, function(index, value) {
                    var marker = L.marker([value.Latitude, value.Longitude])
                        .bindPopup(
                            '<p><strong>' + value.Name + '</strong><br />' +
                                value.Street + '<br />' +
                                value.City + '<br />' +
                                value.PostCode + '<br />' +
                                value.Phone + '</p>'
                        );
                    resultsLayer.addLayer(marker);
                });
            });
        } else {
            resultsLayer.clearLayers();
        }
    };

    loadMarkers();
    map.on('moveend', loadMarkers);
});

You can see the loadMarkers method, which gets called on startup and whenever the map is moved. It ends up calling this method on the server, with the boundaries of the visible map:

public IEnumerable<object> Get(double north, double east, double west, double south)
{
    var rectangle = string.Format(CultureInfo.InvariantCulture, "{0:F6} {1:F6} {2:F6} {3:F6}", west, south, east, north);

    using (var session = MvcApplication.DocumentStore.OpenSession())
    {
        return session.Query<Restaurant, LocationIndex>()
            .Customize(x => x.RelatesToShape("location", rectangle, SpatialRelation.Within))
            .Take(512)
            .Select(x => new
                            {
                                x.Name,
                                x.Street,
                                x.City,
                                x.PostCode,
                                x.Phone,
                                x.Delivery,
                                x.DriveThru,
                                x.Latitude,
                                x.Longitude
                            })
            .ToList();
    }
}

Note that in this case, we are doing a search for items inside the rectangle. But the search options are a bit funky: you have to send the data in WKT format. Luckily, Simon has already created a better solution (in this case, he is using the long hand method to make sure that we all understand what he is doing). The better method would be to use his Geo library, in which case the code would look like:

.Geo("location", x => x.RelatesToShape(new Rectangle(west, south, east, north), SpatialRelation.Within))

So that was the map; now let us look at another example, the Eat In page. In that case, we are looking for restaurants near our location, to figure out where to eat. It looks like this:

[Image: the Eat In page, showing restaurants within the search circles]

Right in the bull’s eye!

Here is the server side code:

public IEnumerable<object> Get(double latitude, double longitude)
{
    using (var session = MvcApplication.DocumentStore.OpenSession())
    {
        return session.Query<Restaurant, LocationIndex>()
            .Customize(x =>
                           {
                               x.WithinRadiusOf(25, latitude, longitude);
                               x.SortByDistance();
                           })
            .Take(250)
            .Select(x => new
                            {
                                x.Name,
                                x.Street,
                                x.City,
                                x.PostCode,
                                x.Phone,
                                x.Delivery,
                                x.DriveThru,
                                x.Latitude,
                                x.Longitude
                            })
            .ToList();
    }
}

And on the client side, we just do the following:

$('#location').change(function () {
    var latlng = $('#location').locationSelector('val');

    var outerCircle = L.circle(latlng, 25000, { color: '#ff0000', fillOpacity: 0 });
    map.fitBounds(outerCircle.getBounds());

    resultsLayer.clearLayers();
    resultsLayer.addLayer(outerCircle);
    resultsLayer.addLayer(L.circle(latlng, 15000, { color: '#ff0000', fillOpacity: 0.1 }));
    resultsLayer.addLayer(L.circle(latlng, 10000, { color: '#ff0000', fillOpacity: 0.3 }));
    resultsLayer.addLayer(L.circle(latlng, 5000, { color: '#ff0000', fillOpacity: 0.5 }));
    resultsLayer.addLayer(L.circleMarker(latlng, { color: '#ff0000', fillOpacity: 1, opacity: 1 }));

    $.get('/api/restaurants', {
        latitude: latlng[0],
        longitude: latlng[1]
    }).done(function (restaurants) {
        $.each(restaurants, function (index, value) {
            var marker = L.marker([value.Latitude, value.Longitude])
                .bindPopup(
                    '<p><strong>' + value.Name + '</strong><br />' +
                    value.Street + '<br />' +
                    value.City + '<br />' +
                    value.PostCode + '<br />' +
                    value.Phone + '</p>'
                );
            resultsLayer.addLayer(marker);
        });
    });
});

We define several circles of different opacities, and then show the returned markers.

It is all pretty simple code, but the result is quite stunning. I am getting really excited by this thing. It is simple, beautiful and quite powerful. Wow!

The delivery tab does pretty much the same thing as the eat-in mode, but it gets there differently. First, you might have noticed the LocationIndex in the previous two examples; it looks like this:

public class LocationIndex : AbstractIndexCreationTask<Restaurant>
{
    public LocationIndex()
    {
        Map = restaurants => from restaurant in restaurants
                             select new
                                        {
                                            restaurant.Name,
                                            _ = SpatialGenerate(restaurant.Latitude, restaurant.Longitude),
                                            __ = SpatialGenerate("location", restaurant.LocationWkt)
                                        };
    }
}

Before we look at this, we need to look at a sample document:

[Image: a sample restaurant document]

I am not quite sure why LocationIndex has both SpatialGenerate() and SpatialGenerate(“location”). I think this is just part of the demo, since the data is the same and both lines should produce the same results.

However, for deliveries, the situation is quite different. We don’t just deliver within a certain distance; as you can see, we have a polygon that determines where we actually deliver to. On the map, this looks like this:

[Image: the delivery map: my location, the restaurants that deliver to it, and the selected delivery polygon]

The red circle is where I am located, the blue markers are the restaurants that deliver to my location, and the blue polygon is the delivery area for the selected burger joint. Let us see how this works, okay? We will start from the index:

public class DeliveryIndex : AbstractIndexCreationTask<Restaurant>
{
    public DeliveryIndex()
    {
        Map = restaurants => from restaurant in restaurants
                             where restaurant.DeliveryArea != null
                             select new
                                        {
                                            restaurant.Name,
                                            _ = SpatialGenerate("delivery", restaurant.DeliveryArea, SpatialSearchStrategy.GeohashPrefixTree, 7)
                                        };
    }
}

So we are indexing just the restaurants that have a delivery area polygon, and then we query it like this:

public IEnumerable<object> Get(double latitude, double longitude, bool delivery)
{
    if (!delivery)
        return Get(latitude, longitude);

    var point = string.Format(CultureInfo.InvariantCulture, "POINT ({0} {1})", longitude, latitude);

    using (var session = MvcApplication.DocumentStore.OpenSession())
    {
        return session.Query<Restaurant, DeliveryIndex>()
            .Customize(x => x.RelatesToShape("delivery", point, SpatialRelation.Intersects))
            // SpatialRelation.Contains is not supported
            // SpatialRelation.Intersects is OK because we are using a point as the query parameter
            .Take(250)
            .Select(x => new
                            {
                                x.Name,
                                x.Street,
                                x.City,
                                x.PostCode,
                                x.Phone,
                                x.Delivery,
                                x.DriveThru,
                                x.Latitude,
                                x.Longitude,
                                x.DeliveryArea
                            })
            .ToList();
    }
}

This basically says: give me all the restaurants whose delivery area includes my location. And then the rest all happens on the client side.

Quite cool.

The final example is the drive thru mode, which looks like this:

[Image: the drive thru page, with a route from the green dot to the red dot]

Given that I am driving from the green dot to the red dot, what restaurants can I stop at?

Here is the index:

public class DriveThruIndex : AbstractIndexCreationTask<Restaurant>
{
    public DriveThruIndex()
    {
        Map = restaurants => from restaurant in restaurants
                             where restaurant.DriveThruArea != null
                             select new
                                        {
                                            restaurant.Name,
                                            _ = SpatialGenerate("drivethru", restaurant.DriveThruArea)
                                        };
    }
}

And now the code for this:

public IEnumerable<object> Get(string polyline)
{
    var lineString = PolylineHelper.ConvertGooglePolylineToWkt(polyline);

    using (var session = MvcApplication.DocumentStore.OpenSession())
    {
        return session.Query<Restaurant, DriveThruIndex>()
            .Customize(x => x.RelatesToShape("drivethru", lineString, SpatialRelation.Intersects))
            .Take(512)
            .Select(x => new
                            {
                                x.Name,
                                x.Street,
                                x.City,
                                x.PostCode,
                                x.Phone,
                                x.Delivery,
                                x.DriveThru,
                                x.Latitude,
                                x.Longitude
                            })
            .ToList();
    }
}

We get the driving directions from the map, convert them to a line string, and then just check whether our path intersects with the drive thru areas of the restaurants.

Pretty cool application, and some really nice UI.

Okay, enough with the accolades, next time, I’ll talk about the things that can be better.

time to read 39 min | 7645 words

As part of my ongoing review efforts, I am going to review the BitShuva Radio application.

BitShuva Radio is a framework for building internet radio stations with intelligent social features like community rank, thumb-up/down songs, community song requests, and machine learning that responds to the user's likes and dislikes and plays more of the good stuff.

I just cloned the repository and opened it in VS, without reading anything beyond the first line. As usual, I am going to start from the top and move on down:

[Image: the solution structure in Visual Studio]

We already have some really good indications:

  • There is just one project, not a gazillion of them.
The folders seem to be pretty much the standard ASP.NET MVC ones, so that should be easy to work with.

Some bad indications:

  • Data & Common folders are likely to be troublesome spots.

Hit Ctrl+F5, and I got this screen, which is a really good indication. There wasn’t a lot of setup required.

[Image: the application’s home page]

Okay, enough with the UI, I can’t really tell if this is good or bad anyway. Let us dive into the code. App_Start, here I come.

[Image: the App_Start folder contents]

I get the feeling that WebAPI and Ninject are used here. I looked in the NinjectWebCommon file, and found:

[Image: the RavenDB registration in NinjectWebCommon]

Okay, I am biased, I’ll admit, but this is good.

Other than the RavenDB stuff, it is a pretty boring, standard, normal codebase. No comments so far. Let us see what this RavenStore is all about, which leads us to the Data directory:

[Image: the Data directory contents]

So it looks like we have the RavenStore and a couple of indexes. And the code itself:

public class RavenStore
{
    public IDocumentStore CreateDocumentStore()
    {
        var hasRavenConnectionString = ConfigurationManager.ConnectionStrings["RavenDB"] != null;
        var store = default(IDocumentStore);
        if (hasRavenConnectionString)
        {
            store = new DocumentStore { ConnectionStringName = "RavenDB" };
        }
        else
        {
            store = new EmbeddableDocumentStore { DataDirectory = "~/App_Data/Raven" };
        }

        store.Initialize();
        IndexCreation.CreateIndexes(typeof(RavenStore).Assembly, store);
        return store;
    }
}

I think that this code needs to be improved. To start with, there is no need for this to be an instance class. And there is no reason why you can’t use an EmbeddableDocumentStore to talk to a remote server as well.

I would probably write it like this, but yes, this is stretching things:

public static class RavenStore
{
    public static IDocumentStore CreateDocumentStore()
    {
        var store = new EmbeddableDocumentStore
            {
                DataDirectory = "~/App_Data/Raven"
            };

        if (ConfigurationManager.ConnectionStrings["RavenDB"] != null)
        {
            store.ConnectionStringName = "RavenDB";
        }
        store.Initialize();
        IndexCreation.CreateIndexes(typeof(RavenStore).Assembly, store);
        return store;
    }
}

I intended to just glance at the indexes, but this one caught my eye:

[Image: an index whose reduce groups by the count of documents]

This index effectively gives you random output. It will group by the count of documents, and since we reduce things multiple times, the output is going to be… strange.

I am not really sure what this is meant to do, but it is strange and probably not what the author intended.

The Common directory contains nothing of interest beyond some util stuff. Moving on to the Controllers part of the application:

[Image: the Controllers folder contents]

So this is a relatively small application, but an interesting one. We will start with what I expect to be a very simple part of the code, the HomeController:

public class HomeController : Controller
{
    public ActionResult Index()
    {
        var userCookie = HttpContext.Request.Cookies["userId"];
        if (userCookie == null)
        {
            var raven = Get.A<IDocumentStore>();
            using (var session = raven.OpenSession())
            {
                var user = new User();
                session.Store(user);
                session.SaveChanges();

                HttpContext.Response.SetCookie(new HttpCookie("userId", user.Id));
            }
        }

        // If we don't have any songs, redirect to admin.
        using (var session = Get.A<IDocumentStore>().OpenSession())
        {
            if (!session.Query<Song>().Any())
            {
                return Redirect("/admin");
            }
        }

        ViewBag.Title = "BitShuva Radio";
        return View();
    }
}

There are a number of things in here that I don’t like. First of all, let us look at the user creation part. You look at the cookies and create a user if it isn’t there, setting the cookie afterward.

This has the smell of something that you want to do in the infrastructure. I did a search for “userId” in the code and found the following in the SongsController:

private User GetOrCreateUser(IDocumentSession session)
{
    var userCookie = HttpContext.Current.Request.Cookies["userId"];
    var user = userCookie != null ? session.Load<User>(userCookie.Value) : CreateNewUser(session);
    if (user == null)
    {
        user = CreateNewUser(session);
    }

    return user;
}

private static User CreateNewUser(IDocumentSession session)
{
    var user = new User();
    session.Store(user);

    HttpContext.Current.Response.SetCookie(new HttpCookie("userId", user.Id));
    return user;
}

That is code duplication with slightly different semantics, yeah!

Another issue with the HomeController.Index method is that we have direct IoC calls (Get.A<T>) and multiple sessions per request. I would much rather do this in the infrastructure, which would also give us a place for the GetOrCreateUser method to hang from.

SongsController is actually an Api Controller, so I assume that it is called from JS on the page. Most of the code there looks like this:

public Song GetSongForSongRequest(string songId)
{
    using (var session = raven.OpenSession())
    {
        var user = GetOrCreateUser(session);
        var songRequest = new SongRequest
        {
            DateTime = DateTime.UtcNow,
            SongId = songId,
            UserId = user.Id
        };
        session.Store(songRequest);
        session.SaveChanges();
    }

    return GetSongById(songId);
}

GetSongById will use its own session, and I think it would be better to have just one session per request, but that is about the sum of my comments.

One thing that did bug me was the song search:

public IEnumerable<Song> GetSongMatches(string searchText)
{
    using (var session = raven.OpenSession())
    {
        return session
            .Query<Song>()
            .Where(s =>
                s.Name.StartsWith(searchText) ||
                s.Artist.StartsWith(searchText) ||
                s.Album.StartsWith(searchText))
            .Take(50)
            .AsEnumerable()
            .Select(s => s.ToDto());
    }
}

RavenDB has really good full text search support, and we could be using that instead. It would give better results and be easier to work with, to boot.
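For example, assuming an index that marks Name, Artist and Album as analyzed fields (the Songs_Search index class here is hypothetical), the search could become something like:

using System.Collections.Generic;
using System.Linq;
using Raven.Client;
using Raven.Client.Linq;

public IEnumerable<Song> GetSongMatches(string searchText)
{
    using (var session = raven.OpenSession())
    {
        return session.Query<Song, Songs_Search>() // hypothetical full text index
            .Search(s => s.Name, searchText)
            .Search(s => s.Artist, searchText, options: SearchOptions.Or)
            .Search(s => s.Album, searchText, options: SearchOptions.Or)
            .Take(50)
            .AsEnumerable()
            .Select(s => s.ToDto());
    }
}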

Overall, this is a pretty neat little app.

time to read 3 min | 411 words

I recently began to read the “indies” in Amazon’s listings, and I wanted to point out some things that I really enjoyed:

The David Birkenhead series:

[Images: the four book covers]

If I have one complaint about this series, it is that the books are fairly short, typically less than 200 pages. That said, they are very well written, and the main protagonist is likable almost from the get go. I just noticed that the latest (Commander) came out, and I read it in one sitting.

It is good, really good, military Sci Fi. It is believable, interesting and in general, a lot of fun.

The Admiral Who series:

[Images: the two book covers]

In contrast to the Birkenhead series, no one can complain that these books are short. Each comes in at around 500 pages or so, and they are filled with a lot of really good content.

This is a reluctant hero story, and it reminded me strongly of Mat from the Wheel of Time, just in space.

 

The problem with most indie content on Amazon is that it is frequently poorly edited. In both cases, however, the editing is pretty good (not perfect, but good enough that it doesn’t distract from the story). And the stories more than compensate for that.

Just to give you some idea about how good they are, I am currently re-reading those books, and that is an honor that many professionally produced books just don’t get.

The Birkenhead series is supposed to have another book in late October (my thought: it is already late October, any later and it is November!), and while there isn’t a release date for the next Admiral Who book, I am looking forward to both eagerly.

time to read 1 min | 191 words

Greg Young has a comment on my Rhino Events post that deserves to be read in full. Go ahead, read it, I’ll wait.

Since you didn’t, I’ll summarize. Greg points out numerous faults and issues that aren’t handled or could be handled better in the code.

That is excellent, from my point of view, if only because it gives me more stuff to think about for the next time.

But the most important thing to note here is that Greg is absolutely correct about something:

I have always said an event store is a fun project because you can go anywhere from an afternoon to years on an implementation.

Rhino Events is a fun project, and I’ve learned some stuff there that I’ll likely use again later on. But above everything else, this is not production-worthy code. It is just some fun code that I liked. You may take it and do whatever you like with it, but mostly I was concerned with finding the right ways to actually get things done, not with considering all of the issues that might arise in a real production environment.

time to read 6 min | 1104 words

I had a really bad couple of days. I am pissed, annoyed and angry, for totally not technical reasons.

And then I ran into this issue, and I just want to throw something really hard at someone, repeatedly.

The issue started from this bug report:

NetTopologySuite.Geometries.TopologyException was unhandled
  HResult=-2146232832
  Message= ... trimmed ...
  Source=NetTopologySuite
  StackTrace:
       at NetTopologySuite.Operation.Overlay.Snap.SnapIfNeededOverlayOp.GetResultGeometry(SpatialFunction opCode)
       at NetTopologySuite.Operation.Union.CascadedPolygonUnion.UnionActual(IGeometry g0, IGeometry g1)
       at NetTopologySuite.Operation.Union.CascadedPolygonUnion.Worker.Execute()
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()

At first, I didn’t really realize why it was my problem. I mean, it is an NTS problem, isn’t it?

Except that this particular issue actually crashed RavenDB (don’t worry, it is unstable builds only). The reason it crashed RavenDB? An unhandled exception on a background thread.

What I can’t figure out is what on earth is going on here. So I took a look at the code. Have a look:

[Image: the CascadedPolygonUnion.Worker code, with the Java original commented out]

I checked, and this isn’t code that was ported from the Java original. You can see the commented-out code there? That is from the Java version.

And let us look at what the Execute method does:

[Image: the Execute method, spawning a thread for each half of the work]

So let me see if I understand. We have a list of stuff to do, so we spin out threads, recursively, then we wait on them. I think that the point was to optimize things by parallelizing the work between the two halves.

Do you know what the real killer is? If we assume that we have a geometry with just 20 items on it, this will generate twenty two threads.

Leaving aside the issue of not handling errors properly (and killing the entire process because of that), the sheer cost of creating the threads is going to kill this program.

Libraries should be made thread safe (I already had to fix a thread safety bug there), but they should not be creating their own threads unless it is quite clear that they need to do so.
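For comparison, the naive way to union everything without spawning a single thread (a sketch against the public NTS/GeoAPI surface; CascadedPolygonUnion exists to be smarter than this loop, but it does not need its own threads to be so):

using System.Collections.Generic;
using GeoAPI.Geometries;

public static class SimpleUnion
{
    // Union the geometries on the calling thread; if the caller wants
    // parallelism, that should be the caller's decision, not the library's.
    public static IGeometry UnionAll(IList<IGeometry> geometries)
    {
        var result = geometries[0];
        for (var i = 1; i < geometries.Count; i++)
            result = result.Union(geometries[i]);
        return result;
    }
}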

I believe that this is a case of a local optimization for a specific scenario, and it carries all of the issues associated with local optimizations. It solves one problem and opens up seven other ones.
