time to read 5 min | 938 words

Udi has been talking lately about using spaces. A space is a distributed hash table that supports the following operations: Write / Read / Take / Notify. The idea is that you can put entities or messages in the space, and something will handle them for you.

I find the idea interesting, since I had the chance to build such a system (~20 - 50 workers, each doing its own little part, independent from the others) on a single machine. It turned out that this is a very nice approach to separating responsibilities and getting things done.

Scaling it out to a distributed approach is interesting. I am going to ignore Udi's suggestion to get a ready-made solution and think about the way I would implement it. Just to make it interesting, I will throw in as many technologies as possible. Remember, this is a thought experiment.

Distributed hash table... hm, where did I hear that before? Memcached is exactly what this is talking about, and using it makes the Read / Write / Take operations very easy, and Notify nearly impossible. Obviously Notify is a critical piece here, so I would rule Memcached out for now; it may be used as our backend, but it can't be the primary technology.

WCF has peer-to-peer capabilities, and it may be interesting to try that. Read / Write / Take now becomes interesting, not in the implementation, but rather in deciding where to put the data. Ideally, I want to be ignorant of where the data physically resides, while maintaining the hash table aspect of it.

This means that for a read operation with a known key, I should be able to go to the exact node that contains the data that I want. This probably means that P2P is out, since I don't want to query the mesh in order to get a response; I would like to communicate directly with nodes. This brings me to a design decision: do I assume that we are using a mostly reliable and static set of nodes, or do I want to go for unreliable, dynamic nodes? The first means that I can do a simple hash of the key, mod it by the number of known nodes, and go directly to the node by index.

This is how Memcached works, basically a two-way hashtable. The problem with that is handling failures and additions, because both disturb the key distribution and require re-shuffling of the data. This is not so important in a cache, which is (by definition) allowed to lose data, but it is much more important if I am going to put business data there.
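To make the static approach concrete, here is a minimal sketch of that routing scheme (the node list, hashing choice, and class name are my own assumptions for the thought experiment):

public class StaticNodeRouter
{
    // e.g. "10.0.0.1:8000", "10.0.0.2:8000", ...
    private readonly string[] nodes;

    public StaticNodeRouter(string[] nodes)
    {
        this.nodes = nodes;
    }

    // Map a key to a node by hashing and taking the modulus.
    // Adding or removing a node changes the modulus, which is
    // exactly why most keys get remapped when the topology changes.
    public string GetNodeFor(string key)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF; // force non-negative
        return nodes[hash % nodes.Length];
    }
}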

I think that I will decide on the second approach; unreliable and dynamic it is. This means that the operations for the infrastructure are still Read / Write / Take / Notify - which is what the user is familiar with - but on the wire, we have the following (a rough service contract sketch appears after the list):

  • FindFor(key) - search the P2P mesh for the node that has this key
  • FindPlaceFor(key, size, attributes) - search the P2P mesh for a node that is willing to accept the data
  • [Read | Write | Take]Direct - talk directly to the node, bypassing the P2P mesh entirely
  • Notify - send a notify request on the mesh, where all the nodes will pick it up
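In WCF terms, the wire operations might be captured by a contract like this hypothetical one; all the names, types, and the NodeAddress shape are my own guesses, not a worked-out design:

using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class NodeAddress
{
    [DataMember]
    public string Uri;
}

[ServiceContract(Namespace = "http://example.org/spaces")]
public interface ISpaceNode
{
    // Mesh-wide lookups, flooded over the P2P mesh.
    [OperationContract]
    NodeAddress FindFor(string key);

    [OperationContract]
    NodeAddress FindPlaceFor(string key, int size, string[] attributes);

    // Broadcast; every node picks it up.
    [OperationContract(IsOneWay = true)]
    void Notify(string query);

    // Point-to-point operations, used once the node is known.
    [OperationContract]
    byte[] ReadDirect(string key);

    [OperationContract]
    void WriteDirect(string key, byte[] data);

    [OperationContract]
    byte[] TakeDirect(string key);
}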

Now, let us think about Notify, which is actually the reason that I started all this. We basically want to pass some sort of query that will return results for us. When I am thinking about a query, there are several options; we can use something like this as the client side API:

var query = from msg in Space.OrderMessages
            where msg.Status == OrderMessageStatus.ReadyToShip
            select msg;
Space.Notification += OnMessageNotified;
Space.Notify(query);

But how is this implemented? I can think of two ways: either NHibernate criteria or Lucene queries. I am probably going to lean toward Lucene here; it is much faster for mostly-search scenarios, and very easy to work with.
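As a rough illustration, assuming Lucene.NET and field names of my own invention, the query above could translate into something like:

using Lucene.Net.Index;
using Lucene.Net.Search;

public static class NotifyQueryTranslation
{
    public static Query ReadyToShipQuery()
    {
        // Match messages whose indexed Status field equals "ReadyToShip".
        return new TermQuery(new Term("Status", "ReadyToShip"));
    }
}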

A client for the mesh would expose an endpoint for getting notified about the messages. Here I'll probably pass either the node id and the data, or just the node id and the key. Probably the second, so I can get reliable results in the case of more than a single client for the same type of message.

Transactions are a problem, though. We would usually want to do several operations, and I would really like them to run in a transaction. This is more complicated because I may need to interact with several nodes in a single transaction. I am not familiar enough with WCF to know how to make this work, although I do believe that this is possible.
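For what it's worth, WCF can flow a transaction across service calls. A minimal sketch of how that might look here, assuming the operations opt in to transaction flow (and a binding that enables it, such as wsHttpBinding with transactionFlow turned on):

using System.ServiceModel;
using System.Transactions;

[ServiceContract]
public interface ITransactionalNode
{
    // Allow the caller's ambient transaction to flow into this operation.
    [OperationContract]
    [TransactionFlow(TransactionFlowOption.Allowed)]
    void WriteDirect(string key, byte[] data);
}

public class SpaceClientExample
{
    public void WriteToBoth(ITransactionalNode first, ITransactionalNode second,
                            string key, byte[] data)
    {
        // Both calls enlist in the same distributed transaction;
        // if Complete() is never called, both writes roll back.
        using (TransactionScope scope = new TransactionScope())
        {
            first.WriteDirect(key, data);
            second.WriteDirect(key + "-replica", data);
            scope.Complete();
        }
    }
}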

ThoughtExperiment.End();

time to read 9 min | 1718 words

Phil Haack just posted some code that made me wince:

[Test]
public void DemoLegsProperty()
{
       MockRepository mocks = new MockRepository();

       //Creates an IAnimal stub
       IAnimal animalMock = (IAnimal)mocks.DynamicMock(typeof(IAnimal));

       //Makes the Legs property actually work, creating a fake.
       SetupResult.For(animalMock.Legs).PropertyBehavior();
       mocks.ReplayAll();

       animalMock.Legs = 0;
       Assert.AreEqual(0, animalMock.Legs);

       SomeClass instance = new SomeClass(animalMock);
       instance.SetLegs(10);
       Assert.AreEqual(10, animalMock.Legs);
}

The reason that it made me wince is that it is such a common scenario, and there are four lines of Rhino Mocks code here that just don't add any value to the test. The test is trying to verify that calling SetLegs on SomeClass will set the animal's Legs property. A very trivial test, but about half of it is spent just setting up Rhino Mocks.

I don't like that.

Here is my version:

[Test]
public void DemoLegsProperty()
{
       IAnimal animalStub = MockRepository.GenerateStub<IAnimal>();

       animalStub.Legs = 0;
       Assert.AreEqual(0, animalStub.Legs);

       SomeClass instance = new SomeClass(animalStub);
       instance.SetLegs(10);
       Assert.AreEqual(10, animalStub.Legs);
}

Well, I cheated, I added this functionality to Rhino Mocks :-) Now we have a single line of Rhino Mocks code, which is very explicit about what it is doing.

The code is already in the repository, and I plan to release an update today, along with a bunch of other stuff.

time to read 7 min | 1271 words

After reading a bit about Jasper, I decided that I would like to see what it would take to build it with NHibernate and Boo. I am going to implement it and write the post at the same time, so you get near-real-time documentation of the process.

  • 18:05, decided that I want this syntax:

    import Bumbler

    conStr = "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=True;";

    bumble = Bumble.Through(conStr)
    for customer in bumble.Customers:
        print "Customer ${customer.ContactName}"
    c = bumble.GetCustomer("TRADH")
  • 19:10, finished building the client interface and a quick & dirty mapping-from-database implementation.
  • 19:24, finished code generation, now working on runtime compilation...
  • 19:54, finished runtime code generation.
  • 20:13, can now successfully query the database using NHibernate just from the connection string. Starting to work on the dynamic implementation.
  • 20:16, done with the basic dynamics, now playing with Booish.
  • 20:21, the code above now works, and I am going for dinner.
  • 21:10, back from dinner, final polish...
  • 21:13, done!

So, about two hours of work, most of which had to do with generating the mapping from the database. I have cut some corners there, so it is SQL Server only, and it can't support a legacy app; adding support for the more complex cases is simply a matter of extending the mapping generation, nothing else. I made sure that adding new functionality would be easy, however.
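To give a sense of what the dynamic side boils down to: the calls can be forwarded to plain NHibernate session operations by entity name. This is only a sketch under my own assumptions (the real code is in the repository):

using System.Collections;
using NHibernate;

public class BumbleSketch
{
    private readonly ISessionFactory sessionFactory;

    public BumbleSketch(ISessionFactory sessionFactory)
    {
        this.sessionFactory = sessionFactory;
    }

    // bumble.Customers -> list every entity mapped under the given name
    public IList GetAll(string entityName)
    {
        using (ISession session = sessionFactory.OpenSession())
        {
            return session.CreateCriteria(entityName).List();
        }
    }

    // bumble.GetCustomer("TRADH") -> load the entity by primary key
    public object GetById(string entityName, object id)
    {
        using (ISession session = sessionFactory.OpenSession())
        {
            return session.Get(entityName, id);
        }
    }
}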

Right now, you can get the code, just run the application, and you will get a shell that you can use to interact with the objects and the database. You need to start a new Bumble (you can simply cut/paste the code above) and then play with it. You can also reference the assembly and work with it directly using Boo.

Getting the code:

svn co https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/SampleApplications/Bumbler
time to read 1 min | 136 words

Jeremy Miller has a long post about all sorts of interesting things. The section that I wanted to talk about is about building metadata using a DB or XML, with the intent of allowing non-programmers to change the behavior of the system:

Oh, and the people out there swearing up and down that giving the business end users a screen to configure a business rules engine themselves will save the day?  BS.  Drawing workflow's with pictures is coding -- with every bit of the risk and danger that coding brings.  Only now you're making noncoders code in the production system.  Think about that last sentence.

 

time to read 2 min | 268 words

Imagine this fictitious scenario: you are working on some functionality, and you are pretty pleased with what you have so far. Then a new requirement comes in, and it requires something that is plainly not possible with the approach that you have taken so far. Then you start to investigate it a little further, and you find that if you do this, and push that, and maybe not look too closely at the end result, you can get what you want, without major changes to your code.

I have done this. I have done things to the GridView that should make it weep in shame, and there is a reason why I know that System.Web.UI.Calendar.Render() has a special case for the Hebrew calendar and a Cyclomatic Complexity of 42.

The problem with these types of solutions is that they are a hack. As such, they may serve an immediate need, but they will prevent me from continuing to evolve the software in any meaningful way. This is getting into technical debt in a serious manner. I finally drew a line in the sand when I found out that I would need to do multi-threading in JavaScript, and I actually had an idea about how to make it work (don't ask!).

Next time, I believe that I will draw that line much sooner. Better to cut my losses and build the functionality again than to deal with the complexities of an inappropriate technical solution and the hacks piled on top of it.

 

time to read 3 min | 555 words

Phil Haack is talking about integration from the database point of view. I had the pleasure of dealing with several projects that involved interfacing with legacy systems. Invariably, the suggested solution was to use the legacy system's database directly, and build the application from that.

It was assumed that this would be a less costly endeavour than building interfaces at both ends and hooking them together. A few stored procedures and many (many) views later, we had the integration completed on the side of the legacy system. Then it was time to really interface with those systems, at which point we discovered just how bad that was going to be.

I will leave aside the schema warts that we ran into; let us talk about how complex it was to use it in practice. Versioning support simply doesn't exist, and you are vulnerable to anything that runs on that system (a long report is running? say goodbye to the app for the next half hour)...

But by far the biggest issue is the limited ability to express an interface that has a business meaning. ODBC* is very well suited to expressing data, but data is not business meaning. Paul Stovell wrote about stored procedures being an interface, and yes, they are. But I have very little interest in an interface like "GetCustomers" that gives me the select from the Customers table.

Paul also suggests that exposing those stored procedures as web services will enable a more flexible approach, where the DBA can change the internals without affecting my code. In my experience, that doesn't work; while web services may be able to solve the issue of versioning, it requires a lot more work than is often applied to it. What you end up with is the result set serialized as XML at best, and huge deployment issues for the simplest change.

My opinion is that it is important not to try to abstract the database. It has significant capabilities that nearly all abstractions simply ignore. I want to get all the employees joined to their salaries over the last three months. If I am working against the database, it is easy; I just do it. If I am working against a stored procedures layer, or worse, stored procedures + web services, then we are talking about a new stored procedure that now needs to be maintained, checked for duplicated code, deployed, etc. If we are talking about web services as well, we need to publish a web service, write a schema, etc. Now I need to talk to the new web service, and changing the stored procedure now means that I have to change (at a minimum) three places: the stored procedure itself, the web service, and the client code.
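For instance, with NHibernate that employees-and-salaries query is a one-liner at the call site. A sketch, with entity and property names invented for the example:

using System;
using System.Collections;
using NHibernate;

public class SalaryQueries
{
    public IList EmployeesWithRecentSalaries(ISession session)
    {
        // One ad-hoc query; no stored procedure or web service to deploy.
        return session.CreateQuery(
                "select e from Employee e join e.Salaries s where s.PayDate >= :cutoff")
            .SetDateTime("cutoff", DateTime.Today.AddMonths(-3))
            .List();
    }
}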

Paul seems to believe that talking directly to the database implies that there is no contract involved (and I fully agree that a contract is important). I think that the mapping inherent in working with OR/M technologies is the contract. I can safely modify my database and my queries without affecting the code, and I retain all the strengths of the database.

*I am talking about views and SP, not the ODBC API itself, of course.

time to read 1 min | 141 words

After about 6 months, we have a new stable release of NHibernate. The full details are here.

I would strongly urge you to upgrade, but be aware that this is not a drop-in replacement; there is a migration guide available. From experience, it takes about a day to convert a reasonably sized application to 1.2.

Favorite new features (a quick sketch of using filters follows the list):

  • Filters (I really wish that I had them when I started my previous project)
  • Generic Collections
  • Database generated properties
  • SQL Dependencies for NHibernate's cache
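Here is roughly what filters look like in use; a minimal sketch that assumes a filter named "effectiveDate" with an "asOfDate" parameter has already been declared in the mapping:

using System;
using System.Collections;
using NHibernate;

public class FilterExample
{
    public IList CurrentEmployees(ISession session)
    {
        // Enable a filter declared in the mapping (<filter-def name="effectiveDate">);
        // from here on, every query in this session is constrained by it.
        session.EnableFilter("effectiveDate")
               .SetParameter("asOfDate", DateTime.Today);

        return session.CreateQuery("from Employee").List();
    }
}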

WSDL Annoyance

time to read 2 min | 320 words

This is not important, but I can't figure out why this:

namespace Foo
{
       [ServiceContract(Namespace = "http://www.test.com")]
       public interface IFubar
       {
              [OperationContract]
              void Act();
       }

       public class Fubar : IFubar
       {
              public void Act()
              {
              }
       }
}

Results in a WSDL that starts with this:

<wsdl:definitions name="Fubar" targetNamespace="http://tempuri.org/">

Adding a [ServiceBehavior] with the correct namespace only moves the tempuri to another location. Any ideas?
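For reference, the attempted fix is presumably something like this; in my reading, the wsdl:service then gets the right namespace, while the binding section still comes out under http://tempuri.org/:

[ServiceBehavior(Namespace = "http://www.test.com")]
public class Fubar : IFubar
{
       public void Act()
       {
       }
}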
