Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 14 min | 2786 words

After my last post about the ADO.Net Entity Framework, I got a long comment from Pablo Castro, the ADO.NET Technical Lead. I took my time answering it, mainly because I wanted to consider my words carefully. Again, I'm probably not an impartial party in this matter, with my strong NHibernate bias, but I do have quite a bit of experience with Object Relational Mappers and how they map to real world applications.

My previous post was mainly about extensibility in the framework, and how much I can extend it to fit the needs of the applications. So, without further ado :-), let me get to the points Pablo brought up:

We're not ignoring extensibility :), we're looking at some aspects of this right now. A couple of examples:
  - we are working on making sure that the data model itself is extensible so services can be built on top of the model and have service-specific metadata that's accessible through the metadata API and at the same time doesn't interfere with the code runtime that depends on it.
  - we're also working on finishing the details of the provider model so everyone can plug their stores into the entity framework.

I read this paragraph several times, and I am afraid that I lack the background to understand exactly what he means here. The provider model, as I understand it, will allow the use of Oracle, MySQL, etc. as the database engine underlying the technology. The services part is something I'm puzzled about; I'm not sure what a service is in this particular instance. Is the LINQ wrapper on the EDM a service? Or does it mean an application level service?

A few comments on your specific concerns:
- many to many relationships: it's not in the current bits, but it's not ruled out either as of yet. The system models certain things as link-tables already (which doesn't necessarily mean you have to use link tables for 1-n relationships in the actual store, of course, you can use the regular 2-table PK/FK story), although not all the details are in place for specifying and updating m-m relationships. We'll see where we land on this.

This worries me. Many to many is a very common idiom in databases and in domain models. Users and groups is a very simple example of this, but in any model there are many such relationships. I am aware of the amount of work that this feature requires, but that is the reason we get those frameworks in the first place, so that developers wouldn't have to handle this themselves.

- Indexed collections. The current collections are not indexable, but I'll take the feedback and look into it. As for dictionary, I acknowledge that there are scenarios for it, although we currently don't have plans for doing it. Again, feedback taken.

Sets match the database model very nicely, but as a developer, I often have other needs from collections. For instance, a set of filtering rules where the ordering of the rules is important maps very nicely to a list (indexed by position). A set of attributes that can be related to an object is modeled as an indexed collection of key and value pairs. Frankly, I don't see much of a difference between dictionaries and indexed collections at a high level.

Those are the simple things, by the way. What about cases where the collection's key has a valid business meaning? A good example is checking valid date ranges. My house's lease is a contract for a specific period only; this naturally maps to an object model where the relation between the house and the current leaser is a dictionary of date ranges and customers. Add a couple more freakish requirements, and you have to have support for those issues. As much as I would like it to be, using simple sets is often just not possible; too much information goes away that way.

- Custom collections: we have had *really long* discussions over this among ourselves... right now we are focusing on the scenarios where our own classes are used for collections; that allows us to do relationship fixups when the ends of a relationship change. It also makes it straightforward to do change-tracking over related sets. With custom collections we can't fix up the ends of relationships; so let's say you have a Customer object and a SalesOrder object; when you add the SalesOrder to the Customer.Orders collection, you'll expect SalesOrder.Customer to point to the right customer... we do that by hooking into the collection. It results in fewer surprises, particularly for users that don't have a lot of experience with the tricky areas of object layers on top of databases. As for your particular scenario, is that something you can embed in the CLR type that represents the entity (e.g. as a property in the user part of the partial class)?

I wrote my own collection classes for NHibernate that will do the fixups for the relationships, so I fully understand the need and how nice it is to have this. That said, please don't try to protect me from myself. If writing a custom collection is something really hard, document it with big red giant letters, and let me feel the pain of doing it myself. I will need to do this, period.

I understand the issues with exploding test matrices and scenarios that you can't support, because while you may have an infinite supply of resources, you don't have an infinite supply of time :-). But I would much rather have a "Here Be Dragons" sign than a locked door.

For the specific scenario, you can take a look at this post, where I discussed a bit how I solved the issue. Note that all the History collections are of IDictionary<DateRange, T> type.
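To make the date-range-keyed idea concrete, here is a minimal, illustrative sketch in Python (the `DateRange` and `History` names are mine, chosen only to mirror the IDictionary<DateRange, T> shape mentioned above, not any framework's API):

```python
from collections import namedtuple
from datetime import date

# Hypothetical sketch of a history collection keyed by date ranges.
DateRange = namedtuple("DateRange", ["start", "end"])

class History(dict):
    """Maps DateRange -> value; a point-in-time lookup finds the covering range."""
    def at(self, when):
        for rng, value in self.items():
            if rng.start <= when <= rng.end:
                return value
        return None  # no lease/contract covers that date
```

Usage: a lease history keyed by contract period, where `leases.at(some_date)` answers "who was the leaser on that date?" and returns None for uncovered dates.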

- The tricky part of this one is update. When you change the value in your entity from false (null) to true (not-null) what value do you put in the database? "not-null" is not specific enough.

It took me a moment to figure out what you meant here. To the readers who aren't versed in OR/M implementation details, the issue is this:

sale.Customer = currentCustomer;
context.SubmitChanges();

The OR/M needs to figure out what the Id of the currentCustomer is, so it can save it to the database. I'm not sure that I understand the problem here, though.

You have the instance of the related object, and you know the object model. It is very simple to get from those two pieces of information to the value that should go to the database. I understand that you are working slightly differently in your model, using EntityRef<T> to explicitly hold the ID value, but I don't see this as an unsolvable issue, or even a very hard one. The simplest answer here is to dictate that you need this value, and fail if it is not there. If I do my custom thing, I should play nice with the rest of the framework.
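The "demand the value, fail if it is missing" policy fits in a few lines. This is an illustrative Python stand-in, not the Entity Framework API (the `id_attr` parameter and the error message are my own):

```python
def foreign_key_value(related, id_attr="Id"):
    """Resolve the column value for a relationship: take the related
    object's identifier, and fail loudly if it is not there yet."""
    if related is None:
        return None  # a null reference maps to a NULL column
    value = getattr(related, id_attr, None)
    if value is None:
        raise ValueError("related object has no identifier yet; save it first")
    return value
```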

Thoughts and further questions are welcome :)

You'll probably regret saying this :-) but I will try.

First, the main assumption that I am making here is that the ADO.Net Entity Framework is supposed to be used in both new projects and existing ones, and that it is supposed to be a fully featured package, not one that is targeted at the simple scenarios. If I am wrong, then most of my comments are invalid by definition, but the way I see the ADO.Net Entity Framework presented seems to support this assumption.

  • Legacy support
    This is a major issue if you decide that you want to support existing projects and not just new ones. In this case, you need to support some fairly crazy database schemas. From the sane, to the fully de-normalized, to the Let Us Put This Data Somewhere approach that some people prefer.
    To give you a few examples:
    • A PK/FK relationship where the FK is found via substring(5, PK) in the parent table.
    • A boolean flag that has null / not null mapped to true or false, with completely random values as the non null values.
    • A table where each row contains several entities, and the relations between them.
  • Inheritance model
    From the documentation that I have seen so far, the inheritance models supported are table per hierarchy (discriminator) and table per sub-class. Is there support for table per concrete class, and how well does it play with the rest of the system?
  • Splitting a value from a table
    This is related to too much information in a row, but I may want to break a part of the row into a value object that is associated with the object.
  • Caching
    What is the caching story? I know that you have an Identity Map in place, but what about application wide caching? I assume that you will use ASP.Net's cache, but what happens in a web farm scenario?
  • Master / slave scenarios
    What happens when I want to scale by making all my writes go to a single server, and replicating from there?
  • Connection Control
    What happens if I want to explicitly control the connection lifetime and behavior?
    For that matter, how much control do I have for where exactly the data goes? Can I decide to move to a different database for a save, and then move back?
  • Change tracking
    Change tracking on entities is usually done by comparing them to their original values; when this is done on large object graphs, it can be a significant performance hit, especially if I'm only updating a few values but need to read a lot of data. Can I take action in this case?
  • SQL Functionality
    What happens if I have some specific functionality in the database that I need to use? UDFs are one example, but my SQL functions come to mind as well. Is it possible? How well does it integrate with the rest of the system?
  • Composite Keys
    Can I use them? How well do they play with relations? What happens if the relation is based on only part of the composite key?
  • Custom Types
    I mentioned the null bool scenario, but there are many other cases where I want to get involved with the way the framework is creating and persisting properties in my objects.
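As an aside, the change-tracking concern in the list above can be sketched as snapshot-based dirty checking. This is an illustrative Python sketch of the general technique, not any framework's actual implementation (the class and method names are mine):

```python
import copy

class ChangeTracker:
    """Snapshot-based dirty checking: remember each entity's state at load
    time, then diff against the snapshot at save time."""
    def __init__(self):
        self._snapshots = {}

    def attach(self, key, entity):
        # Deep-copy so later mutations don't leak into the snapshot.
        self._snapshots[key] = copy.deepcopy(vars(entity))

    def dirty_fields(self, key, entity):
        before = self._snapshots[key]
        return {name: value for name, value in vars(entity).items()
                if before.get(name) != value}
```

Note that this is exactly the cost described above: the diff walks the whole object, even if only one field actually changed.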

Customer scenarios - This is here because of a post by Clemens Vasters about how much less effective he became in influencing the direction of WCF since he joined the team. Now he doesn't have the justification of a customer need to do it.

Here are a few examples of scenarios that I personally ran into:

  • Timed objects - I discussed how I used OR/M to make sense of a really complicated business model. This approach requires customized collections and quite a bit of flexibility on the side of the OR/M when defining the way the data is transformed from the database to the object graph.
  • Xml Extended Attributes - I have got a customer who wants to keep a list of extended attributes inside the table itself, and has decided to use an XML Column in order to do so. The challenge here is to get the data from the XML Column into an object property and persist it back to the XML. This allows extending the table for child objects without modifying the object structure.
  • Handling legacy schemas such as this one; I know how I can map it cleanly, so the object model has no idea what the database looks like.

I posted another list of features that I constantly use in OR/M here.

off-topic: regarding data-adapter batching, I made that call early in the Whidbey cycle; it was painful, but it was unrelated to wanting/not-wanting extensibility. It was a scoping decision. At some point you have to decide where to cut a release, and it forces you to prioritize scenarios, and that includes painful decisions. I completely understand your frustration, but you can be assured that it wasn't then, and it's not now in the Entity Framework design, a lack of interest in extensibility.

The issue that I have with this approach is that it targets the simple scenarios, those that probably wouldn't get that much benefit from it. The amount of effort that Microsoft is putting into OR/M efforts is proof that the DataSet model doesn't really scale for complex applications. The decision to make this accessible only via the DataAdapter punishes anyone who decides that they can do better by going lower in the stack and building from there. I am used to having more options the lower I go, not the other way around.

In ASP.Net, for instance, I may not be able to change the Page Controller architecture at the WebForms level, but I can go down a couple of levels and use a Front Controller architecture, and the only functionality that I lose is the part specific to WebForms.

But this is crying over spilled milk, I am afraid. The most pressing question at the moment is: will this be fixed in .Net 3.0? And yes, I am talking about the former WinFX release. I want and need this ability. This is the closest release, and the next one after that is sometime in 2008, which is way too long.

time to read 12 min | 2350 words

First things first, don't get too excited, NHibernate doesn't support Stored Procedures (yet).

What I am talking about is encapsulating HQL queries in a similar manner to the way stored procedures work.

A little background before I start talking about the technical details. When using NHibernate, I'm usually pushing the use of NHibernate's Criteria approach, since this is by far the easiest way for people to comprehend how to query the database using NHibernate. For simple uses, HQL is very simple:

from Employee e where e.MiddleName is not null

HQL can get complicated when you want to do complex stuff, and it has one big problem. It looks like SQL, but it is not SQL.

Take this example, for instance:

from PresenseReport pr, EmployementContract ec
where pr.Employee = ec.Employee
and   pr.Date between ec.ValidityStart and ec.ValidityEnd
and   pr.Employee.IsMobile = true
and   pr.Date between :startDate and :endDate
and   pr.Employee.Manager = :manager

I will spare you the effort of trying to come up with the equivalent SQL. This is still a simple query, but it is not something that can be expressed using NHibernate's Criteria.

Now we have several options for how to use it. We can embed it in our code, but this has some serious disadvantages. I really hate to have strings in my application, and I would much rather use a separate file or a resource somewhere, but it turns out that we don't need to. NHibernate has the concept of a Named Query, which allows you to define a query in the mapping files and give it a name, like this:

<?xml version="1.0" encoding="utf-8"?>
<hibernate-mapping xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xmlns="urn:nhibernate-mapping-2.0">
  <query name="GetPresenceReportAndContractsForSubordinaceMobileEmployeesInDateRange">
    <![CDATA[
    from PresenseReport pr, EmployementContract ec
    where  pr.Employee = ec.Employee
    and    pr.Date between ec.ValidityStart and ec.ValidityEnd
    and    pr.Employee.IsMobile = true
    and    pr.Date between :startDate and :endDate
    and    pr.Employee.Manager = :manager
    ]]>
  </query>
</hibernate-mapping>

You may consider the query name too long, but I like to be explicit about such things.

Now we can use the same query in several locations, without copying and pasting the code. If you really like, you can even change the query without changing the client code (which is supposed to be the holy grail of stored procedures).

You can use this query like this:

session.GetNamedQuery("GetPresenceReportAndContractsForSubordinaceMobileEmployeesInDateRange").
    SetParameter("manager", currentEmployee).
    SetParameter("startDate", startDate).
    SetParameter("endDate", endDate).
    List();

Now, this still doesn't free the application developers from understanding what is going on, of course. For instance, this query returns a list of tuples, each of which contains a presence report and the matching employment contract. I need to understand the return value and unpack it into a form that is easier to work with.
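The unpacking step might look something like this (an illustrative Python sketch; the real result would be NHibernate tuples, and the grouping shape is just one reasonable choice):

```python
from collections import defaultdict

def reports_by_contract(rows):
    """rows: (presence_report, employment_contract) pairs, as returned
    by the named query; group the reports under their contract."""
    grouped = defaultdict(list)
    for report, contract in rows:
        grouped[contract].append(report)
    return dict(grouped)
```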

By the way, in this case, the idea is to get the employment contract and use it to validate the report. Notice that I am checking the date against both the employment contract validity and the presence report itself. It removes a whole set of issues that I suddenly do not have to think about.

If you wanted to be really fancy, it is trivial to take the XML above and use code generation to generate a query class that will wrap this functionality.
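A minimal sketch of that code generation idea, assuming Python for the generator: read the mapping file, pull out each query's name and its :parameters, and from those you have everything a wrapper class needs (the function name and output shape here are my own invention):

```python
import re
import xml.etree.ElementTree as ET

NS = "{urn:nhibernate-mapping-2.0}"

def query_wrappers(mapping_xml):
    """Scan a mapping document for <query> elements and return
    (name, sorted parameter names) pairs, ready for code generation."""
    root = ET.fromstring(mapping_xml)
    wrappers = []
    for query in root.iter(NS + "query"):
        # HQL named parameters look like :manager, :startDate, ...
        params = sorted(set(re.findall(r":(\w+)", query.text or "")))
        wrappers.append((query.get("name"), params))
    return wrappers
```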

time to read 1 min | 189 words

My current readership seems to have stabilized at around 3,500 readers a day (after a peak of about 5,500 a day after I got linked from ScottGu's blog).

After seeing the dramatic difference that such a link can cause, I decided to check how I can bring in more readers. The simplest thing is to get myself linked from aggregation sites such as Digg, DotNetKicks, etc. Of course, the ultimate goal there is to get slashdotted, but I'm leaving that tactic for the future :-)

Now, putting my own posts on those sites is sleazy, but there is no reason that I shouldn't make it easy for you to do it. So I added links to del.icio.us, Digg and DotNetKicks on each post. I used dasBlog's ability to define custom macros, and this custom macro library.

The link contains all the information that you could want on this, but be aware that the macros shown there won't work as-is. For some reason Vasanth forgot to put <% at the beginning of each macro.

time to read 5 min | 995 words

I talked about why statics are evil a couple of days ago. Now let us see why we want to use them anyway. Let us talk about a common scenario, and see what we have there. The scenario that I present here is extremely simplistic, of course, but it should be enough for you to get the point.

Let us take the common scenario of displaying a web page. Most web pages are composed of many small pieces of data, and it is often not possible to fetch them from the same source. For the purpose of discussion, we will need to show a list of orders and the customer info, as well as a set of personalization information.

The calls we need to make include the following:

  • Get Recent Orders
  • Get Current Customer
  • Get Personalization Information
  • Get Shippers Statuses

Eventually, each of those calls will need to access a database. There are several ways to handle this issue. The simplest one would be to have each call create its own connection, like this:

using(IDbConnection connection = DataBase.CreateConnection())
{
  //do work
}

This is simple, but it moves the responsibility for data access to each of the classes involved. Even assuming that creating and disposing the connections is not important, because we have connection pooling turned on, there is still overhead associated with them. Even more than the performance issue, I really don't like spreading the responsibility around like this.

Let us try something different, and pass the connection from outside, like this:

void OnPageLoad(object sender, EventArgs e)
{
  using(IDbConnection connection = DataBase.CreateConnection())
  {
    connection.Open();
    OrdersCollection orders = new OrderRepository(connection).GetRecentOrders(...);
    Customer cust = new CustomerRepository(connection).GetCurrentCustomer(...);
    // etc, etc 
  }
  // do something useful 
}

This gets rid of the issue of creating and disposing the connection, but it is still painful. I need to pass the connection explicitly, and now my UI layer knows about such things as databases. That in itself isn't really bad, but the code above is UI code that is managing database connections.

This is definitely not the responsibility of any UI layer that I have heard of.

Keeping those issues in mind, let us take a look at another scenario. I need to validate the orders, so I can display their status. For that, I have a set of business rules that run on each order and check it for consistency. At the beginning, I used this approach:

ValidationResult result = new Validator().ValidateOrder(order);

Validator will run each of the separate business rules, and aggregate their results:

public ValidationResult ValidateOrder(Order order)
{
    ValidationResult result = new ValidationResult();
    foreach(IBusinessRuleValidator validator in OrderValidators)
    {
        validator.Validate(order, result);
    }
    return result;
}

Very simple, isn't it? Until I need to add a business rule that needs to check the database as well. For example, validating that I have a contract with a supplier on a specific date. Now I need to modify the Validator class to pass it a connection, and all the business rules, just for the sake of a single rule. (This is assuming that the code is mine to change.)

I'm going to leave it at this point, and will post my preferred way to handle these types of issues later.

Ten points for the first guy/gal that can post a solution that contains "service" in its description and actually makes sense :-)

time to read 1 min | 117 words

Note: You can probably ignore this message, it is here for a specific person.

To my reader from *.af.mil, I'm flattered that you link to me, but this address is not on the internet, and it is driving me crazy :-)

I don't know who you are or how to contact you, but I'm assuming that this is an internal army blog. I'm pretty sure that this blog shouldn't go out and ping me from a secured network. If you can't turn pinging off, can you at least send me the posts? I feel like someone is whispering just outside my hearing.

Thanks.

time to read 4 min | 614 words

Moran has pointed me to the CROSS APPLY syntax in T-SQL 2005, which does allow joining against a table valued function. Using this, the query goes down to this:

SELECT
    EmpName,
    CONVERT(NVARCHAR, CurrentDate, 103) Date,
    HasWorked = CASE SUBSTRING(Roster,
            dbo.IndexInRoster(StartDate, CurrentDate, LEN(Roster)), 1)
        WHEN '_' THEN 0
        WHEN '*' THEN 1
        ELSE NULL
    END
FROM Schedules CROSS APPLY DateRange(StartDate, EndDate)

This is a much nicer way to deal with it. Considering that I am using similar techniques all over the place, this is a really good thing to know.

time to read 1 min | 153 words

Okay, I managed to hold out for over a week without releasing it, but the code started snarling at me when I worked with it, so I suppose I had better let it free.

This release contains multiple fixes for ordering, including some really bizarre edge cases. In addition to that, Eric Nicholson was kind enough to send a patch that fixes an issue with mocking classes that have finalizers.

Just to make things interesting, you can find only the binaries here. I decided to keep the source for myself at the moment; network connectivity issues to the subversion repository had nothing to do with this decision.

Happy mocking,

  Ayende.

Update: Okay, I broke down and updated the source repository as well. That was 5 minutes of Rhino Mocks being "closed source".

time to read 55 min | 10901 words

I'm starting to get quite a bit of mail from this blog. Some of those questions are about subjects I can answer immediately, some require a fair amount of work (which can be had, if you really want), and the more interesting ones are those that require some thinking, but do not require too much time. This question from Dave is the best one so far, and I got his permission to blog about it, so I'm doubly happy.

The issue is working against a legacy database to get the data for further processing. I'll let Dave explain the issue, since he does it much better:

I have to write a query to generate a report over some interesting data.  It's basically scheduling which days people are working.  The data looks like this:

Id  EmpName  StartDate   EndDate     Roster
1   Bob      12/06/2006  18/06/2006  _*___**
2   Mary     12/06/2006  18/06/2006  *_*__*_

The trick is, the roster field contains a string with a _ or * depending on whether the person is scheduled to work that day or not, but the first character always corresponds to the Sunday.  The startdate and enddate can be any day of the week.
In the example above, 12-Jun is a Monday, so Monday corresponds to the second character in the roster string, so Bob's working and Mary's not.
The roster string wraps around, so the first character of the roster string actually corresponds with the enddate here!  Now, this roster string could be 7, 10, or 14 days long.

I could get the report out if I can write a query to get it to this:

 

Employee  DateWorking
Bob       12/06/2006
Bob       16/06/2006
Mary      13/06/2006
Mary      16/06/2006

 

By the way, I haven't asked, but I'll bet that this schema originated on a mainframe, if not currently, then in its recent past.

First I created the schema I needed:

 

CREATE TABLE Schedules
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    EmpName NVARCHAR(255) NOT NULL,
    StartDate DATETIME NOT NULL,
    EndDate DATETIME NOT NULL,
    Roster NVARCHAR(50) NOT NULL
);
GO

INSERT INTO Schedules
SELECT 'Bob','12-Jun-06','18-Jun-06','_*___**'
UNION ALL
SELECT 'Mary','12-Jun-06','18-Jun-06','*_*__*_'
GO

 

Then, I started playing with DatePart(), getting the day of the week of StartDate from each row. This gave me the index I needed into the Roster column. But, this only told me whether the employee worked or didn't work on the start date, which isn't very helpful. What I needed was a way to check all the values between StartDate and EndDate.

I posted about this issue a while ago, and I made use of that technique here:

 

CREATE FUNCTION DateRange ( @start datetime, @end datetime )
RETURNS @DateRange TABLE ( CurrentDate datetime )
AS
BEGIN
    WHILE (@start <= @end)
    BEGIN
        INSERT INTO @DateRange(CurrentDate) VALUES(@start)
        SELECT @start = DATEADD(day,1,@start)
    END
    RETURN
END
GO
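For readers following along outside SQL Server, the same expansion is easy to express in Python (an illustrative equivalent, inclusive on both ends like the T-SQL version):

```python
from datetime import date, timedelta

def date_range(start, end):
    """Yield every date from start to end, inclusive, one day at a time --
    the same rows the T-SQL DateRange() table valued function produces."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```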

Conceptually, what I wanted was this:

SELECT
    IndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)
FROM Schedules, DateRange(StartDate,EndDate)

Unfortunately, DateRange() is a table valued function, and what this query asks SQL Server to do is join each row in the Schedules table against a table valued function that depends on that row's values. This is not possible, of course.

I settled on faking it using this approach:

WITH AllDatesInTable(CurrentDate) AS
(
    SELECT CurrentDate FROM dbo.DateRange(
        (SELECT MIN(StartDate) FROM Schedules),
        (SELECT MAX(EndDate) FROM Schedules) )
)
SELECT
    TestIndexInRoster = DatePart(dw,StartDate) + DateDiff(day,StartDate, CurrentDate)
FROM Schedules JOIN AllDatesInTable
ON CurrentDate BETWEEN StartDate AND EndDate

This query uses a Common Table Expression to define a table that has all the dates in the Schedules table. Notice that I constrained it to the dates in the current row. In essence, this gives me a row for each date in the date range of each row. This is the basis of solving this problem.

The other issue is the wrapping of the day index in the roster. This is a bit complicated because we need to take into account three things: SQL Server string handling is 1-based, not 0-based (argh!), we are shifting based on the start date, and we need to wrap around correctly. In order to handle this I created this function:

CREATE FUNCTION IndexInRoster(@StartDate DATETIME, @CurrentDate DATETIME, @RosterLen INT)
RETURNS INT AS
BEGIN
    DECLARE @Result int
    SET @Result = (DATEDIFF(day,@StartDate,@CurrentDate) + DATEPART(dw,@StartDate)) % (@RosterLen)
    IF @Result = 0
        RETURN @RosterLen
    RETURN @Result
END
GO

The check for @Result equal to 0 is there because SQL Server uses 1-based string handling.
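The wrap-around arithmetic is easy to sanity-check outside the database. Here is the same computation in Python (assuming SQL Server's default DATEFIRST setting, where DATEPART(dw) returns 1 for Sunday; the helper names are mine):

```python
from datetime import date

def index_in_roster(start, current, roster_len):
    """Mirror of the T-SQL IndexInRoster(): 1-based index into the roster
    string, where position 1 is always Sunday."""
    # Python's weekday() has Monday=0; convert to SQL's Sunday=1 numbering.
    dw_start = (start.weekday() + 1) % 7 + 1
    result = ((current - start).days + dw_start) % roster_len
    return roster_len if result == 0 else result  # 0 wraps to the last slot

def has_worked(start, roster, current):
    return roster[index_in_roster(start, current, len(roster)) - 1] == '*'
```

For Bob's row (start 12-Jun-2006, roster `_*___**`) this marks the 12th, 16th and 17th as working days, matching the query output shown at the end of the post.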

Bringing it all together, we get this:

WITH AllDatesInSchedulesTable(CurrentDate) AS
(
    SELECT CurrentDate FROM dbo.DateRange(
        (SELECT MIN(StartDate) FROM Schedules),
        (SELECT MAX(EndDate) FROM Schedules) )
)
SELECT
    EmpName,
    CONVERT(NVARCHAR, CurrentDate, 103) Date,
    HasWorked = CASE SUBSTRING(Roster,
            dbo.IndexInRoster(StartDate, CurrentDate, LEN(Roster)), 1)
        WHEN '_' THEN 0
        WHEN '*' THEN 1
        ELSE NULL
    END
FROM Schedules JOIN AllDatesInSchedulesTable
ON CurrentDate BETWEEN StartDate AND EndDate

And the result of this query:

EmpName  Date        HasWorked
Bob      12/06/2006  1
Mary     12/06/2006  0
Bob      13/06/2006  0
Mary     13/06/2006  1
Bob      14/06/2006  0
Mary     14/06/2006  0
Bob      15/06/2006  0
Mary     15/06/2006  0
Bob      16/06/2006  1
Mary     16/06/2006  1
Bob      17/06/2006  1
Mary     17/06/2006  0
Bob      18/06/2006  0
Mary     18/06/2006  1

And from here it is trivial to get to whatever format you want.
