time to read 23 min | 4461 words

There is another ASP.Net MVC sample app, this time it is official, passed the proper review procedure, and is explicitly marketed as “intended to guide you with the planning, architecting, and implementing of Web 2.0 applications and services.”

I am saying all of that in order to distinguish it from Oxite, which was none of those things. There have been a couple of reviews of Kobe already. Frankly, I don't really care for them, mostly because I think that they dealt too much with nitty gritty details of the app that don't really matter much. I don't much care about an extra using statement, the use of System.Int32 vs. int, the naming convention used, or even what sort of HTML formatting they used. I mostly care about the code and architecture. So I decided to take a look myself.

This post is going to go over the nuts & bolts, looking at low level coding details. I am going to have another post that looks at the architecture, and probably a few others that talk about some additional concerns that I have.

Let us start by looking at the code via some tools. Simian reports:

[image: Simian report of duplicate lines, at a six line threshold]

And that is when the similarity threshold is 6 lines. I usually run it with a threshold of three; if we try that, we get:

Found 5138 duplicate lines in 873 blocks in 67 files

Next, the most important code metric to me in most cases: Cyclomatic Complexity. Two methods literally jump out of the solution and beg for mercy.

  • HomeController.Discovery with CC index of 35(!)
  • GroupController.Profile with CC index of 22

I am going to show them both in all their glory, and let you be the judge of them:

public ActionResult Discovery(string query)
{
    query = query.Trim();

    query = Server.HtmlEncode(query);

    List<PresentationSummary> presentationsAll = null;
    List<PresentationSummary> presentationsMostPopular = null;
    List<PresentationSummary> presentationsMostViewed = null;
    List<PresentationSummary> presentationsMostDownload = null;
    List<PresentationSummary> presentationsNew = null;

    List<UserSummary> activeusers = null;
    List<UserSummary> allusers = null;
    List<Group> activegroups = null;
    List<Group> allgroups = null;

    int iNewMemberPageCount = 0;
    int iNewGroupPageCount = 0;
    int ipresentationsAll = 0; 
    int ipresentationsMostPopular =  0;
    int ipresentationsMostViewed = 0;
    int ipresentationsMostDownload = 0;
    int ipresentationsNew = 0;

    UserSummary user = _userService.GetUserByName(query);
    if (user != null)
    {
        presentationsAll = _userService.GetAuthoredPresentations(user.UserName);
        presentationsMostPopular = presentationsAll.OrderByDescending(p => p.Favorites).ToList();
        presentationsMostViewed = presentationsAll.Where(p => p.Views > 0).OrderByDescending(p => p.Views).ToList();
        presentationsMostDownload = presentationsAll.Where(p => p.Downloads > 0).OrderByDescending(p => p.Downloads).ToList();
        presentationsNew = presentationsAll.OrderByDescending(p => p.InsertedDate).ToList();

        ipresentationsAll = decimal.Divide(presentationsAll.Count, 10).ToInt();
        ipresentationsMostPopular = decimal.Divide(presentationsMostPopular.Count, 10).ToInt();
        ipresentationsMostViewed = decimal.Divide(presentationsMostViewed.Count, 10).ToInt();
        ipresentationsMostDownload = decimal.Divide(presentationsMostDownload.Count, 10).ToInt();
        ipresentationsNew = decimal.Divide(presentationsNew.Count, 10).ToInt();

        activeusers = new List<UserSummary> { user };
        allusers = new List<UserSummary> { user };
        iNewMemberPageCount = decimal.Divide(allusers.Count, 8).ToInt();

        allgroups = _userService.GetGroups(user.UserName);
        activegroups = allgroups.OrderByDescending(g => g.InsertedDate).ToList();
        iNewGroupPageCount = decimal.Divide(allgroups.Count, 8).ToInt();
    }
    else
    {
        Group group = _groupService.GetGroupByName(query);
        if (group != null)
        {
            presentationsAll = _groupService.GetPresentations(group.GroupName);
            presentationsMostPopular = presentationsAll.OrderByDescending(p => p.Favorites).ToList();
            presentationsMostViewed = presentationsAll.Where(p => p.Views > 0).OrderByDescending(p => p.Views).ToList();
            presentationsMostDownload = presentationsAll.Where(p => p.Downloads > 0).OrderByDescending(p => p.Downloads).ToList();
            presentationsNew = presentationsAll.OrderByDescending(p => p.InsertedDate).ToList();

            ipresentationsAll = decimal.Divide(presentationsAll.Count, 10).ToInt();
            ipresentationsMostPopular = decimal.Divide(presentationsMostPopular.Count, 10).ToInt();
            ipresentationsMostViewed = decimal.Divide(presentationsMostViewed.Count, 10).ToInt();
            ipresentationsMostDownload = decimal.Divide(presentationsMostDownload.Count, 10).ToInt();
            ipresentationsNew = decimal.Divide(presentationsNew.Count, 10).ToInt();

            allusers = _groupService.GetMembers(group.GroupName);
            activeusers = allusers.OrderByDescending(u => u.DateOfJoining).ToList();
            iNewMemberPageCount = decimal.Divide(allusers.Count, 8).ToInt();

            allgroups = new List<Group> { group };
            activegroups = new List<Group> { group };
            iNewGroupPageCount = decimal.Divide(allgroups.Count, 8).ToInt();
        }
        else
        {
            presentationsAll = _presentationService.GetAllPresentationsByKewordTimeLine(query, "Day", "0", 10, 1);
            presentationsMostPopular = _presentationService.GetMostPopularPresentationsByKewordTimeLine(query, "Day", "0", 10, 1);
            presentationsMostViewed = _presentationService.GetMostViewedPresentationsByKewordTimeLine(query, "Day", "0", 10, 1); 
            presentationsMostDownload = _presentationService.GetMostDownloadedPresentationsByKewordTimeLine(query, "Day", "0", 10, 1); 
            presentationsNew = _presentationService.GetNewPresentations(query, "Day", "0", 10, 1);

            ipresentationsAll = decimal.Divide(_presentationService.GetAllPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsMostPopular = decimal.Divide(_presentationService.GetMostPopularPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsMostViewed = decimal.Divide(_presentationService.GetMostViewedPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsMostDownload = decimal.Divide(_presentationService.GetMostDownloadedPresentationsByKewordTimeLine(query, "Day", "0").Count, 10).ToInt();
            ipresentationsNew = decimal.Divide(_presentationService.GetNewPresentations(query, "Day", "0").Count, 10).ToInt();

            activeusers = _userService.GetMostActiveUsers(query, 8, 1);
            allusers = _userService.GetAllUsers(query, 8, 1);
            iNewMemberPageCount = decimal.Divide(_userService.GetMostActiveUsers(query).Count,8).ToInt();

            activegroups = _groupService.GetMostActiveGroupByKeyword(query, 8, 1);
            allgroups = _groupService.GetAllGroupByKeyword(query, 8, 1);
            iNewGroupPageCount = decimal.Divide(_groupService.GetMostActiveGroupByKeyword(query).Count, 8).ToInt();
        }
    }

    ViewData.Add("membersList-mostactive", activeusers);
    ViewData.Add("membersList-all", allusers);
    ViewData.Add("groupsList-mostactive", activegroups);
    ViewData.Add("groupsList-all", allgroups);

    ViewData.Add("presentations-all",presentationsAll);
    ViewData.Add("presentations-mostpopular",presentationsMostPopular);
    ViewData.Add("presentations-mostviewed",presentationsMostViewed);
    ViewData.Add("presentations-mostdownload",presentationsMostDownload);
    ViewData.Add("presentations-new",presentationsNew);
   
    ViewData.Add("Query", query);
    //ViewData.Add("Presentations", presentations);

    ViewData.Add("members-totalcount", iNewMemberPageCount);
    ViewData.Add("groups-totalcount", iNewGroupPageCount);

    ViewData.Add("presentations-alltotalcount", ipresentationsAll);
    ViewData.Add("presentations-mostpopulartotalcount", ipresentationsMostPopular);
    ViewData.Add("presentations-mostviewedtotalcount", ipresentationsMostViewed);
    ViewData.Add("presentations-mostdownloadtotalcount", ipresentationsMostDownload);
    ViewData.Add("presentations-newtotalcount", ipresentationsNew);

    return View();
}

This is… a very busy method, I must say. But in a way, the Profile method is much worse:

[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Profile(string gname, string type, string section, string subSection, string page)
{
    if (string.IsNullOrEmpty(gname))
        return new ContentResult { Content = "" };

    if (type != "widget")
        return new ContentResult { Content = "" };

    Group group = null;

    try
    {
        group = _groupService.GetGroupByName(gname);
    }
    catch (Exception)
    {
        return new ContentResult { Content = "" };
    }

    if (group == null)
    {
        return new ContentResult { Content = "" };
    }

    string groupName = group.GroupName;

    AddUserLevelToViewData(groupName);

    int pageNo = 1;
    Int32.TryParse(page, out pageNo);
    if (pageNo == 0)
        pageNo = 1;

    if (section == "div-GroupPresentations")
    {
        List<PresentationSummary> presentations = null;

        switch (subSection)
        {
            case "div-GroupPresentations-RecentltAdded":
                presentations = _groupService.GetRecentlyAddedPresentations(groupName, 5, pageNo);
                break;
            case "div-GroupPresentations-MostViewed":
                presentations = _groupService.GetMostViewedPresentations(groupName, 5, pageNo);
                break;
            case "div-GroupPresentations-MostDownloaded":
                presentations = _groupService.GetMostDownloadedPresentations(groupName, 5, pageNo);
                break;
            case "div-GroupPresentations-All":
                presentations = _groupService.GetPresentations(groupName, 5, pageNo);
                break;
        }

        return View("PresentationsList", presentations);
    }
    else if (section == "div-GroupWall")
    {
        switch (subSection)
        {
            case "div-GroupWall-Messages":
                ViewData["GroupMessages"] = _groupService.GetMessages(groupName, 5, pageNo);
                return View("GroupMessageList", ViewData["GroupMessages"]);
            case "div-GroupWall-MemberRequests":
                ViewData["GroupJoiningRequests"] = _groupService.GetGroupJoiningRequests(groupName, 5, pageNo);
                return View("GroupJoiningRequestList", ViewData["GroupJoiningRequests"]);
        }
    }
    else if (section == "div-GroupInfoExtended")
    {
        switch (subSection)
        {
            case "div-GroupInfoExtended-GroupMembers":
                ViewData["GroupMembers"] = _groupService.GetMembers(groupName, 4, pageNo);
                return View("MembersList", ViewData["GroupMembers"]);
        }
    }

    return new ContentResult { Content = "" };
}

Just look at the code. I thought that the whole point of MVC was to separate the logic from the view. Having the view strongly tied to the controller output is fine by me, but having the controller strongly tied to the HTML format of the page? That isn’t right.
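One way out, as a minimal sketch (the enum and parser are my invention, not Kobe's code): translate the posted div ids into a typed value at the edge of the controller, so nothing else ever needs to know how the page names its divs.

public enum GroupSection
{
    Presentations,
    Wall,
    InfoExtended,
    Unknown
}

public static class GroupSectionParser
{
    // Maps the HTML-specific section ids to a controller-friendly enum.
    // Renaming a div in the view now breaks exactly one method.
    public static GroupSection Parse(string sectionDivId)
    {
        switch (sectionDivId)
        {
            case "div-GroupPresentations": return GroupSection.Presentations;
            case "div-GroupWall": return GroupSection.Wall;
            case "div-GroupInfoExtended": return GroupSection.InfoExtended;
            default: return GroupSection.Unknown;
        }
    }
}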

Another thing that isn’t right is HomeController.Index():

public ActionResult Index()
{
    GetFeaturedPresentations(); //*** Dummy call the the Database to activate the Connection.
    List<PresentationSummary> featured = _presentationService.GetFeaturedPresentations();
    List<PresentationSummary> beingViewed = _presentationService.GetPresentationRecentlyViewed();
    List<PresentationSummary> mostDownloaded = _presentationService.GetMostDownloadedPresentation();
    PresentationSummary presentationOfDay = _presentationService.GetPresentationOfDay();

    ViewData.Add("FeaturedPresentations", featured.ToArray());
    ViewData.Add("RecentlyViewedPresentations", beingViewed.ToArray());
    ViewData.Add("MostDownloadedPresentations", mostDownloaded.ToArray());
    ViewData.Add("presentationsOfDay", presentationOfDay);
    ViewData["Tags"] = _presentationService.GetPopularTags();

    return View();
}

Notice the first call?

private void GetFeaturedPresentations()
{
    try {
        //*** Dummy call to the Presentation Service to get the Featured presentations,
        //*** this call is place because, an exception thrown from the Data layered on first hit to the DB (failed to open DB) 
        //*** and the second hit to the DB gets success.
        _presentationService.GetFeaturedPresentations();
    }
    catch (Exception)
    { /*do nothing with this exception*/ }
}

I really like it when you work around a bug instead of actually fixing it.

Moving on, let us look at the service layer. Most of it looks like this:

public ADs GetAdById(string adId)
{
    try
    {
        string key = "Ads-" + adId;
        ADs data = cacheService.Get<ADs>(key);
        if (data == null)
        {
            data = provider.GetAdById(adId.ToGuid());
            if (data != null && data.Image != null)
            {
                cacheService.Add(key, data as Object, null, DateTime.MaxValue, new TimeSpan(0, 10, 0), System.Web.Caching.CacheItemPriority.Normal, null);
            }
        }
        return data;

    }
    catch (Exception ex)
    {
        bool rethrow = ExceptionPolicy.HandleException(ex, "Service Policy");
        if (rethrow && WebOperationContext.Current == null)
        {
            throw;
        }
        return null;
    }
}

I am not joking about all of them looking alike, by the way; it has obviously been cut & pasted a lot.

Nitpick, “data as Object” – that is not something you often see. But this is even better (from CacheService):

[image: the CacheService.Add signature, which takes the full list of System.Web.Caching parameters]

I like how we have a cacheService but coupled its interface to System.Web.Caching, or the fact that most of this code is just a very long winded way of calling provider.GetAdById.
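A cache abstraction that does not leak its storage implementation is not hard to come by; here is a minimal sketch (the names are assumptions, not Kobe's code):

using System;

public interface ICacheService
{
    T Get<T>(string key) where T : class;

    // The caller states how long the item may live; only the implementation
    // knows about System.Web.Caching (or memcached, or anything else).
    void Add<T>(string key, T value, TimeSpan slidingExpiration) where T : class;
}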

But wait, I spared you the method documentation, which is a real masterpiece:

/// <summary>
/// Returns Advertisment.
/// </summary>
/// <param name="adId"> GUID of an Advertisment</param>
/// <returns>Ads</returns>
public ADs GetAdById(string adId)

Yes, the adId is a string which is a guid. We are so lucky to work with VARIANT again, right?
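The fix is trivial, because the type system can carry that information for us. A sketch of the same method with a typed id, reusing the fields from the original:

public ADs GetAdById(Guid adId)
{
    string key = "Ads-" + adId;
    ADs data = cacheService.Get<ADs>(key);
    if (data == null)
    {
        data = provider.GetAdById(adId); // no more ToGuid() conversion
        if (data != null && data.Image != null)
        {
            cacheService.Add(key, data, null, DateTime.MaxValue, new TimeSpan(0, 10, 0), System.Web.Caching.CacheItemPriority.Normal, null);
        }
    }
    return data;
}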

Let us take another method, just to see what is going on in a typical method inside the project. I intentionally avoided the ones that we have already looked at. I took a peek at CommunityController and found the Index method:

[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Index(string type, string section, string subSection, string page)
{
    if (type != "widget")
        return new ContentResult { Content = "" };

    int pageNo = 1;
    Int32.TryParse(page, out pageNo);
    if (pageNo == 0)
        pageNo = 1;

    if (section == "members")
    {
        List<UserSummary> users = null;

        switch (subSection)
        {
            case "members-new":
                users = _communityService.GetNewUsers(8, pageNo);
                break;
            case "members-mostactive":
                users = _communityService.GetMostActiveUsers(8, pageNo);
                break;
            case "members-all":
                users = _communityService.GetAllUsers(8, pageNo);
                break;
            default:
                users = _communityService.GetAllUsers(8, pageNo);
                break;
        }

        return View("MembersList", users);
    }
    else if (section == "groups")
    {
        List<Group> groups = null;

        switch (subSection)
        {
            case "groups-new":
                groups = _communityService.GetNewGroups(8, pageNo);
                break;
            case "groups-mostactive":
                groups = _communityService.GetMostActiveGroups(8, pageNo);
                break;
            case "groups-all":
                groups = _communityService.GetAllGroups(8, pageNo);
                break;
            default:
                groups = _communityService.GetAllGroups(8, pageNo);
                break;
        }

        return View("GroupsList", groups);
    }
    else if (section == "favourites")
    {
        List<PresentationSummary> favouritePresentation = _communityService.GetCommunityFavorite(10, pageNo);
        return View("PresentationsView", favouritePresentation.ToArray());
    }


    return new ContentResult { Content = "" };
}

Let me see how many things I can find in a cursory examination:

  • Hard coding galore
  • Long method
  • Complex method
  • Controller method returns several different views

And note that I am still not even touching on the architectural concepts or code quality metrics. Those I am going to leave to another post.

Frankly, I am seeing way too many bad things in the code to cover all of them. I am going to stop with a goodie, though.

Let us explore GroupRepository.GetGroup, shall we?

 private Group GetGroup(Guid groupId, KobeEntities Context)
 {
     try
     {
         Group group = Context.GroupSet
                          .Where(g => g.GroupId == groupId)
                          .FirstOrDefault();

         if (group == null)
             throw new DataException("Invalid Group Id", null);

         return group;
     }
     catch (Exception ex)
     {

         bool rethrow = ExceptionPolicy.HandleException(ex, "Data Policy");
         if (rethrow)
         {
             throw;
         }
         return null;
     }
 }

On the face of it, except for the repeated stupid error handling, there doesn't seem to be anything wrong here, right?

Take note of the different parameter casing on GetGroup, though. Why is the KobeEntities parameter PascalCased as Context? Well, that is because there is also a property called Context on the GroupRepository that you might use by accident. So, what is this Context parameter all about? GetGroup is a private method, so who is calling it?

Here is one such callsite:

 public void AddGroupInterests(Guid groupId, string[] interests, Guid userId)
 {
     try
     {
         KobeEntities _context = Context;

         Group group = GetGroup(groupId, _context);
         User user = GetUser(userId, _context);

So, we take the Context property, put it in a _context local variable. Then we pass it to GetGroup, which uses it.

I must say that I am at a loss to understand what was going on in the mind of whoever wrote it. Was he trying to optimize the number of times we access the property?

As I said, I am currently writing a few other posts about Kobe; this one was just to look at the code itself, so we have an impression of its quality.

I am… not impressed.

time to read 2 min | 283 words

Queries are business logic; as such, they can be pretty complex, and they also tend to be pretty performance sensitive. You usually want to have good control over any complex queries. You can do that by extracting your queries to the mapping, so they don't reside, hardcoded, in the code:

<query name="PeopleByName">
	from Person p
	where p.Name like :name
</query>

And you can execute it with:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	session.GetNamedQuery("PeopleByName")
		.SetParameter("name", "ayende")
		.List();
	tx.Commit();
}

PeopleByName is a pretty standard query, and executing this code will result in:

[image: the SQL generated for the PeopleByName query]
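The screenshot did not survive, but the generated SQL is roughly the following (aliases are illustrative, not NHibernate's exact output):

SELECT person0_.Id   as Id0_,
       person0_.Name as Name0_
FROM   People person0_
WHERE  person0_.Name like @p0 /* @p0 = 'ayende' */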

Now, let us say that we discovered some performance problem in this query, and we want to optimize it. But the optimization is beyond what we can do with HQL; we have to drop down to database specific SQL for that. Well, that is not a problem, <sql-query/> is coming to the rescue.

All you need is to replace the query above with:

<sql-query name="PeopleByName">
	<return alias="person"
					class="Person"/>
	SELECT {person.*}
	FROM People {person} WITH(nolock)
	WHERE {person}.Name LIKE :name
</sql-query>

And you are set. You don’t need to make any changes to the code, but the resulting SQL would be:

[image: the SQL generated for the native sql-query version]
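That screenshot is lost as well, but given the mapping above, the resulting SQL is roughly:

SELECT person.Id   as Id0_,
       person.Name as Name0_
FROM   People person WITH(nolock)
WHERE  person.Name LIKE @p0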

Fun, isn’t it?

time to read 2 min | 260 words

I, like many, have grown used to NHibernate's schema generation capabilities. Those make working with databases such a pleasure that I cannot imagine doing without them.

However, at some point, even NHibernate's smarts reach an end, and such an occasion requires the use of raw SQL to manipulate the database directly. A good example of that would be:

<!-- SQL Server needs this index -->
<database-object>
	<create>
	CREATE INDEX PeopleByCityAndLastName ...
	</create>
	<drop>
	DROP INDEX PeopleByCityAndLastName 
	</drop>
	<dialect-scope name="NHibernate.Dialect.MsSql2000Dialect"/>
	<dialect-scope name="NHibernate.Dialect.MsSql2005Dialect"/>
	<dialect-scope name="NHibernate.Dialect.MsSql2008Dialect"/>
</database-object>

<!-- Oracle needs these stats only -->
<database-object>
	<create>
	CREATE STATISTICS PeopleByCityAndLastName ...
	</create>
	<drop>
	DROP STATISTICS PeopleByCityAndLastName 
	</drop>
	<dialect-scope name="NHibernate.Dialect.OracleDialect"/>
	<dialect-scope name="NHibernate.Dialect.Oracle9Dialect"/>
</database-object>

As you can see, this allows us to execute database specific SQL, using the dialect scope. It is not a common feature, but it can be incredibly useful.

time to read 5 min | 900 words

NHibernate has several concurrency models that you can use:

  • None
  • Optimistic
    • Dirty
    • All
  • Versioned
    • Numeric
    • Timestamp
    • DB timestamp
  • Pessimistic

We will explore each of those in turn.

None basically means that we fall back to the transaction semantics that we use in the database. The database may throw us out, but aside from that, we don’t really care much about things.

Optimistic is more interesting. It basically states that if the entity changed since we loaded it, we cannot update it. Let us see a simple example of using optimistic dirty checking for changed fields only:

<class name="Person"
			 optimistic-lock="dirty"
			 dynamic-update="true"
			 table="People">

Using this mapping with the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = session.Get<Person>(1);
	person.Name = "other";
	tx.Commit();
}

Will result in:

[image: an UPDATE whose WHERE clause checks the old value of the changed column]
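The screenshot is lost; the statement is roughly of this shape:

UPDATE People
SET    Name = @p0
WHERE  Id = @p1
       AND Name = @p2 /* the value we originally loaded */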

Note that we have to specify dynamic-update="true"; dirty optimistic locking only works with dynamically generated UPDATE statements. Be aware that doing so will generally cause a much greater number of query plans to exist in the database cache.

Setting optimistic-lock to all would result in:

[image: an UPDATE whose WHERE clause checks the old values of all the columns]

If the update fails because the row was updated, we will get a StaleObjectStateException. Like all exceptions, this will make the session ineligible for use, and you will have to create a new session to handle it.

Usually a better strategy is to use an explicit version column. We can do it by specifying <version/>:

<version name="Version" column="Version"/>

And that would result in:

[image: an UPDATE that increments the Version column and checks the old version in the WHERE clause]
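Again roughly:

UPDATE People
SET    Name = @p0,
       Version = @p1 /* the incremented version */
WHERE  Id = @p2
       AND Version = @p3 /* the version we loaded */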

As you can probably guess, if the version doesn't match, we will get a StaleObjectStateException.

Instead of using numeric values, we can use a timestamp:

<version name="Version" column="Version" type="timestamp"/>

In this case, the property type should be DateTime, and the resulting SQL would be:

[image: an UPDATE guarded by the timestamp version value]

This is, of course, a less safe way of doing things, and I recommend that you use a numeric value instead.

Another option is to use the database facilities to handle that. In MS SQL Server, this is the TIMESTAMP column, an 8 byte binary value that is changed any time the row is updated.

We do this by changing the type of the Version property to a byte array, and changing the mapping to:

<version name="Version"
				 generated="always"
				 unsaved-value="null"
				 type="BinaryBlob">
	<column name="Version"
					not-null="false"
					sql-type="timestamp"/>
</version>

Executing the code listed above will result in:

[image: an UPDATE whose WHERE clause checks the row's timestamp value]

We use the value of the timestamp to ensure that we aren’t overwriting the row data after it was changed. The database will ensure that the row timestamp will change whenever the row itself is updated. This plays well with system where you may need to update the underlying tables outside of NHibernate.

Pessimistic concurrency is also exposed by NHibernate, using the overloads that take a LockMode. This is done in a database independent way, using each database's facilities and syntax.

For example, let us examine the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = session.Get<Person>(1,LockMode.Upgrade);
	person.Name = "other";
	tx.Commit();
}

This will result in the following SQL:

[image: a SELECT with the database's update-lock hint]
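On SQL Server, the generated statement is roughly:

SELECT person0_.Id   as Id0_,
       person0_.Name as Name0_
FROM   People person0_ with (updlock, rowlock)
WHERE  person0_.Id = @p0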

We can also issue a separate command to the database to obtain a lock on the row representing the entity:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var person = session.Get<Person>(1);
	session.Lock(person, LockMode.Upgrade);
	person.Name = "other";
	tx.Commit();
}

The Get() would generate a standard select, without the locks, but the Lock() method would generate the following SQL:

[image: a SELECT of the row under an update-lock hint]

The behavior for a conflict in this case is very simple: we wait. If we wait for too long, the timeout will expire and we will get a timeout exception, because we could not obtain the lock.

That is consistent with how we would use pessimistic concurrency elsewhere.

time to read 1 min | 117 words

This has nothing to do with technology. It has to do with books. In particular the Wheel of Time books.

Go and read this announcement. I want to cry!

For crying out loud, I have been reading this series of books for the last decade. I spent most of my high school years re-reading the books, and I consider them a big reason for why I am able to understand English at the level that I do.

Hell, I own several copies of some of the books, but for crying out loud, another three books? And ones that would be basically cut in the middle?

Gnashing of teeth describes my current status quite nicely.

time to read 12 min | 2225 words

And now it is time to go to the <set/> and explore it. Most of the collections in NHibernate follow much the same rules, so I am not going to go over them in detail:

<set
    name="propertyName"                                         (1)
    table="table_name"                                          (2)
    schema="schema_name"                                        (3)
    lazy="true|false"                                           (4)
    inverse="true|false"                                        (5)
    cascade="all|none|save-update|delete|all-delete-orphan"     (6)
    sort="unsorted|natural|comparatorClass"                     (7)
    order-by="column_name asc|desc"                             (8)
    where="arbitrary sql where condition"                       (9)
    fetch="select|join|subselect"                               (10)
    batch-size="N"                                              (11)
    access="field|property|ClassName"                           (12)
    optimistic-lock="true|false"                                (13)
    outer-join="auto|true|false"                                (14)
>

    <key .... />
    <one-to-many .... />
</set>

1) is the collection property name, just like <property/> or <many-to-one/> are the value property names.

2) table is obviously the table name in which the values for this association exist.

3) schema is the schema in which that table lives.

4) lazy controls whether this collection will be lazily loaded or not. By default it is set to true. Let us see how this works:

<set name="Posts" table="Posts">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

With the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.Get<Blog>(1);
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

This produces the following statements:

[image: the SELECT that loads the blog]

[image: the SELECT that loads the blog's posts]
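The screenshots are gone; the two statements look roughly like this (column lists are illustrative):

SELECT blog0_.Id    as Id0_,
       blog0_.Title as Title0_
FROM   Blogs blog0_
WHERE  blog0_.Id = @p0

SELECT posts0_.BlogId as BlogId1_,
       posts0_.Id     as Id1_,
       posts0_.Title  as Title2_
FROM   Posts posts0_
WHERE  posts0_.BlogId = @p0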

We need two select statements to load the data.

However, if we change the set definition to:

<set name="Posts" table="Posts" lazy="false">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

We… would get the exact same output. Why is that?

The answer is quite simple: lazy only controls whether the collection will be loaded lazily or not. It does not control how we load it. The default is to use a second select for that, because that tends to be more efficient in the general case, since it avoids the possibility of a Cartesian product. There are other options, of course.

If we just set lazy to false, it means that when we load the entity, we load the collection. The reason that we see the same output from the SQL perspective is that the log doesn't show us when each statement was issued. With lazy set to true, the collection will only be loaded in the foreach. With lazy set to false, the collection will be loaded on the Get call.

You are probably interested in outer-join; setting it to true gives us:

<set name="Posts" table="Posts" outer-join="true">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And would result in the following SQL:

[image: a single SELECT that outer joins Blogs to Posts]

Here we get both the blog and its posts in a single query to the server.

The reason that lazy is somewhat complicated is that there are quite a few options to select from when choosing the fetching strategy for the collection, and in general, it is suggested that you not set this in the mapping, because that is usually too generic. It is preferable to control this at a higher level, when you are actually making use of the entities.

5) inverse is something that I talk about extensively here, so I’ll not repeat this.

6) cascade is also something that I already talked about.

7) sort gives you a way to sort the values in the collection, by providing a comparator. Note that this is done in memory, not in the database. The advantage is that it will keep things sorted even for values that you add to the collection in memory.

8) order-by gives you the ability to sort the values directly from the database.

Note that neither 7 nor 8 works with generic sets, and that in general, you don't want to rely on those ordering properties; you want to use the natural properties of the selected collection. Sets are, by definition, unordered collections of unique elements. But generic sorted bags do work:

<bag name="Posts" table="Posts" order-by="Title ASC">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</bag>

And would produce the following SQL:

[image: a SELECT of Posts with an ORDER BY Title ASC clause]

9) where allows us to use an arbitrary SQL expression to limit the values in the collection. Usually this is used to filter out things like logically deleted rows. Here is a silly example:

<set name="Posts" table="Posts"
		  where="(len(Title) > 6)">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

Which would result in:

[image: a SELECT of Posts with the where fragment appended: (len(Title) > 6)]

Note that there is an important subtlety here: if you intend to use this collection with eager loading, you must make sure that your where clause can handle null values appropriately (in the case of an outer join).

10) fetch controls how we get the values from the database. There are three values, select, join and subselect. The default is select, and you are already familiar with it. Setting it to join would result in:

<set name="Posts" table="Posts" fetch="join">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And the following code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blog = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(5)
		.List<Blog>()[0];
	foreach (var post in blog.Posts)
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Will give us:

[image: a SELECT of the blogs with a left outer join to Posts]

Setting it to subselect will show something quite a bit more interesting:

[image: the SELECT that loads the blogs]

We have two queries, the first to load the blogs, and the second one:

[image: a SELECT of Posts whose BlogId comes from a subselect repeating the blogs query]

In this case, we load all the related posts using a subselect. This is probably one of the more efficient ways of doing this. We load all the posts for all the blogs in a single query. That assumes, of course, that we actually want to use all those posts. In the code seen above, this is actually a waste, since we only ever access the first blog Posts collection.

11) batch-size is another way of controlling how we load data from the database. It is similar to fetch, but it gives us more control. Let us see how it actually works in action before we discuss it.

<set name="Posts" table="Posts" batch-size="5">
	<key column="BlogId"/>
	<one-to-many class="Post"/>
</set>

And this code:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	var blogs = session.CreateCriteria(typeof(Blog))
		.SetMaxResults(30)
		.List<Blog>();
	foreach (var post in blogs.SelectMany(x=>x.Posts))
	{
		Console.WriteLine(post.Title);
	}
	tx.Commit();
}

Produces:

[image: the SELECT that loads the 30 blogs]

First we load the blogs, and we load 30 of them. Now, when we access any of the unloaded collections, something very interesting is going to happen. NHibernate is going to search for up to batch-size unloaded collections of the same type and try to load them all in a single query. The idea is that we take a SELECT N+1 situation and turn it into a SELECT N/batch-size + 1 situation.

In this case, it will turn a 31 queries situation into a 7 queries situation. And we can increase the batch size a bit to reduce this even further. As usual, we have to balance between local and global optimizations. If we make batch-size too large, we load too much data; if we make it too small, we still have too many queries.
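The screenshot of the batched query is lost, but it is roughly of this shape: one IN query per batch of five unloaded collections.

SELECT posts0_.BlogId as BlogId1_,
       posts0_.Id     as Id1_,
       posts0_.Title  as Title2_
FROM   Posts posts0_
WHERE  posts0_.BlogId in (@p0, @p1, @p2, @p3, @p4)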

This is one of the reasons that I consider those fancy options important, but not as important as setting the fetching strategy for each scenario independently. That is usually a much better strategy overall.

12) access was already discussed elsewhere.

13) optimistic-lock was already discussed elsewhere.

14) outer-join was discussed above, when we talked about lazy.

time to read 2 min | 276 words

Like the <component/> mapping, <dynamic-component/> allows us to treat parts of the entity table in a special way. In this case, it allows us to push properties from the mapping into a dictionary, instead of having the entity define a property for each of them.

This is very useful when we need to build dynamically extended entities, where the client can add columns on the fly.

Let us take this entity as an example:

[image: the Person entity, with Id, Name and an Attributes dictionary]
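The class diagram is lost; a rough sketch of the entity it showed (member types are assumptions based on the mapping below):

using System.Collections;

public class Person
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    // The dictionary that <dynamic-component name="Attributes"/> reads and writes.
    public virtual IDictionary Attributes { get; set; }
}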

And this table:

[image: the People table, with Id, Name and SSN columns]

Where we want to have the SSN accessible from our entity, but without modifying its structure. We can do this using <dynamic-component/>:

<class name="Person"
		table="People">

	<id name="Id">
		<generator class="identity"/>
	</id>
	<property name="Name" />

	<dynamic-component name="Attributes">
		<property name="SSN"
			type="System.String"/>
	</dynamic-component>
</class>

And the query just treats this as yet another column in the table:

[image: a SELECT from People that includes the SSN column]

time to read 6 min | 1145 words

I wanted to explore a few options regarding the way we can map inheritance using NHibernate. Here is the model that we are going to use:

[image: the model, an abstract Party class with Person and Company subclasses]
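The diagram did not survive; a rough sketch of the model it showed, inferred from the mappings below:

public abstract class Party
{
    public virtual int Id { get; set; }
}

public class Person : Party
{
    public virtual string FirstName { get; set; }
}

public class Company : Party
{
    public virtual string CompanyName { get; set; }
}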

And the code that we are going to execute:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	session.CreateCriteria(typeof(Party)).List();
	session.CreateCriteria(typeof(Company)).List();
	session.CreateCriteria(typeof(Person)).List();
	tx.Commit();
}

From now on we are going to simply play with the mapping options to see what we can come up with. We will start with a very simple discriminator based mapping (table per hierarchy):

<class name="Party"
			 abstract="true"
			 table="Parties">
	<id name="Id">
		<generator class="identity"/>
	</id>
	<discriminator column="Discriminator"
			not-null="true"
			type="System.String"/>

	<subclass
		name="Person"
		discriminator-value="Person">
		<property name="FirstName"/>
	</subclass>

	<subclass
		name="Company"
		discriminator-value="Company">
		<property name="CompanyName"/>
	</subclass>
</class>

Which results in the following table structure:

[image: a single Parties table holding Id, Discriminator, FirstName and CompanyName]

And the SQL that was generated is:

Select Party

[image: a SELECT over all the rows in Parties]

Select Company

[image: a SELECT from Parties filtered to Discriminator = 'Company']

Select Person

[image: a SELECT from Parties filtered to Discriminator = 'Person']
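The screenshots are lost, but for a discriminator mapping the three queries look roughly like this (column lists are illustrative):

-- Select Party
SELECT this_.Id, this_.Discriminator, this_.FirstName, this_.CompanyName
FROM   Parties this_

-- Select Company
SELECT this_.Id, this_.Discriminator, this_.FirstName, this_.CompanyName
FROM   Parties this_
WHERE  this_.Discriminator = 'Company'

-- Select Person
SELECT this_.Id, this_.Discriminator, this_.FirstName, this_.CompanyName
FROM   Parties this_
WHERE  this_.Discriminator = 'Person'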

But that is just one option. Let us see what happens if we try the table per concrete class option:

<class name="Person"
	table="People">
	<id name="Id">
		<generator class="identity"/>
	</id>
	<property name="FirstName"/>
</class>

<class name="Company"
	table="Companies">
	<id name="Id">
		<generator class="identity"/>
	</id>
	<property name="CompanyName"/>
</class>

Which results in the following table structure:

[image: two independent tables, People and Companies, each with its own Id]

And the following queries:

Select Party

[image: a SELECT from People]

[image: a SELECT from Companies]

No, that is not a mistake, we issue two SQL queries to load all possible parties.

Select Company

[image: a SELECT from Companies]

Select Person

[image: a SELECT from People]

The next inheritance strategy is table per subclass:

<class name="Party"
		abstract="true"
		table="Parties">
	<id name="Id">
		<generator class="identity"/>
	</id>

	<joined-subclass
		table="People"
		name="Person">
		<key column="PartyId"/>
		<property name="FirstName"/>
	</joined-subclass>

	<joined-subclass
		table="Companies"
		name="Company">
		<key column="PartyId"/>
		<property name="CompanyName"/>
	</joined-subclass>
</class>

Which results in the following table structure:

[image: three tables, Parties plus People and Companies keyed by PartyId]

And the queries:

Select Party

[image: a SELECT from Parties with left outer joins to People and Companies]

This is slightly tricky; basically, we get the class based on whether we have a row in the appropriate table.

Select Company

[image: a SELECT from Companies joined to Parties]

Select Person

[image: a SELECT from People joined to Parties]

The final option is using unioned subclasses, which looks like this:

<class name="Party"
		abstract="true"
		table="Parties">
	<id name="Id">
		<generator class="hilo"/>
	</id>

	<union-subclass
		table="People"
		name="Person">
		<property name="FirstName"/>
	</union-subclass>

	<union-subclass
		table="Companies"
		name="Company">
		<property name="CompanyName"/>
	</union-subclass>
</class>

Note that it is not possible to use identity with union-subclasses, so I switched to hilo, which is generally much more recommended anyway.

The table structure is similar to what we have seen before:

[image: the table structure, People and Companies with no shared Parties table]

But the querying is drastically different:

Select Party

[image: a SELECT over a UNION of People and Companies]

Select Company

[image: a SELECT from Companies]

Select Person

[image: a SELECT from People]

The benefit over standard table per concrete class is that in this scenario, we can query over the entire hierarchy in a single query, rather than having to issue a separate query per class.

time to read 2 min | 322 words

I got a call today from a teammate about a piece of software that we wrote. We initially planned it to work with data sizes of hundreds to thousands of records, and considering that we wrote that one in a day, I considered it a great success. We didn't pay one bit of attention to optimization, but the perf was great for what we needed.

However, a new requirement came up which required us to handle a hundred thousand records, and our software was working… well, under the new constraints. As a matter of fact, it would take about 0.4 seconds to compute a result when the data size was a hundred thousand records. That was probably well within the acceptable range, but it had me worried, because 0.4 seconds may be acceptable for this single computation, but it was a CPU bound computation, and it was probably going to kill us when we started hammering the server with multiple requests for it.

Moreover, adding 0.4 seconds to the clients of the system would add an unacceptable delay. So we sat down with dotTrace and tried to figure out what we were doing that could be done better.

We spent several hours on that. At some point, a simple extraction of a variable from a loop shaved 7% off the total execution time, but we weren't really interested in the small fish. We soon identified an O(N) operation in the code, which was taking about 20% of the total time of the calculation, and focused on that.

By changing the way that we worked with this, we turned it from an O(N) operation into an O(1) operation. We did it by introducing indexes and doing intersections of those indexes. The usual trade of time vs. memory has saved us again.
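The actual code isn't shown here, but the technique is roughly the following (a sketch with hypothetical names, not the real system): pre-build a dictionary from each lookup key to the set of matching record ids, then answer a query by intersecting those sets instead of scanning all the records.

using System.Collections.Generic;

public class RecordIndex
{
    // One index per queryable field: value -> ids of the records that have it.
    private readonly Dictionary<string, HashSet<int>> byCity = new Dictionary<string, HashSet<int>>();
    private readonly Dictionary<string, HashSet<int>> byLastName = new Dictionary<string, HashSet<int>>();

    public void Add(int id, string city, string lastName)
    {
        GetOrCreate(byCity, city).Add(id);
        GetOrCreate(byLastName, lastName).Add(id);
    }

    // Two constant-time dictionary lookups plus an intersection over the
    // candidate sets, instead of a scan over all the records.
    public HashSet<int> Query(string city, string lastName)
    {
        HashSet<int> cityIds;
        HashSet<int> lastNameIds;
        if (!byCity.TryGetValue(city, out cityIds) ||
            !byLastName.TryGetValue(lastName, out lastNameIds))
            return new HashSet<int>();

        var result = new HashSet<int>(cityIds);
        result.IntersectWith(lastNameIds);
        return result;
    }

    private static HashSet<int> GetOrCreate(Dictionary<string, HashSet<int>> index, string key)
    {
        HashSet<int> set;
        if (!index.TryGetValue(key, out set))
        {
            set = new HashSet<int>();
            index[key] = set;
        }
        return set;
    }
}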

Current time for computing a result? 20 – 50 ms, and that includes network time.

I think that I’ll find that acceptable.
