time to read 2 min | 322 words

One of the things that I avidly subscribe to is that when you are building server applications, visibility into what the application is doing is key. Without that visibility, you are left stumbling in the dark, alone.

NH Prof is a good example of making things much more visible to developers, but I am doing this in many other places; Rhino Service Bus is another example of how important it is to have visibility into what your servers are doing. Being able to see what is actually going on in your apps is priceless!

However, that does come at a cost. And a pretty steep one at that.

Logging is expensive. It is expensive to gather the data to log, it is expensive to actually write the data, and in order to produce coherent logs in a multi-threaded environment, you need to perform synchronization, etc.
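
To make that cost concrete, here is a minimal sketch (using log4net's ILog, which NHibernate also logs through; OrderProcessor, Order, and ExpensiveDump are made-up names for illustration) of guarding an expensive debug call so the data gathering only happens when DEBUG is actually enabled:

using System.Collections.Generic;
using log4net;

public class Order
{
    public int Id { get; set; }
    public List<string> Lines = new List<string>();
}

public class OrderProcessor
{
    static readonly ILog Log = LogManager.GetLogger(typeof(OrderProcessor));

    public void Process(Order order)
    {
        // Guard the call: building the message is skipped entirely
        // when the DEBUG level is turned off.
        if (Log.IsDebugEnabled)
            Log.Debug("Processing order: " + ExpensiveDump(order));

        // ... actual work ...
    }

    static string ExpensiveDump(Order order)
    {
        // Stand-in for costly data gathering done purely for the log message.
        return order.Id + " with " + order.Lines.Count + " lines";
    }
}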

One of the most common performance problems that people run into is pushing their debug logging configuration to production. Take NHibernate as a good example: if you set the log level to DEBUG, you are going to get tens of thousands of messages just for startup, and hundreds of thousands of log messages for routine operations.

That is going to slow things down, considerably.

Production configuration management is a big topic, and it deserves its own post, but the impact of setting the log level too low in production cannot be stressed strongly enough.

Pay attention to this, or it will make you pay attention to it.

In one notable case, several years ago, we re-implemented a whole module in the application (significantly optimizing it) and got exactly zero benefit (well, negative benefit, actually) from the exercise, since the actual problem was a slow logging database.

In another, we improved performance by over 60% by simply raising the log level from DEBUG to ERROR.

time to read 2 min | 330 words

I ran into an annoyingly persistent error on a production server:

System.Messaging.MessageQueueException: Insufficient resources to perform operation.
   at System.Messaging.MessageQueue.SendInternal(Object obj, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
   at System.Messaging.MessageQueue.Send(Object obj)
   at ConsoleApplication1.Program.Main(String[] args)

This was generated from the following code:

using System;
using System.Messaging;

class Program
{
    static void Main(string[] args)
    {
        try
        {
            var queue = new MessageQueue(@".\private$\MyQueue");
            queue.Send("test msg");
            Console.WriteLine("success");
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
    }
}

The error was happening on all queues, and persisted even after rebooting the machine. I should mention that we had runaway message generation, so we currently have queues with lots of messages in them. The queue that I was trying to send the message to was empty, however, and there was no reading from or writing to the queues at the time.

This was a nasty issue to figure out. I was pointed to this post, and indeed, suggestion #7 was it: we ran over the machine quota for MSMQ, so it started rejecting messages.

No one really thought about it, because we never set the quota, but it seems that by default there is a 1GB quota for the entire machine. This is usually more than enough to handle things, I would assume, but when you run into a convoy situation, bad things will happen.
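
If you want to check what a machine is actually configured with, here is a minimal sketch. I am assuming the quota is exposed as the MachineQuota value (in KB) under MSMQ's Parameters registry key, so verify that location before relying on it:

using System;
using Microsoft.Win32;

class CheckMsmqQuota
{
    static void Main()
    {
        // Assumption: the machine-wide MSMQ quota lives here, stored in KB.
        using (var key = Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\MSMQ\Parameters"))
        {
            object quotaKb = key != null ? key.GetValue("MachineQuota") : null;
            Console.WriteLine("MSMQ machine quota (KB): {0}",
                quotaKb ?? "not set, the 1 GB default applies");
        }
    }
}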

time to read 3 min | 415 words

Karl has posted about his experience porting an NHibernate application from SQL Server to PostgreSQL. Long story short: he did it in one hour.

He does bring up a few points where the database that he was using bled into the code, hurting the database independence goal. I wanted to look at those and point out the built-in solutions that NHibernate provides for them.

Handling unique constraint violations, Karl has code like this:

try
{
    Session.SaveOrUpdate(user);
    transaction.Commit();
    return true;
}
catch (GenericADOException ex)
{
    transaction.Rollback();
    var sql = ex.InnerException as SqlException;
    if (sql != null && sql.Number == 2601)
    {
        return false;
    }
    throw;
}
catch (Exception)
{
    transaction.Rollback();
    throw;
}

Obviously, this code relies heavily on internal knowledge of SQL Server error codes, and wouldn’t translate to PostgreSQL.

With NHibernate, you deal with those issues by writing an ISqlExceptionConverter. You can read Fabio's post about this, but the gist of it is that you can provide your own exceptions in a database-independent way.

The count function may return different types on different databases. NHibernate tries to stay true to whatever the database gives it, so that it doesn't truncate information. You can force the issue by overriding the count function definition in a derived dialect. You can see an example of that in this post.
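
As a rough sketch of the mechanism (assuming NHibernate's CountQueryFunctionInfo, which maps count to Int64; check your NHibernate version for the exact function type to register), a derived dialect would look something like this:

using NHibernate.Dialect;
using NHibernate.Dialect.Function;

// Re-register count in the dialect, so every database returns the same type.
public class MyMsSqlDialect : MsSql2005Dialect
{
    public MyMsSqlDialect()
    {
        RegisterFunction("count", new CountQueryFunctionInfo());
    }
}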

Bits & Booleans: while SQL Server accepts 1 as a Boolean value, PostgreSQL requires that you use '1' instead. This is where NHibernate's query.substitutions support comes in. I usually define substitutions for true and false, and then use true and false in the queries. Based on the database that I am running on, I can select what will be substituted for those values.
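
Wiring the substitutions up is a one-liner. Here is a minimal sketch; the property name is query.substitutions, and the values shown are what you would use for SQL Server:

var cfg = new NHibernate.Cfg.Configuration();
// On SQL Server, substitute the literals true/false with the bit values 1/0.
// On PostgreSQL, you would configure different substitutions instead.
cfg.SetProperty("query.substitutions", "true 1, false 0");
// Queries can now say: from User u where u.IsActive = true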

Achieving database independence with NHibernate is very easy, as Karl's one-hour porting story proves.

time to read 2 min | 397 words

One of the most horrible things that can happen to a code base is a production bug that needs to be fixed right now. At that stage, developers usually throw aside all attempts at writing well-crafted code and just try to fix the problem using brute force.

Add to that the fact that very often we don't know the exact cause of the production bug, and you get an interesting experiment in the scientific method. The devs form a hypothesis, try to build a fix, and “test” that on production. Repeat until something makes the problem go away, which isn't necessarily one of the changes that the developers intended to make.

Consider that this is a highly stressful period of time, with very little sleep (if any), and you get into a cascading problem situation. A true mess.

Here is an example from me trying to figure out an error in NH Prof’s connectivity:

(screenshot: the debugging code in question)

This is not how I usually write code, but it was very helpful in narrowing down the problem.

But this post isn’t about the code example, it is about the implications on the code base. It is fine to cut corners in order to resolve a production bug. One thing that you don’t want to do is to commit those to source control, or to keep them in production. In one memorable case, during the course of troubleshooting a production issue, I enabled full logging on the production server, then forgot about turning them off when we finally resolved the issue. Fast forward three months later, and we had a production crash because of a full hard disk.

Another important thing is that you should only try one thing at a time when you try to troubleshoot a production error. I often see people try something, and when it doesn’t work, they try something else on top of the modified code base.

That change that you just made may be important, so by all means save it in a branch and come back to it later, but for production emergency fixes, you always want to work from the production code snapshot, not on top of an increasingly frantically modified code base.

time to read 1 min | 69 words

My 3-day NHibernate course will be delivered in Paris, on 04 Nov 2009, by Sebastien Lambla.

This course squeezes into 3 days everything that you need to go from an NHibernate newbie to a proficient craftsman.

An interesting tidbit: the last time that Sebastien gave the course, I got amazing feedback about it.

You can register here.

time to read 2 min | 276 words

I have this piece of code:

public static string FirstCharacters(this string self, int numOfChars)
{
    if (self == null)
        return "";
    if (self.Length < numOfChars)
        return self;
    return self
        .Replace(Environment.NewLine, " ")
        .Substring(0, numOfChars - 3) + "...";
}

And the following exception was sent to me:

System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
Parameter name: length
  at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
  at StringExtensions.FirstCharacters(String self, Int32 numOfChars)

It actually took me a while to figure out what was going on. Partly that is because we have an “impossible” stack trace (the JIT seems to have inlined the Substring call). When I did figure out what the bug was, it was a head-slapping moment.
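
If you want the spoiler, here is a minimal repro sketch (with a contrived input; on Windows, Environment.NewLine is the two characters "\r\n"). Replace() swaps each two-character newline for a single space, so the string can shrink below numOfChars after the length check has already passed:

// Five newlines: Length == 10, so the (self.Length < numOfChars) guard passes.
var text = "\r\n\r\n\r\n\r\n\r\n";
// After Replace() the string is only 5 characters long, and
// Substring(0, 10 - 3) asks for 7 of them, hence the exception.
text.FirstCharacters(10);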

time to read 6 min | 1099 words

As I already mentioned, this presentation had me thinking. Billy presented a system called Redis, which is a key/value store intended for attribute-based storage.

That means that something like User { Id = 123, Name = “billy”, Email = “billy@example.org”} is actually stored as:

{ "uid:123:name": "billy" } 
{ "uid:123:email": "billy@example.org" }
{ "uname:billy": "123" } 

Each of those lines represents a different key/value pair in the Redis store. According to Billy, this has a lot of implications. On the advantage side, you get no schema and very easy support for just adding stuff as you go along. On the other hand, Redis does not support transactions, and it is easy to “corrupt” the database during development (usually as a result of a programming bug).

What actually bothered me the most was the implications for the number of remote calls being made. The problem shows itself very well in this code sample (from a Twitter clone), which shows inserting a new twit:

long postid = R.str_long.incr("nextPostId");
long userId = PageUtils.getUserID(request);
long time = System.currentTimeMillis();
String post = Long.toString(userId) + "|" + Long.toString(time) + "|" + status;
R.c_str_str.set("p:" + Long.toString(postid), post);
List<Long> followersList = R.str_long.smembers(Long.toString(userId) + ":followers");
if (followersList == null)
    followersList = new ArrayList<Long>();
HashSet<Long> followerSet = new HashSet<Long>(followersList);
followerSet.add(userId);
long replyId = PageUtils.isReply(status);
if (replyId != -1)
    followerSet.add(replyId);
for (long i : followerSet)
    R.str_long.lpush(Long.toString(i) + ":posts", postid);
// -1 uid is the global timeline
String globalKey = Long.toString(-1) + ":posts";
R.str_long.lpush(globalKey, postid);
R.str_long.ltrim(globalKey, 200);

I really don’t like the API, mostly because it reminds me of C, but the conventions are pretty easy to figure out.

  • R is the static gateway into the Redis API
  • str_long – stores long values
  • c_str_str – stores string values and keeps them in a nearby cache

The problem with this type of code is the number of remote calls and the locality of those calls. With a typical sharded set of servers, you are going to have lots of calls going all over the place. And when you get to people who have thousands or millions of followers, the application is simply going to die.

A better solution is required. Billy suggested using async programming or sending code to the data store to execute there.

I have a different variant on the solution.

We will start from the assumption that we really want to reduce remote calls, and that system performance in the face of a large number of writes (without impacting reads) is important. The benefit of using something like Redis is that it is very easy to get started with, very easy to change things around, and great for rapid development. We want to keep that, so I am going to focus on a solution based on the same premise.

The first thing to go is the notion that a key can sit anywhere that it wants. In a key/value store, it is important to be able to control locality of reference. We change the key format so it is now: [server key]@[local key]. What does this mean? It means that for the previously mentioned user, this is the format it will be stored as:

{ "uid:123@name": "billy" } 
{ "uid:123@email": "billy@example.org" }
{ "uname@billy": "123" } 

We use the first part of the key (before the @) to find the appropriate server. This means that everything with a prefix of “uid:123” is known to reside on the same server. This allows you to do things like making a transaction out of a single operation that sets multiple keys.

Once we have that, we can start adding to the API. Instead of getting a single key at a time, we can get a set of values in one remote call. That has the potential of significantly reducing the number of remote calls we will make.
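
To make the routing side of this concrete, here is a minimal C# sketch (hypothetical client code, not any actual Redis API) of mapping keys to servers by their prefix and grouping a multi-get so that each server is hit only once:

using System.Collections.Generic;
using System.Linq;

public static class KeyRouter
{
    // Everything before the '@' decides the server, so all of a user's
    // keys land on the same node and can be handled together.
    public static int ServerFor(string key, int serverCount)
    {
        string serverKey = key.Substring(0, key.IndexOf('@'));
        return (StableHash(serverKey) & 0x7fffffff) % serverCount;
    }

    // Group a batched get by server: one remote call per server, not per key.
    public static IEnumerable<IGrouping<int, string>> GroupByServer(
        IEnumerable<string> keys, int serverCount)
    {
        return keys.GroupBy(key => ServerFor(key, serverCount));
    }

    static int StableHash(string s)
    {
        // Deterministic hash; string.GetHashCode is not stable across processes.
        unchecked
        {
            int hash = 17;
            foreach (char c in s)
                hash = hash * 31 + c;
            return hash;
        }
    }
}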

Next, we need to consider repeated operations. By that I mean anything where we have a loop in which we call to the store. That is a killer when you are talking about any data of significant size. We need to find a good solution for this.

Billy suggested sending a JRuby script (or similar) to the server and executing it there, saving the network round trips. While this is certainly possible, I think it would be a mistake. I have a much simpler solution: teach the data store about repeated operations. Let us take as a good example copying a new twit to all your followers. Instead of reading the entire list of followers into memory and then writing the status to every single one of them, let us do something different:

Redis.PushToAllListFoundIn("uid:"+user_id+"@followers", status, "{0}@posts");

I am using the .NET conventions here because otherwise I would go mad. As you can see, we instruct Redis to go to a particular list and copy the status that we pass it to all the keys found in that list (after formatting each key with the pattern). This gives the data store enough information to optimize this operation considerably.
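
To illustrate what the data store could do with that instruction, here is a hypothetical server-side sketch (GetList and PushToList are made-up primitives standing in for the store's internals):

using System.Collections.Generic;

// Hypothetical server-side handling of PushToAllListFoundIn: the loop runs
// inside the data store, next to the data, so the followers list never
// crosses the network.
public abstract class RedisServerCore
{
    protected abstract IEnumerable<string> GetList(string key);
    protected abstract void PushToList(string key, string value);

    public void PushToAllListFoundIn(string listKey, string value, string keyPattern)
    {
        foreach (string member in GetList(listKey))
            PushToList(string.Format(keyPattern, member), value);
    }
}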

With just these few changes, I think that you gain enormously, and you retain the very simple model of using a key/value store.

time to read 1 min | 121 words

This is just something that really annoys me. There is a common misconception that compilers are mostly about parsing the code. This couldn't be further from the truth. Just about the easiest step in building a compiler is parsing the text into some form of machine-readable format. There are great tools to help you there, and a lot of information.

It is the next stage that is complex: taking that machine-readable format (the AST) and turning it into an executable. There is a lot less information about that stage, and it tends to be a pretty gnarly sort of problem.

As the title says, this post is here just to make sure that people distinguish between the two.

time to read 2 min | 292 words

About two weeks ago I posted my spike results of porting NH Prof to Linq to SQL; a few moments ago I posted screenshots of the code that is going to private beta.

I spent most of the last week working on the never-ending book, but that is a subject for a different post. Most of the time that I actually spent on the profiler wasn't spent on integrating with Linq to SQL. To be frank, basic integration with Linq to SQL took me two hours. That is quite impressive, considering that it comes from someone who never did anything more serious with Linq to SQL than the hello world demo, and that was years ago. The Linq to SQL codebase is really nice.

Most of the time went into refactoring the profiler. I know that I am going to want to add additional OR/Ms (and other stuff) there, so I spent the time to make the OR/M into a concept in the profiler. That was a big change, but it was mostly mechanical. All the backend changes are done, and plugging in a new OR/M should take about two hours, I guess.

For Linq to SQL, however, the work is far from over. While we are now capable of intercepting, displaying, and analyzing Linq to SQL output, the current Linq to SQL profiler, as you can see in the screenshots, is using the NHibernate Profiler skin. We need to adapt the skin to match Linq to SQL (different names, different options, etc.).

That is the next major piece of work left before we can call the idea of OR/Ms as a concept done and release the Linq to SQL profiler as a public beta.

time to read 1 min | 158 words

This is just to give you some idea of what the new LinqToSql Profiler can do.

It can track statements and associate them with their data context, and it gives you properly formatted and highlighted SQL, including the parameters, in a way that you can just copy and execute in SQL Server:

(screenshot: a formatted SQL statement with its parameters)

The profiler can tell you when you are hitting the database too often.

(screenshot: an alert about hitting the database too often)

All the usual reports: which methods caused which queries to be executed?

(screenshot: the queries-by-method report)

Reports about unique queries across the entire profiling session:

(screenshot: the unique queries report)

And overall usage reports:

(screenshot: the overall usage report)
