SOA Future Batches
Today I had a very interesting lunch conversation about designing the backend of a web site, SOA style. The initial suggestion called for an interface similar to this one:
public interface ICustomerService
{
    CustomerDto GetCustomerById(int id);
    ICollection<OrderDto> GetOrdersForCustomer(int customerId);
}
On the surface, that was reasonable, until we started talking about the implementation, which would look somewhat similar to this:
PropertyBag["customer"] = customerSrv.GetCustmerById(customerId);PropertyBag["orders"] = customerSrv.GetOrdersForCustomer(customerId);
I wasn't pleased with that, because it means two remote calls instead of one. That, in general, is bad, especially since developers tend to write small methods, requiring many roundtrips to the server. On the other hand, having an interface that closely matched the UI was going to make the whole SOA thing silly, and make it much harder for other clients to make use of the services (and there are other clients).
Besides, this is ugly:
Pair<CustomerDto, ICollection<OrderDto>> GetCustomerAndOrders(int customerId);
Clearly, something had to be done. I suggested that instead of having a service (and endpoint) per logical service, we have a single service, with a single method:
public interface IHippoService
{
    AbstractResponse[] Process(params AbstractRequest[] requests);
}
And we define the following request/response pairs as well:
public class GetCustomerByIdRequest
{
    public int CustomerId;
}

public class GetCustomerResponse
{
    public CustomerDto Customer;
}

public class GetOrdersForCustomerRequest
{
    public int CustomerId;
}

public class GetOrdersResponse
{
    public ICollection<OrderDto> Orders;
}
If this reminds you of NServiceBus, you are correct. Same pattern, but with vastly different goals.
The main point of going with this approach is that we can now do the following:
var responses = hippoService.Process(
    new GetCustomerByIdRequest(customerId),
    new GetOrdersForCustomerRequest(customerId));
PropertyBag["customer"] = responses[0];
PropertyBag["orders"] = responses[1];
And this allows us to perform the required operation in a single remote call, instead of two.
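To make it concrete, here is a rough sketch of what the server side of Process could look like, assuming a handler registry keyed by request type; the registry and its wiring are illustrative, not the actual implementation:

using System;
using System.Collections.Generic;

public class HippoService : IHippoService
{
    // maps each request type to the code that produces its response;
    // how handlers get registered is outside the scope of this sketch
    private readonly IDictionary<Type, Func<AbstractRequest, AbstractResponse>> handlers;

    public HippoService(IDictionary<Type, Func<AbstractRequest, AbstractResponse>> handlers)
    {
        this.handlers = handlers;
    }

    public AbstractResponse[] Process(params AbstractRequest[] requests)
    {
        // one remote call in, one response per request out, in the same order
        var responses = new AbstractResponse[requests.Length];
        for (int i = 0; i < requests.Length; i++)
            responses[i] = handlers[requests[i].GetType()](requests[i]);
        return responses;
    }
}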
Furthermore, this style of programming, while efficient, is not really friendly. We can make it much friendlier by introducing futures into the deal, so the API we end up with will look like this:
var futureCustomer = ProcessInFuture<GetCustomerResponse>(new GetCustomerByIdRequest(customerId));
var futureOrders = ProcessInFuture<GetOrdersResponse>(new GetOrdersForCustomerRequest(customerId));
// use the future values, causing both to be sent to the server in one go
This programming model is much more natural for most developers, but it keeps all the performance benefits of the batching approach.
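A minimal sketch of how ProcessInFuture might work on the client, assuming a batcher that queues requests and sends them all in one remote call on the first access to any future's value; the names and wiring are illustrative, not the actual implementation:

using System;
using System.Collections.Generic;

public class FutureValue<TResponse> where TResponse : AbstractResponse
{
    private readonly RequestBatcher batcher;
    private TResponse response;

    public FutureValue(RequestBatcher batcher)
    {
        this.batcher = batcher;
    }

    public TResponse Value
    {
        get
        {
            if (response == null)
                batcher.Flush(); // first access sends every pending request at once
            return response;
        }
    }

    internal void SetResponse(AbstractResponse value)
    {
        response = (TResponse)value;
    }
}

public class RequestBatcher
{
    private readonly IHippoService service;
    private readonly List<AbstractRequest> pendingRequests = new List<AbstractRequest>();
    private readonly List<Action<AbstractResponse>> completions = new List<Action<AbstractResponse>>();

    public RequestBatcher(IHippoService service)
    {
        this.service = service;
    }

    public FutureValue<TResponse> ProcessInFuture<TResponse>(AbstractRequest request)
        where TResponse : AbstractResponse
    {
        var future = new FutureValue<TResponse>(this);
        pendingRequests.Add(request);
        completions.Add(future.SetResponse);
        return future;
    }

    public void Flush()
    {
        // the single remote call that serves all queued futures
        var responses = service.Process(pendingRequests.ToArray());
        for (int i = 0; i < responses.Length; i++)
            completions[i](responses[i]);
        pendingRequests.Clear();
        completions.Clear();
    }
}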
This is also how I can get unbelievable performance from NHibernate, for code that looks very readable, but would execute like a dog without futures and batching.
We have spiked a small test case using WCF, and it works pretty nicely, I might add.
Thoughts?
Comments
Are these methods being called against a single session (or something similar)? Otherwise, I can't see how they're being tied together.
Beyond that, it sounds like a nice approach - and one which should be reasonably familiar to anyone considering LINQ and deferred execution.
Mind you, it's one of those things where accidentally referencing the results too early (e.g. when bug fixing) could kill performance. You'd definitely want to make people very aware of what's going on. I don't say that as a bad thing, just as a caveat.
This is totally sweet... You've got the wheels turning on making some major changes to a service framework I developed last year for a client.
Ok, so I'm convinced that this is a good idea, but where's the ProcessInFuture code? What's the best way to accomplish this with NHibernate?
Jon,
I assume that you mean the future calls.
They are stored on the current context until one of them is referenced; then they make the request for multiple results in a single remote call.
Check out this post for the implementation:
http://ayende.com/Blog/archive/2008/01/24/FutureltTNHibernateQuerygt.aspx
And yes, you do need to be aware of what is going on.
I tend to fault the application if you cross some threshold, which can be a major help in many cases.
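A minimal sketch of such a guard, assuming the batcher counts the remote calls made during a single web request; the threshold and names are illustrative:

using System;

public class RemoteCallGuard
{
    private const int MaxRemoteCallsPerRequest = 5; // illustrative threshold
    private int remoteCallCount;

    // the batcher calls this every time it actually goes over the wire
    public void OnRemoteCall()
    {
        remoteCallCount++;
        if (remoteCallCount > MaxRemoteCallsPerRequest)
            throw new InvalidOperationException(
                "Crossed the remote call threshold for this request; batch your calls.");
    }
}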
Will,
Take a look here for a discussion on how it works:
http://ayende.com/Blog/archive/2008/01/24/FutureltTNHibernateQuerygt.aspx
And here for the implementation details
http://ayende.com/Blog/archive/2008/01/24/Future-Query-Of-implemented.aspx
You've just re-invented the "big, universal invoke operation". From a SOA perspective, this is really a bad design, since clients have no chance to infer correct usage of the service by looking at the WSDL.
You are 'solving' this by some cool .NET tricks, but a SOA client may very well not be a .NET client, so that doesn't really fly...
Your idea about using futures to batch service operation invocations is good (I've been thinking about something similar), but I don't see how it's particularly dependent on that very abstract service interface definition.
Mark,
You submit a message to the server, and get a message back.
The messages are included in the WSDL, so they are easily accessible from other environments.
Discoverability is handled by naming conventions.
FooRequest -> FooResponse
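For illustration, a hypothetical fragment of how the convention could be used to resolve the response type from the request type; the reflection details are an assumption, not the actual implementation:

// FooRequest -> FooResponse: derive the response type from the request type
Type requestType = request.GetType();
string responseTypeName = requestType.FullName.Replace("Request", "Response");
Type responseType = requestType.Assembly.GetType(responseTypeName);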
So you are saying that because the FooRequest and FooResponse message structures are present in the WSDL, people will know which ones to use?
It sure beats creating an operation that takes xs:any as input and returns xs:any, but personally I still prefer a more descriptive interface.
An issue that commonly comes up when batching operations is transactional integrity. If everything happens as a single batch, the service must guarantee that the batch either succeeds or fails as a whole.
If the service is a façade for distributed business services, this may be either difficult or downright impossible.
Why doesn't the CustomerDto have a collection of OrdersDto?
Then, in your request, you specify which fields and child collections of the customer you want returned.
We have always done it this way and it works perfectly.
The more descriptive interface leads to RPC style communication.
The reason for batching here is reducing remote calls.
If you want transactions, that is easy enough to handle:
service.Process(
    new GetFoo(),
    new TransactionMessage(new DoBar(), new DoXyz()),
    new GetTzar());
GetFoo runs outside a transaction, DoBar and DoXyz run in the same transaction, and GetTzar runs outside a transaction.
Very simple, and no change to the model necessary.
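A sketch of how the server might honor TransactionMessage, assuming a Dispatch method that handles a single request; the TransactionScope usage is an assumption, since nothing above specifies the mechanism:

using System.Linq;
using System.Transactions;

public class TransactionMessage : AbstractRequest
{
    public AbstractRequest[] Requests;

    public TransactionMessage(params AbstractRequest[] requests)
    {
        Requests = requests;
    }
}

// inside the service: run the inner requests inside a single transaction
private AbstractResponse[] HandleTransactionMessage(TransactionMessage message)
{
    using (var tx = new TransactionScope())
    {
        var responses = message.Requests.Select(Dispatch).ToArray();
        tx.Complete(); // commit only if every inner request succeeded
        return responses;
    }
}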
I totally like the approach.
From my perspective it would be good to get more information into the result. Some ideas:
futureCustomers.AreFetched;
futureCustomers.LastExecutionTime
futureCustomers.ReFetch
futureCustomers.ForceFetch
The AbstractResponse could have a concrete implementation, and maybe a (list) wrapper, which would support something like:
response.HasCustomer
response.Customers
response.Count
El,
The problem is that you may not want to have the Orders collection filled all the time. That takes time and bandwidth.
Sure, you can create a way to specify what you want, but this gets costly very fast.
Moreover, what if I want things that are not intrinsically associated with one another?
Robert,
Sure, this is just an idea at this moment.
I wouldn't put those on the response, however.
I may want to do a completely different thing tomorrow (GetProducts, for example).
Putting domain-specific information into the response object would not hurt the abstract class.
It would just provide a helper and support the DRY principle. From my experience, these kinds of questions about the state of an object come up more than once, and such expressive helper properties and functions are very nice to read.
(But I would only add them when they are needed for the first time. YAGNI. Still, I could imagine that later one might want to manage the responses, e.g. to keep them up to date. Like:
if (response.LastUpdate.IsOlder(10).Minutes)
{
    response.Customers.ReFetch();
    // or: response.Customers.CheckForUpdates();
}
Maybe extension methods would be the right way to achieve this nicely?
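A hypothetical extension-method take on that idea; LastUpdate and ReFetch are illustrative members that the response types above do not actually have:

using System;

public static class ResponseExtensions
{
    // hypothetical: assumes AbstractResponse records when it was fetched
    public static bool IsOlderThan(this AbstractResponse response, TimeSpan age)
    {
        return DateTime.Now - response.LastUpdate > age;
    }
}

// usage, roughly matching the example above:
if (response.IsOlderThan(TimeSpan.FromMinutes(10)))
    response.Customers.ReFetch();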
Distributed transactions are only going to work if you have Microsoft all over, or both client and all services support WS-Atomic Transaction (or whatever the name is today).
...and I don't agree that descriptive operations necessarily lead to RPC style communications. There's nothing stopping you from using your futures/batching approach against a more 'strongly typed' service.
Mark,
re: Tx
That assumes that the server is going to make a distributed call. In this case, there is one endpoint that generally talks to a single app, the fact that I am sending multiple messages notwithstanding.
re: Batching strong typed method calls:
I would really like to see a sample of that; can you think of any?
Something like this is very easy in a dynamically typed language, because you can subvert the return value. Not so easy with something like C#.
re: Tx:
Yes, as long as the service endpoint is a single application there's no issue - I was just pointing out that if your service is just a façade that distributes messages to different back-end services, transactions may be an issue across those back-end services.
re: Batching strongly typed method calls:
In your example, you are already using strongly typed messages (transmitted via a generic operation). What's the semantic difference between that and strongly typed operations?
re: batching
I think we are talking past each other here. When I was talking about the strongly typed part, I meant the classic WCF service contract, not message passing.
OrderDto and CustomerDto? If you are using this API for a tiered application architecture, I'm going to puke. This is an anti-pattern I see used in many places.
Been there. Made that mistake.
You couldn't drag me there with a bulldozer.
Hmm... gotta be said, I've used the "single request" technique before today and regretted it. However, if you wanted to do it, I'd make the following observation: the recommended way of using WCF is to create proxy classes. These proxy classes are all partial classes, which means you can extend them.
So, you /could/ implement the following:
partial class GetCustomerByIdRequest : IFuture<GetCustomerByIdResponse>
That would remove the discoverability problem and mean you could write code like:
var r1 = new GetCustomerByIdRequest(id);
var r2 = new GetOrdersForCustomerRequest(id);
var customer = r1.Value;
var orders = r2.Value;
Obviously, in order to do that, you need to jettison two assumptions:
1) That the objects on the client are the same as on the server
2) That the objects are pure DTOs that directly correspond to what's sent down the wire.
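A minimal sketch of what that could look like, assuming an ambient client-side batcher behind the partial classes; all the names and wiring here are illustrative:

public interface IFuture<TResponse> where TResponse : AbstractResponse
{
    TResponse Value { get; }
}

// the generated proxy class is partial, so the future behavior bolts on
partial class GetCustomerByIdRequest : IFuture<GetCustomerByIdResponse>
{
    public GetCustomerByIdResponse Value
    {
        get
        {
            // hypothetical ambient batcher: the first access to any Value
            // flushes every queued request to the server in one remote call
            return ClientBatcher.Current.GetResponse<GetCustomerByIdResponse>(this);
        }
    }
}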
What a beautiful, elegant solution!
The version without futures seemed more effective to me, because you get clear, explicit control over granularity. Isn't that the key issue when playing near the distribution boundary? (That said, I don't do distributed stuff that much.)
I'm definitely gonna try this, and also the use of Futures in NHibernate.
Evan,
I agree about the DTO in general, but there are some interesting conflicts here.
Assume that the service layer actually uses a domain model.
You already have Customer and Order entities; how are you going to separate them from the messages on the wire?
Julian,
Can you explain why you regretted it?
And I disagree about the preferred way. If I am using WCF, I would have a contracts DLL that is shared between client and server. Much easier development model.
Proxy generation only comes into play when you really need it; versioning or different systems are the usual reasons.
I have to say I disagree regarding your last comment (preferred way of using WCF). Proxy generation is in fact Microsoft's recommended way of using WCF, as generated proxies inherit from ClientBase<T> and implement IDisposable for you, to clean up custom channel resources after use. Now, when it comes to versioning, both practices demand that you either re-generate the proxy and recompile, or change the contract and recompile, so I see no real benefit in using proxies for this reason.
Besides that, I really like the idea of batching commands to reduce server round-trips. Although in PoEAA the recommended way is using DTOs, I prefer generalizing instead of creating a DTO for every type of request. I'm currently implementing a solution using a modified version of NServiceBus which allows batching several messages ('commands') together.
btw - In our implementation, we utilize the composite pattern in order to create a 'message' tree, where the leaf nodes represent simple requests and the branches represent transaction containers.
:)
There is no problem batching service calls (MSMQ does it) with a full contract. I don't understand the whole debate. If you go asynchronous (which is the preferred way in SOA), it does not matter whether you collect the messages on the sending side, in the channel, or on the receiving side.
Alon,
Look at the types of messages in the examples.
For those, you pretty much want to have sync communication, and batching them makes for a significant perf improvement.
Yes, I agree that batching improves performance, and I agree that there are many cases where you want synchronous behavior. But batching is a mixture of synchronous and asynchronous calls: you don't get the result of the first call and then call again for the second; you get the results of all the calls together. Having a mechanism based on asynchronous calls that batches the messages and then submits them together will give you the same (performance) result as batching.
What I really meant is that you don't have to implement one "function" that does it all. You can still use a well-defined contract. After all, every "function call" in WCF becomes a message, so by implementing a batching mechanism that takes WCF messages (action="*") and sends them together, you get the batching without losing the contract. Of course, you will have to tell this mechanism when to actually send the batch.
Alon,
That sounds interesting.
Can you talk a bit more about it? I am not sure that I understand how you can do that.
I would really like to see an example.
I don't say that it is an easy task; however, look at the example of a chunking channel: http://msdn2.microsoft.com/en-us/library/aa717050(vs.85).aspx . What we want is the opposite, but the implementation may be somewhat similar.
We also have to define when to actually send the batch, maybe by adding something to the ContextMessageProperty when we want the batch to be sent, or by defining a policy of timeouts or limiting the number of messages. We also need to use some sort of async calls, maybe by using the client async pattern (i.e. use "svcUtil /async", which is actually a sync service call: http://www.dotnetconsult.co.uk/weblog2/PermaLink,guid,83b06d32-1fa8-4757-b062-c2e1766a5525.aspx ). We can also use a sync call, batch it, and return with an empty result. In the custom binding on the client side we will batch the calls; on the service side we will batch the results. The code to call the service may be based on the future pattern or on the IAsyncResult pattern. We can also ask for the result to be an array of messages, but then we lose contract-based dispatching.
Alon, what I am thinking about is this: can you make the following code work:
var customer = customerService.GetCustomerById(15);
var orders = customerService.GetCustomerOrders(15);
And do it using a single batch? I don't see how.
What if you want to show Orders and retrieve the customers for those Orders?
A view that lists all orders, with order number, date, etc., plus the customer name and address.
Would you loop through all the orders and call orderService.GetCustomerForOrder(orderId) for every order, or would you make a GetCustomersForOrders() that returns all customers in a key/value dictionary and use that to look up the correct customer?
Hmm... trying to think about my experiences of the single request model. A bit of context: everything was done in XML on VB6 and DCOM so not all of the lessons are going to be directly applicable.
One problem is that it infected every part of the system as it was used as a general bus. This was, in retrospect, a huge mistake; this sort of stuff should be restricted to the boundaries where it's actually needed. Another was that actually debugging this became a pain simply because it was hard to figure out exactly what the object state was meant to be at any given point. It should probably be mentioned that the separation of concerns wasn't exactly ideal. However, the use of a type-safe language should help a lot with the last problem.
Now, WCF has a great advantage over remoting on this second point in that you have to restrict what can actually be sent down the wire, explicitly defining what sub-classes can actually be transmitted. It simplifies the version issue no end. Actually, the versioning issue is huge in the first place. I really would recommend going with proxies. To expand on the previous example, you could actually implement IFuture on the server as well, and put the implementation of the original method in there. It's actually a relatively natural model. You have client objects and server objects, proper domain objects, that communicate through a well-defined data schema down the wire. Partial classes on the client make this fairly manageable.
However, the more I think about it, the more I think that others have hit the nail on the head: fundamentally, this should be handled at the wire level. A brave developer could implement an extension to the protocol so that a function returning IFuture<T> was batched (either by extending the wire protocol or by looking for a suitably defined method as you've described already).
Although I know it looks less elegant, I'd probably recommend letting batch control be more explicit. e.g.
using (new BatchRequest())
{
    var customer = customerService.GetCustomerById(15);
    var orders = customerService.GetCustomerOrders(15);
}
So that the batch always ran when BatchRequest was disposed.
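A rough sketch of that disposable batch scope, assuming an ambient per-thread context, proxies that enqueue instead of going remote while it is active, and a hypothetical ServiceGateway to reach the server; everything here is illustrative:

using System;
using System.Collections.Generic;

public class BatchRequest : IDisposable
{
    [ThreadStatic] private static BatchRequest current;

    private readonly List<AbstractRequest> pending = new List<AbstractRequest>();
    private readonly List<Action<AbstractResponse>> completions = new List<Action<AbstractResponse>>();

    public BatchRequest()
    {
        current = this;
    }

    public static BatchRequest Current
    {
        get { return current; }
    }

    // proxy methods call this instead of going remote while a batch is active
    public void Enqueue(AbstractRequest request, Action<AbstractResponse> onResponse)
    {
        pending.Add(request);
        completions.Add(onResponse);
    }

    public void Dispose()
    {
        current = null;
        // one remote call for everything queued inside the using block
        var responses = ServiceGateway.HippoService.Process(pending.ToArray());
        for (int i = 0; i < responses.Length; i++)
            completions[i](responses[i]);
    }
}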
Morten,
I would have a separate message to do that.
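For instance, a hypothetical request/response pair for Morten's scenario; the names are illustrative:

public class GetCustomersForOrdersRequest : AbstractRequest
{
    public int[] OrderIds;
}

public class GetCustomersForOrdersResponse : AbstractResponse
{
    // keyed by order id, so the view can look up each order's customer
    public IDictionary<int, CustomerDto> CustomersByOrderId;
}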
About the interface:
ICollection<OrderDto> GetOrdersForCustomer(int customerId);
That's a dangerous call to have, as its memory consumption can grow unbounded for large numbers of orders, say, for strategic customers. These calls tend to be the culprit behind decreased server stability: as they eat up memory, paging starts occurring, threads block, and bad things happen.
For these scenarios, a more robust solution involves multiple responses over time. Using nServiceBus, this would be done by a message handler as follows:
while ((data = GetUpTo(MAX).RowsFrom(Table)) != null)
    Bus.Reply(ConvertToMessage(data));
The client would need to have a message handler of their own to handle the data coming back.
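A hypothetical sketch of that client-side handler, NServiceBus-style; the message type, the handler interface shape, and the OrdersView helper are assumptions for illustration:

public class OrdersPageMessageHandler : IMessageHandler<OrdersPageMessage>
{
    public void Handle(OrdersPageMessage message)
    {
        // each page of orders arrives as its own reply; accumulate or render it
        OrdersView.Append(message.Orders);
    }
}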
As you begin to look at the world this way, you might see that receiving a notification of new data could behave the same way from the client's perspective. Just something to keep in the back of your mind :)
Hope that helps.
@Ayende
This post inspired me to come up with something similar. At first I tried using the Future of T approach as described here: http://www.ayende.com/Blog/archive/2008/01/02/Its-the-future-now.aspx but it wasn't working right. Then I ditched the concept entirely; as a result, the interceptor has a reference to both the Request and the Response:
class FutureInterceptor : IInterceptor
{
    AbstractRequest Request { get; private set; }
    AbstractResponse Response { get; set; }
    IRequestBatcher Batcher { get; private set; }
}
In Intercept(), I check to see if the Response is null. If it is, I flush the batcher - which has a list of all the FutureInterceptors - to the server, and it sets the response on them.
I was wondering, is this a bad approach? I haven't seen any IInterceptor that holds references to several different objects.
That sounds reasonable, certainly.
I would probably try to go with this idea, but I want to see how it looks when we do this explicitly.
This is the second time I've seen IHippoService when talking about SOA. Where's that come from?
Well, IRhinoService is reserved, I had to use something else.