Pipes and filters: The IEnumerable approach


Jan 05 2008

Pipes are very common in computing. They are a very good way to turn a complex problem into a set of small problems. You are probably familiar with the pattern, even if not explicitly:

The ASP.Net Http Pipeline (Begin_Request, Authorize_Request, etc.)
Compiler Pipelines (Parse, ProcessTypes, SaveAssembly, etc)
Command Line piping (ps -ax | grep Finder)

What I wanted to talk about today is how to implement this in code. I have done several implementations of pipes and filters in the past, and they were all overly complex. I took this weekend to look at the problem again, and I came up with a ridiculously simple solution.

In a nutshell, here it is:

We have a pipeline that is composed of operations. Each operation accepts an input and returns an output. The use of IEnumerable<T> means that we can stream the entire process without any effort whatsoever.
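The IOperation<T> interface itself is never shown in the post. Judging from the classes that implement it, it presumably has a single Execute method, so a minimal sketch could look like the following. The interface shape is an assumption, and DoubleEachNumber is a hypothetical operation used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interface: the post does not show it, but every
// operation in the post implements exactly this one method.
public interface IOperation<T>
{
    IEnumerable<T> Execute(IEnumerable<T> input);
}

// Hypothetical operation, for illustration only: doubles each number it receives.
public class DoubleEachNumber : IOperation<int>
{
    public IEnumerable<int> Execute(IEnumerable<int> input)
    {
        foreach (int number in input)
            yield return number * 2;
    }
}

public static class Program
{
    public static void Main()
    {
        IOperation<int> operation = new DoubleEachNumber();
        foreach (int result in operation.Execute(new[] { 1, 2, 3 }))
            Console.WriteLine(result); // prints 2, 4 and 6 on separate lines
    }
}
```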

Most problems that call for the pipeline approach are fairly complex, so picking a simple example means that it would be trivial to implement it some other way. Let us go with the really trivial sample of printing all the processes whose working set is greater than 50 MB.

We have three stages in the pipeline, the first, get processes:

public class GetAllProcesses : IOperation<Process>
{
    public IEnumerable<Process> Execute(IEnumerable<Process> input)
    {
        return Process.GetProcesses();
    }
}

The second, limit by working set size:

public class LimitByWorkingSetSize : IOperation<Process>
{
    public IEnumerable<Process> Execute(IEnumerable<Process> input)
    {
        int maxSizeBytes = 50 * 1024 * 1024;
        foreach (Process process in input)
        {
            if (process.WorkingSet64 > maxSizeBytes)
                yield return process;
        }
    }
}

The third, print process name:

public class PrintProcessName : IOperation<Process>
{
    public IEnumerable<Process> Execute(IEnumerable<Process> input)
    {
        foreach (Process process in input)
        {
            System.Console.WriteLine(process.ProcessName);
        }
        yield break;
    }
}

All of these are trivial implementations. You can see that the GetAllProcesses class doesn't care about its input; it is the source. LimitByWorkingSetSize iterates over the input and uses "yield return" to stream the results to the next step, PrintProcessName. Since that step is the final one, we use "yield break" to make it compile without returning anything. (We could return null, but that would be rude.)

It is important to note that the second stage uses the if statement to control what gets passed downstream.

Now we only have to bring them together, right?

public class TrivialProcessesPipeline : Pipeline<Process>
{
    public TrivialProcessesPipeline()
    {
        Register(new GetAllProcesses());
        Register(new LimitByWorkingSetSize());
        Register(new PrintProcessName());
    }
}

Now, executing this pipeline will execute all three steps, in a streaming fashion.

Okay, this is a lot of code that we can replace with the following snippet:

int maxSizeBytes = 50 * 1024 * 1024;
foreach (Process process in Process.GetProcesses())
{
     if (process.WorkingSet64 > maxSizeBytes)
         System.Console.WriteLine(process.ProcessName);
}

What are we getting from this?

Composability and streaming. When we execute the pipeline, we are not executing each step in turn, we are executing them all in parallel. (Well, not in parallel, but together.)
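To see what "together" means here, the following self-contained sketch logs the order in which two iterator stages run. The stages (Produce and KeepOdd) are hypothetical stand-ins, not part of the post's pipeline. Each item travels through every stage before the next item is produced.

```csharp
using System;
using System.Collections.Generic;

public static class StreamingDemo
{
    // Records the order in which the stages actually run.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical source stage: produces 1, 2, 3.
    static IEnumerable<int> Produce()
    {
        for (int i = 1; i <= 3; i++)
        {
            Log.Add("produce " + i);
            yield return i;
        }
    }

    // Hypothetical filter stage: only odd numbers are passed downstream.
    static IEnumerable<int> KeepOdd(IEnumerable<int> input)
    {
        foreach (int i in input)
        {
            Log.Add("filter " + i);
            if (i % 2 == 1)
                yield return i;
        }
    }

    public static void Run()
    {
        Log.Clear();
        // Nothing executes until this loop starts pulling items.
        foreach (int i in KeepOdd(Produce()))
            Log.Add("consume " + i);
    }

    public static void Main()
    {
        Run();
        foreach (string line in Log)
            Console.WriteLine(line);
        // Order: produce 1, filter 1, consume 1, produce 2, filter 2,
        //        produce 3, filter 3, consume 3
    }
}
```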

Hey, I didn't show you how the Pipeline<T> was implemented, right?

public class Pipeline<T>
{
    private readonly List<IOperation<T>> operations = new List<IOperation<T>>();

    public Pipeline<T> Register(IOperation<T> operation)
    {
        operations.Add(operation);
        return this;
    }

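    public void Execute()
    {
        // Start from an empty sequence; the first operation is the source and ignores it.
        IEnumerable<T> current = new List<T>();
        foreach (IOperation<T> operation in operations)
        {
            // For iterator-based operations this only chains them together;
            // their bodies have not run yet.
            current = operation.Execute(current);
        }
        // Pulling items from the last stage is what actually runs the pipeline.
        IEnumerator<T> enumerator = current.GetEnumerator();
        while (enumerator.MoveNext()) ;
    }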
}

I'll leave you to ponder that.

Tags:

Design

Comments

05 Jan 2008
20:39 PM

Mark Monster

I've heard about Pipes and Filters as an Architecture Pattern. But didn't do any implementation of it yet. Do you have a real world example about when Pipes and Filters is a good use?

Besides this, a very nice Pipes and Filters solution I think. I haven't tried the code, yet. But what about the last two lines in Pipeline.Execute, what's the use?

IEnumerator<T> enumerator = current.GetEnumerator();

while (enumerator.MoveNext()) ;

05 Jan 2008
20:43 PM

Ayende Rahien

I gave a few in the beginning of the post.

Others include batch processing, ETL, workflow, etc.

The last two lines are where the magic happens, they are driving the whole thing.

05 Jan 2008
20:57 PM

Avish

That's nice, but I'm bugged about the way the ends are implemented. The first step ignores its input, and in the last step we had to trick the compiler since we're not returning anything. Also, the last "while (enumerator.MoveNext())" is a little ugly.

Also, I think the example doesn't do a good job of explaining the need. This kind of thing really calls for list comprehensions (p.Name for p in Process.GetProcesses if p.WorkingSet > threshold) or LINQ, I guess. This heavy-duty kind of pipeline is really useful for doing complicated, state-sensitive work on a collection of objects (the compiler example is a better one).

05 Jan 2008
22:12 PM

Arnon Rotem-Gal-Oz

Hi Oren

You got the pipes and filters mixed

The pipes are where the messages flow and the filters are where the processing is done (see http://www.enterpriseintegrationpatterns.com/PipesAndFilters.html)

Arnon

05 Jan 2008
23:26 PM

Omer Mor

Nice implementation, Oren.

Is there any reason you chose to use an interface (IOperation) instead of a delegate?

I don't like declaring new classes if I don't have to, and I don't see any problem with replacing IOperation with a method that fits the following delegate:

delegate IEnumerable<Process> ExecuteOperation(IEnumerable<Process> input);

If you need a stateful class that contains the operation, then you can implement one and pass its Execute method as the delegate, but if your operations are stateless, then a simple (possibly static) method will do.

06 Jan 2008
00:10 AM

Alex Henderson

Hi Oren,

The thing that bugs me is the IEnumerable<T> being baked in - why not make the operations work with T instead of IEnumerable<T>, you can still achieve the same end result, but you can use the pipeline for processing singular items like requests... or am I missing something?

Maybe something like this:

http://trac.devdefined.com/public/trac/tools.devdefined.com/browser/trunk/src/DevDefined.Common/Pipeline/Pipeline.cs

06 Jan 2008
00:31 AM

Paul Stovell

Hi Oren,

Pipes and filters are exactly what LINQ does.

Instead of implementing an IOperation<T> interface, you could simply implement IEnumerable<T>. The following extension methods:

public static IEnumerable<Process> LimitByWorkingSetSize(this IEnumerable<Process> inputs);

public static IEnumerable<Process> PrintProcessName(this IEnumerable<Process> inputs);

Would be all you need. You can then pipe them like so:

GetAllProcesses().LimitByWorkingSetSize().PrintProcessName();

The Pipeline<T> class also has a severe limitation in that it assumes all operations are for the same type. This disables transformation features within an operation. The Pipeline<T> class should be simply Pipeline, and the operation should at least be:

IOperation<TInput, TOutput>

However, what if an operation has multiple inputs? (Like a union for example.) You could collect all of the inputs into one container object. But why not simply use a method, IEnumerable<T>, and skip the IOperation<T> interface?

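public static class ProcessExtensions
{
    public static IEnumerable<Process> LimitByWorkingSetSize(this IEnumerable<Process> inputs, int size)
    {
        return new LimitByWorkingSetSizeEnumerator(inputs, size);
    }
}

class LimitByWorkingSetSizeEnumerator : IEnumerable<Process>
{
    private readonly int _size;
    private readonly IEnumerable<Process> _inputs;

    public LimitByWorkingSetSizeEnumerator(IEnumerable<Process> inputs, int size)
    {
        _size = size;
        _inputs = inputs;
    }

    public IEnumerator<Process> GetEnumerator()
    {
        foreach (Process p in _inputs)
        {
            // Keep processes above the threshold, as in the post.
            if (p.WorkingSet64 > _size)
                yield return p;
        }
    }

    // The non-generic overload is required by IEnumerable<T>.
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}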

Note that the implementations can be stateful or stateless. Consider the "Where" extension, which filters items one-by-one, or the OrderBy extension, which reads all of the inputs before sorting and returning the outputs.

06 Jan 2008
07:19 AM

Ayende Rahien

Avish,

yes, we are cheating the compiler to get the nice programming model.

I mentioned that this is a trivial example. The real ones are usually too complex to be easily explained.

06 Jan 2008
07:22 AM

Ayende Rahien

Arnon,

Thanks for pointing it out.

It looks like I am calling filters a state or operation, is that what you mean?

I don't like the name filter in this case, because most of the time I am not doing filtering there, I am doing transformations on the data.

Note to self: re-read pattern's description before using it.

06 Jan 2008
07:23 AM

Ayende Rahien

Omer,

A class gives me more options and has better scalability in general.

You could do it with a delegate, but using classes makes sense in this scenario. I want to have a lot of tiny classes.

06 Jan 2008
07:25 AM

Ayende Rahien

Alex,

Because of the filtering thing.

I may want to stop processing an item, so I can just not yield it.

I may want to split an item, so I can just yield it twice.
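As a rough illustration of both cases, here is a hypothetical operation that drops some items by not yielding them and turns others into several outputs. SplitOnCommas and the assumed IOperation<T> shape are illustrative only, not code from the post.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interface used throughout the post.
public interface IOperation<T>
{
    IEnumerable<T> Execute(IEnumerable<T> input);
}

// Hypothetical operation: drops empty lines and splits comma-separated ones.
public class SplitOnCommas : IOperation<string>
{
    public IEnumerable<string> Execute(IEnumerable<string> input)
    {
        foreach (string line in input)
        {
            if (line.Length == 0)
                continue; // not yielding stops processing of this item

            foreach (string part in line.Split(','))
                yield return part; // one input item can become several outputs
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var operation = new SplitOnCommas();
        foreach (string part in operation.Execute(new[] { "a,b", "", "c" }))
            Console.WriteLine(part); // prints a, b and c on separate lines
    }
}
```

An operation that works on a single T and returns a single T could not express either case without extra machinery.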

06 Jan 2008
07:28 AM

Ayende Rahien

Paul,

Consider a set of business rules that need to execute on an order batch, to decide if we can approve it.

Consider a set of transformations that a message goes through before it is let out the door.

Consider a set of data manipulation that occurs for a row in an ETL process.

Linq is not a good approach in those scenarios. I am not talking about querying, in most cases, I am talking about processing.

06 Jan 2008
12:29 PM

Markus Zywitza

For better composability, you should consider Pipeline<T> implementing IOperation<T> to allow concatenating Pipelines.

The Execute()-Method will then use the input enumeration instead of creating a new List and yield its results, avoiding the ugly empty while-loop as a side benefit.
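One way this suggestion might look is sketched below. The IOperation<T> shape is assumed from the post, and AddOne and TimesTwo are hypothetical operations for the demo. Note that someone still has to enumerate the result of the outermost pipeline for anything to run.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interface used in the post.
public interface IOperation<T>
{
    IEnumerable<T> Execute(IEnumerable<T> input);
}

// A pipeline that is itself an operation, so pipelines can be nested.
public class Pipeline<T> : IOperation<T>
{
    private readonly List<IOperation<T>> operations = new List<IOperation<T>>();

    public Pipeline<T> Register(IOperation<T> operation)
    {
        operations.Add(operation);
        return this;
    }

    // Uses the caller's input instead of a new empty list and yields the results.
    public IEnumerable<T> Execute(IEnumerable<T> input)
    {
        IEnumerable<T> current = input;
        foreach (IOperation<T> operation in operations)
            current = operation.Execute(current);

        foreach (T item in current)
            yield return item;
    }
}

// Hypothetical operations, for illustration only.
public class AddOne : IOperation<int>
{
    public IEnumerable<int> Execute(IEnumerable<int> input)
    {
        foreach (int i in input) yield return i + 1;
    }
}

public class TimesTwo : IOperation<int>
{
    public IEnumerable<int> Execute(IEnumerable<int> input)
    {
        foreach (int i in input) yield return i * 2;
    }
}

public static class Program
{
    public static void Main()
    {
        Pipeline<int> inner = new Pipeline<int>().Register(new AddOne()).Register(new TimesTwo());
        Pipeline<int> outer = new Pipeline<int>().Register(inner).Register(new AddOne());

        foreach (int result in outer.Execute(new[] { 1, 2 }))
            Console.WriteLine(result); // prints 5 and 7
    }
}
```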

06 Jan 2008
14:59 PM

Jon Skeet

Business rules deciding on whether or not an order batch should be approved: Where clause.

Transformation: Select clause

Data manipulation for a row in an ETL process: I don't know enough about this to comment.

The first two at least are perfectly reasonable uses for LINQ, and I suspect the last is too. LINQ is for more than just querying. Paul is right: pipes and filters are exactly what LINQ is all about, at least for LINQ to Objects.

Jon

06 Jan 2008
15:04 PM

Ayende Rahien

Jon,

When I am thinking about selecting & transforming, I am usually talking about more than one liners.

Assume that you want to interrogate an external system for data about the customer credit status.

Or that the transformation is involved or contains complex business logic.

All I have seen of linq so far convinced me that it breaks down really fast when you get to complex stuff.

06 Jan 2008
16:02 PM

Jon Skeet

If you want more than a one liner, write a method and use that as the action of the delegate instance. Don't forget:

1) You don't have to use anonymous methods or lambda expressions to create delegate instances.

2) You don't have to use query expressions to use LINQ.

3) You can write your own extension methods as well to expand LINQ as you need to.

Now admittedly the bug wrt output type inference of method groups is a slight disadvantage here - but it's better than being forced to (manually) create a new type every time you need a different kind of filter or transformation.

I really believe that pretty much any limitation of LINQ is going to prove a limitation of your scheme above too - simply because they're so similar. The advantage of LINQ is that for simple cases you can use query expressions, lambda expressions etc. Oh, and it's going to be rather better understood by the majority of developers in the next couple of years :)

Jon

06 Jan 2008
18:40 PM

Ayende Rahien

Jon,

The only criterion that I have for this is how maintainable I can make it.

I don't see Linq adding anything to the mix here. It is possible that I am wrong, but I would wait to see the code before being able to say so.

06 Jan 2008
18:54 PM

Jon

Using LINQ would add four things:

1) Not reinventing the wheel. If you hire a C# developer in a year, I'd hope they'd be familiar with LINQ. They probably won't be familiar with your pipeline framework.

2) Taking advantage of the ease of creating delegates in C# 3, rather than forcing the use of interfaces.

3) Transformation ability, as Paul pointed out, where the input and output types can be different

4) The ability to integrate simply with other LINQ-related technologies such as Parallel LINQ.

I would argue that your use of PrintProcessName is an odd one for a pipeline, to yield an empty result at the end. I think I'd rather implement a ForEach extension method on IEnumerable<T> which takes an Action<T>. (I'm kinda surprised that isn't in the framework already, to be honest.)

At that point, your code would become:

Process.GetProcesses()
    .Where(proc => proc.WorkingSet64 > 50*1024*1024)
    .ForEach(proc => Console.WriteLine(proc.ProcessName));

No new types needed at all, except for the static class to hold the ForEach extension method. If you want to create extra classes for reusable logic, you certainly can - but you don't have to.
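A minimal sketch of such a ForEach extension method follows. The class name EnumerableExtensions is an assumption, and the demo uses plain integers in place of processes so that its output is predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Runs the action once per item; enumerating here is what drives the pipeline.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (action == null) throw new ArgumentNullException("action");

        foreach (T item in source)
            action(item);
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the process example: keep sizes above 50 and print them.
        new[] { 10, 60, 75 }
            .Where(size => size > 50)
            .ForEach(size => Console.WriteLine(size)); // prints 60 and 75
    }
}
```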

If the logic for any of the steps is complicated, you can stick that in a method easily, either casting the method group or just calling the method from a lambda expression. When the logic isn't complicated, do it inline as above.

Personally, I think that's more maintainable. We still have the composability and the streaming, but we also have all the standard query operators, the ability to use query expressions where you need to, etc.

I may have said it before on this blog, but I believe LINQ to Objects has been significantly under-marketed. LINQ to SQL has more of an "ooh, ahh" factor - but LINQ to Objects will be more applicable in many situations (and without forcing you to use SQL server!)

06 Jan 2008
19:11 PM

Ayende Rahien

1/ if they can't grok the concept in 30 minutes, they are not worth keeping. The idea of grabbing someone from the street is a myth.

2/ build a DelegateOperation<T>(Action<T>), and you are set

3/ In this scenario, I actually need it to keep one type all the way. I am doing transformations in a pipeline. If I wanted any type, I could have used IEnumerable instead of IEnumerable<T>

4/ interesting, probably the best point.
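For point 2, a DelegateOperation<T> could be sketched roughly as follows. This assumes the IOperation<T> shape used in the post, and also assumes that the wrapper passes every item downstream after running the action on it; neither detail is spelled out in the thread.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interface used in the post.
public interface IOperation<T>
{
    IEnumerable<T> Execute(IEnumerable<T> input);
}

// Wraps a delegate so that simple steps do not need a class of their own.
public class DelegateOperation<T> : IOperation<T>
{
    private readonly Action<T> action;

    public DelegateOperation(Action<T> action)
    {
        if (action == null) throw new ArgumentNullException("action");
        this.action = action;
    }

    public IEnumerable<T> Execute(IEnumerable<T> input)
    {
        foreach (T item in input)
        {
            action(item);      // run the delegate on the item
            yield return item; // assumption: the item then continues downstream
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var print = new DelegateOperation<string>(name => Console.WriteLine(name));

        // Enumerating the result is what triggers the action for each item.
        foreach (string name in print.Execute(new[] { "first", "second" }))
        {
        }
    }
}
```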

I said that the print processes example is trivial, right?

A more realistic sample would be:

1/ read customers from file

2/ left join to existing customers in database

3/ for all those missing from database:

3.1/ create customer record

3.2/ send email about new customer

4/ get all active orders in last day

5/ left join customers with orders

6/ update customer statistics

6.1 / amount bought

6.2 / favorite products

7/ update customers

Assume significant complexity for at least some of those steps.

Assume that I want to maintain separation of concerns.

07 Jan 2008
03:27 AM

Paul Stovell

Using LINQ and extension methods does not mean you have to put all of your operation inside one method.

If you look at many extensions, they actually return a class (implementing IEnumerable<T>) which contains all the logic. The SyncLINQ source code certainly isn’t one 40,000 line class with 500 massive methods; I’m sure the LINQ source is similar.

ETL can't be done with LINQ? Here's how I'd do it:

var transformedCustomers = new AddAdditionalMetadataToCustomerOperation(
                           new ConvertCRMCustomerToSASCustomerOperation(
                              new SwapFirstNameAndLastNameOperation(
                                  new ImportCustomersFromCRMOperation(crmUrl))));

The nice thing about IEnumerable and not fixing the inputs/outputs is each class can be defined differently:

class AddAdditionalMetadataToCustomerOperation : IEnumerable<SASCustomer>
    public new (IEnumerable<SASCustomer>);

class ConvertCRMCustomerToSASCustomerOperation : IEnumerable<SASCustomer>
    public new (IEnumerable<CRMCustomer>);

class SwapFirstNameAndLastNameOperation : IEnumerable<CRMCustomer>
    public new (IEnumerable<CRMCustomer>);

class ImportCustomersFromCRMOperation : IEnumerable<CRMCustomer>
    public new (string crmUrl);

Note that each operation can return different types, and take different types as inputs. Since you’re using classes, you can use inheritance. It has all the capabilities of the solution you blogged about. You can add multithreading, processor yielding, whatever you want.

Then, for readability only, you can wrap it in some LINQ extensions to become:

var sasCustomers = ImportCustomersFromCRM(url)
               .SwapFirstAndLastName()
               .ConvertToSasCustomers()
               .AddAdditionalMetadata();

And like any good pipeline, it reads from right to left :)

07 Jan 2008
08:55 AM

Jon Skeet

Is your example with customers meant to be a single pipeline, or two? There seems to be a disconnect at item 4. If there isn't, I don't quite understand it - but it raises an interesting issue anyway.

One issue with the "pull" model of LINQ is it assumes there's basically one consumer. If you want to split a pipeline, it's relatively hard to do without threading. Just as a sort of plug, and because you might find it interesting, have a look at my blog entry about "push" LINQ:

http://msmvps.com/blogs/jon.skeet/archive/2008/01/04/quot-push-quot-linq-revisited-next-attempt-at-an-explanation.aspx

(Or at the moment, just the top entry at http://msmvps.com/jon.skeet)

It supports splitting at any point very naturally, although I haven't used that for anything other than simple cases.

I'd expect all the steps you've mentioned to be feasible with LINQ though - as I say in the fluent pipelines thread, you can still use methods as delegate targets too...

Jon

13 Jan 2008
06:48 AM

wow

wow, I just stumbled upon this and will bookmark this.. my understanding of LINQ to Objects is 10x. Thank you Thank you!

Oren Eini

CEO of RavenDB