Rhino ETL: First Code Drop

First, let me make it clear: it is not ready yet.

What we have:

  • 99% complete on the syntax
  • Overall architecture should be stable
  • The engine works, but I think of it as a spike; it is likely to change significantly.

What remains to be done:

  • Parallelising the work inside a pipeline
  • Better error messages
  • More logging
  • More tests
  • Transforms over sets of rows

Here are a few words about how it works. The DSL is composed of connection, source, destination and transform, which map one to one to the respective Connection, DataSource, DataDestination and Transform classes. In some cases we just fill the data in (Connection), in some cases we pass a generator (think of it as a delegate) to the instance that we create (DataSource, DataDestination), and sometimes we subclass the class to add the new behavior (Transform).
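
To make the three binding styles concrete, here is a minimal sketch in Python rather than in the actual DSL. It is not Rhino ETL's API: the class names echo the ones mentioned above, but every member (connection_string, generator, rows, apply) and the UpperCaseName transform are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


@dataclass
class Connection:
    """Style 1: the DSL only fills in plain data (hypothetical fields)."""
    name: str
    connection_string: str


class DataSource:
    """Style 2: the DSL hands the instance a callable that produces rows."""

    def __init__(self, name: str, generator: Callable[[], Iterable[Row]]):
        self.name = name
        self._generator = generator

    def rows(self) -> Iterator[Row]:
        # The callable is only invoked when rows are actually requested.
        return iter(self._generator())


class Transform:
    """Style 3: the DSL produces a subclass that overrides the behavior."""

    def apply(self, row: Row) -> Row:
        return row  # the base class passes rows through unchanged


class UpperCaseName(Transform):
    """Hypothetical transform that a DSL block might compile down to."""

    def apply(self, row: Row) -> Row:
        return {**row, "name": str(row["name"]).upper()}


conn = Connection("Northwind", "Server=localhost;Database=Northwind")
source = DataSource("Users", lambda: [{"name": "ayende"}, {"name": "tobin"}])
transform = UpperCaseName()
result = [transform.apply(row) for row in source.rows()]
```

The point of the sketch is only the difference in how each piece gets its behavior: data assignment, an injected callable, or an overridden method.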

A pipeline is a central concept. It is composed of a set of pipeline associations, which connect the output of one component to the input of another.
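
One way to picture this is a pipeline that stores nothing but (from, to) pairs and follows them at run time. The Python sketch below is an assumption about the general shape, not the actual engine: Pipeline.connect, Pipeline.run and the three sample components are hypothetical, and it only handles a simple linear chain with no branching or cycles.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Component = Callable[[Iterable[Row]], Iterable[Row]]


class Pipeline:
    """Hypothetical pipeline: a named list of (from, to) associations."""

    def __init__(self, name: str):
        self.name = name
        self.associations: List[Tuple[str, str]] = []

    def connect(self, from_name: str, to_name: str) -> "Pipeline":
        # Record that the output of from_name feeds the input of to_name.
        self.associations.append((from_name, to_name))
        return self

    def run(self, components: Dict[str, Component], start: str) -> List[Row]:
        # Follow the associations from the start component to the end of the chain.
        next_of = dict(self.associations)
        current = start
        data = components[current]([])
        while current in next_of:
            current = next_of[current]
            data = components[current](data)
        return list(data)  # consuming the result drives the whole chain


collected: List[Row] = []


def source(_rows: Iterable[Row]) -> Iterable[Row]:
    return [{"id": 1}, {"id": 2}]


def double(rows: Iterable[Row]) -> Iterable[Row]:
    return ({"id": row["id"] * 2} for row in rows)


def destination(rows: Iterable[Row]) -> Iterable[Row]:
    for row in rows:
        collected.append(row)
        yield row


pipeline = Pipeline("CopyUsers").connect("source", "double").connect("double", "destination")
output = pipeline.run(
    {"source": source, "double": double, "destination": destination},
    start="source",
)
```

Keeping the wiring as data, separate from the components themselves, is what would let the same source or transform be reused in more than one pipeline.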

Places to start looking at:

  • EtlContextBuilder - compiles the DSL and spits out an instance of:
  • EtlConfigurationContext - the result of the DSL, which can be run using:
  • ExecutionPackage - the result of building the EtlConfigurationContext, this one manages the running of all the pipelines.

There is an extensive set of tests (mostly for the syntax), and a couple of integration tests. As I said, anything that happens as a result of a call to ExecutionPackage.Execute() is suspect and will likely change. I may have been somewhat delegate happy in the execution: it is an anonymous delegate that calls an anonymous delegate, and so on, which is probably too complex for what we need here.

I am putting the source out for review. While it can probably handle most simple things, it is very bare bones and subject to change.

You can get it here: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino-ETL

But it needs references from the root, so it would be easiest to just do:

svn checkout https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

Posted on Tuesday, July 24, 2007 1:24 AM

Feedback



# re: Rhino ETL: First Code Drop 7/24/2007 7:13 PM Tobin Harris

Cool beans! I'm gonna download and have a play once I'm back on my ETL project. I look forward to it :-)



# re: Rhino ETL: First Code Drop 7/24/2007 11:04 PM Dave Arkley

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/Rhino.ETL

URL does not exist

When will it be up?

Regards
Dave



# re: Rhino ETL: First Code Drop 7/24/2007 11:07 PM Al Gonzalez

The link doesn't work for me but the following one does:

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/rhino-etl/



# re: Rhino ETL: First Code Drop 7/25/2007 1:43 AM Ayende Rahien

Dave,
I have updated the link, please try again.



# re: Rhino ETL: First Code Drop 7/25/2007 12:38 PM Thibaut Barrère

Great to hear - I've been using SSIS a lot and have been looking for open-source alternatives, more easily configurable, customizable etc. I'll definitely have a look. A dotnet open-source alternative is definitely interesting.

BTW, have you heard of ActiveWarehouse-ETL (http://activewarehouse.rubyforge.org/etl/)? It's another open-source ETL package (in Ruby) which I'm using in production.

regards,

Thibaut



# re: Rhino ETL: First Code Drop 7/25/2007 12:45 PM Ayende Rahien

I have heard of it, looks interesting, but it follows a fairly different path than what I have in mind.



# re: Rhino ETL: First Code Drop 7/25/2007 12:46 PM Thibaut Barrère

Ok - just read the older posts, so please ignore my AW-ETL comment :)



# re: Rhino ETL: First Code Drop 7/27/2007 1:57 AM Dave Arkley

Confirm the link is working fine now. Thanks Ayende.

Regards
Dave



# re: Rhino ETL: First Code Drop 8/1/2007 1:05 PM Tobin Harris

Hi Ayende

Am I being a muppet? I couldn't get it to build!

I get missing PipelineStage and BlockExpression files. Also, the AssemblyInfo.cs files were missing.

Looking forward to trying this out!



# re: Rhino ETL: First Code Drop 8/1/2007 2:49 PM Ayende Rahien

No muppet, the code is broken right now and needs to be fixed (working on it).
The AssemblyInfo.cs files are generated when building from the command line.



# re: Rhino ETL: First Code Drop 8/1/2007 3:26 PM Tobin Harris

Cool.

I have just shed a small tear as I had to close the Rhino.ETL solution and instead start a new SSIS project.

I've got 30 data files to import into a table, all are different formats and need various transformations. Do let me know if you get anything sorted today, otherwise I'll look forward to checking out the fixes another time :-)

