Kettle 3.2.0-M1 release

Dear PDI fans,

Today we’re releasing another version of the Pentaho Data Integration Community Edition (a.k.a. Kettle): version 3.2.0-Milestone 1.

This is built on the trunk source code in subversion, revision 10101.

A lot has changed, as can be seen from the large number of new steps in the “Experimental” step category, but we also added new things like:

  • Visualization improvements: hop color scheme augmented with mini-icons over hops, tooltips (more intuitive)
  • New job entries
  • Imported Formula step using libformula
  • Imported Reservoir Sampling step
  • Improvements in existing steps (Table Output, etc)
  • Huge pile of bug fixes
  • Performance enhancements
  • Dynamic clustering (headless-XML only)

The release notes are here.

WARNING: Although this is a fairly stable build, we do not recommend that you use it to run anything in production, nor do we provide support for it. We do recommend that you file bug reports if you find anything wrong with 3.2.0-M1.

Until next time,


Gartner DI MQ

Dear Data Integration fans,

A few weeks ago, Yves de Montcheuil from Talend took a shot across the bow of Gartner for not including Talend in their Magic Quadrant (MQ) for data integration. After that post, Andreas Bitterer from Gartner (rightfully) felt personally under assault and decided to set the record straight.

I think the discussion itself is very interesting, but it misses a very important point:

The Magic Quadrant contains companies, not trends, communities, people or software!

Think about it for a second.  In the early days of JBoss there were complaints from Marc Fleury about the fact that only a small percentage of the “JBoss the software” users paid anything to “JBoss the company”.  Numbers that floated around back then were 0.01% or 0.1%, can’t remember exactly.

Those numbers make sense; I’ve heard similar figures from other commercial open source companies. Anything in the range of 0.01% to 1% is possible.

Let’s be “optimistic” here and claim that a company like Pentaho converts 1% of all users into customers (trust me, that figure would be really great given the millions of users out there :-)). That would mean that we’re disturbing the market of our competitors by 100 times our turnover. So if Pentaho makes a dollar in turnover, we’re disturbing the closed source vendors for 100 dollars.
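To make that back-of-the-envelope arithmetic concrete, here is a minimal sketch. The function name `displacedRevenue` is made up for this illustration, and it rests on a big assumption: that every non-paying user would otherwise have paid a closed source vendor roughly what a paying customer pays us.

```javascript
// Hypothetical helper (not part of any Pentaho tool): estimates how much
// competitor revenue is disturbed for a given turnover, ASSUMING each
// non-paying user would otherwise have been a paying customer elsewhere.
function displacedRevenue(turnover, conversionRate) {
  if (!(conversionRate > 0 && conversionRate <= 1)) {
    throw new RangeError("conversionRate must be in the range (0, 1]");
  }
  // If only a fraction of users pay, the total user base is worth
  // turnover / conversionRate at the same price per user.
  return turnover / conversionRate;
}

// The conversion rates mentioned in this post: 0.01%, 0.1% and 1%.
const rates = [0.0001, 0.001, 0.01];
for (const rate of rates) {
  console.log(rate, Math.round(displacedRevenue(1, rate)));
}
```

The lower the conversion rate, the bigger the multiplier, which is why even the “optimistic” 1% already gives a factor of 100.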

Pentaho and, yes, Talend too see that they are a serious disturbance to the market dominance of the traditional DI vendors. And that is why Yves feels a bit mistreated by Gartner. However, since companies like Pentaho and Talend use a disruptive business model, it is only normal that the Gartner MQ itself is also disrupted by our models. You simply can’t be part of the system if you want to disrupt it, I guess. (*)

All that being said, it’s only a matter of time before something has got to give: open source or the Gartner DI MQ.  Yves, Andreas, let it be noted I’m betting on the former to come out of this as a winner.

Until next time,


(*) This also partly explains why Kettle and TOS are not really competitors: we’re using the same business model and are not disrupting each other.  We offer 2 completely different choices to our users.

Hudson Continuous Integration

Hi Kettle fans,

Lately we’ve been having a lot of fun and profit from our brand new Hudson continuous build server.

Hudson finally gives us a nice interface to everything build related.  Most if not all Pentaho projects are integrated by now, including Pentaho Data Integration / Kettle over here:

It gives you an overview of the latest commits and the build results. Hudson sends us mails in case we “break the build” as they call it. It also executes the unit test cases. It’s been taking a while to make all our test cases locale and time-zone independent, but I think we’re almost there now.

We might no longer have nightly builds, but for our developers we now have a much better tool indeed.  A new build 20 minutes after your commit with all the test results nicely indexed, whoot!

Of course, the downside is that if you screw up, everybody can see it. But I guess it’s important that someone sees the problem, rather than no one at all. This is open source software after all! We have nothing to hide!

Until next time,

E4X : XML tweaking

One of the lesser known features of Pentaho Data Integration 3.1 was the inclusion of E4X, a.k.a. ECMAScript for XML. Up until recently I didn’t really have time to look more closely at the details behind it. Today someone asked how to set an attribute in an XML string using JavaScript. Well, this is how you do it…

Suppose we have the following piece of XML in a String field called “input”:

<foo>bar</foo>

Now we want to set the attribute id=5 on the <foo> element:

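// Parse the String field into an E4X XML object
var foo = new XML(input);

// Set (or overwrite) the "id" attribute on the root element
foo.@id = 5;

// Serialize the XML object back into a String
var result = foo.toXMLString();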

As you can see from the syntax, you need to construct a special XML object before you can work with it like we did.

By the way, the result is :

<foo id="5">bar</foo>

Our friends over at Mozilla have compiled an extensive set of explanations and examples on this page. Great job Mozilla!

Until next time,