Writing and maintaining any software is a constant struggle between what is known as Feature Creep and the requests for new features from our users.

At a certain point you have to weigh the size of the audience for a new feature against the possibility of a usable workaround. For certain step types in Pentaho Data Integration the point of saturation certainly seems to have been reached. Steps like “Text File Input” already contain a gazillion features, checkboxes, configuration fields, etc. I sometimes wonder if it takes a brave person to use these dialogs.

So instead of just immediately creating this feature request I created another Weekly Tip (all tips are here) to show how you can evaluate certain conditions in a job.

Tell me what you think of this approach. I would love to hear about how other folks manage these issues.

Image from the Kettle tip

Wireless relief after a great week

As I was packing this morning in my hotel in Orlando, I was fearing the worst. The last couple of days, it has been all over the news: again, flying across the Atlantic Ocean became more difficult because of tightened security measures in all airports.

So, being a good decent citizen, I went to the airport 3 hours early to cover for the extra wait… only to breeze through security in less than 5 minutes. Well, I guess it was different a couple of days ago, but today there was no big queue to deal with. Then again, like Mondrian maintainer Julian Hyde said yesterday: it probably will be busy when a couple of 747s are ready to board…

Hey, here is the good part: FREE WLAN at Orlando International Airport! I’m writing this blog entry at the airport. Why shouldn’t I? I have more than 2 hours to waste! I’m really hoping the other airports will some day get their act together and also provide free Wifi.

We had a great Pentaho developer summit this week in Orlando: it’s great to see the big picture for a change. Among other things, we got status reports on what’s going on with the different tools in Pentaho. When you see the new stuff that’s in the pipeline, you can’t help but be excited about the whole thing. And apparently so are our customers, partners AND investors.

To you, reader, I can add: Pentaho might provide you with a great free toolset now, but wait until early next year when the next generation of tools comes out.

One of the cornerstone tools will of course be a central meta-data repository that will combine enterprise reporting solutions, ad-hoc web reporting, etc. with incredible ease of use. This feature especially will probably cause the Pentaho toolset to get traction in enterprises even faster than it is happening today. (We’re already at around 80,000 downloads a month!!!)

I will most certainly be blogging about meta-data and related stuff in the near future as I’m sort of “hosting” the effort together with Nicholas Goodman and of course Pentaho “Chief Geek” James Dixon (& the rest of the team).

Until next time,

The Wireless mess

Yesterday I spent most of the day traveling from Brussels to Orlando. This time the trip was: Brussels – Chicago – Orlando.

It took me around 18 hours of travel, including one hour on the train and 2 hours of waiting at the Brussels airport (WiFi available, company 1: need to pay and register).

The 8 and a half hour trip on the plane didn’t have any connectivity options either, of course. I hear that SAS is now allowing you to connect to WiFi. To me that means that all the BS that we’ve been hearing about it not being safe to use electronic devices on airplanes is just that: a load of BS.

At Chicago airport they have Wireless LAN as well. They are so proud of it that they have big billboards saying “Wifi zone here”. Of course, you still need to register and pay around US$7.00 for a day of surfing. The time between flights was a couple of hours, but you spend an hour struggling through customs and various passport checks. Then you eat something, which means you would be paying that amount of money for half an hour of surfing: No thanks.

Again, no connectivity options on the domestic flight from Chicago to Orlando. (I have to say that Orlando airport DOES have FREE Wifi!)

A “connected”, “always on-line” world? I don’t think so…

We did find a nice beer last night so in the end it all worked out though 😉

About bugs…

The problem with bugs is that you rarely know that they are there. A couple of days ago we fixed a nasty bug in Kettle that has probably been in the code for more than a year. The problem was only triggered once in a while under special conditions. However, if you were hit by it, our sincere apologies!

The problem with bugs is also that very often they can’t be reproduced. This was certainly the case in our situation. It was thanks to Jens Bleuel (from Pentaho partner Proratio) that we found out about it. He made a very simple transformation that proved something was wrong.

A lot of folks probably must have had a feeling that “something” wasn’t quite going the way they expected it to go. But very often this feeling is then ignored because the next time they start the transformation, everything goes OK.

So again, here is an invitation to put every small detail that you think is not quite right on the bug list. Indeed, if it’s such a small detail, it’s probably fixed very easily. If it’s something worse, at least then we will know about it.

The conclusion:

bugs = bad

bug reports = good

Here is an overview of the current situation: as you can see, before each release we bring down the number of bugs to close to zero 🙂

Number of trackers : open / closed


Being a “non-American”, Independence Day completely caught me by surprise.
Independence is still something that goes to the core of what many people and companies take to heart. It’s always good to keep that in mind. A few days ago I read (and answered) some questions on Nick’s blog concerning the level of independence that you would retain when choosing the Pentaho Open Source BI platform.

Unfortunately, I know all too well where the concerns come from when companies ask these tough questions. Time and time again I have witnessed the failure of commercial companies to respond to critical business needs or problems in the software. On one occasion I have seen a €50,000 piece of software simply “not work”. Mmm, you say, didn’t you check with this fine company if the functionality would be in the software? Yes, but against flat-out lies, there is little defence (besides not paying the invoice, that is 🙂 )

Also, if you have been in the BI line of work for a number of years (like I have), surely you must have heard one of the “failed BI projects” stories too. Projects that drag on for 6-12 months and then have little or nothing to show for it but a nice-but-outdated-by-12-months requirements document.

Companies and the people working for these companies remember these disasters quite well. There is nothing like blowing a million dollars to make a company determined that this kind of thing won’t ever happen again.

One way of making sure that it doesn’t is by turning to open source for your BI needs. A (BI) solution based on open source will give you a better return on your investment in time, servers and software, and it also minimizes the risk of never seeing a return on your investment.

It also makes you less dependent.

Happy 4th of July America!

Take back control

The type of questions that we get on the forums with regards to Pentaho Data Integration (Kettle) has been shifting lately from this type of question:

How do I read data from database type xxx

to this type of question:

I want to read a list of e-mail addresses from a database table, set a variable and send the warehouse log files off to all these people.

That’s quite an evolution that’s been going on. It’s clear that people are starting to find the obvious solutions to the first type of question, so now they get stuck on doing more complex things. I guess that’s to be expected, really. It’s nice that for the most part, I can now say, “yes, with the new 2.3.0 release, that is most certainly possible”.
However, IMHO, often there is something missing on the implementer’s side of the story as well. I guess what I’m saying is that all too often these fine folks have been struggling so long with software limitations that they forget that it’s them that’s in charge, not the data, not the database, them.

So, as a guideline for Kettle development, this is what’s important to me. If it’s blocking a BI implementer, it’s important. It’s as simple as that. That’s because I hate saying that something is not possible.

So the bottom line is: don’t let the data or the database get you down, take back control!

Example of variable use in Mail job entry
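
For readers who can't see the screenshot, here is a rough idea of what variable use in a field amounts to. This is only a minimal Python sketch of `${NAME}`-style substitution, not Kettle's own code: the function name `substitute_variables`, the variable name `MAIL_TO`, the example addresses and the comma separator are all made up for illustration, and leaving unknown variables untouched is simply a choice made in this sketch.

```python
import re

# Matches ${NAME}-style references (illustrative pattern, not Kettle's code).
_VAR_PATTERN = re.compile(r"\$\{([^}]+)\}")


def substitute_variables(text, variables):
    """Replace each ${NAME} in text with variables[NAME].

    References to unknown variables are left as they are (a choice made
    for this sketch).
    """
    def _replace(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _VAR_PATTERN.sub(_replace, text)


# Hypothetical data: in the scenario from the forum question, these
# addresses would come from a database table.
addresses = ["alice@example.com", "bob@example.com"]

# Hypothetical variable name; the comma separator is an assumption.
variables = {"MAIL_TO": ", ".join(addresses)}

# What you would type into a destination field that supports variables.
destination_field = "${MAIL_TO}"

print(substitute_variables(destination_field, variables))
```

The point is that the field itself only holds the reference; the actual list of recipients is decided at run time by whatever set the variable earlier on.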