Rolling back transactions

Pentaho Data Integration (Kettle) never was a real transactional database engine, and never pretended to be one. It was designed to handle large data volumes and slam a commit in between every couple of thousand rows to prevent the databases from choking on the logging problem.

However, more and more people are using Kettle transformations in a transactional way. They want to have the option to roll back any change that happened to a database during the execution of a transformation in case anything goes wrong.

Well, we have worked on that in the past, but never quite got it right… until today actually. As part of bug report 724, I lifted the decision to commit or roll back all databases up to the transformation level.

Take for example a look at this transformation:

What happens is that the first 2 steps will always finish execution before a single row hits the Abort step. That means that all rows from the “CSV file input” step will be inserted into the database table before the transformation fails. In the past, even if you enabled “Unique connections”, those rows would have remained in the table.
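To make the difference concrete, here is a minimal sketch of the two behaviours. It is not Kettle code: it uses Python's sqlite3 module with an in-memory database, and the names `load_rows`, `demo`, `AbortError` and the `target` table are hypothetical, chosen only for illustration. With a commit size, the rows committed before the failure stay in the table; with a single transaction, the rollback removes everything.

```python
import sqlite3


class AbortError(Exception):
    """Hypothetical stand-in for the Abort step failing the transformation."""


def load_rows(conn, rows, commit_size=None):
    """Insert all rows, then fail, mimicking an abort after the load finished.

    commit_size=None keeps everything in one transaction (all or nothing).
    commit_size=N commits after every N rows (the classic bulk-load style).
    """
    cur = conn.cursor()
    try:
        for i, row in enumerate(rows, start=1):
            cur.execute("INSERT INTO target (id, name) VALUES (?, ?)", row)
            if commit_size and i % commit_size == 0:
                conn.commit()  # rows up to here can no longer be rolled back
        # Every row has been inserted by the time the failure happens.
        raise AbortError("abort reached after all rows were inserted")
    except AbortError:
        conn.rollback()  # only undoes work since the last commit
        return False


def demo(commit_size):
    """Return how many rows are left in the table after the failed load."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    conn.commit()
    rows = [(i, f"row-{i}") for i in range(1, 11)]
    load_rows(conn, rows, commit_size=commit_size)
    remaining = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return remaining


if __name__ == "__main__":
    print(demo(commit_size=4))     # 8: two batches of 4 were already committed
    print(demo(commit_size=None))  # 0: the single transaction was rolled back
```

The second call is the behaviour people expect from a transactional transformation: the decision to commit or roll back is taken once, at the end, for the whole run.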

To test it yourself, build from revision 6587 in trunk or download a nightly build tomorrow.

With a little luck (further tests and then more tests) we can back-port this fix to version 3.0.2 this week, ready for the 3.0.2GA release at the end of next week.

I’m hoping to extend this same principle to jobs as well in the (more distant) future.

Until next time,
Matt

Step performance graphs

One of the things I’ve been working on lately in Kettle / Pentaho Data Integration is the transparency of the performance monitoring.

We don’t just need an API to get the step performance data out, but we also need to visualize this data in a simple way, something like this:
performance graph

Graph with moving average
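For readers unfamiliar with the idea, a moving average simply smooths a series of samples so that the trend is easier to see than in the raw, spiky numbers. The sketch below assumes the samples are periodic rows-per-second readings for one step. The function name and the sample values are hypothetical, and this is not necessarily how Kettle computes its graph.

```python
from collections import deque


def moving_average(samples, window):
    """Return the trailing moving average of samples.

    Each output value is the mean of the last `window` samples seen so far
    (fewer at the very start, before the window has filled up).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds only the samples inside the window
    total = 0.0
    averages = []
    for value in samples:
        if len(recent) == window:
            total -= recent[0]  # this sample is about to fall out of the window
        recent.append(value)
        total += value
        averages.append(total / len(recent))
    return averages


if __name__ == "__main__":
    # Hypothetical rows/second readings for a single step.
    rows_per_second = [10, 20, 30, 40]
    print(moving_average(rows_per_second, window=2))  # [10.0, 15.0, 25.0, 35.0]
```

A larger window gives a smoother line at the cost of reacting more slowly to real changes in throughput.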

The next steps will be to also allow this data to be spooled off to a database somewhere and to be accessed remotely using Carte.

Until next time,

Matt

An apple with a latte

One of the things that surprised me the most in 2007 was how much I disliked the Apple mini I bought, at least from a software development viewpoint. The reasons ranged from a lousy keyboard (no | symbol, for example) to a Java bug that caused a solid freeze of the Java VM.

I’m happy to say that there are now at least 2 possible solutions for that last problem.

The first is presented in the form of an update from Apple itself. This update appeared right before the holidays. As I suspected, the problem was indeed in the “concurrent” classes, as indicated in this bug fix.

The alternative is to give the SoyLatte project a try. This is a port of BSD Java to OS X. It aims to provide an OpenJDK port. Similar to what IcedTea does on Linux, it combines the code from Sun Microsystems with parts from Classpath.

It looks very much like open source (SoyLatte/OpenJDK) beat closed source (Apple Java) in the race to deliver a 1.6 JDK too. It is going to be interesting to watch this process unfold in 2008.

Until next time,

Matt