Book review: Pentaho Reporting 3.5 for Java Developers

Hi Pentaho fans,

These are exciting times for Pentaho for sure.  These are also extremely busy times.  However, that doesn’t mean we can’t look around once in a while.  Today we’ll take a quick look at a new book that arrived on my doorstep a few weeks ago.  It’s titled

Pentaho Reporting 3.5 for Java Developers

I’m very pleased to review this book, as it was written by one of the smartest and, more importantly, one of the nicest people at Pentaho: Will Gorman.  On top of that, he apparently had help from KC (Kurtis Cruzada) and Jem (Matzan), completing the dream team for this book.

And what a great book it turned out to be.  It covers pretty much everything: basic reporting, mobile reporting, calculations and formulas, sub-reports, cross-tabs, charting, all the way down to the Java API.

Obviously, this book has been reviewed many times before by various people and websites.  (Yes, it’s that popular.)  To me, that means I can’t get away with a quick review: I’m going to have to actually read and use the book.  And that’s what we’ll do today for this review.

We’re going to create a report in the form of a PDF.  The data for the report comes from a Kettle transformation.  We’re going to do it with my favorite programming language (Java) and a complete stack of Open Source Software…

I began by creating a new Eclipse project called KettleBook (you can download the source over here).
To make sure I didn’t miss any library dependencies, I used the complete “lib” folder of Pentaho Report Designer 3.5 as my class path (it’s not included in the download).

First, I went to Chapter 10 and started reading the section titled “Building a report using Pentaho Reporting’s API” (page 266), as that seemed to fit the bill.

That section explains, plain and simple, how to create a new master report and how data sources work.  But wait, I don’t want a DefaultTableModel, I want to read from Kettle!  Well, a few page flips later we find ourselves on page 143, reading about the KettleDataFactory.  That got me quite far, as the sample is very descriptive.

So then I created a small transformation to read from a sample customer file using Pentaho Data Integration 3.2.  This is it:

It reads 100 rows of sample customer data, filters out the people from California, Florida and New York state.  That gives us 91 records.  We’re going to read from the RESULT step placeholder.
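
To make the filter condition concrete, here is a minimal plain-Java sketch of what the filtering step in this transformation does. It is not Kettle code: the `Customer` record, the two-letter state codes and the `StateFilter` class are hypothetical stand-ins for the sample file’s fields, and in the actual transformation the condition is configured in the step dialog rather than written in Java.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical record standing in for one row of the sample customer file.
record Customer(String name, String state) {}

class StateFilter {
    // Assumption: states are stored as two-letter codes in the sample data.
    static final Set<String> EXCLUDED = Set.of("CA", "FL", "NY");

    // Keeps only the customers whose state is not excluded,
    // i.e. the rows that would reach the RESULT step.
    static List<Customer> filter(List<Customer> rows) {
        return rows.stream()
                .filter(c -> !EXCLUDED.contains(c.state()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> sample = List.of(
                new Customer("Alice", "CA"),
                new Customer("Bob", "TX"),
                new Customer("Carol", "NY"),
                new Customer("Dave", "OH"));
        // Only Bob (TX) and Dave (OH) survive the filter.
        System.out.println(filter(sample));
    }
}
```

Applied to the 100-row sample file, the same condition is what leaves the 91 records mentioned above.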

The part I needed, on page 147, was this block:

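// Describe the Kettle transformation (file and step to read from) as a data producer
KettleTransFromFileProducer producer = new KettleTransFromFileProducer("Customer data", transFile, stepName, "", "", new String[0], new ParameterMapping[0]);
// Register that producer with a data factory under the query name "default"
KettleDataFactory factory = new KettleDataFactory();
factory.setQuery("default", producer);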

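This block describes the Kettle transformation to the reporting engine as a data producer and registers it with a data factory under the query name “default”.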

Next, following page 269, I put a document header, a document footer and an item band on the report.  Then I put 4 columns on the page and the report definition was done.  This took me all of about 30 minutes.  The nice folks at Pentaho Orlando will have to forgive me: reporting is not my specialty.  Personally, I was quite pleased that it was so easy to do.

So, with the report definition ready, I now wanted to create an actual PDF out of it.  More reading revealed that we needed a PDF output processor (to generate the actual file) and a pageable report processor (to paginate and process the report definition).  This is how it looks in my case:

// The PDF output processor writes the file; the pageable report processor paginates the report
FileOutputStream fos = new FileOutputStream("files/output.pdf");
DefaultConfiguration configuration = new DefaultConfiguration();
PdfOutputProcessor processor = new PdfOutputProcessor(configuration, fos);
PageableReportProcessor reportProcessor = new PageableReportProcessor(report, processor);
reportProcessor.processReport();

5 lines of code to generate a PDF! Suffice it to say I was very happy.

In total, I spent a little over an hour producing this document:

It’s quite simple: if it weren’t for the book, I would have had a really hard time figuring out where to begin.  I probably would have had to talk to Thomas Morgner, the brains behind Pentaho Reporting.  Nice fellow as he is, communicating with him is not for the faint-hearted.  (Fortunately, he recently moved to Ireland, so things will get better soon.)

All joking aside, if you are planning to create reports using the Java API, do yourself a favor and buy this book right away.  Even if you’re not going to use the API, Pentaho Reporting principles and concepts are explained in great detail.

Many thanks to Packt Publishing for sending me the book to review, and congratulations to Will Gorman and the reviewers for an excellent job.  Congratulations also to Thomas and his community for making Pentaho Reporting 3.5 a smash hit.

Until next time,

P.S. I’ll obviously be covering more of this Java API sample at the upcoming Devoxx conference in Antwerp.

My new netbook…

Dear Linux fans,

Last weekend I saw an ad in a Carrefour superstore leaflet for a netbook that was, I guess, just too good to refuse.

Unlike other netbooks, this one was priced really low: €199.00 including VAT, which means it costs my company €164.46 (about USD 200).  For me, that’s the price point where a netbook makes sense, not the €400-500 you see all over the place.
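
The €164.46 figure is simply the shelf price with VAT taken out.  Here is a small Java sketch of that calculation, assuming a 21% VAT rate (the standard rate in Belgium, and consistent with the numbers, since 199.00 / 1.21 ≈ 164.46).  `BigDecimal` is used so the result is rounded to whole cents instead of carrying floating-point error; the `VatCalc` class name is just for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class VatCalc {
    // Converts a VAT-inclusive (gross) price into the net price:
    // net = gross / (1 + rate), rounded half-up to 2 decimals (cents).
    static BigDecimal net(BigDecimal gross, BigDecimal rate) {
        return gross.divide(BigDecimal.ONE.add(rate), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Assumption: 21% VAT, the standard Belgian rate.
        BigDecimal net = net(new BigDecimal("199.00"), new BigDecimal("0.21"));
        System.out.println(net); // prints 164.46
    }
}
```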

Now, for that low price, you get the following machine:

  • 1.6GHz VIA C7-M CPU
  • 512MB RAM (DDR2 667, shared with video, 384MB available)
  • 120GB hard disk (2.5″, 7200rpm)
  • 1024×600 LCD screen (pretty good quality actually)
  • Webcam
  • Wi-Fi 802.11b/g
  • 2× USB 2.0
  • VGA port
  • Multi-format card reader (SD, SDHC, MMC)
  • Microphone
  • Sound in/out
  • Mandriva Linux 2009.1

It was very interesting to see that “Windows 2007 Home Premium” was priced at exactly the same price.  Talk about a total waste of money on the Microsoft side.

OK, back to the netbook.  The limited memory is not a problem: I already ordered a 2GB DDR2 RAM module for the machine at €39.

UPDATE 10/27: the RAM arrived, was installed in 5 minutes and everything works fine now.  With 1.9GB available, the machine is a lot snappier too.

Performance is obviously not stellar, but I didn’t expect it to be.  I paid less for it than for my current cell phone.  However, it plays full-screen AVI files without a glitch.

The only real problem the box has is that it comes with … Mandriva Linux.  Maybe I’m spoiled by years of Ubuntu use, but this distribution really sucks.  Can I please just install some software and customize the UI a bit?  Please?  I can’t recall the last time I couldn’t install a piece of software on Ubuntu because a package couldn’t be downloaded.  WTF?  And charging €28 just to get a couple of codecs to play audio/video?  I can legally use these codecs in Europe without a problem.

Don’t get me wrong, all hardware is supported and works fine, including audio, the webcam, Skype, Flash, etc.

Anyway, I tried to put Ubuntu Netbook Remix 9.04 on it by booting from a USB stick.  Unfortunately, either the image or the stick has an issue, since the installer freezes while booting.  The live system boots but has a nasty video problem, so I’m going to retry later next week.  Heck, maybe it’s better to just wait until Kubuntu 9.10 Netbook Remix comes out next week.

Feel free to leave advice on which distro to pick and how best to handle the install.  Also feel free to leave tips on how to explain to the kids that this is not a toy.

Thanks in advance!



Presenting Pentaho at Devoxx

Hello Pentaho fans,

The organisers of the Devoxx conference have been so kind as to invite Pentaho to do a full university session (2×75 minutes).  Devoxx (formerly known as Javapolis) takes place in the Metropolis cinema complex in Antwerp from November 16th to 20th.  The session I’ll be doing is titled:

Java business intelligence with Pentaho

This will take place on November 17th, from 13:30 to 16:30.

Because of the length of the session, it covers the full Pentaho stack:

  • Introduction to Pentaho: project & company
  • The Pentaho BI Platform overview
  • Pentaho Analysis (OLAP) a.k.a. Mondrian
  • Pentaho Reporting a.k.a. JFreeReport
  • Pentaho Data Integration (ETL) a.k.a. Kettle
  • Pentaho Data Mining a.k.a. Weka
  • Pentaho Metadata
  • Recap, useful links, etc.
  • Q&A

Java code samples and demos will be given for each topic.

See you all in Antwerp!


Kettle 4 Logging architecture

Just a quick update on the Kettle 4.0 logging architecture.

Glad it’s finally arrived…

In short:

  • Log separation between objects (transformations, jobs, …) on the same server
  • Central logging store infrastructure
  • Logging data lineage so we know where each log row comes from
  • Incremental log updates from the central log store
  • Color coding
  • Pause/continue for log updates
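
The ideas in the list above (a central store, per-object log channels, lineage and incremental updates) can be illustrated with a small Java sketch.  This is a hypothetical illustration, not Kettle 4’s actual logging classes: `CentralLogBuffer`, `LogLine`, `log()` and `linesSince()` are invented names.  Each line carries the ID of the channel that wrote it plus its parent channel’s ID, which gives log separation and (one level of) lineage, and a sequence number lets a client fetch only the lines it hasn’t seen yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical central log store; names are illustrative, not Kettle's API.
class CentralLogBuffer {
    // One log line: sequence number, writing channel, its parent channel, message.
    record LogLine(int nr, String channelId, String parentChannelId, String text) {}

    private final List<LogLine> lines = new ArrayList<>();

    // Appends a line to the shared store and returns its sequence number.
    synchronized int log(String channelId, String parentChannelId, String text) {
        int nr = lines.size();
        lines.add(new LogLine(nr, channelId, parentChannelId, text));
        return nr;
    }

    // Incremental read: lines newer than lastSeen that were written by the
    // channel itself or by one of its direct children (one lineage level).
    synchronized List<LogLine> linesSince(String channelId, int lastSeen) {
        return lines.stream()
                .filter(l -> l.nr() > lastSeen)
                .filter(l -> channelId.equals(l.channelId())
                        || channelId.equals(l.parentChannelId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CentralLogBuffer store = new CentralLogBuffer();
        store.log("trans-A", null, "Transformation A started");
        store.log("step-A1", "trans-A", "Step A1 read 100 rows");
        store.log("trans-B", null, "Transformation B started");
        // First poll for trans-A: its own line plus its step's line, not trans-B's.
        List<LogLine> first = store.linesSince("trans-A", -1);
        int last = first.get(first.size() - 1).nr();
        store.log("trans-A", null, "Transformation A finished");
        // Second poll returns only the single new trans-A line.
        System.out.println(store.linesSince("trans-A", last).size());
    }
}
```

Polling with the last sequence number seen is what keeps a log window cheap to refresh; pausing updates would simply mean not polling for a while.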

Until next time,

P.S. Sorry about the audio hiccups.  They come from a transcoding issue somewhere; I don’t have time to track it down and fix it, and it’s not that important.