Dear Pentaho friends,
Ever since a number of projects joined forces under the Pentaho umbrella (over 7 years ago), we have been looking for ways to create more synergy across this complete software stack. That is why today I’m exceptionally happy to announce not just version 5.0 of Pentaho Data Integration, but also a new way to integrate Data Integration, Reporting, Analysis, Dashboarding and Data Mining through one single interface called Data Blending, available in Pentaho Business Analytics 5.0 (Commercial Edition).
Data Blending allows a data integration user to create a transformation capable of delivering data directly to our other Pentaho Business Analytics tools (and even non-Pentaho tools). Traditionally, data is delivered to these tools through a relational database. However, there are cases where that can be inconvenient, for example when the volume of data is just too high or when you can’t wait until the database tables are updated. This leads, for example, to a new kind of big data architecture with many moving parts.
From what we can see in use at major deployments with our customers, mixing Big Data, NoSQL and classical RDBMS technologies is more the rule than the exception.
So, how did we solve this puzzle?
The main problem we faced early on was that the default language used under the covers, in just about any user-facing business intelligence tool, is SQL. At first glance it seems that the worlds of data integration and SQL are not compatible. In DI we read from a multitude of data sources, such as databases, spreadsheets, NoSQL and Big Data sources, XML and JSON files, web services and much more. However, SQL is a mini-ETL environment in its own right, as it selects, filters, counts and aggregates data. So we figured that it would be easiest to translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration is doing what it does best, directed not by manually designed transformations but by SQL. This is at the heart of the Pentaho Data Blending solution.
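To make the "SQL is a mini-ETL environment" idea concrete, here is a minimal sketch in plain Python. It is not how Pentaho Data Integration implements the translation; the function names (filter_rows, count_by, run_query) and the sample data are made up for illustration. It only shows that a query such as SELECT country, COUNT(*) FROM sales WHERE amount > 100 GROUP BY country can be read as a small chain of transformation steps: a row source, a filter step and an aggregation step.

```python
from collections import Counter

# Hypothetical sample rows standing in for the output of any data source
# (database, spreadsheet, NoSQL store, web service, ...).
SALES = [
    {"country": "BE", "amount": 250},
    {"country": "US", "amount": 80},
    {"country": "BE", "amount": 120},
    {"country": "US", "amount": 300},
]


def filter_rows(rows, predicate):
    """Filter step: the equivalent of a SQL WHERE clause."""
    return (row for row in rows if predicate(row))


def count_by(rows, key):
    """Aggregation step: the equivalent of GROUP BY <key> with COUNT(*)."""
    counts = Counter(row[key] for row in rows)
    return [{key: value, "count": n} for value, n in sorted(counts.items())]


def run_query(rows):
    """Chain the steps the way the SQL statement implies:
    SELECT country, COUNT(*) FROM sales WHERE amount > 100 GROUP BY country
    """
    filtered = filter_rows(rows, lambda row: row["amount"] > 100)
    return count_by(filtered, "country")


result = run_query(SALES)
print(result)
```

Each clause of the query maps onto one step, which is why a SQL statement can drive a transformation instead of a hand-designed one.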
To ensure that the “automatic” part of the data chain doesn’t become an impossible-to-figure-out “black box”, we once more made good use of existing PDI technologies. We’re logging all executed queries on the Data Integration server (or Carte server) so you have a full view of all the work being done.
In addition, the statistics from the queries can be logged and viewed in the operations data mart, giving you insight into which data is queried and how often.
We sincerely hope that you like these new powerful options for Pentaho Business Analytics 5.0!
Also check out the exciting new capabilities we deliver today on top of MongoDB!
Chief of Data Integration at Pentaho, Kettle project founder