Reading from MongoDB

How to read data from MongoDB using PDI 4.2

Hi Folks,

Now that we’re blogging again I thought I might as well continue to do so.

Today we’re reading data from MongoDB with Pentaho Data Integration.  We haven’t had a lot of requests for MongoDB support so there is no step to read from it yet.  However, it is surprisingly simple to do with the “User Defined Java Class” step.

For the following sample to work you need to be on a recent 4.2.0-M1 build.  Get it from here.

Then download mongo-2.4.jar and put it in the libext/ folder of your PDI/Kettle distribution.

Then you can read from a collection with the following “User Defined Java Class” code:

import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;

private Mongo m;
private DB db;
private DBCollection coll;

private int outputRowSize = 0;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
	if (first) {
		first = false;
		outputRowSize = data.outputRowMeta.size();
	}

	// Query the whole collection.
	DBCursor cur = coll.find();

	while (cur.hasNext() && !isStopped()) {
		// Each document becomes one row holding its JSON representation.
		String json = cur.next().toString();
		Object[] row = createOutputRow(new Object[0], outputRowSize);
		int index = 0;
		row[index++] = json;

		// putRow will send the row on to the default output hop.
		//
		putRow(data.outputRowMeta, row);
	}

	setOutputDone();

	// All rows were produced in this single call, so signal that we are done.
	return false;
}

public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface)
{
	try {
		// Change host, port, database and collection to suit your setup.
		m = new Mongo("127.0.0.1", 27017);
		db = m.getDB("test");
		coll = db.getCollection("testCollection");

		return parent.initImpl(stepMetaInterface, stepDataInterface);
	} catch(Exception e) {
		logError("Error connecting to MongoDB: ", e);
		return false;
	}
}

You can simply paste this code into a new UDJC step dialog. Change the parts in the init() method to serve your needs. This code reads all the data from a collection in a Mongo database.  The output of this step is a set of rows, each containing one JSON string. So make sure to specify one JSON String field as output of your step.  These JSON structures can be parsed with the new “JSON Input” step and then you can do whatever you want with them.
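If you want to see the row-building part of processRow() in isolation, here is a minimal stand-alone sketch.  It is not the UDJC code itself: a plain java.util.Iterator stands in for the Mongo DBCursor, a List stands in for putRow(), and the class and method names (CursorRowsSketch, toRows) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-alone sketch: Iterator<String> plays the role of the
// Mongo DBCursor, and the returned List collects what the step would putRow().
class CursorRowsSketch {

    // Builds one row per document; the JSON string goes into the first field.
    // rowSize mirrors outputRowSize in the step and must be at least 1.
    static List<Object[]> toRows(Iterator<String> cursor, int rowSize) {
        if (rowSize < 1) {
            throw new IllegalArgumentException("rowSize must be at least 1");
        }
        List<Object[]> rows = new ArrayList<Object[]>();
        while (cursor.hasNext()) {
            Object[] row = new Object[rowSize];
            row[0] = cursor.next();
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two made-up documents, already rendered as JSON strings.
        List<String> docs = Arrays.asList("{ \"name\" : \"a\" }", "{ \"name\" : \"b\" }");
        for (Object[] row : toRows(docs.iterator(), 1)) {
            System.out.println(row[0]);
        }
    }
}
```

The point is simply that every document ends up as exactly one row with the JSON string in its first field, which is why the step needs exactly one String output field.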

Please let us know what you think of this and whether or not you would like to see support for writing to MongoDB and/or dedicated steps for it.  I’m sorry to say I have no idea of the popularity of these new NoSQL databases.

Until next time,

Matt

UPDATE: The functionality described in this UDJC code is available in a new “MongoDB Input” step in 4.2.0-M1 or later.

UPDATE2: We also added authentication for MongoDB in PDI-6137

P.S. To install and run MongoDB on your Ubuntu 10.10 machine, do this:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo apt-get update
sudo apt-get install mongodb

My Android tablet

I’ve always been a fan of gadgets and so when it came time to buy my dad a replacement for his 6-year-old Palm Pilot that recently broke down, my sisters and I bought him an Apple iPad.  Just to make this clear and to get this out of the way, it was €600 well spent since he loves this device a lot.  Mostly he watches television on it and reads his newspaper.

I had the iPad for about a week before we wrapped it up, and in that short time the device both impressed me with its user friendliness and frustrated me.  The frustration came from the fact that the device is as closed down as you can possibly close down a computer.  My biggest gripe was of course iTunes.  For my dad it must have felt quite natural to “synchronize” a dumb terminal hand-held device like a Palm or an iPad.  To me, it felt really awkward coming from the Android platform, where mail and calendar are held in the cloud, where you can install applications from a web interface, and where you can hook devices up to your computer to transfer files.

So naturally I wanted to buy an Android tablet for myself.  Having played with the 10-inch iPad I’m convinced that this is a really nice form factor for a tablet so I wanted one of those.  However, the problem there is that you basically have 2 main variations of Android tablets at the moment: the really expensive or the really cheap.  First let’s take a look at the really expensive. I don’t know about you but to me forking over more than €600 for what is basically a gadget is too much.  In that respect I think that devices like the Galaxy Tab, the upcoming Xoom and many more are simply missing the mark with prices beyond €800.  I already have a laptop and a smartphone.  This gadget will be used to browse the web, play games, read books… on my couch or in bed.

The dirt cheap category is filled with all sorts of equipment that looks nice but either has old versions of Android, a lousy single-touch screen, not enough memory (256MB), a battery life of half an hour or a dog-slow 5-year-old processor.  Sure, it only costs €100 but you know you’re never going to use it for more than a bit of testing.

Unfortunately, there are very few (Android or iOS) tablets to be found in the space in between, the price range between €250 and €450 where netbooks did so well.  It almost looks like all manufacturers want to make a quick buck from this new tablet hype.

At the end of my search I heard about the Point of View Mobii Tegra 2 tablet that came out around the end of last year.  I bought it a month ago.  Priced at a reasonable €350, it comes with the latest NVIDIA Tegra 250 mobile chipset which is (it has to be said) bloody fast with a dual-core Cortex A9 processor at 1GHz.  It has 512MB of RAM to work with and a multi-touch capacitive 10.2″ screen (1024×600).  While the screen itself is by far the weakest part of the deal, the device also comes with a MicroSD card reader, a 1.3MP front-facing webcam, a USB port (host mode, meaning you can hook up your 1TB hard drive to it, or your keyboard & mouse) and an HDMI port (to play Angry Birds on your 50″ HD TV set).

As such, this device would be nearly perfect if it weren’t for the fact that the software that runs on it, Android 2.2 with some customizations from the manufacturer, is pretty bad.  After a month of usage this tablet has been a lot of fun to test and play with and it’s gained its spot in my life.  To me the tablet would be perfect with a better OS (Android 3?) and with a better screen (the viewing angle isn’t great, but not bad enough to become an issue).  Something tells me that this will very soon become possible at the same price-point.

Fortunately though, software is something that can be fixed rather easily these days.  After all, this is open source Android we’re talking about here.  To save you folks out there that just bought this machine the trouble, I’m going to explain to you what you need to do to get it up to speed (literally).

UPDATE: before you start you might want to consider updating your device firmware (from February 11th) to v1.0.9.  To check go to the root screen, Menu, Settings, About the phone.  If you update you can skip the installation of the flash player and the screen calibration.  This update erases your tablet so make sure to back up your important data.

We’re going to install “some” extra software to make it usable… The information I found almost exclusively comes from Tweakers.net, a Dutch site.  Since Point of View is a Dutch company I guess that makes sense.  It also allows for a very hackable and upgradeable machine 🙂

Before we start, make sure to insert a MicroSD card into your new PoV tablet.  It helps performance and in general I think the software expects to find something there.  Since these things are dirt cheap and range from 1GB to 32GB, take your pick.  I put 4GB in there and I still think that’s plenty for this device given the fact you can plug in your USB HD/Stick to watch movies.

To begin with, the machines usually ship with a badly calibrated screen making typing and touching the screen a bad experience.  So first download and install the “Module AP” application.  Before running, make sure your screen is clean and your device is placed (screen up) flat on your desk.  Note that after trying certain older Android games I had to re-run this app to re-calibrate the screen. (once or twice I think)

Next we obviously want to install Adobe Flash. Download and install that, not from the Android Market (we will install the Market later) but from this location.  For the Apple fans out there dissing Flash: too many websites use it, it’s not going away.  Most news sites play video using Flash and it works great and fast (even full screen high resolution) on a tablet.  Let me spell it out for you: banning it from iOS was a big mistake.

Next the Android Market.  For the zip file and explanation see this blog post from Oudmaijer.  It should be fairly straightforward, I didn’t spend too much time on it.  Oudmaijer also has advice on installing alternative ROMs, the Google apps and much more for the adventurous.

Personally I already have a calendar and e-mail in every place I can possibly think of, so I erased all these apps from my tablet again.  The machine is now strictly non-work related and I like it that way.  (Note that I can use the browser to read mail and see my calendar while on the road just fine if I need to.)

The lack of 3G is no longer relevant in my opinion now that I have a smart-phone that can Wifi-tether.  If I plug the phone into the USB port of the tablet it charges too.  If you read the various websites of the PoV Mobii you can read up on how to connect your 3G dongle or how to change the USB port from Host mode to Client mode.  I never bothered with either.

Here is some other cool software I installed on the device:

  • Amazon Kindle & Aldiko for a bit of reading.  Having my own book with me is great 🙂
  • Firefox 4 Mobile (Beta 5 or later): Excellent browser with Sync support.  The new beta 5 release of v4 mobile is an absolute pleasure to work with.  Too bad it lacks Flash support but I’m sure it will arrive sooner or later.
  • Astro File Manager so you can browse your file system and so on.  Make sure to also install the networking, SMB, SSH/SFTP modules so you can browse your remote (Linux) PCs.
  • Advanced Task Killer Froyo: Especially in the beginning when you’re testing every new piece of software out there when none of it was designed to run on a dual core or on a 10″ screen it can be quite handy to kill all programs with one finger swipe.  Kill everything before running intensive games like a few of the ones listed below.
  • Better terminal emulator Pro: It doesn’t make a lot of sense to have on a phone but with a tablet and a USB keyboard at hand it can be surprisingly useful. BTEP also has support for scp, ssh and other useful commands so you can copy files the way you know you like it.  Heck, I was a Unix sys-admin for Volvo in a previous life and Android is a Unix machine.
  • QuickPic : For picture viewing. The built-in software works for file viewing and looks very fancy.  It just seems to choke on large volumes of images.
  • z4root (see download link at the bottom of the post) to root your device.  You know you want to 🙂  Goes nicely with “Uninstaller for root” to remove unwanted built-in apps like “e-Mail” and so on.
  • Google Maps: I installed it of course, but I don’t use it that much.
  • Rock Player: This media player handles all possible file formats so you don’t have to convert the HD movies on your hard disk or thumb drive.  Copy them over and watch them when you feel like it.
  • Adobe Reader: Nicely renders PDFs with very good performance. (almost immediately with quick page-turning)
  • Skype: works great but unfortunately without video. I removed it because people would ping me while reading a book.
  • Seesmic can be used for Twittering.  Tweetdeck thinks for some reason it can’t run on the tablet. (lazy developers I tell you)  Again, I un-installed twitter from this device.  Just because you can install this sort of thing doesn’t mean you should 🙂

These were some apps you can start with.  Please note that Android 2.2 isn’t very good for multi-tasking and multi-media yet.  It tries to do too many background tasks.  If you don’t want to be disturbed with any of that (which should be the default), turn off background synchronization in the settings.  It helps out a lot with responsiveness in the games listed next.  Updates for this device were promised in the form of Android 2.3 in the coming months so I’m sure that this situation will improve soon.

  • Asphalt 5 : This HD game fully supports the capabilities of your dual core tablet and is just a lot of fun to play.  It nicely shows off the potential of your tablet.
  • Angry Birds & Angry Birds Seasons: if you have kids they’ll want it, they’ll need it. Both game engines can stall on your brand new CPU (once or twice, not often).  If this happens, hold the power button for a while, select Home screen and kill the game with your favorite task killer.  Rovio should release an update for the new dual cores to fix this.
  • Glow Hockey and Air Hockey: HD full screen air hockey games. The first one is very flashy, the second plain.  I think I prefer the first version.
  • Fruit Ninja: Just a lot of full screen fun on a tablet.
  • TurboFly 3D: More fast-paced 3D racing fun.
  • Robo Defense 2.0: This recently released version supports HD screens and tablets just fine.  Still a lot of fun, now with more extensions and upgrades.
  • Penguin Skiing: my son’s favorite, clone of the open source (Linux) variant.
  • Radiant: Excellent game with a retro look and feel. Think big pixels.  Originally purchased for my phone, it works great full screen too. There is an HD variant as well but I haven’t tried it yet.
  • Krazy Kart Racing: more high speed full screen 3D racing fun. Another favorite of Sam.
  • 3D Invaders: Even though it’s a beta it’s playable and fun.
  • Android Shogi: Great program with large opening book that is downloaded on request. Tactically less strong in the middle game but with a brutal mating engine.  I enjoyed playing Shogi again after all these years 🙂
  • PewPew: Tough multi-touch 2D game with a really cool (vector graphics) retro look.
  • Klondike Solitaire: In case you still have time left on that looong flight across the Atlantic.
  • Pinball Deluxe: See me, hear me, feel me. More full screen smooth graphics fun with a virtual pinball machine.
  • xkcd: On and off-line viewer of the well known geek comic.  Use this one, the others don’t support the 10″ screen.
  • Spaghetti Marshmallows: fun physics engine game. Again full screen high resolution support on this tablet.
  • Tank Hero: Nice fast-paced tank-busting game.
  • New! NVIDIA Tegra Zone: A new app in the market that lists Tegra2 optimized games and apps like Fruit Ninja THD and the upcoming Galaxy on Fire 2 THD awesomeness.

For all these games the same is true: don’t use the built-in G-Sensor.  Even though it’s possible, 10″ devices like this one or the iPad are simply too heavy for it.  After 2 minutes of pretending your screen is a steering wheel, the fun is over.  All the games listed above have a touch-steering mode which is actually a lot more fun. (If you think your G-Sensor is broken, unlock it with the switch at the top of your tablet :-))

The only app I can think of that I’m still missing is a nice video-chat application since the tablet does indeed have a front-facing webcam.  In the future I could then give a tablet to the remote family members (grand-parents and so on) so we can all video chat with them.  I know it’s technically possible now.

Finally, a word on the battery life of this thing.  As far as I can tell the battery life of this tablet is about the same as or better than that of the iPad.  Perhaps that is because I no longer have any background connections taking place all the time checking for e-mail, calendar appointments or Google Talk/Skype connections.  In any case, I think it lasts around 5-6 hours tops if you are non-stop doing intensive gaming.  I haven’t tried to run HD movies yet but I’m sure the thing could last a movie or 2 easily.  For normal web-browsing and mostly stand-by usage I think the PoV Mobii Tegra lasts for days. (I never tried since I usually charge it overnight.)  Obviously, just as is the case with the iPad, turning off the wireless LAN actually makes a huge difference in battery life.

There you are.  I hope you liked this “little” review. You’ll be up and running on your new tablet in no time.  While the Android tablet market space is just opening up, it’s already really interesting to be using it.

Just try not to have too much fun!

Until next time,

Matt

P.S. Before anyone asks: yes you can access your Pentaho BI server with the built-in browser and yes, open flash charts work great.  Haven’t tried 3.7.1 and Analyzer yet but I will do that soon.

P.P.S. If you somehow managed to install experimental software (I’m guilty of trying everything) and you can’t use the touch screen anymore… simply hook up a standard USB (US) keyboard and use the cursor keys to navigate and re-calibrate the screen.  Keep pressing the “Back” button to unlock your screen.

My new netbook…

Dear Linux fans,

Last weekend I saw an ad for a netbook in a Carrefour superstore leaflet that I guess was just too good to refuse.

Unlike other netbooks, this one was priced really low: €199,00 (including taxes, which makes it cost my company €164.46 or about 200 USD).  For me, that’s the price point where a netbook makes sense, not the €400-500 you see all over the place.

Now, for that low price, you get the following machine:

  • 1.6GHz VIA C7-M CPU
  • 512MB RAM (DDR2 667, shared with video, 384 available)
  • 120GB hard disk (2.5″, 7200rpm)
  • 1024×600 LCD screen (pretty good quality actually)
  • Webcam
  • WIFI b/g
  • 2xUSB 2.0
  • VGA port
  • a multi-format card reader (SD, SDHC, MMC)
  • Microphone
  • Sound in/out
  • Mandriva Linux 2009.1

It was very interesting to see that “Windows 7 Home Premium” was offered at exactly the same price.  Talk about a total waste of money on the Microsoft side.

OK, back to the netbook.  The memory issue is not a problem.  I already ordered a 2GB DDR2 RAM module for the machine at €39.

UPDATE 10/27 : the RAM arrived, was installed in 5 minutes and all works fine now.  With 1.9GB available the machine is a lot snappier too.

Performance is obviously not stellar but I didn’t expect it to be.  I paid less for it than for my current cell phone.  However, it plays full screen AVI without a glitch.

The only real problem the box has is that it comes with … Mandriva Linux.  Maybe I’m spoiled by years of Ubuntu use, but this distribution really sucks.  Can I please just install some software, customize the UI a bit?  Please?  I don’t recall the last time I couldn’t install a piece of software on Ubuntu because a package couldn’t be downloaded.  WTF?  And charge €28 just to get a couple of codecs to play audio/video? I can legally use these drivers in Europe without a problem.

Don’t get me wrong, all hardware is supported and works fine, including audio, the webcam, skype, flash, etc.

Anyway, I tried to put Ubuntu Netbook Remix 9.04 on it by booting from a USB stick.  Unfortunately, either the image or the stick has an issue since it freezes upon installer boot.  The live system boots but has a nasty video problem.  So I’m going to retry later next week.  Heck, maybe it’s better to just wait until Kubuntu 9.10 Netbook Remix comes out next week.

Feel free to leave advice on what distro to pick and how to best handle the install.  Also feel free to leave tips on how to explain to the kids that this is not a toy.

Thanks in advance!

Cheers,

Matt

The kindness of strangers

Dear Kettle fans,

There isn’t a week that goes by where I don’t find myself amazed by the number of contributions and help that the Pentaho Data Integration project receives in all kinds of forms.  There are people contributing anything from small patches to complete steps, folks helping out others on the forum, writing documentation, writing books, translating PDI, etc.  Without any question, this has been a truly amazing experience, not just for me but for the whole Kettle project.

It’s because of that overwhelmingly positive experience that I’ve always tried to be accessible and in contact with my community in all sorts of possible ways.  And because of that positive vibe I have refrained from commenting on the negative flip side to that story for the longest time.

The problem is really that lately things have been changing.  It’s probably caused in general by an increasing attention to open source and specifically by an increase in popularity of Kettle.  In any case, certain types of people do the following:

  • Send me personal email
  • IM me on skype/Yahoo!/MSN/AIM/…
  • Send me all sorts of messages and questions through the forums
  • Ask questions on this blog

Usually it’s a combination of any of the above.  Any time now I expect folks to be sending me direct twitter messages.  The questions are always the same:

I have an urgent Pentaho porblem.  I am incapable of using the forum for some stupid reason and so you have to help me, preferable now or within the next 15 minutes!!!!

This way, the meaning of “The kindness of strangers” becomes more and more like the one from the Nick Cave song.

I’ve just finished reading Linus’ book “Just for fun” (Thanks again Domingo!) and his approach to the problem of staying in reach for people to contribute code and at the same time allowing yourself to have a life and a job is simple: if it ain’t fun, don’t do it.  Well, the barrage of this sort of question stopped being fun for me a long time ago.

As such, I’m going to try this approach: any question that could or should be asked on the forum is from now on silently ignored and deleted from my mailbox.  Any person that is not part of my “community” and that needlessly contacts me over IM gets blocked indefinitely.  And yes, that goes for twitter as well.  Off-topic questions on this blog go to the spam folder as well.  I will simply refuse to spend time on non-interesting topics.

I thought about creating a standard response e-mail, but any sort of reply is simply an encouragement to certain types of people and will only make matters worse. (been there, done that)

I’m sure everyone understands that this is the only way to free up time to work on the real problems at hand.  Thank you for your understanding in any case.

Until next time,

Matt

Donate to Freenode

I just heard a cry for help from fellow countryman Jochen Maes on planet grep.  Apparently Freenode, the service that provides us with our ##pentaho IRC channel, is in need of a bit of money.

So obviously I donated a couple of GBP.  After all, they aren’t worth what they used to be so it’s cheap to donate!  I hope that if you use Freenode, you’ll find the time to do this too.

Until next time,

Matt

Gartner DI MQ

Dear Data Integration fans,

A few weeks ago, Yves de Montcheuil from Talend took a shot across the bow of Gartner for not including Talend in their Magic Quadrant (MQ) for data integration.  After that post, Andreas Bitter from Gartner (rightfully) felt personally under assault and felt the need to set the record straight.

I think the discussion itself is very interesting, but it misses a very important point:

The Magic Quadrant contains companies, not trends, communities, people or software!

Think about it for a second.  In the early days of JBoss there were complaints from Marc Fleury about the fact that only a small percentage of the “JBoss the software” users paid anything to “JBoss the company”.  Numbers that floated around back then were 0.01% or 0.1%, can’t remember exactly.

Those numbers make sense, I’ve heard about similar figures from other commercial open source companies.  Anything in the range 0.01% to 1% is possible.

Let’s be “optimistic” here and claim that a company like Pentaho converts 1% of all users into customers. (trust me, that figure would be really great given the millions of users out there :-))  That would mean that we’re disturbing the market of our competitors by 100 times our turnover.  So for every dollar of turnover Pentaho makes, we’re disturbing the closed source vendors for 100 dollars.

Pentaho and yes indeed Talend see that they are being a serious disturbance to the market dominance from the traditional DI vendors.  And that is why Yves feels a bit mistreated by Gartner.  However, since companies like Pentaho and Talend use a disruptive business model it is only normal that the Gartner MQ itself is also disrupted by our models. You simply can’t be part of the system if you want to disrupt it I guess. (*)

All that being said, it’s only a matter of time before something has got to give: open source or the Gartner DI MQ.  Yves, Andreas, let it be noted I’m betting on the former to come out of this as a winner.

Until next time,

Matt

(*) This also partly explains why Kettle and TOS are not really competitors: we’re using the same business model and are not disrupting each other.  We offer 2 completely different choices to our users.

Canonical: take my money

Dear Canonical,

You claim that there is little money in the desktop software business and more in services.  Well here is something I would pay money for:

Take the top selling business laptops from Dell, Acer, HP, Lenovo and offer customized distributions for them.

I would pay for that in an instant.  All too often people confuse open source with free of charge.  I’m perfectly capable of making that distinction.  In fact, I use my machines for my work and don’t want to spend days configuring all the devices on them.  As such, I would pay something like 50 USD for a customized (K)Ubuntu or perhaps 150-200 USD if it came with some sort of (e-mail) support contract for a year.

I don’t use Linux / Ubuntu because it costs less, I use it because I prefer it over Windows to do my job.  I would pay that kind of money because I would save time and money in the long run.

Until the major hardware vendors offer decent (worldwide) support for Linux on their machines (out of the box that is), I think this is an idea with potential and I hope at least someone picks it up.  Go ahead, let me spend money on it!

Until next time,
Matt