Self-service BI with Pig, Impala and PowerBI


Visualisations in PowerBI

Here are some charts that we generated with PowerBI. The nice thing is that you can drill down on any bar. This is ideal for exploring a dataset.


You can also easily build an animated chart. In the following example, the delays per airport are shown on a scatter chart, where the total delay is plotted against the likelihood of having a delay. If you ‘play’ the chart, you can see the evolution of the delays on a day-to-day basis. From this animation, it’s clear that Saturday is your best bet if you really don’t like delays.
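To make the two axes of that scatter chart concrete, here is a minimal sketch of how the per-airport, per-day numbers could be computed. The sample rows, the column layout and the `delay_metrics` helper are all made up for illustration, and "likelihood of a delay" is assumed to mean the fraction of flights with a delay above zero minutes. PowerBI does this for you; this only shows the arithmetic.

```python
from collections import defaultdict

# Hypothetical sample rows: (airport, day_of_week, delay_minutes).
flights = [
    ("ORD", "Fri", 30),
    ("ORD", "Fri", 0),
    ("ORD", "Sat", 0),
    ("ORD", "Sat", 10),
    ("ATL", "Fri", 45),
    ("ATL", "Sat", 0),
]

def delay_metrics(rows):
    """Return {(airport, day): (total_delay, delay_likelihood)}."""
    totals = defaultdict(int)   # summed delay minutes per (airport, day)
    delayed = defaultdict(int)  # number of flights with a delay > 0
    counts = defaultdict(int)   # number of flights overall
    for airport, day, delay in rows:
        key = (airport, day)
        totals[key] += delay
        counts[key] += 1
        if delay > 0:
            delayed[key] += 1
    # Likelihood is assumed to be delayed flights / all flights.
    return {k: (totals[k], delayed[k] / counts[k]) for k in counts}

metrics = delay_metrics(flights)
```

Each `(airport, day)` key corresponds to one bubble in one frame of the animation: the first value is the total delay, the second is the delay likelihood.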


Like any self-respecting BI tool, PowerBI also offers a Map chart. We’ve experimented with it and we’ve got some beautiful results already.


As I mentioned before, the search feature is also very powerful. For example:

Doing BI becomes as simple as doing a Google search. Well, I guess Microsoft calls it a Bing search, but anyways…

The only thing I really miss in PowerBI is “live queries”. PowerBI retrieves all the data you need from the source and does all calculations on your machine. This doesn’t work well with Big Data. For one, you’re moving your data around, not your processing. That’s a bad smell. You’re limited to the memory and processing power of your machine, and you’ve lost all the advantages of a distributed SQL database or a Hadoop platform. Also, downloading millions and millions of rows puts a heavy load on the network, and it will take a while before you can fire your first query. Typically, you can download only a subset of your data, so every analysis is restricted to whatever that subset happens to contain.

Tableau does offer those live queries. That means it doesn’t try to retrieve the entire dataset. Instead, it fires the right SQL query at the database and gets back only the results. You can take full advantage of your powerful cluster, you’re not congesting the network, and you can start querying your dataset immediately. I hope this will be possible in future versions of PowerBI as well.
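The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under loud assumptions: an in-memory SQLite database stands in for Impala, the `flights` table and its columns are invented, and the SQL is not what Tableau or PowerBI actually generate. The point is simply how many rows cross the wire in each case.

```python
import sqlite3

# Stand-in for the cluster: an in-memory SQLite database (assumption,
# the real setup would be Impala behind an ODBC/JDBC connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (airport TEXT, delay_minutes INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("ORD", 30), ("ORD", 0), ("ATL", 45), ("ATL", 15), ("JFK", 0)],
)

def total_delay_client_side(conn):
    """Extract style: fetch every row, then aggregate on the client."""
    rows = conn.execute(
        "SELECT airport, delay_minutes FROM flights"
    ).fetchall()
    totals = {}
    for airport, delay in rows:
        totals[airport] = totals.get(airport, 0) + delay
    return totals, len(rows)

def total_delay_pushed_down(conn):
    """Live-query style: the database aggregates, only results come back."""
    rows = conn.execute(
        "SELECT airport, SUM(delay_minutes) FROM flights GROUP BY airport"
    ).fetchall()
    return dict(rows), len(rows)

local_totals, rows_moved_local = total_delay_client_side(conn)
pushed_totals, rows_moved_pushed = total_delay_pushed_down(conn)
```

Both functions produce the same totals, but the first one transfers every flight record while the second transfers one row per airport. With five rows that hardly matters; with millions of rows it is the whole argument.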

Detailed tutorial from Hortonworks

Hortonworks have done pretty much the same thing (obviously minus Impala), and have put together a very detailed tutorial about it. Well worth the read if you want to try this yourself:

Published by Aryan Nava

Founder of "BlockchainMind", CTO for two Blockchain startups during 2018, Cloud/DevOps Consultant and Blockchain Trainer
