Have you worked on a go-live project or a prototype?
- What Most People Say: “I’ve dabbled with Hadoop in my spare time.”
- What You Should Say: “I had considerable experience as a data warehouse architect before taking classes to learn Hadoop. Then, to make sure I was ready to handle big data sets, I pulled massive amounts of historical data from the New York Stock Exchange and used the sample database to hone my analytical skills. I also used the data to create programs in MapReduce. You can see samples of my work by visiting my website.”
- Why You Should Say It: If you’re going to hone your skills in a simulated environment, make sure it emulates what you’ll find in the real world, says Chacko. Real jobs require you to handle big, heavy data sets.
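To make the MapReduce part of that answer concrete, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It is not Hadoop code and does not use real NYSE data: the records, the symbols `AAA` and `BBB`, and the function names are all hypothetical, chosen only to show the shape of a job that finds the highest closing price per stock symbol.

```python
from collections import defaultdict

# Hypothetical sample records: (symbol, date, closing price). Not real NYSE data.
RECORDS = [
    ("AAA", "2010-01-04", 10.0),
    ("BBB", "2010-01-04", 20.5),
    ("AAA", "2010-01-05", 12.5),
    ("BBB", "2010-01-05", 19.0),
]


def map_phase(record):
    """Emit one (key, value) pair per record: the symbol and its closing price."""
    symbol, _date, close = record
    yield symbol, close


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(symbol, closes):
    """Collapse all values for one key into a single result."""
    return symbol, max(closes)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


max_close = run_job(RECORDS)
print(max_close)
```

On a real cluster the map and reduce functions would run in parallel across nodes and the framework would handle the grouping; the single-process version here only illustrates the data flow.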
How many nodes can be in one cluster?
- What Most People Say: “I would say no more than two to three nodes.”
- What You Should Say: “Hadoop scales out nicely, so the number really depends on the workload and the data warehouse configuration. Hadoop can easily handle 10 to 50 nodes, and large production clusters run to thousands of nodes.”
- Why You Should Say It: Inspire confidence by showing that you understand Hadoop’s clusters and how to coordinate the parallel processing of data using MapReduce. Also, be sure to highlight your previous experience working with large data sets, even if it didn’t involve Hadoop.
Which NoSQL databases have you worked with?
- What Most People Say: “I’ve worked with Cassandra.”
- What You Should Say: “There are four categories of NoSQL databases. The first is key-value stores. I’ve used Redis, primarily when working with semi-structured data. The second is column-family stores. I’ve used Cassandra when I needed scalability and high availability. The third is document databases. When I’ve needed to store and access semi-structured documents in formats like JSON, I’ve used CouchDB. Finally, there are graph databases like InfiniteGraph.”
- Why You Should Say It: Sometimes, professionals are told to work with an open source database simply because it’s cheap. Unfortunately, those professionals aren’t ready for prime time if they have no idea why they’re using it or which NoSQL database is most efficient for processing large quantities of structured, semi-structured or unstructured data.
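One way to keep the four categories straight is to picture the shape of the data each one stores. The sketch below uses ordinary Python structures as stand-ins; it does not call Redis, Cassandra, CouchDB or InfiniteGraph, and every key, field and name in it is a made-up example.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"session:42": "user=alice;ttl=3600"}

# Column store: row key -> column family -> column name -> value.
column_family = {
    "alice": {
        "profile": {"city": "Boston"},
        "activity": {"last_login": "2013-05-01"},
    },
}

# Document database: a self-describing nested record, serialized as JSON.
document = json.dumps(
    {"_id": "order-1", "customer": "alice", "items": [{"sku": "A1", "qty": 2}]}
)

# Graph database: nodes plus the edges that connect them.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}


def reachable(edges, start):
    """Return every node reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return seen


print(key_value["session:42"])
print(column_family["alice"]["profile"]["city"])
print(json.loads(document)["items"][0]["qty"])
print(sorted(reachable(graph, "alice")))
```

The access pattern differs in each case: a single lookup by key, a lookup by row and column, a query into a nested document, and a traversal across relationships. That difference is usually what drives the choice of database.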
Which tool have you used for monitoring nodes and clusters?
- What Most People Say: “I haven’t used one.”
- What You Should Say: “I’ve used Nagios for monitoring servers and switches. And I’ve used Ganglia for monitoring the entire grid.”
- Why You Should Say It: “There are approximately 59 tools that can be used with Hadoop,” explains Chacko. “And not all of them can be used at the same time.”