Bring up a Hadoop Cluster using Ambari

I remember that a couple of weeks back, a colleague and I were trying to bring up a 3-node Hadoop cluster using Cloudera. The experience was painful, to say the least. We solved quite a few intermittent issues and thought the end was nigh, but unfortunately it wasn't. Finally, we gave up after 2-3 days of struggling, and it left a bitter taste in our mouths. I don't really

Installing Java on CentOS 7

This article is mostly for my own reference, as I keep forgetting the steps for installing Java on CentOS. The procedure is pretty straightforward. I'll assume the OS is CentOS 7(.4) and that we're installing Java SE 8. I'll also assume the installation is done as the root user. If you're planning to install as a non-root user, you might have to use sudo for some of

Comparing performance of different models and choosing the best one

Hello there, folks. Let's talk about some Machine Learning today, Supervised Learning to be precise. A couple of months back, I enrolled in the Udacity ML Advanced Nanodegree course. As part of that course, we had to choose a capstone project. I chose the problem of predicting whether or not a star is a pulsar. In a nutshell, it's a binary classification problem. The Kaggle dataset for this problem can be