Just finished the “Hadoop Platform and Application Framework” MOOC on Coursera. Here are several thoughts on it.
- My primary goal was to get a sense of the various names and abbreviations in the Hadoop world (HDFS, YARN, Pig, Spark). I think I reached this goal.
- The course is pretty lightweight in terms of time. Each lesson took around 1-2 hours. Only once did I have to spend around 3 hours, because of an assignment and weird autograder output.
- The autograder in this class is not good. You can’t really tell what went wrong with your results or where to look for the error.
- I could easily learn everything I needed at 1.25x video speed.
- Chances are high that your Bash utilities will be much faster than the Hadoop MapReduce approach if you don’t have gigabytes of data.
- Spark is amazing. It really opens up the whole Hadoop cluster to programmers like me. You start writing normal functional code to process your data rather than struggling with the mapper/reducer paradigm directly.
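To illustrate the last point about Spark, here is a minimal sketch in plain Python (not actual Hadoop or Spark code, and not taken from the course). It counts words twice: once with a separate mapper and reducer plus an explicit sort between them, and once as a single functional expression. The sample lines and all function names are made up for illustration.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Made-up sample data, standing in for lines of an input file.
LINES = ["spark is amazing", "hadoop is big", "spark is fast"]


# --- Mapper/reducer style: two separate functions plus a sort step ---
def mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield (word, 1)


def reducer(word, counts):
    """Sum all the counts collected for one word."""
    return (word, sum(counts))


def mapreduce_word_count(lines):
    pairs = [pair for line in lines for pair in mapper(line)]
    # The sort stands in for the shuffle/sort phase between map and reduce.
    pairs.sort(key=itemgetter(0))
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


# --- Functional style: one expression over the data ---
def functional_word_count(lines):
    return dict(Counter(word for line in lines for word in line.split()))


# In PySpark the second style would look roughly like this
# (assuming an existing SparkContext `sc` and an input `path`):
#   sc.textFile(path).flatMap(lambda line: line.split()) \
#     .map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

if __name__ == "__main__":
    print(mapreduce_word_count(LINES))
    print(functional_word_count(LINES))
```

Both functions return the same counts. The difference is only in how much plumbing you have to write yourself.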
This MOOC is definitely worth the time. Looking forward to getting into the next course, “Introduction to Big Data Analytics”, from the same specialization.