Udacity’s Data Engineering Nanodegree: A Review

My regular little chit chats:

I keep reminding myself of this: I never personally invested in my computer science skills during the first half of my career.

The head of our department, our dearest DLP sir (Dr. DLP), told us after the admission process for our PG course to make use of the break to learn some software skills. He insisted that all of us should be familiar with programming and some UNIX. Dad somehow managed to find a very young teacher to help me with UNIX. It is sad that I don’t even remember the teacher’s name, but even today when I type mkdir or cp, I remember how he taught me what each command stands for. If I remember correctly, Dad paid some 2k for that two-week UNIX class. I learnt programming from my younger brother. He taught me C and C++ just 5 or 10 days before I was to join my PG.

By the time our PG classes started, I was OK to start working on UNIX, but was not totally confident with C and C++. (Hope my brother is not reading this!) My tryst with programming started here. I struggled a lot with code during the first phase of my career, though I really loved programming.

During the second phase, I found myself addicted to courses, technologies and re-skilling. Unlike earlier, technology training had become far more democratised by then.

It was probably in 2017 that I decided to switch to a data science related career. One of the reasons I made that decision is that my spouse works in mobile application development, and whatever work I did or job I took up, people somehow liked to believe that all the work was done by my spouse!

I wanted to get out of this and have an identity of my own, so I decided to choose a field that he has no connection with. That is how I decided to give data science a try. (Guess what! Things are not much better even now when it comes to my identity. It is still tied to my spouse’s!)

I worked on two or three projects in the field of data. But all of them were related to data capturing, data wrangling and visualisation. Even when there was some Python involved in the data wrangling part, all the data sources were spreadsheets or CSV files.

I was planning to give my courses and learning a break and take up a bit of machine learning after the break. But somehow my current job demanded data engineering skills. In my earlier jobs, I had worked on both structured and unstructured data, but all of it could be managed with the processing power of the machines I owned.

It was when I first wrote a script to pull really big data from an analytics platform that I realised my skills would probably not be enough to handle bigger data. Somewhere during this time, when I was planning to look into GCP as an option for storing bigger data, I came across a data project that had been delivered. I made a note of the skills the team had and decided to learn those technologies during my free time.

I had a passion for Flutter and also badly wanted to learn machine learning. But then I decided to let go of all those and focus on data engineering skills, and browsed for a data engineering course on Udacity. Udacity has always been my go-to learning platform whenever I wanted to pick up a totally new skill and gain some hands-on practice in those technologies.

I found this course, the Data Engineering Nanodegree, and decided to take it on seeing three topics there: intro to cloud computing and AWS, Redshift, and data pipelines using Airflow.

I had no time to give it a thought since our new project was about to start and it was important that I be skilled by then. I just decided to buy the course, though I had been planning to move away from Udacity for various reasons.

So, time to wrap up the chit chat section. Let me get to the blog now. :)

Into the blog

Did I benefit from the course?

I am still with the course, learning the last topic. I have to admit that, as someone who has mostly worked on the front end, this was one of the most boring courses I have ever done. I was falling asleep while watching the videos.

But, undoubtedly, I did benefit a lot from the course. Here are the things I am excited about in my new learnings.

  • Python is my favourite programming language. I think I should keep aside a blog for writing about why Python is my favourite language. But, yeah, let me jot down one of the reasons here: a library like pandas. Data wrangling has always been very simple with the power of pandas. When the data sets that I handled started becoming bigger and bigger, a question began to haunt me: how am I going to handle and wrangle bigger data when I have to work with one such set? The course has an answer to this question. It makes us comfortable with Spark. Spark will be an answer to many big data problems for sure. (Personally, I loved Spark SQL because it lets you use almost regular SQL on big data.)
  • I have worked on the front end and business layers of the various applications that I worked on. I have also worked on databases independently. But I never had to bother about any kind of hosting and scaling. The course introduces us to various services on AWS. Storing big files in S3 buckets, writing Python code on EMR clusters running on EC2 machines and writing data to tables on Redshift were all new things to me.
  • place holder text for a write up to be inserted after completing the section on Airflow.
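
To give a feel for the Spark SQL point above, here is a minimal sketch of running a regular SQL query through Spark. It assumes pyspark is installed and runs Spark locally; the page_views table, its columns and the rows passed in are all hypothetical, made up only for illustration, and this is not taken from the course material.

```python
# Hypothetical example: the table name and columns are made up for illustration.
QUERY = """
    SELECT page, COUNT(*) AS visits
    FROM page_views
    GROUP BY page
    ORDER BY visits DESC
"""


def top_pages(rows):
    """Run QUERY on a local Spark session and return (page, visits) tuples.

    `rows` is a list of (user, page) tuples.
    """
    # Imported here so the module still loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")      # assumption: single local worker, no cluster
        .appName("spark-sql-sketch")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(rows, ["user", "page"])
        # Register the DataFrame under a name so plain SQL can refer to it.
        df.createOrReplaceTempView("page_views")
        return [(r["page"], r["visits"]) for r in spark.sql(QUERY).collect()]
    finally:
        spark.stop()
```

Calling top_pages([("a", "home"), ("b", "home"), ("a", "about")]) would count visits per page. The same query text would work unchanged if the view were backed by files on S3 instead of a small in-memory list, which is what makes Spark SQL feel so familiar.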

A few things that caused some hiccups:

  1. The first two sections of the course were about regular relational and non-relational databases. There were exercises to connect to and query these databases using Python. The course is designed for somewhat experienced people. Considering that, these sections are something anyone with good knowledge of Python and basic databases should be able to do without much effort. So I personally was bored with the first two sections of the course and felt that I didn’t learn many new things from them. (Though those were the most organised portions of the course.)
  2. The course offers AWS credits and it mentions that most of the students have used only a small percentage of the credits. They have also given some tips on how to save on the credits. Despite me being “at my OCD best” in following the instructions, I exhausted my credits when I was not even halfway through the course. I am using my personal account to finish the rest of the course.
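
For readers wondering what the "connect and query a database using Python" exercises from point 1 look like in spirit, here is a rough sketch. It uses Python's built-in sqlite3 module as a stand-in, since this post does not go into which databases the course uses; the songs table and its rows are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, made up for illustration.
cur.execute(
    "CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)
cur.executemany(
    "INSERT INTO songs (title, year) VALUES (?, ?)",
    [("First", 2017), ("Second", 2019), ("Third", 2019)],
)
conn.commit()

# Parameterised query: the driver fills in the ? placeholder safely.
cur.execute("SELECT title FROM songs WHERE year = ? ORDER BY title", (2019,))
titles = [row[0] for row in cur.fetchall()]

conn.close()
print(titles)
```

The connect, execute, fetch and close steps are the same pattern most Python database drivers follow, which is probably why these sections feel easy to anyone who has done it once.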

The highlights of the course:

When I first did a course with Udacity, we had personal mentors. I find the new model better. We have forums where you can post your doubts, and the questions are all answered by Udacity mentors. That way, the experience is smooth when you are stuck with something.

Also, as with the other Udacity Nanodegree programs, the projects are the highlight of this course as well. Each of them makes us feel very confident about the skills we just learnt.

Alright, time to wind up my blog. I was bored again and decided to write here. Hopefully, I will be back soon with my experience of the Docker course from Frontend Masters.

In love with technology and programming. Also a devil’s advocate. Believes in sisterhood. https://pyarisinghk.github.io