Write your first Spark application

Apache Spark is a framework for processing huge amounts of data at lightning-fast speed. You can run it on a single node or on a cluster, where tasks are distributed among the nodes. One use of Spark is in ETL processes, where you extract data…

Journey...

Somewhere I read: consider your job as a mode of transport. You get onto it to reach your destination. If it breaks down, you don't just wait and complain; you find another vehicle and continue your journey. The most important part is that you need to know the destination. This is…

How to detect drift in AWS stack - part 2

In the first part, I explained how to detect drift manually. Now we will see how to automate that. We are going to write a Lambda function that tells us whether any drift is present in a given stack. There are boto3 APIs that help to get…
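As a rough illustration of the idea, here is a minimal sketch built on two CloudFormation calls in boto3, `detect_stack_drift` and `describe_stack_drift_detection_status`. It is not the function from the post: the names `stack_has_drifted`, `poll_seconds`, `max_polls` and the `stack_name` event key are assumptions made for this example.

```python
import time


def stack_has_drifted(cfn_client, stack_name, poll_seconds=5, max_polls=60):
    """Return True if CloudFormation reports the stack as DRIFTED.

    cfn_client is expected to behave like boto3.client("cloudformation").
    """
    # Start an asynchronous drift detection run and keep its id.
    detection_id = cfn_client.detect_stack_drift(StackName=stack_name)[
        "StackDriftDetectionId"
    ]

    # Poll until the detection run completes, fails, or we give up.
    for _ in range(max_polls):
        status = cfn_client.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] == "DETECTION_COMPLETE":
            return status["StackDriftStatus"] == "DRIFTED"
        if status["DetectionStatus"] == "DETECTION_FAILED":
            raise RuntimeError(
                status.get("DetectionStatusReason", "drift detection failed")
            )
        time.sleep(poll_seconds)

    raise TimeoutError(f"drift detection for {stack_name} did not finish in time")


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be exercised without AWS.
    import boto3

    # Hypothetical event shape: {"stack_name": "my-stack"}
    stack_name = event["stack_name"]
    drifted = stack_has_drifted(boto3.client("cloudformation"), stack_name)
    return {"stack_name": stack_name, "drifted": drifted}
```

Passing the client in as a parameter keeps the polling logic testable with a stub. In a real Lambda function the polling budget would also need to fit within the configured timeout.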

Building data platform on AWS - part 1

You have joined a startup that wants to build a brand-new, scalable platform. With it, the startup aims to reduce its operating costs and provide better services to its customers. The platform in question is the entire back-end system that supports its products. Since you don't want to…

The final question...

The question you should ask your code before shipping... Don't look only at the happy path. At every single line, ask: if this fails, what happens, and how are you handling it?…

Self documenting unit test

When you write a unit test case with Python's unittest module and run the test in verbose mode (-v), it prints each result as test case name (module.Class) ... status.

    test_is_even_number (test_math_util.TestMathUtil) ... ok
    ----------------------------------------------------------------------
    Ran 1 test in 0.002s

    OK

If you have a couple of test cases for a given…
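A minimal sketch of a test that produces this kind of report is shown below. `is_even_number` and `run_verbose` are hypothetical names chosen for this example, not taken from the post. It also relies on one behaviour of unittest: in verbose mode, the runner prints the first line of a test method's docstring alongside the test name, which is one way to make the report describe itself.

```python
import io
import unittest


def is_even_number(n):
    """Hypothetical helper under test."""
    return n % 2 == 0


class TestMathUtil(unittest.TestCase):
    def test_is_even_number(self):
        """is_even_number returns True for even integers and False for odd ones."""
        self.assertTrue(is_even_number(4))
        self.assertFalse(is_even_number(7))


def run_verbose():
    """Run the test case in verbose mode and return (result, report text)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMathUtil)
    stream = io.StringIO()
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    result = runner.run(suite)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = run_verbose()
    print(report)
```

The same verbose report is available from the command line with `python -m unittest -v`.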

How to detect drift in AWS stack - part 1

If you have deployed an AWS stack using a CloudFormation template, someone may have changed properties on one or more resources without updating the CF template. As a result, the resources actually deployed differ from what is defined in the…

Create and initialize a list in Python

Create a list and initialize it with some default values.

    # create a list of 10 elements with default value 0
    >>> my_list = [0] * 10
    >>> my_list
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Later you can assign the value to…
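The same pattern as a small script, with one caveat worth knowing: multiplying a list copies references, so it is safe for immutable defaults such as `0` but not for mutable ones such as `[]`. The helper name `make_list` is an assumption made for this sketch.

```python
def make_list(size, default=0):
    """Return a list of `size` elements, each set to `default`.

    Safe for immutable defaults (numbers, strings, None).
    """
    return [default] * size


my_list = make_list(10)
my_list[3] = 42  # assign a value to a single position later

# With a mutable default, every slot refers to the SAME object...
rows_shared = [[]] * 3
rows_shared[0].append(1)

# ...so build each element separately with a comprehension instead.
rows_independent = [[] for _ in range(3)]
rows_independent[0].append(1)

print(my_list)
print(rows_shared)
print(rows_independent)
```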