Spark Streaming for beginners

Whether you run an eCommerce store and want to put up a dashboard that shows the number of orders processed every minute, or run a very popular blog and would like to display trending articles on your website, or have any other scenario like this, all of these…

PySpark tips for beginners

Be careful when you use .collect(). Do not call .collect() on an RDD or DataFrame. Your driver may run out of memory if the RDD or DataFrame is too large to fit on a node. Use the take() function instead. You can specify the count with take, which reduces the number…

Capture text from web with flask and JavaScript

We read a lot on the web and sometimes want to make a note of some of it so that we can refer to it later. A few products provide this service for a few dollars a month, but we can easily write a browser extension to…

Write Your personal money manager for fun and free !!!

At some point you may have used a money manager to track your expenses, categorize them, and so on. These tools work well, but the catch is that you need to share very sensitive information with a third party. If they track your SMS messages, they know all about…

Write your first spark application

Apache Spark is a framework for processing huge amounts of data at lightning-fast speed. You can run it on a single node or on a cluster, where tasks are distributed among the nodes. One use of Spark is in ETL processes, where you extract data…

How to detect drift in AWS stack- part 2

In the first part, I explained how to detect drift manually. Now we will see how to automate that. We are going to write a Lambda function that tells us whether there is any drift in a given stack. There are boto3 APIs that help to get…
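A minimal sketch of the kind of check described here, under stated assumptions: the helper name stack_has_drifted, its polling parameters, and the idea of passing the client in as an argument are all hypothetical and not taken from the post. It relies on two CloudFormation client calls in boto3, detect_stack_drift and describe_stack_drift_detection_status, and polls until detection finishes.

```python
import time


def stack_has_drifted(cfn, stack_name, poll_seconds=5, max_polls=60):
    """Return True if CloudFormation reports the stack as DRIFTED.

    `cfn` is assumed to be a boto3 CloudFormation client, for example
    boto3.client("cloudformation"). The function name and the polling
    parameters are illustrative choices, not part of any library.
    """
    # Start a drift detection run and remember its id.
    detection_id = cfn.detect_stack_drift(StackName=stack_name)[
        "StackDriftDetectionId"
    ]

    # Poll until the run completes, fails, or we give up.
    for _ in range(max_polls):
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        state = status["DetectionStatus"]
        if state == "DETECTION_COMPLETE":
            return status["StackDriftStatus"] == "DRIFTED"
        if state == "DETECTION_FAILED":
            raise RuntimeError(
                status.get("DetectionStatusReason", "drift detection failed")
            )
        time.sleep(poll_seconds)

    raise TimeoutError("drift detection did not finish in time")
```

Inside a Lambda handler this could be called with a client created once at module level; the Lambda role would need permission for both CloudFormation actions.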

Self documenting unit test

When you write unit test cases with the Python unittest module and run them in verbose mode (-v), it prints each result as test name (module.class) ... status.

test_is_even_number (test_math_util.TestMathUtil) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

If you have a couple of testcases for a given…
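One way to make that report more descriptive, assuming the goal is output that reads like documentation, is to give each test a docstring: in verbose mode unittest prints the first line of the docstring along with the test name. The sketch below is illustrative only; is_even_number and run_verbose are hypothetical helpers written for this example.

```python
import io
import unittest


def is_even_number(n):
    """Return True when n is divisible by 2."""
    return n % 2 == 0


class TestMathUtil(unittest.TestCase):
    def test_is_even_number(self):
        """is_even_number returns True for an even input"""
        self.assertTrue(is_even_number(4))

    def test_is_odd_number(self):
        """is_even_number returns False for an odd input"""
        self.assertFalse(is_even_number(3))


def run_verbose():
    """Run the tests at verbosity 2 and return (result, report text)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMathUtil)
    stream = io.StringIO()
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    result = runner.run(suite)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = run_verbose()
    print(report)
```

The same effect is available from the command line with python -m unittest -v; the exact layout of the name line varies between Python versions.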

Create and initialize a list in python

Create a list and initialize it with some default values.

# create a list of 10 elements with default value as 0
>>> my_list = [0]*10
>>> my_list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Later you can assign the value to…
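As a small illustration of the same idiom, the sketch below assigns to one index after initialization and then shows a related pitfall: repeating a nested list copies references to the same inner list, so a list comprehension is the safer choice for rows. The variable names are made up for this example.

```python
# Flat list of immutable defaults: repetition is safe.
my_list = [0] * 10
my_list[3] = 7  # assign a value by index later

# Nested list built by repetition: both rows are the SAME inner list.
shared_rows = [[0] * 3] * 2
shared_rows[0][0] = 1  # the change shows up in every row

# A comprehension builds a fresh inner list for each row.
independent_rows = [[0] * 3 for _ in range(2)]
independent_rows[0][0] = 1  # only the first row changes
```

The repetition form is fine for numbers, strings, and None; the comprehension form is the one to reach for when the default value is itself a list or dict.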