Today we would like to switch gears a bit and get our feet wet with another Big Data combo: Python and Impala. The reason is that Hive has some limitations that might prove a deal-breaker for your specific solution, and Impala might be a better route to take instead.
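To give a flavor of what this combo looks like, here is a minimal sketch of querying Impala from Python using the impyla client library. The host name and table are hypothetical placeholders, not values from any real deployment:

```python
# Minimal sketch: querying Impala from Python with the impyla
# client library (pip install impyla). The host and table below
# are hypothetical placeholders.
from impala.dbapi import connect

# 21050 is Impala's default port for external clients
conn = connect(host='impala-host.example.com', port=21050)
cursor = conn.cursor()

# Run a query against a hypothetical table and fetch the results
cursor.execute('SELECT market, COUNT(*) FROM property_listings GROUP BY market')
for market, cnt in cursor.fetchall():
    print(market, cnt)

cursor.close()
conn.close()
```

Because impyla follows the standard Python DB API, code written this way stays familiar to anyone who has worked with other SQL databases from Python.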
Over the past few years, we have been hearing more and more about the wealth of data we humans generate. This has progressively grown into the idea that if you have enough of this data and can piece together some meaning from it, you can achieve everything from predicting the future to curing all human ills.
It goes without saying that over the last few decades the vast majority of institutions, companies, and firms have had to face the Big Data reality, which created an urgent need for processing platforms capable of storing and analyzing these vast amounts of data. This is why Hadoop, and later [Spark](/spark-consulting/), came into the picture in the late 2000s.
On one of our client's projects, we were confronted with high-volume data streams and a great number of reports for the real estate market. More specifically, the client faced a tough scalability problem: the property market reports generated from such a big data set took up to 3 hours to produce (for just 100 markets). Worse, this time kept increasing, as a few million new records were added to the data set each day. To resolve the problem, the client decided to invest in a new system architecture.