Apache Airflow on Docker for Complete Beginners

Justin Gage
12 min read · Feb 10, 2019

Airflow — it’s not just a word Data Scientists use when they fart. It’s a powerful open source tool originally created by Airbnb to design, schedule, and monitor ETL jobs. But what exactly does that mean, and why is the community so excited about it?

Background: OLTP vs. OLAP, Analytics Needs, and Warehouses

Every company starts out with some group of tables and databases that are critical to its operations. These might be an orders table, a users table, and an items table if you’re an e-commerce company: your production application uses those tables as a backend for your day-to-day operations. This is what we call OLTP, or Online Transaction Processing. A new user signs up and a row gets added to the users table. The workload is mostly insert, update, or delete operations on individual rows.
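To make that concrete, here is a minimal sketch of an OLTP-style workload using Python's built-in sqlite3 module and an in-memory database. The users table, its columns, and the sample emails are assumptions made up for illustration; a real production app would likely sit on something like Postgres or MySQL.

```python
import sqlite3

# In-memory database standing in for a production backend (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, plan TEXT NOT NULL)"
)

# Two new users sign up: one row is inserted per signup.
conn.execute("INSERT INTO users (email, plan) VALUES (?, ?)", ("ada@example.com", "free"))
conn.execute("INSERT INTO users (email, plan) VALUES (?, ?)", ("bob@example.com", "free"))

# A user upgrades their plan: a single row is updated.
conn.execute("UPDATE users SET plan = ? WHERE email = ?", ("paid", "ada@example.com"))

# A user closes their account: a single row is deleted.
conn.execute("DELETE FROM users WHERE email = ?", ("bob@example.com",))
conn.commit()

# Read back what is left in the table.
remaining_users = conn.execute("SELECT email, plan FROM users").fetchall()
print(remaining_users)
```

Each statement touches exactly one row, which is the typical shape of transactional traffic: lots of small, fast writes.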

As companies mature (this point is getting earlier and earlier these days), they’ll want to start running analytics. How many users do we have? How have our order counts been growing over time? What are our most popular items? These are more complex questions and will tend to require aggregation (sum, average, maximum) as well as a few joins to other tables. We call this OLAP, or Online Analytical Processing.
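A question like "What are our most popular items?" might look roughly like the sketch below, again using Python's built-in sqlite3 module. The items and orders tables, their columns, and the sample rows are hypothetical; the point is only the shape of the query, which is a join plus an aggregation over many rows.

```python
import sqlite3

# In-memory database with a tiny, made-up e-commerce schema (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL REFERENCES items(id),
        quantity INTEGER NOT NULL
    );
    INSERT INTO items (id, name) VALUES (1, 'mug'), (2, 'poster');
    INSERT INTO orders (item_id, quantity) VALUES (1, 2), (2, 1), (1, 3);
    """
)

# Analytical query: join orders to items, then sum units sold per item.
popular_items = conn.execute(
    """
    SELECT items.name, SUM(orders.quantity) AS units_sold
    FROM orders
    JOIN items ON items.id = orders.item_id
    GROUP BY items.name
    ORDER BY units_sold DESC
    """
).fetchall()
print(popular_items)
```

Unlike the single-row writes of OLTP, this query has to scan every order to produce its answer, which is why analytical workloads stress a database in a different way.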

The most important difference between OLTP and OLAP operations is their priorities:
