-->
This project requires robust data pipelines that can efficiently process high-volume datasets. To address this, I engineered an Inflation Analysis System using an automated ETL workflow. The pipeline extracts data from disparate sources, transforms it into a star schema in PostgreSQL (Data warehouse), and integrates with BI tools for real-time visualization and analytics. Orchestrated with Apache Airflow, the system ensures scalability, automation, and reliability.
The challenge was to process weekly food price data from web sources and transform it into actionable insights. To meet this need, the ETL system was designed to:
The pipeline adheres to a modular workflow orchestrated in Apache Airflow:
The extraction stage uses Python’s requests library to interact with a RESTful API. Airflow schedules and monitors this task to ensure reliability.
The raw data is normalized into a star schema. A fact table stores price metrics, while dimension tables capture commodities, regions, and dates.
The transformed data is loaded into PostgreSQL, enabling optimized querying for BI tools.
Apache Airflow is critical for orchestrating each stage of the ETL pipeline, enabling automated, reliable data workflows. In this setup, Airflow DAGs (Directed Acyclic Graphs) define the sequence and dependencies of tasks from extraction to visualization. The DAG is scheduled to run weekly to match the incoming data cadence, ensuring fresh data is processed and loaded into the data warehouse in PostgreSQL.
To ensure data quality and pipeline reliability, the ETL system incorporates comprehensive logging and monitoring mechanisms. Each stage of the pipeline logs its execution details, including task statuses, processing times, and potential errors. These logs are stored in a dedicated logs table in PostgreSQL, allowing for centralized monitoring and historical tracking
Using BI tools (eg. Power BI, Tableau,Looker Studio) the PostgreSQL database is connected to create visualizations such as:
Try it: https://lookerstudio.google.com/reporting/042ab5ef-2863-4bfb-a119-2bb2765742e2
This Inflation Analysis System demonstrates the power of data engineering principles in solving real-world challenges. The automated ETL pipeline, built with Airflow and PostgreSQL, provides a scalable and efficient foundation for economic insights. By integrating BI tools, the system transforms raw data into actionable intelligence for better decision-making.