Build Your First End-to-End Pipeline
In this tutorial, we'll build a complete data pipeline that extracts data from a public API, transforms it using Python, loads it into a local SQLite database, and visualises the results with a simple dashboard. No cloud services required — everything runs on your laptop.
Step 1: Extract
We'll use the OpenWeatherMap API to fetch weather data. Start by signing up for a free API key. Then write a Python script that requests current weather data for your city and saves the raw JSON response.
import requests
import json
API_KEY = "your_key"
CITY = "Silchar"
url = f"https://api.openweathermap.org/data/2.5/weather?q={CITY}&appid={API_KEY}"
response = requests.get(url)
data = response.json()
with open("raw_weather.json", "w") as f:
json.dump(data, f)
Step 2: Transform
Read the raw JSON and extract only the fields we care about: temperature, humidity, pressure, and weather description.
import json
with open("raw_weather.json") as f:
data = json.load(f)
transformed = {
"city": data["name"],
"temp_c": round(data["main"]["temp"] - 273.15, 1),
"humidity": data["main"]["humidity"],
"pressure": data["main"]["pressure"],
"description": data["weather"][0]["description"],
"timestamp": data["dt"]
}
with open("weather_clean.json", "w") as f:
json.dump(transformed, f)
Step 3: Load
Create a SQLite database and a table, then insert the cleaned data.
import sqlite3, json
conn = sqlite3.connect("weather.db")
conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL, humidity INT, pressure INT, description TEXT, timestamp INT)")
with open("weather_clean.json") as f:
row = json.load(f)
conn.execute("INSERT INTO weather VALUES (?, ?, ?, ?, ?, ?)",
(row["city"], row["temp_c"], row["humidity"], row["pressure"], row["description"], row["timestamp"]))
conn.commit()
Step 4: Visualise
Use Streamlit to build a quick dashboard that reads from the database and plots temperature trends over time.
import streamlit as st
import sqlite3
import pandas as pd
conn = sqlite3.connect("weather.db")
df = pd.read_sql("SELECT * FROM weather ORDER BY timestamp", conn)
st.line_chart(df.set_index("timestamp")["temp_c"])
st.write(f"Average temperature: {df['temp_c'].mean():.1f}°C")
Automate It
Wrap the entire pipeline in a single Python script and schedule it with cron (Linux/macOS) or Task Scheduler (Windows). Congratulations — you've just built a production-ready data pipeline!

