Automate Your Data Quality Checks
Data quality isn't optional: it's the difference between trusted dashboards and decisions made on bad data. Here's how to set up automated quality checks with Great Expectations and dbt, two of the most popular tools in the ecosystem.
Great Expectations
Great Expectations lets you define expectations for your data (e.g., "column X should never be null" or "values in column Y should be between 0 and 100"). It generates data documentation and can send alerts when expectations fail.
Start by installing the library and initialising a project (the init command belongs to the command-line interface shipped with pre-1.0 releases):
pip install great_expectations
great_expectations init
Define Your First Expectation
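import great_expectations as ge

# Legacy pandas-style API (older 0.x releases); newer releases use a context-based workflow
df = ge.read_csv("sales.csv")

# Each call validates immediately and returns a result with a success flag
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_between("revenue", min_value=0, max_value=100000)
# Only the listed (status, delivery_status) pairs are allowed
df.expect_column_pair_values_to_be_in_set("status", "delivery_status", [("shipped", "delivered")])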
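To make concrete what the first two expectations assert, here is a minimal plain-Python sketch of a not-null check and a range check. It is not how Great Expectations implements them: the helper names (check_not_null, check_between) and the sample rows are made up for illustration, and skipping missing values in the range check is a choice of this sketch.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_not_null(rows: List[Row], column: str) -> Dict[str, Any]:
    """Succeed only if no row has a missing (None or empty) value in `column`."""
    unexpected = [r for r in rows if r.get(column) in (None, "")]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}


def check_between(rows: List[Row], column: str,
                  min_value: float, max_value: float) -> Dict[str, Any]:
    """Succeed only if every non-missing value lies in [min_value, max_value]."""
    unexpected = [
        r for r in rows
        if r.get(column) not in (None, "")
        and not (min_value <= float(r[column]) <= max_value)
    ]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}


# Hypothetical sample rows standing in for sales.csv
rows = [
    {"order_id": "A1", "revenue": 120.0},
    {"order_id": None, "revenue": 80.0},   # violates the not-null check
    {"order_id": "A3", "revenue": -5.0},   # violates the range check
]

print(check_not_null(rows, "order_id"))
print(check_between(rows, "revenue", 0, 100000))
```

Both checks report one unexpected row here. The useful idea to carry over is the shape of the result: a success flag plus a count of offending rows, which is what a pipeline needs in order to decide whether to continue.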
Integrate with dbt
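dbt ships with generic tests such as unique and not_null that run as part of your transformation pipeline. Range checks like expect_column_values_to_be_between come from the dbt-expectations package, which you install separately. Add tests to your schema.yml: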
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: revenue
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
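It helps to know what these tests do under the hood: a dbt test compiles to a SELECT query that returns the rows violating the rule, and the test passes when the query returns no rows. The sketch below imitates that idea with Python's built-in sqlite3 module. The table contents are invented, and the SQL only approximates what dbt and dbt-expectations actually generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (2, 75.0), (2, 20.0), (None, 10.0)],  # one duplicate id, one NULL id
)

# Each query returns the rows that violate the rule; zero rows means the test passes.
FAILING_ROW_QUERIES = {
    "unique_order_id": """
        SELECT order_id FROM orders
        WHERE order_id IS NOT NULL
        GROUP BY order_id HAVING COUNT(*) > 1
    """,
    "not_null_order_id": "SELECT order_id FROM orders WHERE order_id IS NULL",
    "revenue_min_0": "SELECT revenue FROM orders WHERE revenue < 0",
}

results = {
    name: len(conn.execute(sql).fetchall())
    for name, sql in FAILING_ROW_QUERIES.items()
}
for name, failures in results.items():
    status = "PASS" if failures == 0 else "FAIL"
    print(f"{name}: {status} ({failures} failing rows)")
```

With this sample data the uniqueness and not-null checks fail with one offending row each, while the revenue check passes.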
Automate with CI/CD
Run your Great Expectations suite and dbt tests in your CI pipeline (GitHub Actions, GitLab CI, etc.) on every code push. Fail the build if any critical test fails. This catches data issues at merge time, before they reach production.
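One way to wire this into a CI job is a small gate script that runs each check command and turns any failure into a non-zero exit code, which is what makes the build fail. This is a sketch under assumptions: the command list is illustrative, and sales_checkpoint is a hypothetical checkpoint name.

```python
import subprocess
import sys
from typing import List, Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> List[str]:
    """Run every command and return the ones that exited non-zero."""
    failed = []
    for cmd in commands:
        print("running:", " ".join(cmd), file=sys.stderr)
        completed = subprocess.run(cmd)  # inherits stdout/stderr, so CI logs show tool output
        if completed.returncode != 0:
            failed.append(" ".join(cmd))
    return failed


def exit_code(failed: List[str]) -> int:
    """Map failures to a process exit code: 0 passes the build, 1 fails it."""
    return 1 if failed else 0


# Assumed commands; replace with whatever your project actually runs.
CHECKS = [
    ["dbt", "test"],
    ["great_expectations", "checkpoint", "run", "sales_checkpoint"],  # hypothetical name
]

# In the CI job, finish with: sys.exit(exit_code(run_checks(CHECKS)))
```

Running all checks before deciding the exit code, rather than stopping at the first failure, means a single CI run reports every broken check at once.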
For production pipelines, add a scheduled job that runs quality checks every hour and posts results to Slack or PagerDuty.
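As a rough sketch of the reporting step, the script below summarises check results and posts them to a Slack incoming webhook, which accepts a JSON body with a text field. The results dictionary is hypothetical, and SLACK_WEBHOOK_URL is an assumed environment variable name; the message is only sent when that variable is set.

```python
import json
import os
import urllib.request
from typing import Dict


def build_message(results: Dict[str, bool]) -> Dict[str, str]:
    """Summarise check results as a Slack incoming-webhook payload."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if failed:
        text = (f"Data quality: {len(failed)} of {len(results)} checks failed: "
                + ", ".join(failed))
    else:
        text = f"Data quality: all {len(results)} checks passed"
    return {"text": text}


def post_to_slack(payload: Dict[str, str], webhook_url: str) -> int:
    """POST the JSON payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical results; in practice these come from your Great Expectations and dbt runs.
results = {"orders.order_id not null": True, "orders.revenue >= 0": False}
payload = build_message(results)
print(payload["text"])

# SLACK_WEBHOOK_URL is an assumed environment variable name.
webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
if webhook_url:
    post_to_slack(payload, webhook_url)
```

To run it hourly, schedule the script with whatever scheduler you already use; in cron, the expression 0 * * * * fires at the top of every hour.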

