Automate Your Data Quality Checks

Mar 16, 2026·9 min read

Data quality isn't optional: it's the difference between trusted dashboards and decisions made on bad data. Here's how to set up automated quality checks with Great Expectations and dbt, two of the most popular tools in the ecosystem.

Great Expectations

Great Expectations lets you define expectations for your data (e.g., "column X should never be null" or "values in column Y should be between 0 and 100"). It generates data documentation and can send alerts when expectations fail.

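Start by installing the library and initialising a project. The init command and the ge.read_csv API used below belong to the legacy, pre-1.0 interface; Great Expectations 1.0 removed both, so setup differs on a current release: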

pip install great_expectations
great_expectations init

Define Your First Expectation

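import great_expectations as ge

# Legacy (pre-1.0) API: load the CSV into a DataFrame that carries expect_* methods
df = ge.read_csv("sales.csv")

# Every order must have an ID
df.expect_column_values_to_not_be_null("order_id")
# Revenue must fall within 0..100000 (bounds are inclusive by default)
df.expect_column_values_to_be_between("revenue", 0, 100000)
# Each (status, delivery_status) pair must be one of the allowed combinations
df.expect_column_pair_values_to_be_in_set("status", "delivery_status", [("shipped", "delivered")])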
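
To make the semantics of these three expectations concrete, here is a minimal, library-free sketch of what each one asserts about the rows. It does not use or reproduce Great Expectations internals. The helper names (check_not_null, check_between, check_pair_in_set) and the inline sample rows are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample standing in for sales.csv.
SAMPLE = """order_id,revenue,status,delivery_status
1001,250.0,shipped,delivered
1002,-5.0,shipped,delivered
,99.0,shipped,pending
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def check_not_null(rows, column):
    """Pass only if no row has an empty value in `column`."""
    bad = [r for r in rows if r[column] in ("", None)]
    return {"success": not bad, "unexpected_count": len(bad)}


def check_between(rows, column, min_value, max_value):
    """Pass only if every non-empty value lies in [min_value, max_value]."""
    bad = [
        r for r in rows
        if r[column] not in ("", None)
        and not (min_value <= float(r[column]) <= max_value)
    ]
    return {"success": not bad, "unexpected_count": len(bad)}


def check_pair_in_set(rows, column_a, column_b, allowed_pairs):
    """Pass only if every (column_a, column_b) pair is in `allowed_pairs`."""
    allowed = {tuple(pair) for pair in allowed_pairs}
    bad = [r for r in rows if (r[column_a], r[column_b]) not in allowed]
    return {"success": not bad, "unexpected_count": len(bad)}


results = {
    "order_id not null": check_not_null(rows, "order_id"),
    "revenue in range": check_between(rows, "revenue", 0, 100000),
    "status pair allowed": check_pair_in_set(
        rows, "status", "delivery_status", [("shipped", "delivered")]
    ),
}

for name, result in results.items():
    print(name, result)
```

Each check returns a small result record rather than raising, which mirrors the general idea of collecting all failures in one run instead of stopping at the first one.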

Integrate with dbt

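dbt ships with four generic tests (unique, not_null, accepted_values, relationships) that run as part of your transformation pipeline, and packages such as dbt-expectations add more. Add tests to your schema.yml. The dbt_expectations test on revenue below is not built in; it requires installing the dbt-expectations package through packages.yml and dbt deps: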

version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: revenue
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
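
Conceptually, a dbt data test is a query that selects failing rows, and the test passes when that query returns none. The sketch below imitates the unique, not_null, and minimum-value checks using Python's built-in sqlite3 module. The SQL is a simplified approximation rather than dbt's exact compiled output, and the in-memory orders table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.5), (2, 15.0), (None, 40.0)],  # hypothetical rows
)

# Each query returns the rows that violate the rule; zero rows means "pass".
TESTS = {
    "unique(order_id)": """
        SELECT order_id FROM orders
        WHERE order_id IS NOT NULL
        GROUP BY order_id HAVING COUNT(*) > 1
    """,
    "not_null(order_id)": "SELECT * FROM orders WHERE order_id IS NULL",
    "revenue >= 0": "SELECT * FROM orders WHERE revenue < 0",
}

# Count the failing rows returned by each test query.
failures = {name: len(conn.execute(sql).fetchall()) for name, sql in TESTS.items()}

for name, count in failures.items():
    status = "PASS" if count == 0 else f"FAIL ({count} failing row(s))"
    print(f"{name}: {status}")
```

With this sample data the duplicate and null order_id values trip the first two checks, while the revenue check passes.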

Automate with CI/CD

Run your Great Expectations suite and dbt tests in your CI pipeline (GitHub Actions, GitLab CI, etc.) on every push and pull request. Fail the build if any critical test fails. This catches data issues before the change is merged, well before they reach production.
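
One way to wire this into any CI system is a small wrapper script whose exit code decides whether the build passes. The following is a minimal sketch under that assumption; run_checks is a hypothetical helper, and the demo uses stand-in commands so it runs anywhere.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command in order; return 0 if all succeed, 1 otherwise."""
    failed = []
    for cmd in commands:
        print("running:", " ".join(cmd))
        try:
            returncode = subprocess.run(cmd).returncode
        except FileNotFoundError:
            # Treat a missing executable as a failed check, not a crash.
            returncode = 127
        if returncode != 0:
            failed.append(" ".join(cmd))
    for cmd in failed:
        print("FAILED:", cmd)
    return 1 if failed else 0


# Demo with stand-in commands. In a real pipeline these might be
# ["dbt", "test"] plus a script of your own that runs the Great
# Expectations suite (hypothetical name: run_expectations.py).
demo_commands = [
    [sys.executable, "-c", "print('check one ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
]
exit_code = run_checks(demo_commands)
print("exit code:", exit_code)
```

In CI, the script would end by passing the return value of run_checks to sys.exit, so that a non-zero code fails the job. The sketch runs every command even after a failure, so a single build log shows all broken checks at once.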

For production pipelines, add a scheduled job that runs quality checks every hour and posts results to Slack or PagerDuty.
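
As a rough sketch of the Slack half of that job: Slack incoming webhooks accept a JSON body with a text field, so the scheduled run only needs to summarise its results and POST them. The function names, the result format (check name mapped to pass/fail), and the sample data are assumptions for illustration. PagerDuty uses a different API and is not shown.

```python
import json
import urllib.request


def build_slack_payload(results):
    """Turn {check_name: passed} into a Slack incoming-webhook message."""
    failed = sorted(name for name, passed in results.items() if not passed)
    if failed:
        summary = f":x: {len(failed)} of {len(results)} data quality checks failed: "
        text = summary + ", ".join(failed)
    else:
        text = f":white_check_mark: All {len(results)} data quality checks passed"
    return {"text": text}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload as JSON; returns the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical results from one hourly run; no network call is made here.
sample_results = {"orders.order_id not_null": True, "orders.revenue >= 0": False}
payload = build_slack_payload(sample_results)
print(json.dumps(payload))
```

A scheduler entry such as the cron expression 0 * * * * would run the checks at the top of every hour and then call post_to_slack with your own webhook URL, which should be read from a secret rather than hard-coded.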