
Step 05 of 9 · 3-4 weeks · advanced

Step 5: Data Quality Framework

Build the DQ framework before pipelines ship to production. It should cover schema, freshness, volume, nulls, duplicates, ranges, referential integrity, and business invariants.
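To make a few of these check types concrete, here is a minimal sketch in plain Python. It treats a table as a list of dicts so that it runs without a cluster; in PySpark the same predicates would typically become column expressions. The names `CheckResult`, `check_not_null`, `check_unique`, and `check_range`, and the sample `orders` data, are hypothetical and not part of any template on this page.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: int


def check_not_null(rows: list[Row], column: str) -> CheckResult:
    """Fail if any row has a missing or None value in `column`."""
    bad = sum(1 for r in rows if r.get(column) is None)
    return CheckResult(f"not_null:{column}", bad == 0, bad)


def check_unique(rows: list[Row], key: str) -> CheckResult:
    """Fail if any value of `key` appears more than once."""
    counts = Counter(r.get(key) for r in rows)
    bad = sum(n for n in counts.values() if n > 1)  # every row sharing a duplicated key
    return CheckResult(f"unique:{key}", bad == 0, bad)


def check_range(rows: list[Row], column: str, low: float, high: float) -> CheckResult:
    """Fail if a non-null value falls outside [low, high]."""
    bad = sum(
        1 for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    )
    return CheckResult(f"range:{column}", bad == 0, bad)


# Hypothetical sample data with one null, one duplicate key, one out-of-range value.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": None, "amount": 35.5},
    {"order_id": 2, "customer_id": "c3", "amount": -4.0},
]

results = [
    check_not_null(orders, "customer_id"),
    check_unique(orders, "order_id"),
    check_range(orders, "amount", 0.0, 10_000.0),
]
for result in results:
    print(result)
```

Returning a structured result rather than raising on the first failure lets a runner collect every violation in one pass and decide afterwards whether to fail, quarantine, or only alert.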

What you're doing in this step

Set up DQ test patterns, a DQ runner integrated into the pipeline framework, a quarantine pattern for bad data (route failing rows to a separate location so clean rows still flow, rather than always failing the pipeline), a DQ reporting dashboard, SLO definitions for critical tables, and alerting on violations. Pick tooling: either Great Expectations or PySpark-native assertions in the shared library.
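The quarantine pattern can be sketched as a row-level split. This is an illustration under assumptions, not the framework's actual design: the function `split_clean_and_quarantine`, the `_dq_failed_rules` column, and the `MAX_QUARANTINE_RATIO` threshold are all hypothetical names, and rows are plain dicts so the example runs anywhere.

```python
from typing import Any, Callable

Row = dict[str, Any]
# (rule name, predicate that returns True for a valid row)
Rule = tuple[str, Callable[[Row], bool]]

# Hypothetical threshold: above this share of bad rows, stop instead of quarantining.
MAX_QUARANTINE_RATIO = 0.5


def split_clean_and_quarantine(
    rows: list[Row], rules: list[Rule]
) -> tuple[list[Row], list[Row]]:
    """Route each row to `clean` or `quarantined`, recording which rules it failed."""
    clean: list[Row] = []
    quarantined: list[Row] = []
    for row in rows:
        failed = [name for name, is_valid in rules if not is_valid(row)]
        if failed:
            quarantined.append({**row, "_dq_failed_rules": failed})
        else:
            clean.append(row)
    return clean, quarantined


rules: list[Rule] = [
    ("customer_id_not_null", lambda r: r.get("customer_id") is not None),
    ("amount_non_negative", lambda r: r.get("amount") is not None and r["amount"] >= 0),
]

rows = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": None, "amount": -4.0},
    {"order_id": 3, "customer_id": "c3", "amount": 12.5},
]

clean, quarantined = split_clean_and_quarantine(rows, rules)
ratio = len(quarantined) / len(rows)

# Middle ground: ship clean rows, park bad rows, and only fail on a systemic problem.
if ratio > MAX_QUARANTINE_RATIO:
    raise RuntimeError(f"{ratio:.0%} of rows failed DQ; refusing to publish")

print(f"clean={len(clean)} quarantined={len(quarantined)}")
```

Keeping the failed rule names on each quarantined row makes the quarantine table usable for both reprocessing and the reporting dashboard.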

Recommended prompts

Use one of these to do the work in your IDE

Open the template to read it in full. Click Copy prompt to grab it (with your stack values pre-filled where they apply) — then paste into Claude Code, Cursor, or wherever you build.

Primary recommendation · 1-2 days for initial setup

Data Quality Test Suite

Generate comprehensive data quality tests for ETL pipelines: schema validation, freshness checks, null/duplicate/range checks, and business invariants.

azure · pyspark · fabric

View template

Template · 1-2 days

Behavior Parity Test Suite

Generate tests that lock down current legacy behavior so the new system doesn't accidentally change it during migration.

Use this when: DQ also has to prove parity with a legacy system being replaced

View template

Recommended skills

Drop these into Claude Code for this phase

Skills auto-trigger on the right kind of request. Install once; they apply to every prompt that fits.

Skill · 5 min setup

Data Validation Skill

Claude Code skill that compares old and new system outputs for parity — running validation queries on both DBs and reporting drift.

claude-code · sql

View skill

Skill · 5 min setup

Test Generator Skill

Claude Code skill that picks the right test type (unit/integration/E2E) based on context and applies Evoke's testing patterns automatically.

claude-code

View skill

Recommended MCP configs

Wire these tools into Claude Code first

MCP servers give Claude Code direct access to external systems (Jira, browsers, databases). Configure once.

MCP config · 10 min setup

Azure DevOps MCP for Evoke

Pre-configured Azure DevOps MCP server for Claude Code — work items, repos, PRs, and pipelines from chat.

claude-code · mcp

View config

MCP config · 5 min setup

Filesystem MCP for Evoke

Pre-configured filesystem MCP server for Claude Code — safe, scoped read/write access to project files.

claude-code · mcp

View config

When you're done

Verify these in your own work before moving on

This is a checklist for you to mentally tick off in your repo and IDE — the site doesn't track it, you do.

DQ framework deployed and tested
Standard test patterns documented
Quarantine pattern working
DQ reporting dashboard created
SLOs defined for the first set of tables
Alerting integrated
DQ runs as part of every pipeline (not separate)
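For the SLO and alerting items in the checklist above, one possible shape is a small declarative SLO record per table plus an evaluation step that hands violations to an alert hook. Everything here is an assumption for illustration: the `TableSLO` fields, the thresholds, the table name `sales.orders`, and `send_alert`, which stands in for whatever channel the team actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TableSLO:
    table: str
    max_staleness: timedelta  # freshness: the latest load may be at most this old
    min_row_count: int        # volume: fewer rows than this counts as a violation


def evaluate_slo(
    slo: TableSLO, last_loaded_at: datetime, row_count: int, now: datetime
) -> list[str]:
    """Return a human-readable message per violated SLO term (empty list if healthy)."""
    violations: list[str] = []
    age = now - last_loaded_at
    if age > slo.max_staleness:
        violations.append(
            f"{slo.table}: data is {age} old, SLO allows {slo.max_staleness}"
        )
    if row_count < slo.min_row_count:
        violations.append(
            f"{slo.table}: {row_count} rows loaded, SLO expects at least {slo.min_row_count}"
        )
    return violations


def send_alert(message: str) -> None:
    """Placeholder alert hook; a real one would post to the team's paging or chat tool."""
    print(f"[DQ ALERT] {message}")


# Hypothetical SLO and observations, with a fixed `now` so the example is deterministic.
slo = TableSLO("sales.orders", max_staleness=timedelta(hours=6), min_row_count=100)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_loaded_at = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)

violations = evaluate_slo(slo, last_loaded_at, row_count=50, now=now)
for message in violations:
    send_alert(message)
```

Passing `now` in as a parameter, rather than calling the clock inside the function, keeps the freshness check easy to unit test.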

Common pitfalls

What goes wrong at this step

Skipping DQ until production: issues are then discovered in production. Build DQ in from day 1
DQ tests that always pass: broken tests give false confidence
No quarantine pattern: the pipeline either fails or ships bad data, with no middle ground
DQ separate from pipelines: rules drift out of sync and aren't applied consistently
No alerting on violations: silent failures are the worst kind
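One way to guard against DQ tests that always pass is to run every check against a known-bad fixture and require it to fail. The sketch below assumes checks are plain functions returning True or False; `check_amount_non_negative`, `broken_check`, and `check_is_trustworthy` are hypothetical names, and the misspelled column in `broken_check` is deliberate.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], bool]


def check_amount_non_negative(rows: list[Row]) -> bool:
    """True when every amount is present and >= 0."""
    return all(r.get("amount") is not None and r["amount"] >= 0 for r in rows)


def broken_check(rows: list[Row]) -> bool:
    """Deliberately buggy: reads a misspelled column, so it never sees a bad value."""
    return all(r.get("amuont", 0) >= 0 for r in rows)


# Small fixtures: one dataset that must pass, one that must fail.
GOOD_ROWS = [{"amount": 10.0}]
BAD_ROWS = [{"amount": -1.0}]


def check_is_trustworthy(check: Check) -> bool:
    """A check earns trust only if it passes good data AND fails known-bad data."""
    return check(GOOD_ROWS) is True and check(BAD_ROWS) is False


for check in (check_amount_non_negative, broken_check):
    verdict = "trustworthy" if check_is_trustworthy(check) else "ALWAYS PASSES"
    print(f"{check.__name__}: {verdict}")
```

Running this kind of meta-test in CI alongside the checks themselves means a silently broken rule shows up as a failed build rather than as false confidence.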
