Playbook
Step 05 of 9 · 3-4 weeks · advanced

Step 5: Data Quality Framework

Build the data quality (DQ) framework before pipelines ship to production. It should cover schema, freshness, volume, nulls, duplicates, ranges, referential integrity, and business invariants.
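As a rough illustration of three of these check types (nulls, duplicates, ranges), here is a minimal sketch in plain Python. The `CheckResult` shape, the function names, and the `orders` sample rows are all hypothetical; in a PySpark codebase the same logic would more likely be expressed as DataFrame filters and counts.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: int


def check_not_null(rows, column):
    # Count rows where the column is missing or None.
    bad = sum(1 for r in rows if r.get(column) is None)
    return CheckResult(f"not_null:{column}", bad == 0, bad)


def check_unique(rows, key):
    # Count every repeat of a key value after its first occurrence.
    seen, dupes = set(), 0
    for r in rows:
        value = r.get(key)
        if value in seen:
            dupes += 1
        seen.add(value)
    return CheckResult(f"unique:{key}", dupes == 0, dupes)


def check_range(rows, column, low, high):
    # Nulls are left to the not-null check; only present values are ranged.
    bad = sum(
        1 for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    )
    return CheckResult(f"range:{column}", bad == 0, bad)


# Hypothetical sample data with one null, one duplicate key, one negative amount.
orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": -5.0},
]
results = [
    check_not_null(orders, "amount"),
    check_unique(orders, "order_id"),
    check_range(orders, "amount", 0, 10_000),
]
```

Returning a structured result instead of raising immediately lets a runner decide per check whether to fail, quarantine, or only report.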

What you're doing in this step

Set up DQ test patterns, a DQ runner integrated into the pipeline framework, a quarantine pattern for bad data (separate clean rows from bad ones instead of always failing the pipeline), a DQ reporting dashboard, SLO definitions for critical tables, and alerting on violations. Pick the tooling: either Great Expectations or PySpark-native assertions in the shared library.
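The quarantine pattern can be sketched as a split step: rows that break a rule are routed aside with the reasons attached, and clean rows continue. This is one possible shape under assumed names (`split_clean_and_quarantine`, `enforce_quarantine_budget`, the `_dq_reasons` column, and the rules themselves are hypothetical), written in plain Python instead of PySpark so it runs on its own.

```python
def split_clean_and_quarantine(rows, rules):
    """Route each row to clean or quarantine, recording which rules it broke."""
    clean, quarantined = [], []
    for row in rows:
        reasons = [name for name, predicate in rules.items() if not predicate(row)]
        if reasons:
            quarantined.append({**row, "_dq_reasons": reasons})
        else:
            clean.append(row)
    return clean, quarantined


def enforce_quarantine_budget(total, quarantined_count, max_ratio):
    """Fail the run only when too large a share of the batch is bad."""
    if total and quarantined_count / total > max_ratio:
        raise RuntimeError(
            f"{quarantined_count}/{total} rows quarantined, above budget {max_ratio:.0%}"
        )


# Hypothetical row-level rules.
RULES = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}

batch = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": "c3", "amount": -1.0},
]
clean, quarantined = split_clean_and_quarantine(batch, RULES)
enforce_quarantine_budget(len(batch), len(quarantined), max_ratio=0.8)
```

The budget check is an optional addition: it keeps the middle ground for small amounts of bad data while still stopping a run where most of the batch is bad.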

Recommended prompts

Use one of these to do the work in your IDE

Open the template to read it in full. Click Copy prompt to grab it (with your stack values pre-filled where they apply) — then paste into Claude Code, Cursor, or wherever you build.

Primary recommendation · 1-2 days for initial setup

Data Quality Test Suite

Generate comprehensive data quality tests for ETL pipelines: schema validation, freshness checks, null/duplicate/range checks, and business invariants.

azure · pyspark · fabric
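To make the schema and freshness checks concrete, here is a small sketch. It is not the output of the template; the expected schema, the column names, and the 24-hour threshold are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "updated_at": datetime}


def schema_violations(rows, expected):
    """Return (row_index, column, problem) for missing, extra, or mistyped columns."""
    problems = []
    for i, row in enumerate(rows):
        for col in sorted(expected.keys() - row.keys()):
            problems.append((i, col, "missing"))
        for col in sorted(row.keys() - expected.keys()):
            problems.append((i, col, "unexpected"))
        for col, typ in expected.items():
            value = row.get(col)
            if value is not None and not isinstance(value, typ):
                problems.append((i, col, f"expected {typ.__name__}"))
    return problems


def is_fresh(rows, ts_column, max_age, now):
    """A table counts as fresh if its newest timestamp is within max_age of now."""
    timestamps = [r[ts_column] for r in rows if r.get(ts_column) is not None]
    return bool(timestamps) and now - max(timestamps) <= max_age


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "amount": 9.5, "updated_at": now - timedelta(hours=2)},
    {"order_id": "2", "amount": 3.0, "updated_at": now - timedelta(hours=30)},
]
problems = schema_violations(rows, EXPECTED_SCHEMA)
fresh = is_fresh(rows, "updated_at", timedelta(hours=24), now)
```

Passing `now` in explicitly, instead of reading the clock inside the check, keeps the freshness test deterministic.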
Template · 1-2 days

Behavior Parity Test Suite

Generate tests that lock down current legacy behavior so the new system doesn't accidentally change it during migration.

Use this when: DQ also has to prove parity with a legacy system being replaced
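A behavior parity test can be sketched as running the legacy and the new implementation over the same fixed cases and reporting every difference. Both functions below are hypothetical stand-ins, including the legacy quirk they model; the template's actual output may look quite different.

```python
def legacy_net_amount(gross, discount_pct):
    # Stand-in for legacy behavior: caps discounts at 100% but allows
    # negative discounts, which act as a surcharge.
    pct = min(discount_pct, 100)
    return round(gross * (100 - pct) / 100, 2)


def new_net_amount(gross, discount_pct):
    # Stand-in for the rewrite: also clamps negative discounts to 0,
    # which silently changes behavior relative to the legacy rule.
    pct = min(max(discount_pct, 0), 100)
    return round(gross * (100 - pct) / 100, 2)


def parity_mismatches(legacy_fn, new_fn, cases):
    """Return (case, legacy_result, new_result) for every case that differs."""
    mismatches = []
    for case in cases:
        old, new = legacy_fn(*case), new_fn(*case)
        if old != new:
            mismatches.append((case, old, new))
    return mismatches


# Cases should include the odd inputs the legacy system actually sees.
CASES = [(100.0, 25), (100.0, 150), (100.0, -10)]
mismatches = parity_mismatches(legacy_net_amount, new_net_amount, CASES)
```

Here the negative-discount case exposes a difference. Whether to keep or change that behavior is then an explicit decision, not an accident of the migration.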

Recommended skills

Drop these into Claude Code for this phase

Skills auto-trigger on the right kind of request. Install once; they apply to every prompt that fits.

Skill · 5 min setup

Data Validation Skill

Claude Code skill that compares old and new system outputs for parity — running validation queries on both DBs and reporting drift.

claude-code · sql
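The kind of comparison this skill describes, running the same validation query on both databases and reporting drift, might look roughly like the sketch below. It is an illustration only, not the skill's implementation: it uses in-memory SQLite databases as stand-ins for the old and new systems, and the `orders` table, the query, and the metric labels are assumptions.

```python
import sqlite3

# Hypothetical validation query: one row of aggregate metrics.
VALIDATION_QUERY = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders"


def make_db(rows):
    """Build an in-memory stand-in database with a small orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def drift_report(old_conn, new_conn, query, labels):
    """Run the same query on both databases and keep only metrics that differ."""
    old = old_conn.execute(query).fetchone()
    new = new_conn.execute(query).fetchone()
    return {label: (o, n) for label, o, n in zip(labels, old, new) if o != n}


old_db = make_db([(1, 10.0), (2, 20.0), (3, 5.0)])
new_db = make_db([(1, 10.0), (2, 20.0)])  # one row lost in migration
report = drift_report(old_db, new_db, VALIDATION_QUERY, ["row_count", "amount_sum"])
```

Aggregate comparisons like this are cheap to run on large tables. When they show drift, a keyed row-level diff is the usual next step to find which rows differ.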
Skill · 5 min setup

Test Generator Skill

Claude Code skill that picks the right test type (unit/integration/E2E) based on context and applies Evoke's testing patterns automatically.

claude-code
Recommended MCP configs

Wire these tools into Claude Code first

MCP servers give Claude Code direct access to external systems (Jira, browsers, databases). Configure once.

MCP config · 10 min setup

Azure DevOps MCP for Evoke

Pre-configured Azure DevOps MCP server for Claude Code — work items, repos, PRs, and pipelines from chat.

claude-code · mcp
MCP config · 5 min setup

Filesystem MCP for Evoke

Pre-configured filesystem MCP server for Claude Code — safe, scoped read/write access to project files.

claude-code · mcp
When you're done

Verify these in your own work before moving on

This is a checklist for you to mentally tick off in your repo and IDE — the site doesn't track it, you do.

  • DQ framework deployed and tested
  • Standard test patterns documented
  • Quarantine pattern working
  • DQ reporting dashboard created
  • SLOs defined for the first set of tables
  • Alerting integrated
  • DQ runs as part of every pipeline (not separate)
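For the SLO, alerting, and "runs as part of every pipeline" items above, one possible shape is an SLO record per table plus a step that evaluates it inside the pipeline run and sends an alert for each violation. The `TableSLO` fields, the thresholds, the `sales.orders` table name, and the `alert` callback are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSLO:
    table: str
    max_staleness_hours: float
    min_row_count: int
    max_null_ratio: float


def slo_violations(slo, observed):
    """Compare observed metrics for one table against its SLO."""
    violations = []
    if observed["staleness_hours"] > slo.max_staleness_hours:
        violations.append(
            f"staleness {observed['staleness_hours']}h exceeds {slo.max_staleness_hours}h"
        )
    if observed["row_count"] < slo.min_row_count:
        violations.append(
            f"row count {observed['row_count']} below {slo.min_row_count}"
        )
    if observed["null_ratio"] > slo.max_null_ratio:
        violations.append(
            f"null ratio {observed['null_ratio']} exceeds {slo.max_null_ratio}"
        )
    return violations


def run_dq_step(slo, observed, alert):
    """DQ step called from inside the pipeline: alert on each violation."""
    violations = slo_violations(slo, observed)
    for violation in violations:
        alert(f"[{slo.table}] {violation}")
    return not violations


sent = []  # stand-in for a real alert channel
slo = TableSLO("sales.orders", max_staleness_hours=24.0, min_row_count=1000, max_null_ratio=0.01)
observed = {"staleness_hours": 30.0, "row_count": 1200, "null_ratio": 0.001}
ok = run_dq_step(slo, observed, alert=sent.append)
```

Injecting `alert` as a callable keeps the step testable and lets the same code send to whatever channel the team uses.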
Common pitfalls

What goes wrong at this step

  • Skipping DQ until production: issues then surface in production. Build DQ in from day 1.
  • DQ tests that always pass: broken tests give false confidence.
  • No quarantine pattern: the pipeline either fails or ships bad data, with no middle ground.
  • DQ kept separate from pipelines: rules drift out of sync and are not applied consistently.
  • No alerting on violations: silent failures are the worst kind.
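One way to guard against DQ tests that always pass is to run each check against a fixture that is known to be bad and confirm that it fails. The sketch below assumes hypothetical check functions; the "broken" variant shows a common bug where bad rows are filtered out before the assertion.

```python
def check_no_negative_amounts(rows):
    """Correct check: fails if any row has a negative amount."""
    return all(r["amount"] >= 0 for r in rows)


def broken_check_no_negative_amounts(rows):
    # Bug: bad rows are filtered out before the assertion, so it can never fail.
    valid = [r for r in rows if r["amount"] >= 0]
    return all(r["amount"] >= 0 for r in valid)


def check_can_fail(check, known_bad_rows):
    """A check is only trustworthy if it rejects data known to be bad."""
    return check(known_bad_rows) is False


# Hypothetical fixture containing one deliberately invalid row.
KNOWN_BAD = [{"amount": 10.0}, {"amount": -3.0}]

trustworthy = check_can_fail(check_no_negative_amounts, KNOWN_BAD)
always_passes = not check_can_fail(broken_check_no_negative_amounts, KNOWN_BAD)
```

Keeping one known-bad fixture per check in the test suite makes a check that can never fail show up in CI instead of in production.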