Playbook
Step 04 of 9 · 3-4 weeks · advanced

Step 4: PySpark Transformation Framework

Establish PySpark standards before writing many notebooks: a shared utility library, Bronze / Silver / Gold templates, performance patterns, and a testing approach.

What you're doing in this step

Build a shared utility library (wheel package or %run notebooks) with reusable functions. Create Bronze ingestion, Silver transformation (cleansing + MERGE upsert), and Gold dimensional notebook templates. Document performance patterns (partitioning, broadcast joins, caching). Establish a testing approach. Create a code-review checklist for PySpark notebooks.
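As an illustration of what one function in the shared library might look like, here is a minimal sketch of a Silver MERGE upsert helper. The function names (`build_merge_condition`, `upsert_to_silver`), the `t`/`s` aliases, and the assumption that the target Delta table already exists at `target_path` are hypothetical choices for this example, not part of the templates above.

```python
from typing import Sequence


def build_merge_condition(keys: Sequence[str], target: str = "t", source: str = "s") -> str:
    """Build the ON clause for a Delta MERGE from business-key column names."""
    if not keys:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def upsert_to_silver(spark, updates_df, target_path: str, keys: Sequence[str]) -> None:
    """Upsert updates_df into the existing Delta table at target_path, matching on keys."""
    # Imported lazily so the pure helper above can be unit-tested without Delta installed.
    from delta.tables import DeltaTable

    # MERGE fails when several source rows match the same target row, so collapse
    # duplicates on the key first. dropDuplicates keeps an arbitrary row per key;
    # a real pipeline would usually keep the latest row by a timestamp column.
    deduped = updates_df.dropDuplicates(list(keys))

    target = DeltaTable.forPath(spark, target_path)
    (
        target.alias("t")
        .merge(deduped.alias("s"), build_merge_condition(keys))
        .whenMatchedUpdateAll()      # existing keys: overwrite with the new values
        .whenNotMatchedInsertAll()   # new keys: insert
        .execute()
    )
```

Because rows are matched on the business key, re-running the same batch updates the same rows instead of appending them again, which is the idempotency property the Silver template is meant to guarantee.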

Recommended prompts

Use one of these to do the work in your IDE

Open the template to read it in full. Click Copy prompt to grab it (with your stack values pre-filled where they apply) — then paste into Claude Code, Cursor, or wherever you build.

Primary recommendation · reference document · 1 day for initial setup

PySpark Transformation Standards

Standards for PySpark transformations in Bronze/Silver/Gold pipelines: idempotency, partitioning, Delta Lake patterns, and code organization.

azure · pyspark · fabric · databricks
Template · 1-2 days per dimension

Slowly Changing Dimensions (SCD) Implementation in PySpark

Implement SCD Type 1, 2, 3, and 6 patterns in PySpark with Delta Lake MERGE — for dimensional modeling in modern data platforms.

Use this when: Building the dimension-table side of the framework with proper SCD handling

azure · pyspark · delta-lake · fabric
Recommended skills

Drop these into Claude Code for this phase

Skills auto-trigger on the right kind of request. Install once; they apply to every prompt that fits.

Skill · 5 min setup

Spec-Driven Builder Skill

Tool-neutral skill that walks developers through PRD → stories → schema → API → tests for any new feature, producing real artifacts at each step. The methodology is identical on every supported tool.

claude-code · copilot · cursor
Skill · 5 min setup

Test Generator Skill

Claude Code skill that picks the right test type (unit/integration/E2E) based on context and applies Evoke's testing patterns automatically.

claude-code
Skill · 5 min setup

Code Reviewer Skill

Claude Code skill that performs comprehensive code review on PRs and diffs, prioritized by severity with concrete fixes.

claude-code
Recommended MCP configs

Wire these tools into Claude Code first

MCP servers give Claude Code direct access to external systems (Jira, browsers, databases). Configure once.

MCP config · 10 min setup

Azure DevOps MCP for Evoke

Pre-configured Azure DevOps MCP server for Claude Code — work items, repos, PRs, and pipelines from chat.

claude-code · mcp
MCP config · 5 min setup

Filesystem MCP for Evoke

Pre-configured filesystem MCP server for Claude Code — safe, scoped read/write access to project files.

claude-code · mcp
When you're done

Verify these in your own work before moving on

This is a checklist for you to mentally tick off in your repo and IDE — the site doesn't track it, you do.

  • Shared utility library deployed and tested
  • Standard notebook templates created (Bronze / Silver / Gold)
  • Bronze, Silver, Gold patterns proven on a real entity
  • Performance patterns documented with examples
  • Testing approach proven
  • Code-review checklist for PySpark notebooks created
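One way to make "testing approach proven" concrete is to keep the rule logic in plain Python functions and wrap them in a thin DataFrame adapter, so most tests run without a cluster. The sketch below assumes a hypothetical column-naming rule (snake_case headers); `standardize_column_name` and `standardize_columns` are illustrative names, not functions from the templates.

```python
import re


def standardize_column_name(name: str) -> str:
    """Convert a raw source header such as ' Order ID ' to snake_case ('order_id')."""
    # Replace every run of non-alphanumeric characters with a single underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def standardize_columns(df):
    """Return df with every column renamed via standardize_column_name.

    Relies only on df.columns and df.toDF(*names), both of which PySpark
    DataFrames provide, so it can also be exercised with a small stub.
    """
    return df.toDF(*[standardize_column_name(c) for c in df.columns])


def test_standardize_column_name():
    # Pure-Python unit test: no SparkSession required.
    assert standardize_column_name(" Order ID ") == "order_id"
    assert standardize_column_name("Unit-Price ($)") == "unit_price"
```

The pure function can run under pytest in CI on every commit; a smaller set of integration tests against a local SparkSession would then cover `standardize_columns` and the MERGE paths.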
Common pitfalls

What goes wrong at this step

  • Treating notebooks as scripts — no structure, no docs, no error handling
  • Copy-paste boilerplate everywhere — build a shared library
  • Spark for small data (<1GB) — use SQL / pandas / DuckDB instead
  • No idempotency — re-runs create duplicates
  • inferSchema in production — slow, unpredictable
  • .collect() on large data — OOM disasters
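The last two pitfalls can be sketched in a few lines: declare the schema up front instead of using inferSchema, and bound anything that is pulled back to the driver. The `orders` columns, the DDL string, and the function names are hypothetical and only serve the example.

```python
# Hypothetical schema for an "orders" feed, written as a Spark DDL string.
# DataFrameReader.schema accepts either a StructType or a DDL string like this.
ORDERS_SCHEMA_DDL = (
    "order_id BIGINT, customer_id BIGINT, amount DECIMAL(18,2), order_ts TIMESTAMP"
)


def read_orders_csv(spark, path: str):
    """Read a CSV with a declared schema instead of inferSchema."""
    return (
        spark.read
        .schema(ORDERS_SCHEMA_DDL)
        .option("header", "true")
        .option("mode", "FAILFAST")  # raise on malformed rows rather than nulling them
        .csv(path)
    )


def preview_rows(df, n: int = 20):
    """Bring at most n rows to the driver; avoids an unbounded collect()."""
    return df.limit(n).collect()
```

A declared schema skips the extra pass over the data that inference needs and keeps column types stable between runs; `limit(n)` before `collect()` caps how much data can reach driver memory.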