Step 04 of 9 · 3-4 weeks · advanced

Step 4: PySpark Transformation Framework

Establish PySpark standards before writing many notebooks: a shared utility library, Bronze / Silver / Gold templates, performance patterns, and a testing approach.

What you're doing in this step

Build a shared utility library (wheel package or %run notebooks) with reusable functions. Create Bronze ingestion, Silver transformation (cleansing + MERGE upsert), and Gold dimensional notebook templates. Document performance patterns (partitioning, broadcast joins, caching). Establish a testing approach. Create a code-review checklist for PySpark notebooks.
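As one illustration of what a shared utility might hold, here is a minimal sketch of a Silver upsert helper. The function names (`build_merge_sql`, `upsert_to_silver`), the temp view name `silver_updates`, and the table and key names in the usage note are hypothetical. The sketch assumes the target is a Delta table and that the source and target share the same columns, since it relies on Delta Lake's `UPDATE SET *` / `INSERT *` shorthand.

```python
from typing import Sequence


def build_merge_sql(target: str, source_view: str, keys: Sequence[str]) -> str:
    """Return a Delta Lake MERGE statement that upserts source_view into target."""
    if not keys:
        raise ValueError("at least one business key is required")
    # Join target (t) and source (s) on every business key.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert_to_silver(spark, df, target: str, keys: Sequence[str]) -> None:
    """Deduplicate df on its keys, then MERGE it into target (needs Delta Lake)."""
    # MERGE fails when several source rows match one target row, so dedupe first.
    # dropDuplicates keeps an arbitrary row per key; a real pipeline would
    # usually pick the latest row explicitly.
    deduped = df.dropDuplicates(list(keys))
    deduped.createOrReplaceTempView("silver_updates")  # hypothetical view name
    spark.sql(build_merge_sql(target, "silver_updates", keys))
```

A notebook would call something like `upsert_to_silver(spark, cleansed_df, "silver.customer", ["customer_id"])`. Because the write is a keyed MERGE rather than an append, re-running the same batch updates existing rows instead of duplicating them.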

Recommended prompts

Use one of these to do the work in your IDE

Open the template to read it in full. Click Copy prompt to grab it (with your stack values pre-filled where they apply) — then paste into Claude Code, Cursor, or wherever you build.

Primary recommendation · Reference document · 1 day for initial setup

PySpark Transformation Standards

Standards for PySpark transformations in Bronze/Silver/Gold pipelines: idempotency, partitioning, Delta Lake patterns, and code organization.

azure · pyspark · fabric · databricks

Template · 1-2 days per dimension

Slowly Changing Dimensions (SCD) Implementation in PySpark

Implement SCD Type 1, 2, 3, and 6 patterns in PySpark with Delta Lake MERGE — for dimensional modeling in modern data platforms.

Use this when: Building the dimension-table side of the framework with proper SCD handling

azure · pyspark · delta-lake · fabric
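To make the SCD Type 2 behavior concrete, the following is a plain-Python model of the row-level logic, not the Delta Lake MERGE implementation the template describes. The function name `apply_scd2` and the column names `valid_from`, `valid_to`, and `is_current` are assumptions chosen for illustration.

```python
from datetime import date
from typing import Dict, List, Optional

Row = Dict[str, object]


def apply_scd2(dimension: List[Row], incoming: List[Row], key: str,
               tracked: List[str], as_of: date) -> List[Row]:
    """Return a new dimension with SCD Type 2 history applied."""
    result = [dict(r) for r in dimension]  # copy so the input is not mutated
    current = {r[key]: r for r in result if r["is_current"]}
    for new in incoming:
        old: Optional[Row] = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # tracked attributes unchanged: keep the current version
        if old is not None:
            # Close the old version instead of overwriting it (this is what
            # separates Type 2 from Type 1).
            old["valid_to"] = as_of
            old["is_current"] = False
        version: Row = {key: new[key], **{c: new[c] for c in tracked},
                        "valid_from": as_of, "valid_to": None,
                        "is_current": True}
        result.append(version)
        current[new[key]] = version
    return result
```

In a Delta Lake implementation the same two actions (close the current row, insert the new version) would typically be expressed as one MERGE. The model above is mainly useful for agreeing on expected behavior and for generating test cases, including the re-run case where an unchanged row must not create a new version.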

Recommended skills

Drop these into Claude Code for this phase

Skills auto-trigger on the right kind of request. Install once; they apply to every prompt that fits.

Skill · 5 min setup

Spec-Driven Builder Skill

Tool-neutral skill that walks developers through PRD → stories → schema → API → tests for any new feature, producing real artifacts at each step. The methodology is identical on every supported tool.

claude-code · copilot · cursor

Skill · 5 min setup

Test Generator Skill

Claude Code skill that picks the right test type (unit/integration/E2E) based on context and applies Evoke's testing patterns automatically.

claude-code

Skill · 5 min setup

Code Reviewer Skill

Claude Code skill that performs comprehensive code review on PRs and diffs, prioritized by severity with concrete fixes.

claude-code

Recommended MCP configs

Wire these tools into Claude Code first

MCP servers give Claude Code direct access to external systems (Jira, browsers, databases). Configure once.

MCP config · 10 min setup

Azure DevOps MCP for Evoke

Pre-configured Azure DevOps MCP server for Claude Code — work items, repos, PRs, and pipelines from chat.

claude-code · mcp

MCP config · 5 min setup

Filesystem MCP for Evoke

Pre-configured filesystem MCP server for Claude Code — safe, scoped read/write access to project files.

claude-code · mcp

When you're done

Verify these in your own work before moving on

This is a checklist for you to mentally tick off in your repo and IDE — the site doesn't track it, you do.

Shared utility library deployed and tested
Standard notebook templates created (Bronze / Silver / Gold)
Bronze, Silver, Gold patterns proven on a real entity
Performance patterns documented with examples
Testing approach proven
Code-review checklist for PySpark notebooks created
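One way to prove the testing approach is to keep logic in small functions that can be tested without a cluster, and wrap them in thin DataFrame helpers. The sketch below is a hypothetical example of that split: `to_snake_case` is pure Python, `standardize_columns` is a thin wrapper over the standard `DataFrame.toDF` call, and the test is written in pytest style. None of these names come from the templates above.

```python
import re


def to_snake_case(name: str) -> str:
    """Normalize a raw column name to snake_case."""
    # Collapse any run of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Put an underscore at lower-to-upper boundaries (CustomerID -> Customer_ID).
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.strip("_").lower()


def standardize_columns(df):
    """Rename every column of a Spark DataFrame to snake_case."""
    return df.toDF(*[to_snake_case(c) for c in df.columns])


def test_to_snake_case():
    # Runs in plain pytest; no SparkSession is needed for the pure logic.
    assert to_snake_case("CustomerID") == "customer_id"
    assert to_snake_case("Order Date") == "order_date"
    assert to_snake_case(" total-Amount (USD) ") == "total_amount_usd"
```

Functions that genuinely need Spark, such as `standardize_columns`, can then be covered by a smaller set of integration tests against a local SparkSession and a few hand-built rows.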

Common pitfalls

What goes wrong at this step

Treating notebooks as scripts — no structure, no docs, no error handling
Copy-paste boilerplate everywhere — build a shared library
Spark for small data (<1GB) — use SQL / pandas / DuckDB instead
No idempotency — re-runs create duplicates
inferSchema in production — slow, unpredictable
.collect() on large data — OOM disasters
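Two of the pitfalls above have small, mechanical remedies, sketched here under assumptions: the `ORDERS_SCHEMA` columns, the helper names, and the CSV source are hypothetical. The first helper passes an explicit DDL schema to the reader instead of `inferSchema`. The second bounds the number of rows before anything is pulled to the driver.

```python
from typing import Mapping


def to_ddl(columns: Mapping[str, str]) -> str:
    """Build a Spark DDL schema string from a name -> type mapping."""
    return ", ".join(f"`{name}` {dtype}" for name, dtype in columns.items())


# Hypothetical schema, kept in code so that changes are reviewed like any diff.
ORDERS_SCHEMA = {
    "order_id": "BIGINT",
    "customer_id": "BIGINT",
    "order_ts": "TIMESTAMP",
    "amount": "DECIMAL(18,2)",
}


def read_orders(spark, path: str):
    """Read a CSV with an explicit schema instead of inferSchema."""
    return (
        spark.read.option("header", "true")
        .option("mode", "FAILFAST")  # raise on malformed rows instead of nulling them
        .schema(to_ddl(ORDERS_SCHEMA))
        .csv(path)
    )


def preview(df, n: int = 20):
    """Bring at most n rows to the driver; never collect an unbounded DataFrame."""
    return df.limit(n).collect()
```

An explicit schema avoids the extra pass over the data that inference requires and keeps column types stable between runs. `FAILFAST` is one possible choice here; a pipeline that quarantines bad rows would use a different mode.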
