@przadka
przadka / 1-save-for-later.md
Created March 5, 2026 11:54
Claude Code slash commands for saving and resuming work sessions via dev logs
description: Save session context to dev log for later resumption. Use when stopping work mid-task.
argument-hint: [additional_notes]

Save current session to dev_docs/dev-log-<branch>.md (branch name sanitized, e.g., feature/auth → feature-auth).
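
A minimal sketch of the branch-to-filename mapping described above. The exact sanitization rule is not given in the source; this version assumes any run of characters outside letters, digits, `.`, `_`, and `-` collapses to a single hyphen, and `dev_log_path` is a hypothetical helper name.

```python
import re


def dev_log_path(branch: str) -> str:
    """Map a git branch name to its dev log path (hypothetical helper)."""
    # Assumption: runs of characters other than letters, digits, '.', '_'
    # or '-' are replaced by one '-', and stray edge hyphens are trimmed.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
    return f"dev_docs/dev-log-{safe}.md"


print(dev_log_path("feature/auth"))  # dev_docs/dev-log-feature-auth.md
```

In a real command the branch name would probably come from `git rev-parse --abbrev-ref HEAD`; it is passed in here so the function stays easy to test.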

Working State (overwrite this section each time)

Update the ## Working State section at the top of the file. If the file doesn't exist, create it. This section is always overwritten — it's the fast-orientation block.
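
The overwrite-or-create behaviour could look roughly like the sketch below. It assumes the section runs from the `## Working State` heading to the next level-2 heading (or the end of the file), and that everything after it is left untouched. `replace_working_state` and `save_working_state` are hypothetical names, not part of the slash command itself.

```python
import re
from pathlib import Path

HEADING = "## Working State"


def replace_working_state(log_text: str, new_body: str) -> str:
    """Return log_text with the Working State section replaced by new_body."""
    section = f"{HEADING}\n\n{new_body.strip()}\n\n"
    # Match the heading line plus everything up to the next '## ' heading
    # (or end of text). Deeper headings such as '### ' stay in the section.
    pattern = re.compile(
        rf"^{re.escape(HEADING)}[ \t]*\n.*?(?=^## |\Z)",
        re.DOTALL | re.MULTILINE,
    )
    if pattern.search(log_text):
        return pattern.sub(lambda _match: section, log_text, count=1)
    # No section yet: prepend it so it stays at the top of the file.
    return section + log_text


def save_working_state(path: Path, new_body: str) -> None:
    """Overwrite the section in the file at path, creating the file if needed."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(replace_working_state(existing, new_body), encoding="utf-8")
```

Keeping the text transformation separate from the file I/O makes the overwrite rule testable without touching disk.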

przadka / ask-claude.md
Created March 5, 2026 11:20
Claude Code slash commands for getting independent code reviews from Codex CLI and Claude CLI
description: Get independent review from Claude CLI
argument-hint: what to review (e.g., 'last commit', 'staged changes', 'src/parser.ts')
allowed-tools: Bash(claude*), Bash(git*), Bash(rm*), Bash(cat*), Read, Write

Claude Review

Get an independent review from another Claude instance via CLI.

przadka / consultant-profile.md
Last active March 4, 2026 16:16
Consultant profile — Michał Prządka (RFC)

Consultant Profile — Michał Prządka

Internal reference for proposal writing. Not for client distribution.

The short version

A former board member at a global corporation who now writes Python and reviews PRs at 6am. 20+ years across business, technology, and data — from quantitative analysis through engineering leadership through C-suite, now building AI systems hands-on.

Why clients hire him

przadka / REPORT.md
Last active February 6, 2026 05:41
Cheddar Bench Results: AI Coding Agents Bug Detection Benchmark

Cheddar Bench Results: How Good Are AI Coding Agents at Finding Bugs?

We tested three AI coding agents on their ability to find bugs through code review. The benchmark uses a self-play approach: agents inject bugs into codebases, then other agents try to find them blind. No human labeling required.

Bottom line: Claude found 65.5% of bugs, Codex found 34.7%, Gemini found 23.2%. About a quarter of all bugs went undetected by everyone.
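
To make the two headline metrics concrete, here is a small sketch of how a per-reviewer detection rate and the "missed by everyone" share could be computed once the injected bugs are known. The bug IDs and reviewer names are toy placeholders, not benchmark data, and `detection_rates` is a hypothetical helper. The benchmark's actual scoring (for example, how a reviewer's finding is matched to an injected bug) is not described here.

```python
def detection_rates(injected, found_by):
    """Return (rate per reviewer, share of injected bugs nobody found)."""
    total = len(injected)
    # Only count findings that correspond to an injected bug (the ground truth).
    rates = {
        reviewer: len(found & injected) / total
        for reviewer, found in found_by.items()
    }
    found_by_anyone = set().union(*found_by.values())
    missed_by_all = injected - found_by_anyone
    return rates, len(missed_by_all) / total


# Toy data for illustration only.
injected = {"bug-1", "bug-2", "bug-3", "bug-4"}
found_by = {
    "reviewer_a": {"bug-1", "bug-2", "bug-3"},
    "reviewer_b": {"bug-1"},
}
rates, missed = detection_rates(injected, found_by)
print(rates)   # {'reviewer_a': 0.75, 'reviewer_b': 0.25}
print(missed)  # 0.25
```

Because ground truth comes from the injection step, both numbers fall out of plain set arithmetic with no human labels involved.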

Results Summary

Agent Detection Rate Bugs Found
przadka / benchmark-results.md
Last active February 2, 2026 06:29
CLI Coding Agents Bug Detection Benchmark - Early Results

Cheddar-bench: Can AI Coding Agents Find Each Other's Bugs?

An unsupervised benchmark for CLI coding agents. Agents inject bugs, other agents try to find them. No human labeling required - ground truth comes from the injection itself.

TL;DR

| Reviewer | Detection Rate |
| --- | --- |
| Claude | 40.6% |
| Codex | 33.0% |
przadka / gist:76216ca763a442c4d5a8e6048be4d4df
Created January 3, 2026 06:19
oh-my-opencode test prompt: Personal Finance Tracker
ultrawork: Build a personal finance tracker.
Tech stack: Python (FastAPI) backend, HTML/CSS/JS frontend.
Features:
1. Add/edit/delete transactions (income/expense)
2. Categories with custom colors
3. Monthly dashboard with charts (use Chart.js)
4. Budget limits per category with warnings
5. CSV import/export
przadka / gist:64cb671f049e16f14736de450494fbb1
Created January 3, 2026 06:19
oh-my-opencode test prompt: Kanban Board
ultrawork: Build a full-stack Kanban board application.
Tech stack: Python (FastAPI) backend, vanilla HTML/CSS/JS frontend (no frameworks).
Features:
1. Multiple boards
2. Columns per board (drag-drop reordering)
3. Cards within columns (drag-drop between columns)
4. Comments on cards with timestamps
5. Persistent storage (SQLite)