@fry69

fry69/summary.md Secret

Created June 30, 2025 07:25
Agentic Coding: The Future of Software Development with Agents

"Agentic Coding: The Future of Software Development with Agents" by Armin Ronacher (@mitsuhiko.at‬)

YouTube video: https://www.youtube.com/watch?v=nfOVgz_omlU

Bluesky announcement post with comments: https://bsky.app/profile/mitsuhiko.at/post/3lspd5bj6kc2e

Simon Willison's Post (short overview): https://simonwillison.net/2025/Jun/29/agentic-coding/


Extracted with NotebookLM. I just added the YouTube link as a source and then used this prompt:

Can you extract strcuture and the tips from the slides used in this video and create blog post style markdown document from them, please?

Slight manual cleanup was necessary (clot -> claude)


Agentic Coding: The Future of Software Development (and Why I'm Hooked)

Armin Ronacher, known for his impactful work on Flask, Jinja, and Sentry, has been deeply immersed in agentic programming since early May, finding it incredibly addictive and energizing. He describes this new approach as "catnip for programmers," a sentiment widely shared within the community. This isn't just an incremental improvement in development; it represents a profound shift in how we approach software creation.

What Exactly is Agentic Coding?

Agentic coding is a software development process where AI agents actively participate in the coding process alongside humans. It marks a significant departure from more common tools like Cursor or GitHub Copilot, which are primarily designed for autocomplete functionality.

Instead of the AI merely finishing your thoughts, agentic coding fosters a real-time collaboration between human and AI. This often involves a dynamic back-and-forth interaction, whether through reviewing the agent's completed work or by actively observing its progress as it executes tasks. Tools such as Claude and AMP enable you to break down larger problems into smaller, manageable tasks that agents then execute step-by-step. Remarkably, these agents can operate for extended durations—Claude has successfully worked on substantial tasks for over four hours—and are proficient at conserving context by delegating work to sub-agents.

Ultimately, the success of agentic coding heavily relies on how you provide data to the agent and how effectively you set it up for success; there's a considerable skill involved in mastering this new paradigm.

Why the Sudden Explosion in Agentic Programming?

The rapid surge in interest in agentic programming, particularly since mid-May, can be attributed to two primary factors:

  1. Advanced Model Capabilities (Sonnet 4 & Opus 4): These models appear to have been trained extensively on tool usage, making them uniquely suited for agentic workflows. They possess a "hardcoded" understanding of fundamental tools like text editing and computer usage, even if these aren't explicitly supplied via API. Currently, no other models come close to their proficiency in tool usage for agentic programming.
  2. Anthropic's Claude Code: Anthropic not only developed models with superior tool usage but also released Claude Code, a practical agent that can be run to experiment with these capabilities. This directly spurred the explosive growth of many Claude-inspired agents.

Beyond the technical advancements, the immense value users derive from these tools, coupled with their cost-effectiveness compared to traditional API token usage, has significantly contributed to their widespread adoption. Subscriptions, even the larger ones, offer a much more economical way to leverage these powerful tools.

Understanding the Risks: Claude's "YOLO" Mode

Agentic coding, like any powerful technology, carries inherent risks, notably the unsolved problem of prompt injection. Even so, the practical trade-off of running Claude with the --dangerously-skip-permissions flag, also called "yolo mode" (running with full system permissions), appears to be acceptable for typical projects. Surprisingly, the speaker notes that 100% of hooked users run Claude in this mode, and it's "shockingly good" without typically causing system damage. Though potential issues like malicious instructions embedded in GitHub issues exist, they haven't been a significant practical concern so far.

Claude vs. Cursor: A Practical Comparison

For those familiar with Cursor, here's how Claude differentiates itself:

  • Tool Limits & Longevity: Cursor generally has a low tool limit and struggles with long-running operations. While it has background agents, they're not yet effective. Claude, conversely, can run for hours on complex tasks.
  • User Interface Philosophy: Claude de-emphasizes the editor through its terminal user interface (TUI). While the TUI isn't ideal, its success stems from features like remote SSH access, which greatly facilitates experimentation. The speaker finds direct interaction with tools like Claude or AMP far more engaging than switching context to outputs from detached background agents.
  • Long-Term Vision: While background and cloud agents are likely the future, Claude's current "scrum solution" proves more effective in practice. Improving developer environments and tooling is crucial for future agent systems to reduce the cost of context switching when integrating agent output.
  • Practical Usage Shift: The speaker's experience, echoed by many others and even observed on the Cursor subreddit (where a third of posts are about Claude), indicates that Cursor's utility has largely diminished to reviewing code, with Claude performing the actual development work.
  • Trust and Reliability: A critical distinction is trust. Cursor has occasionally performed "scary" actions, like deleting files it shouldn't have, eroding trust. Claude, on the other hand, instills a sense of safety, building user confidence over time.
  • Financial Recommendation: Given the rapid evolution of these tools, the speaker strongly advises against annual commitments. Month-to-month subscriptions are preferable, allowing flexibility as tools evolve.

Beyond Claude: Other Agentic Coding Tools

While Claude is the speaker's favorite, other notable tools include:

  • OpenCode: Described as "very, very good" and compatible with a Claude subscription.
  • AMP: Uses Anthropic models and is "pretty decent," though its pricing model might not be optimal for frequent programmers.
  • Codex: The speaker did not have good results with Codex, but it remains an option for those with expensive OpenAI subscriptions.
  • Gemini CLI: On its own, the speaker found Gemini CLI ineffective. However, he uses it successfully from within Claude for tricky tasks, particularly for reading PDFs and summarizing large contexts. This ability to trivially nest agents (one agent running another) is a key advantage of terminal-based agents over editor integrations for exploratory work.

Diverse Applications: How Claude is Used Beyond Coding

The speaker utilizes Claude much like a versatile terminal, extending its use far beyond traditional programming tasks:

  • CI/CD Management: Used to set up and debug Continuous Integration (CI).
  • System Configuration: Helps reconfigure machines and apply significant system changes, like adjusting Git flags.
  • On-the-Spot Tool Creation: Creates and uses small, throwaway tools as needed.
  • Information Gathering: Used to browse the internet and even create presentations.
  • Complex Debugging: Swiftly diagnosed a Go path issue within Xcode on macOS, a process that would have taken much longer manually. The speaker often runs Claude to debug in the background, racing to see who solves the problem first.
  • Media and Automation: Successfully used yt-dlp via uvx to download a video and embed it into slides. It also remotely controlled a browser to list an item on an Austrian classifieds site, handling descriptions, prices, and pictures.
  • Resource Management: Identified large files for deletion to free up disk space.

Key Recommendations and Learnings for Agentic Coding Success

Based on weeks of intensive use, the speaker offers several practical recommendations:

1. Simplicity in Code & Ecosystem

  • Language Choice: Go, PHP, and "basic Python" are highly effective for agentic coding. Basic Python refers to simple, less convoluted code that heavily uses the standard library, has few dependencies, and clear naming, resembling shell scripts or data pipelines.
  • Minimize Ecosystem Churn: The less churn in an ecosystem, the better. While JavaScript has significant churn, stable parts like React work well because they change less. Conversely, something like Tailwind can confuse models due to meaningful differences between versions (e.g., v3 vs. v4).
  • Codebase Consistency: Avoid competing patterns within a single codebase, as they confuse agents. If present, provide explicit context about which patterns are outdated.
  • Descriptive Naming:
    • Long Function Names are beneficial for the AI, as agents don't always fully grasp namespaces.
    • For subtasks, unique naming helps agents find information and prevents accidental code duplication, which can be hard to detect in reviews.

2. Improve Your Development Environment

  • Centralized Logging & Observability: This is the most crucial factor for agent success.
  • Agent-Friendly Tools: Agents will inevitably misuse tools. If your tools behave poorly when misused, the agent won't succeed. Tools must be robust.
  • Speed is Paramount: If a tool takes too long, the agent will time out and stop using it.
  • Agent Observability: For agents, observability means plain, simple text output, good exceptions, and log files—not complex systems like OpenTelemetry. The agent needs to clearly see and understand the output and how to get more of it.
  • Protect Against Tool Misuse: Tools should error clearly and indicate what went wrong if used incorrectly. An example of a poorly protected tool is Rust's test system saying "run zero tests successfully" when given an invalid test name, leading the agent to incorrectly assume success.
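The misuse-protection point can be sketched as a small test-selection helper. This is a minimal illustration, not the speaker's tooling: `select_tests` and `NoTestsMatched` are hypothetical names, and the matching rule (plain substring match) is an assumption. The idea is only that a filter matching nothing should be an error with a hint, never a silent success.

```python
import difflib


class NoTestsMatched(Exception):
    """Raised when a test filter selects nothing (hypothetical helper)."""


def select_tests(available, pattern):
    """Return the test names containing `pattern`, or fail loudly."""
    matched = [name for name in available if pattern in name]
    if not matched:
        # Suggest close names so the agent can correct itself in one step.
        hints = difflib.get_close_matches(pattern, available, n=3, cutoff=0.4)
        hint_text = f" Did you mean: {', '.join(hints)}?" if hints else ""
        raise NoTestsMatched(
            f"no tests match {pattern!r}; refusing to report success.{hint_text}"
        )
    return matched
```

A wrapper script built on this would exit with a non-zero status and print the message to stderr, so the agent sees a plain-text failure instead of an empty but "successful" run.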

3. Enable Agent-Created Scratch Tools

  • Provide clear instructions (e.g., in a CLAUDE.md file) on how to write, place, and run throwaway code for experimentation.
  • Early in a project, test Claude's ability to create and run a tool; if this works, you're off to a good start.

4. Minimize Model Context Protocol (MCP) Servers

  • The speaker personally uses only one MCP: Playwright, as it's truly necessary.
  • Avoid other MCPs for coding agents: They pollute the context, dedicating valuable context to tool definitions. Agents' base models are better at writing code than at using MCPs.
  • Prefer Command Line Tools (CLIs): For example, use the GitHub CLI instead of a GitHub MCP. CLIs are more successful because agents can embed them within shell scripts to compose larger, sequential pipelines, a capability not possible with MCPs which can only run in the main agentic loop.
  • Human Usability: MCPs are cumbersome for humans to use and debug directly. CLIs generated by the agent are much easier for human interaction and debugging.
  • Future Outlook: MCPs may become more successful when models have larger context windows.
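To illustrate why CLIs compose well, here is a rough sketch of a throwaway script an agent might write around the GitHub CLI. The helper names `run_cli` and `draft_pr_titles` are hypothetical; the sketch assumes `gh` is installed and authenticated, and relies on `gh pr list` with its `--state` and `--json` options.

```python
import json
import subprocess


def run_cli(args):
    """Run a command and return its stdout; raise with stderr on failure."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{' '.join(args)} failed: {result.stderr.strip()}")
    return result.stdout


def draft_pr_titles(run=run_cli):
    """List open draft pull requests as '#number title' strings."""
    # The JSON output of one CLI call becomes the input of ordinary code,
    # so the step can be chained into a larger script or pipeline.
    raw = run(["gh", "pr", "list", "--state", "open", "--json", "number,title,isDraft"])
    return [f"#{pr['number']} {pr['title']}" for pr in json.loads(raw) if pr["isDraft"]]
```

The `run` parameter exists only so the function can be exercised without network access; a human can also run the same `gh` command by hand to debug it, which is the usability point made above.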

5. Effective Context Management

  • Reduce Context Bloat: The primary goal is to keep the context size down.
  • Streamline Codebase Exploration: Prevent agents from "spelunking" inefficiently through the codebase. Provide tools that summarize relevant parts (e.g., a make go methods tool to list Go methods), giving the agent a more focused subset of information.
  • Controlled Logging: Offer methods to read the last 20 lines of combined logs, ensuring critical information is present, and provide instructions for retrieving more logs if needed. Crucially, ensure logs do not contain confusing information.
  • Utilize Subtasks/Sub-agents: Claude supports breaking down problems into subtasks for sub-agents, which helps conserve context. This requires skill but can yield positive results, especially for prototypes.
  • Avoid Context Compaction: If an agent reaches a point where it needs to compact its context, you're "kind of lost." Compaction introduces randomness, making success unpredictable. It's often better to abort and restart from scratch.
  • Prevent Context Rot: This occurs when context is filled with past failures or undone actions, potentially causing the AI to backtrack or reintroduce deleted code. A major source of context rot is a pre-broken development environment, forcing the agent to waste context fixing it instead of making pure forward progress.
  • Promote Forward Progress: Use systems that inherently support clear forward progress. Go's test caching, which skips re-running tests for packages whose inputs have not changed, is excellent because it provides clean, relevant output. In contrast, systems where agents must select individual tests (e.g., Rust) can lead to misselections, false positives, and confusion.
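Two of the context-saving helpers described above can be sketched in a few lines. Both are illustrative assumptions rather than the speaker's actual tools: `outline` is a Python analogue of the "list the methods" idea (the talk's example concerned Go), and `tail_log` returns the last 20 lines of a log together with a hint on how to get more.

```python
import ast
from collections import deque
from pathlib import Path


def outline(source):
    """List top-level functions, classes, and methods with their line numbers."""
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entries.append(f"{node.lineno}: def {node.name}")
        elif isinstance(node, ast.ClassDef):
            entries.append(f"{node.lineno}: class {node.name}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    entries.append(f"{item.lineno}:   def {node.name}.{item.name}")
    return entries


def tail_log(path, lines=20):
    """Return the last `lines` lines of a log file plus a hint for fetching more."""
    last = deque(maxlen=lines)  # keeps only the newest `lines` entries
    total = 0
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            total += 1
            last.append(line.rstrip("\n"))
    header = f"[showing last {len(last)} of {total} lines; pass lines=N for more]"
    return "\n".join([header, *last])
```

Both return short plain text, so the agent spends little context on them and still learns where to look next.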

6. Unified Logging for Web Applications

  • Centralize All Logs: For web applications, forward browser console logs to the server and then log them into the same file as server logs. This allows the agent to tail a single log file and understand both client and server actions in their correct order, aiding in debugging initial failures.
  • Include SQL Logs: Integrate SQL logs to help the agent understand data flow to the database. Provide a condensed version for general use and a full log when detailed inspection is needed.
  • Makefile for Tools: Ensure all your tools are accessible via a Makefile, as Claude is proficient at running them.
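A minimal sketch of the single-file idea, using Python's standard `logging` module. The function names, the `source` field, and the log format are assumptions for illustration; how browser console messages reach the server (for example a small POST endpoint) is left out.

```python
import logging


def make_unified_logger(path):
    """Create a logger that writes every source into one file, in arrival order."""
    logger = logging.getLogger("unified")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(source)s] %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_server(logger, message):
    """Record a server-side message."""
    logger.info(message, extra={"source": "server"})


def log_browser(logger, level, message):
    """Record a console message forwarded from the browser."""
    levels = {"log": logging.INFO, "info": logging.INFO,
              "warn": logging.WARNING, "error": logging.ERROR}
    logger.log(levels.get(level, logging.INFO), message, extra={"source": "browser"})


def log_sql(logger, statement, max_len=80):
    """Record a condensed, single-line SQL statement so the log stays short."""
    flat = " ".join(statement.split())
    if len(flat) > max_len:
        flat = flat[: max_len - 3] + "..."
    logger.debug(flat, extra={"source": "sql"})
```

Because every record lands in the same file with a source tag, tailing that one file shows client, server, and database activity interleaved in the order it happened.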

7. Managing Multiple Processes and Concurrency

  • Current Challenge: Agents still struggle to fully understand how multiple processes are orchestrated and interact.
  • Synchronization Points: Implement utility functions (e.g., reached) that emit events within background processes. This allows the agent to await specific synchronization points from the outside, avoiding reliance on timeouts.
  • Future Goal: The ideal solution would be lock-step execution like a debugger, but this is an ongoing area of development.
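The synchronization-point idea could look roughly like this. Only the name `reached` comes from the talk; the file-based event log, the `wait_for` helper, and all parameters are assumptions made for the sketch.

```python
import json
import time
from pathlib import Path


def reached(events_path, name, **details):
    """Called inside a background process to announce a synchronization point."""
    record = {"event": name, "time": time.time(), **details}
    with Path(events_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def wait_for(events_path, name, timeout=30.0, poll_interval=0.05):
    """Block until event `name` appears in the events file; fail clearly otherwise."""
    deadline = time.monotonic() + timeout
    path = Path(events_path)
    while True:
        if path.exists():
            for line in path.read_text(encoding="utf-8").splitlines():
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # a line that is still being written; retry next poll
                if record.get("event") == name:
                    return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"event {name!r} not reached within {timeout}s")
        time.sleep(poll_interval)
```

The timeout here is only an upper bound that produces a clear error. The waiter returns as soon as the event is written, rather than sleeping for a guessed duration and hoping the background process is ready.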

8. Debugging CI Effectively

  • Powerful Use Case: Claude is exceptionally good at debugging CI.
  • Browser-Based Debugging: The agent can sign into a browser and debug CI by navigating and clicking.
  • GH CLI Integration: Claude understands the GitHub CLI tool, enabling it to debug applications directly within the CI environment.
  • Draft PR Workflow: You can configure an agentic flow where Claude makes code changes, creates a draft pull request, waits for CI to fail, and then proceeds to fix the issue. This is invaluable for hard-to-debug CI problems.

"Witch Stuff" and Beyond: Creative Uses

The speaker also highlights some more "witchy" or creative applications of Claude:

  • Nested Agents for Specialized Tasks: Using Gemini CLI from within Claude specifically for reading PDFs and summarizing large contexts. While the output streaming isn't ideal, the ability to nest agents for specialized tasks is powerful.
  • Automated Downloads: Used the yt-dlp tool via uvx to download a video directly into a presentation.
  • Web Automation: Remotely controlled a browser to post an item for sale on an Austrian classifieds website, handling all details.
  • System Reconfiguration: Made significant system changes, such as adjusting Git flags, when the speaker wasn't sure how to.
  • Disk Space Optimization: Identified and listed large files in cache folders for deletion, aiding in disk space management.

The Future of Agentic Computing

The combination of Large Language Models (LLMs) and agentic loops holds immense power, and we are just at the beginning. While programmers are among the first to experience this transformation, many more real-world tasks involving "deep research" (e.g., Google results, summarization) and automation will become safely possible. The blend of code writing and reasoning promises a very interesting future. The speaker encourages everyone to interact more with these tools, as it feels truly transformative.
