Solve It With Code Lesson 8 Inspiration

Hi everyone - Lesson 8 with Eric Ries really unlocked my brain on this course & Solve It.

Lean-all-the-way-down: accelerating encounters with the real world

I often struggle at my day job to promote fast iteration. The conversation with Jeremy, Johno, & Eric convinced me that you can apply Lean at every layer, right down to individual dialog cells. This is a point of view I had never heard articulated before, and it really makes sense. A popular phrase these days is "You can just do things", and you can think of that as a spiritual cousin.

I work primarily in product, so a lot of my output is writing things down. I find that "just start somewhere" and iterating works best with writing, and even more importantly, with then getting it out in front of customers.

Answer.ai

The Lesson 8 conversation also gave me the vibe that answer.ai works something like what I imagine it was like developing Unix & B/C at Bell Labs. I don't know if Kernighan and Ritchie knew what they were building ahead of time. I can imagine they worked from a strong sense of "this would be cool to build" and just kept iterating based on whether they kept using it themselves. I believe Jeremy looked at the history of labs like these at the start of answer.ai. It's interesting that I get this echo vibe now.

Dialogs for Training Reasoning Models

I'm sure many of us are familiar with "reasoning" models like o1 and Deepseek R1. These models are trained to do better with things like math and coding by outputting reasoning steps (chains of thought) along the way to the answer.

Let's think back to the cause of the ChatGPT / Cursor / Artifacts "big chunk of code" doom loop. The reason non-reasoning models behave this way is that they are trained to "one shot" the final code. More precisely, a lot of the training data consists of final code output only. The training data is missing the intermediate steps that a human (usually) used to create the final code.

Guess what? Solve It Dialogs include every step a person used to produce the final code. Not only that, they include the thinking steps that the human + AI used to get to the code. This is potentially immensely valuable data for future LMs. In fact, full wall-time Dialog traces, not cleaned-up final Dialogs, might be even better training data (for the same reason that final code output alone is less useful training data than code that includes the steps).
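
To make the contrast concrete, here is a minimal sketch in Python of the difference between "final code only" training data and a full process trace. The cell structure and field names are hypothetical, not the actual Solve It export format; it's just meant to show how much signal lives in the intermediate steps.

```python
# Hypothetical representation of a Solve It Dialog: an ordered list of cells
# mixing notes, prompts, AI replies, and code. Field names are made up.
dialog_cells = [
    {"kind": "note",   "text": "Goal: count word frequencies in a file."},
    {"kind": "prompt", "text": "What's a simple way to split text into words?"},
    {"kind": "reply",  "text": "Lowercase the text and split on non-letters."},
    {"kind": "code",   "text": "words = re.findall(r'[a-z]+', text.lower())"},
    {"kind": "code",   "text": "counts = collections.Counter(words)"},
]

def to_final_code_only(cells):
    """What most code training data looks like: just the finished code."""
    return "\n".join(c["text"] for c in cells if c["kind"] == "code")

def to_process_trace(cells):
    """Keeps the thinking and prompting steps interleaved with the code,
    closer to the chain-of-thought style data reasoning models benefit from."""
    return "\n".join(f"[{c['kind']}] {c['text']}" for c in cells)

print(to_final_code_only(dialog_cells))
print("---")
print(to_process_trace(dialog_cells))
```

The first function throws away exactly the steps a future model would need to learn how the code was arrived at; the second keeps them.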

GitHub Issues & PRs also provide a version of this process trace information. Arguably Solve It provides a better signal because PRs still don't include the steps taken to produce the code changes. And of course I'm primarily a Python coder, so as long as it works for Python it's all good for me ;).

This recent paper is a good overview of how things like Dialogs fit into the training of reasoning models: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought.

SIWC Reflection

I'll end with some context. You probably haven't noticed me in this class before. I started Lessons 1 & 2 more or less in sync with the class, but then my day job went into high gear and I only just caught up through Lesson 6 last weekend. Prior to SIWC I had experienced the ChatGPT / Cursor / Artifacts "big chunk of code" doom loop and given up many times, so I am very interested to see where the class goes, especially as we hopefully add things like fasthtml and data analysis.
