
JW's accumulated tips for using LLMs as a dev

These mostly came out of various workshops, talks, and conversations at the AI World's Fair SF 2024: for example, the GitHub Copilot sessions, and other sessions dedicated to developer productivity such as Manuel Odenhal's session (link). I've aggregated that thinking and added more of my own thoughts as I've been incorporating more AI tools as a developer.

What LLMs are good at

  • They are excellent at translation. Given input material in form X, produce output in form Y.
  • They are excellent at reproducing and remixing commonly seen coding tasks
  • They are great at generating feasible starting points for solutions given well-described, constrained problems.
  • They are helpful in the higher-level design process and architectural thinking behind software development: brainstorming, thinking things through out loud, raising points that are obvious in retrospect but that you didn't think of, etc. Talking through your task with the LLM ahead of time can save a lot of time by surfacing issues up front, especially if you give it good prompts (see "Use good prompts" below).

What they are not good at

  • They don't have strong analytical reasoning capability.
  • They miss details, especially as the length of input increases (and output, which becomes input for the next tokens).
  • Therefore they can't generate perfect, bug-free, secure code, which requires strong reasoning and the ability to reliably catch every detail.
  • This is also a huge part of why "agents" (semi-autonomous AI actors that use LLMs for planned workflows and multi-step tasks) are extremely unreliable.

Since they don't reason well and miss details, don't waste time expecting them to. Plan for it. How many devs have written AI off because they got one wrong answer? Aggressively generate and plan to throw away most of it; you still save time.

That being said, how do you get the best results?

Use good prompts

Include the following (a sketch combining these elements follows the list):

  • Context: what's the task
  • Intent: what is your goal and purpose in mind
  • Clarity: ambiguous language that can be interpreted many ways will generate misses; clearly define the desired result.
  • Specificity: be as specific as possible and state all expectations, known constraints, requirements, use cases, etc.
  • Examples help (aka one- or few-shot prompting) when possible.
  • Role statements sometimes help: "act as a Python programmer whose job is to do X, who thinks about Y, etc."
  • For code, treat it like a junior engineer who can reuse and remix what it has already seen, not a senior programmer with general-purpose creative reasoning. Don't mistake the ability to remix examples in the data set for seniority. (For novel coding challenges published after 2021 that aren't in ChatGPT's training dataset, research shows its performance drops massively. link)
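
Putting these elements together, here is a minimal sketch of assembling such a prompt. The task, constraints, and example below are hypothetical placeholders, not a recommended template:

```python
# Hypothetical sketch: combining role, context, intent, constraints, and an
# example into a single prompt string. Every specific here is a placeholder.
role = "Act as a Python programmer whose job is to write small, well-tested utilities."
context = "Context: we have a CSV export of orders with columns order_id, customer_id, amount, created_at."
intent = "Intent: a reusable function for monthly revenue reporting, not a one-off script."
task = (
    "Task: write a function that sums `amount` per calendar month and returns "
    "a dict mapping 'YYYY-MM' strings to totals."
)
constraints = "Constraints: standard library only; raise ValueError on malformed rows."
example = "Example: rows totalling 120.50 in January 2024 -> {'2024-01': 120.5}"

prompt = "\n\n".join([role, context, intent, task, constraints, example])
print(prompt)
```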

Use regeneration and temperature

  • For any nonzero temperature, regenerate at least a little to see how much variation you're getting. A second or third result might do the trick where the first didn't. It will also show you faults with your prompt and how you might sharpen it up. (A short sketch of this appears after the list.)
  • Use low or zero temperature for most coding tasks especially for a very thorough prompt, when doing language domain translation tasks, etc.
  • Increase temperature for chat use cases and conversational utility.
  • Higher temperature is rare in dev work but can be useful for speculative, creative, wild brainstorming at the higher level, when thinking through a task or coming up with alternative solutions.
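
As a rough sketch of both ideas using the OpenAI Python client (the client, model name, and prompt here are assumptions; substitute whatever SDK you actually use): low temperature for a translation-style coding task, and multiple candidates generated at once so you can compare them.

```python
# Sketch only: low temperature for a coding/translation task, with several
# candidates generated at once for comparison. The model name is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "Translate this Python 2 script to idiomatic Python 3, preserving behavior:\n..."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,  # low temperature for deterministic-ish translation work
    n=3,              # regenerate a few candidates to compare
)

for i, choice in enumerate(response.choices):
    print(f"--- candidate {i + 1} ---")
    print(choice.message.content)
```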

Good human coding practices lead to better results from AI

  • Use good names and natural language in the code. The variable name "annual_revenue" is better than "rev" because it invokes the LLM's finance domain knowledge (see the sketch after this list).
  • Use functional programming. Smaller units of code that have no side effects are easier not just for humans but also for LLMs to reason about, write tests for, debug, etc., because they don't rely on, or mutate state in, distant unseen places.
  • Separate concerns, e.g. config from logic, presentation from control code, etc, allowing the LLM to focus on a single problem domain when possible.
  • Be consistent. Generated code tries to follow your existing code style.
  • Docs and comments, even if AI-generated, provide context to future prompts over that code, not just for other devs. If the code is fully or mostly generated from a prompt, include that as a comment or docstring.
  • Code comments that give examples of input/output and use cases are very helpful.
  • Generate test cases by asking the LLM to use an adversarial mindset. See "role statements" above. Have it act as an adversary and explicitly identify edge cases and write tests from that point of view.
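
A small illustration of several of these points at once (the function itself is just a made-up example): a descriptively named, pure function with a docstring that states input/output examples gives both humans and LLMs far more to work with than a stateful helper named "calc".

```python
# Illustrative example only: descriptive name, no side effects, and a docstring
# with concrete input/output examples that future prompts over this code can reuse.
from collections import defaultdict

def annual_revenue(monthly_revenue: dict[str, float]) -> dict[str, float]:
    """Sum monthly revenue into per-year totals.

    Keys are 'YYYY-MM' strings; values are revenue for that month.
    Example: {'2023-11': 100.0, '2023-12': 50.0, '2024-01': 25.0}
          -> {'2023': 150.0, '2024': 25.0}
    Pure function: no I/O, and the input is not mutated.
    """
    totals: dict[str, float] = defaultdict(float)
    for month, amount in monthly_revenue.items():
        year = month.split("-")[0]
        totals[year] += amount
    return dict(totals)
```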

LLMs don't have to focus on just code in order to be useful for development

  • Represent problems and code in intermediate states like DSLs, config YAML, pseudocode, etc, so that LLM i/o is on higher-level representations of your problem space. Remember they are excellent translators, so how can you model a problem as one of language translation instead of code gen?
  • Have a bag of tricks for types and formats of the output you might ask for, and try a different format if you're not getting what you want - the results might usefully surprise you. For example, ask for a YAML file to represent a DSL over the problem (a sketch of this follows the list), or code that generates code, or have it role-play someone in your position or an end user of the code.
  • Ask the LLM to perform world simulation: you can ask it to act as your program or system architecture itself and report on its internal activity, state changes, logic flows, etc given inputs and events.
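
For instance, one hypothetical way to frame a front-end task as translation rather than code generation: keep the source of truth in a tiny YAML spec and ask the model to translate spec to code, instead of prompting for the code directly.

```python
# Hypothetical sketch: a tiny YAML "DSL" describing a form, plus a prompt that
# frames the LLM's job as translation (spec -> component) instead of open-ended
# code generation. The spec, not the generated code, is what you iterate on.
form_spec = """
form: signup
fields:
  - name: email
    type: email
    required: true
  - name: password
    type: password
    min_length: 12
submit:
  label: Create account
  action: POST /api/signup
"""

prompt = (
    "You are a translator from this YAML form DSL to a React component with "
    "plain controlled inputs. Translate the following spec exactly and do not "
    "invent fields that are not in it.\n" + form_spec
)
print(prompt)
```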

Collect and iterate on small utilities/scripts

Build a personal toolkit that you can invest in and that generates compound interest.

  • If a task can reasonably be scripted, try generating it. Generate and throw away lots of small scripts, keep and iterate on the ones that are useful more than once.
  • Roll prompts you use often into scripts that take command line arguments
  • Write scripts to extract context from your codebase for use in prompts (e.g. all your classes and function names with docstrings, with arguments to filter to particular domains if needed) so that the whole code structure, or a particular domain of it, can be included in a prompt; see the sketch after this list. Code copilots are doing this now; Aider, for example, is an OSS copilot that uses tree-sitter to analyze the codebase and provide a "map". You can use Aider for that purpose alone.
  • Explore the expanding space of CLI LLM tools like Chatblade
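
As one example of the kind of context-extraction utility mentioned above, here's a rough sketch (the file layout, filtering, and output format are assumptions; tools like Aider do this far more thoroughly):

```python
#!/usr/bin/env python3
# Sketch of a personal utility: dump class/function signatures and first
# docstring lines from the Python files under a directory, to paste into a
# prompt as lightweight codebase context.
import ast
import sys
from pathlib import Path

def summarize(path: Path) -> str:
    lines = []
    tree = ast.parse(path.read_text(), filename=str(path))
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = (ast.get_docstring(node) or "").splitlines()
            summary = doc[0] if doc else ""
            if isinstance(node, ast.ClassDef):
                lines.append(f"class {node.name}: {summary}")
            else:
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"def {node.name}({args}): {summary}")
    return "\n".join(lines)

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for py_file in sorted(root.rglob("*.py")):
        print(f"# {py_file}")
        print(summarize(py_file))
```

Pipe the output into a prompt (or your clipboard) when the model needs to see the shape of the codebase without the full source.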

Allocate 10-20% to trying new, uncomfortable things

Or more at first... we're all busy and have tickets to close, but you'll never see the benefit of new tools if you don't deviate from your regularly scheduled program and try new ideas. We're in a time of extreme acceleration in productivity with very little guidance on how to realize it, but the ROI is worth it both immediately and in the long term. There are a lot of tools emerging all the time, and they are getting better and better. Go beyond the big corporate products (GitHub Copilot etc.) and look into smaller projects and tools as well, which can be really useful. Just get started.

Specific tips for GitHub Copilot

  • Autocomplete is designed to generate quickly but has less context: just the current file and open tabs. Inline chat, by contrast, pulls from the whole workspace into a larger context window. The two features might use different models, and these change over time.
  • Learn the chat commands including /explain, @github, @workspace and use them often.