@arrdem · Created May 12, 2023

First things first: obviously, I work in SRE and EE. The thing that originally got me hyped about software was realizing I could write tools that wrote more tools, or that gave me leverage over the tasks at hand. I have net-negative interest in repetitive tasks and want to see us find ways to move faster with more confidence.

To me, classical specification-driven tools for generating parsers, tests, or what have you are good examples of success in this field. You take a formal specification of some sort, then iterate on it and poke holes in it until you've arrived at something that produces the correct behavior. From a certain perspective, all of these tools are compilers. That sounds a lot like an LLM, but the basis in formal evaluation semantics brings confidence in the results of the tool. In theory at least, proof in software being what it is.
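
As a minimal sketch of that spec-as-data idea, here's a toy lexer generator in Python. The token names and the arithmetic-flavored specification are mine for illustration; the point is that the tool is derived mechanically from a formal artifact you can inspect, iterate on, and poke holes in.

```python
import re

# A toy "lexer generator": the specification is data, and the lexer is
# derived from it mechanically. Token names and patterns are illustrative.
SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS",     r"\s+"),
]

# "Compile" the specification into a single regex with named groups.
LEXER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def tokenize(text):
    pos = 0
    while pos < len(text):
        m = LEXER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":  # drop whitespace tokens
            yield (m.lastgroup, m.group())
        pos = m.end()

print(list(tokenize("2 + 3 * (4 + 5)")))
```

If the generated lexer misbehaves, you fix the specification, not the output, and the fix holds for every future run.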

Current-generation LLMs are best understood as a probabilistic recall database built on lossy compression. An LLM is not a formal system, nor does it have any understanding of the evaluation semantics of the results it produces. Its response to a query is a best-fit selection from the data it was trained on; it cannot respond to truly novel prompts.
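
To make the "probabilistic recall" framing concrete, here's a deliberately tiny analogy: a bigram model that can only continue a prompt with words it has literally seen follow that context in its training data. The corpus and prompt are invented, and real LLMs generalize far more smoothly, but the sampled-from-training-data character is the same.

```python
import random
from collections import defaultdict

# Toy bigram model: every continuation is sampled from contexts seen
# verbatim in the training corpus. Corpus and prompt are illustrative.
corpus = "the cat sat on the mat the cat ate the fish".split()

table = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev].append(nxt)

def complete(word, n=5):
    out = [word]
    for _ in range(n):
        choices = table.get(out[-1])
        if not choices:  # a truly novel context: nothing to recall
            break
        out.append(random.choice(choices))
    return " ".join(out)

print(complete("the"))
```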

This isn't a strictly bad thing. Heaven knows I do plenty of stackoverflowing on the daily and I've joked before that googling is a core skill. But it is important to be precise about the bounds of a tool and what it can and can't do.

I have no doubt that an LLM can help you (re)produce a solution to some problem using baroque kubectl or AWS CLI operations. These are common tools with large user bases and large pools of Stack Overflow answers and other data to draw on in sourcing existing solutions. If you can find acceleration in your personal workflow there, great.

But given the technical limitations of LLMs, I think it's important to distinguish what they can do from what they can't. An LLM can help you find an existing fragmentary solution you can riff on and build into a tool that suits your needs. It can't, in general, hand you a cut-and-dried working solution. It's a large language model, not an AGI.

There's been a lot of interesting work on using formal methods to solve the search problems behind specification-based program generation, but the problem has always been that writing a specification is still writing a program. Just because you aren't programming directly in X anymore doesn't mean you aren't still programming; you've just moved the semantic burden of the problem into Y and hopefully compacted it some. Which makes me more than raise an eyebrow at the notion of "prompt engineering", because c'mon. This industry has repeated the same boom-and-bust cycle of "no code!" followed by "oh shit, it's still code" since COBOL came along. Remember? The Common Business-Oriented Language?
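
A throwaway illustration of the moved-not-removed burden: a "no-code" validation config that is, inescapably, still a program, just in a smaller language. The field names and rules below are invented.

```python
# "No code": validation expressed as declarative rules... which is still a
# program. Every semantic decision lives in the rule data now.
RULES = {
    "age":   {"type": int, "min": 0, "max": 150},
    "email": {"type": str, "contains": "@"},
}

def validate(record):
    errors = []
    for field, rule in RULES.items():
        value = record.get(field)
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: above maximum {rule['max']}")
        if "contains" in rule and rule["contains"] not in value:
            errors.append(f"{field}: must contain {rule['contains']!r}")
    return errors

print(validate({"age": 200, "email": "nope"}))
```

Whoever maintains RULES is programming, whatever the marketing deck says.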

I hope we see efforts to hybridize formal code-generation search techniques with LLMs. There are interesting notions to be had of using LLMs to try and drive solutions into formal verifiers, or of using LLMs as convolution sources for formal search algorithms.
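
One shape such a hybrid could take, sketched with everything stubbed out: an untrusted generator (standing in for an LLM) proposes candidate programs, and a checker with real evaluation semantics accepts or rejects them against a specification. The spec, the candidates, and the prompt framing are all invented for illustration.

```python
# Propose-and-verify loop: the generator is untrusted; only the verifier's
# judgment counts. Spec and candidates are invented for this sketch.
SPEC = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]  # I/O pairs for "add"

CANDIDATES = [  # what an LLM might emit for "write f(a, b) that adds"
    "def f(a, b): return a * b",
    "def f(a, b): return a - b",
    "def f(a, b): return a + b",
]

def verify(source):
    scope = {}
    exec(source, scope)  # evaluate the candidate (unsafe outside a sandbox)
    f = scope["f"]
    return all(f(*args) == expected for args, expected in SPEC)

for src in CANDIDATES:
    if verify(src):
        print("accepted:", src)
        break
    print("rejected:", src)
```

The verifier here is a trivial input/output oracle; the interesting versions swap in a model checker or proof assistant, but the division of labor is the point.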

But something in the loop needs to be doing symbolic reasoning.

Today, when we talk about "prompt engineering" or "result evaluation", the thing doing that symbolic reasoning is a human.

So by all means, let's play with LLMs and see what they can be useful for. But I can tell you this right now: the current limits of the technology are that it cannot produce novel solutions, and it cannot produce verified solutions. As such, saying "we can and should drop everything and replace it with LLM results" wildly overestimates the capabilities of what is, in essence, a snippets search engine.

We can leverage tools as long as we acknowledge and reason about their limits. And LLMs are not AGIs. They are not a replacement for programming or for understanding your tools. They are, at best, splat compilers from prompts to concatenations of other humans' previous invocations of existing tools.
