
@cedrickchee
Last active April 26, 2024 02:47
AI Agents and LLM Performance

In response to Dr. Andrew Ng's letter: https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/

When I read Andrew's letter, I imagined him as Steve Ballmer shouting "Agentic, agentic, agentic workflows!" Haha, we can hear you. No need for that.

Competition among AI agents is heating up: MetaGPT -> AgentCoder -> Devin/OpenDevin/Devika -> SWE-Agent -> AutoCodeRover.

LLM-based agents, whether single-agent or multi-agent, are still in their infancy: they are at a very early research/prototype stage, and there is a lot of room for improvement.

AutoCodeRover is the agent king born in Singapore. Devin was announced three weeks ago, and it's turning the spotlight on AI like it's the latest celebrity in town. Devin is generally useful but very slow and costly: for production-level work it makes far more model calls than a single prompt would. AutoCodeRover is still a research prototype. AgentCoder's performance (relative to GPT-4) in the graph from Andrew's letter is astounding, but the benchmark tops out at 100%, so there is little headroom left to show further improvement.

I believe AI agents will improve significantly in the near future, but most companies and their workers are still figuring out how to integrate the first layer of AI into their workflows and processes.

Agentic workflows have the potential to unlock capabilities beyond what is possible with the current approach of prompting models for one-shot/zero-shot/CoT generations. The tools for building agents are improving rapidly, and so are the architectures and patterns, with ideas such as Karpathy's LLM Operating System design. The comparison between traditional one-pass LLM prompting and the iterative, agentic approach is interesting whether or not it turns out to be a pivotal shift in AI applications.
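To make the one-pass versus iterative contrast concrete, here is a minimal Python sketch. It assumes a hypothetical stubbed "model" (`make_stub_model`) that returns canned candidate solutions instead of calling a real LLM, and a toy test harness (`evaluate`). None of these names come from Andrew's letter or from any agent framework; the loop only illustrates the generate, test, revise idea behind iterative workflows.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an LLM call. A real agent would send the task and
# the latest feedback to a model; this stub ignores both and just returns the
# next canned candidate from a fixed list.
def make_stub_model(candidates: List[str]) -> Callable[[str, Optional[str]], str]:
    it = iter(candidates)

    def model(task: str, feedback: Optional[str]) -> str:
        return next(it)

    return model


def evaluate(candidate: str, tests: List[Tuple[int, int]]) -> Optional[str]:
    """Run candidate source defining `f` against (input, expected) pairs.

    Returns a feedback string for the first failing case, or None if all pass.
    Note: exec() on model output is unsafe outside a toy example like this.
    """
    namespace: dict = {}
    exec(candidate, namespace)
    f = namespace["f"]
    for x, expected in tests:
        got = f(x)
        if got != expected:
            return f"f({x}) returned {got}, expected {expected}"
    return None


def zero_shot(model, task: str, tests) -> Tuple[str, bool]:
    """One call, no feedback: whatever comes back is the final answer."""
    answer = model(task, None)
    return answer, evaluate(answer, tests) is None


def iterative(model, task: str, tests, max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Generate, test, and feed the failure back until the tests pass."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        answer = model(task, feedback)
        feedback = evaluate(answer, tests)
        if feedback is None:
            return answer, round_no
    return None, max_rounds


if __name__ == "__main__":
    candidates = ["def f(x):\n    return x * 2\n", "def f(x):\n    return x * x\n"]
    tests = [(2, 4), (3, 9)]
    _, ok = zero_shot(make_stub_model(candidates), "square x", tests)
    print("zero-shot passed:", ok)  # prints "zero-shot passed: False"
    _, rounds = iterative(make_stub_model(candidates), "square x", tests)
    print(f"iterative passed after {rounds} round(s)")  # prints "... after 2 round(s)"
```

The zero-shot call stops at its first (wrong) guess, while the loop uses the test failure as a signal and reaches a passing answer on round 2. The extra rounds are also where the cost comes from: every iteration is another model call.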

What's next for AI agentic workflows ft. Andrew Ng

(Andrew Ng speaks about what's next for agentic AI workflows: planning and multi-agent collaboration. Planning is like the "ChatGPT moment" for AI agents.)

I'm excited to see progress on SWE-bench and new benchmarks for even larger, more complex tasks. The performance leap from iterative workflows is compelling.
