In response to Dr. Andrew Ng's letter: https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/
When I read Andrew's letter, I imagined him as Steve Ballmer shouting "Agentic, agentic, agentic workflows!". Haha, we can hear you. No need for that.
Competition among AI agents is heating up: MetaGPT -> AgentCoder -> Devin/OpenDevin/Devika -> SWE-Agent -> AutoCodeRover
LLM-based agents are still in their infancy, with plenty of room for improvement; single- and multi-agent systems alike are at a very early research/prototype stage.
AutoCodeRover is the agent standout to come out of Singapore. Devin was announced three weeks ago and is turning the spotlight on AI like it's the latest celebrity in town. Devin is broadly useful but very slow and costly: production-level work drives an exponentially larger number of model calls. AutoCodeRover is a research prototype. AgentCoder's performance (relative to GPT-4) in the graph is astounding, but there is no room for improvement beyond 100% of this benchmark.
I believe that AI agents will significantly improve in the near future, but the majority of companies and their workers are still figuring out how to integrate the first layer of AI into their workflows and processes.
Agentic workflows have the potential to unlock capabilities beyond what is possible with the current approach of prompting models for one-shot/zero-shot/CoT generations. The tools to create agents are improving rapidly, and the architectures/patterns are improving with ideas such as Karpathy's LLM Operating System design. The comparison between traditional LLM prompting and the iterative, agentic approach is interesting regardless of whether there will be a pivotal shift in AI applications.
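The contrast between one-shot prompting and the iterative, agentic approach can be sketched in a few lines. A minimal sketch, assuming a hypothetical `llm()` function (stubbed below to simulate a draft/critique/revise cycle; a real implementation would call a model API):

```python
# Sketch: one-shot prompting vs. an iterative (agentic) draft/critique/revise loop.
# `llm` is a stand-in for a real model call; here it just simulates revisions.

def llm(prompt: str) -> str:
    # Placeholder model: critiques "v1" drafts, accepts later ones.
    if "Critique" in prompt:
        return "needs more detail" if "v1" in prompt else "OK"
    return "draft v2" if "needs more detail" in prompt else "draft v1"

def one_shot(task: str) -> str:
    # Single generation, no feedback loop.
    return llm(task)

def agentic(task: str, max_rounds: int = 3) -> str:
    # Generate, then iterate: critique the draft and revise until accepted.
    draft = llm(task)
    for _ in range(max_rounds):
        critique = llm(f"Critique this answer: {draft}")
        if critique == "OK":
            break
        draft = llm(f"Revise using this feedback: {critique}")
    return draft

print(one_shot("Write a summary"))  # draft v1
print(agentic("Write a summary"))   # draft v2 (improved after one revision round)
```

The extra model calls are exactly where the cost/latency trade-off discussed below comes from.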
(Andrew Ng speaks about what's next for AI agentic workflows: planning and multi-agent collaboration. Planning could be the "ChatGPT moment" for AI agents.)
I'm excited to see progress on SWE-bench and new benchmarks for even more complex/bigger tasks. The performance leap with iterative workflows is compelling.
- Landscape: https://github.com/e2b-dev/awesome-ai-agents
- Papers:
- ReAct: Synergizing Reasoning and Acting in Language Models by Princeton University and Google Brain (ICLR 2023)
  - HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face by Shen et al. (2023) - The group uses ChatGPT to conduct task planning when receiving a user request, selects models according to their function descriptions available in Hugging Face, executes each subtask with the selected AI model, and summarizes the response according to the execution results.
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation by Wu et al. (2023)
- Communicative Agents for Software Development by Qian et al. (2023) - At the core of this multi-agent collaboration paradigm lies ChatDev, a virtual chat-powered software development company.
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents by Tsinghua University et al. (2023)
- Large Language Models as Tool Makers by Google Deepmind et al. (ICLR 2024)
- More Agents Is All You Need by Tencent (2024) - The study explores the scaling property of agents created by LLMs. It finds that increasing the number of agents improves performance when using a simple sampling-and-voting method. This approach eliminates the need for complex frameworks, such as the CoT pipeline or multi-agent collaboration systems, to solve complex problems.
- https://github.com/hyp1231/awesome-llm-powered-agent
- https://github.com/lafmdp/Awesome-Papers-Autonomous-Agent
- https://github.com/tmgthb/Autonomous-Agents
- https://github.com/WooooDyy/LLM-Agent-Paper-List
- Survey papers: https://github.com/taichengguo/LLM_MultiAgents_Survey_Papers
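The sampling-and-voting method from "More Agents Is All You Need" above is simple enough to sketch: sample the same query from N independent agents and take the majority answer. The sample answers below are hard-coded stand-ins; in practice each would be a separate stochastic LLM call.

```python
from collections import Counter

# Stand-in answers from N independent "agents" given the same query;
# in practice each entry would come from a separate stochastic LLM call.
samples = ["42", "42", "41", "42", "43", "42", "42"]

def majority_vote(answers):
    # Pick the most frequent answer across agents.
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(samples))  # "42": 5 of 7 agents agree
```

No planning framework or inter-agent communication is needed; the gain comes purely from aggregating more samples.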
- Articles:
- LLM Powered Autonomous Agents by Lilian Weng (2023). 🔥🔥🔥
  - Planning is a key design pattern of LLM-based agents by Andrew Ng (2024) - Planning, in which an LLM autonomously decides what sequence of steps to execute to accomplish a larger task. 🔥
  - Tool use is a key design pattern of LLM-based agents by Andrew Ng (2024) - Tool use, in which an LLM is given functions it can request to call for gathering information, taking action, or manipulating data. 🔥
- Introduction to LLM Agents (Part 1) by NVIDIA (2023).
- Building Your First LLM Agent Application (Part 2) by NVIDIA (2023).
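The tool-use pattern Andrew Ng describes above boils down to a dispatch loop: the model emits a structured function-call request, the host program executes it, and the result is fed back for a final answer. A minimal sketch with a stubbed model and a hypothetical `get_weather` tool (real systems would use a model API's native function-calling support):

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would hit a weather API.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt: str) -> str:
    # Stand-in for a model that decides whether to call a tool.
    if "Tool result" in prompt:
        return "It's sunny in Paris today."
    return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})

def run_agent(user_query: str) -> str:
    reply = fake_llm(user_query)
    try:
        call = json.loads(reply)   # did the model request a tool call?
    except json.JSONDecodeError:
        return reply               # plain answer, no tool needed
    result = TOOLS[call["tool"]](**call["args"])
    return fake_llm(f"Tool result: {result}\nAnswer the user: {user_query}")

print(run_agent("What's the weather in Paris?"))
# It's sunny in Paris today.
```

The host, not the model, runs the function, which is what keeps tool use auditable and sandboxable.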
- Applications:
- Enterprise-scale: https://github.com/mindsdb/mindsdb
- Programming: https://github.com/plandex-ai/plandex
- Cybersecurity: https://github.com/fr0gger/Awesome-GPT-Agents
- The "App Store" for GPT (unofficial): https://github.com/Anil-matcha/Awesome-GPT-Store
- Frameworks:
  - https://github.com/joaomdmoura/crewAI - Built on LangChain, so on larger-scale projects you might run into LangChain's limitations.
- https://github.com/langgenius/dify - Dify is an open-source LLM app development platform. Its intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- Practical aspects of building AI applications:
- Reducing LLM costs and latency with semantic cache - this can be a solution for speeding up Devin.
- GPTCache is a good library for creating semantic cache for LLM queries
  - Tips: fast token generation is important; generating more tokens, even from a lower-quality LLM, can give good results.
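A toy illustration of the semantic-cache idea (GPTCache does this properly): embed each query, and on a new query return the cached answer when similarity to a past query exceeds a threshold, skipping the expensive LLM call. The bag-of-words "embedding" below is a deliberately crude placeholder for a real sentence-embedding model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Crude placeholder embedding: bag of lowercase words.
    # A real semantic cache would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.entries = []          # list of (embedding, answer) pairs
        self.threshold = threshold

    def get(self, query: str):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]         # cache hit: skip the expensive LLM call
        return None                # miss: caller falls through to the LLM

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
print(cache.get("what is the capital of france?"))  # near-identical query: Paris
print(cache.get("how do llm agents work"))          # unrelated query: None
```

For an agent like Devin that re-issues many similar sub-queries, this is where the cost and latency savings come from.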
- Tweets:
  - Magic.dev is building agents with 99.9% accuracy and a frontier model?: https://twitter.com/altryne/status/1776284573443277053
- Raw video of Devin (not a cherry-picked demo): https://twitter.com/cognition_labs/status/1768341296836391311