AI/ML Toolkit

Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus on open source tools)

Table of Contents

Some of my other related gists

Image Generation

Automatic1111 (Stable Diffusion WebUI)

ComfyUI

Unsorted

Song / Audio Generation

Suno

Udio

Unsorted

  • https://github.com/facebookresearch/audiocraft
    • Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

    • https://github.com/facebookresearch/audiocraft#models
      • At the moment, AudioCraft contains the training code and inference code for:

        • MusicGen: A state-of-the-art controllable text-to-music model.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md
            • MusicGen: Simple and Controllable Music Generation. AudioCraft provides the code and models for MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. (A minimal usage sketch follows at the end of this list.)

        • AudioGen: A state-of-the-art text-to-sound model.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/AUDIOGEN.md
            • AudioGen: Textually-guided audio generation. AudioCraft provides the code and a model re-implementing AudioGen, a textually-guided audio generation model that performs text-to-sound generation.

              The provided AudioGen reimplementation follows the LM model architecture introduced in MusicGen and is a single stage auto-regressive Transformer model trained over a 16kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. This model variant reaches similar audio quality to the original implementation introduced in the AudioGen publication while providing faster generation speed given the smaller frame rate.

        • EnCodec: A state-of-the-art high fidelity neural audio codec.

        • Multi Band Diffusion: An EnCodec compatible decoder using diffusion.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md
            • MultiBand Diffusion. AudioCraft provides the code and models for MultiBand Diffusion: From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion. MultiBand Diffusion is a collection of 4 models that can decode tokens from the EnCodec tokenizer into waveform audio.

        • MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
            • MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer. AudioCraft provides the code and models for MAGNeT, Masked Audio Generation using a Single Non-Autoregressive Transformer.

              MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike prior work on masked generative audio Transformers, such as SoundStorm and VampNet, MAGNeT doesn't require semantic token conditioning, model cascading or audio prompting, and employs a full text-to-audio pipeline using a single non-autoregressive Transformer.

  • https://cassetteai.com/
    • Cassette is your Copilot for AI Music Generation.

      Our cutting edge Artificial Intelligence technology built using Latent Diffusion models (LDMs) makes music production, customization & listening available to everyone. Creating music is now as simple as writing a prompt.
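
A minimal usage sketch for the MusicGen model from the AudioCraft entry above (not code from the gist itself): it assumes the audiocraft package and the facebook/musicgen-small checkpoint, following the documented get_pretrained / generate / audio_write flow.

```python
# Hedged sketch: generate a short clip with MusicGen via audiocraft.
# Model name, prompt, and duration are illustrative placeholders.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # small/medium/large/melody variants
model.set_generation_params(duration=8)  # seconds of audio per sample

descriptions = ["lo-fi hip hop beat with warm piano chords"]
wav = model.generate(descriptions)  # batch of waveforms at the model's 32 kHz sample rate

for idx, one_wav in enumerate(wav):
    # Write each sample to disk with loudness normalisation.
    audio_write(f"musicgen_{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```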

See Also

ollama

LangChain, LangServe, LangSmith, LangFlow, etc

AI Agents / etc

Agent Benchmarks / Leaderboards

  • See also:
  • https://github.com/zhangxjohn/LLM-Agent-Benchmark-List
    • LLM-Agent-Benchmark-List A benchmark list for evaluation of large language models.

  • https://github.com/THUDM/AgentBench
  • https://github.com/princeton-nlp/SWE-bench
    • [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?

    • SWE-bench is a benchmark for evaluating large language models on real world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.

    • https://www.swebench.com/
      • https://www.swebench.com/lite.html
        • SWE-bench Lite: A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers

        • SWE-bench was designed to provide a diverse set of codebase problems that were verifiable using in-repo unit tests. The full SWE-bench test split comprises 2,294 issue-commit pairs across 12 python repositories.

          Since its release, we've found that for most systems evaluating on SWE-bench, running each instance can take a lot of time and compute. We've also found that SWE-bench can be a particularly difficult benchmark, which is useful for evaluating LMs in the long term, but discouraging for systems trying to make progress in the short term.

          To remedy these issues, we've released a canonical subset of SWE-bench called SWE-bench Lite. SWE-bench Lite comprises 300 instances from SWE-bench that have been sampled to be more self-contained, with a focus on evaluating functional bug fixes. SWE-bench Lite covers 11 of the original 12 repositories in SWE-bench, with a similar diversity and distribution of repositories as the original. We perform similar filtering on the SWE-bench dev set to provide 23 development instances that can be useful for active development on the SWE-bench task. We recommend future systems evaluating on SWE-bench to report numbers on SWE-bench Lite in lieu of the full SWE-bench set if necessary.
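
The SWE-bench splits described above are also published on the Hugging Face Hub, so they can be inspected directly. A minimal sketch, assuming the princeton-nlp/SWE-bench_Lite dataset id and field names such as problem_statement:

```python
# Hedged sketch: browse SWE-bench Lite instances with the datasets library.
from datasets import load_dataset

lite = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
print(len(lite))  # ~300 instances per the description above

example = lite[0]
# Each instance pairs a repo/commit with a GitHub issue; a system under test
# must produce a patch that resolves the issue and passes the in-repo tests.
print(example["repo"], example["instance_id"])
print(example["problem_statement"][:300])
```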

OpenAI Assistants / ChatGPT custom GPTs

  • https://openai.com/blog/introducing-gpts
    • Introducing GPTs You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.

    • We’re rolling out custom versions of ChatGPT that you can create for a specific purpose—called GPTs. GPTs are a new way for anyone to create a tailored version of ChatGPT to be more helpful in their daily life, at specific tasks, at work, or at home—and then share that creation with others.

  • https://platform.openai.com/docs/assistants/overview
    • The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries.
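
A minimal sketch of that Assistants API flow (create an assistant, start a thread, add a message, run it) using the beta namespace of the openai Python SDK; the model name, instructions, and question are placeholders:

```python
# Hedged sketch of the Assistants API beta flow with the openai SDK.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions concisely.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-turbo",  # placeholder model name
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(thread_id=thread.id, role="user", content="Solve 3x + 11 = 14.")

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Messages are returned newest-first, so the first entry is the assistant's reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```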

OpenGPTs

  • https://github.com/langchain-ai/opengpts
    • This is an open source effort to create a similar experience to OpenAI's GPTs. It builds upon LangChain, LangServe and LangSmith. OpenGPTs gives you more control, allowing you to configure:

      • The LLM you use (choose between the 60+ that LangChain offers)

      • The prompts you use (use LangSmith to debug those)
      • The tools you give it (choose from LangChain's 100+ tools, or easily write your own)
      • The vector database you use (choose from LangChain's 60+ vector database integrations)
      • The retrieval algorithm you use
      • The chat history database you use

Autogen / FLAML / etc

  • https://github.com/microsoft/autogen
    • Enable Next-Gen Large Language Model Applications.

    • AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

      • AutoGen enables building next-gen LLM applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses.
      • It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology.
      • It provides a collection of working systems with different complexities. These systems span a wide range of applications from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.
      • AutoGen provides enhanced LLM inference. It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.
    • Roadmap: https://github.com/orgs/microsoft/projects/989/views/3
    • https://github.com/microsoft/autogen#multi-agent-conversation-framework
      • Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.

    • https://microsoft.github.io/autogen/blog/
    • https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenAssistant/
      • AutoGen Assistant: Interactively Explore Multi-Agent Workflows

      • To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Assistant, an interface powered by AutoGen. It allows you to:

        • Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
        • Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
        • Explicitly add skills to your agents and accomplish more tasks.
        • Publish your sessions to a local gallery.
        • AutoGen Assistant is open source, give it a try!
      • We are thrilled to introduce a new user-friendly interface: the AutoGen Assistant. Built upon the leading foundation of AutoGen and robust, modern web technologies like React.

      • With the AutoGen Assistant, users can rapidly create, manage, and interact with agents that can learn, adapt, and collaborate. As we release this interface into the open-source community, our ambition is not only to enhance productivity but to inspire a level of personalized interaction between humans and agents.

      • We recommend using a virtual environment (e.g., conda) to avoid conflicts with existing Python packages. With Python 3.10 or newer active in your virtual environment, use pip to install AutoGen Assistant: pip install autogenra

      • Once installed, run the web UI by entering the following in your terminal: autogenra ui --port 8081. This will start the application on the specified port. Open your web browser and go to http://localhost:8081/ to begin using AutoGen Assistant.

      • The AutoGen Assistant proposes some high-level concepts that help compose agents to solve tasks.

        • Agent Workflow: An agent workflow is a specification of a set of agents that can work together to accomplish a task. The simplest version of this is a setup with two agents – a user proxy agent (that represents a user i.e. it compiles code and prints result) and an assistant that can address task requests (e.g., generating plans, writing code, evaluating responses, proposing error recovery steps, etc.). A more complex flow could be a group chat where even more agents work towards a solution.
        • Session: A session refers to a period of continuous interaction or engagement with an agent workflow, typically characterized by a sequence of activities or operations aimed at achieving specific objectives. It includes the agent workflow configuration, the interactions between the user and the agents. A session can be “published” to a “gallery”.
        • Skills: Skills are functions (e.g., Python functions) that describe how to solve a task. In general, a good skill has a descriptive name (e.g. generate_images), extensive docstrings and good defaults (e.g., writing out files to disk for persistence and reuse). You can add new skills to the AutoGen Assistant via the provided UI. At inference time, these skills are made available to the assistant agent as they address your tasks.

        AutoGen Assistant comes with 3 example skills: fetch_profile, find_papers, generate_images. Please feel free to review the repo to learn more about how they work.

      • While the AutoGen Assistant is a web interface, it is powered by an underlying python API that is reusable and modular. Importantly, we have implemented an API where agent workflows can be declaratively specified (in JSON), loaded and run.

    • https://microsoft.github.io/autogen/blog/2023/11/26/Agent-AutoBuild/
      • Agent AutoBuild - Automatically Building Multi-agent Systems

      • Introducing AutoBuild, building multi-agent systems automatically, fast, and easily for complex tasks with minimal user prompting required, powered by a newly designed class, AgentBuilder. AgentBuilder also supports open-source LLMs by leveraging vLLM and FastChat.

      • In this blog, we introduce AutoBuild, a pipeline that can automatically build multi-agent systems for complex tasks. Specifically, we design a new class called AgentBuilder, which will complete the generation of participant expert agents and the construction of group chat automatically after the user provides descriptions of a building task and an execution task.

      • AutoBuild supports open-source LLM by vLLM and FastChat.

      • OpenAI Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. AutoBuild also supports the assistant API by adding use_oai_assistant=True to build().

    • https://microsoft.github.io/autogen/blog/2023/11/20/AgentEval/
      • How to Assess Utility of LLM-powered Applications?

      • As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?

      • We introduce AgentEval — the first version of the framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.

    • https://microsoft.github.io/autogen/blog/2023/11/13/OAI-assistants/
      • AutoGen Meets GPTs

      • OpenAI assistants are now integrated into AutoGen via GPTAssistantAgent. This enables multiple OpenAI assistants, which form the backend of the now popular GPTs, to collaborate and tackle complex tasks.

    • https://microsoft.github.io/autogen/blog/2023/11/09/EcoAssistant/
      • EcoAssistant - Using LLM Assistants More Accurately and Affordably

      • TL;DR:

        • Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
        • We show how to let the LLM assistant agent leverage external APIs to solve user queries.
        • We show how to reduce the cost of using GPT models via Assistant Hierarchy.
        • We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via Solution Demonstration.
    • https://microsoft.github.io/autogen/blog/2023/11/06/LMM-Agent/
      • Multimodal with GPT-4V and LLaVA

      • This blog post and the latest AutoGen update concentrate on visual comprehension. Users can input images, pose questions about them, and receive text-based responses from these LMMs. We support the gpt-4-vision-preview model from OpenAI and LLaVA model from Microsoft now.

    • https://microsoft.github.io/autogen/blog/2023/10/26/TeachableAgent/
      • AutoGen's TeachableAgent

      • We introduce TeachableAgent (which uses TextAnalyzerAgent) so that users can teach their LLM-based assistants new facts, preferences, and skills.

    • https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat/
      • Retrieval-Augmented Generation (RAG) Applications with AutoGen

      • TL;DR:

        • We introduce RetrieveUserProxyAgent and RetrieveAssistantAgent, RAG agents of AutoGen that allow retrieval-augmented generation, and their basic usage.
        • We showcase customizations of RAG agents, such as customizing the embedding function, the text split function and vector database.
        • We also showcase two advanced usages of RAG agents: integrating with group chat and building a Chat application with Gradio.
  • https://github.com/microsoft/FLAML
    • A Fast Library for Automated Machine Learning & Tuning

    • FLAML is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.

      • FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
      • For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range.
      • It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
    • Heads-up: We have migrated AutoGen into a dedicated github repository. Alongside this move, we have also launched a dedicated Discord server and a website for comprehensive documentation.
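
A minimal sketch of AutoGen's basic two-agent pattern described in the multi-agent conversation framework notes above: a UserProxyAgent that executes code, paired with an LLM-backed AssistantAgent. It assumes the pyautogen package and an OpenAI-style config; the model name, API key, and task are placeholders.

```python
# Hedged sketch of AutoGen's user-proxy + assistant conversation loop.
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": "sk-..."}]  # placeholder credentials

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # run without asking for human input
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# The user proxy drives the chat and executes any code blocks the assistant writes.
user_proxy.initiate_chat(
    assistant,
    message="Plot NVDA and TSLA year-to-date stock price change and save it as a PNG.",
)
```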

ChatDev

  • https://github.com/OpenBMB/ChatDev
    • Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

    • Communicative Agents for Software Development

    • ChatDev stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer, Chief Product Officer, Chief Technology Officer, programmer, reviewer, tester, art designer. These agents form a multi-agent organizational structure and are united by a mission to "revolutionize the digital world through programming." The agents within ChatDev collaborate by participating in specialized functional seminars, including tasks such as designing, coding, testing, and documenting. The primary objective of ChatDev is to offer an easy-to-use, highly customizable and extendable framework, which is based on large language models (LLMs) and serves as an ideal scenario for studying collective intelligence.

    • https://github.com/OpenBMB/ChatDev#-news
      • November 15th, 2023: We launched ChatDev as a SaaS platform that enables software developers and innovative entrepreneurs to build software efficiently at a very low cost and barrier to entry. Try it out at https://chatdev.modelbest.cn/

      • November 2nd, 2023: ChatDev is now supported with a new feature: incremental development, which allows agents to develop upon existing codes. Try --config "incremental" --path "[source_code_directory_path]" to start it.

      • October 26th, 2023: ChatDev is now supported with Docker for safe execution (thanks to contribution from ManindraDeMel). Please see Docker Start Guide.

      • September 25th, 2023: The Git mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set "git_management" to "True" in ChatChainConfig.json. See guide.

      • September 20th, 2023: The Human-Agent-Interaction mode is now available! You can get involved with the ChatDev team by playing the role of reviewer and making suggestions to the programmer; try python3 run.py --task [description_of_your_idea] --config "Human". See guide and example.

      • September 1st, 2023: The Art mode is available now! You can activate the designer agent to generate images used in the software; try python3 run.py --task [description_of_your_idea] --config "Art". See guide and example.

    • https://chatdev.modelbest.cn/

Unsorted

  • https://github.com/OpenBMB/AgentVerse
    • 🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation

    • Task-solving: This framework assembles multiple agents as an automatic multi-agent system (AgentVerse-Tasksolving, Multi-agent as system) to collaboratively accomplish the corresponding tasks. Applications: software development system, consulting system, etc.

    • Simulation: This framework allows users to set up custom environments to observe behaviors among, or interact with, multiple agents. Applications: game, social behavior research of LLM-based agents, etc.

    • https://arxiv.org/abs/2308.10848
      • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

      • Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework that can collaboratively and dynamically adjust its composition as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that the framework can effectively deploy multi-agent groups that outperform a single agent. Furthermore, we delve into the emergence of social behaviors among individual agents within a group during collaborative task accomplishment. In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.

    • https://developer.nvidia.com/blog/building-your-first-llm-agent-application/
      • Building Your First LLM Agent Application

  • https://gpt.chatcody.com/
  • https://dosu.dev/
    • Dosu is an AI teammate that lives in your GitHub repo, helping you respond to issues, triage bugs, and build better documentation.

    • How much does Dosu cost? Auto-labeling and backlog grooming are completely free! For Q&A and debugging, Dosu is free for 25 tickets per month. After that, paid plans start at $20 per month. A detailed pricing page is coming soon.

      At Dosu, we are strong advocates of OSS. If you maintain a project that is FOSS, part of the Cloud Native Computing Foundation (CNCF), or the Apache Software Foundation (ASF), please reach out to hi@dosu.dev about special free-tier plans

  • https://github.com/princeton-nlp/SWE-agent
    • SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models

    • SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

      On SWE-bench, SWE-agent resolves 12.29% of issues, achieving the state-of-the-art performance on the full test set.

    • Agent-Computer Interface (ACI). We accomplish these results by designing simple LM-centric commands and feedback formats to make it easier for the LM to browse the repository, view, edit and execute code files. We call this an Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.

      Just like how typical language models require good prompt engineering, good ACI design leads to much better results when using agents. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

  • https://github.com/paul-gauthier/aider
    • aider is AI pair programming in your terminal. Aider is a command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository. Aider will directly edit the code in your local source files, and git commit the changes with sensible commit messages. You can start a new project or work with an existing git repo. Aider is unique in that it lets you ask for changes to pre-existing, larger codebases.

    • https://aider.chat/
  • https://github.com/OpenDevin/OpenDevin
    • OpenDevin: Code Less, Make More

  • https://github.com/geekan/MetaGPT
  • https://github.com/Pythagora-io/gpt-pilot
  • https://github.com/blarApp/code-base-agent
  • https://github.com/cpacker/MemGPT
    • MemGPT allows you to build LLM agents with self-editing memory

    • Building persistent LLM agents with long-term memory

  • https://github.com/daveshap/OpenAI_Agent_Swarm
    • Hierarchical Autonomous Agent Swarm (HAAS)

    • The Hierarchical Autonomous Agent Swarm (HAAS) is a groundbreaking initiative that leverages OpenAI's latest advancements in agent-based APIs to create a self-organizing and ethically governed ecosystem of AI agents. Drawing inspiration from the ACE Framework, HAAS introduces a novel approach to AI governance and operation, where a hierarchy of specialized agents, each with distinct roles and capabilities, collaborate to solve complex problems and perform a wide array of tasks.

      The HAAS is designed to be a self-expanding system where a core set of agents, governed by a Supreme Oversight Board (SOB), can design, provision, and manage an arbitrary number of sub-agents tailored to specific needs. This document serves as a comprehensive guide to the theoretical underpinnings, architectural design, and operational principles of the HAAS.

    • https://github.com/daveshap/OpenAI_Agent_Swarm/discussions
  • https://github.com/daveshap/ACE_Framework
    • ACE (Autonomous Cognitive Entities) - 100% local and open source autonomous agents

    • We will be committed to using 100% open source software (OSS) for this project. This is to ensure maximum accessibility and democratic access.

  • https://github.com/ShishirPatil/gorilla
    • Gorilla: An API store for LLMs

    • Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them! Hop on our Discord, or open a PR, or email us if you would like to have your API incorporated as well.

    • https://gorilla.cs.berkeley.edu/
      • Gorilla: Large Language Model Connected with Massive APIs

    • https://github.com/ShishirPatil/gorilla/tree/main/openfunctions
      • Gorilla Openfunctions

      • Gorilla OpenFunctions extends the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context.

      • Comes with Parallel Function Calling!

      • OpenFunctions is compatible with OpenAI Functions

      • https://gorilla.cs.berkeley.edu/blogs/4_open_functions.html
      • OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context. Imagine if the LLM could fill in parameters for a variety of services, ranging from Instagram and DoorDash to tools like Google Calendar and Stripe. Even users who are less familiar with API calling procedures and programming can use the model to generate API calls to the desired function. Gorilla OpenFunctions is an LLM that we train using a curated set of API documentation, and Question-Answer pairs generated from the API documentation. We have continued to expand on the Gorilla Paradigm and sought to improve the quality and accuracy of valid function calling generation. This blog is about developing an open-source alternative for function calling similar to features seen in proprietary models, in particular, function calling in OpenAI's GPT-4. Our solution is based on the Gorilla recipe, and with a model with just 7B parameters, its accuracy is, surprisingly, comparable to GPT-4.

    • https://github.com/gorilla-llm/gorilla-cli
      • LLMs for your CLI

      • Gorilla CLI. Gorilla CLI powers your command-line interactions with a user-centric tool. Simply state your objective, and Gorilla CLI will generate potential commands for execution. Gorilla today supports ~1500 APIs, including Kubernetes, AWS, GCP, Azure, GitHub, Conda, Curl, Sed, and many more. No more recalling intricate CLI arguments! 🦍

OpenInterpreter

Vision / Multimodal

OpenAI

  • https://platform.openai.com/docs/guides/vision
    • Vision

    • Learn how to use GPT-4 to understand images

    • GPT-4 with Vision, sometimes referred to as GPT-4V or gpt-4-vision-preview in the API, allows the model to take in images and answer questions about them.
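
A minimal sketch of the request shape the vision guide describes: a user message whose content mixes text parts and image_url parts. The image URL here is a placeholder.

```python
# Hedged sketch of a gpt-4-vision-preview chat completion request.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/some-image.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```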

LLaVA / etc

  • https://llava-vl.github.io/
    • LLaVA: Large Language and Vision Assistant

    • LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.

    • LLaVA-1.5 achieves SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods that use billion-scale data.

    • Demo: https://llava.hliu.cc/
    • https://github.com/haotian-liu/LLaVA
      • LLaVA: Large Language and Vision Assistant

      • Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

      • https://github.com/haotian-liu/LLaVA#release
        • The following are just a couple of notes that jumped out at me:
        • 11/10 LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). Project Page Demo Code Paper

        • 11/2 LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. Project Page Demo Code Paper

        • 10/26 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts, script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.

        • 10/12 LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support!

        • 10/5 LLaVA-1.5 is out! Achieving SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods like Qwen-VL-Chat that use billion-scale data. Check out the technical report, and explore the demo! Models are available in Model Zoo.

        • 6/11 We released the preview for the most requested feature: DeepSpeed and LoRA support! Please see documentations here.

        • 6/1 We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical domain large language and vision models with GPT-4 level capabilities. Checkout the paper and page.

    • https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md
  • https://github.com/LLaVA-VL/LLaVA-Plus-Codebase
  • https://github.com/microsoft/LLaVA-Med
    • LLaVA-Med: Large Language and Vision Assistant for BioMedicine

    • Visual instruction tuning towards building large language and vision models with GPT-4 level capabilities in the biomedicine space.
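
For running LLaVA-1.5 locally, one option (separate from the upstream haotian-liu codebase) is the community conversion on the Hugging Face Hub. A hedged sketch, assuming the llava-hf/llava-1.5-7b-hf checkpoint, its documented "USER: <image> ... ASSISTANT:" prompt format, and a transformers version that includes LlavaForConditionalGeneration:

```python
# Hedged sketch: LLaVA-1.5 inference via Hugging Face transformers.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community conversion of LLaVA-1.5
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```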

Unsorted

  • https://github.com/tldraw/draw-a-ui
  • https://github.com/jordansinger/build-it-figma-ai
    • Draw and sketch UI in Figma and FigJam with this widget. Inspired by SawyerHood/draw-a-ui and tldraw/draw-a-ui

  • https://github.com/jordansinger/UIDraw
  • https://github.com/microsoft/SoM
    • Set-of-Mark Prompting for LMMs

    • Set-of-Mark Visual Prompting for GPT-4V

    • We present Set-of-Mark (SoM) prompting, simply overlaying a number of spatial and speakable marks on the images, to unleash the visual grounding abilities in the strongest LMM -- GPT-4V. Let's use visual prompting for vision!

    • https://arxiv.org/abs/2310.11441
      • Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

      • We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V. As illustrated in Fig. 1 (right), we employ off-the-shelf interactive segmentation models, such as SEEM/SAM, to partition an image into regions at different levels of granularity, and overlay these regions with a set of marks e.g., alphanumerics, masks, boxes. Using the marked image as input, GPT-4V can answer the questions that require visual grounding. We perform a comprehensive empirical study to validate the effectiveness of SoM on a wide range of fine-grained vision and multimodal tasks. For example, our experiments show that GPT-4V with SoM in zero-shot setting outperforms the state-of-the-art fully-finetuned referring expression comprehension and segmentation model on RefCOCOg. Code for SoM prompting is made public at: this https URL.

    • https://github.com/facebookresearch/segment-anything
      • Segment Anything

      • The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

      • The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.

    • https://github.com/UX-Decoder/Semantic-SAM
      • Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

      • In this work, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity. We have trained on the whole SA-1B dataset and our model can reproduce SAM and beyond it.

      • Segment everything for one image. We output controllable granularity masks from semantic, instance to part level when using different granularity prompts.

    • https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once
      • SEEM: Segment Everything Everywhere All at Once

      • [NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

      • We introduce SEEM that can Segment Everything Everywhere with Multi-modal prompts all at once. SEEM allows users to easily segment an image using prompts of different types including visual prompts (points, marks, boxes, scribbles and image segments) and language prompts (text and audio), etc. It can also work with any combination of prompts or generalize to custom prompts!

    • https://github.com/IDEA-Research/GroundingDINO
      • Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

    • https://github.com/IDEA-Research/OpenSeeD
      • [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

    • https://github.com/IDEA-Research/MaskDINO
      • [CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

    • https://github.com/facebookresearch/VLPart
      • [ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation

      • Object detection has been expanded from a limited number of categories to open vocabulary. Moving forward, a complete intelligent vision system requires understanding more fine-grained object descriptions, object parts. In this work, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.

  • https://github.com/OthersideAI/self-operating-computer
    • Self-Operating Computer Framework: A framework to enable multimodal models to operate a computer.

      Using the same inputs and outputs of a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.

  • https://github.com/ddupont808/GPT-4V-Act
    • GPT-4V-Act: Chromium Copilot

    • AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

    • GPT-4V-Act serves as an eloquent multimodal AI assistant that harmoniously combines GPT-4V(ision) with a web browser. It's designed to mirror the input and output of a human operator—primarily screen feedback and low-level mouse/keyboard interaction. The objective is to foster a smooth transition between human-computer operations, facilitating the creation of tools that considerably boost the accessibility of any user interface (UI), aid workflow automation, and enable automated UI testing.

    • GPT-4V-Act leverages both GPT-4V(ision) and Set-of-Mark Prompting, together with a tailored auto-labeler. This auto-labeler assigns a unique numerical ID to each interactable UI element.

      By incorporating a task and a screenshot as input, GPT-4V-Act can deduce the subsequent action required to accomplish a task. For mouse/keyboard output, it can refer to the numerical labels for exact pixel coordinates.

  • https://github.com/Jiayi-Pan/GPT-V-on-Web
    • 👀🧠 GPT-4 Vision x 💪⌨️ Vimium = Autonomous Web Agent

    • This project leverages GPT4V to create an autonomous / interactive web agent. The action space is discretized by Vimium.

  • https://github.com/bdekraker/WebcamGPT-Vision
    • Lightweight GPT-4 Vision processing over the Webcam

    • WebcamGPT-Vision is a lightweight web application that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The application captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results.
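
Returning to the Segment Anything entry above: SAM exposes a small promptable-mask API. A minimal sketch, assuming a downloaded ViT-H checkpoint and an RGB image loaded with OpenCV; the point coordinates are placeholders.

```python
# Hedged sketch of SAM's point-prompted mask prediction.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # downloaded checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point prompt (x, y); label 1 = foreground, 0 = background.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks at different granularities
)
print(masks.shape, scores)
```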

Vector Databases / etc

  • TODO

Benchmarks / Leaderboards

  • See also:
  • https://chat.lmsys.org/
    • LMSYS Chatbot Arena Leaderboard

  • https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
    • Open LLM Leaderboard

  • https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard
  • https://github.com/EleutherAI/lm-evaluation-harness
    • Language Model Evaluation Harness

    • A framework for few-shot evaluation of language models.

  • https://github.com/openai/evals
    • OpenAI Evals

    • Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

    • Evals provide a framework for evaluating large language models (LLMs) or systems built using LLMs. We offer an existing registry of evals to test different dimensions of OpenAI models and the ability to write your own custom evals for use cases you care about. You can also use your data to build private evals which represent the common LLM patterns in your workflow without exposing any of that data publicly.

      If you are building with LLMs, creating high quality evals is one of the most impactful things you can do. Without evals, it can be very difficult and time intensive to understand how different model versions might affect your use case.

  • https://github.com/openai/simple-evals
    • This repository contains a lightweight library for evaluating language models. We are open sourcing it so we can be transparent about the accuracy numbers we're publishing alongside our latest models (starting with gpt-4-turbo-2024-04-09). Evals are sensitive to prompting, and there's significant variation in the formulations used in recent publications and libraries. Some use few-shot prompts or role playing prompts ("You are an expert software programmer..."). These approaches are carryovers from evaluating base models (rather than instruction/chat-tuned models) and from models that were worse at following instructions.

      For this library, we are emphasizing the zero-shot, chain-of-thought setting, with simple instructions like "Solve the following multiple choice problem". We believe that this prompting technique is a better reflection of the models' performance in realistic usage.
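
The Evaluation Harness above can also be driven from Python rather than its CLI. A hedged sketch, assuming the refactored (v0.4+) lm_eval.simple_evaluate API and a small Hugging Face model; the task choice and batch size are placeholders.

```python
# Hedged sketch: run a single benchmark task through lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder model
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
# Per-task metrics (accuracy, normalised accuracy, etc.) live under results["results"].
print(results["results"]["hellaswag"])
```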

Prompts / Prompt Engineering / etc

  • https://github.com/mshumer/gpt-prompt-engineer
    • gpt-prompt-engineer. Prompt engineering is kind of like alchemy. There's no clear way to predict what will work best. It's all about experimenting until you find the right prompt. gpt-prompt-engineer is a tool that takes this experimentation to a whole new level.

      Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best.

    • Prompt Testing: The real magic happens after the generation. The system tests each prompt against all the test cases, comparing their performance and ranking them using an ELO rating system.

    • ELO Rating System: Each prompt starts with an ELO rating of 1200. As they compete against each other in generating responses to the test cases, their ELO ratings change based on their performance. This way, you can easily see which prompts are the most effective.

      • https://en.wikipedia.org/wiki/Elo_rating_system
        • The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess.

        • The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.

        • A player's Elo rating is a number which may change depending on the outcome of rated games played. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. If the higher-rated player wins, then only a few rating points will be taken from the lower-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred. The lower-rated player will also gain a few points from the higher rated player in the event of a draw. This means that this rating system is self-correcting. Players whose ratings are too low or too high should, in the long run, do better or worse correspondingly than the rating system predicts and thus gain or lose rating points until the ratings reflect their true playing strength.

  • https://github.com/dair-ai/Prompt-Engineering-Guide
  • https://github.com/daveshap/ChatGPT_Custom_Instructions
    • Repo of custom instructions that you can use for ChatGPT

  • https://github.com/daveshap/PTSD_prompts
    • GPT based PTSD experiments - USE AT OWN RISK - EXPERIMENTAL ONLY

  • https://github.com/yzfly/Awesome-Multimodal-Prompts
    • Awesome Multimodal Prompts

    • Prompts of GPT-4V & DALL-E3 to fully utilize the multi-modal ability. GPT4V Prompts, DALL-E3 Prompts.

  • https://arxiv.org/abs/2402.03620
    • Self-Discover: Large Language Models Self-Compose Reasoning Structures (submitted 6 Feb 2024). We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.
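
As a small worked example of the Elo mechanics quoted above (generic Elo math, not code from gpt-prompt-engineer): the expected score follows 1 / (1 + 10^((Rb - Ra) / 400)), and ratings move by K times the difference between actual and expected score.

```python
# Worked Elo example: expected score and rating update.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b); score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    ea = expected_score(rating_a, rating_b)
    eb = 1.0 - ea
    return rating_a + k * (score_a - ea), rating_b + k * ((1.0 - score_a) - eb)

print(round(expected_score(1300, 1200), 2))  # ~0.64, matching the 100-point example above
print(update(1200, 1200, 1.0))               # from equal ratings, the winner gains 16 points, the loser drops 16
```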

Other Useful Tools / Libraries / etc

Unsorted

  • See Also
  • https://pipedream.com/requestbin
    • Request Bin

    • Inspect webhooks and HTTP requests. Get a URL to collect HTTP or webhook requests and inspect them in a human-friendly way. Optionally connect APIs, run code and return a custom response on each request.

  • https://github.com/googleapis/release-please
    • Release Please. Release Please automates CHANGELOG generation, the creation of GitHub releases, and version bumps for your projects.

      It does so by parsing your git history, looking for Conventional Commit messages, and creating release PRs.

      It does not handle publication to package managers or handle complex branch management.

    • https://github.com/google-github-actions/release-please-action
      • automated releases based on conventional commits

      • Release Please Action: Automate releases with Conventional Commit Messages.

    • https://www.conventionalcommits.org/
  • https://github.com/winstonjs/winston
  • https://github.com/tldraw/tldraw
    • a very good whiteboard

    • tldraw is a collaborative digital whiteboard available at tldraw.com. Its editor, user interface, and other underlying libraries are open source and available in this repository. They are also distributed on npm. You can use tldraw to create a drop-in whiteboard for your product or as the foundation on which to build your own infinite canvas applications.

    • https://tldraw.dev/
      • You can use the Tldraw React component to embed a fully featured and extendable whiteboard in your app.

      • For multiplayer whiteboards, you can plug the component into the collaboration backend of your choice.

      • You can use the Editor API to create, update, and delete shapes, control the camera—or do just about anything else. You can extend tldraw with your own custom shapes and custom tools. You can use our user interface overrides to change the contents of menus and toolbars, or else hide the UI and replace it with your own.

      • If you want to go even deeper, you can use the TldrawEditor component as a more minimal engine without the default tldraw shapes or user interface.

  • JavaScript (full text) Search Libraries
    • https://www.npmjs.com/search?q=full%20text%20search
    • https://byby.dev/js-search-libraries
    • https://github.com/nextapps-de/flexsearch
      • Next-Generation full text search library for Browser and Node.js

      • Web's fastest and most memory-flexible full-text search library with zero dependencies.

      • When it comes to raw search speed FlexSearch outperforms every single searching library out there and also provides flexible search capabilities like multi-field search, phonetic transformations or partial matching.

        Depending on the options used it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called "contextual index" based on a pre-scored lexical dictionary architecture which actually performs queries up to 1,000,000 times faster compared to other libraries. FlexSearch also provides you a non-blocking asynchronous processing model as well as web workers to perform any updates or queries on the index in parallel through dedicated balanced threads.

      • https://github.com/nextapps-de/flexsearch#consumption
        • Memory Consumption

      • https://nextapps-de.github.io/flexsearch/bench/
        • Benchmark of Full-Text-Search Libraries (Stress Test)

      • https://nextapps-de.github.io/flexsearch/bench/match.html
        • Relevance Scoring Comparison

      • https://github.com/angeloashmore/react-use-flexsearch
        • React hook to search a FlexSearch index

        • The useFlexSearch hook takes your search query, index, and store and returns results as an array. Searches are memoized to ensure efficient searching.

    • https://github.com/krisk/fuse
      • Lightweight fuzzy-search, in JavaScript

      • Fuse.js is a lightweight fuzzy-search, in JavaScript, with zero dependencies.

      • https://www.fusejs.io/
    • https://github.com/weixsong/elasticlunr.js
      • Based on lunr.js, but more flexible and customized.

      • Elasticlunr.js: Elasticlunr.js is a lightweight full-text search engine developed in JavaScript for browser search and offline search. Elasticlunr.js is developed based on Lunr.js, but is more flexible than lunr.js. Elasticlunr.js provides Query-Time boosting, field search, a more rational scoring/ranking methodology, fast computation speed and so on. Elasticlunr.js is a bit like Solr, but much smaller and not as bright, while also providing flexible configuration, query-time boosting, field search and other features.

      • Contributors welcome! As I'm now focusing on a new domain, I hope that someone who is interested in this project could help maintain this repository.

      • http://elasticlunr.com/
    • https://github.com/olivernn/lunr.js
      • Lunr.js: A bit like Solr, but much smaller and not as bright

      • Lunr.js is a small, full-text search library for use in the browser. It indexes JSON documents and provides a simple search interface for retrieving documents that best match text queries.

      • For web applications with all their data already sitting in the client, it makes sense to be able to search that data on the client too. It saves adding extra, compacted services on the server. A local search index will be quicker, there is no network overhead, and will remain available and usable even without a network connection.

      • https://lunrjs.com/
    • https://github.com/apache/solr
      • Apache Solr

      • Solr is the popular, blazing fast open source search platform for all your enterprise, e-commerce, and analytics needs, built on Apache Lucene.

Node-based UI's, Graph Execution, Flow Based Programming, etc

  • https://github.com/xyflow/awesome-node-based-uis
    • A curated list with resources about node-based UIs

  • https://github.com/xyflow/xyflow
  • https://github.com/retejs/rete
    • JavaScript framework for visual programming

    • Rete.js is a framework for creating visual interfaces and workflows. It provides out-of-the-box solutions for visualization using various libraries and frameworks, as well as solutions for processing graphs based on dataflow and control flow approaches.

    • https://retejs.org/
      • A tailorable TypeScript-first framework for creating processing-oriented node-based editors

      • https://retejs.org/examples
        • https://retejs.org/examples/processing/dataflow
          • Data Flow

            This example showcases a data processing pipeline using rete-engine, where data flows from left to right through nodes. Each node features a data method, which receives arrays of incoming data from their respective input sockets and delivers an object containing data corresponding to the output sockets. To initiate their execution, you can make use of the engine.fetch method by specifying the identifier of the target node. Consequently, the engine will execute all predecessors recursively, extracting their output data and delivering it to the specified node.

        • https://retejs.org/examples/processing/control-flow
          • Control Flow

            This example showcases executing a schema via control flow using rete-engine, where each node dynamically decides which of its outgoing nodes will receive control. Each node features an execute method that takes an input port key as a control source, and a function for conveying control to outgoing nodes through a defined output port. To initiate the execution of the flow, you can use the engine.execute method, specifying the identifier of the starting node. Consequently, the outgoing nodes will be executed sequentially, starting from the designated node.

        • https://retejs.org/examples/processing/hybrid-engine
          • Hybrid Engine

            This example shows how rete-engine allows for the simultaneous integration of both dataflow and control flow. Consequently, certain nodes serve as data sources, others manage the flow, and a third set incorporates both of these approaches.

        • https://retejs.org/examples/modules
          • This example showcases a schema reusability technique, where processing is carried out using DataflowEngine. This is accomplished by creating a dedicated Module node that loads a nested schema containing Input and Output nodes, subsequently generating corresponding sockets. As a result, the module node initializes the engine, feeds it with input data, executes it, and retrieves the output data.

        • https://retejs.org/examples/scopes
          • Scopes

            The structures shown in this example may also be referred to as subgraphs or nested nodes. This functionality is achieved using the advanced rete-scopes-plugin plugin. Changing a node's parent is easy: simply long-press the node and move it over the new parent node.

        • https://retejs.org/examples/selectable-connections
          • Selectable connections. The editor doesn't offer a built-in connection selection feature. However, if you're using BidirectFlow and can't delete connections from the UI, or you need to select connections for other purposes, you can create a custom connection and sync it with AreaExtensions.selector

        • https://retejs.org/examples/reroute
          • Reroute. This particular example shows the usage of a plugin designed for user-controlled connection rerouting. Users can insert rerouting points by clicking on a connection or remove them by right-clicking. These points can be dragged or selected by users (similarly to nodes) to move multiple points at once.

        • https://retejs.org/examples/codegen
      • https://retejs.org/docs
        • Visualization: you can choose React.js, Vue.js, Angular or Svelte to visualize nodes, sockets, controls, and connections. These visual components can be tailored to your specific needs by creating custom components for each framework, and they can all coexist in a single editor.

        • Processing: the framework offers various types of engines that enable processing diagrams based on their nature, including dataflow and control flow. These types can be combined within the same graph.

      • https://retejs.org/docs/development/rete-kit
        • The purpose of this tool is to improve efficiency when developing plugins or projects using this framework.

      • https://retejs.org/docs/api/rete-engine
        • DataflowEngine is a plugin that integrates Dataflow with NodeEditor, making it easy to use. Additionally, it provides a cache for the data of each node in order to avoid recurring calculations.

        • ControlFlowEngine is a plugin that integrates ControlFlow with NodeEditor, making it easy to use.

  • https://github.com/graphology/graphology
    • Graphology

      graphology is a robust & multipurpose Graph object for JavaScript and TypeScript.

      It aims at supporting various kinds of graphs with the same unified interface.

      A graphology graph can therefore be directed, undirected or mixed, allow self-loops or not, and can be simple or support parallel edges.

      Along with this Graph object, one will also find a comprehensive standard library full of graph theory algorithms and common utilities such as graph generators, layouts, traversals etc.

      Finally, graphology graphs are able to emit a wide variety of events, which makes them ideal to build interactive renderers for the browser.

    • https://graphology.github.io/
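
      A minimal sketch of the Graph object described above (directed graph, node/edge attributes, and events; the node names are arbitrary):

        import Graph from "graphology";

        const graph = new Graph({ type: "directed" });

        graph.addNode("prompt", { kind: "input" });
        graph.addNode("llm", { kind: "model" });
        graph.addEdge("prompt", "llm");

        // graphs are event emitters, which is what makes them handy for interactive renderers
        graph.on("nodeAdded", ({ key }) => console.log(`node added: ${key}`));
        graph.addNode("output", { kind: "sink" });

        console.log(graph.order, graph.size); // 3 nodes, 1 edge
        graph.forEachNeighbor("llm", (neighbor) => console.log(neighbor)); // "prompt"
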
  • https://github.com/cytoscape/cytoscape.js
  • https://github.com/jagenjo/litegraph.js
    • A graph node engine and editor written in JavaScript, similar to PD or UDK Blueprints; it comes with its own editor in HTML5 Canvas2D. The engine can run client side or server side using Node. It allows graphs to be exported as JSON for inclusion in applications independently.
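
      A hedged sketch based on the usage shown in the litegraph.js README (the basic/const and basic/watch node types and the module import shape are recalled from that README, not verified here):

        const { LGraph, LiteGraph } = require("litegraph.js");

        const graph = new LGraph();

        const constNode = LiteGraph.createNode("basic/const");
        constNode.setValue(4.5);
        graph.add(constNode);

        const watchNode = LiteGraph.createNode("basic/watch");
        graph.add(watchNode);

        // connect output slot 0 of the const node to input slot 0 of the watch node
        constNode.connect(0, watchNode, 0);

        graph.start(); // run the graph (works headless or attached to an LGraphCanvas)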

  • https://github.com/noflo/noflo
    • NoFlo: Flow-based programming for JavaScript

      NoFlo is an implementation of flow-based programming for JavaScript running on both Node.js and the browser. From Wikipedia:

      In computer science, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.
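
      To make the "black box process" idea concrete, here is a hedged sketch of a NoFlo component written against its Process API (the port names and the uppercase transform are made up; the wiring between components lives in the graph, not in this code):

        const noflo = require("noflo");

        exports.getComponent = () => {
          const c = new noflo.Component({
            description: "Uppercases incoming strings",
            inPorts: { in: { datatype: "string" } },
            outPorts: { out: { datatype: "string" } },
          });
          return c.process((input, output) => {
            // the component only sees packets arriving on its input ports...
            if (!input.hasData("in")) return;
            const data = input.getData("in");
            // ...and sends packets out through its output ports
            output.sendDone({ out: data.toUpperCase() });
          });
        };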

    • NoFlo itself is just a library for implementing flow-based programs in JavaScript. There is an ecosystem of tools around NoFlo and the fbp protocol that make it more powerful. Here are some of them:

      • Flowhub -- browser-based visual programming IDE for NoFlo and other flow-based systems
      • noflo-nodejs -- command-line interface for running NoFlo programs on Node.js
      • noflo-browser-app -- template for building NoFlo programs for the web
      • noflo-assembly -- industrial approach for designing NoFlo programs
      • fbp-spec -- data-driven tests for NoFlo and other FBP environments
      • flowtrace -- tool for retroactive debugging of NoFlo programs. Supports visual replay with Flowhub

      See also the list of reusable NoFlo modules on NPM.

    • https://noflojs.org/
    • https://flowhub.io/ide/
      • Flowhub IDE is a tool for building full-stack applications in a visual way. With the ecosystem of flow-based programming environments, you can use Flowhub to create anything from distributed data processing applications to internet-connected artworks.

    • https://flowbased.github.io/fbp-protocol/
      • FBP Network Protocol

        The Flow-Based Programming network protocol (FBP protocol) has been designed primarily for flow-based programming interfaces like Flowhub to communicate with various FBP runtimes. However, it can also be utilized for communication between different runtimes, for example server-to-server or server-to-microcontroller.

      • https://github.com/flowbased/fbp
        • FBP flow definition language parser

          The fbp library provides a parser for a domain-specific language for flow-based programming (FBP), used for defining graphs for FBP programming environments like NoFlo, MicroFlo and MsgFlo.

    • https://en.wikipedia.org/wiki/Flow-based_programming
      • In computer programming, flow-based programming (FBP) is a programming paradigm that defines applications as networks of black box processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.

      • https://en.wikipedia.org/wiki/Component-based_software_engineering
        • Component-based software engineering (CBSE), also called component-based development (CBD), is a style of software engineering that aims to build software out of loosely-coupled, modular components. It emphasizes the separation of concerns among different parts of a software system.

  • https://nodered.org/
    • Node-RED is a programming tool for wiring together hardware devices, APIs and online services in new and interesting ways.

      It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette that can be deployed to its runtime in a single-click.

    • https://github.com/node-red/node-red
      • Low-code programming for event-driven applications

    • https://nodered.org/docs/api/modules/v/1.3/@node-red_runtime.html
      • @node-red/runtime This module provides the core runtime component of Node-RED. It does not include the Node-RED editor. All interaction with this module is done using the api provided.

      • https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/index.js#L125-L234
        • var redNodes = require("./nodes");
        • // Start the runtime
          function start() {
              // ..snip..
              return redNodes.load().then(function() {
                  // ..snip..
                  return redNodes.loadContextsPlugin().then(function () {
                      // ..snip..
                      redNodes.loadFlows().then(() => { redNodes.startFlows() }).catch(function(err) {});
                      started = true;
                      // ..snip..
                  });
              });
          }
      • https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/nodes/index.js#L198-L267
        • var registry = require("@node-red/registry");
          var flows = require("../flows");
          var context = require("./context");
        • module.exports = {
              // Lifecycle
              init: init,
              load: registry.load,
          
              // ..snip..
          
              // Flow handling
              loadFlows:  flows.load,
              startFlows: flows.startFlows,
              stopFlows:  flows.stopFlows,
              setFlows:   flows.setFlows,
              getFlows:   flows.getFlows,
          
              addFlow:     flows.addFlow,
              getFlow:     flows.getFlow,
              updateFlow:  flows.updateFlow,
              removeFlow:  flows.removeFlow,
          
              // ..snip..
          
              // Contexts
              loadContextsPlugin: context.load,
              closeContextsPlugin: context.close,
              listContextStores: context.listStores,
          };
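
    • For context on what that runtime ends up loading: a hedged sketch of a minimal custom node, following the pattern from the Node-RED "creating nodes" documentation (the lower-case example):

        // lower-case.js - registered with the runtime via RED.nodes.registerType
        module.exports = function (RED) {
          function LowerCaseNode(config) {
            RED.nodes.createNode(this, config);
            this.on("input", (msg, send, done) => {
              msg.payload = String(msg.payload).toLowerCase();
              send(msg); // pass the message on to the next node in the flow
              done();    // signal that processing has finished
            });
          }
          RED.nodes.registerType("lower-case", LowerCaseNode);
        };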

Unsorted

  • https://github.com/pytorch/torchtune
  • https://zapier.com/blog/train-chatgpt-to-write-like-you/
    • How to train ChatGPT to write like you

  • https://github.com/EleutherAI/gpt-neox
    • An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

    • GPT-NeoX

      This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. This library is in widespread use in academic, industry, and government labs, including by researchers at Oak Ridge National Lab, CarperAI, Stability AI, Together.ai, Korea University, Carnegie Mellon University, and the University of Tokyo, among others. Uniquely among similar libraries, GPT-NeoX supports a wide variety of systems and hardware, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.

      If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend you use the Hugging Face transformers library instead, which supports GPT-NeoX models.

    • https://github.com/EleutherAI/gpt-neox#why-gpt-neox
      • Why GPT-NeoX?

        GPT-NeoX leverages many of the same features and technologies as the popular Megatron-DeepSpeed library but with substantially increased usability and novel optimizations. Major features include:

        • Distributed training with ZeRO and 3D parallelism
        • Support for a wide variety of systems and hardware, including launching via Slurm, MPI, and the IBM Job Step Manager; it has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.
        • Cutting edge architectural innovations including rotary and alibi positional embeddings, parallel feedforward attention layers, and flash attention.
        • Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 & 2
        • Curriculum Learning
        • Easy connections with the open source ecosystem, including Hugging Face's tokenizers and transformers libraries, logging via WandB, and evaluation via our Language Model Evaluation Harness.
  • https://github.com/mozilla-Ocho/llamafile
    • Distribute and run LLMs with a single file

    • llamafile lets you distribute and run LLMs with a single file

      Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.

    • https://hacks.mozilla.org/2023/11/introducing-llamafile/
      • Introducing llamafile

  • https://github.com/microsoft/LLMLingua
    • To speed up LLM inference and enhance the model's perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.

    • https://github.com/microsoft/LLMLingua/blob/main/examples/Retrieval.ipynb
      • We know that LLMs have a 'lost in the middle' issue, where the position of key information in the prompt significantly impacts the final result.

      • How to establish an accurate positional relationship between the document and the question has become an important issue. We evaluated the effects of four types of reranker methods on a dataset (NaturalQuestions multi-document QA) that is very close to the actual RAG scenario (e.g. BingChat).

      • The results show that reranker-based methods are significantly better than embedding methods. The LongLLMLingua method is even better than the current SoTA reranker methods, and it can more accurately capture the relationship between the query and the document, thus alleviating the 'lost in the middle' issue.

    • https://llmlingua.com/
      • (Long)LLMLingua | Designing a Language for LLMs via Prompt Compression

    • https://blog.llamaindex.ai/longllmlingua-bye-bye-to-middle-loss-and-save-on-your-rag-costs-via-prompt-compression-54b559b9ddf7
      • LongLLMLingua: Bye-bye to Middle Loss and Save on Your RAG Costs via Prompt Compression

  • https://github.com/apoorvumang/prompt-lookup-decoding
    • In several LLM use cases where you're doing input grounded generation (summarization, document QA, multi-turn chat, code editing), there is high n-gram overlap between LLM input (prompt) and LLM output. This could be entity names, phrases, or code chunks that the LLM directly copies from the input while generating the output. Prompt lookup exploits this pattern to speed up autoregressive decoding in LLMs.

    • On both summarization and context-QA, we get a relatively consistent 2.4x speedup (on average).

    • https://twitter.com/apoorv_umang/status/1728831397153104255
      • Prompt lookup decoding: Get 2x-4x reduction in latency for input grounded LLM generation with no drop in quality using this speculative decoding technique

    • huggingface/transformers#27722
      • Adding support for prompt lookup decoding (variant of assisted generation)

    • ggerganov/llama.cpp#4226
      • lookahead-prompt: add example
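
    • A hedged, library-free sketch of the core idea (not the reference implementation): match the most recent n-gram of the generated tokens against the prompt, and propose the tokens that followed it there as draft candidates for the target model to verify, as in speculative decoding.

      function promptLookupCandidates(promptTokens, generatedTokens, ngramSize = 3, numDraft = 10) {
        if (generatedTokens.length < ngramSize) return [];
        const needle = generatedTokens.slice(-ngramSize).join(",");
        for (let i = 0; i + ngramSize < promptTokens.length; i++) {
          const window = promptTokens.slice(i, i + ngramSize).join(",");
          if (window === needle) {
            // the tokens that followed this n-gram in the prompt become the draft;
            // the target model then checks them all in a single forward pass
            return promptTokens.slice(i + ngramSize, i + ngramSize + numDraft);
          }
        }
        return []; // no overlap found: fall back to normal decoding
      }

      // e.g. the model is copying an entity name that appears in the prompt
      const prompt = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."];
      const generated = ["Answer", ":", "The", "Eiffel"];
      console.log(promptLookupCandidates(prompt, generated, 2, 3)); // ["Tower", "is", "in"]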

  • https://github.com/vercel/ai
    • Vercel AI SDK

      The Vercel AI SDK is a library for building AI-powered streaming text and chat UIs.

    • Build AI-powered applications with React, Svelte, Vue, and Solid

    • https://sdk.vercel.ai/docs
      • Vercel AI SDK

        An open source library for building AI-powered user interfaces.

        The Vercel AI SDK is an open-source library designed to help developers build conversational streaming user interfaces in JavaScript and TypeScript. The SDK supports React/Next.js, Svelte/SvelteKit, and Vue/Nuxt as well as Node.js, Serverless, and the Edge Runtime.
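
    • A hedged sketch of the client side, assuming the SDK's React useChat hook (as documented for the 2.x/3.x releases) and an /api/chat route on the server that streams model output:

      "use client";
      import { useChat } from "ai/react";

      export default function Chat() {
        // useChat manages the message list, streams tokens into it, and wires up the form state
        const { messages, input, handleInputChange, handleSubmit } = useChat();
        return (
          <form onSubmit={handleSubmit}>
            {messages.map((m) => (
              <p key={m.id}>{m.role}: {m.content}</p>
            ))}
            <input value={input} onChange={handleInputChange} placeholder="Say something..." />
          </form>
        );
      }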

  • https://github.com/oobabooga/text-generation-webui
    • A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

    • Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.

    • https://github.com/oobabooga/text-generation-webui-extensions
      • This is a directory of extensions for oobabooga/text-generation-webui

  • https://github.com/huggingface/chat-ui
    • Open source codebase powering the HuggingChat app

  • https://github.com/lm-sys/FastChat
  • https://github.com/vllm-project/vllm
  • https://github.com/philipturner/metal-benchmarks
    • Apple GPU microarchitecture

    • This document thoroughly explains the Apple GPU microarchitecture, focusing on its GPGPU performance. Details include latencies for each ALU assembly instruction, cache sizes, and the number of unique instruction pipelines. This document enables evidence-based reasoning about performance on the Apple GPU, helping people diagnose bottlenecks in real-world software. It also compares Apple silicon to generations of AMD and Nvidia microarchitectures, showing where it might exhibit different performance patterns. Finally, the document examines how Apple's design choices improve power efficiency compared to other vendors.

      This repository also contains open-source benchmarking scripts. They allow anyone to reproduce and verify the author's claims about performance. A complementary library reports the hardware specifications of any Apple-designed GPU.

      • https://github.com/philipturner/applegpuinfo
        • Print all known information about the GPU on Apple-designed chips

        • This is a mini-framework for querying parameters of an Apple-designed GPU. It also contains a command-line tool, gpuinfo, which reports information similarly to clinfo. It was co-authored with an AI.

        • https://github.com/Oblomov/clinfo
          • Print all known information about all available OpenCL platforms and devices in the system

          • clinfo is a simple command-line application that enumerates all possible (known) properties of the OpenCL platform and devices available on the system.

  • https://github.com/tinygrad/tinygrad
    • You like pytorch? You like micrograd? You love tinygrad! ❤️

    • This may not be the best deep learning framework, but it is a deep learning framework.

      Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.

    • https://tinygrad.org/
  • https://github.com/microsoft/DirectML
    • DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
