
AI/ML Toolkit

Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus on open source tools)

Table of Contents

Some of my other related gists

Image Generation

Automatic1111 (Stable Diffusion WebUI)

ComfyUI

Unsorted

Song / Audio Generation

Udio

Suno

Stable Audio

  • https://arxiv.org/abs/2404.10301
    • Long-form music generation with latent diffusion (2024)

    • Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.

    • https://stability-ai.github.io/stable-audio-2-demo/
      • stable-audio-2-demo

      • Additional creative capabilities Audio-to-audio With diffusion models it is possible to perform some degree of style transfer by initializing the noise with audio during sampling. This capability can be used to modify the aesthetics of an existing recording based on a given text prompt, whilst maintaining the reference audio’s structure (e.g., a beatbox recording could be style-transferred to produce realistic-sounding drums). As a result, our model can be influenced by not only text prompts but also audio inputs, enhancing its controllability and expressiveness. We noted that when initialized with voice recordings (such as beatbox or onomatopoeias), there is a sensation of control akin to an instrument.

      • Memorization analysis Recent works examined the potential of generative models to memorize training data, especially for repeated elements in the training set. Further, musicLM conducted a memorization analysis to address concerns on the potential misappropriation of creative content. Adhering to principles of responsible model development, we also run a comprehensive study on memorization.

        Considering the increased probability of memorizing repeated music within the dataset, we start by studying if our training set contains repeated data. We embed all our training data using the LAION-CLAP audio encoder to select audios that are close in this space based on a manually set threshold. The threshold is set such that the selected audios correspond to exact replicas. With this process, we identify 5566 repeated audios in our training set.

        We compare our model’s generations against the training set in LAION-CLAP space. Generations are from 5566 prompts within the repeated training data (in-distribution), and 586 prompts from the Song Describer Dataset (no-singing, out-of-distribution). We then identify the top-50 generated music that is closest to the training data and listen.

        We extensively listened to potential memorization candidates, and could not find memorization.
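
        A minimal sketch of the kind of CLAP-embedding nearest-neighbour check described above (not the paper's actual code; the file paths and top-k value are placeholders):

        import numpy as np
        import laion_clap

        # Load a pretrained LAION-CLAP audio encoder (downloads a default checkpoint)
        clap = laion_clap.CLAP_Module(enable_fusion=False)
        clap.load_ckpt()

        train_files = ["train_0001.wav", "train_0002.wav"]  # hypothetical training audio
        gen_files = ["generation_0001.wav"]                  # hypothetical model outputs

        train_emb = clap.get_audio_embedding_from_filelist(x=train_files, use_tensor=False)
        gen_emb = clap.get_audio_embedding_from_filelist(x=gen_files, use_tensor=False)

        # Cosine similarity between every generation and every training clip
        train_emb /= np.linalg.norm(train_emb, axis=1, keepdims=True)
        gen_emb /= np.linalg.norm(gen_emb, axis=1, keepdims=True)
        sims = gen_emb @ train_emb.T

        # For each generation, surface the closest training items for manual listening
        top_k = np.argsort(-sims, axis=1)[:, :50]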

  • https://www.stableaudio.com/
    • Stable Audio Create music with AI

    • https://www.stableaudio.com/user-guide/text-to-audio
      • Text-to-audio

    • https://www.stableaudio.com/user-guide/audio-to-audio
      • Audio-to-audio

    • https://www.stableaudio.com/user-guide/model-2
      • Stable Audio 2.0 Model

      • Our groundbreaking Stable Audio AudioSparx 2.0 model has been designed to generate full tracks with coherent structure at 3 minutes and 10 seconds. Our new model is available for everyone to generate full tracks on our Stable Audio product.

      • Key features:

        • Stable Audio 2.0 sets a new standard in AI generated audio, producing high-quality, full tracks with coherent musical structure up to three minutes in length at 44.1KHz stereo.
        • The new model introduces audio-to-audio generation by allowing users to upload and transform samples using natural language prompts.
        • Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.
  • https://stability.ai/news?tags=Audio

AudioCraft: MusicGen, AudioGen, etc

  • https://github.com/facebookresearch/audiocraft
    • Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

    • https://github.com/facebookresearch/audiocraft#models
      • At the moment, AudioCraft contains the training code and inference code for:

        • MusicGen: A state-of-the-art controllable text-to-music model.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md
            • MusicGen: Simple and Controllable Music Generation AudioCraft provides the code and models for MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.
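
              The MusicGen models can be used via audiocraft's Python API; a minimal sketch along the lines of the official docs (the model name and prompts here are just examples):

              from audiocraft.models import MusicGen
              from audiocraft.data.audio import audio_write

              # Load a pretrained checkpoint and set generation length (seconds)
              model = MusicGen.get_pretrained('facebook/musicgen-small')
              model.set_generation_params(duration=8)

              # Text-conditioned generation; returns a batch of waveforms at model.sample_rate (32 kHz)
              descriptions = ['lo-fi hip hop with warm vinyl crackle', 'energetic EDM drop']
              wav = model.generate(descriptions)

              for idx, one_wav in enumerate(wav):
                  # Write each sample to disk with loudness normalization
                  audio_write(f'musicgen_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")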

        • AudioGen: A state-of-the-art text-to-sound model.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/AUDIOGEN.md
            • AudioGen: Textually-guided audio generation AudioCraft provides the code and a model re-implementing AudioGen, a textually-guided audio generation model that performs text-to-sound generation.

              The provided AudioGen reimplementation follows the LM model architecture introduced in MusicGen and is a single stage auto-regressive Transformer model trained over a 16kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. This model variant reaches audio quality similar to the original implementation introduced in the AudioGen publication, while providing faster generation speed given the smaller frame rate.

        • EnCodec: A state-of-the-art high fidelity neural audio codec.

        • Multi Band Diffusion: An EnCodec compatible decoder using diffusion.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md
            • MultiBand Diffusion AudioCraft provides the code and models for MultiBand Diffusion, From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion. MultiBand diffusion is a collection of 4 models that can decode tokens from EnCodec tokenizer into waveform audio.

        • MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
            • MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer AudioCraft provides the code and models for MAGNeT, Masked Audio Generation using a Single Non-Autoregressive Transformer.

              MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike prior work on masked generative audio Transformers, such as SoundStorm and VampNet, MAGNeT doesn't require semantic token conditioning, model cascading or audio prompting, and employs a full text-to-audio using a single non-autoregressive Transformer.

Neural Audio Codecs

  • https://haoheliu.github.io/SemantiCodec/
    • SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    • Highlights

      • Ultra-low bitrate: We focus on bitrates between 0.31 kbps and 1.43 kbps, with token rates of 25, 50, or 100 per second.
      • Strong semantics in the audio tokens: Indicated by classification accuracy.
      • Supports variable vocabulary sizes: One model supports four different vocabulary sizes.
    • https://github.com/haoheliu/SemantiCodec
      • SemantiCodec

    • https://arxiv.org/abs/2405.00233
      • SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

      • Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these challenges, we introduce SemantiCodec, a novel codec designed to compress audio into fewer than a hundred tokens per second across diverse audio types, including speech, general audio, and music, without compromising quality. SemantiCodec features a dual-encoder architecture: a semantic encoder using a self-supervised AudioMAE, discretized using k-means clustering on extensive audio data, and an acoustic encoder to capture the remaining details. The semantic and acoustic encoder outputs are used to reconstruct audio via a diffusion-model-based decoder. SemantiCodec is presented in three variants with token rates of 25, 50, and 100 per second, supporting a range of ultra-low bit rates between 0.31 kbps and 1.43 kbps. Experimental results demonstrate that SemantiCodec significantly outperforms the state-of-the-art Descript codec on reconstruction quality. Our results also suggest that SemantiCodec contains significantly richer semantic information than all evaluated audio codecs, even at significantly lower bitrates.
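
        A rough usage sketch based on the SemantiCodec README (argument names and values may differ between releases; the token rate and vocabulary size below are just one of the documented configurations):

        import soundfile as sf
        from semanticodec import SemantiCodec

        # One of the documented configurations: 100 tokens/sec, 16k semantic vocabulary
        codec = SemantiCodec(token_rate=100, semantic_vocab_size=16384)

        tokens = codec.encode("input.wav")   # audio -> discrete tokens
        waveform = codec.decode(tokens)      # tokens -> reconstructed audio

        sf.write("reconstruction.wav", waveform[0, 0], 16000)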

  • https://github.com/yangdongchao/AcademiCodec
    • AcademiCodec: An Open Source Audio Codec Model for Academic Research

    • Audio codec models are widely used in audio communication as a crucial technique for compressing audio into discrete representations. Nowadays, audio codec models are increasingly utilized in generation fields as intermediate representations. For instance, AudioLM is an audio generation model that uses the discrete representation of SoundStream as a training target, while VALL-E employs the Encodec model as an intermediate feature to aid TTS tasks. Despite their usefulness, two challenges persist: (1) training these audio codec models can be difficult due to the lack of publicly available training processes and the need for large-scale data and GPUs; (2) achieving good reconstruction performance requires many codebooks, which increases the burden on generation models. In this study, we propose a group-residual vector quantization (GRVQ) technique and use it to develop a novel High Fidelity Audio Codec model, HiFi-Codec, which only requires 4 codebooks. We train all the models using publicly available TTS data such as LibriTTS, VCTK, AISHELL, and more, with a total duration of over 1000 hours, using 8 GPUs. Our experimental results show that HiFi-Codec outperforms Encodec in terms of reconstruction performance despite requiring only 4 codebooks. To facilitate research in audio codec and generation, we introduce AcademiCodec, the first open-source audio codec toolkit that offers training codes and pre-trained models for Encodec, SoundStream, and HiFi-Codec.

    • https://github.com/yangdongchao/AcademiCodec#what-the-difference-between-soundstream-encodec-and-hifi-codec
      • In our view, the main difference between SoundStream and Encodec is the choice of discriminator. Encodec only uses an STFT-discriminator, which forces the STFT spectrogram to be more realistic. SoundStream uses two types of discriminator: one forces the waveform level to be more realistic, and one forces the spectrogram level to be more realistic. In our code, we adopt the waveform-level discriminator from HiFi-GAN and the spectrogram-level discriminator from Encodec. In theory, we think SoundStream enjoys better performance.

    • https://arxiv.org/abs/2305.02765
      • HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

      • Audio codec models are widely used in audio communication as a crucial technique for compressing audio into discrete representations. Nowadays, audio codec models are increasingly utilized in generation fields as intermediate representations. For instance, AudioLM is an audio generation model that uses the discrete representation of SoundStream as a training target, while VALL-E employs the Encodec model as an intermediate feature to aid TTS tasks. Despite their usefulness, two challenges persist: (1) training these audio codec models can be difficult due to the lack of publicly available training processes and the need for large-scale data and GPUs; (2) achieving good reconstruction performance requires many codebooks, which increases the burden on generation models. In this study, we propose a group-residual vector quantization (GRVQ) technique and use it to develop a novel High Fidelity Audio Codec model, HiFi-Codec, which only requires 4 codebooks. We train all the models using publicly available TTS data such as LibriTTS, VCTK, AISHELL, and more, with a total duration of over 1000 hours, using 8 GPUs. Our experimental results show that HiFi-Codec outperforms Encodec in terms of reconstruction performance despite requiring only 4 codebooks. To facilitate research in audio codec and generation, we introduce AcademiCodec, the first open-source audio codec toolkit that offers training codes and pre-trained models for Encodec, SoundStream, and HiFi-Codec.

  • https://github.com/descriptinc/descript-audio-codec
    • Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN

    • State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio

      • With Descript Audio Codec, you can compress 44.1 KHz audio into discrete codes at a low 8 kbps bitrate.
      • That's approximately 90x compression while maintaining exceptional fidelity and minimizing artifacts.
      • Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio.
      • It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.)
    • https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5
      • Descript Audio Codec Welcome to the demo page for the paper “High Fidelity Compression Algorithm with Improved RVQGAN”. Here, we provide samples from our ablation studies and other competitive baselines.

    • https://arxiv.org/abs/2306.06546
      • High-Fidelity Audio Compression with Improved RVQGAN

      • Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high quality neural compression model that can compress high-dimensional natural signals into lower dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 KHz audio into tokens at just 8kbps bandwidth. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses. We compress all domains (speech, environment, music, etc.) with a single universal model, making it widely applicable to generative modeling of all audio. We compare with competing audio compression algorithms, and find our method outperforms them significantly. We provide thorough ablations for every design choice, as well as open-source code and trained model weights. We hope our work can lay the foundation for the next generation of high-fidelity audio modeling.
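
        A minimal encode/decode sketch along the lines of the descript-audio-codec README (the model type and input path are examples; helper names may differ between versions):

        import dac
        from audiotools import AudioSignal

        # Download and load the pretrained 44.1 kHz model
        model_path = dac.utils.download(model_type="44khz")
        model = dac.DAC.load(model_path)
        model.to("cuda")

        # Load audio and move it to the model's device
        signal = AudioSignal("input.wav")
        signal.to(model.device)

        # Encode to the quantized latent / discrete codes, then decode back to audio
        x = model.preprocess(signal.audio_data, signal.sample_rate)
        z, codes, latents, _, _ = model.encode(x)
        y = model.decode(z)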

    • https://github.com/DBraun/DAC-JAX
      • DAC-JAX

      • A JAX Implementation of the Descript Audio Codec

      • Descript Audio Codec (.dac) is a high-fidelity general neural audio codec introduced in the paper "High-Fidelity Audio Compression with Improved RVQGAN". This repository is an unofficial JAX implementation of the PyTorch-based DAC and has no affiliation with Descript.

  • https://github.com/AudiogenAI/agc
    • Audiogen Codec (agc) We are announcing the open source release of Audiogen Codec (agc) 🎉. A low compression 48khz stereo neural audio codec for general audio, optimizing for audio fidelity 🎵.

      It comes in two flavors:

      • agc-continuous 🔄 KL regularized, 32 channels, 100hz.
      • agc-discrete 🔢 24 stages of residual vector quantization, 50hz.

      AGC (Audiogen Codec) is a convolutional autoencoder based on the DAC architecture, which holds SOTA 🏆. We found that training with EMA and adding a perceptual loss term with CLAP features improved performance. These codecs, being low compression, outperform Meta's EnCodec and DAC on general audio as validated from internal blind ELO games 🎲.

      We trained (relatively) very low compression codecs in the pursuit of solving a core issue regarding general music and audio generation, low acoustic quality and audible artifacts, which hinder industry use for these models 🚫🎶. Our hope is to encourage researchers to build hierarchical generative audio models that can efficiently use high sequence length representations without sacrificing semantic abilities 🧠.

      This codec will power Audiogen's upcoming models. Stay tuned! 🚀

    • https://audiogen.notion.site/Audiogen-Codec-Examples-546fe64596f54e20be61deae1c674f20
      • Audiogen Codec Examples

  • https://github.com/facebookresearch/encodec
    • EnCodec: High Fidelity Neural Audio Compression

    • State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

    • We provide our two multi-bandwidth models:

      • A causal model operating at 24 kHz on monophonic audio trained on a variety of audio data.
      • A non-causal model operating at 48 kHz on stereophonic audio trained on music-only data.

      The 24 kHz model can compress to 1.5, 3, 6, 12 or 24 kbps, while the 48 kHz model supports 3, 6, 12 and 24 kbps. We also provide a pre-trained language model for each of the models, which can further compress the representation by up to 40% without any further loss of quality.
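
      A minimal sketch of compressing audio to discrete codes with the 24 kHz model, following the EnCodec README (the input path is a placeholder):

      import torch
      import torchaudio
      from encodec import EncodecModel
      from encodec.utils import convert_audio

      # Causal 24 kHz mono model; pick a target bandwidth (1.5, 3, 6, 12 or 24 kbps)
      model = EncodecModel.encodec_model_24khz()
      model.set_target_bandwidth(6.0)

      wav, sr = torchaudio.load("input.wav")
      wav = convert_audio(wav, sr, model.sample_rate, model.channels)
      wav = wav.unsqueeze(0)  # add batch dimension

      with torch.no_grad():
          encoded_frames = model.encode(wav)

      # Discrete codes of shape [batch, n_codebooks, timesteps]
      codes = torch.cat([frame[0] for frame in encoded_frames], dim=-1)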

    • https://github.com/facebookresearch/encodec#-transformers
    • https://arxiv.org/abs/2210.13438
      • High Fidelity Neural Audio Compression

      • We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion. We simplify and speed-up the training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model including: training objective, architectural changes and a study of various perceptual loss functions. We present an extensive subjective evaluation (MUSHRA tests) together with an ablation study for a range of bandwidths and audio domains, including speech, noisy-reverberant speech, and music. Our approach is superior to the baseline methods across all evaluated settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio.

Audio Super Resolution

Unsorted

  • https://cassetteai.com/
    • Cassette is your Copilot for AI Music Generation.

      Our cutting edge Artificial Intelligence technology built using Latent Diffusion models (LDMs) makes music production, customization & listening available to everyone. Creating music is now as simple as writing a prompt.

See Also

ollama

LangChain, LangServe, LangSmith, LangFlow, etc

AI Agents / etc

Agent Benchmarks / Leaderboards

  • See also:
  • https://github.com/zhangxjohn/LLM-Agent-Benchmark-List
    • LLM-Agent-Benchmark-List A benchmark list for evaluation of large language models.

  • https://github.com/THUDM/AgentBench
  • https://github.com/princeton-nlp/SWE-bench
    • [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?

    • SWE-bench is a benchmark for evaluating large language models on real world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.

    • https://www.swebench.com/
      • https://www.swebench.com/lite.html
        • SWE-bench Lite A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers

        • SWE-bench was designed to provide a diverse set of codebase problems that were verifiable using in-repo unit tests. The full SWE-bench test split comprises 2,294 issue-commit pairs across 12 python repositories.

          Since its release, we've found that for most systems evaluating on SWE-bench, running each instance can take a lot of time and compute. We've also found that SWE-bench can be a particularly difficult benchmark, which is useful for evaluating LMs in the long term, but discouraging for systems trying to make progress in the short term.

          To remedy these issues, we've released a canonical subset of SWE-bench called SWE-bench Lite. SWE-bench Lite comprises 300 instances from SWE-bench that have been sampled to be more self-contained, with a focus on evaluating functional bug fixes. SWE-bench Lite covers 11 of the original 12 repositories in SWE-bench, with a similar diversity and distribution of repositories as the original. We perform similar filtering on the SWE-bench dev set to provide 23 development instances that can be useful for active development on the SWE-bench task. We recommend future systems evaluating on SWE-bench to report numbers on SWE-bench Lite in lieu of the full SWE-bench set if necessary.
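
          For a quick look at the benchmark itself, the datasets are published on the Hugging Face Hub; a small sketch (dataset ID and field names as published by the SWE-bench authors at the time of writing):

          from datasets import load_dataset

          # 300 curated instances; the full benchmark is "princeton-nlp/SWE-bench"
          swebench_lite = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
          print(len(swebench_lite))

          example = swebench_lite[0]
          print(example["repo"], example["instance_id"])
          print(example["problem_statement"][:300])  # the GitHub issue text the agent must resolve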

  • https://github.com/aorwall/SWE-bench-docker
    • A Docker based solution of the SWE-bench evaluation framework

    • This is a Dockerfile based solution of the SWE-Bench evaluation framework.

      The solution is designed so that each "testbed" for testing a version of a repository is built in a separate Docker image. Each test is then run in its own Docker container. This approach ensures more stable test results because the environment is completely isolated and is reset for each test. Since the Docker container can be recreated each time, there's no need for reinstallation, speeding up the benchmark process.

OpenAI Assistants / ChatGPT custom GPTs

  • https://openai.com/blog/introducing-gpts
    • Introducing GPTs You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.

    • We’re rolling out custom versions of ChatGPT that you can create for a specific purpose—called GPTs. GPTs are a new way for anyone to create a tailored version of ChatGPT to be more helpful in their daily life, at specific tasks, at work, or at home—and then share that creation with others.

  • https://platform.openai.com/docs/assistants/overview
    • The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries.

OpenGPTs

  • https://github.com/langchain-ai/opengpts
    • This is an open source effort to create a similar experience to OpenAI's GPTs. It builds upon LangChain, LangServe and LangSmith. OpenGPTs gives you more control, allowing you to configure:

      • The LLM you use (choose between the 60+ that LangChain offers)

      • The prompts you use (use LangSmith to debug those)
      • The tools you give it (choose from LangChain's 100+ tools, or easily write your own)
      • The vector database you use (choose from LangChain's 60+ vector database integrations)
      • The retrieval algorithm you use
      • The chat history database you use

Autogen / FLAML / etc

  • https://github.com/microsoft/autogen
    • Enable Next-Gen Large Language Model Applications.

    • AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

      • AutoGen enables building next-gen LLM applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses.
      • It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology.
      • It provides a collection of working systems with different complexities. These systems span a wide range of applications from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.
      • AutoGen provides enhanced LLM inference. It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.
    • Roadmap: https://github.com/orgs/microsoft/projects/989/views/3
    • https://github.com/microsoft/autogen#multi-agent-conversation-framework
      • Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.
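
        The canonical two-agent example from the AutoGen docs looks roughly like this (it assumes an OAI_CONFIG_LIST file containing your API credentials):

        from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

        # Load model/API credentials from a local JSON config file
        config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")

        assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
        user_proxy = UserProxyAgent(
            "user_proxy",
            human_input_mode="NEVER",  # set to "ALWAYS" to stay in the loop
            code_execution_config={"work_dir": "coding", "use_docker": False},
        )

        # The two agents converse (and execute generated code locally) until the task is done
        user_proxy.initiate_chat(
            assistant,
            message="Plot a chart of NVDA and TSLA stock price change YTD.",
        )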

    • https://microsoft.github.io/autogen/blog/
    • https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenAssistant/
      • AutoGen Assistant: Interactively Explore Multi-Agent Workflows

      • To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Assistant, an interface powered by AutoGen. It allows you to:

        • Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
        • Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
        • Explicitly add skills to your agents and accomplish more tasks.
        • Publish your sessions to a local gallery.
        • AutoGen Assistant is open source, give it a try!
      • We are thrilled to introduce a new user-friendly interface: the AutoGen Assistant, built upon the leading foundation of AutoGen and robust, modern web technologies like React.

      • With the AutoGen Assistant, users can rapidly create, manage, and interact with agents that can learn, adapt, and collaborate. As we release this interface into the open-source community, our ambition is not only to enhance productivity but to inspire a level of personalized interaction between humans and agents.

      • We recommend using a virtual environment (e.g., conda) to avoid conflicts with existing Python packages. With Python 3.10 or newer active in your virtual environment, use pip to install AutoGen Assistant: pip install autogenra

      • Once installed, run the web UI by entering the following in your terminal: autogenra ui --port 8081. This will start the application on the specified port. Open your web browser and go to http://localhost:8081/ to begin using AutoGen Assistant.

      • The AutoGen Assistant proposes some high-level concepts that help compose agents to solve tasks.

        • Agent Workflow: An agent workflow is a specification of a set of agents that can work together to accomplish a task. The simplest version of this is a setup with two agents – a user proxy agent (that represents a user i.e. it compiles code and prints result) and an assistant that can address task requests (e.g., generating plans, writing code, evaluating responses, proposing error recovery steps, etc.). A more complex flow could be a group chat where even more agents work towards a solution.
        • Session: A session refers to a period of continuous interaction or engagement with an agent workflow, typically characterized by a sequence of activities or operations aimed at achieving specific objectives. It includes the agent workflow configuration, the interactions between the user and the agents. A session can be “published” to a “gallery”.
        • Skills: Skills are functions (e.g., Python functions) that describe how to solve a task. In general, a good skill has a descriptive name (e.g. generate_images), extensive docstrings and good defaults (e.g., writing out files to disk for persistence and reuse). You can add new skills to the AutoGen Assistant via the provided UI. At inference time, these skills are made available to the assistant agent as they address your tasks.

        AutoGen Assistant comes with 3 example skills: fetch_profile, find_papers, generate_images. Please feel free to review the repo to learn more about how they work.
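
        For illustration only (this is not one of the bundled skills), a skill is just a plain Python function with a descriptive name, a docstring, and defaults that persist results to disk, e.g.:

        import csv
        import datetime
        import json

        def summarize_csv(path: str, output_path: str = "csv_summary.json") -> dict:
            """Summarize a CSV file: row count, column names, and per-column non-empty counts.

            The summary is also written to `output_path` so it can be reused in later steps.
            """
            with open(path, newline="") as f:
                rows = list(csv.DictReader(f))
            columns = list(rows[0].keys()) if rows else []
            summary = {
                "path": path,
                "generated_at": datetime.datetime.utcnow().isoformat(),
                "row_count": len(rows),
                "columns": columns,
                "non_empty_counts": {c: sum(1 for r in rows if r.get(c)) for c in columns},
            }
            with open(output_path, "w") as f:
                json.dump(summary, f, indent=2)
            return summary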

      • While the AutoGen Assistant is a web interface, it is powered by an underlying python API that is reusable and modular. Importantly, we have implemented an API where agent workflows can be declaratively specified (in JSON), loaded and run.

    • https://microsoft.github.io/autogen/blog/2023/11/26/Agent-AutoBuild/
      • Agent AutoBuild - Automatically Building Multi-agent Systems

      • Introducing AutoBuild: building multi-agent systems automatically, quickly, and easily for complex tasks, with minimal user prompting required, powered by a newly designed class, AgentBuilder. AgentBuilder also supports open-source LLMs by leveraging vLLM and FastChat.

      • In this blog, we introduce AutoBuild, a pipeline that can automatically build multi-agent systems for complex tasks. Specifically, we design a new class called AgentBuilder, which will complete the generation of participant expert agents and the construction of group chat automatically after the user provides descriptions of a building task and an execution task.

      • AutoBuild supports open-source LLM by vLLM and FastChat.

      • OpenAI Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. AutoBuild also supports the assistant API by adding use_oai_assistant=True to build().
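
        A rough sketch of the AutoBuild flow described above, loosely following the blog post (parameter names and model IDs may have changed between AutoGen releases):

        import autogen
        from autogen.agentchat.contrib.agent_builder import AgentBuilder

        config_list = autogen.config_list_from_json(env_or_file="OAI_CONFIG_LIST")
        default_llm_config = {"temperature": 0}

        builder = AgentBuilder(config_file_or_env="OAI_CONFIG_LIST")

        building_task = "Generate experts that can find and summarize recent arxiv papers about LLM agents."
        # Pass use_oai_assistant=True here to back the generated agents with the OpenAI Assistants API
        agent_list, agent_configs = builder.build(building_task, default_llm_config)

        # Run the generated experts as a group chat on the actual execution task
        group_chat = autogen.GroupChat(agents=agent_list, messages=[], max_round=12)
        manager = autogen.GroupChatManager(
            groupchat=group_chat,
            llm_config={"config_list": config_list, **default_llm_config},
        )
        agent_list[0].initiate_chat(manager, message="Find one recent arxiv paper about LLM agents and summarize it.")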

    • https://microsoft.github.io/autogen/blog/2023/11/20/AgentEval/
      • How to Assess Utility of LLM-powered Applications?

      • As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?

      • We introduce AgentEval — the first version of the framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.

    • https://microsoft.github.io/autogen/blog/2023/11/13/OAI-assistants/
      • AutoGen Meets GPTs

      • OpenAI assistants are now integrated into AutoGen via GPTAssistantAgent. This enables multiple OpenAI assistants, which form the backend of the now popular GPTs, to collaborate and tackle complex tasks.

    • https://microsoft.github.io/autogen/blog/2023/11/09/EcoAssistant/
      • EcoAssistant - Using LLM Assistants More Accurately and Affordably

      • TL;DR:

        • Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
        • We show how to let the LLM assistant agent leverage external API to solve user query.
        • We show how to reduce the cost of using GPT models via Assistant Hierarchy.
        • We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via Solution Demonstration.
    • https://microsoft.github.io/autogen/blog/2023/11/06/LMM-Agent/
      • Multimodal with GPT-4V and LLaVA

      • This blog post and the latest AutoGen update concentrate on visual comprehension. Users can input images, pose questions about them, and receive text-based responses from these LMMs. We support the gpt-4-vision-preview model from OpenAI and LLaVA model from Microsoft now.

    • https://microsoft.github.io/autogen/blog/2023/10/26/TeachableAgent/
      • AutoGen's TeachableAgent

      • We introduce TeachableAgent (which uses TextAnalyzerAgent) so that users can teach their LLM-based assistants new facts, preferences, and skills.

    • https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat/
      • Retrieval-Augmented Generation (RAG) Applications with AutoGen

      • TL;DR:

        • We introduce RetrieveUserProxyAgent and RetrieveAssistantAgent, RAG agents of AutoGen that allow retrieval-augmented generation, and their basic usage (see the sketch after this list).
        • We showcase customizations of RAG agents, such as customizing the embedding function, the text split function and vector database.
        • We also showcase two advanced usages of RAG agents: integrating with group chat, and building a Chat application with Gradio.
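
        A rough sketch of the RetrieveChat setup from the blog post (the import paths and retrieve_config keys reflect the AutoGen version current at the time and may have moved since):

        from autogen import config_list_from_json
        from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
        from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

        config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")

        assistant = RetrieveAssistantAgent(
            name="assistant",
            system_message="You are a helpful assistant.",
            llm_config={"config_list": config_list},
        )

        # The RAG proxy chunks and embeds the docs, retrieves relevant context, and injects it into the chat
        ragproxyagent = RetrieveUserProxyAgent(
            name="ragproxyagent",
            retrieve_config={
                "task": "qa",
                "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
            },
        )

        ragproxyagent.initiate_chat(assistant, problem="What is AutoGen?")
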
  • https://github.com/microsoft/FLAML
    • A Fast Library for Automated Machine Learning & Tuning

    • FLAML is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.

      • FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
      • For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range.
      • It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
    • Heads-up: We have migrated AutoGen into a dedicated github repository. Alongside this move, we have also launched a dedicated Discord server and a website for comprehensive documentation.

ChatDev

  • https://github.com/OpenBMB/ChatDev
    • Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

    • Communicative Agents for Software Development

    • ChatDev stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer, Chief Product Officer, Chief Technology Officer, programmer, reviewer, tester, and art designer. These agents form a multi-agent organizational structure and are united by a mission to "revolutionize the digital world through programming." The agents within ChatDev collaborate by participating in specialized functional seminars, including tasks such as designing, coding, testing, and documenting. The primary objective of ChatDev is to offer an easy-to-use, highly customizable and extendable framework, which is based on large language models (LLMs) and serves as an ideal scenario for studying collective intelligence.

    • https://github.com/OpenBMB/ChatDev#-news
      • November 15th, 2023: We launched ChatDev as a SaaS platform that enables software developers and innovative entrepreneurs to build software efficiently at a very low cost and barrier to entry. Try it out at https://chatdev.modelbest.cn/

      • November 2nd, 2023: ChatDev is now supported with a new feature: incremental development, which allows agents to develop upon existing codes. Try --config "incremental" --path "[source_code_directory_path]" to start it.

      • October 26th, 2023: ChatDev is now supported with Docker for safe execution (thanks to contribution from ManindraDeMel). Please see Docker Start Guide.

      • September 25th, 2023: The Git mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set "git_management" to "True" in ChatChainConfig.json. See guide.

      • September 20th, 2023: The Human-Agent-Interaction mode is now available! You can get involved with the ChatDev team by playing the role of reviewer and making suggestions to the programmer; try python3 run.py --task [description_of_your_idea] --config "Human". See guide and example.

      • September 1st, 2023: The Art mode is available now! You can activate the designer agent to generate images used in the software; try python3 run.py --task [description_of_your_idea] --config "Art". See guide and example.

    • https://chatdev.modelbest.cn/

Unsorted

  • https://githubnext.com/projects/copilot-workspace
  • https://github.com/holmeswww/agentkit
    • AgentKit: Flow Engineering with Graphs, not Coding

    • An intuitive LLM prompting framework for multifunctional agents, by explicitly constructing a complex "thought process" from simple natural language prompts.

    • AgentKit offers a unified framework for explicitly constructing a complex human "thought process" from simple natural language prompts. The user puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".

      Different arrangements of nodes could represent different functionalities, allowing the user to integrate various functionalities to build multifunctional agents.

      A basic agent could be implemented as simply as a list of prompts for the subtasks, and therefore could be designed and tuned by someone without any programming experience.

  • https://github.com/CopilotKit/CopilotKit
  • https://github.com/OpenBMB/AgentVerse
    • 🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation

    • Task-solving: This framework assembles multiple agents as an automatic multi-agent system (AgentVerse-Tasksolving, Multi-agent as system) to collaboratively accomplish the corresponding tasks. Applications: software development system, consulting system, etc.

    • Simulation: This framework allows users to set up custom environments to observe behaviors among, or interact with, multiple agents. Applications: game, social behavior research of LLM-based agents, etc.

    • https://arxiv.org/abs/2308.10848
      • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

      • Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework that can collaboratively and dynamically adjust its composition as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that the framework can effectively deploy multi-agent groups that outperform a single agent. Furthermore, we delve into the emergence of social behaviors among individual agents within a group during collaborative task accomplishment. In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.

    • https://developer.nvidia.com/blog/building-your-first-llm-agent-application/
      • Building Your First LLM Agent Application

  • https://gpt.chatcody.com/
  • https://dosu.dev/
    • Dosu is an AI teammate that lives in your GitHub repo, helping you respond to issues, triage bugs, and build better documentation.

    • How much does Dosu cost? Auto-labeling and backlog grooming are completely free! For Q&A and debugging, Dosu is free for 25 tickets per month. After that, paid plans start at $20 per month. A detailed pricing page is coming soon.

      At Dosu, we are strong advocates of OSS. If you maintain a project that is FOSS, part of the Cloud Native Computing Foundation (CNCF), or the Apache Software Foundation (ASF), please reach out to hi@dosu.dev about special free-tier plans

  • https://github.com/princeton-nlp/SWE-agent
    • SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models

    • SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

      On SWE-bench, SWE-agent resolves 12.29% of issues, achieving the state-of-the-art performance on the full test set.

    • Agent-Computer Interface (ACI) We accomplish these results by designing simple LM-centric commands and feedback formats to make it easier for the LM to browse the repository, view, edit and execute code files. We call this an Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.

      Just as typical language models require good prompt engineering, good ACI design leads to much better results when using agents. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

  • https://github.com/paul-gauthier/aider
    • aider is AI pair programming in your terminal Aider is a command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository. Aider will directly edit the code in your local source files, and git commit the changes with sensible commit messages. You can start a new project or work with an existing git repo. Aider is unique in that it lets you ask for changes to pre-existing, larger codebases.

    • https://aider.chat/
  • https://github.com/NL2Code/CodeR
    • CodeR

    • GitHub issue resolving has recently attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within a code Repository. On SWE-bench lite, CodeR is able to solve 28% of issues, in the case of submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.

  • https://github.com/simonw/llm
    • Access large language models from the command-line

    • https://llm.datasette.io/
      • LLM A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine.

        Run prompts from the command-line, store the results in SQLite, generate embeddings and more.

      • https://llm.datasette.io/en/stable/openai-models.html
        • OpenAI models LLM ships with a default plugin for talking to OpenAI’s API. OpenAI offer both language models and embedding models, and LLM can access both types.

      • https://llm.datasette.io/en/stable/other-models.html
        • Other models LLM supports OpenAI models by default. You can install plugins to add support for other models. You can also add additional OpenAI-API-compatible models using a configuration file.

        • Installing and using a local model LLM plugins can provide local models that run on your machine.

          To install llm-gpt4all, providing 17 models from the GPT4All project, run this:

          llm install llm-gpt4all
          

          Run llm models to see the expanded list of available models.

      • https://llm.datasette.io/en/stable/embeddings/cli.html
        • Embedding with the CLI LLM provides command-line utilities for calculating and storing embeddings for pieces of content.

        • llm embed The llm embed command can be used to calculate embedding vectors for a string of content. These can be returned directly to the terminal, stored in a SQLite database, or both.

        • Storing embeddings in SQLite Embeddings are much more useful if you store them somewhere, so you can calculate similarity scores between different embeddings later on.

          LLM includes the concept of a collection of embeddings. A collection groups together a set of stored embeddings created using the same model, each with a unique ID within that collection.

          Embeddings also store a hash of the content that was embedded. This hash is later used to avoid calculating duplicate embeddings for the same content.

        • Storing content and metadata By default, only the entry ID and the embedding vector are stored in the database table.

          You can store a copy of the original text in the content column by passing the --store option

      • You can also store a JSON object containing arbitrary metadata in the metadata column by passing the --metadata option.

      • llm embed-multi The llm embed command embeds a single string at a time.

        llm embed-multi can be used to embed multiple strings at once, taking advantage of any efficiencies that the embedding model may provide when processing multiple strings.

        This command can be called in one of three ways:

        • With a CSV, TSV, JSON or newline-delimited JSON file
        • With a SQLite database and a SQL query
        • With one or more paths to directories, each accompanied by a glob pattern
      • Embedding data from a SQLite database You can embed data from a SQLite database using --sql, optionally combined with --attach to attach an additional database.

      • Embedding data from files in directories LLM can embed the content of every text file in a specified directory, using the file’s path and name as the ID.

      • llm similar The llm similar command searches a collection of embeddings for the items that are most similar to a given string or item ID.

        This currently uses a slow brute-force approach which does not scale well to large collections. See issue 216 for plans to add a more scalable approach via vector indexes provided by plugins.

      • You can compare against text stored in a file using -i filename

      • When using a model like CLIP, you can find images similar to an input image using -i filename with --binary

      • llm embed-models To list all available embedding models, including those provided by plugins, run this command:

        llm embed-models
        
      • llm collections list To list all of the collections in the embeddings database, run this command:

        llm collections list
        
      • https://llm.datasette.io/en/stable/embeddings/writing-plugins.html
        • Writing plugins to add new embedding models Read the plugin tutorial for details on how to develop and package a plugin.

          This page shows an example plugin that implements and registers a new embedding model.

          There are two components to an embedding model plugin:

          • An implementation of the register_embedding_models() hook, which takes a register callback function and calls it to register the new model with the LLM plugin system.
          • A class that extends the llm.EmbeddingModel abstract base class. The only required method on this class is embed_batch(texts), which takes an iterable of strings and returns an iterator over lists of floating point numbers.
        • Embedding binary content If your model can embed binary content, use the supports_binary property to indicate that it does.

        • If your model accepts binary, your .embed_batch() model may be called with a list of Python bytestrings. These may be mixed with regular strings if the model accepts both types of input.
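
          Putting the two components described above together, a minimal toy plugin might look like this (the hash-based "model" and its ID are purely illustrative):

          import hashlib
          import llm

          @llm.hookimpl
          def register_embedding_models(register):
              register(HashEmbed())

          class HashEmbed(llm.EmbeddingModel):
              model_id = "hash-embed-demo"  # hypothetical model ID

              def embed_batch(self, items):
                  # Takes an iterable of strings, returns an iterator over lists of floats
                  for text in items:
                      digest = hashlib.sha256(text.encode("utf-8")).digest()
                      yield [byte / 255.0 for byte in digest]  # fixed 32-dimensional vector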

      • https://llm.datasette.io/en/stable/plugins/installing-plugins.html
        • Installing plugins Plugins must be installed in the same virtual environment as LLM itself.

          You can find names of plugins to install in the plugin directory

          Use the llm install command (a thin wrapper around pip install) to install plugins in the correct environment

      • https://llm.datasette.io/en/stable/plugins/directory.html#plugin-directory
        • Plugin directory The following plugins are available for LLM.

      • https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html
        • Writing a plugin to support a new model This tutorial will walk you through developing a new plugin for LLM that adds support for a new Large Language Model.

      • https://llm.datasette.io/en/stable/aliases.html
        • Model aliases LLM supports model aliases, which allow you to refer to a model by a short name instead of its full ID.

        • Listing aliases To list current aliases, run this:

          llm aliases
          
        • Adding a new alias The llm aliases set <alias> <model-id> command can be used to add a new alias

        • Removing an alias The llm aliases remove <alias> command will remove the specified alias

      • Viewing the aliases file Aliases are stored in an aliases.json file in the LLM configuration directory.

        To see the path to that file, run this:

        llm aliases path
        

        To view the content of that file, run this:

        cat "$(llm aliases path)"
        
      • https://llm.datasette.io/en/stable/python-api.html
        • Python API LLM provides a Python API for executing prompts, in addition to the command-line interface.

          Understanding this API is also important for writing Plugins.
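
          A minimal sketch of the Python API (the model ID and prompt are just examples; an OpenAI API key is assumed to be configured, e.g. via llm keys set openai):

          import llm

          model = llm.get_model("gpt-3.5-turbo")
          # model.key = "sk-..."  # or rely on the configured/environment API key
          response = model.prompt("Five creative names for a pet pelican")
          print(response.text())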

      • https://llm.datasette.io/en/stable/templates.html
        • Prompt templates Prompt templates can be created to reuse useful prompts with different input data.

      • https://llm.datasette.io/en/stable/logging.html
        • Logging to SQLite llm defaults to logging all prompts and responses to a SQLite database.

          You can find the location of that database using the llm logs path command

        • To avoid logging an individual prompt, pass --no-log or -n to the command

        • To turn logging by default off: llm logs off

      • https://llm.datasette.io/en/stable/related-tools.html
        • Related tools The following tools are designed to be used with LLM:

          • https://llm.datasette.io/en/stable/related-tools.html#strip-tags
            • strip-tags strip-tags is a command for stripping tags from HTML. This is useful when working with LLMs because HTML tags can use up a lot of your token budget.

              Here’s how to summarize the front page of the New York Times, by both stripping tags and filtering to just the elements with class="story-wrapper":

              curl -s https://www.nytimes.com/ \
                | strip-tags .story-wrapper \
                | llm -s 'summarize the news'
              
          • https://llm.datasette.io/en/stable/related-tools.html#ttok
            • ttok ttok is a command-line tool for counting OpenAI tokens. You can use it to check if input is likely to fit in the token limit for GPT 3.5 or GPT4

            • It can also truncate input down to a desired number of tokens

          • https://llm.datasette.io/en/stable/related-tools.html#symbex
            • Symbex Symbex is a tool for searching for symbols in Python codebases. It’s useful for extracting just the code for a specific problem and then piping that into LLM for explanation, refactoring or other tasks.

            • It can also be used to export symbols in a format that can be piped to llm embed-multi in order to create embeddings

            • Based on how Symbex is described, I think grep-ast might be able to do a similar job, but across any language supported by tree-sitter, and not just python:
              • https://github.com/paul-gauthier/grep-ast
                • grep-ast Grep source code files and see matching lines with useful context that shows how they fit into the code. See the loops, functions, methods, classes, etc. that contain all the matching lines. Get a sense of what's inside a matched class or function definition. You see relevant code from every layer of the abstract syntax tree, above and below the matches.

    • https://simonwillison.net/tags/llm/
      • https://simonwillison.net/2023/Apr/4/llm/
        • Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter

        • The llm CLI tool This is one new piece of software I’ve released in the past few weeks that I haven’t written about yet.

          I built the first version of llm, a command-line tool for running prompts against large language models (currently just ChatGPT and GPT-4), getting the results back on the command-line and also storing the prompt and response in a SQLite database.

      • https://simonwillison.net/2023/May/18/cli-tools-for-llms/
        • llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs I’ve been building out a small suite of command-line tools for working with ChatGPT, GPT-4 and potentially other language models in the future.

          The three tools I’ve built so far are:

          • llm — a command-line tool for sending prompts to the OpenAI APIs, outputting the response and logging the results to a SQLite database. I introduced that a few weeks ago.
          • ttok — a tool for counting and truncating text based on tokens
          • strip-tags — a tool for stripping HTML tags from text, and optionally outputting a subset of the page based on CSS selectors

          The idea with these tools is to support working with language model prompts using Unix pipes.

      • https://simonwillison.net/2023/Jun/18/symbex/
        • Symbex: search Python code for functions and classes, then pipe them into a LLM I just released a new Python CLI tool called Symbex. It’s a search tool, loosely inspired by ripgrep, which lets you search Python code for functions and classes by name or wildcard, then see just the source code of those matching entities.

      • https://simonwillison.net/2023/Jul/12/llm/
        • My LLM CLI tool now supports self-hosted language models via plugins LLM is my command-line utility and Python library for working with large language models such as GPT-4. I just released version 0.5 with a huge new feature: you can now install plugins that add support for additional models to the tool, including models that can run on your own hardware.

      • https://simonwillison.net/2023/Sep/4/llm-embeddings/
        • LLM is my Python library and command-line tool for working with language models. I just released LLM 0.9 with a new set of features that extend LLM to provide tools for working with embeddings.

        • An embedding model lets you take a string of text—a word, sentence, paragraph or even a whole document—and turn that into an array of floating point numbers called an embedding vector.

        • A model will always produce the same length of array—1,536 numbers for the OpenAI embedding model, 384 for all-MiniLM-L6-v2—but the array itself is inscrutable. What are you meant to do with it? The answer is that you can compare them. I like to think of an embedding vector as a location in 1,536-dimensional space. The distance between two vectors is a measure of how semantically similar they are in meaning, at least according to the model that produced them.

        • Things you can do with embeddings include:

          • Find related items. I use this on my TIL site to display related articles, as described in Storing and serving related documents with openai-to-sqlite and embeddings.
          • Build semantic search. As shown above, an embeddings-based search engine can find content relevant to the user’s search term even if none of the keywords match.
          • Implement retrieval augmented generation—the trick where you take a user’s question, find relevant documentation in your own corpus and use that to get an LLM to spit out an answer. More on that here.
          • Clustering: you can find clusters of nearby items and identify patterns in a corpus of documents.
          • Classification: calculate the embedding of a piece of text and compare it to pre-calculated “average” embeddings for different categories.
        • My goal with LLM is to provide a plugin-driven abstraction around a growing collection of language models. I want to make installing, using and comparing these models as easy as possible. The new release adds several command-line tools for working with embeddings, plus a new Python API for working with embeddings in your own code. It also adds support for installing additional embedding models via plugins.
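        • To make the "distance between vectors" idea above concrete, here is a minimal, library-agnostic sketch using numpy; the vectors below are random placeholders standing in for real embeddings, and any model that returns fixed-length float arrays would work the same way:

          ```python
          import numpy as np

          def cosine_similarity(a, b):
              """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
              a, b = np.asarray(a), np.asarray(b)
              return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

          # Placeholder 1,536-dimensional vectors standing in for embeddings of three texts.
          vec_a = np.random.rand(1536)                 # e.g. "a photo of a dog"
          vec_b = vec_a + 0.05 * np.random.rand(1536)  # a semantically close text
          vec_c = np.random.rand(1536)                 # an unrelated text

          print(cosine_similarity(vec_a, vec_b))  # expected to be higher
          print(cosine_similarity(vec_a, vec_c))  # expected to be lower
          ```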

      • https://simonwillison.net/2024/Mar/26/llm-cmd/
        • I just released a neat new plugin for my LLM command-line tool: llm-cmd. It lets you run a command to generate a further terminal command, review and edit that command, then hit Enter to execute it or Ctrl+C to cancel.

  • https://github.com/OpenDevin/OpenDevin
    • OpenDevin: Code Less, Make More

    • https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/
      • Introducing OpenDevin CodeAct 1.0, a new State-of-the-art in Coding Agents

      • today we introduce a new state-of-the-art coding agent, OpenDevin CodeAct 1.0, which achieves 21% solve rate on SWE-Bench Lite unassisted, a 17% relative improvement above the previous state-of-the-art posted by SWE-Agent. OpenDevin CodeAct 1.0 is now the default in OpenDevin v0.5

      • We also are working on a new simplified evaluation harness for testing coding agents, which we hope will be easy to use for agent developers and researchers, facilitating comprehensive evaluation and comparison. The current version of the harness is available here (tutorial, harness).

      • SWE-Bench is a great benchmark that tests the ability of coding agents to solve real-world github issues on a number of popular repositories. However, due in part to its realism the process of evaluating on SWE-Bench can initially seem daunting.

      • To help make it easy to perform this process in an efficient, stable, and reproducible manner, the OpenDevin team containerized the evaluation environment. This preparation involves setting up all necessary testbeds (codebases at various versions) and their respective conda environments in advance. For each task instance, we initiate a sandbox container where the testbed is pre-configured, ensuring a ready-to-use setup for the agent

      • This supports both SWE-Bench-Lite (a smaller benchmark of 300 issues that is more conducive to quick benchmarking) and SWE-Bench (the full dataset of 2,294 issues, work-in-progress). With our evaluation pipeline, we obtained a replicated SWE-agent resolve score of 17.3% (52 out of 300 test instances) on SWE-Bench-Lite using the released SWE-agent patch predictions, which differs by 2 from the originally reported 18.0% (54 out of 300).

    • OpenDevin/OpenDevin#742
      • Explore whether stack graphs may be useful in this tool

  • https://github.com/stitionai/devika
    • Devika - Agentic AI Software Engineer

    • Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.

  • https://github.com/geekan/MetaGPT
  • https://github.com/Pythagora-io/gpt-pilot
  • https://github.com/blarApp/code-base-agent
  • https://github.com/cpacker/MemGPT
    • MemGPT allows you to build LLM agents with self-editing memory

    • Building persistent LLM agents with long-term memory

  • https://github.com/daveshap/OpenAI_Agent_Swarm
    • Hierarchical Autonomous Agent Swarm (HAAS)

    • The Hierarchical Autonomous Agent Swarm (HAAS) is a groundbreaking initiative that leverages OpenAI's latest advancements in agent-based APIs to create a self-organizing and ethically governed ecosystem of AI agents. Drawing inspiration from the ACE Framework, HAAS introduces a novel approach to AI governance and operation, where a hierarchy of specialized agents, each with distinct roles and capabilities, collaborate to solve complex problems and perform a wide array of tasks.

      The HAAS is designed to be a self-expanding system where a core set of agents, governed by a Supreme Oversight Board (SOB), can design, provision, and manage an arbitrary number of sub-agents tailored to specific needs. This document serves as a comprehensive guide to the theoretical underpinnings, architectural design, and operational principles of the HAAS.

    • https://github.com/daveshap/OpenAI_Agent_Swarm/discussions
  • https://github.com/daveshap/ACE_Framework
    • ACE (Autonomous Cognitive Entities) - 100% local and open source autonomous agents

    • We will be committed to using 100% open source software (OSS) for this project. This is to ensure maximum accessibility and democratic access.

  • https://github.com/ShishirPatil/gorilla
    • Gorilla: An API store for LLMs

    • Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them! Hop on our Discord, or open a PR, or email us if you would like to have your API incorporated as well.

    • https://gorilla.cs.berkeley.edu/
      • Gorilla: Large Language Model Connected with Massive APIs

    • https://github.com/ShishirPatil/gorilla/tree/main/openfunctions
      • Gorilla Openfunctions

      • Gorilla OpenFunctions extends the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context.

      • Comes with Parallel Function Calling!

      • OpenFunctions is compatible with OpenAI Functions

      • https://gorilla.cs.berkeley.edu/blogs/4_open_functions.html
      • OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context. Imagine if the LLM could fill in parameters for a variety of services, ranging from Instagram and DoorDash to tools like Google Calendar and Stripe. Even users who are less familiar with API calling procedures and programming can use the model to generate API calls to the desired function. Gorilla OpenFunctions is an LLM that we train using a curated set of API documentation, and Question-Answer pairs generated from the API documentation. We have continued to expand on the Gorilla Paradigm and sought to improve the quality and accuracy of valid function calling generation. This blog is about developing an open-source alternative for function calling similar to features seen in proprietary models, in particular, function calling in OpenAI's GPT-4. Our solution is based on the Gorilla recipe, and with a model of just 7B parameters, its accuracy is, surprisingly, comparable to GPT-4.

    • https://github.com/gorilla-llm/gorilla-cli
      • LLMs for your CLI

      • Gorilla CLI Gorilla CLI powers your command-line interactions with a user-centric tool. Simply state your objective, and Gorilla CLI will generate potential commands for execution. Gorilla today supports ~1500 APIs, including Kubernetes, AWS, GCP, Azure, GitHub, Conda, Curl, Sed, and many more. No more recalling intricate CLI arguments! 🦍
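    • Since OpenFunctions (above) advertises OpenAI Functions compatibility, a rough sketch of calling it through the OpenAI Python client might look like the following; the base_url, model name and example function schema are placeholders/assumptions, so check the OpenFunctions docs for the actual hosted endpoint and model names:

      ```python
      from openai import OpenAI

      # Placeholder endpoint and key: substitute the hosted/self-hosted OpenFunctions URL.
      client = OpenAI(base_url="https://<gorilla-openfunctions-endpoint>/v1", api_key="EMPTY")

      # A hypothetical function schema in the standard OpenAI "functions" format.
      functions = [{
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"],
          },
      }]

      response = client.chat.completions.create(
          model="gorilla-openfunctions-v2",  # assumption: use whichever model name is published
          messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
          functions=functions,
      )
      print(response.choices[0].message)
      ```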

Code Generation / Execution

Unsorted

  • TODO

Code Leaderboards / Benchmarks

  • https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
    • Big Code Models Leaderboard

    • Inspired by the 🤗 Open LLM Leaderboard and 🤗 Open LLM-Perf Leaderboard 🏋️, we compare the performance of base multilingual code generation models on the HumanEval benchmark and MultiPL-E. We also measure throughput and provide information about the models. We only compare open pre-trained multilingual code models, that people can start from as base models for their own training.

  • https://evalplus.github.io/leaderboard.html
    • EvalPlus Leaderboard

    • EvalPlus evaluates AI Coders with rigorous tests.

    • https://github.com/evalplus/evalplus
      • EvalPlus

      • EvalPlus is a rigorous evaluation framework for LLM4Code, with:

        • ✨ HumanEval+: 80x more tests than the original HumanEval!
        • ✨ MBPP+: 35x more tests than the original MBPP!
        • ✨ Evaluation framework: our packages/images/tools can easily and safely evaluate LLMs on above benchmarks.
      • https://evalplus.github.io/
        • Benchmarks @ EvalPlus The EvalPlus team aims to build high-quality benchmarks for evaluating LLMs for code. Below are the benchmarks we have been building so far

        • HumanEval+ & MBPP+ HumanEval and MBPP initially came with limited tests. EvalPlus made HumanEval+ & MBPP+ by extending the tests by 80x/35x for rigorous eval.

        • RepoQA: Long-Context Code Understanding Repository understanding is crucial for intelligent code agents. At RepoQA, we are designing evaluators of long-context code understanding.

          • https://evalplus.github.io/repoqa.html
            • RepoQA The First Benchmark for Long-Context Code Understanding

            • The goal of RepoQA is to create a series of long-context code understanding tasks to challenge chat/instruction models for code:

              • Multi-Lingual: RepoQA covers 50 high-quality repositories from 5 programming languages.
              • Application-Driven: While "Needle in the Code" by CodeQwen uses a synthetic task to examine the vulnerable parts over the LLM's long context, RepoQA focuses on tasks that can reflect real-world uses.
              • 🔍 Searching Needle Function (🔗): Search a function given its description.
              • 🚧 RepoQA is still under development... More types of QA tasks are coming soon... Stay tuned!

AutoCoder

  • https://github.com/bin123apple/AutoCoder
    • AutoCoder

    • We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024): 90.9% vs. 90.2%.

      Additionally, compared to previous open-source models, AutoCoder offers a new feature: it can automatically install the required packages and attempt to run the code until it deems there are no issues, whenever the user wishes to execute the code.

    • https://arxiv.org/abs/2405.14906
      • AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

      • We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark test (90.9% vs. 90.2%). In addition, AutoCoder offers a more versatile code interpreter compared to GPT-4 Turbo and GPT-4o. Its code interpreter can install external packages instead of being limited to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term AIEV-Instruct (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, AIEV-Instruct reduces dependence on proprietary large models and provides an execution-validated code dataset.

OpenCodeInterpreter

  • https://github.com/OpenCodeInterpreter/OpenCodeInterpreter
    • OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

    • OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. It significantly enhances code generation capabilities by integrating execution and iterative refinement functionalities.

    • https://opencodeinterpreter.github.io/
    • https://arxiv.org/abs/2402.14658
      • OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

      • The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2) and further elevates to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter bridges the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

OpenInterpreter

Vision / Multimodal

OpenAI

  • https://platform.openai.com/docs/guides/vision
    • Vision

    • Learn how to use GPT-4 to understand images

    • GPT-4 with Vision, sometimes referred to as GPT-4V or gpt-4-vision-preview in the API, allows the model to take in images and answer questions about them.
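    • A minimal sketch of the documented request shape (the image URL and prompt are placeholders, and the model name may have changed since this guide was written):

      ```python
      from openai import OpenAI

      client = OpenAI()  # expects OPENAI_API_KEY in the environment

      # Ask the vision-capable model a question about an image referenced by URL.
      response = client.chat.completions.create(
          model="gpt-4-vision-preview",  # the vision model name referenced in the guide
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": "What is in this image?"},
                  {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
              ],
          }],
          max_tokens=300,
      )
      print(response.choices[0].message.content)
      ```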

LLaVA / etc

  • https://llava-vl.github.io/
    • LLaVA: Large Language and Vision Assistant

    • LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.

    • LLaVA-1.5 achieves SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods that use billion-scale data.

    • Demo: https://llava.hliu.cc/
    • https://github.com/haotian-liu/LLaVA
      • LLaVA: Large Language and Vision Assistant

      • Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

      • https://github.com/haotian-liu/LLaVA#release
        • The following are just a couple of notes that jumped out at me:
        • 11/10 LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). Project Page Demo Code Paper

        • 11/2 LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. Project Page Demo Code Paper

        • 10/26 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts, script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.

        • 10/12 LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support!

        • 10/5 LLaVA-1.5 is out! Achieving SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods like Qwen-VL-Chat that use billion-scale data. Check out the technical report, and explore the demo! Models are available in Model Zoo.

        • 6/11 We released the preview for the most requested feature: DeepSpeed and LoRA support! Please see documentations here.

        • 6/1 We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical domain large language and vision models with GPT-4 level capabilities. Checkout the paper and page.

    • https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md
  • https://github.com/LLaVA-VL/LLaVA-Plus-Codebase
  • https://github.com/LLaVA-VL/LLaVA-NeXT
    • LLaVA-NeXT: Open Large Multimodal Models

    • https://llava-vl.github.io/blog/2024-01-30-llava-next/
      • LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

      • Today, we are thrilled to present LLaVA-NeXT, with improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks.

      • Compared with LLaVA-1.5, LLaVA-NeXT has several improvements:

        • Increasing the input image resolution to 4x more pixels. This allows it to grasp more visual details. It supports three aspect ratios, up to 672x672, 336x1344, 1344x336 resolution.
        • Better visual reasoning and OCR capability with an improved visual instruction tuning data mixture.
        • Better visual conversation for more scenarios, covering different applications. Better world knowledge and logical reasoning.
        • Efficient deployment and inference with SGLang.
    • https://llava-vl.github.io/blog/2024-04-30-llava-next-video/
      • LLaVA-NeXT: A Strong Zero-shot Video Understanding Model

      • In today’s exploration, we delve into the performance of LLaVA-NeXT within the realm of video understanding tasks. We reveal that LLaVA-NeXT surprisingly has strong performance in understanding video content.

      • SoTA Performance! Without seeing any video data, LLaVA-Next demonstrates strong zero-shot modality transfer ability, outperforming all the existing open-source LMMs (e.g., LLaMA-VID) that have been specifically trained for videos. Compared with proprietary ones, it achieves comparable performance with Gemini Pro on NextQA and ActivityNet-QA.

    • https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/
      • LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild

  • https://github.com/microsoft/LLaVA-Med
    • LLaVA-Med: Large Language and Vision Assistant for BioMedicine

    • Visual instruction tuning towards building large language and vision models with GPT-4 level capabilities in the biomedicine space.

Unsorted

  • https://github.com/tldraw/draw-a-ui
  • https://github.com/jordansinger/build-it-figma-ai
    • Draw and sketch UI in Figma and FigJam with this widget. Inspired by SawyerHood/draw-a-ui and tldraw/draw-a-ui

  • https://github.com/jordansinger/UIDraw
  • https://github.com/microsoft/SoM
    • Set-of-Mark Prompting for LMMs

    • Set-of-Mark Visual Prompting for GPT-4V

    • We present Set-of-Mark (SoM) prompting, simply overlaying a number of spatial and speakable marks on the images, to unleash the visual grounding abilities in the strongest LMM -- GPT-4V. Let's use visual prompting for vision!

    • https://arxiv.org/abs/2310.11441
      • Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

      • We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V. As illustrated in Fig. 1 (right), we employ off-the-shelf interactive segmentation models, such as SEEM/SAM, to partition an image into regions at different levels of granularity, and overlay these regions with a set of marks e.g., alphanumerics, masks, boxes. Using the marked image as input, GPT-4V can answer the questions that require visual grounding. We perform a comprehensive empirical study to validate the effectiveness of SoM on a wide range of fine-grained vision and multimodal tasks. For example, our experiments show that GPT-4V with SoM in zero-shot setting outperforms the state-of-the-art fully-finetuned referring expression comprehension and segmentation model on RefCOCOg. Code for SoM prompting is made public at: this https URL.

    • https://github.com/facebookresearch/segment-anything
      • Segment Anything

      • The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

      • The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.

    • https://github.com/UX-Decoder/Semantic-SAM
      • Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

      • In this work, we introduce Semantic-SAM, a universal image segmentation model that enables segmenting and recognizing anything at any desired granularity. We have trained on the whole SA-1B dataset and our model can reproduce SAM and go beyond it.

      • Segment everything for one image. We output controllable granularity masks from semantic, instance to part level when using different granularity prompts.

    • https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once
      • SEEM: Segment Everything Everywhere All at Once

      • [NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

      • We introduce SEEM that can Segment Everything Everywhere with Multi-modal prompts all at once. SEEM allows users to easily segment an image using prompts of different types including visual prompts (points, marks, boxes, scribbles and image segments) and language prompts (text and audio), etc. It can also work with any combination of prompts or generalize to custom prompts!

    • https://github.com/IDEA-Research/GroundingDINO
      • Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

    • https://github.com/IDEA-Research/OpenSeeD
      • [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

    • https://github.com/IDEA-Research/MaskDINO
      • [CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

    • https://github.com/facebookresearch/VLPart
      • [ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation

      • Object detection has been expanded from a limited number of categories to open vocabulary. Moving forward, a complete intelligent vision system requires understanding more fine-grained object descriptions, object parts. In this work, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.
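    • As a concrete example of the promptable-segmentation interface used by several of the models linked above, here is a minimal hedged sketch of the Segment Anything (SAM) predictor API from its README; the checkpoint filename, dummy image and point prompt are placeholders:

      ```python
      import numpy as np
      from segment_anything import SamPredictor, sam_model_registry

      # Load a SAM checkpoint (variant and path are placeholders; see the repo's model zoo).
      sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
      predictor = SamPredictor(sam)

      # `image` should be an HxWx3 uint8 RGB array (e.g. loaded via PIL or OpenCV).
      image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder image
      predictor.set_image(image)

      # Prompt with one foreground point and get candidate masks plus confidence scores.
      masks, scores, logits = predictor.predict(
          point_coords=np.array([[320, 240]]),
          point_labels=np.array([1]),  # 1 = foreground, 0 = background
          multimask_output=True,
      )
      print(masks.shape, scores)
      ```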

  • https://github.com/OthersideAI/self-operating-computer
    • Self-Operating Computer Framework A framework to enable multimodal models to operate a computer.

      Using the same inputs and outputs of a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.

  • https://github.com/ddupont808/GPT-4V-Act
    • GPT-4V-Act: Chromium Copilot

    • AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

    • GPT-4V-Act serves as an eloquent multimodal AI assistant that harmoniously combines GPT-4V(ision) with a web browser. It's designed to mirror the input and output of a human operator—primarily screen feedback and low-level mouse/keyboard interaction. The objective is to foster a smooth transition between human-computer operations, facilitating the creation of tools that considerably boost the accessibility of any user interface (UI), aid workflow automation, and enable automated UI testing.

    • GPT-4V-Act leverages both GPT-4V(ision) and Set-of-Mark Prompting, together with a tailored auto-labeler. This auto-labeler assigns a unique numerical ID to each interactable UI element.

      By incorporating a task and a screenshot as input, GPT-4V-Act can deduce the subsequent action required to accomplish a task. For mouse/keyboard output, it can refer to the numerical labels for exact pixel coordinates.

  • https://github.com/Jiayi-Pan/GPT-V-on-Web
    • 👀🧠 GPT-4 Vision x 💪⌨️ Vimium = Autonomous Web Agent

    • This project leverages GPT4V to create an autonomous / interactive web agent. The action space is discretized by Vimium.

  • https://github.com/bdekraker/WebcamGPT-Vision
    • Lightweight GPT-4 Vision processing over the Webcam

    • WebcamGPT-Vision is a lightweight web application that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The application captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results.

Vector Databases/Search, Similarity Search, Clustering, etc

  • TODO: add more things here

Faiss

  • https://github.com/facebookresearch/faiss
    • Faiss

    • A library for efficient similarity search and clustering of dense vectors.

    • Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed primarily at Meta's Fundamental AI Research group.

    • https://faiss.ai/
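    • A minimal sketch of the typical usage pattern (the vectors here are random placeholders; a real application would index embeddings produced by a model):

      ```python
      import numpy as np
      import faiss  # pip install faiss-cpu

      d = 384  # dimensionality of the vectors being indexed
      database_vectors = np.random.rand(10_000, d).astype("float32")  # placeholder corpus
      query_vectors = np.random.rand(5, d).astype("float32")          # placeholder queries

      index = faiss.IndexFlatL2(d)  # exact L2 search; other index types trade accuracy for speed
      index.add(database_vectors)

      k = 3
      distances, ids = index.search(query_vectors, k)  # k nearest neighbours per query
      print(ids)        # row i = indices of the k closest database vectors to query i
      print(distances)
      ```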

Benchmarks / Leaderboards

  • See also:
  • https://chat.lmsys.org/
    • LMSYS Chatbot Arena Leaderboard

  • https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
    • Open LLM Leaderboard

  • https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard
  • https://github.com/EleutherAI/lm-evaluation-harness
    • Language Model Evaluation Harness

    • A framework for few-shot evaluation of language models.

  • https://github.com/openai/evals
    • OpenAI Evals

    • Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

    • Evals provide a framework for evaluating large language models (LLMs) or systems built using LLMs. We offer an existing registry of evals to test different dimensions of OpenAI models and the ability to write your own custom evals for use cases you care about. You can also use your data to build private evals which represent the common LLMs patterns in your workflow without exposing any of that data publicly.

      If you are building with LLMs, creating high quality evals is one of the most impactful things you can do. Without evals, it can be very difficult and time intensive to understand how different model versions might affect your use case.

  • https://github.com/openai/simple-evals
    • This repository contains a lightweight library for evaluating language models. We are open sourcing it so we can be transparent about the accuracy numbers we're publishing alongside our latest models (starting with gpt-4-turbo-2024-04-09). Evals are sensitive to prompting, and there's significant variation in the formulations used in recent publications and libraries. Some use few-shot prompts or role playing prompts ("You are an expert software programmer..."). These approaches are carryovers from evaluating base models (rather than instruction/chat-tuned models) and from models that were worse at following instructions.

      For this library, we are emphasizing the zero-shot, chain-of-thought setting, with simple instructions like "Solve the following multiple choice problem". We believe that this prompting technique is a better reflection of the models' performance in realistic usage.

Prompts / Prompt Engineering / etc

  • https://github.com/mshumer/gpt-prompt-engineer
    • gpt-prompt-engineer Prompt engineering is kind of like alchemy. There's no clear way to predict what will work best. It's all about experimenting until you find the right prompt. gpt-prompt-engineer is a tool that takes this experimentation to a whole new level.

      Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best.

    • Prompt Testing: The real magic happens after the generation. The system tests each prompt against all the test cases, comparing their performance and ranking them using an ELO rating system.

    • ELO Rating System: Each prompt starts with an ELO rating of 1200. As they compete against each other in generating responses to the test cases, their ELO ratings change based on their performance. This way, you can easily see which prompts are the most effective.

      • https://en.wikipedia.org/wiki/Elo_rating_system
        • The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess.

        • The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.

        • A player's Elo rating is a number which may change depending on the outcome of rated games played. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. If the higher-rated player wins, then only a few rating points will be taken from the lower-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred. The lower-rated player will also gain a few points from the higher rated player in the event of a draw. This means that this rating system is self-correcting. Players whose ratings are too low or too high should, in the long run, do better or worse correspondingly than the rating system predicts and thus gain or lose rating points until the ratings reflect their true playing strength.
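      • The update rule described above is straightforward to implement; here is a minimal sketch (the K-factor of 32 is a common default, not something specified by gpt-prompt-engineer):

        ```python
        def elo_expected(rating_a, rating_b):
            """Expected score of A against B (1 = win, 0.5 = draw, 0 = loss)."""
            return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

        def elo_update(rating_a, rating_b, score_a, k=32):
            """Return both players' new ratings after one game, given A's actual score."""
            expected_a = elo_expected(rating_a, rating_b)
            new_a = rating_a + k * (score_a - expected_a)
            new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
            return new_a, new_b

        # A 100-point gap gives the stronger side roughly a 64% expected score:
        print(round(elo_expected(1300, 1200), 2))   # ~0.64
        # Two prompts both start at 1200; the first one wins a head-to-head comparison:
        print(elo_update(1200, 1200, score_a=1.0))  # (1216.0, 1184.0)
        ```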

  • https://github.com/dair-ai/Prompt-Engineering-Guide
  • https://github.com/daveshap/ChatGPT_Custom_Instructions
    • Repo of custom instructions that you can use for ChatGPT

  • https://github.com/daveshap/PTSD_prompts
    • GPT based PTSD experiments - USE AT OWN RISK - EXPERIMENTAL ONLY

  • https://github.com/yzfly/Awesome-Multimodal-Prompts
    • Awesome Multimodal Prompts

    • Prompts of GPT-4V & DALL-E3 to fully utilize the multi-modal ability. GPT4V Prompts, DALL-E3 Prompts.

  • https://arxiv.org/abs/2402.03620
    • Self-Discover: Large Language Models Self-Compose Reasoning Structures (submitted on 6 Feb 2024)

    • We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

Other Useful Tools / Libraries / etc

Unsorted

  • See Also
  • https://github.com/pypa/pipx
    • pipx — Install and Run Python Applications in Isolated Environments

    • https://pipx.pypa.io/stable/
      • pipx is a tool to help you install and run end-user applications written in Python. It's roughly similar to macOS's brew, JavaScript's npx, and Linux's apt.

        It's closely related to pip. In fact, it uses pip, but is focused on installing and managing Python packages that can be run from the command line directly as applications.

  • https://pipedream.com/requestbin
    • Request Bin

    • Inspect webhooks and HTTP requests Get a URL to collect HTTP or webhook requests and inspect them in a human-friendly way. Optionally connect APIs, run code and return a custom response on each request.

  • https://github.com/googleapis/release-please
    • Release Please Release Please automates CHANGELOG generation, the creation of GitHub releases, and version bumps for your projects.

      It does so by parsing your git history, looking for Conventional Commit messages, and creating release PRs.

      It does not handle publication to package managers or handle complex branch management.

    • https://github.com/google-github-actions/release-please-action
      • automated releases based on conventional commits

      • Release Please Action Automate releases with Conventional Commit Messages.

    • https://www.conventionalcommits.org/
  • https://github.com/winstonjs/winston
  • https://github.com/tldraw/tldraw
    • a very good whiteboard

    • tldraw is a collaborative digital whiteboard available at tldraw.com. Its editor, user interface, and other underlying libraries are open source and available in this repository. They are also distributed on npm. You can use tldraw to create a drop-in whiteboard for your product or as the foundation on which to build your own infinite canvas applications.

    • https://tldraw.dev/
      • You can use the Tldraw React component to embed a fully featured and extendable whiteboard in your app.

      • For multiplayer whiteboards, you can plug the component into the collaboration backend of your choice.

      • You can use the Editor API to create, update, and delete shapes, control the camera—or do just about anything else. You can extend tldraw with your own custom shapes and custom tools. You can use our user interface overrides to change the contents of menus and toolbars, or else hide the UI and replace it with your own.

      • If you want to go even deeper, you can use the TldrawEditor component as a more minimal engine without the default tldraw shapes or user interface.

  • JavaScript (full text) Search Libraries
    • https://www.npmjs.com/search?q=full%20text%20search
    • https://byby.dev/js-search-libraries
    • https://github.com/nextapps-de/flexsearch
      • Next-Generation full text search library for Browser and Node.js

      • Web's fastest and most memory-flexible full-text search library with zero dependencies.

      • When it comes to raw search speed FlexSearch outperforms every single searching library out there and also provides flexible search capabilities like multi-field search, phonetic transformations or partial matching.

        Depending on the options used, it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called "contextual index" based on a pre-scored lexical dictionary architecture which actually performs queries up to 1,000,000 times faster compared to other libraries. FlexSearch also provides a non-blocking asynchronous processing model as well as web workers to perform any updates or queries on the index in parallel through dedicated balanced threads.

      • https://github.com/nextapps-de/flexsearch#consumption
        • Memory Consumption

      • https://nextapps-de.github.io/flexsearch/bench/
        • Benchmark of Full-Text-Search Libraries (Stress Test)

      • https://nextapps-de.github.io/flexsearch/bench/match.html
        • Relevance Scoring Comparison

      • https://github.com/angeloashmore/react-use-flexsearch
        • React hook to search a FlexSearch index

        • The useFlexSearch hook takes your search query, index, and store and returns results as an array. Searches are memoized to ensure efficient searching.

    • https://github.com/krisk/fuse
      • Lightweight fuzzy-search, in JavaScript

      • Fuse.js is a lightweight fuzzy-search, in JavaScript, with zero dependencies.

      • https://www.fusejs.io/
    • https://github.com/weixsong/elasticlunr.js
      • Based on lunr.js, but more flexible and customized.

      • Elasticlunr.js Elasticlunr.js is a lightweight full-text search engine developed in JavaScript for browser search and offline search. Elasticlunr.js is developed based on Lunr.js, but more flexible than lunr.js. Elasticlunr.js provides Query-Time boosting, field search, more rational scoring/ranking methodology, fast computation speed and so on. Elasticlunr.js is a bit like Solr, but much smaller and not as bright, but also provides flexible configuration, query-time boosting, field search and other features.

      • Contributor Welcome!!! As I'm now focusing on a new domain, I hope that someone interested in this project could help maintain this repository.

      • http://elasticlunr.com/
    • https://github.com/olivernn/lunr.js
      • Lunr.js A bit like Solr, but much smaller and not as bright

      • Lunr.js is a small, full-text search library for use in the browser. It indexes JSON documents and provides a simple search interface for retrieving documents that best match text queries.

      • For web applications with all their data already sitting in the client, it makes sense to be able to search that data on the client too. It saves adding extra, compacted services on the server. A local search index will be quicker, there is no network overhead, and will remain available and usable even without a network connection.

      • https://lunrjs.com/
    • https://github.com/apache/solr
      • Apache Solr

      • Solr is the popular, blazing fast open source search platform for all your enterprise, e-commerce, and analytics needs, built on Apache Lucene.

Node-based UI's, Graph Execution, Flow Based Programming, etc

  • https://github.com/xyflow/awesome-node-based-uis
    • A curated list with resources about node-based UIs

  • https://github.com/xyflow/xyflow
  • https://github.com/retejs/rete
    • JavaScript framework for visual programming

    • Rete.js is a framework for creating visual interfaces and workflows. It provides out-of-the-box solutions for visualization using various libraries and frameworks, as well as solutions for processing graphs based on dataflow and control flow approaches.

    • https://retejs.org/
      • A tailorable TypeScript-first framework for creating processing-oriented node-based editors

      • https://retejs.org/examples
        • https://retejs.org/examples/processing/dataflow
          • Data Flow

            This example showcases a data processing pipeline using rete-engine, where data flows from left to right through nodes. Each node features a data method, which receives arrays of incoming data from their respective input sockets and delivers an object containing data corresponding to the output sockets. To initiate their execution, you can make use of the engine.fetch method by specifying the identifier of the target node. Consequently, the engine will execute all predecessors recursively, extracting their output data and delivering it to the specified node.

        • https://retejs.org/examples/processing/control-flow
          • Control Flow

            This example showcases an executing of schema via control flow using rete-engine, where each node dynamically decides which of its outgoing nodes will receive control. Each node features an execute method that takes an input port key as a control source, and a function for conveying control to outgoing nodes through a defined output port. To initiate the execution of the flow, you can use engine.execute method, specifying the identifier of the starting node. Consequently, the outgoing nodes will be executed sequentially, starting from the designated node.

        • https://retejs.org/examples/processing/hybrid-engine
          • Hybrid Engine

            This example shows how rete-engine allows for the simultaneous integration of both dataflow and control flow. Consequently, certain nodes serve as data sources, others manage the flow, and a third set incorporates both of these approaches.

        • https://retejs.org/examples/modules
          • This example showcases a schema reusability technique, where processing is carried out using DataflowEngine. This is accomplished by creating a dedicated Module node that loads a nested schema containing Input and Output nodes, subsequently generating corresponding sockets. As a result, the module node initializes the engine, feeds it with input data, executes it, and retrieves the output data.

        • https://retejs.org/examples/scopes
          • Scopes

            The structures shown in this example may also be referred to as subgraphs or nested nodes. This functionality is achieved using the advanced rete-scopes-plugin plugin. Changing a node's parent is easy: simply long-press the node and move it over the new parent node.

        • https://retejs.org/examples/selectable-connections
          • Selectable connections The editor doesn't offer a built-in connection selection feature. However, if you're using BidirectFlow and can't delete connections from UI, or you need to select connections for other purposes, you can create a custom connection and sync it with AreaExtensions.selector

        • https://retejs.org/examples/reroute
          • Reroute This particular example shows the usage of a plugin designed for user-controlled connection rerouting. Users can insert rerouting points by clicking on a connection or remove them by right-clicking. These points can be dragged or selected by users (similarly to nodes) to move multiple points at once.

        • https://retejs.org/examples/codegen
      • https://retejs.org/docs
        • Visualization: you can choose React.js, Vue.js, Angular or Svelte to visualize nodes, sockets, controls, and connections. These visual components can be tailored to your specific needs by creating custom components for each framework, and they can all coexist in a single editor.

        • Processing: the framework offers various types of engines that enable processing diagrams based on their nature, including dataflow and control flow. These types can be combined within the same graph.

      • https://retejs.org/docs/development/rete-kit
        • The purpose of this tool is to improve efficiency when developing plugins or projects using this framework.

      • https://retejs.org/docs/api/rete-engine
        • DataflowEngine is a plugin that integrates Dataflow with NodeEditor making it easy to use. Additionally, it provides a cache for the data of each node in order to avoid recurring calculations.

        • ControlFlowEngine is a plugin that integrates ControlFlow with NodeEditor making it easy to use

  • https://github.com/graphology/graphology
    • Graphology graphology is a robust & multipurpose Graph object for JavaScript and TypeScript.

      It aims at supporting various kinds of graphs with the same unified interface.

      A graphology graph can therefore be directed, undirected or mixed, allow self-loops or not, and can be simple or support parallel edges.

      Along with this Graph object, one will also find a comprehensive standard library full of graph theory algorithms and common utilities such as graph generators, layouts, traversals etc.

      Finally, graphology graphs are able to emit a wide variety of events, which makes them ideal to build interactive renderers for the browser.

    • https://graphology.github.io/
  • https://github.com/cytoscape/cytoscape.js
  • https://github.com/jagenjo/litegraph.js
    • A graph node engine and editor written in Javascript similar to PD or UDK Blueprints, comes with its own editor in HTML5 Canvas2D. The engine can run client side or server side using Node. It allows graphs to be exported as JSON so they can be included in applications independently.

  • https://github.com/noflo/noflo
    • NoFlo: Flow-based programming for JavaScript NoFlo is an implementation of flow-based programming for JavaScript running on both Node.js and the browser. From WikiPedia:

      In computer science, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.

    • NoFlo itself is just a library for implementing flow-based programs in JavaScript. There is an ecosystem of tools around NoFlo and the fbp protocol that make it more powerful. Here are some of them:

      • Flowhub -- browser-based visual programming IDE for NoFlo and other flow-based systems
      • noflo-nodejs -- command-line interface for running NoFlo programs on Node.js
      • noflo-browser-app -- template for building NoFlo programs for the web
      • noflo-assembly -- industrial approach for designing NoFlo programs
      • fbp-spec -- data-driven tests for NoFlo and other FBP environments
      • flowtrace -- tool for retroactive debugging of NoFlo programs. Supports visual replay with Flowhub

      See also the list of reusable NoFlo modules on NPM.

    • https://noflojs.org/
    • https://flowhub.io/ide/
      • Flowhub IDE is a tool for building full-stack applications in a visual way. With the ecosystem of flow-based programming environments, you can use Flowhub to create anything from distributed data processing applications to internet-connected artworks.

    • https://flowbased.github.io/fbp-protocol/
      • FBP Network Protocol The Flow-Based Programming network protocol (FBP protocol) has been designed primarily for flow-based programming interfaces like the Flowhub to communicate with various FBP runtimes. However, it can also be utilized for communication between different runtimes, for example server-to-server or server-to-microcontroller.

      • https://github.com/flowbased/fbp
        • FBP flow definition language parser The fbp library provides a parser for a domain-specific language for flow-based-programming (FBP), used for defining graphs for FBP programming environments like NoFlo, MicroFlo and MsgFlo.

    • https://en.wikipedia.org/wiki/Flow-based_programming
      • In computer programming, flow-based programming (FBP) is a programming paradigm that defines applications as networks of black box processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.

      • https://en.wikipedia.org/wiki/Component-based_software_engineering
        • Component-based software engineering (CBSE), also called component-based development (CBD), is a style of software engineering that aims to build software out of loosely-coupled, modular components. It emphasizes the separation of concerns among different parts of a software system.

  • https://nodered.org/
    • Node-RED is a programming tool for wiring together hardware devices, APIs and online services in new and interesting ways.

      It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette that can be deployed to its runtime in a single-click.

    • https://github.com/node-red/node-red
      • Low-code programming for event-driven applications

    • https://nodered.org/docs/api/modules/v/1.3/@node-red_runtime.html
      • @node-red/runtime This module provides the core runtime component of Node-RED. It does not include the Node-RED editor. All interaction with this module is done using the api provided.

      • https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/index.js#L125-L234
        • var redNodes = require("./nodes");
        • function start() {
          • Start the runtime

          • return redNodes.load().then(function() {
          • return redNodes.loadContextsPlugin().then(function () {
              redNodes.loadFlows().then(() => { redNodes.startFlows() }).catch(function(err) {});
              started = true;
            });
      • https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/nodes/index.js#L198-L267
        • var registry = require("@node-red/registry");
          var flows = require("../flows");
          var context = require("./context");
        • module.exports = {
              // Lifecycle
              init: init,
              load: registry.load,
          
              // ..snip..
          
              // Flow handling
              loadFlows:  flows.load,
              startFlows: flows.startFlows,
              stopFlows:  flows.stopFlows,
              setFlows:   flows.setFlows,
              getFlows:   flows.getFlows,
          
              addFlow:     flows.addFlow,
              getFlow:     flows.getFlow,
              updateFlow:  flows.updateFlow,
              removeFlow:  flows.removeFlow,
          
              // ..snip..
          
              // Contexts
              loadContextsPlugin: context.load,
              closeContextsPlugin: context.close,
              listContextStores: context.listStores,
          };

Unsorted

  • https://github.com/google-gemini/cookbook
    • Gemini API Cookbook

    • A collection of guides and examples for the Gemini API.

    • This is a collection of guides and examples for the Gemini API, including quickstart tutorials for writing prompts and using different features of the API, and examples of things you can build.

    • https://ai.google.dev/gemini-api/docs
      • Get started with Gemini API
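    • A minimal hedged sketch of the Python quickstart pattern from around this time (the API key and model name are placeholders, and both the SDK and model names evolve quickly):

      ```python
      import google.generativeai as genai  # pip install google-generativeai

      genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key

      model = genai.GenerativeModel("gemini-pro")  # substitute a current model name if needed
      response = model.generate_content("Write a one-line summary of what an embedding is.")
      print(response.text)
      ```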

  • https://github.com/NaturalNode/natural/
  • https://github.com/pytorch/torchtune
  • https://llama.meta.com/llama3/
    • Meta Llama 3 Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications

    • https://github.com/meta-llama/llama3
      • Meta Llama 3

      • The official Meta Llama 3 GitHub site

      • We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.

        This release includes model weights and starting code for pre-trained and instruction-tuned Llama 3 language models — including sizes of 8B to 70B parameters.

        This repository is a minimal example of loading Llama 3 models and running inference. For more detailed examples, see llama-recipes.

        • https://github.com/meta-llama/llama-recipes
          • Llama Recipes: Examples to get started using the Llama models from Meta

          • Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.

  • https://zapier.com/blog/train-chatgpt-to-write-like-you/
    • How to train ChatGPT to write like you

  • https://github.com/EleutherAI/gpt-neox
    • An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

    • GPT-NeoX This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. This library is in widespread use in academic, industry, and government labs, including by researchers at Oak Ridge National Lab, CarperAI, Stability AI, Together.ai, Korea University, Carnegie Mellon University, and the University of Tokyo among others. Uniquely among similar libraries GPT-NeoX supports a wide variety of systems and hardwares, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.

      If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend you use the Hugging Face transformers library instead which supports GPT-NeoX models.

    • https://github.com/EleutherAI/gpt-neox#why-gpt-neox
      • Why GPT-NeoX?

        GPT-NeoX leverages many of the same features and technologies as the popular Megatron-DeepSpeed library but with substantially increased usability and novel optimizations. Major features include:

        • Distributed training with ZeRO and 3D parallelism
        • A wide variety of systems and hardwares, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.
        • Cutting edge architectural innovations including rotary and alibi positional embeddings, parallel feedforward attention layers, and flash attention.
        • Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 & 2
        • Curriculum Learning
        • Easy connections with the open source ecosystem, including Hugging Face's tokenizers and transformers libraries, logging via WandB, and evaluation via our Language Model Evaluation Harness.
  • https://microsoft.github.io/promptflow/
    • Prompt flow Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

      With prompt flow, you will be able to:

      • Create flows that link LLMs, prompts, Python code and other tools together in an executable workflow.
      • Debug and iterate your flows, especially the interaction with LLMs with ease.
      • Evaluate your flows, calculate quality and performance metrics with larger datasets.
      • Integrate the testing and evaluation into your CI/CD system to ensure quality of your flow.
      • Deploy your flows to the serving platform you choose or integrate into your app’s code base easily.
      • (Optional but highly recommended) Collaborate with your team by leveraging the cloud version of Prompt flow in Azure AI.
    • https://microsoft.github.io/promptflow/concepts/concept-flows.html
      • Flows

      • While how LLMs work may be elusive to many developers, how LLM apps work is not - they essentially involve a series of calls to external services such as LLMs/databases/search engines, or intermediate data processing, all glued together.

    • https://microsoft.github.io/promptflow/reference/index.html
      • Reference

    • https://github.com/microsoft/autogen/tree/main/samples/apps/promptflow-autogen
      • Promptflow Autogen Example

  • https://github.com/stanfordnlp/dspy
    • DSPy: The framework for programming—not prompting—foundation models

    • DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.

      To make this more systematic and much more powerful, DSPy does two things. First, it separates the flow of your program (modules) from the parameters (LM prompts and weights) of each step. Second, DSPy introduces new optimizers, which are LM-driven algorithms that can tune the prompts and/or the weights of your LM calls, given a metric you want to maximize.

      DSPy can routinely teach powerful models like GPT-3.5 or GPT-4 and local models like T5-base or Llama2-13b to be much more reliable at tasks, i.e. having higher quality and/or avoiding specific failure patterns. DSPy optimizers will "compile" the same program into different instructions, few-shot prompts, and/or weight updates (finetunes) for each LM. This is a new paradigm in which LMs and their prompts fade into the background as optimizable pieces of a larger system that can learn from data. tldr; less prompting, higher scores, and a more systematic approach to solving hard tasks with LMs.
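
      A hedged sketch of what that module/parameter split looks like in code. The `dspy.ChainOfThought("question -> answer")` usage follows the project's minimal examples; the LM-configuration call is an assumption and varies between DSPy versions.

      ```python
      # Hedged sketch: declare *what* a step should do via a signature and a
      # built-in module; the concrete prompts/weights are left to DSPy's optimizers.
      import dspy

      # An LM backend must be configured before calling the module, e.g. via
      # dspy.settings.configure(lm=...); the exact call is version-dependent.

      qa = dspy.ChainOfThought("question -> answer")

      # prediction = qa(question="What is the capital of France?")
      # print(prediction.answer)
      ```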

    • https://dspy-docs.vercel.app/
      • DSPy - Programming—not prompting—Language Models

      • The Way of DSPy

        • Systematic Optimization: Choose from a range of optimizers to enhance your program. Whether it's generating refined instructions, or fine-tuning weights, DSPy's optimizers are engineered to maximize efficiency and effectiveness.
        • Modular Approach: With DSPy, you can build your system using predefined modules, replacing intricate prompting techniques with straightforward, effective solutions.
        • Cross-LM Compatibility: Whether you're working with powerhouse models like GPT-3.5 or GPT-4, or local models such as T5-base or Llama2-13b, DSPy seamlessly integrates and enhances their performance in your system.
  • https://github.com/sgl-project/sglang
    • SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

    • https://lmsys.org/blog/2024-01-17-sglang/
      • Fast and Expressive LLM Inference with RadixAttention and SGLang

      • On the backend, we propose RadixAttention, a technique for automatic and efficient KV cache reuse across multiple LLM generation calls.

      • On the frontend, we develop a flexible domain-specific language embedded in Python to control the generation process. This language can be executed in either interpreter mode or compiler mode.

      • KV cache reuse means different prompts with the same prefix can share the intermediate KV cache and avoid redundant memory and computation.

      • To systematically exploit these reuse opportunities, we introduce RadixAttention, a novel technique for automatic KV cache reuse during runtime. Instead of discarding the KV cache after finishing a generation request, our approach retains the KV cache for both prompts and generation results in a radix tree. This data structure enables efficient prefix search, insertion, and eviction. We implement a Least Recently Used (LRU) eviction policy, complemented by a cache-aware scheduling policy, to enhance the cache hit rate.
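
        To make the prefix-reuse idea concrete, here is a deliberately tiny toy sketch of a prefix-keyed cache with LRU eviction. This is an illustration of the concept only (using a plain dict scan rather than a radix tree), not SGLang's actual RadixAttention implementation.

        ```python
        # Toy illustration only (not SGLang's RadixAttention implementation):
        # share a cached prefix between prompts and evict least-recently-used entries.
        from collections import OrderedDict


        class PrefixKVCache:
            """Map token prefixes -> (placeholder) KV state, with LRU eviction."""

            def __init__(self, capacity=4):
                self.capacity = capacity
                self.entries = OrderedDict()  # tuple(prefix tokens) -> kv placeholder

            def longest_shared_prefix(self, tokens):
                # Linear scan for clarity; a radix tree makes this lookup efficient.
                best = ()
                for prefix in self.entries:
                    if tokens[: len(prefix)] == list(prefix) and len(prefix) > len(best):
                        best = prefix
                if best:
                    self.entries.move_to_end(best)  # mark as recently used
                return list(best)

            def insert(self, tokens, kv_state):
                self.entries[tuple(tokens)] = kv_state
                self.entries.move_to_end(tuple(tokens))
                while len(self.entries) > self.capacity:
                    self.entries.popitem(last=False)  # evict least recently used


        cache = PrefixKVCache()
        cache.insert([1, 2, 3], kv_state="kv(1,2,3)")
        print(cache.longest_shared_prefix([1, 2, 3, 4, 5]))  # -> [1, 2, 3]
        ```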

      • On the frontend, we introduce SGLang, a domain-specific language embedded in Python. It allows you to express advanced prompting techniques, control flow, multi-modality, decoding constraints, and external interaction easily. A SGLang function can be run through various backends, such as OpenAI, Anthropic, Gemini, and local models.
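
        A hedged sketch of what a small program in that DSL looks like, simplified from the style of the examples in the blog post. It assumes the `sglang` package's `@sgl.function` decorator, the `sgl.user`/`sgl.assistant`/`sgl.gen` primitives, and a backend configured elsewhere via `sgl.set_default_backend(...)`.

        ```python
        # Hedged sketch of an SGLang program; a backend (OpenAI, Anthropic, Gemini,
        # or a local runtime) must be set elsewhere via sgl.set_default_backend(...).
        import sglang as sgl


        @sgl.function
        def short_qa(s, question):
            # Build up the conversation state `s`, then let the backend generate.
            s += sgl.user(question)
            s += sgl.assistant(sgl.gen("answer", max_tokens=64))


        # state = short_qa.run(question="Why does KV cache reuse reduce latency?")
        # print(state["answer"])
        ```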

      • Figure 5 shows a concrete example. It implements a multi-dimensional essay judge utilizing the branch-solve-merge prompting technique. This function uses LLMs to evaluate the quality of an essay from multiple dimensions, merges the judgments, generates a summary, and assigns a final grade.

      • The syntax of SGLang is largely inspired by Guidance. However, we additionally introduce new primitives and handle intra-program parallelism and batching

        • https://github.com/guidance-ai/guidance
          • Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case—while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly.
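
            As a hedged sketch of that constrained-generation idea: the import names and the `models.LlamaCpp` wrapper follow Guidance's documented usage, but the model path, prompt, and capture names below are placeholders.

            ```python
            # Hedged sketch: interleave fixed text with constrained generation.
            from guidance import models, gen, select

            lm = models.LlamaCpp("/path/to/model.gguf")  # placeholder model path

            lm += "Is the following review positive or negative?\n"
            lm += "Review: great battery life, mediocre screen\nAnswer: "
            lm += select(["positive", "negative"], name="sentiment")  # constrained choice
            lm += "\nConfidence (0-100): "
            lm += gen("confidence", regex=r"\d{1,3}")  # regex-constrained generation

            # print(lm["sentiment"], lm["confidence"])
            ```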

      • SGLang outperformed the baseline systems in all benchmarks, achieving up to 5 times higher throughput. It also excelled in terms of latency, particularly for the first token latency, where a prefix cache hit can be significantly beneficial. These improvements are attributed to the automatic KV cache reuse with RadixAttention, the intra-program parallelism enabled by the interpreter, and the co-design of the frontend and backend systems. Additionally, our ablation study revealed no noticeable overhead even in the absence of cache hits, leading us to always enable the RadixAttention feature in the runtime.

  • https://github.com/Mozilla-Ocho/llamafile
    • Distribute and run LLMs with a single file

    • llamafile lets you distribute and run LLMs with a single file. Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
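
      A hedged sketch of how a running llamafile is commonly consumed from code: its built-in server exposes an OpenAI-compatible endpoint locally, so a standard client can point at it. The base URL/port, API key, and model name below are assumptions; adjust them to however you launched the llamafile.

      ```python
      # Hedged sketch: query a llamafile started in server mode and listening
      # locally (http://localhost:8080 is assumed here). Requires the `openai`
      # Python client, v1+.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-needed")

      resp = client.chat.completions.create(
          model="local-model",  # placeholder; the llamafile serves its bundled model
          messages=[{"role": "user", "content": "Say hello in one short sentence."}],
      )
      print(resp.choices[0].message.content)
      ```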

    • https://hacks.mozilla.org/2023/11/introducing-llamafile/
      • Introducing llamafile

  • https://github.com/microsoft/LLMLingua
    • To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
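
      A hedged sketch of the documented usage pattern: `PromptCompressor` and `compress_prompt` follow the repo's examples, but argument names/defaults vary between versions and the documents here are placeholders.

      ```python
      # Hedged sketch: compress a long RAG-style prompt before sending it to an LLM.
      from llmlingua import PromptCompressor

      compressor = PromptCompressor()  # loads the default compression model (override via model_name=...)

      context_docs = [
          "(long retrieved document 1 ...)",
          "(long retrieved document 2 ...)",
      ]

      result = compressor.compress_prompt(
          context_docs,
          instruction="Answer the question based on the context.",
          question="What does the report conclude?",
          target_token=300,  # rough token budget for the compressed prompt
      )
      # result["compressed_prompt"] is what you actually send to the target LLM
      ```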

    • https://github.com/microsoft/LLMLingua/blob/main/examples/Retrieval.ipynb
      • We know that LLMs have a 'lost in the middle' issue, where the position of key information in the prompt significantly impacts the final result.

      • How to build an accurate positional relationship between the document and the question has become an important issue. We evaluated the effects of four types of reranker methods on a dataset (NaturalQuestions Multi-document QA) that is very close to the actual RAG scenario (e.g. BingChat).

      • The results show that reranker-based methods are significantly better than embedding methods. The LongLLMLingua method is even better than the current SoTA reranker methods, and it can more accurately capture the relationship between the query and the document, thus alleviating the 'lost in the middle' issue.

    • https://llmlingua.com/
      • (Long)LLMLingua | Designing a Language for LLMs via Prompt Compression

    • https://blog.llamaindex.ai/longllmlingua-bye-bye-to-middle-loss-and-save-on-your-rag-costs-via-prompt-compression-54b559b9ddf7
      • LongLLMLingua: Bye-bye to Middle Loss and Save on Your RAG Costs via Prompt Compression

  • https://github.com/apoorvumang/prompt-lookup-decoding
    • In several LLM use cases where you're doing input grounded generation (summarization, document QA, multi-turn chat, code editing), there is high n-gram overlap between LLM input (prompt) and LLM output. This could be entity names, phrases, or code chunks that the LLM directly copies from the input while generating the output. Prompt lookup exploits this pattern to speed up autoregressive decoding in LLMs.

    • On both summarization and context-QA, we get a relatively consistent 2.4x speedup (on average).

    • https://twitter.com/apoorv_umang/status/1728831397153104255
      • Prompt lookup decoding: Get 2x-4x reduction in latency for input grounded LLM generation with no drop in quality using this speculative decoding technique

    • huggingface/transformers#27722
      • Adding support for prompt lookup decoding (variant of assisted generation)
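
        A hedged sketch of that integration, assuming a transformers release recent enough to expose the `prompt_lookup_num_tokens` argument on `generate()`; the model and prompt are placeholders.

        ```python
        # Hedged sketch: prompt lookup decoding via transformers' generate() kwarg.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "gpt2"  # placeholder; any causal LM works
        tok = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id)

        document = "The quick brown fox jumps over the lazy dog. " * 20
        inputs = tok("Summarize the text:\n" + document, return_tensors="pt")

        # Draft tokens are looked up as n-grams from the prompt itself and then
        # verified, which is where the speedup on input-grounded tasks comes from.
        out = model.generate(**inputs, max_new_tokens=64, prompt_lookup_num_tokens=10)
        print(tok.decode(out[0], skip_special_tokens=True))
        ```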

    • ggerganov/llama.cpp#4226
      • lookahead-prompt: add example

  • https://github.com/vercel/ai
    • Vercel AI SDK: The Vercel AI SDK is a library for building AI-powered streaming text and chat UIs.

    • Build AI-powered applications with React, Svelte, Vue, and Solid

    • https://sdk.vercel.ai/docs
      • Vercel AI SDK: An open source library for building AI-powered user interfaces.

        The Vercel AI SDK is an open-source library designed to help developers build conversational streaming user interfaces in JavaScript and TypeScript. The SDK supports React/Next.js, Svelte/SvelteKit, and Vue/Nuxt as well as Node.js, Serverless, and the Edge Runtime.

  • https://github.com/oobabooga/text-generation-webui
    • A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

    • Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.

    • https://github.com/oobabooga/text-generation-webui-extensions
      • This is a directory of extensions for oobabooga/text-generation-webui

  • https://github.com/huggingface/chat-ui
    • Open source codebase powering the HuggingChat app

  • https://github.com/lm-sys/FastChat
  • https://github.com/vllm-project/vllm
  • https://github.com/philipturner/metal-benchmarks
    • Apple GPU microarchitecture

    • This document thoroughly explains the Apple GPU microarchitecture, focusing on its GPGPU performance. Details include latencies for each ALU assembly instruction, cache sizes, and the number of unique instruction pipelines. This document enables evidence-based reasoning about performance on the Apple GPU, helping people diagnose bottlenecks in real-world software. It also compares Apple silicon to generations of AMD and Nvidia microarchitectures, showing where it might exhibit different performance patterns. Finally, the document examines how Apple's design choices improve power efficiency compared to other vendors.

      This repository also contains open-source benchmarking scripts. They allow anyone to reproduce and verify the author's claims about performance. A complementary library reports the hardware specifications of any Apple-designed GPU.

      • https://github.com/philipturner/applegpuinfo
        • Print all known information about the GPU on Apple-designed chips

        • This is a mini-framework for querying parameters of an Apple-designed GPU. It also contains a command-line tool, gpuinfo, which reports information similarly to clinfo. It was co-authored with an AI.

        • https://github.com/Oblomov/clinfo
          • Print all known information about all available OpenCL platforms and devices in the system

          • clinfo is a simple command-line application that enumerates all possible (known) properties of the OpenCL platform and devices available on the system.

  • https://github.com/tinygrad/tinygrad
    • You like pytorch? You like micrograd? You love tinygrad! ❤️

    • This may not be the best deep learning framework, but it is a deep learning framework.

      Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.
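
      A hedged sketch in the spirit of the README's micrograd-style example; the top-level `from tinygrad import Tensor` import assumes a recent release (older versions used `tinygrad.tensor`).

      ```python
      # Hedged sketch: a tiny autograd computation on tinygrad Tensors.
      from tinygrad import Tensor

      x = Tensor.eye(3, requires_grad=True)
      y = Tensor([[2.0, 0.0, -2.0]], requires_grad=True)
      z = y.matmul(x).sum()
      z.backward()

      print(x.grad.numpy())  # dz/dx
      print(y.grad.numpy())  # dz/dy
      ```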

    • https://tinygrad.org/
  • https://github.com/microsoft/DirectML
    • DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
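
      One hedged way to exercise DirectML from Python is the separate `torch-directml` package; the package name and `torch_directml.device()` entry point are assumptions based on Microsoft's PyTorch-on-DirectML documentation rather than the DirectML repo itself.

      ```python
      # Hedged sketch: run a PyTorch matmul on a DirectX 12 GPU through DirectML.
      # Assumes `pip install torch-directml` and a DirectX 12-capable GPU/driver.
      import torch
      import torch_directml

      dml = torch_directml.device()  # first available DirectML device

      a = torch.randn(1024, 1024, device=dml)
      b = torch.randn(1024, 1024, device=dml)
      c = a @ b  # executed through DirectML
      print(c.shape, c.device)
      ```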
