
AI/ML Toolkit

Some notes on AI/ML tools that seem interesting/useful (largely aiming to focus on open source tools)

Table of Contents

Some of my other related gists

Image Generation

Automatic1111 (Stable Diffusion WebUI)

ComfyUI

Unsorted

Song / Audio Generation

Udio

Suno

Stable Audio

  • https://arxiv.org/abs/2404.10301
    • Long-form music generation with latent diffusion (2024)

    • Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.

    • https://stability-ai.github.io/stable-audio-2-demo/
      • stable-audio-2-demo

      • Additional creative capabilities Audio-to-audio With diffusion models it is possible to perform some degree of style transfer by initializing the noise with audio during sampling. This capability can be used to modify the aesthetics of an existing recording based on a given text prompt, whilst maintaining the reference audio’s structure (e.g., a beatbox recording could be style-transferred to produce realistic-sounding drums). As a result, our model can be influenced by not only text prompts but also audio inputs, enhancing its controllability and expressiveness. We noted that when initialized with voice recordings (such as beatbox or onomatopoeias), there is a sensation of control akin to an instrument.

      • Memorization analysis Recent works examined the potential of generative models to memorize training data, especially for repeated elements in the training set. Further, musicLM conducted a memorization analysis to address concerns on the potential misappropriation of creative content. Adhering to principles of responsible model development, we also run a comprehensive study on memorization.

        Considering the increased probability of memorizing repeated music within the dataset, we start by studying if our training set contains repeated data. We embed all our training data using the LAION-CLAP audio encoder to select audios that are close in this space based on a manually set threshold. The threshold is set such that the selected audios correspond to exact replicas. With this process, we identify 5566 repeated audios in our training set.

        We compare our model’s generations against the training set in LAION-CLAP space. Generations are from 5566 prompts within the repeated training data (in-distribution), and 586 prompts from the Song Describer Dataset (no-singing, out-of-distribution). We then identify the top-50 generated music that is closest to the training data and listen.

        We extensively listened to potential memorization candidates, and could not find memorization.
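
        A minimal sketch of the kind of CLAP-embedding nearest-neighbour check described above (not the paper's actual code; the file paths and top-k value are placeholders):

        import numpy as np
        import laion_clap

        # Load a pretrained LAION-CLAP audio encoder (downloads a default checkpoint)
        clap = laion_clap.CLAP_Module(enable_fusion=False)
        clap.load_ckpt()

        train_files = ["train_0001.wav", "train_0002.wav"]  # hypothetical training audio
        gen_files = ["generation_0001.wav"]                  # hypothetical model outputs

        train_emb = clap.get_audio_embedding_from_filelist(x=train_files, use_tensor=False)
        gen_emb = clap.get_audio_embedding_from_filelist(x=gen_files, use_tensor=False)

        # Cosine similarity between every generation and every training clip
        train_emb /= np.linalg.norm(train_emb, axis=1, keepdims=True)
        gen_emb /= np.linalg.norm(gen_emb, axis=1, keepdims=True)
        sims = gen_emb @ train_emb.T

        # For each generation, surface the closest training items for manual listening
        top_k = np.argsort(-sims, axis=1)[:, :50]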

  • https://www.stableaudio.com/
    • Stable Audio Create music with AI

    • https://www.stableaudio.com/user-guide/text-to-audio
      • Text-to-audio

    • https://www.stableaudio.com/user-guide/audio-to-audio
      • Audio-to-audio

    • https://www.stableaudio.com/user-guide/model-2
      • Stable Audio 2.0 Model

      • Our groundbreaking Stable Audio AudioSparx 2.0 model has been designed to generate full tracks with coherent structure at 3 minutes and 10 seconds. Our new model is available for everyone to generate full tracks on our Stable Audio product.

      • Key features:

        • Stable Audio 2.0 sets a new standard in AI generated audio, producing high-quality, full tracks with coherent musical structure up to three minutes in length at 44.1KHz stereo.
        • The new model introduces audio-to-audio generation by allowing users to upload and transform samples using natural language prompts.
        • Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.
  • https://stability.ai/news?tags=Audio

AudioCraft: MusicGen, AudioGen, etc

  • https://github.com/facebookresearch/audiocraft
    • Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

    • https://github.com/facebookresearch/audiocraft#models
      • At the moment, AudioCraft contains the training code and inference code for:

        • MusicGen: A state-of-the-art controllable text-to-music model.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md
            • MusicGen: Simple and Controllable Music Generation AudioCraft provides the code and models for MusicGen, a simple and controllable model for music generation. MusicGen is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.
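
              The MusicGen models can be used via audiocraft's Python API; a minimal sketch along the lines of the official docs (the model name and prompts here are just examples):

              from audiocraft.models import MusicGen
              from audiocraft.data.audio import audio_write

              # Load a pretrained checkpoint and set generation length (seconds)
              model = MusicGen.get_pretrained('facebook/musicgen-small')
              model.set_generation_params(duration=8)

              # Text-conditioned generation; returns a batch of waveforms at model.sample_rate (32 kHz)
              descriptions = ['lo-fi hip hop with warm vinyl crackle', 'energetic EDM drop']
              wav = model.generate(descriptions)

              for idx, one_wav in enumerate(wav):
                  # Write each sample to disk with loudness normalization
                  audio_write(f'musicgen_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")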

        • AudioGen: A state-of-the-art text-to-sound model.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/AUDIOGEN.md
            • AudioGen: Textually-guided audio generation AudioCraft provides the code and a model re-implementing AudioGen, a textually-guided audio generation model that performs text-to-sound generation.

              The provided AudioGen reimplementation follows the LM model architecture introduced in MusicGen and is a single stage auto-regressive Transformer model trained over a 16kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. This model variant reaches audio quality similar to the original implementation introduced in the AudioGen publication, while providing faster generation speed given the smaller frame rate.

        • EnCodec: A state-of-the-art high fidelity neural audio codec.

        • Multi Band Diffusion: An EnCodec compatible decoder using diffusion.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md
            • MultiBand Diffusion AudioCraft provides the code and models for MultiBand Diffusion, From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion. MultiBand diffusion is a collection of 4 models that can decode tokens from EnCodec tokenizer into waveform audio.

        • MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.

          • https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
            • MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer AudioCraft provides the code and models for MAGNeT, Masked Audio Generation using a Single Non-Autoregressive Transformer.

              MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike prior work on masked generative audio Transformers, such as SoundStorm and VampNet, MAGNeT doesn't require semantic token conditioning, model cascading or audio prompting, and employs a full text-to-audio using a single non-autoregressive Transformer.

Neural Audio Codecs

  • https://haoheliu.github.io/SemantiCodec/
    • SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    • Highlights

      • Ultra-low bitrate: We focus on bitrates between 0.31 kbps and 1.43 kbps, with token rates of 25, 50, or 100 per second.
      • Strong semantics in the audio tokens: Indicated by classification accuracy.
      • Supports variable vocabulary sizes: One model supports four different vocabulary sizes.
    • https://github.com/haoheliu/SemantiCodec
      • SemantiCodec

    • https://arxiv.org/abs/2405.00233
      • SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

      • Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these challenges, we introduce SemantiCodec, a novel codec designed to compress audio into fewer than a hundred tokens per second across diverse audio types, including speech, general audio, and music, without compromising quality. SemantiCodec features a dual-encoder architecture: a semantic encoder using a self-supervised AudioMAE, discretized using k-means clustering on extensive audio data, and an acoustic encoder to capture the remaining details. The semantic and acoustic encoder outputs are used to reconstruct audio via a diffusion-model-based decoder. SemantiCodec is presented in three variants with token rates of 25, 50, and 100 per second, supporting a range of ultra-low bit rates between 0.31 kbps and 1.43 kbps. Experimental results demonstrate that SemantiCodec significantly outperforms the state-of-the-art Descript codec on reconstruction quality. Our results also suggest that SemantiCodec contains significantly richer semantic information than all evaluated audio codecs, even at significantly lower bitrates.
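
        A rough usage sketch based on the SemantiCodec README (argument names and values may differ between releases; the token rate and vocabulary size below are just one of the documented configurations):

        import soundfile as sf
        from semanticodec import SemantiCodec

        # One of the documented configurations: 100 tokens/sec, 16k semantic vocabulary
        codec = SemantiCodec(token_rate=100, semantic_vocab_size=16384)

        tokens = codec.encode("input.wav")   # audio -> discrete tokens
        waveform = codec.decode(tokens)      # tokens -> reconstructed audio

        sf.write("reconstruction.wav", waveform[0, 0], 16000)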

  • https://github.com/yangdongchao/AcademiCodec
    • AcademiCodec: An Open Source Audio Codec Model for Academic Research

    • Audio codec models are widely used in audio communication as a crucial technique for compressing audio into discrete representations. Nowadays, audio codec models are increasingly utilized in generation fields as intermediate representations. For instance, AudioLM is an audio generation model that uses the discrete representation of SoundStream as a training target, while VALL-E employs the Encodec model as an intermediate feature to aid TTS tasks. Despite their usefulness, two challenges persist: (1) training these audio codec models can be difficult due to the lack of publicly available training processes and the need for large-scale data and GPUs; (2) achieving good reconstruction performance requires many codebooks, which increases the burden on generation models. In this study, we propose a group-residual vector quantization (GRVQ) technique and use it to develop a novel High Fidelity Audio Codec model, HiFi-Codec, which only requires 4 codebooks. We train all the models using publicly available TTS data such as LibriTTS, VCTK, AISHELL, and more, with a total duration of over 1000 hours, using 8 GPUs. Our experimental results show that HiFi-Codec outperforms Encodec in terms of reconstruction performance despite requiring only 4 codebooks. To facilitate research in audio codec and generation, we introduce AcademiCodec, the first open-source audio codec toolkit that offers training codes and pre-trained models for Encodec, SoundStream, and HiFi-Codec.

    • https://github.com/yangdongchao/AcademiCodec#what-the-difference-between-soundstream-encodec-and-hifi-codec
      • In our view, the main difference between SoundStream and Encodec is the choice of discriminator. Encodec only uses an STFT-discriminator, which forces the STFT spectrogram to be more realistic. SoundStream uses two types of discriminator: one forces the waveform level to be more realistic, and one forces the spectrogram level to be more realistic. In our code, we adopt the waveform-level discriminator from HiFi-GAN and the spectrogram-level discriminator from Encodec. In theory, we think SoundStream enjoys better performance.

    • https://arxiv.org/abs/2305.02765
      • HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

      • Audio codec models are widely used in audio communication as a crucial technique for compressing audio into discrete representations. Nowadays, audio codec models are increasingly utilized in generation fields as intermediate representations. For instance, AudioLM is an audio generation model that uses the discrete representation of SoundStream as a training target, while VALL-E employs the Encodec model as an intermediate feature to aid TTS tasks. Despite their usefulness, two challenges persist: (1) training these audio codec models can be difficult due to the lack of publicly available training processes and the need for large-scale data and GPUs; (2) achieving good reconstruction performance requires many codebooks, which increases the burden on generation models. In this study, we propose a group-residual vector quantization (GRVQ) technique and use it to develop a novel High Fidelity Audio Codec model, HiFi-Codec, which only requires 4 codebooks. We train all the models using publicly available TTS data such as LibriTTS, VCTK, AISHELL, and more, with a total duration of over 1000 hours, using 8 GPUs. Our experimental results show that HiFi-Codec outperforms Encodec in terms of reconstruction performance despite requiring only 4 codebooks. To facilitate research in audio codec and generation, we introduce AcademiCodec, the first open-source audio codec toolkit that offers training codes and pre-trained models for Encodec, SoundStream, and HiFi-Codec.

  • https://github.com/descriptinc/descript-audio-codec
    • Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN

    • State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio

      • With Descript Audio Codec, you can compress 44.1 KHz audio into discrete codes at a low 8 kbps bitrate.
      • That's approximately 90x compression while maintaining exceptional fidelity and minimizing artifacts.
      • Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio.
      • It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.)
    • https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5
      • Descript Audio Codec Welcome to the demo page for the paper “High Fidelity Compression Algorithm with Improved RVQGAN”. Here, we provide samples from our ablation studies and other competitive baselines.

    • https://arxiv.org/abs/2306.06546
      • High-Fidelity Audio Compression with Improved RVQGAN

      • Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high quality neural compression model that can compress high-dimensional natural signals into lower dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 KHz audio into tokens at just 8kbps bandwidth. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses. We compress all domains (speech, environment, music, etc.) with a single universal model, making it widely applicable to generative modeling of all audio. We compare with competing audio compression algorithms, and find our method outperforms them significantly. We provide thorough ablations for every design choice, as well as open-source code and trained model weights. We hope our work can lay the foundation for the next generation of high-fidelity audio modeling.
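
        A minimal encode/decode sketch along the lines of the descript-audio-codec README (the model type and input path are examples; helper names may differ between versions):

        import dac
        from audiotools import AudioSignal

        # Download and load the pretrained 44.1 kHz model
        model_path = dac.utils.download(model_type="44khz")
        model = dac.DAC.load(model_path)
        model.to("cuda")

        # Load audio and move it to the model's device
        signal = AudioSignal("input.wav")
        signal.to(model.device)

        # Encode to the quantized latent / discrete codes, then decode back to audio
        x = model.preprocess(signal.audio_data, signal.sample_rate)
        z, codes, latents, _, _ = model.encode(x)
        y = model.decode(z)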

    • https://github.com/DBraun/DAC-JAX
      • DAC-JAX

      • A JAX Implementation of the Descript Audio Codec

      • Descript Audio Codec (.dac) is a high-fidelity general neural audio codec introduced in the paper "High-Fidelity Audio Compression with Improved RVQGAN". This repository is an unofficial JAX implementation of the PyTorch-based DAC and has no affiliation with Descript.

  • https://github.com/AudiogenAI/agc
    • Audiogen Codec (agc) We are announcing the open source release of Audiogen Codec (agc) 🎉. A low compression 48khz stereo neural audio codec for general audio, optimizing for audio fidelity 🎵.

      It comes in two flavors:

      • agc-continuous 🔄 KL regularized, 32 channels, 100hz.
      • agc-discrete 🔢 24 stages of residual vector quantization, 50hz.

      AGC (Audiogen Codec) is a convolutional autoencoder based on the DAC architecture, which holds SOTA 🏆. We found that training with EMA and adding a perceptual loss term with CLAP features improved performance. These codecs, being low compression, outperform Meta's EnCodec and DAC on general audio as validated from internal blind ELO games 🎲.

      We trained (relatively) very low compression codecs in the pursuit of solving a core issue regarding general music and audio generation, low acoustic quality and audible artifacts, which hinder industry use for these models 🚫🎶. Our hope is to encourage researchers to build hierarchical generative audio models that can efficiently use high sequence length representations without sacrificing semantic abilities 🧠.

      This codec will power Audiogen's upcoming models. Stay tuned! 🚀

    • https://audiogen.notion.site/Audiogen-Codec-Examples-546fe64596f54e20be61deae1c674f20
      • Audiogen Codec Examples

  • https://github.com/facebookresearch/encodec
    • EnCodec: High Fidelity Neural Audio Compression

    • State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

    • We provide our two multi-bandwidth models:

      • A causal model operating at 24 kHz on monophonic audio trained on a variety of audio data.
      • A non-causal model operating at 48 kHz on stereophonic audio trained on music-only data.

      The 24 kHz model can compress to 1.5, 3, 6, 12 or 24 kbps, while the 48 kHz model supports 3, 6, 12 and 24 kbps. We also provide a pre-trained language model for each of the models, which can further compress the representation by up to 40% without any further loss of quality.
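
      A minimal sketch of compressing audio to discrete codes with the 24 kHz model, following the EnCodec README (the input path is a placeholder):

      import torch
      import torchaudio
      from encodec import EncodecModel
      from encodec.utils import convert_audio

      # Causal 24 kHz mono model; pick a target bandwidth (1.5, 3, 6, 12 or 24 kbps)
      model = EncodecModel.encodec_model_24khz()
      model.set_target_bandwidth(6.0)

      wav, sr = torchaudio.load("input.wav")
      wav = convert_audio(wav, sr, model.sample_rate, model.channels)
      wav = wav.unsqueeze(0)  # add batch dimension

      with torch.no_grad():
          encoded_frames = model.encode(wav)

      # Discrete codes of shape [batch, n_codebooks, timesteps]
      codes = torch.cat([frame[0] for frame in encoded_frames], dim=-1)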

    • https://github.com/facebookresearch/encodec#-transformers
    • https://arxiv.org/abs/2210.13438
      • High Fidelity Neural Audio Compression

      • We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion. We simplify and speed-up the training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model including: training objective, architectural changes and a study of various perceptual loss functions. We present an extensive subjective evaluation (MUSHRA tests) together with an ablation study for a range of bandwidths and audio domains, including speech, noisy-reverberant speech, and music. Our approach is superior to the baseline methods across all evaluated settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio.

Audio Super Resolution

Unsorted

  • https://cassetteai.com/
    • Cassette is your Copilot for AI Music Generation.

      Our cutting edge Artificial Intelligence technology built using Latent Diffusion models (LDMs) makes music production, customization & listening available to everyone. Creating music is now as simple as writing a prompt.

See Also

ollama

LangChain, LangServe, LangSmith, LangFlow, etc

AI Agents / etc

Agent Benchmarks / Leaderboards

  • See also:
  • https://github.com/zhangxjohn/LLM-Agent-Benchmark-List
    • LLM-Agent-Benchmark-List A benchmark list for evaluation of large language models.

  • https://github.com/THUDM/AgentBench
  • https://github.com/princeton-nlp/SWE-bench
    • [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?

    • SWE-bench is a benchmark for evaluating large language models on real world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.

    • https://www.swebench.com/
      • https://www.swebench.com/lite.html
        • SWE-bench Lite A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers

        • SWE-bench was designed to provide a diverse set of codebase problems that were verifiable using in-repo unit tests. The full SWE-bench test split comprises 2,294 issue-commit pairs across 12 python repositories.

          Since its release, we've found that for most systems evaluating on SWE-bench, running each instance can take a lot of time and compute. We've also found that SWE-bench can be a particularly difficult benchmark, which is useful for evaluating LMs in the long term, but discouraging for systems trying to make progress in the short term.

          To remedy these issues, we've released a canonical subset of SWE-bench called SWE-bench Lite. SWE-bench Lite comprises 300 instances from SWE-bench that have been sampled to be more self-contained, with a focus on evaluating functional bug fixes. SWE-bench Lite covers 11 of the original 12 repositories in SWE-bench, with a similar diversity and distribution of repositories as the original. We perform similar filtering on the SWE-bench dev set to provide 23 development instances that can be useful for active development on the SWE-bench task. We recommend future systems evaluating on SWE-bench to report numbers on SWE-bench Lite in lieu of the full SWE-bench set if necessary.
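
          For a quick look at the benchmark itself, the datasets are published on the Hugging Face Hub; a small sketch (dataset ID and field names as published by the SWE-bench authors at the time of writing):

          from datasets import load_dataset

          # 300 curated instances; the full benchmark is "princeton-nlp/SWE-bench"
          swebench_lite = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
          print(len(swebench_lite))

          example = swebench_lite[0]
          print(example["repo"], example["instance_id"])
          print(example["problem_statement"][:300])  # the GitHub issue text the agent must resolve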

  • https://github.com/aorwall/SWE-bench-docker
    • A Docker based solution of the SWE-bench evaluation framework

    • This is a Dockerfile based solution of the SWE-Bench evaluation framework.

      The solution is designed so that each "testbed" for testing a version of a repository is built in a separate Docker image. Each test is then run in its own Docker container. This approach ensures more stable test results because the environment is completely isolated and is reset for each test. Since the Docker container can be recreated each time, there's no need for reinstallation, speeding up the benchmark process.

OpenAI Assistants / ChatGPT custom GPTs

  • https://openai.com/blog/introducing-gpts
    • Introducing GPTs You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.

    • We’re rolling out custom versions of ChatGPT that you can create for a specific purpose—called GPTs. GPTs are a new way for anyone to create a tailored version of ChatGPT to be more helpful in their daily life, at specific tasks, at work, or at home—and then share that creation with others.

  • https://platform.openai.com/docs/assistants/overview
    • The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries.

OpenGPTs

  • https://github.com/langchain-ai/opengpts
    • This is an open source effort to create a similar experience to OpenAI's GPTs. It builds upon LangChain, LangServe and LangSmith. OpenGPTs gives you more control, allowing you to configure:

      • The LLM you use (choose between the 60+ that LangChain offers)

      • The prompts you use (use LangSmith to debug those)
      • The tools you give it (choose from LangChain's 100+ tools, or easily write your own)
      • The vector database you use (choose from LangChain's 60+ vector database integrations)
      • The retrieval algorithm you use
      • The chat history database you use

Autogen / FLAML / etc

  • https://github.com/microsoft/autogen
    • Enable Next-Gen Large Language Model Applications.

    • AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

      • AutoGen enables building next-gen LLM applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses.
      • It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology.
      • It provides a collection of working systems with different complexities. These systems span a wide range of applications from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.
      • AutoGen provides enhanced LLM inference. It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.
    • Roadmap: https://github.com/orgs/microsoft/projects/989/views/3
    • https://github.com/microsoft/autogen#multi-agent-conversation-framework
      • Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans. By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.
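
        The canonical two-agent example from the AutoGen docs looks roughly like this (it assumes an OAI_CONFIG_LIST file containing your API credentials):

        from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

        # Load model/API credentials from a local JSON config file
        config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")

        assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
        user_proxy = UserProxyAgent(
            "user_proxy",
            human_input_mode="NEVER",  # set to "ALWAYS" to stay in the loop
            code_execution_config={"work_dir": "coding", "use_docker": False},
        )

        # The two agents converse (and execute generated code locally) until the task is done
        user_proxy.initiate_chat(
            assistant,
            message="Plot a chart of NVDA and TSLA stock price change YTD.",
        )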

    • https://microsoft.github.io/autogen/blog/
    • https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenAssistant/
      • AutoGen Assistant: Interactively Explore Multi-Agent Workflows

      • To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Assistant, an interface powered by AutoGen. It allows you to:

        • Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
        • Use our UI to create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken).
        • Explicitly add skills to your agents and accomplish more tasks.
        • Publish your sessions to a local gallery.
        • AutoGen Assistant is open source, give it a try!
      • We are thrilled to introduce a new user-friendly interface: the AutoGen Assistant, built upon the leading foundation of AutoGen and robust, modern web technologies like React.

      • With the AutoGen Assistant, users can rapidly create, manage, and interact with agents that can learn, adapt, and collaborate. As we release this interface into the open-source community, our ambition is not only to enhance productivity but to inspire a level of personalized interaction between humans and agents.

      • We recommend using a virtual environment (e.g., conda) to avoid conflicts with existing Python packages. With Python 3.10 or newer active in your virtual environment, use pip to install AutoGen Assistant: pip install autogenra

      • Once installed, run the web UI by entering the following in your terminal: autogenra ui --port 8081. This will start the application on the specified port. Open your web browser and go to http://localhost:8081/ to begin using AutoGen Assistant.

      • The AutoGen Assistant proposes some high-level concepts that help compose agents to solve tasks.

        • Agent Workflow: An agent workflow is a specification of a set of agents that can work together to accomplish a task. The simplest version of this is a setup with two agents – a user proxy agent (that represents a user i.e. it compiles code and prints result) and an assistant that can address task requests (e.g., generating plans, writing code, evaluating responses, proposing error recovery steps, etc.). A more complex flow could be a group chat where even more agents work towards a solution.
        • Session: A session refers to a period of continuous interaction or engagement with an agent workflow, typically characterized by a sequence of activities or operations aimed at achieving specific objectives. It includes the agent workflow configuration, the interactions between the user and the agents. A session can be “published” to a “gallery”.
        • Skills: Skills are functions (e.g., Python functions) that describe how to solve a task. In general, a good skill has a descriptive name (e.g. generate_images), extensive docstrings and good defaults (e.g., writing out files to disk for persistence and reuse). You can add new skills to the AutoGen Assistant via the provided UI. At inference time, these skills are made available to the assistant agent as they address your tasks.

        AutoGen Assistant comes with 3 example skills: fetch_profile, find_papers, generate_images. Please feel free to review the repo to learn more about how they work.
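
        For illustration only (this is not one of the bundled skills), a skill is just a plain Python function with a descriptive name, a docstring, and defaults that persist results to disk, e.g.:

        import csv
        import datetime
        import json

        def summarize_csv(path: str, output_path: str = "csv_summary.json") -> dict:
            """Summarize a CSV file: row count, column names, and per-column non-empty counts.

            The summary is also written to `output_path` so it can be reused in later steps.
            """
            with open(path, newline="") as f:
                rows = list(csv.DictReader(f))
            columns = list(rows[0].keys()) if rows else []
            summary = {
                "path": path,
                "generated_at": datetime.datetime.utcnow().isoformat(),
                "row_count": len(rows),
                "columns": columns,
                "non_empty_counts": {c: sum(1 for r in rows if r.get(c)) for c in columns},
            }
            with open(output_path, "w") as f:
                json.dump(summary, f, indent=2)
            return summary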

      • While the AutoGen Assistant is a web interface, it is powered by an underlying python API that is reusable and modular. Importantly, we have implemented an API where agent workflows can be declaratively specified (in JSON), loaded and run.

    • https://microsoft.github.io/autogen/blog/2023/11/26/Agent-AutoBuild/
      • Agent AutoBuild - Automatically Building Multi-agent Systems

      • Introducing AutoBuild: building multi-agent systems automatically, quickly, and easily for complex tasks, with minimal user prompting required, powered by a newly designed class, AgentBuilder. AgentBuilder also supports open-source LLMs by leveraging vLLM and FastChat.

      • In this blog, we introduce AutoBuild, a pipeline that can automatically build multi-agent systems for complex tasks. Specifically, we design a new class called AgentBuilder, which will complete the generation of participant expert agents and the construction of group chat automatically after the user provides descriptions of a building task and an execution task.

      • AutoBuild supports open-source LLM by vLLM and FastChat.

      • OpenAI Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. AutoBuild also supports the assistant API by adding use_oai_assistant=True to build().
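
        A rough sketch of the AutoBuild flow described above, loosely following the blog post (parameter names and model IDs may have changed between AutoGen releases):

        import autogen
        from autogen.agentchat.contrib.agent_builder import AgentBuilder

        config_list = autogen.config_list_from_json(env_or_file="OAI_CONFIG_LIST")
        default_llm_config = {"temperature": 0}

        builder = AgentBuilder(config_file_or_env="OAI_CONFIG_LIST")

        building_task = "Generate experts that can find and summarize recent arxiv papers about LLM agents."
        # Pass use_oai_assistant=True here to back the generated agents with the OpenAI Assistants API
        agent_list, agent_configs = builder.build(building_task, default_llm_config)

        # Run the generated experts as a group chat on the actual execution task
        group_chat = autogen.GroupChat(agents=agent_list, messages=[], max_round=12)
        manager = autogen.GroupChatManager(
            groupchat=group_chat,
            llm_config={"config_list": config_list, **default_llm_config},
        )
        agent_list[0].initiate_chat(manager, message="Find one recent arxiv paper about LLM agents and summarize it.")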

    • https://microsoft.github.io/autogen/blog/2023/11/20/AgentEval/
      • How to Assess Utility of LLM-powered Applications?

      • As a developer of an LLM-powered application, how can you assess the utility it brings to end users while helping them with their tasks?

      • We introduce AgentEval — the first version of the framework to assess the utility of any LLM-powered application crafted to assist users in specific tasks. AgentEval aims to simplify the evaluation process by automatically proposing a set of criteria tailored to the unique purpose of your application. This allows for a comprehensive assessment, quantifying the utility of your application against the suggested criteria.

    • https://microsoft.github.io/autogen/blog/2023/11/13/OAI-assistants/
      • AutoGen Meets GPTs

      • OpenAI assistants are now integrated into AutoGen via GPTAssistantAgent. This enables multiple OpenAI assistants, which form the backend of the now popular GPTs, to collaborate and tackle complex tasks.

    • https://microsoft.github.io/autogen/blog/2023/11/09/EcoAssistant/
      • EcoAssistant - Using LLM Assistants More Accurately and Affordably

      • TL;DR:

        • Introducing the EcoAssistant, which is designed to solve user queries more accurately and affordably.
        • We show how to let the LLM assistant agent leverage external API to solve user query.
        • We show how to reduce the cost of using GPT models via Assistant Hierarchy.
        • We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via Solution Demonstration.
    • https://microsoft.github.io/autogen/blog/2023/11/06/LMM-Agent/
      • Multimodal with GPT-4V and LLaVA

      • This blog post and the latest AutoGen update concentrate on visual comprehension. Users can input images, pose questions about them, and receive text-based responses from these LMMs. We support the gpt-4-vision-preview model from OpenAI and LLaVA model from Microsoft now.

    • https://microsoft.github.io/autogen/blog/2023/10/26/TeachableAgent/
      • AutoGen's TeachableAgent

      • We introduce TeachableAgent (which uses TextAnalyzerAgent) so that users can teach their LLM-based assistants new facts, preferences, and skills.

    • https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat/
      • Retrieval-Augmented Generation (RAG) Applications with AutoGen

      • TL;DR:

        • We introduce RetrieveUserProxyAgent and RetrieveAssistantAgent, RAG agents of AutoGen that allow retrieval-augmented generation, and their basic usage (see the sketch after this list).
        • We showcase customizations of RAG agents, such as customizing the embedding function, the text split function and vector database.
        • We also showcase two advanced usages of RAG agents: integrating with group chat, and building a Chat application with Gradio.
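
        A rough sketch of the RetrieveChat setup from the blog post (the import paths and retrieve_config keys reflect the AutoGen version current at the time and may have moved since):

        from autogen import config_list_from_json
        from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
        from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

        config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")

        assistant = RetrieveAssistantAgent(
            name="assistant",
            system_message="You are a helpful assistant.",
            llm_config={"config_list": config_list},
        )

        # The RAG proxy chunks and embeds the docs, retrieves relevant context, and injects it into the chat
        ragproxyagent = RetrieveUserProxyAgent(
            name="ragproxyagent",
            retrieve_config={
                "task": "qa",
                "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
            },
        )

        ragproxyagent.initiate_chat(assistant, problem="What is AutoGen?")
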
  • https://github.com/microsoft/FLAML
    • A Fast Library for Automated Machine Learning & Tuning

    • FLAML is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.

      • FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weakness.
      • For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range.
      • It supports fast and economical automatic tuning (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations), capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
    • Heads-up: We have migrated AutoGen into a dedicated github repository. Alongside this move, we have also launched a dedicated Discord server and a website for comprehensive documentation.

ChatDev

  • https://github.com/OpenBMB/ChatDev
    • Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

    • Communicative Agents for Software Development

    • ChatDev stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer, Chief Product Officer, Chief Technology Officer, programmer, reviewer, tester, and art designer. These agents form a multi-agent organizational structure and are united by a mission to "revolutionize the digital world through programming." The agents within ChatDev collaborate by participating in specialized functional seminars, including tasks such as designing, coding, testing, and documenting. The primary objective of ChatDev is to offer an easy-to-use, highly customizable and extendable framework, which is based on large language models (LLMs) and serves as an ideal scenario for studying collective intelligence.

    • https://github.com/OpenBMB/ChatDev#-news
      • November 15th, 2023: We launched ChatDev as a SaaS platform that enables software developers and innovative entrepreneurs to build software efficiently at a very low cost and barrier to entry. Try it out at https://chatdev.modelbest.cn/

      • November 2nd, 2023: ChatDev is now supported with a new feature: incremental development, which allows agents to develop upon existing codes. Try --config "incremental" --path "[source_code_directory_path]" to start it.

      • October 26th, 2023: ChatDev is now supported with Docker for safe execution (thanks to contribution from ManindraDeMel). Please see Docker Start Guide.

      • September 25th, 2023: The Git mode is now available, enabling the programmer to utilize Git for version control. To enable this feature, simply set "git_management" to "True" in ChatChainConfig.json. See guide.

      • September 20th, 2023: The Human-Agent-Interaction mode is now available! You can get involved with the ChatDev team by playing the role of reviewer and making suggestions to the programmer; try python3 run.py --task [description_of_your_idea] --config "Human". See guide and example.

      • September 1st, 2023: The Art mode is available now! You can activate the designer agent to generate images used in the software; try python3 run.py --task [description_of_your_idea] --config "Art". See guide and example.

    • https://chatdev.modelbest.cn/

Unsorted

  • https://githubnext.com/projects/copilot-workspace
  • https://github.com/holmeswww/agentkit
    • AgentKit: Flow Engineering with Graphs, not Coding

    • An intuitive LLM prompting framework for multifunctional agents, by explicitly constructing a complex "thought process" from simple natural language prompts.

    • AgentKit offers a unified framework for explicitly constructing a complex human "thought process" from simple natural language prompts. The user puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".

      Different arrangements of nodes could represent different functionalities, allowing the user to integrate various functionalities to build multifunctional agents.

      A basic agent could be implemented as simply as a list of prompts for the subtasks, and therefore could be designed and tuned by someone without any programming experience.

  • https://github.com/CopilotKit/CopilotKit
  • https://github.com/OpenBMB/AgentVerse
    • 🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation

    • Task-solving: This framework assembles multiple agents as an automatic multi-agent system (AgentVerse-Tasksolving, Multi-agent as system) to collaboratively accomplish the corresponding tasks. Applications: software development system, consulting system, etc.

    • Simulation: This framework allows users to set up custom environments to observe behaviors among, or interact with, multiple agents. Applications: game, social behavior research of LLM-based agents, etc.

    • https://arxiv.org/abs/2308.10848
      • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

      • Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks. However, in real-world scenarios, cooperation among individuals is often required to enhance the efficiency and effectiveness of task accomplishment. Hence, inspired by human group dynamics, we propose a multi-agent framework that can collaboratively and dynamically adjust its composition as a greater-than-the-sum-of-its-parts system. Our experiments demonstrate that the framework can effectively deploy multi-agent groups that outperform a single agent. Furthermore, we delve into the emergence of social behaviors among individual agents within a group during collaborative task accomplishment. In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.

    • https://developer.nvidia.com/blog/building-your-first-llm-agent-application/
      • Building Your First LLM Agent Application

  • https://gpt.chatcody.com/
  • https://dosu.dev/
    • Dosu is an AI teammate that lives in your GitHub repo, helping you respond to issues, triage bugs, and build better documentation.

    • How much does Dosu cost? Auto-labeling and backlog grooming are completely free! For Q&A and debugging, Dosu is free for 25 tickets per month. After that, paid plans start at $20 per month. A detailed pricing page is coming soon.

      At Dosu, we are strong advocates of OSS. If you maintain a project that is FOSS, part of the Cloud Native Computing Foundation (CNCF), or the Apache Software Foundation (ASF), please reach out to hi@dosu.dev about special free-tier plans

  • https://github.com/princeton-nlp/SWE-agent
    • SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models

    • SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

      On SWE-bench, SWE-agent resolves 12.29% of issues, achieving the state-of-the-art performance on the full test set.

    • Agent-Computer Interface (ACI) We accomplish these results by designing simple LM-centric commands and feedback formats to make it easier for the LM to browse the repository, view, edit and execute code files. We call this an Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.

      Just as typical language models require good prompt engineering, good ACI design leads to much better results when using agents. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

  • https://github.com/paul-gauthier/aider
    • aider is AI pair programming in your terminal Aider is a command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository. Aider will directly edit the code in your local source files, and git commit the changes with sensible commit messages. You can start a new project or work with an existing git repo. Aider is unique in that it lets you ask for changes to pre-existing, larger codebases.

    • https://aider.chat/
  • https://github.com/NL2Code/CodeR
    • CodeR

    • GitHub issue resolving has recently attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within a code Repository. On SWE-bench lite, CodeR is able to solve 28% of issues, in the case of submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.

  • https://github.com/simonw/llm
    • Access large language models from the command-line

    • https://llm.datasette.io/
      • LLM A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine.

        Run prompts from the command-line, store the results in SQLite, generate embeddings and more.

      • https://llm.datasette.io/en/stable/openai-models.html
        • OpenAI models LLM ships with a default plugin for talking to OpenAI’s API. OpenAI offer both language models and embedding models, and LLM can access both types.

      • https://llm.datasette.io/en/stable/other-models.html
        • Other models LLM supports OpenAI models by default. You can install plugins to add support for other models. You can also add additional OpenAI-API-compatible models using a configuration file.

        • Installing and using a local model LLM plugins can provide local models that run on your machine.

          To install llm-gpt4all, providing 17 models from the GPT4All project, run this:

          llm install llm-gpt4all
          

          Run llm models to see the expanded list of available models.

      • https://llm.datasette.io/en/stable/embeddings/cli.html
        • Embedding with the CLI LLM provides command-line utilities for calculating and storing embeddings for pieces of content.

        • llm embed The llm embed command can be used to calculate embedding vectors for a string of content. These can be returned directly to the terminal, stored in a SQLite database, or both.

        • Storing embeddings in SQLite Embeddings are much more useful if you store them somewhere, so you can calculate similarity scores between different embeddings later on.

          LLM includes the concept of a collection of embeddings. A collection groups together a set of stored embeddings created using the same model, each with a unique ID within that collection.

          Embeddings also store a hash of the content that was embedded. This hash is later used to avoid calculating duplicate embeddings for the same content.

        • Storing content and metadata By default, only the entry ID and the embedding vector are stored in the database table.

          You can store a copy of the original text in the content column by passing the --store option

      • You can also store a JSON object containing arbitrary metadata in the metadata column by passing the --metadata option.

      • llm embed-multi The llm embed command embeds a single string at a time.

        llm embed-multi can be used to embed multiple strings at once, taking advantage of any efficiencies that the embedding model may provide when processing multiple strings.

        This command can be called in one of three ways:

        • With a CSV, TSV, JSON or newline-delimited JSON file
        • With a SQLite database and a SQL query
        • With one or more paths to directories, each accompanied by a glob pattern
      • Embedding data from a SQLite database You can embed data from a SQLite database using --sql, optionally combined with --attach to attach an additional database.

      • Embedding data from files in directories LLM can embed the content of every text file in a specified directory, using the file’s path and name as the ID.

      • llm similar The llm similar command searches a collection of embeddings for the items that are most similar to a given string or item ID.

        This currently uses a slow brute-force approach which does not scale well to large collections. See issue 216 for plans to add a more scalable approach via vector indexes provided by plugins.

      • You can compare against text stored in a file using -i filename

      • When using a model like CLIP, you can find images similar to an input image using -i filename with --binary

      • llm embed-models To list all available embedding models, including those provided by plugins, run this command:

        llm embed-models
        
      • llm collections list To list all of the collections in the embeddings database, run this command:

        llm collections list
        
      • https://llm.datasette.io/en/stable/embeddings/writing-plugins.html
        • Writing plugins to add new embedding models Read the plugin tutorial for details on how to develop and package a plugin.

          This page shows an example plugin that implements and registers a new embedding model.

          There are two components to an embedding model plugin:

          • An implementation of the register_embedding_models() hook, which takes a register callback function and calls it to register the new model with the LLM plugin system.
          • A class that extends the llm.EmbeddingModel abstract base class. The only required method on this class is embed_batch(texts), which takes an iterable of strings and returns an iterator over lists of floating point numbers.
        • Embedding binary content If your model can embed binary content, use the supports_binary property to indicate that it does.

        • If your model accepts binary, your .embed_batch() model may be called with a list of Python bytestrings. These may be mixed with regular strings if the model accepts both types of input.
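
          Putting the two components described above together, a minimal toy plugin might look like this (the hash-based "model" and its ID are purely illustrative):

          import hashlib
          import llm

          @llm.hookimpl
          def register_embedding_models(register):
              register(HashEmbed())

          class HashEmbed(llm.EmbeddingModel):
              model_id = "hash-embed-demo"  # hypothetical model ID

              def embed_batch(self, items):
                  # Takes an iterable of strings, returns an iterator over lists of floats
                  for text in items:
                      digest = hashlib.sha256(text.encode("utf-8")).digest()
                      yield [byte / 255.0 for byte in digest]  # fixed 32-dimensional vector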

      • https://llm.datasette.io/en/stable/plugins/installing-plugins.html
        • Installing plugins Plugins must be installed in the same virtual environment as LLM itself.

          You can find names of plugins to install in the plugin directory

          Use the llm install command (a thin wrapper around pip install) to install plugins in the correct environment

      • https://llm.datasette.io/en/stable/plugins/directory.html#plugin-directory
        • Plugin directory The following plugins are available for LLM.

      • https://llm.datasette.io/en/stable/plugins/tutorial-model-plugin.html
        • Writing a plugin to support a new model This tutorial will walk you through developing a new plugin for LLM that adds support for a new Large Language Model.

      • https://llm.datasette.io/en/stable/aliases.html
        • Model aliases LLM supports model aliases, which allow you to refer to a model by a short name instead of its full ID.

        • Listing aliases To list current aliases, run this:

          llm aliases
          
        • Adding a new alias The llm aliases set <alias> <model-id> command can be used to add a new alias

        • Removing an alias The llm aliases remove <alias> command will remove the specified alias

      • Viewing the aliases file Aliases are stored in an aliases.json file in the LLM configuration directory.

        To see the path to that file, run this:

        llm aliases path
        

        To view the content of that file, run this:

        cat "$(llm aliases path)"
        
      • https://llm.datasette.io/en/stable/python-api.html
        • Python API LLM provides a Python API for executing prompts, in addition to the command-line interface.

          Understanding this API is also important for writing Plugins.
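
          A minimal sketch of the Python API (the model ID and prompt are just examples; an OpenAI API key is assumed to be configured, e.g. via llm keys set openai):

          import llm

          model = llm.get_model("gpt-3.5-turbo")
          # model.key = "sk-..."  # or rely on the configured/environment API key
          response = model.prompt("Five creative names for a pet pelican")
          print(response.text())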

      • https://llm.datasette.io/en/stable/templates.html
        • Prompt templates Prompt templates can be created to reuse useful prompts with different input data.

      • https://llm.datasette.io/en/stable/logging.html
        • Logging to SQLite llm defaults to logging all prompts and responses to a SQLite database.

          You can find the location of that database using the llm logs path command

        • To avoid logging an individual prompt, pass --no-log or -n to the command

        • To turn logging by default off: llm logs off

      • https://llm.datasette.io/en/stable/related-tools.html
        • Related tools The following tools are designed to be used with LLM:

          • https://llm.datasette.io/en/stable/related-tools.html#strip-tags
            • strip-tags strip-tags is a command for stripping tags from HTML. This is useful when working with LLMs because HTML tags can use up a lot of your token budget.

              Here’s how to summarize the front page of the New York Times, by both stripping tags and filtering to just the elements with class="story-wrapper":

              curl -s https://www.nytimes.com/ \
                | strip-tags .story-wrapper \
                | llm -s 'summarize the news'
              
          • https://llm.datasette.io/en/stable/related-tools.html#ttok
            • ttok ttok is a command-line tool for counting OpenAI tokens. You can use it to check if input is likely to fit in the token limit for GPT 3.5 or GPT4

            • It can also truncate input down to a desired number of tokens

          • https://llm.datasette.io/en/stable/related-tools.html#symbex
            • Symbex Symbex is a tool for searching for symbols in Python codebases. It’s useful for extracting just the code for a specific problem and then piping that into LLM for explanation, refactoring or other tasks.

            • It can also be used to export symbols in a format that can be piped to llm embed-multi in order to create embeddings

            • Based on how Symbex is described, I think grep-ast might be able to do a similar job, but across any language supported by tree-sitter, and not just python:
              • https://github.com/paul-gauthier/grep-ast
                • grep-ast Grep source code files and see matching lines with useful context that shows how they fit into the code. See the loops, functions, methods, classes, etc. that contain all the matching lines. Get a sense of what's inside a matched class or function definition. You see relevant code from every layer of the abstract syntax tree, above and below the matches.

    • https://simonwillison.net/tags/llm/
      • https://simonwillison.net/2023/Apr/4/llm/
        • Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter

        • The llm CLI tool This is one new piece of software I’ve released in the past few weeks that I haven’t written about yet.

          I built the first version of llm, a command-line tool for running prompts against large language models (currently just ChatGPT and GPT-4), getting the results back on the command-line and also storing the prompt and response in a SQLite database.

      • https://simonwillison.net/2023/May/18/cli-tools-for-llms/
        • llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs I’ve been building out a small suite of command-line tools for working with ChatGPT, GPT-4 and potentially other language models in the future.

          The three tools I’ve built so far are:

          • llm — a command-line tool for sending prompts to the OpenAI APIs, outputting the response and logging the results to a SQLite database. I introduced that a few weeks ago.
          • ttok — a tool for counting and truncating text based on tokens
          • strip-tags — a tool for stripping HTML tags from text, and optionally outputting a subset of the page based on CSS selectors

          The idea with these tools is to support working with language model prompts using Unix pipes.

      • https://simonwillison.net/2023/Jun/18/symbex/
        • Symbex: search Python code for functions and classes, then pipe them into a LLM I just released a new Python CLI tool called Symbex. It’s a search tool, loosely inspired by ripgrep, which lets you search Python code for functions and classes by name or wildcard, then see just the source code of those matching entities.

      • https://simonwillison.net/2023/Jul/12/llm/
        • My LLM CLI tool now supports self-hosted language models via plugins LLM is my command-line utility and Python library for working with large language models such as GPT-4. I just released version 0.5 with a huge new feature: you can now install plugins that add support for additional models to the tool, including models that can run on your own hardware.

      • https://simonwillison.net/2023/Sep/4/llm-embeddings/
        • LLM is my Python library and command-line tool for working with language models. I just released LLM 0.9 with a new set of features that extend LLM to provide tools for working with embeddings.

        • An embedding model lets you take a string of text—a word, sentence, paragraph or even a whole document—and turn that into an array of floating point numbers called an embedding vector.

        • A model will always produce the same length of array—1,536 numbers for the OpenAI embedding model, 384 for all-MiniLM-L6-v2—but the array itself is inscrutable. What are you meant to do with it? The answer is that you can compare them. I like to think of an embedding vector as a location in 1,536-dimensional space. The distance between two vectors is a measure of how semantically similar they are in meaning, at least according to the model that produced them.

        • Things you can do with embeddings include:

          • Find related items. I use this on my TIL site to display related articles, as described in Storing and serving related documents with openai-to-sqlite and embeddings.
          • Build semantic search. As shown above, an embeddings-based search engine can find content relevant to the user’s search term even if none of the keywords match.
          • Implement retrieval augmented generation—the trick where you take a user’s question, find relevant documentation in your own corpus and use that to get an LLM to spit out an answer. More on that here.
          • Clustering: you can find clusters of nearby items and identify patterns in a corpus of documents.
          • Classification: calculate the embedding of a piece of text and compare it to pre-calculated “average” embeddings for different categories.
        • My goal with LLM is to provide a plugin-driven abstraction around a growing collection of language models. I want to make installing, using and comparing these models as easy as possible. The new release adds several command-line tools for working with embeddings, plus a new Python API for working with embeddings in your own code. It also adds support for installing additional embedding models via plugins.
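        • To make the "distance between vectors" idea above concrete, here is a minimal, library-agnostic sketch using numpy; the vectors below are random placeholders standing in for real embeddings, and any model that returns fixed-length float arrays would work the same way:

          ```python
          import numpy as np

          def cosine_similarity(a, b):
              """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
              a, b = np.asarray(a), np.asarray(b)
              return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

          # Placeholder 1,536-dimensional vectors standing in for embeddings of three texts.
          vec_a = np.random.rand(1536)                 # e.g. "a photo of a dog"
          vec_b = vec_a + 0.05 * np.random.rand(1536)  # a semantically close text
          vec_c = np.random.rand(1536)                 # an unrelated text

          print(cosine_similarity(vec_a, vec_b))  # expected to be higher
          print(cosine_similarity(vec_a, vec_c))  # expected to be lower
          ```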

      • https://simonwillison.net/2024/Mar/26/llm-cmd/
        • I just released a neat new plugin for my LLM command-line tool: llm-cmd. It lets you run a command to generate a further terminal command, review and edit that command, then hit Enter to execute it or Ctrl+C to cancel.

  • https://github.com/OpenDevin/OpenDevin
    • OpenDevin: Code Less, Make More

    • https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/
      • Introducing OpenDevin CodeAct 1.0, a new State-of-the-art in Coding Agents

      • today we introduce a new state-of-the-art coding agent, OpenDevin CodeAct 1.0, which achieves 21% solve rate on SWE-Bench Lite unassisted, a 17% relative improvement above the previous state-of-the-art posted by SWE-Agent. OpenDevin CodeAct 1.0 is now the default in OpenDevin v0.5

      • We also are working on a new simplified evaluation harness for testing coding agents, which we hope will be easy to use for agent developers and researchers, facilitating comprehensive evaluation and comparison. The current version of the harness is available here (tutorial, harness).

      • SWE-Bench is a great benchmark that tests the ability of coding agents to solve real-world github issues on a number of popular repositories. However, due in part to its realism the process of evaluating on SWE-Bench can initially seem daunting.

      • To help make it easy to perform this process in an efficient, stable, and reproducible manner, the OpenDevin team containerized the evaluation environment. This preparation involves setting up all necessary testbeds (codebases at various versions) and their respective conda environments in advance. For each task instance, we initiate a sandbox container where the testbed is pre-configured, ensuring a ready-to-use setup for the agent

      • This supports both SWE-Bench-Lite (a smaller benchmark of 300 issues that is more conducive to quick benchmarking) and SWE-Bench (the full dataset of 2,294 issues, work-in-progress). With our evaluation pipeline, we obtained a replicated SWE-agent resolve score of 17.3% (52 out of 300 test instances) on SWE-Bench-Lite using the released SWE-agent patch predictions, which differs by 2 from the originally reported 18.0% (54 out of 300).

    • OpenDevin/OpenDevin#742
      • Explore whether stack graphs may be useful in this tool

  • https://github.com/stitionai/devika
    • Devika - Agentic AI Software Engineer

    • Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.

  • https://github.com/geekan/MetaGPT
  • https://github.com/Pythagora-io/gpt-pilot
  • https://github.com/blarApp/code-base-agent
  • https://github.com/cpacker/MemGPT
    • MemGPT allows you to build LLM agents with self-editing memory

    • Building persistent LLM agents with long-term memory

  • https://github.com/daveshap/OpenAI_Agent_Swarm
    • Hierarchical Autonomous Agent Swarm (HAAS)

    • The Hierarchical Autonomous Agent Swarm (HAAS) is a groundbreaking initiative that leverages OpenAI's latest advancements in agent-based APIs to create a self-organizing and ethically governed ecosystem of AI agents. Drawing inspiration from the ACE Framework, HAAS introduces a novel approach to AI governance and operation, where a hierarchy of specialized agents, each with distinct roles and capabilities, collaborate to solve complex problems and perform a wide array of tasks.

      The HAAS is designed to be a self-expanding system where a core set of agents, governed by a Supreme Oversight Board (SOB), can design, provision, and manage an arbitrary number of sub-agents tailored to specific needs. This document serves as a comprehensive guide to the theoretical underpinnings, architectural design, and operational principles of the HAAS.

    • https://github.com/daveshap/OpenAI_Agent_Swarm/discussions
  • https://github.com/daveshap/ACE_Framework
    • ACE (Autonomous Cognitive Entities) - 100% local and open source autonomous agents

    • We will be committed to using 100% open source software (OSS) for this project. This is to ensure maximum accessibility and democratic access.

  • https://github.com/ShishirPatil/gorilla
    • Gorilla: An API store for LLMs

    • Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them! Hop on our Discord, or open a PR, or email us if you would like to have your API incorporated as well.

    • https://gorilla.cs.berkeley.edu/
      • Gorilla: Large Language Model Connected with Massive APIs

    • https://github.com/ShishirPatil/gorilla/tree/main/openfunctions
      • Gorilla Openfunctions

      • Gorilla OpenFunctions extends the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context.

      • Comes with Parallel Function Calling!

      • OpenFunctions is compatible with OpenAI Functions

      • https://gorilla.cs.berkeley.edu/blogs/4_open_functions.html
      • OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context. Imagine if the LLM could fill in parameters for a variety of services, ranging from Instagram and DoorDash to tools like Google Calendar and Stripe. Even users who are less familiar with API calling procedures and programming can use the model to generate API calls to the desired function. Gorilla OpenFunctions is an LLM that we train using a curated set of API documentation, and Question-Answer pairs generated from the API documentation. We have continued to expand on the Gorilla Paradigm and sought to improve the quality and accuracy of valid function calling generation. This blog is about developing an open-source alternative for function calling similar to features seen in proprietary models, in particular, function calling in OpenAI's GPT-4. Our solution is based on the Gorilla recipe, and with a model of just 7B parameters, its accuracy is, surprisingly, comparable to GPT-4.

    • https://github.com/gorilla-llm/gorilla-cli
      • LLMs for your CLI

      • Gorilla CLI Gorilla CLI powers your command-line interactions with a user-centric tool. Simply state your objective, and Gorilla CLI will generate potential commands for execution. Gorilla today supports ~1500 APIs, including Kubernetes, AWS, GCP, Azure, GitHub, Conda, Curl, Sed, and many more. No more recalling intricate CLI arguments! 🦍
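    • Since OpenFunctions (above) advertises OpenAI Functions compatibility, a rough sketch of calling it through the OpenAI Python client might look like the following; the base_url, model name and example function schema are placeholders/assumptions, so check the OpenFunctions docs for the actual hosted endpoint and model names:

      ```python
      from openai import OpenAI

      # Placeholder endpoint and key: substitute the hosted/self-hosted OpenFunctions URL.
      client = OpenAI(base_url="https://<gorilla-openfunctions-endpoint>/v1", api_key="EMPTY")

      # A hypothetical function schema in the standard OpenAI "functions" format.
      functions = [{
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"],
          },
      }]

      response = client.chat.completions.create(
          model="gorilla-openfunctions-v2",  # assumption: use whichever model name is published
          messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
          functions=functions,
      )
      print(response.choices[0].message)
      ```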

Code Generation / Execution

Unsorted

  • TODO

Code Leaderboards / Benchmarks

  • https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
    • Big Code Models Leaderboard

    • Inspired by the 🤗 Open LLM Leaderboard and 🤗 Open LLM-Perf Leaderboard 🏋️, we compare the performance of base multilingual code generation models on the HumanEval benchmark and MultiPL-E. We also measure throughput and provide information about the models. We only compare open pre-trained multilingual code models, that people can start from as base models for their own training.

  • https://evalplus.github.io/leaderboard.html
    • EvalPlus Leaderboard

    • EvalPlus evaluates AI Coders with rigorous tests.

    • https://github.com/evalplus/evalplus
      • EvalPlus

      • EvalPlus is a rigorous evaluation framework for LLM4Code, with:

        • ✨ HumanEval+: 80x more tests than the original HumanEval!
        • ✨ MBPP+: 35x more tests than the original MBPP!
        • ✨ Evaluation framework: our packages/images/tools can easily and safely evaluate LLMs on above benchmarks.
      • https://evalplus.github.io/
        • Benchmarks @ EvalPlus The EvalPlus team aims to build high-quality benchmarks for evaluating LLMs for code. Below are the benchmarks we have been building so far

        • HumanEval+ & MBPP+ HumanEval and MBPP initially came with limited tests. EvalPlus made HumanEval+ & MBPP+ by extending the tests by 80x/35x for rigorous eval.

        • RepoQA: Long-Context Code Understanding Repository understanding is crucial for intelligent code agents. At RepoQA, we are designing evaluators of long-context code understanding.

          • https://evalplus.github.io/repoqa.html
            • RepoQA The First Benchmark for Long-Context Code Understanding

            • The goal of RepoQA is to create a series of long-context code understanding tasks to challenge chat/instruction models for code:

              • Multi-Lingual: RepoQA covers 50 high-quality repositories from 5 programming languages.
              • Application-Driven: While "Needle in the Code" by CodeQwen uses a synthetic task to examine the vulnerable parts over the LLM's long context, RepoQA focuses on tasks that can reflect real-world uses.
              • 🔍 Searching Needle Function (🔗): Search a function given its description.
              • 🚧 RepoQA is still under development... More types of QA tasks are coming soon... Stay tuned!

AutoCoder

  • https://github.com/bin123apple/AutoCoder
    • AutoCoder

    • We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024): 90.9% vs. 90.2%.

      Additionally, compared to previous open-source models, AutoCoder offers a new feature: it can automatically install the required packages and attempt to run the code until it deems there are no issues, whenever the user wishes to execute the code.

    • https://arxiv.org/abs/2405.14906
      • AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

      • We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark test (90.9% vs. 90.2%). In addition, AutoCoder offers a more versatile code interpreter compared to GPT-4 Turbo and GPT-4o. Its code interpreter can install external packages instead of being limited to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term AIEV-Instruct (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, AIEV-Instruct reduces dependence on proprietary large models and provides an execution-validated code dataset.

OpenCodeInterpreter

  • https://github.com/OpenCodeInterpreter/OpenCodeInterpreter
    • OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

    • OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. It significantly enhances code generation capabilities by integrating execution and iterative refinement functionalities.

    • https://opencodeinterpreter.github.io/
    • https://arxiv.org/abs/2402.14658
      • OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

      • The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2) and further elevates to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter bridges the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

OpenInterpreter

Vision / Multimodal

OpenAI

  • https://platform.openai.com/docs/guides/vision
    • Vision

    • Learn how to use GPT-4 to understand images

    • GPT-4 with Vision, sometimes referred to as GPT-4V or gpt-4-vision-preview in the API, allows the model to take in images and answer questions about them.
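    • A minimal sketch of the documented request shape (the image URL and prompt are placeholders, and the model name may have changed since this guide was written):

      ```python
      from openai import OpenAI

      client = OpenAI()  # expects OPENAI_API_KEY in the environment

      # Ask the vision-capable model a question about an image referenced by URL.
      response = client.chat.completions.create(
          model="gpt-4-vision-preview",  # the vision model name referenced in the guide
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": "What is in this image?"},
                  {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
              ],
          }],
          max_tokens=300,
      )
      print(response.choices[0].message.content)
      ```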

LLaVA / etc

  • https://llava-vl.github.io/
    • LLaVA: Large Language and Vision Assistant

    • LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.

    • LLaVA-1.5 achieves SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods that use billion-scale data.

    • Demo: https://llava.hliu.cc/
    • https://github.com/haotian-liu/LLaVA
      • LLaVA: Large Language and Vision Assistant

      • Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

      • https://github.com/haotian-liu/LLaVA#release
        • The following are just a couple of notes that jumped out at me:
        • 11/10 LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). Project Page Demo Code Paper

        • 11/2 LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. Project Page Demo Code Paper

        • 10/26 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts, script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.

        • 10/12 LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support!

        • 10/5 LLaVA-1.5 is out! Achieving SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods like Qwen-VL-Chat that use billion-scale data. Check out the technical report, and explore the demo! Models are available in Model Zoo.

        • 6/11 We released the preview for the most requested feature: DeepSpeed and LoRA support! Please see documentations here.

        • 6/1 We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical domain large language and vision models with GPT-4 level capabilities. Checkout the paper and page.

    • https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md
  • https://github.com/LLaVA-VL/LLaVA-Plus-Codebase
  • https://github.com/LLaVA-VL/LLaVA-NeXT
    • LLaVA-NeXT: Open Large Multimodal Models

    • https://llava-vl.github.io/blog/2024-01-30-llava-next/
      • LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

      • Today, we are thrilled to present LLaVA-NeXT, with improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks.

      • Compared with LLaVA-1.5, LLaVA-NeXT has several improvements:

        • Increasing the input image resolution to 4x more pixels. This allows it to grasp more visual details. It supports three aspect ratios, up to 672x672, 336x1344, 1344x336 resolution.
        • Better visual reasoning and OCR capability with an improved visual instruction tuning data mixture.
        • Better visual conversation for more scenarios, covering different applications. Better world knowledge and logical reasoning.
        • Efficient deployment and inference with SGLang.
    • https://llava-vl.github.io/blog/2024-04-30-llava-next-video/
      • LLaVA-NeXT: A Strong Zero-shot Video Understanding Model

      • In today’s exploration, we delve into the performance of LLaVA-NeXT within the realm of video understanding tasks. We reveal that LLaVA-NeXT surprisingly has strong performance in understanding video content.

      • SoTA Performance! Without seeing any video data, LLaVA-Next demonstrates strong zero-shot modality transfer ability, outperforming all the existing open-source LMMs (e.g., LLaMA-VID) that have been specifically trained for videos. Compared with proprietary ones, it achieves comparable performance with Gemini Pro on NextQA and ActivityNet-QA.

    • https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/
      • LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild

  • https://github.com/microsoft/LLaVA-Med
    • LLaVA-Med: Large Language and Vision Assistant for BioMedicine

    • Visual instruction tuning towards building large language and vision models with GPT-4 level capabilities in the biomedicine space.

Unsorted

  • https://github.com/tldraw/draw-a-ui
  • https://github.com/jordansinger/build-it-figma-ai
    • Draw and sketch UI in Figma and FigJam with this widget. Inspired by SawyerHood/draw-a-ui and tldraw/draw-a-ui

  • https://github.com/jordansinger/UIDraw
  • https://github.com/microsoft/SoM
    • Set-of-Mark Prompting for LMMs

    • Set-of-Mark Visual Prompting for GPT-4V

    • We present Set-of-Mark (SoM) prompting, simply overlaying a number of spatial and speakable marks on the images, to unleash the visual grounding abilities in the strongest LMM -- GPT-4V. Let's use visual prompting for vision!

    • https://arxiv.org/abs/2310.11441
      • Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

      • We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V. As illustrated in Fig. 1 (right), we employ off-the-shelf interactive segmentation models, such as SEEM/SAM, to partition an image into regions at different levels of granularity, and overlay these regions with a set of marks e.g., alphanumerics, masks, boxes. Using the marked image as input, GPT-4V can answer the questions that require visual grounding. We perform a comprehensive empirical study to validate the effectiveness of SoM on a wide range of fine-grained vision and multimodal tasks. For example, our experiments show that GPT-4V with SoM in zero-shot setting outperforms the state-of-the-art fully-finetuned referring expression comprehension and segmentation model on RefCOCOg. Code for SoM prompting is made public at: this https URL.

    • https://github.com/facebookresearch/segment-anything
      • Segment Anything

      • The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

      • The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.

    • https://github.com/UX-Decoder/Semantic-SAM
      • Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

      • In this work, we introduce Semantic-SAM, a universal image segmentation model that enables segmenting and recognizing anything at any desired granularity. We have trained on the whole SA-1B dataset and our model can reproduce SAM and go beyond it.

      • Segment everything for one image. We output controllable granularity masks from semantic, instance to part level when using different granularity prompts.

    • https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once
      • SEEM: Segment Everything Everywhere All at Once

      • [NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

      • We introduce SEEM that can Segment Everything Everywhere with Multi-modal prompts all at once. SEEM allows users to easily segment an image using prompts of different types including visual prompts (points, marks, boxes, scribbles and image segments) and language prompts (text and audio), etc. It can also work with any combination of prompts or generalize to custom prompts!

    • https://github.com/IDEA-Research/GroundingDINO
      • Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

    • https://github.com/IDEA-Research/OpenSeeD
      • [ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

    • https://github.com/IDEA-Research/MaskDINO
      • [CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

    • https://github.com/facebookresearch/VLPart
      • [ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation

      • Object detection has been expanded from a limited number of categories to open vocabulary. Moving forward, a complete intelligent vision system requires understanding more fine-grained object descriptions, object parts. In this work, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.
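    • As a concrete example of the promptable-segmentation interface used by several of the models linked above, here is a minimal hedged sketch of the Segment Anything (SAM) predictor API from its README; the checkpoint filename, dummy image and point prompt are placeholders:

      ```python
      import numpy as np
      from segment_anything import SamPredictor, sam_model_registry

      # Load a SAM checkpoint (variant and path are placeholders; see the repo's model zoo).
      sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
      predictor = SamPredictor(sam)

      # `image` should be an HxWx3 uint8 RGB array (e.g. loaded via PIL or OpenCV).
      image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder image
      predictor.set_image(image)

      # Prompt with one foreground point and get candidate masks plus confidence scores.
      masks, scores, logits = predictor.predict(
          point_coords=np.array([[320, 240]]),
          point_labels=np.array([1]),  # 1 = foreground, 0 = background
          multimask_output=True,
      )
      print(masks.shape, scores)
      ```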

  • https://github.com/OthersideAI/self-operating-computer
    • Self-Operating Computer Framework A framework to enable multimodal models to operate a computer.

      Using the same inputs and outputs of a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.

  • https://github.com/ddupont808/GPT-4V-Act
    • GPT-4V-Act: Chromium Copilot

    • AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

    • GPT-4V-Act serves as an eloquent multimodal AI assistant that harmoniously combines GPT-4V(ision) with a web browser. It's designed to mirror the input and output of a human operator—primarily screen feedback and low-level mouse/keyboard interaction. The objective is to foster a smooth transition between human-computer operations, facilitating the creation of tools that considerably boost the accessibility of any user interface (UI), aid workflow automation, and enable automated UI testing.

    • GPT-4V-Act leverages both GPT-4V(ision) and Set-of-Mark Prompting, together with a tailored auto-labeler. This auto-labeler assigns a unique numerical ID to each interactable UI element.

      By incorporating a task and a screenshot as input, GPT-4V-Act can deduce the subsequent action required to accomplish a task. For mouse/keyboard output, it can refer to the numerical labels for exact pixel coordinates.

  • https://github.com/Jiayi-Pan/GPT-V-on-Web
    • 👀🧠 GPT-4 Vision x 💪⌨️ Vimium = Autonomous Web Agent

    • This project leverages GPT4V to create an autonomous / interactive web agent. The action space is discretized by Vimium.

  • https://github.com/bdekraker/WebcamGPT-Vision
    • Lightweight GPT-4 Vision processing over the Webcam

    • WebcamGPT-Vision is a lightweight web application that enables users to process images from their webcam using OpenAI's GPT-4 Vision API. The application captures images from the user's webcam, sends them to the GPT-4 Vision API, and displays the descriptive results.

Vector Databases/Search, Similarity Search, Clustering, etc

  • TODO: add more things here

Faiss

  • https://github.com/facebookresearch/faiss
    • Faiss

    • A library for efficient similarity search and clustering of dense vectors.

    • Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed primarily at Meta's Fundamental AI Research group.

    • https://faiss.ai/
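    • A minimal sketch of the typical usage pattern (the vectors here are random placeholders; a real application would index embeddings produced by a model):

      ```python
      import numpy as np
      import faiss  # pip install faiss-cpu

      d = 384  # dimensionality of the vectors being indexed
      database_vectors = np.random.rand(10_000, d).astype("float32")  # placeholder corpus
      query_vectors = np.random.rand(5, d).astype("float32")          # placeholder queries

      index = faiss.IndexFlatL2(d)  # exact L2 search; other index types trade accuracy for speed
      index.add(database_vectors)

      k = 3
      distances, ids = index.search(query_vectors, k)  # k nearest neighbours per query
      print(ids)        # row i = indices of the k closest database vectors to query i
      print(distances)
      ```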

Benchmarks / Leaderboards

  • See also:
  • https://chat.lmsys.org/
    • LMSYS Chatbot Arena Leaderboard

  • https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
    • Open LLM Leaderboard

  • https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard
  • https://github.com/EleutherAI/lm-evaluation-harness
    • Language Model Evaluation Harness

    • A framework for few-shot evaluation of language models.

  • https://github.com/openai/evals
    • OpenAI Evals

    • Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

    • Evals provide a framework for evaluating large language models (LLMs) or systems built using LLMs. We offer an existing registry of evals to test different dimensions of OpenAI models and the ability to write your own custom evals for use cases you care about. You can also use your data to build private evals which represent the common LLMs patterns in your workflow without exposing any of that data publicly.

      If you are building with LLMs, creating high quality evals is one of the most impactful things you can do. Without evals, it can be very difficult and time intensive to understand how different model versions might affect your use case.

  • https://github.com/openai/simple-evals
    • This repository contains a lightweight library for evaluating language models. We are open sourcing it so we can be transparent about the accuracy numbers we're publishing alongside our latest models (starting with gpt-4-turbo-2024-04-09). Evals are sensitive to prompting, and there's significant variation in the formulations used in recent publications and libraries. Some use few-shot prompts or role playing prompts ("You are an expert software programmer..."). These approaches are carryovers from evaluating base models (rather than instruction/chat-tuned models) and from models that were worse at following instructions.

      For this library, we are emphasizing the zero-shot, chain-of-thought setting, with simple instructions like "Solve the following multiple choice problem". We believe that this prompting technique is a better reflection of the models' performance in realistic usage.

Prompts / Prompt Engineering / etc

  • https://github.com/mshumer/gpt-prompt-engineer
    • gpt-prompt-engineer Prompt engineering is kind of like alchemy. There's no clear way to predict what will work best. It's all about experimenting until you find the right prompt. gpt-prompt-engineer is a tool that takes this experimentation to a whole new level.

      Simply input a description of your task and some test cases, and the system will generate, test, and rank a multitude of prompts to find the ones that perform the best.

    • Prompt Testing: The real magic happens after the generation. The system tests each prompt against all the test cases, comparing their performance and ranking them using an ELO rating system.

    • ELO Rating System: Each prompt starts with an ELO rating of 1200. As they compete against each other in generating responses to the test cases, their ELO ratings change based on their performance. This way, you can easily see which prompts are the most effective.

      • https://en.wikipedia.org/wiki/Elo_rating_system
        • The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess.

        • The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.

        • A player's Elo rating is a number which may change depending on the outcome of rated games played. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. If the higher-rated player wins, then only a few rating points will be taken from the lower-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred. The lower-rated player will also gain a few points from the higher rated player in the event of a draw. This means that this rating system is self-correcting. Players whose ratings are too low or too high should, in the long run, do better or worse correspondingly than the rating system predicts and thus gain or lose rating points until the ratings reflect their true playing strength.
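      • The update rule described above is straightforward to implement; here is a minimal sketch (the K-factor of 32 is a common default, not something specified by gpt-prompt-engineer):

        ```python
        def elo_expected(rating_a, rating_b):
            """Expected score of A against B (1 = win, 0.5 = draw, 0 = loss)."""
            return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

        def elo_update(rating_a, rating_b, score_a, k=32):
            """Return both players' new ratings after one game, given A's actual score."""
            expected_a = elo_expected(rating_a, rating_b)
            new_a = rating_a + k * (score_a - expected_a)
            new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
            return new_a, new_b

        # A 100-point gap gives the stronger side roughly a 64% expected score:
        print(round(elo_expected(1300, 1200), 2))   # ~0.64
        # Two prompts both start at 1200; the first one wins a head-to-head comparison:
        print(elo_update(1200, 1200, score_a=1.0))  # (1216.0, 1184.0)
        ```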

  • https://github.com/dair-ai/Prompt-Engineering-Guide
  • https://github.com/daveshap/ChatGPT_Custom_Instructions
    • Repo of custom instructions that you can use for ChatGPT

  • https://github.com/daveshap/PTSD_prompts
    • GPT based PTSD experiments - USE AT OWN RISK - EXPERIMENTAL ONLY

  • https://github.com/yzfly/Awesome-Multimodal-Prompts
    • Awesome Multimodal Prompts

    • Prompts of GPT-4V & DALL-E3 to fully utilize the multi-modal ability. GPT4V Prompts, DALL-E3 Prompts.

  • https://arxiv.org/abs/2402.03620
    • Self-Discover: Large Language Models Self-Compose Reasoning Structures (submitted on 6 Feb 2024)

    • We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

Other Useful Tools / Libraries / etc

Unsorted

  • See Also
  • https://github.com/pypa/pipx
    • pipx — Install and Run Python Applications in Isolated Environments

    • https://pipx.pypa.io/stable/
      • pipx is a tool to help you install and run end-user applications written in Python. It's roughly similar to macOS's brew, JavaScript's npx, and Linux's apt.

        It's closely related to pip. In fact, it uses pip, but is focused on installing and managing Python packages that can be run from the command line directly as applications.

  • https://pipedream.com/requestbin
    • Request Bin

    • Inspect webhooks and HTTP requests Get a URL to collect HTTP or webhook requests and inspect them in a human-friendly way. Optionally connect APIs, run code and return a custom response on each request.

  • https://github.com/googleapis/release-please
    • Release Please Release Please automates CHANGELOG generation, the creation of GitHub releases, and version bumps for your projects.

      It does so by parsing your git history, looking for Conventional Commit messages, and creating release PRs.

      It does not handle publication to package managers or handle complex branch management.

    • https://github.com/google-github-actions/release-please-action
      • automated releases based on conventional commits

      • Release Please Action Automate releases with Conventional Commit Messages.

    • https://www.conventionalcommits.org/
  • https://github.com/winstonjs/winston
  • https://github.com/tldraw/tldraw
    • a very good whiteboard

    • tldraw is a collaborative digital whiteboard available at tldraw.com. Its editor, user interface, and other underlying libraries are open source and available in this repository. They are also distributed on npm. You can use tldraw to create a drop-in whiteboard for your product or as the foundation on which to build your own infinite canvas applications.

    • https://tldraw.dev/
      • You can use the Tldraw React component to embed a fully featured and extendable whiteboard in your app.

      • For multiplayer whiteboards, you can plug the component into the collaboration backend of your choice.

      • You can use the Editor API to create, update, and delete shapes, control the camera—or do just about anything else. You can extend tldraw with your own custom shapes and custom tools. You can use our user interface overrides to change the contents of menus and toolbars, or else hide the UI and replace it with your own.

      • If you want to go even deeper, you can use the TldrawEditor component as a more minimal engine without the default tldraw shapes or user interface.

  • JavaScript (full text) Search Libraries
    • https://www.npmjs.com/search?q=full%20text%20search
    • https://byby.dev/js-search-libraries
    • https://github.com/nextapps-de/flexsearch
      • Next-Generation full text search library for Browser and Node.js

      • Web's fastest and most memory-flexible full-text search library with zero dependencies.

      • When it comes to raw search speed FlexSearch outperforms every single searching library out there and also provides flexible search capabilities like multi-field search, phonetic transformations or partial matching.

        Depending on the options used, it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called "contextual index" based on a pre-scored lexical dictionary architecture which actually performs queries up to 1,000,000 times faster compared to other libraries. FlexSearch also provides a non-blocking asynchronous processing model as well as web workers to perform any updates or queries on the index in parallel through dedicated balanced threads.

      • https://github.com/nextapps-de/flexsearch#consumption
        • Memory Consumption

      • https://nextapps-de.github.io/flexsearch/bench/
        • Benchmark of Full-Text-Search Libraries (Stress Test)

      • https://nextapps-de.github.io/flexsearch/bench/match.html
        • Relevance Scoring Comparison

      • https://github.com/angeloashmore/react-use-flexsearch
        • React hook to search a FlexSearch index

        • The useFlexSearch hook takes your search query, index, and store and returns results as an array. Searches are memoized to ensure efficient searching.

    • https://github.com/krisk/fuse
      • Lightweight fuzzy-search, in JavaScript

      • Fuse.js is a lightweight fuzzy-search, in JavaScript, with zero dependencies.

      • https://www.fusejs.io/
    • https://github.com/weixsong/elasticlunr.js
      • Based on lunr.js, but more flexible and customized.

      • Elasticlunr.js Elasticlunr.js is a lightweight full-text search engine developed in JavaScript for browser search and offline search. Elasticlunr.js is developed based on Lunr.js, but more flexible than lunr.js. Elasticlunr.js provides Query-Time boosting, field search, more rational scoring/ranking methodology, fast computation speed and so on. Elasticlunr.js is a bit like Solr, but much smaller and not as bright, but also provides flexible configuration, query-time boosting, field search and other features.

      • Contributor Welcome!!! As I'm now focusing on a new domain, I hope that someone interested in this project could help maintain this repository.

      • http://elasticlunr.com/
    • https://github.com/olivernn/lunr.js
      • Lunr.js A bit like Solr, but much smaller and not as bright

      • Lunr.js is a small, full-text search library for use in the browser. It indexes JSON documents and provides a simple search interface for retrieving documents that best match text queries.

      • For web applications with all their data already sitting in the client, it makes sense to be able to search that data on the client too. It saves adding extra, compacted services on the server. A local search index will be quicker, there is no network overhead, and will remain available and usable even without a network connection.

      • https://lunrjs.com/
    • https://github.com/apache/solr
      • Apache Solr

      • Solr is the popular, blazing fast open source search platform for all your enterprise, e-commerce, and analytics needs, built on Apache Lucene.

Node-based UI's, Graph Execution, Flow Based Programming, etc

  • https://github.com/xyflow/awesome-node-based-uis
    • A curated list with resources about node-based UIs

  • https://github.com/xyflow/xyflow
  • https://github.com/retejs/rete
    • JavaScript framework for visual programming

    • Rete.js is a framework for creating visual interfaces and workflows. It provides out-of-the-box solutions for visualization using various libraries and frameworks, as well as solutions for processing graphs based on dataflow and control flow approaches.

    • https://retejs.org/
      • A tailorable TypeScript-first framework for creating processing-oriented node-based editors

      • https://retejs.org/examples
        • https://retejs.org/examples/processing/dataflow
          • Data Flow

            This example showcases a data processing pipeline using rete-engine, where data flows from left to right through nodes. Each node features a data method, which receives arrays of incoming data from their respective input sockets and delivers an object containing data corresponding to the output sockets. To initiate their execution, you can make use of the engine.fetch method by specifying the identifier of the target node. Consequently, the engine will execute all predecessors recursively, extracting their output data and delivering it to the specified node.

        • https://retejs.org/examples/processing/control-flow
          • Control Flow

            This example showcases an executing of schema via control flow using rete-engine, where each node dynamically decides which of its outgoing nodes will receive control. Each node features an execute method that takes an input port key as a control source, and a function for conveying control to outgoing nodes through a defined output port. To initiate the execution of the flow, you can use engine.execute method, specifying the identifier of the starting node. Consequently, the outgoing nodes will be executed sequentially, starting from the designated node.

        • https://retejs.org/examples/processing/hybrid-engine
          • Hybrid Engine

            This example shows how rete-engine allows for the simultaneous integration of both dataflow and control flow. Consequently, certain nodes serve as data sources, others manage the flow, and a third set incorporates both of these approaches.

        • https://retejs.org/examples/modules
          • This example showcases a schema reusability technique, where processing is carried out using DataflowEngine. This is accomplished by creating a dedicated Module node that loads a nested schema containing Input and Output nodes, subsequently generating corresponding sockets. As a result, the module node initializes the engine, feeds it with input data, executes it, and retrieves the output data.

        • https://retejs.org/examples/scopes
          • Scopes

            The structures shown in this example may also be referred to as subgraphs or nested nodes. This functionality is achieved using the advanced rete-scopes-plugin plugin. Changing a node's parent is easy: simply long-press the node and move it over the new parent node.

        • https://retejs.org/examples/selectable-connections
          • Selectable connections The editor doesn't offer a built-in connection selection feature. However, if you're using BidirectFlow and can't delete connections from UI, or you need to select connections for other purposes, you can create a custom connection and sync it with AreaExtensions.selector

        • https://retejs.org/examples/reroute
          • Reroute This particular example shows the usage of a plugin designed for user-controlled connection rerouting. Users can insert rerouting points by clicking on a connection or remove them by right-clicking. These points can be dragged or selected by users (similarly to nodes) to move multiple points at once.

        • https://retejs.org/examples/codegen
      • https://retejs.org/docs
        • Visualization: you can choose React.js, Vue.js, Angular or Svelte to visualize nodes, sockets, controls, and connections. These visual components can be tailored to your specific needs by creating custom components for each framework, and they can all coexist in a single editor.

        • Processing: the framework offers various types of engines that enable processing diagrams based on their nature, including dataflow and control flow. These types can be combined within the same graph.

      • https://retejs.org/docs/development/rete-kit
        • The purpose of this tool is to improve efficiency when developing plugins or projects using this framework.

      • https://retejs.org/docs/api/rete-engine
        • DataflowEngine is a plugin that integrates Dataflow with NodeEditor making it easy to use. Additionally, it provides a cache for the data of each node in order to avoid recurring calculations.

        • ControlFlowEngine is a plugin that integrates ControlFlow with NodeEditor making it easy to use

  • https://github.com/graphology/graphology
    • Graphology graphology is a robust & multipurpose Graph object for JavaScript and TypeScript.

      It aims at supporting various kinds of graphs with the same unified interface.

      A graphology graph can therefore be directed, undirected or mixed, allow self-loops or not, and can be simple or support parallel edges.

      Along with this Graph object, one will also find a comprehensive standard library full of graph theory algorithms and common utilities such as graph generators, layouts, traversals etc.

      Finally, graphology graphs are able to emit a wide variety of events, which makes them ideal to build interactive renderers for the browser.

    • https://graphology.github.io/
  • https://github.com/cytoscape/cytoscape.js
  • https://github.com/jagenjo/litegraph.js
    • A graph node engine and editor written in Javascript similar to PD or UDK Blueprints, comes with its own editor in HTML5 Canvas2D. The engine can run client side or server side using Node. It allows graphs to be exported as JSON so they can be included in applications independently.

  • https://github.com/noflo/noflo
    • NoFlo: Flow-based programming for JavaScript NoFlo is an implementation of flow-based programming for JavaScript running on both Node.js and the browser. From WikiPedia:

      In computer science, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.

    • NoFlo itself is just a library for implementing flow-based programs in JavaScript. There is an ecosystem of tools around NoFlo and the fbp protocol that make it more powerful. Here are some of them:

      • Flowhub -- browser-based visual programming IDE for NoFlo and other flow-based systems
      • noflo-nodejs -- command-line interface for running NoFlo programs on Node.js
      • noflo-browser-app -- template for building NoFlo programs for the web
      • noflo-assembly -- industrial approach for designing NoFlo programs
      • fbp-spec -- data-driven tests for NoFlo and other FBP environments
      • flowtrace -- tool for retroactive debugging of NoFlo programs. Supports visual replay with Flowhub

      See also the list of reusable NoFlo modules on NPM.

    • https://noflojs.org/
    • https://flowhub.io/ide/
      • Flowhub IDE is a tool for building full-stack applications in a visual way. With the ecosystem of flow-based programming environments, you can use Flowhub to create anything from distributed data processing applications to internet-connected artworks.

    • https://flowbased.github.io/fbp-protocol/
      • FBP Network Protocol The Flow-Based Programming network protocol (FBP protocol) has been designed primarily for flow-based programming interfaces like the Flowhub to communicate with various FBP runtimes. However, it can also be utilized for communication between different runtimes, for example server-to-server or server-to-microcontroller.

      • https://github.com/flowbased/fbp
        • FBP flow definition language parser The fbp library provides a parser for a domain-specific language for flow-based-programming (FBP), used for defining graphs for FBP programming environments like NoFlo, MicroFlo and MsgFlo.

    • https://en.wikipedia.org/wiki/Flow-based_programming
      • In computer programming, flow-based programming (FBP) is a programming paradigm that defines applications as networks of black box processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented.

      • https://en.wikipedia.org/wiki/Component-based_software_engineering
        • Component-based software engineering (CBSE), also called component-based development (CBD), is a style of software engineering that aims to build software out of loosely-coupled, modular components. It emphasizes the separation of concerns among different parts of a software system.

  • https://nodered.org/
    • Node-RED is a programming tool for wiring together hardware devices, APIs and online services in new and interesting ways.

      It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette that can be deployed to its runtime in a single-click.

    • https://github.com/node-red/node-red
      • Low-code programming for event-driven applications

    • https://nodered.org/docs/api/modules/v/1.3/@node-red_runtime.html
      • @node-red/runtime This module provides the core runtime component of Node-RED. It does not include the Node-RED editor. All interaction with this module is done using the api provided.

      • https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/index.js#L125-L234
        • var redNodes = require("./nodes");
        • function start() {
          • Start the runtime

          • return redNodes.load().then(function() {
          • return redNodes.loadContextsPlugin().then(function () {
              redNodes.loadFlows().then(() => { redNodes.startFlows() }).catch(function(err) {});
              started = true;
            });
      • https://github.com/node-red/node-red/blob/master/packages/node_modules/%40node-red/runtime/lib/nodes/index.js#L198-L267
        • var registry = require("@node-red/registry");
          var flows = require("../flows");
          var context = require("./context");
        • module.exports = {
              // Lifecycle
              init: init,
              load: registry.load,
          
              // ..snip..
          
              // Flow handling
              loadFlows:  flows.load,
              startFlows: flows.startFlows,
              stopFlows:  flows.stopFlows,
              setFlows:   flows.setFlows,
              getFlows:   flows.getFlows,
          
              addFlow:     flows.addFlow,
              getFlow:     flows.getFlow,
              updateFlow:  flows.updateFlow,
              removeFlow:  flows.removeFlow,
          
              // ..snip..
          
              // Contexts
              loadContextsPlugin: context.load,
              closeContextsPlugin: context.close,
              listContextStores: context.listStores,
          };

Unsorted

  • https://github.com/google-gemini/cookbook
    • Gemini API Cookbook

    • A collection of guides and examples for the Gemini API.

    • This is a collection of guides and examples for the Gemini API, including quickstart tutorials for writing prompts and using different features of the API, and examples of things you can build.

    • https://ai.google.dev/gemini-api/docs
      • Get started with Gemini API
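    • A minimal hedged sketch of the Python quickstart pattern from around this time (the API key and model name are placeholders, and both the SDK and model names evolve quickly):

      ```python
      import google.generativeai as genai  # pip install google-generativeai

      genai.configure(api_key="YOUR_GEMINI_API_KEY")  # placeholder key

      model = genai.GenerativeModel("gemini-pro")  # substitute a current model name if needed
      response = model.generate_content("Write a one-line summary of what an embedding is.")
      print(response.text)
      ```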

  • https://github.com/NaturalNode/natural/
  • https://github.com/pytorch/torchtune
  • https://llama.meta.com/llama3/
    • Meta Llama 3 Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications

    • https://github.com/meta-llama/llama3
      • Meta Llama 3

      • The official Meta Llama 3 GitHub site

      • We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.

        This release includes model weights and starting code for pre-trained and instruction-tuned Llama 3 language models — including sizes of 8B to 70B parameters.

        This repository is a minimal example of loading Llama 3 models and running inference. For more detailed examples, see llama-recipes.

        • https://github.com/meta-llama/llama-recipes
          • Llama Recipes: Examples to get started using the Llama models from Meta

          • Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.

  • https://zapier.com/blog/train-chatgpt-to-write-like-you/
    • How to train ChatGPT to write like you

  • https://github.com/EleutherAI/gpt-neox
    • An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

    • GPT-NeoX This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training. This library is in widespread use in academic, industry, and government labs, including by researchers at Oak Ridge National Lab, CarperAI, Stability AI, Together.ai, Korea University, Carnegie Mellon University, and the University of Tokyo among others. Uniquely among similar libraries GPT-NeoX supports a wide variety of systems and hardwares, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.

      If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend you use the Hugging Face transformers library instead which supports GPT-NeoX models.

    • https://github.com/EleutherAI/gpt-neox#why-gpt-neox
      • Why GPT-NeoX?

        GPT-NeoX leverages many of the same features and technologies as the popular Megatron-DeepSpeed library but with substantially increased usability and novel optimizations. Major features include:

        • Distributed training with ZeRO and 3D parallelism
        • A wide variety of systems and hardwares, including launching via Slurm, MPI, and the IBM Job Step Manager, and has been run at scale on AWS, CoreWeave, ORNL Summit, ORNL Frontier, LUMI, and others.
        • Cutting edge architectural innovations including rotary and alibi positional embeddings, parallel feedforward attention layers, and flash attention.
        • Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 & 2
        • Curriculum Learning
        • Easy connections with the open source ecosystem, including Hugging Face's tokenizers and transformers libraries, logging via WandB, and evaluation via our Language Model Evaluation Harness.
  • https://microsoft.github.io/promptflow/
    • Prompt flow Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

      With prompt flow, you will be able to:

      • Create flows that link LLMs, prompts, Python code and other tools together in an executable workflow.
      • Debug and iterate your flows, especially the interaction with LLMs with ease.
      • Evaluate your flows, calculate quality and performance metrics with larger datasets.
      • Integrate the testing and evaluation into your CI/CD system to ensure quality of your flow.
      • Deploy your flows to the serving platform you choose or integrate into your app’s code base easily.
      • (Optional but highly recommended) Collaborate with your team by leveraging the cloud version of Prompt flow in Azure AI.
    • https://microsoft.github.io/promptflow/concepts/concept-flows.html
      • Flows

      • While how LLMs work may be elusive to many developers, how LLM apps work is not - they essentially involve a series of calls to external services such as LLMs/databases/search engines, or intermediate data processing, all glued together.

    • https://microsoft.github.io/promptflow/reference/index.html
      • Reference

    • https://github.com/microsoft/autogen/tree/main/samples/apps/promptflow-autogen
      • Promptflow Autogen Example

  • https://github.com/stanfordnlp/dspy
    • DSPy: The framework for programming—not prompting—foundation models

    • DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.

      To make this more systematic and much more powerful, DSPy does two things. First, it separates the flow of your program (modules) from the parameters (LM prompts and weights) of each step. Second, DSPy introduces new optimizers, which are LM-driven algorithms that can tune the prompts and/or the weights of your LM calls, given a metric you want to maximize.

      DSPy can routinely teach powerful models like GPT-3.5 or GPT-4 and local models like T5-base or Llama2-13b to be much more reliable at tasks, i.e. having higher quality and/or avoiding specific failure patterns. DSPy optimizers will "compile" the same program into different instructions, few-shot prompts, and/or weight updates (finetunes) for each LM. This is a new paradigm in which LMs and their prompts fade into the background as optimizable pieces of a larger system that can learn from data. tldr; less prompting, higher scores, and a more systematic approach to solving hard tasks with LMs.
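
      A hedged sketch of what that module/parameter split looks like in code. The `dspy.ChainOfThought("question -> answer")` usage follows the project's minimal examples; the LM-configuration call is an assumption and varies between DSPy versions.

      ```python
      # Hedged sketch: declare *what* a step should do via a signature and a
      # built-in module; the concrete prompts/weights are left to DSPy's optimizers.
      import dspy

      # An LM backend must be configured before calling the module, e.g. via
      # dspy.settings.configure(lm=...); the exact call is version-dependent.

      qa = dspy.ChainOfThought("question -> answer")

      # prediction = qa(question="What is the capital of France?")
      # print(prediction.answer)
      ```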

    • https://dspy-docs.vercel.app/
      • DSPy - Programming—not prompting—Language Models

      • The Way of DSPy

        • Systematic Optimization: Choose from a range of optimizers to enhance your program. Whether it's generating refined instructions, or fine-tuning weights, DSPy's optimizers are engineered to maximize efficiency and effectiveness.
        • Modular Approach: With DSPy, you can build your system using predefined modules, replacing intricate prompting techniques with straightforward, effective solutions.
        • Cross-LM Compatibility: Whether you're working with powerhouse models like GPT-3.5 or GPT-4, or local models such as T5-base or Llama2-13b, DSPy seamlessly integrates and enhances their performance in your system.
  • https://github.com/sgl-project/sglang
    • SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

    • https://lmsys.org/blog/2024-01-17-sglang/
      • Fast and Expressive LLM Inference with RadixAttention and SGLang

      • On the backend, we propose RadixAttention, a technique for automatic and efficient KV cache reuse across multiple LLM generation calls.

      • On the frontend, we develop a flexible domain-specific language embedded in Python to control the generation process. This language can be executed in either interpreter mode or compiler mode.

      • KV cache reuse means different prompts with the same prefix can share the intermediate KV cache and avoid redundant memory and computation.

      • To systematically exploit these reuse opportunities, we introduce RadixAttention, a novel technique for automatic KV cache reuse during runtime. Instead of discarding the KV cache after finishing a generation request, our approach retains the KV cache for both prompts and generation results in a radix tree. This data structure enables efficient prefix search, insertion, and eviction. We implement a Least Recently Used (LRU) eviction policy, complemented by a cache-aware scheduling policy, to enhance the cache hit rate.
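
        To make the prefix-reuse idea concrete, here is a deliberately tiny toy sketch of a prefix-keyed cache with LRU eviction. This is an illustration of the concept only (using a plain dict scan rather than a radix tree), not SGLang's actual RadixAttention implementation.

        ```python
        # Toy illustration only (not SGLang's RadixAttention implementation):
        # share a cached prefix between prompts and evict least-recently-used entries.
        from collections import OrderedDict


        class PrefixKVCache:
            """Map token prefixes -> (placeholder) KV state, with LRU eviction."""

            def __init__(self, capacity=4):
                self.capacity = capacity
                self.entries = OrderedDict()  # tuple(prefix tokens) -> kv placeholder

            def longest_shared_prefix(self, tokens):
                # Linear scan for clarity; a radix tree makes this lookup efficient.
                best = ()
                for prefix in self.entries:
                    if tokens[: len(prefix)] == list(prefix) and len(prefix) > len(best):
                        best = prefix
                if best:
                    self.entries.move_to_end(best)  # mark as recently used
                return list(best)

            def insert(self, tokens, kv_state):
                self.entries[tuple(tokens)] = kv_state
                self.entries.move_to_end(tuple(tokens))
                while len(self.entries) > self.capacity:
                    self.entries.popitem(last=False)  # evict least recently used


        cache = PrefixKVCache()
        cache.insert([1, 2, 3], kv_state="kv(1,2,3)")
        print(cache.longest_shared_prefix([1, 2, 3, 4, 5]))  # -> [1, 2, 3]
        ```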

      • On the frontend, we introduce SGLang, a domain-specific language embedded in Python. It allows you to express advanced prompting techniques, control flow, multi-modality, decoding constraints, and external interaction easily. A SGLang function can be run through various backends, such as OpenAI, Anthropic, Gemini, and local models.
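
        A hedged sketch of what a small program in that DSL looks like, simplified from the style of the examples in the blog post. It assumes the `sglang` package's `@sgl.function` decorator, the `sgl.user`/`sgl.assistant`/`sgl.gen` primitives, and a backend configured elsewhere via `sgl.set_default_backend(...)`.

        ```python
        # Hedged sketch of an SGLang program; a backend (OpenAI, Anthropic, Gemini,
        # or a local runtime) must be set elsewhere via sgl.set_default_backend(...).
        import sglang as sgl


        @sgl.function
        def short_qa(s, question):
            # Build up the conversation state `s`, then let the backend generate.
            s += sgl.user(question)
            s += sgl.assistant(sgl.gen("answer", max_tokens=64))


        # state = short_qa.run(question="Why does KV cache reuse reduce latency?")
        # print(state["answer"])
        ```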

      • Figure 5 shows a concrete example. It implements a multi-dimensional essay judge utilizing the branch-solve-merge prompting technique. This function uses LLMs to evaluate the quality of an essay from multiple dimensions, merges the judgments, generates a summary, and assigns a final grade.

      • The syntax of SGLang is largely inspired by Guidance. However, we additionally introduce new primitives and handle intra-program parallelism and batching

        • https://github.com/guidance-ai/guidance
          • Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case—while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly.
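
            As a hedged sketch of that constrained-generation idea: the import names and the `models.LlamaCpp` wrapper follow Guidance's documented usage, but the model path, prompt, and capture names below are placeholders.

            ```python
            # Hedged sketch: interleave fixed text with constrained generation.
            from guidance import models, gen, select

            lm = models.LlamaCpp("/path/to/model.gguf")  # placeholder model path

            lm += "Is the following review positive or negative?\n"
            lm += "Review: great battery life, mediocre screen\nAnswer: "
            lm += select(["positive", "negative"], name="sentiment")  # constrained choice
            lm += "\nConfidence (0-100): "
            lm += gen("confidence", regex=r"\d{1,3}")  # regex-constrained generation

            # print(lm["sentiment"], lm["confidence"])
            ```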

      • SGLang outperformed the baseline systems in all benchmarks, achieving up to 5 times higher throughput. It also excelled in terms of latency, particularly for the first token latency, where a prefix cache hit can be significantly beneficial. These improvements are attributed to the automatic KV cache reuse with RadixAttention, the intra-program parallelism enabled by the interpreter, and the co-design of the frontend and backend systems. Additionally, our ablation study revealed no noticeable overhead even in the absence of cache hits, leading us to always enable the RadixAttention feature in the runtime.

  • https://github.com/Mozilla-Ocho/llamafile
    • Distribute and run LLMs with a single file

    • llamafile lets you distribute and run LLMs with a single file. Our goal is to make open source large language models much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
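
      A hedged sketch of how a running llamafile is commonly consumed from code: its built-in server exposes an OpenAI-compatible endpoint locally, so a standard client can point at it. The base URL/port, API key, and model name below are assumptions; adjust them to however you launched the llamafile.

      ```python
      # Hedged sketch: query a llamafile started in server mode and listening
      # locally (http://localhost:8080 is assumed here). Requires the `openai`
      # Python client, v1+.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-needed")

      resp = client.chat.completions.create(
          model="local-model",  # placeholder; the llamafile serves its bundled model
          messages=[{"role": "user", "content": "Say hello in one short sentence."}],
      )
      print(resp.choices[0].message.content)
      ```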

    • https://hacks.mozilla.org/2023/11/introducing-llamafile/
      • Introducing llamafile

  • https://github.com/microsoft/LLMLingua
    • To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
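
      A hedged sketch of the documented usage pattern: `PromptCompressor` and `compress_prompt` follow the repo's examples, but argument names/defaults vary between versions and the documents here are placeholders.

      ```python
      # Hedged sketch: compress a long RAG-style prompt before sending it to an LLM.
      from llmlingua import PromptCompressor

      compressor = PromptCompressor()  # loads the default compression model (override via model_name=...)

      context_docs = [
          "(long retrieved document 1 ...)",
          "(long retrieved document 2 ...)",
      ]

      result = compressor.compress_prompt(
          context_docs,
          instruction="Answer the question based on the context.",
          question="What does the report conclude?",
          target_token=300,  # rough token budget for the compressed prompt
      )
      # result["compressed_prompt"] is what you actually send to the target LLM
      ```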

    • https://github.com/microsoft/LLMLingua/blob/main/examples/Retrieval.ipynb
      • We know that LLMs have a 'lost in the middle' issue, where the position of key information in the prompt significantly impacts the final result.

      • How to build an accurate positional relationship between the document and the question has become an important issue. We evaluated the effects of four types of reranker methods on a dataset (NaturalQuestions Multi-document QA) that is very close to the actual RAG scenario (e.g. BingChat).

      • The results show that reranker-based methods are significantly better than embedding methods. The LongLLMLingua method is even better than the current SoTA reranker methods, and it can more accurately capture the relationship between the query and the document, thus alleviating the 'lost in the middle' issue.

    • https://llmlingua.com/
      • (Long)LLMLingua | Designing a Language for LLMs via Prompt Compression

    • https://blog.llamaindex.ai/longllmlingua-bye-bye-to-middle-loss-and-save-on-your-rag-costs-via-prompt-compression-54b559b9ddf7
      • LongLLMLingua: Bye-bye to Middle Loss and Save on Your RAG Costs via Prompt Compression

  • https://github.com/apoorvumang/prompt-lookup-decoding
    • In several LLM use cases where you're doing input grounded generation (summarization, document QA, multi-turn chat, code editing), there is high n-gram overlap between LLM input (prompt) and LLM output. This could be entity names, phrases, or code chunks that the LLM directly copies from the input while generating the output. Prompt lookup exploits this pattern to speed up autoregressive decoding in LLMs.

    • On both summarization and context-QA, we get a relatively consistent 2.4x speedup (on average).

    • https://twitter.com/apoorv_umang/status/1728831397153104255
      • Prompt lookup decoding: Get 2x-4x reduction in latency for input grounded LLM generation with no drop in quality using this speculative decoding technique

    • huggingface/transformers#27722
      • Adding support for prompt lookup decoding (variant of assisted generation)
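
        A hedged sketch of that integration, assuming a transformers release recent enough to expose the `prompt_lookup_num_tokens` argument on `generate()`; the model and prompt are placeholders.

        ```python
        # Hedged sketch: prompt lookup decoding via transformers' generate() kwarg.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "gpt2"  # placeholder; any causal LM works
        tok = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id)

        document = "The quick brown fox jumps over the lazy dog. " * 20
        inputs = tok("Summarize the text:\n" + document, return_tensors="pt")

        # Draft tokens are looked up as n-grams from the prompt itself and then
        # verified, which is where the speedup on input-grounded tasks comes from.
        out = model.generate(**inputs, max_new_tokens=64, prompt_lookup_num_tokens=10)
        print(tok.decode(out[0], skip_special_tokens=True))
        ```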

    • ggerganov/llama.cpp#4226
      • lookahead-prompt: add example

  • https://github.com/vercel/ai
    • Vercel AI SDK: The Vercel AI SDK is a library for building AI-powered streaming text and chat UIs.

    • Build AI-powered applications with React, Svelte, Vue, and Solid

    • https://sdk.vercel.ai/docs
      • Vercel AI SDK: An open source library for building AI-powered user interfaces.

        The Vercel AI SDK is an open-source library designed to help developers build conversational streaming user interfaces in JavaScript and TypeScript. The SDK supports React/Next.js, Svelte/SvelteKit, and Vue/Nuxt as well as Node.js, Serverless, and the Edge Runtime.

  • https://github.com/oobabooga/text-generation-webui
    • A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

    • Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.

    • https://github.com/oobabooga/text-generation-webui-extensions
      • This is a directory of extensions for oobabooga/text-generation-webui

  • https://github.com/huggingface/chat-ui
    • Open source codebase powering the HuggingChat app

  • https://github.com/lm-sys/FastChat
  • https://github.com/vllm-project/vllm
  • https://github.com/philipturner/metal-benchmarks
    • Apple GPU microarchitecture

    • This document thoroughly explains the Apple GPU microarchitecture, focusing on its GPGPU performance. Details include latencies for each ALU assembly instruction, cache sizes, and the number of unique instruction pipelines. This document enables evidence-based reasoning about performance on the Apple GPU, helping people diagnose bottlenecks in real-world software. It also compares Apple silicon to generations of AMD and Nvidia microarchitectures, showing where it might exhibit different performance patterns. Finally, the document examines how Apple's design choices improve power efficiency compared to other vendors.

      This repository also contains open-source benchmarking scripts. They allow anyone to reproduce and verify the author's claims about performance. A complementary library reports the hardware specifications of any Apple-designed GPU.

      • https://github.com/philipturner/applegpuinfo
        • Print all known information about the GPU on Apple-designed chips

        • This is a mini-framework for querying parameters of an Apple-designed GPU. It also contains a command-line tool, gpuinfo, which reports information similarly to clinfo. It was co-authored with an AI.

        • https://github.com/Oblomov/clinfo
          • Print all known information about all available OpenCL platforms and devices in the system

          • clinfo is a simple command-line application that enumerates all possible (known) properties of the OpenCL platform and devices available on the system.

  • https://github.com/tinygrad/tinygrad
    • You like pytorch? You like micrograd? You love tinygrad! ❤️

    • This may not be the best deep learning framework, but it is a deep learning framework.

      Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.
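
      A hedged sketch in the spirit of the README's micrograd-style example; the top-level `from tinygrad import Tensor` import assumes a recent release (older versions used `tinygrad.tensor`).

      ```python
      # Hedged sketch: a tiny autograd computation on tinygrad Tensors.
      from tinygrad import Tensor

      x = Tensor.eye(3, requires_grad=True)
      y = Tensor([[2.0, 0.0, -2.0]], requires_grad=True)
      z = y.matmul(x).sum()
      z.backward()

      print(x.grad.numpy())  # dz/dx
      print(y.grad.numpy())  # dz/dy
      ```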

    • https://tinygrad.org/
  • https://github.com/microsoft/DirectML
    • DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
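
      One hedged way to exercise DirectML from Python is the separate `torch-directml` package; the package name and `torch_directml.device()` entry point are assumptions based on Microsoft's PyTorch-on-DirectML documentation rather than the DirectML repo itself.

      ```python
      # Hedged sketch: run a PyTorch matmul on a DirectX 12 GPU through DirectML.
      # Assumes `pip install torch-directml` and a DirectX 12-capable GPU/driver.
      import torch
      import torch_directml

      dml = torch_directml.device()  # first available DirectML device

      a = torch.randn(1024, 1024, device=dml)
      b = torch.randn(1024, 1024, device=dml)
      c = a @ b  # executed through DirectML
      print(c.shape, c.device)
      ```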
