aaronjolson / llamaindex_activeloop_vectorize_data_from_github.py
Last active February 20, 2024 13:36
Code for using llama-index to load GitHub data into the Activeloop Deep Lake vector database. Originally from this course: https://learn.activeloop.ai/courses/take/rag/multimedia/51349127-chat-with-your-code-llamaindex-and-activeloop-deep-lake-for-github-repositories. The imports have been modified to work with the latest version of llama-index.
'''
In .env file:
GITHUB_TOKEN="YOUR_GH_CLASSIC_TOKEN"
OPENAI_API_KEY="YOUR_OPENAI_KEY"
ACTIVELOOP_TOKEN="YOUR_ACTIVELOOP_TOKEN"
DATASET_PATH="hub://YOUR_ORG/repository_vector_store"

Requires: llama-index >= 0.10.0, python-dotenv, llama-index-readers-github >= 0.1.5
'''
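The gist loads these variables from the .env file via python-dotenv. As a stdlib-only illustration of what that load amounts to (a hypothetical helper, not part of the gist), the file can be parsed like this:

```python
import os

def load_env_file(path=".env"):
    # Minimal stand-in for python-dotenv's load_dotenv: parse KEY="VALUE"
    # lines and export them, without overwriting variables already set.
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

In practice python-dotenv's `load_dotenv()` handles more edge cases (quoting, interpolation), so this is only a sketch of the mechanism.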
jrknox1977 / ollama_dspy.py
Created February 9, 2024 18:06
ollama+DSPy using OpenAI APIs.
# install DSPy: pip install dspy
import dspy

# Ollama is now compatible with OpenAI APIs.
#
# To get this to work you must include `model_type='chat'` in the `dspy.OpenAI` call;
# without it you will get an error.
#
# I have also found that `stop='\n\n'` is required to get the model to stop
# generating text after the answer is complete, at least with mistral.
ollama_mistral = dspy.OpenAI(
    api_base='http://localhost:11434/v1/',  # Ollama's local OpenAI-compatible endpoint
    api_key='ollama',                       # any non-empty string works for Ollama
    model='mistral',
    model_type='chat',
    stop='\n\n',
)
dspy.settings.configure(lm=ollama_mistral)
kglspl / h5fsutil.py
Created January 8, 2024 17:54
Easier interface to h5py datasets
import h5py

# Copyright (c) 2023 kglspl
# MIT License (the same as: https://github.com/kglspl/ppmparser/blob/master/LICENSE)


class H5FS(object):
    """Thin convenience wrapper around an h5py.File and a single dataset."""

    def __init__(self, filename, mode):
        self.filename = filename
        self.f = h5py.File(filename, mode)
        self.dset = None  # bound to a dataset on first use

    def close(self):
        self.f.close()
        self.dset = None
kupietools / Docker Desktop v 4.0.0 thru 4.22.1 direct download links
Last active July 21, 2024 10:07
List of direct download links for Docker Desktop from version 4.0.0 (released 2021-08-31) through 4.22.1 (released 2023-08-24), as archived on archive.org
jneuff / fix-tokenizer.rs
Created October 17, 2023 11:35
Fix a huggingface tokenizer to which tokens have been added after training
/// Fix a huggingface tokenizer to which tokens have been added after training.
///
/// Adding tokens after training via `add_special_tokens` leads to them being added to the
/// `added_tokens` section but not to the `model.vocab` section. This yields warnings like:
/// ```
/// [2023-10-17T07:54:05Z WARN tokenizers::tokenizer::serialization] Warning: Token '<|empty_usable_token_space_1023|>' was expected to have ID '129023' but was given ID 'None'
/// ```
/// The code in this file ensures that all tokens from `added_tokens` are also placed into
/// `model.vocab`. This fixes the warning and does not change the tokenizer's behavior.
claysauruswrecks / requirements.txt
Last active April 7, 2024 16:32
Example using LlamaHub loaders to index GitHub repos into LlamaIndex and query GPTSimpleVectorIndex with GPT-4
# main
llama-index
langchain
mblondel / kernel_kmeans.py
Last active January 4, 2024 11:45
Kernel K-means.
"""Kernel K-means"""
# Author: Mathieu Blondel <mathieu@mblondel.org>
# License: BSD 3 clause
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.utils import check_random_state
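The listing truncates the gist at its imports. As a NumPy-only sketch of the core kernel k-means update (a simplified stand-in, not Blondel's scikit-learn-style estimator), each point is reassigned by its implicit feature-space distance to every cluster mean:

```python
import numpy as np

def kernel_kmeans(K, n_clusters, labels_init, n_iter=100):
    # K: (n, n) precomputed kernel matrix (e.g. from pairwise_kernels).
    # Distance of point i to the mean of cluster c in feature space:
    # ||phi(x_i) - mu_c||^2 = K_ii - 2/|c| * sum_{j in c} K_ij
    #                              + 1/|c|^2 * sum_{j,l in c} K_jl
    labels = np.asarray(labels_init).copy()
    n = K.shape[0]
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty clusters stay at infinite distance
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
    return labels
```

With a linear kernel (`K = X @ X.T`) this reduces to ordinary k-means; the full gist adds sample weighting, random initialization, and the scikit-learn estimator interface.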
MohamedAlaa / tmux-cheatsheet.markdown
Last active July 26, 2024 12:21
tmux shortcuts & cheatsheet
start new:

tmux

start new with session name:

tmux new -s myname