Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
![Screenshot 2023-12-18 at 10 40 27 PM](https://private-user-images.githubusercontent.com/3837836/291468646-4c30ad72-76ee-4939-a5fb-16b570d38cf2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE5MTIyMjcsIm5iZiI6MTcyMTkxMTkyNywicGF0aCI6Ii8zODM3ODM2LzI5MTQ2ODY0Ni00YzMwYWQ3Mi03NmVlLTQ5MzktYTVmYi0xNmI1NzBkMzhjZjIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcyNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MjVUMTI1MjA3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MTg3MmI0NDNkNzlmNDMzMjVmM2RiMmQ0ZTY5ZTBkM2JiMzhkNTQwZGM2MjlhNjk1MDBlZDhlNGQ4MDEzMTY2ZiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.QCCwWhsCgG_0djcPKz4YqbSUhGdMyYN-iASZKp_TTXg)
""" | |
a simple script that reads tweets inside a json file, uses openai to compute embeddings and creates two files, metadata.tsv and output.tsv, which can be used to visualise the tweets and their embeddings in TensorFlow Projector (https://projector.tensorflow.org/)
"""
# obtain tweets.json from https://gist.github.com/gd3kr/948296cf675469f5028911f8eb276dbc
import pandas as pd
import json
from openai import OpenAI
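The rest of the script is not shown here, so the following is only a minimal sketch of the file-writing step it describes, not the author's code. It assumes the embeddings have already been computed and are available as plain lists of floats; the function name `write_projector_files` is hypothetical. TensorFlow Projector expects one tab-separated vector per line in the vectors file and, for single-column metadata, one label per line with no header.

```python
import csv
import tempfile
from pathlib import Path


def write_projector_files(texts, vectors, out_dir):
    """Write output.tsv (vectors) and metadata.tsv (labels) for TensorFlow Projector.

    `texts` and `vectors` must be parallel lists: vectors[i] is the embedding of texts[i].
    """
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    out_dir = Path(out_dir)

    # One embedding per line, values separated by tabs, no header row.
    with open(out_dir / "output.tsv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(vectors)

    # One label per line. Tabs and newlines inside a tweet would break the
    # TSV layout, so all runs of whitespace are collapsed to single spaces.
    with open(out_dir / "metadata.tsv", "w", encoding="utf-8") as f:
        for text in texts:
            f.write(" ".join(text.split()) + "\n")


if __name__ == "__main__":
    # Made-up two-dimensional vectors stand in for real embeddings.
    with tempfile.TemporaryDirectory() as tmp:
        write_projector_files(["first tweet", "second\ntweet"], [[0.1, 0.2], [0.3, 0.4]], tmp)
        print((Path(tmp) / "metadata.tsv").read_text(encoding="utf-8"))
```

Both files can then be uploaded through the "Load" button in the Projector UI.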
import styles from './Apps.module.scss';
import { useEffect, useState } from 'react';
import Link from 'next/link';
const APPS = [
  {
    title: 'APP',
    hero: 'Lorem ipsum dolor sit amet',
    description:
      'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do.',
Install the HF Code Autocomplete VSCode plugin.
We are not going to set an API token. We are going to specify an API endpoint.
We will try to deploy that API ourselves, to use our own GPU to provide the code assistance.
We will use bigcode/starcoder, a 15.5B param model.
We will use NF4 4-bit quantization to fit this into 10787MiB VRAM.
It would require 23767MiB VRAM unquantized (which still fits on a 4090, with its 24564MiB)!
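As a rough illustration of the NF4 step only (not the serving setup, which is not described at this point), here is a sketch using the `BitsAndBytesConfig` option in Hugging Face `transformers`. It assumes `transformers`, `accelerate`, and `bitsandbytes` are installed, a CUDA GPU is available, and your Hugging Face account has access to the model weights. The helper names are hypothetical.

```python
def nf4_quantization_kwargs():
    """Keyword arguments that ask bitsandbytes for 4-bit NF4 weights."""
    return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}


def load_quantized_starcoder(model_id="bigcode/starcoder"):
    """Load the tokenizer and the NF4-quantized model onto the available GPU.

    Heavy imports live inside the function so the module can be imported
    on a machine without the GPU stack installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(**nf4_quantization_kwargs())
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers on the GPU
    )
    return tokenizer, model
```

Calling `load_quantized_starcoder()` downloads the weights on first use, so expect it to take a while.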
# Find
import\s(([^;]|\n)*)\sfrom\s(['"])(\.{1,2}\/.*)(?<!\.js)(?<!\.(css|pdf|png|jpg|jsx|mjs|mp3|mp4|svg|ttf))(?<!\.(avif|json|webm|webp|woff))(?<!\.woff2)(['"]);
# Replace with
import $1 from $3$4.js$7;
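The same find/replace can be tried outside an editor. The sketch below applies the pattern unchanged with Python's `re` module. The only adaptation is the replacement syntax (`\g<1>` instead of `$1`); the function name `add_js_extension` is made up for illustration. The lookbehinds are grouped by extension length, which also satisfies Python's requirement that a lookbehind has a fixed width.

```python
import re

# The "Find" pattern, verbatim. Groups 5 and 6 sit inside the negative
# lookbehinds, which is why the closing quote is group 7.
IMPORT_PATTERN = re.compile(
    r"""import\s(([^;]|\n)*)\sfrom\s(['"])(\.{1,2}\/.*)(?<!\.js)(?<!\.(css|pdf|png|jpg|jsx|mjs|mp3|mp4|svg|ttf))(?<!\.(avif|json|webm|webp|woff))(?<!\.woff2)(['"]);"""
)

# The "Replace with" template, using Python's group-reference syntax.
REPLACEMENT = r"import \g<1> from \g<3>\g<4>.js\g<7>;"


def add_js_extension(source: str) -> str:
    """Append .js to relative import paths that have no listed extension."""
    return IMPORT_PATTERN.sub(REPLACEMENT, source)


if __name__ == "__main__":
    sample = "import a from './a';\nimport s from './s.css';\n"
    print(add_js_extension(sample))
```

Imports from packages (paths not starting with `./` or `../`) are left alone, as are paths that already end in `.js` or one of the listed asset extensions.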
The screenshots were taken in different sessions.
Each session is shown in full in the screenshots.
I lost the original prompts, so I had to reconstruct them, and I still managed to reproduce the results.
The "compressed" version is actually longer! Emojis and abbreviations use more tokens than common words.
This episode of Recsperts was transcribed with Whisper from OpenAI, an open-source neural net trained on 680,000 hours of audio. The model uses an encoder-decoder architecture: audio is split into 30-second chunks, converted to log-Mel spectrograms, and passed to an encoder. A decoder is trained to predict the corresponding caption text, and the model supports transcription, timestamp-aligned transcription, and multilingual translation.
The transcription process outputs a single string file, so it's up to the end-user to parse out individual speakers, or run the model [through a sec
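For reference, a minimal sketch of getting timestamped lines rather than one long string, assuming the open-source `openai-whisper` Python package is installed. The result of `transcribe` is a dict with a `"segments"` list next to the full `"text"`. The helper names and the `"base"` model size are illustrative choices, and speaker separation is not covered.

```python
def format_segments(result):
    """Turn a Whisper result dict into '[start-end] text' lines."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text'].strip()}")
    return lines


def transcribe(path, model_name="base"):
    """Run Whisper on an audio file and return the raw result dict."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(path)
```

Typical use would be `print("\n".join(format_segments(transcribe("episode.mp3"))))`, where `episode.mp3` is a placeholder file name.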
Let's suppose I have two GitHub accounts, https://github.com/rahul-office and https://github.com/rahul-personal. Now I want to set up my Mac to easily talk to both GitHub accounts.
NOTE: This logic can be extended to more than two accounts also. :)
The setup can be done in 5 easy steps:
This is a step-by-step tutorial for hosting your website under your domain on IPFS, from zero, on a DigitalOcean Ubuntu 16.04.3 x64 Droplet (I am using the $10 variant with 2GB RAM).
Log in as root.
First, make sure the system is up to date, and install tar and wget: