@Aschen
Created April 26, 2024 14:37
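import { encoding_for_model } from 'tiktoken'

// Text to tokenize (placeholder; replace with your own input)
const text = 'Hello world'
// Choose the model; it can be 'gpt-4', for example
const tokenEncoder = encoding_for_model('text-embedding-ada-002')
// Uint32Array of integer token IDs (e.g. [9906, 1917, ...])
const tokens = tokenEncoder.encode(text)
// Number of tokens
console.log(tokens.length)
// Retrieve the truncated text corresponding to the first 8096 tokens
// (decode() returns UTF-8 bytes, hence the Buffer conversion)
const truncatedText = Buffer.from(
  tokenEncoder.decode(tokens.slice(0, 8096))
).toString()
// Release the WASM memory held by the encoder
tokenEncoder.free()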