@obeduri
Last active October 26, 2023 03:38
PHP Snippets
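<?php
// Estimate how many GPT-3 tokens a string will use.
// This is a rough heuristic, not a real tokenizer.
function countTokens($string) {
    // Split the string into individual words on runs of whitespace
    $words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
    // Initialize a counter
    $tokens = 0;
    // Iterate over the words
    foreach ($words as $word) {
        // Increment the counter by the ceiling of the word's length divided by 4.
        // Rule of thumb: GPT-3's byte pair encoding averages about 4 characters
        // per token in English text, so each word of 4 characters or fewer is
        // counted as one token. The real tokenizer may split words differently.
        $tokens += (int) ceil(mb_strlen($word) / 4);
    }
    // Return the total count
    return $tokens;
}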
$string = "antidisestablishmentarianism";
echo countTokens($string), PHP_EOL; // 28 characters: ceil(28 / 4) = 7