@simonw
Created May 18, 2023 20:55
cat ttok/cli.py | llm --system 'Explain this code' -s --gpt4

This code is a command-line script to count and truncate text based on tokens using the tiktoken library and the Click framework for handling command-line arguments.

Here's a breakdown of the code:

  1. Import the necessary libraries: click, sys, and tiktoken.

  2. Define the command-line interface (CLI) using the @click.command() decorator.

  3. Define the command-line options using the @click.argument and @click.option decorators:

    • prompt: The text to process, passed as command-line arguments or from standard input.
    • input: File object to read input from.
    • truncate: Number of tokens to truncate the input to.
    • model: The model to use for tokenization, defaulting to 'gpt-3.5-turbo'.
    • output_tokens: Flag to output the token integers instead of the token count.

  4. Define the CLI function cli(), which takes the options as arguments.

  5. Inside the cli() function:

    • Check whether the given model is valid; if it is not, raise an exception.
    • Read input text either from the provided file or standard input.
    • Call the relevant tiktoken functions to tokenize and possibly truncate the input text.
    • Print the output accordingly, either as token integers, truncated text, or the total number of tokens.
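The tokenize, truncate, and print steps above can be sketched roughly as follows. This is an illustration of the described behaviour, not the code from ttok/cli.py: the function name process and the toy WordEncoding class are hypothetical, introduced only so the example runs without tiktoken installed. The sketch assumes only that the encoding object offers encode() and decode() methods, as tiktoken encodings do.

```python
def process(text, encoding, truncate=None, output_tokens=False):
    """Return the string a token-counting CLI might print (hypothetical helper)."""
    tokens = encoding.encode(text)
    if truncate is not None:
        # Keep only the first `truncate` tokens.
        tokens = tokens[:truncate]
    if output_tokens:
        # Token integers, space-separated.
        return " ".join(str(t) for t in tokens)
    if truncate is not None:
        # Truncated text, decoded back from the kept tokens.
        return encoding.decode(tokens)
    # Default: the total number of tokens.
    return str(len(tokens))


class WordEncoding:
    """Toy stand-in for a real encoding: one token per whitespace-separated word."""

    def __init__(self):
        self.vocab = []

    def encode(self, text):
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab.append(word)
            ids.append(self.vocab.index(word))
        return ids

    def decode(self, tokens):
        return " ".join(self.vocab[t] for t in tokens)


enc = WordEncoding()
print(process("one two three two", enc))                      # token count
print(process("one two three two", enc, truncate=2))          # truncated text
print(process("one two three two", enc, output_tokens=True))  # token integers
```

With tiktoken installed, tiktoken.encoding_for_model("gpt-3.5-turbo") should return an object that can be passed in place of WordEncoding(). That call raises KeyError for a model name it does not recognise, which is presumably the error the validity check converts into an exception.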

The function also includes a docstring with examples of how to use the command-line script for various operations, like reading input from a file, truncating tokens, and changing the tokenization model.
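As a rough sketch of how the decorators and the docstring fit together in Click: the parameter names below (prompt, input, truncate, model, output_tokens) come from the explanation above, but the flag spellings (-i, -t, -m, --tokens), the help strings, and the docstring text are assumptions, and the function body only echoes the parsed options rather than tokenizing anything.

```python
import click


@click.command()
@click.argument("prompt", nargs=-1)  # zero or more words of text
@click.option("-i", "--input", "input", type=click.File("r"),
              help="Read text from this file")
@click.option("-t", "--truncate", type=int,
              help="Truncate to this many tokens")
@click.option("-m", "--model", default="gpt-3.5-turbo",
              help="Model whose tokenizer to use")
@click.option("--tokens", "output_tokens", is_flag=True,
              help="Output token integers")
def cli(prompt, input, truncate, model, output_tokens):
    """Count and truncate text based on tokens (illustrative docstring)."""
    # Placeholder body: a real implementation would tokenize the text here.
    text = " ".join(prompt)
    click.echo(
        f"text={text!r} truncate={truncate} model={model} tokens={output_tokens}"
    )


if __name__ == "__main__":
    cli()
```

Click uses the function's docstring as the body of the --help output, so usage examples written in the docstring are shown to anyone who runs the command with --help.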
