sammaniamsam / ChatGptContextBytes.py
Last active September 13, 2023 14:21
This gist reads the contextual resources located in a directory and writes them as text chunks to an output file that can be fed into ChatGPT.
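The core idea, splitting extracted text into word-limited chunks, can be sketched roughly as follows. This is a minimal illustration and not the gist's actual code: the function name `split_into_chunks` and the value 500 are assumptions, and only the constant name `CHUNK_SIZE_IN_WORDS` comes from the script's own description.

```python
# Hypothetical sketch of word-based chunking; not the gist's implementation.
CHUNK_SIZE_IN_WORDS = 500  # assumed value; the script's real setting may differ


def split_into_chunks(text, chunk_size=CHUNK_SIZE_IN_WORDS):
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()  # splits on any run of whitespace
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


if __name__ == "__main__":
    # Five words with a limit of two words per chunk.
    for chunk in split_into_chunks("one two three four five", chunk_size=2):
        print(chunk)
```

Splitting on whitespace keeps the sketch simple, at the cost of discarding the original line breaks and spacing inside each chunk.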
"""
Import Libraries: The script begins by importing several Python libraries required for processing different types of files, including os, fnmatch, docx, pptx, PyPDF2, and openpyxl.
Chunk Size Configuration: It defines a constant CHUNK_SIZE_IN_WORDS, which determines the maximum number of words in each text chunk. This value can be modified as needed to control the size of the output text chunks.
File Content Extraction Functions: The script defines functions to extract text content from different file types:
read_word_file: Extracts text from .docx files.
read_pptx_file: Extracts text from .pptx files, including text from slides and shapes.
read_pdf_file: Extracts text from .pdf files using PyPDF2.