asbisen / split_recursive.jl
Created March 3, 2024 06:26
recursive text splitter
"""
    split_recursive(data; delimiters, chunk_size)

Recursively split `data` using the provided `delimiters` (default: ["\n\n", "\n", " ", " "]),
applied in order, and `chunk_size` (default: 4096). The data is first split on the first
delimiter; any resulting chunk that is larger than `chunk_size` is then split recursively
using the next delimiter. If all delimiters are exhausted and a chunk is still too large,
it is split based on `chunk_size` alone. Consecutive chunks that are smaller than
`chunk_size` are merged.
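
The steps above can be illustrated with a minimal, self-contained sketch. It is not the
implementation in this file: the name `sketch_split` is hypothetical, the delimiters are
passed positionally, sizes are assumed to be counted in characters, and merging is assumed
to mean "rejoin neighbouring pieces with the delimiter while the result still fits".

```julia
# Hypothetical sketch of the described strategy; not the gist's own code.
# Assumption: chunk_size is measured in characters (length), not bytes.
function sketch_split(text::AbstractString, delimiters::Vector{String}, chunk_size::Int)
    # Base case: the text already fits into one chunk.
    length(text) <= chunk_size && return [String(text)]

    # Delimiters exhausted: fall back to fixed windows of chunk_size characters.
    if isempty(delimiters)
        chars = collect(text)
        return [String(chars[i:min(i + chunk_size - 1, end)])
                for i in 1:chunk_size:length(chars)]
    end

    delim, rest = delimiters[1], delimiters[2:end]
    chunks = String[]
    mergeable = false  # true while chunks[end] may still absorb the next piece

    for piece in split(text, delim; keepempty=false)
        if length(piece) > chunk_size
            # Still too large: recurse with the remaining delimiters.
            append!(chunks, sketch_split(piece, rest, chunk_size))
            mergeable = false
        elseif mergeable && length(chunks[end]) + length(delim) + length(piece) <= chunk_size
            # Merge with the previous small piece, restoring the delimiter between them.
            chunks[end] = chunks[end] * delim * piece
        else
            push!(chunks, String(piece))
            mergeable = true
        end
    end
    return chunks
end

sketch_split("one two three. four five. six", [". ", " "], 10)
# returns ["one two", "three", "four five", "six"]
```

In this sketch only pieces produced at the same delimiter level are merged, so a chunk is
never rejoined with a delimiter that did not originally separate its parts. The actual
function may make a different choice here.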