@frutik
Last active October 9, 2023 12:57
https://medium.com/@kelvin.lu.au/compare-pdf-question-answering-with-openai-and-google-vertexai-46638d62327b
https://medium.com/@kelvin.lu.au/what-we-need-to-know-before-adopting-a-vector-database-85e137570fbb
https://medium.com/@kelvin.lu.au/disadvantages-of-rag-5024692f2c53
https://medium.com/@Ratnaparkhi/how-the-search-technology-is-evolving-88607f5efb9e
from langchain.text_splitter import CharacterTextSplitter

# Split on '.', then merge the pieces back into chunks of at most 500 characters.
splitter = CharacterTextSplitter(separator='.', chunk_size=500, chunk_overlap=2)
splitter.split_text('a s d. f g')

frutik commented Oct 9, 2023

CharacterTextSplitter only splits on the separator ('\n\n' by default). chunk_size is an upper bound that is applied when the separated pieces are merged back together, not a hard limit: a piece longer than chunk_size with no separator inside it is left whole. If the text starts with n characters, then a separator, then m more characters before the next separator, the first chunk is those n characters whenever chunk_size < n + m + len(separator); otherwise the two pieces are merged into one chunk.
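The rule above can be sketched in plain Python. This is a simplified model of the split-then-merge behaviour, not LangChain's actual implementation: the function name `split_on_separator` is hypothetical, and the sketch ignores chunk_overlap and whitespace stripping.

```python
def split_on_separator(text, separator="\n\n", chunk_size=500):
    """Simplified model: split on the separator only, then greedily merge
    neighbouring pieces while the merged chunk fits in chunk_size."""
    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = ""
    for piece in pieces:
        if not current:
            current = piece
        elif len(current) + len(separator) + len(piece) <= chunk_size:
            # n + m + len(separator) still fits, so keep merging.
            current = current + separator + piece
        else:
            # Merging would exceed chunk_size: close the chunk here.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# n = 4 ("aaaa"), m = 6 ("bbbbbb"), len(separator) = 1, so n + m + 1 = 11.
print(split_on_separator("aaaa.bbbbbb.cc", separator=".", chunk_size=10))
# ['aaaa', 'bbbbbb.cc']
print(split_on_separator("aaaa.bbbbbb.cc", separator=".", chunk_size=11))
# ['aaaa.bbbbbb', 'cc']

# No separator in the text: the piece stays whole even though it exceeds chunk_size.
print(split_on_separator("abcdefghij", separator=".", chunk_size=3))
# ['abcdefghij']
```

With chunk_size=10 the first chunk is just the first n characters because 10 < 11; raising chunk_size to 11 lets the first two pieces merge.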
