FIRST
- user uploads a file to the Rails (ROR) app -> file is uploaded to S3 -> entity is created in the DB
- push a job to Sidekiq to chunk the entity
SECOND
- process the chunk generation job asynchronously in Lambda
- call lambda/chunk/generate?url=
- parse the URL content (HTML, MD, PDF, docs, text, XML sitemap) based on the MIME type from the Content-Type header
- text extraction
- call lambda/chunk/generate?url=
- NLTK-based chunking, limited to 256 tokens per chunk
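
A minimal Python sketch of the SECOND stage, under stated assumptions rather than the actual Lambda code. The function names (`build_generate_url`, `pick_parser`, `chunk_text`), the parser labels, the base URL, and the exact MIME type table are all hypothetical. The notes only say chunking is NLTK based with a 256 token limit; greedy sentence packing is one plausible way to do that, not a confirmed design.

```python
from typing import Callable, List
from urllib.parse import urlencode

# Assumed MIME type -> parser label table; the real mapping is not in the notes.
PARSERS = {
    "text/html": "html",
    "text/markdown": "md",
    "application/pdf": "pdf",
    "application/msword": "docs",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docs",
    "text/plain": "text",
    "application/xml": "sitemap",
    "text/xml": "sitemap",
}


def build_generate_url(base: str, file_url: str) -> str:
    """Build the lambda/chunk/generate?url= call with the file URL percent-encoded."""
    return f"{base}/chunk/generate?{urlencode({'url': file_url})}"


def pick_parser(content_type: str) -> str:
    """Choose a parser from a Content-Type header, ignoring parameters like charset."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in PARSERS:
        raise ValueError(f"unsupported MIME type: {mime}")
    return PARSERS[mime]


def nltk_sentences(text: str) -> List[str]:
    # Imported lazily; requires NLTK and its punkt tokenizer data.
    from nltk.tokenize import sent_tokenize
    return sent_tokenize(text)


def nltk_tokens(text: str) -> List[str]:
    from nltk.tokenize import word_tokenize
    return word_tokenize(text)


def chunk_text(
    text: str,
    max_tokens: int = 256,
    split_sentences: Callable[[str], List[str]] = nltk_sentences,
    tokenize: Callable[[str], List[str]] = nltk_tokens,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens."""
    chunks: List[str] = []
    current: List[str] = []  # sentences in the chunk being built
    current_len = 0          # token count of `current`

    def flush() -> None:
        nonlocal current, current_len
        if current:
            chunks.append(" ".join(current))
        current, current_len = [], 0

    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        if len(tokens) > max_tokens:
            # A single oversized sentence is cut into fixed-size token windows.
            flush()
            for i in range(0, len(tokens), max_tokens):
                chunks.append(" ".join(tokens[i:i + max_tokens]))
            continue
        if current_len + len(tokens) > max_tokens:
            flush()
        current.append(sentence)
        current_len += len(tokens)

    flush()
    return chunks
```

The sentence splitter and tokenizer are passed in as parameters so the packing logic can be exercised without NLTK installed, for example `chunk_text(text, max_tokens=5, split_sentences=lambda t: t.split("|"), tokenize=str.split)`. With the NLTK defaults, the punkt data has to be available in the Lambda package or layer.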