Absolutely! Here's a breakdown of an algorithm designed to address the challenges you've outlined, along with code examples in TypeScript to illustrate the process.
Algorithm Design
The core idea is to process the subtitle segments in chunks, use the LLM to identify potential paragraph and chapter breaks within those chunks, and then merge and refine these suggestions to form the final structure.
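Because chunks overlap, several LLM calls can propose the same boundary, so the "merge" step needs a deduplication rule. The sketch below shows one simple policy under stated assumptions. The BreakSuggestion shape, the use of global segment indices, the "chapter beats paragraph" rule, and the mergeSuggestions function are all hypothetical illustrations, not part of the original design.

```typescript
// Hypothetical kinds of structural break the LLM might propose.
type BreakKind = "paragraph" | "chapter";

// Hypothetical LLM output: the global index of the segment that starts
// a new paragraph or chapter.
interface BreakSuggestion {
  index: number;
  kind: BreakKind;
}

// Merge per-chunk suggestions into one sorted list with one entry per index.
// Assumed rule: if any chunk proposes a chapter break at an index, "chapter"
// wins, because a chapter break also ends the current paragraph.
function mergeSuggestions(perChunk: BreakSuggestion[][]): BreakSuggestion[] {
  // Flatten all chunks' suggestions and sort by index so duplicates are adjacent.
  const all = ([] as BreakSuggestion[])
    .concat(...perChunk)
    .sort((a, b) => a.index - b.index);

  const merged: BreakSuggestion[] = [];
  for (const s of all) {
    const last = merged[merged.length - 1];
    if (last && last.index === s.index) {
      // Same boundary proposed again (e.g. from an overlapping chunk).
      if (s.kind === "chapter") last.kind = "chapter";
    } else {
      // Copy so callers' input objects are never mutated.
      merged.push({ ...s });
    }
  }
  return merged;
}
```

Exact-index deduplication is the simplest policy; breaks that different chunks place at neighbouring indices would still need a separate refinement pass.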
Steps:
- Chunking with Overlap:
  - Divide the `SubtitleSegment` array into overlapping chunks. Each chunk must fit within the LLM's context window, and the overlap reduces the risk of cutting a logical unit (such as a sentence or a topic transition) in half at a chunk boundary.
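A minimal TypeScript sketch of the chunking step follows. The fields given to SubtitleSegment (start, end, text), the Chunk type, and the chunkWithOverlap function with its chunkSize and overlap parameters are illustrative assumptions rather than the original design. Sizes are counted in segments for simplicity; a token-based budget would track the context window more closely.

```typescript
// Hypothetical shape of a subtitle segment; the real SubtitleSegment
// type may carry different fields.
interface SubtitleSegment {
  start: number; // start time in seconds
  end: number;   // end time in seconds
  text: string;
}

// A chunk remembers where it begins in the full array so that indices
// the LLM returns relative to the chunk can be mapped back to global ones.
interface Chunk {
  offset: number; // global index of this chunk's first segment
  segments: SubtitleSegment[];
}

// Split `segments` into windows of `chunkSize` segments, where consecutive
// windows share `overlap` segments.
function chunkWithOverlap(
  segments: SubtitleSegment[],
  chunkSize: number,
  overlap: number,
): Chunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const step = chunkSize - overlap; // how far each new window advances
  const chunks: Chunk[] = [];
  for (let offset = 0; offset < segments.length; offset += step) {
    chunks.push({ offset, segments: segments.slice(offset, offset + chunkSize) });
    // Stop once this window already reaches the end of the array,
    // so no trailing chunk consists only of overlap.
    if (offset + chunkSize >= segments.length) break;
  }
  return chunks;
}
```

Because each chunk records its offset, a break the LLM reports at position i inside a chunk corresponds to segment offset + i in the full array.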