@donbr
Created April 2, 2024 19:37
rag_from_scratch_playlist
{
"kind": "youtube#playlistItemListResponse",
"etag": "4oVDxXOqlyaVnRrnJXuhwXPhMSg",
"items": [
{
"kind": "youtube#playlistItem",
"etag": "bh6WIsa4tB6e6DgavMaPsvdBOOA",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC41NkI0NEY2RDEwNTU3Q0M2",
"snippet": {
"publishedAt": "2024-02-06T01:10:40Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG From Scratch: Part 1 (Overview)",
"description": "LLMs are a powerful new platform, but they are not always trained on data that is relevant for our tasks. This is where retrieval augmented generation (or RAG) comes in: RAG is a general methodology for connecting LLMs with external data sources such as private or recent data. It allows LLMs to use external data in generation of their output. This video series will build up an understanding of RAG from scratch, starting with the basics of indexing, retrieval, and generation. It will build up to more advanced techniques to address edge cases or challenges in RAG. \n\nCode: \nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_1_to_4.ipynb\n\nSlides:\nhttps://docs.google.com/presentation/d/1C9IaAwHoWcc4RSTqo-pCoN3h0nCgqV2JEYZUJunv_9Q/edit?usp=sharing",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/wd7TZ4w1mSw/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/wd7TZ4w1mSw/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/wd7TZ4w1mSw/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/wd7TZ4w1mSw/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/wd7TZ4w1mSw/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 0,
"resourceId": {
"kind": "youtube#video",
"videoId": "wd7TZ4w1mSw"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "OrWZ0tkCTlBEu4hz0l7BXYAwEEo",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC4yODlGNEE0NkRGMEEzMEQy",
"snippet": {
"publishedAt": "2024-02-06T01:13:33Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG From Scratch: Part 2 (Indexing)",
"description": "This is the second video in our series on RAG. The aim of this series is to build up an understanding of RAG from scratch, starting with the basics of indexing, retrieval, and generation. This video focuses on indexing, covering the process of document loading, splitting, and embedding. \n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_1_to_4.ipynb\n\nSlides:\nhttps://docs.google.com/presentation/d/1MhsCqZs7wTX6P19TFnA9qRSlxH3u-1-0gWkhBiDG9lQ/edit?usp=sharing",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/bjb_EMsTDKI/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/bjb_EMsTDKI/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/bjb_EMsTDKI/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/bjb_EMsTDKI/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/bjb_EMsTDKI/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 1,
"resourceId": {
"kind": "youtube#video",
"videoId": "bjb_EMsTDKI"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "UCSNsEg-YbO7tKu0pOB0nfymMXk",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC4wMTcyMDhGQUE4NTIzM0Y5",
"snippet": {
"publishedAt": "2024-02-06T01:15:48Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG From Scratch: Part 3 (Retrieval)",
"description": "This is the third video in our series on RAG. The aim of this series is to build up an understanding of RAG from scratch, starting with the basics of indexing, retrieval, and generation. This video focuses on retrieval, covering the process of document search using an index. \n\nCode: \nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_1_to_4.ipynb\n\nSlides:\nhttps://docs.google.com/presentation/d/124I8jlBRCbb0LAUhdmDwbn4nREqxSxZU1RF_eTGXUGc/edit?usp=sharing",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/LxNVgdIz9sU/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/LxNVgdIz9sU/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/LxNVgdIz9sU/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/LxNVgdIz9sU/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/LxNVgdIz9sU/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 2,
"resourceId": {
"kind": "youtube#video",
"videoId": "LxNVgdIz9sU"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "wBp5fk6L21eWY7ePeUApyZPOx8c",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC41MjE1MkI0OTQ2QzJGNzNG",
"snippet": {
"publishedAt": "2024-02-06T01:17:42Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG From Scratch: Part 4 (Generation)",
"description": "This is the fourth video in our series on RAG. The aim of this series is to build up an understanding of RAG from scratch, starting with the basics of indexing, retrieval, and generation. This video focuses on generation, covering the process of RAG prompt construction and passing the prompt to an LLM for answer generation. \n\nCode: \nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_1_to_4.ipynb\n\nSlides:\nhttps://docs.google.com/presentation/d/1eRJwzbdSv71e9Ou9yeqziZrz1UagwX8B1kL4TbL5_Gc/edit?usp=sharing",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/Vw52xyyFsB8/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/Vw52xyyFsB8/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/Vw52xyyFsB8/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/Vw52xyyFsB8/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/Vw52xyyFsB8/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 3,
"resourceId": {
"kind": "youtube#video",
"videoId": "Vw52xyyFsB8"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "5RW4Ks3n7MMbp5RIlLADA1XSXvs",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC4wOTA3OTZBNzVEMTUzOTMy",
"snippet": {
"publishedAt": "2024-02-13T23:40:10Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 5 (Query Translation -- Multi Query)",
"description": "Query rewriting is a popular strategy to improve retrieval. Multi-query is an approach that re-writes a question from multiple perspectives, performs retrieval on each re-written question, and takes the unique union of all docs. \n\nSlides:\nhttps://docs.google.com/presentation/d/15pWydIszbQG3Ipur9COfTduutTZm6ULdkkyX-MNry8I/edit?usp=sharing\n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/JChPi0CRnDY/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/JChPi0CRnDY/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/JChPi0CRnDY/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/JChPi0CRnDY/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/JChPi0CRnDY/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 4,
"resourceId": {
"kind": "youtube#video",
"videoId": "JChPi0CRnDY"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "ehNU7b7eFhg6O-rmTMDd3gw3ty8",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC4xMkVGQjNCMUM1N0RFNEUx",
"snippet": {
"publishedAt": "2024-02-13T23:41:29Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 6 (Query Translation -- RAG Fusion)",
"description": "Query rewriting is a popular strategy to improve retrieval. RAG-fusion is an approach that re-writes a question from multiple perspectives, performs retrieval on each re-written question, and performs reciprocal rank fusion on the results from each retrieval, giving a consolidated ranking. \n\nSlides:\nhttps://docs.google.com/presentation/d/1EwykmdVSQqlh6XpGt8APOMYp4q1CZqqeclAx61pUcjI/edit?usp=sharing\n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb\n\nReference:\nhttps://github.com/Raudaschl/rag-fusion",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/77qELPbNgxA/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/77qELPbNgxA/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/77qELPbNgxA/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/77qELPbNgxA/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/77qELPbNgxA/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 5,
"resourceId": {
"kind": "youtube#video",
"videoId": "77qELPbNgxA"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "mIH_4G2OvYFmmWR3gQwjd49ytPc",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC5GNjNDRDREMDQxOThCMDQ2",
"snippet": {
"publishedAt": "2024-02-19T04:14:40Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 7 (Query Translation -- Decomposition)",
"description": "Query decomposition is a strategy to improve question-answering by breaking down a question into sub-questions. These can either be (1) solved sequentially or (2) independently answered followed by consolidation into a final answer. \n\nSlides:\nhttps://docs.google.com/presentation/d/1O97KYrsmYEmhpQ6nkvOVAqQYMJvIaZulGFGmz4cuuVE/edit?usp=sharing\n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb\n\nReference:\nhttps://arxiv.org/abs/2205.10625\nhttps://arxiv.org/pdf/2212.10509",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/h0OPWlEOank/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/h0OPWlEOank/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/h0OPWlEOank/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/h0OPWlEOank/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/h0OPWlEOank/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 6,
"resourceId": {
"kind": "youtube#video",
"videoId": "h0OPWlEOank"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "U2y8lrlj2L9S7lNNiIR9lh_wv0I",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC5DQUNERDQ2NkIzRUQxNTY1",
"snippet": {
"publishedAt": "2024-02-13T23:42:59Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 8 (Query Translation -- Step Back)",
"description": "Step-back prompting is an approach to improve retrieval that builds on chain-of-thought reasoning. From a question, it generates a step-back (higher level, more abstract) question that can serve as a precondition to correctly answering the original question. This is especially useful in cases where background knowledge or more fundamental understanding is helpful to answer a specific question.\n\nSlides:\nhttps://docs.google.com/presentation/d/1L0MRGVDxYA1eLOR0L_6Ze1l2YV8AhN1QKUtmNA-fJlU/edit?usp=sharing\n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb\n\nReference:\nhttps://arxiv.org/pdf/2310.06117.pdf",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/xn1jEjRyJ2U/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/xn1jEjRyJ2U/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/xn1jEjRyJ2U/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/xn1jEjRyJ2U/sddefault.jpg",
"width": 640,
"height": 480
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 7,
"resourceId": {
"kind": "youtube#video",
"videoId": "xn1jEjRyJ2U"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "ljjM0Iz56UCve8ACIj1ubDrbLE0",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC45NDk1REZENzhEMzU5MDQz",
"snippet": {
"publishedAt": "2024-02-13T23:43:42Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 9 (Query Translation -- HyDE)",
"description": "HyDE (Hypothetical Document Embeddings) is an approach to improve retrieval that generates hypothetical documents that could be used to answer the user input question. These documents, drawn from the LLMs knowledge, are embedded and used to retrieve documents from an index. The idea is that hypothetical documents may be better aligned with the indexes documents than the raw user question.\n\nSlides:\nhttps://docs.google.com/presentation/d/10MmB_QEiS4m00xdyu-92muY-8jC3CdaMpMXbXjzQXsM/edit?usp=sharing\n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb\n\nReference:\nhttps://arxiv.org/pdf/2212.10496.pdf",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/SaDzIVkYqyY/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/SaDzIVkYqyY/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/SaDzIVkYqyY/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/SaDzIVkYqyY/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/SaDzIVkYqyY/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 8,
"resourceId": {
"kind": "youtube#video",
"videoId": "SaDzIVkYqyY"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "6bm3blrctNSJ3grKHlXYzIuPBVs",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC40NzZCMERDMjVEN0RFRThB",
"snippet": {
"publishedAt": "2024-03-18T00:16:24Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "Private video",
"description": "This video is private.",
"thumbnails": {},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 9,
"resourceId": {
"kind": "youtube#video",
"videoId": "S9njv889Q-4"
}
}
},
{
"kind": "youtube#playlistItem",
"etag": "wy8EzPaxjcQck7u3wkT7pcgCLdQ",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC5EMEEwRUY5M0RDRTU3NDJC",
"snippet": {
"publishedAt": "2024-03-18T03:17:19Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 10 (Routing)",
"description": "This is the 10th video in our RAG From Scratch series, focused on different types of query routing (logical and semantic).\n\nNotebook:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_10_and_11.ipynb\n\nSlides:\nhttps://docs.google.com/presentation/d/1kC6jFj8C_1ZXDYcFaJ8vhJvCYEwxwsVqk2VVeKKuyx4/edit?usp=sharing",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/pfpIndq7Fi8/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/pfpIndq7Fi8/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/pfpIndq7Fi8/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/pfpIndq7Fi8/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/pfpIndq7Fi8/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 10,
"resourceId": {
"kind": "youtube#video",
"videoId": "pfpIndq7Fi8"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "iU2FyfOFyCmCBnFtPqDPJ0mMjvw",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC45ODRDNTg0QjA4NkFBNkQy",
"snippet": {
"publishedAt": "2024-03-19T02:56:43Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "Private video",
"description": "This video is private.",
"thumbnails": {},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 11,
"resourceId": {
"kind": "youtube#video",
"videoId": "ktRb17mAwYc"
}
}
},
{
"kind": "youtube#playlistItem",
"etag": "FaiIL3YcX988EXolSM4CI4_QB6I",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC4zMDg5MkQ5MEVDMEM1NTg2",
"snippet": {
"publishedAt": "2024-03-26T04:38:06Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "Private video",
"description": "This video is private.",
"thumbnails": {},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 12,
"resourceId": {
"kind": "youtube#video",
"videoId": "Ecp6uRKr9PI"
}
}
},
{
"kind": "youtube#playlistItem",
"etag": "LaUdFF4ObqU0XeKwSpR6eN3FRuY",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC41Mzk2QTAxMTkzNDk4MDhF",
"snippet": {
"publishedAt": "2024-03-27T03:20:04Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 11 (Query Structuring)",
"description": "Our RAG From Scratch video series walks through impt RAG concepts in short / focused videos w/ code. \n\nProblem: We interact w/ databases using domain-specific languages (e.g., SQL, Cypher for Relational and Graph DBs). And, many vectorstores have metadata that can allow for structured queries to filter chunks. But RAG systems ingest questions in natural language.\n\nIdea: A great deal of work has focused on query structuring, the process of text-to-DSL where DSL is a domain specific language required to interact with a given database. This converts user questions into structured queries. Below are links that dive into text-to-SQL/Cypher, and the below video overviews query structuring for vectorstores using function calling.\n\nCode: \nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_10_and_11.ipynb\n\nReferences:\n1/ Blog with links to various tutorials and templates:\nhttps://blog.langchain.dev/query-construction/\n2/ Deep dive on graphDBs (c/o @neo4j):\nhttps://blog.langchain.dev/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/\n3/ Query structuring docs:\nhttps://python.langchain.com/docs/use_cases/query_analysis/techniques/structuring\n4/ Self-query retriever docs:\nhttps://python.langchain.com/docs/modules/data_connection/retrievers/self_query",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/kl6NwWYxvbM/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/kl6NwWYxvbM/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/kl6NwWYxvbM/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/kl6NwWYxvbM/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/kl6NwWYxvbM/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 13,
"resourceId": {
"kind": "youtube#video",
"videoId": "kl6NwWYxvbM"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "GuxTLwalXLBGjHm1coNdkNdz0iI",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC5EQUE1NTFDRjcwMDg0NEMz",
"snippet": {
"publishedAt": "2024-03-27T23:11:40Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG from scratch: Part 12 (Multi-Representation Indexing)",
"description": "Our RAG From Scratch video series walks through impt RAG concepts in short / focused videos w/ code. This is the 12th video in our series and focuses on some useful tricks for indexing full documents.\n\nProblem: Many RAG approaches focus on splitting documents into chunks and returning some number upon retrieval for the LLM. But chunk size and chunk number can be brittle parameters that many user find difficult to set; both can significantly affect results if they do not contain all context to answer a question.\n\nIdea: Proposition indexing (@tomchen0 et al) is a nice paper that uses an LLM to produce document summaries (\"propositions\") that are optimized for retrieval. We've built on this with two retrievers: (1) multi-vector retriever embeds summaries, but returns full documents to the LLM. (2) parent-doc retriever embeds chunks but returns full documents to the LLM. Idea is to get best of both worlds: use smaller / concise representations (summaries or chunks) to retrieve, but link them to full documents / context for generation.\n\nThe approach is very general, and can be applied to tables or images: in both cases, index a summary but return the raw table or image for reasoning. This gets around challenges w/ directly embedding tables or images (multi-modal embeddings), using a summary as a representation for text-based similarity search.\n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_12_to_14.ipynb\n\nReferences:\n1/ Proposition indexing: https://arxiv.org/pdf/2312.06648.pdf\n2/ Multi-vector:\nhttps://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector\n3/ Parent-document:\nhttps://python.langchain.com/docs/modules/data_connection/retrievers/parent_document_retriever\n4/ Blog applying this to tables:\nhttps://blog.langchain.dev/semi-structured-multi-modal-rag/\n5/ Blog applying this to images w/ eval:\nhttps://blog.langchain.dev/multi-modal-rag-template/",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/gTCU9I6QqCE/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/gTCU9I6QqCE/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/gTCU9I6QqCE/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/gTCU9I6QqCE/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/gTCU9I6QqCE/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 14,
"resourceId": {
"kind": "youtube#video",
"videoId": "gTCU9I6QqCE"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "W1eg_-mHu-3_NdzgKSmv4xItQBI",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC41QTY1Q0UxMTVCODczNThE",
"snippet": {
"publishedAt": "2024-03-28T23:57:42Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG From Scratch: Part 13 (RAPTOR)",
"description": "Our RAG From Scratch video series walks through impt RAG concepts in short / focused videos w/ code. \n \nProblem: \nRAG systems need to handle \"lower-level\" questions that reference specific facts found in a single document or \"higher-level\" questions that distill ideas that span many documents. Handling both types of questions can be a challenge with typical kNN retrieval where only a finite number of doc chunks are retrieved.\n\nIdea: \nRAPTOR (@parthsarthi03 et al) is a paper that addresses this by creating document summaries that capture higher-level concepts. It embeds and clusters documents, and then summarizes each cluster. It does this recursively, producing a tree of summaries with increasingly high-level concepts. The summaries and starting docs are indexed together, giving coverage across user questions. \n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_12_to_14.ipynb\nhttps://github.com/langchain-ai/langchain/blob/master/cookbook/RAPTOR.ipynb\n\nReferences:\n1/ Paper: https://arxiv.org/pdf/2401.18059.pdf\n2/ Longer deep dive: https://www.youtube.com/watch?v=jbGchdTL7d0",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/z_6EeA2LDSw/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/z_6EeA2LDSw/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/z_6EeA2LDSw/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/z_6EeA2LDSw/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/z_6EeA2LDSw/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 15,
"resourceId": {
"kind": "youtube#video",
"videoId": "z_6EeA2LDSw"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
},
{
"kind": "youtube#playlistItem",
"etag": "UMgauGMmYJbHeIOaoBOjZFabMbM",
"id": "UExmYUlERkVYdWFlMkxYYk8xX1BLeVZKaVEyM1p6dEEweC4yMUQyQTQzMjRDNzMyQTMy",
"snippet": {
"publishedAt": "2024-03-29T00:07:44Z",
"channelId": "UCC-lyoTfSrcJzA1ab3APAgw",
"title": "RAG From Scratch: Part 14 (ColBERT)",
"description": "Our RAG From Scratch video series walks through impt RAG concepts in short / focused videos w/ code. This is the 14th video in our series and focuses on indexing with ColBERT for fine-grained similarity search.\n\nProblem: Embedding models compress text into fixed-length (vector) representations that capture the semantic content of the document. This compression is very useful for efficient search / retrieval, but puts a heavy burden on that single vector representation to capture all the semantic nuance / detail of the doc. In some cases, irrelevant / redundant content can dilute the semantic usefulness of the embedding.\n\nIdea: ColBERT (@lateinteraction & @matei_zaharia) is a neat approach to address this with higher granularity embeddings: (1) produce a contextually influenced embedding for each token in the document and query. (2) score similarity between each query token and all document tokens. (3) take the max. (4) do this for all query tokens. (5) take the sum of the max scores (in step 3) for all query tokens to get the similarity score. \nThis results in a much more granular token-wise similarity assessment between document and query, and has shown strong performance. \n\nCode:\nhttps://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_12_to_14.ipynb\n\nReferences:\n1/ Paper:\nhttps://arxiv.org/abs/2004.12832\n\n2/ Nice review from @DataStax: \nhttps://hackernoon.com/how-colbert-helps-developers-overcome-the-limits-of-rag\n\n3/ Nice post from @simonw:\nhttps://til.simonwillison.net/llms/colbert-ragatouille\n\n4/ColBERT repo:\nhttps://github.com/stanford-futuredata/ColBERT\n\n5/ RAGatouille to support RAG w/ ColBERT:\nhttps://github.com/bclavie/RAGatouille",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/cN6S0Ehm7_8/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/cN6S0Ehm7_8/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/cN6S0Ehm7_8/hqdefault.jpg",
"width": 480,
"height": 360
},
"standard": {
"url": "https://i.ytimg.com/vi/cN6S0Ehm7_8/sddefault.jpg",
"width": 640,
"height": 480
},
"maxres": {
"url": "https://i.ytimg.com/vi/cN6S0Ehm7_8/maxresdefault.jpg",
"width": 1280,
"height": 720
}
},
"channelTitle": "LangChain",
"playlistId": "PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x",
"position": 16,
"resourceId": {
"kind": "youtube#video",
"videoId": "cN6S0Ehm7_8"
},
"videoOwnerChannelTitle": "LangChain",
"videoOwnerChannelId": "UCC-lyoTfSrcJzA1ab3APAgw"
}
}
],
"pageInfo": {
"totalResults": 17,
"resultsPerPage": 20
}
}
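
The Part 6 (RAG Fusion) description mentions reciprocal rank fusion without showing it. Below is a minimal sketch of that consolidation step, not the code from the linked notebook. The function name `reciprocal_rank_fusion`, the document IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + position + 1)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retrieval results for three rewrites of the same question.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Documents that rank well across several rewritten queries rise to the top, even if no single query ranked them first.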
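
The Part 14 (ColBERT) description lists the scoring steps in prose: for each query token take the maximum similarity over all document tokens, then sum those maxima. The sketch below follows those steps in plain Python. It is illustrative only: the toy 2-dimensional vectors and the name `maxsim_score` are assumptions, and a real ColBERT model would supply contextual token embeddings rather than hand-written ones.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity to any document token."""
    total = 0.0
    for q in query_vecs:
        # Steps 2-3: score q against every document token, keep the max.
        total += max(dot(q, d) for d in doc_vecs)
    # Step 5: the sum of per-token maxima is the document score.
    return total


# Toy token embeddings (assumed values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5], [1.0, 0.0]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Here `doc_a` scores higher because each query token has a close match among its tokens, while `doc_b` only matches the first query token well.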