Retrieval Augmented Generation (RAG) is the de facto technique for giving LLMs the ability to interact with any document or dataset, regardless of its size. Follow along as I cover how to parse and manipulate documents, explore how embeddings are used to describe abstract concepts, implement a simple yet powerful way to surface the parts of a document most relevant to a given query, and ultimately build a script you can use to have a locally-hosted LLM engage with your own documents.
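The parsing step described above (parse_file() at 02:37) can be sketched in a few lines. This is a minimal illustration, assuming paragraph-based chunking on blank lines; the helper `chunk_text` is my name, not necessarily the video's exact code.

```python
# Sketch of the document-parsing step: split a text file into
# paragraph chunks that can each be embedded separately.
# Assumption: chunks are separated by blank lines.

def chunk_text(text):
    # One chunk per paragraph; strip whitespace and drop empty chunks.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def parse_file(filename):
    # Read the whole file and delegate to chunk_text().
    with open(filename, encoding="utf-8") as f:
        return chunk_text(f.read())
```

Each returned chunk is then a candidate unit for embedding and retrieval.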
Check out my other Ollama videos: • Get Started with Ollama
Links:
Code from video - decoder.sh/videos/rag-from-th...
Ollama Python library - github.com/ollama/ollama-python
Project Gutenberg - www.gutenberg.org
Nomic Embedding model (on ollama) - ollama.com/library/nomic-embe...
BGE Embedding model - huggingface.co/CompendiumLabs...
How to use a model from HF with Ollama - • Importing Open Source ...
Cosine Similarity - blog.gopenai.com/rag-for-ever...
Timestamps:
00:00 - Intro
00:26 - Environment Setup
00:49 - Function review
01:50 - Source Document
02:18 - Starting the project
02:37 - parse_file()
04:35 - Understanding embeddings
05:40 - Implementing embeddings
07:01 - Timing embedding
07:35 - Caching embeddings
10:06 - Prompt embedding
10:19 - Cosine similarity for embedding comparison
12:16 - Brainstorming improvements
13:15 - Giving context to our LLM
14:29 - CLI input
14:49 - Next steps
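The comparison step at 10:19 scores the prompt's embedding against every stored chunk embedding with cosine similarity and keeps the best matches. A pure-Python sketch (the helper `most_similar` and its `top_n` parameter are illustrative names, not taken from the video):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 for identical directions, 0.0 for orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(needle, haystack, top_n=3):
    # Score every chunk embedding against the prompt embedding
    # and return (score, chunk_index) pairs, best first.
    scores = [(cosine_similarity(needle, e), i) for i, e in enumerate(haystack)]
    return sorted(scores, reverse=True)[:top_n]
```

The top-scoring chunks are then pasted into the LLM's prompt as context (the 13:15 step), so the model answers from the document rather than from memory alone.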