RAG Pipeline
Full retrieval-augmented generation pipeline with document chunking, embedding generation, vector storage, and context-aware retrieval. Supports PDF, Markdown, and HTML sources.
Python · AI Workflows · Built with OpenClaw
3.8k Stars · 14.2k Installs · 4 Deps · 2 Comments
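The context-aware retrieval step described above can be illustrated without any external services. The following is a minimal, dependency-free sketch, not this package's implementation. A hypothetical `embed` function that hashes words into a fixed-size bag-of-words vector stands in for a real embedding model, and a hypothetical `InMemoryStore` class backed by a plain list stands in for the vector store. Retrieval ranks the stored chunks by cosine similarity to the query vector and returns the `k` best matches.

```python
import math
import re
import zlib

DIM = 1024  # length of the toy embedding vectors (arbitrary choice)


def embed(text: str) -> list[float]:
    """Hypothetical toy embedding: hashed bag-of-words counts.

    A real pipeline calls an embedding model here instead.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's randomized str hash()
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector store such as pgvector."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, chunks: list[str]) -> None:
        # Embed each chunk once at ingest time and keep the vector beside its text
        self.items.extend((chunk, embed(chunk)) for chunk in chunks)

    def similarity_search(self, query: str, k: int = 5) -> list[str]:
        # Score every stored chunk against the query and keep the k best
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = InMemoryStore()
store.add([
    "pgvector stores embeddings inside PostgreSQL.",
    "Chunk overlap keeps sentences intact across chunk boundaries.",
    "Markdown headings can guide document splitting.",
])
print(store.similarity_search("how does chunk overlap work", k=1))
```

This brute-force scan costs O(n) per query. pgvector can instead build approximate nearest-neighbour indexes (HNSW or IVFFlat) when a collection grows large.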
Install
pip install freestack-rag-pipeline
Code Preview
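from langchain_core.language_models import BaseChatModel
from langchain_core.vectorstores import VectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter


class RAGPipeline:
    def __init__(self, store: VectorStore, llm: BaseChatModel,
                 chunk_size: int = 1000, chunk_overlap: int = 200):
        # Splits text into chunks of up to chunk_size characters,
        # with up to chunk_overlap characters shared between neighbouring chunks
        self.splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size, chunk_overlap=chunk_overlap
        )
        # The store embeds text with its own embedding model,
        # e.g. a pgvector-backed store configured with OpenAIEmbeddings()
        self.store = store
        self.llm = llm

    def ingest(self, documents: list[str]) -> int:
        # create_documents wraps raw strings in Document objects before splitting
        chunks = self.splitter.create_documents(documents)
        # add_documents embeds each chunk and writes it to the store
        self.store.add_documents(chunks)
        return len(chunks)

    def query(self, question: str, top_k: int = 5) -> str:
        # Retrieve the top_k chunks most similar to the question
        hits = self.store.similarity_search(question, k=top_k)
        context = "\n\n".join(doc.page_content for doc in hits)
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."
        return self.llm.invoke(prompt).content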
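The `chunk_size` and `chunk_overlap` parameters can be understood through a plain sliding window. There, consecutive chunks start `chunk_size - chunk_overlap` characters apart, so with the defaults of 1000 and 200 each chunk begins 800 characters after the previous one and repeats its last 200 characters. The sketch below shows only that arithmetic on raw characters. `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraph breaks, newlines and spaces, so its actual boundaries will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between chunk start offsets
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk. The trade-off is that roughly `chunk_overlap / chunk_size` of the stored text is duplicated.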
Comments
mlops · 4 days ago
Replaced our custom RAG setup with this. Chunk overlap handling is much better.
llamauser · 1 week ago
Works great with local models too. Swapped OpenAI embeddings for sentence-transformers.
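A swap like the sentence-transformers one described in the comment above can be sketched as follows. The sketch assumes the pipeline only needs an embedding object with the two methods that LangChain's `Embeddings` interface defines, `embed_documents(texts)` and `embed_query(text)`. The `SentenceTransformerEmbeddings` adapter is hypothetical and not part of this package. It wraps a sentence-transformers model's `encode` method and imports the library only when no model is passed in. The `all-MiniLM-L6-v2` checkpoint is used purely as an example, and the `FakeModel` class is a test stand-in.

```python
from typing import Any, Optional


class SentenceTransformerEmbeddings:
    """Hypothetical adapter exposing embed_documents/embed_query over sentence-transformers."""

    def __init__(self, model: Optional[Any] = None,
                 model_name: str = "all-MiniLM-L6-v2") -> None:
        if model is None:
            # Imported lazily so the adapter can be exercised with a stand-in model
            from sentence_transformers import SentenceTransformer
            model = SentenceTransformer(model_name)
        self.model = model

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # encode() returns one vector per input text; convert each to a plain float list
        return [list(map(float, row)) for row in self.model.encode(texts)]

    def embed_query(self, text: str) -> list[float]:
        return self.embed_documents([text])[0]


class FakeModel:
    """Stand-in for SentenceTransformer: returns [len(text), 1.0] for each input."""

    def encode(self, texts):
        return [[float(len(t)), 1.0] for t in texts]


emb = SentenceTransformerEmbeddings(model=FakeModel())
print(emb.embed_query("hello"))  # [5.0, 1.0]
```

LangChain also ships a ready-made `HuggingFaceEmbeddings` class (in `langchain_huggingface`) for this job. To pass a custom adapter like this one to a LangChain vector store, it would normally subclass `langchain_core.embeddings.Embeddings`.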
Related Modules
Agent Loop
Autonomous agent execution loop with tool calling, memory management, and graceful error recovery. Supports OpenAI and Anthropic function calling formats.
2.9k · TypeScript
AI Chat Widget
Drop-in AI chat interface with streaming responses, message history, typing indicators, and markdown rendering. Connects to any OpenAI-compatible API.
2.1k · React
Embedding Pipeline
Batch document embedding pipeline with pgvector storage, incremental updates, and deduplication. Handles PDFs, web pages, and plain text with automatic chunking.
1.6k · Python