FreeStack

RAG Pipeline

Full retrieval-augmented generation pipeline with document chunking, embedding generation, vector storage, and context-aware retrieval. Supports PDF, Markdown, and HTML sources.

Python · AI Workflows · Built with OpenClaw
3.8k Stars · 14.2k Installs · 4 Deps · 2 Comments

Install

pip install freestack-rag-pipeline

Code Preview

index.tsx
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from pgvector.sqlalchemy import Vector

class RAGPipeline:
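    def __init__(self, store, llm, chunk_size=1000, chunk_overlap=200):
        # store and llm are supplied by the caller: store must provide add() and
        # similarity_search(), llm must provide generate(), as used below.
        self.store = store
        self.llm = llm
        self.splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size, chunk_overlap=chunk_overlap
        )
        self.embeddings = OpenAIEmbeddings()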

    def ingest(self, documents: list[str]) -> int:
        # create_documents() takes raw strings; split_documents() expects Document objects.
        chunks = self.splitter.create_documents(documents)
        vectors = self.embeddings.embed_documents([c.page_content for c in chunks])
        self.store.add(chunks, vectors)
        return len(chunks)

    def query(self, question: str, top_k: int = 5) -> str:
        context = self.store.similarity_search(question, k=top_k)
        return self.llm.generate(question, context)
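
The preview calls store.add(), store.similarity_search() and llm.generate() without showing what sits behind them. As a rough sketch of the retrieval step only, the block below uses a hypothetical in-memory store and a stub LLM that satisfy those three calls. InMemoryStore, EchoLLM and toy_embed are illustrative names that are not part of the package, the bag-of-words "embedding" is a stand-in for a real model, and chunks are plain strings here rather than Document objects.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical store exposing the two methods the preview calls."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []  # (chunk, vector) pairs

    def add(self, chunks, vectors):
        self.rows.extend(zip(chunks, vectors))

    def similarity_search(self, question, k=5):
        # Embed the question, then return the k most similar stored chunks.
        q = self.embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


class EchoLLM:
    """Hypothetical LLM stub: returns the prompt instead of calling a model."""

    def generate(self, question, context):
        joined = "\n".join(context)
        return f"Context:\n{joined}\n\nQuestion: {question}"


store = InMemoryStore()
chunks = [
    "Embeddings map text to vectors.",
    "Chunk overlap repeats text across neighbouring chunks.",
    "Retrieval ranks stored chunks by similarity to the question.",
]
store.add(chunks, [toy_embed(c) for c in chunks])

top = store.similarity_search("What does chunk overlap do?", k=1)
print(EchoLLM().generate("What does chunk overlap do?", top))
```

A real deployment would replace toy_embed with the same embedding model used at ingest time, so that question and chunk vectors live in the same space.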

mlops, 4 days ago

Replaced our custom RAG setup with this. Chunk overlap handling is much better.
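
For readers unfamiliar with the chunk_size and chunk_overlap parameters, here is a simplified illustration of what overlap means. It uses fixed-size character windows, which is not how RecursiveCharacterTextSplitter works internally (that splitter prefers natural separators such as paragraph breaks); sliding_chunks is a hypothetical helper written only for this sketch.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so it repeats the tail of its predecessor.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


demo = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The last three characters of each chunk reappear at the start of the next, so a sentence cut at a boundary is still seen whole in at least one chunk.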

llamauser, 1 week ago

Works great with local models too. Swapped OpenAI embeddings for sentence-transformers.
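
One possible way to do the swap described above, as a sketch only: the commenter's actual approach is not shown. The pipeline in the preview calls self.embeddings.embed_documents(), so any object with that method can be assigned in its place. SentenceTransformerEmbeddings is a hypothetical adapter name, "all-MiniLM-L6-v2" is an assumed model choice, and FakeModel exists only so the example runs without downloading a model.

```python
class SentenceTransformerEmbeddings:
    """Hypothetical adapter: embed_documents/embed_query over a model with .encode()."""

    def __init__(self, model=None, model_name: str = "all-MiniLM-L6-v2"):
        if model is None:
            # Imported lazily so the adapter can be exercised with a fake model.
            from sentence_transformers import SentenceTransformer
            model = SentenceTransformer(model_name)
        self.model = model

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # encode() returns one vector per input; convert to plain Python floats.
        return [[float(x) for x in vec] for vec in self.model.encode(list(texts))]

    def embed_query(self, text: str) -> list[float]:
        return self.embed_documents([text])[0]


class FakeModel:
    """Stand-in for a real model: 2-d vector of (length, number of spaces)."""

    def encode(self, sentences):
        return [[float(len(s)), float(s.count(" "))] for s in sentences]


emb = SentenceTransformerEmbeddings(model=FakeModel())
print(emb.embed_documents(["hello world", "rag"]))
# [[11.0, 1.0], [3.0, 0.0]]
```

With a RAGPipeline instance named pipeline, the swap would then be pipeline.embeddings = SentenceTransformerEmbeddings(), assuming the vector store is configured for the new model's embedding dimension.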