v0.2.0 is now live on PyPI

Faster, Smarter RAG for the Modern AI Stack

Quira optimizes Retrieval-Augmented Generation (RAG) with Speculative Retrieval and Context Tetris, delivering fast context packing and efficient use of your token budget.

Re-thinking Retrieval

We ripped out the slow parts of standard RAG and replaced them with three optimized techniques.

Speculative Retrieval

Fetches vectors in the background while the user is still typing, so results are typically ready by the time the query is submitted and retrieval latency is hidden from the user.
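To make the idea concrete, here is a minimal sketch of speculative prefetching with `asyncio`. It is not Quira's implementation: `SpeculativeRetriever`, `on_keystroke`, `on_submit`, and `fake_search` are hypothetical names invented for this illustration, and the search function stands in for a real vector store query.

```python
import asyncio


class SpeculativeRetriever:
    """Hypothetical helper (not Quira's API): prefetch results for partial queries."""

    def __init__(self, search):
        self._search = search  # async callable: query -> list of results
        self._tasks = {}       # partial query -> background task

    def on_keystroke(self, partial_query: str) -> None:
        # Start a background search for this partial query unless one already exists.
        # Must be called while an event loop is running.
        if partial_query and partial_query not in self._tasks:
            self._tasks[partial_query] = asyncio.create_task(self._search(partial_query))

    async def on_submit(self, query: str):
        # Reuse the speculative task if the final query was already prefetched.
        task = self._tasks.pop(query, None)
        # Drop every other speculative task; cancelling a finished task is a no-op.
        for stale in self._tasks.values():
            stale.cancel()
        self._tasks.clear()
        if task is not None:
            return await task
        return await self._search(query)


async def fake_search(query: str):
    # Stand-in for a vector store lookup.
    await asyncio.sleep(0.01)
    return [f"chunk for {query!r}"]


async def demo():
    calls = []

    async def counting_search(query):
        calls.append(query)
        return await fake_search(query)

    retriever = SpeculativeRetriever(counting_search)
    for partial in ("what is", "what is rag"):
        retriever.on_keystroke(partial)
    await asyncio.sleep(0.05)  # the user pauses; prefetches finish in the background
    result = await retriever.on_submit("what is rag")
    return result, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the submitted query matches a prefetched one, so `on_submit` returns without issuing a third search. A real system would also need debouncing and a policy for near-miss queries, which are left out here.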

Context Tetris

Packs context intelligently by scoring relevance, recency, density, and uniqueness to maximize your token budget.
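One plausible way to combine the four signals is a weighted score followed by greedy packing by score per token. The sketch below assumes each signal is already normalized to the range 0 to 1; the `Chunk` fields, the `WEIGHTS` values, and the greedy strategy are illustrative assumptions, not Quira's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    tokens: int        # token count, assumed to be greater than zero
    relevance: float   # all four signals are assumed to be normalized to [0, 1]
    recency: float
    density: float
    uniqueness: float


# Illustrative weights only; not taken from Quira.
WEIGHTS = {"relevance": 0.5, "recency": 0.2, "density": 0.15, "uniqueness": 0.15}


def score(chunk: Chunk) -> float:
    # Weighted sum of the four signals.
    return sum(weight * getattr(chunk, name) for name, weight in WEIGHTS.items())


def pack(chunks: list[Chunk], budget: int) -> list[Chunk]:
    # Greedy packing: take chunks in order of score per token while they still fit.
    ranked = sorted(chunks, key=lambda c: score(c) / c.tokens, reverse=True)
    packed, used = [], 0
    for chunk in ranked:
        if used + chunk.tokens <= budget:
            packed.append(chunk)
            used += chunk.tokens
    return packed


if __name__ == "__main__":
    candidates = [
        Chunk("a", 300, relevance=0.9, recency=0.5, density=0.6, uniqueness=0.8),
        Chunk("b", 100, relevance=0.7, recency=0.9, density=0.8, uniqueness=0.9),
        Chunk("c", 250, relevance=0.4, recency=0.2, density=0.5, uniqueness=0.3),
    ]
    print([c.text for c in pack(candidates, budget=400)])
```

Greedy packing is a heuristic for what is essentially a knapsack problem: it is fast and usually good enough, but it does not guarantee the optimal selection.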

Differential Context

Maintains conversational state and only retrieves new "delta" chunks, significantly reducing redundant database hits.
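The delta idea can be sketched as a per-conversation cache keyed by chunk ID, where only IDs not already held trigger a fetch. `DifferentialContext`, `update`, and the `fetch_chunks` callable are hypothetical names for this illustration and are not part of Quira's documented API.

```python
class DifferentialContext:
    """Hypothetical sketch: fetch only the chunks a conversation does not hold yet."""

    def __init__(self, fetch_chunks):
        self._fetch = fetch_chunks  # callable: list of chunk ids -> dict of id -> text
        self.context = {}           # chunk id -> text already held for this conversation

    def update(self, candidate_ids):
        # The delta is every candidate that is not already in the conversation context.
        missing = [cid for cid in candidate_ids if cid not in self.context]
        if missing:
            # One database call, limited to the missing chunks.
            self.context.update(self._fetch(missing))
        return missing


if __name__ == "__main__":
    store = {"c1": "intro", "c2": "details", "c3": "follow-up"}
    fetch_log = []

    def fetch_chunks(ids):
        fetch_log.append(list(ids))
        return {cid: store[cid] for cid in ids}

    session = DifferentialContext(fetch_chunks)
    print(session.update(["c1", "c2"]))  # first turn: both chunks are new
    print(session.update(["c2", "c3"]))  # second turn: only c3 is fetched
    print(fetch_log)
```

A production version would also need eviction once the context outgrows the token budget; that is omitted to keep the sketch short.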

Provider Abstraction Layer

Write once, deploy anywhere.

Switch between vector stores, cache backends, and LLM providers by changing a single string. Quira supports Qdrant, Pinecone, Redis, OpenAI, Groq, and more out of the box.

Qdrant, Pinecone, Chroma, Weaviate, Redis, OpenAI, Anthropic, Groq, Ollama
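String-based provider switching is commonly built on a registry that maps names to classes. The sketch below shows that pattern under stated assumptions: `register_vector_store`, `make_vector_store`, `parse_llm_spec`, and the placeholder store classes are hypothetical and do not reflect Quira's internals or any real client library.

```python
_VECTOR_STORES = {}  # provider name -> class


def register_vector_store(name):
    # Class decorator that records a provider under a string key.
    def decorator(cls):
        _VECTOR_STORES[name] = cls
        return cls
    return decorator


@register_vector_store("qdrant")
class QdrantStore:
    """Placeholder; a real adapter would wrap the Qdrant client."""


@register_vector_store("pinecone")
class PineconeStore:
    """Placeholder; a real adapter would wrap the Pinecone client."""


def make_vector_store(name: str):
    # Look up the provider by name and instantiate it.
    try:
        return _VECTOR_STORES[name]()
    except KeyError:
        known = ", ".join(sorted(_VECTOR_STORES))
        raise ValueError(f"unknown vector store {name!r}; choose from: {known}") from None


def parse_llm_spec(spec: str):
    # Split a "provider/model" string such as "openai/gpt-4o" into its two parts.
    provider, sep, model = spec.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    return provider, model


if __name__ == "__main__":
    print(type(make_vector_store("qdrant")).__name__)
    print(parse_llm_spec("openai/gpt-4o"))
```

With this design, supporting a new backend means registering one more class, and calling code never needs to change.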

Zero-Friction Setup

Quira is designed to be ridiculously easy to integrate. Install the package, define your providers, and you have a production-ready RAG pipeline.

pip install "quira[all]"

Supports LangChain and LlamaIndex natively via `QuiraRetriever` and `QuiraQueryEngine`.

main.py

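# 1. Install via pip (run this in your shell, not in main.py):
#    pip install "quira[all]"

# 2. Import Quira (import line reconstructed; assumes the top-level `quira` module exports these names)
from quira import quiraPipeline, UserSession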

# Drop-in provider abstraction
pipeline = quiraPipeline(
    vector_store="qdrant",
    cache="redis",
    llm="openai/gpt-4o",
)

# 100% LangChain compatible (QuiraRetriever must be imported as well; its module path is not shown here)
retriever = QuiraRetriever(pipeline=pipeline)
docs = retriever.invoke("What is Context Tetris?")

# 3. Process a query (handles Tetris + Generation internally)
session = UserSession("user_123")
answer = pipeline.process_submission_sync(session, "What is quantum mechanics?")

print(answer)