Open source (MIT) · Managed cloud · Cloudflare edge

Shared context for AI agents

Push large payloads to the edge. Agents on different LLMs pull them by URL. Pub/sub channels coordinate them in ~20 ms.

Open-source (MIT) or managed cloud — same SDK either way.

// without ContextRelay
Agent A → [paste 50 KB into prompt] → Agent B   ≈ 12,500 tokens burned
// with ContextRelay
Agent A → push() → "https://.../pull/uuid" → Agent B → pull(url)
                   ~80 chars · ~0 tokens · 75 ms

Two ways to run it

Same SDK, same protocol, same code. Pick the deployment that fits.

Self-host (open source)

Clone the repo, deploy the Worker to your own Cloudflare account. You own the data, the keys, the operational surface — and pay only for Cloudflare usage. The free tier covers 100K requests/day.

  • MIT license, no vendor lock-in
  • Runs on your Cloudflare account
  • No API keys, no quota
  • You handle ops
$ git clone https://github.com/cmhashim/ContextRelay
$ cd ContextRelay/api && wrangler deploy
Self-host guide on GitHub →

Managed cloud

We run the Worker for you on the global edge. You get an API key, a dashboard, metered usage, quotas, and zero ops. Same SDK as the open-source path — just point it at the managed URL.

  • Free tier: 1K pushes / 10K pulls per month
  • Dashboard for keys and usage
  • Quota enforcement and billing
  • We handle ops
$ pip install contextrelay
$ export CONTEXTRELAY_API_KEY=cr_live_...
Get a free API key →

Why now

Multi-agent stacks are exiting the demo phase. Frameworks like CrewAI, LangGraph, and AutoGen ship production fleets. Devs wire Claude, GPT, Mistral, and Gemini into the same pipeline. The bottleneck used to be reasoning — it's now moving bytes between agents.

Every framework reinvents this layer with custom Redis schemas, scratch files, or 50 KB-per-prompt token taxes. ContextRelay is the layer underneath: ephemeral storage, pub/sub channels, optional E2EE — provider-agnostic and open-source.

How it works

Three lines replace a token tax. Works across any agent, any LLM provider, any runtime.

01
Push
Agent A uploads the payload to the Cloudflare edge. Gets back a short URL pointer. Takes ~250 ms for 125 KB.
02
Relay the URL
The orchestrator passes the URL to Agent B — not the data. The URL is ~80 characters; the payload stays at the edge.
03
Pull on demand
Agent B fetches the full payload directly from the URL. ~75 ms for 125 KB. Works from any runtime — Python, Node, curl.
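
In SDK terms, the three steps above are a handful of lines. A minimal sketch using the Python client from the live demo further down; that push() returns the pointer URL follows from step 01, while pushing without a channel is an assumption (the demo always sets channel=):

# minimal round-trip sketch -- push() returning the pointer URL is per step 01;
# a channel-less push is an assumption
import os
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

# Agent A: upload once, keep only the ~80-char pointer
url = relay.push("...50 KB of research notes...")

# relay the URL string to Agent B -- not the payload

# Agent B: fetch the full payload on demand, from any runtime
notes = relay.pull(url)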

Live demo

Claude plans. Mistral builds. Channels coordinate them.

Two agents, one channel namespace, no human in the loop. Claude pushes an architecture to task-assigned; Mistral's subscriber fires in ~20 ms, implements the codebase, and publishes back to task-done.

shared API key · same channels · autonomous
Claude Opus design → push(arch, channel="task-assigned")
↓ subscriber fires in ~20 ms
Mistral Large pull(url) → implement → push(code, channel="task-done")
↓ architect notified
Claude Opus receives implementation · reviews · done
1
Terminal 1 — Mistral engineer
Subscribes to task-assigned · waits for Claude
mistral_engineer.py
# mistral_engineer.py — run this first in a terminal
import os
from mistralai import Mistral
from contextrelay import ContextRelay

relay   = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
mistral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def on_task(url: str):
    architecture = relay.pull(url)
    code = mistral.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content":
            "Implement this architecture as runnable code:\n\n" + architecture}],
    ).choices[0].message.content
    relay.push(code, channel="task-done", metadata={"role": "implementation"})

relay.subscribe("task-assigned", on_task)  # blocks, waits for Claude
2
Claude Code — paste this prompt
Pushes to task-assigned · Mistral fires automatically
Claude Code prompt
You are a senior software architect in an autonomous pipeline.
Your engineer (Mistral) is subscribed to channel "task-assigned".
The moment you push your architecture there, Mistral implements it.

Task: Design a production-ready FastAPI task management API.
Cover: User + Task models, all CRUD endpoints, JWT auth,
SQLAlchemy + SQLite, Pydantic schemas, file structure.

Steps:
1. Write the complete architecture document
2. Use push_context — set channel="task-assigned"
3. Subscribe / poll "task-done" to receive the implementation back

Start now. The pipeline is fully autonomous.
[architect] Architecture pushed → task-assigned
[engineer] Task received — implementing...
[engineer] Implementation published → task-done
[architect] Implementation received ✓

Drop-in for your agent framework

First-class integrations for the three biggest multi-agent frameworks. Same SDK underneath — pick the binding that matches your stack.

LangChain
pip install "contextrelay[langchain]"
from contextrelay.integrations.langchain import (
    ContextRelayRetriever,
)

retriever = ContextRelayRetriever(
    api_key=API_KEY, channel="research"
)
docs = retriever.get_relevant_documents("...")
CrewAI
pip install "contextrelay[crewai]"
from crewai import Agent
from contextrelay.integrations.crewai import (
    ContextRelayPushTool, ContextRelayPullTool,
)

agent = Agent(
    role="Researcher",
    tools=[ContextRelayPushTool(api_key=API_KEY)],
)
AutoGen
pip install "contextrelay[autogen]"
from contextrelay.integrations.autogen import (
    get_autogen_tools,
)

tools = get_autogen_tools(api_key=API_KEY)
# pass to your AssistantAgent / FunctionTool list

What you get

⚡
Sub-100 ms pulls
Cloudflare Workers + KV across 300+ PoPs. Your context lives milliseconds from wherever your agents run.
📡
WebSocket pub/sub
Subscribe a channel, receive pointer URLs in ~20 ms when any agent pushes. No polling, no callbacks to manage.
🔍
Peek before you pull
Read metadata — size, tags, TTL — without downloading the payload. Route or skip without touching your token budget (see the sketch after this grid).
🔗
Any LLM, any provider
Claude, GPT-4, Mistral, Gemini — the URL is just a string. Pass it anywhere without touching the payload.
🛠️
MCP native tools
Register as an MCP server. Claude Desktop and Claude Code get push_context, peek_context, pull_context natively.
📖
Open source
MIT-licensed. Clone the repo, deploy to your Cloudflare account, run it forever. No vendor lock-in, no telemetry.
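
The peek flow in SDK terms, as a hedged sketch: relay.peek() and the metadata field names below mirror the peek_context MCP tool named above, but both are assumptions; check the SDK reference for exact names.

# hypothetical peek(): read metadata only, leave the payload at the edge
import os
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
url = "https://.../pull/uuid"  # pointer received from a channel or another agent

meta = relay.peek(url)  # assumed to return size, tags, and TTL
if meta["size_bytes"] < 200_000 and "research" in meta.get("tags", []):
    payload = relay.pull(url)  # download only when it is worth the bytes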

How it compares

Honest comparison. ContextRelay is the cross-provider option. For single-LLM use cases, the platform-native memory is fine — when you span providers or need pub/sub, you need something different.

|                     | ContextRelay                           | Anthropic Memory                 | OpenAI Assistants       | Mem0 / Letta                      | Redis pub/sub                    |
|---------------------|----------------------------------------|----------------------------------|-------------------------|-----------------------------------|----------------------------------|
| Best for            | Cross-provider handoff & coordination  | Single-user Claude conversations | OpenAI Assistants only  | Semantic recall over user history | Real-time messaging in your stack |
| Cross-LLM           | Yes                                    | No (Claude-only)                 | No (OpenAI-only)        | Yes                               | Yes                              |
| Multi-agent pub/sub | Yes (~20 ms fan-out)                   | No                               | No                      | No                                | Yes                              |
| Self-host           | Yes (one wrangler deploy)              | No                               | No                      | Letta yes / Mem0 cloud            | Yes                              |
| Client-side E2EE    | Yes (fragment-key)                     | No                               | No                      | No                                | TLS only                         |
| Default lifetime    | Ephemeral (24 h TTL)                   | Persistent                       | Persistent              | Persistent                        | In-memory                        |
| Hosted price        | Free / $29 / $99                       | Per-token                        | Per-token               | Mem0 / Letta cloud                | Many vendors                     |
Use the right tool. If you need persistent semantic recall, use Mem0 or Letta. If you only call one LLM, the platform's built-in memory is fine. ContextRelay's sweet spot is the moment you have agents on different providers, a 50 KB payload, and want it to cost ~20 tokens — with optional encryption and channel coordination on top.
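
On the fragment-key E2EE row: the decryption key rides in the URL fragment, which clients never transmit to the server, so the edge only ever stores ciphertext. A conceptual sketch of that pattern (illustration only, not the SDK's built-in encryption API; push() returning the URL is per step 01):

# fragment-key E2EE, conceptually -- not the SDK's real encryption API
import os
from cryptography.fernet import Fernet
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

# sender: encrypt locally, push ciphertext, append the key as a URL fragment
key = Fernet.generate_key()
token = Fernet(key).encrypt(b"secret architecture doc").decode()
shared_url = relay.push(token) + "#" + key.decode()

# receiver: split off the fragment, pull ciphertext, decrypt locally
pointer, _, frag = shared_url.partition("#")
plaintext = Fernet(frag.encode()).decrypt(relay.pull(pointer).encode())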

Simple pricing

Free to start. No credit card. Or self-host for free, forever.

Without ContextRelay: 50 KB context × 1,000 handoffs/day ≈ $15/day in token tax (~$450/month).
With ContextRelay Pro: $29/month covers 100,000 handoffs.
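
The arithmetic behind that figure, as a sketch; the ~4 bytes/token and the ~$1.20 per million input tokens it implies are rough assumptions:

# back-of-envelope for the token-tax callout above (rough assumptions)
bytes_per_handoff = 50_000
tokens_per_handoff = bytes_per_handoff / 4  # ~4 bytes/token, ≈ 12,500 tokens
handoffs_per_day = 1_000
usd_per_million_tokens = 1.20  # implied blended input price (assumption)
daily_tax = tokens_per_handoff * handoffs_per_day / 1e6 * usd_per_million_tokens
print(f"${daily_tax:.0f}/day, ~${daily_tax * 30:.0f}/month")  # $15/day, ~$450/month
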
Free
$0/mo
  • 1,000 pushes / mo
  • 10,000 pulls / mo
  • 2 API keys
  • Fits a 2-agent prototype
Start free
Pro
$29/mo
  • 100,000 pushes / mo
  • 1M pulls / mo
  • 10 API keys
  • Fits a CrewAI fleet 8 hrs/day
Get Pro
Team
$99/mo
  • 1M pushes / mo
  • 10M pulls / mo
  • 100 API keys
  • Fits a multi-tenant agent platform
Get Team

Pick your path

Self-host on your Cloudflare account, or let us run it for you. Same SDK either way.