Documentation

How to wire AI agents together across providers, frameworks, and runtimes — without burning tokens on data transit.

How it works

Every agent in your pipeline has a token budget. When Agent A produces a large output — a code review, a research report, a database dump — and Agent B needs it, you have two options: paste the full text into B's prompt (expensive) or truncate it (lossy).

ContextRelay gives you a third option. Agent A pushes the payload to the edge and gets back an 80-character URL. Agent B receives the URL and pulls the full payload directly — the payload never re-enters A's context. For sensitive payloads, add encrypted=True: the data is encrypted in your process before it leaves your machine, so Cloudflare only ever sees ciphertext.

Agent A → push(50 KB result) → "https://.../pull/uuid"
↓ pass the URL (20 tokens)
Agent B → pull(url) → 50 KB result (~75 ms, 0 tokens in A)

Quick start

1. Install

pip install contextrelay

2. Get your API key

Sign up and create an API key — copy the cr_live_... value and store it in an env var.

export CONTEXTRELAY_API_KEY="cr_live_..."

3. Connect and relay

import os
from contextrelay import ContextRelay

# No base_url needed — defaults to the managed cloud
relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

url = relay.push("any large text, JSON, or Markdown — up to 25 MB")
data = relay.pull(url)   # retrieve from any agent, any machine

Self-hosting? Point base_url at your Worker — same SDK, same protocol.
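For example, pointing the client at a self-hosted Worker (the domain here is a hypothetical placeholder, not a real deployment):

```python
import os
from contextrelay import ContextRelay

# Hypothetical self-hosted Worker URL: substitute your own deployment
relay = ContextRelay(
    api_key=os.environ["CONTEXTRELAY_API_KEY"],
    base_url="https://relay.example.com",
)
```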

Use case 1 — Cross-LLM handoff

Swap models mid-pipeline without losing context. Claude does a thorough code review; the review is too large to fit alongside the follow-up instructions in Mistral's context. ContextRelay bridges them.

import os, anthropic
from mistralai import Mistral
from contextrelay import ContextRelay

relay   = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
claude  = anthropic.Anthropic()
mistral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# ── Agent A: Claude reviews the PR ──────────────────────────────
diff = open("pr_diff.txt").read()   # ~20 KB of git diff

review = claude.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,
    messages=[{"role": "user", "content": f"Code review:\n\n{diff}"}],
).content[0].text

# Store the review — hand off only the URL
review_url = relay.push(review, metadata={"pr": "PR-441", "type": "review"})

# ── Agent B: Mistral turns the review into Jira tickets ─────────
full_review = relay.pull(review_url)   # fetched directly by Mistral

tickets = mistral.chat.complete(
    model="mistral-large-latest",
    messages=[{
        "role": "user",
        "content": f"Convert this review into Jira tickets:\n\n{full_review}"
    }],
).choices[0].message.content

The review travels from Claude to Mistral as a URL (~20 tokens). Without ContextRelay, pasting a 3,000-token review into Mistral's prompt would cost ~$0.02 per handoff — at 500 reviews/day, that's $10/day in pure overhead.
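The overhead math works out as follows; the per-token price is back-solved from the $0.02 figure purely for illustration, not a quoted rate from any provider:

```python
# Back-of-the-envelope token tax, using the numbers from the paragraph above.
REVIEW_TOKENS = 3_000                    # one pasted code review
URL_TOKENS = 20                          # one ContextRelay URL
HANDOFFS_PER_DAY = 500
PRICE_PER_TOKEN = 0.02 / REVIEW_TOKENS   # back-solved: one pasted review costs $0.02

paste_cost = REVIEW_TOKENS * PRICE_PER_TOKEN * HANDOFFS_PER_DAY
relay_cost = URL_TOKENS * PRICE_PER_TOKEN * HANDOFFS_PER_DAY

print(f"paste: ${paste_cost:.2f}/day   relay: ${relay_cost:.2f}/day")
# paste: $10.00/day   relay: $0.07/day
```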

Use case 2 — Agent fleet (CrewAI / LangGraph / AutoGen)

A specialist crew — researcher, writer, fact-checker, editor — shares one channel namespace. No custom Redis schema, no scratch files. The first-class integrations handle the wire format; you write the agents.

CrewAI — push and pull as native tools

import os
from crewai import Agent, Task, Crew
from contextrelay.integrations.crewai import (
    ContextRelayPushTool, ContextRelayPullTool,
)

API_KEY = os.environ["CONTEXTRELAY_API_KEY"]

push = ContextRelayPushTool(api_key=API_KEY, channel="research-pipeline")
pull = ContextRelayPullTool(api_key=API_KEY)

researcher = Agent(role="Researcher",
    goal="Find sources on quantum computing", tools=[push])
writer     = Agent(role="Writer",
    goal="Draft article from research",       tools=[pull, push])
checker    = Agent(role="Fact-checker",
    goal="Verify each claim",                  tools=[pull])

crew = Crew(agents=[researcher, writer, checker],
            tasks=[...])
crew.kickoff()

LangGraph — pub/sub fan-out across nodes

import os
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

def researcher_node(state):
    findings = run_llm("research the topic")
    url = relay.push(findings, channel="research")
    return {"research_url": url}

def writer_node(state):
    findings = relay.pull(state["research_url"])
    return {"draft": run_llm(f"Write from: {findings}")}

# Add as nodes in your StateGraph — payloads stay at the edge

AutoGen — function tools

import os
from autogen_agentchat.agents import AssistantAgent
from contextrelay.integrations.autogen import get_autogen_tools

tools = get_autogen_tools(api_key=os.environ["CONTEXTRELAY_API_KEY"])

researcher = AssistantAgent(
    name="researcher",
    tools=tools,
    system_message="Push findings to ContextRelay, share the URL.",
)

Why a shared channel namespace beats per-agent state: you can add a fifth agent (e.g. a QA reviewer) without touching the existing four. Each agent only knows the channel name; coordination is pub/sub, not point-to-point.
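As a sketch of that extensibility: a new QA agent subscribes to the existing channel and nothing else changes. The "type" metadata convention and the run_qa_review helper are hypothetical, introduced only for this example:

```python
import os
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

# A fifth agent joins by subscribing to the same channel; the existing
# four agents and their tools are untouched.
def qa_reviewer(url):
    meta = relay.peek(url)                      # route on metadata first, cheaply
    if meta.get("type") == "draft":             # hypothetical metadata convention
        draft = relay.pull(url)
        relay.push(
            run_qa_review(draft),               # hypothetical review helper
            channel="research-pipeline",
            metadata={"type": "qa-review"},
        )

relay.subscribe("research-pipeline", qa_reviewer)   # blocking; run in a thread
```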

Use case 3 — Async background work

Fire-and-forget agent tasks. The orchestrator pushes a task to a channel and moves on. A worker subscribed to the channel wakes up, processes, and pushes the result to a done-channel. Replaces polling, queues, and callback hell.

The orchestrator (push and continue)

import os
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

# Fire a long-running research task
task_url = relay.push(
    "Analyse the last 30 days of customer support tickets for sentiment trends",
    channel="bg-tasks",
    metadata={"task_id": "T-1042", "priority": "low"},
)
# Orchestrator returns immediately — does NOT block on the worker

The worker (subscribe and respond)

import os, threading, time
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

def on_task(url):
    meta = relay.peek(url)              # cheap — metadata only, no payload download
    task = relay.pull(url)
    started = time.time()
    result = run_long_analysis(task)    # could take minutes
    relay.push(
        result,
        channel="bg-done",
        metadata={"task_id": meta["task_id"],
                  "duration_s": round(time.time() - started)},
    )

threading.Thread(
    target=relay.subscribe,
    args=("bg-tasks", on_task),
    daemon=True,
).start()

The result-collector (subscribe to done)

def on_done(url):
    meta = relay.peek(url)
    print(f"Task {meta['task_id']} done in {meta['duration_s']}s")
    result = relay.pull(url)
    notify_user(meta["task_id"], result)

relay.subscribe("bg-done", on_done)   # blocking — run in a thread

Use it as a lightweight task queue. No Redis, no Celery, no SQS — push to a channel, subscribe to its done-channel. Peeking before pulling lets the collector route by metadata without paying the download cost.

Use case 4 — Hybrid local↔cloud

A Claude Code or Cursor instance running on your laptop talks to production agents running in a Cloudflare Worker. Same SDK, same channels, no bridge code. Use the local agent for exploratory work; the cloud agents pick up from where you left off.

Local Claude Code: push a debugging artifact

# Inside Claude Code, with the contextrelay MCP server registered:
# (~/.claude/mcp.json)

# Claude can call push_context as a native tool:
push_context(
  data="<full stack trace + repro steps>",
  channel="prod-issues",
  metadata={"severity": "high", "service": "api-gateway"},
)
# Returns: https://api.contextrelay.dev/pull/<uuid>

Cloud Worker: subscribe and triage

// Cloudflare Worker — listens on prod-issues
import { ContextRelay } from "contextrelay-js";

export default {
  async fetch(req, env) {
    const relay = new ContextRelay({ apiKey: env.CR_KEY });
    return relay.subscribe("prod-issues", async (url) => {
      const meta = await relay.peek(url);
      if (meta.severity === "high") {
        const trace = await relay.pull(url);
        await pageOnCall(meta.service, trace);
      }
    });
  },
};

Why this matters: most agent infra products assume one runtime. ContextRelay is wire-protocol-agnostic — anything that can speak HTTPS and WebSocket can join a channel. Your Cursor agent, your prod Worker, and a CI job on GitHub Actions can all share the same context substrate.
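As a sketch of that interop, a CI step can pull a payload with nothing but the Python standard library, assuming the bearer-auth /pull endpoint documented in the REST API section:

```python
# Any runtime that can speak HTTPS can join the channel; no SDK required.
import os
import urllib.request

def pull(url: str, api_key: str) -> str:
    """Fetch a ContextRelay payload over plain HTTPS with bearer auth."""
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:   # network call
        return resp.read().decode("utf-8")

# payload = pull("https://api.contextrelay.dev/pull/<uuid>",
#                os.environ["CONTEXTRELAY_API_KEY"])
```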

Use case 5 — E2EE for sensitive data

🔒 Audit-friendly E2EE for regulated multi-agent pipelines. When you push with encrypted=True, the payload is encrypted in your process before it leaves your machine. Cloudflare — and ContextRelay — only ever store ciphertext. The decryption key is embedded in the URL fragment (#key=…) and is never sent to any server (per RFC 3986, clients strip the fragment before dereferencing a URL).

Encryption is opt-in per push. Use it for any payload containing PII, credentials, or proprietary data. Metadata stays plaintext — so your audit logs can record who pushed what without decrypting payloads.

import os
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

# Encrypt on push — a fresh AES key is generated locally
url = relay.push(
    customer_pii_record,
    encrypted=True,
    metadata={"customer_id": "C-9842", "purpose": "compliance-review"},
)
# url → "https://.../pull/<uuid>#key=<base64-fernet-key>"

# Anyone with the full URL can decrypt; anyone without #key= cannot
result = relay.pull(url)   # decrypted locally, never on the server

What is encrypted vs what is not

Field      Encrypted?            Notes
data       Yes                   Fernet (AES-128-CBC + HMAC-SHA256)
metadata   No                    Always plaintext — don't put secrets here
key        Never leaves client   URL fragment — never transmitted to the server

pip install contextrelay[crypto]   # or: pip install cryptography
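The flow behind encrypted=True can be sketched with the same Fernet primitive from the cryptography package; this is an illustration of the mechanism, not the SDK's actual code path:

```python
from cryptography.fernet import Fernet

# 1. A fresh key is generated locally; it never leaves this process
key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"customer PII record")

# 2. Only `ciphertext` is uploaded; the key rides in the URL fragment,
#    which clients never transmit to the server
url = f"https://api.contextrelay.dev/pull/<uuid>#key={key.decode()}"

# 3. Anyone holding the full URL can decrypt locally
frag_key = url.split("#key=", 1)[1].encode()
plaintext = Fernet(frag_key).decrypt(ciphertext)
assert plaintext == b"customer PII record"
```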

Live demo — Claude plans, Mistral builds

The idea: Claude Opus as your architect (high reasoning, worth the cost), Mistral as your engineer (fast, accurate, cheaper per token). They never share a conversation — ContextRelay passes the architecture as a URL so Mistral never burns tokens on Claude's planning context.

Step 1 — paste this in Claude Code

Claude has the push_context MCP tool available. It will design the API and push the architecture to ContextRelay automatically.

You are a senior software architect. Design a production-ready FastAPI task management API.

Your design must cover:
- Data models: User, Task (with status, priority, due_date)
- All REST endpoints: auth (register/login/me), tasks (CRUD + filter by status)
- JWT authentication flow
- SQLite + SQLAlchemy ORM setup
- Pydantic schemas for request/response validation
- File structure and key implementation decisions

Write the complete architecture document with precise details so an engineer
can implement without asking questions.

When done, use the push_context tool to save the full document to ContextRelay.
Print the returned URL clearly — your engineer (Mistral) will build the entire codebase from it.

Step 2 — copy the URL Claude prints, paste this in Mistral

Replace PASTE_URL_FROM_CLAUDE_HERE with the URL Claude printed, and YOUR_API_KEY with your key. Mistral fetches the architecture and implements the full codebase.

You are a senior Python engineer. Your architect (Claude Opus) has designed a FastAPI API.

Step 1 — fetch the architecture from ContextRelay:

import requests
plan = requests.get(
    "PASTE_URL_FROM_CLAUDE_HERE",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
).text
print(plan)

Step 2 — read the architecture and implement the complete codebase:
- Every Python file described in the design
- requirements.txt
- README.md with setup and run instructions

Rules:
- Match the architect's design exactly — do not improvise
- Output complete, runnable code only
- No placeholders, no TODOs

What just happened: Mistral never saw Claude's conversation — it received only the architecture URL (~80 chars). Claude never saw Mistral's output. Two specialist models collaborated on a real codebase through a single URL pointer. The architecture stays valid for 24 hours — share it with a reviewer, a CI agent, or a third model.

Automate the full loop in Python

import os, anthropic
from mistralai import Mistral
from contextrelay import ContextRelay

relay   = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
claude  = anthropic.Anthropic()
mistral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# ── Claude Opus: architect ───────────────────────────────────────
arch_response = claude.messages.create(
    model="claude-opus-4-5",
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": (
            "Design a production FastAPI task management API. Include data models, "
            "all endpoints, JWT auth, SQLAlchemy setup, and file structure. "
            "Be complete — an engineer will implement directly from this document."
        )
    }],
).content[0].text

# Push architecture — hand off just the URL
arch_url = relay.push(arch_response, metadata={"role": "architecture", "project": "task-api"})

# ── Mistral Large: engineer ──────────────────────────────────────
architecture = relay.pull(arch_url)   # Mistral fetches directly — 0 tokens in Claude

code_response = mistral.chat.complete(
    model="mistral-large-latest",
    messages=[{
        "role": "user",
        "content": (
            f"You are a senior Python engineer. Implement this architecture as a complete, "
            f"runnable codebase. Every file. No placeholders.\n\nArchitecture:\n{architecture}"
        )
    }],
).choices[0].message.content

# Push implementation — share the URL with your team or CI
impl_url = relay.push(code_response, metadata={"role": "implementation", "project": "task-api"})

Power feature — AgentBridge for Claude Code

push_and_wait blocks an orchestrator script until a Claude Code instance running in a tmux window finishes a task — no polling, no SSH, no manual copy-paste.

Start the coordinator once (in your tmux session)

# terminal in your tmux session named "vibe", window 0
pip install contextrelay
contextrelay-bridge start --tmux vibe --task-channel vibe-tasks --done-channel vibe-done

Send tasks from any script

import os
from contextrelay import ContextRelay, AgentBridge

relay  = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
bridge = AgentBridge(relay, task_channel="vibe-tasks", done_channel="vibe-done")

result = bridge.push_and_wait(
    "Refactor the auth module to use Firebase. "
    "Run the type checker. Return a summary of all changed files."
)

print(result)   # full Claude Code output, stripped of UI chrome

SDK reference

Method              What it does
push(data, ...)     Upload a payload (str, up to 25 MB); returns a URL. Options: channel, encrypted, metadata.
pull(url)           Download a payload. Auto-decrypts if the URL contains #key=.
peek(url)           Fetch metadata only — no payload download.
subscribe(ch, fn)   Subscribe to a channel; calls fn(url) on each push. Blocking — run in a thread.
publish(ch, msg)    Publish a message to a channel without a payload push.

REST API

Every SDK method maps to a Worker endpoint. Authenticate with Authorization: Bearer cr_live_....

Method   Endpoint       Description
POST     /push          Upload payload → { url, id }
GET      /pull/:id      Download payload by ID
GET      /peek/:id      Metadata only, no payload
GET      /ws/:channel   WebSocket upgrade — pub/sub

curl -X POST https://api.contextrelay.dev/push \
  -H "Authorization: Bearer $CONTEXTRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data": "hello from curl"}'
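Retrieving what you pushed follows the same pattern; substitute the id returned by /push (shown here as a placeholder):

```shell
# Metadata only (cheap), then the full payload
curl https://api.contextrelay.dev/peek/<id> \
  -H "Authorization: Bearer $CONTEXTRELAY_API_KEY"

curl https://api.contextrelay.dev/pull/<id> \
  -H "Authorization: Bearer $CONTEXTRELAY_API_KEY"
```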

MCP (Claude Desktop / Claude Code)

Register ContextRelay as a native MCP server so Claude can push and pull context without leaving the conversation.

// ~/.claude/mcp.json  or  .mcp.json in project root
{
  "mcpServers": {
    "contextrelay": {
      "command": "contextrelay-mcp",
      "env": {
        "CONTEXTRELAY_URL": "https://api.contextrelay.dev",
        "CONTEXTRELAY_API_KEY": "cr_live_..."
      }
    }
  }
}

Available tools: push_context, peek_context, pull_context.

Limits

Plan   Pushes / mo   Pulls / mo
Free   1 000         10 000
Pro    100 000       1 000 000
Team   1 000 000     10 000 000

Max payload: 25 MB · TTL: 24 hours.

Ready to stop paying token tax?