Durable by construction
State lives in an append-only event log, not in RAM. Processes die, machines reboot, deploys roll — and every agent comes back exactly where it was. No lost runs, no half-finished work.
Z8 runs fleets of AI agents that make real decisions — and never lose one. Every run is an event stream: it survives crashes, replays to the exact step without re-calling the model, and scales to millions in parallel. And underneath sits a whole application runtime — a durable queue, workers and live read models — so it's the production platform the agent frameworks forgot to ship.
A Z8 agent doesn't hold its state in memory and hope. Every decision it makes is appended to an event stream — and that stream is the agent. Rebuild it after a crash, replay it for an audit, fork it for a test: the past is data, not a re-run. And replaying an AI agent never calls the model again — the reasoning already happened; the events remember it.
decide / evolve are pure functions: agents you can unit-test with no mocks and no network. Rebuilt from the log in microseconds · 0 model calls on replay.
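To make "0 model calls on replay" concrete, here is a minimal Python sketch of the idea, not Z8's actual API. The names `ModelReplied`, `RunState`, `evolve`, `replay` and `call_model` are all hypothetical, and the "model" is a stub that only counts how often it is invoked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelReplied:
    """Hypothetical event: what the model answered at one step of a run."""
    step: int
    text: str


@dataclass(frozen=True)
class RunState:
    steps: int = 0
    transcript: tuple = ()


def evolve(state, event):
    """Pure fold: previous state + one event -> next state. No I/O."""
    return RunState(state.steps + 1, state.transcript + (event.text,))


def replay(events):
    """Rebuild state from the log alone; the model is never consulted."""
    state = RunState()
    for event in events:
        state = evolve(state, event)
    return state


model_calls = 0


def call_model(prompt):
    """Stub standing in for a real (slow, costly) model call."""
    global model_calls
    model_calls += 1
    return f"answer to {prompt}"


# Live run: the model is called once per step and each answer is appended.
log = []
for step, prompt in enumerate(["classify", "summarize"]):
    log.append(ModelReplied(step=step, text=call_model(prompt)))

calls_after_live_run = model_calls
rebuilt = replay(log)  # e.g. after a crash, or for an audit
```

After `replay`, `model_calls` is still equal to `calls_after_live_run`: the answers come from the recorded events, which is the property the text describes.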
Most agent tools nail the demo and leave durability, concurrency and operations to you. Z8 ships them in the box — because that's the difference between a notebook and a system you can run a business on.
Built on a runtime proven at the scale of ~2M live connections per server at WhatsApp and 11M+ concurrent users at Discord. Millions of lightweight, isolated agents run in true parallel under supervision; one crashing is contained in milliseconds, and the fleet never notices.
A full compendium of strategies — from one-shot answers to tree- and graph-of-thought deliberation — with an adaptive router that spends deep reasoning only where the problem earns it. Tools, policies and budgets included.
A message queue, a job system, read models and an audit trail aren't add-ons you integrate — they're the same runtime, sharing one event log. The whole production stack with a single mental model, not ten libraries wired together.
A real-time control room streams every agent, run, event and trace straight from the event store — no polling, no extra infrastructure. Built for operators and risk teams, not just engineers.
Because the event log is the source of truth, a tamper-evident audit trail isn't a feature you bolt on — it's the substrate. Replay any past decision exactly, prove it to a regulator, retain it for years.
Ship an agent to production and you discover it needs a message queue, a job system, an event store and read models around it — normally four more vendors and a mesh of glue. Z8 is all of them in one runtime, sharing one immutable log. Nothing to integrate, one place to look.
Every command, event and state change moves as an open CloudEvents signal on a durable bus — accepted once, ordered for life, delivered even across a crash. Backpressure absorbs spikes, a dead-letter queue quarantines what can't be handled, and outbound calls are TLS-verified and circuit-broken. Point it at your systems, or another team's, with no bespoke integration.
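As an illustration of what a CloudEvents signal looks like, the sketch below builds an envelope with the four attributes CloudEvents 1.0 requires (`specversion`, `id`, `source`, `type`) and deduplicates on `source` + `id`, which the specification defines as unique per event. The `Inbox` class and the example `source` and `type` values are assumptions for illustration, not Z8's wire format.

```python
def make_event(event_id, source, event_type, data):
    """Build a CloudEvents 1.0 envelope as a plain dict."""
    return {
        "specversion": "1.0",                    # required
        "id": event_id,                          # required
        "source": source,                        # required
        "type": event_type,                      # required
        "datacontenttype": "application/json",   # optional
        "data": data,                            # optional payload
    }


class Inbox:
    """Hypothetical receiver that accepts each (source, id) pair once."""

    def __init__(self):
        self.seen = set()
        self.accepted = []

    def accept(self, event):
        key = (event["source"], event["id"])
        if key in self.seen:
            return False  # redelivery after a crash or retry: ignore
        self.seen.add(key)
        self.accepted.append(event)
        return True


event = make_event("evt-1", "/accounts/A", "example.account.opened", {"account_id": "A"})
inbox = Inbox()
first = inbox.accept(event)
second = inbox.accept(event)  # same event delivered again
```

Here `first` is `True` and `second` is `False`, so the order of `inbox.accepted` is the order of first acceptance.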
Slow or heavy work runs off the critical path as durable jobs: automatic retries with capped backoff, scheduling and recurring runs, per-queue concurrency limits and priorities. A flaky downstream becomes a retry, not a failed customer request — and because every job is itself event-sourced, it survives a crash mid-run and resumes exactly where it stopped. No separate job database to operate.
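"Retries with capped backoff" can be sketched in a few lines of Python. This is an in-memory illustration under assumed defaults (doubling delays, a 60-second cap), not Z8's scheduler: it does not persist anything, and the function names and the `sleep` injection point are hypothetical.

```python
def backoff_delays(max_attempts, base_seconds=1.0, cap_seconds=60.0):
    """Delay before each retry: base * 2**n, never above the cap.

    The first attempt runs immediately, so there are max_attempts - 1 delays.
    """
    return [min(cap_seconds, base_seconds * 2 ** n) for n in range(max_attempts - 1)]


def run_with_retries(job, max_attempts, sleep=lambda seconds: None):
    """Run job(attempt) until it succeeds or attempts are exhausted."""
    delays = backoff_delays(max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return ("done", job(attempt))
        except Exception as exc:
            if attempt == max_attempts:
                # Out of attempts: hand the failure to a dead-letter queue.
                return ("dead_letter", str(exc))
            sleep(delays[attempt - 1])


def flaky(attempt):
    """Stand-in for a flaky downstream: fails twice, then succeeds."""
    if attempt < 3:
        raise RuntimeError("boom")
    return "sent"


slept = []
result = run_with_retries(flaky, max_attempts=20, sleep=slept.append)
```

The flaky call ends as a successful job after two waits rather than as a failed request; a job that never succeeds ends as `("dead_letter", ...)` once `max_attempts` is reached.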
Writes and reads scale independently. Every dashboard, report, API and search index is a live projection folded off the event log — always current, never a stale nightly batch. Need a new view? Add a read model and replay history to backfill it in full. Consumers are checkpointed and idempotent, so a record is never counted twice.
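A checkpointed, idempotent read model can be sketched as follows. This is a simplified Python illustration, with a list standing in for the event log and a hypothetical `BalanceProjection` class; it shows why redelivery does not double count and how a new view backfills by replaying history.

```python
class BalanceProjection:
    """Hypothetical read model: total deposited per account."""

    def __init__(self):
        self.balances = {}
        self.checkpoint = 0  # highest log position already applied

    def apply(self, position, event):
        if position <= self.checkpoint:
            return False  # already applied: skip, so nothing is counted twice
        account, amount = event
        self.balances[account] = self.balances.get(account, 0) + amount
        self.checkpoint = position
        return True

    def catch_up(self, log):
        """Fold every event in the log, in order, starting at position 1."""
        for position, event in enumerate(log, start=1):
            self.apply(position, event)


log = [("A", 50), ("B", 20), ("A", 5)]

view = BalanceProjection()
view.catch_up(log)
view.catch_up(log)  # the same events delivered again change nothing

new_view = BalanceProjection()  # a view added later
new_view.catch_up(log)          # backfilled from full history
```

Both `view` and `new_view` end with the same balances, because each is a pure fold over the same ordered log.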
The surface is small and pure: decide what to do, evolve
your state from the events, and let the runtime handle persistence, concurrency,
retries and recovery. Here's a real durable agent, an AI agent in five lines, and
the wiring that turns them into a running application.
# A durable agent: the consistency boundary.
# Pure decide / evolve over an event stream: no I/O, no clock.
agent Account:
  snapshot every 100 events
  state { account_id, balance: 0, status: "new" }

  # decide: current state + command -> events (+ optional reply)
  decide OpenAccount(cmd) when status is "new":
    emit AccountOpened(account_id: cmd.account_id)
    reply "opened"

  decide Withdraw(amount) when amount > balance:
    reject "insufficient_funds"

  # evolve: fold an event back into state. Runs on every replay, forever.
  evolve AccountOpened(e):
    account_id = e.account_id
    status = "open"

# An AI agent. Its event stream IS the run record;
# replay never re-calls the model.
ai_agent Triage:
  model: fast
  prompt: "You are a support triage assistant."
  tools: [ LookupCustomer ]
  max_iterations: 4

# Dispatch a run, then await the answer.
run = dispatch Triage.StartRun(query: "double charged")
answer = await(run)

# A durable job: a worker is just an event-sourced agent.
# Retries, scheduling and backoff come from the runtime, not your code.
worker EmailReceipt:
  queue "mailers"
  max_attempts 20  # retry with capped backoff, then dead-letter

  perform(job):
    charge = load(job.charge_id)
    send_receipt(charge.email, charge.amount)  # a failure here just retries

# Enqueue now, or schedule for later. Survives a restart either way.
enqueue EmailReceipt(charge_id: "ch_42")
enqueue EmailReceipt(charge_id: "ch_99") schedule_in: minutes(30)

# Route commands to agents, then start the application.
router App:
  validate every command
  identify Account by account_id, prefix "account-"
  dispatch [ OpenAccount, Deposit, Withdraw ] -> Account

# Strong consistency blocks until the durable handlers ack.
start App
dispatch Deposit(account_id: "A", amount: 50) with consistency: strong

decide / evolve are deterministic functions: replayable and testable with no process and no network.
z8 dev
Routine questions shouldn't cost the same as high-stakes calls. Z8 ships a full reasoning compendium and an adaptive router that sizes each task automatically — so quality stays high where it matters and spend stays low where it doesn't. Every step is recorded as events.
One-pass answers for high-volume, low-risk questions — fast and cheap, by the thousands.
Works a problem through one step at a time, with a parsed, on-record conclusion.
Interleaves reasoning with tool calls — look up records, hit your systems, then decide.
Generates, evaluates and expands several lines of thought, keeping the strongest.
Builds a DAG of ideas — generate, connect, aggregate — for problems that branch and merge.
Explicit algorithmic search at temperature zero, for answers that must be exact.
Reasons, supervises its own answer, and revises — with halting that knows when to stop.
Classifies each task and routes to the cheapest strategy that will get it right — the one piece that turns the seven above into a system.
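The routing rule, "the cheapest strategy that will get it right", can be sketched as a classify-then-pick step. Everything in this Python example is a stand-in: the strategy names, relative costs, difficulty levels and the toy `classify` heuristic are assumptions, not Z8's router.

```python
# (name, relative cost, highest difficulty level it is trusted with)
STRATEGIES = [
    ("one_shot", 1, 1),
    ("step_by_step", 3, 2),
    ("tree_of_thought", 10, 3),
]


def classify(task):
    """Toy difficulty estimate: 1 (routine) to 3 (high stakes)."""
    if task["stakes"] == "high":
        return 3
    if task["needs_steps"]:
        return 2
    return 1


def route(task):
    """Pick the cheapest strategy whose ceiling covers the task."""
    needed = classify(task)
    capable = [s for s in STRATEGIES if s[2] >= needed]
    return min(capable, key=lambda s: s[1])[0]


routine = route({"stakes": "low", "needs_steps": False})
multi_step = route({"stakes": "low", "needs_steps": True})
critical = route({"stakes": "high", "needs_steps": True})
```

Routine questions land on the cheapest strategy and only high-stakes tasks pay for deep deliberation.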
A Z8 run isn't a black box that returns a verdict — it's a readable record: what triggered it, the facts it weighed, the reasoning, the outcome, and a tamper-evident seal. The same record debugs an incident on Tuesday and answers an auditor a year later.
Hand it to a customer, an auditor, or your own on-call engineer — and back it with proof it hasn't been touched since the moment it was made.
"Applicant meets every criterion for the Standard tier. Debt-to-income is within policy and income is verified, with no adverse flags. Approving at the standard rate."
A real-time control room for every agent you run — what's deciding, what's waiting, what needs a human. Driven straight from the event store: no dashboards to refresh, no polling, no second system to operate. It updates the moment anything changes.
| Reference | Agent | Status | Time |
|---|---|---|---|
| loan-7f3a | Lending Assistant | Reasoning | 1.2s |
| claim-9c11 | Fraud Review | Cleared | 0.8s |
| cust-2bd0 | Onboarding | Awaiting docs | — |
| pay-5e8f | Payments | Settled | 0.4s |
| loan-aa20 | Lending Assistant | Escalated | 2.7s |
CrewAI, LangGraph and AutoGen are great at the first demo. Durability, true concurrency and operations are left as an exercise for the reader — usually discovered the first time an agent crashes mid-task in production. Z8 starts where they stop.
| Capability | Z8 (all built in) | CrewAI | LangGraph | AutoGen |
|---|---|---|---|---|
| Runtime & scale | | | | |
| Runtime & concurrency | Millions · true parallel | Python · GIL | Python · GIL | Python · GIL |
| Fault isolation (one agent ≠ the fleet) | ✓ | ✗ | ✗ | ✗ |
| Durable message queue & backpressure | ✓ | ✗ | ✗ | ✗ |
| Durability & recovery | | | | |
| Survives a crash mid-run | ✓ | ✗ | ~ | ✗ |
| Replay without re-calling the model | ✓ | ✗ | ✗ | ✗ |
| Operations | | | | |
| Jobs, scheduling, retries, dead-letter | ✓ | ✗ | ✗ | ✗ |
| Sagas with automatic compensation | ✓ | ✗ | ~ | ✗ |
| CQRS read models / live projections | ✓ | ✗ | ✗ | ✗ |
| Governance & trust | | | | |
| Audit trail by construction | ✓ | ✗ | ~ | ✗ |
| Live observability included | ✓ | ✗ | ~ | ~ |
| Policy, trust & budget controls | ✓ | ✗ | ✗ | ✗ |
| What you still build yourself | Nothing: it's the runtime | Durability · ops · audit | Persistence · ops · glue | Most of production |
Enterprises are putting agents on the critical path — money, eligibility, customer outcomes. The moment an agent's decision matters, "it usually works" stops being good enough, and durability, isolation and proof become the whole game. That's the market Z8 is built for.
Durability and audit fall out of event sourcing for free. Retrofitting them onto a stateless Python loop is a rewrite, not a release.
Millions of supervised, isolated agents in true parallel — 25 years of telecom-grade reliability you can't bolt onto a Python GIL.
Recovery, audit and testing re-fold events, not re-run models — the cost and determinism story no stateless framework can match.
Open-source adoption bottom-up; regulated, high-stakes operations — finance, insurance, healthcare — top-down. One runtime, both motions.
Once a company's decisions, audit trail and compliance history live in Z8's event log, it becomes their system of record — deeply embedded and costly to rip out. Durable adoption, not a swappable library.
Capital is pouring into agent infrastructure — orchestration, durability, observability, memory. Z8 is all four in a single runtime, not four vendors stitched together.
Putting an agent on the critical path raises hard questions: what is it allowed to do, how much can it spend, when does a human decide, and is it getting better or worse? Z8 answers them as configuration, not code — externalized, per-tenant, and on the record.
Allow, deny or ask — as configuration, not code. Rules are hot-reloadable and layered org → tenant → agent, deny-biased and weighted by how reversible an action is. Change what an agent may do, per customer, without a deploy.
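One way to read "layered org → tenant → agent, deny-biased" is that the most restrictive verdict across the layers wins, and an action with no rule at all is denied. The Python sketch below shows that reading only; the rule format and `resolve` function are hypothetical, and the reversibility weighting mentioned above is left out.

```python
SEVERITY = {"allow": 0, "ask": 1, "deny": 2}


def resolve(action, layers):
    """Combine verdicts from every layer that has a rule for the action."""
    verdicts = [layer[action] for layer in layers if action in layer]
    if not verdicts:
        return "deny"  # deny-biased default: no rule means no permission
    return max(verdicts, key=SEVERITY.__getitem__)  # most restrictive wins


# Rules as plain data, so they can change without a deploy.
org = {"refund": "allow", "wire_transfer": "ask"}
tenant = {"refund": "ask"}
agent = {"wire_transfer": "deny"}
layers = [org, tenant, agent]

refund = resolve("refund", layers)
wire = resolve("wire_transfer", layers)
unknown = resolve("delete_account", layers)
```

Here the tenant tightens `refund` from "allow" to "ask", the agent layer tightens `wire_transfer` to "deny", and the unlisted action is denied.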
New capabilities start in "ask a human." After a track record of approved, never-reversed decisions they graduate to automatic — and a single reversal collapses them straight back. Trust is earned, and every step is on the record.
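The graduation rule can itself be written as a fold over events. A minimal Python sketch, assuming a hypothetical threshold of three consecutive approvals and two event names, `"approved"` and `"reversed"`, that are not taken from Z8:

```python
from dataclasses import dataclass

GRADUATE_AFTER = 3  # hypothetical threshold of consecutive approvals


@dataclass(frozen=True)
class Trust:
    approved_streak: int = 0
    mode: str = "ask"  # "ask" a human, or "auto"


def evolve_trust(trust, event):
    """Pure fold: one approval or reversal event -> next trust state."""
    if event == "reversed":
        return Trust(approved_streak=0, mode="ask")  # one reversal resets everything
    if event == "approved":
        streak = trust.approved_streak + 1
        mode = "auto" if streak >= GRADUATE_AFTER else "ask"
        return Trust(approved_streak=streak, mode=mode)
    return trust


def fold(events):
    trust = Trust()
    for event in events:
        trust = evolve_trust(trust, event)
    return trust


graduated = fold(["approved", "approved", "approved"])
collapsed = fold(["approved", "approved", "approved", "reversed"])
```

Because the trust level is derived from recorded events, the reason a capability is automatic (or no longer is) can always be read back from the log.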
Real spend, folded by tenant, agent and model into a running ledger — with budgets that stop a run when returns run out. Chargeback and showback come for free, and a runaway agent can't run up the bill.
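A running ledger with a budget stop can be sketched like this. The Python example assumes a simple hard cap per tenant in integer cents; the `Ledger` class and its method names are hypothetical, and any smarter stopping rule based on diminishing returns is outside the sketch.

```python
from collections import defaultdict


class Ledger:
    """Hypothetical spend ledger keyed by (tenant, agent, model)."""

    def __init__(self, budgets_cents):
        self.budgets = budgets_cents   # tenant -> limit in cents
        self.spent = defaultdict(int)  # (tenant, agent, model) -> cents

    def tenant_total(self, tenant):
        return sum(cost for (t, _, _), cost in self.spent.items() if t == tenant)

    def charge(self, tenant, agent, model, cost_cents):
        """Record spend, or refuse it if the tenant budget would be exceeded."""
        if self.tenant_total(tenant) + cost_cents > self.budgets[tenant]:
            return False  # budget exhausted: the run should stop here
        self.spent[(tenant, agent, model)] += cost_cents
        return True


ledger = Ledger({"acme": 100})
ok_first = ledger.charge("acme", "triage", "fast", 60)
refused = ledger.charge("acme", "triage", "deep", 50)   # would reach 110
ok_second = ledger.charge("acme", "intake", "fast", 40)  # lands exactly on 100
```

The per-key totals in `ledger.spent` are what a chargeback or showback report would read.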
Golden-task suites and a regression gate run in your pipeline, so a prompt or model change that quietly gets worse is caught before it ships. A/B-test strategies, prompts and models, then keep the wins.
Loan approvals, fund transfers, eligibility — regulators want those decisions logged immutably, traced to their inputs, retained for years, and independently auditable. In Z8 that isn't a module to buy: every agent decision is already a block in an append-only event chain — linked to its cause and ordered for life.
Every agent decision is an immutable event — appended, never edited or deleted. The ledger only grows, and that history is the single source of truth the agent is rebuilt from.
Each block carries a monotonic UUIDv7 — a time-ordered id with a per-millisecond counter — so the whole chain has one provable, replay-safe order, with no coordinator and no clock to trust.
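For readers unfamiliar with UUIDv7, the Python sketch below builds one by hand using the RFC 9562 layout: a 48-bit millisecond timestamp, the version nibble, 12 bits used here as a per-millisecond counter, the variant bits, and 62 random bits. Whether Z8 uses this exact counter placement is an assumption, and the `UuidV7Generator` class is hypothetical.

```python
import os
import time
import uuid


class UuidV7Generator:
    """Monotonic UUIDv7 ids: ordered even within a single millisecond."""

    def __init__(self, clock_ms=lambda: time.time_ns() // 1_000_000):
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._counter = 0

    def next(self):
        now = self._clock_ms()
        if now > self._last_ms:
            self._last_ms, self._counter = now, 0
        else:
            # Same millisecond, or the clock stepped backwards: keep ordering
            # by bumping the counter instead of trusting the clock.
            self._counter += 1
            if self._counter > 0xFFF:
                self._last_ms += 1  # counter exhausted: borrow the next ms
                self._counter = 0
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = (
            (self._last_ms & ((1 << 48) - 1)) << 80  # unix_ts_ms
            | 0x7 << 76                              # version 7
            | self._counter << 64                    # 12-bit counter
            | 0b10 << 62                             # RFC variant
            | rand_b                                 # random tail
        )
        return uuid.UUID(int=value)


# A frozen clock is the worst case: every id falls in the "same" millisecond.
generator = UuidV7Generator(clock_ms=lambda: 1_700_000_000_000)
ids = [generator.next() for _ in range(5000)]
```

Even with the clock frozen, each id compares greater than the one before it, because the timestamp and counter occupy the most significant bits.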
Every block points to its cause (causation_id) and its originating request (correlation_id) — so you can walk any decision back to its origin, or forward to everything it set in motion.
causes, effects and trace rebuild the full provenance on demand. Drop or alter a block and its links, order and hash stop reconciling — the break can't hide.
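Walking a decision back to its origin, or forward to what it set in motion, only needs the `causation_id` links. A small Python sketch over an in-memory list of events; the helper names `walk_to_origin` and `downstream_of`, the event types and the ids are hypothetical stand-ins, not Z8's query API.

```python
events = [
    {"id": "e1", "causation_id": None, "correlation_id": "req-1", "type": "LoanRequested"},
    {"id": "e2", "causation_id": "e1", "correlation_id": "req-1", "type": "IncomeVerified"},
    {"id": "e3", "causation_id": "e2", "correlation_id": "req-1", "type": "LoanApproved"},
    {"id": "e4", "causation_id": "e1", "correlation_id": "req-1", "type": "CreditChecked"},
]
by_id = {event["id"]: event for event in events}


def walk_to_origin(event_id, by_id):
    """Follow causation_id links backwards until the root event."""
    chain = []
    current = by_id.get(event_id)
    while current is not None:
        chain.append(current["id"])
        current = by_id.get(current["causation_id"])
    return chain


def downstream_of(event_id, events):
    """Collect every event caused, directly or indirectly, by event_id."""
    found, frontier = [], [event_id]
    while frontier:
        parent = frontier.pop()
        for event in events:
            if event["causation_id"] == parent:
                found.append(event["id"])
                frontier.append(event["id"])
    return found


back = walk_to_origin("e3", by_id)
forward = sorted(downstream_of("e1", events))
```

`back` lists the approval, the verification behind it and the original request; `forward` lists everything the request set in motion.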
Every block is linked to its cause and stamped with a content hash. Drop or alter a past decision and the chain stops reconciling — caught the moment it's verified, and an auditor can check the whole history independently, without touching your live systems.
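Tamper evidence of this kind is commonly built as a hash chain. The Python sketch below links each block to the hash of the previous one, which is an assumption made for simplicity (the text above links blocks to their cause); `block_hash`, `append_block` and `verify` are hypothetical names.

```python
import hashlib
import json


def block_hash(payload, prev_hash):
    """Content hash over the payload plus the link to the previous block."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    prev = chain[-1]["hash"] if chain else ""
    chain.append({"payload": payload, "prev": prev, "hash": block_hash(payload, prev)})


def verify(chain):
    """Return the index of the first block that fails, or None if intact."""
    prev = ""
    for index, block in enumerate(chain):
        if block["prev"] != prev or block["hash"] != block_hash(block["payload"], prev):
            return index
        prev = block["hash"]
    return None


chain = []
for amount in (50, 20, 5):
    append_block(chain, {"type": "Deposited", "amount": amount})

intact = verify(chain)

altered = [dict(block, payload=dict(block["payload"])) for block in chain]
altered[1]["payload"]["amount"] = 999  # edit a past decision
altered_at = verify(altered)

dropped = chain[:1] + chain[2:]        # silently remove a block
dropped_at = verify(dropped)
```

Verification needs only the records themselves, so an auditor can run it on an exported copy without access to the live system.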
Reconstruct exactly what an agent did, and why, at any point in the past — every input it relied on still on file. Reproducibility is the default, not a forensic project.
Personal data lives off the permanent record. Honour "right to be forgotten" requests in full — and the audit trail still verifies afterward.
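One common way to reconcile erasure with an immutable log is to keep personal data in a separate, erasable store and put only an opaque reference in the hashed record. The Python sketch below shows that pattern under those assumptions; it is not a description of Z8's storage, and the `Records` class and `subject_ref` field are hypothetical.

```python
import hashlib
import json


def digest(event):
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode("utf-8")).hexdigest()


class Records:
    def __init__(self):
        self.log = []       # append-only: opaque references and decisions only
        self.personal = {}  # erasable side store: subject_ref -> personal data

    def record(self, subject_ref, personal, decision):
        # subject_ref must be an opaque id, not derived from the personal data.
        self.personal[subject_ref] = personal
        event = {"subject_ref": subject_ref, "decision": decision}
        self.log.append({"event": event, "hash": digest(event)})

    def erase(self, subject_ref):
        """Honour an erasure request; the log itself is not touched."""
        return self.personal.pop(subject_ref, None) is not None

    def verify(self):
        return all(entry["hash"] == digest(entry["event"]) for entry in self.log)


records = Records()
records.record("subj-1", {"name": "Ada", "email": "ada@example.com"}, "approved")
valid_before = records.verify()
erased = records.erase("subj-1")
valid_after = records.verify()
```

The hashes never covered the personal data, so removing it leaves every record verifiable.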
Keep records exactly as long as the rules require on write-once storage, then retire them automatically — and freeze everything the moment an investigation begins.
| Regime | What it asks for | How Z8 supports it |
|---|---|---|
| EU AI Act: Reg. (EU) 2024/1689 · Arts. 12, 15, 18–19 · high-risk rules from Aug 2026 | Automatic logs over a high-risk system's lifetime, protection against unauthorised changes, multi-year retention. | Every decision logged automatically into an append-only, immutable record, kept as long as you require. |
| DORA: Reg. (EU) 2022/2554 · Art. 9 | Preserve the authenticity and integrity of data; documented cryptographic controls. | Records form an append-only, immutable history; pairs with documented cryptographic controls and a write-once backup where required. |
| SEC 17a-4: Rel. 34-96034 · Option A | A re-creatable audit trail of any modification (who, what, when) on immutable media. | An append-only history on write-once storage; every record linked to the one before it. |
| FINRA: Rules 4510, 3110 | Books-and-records, supervision, and a record of what stood behind AI-driven recommendations. | Every AI step recorded: the inputs, the reasoning, the model used, and the outcome. |
| GDPR: Reg. (EU) 2016/679 · Art. 17 | Right to erasure, in direct tension with a permanent, immutable log. | Personal data lives off the record and is truly erased on request; the audit trail still verifies. |
| EIOPA: Opinion EIOPA-BoS-25-360 | Reproducibility and traceability of how the AI reached its decisions. | Any past decision can be replayed exactly, with every input it relied on on file. |
Add one dependency, boot the full stack locally with no database, and write your first durable agent in minutes. Open source, Apache-2.0, batteries included.
Tell us about your use case — lending, claims, onboarding, payments — and we'll show you Z8 deciding, recovering, and proving it, on a workload like yours, on your infrastructure.