Tutorial: Serve and Replay¶
In this tutorial you'll boot the Stargraph FastAPI app with stargraph serve,
enqueue a run over HTTP, poll it to terminal state, and then replay it
deterministically with stargraph replay. The replay produces a
counterfactual fork ID; with --diff you can render the parent vs.
forked RunDiff as JSON.
What you'll build¶
flowchart LR
client((curl)) -->|POST /v1/runs| api[stargraph serve]
api --> sched[Scheduler]
sched --> ckpt[(SQLite Checkpointer)]
client -->|GET /v1/runs/{id}| api
cli[stargraph replay] --> ckpt
Two CLIs talking to the same SQLite checkpointer DB. The scheduler runs the graph in-process; checkpoints land row-by-row; the replay CLI forks a counterfactual run from any persisted step.
Prerequisites¶
- The graph from the first graph tutorial
(
graph.yaml,state.py). curlor any HTTP client.- A free port (default
8000).
Step 1 — Boot the API¶
uv run stargraph serve \
--profile oss-default \
--host 127.0.0.1 \
--port 8000 \
--db ./.stargraph/serve.sqlite \
--audit-log ./.stargraph/audit.jsonl
Pinning --db is critical. Without it stargraph serve mints a
per-process temp DB and the replay CLI in step 5 will not see the
run. The lifespan factory:
- Bootstraps the SQLite checkpointer (creates
runs_history+pending_runsvia migration 002). - Wires
RunHistoryover the same connection soGET /v1/runsreturns live data. - Starts the
Schedulerand the Nautilus broker (soft-fail if nonautilus.yamlis present). - Mounts the five POC routes plus the post-Phase-2 surfaces
(
/cancel,/pause,/respond,/counterfactual,/artifacts,/v1/runs/{id}/streamWebSocket).
Verify the app is up:
Step 2 — Register the graph (POC)¶
The POC stargraph serve boots with an empty in-memory graphs registry
(app.state.deps["graphs"] = {}); production wiring loads graphs
from the plugin manifest at lifespan start. For this tutorial we
cheat: enqueue a run and immediately drive it via the CLI's stargraph
run against the same DB so the checkpoints are durable. Phase 2 task
2.30 swaps this for a Checkpointer-backed lookup.
In a second terminal:
uv run stargraph run graph.yaml \
--inputs message=hello \
--checkpoint ./.stargraph/serve.sqlite \
--log-file ./.stargraph/audit.jsonl
Capture the run_id=… from the last stdout line.
RUN_ID=$(uv run stargraph run graph.yaml --inputs message=hello \
--checkpoint ./.stargraph/serve.sqlite --no-summary | tail -1 | awk '{print $1}' | cut -d= -f2)
echo "$RUN_ID"
Step 3 — Poll over HTTP¶
Expected RunSummary shape:
{
"run_id": "run-…",
"status": "done",
"graph_hash": "sha256:…",
"started_at": "…",
"finished_at": "…"
}
The list endpoint is paginated:
Step 4 — Try the enqueue surface¶
POST /v1/runs enqueues against the in-process Scheduler. In the POC
the Scheduler resolves the future quickly with a synthetic
RunSummary and returns {run_id: "poc-<graph_id>", status: "pending"}
— it does NOT execute your graph against the live DB yet (that wiring
is Phase 2 task 2.30). Useful for confirming the queueing path works
end-to-end:
curl -s -X POST http://127.0.0.1:8000/v1/runs \
-H "Content-Type: application/json" \
-d '{"graph_id": "run:hello-stargraph", "params": {"message": "hello"}}'
# → {"run_id": "poc-run:hello-stargraph", "status": "pending"}
Step 5 — Replay the real run¶
stargraph replay forks a counterfactual run from a persisted parent at
a chosen --from-step. Without a --mutation JSON, the cf-run still
gets a fresh cf-<uuid> id and a derived graph_hash per design
§3.8.3.
Expected stdout — the cf-run id followed by the parent vs cf
RunDiff rendered as canonical JSON via stargraph.ir.dumps:
cf_run_id=cf-…
{
"added_steps": [...],
"removed_steps": [...],
"state_deltas": {...},
"fact_deltas": {...},
"derived_hash": "sha256:..."
}
A no-op mutation produces an empty diff; the derived_hash still
captures the cf-side identity.
Step 6 — Mutate state on replay¶
Save a CounterfactualMutation JSON file:
cat > mutation.json <<EOF
{
"state_overrides": {"message": "mutated"},
"facts_assert": [],
"facts_retract": []
}
EOF
Then replay with the mutation overlay:
uv run stargraph replay "$RUN_ID" \
--db ./.stargraph/serve.sqlite \
--mutation mutation.json \
--from-step 0 \
--diff
The cf-run now has state.message="mutated" at step 0; the diff
shows the divergence. Pipe cf_run_id into stargraph inspect to walk
the cf-side timeline:
CF_RUN_ID=$(uv run stargraph replay "$RUN_ID" --db ./.stargraph/serve.sqlite \
--mutation mutation.json | head -1 | cut -d= -f2)
uv run stargraph inspect "$CF_RUN_ID" --db ./.stargraph/serve.sqlite
Step 7 — Verify determinism¶
Replay the same run twice with no mutation. The cf-run ids differ
(fresh uuid each time) but the derived_hash is identical — this is
the bit-identity contract per FR-19 / FR-28 amendment 6 (no
set/frozenset state fields, no race/any parallel branches with
write side effects, compiled state_schema folded into the graph
hash).
uv run stargraph replay "$RUN_ID" --db ./.stargraph/serve.sqlite --diff | jq -r '.derived_hash'
uv run stargraph replay "$RUN_ID" --db ./.stargraph/serve.sqlite --diff | jq -r '.derived_hash'
# → identical hashes
Cleaning up¶
The Scheduler is started inside stargraph serve's lifespan; Ctrl-C
on the serve process stops it cleanly (drain + close), then closes
the checkpointer. The SQLite WAL is single-writer, so you cannot run
stargraph serve and stargraph run --checkpoint against the same DB
simultaneously — pick one writer at a time.
What to read next¶
- Serve → API — full route table, error envelopes, rate-limit headers.
- Serve → Scheduler — cron loop, per-graph concurrency, idempotency keys.
- Engine → Replay — cassette mechanics and determinism guards.
- Engine → Counterfactual — full
CounterfactualMutationschema (state_overrides,facts_assert,facts_retract,rule_pack_version,node_output_overrides).