Architecture
Forge is a Rust workspace (15 crates, listed below) plus a React/TypeScript frontend. This doc explains the crate layout, the task state machine, the database, and the event bus. For runtime configuration see getting-started.md; for the HTTP surface see api.md.
Crate layout
crates/
├── forge-cli/       # Binary entrypoint, server startup, CLI commands
├── forge-client/    # forge-ctl CLI client
├── forge-daemon/    # Local daemon detection and reporting
├── api/             # Axum REST endpoints, SSE, middleware
├── api-types/       # Shared request/response types (zero internal deps)
├── db/              # SQLite schema, migrations, repository implementations
├── services/        # Business logic (task state machine, workflow engine)
├── executors/       # TaskExecutor trait, Shell executor, JSONL logging
├── cli-adapters/    # Codex, Claude, Gemini, opencode, shell, null adapters
├── workspace/       # Git worktree lifecycle, locking, path guardrails
├── git/             # Low-level git operations
├── review/          # CI runner, auditor orchestration
├── events/          # In-memory event bus (tokio broadcast)
├── mcp-server/      # MCP JSON-RPC tools for agent integration
└── config/          # Configuration loading, defaults
Dependency flow
forge-cli → api → services → db → events
                              ↑
          → mcp-server -------┘
          → executors (log schema, shell executor)
          → workspace → git
          → config
          → api-types (shared request/response types, zero internal deps)
Architectural patterns
Repository trait pattern
The db crate defines async traits (TaskRepo, AgentRepo, …) in
repository.rs and implements them all on a single SqliteDb struct in
sqlite.rs. Services and routes call trait methods as
TaskRepo::create(&*state.db, ...).
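The call style above can be illustrated with a minimal sketch. It is a synchronous, in-memory simplification (the real traits are async and backed by SQLite); the `Task` and `Agent` fields, the integer ids, and the method signatures are assumptions made for illustration only.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical, simplified models; the real ones live in the db crate.
#[derive(Clone, Debug, PartialEq)]
pub struct Task { pub id: u64, pub title: String }

#[derive(Clone, Debug, PartialEq)]
pub struct Agent { pub id: u64, pub name: String }

// One trait per aggregate (synchronous here; async in Forge).
pub trait TaskRepo {
    fn create(&self, title: &str) -> Task;
    fn get(&self, id: u64) -> Option<Task>;
}

pub trait AgentRepo {
    fn create(&self, name: &str) -> Agent;
}

// A single struct implements every repository trait.
// Vec-backed storage stands in for the SQLite connection pool.
#[derive(Default)]
pub struct SqliteDb {
    tasks: Mutex<Vec<Task>>,
    agents: Mutex<Vec<Agent>>,
}

impl TaskRepo for SqliteDb {
    fn create(&self, title: &str) -> Task {
        let mut tasks = self.tasks.lock().unwrap();
        let task = Task { id: tasks.len() as u64 + 1, title: title.to_string() };
        tasks.push(task.clone());
        task
    }

    fn get(&self, id: u64) -> Option<Task> {
        self.tasks.lock().unwrap().iter().find(|t| t.id == id).cloned()
    }
}

impl AgentRepo for SqliteDb {
    fn create(&self, name: &str) -> Agent {
        let mut agents = self.agents.lock().unwrap();
        let agent = Agent { id: agents.len() as u64 + 1, name: name.to_string() };
        agents.push(agent.clone());
        agent
    }
}

// Shared state holds the database behind an Arc, as described above.
pub struct AppState { pub db: Arc<SqliteDb> }
```

In this sketch both traits expose a `create` method, so `state.db.create(...)` would be ambiguous; qualifying the call with the trait name (`TaskRepo::create(&*state.db, "write docs")`) selects the intended implementation. Whether that is the reason for the convention in Forge is not stated in the source.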
Error propagation chain
DbError (db) → ServiceError (services) → ApiError (api). The api crate’s
errors.rs maps domain errors to HTTP status codes. All errors render as
ErrorResponse { code, message, details, request_id }.
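A chain like this is usually wired with `From` conversions so that `?` lifts errors one layer at a time. The sketch below assumes simplified error types: only `DbError::VersionConflict` and its 409 mapping come from this document, while `NotFound`, the 404 mapping, the `code` strings, and the three helper functions are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum DbError {
    NotFound,        // hypothetical variant
    VersionConflict, // named in this document
}

#[derive(Debug, PartialEq)]
pub enum ServiceError {
    Db(DbError),
}

// Stand-in for the api crate's error type; fields are assumptions.
#[derive(Debug, PartialEq)]
pub struct ApiError {
    pub status: u16,
    pub code: &'static str,
    pub message: String,
}

impl From<DbError> for ServiceError {
    fn from(err: DbError) -> Self {
        ServiceError::Db(err)
    }
}

impl From<ServiceError> for ApiError {
    fn from(err: ServiceError) -> Self {
        // Domain error to HTTP status mapping happens in exactly one place.
        let (status, code) = match &err {
            ServiceError::Db(DbError::NotFound) => (404, "not_found"),
            ServiceError::Db(DbError::VersionConflict) => (409, "version_conflict"),
        };
        ApiError { status, code, message: format!("{err:?}") }
    }
}

// Each layer uses `?`, which applies the From impls above.
fn db_update(version_matches: bool) -> Result<u32, DbError> {
    if version_matches { Ok(2) } else { Err(DbError::VersionConflict) }
}

fn service_transition(version_matches: bool) -> Result<u32, ServiceError> {
    Ok(db_update(version_matches)?)
}

fn api_handler(version_matches: bool) -> Result<u32, ApiError> {
    Ok(service_transition(version_matches)?)
}
```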
AppState wiring
forge-cli/main.rs creates Arc<SqliteDb> and Arc<EventBus>, passes them to
AppState::new() which constructs TaskService and AgentService internally.
AppState is Clone (all fields are Arc) and used as Axum state.
Event bus
The events crate wraps tokio::sync::broadcast. Services publish
ForgeEvent on state changes; the SSE endpoint at /api/v1/events subscribes
and streams them to web clients and other listeners.
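The publish/subscribe shape can be sketched with the standard library alone. This is not the events crate: it swaps `tokio::sync::broadcast` for one `std::sync::mpsc` channel per subscriber, and the `ForgeEvent` variant and its fields are assumptions.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

// Hypothetical event shape; the real ForgeEvent has its own variants.
#[derive(Clone, Debug, PartialEq)]
pub enum ForgeEvent {
    TaskStatusChanged { task_id: String, to: String },
}

// Cloneable handle; all clones share the same subscriber list.
#[derive(Clone, Default)]
pub struct EventBus {
    subscribers: Arc<Mutex<Vec<Sender<ForgeEvent>>>>,
}

impl EventBus {
    /// Register a new listener (for example, one SSE connection).
    pub fn subscribe(&self) -> Receiver<ForgeEvent> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Send a copy of the event to every live subscriber and
    /// return how many subscribers remain registered.
    pub fn publish(&self, event: ForgeEvent) -> usize {
        let mut subs = self.subscribers.lock().unwrap();
        // A failed send means the receiver was dropped; forget it.
        subs.retain(|tx| tx.send(event.clone()).is_ok());
        subs.len()
    }
}
```

Unlike a bounded broadcast channel, this sketch never drops events for slow subscribers; it only shows the fan-out from services to listeners.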
Task state machine
todo ──────────────► in_progress ──────► review ──────► merging ──────► done
 │                      │                  │              │
 └──► cancelled ◄───────┴──────────────────┴──────────────┤
                                                          │
                                                    merge_failed ──► blocked
All non-terminal states can transition to cancelled. Terminal states: done,
cancelled. The default workflow lives in
crates/services/src/workflow/default_workflow.rs with sequence
backlog → todo → planning → in_progress → review → merging → done and
merge_failed, blocked, cancelled as auxiliary/failure/terminal states.
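The diagram can be read as a transition table. The sketch below is one such reading, not the actual `transition_allowed` from the services crate; in particular it assumes that `merge_failed` is entered from `merging`, and it omits `backlog` and `planning`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskStatus {
    Todo,
    InProgress,
    Review,
    Merging,
    Done,
    MergeFailed,
    Blocked,
    Cancelled,
}

impl TaskStatus {
    /// Terminal states accept no outbound transitions.
    pub fn is_terminal(self) -> bool {
        matches!(self, TaskStatus::Done | TaskStatus::Cancelled)
    }
}

/// Returns true when `from -> to` is an edge in the diagram.
pub fn transition_allowed(from: TaskStatus, to: TaskStatus) -> bool {
    use TaskStatus::*;
    if from.is_terminal() {
        return false;
    }
    match (from, to) {
        // Every non-terminal state may be cancelled.
        (_, Cancelled) => true,
        // Forward path.
        (Todo, InProgress) | (InProgress, Review) | (Review, Merging) | (Merging, Done) => true,
        // Failure path (assumed to start at merging).
        (Merging, MergeFailed) | (MergeFailed, Blocked) => true,
        _ => false,
    }
}
```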
Workflow engine (in progress)
Flexible workflow work is partially implemented. WorkflowEngine in
crates/services/src/workflow/engine.rs is the new data-driven path;
TaskService.transition() still uses the legacy TaskStatus/transition_allowed
path. Treat the engine as a parallel code path until the split is removed.
Workflows are project-defined JSON in project.workflow_definition. Empty
string or "{}" resolves at runtime to the built-in DefaultWorkflow.
WorkflowCache caches resolved definitions per project and invalidates on
workflow updates.
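Resolution and caching can be sketched as follows. The types and method names (`ResolvedWorkflow`, `get_or_resolve`, `invalidate`, the `misses` counter) are hypothetical, and a custom definition is kept as an unparsed string to avoid a JSON dependency.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub enum ResolvedWorkflow {
    /// Stand-in for the built-in DefaultWorkflow.
    Default,
    /// A project-defined workflow, left as raw JSON text in this sketch.
    Custom(String),
}

/// An empty string or "{}" falls back to the built-in workflow.
pub fn resolve(definition: &str) -> ResolvedWorkflow {
    match definition {
        "" | "{}" => ResolvedWorkflow::Default,
        other => ResolvedWorkflow::Custom(other.to_string()),
    }
}

#[derive(Default)]
pub struct WorkflowCache {
    entries: HashMap<String, ResolvedWorkflow>,
    /// Number of cache misses, exposed only to make the behavior visible.
    pub misses: usize,
}

impl WorkflowCache {
    /// Return the cached workflow for a project, resolving it on first use.
    pub fn get_or_resolve(&mut self, project_id: &str, definition: &str) -> ResolvedWorkflow {
        if let Some(hit) = self.entries.get(project_id) {
            return hit.clone();
        }
        self.misses += 1;
        let resolved = resolve(definition);
        self.entries.insert(project_id.to_string(), resolved.clone());
        resolved
    }

    /// Call when a project's workflow definition is updated.
    pub fn invalidate(&mut self, project_id: &str) {
        self.entries.remove(project_id);
    }
}
```

The sketch makes the invalidation requirement concrete: until `invalidate` is called, a changed definition is not picked up.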
StateKind classifies states:
- backlog: parking lot; agent claims rejected.
- initial: exactly one per workflow; validation rejects zero or multiple.
- active: work state; may declare a role such as coder.
- gate: validation/processing state; gate_config.max_rejections enables retry-budget checks.
- terminal: absorbing state; outbound transitions and non-terminal cancellation targets are rejected.
- custom: no built-in behavior beyond graph validation.
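Two of these rules (exactly one initial state, no outbound edges from terminal states) are simple enough to sketch. `StateDef`, both function names, and the error strings are hypothetical; the real checks live in the workflow validation module.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StateKind {
    Backlog,
    Initial,
    Active,
    Gate,
    Terminal,
    Custom,
}

// Hypothetical minimal state definition.
pub struct StateDef {
    pub name: &'static str,
    pub kind: StateKind,
}

/// Returns the name of the single initial state, or an error when
/// the workflow declares zero or several.
pub fn validate_initial(states: &[StateDef]) -> Result<&'static str, String> {
    let initial: Vec<&StateDef> = states
        .iter()
        .filter(|s| s.kind == StateKind::Initial)
        .collect();
    match initial.as_slice() {
        [only] => Ok(only.name),
        [] => Err("workflow has no initial state".to_string()),
        many => Err(format!("workflow has {} initial states", many.len())),
    }
}

/// Terminal states are absorbing, so any outbound edge is invalid.
pub fn validate_outbound(from: &StateDef, to: &str) -> Result<(), String> {
    if from.kind == StateKind::Terminal {
        Err(format!("terminal state {} cannot transition to {}", from.name, to))
    } else {
        Ok(())
    }
}
```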
WorkflowEngine::transition lifecycle for A → B:
- Load task, check optimistic version, validate the graph edge or implicit cancellation path.
- Run filtered A.before_exit guards unless B is the cancellation target; FailurePolicy::Block failures return GuardRejection (HTTP 412).
- Update task.status, increment version, write transition_log, publish task.status_changed.
- Run filtered A.on_exit, filtered B.on_enter, then effective B.after_enter hooks. Gate states with max_rejections get check_retry_budget prepended unless already present.
- Backfill transition_log.hook_results_json.
- If an after_enter hook returns HookResult::Cascade, recursively transition with triggered_by = "system"; cascade depth is limited to 3.
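The cascade step is the part of the lifecycle that recurses, so a depth guard matters. The sketch below models only that step, under two assumptions the source does not settle: the limit is read as "at most 3 system-triggered cascades after the initial transition", and exceeding it is reported as an error. The function signature, the `Continue` variant, and the string log are hypothetical.

```rust
pub const MAX_CASCADE_DEPTH: u32 = 3;

#[derive(Debug, PartialEq)]
pub enum HookResult {
    /// Hypothetical "nothing more to do" result.
    Continue,
    /// Ask the engine to move on to another state.
    Cascade(&'static str),
}

#[derive(Debug, PartialEq)]
pub enum TransitionError {
    CascadeDepthExceeded,
}

/// Apply `status -> to`, then follow cascades requested by the
/// after_enter hook. Guards, on_exit and on_enter hooks are omitted.
pub fn transition(
    status: &mut String,
    to: &str,
    triggered_by: &str,
    depth: u32,
    after_enter: &dyn Fn(&str) -> HookResult,
    log: &mut Vec<String>,
) -> Result<(), TransitionError> {
    if depth > MAX_CASCADE_DEPTH {
        return Err(TransitionError::CascadeDepthExceeded);
    }
    // Stand-in for writing a transition_log row.
    log.push(format!("{status} -> {to} by {triggered_by}"));
    *status = to.to_string();
    match after_enter(to) {
        HookResult::Continue => Ok(()),
        // Cascaded transitions are always attributed to "system".
        HookResult::Cascade(next) => {
            transition(status, next, "system", depth + 1, after_enter, log)
        }
    }
}
```

A hook that always cascades is stopped after the initial transition plus three cascades, which is what keeps a misconfigured workflow from looping forever.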
Hook audience filtering is uniform across phases. HookAudience::All always
runs. AgentOnly runs when triggered_by starts with "agent:" or equals
"system"; UserOnly runs only when it starts with "user:". Non-matching
hooks are skipped without a hook-result entry.
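The filtering rule reduces to a small predicate. The function name `audience_matches` is hypothetical; the matching rules are taken from the paragraph above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HookAudience {
    All,
    AgentOnly,
    UserOnly,
}

/// Decide whether a hook with the given audience runs for this trigger.
pub fn audience_matches(audience: HookAudience, triggered_by: &str) -> bool {
    match audience {
        HookAudience::All => true,
        // System-triggered (cascaded) transitions count as agent activity.
        HookAudience::AgentOnly => {
            triggered_by.starts_with("agent:") || triggered_by == "system"
        }
        HookAudience::UserOnly => triggered_by.starts_with("user:"),
    }
}
```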
Cancellation is implicit from any non-terminal state to
workflow.cancellation_state (or terminal "cancelled" if unset), even
without an explicit edge. Project before_exit guards are bypassed for this
path; on_exit and cancellation-state on_enter hooks still run.
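A minimal sketch of the cancellation rules, assuming a pared-down `Workflow` struct with only the two fields needed here; the method names are hypothetical.

```rust
// Hypothetical subset of a workflow definition.
pub struct Workflow {
    pub cancellation_state: Option<String>,
    pub terminal_states: Vec<String>,
}

impl Workflow {
    /// The configured cancellation state, or "cancelled" when unset.
    pub fn cancellation_target(&self) -> &str {
        self.cancellation_state.as_deref().unwrap_or("cancelled")
    }

    /// True when `from -> to` is allowed without an explicit edge:
    /// the target is the cancellation state and `from` is not terminal.
    pub fn is_implicit_cancellation(&self, from: &str, to: &str) -> bool {
        to == self.cancellation_target()
            && !self.terminal_states.iter().any(|s| s == from)
    }

    /// before_exit guards are bypassed on the cancellation path;
    /// on_exit and on_enter hooks are not affected by this check.
    pub fn runs_before_exit_guards(&self, to: &str) -> bool {
        to != self.cancellation_target()
    }
}
```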
Roles and assignments
Roles are declared by workflow (roles[]) and states can require a role
(state.role). Per-task assignments live in task_role_assignment keyed by
(task_id, role_name) with either agent_id or user_handle. Claiming
auto-assigns the claimed state’s role to the claiming agent when no assignment
exists; a conflicting pre-assignment returns HTTP 409.
assignee is an engine-reserved role name. Active states without explicit
state.role implicitly bind assignee. This fallback applies only to Active
states; Gate, Initial, Backlog, Terminal, and Custom states without roles bind
no role. state.role = Some("assignee") on a non-Active state is rejected
during validation. DefaultWorkflow is unchanged and uses declared planner,
coder, and reviewer roles.
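The `assignee` fallback can be written as one function over the state kind and the declared role. `effective_role` and the error text are hypothetical; the rules follow the paragraph above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StateKind {
    Backlog,
    Initial,
    Active,
    Gate,
    Terminal,
    Custom,
}

/// Engine-reserved role name.
pub const ASSIGNEE: &str = "assignee";

/// Resolve the role a state binds, given its optional `state.role`.
/// Returns Err for the combination that validation rejects.
pub fn effective_role(kind: StateKind, declared: Option<&str>) -> Result<Option<&str>, String> {
    match (kind, declared) {
        // Active states without an explicit role fall back to assignee.
        (StateKind::Active, None) => Ok(Some(ASSIGNEE)),
        (StateKind::Active, Some(role)) => Ok(Some(role)),
        // The reserved name is only valid on Active states.
        (_, Some(role)) if role == ASSIGNEE => {
            Err(format!("role {ASSIGNEE} is reserved for active states"))
        }
        // Other kinds bind exactly what they declare, possibly nothing.
        (_, role) => Ok(role),
    }
}
```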
Retry budgets
Retry budgets are derived from the audit log. Gate states may set gate_config.max_rejections;
check_retry_budget counts transition_log rows with from_state = gate and
rejection = true, then cascades to blocked when exhausted. Generic
user-triggered gate-to-active bounces are logged with rejection = false and
do not consume budget.
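Because the budget is computed from the log rather than stored, the check is a filter and a count. The row struct and function names below are hypothetical, and treating "exhausted" as `used >= max_rejections` is an assumption; the real comparison may differ.

```rust
// Hypothetical subset of a transition_log row.
pub struct TransitionLogRow {
    pub from_state: String,
    pub to_state: String,
    pub rejection: bool,
}

/// Count rejections that left the given gate state.
pub fn rejections_used(log: &[TransitionLogRow], gate: &str) -> usize {
    log.iter()
        .filter(|row| row.from_state == gate && row.rejection)
        .count()
}

/// True when the gate has no retries left (assumed: used >= max).
pub fn budget_exhausted(log: &[TransitionLogRow], gate: &str, max_rejections: usize) -> bool {
    rejections_used(log, gate) >= max_rejections
}

/// Convenience constructor to keep examples short.
pub fn row(from: &str, to: &str, rejection: bool) -> TransitionLogRow {
    TransitionLogRow {
        from_state: from.to_string(),
        to_state: to.to_string(),
        rejection,
    }
}
```

Rows logged with `rejection = false`, such as a user bouncing a task back to an active state, pass through the filter without being counted.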
transition_log is the audit source of truth for state changes. The API
exposes it via GET /api/v1/tasks/{id}/transitions.
Files of interest
- crates/services/src/workflow/engine.rs: lifecycle
- crates/services/src/workflow/actions/: curated hook actions
- crates/services/src/workflow/default_workflow.rs: built-in graph
- crates/services/src/workflow/validation.rs: workflow graph validation
- crates/services/src/workflow/cache.rs: per-project resolved definitions
- crates/services/src/workflow/registry.rs: action name resolution
- crates/db/migrations/V009__workflow_engine.sql: project.workflow_definition, task_role_assignment, transition_log
Happy path
The canonical end-to-end flow is captured by crates/api/tests/happy_path.rs.
It boots the in-process Axum router with an embedded daemon and a real temp
git repo, drives a task through todo → in_progress → review → merging → done,
and asserts:
- The merge SHA lands on the default branch.
- The worktree is removed.
- One review row with status=passed is persisted.
- The expected event sequence appears on the bus.
Any refactor that breaks this test likely needs a spec realignment before
merging. Claiming a task auto-dispatches the executor via tokio::spawn in
api::routes::tasks::claim_task — there is no separate “dispatch” endpoint.
Concurrency control
Tasks and agents use optimistic concurrency via a version column. Updates
require WHERE version = ? and increment on success. Version mismatch →
DbError::VersionConflict → HTTP 409.
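An in-memory sketch of the same check, assuming a simplified `TaskRow` and a unit-like `VersionConflict` variant; in the database the comparison and increment happen in a single `UPDATE ... WHERE version = ?` statement.

```rust
#[derive(Debug, PartialEq)]
pub enum DbError {
    VersionConflict,
}

// Hypothetical subset of a task row.
#[derive(Debug, PartialEq)]
pub struct TaskRow {
    pub status: String,
    pub version: i64,
}

/// Apply the update only if the caller saw the current version.
/// Returns the new version on success.
pub fn update_status(
    row: &mut TaskRow,
    expected_version: i64,
    new_status: &str,
) -> Result<i64, DbError> {
    if row.version != expected_version {
        // Someone else updated the row first; the caller must reload.
        return Err(DbError::VersionConflict);
    }
    row.status = new_status.to_string();
    row.version += 1;
    Ok(row.version)
}
```

Two writers that both read version 1 illustrate the effect: the first update succeeds and bumps the row to version 2, and the second fails with `VersionConflict` instead of silently overwriting.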
Database
SQLite with WAL mode. Schema in
crates/db/migrations/V001__initial_schema.sql. Migrations are numbered
V{NNN}__{name}.sql and tracked in the _migration table. All primary keys are
app-generated UUID v4; all timestamps are app-generated RFC3339.
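The naming scheme can be checked with a small parser. `parse_migration_name` is a hypothetical helper, not part of the db crate, and it assumes that `NNN` means exactly three ASCII digits.

```rust
/// Split a file name of the form V{NNN}__{name}.sql into its
/// numeric version and name. Returns None for anything else.
pub fn parse_migration_name(file_name: &str) -> Option<(u32, &str)> {
    let stem = file_name.strip_suffix(".sql")?;
    let rest = stem.strip_prefix('V')?;
    // The first double underscore separates version from name.
    let (number, name) = rest.split_once("__")?;
    let three_digits = number.len() == 3 && number.bytes().all(|b| b.is_ascii_digit());
    if !three_digits || name.is_empty() {
        return None;
    }
    let version: u32 = number.parse().ok()?;
    Some((version, name))
}
```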
Connection pool sets PRAGMA foreign_keys=ON, journal_mode=WAL,
busy_timeout=5000 per connection.
Tables: project, repo, agent, skill, task, execution, review,
task_role_assignment, transition_log, _migration.
For tests, use create_sqlite_pool("sqlite::memory:") for an in-memory
database.
Frontend
The frontend uses React, TypeScript, Vite, and TanStack Query/Router. Source
lives in web/src/, and the @ path alias resolves to web/src/. The API client
at web/src/api/client.ts calls /api/v1/* endpoints. Types in
web/src/types/generated/api.ts must match the api-types crate responses.
Crate notes
- db: Enum serialization uses Display/FromStr (in models.rs) for SQLite TEXT columns. Row mapping is manual via sqlx::Row::get(), not compile-time checked macros.
- services: TaskService.transition() handles side effects (event emission, counter increments, ReviewRunner on → review, MergeService on review → merging, WorkspaceCleanupScheduler on → done / → cancelled). Background tasks: CrashRecovery at startup, HeartbeatMonitor, DaemonMonitor, WorkspaceCleanupScheduler.
- review: ReviewRunner runs task.review_config.ci_steps as bash -lc commands in the worktree; empty steps auto-pass. Creates a reviewer-role execution sharing the executor’s workspace. Depends only on db, events, executors, not on api or services.
- api: Routes in routes/{projects,tasks,agents,repos,executions,events,daemons,clis,profiles,runtimes,executor_types}.rs. Error module is errors.rs (plural). Middleware adds request IDs and CORS. claim_task auto-dispatches the executor.
- executors: LogWriter appends JSONL with schema version + sequence numbers. ShellExecutor spawns child processes with heartbeat supervision.
- mcp-server: JSON-RPC dispatch over POST /mcp with its own McpState. Does not depend on the api crate.
- workspace: File-based locking via .forge.lock. Path validation prevents traversal escapes.
- config: ForgeConfig resolves values with precedence CLI flags > env vars > config file > defaults. Default bind 127.0.0.1:8080.
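The config precedence can be sketched for a single setting. `resolve_bind` is a hypothetical helper that takes the three optional sources as already-parsed values; how ForgeConfig actually reads flags, environment variables, and files is not shown in this document.

```rust
/// Default bind address named in the config crate notes.
pub const DEFAULT_BIND: &str = "127.0.0.1:8080";

/// Pick the bind address: CLI flag, then env var, then config file,
/// then the built-in default.
pub fn resolve_bind(cli: Option<&str>, env: Option<&str>, file: Option<&str>) -> String {
    cli.or(env).or(file).unwrap_or(DEFAULT_BIND).to_string()
}
```

For example, `resolve_bind(None, Some("10.0.0.1:80"), Some("127.0.0.1:3000"))` yields the env value because no CLI flag was given.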