Roadmap Completion TODO¶
This is the master execution checklist for finishing the G-Agent roadmap. It is
derived from the current codebase plus ROADMAP.md, the roadmap phase files,
and docs/reports/hermes-nanobot-reference-audit.md.
Use this file as the working board. The phase documents explain why each item exists; this file tracks what remains to ship.
Product Target¶
G-Agent is an agentic digital character runtime. The finished product should let an owner run durable characters with identity, memory, visual presence, tools, routines, channel presence, safe approvals, and owner-reviewed learning.
The product is not a generic automation gateway. Hermes is the reference for growth loops, memory, learning, skills, context, approvals, and routines. Nanobot is the reference for channels, Web UI, OpenAI-compatible API, MCP, runner structure, and operational hardening.
Current Baseline¶
- [x] Python runtime lives in
backend/agent/g_agent/. - [x] WhatsApp bridge lives in
backend/agent/bridge/. - [x] MkDocs source lives in
docs/. - [x] Tests are flat under
backend/agent/tests/. - [x] Session SQLite first slice exists in
session/sqlite_store.py. - [x] Shared command router exists in
command/. - [x] Character profile core exists in
character/. - [x] Learning queue first slice exists in
learning/. - [x] Skill lifecycle first slice exists in
skills/andagent/tools/skills.py. - [x] Context engine and compressor first slice exists in
context/. - [x] Routine first slice exists in
routines/. - [x] Toolset and MCP first slice exists in
agent/tools/toolsets.pyandmcp/manager.py. - [x] Insights and public-trust first slice exists in
observability/insights.pyplus docs. - [x] Shared channel contracts exist for capability flags, media envelopes, and delivery result/error types.
- [x] First-party product API server exists under
g_agent/api/. - [x] WebSocket channel exists at
channels/websocket.pyas minimal aiohttp channel, not Nanobot full surface. Advanced Nanobot features missing: token issuance endpoint, media signing, SSL/TLS, capability registry entry gap. - [x] First-party
webui/exists: React SPA source/build/static serving/bootstrap tests exist; implemented core, tests minimal. Some stale Nanobot naming artifacts. - [x] Formal MemoryManager first slice exists under
g_agent/memory/with provider registration/order/failure isolation/fenced output tests. Production-grade Hermes parity not proven. - [x] Background learning reviewer first slice exists with tightened heuristics.
- [x] Streamable HTTP MCP transport exists in
mcp/manager.py. - [x] Docker execution backend exists as transient/stateless container scaffold with validation tests (23 tests pass); not production-grade stateful/hardened persistent backend.
Execution Order¶
Build in this order unless a production bug forces a hotfix:
- Finish v0.4 channel reliability.
- Finish v0.7 memory manager and owner model.
- Finish v0.8/v0.9 learning and skills growth loop.
- Finish v0.10 context compression integration.
- Finish v0.11 routines and triggers.
- Finish v0.12 MCP and execution backend gaps.
- Build v0.5 API, WebSocket, and Web UI after backend state is stable.
- Keep v0.13 trust docs current after each shipped slice.
Reason: sessions, commands, approvals, channels, memory, and learning are the substrate. Web UI should become a control room on top of real state, not a thin chat page over unfinished internals.
P0: Commit And Push Hygiene¶
- [ ] Keep commits scoped to one roadmap slice.
- [ ] After every commit, update this TODO with what shipped and what is next.
- [ ] After every commit, report:
- commit hash
- changed files
- verification commands
- next best move
- [ ] Push only when the owner asks or when explicitly finishing a release slice.
v0.1: Stabilize Current Runtime¶
Status: mostly shipped, keep as maintenance baseline.
- [x] Keep workspace restriction enabled by default.
- [x] Keep
tools.allowedPathsas the official trusted-path mechanism. - [x] OpenAI-compatible image proxy path exists in visual/selfie tooling.
- [x] Google Workspace helper paths are covered by tests.
- [x]
/logsexists and exposes bounded task checkpoint output. - [x] Troubleshooting docs exist for setup and runtime operations.
- [ ] Keep image proxy docs updated when provider payloads change.
- [ ] Keep service/PATH troubleshooting updated when
gws,gcloud, or bridge service behavior changes. - [ ] Keep runtime log redaction tests current as new secret-like fields are introduced.
Verification target:
- [ ]
ruff check g_agent tests --select F - [ ]
python -m compileall -q g_agent - [ ] focused tests around visual providers, Google Workspace helpers, runtime checkpoints, and security audit.
v0.2: Session Store And Recall¶
Status: first slice shipped; polish remains.
- [x] Add
SessionSQLiteStore. - [x] Add
sessions,messages,tool_calls,media_refs, and FTS tables. - [x] Enable WAL and schema idempotency.
- [x] Add write retry.
- [x] Keep JSONL sessions readable.
- [x] Dual-write from
SessionManager.save(). - [x] Add SQLite cleanup for
/new, archive, and delete. - [x] Add
session_search. - [x] Add
/historyrecall. - [x] Preserve
/searchas web search. - [x] Add punctuation-heavy fallback search for commands, paths, and URLs.
- [ ] Add explicit JSONL historical backfill/import command.
- [ ] Add backfill dry-run output.
- [ ] Add backfill conflict handling for duplicate session keys.
- [ ] Add richer context windows around each search hit.
- [ ] Add grouped search summaries by session.
- [ ] Add owner-facing SQLite/session status command.
- [ ] Add session title generation or owner-editable titles.
- [ ] Add parent-session lineage support for compacted/forked sessions.
- [ ] Add more channel/source filtering tests.
- [ ] Add migration docs for old JSONL-only workspaces.
Verification target:
- [ ]
pytest -q tests/test_session_sqlite_store.py tests/test_session_new_command.py - [ ] Search tests must preserve commands, paths, URLs, decisions, and unresolved items.
v0.3: Commands, Logs, And Approvals¶
Status: core shipped; approval state remains incomplete.
- [x] Add shared
CommandRouter. - [x] Add shared
CommandContext. - [x] Wire direct CLI/chat command dispatch through shared router.
- [x] Add quoted-argument parsing.
- [x] Add helpful unknown command responses.
- [x] Add
/status. - [x] Add
/logs. - [x] Add
/new. - [x] Add
/sessions. - [x] Add
/history. - [x] Add
/approvereplay path. - [x] Add
/deny. - [x] Add
/learn. - [x] Add
/skills. - [x] Persist approval decisions into first-class approval state.
- [x] Add approval ids that survive process restarts.
- [x] Add approve-once behavior.
- [x] Add approve-for-session behavior.
- [x] Add narrow persistent allowlist behavior.
- [x] Add owner command to list pending approvals.
- [x] Add owner command to clear one pending approval by id.
- [x] Add owner command to clear all pending approvals for a session.
- [x] Add
security/approval_policy.py. - [x] Add risky shell classifier examples.
- [x] Add risky filesystem classifier examples.
- [x] Add tests for dangerous command detection.
- [x] Add tests for approved command replay safety.
- [ ] Move more legacy handlers from
channels/slash_commands.pyintocommand/builtin.py. - [ ] Regenerate CLI docs after CLI command surface changes.
Verification target:
- [ ]
pytest -q tests/test_slash_command_router.py - [x] Add
tests/test_approval_state.py. - [x] Add
tests/test_approval_policy.py.
v0.4: Core Channel Reliability¶
Status: next best move; partial, not shipped.
- [x] Telegram channel file exists.
- [x] WhatsApp channel file exists.
- [x] Discord channel file exists.
- [x] Email channel file exists.
- [x] Slack channel file exists.
- [x]
BaseChannel._handle_message()normalizes inbound text/media basics. - [x]
BaseChannel._handle_message()enforcesallow_from. - [x]
ChannelManagersupervises channel restarts. - [x]
ChannelManagerretries outbound sends. - [x] Tests cover reconnects, ported channel config, bridge token auth, CLI bridge login, and multimodal outbound basics.
- [ ] Add
channels/capabilities.py. - [ ] Define
ChannelCapabilities. - [ ] Add
supports_media_send. - [ ] Add
supports_media_receive. - [ ] Add
supports_buttons. - [ ] Add
supports_typing. - [ ] Add
supports_threads. - [ ] Add
supports_reactions. - [ ] Add
max_text_chars. - [ ] Add
parse_mode. - [ ] Expose capabilities from WhatsApp.
- [ ] Expose capabilities from Telegram.
- [ ] Expose capabilities from Discord.
- [ ] Expose capabilities from Email.
- [ ] Expose capabilities from Slack.
- [x] Surface channel capabilities through
/statusor a channel diagnostics command. - [x] Add
channels/media.py. - [x] Define normalized inbound media envelope.
- [x] Define normalized outbound media envelope.
- [x] Include path/url fields.
- [x] Include mime type.
- [x] Include filename.
- [x] Include size.
- [x] Include content hash when local file exists.
- [x] Include channel metadata.
- [x] Keep
InboundMessage.media: list[str]compatibility while adding richer metadata. - [x] Add
channels/errors.py. - [x] Define delivery result model.
- [x] Define delivery error codes.
- [x] Normalize auth failure errors.
- [x] Normalize disconnected bridge errors.
- [x] Normalize unsupported-media errors.
- [x] Normalize sandbox/allowed-path errors.
- [x] Normalize rate/flood errors.
- [x] Normalize message-too-long errors.
- [x] Add shared long-message splitter.
- [x] Add per-channel split limits.
- [x] Preserve code blocks when splitting where possible.
- [ ] Preserve links when splitting where possible.
- [x] Add tests for splitter edge cases.
- [x] Harden WhatsApp bridge diagnostics.
- [x] Add WhatsApp QR/login status command or diagnostics surface.
- [x] Distinguish bridge disconnected vs auth failed vs media failed.
- [ ] Improve WhatsApp local-file media send errors.
- [ ] Improve WhatsApp sandbox/allowed-path error text.
- [ ] Add WhatsApp media delivery tests.
- [x] Harden Telegram formatting.
- [x] Add Telegram HTML/Markdown escape helper tests.
- [x] Add Telegram DM/group policy tests.
- [ ] Add Telegram rate/flood handling tests where practical.
- [x] Harden Discord attachment replies.
- [x] Add Discord DM/mention policy tests.
- [x] Add Discord thread/session key mapping tests.
- [ ] Add delivery receipts/errors where channel APIs expose them.
- [x] Extend
test_multimodal_outbound.py. - [x] Add
test_channel_capabilities.py. - [x] Add
test_media_envelope.py. - [x] Add
test_whatsapp_media_delivery.py. - [x] Add
test_telegram_formatting.py. - [x] Add
test_discord_session_mapping.py. - [ ] Update
docs/channels.md. - [ ] Update
docs/troubleshooting.mdfor channel diagnostics.
First PR boundary:
- [ ] Channel capabilities + media envelope + shared tests.
Verification target:
- [ ]
pytest -q tests/test_channel_reconnect.py tests/test_ported_channels.py tests/test_multimodal_outbound.py - [ ] new v0.4 tests listed above.
v0.5: Web UI And OpenAI-Compatible API¶
Status: partial; minimal API shipped with canonical + compatibility aliases tested, WebSocket exists as minimal aiohttp channel (not Nanobot full surface), Web UI React SPA source/build/static serving/bootstrap tests exist (core implemented, tests minimal, some stale Nanobot naming artifacts). Advanced Nanobot features missing: token issuance endpoint, media signing, SSL/TLS, capability registry entry gap.
- [x]
GatewayConfigexists inconfig/schema.py. - [x]
Agent,AgentLoop,MessageBus,ChannelManager, andSessionManagerexpose reusable runtime hooks. - [x]
observability/http_server.pyexists for metrics. - [x] Add
g_agent/api/. - [x] Add
g_agent/api/server.py. - [x] Add
g_agent/api/openai_compat.py. - [x] Add API auth/token config.
- [x] Add local-first bind defaults.
- [x] Add request size limits.
- [x] Add response error model.
- [x] Add
GET /health. - [x] Add
GET /status. - [x] Add
GET /sessions. - [x] Add
GET /sessions/{id}. - [x] Add session history response model.
- [x] Add media upload endpoint.
- [x] Store uploaded media as refs, not raw blobs in sessions.
- [x] Add
GET /approvals. - [x] Add
POST /approvals/{id}/approve. - [x] Add
POST /approvals/{id}/deny. - [x] Add
GET /learning. - [x] Add learning candidate detail endpoint.
- [x] Add learning approve/reject/edit endpoints.
- [x] Add learning apply endpoint.
- [x] Add
GET /profiles. - [x] Add profile detail endpoint.
- [ ] Add profile switch endpoint after profile isolation is implemented.
- [x] Add
GET /v1/models. - [x] Add
POST /v1/chat/completions. - [x] Add non-streaming OpenAI-compatible chat response.
- [x] Add streaming/SSE chat response.
- [x] Normalize text input.
- [x] API canonical + compatibility aliases tested: /api/health, /health, /api/status, /status, /api/v1/models, /v1/models, /api/v1/chat/completions, /v1/chat/completions.
- [ ] Normalize image input.
- [ ] Normalize base64 data URLs.
- [ ] Add remote URL policy for multimodal input.
- [ ] Add
POST /v1/responseslater. - [x] Add
channels/websocket.pyas minimal aiohttp channel (not Nanobot full surface). - [x] Add WebSocket token auth.
- [x] Add WebSocket session mapping.
- [ ] Add streaming deltas.
- [x] Add lifecycle events.
- [x] Add tool-call events.
- [x] Add approval-needed events.
- [x] Add learning-candidate events.
- [x] Add media-upload events.
- [x] Add channel-status events.
- [ ] Advanced Nanobot features missing: token issuance endpoint, media signing, SSL/TLS, capability registry entry gap.
- [x] Add
webui/: React SPA source/build/static serving/bootstrap tests exist; implemented core, tests minimal. Some stale Nanobot naming artifacts. - [x] Build session sidebar.
- [x] Build chat thread.
- [x] Build image lightbox.
- [x] Build connection/channel status panel.
- [x] Build character/profile switcher.
- [x] Build memory review panel.
- [x] Build skill review panel.
- [x] Build approvals panel.
- [x] Build routine scheduler panel.
- [x] Build provider and visual settings panel.
- [x] Add Web UI tests or smoke tests.
- [x] Add API tests.
- [x] Add WebSocket tests.
- [x] Add docs for API auth and local bind behavior.
First PR boundary:
- [x] Minimal product API with health/status/sessions plus
/v1/models.
Verification target:
- [x]
pytest -q tests/test_api_*.py - [ ]
pytest -q tests/test_websocket_*.py - [ ] Web UI smoke command once
webui/exists.
v0.6: Character Profiles And Visual Identity¶
Status: core shipped; isolation and visual merge remain.
- [x]
CharacterProfilemodel exists. - [x]
CharacterStorecan save/load/list profiles. - [x]
CharacterStorecan create default owner and guest profiles. - [x]
ContextBuilderrenders# Character Profile. - [x]
/profilecan inspect and list profiles. - [x] Global visual/selfie configuration exists.
- [x] OpenAI-compatible image proxy support exists.
- [x] Guest profile tool enforcement exists.
- [ ] Add dedicated profile validation tests.
- [ ] Fully wire profile switching into live
AgentLoop. - [ ] Ensure profile switching does not mix session history.
- [ ] Ensure profile switching does not mix memory context.
- [ ] Ensure profile switching does not mix tool policy.
- [ ] Add profile-level visual config.
- [ ] Merge profile-level visual config with global defaults.
- [ ] Pass merged visual config into
SelfieTool. - [ ] Add visual identity prompt template per profile.
- [ ] Add selfie template per profile.
- [ ] Add mirror template per profile.
- [ ] Add avatar template per profile.
- [ ] Add outfit template per profile.
- [ ] Add scene template per profile.
- [ ] Add identity anchor fields per profile.
- [ ] Add reference image roots per profile.
- [ ] Add fallback behavior when image provider fails.
- [ ] Add owner-visible profile diffs.
- [ ] Add profile diff apply/reject flow through learning queue.
- [ ] Keep docs generic and free of private character defaults.
- [ ] Update
docs/persona.md.
Verification target:
- [ ]
pytest -q tests/test_character_profiles.py tests/test_selfie_tool.py tests/test_guest_enforcement.py - [ ] Add dedicated visual identity tests.
v0.7: Memory Manager And Owner Model¶
Status: first slice shipped with provider registration/order/failure isolation/fenced output tests. Production-grade Hermes parity not proven; write cadence and external provider config remain.
- [x]
MemoryStoreexists inagent/memory.py. - [x] Markdown memory files exist.
- [x]
FACTS.mdexists. - [x]
remember,recall, andupdate_profileuse current memory store. - [x]
ContextBuilderretrieves relevant memory before prompt assembly. - [x] Add
g_agent/memory/. - [x] Add
memory/types.py. - [x] Add
MemoryProviderinterface. - [x] Add provider
name. - [x] Add provider
system_prompt_block(). - [x] Add provider
prefetch(query, session_id=""). - [x] Add provider
sync_turn(user_content, assistant_content, session_id=""). - [x] Add provider
get_tool_schemas(). - [x] Add provider
handle_tool_call(...). - [x] Add
memory/builtin.py. - [x] Wrap existing
MemoryStoreasBuiltinMemoryProvider. - [x] Keep markdown files readable and writable.
- [x] Keep current recall behavior stable.
- [x] Add
memory/context.py. - [x] Add context fencing helpers.
- [x] Use explicit
<memory-context>markers or equivalent. - [x] Strip nested memory tags from provider output.
- [x] Add injection-pattern stripping for recalled memory blocks.
- [x] Add
memory/manager.py. - [x] Register builtin provider by default.
- [x] Allow at most one external provider.
- [x] Reject second external provider.
- [x] Make provider failure non-fatal when builtin still works.
- [x] Provider registration/order/failure isolation/fenced output tests exist.
- [ ] Add manager-level pre-turn recall.
- [ ] Add manager-level post-turn sync hook.
- [ ] Add write cadence config.
- [ ] Add manual write cadence.
- [ ] Add async-after-turn write cadence.
- [ ] Add session-end write cadence.
- [ ] Add every-N-turns write cadence.
- [ ] Update
ContextBuilderto callMemoryManager. - [ ] Keep memory section order stable in prompts.
- [ ] Record memory recall metrics through manager.
- [ ] Add owner facts category.
- [ ] Add preferences category.
- [ ] Add people/relationships category.
- [ ] Add projects category.
- [ ] Add routines category.
- [ ] Add environment/tool quirks category.
- [ ] Add character reflections category.
- [ ] Add memory feedback/update/remove actions.
- [ ] Defer Honcho dependency.
- [ ] Defer external memory provider until local manager is stable.
- [ ] Update docs for memory architecture.
First PR boundary:
- [ ] Provider interface + builtin adapter + manager prefetch integration.
Verification target:
- [ ] Add
tests/test_memory_manager.py. - [ ] Add
tests/test_memory_context_fencing.py. - [ ] Existing memory tests still pass.
v0.8: Owner-Reviewed Learning Loop¶
Status: partial; background reviewer first slice shipped with tightened heuristics (profile/relationship/routine/tool_quirk apply manual_review_required; explicit memory apply works; weak memory manual_review_required; 11 tests pass).
- [x]
LearningCandidatemodel exists. - [x]
LearningQueuepersists candidates in SQLite. - [x]
/learnlists pending candidates. - [x]
/learninspects pending candidates. - [x]
/learnapproves candidates. - [x]
/learnrejects candidates. - [x]
/learnedits candidates. - [x]
/learnapplies skill candidates. - [x]
/learnrolls back skill candidates. - [x] Queue persists
diff_preview. - [x] Queue persists
applied_at. - [x] Queue persists rollback metadata.
- [x] Add
learning/reviewer.py. - [x] Add background review hook after response delivery.
- [x] Ensure review never blocks the main response.
- [x] Add reviewer config and cadence.
- [x] Learning heuristics tightened: profile/relationship/routine/tool_quirk apply manual_review_required; explicit memory apply works; weak memory manual_review_required; 11 tests pass.
- [ ] Add memory review cadence.
- [ ] Add skill review cadence.
- [ ] Add routine review cadence.
- [ ] Add profile review cadence.
- [ ] Inspect recent conversation.
- [ ] Inspect tool-heavy work.
- [ ] Inspect repeated errors.
- [ ] Inspect new owner preferences.
- [ ] Inspect new project facts.
- [ ] Inspect reusable workflow patterns.
- [ ] Inspect character/profile drift.
- [ ] Produce
memory_candidate. - [ ] Produce
profile_candidate. - [ ] Produce
skill_candidate. - [ ] Produce
routine_candidate. - [ ] Produce
relationship_update. - [ ] Produce
tool_quirk. - [ ] Add source session ids.
- [ ] Add source message ids.
- [ ] Add reason field.
- [ ] Add risk level.
- [ ] Add candidate evidence hash.
- [ ] Add dedupe for repeated candidates.
- [ ] Ensure rejected candidates do not immediately reappear without new evidence.
- [ ] Add memory candidate apply flow.
- [ ] Add profile candidate diff/apply flow.
- [ ] Add routine candidate apply flow.
- [ ] Add relationship update apply flow.
- [ ] Add tool quirk apply flow.
- [ ] Add rollback path for non-skill candidates.
- [ ] Add
/learnfilters by type. - [ ] Add
/learnfilters by risk. - [ ] Add
/learnfilters by status. - [ ] Add Web UI-ready queue APIs later.
- [ ] Keep auto-apply disabled by default.
- [ ] Add tests for reviewer candidate creation.
- [ ] Add tests for reviewer non-blocking behavior.
- [ ] Add tests for dedupe and rejected-candidate suppression.
- [ ] Update docs for owner-reviewed learning.
First PR boundary:
- [ ] Background reviewer skeleton + deterministic heuristic candidate creation behind opt-in config.
Verification target:
- [ ]
pytest -q tests/test_learning_skill_lifecycle.py - [ ] Add
tests/test_learning_reviewer.py.
v0.9: Skills As Procedural Memory¶
Status: partial; local lifecycle strong, background proposal missing.
- [x] Built-in skill store exists.
- [x] Custom skill store exists.
- [x] Draft skill directory exists under workspace state.
- [x] Skill validator exists.
- [x] Skill manager exists.
- [x]
skill_managetool exists. - [x] Owner-reviewed skill candidate apply/edit/rollback exists.
- [x] Focused lifecycle command coverage exists.
- [x] Atomic draft patch operation exists.
- [x] Validation rollback exists for draft patches.
- [x]
/skills listexists. - [x]
/skills viewexists. - [x]
/skills patch-draftexists. - [x] Background reviewer proposes skill candidates.
- [x] Add broader supporting-file lifecycle commands.
- [x] Add create draft command through
/skills. - [x] Add validate draft command through
/skills. - [x] Add activate draft command through
/skillsor/learn. - [x] Add disable skill command.
- [x] Add rollback active skill command outside candidate flow.
- [x] Add delete draft command.
- [ ] Add supporting file add/update/delete operations.
- [x] Enforce allowed supporting-file directories.
- [x] Validate
references/. - [x] Validate
templates/. - [x] Validate
scripts/. - [x] Validate
assets/. - [x] Add optional security scan for scripts.
- [x] Add prompt-injection scan for skill files.
- [x] Add hidden-character scan for skill files.
- [x] Add max-size policy per skill package.
- [ ] Add
/skill <name>invocation later. - [ ] Add skill setup metadata later.
- [ ] Add progressive loading improvements.
- [ ] Add tests for supporting-file operations.
- [ ] Add tests for script security scan.
- [ ] Add tests for prompt-injection scan.
- [ ] Update docs for procedural skills.
First PR boundary:
- [ ] Background skill candidate proposal using existing
LearningQueueandSkillManager.
Verification target:
- [ ]
pytest -q tests/test_skill_commands.py tests/test_learning_skill_lifecycle.py - [ ] Add validator tests for supporting files and injection scanning.
v0.10: Context Engine And Compression¶
Status: partial; first-slice compression implemented (summarize middle, fallback, prune tool outputs) but lacks dedicated advanced tests/token budgets/iterative structured compression. Automatic integration missing.
- [x]
ContextEngineinterface exists. - [x]
DefaultContextEngineadapter exists. - [x] Deterministic prompt section building exists.
- [x]
ContextCompressorexists. - [x] Initial summarization exists.
- [x] Initial tool pruning exists.
- [x] Fallback model behavior for compression failures exists.
- [ ] Dedicated advanced tests/token budgets/iterative structured compression.
- [ ] Add automated compression trigger in
AgentLoop. - [ ] Define token/message thresholds.
- [ ] Protect recent tail.
- [ ] Protect identity/profile head.
- [ ] Protect active approval context.
- [ ] Protect active tool results needed for current task.
- [ ] Prune large tool output.
- [ ] Redact sensitive data before compression.
- [ ] Add fallback model behavior for compression failures.
- [ ] Ensure summaries are reference-only, not instructions.
- [ ] Replace
/compactinternals withContextCompressor. - [ ] Preserve current
/compactuser-facing behavior. - [ ] Add prompt-injection tests from context files.
- [ ] Add prompt-injection tests from memory blocks.
- [ ] Add prompt-injection tests from retrieved transcripts.
- [ ] Add tests for compression failure degradation.
- [ ] Add tests for protected recent tail.
- [ ] Add tests for tool output pruning.
- [ ] Update docs for context engine and compression.
First PR boundary:
- [ ] Replace
/compactinternals withContextCompressor, then wire automatic trigger after tests are stable.
Verification target:
- [ ] Add/extend
tests/test_context_engine.py. - [ ] Add/extend
tests/test_context_compression.py.
v0.11: Routines, Cron, And Triggers¶
Status: partial; trigger/workflow depth missing.
- [x] Routine config model exists.
- [x] Routine persistence store exists.
- [x] Routine runner exists.
- [x] Routine scheduler skeleton exists.
- [x] Cron bridge exists.
- [x]
/routinesmanagement command exists. - [x] Add script pre-processing step.
- [x] Run approved script before agent turn.
- [x] Capture stdout as context.
- [x] Capture stderr as diagnostics.
- [x] Bound script runtime.
- [x] Bound script output size.
- [x] Apply approval policy to script routines by blocking scripts when
approval_policy=never. - [x] Routine step metadata rendering fixed/tested.
- [x] Busy protection default restored; explicit bypass for internal routine preserved.
- [ ] Add multi-skill workflow model.
- [ ] Add routine step list.
- [ ] Add step-level allowed tools.
- [ ] Add step-level approval policy.
- [ ] Add step-level timeout.
- [ ] Add webhook trigger type.
- [ ] Add API trigger type.
- [ ] Add trigger auth.
- [ ] Add trigger replay/idempotency key.
- [ ] Add quiet hours enforcement tests.
- [ ] Add delivery policy tests.
- [ ] Add destination channel policy tests.
- [ ] Add routine failure diagnostics.
- [ ] Add routine history/log inspection.
- [ ] Add docs for recurring workflows.
First PR boundary:
- [x] Script preprocessing for routines with bounded stdout context.
Verification target:
- [x] Add/extend
tests/test_routines_*.py. - [ ] Existing cron/proactive tests still pass.
v0.12: Toolsets, MCP, And Execution Backends¶
Status: partial; Docker execution backend exists as transient/stateless container scaffold with validation tests (23 tests pass); not production-grade stateful/hardened persistent backend. MCP tool/resource timeouts, retry, cancellation handling, schema normalization source exist; missing prompts, enabled_tools filtering, typed config, parallel connection, timeout/cancellation/schema normalization tests not all present.
- [x]
ToolsetResolverexists. - [x] Per-message tool filtering exists in
AgentLoop. - [x] MCP stdio transport exists.
- [x] MCP SSE transport exists.
- [x] MCP streamable HTTP transport exists.
- [x] Dynamic MCP tool registration exists.
- [x] Tool grouping by capability exists.
- [x] Shared runner first slice exists.
- [x] Local execution backend exists.
- [x] Add streamable HTTP MCP transport.
- [x] Add streamable HTTP config fields through the existing MCP server config dict.
- [x] Add streamable HTTP headers config support.
- [x] Add streamable HTTP timeout wiring tests.
- [x] Add MCP transient retry tests.
- [ ] Add MCP OAuth only when needed.
- [ ] Add MCP path traversal checks where file paths are accepted.
- [ ] Add MCP tool schema edge-case tests.
- [ ] Add subagent status events.
- [ ] Add subagent cancellation events.
- [ ] Add subagent completion summaries routed to origin channel.
- [ ] Add subagent artifact summary model.
- [x] Add Docker execution backend as transient/stateless container scaffold.
- [x] Add Docker availability check.
- [x] Add Docker image config.
- [x] Add Docker workspace mount policy.
- [x] Add Docker allowed-path policy.
- [x] Add Docker network policy.
- [x] Add Docker timeout policy.
- [x] Add Docker cleanup policy.
- [x] Add Docker tests with skip when Docker unavailable (23 focused tests pass).
- [ ] Production-grade stateful/hardened persistent backend remains later work.
- [ ] Add SSH/VPS backend later only after Docker/local are stable.
- [ ] Keep Modal/Daytona/Singularity out of core.
- [ ] Update MCP docs.
- [ ] Update execution backend docs.
First PR boundary:
- [x] Streamable HTTP MCP transport with focused tests.
Verification target:
- [x]
pytest -q tests/test_toolsets.py tests/test_execution_backend.py - [x] Add/extend
tests/test_mcp_*.py.
v0.13: Insights, Packaging, And Public Trust¶
Status: completed slice; keep current as new features ship.
- [x]
docs/release-notes/checklist.mdexists. - [x]
docs/security.mddocuments current sandbox, policy, and secrets surfaces. - [x]
docs/install-matrix.mdexists. - [x]
InsightsEngineexists. - [x] Provider stats exist.
- [x] Failed-call parsing exists.
- [x] Skill usage exists.
- [x] Guest enforcement exists.
- [x] Tests cover provider stats.
- [x] Tests cover failed-call parsing.
- [x] Tests cover skill usage formatting.
- [x] Tests cover guest enforcement.
- [x] Tests cover default profile setup.
- [ ] Keep insights updated when MemoryManager lands.
- [ ] Keep insights updated when background reviewer lands.
- [ ] Keep insights updated when WebSocket/API lands.
- [ ] Keep insights updated when routines grow trigger/workflow history.
- [ ] Keep security docs updated when approval persistence lands.
- [ ] Keep security docs updated when Docker backend lands.
- [ ] Keep install docs updated when Web UI/API service lands.
- [ ] Add third-party notices if code/assets are copied from references.
- [ ] Add release notes for each shipped slice.
- [ ] Keep
mkdocs build --strictclean.
Verification target:
- [ ]
pytest -q tests/test_insights.py tests/test_guest_enforcement.py tests/test_security_audit.py - [ ]
mkdocs build --strict
Cross-Cutting Done Definition¶
Every task is not done until:
- [ ] Code exists in the expected module path.
- [ ] Tests cover happy path.
- [ ] Tests cover at least one failure path.
- [ ] Owner-facing command/API behavior is documented if exposed.
- [ ] Security/approval behavior is documented if risky.
- [ ] Roadmap phase file is updated.
- [ ] This TODO is updated.
- [ ]
CHANGELOG.mdis updated. - [ ]
ruff check g_agent tests --select Fpasses. - [ ]
python -m compileall -q g_agentpasses. - [ ] Relevant focused tests pass.
- [ ] Full
pytest -qpasses before commit unless the owner explicitly asks for a partial checkpoint. - [ ]
mkdocs build --strictpasses for docs changes.
Immediate Queue¶
Next Commit¶
- [x] v0.4 channel contracts:
- [x] add channel capability types
- [x] add media envelope types
- [x] add delivery result/error types
- [x] add tests for shared contracts
- [x] update channel roadmap docs
After That¶
- [x] v0.4 channel-specific hardening:
- [x] wire delivery result/error contracts into send paths
- [x] expose channel capabilities through diagnostics/status
- [x] WhatsApp diagnostics
- [x] Telegram formatting safety
- [x] Discord session/thread mapping
Then¶
- [x] v0.7 MemoryManager first PR:
- [x] provider interface
- [x] builtin adapter
- [x] manager prefetch
- [x] memory context fencing
Then¶
- [x] v0.8/v0.9 background reviewer:
- [x] reviewer skeleton
- [x] candidate generation
- [x] dedupe/rejection suppression
- [x] skill proposal path
Then¶
- [x] v0.10 compression integration.
- [x] v0.11 routine script preprocessing.
- [x] v0.12 streamable HTTP MCP.
- [x] v0.5 minimal product API.