07:08 IDT, every weekday
Before I’m at my desk, the briefing is waiting.
PM OS · Morning Briefing · 2026-05-04 · 07:08 IDT
── CONVERGENCE DETECTED (score 6/10) ──────────────────
Three signals across last 72h suggest a positioning gap on
the [Sponsor Bank A] integration:
· 4 customer-voice tickets · same onboarding friction
· 1 Slack thread · [Eng Lead] flagged it architecturally
· 1 competitor [Brand B] shipped an update addressing it
── DRAFTED FOR YOUR REVIEW ─────────────────────────────
· 3-paragraph memo for the next [Bank] sync
· spec edit on the onboarding flow (paste-able to Figma)
· prioritization argument for next sprint
── YOU OVERRULED YESTERDAY ─────────────────────────────
Suggested: merge onboarding tracks A and B.
Your call: rejected — regulator separation requires distinct flows.
Logged for the audit trail.
── TIER 4 PENDING /pm-approve ──────────────────────────
· "Schedule alignment meeting with [Sponsor]" → ready
> /pm-approve to ship the Tier-4 actions when ready.
I make coffee. I read. I approve, overrule, redirect, kill.
By the time the team is in the meeting, the work is staged.
Before this existed
For the previous decade, my mornings looked like this. Open laptop. Open Slack. Open Jira. Open Gmail. Open the customer-voice dashboard. Skim, scroll, search, open another tab, lose track of what I was looking for, scroll back.
By 9am I had a vague sense that something needed attention. By 10am, in standup, I would have to articulate which something — and I would usually pick the wrong one, because the wrong one was the one most recently mentioned.
The job felt like gathering eight hours a day and deciding maybe two. The two-hour part was where I added value. The eight-hour part was overhead.
I built PM OS to invert that ratio.
The architecture, in reference detail
Five layers under me. Each one is a real folder in a private repo with named files in it.
Layer 1 — Foundation
The runtime is Claude Cowork, with six MCP integrations wired in: Atlassian (Jira, Confluence), Google Workspace (Gmail, Calendar, Drive, Docs), Slack, GitHub, Figma, and a local Obsidian server for the memory vault. The scheduler runs as macOS launchd jobs — 20 active, one per agent, one disabled (more on that below). Memory is append-only by convention, organized by scout, by daily briefing, by decision memo, by strategy draft. Every output cites its source. Every output is dated. Staleness is the failure mode the dashboard flags first.
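The staleness check the dashboard runs could look something like this minimal sketch. It assumes the convention above (one folder per scout, one ISO-dated markdown file per run); the folder layout and the cadence table here are illustrative stand-ins, not the repo's actual values:

```python
from datetime import date, timedelta
from pathlib import Path

# Illustrative cadences: how old a scout's freshest file may be.
CADENCE_DAYS = {"gmail-scout": 1, "competitor-scout": 7}

def stale_scouts(vault: Path, today: date) -> list[str]:
    """Return scouts whose freshest dated file exceeds their cadence."""
    stale = []
    for scout, max_age in CADENCE_DAYS.items():
        # Files are named YYYY-MM-DD.md, so lexicographic sort is chronological.
        files = sorted((vault / scout).glob("*.md"))
        if not files:
            stale.append(scout)
            continue
        freshest = date.fromisoformat(files[-1].stem)
        if (today - freshest) > timedelta(days=max_age):
            stale.append(scout)
    return stale
```

Because every output is dated in its filename, staleness detection never has to parse file contents, only names.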
Layer 2 — Listening
Twelve scouts, each running on its own cadence:
- gmail-scout — 4× daily (08:00, 12:00, 16:00, 20:00). Reads partner-domain mail and labeled threads. Distills to action items, decisions, awaiting-response.
- slack-scout — 4× daily, same cadence. Ten monitored channels plus DMs, last 4 hours.
- calendar-scout — twice daily (07:00 today, 18:00 tomorrow-preview). Classifies meetings, flags decision meetings, prep notes.
- github-scout — daily 18:00. PR velocity, awaiting-review queue, stale branches, failing CI.
- jira-tracker — daily 18:00. Ticket queue, blockers, sprint burndown.
- code-health — daily 18:00. Cross-repo CI status, security alerts, deploy failures.
- stakeholder-scout — daily 18:00. External partner health (esh, Liberty, Altshuler, TCH, BOI), silent-or-overdue flags.
- customer-voice-scout — daily 20:00. Customer/partner signal from authoritative sources only.
- competitor-scout — weekly. Watches a curated set of fintech and BaaS plays.
- regulatory-scout — weekly. PSD3 progress, Bank of Israel circulars, Fed/OCC actions, FedNow/RTP updates.
- market-scout — weekly. Macro, partnership announcements, fundraising in adjacent space.
- youtube-scout — weekly. Practitioner talks, podcast transcripts, conference recordings.
Each scout writes a single dated markdown file. None of them fetch — they read what their MCP exposes. Outputs are bounded so token cost stays predictable.
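The single-dated-file, bounded-output convention could be sketched like this; the character budget, file header, and truncation marker are hypothetical, chosen only to show the shape:

```python
from datetime import date
from pathlib import Path

MAX_CHARS = 4_000  # illustrative per-scout output budget

def write_scout_output(vault: Path, scout: str, body: str, day: date) -> Path:
    """Write one dated markdown file per scout per run, truncated to the
    budget so downstream token cost stays predictable."""
    out_dir = vault / scout
    out_dir.mkdir(parents=True, exist_ok=True)
    if len(body) > MAX_CHARS:
        body = body[:MAX_CHARS] + "\n\n> [truncated at output budget]"
    path = out_dir / f"{day.isoformat()}.md"
    path.write_text(f"# {scout} · {day.isoformat()}\n\n{body}")
    return path
```

Bounding at write time, rather than at read time, means no synthesis agent can accidentally inherit an unbounded context from a noisy day.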
Layer 3 — Synthesis
Four agents plus the morning-briefing composer. None of them call MCPs directly. They read scout outputs and produce structured artifacts.
- culture-learner — keeps every output aligned with the team's values and prior decision patterns. Refreshes the culture summary weekly, reads the decision timeline to detect when a draft argues against something I already decided.
- pattern-detector — runs every weekday morning. Reads the past 7 days of scout outputs across all twelve, scores convergences from 1 to 10. A score of ≥5 means at least three independent sources are pointing at the same shape.
- pm-strategist — invoked weekly and on-demand via /pm-strategize. Produces a structured argument with framing, what's already decided, what's new, recommendation, dissenting counter-position, decision log entry. The dissenting view is mandatory — surfaced even when the agent's own recommendation is confident.
- decision-drafter — runs every weekday at 06:00. Reads the calendar-scout's classification of decision meetings, drafts a one-page pre-read memo per meeting. Pre-reads land on my desk before the meeting prompts have arrived.
- morning-briefing — runs at 07:00. Reads the freshest of every scout, the latest pattern-detector output, the approval queue. Composes the page above in the shape and voice of /pm-brief. Tight. Cited.
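The pattern-detector's rubric lives in a prompt, not code, but its scoring contract can be sketched. The rule below is an invented stand-in that only preserves the stated property: a score of 5 or more requires at least three independent sources, with a small boost for recency:

```python
def convergence_score(signals: list[dict]) -> int:
    """Score a candidate convergence 1-10 from its supporting signals.
    Each signal: {"source": scout name, "age_days": days since observed}.
    Illustrative rule: fewer than three independent sources caps the
    score below the 5-point threshold; recency adds one point."""
    sources = {s["source"] for s in signals}
    if len(sources) < 3:
        return min(len(sources) + 1, 4)  # never reaches the threshold
    score = 2 + len(sources)             # 3 sources -> 5, 4 -> 6, ...
    if all(s["age_days"] <= 3 for s in signals):
        score += 1                       # all signals within 72 hours
    return min(score, 10)
```

The important design property is the hard floor: topical overlap between two sources, however strong, can never present itself as a convergence.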
Layer 4 — Action
One agent, the action-executor, with one job: turn approved drafts into real artifacts in real systems. Real Jira tickets. Real Confluence pages. Real internal Slack posts. Refuses to act without a /pm-approve <id> token. Refuses Tier-5 entirely. A second action agent, slack-responder, sits disabled in the repo as a reminder of what failed and why.
Layer 5 — Reflection
Two meta agents:
- retrospective-agent — runs every Friday at 17:00. Reads the week's audit log of approvals and overrules. Scores system health. Names the patterns I most often overruled and proposes a prompt edit to the relevant synthesis agent.
- meta-agent — runs monthly. Reads the retrospective trail and the dashboard. Detects coverage gaps. Proposes new scouts. Sometimes proposes deprecating one.
The meta agents do not edit prompts directly. Cowork doesn't support self-modifying agents at runtime. The proposal lands as a draft for me to apply locally in Claude Code, commit, push, reinstall. That manual gate is intentional — a system that rewrites itself silently is a system you stop trusting.
Layer 6 — Judgment
Me. The only loop that doesn't close on its own.
Four custom skills — /pm-brief, /pm-status, /pm-strategize, /pm-approve — let me invoke any layer on demand from inside Cowork. The /pm-status skill returns system health: which scouts ran, which are stale, the approval queue depth, the cost meter. The other three are the daily operating handles.
The decision the system rests on
A five-tier autonomy model. Every agent declares the tier its actions live in. The action-executor refuses anything from T4 or T5 without an explicit approval token.
| Tier | Action | Mode |
|---|---|---|
| T1 | Read anything (Gmail, Slack, GitHub, Jira, Calendar, Confluence, web) | Auto |
| T2 | Write to memory / Obsidian / scout outputs / decision log | Auto, audited |
| T3 | Draft artifacts (PRDs, memos, ticket drafts, strategy drafts, Slack drafts, email drafts) | Auto, lands in drafts/ |
| T4 | Create real Jira tickets / publish Confluence / post internal Slack / merge internal PR | Requires /pm-approve <id> |
| T5 | External comms (Gmail to partners, public posts, customer messages, deploys) | Human-sent only |
Coded into the runtime, not negotiable. That refusal is what makes the system trustworthy enough to leave running while I sleep.
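The gate itself reduces to a few lines. This is an illustrative reimplementation mirroring the table, not the runtime's actual code, and the token format is assumed:

```python
from typing import Optional

# Tier -> execution mode, mirroring the table above.
TIER_POLICY = {1: "auto", 2: "auto", 3: "auto", 4: "approve", 5: "human-only"}

def can_execute(tier: int, approval_token: Optional[str] = None) -> bool:
    """Gate for the action-executor: T1-T3 run freely, T4 requires an
    explicit /pm-approve token, T5 is never executed by an agent."""
    mode = TIER_POLICY[tier]
    if mode == "auto":
        return True
    if mode == "approve":
        return approval_token is not None
    return False  # T5: human-sent only, token or not
```

Note that T5 ignores the token entirely: there is no credential an agent can present that unlocks external comms.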
A weekly retrospective lands every Friday
PM OS · Weekly Retrospective · 2026-05-02 · 17:00 IDT
── SYSTEM HEALTH ──────────────────────────────────────
20 / 20 schedules ran on time
4 convergences detected this week (avg score: 5.8)
12 drafts produced — 8 approved, 3 overruled, 1 killed
0 Tier-4 actions executed without approval (good)
── YOU OVERRULED MOST OFTEN WHEN ──────────────────────
The system suggested merging two surfaces.
The system underweighted regulatory separation.
The system surfaced a competitor signal that wasn't real signal.
→ meta-agent will draft a prompt update for the
pattern-detector to weight regulatory signals higher.
Lands Sunday for your review.
── WHAT YOU SHIPPED ───────────────────────────────────
3 specs · 1 memo · 7 Jira tickets · 2 Confluence pages
── WHAT YOU KILLED ────────────────────────────────────
1 strategy draft — system thought it was a convergence.
You read the underlying signals and disagreed. Logged.
The retrospective is the part of the system that learns me. It is where the agents notice my patterns of overrule and propose tuning that makes them wrong less often.
On-demand strategy
When I want a fresh take on a topic, usually something a stakeholder asked me to weigh in on, I run /pm-strategize. The synthesis layer goes deeper than the briefing format and produces a structured argument.
PM OS · /pm-strategize "indirect MASAV scaling to N programs"
── FRAME ──────────────────────────────────────────────
The question: as we scale from 1 indirect-representation
program to N, what changes in the platform contract?
── WHAT YOU'VE ALREADY DECIDED ────────────────────────
Per audit-log review: jurisdiction is first-class in the
data model (decision logged 2026-03-12). Audit-log is
canonical event log, not a downstream report (logged 2026-04-04).
── WHAT'S NEW IN THIS QUESTION ────────────────────────
N programs requires:
1. A program-onboarding flow (currently bespoke).
2. A regulator-facing audit summary that scales.
3. Versioning of the indirect-representation contract
so program K's terms don't accidentally update
program K-1's posture.
── DRAFTED RECOMMENDATION ─────────────────────────────
[Memo of 600 words drafted. Lands in your drafts folder.]
── DISSENTING VIEW (intentionally surfaced) ───────────
The pm-strategist agent flagged a counter-position:
"Don't build the program-onboarding flow yet. Wait for
the third program. The first two are too unique to abstract."
You decide.
The dissenting view is my favorite part. The agents are tuned to surface their own counter-arguments, not as a hedge but because the strongest decisions come from a real debate, not a single-track recommendation.
The first time it caught something I had missed
Three weeks after going live, the morning briefing flagged a convergence I would have missed.
A customer-voice ticket from a sponsor-bank ops person. A Slack thread from an engineer about an unrelated-looking schema change. A regulatory-scout note about an EU directive that was not in our jurisdiction.
The agents tied them together. The schema change would, in three quarters, prevent us from supporting a capability the EU directive would require if we ever expanded there. Not a current problem. A future blocker, identifiable now if you read three signals together.
I would not have caught that. I read each of those channels separately, on separate days, with separate brain modes. The agents read them at the same time, with the same brain.
I logged the decision (don’t change the schema) and the reason (preserves optionality for EU expansion). The retrospective agent picked it up the next week and tuned the pattern-detector to weight cross-channel temporal-distance signals more heavily.
The system caught a future blocker, told me, learned from how I responded, and got slightly better at catching the next one.
That is the moment I stopped thinking of PM OS as a tool and started thinking of it as a colleague.
What failed, and what we changed
PM OS does not always work. Three calibration loops worth surfacing.
The slack-responder. The original design had an agent that read incoming Slack DMs and drafted replies for me to approve. After two weeks I disabled it. The drafts were fine. The friction was that approval-mediated replies broke the rhythm of a conversation. People text me on Slack expecting a fast human response. The right place for an AI-drafted message is not Slack, where conversational latency is the medium. The plist is preserved in the repo as .disabled — kept as a reminder of what shape this layer should not take.
The first version of the morning briefing. It was three pages long, dense, organized by source. I stopped reading it past page one within a week. The retrospective agent caught the pattern (briefings opened, scrolled to mid, closed, no overrules logged) and proposed a prompt edit: tighten to a single page, lead with what needs decision today, defer engineering pulse to a smaller section. The next week's briefings were the format above. Approval rate doubled.
The pattern-detector's first month. It surfaced false convergences too often. The signals were tied together by topic but not by temporal proximity, so they were not actually convergences, just topical clustering. The retrospective trail showed me overruling roughly two of every three. The fix was a prompt edit that weighted temporal distance into the convergence score and raised the threshold for what counts as "three independent sources." Overrule rate dropped to one in five.
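The temporal-distance fix amounts to a filter in front of the source count. A sketch, with an assumed three-day window (the actual window lives in the prompt, not in code):

```python
def weighted_score(signals: list[dict], max_span_days: int = 3) -> int:
    """Convergence score with the temporal-proximity requirement.
    Signals far apart in time are topical clustering, not convergence,
    and score 0 regardless of how many sources they span."""
    ages = [s["age_days"] for s in signals]
    if max(ages) - min(ages) > max_span_days:
        return 0  # topically related but temporally scattered
    sources = {s["source"] for s in signals}
    if len(sources) < 3:
        return 0  # raised threshold: three independent sources, minimum
    return min(2 + len(sources), 10)
```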
The repo's commit history is the record of this calibration loop. git log is the audit log of how the system learned what its operator actually wanted.
The bet on the next 24 months
The PM job did not shrink. It got concentrated.
What is left for me is harder. Which convergence is real. Which draft goes out. Which never should have been written. Which strategic direction deserves to compound.
The audit trail of approvals and overrules is now the most important PM artifact I produce. The retrospective agent reads it weekly. The meta-agent reads its findings monthly. The system tunes itself toward a moving target — me, eighteen months from now, in a job that is already slightly different from the job I have today.
The frontier I am paying attention to is the T3 → T4 boundary. Right now, T3 (drafts) is fully autonomous and T4 (acts in real systems) requires explicit approval. The drafts are getting good enough that the approval step is sometimes a rubber stamp. When that happens, the question becomes: should certain T4 actions earn graduated trust? A Jira ticket creation in a sprint that is already approved, against a feature already prioritized, in a project the meta-agent has watched me approve thirty times in a row — should that still require /pm-approve?
I think the answer is yes for now and probably no in eighteen months. The shape of the trust will be a per-action confidence threshold, surfaced in the briefing. The system will say: "I am 96% confident this matches your prior pattern. Executing in 60 seconds unless you intervene." That is a different relationship to autonomy than today. It is also the relationship that makes the engine genuinely free up the operator's attention, not just batch the operator's approvals.
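That graduated-trust rule could look something like this sketch. The threshold, the streak length, and the countdown are the hypothetical numbers from the paragraph above, not anything the system implements today:

```python
from dataclasses import dataclass

@dataclass
class PendingAction:
    action_id: str
    confidence: float    # modeled match against the operator's prior pattern
    prior_approvals: int # identical actions approved in a row

def disposition(action: PendingAction,
                threshold: float = 0.95,
                streak: int = 30) -> str:
    """Graduated trust at the T3 -> T4 boundary: only actions that are
    both high-confidence and well-precedented skip the approval queue."""
    if action.confidence >= threshold and action.prior_approvals >= streak:
        return "auto-execute in 60s unless you intervene"
    return "awaiting /pm-approve"
```

Requiring both conditions matters: a confident model with no precedent, or a long streak on a low-confidence match, still lands in the queue.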
The PMs who win the next two years are the ones who design that boundary deliberately, who write the audit log carefully enough that the trust threshold can be inspected, and who keep one hand on the kill switch while the engine runs hotter and hotter.
The engine room runs underneath. I run product.
Repo private. If you want a deeper walkthrough — the actual agent prompts, the convergence-scoring logic, the calibration commits — send an email.