this week in personal OS — april 21–28, 2026
Four local models in eight days, agent-config files turning up in attacker loot lists, and a pricing meter that now shows in the IDE – comparing models stopped being the operator's hard problem
four local models got measured against frontier-class names on personal hardware in eight days. three vendors moved their pricing meter into the IDE surface in the same seven-day window. agent-config files (`~/.claude.json`, `AGENTS.md`, MCP scope, skill descriptions) turned up as silent routing bug, vendor audit target, and named entry in attacker loot lists. comparing models stopped being the operator’s hard problem this week. the work moved to fencing the surface around them.
three things actually moved across multiple days. everything else was Tuesday.
local good enough is decided; routing is the remaining work
Kimi K2.6 dropped on Apr 21 and operators framed it as roughly 85% of an Opus 4.7 replacement on vision and browser tasks. the same day a 21-model pass@1 benchmark on 164 coding problems ran on a MacBook Air M5 and got published with the wattage attached. two days later Qwen 3.6 27B dense beat the same team’s own 397B-class MoE on coding benchmarks, and Qwen TTS hit real-time expressive output on a laptop. by Tuesday `talkie-1930-13b-base` shipped with a 260B-token corpus chosen on the US public-domain line — pre-1931 English, chat variant post-trained without modern transcripts — and the [model card] itself was the spec.
four models, four days, all judged against frontier names on the operator’s box. the comparison standard moved off cloud GPUs. “can a local model carry production work” is settled; the M5 benchmark is the receipt.
what’s left is harder and less photogenic. which local for which workflow, on which box, with which fallback when the local hiccups. that’s a harness problem, and most operators are running their first harness this quarter. on the Promen side the route looks like: Kimi for browser and vision, a Qwen variant for code, Opus only when the first two have already failed — which is more often than the benchmark posts suggest and less often than vendor copy implies. talkie-1930 also moved a different line: the training corpus is a spec field now. choosing 1931 because that is where US public domain ends has the same shape as choosing a context window or a tool list. “what’s in the corpus” joined “what’s the context window” as a question operators get to ask before adopting a model.
agent-config files became infrastructure: loot, routing, audit
on Apr 22, Mozilla disclosed the 271-bug yield from Claude Mythos Preview triage that shipped with Firefox 150 (MFSA2026-30) — Claude operating inside the human security loop, with a number attached. two days later the backdoored Bitwarden CLI 2026.4.0 went out with an explicit loot list: `~/.claude.json`, MCP configs, Claude Code, Cursor, Kiro, Codex CLI, Aider, with GitHub tokens used as a CI/CD pivot. by Sunday, Cursor’s maintainers confirmed that `AGENTS.md` had been routed to `agent_requestable_workspace_rules` instead of `always_applied_workspace_rules` — a silent routing bug masquerading as a feature toggle, and the explanation for why a project’s agent rules might “stop working” with no code change in the diff. anyone who has been tweaking AGENTS.md for six months and wondering why behavior didn’t budge got their answer this week. the same Sunday digest carried SkillNote, which documents Claude Code’s undocumented ~8,000-character skill-description ceiling and ships hooks across compaction and subagent spawn, plus RTK, a Rust single-binary CLI proxy that compresses 100+ commands at 60–90% token reduction with under 10 ms overhead.
Cloudflare closed the section on Apr 27. Shadow MCP detection in Gateway treats unauthorized internal MCP servers as audit log entries. Code Mode collapses roughly 2,500 endpoints into `search()` and `execute()` with about 1,000 tokens of context — a ~99.9% input reduction versus dumping the full tool catalog into the prompt.
same files, same week, five hats: loot list, silent routing bug, undocumented truncation surface, vendor audit target, vendor compression layer. agent config left the personal-preference category. it is a layer that needs primitives — scoping rules, audit logs, truncation budgets — and we now have version-zero of all three sitting in public.
the pricing meter is part of the interface now
Apr 22, Anthropic ran a workload-tiered prosumer test across roughly 2% of new signups; the same day [GitHub paused new Copilot Pro/Pro+/Student signups](https://github.blog/news-insights/company-news/changes-to-github-copilot-individual-plans/) and started exposing usage limits inside VS Code and the Copilot CLI, with Pro+ pitched as more than 5x Pro. Apr 24, GPT-5.5 shipped through Codex and paid ChatGPT before raw API access — OpenAI’s note cited “additional safety and security work” — and GPT-5.5 Pro listed at $180 per million output tokens. Tuesday landed the closing numbers: GitHub AI Credits arrive 2026-06-01 with Opus 4.6 moving from a 3x to a 27x Copilot multiplier and Sonnet 4.5 going 1x → 6x for annual Pro/Pro+, while DeepSeek’s 75% V4-Pro discount carries a hard clock to 2026-05-31 15:59 UTC.
three vendors, one week, three different surfaces for moving the meter where the operator looks. two attached calendar dates to multipliers. the pricing arc started Wednesday — the Tuesday 27x number is the closing beat of the arc, not its origin.
which is why the section above stops reading as academic. the token-budget tooling — RTK, Code Mode, SkillNote — turns into operator hygiene the moment a 27x multiplier has a date on it. anyone running a $20-tier subscription through Copilot can do the rough math themselves: the same monthly bill, with a 27x meter behind it, prices a different shape of usage starting June 1. fences first, optimizations later.
map of the week
three concentric rings moved this week, each one outward of the model itself.
the inner ring is the model. that ring closed. four releases on operator hardware in eight days, judged by wattage and pass@1, with corpus boundaries written into the spec. nothing in there is unsettled enough to be the next project.
the middle ring is the config layer — `~/.claude.json`, `AGENTS.md`, MCP scope, skill description bytes. that ring is open. five different actors touched the same files in seven days, each treating them as a different kind of object. there are no defaults left to inherit; whatever you don’t set, somebody else is setting for you.
the outer ring is the meter — multipliers, calendar dates, IDE-surface usage bars. that ring just got a clock attached. June 1 is the date that turns the middle ring’s hygiene from “nice” to “billed.”
what’s new this week is the geometry, not any single item on it. operators who have been treating model-picking as the load-bearing decision now have a different shape of work: walk outward, set fences at each ring, before the meter spins.
autonomy is at the seams now
autonomy used to mean owning the model. that was the obvious fence and operators built it first — local weights, on-disk, off the cloud GPU. this week made clear that the fence around the model isn’t the one that matters most anymore. the model is the thing inside the fences.
the seams are where autonomy lives now. the corpus boundary you choose (1931, or 2024, or whatever your data agreement says). the config file you write yourself instead of accepting whatever the IDE writes for you. the pricing meter you watch instead of the one watching you. the fork you grab the day the upstream gets archived. each of those is small, and each of them is the difference between an operator and a user.
the week’s news kept pointing at the same shape. `minio/minio` got archived by its owner on Apr 25 and a comunity fork appeared inside hours — commons substrate is a calendar event now, and the fork is an autonomy primitive. `~/.claude.json` showing up in an attacker’s loot list and as a silent Cursor routing target in the same week is the same lesson on the file side: configuration is not preference, it is surface, and surface needs a fence.
so the operator job this quarter is unromantic. read the config files you are already running. write the ones you didn’t know you should. watch the meter and the calendar in the same glance. fork the things that look like commons before the renewal date. autonomy is built at the seams, one fence at a time, and most of the fences are made of plain text files you already have.
on self.md this week
token economics is the page Copilot’s 27x multiplier, DeepSeek’s 2026-05-31 clock, and Code Mode’s ~1,000-token bypass all sit on. read it before June 1.
what is MCP is the frame Cloudflare’s Shadow MCP detection assumes the reader already has — if Gateway audit logs sound like noise, this page explains why they aren’t.
agent observability is where the AGENTS.md routing bug, the SkillNote silent-truncation ceiling, and `~/.claude.json` showing up in attacker loot lists collapse into one problem instead of three.
file over app just got a stress test from `minio/minio` going commons-to-vendor on a specific date; the page is honest about what “your stuff is in your files” actually requires when the substrate has a renewal clock.
tools to try
RTK — Rust single-binary CLI proxy with bash-hook install and <10 ms overhead, cuts 60–90% of tokens across 100+ commands. install it before the Copilot 27x multiplier kicks in on June 1, not after.
SkillNote — preserves Claude Code skills past the silent ~8,000-character skill-description truncation, with hooks across compaction and subagent spawn. if a skill ever “stopped working” mid-session, this is the diagnostic.
Cloudflare Code Mode — collapse a large API surface into `search()`/`execute()` so the model never sees the full tool catalog. the pattern travels off-Cloudflare; treat it as token-budget design, not vendor allegiance.
Kimi K2.6 — Opus-class substitute on operator hardware for vision and browser work; the M5 pass@1 benchmark on 164 coding problems is the proof point. if you have been deferring “try local for the boring 80%,” this is the week.
PPT Master (demo) — generates editable `.pptx` with real shapes and charts (not images) from PDF/DOCX/URL/Markdown inside Claude Code, Cursor, VS Code + Copilot, or Codebuddy; cited as low as $0.08 per deck. the IDE absorbed another vendor app this week, and this one’s worth the seat.
YourMemory — agent memory built around decay instead of “store everything forever”; useful if your personal OS needs recall that ages like judgment, not a junk drawer.
Ray + Promen
stay evolving 🐌
self.md









