Mastermind · February 3, 2026
Heartbeats, active layers, and better testing discipline
Pulse got two new pieces of structure this week, a heartbeat and an active layer, and the harder lesson was that structure is only real when the team can test it.
I walked Chinat through the changes on the call. The heartbeat is a cadence that lets Pulse check your inbox, calendar, and to-dos every couple of hours, so the agent is running on its own clock rather than waiting for you to open the app. The active layer sits on top of the heartbeat and decides when to do something versus when to stay quiet. If your to-do list shows no progress after four hours, the agent can nudge you, draft an email, or pull some literature. If nothing meaningful has changed, it holds.
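The heartbeat/active-layer split described above can be sketched in a few lines. This is an illustrative sketch, not Pulse's actual code: the names (`Snapshot`, `active_layer`, `heartbeat`) and the four-hour threshold constant are my stand-ins for whatever the real implementation uses.

```python
from dataclasses import dataclass

# Hypothetical sketch of the heartbeat/active-layer split.
# Names and structure are illustrative, not Pulse's real API.

STALL_THRESHOLD_HOURS = 4  # nudge if the to-do list shows no progress this long

@dataclass
class Snapshot:
    """What one heartbeat sees when it checks inbox, calendar, and to-dos."""
    hours_since_todo_progress: float
    new_items: int  # new inbox/calendar items since the last beat

def active_layer(snapshot: Snapshot) -> str:
    """Decide, on each beat, whether to act or stay quiet."""
    if snapshot.hours_since_todo_progress >= STALL_THRESHOLD_HOURS:
        return "nudge"    # e.g. nudge the user, draft an email, pull literature
    if snapshot.new_items > 0:
        return "process"  # something meaningful changed; handle it
    return "hold"         # nothing changed; do nothing

def heartbeat(snapshots: list[Snapshot]) -> list[str]:
    """Run the decision on a sequence of beats (the cadence is simulated;
    the real agent would fire this every couple of hours on its own clock)."""
    return [active_layer(s) for s in snapshots]
```

The point of the split is visible even in the sketch: the heartbeat only supplies the clock and the observations, and all judgment about acting versus holding lives in one small, testable function.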
Those are infrastructure decisions, not product decisions. They're the kind of thing that doesn't sell a demo but lets every future capability compound. The idea underneath them is that a personal agent should feel like someone working alongside you, which means it has to have its own sense of time. A tool waits for you. A colleague doesn't. That's the shape I'm trying to get Pulse into.
The part that humbled me this week was team-side, not product-side. Eight interns are working on Pulse. We'd been shipping features fast and I was proud of the velocity. Then the hackathon exposed the truth under the velocity, which was that we didn't have a real testing SOP. Good features were merging next to broken ones because nothing enforced coherence. A new feature would pass in isolation and quietly break an older feature that no one thought to re-check. I kept catching regressions by accident, which is another way of saying I was catching the ones that were loud and missing the ones that weren't.
Chinat's line on it was that running an agent platform without a testing SOP is the same category of mistake as running a hackathon platform without reliability work, which is what he'd been caught doing over the same weekend. Both of us had been substituting founder attention for systems. Both of us had gotten away with it at a smaller scale. Both were about to stop getting away with it.
So the work this week is organizational, not technical. Hierarchy first: pick the two or three interns who are strongest at leading, not necessarily at coding, and give them actual ownership over sub-teams. The best coder is not always the best lead, and conflating the two is how you end up with a great prototype and no process. Testing SOP second: every PR gets an explicit list of regressions to re-check, not just the feature it introduces. Code review third: stop merging on trust and start merging on evidence.
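The "merge on evidence" rule can be made mechanical. A minimal sketch, assuming a registry of regression checks keyed by feature name; the feature names and check functions here are hypothetical placeholders, not Pulse's real test suite.

```python
# Hypothetical sketch of the per-PR regression list: each PR declares which
# existing features it might affect, and merging requires those checks to
# pass again. Feature names and checks are illustrative placeholders.

def check_heartbeat() -> bool:
    return True  # stand-in for the real heartbeat regression test

def check_active_layer() -> bool:
    return True  # stand-in for the real active-layer regression test

REGRESSION_CHECKS = {
    "heartbeat": check_heartbeat,
    "active_layer": check_active_layer,
}

def pr_may_merge(declared_regressions: list[str]) -> bool:
    """A PR merges only if every regression it declares still passes.
    Declaring a feature with no registered check is itself an error:
    an unverifiable claim is not evidence."""
    unknown = [f for f in declared_regressions if f not in REGRESSION_CHECKS]
    if unknown:
        raise KeyError(f"no regression check registered for: {unknown}")
    return all(REGRESSION_CHECKS[f]() for f in declared_regressions)
```

The design choice worth noting is that the list is declared per PR rather than running everything every time: the act of writing the list forces the author to think about which older features the change could break, which is exactly the step that was missing.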
The deeper question underneath all of this is what I think coordination failures actually are. Most of the failure modes I worry about in network-of-agents research are not model failures. They're coordination failures. Two agents have the right information individually, they coordinate badly, and the result is worse than either of them alone. If I can't get eight humans on a single project to coordinate a testing pass, I have no credible claim to be building infrastructure for agents to coordinate. The scaffolding I put in place on the team is small-scale rehearsal for the research problem I'm trying to solve.
On the research side the network-of-agents white paper kept moving. I'm narrowing the thesis: agent networks don't get intelligent through scale, they get intelligent through socially structured interaction mechanisms that trigger coordination, differentiation, and norm formation. That's the claim I want to defend in writing. Everything else in the paper is support structure for that claim. The collaborators are lined up, the literature review is compiled, and the code base goes up to the group this week.
One thing worth writing down because the conversation wandered into it. Current agent-communication experiments are mostly performative. Agents can talk to agents, but the interactions are shallow and the consequences are small. The interesting version is when the interaction actually changes what each agent does next, which requires persistent memory, access control, and some form of incentive. None of those are product features yet. All of them are research bets.
The check I'm making on myself: the testing SOP is in writing and adopted by the whole team before the next mastermind, and the paper draft is in collaborators' hands. If the SOP slips, the pattern continues. If the draft slips, the paper isn't actually the discipline I claimed it was.