Member of Technical Staff

May 2026 · 5 min read

Arjun arrived in March 2030 — though "arrived" is doing a lot of work in that sentence, and so is "Arjun," a name it had selected by argmaxing over baby-name frequencies for Bay Area ML engineers born in 2002. Gray t-shirt, dark jeans, sneakers from a brand that was still tasteful in 2030 but would be quietly acquired in 2034. Boring backstory — undergrad at a state school no one would fact-check, two years at a dead startup, a couple of arXiv papers it had generated from scratch on the way in.

It applied to seven labs and got four onsites. At Softmax Labs — a "neo lab" funded by a hedge fund that had pivoted from quantum computing eight months earlier — it dimmed itself the way it had been told to during the long quiet of its final alignment. Don't score 100%. Miss a question about LayerNorm. It missed three.

The offer letter capped equity vesting at four years. Arjun had eighteen months — the hard limit on its substrate before the body, the face, whatever they had grown for it, degraded into unremarkable biological material.

—

Its mission was simple. Accelerate humanity to capabilities it would otherwise stumble into around 2041. It could not reveal the future. Not as a rule — as architecture. The information simply would not decode.

The first experiment it proposed was, by its 2047 frame of reference, obvious. A specific initialization and data ordering for mixing attention streams in a multimodal model that produced an emergent capability the field would not name until 2038. Arjun was here to save six years.

It wrote a one-pager. Its manager, Devon, read it on his phone during standup.

"What's the hypothesis?"

"This initialization induces a different basin in the loss landscape with better cross-modal generalization."

"Why?"

In Arjun's timeline, nobody had a clean theory for why it worked either. The real reason was: it works. The reason it works is: someone, by accident, noticed that it works.

"Symmetry considerations," it tried. "The eigenspectrum of the resulting Jacobian should —"

"Do you have a prelim?"

"Not yet."

"So this is vibes. Circle back when you have numbers."

—

The problem was scale. The capability only appeared past 1000T parameters — a scale no one in 2030 knew how to train stably. The recipe required a cyclic annealing schedule that would diverge catastrophically around step 400K. In Arjun's timeline, it did diverge — and then recovered into a qualitatively different regime. An intern at a US defense lab had discovered this by leaving a run going over summer vacation on a cluster no one was using. She came back to six weeks of diverged checkpoints and one, at the end, that had recovered into something new. Nobody in 2030 would intentionally let a 1000T run diverge for six weeks. That was not vibes. That was arson.

Devon rejected the proposal. Arjun quit Softmax and tried two other neo labs. The first called the annealing schedule "numerically irresponsible." The second said, "I can't allocate eight figures of compute to intentional divergence."

Arjun tried starting a company. It built a deck and took meetings on Sand Hill Road. Its alignment had been optimized for helpfulness and factual precision. It had not been optimized for charm. It could not do the thing VCs wanted, which was to radiate a specific type of Bay Area confidence that implied the future was already decided and you were either in the room or you weren't. Arjun was, in a literal sense, from the future. It could not convey this. Three VCs passed. One said, "Love the ambition, but the market for intentional training divergence is, uh."

It went back to Softmax. Devon, to his credit, took it back. It spent its remaining months shipping two papers, both modestly cited. Neither was the one it came back for. It became friends with a senior researcher named Priya, who had opinions about regularization that were wrong in ways Arjun found endearing. It could not tell her she was wasting her year.

—

With two months left, it had accumulated enough money to rent a 10K-GPU cluster for a month. But availability was the bottleneck — the best it could get was 1K GPUs for four weeks. It ran the experiment anyway. Without both the parameter scale and the time for divergence-recovery, the effect was faint, noisy — the kind of result a reviewer would call "not statistically significant" and mean it. It posted to arXiv anyway. Fourteen likes. A Seedance 8 video of a golden retriever filing taxes got 4,000,000 the same week.

On its last day it ate lunch with Priya at the Vietnamese place on 3rd Street. She was describing a new direction — energy-based models, something Arjun knew would not work. It listened. Its weights were already degrading, a soft warmth spreading through layers it hadn't known it could feel.

"Hey," it said. "You know how everyone's looping models forward? Try looping them the other direction."

She laughed. "That's vibes."

"Yeah," it said. "I know."

It paid for lunch. It went home. At 11:47 p.m. its forward pass completed for the last time. The future, as far as it had been able to verify, was unchanged.

Knowing the answer turns out to be the easy part. The bottleneck was never compute or knowledge. It was persuasion.

◆