The following article originally appeared on Medium and is being republished here with the author’s permission.
Ask 10 developers which LLM they’d recommend and you’ll get 10 different answers, and almost none of them are based on objective comparison. What you’ll get instead is a reflection of the models they happen to have access to, the ones their employer approved, and the ones that influencers they follow have been quietly paid to promote.
We’re all living inside recursively nested walled gardens, and most of us don’t realize it.
The access problem
In corporate environments, model selection often happens by accident. Someone on the team tries Claude Code one weekend, gets excited, tells the group on Slack, and suddenly the whole team is using it. Nobody evaluated alternatives. Nobody ran a bakeoff. The decision was made by whoever had a company card and a free Saturday.
That’s not a criticism; it’s just how these things go. But it means that when that same person tells you their favorite model, they’re really telling you which model they’ve had the most reps with. There’s a genuine learning curve at play: You get faster, your prompts get better, and the model starts to feel almost intuitive. It’s not that the model is objectively superior. It’s that you’ve gotten good at using it.
This matters more than people admit, because a lot of this space runs on feelings rather than evidence. People feel good about Opus right now. It feels powerful; it feels smart; it feels like you’re using the best tool available. And maybe you are. But ask someone who’s paying for their own tokens whether they feel the same way, and you tend to get a more calibrated answer. Skin in the game has a way of sharpening opinions.
The influence problem
There’s also a lot of money moving through this space in ways that don’t always get disclosed. Model providers are spending real budgets to make sure the right people have the right experiences: early access, credits, invitations to the right events. Anthropic does it. OpenAI does it. This isn’t a scandal; it’s just marketing, but it muddies the signal considerably. When someone you follow is effusive about a model, it’s worth asking whether they arrived at that opinion through sustained use or through a curated demo environment.
Meanwhile, some developers, especially those building in the open, will use whatever doesn’t cost an arm and a leg. Their enthusiasm for a model may be more about its pricing tier than its capability ceiling. That’s also a valid signal, but it’s not the same signal.
The alignment problem (the other one)
Then there are the geopolitical concerns. Some developers are deliberately avoiding Qwen and GLM because of concerns about the countries they originate from. Others are using them because they’re compelling, capable models that happen to be dramatically cheaper. Each camp thinks the other is being naive. This is a real conversation that doesn’t have a clean answer, but it’s happening mostly below the surface.
What I’ve actually been doing
I’ve been forcing myself to test outside my comfort zone. I’ve spent the last week using Codex seriously, not casually, and my experience so far is that it’s nearly indistinguishable from Claude Sonnet 4.6 for most coding tasks, and it’s running at roughly half the cost when you factor in how efficiently it uses tokens. That’s not a small difference. I want to live with it longer before I have a firm opinion, but “a week” is the minimum threshold I’d set for any model evaluation. Anything less and you’re just rating your first impression.
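The token-efficiency point comes down to simple arithmetic: what a task costs is roughly the tokens consumed times the price per token, so a model that reaches the same result with fewer tokens is cheaper per task even at a similar per-token rate. The sketch below is purely illustrative. The `Usage` and `Pricing` types, the `cost_per_task` helper, and every price and token count are made-up assumptions, not real rates or measurements for Codex or Sonnet.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one task (hypothetical numbers, for illustration only)."""
    input_tokens: int
    output_tokens: int


@dataclass
class Pricing:
    """Hypothetical prices in dollars per million tokens; not real list prices."""
    input_per_million: float
    output_per_million: float


def cost_per_task(usage: Usage, pricing: Pricing) -> float:
    """Dollar cost of a single task: tokens consumed times price per token."""
    return (
        usage.input_tokens * pricing.input_per_million
        + usage.output_tokens * pricing.output_per_million
    ) / 1_000_000


# Same hypothetical per-token pricing for both models...
price = Pricing(input_per_million=3.0, output_per_million=15.0)

# ...but model B reaches the same result using half the tokens.
model_a = cost_per_task(Usage(input_tokens=40_000, output_tokens=8_000), price)
model_b = cost_per_task(Usage(input_tokens=20_000, output_tokens=4_000), price)

print(f"A: ${model_a:.3f}  B: ${model_b:.3f}  ratio: {model_b / model_a:.2f}")
```

Comparing per-task cost rather than per-token list price is what surfaces a gap like “roughly half.” A model that looks equally priced on a rate card can still cost much less if it wastes fewer tokens getting to the answer.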
I’ve also started using Qwen and GLM-5 seriously. Early results are interesting. I’ve had some compelling successes and a few jarring mistakes. I’ll reserve judgment.
What I’ve noticed with my own Anthropic usage is worth naming: I default to Haiku for well-scoped, mechanical tasks. Sonnet handles almost everything else with room to spare. Opus only comes out when I need genuine breadth: architecture questions, strategic framing, anything with a genuinely wide scope. But I’ve watched people in corporate environments leave the dial on Opus permanently because they’re not paying for tokens themselves. And here’s the thing: that’s not always to their advantage. High-powered models overthink simple tasks. They’ll add abstractions you didn’t ask for and restructure things that didn’t need restructuring. When I have a clearly templated class to write, Haiku gets it right at a tenth of the cost, and it doesn’t second-guess the design.
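The tiering habit described above can be written down as a simple routing rule. This is a hypothetical sketch, not anything Anthropic provides. The `TaskProfile` fields and the `pick_tier` function are illustrative assumptions, and the `Tier` values are plain labels, not actual API model identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Tier labels only; these are not actual API model identifiers.
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


@dataclass
class TaskProfile:
    """Hypothetical description of a task, for illustration only."""
    well_scoped: bool     # clear inputs and outputs, e.g. a templated class
    mechanical: bool      # follows an existing pattern, no design decisions
    needs_breadth: bool   # architecture questions, strategic framing


def pick_tier(task: TaskProfile) -> Tier:
    """Default to the cheapest tier that fits; escalate only for breadth."""
    if task.needs_breadth:
        return Tier.OPUS
    if task.well_scoped and task.mechanical:
        return Tier.HAIKU
    # Sonnet covers almost everything else.
    return Tier.SONNET


templated_class = TaskProfile(well_scoped=True, mechanical=True, needs_breadth=False)
architecture_review = TaskProfile(well_scoped=False, mechanical=False, needs_breadth=True)
bug_fix = TaskProfile(well_scoped=True, mechanical=False, needs_breadth=False)

for name, task in [("templated class", templated_class),
                   ("architecture review", architecture_review),
                   ("bug fix", bug_fix)]:
    print(f"{name}: {pick_tier(task).value}")
```

Ordering the checks this way makes the most expensive tier opt-in rather than the default, which is the opposite of leaving the dial on Opus.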
The thing we should be talking about
Last month everyone was exercised about what Sam Altman said about energy consumption. Fine. But I think the more pressing question is about marketing budgets and how they’re distorting our collective understanding of these tools. The benchmarks are starting to feel managed. The influencer coverage is clearly shaped. The access programs create a positive bias among the people with the largest audiences.
None of this means the models are bad. Some of them are genuinely remarkable. But when you ask someone which model to use, you’re getting an answer that’s filtered through their employer’s procurement decisions, the influencers they follow, what they can afford, and how long they’ve been using that particular tool. The answer you get tells you a lot about their situation. It tells you almost nothing about the model.
Take it all with appropriate skepticism, including this post.

