Snake oil may be a bit strong, because snake oil *never* works (except maybe as ...

jazzypants · 2026-05-05T16:58:52 1778000332

I think the placebo effect might be a decent comparison. It works most of the time, and you don't worry about it as long as you fully believe in its efficacy. However, once the illusion is shattered, the positive effects are diminished, and you can never fully trust the solution again.

intended · 2026-05-05T11:19:36 1777979976

> has a pretty high chance of working.

for MVPs, mockups, prototypes, or in the hands of an expert coder. You can't let them run unsupervised. The promise of automated intelligence falls far short of the reality.

crimsoneer · 2026-05-05T12:58:59 1777985939

Not only "has a high chance of working", but you can pay more to make it more reliable. It really is striking to try running an openClaw-type harness on a smaller or quantised model. It makes you realise how much we take for granted from SOTA models, in terms of complex, generally reliable tool use, that was totally impossible just a year ago.

j45 · 2026-05-05T10:46:14 1777977974

“Pretty high chance” often isn’t the intent or impression the end user has.

kergonath · 2026-05-05T12:56:03 1777985763

Indeed, and it is a complicated problem to solve. A GUI or CLI can hide footguns or make them less likely to be misused. But an AI agent is perfectly happy to use a wrecking ball to drive a nail without any second thought or confirmation.

j45 · 2026-05-05T14:51:15 1777992675

It’s a human articulation problem.

When it receives a generic, vague input, it is free to interpret it according to how its corpus fires, as in any human interaction.

Articulating better means writing a sentence that will stand the test of model updates.

kergonath · 2026-05-05T15:19:48 1777994388

Even then. I don’t have an example off the top of my head but even perfectly clear sentences can lead the agent to strange places. Even between humans, miscommunication is easy, but then anyone sensible would ask for confirmation if their interpretation is weird. But the LLM very rarely questions the user.

I don’t think it’s fair to blame the user here. The tool must be operated by normal users.

j45 · 2026-05-06T03:55:03 1778039703

I'm trying to think of other types of tooling that normal users can all use equally well, or in the best ways possible.