
I don't understand why code passing tests wouldn't be protection against most forms of hallucinations. In code, a hallucination means an invented function or method that doesn't exist. A test that uses that function or method genuinely does prove that it exists.

It might be using it wrong but I'd qualify that as a bug or mistake, not a hallucination.

Is it likely we have different ideas of what "hallucination" means?
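To make the narrow definition above concrete, here's a minimal sketch. It assumes a made-up hallucination, `str.to_upper`, which Python's `str` does not have (the real method is `upper`); the function names and the `passes` helper are hypothetical too. Any test that actually executes the invented call fails, which is the sense in which a passing test proves the method exists.

```python
def uses_real_method(s: str) -> str:
    # str.upper is a real method, so this call succeeds.
    return s.upper()


def uses_invented_method(s: str) -> str:
    # Hypothetical hallucination: str has no to_upper method,
    # so this raises AttributeError as soon as it is executed.
    return s.to_upper()


def passes(fn) -> bool:
    """Run fn like a tiny test and report whether it survived."""
    try:
        return fn("hi") == "HI"
    except AttributeError:
        return False


real_ok = passes(uses_real_method)
invented_ok = passes(uses_invented_method)
```

Note this only covers code paths the test actually runs; an invented call on an untested branch would go unnoticed.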



  > tests wouldn't be protection against most forms of hallucinations.
Sorry, that's a stronger condition than I intended to communicate. I agree, tests are a good mitigation strategy. We use them for similar reasons. But I'm saying that passing tests is insufficient to conclude that the code is hallucination-free.

My claim is more along the lines of "passing tests doesn't mean your code is bug free", which I think we can all agree is a pretty mundane claim?
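A toy illustration of that mundane claim, as a sketch only: the `is_leap_year` function and its test are hypothetical, and `calendar.isleap` from the standard library is used as the reference. The test passes, yet the function is wrong for century years.

```python
import calendar


def is_leap_year(year: int) -> bool:
    # Buggy on purpose: ignores the century rule (divisible by 100
    # but not by 400 is NOT a leap year).
    return year % 4 == 0


def test_is_leap_year() -> None:
    # These checks pass, but they never touch a century year.
    assert is_leap_year(2024)
    assert not is_leap_year(2023)


test_is_leap_year()  # passes silently

# The bug only shows up on an input the test did not cover.
disagrees_on_1900 = is_leap_year(1900) != calendar.isleap(1900)
```

Whether you call the uncovered failure a bug or a hallucination is exactly the definitional question in this thread; the code only shows that a green test run doesn't settle it.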

  > Is it likely we have different ideas of what "hallucination" means?
I agree, I think that's where we diverge. In that case, let's continue over here[0] (linking in case others are following). I'll add that I think we're going to run into the question of what we consider to be in distribution, so I'll state now that I think coding is in distribution.

[0] https://news.ycombinator.com/item?id=44829891


Haven't you effectively built a system that detects that specific kind of hallucination, removes it, and repeats the process before the result is presented to you?

So you're not seeing hallucinations in the same way that Van Halen isn't seeing the brown M&Ms: because they've been removed, not because they never existed.


I think systems integrated with LLMs that help spot and eliminate hallucinations - like code execution loops and search tools - are effective tools for reducing the impact of hallucinations in how I use models.

That's part of what I was getting at when I very clumsily said that I rarely experience hallucinations from modern models.
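For anyone following along, here's a rough sketch of what a code execution loop of that kind might look like. Everything in it is an assumption for illustration: the canned `candidates` list stands in for successive model outputs, `first_passing` and `check` are hypothetical names, and `to_upper` is an invented method that `str` doesn't have. No real LLM API is involved.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_passing(
    candidates: Iterable[str],
    check: Callable[[dict], None],
) -> Tuple[Optional[str], int]:
    """Return the first candidate that executes and passes check,
    together with how many candidates were rejected before it."""
    rejected = 0
    for source in candidates:
        namespace: dict = {}
        try:
            exec(source, namespace)  # run the generated code
            check(namespace)         # run the test against it
            return source, rejected
        except Exception:
            # The failure is filtered out here; the user never sees it.
            rejected += 1
    return None, rejected


# Stand-ins for successive model outputs (hypothetical).
candidates = [
    "def shout(s):\n    return s.to_upper()\n",  # invented method
    "def shout(s):\n    return s.upper()\n",     # real method
]


def check(ns: dict) -> None:
    assert ns["shout"]("hi") == "HI"


accepted, rejected = first_passing(candidates, check)
```

The `rejected` count is the brown M&Ms point from upthread: the hallucinated attempt did happen, it just got removed before anything was shown to the user.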



