Hacker News
WarmWash 24 days ago | on: Sam Altman's response to Molotov cocktail incident
GLM 5.1, widely held up as the model at the heels of, perhaps even surpassing, Western models...
Gets 5% on ARC-AGI2 private set.
Chinese models are suspiciously good at benchmarks.
ctolsen 24 days ago
I mean, I could say the same about Gemini. 3.1 Pro tops a bunch of benchmarks out there, but in any practical use I've put it to, it underperforms both other proprietary models and open-weight models. Benchmarks are suspicious in general.