
For a loosely similar 'benchmark', I recently tried to test major LLMs on my coding game (models write code controlling their units in a 1v1 RTS) - https://yare.io/ai-arena

