
Thanks for the link. The GSM8K result actually leads the pack in that table, but math is indeed underwhelming. Qwen 2.5 is in the lead, but BitNet isn't far behind, and it takes 1/6th as much memory during inference and was trained on less than 1/4 the number of tokens. Pretty cool.

