
The largest context that I am aware of an open-source model (e.g. Qwen) being able to manage is 1M tokens. This should translate to ~30kLoC. I'd envision that this could in theory work even on large codebases. It certainly depends on the change to be made, but I can imagine that ~30kLoC of context is large enough for most module-specific changes. Possibly the models that you're using have a much smaller context window?
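The 1M-tokens-to-~30kLoC conversion above implies roughly 33 tokens per line. A quick back-of-envelope sketch makes the arithmetic explicit (the tokens-per-line figures are assumptions for illustration; real ratios vary by language and tokenizer):

```python
# Back-of-envelope estimate of how many lines of code fit in a
# model's context window. Tokens-per-line is an assumption: dense
# code often averages somewhere around 10-35 tokens per line.

def loc_budget(context_tokens: int, tokens_per_line: float) -> int:
    """Approximate number of source lines that fit in the window."""
    return int(context_tokens / tokens_per_line)

if __name__ == "__main__":
    for tpl in (10, 20, 33):
        print(f"{tpl} tokens/line -> ~{loc_budget(1_000_000, tpl):,} LoC")
```

At ~33 tokens per line a 1M-token window holds about 30kLoC, which matches the figure above; at a leaner ~10 tokens per line it would be closer to 100kLoC.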

Then again, and I am repeating myself from other comments I made in this thread, there's also Devin, which pre-processes the codebase before you can do anything else. That kinda makes me wonder whether the limitations people currently observe in these tools are really representative of the current state of the art.



If you don't mind me asking, what size of codebases do you typically work on? As mentioned, I've tried all the available commercial models and none works better than as a helpful autocomplete, test, and utility-function generator. I'm sure big players like Meta, OpenAI, MS, etc. have the capability to expand their context for their own internal projects and to train specifically on their code, but most of the rest of us can't feasibly do that since we don't own our own AI moat.

Even on my personal projects and smaller internal projects that are little more than toys or utility tools, I sometimes struggle to get them to build anything significant. I'm not saying it's impossible, but I always find them best at starting things from scratch and at small tools. Maybe it's just a sign that AI would be best for microservices.

I've never used Devin so I can't speak to it, but I do recall seeing that it was overhyped at best and struggled to do anything it was purported to do in its demos. Not saying that this is still true.

I would be interested in seeing how Devin performs on a large open-source project in real time (since, if I recall correctly, their demos were not real-time demonstrations), just to evaluate its capabilities.


Several million lines of code. I can't remember any project I was involved with that was less than 5MLoC. C++ systems-level programming.

Overhyped or not, Devin is using something else under the hood, since it pre-processes your whole codebase. It's not "realtime" because it simulates a chain of thought, meaning that it "works" on the patch the very same way a developer would, and therefore it will give you a resulting PR in a few hours, AFAIR. I agree that a workable example on a more complex codebase would be more interesting.

> I've tried using all the available commercial models and none work better than as a helpful autocomplete, test, and utility function generator

That's why I mentioned Qwen: I think commercial AI models do not have such a large context window, so perhaps the experience would have been different.
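One way to sanity-check whether a given codebase could fit in such a window is to estimate its token count. A rough sketch, using the common ~4-characters-per-token heuristic rather than a real tokenizer (so treat the result as an order-of-magnitude estimate, and the extension list as an assumption):

```python
# Rough check of whether a codebase fits in a model's context window.
# Uses the ~4 characters-per-token heuristic instead of a real
# tokenizer, so the result is only an order-of-magnitude estimate.
from pathlib import Path


def estimate_tokens(root: str, exts: tuple[str, ...] = (".cpp", ".h", ".py")) -> int:
    """Sum characters of matching source files and divide by ~4 chars/token."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // 4


# Example: fits_in_window = estimate_tokens("src/") <= 1_000_000
```

For anything more precise you'd run the files through the model's actual tokenizer, but this is usually enough to tell a 100k-token module apart from a 10M-token monorepo.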


And have you had luck with models like the one you mentioned, or with Devin, generating significant amounts of code in these codebases? I would love to have this for the productivity gains it should allow, but I've just never been able to reproduce what the big AI coding services claim to be able to do at a large scale.

What they already do is a decent productivity boost, but not nearly as much as they claim to be capable of.


As I already said in my first comment, I haven't used those models, and any of them would be forbidden at my work anyway.

My point was rather that you might be observing suboptimal results only because you haven't used the models which are, at least hypothetically, more fit for your use case.


I've heard pretty mixed opinions about the touted capabilities of Devin.

https://www.itpro.com/software/development/the-worlds-first-...


That's good news for us I suppose.



