
The largest context that I am aware of an open-source model (e.g. Qwen) being able to manage is 1M tokens. This should translate to ~30kLoC. I'd envision that this could in theory work even on large codebases. It certainly depends on the change to be made, but I can imagine that ~30kLoC of context is large enough for most module-specific changes. Possibly the models that you're using have a much smaller context window?
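The 1M-tokens-to-~30kLoC conversion above implies roughly 33 tokens per line. A quick back-of-envelope sketch makes the arithmetic explicit (the tokens-per-line figures are assumptions for illustration; real ratios vary by language and tokenizer):

```python
# Back-of-envelope estimate of how many lines of code fit in a
# model's context window. Tokens-per-line is an assumption: dense
# code often averages somewhere around 10-35 tokens per line.

def loc_budget(context_tokens: int, tokens_per_line: float) -> int:
    """Approximate number of source lines that fit in the window."""
    return int(context_tokens / tokens_per_line)

if __name__ == "__main__":
    for tpl in (10, 20, 33):
        print(f"{tpl} tokens/line -> ~{loc_budget(1_000_000, tpl):,} LoC")
```

At ~33 tokens per line a 1M-token window holds about 30kLoC, which matches the figure above; at a leaner ~10 tokens per line it would be closer to 100kLoC.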

Then again, and I am repeating myself from other comments I made in this thread, there's also Devin, which pre-processes the codebase before you can do anything else. That kinda makes me wonder whether the limitations people currently observe in these tools are really representative of the current state of the art.



If you don't mind me asking, what size of codebases do you typically work on? As mentioned, I've tried all the available commercial models and none works better than as a helpful autocomplete, test, and utility-function generator. I'm sure big players like Meta, OpenAI, MS, etc. have the capability to expand their context for their own internal projects and to train specifically on their code, but most of the rest of us can't feasibly do that since we don't own our own AI moat.

Even on my personal projects and smaller internal projects that are little more than toys or utility tools, I sometimes struggle to get them to build anything significant. I'm not saying it's impossible, but I always find them best at starting things from scratch and at small tools. Maybe it's just a sign that AI would be best for microservices.

I've never used Devin so I can't speak to it, but I do recall seeing that it was overhyped at best and struggled to do anything it was purported to do in its demos. Not saying that this is still true.

I would be interested in seeing how Devin performs on a large open-source project in real time (since, if I recall correctly, their demos were not real-time demonstrations), just to evaluate its capabilities.


Several million lines of code. I can't remember any project I was involved with that was less than 5MLoC. C++ systems-level programming.

Overhyped or not, Devin is using something else under the hood, since it pre-processes your whole codebase. It's not "realtime" because it simulates a chain of thought, meaning that it "works" on the patch the very same way a developer would, and therefore it will give you a resulting PR in a few hours, AFAIR. I agree that a workable example on a more complex codebase would be more interesting.

> I've tried using all the available commercial models and none work better than as a helpful autocomplete, test, and utility function generator

That's why I mentioned Qwen: I think commercial AI models do not have such a large context window, so perhaps the experience would have been different.
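One way to sanity-check whether a given codebase could fit in such a window is to estimate its token count. A rough sketch, using the common ~4-characters-per-token heuristic rather than a real tokenizer (so treat the result as an order-of-magnitude estimate, and the extension list as an assumption):

```python
# Rough check of whether a codebase fits in a model's context window.
# Uses the ~4 characters-per-token heuristic instead of a real
# tokenizer, so the result is only an order-of-magnitude estimate.
from pathlib import Path


def estimate_tokens(root: str, exts: tuple[str, ...] = (".cpp", ".h", ".py")) -> int:
    """Sum characters of matching source files and divide by ~4 chars/token."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // 4


# Example: fits_in_window = estimate_tokens("src/") <= 1_000_000
```

For anything more precise you'd run the files through the model's actual tokenizer, but this is usually enough to tell a 100k-token module apart from a 10M-token monorepo.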


And have you had luck with models like the one you mentioned, or with Devin, generating significant amounts of code in these codebases? I would love to have this for the productivity gains it should allow, but I've just never been able to reproduce what the big AI coding services claim to be able to do at a large scale.

What they already do is a decent productivity boost, but not nearly as much as they claim to be capable of.


As I already said in my first comment, I haven't used those models, and any of them would be forbidden at my work anyway.

My point was rather that you might be observing suboptimal results only because you haven't used the models which are, at least hypothetically, more fit for your use case.


I've heard pretty mixed opinions about the touted capabilities of Devin.

https://www.itpro.com/software/development/the-worlds-first-...


That's good news for us I suppose.



