
Cursor's current business model produces a fundamental conflict between the well-being of the user and the financial well-being of the company. We're starting to see these cracks form as LLM providers are relying on scaling through inference-time compute.

Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

If you prune out context from the initial prompt, then instead of reasoning on richer context, the LLM reasons only on the prompt itself (with no access to the attached files). After the thinking process, Cursor runs function calls to retrieve more context, which entirely defeats the point of "thinking" and induces the model to create incoherent plans and speculative edits in its thinking process, which would explain Claude's bizarre over-editing behavior. I suspect this is why so many Cursor users are complaining about Claude 3.7.
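To make the ordering problem concrete, here is a minimal Python sketch. Everything in it is hypothetical: `FILES`, `read_file`, `think`, and the two pipeline functions are stand-ins I made up, not Cursor's actual implementation (which isn't public). It only shows which text the reasoning step gets to see in each ordering.

```python
from typing import Dict, List

# Hypothetical stand-in for a codebase; not real data.
FILES: Dict[str, str] = {
    "auth.py": "def login(user): ...",
    "db.py": "def query(sql): ...",
}


def read_file(path: str) -> str:
    """Stand-in for a file-reading function call."""
    return FILES[path]


def think(visible_context: List[str]) -> List[str]:
    """Stand-in for a reasoning pass: it only records what it could see."""
    return list(visible_context)


def think_then_retrieve(prompt: str, attached: List[str]) -> Dict[str, List[str]]:
    # Reasoning runs on the bare prompt; file contents arrive afterwards.
    seen = think([prompt])
    retrieved = [read_file(p) for p in attached]
    return {"seen_while_thinking": seen, "retrieved_after": retrieved}


def retrieve_then_think(prompt: str, attached: List[str]) -> Dict[str, List[str]]:
    # File contents are fetched first, so the reasoning pass can use them.
    retrieved = [read_file(p) for p in attached]
    seen = think([prompt] + retrieved)
    return {"seen_while_thinking": seen, "retrieved_after": []}
```

In the first ordering the plan is made before any code has been read, which is the situation described above.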

On top of this, Cursor has every incentive to keep the thinking effort for both o3-mini and Claude 3.7 to the very minimum so as to reduce server load.

Cursor is being hailed as one of the greatest SaaS growth stories, but their $20/mo all-you-can-eat business model puts them in such a bad place.



>Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. While that seems like a perfectly reasonable strategy, it starts to fall apart when integrating reasoning models.

In general I feel like this was always the reason automatic context detection could not be good in fixed fee subscription models - providers need to constrain the context to stay profitable. I also saw that things like Claude Code happily chew through your codebase, and bank account, since they are charging by token - so they have the opposite incentive.


> This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

Keep in mind that what we call "reasoning" models today are the first iteration. There's no fundamental reason why you can't do what you stated. It's not done now, but it can be done.

There's nothing stopping you from running "thinking" in "chunks" of 1-2 paragraphs, doing some search, adding more context (maybe from a pre-reasoned cache), and continuing the reasoning from there.

There's also work being done on think - summarise - think - summarise - etc. And on various "RAG"-like thinking.
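A rough sketch of that chunked loop in Python, under stated assumptions: `think_chunk` and `search` are hypothetical callables supplied by the caller (a model call and a retrieval step), and `max_rounds` is an arbitrary cap. This is an illustration of the idea, not how any current product does it.

```python
from typing import Callable, List


def interleaved_reasoning(
    prompt: str,
    think_chunk: Callable[[List[str]], str],
    search: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> List[str]:
    """Alternate short reasoning chunks with retrieval.

    think_chunk and search are hypothetical stand-ins for a model call
    and a search step; they are not part of any real API.
    """
    context: List[str] = [prompt]
    for _ in range(max_rounds):
        thought = think_chunk(context)  # a short chunk of reasoning
        context.append(thought)
        found = search(thought)         # look up what the thought needs
        if not found:                   # nothing new to add: stop early
            break
        context.extend(found)           # continue reasoning on richer context
    return context
```

Each round reasons over everything gathered so far, so later chunks see the retrieved code rather than just the original prompt.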


This analysis only goes surface-deep. Cursor already has quotas for their paid plans and usage-based pricing for their larger models. I run into those quotas and fall over to usage-based pricing every month.

Imo most of their incentive for context pruning comes not just from reducing the token count, but from the perception that you only have to find "the right way"(tm) to build that context window automatically to get to a coding panacea. They just aren't there yet.


If you’re going to pay on the margin, why not spend those incremental dollars running the same requests on Cline? I’m assuming cost is the deciding factor here because, quality-wise, plugging directly into provider APIs with Cline always does a much better job for me.


Good callout, will try! I hadn't considered switching tools; it's mostly the convenience of just continuing instead of stopping midway through and switching tools. But I also only code intermittently, a couple of days a week at most these days, because it's only part of what I do, so I get to experiment with new tooling much less than I'd like.


Cheers, give it a shot. Cline runs as an extension within Cursor, so you can use it to augment your existing workflows with almost zero disruption.


> Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

There's nothing about this that conflicts with reasoning models, I'm not sure what you mean here.


What I mean is that their implementation (thinking only on the first response) renders zero benefit because it doesn’t see the code itself. They run multiple function calls to analyze your codebase in increments. If they ran the thinking model on the output of those function calls, then performance would be great, but so far this is not what they are doing (yet). It would also dramatically increase the cost of running the same operation.
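Back-of-the-envelope version of the cost point, as a Python sketch. The accounting is a simplifying assumption of mine: one reasoning pass of a fixed size on the initial prompt, plus one more pass per function call if thinking is enabled after each call. It ignores the growing input context, and the numbers in the usage note are made up.

```python
def thinking_tokens(num_calls: int, tokens_per_pass: int,
                    think_after_each_call: bool) -> int:
    """Reasoning tokens for one request under a simplified, hypothetical model.

    One pass on the initial prompt, plus one pass per function call
    when think_after_each_call is True.
    """
    passes = 1 + (num_calls if think_after_each_call else 0)
    return passes * tokens_per_pass
```

With, say, 5 function calls and 2,000 reasoning tokens per pass (both illustrative), thinking once costs 2,000 reasoning tokens while thinking after every call costs 12,000, a 6x increase for the same operation.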


But the way those models work is to run everything again once the function call results come in. Are you saying Cursor is not using the model you selected on function call responses?


This sounds like a Cursor issue, not something that affects reasoning models in general.

edit: Ah, I see what you mean now.


That's my point. By offering unlimited requests (500 fast requests + unlimited slow requests) to people paying a fixed $20/mo, Cursor has put itself into a ruthless marginal cost optimization game where one of its biggest levers for success is reducing context sizes and discouraging thinking after every function call.

Software like Claude Code and Cline do not face those constraints, as the cost burden is on the user.
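The squeeze can be put in numbers with a tiny Python sketch. The $20 fee and the 500 fast requests come from the plan described above; the per-request inference cost is a hypothetical parameter, since the real figure isn't public.

```python
def monthly_margin(fee: float, requests: int, cost_per_request: float) -> float:
    """Provider margin per user per month under a flat fee.

    cost_per_request is a hypothetical average inference cost.
    """
    return fee - requests * cost_per_request


def break_even_cost(fee: float, requests: int) -> float:
    """Average cost per request at which the flat fee is fully spent."""
    return fee / requests
```

For a user who uses all 500 fast requests on a $20 plan, the break-even is $0.04 per request on average. Anything that pushes the average above that (larger contexts, extra thinking passes) turns the user into a loss, whereas a pay-per-token tool simply passes that cost on to the user.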


> Cursor has been trying to do things to reduce the costs of inference, especially through context pruning.

You can also use cline with gemini-2.0-flash, which supports a huge context window. Cline will send it the full context and not prune via RAG, which helps.


I've just tried gemini-2.0-flash, and it's an incredible model that's great for making edits. I haven't tried any heavy lifting with it yet, but it's replaced Claude for a lot of my edits. It's also great at agentic stuff!


I love Cline but I’ve never tried the Gemini models with it. I’ll give it a shot tonight, thanks for the tip!


Or you can use the Gemini Code Assist extension for VS Code, which is basically free, but so far the code it has written almost never worked for me. For now I only use Claude 3.7 or Grok in chat mode. Almost no model, as of today, is good at coding.


Did Grok 3 finally get an API?


I think you're right, but what company's business model doesn't produce a conflict between the user's well-being and the company's finances?


Reflecting on your comment, I realized that using a huge number of GPUs is akin to a Turing machine approaching infinite speed. So I think the promise of LLMs writing code is basically saying: if we add a huge number of reading/writing heads with an unbounded number of rules, we can solve decidability. Because what is the ability to generate arbitrarily complex code if not solving the halting problem? Maybe there's a more elegant or logical way to postulate this, or maybe I'm just confused or plain wrong, but it seems to me that it is impossible to generate a program that is guaranteed to terminate unless you can solve decidability. And throwing GPUs at a huge tape is just saying that the tape approaches infinite size and the Turing machine approaches infinite speed...

Or put another way, isn't the promise of software that is capable of generating any software from a natural language description in finite time basically assuming P=NP? Because unless the time can be guaranteed to be finite, throwing GPU farms and memory at this most general problem (isn't the promise of using software to generate arbitrary software the same as the promise that any possible problem can be solved in polynomial time?) is not guaranteed to solve it in finite time.



