Cursor's current business model produces a fundamental conflict between the well...

rafaelmn · on March 12, 2025

>Cursor has been trying to do things to reduce the costs of inference, especially through context pruning. For instance, if you "attach" files to a conversation, Cursor no longer stuffs the code from those files into the prompt. Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. While that seems like a perfectly reasonable strategy, it starts to fall apart when integrating reasoning models.

In general, I feel this was always the reason automatic context detection couldn't be good under fixed-fee subscription models: providers need to constrain the context to stay profitable. I've also seen that tools like Claude Code happily chew through your codebase (and your bank account) because they charge by the token, so they have the opposite incentive.
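The pruning strategy in the quote above can be sketched as a loop that reads attached files in slices through a tool call and stops once the model is satisfied or a token budget runs out. This is a minimal sketch under stated assumptions, not Cursor's actual implementation: `read_slice`, `gather_context`, the `enough` callback (standing in for "the model feels it has enough information") and the word-count token estimate are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a repository: file path -> list of source lines.
Repo = dict[str, list[str]]


@dataclass
class Context:
    snippets: list[str] = field(default_factory=list)
    tokens_used: int = 0


def read_slice(repo: Repo, path: str, start: int, count: int) -> str:
    """Hypothetical tool call: return `count` lines of `path` starting at line `start`."""
    return "\n".join(repo[path][start:start + count])


def rough_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def gather_context(
    repo: Repo,
    attached: list[str],
    enough: Callable[[Context], bool],  # stands in for "the model feels it has enough"
    budget: int = 200,
    chunk: int = 20,
) -> Context:
    """Read attached files slice by slice instead of stuffing them all into the prompt."""
    ctx = Context()
    for path in attached:
        for start in range(0, len(repo[path]), chunk):
            if enough(ctx):
                return ctx
            snippet = read_slice(repo, path, start, chunk)
            cost = rough_tokens(snippet)
            if ctx.tokens_used + cost > budget:
                return ctx  # pruning: stop before exceeding the token budget
            ctx.snippets.append(snippet)
            ctx.tokens_used += cost
    return ctx
```

Under a fixed-fee plan, `budget` is the knob a provider has every reason to turn down; under per-token billing the user pays for a bigger one.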

NitpickLawyer · on March 12, 2025

> This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

Keep in mind that what we call "reasoning" models today are the first iteration. There's no fundamental reason why you can't do what you stated. It's not done now, but it can be done.

There's nothing stopping you from running "thinking" in "chunks" of 1-2 paragraphs, doing some search, adding more context (maybe from a pre-reasoned cache), and continuing the reasoning from there.

There's also work being done on think-summarise-think-summarise cycles, and on various "RAG"-like thinking.
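The chunked "think, search, add context, keep thinking" loop described above, with a summarise step when the scratchpad grows, might look roughly like this. It is a control-flow sketch only: the `think`, `search` and `summarise` callables are hypothetical stand-ins for a model call, a retrieval step and a compression step, and no particular model or API is implied.

```python
from typing import Callable, Optional

# One reasoning chunk returns (thought, optional search query, done flag).
Step = tuple[str, Optional[str], bool]


def interleaved_reasoning(
    question: str,
    think: Callable[[str], Step],      # hypothetical: one short burst of reasoning
    search: Callable[[str], str],      # hypothetical: retrieval / tool call
    summarise: Callable[[str], str],   # hypothetical: compress the scratchpad
    max_scratch_chars: int = 500,
    max_steps: int = 10,
) -> str:
    scratch = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, query, done = think(scratch)
        scratch += thought + "\n"
        if done:
            break
        if query is not None:
            # Add fresh context, then keep reasoning from there.
            scratch += f"[search: {query}] {search(query)}\n"
        if len(scratch) > max_scratch_chars:
            # think -> summarise -> think: shrink the context before the next chunk.
            scratch = summarise(scratch)
    return scratch
```

In a real system `think` would be a model call that can pause and request a lookup; here it can be any function, which keeps the loop itself easy to inspect.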

Roritharr · on March 12, 2025

This is only surface-level. Cursor already has quotas for their paid plans and usage-based pricing for their larger models; I run into those quotas and fall over to usage-based pricing every month.

IMO, most of their incentive for context pruning comes not just from reducing the token count, but from the perception that you only have to find "the right way"™ to build that context window automatically to reach a coding panacea. They just aren't there yet.

laborcontract · on March 12, 2025

If you’re going to pay on the margin, why not spend those incremental dollars running the same requests on Cline? I’m assuming cost is the deciding factor here because, quality-wise, plugging directly into provider APIs with Cline always does a much better job for me.

Roritharr · on March 18, 2025

Good callout, will try! I hadn't considered switching tools; it's mostly the convenience of just continuing instead of stopping midway through to switch tools. Also, I only code intermittently, a couple of days a week at most these days, because it's only part of what I do, so I get to experiment with new tooling much less than I'd like.

laborcontract · on March 20, 2025

Cheers, give it a shot. Cline runs as an extension within Cursor, so you can use it to augment your existing workflows with almost zero disruption.

IanCal · on March 12, 2025

> Instead, it'll run function calls to open those files and read bits and pieces of the code until the model feels it has enough information. This seems like a perfectly reasonable strategy until you realize you cannot do the same thing with reasoning models, if you're limiting the reasoning to just the initial prompt.

There's nothing about this that conflicts with reasoning models; I'm not sure what you mean here.

laborcontract · on March 12, 2025

What I mean is that their implementation (thinking only on the first response) yields zero benefit because, at that point, the model hasn't seen the code itself. They run multiple function calls to analyze your codebase in increments. If they ran the thinking model on the output of those function calls, performance would be great, but so far that's not what they're doing (yet). It would also dramatically increase the cost of running the same operation.
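A rough way to see the cost side of this: count tokens across the first response plus one model turn per function-call result, with reasoning either on the first turn only or on every turn. This is simplified arithmetic under assumptions: each result adds a fixed number of input tokens, each reasoning pass emits a fixed number of reasoning tokens, and earlier reasoning is not carried forward in the context. `turn_costs` and all the numbers are hypothetical.

```python
def turn_costs(
    tool_calls: int,
    prompt_tokens: int,
    result_tokens: int,
    reasoning_tokens: int,
    think_every_turn: bool,
) -> tuple[int, int]:
    """Return (input tokens, reasoning tokens) summed over 1 + tool_calls model turns.

    Simplifying assumptions: every function-call result adds `result_tokens` to the
    context, every reasoning pass emits `reasoning_tokens`, and earlier reasoning is
    not carried forward into later turns.
    """
    total_in = 0
    total_reasoning = 0
    for turn in range(tool_calls + 1):
        # Turn 0 sees only the prompt; turn i also sees the first i results.
        total_in += prompt_tokens + turn * result_tokens
        # "Thinking only on the first response" means reasoning on turn 0 alone.
        if turn == 0 or think_every_turn:
            total_reasoning += reasoning_tokens
    return total_in, total_reasoning


print(turn_costs(4, 1_000, 500, 2_000, think_every_turn=False))  # (10000, 2000)
print(turn_costs(4, 1_000, 500, 2_000, think_every_turn=True))   # (10000, 10000)
```

With these placeholder numbers the input side is identical either way; the extra cost of thinking after every function call shows up entirely as reasoning output, which grows linearly with the number of tool calls.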

IanCal · on March 12, 2025

But the way those models work is to run over everything again once the function-call results come in. Are you saying Cursor is not using the model you selected on function-call responses?

throwaway314155 · on March 12, 2025

This sounds like a Cursor issue, not something that affects reasoning models in general.

edit: Ah, I see what you mean now.

laborcontract · on March 12, 2025

That's my point. By offering unlimited requests (500 fast requests + unlimited slow requests) to people paying a fixed $20/mo, Cursor has put itself into a ruthless marginal-cost optimization game where one of its biggest levers for success is reducing context sizes and discouraging thinking after every function call.

Software like Claude Code and Cline do not face those constraints, as the cost burden is on the user.
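The fixed-fee squeeze is simple arithmetic: $20 spread over 500 fast requests leaves $0.04 per request before anything else. The per-million-token prices and request sizes below are hypothetical placeholders, not any provider's actual rates, and the function names are illustrative only.

```python
def cost_per_request(
    input_tokens: int, output_tokens: int, usd_per_m_input: float, usd_per_m_output: float
) -> float:
    """Provider cost of one request, given per-million-token prices (placeholders)."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def break_even_cost(subscription_usd: float, requests: int) -> float:
    """Largest average cost per request a flat subscription can absorb."""
    return subscription_usd / requests


def monthly_margin(subscription_usd: float, requests: int, cost: float) -> float:
    """Subscription revenue minus inference cost for the month."""
    return subscription_usd - requests * cost


# $20/mo and 500 fast requests come from the thread; the token prices are hypothetical.
per_request = cost_per_request(20_000, 1_000, usd_per_m_input=3.0, usd_per_m_output=15.0)
print(break_even_cost(20, 500))                        # 0.04
print(round(per_request, 4))                           # 0.075
print(round(monthly_margin(20, 500, per_request), 2))  # -17.5
```

With those placeholder numbers, a request with 20,000 input tokens and 1,000 output tokens costs $0.075, so 500 of them cost $37.50 against $20 of revenue. Shrinking the input context is the most direct way to close that gap, which is the incentive described above.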

MrBuddyCasino · on March 12, 2025

> Cursor has been trying to do things to reduce the costs of inference, especially through context pruning.

You can also use Cline with gemini-2.0-flash, which supports a huge context window. Cline will send it the full context rather than pruning via RAG, which helps.
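The difference between RAG-style pruning and sending the full context can be sketched with a toy retriever. Real systems typically rank chunks with embeddings; this version swaps in plain identifier overlap with the query so it stays self-contained, and `prune_context` and `full_context` are hypothetical names, not Cline's or Cursor's code.

```python
import re


def identifiers(text: str) -> set[str]:
    """Lower-cased identifier-like words; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def prune_context(chunks: list[str], query: str, k: int) -> list[str]:
    """RAG-style pruning: keep only the k chunks that overlap most with the query."""
    q = identifiers(query)
    # sorted() is stable, so ties keep their original file order.
    ranked = sorted(chunks, key=lambda c: len(q & identifiers(c)), reverse=True)
    return ranked[:k]


def full_context(chunks: list[str]) -> list[str]:
    """The large-context alternative: send every chunk and let the model sort it out."""
    return list(chunks)


chunks = [
    "def parse_config(path): ...",
    "def render_page(html): ...",
    "class ConfigError(Exception): ...",
]
print(prune_context(chunks, "where is parse_config defined?", k=1))  # ['def parse_config(path): ...']
print(len(full_context(chunks)))  # 3
```

The weakness of pruning shows up even in this toy: `ConfigError` is plausibly relevant to a config question but scores zero against `parse_config`, so with `k=1` it never reaches the model, whereas the full-context path always includes it.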

laborcontract · on March 21, 2025

I've just tried gemini-2.0-flash, and it's an incredible model for making edits. I haven't tried any heavy lifting with it yet, but it's replaced Claude for a lot of my edits. It's great at agentic stuff too!

laborcontract · on March 12, 2025

I love Cline, but I’ve never tried the Gemini models with it. I’ll give it a shot tonight, thanks for the tip!

greyman · on March 13, 2025

Or you can use the Gemini Code Assist extension for VS Code, which is basically free, but so far the code it has written almost never worked for me. For now I only use Claude 3.7 or Grok in chat mode. Almost no model, as of today, is good at coding.

MrBuddyCasino · on March 13, 2025

Did Grok 3 finally get an API?

sandbach · on March 13, 2025

I think you're right, but what company's business model doesn't produce a conflict between the user's well-being and the company's finances?

namaria · on March 12, 2025

Reflecting on your comment, I realized that using a huge number of GPUs is akin to a Turing machine approaching infinite speed. So I think the promise of LLMs writing code is basically saying: if we add a huge number of read/write heads with an unbounded number of rules, we can solve decidability. Because what is the ability to generate arbitrarily complex code if not solving the halting problem? Maybe there's a more elegant or logical way to formulate this, or maybe I'm just confused or plain wrong, but it seems to me that it is impossible to generate a program that is guaranteed to terminate unless you can solve decidability. And throwing GPUs at a huge tape is just saying that the tape approaches infinite size and the Turing machine approaches infinite speed...

Or, put another way, isn't the promise of software capable of generating any software from a natural-language description in finite time basically assuming P=NP? (Isn't the promise of using software to generate arbitrary software the same as the promise that any possible problem can be solved in polynomial time?) Unless the time can be guaranteed to be finite, throwing GPU farms and memory at this most general problem is not guaranteed to solve it in finite time.
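One concrete way to frame the halting-problem point: simulating a program with a step budget can confirm that it halts, but a program that hasn't halted within the budget only ever gets "unknown". More hardware raises the budget without turning "unknown" into a definite "never halts". This is a sketch of semi-decidability, with Python generators standing in for programs (each `yield` counts as one step); `run_with_budget` and the sample programs are hypothetical.

```python
from typing import Callable, Iterator, Literal

# A "program" is a generator function; each `yield` counts as one step.
Program = Callable[[], Iterator[None]]


def run_with_budget(program: Program, max_steps: int) -> Literal["halted", "unknown"]:
    """Semi-decision procedure: can confirm halting, can never confirm non-halting."""
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)
        except StopIteration:
            return "halted"
    # Out of budget: the program may halt later or run forever; we cannot tell.
    return "unknown"


def counts_to_three() -> Iterator[None]:
    for _ in range(3):
        yield


def loops_forever() -> Iterator[None]:
    while True:
        yield


print(run_with_budget(counts_to_three, max_steps=10))    # halted
print(run_with_budget(loops_forever, max_steps=10_000))  # unknown
```

Raising `max_steps` (the "more GPUs" lever) only helps for programs that do halt; by the halting problem, no general procedure can always turn the "unknown" answer into a correct "never halts".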