Hacker News

Even if DeepSeek has figured out how to do more (or at least as much) with less, doesn't the Jevons paradox come into play? GPU sales would actually increase, because even smaller companies would get the idea that they can compete in a space that only 6 months ago we assumed would be the realm of the large mega tech companies (the Metas, Googles, OpenAIs), since the small players couldn't afford to compete. Now that story is in question, since DeepSeek has only ~200 employees and claims to be able to train a competitive model for about 20X less than the big boys spend.
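A toy model of when the Jevons paradox kicks in, assuming a constant-elasticity demand curve for AI capability (a modeling assumption, not something measured). If efficiency improves by a factor g, the cost per unit of capability falls by g, demand for capability rises by g**ε, and the compute needed to serve that demand changes by g**(ε - 1). Under this model, total GPU demand grows only when the elasticity ε is above 1.

```python
def compute_demand_multiplier(efficiency_gain: float, elasticity: float) -> float:
    """Factor by which total compute demand changes after an efficiency gain.

    Assumes constant-elasticity demand for AI capability (a simplifying,
    hypothetical model): capability demanded scales as price ** -elasticity,
    and each unit of capability needs 1 / efficiency_gain as much compute.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    capability_demand = efficiency_gain ** elasticity  # price fell by efficiency_gain
    compute_per_unit = 1.0 / efficiency_gain           # each unit is cheaper to serve
    return capability_demand * compute_per_unit


# The ~20x cost claim from the comment, under three assumed elasticities.
for eps in (0.5, 1.0, 1.5):
    print(f"elasticity={eps}: compute demand x{compute_demand_multiplier(20, eps):.2f}")
```

Whether real-world demand for AI is elastic enough is exactly what the thread is arguing about; the sketch only shows that the answer hinges on that one number.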


My interpretation is that yes, in the long haul, lower energy/hardware requirements might increase demand rather than decrease it. But right now, DeepSeek has demonstrated that the current bottleneck to progress is _not_ compute. That reduces the near-term pressure to buy GPUs at any cost, which in turn pushes down NVIDIA's stock price.


Short term, I 100% agree, but it remains to be seen what "short" means. According to at least some benchmarks, DeepSeek is two full orders of magnitude cheaper for comparable performance. Massive. But that opens the door to much more elaborate "architectures" (chain of thought, architect/editor, multiple choice, etc.), since it's possible to run the model over and over to get better results, so raw speed and latency will still matter.
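Running a model "over and over" only pays off if extra runs buy accuracy. A minimal sketch of majority voting over repeated samples (sometimes called self-consistency), under a deliberately simple model: every answer is treated as just right or wrong, runs are independent, and the correct answer wins only with a strict majority. Real answers are correlated and multi-valued, so treat the numbers as illustrative.

```python
from math import comb


def majority_vote_accuracy(p_correct: float, n_samples: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Hypothetical model for illustration: each run is independently right
    with probability p_correct, and answers are simply right or wrong.
    """
    if n_samples < 1 or n_samples % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    needed = n_samples // 2 + 1
    # Sum the binomial probabilities of getting `needed` or more correct runs.
    return sum(
        comb(n_samples, k) * p_correct**k * (1 - p_correct) ** (n_samples - k)
        for k in range(needed, n_samples + 1)
    )


# Each extra sample is another full model call, so cost grows linearly with n.
for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Since n samples cost n times as much, a large drop in per-call price is what makes n = 5 or n = 11 affordable, and every extra call still adds latency unless the calls run in parallel.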


I think it's worth carefully pulling apart _what_ DeepSeek is cheaper at. It's somewhat cheaper at inference (0.3 OOM; inference costs: https://www.latent.space/p/reasoning-price-war), and about 1-1.5 OOM cheaper for training.
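Since the thread mixes "X times cheaper" and "orders of magnitude", a quick conversion helps: n orders of magnitude is a factor of 10**n. So 0.3 OOM is only about 2x, 1-1.5 OOM is roughly 10x-32x, two OOM is 100x, and the "20X" figure from the top comment is about 1.3 OOM.

```python
import math


def oom_to_ratio(oom: float) -> float:
    """Convert orders of magnitude (base 10) into a plain cost ratio."""
    return 10 ** oom


def ratio_to_oom(ratio: float) -> float:
    """Convert a cost ratio (e.g. 20 for '20x cheaper') into orders of magnitude."""
    return math.log10(ratio)


# Figures quoted in this thread, converted both ways.
for label, oom in [("inference", 0.3), ("training, low", 1.0),
                   ("training, high", 1.5), ("'two full orders'", 2.0)]:
    print(f"{label}: {oom} OOM ~ {oom_to_ratio(oom):.1f}x cheaper")
print(f"'about 20X less' ~ {ratio_to_oom(20):.2f} OOM")
```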

It's also worth keeping in mind that these values change depending on the benchmark (and can shrink quite a bit).

And it's also worth keeping in mind that the drastic drop in training cost (if reproducible) means training is suddenly affordable for a much larger number of organizations.

I'm not sure the impact on GPU demand will be as big as people assume.


It does, but proving that it can be done with cheaper (and, more importantly for Nvidia, lower-margin) chips breaks the spell that Nvidia will just be eating everybody's lunch until the end of time.


If demand for AI chips increases due to the Jevons paradox, why would Nvidia's chips become cheaper?

In the long run, yes, they will be cheaper due to more competition and better tech. But next month? It will be more expensive.


The usage of existing but cheaper Nvidia chips to make models of similar quality is the main takeaway.

It'll be much harder to convince people to buy the latest and greatest with this out there.


The sweet spot for running local LLMs (from what I'm seeing on forums like r/LocalLLaMA) is two to four 3090s, each with 24GB of VRAM. Nvidia (or AMD or Intel) would clean up if they offered a card with 3090-level performance but 64GB of VRAM. It doesn't have to be a leading-edge GPU, just a decent GPU with lots of VRAM. This is roughly what Digits will be (though its memory bandwidth will be lower because it uses DDR5) and roughly what AMD's Strix Halo is aiming for: unified memory systems where the CPU and GPU share the same large pool of memory.
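Rough arithmetic behind the "2 to 4 cards" sweet spot: model weights take about (parameters × bits per parameter / 8) bytes. A 70B model at 4-bit needs roughly 35 GB, which fits on two 24GB cards, while 8-bit needs about 70 GB (three cards, or two hypothetical 64GB cards). The 70B size and the quantization levels are illustrative assumptions, and the sketch ignores the KV cache, activations and runtime overhead, which all need extra memory on top.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def cards_needed(total_gb: float, vram_per_card_gb: float) -> int:
    """Minimum number of identical cards whose combined VRAM holds total_gb."""
    return math.ceil(total_gb / vram_per_card_gb)


# A 70B-parameter model at common precisions/quantizations (weights only).
for bits in (16, 8, 4):
    gb = weight_memory_gb(70, bits)
    print(f"{bits}-bit: {gb:.0f} GB of weights -> "
          f"{cards_needed(gb, 24)} x 24GB card(s) or "
          f"{cards_needed(gb, 64)} x 64GB card(s)")
```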


The issue is that even with enough VRAM to run the model, a large context will still make it too slow. (For example, running LLaMA 70B with a 30k+ token prompt takes minutes to process.)
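A back-of-the-envelope sketch of why long prompts hurt even when the model fits: processing (prefilling) a prompt costs roughly 2 FLOPs per parameter per token for a dense transformer, so the work scales with prompt length no matter how much VRAM is available. The sustained throughput below is a hypothetical figure, not a benchmark; real numbers depend heavily on the hardware and inference stack.

```python
def prefill_seconds(params: float, prompt_tokens: int,
                    effective_flops_per_s: float) -> float:
    """Rough time to process a prompt before the first output token.

    Uses the common ~2 * params FLOPs-per-token estimate for a dense
    transformer forward pass and ignores the attention term, which grows
    with context length and makes long prompts even slower.
    """
    total_flops = 2 * params * prompt_tokens
    return total_flops / effective_flops_per_s


# 70B parameters, 30k-token prompt; 10 TFLOP/s sustained is an assumed
# figure for a consumer multi-GPU setup, not a measured one.
t = prefill_seconds(70e9, 30_000, 10e12)
print(f"~{t:.0f} s (~{t / 60:.0f} min) before the first token")
```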


  The usage of existing but cheaper Nvidia chips to make models of similar quality is the main takeaway.
So why not buy a more expensive Nvidia chip to run a better model?


Because if you don't have infinite money, considering whether to buy a thing is about the ratio of price to performance, not just performance. If you can get enough performance for your needs out of a cheaper chip, you buy the cheaper chip.
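The decision rule described here is simple enough to state as code: keep only the chips that meet the workload's requirement, then take the cheapest. The chip names, prices and performance numbers below are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chip:
    name: str         # hypothetical product name
    price_usd: float  # hypothetical price
    perf: float       # any consistent unit, e.g. tokens/s on your workload


def cheapest_sufficient(chips: list[Chip], required_perf: float) -> Optional[Chip]:
    """Return the cheapest chip that meets the requirement, or None."""
    candidates = [c for c in chips if c.perf >= required_perf]
    return min(candidates, key=lambda c: c.price_usd, default=None)


# Made-up numbers purely to illustrate price-to-performance reasoning.
chips = [
    Chip("older_gpu", price_usd=2_000, perf=40),
    Chip("flagship_gpu", price_usd=30_000, perf=200),
]
for c in chips:
    print(f"{c.name}: {c.perf / c.price_usd:.4f} perf per dollar")
print(cheapest_sufficient(chips, required_perf=30).name)   # both qualify
print(cheapest_sufficient(chips, required_perf=100).name)  # only the flagship does
```

The flagship only wins once the requirement exceeds what the cheaper chip can deliver; if more efficient models lower the requirement, the cheaper chip wins more often.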


The AI industry isn't pausing because DeepSeek is good enough. The industry is in an arms race to AGI. Having a more efficient method to train and use LLMs only accelerates progress, leading to more chip demand.


There is no indication that adding more compute will produce AGI.


Is there still evidence that more compute = better model?


Yes. Plenty of evidence.

The DeepSeek R1 model people are freaking out about runs better with more compute because it's a chain-of-thought model.


Selling 100 chips at $1 profit each is less profitable than selling 20 chips at $10 profit each ($100 versus $200).


Margin only goes down if a competitor shows up. Getting more "performance" per chip will actually let Nvidia raise prices even more if they want.


Since you no longer need CUDA, AMD becomes a new viable option.


DeepSeek uses CUDA.


Important to note: the alleged $5 million cost is just the GPU compute cost for the final training run of the model; it's not the cumulative cost of the research to date.
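The headline number is a product of GPU-hours and an hourly price. As far as I know, the widely quoted figure comes from the DeepSeek-V3 technical report, which multiplies about 2.788M H800 GPU-hours by an assumed rental price of $2 per GPU-hour to get roughly $5.6M, and which itself says the figure excludes prior research and ablation experiments. The function below just makes that accounting explicit, along with what it leaves out.

```python
def training_run_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rental-equivalent cost of a single training run.

    Counts only GPU time for that run: earlier experiments, ablations,
    failed runs, data work, staff and owning the hardware are excluded.
    """
    return gpu_hours * usd_per_gpu_hour


# Figures as stated in the DeepSeek-V3 technical report:
# 2.788M H800 GPU-hours at an assumed $2 per GPU-hour.
print(f"${training_run_cost(2.788e6, 2.0) / 1e6:.3f}M")
```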

The analogous costs would be what OpenAI spent to go from GPT-4 to GPT-4o (i.e., to develop the reasoning model from the most up-to-date LLM). $5 million is still less than what OpenAI spent, but it's not an order of magnitude lower. (OpenAI spent up to $100 million on GPT-4 but a fraction of that to get GPT-4o. Will update this comment if I can find numbers for 4o before the edit window closes.)


It doesn't make sense to compare individual models. A better way is to look at total compute consumed, normalized by the output. In the end what counts is the cost of providing tokens.
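One way to express "total compute consumed, normalized by the output" is cost per million tokens served, amortizing the training run over every token the model ever produces. This is a minimal sketch; the dollar and token figures in the example are invented for illustration.

```python
def cost_per_million_tokens(training_cost_usd: float,
                            inference_cost_usd: float,
                            tokens_served: float) -> float:
    """Total compute spend normalized by output: USD per million tokens served.

    Amortizes the training run over every token served, so a model that is
    cheap to train can still lose if it is expensive to run, and vice versa.
    """
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    total = training_cost_usd + inference_cost_usd
    return total * 1e6 / tokens_served


# Invented numbers: a cheap-to-train model vs. a pricier one that is cheaper
# to run, both serving one trillion tokens.
print(cost_per_million_tokens(5e6, 20e6, 1e12))
print(cost_per_million_tokens(50e6, 5e6, 1e12))
```

At higher traffic the training share shrinks toward zero, so the comparison increasingly comes down to inference cost per token.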


Jevons paradox isn't some iron law like gravity.


Feels like it is in tech. Any gains from hardware or algorithmic advances immediately get consumed by increases in data retention and software bloat.


But why would customers accept Nvidia's high prices and high gross margins if they no longer fear missing out because of insufficient hardware?



