
In general: less data = less "intelligence".

And basically all the security bugs I've read about were found by looking at the source code.

But it doesn't mean Windows is more secure. Just imagine a scenario where someone steals the Windows source code and sells it to a rogue actor: that would make it even less secure, because no one (except Microsoft) would have had the chance to search the source code for bugs.


I guess we're missing fundamental information: how much time and how many tokens did it take the "middle guy" to create the report?

Next question: could it be that the OP can use Mythos more effectively, since he knows the project better?


> Use cloud models only when they’re genuinely necessary.

The problem is that it's much easier to use the SOTA models (especially if they are subsidized) than to spend time tuning the knobs on a local one.

I just realized this with coding agents: yeah, you probably shouldn't always use the latest version at xhigh, but you end up doing it because you get the job done in less time, with less "effort", and at basically the same price.

I guess we'll see a real effort on local AI only when major vendors start billing based on actual token usage.


I'm also just not seeing good performance from local models. Every time a thread about LLMs comes up, there are tons of people in the comments insisting that they're getting just as good results from the latest DeepSeek/qwen/whatever as with Opus, and that just hasn't been my experience at all: open-source models just fall over completely compared to Claude when asked to do anything remotely complicated.

I have a sneaking suspicion this is kinda like the situation with Linux in the 90s, where it kinda worked but it reeeeeally wasn't ready for the home user, but you had a lot of people who would insist to your face everything was fine, mostly for ideological reasons.


It depends a lot on how you run those models. I think a lot of the disagreement comes from that. A lot of people run local models with incredibly small context windows (which makes an agentic LLM go around in loops), use very small quants (like 4-bit => huge degradation), don't set the recommended parameters (like top-p/temperature), or download GGUFs with broken chat templates. And then they claim model X is bad :)
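
To make that concrete, here's a minimal sketch of setting those knobs explicitly against a local OpenAI-compatible server (llama.cpp's llama-server, Ollama, vLLM, ...); the endpoint, model name, and parameter values are placeholder assumptions, not recommendations for any particular model:

    # Sketch: call a local OpenAI-compatible endpoint with the sampling
    # knobs set explicitly instead of relying on server defaults.
    # URL, model name, and values are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    resp = client.chat.completions.create(
        model="qwen-coder",   # whatever name the local server exposes
        temperature=0.7,      # set the model vendor's recommended values
        top_p=0.8,            # instead of whatever the server defaults to
        max_tokens=4096,
        messages=[{"role": "user", "content": "Refactor this function ..."}],
    )
    print(resp.choices[0].message.content)

Note that the context size and the chat template are usually fixed when the server is launched (e.g. llama-server's -c flag), so a too-small default there can't be fixed from the client side.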

I'm currently running both Sonnet 4.6 and Qwen 3.6-27b on the same codebase (via OpenCode, the parameters were carefully tuned to have a good quality/context size ratio), and on this project, they both struggle with complex non-trivial tasks, and both work flawlessly otherwise. Sonnet 4.6 understands the intent better if my task is ambiguously formulated, but otherwise the gap is pretty small for coding under a harness.


> Every time a thread about LLMs comes up, there are tons of people in the comments insisting that they're getting just as good results from the latest DeepSeek/qwen/whatever as with Opus, and that just hasn't been my experience at all: open-source models just fall over completely compared to Claude when asked to do anything remotely complicated.

Different usage patterns - you want to issue a single spec then walk away and come back later (when it has consumed $10k worth of API tokens inside your $200/m subscription) to a finished product.

Many people issue a spec for a single function, a single class, or similar. When you break it down like that, the advantage of SOTA models shrinks.


My experience is that in medium/big codebases, even for single functions, going with xhigh is basically better from a user perspective (faster to get the result, and you can trust it), while with lower models (e.g. Sonnet instead of Opus) you always have to carefully review the output, because 1 time in 10 it will hallucinate, you won't catch it immediately, and at some point it will bite you.

> My experience is that in medium/big codebases, even for single functions, going with xhigh is basically better from a user perspective (faster to get the result, and you can trust it), while with lower models (e.g. Sonnet instead of Opus) you always have to carefully review the output, because 1 time in 10 it will hallucinate,

What do you mean "trust it"? It sounds like you want to vibe-code (never look at the output), and maybe for that you need SOTA, but like I said in a different comment, I can easily generate 1000s of lines of code per hour just prompting the chatbots.

I don't, because I actually review everything, but I can, and some of those chatbots are actually SOTA anyway.


With SOTA models I can just set up the instructions (even if they're a little bit fuzzy), go away for 10 or 15 minutes, come back, and just check the result and adjust when necessary (most of the time small adjustments are necessary, but the overall work is pretty good).

With subpar models I have to be more careful about providing instructions and check the work step by step, because the path it chose is wrong, or it does something I didn't ask for, or the agent gets stuck in a loop somewhere.


A lot of people aren’t using agents that way. Not saying that it’s not a legitimate use or anything, just that I think the use cases are different. And yeah, maybe for your specific use case SOTA hosted models are the right choice.

This.

I’ve begun to suspect that most people are probably running different hardware. Sure, if you run the latest deep flash on your brand new M5 with 128G, maybe you get acceptable performance?

But honestly, how many people have an extra $9000 laying around these days?

Right now, running with acceptable performance is kind of a luxury. I wish the people who always say - “This is great!” - would realize that not everyone has their hardware.


Actually, even with $9k hardware you won't get good enough performance. There is an interesting video from antirez trying to run deepseek v4 flash at 2 bits on an M3 Max 128GB ... and the result is kind of disappointing: as soon as the context starts growing, you're at around 20 tokens/s.
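
As a rough sanity check (my numbers, not antirez's): decode speed on Apple Silicon is mostly memory-bandwidth bound, so a back-of-envelope ceiling is bandwidth divided by the bytes read per token; all the figures below are illustrative assumptions:

    # Back-of-envelope decode-speed ceiling; every number is an assumption.
    bandwidth_gb_s = 400   # M3 Max memory bandwidth, roughly 400 GB/s (top config)
    active_params_b = 40   # hypothetical active parameters per token, in billions
    bits_per_weight = 2    # the 2-bit quant mentioned above

    gb_per_token = active_params_b * bits_per_weight / 8  # GB read per decoded token
    print(bandwidth_gb_s / gb_per_token)                  # ~40 tokens/s ceiling

KV-cache reads grow with the context, so the real number drifts down from that ceiling as the context fills up, which fits the ~20 tokens/s observation.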

Prefill performance used to be the real bottleneck on antirez's DS4, and that's been greatly improved by now; it doesn't perceivably slow down with growing context.

> The problem is that it's much easier to use the SOTA models (especially if they are subsidized) than to spend time tuning the knobs on a local one.

That's not a problem, that's a feature; I have something like 8 tabs open to different free-tier providers. ChatGPT, Claude and Gemini are the SOTA ones.

I have no problem maxing one out, then moving to the next. I can do this all day, having them implement specific functions (or classes) in my code. The thing is, because I actually know how to write and design software, I don't need to run an agent in a loop to produce everything in a day; I can use the web chatbots with copy/paste to literally generate thousands of lines of code per hour while still keeping a strong mental model of the code, so I can go in and change whatever I need to.[1]

---------------------

[1] Just did that this morning on a Python project: because I designed what I needed, each generation was me prompting for a single function. So when I needed to add something this morning, I didn't even bother asking a chatbot to do it, I just went directly to the correct place and did it.

You can't do that if you generate the entire thing from specs.


We are speaking about local AI, and having all these SOTA models basically for free is blocking the progress of local or independent third party setups.

Maybe I should have clarified what the feature is (after re-reading my post, I see that I basically just ended it after adding the footnote).

The feature of using all these SOTAs to exhaustion on the free tiers is burning their VC money!

The more I use for free, the more of their money I burn, the closer we'll get to actual 3rd-party and independent setups (local or otherwise).


The path of least resistance usually wins, especially when the pricing hides the real cost.

Let's take a SW business like a ticketing system.

Do you think 100 enterprises with 1 bln tokens each are going to make a better product than a specialized vendor with 100 bln tokens?

For sure SW vendors and SaaS like "logo creator" are already dead, but unless the next generation of LLMs ships with an embedded ticketing system, the ticketing system vendor will be fine (maybe with less headcount, but not sure).


> Do you think 100 enterprises with 1 bln tokens each are going to make a better product than a specialized vendor with 100 bln tokens?

I'm not sure if this is sound reasoning, because "better product" is very context-dependent.

My current employer migrated from RT to OTRS as its ticket system, and is now moving to ServiceNow.

The RT instance was heavily patched/customized.

The OTRS instance was heavily patched/customized.

We try not to customize ServiceNow quite as much, but the less we customize it, the more we have to change the workflows in our company. And humans are slow to adapt.

With this experience in mind, the question is more: do we want to spend lots of money on a vendor-supplied ticket system, and then spend lots more LLM tokens to customize it, or do we LLM-build it from the ground-up?

If we started a new ticket system migration project today, maybe the best answer would be to start with an easily-customizable Open Source ticket system, and then throw LLM-power at customizing it.


> or do we LLM-build it from the ground-up?

But in this case you don't spend tokens only on your workflows: you have to patch it constantly, perform vulnerability scans, check and adapt for law changes (e.g. if you're in Europe: GDPR or DORA), create and maintain (again, security) integrations with other systems, and so on.

And, most importantly, as a corporation you need an internal team to do the work, which makes it a liability for you ... and we all know it's better to have someone else to blame.

Just imagine the CTO or CISO explaining to the CEO that the data breach they had last week, the one that cost them millions, was due to some customization they did on top of an open source ticketing system.


How does it compare to a MacBook Pro? (if you've ever used one)

I'm looking to get rid of my MacBook Pro, and I'd like to switch to a Linux laptop, but I'm really worried about the battery and trackpad.


Better software vs better hardware. Freedom and privacy vs luxury handcuffs.

Actually Mac software isn't that bad. The only things I don't really like are the cmd+tab behaviour (every third party alternative feels subpar) and the Finder.

Can't you run linux on a macbook pro?

I would not do that, mostly because Apple is, and always has been, doing what they can to create locked down platforms, which are the antithesis of digital autonomy. Being able to run a different operating system is not, and never will be, something they actively support, and I can only expect that possibility to go away in the future if they ever feel it threatens their control. I will never transact with the company for that reason alone.

On M1 and M2 currently yes (M3 in progress). Check out the Asahi Linux project.

isn't asahi linux dead?

No not at all. Some of the original contributors stepped away, but they’re still active and have recently made a bunch of progress toward M3 support.

Their blog is quite active: https://asahilinux.org/blog/

Announcements have been quiet for a while because they have been focusing on upstreaming their kernel changes, but more recently they’ve been adding new features and working on new model support again.


not well

I have never owned an Apple product, but I have helped other people from time to time. It's hard to say because I'm not used to it, but the trackpad feels really snappy and precise, and the 120Hz display also helps make it feel really smooth when scrolling.

So what's next ... are they going to charge you a 30% commission on your sales for products built with their tools?

They very well may try, if not already discussing ways to accomplish this, or claim partial ownership of everything generated and just license the output to you. They're already trying to do platform lock-in with all this as it is. IIRC one of their investors/investor groups said something like "the best customers are hostages", so you know it's coming in some form.

This would be crazy after they basically used tons of material regardless of the licence, and while they're hammering third party websites to crawl data.

I tried Zed a couple of times; it's something I'd like to play with more because the feel is fantastic ... but for Python development PyCharm is still superior.

PS: One thing I'm really missing is the ctrl+shift+f equivalent (PyCharm's Find in Files).


What did the author use to create the site?


I did a bunch of things :D I am not a frontend engineer (I am an MLE) so I don't have the prowess to create things like this. I am heavily inspired by 3blue1brown and I love creating interactive explainers for ML concepts like this one. I previously created arkaung.github.io/interactive-eigenvector/ as well. I heavily used Claude to get the exact design, typography, and style I want (there was a lot of hand holding to get to this state). I heavily influenced Claude on how I want the explainer to flow, how I want to make things intuitive, and the kinds of mathematical concepts I want to visualize (and how). So all in all, a lot of hand holding for the coding agents to get to where I want, exactly how I want it.

But at the end of the day it is just vanilla HTML, CSS and JS without anything fancy :D MathJax 3 was used to render math stuff.


The fonts, the cards, the copy are all hallmarks of Claude Code.

While the aesthetic doesn't spark joy for me, the overall execution is great, the presentation flow and interactive boxes are very nice.


It's an average of 8GB per database; I guess he serves multiple clients and decided to "segregate" each client onto its own instance. If it's acceptable for the business, there's nothing wrong with his setup.


> Several live mobile apps serving hundreds of thousands of users

It seems like he has a database for each app.


Could be, but this doesn't change my statement: if it's acceptable from the business standpoint there is nothing wrong with this setup.

