The ones in Spain often market themselves with their Chinese-ness: "Hyper China", "Panda Bazaar", "Maxi Barato (super cheap)", etc. would be some representative names & signage you see outside.
They range in size from small shops to things with huge floor space.
One thing I've found is that they seem to sell very low-quality stuff: e.g. on AliExpress you can buy a flashlight that is built out of metal and has USB-C charging for $10, whereas in the physical shop you get the plastic one that takes AA batteries for $2. So they're not a replacement for AliExpress, Temu & co.
A question for OP: why did you choose Crunchy Data for your database, instead of Fly's own managed postgres offering? Because the latency between Fly and Crunchy Data must be quite high, given that they are probably not in the same datacenter.
Which makes me think that a small number of random issues, which happen even though nothing is broken, is normal everywhere. Especially once you move things around on a network, there's potential for a lot more random errors.
Bitflips are something that can happen in consumer-grade RAM, so that tracks (and it's comforting that wayward cosmic rays are a substantial reason for an application's crashes!), but on enterprise servers, they will run ECC RAM that is very resistant to bit flips.
This is why data hoarders who have NASes with lots of space insist on running their servers with ECC RAM despite it being significantly more expensive. Because bit flips, for all intents and purposes, cannot happen. The RAM itself detects and corrects for them.
I wouldn't expect bit flips to be a significant contributor to enterprise problems.
Bitflips specifically may not be; things like network issues, noisy neighbors, row/rack/host maintenance (leading to a downed and migrated host) absolutely are things that happen at high frequency at scale and cause your background level of errors to be more than 0.
I suppose I misunderstood what the "random error" was supposed to mean. I wouldn't call a network error a "random error" because it's caused by things that are internal to the system (entities using a network). A bit flip is caused by an external factor: cosmic radiation. To me, that's what a "random" error is.
If your network goes down because of a DDoS, or part of your system overheating, that's an internal issue you had control over.
If a bit flips because of cosmic radiation, you can't really do anything about that, and it's utterly unpredictable. That's "random" to me.
At that scale, monitoring for 9s (the overall success rate) matters more than absolute error counts. So long as failures degrade gracefully or are retried, it should not be a massive problem.
It does require constant tuning and adjustment though.
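To make "retried" a bit more concrete, here is a minimal sketch of retrying a flaky call with exponential backoff and jitter. The names (`retry`, `operation`, `base_delay`) and the choice of `ConnectionError` as the transient error are assumptions for illustration, not anything from a specific system discussed here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` tries are used up.

    Only ConnectionError is treated as transient (an assumption); anything
    else propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of tries: surface the error to the caller
            # Exponential backoff with full jitter, so that many clients
            # retrying at once do not all hit the service at the same moment.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

The delays and attempt count are exactly the kind of thing that needs the constant tuning mentioned above: too few retries and background errors leak through, too many and you amplify load on a service that is already struggling.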
Hm, interesting. But the pricing page is quite confusing to me: the $39 "pro plan" says "Up to 8 instances running". And above that, "Pricing that scales to zero". But if I'm always paying $39, what's the point of scaling to zero vs just keeping the 8 instances running? I guess the point is that you can scale down one workload and scale up another, but that seems a bit niche compared to the much more common use case of "scale up with increased user activity, and pay less when users are sleeping".
It's missing some sort of per-minute / per-GB RAM "pay as you go" pricing model. It seems like Fly.io, but missing the pay-as-you-go pricing & rapid scaling model that makes Fly worth using.
Yeah I really agree. There's no overage pricing on their website's main pricing page, which makes me think the jump from "Team" to "Pro" to negotiating an enterprise contract will really hurt. Going from $39/month to $199/month because you needed slightly more is a really big jump in pricing. It's pretty much the opposite of what I would expect from a service that lets you scale to 0.
They specialize in domains management for businesses who consider their domain to be _very_ important. Think Google, Amazon, Microsoft, Wikipedia... (all of those are listed as clients on the wiki page)
As in "pay a lot of money", and we'll dedicate someone to your domain who makes sure that "giving a domain to a stranger without any documents" will _never_ happen.
A number of the largest companies that used to be 'clients' of MarkMonitor have now basically become their own domain registrars and have a direct relationship with ICANN. Amazon, for instance. It's curious that Google was one and has offloaded it to Squarespace.
I'm pretty sure Google never used them for their own domains, and the whole MarkMonitor/Squarespace thing was their "Google Domains" product where they sold registrar services to others. Besides that, they are also a registry for .app/.dev and others, but don't sell them via their own registrar anymore.
What are you doing for DB backups? Do you have a replica/standby? Or is it just hourly or something like that?
Because with a single-server setup like this, I'd imagine that hardware (e.g. SSD) failure brings down your app, and in the case of SSD failure, you then have hours or days downtime while you set everything up again.
Hetzner normally advertises their hardware servers as 2x 1 TB SSD, because it's strongly recommended to run them in software RAID 1 for a net 1 TB. (Their image installer will default to that.)
Once the first SSD fails after some years, and your monitoring catches that, you can either migrate to a new box, find another intermediate solution/replica, or let them hot-swap it while the other drive takes over.
Of course, going to physical servers loses the redundancy of the cloud, but that's something you need to price in when looking at the savings and deciding your risk model.
And yes, running this without also at least daily snapshotting/backup to remote storage is insane. That applies to cloud as well, albeit it's easier to set up there.
For over a decade I ran a small scale dedicated and virtual hosting business (hundreds of machines) and the sort of setup you describe works very well. Software RAID across 2 devices, redundant power supplies, backups. We never had a significant data loss event that I recall (significant = beyond user accidentally removing files).
For quite a while we ran single power supplies because they were pretty high quality, but then Supermicro went through a ~6 month period where basically every power supply in machines we got during that time failed within a year, and replacements were hard to come by (because of high demand, because of failures), and we switched to redundant. This was all cost savings trade-offs. When running single power supplies, we had in-rack Auto Transfer Switches, so that the single power supplies could survive A or B side power failure.
But, and this is important, we were monitoring the systems for drive failures and replacing them within 24 hours. Ditto for power supplies. If you don't monitor your hardware for failure, redundancy doesn't mean anything.
> But, and this is important, we were monitoring the systems for drive failures and replacing them within 24 hours. Ditto for power supplies. If you don't monitor your hardware for failure, redundancy doesn't mean anything.
It does still mean something.
If you have a 5% annual chance of failure and no redundancy, your five year failure chance is 23%.
If you have redundancy and literally never check for five years, your five year failure chance is 5%. That's already a huge improvement. If you do an inventory of broken parts twice a year, still no proper monitoring, it goes down to 0.6%
For 2% the numbers are: 10% 1% 0.1%
For 10% the numbers are: 41% 17% 2.6%
(The approximations for small percents are x*5, x²*25, and x²*2.5)
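For anyone who wants to check those figures, here is a small sketch that reproduces them. It assumes the two devices fail independently at a constant annual rate, and that "inventory twice a year" means any broken part is replaced instantly at each check, so data is only lost when both devices die within the same half-year window. The function names are made up for this example.

```python
from typing import Optional


def single_failure(annual_rate: float, years: float) -> float:
    """Chance that one device fails at least once within `years`."""
    return 1 - (1 - annual_rate) ** years


def mirrored_failure(
    annual_rate: float, years: float, checks_per_year: Optional[float] = None
) -> float:
    """Chance of losing both devices of a mirrored pair within `years`.

    With checks_per_year=None nothing is ever replaced, so the pair is lost
    once both devices have failed at any point in the whole period.
    Otherwise broken parts are replaced at every check, and the pair is only
    lost if both devices fail between two consecutive checks.
    """
    if checks_per_year is None:
        return single_failure(annual_rate, years) ** 2
    per_interval = single_failure(annual_rate, 1 / checks_per_year) ** 2
    intervals = round(years * checks_per_year)
    return 1 - (1 - per_interval) ** intervals


for rate in (0.02, 0.05, 0.10):
    print(
        f"{rate:.0%} annual: "
        f"no redundancy {single_failure(rate, 5):.1%}, "
        f"mirror never checked {mirrored_failure(rate, 5):.1%}, "
        f"mirror checked twice a year {mirrored_failure(rate, 5, 2):.2%}"
    )
```

Real drives from the same batch in the same chassis are not fully independent, so treat these as optimistic lower bounds rather than predictions.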
If that's the tradeoff they're willing to make, who are you to say that they're doing it wrong?
Not every app needs 24/7 availability. The vast majority of websites out there will not suffer any serious consequences from a few hours of downtime (scheduled or otherwise) every now and then. If the cost savings outweigh the risk, it can be a perfectly reasonable business decision.
A more interesting question would be what kind of backup and recovery strategy they have, and which aspects of it (if any) they had to change when they moved to Hetzner.
It's possible no one will care much if it's down even for that long. I couldn't care less if my HOA mobile app was down even for a week for example. We don't need constant uptime for everything.
Don’t forget that integrity matters as much as availability in many applications. You might not mind if your HOA takes time to bring a server back up but you’d care a lot more if they lost the financial records or weren’t able to recover from a ransomware attack.
> Because with a single-server setup like this, I'd imagine that hardware ...
Yeah. This blog post reads like it was written by someone who didn't think things through and just focused on hyper-aggressive cost-cutting.
I bet their DigitalOcean vm did live migrations and supported snapshots.
You can get that at Hetzner but only in their cloud product.
You absolutely will not get that in Hetzner bare-metal. If your HD or other component dies, it dies. Hetzner will replace the HD, but it's up to you to restore from scratch. Hetzner are very clear about this in multiple places.
I'm not going to re-write it, the TL;DR is they are making an Apples and Oranges comparison.
Yes they "saved money" but in no way, shape or form are the two comparable.
The polite way to put it is ... they saved as much money as they did because they made very heavy-handed "architectural decisions". "Decisions" that they appear to be unaware of having made.
Surely you must've noticed that pretty much all of their bare metal offerings ("dedicated" and the stuff on "auction") have multiple disks, allowing for various RAID configurations?
> Surely you must've noticed that pretty much all of their bare metal offerings ("dedicated" and the stuff on "auction") have multiple disks, allowing for various RAID configurations?
I don't know where to start with this comment. Do I really need to spell out the difference between cloud and bare metal ?
A few examples...
- Live migration ? Cloud only.
- Snapshots ? Cloud only.
- Want to increase disk space ? Tick box in cloud vs. replace disks (or move to different machine) and re-install/restore in bare metal....
- Want to increase RAM ? Tick box in cloud vs. shutdown, pull out of rack, install new chips (or move to different machine and re-install/restore)....
- Want to upgrade to a beefier processor ? Tick box in cloud vs move to a completely different machine and re-install/restore
You can get snapshots and live migrations working on-prem. The cloud isn't magic, it's just servers with hypervisors and software running on top of them. You can run that same software.
Also, with something like Hetzner you would not be going in and physically doing anything. You also just tick a box for a RAM upgrade, and then migrate over or do active/passive switch.
The cloud does have advantages, mostly in how "easy" it is to do some specific workflows, but per-compute it's at least 10x the cost. Some will argue it's less than that, but they forget to factor in just how slow virtual disks and CPU are. Cloud only makes sense for very small businesses, in which the operational cost of colocation or on-prem hosting is too expensive.
are you a capable engineer or do you believe in magic?
the savings of a cheap engineer disappear on the cloud bill. get a badass well paid engineer who can do both and doesn't talk his way out of this financial madness
> Well you did say your data is lost when a disk fails, which is not true.
Well, technically it's still a possibility.
I am old enough to have seen issues with RAID1 setups not being able to restore redundancy, as well as RAID controller failures and software RAID failures.
Also, frankly you are being somewhat pedantic. My broader point was regarding cloud. I gave HD Failure as one example, randomly selected by my brain ... I could have equally randomly chosen any of the other items ... but this time, my brain chose HD.
Can you elaborate? I'm coming up with similar designs recently (static site plus redundant servers) but my designs so far assume no database and ephemeral interactions. (Realtime multiplayer arcade games.)
Curious what the delta to pain-in-ass would be if I want to deal with storing data. (And not just backups / migrations, but also GDPR, age verification etc.)
A database isn't hard to make HA; it's actually very easy to do any of this.
I already design with an Auto Scale Group in mind; we run on spot instances, which tend to be much cheaper. Spot instances can be reclaimed at any time, so you need to keep this in mind.
I also have data blobs which are memory-mapped files, which are swapped with no downtime by pulling a manifest from a GCS bucket each hour and swapping out the mmapped data.
I use replicas, with automatic voting-based failover.
I've used Mongo with replication and automatic failover for a decade in production with no downtime and no data lost.
Recently I got into Postgres; so far so good. Before that I always used RDS or other managed solutions like Datastore, but they cost so much compared to running your own stuff.
Health checks start a new server in no time; even if my Hetzner server goes out, or if the whole of Hetzner goes out, my system will launch DigitalOcean nodes which will start soaking up all requests.
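The no-downtime swap of memory-mapped blobs mentioned above can be sketched roughly like this. This is only an illustration of the general idea, not the commenter's actual code: the `SwappableBlob` class is hypothetical, and fetching the manifest and the new file from the bucket is left out entirely.

```python
import mmap
import threading


class SwappableBlob:
    """Read-only memory map that can be replaced while readers keep working."""

    def __init__(self, path: str) -> None:
        self._lock = threading.Lock()
        self._map = self._open(path)

    @staticmethod
    def _open(path: str) -> mmap.mmap:
        # mmap keeps its own handle on the file, so the file object can be
        # closed right away. Note that an empty file cannot be mapped.
        with open(path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def read(self, start: int, length: int) -> bytes:
        # Slicing copies the bytes out, so nothing returned to the caller
        # points into a map that a later swap might close.
        with self._lock:
            return self._map[start:start + length]

    def swap(self, new_path: str) -> None:
        # Map the new file first; only the pointer exchange happens under
        # the lock, so readers are blocked for a very short time.
        new_map = self._open(new_path)
        with self._lock:
            old, self._map = self._map, new_map
        old.close()
```

An hourly job would download the new blob to a fresh path, verify it against the manifest, and then call `swap(new_path)`; if the download or the check fails, the old map simply stays in place.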
Changing a project's framerate is apparently quite a hard problem: even DaVinci Resolve warns you, when you change it, that you cannot change it again for that project.
Probably internally everything in a project is referenced to specific frame numbers, which would break if you changed the project framerate.
And I would rather have the _choice_ whether to prove my age to Apple or not. I think if it were optional, with the additional option of "share my age with websites & apps", nobody would have an issue with it.