You would imagine wrong. I'm running a video sharing site (think FB & YouTube mashed together with a dash of Blogger) with about 800k unique users per month, and it's no picnic.
Sure, but the sharing is what makes it hard - as soon as your pages are per-user it gets interesting. For youporn, each video page is essentially static (maybe there are comments? Even if there are, I doubt they display a different set of comments to each /viewer/). So video pages are easy to cache, the database layout is simple, and sure there's some effort involved in scaling up but it's nowhere near as hard as e.g. tumblr's architecture that was featured recently (where every viewer sees a page with all the posts from everyone they're following, and so everyone's page is different)
I apparently did NOT say that it's a picnic. And I don't see how your reply is related to my comment. Does your site have enough social features? Like follow, walls... And is your site 100% redis too?
I was just stating that building a site like that is not as trivial as you made it out to be. Yes, you "only" have some videos, but you also have views of said videos, and comments and rankings. Building different views of videos based on those factors is hard if you have lots and lots of these data points. Not to mention that you then have to (somewhere) display just some user's videos sorted by how they were ranked, or maybe just some user's videos from some category based on number of views (which is not a simple INT field in the db). Etc etc etc.
What I'm saying is that there is more to it than meets the eye.
Nothing fancy actually. We have 3 dedicated storage servers (two in the same DC as our web server, one in Germany), with lots and lots of disk space. We then SCP converted videos to them and serve them through nginx (supports seeking).
Hey, thank you so much for the information. I hope you don't mind sharing a little bit more (it's okay if you don't reply due to IP or whatnot, I'll understand).
How do you decide which files go on which disks? Do you write custom software, or is there some sort of off-the-shelf software to do this?
The reason I ask is that a few people I talked to seem to prefer a distributed file system such as HBase, GlusterFS or something else so that they don't have to write extra code.
And I'm guessing you back them all up on a daily basis as well?
We split files into two levels of folders. For instance, if a video has the filename "somethingsomethingXY" it will go into the "videos/y/x/" folder. Then it's up to our sys admins to mount those folders on whichever disk they want. So from the app's point of view it doesn't matter where the file actually is. And we back up everything every couple of hours.
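To make the folder scheme concrete, here is a rough sketch of how such a mapping could look. It assumes the last character of the filename picks the first-level folder and the second-to-last picks the second level, both lowercased, which is what the "somethingsomethingXY" example suggests. The names shard_dir and shard_path are made up for illustration and are not from the actual site.

```python
from pathlib import PurePosixPath


def shard_dir(filename: str, root: str = "videos") -> PurePosixPath:
    """Return the two-level folder for a file (hypothetical helper).

    Assumption: the last character selects the first level and the
    second-to-last character selects the second level, lowercased.
    """
    if len(filename) < 2:
        raise ValueError("filename needs at least two characters")
    first_level = filename[-1].lower()
    second_level = filename[-2].lower()
    return PurePosixPath(root) / first_level / second_level


def shard_path(filename: str, root: str = "videos") -> PurePosixPath:
    """Return the full relative path of the file inside its folder."""
    return shard_dir(filename, root) / filename


if __name__ == "__main__":
    print(shard_dir("somethingsomethingXY"))
    print(shard_path("somethingsomethingXY"))
```

Because the app only ever computes a relative path like this, the admins stay free to mount any of the folders on a different disk without touching the code.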
Like I said, nothing fancy, but it works without any issues, apart from a small delay while the file is copied to 2 or 3 different servers (for redundancy) before it becomes available. It's not a big problem, and I have a few ideas for how to solve it if it becomes one :).
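For anyone curious what "copy to every server before it becomes available" might look like, here is a minimal sketch, not the real implementation. The host names, the replicate function and the scp_copy helper are all assumptions for illustration; the only thing taken from the thread is that files are copied with SCP and are not served until every copy exists.

```python
import subprocess
from typing import Callable, Iterable, List


def scp_copy(src: str, dest: str) -> None:
    """Copy one file with the scp command; raises if scp exits non-zero."""
    subprocess.run(["scp", src, dest], check=True)


def replicate(
    local_path: str,
    remote_dir: str,
    hosts: Iterable[str],
    copy: Callable[[str, str], None] = scp_copy,
) -> List[str]:
    """Copy the file to every host, one after another.

    Returns the list of destinations only after all copies succeeded,
    so the caller can mark the video as available at that point. If any
    copy raises, the exception propagates and nothing is returned.
    """
    destinations = []
    for host in hosts:
        dest = f"{host}:{remote_dir}/"
        copy(local_path, dest)
        destinations.append(dest)
    return destinations


if __name__ == "__main__":
    # Hypothetical host names; a dry-run copy function avoids real network calls.
    done = replicate(
        "somethingsomethingXY",
        "videos/y/x",
        ["storage1.example.com", "storage2.example.com"],
        copy=lambda src, dest: print("would copy", src, "to", dest),
    )
    print("available after", len(done), "copies")
```

The copies run sequentially here, which is where the delay mentioned above would come from; running them in parallel would be one obvious way to shorten it.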