That's porn for database geeks: 100 million pageviews per day, 300k requests per second against Redis.
The way they use MySQL is also interesting IMHO: they populate a relational database in order to be able to build new indexes in the Redis side, using the relational DB for the stuff it is best at, generating new "views" of the data easily.
(Relational DBs are also good to do a zillion more things of course.)
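For anyone wondering what "build new indexes in the Redis side" could look like, here is a rough sketch, not their actual code. It assumes each index is a sorted set of video IDs scored by some column, and that a new "view" is just a new SQL query. The table, the key names and the `InMemorySortedSets` stand-in are all made up for illustration; a real redis-py client exposes `delete`, `zadd(name, mapping)` and `zrevrange` with the same shapes.

```python
import sqlite3


class InMemorySortedSets:
    """Hypothetical stand-in for a Redis client; only the calls used below."""

    def __init__(self):
        self.data = {}

    def delete(self, key):
        self.data.pop(key, None)

    def zadd(self, key, mapping):
        self.data.setdefault(key, {}).update(mapping)

    def zrevrange(self, key, start, stop):
        # Highest score first; stop is inclusive, -1 means "to the end".
        members = sorted(self.data.get(key, {}).items(),
                         key=lambda kv: kv[1], reverse=True)
        end = None if stop == -1 else stop + 1
        return [member for member, _ in members[start:end]]


def rebuild_index(db, store, index_key, sql):
    """Rebuild one sorted-set index from a query returning (member, score) rows."""
    mapping = {str(member): float(score) for member, score in db.execute(sql)}
    store.delete(index_key)
    if mapping:  # redis-py rejects an empty mapping
        store.zadd(index_key, mapping)
    return len(mapping)


# Assumed schema, purely for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, views INTEGER, rating REAL)")
db.executemany("INSERT INTO videos VALUES (?, ?, ?)",
               [(1, 500, 4.1), (2, 9000, 3.2), (3, 120, 4.9)])

store = InMemorySortedSets()
rebuild_index(db, store, "videos:by_views", "SELECT id, views FROM videos")
rebuild_index(db, store, "videos:by_rating", "SELECT id, rating FROM videos")

top_by_views = store.zrevrange("videos:by_views", 0, 1)
top_by_rating = store.zrevrange("videos:by_rating", 0, 1)
```

The point is that adding a new ordering is one more query plus one more key, and the relational side stays the source of truth.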
Yep, I've been a Redis lobbyist for over a year at this company (I used to be the Lead Software Engineer on Youporn/Pornhub), and I'm really happy to see them using it and using it properly.
I remember seeing YouTube storing videos using some sort of specialized hardware from NetApp or something and they still got hit by the limitations of the Linux filesystem (number of inodes?)
I remember in the 90's having a conversation with my friend. We were discussing Linux vs Windows for web hosting. My friend said "tell me if you can find a single porn site that's hosted on Windows." He was right, I couldn't find any.
Worked for one that was at least in the top 5 porn sites back around 1998-99. All Windows servers running IIS and we were using NetApp appliances for storage.
Reading tip for you: Prometheus Rising by Robert Anton Wilson. Especially the part about the first 4 primitive circuits of the mind which govern survival, dominance, sex, stuff like that. War is very centered on Circuit II: dominance, emotions and territorial thinking.
I assumed he was gesturing at the creation/destruction discord inherent in sex/war, with war being the referent for obscene. Of course, a full account of sex is more than kisses and pleasantries; it also involves competition.
The (oftentimes sublimated) drive to procreate is one of the most powerful drivers of all human society and action. Perhaps even THE "primus motor" behind it all.
In Prometheus Rising RAW mentions that the reason monks take a vow of chastity is because it instantly frees them from one of the major hamster-wheels of the human condition.
Of course, I'm not planning to become a monk :) But food for thought...
The legal definition of obscenity requires that a work is meant to be arousing and has patently offensive sexual conduct, in addition to having no redeeming artistic value.
Therefore, adding sex into something offensive makes it more obscene.
The online porn industry were the first people to really deal with scalability (of managing and serving web content, anyway), the first people to deal with online payments, the first people to come under really heavy, sustained hacking attempts, etc. Don't underestimate how much of what you take for granted technologically in 2012, they pioneered in the mid-90s.
A friend of mine built a lot of porn sites in the late 90s using DB2 and Oracle. He says that Microsoft's SQL Server folks came to him to talk about what it would take to get him (and the porn sites) to use SQL Server.
He said that he was confident that DB2 and Oracle could handle the load. They responded that SQL Server was much more capable than he thought and asked him about the workload.
Their response to his answer was "we can't even simulate that, maybe we're not ready".
It's not just now either; the porn industry was hugely influential in causing the adoption and spread of video (you know, on cassette), movies and photography.
Indeed - I've heard it argued that the real reason that VHS won out over the technically superior Betamax format, was that Sony refused to license the Betamax technology to adult video vendors. Never dug into it, but it seems plausible.
Having just one deciding factor seems implausible. Straw that broke the camel's back? Sure. But calling it "the real reason" completely ignores every other relevant factor.
Social took off when Zuck tricked users into giving out their real names, which meant social networks grew much faster (as old school friends would connect).
Porn does "social", but only anonymous social, which is naturally handicapped.
If someone created a social network which allowed better privacy controls, the porn sites would love it.
We do the same thing, where we populate a mysql store as a secondary storage for business intelligence -- our Founder, PM, AdOps, etc know enough SQL to be dangerous if we build out more traditional, relational models.
It amused me that you were testing a porn site -- the homepage / potentially-misused zrange-issue.
At the point where multi-core sharding/ring setups are involved, you begin to have NIC/network saturation. It's feasible to achieve 300K on a single box.
100M page views per day is 1157 page views per second. I wonder how that's being blown up by a factor of 300?
I suppose if each video's metadata causes a redis lookup and there are 300 videos displayed per page that would do it, but you'd think you could batch those lookups. It seems fairly inefficient.
Either that or they have some background jobs causing redis lookups that aren't a result of page views.
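Spelling the arithmetic out, as a back-of-the-envelope check only. It assumes traffic is spread evenly over the day and that the 300k figure is a sustained rate, neither of which the announcement states. If 300k is a peak number, the real ratio per page view is lower.

```python
SECONDS_PER_DAY = 24 * 60 * 60

page_views_per_day = 100_000_000
redis_requests_per_second = 300_000

# Average page views per second if load were perfectly flat.
page_views_per_second = page_views_per_day / SECONDS_PER_DAY

# How many Redis requests each page view would have to account for.
requests_per_page_view = redis_requests_per_second / page_views_per_second

print(f"{page_views_per_second:.0f} page views/s")
print(f"{requests_per_page_view:.0f} Redis requests per page view")
```

So the multiplier is closer to 260 than 300, which doesn't change the question.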
There's 30 videos on the front page. Mousing over the image preview of a video displays a series of preview images.
There were 76 requests made as observed from the net tab in Firebug. The majority of these requests seemed to be for those video images. Note that I'm using an adblocker and refuse to check out the site without it.
Note also that they said 300k queries. There could be constantly-running background jobs doing conversion, pulling those preview frames, and crunching numbers on the metadata.
Tomorrow our site, with over 100 million page views per day, will be relaunching running on Symfony2. This is a complete rewrite. Prior to that it was running the Catalyst Engine written in Perl.
Sounds like Redis is being put to the role of reader boxes in the older split-head MySQL (or *SQL) architecture "write to master, read from slaves" with a conversion shim between MySQL & Redis.
I'm doing something similar with Dirty Hot Productions (one of my clients). We use MySQL on a separate CMS Rails app to store everything, and all of the websites communicate with the CMS via an API. We use Redis to cache all of the API responses, so effectively everything except the first read (or the first "dirty" read) on a particular query is coming straight from Redis. It works amazingly well, and it doesn't use nearly the amount of RAM I thought it would.
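Roughly, that pattern is plain cache-aside. A minimal sketch follows, and it is not the actual client code: the key prefix, the TTL-based expiry, `fake_cms_fetch` and the `DictCache` stand-in are all assumptions for illustration. With redis-py the same two calls would be `get(name)` and `setex(name, time, value)`.

```python
import json


class DictCache:
    """Hypothetical in-memory stand-in exposing get/setex like a Redis client."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def setex(self, key, ttl_seconds, value):
        self.data[key] = value  # the TTL is ignored in this stand-in


def cached_api_get(cache, fetch, path, ttl_seconds=300):
    """Return the API response for path, calling fetch only on a cache miss."""
    key = "api:" + path
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    response = fetch(path)
    cache.setex(key, ttl_seconds, json.dumps(response))
    return response


calls = []


def fake_cms_fetch(path):
    """Stands in for the HTTP call to the CMS API; records each real fetch."""
    calls.append(path)
    return {"path": path, "title": "example"}


cache = DictCache()
first = cached_api_get(cache, fake_cms_fetch, "/videos/42")
second = cached_api_get(cache, fake_cms_fetch, "/videos/42")
```

Only the first call reaches the CMS; the second is served from the cache. A "dirty" read would be handled by deleting the key when the CMS record changes, which this sketch leaves out.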