I mentioned a hiccup on our server on Twitter the other night, and someone asked if I could give a high-level view of what makes MLKSHK go.
First, we use the Tornado framework. It is fast, simple to learn, and best of all: non-blocking. The architecture we originally planned for MLKSHK made a bit more use of this lovely asynchronicity, which is why we chose Tornado.
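To make "non-blocking" concrete, here is a minimal sketch using Python's standard asyncio, which modern Tornado (5.0 and later) runs on top of. It is not MLKSHK code: the handler names and delays are made up. Three simulated requests each wait 0.2 seconds on "I/O", and because every `await` hands control back to the event loop, a single thread finishes all three in roughly 0.2 seconds instead of 0.6.

```python
import asyncio
import time


async def handle_request(name, delay):
    # Simulated I/O wait (e.g. a database or HTTP call). `await` hands control
    # back to the event loop, so other requests make progress while this one waits.
    await asyncio.sleep(delay)
    return f"{name} done"


async def serve_concurrently():
    start = time.perf_counter()
    # Run all three hypothetical "requests" concurrently on one thread
    results = await asyncio.gather(
        handle_request("request-a", 0.2),
        handle_request("request-b", 0.2),
        handle_request("request-c", 0.2),
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve_concurrently())
    print(results, f"took {elapsed:.2f}s")  # roughly 0.2s rather than 0.6s
```

A blocking server would handle these one after another; the event loop simply interleaves the waits.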
The other reason we chose Tornado is that it's actually enjoyable to read the source code. It's small enough that if you are stuck on something, it's easy to pop over to the source and see what's going on. I feel this way about many Python projects, but never more so than with Tornado.
Using Tornado, though, means you have to build or find some of the components that are common in bigger frameworks, specifically an ORM. We rolled our own: a thin layer on top of MySQL with just the right amount of abstraction. It's called FlyingCow, and it does a great job inside of Tornado.
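FlyingCow's API isn't described here, so the following is only a hypothetical sketch of what a "thin layer" ORM can look like: each model class names its table and columns, and `save()` and `get()` generate plain parameterized SQL. It uses the standard library's sqlite3 as a stand-in for MySQL so it runs anywhere; `Model`, `User`, and the `users` table are invented for the example.

```python
import sqlite3


class Model:
    """Hypothetical thin model base: one subclass per table, plain SQL underneath."""

    table = ""   # table name, set by each subclass
    fields = ()  # column names other than the integer primary key `id`

    def __init__(self, **values):
        self.id = values.pop("id", None)
        for name in self.fields:
            setattr(self, name, values.get(name))

    @classmethod
    def get(cls, conn, **where):
        # Table and column names come from trusted code (the class and the caller);
        # values are always bound as parameters, never interpolated into the SQL.
        clause = " AND ".join(f"{col} = ?" for col in where)
        columns = ", ".join(("id",) + cls.fields)
        row = conn.execute(
            f"SELECT {columns} FROM {cls.table} WHERE {clause}",
            tuple(where.values()),
        ).fetchone()
        if row is None:
            return None
        return cls(id=row[0], **dict(zip(cls.fields, row[1:])))

    def save(self, conn):
        values = tuple(getattr(self, name) for name in self.fields)
        if self.id is None:
            # New row: INSERT and remember the generated primary key
            placeholders = ", ".join("?" for _ in self.fields)
            cur = conn.execute(
                f"INSERT INTO {self.table} ({', '.join(self.fields)}) VALUES ({placeholders})",
                values,
            )
            self.id = cur.lastrowid
        else:
            # Existing row: UPDATE every field by primary key
            assignments = ", ".join(f"{name} = ?" for name in self.fields)
            conn.execute(
                f"UPDATE {self.table} SET {assignments} WHERE id = ?",
                values + (self.id,),
            )
        conn.commit()
        return self


class User(Model):
    table = "users"
    fields = ("name", "email")
```

With a MySQL driver such as MySQLdb, the placeholders would be `%s` rather than `?`; the shape of the layer stays the same.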
Nginx does two huge things for us. First, it proxies file downloads from S3: we use the X-Accel-Redirect header to tell nginx to fetch the file from another location and serve it. Second, it handles file uploads using the nginx-upload-module. If you are using nginx and handling file uploads yourself, stop right now and use this module.
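Here is a minimal sketch of the download side, assuming a hypothetical internal nginx location named `/s3-proxy/` (the post doesn't give MLKSHK's real location or bucket). The application never streams the file itself; it only returns a header pointing nginx at the internal location, and nginx performs an internal redirect there.

```python
from urllib.parse import quote

# Hypothetical nginx side, shown only as a comment:
#   location /s3-proxy/ {
#       internal;                                          # not reachable by clients directly
#       proxy_pass https://example-bucket.s3.amazonaws.com/;  # hypothetical bucket
#   }
S3_PROXY_LOCATION = "/s3-proxy/"


def accel_redirect_headers(s3_key):
    """Return headers telling nginx to serve `s3_key` through the internal proxy location."""
    # Percent-encode the key so spaces and other unsafe characters survive in the
    # header, while keeping "/" so nested keys still map onto path segments.
    return {"X-Accel-Redirect": S3_PROXY_LOCATION + quote(s3_key, safe="/")}
```

Inside a Tornado handler, one could apply these with `self.set_header(name, value)` and then call `self.finish()` with an empty body, leaving the heavy lifting to nginx.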
The thing that hiccuped the other night was RabbitMQ. This has been happening frequently, and I haven't pinned down exactly why. My first guess is some sort of memory issue, but the logs haven't confirmed it. Either way, it gets stuck. In front of RabbitMQ is Celery: we use queuing to handle some of the lengthier processes, like favoriting files and saving to your shake, so you get immediate feedback on the front-end. Plus, if we are getting hammered with requests, you won't notice it as much.
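The post doesn't show MLKSHK's task code, so here is a stand-in sketch of the pattern: the request handler drops the slow job onto a queue and returns right away, and a background worker does the work later. Python's `queue.Queue` and a thread play the roles that RabbitMQ and a Celery worker play in production; `save_favorite` and `handle_favorite_request` are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()               # stand-in for the RabbitMQ queue
favorites = []                     # stand-in for the rows the slow job writes
favorites_lock = threading.Lock()


def save_favorite(user_id, file_id):
    # Hypothetical slow job; in production this would hit MySQL, update counts, etc.
    with favorites_lock:
        favorites.append((user_id, file_id))


def worker():
    # Stand-in for a Celery worker: pull jobs off the queue forever
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()


def handle_favorite_request(user_id, file_id):
    # Enqueue and return immediately so the front-end gets instant feedback
    jobs.put((save_favorite, (user_id, file_id)))
    return {"status": "queued"}


threading.Thread(target=worker, daemon=True).start()
```

With Celery, `save_favorite` would instead be decorated with `@app.task` and enqueued with `save_favorite.delay(user_id, file_id)`, with RabbitMQ carrying the message to a separate worker process. That separation is also why a stuck broker shows up as delayed favorites rather than failed page loads.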
We use HAProxy to balance the load across multiple machines, and at any time we can put new machines into rotation to lighten the load. Beyond that, we use Airbrake to notify us of errors, Postmark to send mail, GitHub (of course), and EC2 to host it all.
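HAProxy is configured rather than programmed, but the idea behind its `balance roundrobin` mode is easy to sketch: hand each new request to the next backend in turn. The toy below is not HAProxy code and uses hypothetical hostnames; it shows why putting a machine into rotation lightens the load, since once it is added it takes its share of subsequent requests.

```python
import threading


class RoundRobinPool:
    """Toy round-robin balancer; backends can be added while it is in use."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._index = 0
        self._lock = threading.Lock()

    def add(self, backend):
        # "Putting a machine into rotation": it starts receiving its share of requests
        with self._lock:
            self._backends.append(backend)

    def next_backend(self):
        # Pick backends in turn, wrapping around at the end of the list
        with self._lock:
            backend = self._backends[self._index % len(self._backends)]
            self._index += 1
            return backend
```

With two backends, each gets half the requests; after `add()`, each of three gets a third.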