I'm Paul Bissex. I build web applications using open source software, especially Django. Started my career doing graphic design for newspapers and magazines in the '90s. Then wrote tech commentary and reviews for Wired, Salon, Chicago Tribune, and others you never heard of. Then I built operations software at a photography school. Then I helped big media serve 40 million pages a day. Then I worked on a translation services API doing millions of dollars of business. Now I'm building the core platform of a global startup accelerator. Feel free to email me.
I co-wrote "Python Web Development with Django". It was the first book to cover the long-awaited Django 1.0. Published by Addison-Wesley and still in print!
At least 236509 pieces of comment spam killed since 2008, mostly via Akismet.
(Note: This is a writeup I did a few years ago when evaluating Riak KV as a possible data store for a high-traffic CMS. At the time, the product was called simply "Riak". Apologies for anything else that has become out of date that I missed. Also please pardon the stiff tone! My audience included execs who we wanted to convince to finance our mad scientist data architecture ideas.)
Riak is a horizontally scalable, fault-tolerant, distributed, key/value store. It is written in Erlang; the Erlang runtime is its only dependency. It is open source but supported by a commercial company, Basho.
Its design is based on an Amazon creation called Dynamo, which is described in a paper published by Amazon in 2007. The engineers at Basho used this paper to guide the design of Riak.
The scalability and fault-tolerance derive from the fact that all Riak nodes are full peers -- there are no "primary" or "replica" nodes. If a node goes down, its data is already on other nodes, and the distributed hashing system will take care of populating any fresh node added to the cluster (whether it is replacing a dead one or being added to improve capacity).
In terms of Brewer's "CAP theorem," Riak sacrifices immediate consistency in favor of the two other factors: availability, and robustness in the face of network partition (i.e. servers becoming unavailable). Riak promises "eventual consistency" across all nodes for data writes. Its "vector clocks" feature stores metadata that tracks modifications to values, to help deal with transient situations where different nodes have different values for a particular key.
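To make the vector clock idea concrete, here is a minimal sketch in Python. It is not Riak's implementation: the dict-of-counters representation and the names `descends` and `compare` are assumptions made for illustration. It only shows how per-node counters let you tell "one value supersedes the other" apart from "these were written concurrently and conflict."

```python
def descends(a, b):
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two vector clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    # Neither clock has seen all of the other's updates: a real conflict.
    return "concurrent"


# Hypothetical clocks: each maps a node name to its update counter.
v1 = {"node_a": 2, "node_b": 1}
v2 = {"node_a": 1, "node_b": 1}
v3 = {"node_a": 1, "node_b": 2}

print(compare(v1, v2))  # a is newer
print(compare(v1, v3))  # concurrent
```

The "concurrent" case is the transient situation described above, where different nodes hold different values for the same key and something has to reconcile them.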
Riak's "Active Anti-Entropy" feature repairs corrupted data in the background (originally this was only done during reads, or via a manual repair command).
There is also a "Riak Search" engine that can be run on top of the basic Riak key/value store, providing fulltext searching (with the option of a Solr-like interface) while being simpler to use than MapReduce.
Riak groups keys in namespaces called "buckets" (which are logical, rather than being tied to particular storage locations).
Riak distributes keys to nodes in the database cluster using a technique called "consistent hashing," which avoids the need for wholesale data reshuffling when a node is added to or removed from the cluster. This technique is more or less inherent to Dynamo-style distributed storage. It is also reportedly used by BitTorrent, Last.fm, and Akamai, among others.
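A generic consistent-hashing ring can be sketched in a few lines of Python. This is an illustration of the technique, not Riak's actual ring code; the `HashRing` class, the MD5 hash, the `vnodes` count, and the node and key names are all assumptions chosen for the example. The point is that adding a fourth node moves only a fraction of the keys, and every key that moves lands on the new node.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node claims several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # A key belongs to the first ring point clockwise from its hash.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"article:{n}" for n in range(10_000)]
before = HashRing(["node1", "node2", "node3"])
after = HashRing(["node1", "node2", "node3", "node4"])

moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
print(f"{moved / len(keys):.0%} of keys moved after adding node4")
```

With a naive `hash(key) % number_of_nodes` scheme, by contrast, going from three nodes to four would reassign most keys.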
Riak offers some tunable parameters for consistency and availability. For example, on a read you can require that a certain number of nodes return matching values before the result is confirmed. These settings can even be varied per request if needed.
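As a rough model of that read setting, the following Python sketch treats a read as a list of replies from replicas and accepts a value only when enough of them agree. The function name `quorum_read` and the parameter `r` are illustrative assumptions, and the logic is deliberately simplified; it does not reproduce how Riak itself coordinates replicas.

```python
from collections import Counter


def quorum_read(replica_values, r):
    """Return a value once at least `r` replicas agree on it, else None."""
    if not replica_values:
        return None
    value, votes = Counter(replica_values).most_common(1)[0]
    return value if votes >= r else None


# Hypothetical replies from three replicas, one of which is stale.
replies = ["v2", "v2", "v1"]

print(quorum_read(replies, r=2))  # v2
print(quorum_read(replies, r=3))  # None
```

A lower `r` favors availability and speed; a higher `r` favors confidence in the answer, at the cost of failing when replicas disagree or are unreachable.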
Riak's default storage backend is "Bitcask." This does not seem to be something that many users feel the need to change. One operational note related to Bitcask is that it can consume a lot of open file handles. For that reason Basho advises increasing the ulimit on machines running Riak.
Another storage backend is "LevelDB," an open-source key/value storage library from Google. Its major selling point versus Bitcask seems to be that while Bitcask keeps all keys in memory at all times, LevelDB doesn't need to. My guess based on our existing corpus of data is that this limitation of Bitcask is unlikely to be a problem.
Running Riak nodes can be accessed directly via the riak attach command, which drops you into an Erlang shell for that node.
Bob Ippolito of Mochi Media says: "When you choose an eventually consistent data store you're prioritizing availability and partition tolerance over consistency, but this doesn't mean your application has to be inconsistent. What it does mean is that you have to move your conflict resolution from writes to reads. Riak does almost all of the hard work for you..." The implication here is that our API implementation may include some code that ensures consistency at read time.
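What read-time consistency code might look like is sketched below in Python. Everything here is hypothetical: the record shape (a dict with `title`, `tags`, and `updated` fields), the `resolve_siblings` name, and the merge policy (newest version wins, tags are unioned) are assumptions for illustration. The right policy depends entirely on the data being stored.

```python
def resolve_siblings(siblings):
    """Merge conflicting versions of one record into a single value."""
    if not siblings:
        return None
    # Scalar fields: take them from the most recently updated version.
    newest = max(siblings, key=lambda s: s["updated"])
    merged = dict(newest)
    # Set-like fields: keep everything any version knew about.
    merged["tags"] = sorted(set().union(*(s["tags"] for s in siblings)))
    return merged


# Two hypothetical conflicting versions returned by a single read.
siblings = [
    {"title": "Draft", "tags": ["news", "breaking"], "updated": 100},
    {"title": "Final", "tags": ["news", "local"], "updated": 105},
]

print(resolve_siblings(siblings))
```

The API layer would call something like this whenever a read returns more than one version, then write the merged value back.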
Riak is controlled primarily by two command-line tools, riak and riak-admin.
The riak tool is used to start or stop Riak nodes.
The riak-admin tool controls running nodes. It is used to create node clusters from running nodes, and to inspect the state of running clusters. It also offers backup and restore commands.
If a node dies, a process called "hinted handoff" kicks in. This takes care of redistributing data -- as needed, not en masse -- to other nodes in the cluster. Later, if the dead node is replaced, hinted handoff also guides updates to that node's data, catching it up with writes that happened while it was offline.
Individual Riak nodes can be backed up while running (via standard utilities like rsync), thanks to the append-only nature of the Bitcask data store. There is also a whole-cluster backup utility, but if this is run while the cluster is live there is of course risk that some writes that happen during the backup will be missed.
Riak upgrades can be deployed in a rolling fashion without taking down the cluster. Different versions of Riak will interoperate as you upgrade individual nodes.
Part of Basho's business is "Riak Enterprise," a hosted Riak solution. It includes multi-datacenter replication, 24x7 support, and various services for planning, installation, and deployment. Cost is $4,000 - $6,000 per node depending how many you buy.
Overall, low operations overhead seems to be a hallmark of Riak. This is both in day-to-day use and during scaling.
One of our goals is "store structured data, not presentation." Riak fits well with this in that the stored values can be of any type -- plain text, JSON, image data, BLOBs of any sort. Via the HTTP API, Content-Type headers can help API clients know what they're getting.
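As a small illustration of the Content-Type point, this Python sketch builds (but does not send) an HTTP request that would store a JSON value. The host, port, URL layout, bucket name, and key are assumptions about a default local Riak node rather than anything confirmed by our setup, and `build_store_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_store_request(bucket, key, value, content_type,
                        host="http://localhost:8098"):
    """Build a PUT request that stores `value` with an explicit Content-Type."""
    # Assumed URL layout for the key/value HTTP interface.
    url = f"{host}/buckets/{bucket}/keys/{key}"
    return Request(url, data=value, method="PUT",
                   headers={"Content-Type": content_type})


body = json.dumps({"headline": "Hello"}).encode("utf-8")
req = build_store_request("articles", "hello", body, "application/json")

print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))  # urllib normalizes header capitalization
```

Passing `req` to `urllib.request.urlopen` would actually send it. The same helper could store an image by passing raw bytes and `image/jpeg`, which is what lets API clients know what they are getting back.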
If we decide we need to have Django talk to Riak directly, there is an existing "django-riak-engine" project we could take advantage of.
TastyPie, which powers our API, does not actually depend on the Django ORM. The TastyPie documentation actually features an example using Riak as data store.
The availability of client libraries for many popular languages could be advantageous, both for leveraging developer talent and for integrating with other parts of the stack.
I am very impressed with Riak. It seems like an excellent choice for a data store for the CMS. It promises the performance needed for our consistently heavy traffic. It's well established, so in using it we wouldn't be dangerously out on the bleeding edge. It looks like it would be enjoyable to develop with, especially using the HTTP API. The low operations overhead is very appealing. And finally, it offers flexibility, scalability, and power that we will want and need for future projects.
I really do like Quora (you may have seen my SadQuora tweets, a side effect of the time I spend there). But when somebody asked, "What are the most annoying types of questions on Quora?" I couldn't resist. Maybe it's just my feed, but I see things like these a lot:
When I first switched from OS X to Ubuntu for my daily development work, one of the things I missed a lot was Divvy.
"Window throwing" is the purpose of Divvy (and Spectacle, which I later replaced it with). With a single keyboard shortcut, I can make the foreground window fill the right half of the screen. Or the left half. Or the bottom right quadrant. Or the whole screen. Any rectangle I care to define. I can even send it to the other monitor.
Once I had gotten used to this power, I was hooked. Manual resizing and repositioning of windows with the mouse felt fiddly and inexact. Like trying to align icons on your desktop by eye. Bother.
On Linux I hunted around for a while before finding out that the Compiz system has a Grid plugin for this sort of thing. Divvy's window size/position options are more granular, but Compiz gets it done.
Then, probably a year later I discovered the solution I currently use in Ubuntu: Unity actually has built-in keyboard shortcuts for window placement. They use the numeric keypad and they go like this:
Look at the layout of the keypad and you'll see these are their own perfect mnemonics.
One final note: anybody who has ever used a tiling window manager like Xmonad is familiar with the pleasure of instant and exact window control via keyboard. I use Xmonad as well and love it. (I just had to mention this because if I didn't, somebody would be like DO YOU EVEN XMONAD DUDE)
Non-engineers want to know: what happens when a big bug is found in your software, and the bug is causing real users real problems, and you're the one who wrote the code?
Engineers do sometimes write bad code, and sometimes it makes it into production, it's true.
But shipping production software involves a lot more than writing code. It goes beyond that one engineer. That engineer is not the only person who saw or ran that code.
In short, in a sizable professional software organization a single person doesn't really have the power to screw up all alone. So the right thing to do when a production bug bites you is, figure out how you - as an organization - let that happen.
What "happens to" the engineer who typed the code in question, hopefully, is that he/she participates in a post-mortem review that helps the team figure out how they can improve things to reduce the likelihood of similar problems in the future.
For more on this, read the utterly excellent and inspiring "Blameless PostMortems and a Just Culture" essay by John Allspaw of Etsy.
When we were growing our team of Python devs at CMG, I was involved in a lot of interviews. I really enjoyed it, meeting and hiring interesting and talented engineers.
I'm not a big fan of quizzing people on technical minutiae in interviews. I do think that asking some questions about technical likes and dislikes can be very illuminating though.
For example, "What's your favorite standard library module?" (Best answer in my book here is itertools or functools, but anything that shows they have hands-on appreciation for the depth of the standard library is good.)
I've also asked, "Tell me something you don't like about Python." This can be a great gauge of someone's level of sophistication and breadth of experience. If they say "But I like everything about Python!" that's a red flag (and I say this as a bona fide Python lover and career man). It means they either lack enough breadth of experience to see Python's weak points, or they lack the confidence to answer truthfully.
My favorite answer to this question was, "I don't like that lambda expressions can only be one line." It had never occurred to me to see this as a defect, but now every time I am writing code that drives me to the same feeling, I think about the engineer who gave that answer. (We did hire her and she was great!)
by Paul Bissex
and E-Scribe New Media