E-Scribe : a programmer’s blog

About Me

PBX I'm Paul Bissex. I build web applications using open source software, especially Django. Started my career doing graphic design for newspapers and magazines in the '90s. Then wrote tech commentary and reviews for Wired, Salon, Chicago Tribune, and others you've never heard of. Then I built operations software at a photography school. Then I helped big media serve 40 million pages a day. Then I worked on a translation services API doing millions of dollars of business. Now I'm building the core platform of a global startup accelerator. Feel free to email me.


I co-wrote "Python Web Development with Django". It was the first book to cover the long-awaited Django 1.0. Published by Addison-Wesley and still in print!


Built using Django, served with gunicorn and nginx. The database is SQLite. Hosted on a FreeBSD VPS at Johncompanies.com. Comment-spam protection by Akismet.


Pile o'Tags

Stuff I Use

Bitbucket, Debian Linux, Django, Emacs, FreeBSD, Git, jQuery, LaunchBar, macOS, Markdown, Mercurial, Python, S3, SQLite, Sublime Text, xmonad

Spam Report

At least 237132 pieces of comment spam killed since 2008, mostly via Akismet.

How things get better after you screw up at work

(Hint: it's about your team.)

A couple weeks ago I accidentally replaced our live, production database with a 17-hour old snapshot.

This is an always-on application with users around the globe, so the mistake was likely to have blown away some new user-entered data.

I didn't realize what I had done for an hour or so (I thought I had targeted a test database server, not production). When it hit me, I had already left work. Here are the steps of how we handled it, with an emphasis on the “good engineering team culture” aspect:

  1. I immediately shared my realization of the crisis with the team. I did not try to fix it myself, or pretend I didn't know what had happened. I was able to do this because I knew the team had my back.

  2. Available team members immediately dove into confirming, assessing, and mitigating the problem. (Since I was in transit I was not yet able to get on a computer.) Focus was on minimizing pain for our users and the business, not on blame, resentment, or face-saving.

  3. User monitoring tools used by our UX person gave us critical info on which users had potentially lost data. We shared knowledge.

  4. We didn't think we had a more recent backup than the snapshot I had used — but one of the engineers had been making more frequent snapshots as part of a new project. He had been done with work for hours (he's in a different time zone), but when he saw the chatter on Slack he jumped in to help. He didn't say, “not my job.”

  5. After we had reached a stable state, people signed off, but I stayed on to double-check things and write up a summary to broadcast to the team. Communication is key.

  6. The next day, we scheduled a postmortem meeting to discuss the incident. This is a standard practice that's very important for building teams that can learn and grow from mistakes. It's “blameless” — the focus is on what happened, how we responded, what the business impact was, and what we can do to reduce the chance of recurrence. An important part of prevention is making measures more concrete and realistic than “try not to make that mistake.” In the end we lost only about 90 minutes of database history, and accounted for all user data added in that period.

I made a bad mistake, the team rose to the occasion, we were lucky to have good mitigation options, and we are making changes to reduce the chance of the mistake happening again. Win.

Sunday, October 7th, 2018
+ + +


After a couple years of mostly using XMonad on my Linux machines instead of a standard Desktop Environment, I'm coming around to using XFCE. I've always liked it; it's been my installed "fallback" DE (for when you need the damned settings dialog for some thing or other). Now it's becoming my primary.

I like the low resource use. I don't hate Unity and Gnome Shell but they are too much for my older machines.

But the little thing making the biggest difference is XFCE's good standard keyboard-driven launching and window-manipulation features.


Sure, it's not XMonad, but it lets me get stuff done and doesn't require any custom setup.

Thursday, May 31st, 2018

How I became a software engineer, 8-bit version


You could say Z-80 assembly language is what really turned me into a software developer.

My first programming language was BASIC, which was built into my first computer (a TRS-80 Model III). I wrote a lot of BASIC code, including arcade-style games (compiled BASIC — you can still play them on this TRS-80 Model III Emulator).

I always wanted to keep learning. There was no World Wide Web for research and nobody I knew could guide me, so we went to Radio Shack and asked them how else I could program the computer. They sold us the Editor/Assembler package.

I eventually wrote a simple text editor. I still remember the fun of implementing scrolling commands as block moves, in and out of the machine's memory-mapped video. Getting a handle on low-level operations gave a feeling of mastery that wasn't going to come from BASIC.

But beyond the particulars of assembly, the reason that it was significant for me is that it gave me a taste of broader possibilities. It’s not your first programming language that makes you a programmer, it’s your second. You start seeing patterns and high-level concepts, and seeing that a given language is just a tool for a job.

Learning assembly was a grind, but it showed me what I could do. From then on, if I had a chance to play with a new language, I took it. LOGO. Pascal. 6502 assembly. Even a dialect of Lisp (I had no idea what was going on there). I think that early breadth of experience ultimately helped convince me that software engineering deserved my full attention.

Tuesday, December 26th, 2017
+ + +

My 100x ROI as accidental domain speculator

One of the hazards of working in the web biz is impulse-buying domain names.

Back in the Web 2.0 boom days, there were a lot of “social” web plays with silly names.

I thought I’d satirize this by registering numbr.com and making a social site where you could “friend” the number 7 and that sort of thing.

I never got around to building that site. However I did get a curious email one day from “Joe” who wanted to know if I’d sell the name. He was with a startup that was going to offer temporary phone numbers for Craigslist postings or something. After some back and forth, we agreed on a price: $1000.

For a joke domain name I paid $10 for.

Tuesday, September 26th, 2017
+ +

Neo4J and Graph Databases

NoSQL is a big tent with lots of interesting tech in it. A few years ago at work I got an assignment to evaluate graph databases as a possible datastore for our 40-million-pageviews-a-day CMS. Graph DBs are elegant stuff, though not a particularly special fit for that application. Here's what I had to say.

Graph databases are all about "highly connected" data. But instead of tracking relationships through foreign-key mappings RDBMS style, they use pointers that directly connect the related records.

These relationships can also have directionality and descriptive properties.

Graph DBs store and retrieve in a manner arguably more congruent with the true structure of heavily relational data than an RDBMS.

Using an RDBMS with foreign keys and joins can mean a significant performance cost in join-heavy situations.
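The foreign-key-vs-pointer distinction can be sketched in plain Python (this is an illustration of the idea, not Neo4j code; the names and data are made up):

```python
# RDBMS style: relationships live in a join table of foreign-key pairs,
# so traversal means scanning for matching ids.
stories = {1: "AJC story", 2: "WSB TV story"}
links = [(1, 2)]  # (from_id, to_id) foreign-key pairs

def related_fk(story_id):
    """Find related stories by scanning the join table."""
    return [stories[to] for (frm, to) in links if frm == story_id]

# Graph style: each node holds direct pointers to its neighbors,
# so traversal is just following references -- no scan, no join.
class Node:
    def __init__(self, name):
        self.name = name
        self.same_market = []  # direct pointers to related nodes

ajc, wsb = Node("AJC story"), Node("WSB TV story")
ajc.same_market.append(wsb)

print(related_fk(1))                      # join-table scan
print([n.name for n in ajc.same_market])  # pointer chase
```

In the toy version the scan is cheap, but the graph store's bet is that chasing a pointer stays cheap as the dataset grows, while join-table scans (even index-assisted ones) do not.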

There are many products in the graph database space, many of them relatively new. There are some variations in features and intended niche. I focus on Neo4j, which is the dominant player, mature, and open source.


Neo4j seems to be the most prominent and heavily used graph database product of the "property graph" type. Its sponsor is a company named Neo Technology. It was created in 2003 and open-sourced in 2007. It's under active development, but seems mature enough not to be undergoing disruptive changes. There's an active user community and a good ecosystem of third-party tools, and books are emerging as well.

Querying and Data Access


Cypher is Neo4j's SQL-ish declarative query language.

One notable difference from SQL is that every database query has an explicit starting point. Usually this is a specific node in the graph. The Cypher START clause identifies this node. It's selected either by its ID or via an index lookup.

For example, given that almost any $BIGCMS object is attached to a specific site (or sites), many queries of graph-database $BIGCMS might start at a site node.

A common pattern for Cypher queries is START ... MATCH ... RETURN. (Keywords are not case sensitive, but as with SQL it improves overall query readability if they are in all caps.)

Cypher session example ("//" begins a comment):

    // A mutating operation (e.g. CREATE) doesn't have to return anything, but it can.
    // Note that we did not have to declare our nodes' data structure before creating them.
    $ CREATE paper={name:"AJC"}, tv={name: "WSB TV"}, radio={name: "WSB radio"} RETURN paper, tv, radio
    ==> +-----------------------------------------------------------------------------+
    ==> | paper                | tv                      | radio                      |
    ==> +-----------------------------------------------------------------------------+
    ==> | Node[17]{name:"AJC"} | Node[18]{name:"WSB TV"} | Node[19]{name:"WSB radio"} |
    ==> +-----------------------------------------------------------------------------+
    ==> 1 row
    ==> Nodes created: 3
    ==> Properties set: 3
    ==> 3 ms

    // Establish the relationships, fetching start nodes by ID
    $ START tv=node(18), radio=node(19) CREATE tv-[:SAME_MARKET]->radio
    $ START tv=node(18), paper=node(17) CREATE tv-[:SAME_MARKET]->paper

    // Query the graph; "-" indicates relations, with optional "<" or ">" for direction
    $ START tv=node(18) MATCH tv-[:SAME_MARKET]->b RETURN b
    ==> +----------------------------+
    ==> | b                          |
    ==> +----------------------------+
    ==> | Node[17]{name:"AJC"}       |
    ==> | Node[19]{name:"WSB radio"} |
    ==> +----------------------------+

The Cypher relation syntax looks a bit noisy at first; it's helpful to think of it as a sort of ASCII-art diagram; "a-->b" or "a<--b" or "a-[:LOVES]->b" or "b-[:TOLERATES]->a" are all legal.

Other access modes

In addition to the declarative-style Cypher, there are other supported ways to access data.

The server has a REST API. In addition to being available for "raw" use it is the basis for many of the tools and language bindings for Neo4j. For example, the provided Python bindings utilize the REST API internally.
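As a sketch of what "raw" use looks like, here's how a Cypher query could be composed as a JSON POST to the Neo4j 1.x REST endpoint. The URL is the assumed default for a local install, and the payload shape follows the 1.x-era REST conventions; check the docs for your version before relying on either. (The request is built but not sent, since that needs a running server.)

```python
import json
from urllib import request

# Assumed default endpoint for a local Neo4j 1.x server.
NEO4J_URL = "http://localhost:7474/db/data/cypher"

# Cypher query plus parameters, as the REST endpoint expects them.
payload = {
    "query": "START tv=node({id}) MATCH tv-[:SAME_MARKET]->b RETURN b",
    "params": {"id": 18},
}

req = request.Request(
    NEO4J_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Accept": "application/json"},
)
# response = request.urlopen(req)  # requires a running Neo4j server

print(req.get_method())  # urllib infers POST from the data argument
```

Parameterized queries like this (rather than string interpolation into the Cypher text) are the REST API's equivalent of bound SQL parameters.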

The Neo4j shell, in addition to supporting Cypher commands, has utility functions that make interactive manipulation of graph data easier.

Gremlin is a graph traversal language based on Groovy ("the Python of Java"). It's provided as a plugin with the Neo4j distribution.

There's also py2neo, a comprehensive Python library for Neo4j access that also provides submodules for access via Cypher, Gremlin, Geoff (a graph modeling language by the same author), and raw REST.

Using Neo4j

The Neo4j "Community" version is what we would likely use. It's GPL licensed, and is the complete product.

They also offer two commercial versions, "Advanced" and "Enterprise." The selling points are advanced monitoring features, high availability support, a specialized web management console, and support services.

(The Advanced and Enterprise versions are also available under an Affero GPL license, but this is currently not practical for us.)

The user support ecosystem is what you would expect for an open source project. There's an official Google Group. Using Stack Overflow to ask questions is encouraged. There's a (quiet) IRC channel on Freenode. Github is used to distribute the source.


Scaling a Neo4j database is not as simple as with a Dynamo-style store like Riak. Graphs are difficult to shard.

Neo4j has "high availability" features for clustering in the Neo4j Enterprise Edition. This is a master-slave setup. You can write to master or slave nodes, though there's a speed penalty for writing to slaves. All nodes get all writes eventually. Automatic fail-over can be set to elect any cluster member as master. A failed master node can later re-join as slave if desired.

In a cluster setup, backups can be performed by adding a slave to the cluster, which will pick up all the data. To restore, you stop the cluster, restore data from backup to at least one node, and re-start the cluster.

Neo Technology has been working for several years now on a system allowing the graph datastore to be distributed across servers, and to be scaled horizontally. This work (currently known as "Rassilon") will arrive with Neo4j 2.0 at the earliest (current stable version is 1.8).

Technical details

Neo4j is a JVM application (written in Java and Scala), so we would need to cultivate expertise in JVM deployment.

Neo4j likes to have its data in RAM -- specifically its node and property maps, which are mostly pointers. Having space to additionally hold the full property values in RAM is apparently not critical. Given that the vast bulk of $BIGCMS data is in property values, and that the total number of records (i.e. nodes) is nowhere near their hard limit of 32 billion, this seems achievable.

For best performance, Neo recommends maximizing the host OS's file caching. Making the server's filesystem cache size as big as the entire datastore is recommended when possible.

Their JVM tuning advice is: give the JVM a large heap that will hold as much application data as possible, but also make sure the heap fits in RAM to avoid performance degradation from virtual memory paging. Along those lines Neo advises tuning Linux to be more tolerant of dirty virtual memory pages.
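As a rough illustration, in the 1.x distribution the heap is set in conf/neo4j-wrapper.conf and the memory-mapped store files in conf/neo4j.properties. Key names and defaults vary by version, and the values below are placeholders to show the shape of the tuning, not recommendations:

    # conf/neo4j-wrapper.conf -- JVM heap, in MB
    wrapper.java.initmemory=4096
    wrapper.java.maxmemory=4096

    # conf/neo4j.properties -- memory-mapped store files
    neostore.nodestore.db.mapped_memory=256M
    neostore.relationshipstore.db.mapped_memory=1G
    neostore.propertystore.db.mapped_memory=2G

The general pattern is: size the heap for application data, size the mapped-memory settings for the node/relationship/property stores, and leave the rest of RAM to the OS file cache.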


Ubuntu/Debian: Neo Technology provides an apt repository.

OS X: There's a Homebrew formula for the latest stable version of Neo4j.

Other Unix platforms (e.g. CentOS): Neo Technology provides tarballs containing the full binary release. And the source is available too of course.


Graph database technology proponents make a big deal of how well suited it is to relationship-heavy social media applications. While that's not currently a big niche for us, the technology still has some appeal.

One only needs to look at some of $BIGCMS's slowest, join-heavy SQL queries to know that a graph approach has the potential to increase performance greatly, and perhaps allow us to work with data in some ways that we have avoided or ignored because they are impractically slow.

And for our goal of "store structured data, not presentation," a graph database seems like an excellent fit. Graph relationships would give us the ability to record even more (readily usable) structure than we already do.

Final Thoughts

We could certainly speed up many slower $BIGCMS queries by moving from an RDBMS to a graph system. Our most pathologically slow SQL queries can take minutes. Getting our data into graph storage could eliminate many if not all of these.

However, the migration effort would be significant. Getting $BIGCMS data into graph form will require some careful thinking about how the data will be accessed. Common advice on creating a graph store is to think about the relationships first. This might lead to some rethinking of how we store data.

Since a major goal of $BIGCMS is to share content across sites, and we intend to build a library of that content, a graph database could offer a natural and powerful way to work with those connections.

If we were intending to directly replace our RDBMS store with a graph database, many migration challenges would arise that we might not see with other data store types. But since our data store will live behind a REST API, disruption at the application level might be no greater than with any other data store type (e.g. key-value).

As a more detailed design for the data store REST API is developed, we will likely have a better sense of how a graph database would serve in that design, and how its advantages would be felt.


O'Reilly is working on a Graph Databases book which is currently available in a free pre-release PDF at http://graphdatabases.com/. It heavily emphasizes Neo4j.

Manning is publishing "Neo4j in Action" which is currently available under their Early Access Program.

Saturday, September 16th, 2017
+ +