Storing Binary Data in Postgres? Beware!

Storing Binary Data in Postgres? Beware!

Micah Martin
Micah Martin

February 18, 2007

If your situation matches the following conditions, beware!

  1. You’re working in rails.
  2. You’re using Postgresql.
  3. You’re storing binary data in the database.

This was the situation on a project of mine. We were storing PDF and PNG images in our Postgres database. Everything was fine during development and testing where we used files that ranged up to a few dozen kilobytes in size.

The situation went rapidly downhill when file sizes got up in the hundreds of kilobytes to megabytes. The worst part about it was that the errors we got were misleading and seemingly random. They included:

undefined method for nil:NilClass
invalid end of buffer
undefined class/module Packet (which was a defined model class)

These errors sent us on a wile goose chase. The clue that finally pointed us toward the problem was the fact that it took 6 seconds to load a 4M PDF document from the database. That was far too long especially considering the same document could be loaded from a file instantaneously.

Apparently, binary data stored in Postgresql’s bytea data type has to be parsed on save and load to escape and unescape certain characters. Unfortunately the native C Postgres gem doesn’t do the parsing.

It’s done in the PostgreSQLAdapter.unescape_bytea and PostgreSQLAdapter.escape_bytea methods (Ruby code) of ActiveRecord and the parsing is a bit too intensive for Ruby.

This is where the meltdown begins.

It consumes too much memory, or too much processing power, or…well I don’t know exactly. But I know it breaks.

We refactored our model such that all the binary data gets stored in flat files on disk rather than in the database. After this, our Rails app came back to life. It was much faster too!

Here’s hoping that if and when you encounter this problem, Google points you to this blog entry and you find it helpful.