newstalk

:: like newspeak, minus the ungood bits. ::

Building for persistence at a fundamental level.

This is a part of "Free Sky: Objects in Space".

Series contents:

  1. Free Sky: Objects in Space
  2. Enabling data-driven object construction.
  3. Building for persistence at a fundamental level.
  4. Using tags for metadata and lookup.

One of the first problems I tackled in my prototype was persistence, because I foresaw the difficulties of object serialisation, and I feared them. Because I’ve chosen Python as my language of first resort, serialisation isn’t as big an issue as it would be in, say, C++: it ships with the standard library as the pickle module. The mechanics of serialising individual objects aren’t the problem, but capturing the state of a large, complicated system with lots of little objects running around very well could be. I needed a strategy.
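To show what the standard library gives you for free, here is a minimal sketch of a pickle round trip over a tiny object graph. The ship and station data are made up for illustration; the point is that shared references and cycles survive the trip.

```python
import pickle

# A tiny "world": two ships that point at the same station,
# and a station that points back at its ships (a reference cycle).
station = {"name": "Tycho", "docked": []}
ships = [
    {"name": "Canterbury", "home": station},
    {"name": "Scopuli", "home": station},
]
station["docked"].extend(ships)

# Serialise the whole graph to bytes, then rebuild it from those bytes.
blob = pickle.dumps(ships)
restored = pickle.loads(blob)

# Both restored ships still share one station object, not two copies.
shared_home = restored[0]["home"] is restored[1]["home"]
# The cycle is intact: station -> first ship is the very same ship object.
cycle_intact = restored[0]["home"]["docked"][0] is restored[0]

print(shared_home, cycle_intact)  # True True
```

That covers one graph in one process. Deciding which objects to write, when to write them, and how to get them back on demand is the part pickle leaves to you.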

I decided early on that a relational database solution like SQLite wasn’t for me. I’d heard of Object-Relational Mapping, and I wanted no part of it (though by all accounts, SQLObject and SQLAlchemy are both excellent Python solutions).

Object databases

Then I heard about object databases, particularly ZODB and Durus. Unfortunately, both projects suffer from poorly organized documentation. After some investigation, I went with ZODB on a hunch, since Durus is actually a simplified reimplementation of ZODB. The hunch is still playing itself out: both projects are well tested, actively developed, and used in demanding production environments. For now, my money’s on ZODB.

The beauty of ZODB (and Durus, but from now on I’m just going to talk about ZODB) is that all the mechanics of actually getting an object out to the disk and back into memory are largely transparent. You have some tricks to learn, but the database stays out of the way. As long as an object is reachable from the root of the object tree, it will get pulled into the persistence system, with few strings attached. If an object hasn’t been accessed in a long time, it will silently get spooled to disk, leaving a “ghost” object in its place, to save on memory. When you try to access the object again later, ZODB replaces the ghost with your reconstituted object.

The nice thing about an object database is you don’t have to fit a round object into a square SQL table. The messy thing about an object database is you’re on your own when it comes to structured searches. You have no industry-standard Structured Query Language. Just you and your objects. Indexing and lookup strategies become important (I’ll elaborate on my own solution in a later installment), and you have to start thinking in terms of trees.

BTrees

ZODB has a nice toolset based on BTrees, which work enough like Python dictionaries to feel familiar. However, BTrees have some interesting quirks. Keys are always ordered (due to the underlying implementation). The entire BTree doesn’t necessarily have to be resident in memory; your keys and values get stored in “buckets”, and buckets get loaded and unloaded as necessary. Furthermore, there are several varieties of BTree, optimized for key or value type (e.g. an IOBTree requires integer keys, with arbitrary objects as values; an OIBTree is the reverse, with arbitrary objects as keys and integers as values).

BTrees also have closely related cousins called Sets, which work similarly to Python sets, but with a different interface. With BTrees and Sets, we get a number of set operations. These will become important when we start talking about tags and tag lookups later in the series.

Problems and frustrations

With ZODB, I have suffered my share of woes. Remember the object tree? Anything reachable from the root gets pulled in. Even if it’s not pickleable. This has blown up in my face too many times to count, and often mysteriously, for the exception messages are almost completely unhelpful.

Some objects that can’t be pickled, and therefore do not belong in your ZODB:

  • function objects (unless you’re using Stackless)
  • class objects
  • objects that instantiate dynamic classes
  • magical objects
  • objects that belong to the emperor
  • stray objects
  • those objects that are included in this classification
  • innumerable objects
  • objects that at a distance resemble flies

In part, I jest, but this is what it can feel like. I started this adventure off with the attitude, “Let’s just throw everything that’s not nailed down into the ZODB!” ZODB has since disabused me of this notion.
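When the error message won't tell you which object is the culprit, a small hunting script can. This is a sketch using plain pickle rather than ZODB itself; find_unpicklable is a hypothetical helper of my own, it only looks inside dicts, lists, tuples, and instance attributes, and it assumes the graph has no reference cycles.

```python
import pickle


def find_unpicklable(obj, path="root"):
    """Return the paths of the innermost objects that refuse to pickle.

    Hypothetical helper: assumes an acyclic graph and only descends into
    dicts, lists, tuples, and instance __dict__s.
    """
    try:
        pickle.dumps(obj)
        return []  # this whole subtree is fine
    except Exception:
        pass  # something in here is broken; dig deeper

    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{attr}", value) for attr, value in vars(obj).items()]
    else:
        children = []

    culprits = []
    for child_path, child in children:
        culprits.extend(find_unpicklable(child, child_path))
    # If no child is to blame, the object itself is the problem.
    return culprits or [path]


# A lambda tucked away inside an otherwise innocent structure.
world = {
    "ships": [{"name": "Scopuli", "on_dock": lambda: None}],
    "count": 1,
}
culprits = find_unpicklable(world)
print(culprits)  # ["root['ships'][0]['on_dock']"]
```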

This does have some benefits. I’m less tempted to try every new bit of Python magic that I come across, because I’m always thinking, “Well that’s neat, but can it be pickled?” This has also forced me to understand the pickle interface at a lower level than I had previously felt comfortable with. Finally, it has driven me to overcome the documentation challenges of ZODB and learn more of the ins and outs of the package. For instance, I learned about volatile instance variables, which are simply discarded when the object gets dumped to disk. (Prefix the variable name with “_v_” and you’re set.)

Conclusion

I’ve learned a lot in tackling object persistence, and gained hard-won experience with ZODB. This time around I’ll know the gotchas ahead of time, instead of learning them the hard way, in the middle of an inspired coding spree.
