Add new comment

Live Compression of a GDBM Database

Submitted by Bill St. Clair on Sun, 2008-02-03 21:02.

A GDBM database file is an easy way to persistently store key/value pairs. Loom uses one for its backing store. One problem with these databases is that they become fragmented over time, with lots of unfilled empty space: holes. They need to be periodically compressed. Patrick Chkeroff and I came up with a neat mechanism to do live compression. He has seen the Loom database file get 20 times bigger than it needed to be. He plans to integrate something similar to this mechanism into the Loom code.

The basic idea mirrors Lisp two-space incremental garbage collectors, but is a little simpler. Call the original GDBM database the "old" database. We're going to open a second database, called the "new" database. And change the database access code so that it will over time copy the old database to the new one, leaving the new one with a compressed version of the data in the old one, plus any changes the user code has made during the copy operation.

There's a working example in GDBM.php, in my Loom folder. You can copy and paste from that page or download it and my other Loom related code from billstclair.com/loom.

The idea is simple. There are three operations on a database, read, write, and delete (implemented in the code by get() and put() functions, where put() of a blank value means delete).

read($key):

  1. If $key has a value in the new database, return it.
  2. If $key has a value in the old database, return it.
  3. Otherwise, return false, not in the database.

write($key, $value):

  1. Write the value to the new database.
  2. Delete the key from the old database.

delete($key)

  1. Delete the key from the old database.
  2. Delete the key from the new database.

To each of the operations, I also add a call to the copysome() function, which copies some values from the old database to the new one. After all values have been copied, we close both databases, delete the old one, which is now empty, rename new to old, and reopen the new old database. When the user requests another copy, we open another new database and start again.

copysome:

  1. Fetch the first key/value pair in the old database
  2. If the key does not yet have a value in the new database, insert the key/value pair there.
  3. Delete the key from the old database.
  4. Repeat for a configurable number of copies, or until the old database is empty

At no time is it necessary to close the database for off-line compression. All you need is enough disk space to store two copies of the data, and a slight slow-down in access speed during the copy.

( categories: Loom )

Reply



The content of this field is kept private and will not be shown publicly.


*

  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <i> <b> <u>
  • Lines and paragraphs break automatically.
  • Web and e-mail addresses are automatically converted into links.
  • You may quote other posts using [quote] tags.
  • Easily link to terms in various wikis. For help, see interwiki.
  • Easily link to terms in various wikis. For help, see interwiki.
  • You may quote other posts using [quote] tags.
  • You can use BBCode tags in the text, URLs will be automatically converted to links
Verify comment authorship
Captcha Image: you will need to recognize the text in it.
*
Please type in the letters/numbers that are shown in the image above.