@lrowe
Created August 6, 2015 21:59
Debugging ZODB bloat [2005]

Warning

This was written in 2005 so may be out of date. Originally published on the old Plone documentation section: https://plone.org/documentation/kb-old/debug-zodb-bloat/

About

Having spent a lot of time tracking down the cause of ZODB bloat in an Archetypes application I thought I'd share my experience in case it was useful to anyone else.

Step 1: Analysis

The first step was to analyse the extent of the bloat. The 'analyze.py' script in the Zope 'bin' directory (found in the ZODB.scripts egg) shows which classes of object are using up space in your database and how many records are current or old revisions. If you pack the database, then add an object and run analyze.py, you can see how much bloat is caused by old object revisions. Add objects and analyse a few more times and you can see which classes of object are the cause of the bloat.

The version of analyze.py shipped with Zope 2.10 (Plone < 4) is broken. If you are using Plone 3, download the latest version of the ZODB scripts from Zope svn. Run it against your Data.fs:

$ bin/zopepy -m ZODB.scripts.analyze var/filestorage/Data.fs
Processed 10213 records in 162 transactions
Average record size is  409.77 bytes
Average transaction size is 25833.09 bytes
Types used:
Class Name                                       Count    TBytes    Pct AvgSize
---------------------------------------------- ------- ---------  ----- -------
AccessControl.User.User                              1       137   0.0%  137.00
AccessControl.User.UserFolder                        1        65   0.0%   65.00
App.ApplicationManager.ApplicationManager            2       480   0.0%  240.00
App.Product.Product                                150    174890   4.2% 1165.93
App.Product.ProductFolder                           86    223839   5.3% 2602.78
BTrees.IIBTree.IIBTree                            1300    138462   3.3%  106.51
BTrees.IIBTree.IISet                                 6      1526   0.0%  254.33
BTrees.IIBTree.IITreeSet                           142     10144   0.2%   71.44
BTrees.IOBTree.IOBTree                             586    149196   3.6%  254.60
BTrees.IOBTree.IOBucket                            461    423508  10.1%  918.67
...
============================================== ======= =========  ===== =======
                            Total Transactions     162                   25.23k
                                 Total Records   10213     4086k 100.0%  409.77
                               Current Objects    8550     2870k  70.2%  343.77
                                   Old Objects    1663     1216k  29.8%  749.08

From this I could see that BTrees.IOBTree.IOBucket objects were the culprit. I know they're used in the catalog, but where?
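The split between "Current Objects" and "Old Objects" in the report can be approximated with a small tally. This is only a sketch of the idea, not the analyze.py source: it assumes records arrive in transaction order as (oid, class_name, size) tuples, a layout invented here for illustration, and treats the newest record for each oid as current and every earlier one as old.

```python
from collections import defaultdict

def tally_revisions(records):
    """Split record bytes into current and old revisions per class.

    `records` is an iterable of (oid, class_name, size) tuples in
    transaction order. This tuple layout is an assumption of the sketch.
    """
    latest = {}                 # oid -> (class_name, size) of newest revision
    current = defaultdict(int)  # class_name -> bytes held by current revisions
    old = defaultdict(int)      # class_name -> bytes held by superseded revisions
    for oid, class_name, size in records:
        if oid in latest:
            # A newer revision supersedes the one we counted as current.
            prev_class, prev_size = latest[oid]
            current[prev_class] -= prev_size
            old[prev_class] += prev_size
        latest[oid] = (class_name, size)
        current[class_name] += size
    return dict(current), dict(old)

# Made-up sample data: oid 1 is rewritten twice, oid 2 only once.
sample = [
    (1, "BTrees.IOBTree.IOBucket", 900),
    (2, "OFS.Folder.Folder", 200),
    (1, "BTrees.IOBTree.IOBucket", 950),
    (1, "BTrees.IOBTree.IOBucket", 1000),
]
current, old = tally_revisions(sample)
print(current)
print(old)
```

A class whose old bytes grow much faster than its current bytes between runs is a likely source of bloat.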

Step 2: Manually look at the contents of the ZODB

fsdump.py will print (hexadecimal) oid, size and class for each record of each transaction in your ZODB:

$ bin/zopepy -c "import sys; from ZODB.FileStorage import fsdump; fsdump.fsdump(sys.argv[3])" var/filestorage/Data.fs
Trans #00000 tid=0382427a3a7f7022 time=2009-11-23 19:06:13.710423 offset=52
    status=' ' user='' description='initial database creation'
  data #00000 oid=0000000000000000 size=66 class=persistent.mapping.PersistentMapping
Trans #00001 tid=0382427a3a91e411 time=2009-11-23 19:06:13.727317 offset=215
    status=' ' user='' description='Created Zope Application'
  data #00000 oid=0000000000000000 size=92 class=persistent.mapping.PersistentMapping
  data #00001 oid=0000000000000001 size=207 class=OFS.Application.Application
  data #00002 oid=0000000000000003 size=81 class=App.ApplicationManager.ApplicationManager
  data #00003 oid=0000000000000004 size=39 class=App.Product.ProductFolder
  data #00004 oid=0000000000000002 size=65 class=AccessControl.User.UserFolder
  data #00005 oid=0000000000000005 size=63 class=Persistence.mapping.PersistentMapping
...
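When the dump is long it can help to pick out the largest records mechanically. The following is a minimal sketch that assumes the dump has been saved as text and that data lines look like the ones shown above (oid, size and class fields in that order). The helper name largest_records is hypothetical, and the regular expression may need adjusting for other fsdump versions.

```python
import re

# Matches data lines in the format shown in the sample output above.
DATA_LINE = re.compile(
    r"data #\d+ oid=(?P<oid>[0-9a-f]+) size=(?P<size>\d+) class=(?P<cls>\S+)"
)

def largest_records(dump_text, limit=3):
    """Return the `limit` biggest records as (size, oid_hex, class_name)."""
    found = []
    for line in dump_text.splitlines():
        match = DATA_LINE.search(line)
        if match:
            found.append(
                (int(match.group("size")), match.group("oid"), match.group("cls"))
            )
    return sorted(found, reverse=True)[:limit]

sample_dump = """\
Trans #00001 tid=0382427a3a91e411 time=2009-11-23 19:06:13.727317 offset=215
    status=' ' user='' description='Created Zope Application'
  data #00000 oid=0000000000000000 size=92 class=persistent.mapping.PersistentMapping
  data #00001 oid=0000000000000001 size=207 class=OFS.Application.Application
  data #00002 oid=0000000000000003 size=81 class=App.ApplicationManager.ApplicationManager
"""
top = largest_records(sample_dump, limit=2)
for size, oid_hex, cls in top:
    print(size, oid_hex, cls)
```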

analyze.py shows how to open a file storage and inspect the contents. So I iterated to the transaction in question and listed the records:

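from ZODB.FileStorage import FileStorage

# Open the storage read-only so the database cannot be modified.
fs = FileStorage(path_to_Data_fs, read_only=1)
fsi = fs.iterator()
TCOUNT = 2000 # number of transactions to skip, or whatever
for n in xrange(TCOUNT):
    fsi.next()
txn = fsi.next() # the transaction of interest
records = list(txn)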

You can get the size of each record and the oid using [(len(rec.data),rec.oid) for rec in records]

From a Zope debug console you can get the object with ob = app._p_jar[oid]

For some objects though (like IOBuckets) all you get back is the C data structure, which is not all that helpful for our purposes.

Step 3: Reconstruct the object path

I needed to get the path of the object represented. Fortunately ZODB gives you the tools to make a good guess at it. Using the attached utility methods you can build first a map of object references and then try to reconstruct the object path:

from inspectZodbUtils import buildRefmap, doSearch
target = rec.oid # assuming rec is the record you're interested in
refmap = buildRefmap(fs)
path, additionals = doSearch(target,refmap)
print path

Use app._p_jar[oid] from a Zope debug console to see what sort of object each oid in the path is. (This is the packed oid; use ZODB.utils.p64(0x00000000000123f) when working with a hex oid from fsdump.py.)
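For reference, a packed oid is just the 8-byte big-endian form of the integer. The sketch below reproduces that conversion with the standard library only, so it runs without ZODB installed. The helper names are hypothetical, and the claim that this matches ZODB.utils.p64 and u64 is stated from memory of that module, so check it against your ZODB version.

```python
import struct
from binascii import hexlify

def pack_oid(value):
    """Pack an integer oid into 8 big-endian bytes (the effect of p64)."""
    return struct.pack(">Q", value)

def unpack_oid(packed):
    """Turn an 8-byte packed oid back into an integer (the effect of u64)."""
    return struct.unpack(">Q", packed)[0]

def oid_from_hex(hex_text):
    """Convert a hex oid as printed by fsdump into a packed oid."""
    return pack_oid(int(hex_text, 16))

def oid_to_hex(packed):
    """Convert a packed oid into the 16-digit hex form printed by fsdump."""
    return hexlify(packed).decode("ascii")

packed = oid_from_hex("0000000000000005")
print(repr(packed))
print(oid_to_hex(pack_oid(0x123f)))
```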

#
# Utility methods to help inspect a ZODB
#
# Laurence Rowe 21/04/2005
#
# See also $SOFTWARE_HOME/bin/analyze.py
# and http://mail.zope.org/pipermail/zodb-dev/2001-August/001309.html
#
# First open the storage (read-only!) and iterate to the transaction you are
# interested in. recs = list(txn). find the size of each rec by len(rec.data)
# target is rec.oid of the rec you are interested in.
#
# In a zope debug console you can get the object with app._p_jar[rec.oid]
# For some objects (like BTrees.IOBTree.IOBucket) this is pretty useless.
# They represent themselves as their C data structure. Better find their path.
#
# Build a refmap - graph of object references
# (not too slow if the data.fs fits in memory).
# use doSearch to get a reference path (beginnings of other paths are returned
# as additionals). With the list of oids you can reconstruct the path by using
# app._p_jar[oid]. When you reach a python object something useful is shown!
#
from ZODB.serialize import referencesf


def buildRefmap(fs):
    '''build a refmap from a filestorage. look in every record of every
    transaction. build a dict of oid -> list(referenced oids)
    '''
    refmap = {}
    fsi = fs.iterator()
    for txn in fsi:
        for rec in txn:
            pickle, revid = fs.load(rec.oid, rec.version)
            refs = referencesf(pickle)
            refmap[rec.oid] = refs
    return refmap


def backrefs(target, refmap):
    '''Return a list of oids in the refmap who reference target
    '''
    oidlist = []
    for oid, refs in refmap.iteritems():
        if target in refs:
            oidlist.append(oid)
    return oidlist


def doSearch(target, refmap):
    '''for a target oid find the path of objects that refer to it.
    break if we reach no more references or find a cycle
    '''
    path = [target]
    additionals = []
    while True:
        target = path[-1]
        brefs = backrefs(target, refmap)
        if not brefs:
            break
        bref = brefs[0]
        if bref in path:
            print 'cyclic', bref
            break
        if len(brefs) == 1:
            path.append(bref)
            print bref
            continue
        additionals.append( (target, brefs[1:]) )
        print bref, brefs[1:]
        path.append(bref)
    return (path, additionals)
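
The search above rescans the whole refmap at every step. On a large database it may be quicker to invert the map once and then look up referrers directly. The following is a sketch of that variation, not part of the original utilities: invert_refmap and find_path are hypothetical names, the toy refmap uses strings in place of real packed oids, and the sketch drops the additionals bookkeeping for brevity.

```python
from collections import defaultdict

def invert_refmap(refmap):
    """Turn oid -> referenced oids into oid -> referring oids."""
    referrers = defaultdict(list)
    for oid, refs in refmap.items():
        for ref in refs:
            referrers[ref].append(oid)
    return referrers

def find_path(target, referrers):
    """Follow the first referrer of each object upwards.

    Stops when an object has no referrers or a cycle is detected.
    """
    path = [target]
    while True:
        parents = referrers.get(path[-1], [])
        if not parents or parents[0] in path:
            break
        path.append(parents[0])
    return path

# Toy reference graph: each key refers to the oids in its list.
toy_refmap = {
    "root": ["app"],
    "app": ["catalog"],
    "catalog": ["index"],
    "index": ["bucket"],
    "bucket": [],
}
print(find_path("bucket", invert_refmap(toy_refmap)))
```

As with doSearch, the path runs from the target back towards the root, so read it in reverse to get the containment path.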