
How to recover Cyrus when you have DB errors

I’ll try to explain some methods and tips for recovering from a mix of db library versions, or from messages like:

  • DBERROR: reading /var/lib/cyrus/db/skipstamp, assuming the worst: No such file or directory
  • DBERROR db4: PANIC: fatal region error detected; run recovery
  • DBERROR: critical database situation

Structure of Cyrus database

The database is split into two parts. The first part is the real database, normally located in /var/lib/cyrus.

In this part, the only file that really matters (and the most important one in all of Cyrus) is mailboxes.db. Everything else can be rebuilt from scratch.

The second part contains all the emails and filters (sieve). They are usually located in /var/spool/cyrus/. I advise you to make a recursive backup of these two directories in case something goes really wrong.
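
As a minimal sketch of such a backup, the function below archives both directories with tar. The paths are the defaults mentioned above; the name backup_cyrus, the two variables and the destination directory are assumptions made for illustration, and it is safest to run it while Cyrus is stopped so the files do not change during the copy.

```shell
#!/bin/sh
# Default locations named in this article; override them if your setup differs.
CYRUS_DB_DIR="${CYRUS_DB_DIR:-/var/lib/cyrus}"
CYRUS_SPOOL_DIR="${CYRUS_SPOOL_DIR:-/var/spool/cyrus}"

# backup_cyrus DEST: write one compressed archive per directory into DEST.
# (Hypothetical helper, not part of Cyrus. Expects absolute directory paths.)
backup_cyrus() {
    dest="$1"
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest" || return 1
    for dir in "$CYRUS_DB_DIR" "$CYRUS_SPOOL_DIR"; do
        # Name the archive after the path, e.g. var-lib-cyrus-<stamp>.tar.gz
        name=$(printf '%s' "$dir" | sed -e 's|^/||' -e 's|/|-|g')
        # -p keeps permissions; -C / stores the paths relative to the root
        tar -cpzf "$dest/$name-$stamp.tar.gz" -C / "${dir#/}" || return 1
    done
}

# Usage (as root): backup_cyrus /root/cyrus-backup
```

Keep the archives outside of /var/lib/cyrus, since later steps may remove that directory.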

Possible problems, and their solution

Look for all the instances of Cyrus executables you may have. The easiest way is to use the locate command (after updating its database with updatedb); try locate ctl_mboxlist, for example. For each file found, run ldd on it, for example ldd ctl_mboxlist (or any other Cyrus executable):

 

The important thing is that all of your executables are linked against the same libdb version (libdb-4.2.so on my system).
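
The check can be sketched as a small script that collects the libdb name reported by ldd for each binary. The function names are hypothetical, and /usr/cyrus/bin in the usage line is only an example path; use whatever locations locate reported on your system.

```shell
#!/bin/sh
# extract_libdb: read ldd output on stdin and print the Berkeley DB
# library names (libdb-4.2.so, libdb.so.3, ...) found in it.
extract_libdb() {
    awk '$1 ~ /^libdb[-.]/ { print $1 }'
}

# libdb_versions FILE...: print every distinct libdb the given binaries
# are linked against. More than one line of output means a mix.
libdb_versions() {
    for bin in "$@"; do
        ldd "$bin" 2>/dev/null | extract_libdb
    done | sort -u
}

# Usage: libdb_versions /usr/cyrus/bin/*
```

If the output has a single line, every binary that links to libdb uses the same one.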

The first thing to do, before anything else, is to verify the integrity of your Cyrus installation. Stop the Cyrus daemon, restart it, and look at the logs. I often see errors about daemons that cannot be found. For example, one of them (ctl_cyrusdb) is supposed to recover automatically from a db crash, but when you get the following error:

 

you can be sure that a disaster will follow.

Another problem is that the database may not have been generated with the same libdb version as the one currently in use. To correct this, switch to the Cyrus admin user (normally cyrus) and execute the following command while watching the syslog files:
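
A hedged sketch of this step: the recovery itself is ctl_cyrusdb -r, wrapped here in a hypothetical helper that refuses to run as the wrong user. The variable names are assumptions, the binary is assumed to be on the PATH (set CTL_CYRUSDB to its full path otherwise), and the log file in the comment depends on your syslog configuration.

```shell
#!/bin/sh
CYRUS_USER="${CYRUS_USER:-cyrus}"          # admin user named in the text
CTL_CYRUSDB="${CTL_CYRUSDB:-ctl_cyrusdb}"  # assumed to be on the PATH

# recover_cyrusdb: run the database recovery, but only as the Cyrus user.
# (Hypothetical helper, not part of Cyrus.)
recover_cyrusdb() {
    if [ "$(id -un)" != "$CYRUS_USER" ]; then
        echo "recover_cyrusdb: run this as user $CYRUS_USER" >&2
        return 1
    fi
    "$CTL_CYRUSDB" -r
}

# In a second terminal, follow the mail log while it runs. The file name
# is an assumption; check your syslog configuration:
#   tail -f /var/log/mail.log
# Usage (in a shell running as the cyrus user): recover_cyrusdb
```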

 

It is normal to get no output from the command: the output goes to syslog. When the recovery is successful, you’ll see this:

 

Here, you see that Cyrus found 124 mailboxes.

If this is not sufficient, you’ll have to delete the contents of /var/lib/cyrus/db/ and re-run ctl_cyrusdb.
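
A cautious way to do this, sketched below, is to keep a copy of the db directory before emptying it. The helper name and the db.saved-<timestamp> naming are assumptions for illustration; stop Cyrus before running it.

```shell
#!/bin/sh
CYRUS_DB_DIR="${CYRUS_DB_DIR:-/var/lib/cyrus}"

# clear_db_env: copy $CYRUS_DB_DIR/db to a timestamped directory next to
# it, delete the contents of db/, and print where the copy was saved.
# (Hypothetical helper, not part of Cyrus.)
clear_db_env() {
    env_dir="$CYRUS_DB_DIR/db"
    saved="$CYRUS_DB_DIR/db.saved-$(date +%Y%m%d-%H%M%S)"
    if [ ! -d "$env_dir" ]; then
        echo "clear_db_env: $env_dir not found" >&2
        return 1
    fi
    # -p keeps ownership (when allowed) and permissions in the copy
    cp -Rp "$env_dir" "$saved" || return 1
    # Same effect as deleting /var/lib/cyrus/db/*; the directory itself stays
    rm -rf "$env_dir"/*
    echo "$saved"
}

# Usage (as the cyrus user, with Cyrus stopped):
#   clear_db_env && ctl_cyrusdb -r
```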

If this is still not sufficient, you’ll have to take the extreme step: rebuild the whole /var/lib/cyrus directory. To do this, first try to get a “text” version (export) of mailboxes.db. If you manage to do this, you’re saved. To export the list of mailboxes, do the following as user cyrus:

 

This will output the users/mailboxes to stdout. Verify that it’s ok, then re-run the command and redirect the output to a file:
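
The export can be sketched as follows, using ctl_mboxlist -d to dump the list. The wrapper name, the CTL_MBOXLIST variable and the output file name are assumptions; the binary is assumed to be on the PATH. The wrapper fails when the dump is empty, because continuing with an empty export would lose the mailbox list.

```shell
#!/bin/sh
CTL_MBOXLIST="${CTL_MBOXLIST:-ctl_mboxlist}"  # assumed to be on the PATH

# export_mboxlist OUT: dump the mailbox list to OUT and fail if the
# dump is empty. (Hypothetical helper, not part of Cyrus.)
export_mboxlist() {
    out="$1"
    "$CTL_MBOXLIST" -d > "$out" || return 1
    if [ ! -s "$out" ]; then
        echo "export_mboxlist: dump is empty, do not continue" >&2
        return 1
    fi
}

# Usage (as the cyrus user):
#   export_mboxlist /tmp/mailboxes.txt && wc -l /tmp/mailboxes.txt
```

Write the file somewhere outside /var/lib/cyrus, since that directory is removed in the next step.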

 

Then completely remove the /var/lib/cyrus directory. Recreate it with the following commands:

 

Here, you’ll get no output, meaning it found no users. That’s normal, as we have just created a clean new mailboxes.db. If you ran the reconstruct command as root, do not forget to immediately restore the ownership of the files:
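
A minimal sketch of the ownership fix, assuming the files should belong to user cyrus and group mail. The group name is an assumption (check it with ls -ld on your backup before relying on it), and both helper names are hypothetical.

```shell
#!/bin/sh
CYRUS_USER="${CYRUS_USER:-cyrus}"
CYRUS_GROUP="${CYRUS_GROUP:-mail}"   # assumption: verify on your system
CYRUS_DB_DIR="${CYRUS_DB_DIR:-/var/lib/cyrus}"
CYRUS_SPOOL_DIR="${CYRUS_SPOOL_DIR:-/var/spool/cyrus}"

# fix_cyrus_ownership: give both directory trees back to the Cyrus user.
fix_cyrus_ownership() {
    chown -R "$CYRUS_USER:$CYRUS_GROUP" "$CYRUS_DB_DIR" "$CYRUS_SPOOL_DIR"
}

# list_foreign_files: print any file still not owned by the Cyrus user.
list_foreign_files() {
    find "$CYRUS_DB_DIR" "$CYRUS_SPOOL_DIR" ! -user "$CYRUS_USER"
}

# Usage (as root): fix_cyrus_ownership && list_foreign_files
```

An empty result from list_foreign_files means nothing is left owned by root.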

 

Now we just have to rebuild the mailboxes.db file:
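
Assuming the text export made earlier was saved to a file, the import can be sketched with ctl_mboxlist -u, which reads the dump from standard input. The wrapper name, the CTL_MBOXLIST variable and the file name are assumptions made for illustration.

```shell
#!/bin/sh
CTL_MBOXLIST="${CTL_MBOXLIST:-ctl_mboxlist}"  # assumed to be on the PATH

# import_mboxlist DUMP: load a text dump (made with "ctl_mboxlist -d")
# into mailboxes.db, then print the resulting list as a check.
# (Hypothetical helper, not part of Cyrus.)
import_mboxlist() {
    dump="$1"
    if [ ! -s "$dump" ]; then
        echo "import_mboxlist: $dump is missing or empty" >&2
        return 1
    fi
    "$CTL_MBOXLIST" -u < "$dump" || return 1
    "$CTL_MBOXLIST" -d
}

# Usage (as the cyrus user): import_mboxlist /tmp/mailboxes.txt
```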

 

Here, you’ll have a list of the found users in the database.

Now try to start Cyrus again.

Comments

  1. Thank you, this was a saver for me.
    Just one remark:
    before
    /usr/cyrus/bin/ctl_mboxlist -d -f /tmp/mailboxes.db
    you have to copy /var/lib/cyrus/mailboxes.db to /tmp

    Pim

  2. Thanks very much, that was really, really helpful. I needed to delete /var/lib/cyrus/db/* and then rerun ctl_cyrusdb -r, and we are back.