Sun's 'logging' mount option -- our findings -- RUN AWAY

Jeff Blaine jblaine@linus.mitre.org
Mon, 19 Feb 2001 16:47:11 -0500


A while ago someone asked on this list about people using the Solaris 2.7+
journaling/logging filesystem on AFS servers, and whether or not it worked.

Only one person responded, and the response was positive.

We investigated this in our testbed and saw no immediate problems.  We
set up the server to do UFS logging, set up scripts on client machines to
write all over volumes on that server, then dropped the power to the
server.  At power-on, the machine seemed perfectly happy.  No
fsck happened at all, the logged data was written out to disk, and
the machine kept chugging along.  We also saw no data corruption or
anything else out of the ordinary.

Sadly, it seems our investigation was not thorough enough.

Within ONE WEEK of enabling 'logging' as a mount option on two of our
Solaris 2.7 servers, we've experienced severe data problems on just
those servers.  (Thank GOD none of our other servers got rebooted, since
a reboot would have activated the 'logging' option we had also added to
their /etc/vfstab.)
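To find out which servers would pick up 'logging' on their next reboot (the trap our other servers narrowly avoided), something like the following minimal Python sketch could be run against each server's /etc/vfstab.  It assumes the standard seven-field Solaris vfstab layout and that the AFS server partitions are UFS filesystems mounted at /vicep*; the function name logging_vicep_mounts is purely illustrative and not part of any AFS or Solaris tool.

```python
"""Flag AFS server partitions (/vicep*) whose /etc/vfstab entry enables UFS logging.

Assumes the standard seven-field Solaris vfstab layout:
device-to-mount  device-to-fsck  mount-point  FS-type  fsck-pass  mount-at-boot  mount-options
"""
import sys


def logging_vicep_mounts(lines):
    """Return the /vicep* mount points whose vfstab options include 'logging'."""
    flagged = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # malformed or partial entry; ignore it
        mount_point, fs_type, options = fields[2], fields[3], fields[6]
        if fs_type != "ufs" or not mount_point.startswith("/vicep"):
            continue
        # Options are comma-separated ("-" means none), so an explicit
        # "nologging" is not mistaken for "logging".
        if "logging" in options.split(","):
            flagged.append(mount_point)
    return flagged


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/vfstab"
    try:
        with open(path) as f:
            for mp in logging_vicep_mounts(f):
                print(f"WARNING: {mp} will be mounted with UFS logging at next boot")
    except OSError as err:
        # Report an unreadable file instead of crashing.
        print(f"cannot read {path}: {err}", file=sys.stderr)
```

This only reads vfstab, so it reports what will happen at the next mount, not what is mounted right now.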

The symptoms start appearing like this: RW volumes can no longer be
cloned during 'vos backupsys', and volumes start showing up in 'vos
listvol' output as not attached.  Salvaging them does nothing to bring
them back online (for us, it would coredump our 'fs' process and often
'salvager' as well).

The outcome has been a complete server restore from tape to newly
newfs'd partitions and significant downtime.

IMPORTANT: If your server is running with 'logging' on and you would
           like to turn it off, DO NOT turn it off in /etc/vfstab and
           then reboot the machine.  We did this with one of our two
           problem servers and salvager went INSANE deleting files
           allll over the place, which is what forced us to do a
           full server restore from tape.  I would like to take this
           time to thank salvager (or its authors) for removing the
           SalvageLog.N files when salvager is done instead of leaving
           them around until the next salvage run(s).  That's sarcasm.

           More to the point, don't even run salvager if you can
           help it.

           Perhaps you can use 'lockfs -f /vicepNNN' after you have
           done a 'bos shutdown' of the server in question.  I'm
           going to try that tonight, just after praying to every
           god I can think of.

So, while I wish you the best of luck if you're already running Solaris
servers with UFS 'logging' enabled, our conclusion for now is that it
is absolutely a bad idea.

Anyway, that's the report so far.  One more server to restore from tape
tonight.