automating 'vos release'

Mon, 23 Apr 2001 09:55:34 -0600

David,

1.  Vos release does "no op," but it takes quite a while for AFS to decide
that all the ROs are up to date.  In our environment, we release thousands
of volumes weekly, automatically.  We sometimes have 3-5 thousand requests
in a single afternoon.  The AFS-native "no op" decision took about 30
seconds per volume.  The mechanism below takes 3-10 seconds per volume.  For
large numbers of requests, the time savings are significant.

2.  My benchmarking across a heavily shared T1 to six remote servers in the
U.S. led me to write a "pre-release sync check."
    We're still using it, although I haven't re-run the benchmark in
several years.

    Roughly it looks like this:
        RWtime=`vos examine "readWriteVol" | grep "Update"`
        ROtimes=`vos examine "readWriteVol.readonly" | grep "Update" | sort -u`

        If ROtimes holds more than one distinct line, then the read-only
instances are out of sync with each other.  Do vos release.
        If RWtime is more recent than the single ROtimes value, then the RW
has changed.  Do vos release.
        Otherwise, do nothing.
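
    The decision above could be sketched in plain sh roughly as follows.
This is not our production tool, only an illustration.  The function names
update_keys and needs_release are made up for the example, and it assumes
the update lines look like "Last Update Mon Apr 23 09:55:34 2001"; check
what your own vos examine prints before relying on it.

```shell
#!/bin/sh
# Illustrative sketch only; function names are hypothetical.
# Assumes update lines of the form: "Last Update Mon Apr 23 09:55:34 2001".

# Read vos examine text on stdin and print one sortable key per update
# line, for example 20010423095534.
update_keys() {
    awk '/Update/ {
        month = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $4) + 2) / 3
        split($6, t, ":")
        printf "%04d%02d%02d%02d%02d%02d\n", $7, month, $5, t[1], t[2], t[3]
    }'
}

# needs_release RW_TEXT RO_TEXT
# Returns 0 if a vos release looks necessary, 1 if there is nothing to do.
needs_release() {
    rw=$(printf '%s\n' "$1" | update_keys | sort | tail -n 1)
    ro=$(printf '%s\n' "$2" | update_keys | sort -u)
    ro_count=$(printf '%s\n' "$ro" | awk 'NF { n++ } END { print n + 0 }')

    # No usable timestamps: we cannot tell, so release (an assumption).
    if [ -z "$rw" ] || [ "$ro_count" -eq 0 ]; then return 0; fi
    # More than one distinct RO time: the replicas disagree with each other.
    if [ "$ro_count" -gt 1 ]; then return 0; fi
    # RW newer than the RO: the RW has changed since the last release.
    if [ "$rw" -gt "$ro" ]; then return 0; fi
    return 1
}

# Usage (needs AFS; "readWriteVol" is a placeholder volume name):
#   if needs_release "$(vos examine readWriteVol)" \
#                    "$(vos examine readWriteVol.readonly)"; then
#       vos release readWriteVol
#   fi
```

    Turning each timestamp into a fixed-width number keeps the comparison
portable, since it avoids relying on any particular date command.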

3.  Our automated vos release is quite simple.  There are obvious
shortcomings for a general purpose tool, but it works fine for us.

     In general, we use AFS ACLs to determine whether a user is authorized
to request vos release or not.

    The user changes to the directory they wish to update and runs a
tool, "releaseThisDirectory" or "releaseThisTree".

    This relieves them from the task of discovering the volume name for
themselves.  Our user community consists of engineers and other non-IT folk,
and AFS volume names don't interest them at all.

    The tool checks to see if they're on the ACL of the "release request"
directory. (Can they write it?) If so, it writes a single volume name
(releaseThisDirectory) into the request queue, or finds all volumes below
the request directory (releaseThisTree) and writes all volume names into the
request queue. It also prompts them for a time frame -- "Now? Overnight?
Over weekend?"  The request queues are "release.now", "release.tonight", and
"release.weekend".

    The request queue is a flat file in a directory behind an AFS ACL.  The
ACL contains group(s) and no individuals, so adding a person to the requester
list is a simple pts addu.

    If the person is not on the "release request" ACL, the tool "cats" a
file to screen indicating how they can request vos release capability.
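
    As a rough illustration of the queueing step, here is a minimal sketch.
The function name queue_release, its arguments, and the help-file handling
are assumptions made for the example; only the queue file names come from
the description above.  Rather than inspecting the ACL, it simply tries the
append and lets the file system say whether the user may write.

```shell
#!/bin/sh
# Illustrative sketch only; queue_release and its arguments are hypothetical.

# queue_release VOLUME TIMEFRAME QUEUE_DIR [HELP_FILE]
# Appends VOLUME to QUEUE_DIR/release.TIMEFRAME.
# Returns 0 on success, 1 if the queue is not writable, 2 on a bad time frame.
queue_release() {
    volume=$1
    timeframe=$2
    queue_dir=$3
    help_file=${4:-}

    case $timeframe in
        now|tonight|weekend) queue="$queue_dir/release.$timeframe" ;;
        *) echo "unknown time frame: $timeframe" >&2; return 2 ;;
    esac

    # The append succeeds only if the caller is allowed to write the queue.
    if ( printf '%s\n' "$volume" >> "$queue" ) 2>/dev/null; then
        return 0
    fi

    # Not authorized: show how to request release capability, if we were
    # given an instruction file to show.
    if [ -n "$help_file" ] && [ -r "$help_file" ]; then
        cat "$help_file"
    fi
    return 1
}

# Usage (paths are placeholders):
#   queue_release some.volume tonight /path/to/request/dir /path/to/help.txt
```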

    We run several cron jobs (vos release -localauth) at different
times.  Each takes a request-queue file name as an argument.
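
    A cron-side worker along these lines might look like the sketch below.
The function name process_queue is hypothetical, and moving the queue file
aside before reading it is my assumption about how to avoid losing requests
that arrive mid-run, not a description of our scripts.  The release command
defaults to "vos release -localauth -id" but can be overridden, which makes
a dry run easy.

```shell
#!/bin/sh
# Illustrative sketch only; process_queue is a hypothetical name.

# process_queue QUEUE_FILE [COMMAND ...]
# Runs COMMAND once per distinct volume named in QUEUE_FILE.
process_queue() {
    queue=$1
    shift
    # Default release command; the volume name is appended as the -id value.
    [ $# -gt 0 ] || set -- vos release -localauth -id

    # Nothing queued: nothing to do.
    [ -s "$queue" ] || return 0

    # Move the queue aside so requests written during the run start a
    # fresh file instead of being lost (an assumption of this sketch).
    work="$queue.work.$$"
    mv "$queue" "$work" || return 1

    # Release each volume once, even if it was requested several times.
    sort -u "$work" | while IFS= read -r volume; do
        [ -n "$volume" ] || continue
        "$@" "$volume" </dev/null || echo "release failed: $volume" >&2
    done
    rm -f "$work"
}

# Usage from cron (path is a placeholder):
#   process_queue /path/to/request/dir/release.tonight
# Dry run:
#   process_queue /path/to/request/dir/release.tonight echo would release
```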

    We do some post-processing (same vos examine approach) to ensure that the
RW is now less recent than the RO, and that all ROs have the same update time.

4.  It's expensive to compare the file times on all the files in a volume.
And to do it "properly" you need to check all files in the RW and all files
in every RO instance -- by changing serverprefs on the client.  It can be
done, and I've written a crude tool to do it, but it's slow across WANs and
is no better than vos examine.

The vos examine approach has failed once in four years.  We handle thousands
of release requests weekly, against a pool of 15000 heavily replicated AFS
volumes.  This means that a volume update time failed to update correctly
one time in four years against at least half a million release requests.
We're OK with that.

We maintain a list of directories that the release tools check.  Users can
only request releases of volumes stored below these paths.  If the user
requests an update of a volume below an un-releasable tree, we cat an
instruction file to screen -- with contact phone numbers to request the
release person-to-admin.
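
The path check could be sketched like this.  The function name
is_releasable and the list-file format (one directory per line, with "#"
comment lines) are assumptions for the example.  It compares path strings
only, so symlinks and ".." components are not resolved.

```shell
#!/bin/sh
# Illustrative sketch only; is_releasable and the list format are hypothetical.

# is_releasable PATH LIST_FILE
# Returns 0 if PATH is one of the listed directories or lies below one.
is_releasable() {
    target=$1
    list=$2
    while IFS= read -r prefix; do
        # Skip blank lines and comment lines.
        case $prefix in
            ''|'#'*) continue ;;
        esac
        prefix=${prefix%/}
        # Match the directory itself or anything beneath it, but not a
        # sibling that merely shares the same leading characters.
        case $target in
            "$prefix"|"$prefix"/*) return 0 ;;
        esac
    done < "$list"
    return 1
}

# Usage (paths are placeholders):
#   if is_releasable "$PWD" /path/to/releasable.list; then
#       echo "ok to queue"
#   else
#       cat /path/to/contact-instructions.txt
#   fi
```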

The tools log the username/path/release type/timeframe -- even a failed
request is logged.

Hope this helps.  Got to go.  Feel free to write for more details, if any of
this interests you.

Kim Kimball
CCRE, Inc.

----- Original Message -----
From: "David R Boldt" <dboldt@usgs.gov>
To: <info-afs@transarc.com>
Sent: Monday, April 16, 2001 9:02 AM
Subject: automating 'vos release'

> We are planning to serve web pages in a failsafe manner using
> geographically separated web servers, distributed director,
> and AFS.
>
> This is going to require that we run a "vos release" on a periodic,
> automated basis.
>
> * if the read/write version of a volume has not changed since
>    the last release, will "vos release" no-op, or will it try to
>    synchronize anyway.
>
> * would it make sense to attempt to determine whether a volume
>   has changed before doing a release?  --I don't have a very good
>   sense for how resource expensive "vos release" is.
>
>    if it would be a good idea to only release when necessary, is there
>    a more efficient way to discover whether a read/write volume has
>    changed since the last release other than comparing the mtimes
>    of all directories in the volumes?
>
> Is anyone else doing this sort of thing?
>
>                                           -- David Boldt
>                                              <dboldt@usgs.gov>
>
>