Marking a DB server to never be the sync site

Marcus Watts mdw@umich.edu
Tue, 13 Mar 2001 17:58:13 -0500


Jeff Blaine <jblaine@linus.mitre.org> writes:
> From: Jeff Blaine <jblaine@linus.mitre.org>
> To: info-afs@transarc.com
> Subject: Marking a DB server to never be the sync site
> Message-ID: <535425193.984494944@jblaine-pc.mitre.org>
> 
> I am 99% sure that the answer to this is "There is no way.", but I am
> going to ask anyway:
> 
> Is there any way to mark a particular DB server as never allowed to be
> the sync site?
> 
> We will have 5 DB servers soon (2 of those at remote sites) and would
> very much like the servers at the remote sites to never be allowed to
> become the sync site.  Those 2 servers have lower IP addresses than
> our "core" 3 DB servers where 80% of all AFS work is done at HQ.
> 
> Suggestions?  Comments?
> 

The simple solution is definitely to give the remote machines the highest
two IP addresses.  The sync site is the lowest working address, and
a majority (3 in this case) of machines is required to establish quorum,
so this is very close to meeting your needs.
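
To make the rule above concrete, here is a minimal sketch in C of "lowest working address, but only with a majority up".  It is a simplified model, not the real ubik election (which works through beacons and votes); the struct db_server type, the pick_sync_site name, and the use of 0 to mean "no sync site" are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one DB server: an IPv4 address in host byte
 * order plus a flag saying whether it is currently reachable.
 * This is NOT a type from the AFS source. */
struct db_server {
    uint32_t addr;
    int up;
};

/* Simplified model of the rule: the sync site is the lowest address
 * among the servers that are up, provided a strict majority of all
 * configured servers is up.  Returns 0 (assumed never to be a real
 * server address) when there is no quorum. */
uint32_t pick_sync_site(const struct db_server *s, size_t n)
{
    size_t up = 0;
    uint32_t lowest = 0;

    for (size_t i = 0; i < n; i++) {
        if (!s[i].up)
            continue;
        up++;
        if (lowest == 0 || s[i].addr < lowest)
            lowest = s[i].addr;
    }
    return (up * 2 > n) ? lowest : 0;
}
```

In this model, with five servers and the two remote machines holding the two highest addresses, any set of three working servers includes at least one core machine, and that core machine has the lower address.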

If you had openafs or afs source, you could fool with the logic that
computes the "lowest" host in ubik, which is in
	file       function                 comparison between
	beacon.c   ubeacon_InitServerList   servAddr      magicHost
	vote.c     uvote_ShouldIRun         lastYesHost   ubik_host
	vote.c     SVOTE_Beacon             otherHost     lowestHost
	vote.c     SVOTE_Beacon             ubik_host     lowestHost
(In all cases, the comparisons are the only occurrences of the two variables
named on the same line in the code.)  For your cell, you could simply
invert the test, or compare between the negative of the IP addresses.
Be sure you get all 4 places set the same way.  At UMich, we use a
slightly more complicated test which includes the port of the service,
so that we can distribute sync sites between our 3 DB servers.  We use
this macro to compute the actual comparison value:
	#define UM_HOSTFUNC(h) ((-(((ntohs(ubik_callPortal)))>>1))^((((h))+1915547953)))
Using a macro like this makes it easier to be sure all 4 places in the
source are in sync with each other.  There's nothing on the client side
that cares how the server side computes the sync site, so the client
side doesn't need to know about this trickery.
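
For the simpler "invert the test" case, the same one-macro approach might look like the sketch below.  SITE_KEY and site_is_lower are hypothetical names, not anything in the AFS source, and the addresses are assumed to be in host byte order already.  It uses bitwise complement rather than arithmetic negation, because complement reverses the unsigned ordering for every value, including 0.

```c
#include <stdint.h>

/* Hypothetical ordering key (not from the AFS source): complementing
 * the address reverses the unsigned ordering, so the numerically
 * highest address ends up with the lowest key. */
#define SITE_KEY(h) ((uint32_t)~(uint32_t)(h))

/* Every "which host is lower?" test would go through this one helper,
 * so that all of the comparison sites stay consistent.
 * Returns nonzero if host a should be treated as lower than host b. */
int site_is_lower(uint32_t a, uint32_t b)
{
    return SITE_KEY(a) < SITE_KEY(b);
}
```

With this key, site_is_lower(200, 10) is true, so the machine with the numerically higher address would be preferred.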

I'm half-way tempted to think adding extra entries to CellServDB
would work, except I'm afraid it won't - the machines you would
like to run as sync site will see the remote sites and decide
not to campaign to be the sync site.  

A complicated way to "fix" this would be to run the server
binaries with a specially modified libc.a.  The special
modification would be to lie about the IP address of the
local machine.  This would need some thought and experimentation
to make everything that matters come out right, but it should
be doable.  The idea is to trick the binaries into thinking
they're running and talking on different IP addresses, but only
when going between each other.
Hm,
	packets that are sent & received need to have IP
		addresses munged.
	Opening CellServDB would need to be redirected to
		a munged copy.
	gethostbyname needs to munge its results.
	Probably other things...
bosserver would NOT need this mod, only the ubik-based servers: kaserver,
ptserver, and vlserver.  Unless you are really fond of running Transarc
binaries, modifying the source is going to be a lot easier than
modifying libc.a.

Another way to fix this would be to use a UDP packet redirector.
We in fact *also* run a UDP packet redirector, so that we don't
need to run the backup db server on our DB servers.  (It's big,
ugly, grows without bounds, eats cpu, breaks, and it's more useful to
make a periodic offline copy than to replicate it and get 3 broken copies.)
The UDP packet redirector reads packets, then sends them on forging
the original client IP address as the sender (via a RAW IP socket.)
	[ code is under:
		/afs/umich.edu/group/itd/build/mdw/buredir/src ]
The complete path is a triangle, because the backup server can send
things directly back to the original client machine.  With some
versions of rx on the client, this may only work for authenticated
connections, but that's fine for talking to the backup DB, and also for
talking between DB servers.  For this application, you'd need machines
at five more IP addresses to run the redirector: two high addresses to
be seen by your central machines that would forward things to the
remote machines, and three low IP addresses to be seen by the client
machine to send things back to the servers.  Actually, in thinking
about it, I think the UDP redirector would have to be changed
to make this work right - it would need to munge the "client" address
so that the response goes back to the other redirector and not
directly to the sender.  There might be something else I missed
here, so I'm not sure this is an entirely feasible solution,
but if you're really allergic to modifying AFS source, this would
be another thing to try.

Something else you could do that might not be too hard is
to set up an IP tunnel and create a subnet (VPN?) with a high IP
address range that's only visible at the remote site (and to the
DB servers at the other end.)  If the remote site DB machines are
dual homed, it may even be possible to make their lower IP addresses
visible to remote clients, but not to ubik.

				-Marcus Watts
				UM ITCS Umich Systems Group