Wacky path-switching behavior

Sarah Stockwell sarahs@qualcomm.com
Tue, 03 Apr 2001 16:07:14 -0700


We had an odd problem at our site that took us a while to figure out, and I 
thought I'd share the solution in case anyone else comes across it.

We have an application that people launch by hand out of a directory in an AFS 
volume.  In that directory is a symlink whose target is ../../somefile, a file 
in another volume.  That "somefile" volume is also mounted at a second AFS mount 
point elsewhere, and therefore has a second path, through which a separate 
automated process (monitoring software) periodically accesses it.

Occasionally and intermittently, the hand-launched application wouldn't be able 
to read "somefile" (in our case, this meant no icons were displayed in the 
app).  This was difficult to reproduce, and it mystified us for a while.

Solution:
It turns out that Solaris caches inode lookups for better performance.  It 
resolves symlinks as it encounters them, and it assumes (correctly for nearly 
everything but AFS) that there is one, and only one, path to a given inode.  
Specifically, when the app is launched by hand, Solaris caches the inode of 
"somefile" together with the path associated with that inode (the path used by 
the hand-launched app).

Given this assumption, it's reasonable for Solaris to behave as it does: when 
another lookup on that machine resolves to the same inode, it bumps the old 
path associated with that inode out of the cache and replaces it with the new 
one.
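To make the "one path per inode" assumption concrete, here is a toy model of such a cache.  It is only a sketch of the behavior described above, not Solaris's actual cache; the class name, the paths, and the inode number 4242 are all hypothetical.

```python
from typing import Dict, Optional


class PathCache:
    """Hypothetical toy model: remembers exactly one path per inode.

    A newer lookup that reaches the same inode overwrites the older path,
    mirroring the single-path assumption described in the text.
    """

    def __init__(self) -> None:
        self._path_by_inode: Dict[int, str] = {}

    def lookup(self, path: str, inode: int) -> None:
        # Record (or overwrite) the path by which this inode was last reached.
        self._path_by_inode[inode] = path

    def path_of(self, inode: int) -> Optional[str]:
        # Return whatever path is currently cached for the inode, if any.
        return self._path_by_inode.get(inode)


cache = PathCache()
cache.lookup("/afs/site/apps/tool/bin", 4242)     # hand-launched app (hypothetical path)
print(cache.path_of(4242))                        # /afs/site/apps/tool/bin
cache.lookup("/afs/site/monitor/tool/bin", 4242)  # monitor reaches the same inode
print(cache.path_of(4242))                        # /afs/site/monitor/tool/bin
```

The second lookup silently replaces the first, which is all the first process needs to see its view of the path change underneath it.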

So: "cwd", which consults that cache, gives you the path by which you 
originally looked up the inode (i.e., the path you cd'd through), and the 
../../somefile symlink works.  That's why the app sometimes worked fine.  
However, when the automated process on the same machine woke up and accessed 
"somefile" via a different path, the first path was bumped out of the cache and 
replaced by the second.  "cwd" in the first process (dutifully reading the 
cache) then suddenly gave a different result, so the ../../somefile symlink 
stopped working.
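The effect on the relative symlink can be sketched by resolving "../../somefile" against whichever path is currently cached for the directory holding the link.  This is a simplified, lexical model of the explanation above (real kernel path resolution walks ".." through the filesystem), and the directory paths are hypothetical.

```python
import posixpath


def resolve_relative_link(dir_path: str, target: str) -> str:
    # Toy model: join the link target onto the directory's currently cached
    # path and collapse the ".." components lexically.
    return posixpath.normpath(posixpath.join(dir_path, target))


link_target = "../../somefile"

# Before the monitor runs, the cache holds the hand-launched app's path.
before = resolve_relative_link("/afs/site/apps/tool/bin", link_target)
# After the monitor's lookup, the cache holds the monitor's path instead.
after = resolve_relative_link("/afs/site/monitor/tool/bin", link_target)

print(before)  # /afs/site/apps/somefile
print(after)   # /afs/site/monitor/somefile
```

The same link text now points somewhere else, which is why a relative symlink is fragile here while an AFS mount point, which names its target volume directly, is not.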

This doesn't matter unless you're relying on the environment not to change, as 
we were.  But it's startling.  :-)  We've worked around the problem by replacing 
the symlink with an AFS mount point, but thought it might help someone to know 
about it.

--Sarah Stockwell
  UNIX System Administration
  Qualcomm Inc.