fileserver locking up

Doug dervin@sedona.ch.intel.com
Thu, 1 Feb 2001 09:34:18 -0700 (MST)


Yes,
I've seen this kind of behavior too, with older versions of 3.5. We
also had problems with extreme volserver slowness, and occasionally
a client would see a fileserver as being down.

Upgrading to 3.6 2.5 fixed most of these issues for us. Some vos
releases are still extremely slow. You can see timeouts of over
600 seconds in the VolserLog, but I have found that if you wait it out
the release will finally finish. So far I've been attributing this
slowness to extremely busy fileservers.
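
For anyone who wants to check whether UDP receive buffers are overflowing
while one of these slow operations runs, here is a rough Python sketch that
diffs the netstat -s counters mentioned in the quoted messages below
(udpInOverflows, udpNoPorts, udpInCksumErrs). It assumes the output uses
"name = value" pairs, as Solaris netstat -s does. The function names and the
sample numbers are made up for illustration and do not come from any real
server.

```python
import re
import subprocess

# Counter names taken from the messages in this thread.
WATCHED = ("udpInOverflows", "udpNoPorts", "udpInCksumErrs")


def parse_counters(text):
    """Return {name: int} for every 'name = value' pair found in the text."""
    pairs = re.findall(r"(\w+)\s*=\s*(\d+)", text)
    return {name: int(value) for name, value in pairs}


def counter_deltas(before, after, names=WATCHED):
    """Return how much each watched counter grew between two snapshots.

    A counter missing from a snapshot is treated as 0.
    """
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}


def snapshot():
    """Run 'netstat -s' and parse its output (assumes netstat is on PATH)."""
    result = subprocess.run(
        ["netstat", "-s"], capture_output=True, text=True, check=True
    )
    return parse_counters(result.stdout)


# Illustrative samples only; the values are invented.
SAMPLE_BEFORE = """
UDP     udpInDatagrams      =  1000     udpInErrors         =     0
        udpInOverflows      =    12     udpNoPorts          =     3
"""

SAMPLE_AFTER = """
UDP     udpInDatagrams      =  1500     udpInErrors         =     0
        udpInOverflows      =    40     udpNoPorts          =     3
"""
```

On a live server you would call snapshot() twice, some seconds apart, and
pass both results to counter_deltas(). A steadily growing udpInOverflows
would point at the socket buffer rather than at plain server load.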

Thanks,
Doug.


On Thu, 1 Feb 2001, Stephen Joyce wrote:

> Brent,
> 
> Back when we were running AFS 3.5 3.17, we had almost exactly the same
> problem.  Vos move commands would fail with "possible communications
> failure", especially if several were occurring at once.  We also had
> udpInOverflows and udpInCksumErrs as symptoms (revealed by netstat).
> 
> I opened a ticket with Transarc, and the solution for us was to add
> -nojumbo -udpsize 262144 to our volserver process (configured via bos)
> and restart it.
> 
> They also directed me to www.transarc.com/Support/afs/news/fstuning.html
> for more info (it explains what those options do so you can tweak further
> if needed).
> 
> FWIW, we've been running AFS 3.6 2.3 without adding those options to the
> volserver and without seeing the problems we had before; perhaps it was
> fixed in 3.6?
> 
> Cheers,
> Stephen
> --
> Stephen Joyce
> Systems Administrator                                            P A N I C
> Physics & Astronomy Department                         Physics & Astronomy
> University of North Carolina at Chapel Hill         Network Infrastructure
> voice: (919) 962-7214                                        and Computing
> fax: (919) 962-0480                               http://www.panic.unc.edu
> 
> On Wed, 31 Jan 2001, Brent Johnson wrote:
> 
> > Hello,
> > 
> > I'm running a cell with fileservers and db servers running Solaris 2.6
> > and AFS v. 3.5 345.  Ever since we upgraded to 3.5 (about nine months
> > ago) we've been getting erratic behavior from our fileservers: vos
> > commands time out regularly (every weekday, next to never on weekends or
> > after 7pm) with "possible communication failure", and at least one (of 9
> > total) of the fileservers dumps core.  Recently the fileserver and
> > volserver processes just hung: they wouldn't process any commands,
> > rxdebug failed, snoop showed no outbound traffic from volserver or
> > fileserver, disk usage was nil, and netstat -s output showed
> > udpInOverflows and udpNoPorts increasing.
> > 
> > Has this ever happened to anybody else?
> > 
> > -Brent
> > 
> > 
> 
> 


            --------- __o       __o       __o       __o
            ------- _`\<,_    _`\<,_    _`\<,_    _`\<,_
            ------ (*)/ (*)  (*)/ (*)  (*)/ (*)  (*)/ (*)
===================================================================
Doug Ervin                       What I do: Unix Systems Admin.

Doug's thoughts:

- If you can't be a good example, then you'll just have to be a
  horrible warning.

dervin@sedona.intel.com         disclaimer: "ALL"		 0-
===================================================================