Hi,
I'm going to change the YAIM config now. I've updated
to the latest dpm-yaim rpm. Comparing my YAIM
config with the link you sent, I have these
differences that I'd like you to confirm:
Replace
OUT:
DPM_XROOTD_FED_ATLAS_NAMELIB="XrdOucName2NameLFC.so
root=/dpm/${MY_DOMAIN}/home/atlas
match=bohr3226.tier2.hep.manchester.ac.uk"
IN:
DPM_XROOTD_FED_ATLAS_NAMELIB="XrdOucName2NameLFC.so
pssorigin=localhost
sitename=ATLAS_SITENAME"
where the ATLAS site name is
UKI-NORTHGRID-MAN-HEP. And
OUT:
DPM_XROOTD_REDIR_MISC="$DPM_XROOTD_DISK_MISC"
IN:
DPM_XROOTD_REDIR_MISC="$DPM_XROOTD_DISK_MISC
dpm.mmreqhost localhost"
OUT:
DPM_XROOTD_DISK_MISC="xrootd.monitor all
rbuff 32k auth flush 30s window 5s dest
files info user io redir atl-prod05.slac.stanford.edu:9930
if exec xrootd
xrd.report atl-prod05.slac.stanford.edu:9931
every 60s all -buff -poll sync
fi"
IN:
DPM_XROOTD_DISK_MISC="xrootd.monitor all
auth flush 30s fstat 60 lfn ops xfr 5
window 5s dest fstat info user redir atl-prod05.slac.stanford.edu:9930
if exec xrootd
xrd.report atl-prod05.slac.stanford.edu:9931
every 60s all -buff -poll sync
fi"
The variables below are not mentioned in the
documentation link. Should I keep them?
DPM_XROOTD_FED_ATLAS_SETENV="LFC_HOST=prod-lfc-atlas-ro.cern.ch
LFC_CONRETRY=0 GLOBUS_THREAD_MODEL=pthread
CSEC_MECH=ID"
DPM_XROOTD_FED_ATLAS_MISC="$DPM_XROOTD_DISK_MISC"
thanks
cheers
alessandra
On 07/03/2014 08:43, Wahid Bhimji wrote:
Hi Alessandra
Thanks for the config.
First the n2n-rpm-list file is empty.
But anyway you should have at least
xrootd-server-atlas-n2n-plugin-2.0-0.x86_64
from the WLCG repo.
Assuming you have that version, the
arguments for dpm.namelib should have
pssorigin=localhost
sitename=UKI-NORTHGRID-MAN-HEP
at the end.
This changed a while back - but you
probably have the old config. The newest
yaim variables are at
Without the sitename option it
will not get the correct set of
"rucioprefixes", which might be why it's
searching in all those crazy places for
each file.
Hopefully that's it - I didn't check the
rest.
Wahid
Number of jobs in transferring state
shouldn't be that high. There might
be some other reason but the first
impression was that it was due to
this, considering the staggering
number of CLOSE_WAIT connections
today.
I put everything here
http://ks.tier2.hep.manchester.ac.uk/T2/tmp/xrootd-debug-20140306.tgz
thanks
cheers
alessandra
On 06/03/2014 19:11, Wahid Bhimji wrote:
Hi
Some of those messages are
"normal" when looking for files
that don't exist.
But the number and the odd search
paths (and the fact that
fedredir doesn't start properly)
make me think something is
wrong with the Manchester
config.
can you send me the conf
files from /etc/xrootd and
versions
rpm -qa | grep xrootd
rpm -qa | grep -i n2n
and also full logs from
/var/log/xrootd/fedredir_atlas/xrootd.log
/var/log/xrootd/dpmredir/xrootd.log
and I'll see if I spot
something.
(PS - what makes you say it
is "clogging" transfers at
Glasgow (or Manchester)?)
Wahid
Last one of the day.
/var/log/xrootd/fedredir_atlas/xrootd.log
has been full of these
messages since 18:23:
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.annovi/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.anventur/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.aolszewski/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.aoun/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:17 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.apenson/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:17 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.apereira/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:17 0x5aa0a700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.arobic/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x5aa0a700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.aroe/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x5a606700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.arnaudferrari/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x588e9700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.JoshuaMiloKunkle/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x587e8700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.GemmaHollyWooden/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x587e8700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.GiulioUsai/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x587e8700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.Gordon/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x599fa700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.Kerim/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x589ea700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.GiulioUsai/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x59afb700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.arnaez/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
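To see how many of these failed lookups belong to each requested file, the entries can be grouped by the last path component. This is only a rough sketch: it assumes the log file itself holds one entry per line with the looked-up path as the last field, and the function name summarise_lfc_misses is made up for illustration.

```shell
# Hypothetical helper: count XRD-LFC "No such file or directory" entries
# per requested file name (the last component of the looked-up path).
# Assumes one log entry per line, with the path as the final field.
# Reads the files given as arguments, or stdin if none are given.
summarise_lfc_misses() {
    awk '/XRD-LFC No such file or directory/ {
             n = split($NF, parts, "/")
             count[parts[n]]++
         }
         END { for (name in count) print count[name], name }' "$@" |
        sort -rn
}

# Example (log path taken from this thread):
#   summarise_lfc_misses /var/log/xrootd/fedredir_atlas/xrootd.log
```

A handful of file names with very large counts would suggest that each request is being tried against many user directories, which is the pattern visible in the excerpt above.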
On 06/03/2014 18:25, Alessandra Forti wrote:
Also, fedredir still does not
restart properly:
[root@bohr3226 ~]# service xrootd restart
Shutting down xrootd (xrootd, redir): [ OK ]
Shutting down xrootd (xrootd, disk): [ OK ]
Shutting down xrootd (xrootd, fedredir_atlas): [FAILED]
Starting xrootd (xrootd, redir): [ OK ]
Starting xrootd (xrootd, disk): [ OK ]
Starting xrootd (xrootd, fedredir_atlas): [ OK ]
I've applied
xrd.timeout idle 1800
in the fedredir configuration file,
outside of any if statement,
and restarted xrootd.
Let's hope it helps.
cheers
alessandra
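A quick way to double-check that a directive such as xrd.timeout idle really sits outside every if block is to track the if/fi nesting while scanning the file. This is a minimal sketch under stated assumptions: "if" and "fi" each start their own line (as in the DPM_XROOTD_DISK_MISC snippet quoted in this thread), and the function name directive_outside_if is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if a line beginning with DIRECTIVE
# appears in CFG outside every "if ... fi" block, fail (exit 1) otherwise.
# Usage: directive_outside_if CFG DIRECTIVE
directive_outside_if() {
    awk -v want="$2" '
        { line = $0; sub(/^[ \t]+/, "", line) }    # ignore leading whitespace
        line ~ /^#/            { next }             # skip comment lines
        line ~ /^if([ \t]|$)/  { depth++; next }    # entering an if block
        line ~ /^fi([ \t]|$)/  { if (depth > 0) depth--; next }
        depth == 0 && index(line, want) == 1 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example (file name taken from the directory listing in this thread):
#   directive_outside_if /etc/xrootd/xrootd-dpmfedredir_atlas.cfg "xrd.timeout idle" \
#       && echo "directive is at top level"
```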
On 06/03/2014 18:14, Alessandra Forti wrote:
I haven't applied the recipe yet,
but there are now more than 10k CLOSE_WAIT
connections:
Thu Mar 6 18:10:01 GMT 2014
11962 CLOSE_WAIT
If this is caused by
FAX, this is not going
to work.
It is also clogging
the transfers from
jobs at Manchester and,
it looks like, Glasgow too:
http://panda.cern.ch/server/pandamon/query?dash=prod
cheers
alessandra
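The thread does not say which command produced the CLOSE_WAIT figure above, so the following is an assumption rather than the command actually used: a small sketch that counts CLOSE_WAIT connections from "netstat -tan"-style output, where the connection state is the sixth column. The function name count_close_wait is made up.

```shell
# Hypothetical helper: count TCP connections in CLOSE_WAIT from
# "netstat -tan"-style output on stdin (state is the sixth column).
count_close_wait() {
    awk '$6 == "CLOSE_WAIT" { n++ } END { print n + 0 }'
}

# Example: print a timestamp followed by the current count.
#   date; netstat -tan | count_close_wait
```

Running this from cron every few minutes would show whether the count keeps growing after a configuration change.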
On 05/03/2014 16:53, Sam Skipsey wrote:
If it's the federated
redirector that's
having problems,
then
xrootd-dpmfedredir_atlas.cfg.
Otherwise, if it's
the local redirector
(and if local jobs
were breaking, then I
guess it was?), then
xrootd-dpmredir.cfg.
(Or try changing
both?)
Sam
On 5 March 2014 16:38, Alessandra Forti wrote:
Which of these files
should I change?
[root@bohr3226 xrootd]# ls
dpmxrd-sharedkey.dat
xrootd-dpmdisk.cfg
xrootd-dpmfedredir_atlas.cfg
xrootd-dpmredir.cfg
xrootd-standalone.cfg
xrootd-clustered.cfg
xrootd-dpmdisk.cfg.rpmnew
xrootd-dpmfedredir_atlas.cfg.rpmnew
xrootd-dpmredir.cfg.rpmnew
On 05/03/2014
16:00, Alessandra
Forti wrote:
Andy Hanushevsky
suggested one
could use the idle
option:
"The particular
directive can be
found at:
http://xrootd.org/doc/prod/xrd_config.htm#_Toc310725348
specifically the
idle option."
However I am not
sure who has tried
that - nor am I
sure it is the way
forward...
I can try this
although I'm not
sure a connection
in CLOSE_WAIT can
be
considered "Idle".
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.