Hi,
so I'm going to change YAIM now. I've updated to the latest dpm-yaim rpm.
Comparing my YAIM config with the link you sent, I have these differences I'd like you to confirm.
Replace
OUT:
DPM_XROOTD_FED_ATLAS_NAMELIB="XrdOucName2NameLFC.so root=/dpm/${MY_DOMAIN}/home/atlas match=bohr3226.tier2.hep.manchester.ac.uk"
IN:
DPM_XROOTD_FED_ATLAS_NAMELIB="XrdOucName2NameLFC.so pssorigin=localhost sitename=ATLAS_SITENAME"
where ATLAS_SITENAME is the ATLAS site name, UKI-NORTHGRID-MAN-HEP.
And
OUT:
DPM_XROOTD_REDIR_MISC="$DPM_XROOTD_DISK_MISC"
IN:
DPM_XROOTD_REDIR_MISC="$DPM_XROOTD_DISK_MISC
dpm.mmreqhost localhost"
OUT:
DPM_XROOTD_DISK_MISC="xrootd.monitor all rbuff 32k auth flush 30s window 5s dest files info user io redir atl-prod05.slac.stanford.edu:9930
if exec xrootd
xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff -poll sync
fi"
IN:
DPM_XROOTD_DISK_MISC="xrootd.monitor all auth flush 30s fstat 60 lfn ops xfr 5 window 5s dest fstat info user redir atl-prod05.slac.stanford.edu:9930
if exec xrootd
xrd.report atl-prod05.slac.stanford.edu:9931 every 60s all -buff -poll sync
fi"
The variables below are not mentioned in the documentation link. Should I keep them?
DPM_XROOTD_FED_ATLAS_SETENV="LFC_HOST=prod-lfc-atlas-ro.cern.ch LFC_CONRETRY=0 GLOBUS_THREAD_MODEL=pthread CSEC_MECH=ID"
DPM_XROOTD_FED_ATLAS_MISC="$DPM_XROOTD_DISK_MISC"
thanks
cheers
alessandra
On 07/03/2014 08:43, Wahid Bhimji wrote:
Hi Alessandra
Thanks for the config.
First, the n2n-rpm-list file is empty. But anyway you should have at least
xrootd-server-atlas-n2n-plugin-2.0-0.x86_64
from the WLCG repo.
Assuming you have that version, the arguments for dpm.namelib should have
pssorigin=localhost sitename=UKI-NORTHGRID-MAN-HEP
at the end.
This changed a while back, but you probably have the old config. The newest yaim variables are at
Without the sitename option it will not get the correct set of "rucioprefixes", which might be why it's searching in all those crazy places for each file.
Hopefully that's it - I didn't check the rest.
Wahid
The number of jobs in transferring state shouldn't be that high. There might be some other reason, but the first impression was that it was due to this, considering the staggering number of CLOSE_WAIT connections today.
I put everything here
http://ks.tier2.hep.manchester.ac.uk/T2/tmp/xrootd-debug-20140306.tgz
thanks
cheers
alessandra
On 06/03/2014 19:11, Wahid Bhimji wrote:
Hi
Some of those messages are "normal" when looking for files that don't exist.
But the number and the odd search paths (and the fact that fedredir doesn't start properly) make me think something is wrong with the Manchester config.
Can you send me the conf files from /etc/xrootd and the versions
rpm -qa | grep xrootd
rpm -qa | grep -i n2n
and also the full logs from
/var/log/xrootd/fedredir_atlas/xrootd.log
/var/log/xrootd/dpmredir/xrootd.log
and I'll see if I spot something.
(PS - what makes you say it is "clogging" transfers at Glasgow (or Manchester)?)
Wahid
Last one of the day.
/var/log/xrootd/fedredir_atlas/xrootd.log has been full of these messages since 18:23:
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.annovi/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.anventur/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.aolszewski/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:16 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.aoun/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:17 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.apenson/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:17 0x4bfff700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.apereira/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-1M
140306 18:38:17 0x5aa0a700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.arobic/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x5aa0a700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.aroe/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x5a606700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.arnaudferrari/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x588e9700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.JoshuaMiloKunkle/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x587e8700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.GemmaHollyWooden/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:17 0x587e8700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.GiulioUsai/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x587e8700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.Gordon/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x599fa700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.Kerim/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x589ea700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.GiulioUsai/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
140306 18:38:18 0x59afb700 XRD-LFC No such file or directory /grid/atlas/users/pathena/user.arnaez/ivukotic:user.ivukotic.xrootd.uki-northgrid-man-hep-100M
On 06/03/2014 18:25, Alessandra Forti wrote:
Also fedredir keeps on not restarting properly
[root@bohr3226 ~]# service xrootd restart
Shutting down xrootd (xrootd, redir): [ OK ]
Shutting down xrootd (xrootd, disk): [ OK ]
Shutting down xrootd (xrootd, fedredir_atlas): [FAILED]
Starting xrootd (xrootd, redir): [ OK ]
Starting xrootd (xrootd, disk): [ OK ]
Starting xrootd (xrootd, fedredir_atlas): [ OK ]
I've applied
xrd.timeout idle 1800
in the fedredir configuration file, outside of any if statement, and restarted xrootd.
Let's hope it helps.
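For reference, a minimal sketch of how one might check that the directive really sits outside any if block. It assumes the fedredir file is /etc/xrootd/xrootd-dpmfedredir_atlas.cfg (the name shown in the directory listing further down the thread); the function name check_timeout_scope is only illustrative, and the check tracks nothing more than plain if/fi lines.

```shell
# Print every xrd.timeout line of a config file, tagged with whether it
# appears inside an "if ... fi" block or at top level.
# Assumption: blocks are opened by a line starting with "if" and closed by "fi".
check_timeout_scope() {
    awk '
        $1 == "if" { depth++ }
        $1 == "fi" { if (depth > 0) depth-- }
        $1 == "xrd.timeout" {
            scope = (depth > 0) ? "inside-if" : "top-level"
            print scope ": " $0
        }
    ' "$1"
}

# Possible use on the head node (path is an assumption, see above):
#   check_timeout_scope /etc/xrootd/xrootd-dpmfedredir_atlas.cfg
```

A line reported as top-level applies regardless of which xrootd instance reads the file, which is what was intended here.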
cheers
alessandra
On 06/03/2014 18:14, Alessandra Forti wrote:
I haven't applied the recipe yet, but there are 10k+ CLOSE_WAIT connections:
Thu Mar 6 18:10:01 GMT 2014
11962 CLOSE_WAIT
If this is caused by FAX this is not going to work.
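A count like the one above can be produced with something along these lines. This is only a sketch: it assumes netstat from net-tools is available and that the connection state is the sixth column of `netstat -tan` output; the helper name count_close_wait is made up for illustration.

```shell
# Count TCP connections in CLOSE_WAIT from netstat-style output on stdin.
# Assumption: the state is the 6th whitespace-separated field, as in
# "netstat -tan" (Proto Recv-Q Send-Q Local Foreign State).
count_close_wait() {
    awk '$6 == "CLOSE_WAIT" { n++ } END { print n + 0 }'
}

# Possible use, printing a timestamp followed by the count:
#   date; netstat -tan | count_close_wait
```

Run from cron every few minutes, this gives a simple time series to compare before and after a config change.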
It is also clogging the transfers from jobs at Manchester and, it looks like, Glasgow too
http://panda.cern.ch/server/pandamon/query?dash=prod
cheers
alessandra
On 05/03/2014 16:53, Sam Skipsey wrote:
If it's the federated redirector that's having problems, then xrootd-dpmfedredir_atlas.cfg.
Otherwise, if it's the local redirector (and if local jobs were breaking, then I guess it was?), then xrootd-dpmredir.cfg
(Or try changing both?)
Sam
On 5 March 2014 16:38, Alessandra Forti wrote:
Which of these files should I change?
[root@bohr3226 xrootd]# ls
dpmxrd-sharedkey.dat
xrootd-dpmdisk.cfg
xrootd-dpmfedredir_atlas.cfg
xrootd-dpmredir.cfg
xrootd-standalone.cfg
xrootd-clustered.cfg
xrootd-dpmdisk.cfg.rpmnew
xrootd-dpmfedredir_atlas.cfg.rpmnew
xrootd-dpmredir.cfg.rpmnew
On 05/03/2014 16:00, Alessandra Forti wrote:
Andy Hanushevsky suggested one could use the idle option:
"The particular directive can be found at:
http://xrootd.org/doc/prod/xrd_config.htm#_Toc310725348
specifically the idle option."
However I am not sure who has tried that - nor am I sure it is the way forward...
I can try this, although I'm not sure a connection in CLOSE_WAIT can be considered "Idle".
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.