Thank you very much, Chris.
Elena
On 28 Oct 2013, at 10:04, Christopher J. Walker wrote:
> On 28/10/13 00:24, Elena Korolkova wrote:
>> Hi Chris
>>
>> As I mentioned, the problematic disk server is the last disk server which was put into production. It has the lowest weight in DPM ATM, but this doesn't solve the problem.
>> I don't have room to drain this disk server and the other disk servers. How did you solve this problem?
>>
>
> What I'm doing to move data onto new disk servers is:
>
>
> lfs find -type f --mtime +1 /mnt/lustre_0/storm_3 | grep -v ops |
>   awk '{foo=rand(); if (foo<0.1) print $0 }' | lfs_migrate -y
>
> The idea being to move a random 10% of the data - at least some of which
> will end up on the new servers.
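
A minimal Python sketch of the same sampling idea, for cases where the file list comes from somewhere other than `lfs find`. The function name `sample_paths` and its `exclude` and `seed` parameters are illustrative only; they are not part of Lustre or DPM. One detail worth checking in the awk version: some awk implementations (gawk, for example) start rand() from the same seed on every run unless srand() is called, so repeated runs over an unchanged file list would pick the same files. The sketch seeds from the OS unless a seed is given.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_paths(paths: Iterable[str], fraction: float = 0.1,
                 exclude: str = "ops",
                 seed: Optional[int] = None) -> Iterator[str]:
    """Yield roughly `fraction` of the paths, skipping any that contain `exclude`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # seed=None seeds from the OS, so each run picks a different subset;
    # pass an integer seed only when a reproducible selection is wanted.
    rng = random.Random(seed)
    for path in paths:
        path = path.rstrip("\n")
        if exclude in path:          # mirrors `grep -v ops`
            continue
        if rng.random() < fraction:  # mirrors `if (foo<0.1) print $0`
            yield path
```

Fed with one path per line (for example from sys.stdin), the selected paths can be printed and piped into whatever tool does the actual move.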
>
> If the new server is 10% of your capacity, and you could move 90% of the
> files off it (replacing each one with a random file from the other disk
> servers), you'd be in better shape I think. This assumes file sizes are
> the same - which I suspect they are, but...
>
> How to actually do this with DPM, I have no idea.
>
>
> Chris
>
>>
>> Thank you very much
>>
>> Elena
>>
>> On 27 Oct 2013, at 14:47, Christopher J. Walker wrote:
>>
>>> On 25/10/13 15:27, Wahid Bhimji wrote:
>>>> I would imagine they are FTS transfers rather than SAM tests (not sure
>>>> why they run as sgmatl - I guess that depends on the FTS server proxy).
>>>>
>>>> Maybe it could be because the transfers are slow and get backed up.
>>>> Do you have the TCP tunings from
>>>> https://www.gridpp.ac.uk/wiki/UKTcpTuning
>>>>
>>>> applied, in particular the larger default value for the TCP buffer size?
>>>>
>>>
>>> Did you bring this disk server online with lots of others, or on its own? If on its own, then perhaps it is a problem of poor data distribution. Maybe it has filled with a particular dataset. We've certainly experienced that with one disk server. What I did in that case was take it offline, drain it (and several others in fact), and put them back online together - along with migrating some data from the other disk servers.
>>>
>>>
>>> Chris
>>>
>>>
>>>> wahid
>>>>
>>>> On 25 Oct 2013, at 14:41, Elena Korolkova <[log in to unmask]> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Wahid has switched Sheffield to use xrootd (many thanks) and things
>>>>> look better.
>>>>>
>>>>> I still have one disk server overloaded (it's always the same disk
>>>>> server) with lots (104 atm) of globus-gridftp-server processes run by
>>>>> sgmatl (SAM tests???)
>>>>>
>>>>> Your help is much appreciated.
>>>>>
>>>>> Elena
>>>>>
>>>>>
>>>>> sgmatl50 28733 7.9 0.0 118796 6956 ? D 11:10 16:45
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
>>>>> sgmatl50 28873 8.6 0.0 122932 8748 ? R 11:31 16:29
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
>>>>> sgmatl50 29025 8.5 0.0 122944 11120 ? R 11:50 14:31
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
>>>>> sgmatl50 29134 7.7 0.0 118800 6964 ? R 12:06 12:02
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
>>>>> sgmatl50 29174 8.8 0.0 122928 8672 ? R 12:16 12:54
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
>>>>> sgmatl50 29214 7.8 0.0 118800 6960 ? R 12:20 11:03
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
>>>>> sgmatl50 29216 8.4 0.0 122936 8740 ? R 12:21 11:49
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
>>>>> sgmatl50 29382 7.8 0.0 118800 6960 ? D 12:27 10:26
>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>> -inetd -config-base-path /
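
A small Python sketch for tallying a listing like the one above by user and process state. It assumes the input is in `ps aux` column order with one process per line (the listing above has been wrapped by the mail client); the function name `count_states` is illustrative. In the ps STAT column, D means uninterruptible sleep (usually waiting on I/O) and R means running or runnable, so a large D count on one server would be consistent with that server being I/O-bound.

```python
from collections import Counter
from typing import Iterable


def count_states(ps_lines: Iterable[str],
                 command: str = "globus-gridftp-server") -> Counter:
    """Count processes per (user, state) for lines in `ps aux` column order."""
    counts: Counter = Counter()
    for line in ps_lines:
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or command not in fields[10]:
            continue
        user, stat = fields[0], fields[7]
        # The first letter of STAT is the main state (D, R, S, ...);
        # any following characters are modifiers and are ignored here.
        counts[(user, stat[0])] += 1
    return counts
```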
>>>>>
>>>>>
>>>>> lsof |grep globus |wc -l
>>>>> 480
>>>>>
>>>>> TCP lcgse10.shef.ac.uk:gsiftp->lcgfts09.gridpp.rl.ac.uk:43977 (ESTABLISHED)
>>>>> globus-gr 468 sgmatl50 17u IPv4 7384601 TCP lcgse10.shef.ac.uk:24404->dpool18.triumf.ca:52672 (ESTABLISHED)
>>>>> globus-gr 468 sgmatl50 18u IPv4 7384602 TCP lcgse10.shef.ac.uk:24404->dpool18.triumf.ca:52678 (ESTABLISHED)
>>>>> globus-gr 468 sgmatl50 19u IPv4 7384603 TCP lcgse10.shef.ac.uk:24404->dpool18.triumf.ca:52675 (ESTABLISHED)
>>>>> ......................
>>>>> globus-gr 468 sgmatl50 25u IPv4 7384609 TCP lcgse10.shef.ac.uk:24404->dpool18.triumf.ca:52680 (ESTABLISHED)
>>>>> globus-gr 493 sgmatl50 0u IPv4 7384772 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts06.gridpp.rl.ac.uk:45037 (CLOSE_WAIT)
>>>>> globus-gr 494 sgmatl50 0u IPv4 7384794 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts08.gridpp.rl.ac.uk:36046 (ESTABLISHED)
>>>>> globus-gr 494 sgmatl50 17u IPv4 7385039 TCP lcgse10.shef.ac.uk:24414->dcdoor14.usatlas.bnl.gov:48440 (ESTABLISHED)
>>>>> globus-gr 494 sgmatl50 18u IPv4 7385040 TCP lcgse10.shef.ac.uk:24414->dcdoor14.usatlas.bnl.gov:48445 (ESTABLISHED)
>>>>> .................................
>>>>> globus-gr 1031 sgmatl50 0u IPv4 7386025 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts05.gridpp.rl.ac.uk:41167 (ESTABLISHED)
>>>>> globus-gr 1031 sgmatl50 17u IPv4 7386168 TCP lcgse10.shef.ac.uk:24416->dpool21.triumf.ca:40546 (ESTABLISHED)
>>>>> globus-gr 1031 sgmatl50 18u IPv4 7386169 TCP lcgse10.shef.ac.uk:24416->dpool21.triumf.ca:40548 (ESTABLISHED)
>>>>> globus-gr 1031 sgmatl50 19u IPv4 7386170 TCP lcgse10.shef.ac.uk:24416->dpool21.triumf.ca:40554 (ESTABLISHED)
>>>>> ......................
>>>>> globus-gr 1101 sgmatl50 0u IPv4 7386163 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts05.gridpp.rl.ac.uk:41242 (ESTABLISHED)
>>>>> globus-gr 1101 sgmatl50 17u IPv4 7386955 TCP lcgse10.shef.ac.uk:24428->dpool29.triumf.ca:39636 (ESTABLISHED)
>>>>> ....................
>>>>> globus-gr 1763 sgmatl50 0u IPv4 7387191 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts09.gridpp.rl.ac.uk:49176 (ESTABLISHED)
>>>>> globus-gr 1763 sgmatl50 17u IPv4 7387608 TCP lcgse10.shef.ac.uk:24422->dpool26.triumf.ca:49830 (ESTABLISHED)
>>>>> globus-gr 1763 sgmatl50 18u IPv4 7387609 TCP lcgse10.shef.ac.uk:24422->dpool26.triumf.ca:49831 (ESTABLISHED)
>>>>> ................
>>>>> globus-gr 3043 sgmatl50 0u IPv4 7394899 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts10.gridpp.rl.ac.uk:59123 (CLOSE_WAIT)
>>>>> globus-gr 3583 sgmatl50 0u IPv4 7395759 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts07.gridpp.rl.ac.uk:35544 (ESTABLISHED)
>>>>> globus-gr 3583 sgmatl50 17u IPv4 7395928 TCP lcgse10.shef.ac.uk:24398->se03.esc.qmul.ac.uk:52134 (ESTABLISHED)
>>>>> .........................
>>>>> globus-gr 3584 sgmatl50 17u IPv4 7395975 TCP lcgse10.shef.ac.uk:24411->dpool18.triumf.ca:36199 (ESTABLISHED)
>>>>> globus-gr 3584 sgmatl50 18u IPv4 7395976 TCP lcgse10.shef.ac.uk:24411->dpool18.triumf.ca:36205 (ESTABLISHED)
>>>>> ...................
>>>>> globus-gr 3605 sgmatl50 0u IPv4 7396013 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts09.gridpp.rl.ac.uk:51746 (CLOSE_WAIT)
>>>>> globus-gr 3624 sgmatl50 0u IPv4 7396090 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts07.gridpp.rl.ac.uk:36218 (CLOSE_WAIT)
>>>>> globus-gr 3645 sgmatl50 0u IPv4 7396314 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts10.gridpp.rl.ac.uk:32929 (ESTABLISHED)
>>>>> globus-gr 3645 sgmatl50 17u IPv4 7396338 TCP lcgse10.shef.ac.uk:24394->se03.esc.qmul.ac.uk:47784 (ESTABLISHED)
>>>>> .....
>>>>> globus-gr 3671 sgmatl50 0u IPv4 7396603 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts05.gridpp.rl.ac.uk:46809 (CLOSE_WAIT)
>>>>> globus-gr 3717 sgmatl50 0u IPv4 7396978 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts05.gridpp.rl.ac.uk:48513 (ESTABLISHED)
>>>>> globus-gr 3717 sgmatl50 17u IPv4 7397169 TCP lcgse10.shef.ac.uk:24418->lpnhe-ds22.in2p3.fr:52933 (ESTABLISHED)
>>>>> globus-gr 3717 sgmatl50 18u IPv4 7397170 TCP lcgse10.shef.ac.uk:24418->lpnhe-ds22.in2p3.fr:52934 (ESTABLISHED)
>>>>> ...........
>>>>> globus-gr 3739 sgmatl50 17u IPv4 7397425 TCP lcgse10.shef.ac.uk:24388->se04.esc.qmul.ac.uk:59821 (ESTABLISHED)
>>>>> globus-gr 3739 sgmatl50 18u IPv4 7397426 TCP lcgse10.shef.ac.uk:24388->se04.esc.qmul.ac.uk:59824 (ESTABLISHED)
>>>>> ............
>>>>> globus-gr 3739 sgmatl50 26u IPv4 7397434 TCP lcgse10.shef.ac.uk:24388->s
>>>>> globus-gr 3740 sgmatl50 0u IPv4 7397337 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts08.gridpp.rl.ac.uk:46984 (ESTABLISHED)
>>>>> globus-gr 3740 sgmatl50 17u IPv4 7397508 TCP lcgse10.shef.ac.uk:24424->dcdoor10.usatlas.bnl.gov:36545 (ESTABLISHED)
>>>>> globus-gr 3740 sgmatl50 18u IPv4 7397509 TCP lcgse10.shef.ac.uk:24424->dcdoor10.usatlas.bnl.gov:36548 (ESTABLISHED)
>>>>> globus-gr 3740 sgmatl50 19u IPv4 7397510 TCP lcgse10.shef.ac.uk:24424->dcdoor10.usatlas.bnl.gov:36553 (ESTABLISHED)
>>>>> globus-gr 3740 sgmatl50 20u IPv4 7397511 TCP lcgse10.shef.ac.uk:24424->dcdoor10.usatlas.bnl.gov:36554 (ESTABLISHED)
>>>>> .........
>>>>> globus-gr 3768 sgmatl50 25u IPv4 7397725 TCP lcgse10.shef.ac.uk:24390->disk058.gla.scotgrid.ac.uk:44257 (ESTABLISHED)
>>>>> globus-gr 3768 sgmatl50 26u IPv4 7397726 TCP lcgse10.shef.ac.uk:24390->disk058.gla.scotgrid.ac.uk:44260 (ESTABLISHED)
>>>>> globus-gr 3795 sgmatl50 0u IPv4 7397980 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts06.gridpp.rl.ac.uk:59871 (CLOSE_WAIT)
>>>>> ...........
>>>>> globus-gr 3805 sgmatl50 26u IPv4 7398608 TCP lcgse10.shef.ac.uk:24410->disk056.gla.scotgrid.ac.uk:55130 (ESTABLISHED)
>>>>> globus-gr 3806 sgmatl50 0u IPv4 7398072 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts05.gridpp.rl.ac.uk:52855 (CLOSE_WAIT)
>>>>> globus-gr 3828 sgmatl50 0u IPv4 7398354 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts10.gridpp.rl.ac.uk:40646 (CLOSE_WAIT)
>>>>> globus-gr 3835 sgmatl50 0u IPv4 7398485 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts05.gridpp.rl.ac.uk:53275 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 17u IPv4 7398636 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35237 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 18u IPv4 7398646 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35241 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 19u IPv4 7398647 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35240 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 20u IPv4 7398648 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35245 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 21u IPv4 7398649 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35239 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 22u IPv4 7398650 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35238 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 23u IPv4 7398651 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35243 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 24u IPv4 7398652 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35244 (ESTABLISHED)
>>>>> globus-gr 3835 sgmatl50 25u IPv4 7398653 TCP lcgse10.shef.ac.uk:24407->lpnhe-ds10.in2p3.fr:35242 (ESTABLISHED)
>>>>> globus-gr 3836 sgmatl50 0u IPv4 7398495 TCP lcgse10.shef.ac.uk:gsiftp->lcgfts10.gridpp.rl.ac.uk:40654 (ESTABLISHED)
>>>>> globus-gr 3836 sgmatl50 17u IPv4 7398562 TCP lcgse10.shef.ac.uk:24401->disk077.gla.scotgrid.ac.uk:60831 (ESTABLISHED)
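
A rough Python sketch for summarising an lsof listing of this kind by remote host and TCP state. It assumes each entry is on a single line, as lsof prints it, with the NAME column in the form local:port->remote:port (STATE); the function name `summarise_connections` is illustrative. CLOSE_WAIT means the remote end has closed the connection but the local process has not yet closed its socket, so a count of these per FTS host may help show where transfers are piling up.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the NAME column of a TCP entry: local:port->remote:port (STATE)
_CONN = re.compile(r"(\S+):(\S+)->(\S+):(\d+)\s+\((\w+)\)")


def summarise_connections(lsof_lines: Iterable[str]) -> Counter:
    """Count TCP connections per (remote host, state)."""
    counts: Counter = Counter()
    for line in lsof_lines:
        match = _CONN.search(line)
        if match:  # truncated or non-TCP lines are skipped
            remote_host, state = match.group(3), match.group(5)
            counts[(remote_host, state)] += 1
    return counts
```

Calling `summarise_connections(...).most_common()` on the listing gives the busiest (host, state) pairs first.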
>>>>>
>>>>> On 24 Oct 2013, at 09:53, Sam Skipsey wrote:
>>>>>
>>>>>> In other news, there's a blog entry (finally) on the GridPP Storage
>>>>>> blog about what we did at Glasgow to reduce load. (xrootd direct IO
>>>>>> and slowing the rate of analysis job starts).
>>>>>>
>>>>>> Sam
>>>>>>
>>>>>> On 24 October 2013 08:10, Wahid Bhimji <[log in to unmask]> wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Sorry you are still having loading issues. A few thoughts:
>>>>>>>
>>>>>>> 1. Do you have the tuning applied on these disk servers from this page -
>>>>>>> specifically the block device readahead
>>>>>>> https://www.gridpp.ac.uk/wiki/Performance_and_Tuning#Tuning_block_device_readahead
>>>>>>>
>>>>>>> 2. I would move to xrootd instead of rfio for ATLAS jobs - at least we can
>>>>>>> then, if it still happens, ask the DPM core team - otherwise they will just
>>>>>>> suggest it. If you have xrootd running on all the machines it is a simple
>>>>>>> switch in AGIS.
>>>>>>>
>>>>>>> 3. Do others see this number of globus-gridftp-server processes? I looked
>>>>>>> on my disk servers and I currently only see one on each, which has been
>>>>>>> running for over a month… this may be a sign of something strange, or a
>>>>>>> symptom, or it may be just me. Also mine doesn't have the -inetd option,
>>>>>>> but this might all be a red herring:
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path /
>>>>>>>
>>>>>>> 4. Unbalanced datasets - maybe Sam can help you there - at least to see if
>>>>>>> there are such datasets, even if not to redistribute them.
>>>>>>>
>>>>>>> Wahid
>>>>>>>
>>>>>>> On 24 Oct 2013, at 00:42, Elena Korolkova <[log in to unmask]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> A close look at an overloaded disk server shows 130 globus-gridftp-server
>>>>>>> processes:
>>>>>>>
>>>>>>> log/dpm-gsiftp/gridftp.log -Z /var/log/dpm-gsiftp/dpm-gsiftp.log
>>>>>>> -no-detach
>>>>>>> -config-base-path / -inetd -config-base-path /
>>>>>>> prdatl88 32448 0.8 0.0 118768 6936 ? D 17:15 0:29
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32467 0.7 0.0 118052 6196 ? D 17:16 0:24
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32488 0.8 0.0 118768 6932 ? D 17:17 0:28
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32505 0.8 0.0 118768 6928 ? D 17:18 0:25
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32530 0.6 0.0 118768 6932 ? D 17:19 0:19
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32574 1.3 0.0 122920 11100 ? S 17:21 0:40
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32575 0.7 0.0 118768 6928 ? D 17:21 0:22
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32591 0.7 0.0 118052 6192 ? S 17:22 0:22
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32598 1.0 0.0 121900 8132 ? D 17:23 0:30
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32632 0.6 0.0 118768 6936 ? D 17:25 0:17
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32701 1.2 0.0 122936 8696 ? S 17:28 0:31
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32713 1.6 0.0 122936 11136 ? S 17:29 0:40
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32717 0.6 0.0 118768 6928 ? D 17:30 0:17
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32740 0.7 0.0 118768 6932 ? D 17:31 0:18
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>> prdatl88 32763 0.6 0.0 118768 6932 ? D 17:32 0:14
>>>>>>> /usr/sbin/globus-gridftp-server -d all -p 2811 -auth-level 0 -dsi dpm
>>>>>>> -disable-usage-stats -l /var/log/dpm-gsiftp/gridftp.log -Z
>>>>>>> /var/log/dpm-gsiftp/dpm-gsiftp.log -no-detach -config-base-path / -inetd
>>>>>>> -config-base-path /
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In /var/log/rfio/log I see error messages like this:
>>>>>>>
>>>>>>> Oct 23 21:17:59 rfiod[7854]: Waiting for end of child 10737, status 1
>>>>>>> Oct 23 21:17:59 rfiod[10738]: doit(5): connection from [143.167.3.102]
>>>>>>> (lcgse0.shef.ac.uk)
>>>>>>> Oct 23 21:17:59 rfiod[10739]: doit(5): connection from [143.167.3.102]
>>>>>>> (lcgse0.shef.ac.uk)
>>>>>>> Oct 23 21:17:59 rfiod[10741]: doit(5): connection from [192.168.42.130]
>>>>>>> (wn030.hep)
>>>>>>> Oct 23 21:17:59 rfiod[10740]: doit(5): connection from [143.167.3.102]
>>>>>>> (lcgse0.shef.ac.uk)
>>>>>>> Oct 23 21:17:59 rfiod[10739]: Could not establish an authenticated
>>>>>>> connection: server_establish_context_ext: Could not receive token;
>>>>>>> _Csec_recv_token: Connection closed; Csec_server_set_service_name:
>>>>>>> Could not
>>>>>>> set service name; Csec_get_peer_service_name: Could not
>>>>>>> Cgetnetaddress: BAD
>>>>>>> ERROR NUMBER: 0 !
>>>>>>> Oct 23 21:17:59 rfiod[10742]: doit(5): connection from [143.167.3.102]
>>>>>>> (lcgse0.shef.ac.uk)
>>>>>>> Oct 23 21:17:59 rfiod[10740]: Could not establish an authenticated
>>>>>>> connection: server_establish_context_ext: Could not receive token;
>>>>>>> _Csec_recv_token: Connection closed; Csec_server_set_service_name:
>>>>>>> Could not
>>>>>>> set service name; Csec_get_peer_service_name: Could not
>>>>>>> Cgetnetaddress: BAD
>>>>>>> ERROR NUMBER: 0 !
>>>>>>> Oct 23 21:17:59 rfiod[10742]: Could not establish an authenticated
>>>>>>> connection: server_establish_context_ext: Could not receive token;
>>>>>>> _Csec_recv_token: Connection closed; Csec_server_set_service_name:
>>>>>>> Could not
>>>>>>> set service name; Csec_get_peer_service_name: Could not
>>>>>>> Cgetnetaddress: BAD
>>>>>>> ERROR NUMBER: 0 !
>>>>>>>
>>>>>>> ice_name: Could not set service name; Csec_get_peer_service_name:
>>>>>>> Could not
>>>>>>> Cgetnetaddress: BAD ERROR NUMBER: 0 !
>>>>>>> Oct 23 21:17:59 rfiod[10741]: Could not establish an authenticated
>>>>>>> connection: _Csec_recv_token: Connection closed;
>>>>>>> Csec_server_set_service_name: Could not set service name;
>>>>>>> Csec_get_peer_service_name: Could not Cgetnetaddress: BAD ERROR
>>>>>>> NUMBER: 0 !
>>>>>>>
>>>>>>>
>>>>>>> /var/log/dpm-gsiftp/gridftp.log :
>>>>>>>
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: GFork functionality not enabled.:
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: Configuration read from
>>>>>>> /etc/gridftp.conf.
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: Server started in inetd mode.
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: New connection from:
>>>>>>> lcgfts08.gridpp.rl.ac.uk:51809
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: USER :globus-mapping:
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 331 Password required for :globus-mapping:.
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: PASS
>>>>>>> [11177] Wed Oct 23 21:47:49 2013 :: request by /DC=ch/DC=cern/OU=Organic
>>>>>>> Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management
>>>>>>> from
>>>>>>> lcgfts08.gridpp.rl.ac.uk
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: DN /DC=ch/DC=cern/OU=Organic
>>>>>>> Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management
>>>>>>> successfully authorized.
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: User prdatl88 successfully
>>>>>>> authorized.
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: PASS
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 230 User prdatl88 logged in.
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: SITE HELP
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 214-The following commands are recognized:
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: FEAT
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 211-Extensions supported
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: SITE CLIENTINFO
>>>>>>> scheme=gsiftp;appname="libglobus_ftp_client";appver="7.4 (gcc64,
>>>>>>> 1340810069-83) [Globus Toolkit 5.2.1]";
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 250 OK.
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: TYPE I
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 200 Type set to I.
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: PBSZ 1048576
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 200 PBSZ=1048576
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: DELE
>>>>>>> /lcgse9.shef.ac.uk:/storage1/atlas/2013-10-23/AOD.01328071._004259.pool.root.1.63161335.0
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 500 Command failed : unlink error: Permission denied
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [CLIENT]: QUIT
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: lcgfts08.gridpp.rl.ac.uk:51809:
>>>>>>> [SERVER]: 221 Goodbye.
>>>>>>> [11177] Wed Oct 23 21:47:50 2013 :: Closed connection from
>>>>>>> lcgfts08.gridpp.rl.ac.uk:51809
>>>>>>> [7891] Wed Oct 23 21:47:50 2013 :: Child process 11177 ended with rc = 0
>>>>>>>
>>>>>>> [10443] Wed Oct 23 22:11:00 2013 :: Failure attempting to transfer
>>>>>>> "/lcgse9.shef.ac.uk:/storage1/atlas/2013-10-23/EVNT.01361822._000004.pool.root.1.63152845.0".
>>>>>>> [10443] Wed Oct 23 22:11:00 2013 :: Transfer failure:
>>>>>>> [10443] Wed Oct 23 22:11:00 2013 :: force_close:
>>>>>>> [10443] Wed Oct 23 22:11:00 2013 :: lcgfta02.gridpp.rl.ac.uk:57242:
>>>>>>> [SERVER]: 500-Command failed. : globus_xio: System error in send: Broken
>>>>>>> pipe
>>>>>>> [10443] Wed Oct 23 22:11:00 2013 :: Closed connection from
>>>>>>> lcgfta02.gridpp.rl.ac.uk:57242
>>>>>>> [7891] Wed Oct 23 22:11:00 2013 :: Child process 10443 ended with rc = 0
>>>>>>>
>>>>>>>
>>>>>>> It looks like "Permission denied" errors are seen when disk servers are
>>>>>>> overloaded:
>>>>>>> [root@lcgse0 ~]# for i in `seq 10`; do ssh -tx root@se`printf %01d $i`
>>>>>>> 'grep "Permission denied" /var/log/dpm-gsiftp/gridftp.log|wc -l';done
>>>>>>> 29
>>>>>>> Connection to se1 closed.
>>>>>>> 26
>>>>>>> Connection to se2 closed.
>>>>>>> 0
>>>>>>> Connection to se3 closed.
>>>>>>> 0
>>>>>>> Connection to se4 closed.
>>>>>>> 108
>>>>>>> Connection to se5 closed.
>>>>>>> 129
>>>>>>> Connection to se6 closed.
>>>>>>> 62
>>>>>>> Connection to se7 closed.
>>>>>>> 56
>>>>>>> Connection to se8 closed.
>>>>>>> 87
>>>>>>> Connection to se9 closed.
>>>>>>> 11
>>>>>>> Connection to se10 closed.
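
To check whether the "Permission denied" errors really coincide with the overloaded periods, the totals above could be broken down by hour. Below is a minimal Python sketch, assuming each gridftp.log entry is a single line starting with a timestamp in the form shown earlier in this message ("[pid] Wed Oct 23 21:47:50 2013 ::"); the function name `errors_per_hour` is illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Leading "[pid] Day Mon DD HH:MM:SS YYYY ::" stamp, as seen in gridftp.log
_STAMP = re.compile(r"^\[\d+\] \w{3} (\w{3}\s+\d{1,2}) (\d{2}):\d{2}:\d{2} \d{4} ::")


def errors_per_hour(log_lines: Iterable[str],
                    needle: str = "Permission denied") -> Counter:
    """Count log lines containing `needle`, keyed by 'Mon DD HH:00'."""
    counts: Counter = Counter()
    for line in log_lines:
        if needle not in line:
            continue
        match = _STAMP.match(line)
        if match:  # lines without a recognisable stamp are ignored
            counts[f"{match.group(1)} {match.group(2)}:00"] += 1
    return counts
```

Run over /var/log/dpm-gsiftp/gridftp.log on each disk server, the hourly counts can then be compared with the load on that server over the same hours.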
>>>>>>>
>>>>>>>
>>>>>>> Any help/ideas are greatly appreciated.
>>>>>>>
>>>>>>> Elena
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> __________________________________________________
>>>>>>> Dr Elena Korolkova
>>>>>>> Email: [log in to unmask]
>>>>>>> Tel.: +44 (0)114 2223553
>>>>>>> Fax: +44 (0)114 2223555
>>>>>>> Department of Physics and Astronomy
>>>>>>> University of Sheffield
>>>>>>> Sheffield, S3 7RH, United Kingdom
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>>
>>>>>
>>
>