Hi,
I thought multi-stream gridftp always failed through a firewall, because it
runs in 'ACTIVE' mode and needs inbound connectivity to the client. Certainly
the ATLAS production system uses single-stream lcg-cp for this reason.
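To illustrate (a sketch with placeholder URLs, not our production wrapper —
the helper function is made up, only globus-url-copy itself is real):

```shell
#!/bin/sh
# Sketch with placeholder URLs: with a single data stream (-p 1) the WN
# only needs outbound connectivity; with -p 10 in active mode the server
# opens data connections back to the client, which a site firewall drops.
copy_cmd() {
    # build the command line; we only print it here, since this is a sketch
    echo "globus-url-copy -p 1 $1 $2"
}
copy_cmd "gsiftp://example.host:2811/some/path/file.root" "file:///tmp/test.file"
```

As I understand it, lcg-cp gets the same effect by sticking to one stream.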
Cheers,
Rod.
On Fri, 6 Oct 2006, Maarten Litmaath wrote:
> Marco La Rosa wrote:
>
>> Hi all,
>>
>> This is a long mail... Apologies in advance!
>>
>> Periodically I have errors with ATLAS user jobs and globus-url-copy.
>> Unfortunately it's not related to a specific user nor to a specific SE.
>> So, I'm finding it difficult to track down the problem.
>
> You may have a campus firewall that does not like source ports immediately
> getting reused for independent connections. Try to ensure the environment
> variable GLOBUS_TCP_PORT_RANGE is unset on your WNs, e.g. through an extra
> script in /etc/profile.d like this:
>
> --------------------------------------------------------------------------
> #!/bin/sh
> unset GLOBUS_TCP_PORT_RANGE
> --------------------------------------------------------------------------
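A quick sanity check to run on a WN afterwards — in a fresh login shell,
since /etc/profile.d only affects new shells (the function name here is just
illustrative):

```shell
#!/bin/sh
# Sketch: report whether GLOBUS_TCP_PORT_RANGE is visible to jobs.
# If it is unset, globus-url-copy will use ephemeral source ports
# instead of cycling through a narrow fixed range.
check_port_range() {
    if [ -z "${GLOBUS_TCP_PORT_RANGE:-}" ]; then
        echo "GLOBUS_TCP_PORT_RANGE is unset"
    else
        echo "GLOBUS_TCP_PORT_RANGE=$GLOBUS_TCP_PORT_RANGE"
    fi
}
check_port_range
```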
>
>> What I do know.
>>
>> 1. It's almost always a globus-url-copy - occasionally it's lcg-cr
>> 2. At the moment I have 18 jobs from a particular user. Every job is
>> trying to do a globus-url-copy and seems to be frozen.
>> 3. It doesn't happen with all ATLAS users - just some - and it seems to
>> be the same ones over and over again.
>>
>> Sample qstat output:
>>
>> ---snip---
>> 12465.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12466.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12467.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12469.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12470.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12471.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12473.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12475.charm-mgt STDIN atlas029 00:00:09 R atlas
>> 12476.charm-mgt STDIN atlas029 00:00:09 R atlas
>> ---snip---
>>
>> Notice the jobs have only run for a few seconds. In reality, they've
>> been on the site since some time last night.
>>
>> At the moment, the following SEs are involved:
>> gsiftp://harry.hagrid.it.uu.se
>> gsiftp://ss1.hpc2n.umu.se
>> gsiftp://dcgftp.usatlas.bnl.gov
>>
>> In the past it's also involved SEs at *.usatlas.bnl.gov and *.se.
>> Unfortunately I don't have more info than that.
>> From previous attempts to track down the problem, I've tried running
>> the user's command myself on the WNs, and I get errors like:
>>
>> (Note: on another attempt, the SE was different.)
>>
>> [atlas029@pnet25 tmp]$ globus-url-copy -p 10 -dbg
>> gsiftp://ss2.hpc2n.umu.se:2811/ss2_se2/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00448.pool.root
>> file:///tmp/test.file
>> debug: starting to get
>> gsiftp://ss2.hpc2n.umu.se:2811/ss2_se2/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00448.pool.root
>> debug: connecting to
>> gsiftp://ss2.hpc2n.umu.se:2811/ss2_se2/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00448.pool.root
>> debug: error reading response from
>> gsiftp://ss2.hpc2n.umu.se:2811/ss2_se2/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00448.pool.root:
>> an end-of-file was reached
>> debug: fault on connection to
>> gsiftp://ss2.hpc2n.umu.se:2811/ss2_se2/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00448.pool.root:
>> an end-of-file was reached
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb74d5008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb6bcc008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb6ccd008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb6dce008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb6ecf008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb6fd0008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb70d1008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb71d2008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb72d3008, length 0, offset=0, eof=true
>> debug: data callback, error an end-of-file was reached, buffer
>> 0xb73d4008, length 0, offset=0, eof=true
>> debug: operation complete
>> error: an end-of-file was reached
>>
>> globus-url-copy -dbg -p 10
>> gsiftp://pikolit.ijs.si:2811/SE1/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00478.pool.root
>>
>> ---snip----
>>
>> debug: response from
>> gsiftp://pikolit.ijs.si:2811/SE1/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00478.pool.root:
>> 150 Opening connection.
>>
>> debug: reading into data buffer 0xb74d8008, maximum length 1048576
>> debug: reading into data buffer 0xb6bcf008, maximum length 1048576
>> debug: reading into data buffer 0xb6cd0008, maximum length 1048576
>> debug: reading into data buffer 0xb6dd1008, maximum length 1048576
>> debug: reading into data buffer 0xb6ed2008, maximum length 1048576
>> debug: reading into data buffer 0xb6fd3008, maximum length 1048576
>> debug: reading into data buffer 0xb70d4008, maximum length 1048576
>> debug: reading into data buffer 0xb71d5008, maximum length 1048576
>> debug: reading into data buffer 0xb72d6008, maximum length 1048576
>> debug: reading into data buffer 0xb73d7008, maximum length 1048576
>> debug: response from
>> gsiftp://pikolit.ijs.si:2811/SE1/atlas/sc3/csc11.005009.J0_pythia_jetjet.digit.RDO.v11004203._00478.pool.root:
>> 426 Transfer terminated
>>
>> The interesting thing is that, in the past, when I've tried the copies
>> myself (su'ed to the Unix atlas* user in question at the time),
>> sometimes it would work and start transferring, and other times I would
>> get an error like those above.
>>
>> I suppose the point is that I don't know what to do to track this
>> problem down. Can anyone suggest how I might go about sorting out what's
>> happening?
>>
>> Thanks in advance!
>> Marco
>
>
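On Marco's question of how to track it down: since the stuck copies sit in
the batch system overnight, one thing worth trying is a timeout wrapper, so
a frozen transfer fails fast and can be logged and retried. A sketch — the
helper name is made up, and it is only for WNs without coreutils' `timeout`:

```shell
#!/bin/sh
# Sketch: run a command with a watchdog that kills it after N seconds.
# Useful around a copy, e.g.:
#   run_with_timeout 600 globus-url-copy -p 1 "$SRC" file:///tmp/out
# (600s and the URLs are illustrative.)
run_with_timeout() {
    secs="$1"; shift
    "$@" &                                    # run the real command
    pid=$!
    ( sleep "$secs" && kill "$pid" 2>/dev/null ) &   # watchdog
    watcher=$!
    wait "$pid"                               # command's exit status,
    status=$?                                 # or 128+SIGTERM if killed
    kill "$watcher" 2>/dev/null               # cancel the watchdog
    return $status
}
run_with_timeout 2 sleep 1 && echo "finished" || echo "timed out"
```

Where coreutils is available, `timeout 600 globus-url-copy ...` does the
same thing with less fuss.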
--
Tel. +1 604 222 7667