Hi, Andrew:
Thanks for the clarification. Then it must be due to something else on samnag-ai-0?.cern.ch; those jobs were held before arriving at the ARC CE.
Cheers, Gang
________________________________________
From: Testbed Support for GridPP member institutes [[log in to unmask]] on behalf of Andrew Lahiff [[log in to unmask]]
Sent: Tuesday, July 07, 2015 5:53 PM
To: [log in to unmask]
Subject: Re: Condor held jobs
Hi Gang,
AFAIK the SAM tests just use HTCondor-G to submit jobs to a CE (in our case an ARC CE using gridftp), and hence are unrelated to the Close_Pipe errors.
Regards,
Andrew.
________________________________________
From: Testbed Support for GridPP member institutes [[log in to unmask]] on behalf of qing [[log in to unmask]]
Sent: Tuesday, July 07, 2015 4:28 PM
To: [log in to unmask]
Subject: Re: Condor held jobs
Hi, Andrew:
Although the site has upgraded to 8.2.7, ATLAS SAM3 test jobs could
still encounter this problem because the samnag-ai-0?.cern.ch hosts are
running condor_8.2.3.
http://wlcg-sam-atlas.cern.ch/templates/ember/#/historicalsmry/heatMap?flavours=ARC-CE&group=All%20sites&hostname=arc-ce01.gridpp.rl.ac.uk%2Carc-ce02.gridpp.rl.ac.uk%2Carc-ce03.gridpp.rl.ac.uk%2Carc-ce04.gridpp.rl.ac.uk&profile=ATLAS_CRITICAL&site=RAL-LCG2&time=Last%20Week&view=Test%20History
Just wondering if you have already ticketed the SAM team about this
issue.
Cheers, Gang
On 02/07/2015 11:34, Andrew Lahiff wrote:
> Hi Kashif,
>
> What version of condor are you using? This problem should now be fixed (at least it is for us with 8.2.7).
>
> Regards,
> Andrew.
>
> ________________________________
> From: Testbed Support for GridPP member institutes [[log in to unmask]] on behalf of Kashif Mohammad [[log in to unmask]]
> Sent: Thursday, July 02, 2015 11:22 AM
> To: [log in to unmask]
> Subject: Condor held jobs
>
>
> Hi
>
> I am investigating an issue with Condor held jobs. Some of the jobs stay in the Held state for a long time. The jobs are failing with a Close_Pipe error, and tracing back to the worker node shows this error:
>
> ERROR "Close_Pipe error" at line 2089 in file /slots/02/dir_35085/userdir/src/condor_daemon_core.V6/daemon_core.cpp
>
> Andrew Lahiff mentioned the same problem on the HTCondor-users list, but it seems the discussion didn't reach any conclusion.
>
> Andrew, did you manage to fix this issue? Has anyone else seen this?
>
> Thanks
>
> Kashif