Hi Jeff,
On 04.11.2010 10:07, Jeff Templon wrote:
> 1) is the directory tree /home/ziops022 visible on the CE node, the LRMS head node, and the WNs??
Yes, the homes are NFS exported.
> 2) what are you doing to clean up the junk in the home dirs? The problem sounds like ones that have been seen in the past when a home dir cleanup script was not careful enough. The one distributed with gLite has the benefit of including things learned from years of mistakes (several of them mine). If you run cleanup, I recommend that you use that script, unmodified.
We just use what is provided by gLite, no other cleanup scripts.
Cheers,
Ralph
> J "cpa.sh anyone?" T
>
> On Nov 3, 2010, at 17:42 , Ralph Mueller-Pfefferkorn wrote:
>
>> Hi there,
>>
>> we have a mysterious problem.
>> We run a separate Torque server and an lcg-CE.
>> After an update of the torque server (operating system update), suddenly
>> all jobs are submitted several times to the system.
>>
>> A job arrives at the lcg-CE and is passed to the torque server (other
>> machine). Torque accepts the job and runs it.
>> The logs both on the CE and torque look normal. But after about half a
>> minute the same job (same Grid ID) is submitted again to the lcg-CE. And
>> again and again. The same job is submitted 9 times.
>>
>> The jobs then fail when trying to copy their output from the WN to the CE:
>> from WN /var/log/messages:
>> Nov 3 17:36:03 r1i1n15 pbs_mom: sys_copy, command '/usr/bin/scp -rpB
>> /var/spool/pbs/spool/5137506.service0.ice.zih.tu-dresden.de.OU
>> [log in to unmask]:/home/ziops022/.globus/job/desdemona.zih.tu-dresden.de/32270.1288802028/stdout'
>> failed with status=1, giving up after 4 attempts
>> Nov 3 17:36:03 r1i1n15 pbs_mom: req_cpyfile, Unable to copy file
>> /var/spool/pbs/spool/5137506.service0.ice.zih.tu-dresden.de.OU to
>> [log in to unmask]:/home/ziops022/.globus/job/desdemona.zih.tu-dresden.de/32270.1288802028/stdout
>>
>> The reason is that the directory
>> /home/ziops022/.globus/job/desdemona.zih.tu-dresden.de/32270.1288802028/
>> on the CE does not exist anymore.
>>
>> Does anybody have an idea where to look?
>>
>> Cheers,
>> Ralph
>>
>