Simon George wrote:
> I also had a strange nfs server overload problem with lhcb jobs just
> before Easter. I could try to dig out the details. Basically all their
> jobs were copying several GB of files from the VO sw dir to local disk of
> the WN at startup, and with 20 or 30 jobs starting around the same time
> this was enough to cause a DoS on the server. I tried to follow it up
> with them but it was never concluded.
This is, I think, what happens at QMUL. How does RAL deal with that?
Is rsync a better idea?
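
To make the question concrete, here is a rough sketch of what a job wrapper
could do on each WN: take a per-node lock so that only one job copies the
software area to local disk, add a random delay before the copy so that nodes
do not all hit the nfs server at the same moment, and let later jobs reuse the
local copy. Everything here is an assumption for illustration: the function
names, the ".complete" marker file and the lock file are hypothetical, and
shutil.copytree stands in for rsync. It does not handle refreshing a stale
copy when the shared area changes.

```python
import fcntl
import random
import shutil
import time
from pathlib import Path


def startup_delay(max_delay_s, rng=random):
    """Random delay in [0, max_delay_s] to spread out simultaneous starts."""
    return rng.uniform(0, max_delay_s)


def ensure_local_copy(shared_dir, local_dir, max_delay_s=0.0):
    """Copy shared_dir to local_dir at most once per node.

    Returns True if this call performed the copy, False if a complete
    local copy was already present. Paths and marker name are hypothetical.
    """
    local_dir = Path(local_dir)
    marker = local_dir / ".complete"           # written only after a full copy
    lock_path = str(local_dir) + ".lock"
    local_dir.parent.mkdir(parents=True, exist_ok=True)

    with open(lock_path, "w") as lock:
        # Exclusive lock: other jobs on this node wait here instead of
        # reading from the nfs server themselves. Released when the file closes.
        fcntl.flock(lock, fcntl.LOCK_EX)
        if marker.exists():
            return False                       # another job already copied it
        time.sleep(startup_delay(max_delay_s)) # jitter across nodes
        shutil.copytree(shared_dir, local_dir, dirs_exist_ok=True)
        marker.touch()
        return True
```

With this, 20 or 30 jobs landing on the same node would cause one read of the
shared area rather than 20 or 30, and the jitter spreads the remaining copies
over time. Whether that is enough at 900 jobs is something only a test on the
farm would show.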
Cheers, Olivier.
>
> Cheers,
> Simon
>
> On Tue, 23 May 2006, Olivier van der Aa wrote:
>
>> Gordon, JC (John) wrote:
>>> Olivier, I am sitting next to Nick Brook and he says that lhcb
>>> production jobs should not run as sgm. Is this happening at other sites?
>>>
>> When checking the gridmapfile I can find only 3 sgm users.
>> Alex, could you tell us when you saw a lot of lhcb sgm jobs? When I look now I
>> only see normal lhcb jobs.
>>
>>
>> Olivier.
>>> Can you tell me the DN of the user being mapped to sgm, if that doesn't
>>> break your data security policy:-) Nick thinks the gridmapfile
>>> generation may not be correct. John
>>>> -----Original Message-----
>>>> From: Testbed Support for GridPP member institutes
>>>> [mailto:[log in to unmask]] On Behalf Of Olivier van der Aa
>>>> Sent: 23 May 2006 15:49
>>>> To: [log in to unmask]
>>>> Subject: shared experiment area load
>>>>
>>>> Dear All,
>>>>
>>>> At QMUL we have a load problem with the experimental shared area.
>>>> The farm is running around 900 jobs and the nfs server serving the
>>>> experimental area is overloaded.
>>>>
>>>> The result is that lhcb jobs sit for a long time on the WN
>>>> waiting for data (mainly libraries).
>>>>
>>>> We would like to know how this is solved at RAL and Manchester, where the size
>>>> is similar. We were thinking of setting up a set of pbs slots for the sgm
>>>> to have rw access. The other nodes would just have a copy on the local
>>>> disk or access through several nfs servers.
>>>>
>>>> I think the problem with a small set of WNs having rw access is that
>>>> lhcb is sending a lot of jobs via one user who is sgm. Most of those jobs
>>>> do not write to the experimental software area, but they would queue up
>>>> waiting for those WNs to be freed.
>>>>
>>>> We are keen to have your experience on that topic.
>>>>
>>>> Cheers, Olivier.
>>>>
>>>> --
>>>> - O. van der Aa - Imperial College London -
>>>> - LT2 Technical Coordinator -
>>>> - tel: +442075947810, +442071005426 -
>>>> - SIP: [log in to unmask] -
>>>> - fax: +442078238830 -
>>>> - http://surl.se/agtu -
>>>>
>>
>>