Hi John,

	The same thing happened on my side; the other lhcb jobs
quit without problems. Ricardo is mapped to sgm (I guess he got no notice
about the job failure). Looking into the log files, they are using an
expired proxy too.

globus_gsi_callback.c:436:
globus_i_gsi_callback_cred_verify:
The certificate has expired:
Credential with subject: /C=CH/O=CERN/OU=GRID/CN=Gianluca Castellani 2685/CN=proxy has expired.
Failure: GSS failed Major:000a0000 Minor:00000006 Token:00000000

Thanks

Cheers
Paul

On Wed, 24 May 2006, Gordon, JC (John) wrote:

> Steve, is this a bug? Or just insufficiently recognised as a feature? 
> 
> I can raise a ticket against LHCb asking Ricardo to use a voms proxy but
> who else should I report it to? Atlas obviously but what about the
> deployment team and JRA1?
> 
> John 
> 
> > -----Original Message-----
> > From: Testbed Support for GridPP member institutes 
> > [mailto:[log in to unmask]] On Behalf Of Steve Traylen
> > Sent: 24 May 2006 16:58
> > To: [log in to unmask]
> > Subject: Re: shared experiment area load
> > 
> > On Wed, May 24, 2006 at 09:57:29AM +0100 or thereabouts, 
> > Alessandra Forti wrote:
> > > I was wondering, can you ask if Ricardo is using
> > > 
> > > voms-proxy-init -voms lhcb:Role=Admin
> > > 
> > > or something like that?
> > > 
> > > It is possible that he is starting jobs with proxies for
> > > different roles; this might be why he is mapped differently on
> > > different systems.
> > 
> > 
> > Looking at 
> > 
> > https://lcg-voms.cern.ch:8443/voms/lhcb/webui/admin/users/list?rolename=Role%3Dlcgadmin&groupname=%2Flhcb
> > 
> > Ricardo is in the /lhcb/Role=lcgadmin group.
> > 
> > This is configured in our mkgridmap so that the gmf is populated with
> > this list, and since Ricardo does not use a voms proxy he will always
> > be mapped to sgm.
> > 
> > You can check someone's proxy on your batch system by running
> > qstat -f <jobid> on the CE to find
> > 
> > X509_USER_PROXY=/home/lhcbsgm/.globus/.gass_cache/local/md5/9e/9b57a46e100edd5d9939a1a6a9af40/md5/f6/8865ed8caee346e27adf8c6f949b23/data
> > 
> > 
> > # voms-proxy-info -file /home/lhcbsgm/.globus/.gass_cache/local/md5/9e/9b57a46e100edd5d9939a1a6a9af40/md5/f6/8865ed8caee346e27adf8c6f949b23/data
> > subject   : /C=ES/O=DATAGRID-ES/O=UB/CN=Ricardo Graciani/CN=proxy/CN=proxy/CN=limited proxy
> > issuer    : /C=ES/O=DATAGRID-ES/O=UB/CN=Ricardo Graciani/CN=proxy/CN=proxy
> > identity  : /C=ES/O=DATAGRID-ES/O=UB/CN=Ricardo Graciani/CN=proxy/CN=proxy
> > type      : limited proxy
> > strength  : 512 bits
> > path      : /home/lhcbsgm/.globus/.gass_cache/local/md5/9e/9b57a46e100edd5d9939a1a6a9af40/md5/f6/8865ed8caee346e27adf8c6f949b23/data
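The check Steve describes can be sketched as a short shell fragment. The `sed` extraction assumes `qstat -f` prints an `X509_USER_PROXY=<path>` line like the one quoted above; the sample line and path below are illustrative stand-ins, not real output.

```shell
#!/bin/sh
# Sketch: pull X509_USER_PROXY out of `qstat -f <jobid>` output and hand
# it to voms-proxy-info. The variable below stands in for real qstat
# output; on a real CE you would pipe `qstat -f <jobid>` instead.
qstat_line='    X509_USER_PROXY=/home/lhcbsgm/.globus/.gass_cache/proxyfile'

# strip the leading whitespace and the key, keeping only the path
proxy=$(printf '%s\n' "$qstat_line" | sed -n 's/^[[:space:]]*X509_USER_PROXY=//p')
echo "$proxy"

# On a real CE one would then inspect it with:
#   voms-proxy-info -file "$proxy"
# and look for VOMS attribute lines (absent for a plain globus proxy).
```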
> > 
> > 
> > so it is not a voms proxy. 
> > 
> > Now LHCb have recently reorganised their groups and roles, more or
> > less as defined in their CIC VOMS card, and I think they are now
> > expecting that
> > 
> > /lhcb/sgm
> > 
> > https://lcg-voms.cern.ch:8443/voms/lhcb/webui/admin/users/list?groupname=%2Flhcb%2Fsgm
> > 
> > be mapped to the SGM user, so all sites are currently configured
> > wrong against LHCb's latest schema.
> > 
> > But then this group still contains Ricardo, so it won't actually help.
> > 
> > I guess the VOMS groups are becoming more populated: since people
> > can select which group they run under, they might as well be in all
> > of them. The result of this, though, is that the grid-mapfile is not
> > generated as planned.
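For context, mkgridmap-style tooling emits grid-mapfile entries of roughly this shape, one quoted DN per line followed by the local account; a leading dot marks a pool account. The DNs and account names below are made up for illustration:

```
"/C=XX/O=Example/CN=Ordinary User"     .lhcb
"/C=XX/O=Example/CN=Software Manager"  lhcbsgm
```

So whichever VOMS group or role feeds the last matching entry decides the mapping, which is why an over-populated sgm group skews the whole file.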
> > 
> > But it is probably true, as Alessandra says, that Ricardo using
> > voms-proxy-init, and not selecting his sgm group for the proxy,
> > would correct things.
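A hedged sketch of what that would look like from Ricardo's side, using the usual `voms-proxy-init -voms <vo>:<fqan>` form; whether the plain group is the right choice for his test jobs is an assumption here:

```
# request a voms proxy in the ordinary lhcb group, not the sgm one:
voms-proxy-init -voms lhcb:/lhcb

# verify which FQANs the proxy actually carries:
voms-proxy-info -all
```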
> > 
> >  Steve
> > 
> > > 
> > > cheers
> > > alessandra
> > > 
> > > Alessandra Forti wrote:
> > > >It is extracted from voms.
> > > >
> > > >Gordon, JC (John) wrote:
> > > >>Nick and Philippe of LHCb tell me Ricardo is in the lhcb sgm
> > > >>group in VOMS but he should currently be running tests, not
> > > >>production. It sounds like the gmf generation is not consistent
> > > >>across sites.
> > > >>I have asked Ricardo what he thinks should be happening. Then
> > > >>I'll raise a ticket (or ask Olivier to do so).
> > > >>
> > > >>Does anyone know how the sgm in the gmf should be defined today?
> > > >>Is it extracted from VOMS? Or defined in YAIM?
> > > >>
> > > >>John
> > > >>
> > > >>>-----Original Message-----
> > > >>>From: Testbed Support for GridPP member institutes
> > > >>>[mailto:[log in to unmask]] On Behalf Of Olivier van der Aa
> > > >>>Sent: 23 May 2006 17:35
> > > >>>To: [log in to unmask]
> > > >>>Subject: Re: shared experiment area load
> > > >>>
> > > >>>Gordon, JC (John) wrote:
> > > >>>>Olivier, I am sitting next to Nick Brook and he says that
> > > >>>>lhcb production jobs should not run as sgm. Is this happening
> > > >>>>at other sites?
> > > >>>When checking the gridmapfile I can find only 3 sgm users.
> > > >>>Alex, could you tell us when you saw a lot of lhcb sgm jobs?
> > > >>>When I look now I only see normal lhcb jobs.
> > > >>>
> > > >>>
> > > >>>Olivier.
> > > >>>>Can you tell me the DN of the user being mapped to sgm, if
> > > >>>>that doesn't break your data security policy :-) Nick thinks
> > > >>>>the gridmapfile generation may not be correct.
> > > >>>>John
> > > >>>>>-----Original Message-----
> > > >>>>>From: Testbed Support for GridPP member institutes 
> > > >>>>>[mailto:[log in to unmask]] On Behalf Of 
> > Olivier van der Aa
> > > >>>>>Sent: 23 May 2006 15:49
> > > >>>>>To: [log in to unmask]
> > > >>>>>Subject: shared experiment area load
> > > >>>>>
> > > >>>>>Dear All,
> > > >>>>>
> > > >>>>>At QMUL we have a load problem with the experimental 
> > shared area.
> > > >>>>>The farm is running around 900 jobs and the nfs server 
> > serving the 
> > > >>>>>experimental area is overloaded.
> > > >>>>>
> > > >>>>>The result of that is that lhcb jobs sit for a long time on
> > > >>>>>the wn waiting for data (mainly libraries).
> > > >>>>>
> > > >>>>>We would like to know how this is solved at RAL and
> > > >>>>>Manchester, where the size is similar. We were thinking of
> > > >>>>>setting up a set of pbs slots for the sgm to have rw access.
> > > >>>>>The other nodes would just have a copy on the local disk or
> > > >>>>>access through several nfs servers.
> > > >>>>>
> > > >>>>>I think the problem with the small set of wn having rw
> > > >>>>>access is that lhcb is sending a lot of jobs via one user who
> > > >>>>>is sgm. Most of those jobs do not write to the experimental
> > > >>>>>software area, but they would stack up waiting for the wn to
> > > >>>>>be freed.
> > > >>>>>
> > > >>>>>We are keen to have your experience on that topic.
> > > >>>>>
> > > >>>>>Cheers, Olivier.
> > > >>>>>
> > > >>>>>-- 
> > > >>>>>- O. van der Aa - Imperial College London -
> > > >>>>>-       LT2 Technical Coordinator         -
> > > >>>>>- tel: +442075947810, +442071005426       -
> > > >>>>>- SIP: [log in to unmask]              -
> > > >>>>>- fax: +442078238830                      -
> > > >>>>>- http://surl.se/agtu                     -
> > > >>>>>
> > > >>>
> > > >>>-- 
> > > >>>- O. van der Aa - Imperial College London -
> > > >>>-       LT2 Technical Coordinator         -
> > > >>>- tel: +442075947810, +442071005426       -
> > > >>>- SIP: [log in to unmask]              -
> > > >>>- fax: +442078238830                      -
> > > >>>- http://surl.se/agtu                     -
> > > >>>
> > > >
> > > 
> > > -- 
> > > *******************************************
> > > * Dr Alessandra Forti			  *
> > > * Technical Coordinator - NorthGrid Tier2 *
> > > * http://www.hep.man.ac.uk/u/aforti	  *
> > > *******************************************
> > 
> > -- 
> > Steve Traylen
> > [log in to unmask]
> > http://www.gridpp.ac.uk/
> > 
> 

-- 

Dr. Paul A. Trepka       ;Intl:+44(0)151 794 2137
Oliver Lodge Laboratory  ;Fax: +44(0)151 794 3444
Dept. of Physics         ;e-mail: [log in to unmask]
The University of Liverpool
Liverpool L69 7ZE
England, UK