Hello. 

Sort of. The script considers only the CEs which accept the ATLAS VO.
Since yesterday I have considered this issue critical and have therefore
banned the sites still affected from ATLAS production. In FCR, a site is
banned by removing the ACBR VO:ATLAS from its CEs, so the script no
longer checks those CEs. We should therefore run the script against a
BDII which is not filtered by FCR, like the one used for SAM tests.
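
To make the effect concrete, here is a minimal sketch of such a filter.
It is not the actual logic of VOViewsConsist.py, which is not shown in
this thread. The attribute names GlueCEUniqueID and
GlueCEAccessControlBaseRule and the rule string VO:atlas are assumptions
based on how the Glue schema is usually published in the BDII; the record
layout, the function name and the example hosts are hypothetical.

```python
# Hypothetical in-memory view of CE entries, as they might look after
# parsing the output of an LDAP query against a BDII (assumed layout).
def ces_accepting_vo(ce_entries, vo="atlas"):
    """Return the IDs of CEs whose ACBR list contains VO:<vo>."""
    wanted = ("VO:" + vo).lower()
    selected = []
    for entry in ce_entries:
        # Compare case-insensitively; a CE with no ACBR at all is skipped.
        rules = [r.lower() for r in entry.get("GlueCEAccessControlBaseRule", [])]
        if wanted in rules:
            selected.append(entry["GlueCEUniqueID"])
    return selected


# The second CE stands for a site banned in FCR: its VO:atlas rule is gone,
# so a script that filters on the ATLAS VO never looks at it.
entries = [
    {"GlueCEUniqueID": "ce-a.example.org:2119/jobmanager-lcgpbs-atlas",
     "GlueCEAccessControlBaseRule": ["VO:atlas", "VO:dteam"]},
    {"GlueCEUniqueID": "ce-b.example.org:2119/jobmanager-lcgpbs-atlas",
     "GlueCEAccessControlBaseRule": ["VO:dteam"]},
]
print(ces_accepting_vo(entries))
```

With an FCR-filtered BDII as input, the banned CE is silently dropped,
which is why a clean report does not by itself mean the CE is fixed.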

Having said this, it is true that the situation has improved a lot in
the last week, thanks to the ticketing action. There are currently about
10 banned sites, while 1 week ago there were 60 problematic sites.

		_____________________________________________
		From: Antonio Retico 
		Sent: Thursday, November 15, 2007 11:26 AM
		To: Nicholas Thackray; grid-operations-meeting
(Attendees of the Grid Operations Meetings); LHC Computer Grid - Rollout
		Subject: RE: Minutes and actions from the WLCG-OSG-EGEE
Grid Operations Meeting of 12 Nov 2007
		

		hi Nick,

		about action 67 and the comment from Alessandro and
Simone reported in the minutes:

		"There is a .txt file with a query you should run on the
BDII to gather the relevant info. In addition there is a python script
which fetches info from the output of the query and generates an
output, where sites marked with ==> are the problematic ones. "

		I suspect that the attached log was still the one from
the previous week. The situation, as reported by the same script, now
looks considerably better. Most of the GGUS tickets we opened have been
closed, and today I see

		[aretico@lxplus204 VOVIEW] python /afs/cern.ch/user/c/campanas/public/VOVIEW/VOViewsConsist.py /tmp/a.a | grep "===>"
		===> CE:bigmac-lcg-ce.physics.utoronto.ca:2119/jobmanager-lcgcondor-atlas TOTrun:9     TOTVOrun:0    TOTwait:0     TOTVOwait:4444
		===> CE:ce04-lcg.cr.cnaf.infn.it:2119/blah-lsf-atlas TOTrun:170     TOTVOrun:0    TOTwait:0     TOTVOwait:0
		Traceback (most recent call last):
		  File "/afs/cern.ch/user/c/campanas/public/VOVIEW/VOViewsConsist.py", line 12, in ?
		    TOTVOrun += int(a)
		ValueError: invalid literal for int(): _UNDEF_
		===> CE:epgce2.ph.bham.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:0     TOTVOrun:0    TOTwait:4444     TOTVOwait:57772

		Three sites are still failing, one of them with a generic
error.
		I would be inclined to close the action, if ATLAS
agrees, of course.
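
The traceback above comes from calling int() on the placeholder string
_UNDEF_. As a rough sketch only, and assuming the script simply sums
per-VO counters read as strings, the non-numeric values could be skipped
and counted instead of aborting the run. The helper names below are
hypothetical and are not part of VOViewsConsist.py.

```python
def to_int_or_none(value):
    """Convert a published counter to int; return None if it is not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Covers placeholders such as "_UNDEF_" as well as missing values.
        return None


def sum_counts(values):
    """Sum the numeric counters and report how many entries were skipped."""
    total = 0
    skipped = 0
    for v in values:
        n = to_int_or_none(v)
        if n is None:
            skipped += 1
        else:
            total += n
    return total, skipped


# Two valid counters and one undefined placeholder.
print(sum_counts(["170", "0", "_UNDEF_"]))  # prints (170, 1)
```

Reporting the number of skipped values keeps the undefined entries
visible, so a site publishing _UNDEF_ can still be followed up.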

		Thanks.

		Antonio
		
		
			_____________________________________________ 
			From: 	Nicholas Thackray  
			Sent:	Thursday, November 15, 2007 10:19 AM
			To:	grid-operations-meeting (Attendees of
the Grid Operations Meetings); LHC Computer Grid - Rollout
			Subject:	Minutes and actions from the
WLCG-OSG-EGEE Grid Operations Meeting of 12 Nov 2007

			Dear all


			The minutes and action list from the Grid
Operations meeting of Monday 12th November can be found attached to the
agenda page, here:
	
http://indico.cern.ch/conferenceDisplay.py?confId=23786 

			In particular, please check the list of
attendees as there are some names missing.  Send any comments or
corrections directly to me.


			The next meeting will be on Monday, 19th
November 2007 15:00 UTC (16:00 Swiss local time)
			 


			Best regards,
			 
				Nick Thackray