Dear all,

In the ops meeting we had some discussion of monitoring, and I asked whether people could give any feedback for the consolidation group. To that end (and with grateful thanks for the feedback we've already received), I wanted to refresh this thread and repost the questions from the monitoring consolidation group. The next consolidation meeting is a week on Friday, so could people please consider these questions and send me any notes by next Wednesday? I'll collate them and pass them on to the consolidation group.

So, with that in mind:

From the consolidation group - 

"
[Initial context]:

There are two hot topics these days:
* How to set up the validity of the [SAM] tests so that the sites don’t get penalised if the jobs wait too long in the queue, while the tests can still be used for operations
* How the sites give the number of pledges and capacities used. 

[Note that the second of these is related to the goal to merge REBUS into the consolidated monitoring project] 

The questions we've been asked, in the context of monthly A/R reports coming from experiment tests as opposed to ops tests:

* When [SAM] tests fail, do you know how to fix them yourselves, or do you need input from the experiments?

* Do you use tools like the SUM web portal to check the status of your sites? [ http://dashb-sum.cern.ch/AllVOs.html ]

* Related to [ http://rebus.cern.ch/apps/capacities/sites/ ] and all the tabs in this application: it shows the yearly pledges and the monthly usage of the different sites and federations. The current understanding is that the WLCG office fills in some of this data, and the rest comes from the site administrators (either directly or via the information published in the BDII). The question was raised of how to combine pledges with the job monitoring information that we have from other sources. The easiest solution would be to store all of it in the same place, so they wanted to understand how [or if!] you publish this data, and how you use it.

* Some of the information about site usage is also collected from the BDII (sent from the local BDII that you control). Do you think that information is reliable? [I'm not sure which aspects of reliability are intended here, so please read with that in mind]
"

If anyone has anything they'd like to comment on (or talk about in Ops at some point) please let me know and I'll collate everyone's views and pass them on.

Best wishes,
David