Hello all,
We'd like to introduce a "validation" step before deploying gLite production software on our production farm, to check that it does not break anything in our installation and that it fits our setup. We don't have a "test" instance yet, but we'd like to set one up. We've been brainstorming on this topic and would really appreciate your input!
We find it difficult to validate whether a worker node is properly configured before adding it to production. I believe the target would be to run the SAM jobs flawlessly on the given worker node (ops and VO-specific SAM jobs). To do that, we feel we would need a new "validation" queue that can only run jobs on the WN being validated.
The main problem is getting the SAM jobs sent to that queue. We see three possible routes:
- Setting up a "validation" CE, where experiments shouldn't send jobs (that might invalidate the testing of VO-specific SAM jobs, because we would need to restrict entry to dteam/ops?), and then publishing the CE as a preproduction CE. SAM tests on the PPS infrastructure would then run on the given worker node and be able to validate it. For this to work, we assume that SAM PPS tests are exactly the same as SAM production tests. Is this really true?
- Setting up a "validation" CE, where experiments shouldn't send jobs, and then publishing the CE as a production CE (akin to our current ce-test.pic.es). That CE would be used only for validation purposes but would often be in an error state, thus giving a poor impression and confusing gridview and the availability graphs.
- Downloading the SAM tests (maybe dynamically, because they can change) and running them locally on the WN, without "grid interaction". That would not test exactly the same thing as a real SAM job (as it would be run differently), and we would probably need to solve a lot of problems regarding certificates and proxies. Running SAM tests outside of the SAM framework would also be a problem, but we have no idea whether the effort would be big or small. So this option seems to have the most cons and the fewest pros :)
Do you think that these ideas are viable? Do you recommend one? Do you have a better alternative? Or ideas we might not have taken into account?
Thanks in advance,
Paco