Dear All,
The issue is related to the CERN firewall closure.
Thursday the 25th is an official holiday at CERN, so
as a temporary solution I have disabled SAME publishing
by setting SAME_PUBLISHER_WSDL to a dummy string.
We were submitting two sets of SFTs during the transition
period (~1 week) while we moved from the SAME validation DB
to the SAME production DB. The move was completed today.
Since this morning we have been back to the regular
SFT submission every 3 hours.
Judit
On Wed, May 24, David McBride wrote:
> Hi,
>
> It appears that there are now two sets of SFTs being regularly submitted
> to each site; each submitter is sending test jobs every 3 hours
> (resulting in twice the normal test job load).
>
> My problem is even more severe because my virtual queues bucket jobs by
> wallclock time, not by VO; all of these over-running jobs are being sent
> to my 10-minute queue (which is normally sufficient for SFTs).
>
> [ Note! I cannot simply increase the maximum wallclock time for my
> dteam queue to 3 hours, as others have suggested, because I do not have
> a dteam-only queue! ]
>
> This problem has been happening for several days now and is causing very
> large numbers of bogus JS failures. (I have commented to that effect in
> recent site reports, and was hoping the matter was being addressed.)
>
> I can no longer chase JS problems; sifting through the noise to check
> for any real issues takes far too much time.
>
> SFT maintainers, I would plead with you to implement the following:
>
> * You SHOULD send test jobs only to those queues which allow you
> sufficient resources to run. Sending a job that takes 3 hours to a
> 10-minute queue is not helpful.
>
> * You SHOULD modify your jobs so that their runtime is bounded to a
> reasonable length (i.e. tens of minutes). Large numbers of sequential
> timeouts can result in your jobs exceeding queue resource limits and
> should be avoided.
>
> * You SHOULD check for the failure case of exceeding local resource
> limits, e.g. by checking for individual test reports. Being hard-killed
> by my batch system because you overran is NOTABUG! Under such
> circumstances, you SHOULD NOT report a JS failure.
>
> Cheers,
> David
> --
> David McBride
> Department of Computing, Imperial College, London
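A minimal Python sketch of the three requests in David's message, under assumptions that come from neither message: each queue's wallclock limit is known in seconds, each test is an external command, and per-test reports reach the collector even if the job is later killed. The names Queue, pick_queue, run_tests and classify_job are hypothetical and are not part of the actual SFT or SAME code.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    max_wallclock_s: int  # assumed: the queue's wallclock limit, in seconds


def pick_queue(queues, needed_s):
    """Return the queue with the smallest wallclock limit that still fits
    a job needing `needed_s` seconds, or None if no queue is long enough."""
    fitting = [q for q in queues if q.max_wallclock_s >= needed_s]
    return min(fitting, key=lambda q: q.max_wallclock_s) if fitting else None


def run_tests(tests, budget_s, per_test_s):
    """Run (name, argv) tests in sequence. Each test gets at most `per_test_s`
    seconds and the whole job never exceeds `budget_s`, so a chain of
    timeouts cannot push the job past the queue limit.
    Returns {name: "ok" | "fail" | "timeout" | "skipped"}."""
    results = {}
    deadline = time.monotonic() + budget_s
    for name, argv in tests:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            results[name] = "skipped"  # budget spent: skip rather than overrun
            continue
        try:
            proc = subprocess.run(argv, capture_output=True,
                                  timeout=min(per_test_s, remaining))
            results[name] = "ok" if proc.returncode == 0 else "fail"
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child at this point
            results[name] = "timeout"
    return results


def classify_job(reports, expected_tests):
    """Decide what to publish from the per-test reports that actually arrived.
    Missing reports mean the job was killed part-way (e.g. by the batch
    system for exceeding a local limit); that is reported as "killed",
    not as a JS failure."""
    if any(t not in reports for t in expected_tests):
        return "killed"
    statuses = set(reports.values())
    if statuses & {"fail", "timeout"}:
        return "failure"
    if "skipped" in statuses:
        return "incomplete"
    return "ok"
```

The design choice here is that classification happens on the collector side from individual reports, so a hard kill by the batch system shows up as missing reports rather than as a failed job.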