On Wed, Sep 14, 2005 at 11:11:11AM +0100, Byrom, R (Rob) wrote:
> Steve,
>
> I don't think I can respond directly to this email group (and there is
> no individual addressee). Can you therefore post this on to the
> appropriate email list?
Done
> The tomcat server has just been restarted having previously collapsed
> with 'out of memory' problems. The Apel archiver is now running and the
> SFT tests should hopefully pass for each site.
>
> My concern is that the SFT tests are causing widespread confusion. If
> the archiver/tomcat server is down (as reported by the simple 'select
> count(*) from LcgRecords' test), the site admin assumes they have some
> responsibility to do something about it.
> There is some mechanism to restart the archiver if a status check
> returns an error; however, there probably also needs to be a 'tomcat
> restart' script on the accounting server (although this is fairly
> unpleasant).
>
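For what it is worth, the restart-on-failed-status-check idea could look
roughly like the sketch below. The function name and both commands passed
to it are placeholders of mine, not the real archiver or tomcat scripts.

```shell
#!/bin/sh
# Sketch only: run a status check and, if it fails, run a restart command.
# check_and_restart and the two commands passed to it are hypothetical.
check_and_restart() {
    check_cmd=$1
    restart_cmd=$2
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        echo "status ok, nothing to do"
    else
        echo "status check failed, restarting"
        sh -c "$restart_cmd"
    fi
}

# Placeholder commands: 'false' simulates a failed check and
# 'echo restarted' stands in for the real restart script.
check_and_restart "false" "echo restarted"
```

Run from cron on the accounting server, something of this shape would
cover the tomcat case as well, at the cost of hiding the underlying
memory problem.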
> My guess is that the tomcat memory problems are caused in part by all
> sites publishing data at the same time. The publishing time appears to
> be fixed by the Yaim configuration, so Yaim may need some mechanism to
> stagger the times randomly per site.
>
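A rough sketch of the staggering idea, under my own assumptions: rather
than a truly random time, it derives a repeatable offset from the site
name with cksum, so each site keeps the same slot from day to day. The
function name, the 24-hour window and the command path are all made up
for illustration; this is not what Yaim does today.

```shell
#!/bin/sh
# Sketch only: map a site name to a minute offset in [0, WINDOW).
# stagger_minutes is a hypothetical helper, not part of Yaim or APEL.
stagger_minutes() {
    site=$1
    window=$2
    # cksum prints "<crc> <byte count>" for stdin; keep the CRC only.
    crc=$(printf '%s' "$site" | cksum | cut -d' ' -f1)
    echo $(( crc % window ))
}

# Spread sites over a day (1440 minutes) and print a cron-style line.
offset=$(stagger_minutes "RHUL-LCG2" 1440)
hour=$(( offset / 60 ))
minute=$(( offset % 60 ))
# /path/to/apel-publisher is a placeholder, not a real install path.
echo "$minute $hour * * * /path/to/apel-publisher"
```

A hash of the site name is pseudo-random rather than random, but it
avoids sites drifting onto the same minute after a reconfiguration.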
> Rob
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Simon George
> Sent: 13 September 2005 19:07
> To: [log in to unmask]
> Subject: Re: APEL test in SFT (still)
>
>
> This thread has strayed a bit from the original topic, and that problem
> remains unsolved. I am not aware of anything we did that could be
> relevant, but the symptoms have now changed so let me try asking again.
>
> Now SFT reports a warning:
>
> APEL Test
> Checking if LcgRecords latest archiver is accessible:
>
> + rgma -c 'select count(*) from LcgRecords'
> + grep 'Rows in set'
> + set +x
> LcgRecords latest archiver seems to be down. Exiting with warning.
>
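Reading the trace above, the check seems to amount to looking for the
string 'Rows in set' in the query output. A sketch of that logic, as far
as I can reconstruct it (the function name is mine and the real SFT
script may well differ):

```shell
#!/bin/sh
# Sketch of the check visible in the trace: the query output is expected
# to contain the string 'Rows in set'. classify_archiver is a made-up name.
classify_archiver() {
    query_output=$1
    if printf '%s\n' "$query_output" | grep -q 'Rows in set'; then
        echo "OK"
    else
        echo "WARNING: LcgRecords latest archiver seems to be down"
    fi
}

# In the real test the text would come from:
#   rgma -c 'select count(*) from LcgRecords'
# Here an empty string stands in for a query that returned nothing.
classify_archiver ""
```

Note that a warning from a check like this only says the central
archiver did not answer; it says nothing about the site's own setup.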
> See
> https://lcg-sft.cern.ch:9443/sft/info/ce1.pp.rhul.ac.uk/sft_2005-09-13_15.05.42.html#sft-apel_2005-09-13_15:10:07
>
> I suspect that if we got past this, it might then hit the old failure
> again:
> https://lcg-sft.cern.ch:9443/sft/info/ce1.pp.rhul.ac.uk/sft_2005-09-12_06.05.53.html#sft-apel_2005-09-12_06:08:57
>
> But according to this accounting view page:
> http://goc.grid-support.ac.uk/gridsite/accounting/tree/gridppview.php?ExecutingSite=RHUL-LCG2
> accounting data for RHUL has somehow been gathered.
>
> Can anyone advise what is going on, what the problem could be, what to
> check, etc, please?
>
> Are there any APEL experts out there?
>
> Thanks very much,
> Simon