On Thu, 24 Feb 2005, Fotis Georgatos wrote:

> Dear LCG-rollout members,
>
> We are trying to debug a weird issue that has prevented our RB in Athens
> from working correctly for the past two days. The system was working
> fine before, until I submitted a generous load of processes to it;
> then it started coughing and vomiting... :)
>
> To make a long story short, I ended up finding a "dead beef" entry by running
> head /var/edgwl/workload_manager/input.fl; is that indicative of an error?
> 0000000000000000 0000000000000000
> 0000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 000000000000dead 000000000000beef
> 0000000000000000 000000000000067c

That is normal.  FYI: "dead beef" is a popular hexadecimal constant (!) used
as a marker and/or sentinel.

> Googling a bit, the matter has been discussed in the past somewhere here
> https://wwwlistbox.cern.ch/earchive/hep-proj-grid-integration-team/

Indeed, we had plenty of those during the EDG days...  :-(

Nowadays we know of only two scenarios in which the file can get corrupted
(quick checks are sketched below):

    1. The file system filled up - did that happen?

    2. The file system is on NFS - is that the case?
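
To check (the paths below assume the default install location):

    df -h /var/edgwl/workload_manager    # did the relevant file system hit 100%?
    df -T /var/edgwl/workload_manager    # does the Type column say "nfs"?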

In any case, here is the cleanup recipe for this problem:

-----------------------------------------------------------------------------
    # stop the network server first, so nothing new is appended to the queue
    /etc/init.d/edg-wl-ns stop

    /etc/init.d/edg-wl-wm stop

    # move the corrupted queue file out of the way;
    # a fresh one should be created on restart
    mv /var/edgwl/workload_manager/input.fl \
       /var/edgwl/workload_manager/input.fl.BAD

    # restart in the reverse order
    /etc/init.d/edg-wl-wm start

    /etc/init.d/edg-wl-ns start
-----------------------------------------------------------------------------

In case neither scenario was true, please tar up /var/edgwl/workload_manager
and send us the file (gzipped).
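
For example (adjust the archive name/location to taste):

    tar -czf /tmp/workload_manager.tar.gz /var/edgwl/workload_manager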

> but I don't have access to the archives, even from within CERN.
>
> Am I supposed to manually remove those entries? Or are they just a stub?
>
> I believe Maarten could answer this...
>
> PS.
> Our MySQL database lbserver20 is huge, 640M. Is this expected?
> [root@rb root]# du -sh /var/lib/mysql/lbserver20/
> 640M    /var/lib/mysql/lbserver20

You call that huge?  How about the test zone RB lxn1188.cern.ch:

8.4G    /var/lib/mysql/lbserver20
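
If you want to see which tables account for the space, something like this
should work (you may need to point the client at the right socket/credentials):

    mysql lbserver20 -e 'SHOW TABLE STATUS\G' | egrep 'Name|Data_length|Index_length'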

> -------- Original Message --------
> Subject: Re: [Comment] Re: [cslab.ntua.gr #7227] [log in to unmask]
> Date: Thu, 24 Feb 2005 18:39:21 +0200
> From: Fotis Georgatos <[log in to unmask]>
> Organization: CERN/NTUA
> To: [log in to unmask]
> References: <[log in to unmask]>
> [...]
>
> 24 Feb, 18:33:16 -E: [Error] run(DispatcherFromFileList.cpp:164): Dispatcher:
> Syntax error on file "/var/edgwl/workload_manager/input.fl"
> (_file_sequence_t::empty()[323]). Exiting...
>
> [...]
> corrupted entries:
>
> 0000000000000000 0000000000000000
> 0000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 000000000000dead 000000000000beef
>
>
> --
> echo "sysadmin know better bash than english" | sed s/min/mins/ \
>         | sed 's/better bash/bash better/' # Yelling in a CERN forum
>