On Thu, 24 Feb 2005, Fotis Georgatos wrote:

> Dear LCG-rollout members,
>
> We are trying to debug a weird issue that has prevented our RB in Athens
> from working correctly for two days now. The system was working
> fine before, until I submitted a generous load of processes to it;
> then it started coughing and vomiting... :)
>
> To make a long story short, I ended up finding a "dead beef" entry by running
> head /var/edgwl/workload_manager/input.fl. Is this indicative of an error?
> 0000000000000000 0000000000000000
> 0000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 000000000000dead 000000000000beef
> 0000000000000000 000000000000067c

That is normal. FYI: "dead beef" is a popular hexadecimal number (!) used
as a marker and/or sentinel.

> Googling a bit, I found that the matter has been discussed in the past here:
> https://wwwlistbox.cern.ch/earchive/hep-proj-grid-integration-team/

Indeed, we had plenty of those during the EDG days... :-(
Nowadays we know of only 2 ways in which the file can get corrupted:

1. The file system filled up - did that happen?
2. The file system is on NFS - is that the case?

In any case, this is the cleanup recipe for the problem:

-----------------------------------------------------------------------------
/etc/init.d/edg-wl-ns stop
/etc/init.d/edg-wl-wm stop
mv /var/edgwl/workload_manager/input.fl \
   /var/edgwl/workload_manager/input.fl.BAD
/etc/init.d/edg-wl-wm start
/etc/init.d/edg-wl-ns start
-----------------------------------------------------------------------------

If neither scenario applies, please tar up /var/edgwl/workload_manager
and send us the file (gzipped).

> but I don't have access to the archives, even from within CERN.
>
> Am I supposed to manually remove those entries? Or are they just a stub?
>
> I believe Maarten could answer this...
>
> PS.
> Our mysql->lbserver20 database is extremely huge, 640M. Is this expected?
> [root@rb root]# du -sh /var/lib/mysql/lbserver20/
> 640M    /var/lib/mysql/lbserver20

You call that huge? How about the test zone RB lxn1188.cern.ch:

8.4G    /var/lib/mysql/lbserver20

> -------- Original Message --------
> Subject: Re: [Comment] Re: [cslab.ntua.gr #7227] [log in to unmask]
> Date: Thu, 24 Feb 2005 18:39:21 +0200
> From: Fotis Georgatos <[log in to unmask]>
> Organization: CERN/NTUA
> To: [log in to unmask]
> References: <[log in to unmask]>
> [...]
>
> 24 Feb, 18:33:16 -E: [Error] run(DispatcherFromFileList.cpp:164): Dispatcher:
> Syntax error on file "/var/edgwl/workload_manager/input.fl"
> (_file_sequence_t::empty()[323]). Exiting...
>
> [...]
> corrupted entries:
>
> 0000000000000000 0000000000000000
> 0000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> 000000000000dead 000000000000beef
>
>
> --
> echo "sysadmin know better bash than english" | sed s/min/mins/ \
> | sed 's/better bash/bash better/' # Yelling in a CERN forum
>
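For the two corruption scenarios listed earlier (full file system, NFS), a quick check before applying the cleanup recipe could look like the script below. It is a minimal sketch, not part of the EDG/LCG middleware: the helper names (fs_usage, fs_type, check_wm_fs, pack_wm_dir) and the WM_DIR variable are hypothetical, it assumes a Linux RB with GNU coreutils (for `stat -f -c %T`), and it only sees the current state of the disk, so a file system that filled up earlier and has since been cleaned will not show up.

```shell
#!/bin/sh
# Sketch only: checks the two known causes of input.fl corruption
# (full file system, NFS) and packs the directory for a bug report.
# WM_DIR and all function names are hypothetical; adjust to your RB.
WM_DIR="${WM_DIR:-/var/edgwl/workload_manager}"

# Print the Use% ("Capacity") of the file system holding $1, without "%".
# df -P keeps each file system on a single line, so line 2 is the data row.
fs_usage() {
    df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Print the file system type of $1 (GNU stat), e.g. "nfs" for an NFS mount.
fs_type() {
    stat -f -c %T "$1"
}

# Report both scenarios; return 1 if either one applies right now.
check_wm_fs() {
    usage=$(fs_usage "$1")
    fstype=$(fs_type "$1")
    echo "usage: ${usage}%"
    echo "fstype: ${fstype}"
    case "$fstype" in
        nfs*) echo "scenario 2: directory is on NFS"; return 1 ;;
    esac
    if [ "$usage" = "100" ]; then
        echo "scenario 1: file system is full"
        return 1
    fi
    return 0
}

# Create a gzipped tarball of directory $1 as archive $2.
pack_wm_dir() {
    tar czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Only run the check where the workload manager directory actually exists.
if [ -d "$WM_DIR" ]; then
    check_wm_fs "$WM_DIR"
fi
```

If neither check fires, something like `pack_wm_dir /var/edgwl/workload_manager /tmp/workload_manager.tar.gz` produces the gzipped tarball requested above; the archive is written outside the directory being packed so that tar does not try to include its own output.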