Stijn De Smet wrote:
> Hi,
>
> I have a problem with - I think - getting results from a job. In the
> logmonitor logs I found this error:
> 14 Mar, 15:11:25 -W- JobWrapperOutputParser::parse_file(...): Going to
> parse standard output file.
> 14 Mar, 15:11:25 -F- JobWrapperOutputParser::parse_file(...): Standard
> output does not contain useful data.
> 14 Mar, 15:11:25 -W- JobWrapperOutputParser::parse_file(...): Standard
> output was not useful, passing ball to Maradona...
> 14 Mar, 15:11:25 -I- JobWrapperOutputParser::parse_file(...): Cannot
> read JobWrapper output, both from Condor and from Maradona.Maradona
> fails the shot !!!
> (Full log at the bottom)
Did you look at the Wiki FAQ entry for this error?
http://goc.grid.sinica.edu.tw/gocwiki/Cannot_read_JobWrapper_output%2e%2e%2e
> I guess this means the results (just the standard testJob.sh script) are
> not valid.
>
> I also saw this in the gatekeeper log on the CE:
> JMA 2005/03/14 15:07:35 GATEKEEPER_JM_ID
> 2005-03-14.15:07:29.0000026844.0000000014 mapped to betest34 (25034, 10005)
> JMA 2005/03/14 15:07:35 GATEKEEPER_JM_ID
> 2005-03-14.15:07:29.0000026844.0000000014 has GRAM_SCRIPT_JOB_ID
> 3.gridce.atlantis.ugent.be manager type pbs
> JMA 2005/03/14 15:07:35 GATEKEEPER_JM_ID
> 2005-03-14.15:07:29.0000026844.0000000014 JM exiting
> JMA 2005/03/14 15:07:40 GATEKEEPER_JM_ID
> 2005-03-14.15:07:34.0000026844.0000000015 mapped to betest34 (25034, 10005)
> JMA 2005/03/14 15:07:40 GATEKEEPER_JM_ID
> 2005-03-14.15:07:34.0000026844.0000000015 has GRAM_SCRIPT_JOB_ID 30283
> managertype fork
> JMA 2005/03/14 15:08:34 GATEKEEPER_JM_ID
> 2005-03-14.15:06:27.0000026844.0000000011 JM exiting
>
> It looks like both fork and pbs (torque) are used to run the job. I don't
> know if this is normal behaviour.
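To see at a glance which job managers the gatekeeper started, the log excerpt above can be grouped by manager type. This is only a rough sketch, not an LCG tool: the function names (strip_quote, jobs_by_manager) are made up for illustration, and the pattern is inferred purely from the lines quoted in this mail. It accepts both "manager type" and "managertype", since both spellings appear in the excerpt, and it re-joins entries that the mailer wrapped.

```python
import re
from collections import defaultdict

# Pattern inferred from the quoted excerpt only (an assumption, not a spec):
#   ... GATEKEEPER_JM_ID <jm id> has GRAM_SCRIPT_JOB_ID <job id> manager type <type>
ENTRY = re.compile(
    r"GATEKEEPER_JM_ID\s+(?P<jm_id>\S+)\s+has\s+GRAM_SCRIPT_JOB_ID\s+"
    r"(?P<job_id>\S+)\s+manager\s*type\s+(?P<manager>\S+)"
)


def strip_quote(line):
    """Drop a leading mail quote marker ("> ") if the line has one."""
    return re.sub(r"^>\s?", "", line)


def jobs_by_manager(log_text):
    """Return {manager type: [(jm_id, job_id), ...]} for a gatekeeper log excerpt."""
    # Remove quote markers, then collapse all whitespace so that entries
    # wrapped over several lines become matchable again.
    unquoted = " ".join(strip_quote(line) for line in log_text.splitlines())
    flat = " ".join(unquoted.split())
    found = defaultdict(list)
    for match in ENTRY.finditer(flat):
        found[match.group("manager")].append(
            (match.group("jm_id"), match.group("job_id"))
        )
    return dict(found)


# Lines copied from the excerpt above, including the mail quoting.
SAMPLE = """\
> JMA 2005/03/14 15:07:35 GATEKEEPER_JM_ID
> 2005-03-14.15:07:29.0000026844.0000000014 has GRAM_SCRIPT_JOB_ID
> 3.gridce.atlantis.ugent.be manager type pbs
> JMA 2005/03/14 15:07:40 GATEKEEPER_JM_ID
> 2005-03-14.15:07:34.0000026844.0000000015 has GRAM_SCRIPT_JOB_ID 30283
> managertype fork
"""

for manager, jobs in jobs_by_manager(SAMPLE).items():
    for jm_id, job_id in jobs:
        print(manager, jm_id, job_id)
```

Run against the full gatekeeper log, this lists every job manager instance together with its type, which makes it easier to check whether the fork and pbs entries belong to the same submission.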
>
> The installation is a fresh LCG_2_3_1 installed using quattor and the
> ncm-components for configuration (not yaim). I did some torque
> configuration afterwards (the same as in the yaim configuration script)
> because it looked like this wasn't done by quattor.
>
> Best regards,
> Stijn
>
>
> Full Logmonitor log:
> 14 Mar, 15:09:46 -C- CondorMonitor::processEvent(...): Got job submit to
> globus event.
> 14 Mar, 15:09:46 -C- CondorMonitor::processEvent(...): For cluster 4
> 14 Mar, 15:09:46 -C- CondorMonitor::processEvent(...): Contacts
> gridce.atlantis.ugent.be:2119/jobmanager-torque,
> https://gridce.atlantis.ugent.be:20006/30200/1110809254/.
> 14 Mar, 15:09:46 -C- CondorMonitor::processEvent(...): EDG id =
> https://gridui.atlantis.ugent.be:9000/lRu3pn19jJ3gfCSSQnb1PA
> 14 Mar, 15:09:46 -W- CondorMonitor::readRSL(...): Reading condor submit
> file of job https://gridui.atlantis.ugent.be:9000/lRu3pn19jJ3gfCSSQnb1PA
> 14 Mar, 15:09:46 -I- MonitorLoop::run(): Spent 0.02 seconds in the last
> file loop.
> 14 Mar, 15:09:46 -I- MonitorLoop::run(): Must wait for other 11 seconds.
> 14 Mar, 15:09:57 -I- MonitorLoop::run(): No new event found, going to
> sleep.
> 14 Mar, 15:09:57 -I- MonitorLoop::run(): Checking each 10 seconds for
> new events.
> 14 Mar, 15:11:14 -C- CondorMonitor::processEvent(...): Got job executing
> event.
> 14 Mar, 15:11:14 -C- CondorMonitor::processEvent(...): For cluster 4 at
> host gridce.atlantis.ugent.be
> 14 Mar, 15:11:14 -C- CondorMonitor::processEvent(...): EDG id =
> https://gridui.atlantis.ugent.be:9000/lRu3pn19jJ3gfCSSQnb1PA
> 14 Mar, 15:11:14 -I- MonitorLoop::run(): Spent 0.03 seconds in the last
> file loop.
> 14 Mar, 15:11:14 -I- MonitorLoop::run(): Must wait for other 11 seconds.
> 14 Mar, 15:11:25 -C- CondorMonitor::processEvent(...): Got job
> terminated event.
> 14 Mar, 15:11:25 -C- CondorMonitor::processEvent(...): For cluster 4;
> fake return value 0
> 14 Mar, 15:11:25 -C- CondorMonitor::processEvent(...): EDG id =
> https://gridui.atlantis.ugent.be:9000/lRu3pn19jJ3gfCSSQnb1PA
> 14 Mar, 15:11:25 -W- JobWrapperOutputParser::parse_file(...): Going to
> parse standard output file.
> 14 Mar, 15:11:25 -F- JobWrapperOutputParser::parse_file(...): Standard
> output does not contain useful data.
> 14 Mar, 15:11:25 -W- JobWrapperOutputParser::parse_file(...): Standard
> output was not useful, passing ball to Maradona...
> 14 Mar, 15:11:25 -I- JobWrapperOutputParser::parse_file(...): Cannot
> read JobWrapper output, both from Condor and from Maradona.Maradona
> fails the shot !!!
> 14 Mar, 15:11:25 -C- CondorMonitor::processEvent(...): Last job
> terminated (4) aborted.
> 14 Mar, 15:11:25 -C- CondorMonitor::processEvent(...): Reason: Cannot
> read JobWrapper output, both from Condor and fromMaradona.
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removing job
> directory:
> /var/edgwl/jobcontrol/cond/lR/https_3a_2f_2fgridui.atlantis.ugent.be_3a9000_2flRu3pn19jJ3gfCSSQnb1PA
>
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removed 3 files.
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removing submit file:
> /var/edgwl/jobcontrol/submit/lR/Condor.https_3a_2f_2fgridui.atlantis.ugent.be_3a9000_2flRu3pn19jJ3gfCSSQnb1PA.submit
>
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removed...
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removing classad
> file:
> /var/edgwl/jobcontrol/submit/lR/ClassAd.https_3a_2f_2fgridui.atlantis.ugent.be_3a9000_2flRu3pn19jJ3gfCSSQnb1PA
>
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removed...
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removing wrapper
> file:
> /var/edgwl/jobcontrol/submit/lR/JobWrapper.https_3a_2f_2fgridui.atlantis.ugent.be_3a9000_2flRu3pn19jJ3gfCSSQnb1PA.sh
>
> 14 Mar, 15:11:25 -I- JobFilePurger::do_purge(...): Removed...
> 14 Mar, 15:11:25 -I- CondorMonitor::resubmitJob(...): Last known status
> = -1
> 14 Mar, 15:11:25 -S- CondorMonitor::resubmitJob(...): Resubmitting job
> to WM.
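
A log like the one quoted above is easier to read once the wrapped lines are joined back into one entry each. The following is a minimal sketch under my own assumptions: the entry layout ("day month, time -X- source(...): message") is taken only from the lines in this mail, the names parse_entries and entries_at_level are hypothetical, and the letter between the dashes is treated as an opaque label without assuming what it stands for.

```python
import re

# Start of an entry, as it appears in the quoted log, e.g.
#   14 Mar, 15:11:25 -F- JobWrapperOutputParser::parse_file(...): Standard
HEADER = re.compile(
    r"^(?P<time>\d{1,2} \w{3}, \d{2}:\d{2}:\d{2}) -(?P<level>\w)- "
    r"(?P<source>\S+?)\((?:\.\.\.)?\):\s*(?P<message>.*)$"
)


def parse_entries(lines):
    """Join wrapped lines and return one dict per log entry."""
    entries = []
    for raw in lines:
        # Drop mail quoting ("> ") and surrounding whitespace.
        line = re.sub(r"^>\s?", "", raw.rstrip("\n")).strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries:
            # No timestamp: treat the line as a continuation of the last entry.
            entries[-1]["message"] = (entries[-1]["message"] + " " + line).strip()
    return entries


def entries_at_level(entries, level):
    """Return the entries whose label between the dashes equals `level`."""
    return [entry for entry in entries if entry["level"] == level]


# Lines copied from the quoted log, including the mail quoting.
SAMPLE = """\
> 14 Mar, 15:11:25 -W- JobWrapperOutputParser::parse_file(...): Going to
> parse standard output file.
> 14 Mar, 15:11:25 -F- JobWrapperOutputParser::parse_file(...): Standard
> output does not contain useful data.
> 14 Mar, 15:11:14 -I- MonitorLoop::run(): Must wait for other 11 seconds.
""".splitlines()

for entry in entries_at_level(parse_entries(SAMPLE), "F"):
    print(entry["time"], entry["source"], entry["message"])
```

Filtering on the "F" label pulls out the "Standard output does not contain useful data" entry directly, without having to scroll through the purge and monitor-loop messages.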