Following on from the earlier efficiency discussions (I have been waiting
for some inefficient jobs to investigate), there is another type of
inefficient biomed job running at Durham at the moment.

# pstree -al biomed008
sh
  └─456821.torq.SC /var/spool/pbs/mom_priv/jobs/456821.torq.SC
      <SNIP - 8 x globus procs>
      └─sh job_pull.sh 10000 1200 hanh02J5634 4 135 amga.kisti.re.kr:8823 hansolo.kisti.re.kr/dpm/kisti.re.kr/home/biomed
          ├─fileToEntry amga.kisti.re.kr 8823 /scratch/WMS_n3_027130_https_3a_2f_2frb02.lip.pt_3a9000_2fcZ8Q0462CYv0q4jSxoFjqA/job.status /wisdom/monitoring/job/hanh02J5634_4wn_log
          └─heartbeat.sh ./heartbeat.sh 1200 hanh02J5634_4 hanh02 amga.kisti.re.kr:8823
              └─sleep 1200

It seems job_pull.sh is some form of pilot job that pulls in work to be
processed. Presumably it sometimes gets a job and runs it, but much of
the time it sits idle; I cannot tell how long it will sit there before
exiting.

# qstat -f 456821
Job Id: 456821.torque.dur.scotgrid.ac.uk
    Job_Name = STDIN
    Job_Owner = [log in to unmask]
    resources_used.cput = 24:01:46
    resources_used.mem = 169668kb
    resources_used.vmem = 1104820kb
    resources_used.walltime = 76:33:52

Has anyone else seen this, and do you consider this a bad job? My
thought is that it was fine while we were empty over the weekend, but
we currently have jobs queued, and 82 of these inefficient jobs are
doing nothing at the moment (mostly around 30-40% efficient over the
life of the job).
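
For reference, the efficiency figure can be reproduced from the two qstat
fields above as cput divided by walltime. The following is only a rough
sketch: the helper names are made up for illustration, and it assumes the
job occupies a single core, which the qstat output shown does not confirm.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cput: str, walltime: str, cores: int = 1) -> float:
    """Return CPU time divided by (wall time x cores), as a fraction.

    cores=1 is an assumption; pass the real slot count if known.
    """
    wall = hms_to_seconds(walltime) * cores
    if wall == 0:
        raise ValueError("walltime must be non-zero")
    return hms_to_seconds(cput) / wall


# Values copied from the qstat -f output for job 456821
efficiency = cpu_efficiency("24:01:46", "76:33:52")
print(f"{efficiency:.1%}")
```

For job 456821 this gives roughly 31%, at the low end of the 30-40%
range quoted above.
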
Phil
PS. I agree with Winnie that neither showq nor qstat gives 100% correct
efficiencies, as it depends on the type of job. I would say that qstat
does get it correct for the vast majority of jobs, though.
Coles, J (Jeremy) wrote:
> Dear All
>
> I did not see any further discussion on job efficiencies (Phil and
> Winnie had noted some discrepancies using different tools). We had a
> brief look at this during today's DTEAM meeting but the only conclusion
> reached is that we need to encourage everyone to investigate each
> observed problem more deeply. It is worrying that the commands/tools
> completely disagree and where we find this the specific cases need
> detailed investigation. Please could I encourage everyone to look at job
> efficiencies over the next two days so that on Thursday (UKI meeting at
> 10) we can discuss how widespread the problem seems to be and possible
> next steps. A concern is that when we reach contention for resources and
> want to improve overall usage we will not be able to decide if jobs are
> inefficient or not!
>
>
> Thanks,
> Jeremy
>