Hi Mischa,
unfortunately, it does not work, or rather it leads to errors
in the OPS SAM test. The problem is that the sensor tries
to send data to nagios after the payload and the epilogue have
finished, i.e. at a moment when the result has already been removed.
My epilogue looks as follows:
{{{
#!/bin/sh
TMPBASEDIR=/scr/u
logfile=/tmp/glexec/glexec_epilogue.log
test -e $logfile || /bin/mkdir -m 700 -p `dirname $logfile`
rm -f $logfile
/bin/touch $logfile || exit 1
/bin/chmod 700 $logfile || exit 1
/bin/chown root:root $logfile || exit 1
if test X"$GLEXEC_EPILOG_TARGET_USER" = "X" ; then
echo "Warning: empty GLEXEC_EPILOG_TARGET_USER variable" >> $logfile
exit 0
fi
if test X"$GLEXEC_EPILOG_GLEXEC_USER" = "X" ; then
echo "Warning: empty GLEXEC_EPILOG_GLEXEC_USER variable" >> $logfile
exit 0
fi
if test X"$PBS_JOBID" = "X" ; then
echo "Warning: empty PBS_JOBID variable" >> $logfile
exit 0
fi
# Redirect stdout first, so that stderr also ends up in the log file
/bin/su $GLEXEC_EPILOG_TARGET_USER -c \
"/usr/sbin/tmpwatch -afq -U root 0m $TMPBASEDIR/$PBS_JOBID" \
>> $logfile 2>&1
exit 0
}}}
It works, that is, it removes exactly what is required.
It seems to me that one way to avoid the problems in SAM would be
"chown -Rrh --preserve-root ..." on the whole directory tree.
But that is less safe than tmpwatch.
Does anyone have a better idea?
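To make the chown idea concrete, here is a dry-run sketch that only prints
the command it would run. It assumes the tree would be handed over to the
account in GLEXEC_EPILOG_GLEXEC_USER (that is my reading, nothing confirmed
it), the helper name print_chown_cmd is made up for illustration, and it
uses only -Rh, since as far as I know GNU chown has no -r option.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the chown command for one job dir.
# print_chown_cmd is a hypothetical helper; the new owner is an assumption.
print_chown_cmd() {
    base=$1    # e.g. /scr/u
    jobid=$2   # e.g. $PBS_JOBID
    owner=$3   # e.g. $GLEXEC_EPILOG_GLEXEC_USER
    # Refuse empty arguments so we never build a path like "$base/"
    if [ -z "$base" ] || [ -z "$jobid" ] || [ -z "$owner" ]; then
        echo "Warning: empty argument, nothing to do"
        return 0
    fi
    # Refuse job ids that could point outside the base directory
    case $jobid in
        */*|.|..) echo "Warning: suspicious job id \"$jobid\""; return 0 ;;
    esac
    dir=$base/$jobid
    # Only act on a real directory, not on a symlink to one
    if [ -d "$dir" ] && [ ! -L "$dir" ]; then
        echo chown -Rh --preserve-root "$owner" "$dir"
    else
        echo "Job dir \"$dir\" does not exist"
    fi
}
```

In the epilogue it would be called along the lines of
print_chown_cmd "$TMPBASEDIR" "$PBS_JOBID" "$GLEXEC_EPILOG_GLEXEC_USER" >> $logfile
and the logged command checked by hand before a real version is run.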
On Tue, 20 May 2014, Mischa Salle wrote:
> On Tue, May 20, 2014 at 05:43:14PM +0400, Valery Mitsyn wrote:
>> Yes, two questions here:
>>
>> 1) is there a sample script to remove the working directory?
>> I'm afraid to experiment with a fully loaded farm.
>>
>> 2) is it safe to rerun yaim for glexec, or could one set some
>> vars in yaim's config to set up the epilogue params in glexec.conf?
>
> No in both cases. Concerning YAIM, it's basically too site-specific
> to give general guidelines. And YAIM was written (long ago) with the
> idea to always fully replace the existing files, no merging.
>
> Normally it should be sufficient to only add one line to the
> glexec.conf:
> epilogue = <path-of-epilogue script>
> The file must be 'trusted', i.e. only writable for the epilogue user
> (root).
>
> Writing such a script should not be difficult. What you could do first
> is write a test script that just echoes to a file what it eventually
> would do, check that that is indeed the correct command, and only then
> run a real version.
> So something like
> #!/bin/sh
> # EXAMPLE ONLY, PLEASE ADAPT BEFORE USING
> logfile=/var/log/glexec/glexec_epilogue.log
> # Create log file and directory when needed
> if [ ! -e $logfile ];then
> mkdir -m 700 -p `dirname $logfile` && \
> touch $logfile || \
> exit 1
> fi
> # Check we have the target user
> if [ -z "$GLEXEC_EPILOG_TARGET_USER" ];then
> echo "Warning: empty GLEXEC_EPILOG_TARGET_USER variable" >> $logfile
> exit 0
> fi
> # Remove the custom user directory
> userdir=/tmp/userdir/$GLEXEC_EPILOG_TARGET_USER
> if [ -d $userdir ];then
> echo "Removing user directory \"$userdir\"" >> $logfile
> echo rm -rf $userdir >> $logfile
> else
> echo "User dir \"$userdir\" does not exist" >> $logfile
> fi
>
> On Tue, May 20, 2014 at 03:57:13PM +0200, Maarten Litmaath wrote:
>> As Mischa wrote, the gLExec epilogue script should help a lot with that,
>> but you anyway need something like a cron job that runs often to clean up
>> junk left behind by jobs that crashed or were killed by the batch system.
> That's partially true, but as long as gLExec runs in linger mode and is
> not directly sent a SIGKILL by the batch system, the epilogue should
> run, so also when a job is killed by the batch system, or when its
> payload crashes.
>
> Cheers,
> Mischa
>
>
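On the 'trusted' requirement quoted above: a quick sanity check before
pointing glexec.conf at the script might look like the sketch below. The
helper name is_trusted_file is made up, and it only approximates the
description in the mail (owned by the given user, not writable by group or
others); gLExec's own check may be stricter, which I have not verified.

```shell
#!/bin/sh
# Sketch: succeed only if the file is owned by the given user and is not
# writable by group or others. is_trusted_file is a hypothetical helper,
# not part of gLExec.
is_trusted_file() {
    f=$1       # path of the epilogue script
    owner=$2   # expected owner, e.g. root
    # Must be an existing regular file
    [ -f "$f" ] || return 1
    # Must be owned by the expected user (find prints the name if so)
    [ -n "$(find "$f" -prune -user "$owner" -print)" ] || return 1
    # Must not have the group-write (020) or other-write (002) bit set
    [ -z "$(find "$f" -prune \( -perm -020 -o -perm -002 \) -print)" ]
}
```

For example: is_trusted_file /path/to/epilogue root || echo "not trusted",
with the path replaced by whatever is set in glexec.conf.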
--
Best regards,
Valery Mitsyn